my-rl

Model-based RL with latent variable models

Model-based RL methods that learn latent-variable models instead of predicting dynamics directly in observation space. The learned world model can then be used for planning efficiently. In vision-based tasks, for instance, an observation-space model has to generate images for future time steps and feed them back into itself to predict the next ones, which requires more computation. ...
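As a rough illustration of planning inside a latent space, the sketch below encodes an observation once and then rolls candidate action sequences forward using only latent states, never decoding back to images. Everything in it is an assumption made for illustration: the random linear maps stand in for a learned encoder, latent dynamics, and reward head, and the random-shooting planner is just one simple choice, not the method of any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; a real model would learn these maps from data.
OBS_DIM, LATENT_DIM, ACTION_DIM = 64, 4, 2
W_enc = rng.normal(size=(LATENT_DIM, OBS_DIM)) / np.sqrt(OBS_DIM)
W_z = rng.normal(size=(LATENT_DIM, LATENT_DIM)) * 0.5
W_a = rng.normal(size=(LATENT_DIM, ACTION_DIM)) * 0.5
w_r = rng.normal(size=LATENT_DIM)


def encode(obs):
    """Map a high-dimensional observation to a compact latent state."""
    return np.tanh(W_enc @ obs)


def latent_step(z, a):
    """Predict the next latent state; no image is ever generated."""
    return np.tanh(W_z @ z + W_a @ a)


def reward(z):
    """Predict the reward directly from the latent state."""
    return float(w_r @ z)


def plan(obs, horizon=5, n_candidates=64):
    """Random-shooting planner that scores action sequences in latent space."""
    z0 = encode(obs)
    best_return, best_first_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, ACTION_DIM))
        z, total = z0, 0.0
        for a in actions:
            z = latent_step(z, a)
            total += reward(z)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action, best_return


obs = rng.normal(size=OBS_DIM)
action, predicted_return = plan(obs)
```

Each rollout step here costs a few small matrix products on a 4-dimensional vector, whereas a model that predicts in observation space would have to produce and re-encode a full 64-dimensional observation at every step.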

MuZero

AlphaGo, AlphaGo Zero, AlphaZero

Model-based RL methods that use Monte Carlo Tree Search for planning and utilize a self-play mechanism for training. ...

Multi-agent Deep Deterministic Policy Gradient

Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic

Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...

Deterministic Policy Gradients

The generalization of the policy gradient theorem to the deterministic case and the corresponding policy gradient algorithms. ...
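For orientation, the deterministic policy gradient theorem is usually stated in the following form. The notation is an assumption here, since the excerpt does not fix any: $\mu_\theta$ is the deterministic policy, $\rho^{\mu}$ its discounted state distribution, and $Q^{\mu}$ its action-value function.

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}
    \left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s, a) \big|_{a = \mu_\theta(s)}
    \right]
```

Unlike the stochastic case, the expectation is taken over states only, because the action is a deterministic function of the state.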

Trust Region Policy Optimization

A model-free RL algorithm that ensures stable and efficient policy updates by optimizing within a trust region, limiting the step size to prevent drastic policy changes and improve convergence. ...
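The trust-region idea can be sketched on a toy problem: propose an update, then shrink the step until the KL divergence between the old and new policy stays under a threshold and the surrogate objective improves. This is a simplified illustration under stated assumptions, not the full algorithm: it uses a single-state softmax policy, hand-picked advantages, and a plain gradient direction, whereas TRPO computes a natural-gradient direction with conjugate gradient. The names `trust_region_step`, `delta`, and `shrink` are hypothetical.

```python
import numpy as np


def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()


def kl(p, q):
    """KL(p || q) for two categorical distributions with full support."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def surrogate(logits_new, adv):
    # E_{a ~ pi_old}[pi_new(a) / pi_old(a) * A(a)] reduces to sum_a pi_new(a) A(a).
    return float(np.sum(softmax(logits_new) * adv))


def trust_region_step(logits, adv, delta=0.01, step=1.0, shrink=0.5, max_backtracks=20):
    """Backtracking line search that enforces KL(pi_old || pi_new) <= delta."""
    p_old = softmax(logits)
    base = surrogate(logits, adv)
    # Gradient of sum_a pi(a) A(a) with respect to the logits of a softmax policy.
    grad = p_old * (adv - np.sum(p_old * adv))
    for _ in range(max_backtracks):
        candidate = logits + step * grad
        inside_region = kl(p_old, softmax(candidate)) <= delta
        if inside_region and surrogate(candidate, adv) > base:
            return candidate
        step *= shrink  # step was too large or did not improve: shrink and retry
    return logits  # no acceptable step found: keep the old policy


old_logits = np.zeros(3)
advantages = np.array([1.0, 0.0, -1.0])
new_logits = trust_region_step(old_logits, advantages)
```

The accepted update shifts probability toward the action with positive advantage, but only as far as the KL budget `delta` allows.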