model-free
Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic
Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...
Deterministic Policy Gradients
Generalizing the policy gradient theorem to the deterministic case, and the corresponding policy gradient algorithms. ...
Trust Region Policy Optimization
A model-free RL algorithm that stabilizes policy updates by optimizing within a trust region, a bound on the KL divergence between the old and new policies. This limits the step size, prevents drastic policy changes, and improves convergence. ...
Deep Q-learning
Notes on DQN and its variants. ...