Policy-Gradient
Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic
Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...
Deterministic Policy Gradients
Notes on Deterministic Policy Gradient algorithms ...
Trust Region Policy Optimization
Notes on policy optimization using trust region methods. ...
Policy Gradient
Notes on policy gradient methods. ...
Likelihood Ratio Policy Gradient via Importance Sampling
Connection between the likelihood ratio policy gradient method and importance sampling. ...
Policy Gradient Theorem
So far in the series, we have chosen actions based on an estimated action-value function. Alternatively, we can learn a parameterized policy, with parameter vector $\boldsymbol{\theta}$, that selects actions without consulting a value function. The parameters $\boldsymbol{\theta}$ are updated on each step in the direction of an estimate of the gradient of some performance measure with respect to $\boldsymbol{\theta}$. Such methods are called policy gradient methods. ...
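To make the idea of stepping $\boldsymbol{\theta}$ along a gradient estimate concrete, here is a minimal sketch in plain Python. Everything specific in it is an assumption made for illustration and is not taken from the post: a softmax policy over action preferences, a bandit with deterministic rewards, the step size `alpha`, and the hypothetical helper names `softmax`, `grad_log_softmax`, and `run_bandit`. The update used is the likelihood-ratio form $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \alpha \, R \, \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(a)$, with no baseline.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences (theta) into action probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_softmax(theta, action):
    """Gradient of log pi_theta(action) w.r.t. theta for a softmax policy.

    Component k equals 1[k == action] - pi_theta(k).
    """
    probs = softmax(theta)
    return [(1.0 if k == action else 0.0) - probs[k] for k in range(len(theta))]


def run_bandit(rewards, steps=2000, alpha=0.1, seed=0):
    """Gradient ascent on expected reward for a toy bandit (assumed setup).

    rewards[a] is the deterministic reward for pulling arm a.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(theta)
        # The policy picks the action directly; no value function is consulted.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = rewards[action]
        # Sample-based estimate of the gradient of expected reward.
        grad = grad_log_softmax(theta, action)
        theta = [t + alpha * reward * g for t, g in zip(theta, grad)]
    return theta, softmax(theta)


if __name__ == "__main__":
    theta, probs = run_bandit([0.0, 1.0])
    print("theta:", theta)
    print("policy:", probs)
```

In this toy setup, the policy should shift probability toward the arm with the higher reward as training proceeds. A practical method would typically subtract a baseline from the reward to reduce the variance of the gradient estimate.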