Multi-agent Deep Deterministic Policy Gradient

May 25, 2023 · 5 min · Trung H. Nguyen

Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic

Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...

December 27, 2022 · 11 min · Trung H. Nguyen

Deterministic Policy Gradients

The generalization of the policy gradient theorem to the deterministic case, and the corresponding policy gradient algorithms. ...

December 2, 2022 · 12 min · Trung H. Nguyen

Trust Region Policy Optimization

A model-free RL algorithm that ensures stable and efficient policy updates by optimizing within a trust region, limiting the step size to prevent drastic policy changes and improve convergence. ...

November 23, 2022 · 12 min · Trung H. Nguyen

Policy Gradient

Notes on policy gradient methods. ...

October 6, 2022 · 4 min · Trung H. Nguyen

Likelihood Ratio Policy Gradient via Importance Sampling

The connection between the likelihood ratio policy gradient method and importance sampling. ...

May 25, 2022 · 5 min · Trung H. Nguyen

Policy Gradient Theorem

So far in the series, we have been choosing actions based on an estimated action-value function. Alternatively, we can learn a parameterized policy, with parameters $\boldsymbol{\theta}$, that selects actions without consulting a value function. The parameters $\boldsymbol{\theta}$ are updated on each step in the direction of an estimate of the gradient of some performance measure w.r.t. $\boldsymbol{\theta}$. Such methods are called policy gradient methods. ...

May 4, 2022 · 8 min · Trung H. Nguyen