Temporal consistency loss & Ape-X DQfD

The algorithm consists of three components: the transformed Bellman operator, the temporal consistency (TC) loss, and the combination of Ape-X DQN and DQfD to learn a more consistent human-level policy. ...

March 12, 2024 · 4 min · Trung H. Nguyen

MuZero

January 2, 2024 · 5 min · Trung H. Nguyen

AlphaZero

October 17, 2023 · 11 min · Trung H. Nguyen

Multi-agent Deep Deterministic Policy Gradient

May 25, 2023 · 5 min · Trung H. Nguyen

Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic

Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...

December 27, 2022 · 11 min · Trung H. Nguyen

Deterministic Policy Gradients

Notes on Deterministic Policy Gradient algorithms ...

December 2, 2022 · 12 min · Trung H. Nguyen

Trust Region Policy Optimization

Notes on policy optimization using trust region methods. ...

November 23, 2022 · 12 min · Trung H. Nguyen