Temporal consistency loss & Ape-X DQfD

The algorithm consists of three components: the transformed Bellman operator, the temporal consistency (TC) loss, and a combination of Ape-X DQN and DQfD, which together are used to learn a more consistent human-level policy. ...

March 12, 2024 · 4 min · Trung H. Nguyen

Deep Q-learning

Notes on DQN and its variants. ...

November 18, 2022 · 8 min · Trung H. Nguyen