Likelihood Ratio Policy Gradient via Importance Sampling

The connection between the likelihood ratio policy gradient method and importance sampling. ...

May 25, 2022 · 5 min · Trung H. Nguyen

Planning & Learning

Recall that dynamic programming (DP) methods for solving reinforcement learning problems require a model of the environment, whereas Monte Carlo methods and temporal-difference learning do not. Methods that require a model, such as DP, are called model-based, while methods that do not use a model are called model-free. Model-based methods rely primarily on planning; model-free methods rely primarily on learning. ...

May 19, 2022 · 7 min · Trung H. Nguyen

Policy Gradient Theorem

So far in the series, we have been choosing actions based on an estimated action-value function. Alternatively, we can learn a parameterized policy, with parameter vector $\boldsymbol{\theta}$, that selects actions without consulting a value function. We update $\boldsymbol{\theta}$ on each step in the direction of an estimate of the gradient of some performance measure with respect to $\boldsymbol{\theta}$. Such methods are called policy gradient methods. ...

May 4, 2022 · 8 min · Trung H. Nguyen

The Exponential Family, Generalized Linear Models

Notes on the exponential family and generalized linear models. ...

April 4, 2022 · 14 min · Trung H. Nguyen

Eligibility Traces

Besides $n$-step TD methods, there is another mechanism, called eligibility traces, that unifies TD and Monte Carlo. Varying $\lambda$ in TD($\lambda$) from $0$ to $1$ gives a spectrum of methods ranging from TD methods at $\lambda=0$ to Monte Carlo methods at $\lambda=1$. ...

March 13, 2022 · 25 min · Trung H. Nguyen

Function Approximation

The tabular methods we have considered so far work well when the state space is small. For reinforcement learning problems with a continuous state space, however, an exact solution is nearly impossible to find; instead, we can look for an approximate one. ...

February 11, 2022 · 21 min · Trung H. Nguyen

Temporal-Difference Learning

So far in this series, we have gone through the ideas of dynamic programming (DP) and Monte Carlo. What happens if we combine these ideas? Temporal-difference (TD) learning is our answer. ...

January 31, 2022 · 21 min · Trung H. Nguyen