Value Function Approximation

The tabular methods we have considered so far scale well only within a small state space. When dealing with Reinforcement Learning problems in a continuous state space, however, an exact solution is nearly impossible to find; instead, we settle for an approximate one (a minimal function-approximation sketch follows below). ...

February 11, 2022 · 21 min · Trung H. Nguyen
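To make the idea concrete, here is a minimal sketch of one common approximation scheme: a linear value function v(s, w) = w · x(s) trained with semi-gradient TD(0). The environment interface and feature map below are hypothetical placeholders, not code from the post.

```python
import numpy as np

def semi_gradient_td0(env, features, n_features, episodes=1000, alpha=0.05, gamma=0.99):
    """Estimate v(s) ~= w . features(s) with semi-gradient TD(0).

    Assumes (hypothetically) that env.reset() -> state, env.step(action) -> (next_state, reward, done),
    env.sample_action() -> action, and features(state) -> numpy array of length n_features.
    """
    w = np.zeros(n_features)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(env.sample_action())
            x = features(state)
            # Bootstrapped target uses the current weights (hence "semi-gradient").
            target = reward + (0.0 if done else gamma * np.dot(w, features(next_state)))
            w += alpha * (target - np.dot(w, x)) * x
            state = next_state
    return w
```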

Temporal-Difference Learning

So far in this series, we have gone through the ideas of dynamic programming (DP) and Monte Carlo methods. What happens if we combine these ideas? Temporal-difference (TD) learning is our answer (a minimal sketch of the TD(0) update follows below). ...

January 31, 2022 · 21 min · Trung H. Nguyen
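As a concrete illustration of that combination, a minimal tabular TD(0) prediction sketch: like Monte Carlo it learns from sampled transitions, and like DP it bootstraps from the current estimate of the next state's value. The environment and policy interfaces are hypothetical placeholders.

```python
from collections import defaultdict

def td0_prediction(env, policy, episodes=1000, alpha=0.1, gamma=0.99):
    """Tabular TD(0): V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)].

    Assumes (hypothetically) env.reset() -> state, env.step(action) -> (next_state, reward, done),
    and policy(state) -> action.
    """
    V = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            # Sampled transition (Monte Carlo idea) plus bootstrapped target (DP idea).
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
            state = next_state
    return V
```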

Gaussian Distribution & Gaussian Network Models

Notes on the Gaussian distribution & Gaussian network models. ...

November 22, 2021 · 15 min · Trung H. Nguyen

Power Series

Recall that in the previous note, Infinite Series of Constants, we frequently mentioned a type of series called a power series. In this note, we dive deeper into its details. ...

September 21, 2021 · 15 min · Trung H. Nguyen

Infinite Series of Constants

Notes on infinite series of constants. ...

September 6, 2021 · 20 min · Trung H. Nguyen

Monte Carlo Methods in Reinforcement Learning

Recall that when using Dynamic Programming algorithms to solve RL problems, we assumed complete knowledge of the environment. With Monte Carlo methods, we only require experience: sample sequences of states, actions, and rewards from simulated or real interaction with an environment (a first-visit prediction sketch follows below). ...

August 21, 2021 · 20 min · Trung H. Nguyen
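Here is a minimal first-visit Monte Carlo prediction sketch built on the same hypothetical episodic interface as the earlier snippets: the value of a state is estimated as the average return observed after its first visit in each sampled episode, with no model of the environment's dynamics.

```python
from collections import defaultdict

def first_visit_mc_prediction(env, policy, episodes=1000, gamma=0.99):
    """Estimate V(s) as the average return following the first visit to s in each episode.

    Assumes (hypothetically) env.reset() -> state, env.step(action) -> (next_state, reward, done),
    and policy(state) -> action.
    """
    returns_sum, returns_count = defaultdict(float), defaultdict(int)
    V = defaultdict(float)
    for _ in range(episodes):
        # Generate one complete episode by following the policy.
        episode, state, done = [], env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state
        # Walk backwards, accumulating the return G; update only first visits.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            if s not in (step[0] for step in episode[:t]):  # first visit to s in this episode
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V
```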

Solving MDPs with Dynamic Programming

In two previous notes, MDPs and Bellman equations and Optimal Policy Existence, we saw how MDPs and Bellman equations are defined and how they work. In this note, we find the solution to the MDP framework with Dynamic Programming (a value-iteration sketch follows below). ...

July 25, 2021 · 9 min · Trung H. Nguyen
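For a sense of what such a DP solution looks like, here is a minimal value-iteration sketch, assuming a hypothetical tabular MDP where P[s][a] is a list of (probability, next_state, reward) triples; the note itself may use a different algorithm or representation.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    """Bellman optimality backup: V(s) <- max_a sum_{s'} p(s', r | s, a) * (r + gamma * V(s'))."""

    def q_value(V, s, a):
        # Expected one-step return of taking action a in state s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(V, s, a) for a in range(n_actions))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop when the largest update is negligible
            break
    # Extract a greedy policy with respect to the converged value function.
    policy = [int(np.argmax([q_value(V, s, a) for a in range(n_actions)])) for s in range(n_states)]
    return V, policy
```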