Optimal Policy Existence

In the previous note, Markov Decision Processes, Bellman equations, we mentioned that there exists a policy $\pi_*$ that is better than or equal to all other policies. In this note, we will prove that claim. ...

July 10, 2021 · 7 min · Trung H. Nguyen

Markov Decision Processes, Bellman equations

You may have heard of a computer program called AlphaGo, the AI that beat Lee Sedol, winner of 18 world Go titles. One of the techniques it used is self-play against other instances of itself, with Reinforcement Learning. ...

June 27, 2021 · 5 min · Trung H. Nguyen