Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic
Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...
Notes on DQN and its variants. ...
Recall that when we used dynamic programming (DP) methods to solve reinforcement learning problems, we required a model of the environment. With Monte Carlo methods and temporal-difference learning, by contrast, no model is necessary. Methods that require a model, such as DP, are called model-based, while methods that do not use a model are called model-free. Model-based methods rely primarily on planning; model-free methods, on the other hand, rely primarily on learning. ...
So far in this series, we have covered the ideas of dynamic programming (DP) and Monte Carlo methods. What happens if we combine them? Temporal-difference (TD) learning is our answer. ...
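To make the combination concrete, here is a minimal sketch of tabular TD(0) prediction. It learns from sampled transitions, as Monte Carlo does, but updates each estimate toward a bootstrapped target built from the next state's current estimate, as DP does. The environment (a 5-state random walk with a reward of 1 at the right end), the function name `td0_random_walk`, and all hyperparameters are assumptions chosen for illustration; they are not taken from the posts above.

```python
import random


def td0_random_walk(num_episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate state values of a 5-state random walk with tabular TD(0).

    Hypothetical setup: states 1..5 are non-terminal, states 0 and 6 are
    terminal. Each step moves left or right with equal probability, and
    the only reward is 1.0 on entering state 6.
    """
    rng = random.Random(seed)
    values = [0.0] * 7           # terminal states keep value 0
    for s in range(1, 6):
        values[s] = 0.5          # neutral initial guess

    for _ in range(num_episodes):
        state = 3                # every episode starts in the middle
        while state not in (0, 6):
            # Sample one transition (model-free, like Monte Carlo).
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            # Bootstrap from the current estimate of the next state (like DP).
            td_target = reward + gamma * values[next_state]
            values[state] += alpha * (td_target - values[state])
            state = next_state

    return values[1:6]           # estimates for the non-terminal states


if __name__ == "__main__":
    for s, v in enumerate(td0_random_walk(), start=1):
        print(f"V({s}) ~ {v:.3f}")
```

For this particular walk the true values are 1/6, 2/6, ..., 5/6, so the estimates should settle near those numbers, with some residual noise because the step size is constant.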