q-learning

Maximum Entropy Reinforcement Learning via Soft Q-learning & Soft Actor-Critic

Notes on Entropy-Regularized Reinforcement Learning via SQL & SAC ...

Deep Q-learning

Notes on DQN and its variants. ...

Planning & Learning

Recall that when using dynamic programming (DP) method in solving reinforcement learning problems, we required the availability of a model of the environment. Whereas with Monte Carlo methods and temporal-difference learning, the models are unnecessary. Such methods with requirement of a model like the case of DP is called model-based, while methods without using a model is called model-free. Model-based methods primarily rely on planning; and model-free methods, on the other hand, primarily rely on learning. ...

Temporal-Difference Learning

So far in this series, we have gone through the ideas of dynamic programming (DP) and Monte Carlo. What will happen if we combine these ideas together? Temporal-difference (TD) learning is our answer. ...