Reinforcement learning (RL) is pretty simple in theory – “take actions, get rewards, increase likelihood of high reward actions”. However, we can quickly runs into subtle problems that don’t show up in standard supervised learning. The aim of this post is to give a gentle, concrete introduction to what RL actually is, why we might want to use it instead of (or alongside) supervised learning, and some of the headaches (figure 1) that come with it: sparse rewards, credit assignment, and reward shaping.

Figure 1: I’d like to help take you from confusion/headache 🙁 (left) to having a least some clarity 🙂 (right) with regard to what reinforcement learning is and where its useful
Rather than starting with Atari or robot arms, we’ll work through a small toy environment: a paddle catching falling balls. It’s simple enough to understand visually, but rich enough to show how different reward designs can lead to completely different behaviours, even when the underlying environment and objective are the same. Along the way, we’ll connect the code to the standard RL formalism (MDPs, returns, policy gradients), so you can see how the equations map onto something you can actually run.
Continue reading
