Reinforcement learning has its roots in animal psychology, specifically in the study of how animals learn by trial and error. Early AI researchers believed that the same approach could be applied to computers, allowing a machine to learn how to map states (its observations of the environment) to actions.
In 1951, Marvin Minsky built one of the first reinforcement learning machines: a simulated rat learning to solve a maze, implemented with vacuum tubes that represented the 40 neurons of the rat's brain.
The synapses were reinforced as the simulated rat worked through the maze, according to its success in escaping.
Roughly forty years later, reinforcement learning achieved a notable success.
In 1992, IBM researcher Gerald Tesauro used reinforcement learning to create TD-Gammon, a backgammon-playing program.
Tesauro used temporal-difference learning, TD(λ), to train a neural network with 80 hidden units. TD-Gammon started with no knowledge of backgammon strategy and honed its skills through self-play.
TD-Gammon competed against top human players and discovered new strategies for the game.
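To make the idea of temporal-difference learning more concrete, here is a minimal sketch of TD(λ) with accumulating eligibility traces. It is not Tesauro's implementation: instead of a neural network playing backgammon, it assumes a tiny five-state random walk with a lookup table of values, and the function name `td_lambda_random_walk` and all of its parameters are hypothetical choices made for illustration. The walk starts in the middle state, steps left or right at random, and pays a reward of 1 only when it exits on the right, so the true value of state i (counting from 1) is i/6.

```python
import random


def td_lambda_random_walk(num_states=5, episodes=10000, alpha=0.005,
                          lam=0.8, gamma=1.0, seed=0):
    """Estimate state values of a random walk with tabular TD(lambda)."""
    rng = random.Random(seed)
    values = [0.5] * num_states  # initial guess for every state

    for _ in range(episodes):
        traces = [0.0] * num_states  # eligibility traces, reset each episode
        state = num_states // 2      # start in the middle state

        while True:
            next_state = state + rng.choice((-1, 1))

            if next_state < 0:               # exited on the left: reward 0
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:   # exited on the right: reward 1
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False

            # TD error: how much better or worse the outcome was than predicted.
            td_error = reward + gamma * next_value - values[state]

            # Mark the current state as eligible for credit.
            traces[state] += 1.0

            # Update every state in proportion to its trace, then decay traces.
            for s in range(num_states):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam

            if done:
                break
            state = next_state

    return values


if __name__ == "__main__":
    estimates = td_lambda_random_walk()
    for i, v in enumerate(estimates, start=1):
        print(f"state {i}: estimate {v:.3f}, true value {i / 6:.3f}")
```

The eligibility traces let a single surprising outcome adjust the predictions for all recently visited states, not just the last one. In a system like TD-Gammon, the same kind of error signal would adjust the weights of a neural network rather than entries in a table.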