What is reinforcement learning ?
Reinforcement learning is learning what to do — how to map situations to actions — so that numerical signal which is called a reward can be maximized.
In Reinforcement Learning, an agent is placed in a poorly understood, possibly stochastic, and nonstationary environment. The agent then interacts with the environment at discrete time steps. At every time step, the agent can observe and change the environment’s state through its actions . In addition to state changes, the environment responds to the agent’s actions with a reward , a scalar quantity that represents the immediate utility of taking a given action in a given state . The agent’s objective is to develop a mapping from states to actions called a policy that maximizes its long-term return .
In Reinforcement learning, the learner is a decision-making agent that takes actions in an environment and receives reward (or penalty) for its actions in trying to solve a problem.
After a set of trial-and error runs, it should learn the best policy, which is the sequence of actions that maximize the total reward.
Agent acts on its environment, it receives some evaluation of it’s action but not told which action is correct to archive its goal
The decision maker has a set of actions possible. Once an action is chosen and taken, the state changes.
The solution to the task requires a sequence of actions, and we get feedback, in the form of a reward rarely, generally only when the complete sequence is carried out.