RL: Chapter 1: Introduction Flashcards
Reinforcement learning
Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal.
The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them.
Main challenge in reinforcement learning vs. other types of learning
The trade-off between exploration and exploitation.
The agent has to exploit what it has already experienced in order to obtain reward.
But it also has to explore in order to make better action selections in the future.
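Below is a minimal sketch of this trade-off. It assumes a hypothetical three-armed bandit whose arms pay 1 with probabilities 0.2, 0.5 and 0.8, which the agent does not know. It uses epsilon-greedy selection, which is one common way to balance the two and not the only one: with probability `epsilon` the agent explores a random arm, and otherwise it exploits the arm with the highest estimated reward. The function name and parameters are illustrative.

```python
import random


def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Play a hypothetical Bernoulli multi-armed bandit with epsilon-greedy selection.

    true_means: success probability of each arm (hidden from the agent).
    Returns (estimates, counts, total_reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # sample-average estimate of each arm's reward
    counts = [0] * n_arms       # number of times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm so future choices can improve.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best.
            action = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        counts[action] += 1
        # Incremental sample average: Q <- Q + (R - Q) / N
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8], steps=5000)
print([round(q, 2) for q in estimates], counts, total)
```

With `epsilon=0` and all estimates starting at zero, this agent only exploits. It stays on arm 0 as soon as that arm pays out once, because the other estimates never leave zero. A small `epsilon` keeps it sampling the alternatives.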
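Main elements of a reinforcement learning system
- Agent
- Environment
- Policy
- Reward signal
- Value function
- A model of the environment (optional)
Beyond the agent and the environment, the last four items are the main subelements of the system.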
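Below is a minimal sketch of how these pieces interact. It assumes a hypothetical five-cell corridor environment (`CorridorEnv`) and a hypothetical `run_episode` helper. The policy picks actions, and the environment answers with the next state and a reward. The agent records what it observed in a simple lookup-table model, that is, something that mimics how the environment behaves. A value function is left out to keep the loop short.

```python
class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1 on a line.

    Reaching the last cell gives reward +1 and ends the episode;
    every other step gives reward 0.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: the policy acts, the environment responds."""
    state = env.reset()
    total_reward = 0.0
    model = {}  # learned model: (state, action) -> (next_state, reward)
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        model[(state, action)] = (next_state, reward)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, model


def always_right(state):
    """Hypothetical fixed policy: move right in every state."""
    return +1


total, model = run_episode(CorridorEnv(length=5), always_right)
print(total, len(model))  # 1.0 4
```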
Policy
Defines the learning agent’s way of behaving at a given time.
Roughly, a policy is a mapping from perceived states of the environment to actions to be taken when in those states.
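A policy can be as simple as a lookup table. The sketch below assumes hypothetical battery-level states and action names. It shows a deterministic policy, which gives one action per state, and a stochastic policy, which gives a probability for each action. `act` is an illustrative helper, not a library function.

```python
import random

# Deterministic policy: each state maps directly to one action.
deterministic_policy = {"low_battery": "recharge", "high_battery": "search"}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {
    "low_battery": {"recharge": 0.9, "wait": 0.1},
    "high_battery": {"search": 0.7, "wait": 0.3},
}


def act(policy, state, rng):
    """Return the action for `state`, sampling one if the policy is stochastic."""
    choice = policy[state]
    if isinstance(choice, dict):
        actions = list(choice)
        weights = list(choice.values())
        return rng.choices(actions, weights=weights, k=1)[0]
    return choice


rng = random.Random(0)
print(act(deterministic_policy, "low_battery", rng))  # recharge
print(act(stochastic_policy, "high_battery", rng))
```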
Reward signal
Defines the goal of a reinforcement learning problem.
On each time step, the environment sends the reinforcement learning agent a single number called the reward. The agent's sole objective is to maximize the total reward it receives over the long run.
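Why is the objective the total reward and not the immediate reward? Compare two hypothetical reward sequences. The first behaviour earns a large reward on the first step and nothing afterwards. The second earns nothing at first but steady reward later. Summing the rewards shows that the second is better over the whole run. The sum here is undiscounted, which is an assumption that suits short episodes.

```python
def total_reward(rewards):
    """Undiscounted total reward accumulated over an episode."""
    return sum(rewards)


# Hypothetical reward sequences produced by two different behaviours.
greedy_rewards = [5, 0, 0, 0]   # large immediate reward, nothing afterwards
patient_rewards = [0, 2, 2, 2]  # no immediate reward, steady reward later

print(total_reward(greedy_rewards), total_reward(patient_rewards))  # 5 6
```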
Value function
Whereas the reward signal indicates what is good in an immediate sense, a value function specifies what is good in the long run.
The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
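One way to estimate a value is to average the total reward actually collected after starting from a state, over many episodes. This Monte Carlo approach is a technique covered later. It is used here only to illustrate the definition. The sketch assumes a hypothetical episodic environment, with the agent's policy already folded into the transition probabilities:

- From A, the episode ends with reward 1 half the time; otherwise it moves to B with reward 0.
- From B, the episode always ends with reward 2.

The exact values are v(B) = 2 and v(A) = 0.5 × 1 + 0.5 × (0 + 2) = 1.5. The estimates should therefore be close to 1.5 and exactly 2.0. Note that A's expected immediate reward is only 0.5, but its value is higher because of what can follow.

```python
import random

# Hypothetical environment: state -> list of (probability, reward, next_state).
# next_state None means the episode ends.
TRANSITIONS = {
    "A": [(0.5, 1.0, None), (0.5, 0.0, "B")],
    "B": [(1.0, 2.0, None)],
}


def sample_return(start, rng):
    """Sum the rewards of one simulated episode that starts in `start`."""
    state, total = start, 0.0
    while state is not None:
        outcomes = TRANSITIONS[state]
        probs = [p for p, _, _ in outcomes]
        _, reward, next_state = rng.choices(outcomes, weights=probs, k=1)[0]
        total += reward
        state = next_state
    return total


def mc_value(start, episodes=10_000, seed=0):
    """Monte Carlo estimate of a state's value: the average return observed from it."""
    rng = random.Random(seed)
    return sum(sample_return(start, rng) for _ in range(episodes)) / episodes


print(round(mc_value("A"), 2), mc_value("B"))
```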