Introduction to RL. Multi-armed bandits Flashcards
Reinforcement Learning is both a class of ___ and a class of ___
problems
algorithms
Policy
Mapping from what the agent is seeing to what the agent chooses to do
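A minimal sketch of a policy as a mapping, assuming a toy problem with two hypothetical states ("s0", "s1") and two hypothetical actions ("left", "right"); none of these names come from the cards. It shows a deterministic policy as a lookup table and a stochastic policy that samples an action from per-state probabilities.

```python
import random

# Hypothetical actions, chosen only for illustration.
ACTIONS = ["left", "right"]

# Deterministic policy: a lookup table from observed state to action.
deterministic_policy = {"s0": "right", "s1": "left"}


def stochastic_policy(state, rng):
    """Sample an action from assumed per-state action probabilities."""
    probs = {"s0": [0.2, 0.8], "s1": [0.9, 0.1]}[state]
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


rng = random.Random(0)
action_det = deterministic_policy["s0"]   # always the same action for "s0"
action_sto = stochastic_policy("s1", rng)  # may differ between calls
```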
Rewards
(Immediate) numerical signal that tells the agent which actions are good or bad
The agent's goal is to get as much reward as possible
Value Functions
Long-term functions of reward
We need to see how the agent fares in the long term, not only on the immediate reward
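One common way to make "long term" concrete is to sum the rewards collected over time, optionally down-weighting later ones. The sketch below assumes a discount factor `gamma` and two made-up reward sequences; neither appears in the cards, and a value function would be the expected value of such a sum rather than a single sum.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of rewards, the reward at step t weighted by gamma**t.

    gamma is an assumed discount factor; gamma=1.0 is a plain sum.
    """
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Two hypothetical reward sequences.
greedy_now = [5, 0, 0, 0]  # large immediate reward, nothing afterwards
patient = [0, 2, 2, 2]     # nothing at first, steady reward later

undiscounted = (discounted_return(greedy_now), discounted_return(patient))
discounted = (discounted_return(greedy_now, 0.5), discounted_return(patient, 0.5))
```

With no discounting the patient sequence is worth more (6.0 against 5.0); with `gamma = 0.5` the immediate reward wins (5.0 against 1.75), which is why the long-term view can change which behaviour looks best.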
Models
(of the problem/environment)
State
Represents the information relevant to solving the task
Actions
what the agent can do
Goal
Drives the behaviour of the agent (expressed through rewards)
Dynamics
Describe how the actions of the agent influence the environment
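The four parts of a model (state, actions, goal, dynamics) can be sketched for a hypothetical five-cell corridor in which the agent is rewarded for reaching the rightmost cell. The corridor, its size, and the `step` function are illustrative assumptions, not something the cards define.

```python
# State: the agent's cell index, 0 .. N_STATES - 1 (assumed toy problem).
N_STATES = 5
# Actions: what the agent can do.
ACTIONS = ("left", "right")
# Goal: reach the rightmost cell; it is communicated only through reward.
GOAL_STATE = N_STATES - 1


def step(state, action):
    """Dynamics: how an action changes the state, plus the resulting reward."""
    move = 1 if action == "right" else -1
    # The corridor walls keep the agent inside the valid range of cells.
    next_state = min(max(state + move, 0), GOAL_STATE)
    reward = 1.0 if next_state == GOAL_STATE else 0.0
    return next_state, reward


next_state, reward = step(3, "right")
```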
The agent does not know the ___ and the ___
Goal
Dynamics
The agent should interact with (explore / exploit) the environment and figure out what the goal and the dynamics are
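As an illustration of the explore / exploit trade-off in a multi-armed bandit, here is a minimal sketch of an epsilon-greedy agent. Epsilon-greedy is one common strategy, not necessarily the one this card set has in mind, and the arm means, Gaussian rewards, and the function name `run_epsilon_greedy` are assumptions made for the example. The agent never reads `true_means` directly; it only sees the rewards it receives.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Play a Gaussian bandit; true_means is hidden from the decision rule."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a noisy reward (assumed unit variance).
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy([0.1, 0.5, 0.9])
```

A larger `epsilon` spends more pulls on exploration and learns all arms more accurately; a smaller one exploits sooner but risks settling on a worse arm.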
…