Lecture 6 Flashcards
What is one of the main functions of the brain? And how is this done?
To gather new information and to update old information. This is done by reinforcement learning.
The Rescorla-Wagner rule
This rule models learning through a prediction error (PE): the difference between the experienced outcome (R: positive social feedback or no positive feedback) and the expected outcome (V) on each trial.
The PE takes the form $\mathrm{PE} = R - V$ and is then used to update the expected outcome, weighted by a fixed learning rate α (alpha): $V_{t+1} = V_t + \alpha \cdot \mathrm{PE}$ for a given trial t.
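The update rule can be sketched in Python as a trial-by-trial loop. This is a minimal illustration. It assumes outcomes coded as R = 1 (positive social feedback) and R = 0 (no positive feedback), a starting expectation of 0.5 and a learning rate of 0.3. The function name `rescorla_wagner` and these numbers are hypothetical choices made for the example.

```python
def rescorla_wagner(outcomes, alpha=0.3, v0=0.5):
    """Return the expected outcome V before the first trial and after each trial.

    outcomes: experienced outcomes R per trial (assumed coding: 1 = positive feedback, 0 = none)
    alpha:    fixed learning rate (hypothetical value)
    v0:       initial expectation (hypothetical value)
    """
    v = v0
    history = [v]
    for r in outcomes:
        pe = r - v          # prediction error: experienced minus expected outcome
        v = v + alpha * pe  # V(t+1) = V(t) + alpha * PE
        history.append(v)
    return history


trace = rescorla_wagner([1, 1, 0, 1])
print([round(v, 3) for v in trace])
```

A positive prediction error (R above V) raises V and a negative one lowers it. In this sketch, the third trial without positive feedback pulls the expectation back down.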
What forms can reinforcement learning take?
Classical and operant conditioning
Pavlovian fear conditioning
- Classical conditioning
- Stimulus - outcome association
- This process is not dependent on behaviour
- Behavioural response is ‘innate’ (reflexive)
Instrumental conditioning
- Operant conditioning
- Stimulus - action (behaviour) - outcome association
- Dependent on behaviour
- Reinforced behaviour is voluntary
Probabilistic outcomes
The same action (behaviour) does not always lead to the same outcome.
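Probabilistic outcomes can be sketched by repeating one action many times. In this minimal example, the action yields positive feedback (R = 1) with probability 0.8 and no feedback (R = 0) otherwise. The probability, the seed and the function name `sample_outcomes` are illustrative assumptions.

```python
import random


def sample_outcomes(p_reward, n_trials, seed=0):
    """Repeat the same action n_trials times; each repetition is rewarded with probability p_reward."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [1 if rng.random() < p_reward else 0 for _ in range(n_trials)]


outcomes = sample_outcomes(p_reward=0.8, n_trials=1000)
print("distinct outcomes:", sorted(set(outcomes)))
print("proportion rewarded:", sum(outcomes) / len(outcomes))
```

The same action produces both outcomes. Over many trials, though, the proportion of rewarded trials approaches 0.8, and this long-run average is what a learned expectation can approximate.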
Reinforcement learning concepts: reward, value, state and action
- Reward (R) = experienced outcome
- Value (Q) = expectation of the outcome
- State (s) = current world state, for example whether something is present or not
- Action (a) = choice behaviour
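The four concepts can be held together in a small table of values, one per (state, action) pair. The sketch below assumes a hypothetical task with two states and two actions, invented names, and a hypothetical learning rate of 0.5 for the single update it performs.

```python
# Hypothetical task: two states s and two actions a (names invented for illustration).
states = ["cue_present", "cue_absent"]
actions = ["approach", "avoid"]

# Value (Q): expected outcome for every (state, action) pair, all starting at 0.
Q = {(s, a): 0.0 for s in states for a in actions}

# One experience: in state s the agent chooses action a and receives reward R.
s, a, R = "cue_present", "approach", 1.0
alpha = 0.5  # hypothetical learning rate

# Only the value of the pair that was actually experienced is updated.
Q[(s, a)] += alpha * (R - Q[(s, a)])

for (state, action), value in Q.items():
    print(f"Q({state}, {action}) = {value}")
```

Keying values by both state and action reflects that the same action can have a different expected outcome depending on the situation.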
How do we use reinforcement learning in our daily life?
We use reinforcement learning to update expectations over the course of multiple experiences.
Prediction error
The difference between the experienced outcome R and your expectation Q(s, a).
Prediction error = R - Q(s, a)
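A worked example with hypothetical numbers follows. Suppose the expected outcome is Q(s, a) = 0.3. If the experienced outcome is R = 1, the prediction error is positive. If no positive feedback follows (R = 0), the prediction error is negative.

```latex
\mathrm{PE} = R - Q(s,a) = 1 - 0.3 = 0.7
\qquad
\mathrm{PE} = R - Q(s,a) = 0 - 0.3 = -0.3
```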
What is the formula for the updated expectation at the next time point?
$Q(s,a)_{t+1} = Q(s,a)_t + [R_t - Q(s,a)_t]$
or
$V_{t+1} = V_t + [R_t - V_t]$
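Written out, this version (with no learning rate) replaces the old expectation entirely with the latest outcome, because the old value cancels:

```latex
Q(s,a)_{t+1} = Q(s,a)_t + [R_t - Q(s,a)_t] = R_t
```

This amounts to a learning rate of 1, so only the most recent experience counts.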
What determines how strongly the prediction error updates your expectations?
The learning rate.
Some people learn faster than others. Learning speed is captured by a learning rate (α):
$Q(s,a)_{t+1} = Q(s,a)_t + \alpha \cdot [R_t - Q(s,a)_t]$
or
$V_{t+1} = V_t + \alpha \cdot [R_t - V_t]$
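A worked example with hypothetical numbers uses V_t = 0.5, R_t = 1 and α = 0.2:

```latex
V_{t+1} = V_t + \alpha \cdot [R_t - V_t] = 0.5 + 0.2 \cdot (1 - 0.5) = 0.5 + 0.1 = 0.6
```

With α = 0.2, only a fifth of the prediction error is absorbed into the new expectation.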
What happens to the update when the learning rate is larger?
A larger learning rate gives the prediction error more weight, so each outcome changes your expectation more strongly.
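The sketch below compares a low and a high learning rate on the same hypothetical outcome sequence: four positive outcomes followed by one trial without positive feedback. The rates 0.1 and 0.9, the starting value 0 and the function name `update_values` are illustrative assumptions.

```python
def update_values(outcomes, alpha, v0=0.0):
    """Apply V(t+1) = V(t) + alpha * (R(t) - V(t)) over a sequence of outcomes."""
    v = v0
    trace = []
    for r in outcomes:
        v += alpha * (r - v)  # learning rate scales how much of the prediction error is used
        trace.append(v)
    return trace


outcomes = [1, 1, 1, 1, 0]
slow = update_values(outcomes, alpha=0.1)
fast = update_values(outcomes, alpha=0.9)
print("alpha=0.1:", [round(v, 2) for v in slow])
print("alpha=0.9:", [round(v, 2) for v in fast])
```

The high learning rate tracks the latest outcome closely, including the single trial without feedback. The low rate changes the expectation only gradually and barely reacts to that one trial.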
What happens when you have a bad experience?
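A bad experience somewhere (a negative outcome) lowers the expected value of that place or action, which makes you more likely to stay away from it.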
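In this minimal sketch, a bad experience lowers the value of one of two places. It assumes two hypothetical places, "cafe_a" and "cafe_b", with equal starting values, a bad experience coded as R = -1, and a learning rate of 0.5. The names and numbers are illustrative.

```python
values = {"cafe_a": 0.5, "cafe_b": 0.5}  # hypothetical expected outcomes per place
alpha = 0.5                               # hypothetical learning rate

# A bad experience at cafe_a: negative outcome R = -1.
r = -1.0
pe = r - values["cafe_a"]                 # prediction error = -1.5
values["cafe_a"] += alpha * pe            # 0.5 + 0.5 * (-1.5) = -0.25

preferred = max(values, key=values.get)   # place with the highest expected outcome
print(values, "->", preferred)
```

After the bad experience, the value of cafe_a falls below that of the alternative, so a chooser guided by value now prefers cafe_b.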
Inverse temperature
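The extent to which behaviour is guided by value differences between options. The higher an option's value, the better that option; the higher the inverse temperature, the more strongly choices follow these value differences.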
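Inverse temperature is commonly used as the parameter of a softmax choice rule, which turns values into choice probabilities. The cards do not name a choice rule, so softmax is an assumption here. The sketch also assumes two hypothetical options with values 0.8 and 0.2 and an inverse temperature β = 5 (β is a common symbol for it).

```python
import math


def softmax_choice_probs(values, beta):
    """Softmax: P(option i) = exp(beta * Q_i) / sum over j of exp(beta * Q_j)."""
    m = max(values)  # subtracting the max avoids overflow and leaves probabilities unchanged
    weights = [math.exp(beta * (q - m)) for q in values]
    total = sum(weights)
    return [w / total for w in weights]


probs = softmax_choice_probs([0.8, 0.2], beta=5.0)
print([round(p, 3) for p in probs])
```

The higher-valued option receives the larger choice probability (roughly 0.95 against 0.05 here). The lower-valued option is still chosen occasionally.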
What happens when the values of the options are close to each other?
Value differences matter more to some people than others, but in general, the smaller the value differences are, the more indifferent you are likely to be between the options.
What happens when the value difference between the options is large?
When the value difference is large, the choice is easier to make than when the difference is close to 0. However, people vary in how consistently they choose according to what they value. When the inverse temperature is high, you will almost always choose the option you prefer most.
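The sketch below shows how the size of the value difference changes choice. It assumes a two-option softmax rule (a choice rule the cards do not specify), a hypothetical fixed β = 5, and invented option values. Close values give a near-indifferent choice, while a large difference makes one option clearly preferred.

```python
import math


def p_choose_first(q1, q2, beta):
    """Two-option softmax: probability of choosing option 1 (logistic in the value difference)."""
    return 1.0 / (1.0 + math.exp(-beta * (q1 - q2)))


beta = 5.0                                # hypothetical inverse temperature
close = p_choose_first(0.52, 0.48, beta)  # value difference 0.04
far = p_choose_first(0.9, 0.1, beta)      # value difference 0.8
print(f"close values: {close:.2f}, large difference: {far:.2f}")
```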
What happens when the inverse temperature is lower?
It means that the behaviour is more random.
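The sketch below shows the effect of inverse temperature itself. It assumes a two-option softmax choice rule, hypothetical option values of 0.7 and 0.3, and three illustrative β levels. With a low β, choices stay close to 50/50 (more random). With a high β, the higher-valued option is chosen almost every time.

```python
import math


def p_choose_better(value_difference, beta):
    """Two-option softmax: probability of choosing the option with the higher value."""
    return 1.0 / (1.0 + math.exp(-beta * value_difference))


diff = 0.7 - 0.3  # hypothetical value difference between the two options
for beta in (0.5, 5.0, 50.0):
    print(f"beta={beta}: P(choose better option) = {p_choose_better(diff, beta):.3f}")
```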
What is R?
Reward, also called outcome (positive or negative feedback).
What is Q?
Value/Expected reward, the expected outcome of an action
What is a?
Action, the behaviour
What is s?
State, the situation
What is $[R_t - Q(s,a)_t]$ or $[R_t - V_t]$?
Prediction error, the difference between the actual reward and the expected reward (sometimes denoted as δ).
What does a positive prediction error mean?
That the actual outcome is higher than the expected outcome.
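This small sketch reads off the sign of the prediction error (δ). The function name `interpret_prediction_error` and the example numbers are hypothetical.

```python
def interpret_prediction_error(reward, expected):
    """Return the prediction error (delta = R - Q) and what its sign means."""
    delta = reward - expected
    if delta > 0:
        meaning = "better than expected"
    elif delta < 0:
        meaning = "worse than expected"
    else:
        meaning = "as expected"
    return delta, meaning


print(interpret_prediction_error(1.0, 0.25))  # positive prediction error
print(interpret_prediction_error(0.0, 0.25))  # negative prediction error
print(interpret_prediction_error(0.5, 0.5))   # zero prediction error
```

With a positive learning rate, a positive δ raises the expectation and a negative δ lowers it.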