Lecture 6 Flashcards

1
Q

What is one of the main functions of the brain, and how is this achieved?

A

To gather new information and to update old information. This is done by reinforcement learning.

2
Q

The Rescorla-Wagner rule

A

This rule models learning through a prediction error (PE), the difference between the experienced outcome (R: positive social feedback or no positive feedback) and the expected outcome (V) on each trial.

The PE takes the form PE = R − V and is then used to update the expected outcome, weighted by a fixed learning rate α (alpha): Vₜ₊₁ = Vₜ + α·PE for a given trial t.
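The Rescorla-Wagner update can be sketched in a few lines of code. A minimal illustration (the function name, learning rate, and reward values are made-up examples, not from the lecture):

```python
# Minimal sketch of the Rescorla-Wagner update; all values are illustrative.
def rescorla_wagner_update(v, r, alpha=0.3):
    """Return V_{t+1} = V_t + alpha * PE, with PE = R_t - V_t."""
    pe = r - v                 # prediction error: experienced minus expected outcome
    return v + alpha * pe

# Example: start with no expectation (V = 0) and repeatedly receive reward R = 1.
v = 0.0
for trial in range(5):
    v = rescorla_wagner_update(v, r=1.0)
# The prediction error shrinks each trial, so V converges toward 1.
```

With a learning rate of 0.3, V reaches about 0.83 after five rewarded trials, because each update closes 30% of the remaining gap between expectation and outcome.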

3
Q

Which forms can reinforcement learning have?

A

Classical and operant conditioning

4
Q

Pavlovian fear conditioning

A
  • Classical conditioning
  • Stimulus - outcome association
  • This process is not dependent on behaviour
  • Behavioural response is ‘innate’ (reflexive)
5
Q

Instrumental conditioning

A
  • Operant conditioning
  • Stimulus - action (behaviour) - outcome association
  • Dependent on behaviour
  • Reinforced behaviour is voluntary
6
Q

Probabilistic outcomes

A

The same action (behaviour) does not always lead to the same outcome.

7
Q

Reinforcement learning concepts: reward, value, state and action

A
  • Reward (R) = experienced outcome
  • Value (Q) = expectation of the outcome
  • State (s) = current world state, for example if something is present or not
  • Action (a) = choice behaviour
8
Q

How do we use reinforcement learning in our daily life?

A

We use reinforcement learning to update expectations over the course of multiple experiences.

9
Q

Prediction error

A

The difference between the experienced outcome R and your expectations Q(s, a).

Prediction error = R - Q(s, a)

10
Q

What is the formula for updated expectations for the next timepoint?

A

Q(s,a)ₜ₊₁ = Q(s,a)ₜ + [Rₜ − Q(s,a)ₜ]

or

Vₜ₊₁ = Vₜ + [Rₜ − Vₜ]

11
Q

What affects how much a prediction error updates your expectations?

A

Learning rate.

Some people learn faster than others. Learning speed is captured by a learning rate α (alpha), which weights how strongly the prediction error changes the value:

Q(s,a)ₜ₊₁ = Q(s,a)ₜ + α·[Rₜ − Q(s,a)ₜ]

or

Vₜ₊₁ = Vₜ + α·[Rₜ − Vₜ]
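To see what the learning rate does, one can run the same outcome sequence through a slow and a fast learner. A small sketch (the reward sequence and both α values are made up for illustration):

```python
# Compare a low vs a high learning rate on the same hypothetical outcomes.
def q_update(q, r, alpha):
    """One step of Q_{t+1} = Q_t + alpha * (R_t - Q_t)."""
    return q + alpha * (r - q)

rewards = [1, 1, 0, 1, 1]        # hypothetical sequence of outcomes
q_slow, q_fast = 0.0, 0.0
for r in rewards:
    q_slow = q_update(q_slow, r, alpha=0.1)   # slow learner: small updates
    q_fast = q_update(q_fast, r, alpha=0.9)   # fast learner: large updates
# q_fast swings strongly with every outcome; q_slow changes gradually.
```

After these five trials the fast learner sits near the most recent outcomes, while the slow learner's estimate still reflects its starting point.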

12
Q

What happens when there is a larger learning rate?

A

A larger learning rate makes each value update larger, which means you revise your expectations more after every outcome.

13
Q

What happens when you have a bad experience?

A

A bad experience somewhere lowers your expectation of it through a negative prediction error, which makes you stay away.

14
Q

Inverse temperature

A

The extent to which behaviour is guided by value differences: the higher the inverse temperature, the more consistently you choose the option with the higher value.

15
Q

What happens when the option values are close to each other?

A

Value differences matter more to some people than to others, but in general: the smaller the value difference, the more indifferent you are between the options.

16
Q

What happens when the value difference is high?

A

When the value difference is large, the choice is easier to make than when the difference is close to 0. People do vary in how consistently their choices follow their values, but when the inverse temperature is high, you will almost always choose the option you value most.

17
Q

What happens when the inverse temperature is lower?

A

It means that behaviour is more random, i.e. less guided by value differences.
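The inverse temperature is typically implemented as the β parameter of a softmax choice rule. A minimal two-option sketch (the Q-values and β values here are made up):

```python
import math

def p_choose_a(q_a, q_b, beta):
    """Softmax probability of choosing A over B:
    P(A) = exp(beta*Q_A) / (exp(beta*Q_A) + exp(beta*Q_B))."""
    return 1.0 / (1.0 + math.exp(-beta * (q_a - q_b)))

# Same value difference, different inverse temperatures:
p_low  = p_choose_a(0.8, 0.6, beta=1.0)    # low beta: close to random (~0.55)
p_high = p_choose_a(0.8, 0.6, beta=20.0)   # high beta: nearly always A (~0.98)
```

With β near 0 choices approach 50/50 regardless of the values; as β grows, the same small value difference is amplified into near-deterministic choice.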

18
Q

What is R?

A

Reward, also called outcome (positive or negative feedback).

19
Q

What is Q?

A

Value/Expected reward, the expected outcome of an action

20
Q

What is a?

A

Action, the behaviour

21
Q

What is s?

A

State, the situation

22
Q

What is [Rₜ − Q(s,a)ₜ] or [Rₜ − Vₜ]?

A

Prediction error, the difference between the actual reward and the expected reward (sometimes denoted as δ).

23
Q

What does a positive prediction error mean?

A

That the actual outcome is higher than the expected outcome.

24
Q

What does a negative prediction error mean?

A

That the actual outcome is lower than the expected outcome.

25
Q

What is α (alpha)?

A

Learning rate, the speed of learning.

26
Q

What is Qt+1?

A

Updated value at the next time point.

27
Q

What is Q(s,a)ₜ₊₁ = Q(s,a)ₜ + α·[Rₜ − Q(s,a)ₜ] or Vₜ₊₁ = Vₜ + α·[Rₜ − Vₜ]?

A

The current expected reward plus current prediction error weighted by the learning rate.

28
Q

What is inverse temperature?

A

A parameter capturing decision noise: the higher the inverse temperature, the less noisy (more value-driven) the choices are.

29
Q

Do people have different learning rates for positive vs negative prediction errors?

A

Yes; some people learn faster from bad experiences than from positive ones.

30
Q

What is reflected in brain function during learning?

A

Reinforcement learning algorithms

31
Q

Joe goes to his usual ice cream store and orders his favorite flavor, blueberry. He did not know the store had switched owners, and the blueberry ice cream is worse than he expected. What is the difference between his expectation and the actual experience called?

A reward
B value
C positive prediction error
D negative prediction error

A

D negative prediction error

32
Q

Despite his initial disappointment, Joe keeps ordering the disappointing blueberry ice cream on his next few visits and keeps being let down. What could explain Joe’s behavior and his repeated disappointment?

A a slow learning rate for negative prediction errors
B a fast learning rate for negative prediction errors
C a slow learning rate for positive prediction errors
D a fast learning rate for positive prediction errors

A

A a slow learning rate for negative prediction errors

33
Q

Tom participates in a bet with friends. He expects to win €100 but he actually wins €20. Please describe the action, the reward, the value, and the prediction error.

A
  • Action: Participating in the bet with friends.
  • Reward: The actual amount Tom wins, which is €20.
  • Value: The expected amount Tom anticipated winning, which is €100.
  • Prediction error: The difference between the actual reward and the expected value. In this case, it is €20 − €100 = −€80, indicating a negative prediction error.
34
Q

A classic reinforcement learning algorithm

A

A classic reinforcement learning algorithm assumes that individuals learn by incrementally updating their estimates of the value of taking different actions in different states. The extent to which an individual updates her value estimate at each time point is governed by her surprise — the difference between the reward she receives by taking a specific action, and her estimate of the amount of reward she thought she would receive. This difference, the reward prediction error, is then scaled by her learning rate and added to her prior estimate. Formally, this process can be expressed as:
Q(s,a)ₜ₊₁ = Q(s,a)ₜ + α·[rₜ − Q(s,a)ₜ] or Vₜ₊₁ = Vₜ + α·[rₜ − Vₜ]
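Combining this update rule with a softmax choice rule gives a complete learner for a simple two-option task with probabilistic outcomes. A minimal simulation sketch (the reward probabilities, α, β, and trial count are all made-up values, not from the lecture):

```python
import math
import random

def p_choose_first(q, beta):
    """Softmax probability of choosing option 0 over option 1."""
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))

def run_agent(p_reward=(0.8, 0.2), alpha=0.1, beta=5.0, n_trials=500, seed=0):
    """Simulate a learner on a two-option task with probabilistic outcomes."""
    rng = random.Random(seed)
    q = [0.0, 0.0]                              # value estimate per option
    for _ in range(n_trials):
        a = 0 if rng.random() < p_choose_first(q, beta) else 1
        r = 1.0 if rng.random() < p_reward[a] else 0.0   # probabilistic outcome
        q[a] += alpha * (r - q[a])              # prediction-error update
    return q

q = run_agent()
# q[0] typically ends near the richer option's reward rate (0.8).
```

Because outcomes are probabilistic, the estimates keep fluctuating around the true reward rates rather than settling exactly on them.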

35
Q

What signals prediction errors?

A

Dopamine cells in the brain, located in the Ventral Tegmental Area (VTA).

36
Q

Ventral Tegmental Area (VTA)

A

A very small region, close to the brain stem. It contains the cell bodies of dopamine neurons.

37
Q

How do dopamine cells respond to the presence of unexpected reward?

A

The cells respond: more dopamine cells fire right after the unexpected reward than before it.

38
Q

How do dopamine cells respond to the predictive cue and to the reward after conditioning?

A

The cells respond at the time the conditioned cue is shown, not at the time the actual reward is given. When the reward is delivered, dopamine levels change little relative to baseline because the reward is expected.

39
Q

How do dopamine cells respond to the absence of an expected reward?

A

When the reward is not given, the dopamine cells respond to the absence of the expected reward with a pause in firing (a dip below baseline). This also suggests that animals respond, to some extent, to negative prediction errors.

40
Q

What are the limitations of studies on the VTA?

A
  • VTA cells are too small to measure with fMRI, and they are very deeply located in the brain.
  • The invasive measures that are needed cannot be used on humans.
  • Animal studies can use invasive cell recordings, which provide important insights into these cells.
41
Q

The striatum

A

A larger region crucial for learning from feedback. It can be divided into two parts: the ventral and the dorsal striatum.

42
Q

What does the ventral striatum respond to?

A
  • Unexpected reward magnitude.
  • The reward-predictive stimulus.
  • Greater reactivity when a reward is different than expected (prediction errors).
  • Before learning, the ventral striatum responds at the time of the reward.
  • After learning, the ventral striatum responds at the time of the reward-predictive stimulus. So, not the timing of the reward but the timing of the stimulus.
  • Pavlovian conditioning.
43
Q

What does the dorsal striatum respond to?

A
  • Responds to reward but especially early in learning and when there is a mapping between actions and outcomes.
  • Instrumental conditioning.
44
Q

Which part of the striatum is implicated in learning from reward?

A

Both the dorsal and ventral striatum are implicated in learning from reward.

  • Dorsal striatum is more strongly implicated in instrumental conditioning.
  • Ventral striatum is more strongly implicated in Pavlovian conditioning.
45
Q

The Orbitofrontal cortex (OFC)

A
  • It is a part of the medial prefrontal cortex (mPFC).
  • The OFC processes the outcome value.
  • The OFC responds when processing outcomes, such as reward magnitude at the time we experience a reward.
46
Q

What is the difference between what the ventral striatum and the OFC respond to?

A

The ventral striatum responds to the reward-predicting stimulus and not to the outcome itself.

So, the ventral striatum responds to the stimulus in anticipation of the reward, while the OFC responds to the experienced outcome.

47
Q

Pavlovian conditioning

A

Learning a stimulus-outcome association.

48
Q

Instrumental conditioning

A

Learning a stimulus-action-outcome association.

49
Q

Neural representation of concepts in reinforcement learning: reward, value, prediction error and learning speed

A
  • Reward (also outcome or feedback) - mPFC, often OFC.
  • Value - VS after learning, updated via the mPFC.
  • Prediction error - VS (and DS for early stage of learning and for action values).
  • Learning speed - affects prediction error magnitude and value update.
50
Q

What can reinforcement learning help us understand?

A

Brain function at a mechanistic level

51
Q

Van den Bos et al. (2010) on positive and negative reinforcement learning in the brain of children, adolescents and adults

A

Children have higher negative learning rates than adolescents, and adults have the lowest. This means that children learn faster from negative prediction errors than adolescents, who in turn learn faster from them than adults.

Adults learn faster from positive prediction errors than children and adolescents do; in children and adolescents, the positive learning rate is about equal.
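Separate learning rates for positive and negative prediction errors can be modelled by picking α based on the sign of the PE. A minimal sketch (the rates and values are made-up illustrations, not parameters from Van den Bos et al.):

```python
# Value update with separate learning rates for positive and negative PEs.
def dual_rate_update(v, r, alpha_pos, alpha_neg):
    """Apply V + alpha * PE, where alpha depends on the sign of PE = R - V."""
    pe = r - v
    alpha = alpha_pos if pe > 0 else alpha_neg
    return v + alpha * pe

# Hypothetical 'child-like' profile: faster learning from negative PEs.
v = dual_rate_update(0.5, r=0.0, alpha_pos=0.2, alpha_neg=0.6)   # negative PE
# The negative outcome pulls the value down strongly (0.5 -> 0.2).
```

Fitting alpha_pos and alpha_neg separately per person is how asymmetric learning from good versus bad outcomes is usually quantified.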

52
Q

Van den Bos et al. (2012) on functional connectivity

A

They found that functional connectivity between VS (which tracks prediction errors) and mPFC (which tracks the outcomes) is higher for a positive prediction error than for a negative prediction error. The relative strength becomes stronger with age, which means that there is an increase of learning from positive prediction errors compared to negative prediction errors. The connectivity is negatively correlated with the negative learning rate.

53
Q

What is a limitation of Van den Bos et al. (2012)?

A

The findings are inconsistent; not every paper finds this effect.

54
Q

True or false: learning rates are not static traits

A

True

55
Q

What is the ‘best’ learning rate?

A

It depends on the environment.

56
Q

What does it mean to have a really high learning rate?

A

That you update your expectations quickly. This is not always a good thing because outcomes can be different across situations.

57
Q

How can you become better at fine-tuning your learning rate to the environment?

A

Learning rates become better tuned to the environment throughout development (Nussenbaum & Hartley).

58
Q

Summary

A
  • Reinforcement learning helps us understand and approximate brain function.
  • Different neural responses exist for components of reinforcement learning.
  • Behavioral findings can help us understand the development of brain function during learning.