Dopamine, reward and reinforcement Flashcards by I M

What is dopamine?

· Dopamine is a neurotransmitter found throughout the brain.
· Important for learning, cognition and behaviour
· Implicated in neurological and psychiatric disorders, e.g. Parkinson’s disease (death of dopamine neurons in the midbrain)

What is Pavlovian conditioning?

· Classical (Pavlovian) conditioning
· Learning phase: CS reward predictor (bell) + US reward (food) -> UR response to reward (salivating)
· When the association has been learned: CS reward predictor (bell) alone -> CR response to the reward predictor (salivating)

What is operant conditioning?

· Consequences (e.g. getting reward) of a particular behaviour (e.g. pressing a lever) increase or decrease the probability of the behaviour occurring again.
· Rewarded -> INCREASE probability of the action being performed again
· Unrewarded -> DECREASE probability of the action being performed again
· Training: Every time the monkey presses the red button it gets food
· Learned behaviour: The monkey is more likely to press the red button
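
A rough sketch of the idea that reward raises, and no reward lowers, the probability of repeating an action. The update rule and the `step` size are assumptions chosen for illustration (a simple linear update), not something taken from the lecture material.

```python
def update_press_probability(p, rewarded, step=0.5):
    """Move the probability of pressing towards 1 after reward, towards 0 otherwise."""
    if rewarded:
        return p + step * (1.0 - p)   # rewarded -> INCREASE probability
    return p - step * p               # unrewarded -> DECREASE probability


p_press = 0.5                          # hypothetical starting probability
training = []
for _ in range(3):                     # three rewarded button presses
    p_press = update_press_probability(p_press, rewarded=True)
    training.append(p_press)

# One unrewarded press after training lowers the probability again.
extinction = update_press_probability(p_press, rewarded=False)

print(training)    # [0.75, 0.875, 0.9375]
print(extinction)  # 0.46875
```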

Can behavioural functions act as rewards?

As we have seen, rewards can act as positive reinforcers by increasing the frequency and intensity of behaviour that leads to acquiring them and decreasing the frequency of behaviour that does not lead to reward.

What are negative reinforcers?

· These decrease the frequency of behaviour leading to their encounter and increase the frequency of behaviour leading to their avoidance. The behaviour is strengthened because it removes or avoids something aversive
· IMPORTANT: NEGATIVE REINFORCEMENT IS NOT PUNISHMENT.
· Punishment = immediate repercussions of a behaviour (you didn’t tidy up; your housemate punishes you)
· Neg. reinforcement = strengthening behaviour to avoid repercussions (you tidied up; your housemate doesn’t punish you)

What is a prediction error?

· An expectation is built up from previous experience; a prediction error occurs when the outcome is not what you were expecting
· Errors in the prediction of rewards signal the inappropriate (or appropriate) nature of the actions performed to obtain them.
· Reward Prediction Errors (RPEs) can be
- Positive, in response to unexpected rewards (better than you anticipated)
- Negative, in response to the absence of a predicted reward (worse than you anticipated)
· In both cases, the outcomes are unexpected and are important for driving learning
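
A minimal sketch of the definition above, treating the RPE as "reward received minus reward expected". The function name and the reward values (0 and 1) are illustrative assumptions.

```python
def reward_prediction_error(received, expected):
    """RPE = what you got minus what you expected."""
    return received - expected


# Unexpected reward: better than anticipated -> positive RPE
unexpected_reward = reward_prediction_error(received=1.0, expected=0.0)

# Predicted reward is absent: worse than anticipated -> negative RPE
omitted_reward = reward_prediction_error(received=0.0, expected=1.0)

# Fully predicted reward: no surprise -> RPE of zero, nothing new to learn
fully_predicted = reward_prediction_error(received=1.0, expected=1.0)

print(unexpected_reward, omitted_reward, fully_predicted)  # 1.0 -1.0 0.0
```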

What is an example of a prediction error?

· Yes: Got food – learning takes place, positive prediction error (reward is bigger than expected)
· No: Didn’t get food – learning takes place, negative prediction error
· However, if the monkey keeps seeing the green light and then always gets a reward, the reward is fully predicted, and the monkey is no longer learning

What is a learning curve?

· On the first trial, when the monkey gets a reward, a lot of learning takes place
· Less learning takes place as trials go on, updating from previous experiences
· Eventually the monkey has learned the relationship between the button and the reward, and learning stops taking place
· Learning curves can have different shapes

What is the Rescorla-Wagner model?

· This model explains many forms of classical conditioning very reliably
· V_CS(n+1) = V_CS(n) + α (R – V_CS(n))
· V_CS(n+1) = value of the conditioned stimulus on the next trial; V_CS(n) = its value on the current trial; R = the reward actually received
· α (alpha) = learning rate: if the learning rate is high the monkey learns the association quickly, if it is low, learning is slower
· (R – V_CS(n)) represents the prediction error, the difference between the actual and expected outcome (i.e. how surprising the presence of the US is)
· A large error on trial n results in a large amount of learning (a large change in value for trial n+1)
· A smaller error on trial n results in a smaller amount of learning for trial n+1
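
The update rule above can be sketched in a few lines. This assumes a single CS, a reward of 1 on every trial and a learning rate of 0.5; these numbers are illustrative and not from the card. It also shows the learning-curve shape: big changes early, smaller changes later.

```python
def rescorla_wagner(rewards, alpha=0.5, v0=0.0):
    """Apply V(n+1) = V(n) + alpha * (R - V(n)) across a list of trial rewards."""
    values = [v0]       # value of the CS before each trial
    errors = []         # prediction error on each trial
    for r in rewards:
        v = values[-1]
        delta = r - v                      # prediction error (R - V_CS(n))
        errors.append(delta)
        values.append(v + alpha * delta)   # value carried into the next trial
    return values, errors


values, errors = rescorla_wagner([1.0] * 5, alpha=0.5)
print(values)  # [0.0, 0.5, 0.75, 0.875, 0.9375, 0.96875]
print(errors)  # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

The prediction error halves on every trial here, so the value climbs steeply at first and then flattens as the reward becomes predicted. A smaller `alpha` gives a slower, shallower curve.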

What are dopamine neurons?

· Dopamine is a type of neurotransmitter (a catecholamine, which is a subgroup of the monoamines)
· About a million nerve cells in the human brain contain dopamine (DA).
· Dopamine neurons release dopamine onto cells that have dopamine receptors (two families: D1-like (D1/D5) and D2-like (D2/D3/D4)) via several dopaminergic pathways.

What are the dopamine pathways in the brain?

· Originate in the midbrain
· There are two key dopamine pathways, both starting in the midbrain
· Reward processing – mesolimbocortical pathway: ventral tegmental area to nucleus accumbens, cortex and hippocampus (midbrain to limbic system and cortex)
· Motor control (Parkinson’s disease) – mesostriatal pathway (other half of the midbrain): substantia nigra to striatum (midbrain to striatum)

Dopamine neurons and RPEs – The classical view?

· The midbrain, particularly the ventral tegmental area (VTA), contains a large proportion of dopamine neurons
· These neurons project to many other regions in the brain. The striatum and anterior cingulate cortex are two such areas.
· Dopamine neurons respond phasically to the presentation of rewards (a large burst of firing, with many neurons firing at the same time, following the event).
· Tonic firing is slower, ongoing background activity
· Their activity is best explained in terms of learning about events that predict rewards.
· Three types of reward-related neuronal activity
- The presentation of an unexpected/more than expected reward (activation)
- Stimuli that predict rewards (activation)
- The failure of an expected reward (inhibition)

How do dopamine neurons respond to reward?

· Increase in neuronal firing when a reward is given
· Monkey has to reach into a port to get an apple: reward (apple is there) elicits a neuronal response, whereas no reward (no apple) does not
· Unpredicted food rewards elicit activity in dopamine neurons
· This neuronal activity is specific to rewarding stimuli

Prediction errors in dopamine neurons?

· Monkeys learned a task using apple juice for rewards – the more that has been learned, the less activity the reward elicits. Dopamine neurons are no longer activated by the reward once the association has been learnt
· Monkey makes the association that pressing a lever results in a reward; there is a negative prediction error when the lever is pressed and no reward is received
· Before the task, dopamine neurons responded to unpredicted rewards
· During learning, the reward became increasingly predictable, and neuronal activity gradually decreased to baseline levels
· There was no phasic activity when fully predictable rewards were delivered.
· When expected rewards were omitted, there was a phasic decrease in activity at the time that the reward was expected

Reward prediction in dopamine neurons?

· Schultz et al. (1997)
· Electrophysiological response
· During learning the dopamine response switched from the reward (US) to the reward predictor (CS)
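
The shift of the response from the reward to the reward predictor is often illustrated with a temporal-difference (TD) learning model. The sketch below is a simplified illustration under stated assumptions, not the analysis from the paper: each trial is split into 10 time steps, the cue arrives at step 2 and the reward at step 7, and steps before the cue are assumed to carry no prediction. All names and parameter values are hypothetical.

```python
def run_trial(values, cue_step, reward_step, alpha=0.2, gamma=1.0,
              reward=1.0, learn=True):
    """Run one trial; return the prediction error arriving at each time step."""
    errors = [0.0] * len(values)
    for t in range(len(values) - 1):
        r = reward if t + 1 == reward_step else 0.0
        # TD error: reward now + prediction at next step - prediction at this step
        delta = r + gamma * values[t + 1] - values[t]
        errors[t + 1] = delta
        # Assumption: steps before the cue carry no prediction (the cue arrives
        # unpredictably), so their values are never updated and stay at 0.
        if learn and t >= cue_step:
            values[t] += alpha * delta
    return errors


N_STEPS, CUE, REWARD = 10, 2, 7
values = [0.0] * N_STEPS

first_trial = run_trial(values, CUE, REWARD)         # before learning
for _ in range(500):                                  # training
    trained_trial = run_trial(values, CUE, REWARD)
# Probe trial: the reward is left out, and no learning is applied
omitted_trial = run_trial(values, CUE, REWARD, reward=0.0, learn=False)

print(first_trial[CUE], first_trial[REWARD])
print(round(trained_trial[CUE], 3), round(trained_trial[REWARD], 3))
print(round(omitted_trial[REWARD], 3))
```

On the first trial the error appears at the reward; after training it appears at the cue and is close to zero at the reward; when the reward is omitted the error at the expected reward time is negative, mirroring the phasic decrease in firing.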

The VTA in humans

What about in the human VTA?
D’Ardenne et al. (2008) – Subjects played a game whilst undergoing an fMRI scan.
Task: Guess whether the number on the right of the screen would be greater or less than the number on the left (max 10).
They won $1 if they guessed correctly and lost $1 if they guessed incorrectly.
They found that VTA activity increased when subjects unexpectedly won the $1, and the increase was greater as the probability of reward decreased
No response for negative prediction errors.
Still a hotly debated topic (see Duzel et al., 2009)
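
A worked example of why a win is more surprising when it is less likely. This is only an arithmetic illustration using the +$1 / -$1 payoffs from the task; the win probabilities are made up, and this is not necessarily how the study itself modelled the prediction error.

```python
def expected_value(p_win, win=1.0, loss=-1.0):
    """Average payoff expected before the outcome is revealed."""
    return p_win * win + (1.0 - p_win) * loss


def rpe_on_win(p_win, win=1.0, loss=-1.0):
    """Positive prediction error when the subject wins: outcome minus expectation."""
    return win - expected_value(p_win, win, loss)


# Hypothetical win probabilities, from likely to unlikely
rpes = {p: rpe_on_win(p) for p in (0.75, 0.5, 0.25)}
print(rpes)  # {0.75: 0.5, 0.5: 1.0, 0.25: 1.5}
```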

Evidence against the classical view of dopamine neurons?

· New data suggests dopamine isn’t just crucial for reward learning and doesn’t just signal RPEs
· Some studies have extended the hypothesis to include other influences being integrated with reward prediction, such as goal-directed movement (Syed et al., 2016).
· Other studies have shown that dopamine neurons signal during cognitive tasks, which doesn’t fit with the classical view (Matsumoto & Takada, 2013)
· Further studies have shown a role for dopamine in effort-based decision-making (see Phillips, Walton et al., 2007)

VTA dopamine neurons signal cognitive motivation (WM task)?

Delayed match-to-sample task (Matsumoto & Takada, 2013). Stages of each trial:
- Reward-predicting stimuli (different cues mean different rewards, e.g. blue or red dot)
- Target stimulus (classical view – dopamine neurons should not fire here)
- Find the target (classical view – dopamine neurons should not fire)
- Outcome

Dopamine neurons signal for a target stimulus?

Using electrophysiology to record the activity of dopamine neurons
There is a large increase in activity in response to the sample
Task difficulty: larger response when the easy search array is presented, vs the medium or hard search array

Further areas involved in learning?

· Expectation of future rewards
· A learnt reward predictor stimulus induces a state of expectation
· A neuronal correlate of this expectation of reward might be the sustained neuronal activity that follows the presentation of the reward predictor stimulus
· Such “reward-expectation” neurons are found in the monkey and rat striatum.

The Striatum?

· A major target of dopamine neurons in the brain
· The striatum is a part of the basal ganglia
· Can be divided into dorsal and ventral striatum.
· Dorsal: two adjacent but anatomically separated groups of neurons:
– The caudate
– The putamen
· Ventral:
– Nucleus accumbens
– Olfactory tubercle – very small, tends to be ignored

The nucleus accumbens – dopamine dependent RPEs?

· Pessiglione et al. (2006) – Choice between two abstract stimuli associated with different probabilities of winning £1
· Participants were given either L-Dopa (a dopamine precursor, which boosts dopamine) or haloperidol (a dopamine receptor antagonist)
· The signal in the ventral striatum was consistent with RPEs
· L-Dopa enhanced the size of the signal and its behavioural effects
· Haloperidol reduced the magnitude of the RPE signal and its behavioural effects

But how ‘pure’ is the encoding of the RPE in the nucleus accumbens?

We have previously discussed the difference between classical and instrumental conditioning:
Schultz’s study = classical conditioning
But what if animals have to do something for a reward? Does initiating an action influence dopamine release?

Action initiation influences the dopamine signal?

Syed et al. (2016)
Animals had to make or withhold an action for a reward
Go trials: the rat puts its nose in a hole (nose poke) to start the trial, then has to press a lever to trigger a reward
No-go trials (the other half of trials): a cue indicates that the rat should not move and should keep its nose in the nose poke
Classical prediction: there should be a prediction error response on both trial types; the action itself should not make much difference
Small reward on offer = smaller prediction error, and vice versa for a large reward
Positive prediction error for small reward trials
No-go trials = no dopamine release at all, even when a large reward is on offer

Anterior Cingulate Cortex (ACC)?

· Another major target of dopamine neurons is the anterior and mid portions of the cingulate cortex
· Engaged by many different processes, but also contains neurons which respond to rewarding stimuli.
· The regions shown in red are engaged during cognitive processes

RPEs in the ACC?

· Kennerley, Behrens & Wallis (2011)
· Recorded from the ACC while monkeys were making choices between two stimuli with different reward probabilities
· They found neurons that responded to positive PEs, separate neurons that responded to negative PEs and a third set of neurons that responded to both.

RPEs in the human ACC?

· Ribas-Fernandes (2011) – fMRI in humans performing a task where there is a subgoal, leading to an overall rewarding outcome.
· Participants have to move a pretend bus to a certain area on a map in the most efficient way possible
· The bus would randomly move closer to or further away from the location
· Tests for prediction errors in more abstract tasks, such as sub-goals
· They found activity in the ACC signalled prediction errors for the individual actions leading to a goal. Thus, prediction errors in the ACC may relate to the performance of individual actions that lead to a goal.

Drug rewards and dopamine?

· Studies have identified the dopamine system and the ventral striatum, including the nucleus accumbens, as some of the critical structures on which most types of drug abuse depend (review: Pierce & Kumaresan, 2006)
· We have seen these structures implicated in reward processing above
· Do drugs modify responses to natural rewards, or constitute rewards in their own right (engaging existing mechanisms)?

Dopamine self-administration?

· Rats are hooked up intravenously or intracranially to equipment that allows dopamine to be released when they make a lever press (an operant response)
· Over time rats start to repeatedly self-administer dopamine.
· Dopamine, in itself, can be rewarding

How does cocaine affect dopamine neurons?

· Cocaine works directly on dopamine neurons by blocking dopamine re-uptake and hence increasing the concentration of dopamine at the synapse
· Some neurons do indeed seem to treat drugs as rewards in their own right – neurons in the ventral striatum show phasic responses to the injection of drugs and display activity during the expectation of drugs, similar to natural rewards.
· Drugs of abuse that mimic or boost the phasic dopamine reward prediction error might generate a powerful teaching signal and might even produce behavioural changes

Synaptic transmission with and without cocaine?

1. Synaptic transmission
- Dopamine is released into the synapse, and binds with post-synaptic receptors
2. After synaptic transmission
- Without cocaine: dopamine unbinds from the receptors and re-uptake takes place by the pre-synaptic neuron
- With cocaine: the pre-synaptic re-uptake sites are blocked by cocaine
- The dopamine that cannot be taken back up re-binds to the receptors and, as dopamine continues to be released, concentration at the synapse increases, causing more receptors to be bound

Cocaine and temporal coding

· As we have seen, dopamine neurons also code temporal information (expectation of the timing of reward).
· So what happens when rats are asked to wait for a reward?
· Rats given cocaine cannot wait long enough. They are too motivated to get the reward and are expecting it so strongly that they cannot wait to press the lever.

Prediction errors beyond reward learning – learning how to be prosocial

· The same regions that are important for prosocial behaviour (making decisions to benefit others) receive projections from dopamine neurons that encode RPEs
· Reinforcement learning helps us learn what actions will benefit ourselves.
· Can we use the same principle to understand how we learn what actions will benefit others?
· Lockwood et al. (2016): participants choose whether to play for themselves, a friend or nobody, going for whatever they think will increase rewards
- Participants were most predictive of rewards for other people when their empathy was higher
- More empathetic people learn more quickly about outcomes for other people than those low in empathy
- Subgenual ACC only encodes prosocial prediction errors.
- Ventral striatum encodes all prediction errors, regardless of who the reward is for (e.g. self, other or no one)

Dopamine, reward and reinforcement Flashcards

(33 cards)