Dopamine, reward and reinforcement Flashcards
What is dopamine?
· Dopamine is a type of neurotransmitter found throughout the brain.
· Important for learning, cognition, behaviour
· Implicated in neurological and psychiatric disorders, e.g. Parkinson’s disease (death of dopamine neurons in the midbrain)
What is Pavlovian conditioning?
· Classical (Pavlovian) conditioning
· Learning phase: reward predictor (CS, bell) + reward (US, food) -> response to the reward (UR, salivating)
· Once the association has been learned: reward predictor alone (CS, bell) -> response to the reward predictor (CR, salivating)
What is operant conditioning?
· Consequences (e.g. getting reward) of a particular behaviour (e.g. pressing a lever) increase or decrease the probability of the behaviour occurring again.
· Rewarded -> INCREASE probability of the action being performed again
· Unrewarded -> DECREASE probability of the action being performed again
· Training: Every time the monkey presses the red button it gets food
· Learned behaviour: The monkey is more likely to press the red button
Can behavioural functions act as rewards?
As we have seen, rewards can act as positive reinforcers by increasing the frequency and intensity of behaviour that leads to acquiring them and decreasing the frequency of behaviour that does not lead to reward.
What are negative reinforcers?
· These decrease the frequency of behaviour leading to their encounter and increase the frequency of behaviour leading to their avoidance. The behaviour is strengthened because it takes away the negative stimulus
· IMPORTANT: NEGATIVE REINFORCEMENT IS NOT PUNISHMENT.
· Punishment = immediate repercussions of a behaviour (you didn’t tidy up; housemate punishes you), different from negative reinforcement
· Negative reinforcement = strengthening a behaviour in order to avoid repercussions (you tidied up; your housemate doesn’t punish you)
What is a prediction error?
· You build up an expectation based on previous experience, and the outcome is not what you were expecting
· Errors in the prediction of rewards signal the inappropriate (or appropriate) nature of the actions performed to obtain them.
· Reward Prediction Errors (RPEs) can be
- Positive, in response to unexpected rewards, better than you anticipated
- Negative, in response to the absence of a predicted reward, worse than you anticipated
In both cases, the outcomes are unexpected and are important for driving learning
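The sign of a reward prediction error can be illustrated with a minimal sketch. It assumes rewards and expectations are expressed as simple numbers (1.0 = reward, 0.0 = no reward); the function names `reward_prediction_error` and `classify_rpe` are made up for this illustration.

```python
def reward_prediction_error(actual_reward, expected_reward):
    """Return the RPE: actual outcome minus expected outcome."""
    return actual_reward - expected_reward


def classify_rpe(rpe, tolerance=1e-9):
    """Label an RPE as positive, negative, or zero (fully predicted)."""
    if rpe > tolerance:
        return "positive"
    if rpe < -tolerance:
        return "negative"
    return "zero"


# Unexpected reward: nothing was expected, but a reward arrived.
print(classify_rpe(reward_prediction_error(1.0, 0.0)))
# Omitted reward: a reward was expected, but nothing arrived.
print(classify_rpe(reward_prediction_error(0.0, 1.0)))
# Fully predicted reward: no surprise, so nothing drives learning.
print(classify_rpe(reward_prediction_error(1.0, 1.0)))
```

The first two cases correspond to the positive and negative RPEs on this card; the third is the case where the outcome matches the expectation.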
What is an example of a prediction error?
· Yes: Got food – learning takes place, reward is bigger than expected (positive prediction error)
· No: Didn’t get food – learning takes place, reward is smaller than expected (negative prediction error)
However, if the monkey keeps seeing the green light and then always getting a reward, the reward is fully predicted, and the monkey is no longer learning
What is a learning curve?
· On the first trial, when the monkey gets a reward, a lot of learning takes place
· Less learning takes place as trials go on, updating from previous experiences
· Eventually the monkey learns the relationship between the button and the reward, and learning stops taking place
Different shaped learning curves
What is the Rescorla-Wagner model?
· This model explains the various forms of classical conditioning very reliably
· V_CS(n+1) = V_CS(n) + α (R − V_CS(n))
· V_CS(n+1) = value of the conditioned stimulus (CS) on the next trial; V_CS(n) = its value on the current trial; R = the actual outcome (reward) on the current trial
· α (alpha) = learning rate: if it is high the monkey learns the association quickly, if it is low, learning is slower
· (R − V_CS(n)) is the prediction error, the difference between the actual and expected outcome (i.e. how surprising the presence of the US is)
· A large error on one trial results in a large change in value for the next trial
· A smaller error on one trial results in a smaller change in value for the next trial
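The update rule on this card can be run trial by trial to see how the learning rate shapes the learning curve. This is a minimal sketch, assuming a reward of 1.0 on every trial and a starting value of 0; the function name `rescorla_wagner` and the learning rates 0.1 and 0.5 are chosen only for illustration.

```python
def rescorla_wagner(rewards, alpha, v0=0.0):
    """Apply the Rescorla-Wagner update once per trial.

    Returns (values, errors):
      values[n] is the value of the CS going into trial n
                (the final entry is the value after the last trial),
      errors[n] is the prediction error on trial n, R - V_CS(n).
    """
    values = [v0]
    errors = []
    for reward in rewards:
        error = reward - values[-1]                # prediction error
        errors.append(error)
        values.append(values[-1] + alpha * error)  # value for the next trial
    return values, errors


rewards = [1.0] * 10  # assumed: the reward is delivered on all 10 trials
for alpha in (0.1, 0.5):
    values, errors = rescorla_wagner(rewards, alpha)
    print(f"alpha={alpha}: value after 10 trials = {values[-1]:.3f}, "
          f"first error = {errors[0]:.3f}, last error = {errors[-1]:.3f}")
```

With the higher learning rate the value approaches the reward in fewer trials, and in both cases the error shrinks from one trial to the next, which is why the learning curve flattens.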
What are dopamine neurons?
· Dopamine is a type of neurotransmitter (a catecholamine, which is a subgroup of the monoamines)
· About a million nerve cells in the human brain contain dopamine (DA).
· Dopamine neurons release dopamine, via several dopaminergic pathways, onto cells that have dopamine receptors (two families: D1-like (D1/D5) and D2-like (D2/D3/D4))
What are the dopamine pathways in the brain?
· Two key dopamine pathways originate in the midbrain:
· Reward processing – mesolimbocortical pathway: ventral tegmental area to nucleus accumbens, cortex and hippocampus (midbrain to limbic system and cortex)
· Motor control (Parkinson’s disease) – mesostriatal pathway (other half of the midbrain): substantia nigra to striatum (midbrain to striatum)
Dopamine neurons and RPEs – The classical view?
· The midbrain, particularly the ventral tegmental area (VTA), contains a large proportion of dopamine neurons
· These neurons project to many other regions in the brain. The striatum and anterior cingulate cortex are two such areas.
· Dopamine neurons respond phasically (large burst, at the same time) to the presentation of rewards (burst of firing following event).
· Tonic firing, by contrast, is slower background firing
· Their activity is best explained in terms of learning about events that predict rewards.
· Three types of reward-related neuronal activity:
- The presentation of an unexpected/more than expected reward (activation)
- Stimuli that predict rewards (activation)
- The failure of an expected reward to occur (inhibition)
How do Dopamine neurons respond to reward?
· Increase in neuronal firing when a reward is given
· The monkey has to reach into a port to get an apple: reward (apple is there) elicits a neuronal response, whereas no reward (no apple) does not
· Unpredicted food rewards elicit activity in dopamine neurons
This neuronal activity is specific to rewarding stimuli
Prediction errors in dopamine neurons?
· Monkeys learned a task using apple juice as the reward – as learning progressed, the reward elicited less activity. Dopamine neurons are no longer activated by the reward once the association has been learnt
· Once the monkey has made the association that pressing a lever results in a reward, there is a negative prediction error when the lever is pressed and no reward is received
· Before the task, dopamine neurons responded to unpredicted rewards
· During learning, the reward became increasingly predictable, and neuronal activity gradually decreased to baseline levels
· There was no phasic activity when fully predictable rewards were delivered.
When expected rewards were omitted, there was a phasic decrease in activity at the time that the reward was expected
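The pattern on this card (large response to unpredicted rewards, no response to fully predicted rewards, a dip when an expected reward is omitted) matches the prediction error at the time of reward. The sketch below is a simplified illustration, not a model of the recorded neurons: it assumes a Rescorla-Wagner style update, a hypothetical learning rate of 0.3, and 30 rewarded trials followed by one omission trial.

```python
ALPHA = 0.3  # hypothetical learning rate, chosen for illustration


def run_trials(value, outcomes, alpha=ALPHA):
    """Update the expected reward over a series of trials.

    Returns the final expected value and the prediction error on each trial.
    """
    rpes = []
    for reward in outcomes:
        rpe = reward - value      # prediction error at the time of reward
        rpes.append(rpe)
        value += alpha * rpe      # update the expectation
    return value, rpes


# Learning: the reward (1.0) is delivered on every trial.
value, learning_rpes = run_trials(0.0, [1.0] * 30)
# After learning: the expected reward is omitted (0.0) on one trial.
_, omission_rpes = run_trials(value, [0.0])

print(f"first trial (unpredicted reward): {learning_rpes[0]:+.3f}")
print(f"last learning trial (predicted):  {learning_rpes[-1]:+.3f}")
print(f"omission trial (reward missing):  {omission_rpes[0]:+.3f}")
```

The error starts large and positive, falls towards zero as the reward becomes predictable, and turns negative on the omission trial.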
Reward prediction in dopamine neurons?
- Schultz et al. (1997)
- Electrophysiological response
- During learning the dopamine response switched from the reward (US) to the reward predictor (CS)