Hollerman et al.: Dopamine Neurons and Reward Prediction Flashcards
Paper topic
Dopamine neurons report an error in the temporal prediction of reward during learning
Background
● Learning = making good predictions
● Computational approaches to learning involve iteratively reducing
prediction errors
○ E.g. credit for the ‘reward’ in the algorithm is transferred back to earlier predictive events
● Dopamine neurons respond to reward
● Maybe dopamine ‘reward’ signals play the role of a reward prediction
error (RPE)
○ Difference between predicted and actual reward
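The error-reduction idea above can be sketched as a simple delta rule (an illustrative toy, not the paper's model; the names `alpha`, `V`, and `delta` are assumptions):

```python
# Delta-rule sketch: a prediction error (actual minus expected reward)
# drives learning, analogous to the proposed dopamine RPE signal.
alpha = 0.2   # learning rate (illustrative value)
V = 0.0       # current reward prediction for the cue

history = []
for trial in range(20):
    r = 1.0                 # reward delivered on every trial
    delta = r - V           # reward prediction error (RPE)
    V += alpha * delta      # update prediction to reduce future error
    history.append(delta)

# Early trials: large positive error (reward is surprising).
# Late trials: error approaches zero (reward is fully predicted).
print(round(history[0], 3), round(history[-1], 3))
```

The error shrinks geometrically as the prediction converges on the delivered reward, mirroring how the neural reward response declines with learning.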
Discrimination learning task
(a) Animals released a resting key when a pair of pictures appeared, touched the lever below the rewarded picture, and received a drop of liquid. (b) Pictures used in the task: the same pair of fractal pictures was used in all familiar trials (top); a new pair of pictures was used in each block of learning trials (middle and bottom).
Animals were simultaneously presented with two pictures. If they touched a lever below one of the pictures, they received a drop of liquid, whereas the other picture was not rewarded (Fig. 1). During the initial presentations, 75% of dopamine neurons were activated when the reward occurred, comparable to other learning situations [15,17,18]. The same two pictures were presented repeatedly (varying randomly between left and right positions), and as the task was learned, the reward gradually ceased to activate dopamine neurons; instead, these neurons became responsive to presentation of the reward-predicting pictures, consistent with previous findings [1].
Important info from the discrimination learning task
● substantia nigra (SN) and the ventral tegmental area (VTA) dopaminergic neurons
● Reward prediction task
● Reward initially activates dopamine neurons
● After learning, the dopamine response transfers from the reward to its predictor
○ Neurons no longer fire at reward delivery, but at the reward-predicting picture
Discrimination learning task with complete learning episodes ==> learning curves and reward responses
For each new episode, a novel pair of pictures was presented, whereas all other
task components remained unchanged. Animals learned by trial
and error to associate one of the novel pictures with reward.
Reward responses of three dopamine neurons (a–c) during learning of pairs of novel pictures. Reward responses decreased after the learning criterion (second of four correct responses) was reached (arrowhead).
Learning pairs - important info
● Dopamine neurons fire after the reward during learning, but do not fire
after reward in ‘familiar’ trials
● Dopamine neurons also fire to freely administered liquid (reward)
Familiar trials/criterion ==> reward responses of three dopamine neurons during learning of novel picture pairs; changes in average population responses to reward; comparison between learning progress and neuronal responses to reward
Reward activations were highest in the trials before the animal reached criterion (i.e., when the error rate was highest) and declined gradually thereafter (Figs 4 and 5). Differences were significant for learning versus familiar performance (Fig. 5b) and, in particular, for trials prior to reaching criterion versus subsequent learning blocks (Fig. 5b).
● Dopamine firing in response to reward seems to be correlated with
error rate
● Dopamine neuron firing gradually decreases as behavioral predictive
performance increases
Responses of dopamine neurons related to errors in the temporal prediction of reward.
● When reward is withheld, dopamine firing is depressed at the time the reward was expected
● When reward timing is shifted, dopamine neurons continue to fire at the (temporally changed) reward
● Dopamine neurons seem to be encoding some sort of prediction error
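These observations match the behavior of a temporal-difference (TD) error, the teaching signal the paper compares dopamine responses to. A minimal TD(0) sketch (the 4-step trial layout and parameter values are assumptions for illustration, not taken from the paper):

```python
# TD(0) sketch: after training with reward at a fixed time step, withholding
# the reward produces a negative TD error at the expected reward time,
# mirroring the depression in dopamine firing.
alpha, gamma = 0.3, 1.0
T = 4
V = [0.0] * (T + 1)          # value of each time step; V[T] is terminal (0)

def run_trial(V, reward_step=3):
    """Run one trial and return the TD error at each time step."""
    deltas = []
    for t in range(T):
        r = 1.0 if t == reward_step else 0.0
        delta = r + gamma * V[t + 1] - V[t]   # TD prediction error
        V[t] += alpha * delta                 # learn from the error
        deltas.append(delta)
    return deltas

for _ in range(200):                   # train: reward always at step 3
    run_trial(V)

probe = run_trial(V, reward_step=-1)   # probe trial: reward withheld
# On rewarded trials after training, the error at step 3 is ~0; on the
# omission probe it is strongly negative at the expected reward time.
print(round(probe[3], 2))              # ~ -1.0
```

The negative error appears only at the step where reward was predicted, illustrating how a TD error is specific to the expected *time* of reward, not just its occurrence.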
Conclusions
dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the
teaching signals that have been employed in particularly efficient computational learning models.
Discussion
● Learning influences dopamine neuron firing rate
● Reward does not always trigger dopamine neuron firing
○ Neuronal activity shifts from encoding reward to encoding prediction of reward
● Omitting or shifting the reward produces a depression in dopamine firing at the originally predicted reward time
○ Followed by rapid relearning of the new contingency
● Very specific temporal locking: the prediction error is computed relative to the expected time of reward, not just its occurrence