Task 4: Reward Flashcards
A1: Describe the structures that make up the reward system
Hint: basically motivational/limbic loop
Cortex: ACC, mOFC, vmPFC
BG: ventral striatum
Midbrain: SNr/GPi (limbic territory)
Thalamus: mediodorsal nucleus (MD), VAmc, VLm
A1: Emphasize the functional similarity of the limbic loop with the motor loop
Between the loops:
- Limbic loop: S-O (select favoured object based on expected reward)
- Motor loop: S-A (appropriate motor programme)
- -> both integrate some stimulus information
- -> both involve prioritizing/resolving competition between different options
- -> both include feedback signal via AMY & SNc which is necessary for learning
A1: Apply the Ballot Box Model to Motivation
Keywords: responses to rewards; addiction
Ballot Box model: Voting for a motivation
Ventral striatum -> direct/indirect pathway vote -> “vote” is passed on to cognitive & motor loops
E.g. NA is more responsive to monetary rewards than cognitive rewards
- Money -> NA activates direct pathway more (pro) and indirect less (anti)
- Cognitive reward -> NA activates indirect pathway more (anti) and direct less (pro)
- -> ventral striatum votes for preferred rewarding objects
E.g. VMStriatum is important during drug acquisition
- When drug behaviour still has to be “voted for”/is not habitual
- Enables more direct habitual behaviour later on
A2: Explain the pathway & functional role of PHASIC dopamine in the reward system.
DA system:
Medial forebrain bundle (MFB) -> VTA -> NA
Electrical stimulation of this network is rewarding: animals self-stimulate more (with high vigor, with priority over vital goals & in reference to previous experience)
-> motivation energizes, directs behaviour & enables learning
75-80% of DA cells signal prediction error -> best learning signal
- short-latency PHASIC bursts to unpredicted rewards & predictive cues
Learning in the BG
- LTP in striatum depends on input & DA modulation
- > Cortical input -> environmental info
- > Midbrain input (DA) -> prediction error/adjustments needed
- -> stronger transmission when both inputs occur together
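The idea that transmission is strengthened only when cortical input and the DA signal coincide can be sketched as a simple multiplicative update. This is an illustrative assumption, not a model taken from these notes: the function name `update_weight`, the `learning_rate` value and the 0/1 input coding are all hypothetical.

```python
def update_weight(weight, cortical_input, dopamine_signal, learning_rate=0.1):
    """Hypothetical rule: the weight changes only if BOTH factors are non-zero."""
    return weight + learning_rate * cortical_input * dopamine_signal


start = 0.5
# Cortical input and DA signal arrive together -> synapse is strengthened
w_both = update_weight(start, cortical_input=1.0, dopamine_signal=1.0)
# Cortical input without a DA signal -> no change
w_input_only = update_weight(start, cortical_input=1.0, dopamine_signal=0.0)
# DA signal without cortical input -> no change
w_da_only = update_weight(start, cortical_input=0.0, dopamine_signal=1.0)

print(w_both, w_input_only, w_da_only)
```

Only the first case raises the weight above its starting value; either input alone leaves it unchanged.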
A2: Clarify the differences between phasic & tonic DA
Tonic DA:
Purpose
- Modulation of motor decisions
Projection
- SNc -> striatum -> both pathways -> thalamus
- via ambient, sustained extracellular DA concentration
Timing
- Constant
- regulated by DA reuptake, control of DA synthesis/release/presynaptic influences from other NTs
Phasic DA:
Purpose
- Signalling reward prediction error
Projection
- MFB -> VTA -> NA
- via synaptic transmission
Timing
- 2 timepoints
- > Cue (reward predicting stimulus)
- > Reward (receiving it or not)
A3: Discuss the various properties of the phasic DA signal
Midbrain DA-mediated signals
- signal the pure reward value of objects, independent of their specific features
- Coded as prediction error
Reward prediction: develops from new positive reinforcers that get associated with preceding neutral stimuli, which become reward-predicting cues
Reward prediction error: difference between predicted & obtained reward
A3: Describe the phasic DA signal in various learning paradigms:
- Regular reward prediction
- Unpredicted reward -> positive prediction error -> ventral striatum activation
- Fully predicted reward -> no error -> no activation
- Predicted reward omitted -> negative prediction error -> no ventral striatum activation -> depression of DA activity below baseline
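The three cases follow directly from the definition of the prediction error as obtained minus predicted reward. A minimal sketch, assuming reward is coded as 1.0 (delivered) or 0.0 (absent); the function name `prediction_error` and this coding are illustrative choices, not part of the notes.

```python
def prediction_error(obtained, predicted):
    """Reward prediction error: obtained reward minus predicted reward."""
    return obtained - predicted


cases = {
    "unpredicted reward": prediction_error(obtained=1.0, predicted=0.0),
    "fully predicted reward": prediction_error(obtained=1.0, predicted=1.0),
    "predicted reward omitted": prediction_error(obtained=0.0, predicted=1.0),
}

for name, error in cases.items():
    if error > 0:
        response = "positive error -> activation"
    elif error < 0:
        response = "negative error -> depression"
    else:
        response = "no error -> no response"
    print(f"{name}: {error:+.1f} ({response})")
```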
A3: Describe the phasic DA signal in various learning paradigms:
- Blocking Paradigm
If the reward is already fully predicted by an old cue, a new cue added to it will not become associated with the reward and will not elicit activity
-> it doesn’t add value
New cue -> no reward -> no error -> no DA response
New cue -> reward -> positive error -> DA activation
Old+new cue elicits same response as old cue alone
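Blocking can be reproduced with the Rescorla-Wagner rule, a standard associative-learning model that these notes do not mention and that is swapped in here purely for illustration. In this sketch all cues present on a trial share one prediction error; the names `train` and `strengths`, the learning rate and the trial counts are assumptions.

```python
def train(trials, strengths, learning_rate=0.2):
    """Rescorla-Wagner updates; each trial is (cues_present, reward)."""
    for cues, reward in trials:
        prediction = sum(strengths[cue] for cue in cues)
        error = reward - prediction          # shared prediction error
        for cue in cues:
            strengths[cue] += learning_rate * error
    return strengths


# Blocking group: old cue is trained first, then old+new compound is rewarded
blocked = {"old": 0.0, "new": 0.0}
train([(("old",), 1.0)] * 50, blocked)
train([(("old", "new"), 1.0)] * 50, blocked)

# Control group: compound is rewarded without pre-training the old cue
control = {"old": 0.0, "new": 0.0}
train([(("old", "new"), 1.0)] * 50, control)

print("blocked:", blocked)
print("control:", control)
```

In the blocking group the reward is already predicted when the new cue appears, so the error is close to zero and the new cue gains almost no strength. In the control group both cues share the prediction.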
A3: Describe the phasic DA signal in various learning paradigms:
- Conditioned Inhibition
Already established cue is paired with new stimulus & NO reward is given
-> new cue becomes conditioned inhibitory cue (predicts NO reward)
Inhibitor -> no reward -> no error
Inhibitor + predictor -> no reward -> no error (despite predictor!)
Inhibitor -> reward -> positive error
A3: Describe features of phasic DA signal:
- graded response
- context relation
- range adaptation
- time sensitivity
- successive learning
Graded response: Partial error -> smaller error response
Context: Context (probability of reward) defines prediction error
-> Context (desert vs. at home) changes activation despite same reward magnitude (glass of water)
Range adaptation: Absolute reward magnitude has little influence on the prediction error signal
-> Large changes in reward magnitude (1€ vs. 100€) don’t change activation much, because the response adapts to the range of available rewards
Time-sensitivity: neurons also code for timing of reward
-> change of timing -> depression at old time -> positive error/activation at new timing
Learning: error signals slowly disappear over time/over successive learning trials
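The gradual disappearance of the error signal can be illustrated with a simple delta-rule loop, in which the prediction moves a fraction of the way towards the reward on every trial. This is a sketch under assumed values: the `simulate` function, the learning rate of 0.3 and a constant reward of 1.0 are hypothetical and not taken from the notes.

```python
def simulate(n_trials, reward=1.0, learning_rate=0.3):
    """Return the prediction error on each trial of repeated cue-reward pairing."""
    prediction = 0.0
    errors = []
    for _ in range(n_trials):
        error = reward - prediction           # error at reward time
        errors.append(error)
        prediction += learning_rate * error   # prediction moves towards reward
    return errors


errors = simulate(10)
print([round(e, 3) for e in errors])
```

The first trial carries the full error (reward was entirely unpredicted), and each later trial carries a smaller one as the prediction catches up.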
A3: Describe how phasic DA cells respond to different types of stimuli
Neutral stimuli -> normal reward prediction, no particular response without associated reward
Aversive stimuli -> opposite effect to rewards -> punishment conditioning
Cue -> aversive stimulus/punishment -> positive error in the aversive sense (something aversive appeared) -> decreased activation
-> this DA response ~5-10 times slower
-> similar to reward-absence cue
- sometimes initial or rebound activation
A3: Describe how DA cells respond to uncertainty of rewards
Risk-averse people -> uncertainty reduces reward value
Risk-seeking people -> uncertainty increases reward value
Over 1/3 of DA neurons have slow/sustained/moderate activity between cue and reward (during period of uncertainty)
- highest if probability of reward = 0.5 (lowest certainty)
This activity is distinct from DA activation to rewards & cues
- Uncertainty signal -> low DA concentrations -> stimulate high-affinity D2 receptors (tonic)
- Reward signal -> high DA concentrations -> stimulate low-affinity D1 receptors (phasic)
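Why the sustained uncertainty signal is highest at a reward probability of 0.5 can be illustrated by using the variance of an all-or-nothing reward as a stand-in for uncertainty. Treating variance as the uncertainty measure is an assumption made for this sketch, as are the name `reward_variance` and the probabilities tested.

```python
def reward_variance(p, magnitude=1.0):
    """Variance of a reward of size `magnitude` delivered with probability p."""
    return magnitude ** 2 * p * (1.0 - p)


probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
variances = {p: reward_variance(p) for p in probabilities}
most_uncertain = max(variances, key=variances.get)

for p, v in variances.items():
    print(f"p = {p:.2f} -> variance = {v:.4f}")
print("most uncertain at p =", most_uncertain)
```

Variance is zero when the reward is certain to occur or certain not to occur, and it is largest in between.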
A4: Describe the range of stimuli that can elicit phasic DA
1) Primary rewards (PRs)
2) Secondary rewards (SRs)
1) PRs
- biological/evolutionary basis
- e.g. food, drink, sex
- positive reinforcers without any learning needed
2) SRs
- cues associated with primary reinforcers
- e.g. light associated with juice
- Generalized conditioned reinforcers (e.g. money)
- > reinforcing in multiple contexts
-> in real life the distinction is often more gradual
A4: Describe the ventral striatum (vS)/NA response to Primary & Secondary reinforcers
vS/NA
-> responds to PRs in classical & instrumental conditioning
-> responds more to SRs than PRs!
-> possibly more activated by surprising rewards
these require error/learning -> more easily detected
-> possible differences in subjective value given to SRs vs. PRs
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
3) Social rewards
3) Social rewards
- anywhere on PR-SR spectrum
- e.g. erotic images, smiling faces
vS/NA
-> responds to facial attractiveness/gaze/partners
-> similar activation by monetary/SRs & social rewards
-> higher activation than to PRs
-> incorporates positive social cues + behaviour in complex social tasks
(cooperation, norms, altruism; e.g. being observed during a charity donation)