Task 4: Reward Flashcards
A1: Describe the structures that make up the reward system
Hint: basically motivational/limbic loop
Cortex: ACC, mOFC, vmPFC
BG: ventral Striatum
Midbrain: SNr/GPi (limbic territory)
Thalamus: mediodorsal nucleus (MD), VAmc, VLm
A1: Emphasize the functional similarity of the limbic loop with the motor loop
Between the loops:
- Limbic loop: S-O (select favoured object based on expected reward)
- Motor loop: S-A (appropriate motor programme)
- -> both integrate some stimulus information
- -> both involve prioritizing/resolving competition between different options
- -> both include feedback signal via AMY & SNc which is necessary for learning
A1: Apply the Ballot Box Model to Motivation
Keywords: responses to rewards; addiction
Ballot Box model: Voting for a motivation
Ventral striatum –> direct/indirect pathway vote –> “vote” is moved on to cognitive & motor loop
E.g. NA is more responsive to monetary rewards than cognitive rewards
- Money –> NA activates direct pathway more (pro) and indirect less (anti)
- Cognitive reward –> NA activates indirect pathway more (anti) and direct less (pro)
- -> ventral striatum votes for preferred rewarding objects
E.g. the ventromedial striatum (VMS) is important during drug acquisition
- When drug behaviour still has to be “voted for”/is not habitual
- Enables more direct habitual behaviour later on
A2: Explain the pathway & functional role of PHASIC dopamine in the reward system.
DA system:
Medial forebrain bundle (MFB) -> VTA -> NA
Stimulation in this network leads to increased self-stimulation (with high vigor, priority over vital goals & in reference to previous experience)
-> motivation energizes, directs behaviour & enables learning
75-80% of cells signal prediction error –> best learning signal
- short-latency PHASIC bursts to unpredicted rewards & predictive cues
Learning in the BG
- LTP in striatum depends on input & DA modulation
- > Cortical input –> environmental info
- > Midbrain input (DA) -> prediction error/adjustments needed
- -> stronger transmission when both inputs occur together
A2: Clarify the differences between phasic & tonic DA
Tonic DA:
Purpose
- Modulation of motor decisions
Projection
- SNc -> striatum -> both pathways -> thalamus
- via ambient, sustained extracellular DA concentration
Timing
- Constant
- regulated by DA reuptake, control of DA synthesis/release/presynaptic influences from other NTs
Phasic DA:
Purpose
- Signalling reward prediction error
Projection
- MFB -> VTA -> NA
- via synaptic transmission
Timing
- 2 timepoints
- > Cue (reward predicting stimulus)
- > Reward (receiving it or not)
A3: Discuss the various properties of the phasic DA signal
Midbrain DA-mediated signals
- signals pure reward value of objects independent of their specific features
- Coded as prediction error
Reward prediction: develops from new positive reinforcers that get associated with preceding neutral stimuli, which become reward-predicting cues
Reward prediction error: difference between predicted & obtained reward
A3: Describe the phasic DA signal in various learning paradigms:
- Regular reward prediction
- Unpredicted reward -> positive prediction error -> ventral Striatum activation
- Fully predicted reward -> no error -> no activation
- Predicted but no reward -> negative prediction error -> no ventral striatum activity -> depression
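The three cases above all follow from the standard reward-prediction-error formula, error = obtained − predicted. A minimal sketch (function and variable names are my own, not from the lecture):

```python
def prediction_error(predicted: float, obtained: float) -> float:
    """Reward prediction error: obtained minus predicted reward."""
    return obtained - predicted

# Unpredicted reward: nothing expected, reward delivered -> positive error (DA burst)
assert prediction_error(predicted=0.0, obtained=1.0) == 1.0
# Fully predicted reward: expectation matches outcome -> no error, no phasic response
assert prediction_error(predicted=1.0, obtained=1.0) == 0.0
# Predicted but omitted reward: negative error (depression of DA firing)
assert prediction_error(predicted=1.0, obtained=0.0) == -1.0
```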
A3: Describe the phasic DA signal in various learning paradigms:
- Blocking Paradigm
If reward is already fully predicted, new cue will not be associated with the reward/will not elicit activity
-> it doesn’t add value
New cue -> no reward -> no error -> no DA response
New cue -> reward -> positive error -> DA activation
Old+new cue elicits same response as old cue alone
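Blocking falls out of error-driven learning models such as Rescorla-Wagner, where every cue present on a trial is updated by the same shared prediction error. A sketch (learning rate and trial counts are arbitrary choices, not from the lecture):

```python
def rescorla_wagner(trials, alpha=0.3, reward=1.0):
    """Error-driven learning: each cue's associative strength V is updated
    in proportion to the prediction error shared by all cues on a trial."""
    V = {}
    for cues in trials:
        prediction = sum(V.get(c, 0.0) for c in cues)
        delta = reward - prediction            # shared prediction error
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * delta
    return V

# Phase 1: old cue A alone is rewarded; Phase 2: compound A+B is rewarded
V = rescorla_wagner([("A",)] * 30 + [("A", "B")] * 30)
# A already predicts the reward fully, so B adds no value and stays "blocked"
assert V["A"] > 0.95 and V["B"] < 0.01
```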
A3: Describe the phasic DA signal in various learning paradigms:
- Conditioned Inhibition
Already established cue is paired with new stimulus & NO reward is given
-> new cue becomes conditioned inhibitory cue (predicts NO reward)
Inhibitor -> no reward -> no error
Inhibitor + predictor -> no reward -> no error (despite predictor!)
Inhibitor -> reward -> positive error
A3: Describe features of phasic DA signal:
- graded response
- context relation
- range adaptation
- time sensitivity
- successive learning
Graded response: Partial error -> smaller error response
Context: Context (probability of reward) defines prediction error
-> Context (desert vs. at home) changes activation despite same reward magnitude (glass of water)
Range adaptation: absolute reward magnitude has comparatively little effect on the prediction error
-> large changes in reward magnitude (1€ vs. 100€) do not change activation proportionally
Time-sensitivity: neurons also code for timing of reward
-> change of timing -> depression at old time -> positive error/activation at new timing
Learning: error signals slowly disappear over time/over successive learning trials
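The graded response and context dependence can be made concrete: if a cue predicts reward with probability p, the prediction equals p, so the error at delivery is 1 − p and at omission is −p. A sketch (my own illustration, not lecture material):

```python
def cue_errors(p: float):
    """Cue predicts reward with probability p -> prediction = p.
    Returns (error if reward delivered, error if reward omitted)."""
    return 1.0 - p, -p

# The more probable the reward (context), the smaller the positive error at
# delivery and the larger the negative error at omission -> graded response
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    rewarded, omitted = cue_errors(p)
    print(f"p={p:.2f}: rewarded {rewarded:+.2f}, omitted {omitted:+.2f}")
```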
A3: Describe how phasic DA cells respond to different types of stimuli
Neutral stimuli -> normal reward prediction, no particular response without associated reward
Aversive stimuli -> opposite effect to rewards -> punishment conditioning
Cue -> aversive stimulus/punishment -> positive error (something appeared) -> decreased activation
-> this DA response is ~5-10 times slower
-> similar to reward-absence cue
- sometimes initial or rebound activation
A3: Describe how DA cells respond to uncertainty of rewards
Risk-averse people -> uncertainty reduces reward value
Risk-seeking people -> uncertainty increases reward value
Over 1/3 of DA neurons have slow/sustained/moderate activity between cue and reward (during period of uncertainty)
- highest if probability of reward = 0.5 (lowest certainty)
This activity is distinct from DA activation to rewards & cues
- Uncertainty signal -> low DA concentrations -> stimulate high-affinity D2 receptors (tonic)
- Reward signal -> high DA concentrations -> stimulate low-affinity D1 receptors (phasic)
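The peak at p = 0.5 matches reward variance: for an all-or-none (Bernoulli) reward, variance p(1 − p) is largest when p = 0.5. Interpreting the sustained signal as tracking this variance is a common reading, not something stated in the card:

```python
def reward_variance(p: float) -> float:
    """Variance of a Bernoulli (all-or-none) reward with probability p."""
    return p * (1.0 - p)

# Variance is 0 at p=0 and p=1 (fully certain outcome)
# and maximal (0.25) at p=0.5, the point of lowest certainty
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}: variance={reward_variance(p):.4f}")
```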
A4: Describe the range of stimuli that can elicit phasic DA
1) Primary rewards (PRs)
2) Secondary rewards (SRs)
1) PRs
- biological/evolutionary basis
- e.g. food, drink, sex
- positive reinforcers without any learning needed
2) SRs
- cues/associated with primary reinforcers
- e.g. light associated with juice
- Generalized conditioned reinforcers (e.g. money)
- > reinforcing in multiple contexts
-> in real life, the distinction is often more gradual
A4: Describe the ventral striatum (vS)/NA response to Primary & Secondary reinforcers
vS/NA
-> responds to PRs in classical & instrumental conditioning
-> responds more strongly to SRs than to PRs!
-> possibly more activated by surprising rewards
these require error/learning -> more easily detected
-> possible differences in subjective value given to SRs vs. PRs
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
3) Social rewards
3) Social rewards
- anywhere on PR-SR spectrum
- e.g. erotic images, smiling faces
vS/NA
-> responds to facial attractiveness/gaze/partners
-> similar activation by monetary/SRs & social rewards
-> higher activation than to PRs
-> incorporates positive social cues + behaviour in complex social tasks
(cooperation, norms, altruism; e.g. being observed during a charity donation)
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
4) Cognitive feedback
4) Cognitive feedback
- Info on performance rather than reward
- vS/NA activation is differentiated depending on task type -> induce different motivation states
- Form of social approval (extrinsic) -> can work similarly to money (as a generalized conditioned reinforcer)
- > Monetary reward task –> activity correlated with extrinsic motivation
- Guides skill acquisition (intrinsic) -> partially biological component
- > Cognitive feedback task –> activity correlated with intrinsic motivation
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
5) Indirect learning
5) Indirect Learning
- Learning from observation/instruction; social learning
vS/NA only active in observer if
- > Actions of observed person have direct implications for observer
- > Observer and confederate are similar
- > Indirect learning takes place by observing outcome of confederate
- > Observing task-relevant stimulus-stimulus associations
A4: In reward selection tasks, why is the actual reward often somewhat different from the reward promised before? And why is this not necessary in a complex cognitive task, where the reward is feedback information?
Keyword: RPE
vS is possibly more activated by surprising rewards because they require RPE/learning, which makes them more detectable
-> we need surprising (different to what is predicted) stimuli to measure something in striatum
Reward selection tasks
- Primary rewards often given with 100% certainty/fully predictable
- > no error -> no DA firing
- If we want to measure activation, we need to change the predictability
Complex cognitive tasks (cognitive feedback)
- Reward depends on how well subject performed -> not 100% predictable anyway
Uncertainty about result -> always some prediction error -> always some neuronal activity
-> we don’t need to change the rewards because uncertainty exists in these tasks
A5: Explain the role of phasic DA in addiction & self-stimulation as studied in Willuhn et al. (2012):
- Methods
Cocaine:
- slows DA reuptake in striatum (coming from midbrain DA neurons)
Methods Willuhn et al. (2012):
1) Measured activity in VMS & DLS in rats over 3 weeks
- Nose poke into active port
- > cocaine + light + tone + 20s timeout after (only tone + light)
- Nose poke into inactive port
- > no response
2) Intra-DLS infusion of DA receptor antagonist
3) Lesion in VMS
A5: Explain the role of phasic DA in addiction & self-stimulation as studied in Willuhn et al. (2012):
- Results
1) 3 week measurement
- Nose pokes
- > stable active responding, no increase
- > decreased inactive responding
- > increased ratio (active:inactive) in 2nd & 3rd week vs. 1st
- VMS: early DA release
- > significant increase in phasic DA after active nose pokes (vs. inactive)
- > amplitude decreases in 2nd & 3rd week
- DLS: long-term DA release
- > significant increase in phasic DA after active nose pokes (vs. inactive)
- > only in 2nd & 3rd week, not 1st
2) DA receptor antagonist infusion in DLS
- increased active nose pokes at all timepoints
- > not attributable to conditioned DA signal (only late should be affected then)
- > DLS may contribute early on already
- > possible role in tonic DA rather than phasic DA
- increased inactive nose pokes at late timepoint
- > reversed effect on response ratio
3) VMS lesion
- no general suppression of DA transmission in DLS (as might have been expected)
- Selective effect on task-related signaling
- > VMS activity is required for developing conditioned DA signaling in DLS which regulates drug-taking responses
A5: Summarize the role of phasic DA in addiction & self-stimulation as studied in Willuhn et al. (2012):
- Discussion
There is a hierarchy of DA in striatum when developing response-reward associations:
- VMS receives limbic inputs -> enables DA signaling in DLS (sensorimotor)
1) VMS
- Motivation of taking drug -> limbic loop
- phase of feedback-based learning
- decreased activation in 2nd & 3rd week
2) DLS
- Behavioural addiction to drug -> motor loop
- feedback is no longer necessary
- increased activation in 2nd & 3rd week
A5: How do phasic & tonic DA interact in drug addiction?
DA antagonist –> blocked phasic DA, yet behaviour was affected in all weeks –> must be an effect on tonic DA
High phasic DA -> high tonic DA
High tonic DA -> less phasic DA
–> explains the self-facilitating addiction spiral
A6: Discuss the relationship between reward, motivation & habits
1) Motivation:
- Urge to obtain a particular goal -> wanting
- Exploring environment, begins as recreational behaviour
- VMS
- Goal –> e.g. I want to get high
2) Reward
- Individual learns which cues are related to reward
- E.g. If I see John, he can give me cocaine which will get me high
3) Habits
- After some time: cue-action relation without motivation/goal
- Exploiting environment, habitual/compulsive drug use
- DLS
- No more goal, just response to conditioned stimuli –> e.g. I have to find John to get cocaine