Task 4: Reward Flashcards
A1: Describe the structures that make up the reward system
Hint: basically motivational/limbic loop
Cortex: ACC, mOFC, vmPFC
BG: ventral Striatum
Midbrain: SNr/GPi (limbic territory)
Thalamus: mediodorsal nucleus (MD), VAmc, VLm
A1: Emphasize the functional similarity of the limbic loop with the motor loop
Between the loops:
- Limbic loop: S-O (select favoured object based on expected reward)
- Motor loop: S-A (appropriate motor programme)
- -> both integrate some stimulus information
- -> both involve prioritizing/resolving competition between different options
- -> both include feedback signal via AMY & SNc which is necessary for learning
A1: Apply the Ballot Box Model to Motivation
Keywords: responses to rewards; addiction
Ballot Box model: Voting for a motivation
Ventral striatum –> direct/indirect pathway vote –> “vote” is moved on to cognitive & motor loop
E.g. NA is more responsive to monetary rewards than cognitive rewards
- Money –> NA activates direct pathway more (pro) and indirect less (anti)
- Cognitive reward –> NA activates indirect pathway more (anti) and direct less (pro)
- -> ventral striatum votes for preferred rewarding objects
E.g. the ventromedial striatum (VMS) is important during drug acquisition
- When drug behaviour still has to be “voted for”/is not habitual
- Enables more direct habitual behaviour later on
A2: Explain the pathway & functional role of PHASIC dopamine in the reward system.
DA system:
Medial forebrain bundle (MFB) -> VTA -> NA
Stimulation in this network leads to increased self-stimulation (with high vigor, priority over vital goals & in reference to previous experience)
-> motivation energizes, directs behaviour & enables learning
75-80% of cells signal prediction error –> best learning signal
- short-latency PHASIC bursts to unpredicted rewards & predictive cues
Learning in the BG
- LTP in striatum depends on input & DA modulation
- > Cortical input –> environmental info
- > Midbrain input (DA) -> prediction error/adjustments needed
- -> stronger transmission when both inputs occur together
A2: Clarify the differences between phasic & tonic DA
Tonic DA:
Purpose
- Modulation of motor decisions
Projection
- SNc -> striatum -> both pathways -> thalamus
- via ambient, sustained extracellular DA concentration
Timing
- Constant
- regulated by DA reuptake, control of DA synthesis/release/presynaptic influences from other NTs
Phasic DA:
Purpose
- Signalling reward prediction error
Projection
- MFB -> VTA -> NA
- via synaptic transmission
Timing
- 2 timepoints
- > Cue (reward predicting stimulus)
- > Reward (receiving it or not)
A3: Discuss the various properties of the phasic DA signal
Midbrain DA-mediated signals
- signals pure reward value of objects independent of their specific features
- Coded as prediction error
Reward prediction: develops from new positive reinforcers that get associated with preceding neutral stimuli, which become reward-predicting cues
Reward prediction error: difference between predicted & obtained reward
A3: Describe the phasic DA signal in various learning paradigms:
- Regular reward prediction
- Unpredicted reward -> positive prediction error -> ventral Striatum activation
- Fully predicted reward -> no error -> no activation
- Predicted but no reward -> negative prediction error -> no ventral striatum activity -> depression
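The three cases above all follow from the standard reward-prediction-error formula, error = obtained − predicted. A minimal sketch (function and variable names are my own, not from the lecture):

```python
def prediction_error(predicted: float, obtained: float) -> float:
    """Reward prediction error: obtained minus predicted reward."""
    return obtained - predicted

# Unpredicted reward: nothing expected, reward delivered -> positive error (DA burst)
assert prediction_error(predicted=0.0, obtained=1.0) == 1.0
# Fully predicted reward: expectation matches outcome -> no error, no phasic response
assert prediction_error(predicted=1.0, obtained=1.0) == 0.0
# Predicted but omitted reward: negative error (depression of DA firing)
assert prediction_error(predicted=1.0, obtained=0.0) == -1.0
```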
A3: Describe the phasic DA signal in various learning paradigms:
- Blocking Paradigm
If reward is already fully predicted, new cue will not be associated with the reward/will not elicit activity
-> it doesn’t add value
New cue -> no reward -> no error -> no DA response
New cue -> reward -> positive error -> DA activation
Old+new cue elicits same response as old cue alone
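Blocking falls out of error-driven learning models such as Rescorla-Wagner, where every cue present on a trial is updated by the same shared prediction error. A sketch (learning rate and trial counts are arbitrary choices, not from the lecture):

```python
def rescorla_wagner(trials, alpha=0.3, reward=1.0):
    """Error-driven learning: each cue's associative strength V is updated
    in proportion to the prediction error shared by all cues on a trial."""
    V = {}
    for cues in trials:
        prediction = sum(V.get(c, 0.0) for c in cues)
        delta = reward - prediction            # shared prediction error
        for c in cues:
            V[c] = V.get(c, 0.0) + alpha * delta
    return V

# Phase 1: old cue A alone is rewarded; Phase 2: compound A+B is rewarded
V = rescorla_wagner([("A",)] * 30 + [("A", "B")] * 30)
# A already predicts the reward fully, so B adds no value and stays "blocked"
assert V["A"] > 0.95 and V["B"] < 0.01
```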
A3: Describe the phasic DA signal in various learning paradigms:
- Conditioned Inhibition
Already established cue is paired with new stimulus & NO reward is given
-> new cue becomes conditioned inhibitory cue (predicts NO reward)
Inhibitor -> no reward -> no error
Inhibitor + predictor -> no reward -> no error (despite predictor!)
Inhibitor -> reward -> positive error
A3: Describe features of phasic DA signal:
- graded response
- context relation
- range adaptation
- time sensitivity
- successive learning
Graded response: Partial error -> smaller error response
Context: Context (probability of reward) defines prediction error
-> Context (desert vs. at home) changes activation despite same reward magnitude (glass of water)
Range adaptation: absolute reward magnitude has comparatively little effect on the prediction error
-> large changes in reward magnitude (1€ vs. 100€) do not change activation proportionally
Time-sensitivity: neurons also code for timing of reward
-> change of timing -> depression at old time -> positive error/activation at new timing
Learning: error signals slowly disappear over time/over successive learning trials
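The graded response and context dependence can be made concrete: if a cue predicts reward with probability p, the prediction equals p, so the error at delivery is 1 − p and at omission is −p. A sketch (my own illustration, not lecture material):

```python
def cue_errors(p: float):
    """Cue predicts reward with probability p -> prediction = p.
    Returns (error if reward delivered, error if reward omitted)."""
    return 1.0 - p, -p

# The more probable the reward (context), the smaller the positive error at
# delivery and the larger the negative error at omission -> graded response
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    rewarded, omitted = cue_errors(p)
    print(f"p={p:.2f}: rewarded {rewarded:+.2f}, omitted {omitted:+.2f}")
```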
A3: Describe how phasic DA cells respond to different types of stimuli
Neutral stimuli -> normal reward prediction, no particular response without associated reward
Aversive stimuli -> opposite effect to rewards -> punishment conditioning
Cue -> aversive stimulus/punishment -> positive error (something appeared) -> decreased activation
-> this DA response is ~5-10 times slower
-> similar to reward-absence cue
- sometimes initial or rebound activation
A3: Describe how DA cells respond to uncertainty of rewards
Risk-averse people -> uncertainty reduces reward value
Risk-seeking people -> uncertainty increases reward value
Over 1/3 of DA neurons have slow/sustained/moderate activity between cue and reward (during period of uncertainty)
- highest if probability of reward = 0.5 (lowest certainty)
This activity is distinct from DA activation to rewards & cues
- Uncertainty signal -> low DA concentrations -> stimulate high-affinity D2 receptors (tonic)
- Reward signal -> high DA concentrations -> stimulate low-affinity D1 receptors (phasic)
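The peak at p = 0.5 matches reward variance: for an all-or-none (Bernoulli) reward, variance p(1 − p) is largest when p = 0.5. Interpreting the sustained signal as tracking this variance is a common reading, not something stated in the card:

```python
def reward_variance(p: float) -> float:
    """Variance of a Bernoulli (all-or-none) reward with probability p."""
    return p * (1.0 - p)

# Variance is 0 at p=0 and p=1 (fully certain outcome)
# and maximal (0.25) at p=0.5, the point of lowest certainty
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}: variance={reward_variance(p):.4f}")
```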
A4: Describe the range of stimuli that can elicit phasic DA
1) Primary rewards (PRs)
2) Secondary rewards (SRs)
1) PRs
- biological/evolutionary basis
- e.g. food, drink, sex
- positive reinforcers without any learning needed
2) SRs
- cues/associated with primary reinforcers
- e.g. light associated with juice
- Generalized conditioned reinforcers (e.g. money)
- > reinforcing in multiple contexts
-> in real life, the distinction is often more gradual
A4: Describe the ventral striatum (vS)/NA response to Primary & Secondary reinforcers
vS/NA
-> responds to PRs in classical & instrumental conditioning
-> responds more strongly to SRs than to PRs!
-> possibly more activated by surprising rewards
these require error/learning -> more easily detected
-> possible differences in subjective value given to SRs vs. PRs
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
3) Social rewards
3) Social rewards
- anywhere on PR-SR spectrum
- e.g. erotic images, smiling faces
vS/NA
-> responds to facial attractiveness/gaze/partners
-> similar activation by monetary/SRs & social rewards
-> higher activation than to PRs
-> incorporates positive social cues + behaviour in complex social tasks
(cooperation, norms, altruism; e.g. being observed during a charity donation)
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
4) Cognitive feedback
4) Cognitive feedback
- Info on performance rather than reward
- vS/NA activation is differentiated depending on task type -> induce different motivation states
- Form of social approval (extrinsic) -> can work similarly to money (as a generalized conditioned reinforcer)
- > Monetary reward task –> activity correlated with extrinsic motivation
- Guides skill acquisition (intrinsic) -> partially biological component
- > Cognitive feedback task –> activity correlated with intrinsic motivation
A4: Describe the range of stimuli that can elicit phasic DA & the response in vS/NA
5) Indirect learning
5) Indirect Learning
- Learning from observation/instruction; social learning
vS/NA only active in observer if
- > Actions of observed person have direct implications for observer
- > Observer and confederate are similar
- > Indirect learning takes place by observing outcome of confederate
- > Observing task-relevant stimulus-stimulus associations
A4: In reward selection tasks, why is the actual reward often somewhat different from the reward promised before? And why is this not necessary in a complex cognitive task, where the reward is feedback information?
Keyword: RPE
vS is possibly more activated by surprising rewards because they require RPE/learning, which makes them more detectable
-> we need surprising (different to what is predicted) stimuli to measure something in striatum
Reward selection tasks
- Primary rewards often given with 100% certainty/fully predictable
- > no error -> no DA firing
- If we want to measure activation, we need to change the predictability
Complex cognitive tasks (cognitive feedback)
- Reward depends on how well subject performed -> not 100% predictable anyway
Uncertainty about result -> always some prediction error -> always some neuronal activity
-> we don’t need to change the rewards because uncertainty exists in these tasks
A5: Explain the role of phasic DA in addiction & self-stimulation as studied in Willuhn et al. (2012):
- Methods
Cocaine:
- slows DA reuptake in striatum (coming from midbrain DA neurons)
Methods Willuhn et al. (2012):
1) Measured activity in VMS & DLS in rats over 3 weeks
- Nose poke into active port
- > cocaine + light + tone + 20s timeout after (only tone + light)
- Nose poke into inactive port
- > no response
2) Intra-DLS infusion of DA receptor antagonist
3) Lesion in VMS
A5: Explain the role of phasic DA in addiction & self-stimulation as studied in Willuhn et al. (2012):
- Results
1) 3 week measurement
- Nose pokes
- > stable active responding, no increase
- > decreased inactive responding
- > increased ratio (active:inactive) in 2nd & 3rd week vs. 1st
- VMS: early DA release
- > significant increase in phasic DA after active nose pokes (vs. inactive)
- > amplitude decreases in 2nd & 3rd week
- DLS: long-term DA release
- > significant increase in phasic DA after active nose pokes (vs. inactive)
- > only in 2nd & 3rd week, not 1st
2) DA receptor antagonist infusion in DLS
- increased active nose pokes at all timepoints
- > not attributable to conditioned DA signal (only late should be affected then)
- > DLS may contribute early on already
- > possible role in tonic DA rather than phasic DA
- increased inactive nose pokes at late timepoint
- > reversed effect on response ratio
3) VMS lesion
- no general suppression of DA transmission in DLS (as might have been expected)
- Selective effect on task-related signaling
- > VMS activity is required for developing conditioned DA signaling in DLS which regulates drug-taking responses
A5: Summarize the role of phasic DA in addiction & self-stimulation as studied in Willuhn et al. (2012):
- Discussion
There is a hierarchy of DA in striatum when developing response-reward associations:
- VMS receives limbic inputs -> enables DA signaling in DLS (sensorimotor)
1) VMS
- Motivation of taking drug -> limbic loop
- phase of feedback-based learning
- decreased activation in 2nd & 3rd week
2) DLS
- Behavioural addiction to drug -> motor loop
- feedback is no longer necessary
- increased activation in 2nd & 3rd week
A5: How do phasic & tonic DA interact in drug addiction?
DA antagonist –> blocked phasic DA, yet behaviour was affected in all weeks –> must be an effect on tonic DA
High phasic DA -> high tonic DA
High tonic DA -> less phasic DA
–> explains the self-facilitating addiction spiral
A6: Discuss the relationship between reward, motivation & habits
1) Motivation:
- Urge to obtain a particular goal -> wanting
- Exploring environment, begins as recreational behaviour
- VMS
- Goal –> e.g. I want to get high
2) Reward
- Individual learns which cues are related to reward
- E.g. If I see John, he can give me cocaine which will get me high
3) Habits
- After some time: cue-action relation without motivation/goal
- Exploiting environment, habitual/compulsive drug use
- DLS
- No more goal, just response to conditioned stimuli –> e.g. I have to find John to get cocaine