Reward and Motivation Flashcards

1
Q

State-dependency of reward

A
  • The value of a reward depends on the state of the receiver, i.e. on its subjective utility.
  • E.g. a bottle of water is much more rewarding if you’re stuck in the desert.
2
Q

Reward

Punishment

A

Reward
- A stimulus that elicits approach behavior

Punishment
- A stimulus that elicits avoidance behavior

3
Q

Primary and secondary reinforcers

A

Primary reinforcers elicit approach behavior because of their intrinsic, unconditioned value (e.g. food).

Secondary reinforcers are conditioned rewards that have no intrinsic value. E.g. money is a conditioned reward: you can't really use it for anything unless you spend it.

4
Q

Negative reinforcement

A

Reinforcement that results from the removal of a punishment (an aversive stimulus).

5
Q

Reward pathways

A
  1. Nigrostriatal pathway
  2. Mesolimbic pathway
  3. Mesocortical pathway

These pathways are characterised by cells that synthesize dopamine. Most of the dopamine in the brain originates in one of two midbrain nuclei: the substantia nigra or the ventral tegmental area.
The nigrostriatal pathway has dopamine-synthesising cells in the substantia nigra.
The mesolimbic and mesocortical pathways have dopamine-synthesising neurons in the ventral tegmental area.

See dedicated cards for further explanations.

6
Q

Nigrostriatal pathway

A

The nigrostriatal pathway is a bilateral dopaminergic pathway in the brain that connects the substantia nigra pars compacta (SNc) in the midbrain with the dorsal striatum (i.e., the caudate nucleus and putamen) in the forebrain. It is one of the major dopamine pathways in the brain, and is critical in the production of movement as part of a system called the basal ganglia motor loop. Death of neurons in this pathway can lead to Parkinson’s disease.

Using single-cell recordings from the substantia nigra of monkeys, Romo and Schultz (1990) showed elevated activation when the monkeys were presented with a food reward, and no activation when they were presented with no reward.

7
Q

Mesocortical pathway

A

Like the mesolimbic pathway, the mesocortical pathway originates in the ventral tegmental area (VTA). The mesocortical pathway connects the VTA and the prefrontal cortex.

The mesocortical pathway is essential to the normal cognitive function of the dorsolateral prefrontal cortex (part of the frontal lobe), and is thought to be involved in cognitive control, motivation, and emotional response.

Dysfunction of the pathway is hypothesised to be involved in psychosis and schizophrenia.

8
Q

Mesolimbic pathway

A

The mesolimbic pathway originates in the ventral tegmental area (VTA). The VTA has been called the reward center in the brain, but prof. John is skeptical (as always).

The pathway connects the VTA in the midbrain to the ventral striatum of the basal ganglia in the forebrain. The ventral striatum includes the nucleus accumbens and the olfactory tubercle.

The release of dopamine from the mesolimbic pathway into the nucleus accumbens regulates incentive salience (e.g. motivation and desire for rewarding stimuli) and facilitates reinforcement and reward-related motor function learning.

The dysregulation of the mesolimbic pathway and its output neurons in the nucleus accumbens plays a significant role in the development and maintenance of an addiction.

9
Q

Nucleus accumbens
- Location?

A

Located at the intersection of the caudate nucleus and the putamen in the basal ganglia

10
Q

What is the role of the orbitofrontal cortex in reward processing?
- Multiple modalities
- Lesions?

A

Multiple Modalities
- OFC is a zone of convergence from multiple modalities

  • The orbitofrontal cortex (OFC) is a zone of multimodal processing. Studies have shown that pleasurable stimuli from visual, auditory, gustatory or olfactory modalities will increase activity in the OFC, compared to unpleasant stimuli.

Lesions
- Monkeys with lesions in the OFC will select a boring capsule more often than a delicious banana, which shows that they aren’t able to evaluate the rewarding value of stimuli.
- Focus on frequencies: Monkeys with orbitofrontal lesions choose nonfood items much more frequently (parentheses) than sham (control) surgery monkeys.
- OFC lesioned animals fail to adapt choice behavior when a previously rewarded item stops being rewarded

11
Q

Medial vs lateral OFC

A

Medial OFC shows increased activation in response to positive rewards

Lateral OFC shows increased activation in response to punishment

12
Q

Classical conditioning

A

Unconditioned stimulus = e.g. food
unconditioned response = drooling
Without conditioning, dogs will drool when they see food.

Neutral stimulus = e.g. a sound

During conditioning, the neutral stimulus is paired with the unconditioned stimulus.

Once conditioning has taken place, the sound is a conditioned stimulus that elicits a conditioned response. Thus, the dogs will drool in response to the sound alone, because they have been conditioned to do so.

13
Q

Where does reward anticipation show activation in the brain?

A

Midbrain and striatum

14
Q

How is reward expectancy evaluated in the OFC?

A

Like in V1 or M1, it seems like the reward expectancy is encoded by many different neurons which have different tunings to different probabilities of rewards. These tuned activations all contribute to the reward expectancy.

15
Q

How does the response of dopamine neurons change from unpredicted reward to predicted reward?
And what happens if the reward is predicted but doesn’t show up?

A

If a reward occurs without any cues, dopamine neurons in the VTA will fire.

If a reward is preceded by a conditioned stimulus in a way where the stimulus allows the brain to predict the reward (E.g. sound precedes food), the dopamine neurons will fire when the conditioned stimulus occurs, but not when the reward is presented.

If the conditioned stimulus is shown without reward, dopamine neurons will fire when the conditioned stimulus is shown, and there will be a decrease in firing when no reward is given.

I.e. the cells aren’t coding the pleasure of an experience, rather they are coding the reward prediction error.
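
The three cases above can be written out as a small sketch. It assumes the prediction error is simply the obtained reward minus the expected reward, with reward magnitudes in arbitrary units; the function name and the numbers are illustrative, not taken from the recordings.

```python
def prediction_error(obtained: float, expected: float) -> float:
    """Reward prediction error: obtained reward minus expected reward."""
    return obtained - expected


# Assumed reward magnitude of 1.0 (arbitrary units) for illustration.
unpredicted_reward = prediction_error(obtained=1.0, expected=0.0)  # reward, no cue
predicted_reward = prediction_error(obtained=1.0, expected=1.0)    # cue, then reward
omitted_reward = prediction_error(obtained=0.0, expected=1.0)      # cue, no reward

print(unpredicted_reward, predicted_reward, omitted_reward)
```

A positive value corresponds to increased firing, zero to no change at the time of reward, and a negative value to the decrease in firing when the predicted reward does not arrive.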

16
Q

Delay of gratification

A

E.g. in the marshmallow test, the ability to delay gratification predicted later success in life.

However, the idea of temporal discounting should be taken into account, as the value of a reward decreases with increasing delay time.

17
Q

Temporal discounting

A

The value of a reward decreases as delay time increases. E.g. one marshmallow now is as good as two marshmallows in 15 mins.
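
A minimal sketch of the marshmallow example, assuming a hyperbolic discount function V = A / (1 + k * D). The card does not name a specific function; the hyperbolic form and the parameter name `k` are assumptions made here for illustration.

```python
def discounted_value(amount: float, delay_min: float, k: float) -> float:
    """Hyperbolic discounting (assumed form): V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay_min)


# If 1 marshmallow now is worth the same as 2 marshmallows in 15 minutes,
# then 1 = 2 / (1 + k * 15), which gives k = 1/15 per minute.
k = (2.0 / 1.0 - 1.0) / 15.0

value_now = discounted_value(1.0, 0.0, k)      # one marshmallow, no delay
value_later = discounted_value(2.0, 15.0, k)   # two marshmallows, 15 min delay

print(value_now, value_later)
```

A larger `k` means steeper discounting: the same delayed reward loses value faster.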

18
Q

What is the ‘reward curve’?

A

The reward curve is a plot showing how different proportions of reward at different delay times create different preferences for ‘sooner-smaller rewards’ and ‘larger-later rewards’.

It can be used to find the point of subjective equality, where the sooner-smaller and the larger-later reward are each chosen 50% of the time, because temporal discounting equalizes the value of the different-sized rewards.

19
Q

What does the discounted value of reward show?

A

How much the temporal discounting affects the value of a reward.

20
Q

Reward and Motivation

Normative decision theories
Descriptive decision theories

Action-outcome decisions
Stimulus-response decisions

Model-based
Model-free

social decisions

A

Normative decision theories
- define how people ought to make decisions that yield the optimal choice.
- Very often, however, such theories fail to predict what people actually choose.

Descriptive decision theories
- attempt to describe what people actually do, not what they should do.

Action-outcome decisions
- the decision involves some form of evaluation (not necessarily conscious) of the expected outcomes

Stimulus-response decisions
- After an action has been repeated, and if the outcome is consistent, the process becomes habitual; that is, it becomes a stimulus–response decision

Model-based
- the agent has an internal representation of some aspect of the world and uses this model to evaluate different actions

Model-free
- the agent just has an input–output mapping, similar to stimulus–response decisions

social decisions
- Decisions that involve other people
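
The model-based vs model-free distinction can be illustrated with a toy sketch. Everything here is hypothetical: the actions, outcomes, probabilities and values are invented for illustration. The model-based agent evaluates actions from an internal model of outcomes, while the model-free agent only looks up cached action values.

```python
# Hypothetical internal model: action -> list of (outcome, probability).
model = {
    "left": [("juice", 0.4), ("nothing", 0.6)],
    "right": [("pellet", 0.9), ("nothing", 0.1)],
}
# Hypothetical current value of each outcome.
outcome_value = {"juice": 3.0, "pellet": 1.0, "nothing": 0.0}
# Hypothetical cached values learned from past experience (model-free).
cached_value = {"left": 1.2, "right": 0.9}


def model_based_choice(model, outcome_value):
    """Evaluate each action by the expected value of its modelled outcomes."""
    def expected(action):
        return sum(p * outcome_value[o] for o, p in model[action])
    return max(model, key=expected)


def model_free_choice(cached_value):
    """Pick the action with the highest cached value (input-output mapping)."""
    return max(cached_value, key=cached_value.get)


# Devalue juice (e.g. the animal is sated): only the model-based agent
# changes its choice without any new learning.
devalued = dict(outcome_value, juice=0.0)
print(model_based_choice(model, outcome_value), model_free_choice(cached_value))
print(model_based_choice(model, devalued), model_free_choice(cached_value))
```

In this sketch the model-free agent would keep choosing "left" until its cached values are relearned through experience.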

21
Q

Primary Reinforcers

Secondary Reinforcers

A

Primary Reinforcers
- They have a direct benefit for survival fitness. Their value, or our response to these reinforcers, is to some extent hardwired in our genetic code. But reward value is also flexible and shaped by experience. If you are truly starving, an item of disgust—say, a dead mouse—suddenly takes on reinforcing properties.

Secondary Reinforcers
- Secondary reinforcers, such as money and status, are rewards that have no intrinsic value themselves but become rewarding through their association with other forms of reinforcement.

22
Q

Reward and Motivation

Components of Value

A

Payoff
- What kind and how much reward do the options offer? At the current spot, you might land a small trout or perhaps a bream. At the other spot, you’ve caught a few large-mouthed bass.

Probability
- How likely are you to attain the reward? You might remember that the current spot almost always yields a few catches, whereas you’ve most often come back empty-handed from the secret hole.

Effort or cost
- If you stay put, you can start casting right away. Getting to the fishing hole on the other side of the lake will take an hour of scrambling up and down the hillside
- Temporal discounting: How long are you willing to wait for a reward? You may not catch large fish at the current spot, but you could feel that satisfying tug 30 minutes sooner if you stay where you are.

Context
- This factor involves external things, like the time of day, as well as internal things, such as whether you are hungry or tired, or looking forward to an afternoon outing with some friends.
- Novelty: You might be the type who values an adventure and the possibility of finding an even better fishing hole on your way to the other side of the lake, or you might be feeling cautious, eager to go with a proven winner.

Preference
- You may just like one fishing spot better than another for its aesthetics or a fond memory

23
Q

Representation of Value

A
  • OFC plays a key role in the representation of value.
  • More lateral regions of the PFC are important for some form of modulatory control on these representations or the actions associated with them.
24
Q

More Than One Type of Decision System?
- Marginal Value Theorem

A

Marginal Value Theorem
- a basic principle of animal foraging behavior
- The animal exploits a foraging patch until its intake rate falls below the average intake rate for the overall environment. At that point, the animal becomes exploratory
- the cellular activity in ACC was highly predictive of the amount of time the animal would continue to “forage” by choosing the rewarding stimulus. Most interesting, the cells showed the property of a threshold: When the firing rate was greater than 20 spikes per second, the animal left the patch (monkeys, see pic!)
- the BOLD response in ACC correlates positively with search value (explore) and negatively with the encounter value (exploit) regardless of which choice participants made (humans)
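
The leaving rule of the marginal value theorem can be sketched as follows. The exponentially decaying intake rate and all numbers are assumptions made for illustration; only the rule itself (leave once the patch's intake rate falls below the environment's average rate) comes from the card.

```python
def leave_time(initial_rate: float, decay: float, environment_rate: float) -> int:
    """Return the first time step at which the patch's intake rate falls
    below the average intake rate of the overall environment."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be between 0 and 1 (patch depletes)")
    if environment_rate <= 0.0:
        raise ValueError("environment_rate must be positive")
    t = 0
    rate = initial_rate
    while rate >= environment_rate:
        t += 1
        rate = initial_rate * decay ** t  # assumed depletion of the patch
    return t


# Same patch, two environments: in a richer environment the animal
# should give up on the patch sooner.
poor_environment = leave_time(initial_rate=10.0, decay=0.8, environment_rate=4.0)
rich_environment = leave_time(initial_rate=10.0, decay=0.8, environment_rate=6.0)
print(poor_environment, rich_environment)
```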

25
Q

Dopamine Activity and Reward Processing
- what are the main loci of dopaminergic neurons?
- where do they project to? (pathways)

A
  • Substantia nigra pars compacta (SNc)
    • project to the dorsal striatum, the major input nucleus of the basal ganglia
  • Ventral Tegmental Area
    • Mesolimbic Pathway
      • travels to structures important to emotional processing, including the nucleus accumbens (ventral striatum) of the basal ganglia, the amygdala, the hippocampus, and the anterior cingulate cortex
    • Mesocortical Pathway
      • to the neocortex, particularly to the medial portions of the frontal lobe
26
Q

Dopamine Activity and Reward Processing
Dopamine and Prediction Error
- What is the prediction error (PE)?
- Describe its mechanism!

A

What is the prediction error (PE)?
- a signal that represents the difference between the obtained reward and the expected reward

Describe its mechanism!
- Positive prediction error (PPE): the obtained reward is greater than the expected reward
- Negative prediction error (NPE): the obtained reward is less than the expected reward

How can it explain changes in dopaminergic responses to reward?
- If the reward is unexpected (PPE) -> dopamine firing goes up
- After some time, there is no reaction to the US but a reaction to the CS, because the reward is now associated with the CS
- If the reward is expected but not obtained (NPE) -> dopamine firing decreases
- The response to the CS decreases if no reward follows for some time

What is extinction?
- a response previously associated with a stimulus is no longer produced
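
Acquisition and extinction can be sketched with a simple error-driven update in the style of the Rescorla-Wagner rule. That rule is not named on the card; it is used here as an assumed illustration. The learning rate and trial counts are arbitrary.

```python
def simulate(rewards, learning_rate=0.3):
    """For each trial, record (expected reward, prediction error), then
    move the expectation a fraction of the way toward the outcome."""
    expected = 0.0
    history = []
    for obtained in rewards:
        error = obtained - expected          # prediction error on this trial
        history.append((expected, error))
        expected += learning_rate * error    # error-driven update (assumed rule)
    return history


# 20 rewarded trials (acquisition) followed by 20 unrewarded trials (extinction).
trials = [1.0] * 20 + [0.0] * 20
history = simulate(trials)

first_error = history[0][1]       # large positive error: reward is unexpected
late_error = history[19][1]       # near zero: reward is fully predicted
omission_error = history[20][1]   # large negative error: predicted reward omitted
final_expectation = history[39][0]  # expectation has decayed again (extinction)
print(first_error, late_error, omission_error, final_expectation)
```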

27
Q

Dopamine Activity and Reward Processing
Reward and Punishment
- What is punishment?
- The Habenula
- location?
- input?
- output?
- role in reward & punishment?
- context dependency

A

What is punishment?
- punishment involves the experience of something aversive
- Aversive events are the opposite of rewarding events in that they are unpleasant, should be avoided, and have opposite motivational values.

The Habenula
Location?
- within the dorsal thalamus

input?
- forebrain limbic regions

output?
- inhibitory projections to dopamine neurons in the substantia nigra pars compacta

role in reward & punishment?
- Masayuki Matsumoto and Okihide Hikosaka (2007) recorded from neurons in the lateral habenula and dopaminergic neurons in the substantia nigra pars compacta while monkeys saccaded to a target that was either to the left or right of a fixation point. A saccade to one target was associated with a juice reward, and a saccade to the other target resulted in non-reinforcement. Habenula neurons became active when the saccade was to the no-reward side and were suppressed if the saccade was to the reward side, suggesting that reward-related activity of the dopaminergic neurons may be regulated by input from the lateral habenula.

Context dependency
If two actions result in either juice or nothing, the habenula is active when the nothing choice is made. But if the two actions result in either nothing or an aversive puff of air to the eye, the habenula is active only when the animal makes the response that results in the airpuff.