Week 6 (Chapters 47 and 48) Flashcards
What is thought to be at the centre of reinforcement learning in the mammalian brain?
The midbrain dopaminergic system
What are the uses for a theoretical framework for reinforcement learning?
- aid in the interpretation of neurophysiology data
- guide the design of future studies
The Rescorla-Wagner model formalized the idea that _______ is needed to drive learning
An error between the actual and predicted outcome
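As a rough illustration of the error-driven learning described in this card, here is a minimal single-cue sketch of the Rescorla-Wagner update. The function name, the learning rate `alpha`, and the starting value `v0` are illustrative assumptions, not taken from the chapters.

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.0):
    """Return (predictions, errors) for one cue across a sequence of trials.

    alpha and v0 are assumed values chosen only for illustration.
    """
    v = v0
    predictions, errors = [], []
    for outcome in outcomes:
        delta = outcome - v        # error between actual and predicted outcome
        predictions.append(v)
        errors.append(delta)
        v += alpha * delta         # the prediction moves toward the outcome
    return predictions, errors


# A cue that is always followed by a reward of size 1.0
predictions, errors = rescorla_wagner([1.0] * 50, alpha=0.2)
```

In this sketch the error is largest on the first trial and shrinks toward zero as the prediction approaches the outcome, so learning stops once the outcome is fully predicted.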
What is the learning theory originally formulated for computer learning?
Temporal difference (TD) learning
Reward expectation reduces dopamine reward responses in a purely _____ fashion
Subtractive
A classic adaptation of TD learning to a biological circuit model utilizes a _____
Complete serial compound (CSC) feature representation
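A minimal sketch of TD(0) learning with a CSC representation, where each time step after cue onset has its own feature and weight. All names and parameter values here are assumptions for illustration, and the pre-cue value is fixed at 0 on the assumption that cue timing is unpredictable.

```python
def run_csc_td(n_trials=500, n_steps=10, reward_step=7,
               alpha=0.1, gamma=1.0, omit_last=False):
    """Train TD(0) on a cue-reward task and return results from the final trial.

    Returns (weights, cue_delta, step_deltas), where step_deltas[t] is the
    prediction error on the transition out of time step t.
    """
    w = [0.0] * n_steps                  # one weight per time-since-cue feature
    cue_delta, step_deltas = 0.0, []
    for trial in range(n_trials):
        omit = omit_last and trial == n_trials - 1
        # Cue onset: transition from the pre-cue state (assumed value 0).
        cue_delta = gamma * w[0] - 0.0
        step_deltas = []
        for t in range(n_steps):
            r = 1.0 if (t == reward_step and not omit) else 0.0
            v_next = w[t + 1] if t + 1 < n_steps else 0.0
            delta = r + gamma * v_next - w[t]   # TD prediction error
            w[t] += alpha * delta
            step_deltas.append(delta)
    return w, cue_delta, step_deltas


_, cue_rewarded, deltas_rewarded = run_csc_td()
_, cue_omitted, deltas_omitted = run_csc_td(omit_last=True)
```

After training, this sketch gives a prediction error near 1 at the cue and near 0 at the expected reward. If the reward is omitted on the final trial, the error at the expected reward time is close to -1.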
When an expected reward is omitted, dopamine neurons show a negative prediction error ______
At the time the reward was expected
Dopamine responses to reward are suppressed only at _____
The time of the expected reward
What happens when the timing of a reward is shifted?
A larger dopamine response
Cues followed by late rewards result in _____ dopamine responses than cues followed by early rewards
smaller
What TD model was proposed to explain the shortcomings of the CSC TD model?
The microstimulus model
How does the microstimulus TD model differ from the CSC TD model?
The microstimulus model is able to account for the longer dip of dopamine responses upon reward omission
In 1998, Hollerman & Schultz discovered that when monkeys were given a reward stimulus earlier than expected, ______
There was a large dopamine response upon the reward, but no negative response at the time of the usual reward
What is one possible modification to the CSC TD model after Hollerman & Schultz’s discovery?
The animal is in one of two states: ISI (interstimulus interval) when expecting a reward and ITI (intertrial interval) when not. When one is activated, the other is deactivated
Semi-Markov dynamics imply that time spent in a state is _____ and is defined by ______
- probabilistic
- a probability distribution called a ‘dwell time distribution’
The ______ model accounts for Hollerman & Schultz’s findings
Belief-state TD model
According to the belief-state TD model, uncertainty should ______
Dramatically affect how reward expectation evolves over time
The CSC TD is model-____, while the belief-state TD is model-____
- Free
- Based
What is a shortcoming of the belief-state model?
It does not account for internal state: if an animal is hungry, food-based rewards would have higher state values than drink-based rewards, but this difference is not explicitly learned, so the belief-state model cannot capture it
Dopamine neurons code for ______
Reward prediction error
The magnitude of dopamine prediction error responses scales with _____, integrating them into a biological teaching signal for ______
- reward size, probability, and delay
- utility
Rewards drive learning as _____
positive reinforcement
What brain areas are specialized for processing rewards and reward-related behaviours?
- dopaminergic midbrain
- orbitofrontal cortex
- amygdala
- ventral striatum
Dopamine neurons reside in the _____ and send signals to the ____ and _____
- midbrain
- basal ganglia
- frontal cortex
What do increases in dopamine neuron activity indicate?
The outcome was better than predicted, and the preceding behaviour should be repeated or invigorated
What does the magnitude of dopamine activity indicate?
The degree to which behaviours should be updated
Dopamine teaching signals reflect the same values used for _______
Economic decisions
The R-W model refers to the association between _____
The conditioned stimulus and unconditioned stimulus
The TD model consists of an explicit ______ that reflects ______
- value function
- reward expectation through time
Dopamine responses show a _____ relationship to reward amount
positive monotonic
Reward responses are _____ as reward probability gets larger
Diminished
What is expected utility?
probability multiplied by utility
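The definition in this card can be sketched as a sum over the possible outcomes of a gamble, with each outcome's utility weighted by its probability. The function name, the example gamble, and the linear utility function are illustrative assumptions.

```python
def expected_utility(outcomes, utility):
    """outcomes: list of (probability, amount) pairs whose probabilities sum to 1."""
    return sum(p * utility(x) for p, x in outcomes)


# Hypothetical gamble: 50% chance of nothing, 50% chance of 1.0 unit of reward
gamble = [(0.5, 0.0), (0.5, 1.0)]

# With a linear utility function, expected utility equals expected value
eu_linear = expected_utility(gamble, lambda x: x)
```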
What two reward parameters are critical to separate objective factors from subjective values?
Timing and risk
What is temporal discounting?
People tend to prefer rewards sooner rather than later
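One way to picture temporal discounting is as a function that lowers a reward's subjective value as its delay grows. The cards do not specify a functional form, so the exponential and hyperbolic forms below, along with the parameters `gamma` and `k`, are assumptions shown only as common examples.

```python
def exponential_discount(amount, delay, gamma=0.9):
    """Value shrinks by a constant factor gamma per unit of delay (assumed form)."""
    return amount * gamma ** delay


def hyperbolic_discount(amount, delay, k=0.5):
    """Value falls as 1 / (1 + k * delay) (assumed form)."""
    return amount / (1.0 + k * delay)


# The same reward is worth less the longer one has to wait for it
soon = hyperbolic_discount(1.0, delay=1)
late = hyperbolic_discount(1.0, delay=5)
```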
What is the economic definition of risk?
Statistical variance in outcome distributions
Utility is defined in economics as _____
Subjective value derived from choice behaviour
What is a certainty equivalent?
The singular reward amount that has the same utility as a gamble
Standard utility functions are _____, because most humans are _____
- concave
- risk-averse
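To connect the certainty equivalent with concave utility, here is a minimal sketch using a square-root utility function. The choice of square root and the example gamble are assumptions for illustration, not the utility function used in the chapters.

```python
import math


def utility(x):
    """Assumed concave utility function (square root)."""
    return math.sqrt(x)


def inverse_utility(u):
    """Inverse of the assumed utility function."""
    return u ** 2


# Hypothetical gamble: 50% chance of 0, 50% chance of 100
gamble = [(0.5, 0.0), (0.5, 100.0)]

expected_value = sum(p * x for p, x in gamble)
expected_util = sum(p * utility(x) for p, x in gamble)

# Certainty equivalent: the sure amount with the same utility as the gamble
certainty_equivalent = inverse_utility(expected_util)
```

In this sketch the certainty equivalent (25) is below the expected value (50), which is what risk aversion means: the sure amount that matches the gamble's utility is smaller than the gamble's average payoff.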
Optogenetic stimulation of dopamine neurons shows that ________
Dopamine activations teach animals what to choose