Week 6 (Chapter 47, 48) Flashcards by Coral C

What is thought to be at the centre of reinforcement learning in the mammalian brain?

The midbrain dopaminergic system

What are the uses for a theoretical framework for reinforcement learning?

- aid in the interpretation of neurophysiology data
- guide the design of future studies

The Rescorla-Wagner model formalized the idea that _______ is needed to drive learning

An error between the actual and predicted outcome
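A minimal sketch of the error-driven update, assuming a single cue, a fixed learning rate `alpha`, and a reward of 1.0 on every trial (all illustrative choices, not values from the chapter). The full Rescorla-Wagner model sums predictions over every stimulus present; this version tracks one prediction `v` only.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Return the prediction after each trial and the error on each trial.

    alpha and v0 are illustrative assumptions, not values from the source.
    """
    v = v0
    values, errors = [], []
    for r in rewards:
        delta = r - v          # error: actual outcome minus predicted outcome
        v = v + alpha * delta  # move the prediction a fraction alpha toward the outcome
        values.append(v)
        errors.append(delta)
    return values, errors


# The same reward on every trial: the error shrinks as the prediction improves.
values, errors = rescorla_wagner([1.0] * 50)
```

Learning stops once the error reaches zero, which is the sense in which an error is "needed" to drive learning.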

What is the learning theory originally formulated for computer learning?

Temporal difference (TD) learning

Reward expectation reduces dopamine reward responses in a purely _____ fashion

Subtractive

A classic adaptation of TD learning to a biological circuit model utilizes a _____

Complete serial compound (CSC) feature representation
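A rough sketch of TD learning with a CSC representation, in which every time step after cue onset gets its own one-hot feature. The trial length, cue and reward times, `alpha`, and `gamma` are assumptions made for illustration, and the function name `run_trial` is hypothetical. No feature is active before the cue, so the cue itself is treated as unpredictable.

```python
import numpy as np


def run_trial(w, cue_t, reward_t, n_steps, reward=1.0, alpha=0.2, gamma=1.0, learn=True):
    """Run one trial and return the TD error at every time step.

    w is modified in place when learn is True.
    """
    n_features = w.shape[0]

    def features(t):
        # CSC: one-hot code for "time elapsed since cue onset"; all zeros before the cue.
        x = np.zeros(n_features)
        if t >= cue_t:
            x[t - cue_t] = 1.0
        return x

    deltas = np.zeros(n_steps)
    for t in range(1, n_steps):
        x_prev, x_now = features(t - 1), features(t)
        r = reward if t == reward_t else 0.0
        # TD error: reward now, plus discounted value now, minus the value one step ago.
        delta = r + gamma * (w @ x_now) - (w @ x_prev)
        if learn:
            w += alpha * delta * x_prev
        deltas[t] = delta
    return deltas


N_STEPS, CUE_T, REWARD_T = 20, 5, 15
w = np.zeros(N_STEPS - CUE_T)

first = run_trial(w, CUE_T, REWARD_T, N_STEPS)             # before learning
for _ in range(499):
    run_trial(w, CUE_T, REWARD_T, N_STEPS)                 # training trials
trained = run_trial(w, CUE_T, REWARD_T, N_STEPS, learn=False)
omitted = run_trial(w, CUE_T, REWARD_T, N_STEPS, reward=0.0, learn=False)
```

In this sketch the error starts at the reward, moves to the cue with training, and turns negative at the expected reward time when the reward is withheld.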

Dopamine neurons show a negative prediction error ______

At the time of an expected reward, when that reward is omitted

Dopamine responses to reward are suppressed only at _____

The time of the expected reward

What happens when the timing of a reward is shifted?

A larger dopamine response

Cues followed by late rewards result in _____ dopamine responses than cues followed by early rewards

smaller

What TD model was proposed to explain the shortcomings of the CSC TD model?

The microstimulus model

How does the microstimulus TD model differ from the CSC TD model?

The microstimulus model is able to account for the longer dip of dopamine responses upon reward omission

In 1998, Hollerman & Schultz discovered that when monkeys were given a reward earlier than expected, ______

There was a large dopamine response upon the reward, but no negative response at the time of the usual reward

What is one possible modification to the CSC TD model after Hollerman & Schultz’s discovery?

An animal is in one of two states: the ISI (interstimulus interval) when it expects a reward, and the ITI (intertrial interval) when it does not. When one state is activated, the other is deactivated

Semi-Markov dynamics imply that time spent in a state is _____ and is defined by ______

- probabilistic
- a probability distribution called a ‘dwell time distribution’
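A small sketch of semi-Markov dynamics with two alternating states, ISI and ITI, each drawing its duration from its own dwell time distribution. The distributions and their parameters (a truncated Gaussian for the ISI, an exponential for the ITI) are hypothetical choices for illustration, as is the function name.

```python
import random


def simulate_semi_markov(n_transitions, seed=0):
    """Return a list of (state, dwell_time_in_seconds) pairs."""
    rng = random.Random(seed)
    # Hypothetical dwell time distributions, one per state.
    dwell = {
        "ISI": lambda: max(0.1, rng.gauss(2.0, 0.3)),  # fairly precise cue-to-reward interval
        "ITI": lambda: rng.expovariate(1.0 / 5.0),     # variable gap between trials, mean 5 s
    }
    state = "ITI"
    history = []
    for _ in range(n_transitions):
        history.append((state, dwell[state]()))
        state = "ISI" if state == "ITI" else "ITI"     # leaving one state enters the other
    return history


history = simulate_semi_markov(6)
```

The time spent in each state is sampled rather than fixed, which is what separates this from a model that steps through a set number of time bins.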

The ______ model accounts for Hollerman & Schultz’s findings

Belief-state TD model

According to the belief-state TD model, uncertainty should ______

Dramatically affect how reward expectation evolves over time

The CSC TD is model-____, while the belief-state TD is model-____

- free
- based

What is a shortcoming of the belief-state model?

If an animal is hungry, food rewards would have higher state values than drink rewards. This preference is not explicitly learned, so the belief-state model does not account for it

Dopamine neurons code for ______

Reward prediction error

The magnitude of dopamine prediction error responses scales _____, integrating them into a biological teaching signal for ______

- with reward size, probability, and delay
- utility

Rewards drive learning as _____

positive reinforcement

What brain areas are specialized for processing rewards and reward-related behaviours?

dopaminergic midbrain
orbitofrontal cortex
amygdala
ventral striatum

Dopamine neurons reside in the _____ and send signals to the ____ and _____

midbrain
basal ganglia
frontal cortex

What do increases in dopamine neuron activity indicate?

The outcome was better than predicted, and the preceding behaviour should be repeated or invigorated

What does the magnitude of dopamine activity indicate?

The degree to which behaviours should be updated

Dopamine teaching signals reflect the same values used for _______

Economic decisions

The R-W model refers to association between _____

The conditioned stimulus and unconditioned stimulus

The TD model consists of an explicit ______ that reflects ______

- value function
- reward expectation through time

Dopamine responses show a _____ relationship to reward amount

positive monotonic

Reward responses are _____ as reward probability gets larger

Diminished

What is expected utility?

The probability of each outcome multiplied by its utility, summed over all outcomes

What two reward parameters are critical to separate objective factors from subjective values?

Timing and risk

What is temporal discounting?

People tend to prefer rewards sooner rather than later
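Two common ways to write a discount function, sketched for illustration: exponential and hyperbolic. The parameter values `gamma` and `k` are assumptions, and the cards do not say which form the chapter uses.

```python
def exponential_discount(amount, delay, gamma=0.9):
    """Subjective value falls by a constant factor gamma per unit of delay."""
    return amount * gamma ** delay


def hyperbolic_discount(amount, delay, k=0.2):
    """Subjective value falls steeply at short delays and more slowly at long ones."""
    return amount / (1.0 + k * delay)


# The same reward is worth less the longer the wait.
delays = [0, 1, 5, 10]
exp_values = [exponential_discount(10.0, d) for d in delays]
hyp_values = [hyperbolic_discount(10.0, d) for d in delays]
```

Under either form, an immediate reward keeps its full value and a delayed one is worth less, which produces the preference for sooner rewards.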

What is the economic definition of risk?

Statistical variance in outcome distributions

Utility is defined in economics as _____

Subjective value derived from choice behaviour

What is a certainty equivalent?

The single, certain reward amount that has the same utility as a gamble

Standard utility functions are _____, because most humans are _____

- concave
- risk-averse
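A worked sketch of how a concave utility function produces risk aversion, using a square-root utility as an assumed example (the chapter's actual function is not given in these cards). The gamble and the function names are hypothetical.

```python
import math


def utility(x):
    """Assumed concave utility function: square root of the reward amount."""
    return math.sqrt(x)


def inverse_utility(u):
    """Reward amount whose utility equals u (inverse of the square root)."""
    return u ** 2


def expected_value(gamble):
    return sum(p * x for p, x in gamble)


def expected_utility(gamble):
    return sum(p * utility(x) for p, x in gamble)


def certainty_equivalent(gamble):
    """Certain amount with the same utility as the gamble."""
    return inverse_utility(expected_utility(gamble))


# A 50/50 gamble between 0 and 100, written as (probability, amount) pairs.
gamble = [(0.5, 0.0), (0.5, 100.0)]
ev = expected_value(gamble)        # 50.0
eu = expected_utility(gamble)      # 5.0
ce = certainty_equivalent(gamble)  # 25.0
```

The certainty equivalent (25) falls below the expected value (50): with a concave utility function, a sure 25 is valued as much as the gamble, which is the signature of risk aversion.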

Optogenetic stimulation of dopamine neurons shows that ________

Dopamine activations teach animals what to choose

(39 cards)