Reinforcement + DA Flashcards

1
Q

What is the main idea in reward-dependent learning?

A

Reward is entirely predicted by a sensory cue.

2
Q

Why might learning be driven by error?

A

Once a rat learns that presentation of a light is consistently followed by food, no association develops to a new stimulus (e.g. a sound) paired with the light (Kamin, 1969), i.e. no further learning takes place. It appears, therefore, that learning is driven by deviations or “errors” between the predicted times and amounts of rewards and their actual experienced times and magnitudes.
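The blocking result can be illustrated with a minimal sketch of error-driven learning. The card does not name a learning rule; the delta-rule update below (in the style of Rescorla-Wagner), the learning rate, the trial counts, and all names are assumptions chosen for illustration only.

```python
def train(weights, cues, reward, trials, lr=0.2):
    """Delta rule: every cue that is present shares the same prediction error."""
    for _ in range(trials):
        prediction = sum(weights[c] for c in cues)  # summed prediction of present cues
        error = reward - prediction                 # deviation of outcome from prediction
        for c in cues:
            weights[c] += lr * error                # learning is proportional to the error
    return weights

# Hypothetical two-phase experiment (assumed parameters, not taken from Kamin, 1969).
weights = {"light": 0.0, "sound": 0.0}
train(weights, ["light"], reward=1.0, trials=50)           # phase 1: light -> food
train(weights, ["light", "sound"], reward=1.0, trials=50)  # phase 2: light + sound -> food
print(weights)
```

Because the light already predicts the food by the end of phase 1, the error in phase 2 is close to zero and the sound acquires almost no associative weight.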

3
Q

DA neurons in which areas are associated with reward-prediction?

A

Dopamine neurons of the ventral tegmental area (VTA) and substantia nigra have long been identified with the processing of rewarding stimuli. These neurons send their axons to brain structures involved in motivation and goal-directed behavior, for example, the striatum, nucleus accumbens, and frontal cortex.

4
Q

Multiple lines of evidence including ___ suggest DA has a role in reward

A

Evidence from drugs like amphetamine and cocaine, which exert their addictive actions in part by prolonging the influence of dopamine on target neurons (Koob, 1992), and from studies of electrical self-stimulation, where rats press bars to excite dopamine neurons at the site of an implanted electrode (Phillips, 1975), implicates midbrain dopaminergic activity in reward-dependent learning.

5
Q

How do single dopamine neurons in monkeys respond to appetitive stimuli?

A

When monkeys are presented with various appetitive stimuli, such as a morsel of apple (Schultz, 1986), dopamine neurons respond with short, phasic activations.

6
Q

What are the characteristics of DA phasic activity?

A

The phasic activations do not discriminate between different types of rewarding stimuli and are not elicited by aversive stimuli like air puffs to the hand or drops of saline to the mouth. This homogeneous response occurs in the majority of dopamine neurons (55 to 80%).

7
Q

What happens to DA firing when a reward behaviour is learned?

A

Once a reward behaviour is learned, two remarkable changes occur in the dopamine neuron output: (i) the primary reward no longer elicits a phasic response; and (ii) the onset of the (predictive) stimulus now causes a phasic activation in dopamine cell output.

8
Q

What happens when a reward stimulus is not presented?

A

In trials where the reward is not delivered at the appropriate time after the onset of the light, dopamine neurons are depressed markedly below their basal firing rate exactly at the time that the reward should have occurred.

9
Q

What are the implications of these studies on DA firing?

A

These studies promote the idea that dopaminergic activity encodes expectations about external stimuli or reward.

10
Q

Why has the TD algorithm been useful?

A

The TD algorithm is particularly well suited to understanding the functional role played by the dopamine signal in terms of the information it constructs and broadcasts.

11
Q

What has TD work used to study DA?

A

This work has used fluctuations in dopamine activity in dual roles: (i) as a supervisory signal for synaptic weight changes and (ii) as a signal to influence, directly and indirectly, the choice of behavioral actions in humans and bees.

12
Q

What are the assumptions of TD?

A

First, the computational goal of learning is to use the sensory cues to predict a discounted sum of all future rewards, V(t), within a learning trial.

The second main assumption is the Markovian one, that is, the presentation of future sensory cues and rewards depends only on the immediate (current) sensory cues and not the past sensory cues.

13
Q

What do the components of TD algorithm represent?

A

$V(t) = E[\gamma^0 r(t) + \gamma^1 r(t+1) + \gamma^2 r(t+2) + \cdots]$

where r(t) is the reward at time t and E[·] denotes the expected value of the sum of future rewards up to the end of the trial. 0 ≤ γ ≤ 1 is a discount factor that makes rewards that arrive sooner more important than rewards that arrive later.
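As a rough illustration of the discounted sum, the sketch below evaluates it for a single, fully known reward sequence. Dropping the expectation E[·] in this way, the function name, and the example numbers are all assumptions made for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r(t+k) over one known reward sequence (no expectation taken)."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward  # later rewards are weighted by higher powers of gamma
    return total

# Hypothetical trial: one unit of reward arrives two steps from now.
rewards = [0.0, 0.0, 1.0]
value_discounted = discounted_return(rewards, gamma=0.9)    # delayed reward counts for less
value_undiscounted = discounted_return(rewards, gamma=1.0)  # no discounting
print(value_discounted, value_undiscounted)
```

With γ < 1 the same reward contributes less the later it arrives; with γ = 1 timing makes no difference to the sum.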

14
Q

What does the definition of V(t) imply?

A

V(t) satisfies a condition of consistency through time.

As a result, there is information available at each instant in time that can act as a surrogate prediction error.
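One way to see the consistency condition is the short derivation below. It assumes V(t) is the expected discounted sum of rewards from time t to the end of the trial, and that the expectation can be nested across successive time steps.

```latex
\begin{align*}
V(t) &= E\left[ r(t) + \gamma\, r(t+1) + \gamma^{2} r(t+2) + \cdots \right] \\
     &= E\left[ r(t) + \gamma \left( r(t+1) + \gamma\, r(t+2) + \cdots \right) \right] \\
     &= E\left[ r(t) + \gamma\, V(t+1) \right]
\end{align*}
```

Any mismatch between the two sides, evaluated with the current estimates, is a quantity that can be computed from one time step to the next.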

15
Q

What is another way of representing TD?

A

An error in the estimated predictions can now be defined with information available at successive time steps:

$d(t) = r(t) + \gamma \hat{V}(t+1) - \hat{V}(t)$

where $\hat{V}$ denotes the current estimate of V.
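The error term on this card can be written as a one-line function. The sketch below uses assumed names and made-up values to show three cases that parallel the recording results on the earlier cards: an unpredicted reward, a fully predicted reward, and an omitted reward.

```python
def td_error(reward, v_next, v_now, gamma=0.9):
    """d(t) = r(t) + gamma * V_hat(t+1) - V_hat(t)."""
    return reward + gamma * v_next - v_now

# Hypothetical values at the moment the reward is due (the trial ends afterwards, so v_next = 0).
unpredicted = td_error(reward=1.0, v_next=0.0, v_now=0.0)  # reward arrives with no prediction
predicted = td_error(reward=1.0, v_next=0.0, v_now=1.0)    # reward matches the prediction
omitted = td_error(reward=0.0, v_next=0.0, v_now=1.0)      # predicted reward fails to arrive
print(unpredicted, predicted, omitted)
```

The sign pattern (positive, zero, negative) is what makes d(t) a candidate description of the phasic dopamine response.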

16
Q

What is d(t) and what does it act as?

A

d(t) is the TD error. It acts as a surrogate prediction error signal that is instantly available at time t+1.

17
Q

What is d(t) used for?

A

To improve the estimates of V(t) and to choose appropriate actions
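How d(t) might improve the estimates of V(t) can be sketched with a small simulation. This is a simplified illustration, not the model from the literature: it assumes one value per time step within a trial, a cue at step 2, a reward at step 6, no prediction before the cue (so those values stay at 0), and arbitrary constants. It covers only the value estimates, not action choice.

```python
GAMMA = 0.9          # discount factor (assumed)
LEARNING_RATE = 0.2  # step size for the value update (assumed)
N_STEPS = 8          # time steps per trial (assumed)
CUE_STEP = 2         # step at which the cue appears (assumed)
REWARD_STEP = 6      # step at which the reward is delivered (assumed)

def run_trial(values, rewarded=True, learn=True):
    """Run one trial and return the TD errors d(t) for t = 0..N_STEPS-1."""
    errors = []
    for t in range(N_STEPS):
        reward = 1.0 if (rewarded and t == REWARD_STEP) else 0.0
        d = reward + GAMMA * values[t + 1] - values[t]
        errors.append(d)
        # Only steps from cue onset onward carry a prediction; earlier values stay at 0.
        if learn and t >= CUE_STEP:
            values[t] += LEARNING_RATE * d
    return errors

values = [0.0] * (N_STEPS + 1)  # the extra entry is the value after the trial ends, fixed at 0
first_trial = run_trial(values)
for _ in range(500):
    run_trial(values)
trained = run_trial(values, learn=False)
omitted = run_trial(values, rewarded=False, learn=False)

print("first trial, at reward:", first_trial[REWARD_STEP])
print("trained, just before cue onset:", trained[CUE_STEP - 1])
print("trained, at reward:", trained[REWARD_STEP])
print("reward omitted, at reward time:", omitted[REWARD_STEP])
```

Under these assumptions the error starts out at the time of the reward, moves to the transition into the cue after training, and turns negative at the expected reward time when the reward is withheld.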