class #4 Flashcards

1
Q

Describe the Rescorla-Wagner Model

A

RESCORLA-WAGNER model: Learning occurs when an animal experiences a discrepancy between what it expects to happen and what actually happens.

→ Animals learn to associate a conditioned stimulus (CS) with an unconditioned stimulus (US) by updating their expectations about the outcome of the CS based on the prediction error.

When a CS is presented, the model predicts the strength of the animal's conditioned response (CR) based on the current associative strength of the CS-US pairing.

→ If the actual outcome is stronger than predicted, the prediction error is positive, and the associative strength of the CS-US pairing is increased.
→ If the actual outcome is weaker than predicted, the prediction error is negative, and the associative strength of the CS-US pairing is decreased.

2
Q

Describe in detail the equation for the Rescorla-Wagner model

A

∆V = α · β · (λ − ∑V)

The amount of learning (the change ∆ in the predictive value V) depends on the amount of 'surprise': the difference between what actually happens (λ) and what you expect (∑V).

α = learning-rate parameter set by the CS. Its value ranges from 0 to 1 and depends on the type of CS; it essentially reflects the salience of the CS.
β = learning-rate parameter set by the US. Its value ranges from 0 to 1 and depends on the type of US; it reflects the speed of learning for a given US.

  • If you have no experience with a given CS, it predicts nothing and you expect nothing: if the US occurs, you are surprised, and you learn a lot about the CS's prediction of the US.
  • If you have had many past experiences in which the CS was followed by the US, you have learned that the CS means the US is coming; when the US arrives you are not terribly surprised, so you learn little more about the CS's prediction of the US.
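To make the update rule concrete, here is a minimal Python sketch of the Rescorla-Wagner update described above. The specific values chosen for α, β, λ, and the number of trials are illustrative assumptions, not values from the course.

```python
# A minimal sketch of the Rescorla-Wagner update rule.
# Parameter values (alpha, beta, lam, trial counts) are illustrative choices.

def rescorla_wagner(n_trials, alpha=0.3, beta=1.0, lam=1.0, v0=0.0):
    """Simulate associative strength V of a single CS across CS-US pairings.

    delta_V = alpha * beta * (lam - sum_V); with a single CS, sum_V is just V.
    """
    v = v0
    history = []
    for _ in range(n_trials):
        prediction_error = lam - v        # 'surprise': actual (lambda) minus expected (sum V)
        v += alpha * beta * prediction_error
        history.append(v)
    return history

# Acquisition: V climbs quickly at first (large surprise), then levels off
# toward lambda as the US becomes well predicted.
acquisition = rescorla_wagner(n_trials=10)

# Extinction: present the CS without the US (lambda = 0), starting from the
# learned value; the negative prediction error drives V back down.
extinction = rescorla_wagner(n_trials=10, lam=0.0, v0=acquisition[-1])

print([round(v, 3) for v in acquisition])
print([round(v, 3) for v in extinction])
```

Running the sketch shows the learning curve levelling off as the US becomes well predicted (little surprise left) and V falling again during extinction, matching the two bullet points above.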
3
Q

How can we test this learning model in the brain? Describe an experiment for it.

A

→ Dopamine response = Reward occurred – Reward predicted
Hence: dopamine neurons report a scalar signal, a single numerical quantity, similar to a prediction error

Single-unit recordings in monkeys: measuring electrophysiological activity (action potentials) from single neurons

Experiment: Two monkeys were conditioned in a Pavlovian procedure with distinct visual stimuli indicating the probability of liquid reward being delivered after a 2s delay.

Results:
Anticipatory licking responses during the interval between stimulus presentation and the actual reward increased with the probability of reward.

Phasic dopamine responses at reward delivery decrease with the degree to which the reward is expected.

In high-probability trials, the neurons already fire when the visual stimulus appears, before the actual reward occurs: this reflects the concept of expectancy.

Dopamine neurons in the midbrain therefore seem to follow the principles outlined in the prediction-error model: their reward responses decrease as a function of the degree to which the reward is expected.
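To illustrate the logic of this result, here is a toy Python sketch (not the monkeys' data) of how a prediction-error signal shrinks as the reward becomes better predicted. The listed probabilities, and the assumption that the cue sets the expected reward equal to its probability p (with reward magnitude 1), are illustrative.

```python
# Toy illustration of "Dopamine response = Reward occurred - Reward predicted".
# The cue is assumed to set the expectation to the reward probability p.

reward_probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical cue conditions

for p in reward_probabilities:
    predicted = p                      # expectation signalled by the visual stimulus
    pe_rewarded = 1.0 - predicted      # phasic response when the reward is delivered
    pe_omitted = 0.0 - predicted       # 'dip' below baseline when the reward is omitted
    print(f"p={p:.2f}  PE(reward)={pe_rewarded:+.2f}  PE(omission)={pe_omitted:+.2f}")
```

The printout shows the response to a delivered reward shrinking as p rises, consistent with phasic dopamine responses decreasing with the degree to which the reward is expected.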

4
Q

Describe a second experiment to assess the role of dopamine in learning.

A

Two monkeys were placed in a dynamic foraging environment in which they had to track the changing values of alternative choices over time.

Hypothesis:
The matching law = animals distribute their time among foraging sites in proportion to their relative value, i.e., the relative abundance of resources at each site.

We can measure this experimentally by tracking the monkey's saccades with an eye-tracking system.

Result: the two curves (the blue and black lines in the original figure) generally parallel each other, indicating that the monkey matched the ratio of its choices to the ratio of incomes from the two colors.

The lateral intraparietal area (LIP) may play a critical role in remapping abstract valuation onto concrete action. This remapping is demanded by the logic of this dynamic task, wherein on every trial the monkey must transform a color-based representation of value into a spatial eye-movement plan.
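As a small sketch of the matching-law hypothesis stated above, written in Python (the income values are hypothetical, not data from the experiment):

```python
# Matching law: the fraction of choices allocated to an option matches the
# fraction of income (rewards) obtained from it.

def matching_fraction(income_a, income_b):
    """Predicted choice fraction for option A under strict matching."""
    return income_a / (income_a + income_b)

# Hypothetical incomes earned from the two color targets in a block of trials
income_red, income_green = 30, 10
predicted_choice_red = matching_fraction(income_red, income_green)
print(f"Predicted fraction of choices to red: {predicted_choice_red:.2f}")  # 0.75
```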

5
Q

Describe an experiment that tests reinforcement learning using Parkinson's patients.

A

Design: Patients were asked to play a (positive and negative) reinforcement learning task

Hypothesis: Cognitive performance should improve when patients take medication that elevates their dopamine levels

Stimulus-pairs (Hiragana characters) were used in both procedural learning conditions (i.e., probabilistic selection and transitive inference).

For probabilistic selection: over the course of training, participants learn to choose stimuli A, C, and E more often than B, D, or F.
For transitive inference: over the course of training, participants learn a hierarchy (A > B > C > D > E).

Patients on medication (green line) chose positive stimuli more reliably than they avoided negative stimuli. Also, these patients chose the positive stimuli more reliably than the other two groups.

Patients off medication (red line) avoided negative stimuli more reliably than they chose positive stimuli, and more reliably than the other two groups.

IMPLICATION: This study suggests that the dopamine system, carrying reward-predictions, is causally involved in making choices.
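A toy Python sketch of the probabilistic-selection idea: an agent learns stimulus values from probabilistic feedback, with separate learning rates for positive and negative prediction errors to mimic the on/off-medication asymmetry. The reward probabilities, learning rates, and choice rule are illustrative assumptions, not the study's parameters.

```python
import random

def train(p_reward_a=0.8, p_reward_b=0.2, lr_gain=0.3, lr_loss=0.3,
          n_trials=200, seed=0):
    """Learn values for two stimuli from probabilistic feedback."""
    rng = random.Random(seed)
    q = {"A": 0.5, "B": 0.5}
    for _ in range(n_trials):
        # epsilon-greedy choice: usually pick the higher-valued stimulus,
        # occasionally explore
        choice = max(q, key=q.get) if rng.random() > 0.1 else rng.choice(["A", "B"])
        p = p_reward_a if choice == "A" else p_reward_b
        reward = 1.0 if rng.random() < p else 0.0
        pe = reward - q[choice]                     # prediction error
        lr = lr_gain if pe > 0 else lr_loss         # asymmetric learning rates
        q[choice] += lr * pe
    return q

# Elevated dopamine ~ learn more from positive outcomes (choosing good stimuli);
# reduced dopamine ~ learn more from negative outcomes (avoiding bad stimuli).
print(train(lr_gain=0.4, lr_loss=0.1))   # 'on medication'-like agent
print(train(lr_gain=0.1, lr_loss=0.4))   # 'off medication'-like agent
```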

6
Q

Describe the neural correlates of dopamine and learning

A

Dopamine is excitatory on the direct “Go” pathway, which helps facilitate responding,
whereas it is inhibitory on the indirect “NoGo” pathway, which suppresses responding.

The mesolimbic pathway (direct pathway) projects to the nucleus accumbens and is part of complex subcortical circuits involving the amygdala and the hippocampus; these structures work together to signal how rewarding a behavior, action, or choice might be.

They therefore control behavior according to incentive salience. The mesocortical pathway (indirect pathway) projects primarily to the prefrontal cortex and is concerned with the avoidance of aversive events.

In animals, phasic bursts of dopamine cell-firing are observed during positive reinforcement (as we have seen). These are thought to act as ‘teaching signals’ that lead to the learning of rewarding behaviors.

Conversely, choices that do not lead to reward (as well as aversive events) are associated with dopamine dips (i.e., dopamine dropping below baseline-levels).
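As a very schematic illustration (an assumption made for exposition, not the actual circuit model), dopamine bursts and dips can be written as opposing teaching signals on Go and NoGo weights:

```python
# Toy sketch: bursts (positive signal) strengthen Go and weaken NoGo; dips do
# the opposite. Weight values and the learning rate are illustrative.

def update_pathways(go_weight, nogo_weight, dopamine_signal, lr=0.1):
    go_weight += lr * dopamine_signal        # dopamine is excitatory on the direct "Go" pathway
    nogo_weight -= lr * dopamine_signal      # dopamine is inhibitory on the indirect "NoGo" pathway
    return go_weight, nogo_weight

go, nogo = 0.5, 0.5
go, nogo = update_pathways(go, nogo, dopamine_signal=+1.0)  # reward: burst -> facilitates responding
print(go, nogo)
go, nogo = update_pathways(go, nogo, dopamine_signal=-1.0)  # omission/aversive: dip -> suppresses responding
print(go, nogo)
```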
