Reinforcement Learning Flashcards

1
Q

What is a value function?

A

The expected cumulative (usually discounted) future reward an agent will collect starting from a given state (or state-action pair) when it follows a particular policy.
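A value function can be estimated, for instance, by averaging the discounted rewards that followed visits to a state. The sketch below is a Monte Carlo estimate under stated assumptions: the discount factor `gamma=0.9` and the reward sequences in `episodes` are hypothetical, and every episode is assumed to start in the same state and follow the same policy.

```python
from statistics import mean

def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per time step (computed backwards)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def mc_value_estimate(episodes, gamma=0.9):
    """Estimate V(s) as the average discounted return of episodes starting in s."""
    return mean(discounted_return(ep, gamma) for ep in episodes)

# Hypothetical reward sequences observed after starting in the same state s.
episodes = [[0, 0, 1], [0, 1], [0, 0, 0]]
print(mc_value_estimate(episodes))
```

With these numbers the estimate is about 0.57, the mean of 0.81, 0.9 and 0.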

2
Q

What is the goal of reinforcement learning?

A

Maximizing the expected cumulative reward.
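“Cumulative reward” is usually formalized as a discounted sum of future rewards. A minimal sketch, assuming a hypothetical discount factor `gamma`; with `gamma=1.0` it reduces to the plain sum, while `gamma < 1` makes a reward worth more the sooner it arrives.

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative reward G = r_1 + gamma*r_2 + gamma**2*r_3 + ..."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Two hypothetical reward streams with the same undiscounted total of 1.
soon = [1, 0, 0]
late = [0, 0, 1]
print(discounted_return(soon), discounted_return(late))  # 1.0 and roughly 0.81
```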

3
Q

What is a policy?

A

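A mapping from states to actions (or to probabilities over actions). It is often derived from value functions, e.g. by choosing the highest-valued action, but it need not be.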
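A policy is just a function from states to actions. The sketch below assumes a hypothetical table of action values `Q` and shows one policy derived from it (greedy) and one that ignores values altogether (uniform random); the states and actions are made up for illustration.

```python
import random

# Hypothetical action-value estimates Q[state][action].
Q = {
    "hungry": {"eat": 1.0, "sleep": 0.2},
    "tired": {"eat": 0.1, "sleep": 0.8},
}

def greedy_policy(state):
    """Deterministic policy: pick the action with the highest estimated value."""
    actions = Q[state]
    return max(actions, key=actions.get)

def random_policy(state):
    """A policy that ignores values entirely: uniform choice over actions."""
    return random.choice(list(Q[state]))

print(greedy_policy("hungry"), greedy_policy("tired"))  # eat sleep
```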

4
Q

What is the VTA?

A

Ventral Tegmental Area: a midbrain region containing many of the brain’s dopaminergic neurons.

5
Q

Which neurotransmitter is produced at the VTA?

A

Dopamine

6
Q

Describe instrumental conditioning.

A

An association between an action and rewards (or punishments).

Also called ‘operant’ conditioning; it rests on Thorndike’s ‘law of effect’.

7
Q

What is dopamine signalling?

A

A reward prediction error: the difference between the reward received and the reward expected (not simply the reward amount).
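The prediction-error reading of dopamine comes down to one subtraction. This sketch assumes hypothetical reward magnitudes of 0 and 1; the three cases mirror the classic recordings in which dopamine neurons fire more for an unexpected reward, do not change for a fully predicted one, and dip when an expected reward is omitted.

```python
def prediction_error(reward, expected):
    """Reward prediction error: received minus expected reward."""
    return reward - expected

# Hypothetical cases; expected=1.0 stands for a cue that reliably predicts reward.
cases = {
    "unexpected reward": prediction_error(reward=1.0, expected=0.0),      # positive
    "fully predicted reward": prediction_error(reward=1.0, expected=1.0),  # zero
    "omitted reward": prediction_error(reward=0.0, expected=1.0),          # negative
}
for name, delta in cases.items():
    print(f"{name}: {delta:+.1f}")
```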

8
Q

What is the difference between model-based and model-free learning?

A

Model-based learning makes predictions on the basis of a model of the world; model-free learning instead learns directly from experienced rewards, without building such a model.
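Model-based learning can be illustrated with planning over a known model. A minimal value-iteration sketch, assuming a hypothetical three-state world whose transitions and rewards are fully known to the agent (`model`) and a discount factor `gamma=0.9`:

```python
# Hypothetical known model: model[state][action] = list of (probability, next_state, reward).
model = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(1.0, "C", 1.0)]},
    "C": {"stay": [(1.0, "C", 0.0)]},  # absorbing: no further reward
}

def value_iteration(model, gamma=0.9, tol=1e-8):
    """Plan with the model: back up values until they stop changing."""
    V = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            # Best expected one-step reward plus discounted value of the next state.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(model)
print(V)  # A: 0.9, B: 1.0, C: 0.0
```

The agent never has to act to discover that “go” pays off; it computes the values directly from the model.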

9
Q

What is the divergence of VTA connections?

A

About 500,000 connections per neuron (about 50x more than the “average” cortical neuron).

10
Q

What’s the Markov property (as in a Markov Decision Process)?

A

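Only the present state matters for a decision about an action: the next state and reward depend only on the current state and action, not on the earlier history.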
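In code, the Markov property shows up as a transition function whose only inputs are the current state and action. The sketch assumes a hypothetical two-state world (`P`, `step`); the `history` list is kept only for printing and is never passed to `step`.

```python
import random

# Hypothetical transition table: P[(state, action)] = {next_state: probability}.
P = {
    ("dark", "press"): {"light": 0.8, "dark": 0.2},
    ("dark", "wait"): {"dark": 1.0},
    ("light", "press"): {"light": 1.0},
    ("light", "wait"): {"dark": 0.5, "light": 0.5},
}

def step(state, action, rng=random):
    """Sample the next state. Markov: only (state, action) is used, never the history."""
    dist = P[(state, action)]
    return rng.choices(list(dist), weights=list(dist.values()))[0]

rng = random.Random(0)
state = "dark"
history = [state]
for action in ["press", "wait", "press"]:
    state = step(state, action, rng)
    history.append(state)
print(history)
```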

11
Q

What is the ‘explore vs. exploit’ dilemma?

A

An agent cannot, with the same choice, both learn more about the world (explore) and collect the most reward with what it already knows (exploit). Organisms need to strike a balance between the two.
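A common way to strike that balance is ε-greedy action selection (one technique among many, chosen here for illustration): explore a random option with small probability `epsilon`, otherwise exploit the option with the best current estimate. The three-armed bandit, its mean rewards and the Gaussian reward noise are hypothetical.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Explore with probability epsilon, otherwise exploit the best current estimate."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n
    counts = [0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(n)  # explore: try a random arm
        else:
            a = max(range(n), key=lambda i: estimates[i])  # exploit: best-looking arm
        reward = rng.gauss(true_means[a], 0.1)  # hypothetical noisy reward
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental sample mean
        total += reward
    return estimates, counts, total

estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 1.0])
print(counts)  # most pulls should go to the arm with mean 1.0
```

Setting `epsilon=0` would stop exploration entirely, so the agent could stay stuck on whichever arm first looked good.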

12
Q

Why is reinforcement learning a ‘normative framework’?

A

It doesn’t specify what agents will do, but what they should do.

13
Q

What is “classical conditioning”?

A

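Pairing a neutral stimulus with an unconditioned stimulus until the neutral stimulus alone evokes the response (Pavlovian conditioning).

E.g.: after a bell (neutral stimulus) is repeatedly paired with food (unconditioned stimulus), the bell alone evokes salivation (now a conditioned response).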
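Classical conditioning is often modeled with the Rescorla-Wagner rule (a standard model swapped in here; the card does not mention it). A minimal sketch, assuming a hypothetical learning rate `alpha=0.3` and a maximum associative strength `lam=1.0` that the food can support:

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Associative strength V of a cue (e.g. the bell) across pairing trials.

    Each trial: V += alpha * (lam - V), where lam is the maximum strength the
    unconditioned stimulus (e.g. food) can support.
    """
    V = 0.0
    history = []
    for _ in range(trials):
        V += alpha * (lam - V)  # learning is driven by the prediction error lam - V
        history.append(V)
    return history

strengths = rescorla_wagner(10)
print([round(v, 2) for v in strengths])  # rises quickly, then levels off toward 1.0
```

The term `lam - V` is a prediction error of the same form as the one attributed to dopamine.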

14
Q

What is the signal thought to be produced in the nucleus accumbens?

A

A critic signal: feedback on how well the system’s predictions about rewards matched the rewards actually received (the ‘critic’ of actor-critic models).
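In actor-critic models the critic learns to predict reward and broadcasts its prediction error, and the actor uses that same error to adjust its action tendencies. A minimal sketch on a hypothetical two-option task (reward probabilities 0.2 and 0.8), assuming softmax action selection and made-up learning rates; it illustrates the architecture, not a model of the nucleus accumbens itself.

```python
import math
import random

def softmax_choice(prefs, rng):
    """Actor: sample an action with probability proportional to exp(preference)."""
    weights = [math.exp(p) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]

def actor_critic(reward_probs, trials=2000, alpha_critic=0.1, alpha_actor=0.1, seed=1):
    rng = random.Random(seed)
    V = 0.0                            # critic: predicted reward in this single state
    prefs = [0.0] * len(reward_probs)  # actor: action preferences
    for _ in range(trials):
        a = softmax_choice(prefs, rng)
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        delta = r - V                     # critic's feedback: how wrong the prediction was
        V += alpha_critic * delta         # critic improves its prediction
        prefs[a] += alpha_actor * delta   # actor favors actions that beat the prediction
    return V, prefs

V, prefs = actor_critic([0.2, 0.8])
print(V, prefs)
```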

15
Q

What is “TD” Learning?

A

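Temporal Difference learning: value predictions are updated from the difference between successive predictions (the TD error, δ = r + γ·V(s') - V(s)), which lets an agent learn even when rewards arrive only after a delay.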
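A minimal TD(0) sketch, assuming a hypothetical chain of five states with a single reward of 1 at the end, learning rate `alpha=0.1` and no discounting (`gamma=1.0`). The reward is delayed, yet through the TD error its value gradually propagates back to the earliest state.

```python
def td0_chain(n_states=5, episodes=200, alpha=0.1, gamma=1.0):
    """TD(0) on a hypothetical chain s0 -> s1 -> ... -> terminal, reward 1 only at the end."""
    V = [0.0] * (n_states + 1)  # last entry is the terminal state, fixed at 0
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            r = 1.0 if s_next == n_states else 0.0
            td_error = r + gamma * V[s_next] - V[s]  # delta = r + gamma*V(s') - V(s)
            V[s] += alpha * td_error
    return V[:n_states]

V = td0_chain()
print([round(v, 2) for v in V])  # values approach 1.0; earlier states catch up last
```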

16
Q

What is the difference between model-free and model-based reinforcement learning?

A

Model-free learning forms direct associations between recent actions and the rewards that follow them, without a model of the world.

In model-based learning, an “agent” builds a model of the world and uses it as a source of predictions about potential rewards.
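Model-free learning can be illustrated with Q-learning (a standard model-free algorithm, chosen here as an example). The agent below never sees the rules inside the hypothetical `env_step` function; it only updates action values from the rewards it samples. The three-state world, `alpha`, `gamma` and `epsilon` are all assumptions.

```python
import random

def env_step(state, action):
    """Hypothetical environment the agent can only sample; its rules stay hidden."""
    if state == "A":
        return ("B", 0.0) if action == "go" else ("A", 0.0)
    if state == "B":
        return ("C", 1.0) if action == "go" else ("B", 0.0)
    return ("C", 0.0)  # C is terminal

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    actions = ["stay", "go"]
    Q = {(s, a): 0.0 for s in "ABC" for a in actions}
    for _ in range(episodes):
        s = "A"
        for _ in range(20):  # cap the episode length
            if s == "C":
                break
            if rng.random() < epsilon:
                a = rng.choice(actions)  # explore
            else:
                a = max(actions, key=lambda act: Q[(s, act)])  # exploit
            s2, r = env_step(s, a)
            target = r if s2 == "C" else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # no model: learn from this sample only
            s = s2
    return Q

Q = q_learning()
print(round(Q[("A", "go")], 2), round(Q[("B", "go")], 2))  # should approach 0.9 and 1.0
```

The learned values end up where a planner with full knowledge of the world would put them, but here they come purely from trial and error.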