Reinforcement Learning Flashcards

Question 1

Q

What is a value function?

Answer

A

An estimate of the expected future reward associated with a state of the world, or with an action taken in that state.

Question 2

Q

What is the goal of reinforcement learning?

Answer

A

Maximizing a cumulative reward.
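One common way to make "cumulative reward" concrete is the discounted return, in which later rewards count for less. The sketch below assumes a discount factor `gamma`, which the card itself does not mention; the function name is only illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma once per time step of delay."""
    total = 0.0
    # Work backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Three rewards of 1 with gamma = 0.5: 1 + 0.5 + 0.25
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

With `gamma=1.0` this reduces to a plain sum of rewards.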

Question 3

Q

What is a policy?

Answer

A

A mapping from states to actions. It can be derived from value functions, for example by choosing the action with the highest estimated value.
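As a minimal sketch, a greedy policy can be read off a table of action values. The states, actions, and numbers below are made up for illustration, and greedy selection is only one of several ways to turn values into a policy.

```python
def greedy_policy(q_values):
    """Map each state to the action with the highest estimated value."""
    return {
        state: max(actions, key=actions.get)
        for state, actions in q_values.items()
    }


# Hypothetical action values: q[state][action]
q = {
    "hungry": {"eat": 1.0, "sleep": 0.2},
    "tired": {"eat": 0.1, "sleep": 0.8},
}
policy = greedy_policy(q)
print(policy)  # {'hungry': 'eat', 'tired': 'sleep'}
```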

Question 4

Q

What is the VTA?

Answer

A

Ventral Tegmental Area: source of dopaminergic neurons.

Question 5

Q

Which neurotransmitter is produced at the VTA?

Answer

A

Dopamine.

Question 6

Q

Describe instrumental conditioning.

Answer

A

An association between an action and rewards (or punishments).

Also called ‘operant’ conditioning; the underlying principle is Thorndike’s ‘law of effect’.

Question 7

Q

What is dopamine signalling?

Answer

A

A reward prediction error: the difference between the reward received and the reward expected (not simply a reward amount).

Question 8

Q

What is the difference between model based and model free learning?

Answer

A

Model-based learning makes predictions on the basis of a model of the world. Model-free learning has no such model and learns directly from the rewards that followed past actions.

Question 9

Q

What is the divergence of VTA connections?

Answer

A

500,000 connections per neuron (about 50x more than the “average” cortical neuron).

Question 10

Q

What’s the Markov property (as in a Markov Decision Process)?

Answer

A

The future depends only on the present state, not on the history that led to it, so only the present matters for a decision about an action.
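Written as a formula, the property says that conditioning on the full history adds nothing beyond the current state and action. The notation here (s_t for the state and a_t for the action at time t) is an assumption, not taken from the card.

```latex
P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, \dots, s_t, a_t)
```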

Question 11

Q

What is the ‘exploit vs. explore’ dilemma?

Answer

A

Learning about the world (explore) and maximizing reward from what is already known (exploit) compete for the same choices: an action spent exploring is not spent exploiting. Organisms need to balance the two.
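One simple way to strike this balance is an epsilon-greedy rule: explore with a small probability, otherwise exploit. This is a sketch of one common heuristic, not the only solution; `epsilon` and the action names are illustrative assumptions.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one."""
    if rng.random() < epsilon:
        # Explore: try any action, regardless of its current estimate.
        return rng.choice(list(action_values))
    # Exploit: take the action currently believed to be best.
    return max(action_values, key=action_values.get)


estimates = {"left": 0.2, "right": 0.7}
print(epsilon_greedy(estimates, epsilon=0.0))  # right (never explores)
```

Setting `epsilon` to 0 gives pure exploitation and setting it to 1 gives pure exploration.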

Question 12

Q

Why is reinforcement learning a ‘normative framework’?

Answer

A

It doesn’t specify what agents will do, but what they should do.

Question 13

Q

What is “classical conditioning”?

Answer

A

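Pairing a neutral stimulus with an unconditioned stimulus until the neutral stimulus alone evokes the response (Pavlovian conditioning).

E.g.: After a bell (neutral stimulus) is repeatedly paired with food (unconditioned stimulus), the bell alone evokes salivation (now a conditioned response).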

Question 14

Q

What is the signal thought to be produced in the nucleus accumbens?

Answer

A

A critic signal: feedback on how well the system’s predictions about rewards matched what actually happened.

Question 15

Q

What is “TD” Learning?

Answer

A

Temporal Difference learning. Because rewards follow behavior after a delay, predictions of reward are updated from the difference between successive predictions instead of waiting for the final outcome.
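The core update of temporal difference learning can be sketched as the standard TD(0) rule. The learning rate `alpha`, discount `gamma`, and state names are illustrative assumptions.

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Nudge the value of `state` toward reward + discounted value of `next_state`."""
    # TD error: how much better or worse things went than predicted.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-step episode: a cue is followed by a rewarded outcome.
values = {"cue": 0.0, "outcome": 1.0}
error = td_update(values, "cue", reward=0.0, next_state="outcome", alpha=0.5, gamma=1.0)
print(error, values["cue"])  # 1.0 0.5
```

The value of the cue rises before any reward has been delivered at that step, because the prediction at the next step is already higher.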

Question 16

Q

What is the difference between model free and model based reinforcement learning?

Answer

A

Model-free learning is simply an association between recent actions and a reward.

In model-based learning, an “agent” builds a model of the world and uses it as a source of predictions about potential rewards.
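To illustrate the model-based side, the sketch below computes the expected value of an action by looking one step ahead through a model. The model format (a dict from `(state, action)` to a list of `(probability, next_state, reward)` triples) and all names are assumptions made for this example. A model-free learner would instead store a single learned number per action and never consult such a table.

```python
def lookahead_value(model, values, state, action, gamma=0.9):
    """Expected value of an action, predicted from a model rather than from experience."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in model[(state, action)]
    )


# Hypothetical world model: "go" leads to a reward half of the time.
model = {("start", "go"): [(0.5, "win", 1.0), (0.5, "lose", 0.0)]}
values = {"win": 0.0, "lose": 0.0}
print(lookahead_value(model, values, "start", "go"))  # 0.5
```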