Reinforcement Learning: Actor/Critic Flashcards

1
Q

What is the problem being solved in reinforcement learning?

A

how to assign credit across a sequence of actions that together lead to a cumulative reward (the temporal credit assignment problem)

2
Q

What is the role of the teacher?

A

The role of the teacher in reinforcement learning tasks is more evaluative than instructional, and the teacher is sometimes called a critic because of this role

3
Q

What does the critic provide?

A

evaluations of the learning system’s actions as training information, leaving it to the learning system to determine how to modify its actions so as to obtain better evaluations in the future.

4
Q

What does the critic not do?

A

does not tell the learning system what to do to improve performance

5
Q

What do reinforcement learning methods have to incorporate due to the role of the critic?

A

have to incorporate an additional exploration process that can be used to discover the appropriate actions to store.

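One common way to implement such an exploration process is ε-greedy action selection. The sketch below is purely illustrative: the scheme and all values are assumptions, not the specific mechanism described in the source.

```python
import numpy as np

# A minimal sketch of one possible exploration process (epsilon-greedy).
# The scheme and the values are illustrative assumptions.
rng = np.random.default_rng(0)
EPSILON = 0.1  # probability of trying a random action

def select_action(action_values):
    """Mostly exploit the best-known action, but occasionally
    explore a random one to discover better actions to store."""
    if rng.random() < EPSILON:
        return int(rng.integers(len(action_values)))
    return int(np.argmax(action_values))

values = np.array([0.2, 0.8, 0.1])  # hypothetical action evaluations
print(select_action(values))        # usually 1, sometimes a random pick
```
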
6
Q

What did the adaptive critic element (ACE) construct?

A

an evaluation of different states of the environment, using a temporal difference-like learning rule from which the TD learning rule was later developed

7
Q

How was ACE evaluation used in reinforcement learning?

A

used to augment the external reinforcement signal and to train, through a trial-and-error process, a second unit, the "associative search element" (ASE), to select the correct action at each state

8
Q

What insight first gave rise to the ACE-ASE model?

A

Sutton (1978): even when the external reinforcement for a task is delayed (as when playing checkers), a temporal difference prediction error can convey to the action just chosen, at every timestep, a surrogate 'reinforcement' signal that embodies both immediate outcomes and future prospects

9
Q

What happens in the absence of external reinforcement in the ACE-ASE model?

A

in the absence of external reinforcement (i.e., r_t = 0), the prediction error δ_t becomes γV(S_{t+1}) − V(S_t); that is, it compares the values of two consecutive states and conveys whether the chosen action has led to a state with a higher value than the previous one (i.e., to a state predictive of more future reward) or not

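A minimal Python sketch of this prediction error, assuming a tabular value function V over a small discrete state space (the states, values, and discount factor here are hypothetical):

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumed value)

def td_error(V, s, s_next, r):
    """delta_t = r_t + gamma * V(S_{t+1}) - V(S_t).
    When r == 0 this reduces to gamma * V(S_{t+1}) - V(S_t):
    a comparison of the values of two consecutive states."""
    return r + GAMMA * V[s_next] - V[s]

V = np.zeros(5)   # value estimates for 5 states
V[3] = 1.0        # suppose state 3 predicts future reward
print(td_error(V, s=2, s_next=3, r=0.0))  # positive: moved to a better state
print(td_error(V, s=3, s_next=4, r=0.0))  # negative: moved to a worse state
```
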
10
Q

What are the implications of state changes in the ACE-ASE model?

A

whenever a positive prediction error is encountered, the action just chosen has improved the prospects for future reward, and should be chosen more often in that state

11
Q

What happens when there is a negative prediction error?

A

The opposite is true for negative prediction errors, which signal that the action should be chosen less often in the future.

12
Q

What do prediction errors allow the agent to do?

A

Thus the agent can learn an explicit policy π(S,a) = p(a|S), a probability distribution over the available actions at each state, by applying the following learning rule at every timestep

13
Q

Write the equation for the ACE-ASE model

A

π_new(S,a) = π_old(S,a) + η_π · δ_t

where η_π is the policy learning rate and δ_t is the temporal difference prediction error defined above

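A sketch of this update in Python, assuming a tabular actor. To keep π a valid probability distribution, the sketch stores action preferences and derives probabilities with a softmax, a common implementation choice; the card's rule nominally increments π(S,a) directly.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2
ETA_PI = 0.1  # policy learning rate (assumed value)
prefs = np.zeros((N_STATES, N_ACTIONS))  # actor's action preferences

def policy(s):
    """pi(S, a) = p(a | S), obtained as a softmax over preferences."""
    e = np.exp(prefs[s] - prefs[s].max())
    return e / e.sum()

def actor_update(s, a, delta):
    """Preference analogue of pi_new(S,a) = pi_old(S,a) + eta_pi * delta,
    applied to the action actually taken at state s."""
    prefs[s, a] += ETA_PI * delta

actor_update(s=2, a=1, delta=0.5)  # positive error: action 1 reinforced
print(policy(2))                   # probability of action 1 now > 0.5
```
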
14
Q

What does the critic use in the actor/critic model?

A

a Critic module uses TD learning to estimate state values V(S) from experience with the environment, and the same TD prediction error is also used to train the Actor module, which maintains and learns a policy π

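Putting the two modules together, here is a self-contained sketch of a tabular Actor/Critic agent on a toy chain environment; the environment, constants, and names are all illustrative assumptions, not the original ACE-ASE implementation.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2
GAMMA, ETA_V, ETA_PI = 0.9, 0.1, 0.1  # assumed constants

V = np.zeros(N_STATES)                   # Critic: state-value estimates
prefs = np.zeros((N_STATES, N_ACTIONS))  # Actor: action preferences

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left;
    reward 1 arrives only on reaching the last state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, r

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        a = rng.choice(N_ACTIONS, p=softmax(prefs[s]))
        s_next, r = step(s, a)
        delta = r + GAMMA * V[s_next] - V[s]  # TD prediction error
        V[s] += ETA_V * delta                 # Critic: value update
        prefs[s, a] += ETA_PI * delta         # Actor: same error trains policy
        s = s_next

print(V.round(2))  # values rise toward the rewarding end of the chain
```

Note how the single TD prediction error δ drives both updates: the Critic refines V(S), and the Actor uses the same signal to adjust its policy.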
15
Q

What has the actor/critic model been related to?

A

to policy improvement methods in dynamic programming (Sutton, 1988) and to the policy gradient methods of Williams (1992)

16
Q

What is a specific example of the use of actor/critic models?

A

Sutton et al. (2000) have shown that in some cases the Actor/Critic can be construed as a gradient-ascent algorithm for learning a parameterized policy, which converges to a local maximum (see also Dayan & Abbott, 2001)

17
Q

What are the limitations of actor/critic models?

A

in the general case Actor/Critic methods are not guaranteed to converge on an optimal behavioral policy (cf. Baird, 1995; Konda & Tsitsiklis, 2003)

18
Q

Biological plausibility of actor/critic models

A

some of the strongest links between RL methods and neurobiological data regarding animal and human decision making have been related to the Actor/Critic framework

19
Q

What have actor/critic models been used to study in animals?

A

Actor/Critic methods have been extensively linked to instrumental action selection and Pavlovian prediction learning in the basal ganglia (e.g., Barto, 1995; Houk et al., 1995; Joel et al., 2002)