C8 Flashcards
what is hierarchical reinforcement learning?
reinforcement learning in which the granularity of the abstractions is coarser than the primitive actions of the environment (taking a train instead of individual steps)
advantages of hierarchical methods
- simplify problems through abstraction: the agent creates subgoals and solves these smaller tasks first; sequences of primitive actions are abstracted into macro actions that solve the subtasks
- increased sample efficiency: subpolicies are learned to solve subtasks, reducing the number of environment interactions needed. Learned subtasks can be transferred to other problems
- policies become more general, and are able to adapt to changes in the environment more easily
- the higher level of abstraction allows agents to solve larger, more complex problems
disadvantages of hierarchical methods
- Many assume that domain knowledge is available to subdivide the environment so that hierarchical RL can be applied
- algorithmic complexity: identify subgoals, learn subpolicies etc.
- macros are combinations of actions, and the number of such combinations is exponential in their length, so introducing macro actions increases the computational complexity of planning and learning
- a behavioral policy that includes macro actions may be worse than a policy consisting only of primitive actions, because macros can skip over shorter routes that the primitive actions would have found
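The exponential growth of the macro space mentioned above is easy to make concrete. A minimal sketch (the action count and lengths are hypothetical numbers for illustration):

```python
def num_macros(num_actions: int, length: int) -> int:
    """Count the distinct action sequences (macros) of a given length:
    |A|**L grows exponentially in the macro length L."""
    return num_actions ** length

# e.g. with 4 primitive actions (up/down/left/right):
print([num_macros(4, L) for L in (1, 2, 5, 10)])
# → [4, 16, 1024, 1048576]
```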
what is the options framework?
Whenever the agent reaches a state that is a subgoal, then, apart from following a primitive action (as in the main policy), it can follow an option policy: a macro action consisting of a separate subpolicy specially aimed at satisfying the subgoal in one large step. In this way macros are incorporated into the reinforcement learning framework.
what is an option?
a temporally extended group of actions with a termination condition. An option takes in environment observations and outputs actions until its termination condition is met.
what are the three elements of an option o?
- initiation set I_o: the states from which the option can be started
- subpolicy π_o(a|s): internal to this particular option
- termination condition β_o(s): tells us whether o terminates in state s
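The three elements can be sketched as a small data structure plus an execution loop. A minimal illustration, not a library API: the names, the integer state type, and the deterministic boolean termination (the framework actually allows a termination probability) are all simplifying assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int   # assumption for illustration, e.g. position on a line
Action = int

@dataclass
class Option:
    initiation_set: Set[State]             # I_o: states the option can start from
    subpolicy: Callable[[State], Action]   # pi_o: action choice inside the option
    termination: Callable[[State], bool]   # beta_o: True if the option ends in s

def run_option(option: Option, state: State,
               step: Callable[[State, Action], State]) -> State:
    """Follow the option's subpolicy until its termination condition is met.
    `step` is the environment transition function (also an assumption here)."""
    assert state in option.initiation_set
    while not option.termination(state):
        state = step(state, option.subpolicy(state))
    return state

# usage: an option that walks right (+1) until reaching the subgoal state 3
walk_right = Option(initiation_set={0, 1, 2},
                    subpolicy=lambda s: 1,
                    termination=lambda s: s >= 3)
print(run_option(walk_right, 0, lambda s, a: s + a))  # → 3
```

From the main policy's point of view, calling `run_option` is a single (macro) action, even though it may take many primitive steps.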
what are macros?
any group of actions, possibly open-ended
what is intrinsic motivation?
An inner drive to explore, named so to contrast it with classic extrinsic motivation (the conventional RL reward signal from the environment). Intrinsic motivation is typically implemented as an additional reward signal, for example for achieving subgoals or for visiting novel states.
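One common way to realize this is to add an intrinsic bonus on top of the environment reward. A minimal sketch, assuming a count-based novelty bonus and weight (both illustrative choices, not from a specific method):

```python
from collections import Counter

visit_counts = Counter()  # how often each state has been seen so far

def total_reward(state, extrinsic_reward: float, beta: float = 0.1) -> float:
    """Combine the environment (extrinsic) reward with a count-based
    novelty bonus: rarely visited states yield a larger intrinsic reward."""
    visit_counts[state] += 1
    intrinsic = beta / (visit_counts[state] ** 0.5)
    return extrinsic_reward + intrinsic
```

Revisiting the same state makes the bonus shrink, so the agent is nudged toward states it has not seen before, even when the extrinsic reward is sparse.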
How do multi agent and hierarchical reinforcement learning fit together?
agents often work together in teams or other hierarchical structures
what is so special about Montezuma's Revenge?
it is a difficult game for RL to learn, because it has a sparse and delayed reward signal: there are long stretches in which the agent has to walk without the reward changing.