W8 Hierarchical RL Flashcards
1) Why can hierarchical reinforcement learning (HRL) be faster?
2) Why can hierarchical reinforcement learning be slower?
1) HRL simplifies problems through abstraction; temporal abstraction (acting in larger steps) increases sample efficiency.
2) Domain knowledge needs to be available (e.g., to choose subgoals), the extra algorithmic complexity costs time, and macro-actions must be defined or learned.
Why may hierarchical reinforcement learning give an answer of lesser quality?
The macro-actions may skip over shorter routes that primitive actions would have found.
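A toy illustration of how a hierarchy can produce a worse route. It assumes a hypothetical open 5x5 grid with no obstacles, where the shortest primitive-action path is simply the Manhattan distance, and a hierarchical agent that may only travel through a hand-picked subgoal ("doorway"). The grid, coordinates and helper names are all made up for this sketch.

```python
# Hypothetical open grid with no obstacles: the shortest primitive-action path
# between two cells equals their Manhattan distance.

def manhattan(a, b):
    """Number of primitive moves (up/down/left/right) between two cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def primitive_path_length(start, goal):
    # A flat agent using primitive actions can walk straight to the goal.
    return manhattan(start, goal)

def macro_path_length(start, goal, subgoals):
    # A hierarchical agent restricted to macro-actions between fixed subgoals:
    # each macro walks from one waypoint to the next.
    waypoints = [start, *subgoals, goal]
    return sum(manhattan(p, q) for p, q in zip(waypoints, waypoints[1:]))

start, goal = (0, 0), (0, 4)
doorway = (4, 2)  # hypothetical hand-picked subgoal
print(primitive_path_length(start, goal))         # 4
print(macro_path_length(start, goal, [doorway]))  # 12
```

The macro route needs 12 primitive steps where 4 suffice: the hierarchy never "sees" the direct path because it only plans between subgoals.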
Is hierarchical reinforcement learning more general or less general?
More general
Subtasks reduce brittleness caused by overspecialized policies. Policies become more general and can adapt to changes in the environment more easily.
What is the options framework?
Whenever a subgoal state is reached, the agent can, besides taking a primitive action from the main policy, follow an option policy: a macro-action with its own subpolicy, aimed specifically at satisfying the subgoal in one large step. In this way, macros are incorporated into the reinforcement learning framework.
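One standard way to treat a whole option as "one large step" inside value-based RL is SMDP Q-learning. If option o starts in s, runs for k primitive steps collecting rewards r_1..r_k and stops in s', the update is Q(s,o) += alpha * (sum_i gamma^(i-1) r_i + gamma^k * max_c Q(s',c) - Q(s,o)). The sketch below is a minimal version of that rule, not necessarily the exact method the flashcards refer to; the state numbers, the option name `to_subgoal` and the values of `alpha` and `gamma` are hypothetical.

```python
from collections import defaultdict

def smdp_q_update(Q, s, o, rewards, s_next, choices_next, alpha=0.1, gamma=0.9):
    """One SMDP Q-learning update after choice `o` ran for len(rewards) steps.

    rewards:      primitive rewards collected while `o` was executing.
    choices_next: primitive actions and options available in s_next
                  (empty if s_next is terminal).
    A primitive action is just the special case len(rewards) == 1.
    """
    k = len(rewards)
    # Discounted reward accumulated during the option's execution.
    g = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Bootstrap from the best choice where the option terminated, discounted by gamma^k.
    best_next = max((Q[(s_next, c)] for c in choices_next), default=0.0)
    target = g + gamma ** k * best_next
    Q[(s, o)] += alpha * (target - Q[(s, o)])
    return Q[(s, o)]

# Hypothetical example: the option ran 4 steps from state 1 to subgoal state 5.
Q = defaultdict(float)
new_value = smdp_q_update(Q, s=1, o="to_subgoal", rewards=[0, 0, 0, 1], s_next=5,
                          choices_next=["left", "right", "to_subgoal"])
print(round(new_value, 4))  # 0.0729
```

From the main policy's point of view the four primitive steps collapse into a single decision, which is how options and ordinary actions can share one Q-table.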
What is an option?
An option is a group of actions with a termination condition.
Options take in environment observations and output actions until the termination condition is met.
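The observation-in, action-out loop of an option can be sketched as follows. This is a minimal sketch, assuming a deterministic environment given as a plain `env_step(state, action)` function and a `max_steps` safety cap; both names, and the corridor used in the example, are hypothetical and not part of any library.

```python
from typing import Callable, List, Tuple

def run_option(obs: int,
               policy: Callable[[int], int],
               terminates: Callable[[int], bool],
               env_step: Callable[[int, int], int],
               max_steps: int = 100) -> Tuple[int, List[int]]:
    """Feed observations to the option's subpolicy and emit actions until the
    termination condition holds, or until the hypothetical safety cap is hit."""
    actions = []
    for _ in range(max_steps):
        action = policy(obs)          # subpolicy maps observation -> action
        actions.append(action)
        obs = env_step(obs, action)   # environment returns the next observation
        if terminates(obs):           # termination condition checked after each step
            break
    return obs, actions

def corridor_step(s: int, a: int) -> int:
    """Hypothetical corridor with states 0..10; +1 moves right, -1 moves left."""
    return min(max(s + a, 0), 10)

final_obs, actions = run_option(obs=2, policy=lambda s: +1,
                                terminates=lambda s: s == 5,
                                env_step=corridor_step)
print(final_obs, actions)  # 5 [1, 1, 1]
```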
What are the three elements that an option consists of?
The initiation set $I \subseteq S$: the states that the option can start from
The subpolicy $\pi : S \times A \to [0, 1]$, internal to this particular option
The termination condition $\beta : S \to [0, 1]$, which tells us whether $\omega$ terminates in $s$
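The three elements map directly onto a small data structure. The sketch below is one possible encoding, assuming hashable states and actions, a policy that returns an explicit action distribution pi(. | s), and a termination function returning beta(s); the class and method names and the corridor example are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class Option:
    """An option as the triple (I, pi, beta)."""
    initiation_set: FrozenSet[State]                # I, a subset of S
    policy: Callable[[State], Dict[Action, float]]  # pi(. | s): action probabilities
    termination: Callable[[State], float]           # beta(s): probability of stopping in s

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

    def sample_action(self, s: State, rng: random.Random) -> Action:
        dist = self.policy(s)
        actions = list(dist)
        return rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]

    def should_terminate(self, s: State, rng: random.Random) -> bool:
        # Stop with probability beta(s); rng.random() lies in [0, 1).
        return rng.random() < self.termination(s)


# Hypothetical example: from corridor states 0-4, mostly step right (+1),
# and stop for certain once state 5 is reached.
go_right = Option(
    initiation_set=frozenset(range(5)),
    policy=lambda s: {+1: 0.9, -1: 0.1},
    termination=lambda s: 1.0 if s == 5 else 0.0,
)
rng = random.Random(0)
print(go_right.can_start(2), go_right.can_start(7))  # True False
print(go_right.should_terminate(5, rng))             # True
```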
What is a macro?
Macros are combinations of primitive actions, and their use can greatly improve the performance of the policy.
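In its simplest reading, a macro is a fixed, open-loop sequence of primitive actions that the agent selects as a single decision. The sketch assumes a hypothetical unbounded grid with moves `U`, `D`, `L`, `R` and a made-up macro `RRU`; unlike an option, this macro has no subpolicy or termination condition of its own, it simply replays its actions.

```python
from typing import Sequence, Tuple

Pos = Tuple[int, int]
# Hypothetical primitive actions on an unbounded grid.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def step(pos: Pos, action: str) -> Pos:
    """Apply one primitive action."""
    dx, dy = MOVES[action]
    return (pos[0] + dx, pos[1] + dy)

def run_macro(pos: Pos, macro: Sequence[str]) -> Pos:
    """Execute a macro open-loop: every primitive action in order, no re-planning."""
    for action in macro:
        pos = step(pos, action)
    return pos

# Hypothetical macro: two cells right, then one up, chosen as a single decision.
RRU = ("R", "R", "U")
print(run_macro((0, 0), RRU))  # (2, 1)
```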
What is intrinsic motivation?
An inner drive to explore.
Named so to contrast it with classic extrinsic motivation (the conventional RL reward signal).
Often related to model curiosity.
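One simple, well-known form of intrinsic motivation is a count-based novelty bonus: the agent receives extra reward for visiting states it has rarely seen, here scale / sqrt(N(s)). This is only one example; curiosity-driven methods instead reward prediction error of a learned model. The class name, `scale` parameter and state labels below are hypothetical.

```python
import math
from collections import Counter

class CountBonus:
    """Count-based intrinsic reward: rarely visited states earn a larger bonus."""

    def __init__(self, scale: float = 1.0):
        self.scale = scale
        self.visits = Counter()  # N(s): how often each state has been seen

    def reward(self, state, extrinsic: float) -> float:
        self.visits[state] += 1
        intrinsic = self.scale / math.sqrt(self.visits[state])
        # Total reward = conventional (extrinsic) reward + exploration drive.
        return extrinsic + intrinsic

bonus = CountBonus(scale=1.0)
print(bonus.reward("room_1", extrinsic=0.0))            # 1.0 (first visit)
print(round(bonus.reward("room_1", extrinsic=0.0), 3))  # 0.707 (second visit, smaller bonus)
print(bonus.reward("room_2", extrinsic=0.0))            # 1.0 (new state, full bonus)
```

Even when the extrinsic reward is zero everywhere, the bonus keeps pushing the agent toward unvisited states.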
What are the three elements of an option $\omega$?
initiation set $I_\omega$: the states that the option can start from
subpolicy $\pi_\omega(a \mid s)$: internal to this particular option
termination condition $\beta_\omega(s)$: tells us whether $\omega$ terminates in $s$
How do multi agent and hierarchical reinforcement learning fit together?
Agents often work together in teams or other hierarchical structures.
What is so special about Montezuma's Revenge?
The reward signal is very sparse: the agent has to take long sequences of actions (e.g., walking through rooms) without the reward changing.
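Why sparse rewards hurt can be quantified in a toy setting. The sketch assumes a hypothetical corridor where the only reward sits at the far end and the agent explores by a uniform random walk (left/right with probability 1/2, bumping into the left wall keeps it in place); it computes exactly, by dynamic programming over state probabilities, the chance of ever seeing that reward within a fixed horizon.

```python
def prob_reach_goal(length: int, horizon: int) -> float:
    """Probability that a uniformly random walker starting at state 0 reaches
    state `length` (the only rewarding state) within `horizon` steps."""
    dist = [0.0] * (length + 1)
    dist[0] = 1.0
    reached = 0.0
    for _ in range(horizon):
        new = [0.0] * (length + 1)
        # The goal is absorbing: its mass was already added to `reached`.
        for s, p in enumerate(dist[:length]):
            if p == 0.0:
                continue
            new[max(s - 1, 0)] += 0.5 * p  # step left (wall at 0)
            new[s + 1] += 0.5 * p          # step right
        reached += new[length]
        new[length] = 0.0
        dist = new
    return reached

# Longer stretches without reward make it much rarer for random exploration
# to ever observe a nonzero reward within the same budget of steps.
for length in (5, 10, 20):
    print(length, round(prob_reach_goal(length, horizon=100), 3))
```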