C8 Flashcards
what is hierarchical reinforcement learning?
reinforcement learning in which the granularity of the abstractions is coarser than the primitive actions of the environment (taking a train instead of individual steps)
advantages of hierarchical methods
- simplify problems through abstraction: the agent creates subgoals and solves these smaller tasks first; sequences of primitive actions are abstracted into macro actions that solve the subtasks
- increased sample efficiency: subpolicies are learned to solve subtasks, reducing the number of environment interactions needed. Learned subtasks can be transferred to other problems
- policies become more general, and are able to adapt to changes in the environment more easily
- the higher level of abstraction allows agents to solve larger, more complex problems
disadvantages of hierarchical methods
- Many assume that domain knowledge is available to subdivide the environment so that hierarchical RL can be applied
- algorithmic complexity: identify subgoals, learn subpolicies etc.
- macros are combinations of actions, and the number of such combinations is exponential in their length, so introducing macro actions increases the computational complexity of planning and learning
- a behavioral policy that includes macro actions may be worse than a policy consisting only of primitive actions, because macros can skip over shorter routes that the primitive actions would have found
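The exponential growth of the macro space mentioned above is easy to make concrete. A minimal sketch (the action count and lengths are hypothetical numbers for illustration):

```python
def num_macros(num_actions: int, length: int) -> int:
    """Count the distinct action sequences (macros) of a given length:
    |A|**L grows exponentially in the macro length L."""
    return num_actions ** length

# e.g. with 4 primitive actions (up/down/left/right):
print([num_macros(4, L) for L in (1, 2, 5, 10)])
# → [4, 16, 1024, 1048576]
```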
what is the options framework?
Whenever the agent reaches a state that is a subgoal, then, apart from following a primitive action (as in the main policy), it can follow an option policy: a macro action consisting of a separate subpolicy specially aimed at satisfying the subgoal in one large step. In this way macros are incorporated into the reinforcement learning framework.
what is an option?
a temporally extended group of actions with a termination condition. An option takes in environment observations and outputs actions until its termination condition is met.
what are the three elements of an option o?
- initiation set I_o: the states from which the option can be started
- subpolicy π_o(a|s): internal to this particular option
- termination condition β_o(s): tells us whether o terminates in state s
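The three elements can be sketched as a small data structure plus an execution loop. A minimal illustration, not a library API: the names, the integer state type, and the deterministic boolean termination (the framework actually allows a termination probability) are all simplifying assumptions:

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int   # assumption for illustration, e.g. position on a line
Action = int

@dataclass
class Option:
    initiation_set: Set[State]             # I_o: states the option can start from
    subpolicy: Callable[[State], Action]   # pi_o: action choice inside the option
    termination: Callable[[State], bool]   # beta_o: True if the option ends in s

def run_option(option: Option, state: State,
               step: Callable[[State, Action], State]) -> State:
    """Follow the option's subpolicy until its termination condition is met.
    `step` is the environment transition function (also an assumption here)."""
    assert state in option.initiation_set
    while not option.termination(state):
        state = step(state, option.subpolicy(state))
    return state

# usage: an option that walks right (+1) until reaching the subgoal state 3
walk_right = Option(initiation_set={0, 1, 2},
                    subpolicy=lambda s: 1,
                    termination=lambda s: s >= 3)
print(run_option(walk_right, 0, lambda s, a: s + a))  # → 3
```

From the main policy's point of view, calling `run_option` is a single (macro) action, even though it may take many primitive steps.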
what are macros?
any group of actions, possibly open-ended
what is intrinsic motivation?
An inner drive to explore, named so to contrast it with classic extrinsic motivation (the conventional RL reward signal from the environment). Intrinsic motivation is typically implemented as an additional reward signal, for example for achieving subgoals or for visiting novel states.
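One common way to realize this is to add an intrinsic bonus on top of the environment reward. A minimal sketch, assuming a count-based novelty bonus and weight (both illustrative choices, not from a specific method):

```python
from collections import Counter

visit_counts = Counter()  # how often each state has been seen so far

def total_reward(state, extrinsic_reward: float, beta: float = 0.1) -> float:
    """Combine the environment (extrinsic) reward with a count-based
    novelty bonus: rarely visited states yield a larger intrinsic reward."""
    visit_counts[state] += 1
    intrinsic = beta / (visit_counts[state] ** 0.5)
    return extrinsic_reward + intrinsic
```

Revisiting the same state makes the bonus shrink, so the agent is nudged toward states it has not seen before, even when the extrinsic reward is sparse.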
How do multi agent and hierarchical reinforcement learning fit together?
agents often work together in teams or other hierarchical structures
what is so special about Montezuma's Revenge?
it is a difficult game for RL to learn, because it has a sparse and delayed reward signal: there are long stretches in which the agent has to walk without the reward changing.