Reinforcement Learning Flashcards
What is the formal definition of an agent?
An entity that has a set of sensors to observe the state of its environment, and a set of actions it can perform to alter the state
What is the task of an agent?
To learn a control strategy (policy) for choosing actions to achieve its goals
How do we provide reinforcement to an agent?
By giving it a positive score for actions that move it towards the goal, and a negative score for actions that move it away from the goal
Why can sparse reward spaces result in a failure of reinforcement learning?
- Number of steps needed to reach a reward is too high
- Finding rewards by random choice of actions is computationally inefficient
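A rough illustration of the first point, assuming a hypothetical one-dimensional corridor where the only reward sits in the last cell and the agent steps left or right at random. The names steps_to_reward and mean_steps, and the corridor itself, are assumptions made for this sketch and are not part of the cards.

```python
import random

def steps_to_reward(length, rng):
    """Count random-walk steps from cell 0 until the rewarded cell `length` is reached."""
    position, steps = 0, 0
    while position < length:
        move = rng.choice((-1, 1))          # no reward signal, so the choice is blind
        position = max(0, position + move)  # wall at cell 0
        steps += 1
    return steps

def mean_steps(length, trials=200, seed=0):
    """Average number of steps over several seeded trials."""
    rng = random.Random(seed)
    return sum(steps_to_reward(length, rng) for _ in range(trials)) / trials
```

Comparing mean_steps(4) with mean_steps(16) shows how quickly the cost grows as the reward moves further away.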
What is reward shaping?
Manually designing a reward function.
This guides the policy to the final goal
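A minimal sketch of the idea, assuming a hypothetical corridor with the goal at cell 10. The constant GOAL, the 0.1 progress weight and both function names are illustrative choices, not a prescribed design.

```python
GOAL = 10  # hypothetical goal cell in a 1-D corridor

def sparse_reward(position):
    """Reward only on reaching the goal; every other cell gives no signal."""
    return 1.0 if position == GOAL else 0.0

def shaped_reward(old_position, new_position):
    """Hand-designed reward: goal bonus plus a small signal for progress towards the goal."""
    progress = abs(GOAL - old_position) - abs(GOAL - new_position)
    return sparse_reward(new_position) + 0.1 * progress
```

With the shaped version, a step towards the goal scores slightly positive and a step away scores slightly negative, so the agent gets feedback long before it reaches the goal.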
What are some drawbacks of reward shaping?
- Ad-hoc process depending on environment
- New reward function for every problem
- Agent may learn to maximise reward without achieving goal
What is the law of unintended consequences?
- Unexpected benefit
- Unexpected drawbacks
- Perverse result
What should an agent learn to do?
It should learn to choose actions that maximise the reward gained from those actions
What is a utility based agent?
It learns a utility function on states and uses it to select actions that maximise the expected outcome utility
What do utility based agents need?
A model of the environment
What is Q-Learning?
An agent that learns an action-utility function (Q-function) giving the expected utility of taking a given action in a given state
What don’t Q-Learning agents need?
A model of the environment
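A minimal sketch of a tabular Q-Learning step, using the standard update Q(s, a) += alpha * (r + gamma * max Q(s', a') - Q(s, a)). The values of ALPHA and GAMMA, the state numbers and the action names are assumptions made for illustration.

```python
from collections import defaultdict

ALPHA = 0.5  # learning rate (assumed value)
GAMMA = 0.9  # discount factor (assumed value)

def q_update(q, state, action, reward, next_state, actions):
    """Apply one Q-Learning update from a single observed transition."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q[(state, action)] += ALPHA * (target - q[(state, action)])

def greedy_action(q, state, actions):
    """Pick the action with the highest learned Q-value; no environment model is consulted."""
    return max(actions, key=lambda a: q[(state, a)])

q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
q_update(q, state=0, action="right", reward=1.0, next_state=1, actions=actions)
```

The update only uses transitions the agent has actually experienced, which is why no model of the environment is needed.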
What is a reflex agent?
An agent that learns a policy which maps directly from states to actions
How do utility based agents work out which action is most efficient?
With a utility function:
- Maps the state that results from each action to a number
- This number represents how efficiently that action achieves the goal
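A small sketch of this selection process. The MODEL and UTILITY tables, the state names and the action names are all hypothetical, and the model is deterministic for simplicity, so the expected utility of an action is just the utility of the one state it leads to.

```python
# Hypothetical environment model: (state, action) -> resulting state.
MODEL = {
    ("hall", "go_kitchen"): "kitchen",
    ("hall", "go_dock"): "dock",
    ("hall", "wait"): "hall",
}

# Hypothetical utility function: state -> number.
UTILITY = {"hall": 0.0, "kitchen": 2.0, "dock": 5.0}

def best_action(state):
    """Score the state each action leads to and return the highest-scoring action."""
    options = {a: UTILITY[MODEL[(s, a)]] for (s, a) in MODEL if s == state}
    return max(options, key=options.get)
```

Here best_action("hall") looks up where each action leads, scores those states, and picks the action whose outcome has the highest utility. The lookup through MODEL is the part that requires a model of the environment.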
How do reflex agents work?
- Selects actions based on its current perception of the environment
- Past experience not considered
- Only one possibility is acted on
This is called a condition-action rule:
IF battery low THEN charge
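A minimal sketch of a reflex agent built from condition-action rules. Only the battery rule comes from the card above; the percept keys, the obstacle rule and the default action are hypothetical additions for illustration.

```python
def reflex_agent(percept):
    """Map the current percept directly to an action using condition-action rules."""
    if percept["battery_low"]:      # IF battery low THEN charge
        return "charge"
    if percept["obstacle_ahead"]:   # hypothetical extra rule
        return "turn"
    return "move_forward"           # hypothetical default action
```

The function looks only at the current percept and keeps no history, and the first matching rule is the only one acted on.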