Reinforcement Learning Flashcards
What is the formal definition of an agent?
An entity that has a set of sensors to observe the state of its environment, and a set of actions it can perform to alter the state
What is the task of an agent?
To learn a control strategy (policy) for choosing actions to achieve its goals
How do we provide reinforcement to an agent?
By giving it a reward signal: a positive score for actions that move it towards the goal, and a negative score for actions that move it away from the goal
Why can sparse rewards cause reinforcement learning to fail?
- The number of steps needed before any reward is received is too high
- Exploring by random choice rarely reaches a reward, so it is computationally inefficient
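Here is a minimal sketch of why sparse rewards hurt. It assumes a hypothetical 1-D corridor where the only reward sits at position `goal` and the agent moves uniformly at random. The average number of steps before the first reward grows roughly with the square of the distance, so a random agent gets very little learning signal.

```python
import random

def steps_to_reward(goal: int, rng: random.Random, max_steps: int = 100_000) -> int:
    """Random agent on a hypothetical 1-D corridor: starts at 0, reward only at `goal`.

    Each step moves left or right with equal probability; position 0 is a wall.
    Returns the number of steps taken before the single reward is seen.
    """
    pos = 0
    for step in range(1, max_steps + 1):
        pos = max(0, pos + rng.choice((-1, 1)))
        if pos == goal:
            return step
    return max_steps

def mean_steps(goal: int, trials: int = 200, seed: int = 0) -> float:
    # Average over many random episodes with a fixed seed for repeatability.
    rng = random.Random(seed)
    return sum(steps_to_reward(goal, rng) for _ in range(trials)) / trials

for goal in (5, 10, 20):
    print(goal, mean_steps(goal))
```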
What is reward shaping?
Manually designing a reward function that gives intermediate rewards for progress.
This guides the policy towards the final goal even when the goal itself is rarely reached
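A small sketch of the idea, assuming a hypothetical grid world with the goal at `(4, 4)`. The sparse reward only pays at the goal; the hand-designed (shaped) version adds a small bonus for reducing the Manhattan distance to the goal. The distance heuristic and the 0.1 weight are illustrative assumptions, which is exactly the kind of environment-specific choice shaping requires.

```python
from typing import Tuple

State = Tuple[int, int]  # (row, col) on a hypothetical grid

GOAL: State = (4, 4)

def sparse_reward(state: State) -> float:
    # Only the goal itself is rewarded; every other state gives nothing.
    return 1.0 if state == GOAL else 0.0

def shaped_reward(state: State, next_state: State) -> float:
    """Sparse goal reward plus a hand-designed bonus for getting closer.

    The bonus is the reduction in Manhattan distance to GOAL (negative if the
    agent moves away), an assumed heuristic specific to this grid world.
    """
    def dist(s: State) -> int:
        return abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1])
    progress = dist(state) - dist(next_state)
    return sparse_reward(next_state) + 0.1 * progress

print(shaped_reward((0, 0), (0, 1)), shaped_reward((0, 1), (0, 0)))
```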
What are some drawbacks of reward shaping?
- Ad-hoc process that depends on the environment
- A new reward function is needed for every problem
- The agent may learn to maximise reward without achieving the goal
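The last drawback can be shown with a deliberately flawed, hypothetical shaped reward on a 1-D line: +1 for any step that gets closer, nothing for stepping away, and +10 for reaching the goal (which ends the episode). Over an assumed 100-step horizon, oscillating back and forth collects more reward than walking straight to the goal.

```python
def flawed_reward(dist_before: int, dist_after: int) -> float:
    # Hypothetical shaping mistake: reward progress, but never penalise regress.
    if dist_after == 0:
        return 10.0          # reaching the goal ends the episode
    return 1.0 if dist_after < dist_before else 0.0

def episode_return(policy: str, start_dist: int = 5, horizon: int = 100) -> float:
    """Total (undiscounted) reward collected on a 1-D line towards a goal.

    'direct'    : always step towards the goal.
    'oscillate' : alternate towards/away, never reaching the goal.
    """
    dist, total = start_dist, 0.0
    for t in range(horizon):
        if policy == "direct":
            step = -1
        else:
            step = -1 if t % 2 == 0 else 1
        new_dist = dist + step
        total += flawed_reward(dist, new_dist)
        dist = new_dist
        if dist == 0:
            break
    return total

print(episode_return("direct"), episode_return("oscillate"))  # 14.0 50.0
```

A reward-maximising agent would prefer the oscillating behaviour, even though it never achieves the goal.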
What is the law of unintended consequences?
- Unexpected benefit
- Unexpected drawbacks
- Perverse result
What should an agent learn to do?
It should learn to choose actions that maximise the total (cumulative) reward it expects to receive over time, not just the immediate reward from a single action
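The quantity an agent maximises is usually the cumulative reward over the rest of an episode, often discounted so that later rewards count for less. This sketch assumes a discount factor `gamma` of 0.9 (a common convention, not something fixed by these notes) and shows that an action sequence with a delayed large reward can be worth more than one with an immediate small reward.

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards r_0 + gamma*r_1 + gamma^2*r_2 + ... for one episode."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# A greedy agent would prefer the immediate +1, but the delayed +10 is worth more.
greedy = [1.0, 0.0, 0.0]
patient = [0.0, 0.0, 10.0]
print(discounted_return(greedy), discounted_return(patient))
```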
What is a utility based agent?
It learns a utility function on states and uses it to select actions that maximise the expected utility of the outcome
What do utility based agents need?
A model of the environment
What is Q-Learning?
An agent that learns an action-utility function, giving the expected utility of taking a given action in a given state
What don’t Q-Learning agents need?
A model of the environment
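The standard tabular Q-learning update can be sketched as below. It only uses an observed transition `(s, a, r, s_next)`, which is why no model of the environment is needed. The learning rate `alpha`, discount `gamma`, and the state/action names are assumed for illustration, and terminal states are not handled in this sketch.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]

def q_update(Q: QTable, s, a, r: float, s_next, actions: Iterable,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))

    Only the observed transition is used; no transition model is required.
    """
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Unseen (state, action) pairs default to a Q-value of 0.0.
Q: QTable = defaultdict(float)
q_update(Q, s="A", a="right", r=1.0, s_next="B", actions=["left", "right"])
print(Q[("A", "right")])
```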
What is a reflex agent?
An agent that learns a policy which maps directly from states to actions
How do utility based agents work out which action is most efficient?
With a utility function:
- Map the state reached after each possible action to a number
- This number represents how well that action moves the agent towards the goal
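A minimal sketch of a utility-based choice, assuming a hypothetical transition model that lists the possible next states and their probabilities for each (state, action) pair. The agent scores each action by the expected utility of where it leads and picks the best one; needing `model` at all is why utility-based agents require a model of the environment.

```python
from typing import Dict, List, Tuple

# Hypothetical model: (state, action) -> list of (next_state, probability)
Model = Dict[Tuple[str, str], List[Tuple[str, float]]]

def expected_utility(model: Model, U: Dict[str, float], s: str, a: str) -> float:
    # Weight the utility of each possible next state by its probability.
    return sum(p * U[s_next] for s_next, p in model[(s, a)])

def choose_action(model: Model, U: Dict[str, float], s: str, actions: List[str]) -> str:
    # Pick the action whose outcome has the highest expected utility.
    return max(actions, key=lambda a: expected_utility(model, U, s, a))

model: Model = {
    ("start", "safe"):  [("mid", 1.0)],
    ("start", "risky"): [("goal", 0.5), ("pit", 0.5)],
}
U = {"mid": 0.6, "goal": 1.0, "pit": -1.0}
print(choose_action(model, U, "start", ["safe", "risky"]))  # → safe
```

Here "safe" has expected utility 0.6, while "risky" averages 0.5 × 1.0 + 0.5 × (−1.0) = 0.0.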
How do reflex agents work?
- Select actions based only on the current perception of the environment
- Past experience is not considered
- Each condition triggers a single fixed action; alternatives are not weighed up
This is called a condition-action rule:
IF battery low THEN charge
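Condition-action rules like the one above can be sketched as an ordered list of (condition, action) pairs. The percept fields, the 0.2 battery threshold, the "clean" rule and the "wander" default are hypothetical. The agent looks only at the current percept and fires the first rule that matches.

```python
from typing import Callable, Dict, List, Tuple

Percept = Dict[str, float]
Rule = Tuple[Callable[[Percept], bool], str]

# Hypothetical condition-action rules for a cleaning robot, checked in order.
RULES: List[Rule] = [
    (lambda p: p["battery"] < 0.2, "charge"),   # IF battery low THEN charge
    (lambda p: p["dirt"] > 0.0, "clean"),
]

def reflex_agent(percept: Percept, rules: List[Rule] = RULES, default: str = "wander") -> str:
    """Return the action of the first rule whose condition matches.

    Only the current percept is used: no memory of past percepts and no
    look-ahead over alternative actions.
    """
    for condition, action in rules:
        if condition(percept):
            return action
    return default

print(reflex_agent({"battery": 0.1, "dirt": 1.0}))  # → charge
```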
What are some key features of reinforcement learning?
- Delayed reward
- Exploration vs exploitation
- Partially observable states
- Life-long learning
What is delayed reward?
Feedback may only arrive after the agent has executed a whole sequence of actions, so it must work out which earlier actions deserve credit for the reward
What is exploration vs exploitation?
A trade-off between exploring the search space to find potentially better actions and exploiting actions already known to give reward
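One common way to balance the trade-off is epsilon-greedy action selection, sketched below. With probability `epsilon` the agent explores by picking a random action; otherwise it exploits the action with the highest current value estimate. The action names, value estimates and `epsilon = 0.1` are assumptions for illustration.

```python
import random
from typing import Dict, List

def epsilon_greedy(q_values: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """With probability epsilon explore (random action); otherwise exploit
    (the action with the highest current value estimate)."""
    actions: List[str] = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.__getitem__)

rng = random.Random(42)
q = {"left": 0.2, "right": 0.8}
picks = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1000)]
print(picks.count("right") / len(picks))  # mostly "right", occasionally "left"
```

Larger `epsilon` means more exploration; a common variant decays it over time as the value estimates become more reliable.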
What is a partially observable state?
Sensors may only provide partial information
Actions may aim at improving observability
What is life-long learning?
The ability of an agent to use experience from previous tasks to guide its learning on new ones
What is the utility of a state?
The expected total (possibly discounted) reward the agent receives from that state onwards
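Since utility is an expected total reward, one way to estimate it from experience is to average the returns actually observed after visiting each state (a first-visit Monte Carlo estimate, swapped in here for illustration). Episodes are assumed to be lists of `(state, reward)` pairs, where the reward is the one received on that step; the states and rewards in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Episode = List[Tuple[str, float]]  # (state, reward received on that step)

def mc_utility(episodes: List[Episode], gamma: float = 1.0) -> Dict[str, float]:
    """Estimate U(s) as the average return observed from the first visit to s.

    Return = total (optionally discounted) reward from that state onwards.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for episode in episodes:
        G = 0.0
        returns: List[Tuple[str, float]] = []
        # Walk backwards so each G is the reward from that step onwards.
        for state, reward in reversed(episode):
            G = reward + gamma * G
            returns.append((state, G))
        seen = set()
        for state, G_s in reversed(returns):  # back to forward time order
            if state not in seen:             # count first visits only
                seen.add(state)
                totals[state] += G_s
                counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}
```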