DL-10 - Reinforcement learning Flashcards
DL-10 - Reinforcement learning
What type of data do you have in supervised learning?
Pairs of (x, y).
What type of data do you have in unsupervised learning?
Only x, no label.
What type of data do you have in reinforcement learning?
State-action pairs.
What is the goal of supervised learning?
Learning a mapping from x -> y.
What is the goal of unsupervised learning?
Learn an underlying structure in the data.
What is the goal of reinforcement learning?
Maximizing future reward over many time steps.
How do children learn from interactions?
By receiving positive/negative rewards that they learn from. (See image)
What is reinforcement learning about?
Learning in a dynamic environment, where the learner (the model) can decide what actions to try.
What is a model called in reinforcement learning?
They are typically called agents.
What is the meta-model of reinforcement learning?
- Take actions that affect the environment.
- Observe the changes to the environment.
(See image)
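The two steps above form a loop: act, then observe. A minimal Python sketch of that loop follows. It is only an illustration: the `Corridor` environment, its `reset` and `step` methods, and `run_episode` are hypothetical names invented for this example, not part of the deck or of any library.

```python
class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.goal = length - 1
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the corridor
        self.state = min(max(self.state + action, 0), self.goal)
        done = self.state == self.goal  # reaching the goal is the terminal state
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the list of rewards the agent observed."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                  # the agent acts on the environment
        state, reward, done = env.step(action)  # it observes the new state and reward
        rewards.append(reward)
        if done:
            break
    return rewards


# A fixed policy that always moves right.
rewards = run_episode(Corridor(), policy=lambda state: +1)
print(rewards)  # [0.0, 0.0, 0.0, 1.0]
```

The policy here is a fixed rule; a learning agent would update it from the observed rewards.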
What is an environment?
The dynamic and interactive context in which an agent learns and takes actions.
What is an episode?
A sequence of states, actions, and rewards that ends in a terminal state.
What is the formula for total reward?
(See image)
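The image is not reproduced here. A standard way to write the total reward (the return) from time step $t$, assuming $r_i$ denotes the reward received at step $i$, is:

```latex
R_t = \sum_{i=t}^{\infty} r_i = r_t + r_{t+1} + r_{t+2} + \dots
```

The notation on the original card may differ.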
What formula is this? (See image)
Total reward.
What formula is this? (See image)
Discounted reward.
What is the formula for discounted reward?
(See image)
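The image is not reproduced here. Under one common convention, assumed for this sketch, a reward that arrives k steps in the future is weighted by gamma^k, where the discount factor gamma lies between 0 and 1: R_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... The indexing on the original card may differ. The function name `discounted_return` is hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting the reward k steps ahead by gamma**k."""
    total = 0.0
    # Work backwards so each step applies: total = reward + gamma * (future total)
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=1.0))  # 3.0, the undiscounted total reward
print(discounted_return(rewards, gamma=0.5))  # 1.75 = 1 + 0.5 + 0.25
```

With gamma = 1 the result reduces to the plain total reward; smaller values make the agent care more about near-term rewards.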
How does the agent affect the environment?
Through its actions.
What does the agent observe from the environment? (2)
- State changes
- Rewards
What does the Q-function do?
It captures the expected total future reward for an action a_t taken in a given state s_t.
What is the name of the function that does the following?
"It captures the expected rewards for an action a_t taken in a given state s_t."
It's named the "Q-function".
What's the formula for the Q-function?
(See image)
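The image is not reproduced here. A standard form, assuming $R_t$ denotes the total future reward from time step $t$, is the expectation of that reward given the current state and action:

```latex
Q(s_t, a_t) = \mathbb{E}\left[ R_t \mid s_t, a_t \right]
```

The notation on the original card may differ.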
What is a policy?
A function, π(s), that the agent uses to infer the best action to take in state s.
What is π(s)?
The policy function: it evaluates the state s to decide on the best action to take.
What's the name of the function that evaluates a state s to decide on the best action to take?
It's called the policy function, written π(s).
What is the RL strategy?
The RL strategy is that the policy chooses an action that maximizes future reward.
What is the formula for the RL strategy?
(See image)
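The image is not reproduced here. If future reward is measured by a Q-function $Q(s, a)$, the strategy of choosing the action that maximizes future reward is commonly written as:

```latex
\pi^{*}(s) = \arg\max_{a} Q(s, a)
```

Here $\pi^{*}$ denotes the resulting (optimal) policy; the notation on the original card may differ.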
What formula is this? (See image)
The RL strategy.
What are the major classes of RL algorithms?
- Value learning
- Policy learning
How does value learning work?
(See image)
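The image is not reproduced here. One common description of value learning, assumed for this sketch, is: estimate Q(s, a) for each available action, then take the action with the highest estimated value. The Q-values below are made-up numbers for a single state, and `greedy_action` is a hypothetical helper name; in practice the values would be learned from experience.

```python
# Hypothetical Q-value estimates for the actions available in one state.
q_values = {"left": 0.2, "right": 0.7, "stay": 0.1}


def greedy_action(q_values):
    """Return the action whose estimated Q-value is highest."""
    return max(q_values, key=q_values.get)


action = greedy_action(q_values)
print(action)  # right
```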
What type of RL algorithm is this?
Value learning