Lecture 13 Flashcards
What is reinforcement learning?
A learning framework where an agent interacts with an environment and learns from rewards.
How does RL differ from supervised learning?
RL learns from interaction with the environment, while supervised learning relies on labeled data.
What is the key objective of an RL agent?
To learn a policy that maximizes the expected cumulative (discounted) future reward.
What is a policy in RL?
A function that maps states to actions (or, if stochastic, to probability distributions over actions).
What is a reward in RL?
A numerical value indicating the immediate benefit of taking an action.
What is the environment in RL?
The external system with which the agent interacts.
What is a Markov Decision Process (MDP)?
A mathematical framework for modeling decision-making in RL.
What are the key components of an MDP?
States, actions, transition probabilities, rewards, and (typically) a discount factor.
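A minimal sketch of these components as plain Python data, assuming a hypothetical two-state toy MDP (the states, actions, and probabilities here are illustrative, not from the lecture):

    # transitions[state][action] -> list of (probability, next_state, reward)
    transitions = {
        "s0": {
            "stay": [(1.0, "s0", 0.0)],
            "move": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
        },
        "s1": {
            "stay": [(1.0, "s1", 0.0)],
            "move": [(1.0, "s0", 0.0)],
        },
    }
    states = list(transitions)
    actions = ["stay", "move"]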
What is the assumption of the Markov property?
The next state depends only on the current state and action, not on past states.
What is an episodic RL task?
A task where learning consists of repeated episodes that end in a terminal state.
What is a continuing (sometimes called continuous) RL task?
A task where interaction with the environment is ongoing, with no terminal state to end an episode.
What is the exploration-exploitation tradeoff?
Balancing between exploring new actions and exploiting known good actions.
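Epsilon-greedy action selection is one standard way to manage this tradeoff; a minimal sketch (the epsilon value is an illustrative choice):

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        # With probability epsilon, explore: pick a uniformly random action.
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        # Otherwise exploit: pick the action with the highest estimated value.
        return max(range(len(q_values)), key=lambda a: q_values[a])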
What is a value function in RL?
A function that estimates the expected cumulative reward from a state.
What is the difference between state-value and action-value functions?
The state-value function V(s) estimates the expected return from a state, while the action-value function Q(s, a) estimates the expected return from taking a specific action in a state.
What is the Bellman equation?
A recursive equation that expresses the value of a state in terms of the expected immediate reward and the discounted values of successor states.
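In standard textbook notation (assumed here, not taken verbatim from the lecture), the Bellman equation for the value of state $s$ under policy $\pi$ is

    V^\pi(s) = \sum_a \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \big[ R(s, a, s') + \gamma V^\pi(s') \big]

where $\gamma \in [0, 1)$ is the discount factor.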
What is temporal difference (TD) learning?
A method that updates value estimates toward a target built from the observed reward plus the current estimate of the next state's value (bootstrapping), rather than waiting for the final outcome.
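For example, the TD(0) update (standard notation, assumed rather than lecture-specific) moves the estimate toward a bootstrapped target:

    V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big]

Here $\alpha$ is the learning rate and the bracketed term is the TD error.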
What is Q-learning?
An off-policy TD algorithm that learns the optimal action-value function by iteratively updating Q-values from experience.
What is the role of the Q-table in Q-learning?
It stores the estimated action values (expected returns) for every state-action pair.
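A minimal tabular Q-learning sketch, reusing the hypothetical toy MDP defined earlier (transitions, actions); the hyperparameters alpha, gamma, and epsilon are illustrative choices, not values from the lecture:

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
    Q = defaultdict(float)                  # Q[(state, action)] -> estimated return

    def step(state, action):
        # Sample (next_state, reward) from the toy MDP's transition table.
        r, cumulative = random.random(), 0.0
        for prob, nxt, reward in transitions[state][action]:
            cumulative += prob
            if r <= cumulative:
                return nxt, reward
        return nxt, reward  # floating-point fallback: last outcome

    state = "s0"
    for _ in range(10_000):
        # Epsilon-greedy action selection from the current Q-table.
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # Off-policy update: bootstrap from the best action in the next state.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state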
What is deep Q-learning?
A variation of Q-learning that uses a neural network to approximate the Q-function.
What is policy gradient learning?
A method where the policy is optimized directly by gradient ascent on the expected return.
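A minimal REINFORCE sketch on a hypothetical two-armed bandit (the arm payout probabilities 0.2 and 0.8 are made up for illustration); the policy is a softmax over per-action preferences:

    import math, random

    theta = [0.0, 0.0]  # one preference parameter per action
    lr = 0.1            # illustrative learning rate

    def softmax(prefs):
        exps = [math.exp(p - max(prefs)) for p in prefs]
        return [e / sum(exps) for e in exps]

    for _ in range(5_000):
        probs = softmax(theta)
        action = random.choices([0, 1], weights=probs)[0]
        reward = 1.0 if random.random() < (0.8 if action == 1 else 0.2) else 0.0
        # Gradient ascent on expected reward:
        # d/d theta[k] of log pi(action) = 1[k == action] - probs[k]
        for k in range(2):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log_pi

    print(softmax(theta))  # probability mass should concentrate on arm 1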
What is actor-critic learning?
A hybrid method combining value-based and policy-based approaches: an actor (the policy) selects actions while a critic (a value function) evaluates them and guides the actor's updates.
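One common one-step actor-critic formulation (standard notation, assumed here): the critic's TD error drives both updates,

    \delta_t = r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t)
    w \leftarrow w + \alpha_w \, \delta_t \, \nabla_w V_w(s_t)
    \theta \leftarrow \theta + \alpha_\theta \, \delta_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)

where $w$ parameterizes the critic's value function and $\theta$ the actor's policy.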
What is reward shaping?
A technique where additional rewards are provided to guide learning.
What is the credit assignment problem in RL?
Determining which past actions contributed to an observed reward.
What is the sparse-reward problem in RL?
A setting where rewards arrive infrequently, which makes learning difficult.
What is an example of reinforcement learning in real-world applications?
Game playing (e.g., AlphaGo, DeepMind’s Atari games).
What is the difference between model-free and model-based RL?
Model-free learns from experience without a model, while model-based plans using an environment model.
What is the purpose of experience replay in deep RL?
To store past transitions and sample them randomly during training, which breaks correlations between consecutive samples and improves stability.
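A minimal replay buffer sketch (the capacity and interface are illustrative, not a specific library's API):

    import random
    from collections import deque

    class ReplayBuffer:
        def __init__(self, capacity=10_000):
            self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

        def push(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Uniform random minibatch: decorrelates consecutive transitions
            # and lets each stored experience be reused many times.
            return random.sample(list(self.buffer), batch_size)

        def __len__(self):
            return len(self.buffer)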
What is an advantage of reinforcement learning?
It can learn complex behaviors without explicit supervision.
What is a limitation of reinforcement learning?
It can be very sample inefficient, often requiring large amounts of interaction data.