Lecture 1 Flashcards
What is the main objective of a deep reinforcement learning agent?
To learn a sequential decision-making task from experience of interacting with an environment, in order to achieve specific goals
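As an illustration (not from the lecture notes): a minimal agent-environment interaction loop using the Gymnasium API, with a random policy standing in for a learned one.

```python
import gymnasium as gym

# Minimal interaction loop: the agent acts, the environment returns an
# observation and a reward, and the agent's objective is to maximize the
# cumulative reward over the episode.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"Episode return: {total_reward}")
```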
Transitions in reinforcement learning
- Usually stochastic, because the next state in the environment may depend on various factors beyond the agent's control
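A toy illustration of stochastic transitions (made-up numbers, not from the lecture): even for a fixed state-action pair, the next state is drawn from a probability distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP with 3 states and 2 actions: P[s, a, s'] is the probability of
# landing in state s' after taking action a in state s.
P = np.array([
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],
    [[0.0, 0.5, 0.5], [0.1, 0.1, 0.8]],
])

s, a = 0, 1
# The same (s, a) can lead to different next states on different trials.
next_state = rng.choice(3, p=P[s, a])
print(next_state)
```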
Observations (ω) and actions (a)
- May be high-dimensional
- Observations may not provide full knowledge of the underlying state, because the agent only receives partial information about the environment (ω ≠ x)
Experience
- May be constrained: no access to an accurate simulator, and/or limited data
What is the ‘reality gap’ in reinforcement learning, and why is it a challenge?
- The mismatch between the environment the agent experiences in a simulator and the real-world environment
- It is a challenge because, if the simulator is not accurate enough, policies learned in simulation may not perform well in the real world
Why might an agent have limited access to data in reinforcement learning?
- Safety constraints: Real-world exploration can be risky in fields like robotics and healthcare, limiting data collection.
- Compute constraints: High computational costs may limit the ability to run simulations in some environments.
- Exogenous factors: In areas like weather prediction or financial markets, data is inherently limited due to reliance on external conditions beyond control.
How can the reality gap and limited data challenges be addressed in reinforcement learning?
- Develop an accurate simulator: A more accurate simulator reduces the reality gap, enabling better policy transfer to real-world environments.
- Design the learning algorithm for generalization: Algorithms should be designed to improve generalization, allowing agents to perform well in unseen states or environments, even with limited training data.
What does generalization refer to in reinforcement learning?
- The capacity to achieve good performance in an environment where limited data has been gathered.
- The capacity to obtain good performance in a related but different environment by transferring learned knowledge.
How can an agent achieve generalization with limited data?
- Regularization: Prevents overfitting to specific training scenarios.
- Experience replay: Reuses past experiences during training (see the sketch after this list).
- Exploration strategies: Ensures the agent gathers diverse experiences, improving its ability to generalize to unseen states.
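To make the experience-replay and exploration points above concrete, here is a minimal sketch (an illustrative implementation, not the lecture's code) of a uniform replay buffer and an ε-greedy action selector.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions so they can be reused during training."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(list(self.buffer), batch_size)


def epsilon_greedy(q_values, epsilon):
    """With probability epsilon explore a random action, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```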
What is transfer learning in reinforcement learning, and why is it important?
- Transfer learning in reinforcement learning involves training an agent in one environment and adapting it to a related but different environment.
- It is important because it allows the agent to reuse knowledge, improving its performance in new tasks.
What are common methods for transfer learning in reinforcement learning?
- Fine-tuning: Retrains the agent in the new environment while retaining previously learned knowledge (see the sketch after this list).
- Multi-task learning: Trains the agent on multiple tasks simultaneously to encourage it to learn generalizable features.
- Domain adaptation: Modifies the agent’s policy or input representation to handle differences between the source and target environments.
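As an example of the fine-tuning approach above, a minimal PyTorch sketch (the network architecture and sizes are hypothetical): the feature-extraction layers pretrained on the source environment are frozen, and only the final layer is updated on the target environment.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical policy network pretrained on the source environment:
# a small MLP whose final layer outputs action logits.
policy = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),   # action logits
)
# ... assume `policy` has already been trained on the source environment ...

# Freeze the pretrained feature-extraction layers.
for layer in list(policy)[:-1]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the final layer is updated when training on target-environment data.
optimizer = optim.Adam(policy[-1].parameters(), lr=1e-4)
```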
What is a supervised learning algorithm?
- Maps a dataset of training examples into a predictive model
- Assumes the examples are drawn independently and identically distributed (i.i.d.) from, and are representative of, the true underlying data distribution
What are bias and variance in supervised learning?
- Bias: How much the model’s average prediction (over different training sets) differs from the true function. High bias leads to underfitting, where the model is too simple to capture the underlying patterns.
- Variance: How much the model’s predictions vary across different subsets of the training data. High variance leads to overfitting, where the model fits the noise in the data.
What is the ideal model in terms of bias and variance?
- low bias and low variance.
- The goal is to balance bias and variance by tuning model complexity and using techniques like regularization and data augmentation.
How can variance (overfitting) be reduced in supervised learning?
Increasing the size of the dataset can help reduce variance by improving the model’s generalization to new data.
What is bias-variance decomposition?
Bias-variance decomposition describes how the total error of a model can be broken down into:
- Bias: The error due to incorrect assumptions in the model.
- Variance: The error due to sensitivity to the specific training dataset used (also called parametric variance; the source of overfitting).
- Irreducible error: The inherent noise in the output that no model can eliminate (sometimes called internal variance).
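For the squared (L2) loss this decomposition can be written explicitly (standard textbook form; the lecture’s notation may differ). With y = f(x) + ε, noise variance σ², and \hat{f}_D the model fit on a random training set D:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```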
What does the bias-variance decomposition highlight in reinforcement learning?
highlights a tradeoff between:
- Bias: Error directly introduced by the learning algorithm, leading to underfitting.
- Parametric variance: Error due to the limited amount of data available, leading to overfitting.
Why is direct bias-variance decomposition less straightforward for loss functions other than L2 loss in reinforcement learning?
- Overfitting arises from the sensitivity of the model’s predictions to variations in the training data.
- For loss functions other than L2, the error does not split cleanly into a bias term plus a statistical variance term, because the loss applies a non-linear transformation to the prediction errors.
How can prediction error be decomposed when using non-L2 loss functions?
prediction error can be decomposed into:
- A bias-related term representing the lack of expressivity of the model.
- A variance-related term representing the sensitivity to the limited amount of data.
Replacement for the bias-variance tradeoff in reinforcement learning
tradeoff between:
- A sufficiently rich learning algorithm to reduce model bias.
- A learning algorithm that is not too complex, to avoid overfitting to the limited amount of data.
What is a batch or offline reinforcement learning algorithm?
- Mapping a dataset D into a policy π_D
- This can be done independently of whether the policy is derived from a model-based or model-free approach.
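A toy sketch of this mapping (illustrative only; tabular, with made-up details): a fixed dataset of transitions is repeatedly swept to fit Q-values, and the resulting greedy policy is π_D.

```python
import numpy as np

def batch_q_learning(dataset, n_states, n_actions, gamma=0.99, n_sweeps=200):
    """Toy batch/offline RL: map a fixed dataset D of transitions
    (s, a, r, s_next, done) into a policy pi_D, with no further
    interaction with the environment."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_sweeps):
        # Bootstrapped targets r + gamma * max_a' Q(s_next, a'),
        # computed only from transitions present in the dataset.
        targets = {}
        for (s, a, r, s_next, done) in dataset:
            target = r + (0.0 if done else gamma * Q[s_next].max())
            targets.setdefault((s, a), []).append(target)
        Q_new = Q.copy()
        for (s, a), ts in targets.items():
            Q_new[s, a] = np.mean(ts)
        Q = Q_new
    return Q.argmax(axis=1)  # pi_D: greedy policy w.r.t. the learned Q

# Usage on a tiny hand-made dataset:
D = [(0, 1, 1.0, 1, False), (1, 0, 0.0, 0, False), (1, 1, 5.0, 2, True)]
pi_D = batch_q_learning(D, n_states=3, n_actions=2)
```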
How can the suboptimality of the expected return in an MDP be decomposed?
- Asymptotic bias: Bias in the policy due to limitations of the learning algorithm, even when given an infinite amount of data.
- Overfitting error: Error due to the finite size of the dataset, leading to overfitting.
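One common way to write this decomposition (a sketch; the notation may differ from the lecture): let π* be an optimal policy, π_D the policy learned from the finite dataset D, and π_{D_∞} the policy the same algorithm would learn with unlimited data. Then

```latex
\underbrace{\mathbb{E}_{D}\bigl[V^{\pi^*}(s) - V^{\pi_D}(s)\bigr]}_{\text{suboptimality}}
  = \underbrace{\bigl(V^{\pi^*}(s) - V^{\pi_{D_\infty}}(s)\bigr)}_{\text{asymptotic bias}}
  + \underbrace{\mathbb{E}_{D}\bigl[V^{\pi_{D_\infty}}(s) - V^{\pi_D}(s)\bigr]}_{\text{overfitting error}}
```

The first term does not vanish with more data, while the second term shrinks as the dataset grows.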
What is the bias-overfitting tradeoff in reinforcement learning?
- On one side of the scale is the amount of data: the share of the error due to overfitting decreases as more data becomes available.
- On the other side is policy class complexity: increasing the complexity of the policy class reduces asymptotic bias but increases the risk of overfitting when data is limited.
How can the best policy be obtained in reinforcement learning?
The best policy can be obtained by balancing bias and overfitting through:
- Using a sufficiently expressive policy class to reduce bias.
- Ensuring the dataset is large and diverse enough to prevent overfitting.