Lecture 1 Flashcards

1
Q

What is the main objective of a deep reinforcement learning agent?

A

to learn a sequential decision-making task from experience in an environment to achieve specific goals

2
Q

transitions in reinforcement learning

A

usually stochastic

  • because the next state in the environment may depend on various factors beyond the agent’s control
3
Q

observations (w) and actions (a)

A
  • may be high-dimensional
  • observations may not provide full knowledge of the underlying state because the agent only receives partial information about the environment (w ≠ x)
4
Q

experience

A
  • may be constrained
  1. no access to an accurate simulator
  2. limited data
5
Q

What is the ‘reality gap’ in reinforcement learning, and why is it a challenge?

A
  • refers to the difference between the agent’s experience in a simulator and the real-world environment
  • is a challenge because if the simulator is not accurate enough, the policies learned in simulation may not perform well in the real world
6
Q

Why might an agent have limited access to data in reinforcement learning?

A
  1. Safety constraints: Real-world exploration can be risky in fields like robotics and healthcare, limiting data collection.
  2. Compute constraints: High computational costs may limit the ability to run simulations in some environments.
  3. Exogenous factors: In areas like weather prediction or financial markets, data is inherently limited due to reliance on external conditions beyond control.
7
Q

How can the reality gap and limited data challenges be addressed in reinforcement learning?

A
  1. Develop an accurate simulator: A more accurate simulator reduces the reality gap, enabling better policy transfer to real-world environments.
  2. Design the learning algorithm for generalization: Algorithms should be designed to improve generalization, allowing agents to perform well in unseen states or environments, even with limited training data.
8
Q

What does generalization refer to in reinforcement learning?

A
  1. The capacity to achieve good performance in an environment where limited data has been gathered.
  2. The capacity to obtain good performance in a related but different environment by transferring learned knowledge.
9
Q

How can an agent achieve generalization with limited data?

A
  1. Regularization: Prevents overfitting to specific training scenarios.
  2. Experience replay: Reuses past experiences during training.
  3. Exploration strategies: Ensures the agent gathers diverse experiences, improving its ability to generalize to unseen states.
10
Q

What is transfer learning in reinforcement learning, and why is it important?

A
  • Transfer learning in reinforcement learning involves training an agent in one environment and adapting it to a related but different environment.
  • It is important because it allows the agent to reuse knowledge, improving its performance in new tasks.
11
Q

What are common methods for transfer learning in reinforcement learning?

A
  1. Fine-tuning: Retrains the agent on a new environment while retaining previously learned knowledge.
  2. Multi-task learning: Trains the agent on multiple tasks simultaneously to encourage it to learn generalizable features.
  3. Domain adaptation: Modifies the agent’s policy or input representation to handle differences between the source and target environments.
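
A minimal fine-tuning sketch in PyTorch (the toy network, layer sizes, and attribute names are illustrative assumptions, not from the lecture):

  import torch.nn as nn
  from torch.optim import Adam

  # Toy policy network: a shared feature extractor plus a task-specific head.
  class Policy(nn.Module):
      def __init__(self, obs_dim=8, n_actions=4):
          super().__init__()
          self.features = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
          self.head = nn.Linear(64, n_actions)

      def forward(self, obs):
          return self.head(self.features(obs))

  policy = Policy()  # pretend this was already trained on the source environment

  # Fine-tuning on the target environment: freeze the shared features, retrain only the head.
  for p in policy.features.parameters():
      p.requires_grad = False
  optimizer = Adam(policy.head.parameters(), lr=1e-4)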
12
Q

What is a supervised learning algorithm?

A
  • maps a dataset of learning examples into a predictive model
  • assuming the dataset is independently and identically distributed (i.i.d.) and representative of the true underlying data distribution.
13
Q

What are bias and variance in supervised learning?

A
  1. Bias: Represents how much the model’s predictions differ from the true function. High bias leads to underfitting, where the model is too simple to capture patterns.
  2. Variance: Represents how much the model’s predictions vary when using different subsets of the training data. High variance leads to overfitting, where the model fits the noise in the data.
14
Q

What is the ideal model in terms of bias and variance?

A
  • low bias and low variance.
  • The goal is to balance bias and variance by tuning model complexity and using techniques like regularization and data augmentation.
15
Q

How can variance (overfitting) be reduced in supervised learning?

A

Increasing the size of the dataset can help reduce variance by improving the model’s generalization to new data.

16
Q

What is bias-variance decomposition?

A

Bias-variance decomposition describes how the total error of a model can be broken down into:

  1. Bias: The error due to incorrect assumptions in the model.
  2. Variance: The error due to sensitivity to the specific training data set used. (parametric variance = overfitting)
  3. Irreducible error: The inherent noise in the output that no model can eliminate. (internal variance)
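
For the L2 (squared) loss this can be written out explicitly; a minimal worked form, assuming a true function f, a model f̂_D trained on dataset D, and output noise with variance σ² (notation assumed, not from the lecture):

  E[ (y − f̂_D(x))² ] = bias(x)² + variance(x) + σ²
  where  bias(x)     = E_D[ f̂_D(x) ] − f(x)
         variance(x) = E_D[ ( f̂_D(x) − E_D[ f̂_D(x) ] )² ]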
17
Q

What does the bias-variance decomposition highlight in reinforcement learning?

A

highlights a tradeoff between:

  1. Bias: Error directly introduced by the learning algorithm, leading to underfitting.
  2. Parametric variance: Error due to the limited amount of data available, leading to overfitting.
18
Q

Why is direct bias-variance decomposition less straightforward for loss functions other than L2 loss in reinforcement learning?

A
  • Overfitting arises from the sensitivity of the model’s predictions to variations in the training data, which manifests as variance in the loss function.
  • With non-L2 loss functions, variance in the loss may not directly correspond to statistical variance due to the non-linear transformation introduced by the loss function.
19
Q

How can prediction error be decomposed when using non-L2 loss functions?

A

prediction error can be decomposed into:

  1. A bias-related term representing the lack of expressivity of the model.
  2. A variance-related term representing the sensitivity to the limited amount of data.
20
Q

Replacement for the bias-variance tradeoff in reinforcement learning

A

tradeoff between:

  1. A sufficiently rich learning algorithm to reduce model bias.
  2. A learning algorithm that is not too complex, to avoid overfitting to the limited amount of data.
21
Q

What is a batch or offline reinforcement learning algorithm?

A
  • Mapping a dataset D into a policy π_D
  • This can be done independently of whether the policy is derived from a model-based or model-free approach.
22
Q

How can the suboptimality of the expected return in an MDP be decomposed?

A
  1. Asymptotic bias: Bias in the policy due to limitations of the learning algorithm, even when given an infinite amount of data.
  2. Overfitting error: Error due to the finite size of the dataset, leading to overfitting.
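
A sketch of how this decomposition is often written, assuming V^π(s) denotes the expected return of policy π from state s, π* the optimal policy, π_{D∞} the policy the algorithm would learn from unlimited data, and π_D the policy learned from the finite dataset D (notation assumed, not from the lecture):

  V^{π*}(s) − E_D[ V^{π_D}(s) ]
    = ( V^{π*}(s) − V^{π_{D∞}}(s) )        ← asymptotic bias
    + E_D[ V^{π_{D∞}}(s) − V^{π_D}(s) ]    ← overfitting error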
23
Q

What is the bias-overfitting tradeoff in reinforcement learning?

A
  1. On one side of the scale is the amount of data: the share of error due to overfitting decreases as more data becomes available.
  2. On the other side is policy class complexity: a richer policy class reduces asymptotic bias but increases the risk of overfitting when data is limited.
24
Q

How can the best policy be obtained in reinforcement learning?

A

The best policy can be obtained by balancing bias and overfitting through:

  1. Using a sufficiently expressive policy class to reduce bias.
  2. Ensuring the dataset is large and diverse enough to prevent overfitting.
25
Q

What are the key strategies for improving generalization in reinforcement learning?

A
  1. Using an abstract representation: Create a compact and informative state representation by discarding irrelevant or redundant features.
  2. Optimizing the objective function: Properly shape the reward function and tune key parameters (e.g., reward shaping, tuning the discount factor 𝛾).
  3. Choosing the appropriate learning algorithm: Select the right type of function approximator (e.g., neural networks) and decide between model-based and model-free approaches.
  4. Improving the dataset: Enhance the diversity of the dataset using better exploration strategies to improve generalization and address the exploration/exploitation dilemma.
26
Q

How does using an abstract representation help in reinforcement learning?

A

reduces overfitting and improves generalization by focusing only on essential features, avoiding spurious correlations caused by irrelevant features

27
Q

Why is the appropriate level of abstraction important in the bias-overfitting tradeoff?

A
  • helps balance bias and overfitting
  • A small but rich abstract representation allows for improved generalization by focusing on essential information and reducing unnecessary complexity.
28
Q

What are the potential issues of including too many irrelevant features in the state representation?

A
  1. Overfitting: The RL algorithm may pick up spurious correlations.
  2. Increased variance: Adds complexity without meaningful improvement in performance.
29
Q

How does removing irrelevant or redundant features affect bias and variance?

A
  • Removing irrelevant or redundant features helps reduce overfitting (variance) but may introduce some bias.
  • This is because removing features that differentiate states with different roles in the dynamics can reduce the model’s expressivity.
30
Q

How can modifying the objective function improve the policy learned by a deep RL algorithm?

A
  • by introducing a bias that helps the agent generalize better
  • This involves optimizing an objective function that may diverge from the true objective but accelerates learning and improves performance in practice.
31
Q

What are the main approaches for modifying the objective function in reinforcement learning?

A
  1. Reward shaping:
    - Adds an auxiliary reward signal to help the agent learn faster.
    - Can accelerate learning but requires careful design to avoid misleading the agent with rewards that deviate too far from the true objective.
    - Reward normalization is important for deep RL.
  2. Tuning the discount factor 𝛾:
    - Determines how much the agent values future rewards relative to immediate rewards.
    - Proper tuning balances short-term and long-term objectives, improving the agent’s generalization ability across tasks.
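
For the reward-shaping item above, one common form is potential-based shaping (a sketch; the potential function Φ over states is a design choice and is not specified in the lecture):

  r'(s, a, s') = r(s, a, s') + 𝛾·Φ(s') − Φ(s)

Because the shaping terms telescope along a trajectory, this form leaves the optimal policy unchanged while providing denser feedback.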
32
Q

How does the discount factor 𝛾 influence reinforcement learning?

A
  1. Lower 𝛾: The agent focuses more on immediate rewards, which is useful for tasks where quick feedback is essential or long-term planning is less critical.
  2. Higher 𝛾: The agent prioritizes long-term cumulative rewards, which is important in environments where future rewards are more significant.
  • Proper tuning helps balance the tradeoff between short-term and long-term objectives.
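
A tiny numeric illustration of this effect (the reward sequence is made up; a large reward arrives only at the last step):

  # Effect of the discount factor on a delayed reward (illustrative numbers only).
  rewards = [1.0, 1.0, 1.0, 10.0]

  def discounted_return(rewards, gamma):
      return sum(gamma ** t * r for t, r in enumerate(rewards))

  print(discounted_return(rewards, gamma=0.5))    # ≈ 3.0   -> the delayed reward barely counts
  print(discounted_return(rewards, gamma=0.99))   # ≈ 12.67 -> the delayed reward dominates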
33
Q

What is the recommended approach to tuning the discount factor 𝛾 during training?

A
  • to start with a low value of 𝛾 and gradually tune it during training.
  • In the given case, a discount factor of 1 is used with a finite horizon at test time.
34
Q

What are the three main components that an RL agent may include?

A
  1. Value function representation: Predicts the expected return for each state or state-action pair, indicating how good a state or action is.
  2. Policy representation: Directly maps states to actions.
  3. Model of the environment: Predicts the next state and reward, which can be used for planning in model-based reinforcement learning.
35
Q

What is the difference between model-free and model-based reinforcement learning approaches?

A
  1. Model-free approaches: Directly learn the policy or value function without building a model of the environment.
  2. Model-based approaches: Use a learned model of the environment to plan actions, typically leading to more sample-efficient learning.
36
Q

What role do function approximators play in deep reinforcement learning?

A
  • Function approximators, such as neural networks, map inputs (states) to outputs (values or policies).
  • i.e., used to estimate value functions or policies in deep reinforcement learning.
  • They are crucial for enabling the agent to generalize and determine how features are treated at higher levels of abstraction.
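
A minimal sketch of such a function approximator in PyTorch (the layer sizes and the 4-dimensional state / 2-action setup are illustrative assumptions):

  import torch.nn as nn

  # Neural-network function approximator mapping a state vector to one Q-value per action.
  q_network = nn.Sequential(
      nn.Linear(4, 64),    # state features -> hidden representation
      nn.ReLU(),
      nn.Linear(64, 2),    # hidden representation -> one Q-value per action
  )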
37
Q

How does the choice of function approximator affect generalization in reinforcement learning?

A
  1. The choice of function approximator determines the level of abstraction introduced by deep learning, which depends on the network architecture and feature selection mechanisms (e.g., attention mechanisms).
  2. Depending on the task, choosing between model-free or model-based approaches is critical, as model-based methods are more sample-efficient but require accurate models, while model-free methods are often more robust in complex environments without a model.
38
Q

What is the parallel between RL algorithms and human cognition as described by Daniel Kahneman?

A
  1. System 1: Fast, instinctive, and automatic decision-making, similar to model-free RL, where decisions are based on learned patterns.
  2. System 2: Slow, deliberate, and logical decision-making, similar to model-based RL, where decisions involve planning based on a model of the environment.
39
Q

What is transfer learning in the context of reinforcement learning?

A
  • leveraging knowledge acquired in one task or environment to improve performance in a different but related task or environment
  • involves using a pre-trained policy or value function in a different setting that may share similar dynamics but differ in visual representation, reward structure, or specific states
40
Q

Why is exploration critical in reinforcement learning, and what is the main challenge associated with it?

A
  • Exploration is critical because the agent needs to gather sufficient information about the environment to find an optimal policy.
  • The main challenge is balancing exploration (trying new actions) and exploitation (choosing known best actions)
  • Too much exploitation can lead to suboptimal solutions.
  • Too much exploration can delay convergence to an optimal policy.
41
Q

What is undirected exploration in reinforcement learning?

A
  • methods that do not prioritize specific areas of the environment based on uncertainty or potential rewards
  • ex: epsilon-greedy
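
A minimal sketch of ε-greedy action selection (the function name and interface are illustrative):

  import random

  def epsilon_greedy(q_values, epsilon=0.1):
      # With probability epsilon, explore with a uniformly random action;
      # otherwise exploit the action with the highest estimated value.
      if random.random() < epsilon:
          return random.randrange(len(q_values))
      return max(range(len(q_values)), key=lambda a: q_values[a])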
42
Q

What is directed exploration in reinforcement learning, and when is it useful?

A
  • aims to explore areas with high novelty or uncertainty
  • useful in two settings:
  1. When rewards are not sparse, uncertainty in the value function can guide exploration.
  2. When rewards are sparse, explicit exploration rewards may be used.
  • ex: UCB, Thompson sampling
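
A minimal, bandit-style sketch of a UCB-type rule (the exploration constant c and the interface are illustrative):

  import math

  def ucb_action(q_values, counts, t, c=2.0):
      # Score = estimated value + uncertainty bonus; the bonus shrinks as an
      # action is tried more often, directing exploration toward rarely tried
      # (high-uncertainty) actions.
      scores = []
      for q, n in zip(q_values, counts):
          bonus = float("inf") if n == 0 else c * math.sqrt(math.log(t) / n)
          scores.append(q + bonus)
      return max(range(len(scores)), key=lambda a: scores[a])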
43
Q

What is the role of learning algorithms in reinforcement learning?

A
  • govern how the agent updates its function approximators using data from interactions with the environment.
  • Examples include Q-learning and policy gradients.
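
As an example of such an update rule, a minimal tabular Q-learning step (assumes Q is a dict mapping each state to a list of per-action values; hyperparameters are illustrative):

  def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
      td_target = r + gamma * max(Q[s_next])      # bootstrap from the best next action
      Q[s][a] += alpha * (td_target - Q[s][a])    # move the estimate toward the target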
44
Q

What is replay memory, and why is it important in reinforcement learning?

A

stores past experiences (state, action, reward, next state) to break temporal correlations in training data, thereby improving sample efficiency and stability during training.
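
A minimal replay-memory sketch (capacity and field names are illustrative):

  import random
  from collections import deque

  class ReplayMemory:
      def __init__(self, capacity=10000):
          self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped when full

      def push(self, state, action, reward, next_state):
          self.buffer.append((state, action, reward, next_state))

      def sample(self, batch_size):
          # Uniform random sampling breaks the temporal correlation of consecutive transitions.
          return random.sample(self.buffer, batch_size)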

45
Q

What are controllers in reinforcement learning?

A

Controllers handle meta-level operations such as training, validation, and testing phases, as well as hyperparameter tuning.

46
Q

How do policies influence an RL agent’s behavior?

A
  • define how the agent chooses actions, balancing exploration and exploitation
  • e.g., ε-greedy policy
47
Q

What role does the environment play in reinforcement learning?

A

The environment is where the agent interacts to gather experiences and learn optimal behavior through trial and error.