Lecture 13 Flashcards

1
Q

What is reinforcement learning?

A

A learning framework where an agent interacts with an environment and learns from rewards.
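The interaction loop can be sketched in a few lines. This is only an illustration under stated assumptions: `CoinFlipEnv`, `run_episode`, and the reward scheme are hypothetical names invented for this example, not something from the lecture.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess the outcome of a biased coin."""

    def __init__(self, p_heads=0.8, horizon=10, seed=0):
        self.p_heads = p_heads
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # a single dummy state

    def step(self, action):
        # action: 1 = guess heads, 0 = guess tails
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return 0, reward, done


def run_episode(env, policy):
    """Run one episode and return the total reward the agent collected."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)                   # agent acts
        state, reward, done = env.step(action)   # environment responds
        total += reward                          # reward is the learning signal
    return total
```

Calling `run_episode(CoinFlipEnv(), lambda s: 1)` runs a fixed "always guess heads" policy; a learning agent would adjust its policy based on the rewards it observes.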

2
Q

How does RL differ from supervised learning?

A

RL learns from interaction with the environment, while supervised learning relies on labeled data.

3
Q

What is the key objective of an RL agent?

A

To learn a policy that maximizes expected future rewards.
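"Expected future rewards" is commonly formalized as the discounted return. A minimal sketch, assuming a discount factor `gamma` (the card itself does not mention one) and a hypothetical helper name `discounted_return`:

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum over k of gamma**k * rewards[k], accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma` close to 1 the agent weighs distant rewards almost as much as immediate ones; with `gamma` close to 0 it becomes short-sighted.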

4
Q

What is a policy in RL?

A

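A mapping from states to actions, either deterministic (one action per state) or stochastic (a probability distribution over actions for each state).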

5
Q

What is a reward in RL?

A

A numerical value indicating the immediate benefit of taking an action.

6
Q

What is the environment in RL?

A

The external system with which the agent interacts.

7
Q

What is a Markov Decision Process (MDP)?

A

A mathematical framework for modeling sequential decision-making, in which outcomes depend partly on the agent's actions and partly on chance.

8
Q

What are the key components of an MDP?

A

States, actions, rewards, and transition probabilities, usually together with a discount factor.

9
Q

What is the assumption of the Markov property?

A

The next state depends only on the current state and action, not past states.
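Written as an equation, the Markov property says that conditioning on the full history adds nothing beyond the current state and action. The symbols $S_t$ and $A_t$ (state and action at time $t$) are notation assumed here, not taken from the card:

```latex
P(S_{t+1} = s' \mid S_t, A_t)
  = P(S_{t+1} = s' \mid S_0, A_0, S_1, A_1, \dots, S_t, A_t)
```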

10
Q

What is an episodic RL task?

A

A task where learning consists of repeated episodes that end in a terminal state.

11
Q

What is a continuous RL task?

A

A task where interaction with the environment goes on indefinitely with no terminal state (the standard term is a continuing task).

12
Q

What is the exploration-exploitation tradeoff?

A

Balancing between exploring new actions and exploiting known good actions.
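One common way to strike this balance is epsilon-greedy action selection. The card does not name a specific strategy, so treat this as one possible sketch; the function name `epsilon_greedy` and its parameters are chosen here for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the best-known one.

    q_values: list of estimated values, one entry per action.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # exploit: index of the highest estimated value (first one on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

A larger `epsilon` means more exploration; in practice it is often decayed over time so the agent exploits more as its estimates improve.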

13
Q

What is a value function in RL?

A

A function that estimates the expected cumulative (usually discounted) reward obtainable from a state when following a given policy.

14
Q

What is the difference between state-value and action-value functions?

A

The state-value function estimates the expected return from a state, while the action-value function estimates the expected return from taking a given action in a state.

15
Q

What is the Bellman equation?

A

A recursive equation that expresses the value of a state as the expected immediate reward plus the discounted value of the successor states.
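For reference, one standard form is the Bellman expectation equation for a policy $\pi$. The notation is assumed here rather than taken from the card: $P$ is the transition probability, $R$ the reward, and $\gamma$ the discount factor.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
```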

16
Q

What is temporal difference (TD) learning?

A

A method that updates value estimates from the observed reward plus the current estimate of the next state's value (bootstrapping), without waiting for the episode to end.
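The simplest instance is the TD(0) update for state values. A minimal sketch, assuming a tabular value function stored in a dict; the name `td0_update`, the step size `alpha`, and the discount `gamma` are illustrative choices:

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """Apply one TD(0) update to the value table V (a dict) in place.

    Returns the TD error: (reward + gamma * V[next_state]) - V[state].
    """
    # A terminal next state contributes no future value.
    next_value = 0.0 if done else V.get(next_state, 0.0)
    td_error = reward + gamma * next_value - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * td_error
    return td_error
```

The update moves `V[state]` a fraction `alpha` of the way toward the one-step target, which is itself built from another estimate.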

17
Q

What is Q-learning?

A

An off-policy RL algorithm that learns action values by iteratively updating Q-values.
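The iterative update can be sketched for the tabular case as follows. The function name `q_learning_update`, the dict-based table keyed by `(state, action)`, and the default `alpha` and `gamma` are assumptions made for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update to Q in place and return the new value.

    Q is a defaultdict(float) keyed by (state, action).
    """
    # Off-policy: the target uses the best next action, regardless of
    # which action the behavior policy will actually take next.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]
```

A typical call starts from `Q = defaultdict(float)` and applies the update after every environment step.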

18
Q

What is the role of the Q-table in Q-learning?

A

It stores the estimated action values (expected cumulative rewards) for state-action pairs.

19
Q

What is deep Q-learning?

A

A variation of Q-learning that uses a neural network to approximate the Q-function.

20
Q

What is policy gradient learning?

A

A method where policies are directly optimized using gradient ascent on expected rewards.
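One common form of the gradient is the REINFORCE estimator, shown here as an example rather than as the lecture's specific method. The notation is assumed: $\pi_\theta$ is the policy with parameters $\theta$, $G_t$ is the return from time $t$, and $\alpha$ is the step size.

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(A_t \mid S_t) \, G_t \right],
\qquad
\theta \leftarrow \theta + \alpha \, \nabla_\theta J(\theta)
```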

21
Q

What is actor-critic learning?

A

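A hybrid RL method combining policy-based and value-based approaches: the actor updates the policy, and the critic estimates a value function used to evaluate the actor's actions.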

22
Q

What is reward shaping?

A

A technique where additional rewards are provided to guide learning.

23
Q

What is the credit assignment problem in RL?

A

Determining which past actions contributed to an observed reward.

24
Q

What is sparse reward in RL?

A

When rewards are infrequent, making learning difficult.

25
Q

What is an example of reinforcement learning in real-world applications?

A

Game playing (e.g., DeepMind’s AlphaGo and its Atari-playing agents).

26
Q

What is the difference between model-free and model-based RL?

A

Model-free learns from experience without a model, while model-based plans using an environment model.

27
Q

What is the purpose of experience replay in deep RL?

A

To store past experiences and sample from them during training, which reduces the correlation between consecutive samples and improves stability.
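A replay buffer can be as simple as a bounded queue with uniform random sampling. This is a minimal sketch; the class name `ReplayBuffer`, its methods, and the default capacity are illustrative assumptions, not a specific library's API.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity=10000, seed=None):
        # deque with maxlen silently drops the oldest item when full
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sampling without replacement; batch_size must not exceed len(self).
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

During training, the agent pushes each transition as it occurs and periodically trains on a random batch from `sample`, so consecutive, highly correlated steps rarely appear in the same batch.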

28
Q

What is an advantage of reinforcement learning?

A

It can learn complex behaviors without explicit supervision.

29
Q

What is a limitation of reinforcement learning?

A

It is often sample inefficient, requiring large amounts of interaction with the environment.