DL-10 - Reinforcement learning Flashcards

1
Q

DL-10 - Reinforcement learning

What type of data do you have in supervised learning?

A

Pairs of (x, y).

2
Q

DL-10 - Reinforcement learning

What type of data do you have in unsupervised learning?

A

Only x, no label.

3
Q

DL-10 - Reinforcement learning

What type of data do you have in reinforcement learning?

A

State-action pairs.

4
Q

DL-10 - Reinforcement learning

What is the goal of supervised learning?

A

Learning a mapping from x -> y.

5
Q

DL-10 - Reinforcement learning

What is the goal of unsupervised learning?

A

Learn an underlying structure in the data.

6
Q

DL-10 - Reinforcement learning

What is the goal of reinforcement learning?

A

Maximizing future reward over many time steps.

7
Q

DL-10 - Reinforcement learning

How do children learn from interactions?

A

By receiving positive/negative rewards that they learn from. (See image)

8
Q

DL-10 - Reinforcement learning

What is reinforcement learning about?

A

Learning in a dynamic environment, where the learner/model can decide which actions to try.

9
Q

DL-10 - Reinforcement learning

What is a model called in reinforcement learning?

A

They are typically called agents.

10
Q

DL-10 - Reinforcement learning

What is the meta-model of reinforcement learning?

A
  • Take actions that affect the environment.
  • Observe the changes to the environment.

(See image)

11
Q

DL-10 - Reinforcement learning

What is an environment?

A

The dynamic and interactive context in which an agent learns and takes actions.

12
Q

DL-10 - Reinforcement learning

What is an episode?

A

A sequence of actions that ends in a terminal state.

13
Q

DL-10 - Reinforcement learning

What is the formula for total reward?

A

(See image)
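
(The referenced image isn't reproduced here. In the notation used in these cards, the total reward at time t is typically written as R_t = r_t + r_{t+1} + r_{t+2} + … = Σ_{i≥t} r_i, i.e. the sum of all rewards from the current step onward.)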

14
Q

DL-10 - Reinforcement learning

What formula is this? (See image)

A

Total reward.

15
Q

DL-10 - Reinforcement learning

What formula is this? (See image)

A

Discounted reward.

16
Q

DL-10 - Reinforcement learning

What is the formula for discounted reward?

A

(See image)
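
(Image not reproduced here. With a discount factor γ ∈ (0, 1), the discounted reward is typically written as R_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + … = Σ_{i≥t} γ^(i−t)·r_i, so rewards further in the future count for less.)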

17
Q

DL-10 - Reinforcement learning

How does the agent affect the environment?

A

Through its actions.

18
Q

DL-10 - Reinforcement learning

What does the agent observe from the environment? (2)

A
  • State changes
  • Rewards
19
Q

DL-10 - Reinforcement learning

What does the Q-function do?

A

It captures the expected rewards for an action a_t taken in a given state s_t.

20
Q

DL-10 - Reinforcement learning

What is the name of the function that does the following?

“It captures the expected rewards for an action a_t taken in a given state s_t.”

A

It’s named the “Q-function”.

21
Q

DL-10 - Reinforcement learning

What’s the formula for the Q-function?

A

(See image)
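
(Image not reproduced here. The Q-function is usually written as Q(s_t, a_t) = E[R_t | s_t, a_t], i.e. the expected total future (discounted) reward when taking action a_t in state s_t.)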

22
Q

DL-10 - Reinforcement learning

What is a policy?

A

The agent needs a policy, π(s), to infer the best action to take at state s.

23
Q

DL-10 - Reinforcement learning

What is πœ‹(𝑠)?

A

The policy function that evaluates the state s.

24
Q

DL-10 - Reinforcement learning

What’s the name of the function that evaluates a state s to decide on the best action to take?

A

It’s called the policy function, written π(s).

25
Q

DL-10 - Reinforcement learning

What is the RL strategy?

A

The RL strategy is that the policy chooses an action that maximizes future reward.

26
Q

DL-10 - Reinforcement learning

What is the formula for the RL strategy?

A

(See image)
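
(Image not reproduced here. The strategy is usually written as π*(s) = argmax_a Q(s, a): the optimal policy picks the action with the highest Q-value in the current state.)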

27
Q

DL-10 - Reinforcement learning

What formula is this? (See image)

A

The RL strategy.

28
Q

DL-10 - Reinforcement learning

What are the major classes of RL algorithms?

A
  • Value learning
  • Policy learning
29
Q

DL-10 - Reinforcement learning

How does value learning work?

A

(See image)
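
(Image not reproduced here. In short: value learning first learns the Q-function Q(s, a) and then derives the policy from it, e.g. by taking a = argmax_a Q(s, a).)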

30
Q

DL-10 - Reinforcement learning

What type of RL algorithm is this? (See image)

A

Value learning

31
Q

DL-10 - Reinforcement learning

How does policy learning work?

A

(See image)
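
(Image not reproduced here. In short: policy learning learns the policy π(s) directly; the model outputs a probability distribution P(a|s) over actions and the agent samples its action from that distribution.)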

32
Q

DL-10 - Reinforcement learning

What type of algorithm is this? (See image)

A

Policy learning.

33
Q

DL-10 - Reinforcement learning

When would you use value learning?

A

When your input space is limited.

34
Q

DL-10 - Reinforcement learning

When is value learning a better choice than policy learning?

A

Value learning is better when the environment is deterministic and the value function can be easily determined.

35
Q

DL-10 - Reinforcement learning

When is value learning a bad choice?

A

Value learning is a bad choice when the state space is too large or continuous.

36
Q

DL-10 - Reinforcement learning

When is policy learning a better choice than value learning?

A

Policy learning is better when the optimal policy is easier to find than the optimal value function.

37
Q

DL-10 - Reinforcement learning

What class of RL algorithm is Q-learning?

A

Q-learning is a value-based learning algorithm.

38
Q

DL-10 - Reinforcement learning

What does Q-learning try to do? (I.e. What choices will it make)

A

Perform the sequence of actions that will eventually lead to the maximum total reward, because it knows the expected reward of each action at each step.

39
Q

DL-10 - Reinforcement learning

What is this function? (See image)

A

The Q-function in Q-learning.

40
Q

DL-10 - Reinforcement learning

What is the formula for the Q-function in Q-learning?

A

(See image)

41
Q

DL-10 - Reinforcement learning

What starting values do we use for Q-values in Q-learning?

A

Arbitrary initial assumptions for the Q-values; they will be learned over time.

42
Q

DL-10 - Reinforcement learning

What is the Bellman equation used for?

A

It’s used to update Q-values in Q-learning.

43
Q

DL-10 - Reinforcement learning

What is the Bellman equation (formula)?

A

(See image)
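
(Image not reproduced here. The update is typically written as Q(s, a) ← Q(s, a) + α·[ r + γ·max_a' Q(s', a') − Q(s, a) ], where α is the learning rate and γ the discount factor.)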

44
Q

DL-10 - Reinforcement learning

In the Bellman equation, what is alpha?

A

Learning rate (or step size)

45
Q

DL-10 - Reinforcement learning

In Q-learning, what is a Q-table?

A

A mapping between state-action pairs and Q-values.

46
Q

DL-10 - Reinforcement learning

When is the Q-table updated?

A

After each step.

47
Q

DL-10 - Reinforcement learning

When does updating of the Q-table end?

A

When an episode is done.

48
Q

DL-10 - Reinforcement learning

How is the Q-table initialized?

A

With zeroes.

49
Q

DL-10 - Reinforcement learning

What is the Q-table used for?

A

The Q-table is used as a reference to look up all possible actions for a given state; the agent then selects the action with the maximum Q-value.
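
(A minimal Python sketch, not from the lecture, tying together the Q-table, the Bellman update and the epsilon-greedy selection described in the following cards; the 5-state corridor environment and all names are made up for illustration.)

# Tabular Q-learning on a made-up corridor: states 0..4, action 0 = left,
# action 1 = right; reaching state 4 gives reward +1 and ends the episode.
import random

N_STATES, N_ACTIONS = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate

# Q-table: one row per state, one column per action, initialized with zeroes.
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s, a):
    """Toy environment dynamics: move left or right along the corridor."""
    s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy selection: explore with probability epsilon,
        # otherwise exploit the current Q-values (ties broken at random).
        if random.random() < epsilon:
            a = random.randrange(N_ACTIONS)
        else:
            best = max(Q[s])
            a = random.choice([i for i in range(N_ACTIONS) if Q[s][i] == best])
        s_next, r, done = step(s, a)
        # Bellman update of the Q-value for the (state, action) pair.
        target = r if done else r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next

print(Q)   # action 1 (right) should now have the higher Q-value in every non-terminal state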

50
Q

DL-10 - Reinforcement learning

What are the modes the agent uses when interacting with the environment? (2)

A
  • Exploration
  • Exploitation
51
Q

DL-10 - Reinforcement learning

What is exploration in RL?

A

Trying something new. It improves knowledge about each action, hopefully leading to a long-term benefit.

52
Q

DL-10 - Reinforcement learning

What is exploitation in RL?

A

Choosing the greedy action to get the most reward by exploiting the agent’s current Q-value estimates.

53
Q

DL-10 - Reinforcement learning

What is epsilon-greedy action selection?

A

A way of balancing exploration and exploitation.

54
Q

DL-10 - Reinforcement learning

What is the formula for epsilon-greedy action selection?

A

(See image)
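
(Image not reproduced here. Epsilon-greedy is typically: with probability ε choose a random action (exploration), otherwise choose a = argmax_a Q(s, a) (exploitation); ε is often decayed over time.)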

55
Q

DL-10 - Reinforcement learning

What are the challenges in Q-learning? (2)

A
  • Large memory: the Q-table can exceed the available resources.
  • Unrealistically long exploration time: every state-action pair has to be explored.
56
Q

DL-10 - Reinforcement learning

What is DQN short for?

A

Deep Q-network / Deep Q-learning Network

57
Q

DL-10 - Reinforcement learning

What is a solution to the problems with Q-learning?

A

Deep Q-learning using neural networks.

58
Q

DL-10 - Reinforcement learning

What does a deep Q-network do?

A

Approximates Q-values with a neural network.

59
Q

DL-10 - Reinforcement learning

Describe what the architecture of a DQN network looks like.

A

(See image)
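
(Image not reproduced here. A common DQN architecture takes the state s as input and outputs one Q-value per possible action, so a single forward pass yields Q(s, a) for every action.)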

60
Q

DL-10 - Reinforcement learning

What is the formula for Q-loss?

A

(See image)
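
(Image not reproduced here. The Q-loss is usually the squared error between the target and the predicted Q-value: L = E[ ( (r + γ·max_a' Q(s', a')) − Q(s, a) )² ].)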

61
Q

DL-10 - Reinforcement learning

What are some problems with deep Q-learning? (2)

A
  • Non-stationary or unstable target
  • Updates are correlated
62
Q

DL-10 - Reinforcement learning

What are some solutions to DQN problems? (2)

A
  • Use two networks - prediction and target (see image)
  • Experience replay
63
Q

DL-10 - Reinforcement learning

How are the target/prediction networks trained in DQN?

A

Parameters are updated from the prediction network to the target network every C iterations.

64
Q

DL-10 - Reinforcement learning

What is experience replay?

A

A buffer of past experiences is used to stabilize training, by decorrelating the training examples in each batch used to update the NN.
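
(A minimal Python sketch of such a replay buffer, assuming simple (s, a, r, s_next, done) transition tuples; the names here are illustrative, not from the lecture.)

import random
from collections import deque

replay_buffer = deque(maxlen=10_000)               # oldest experiences are dropped when full

def store(s, a, r, s_next, done):
    replay_buffer.append((s, a, r, s_next, done))  # store one transition after every step

def sample_batch(batch_size=32):
    # Sampling uniformly at random decorrelates the transitions within a training batch.
    return random.sample(list(replay_buffer), min(batch_size, len(replay_buffer)))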

65
Q

DL-10 - Reinforcement learning

How is the experience replay buffer created?

A

(See image)

66
Q

DL-10 - Reinforcement learning

Describe the full schematic of using DQNs.

A

(See image)

67
Q

DL-10 - Reinforcement learning

List the DQN steps. (7)

A

1) At state s, select an action a using an epsilon-greedy policy.
2) Perform the action and move to a new state s'.
3) Store transition in the replay buffer.
4) Sample random batches from replay buffer, calculate the loss.
5) Optimization (e.g. gradient descent) for prediction network.
6) After C iterations, copy prediction network params to target network.
7) Repeat for M episodes.

68
Q

DL-10 - Reinforcement learning

What are the downsides of Q-learning? (2)

A
  • Complexity - only suited to small, discrete action spaces.
  • Flexibility - cannot learn stochastic policies.
69
Q

DL-10 - Reinforcement learning

What is policy learning?

A

Directly optimizing the policy π(s).

70
Q

DL-10 - Reinforcement learning

How do you interpret the output of the policy function π(s)?

A

The policy output P(a|s) is the probability that taking that action is going to result in the highest reward.

71
Q

DL-10 - Reinforcement learning

What is the advantage of policy learning?

A

It’s not constrained to a discrete action space. We can parameterize the probability distribution however we like, either discrete or continuous.

72
Q

DL-10 - Reinforcement learning

What is PG short for?

A

Policy gradient

73
Q

DL-10 - Reinforcement learning

What are the outputs of PG (2)?

A

Mean and variance as separate outputs.
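
(This corresponds to a Gaussian policy for continuous actions, assuming the usual parameterization: the network outputs μ(s) and σ²(s), and the action is sampled as a ~ N(μ(s), σ²(s)).)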

74
Q

DL-10 - Reinforcement learning

What are some limitations of using RL in the real world?

A

We cannot try out a lot of policies in real life.

E.g. car collisions near people are bad.

75
Q

DL-10 - Reinforcement learning

How do we get around the limitations of using RL in the real world?

A

Simulate the environment virtually before deploying to the real world.

76
Q

DL-10 - Reinforcement learning

What are some problems with RL simulators?

A

Simulators are typically not realistic enough to facilitate transfer from the virtual to the real world.