7 - Model-Free Control Flashcards

1
Q

In your own words, what is Q(s, a)?

A

The Q function, or Q-value, Q(s, a) is the expected return (the sum of discounted future rewards) an agent receives when it starts in state s, takes action a, and follows its policy afterwards. We can think of it as the Quality of taking that action in that state.
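
For reference, the description above can be written as a formula. This is a sketch in the common textbook notation, which the card itself does not use: the policy \pi, discount factor \gamma, reward R_t, and time index t are assumed symbols.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\ A_t = a \right]
```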

2
Q

Off-policy learning can learn to evaluate a policy that is _______________.

A

Off-policy learning can learn to evaluate a policy (the target policy) using experience gathered by following a different policy (the behavior policy).

3
Q

What do the letters in SARSA stand for?

A

State-Action-Reward-State-Action. The Q-value update depends on the current state of the agent, the action chosen in that state, the reward received for that action, the next state reached after the action, and the next action chosen in that next state.
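
The five quantities named above map directly onto the arguments of a single update step. The following is a minimal sketch of a tabular SARSA update, assuming a dictionary-based Q-table; the function name `sarsa_update`, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions, not part of the card.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One tabular SARSA step. Q is a dict keyed by (state, action).

    alpha (step size) and gamma (discount) are assumed hyperparameters.
    Missing entries are treated as 0.0.
    """
    current = Q.get((s, a), 0.0)
    # The target uses the action actually chosen in the next state (on-policy).
    td_target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = current + alpha * (td_target - current)
    return Q


# Hypothetical two-state example.
Q = {("s0", "right"): 0.0, ("s1", "right"): 1.0}
sarsa_update(Q, "s0", "right", r=0.5, s_next="s1", a_next="right")
```

Because the target is built from the next action the agent really takes, SARSA evaluates the same policy that generates its experience.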

4
Q

Q-learning is a _______ {model-free, model-based} algorithm.

A

Q-learning is a model-free algorithm.
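
One way to see why it is model-free: the update below needs only a sampled transition (s, a, r, s_next), never the environment's transition probabilities or reward function. This is a minimal sketch of a tabular Q-learning step, assuming a dictionary-based Q-table; `q_learning_update`, the labels, and the hyperparameter values are illustrative assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.5):
    """One tabular Q-learning step. Q is a dict keyed by (state, action).

    Only the observed transition is used; no model of the environment.
    Missing entries are treated as 0.0.
    """
    current = Q.get((s, a), 0.0)
    # The target uses the best estimated action in the next state,
    # regardless of which action the agent will actually take there.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = current + alpha * (r + gamma * best_next - current)
    return Q


# Hypothetical example with two actions.
actions = ["left", "right"]
Q = {("s1", "left"): 0.2, ("s1", "right"): 1.0}
q_learning_update(Q, "s0", "right", r=1.0, s_next="s1", actions=actions)
```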

5
Q

What is Maximization Bias?

A

Maximization bias is the tendency of Q-learning to overestimate action values (Q), and therefore state values (V), because its update target takes a maximum over noisy estimates. This is often tolerable: the agent can still select a good policy as long as the overestimation does not change which action looks best in each state.
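
The effect can be illustrated without any environment at all. In the sketch below, every action has a true value of exactly 0, but each estimate is a noisy sample mean; taking the max over those estimates gives a result that is positive on average. The function name `mean_of_max` and all parameter values are assumptions chosen for illustration.

```python
import random


def mean_of_max(n_actions=10, n_samples=5, n_trials=2000, seed=0):
    """Average, over many trials, of the max of noisy value estimates.

    Every action's true value is 0.0, so an unbiased estimate of the
    best action's value would average to roughly 0.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Each estimate is the mean of a few zero-mean Gaussian rewards.
        estimates = [
            sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]
        total += max(estimates)
    return total / n_trials


bias = mean_of_max()
print(f"true max value: 0.0, average estimated max: {bias:.2f}")
```

The printed average comes out clearly above zero even though no action is actually better than any other, which is the overestimation the card describes.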
