7 - Model-Free Control Flashcards
In your own words, what is Q(s, a)?
Q(s, a), the action-value function (or Q value), is the expected return, meaning the expected sum of future discounted rewards, when the agent is in state s, takes action a, and then follows its policy. We can think of it as the Quality of taking action a in state s, measured by the discounted rewards that follow.
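A minimal sketch of this definition, assuming a tabular setting where each episode is recorded as a list of (state, action, reward) tuples. It estimates Q(s, a) by averaging the discounted return observed after each visit to (s, a) (every-visit Monte Carlo). The episode format, state/action names, and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def discounted_return(rewards, gamma):
    """Return r_1 + gamma*r_2 + gamma^2*r_3 + ... for a list of rewards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_q_estimate(episodes, gamma=0.9):
    """Estimate Q(s, a) as the average discounted return following (s, a).

    Hypothetical format: each episode is a list of (state, action, reward)
    tuples, where reward is the reward received after taking action in state.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        rewards = [r for _, _, r in episode]
        for t, (s, a, _) in enumerate(episode):
            # Return from time t onward: the reward for (s, a) plus discounted later rewards.
            totals[(s, a)] += discounted_return(rewards[t:], gamma)
            counts[(s, a)] += 1
    return {sa: totals[sa] / counts[sa] for sa in totals}


episodes = [
    [("s0", "right", 0.0), ("s1", "right", 1.0)],
    [("s0", "right", 0.0), ("s1", "left", 0.0)],
]
q = mc_q_estimate(episodes, gamma=0.9)
print(q)
```

Here Q(s0, right) comes out as the average of 0.9 (reward 1 arrives one step later, discounted once) and 0.0, so about 0.45.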
Off-policy learning can learn to evaluate a policy that is _______________.
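Off-policy learning can learn to evaluate a policy that is different from the one used to gather experience: the policy being evaluated (the target policy) is learned from data generated by a separate behavior policy.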
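As an illustration, here is a sketch of one standard off-policy evaluation technique, ordinary importance sampling, on one-step (bandit-style) data. It assumes both policies' action probabilities are known; the policies, actions, and rewards below are made up for the example. Each reward collected by the behavior policy is reweighted by the ratio target(a) / behavior(a), so the average estimates the target policy's value.

```python
def importance_sampling_estimate(samples, target, behavior):
    """Ordinary importance-sampling estimate of the target policy's value.

    samples:  list of (action, reward) pairs generated by the behavior policy.
    target:   dict mapping action -> probability under the policy being evaluated.
    behavior: dict mapping action -> probability under the data-generating policy.
    """
    total = 0.0
    for action, reward in samples:
        rho = target[action] / behavior[action]  # importance-sampling ratio
        total += rho * reward
    return total / len(samples)


# Hypothetical data gathered by a uniform behavior policy over two actions.
behavior = {"a": 0.5, "b": 0.5}
# Target policy to evaluate: always pick "a".
target = {"a": 1.0, "b": 0.0}
samples = [("a", 1.0), ("b", 0.0), ("a", 1.0), ("b", 0.0)]

estimate = importance_sampling_estimate(samples, target, behavior)
print(estimate)  # 1.0
```

The target policy never chose "b", yet its value (1.0, the reward for "a") is recovered entirely from experience gathered by the uniform behavior policy.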
What do the letters in SARSA stand for?
State-Action-Reward-State-Action. The name lists the five quantities each SARSA update of the Q-value uses: the current state of the agent, the action chosen in that state, the reward received for that action, the next state reached after the action, and the next action chosen in that next state.
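A minimal tabular sketch of one SARSA update, assuming Q is stored in a dictionary keyed by (state, action) pairs. The step size alpha, discount gamma, and the example states/actions are illustrative assumptions. The target uses A', the action actually chosen in S', which is why all five letters of the name appear in the update.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one SARSA update using the tuple (S, A, R, S', A').

    Q(S, A) <- Q(S, A) + alpha * [R + gamma * Q(S', A') - Q(S, A)]
    If S' is terminal (done=True), the target is just R.
    """
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


Q = defaultdict(float)          # unseen (state, action) pairs default to 0.0
Q[("s1", "left")] = 2.0         # hypothetical current estimate for (S', A')
new_value = sarsa_update(Q, "s0", "right", 1.0, "s1", "left", alpha=0.5, gamma=0.9)
print(new_value)
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, and Q(s0, right) moves halfway from 0.0 toward it, to about 1.4.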
Q-learning is a _______ {model-free, model-based} algorithm.
Q-learning is a model-free algorithm.
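A minimal tabular sketch of one Q-learning update, assuming a dictionary-based Q table and a known list of actions (both illustrative assumptions). It shows what "model-free" means in practice: the update needs only a single sampled transition (s, a, r, s'), never the environment's transition probabilities or reward function. Unlike SARSA, the target takes a max over next actions rather than using the action actually taken.

```python
from collections import defaultdict


def q_learning_update(Q, actions, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update from a sampled transition.

    Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
    No model of P(s' | s, a) or of the rewards is used, only the sample.
    """
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]


actions = ["left", "right"]
Q = defaultdict(float)
Q[("s1", "left")] = 2.0    # hypothetical current estimates in the next state
Q[("s1", "right")] = 5.0
new_value = q_learning_update(Q, actions, "s0", "right", 1.0, "s1", alpha=0.5, gamma=0.9)
print(new_value)
```

The max picks Q(s1, right) = 5.0, so the target is 1.0 + 0.9 * 5.0 = 5.5 and the updated Q(s0, right) is about 2.75.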
What is Maximization Bias?
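Maximization bias is the tendency of Q-learning to overestimate action values (and therefore the value function estimates V), because it uses the maximum of noisy Q estimates as an estimate of the maximum true value. This is not necessarily harmful: even with overestimated values, the agent can still select a good policy as long as the best action keeps the highest estimate.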
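A small simulation that sketches where the bias comes from, under illustrative assumptions: every action has a true mean reward of 0 (so the true maximum is 0), each action's value is estimated from a few Gaussian-noise samples, and we take the max of those estimates, as the max in the Q-learning target does. The action count, sample count, and noise level are arbitrary choices for the demonstration.

```python
import random
import statistics


def max_of_sample_means(n_actions=10, n_samples=10, n_trials=2000, seed=0):
    """Average over many trials of max_a (sample-mean reward of action a).

    Every action's true mean reward is 0, so the true max is 0. Any
    systematic positive result comes from maximizing over noisy estimates.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # Noisy value estimate for each action: mean of n_samples N(0, 1) rewards.
        means = [
            statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_samples))
            for _ in range(n_actions)
        ]
        results.append(max(means))
    return statistics.fmean(results)


bias = max_of_sample_means()
print(f"true max value: 0.0, average estimated max: {bias:.2f}")
```

Each individual estimate is unbiased, but the max of them is not: whichever action happens to get lucky noise is the one selected, so the estimated maximum sits consistently above 0.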