5 - Monte Carlo Flashcards
Monte Carlo policy evaluation is a good choice when we know ___________ about the dynamics and/or reward model.
Monte Carlo policy evaluation is a good choice when we know nothing about the dynamics and/or reward model.
Monte Carlo assumes a(n) __________ {one-shot, episodic, infinite} MDP.
Monte Carlo assumes an episodic MDP: every rollout must terminate so that a complete return can be computed.
Vπ is a(n) ________ {biased, unbiased} estimator under first-visit Monte Carlo (MC) Policy Evaluation.
Vπ is an unbiased estimator under first-visit Monte Carlo (MC) Policy Evaluation: the return following the first visit to a state in each episode is an independent sample of the true expected return.
Vπ is a(n) ________ {biased, unbiased} estimator under every-visit Monte Carlo (MC) Policy Evaluation.
Vπ is a biased (though consistent) estimator under every-visit Monte Carlo (MC) Policy Evaluation, because the returns recorded at repeated visits to a state within one episode overlap and are therefore correlated. Both variants are sketched below.
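A minimal sketch of both variants in Python, assuming a hypothetical episodic `env` with `reset()` and `step(action) -> (next_state, reward, done)` and a `policy` callable; these names are illustrative stand-ins, not from the original cards.

```python
from collections import defaultdict

def mc_policy_evaluation(env, policy, num_episodes, gamma=1.0,
                         first_visit=True):
    """Estimate V(s) by averaging sampled returns under `policy`.

    Assumes a hypothetical env/policy interface (see lead-in above).
    """
    returns_sum = defaultdict(float)   # total return recorded per state
    returns_count = defaultdict(int)   # number of recorded visits per state
    V = defaultdict(float)
    for _ in range(num_episodes):
        # Roll out one full episode (this is why MC needs an episodic MDP).
        episode = []                   # (state, reward) pairs
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state
        # Index of each state's first occurrence in this episode.
        first_seen = {}
        for t, (s, _) in enumerate(episode):
            first_seen.setdefault(s, t)
        # Walk backwards, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit and first_seen[s] != t:
                continue               # first-visit: skip later occurrences
            returns_sum[s] += G        # every-visit: record all occurrences
            returns_count[s] += 1
            V[s] = returns_sum[s] / returns_count[s]
    return V
```

With `first_visit=False` the same backward pass records every occurrence, so overlapping, correlated returns from a single episode enter the average; that correlation is exactly where the every-visit bias comes from.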
Which function does Monte Carlo (MC) Policy Evaluation apply across sampled future returns, and what does it estimate?
It averages (takes the expectation of) sampled returns: the value of a state is the expected or average return, in other words the expected cumulative future discounted reward starting from that state (denoted Vπ(s) for a given policy π, written out below).
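Written out in standard notation (an addition for reference, not from the original cards), the quantity being estimated is:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right],
\qquad
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2 r_{t+3} + \cdots
```

MC approximates this expectation by averaging the sampled returns Gt observed over many episodes.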
In general, Monte Carlo (MC) Policy Evaluation is a(n) _________________ {biased, unbiased}, _________________ {low variance, high variance} estimator. Why?
In general, Monte Carlo (MC) Policy Evaluation is an unbiased, high-variance estimator.
It is unbiased because averaging sampled returns directly estimates the expected return, and the standard error of such an average falls as 1/sqrt(number of returns averaged). The variance is high because each sampled return accumulates many random actions, transitions, and rewards over an entire episode, all of which can vary from episode to episode; a quick numerical check follows.
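A quick numerical check of the 1/sqrt(n) claim, treating each episode's return as an i.i.d. random sum; all numbers here are toy assumptions for illustration, not from the cards.

```python
import random
import statistics

def mc_estimate(n, seed):
    """Average n toy 'returns', each a sum of 20 random rewards,
    mimicking the many random rewards that make one episode's
    return high-variance."""
    rng = random.Random(seed)
    returns = [sum(rng.gauss(0, 1) for _ in range(20)) for _ in range(n)]
    return statistics.mean(returns)

for n in (10, 100, 1000):
    # Spread of the MC estimate across 200 independent runs.
    estimates = [mc_estimate(n, seed) for seed in range(200)]
    print(n, round(statistics.stdev(estimates), 3))
# The spread shrinks by roughly sqrt(10) for each tenfold increase in
# the number of returns averaged, matching the 1/sqrt(n) standard error.
```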