5 - Monte Carlo Flashcards
Monte Carlo policy evaluation is a good choice when we know ___________ about the dynamics and/or reward model.
Monte Carlo policy evaluation is a good choice when we know nothing about the dynamics and/or reward model.
Monte Carlo assumes a(n) __________ {one-shot, episodic, infinite} MDP.
Monte Carlo assumes an episodic MDP: every rollout must terminate so that a complete return can be computed.
Vπ is a(n) ________ {biased, unbiased} estimator under first-visit Monte Carlo (MC) Policy Evaluation.
Vπ is an unbiased estimator under first-visit Monte Carlo (MC) Policy Evaluation: the return following the first visit to a state in each episode is an independent sample of the true expected return.
Vπ is a(n) ________ {biased, unbiased} estimator under every-visit Monte Carlo (MC) Policy Evaluation.
Vπ is a biased (though consistent) estimator under every-visit Monte Carlo (MC) Policy Evaluation, because the returns recorded at repeated visits to a state within one episode overlap and are therefore correlated. Both variants are sketched below.
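A minimal sketch of both variants in Python, assuming a hypothetical episodic `env` with `reset()` and `step(action) -> (next_state, reward, done)` and a `policy` callable; these names are illustrative stand-ins, not from the original cards.

```python
from collections import defaultdict

def mc_policy_evaluation(env, policy, num_episodes, gamma=1.0,
                         first_visit=True):
    """Estimate V(s) by averaging sampled returns under `policy`.

    Assumes a hypothetical env/policy interface (see lead-in above).
    """
    returns_sum = defaultdict(float)   # total return recorded per state
    returns_count = defaultdict(int)   # number of recorded visits per state
    V = defaultdict(float)
    for _ in range(num_episodes):
        # Roll out one full episode (this is why MC needs an episodic MDP).
        episode = []                   # (state, reward) pairs
        state, done = env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state
        # Index of each state's first occurrence in this episode.
        first_seen = {}
        for t, (s, _) in enumerate(episode):
            first_seen.setdefault(s, t)
        # Walk backwards, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit and first_seen[s] != t:
                continue               # first-visit: skip later occurrences
            returns_sum[s] += G        # every-visit: record all occurrences
            returns_count[s] += 1
            V[s] = returns_sum[s] / returns_count[s]
    return V
```

With `first_visit=False` the same backward pass records every occurrence, so overlapping, correlated returns from a single episode enter the average; that correlation is exactly where the every-visit bias comes from.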
Which function does Monte Carlo (MC) Policy Evaluation apply across sampled future returns, and what does it estimate?
It averages (takes the expectation of) sampled returns: the value of a state is the expected or average return, in other words the expected cumulative future discounted reward starting from that state (denoted Vπ(s) for a given policy π, written out below).
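Written out in standard notation (an addition for reference, not from the original cards), the quantity being estimated is:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right],
\qquad
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2 r_{t+3} + \cdots
```

MC approximates this expectation by averaging the sampled returns Gt observed over many episodes.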
In general, Monte Carlo (MC) Policy Evaluation is a(n) _________________ {biased, unbiased}, _________________ {low variance, high variance} estimator. Why?
In general, Monte Carlo (MC) Policy Evaluation is an unbiased, high-variance estimator.
It is unbiased because averaging sampled returns directly estimates the expected return, and the standard error of such an average falls as 1/sqrt(number of returns averaged). The variance is high because each sampled return accumulates many random actions, transitions, and rewards over an entire episode, all of which can vary from episode to episode; a quick numerical check follows.
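A quick numerical check of the 1/sqrt(n) claim, treating each episode's return as an i.i.d. random sum; all numbers here are toy assumptions for illustration, not from the cards.

```python
import random
import statistics

def mc_estimate(n, seed):
    """Average n toy 'returns', each a sum of 20 random rewards,
    mimicking the many random rewards that make one episode's
    return high-variance."""
    rng = random.Random(seed)
    returns = [sum(rng.gauss(0, 1) for _ in range(20)) for _ in range(n)]
    return statistics.mean(returns)

for n in (10, 100, 1000):
    # Spread of the MC estimate across 200 independent runs.
    estimates = [mc_estimate(n, seed) for seed in range(200)]
    print(n, round(statistics.stdev(estimates), 3))
# The spread shrinks by roughly sqrt(10) for each tenfold increase in
# the number of returns averaged, matching the 1/sqrt(n) standard error.
```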