5 - Monte Carlo Flashcards

1
Q

Monte Carlo policy evaluation is a good choice when we know ___________ about the dynamics and/or reward model.

A

Monte Carlo policy evaluation is a good choice when we know nothing about the dynamics and/or reward model.

2
Q

Monte Carlo assumes __________ {one-shot, episodic, infinite} MDP.

A

Monte Carlo assumes an episodic MDP: returns can only be computed from complete episodes, so every episode must terminate.

3
Q

Vπ is a(n) ________ {biased, unbiased} estimator for first-visit Monte Carlo (MC) Policy Evaluation.

A

Vπ is an unbiased estimator for first-visit Monte Carlo (MC) Policy Evaluation: each state contributes at most one return per episode, and across episodes those returns are independent samples whose mean is Vπ(s).
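A minimal sketch of first-visit MC policy evaluation, assuming a hypothetical environment with a Gym-style reset()/step(action) interface and a policy function mapping states to actions (all names here are illustrative, not from the cards):

```python
from collections import defaultdict

def first_visit_mc(env, policy, num_episodes, gamma=0.99):
    """First-visit MC policy evaluation: for each state, average the
    return following its FIRST occurrence in each episode."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)

    for _ in range(num_episodes):
        # Generate one complete episode under the policy (episodic MDP).
        episode = []  # list of (state, reward) pairs
        state = env.reset()
        done = False
        while not done:
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state

        # Record the index of each state's first occurrence.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)

        # Walk backwards, accumulating the discounted return G.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = r + gamma * G
            if first_visit[s] == t:  # update only on the first visit
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]
    return V
```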

4
Q

Vπ is a(n) ________ {biased, unbiased} estimator for every-visit Monte Carlo (MC) Policy Evaluation.

A

Vπ is a biased estimator for every-visit Monte Carlo (MC) Policy Evaluation: returns from repeated visits to the same state within one episode are correlated, which biases the average (the estimator is still consistent, so the bias vanishes as the number of episodes grows).
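The every-visit variant differs from the first-visit sketch above only in dropping the first-visit check, so every occurrence of a state contributes its return. A sketch under the same assumed interface:

```python
from collections import defaultdict

def every_visit_mc(env, policy, num_episodes, gamma=0.99):
    """Every-visit MC: EVERY occurrence of a state in an episode
    contributes its return, so within-episode samples are correlated."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    V = defaultdict(float)

    for _ in range(num_episodes):
        episode, state, done = [], env.reset(), False
        while not done:
            next_state, reward, done = env.step(policy(state))
            episode.append((state, reward))
            state = next_state

        G = 0.0
        for s, r in reversed(episode):
            G = r + gamma * G
            returns_sum[s] += G      # no first-visit check here
            returns_count[s] += 1
            V[s] = returns_sum[s] / returns_count[s]
    return V
```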

5
Q

What quantity does Monte Carlo (MC) Policy Evaluation estimate for each state?

A

We want to estimate the value of a state: the expected (average) return, i.e. the expected cumulative future discounted reward starting from that state, denoted Vπ(s) for a given policy π.
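In symbols (standard definitions, using the convention that r_t is the reward at step t and γ ∈ [0, 1) is the discount factor):

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid s_t = s \,\right],
\qquad
G_t = r_t + \gamma\, r_{t+1} + \gamma^{2}\, r_{t+2} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k}.
```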

6
Q

In general, Monte Carlo (MC) Policy Evaluation is a(n) _________________ {biased, unbiased}, _________________ {low variance, high variance} estimator. Why?

A

In general, Monte Carlo (MC) Policy Evaluation is an unbiased, high-variance estimator.

It is unbiased because each complete episode's return is a sample drawn from the true return distribution, so averaging these samples estimates the expected return without systematic error (the standard error falls as 1/sqrt(number of returns averaged)). The variance is high because each return accumulates randomness over an entire episode: many random actions, transitions, and rewards can all vary, and their effects compound.
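A quick illustration of the 1/sqrt(n) claim using synthetic returns (the return distribution here is arbitrary, chosen only for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Treat each "return" as an i.i.d. draw with std 2.0; the MC estimate
# of V is the mean of n such returns. The spread of that estimate
# across 2000 independent trials should match 2.0 / sqrt(n).
for n in [10, 100, 1000, 10000]:
    estimates = rng.normal(loc=5.0, scale=2.0, size=(2000, n)).mean(axis=1)
    print(f"n={n:>5}  std of estimate={estimates.std():.3f}  "
          f"2.0/sqrt(n)={2.0 / np.sqrt(n):.3f}")
```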
