Module 8: Markov Decision Process Flashcards

1
Q

Which of the following statements are true of a Markov decision process (MDP)?

A

An MDP is defined for a fully observable, stochastic environment.
A solution or policy must specify what the agent should do for all states that it can reach.

Discounted rewards are not absolutely necessary, but without discounting a proper policy might not always exist; see, for example, Figure 17.2, bottom right.
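As a concrete anchor for these definitions, here is a minimal sketch of how an MDP's ingredients (states, actions, a stochastic transition model P(s'|s,a), rewards, and a discount factor) might be laid out in Python. The toy problem and all names are hypothetical, not from the course:

```python
# Minimal sketch of an MDP for a fully observable, stochastic environment.
# All names and numbers here are illustrative, not from the course materials.

# States and actions for a tiny two-state problem.
states = ["s0", "s1"]
actions = ["stay", "go"]

# Stochastic transition model: P(s' | s, a) as a nested dict.
# From each (state, action) pair, the outcome state is uncertain.
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.5, "s1": 0.5},
}

# Reward for being in each state.
R = {"s0": -0.04, "s1": 1.0}

# Discount factor; choosing gamma < 1 keeps utilities finite
# even when no proper (terminating) policy exists.
gamma = 0.9
```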

2
Q

Select all of the following components that the policy iteration equation takes into account.

A

The probability of entering a state S′ from state S after performing action A.
The utility associated with a state S.
All of the actions A that an agent can take.

The max reward obtainable from state S is a component of value iteration, not policy iteration: policy evaluation follows the fixed policy's action instead of maximizing over actions (see the sketch below).
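To make those components concrete, here is a minimal sketch of policy iteration on a hypothetical two-state MDP. Note where each listed component appears: the transition probabilities P(s'|s,a), the state utilities U(s), and the set of actions A; no max over rewards is taken during evaluation. All names and numbers are illustrative:

```python
# Minimal sketch of policy iteration on a tiny hypothetical MDP.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {  # transition model P(s' | s, a)
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.5, "s1": 0.5},
}
R = {"s0": -0.04, "s1": 1.0}
gamma = 0.9

def policy_evaluation(policy, U, k=50):
    """Estimate U(s) under a fixed policy (simplified: a fixed number
    of sweeps instead of solving the linear system exactly)."""
    for _ in range(k):
        U = {s: R[s] + gamma * sum(p * U[s2]
                                   for s2, p in P[(s, policy[s])].items())
             for s in states}
    return U

def policy_iteration():
    policy = {s: actions[0] for s in states}  # arbitrary initial policy
    U = {s: 0.0 for s in states}
    while True:
        U = policy_evaluation(policy, U)
        # Policy improvement: no max over rewards here; we only pick
        # the action whose expected utility is highest.
        new_policy = {
            s: max(actions,
                   key=lambda a: sum(p * U[s2]
                                     for s2, p in P[(s, a)].items()))
            for s in states
        }
        if new_policy == policy:
            return policy, U
        policy = new_policy

print(policy_iteration())
```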

3
Q

T/F
Policy is implicitly updated in value iteration.

A

True

In each state, the policy corresponds to the action with the maximum Q-value, so updating the utilities implicitly updates the policy (see the sketch below).
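A minimal sketch of what "implicit" means here, on the same kind of hypothetical two-state MDP: value iteration updates only the utilities, and the policy is read off afterwards by pairing each state with its max-Q action:

```python
# Sketch: value iteration never stores a policy explicitly;
# the policy is extracted at the end from the Q-values.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.5, "s1": 0.5},
}
R = {"s0": -0.04, "s1": 1.0}
gamma = 0.9

def q_value(U, s, a):
    """Q(s, a) = R(s) + gamma * sum over s' of P(s'|s,a) * U(s')."""
    return R[s] + gamma * sum(p * U[s2] for s2, p in P[(s, a)].items())

# Bellman updates: U(s) <- max over a of Q(s, a). No policy appears anywhere.
U = {s: 0.0 for s in states}
for _ in range(100):
    U = {s: max(q_value(U, s, a) for a in actions) for s in states}

# The policy is implicit: pair each state with its max-Q action.
policy = {s: max(actions, key=lambda a: q_value(U, s, a)) for s in states}
print(U, policy)
```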

4
Q

T/F
The utility function estimate must be completely accurate in order to obtain an optimal policy.

A

False
The utility estimate is an approximation of how good each state is, so it is rarely exactly correct. An optimal policy can still be obtained as long as the estimate ranks the actions correctly in each state; in practice the greedy policy often becomes optimal well before the utilities converge (see the sketch below).
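One illustrative way to see this (same hypothetical toy MDP as above, repeated so the card stands alone): track the greedy policy after every sweep of value iteration. It typically stops changing long before the utility estimates themselves settle:

```python
# Sketch: the greedy policy often becomes optimal while the utility
# estimates are still far from converged. Toy MDP, hypothetical numbers.
states = ["s0", "s1"]
actions = ["stay", "go"]
P = {
    ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.5, "s1": 0.5},
}
R = {"s0": -0.04, "s1": 1.0}
gamma = 0.9

def greedy(U):
    """Greedy policy: in each state, the action with the highest Q-value."""
    return {s: max(actions,
                   key=lambda a: R[s] + gamma * sum(
                       p * U[s2] for s2, p in P[(s, a)].items()))
            for s in states}

U = {s: 0.0 for s in states}
prev_policy = None
for i in range(1, 101):
    # One Bellman sweep over all states.
    U = {s: max(R[s] + gamma * sum(p * U[s2]
                                   for s2, p in P[(s, a)].items())
                for a in actions)
         for s in states}
    pol = greedy(U)
    if pol != prev_policy:
        print(f"sweep {i}: greedy policy is now {pol}")
        prev_policy = pol
print("final utilities:", U)
```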

5
Q

T/F
Policy is explicitly updated in value iteration.

A

False

The policy is implicitly updated in value iteration; see textbook Section 17.2.

6
Q

What is the principle of maximum expected utility (MEU)?

A

An agent should choose the action that maximizes its expected utility, i.e., the probability-weighted average of the utilities of the action's possible outcomes.

Refer to the MEU equation in the slides; a standard formulation is sketched below.
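For reference, this is the standard MEU formulation as given in AIMA (Russell & Norvig); the notation in the course slides may differ slightly:

```latex
% Expected utility of an action a given evidence e:
% the probability-weighted average of the outcome utilities.
\[
  EU(a \mid e) = \sum_{s'} P\big(\mathrm{Result}(a) = s' \mid a, e\big)\, U(s')
\]
% Principle of maximum expected utility: a rational agent picks
\[
  a^{*} = \operatorname*{arg\,max}_{a} \; EU(a \mid e)
\]
```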
