11 - Rational Decisions Over Time Flashcards
Difference between search problems and MDPs
Search problems aim to find an OPTIMAL SEQUENCE. MDPs aim to find an OPTIMAL POLICY.
Optimal policy maximizes the ________.
EXPECTED UTILITY
Types of utility functions
Additive & Discounted utility functions
What is a Markov Decision Process (MDP)?
MDP = Markov Chain + Actions + Rewards
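The card above can be sketched in code. This is a minimal illustration, not from the flashcards: a hypothetical two-state MDP packaged as (S, A, T, R, γ), showing how actions and rewards are layered onto a Markov chain's transition model.

```python
from dataclasses import dataclass

# Hedged sketch: MDP = Markov chain + actions + rewards, bundled as
# (S, A, T, R, gamma). The two-state example below is hypothetical.
@dataclass
class MDP:
    states: list   # S
    actions: list  # A
    T: dict        # T[(s, a)] -> list of (next_state, probability)
    R: dict        # R[s] -> immediate reward for being in state s
    gamma: float   # discount factor in (0, 1]

mdp = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    T={("s0", "stay"): [("s0", 1.0)],
       ("s0", "go"):   [("s1", 0.9), ("s0", 0.1)],
       ("s1", "stay"): [("s1", 1.0)],
       ("s1", "go"):   [("s0", 1.0)]},
    R={"s0": 0.0, "s1": 1.0},
    gamma=0.9,
)

# Sanity check: each (state, action) pair's outgoing probabilities
# must form a probability distribution, as in any Markov chain.
for outcomes in mdp.T.values():
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```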
What is a Partially Observable Markov Decision Process (POMDP)?
POMDP = Hidden Markov Model + Actions + Rewards
POMDPs are generalizations of an ________ without ________.
POMDPs are generalizations of an MDP without DIRECT STATE OBSERVATIONS.
Advantage of MDPs
Easy to specify and computationally cheap to solve
Disadvantage of MDPs
Assumes perfect knowledge of the state
Advantage of POMDPs
Allows for learning and uncertainty
Disadvantage of POMDPs
Computationally expensive
What problems can be modeled as Markov Decision Processes?
Sequential Decision Problems in uncertain discrete environments
Utility of a state sequence is the sum of all ________.
REWARDS OVER THE SEQUENCE (weighted by the discount factor when a discounted utility function is used)
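The two utility types named earlier can be contrasted on a concrete reward sequence. A hedged sketch, with made-up reward values: additive utility sums the rewards directly, while discounted utility weights the reward at step t by gamma**t.

```python
# Additive utility: plain sum of rewards over the sequence.
def additive_utility(rewards):
    return sum(rewards)

# Discounted utility: reward at step t is weighted by gamma**t.
def discounted_utility(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]          # hypothetical reward sequence
print(additive_utility(rewards))          # 3.0
print(discounted_utility(rewards, 0.5))   # 1.0 + 0.5 + 0.25 = 1.75
```

Note that additive utility is the special case of discounted utility with gamma = 1.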
MDP vs. POMDP - What is more difficult to solve?
POMDPs
How are POMDPs solved?
Solved by conversion to an MDP over belief states (probability distributions over the hidden states)
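The core operation behind that conversion is the Bayesian belief update: after taking action a and observing o, the new belief is b'(s') ∝ P(o | s') · Σₛ P(s' | s, a) · b(s). A minimal sketch, assuming a hypothetical two-state POMDP with one action and one observation:

```python
# Hedged sketch of the belief update that turns a POMDP into a
# belief-state MDP: predict with the transition model, then reweight
# by the observation likelihood and normalize.
def belief_update(belief, trans, obs_prob, action, obs):
    states = list(belief)
    new_b = {}
    for s2 in states:
        predicted = sum(trans[(s, action, s2)] * belief[s] for s in states)
        new_b[s2] = obs_prob[(s2, obs)] * predicted
    z = sum(new_b.values())          # normalizing constant
    return {s: p / z for s, p in new_b.items()}

# Hypothetical model: two states, one action "move", observation "beep".
trans = {("s0", "move", "s0"): 0.2, ("s0", "move", "s1"): 0.8,
         ("s1", "move", "s0"): 0.6, ("s1", "move", "s1"): 0.4}
obs_prob = {("s0", "beep"): 0.1, ("s1", "beep"): 0.9}

b = belief_update({"s0": 0.5, "s1": 0.5}, trans, obs_prob, "move", "beep")
print(b)  # belief shifts strongly toward s1, which explains the "beep"
```

Because beliefs are continuous distributions, the resulting belief-state MDP has a continuous state space, which is why POMDPs are so much more expensive to solve.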
Value iteration vs. policy iteration - What converges faster and why?
Policy iteration, since the policy can already be optimal before the utility estimates for each state have converged to their exact values.
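That observation can be demonstrated numerically. A hedged sketch on a hypothetical two-state MDP: running value iteration and tracking both when the greedy policy last changes and when the utilities themselves converge shows the policy stabilizing far earlier.

```python
# Hypothetical 2-state MDP: s1 pays reward 1, "go" from s0 usually
# reaches s1. Optimal policy: go from s0, stay in s1.
states = ["s0", "s1"]
actions = ["stay", "go"]
T = {("s0", "stay"): [("s0", 1.0)],
     ("s0", "go"):   [("s1", 0.9), ("s0", 0.1)],
     ("s1", "stay"): [("s1", 1.0)],
     ("s1", "go"):   [("s0", 1.0)]}
R = {"s0": 0.0, "s1": 1.0}
gamma = 0.9

def q(U, s, a):
    return R[s] + gamma * sum(p * U[s2] for s2, p in T[(s, a)])

def greedy(U):
    return {s: max(actions, key=lambda a: q(U, s, a)) for s in states}

U = {s: 0.0 for s in states}
prev_pi, policy_last_changed, values_converged_at = None, 0, None
for i in range(1, 1000):
    new_U = {s: max(q(U, s, a) for a in actions) for s in states}
    delta = max(abs(new_U[s] - U[s]) for s in states)
    U = new_U
    pi = greedy(U)
    if pi != prev_pi:
        policy_last_changed = i
    prev_pi = pi
    if delta < 1e-6:           # utilities finally converged
        values_converged_at = i
        break

# The greedy policy is fixed after the first sweep; the utilities
# need on the order of a hundred sweeps at gamma = 0.9.
print(policy_last_changed, values_converged_at)
```

Policy iteration exploits exactly this gap: it evaluates the current policy only far enough to decide whether any action should change, so it can terminate as soon as the policy is stable.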