11 - Rational Decisions Over Time Flashcards
Difference between search problems and MDPs
Search problems aim to find an OPTIMAL SEQUENCE. MDPs aim to find an OPTIMAL POLICY.
Optimal policy maximizes the ________.
EXPECTED UTILITY
Types of utility functions
Additive & Discounted utility functions
What is a Markov Decision Process (MDP)?
MDP = Markov Chain + Actions + Rewards
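The card above can be sketched in code. This is a minimal illustration, not from the flashcards: a hypothetical two-state MDP packaged as (S, A, T, R, γ), showing how actions and rewards are layered onto a Markov chain's transition model.

```python
from dataclasses import dataclass

# Hedged sketch: MDP = Markov chain + actions + rewards, bundled as
# (S, A, T, R, gamma). The two-state example below is hypothetical.
@dataclass
class MDP:
    states: list   # S
    actions: list  # A
    T: dict        # T[(s, a)] -> list of (next_state, probability)
    R: dict        # R[s] -> immediate reward for being in state s
    gamma: float   # discount factor in (0, 1]

mdp = MDP(
    states=["s0", "s1"],
    actions=["stay", "go"],
    T={("s0", "stay"): [("s0", 1.0)],
       ("s0", "go"):   [("s1", 0.9), ("s0", 0.1)],
       ("s1", "stay"): [("s1", 1.0)],
       ("s1", "go"):   [("s0", 1.0)]},
    R={"s0": 0.0, "s1": 1.0},
    gamma=0.9,
)

# Sanity check: each (state, action) pair's outgoing probabilities
# must form a probability distribution, as in any Markov chain.
for outcomes in mdp.T.values():
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```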
What is a Partially Observable Markov Decision Process (POMDP)?
POMDP = Hidden Markov Model + Actions + Rewards
POMDPs are generalizations of an ________ without ________.
POMDPs are generalizations of an MDP without DIRECT STATE OBSERVATIONS.
Advantage of MDPs
Easy to specify and computationally cheap to solve
Disadvantage of MDPs
Assumes perfect knowledge of the state
Advantage of POMDPs
Allows for learning and uncertainty
Disadvantage of POMDPs
Computationally expensive
What problems can be modeled as Markov Decision Processes?
Sequential Decision Problems in uncertain discrete environments
Utility of a state sequence is the sum of all ________.
REWARDS OVER THE SEQUENCE (weighted by the discount factor when a discounted utility function is used)
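The two utility types named earlier can be contrasted on a concrete reward sequence. A hedged sketch, with made-up reward values: additive utility sums the rewards directly, while discounted utility weights the reward at step t by gamma**t.

```python
# Additive utility: plain sum of rewards over the sequence.
def additive_utility(rewards):
    return sum(rewards)

# Discounted utility: reward at step t is weighted by gamma**t.
def discounted_utility(rewards, gamma):
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]          # hypothetical reward sequence
print(additive_utility(rewards))          # 3.0
print(discounted_utility(rewards, 0.5))   # 1.0 + 0.5 + 0.25 = 1.75
```

Note that additive utility is the special case of discounted utility with gamma = 1.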
MDP vs. POMDP - What is more difficult to solve?
POMDPs
How are POMDPs solved?
Solved by conversion to an MDP over belief states (probability distributions over the hidden states)
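The core operation behind that conversion is the Bayesian belief update: after taking action a and observing o, the new belief is b'(s') ∝ P(o | s') · Σₛ P(s' | s, a) · b(s). A minimal sketch, assuming a hypothetical two-state POMDP with one action and one observation:

```python
# Hedged sketch of the belief update that turns a POMDP into a
# belief-state MDP: predict with the transition model, then reweight
# by the observation likelihood and normalize.
def belief_update(belief, trans, obs_prob, action, obs):
    states = list(belief)
    new_b = {}
    for s2 in states:
        predicted = sum(trans[(s, action, s2)] * belief[s] for s in states)
        new_b[s2] = obs_prob[(s2, obs)] * predicted
    z = sum(new_b.values())          # normalizing constant
    return {s: p / z for s, p in new_b.items()}

# Hypothetical model: two states, one action "move", observation "beep".
trans = {("s0", "move", "s0"): 0.2, ("s0", "move", "s1"): 0.8,
         ("s1", "move", "s0"): 0.6, ("s1", "move", "s1"): 0.4}
obs_prob = {("s0", "beep"): 0.1, ("s1", "beep"): 0.9}

b = belief_update({"s0": 0.5, "s1": 0.5}, trans, obs_prob, "move", "beep")
print(b)  # belief shifts strongly toward s1, which explains the "beep"
```

Because beliefs are continuous distributions, the resulting belief-state MDP has a continuous state space, which is why POMDPs are so much more expensive to solve.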
Value iteration vs. policy iteration - What converges faster and why?
Policy iteration, since the policy can already be optimal before the utility estimates for each state have converged to their exact values.
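That observation can be demonstrated numerically. A hedged sketch on a hypothetical two-state MDP: running value iteration and tracking both when the greedy policy last changes and when the utilities themselves converge shows the policy stabilizing far earlier.

```python
# Hypothetical 2-state MDP: s1 pays reward 1, "go" from s0 usually
# reaches s1. Optimal policy: go from s0, stay in s1.
states = ["s0", "s1"]
actions = ["stay", "go"]
T = {("s0", "stay"): [("s0", 1.0)],
     ("s0", "go"):   [("s1", 0.9), ("s0", 0.1)],
     ("s1", "stay"): [("s1", 1.0)],
     ("s1", "go"):   [("s0", 1.0)]}
R = {"s0": 0.0, "s1": 1.0}
gamma = 0.9

def q(U, s, a):
    return R[s] + gamma * sum(p * U[s2] for s2, p in T[(s, a)])

def greedy(U):
    return {s: max(actions, key=lambda a: q(U, s, a)) for s in states}

U = {s: 0.0 for s in states}
prev_pi, policy_last_changed, values_converged_at = None, 0, None
for i in range(1, 1000):
    new_U = {s: max(q(U, s, a) for a in actions) for s in states}
    delta = max(abs(new_U[s] - U[s]) for s in states)
    U = new_U
    pi = greedy(U)
    if pi != prev_pi:
        policy_last_changed = i
    prev_pi = pi
    if delta < 1e-6:           # utilities finally converged
        values_converged_at = i
        break

# The greedy policy is fixed after the first sweep; the utilities
# need on the order of a hundred sweeps at gamma = 0.9.
print(policy_last_changed, values_converged_at)
```

Policy iteration exploits exactly this gap: it evaluates the current policy only far enough to decide whether any action should change, so it can terminate as soon as the policy is stable.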