POMDPs Flashcards
What is a POMDP?
A partially observable Markov decision process. Generally, a way of talking about environments that appear "non-Markov" to the agent because the full state cannot be observed directly; the agent only receives observations that carry partial information about the state.
Given the following definition of an MDP and a POMDP how can one convert an MDP to a POMDP?
MDP = (S, A, T, R)
POMDP = (S, A, Z, T, R, O)
S = S, A = A, Z = S, T = T, R = R
O(S, Z) = 1 if S == Z, 0 otherwise
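A minimal sketch of this conversion in Python, assuming tabular model arrays (the function name and array layouts are hypothetical, not from the lectures):

```python
import numpy as np

def mdp_to_pomdp(S, A, T, R):
    """Wrap a tabular MDP as a POMDP whose observations reveal the true state.

    S, A: numbers of states and actions
    T:    (A, S, S) array of transition probabilities T[a, s, s']
    R:    (S, A)    array of rewards
    Returns the POMDP tuple (S, A, Z, T, R, O) with Z = S and
    O[s, z] = 1 if s == z else 0.
    """
    Z = S
    O = np.eye(S)  # observation z is received in state s iff z == s
    return S, A, Z, T, R, O
```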
T/F, POMDPs generalize MDPs?
True. Any MDP can be converted into a POMDP using the mapping above, where observations correspond exactly to states.
True/False: We cannot use VI/PI/LP in POMDPs because they have infinitely many states.
False. Although the belief space is continuous (infinitely many belief states), the value function over beliefs is piecewise linear and convex: it is the max over a finite set of vectors, one linear function per vector. Working with that finite set lets us apply these solution methods to POMDPs.
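A minimal sketch of the piecewise linear and convex idea, using hypothetical alpha-vectors: the value at any belief is the max over a finite set of linear functions, so the continuous belief space never has to be enumerated.

```python
import numpy as np

def belief_value(belief, alpha_vectors):
    """Value of a belief under a piecewise linear and convex value function.

    belief:        (n_states,) probability vector over hidden states
    alpha_vectors: (n_vectors, n_states) array; each row is one linear piece
    """
    return np.max(alpha_vectors @ belief)

# Example with two states and two hypothetical alpha-vectors:
alphas = np.array([[1.0, 0.0],   # value of committing to "state 0" behaviour
                   [0.2, 0.8]])  # value of a more cautious behaviour
print(belief_value(np.array([0.5, 0.5]), alphas))  # max(0.5, 0.5) = 0.5
```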
What are the requirements to purge a vector when representing a POMDP value function as a finite piecewise linear convex combination of vectors?
A vector can be purged if it never participates in the max, i.e., there is no belief at which it is the maximizer. Pointwise-dominated vectors are easy to purge: if another vector is at least as good at every state (it suffices to check the endpoints of the belief simplex), the dominated one can never achieve the max and can be removed.
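A minimal sketch of this easy purging case, pointwise domination (vectors dominated only by combinations of other vectors would need a linear program instead):

```python
import numpy as np

def purge_pointwise_dominated(alpha_vectors):
    """Drop alpha-vectors that are pointwise dominated by another vector.

    A vector that is worse-or-equal at every state (every corner of the
    belief simplex) can never achieve the max at any belief, so removing it
    does not change the piecewise linear convex value function.
    """
    keep = []
    for i, v in enumerate(alpha_vectors):
        dominated = any(
            j != i and np.all(w >= v) and np.any(w > v)
            for j, w in enumerate(alpha_vectors)
        )
        if not dominated:
            keep.append(v)
    return np.array(keep)
```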
Categorize learning models based on the following criteria: observed/partially observed, controlled/uncontrolled.
Observed, controlled - MDP
Observed, uncontrolled - Markov Chain
Partially observed, controlled - POMDP
Partially observed, uncontrolled - Hidden Markov Model
What are the benefits of framing RL as a POMDP?
It naturally resolves the explore/exploit dilemma: the unknown parameters of the MDP become hidden state, so an action's information-gathering value and its immediate reward are both accounted for simply by maximizing expected reward under the current belief.
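As a hypothetical illustration (not from the lectures), here is a tiny Bayes-adaptive bandit where the hidden "state" is the unknown success probability of each arm and the belief is a Beta posterior per arm. Planning over beliefs makes exploring an untried arm worthwhile purely because it increases expected future reward.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def value(counts, steps_left):
    """Expected total reward of acting Bayes-optimally for steps_left pulls.

    counts: tuple of (successes, failures) per arm, encoding a Beta belief.
    """
    if steps_left == 0:
        return 0.0
    best = 0.0
    for arm, (s, f) in enumerate(counts):
        p = (s + 1) / (s + f + 2)  # posterior mean with a Beta(1, 1) prior
        # Belief-state transitions: a success or failure updates the counts,
        # which is exactly the POMDP belief update for this problem.
        win = list(counts); win[arm] = (s + 1, f)
        lose = list(counts); lose[arm] = (s, f + 1)
        q = (p * (1.0 + value(tuple(win), steps_left - 1))
             + (1 - p) * value(tuple(lose), steps_left - 1))
        best = max(best, q)
    return best

# With 5 pulls left, the untried second arm is worth trying even though the
# first arm currently looks decent.
print(value(((2, 1), (0, 0)), 5))
```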
Under what framework can we view RL as a planning problem?
We can view it as a planning problem in a POMDP
Is it possible to reduce the continuous POMDP representation of RL into a piecewise convex polynomial?
Yes. Algorithms such as BEETLE do this, but they tend to be too computationally expensive to be useful in practice.
What is predictive state representation (PSR)?
A way of representing state as a vector of predictions: the probabilities that various tests (sequences of actions and observations) would succeed given the history so far.
True/False: If we know the predictive state in PSR we can determine the belief state?
True if we have enough tests. Without enough tests there may not be a unique mapping.
True/False: If we know the belief state in PSR we can determine the predictive state?
True
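A minimal sketch of that direction, assuming a tabular POMDP with transition array T and observation array O (hypothetical layouts): each entry of the predictive state is just the probability that a test succeeds under the current belief.

```python
import numpy as np

def test_probability(belief, test, T, O):
    """Probability that a test (sequence of (action, observation) pairs)
    succeeds, starting from a given belief state.

    belief: (n_states,) probability vector over hidden states
    test:   list of (action, observation) index pairs
    T:      (n_actions, n_states, n_states) transitions T[a, s, s']
    O:      (n_actions, n_states, n_obs)   observations O[a, s', z]
    """
    b = belief.copy()
    prob = 1.0
    for a, z in test:
        pred = b @ T[a]              # P(s' | b, a)
        joint = pred * O[a][:, z]    # P(s', z | b, a)
        p_z = joint.sum()            # P(z | b, a)
        if p_z == 0.0:
            return 0.0
        prob *= p_z
        b = joint / p_z              # updated belief after (a, z)
    return prob
```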
What is the PSR Theorem?
Any n-state POMDP can be represented by a PSR with no more than n tests, each of which is no longer than n steps long.
Why use PSR?
A PSR can be easier to learn than a POMDP under certain conditions, although in general POMDPs that are easy to learn remain easy as PSRs and hard POMDPs remain hard as PSRs. From a philosophical perspective, it is appealing to represent the world entirely in terms of observable quantities rather than hidden state.