CS7642_Week7 Flashcards
“Exploration is expectation” in Bayesian RL? (True/False)
True. In Bayesian RL we update our posterior beliefs as we go, so our beliefs contain more and more “doses of reality” (i.e. evidence about the underlying dynamics of the problem). Exploration then falls out of simply acting on the expectation over those beliefs, and we converge toward an optimal solution.
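A minimal sketch of the “doses of reality” idea, assuming a single unknown transition probability with a Beta prior (the environment and numbers are made up for illustration):

```python
import random

# Hypothetical example: Bayesian RL keeps a posterior over unknown dynamics.
# Here a single unknown probability p ("does this action move me right?")
# gets a Beta(1, 1) prior that is updated after every observed transition.
alpha, beta = 1.0, 1.0          # Beta prior pseudo-counts
true_p = 0.7                    # ground truth, unknown to the agent

for _ in range(100):
    moved_right = random.random() < true_p   # observe one transition
    if moved_right:
        alpha += 1.0            # one more "dose of reality" for success
    else:
        beta += 1.0             # ...or for failure

# Acting on the expectation of the posterior is where "exploration is
# expectation" comes from: no explicit epsilon-greedy bolted on.
posterior_mean = alpha / (alpha + beta)
print(f"posterior mean estimate of p: {posterior_mean:.3f}")
```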
What is PSR?
PSR :: Predictive State Representation. Essentially, I only care about states that allow me to make some concrete prediction about the world.
Key idea is that we may never know the ground truth of what state we’re in, but we can run tests and track their outcomes to ground our belief about the state in empirical evidence.
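A toy sketch of the “run tests, track outcomes” idea. The tests and observation names below are invented, and the update is plain frequency counting rather than a full linear-PSR update; the point is only that the “state” is a vector of grounded predictions:

```python
from collections import defaultdict

# Hypothetical system: the "state" is nothing more than a vector of
# predictions -- for each core test (an action-observation sequence we could
# run from here), the estimated probability that it succeeds. We ground those
# predictions in empirical counts instead of ever naming a hidden state.
core_tests = [("a1", "see_light"), ("a2", "hear_click")]

counts = defaultdict(lambda: [0, 0])   # (history, test index) -> [successes, trials]
history = ()                           # what we've done/seen so far

def prediction_vector(history):
    """Current 'state': estimated success probability of each core test."""
    vec = []
    for i, _test in enumerate(core_tests):
        succ, trials = counts[(history, i)]
        vec.append(succ / trials if trials else 0.5)   # 0.5 = no evidence yet
    return vec

def record_test(history, test_index, succeeded):
    """Run a test and fold its outcome back into our predictions."""
    succ, trials = counts[(history, test_index)]
    counts[(history, test_index)] = [succ + int(succeeded), trials + 1]

record_test(history, 0, True)
record_test(history, 0, False)
print(prediction_vector(history))   # test 0 now grounded at 1/2; test 1 still 0.5
```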
A PSR can represent any POMDP? (True/False)
True. PSRs are really just a more philosophically palatable representation of POMDPs that dispenses with the notion of hidden/unobservable states.
States and predictors are the same thing? (True/False)
True. Or at least more or less true in the context of RL/machine learning. We only really care about a state or feature insofar as it allows us to make some sort of prediction that has a basis in reality.
POMDPs generalize regular MDPs? (True/False)
True. POMDPs are just a way of talking about “non-Markov” environments; a regular MDP is the special case where the observation reveals the underlying state exactly.
The definition of a POMDP is a tuple (S, A, Z, T, R, O)? (True/False)
True. Z is the observable (i.e. the thing(s) the agent actually sees), and O is the observation function: O(s, z) relates the actual underlying state s to the observation z, i.e. how likely we are to see z when the process is really in state s.
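A minimal sketch of the (S, A, Z, T, R, O) tuple for a tiny made-up two-state problem (a “tiger behind a door” style example; all names and numbers are illustrative):

```python
# Sketch of the (S, A, Z, T, R, O) tuple for a tiny made-up POMDP.
S = ["tiger_left", "tiger_right"]          # hidden states
A = ["listen", "open_left", "open_right"]  # actions
Z = ["hear_left", "hear_right"]            # what the agent actually sees

# T[s][a][s'] = P(s' | s, a): in this toy, no action moves the tiger.
T = {s: {a: {s2: 1.0 if s2 == s else 0.0 for s2 in S} for a in A} for s in S}

# R[s][a] = immediate reward.
R = {
    "tiger_left":  {"listen": -1, "open_left": -100, "open_right": 10},
    "tiger_right": {"listen": -1, "open_left": 10,   "open_right": -100},
}

# O[s][z] = P(z | s): observations are informative but noisy.
O = {
    "tiger_left":  {"hear_left": 0.85, "hear_right": 0.15},
    "tiger_right": {"hear_left": 0.15, "hear_right": 0.85},
}
```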
We have to expand our notion of “states” in order for a POMDP to generalize a regular MDP? (True/False)
True. We expand it by defining states to be “belief” states b(s): a probability distribution over the underlying states we think we might be in.
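A minimal sketch of how a belief state gets updated after taking action a and seeing observation z, following b'(s') ∝ O(s', z) Σ_s T(s, a, s') b(s). The two-state system and numbers are illustrative:

```python
def belief_update(b, a, z, S, T, O):
    """One Bayes filter step: b'(s') ~ O(s', z) * sum_s T(s, a, s') * b(s)."""
    new_b = {}
    for s2 in S:
        new_b[s2] = O[s2][z] * sum(T[s][a][s2] * b[s] for s in S)
    norm = sum(new_b.values())
    return {s2: p / norm for s2, p in new_b.items()}

# Toy numbers (illustrative only): two hidden states, a 'listen' action that
# leaves the state alone, and a noisy observation that favors the true state.
S = ["left", "right"]
T = {s: {"listen": {s2: 1.0 if s2 == s else 0.0 for s2 in S}} for s in S}
O = {"left":  {"hear_left": 0.85, "hear_right": 0.15},
     "right": {"hear_left": 0.15, "hear_right": 0.85}}

b = {"left": 0.5, "right": 0.5}                       # start fully uncertain
b = belief_update(b, "listen", "hear_left", S, T, O)
print(b)                                              # belief shifts to left (0.85)
```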
In a POMDP, the reward is encoded into the observable Z, i.e. r = f(z)? (True/False)
True. In a POMDP the reward isn’t observed directly.
POMDPs are difficult to solve because they contain an infinite number of states? (True/False)
True. Because we’re working in “belief state space”, a state is a probability distribution over the underlying states, so even a finite set of underlying states gives a continuous (and therefore infinite) space of possible beliefs.
We can make the infinite belief state space of a POMDP tractable from a computational perspective by performing value iteration with a value function that is a maximum over Piecewise Linear and Convex functions? (True/False)
True. Think of a POMDP with two underlying states, so a belief is a single point on a line segment between “definitely state A” and “definitely state B”. Each candidate plan contributes a linear function over that segment, and the value function is the maximum over those lines, which forms a convex surface that opens upward. Each linear function covers the entire continuous range of beliefs at once (it’s just a mapping from belief to value), so a finite set of them is enough: value iteration only has to track those linear functions and take the max, which makes the infinite belief space tractable.
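A tiny sketch of the piecewise-linear-and-convex (PWLC) idea for a two-state belief; the “alpha vectors” below are made-up numbers standing in for conditional plans, not the solution to a real POMDP:

```python
import numpy as np

# Each candidate plan contributes one linear function ("alpha vector") over
# beliefs b = [P(A), P(B)]; the value at any belief is the max over those
# lines -- the upward-opening convex surface described above.
alphas = np.array([
    [10.0, -2.0],   # plan 1: great if we're really in A, bad in B
    [-2.0, 10.0],   # plan 2: the mirror image
    [ 3.0,  3.0],   # plan 3: a safe middle-ground plan
])

def value(p_a):
    """V(b) = max_alpha alpha . b, with b = [p_a, 1 - p_a]."""
    b = np.array([p_a, 1.0 - p_a])
    return float(np.max(alphas @ b))

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, value(p))   # convex in p: high at the ends, dips toward the middle
```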
What do we call a model that is (1) Partially Observed and (2) Controlled?
A POMDP
What do we call a model that is (1) Observed and (2) Controlled?
A regular MDP
What do we call a model that is (1) Observed and (2) uncontrolled?
Markov chain
What do we call a model that is (1) Partially Observed and (2) uncontrolled?
Hidden Markov Model (didn’t really talk about this in the course)
In Bayesian RL, we can think about RL as a POMDP itself? (True/False)
True. It turns RL into simply planning: the hidden state becomes the parameters of the MDP (its transitions and rewards), and our belief is a distribution over which MDP we’re actually in, updated as we observe transitions.
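A minimal sketch of the “hidden state = MDP parameters” idea, keeping Dirichlet-style pseudo-counts over an unknown transition distribution. The states, actions, and counting scheme are illustrative, not a full Bayes-adaptive planner:

```python
from collections import defaultdict

# The "hidden state" is the unknown MDP itself; our belief over it is a set of
# pseudo-counts over transition probabilities, sharpened every time we observe
# (s, a, s'). Planning then happens against this belief rather than a known T.
transition_counts = defaultdict(lambda: defaultdict(lambda: 1.0))  # prior count of 1

def observe_transition(s, a, s_next):
    """Each observed transition is a belief update over the unknown MDP."""
    transition_counts[(s, a)][s_next] += 1.0

def posterior_mean_T(s, a):
    """Expected transition distribution under the current belief."""
    counts = transition_counts[(s, a)]
    total = sum(counts.values())
    return {s_next: c / total for s_next, c in counts.items()}

observe_transition("s0", "a0", "s1")
observe_transition("s0", "a0", "s1")
observe_transition("s0", "a0", "s2")
print(posterior_mean_T("s0", "a0"))   # {'s1': 0.6, 's2': 0.4}
```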