Review Session #4 Flashcards
True or False: Since SARSA is an “online” algorithm, it will never learn the optimal policy for an environment.
False. As noted: “online” does not matter, SARSA converges with infinite exploration at the limit and a greedy policy.
True or False: RL with linear function approximation only works on toy problems.
False, by doing feature engineering can be used broadly. Add non-linear features and you can cover a lot of problems.
True or False: KWIK algorithms are an improvement of MB (mistake bound) algorithms and as such will have same or better time complexity for all classes of problems.
False: “However, some hypothesis classes are exponentially
harder to learn in the KWIK setting than
in the MB setting. An example is conjunctions of n
Boolean variables, in which MB algorithms can guess
“false” when uncertain and learn with n+1 mistakes,
but a KWIK algorithm may need
(2n/2) [I don’t know]s to acquire
the negative examples required to capture the
target hypothesis.”
True or False: With a classic update using linear function approximation, we will always converge to some values, but they may not be optimal.
False we will always converge to a global optima but the linear function approximation may not be suitable for non-linear domains. Baird’s star problem is not guaranteed to converge.
True or False: Residual algorithms are always guaranteed to converge, but often do so too slowly to be practical.
True: Depending on the choice of psi, the residual algorithm shows this behavior. Residual algorithms are a generalization of residual gradient algorithms.
True or False: Applying generalization with an “averager” on an MDP results in another MDP.
True, any generalization of an MDP would result in another MDP.
True or False: Problems that can be represented as POMDPs cannot be represented as MDPs.
False. You can go from an POMDP over some state-space to a larger MDP over a belief-space.
True or False: The only algorithms that work in POMDPs are planning algorithms.
False: Lots of algorithms will work here that involve using the POMDP: qlearning, sarsa, etc… Certain POMDPs could even be solved by using the PSR. Which I thought was a good example, but, not mentioned in this thread.
Noting they are undecidable for POMDPs in general is a fine answer as well.
True or False: We can represent any POMDP as a PSR, however some classes of POMDPs will be too large to deal with as PSRs.
False. Lot of algorithms will work here. PSR as noted is a great counter example.
PSR theorems shows that any POMDP can be represented as a PSR with no more than n tests. So they can be reasonably converted to PSRs.