Review Session #4 Flashcards

Question 1

Q

True or False: Since SARSA is an “online” algorithm, it will never learn the optimal policy for an environment.

Answer

A

False. As noted: “online” does not matter, SARSA converges with infinite exploration at the limit and a greedy policy.

Question 2

Q

True or False: RL with linear function approximation only works on toy problems.

Answer

A

False, by doing feature engineering can be used broadly. Add non-linear features and you can cover a lot of problems.

Question 3

Q

True or False: KWIK algorithms are an improvement of MB (mistake bound) algorithms and as such will have same or better time complexity for all classes of problems.

Answer

A

False: “However, some hypothesis classes are exponentially
harder to learn in the KWIK setting than
in the MB setting. An example is conjunctions of n
Boolean variables, in which MB algorithms can guess
“false” when uncertain and learn with n+1 mistakes,
but a KWIK algorithm may need
(2n/2) [I don’t know]s to acquire
the negative examples required to capture the
target hypothesis.”

Question 4

Q

True or False: With a classic update using linear function approximation, we will always converge to some values, but they may not be optimal.

Answer

A

False we will always converge to a global optima but the linear function approximation may not be suitable for non-linear domains. Baird’s star problem is not guaranteed to converge.

Question 5

Q

True or False: Residual algorithms are always guaranteed to converge, but often do so too slowly to be practical.

Answer

A

True: Depending on the choice of psi, the residual algorithm shows this behavior. Residual algorithms are a generalization of residual gradient algorithms.

Question 6

Q

True or False: Applying generalization with an “averager” on an MDP results in another MDP.

Answer

A

True, any generalization of an MDP would result in another MDP.

Question 7

Q

True or False: Problems that can be represented as POMDPs cannot be represented as MDPs.

Answer

A

False. You can go from an POMDP over some state-space to a larger MDP over a belief-space.

Question 8

Q

True or False: The only algorithms that work in POMDPs are planning algorithms.

Answer

A

False: Lots of algorithms will work here that involve using the POMDP: qlearning, sarsa, etc… Certain POMDPs could even be solved by using the PSR. Which I thought was a good example, but, not mentioned in this thread.

Noting they are undecidable for POMDPs in general is a fine answer as well.

Question 9

Q

True or False: We can represent any POMDP as a PSR, however some classes of POMDPs will be too large to deal with as PSRs.

Answer

A

False. Lot of algorithms will work here. PSR as noted is a great counter example.

PSR theorems shows that any POMDP can be represented as a PSR with no more than n tests. So they can be reasonably converted to PSRs.

Review Session #4 Flashcards

(9 cards)