CS7642_Week8 Flashcards
From an algorithmic perspective, what makes RL hard?
- Delayed reward (i.e. weak feedback signals)
- Bootstrapping (the need for exploration)
- # of states, # of actions (the state space grows massive quickly)
What is temporal abstraction?
Enlarging the action set with temporally extended actions, so a single "action" can span many time steps. Think about navigating to the bathroom in a house: instead of saying take 30 steps to the right then 10 steps to the left, we might just say go to the first doorway, walk down the hallway, and open the door on your left.
Using temporal abstraction, backing up values is slower than it otherwise would be? (True/False)
False. If we're backing up over fewer states/actions, the information propagates back to earlier states more quickly.
An option generalizes the idea of an action? (True/False)
True. An option is atomic in the sense that it's a wrapper the agent can invoke like a single action: it runs its own subroutine (an internal policy) and returns the reward gathered along the way, even though the number of time steps it takes may be variable.
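A minimal sketch of the options idea this describes (the ⟨I, π, β⟩ triple from Sutton, Precup & Singh); the `env.step` interface below is a hypothetical stand-in, not something from the lectures:

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Set

State = int
Action = int

@dataclass
class Option:
    """An option <I, pi, beta>: a temporally extended action."""
    initiation_set: Set[State]             # I: states where the option may start
    policy: Dict[State, Action]            # pi: primitive action to take in each state
    termination: Callable[[State], float]  # beta: probability of stopping in a state

def run_option(env, state: State, option: Option, gamma: float = 0.99):
    """Run the option's internal subroutine until beta terminates it.
    Returns the discounted reward gathered, the resulting state, and the
    (variable) number of steps taken -- the agent sees all of this as
    one atomic action."""
    total, discount, k = 0.0, 1.0, 0
    while True:
        action = option.policy[state]
        state, reward = env.step(state, action)  # assumed env interface
        total += discount * reward
        discount *= gamma
        k += 1
        if random.random() < option.termination(state):
            return total, state, k
```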
What is the difference between a regular MDP and an SMDP (semi-MDP)?
An SMDP allows actions to take a variable number of time steps.
We always have to treat MDPs and SMDPs differently? (True/False)
False. Once we’ve defined the proper form of the Bellman operator for the SMDP, we can essentially treat the SMDP as we would any regular MDP.
The number of steps you take in an SMDP doesn't matter? (True/False)
False. Duration matters: reward is discounted over however long each option runs. This is why we define SMDPs with a variation of the Bellman operator, so that we can treat the SMDP like an MDP even though the time it takes to accumulate reward within each option is variable.
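For reference, the option-value Bellman equation takes this form (standard in the options/SMDP framework), where k is the option's variable duration:

```latex
Q(s, o) = \mathbb{E}\!\left[ r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{k-1} r_{t+k} + \gamma^{k} \max_{o' \in \mathcal{O}} Q(s_{t+k}, o') \;\middle|\; s_t = s,\ o \right]
```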
An SMDP will converge to the optimal solution regardless of the options we design? (True/False)
False. This gives us the idea of hierarchical optimality. We can guarantee optimality with respect to the options that we have, but we can’t guarantee that particular set of options is in fact optimal.
Monte Carlo Tree Search isn’t useful for large state spaces? (True/False)
False. Handling large state spaces is one of its principal benefits: planning time in MCTS is independent of the size of the state space. The trade-off is that you need lots of samples, and the running time grows exponentially in the depth of the lookahead (the horizon).
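A compact UCT-style MCTS sketch for intuition (it assumes a hypothetical deterministic `env` exposing `actions`, `step`, `is_terminal`, and a terminal `reward`; a sketch, not a definitive implementation). Note that the rollout in step 3 never enumerates states, which is why planning time depends on the horizon rather than on |S|:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # running mean of rollout returns

def ucb_child(node, c=1.4):
    """Action whose child maximizes UCB1 (exploitation + exploration)."""
    return max(
        node.children,
        key=lambda a: node.children[a].value
        + c * math.sqrt(math.log(node.visits) / node.children[a].visits),
    )

def mcts(root_state, env, n_iter=1000):
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend via UCB1 while fully expanded
        while node.children and len(node.children) == len(env.actions(node.state)):
            node = node.children[ucb_child(node)]
        # 2. Expansion: add the successor of one untried action
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried and not env.is_terminal(node.state):
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: random rollout to a terminal state
        state = node.state
        while not env.is_terminal(state):
            state = env.step(state, random.choice(env.actions(state)))
        ret = env.reward(state)
        # 4. Backpropagation: update running means along the path
        while node is not None:
            node.visits += 1
            node.value += (ret - node.value) / node.visits
            node = node.parent
    # Recommend the most-visited root action
    return max(root.children, key=lambda a: root.children[a].visits)
```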
What are the three types of abstraction?
- Temporal abstraction
- Goal abstraction
- State abstraction
In a 2-player zero-sum deterministic game of perfect information, minimax == maximin? (True/False)
True.
In a 2-player zero-sum deterministic game of perfect information, there is never an optimal pure strategy for each player? (True/False)
False. There is always an optimal pure strategy in a 2-player zero-sum deterministic game of perfect information.
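A tiny illustration with a hypothetical payoff matrix (values made up): when a pure saddle point exists, as it always does for these games, the order in which the players commit doesn't change the value, so neither player benefits from randomizing:

```python
def maximin(M):
    """Row (max) player commits first: guarantees max_r min_c M[r][c]."""
    return max(min(row) for row in M)

def minimax(M):
    """Column (min) player commits first: concedes min_c max_r M[r][c]."""
    return min(max(col) for col in zip(*M))

# Payoffs to the row player; M[0][1] = 3 is a pure saddle point
# (the minimum of its row and the maximum of its column).
M = [[7, 3],
     [-1, 2]]
assert maximin(M) == minimax(M) == 3
```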
In a 2-player zero-sum non-deterministic game of perfect information, minimax != maximin? (True/False)
False. Adding chance nodes doesn't break the equivalence; as long as information is perfect, minimax == maximin (in expectation).
In a 2-player zero-sum non-deterministic game of hidden information, minimax != maximin? (True/False)
True. This is where the minimax/maximin equivalence breaks down for zero-sum games, because von Neumann's theorem no longer applies in the presence of imperfect information.
When plotting the expected payoffs of a mixed strategy against each of the opponent's strategies, the intersection is the optimal strategy? (True/False)
False. This is similar to the PWLC idea: plot each opponent strategy as a line over your mixing probability, and find the maximum minimum on the non-dominated piecewise surface (the maximum of the lower envelope). That maximin point, not any particular intersection, gives the optimal strategy.
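A sketch of that computation for a hypothetical 2-row zero-sum game (matrix values made up): build the opponent's payoff lines over your mixing probability p and maximize the lower envelope. In this example the maximum happens to fall at a crossing, but it is the envelope maximum, not the intersection per se, that defines the optimum:

```python
import numpy as np

# Hypothetical payoffs to the row (max) player; columns are the
# opponent's strategies.
M = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])

# With probability p on row 0, each opponent column j defines a line:
#   f_j(p) = p * M[0, j] + (1 - p) * M[1, j]
# The guaranteed payoff is the lower envelope min_j f_j(p); the optimal
# mixed strategy maximizes that envelope (the maximin point).
p = np.linspace(0.0, 1.0, 10001)
lines = p[:, None] * M[0] + (1 - p)[:, None] * M[1]   # shape (10001, 2)
envelope = lines.min(axis=1)
best = envelope.argmax()
print(f"p* ~ {p[best]:.3f}, value ~ {envelope[best]:.3f}")  # ~0.400, ~0.200
```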