CS7642_Week8 Flashcards
From an algorithmic perspective, what makes RL hard?
- Delayed reward (i.e. weak feedback signals)
- Bootstrapping (the need for exploration)
- # of states, # of actions (the state space grows massive quickly)
What is temporal abstraction?
Enlarging the action set with temporally extended actions, so a single "action" can span many time steps. Think about navigating to the bathroom in a house: instead of saying take 30 steps to the right then 10 steps to the left, we might just say go to the first doorway, walk down the hallway, and open the door on your left.
Using temporal abstraction, backing up values is slower than it otherwise would be? (True/False)
False. If we're backing up over fewer states/actions, the information propagates back to earlier states more quickly.
An option generalizes the idea of an action? (True/False)
True. An option is atomic in the sense that it's a wrapper the agent can invoke like a single action: it runs its own subroutine (an internal policy) and returns the reward gathered along the way, even though the number of time steps it takes may be variable.
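A minimal sketch of the options idea this describes (the ⟨I, π, β⟩ triple from Sutton, Precup & Singh); the `env.step` interface below is a hypothetical stand-in, not something from the lectures:

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Set

State = int
Action = int

@dataclass
class Option:
    """An option <I, pi, beta>: a temporally extended action."""
    initiation_set: Set[State]             # I: states where the option may start
    policy: Dict[State, Action]            # pi: primitive action to take in each state
    termination: Callable[[State], float]  # beta: probability of stopping in a state

def run_option(env, state: State, option: Option, gamma: float = 0.99):
    """Run the option's internal subroutine until beta terminates it.
    Returns the discounted reward gathered, the resulting state, and the
    (variable) number of steps taken -- the agent sees all of this as
    one atomic action."""
    total, discount, k = 0.0, 1.0, 0
    while True:
        action = option.policy[state]
        state, reward = env.step(state, action)  # assumed env interface
        total += discount * reward
        discount *= gamma
        k += 1
        if random.random() < option.termination(state):
            return total, state, k
```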
What is the difference between a regular MDP and an SMDP (semi-MDP)?
An SMDP allows actions to take a variable number of time steps.
We always have to treat MDPs and SMDPs differently? (True/False)
False. Once we’ve defined the proper form of the Bellman operator for the SMDP, we can essentially treat the SMDP as we would any regular MDP.
The number of steps you take in an SMDP doesn't matter? (True/False)
False. Duration matters: reward is discounted over however long each option runs. This is why we define SMDPs with a variation of the Bellman operator, so that we can treat the SMDP like an MDP even though the time it takes to accumulate reward within each option is variable.
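For reference, the option-value Bellman equation takes this form (standard in the options/SMDP framework), where k is the option's variable duration:

```latex
Q(s, o) = \mathbb{E}\!\left[ r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{k-1} r_{t+k} + \gamma^{k} \max_{o' \in \mathcal{O}} Q(s_{t+k}, o') \;\middle|\; s_t = s,\ o \right]
```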
An SMDP will converge to the optimal solution regardless of the options we design? (True/False)
False. This gives us the idea of hierarchical optimality. We can guarantee optimality with respect to the options that we have, but we can’t guarantee that particular set of options is in fact optimal.
Monte Carlo Tree Search isn’t useful for large state spaces? (True/False)
False. Handling large state spaces is one of its principal benefits: planning time in MCTS is independent of the size of the state space. The trade-off is that you need lots of samples, and the running time grows exponentially in the depth of the lookahead (the horizon).
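A compact UCT-style MCTS sketch for intuition (it assumes a hypothetical deterministic `env` exposing `actions`, `step`, `is_terminal`, and a terminal `reward`; a sketch, not a definitive implementation). Note that the rollout in step 3 never enumerates states, which is why planning time depends on the horizon rather than on |S|:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # running mean of rollout returns

def ucb_child(node, c=1.4):
    """Action whose child maximizes UCB1 (exploitation + exploration)."""
    return max(
        node.children,
        key=lambda a: node.children[a].value
        + c * math.sqrt(math.log(node.visits) / node.children[a].visits),
    )

def mcts(root_state, env, n_iter=1000):
    root = Node(root_state)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend via UCB1 while fully expanded
        while node.children and len(node.children) == len(env.actions(node.state)):
            node = node.children[ucb_child(node)]
        # 2. Expansion: add the successor of one untried action
        untried = [a for a in env.actions(node.state) if a not in node.children]
        if untried and not env.is_terminal(node.state):
            a = random.choice(untried)
            node.children[a] = Node(env.step(node.state, a), parent=node)
            node = node.children[a]
        # 3. Simulation: random rollout to a terminal state
        state = node.state
        while not env.is_terminal(state):
            state = env.step(state, random.choice(env.actions(state)))
        ret = env.reward(state)
        # 4. Backpropagation: update running means along the path
        while node is not None:
            node.visits += 1
            node.value += (ret - node.value) / node.visits
            node = node.parent
    # Recommend the most-visited root action
    return max(root.children, key=lambda a: root.children[a].visits)
```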
What are the three types of abstraction?
- Temporal abstraction
- Goal abstraction
- State abstraction
In a 2-player zero-sum deterministic game of perfect information, minimax == maximin? (True/False)
True.
In a 2-player zero-sum deterministic game of perfect information, there is never an optimal pure strategy for each player? (True/False)
False. There is always an optimal pure strategy in a 2-player zero-sum deterministic game of perfect information.
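A tiny illustration with a hypothetical payoff matrix (values made up): when a pure saddle point exists, as it always does for these games, the order in which the players commit doesn't change the value, so neither player benefits from randomizing:

```python
def maximin(M):
    """Row (max) player commits first: guarantees max_r min_c M[r][c]."""
    return max(min(row) for row in M)

def minimax(M):
    """Column (min) player commits first: concedes min_c max_r M[r][c]."""
    return min(max(col) for col in zip(*M))

# Payoffs to the row player; M[0][1] = 3 is a pure saddle point
# (the minimum of its row and the maximum of its column).
M = [[7, 3],
     [-1, 2]]
assert maximin(M) == minimax(M) == 3
```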
In a 2-player zero-sum non-deterministic game of perfect information, minimax != maximin? (True/False)
False. Adding chance nodes doesn't break the equivalence; as long as information is perfect, minimax == maximin (in expectation).
In a 2-player zero-sum non-deterministic game of hidden information, minimax != maximin? (True/False)
True. This is where the minimax/maximin equivalence breaks down for zero-sum games, because von Neumann's theorem no longer applies in the presence of imperfect information.
When plotting the expected payoffs of a mixed strategy against each of the opponent's strategies, the intersection is the optimal strategy? (True/False)
False. This is similar to the PWLC idea: plot each opponent strategy as a line over your mixing probability, and find the maximum minimum on the non-dominated piecewise surface (the maximum of the lower envelope). That maximin point, not any particular intersection, gives the optimal strategy.
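A sketch of that computation for a hypothetical 2-row zero-sum game (matrix values made up): build the opponent's payoff lines over your mixing probability p and maximize the lower envelope. In this example the maximum happens to fall at a crossing, but it is the envelope maximum, not the intersection per se, that defines the optimum:

```python
import numpy as np

# Hypothetical payoffs to the row (max) player; columns are the
# opponent's strategies.
M = np.array([[ 2.0, -1.0],
              [-1.0,  1.0]])

# With probability p on row 0, each opponent column j defines a line:
#   f_j(p) = p * M[0, j] + (1 - p) * M[1, j]
# The guaranteed payoff is the lower envelope min_j f_j(p); the optimal
# mixed strategy maximizes that envelope (the maximin point).
p = np.linspace(0.0, 1.0, 10001)
lines = p[:, None] * M[0] + (1 - p)[:, None] * M[1]   # shape (10001, 2)
envelope = lines.min(axis=1)
best = envelope.argmax()
print(f"p* ~ {p[best]:.3f}, value ~ {envelope[best]:.3f}")  # ~0.400, ~0.200
```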