CS7642_Readings Flashcards
Any finite-length, discrete-time MDP can be converted to a TTD-MDP? (True/False)
True. This is from the TTD-MDP paper in the week 10 readings.
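A rough sketch of the construction (notation assumed here, loosely following the TTD-MDP papers): the TTD-MDP's "states" are the MDP's partial trajectories, which form a tree for a finite horizon L, and the reward function is replaced by a target distribution P(T) over complete trajectories.

```latex
\underbrace{\langle S, A, P, R \rangle}_{\text{finite-horizon MDP}}
\;\longrightarrow\;
\underbrace{\langle \mathcal{T}, A, P', P(\mathcal{T}) \rangle}_{\text{TTD-MDP}},
\qquad
\mathcal{T} = \{\, (s_0, s_1, \ldots, s_k) : 0 \le k \le L \,\}
```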
It is always possible to find a stochastic policy that exactly solves a TTD-MDP? (True/False)
False. Not every target trajectory distribution is achievable, because the environment's transition probabilities constrain what any policy can produce; for example, if the only available action leads to two trajectories with probability 0.5 each, no policy can realize a 0.9/0.1 target. See Bhat et al. (2007).
In policy shaping, the most important parameter for learning is gamma? (True/False)
False. The most important parameter for learning in policy shaping is the probability that a human's evaluation of an action choice is correct for a given learner, denoted C.
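A minimal sketch of why C dominates, assuming the Bayes-optimal feedback rule from Griffith et al. (2013): for each action, the probability that it is optimal given Δ = (# "good" labels) − (# "bad" labels) is C^Δ / (C^Δ + (1−C)^Δ), and this is multiplied with the agent's own action distribution. The function name and example numbers below are illustrative, not from the paper.

```python
import numpy as np

def policy_shaping_combine(agent_probs, delta, C):
    """Combine an agent's action distribution with human feedback.

    agent_probs : the learner's own P(a) over actions (e.g., from Q-values)
    delta       : per-action (# "good" labels) - (# "bad" labels)
    C           : probability a single piece of feedback is correct --
                  the key policy-shaping parameter.
    """
    agent_probs = np.asarray(agent_probs, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # P(action is optimal | feedback) under the Bayes-optimal feedback model
    feedback_probs = C**delta / (C**delta + (1.0 - C)**delta)
    combined = agent_probs * feedback_probs
    return combined / combined.sum()

# C = 0.5 makes feedback uninformative; C near 1 lets a few labels dominate.
print(policy_shaping_combine([0.25, 0.25, 0.25, 0.25], [3, -1, 0, 0], C=0.5))
print(policy_shaping_combine([0.25, 0.25, 0.25, 0.25], [3, -1, 0, 0], C=0.95))
```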
An oracle is always a better teacher than a human in policy shaping? (True/False)
False. Cederborg et al. (2015) report that this is not the case.
Like MDPs, every Markov game has a non-empty set of optimal policies, at least one of which is stationary? (True/False)
True. See Littman (1994).
There is always at least one optimal stationary deterministic policy for a Markov game? (True/False)
False. Only regular (single-agent) MDPs make this guarantee. In a Markov game the optimal policy may need to be stochastic, since a deterministic policy can be exploited by the opponent in a zero-sum game such as matching pennies (see the sketch below).
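A small sketch of the standard counterexample (matching pennies, a zero-sum game of the kind Littman (1994) considers); the payoff matrix and helper below are illustrative.

```python
import numpy as np

# Matching pennies, row player's payoff: the opponent best-responds,
# so a policy is judged by its worst-case expected payoff.
R = np.array([[+1, -1],
              [-1, +1]])

def worst_case_value(policy):
    """Worst-case expected payoff of a (possibly stochastic) row policy."""
    return min(float(policy @ R[:, j]) for j in range(R.shape[1]))

print(worst_case_value(np.array([1.0, 0.0])))  # deterministic: -1.0 (exploitable)
print(worst_case_value(np.array([0.5, 0.5])))  # uniform mixed:  0.0 (minimax-optimal)
```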