Review Session #5 Flashcards
True or False: Options over an MDP form another MDP.
False. Options are temporally extended actions, so an MDP together with a set of options forms a semi-Markov decision process (SMDP), not another MDP.
True or False: Nash equilibria can only be pure, not mixed.
False. Nash equilibria can be mixed. Pure strategies are just the subset of mixed strategies in which one action is played with probability 1.
True or False: An optimal pure strategy does not necessarily exist for a two-player, zero-sum finite deterministic game with perfect information.
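False. In a two-player, zero-sum, finite, deterministic game with perfect information, minimax equals maximin and an optimal pure strategy always exists. Rock Paper Scissors has no optimal pure strategy, but its moves are simultaneous, so it is not a game of perfect information.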
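As a rough illustration of the Rock Paper Scissors case, the sketch below checks that no single fixed action guarantees the game's value, while a uniform mixed strategy does. The payoff matrix and the function names are assumptions made for this example only.

```python
# Row player's payoff in Rock Paper Scissors (rows and columns ordered R, P, S).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def pure_maximin(matrix):
    # Best payoff the row player can guarantee with one fixed action.
    return max(min(row) for row in matrix)

def pure_minimax(matrix):
    # Lowest payoff the column player can hold the row player to with one fixed action.
    return min(max(col) for col in zip(*matrix))

def expected_payoff(matrix, row_mix, col_mix):
    # Expected row-player payoff when both players randomize independently.
    return sum(
        p * q * matrix[i][j]
        for i, p in enumerate(row_mix)
        for j, q in enumerate(col_mix)
    )

uniform = [1 / 3, 1 / 3, 1 / 3]
# Payoff of the uniform mix against each pure action of the opponent.
against_pure = [
    expected_payoff(PAYOFF, uniform, [1.0 if j == k else 0.0 for j in range(3)])
    for k in range(3)
]
```

Here the pure maximin (-1) and pure minimax (1) disagree, so there is no pure-strategy saddle point, while the uniform mix earns 0 against every pure reply.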
True or False: The “folk theorem” states that the notion of threats can stabilize payoff profiles in one-shot games.
False. The “folk theorem” applies to infinitely repeated games, where the threat of future punishment can stabilize payoff profiles. A one-shot game has no future rounds, so threats carry no weight and that stabilization won’t occur.
True or False: If following the repeated game strategy “Pavlov”, we will cooperate until our opponent defects. Once an opponent defects we defect forever.
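False. “Pavlov” starts by cooperating, then cooperates whenever both players made the same move in the previous round and defects whenever their moves differed. It resembles tit-for-tat, which simply copies the opponent’s last move, but is not the same. The stated strategy is actually “Grim Trigger”.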
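To make the difference between these repeated-game strategies concrete, here is a minimal sketch for the iterated Prisoner’s Dilemma. It assumes the usual definitions (tit-for-tat copies the opponent’s last move, Grim Trigger defects forever after one defection, Pavlov cooperates only when both players agreed last round). The function names and the `play` helper are hypothetical.

```python
C, D = "C", "D"

def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return C if not theirs else theirs[-1]

def grim_trigger(mine, theirs):
    # Cooperate until the opponent defects once, then defect forever.
    return D if D in theirs else C

def pavlov(mine, theirs):
    # Cooperate first; afterwards cooperate only if both players
    # chose the same move in the previous round.
    if not mine:
        return C
    return C if mine[-1] == theirs[-1] else D

def play(strategy, opponent_moves):
    # Run a strategy against a fixed sequence of opponent moves
    # and return our moves as a string.
    mine, theirs = [], []
    for opp in opponent_moves:
        mine.append(strategy(mine, theirs))
        theirs.append(opp)
    return "".join(mine)

opponent = "CDDCC"
results = {s.__name__: play(s, opponent) for s in (tit_for_tat, grim_trigger, pavlov)}
```

Against the opponent sequence CDDCC the three strategies all diverge: Grim Trigger never forgives, tit-for-tat forgives one round after the opponent returns to cooperation, and Pavlov returns to cooperation right after a round of mutual defection.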
True or False: Correlated equilibria rely on coordination, like side payments.
False: Cooperative-Competitive (CoCo) values rely on coordination through side payments. Correlated equilibria rely on rationality constraints and a shared signal (shared Q-tables in Correlated-Q), without side payments.
True or False: “Subgame perfect” means that every stage of a multistage game has a Nash equilibrium.
False: Subgame perfect means that the strategy profile is a Nash equilibrium in every subgame, including subgames that are never reached in actual play.
True or False: Inverse RL means that we invert the reward function before putting an agent in an environment.
False: Inverse RL means that we infer the reward function from the observed behavior of an agent in the environment.
True or False: DEC-POMDPs include communication, this communication allows agents to plan.
True: In a DEC-POMDP (decentralized POMDP), agents can communicate as they act toward a shared goal, instead of one agent dictating all the actions.
True or False: Policy shaping requires a completely correct oracle to give the RL agent advice.
False: Policy shaping can use imperfect advice, with confidence values indicating how accurate the advice is.
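One way to picture advice with a confidence value is the sketch below. It assumes a formulation in the style of the Advise algorithm: the oracle is right with probability `consistency` (strictly between 0 and 1), and `delta` is the number of positive labels minus negative labels an action has received. The function names are hypothetical and this is an illustration, not a definitive implementation.

```python
def feedback_prob(delta, consistency):
    # Probability that an action is optimal given net feedback `delta`
    # from an oracle that is correct with probability `consistency`.
    # Assumes 0 < consistency < 1.
    c = consistency
    return c ** delta / (c ** delta + (1 - c) ** delta)

def combine(agent_probs, advice_probs):
    # Multiply the agent's own action distribution by the advice
    # distribution and renormalize so the result sums to 1.
    products = [a * f for a, f in zip(agent_probs, advice_probs)]
    total = sum(products)
    return [p / total for p in products]

# Two actions: the first got two net "good" labels, the second two net "bad" labels.
advice = [feedback_prob(2, 0.8), feedback_prob(-2, 0.8)]
shaped = combine([0.5, 0.5], advice)
```

With `consistency` at 0.5 the advice is uninformative and `feedback_prob` returns 0.5 for any `delta`, so the agent falls back on its own policy; higher values make the same feedback count for more.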