Review Session #5 Flashcards

1
Q

True or False: Options over an MDP form another MDP.

A

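True. Options generalize the primitive actions of an MDP into temporally extended actions, so planning over them yields another decision process. Strictly speaking that process is a semi-MDP (SMDP), since an option can last a variable number of steps.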

2
Q

True or False: Nash equilibria can only be pure, not mixed.

A

False. Nash equilibria can be mixed. Pure strategies are just the subset of mixed strategies that put probability 1 on a single action.
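
To make the pure-versus-mixed distinction concrete, here is a minimal sketch using Matching Pennies. The game and the helper names (`expected_row_payoff`, `is_nash`) are illustrative assumptions, not part of the original card. It checks that no pure profile is an equilibrium while the 50/50 mix is.

```python
from itertools import product

# Row player's payoffs in Matching Pennies (an assumed example game).
# The game is zero-sum, so the column player's payoff is the negative.
ROW_PAYOFF = [[1, -1],
              [-1, 1]]

def expected_row_payoff(p, q):
    """Expected row payoff when row plays Heads w.p. p and column plays Heads w.p. q."""
    row_mix = [p, 1 - p]
    col_mix = [q, 1 - q]
    return sum(row_mix[i] * col_mix[j] * ROW_PAYOFF[i][j]
               for i, j in product(range(2), range(2)))

def is_nash(p, q, tol=1e-9):
    """True if neither player gains by deviating to a pure strategy.

    Checking pure deviations suffices because expected payoff is linear
    in each player's own mixing probability.
    """
    value = expected_row_payoff(p, q)
    row_ok = all(expected_row_payoff(d, q) <= value + tol for d in (0.0, 1.0))
    col_ok = all(-expected_row_payoff(p, d) <= -value + tol for d in (0.0, 1.0))
    return row_ok and col_ok

# A pure strategy is the special case p (or q) equal to 0 or 1.
pure_profiles = [(p, q) for p in (0.0, 1.0) for q in (0.0, 1.0)]
pure_results = [is_nash(p, q) for p, q in pure_profiles]
mixed_result = is_nash(0.5, 0.5)
print(pure_results, mixed_result)
```

Writing a pure strategy as `p = 0.0` or `p = 1.0` shows it is just a degenerate mix, which is why the same `is_nash` check covers both cases.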

3
Q

True or False: An optimal pure strategy does not necessarily exist for a two-player, zero-sum finite deterministic game with perfect information.

A

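False. In a two-player, zero-sum, finite, deterministic game with perfect information, minimax equals maximin and an optimal pure strategy always exists for each player. Rock Paper Scissors has no optimal pure strategy, but it is not a counterexample: the moves are simultaneous, so each player's choice is hidden from the other.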
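
The Rock Paper Scissors claim can be checked directly. The sketch below, with hypothetical helper names, compares the best guaranteed payoff over pure strategies for each side; a gap between the two means no pure saddle point exists, while a uniform mix guarantees the game's value of 0.

```python
# Row player's payoff; rows and columns are ordered Rock, Paper, Scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

def pure_maximin(payoff):
    """Best payoff the row player can guarantee with a single pure action."""
    return max(min(row) for row in payoff)

def pure_minimax(payoff):
    """Lowest payoff the column player can hold the row player to with a pure action."""
    columns = list(zip(*payoff))
    return min(max(col) for col in columns)

def worst_case_for_mix(payoff, row_mix):
    """Worst expected payoff of a row mix against every pure column reply."""
    n_cols = len(payoff[0])
    return min(sum(p * row[j] for p, row in zip(row_mix, payoff))
               for j in range(n_cols))

lower = pure_maximin(RPS)   # -1: any pure move can be beaten
upper = pure_minimax(RPS)   # 1: any pure reply can be beaten too
uniform = worst_case_for_mix(RPS, [1 / 3, 1 / 3, 1 / 3])
print(lower, upper, uniform)
```

Because `lower` is strictly below `upper`, neither player has a pure strategy that is safe against every reply, which is the sense in which Rock Paper Scissors lacks a pure optimal strategy.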

4
Q

True or False: The “folk theorem” states that the notion of threats can stabilize payoff profiles in one-shot games.

A

False. The “folk theorem” applies to repeated games, where the threat of future punishment can stabilize payoff profiles. A one-shot game has no future rounds, so threats carry no weight and that stabilization does not occur.

5
Q

True or False: If following the repeated game strategy “Pavlov”, we will cooperate until our opponent defects. Once an opponent defects we defect forever.

A

False. The stated strategy is “Grim Trigger”. “Pavlov” cooperates when both players made the same move in the previous round and defects when they disagreed. Like tit-for-tat it reacts only to the last round, so unlike Grim Trigger it can return to cooperation.
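
The three repeated-game strategies named on this card can be sketched side by side. This is a minimal illustration under assumed conventions (moves encoded as `"C"` and `"D"`, every strategy cooperating on the first round, and a hypothetical `play` helper), not a reference implementation.

```python
C, D = "C", "D"  # cooperate / defect

def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return C if not their_history else their_history[-1]

def grim_trigger(my_history, their_history):
    """Cooperate until the opponent defects once, then defect forever."""
    return D if D in their_history else C

def pavlov(my_history, their_history):
    """Cooperate if both players made the same move last round, else defect."""
    if not my_history:
        return C
    return C if my_history[-1] == their_history[-1] else D

def play(strategy, opponent_moves):
    """Run a strategy against a fixed sequence of opponent moves."""
    my_history, their_history = [], []
    for their_move in opponent_moves:
        my_history.append(strategy(my_history, their_history))
        their_history.append(their_move)
    return "".join(my_history)

opponent = [C, D, D, C, C]
print(play(tit_for_tat, opponent))   # CCDDC
print(play(grim_trigger, opponent))  # CCDDD
print(play(pavlov, opponent))        # CCDCC
```

Against this opponent all three strategies diverge: Grim Trigger never forgives, tit-for-tat mirrors with a one-round lag, and Pavlov returns to cooperation as soon as both players defected together.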

6
Q

True or False: Correlated equilibria rely on coordination, like side payments.

A

False. Cooperative-Competitive (CoCo) values rely on coordination through side payments. Correlated equilibria rely on rationality constraints and shared Q-tables, without explicit coordination such as side payments.

7
Q

True or False: “Subgame perfect” means that every stage of a multistage game has a Nash equilibrium.

A

False. Subgame perfect means that the strategy profile is a Nash equilibrium in every subgame, including subgames that are never reached in actual play.

8
Q

True or False: Inverse RL means that we invert the reward function before putting an agent in an environment.

A

False. Inverse RL means inferring the reward function from the observed behavior of an agent in the environment.

9
Q

True or False: DEC-POMDPs include communication, this communication allows agents to plan.

A

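True, with a caveat. In a DEC-POMDP (decentralized POMDP), each agent chooses its own actions from its own observations toward a shared goal, instead of one agent dictating all the actions. The basic model has no dedicated communication channel; communication is expressed through actions and observations, and agents can plan around it.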

10
Q

True or False: Policy shaping requires a completely correct oracle to give the RL agent advice.

A

False. Policy shaping can use imperfect advice; the feedback comes with a confidence value indicating how likely it is to be accurate.
