C7 Flashcards

Question 1

Q

Why is there so much interest in multi-agent reinforcement learning?

Answer

A

Multi-agent settings are much more realistic, because real-world problems also involve multiple interacting agents.

Question 2

Q

What are the 3 main challenges of multi-agent reinforcement learning?

Answer

A

partial observability
nonstationary environments
large state space

Question 3

Q

What is a Nash strategy?

Answer

A

A strategy with which the agent is guaranteed to do no worse than a tie against any opponent strategy.

Question 4

Q

What is the Nash equilibrium?

Answer

A

A situation where no agent has anything to gain by unilaterally changing its own strategy (minimax). The agent does not try to exploit flaws in the opponent’s strategy; it simply wins when the opponent makes mistakes.

Question 5

Q

What is a Pareto optimum?

Answer

A

The best possible outcome for us where we do not hurt others and others do not hurt us. It is a cooperative strategy.

Question 6

Q

What is the Pareto efficient solution?

Answer

A

The situation where no cooperative agent can be made better off without making at least one other agent worse off.

Question 7

Q

In a competitive multi-agent system, what algorithm can be used to calculate a Nash strategy?

Answer

A

Counterfactual Regret Minimization (CFR)

Question 8

Q

What makes it difficult to calculate the solution for a game of imperfect information?

Answer

A

Imperfect information increases the size of the state space, and computing all the unknown outcomes quickly becomes infeasible.

Question 9

Q

What is the Prisoner’s dilemma?

Answer

A

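Two prisoners each choose, without communicating, whether to confess or to stay silent:
if both prisoners confess, they both get 5 years in prison
if both stay silent, they both get 2 years in prison
if I confess and the other stays silent, I walk free and the other gets 10 years
if I stay silent and the other confesses, I get 10 years and the other walks free

This is an example of mixed behaviour.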

Question 10

Q

What are the Pareto and the Nash strategies in the Prisoner’s dilemma?

Answer

A

Pareto: both stay silent (cooperate)
Nash: both confess (defect)
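
The two answers above can be checked by brute force. The following is a minimal sketch, assuming the prison terms from the previous card and assuming that "walks free" means 0 years; the names `YEARS`, `is_nash` and `is_pareto_optimal` are illustrative and not from the course material. Note that, on a strict reading of the definition, the two lopsided outcomes also pass the Pareto check, since the prisoner who walks free cannot do any better; mutual silence is the Pareto-efficient outcome that both reach by cooperating.

```python
from itertools import product

ACTIONS = ("silent", "confess")

# Years in prison as (mine, other); fewer years is better.
# Assumption: "walks free" is modelled as 0 years.
YEARS = {
    ("silent", "silent"): (2, 2),
    ("silent", "confess"): (10, 0),
    ("confess", "silent"): (0, 10),
    ("confess", "confess"): (5, 5),
}


def is_nash(profile):
    """True if neither prisoner can shorten their own term by deviating alone."""
    a, b = profile
    mine, theirs = YEARS[profile]
    if any(YEARS[(alt, b)][0] < mine for alt in ACTIONS):
        return False
    if any(YEARS[(a, alt)][1] < theirs for alt in ACTIONS):
        return False
    return True


def is_pareto_optimal(profile):
    """True if no other outcome is at least as good for both and better for one."""
    current = YEARS[profile]
    for other in YEARS.values():
        no_worse = all(o <= c for o, c in zip(other, current))
        better = any(o < c for o, c in zip(other, current))
        if no_worse and better:
            return False
    return True


nash = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
pareto = [p for p in product(ACTIONS, repeat=2) if is_pareto_optimal(p)]
print("Nash:", nash)
print("Pareto efficient:", pareto)
```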

Question 11

Q

What is the Iterated Prisoner’s dilemma?

Answer

A

The Prisoner’s dilemma played repeatedly over multiple rounds. A tit-for-tat strategy works well here: in the first round you cooperate, and after that you play whatever the opponent did in the previous round.
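
Tit for tat is short enough to write out. The sketch below is illustrative only: it reuses the prison terms from the Prisoner's dilemma card (with "walks free" assumed to be 0 years), and the opponent `always_confess` and the helper `play` are hypothetical names introduced for the example.

```python
# Years in prison as (player A, player B); fewer years is better.
YEARS = {
    ("silent", "silent"): (2, 2),
    ("silent", "confess"): (10, 0),
    ("confess", "silent"): (0, 10),
    ("confess", "confess"): (5, 5),
}


def tit_for_tat(opponent_history):
    """Cooperate in the first round, then copy the opponent's previous move."""
    return "silent" if not opponent_history else opponent_history[-1]


def always_confess(opponent_history):
    """Hypothetical opponent that defects in every round."""
    return "confess"


def play(strategy_a, strategy_b, rounds=10):
    """Play the iterated game and return the total years for each player."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        # Each strategy only sees the other player's past moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        years_a, years_b = YEARS[(move_a, move_b)]
        total_a += years_a
        total_b += years_b
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b


print(play(tit_for_tat, tit_for_tat))        # both cooperate throughout
print(play(tit_for_tat, always_confess))     # exploited once, then retaliates
print(play(always_confess, always_confess))  # mutual defection
```

Against itself, tit for tat stays at mutual silence; against a pure defector it loses only the first round and then matches the defection.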

Question 12

Q

Name two multi-agent card games of imperfect information

Answer

A

poker, blackjack, bridge

Question 13

Q

Name three kinds of strategies that can occur in multi-agent reinforcement learning

Answer

A

CFR
evolutionary strategies
cooperative strategies

Question 14

Q

Counterfactual Regret Minimization

Answer

A

A statistical algorithm that converges to a Nash equilibrium through repeated self-play (the convergence guarantee holds for two-player zero-sum games). Unlike minimax, it is suitable for imperfect-information games.
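
Full CFR walks a game tree, which is too long for a flashcard. The sketch below shows only regret matching, the update rule that CFR applies at each decision point, on rock-paper-scissors, a game chosen here purely for illustration. All names (`payoff`, `strategy_from`, `train`) and the iteration count are assumptions of this example. In self-play the average strategy should drift toward the uniform Nash strategy of one third per action.

```python
import random

ACTIONS = ("rock", "paper", "scissors")


def payoff(a, b):
    """+1 if action index a beats b, -1 if it loses, 0 on a tie."""
    return (0, 1, -1)[(a - b) % 3]


def strategy_from(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(iterations=20000, seed=0):
    """Self-play with regret matching; returns each player's average strategy."""
    rng = random.Random(seed)
    regrets = [[0.0] * 3 for _ in range(2)]
    strategy_sums = [[0.0] * 3 for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from(r) for r in regrets]
        moves = [rng.choices(range(3), weights=s)[0] for s in strategies]
        for player in range(2):
            me, opponent = moves[player], moves[1 - player]
            for alt in range(3):
                # Regret: what the alternative would have earned minus what we earned.
                regrets[player][alt] += payoff(alt, opponent) - payoff(me, opponent)
                strategy_sums[player][alt] += strategies[player][alt]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average = train()
for player, probs in enumerate(average):
    print(player, dict(zip(ACTIONS, (round(p, 3) for p in probs))))
```

It is the average strategy over all iterations, not the final one, that approaches the equilibrium; the current strategy can keep cycling.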

Question 15

Q

swarm computing

Answer

A

Focuses on emergent behavior in decentralized, collective, self-organized systems, and introduces forms of communication between agents.

The emphasis is on cooperation and survival of the group (Pareto).

Question 16

Q

Name two solution methods that are appropriate for solving mixed strategy games

Answer

A

evolutionary methods and cooperative methods

Question 17

Q

evolutionary algorithms

Answer

A

Algorithms inspired by bio-genetic processes of reproduction: mutation, recombination and selection. Repeat:
1. Evaluate the fitness of each individual of the population
2. Select the fittest individuals for reproduction
3. Through crossover and mutation generate new individuals
4. Replace the least fit individuals by the new individuals

The focus is on competition and survival of the fittest (Nash).
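
The four steps above can be sketched in a few lines. This is a toy version under stated assumptions: genomes are bit lists, the fitness function is supplied by the caller (here simply the number of ones), and the population size, mutation rate and the name `evolve` are illustrative choices, not values from the course material.

```python
import random


def evolve(fitness, genome_length=20, population_size=30, generations=60, seed=0):
    """Toy evolutionary loop over bit-list genomes; returns the fittest individual."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_length)]
        for _ in range(population_size)
    ]
    n_parents = population_size // 2
    for _ in range(generations):
        # 1. Evaluate the fitness of each individual (sort best first).
        population.sort(key=fitness, reverse=True)
        # 2. Select the fittest individuals for reproduction.
        parents = population[:n_parents]
        # 3. Generate new individuals through crossover and mutation.
        children = []
        for _ in range(population_size - n_parents):
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.2:  # assumed mutation rate
                spot = rng.randrange(genome_length)
                child[spot] = 1 - child[spot]
            children.append(child)
        # 4. Replace the least fit individuals by the new individuals.
        population = parents + children
    return max(population, key=fitness)


best = evolve(fitness=sum)  # fitness = number of ones in the genome
print(best, sum(best))
```

Because the parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.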

Question 18

Q

What is regret?

Answer

A

The regret of an action is the amount of reward that an agent misses by not choosing the action with the highest payoff.
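
As a small worked example of this definition, the sketch below computes the regret of a chosen action from a table of payoffs. The payoff values and the function name `regret` are made up for illustration.

```python
def regret(payoffs, chosen):
    """Reward missed by playing `chosen` instead of the best available action."""
    return max(payoffs.values()) - payoffs[chosen]


# Hypothetical payoffs for three actions in one state.
payoffs = {"left": 1.0, "middle": 4.0, "right": 2.5}

print(regret(payoffs, "left"))    # 3.0: "middle" would have paid 3.0 more
print(regret(payoffs, "middle"))  # 0.0: the best action has no regret
```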

Question 19

Q

What 3 types of behaviour do we see in MARL?

Answer

A

cooperation, competition and mixed behaviour

Question 20

Q

What is Game Theory?

Answer

A

the study of strategic interaction among rational decision-making agents

Question 21

Q

Describe the behaviour that emerged in Hide and Seek

Answer

A

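Collaboration: agents on the same team learned to work together, for example hiders jointly moving boxes to build shelters.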

Question 22

Q

What is population-based training?

Answer

A

It combines evolutionary ideas with RL ideas.

Teams compete against each other, and a team that learns good behaviour survives: team-based learning of better and better behaviour.

Question 23

Q

Why is StarCraft used as a testbed for MARL?

Answer

A

to study population-based training: there is collaboration within teams and competition between teams

(23 cards)