C7 Flashcards
Why is there so much interest in multi-agent reinforcement learning?
multi-agent settings are more realistic, because most real-world problems involve multiple agents interacting in the same environment
What are the 3 main challenges of multi-agent reinforcement learning?
- partial observability
- nonstationary environments
- large state space
what is a Nash strategy?
a strategy that guarantees the agent does no worse than a tie against any opponent strategy (the guarantee applies to two-player zero-sum games)
what is the Nash equilibrium?
a situation where no agent has anything to gain by unilaterally changing its own strategy (in two-player zero-sum games this coincides with the minimax solution). The agent does not try to exploit flaws in the opponent's strategy; it just wins when the opponent makes mistakes
what is a Pareto Optimum?
the best possible outcome for us where we do not hurt others, and others do not hurt us. It is a cooperative strategy.
what is the Pareto efficient solution?
the situation where no cooperative agent can be better off without making at least one other agent worse off
in a competitive multi-agent system, what algorithm can be used to calculate a Nash strategy?
Counterfactual Regret Minimization (CFR)
What makes it difficult to calculate the solution for a game of imperfect information?
the hidden information increases the size of the state space, and computing all the unknown outcomes quickly becomes infeasible
what is the Prisoner’s dilemma?
- if both prisoners confess they both get 5 years in prison
- if both stay silent they both get 2 years in prison
- if I confess and the other stays silent, I walk free
- if I stay silent and the other confesses, I get 10 years
this is an example of mixed behaviour (partly cooperative, partly competitive)
what are the Pareto and the Nash strategies in the Prisoner’s dilemma?
Pareto: both stay silent (cooperate)
Nash: both confess (defect)
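The two answers above can be checked by brute force. The following is a minimal sketch, assuming the sentences from the Prisoner's dilemma card (fewer years is better); the names `YEARS`, `is_nash` and `is_pareto_optimal` are made up for illustration.

```python
from itertools import product

ACTIONS = ("silent", "confess")

# (action of player 0, action of player 1) -> (years for player 0, years for player 1).
# Values come from the Prisoner's dilemma card; lower is better.
YEARS = {
    ("silent", "silent"): (2, 2),
    ("silent", "confess"): (10, 0),
    ("confess", "silent"): (0, 10),
    ("confess", "confess"): (5, 5),
}

def is_nash(profile):
    """True if no player can shorten their own sentence by deviating alone."""
    for player in (0, 1):
        for alternative in ACTIONS:
            deviated = list(profile)
            deviated[player] = alternative
            if YEARS[tuple(deviated)][player] < YEARS[profile][player]:
                return False
    return True

def is_pareto_optimal(profile):
    """True if no other outcome is at least as good for both and better for one."""
    current = YEARS[profile]
    for other in YEARS.values():
        no_worse = all(o <= c for o, c in zip(other, current))
        better = any(o < c for o, c in zip(other, current))
        if no_worse and better:
            return False
    return True

nash_profiles = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
pareto_profiles = [p for p in product(ACTIONS, repeat=2) if is_pareto_optimal(p)]
print("Nash:", nash_profiles)
print("Pareto:", pareto_profiles)
```

Under this strict definition the two lopsided outcomes (one confesses, the other stays silent) also count as Pareto efficient, since the player who walks free cannot be made better off. Mutual silence is the cooperative Pareto outcome the card refers to, and mutual confession is the only outcome that is Nash but not Pareto efficient.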
what is the Iterated Prisoner’s dilemma?
the Prisoner's dilemma played for multiple rounds by the same players. A tit for tat strategy works very well here (it won Axelrod's tournaments): in the first round you play cooperative, after that you play whatever the opponent did in the previous round
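Tit for tat is easy to simulate. The sketch below assumes the sentences from the Prisoner's dilemma card and a fixed number of rounds; `play`, `tit_for_tat` and `always_confess` are hypothetical helper names, and the result is total years in prison (lower is better).

```python
# (my action, other's action) -> (my years, other's years); values from the card.
YEARS = {
    ("silent", "silent"): (2, 2),
    ("silent", "confess"): (10, 0),
    ("confess", "silent"): (0, 10),
    ("confess", "confess"): (5, 5),
}

def tit_for_tat(my_history, opponent_history):
    """Cooperate (stay silent) first, then copy the opponent's previous move."""
    return "silent" if not opponent_history else opponent_history[-1]

def always_confess(my_history, opponent_history):
    """Baseline that defects in every round."""
    return "confess"

def play(strategy_a, strategy_b, rounds=10):
    """Play repeated rounds and return the total years for each player."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        action_a = strategy_a(history_a, history_b)
        action_b = strategy_b(history_b, history_a)
        years_a, years_b = YEARS[(action_a, action_b)]
        total_a += years_a
        total_b += years_b
        history_a.append(action_a)
        history_b.append(action_b)
    return total_a, total_b

print(play(tit_for_tat, tit_for_tat))        # two cooperators
print(play(tit_for_tat, always_confess))     # exploited once, then retaliates
print(play(always_confess, always_confess))  # mutual defection
```

Against a permanent defector, tit for tat loses only the first round and then matches the opponent, while two tit for tat players cooperate throughout and end up far better off than two defectors.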
Name two multi-agent card games of imperfect information
poker, blackjack, bridge
name three kinds of strategies that can occur in multi-agent reinforcement learning
- CFR
- evolutionary strategies
- cooperative strategies
Counterfactual Regret Minimization
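an iterative, statistical algorithm that minimizes regret over repeated self-play; in two-player zero-sum games its average strategy converges to a Nash equilibrium. Unlike minimax, it is suitable for imperfect-information games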
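The core update inside CFR is regret matching. The sketch below is not full CFR (there is no game tree and no hidden information); it only illustrates regret matching in self-play on rock-paper-scissors, using expected payoffs instead of sampled actions so the run is deterministic. The starting regrets, iteration count and function names are assumptions chosen for illustration.

```python
# PAYOFF[a][b]: payoff to the player choosing a against b (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly

def train(iterations=20000):
    """Return the average strategy of both players after self-play."""
    # Arbitrary biased start so that the players do not begin at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current mix.
            values = [sum(PAYOFF[a][b] * opponent[b] for b in range(N)) for a in range(N)]
            expected = sum(strategies[player][a] * values[a] for a in range(N))
            for a in range(N):
                # Regret: how much better action a would have done than the current mix.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

average_strategies = train()
print(average_strategies)
```

The current strategies keep cycling, but the averaged strategies should drift toward playing each action about one third of the time, which is the Nash equilibrium of rock-paper-scissors. CFR applies the same regret update at every information set of a larger game, weighted by counterfactual reach probabilities.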
swarm computing
focuses on emergent behavior in decentralized, collective, self-organized systems. Introduces forms of communication between agents
the emphasis is on cooperation and survival of the group (Pareto)