Basic Game Theory - Week 2 Flashcards
Define and outline the purpose of game theory
Game theory is the study of rational behaviour in situations involving interdependent choices. Players choose strategies to maximize their welfare, and the resulting outcome is sensitive to the choice of strategy made by competing players. These are simplified representations of real situations designed to focus analysis on the principles of strategic decision making in situations of different game structures. It is, therefore, a theoretical device based on simplifying (but clear and explicit) assumptions that define the domain wherein results are expected to hold.
Whether we do so formally or informally, openly or subconsciously, intelligently or simplistically, it is a process that we must and do engage in. The value of game theory is that it formalizes our thought process and can be expressed to others clearly so that debate may be focused on the critical aspects of a decision.
Define dominant strategy, dominated strategy, mixed strategy, Nash equilibrium, and Pareto efficiency
dominant strategy: a strategy which always yields better payoffs to a player than alternative strategies, regardless of the strategy chosen by an opponent. A strategy strongly dominates if it earns a strictly higher payoff regardless of what strategy the other player might choose. It weakly dominates if it earns a payoff that is, first, at least as high regardless of what strategy the other player might choose and, second, strictly higher for at least one strategy the other player might choose.
dominated strategy: a strategy which always yields worse payoffs to a player than some other strategy, no matter what strategy the opponent chooses.
mixed strategy: a strategy consisting of a randomized choice of pure strategies (identified by a probability distribution over the set of pure strategies). By Nash's existence theorem, every finite game has at least one Nash equilibrium: even if there is no pure strategy equilibrium, there will be a mixed strategy one. (The folk theorem is a different result, concerning repeated games.)
Nash equilibrium: a Nash equilibrium occurs when no player has an incentive to change their strategy given the strategy of the other player(s); each player is happy with their choice given the choices made by the others. If a strategy profile is a solution, then it must be a Nash equilibrium. Note that games requiring mixed strategies typically have no pure strategy Nash equilibrium: they are usually zero-sum, and if your opponent is winning you are not happy about it, because there is an option you can employ to become the one winning.
Pareto efficient: an outcome is said to be Pareto efficient when no player (or agent) can be made better off without making someone else worse off. A Pareto superior outcome is one in which both (all) players are better off.
The process of solving games:
- Anticipate the actions of others
- find dominated strategies
- remove dominated strategies
- find Nash equilibria (stability, multiple equilibria)
One of the important rules of successful strategy is to anticipate the actions of rivals. The “first rule” of solving a “game” is indeed to look forward and reason backwards, which formalizes the process of anticipating opponent behaviour. This process is applied in extensive form games (tree-diagram games).
The second rule of game theory is that if a strategy choice always leads to the best outcome regardless of your opponent’s reaction or choice, employ that strategy. In formal terms this approach requires an actor to identify any “dominant strategy”, which is defined as a strategy that has the best result regardless of the strategies used by rivals. Not all games have a dominant strategy. This approach, and the next one, are associated more often with normal form games.
The third rule is the converse of rule two: eliminate all dominated strategies from the choice set. A dominated strategy is one that leads to worse outcomes regardless of the strategy employed by rivals. Note that dominated strategies in a game can be eliminated successively, since a strategy that is dominated in a subset may not be dominated in the universal set of strategy options.
In terms of solving game structures, if rules 2 and 3 fail to identify a solution, then the next step is to identify the Nash equilibrium. A Nash equilibrium is defined as one in which no actor in the game has an incentive to alter his or her strategy. Note that a game may have no pure strategy Nash equilibrium, though there will always exist a mixed strategy equilibrium. Note also that a game may have more than one Nash equilibrium. When this happens, several factors can influence which equilibrium emerges:
- communication: credible only if it is incentive-compatible, i.e. there has to be an incentive for me to be honest in my communication, otherwise what I say is not credible (the case in the battle of the sexes but not in the chicken game)
- pre-commitment: can be a stubborn move, but helpful
- convincing: relies on reputation and credibility
In some more complex games, Nash equilibria may be stable or unstable. Stability refers to the property that if a small deviation from an equilibrium occurs, the same equilibrium will eventually be restored; the dynamics of a model can produce exploding reactions from a small deviation of payoffs. With two equilibria, ask what could tip the balance away from one toward the other. For stability in extensive form analysis, ask what happens if you slightly change the payoff structure assumptions: a relatively small and modest change can lead to a drastically different outcome.
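The Nash-equilibrium step described above can be sketched as a brute-force search over strategy pairs. This is an illustrative sketch (not from the notes), assuming payoffs where higher numbers are better; the coordination-game matrices at the bottom are made up for the demo:

```python
from itertools import product

def pure_nash(p1, p2):
    """Return all pure-strategy Nash equilibria of a two-player game.

    p1[i][j] and p2[i][j] are the payoffs to players 1 and 2 when
    player 1 plays row i and player 2 plays column j (higher = better).
    """
    rows, cols = len(p1), len(p1[0])
    equilibria = []
    for i, j in product(range(rows), range(cols)):
        # Player 1 must have no better row against column j...
        p1_happy = all(p1[i][j] >= p1[k][j] for k in range(rows))
        # ...and player 2 no better column against row i.
        p2_happy = all(p2[i][j] >= p2[i][m] for m in range(cols))
        if p1_happy and p2_happy:
            equilibria.append((i, j))
    return equilibria

# A simple coordination game: both (0, 0) and (1, 1) are equilibria,
# illustrating the multiple-equilibria case discussed above.
p1 = [[2, 0],
      [0, 1]]
p2 = [[2, 0],
      [0, 1]]
print(pure_nash(p1, p2))  # [(0, 0), (1, 1)]
```

The same function works for any normal form game once the two payoff matrices are written down.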
Explain the Prisoner’s dilemma game; on page 4 of the notes answer the following:
What is the Nash equilibrium and why?
Is the Nash equilibrium Pareto efficient?
How could we get a Pareto efficient outcome?
An example of how rational behaviour leads to irrational outcomes; it shows that a Nash equilibrium does not have to be Pareto efficient. The PD game story is the standard one of two prisoners captured by the police. The police have only enough evidence to convict both on minor charges if neither testifies against the other. If they both accuse each other, they both get a harsher sentence than if they both stay quiet. If one agrees to serve as witness against the other while the other stays quiet, the first will get a lighter sentence or go free, while the quiet one will get the harshest sentence. Typically, the prisoners are interrogated in separate cells.
The payoffs can be set up in two different ways: we can use an ordinal ranking where 1 is the “best”, 2 is second best, etc., so smaller numbers mean better (1 is preferred to 2, 2 to 3, and so on). Alternatively, and more commonly in economics, we use absolute payoffs where the payoff number is in absolute benefit terms, e.g. dollars. In these first few examples we will use the ranking system. (Always sort out the definition of the numbers: in some games low numbers are preferred because they indicate preference rank; in other games high numbers are preferred because they identify value.)
To identify any pure-strategy Nash equilibrium in a normal form game, start with one player and ask which strategy that player would want to use for each of the opponent’s possible strategies. In the PD game above, we can start with player 1 and ask what she would want to play if player 2 chooses strategy C. If player 1 chooses C she gets the second-best payoff 2, whereas if she plays strategy D she will get the best payoff 1. So a rational player 1 would want to play strategy D if player 2 plays C. Similarly, if player 2 chooses strategy D, player 1 would want to choose strategy D to get her 3rd-best payoff rather than her worst payoff. So regardless of whether player 2 plays C or D, player 1 will prefer strategy D. In this case D is a dominant strategy and C is a dominated strategy. A rational player 1 will always choose strategy D in this specific game structure.
If both players have a dominant strategy, then the Nash equilibrium occurs when they each play this strategy, and the outcome of the game will be the strategy combination DD with payoffs 3,3. So each player will get the 3rd best outcome. To verify that an outcome is or is not a Nash equilibrium, take each possible outcome and ask whether either player would prefer a different strategy given their opponent’s strategy. For example, for strategy combination CD with payoffs 4,1, player 1 could move from C to D and reach DD with payoffs 3,3, which player 1 prefers. So CD cannot be a Nash equilibrium, even though player 2 is quite happy with it. This approach will confirm that the Nash equilibrium is, in fact, outcome DD with payoff 3,3.
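The dominance argument above can be checked mechanically. A small sketch using the notes' ordinal payoffs, where lower rank numbers are better:

```python
# Ordinal PD payoffs from the notes: 1 = best, 4 = worst (lower is better).
# rank[(move1, move2)] = (player 1's rank, player 2's rank); C = stay quiet, D = testify.
rank = {
    ("C", "C"): (2, 2),
    ("C", "D"): (4, 1),
    ("D", "C"): (1, 4),
    ("D", "D"): (3, 3),
}

def strictly_dominates(a, b):
    """True if player 1's strategy a beats b against every opponent move.

    With ordinal ranks, "beats" means a strictly lower rank number.
    """
    return all(rank[(a, opp)][0] < rank[(b, opp)][0] for opp in ("C", "D"))

print(strictly_dominates("D", "C"))  # True: D is dominant, C is dominated
print(strictly_dominates("C", "D"))  # False
```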
Notice that this outcome is strictly Pareto inefficient, since strategy combination CC with payoffs 2,2 leaves them both better off, i.e. it is Pareto superior. A useful exercise is to rank these payoffs in Pareto terms; it turns out that 2,2 Pareto dominates 3,3, but those are the only two outcomes that can be ranked. The failure of rational behaviour to lead to a Pareto efficient outcome is an important illustration of how rationality can lead to undesirable outcomes.
Prisoner’s dilemma in the extensive game format (page 5):
What is the Nash equilibrium and why?
Is the Nash equilibrium Pareto efficient?
How could we get a Pareto efficient outcome?
We can illustrate the PD game using an extensive form structure. In the normal form game, strategy choices are modelled as being made simultaneously, in the sense that neither player knows what strategy the other is using until after making their own choice. In an extensive form game, strategy choices can be sequential. Either way, the Nash equilibrium is still for both to defect, and that is where they will arrive.
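The sequential claim can be verified by backward induction on the same ordinal payoffs; a minimal sketch (C/D are shorthand for cooperate/defect, and lower rank numbers are better):

```python
# Ordinal payoffs (1 = best, 4 = worst), same matrix as the normal form game.
rank = {("C", "C"): (2, 2), ("C", "D"): (4, 1),
        ("D", "C"): (1, 4), ("D", "D"): (3, 3)}

def solve_sequential_pd():
    """Backward induction: player 2 observes player 1's move, then replies."""
    # Player 2's best reply to each possible first move (minimize her own rank).
    reply = {m1: min(("C", "D"), key=lambda m2: rank[(m1, m2)][1])
             for m1 in ("C", "D")}
    # Player 1 anticipates those replies and minimizes his own rank.
    m1 = min(("C", "D"), key=lambda m: rank[(m, reply[m])][0])
    return m1, reply[m1]

print(solve_sequential_pd())  # ('D', 'D'): defection survives the sequencing
```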
Chicken game: touch on communication, pre-commitment, and reputation
What is the Nash equilibrium and why?
Is the Nash equilibrium Pareto efficient?
How could we get a Pareto efficient outcome?
Two teenage rivals drive their cars toward each other on a collision course, with their friends looking on to see who will swerve first.
The options are swerve and don’t swerve, and there are two Nash equilibria, each of which avoids the outcome in which the players crash and one of them dies.
Notice that there is no dominant or dominated strategy, and that there are two Nash equilibria (DC and CD)
When you have two Nash equilibria:
In this game there is a first-mover advantage: in an extensive form version, player 1 would be able to get the best outcome.
Which will emerge? How can players try to induce their preferred Nash equilibrium?
Typically pre-commitment is one important option in this game structure, like taking out your steering wheel so you can’t swerve. But pre-commitment is dangerous if the other player has also pre-committed.
Communication can also work (if you have credibility and enforcement power). Communication is believable only if the information is incentive-compatible, i.e. honest communication is also in my best interest. In chicken it is not, because we both want to win; in the battle of the sexes it is, because we both want to end up in the same place.
When convincing your opponent that you won’t swerve, reputation matters: if a known “nerd” shows up, you know they will swerve, so reputation alone can settle the game.
But note that pre-commitment will not work well in the previous PD, because there is a dominant strategy: regardless of any pre-commitment, you will want to defect. Communication is not credible for the same reason, since each player has an incentive to defect.
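The two-equilibria structure of chicken can be confirmed by brute force. The payoff numbers below are an illustrative assumption (higher is better, crashing worst), not taken from the notes:

```python
from itertools import product

# Illustrative chicken payoffs (higher = better); crashing is worst for both.
moves = ("swerve", "straight")
payoff = {
    ("swerve",   "swerve"):   (3, 3),
    ("swerve",   "straight"): (2, 4),
    ("straight", "swerve"):   (4, 2),
    ("straight", "straight"): (0, 0),
}

# An outcome is a Nash equilibrium if neither player can gain by deviating.
equilibria = [
    (a, b) for a, b in product(moves, moves)
    if all(payoff[(a, b)][0] >= payoff[(x, b)][0] for x in moves)   # P1 stays
    and all(payoff[(a, b)][1] >= payoff[(a, y)][1] for y in moves)  # P2 stays
]
print(equilibria)  # the two asymmetric outcomes; neither strategy is dominant
```

Both asymmetric outcomes survive, which is exactly why pre-commitment and reputation matter: each player wants to steer the game toward the equilibrium where the other swerves.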
Solve the sequential normal game found on page 7
(player 1 can only move up and down) (player 2 can move left or right)
1. player 1 moves first, starting at point 3,5
2. player 2 moves first, starting at point 3,5
the order of moves matters
where it starts matters
the direction matters (player 1 can only move up and down) (player 2 can move left or right)
In the first case, starting at 3,5, player 1 moves first and would go up to 4,0; then player 2 would go to 0,4, and you would get stuck in that loop.
In the second case, player 2 moves first and would go from 3,5 to 6,6, and player 1 would stay there and not move, which is Pareto efficient and is also a Nash equilibrium.
Dixit and Nalebuff use an interesting model to illustrate the concept of dominated and dominant strategies in a normal form game. Their model is based on the idea of missile interception. One country – the attacker – launches a missile along a finite set of paths, and the other country – the defender – sends an interceptor along a finite set of possible paths. If the interceptor and the missile are ever at the same coordinate at the same time, the missile is destroyed (a “success” for the defender and a “fail” for the attacker). If the two never meet, the missile succeeds in hitting its target (a “fail” for the defender and a “success” for the attacker). Notice that this game is what we would also refer to as a zero-sum game; one player’s win is the other player’s loss.
Identify the dominating and dominated strategies on page 8
Is there a pure strategy Nash equilibrium?
The grid above marks out the paths. In the original game the attacker is located at I (for Iraq) and the defender is at A (America). As an example, if the attacking missile follows the path IFCB and the defender launches an interceptor along ABCF, then both are at C at the same time and the attacking missile is destroyed. If the interceptor was instead launched along the path ADGH, the two never meet and the missile hits its target.
The normal form game looks like the following (you can confirm these outcomes). To keep the Dixit-Nalebuff notation, H will mean the interceptor “hits” the attacker’s missile, and “O” means the missile hits the defender.
You should be able to convince yourself that there is no pure strategy Nash equilibrium: for any outcome one player will always prefer to change its strategy. This is actually a common problem in conflict situations, and we’ll see it again in problems such as terrorism (where a government wants to be protecting the target of the terrorists, and the terrorists want to select an unprotected target, for example). These cases have what we call a “mixed strategy” Nash equilibrium, which we will examine in detail later.
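The no-pure-equilibrium claim can be illustrated with a stripped-down version of the interception game. The three-path “hide-and-seek” abstraction below is an assumption for illustration, not the exact Dixit-Nalebuff grid:

```python
from itertools import product

# Abstract version of the interception game (an assumed simplification,
# not the exact Dixit-Nalebuff grid): the attacker picks one of three
# paths, the defender picks one, and the defender "hits" only on a match.
paths = range(3)

def defender_payoff(d, a):
    return 1 if d == a else -1   # zero-sum: the attacker gets the negation

# Check every outcome: is either side happy to stay put?
pure_eq = [
    (d, a) for d, a in product(paths, paths)
    if all(defender_payoff(d, a) >= defender_payoff(x, a) for x in paths)
    and all(-defender_payoff(d, a) >= -defender_payoff(d, y) for y in paths)
]
print(pure_eq)  # [] -- no pure-strategy equilibrium exists
```

Whenever the defender guesses right, the attacker wants to switch paths; whenever the attacker evades, the defender wants to switch. Only a mixed strategy (randomizing over paths) can be an equilibrium.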
Explain subgame perfection
Subgame perfection requires that any Nash equilibrium for the game as a whole must also be a Nash equilibrium for any subgame that it contains. Conceptually, this means that some behaviour a player would like to threaten to engage in will not be credible, and thus will not affect the behaviour of opponents, if the threatened strategy is not itself optimal once the game reaches that point. Subgame perfection, then, gets at the heart of threat credibility. In practice, subgame perfection means using the principle of backward induction. This approach, as Dixit and Nalebuff have already explained to us, requires us to look at the final choices in an extensive game and work backwards, comparing the outcomes of the various strategies open to a particular player at each decision node.
Explain backward induction using extensive game theory on page 10
One example of backward induction: think about the end nodes and who makes the final decision there (in other words, who is furthest out). Ask which outcome that player would prefer among their options and cross out the ones they would not choose; then do the same for the player one step back, and the one before that, and you will see which decisions will manifest. Start with Harry, then Voldemort, then Ginny.
We start at the last subgame nodes, which belong to the US. Comparing 40,0 vs 20,40, the US would choose 40,0; comparing 50,0 vs 30,60, it would choose 50,0. Moving back one step: on the intervene branch, Assad chooses between resigning (80,20) and repressing, which leads to escalation (40,0), so Assad would resign. On the don’t-intervene branch, Assad chooses between resigning (100,20) and repressing, which leads to intervention (50,0), so again Assad would resign, giving 100,20. Finally, the US decides between intervening (80,20) and not intervening (100,20), and chooses 100,20.
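The backward induction above can be mechanized. The tree below is reconstructed from the description in the notes; payoffs are written (US, Assad), and the move labels "withdraw" and "don't" for the unnamed branches are hypothetical:

```python
# A node is either a payoff tuple (leaf) or (mover, {move: subtree}).
# Tree reconstructed from the notes; "withdraw"/"don't" are assumed labels.
tree = ("US", {
    "intervene": ("Assad", {
        "resign": (80, 20),
        "repress": ("US", {"escalate": (40, 0), "withdraw": (20, 40)}),
    }),
    "don't intervene": ("Assad", {
        "resign": (100, 20),
        "repress": ("US", {"intervene": (50, 0), "don't": (30, 60)}),
    }),
})
INDEX = {"US": 0, "Assad": 1}   # which payoff coordinate each mover cares about

def backward_induction(node):
    """Return (payoffs, path of moves) for the subgame-perfect outcome."""
    if isinstance(node, tuple) and not isinstance(node[1], dict):
        return node, []                        # leaf: just the payoffs
    mover, branches = node
    results = {m: backward_induction(sub) for m, sub in branches.items()}
    best = max(results, key=lambda m: results[m][0][INDEX[mover]])
    payoffs, path = results[best]
    return payoffs, [best] + path

print(backward_induction(tree))
# ((100, 20), ["don't intervene", 'resign']) -- matching the notes' solution
```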
Why would the U.S. win and Assad resign even without intervention? Because of the threat of intervention, coupled with concrete U.S. power and credibility: if the U.S. is known for intervening, this alone will lead Assad to resign without any intervention in the first place.
Uncertainty and risk: solve the extensive game format on page 12:
These games have so far been solved under circumstances of perfect information, or certainty. Once uncertainty enters the environment, a range of problems can emerge. Consider the following extensive form game. Let’s say that the US knows that North Korea has tried to make a nuclear-armed missile that could reach California, and that its chance of having succeeded is 0.8. The US must then decide between attacking North Korea or not, in an attempt to pre-empt. Let’s say the worst outcome for the Americans is to have North Korea continue to build nuclear weapons unchallenged. The second worst is to have to fight North Korea when it has one nuclear weapon. The third worst is to attack North Korea when it does not have a weapon, and the best is to not have to invade North Korea when it doesn’t have a nuclear weapon. I will assign the following payoffs to the US. This example is not really a “game” (there is only one strategic player), but it is a quick way to show the role of “chance” in an extensive form game, and of expected utility calculations.
The problem for the US is that it has to choose between two strategies: attack or don’t attack. To get the expected outcome of a strategy under uncertainty, use the following formula: (probability of one state × the payoff of that strategy in that state) + (probability of the other state × the payoff of the SAME strategy in the other state). Then compute the same thing for the other strategy and choose the one with the better expected outcome.
The expected outcome of attacking is (0.8 × -15) + (0.2 × -10) = -14. The expected outcome of NOT attacking is (0.8 × -20) + (0.2 × 0) = -16. So in this case, attacking is the best strategy, provided the US is an expected utility maximizer. Note that these solutions may depend on the degree of risk aversion, the probabilities, and the actual values attached to each outcome. NOTE in particular that to solve games under uncertainty, ordinal rankings of outcomes are not sufficient (e.g. best, second best, etc.); cardinal measures are needed (e.g. a dollar amount, votes, etc.).
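The expected-value arithmetic can be written out directly; a minimal sketch using the cardinal payoffs from the notes (the state labels are shorthand):

```python
# Cardinal US payoffs from the notes, one per (action, state) pair.
P_WEAPON = 0.8       # probability North Korea has a working weapon
P_NO_WEAPON = 0.2
payoff = {
    ("attack",       "weapon"):    -15,
    ("attack",       "no weapon"): -10,
    ("don't attack", "weapon"):    -20,
    ("don't attack", "no weapon"):   0,
}

def expected_value(action):
    """Probability-weighted payoff of one action across both states."""
    return (P_WEAPON * payoff[(action, "weapon")]
            + P_NO_WEAPON * payoff[(action, "no weapon")])

for action in ("attack", "don't attack"):
    print(action, expected_value(action))
# attack -14.0, don't attack -16.0: an expected-utility maximizer attacks
```

Re-running with different probabilities or payoff values shows how sensitive the decision is to those assumptions.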
Repeated games: the "shadow of the future"; Prisoner's dilemma; finite vs infinite games, with examples of both; the grim trigger ("unforgiving") strategy; tit-for-tat
Repeated games are interesting for a variety of reasons, including their ability to provide a more accurate reflection of many strategic environments, but also because they highlight aspects such as punishment, credibility and reputation
For example, in classic international relations theories some neoliberals (Keohane on some days?) suggest that game repetition can be useful for inducing cooperation, a reference to the “shadow of the future”. This feature arises because in many cases the ability to punish in the future for non-cooperation now makes cooperation a more attractive option. Thus, in a prisoner’s dilemma game (the most commonly used in IR) if I know the game is played once I might be quite happy to try and trick my opponent, or at least not get tricked by them; I would play non-cooperatively. If I know that I might be playing that same person for several games, that incentive might weaken.
Notice, though, that there are important differences between finite and infinite games. International relations is really an infinite game: we don’t know when the world comes to an end, we want a good reputation, we make allies and think about the future, and conflict is largely a repeated game; things don’t just end at some point, they are normally repeated an unknown number of times. In a finite game of repeated prisoner’s dilemma, cooperation is not a subgame perfect equilibrium, because it will not be used in the last game. Knowing that the opponent will defect in the last game makes both sides play non-cooperatively in that game, and in all previous games, as the strategy unravels.
Grim trigger ("unforgiving") strategy: cooperate until the other player defects, then defect forever after. If the resulting string of losses outweighs the one-time gain from defecting, cooperation can emerge and be sustained in the repeated game. This is important because it shows how cooperation can be mutually rational in long-term interactions.
Tit-for-tat strategy: a player cooperates in the first round and then matches the other player’s preceding move in each round thereafter. This is fallible, as it runs the risk of misperception: suppose one player mistakenly thinks the other is cheating; it will respond with aggression/non-cooperation, and the other player will in turn respond badly. The slightest misperception leads to a breakdown of tit-for-tat, which can only be repaired when a counter-misperception cancels out the initial one.
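The misperception cascade can be simulated directly. A minimal sketch (the round at which the misread occurs is an arbitrary choice for the demo):

```python
def tit_for_tat_with_misread(rounds=8, misread_round=2):
    """Two tit-for-tat players; player 1 misreads one cooperative move
    as defection, and the error echoes through the rest of the game.

    Returns the list of (player 1 move, player 2 move) actually played.
    """
    p1_view_of_2 = "C"   # what player 1 believes player 2 played last
    p2_view_of_1 = "C"   # player 2 perceives correctly throughout
    history = []
    for t in range(rounds):
        m1 = p1_view_of_2            # tit-for-tat: copy perceived last move
        m2 = p2_view_of_1
        history.append((m1, m2))
        p1_view_of_2 = "D" if t == misread_round else m2   # one-off misread
        p2_view_of_1 = m1
    return history

print(tit_for_tat_with_misread())
# three rounds of ('C', 'C'), then ('D', 'C'), ('C', 'D'), ... alternating
```

One misread turns mutual cooperation into an endless echo of alternating defections, which is exactly the fragility described above.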
The second dimension of repeated games is that learning and learning processes often become more important. Here, psychology and other disciplines have much to say in terms of how actors, even those presumed to be rational, process signals, learn, and modify their behaviour.