Ec2010a GT Sectionnotes All 2021
Chang Liu*
December 1, 2021
Contents
1. Welcome to Game Theory
2. Nash Equilibrium and Correlated Equilibrium
3. Rationalizability and Nash Implementation
4. Bayesian Games
5. Dynamic Games (I)
6. Dynamic Games (II)
7. References
* These notes are an extended version of previous section notes by Kevin He and Jetlir Duraj. They contain additional exercises and material from older problem sets of Jerry Green, from the book Game Theory by Maschler, Solan, and Zamir, and from the graduate book of the same title by Myerson. Please send comments and critiques to chang_liu@g.harvard.edu.
Ec2010a Game Theory. Section 1: Welcome to Game Theory. 10/24/2021
(1) Course outline; (2) Normal form games; (3) Extensive form games; (4) Strategies in extensive form games;
(5) Nash equilibrium and properties; (6) Optional: On the absent-minded driver
TF: Chang Liu (chang_liu@g.harvard.edu)
1 Course Outline
1.1 A taxonomy of games. The second half of Ec2010a is organized around several types of games, paying particular
attention to (i) relevant solution concepts in different settings, and (ii) some key economic applications belonging
to these settings. To understand the course outline, it might be helpful to first introduce some binary classification
schemes that give rise to these game types. Unfortunately, rigorous definitions of the following terminologies are not
feasible without first laying down some background, so at this point we will instead appeal to hopefully familiar games
to illustrate the classifications.
A game may have...
1.2 Course outline. Roughly, the course can be divided into 4 units. Each unit is focused on one type of game, studying
first its solution concepts then some important examples and applications. The relationship is summarized in Table 1.
Game Type 3: Sequential move games with complete information
• Theory: Perfect Bayesian equilibrium (PBE), sequential equilibrium (SE), strategically stable equilibrium (SSE),
etc.
• Application: Reputation, signaling games
1.3 About sections. Sections are optional. We will review lecture material and work out some additional examples.
Please interrupt to ask questions. The use of the plural first-person pronoun “we” in these section notes does not
indicate royal lineage or pregnancy – rather, it suggests the notes form a conversation between the writer and the
audience.
2.1 Interpreting the payoff matrix. Here is the familiar payoff matrix representation of a two-player game.
L R
T 1, 1 0, 0
B 0, 0 2, 2
Player 1 (P1) chooses a row (Top or Bottom) while player 2 (P2) chooses a column (Left or Right). Each cell contains
the payoffs to the two players when the corresponding pair of strategies is played. The first number in the cell is the
payoff to P1 while the second number is the payoff to P2. (By the way, this game is sometimes called the “game of
assurance”.)
Two important things to keep in mind:
(1) In a normal form game, players choose their strategies simultaneously. That is, P2 cannot observe which row P1
picks when choosing his column.
(2) The terminology “payoff matrix” is slightly misleading. The numbers that appear in a payoff matrix are actually
Bernoulli utilities, not monetary payoffs. To spell out this point in painstaking detail: the set of possible outcomes
of the game is a four-point set X ≡ {TL, TR, BL, BR}. Each player j has a preference ≿_j over ∆(X), the set of
distributions on this set. Assume ≿_j satisfies the von Neumann-Morgenstern (vNM) axioms of independence and
continuity. Then, running ≿_j through the vNM representation theorem, we find that ≿_j is represented by a utility
function U_j : ∆(X) → R with the functional form U_j(p) = p_{TL} · u_j(TL) + p_{TR} · u_j(TR) + p_{BL} · u_j(BL) + p_{BR} · u_j(BR).
We then enter u_j(TL), u_j(TR), u_j(BL), u_j(BR) into the payoff matrix cells, which happen to be 1, 0, 0, 2.
In particular, in computing the expected utility of each player under a mixed strategy profile, we simply take a weighted
average of the matrix entries – there is no need to apply a “utility function” to the entries before taking the average
as they are already denominated in utils. Furthermore, it is important to remember that this kind of linearity does not
imply risk-neutrality of the players, but is rather a property of the vNM representation.2
2.2 General definition of a normal form game. The payoff matrix representation of a game is convenient, but it is
not sufficiently general. In particular, it seems unclear how we can represent games in which players have infinitely
many possible strategies, such as a Cournot duopoly, in a finite payoff matrix. We therefore require a more general
definition.
Definition 1 (Normal form game). A normal form game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩ consists of:
2 In fact, mixed strategies in game theory provided one of the motivations for von Neumann and Morgenstern’s work on their representation
theorem for preference over lotteries. von Neumann’s theorem on the equality between maximin and minimax values in mixed strategies for
zero-sum games assumed players choose the mixed strategy giving the highest expected value. But why should players choose between mixed
strategies based on expected payoff rather than median payoff, mean payoff minus variance of payoff, or say the 4th moment of payoff? The vNM
representation theorem rationalizes players maximizing expected payoff through a pair of conditions on their preference over lotteries.
1. A (finite) collection of players N = {1, 2, ..., n}.
2. A set of (pure) strategies S j for each j ∈ N.
3. A (Bernoulli) utility function u_j : S → R for each j ∈ N, where S ≡ S_1 × S_2 × · · · × S_n is the set of strategy profiles.
To interpret, the pure strategy set S j is the set of actions that player j can take in the game. When each player chooses
an action simultaneously from their own pure strategy set, we get a strategy profile (s1 , s2 , ..., sn ) ∈ S . Players derive
payoffs by applying their respective utility functions to the strategy profile.
The payoff matrix representation of a game is a specialization of this definition. In a payoff matrix for 2 players, the
elements of S_1 and S_2 are written as the names of the rows and columns, while the values of u_1 and u_2 at different
members of S_1 × S_2 are written in the cells. If S_1 = {s_{1A}, s_{1B}} and S_2 = {s_{2A}, s_{2B}}, then the game G = ⟨{1, 2}, (S_1, S_2), (u_1, u_2)⟩
can be written in a payoff matrix:
s2A s2B
s1A u1 (s1A , s2A ), u2 (s1A , s2A ) u1 (s1A , s2B ), u2 (s1A , s2B )
s1B u1 (s1B , s2A ), u2 (s1B , s2A ) u1 (s1B , s2B ), u2 (s1B , s2B )
Conversely, the game of assurance can be converted into the standard definition by taking N = {1, 2}, S 1 = {T, B},
S 2 = {L, R}, u1 (T, L) = 1, u1 (B, R) = 2, u1 (T, R) = u1 (B, L) = 0, u2 (T, L) = 1, u2 (B, R) = 2, u2 (T, R) = u2 (B, L) = 0.
The general definition allows us to write down games with infinite strategy sets. In a duopoly setting where firms
choose own production quantity, their choices are not taken from a finite set of possible quantities, but are in principle
allowed to be any positive real number. So, consider a game with S_1 = S_2 = [0, ∞) and utility functions
u_1(s_1, s_2) = s_1 · p(s_1 + s_2) − C(s_1),  u_2(s_1, s_2) = s_2 · p(s_1 + s_2) − C(s_2),
where p(·) and C(·) are the inverse demand function and cost function, respectively. Interpreting s_1 and s_2 as the quantity
choices of firm 1 and firm 2, this is Cournot competition phrased as a normal form game.
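To make this concrete, here is a minimal numerical sketch (mine, not from the notes), assuming a linear inverse demand p(Q) = a − bQ and a linear cost C(q) = cq. Iterating the best response map converges to the familiar Cournot quantities (a − c)/(3b).

```python
# Cournot duopoly as a normal form game with S1 = S2 = [0, oo).
# Assumed specification (not from the notes): p(Q) = a - b*Q, C(q) = c*q.
a, b, c = 10.0, 1.0, 1.0   # hypothetical demand/cost parameters

def payoff(s_own, s_other):
    """u_j(s_1, s_2) = s_j * p(s_1 + s_2) - C(s_j)."""
    price = max(a - b * (s_own + s_other), 0.0)
    return s_own * price - c * s_own

def best_response(s_other):
    """First-order condition of max_s s*(a - b*(s + s_other)) - c*s over s >= 0."""
    return max((a - c - b * s_other) / (2 * b), 0.0)

# Iterate best responses from an arbitrary starting quantity.
s1 = s2 = 0.0
for _ in range(50):
    s1, s2 = best_response(s2), best_response(s1)

print(s1, s2)          # both approach (a - c) / (3b) = 3.0
print(payoff(s1, s2))  # each firm's equilibrium profit (9.0 here)
```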
2.3 Recurring notations. The following notations are common in game theory but usually go unexplained. Let
X1 , X2 , . . . , Xn be a sequence of sets with typical elements x1 ∈ X1 , x2 ∈ X2 , ... Then:
• X_{−j} means ∏_{1≤k≤n, k≠j} X_k, where ∏ denotes the Cartesian product.
• X sometimes means ∏_{j=1}^{n} X_j.
To see an example of these notations, suppose we are studying a three-player game G = ⟨{1, 2, 3}, (S_1, S_2, S_3), (u_1, u_2, u_3)⟩.
Then s_{−2} refers to a vector containing strategies from player 1 and player 3, but not player 2. It is an element of
S_1 × S_3, also written as S_{−2}.
2.4 Mixed strategies in normal form games. A player who uses a mixed strategy in a game intentionally introduces
randomness into her play. Instead of picking a deterministic action as in a pure strategy, a mixed strategy user tosses
a coin to determine what action to play. Game theorists are interested in mixed strategies for at least two reasons: (i)
mixed strategies correspond to how humans play certain games, such as rock-paper-scissors; (ii) the space of mixed
strategies represents a convexification of the pure strategy set S_j, and convexity is required for many existence results.
Henceforth, denote by ∆(A) the set of probability distributions over a set A.
Definition 2 (Mixed strategy in normal form). Suppose G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩ is a normal form game where each
S_j is finite.4 Then a mixed strategy for player j, σ_j, is a probability distribution over S_j. That is, σ_j ∈ ∆(S_j).
3 Sometimes also called a profile.
4 We can also define mixed strategies when the set of actions S j is infinite. However, we would need to first equip S j with some σ-algebra, then
define a mixed strategy as a probability measure on this σ-algebra.
Sometimes the mixed strategy that puts probability p_1 on action s_1^{(1)} and probability 1 − p_1 on action s_1^{(2)} is written
as p_1 s_1^{(1)} ⊕ (1 − p_1) s_1^{(2)}. The “⊕” notation (in lieu of “+”) is especially useful when s_1^{(1)}, s_1^{(2)} are numbers, so as to avoid
confusing the mixture with an arithmetic sum.
1. When two or more players play mixed strategies, their randomizations are assumed to be independent.
2. Technically, pure strategies also count as mixed strategies – they are simply degenerate distributions on the
action set. The term strictly mixed is usually used for a mixed strategy that puts strictly positive probability
on every action.
When a profile of mixed strategies σ is played, the assumption of independent mixing, together with payoff matrix
entries being Bernoulli utilities in a vNM representation, implies that player j gets utility:
u_j(σ) = Σ_{(s_1, s_2, ..., s_n)∈S} σ_1(s_1) · · · σ_n(s_n) · u_j(s_1, s_2, ..., s_n).
We will abuse notation and write u j (σ j , σ− j ) for this utility, extending the domain of u j into mixed strategies. We
observe the following fact which turns out to be very useful.
Fact 3. For any fixed σ_{−j}, the map σ_j ↦ u_j(σ_j, σ_{−j}) is affine, in the sense that
u_j(σ_j, σ_{−j}) = Σ_{s_j∈S_j} σ_j(s_j) · u_j(s_j, σ_{−j}).
That is, the payoff to playing σ_j against opponents’ mixed strategy profile σ_{−j} is a weighted average of the |S_j|
numbers (u_j(s_j, σ_{−j}))_{s_j∈S_j}, where the weights are given by the probabilities that σ_j assigns to these different actions.
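Fact 3 is easy to verify numerically. The following sketch (my own; the payoff matrix is an arbitrary illustration) computes a mixed-strategy payoff both directly and as the σ_1-weighted average of pure-strategy payoffs.

```python
import numpy as np

# An arbitrary two-player game: P1 has 2 pure strategies, P2 has 3.
U1 = np.array([[3.0, 0.0, 1.0],
               [1.0, 2.0, 0.0]])           # u_1(s_1, s_2)

def u1_mixed(sigma1, sigma2):
    """u_1(sigma_1, sigma_2): sum over all (s_1, s_2) with product weights."""
    return float(np.asarray(sigma1) @ U1 @ np.asarray(sigma2))

sigma1 = np.array([0.25, 0.75])
sigma2 = np.array([0.5, 0.3, 0.2])

# Fact 3: u_1(sigma_1, sigma_2) equals the sigma_1-weighted average of the
# payoffs to P1's pure strategies against sigma_2.
pure_payoffs = U1 @ sigma2                  # (u_1(s_1, sigma_2)) for each s_1
assert np.isclose(u1_mixed(sigma1, sigma2), sigma1 @ pure_payoffs)
print(pure_payoffs, sigma1 @ pure_payoffs)  # [1.7, 1.1] and 1.25
```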
3.1 Definition of an extensive form game. The rich framework of extensive form games can incorporate sequential
moves, incomplete and perhaps asymmetric information, randomization devices such as dice and coins, etc. It is
one of the most powerful modeling tools of game theory, allowing researchers to formally study a wide range of
economic interactions. Due to this richness, however, the general definition of an extensive form game is somewhat
cumbersome. Roughly speaking, an extensive form game is a tree endowed with some additional structures. These
additional structures formalize the rules of the game: the timing and order of play, the information of different players,
randomization devices relevant to the game, outcomes and players’ preferences over these outcomes, etc.
Definition 4 (Extensive form game). A (finite-horizon) extensive form game Γ consists of: a finite set of players N, plus possibly a chance player “c”; a finite game tree with vertex set V and terminal vertices Z ⊂ V; a player function J : V\Z → N ∪ {c} (write V_j for the set of vertices where j moves); a set of moves M_{j,v} at each vertex v where player j moves, identified with the edges leaving v; a distribution f(·|v) over outgoing edges at each vertex v where chance moves; an information partition 𝓘_j of V_j for each j ∈ N, such that all vertices in the same information set have the same set of moves; and a (Bernoulli) utility function u_j : Z → R for each j ∈ N.
The game tree captures all possible states of the game. When players reach a terminal vertex z ∈ Z of the game tree,
the game ends and each player j receives utility u j (z). The player function J indicates who moves at each non-terminal
vertex. The move might belong to an actual player j ∈ N, or to chance, “c”. Note that V j refers to the set of all vertices
where player j has the move. If a player j moves at vertex v, she gets to pick an element from the set M j,v and play
proceeds along the corresponding edge. If chance moves, then play proceeds along a random edge chosen according
to f (·|v).
An information set I_j of player j refers to a set of vertices that player j cannot distinguish between.5 It might be
useful to imagine the players conducting the game in a lab, mediated by a computer. At each vertex v ∈ V\Z, the
computer finds the player J(v) who has the move and informs her that the game has arrived at the information set
I_{J(v)} ∋ v. In the event that this I_{J(v)} is a singleton, player J(v) knows exactly her location in the game tree. Else, she
knows only that she is at one of the vertices in I_{J(v)}, but she does not know for sure which one.6 The requirement
that two vertices in the same information set must have the same sets of moves is to prevent a player from gaining
additional information by simply examining the set of moves available to her, which would defeat the idea that the
player supposedly cannot distinguish between any of the vertices in the same information set. For convenience, we
also write M_{I_j} for the common move set of all vertices v ∈ I_j.
There are two conventions for indicating an information set I_j in game tree diagrams. Either all of the vertices in I_j
are connected using dashed lines, or all of the vertices are encircled in an oval.
Example 5. Figure 1 illustrates all the pieces of the general definition of an extensive form game.
[Figure 1 here: a game tree. P1 moves at the root ∅, choosing among l, m, and r. After l, chance plays a or b with equal probability. P1 moves again at (l, a), choosing t or d. P2 moves at (r), with moves x, y, z, and at the information set {(l, b), (m)}, with moves x, y. Payoff pairs (u_1, u_2) are attached to the terminal vertices.]
Chance move distributions: f (a|(l)) = f (b|(l)) = 0.5.
Information partitions: 𝓘_1 = { {∅}, {(l, a)} }, 𝓘_2 = { {(r)}, {(l, b), (m)} }.
Figure 1: An extensive form game with incomplete information and chance moves.
For convenience, let’s name each vertex with the sequence of moves leading to it (and name the root as ∅). The set of
players is N = {1, 2}. The player function J(v) is shown on each v ∈ V\Z in Figure 1, while the payoff pair (u1 (z), u2 (z))
is shown on each z ∈ Z. The set of moves M j,v at vertex v is shown on the corresponding edges. Player 1 moves at two
vertices, V1 = { ∅, (l, a) }. Her information partition contains only singleton sets, meaning she always knows where
in the game tree she is when called upon to move. Player 2 moves at three vertices, V2 = { (r), (l, b), (m) }. However,
player 2 cannot distinguish between (m) and (l, b), though he can distinguish (r) from the other two vertices. As such,
his information partition contains two information sets, one containing just (r), the other containing the two vertices
(m) and (l, b). As required by the definition, M2,(m) = M2,(l,b) = {x, y}, so that player 2 cannot figure out whether he is
at (m) or (l, b) by looking at the set of available moves.
The definition of extensive form games given above allows us to formally characterize games of perfect information
as well.
Definition 6 (Game of perfect information). An extensive form game is called a game of perfect information if all
the information sets of all players contain one node.
3.2 Using information sets to convert a normal form game into extensive form. Every finite normal form game G =
⟨N, (S_j)_{j=1}^n, (u_j)_{j=1}^n⟩ may be converted into an extensive form game of incomplete information with 1 + Σ_{m=1}^{n} ∏_{j=1}^{m} |S_j|
5 The use of an information partition to model a player’s knowledge predates extensive form games with incomplete information. Such models
usually specify a set of states of the world, Ω, then some partition I j on Ω. When a state of the world ω ∈ Ω realizes, player j is told the
information set I j ∈ I j containing ω, but not the exact identity of ω. So then, finer partitions correspond to better information. To take an
example, suppose 3 people are standing facing the same direction. An observer places a hat of one of two colors (say color 0 and color 1) on
each of the 3 people. These 3 people cannot see their own hat color or the hat color of those standing behind them. Then the states of the world
are Ω = {000, 001, 010, 011, 100, 101, 110, 111}. The person in the front of the line has no information, so her information partition contains
just one information set with all the states, I1 = { {000, 001, 010, 011, 100, 101, 110, 111} }. The second person in line sees only the hat color
of the first person, so that I2 = { {000, 010, 100, 110}, {001, 011, 101, 111} }. Finally, the last person sees the hats of persons 1 and 2, so that
I3 = { {000, 100}, {001, 101}, {010, 110}, {011, 111} }. In the context of extensive form games, one might think of V_j as the relevant “states of the
world” for j’s decision-making; the fineness of her information partition 𝓘_j reflects the extent to which she can distinguish between these states.
6 She might, however, be able to form a belief as to the likelihood of being at each vertex in I_{J(v)}, based on her knowledge of other players’
strategies and the chance move distributions.
vertices. Construct a game tree with n + 1 levels, so that all the vertices at level m belong to a single information set
for player m, for 1 ≤ m ≤ n. Level 1 contains the root. The root has |S 1 | children, corresponding to the actions in S 1 .
These children form the level 2 vertices. Each of these level 2 vertices has |S 2 | children, corresponding to the actions
in S 2 , and so forth. Each terminal vertex z in level n + 1 corresponds to some action profile (sz1 , sz2 , ..., szn ) in the normal
form game G and is assigned utility u j (sz1 , sz2 , ..., szn ) for player j in the extensive form game. Figure 2 illustrates such
a conversion using the game of assurance discussed earlier.
[Figure 2 here: the game of assurance in extensive form. P1 chooses T or B at the root; P2’s two vertices form a single information set, at which she chooses L or R.]
4.1 Pure strategy in extensive form games. How would you write a program to play an extensive form game as player
j? Whenever it is player j’s turn, the program should take the information set as an input and return one of the
feasible moves as an output. As the programmer does not a priori know the strategies that other players will use,
the program must encode a complete contingency plan for playing the game so that it returns a legal move at every
vertex of the game tree where j might be called upon to play. This motivates the definition of a pure strategy in an
extensive form game.
Definition 7 (Pure strategy). In an extensive form game, a pure strategy for player j is a function s_j : 𝓘_j → ∪_{I_j∈𝓘_j} M_{I_j}
such that s_j(I_j) ∈ M_{I_j} for each I_j ∈ 𝓘_j, where 𝓘_j denotes the collection of player j’s information sets. Write S_j for the set of all pure strategies of player j.
That is, a pure strategy for player j returns a legal move at every information set of j.
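Returning to the programming analogy of 4.1, a pure strategy is literally a lookup table from information sets to legal moves. A minimal sketch (the names are illustrative, based on the game of Figure 1):

```python
# A pure strategy as a mapping from information sets to moves
# (illustrative labels taken from Figure 1).
MOVE_SETS = {                      # M_{I_j}: legal moves at each information set
    "root": ["l", "m", "r"],       # P1's singleton information set at the root
    "(l,a)": ["t", "d"],           # P1's singleton information set at (l, a)
}

s1 = {"root": "m", "(l,a)": "d"}   # the strategy discussed in Example 8

def is_valid_pure_strategy(strategy, move_sets):
    """Checks s_j(I_j) in M_{I_j} for every information set I_j of player j."""
    return (strategy.keys() == move_sets.keys()
            and all(strategy[I] in move_sets[I] for I in move_sets))

assert is_valid_pure_strategy(s1, MOVE_SETS)
```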
Example 8. In Figure 1, one of the strategies of P1 is s1 (∅) = m, s1 (l, a) = d. Even though playing m at the root
means the vertex (l, a) will never be reached, P1’s strategy must still specify what she would have done at (l, a). This
is because some solution concepts we will study later in the course require us to examine parts of the game tree which
are unreached when the game is played. Intuitively, this is necessary because the optimality of an action for a player
at some information set may depend on what she/he and her/his opponents would have played on an information set
which would be reached only if the player chooses differently than the strategy under consideration.
One of the strategies of P2 is s2 ({(l, b), (m)}) = y, s2 (r) = z. In every pure strategy P2 must play the same action at both
(l, b) and (m), as pure strategies are functions of information sets, not individual vertices. In total, P1 has 6 different
pure strategies in the game and P2 has 6 different pure strategies.
4.2 Two definitions of randomization. There are at least two natural notions of “randomizing” in an extensive form
game: (i) Player j could enumerate the set of all possible pure strategies, S_j, then choose an element of S_j at random;
(ii) Player j could pick a randomization over M_{I_j} separately for each of her information sets I_j ∈ 𝓘_j. These two notions of
randomization lead to two different classes of strategies that incorporate stochastic elements:
Definition 9 (Mixed strategy). A mixed strategy for player j is an element σ j ∈ ∆(S j ).
Definition 10 (Behavioral strategy). A behavioral strategy for player j is a collection of distributions {b_{I_j}}_{I_j∈𝓘_j}, where
b_{I_j} ∈ ∆(M_{I_j}).
Strictly speaking, mixed strategies and behavioral strategies form two distinct classes of objects. We may, however,
talk about the equivalence between a mixed strategy and a behavioral strategy in the following way:
Definition 11. A mixed strategy σ_j and a behavioral strategy {b_{I_j}} are equivalent if they generate the same distribution
over terminal vertices regardless of the strategies used by opponents, which may be mixed or behavioral.
Note that in this definition for both the behavioral and the mixed case, opponents of j are assumed to play indepen-
dently of each other.
Example 12. In Figure 1, a behavioral strategy for P1 is: b*_∅(l) = 0.5, b*_∅(m) = 0, b*_∅(r) = 0.5, b*_{(l,a)}(t) = 0.7, b*_{(l,a)}(d) = 0.3.
That is, P1 decides that she will play l and r each with 50% probability at the root of the game. If she ever
reaches the vertex (l, a), she will play t with 70% probability, d with 30% probability. But now, consider the following
4 pure strategies:
s_1^{(1)}(∅) = l, s_1^{(1)}(l, a) = t;  s_1^{(2)}(∅) = l, s_1^{(2)}(l, a) = d;  s_1^{(3)}(∅) = r, s_1^{(3)}(l, a) = t;  s_1^{(4)}(∅) = r, s_1^{(4)}(l, a) = d,
and construct the mixed strategy σ*_1 so that σ*_1(s_1^{(1)}) = 0.35, σ*_1(s_1^{(2)}) = 0.15, σ*_1(s_1^{(3)}) = 0.35, σ*_1(s_1^{(4)}) = 0.15. Then σ*_1 and b* are equivalent: both play l and r with probability 0.5 each at the root, and conditional on reaching (l, a), both play t with probability 0.35/0.5 = 0.7, so they generate the same distribution over terminal vertices against any strategies of the opponents.
It is often “nicer” to work with behavioral strategies than mixed strategies, for at least two reasons. One, behavioral
strategies are easier to write down and usually involve fewer parameters than mixed strategies. Two, it feels more
natural for a player to randomize at each decision node than to choose a “grand plan” at the start of the game. In
general, however, neither the set of mixed strategies nor the set of behavioral strategies is a “subset” of the other, as
we now demonstrate.
Example 13 (A mixed strategy without an equivalent behavioral strategy). Consider an absent-minded city driver
who must make turns at two consecutive intersections. Upon encountering the second intersection, however, she does
not remember whether she turned left (T ) or right (B) at the first intersection. The mixed strategy σ1 putting probability
50% on each of the two pure strategies T 1 T 2 and B1 B2 generates the outcome O1 50% of the time and the outcome
O4 50% of the time. However, this outcome distribution cannot be obtained using any behavioral strategy. That is,
if the driver chooses some probability of turning left at the first intersection and some probability of turning left at
the second intersection, and furthermore these two randomizations are independent, then she can never generate the
outcome distribution of 50% O1 , 50% O4 .
[Game tree: at the first intersection the driver chooses T_1 or B_1; the two second-intersection vertices form a single information set I, at which she chooses T_2 or B_2. Outcomes: T_1T_2 leads to O_4, T_1B_2 to O_3, B_1T_2 to O_2, B_1B_2 to O_1.]
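One can confirm the impossibility numerically. In the sketch below (mine, not from the notes), a behavioral strategy plays T with probability t at the first intersection and with probability u at the second-intersection information set, independently, so P(O_4) = tu and P(O_1) = (1 − t)(1 − u); a grid search shows this pair can never reach (1/2, 1/2).

```python
def gap(t, u):
    """Total deviation of (P(O1), P(O4)) from the target (1/2, 1/2)."""
    return abs(t * u - 0.5) + abs((1 - t) * (1 - u) - 0.5)

grid = [i / 1000 for i in range(1001)]
best = min(gap(t, u) for t in grid for u in grid)
print(best)   # about 0.414 = sqrt(2) - 1 > 0: the target is unreachable
```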
Example 14 (A behavioral strategy without an equivalent mixed strategy). Consider an absent-minded highway
driver who wants to take the second highway exit. Starting from the root of the tree, he wants to keep left (L) at the
first highway exit but keep right (R) at the second highway exit. Upon encountering each highway exit, however, he
does not remember if he has already encountered an exit before. The driver has only two pure strategies: always L or
always R. It is easy to see no mixed strategy can ever achieve the outcome O2 . However, the behavioral strategy of
taking L and R each with 50% probability each time he arrives at his information set gets the outcome O2 with 25%
probability.
[Game tree: the root and the post-first-exit vertex form a single information set. Keeping right (R) at the root takes the first exit, leading to O_1; keeping left (L) continues to the second exit, where R leads to O_2 and L leads to O_3.]
These two examples are “pathological” in the sense that the drivers “forget” some information that they knew be-
fore. The city driver forgets what action she took at the previous information set. The highway driver forgets what
information sets he has encountered. The definition of perfect recall rules out these two pathologies.
Definition 15 (Perfect recall). An extensive form game has perfect recall if for each player j and information set I_j,
whenever v, v′ ∈ I_j, the two paths leading from the root to v and v′ pass through the same sequence of information sets
of player j, and player j takes the same actions at these information sets.
To put it another way, in a game of perfect recall, even a player who remembers all the information about the path of
play she gathered in previous stages (i.e., who never forgets anything) cannot tell at which node of a non-singleton
information set she is located: her information sets never require her to forget something she once knew.
In the examples above: the city driver game fails perfect recall, since taking two different actions from the root vertex
leads to two vertices in the same information set. The highway driver game fails perfect recall, since the two vertices in
the driver’s information set are reached by different histories: the path from the root to the first vertex is empty, while
the path to the second vertex passes through one information set.
Kuhn’s theorem states that in a game with perfect recall, it is without loss to analyze only behavioral strategies. Its
proof is beyond the scope of this course.
Theorem 16 (Kuhn, 1953). In a finite extensive game with perfect recall, (i) every mixed strategy has an equivalent
behavioral strategy, and (ii) every behavioral strategy has an equivalent mixed strategy.
5.1 What does it mean to “solve” a game? A detour into combinatorial game theory. Why are economists interested
in Nash equilibrium, or solution concepts in general? As a slight aside, you may want to know that there actually exist
two areas of research that go by the name of “game theory”. The full names of these two areas are combinatorial
game theory and equilibrium game theory. Despite the similarity in name, these two versions of game theory have
quite different research agendas. The most salient difference is that combinatorial game theory studies well-known
board games like chess where there exists (theoretically) a “winning strategy” for one player. Combinatorial game
theorists aim to find these winning strategies, thereby solving the game. On the other hand, no “winning strategies”
(usually called dominant strategies in our lingo) exist for most games studied by equilibrium game theorists.7 In the
game of assurance, for example, due to the simultaneous move condition, there is no one strategy that is optimal for
P1 regardless of how P2 plays, in contrast to the existence of such optimal strategies in, say, tic-tac-toe.
If a game has a dominant strategy for one of the players, then it is straightforward to predict its outcome under optimal
play. The player with the dominant strategy will employ this strategy and the other player will do the best they can
to minimize their losses. However, predicting the outcome of a game without dominant strategies requires the analyst to
make assumptions. These assumptions are usually called equilibrium assumptions and give equilibrium game theory
its name. One of the most common equilibrium assumptions in normal form games with complete information is the
Nash equilibrium, which we now study.
5.2 Definition of Nash equilibrium. A Nash equilibrium8 is a strategy profile where no player can improve upon her
own payoff through a unilateral deviation, taking as given the actions of others. This leads to the usual definition of
pure and mixed strategy Nash equilibria.
Definition 17 (Pure strategy Nash equilibrium). In a normal form game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩, a pure strategy
Nash equilibrium is a pure strategy profile s* such that for every player j, u_j(s*_j, s*_{−j}) ≥ u_j(s′_j, s*_{−j}) for all s′_j ∈ S_j.
Definition 18 (Mixed strategy Nash equilibrium). In a normal form game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩, a mixed strategy
Nash equilibrium is a mixed strategy profile σ* such that for every player j, u_j(σ*_j, σ*_{−j}) ≥ u_j(s′_j, σ*_{−j}) for all s′_j ∈ S_j.
In the definition of a mixed Nash equilibrium, we required no profitable unilateral deviation to any pure strategy s′_j. It
would be equivalent to require no profitable unilateral deviation to any mixed strategy, due to the observation in Fact
3. If there is some profitable mixed strategy deviation σ′_j from a strategy profile (σ*_j, σ*_{−j}), then it must be the case that
for at least one s′_j ∈ S_j with σ′_j(s′_j) > 0, u_j(s′_j, σ*_{−j}) > u_j(σ*_j, σ*_{−j}).
7 The one-shot prisoner’s dilemma is an exception here.
8 John Nash called this equilibrium concept “equilibrium point” (Nash, 1950, 1951) but later researchers referred to it as “Nash equilibrium”.
We will see a similar situation later.
Example 19 (Game of assurance). Consider the game of assurance,
L R
T 1, 1 0, 0
B 0, 0 2, 2
We readily verify that both (T, L) and (B, R) are pure strategy Nash equilibria. Note one of these two Nash equilibria
Pareto dominates the other. In general, Nash equilibria need not be Pareto efficient. This is because the definition of
NE only accounts for the absence of profitable unilateral deviations. Indeed, starting from the strategy profile (T, L),
if P1 and P2 can contract on simultaneously changing their strategies, then they would both be better off. However,
these sorts of simultaneous deviations by a “coalition” are not allowed.
But wait, there’s more! Suppose P1 plays (2/3)T ⊕ (1/3)B, and P2 plays (2/3)L ⊕ (1/3)R. This strategy profile is a mixed NE. The
reasoning is as follows. When P1 is playing (2/3)T ⊕ (1/3)B, P2 gets an expected payoff of 2/3 from playing L and an expected
payoff of 2/3 from playing R. Therefore, P2 has no profitable unilateral deviation because every strategy he could play,
pure or mixed, would give the same payoff of 2/3. Similarly, P2’s mixed strategy (2/3)L ⊕ (1/3)R means P1 gets an expected
payoff of 2/3 whether she plays T or B, so P1 does not have a profitable deviation either.
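Here is a small checker (a sketch of my own) that verifies all three equilibria of this game by testing for profitable pure deviations, which suffices by Fact 3.

```python
import numpy as np

U1 = np.array([[1, 0], [0, 2]], dtype=float)  # P1's payoffs, rows T, B
U2 = np.array([[1, 0], [0, 2]], dtype=float)  # P2's payoffs, cols L, R

def is_nash(s1, s2, tol=1e-9):
    """No profitable pure deviation (enough, by Fact 3) for either player."""
    u1, u2 = s1 @ U1 @ s2, s1 @ U2 @ s2
    return (U1 @ s2).max() <= u1 + tol and (s1 @ U2).max() <= u2 + tol

T, B = np.array([1.0, 0.0]), np.array([0.0, 1.0])
L, R = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(is_nash(T, L), is_nash(B, R))                    # True True
print(is_nash(2/3 * T + 1/3 * B, 2/3 * L + 1/3 * R))   # True: the mixed NE
print(is_nash(1/2 * T + 1/2 * B, 1/2 * L + 1/2 * R))   # False (see Property 3)
```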
5.3 Nash equilibrium as a fixed-point of the best response correspondence. Nash equilibrium embodies the idea of
stability. To make this point clear, it is useful to introduce an equivalent view of the Nash equilibrium through the lens
of best response correspondences.
Definition 20 (Best response correspondence). The individual pure best response correspondence for player j is
BR_j : S_{−j} ⇒ S_j,9 where
BR_j(s_{−j}) ≡ arg max_{s′_j∈S_j} u_j(s′_j, s_{−j}).
To interpret, the individual best response correspondences return the maximizer(s) of each player’s utility function
when opponents play some known strategy profile. Depending on others’ strategies, the player may have multiple
maximizers, all yielding the same utility. As a result, we must allow the best responses to be correspondences rather
than functions. Then, it is easy to see that:
Proposition 21. A pure strategy profile s* is a pure strategy Nash equilibrium if and only if it is a fixed point of the joint best
response correspondence BR(s) ≡ BR_1(s_{−1}) × · · · × BR_n(s_{−n}), i.e., s* ∈ BR(s*). Analogously, a mixed strategy profile is a
mixed strategy Nash equilibrium if and only if it is a fixed point of the corresponding correspondence built from the mixed best responses BR_j : ∏_{k≠j} ∆(S_k) ⇒ ∆(S_j).
Fixed points of the best response correspondences reflect stability of NE strategy profiles, in the sense that even if
player i knew what others were going to play, she still would not find it beneficial to change her actions. This rules
out cases where a player plays in a certain way only because she held the wrong expectations about other players’
strategies. We might expect such outcomes to arise initially when inexperienced players participate in the game, but
we would also expect such outcomes to vanish as players learn to adjust their strategies to maximize their payoffs over
time. That is to say, we expect non-NE strategy profiles to be unstable.
5.4 Some properties of Nash equilibria. Here are several important properties of NE. The first two are useful when
computing NE:
Property 1: The indifference principle in mixed strategy Nash equilibria.
In Example 19, we saw that each action that one player plays with strictly positive probability yields the same expected
payoff against the mixed strategy profile of the opponent. It turns out that this is a general phenomenon.
9 The notation f : A ⇒ B is equivalent to f : A → 2B .
Proposition 22. If σ* is a Nash equilibrium, then for s_j and s′_j such that σ*_j(s_j) > 0 and σ*_j(s′_j) > 0, we have
u_j(s_j, σ*_{−j}) = u_j(s′_j, σ*_{−j}).
Proof. It suffices to show that u_j(s_j, σ*_{−j}) = u_j(σ*_j, σ*_{−j}) for any s_j such that σ*_j(s_j) > 0. Suppose that, to the contrary,
we may find s_j ∈ S_j so that σ*_j(s_j) > 0 but u_j(s_j, σ*_{−j}) ≠ u_j(σ*_j, σ*_{−j}).
1. If u_j(s_j, σ*_{−j}) > u_j(σ*_j, σ*_{−j}), we contradict the optimality of σ*_j in the maximization problem max_{σ′_j∈∆(S_j)} u_j(σ′_j, σ*_{−j}),
for we should have just picked σ̂_j = s_j, the degenerate distribution on pure strategy s_j.
2. If u_j(s_j, σ*_{−j}) < u_j(σ*_j, σ*_{−j}), write s_j^{(1)}, ..., s_j^{(r)} for the pure strategies on which σ*_j puts positive probability, so that by Fact 3,
u_j(σ*_j, σ*_{−j}) = Σ_{k=1}^{r} σ*_j(s_j^{(k)}) · u_j(s_j^{(k)}, σ*_{−j}).
The left-hand side is a weighted average of the r numbers u_j(s_j^{(k)}, σ*_{−j}). Since one of these numbers (namely u_j(s_j, σ*_{−j})) lies strictly below the average, another must lie strictly above it, and we are back in case 1, a contradiction. ∎
Property 2: In each Nash equilibrium, no player puts positive probability on a strictly dominated strategy.
Definition 23 (Strictly dominated). A pure strategy s j ∈ S j of player j is strictly dominated if there exists a (mixed)
strategy σ j ∈ ∆(S j ) such that for every s− j ∈ S − j ,
u j (σ j , s− j ) > u j (s j , s− j ).
It turns out that recognizing strictly dominated strategies can simplify the analysis of the Nash equilibria of a game,
since no player would ever employ them in a Nash equilibrium strategy.
Proposition 24. If σ∗ is a Nash equilibrium and s j is strictly dominated, then σ∗j (s j ) = 0.
As a corollary, iterated elimination of strictly dominated strategies (IESDS)10 does not change the set of NE. In a
game G(1) , we can remove some or all of each player’s strictly dominated strategies to arrive at a new game G(2) , which
will have the same set of NE as G(1) . Furthermore, this procedure can be repeated, removing some of each player’s
strictly dominated strategies in G(t) to arrive at G(t+1) . All of the games G(1) , G(2) , G(3) , ... will have the same set of NE,
but computing NE of the later games is probably easier than computing NE of the original game G(1) .
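A sketch of IESDS for two-player games (my own, and deliberately conservative: it only tests domination by pure strategies, whereas Definition 23 also allows the dominating strategy to be mixed):

```python
import numpy as np

def iesds_pure(U1, U2):
    """Iteratively delete pure strategies strictly dominated by another
    pure strategy; returns the surviving row and column indices."""
    rows = list(range(U1.shape[0]))
    cols = list(range(U1.shape[1]))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            if any(all(U1[r2, c] > U1[r, c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r); changed = True
        for c in cols[:]:
            if any(all(U2[r, c2] > U2[r, c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c); changed = True
    return rows, cols

# Prisoner's dilemma: "defect" (index 1) strictly dominates "cooperate".
U1 = np.array([[2, 0], [3, 1]])
U2 = np.array([[2, 3], [0, 1]])
print(iesds_pure(U1, U2))  # ([1], [1]): only (defect, defect) survives,
                           # so by Proposition 24 it is the unique NE
```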
Next we turn to the mathematical properties of the set of Nash equilibria:
Property 3: The set of Nash equilibria of a finite game is closed, but in general not convex.
The set of Nash equilibria of a finite game is a subset of the product space of strategy profiles ∏_{j∈N} ∆(S_j).
• To see that it is a closed subset, consider a sequence of Nash equilibria of the game G, {σ^{(m)}}, that converges to
a strategy profile σ* as m → ∞. We then have
u_j(σ_j^{(m)}, σ_{−j}^{(m)}) ≥ u_j(σ_j, σ_{−j}^{(m)}),  ∀j ∈ N, ∀σ_j ∈ ∆(S_j).
Note that the payoff functions are continuous in their arguments. We can pass to the limit as m → ∞ to show
u_j(σ*_j, σ*_{−j}) ≥ u_j(σ_j, σ*_{−j}),  ∀j ∈ N, ∀σ_j ∈ ∆(S_j).
This is just the definition of σ* being a Nash equilibrium profile!
10 The next section will focus more on IESDS and its properties. Some comments on the relation between the never-best-response property and
strict dominance: these two concepts are equivalent for two-player games. For games with more than two players, they are in
general not equivalent, under the usual assumption that opponents of a player cannot correlate their strategies.
• To give a counterexample to the set of Nash equilibria being convex, consider (again) the game of assurance.
L R
T 1, 1 0, 0
B 0, 0 2, 2
(T, L) and (B, R) are Nash equilibria, but ((1/2)T ⊕ (1/2)B, (1/2)L ⊕ (1/2)R) is not, as it violates the indifference principle. For
instance, when P1 plays (1/2)T ⊕ (1/2)B, P2 would deviate to playing R with probability one.
Examples of symmetric games include Bertrand duopoly or Cournot duopoly with identical costs, the prisoner’s dilemma,
etc.
Property 4: Every finite symmetric game has a symmetric mixed strategy Nash equilibrium (Nash, 1951).
One can adapt the proof of Nash’s existence theorem to show that every finite symmetric game has a symmetric mixed
strategy Nash equilibrium: an equilibrium σ∗ satisfying σ∗j = σ∗k for all j, k ∈ N.
This fact can come in handy when solving games. Moreover, the assumption of symmetric play is natural in the sense
that players should be interchangeable in symmetric games (i.e., their identity doesn’t matter for the game play).
6 Optional: On the Absent-Minded Driver
The two types of analyses presented here are intuitively nearer to the concepts of Bayesian Nash equilibrium and
correlated equilibrium that we will cover in later parts of the lecture. They are based on Aumann, Hart, and Perry
(1997).
Consider the absent-minded driver game we saw in the lecture:
[Game tree: the driver starts at intersection X, where “exit” yields payoff 0 and “continue” leads to intersection Y; at Y, “exit” yields payoff 4 and “continue” yields payoff 1. The two intersections lie in a single information set.]
We calculated in the lecture that the optimal behavioral strategy puts probability p = 2/3 on “continue”. Here, in
the first part, we consider another approach to the same problem, which takes into account the driver’s beliefs about her location
within the information set. In the second part, we show that with the help of simple correlating devices, it is
possible to achieve an even higher payoff than with mixed or behavioral strategies.
1. The driver makes a decision at each intersection through which she passes. Moreover, at any intersection, she
can determine the action only there (she cannot determine the action at the other intersection).
2. Since she can’t distinguish between intersections, whatever reasoning obtained at one intersection must be
obtained also at the other, and she is aware of this.
• The optimal decision is the same at both intersections; it is pinned down by the probability of choosing “con-
tinue” at each intersection. Call it p∗ .
• Therefore, at each intersection, the driver believes that p∗ is chosen at the other intersection.
• The driver has a belief over her location within her information set. At each intersection, the driver optimizes
her decision given her beliefs. Therefore, choosing p = p∗ at the current intersection she is located, must be
optimal given the belief that p∗ is chosen at the other intersection. Moreover, her belief must be derived from
the strategy she chooses.
By the principle of indifference11 in Bayesian statistics, without any information about the strategy chosen, the prob-
ability of being at X will be 1/2. Denote by α(p*) the belief the driver has about being at the intersection X, given her
strategy of choosing p* at the other intersection. The reasoning above and Bayes’ rule imply that
α(p*) = (1/2) / (1/2 + (1/2)p*) = 1 / (1 + p*).
Given her beliefs about the behavior at the other node, the payoff of choosing p at the current node can be computed
as
h(p, p*) = α(p*)[(1 − p) · 0 + p(1 − p*) · 4 + p·p* · 1] + (1 − α(p*))[(1 − p) · 4 + p · 1] = ((4 − 6p*)p + 4p*) / (1 + p*).
p must be chosen optimally, given the belief p*. Moreover, p must be equal to p*, since the agent doesn’t distinguish
between the nodes. That is, p* must fulfill
p* ∈ arg max_{p∈[0,1]} h(p, p*).
Since h is affine in p with slope (4 − 6p*)/(1 + p*), consistency at an interior p* requires 4 − 6p* = 0. This shows that the solution is unique and equal to p* = 2/3, the same as for the optimal behavioral strategy! Recall that
the payoff of the optimal behavioral strategy is 4/3.
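A quick numerical sanity check of this fixed-point argument (a sketch of mine, not part of the original notes):

```python
# h(p, ps): payoff of choosing "continue" with probability p at the current
# intersection, given the Bayes belief alpha(ps) of being at X and the
# conjecture that ps is chosen at the other intersection.
def h(p, ps):
    alpha = 0.5 / (0.5 + 0.5 * ps)          # = 1 / (1 + ps)
    at_X = p * (1 - ps) * 4 + p * ps * 1    # exiting at X pays 0
    at_Y = (1 - p) * 4 + p * 1
    return alpha * at_X + (1 - alpha) * at_Y

# h is affine in p with slope (4 - 6*ps) / (1 + ps): the best reply to ps is
# p = 1 if ps < 2/3 and p = 0 if ps > 2/3, never equal to ps itself. Only
# ps = 2/3 makes the driver indifferent, so it is the unique fixed point.
print([round(h(p, 2 / 3), 6) for p in (0.0, 0.25, 2 / 3, 1.0)])  # all equal
for ps in (0.2, 0.5, 0.9):
    best_p = max((h(1.0, ps), 1.0), (h(0.0, ps), 0.0))[1]
    print(ps, best_p)    # corner best replies: 1.0, 1.0, 0.0
```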
The handkerchief solution. Assume the driver has a handkerchief in her pocket. Whenever she goes through an
intersection, if there was no knot, she ties a knot in the handkerchief; if there was a knot, she unties it.
Assume that at the beginning, it is equally probable that the handkerchief had a knot or not. Assume that the driver is
absent-minded in the sense that she cannot remember which was the case. Thus, at each one of the two intersections,
the probability of having a knot in the handkerchief is 12 . Therefore, seeing a knot or not at each intersection does not
reveal any information about the location of the intersection.
11 The principle of indifference states that in the absence of any relevant evidence, agents should distribute their credence equally among all the
possibilities under consideration.
Consider the following strategy for the driver: exit if there is a knot, continue if there is not. The payoff of this simple
strategy is (1/2) · 0 + (1/2) · 4 = 2: with probability 1/2 the handkerchief had a knot in the first place, so that the driver exits and
the payoff is 0; with probability 1/2 the handkerchief had no knot, so that the driver continues and ties a knot, and at the
next node, seeing the knot, the driver exits so that the payoff realized is 4.
Note that the path of play induced by this strategy cannot be replicated using a behavioral strategy: the handkerchief
allows the driver to avoid ever reaching the last node with payoff 1. Note also that 2 > 4/3, the payoff from the best
behavioral strategy. The handkerchief has served as a coordination device between the forgetful selves at the different
nodes and has achieved a higher payoff than the best behavioral strategy!
Ec2010a Game Theory. Section 2: Nash Equilibrium and Correlated Equilibrium. 10/31/2021
(1) Solving for Nash equilibria; (2) Correlated equilibrium; (3) Characterization of correlated equilibria
TF: Chang Liu (chang_liu@g.harvard.edu)
1 Solving for Nash Equilibria
The following steps may be helpful in solving for Nash equilibria of two-player games. As a running example, let us find all Nash equilibria of the game below, where P1 chooses a row and P2 chooses a column.
L R Y
T 2, 2 −1, 2 0, 0
B −1, −1 0, 1 1, −2
X 0, 0 −2, 1 0, 2
Solution:
Step 1: Strategy X for P1 is strictly dominated by (1/2)T ⊕ (1/2)B. Indeed, u_1(X, L) = 0 < 0.5 = u_1((1/2)T ⊕ (1/2)B, L),
u_1(X, R) = −2 < −0.5 = u_1((1/2)T ⊕ (1/2)B, R), and u_1(X, Y) = 0 < 0.5 = u_1((1/2)T ⊕ (1/2)B, Y). But having eliminated X for P1,
strategy Y for P2 is strictly dominated by R: u_2(T, Y) = 0 < 2 = u_2(T, R), u_2(B, Y) = −2 < 1 = u_2(B, R). Hence we
can restrict attention to the smaller, 2 × 2 game in the upper left corner.
Step 2: (T, L) is a pure Nash equilibrium as no player has a profitable unilateral deviation. (The deviation L → R
does not strictly improve the payoff of P2, so it doesn’t break the equilibrium.) At (T, R), P1 deviates T → B, so it
is not a pure strategy Nash equilibrium. At (B, L), P2 deviates L → R. At (B, R), no player has a profitable unilateral
deviation, so it is a pure strategy Nash equilibrium. In summary, the game has two pure strategy Nash equilibria: (T, L)
and (B, R).
Step 3: Now we look for mixed Nash equilibria where one player is using a pure strategy while the other is using a
strictly mixed strategy. As discussed before, if a player strictly mixes between two pure strategies, then she must be
getting the same payoff from playing either of these two pure strategies.
Using this indifference principle, we quickly realize it cannot be the case that P2 is playing a pure strategy while P1
strictly mixes. Indeed, if P2 plays L then u1 (T, L) > u1 (B, L). If P2 plays R then u1 (B, R) > u1 (T, R).
Similarly, if P1 is playing B, then the indifference condition cannot be sustained for P2, since u_2(B, R) > u_2(B, L).
Now suppose P1 plays T. Then u_2(T, L) = u_2(T, R). This indifference condition ensures that any strictly mixed strategy
of P2, pL ⊕ (1 − p)R for p ∈ (0, 1), is a mixed best response to P1’s strategy. However, to ensure this is a mixed Nash
equilibrium, we must also check that P1 does not have any profitable unilateral deviation. This requires
u_1(T, pL ⊕ (1 − p)R) ≥ u_1(B, pL ⊕ (1 − p)R). That is to say,
2p + (−1) · (1 − p) ≥ (−1) · p + 0 · (1 − p)  ⇔  p ≥ 1/4.
(The deviation to X is never profitable here, as X is strictly dominated.)
Therefore, (T, pL ⊕ (1 − p)R) is a mixed Nash equilibrium where P2 strictly mixes when p ∈ [1/4, 1).
Step 4: There are no mixed Nash equilibria where both players are strictly mixing. To see this, notice that if σ*_1(B) > 0,
then
u_2(σ*_1, L) = 2 · (1 − σ*_1(B)) + (−1) · σ*_1(B) < 2 · (1 − σ*_1(B)) + 1 · σ*_1(B) = u_2(σ*_1, R).
So it cannot be the case that P2 is also strictly mixing, since P2 is not indifferent between L and R.
To sum up, the game has two pure Nash equilibria, (T, L) and (B, R), as well as infinitely many mixed Nash equilibria,
(T, pL ⊕ (1 − p)R) for p ∈ [1/4, 1).
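The pure equilibria from Step 2 can be double-checked by brute force (a sketch of my own):

```python
import numpy as np

# The 3x3 game above: P1's rows are T, B, X; P2's columns are L, R, Y.
U1 = np.array([[ 2, -1,  0],
               [-1,  0,  1],
               [ 0, -2,  0]])
U2 = np.array([[ 2,  2,  0],
               [-1,  1, -2],
               [ 0,  1,  2]])

rows, cols = ["T", "B", "X"], ["L", "R", "Y"]
for i in range(3):
    for j in range(3):
        # (i, j) is a pure NE iff row i is a best reply to column j and vice versa.
        if U1[:, j].max() <= U1[i, j] and U2[i, :].max() <= U2[i, j]:
            print(rows[i], cols[j])   # prints: T L  and  B R
```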
Sometimes, iterated elimination of strictly dominated strategy simplifies the game so much that the solution is imme-
diate after this process. The following example illustrates.
Example 27 (Guess two-thirds the average, sometimes also called the beauty contest game12). Consider a game of 2
players G^{(1)} where S_1 = S_2 = [0, 100] and
u_i(s_i, s_{−i}) = −(s_i − (2/3) · (s_i + s_{−i})/2)².
That is, each player wants to play an action as
close to two-thirds of the average of the two actions as possible.
We claim that for each player i, every action in (50, 100] is strictly dominated by the action 50. To see this, for any
opponent action s_{−i} ∈ [0, 100], we have (2/3) · (50 + s_{−i})/2 ≤ 50, so the guess 50 is already too high. At the same time, playing
any s_i > 50 exacerbates the error relative to playing 50:
s_i − (2/3) · (s_i + s_{−i})/2 > 50 − (2/3) · (50 + s_{−i})/2 ≥ 0.
Thus, −(s_i − (2/3) · (s_i + s_{−i})/2)² < −(50 − (2/3) · (50 + s_{−i})/2)² for all s_i ∈ (50, 100], and we have the claimed strict dominance.
This means we may delete the set of actions (50, 100] from each S_i to arrive at a new game G^{(2)} where each player is
restricted to using only [0, 50]. The game G^{(2)} will have the same set of Nash equilibria as the original game. But the
same logic may be applied again to show that in G^{(2)}, for each player, any action in (25, 50] is strictly dominated by
the action 25. We may continue in this way iteratively to arrive at a sequence of games (G^{(k)})_{k≥1}, so that in the game
G^{(k+1)}, player i’s action set is [0, (1/2)^k · 100]. All of the games G^{(1)}, G^{(2)}, G^{(3)}, ... have the same Nash equilibria. This
means any NE of G^{(1)} must involve each player playing an action in
∩_{k=1}^{∞} [0, (1/2)^k · 100] = {0}.
Indeed, (0, 0) is a Nash equilibrium (each player exactly matches two-thirds of the average there), so it is the unique Nash equilibrium of the game.
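The shrinking action sets are easy to trace numerically (a small sketch of my own):

```python
# Upper bound of the surviving action set [0, ub] after each round of IESDS
# in the guess-two-thirds game: everything above ub/2 is strictly dominated.
ub = 100.0
for k in range(1, 11):
    ub /= 2                 # G^(k+1) restricts players to [0, (1/2)^k * 100]
    print(k, ub)            # 50.0, 25.0, ..., about 0.098 after 10 rounds
# The bounds converge to 0: the unique NE has both players guessing 0.
```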
Example 28 (A three-player game). Consider the following three-player game: player 1 chooses a row (T or B), player 2 chooses a column (L or R), and player 3 chooses a matrix (X or Y).
X: L R
T 1, 1, 1 0, 1, 3
B 1, 3, 0 1, 0, 1
Y: L R
T 3, 0, 1 1, 1, 0
B 0, 1, 1 0, 0, 0
We claim that there is a unique Nash equilibrium and it is in pure strategies: (T, L, X).
Step 1: There is a unique pure strategy Nash equilibrium, (T, L, X).
To check that (T, L, X) is NE: if players 1 and 2 play (T, L), then player 3 is indifferent, so she might as well choose
X. Note also that (T, L) is a NE of the two-player game created by taking matrix X and deleting all payoffs of player 3.
In this restricted game, there is also a NE where players 1 and 2 play (B, L), but then player 3 would like to switch the matrix
to Y. If we restrict matrix Y to the payoffs of players 1 and 2 only, we see that there is only one pure strategy NE in
the induced two-player game: (T, R), but then player 3 would like to switch to X.
In all, there is only one pure strategy Nash equilibrium, (T, L, X).
Step 2: There are no Nash equilibria, where two players play pure strategies and the remaining player strictly mixes.
12 The name “beauty contest game” comes from Keynes. He described the action of rational agents in a market using an analogy based on a
fictional newspaper contest, in which entrants are asked to choose the six most attractive faces from a hundred photographs. Those who picked
the most popular faces are then eligible for a prize. A naive strategy would be to choose the face that, in the opinion of the entrant, is the most
handsome. A more sophisticated contest entrant, wishing to maximize the chances of winning a prize, would think about what the majority
perception of attractiveness is, and then make a selection based on some inference from their knowledge of public perceptions. This can be carried one
step further to take into account the fact that other entrants would each have their own opinion of what public perceptions are. Thus the strategy
can be extended to the next order and the next and so on, at each level attempting to predict the eventual outcome of the process based on the
reasoning of other rational agents. Here, we consider the more explicit scenario that helps to convey the notion of the contest as a convergence to
Nash equilibrium, due to Ledoux (1981).
1. Player 3 strictly mixes. The indifference principle implies that players 1 and 2 play (T, L). But then player 2
would deviate to R, as she can guarantee payoff 1 even when Y is played (this happens with positive probability).
Contradiction!
2. Player 2 strictly mixes. The indifference principle implies that players 1 and 3 play (T, X). But then player 1
would deviate to B, as she can guarantee payoff 1 even when R is played (this happens with positive probability).
Contradiction!
3. Player 1 strictly mixes. The indifference principle implies that players 2 and 3 play (L, X). But then player 3
would deviate to Y, as she can guarantee payoff 1 even when B is played (this happens with positive probability).
Contradiction!
Step 3: There are no Nash equilibria, where one player plays pure strategy and the remaining players strictly mix.
1. Player 1 plays a pure strategy. If T is played, player 3 is willing to strictly mix only if player 2 plays L with
probability one; otherwise, she would choose X with probability one, contradiction! If B is played, player 2
would choose L with probability one, contradiction!
2. Player 2 plays a pure strategy. If L is played, player 1 is willing to strictly mix only if player 3 plays X with
probability one; otherwise, she would choose T with probability one, contradiction! If R is played, player 3
would choose X with probability one, contradiction!
3. Player 3 plays a pure strategy. If X is played, player 1 is willing to strictly mix only if player 2 plays L with
probability one; otherwise, she would choose B with probability one, contradiction! If Y is played, player 1
would choose T with probability one, contradiction!
Step 4: There are no Nash equilibria, where all players strictly mix.13
Let (p, q, r) be the probabilities with which, respectively, player 1 plays T , player 2 plays L and player 3 plays X.
The indifference condition for player 1, u_1(T, ·) = u_1(B, ·), works out to
q(2 − r) = 2r − 1.
Similarly, the indifference conditions for players 2 and 3 can be written as, respectively,
r(2 − p) = 2p − 1  and  p(2 − q) = 2q − 1.
In particular, since each left-hand side is strictly positive, these three indifference conditions imply that p, q, r > 1/2. Now multiply the three equations side by side.
We get
pqr(2 − p)(2 − q)(2 − r) = (2p − 1)(2q − 1)(2r − 1).
But for each x ∈ (1/2, 1), x(2 − x) − (2x − 1) = 1 − x² > 0, so each factor x(2 − x) on the LHS strictly exceeds the corresponding factor 2x − 1 > 0 on the RHS. Hence LHS is greater than RHS, contradiction!
To sum up, there is a unique Nash equilibrium and it is in pure strategies: (T, L, X).
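As a sanity check (my own sketch, not from the notes), a grid search over mixed profiles confirms that no profile other than (T, L, X) survives the no-profitable-deviation test.

```python
import itertools

# Payoffs u(a1, a2, a3); action 0 means T / L / X and action 1 means B / R / Y.
U = {(0,0,0): (1,1,1), (0,1,0): (0,1,3), (1,0,0): (1,3,0), (1,1,0): (1,0,1),
     (0,0,1): (3,0,1), (0,1,1): (1,1,0), (1,0,1): (0,1,1), (1,1,1): (0,0,0)}

def expected(i, probs):
    """Player i's expected payoff when player j plays action 0 w.p. probs[j]."""
    total = 0.0
    for a in itertools.product((0, 1), repeat=3):
        w = 1.0
        for j in range(3):
            w *= probs[j] if a[j] == 0 else 1 - probs[j]
        total += w * U[a][i]
    return total

grid = [k / 20 for k in range(21)]
equilibria = []
for base in itertools.product(grid, repeat=3):
    has_deviation = any(
        expected(i, base[:i] + (pure,) + base[i+1:]) > expected(i, base) + 1e-9
        for i in range(3) for pure in (0.0, 1.0)  # pure deviations suffice (Fact 3)
    )
    if not has_deviation:
        equilibria.append(base)
print(equilibria)   # [(1.0, 1.0, 1.0)], i.e., the pure profile (T, L, X)
```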
2 Correlated Equilibrium
Let’s begin with the definition of a correlated equilibrium in a normal form game.
Definition 29 (Correlated equilibrium). In a normal form game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩, a correlated equilibrium
(Ω, p, s*) consists of:
1. A (finite) signal space Ω_j for each j ∈ N; write Ω ≡ ∏_{j∈N} Ω_j.
2. A (joint) distribution p ∈ ∆(Ω), so that the marginal distributions satisfy p(ω_j) > 0 for each ω_j ∈ Ω_j.
3. A (signal-dependent) strategy s*_j : Ω_j → S_j for each j ∈ N, such that for every j ∈ N, ω_j ∈ Ω_j, s′_j ∈ S_j,
Σ_{ω_{−j}∈Ω_{−j}} p(ω_{−j}|ω_j) u_j(s*_j(ω_j), s*_{−j}(ω_{−j})) ≥ Σ_{ω_{−j}∈Ω_{−j}} p(ω_{−j}|ω_j) u_j(s′_j, s*_{−j}(ω_{−j})).
13 There are several ways to do this step, this is just one of them.
A correlated equilibrium envisions the following situation. At the start of the game, an n-dimensional vector of signals
ω realizes according to the distribution p. Player j observes only the j-th dimension of the signal, ω j , and plays an
action s∗j (ω j ) as a function of the signal she sees. Whereas a pure strategy Nash equilibrium has each player playing
one action and requires that no player has a profitable unilateral deviation, in a correlated equilibrium each player may
take different actions depending on her signal. Correlated equilibrium requires that no player can strictly improve
her expected payoffs after seeing any of her signals. More precisely, seeing the signal ω j leads her to have some belief
over the signals that others must have seen, formalized by the conditional distribution p(·|ω j ) ∈ ∆(Ω− j ). Since she
knows how these opponent signals translate into opponent actions through s∗− j , she can compute the expected payoffs
of taking different actions after seeing signal ω_j. She finds it optimal to play the action s*_j(ω_j) instead of deviating to
any other s′_j ∈ S_j after seeing signal ω_j.
We make four remarks about correlated equilibria.
1. The signal space and its associated joint distribution, (Ω, p), are not part of the game G, but part of the equilib-
rium. That is, a correlated equilibrium constructs an information structure under which a particular outcome
can arise.
2. There is no institution compelling player j to play the action s∗j (ω j ), but j finds it optimal to do so after seeing
the signal ω j . It might be helpful to think of the traffic lights as an analogy for a correlated equilibrium. The
light color that a player sees as she arrives at the intersection is her signal and imagine a world where there is
no traffic police or cameras enforcing traffic rules. Each driver would nevertheless still find it optimal to stop
when she sees a red light, because she infers that her seeing the red light signal must mean the driver on the
intersecting street received the green light signal, and further the other driver is playing the strategy of going
through the intersection if he sees a green light. Even though the red light (ω j ) merely recommends an action
(s∗j (ω j )), j finds it optimal to obey this recommendation given how others are acting on their own signals.
3. A Nash equilibrium is always a correlated equilibrium. Indeed, if σ* is a Nash equilibrium in a normal form
game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩, construct signal spaces Ω_j = S_j (restricted to the support of σ*_j if necessary), define the distribution p ∈ ∆(Ω) by
p(s_1, ..., s_n) = σ*_1(s_1) · · · σ*_n(s_n),
and consider the signal-dependent strategies s*_j : Ω_j → S_j, s*_j(s_j) = s_j. It is trivial to see that this gives a
correlated equilibrium. In particular, correlated equilibria always exist.
4. The set of correlated equilibria of a finite normal form game is convex. Recall that this is not true for Nash
equilibria. To see this intuitively, consider the following examples.
Example 30 (Game of assurance). Consider the game of assurance,
L R
T 1, 1 0, 0
B 0, 0 2, 2
We have seen that (T, L) and (B, R) are Nash equilibria, but ((1/2)T ⊕ (1/2)B, (1/2)L ⊕ (1/2)R) is not. Here is a correlated equilibrium
where player 1 plays (1/2)T ⊕ (1/2)B and player 2 plays (1/2)L ⊕ (1/2)R “effectively”: Ω_1 = {t, b}, Ω_2 = {l, r}, p(t, l) = p(b, r) = 0.5,
p(t, r) = p(b, l) = 0, s*_1(t) = T, s*_1(b) = B, s*_2(l) = L, s*_2(r) = R.
In this example, the signal structure is effectively a coordination device that picks the (T, L) Nash equilibrium 50%
of the time, the (B, R) Nash equilibrium 50% of the time. Effectively, this correlated equilibrium can be thought of
as flipping a coin, then instructing the players to play the (T, L) Nash equilibrium if heads up, and the (B, R) Nash
equilibrium if tails up. We also refer to such a coordination device as a public randomization device. This point can
be made more general.
Example 31 (Public randomization device). Fix any normal form game G and fix K of its pure Nash equilibria,
s^{(1)}, ..., s^{(K)}. Then, for any probabilities p_1, ..., p_K with p_k > 0 and Σ_{k=1}^{K} p_k = 1, consider the signal space with Ω_j =
{1, ..., K} for every j ∈ N, the joint distribution such that p(k, ..., k) = p_k for each 1 ≤ k ≤ K and p(ω) = 0 for any
ω where not all n dimensions match, and the strategies s*_j(k) = s_j^{(k)} for each j ∈ N, 1 ≤ k ≤ K. Then (Ω, p, s*) is a
correlated equilibrium. Indeed, after seeing the signal k, each player knows that others must be playing their part of
the k-th Nash equilibrium. As such, her recommended response s_j^{(k)} must be optimal.
In general, fix K correlated equilibria {(Ω^{(k)}, p^{(k)}, s^{*(k)})}_{k=1}^{K} of a game and some strictly positive probability weights
(p_k)_{k=1}^{K}. We can construct a new correlated equilibrium by first throwing a K-faced die which falls on k with probability
p_k, and instructing the players to play the k-th correlated equilibrium if face k realizes as the outcome. The players will follow
the instruction exactly because each (Ω^{(k)}, p^{(k)}, s^{*(k)}) is a correlated equilibrium in the first place. This two-stage
process gives a correlated equilibrium of the game, which is a mixture with weights (p_k)_{k=1}^{K} of the original correlated
equilibria.
Example 32 (Coordination game with an eavesdropper). Three players Alice (P1), Bob (P2), and Eve (P3, the “eaves-
dropper”) play a zero-sum game. Alice and Bob win only if they show up at the same location, and furthermore Eve is
not there to spy on their conversation. The payoffs are given below. Alice chooses a row, Bob chooses a column, and
Eve chooses a matrix.
L: L R
L −1, −1, 2 −1, −1, 2
R −1, −1, 2 1, 1, −2
R: L R
L 1, 1, −2 −1, −1, 2
R −1, −1, 2 −1, −1, 2
The following is a correlated equilibrium. Ω1 = Ω2 = Ω3 = {l, r}, p(l, l, l) = 0.25, p(l, l, r) = 0.25, p(r, r, l) = 0.25,
p(r, r, r) = 0.25, s∗i (l) = L and s∗i (r) = R for all i ∈ {1, 2, 3}. The information structure models a situation where Alice
and Bob jointly observe some randomization device unseen by Eve14 and use it to coordinate on either both playing
L or both playing R. Eve’s signals are uninformative of Alice and Bob’s actions. Indeed, after seeing either ω3 = l
or ω3 = r, Eve thinks the chances are 50-50 that Alice and Bob are both playing L or both playing R, so she has no
profitable deviation from the prescribed actions s∗3 (l) = L, s∗3 (r) = R. On the other hand, after seeing ω1 = l, Alice
knows for sure that Bob is playing L while Eve has a 50-50 chance of playing L or R. Her payoff is maximized by
playing the recommended s∗1 (l) = L. (Other deviations can be checked similarly.)
Eve’s expected payoff in this correlated equilibrium is ½ · 2 + ½ · (−2) = 0. However, if Alice and Bob were to play
independent mixed strategies, then Eve’s best response leaves her with an expected payoff of at least 1. To see this,
suppose Alice plays L with probability q_A and Bob plays L with probability q_B. If q_A · q_B ≥ (1 − q_A) · (1 − q_B), so that
it is more likely that Alice and Bob coordinate on L than on R, Eve may play L to get an expected payoff of
$$(-2)\cdot\underbrace{(1-q_A)(1-q_B)}_{\text{Alice and Bob meet without Eve}} \;+\; 2\cdot\underbrace{\big[1-(1-q_A)(1-q_B)\big]}_{\text{otherwise}} \;\ge\; (-2)\cdot\frac{1}{4} + 2\cdot\frac{3}{4} = 1.$$
The inequality uses (1 − q_A)(1 − q_B) ≤ ¼: since q(1 − q) ≤ ¼ for any q ∈ [0, 1], we have [(1 − q_A)(1 − q_B)]² ≤ (1 − q_A)(1 − q_B) · q_A q_B = [q_A(1 − q_A)][q_B(1 − q_B)] ≤ 1/16. The case q_A q_B < (1 − q_A)(1 − q_B) is symmetric, with Eve playing R.
In general, it is impossible to enumerate the set of correlated equilibria of a given game, due to the arbitrary choice
of signal spaces. Yet every correlated equilibrium induces a probability distribution on the set of strategy profiles, and
such distributions are often the main object of analysis when one applies the concept of correlated equilibrium. The
formal definition of correlated equilibrium distributions is as follows:
Definition 33 (Correlated equilibrium distribution). In a normal form game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩, a correlated
equilibrium distribution is a probability distribution q ∈ ∆(S) induced by a correlated equilibrium (Ω, p, s∗),
q = p ∘ (s∗)^{−1}. That is,
$$q(s) = p(\{\omega \in \Omega : s^*(\omega) = s\}).$$
For instance, the correlated equilibrium in Example 30 induces the following correlated equilibrium distribution:
q(T, L) = q(B, R) = 0.5, q(T, R) = q(B, L) = 0.
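Anticipating the obedience conditions of Proposition 34 below, here is a minimal Python sketch (the matrix encoding and tolerance are our own choices, and NumPy is assumed available) that numerically verifies that this distribution is obedient:

    import numpy as np

    u1 = np.array([[1.0, 0.0], [0.0, 2.0]])   # game of assurance, player 1
    u2 = np.array([[1.0, 0.0], [0.0, 2.0]])   # player 2 (identical payoffs)
    q = np.array([[0.5, 0.0], [0.0, 0.5]])    # q(T, L) = q(B, R) = 0.5

    def obedient(q, u1, u2, tol=1e-9):
        # Row recommendations: conditional on being told row i, no deviation
        # to another row k should raise player 1's expected payoff.
        for i in range(q.shape[0]):
            if q[i].sum() > 0:
                for k in range(q.shape[0]):
                    if q[i] @ u1[k] > q[i] @ u1[i] + tol:
                        return False
        # Column recommendations, symmetrically for player 2.
        for j in range(q.shape[1]):
            if q[:, j].sum() > 0:
                for k in range(q.shape[1]):
                    if q[:, j] @ u2[:, k] > q[:, j] @ u2[:, j] + tol:
                        return False
        return True

    print(obedient(q, u1, u2))  # True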
The set of correlated equilibrium distributions has a very convenient structure. It is a convex and compact subset of
∆(S ), characterized by a set of linear inequalities.
14 Perhaps an encrypted message.
Proposition 34. In a (finite) normal form game G = ⟨N, (S_j)_{j∈N}, (u_j)_{j∈N}⟩, a probability distribution q ∈ ∆(S) is a
correlated equilibrium distribution if and only if for each s_j with q(s_j) > 0 and for each s′_j,
$$\sum_{s_{-j}\in S_{-j}} q(s_{-j}\mid s_j)\, u_j(s_j, s_{-j}) \;\ge\; \sum_{s_{-j}\in S_{-j}} q(s_{-j}\mid s_j)\, u_j(s'_j, s_{-j}). \tag{1}$$
The condition (1) is called the obedience condition. To see its logic, suppose that a disinterested moderator randomly
selects a strategy profile s from the distribution q and recommends each player j to play s j without giving any other
information. Hearing the recommendation, player j comes to believe that the other players’ strategies are distributed
by q(·|s j ). The obedience condition states that she follows the recommendation.
Formally, this corresponds to the simple correlated equilibrium (Ω, p, s∗) with Ω = S, p = q, s∗(s) = s. Hence, the
obedience condition is a sufficient condition for a correlated equilibrium distribution. Conversely, to capture the
probability distributions induced by correlated equilibria with arbitrary information structures, it suffices to
consider this limited class of information structures. To see this, take any correlated equilibrium (Ω, p, s∗) and its induced
correlated equilibrium distribution q on S. Now suppose that instead of letting j know that the realized state is ω_j,
we only inform him that he needs to play s∗_j(ω_j). Since he did not have an incentive to deviate under the finer
information (by definition of correlated equilibrium), by the sure-thing principle he does not have an incentive to
deviate now. Hence, the new model with coarser information is also a correlated equilibrium. One crucial assumption
that leads to this simplification is that u_j does not depend on ω_j.
Thanks to Proposition 34, the set of correlated equilibrium distributions is characterized by a finite set of linear
inequalities:
$$\sum_{s_{-j}\in S_{-j}} \big[u_j(s_j, s_{-j}) - u_j(s'_j, s_{-j})\big]\, q(s_j, s_{-j}) \;\ge\; 0 \qquad \forall j \in N,\ \forall s_j, s'_j \in S_j.$$
Example 35 (Game of assurance). We calculate in this example all correlated equilibrium distributions of the game
of assurance,
L R
T 1, 1 0, 0
B 0, 0 2, 2
Represent a candidate distribution q ∈ ∆(S) by the table
L R
T a b
B c d
where a, b, c, d ≥ 0 and a + b + c + d = 1. From the obedience conditions, q is a correlated equilibrium distribution if and only if
$$a\cdot 1 + b\cdot 0 \ge a\cdot 0 + b\cdot 2, \qquad c\cdot 0 + d\cdot 2 \ge c\cdot 1 + d\cdot 0,$$
$$a\cdot 1 + c\cdot 0 \ge a\cdot 0 + c\cdot 2, \qquad b\cdot 0 + d\cdot 2 \ge b\cdot 1 + d\cdot 0.$$
These conditions reduce to a ≥ 2b, a ≥ 2c, 2d ≥ b, 2d ≥ c. For every vector of probabilities (a, b, c, d) that
satisfies these conditions, there exists an associated correlated equilibrium.
Next consider the symmetric correlated equilibria, where b = c. Such symmetric distributions can be represented by
pairs (a, b), with a + 2b ≤ 1 and d = 1 − a − 2b. The above conditions further reduce to b ≤ a/2 and 2a + 5b ≤ 2. The
set of symmetric correlated equilibrium distributions is the shaded area in Figure 6.
Figure 6: Symmetric correlated equilibria and Nash equilibria in the game of assurance. [The figure plots the shaded
region of pairs (a, b) with a, b ≥ 0, b ≤ a/2 and 2a + 5b ≤ 2, whose corners are (0, 0), (1, 0), and (4/9, 2/9).]
Note that the Nash equilibria are also among the symmetric correlated equilibrium distributions: (1, 0) is (T, L),
(0, 0) is (B, R), and (4/9, 2/9) is the mixed strategy equilibrium (⅔T ⊕ ⅓B, ⅔L ⊕ ⅓R).
In this example, the set of symmetric correlated equilibrium distributions is simply the convex hull of the Nash equi-
librium distributions. Of course, there are also asymmetric correlated equilibria. In general, under broad conditions,
the Nash equilibria are located on the boundary of the set of correlated equilibrium distributions.
Example 36. We calculate in this example all correlated equilibrium distributions of the game studied in the lecture,
L R
U 5, 1 0, 0
D 4, 4 1, 5
Represent a candidate distribution q ∈ ∆(S) by the table
L R
U a b
D c d
where a, b, c, d ≥ 0 and a + b + c + d = 1. From the obedience conditions, q is a correlated equilibrium distribution if and only if
$$5a + 0\cdot b \ge 4a + 1\cdot b, \qquad 4c + 1\cdot d \ge 5c + 0\cdot d,$$
$$1\cdot a + 4c \ge 0\cdot a + 5c, \qquad 0\cdot b + 5d \ge 1\cdot b + 4d.$$
These conditions reduce to a ≥ b, a ≥ c, d ≥ b, d ≥ c. For every vector of probabilities (a, b, c, d) that satisfies these
conditions, there exists an associated correlated equilibrium.
Note that (1, 0, 0, 0) corresponds to the pure strategy Nash equilibrium (U, L), (0, 0, 0, 1) corresponds to the pure
strategy Nash equilibrium (D, R), and (1/4, 1/4, 1/4, 1/4) corresponds to the mixed strategy Nash equilibrium
(½U ⊕ ½D, ½L ⊕ ½R). Moreover, (1/3, 0, 1/3, 1/3) corresponds to the correlated equilibrium constructed in the lecture.
From a ≥ c, d ≥ c, and a + c + d ≤ 1, we can conclude that c ≤ 1/3. Hence, this correlated equilibrium is the one with
the highest probability that the strategy profile (D, L) is played. One can proceed to show that this correlated
equilibrium maximizes the sum of the players’ expected payoffs.
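The last claim can be verified by linear programming, since both the obedience conditions and the objective are linear in q. Here is a short Python sketch (the encoding is ours; NumPy and SciPy are assumed available) that maximizes the players’ total expected payoff over all correlated equilibrium distributions of this game:

    import numpy as np
    from scipy.optimize import linprog

    # Payoffs for the game above: row player chooses U/D, column player L/R.
    u1 = np.array([[5.0, 0.0], [4.0, 1.0]])
    u2 = np.array([[1.0, 0.0], [4.0, 5.0]])
    nr, nc = u1.shape  # q is flattened as q[i * nc + j]

    A_ub, b_ub = [], []
    # Obedience for player 1: for recommended row i and deviation k,
    # sum_j [u1(i, j) - u1(k, j)] q(i, j) >= 0, written here in <= 0 form.
    for i in range(nr):
        for k in range(nr):
            if k != i:
                row = np.zeros(nr * nc)
                for j in range(nc):
                    row[i * nc + j] = -(u1[i, j] - u1[k, j])
                A_ub.append(row); b_ub.append(0.0)
    # Obedience for player 2, symmetrically over columns.
    for j in range(nc):
        for k in range(nc):
            if k != j:
                row = np.zeros(nr * nc)
                for i in range(nr):
                    row[i * nc + j] = -(u2[i, j] - u2[i, k])
                A_ub.append(row); b_ub.append(0.0)

    # Probabilities sum to one; linprog minimizes, so negate the total payoff.
    res = linprog(-(u1 + u2).flatten(), A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.ones((1, nr * nc)), b_eq=[1.0],
                  bounds=[(0, 1)] * (nr * nc))
    print(res.x.reshape(nr, nc))  # approximately [[1/3, 0], [1/3, 1/3]]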
Economics 2010a . Section 3: Rationalizability and Nash Implementation 11/07/2021
(1) Rationalizability; (2) Mechanism design and Nash implementation
TF: Chang Liu (chang_liu@g.harvard.edu)
1 Rationalizability
1.1 Two algorithms. Consider a normal form game G. Here we review the two algorithms of iterative strategy elimi-
nation studied in lecture.
Algorithm 37 (Iterated elimination of strictly dominated strategies, “IESDS”).
1. Initialize: Ŝ_i^{(0)} := S_i for each i ∈ N.
2. Inductive step: given (Ŝ_j^{(t)})_{j∈N}, let Ŝ_i^{(t+1)} consist of those s_i ∈ Ŝ_i^{(t)} for which there is no σ_i ∈ ∆(Ŝ_i^{(t)}) with u_i(σ_i, s_{−i}) > u_i(s_i, s_{−i}) for every s_{−i} ∈ Ŝ_{−i}^{(t)}.
3. Output:
$$\hat S_i^{\infty} := \bigcap_{t \ge 0} \hat S_i^{(t)}.$$
The idea behind IESDS is that if some mixed strategy σi yields strictly more payoff than the action si regardless
of what other players do, then i will never use the action si . The “iterated” part comes from requiring that (i) the
dominating mixed strategy must be supported on i’s actions that survived the previous rounds of eliminations; (ii)
the conjecture of what other players might do must be taken from their strategies that survived the previous rounds
of eliminations.
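Here is a minimal Python sketch of IESDS for two-player games (the encoding and helper names are ours, not from the lecture; NumPy and SciPy are assumed available). Whether a pure action is strictly dominated by a mixed strategy over the surviving actions is itself a small linear program:

    import numpy as np
    from scipy.optimize import linprog

    def strictly_dominated(u, row, rows, cols):
        # Is `row` strictly dominated, within the surviving set `rows`, by a
        # mixed strategy, against every surviving opponent action in `cols`?
        support = [r for r in rows if r != row]
        if not support:
            return False
        # Variables: mixture weights over `support` plus a margin eps;
        # maximize eps subject to sum_r sigma_r u[r, c] >= u[row, c] + eps.
        n = len(support)
        c_obj = np.zeros(n + 1); c_obj[-1] = -1.0        # minimize -eps
        A_ub = np.zeros((len(cols), n + 1)); b_ub = np.zeros(len(cols))
        for k, col in enumerate(cols):
            A_ub[k, :n] = [-u[r, col] for r in support]
            A_ub[k, -1] = 1.0
            b_ub[k] = -u[row, col]
        A_eq = np.zeros((1, n + 1)); A_eq[0, :n] = 1.0   # weights sum to 1
        res = linprog(c_obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * n + [(None, None)])
        return res.status == 0 and -res.fun > 1e-9       # dominated iff eps > 0

    def iesds(u1, u2):
        rows, cols = list(range(u1.shape[0])), list(range(u1.shape[1]))
        while True:
            new_rows = [r for r in rows
                        if not strictly_dominated(u1, r, rows, cols)]
            new_cols = [c for c in cols
                        if not strictly_dominated(u2.T, c, cols, new_rows)]
            if new_rows == rows and new_cols == cols:
                return rows, cols
            rows, cols = new_rows, new_cols

By the equivalence result established below, for two-player games this also computes the rationalizable strategies.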
Algorithm 38 (Iterated elimination of never best responses, “IENBR”).
1. Initialize: S̃_i^{(0)} := S_i for each i ∈ N.
2. Inductive step: given (S̃_j^{(t)})_{j∈N}, let S̃_i^{(t+1)} consist of those s_i ∈ S̃_i^{(t)} that are a best response within S̃_i^{(t)} to some correlated conjecture σ_{−i} ∈ ∆(S̃_{−i}^{(t)}).
3. Output:
$$\tilde S_i^{\infty} := \bigcap_{t \ge 0} \tilde S_i^{(t)}.$$
It is important to note that ∆(S̃_{−i}^{(t)}) ≠ ∏_{k≠i} ∆(S̃_k^{(t)}) when n ≥ 3.¹⁵ The left-hand side is the set of correlated strategies
of players other than i, i.e., the set of all joint distributions on S̃_{−i}^{(t)}. Such a correlated mixed strategy might be generated,
for example, using a signal-space kind of setup as in correlated equilibrium. The elimination of never best responses
can be viewed as asking each action of player i to “justify its existence” by naming a correlated mixed strategy¹⁶ of
opponents for which it is a best response. The “iterated” part comes from requiring that this conjectured correlated
opponents’ strategy have support in their strategies that survived the previous rounds of eliminations.
¹⁵ When there are only two players, equality does hold: ∆(S̃_{−i}^{(t)}) = ∏_{k≠i} ∆(S̃_k^{(t)}). This is because −i refers to exactly one player, not a group of
players, so we do not get anything new by allowing −i to “correlate amongst themselves”. As such, we did not have to worry about correlated vs.
independent opponent strategies when we computed rationalizable strategy profiles for a two-player game in lecture.
16 This correlated opponents’ strategy might reflect i’s belief that opponents are colluding and coordinating their actions, or it could reflect
correlation in i’s subjective uncertainty about what two of her opponents might do.
Another view on these two algorithms is that they make progressively sharper predictions about the game’s outcome
by making more and more levels of rationality assumptions. A “rational” player i is someone who maximizes the
utility function ui as given in the normal form game G. Rational players are contrasted against the so-called “crazies”
present in some models of reputation, who are mad in the sense of either maximizing a different utility function
than normal players, or in not choosing actions based on utility maximization at all. From the analysts’ perspective,
knowing that every player is rational allows us to predict that only actions in Ŝ i(1) (equivalently, S̃ i(1) ) will be played by
i, since playing any other action is incompatible with maximizing ui . But we cannot make more progress unless we are
also willing to assume what i knows about j’s rationality. If i is rational but i thinks that j might be mad, in particular
that j might take an action in S_j \ Ŝ_j^{(1)}, then the step for constructing Ŝ_i^{(2)} for i does not make sense. As it is written in
Algorithm 37, we should eliminate any action of i that does strictly worse than a fixed mixed strategy against all action
profiles taken from Ŝ_{−i}^{(1)}, which in particular assumes that j must be playing something in Ŝ_j^{(1)}. In general, the t-th step
of each of Algorithm 37 and Algorithm 38 rests upon assumptions of the form “i knows that j knows that ... that k is
rational” of length t.
1.2 Equivalence of the two algorithms. In fact, Algorithm 37 and Algorithm 38 are equivalent, as we now demonstrate.¹⁷
Proposition 39. Ŝ i(t) = S̃ i(t) for each i ∈ N and t ≥ 0. In particular, Ŝ i∞ = S̃ i∞ .
In view of this result, we call S̃ i∞ the rationalizable strategies of player i, but note that it can be computed through
either IENBR or IESDS.
Proof. Do induction on t. When t = 0, Ŝ_i^{(0)} = S̃_i^{(0)} = S_i by definition. Suppose for each i ∈ N, Ŝ_i^{(t)} = S̃_i^{(t)}.
To establish that S̃_i^{(t+1)} ⊆ Ŝ_i^{(t+1)}, take some s∗_i ∈ S̃_i^{(t+1)}. By definition of IENBR, there is some σ_{−i} ∈ ∆(S̃_{−i}^{(t)}) such that
u_i(s∗_i, σ_{−i}) ≥ u_i(s_i, σ_{−i}) for all s_i ∈ S̃_i^{(t)}. The inductive hypothesis lets us replace all tildes with hats, so that there is
some σ_{−i} ∈ ∆(Ŝ_{−i}^{(t)}) with u_i(s∗_i, σ_{−i}) ≥ u_i(s_i, σ_{−i}) for all s_i ∈ Ŝ_i^{(t)}.
If s∗_i were strictly dominated by some σ̂_i ∈ ∆(Ŝ_i^{(t)}), then u_i(s∗_i, σ_{−i}) < u_i(σ̂_i, σ_{−i}), because the same strict inequality
holds at every s_{−i} in the support of σ_{−i}. By Fact 3, there would then exist some ŝ_i ∈ Ŝ_i^{(t)} with σ̂_i(ŝ_i) > 0 so that u_i(s∗_i, σ_{−i}) <
u_i(ŝ_i, σ_{−i}), contradicting s∗_i being a best response to σ_{−i} within Ŝ_i^{(t)}.
Conversely, for the reverse inclusion S̃_i^{(t+1)} ⊇ Ŝ_i^{(t+1)}, suppose s∗_i ∈ Ŝ_i^{(t+1)}. Combining the definition of IESDS and the
inductive hypothesis shows that for each σ_i ∈ ∆(S̃_i^{(t)}), there corresponds some s_{−i} ∈ S̃_{−i}^{(t)} so that u_i(s∗_i, s_{−i}) ≥ u_i(σ_i, s_{−i})
(otherwise, s∗_i would be strictly dominated by σ_i). Now enumerate S̃_{−i}^{(t)} = {s_{−i}^{(1)}, ..., s_{−i}^{(d)}} and construct the following
subset of R^d:
$$V \equiv \{v \in \mathbb{R}^d : \exists \sigma_i \in \Delta(\tilde S_i^{(t)}) \text{ s.t. } v_k \le u_i(\sigma_i, s_{-i}^{(k)}),\ \forall 1 \le k \le d\}.$$
That is, every σ_i ∈ ∆(S̃_i^{(t)}) gives rise to a point (u_i(σ_i, s_{−i}^{(1)}), ..., u_i(σ_i, s_{−i}^{(d)})) ∈ R^d, and V is the region to the “lower-left”
of this collection of points. We can verify that V is convex and non-empty. Now consider the point
$$w = (u_i(s^*_i, s_{-i}^{(1)}), \ldots, u_i(s^*_i, s_{-i}^{(d)})) \in \mathbb{R}^d.$$
We must have w ∉ int(V), where int(V) is the interior of V: if w were interior, some σ_i would satisfy u_i(σ_i, s_{−i}^{(k)}) > u_i(s∗_i, s_{−i}^{(k)})
for every k, contradicting the previous paragraph. As such, the separating hyperplane theorem implies there is
some q ∈ R^d \ {0} with q · w ≥ q · v for all v ∈ V. Since V includes points with arbitrarily large negative numbers in each
coordinate, we must in fact have q ∈ R^d_+ \ {0}, i.e., q cannot have a negative coordinate. So then, q may be normalized
so that its coordinates add up to 1, and thus it can be viewed as some correlated strategy σ∗_{−i} ∈ ∆(S̃_{−i}^{(t)}). This strategy
¹⁷ Note that the finiteness of the strategy space is also important for this equivalence result. To see a counterexample, consider the following game:
player 1 either names a positive integer, in which case her payoff is the integer she names, or plays swap, in which case her payoff is the integer
named by player 2; player 2 names a positive integer. In other words, player 1 can secure as payoff any positive integer she picks, but she can also
swap for the integer that player 2 picks. In this game, the strategy swap for player 1 is not strictly dominated by any mixed strategy (with finite
mean): swap works better when player 2 chooses an integer larger than the mean of that mixed strategy. However, swap is never a best response to
any of player 2’s mixed strategies (with finite mean): swap is worse than player 1’s choice of any integer larger than the mean of that mixed strategy.
has the property that u_i(s∗_i, σ∗_{−i}) ≥ u_i(σ_i, σ∗_{−i}) for all σ_i ∈ ∆(S̃_i^{(t)}), showing in particular that s∗_i is a best response to
σ∗_{−i} amongst S̃_i^{(t)}, hence s∗_i ∈ S̃_i^{(t+1)}. This establishes the reverse inclusion Ŝ_i^{(t+1)} ⊆ S̃_i^{(t+1)} and completes the inductive
step. ∎
1.3 Rationalizability and equilibrium concepts. In some sense, the collection of rationalizable strategies includes the
collection of correlated equilibrium strategies. To be more precise,
Proposition 40. If (Ω, p, s∗ ) is a correlated equilibrium, then s∗i (ωi ) ∈ S̃ i∞ for every i ∈ N and ωi ∈ Ωi .
Proof. We show that for any player i and any s_i ∈ s∗_i(Ω_i) (the image of the mapping s∗_i), s_i ∈ S̃_i^{(t)} for every t. This
statement is clearly true when t = 0. Suppose the statement is true for t = T. Then, for each player i and each signal
ω_i ∈ Ω_i, consider the correlated opponent strategy σ∗_{−i} constructed by
$$\sigma^*_{-i}(s_{-i}) = p(\{\omega_{-i} \in \Omega_{-i} : s^*_{-i}(\omega_{-i}) = s_{-i}\} \mid \omega_i).$$
By definition of correlated equilibria, s∗_i(ω_i) best responds to σ∗_{−i}. Furthermore, σ∗_{−i} ∈ ∆(S̃_{−i}^{(T)}) by the inductive hypothesis.
Therefore, s∗_i(ω_i) ∈ S̃_i^{(T+1)}, completing the inductive step. ∎
Therefore, we see that correlated equilibria (and in particular, Nash equilibria) embed the assumption of common
knowledge of rationality: not only is Alice rational, but also Alice knows Bob is rational, and Alice knows that Bob
knows Alice is rational, etc.
1.4 Nested solution concepts. Here we summarize the inclusion relationships between several solution concepts. For
a normal form game G,
Rat(G) ⊇ CE(G) ⊇ NE(G).
2 Mechanism Design and Nash Implementation
2.1 Mechanism design problems.
Definition 41 (Mechanism design problem). A mechanism design problem consists of:
1. A set of players N = {1, ..., n}.
2. A set of states of the world Θ.
3. A set of outcomes A.
4. A state-dependent utility u_i : A × Θ → R for each player i ∈ N.
5. A social choice rule f : Θ ⇒ A.
Every mechanism design problem presents an information problem. Consider a designer who is omnipotent (all-
powerful) but not omniscient (all-knowing). It can choose any outcome a ∈ A. However, the outcome it wants to pick
depends on the state of the world. When the state of the world is θ, the designer’s favorite outcomes are f (θ). While
every player knows the state of the world, the designer does not. Think of, for example, a town hall (designer) trying to
decide how much taxes to levy (outcomes) on a community of neighbors (players), where the optimal taxation depends
on the productivities of different neighbors, a state of the world that every neighbor knows but the town hall does not.
Due to the designer’s ignorance of θ, it does not know which outcome to pick and must proceed more indirectly. The
goal of the designer is to come up with an incentive scheme, called a mechanism, that induces self-interested players
to choose one of the designer’s favorite outcomes. The mechanism enlists the help of the players, who know the state
of the world, in selecting an outcome optimal from the point of view of the designer.
More precisely,
Definition 42 (Mechanism). Given a mechanism design problem, a mechanism (S, g) consists of:
1. A set of pure strategies S_i for each player i ∈ N; write S = S_1 × · · · × S_n.
2. An outcome function g : S → A.
A mechanism is a game form, i.e., a way to model the rules of a game, or an institution, independently of the players’
utility functions over the game’s outcomes. The designer announces a set of pure strategies S i for each player and a
mapping between the profile of pure strategies and the outcome. The designer promises to implement the outcome
g(s) when players choose the strategy profile s.
In state θ, the mechanism (S , g) gives rise to a normal form game, G(θ), where the set of actions of player i is S i
and the payoff i gets from strategy profile s is ui (g(s), θ). The mechanism solves the designer’s information problem
if playing the game G(θ) yields the same outcomes as f (θ). To predict what agents will do when they play the game
G(θ), the designer must pick a solution concept. We will use Nash equilibrium.
Let NE_g(θ) ⊆ A denote the set of Nash equilibrium outcomes in each state of the world: NE_g(θ) = {g(s∗) : s∗ is a Nash equilibrium of G(θ)}.
Definition 43 (Nash implementation). The mechanism (S , g) Nash implements social choice rule f if NEg (θ) = f (θ)
for every θ ∈ Θ.
If the designer wants to use a solution concept other than Nash equilibrium, then it would simply replace “NE” in the
above definition.
Loosely speaking, mechanism design is “reverse game theory”. Whereas a game theorist takes the game as given and
analyzes its equilibria, a mechanism designer takes the social choice rule as given and acts as a “game maker”, aiming
to engineer a game with suitable equilibria.
2.2 Maskin monotonicity and no veto power. It is natural to ask which mechanism design problems admit Nash
implementations. As we saw in lecture, the following pair of conditions is important.
Definition 44 (Maskin monotonicity). A social choice rule f satisfies Maskin monotonicity (MM)¹⁸ provided that
for all a ∈ A and θ, θ′ ∈ Θ, if
1. a ∈ f(θ),
2. for all i ∈ N and b ∈ A, u_i(a, θ) ≥ u_i(b, θ) ⇒ u_i(a, θ′) ≥ u_i(b, θ′),
then a ∈ f(θ′).
Equivalently, we can write the second condition as: for all players i, {b : u_i(a, θ) ≥ u_i(b, θ)} ⊆ {b : u_i(a, θ′) ≥ u_i(b, θ′)}.
In words, if a is chosen in some state θ, then it should also be chosen when the set of outcomes weakly worse than a
expands for everyone.
Definition 45 (No veto power). A social choice rule f satisfies no veto power (NVP) provided that for all a ∈ A and
θ ∈ Θ, if there exists i∗ ∈ N such that u_j(a, θ) ≥ u_j(b, θ) for all j ≠ i∗ and all b ∈ A, then a ∈ f(θ).
Theorem 46 (Maskin, 1999). If a social choice rule f is Nash implementable, then f satisfies MM. Conversely, if n ≥ 3
and f satisfies MM and NVP, then f is Nash implementable.
Example 47 (NVP but not MM). Suppose n ≥ 3 and individuals have strict preferences over outcomes A in any state of
the world. Consider the social choice rule f^Top that chooses the outcome(s) top-ranked by the largest number of
individuals: a ∈ f^Top(θ) if and only if for every b ∈ A,
$$\#\{i : u_i(a, \theta) > u_i(c, \theta) \text{ for all } c \ne a\} \;\ge\; \#\{i : u_i(b, \theta) > u_i(c, \theta) \text{ for all } c \ne b\}.$$
To see why f^Top satisfies NVP, suppose for a ∈ A and θ ∈ Θ there exists i∗ ∈ N such that u_j(a, θ) ≥ u_j(b, θ) for all j ≠
i∗ and all b ∈ A. Since we assume preferences are strict, it follows that
$$\#\{i : u_i(a, \theta) > u_i(c, \theta) \text{ for all } c \ne a\} \ge n - 1,$$
while for any b ≠ a,
$$\#\{i : u_i(b, \theta) > u_i(c, \theta) \text{ for all } c \ne b\} \le 1.$$
Since n ≥ 3, a ∈ f^Top(θ).
To see why f^Top does not satisfy MM, consider the following preferences: n = 3, A = {a, b, c}. In state θ, u_1(a, θ) >
u_1(b, θ) > u_1(c, θ), u_2(b, θ) > u_2(c, θ) > u_2(a, θ), and u_3(c, θ) > u_3(b, θ) > u_3(a, θ); in state θ′, the preferences are
unchanged except that u_3(b, θ′) > u_3(c, θ′) > u_3(a, θ′). Then outcome a did not drop in ranking relative to any other
outcome for any individual from θ to θ′, yet f^Top(θ) = {a, b, c} while f^Top(θ′) = {b}. This shows f^Top does not satisfy
MM. Hence by Theorem 46 it is not Nash implementable.
Example 48 (MM but not NVP, yet implementable). Suppose n ≥ 3 and individuals have strict preferences over
outcomes A in any state of the world. Consider the social choice rule “dictator’s rule” f^D, which chooses the top-ranked
outcome of player 1, the dictator.
To see why f^D satisfies MM, note that a ∈ f^D(θ) implies that u_1(a, θ) > u_1(b, θ) for all b ≠ a. In any state of the world θ′
where a does not fall in ranking relative to any other outcome for any individual, it remains true that u_1(a, θ′) > u_1(b, θ′)
for any b ≠ a. As such, a ∈ f^D(θ′) also.
To see why f^D does not satisfy NVP, consider the following preferences: n = 3, A = {a, b}. In state θ, u_1(a, θ) >
u_1(b, θ), u_2(b, θ) > u_2(a, θ), and u_3(b, θ) > u_3(a, θ). We have b top-ranked by all individuals except player 1, yet
f^D(θ) = {a}.
Theorem 46 does not say whether f^D is Nash implementable or not. However, it is easy to see that f^D can be
implemented by the following mechanism: ask each player for her favorite outcome, but implement only the answer of
player 1, ignoring everyone else. This example shows that MM plus NVP are sufficient for Nash implementability
when n ≥ 3, but NVP is not necessary.
Example 49 (MM but not NVP, and not implementable¹⁹). Suppose n = 3 and each state represents a profile of strict
orderings (≻_1, ≻_2, ≻_3) of all individuals over outcomes A = {a, b, c}. Consider the social choice rule f:
• For x ∈ {a, b}, x ∈ f(θ) if and only if x is Pareto-optimal and top-ranked by player 1.
• c ∈ f(θ) if and only if c is Pareto-optimal and not bottom-ranked by player 1.
It is easy to verify that f satisfies MM. To see why f does not satisfy NVP, suppose that in state θ player 1 bottom-ranks
c; then c ∉ f(θ) even if players 2 and 3 top-rank c.
To see why f is not Nash implementable, consider the following three profiles θ, θ′, θ″:
If f were implementable, there would exist a mechanism (S, g) and a Nash equilibrium s∗ of G(θ) such that g(s∗) = c.
It follows that for all s_1 ∈ S_1, g(s_1, s∗_2, s∗_3) ≠ b, otherwise s_1 would be a profitable deviation for player 1 in state θ.
If there existed s′_1 ∈ S_1 such that g(s′_1, s∗_2, s∗_3) = a, then (s′_1, s∗_2, s∗_3) would be a Nash equilibrium of G(θ″) (no one has
a profitable deviation), a contradiction since a ∉ f(θ″). We conclude that for all s_1 ∈ S_1, g(s_1, s∗_2, s∗_3) = c. But this
indicates that s∗ is a Nash equilibrium of G(θ′), a contradiction since c ∉ f(θ′).
This example shows MM per se is not sufficient for Nash implementability.
Example 50 (The electoral college rule²⁰). Consider a society made up of three states, {A, B, C}. The voters in each
state vote over the set of candidates {H, T, J}. State A has 10 voters and 6 electors, state B has 7 voters and 5
electors, and state C has 3 voters and 2 electors. Once a candidate wins a state, that state’s electors vote for the
winning candidate in their state. Overall, the candidate who wins the most electoral votes wins.
Assume that in state of the world θ, all 10 voters of state A have preferences H ≻ T ≻ J, all 7 voters of state B have
preferences T ≻ H ≻ J, while in state C one voter has preferences H ≻ J ≻ T and the two remaining voters
have J ≻ T ≻ H. Then H carries state A and has 6 electors, T carries state B and has 5 electors, while J carries state
C and has 2 electors. Overall, candidate H wins with 6 electoral votes.
¹⁹ This example is adapted from Maskin (1999, Example 2).
²⁰ This example was developed by Jetlir Duraj and Kevin He. They conjecture that the electoral college in the U.S. electoral system does not satisfy
MM. This is an attempt at showing what could go wrong in a simple example.
Consider now the state of the world θ′, which is the same as θ except that in state C the last two individuals have
preferences T ≻ J ≻ H (instead of J ≻ T ≻ H). Then H has not fallen relative to the other candidates for any voter,
but now candidate T wins the electoral college by carrying states B and C, with 7 electors in all.
This shows that the electoral college rule does not satisfy MM. Note that under a simple popular vote, H would win in both
states of the world θ and θ′, since she has 11 voters assured, while the most that T could hope to get is 9 votes (in θ′).
Nevertheless, the popular vote is also susceptible to failures of MM. You can try to think of an example.
Example 51 (December 2016 Final Exam). Suppose n = 3 and each state represents a profile of strict orderings
(≻_1, ≻_2, ≻_3) of all individuals over alternatives A = {a, b, c}. Let the social choice rule f be “rank-order voting” (“Borda
count”). That is, an alternative gets 3 points every time it is ranked first by some individual, 2 points every time it is ranked
second, and 1 point every time it is ranked third. Points are summed across individuals, and f(θ) consists of the
alternative(s) with the highest overall point total. Prove that f is not implementable in Nash equilibrium.
Solution:
Let R^θ_i(a) ∈ {1, 2, 3} be the number of points alternative a gets from agent i in state θ. Then the rule we are considering is
$$f(\theta) = \Big\{a \in A : \sum_{i=1}^{3} R^\theta_i(a) \ge \sum_{i=1}^{3} R^\theta_i(b),\ \forall b \in A\Big\}.$$
Economics 2010a . Section 4: Bayesian Games 11/14/2021
(1) Bayesian games; (2) Auction model; (3) Solving for auction BNEs; (4) Revenue equivalence theorem; (5) Optional: The universal type space
1 Bayesian Games
1.1 The model of a Bayesian game. In our brief encounter with mechanism design, we considered a setting where
the designer is uncertain as to the state of the world θ ∈ Θ, but every player knows θ perfectly. Many economic
situations involve uncertainty about the payoff-relevant state of the world amongst even the players themselves;
auctions, where each bidder privately knows her own valuation, are a leading example.
Definition 52 (Bayesian game). A Bayesian game B = ⟨N, (Θ_i)_{i∈N}, (A_i)_{i∈N}, (u_i)_{i∈N}, (p_i)_{i∈N}⟩ consists of:
1. A set of players N = {1, ..., n}.
2. A type space Θ_i for each player i ∈ N; write Θ = Θ_1 × · · · × Θ_n.
3. An action set A_i for each player i ∈ N; write A = A_1 × · · · × A_n.
4. A utility function u_i : A × Θ → R for each player i ∈ N.
5. A belief p_i(·|θ_i) ∈ ∆(Θ_{−i}) for each player i ∈ N and each type θ_i ∈ Θ_i.
While there exist some more general approaches (see the optional material on the universal type space, for example),
most models of incomplete-information games you will encounter will impose the common prior assumption: there
exists a common prior µ ∈ ∆(Θ) from which each player i derives her belief p_i(·|θ_i) by conditioning on her own type.
For ease of exposition, for now we will focus on the case where Θ is finite.21
A Bayesian game proceeds as follows. A state of the world θ is realized. Player i learns the i-th dimension, θ_i, then
takes a pure action from her action set A_i or a mixed action from ∆(A_i). The utility of player i depends on the profile
of actions as well as the state of the world θ, so in particular it might depend on the dimensions of θ that i does not
observe. Bayesian games where u_i does not depend on θ_{−i} are called private value games.
Player i’s strategy is a function of θi , not of θ, for i can only condition her action on her partial knowledge of the state
of the world. For reasons we make clear later, Θi is often called the type space of i and one often describes a strategy
of i as “type θi0 does X, while type θi00 does Y”.
A strategy profile in a Bayesian game might remind you of a correlated equilibrium. Indeed, in both setups each
player observes some realization (her signal in CE, her type in Bayesian game), then performs an action dependent on
her observation. However, unlike (Ω, p) in the definition of a correlated equilibrium, the (Θ, p) in a Bayesian game
is part of the game, not part of the solution concept. Furthermore, while the signal profile ω ∈ Ω in a CE is only
a coordination device that does not by itself affect players’ payoffs (as in an unenforced traffic light), the state of the
world in a Bayesian game is payoff-relevant.
²¹ The Bayesian game model can also accommodate games with infinitely many states of the world, such as auctions with a continuum of possible valuations.
Example 54 (August 2013 General Exam). Two players play a game. With probability 0.5, the payoffs are given by
the left payoff matrix. With probability 0.5, they are given by the right payoff matrix. Player 1 knows whether the
actual game is given by the left or right matrix, while Player 2 does not. Model this situation as a Bayesian game.
L C R L C R
T −2, −2 −1, 1 0, 0 T 0, 0 0, 0 0, 0
M 1, −1 3, 5 3, 4 M 0, 0 0, 0 0, 0
B 0, 0 4, 2 2, 4 B 0, 0 1, 0 4, 4
Solution:
Let Θ1 = {l, r}, Θ2 = {0}, µ ∈ ∆(Θ) with µ(l, 0) = µ(r, 0) = 0.5. In state (l, 0), the payoffs are given by the left matrix.
In state (r, 0), they are given by the right matrix. There are thus two types of player 1: the type who knows that the
payoffs are given by the left matrix, and the type who knows that the payoffs are given by the right one. There is
only one type of player 2. The utility of each player depends on (a1 , a2 , θ). For example u2 (B, C, (l, 0)) = 2 while
u2 (B, C, (r, 0)) = 0. In particular, the payoff to player 2 depends on θ1 , so this is not a private value game.
A pure strategy of player 1 in this Bayesian game is a function s1 : Θ1 → A1 , in other words the strategy must specify
what the l-type and r-type of player 1 will do. A pure strategy of player 2 is a function s2 : Θ2 → A2 , but since Θ2 is
just a singleton, player 2 has just one action in each of her pure Bayesian game strategies.
1.2 Bayesian Nash equilibrium. When a profile of pure opponent strategies s_{−i} is played, after observing θ_i, player i
evaluates her expected utility from any pure action a_i ∈ A_i by:
$$E_i[u_i(a_i, s_{-i}(\theta_{-i}), (\theta_i, \theta_{-i}))|\theta_i] = \sum_{\theta_{-i}\in\Theta_{-i}} p_i(\theta_{-i}|\theta_i)\, u_i(a_i, s_{-i}(\theta_{-i}), (\theta_i, \theta_{-i})).$$
Similarly, we can extend the domain to mixed strategies. When a profile of mixed opponent strategies σ_{−i} is played,
after observing θ_i, player i evaluates her expected utility from any mixed action α_i ∈ ∆(A_i) by:
$$\sum_{\theta_{-i}\in\Theta_{-i}} \sum_{a\in A} p_i(\theta_{-i}|\theta_i)\,\alpha_i(a_i) \prod_{j\ne i}[\sigma_j(\theta_j)](a_j)\; u_i(a_i, a_{-i}, (\theta_i, \theta_{-i})).$$
We will write Ei [ui (αi , σ−i (θ−i ), (θi , θ−i ))|θi ] for this utility. Here’s the most common equilibrium concept for Bayesian
games.
Definition 55 (Bayesian Nash equilibrium). A pure strategy Bayesian Nash equilibrium is a pure strategy profile
s∗ such that for each player i ∈ N and each type θi ∈ Θi ,
Ei [ui (s∗i (θi ), s∗−i (θ−i ), (θi , θ−i ))|θi ] ≥ Ei [ui (a0i , s∗−i (θ−i ), (θi , θ−i ))|θi ] for all a0i ∈ Ai .
Similarly, a mixed strategy Bayesian Nash equilibrium (BNE) is a mixed strategy profile σ∗ such that for each
player i ∈ N and each type θi ∈ Θi ,
Ei [ui (σ∗i (θi ), σ∗−i (θ−i ), (θi , θ−i ))|θi ] ≥ Ei [ui (a0i , σ∗−i (θ−i ), (θi , θ−i ))|θi ] for all a0i ∈ Ai .
Note that in the definition of a mixed BNE, it is without loss of generality to require no profitable unilateral deviation
to any pure action, a0i , rather than any mixed action.
A BNE might be understood as a “correlated equilibrium with payoff-relevant signals”. Let’s focus on a pure
strategy BNE, s∗ . After observing her type θi , player i derives from her prior a conditional belief pi (·|θi ) ∈ ∆(Θ−i ) about
the types of other players. She knows s∗−i (·), so she knows how these opponent types translate into opponent actions.
Unlike in a CE, however, she knows that her payoff also depends on the complete state of the world, θ = (θi , θ−i ).
Analogous to CE, a BNE is a strategy profile such that, after player i observes her type θi and calculates her expected
payoffs to different actions, she finds it optimal to play the prescribed action s∗i (θi ) across all of her choices in Ai .
Example 56 (August 2013 General Exam). Find all the BNEs in Example 54.
Solution:
Since player 2 has only one type, it is easiest to break things down by player 2’s action in equilibrium.
Step 1: Consider BNE where player 2 plays a pure strategy.
1. s∗_2 = L. In any such BNE, we must have s∗_1(l) = M, since the type-l player 1 knows for sure that player 2 is
playing L and that payoffs are given by the left matrix, leading to a unique best response of M. Yet this means
player 2 has a profitable deviation: playing C yields an expected payoff of ½ · 5 + ½ · 0 = 2.5 (regardless of what
s∗_1(r) is), which is better than playing L and getting an expected payoff of ½ · (−1) + ½ · 0 = −0.5. Therefore,
there is no BNE with s∗_2 = L.
2. s∗_2 = C. In any such BNE, we must have s∗_1(l) = s∗_1(r) = B by similar reasoning. But that means player 2
gets an expected payoff of ½ · 2 + ½ · 0 = 1 by playing C, yet he can get ½ · 4 + ½ · 4 = 4 by playing R. Therefore,
there is no BNE with s∗_2 = C.
3. s∗_2 = R. In any such BNE, we must have s∗_1(l) = M, s∗_1(r) = B by similar reasoning. As such, player
2 gets an expected payoff of ½ · 4 + ½ · 4 = 4 from playing R. By comparison, he would get an expected
½ · (−1) + ½ · 0 = −0.5 from playing L and an expected ½ · 5 + ½ · 0 = 2.5 from playing C.²² Therefore, we see
that (s∗_1(l) = M, s∗_1(r) = B, s∗_2 = R) is a BNE of the game.
²² Player 2 knows only the prior probabilities of the two matrices, but not which one is actually being played.
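For small games like this one, the case analysis can be cross-checked by brute force. Here is a minimal Python sketch (the dictionary encoding is ours) that enumerates player 1’s type-contingent pure strategies and player 2’s actions, testing the interim best-response conditions; it prints only the profile found above:

    from itertools import product

    A1, A2 = ["T", "M", "B"], ["L", "C", "R"]
    left = {("T","L"): (-2,-2), ("T","C"): (-1,1), ("T","R"): (0,0),
            ("M","L"): (1,-1),  ("M","C"): (3,5),  ("M","R"): (3,4),
            ("B","L"): (0,0),   ("B","C"): (4,2),  ("B","R"): (2,4)}
    right = {(a1, a2): (0, 0) for a1 in A1 for a2 in A2}
    right[("B","C")], right[("B","R")] = (1, 0), (4, 4)
    u = {"l": left, "r": right}   # each matrix has prior probability 1/2

    for s1l, s1r, a2 in product(A1, A1, A2):
        s1 = {"l": s1l, "r": s1r}
        # Each type of player 1 best responds knowing the true matrix.
        ok = all(u[t][(s1[t], a2)][0] >= max(u[t][(d, a2)][0] for d in A1)
                 for t in ("l", "r"))
        # Player 2 best responds in expectation over the two matrices.
        ev = lambda a: 0.5 * u["l"][(s1l, a)][1] + 0.5 * u["r"][(s1r, a)][1]
        if ok and ev(a2) >= max(ev(d) for d in A2):
            print(s1l, s1r, a2)   # prints: M B R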
This is again strictly concave in e_2 and the maximum is attained at
$$BR_2(e_1) = \frac{1 + E[e_1]}{4}.$$
In a BNE the strategies (e∗_1(c_1), e∗_2) are best responses to each other. That is,
$$e^*_1(c_1) = \frac{e^*_2}{4c_1}, \qquad e^*_2 = \frac{1 + E[e^*_1(c_1)]}{4}.$$
To solve, we first calculate
$$E[e^*_1(c_1)] = \int_2^3 \frac{e^*_2}{4c_1}\cdot\frac{2c_1}{5}\, dc_1 = \int_2^3 \frac{e^*_2}{10}\, dc_1 = \frac{e^*_2}{10}.$$
Plugging this into the best response function for player 2 yields
$$e^*_2 = \frac{1 + e^*_2/10}{4}.$$
Solving, we find e∗_2 = 10/39. Plugging this into the equation for player 1 we get
$$e^*_1(c_1) = \frac{10}{39 \cdot 4c_1} = \frac{5}{78c_1}.$$
Since each player’s maximization problem is strictly concave, the best response correspondences are single-valued
(for a fixed strategy of the other player). From this it follows that there are no mixed strategy BNEs.
In all, the unique BNE is
$$(e^*_1(c_1), e^*_2) = \left(\frac{5}{78c_1}, \frac{10}{39}\right).$$
Example 58 (From MWG). The Alphabeta research and development consortium has two (non-competing) members,
firms 1 and 2. The rules of the consortium are that any independent invention by one of the firms is shared fully with
the other. Suppose that there is a new invention, the ‘Zigger’, that either of the two firms could potentially develop.
Developing this new product costs c ∈ (0, 1). The benefit of the Zigger to each firm is known only to that firm. Formally,
each firm i has a type θ_i that is independently drawn from a uniform distribution over [0, 1], and its benefit in case of
type θ_i is θ_i². The timing is as follows: the two firms privately observe their types; then they simultaneously decide
whether to develop or not. Find the pure BNE of this game.
Solution:
We write s_i(θ_i) = 1 if firm i develops and s_i(θ_i) = 0 otherwise. If firm i develops when her type is θ_i, then her payoff is
θ_i² − c, regardless of the action of the other firm. If firm i decides not to develop when her type is θ_i, her expected payoff
is θ_i² · Pr_{θ_j}[s_j(θ_j) = 1]. Hence, one calculates easily that developing is a best response for type θ_i of i if and only if
$$\theta_i^2 \ge \frac{c}{1 - \Pr_{\theta_j}[s_j(\theta_j) = 1]}.$$
This inequality shows that the strategies in any potential BNE take the form of a cut-off rule: develop if and only if
own type is higher than a threshold. Let θ̄_i, i = 1, 2, be the cut-offs in a BNE. Given the cutoff strategies, we have
Pr_{θ_j}[s_j(θ_j) = 1] = 1 − θ̄_j, so that the thresholds satisfy the equations
$$\bar\theta_1^{\,2}\,\bar\theta_2 = c, \qquad \bar\theta_2^{\,2}\,\bar\theta_1 = c.$$
This implies that θ̄_1 = θ̄_2 = c^{1/3}. This gives a unique potential BNE. It is then straightforward to check that the threshold
strategies with threshold equal to c^{1/3} are indeed a BNE.
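As a quick numerical sanity check (the value c = 0.3 is an arbitrary choice of ours), one can iterate the best-response mapping on cutoffs and watch it converge to c^{1/3}:

    c = 0.3
    theta_bar = 0.5                      # initial guess for opponent's cutoff
    for _ in range(100):
        # Given opponent cutoff theta_bar, firm i develops iff
        # theta_i**2 >= c / theta_bar, so i's best-response cutoff is:
        theta_bar = (c / theta_bar) ** 0.5
    print(theta_bar, c ** (1 / 3))       # both approximately 0.6694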
Example 59 (Purification of mixed Nash equilibrium). 1. Find the unique Nash equilibrium of the following normal form game.
L R
T 0, 0 0, −1
B 1, 0 −1, 3
2. Consider now the following perturbed version where ε > 0 is a small number and a, b are independent and
uniformly distributed in [0, 1].
L R
T εa, εb εa, −1
B 1, εb −1, 3
Assume player 1 sees the draw of a but not of b and player 2 sees the draw of b but not of a. Find the essentially
unique Bayesian Nash equilibrium for fixed ε.
3. What happens as ε goes to zero? Interpret.
Solution:
1. It is easy to see that there is no Nash equilibrium in pure strategies. It is also easy to see that there cannot be a
Nash equilibrium where only one of the players strictly mixes. Using the usual indifference conditions, we
arrive at the unique Nash equilibrium, in which both players strictly mix: (¾T ⊕ ¼B; ½L ⊕ ½R).
2. Note that for a fixed strategy of player 2, as a becomes larger and larger, T becomes more and more attractive in
comparison to B. There is thus a threshold value for a, call it p, so that player 1 chooses T whenever a > p and
B whenever a < p. The same logic applies to player 2 and thus there is a threshold q for him so that L is chosen
if b > q and R if b < q.
From the perspective of player 1, at the threshold p the two pure strategies must be indifferent. Given the
threshold strategy of player 2, this implies that for a = p,
$$\varepsilon p = 1\cdot(1 - q) + (-1)\cdot q.$$
Similarly, for player 2 at b = q,
$$\varepsilon q = -1\cdot(1 - p) + 3\cdot p.$$
Solving this system yields
$$(p, q) = \left(\frac{2+\varepsilon}{8+\varepsilon^2},\; \frac{4-\varepsilon}{8+\varepsilon^2}\right).$$
To fully specify a BNE we need to specify strategies for the cases where a = p and b = q. We can take any possible
specification of strategies for these cases, because they happen with probability zero from the perspective
of the other player²³, and the relevant optimality calculations involve comparing expectations/averages which
don’t depend on changes made in zero-probability events. Thus, up to the strategies picked for the cases a = p and
b = q, the BNE strategies are unique.
23 a is distributed according to a continuous distribution from the perspective of player 2 and similarly for b and player 1.
3. Note that as ε goes to zero, (p, q) converges to (¼, ½), so that T is played with probability 1 − p = ¾ and L with
probability 1 − q = ½. These thresholds give precisely the mixed NE calculated in part 1. Such perturbation
arguments in support of mixed Nash equilibria have been well known since the seminal work of Harsanyi, who sought
to ‘micro-found’ why players in a game as in part 1, which has a unique mixed NE, would pick the ‘right’
probabilities to randomize with, given their indifference between the pure strategies in equilibrium. One possible
story is thus that the players perceive the game as ‘perturbed’ as above and are actually playing a pure BNE of a
Bayesian game whose play converges to the mixed Nash of the unperturbed game as the perturbation goes to zero.
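A two-line computation (the ε grid is an arbitrary choice of ours) makes the convergence visible:

    for eps in (1.0, 0.1, 0.01, 0.001):
        p = (2 + eps) / (8 + eps ** 2)
        q = (4 - eps) / (8 + eps ** 2)
        print(eps, round(p, 4), round(q, 4))   # (p, q) -> (0.25, 0.5)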
2 Auction Model
2.1 Definition of an auction. In section, we will make a number of simplifying assumptions instead of studying the
most general auction model. We will assume that: (i) auctions are in private value, so the types of −i do not matter
for i’s payoff; (ii) the type distribution is identical and independent across players; (iii) there is one seller who sells
one indivisible item; (iv) players are risk neutral, so getting the item with probability H and having to pay P in
expectation gives a player i with type θi a utility of θi H − P.
Definition 60 (Auction). An auction ⟨N, F, [0, θ̄], (H_i)_{i∈N}, (P_i)_{i∈N}⟩ consists of:
1. A set of bidders N = {1, ..., n}.
2. A distribution F of valuations, supported on the interval [0, θ̄].
3. An allocation rule H_i : R^n_+ → [0, 1] for each i ∈ N, giving i’s probability of getting the item at each profile of bids.
4. A payment rule P_i : R^n_+ → R for each i ∈ N, giving i’s expected payment at each profile of bids.
At the start of the auction, each player i learns her own valuation θ_i. The valuations of different players are drawn i.i.d.
from F, which is supported on the interval [0, θ̄]. Each player simultaneously submits a nonnegative real number as
her bid. When the profile of bids (b_1, ..., b_n) is submitted, player i gets the item with probability H_i(b_1, ..., b_n) and pays
P_i(b_1, ..., b_n) in expectation.
2.2 Some examples of (H, P) pairs. In lecture, we showed that a number of well-known auction formats – namely,
first-price and second-price auction – can be written in terms of some (allocation rule, payment rule) pairs. Now, we
turn to a number of unusual auctions to further illustrate the definition.24
• Raffle. Each player chooses how many raffle tickets to buy. Each ticket costs $1. A winner is selected by
drawing a raffle ticket at random. This corresponds to H_i(b_1, ..., b_n) = b_i / ∑_k b_k and P_i(b_1, ..., b_n) = b_i. Unlike in the usual
auction formats like first-price and second-price auctions, the allocation rule H_i involves randomization for
almost all profiles of “bids”.
• War of attrition. A strategic territory is contested by two generals. Each general chooses how much resources
to use in fighting for this territory. The general who commits more resources destroys all of her opponent’s
forces and wins the territory, but suffers as many losses as the losing general. This corresponds to
$$H_i(b_i, b_{-i}) = \begin{cases} 1, & \text{if } b_i > b_{-i}, \\ 0.5, & \text{if } b_i = b_{-i}, \\ 0, & \text{if } b_i < b_{-i}, \end{cases}$$
and P_i(b_i, b_{−i}) = min{b_1, b_2}, so it is as if two bidders each submit a bid and everyone pays the losing bid.
• All-pay auction. Each player submits a bid and the highest bidder gets the item. Every player, win or lose, must
pay her own bid. Here, Hi (b1 , ..., bn ) is the same as in first-price auction, but the payment rule is Pi (b1 , ..., bn ) =
bi .
24 Some of these examples may seem to have nothing to do with auctions at a first glance, yet our definition of an auction in terms of (H, P) is
general enough to apply to them. Here, as elsewhere in economics, theory allows us to unify our modeling and understanding of seemingly disparate
phenomena under a single framework.
2.3 Auctions as private-value Bayesian games. Auctions form an important class of examples of Bayesian games.
As defined above, an auction is a private value Bayesian game with a continuum of types for each player. Referring
back to Definition 52, an auction ⟨N, F, [0, θ̄], (H_i)_{i∈N}, (P_i)_{i∈N}⟩ can be viewed as a common prior private value Bayesian
game B = ⟨N, (Θ_i)_{i∈N}, (A_i)_{i∈N}, (u_i)_{i∈N}, µ⟩, where:
• Θ_i = [0, θ̄] for each i ∈ N, and the common prior µ makes the θ_i i.i.d. draws from F;
• A_i = R_+ for each i ∈ N, interpreting actions as bids;
• u_i(a, θ) = θ_i H_i(a) − P_i(a), which does not depend on θ_{−i}.
As such, many terminologies from general Bayesian games carry over to auctions. A pure strategy of bidder i is a
function s_i : Θ_i → R_+, mapping i’s valuation to a nonnegative bid. A (pure strategy) BNE in an auction is a strategy
profile s∗ such that for each player i and valuation θ_i ∈ Θ_i,
$$s^*_i(\theta_i) \in \arg\max_{b_i \in \mathbb{R}_+} E_{\theta_{-i}}\!\left[\theta_i H_i(b_i, s^*_{-i}(\theta_{-i})) - P_i(b_i, s^*_{-i}(\theta_{-i}))\right].$$
As usual, player i of type θi knows the mapping from opponent’s types θ−i to their actions s∗−i (θ−i ), i.e., how each
opponent would bid as a function of their valuation, but she does not know opponents’ realized valuations. She
does know the distribution over opponents’ valuations, so she can compute the expected payoff of playing different
bids, with expectation25 taken over opponents’ types.
3 Solving for Auction BNEs
Given an auction, here are two approaches for identifying some of its BNEs. But be warned: an auction may have
multiple BNEs and the following methods may not find all of them.
3.1 Weakly dominant BNEs. The following holds in general for private-value Bayesian games.
Definition 61 (Weakly dominant). In a private value Bayesian game, a strategy s_i : Θ_i → A_i is weakly dominant for
player i if for all a_{−i} ∈ A_{−i} and all θ_i ∈ Θ_i,
$$u_i(s_i(\theta_i), a_{-i}, \theta_i) \ge u_i(a'_i, a_{-i}, \theta_i) \quad \text{for all } a'_i \in A_i.$$
Proposition 62. In a private value Bayesian game, consider a strategy profile s∗ where for each i ∈ N, s∗_i is weakly
dominant for i. Then s∗ is a BNE.
Proof. By weak dominance, for each θ_i ∈ Θ_i,
$$u_i(s^*_i(\theta_i), a_{-i}, \theta_i) \ge u_i(a'_i, a_{-i}, \theta_i) \quad \text{for all } a'_i \in A_i \text{ and } a_{-i} \in A_{-i}.$$
So in particular,
$$u_i(s^*_i(\theta_i), s^*_{-i}(\theta_{-i}), \theta_i) \ge u_i(a'_i, s^*_{-i}(\theta_{-i}), \theta_i) \quad \text{for all } a'_i \in A_i \text{ and } \theta_{-i} \in \Theta_{-i}.$$
Taking expectations over θ_{−i} ∈ Θ_{−i}, we get
$$E_i[u_i(s^*_i(\theta_i), s^*_{-i}(\theta_{-i}), \theta_i)|\theta_i] \ge E_i[u_i(a'_i, s^*_{-i}(\theta_{-i}), \theta_i)|\theta_i] \quad \text{for all } a'_i \in A_i. \;\;\square$$
As a result, if we can identify a weakly dominant strategy for each player in an auction, then a profile of such strategies
forms a BNE.
Example 63 (Second-price auction with reserve price). The seller sets reserve price r ∈ R+ , then every bidder submits
a bid simultaneously.
²⁵ This is analogous to Definition 55. However, in Definition 55 we spelled out a weighted sum over θ_{−i} ∈ Θ_{−i} instead of writing an expectation.
This was possible since we focused on the case of a finite Θ in that section.
• If every bid is less than r, then no bidder gets the item and no one pays anything.
• If the highest bid is r or higher, then the highest bidder gets the item and pays either the bid of the second highest
bidder or r, whichever is larger. If several players tie for the highest bid, then one of these high bidders is
chosen uniformly at random, gets the item, and pays the second highest bid (which is equal to her own bid).
One can verify that bidding her own valuation is weakly dominant for each bidder, so by Proposition 62 the profile in
which everyone bids her own valuation is a BNE of the second-price auction with reserve price.
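A small grid-based sanity check in Python (the discretization and tie handling are our simplifications) confirms that bidding one’s own valuation weakly dominates every other grid bid against every rival bid:

    import numpy as np

    r, grid = 0.5, np.linspace(0, 1, 21)

    def payoff(theta, b, rival):
        if b < r or b < rival:          # below reserve, or outbid: lose
            return 0.0
        if b == rival:                  # tie at or above the reserve
            return 0.5 * (theta - max(rival, r))
        return theta - max(rival, r)    # win: pay max(second bid, reserve)

    ok = all(payoff(th, th, rv) >= payoff(th, b, rv) - 1e-12
             for th in grid for b in grid for rv in grid)
    print(ok)  # True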
3.2 The FOC approach. In the BNE of an auction, fixing bidder i with type θ_i, we have:
$$s^*_i(\theta_i) \in \arg\max_{b_i \in \mathbb{R}_+} E_{\theta_{-i}}\!\left[\theta_i H_i(b_i, s^*_{-i}(\theta_{-i})) - P_i(b_i, s^*_{-i}(\theta_{-i}))\right]. \tag{2}$$
So in particular,
$$\theta_i \in \arg\max_{\hat\theta_i \in \Theta_i} E_{\theta_{-i}}\!\left[\theta_i H_i(s^*_i(\hat\theta_i), s^*_{-i}(\theta_{-i})) - P_i(s^*_i(\hat\theta_i), s^*_{-i}(\theta_{-i}))\right] \tag{3}$$
because (3) restricts the optimization problem in (2) to the domain of s∗i (Θi ) ⊆ R+ . Essentially, condition (3) converts
the problem of choosing an optimal bidding strategy to the problem of choosing a type to report. There could exist
other best responses, but we require truth telling to be one of them. More generally, the revelation principle implies
that for any BNE of any auction game, there exists an equivalent BNE of a direct revelation mechanism in which
players announce types as strategies, and, in equilibrium, report their true types.
Consider now the objective function of this second optimization problem,
$$U_i(\hat\theta_i, \theta_i) \equiv E_{\theta_{-i}}\!\left[\theta_i H_i(s^*_i(\hat\theta_i), s^*_{-i}(\theta_{-i})) - P_i(s^*_i(\hat\theta_i), s^*_{-i}(\theta_{-i}))\right]. \tag{4}$$
If it is differentiable (which will hold provided H_i, P_i, and the distribution F are “nice enough”) and the valuation θ_i ∈ (0, θ̄)
is interior, then the first-order condition (FOC) of optimization implies
$$\frac{\partial U_i}{\partial \hat\theta_i}(\theta_i, \theta_i) = 0.$$
In auctions without a weakly dominant strategy, sometimes this FOC can help identify a BNE by giving us a closed-
form expression of s∗i (θi ) after manipulation.
Example 64 (First-price auction). Consider a first-price auction with two bidders. The two bidders’ types are dis-
tributed i.i.d. with θi ∼ U[0, 1]. Each bidder submits a nonnegative bid and whoever bids higher wins the item and
pays her own bid. If there is a tie, then each bidder gets to buy the item at her bid with equal probability. It is
known that this auction has a symmetric BNE (s∗1 , s∗2 ) where (i) s∗i (θi ) is differentiable, strictly increasing in θi ; (ii) the
associated equation (4) is differentiable. Find a closed-form expression for s∗i (θi ).
Solution:
In the symmetric BNE (s∗_1, s∗_2), the expected probability of player 1 winning the item by playing the BNE strategy
of type θ̂_1 is θ̂_1. This is because s∗_2 is strictly increasing and symmetric to s∗_1, so that bidding s∗_1(θ̂_1) wins exactly
when θ_2 < θ̂_1, which happens with probability θ̂_1 since θ_2 ∼ U[0, 1]. At the same time, the expected payment for
submitting the BNE bid of type θ̂_i is θ̂_i s∗_i(θ̂_i), because bidding s∗_i(θ̂_i) wins with probability θ̂_i and pays s∗_i(θ̂_i) in the
event of winning. The relevant optimization problem is therefore
$$\max_{\hat\theta_i \in [0,1]} \; \theta_i \hat\theta_i - \hat\theta_i s^*_i(\hat\theta_i).$$
The FOC at θ̂_i = θ_i gives θ_i − s∗_i(θ_i) − θ_i s∗_i′(θ_i) = 0, i.e., d[θ_i s∗_i(θ_i)]/dθ_i = θ_i. Integrating and noting that the left-hand side vanishes at θ_i = 0,²⁶
$$\theta_i\, s^*_i(\theta_i) = \frac{\theta_i^2}{2}, \tag{5}$$
so s∗_i(θ_i) = θ_i/2.
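A quick Monte Carlo sketch (the sample size and bid grid are arbitrary choices of ours; NumPy assumed available) corroborates that, when the rival follows θ_2/2, bidding θ/2 is approximately optimal:

    import numpy as np

    rng = np.random.default_rng(0)
    rival_bids = rng.uniform(0, 1, 200_000) / 2   # rival plays theta2 / 2

    def expected_payoff(theta, b):
        wins = rival_bids < b                     # ties have probability ~0
        return np.mean(wins * (theta - b))

    theta = 0.6
    grid = np.linspace(0, 1, 101)
    print(max(grid, key=lambda b: expected_payoff(theta, b)))  # about 0.3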
4.1 The revenue equivalence theorem. While you may be familiar with statements like “first-price auction and second-
price auction are revenue equivalent” before taking this course, it is important to gain a more precise understanding of
the revenue equivalence theorem (RET). To see how a cursory reading of the RET might lead you astray, consider
the asymmetric second-price auction BNE from lecture, where bidder 1 always bids θ̄ and everyone else always bids
0, regardless of their types. The seller’s expected revenue is 0!
Strictly speaking, RET is not a statement comparing two auction formats, but a statement comparing two equilibria
of two auction formats. “Revenue” is an equilibrium property and an auction game might admit multiple BNEs with
different expected revenues.
So let a BNE s∗ of some auction game be given.²⁷ Let us define two functions G_i, R_i : Θ_i → R for each player i, so that
G_i(θ̂_i) and R_i(θ̂_i) give the expected probability of winning and the expected payment when bidding as though the valuation
were θ̂_i:
$$G_i(\hat\theta_i) \equiv E_{\theta_{-i}}\!\left[H_i(s^*_i(\hat\theta_i), s^*_{-i}(\theta_{-i}))\right], \qquad R_i(\hat\theta_i) \equiv E_{\theta_{-i}}\!\left[P_i(s^*_i(\hat\theta_i), s^*_{-i}(\theta_{-i}))\right].$$
The expectations are taken over opponents’ types. Importantly, G_i and R_i depend on the choice of BNE s∗. If
we consider a different BNE of the same auction, then we will have a different pair (G̃_i, R̃_i).
To illustrate, consider the symmetric BNE we derived in the two-player auction in Example 64, where s∗_i(θ_i) = θ_i/2. It
should be intuitively clear that G_i(θ̂_i) = θ̂_i and R_i(θ̂_i) = θ̂_i · (θ̂_i/2) = θ̂_i²/2. We can also derive these expressions from the definition:
$$G_1(\hat\theta_1) = \int_0^1 H_1\!\left(\frac{\hat\theta_1}{2}, \frac{\theta_2}{2}\right) d\theta_2 = \int_0^{\hat\theta_1} 1\, d\theta_2 + \int_{\hat\theta_1}^1 0\, d\theta_2 = \hat\theta_1,$$
$$R_1(\hat\theta_1) = \int_0^1 P_1\!\left(\frac{\hat\theta_1}{2}, \frac{\theta_2}{2}\right) d\theta_2 = \int_0^{\hat\theta_1} \frac{\hat\theta_1}{2}\, d\theta_2 + \int_{\hat\theta_1}^1 0\, d\theta_2 = \frac{\hat\theta_1^2}{2}.$$
As we have seen in lecture, the celebrated RET is just a corollary of the following result:
Proposition 65. Fix a BNE s∗ of the auction game. Under regularity conditions,
$$R_i(\theta_i) = \int_0^{\theta_i} x\, G'_i(x)\, dx + R_i(0)$$
for all bidders i and types θ_i.
This result expresses the expected payment of an arbitrary type of bidder i in a BNE as a function of: (i) expected
payment of the lowest type of bidder i in this BNE; (ii) the expected probabilities of winning for various types of
player i in this BNE. It then follows that:
²⁶ Even though the FOC only applies for interior θ_i ∈ (0, 1), continuity of s∗_i implies that equation (5) holds even at the boundary points. This is
sometimes called “value matching”.
²⁷ We can in fact define G_i and R_i for any arbitrary profile of strategies s, without imposing that it is a BNE. However, Proposition 65 only holds
when s is a BNE.
Theorem 66 (Revenue equivalence theorem). Under regularity conditions, consider two BNEs of two auctions such that
(i) every type of every bidder has the same expected probability of winning in the two BNEs, and (ii) the lowest type of
every bidder makes the same expected payment in the two BNEs. Then every type of every bidder makes the same
expected payment in the two BNEs; in particular, the two BNEs yield the same expected revenue for the seller.
This follows directly from Proposition 65. Since in a BNE the expected payment of an arbitrary type is entirely
determined by the winning probabilities of different types and the expected payment of the lowest type, two BNEs
where these two objects match must have the same expected payment for all types.
Here are two examples where RET is not applicable because G_i and G◦_i fail to match up across the two BNEs.
Example 67. In a second-price auction, the asymmetric BNE does not satisfy the conditions of RET when compared
to the symmetric BNE of bidding one’s own valuation. In the asymmetric equilibrium where bidder 1 always bids θ̄
and everyone else always bids 0, G_i(θ_i) = 0 for all i ≠ 1 and all θ_i, since bidders other than 1 never win. Therefore, we
cannot conclude from RET that these two BNEs yield the same expected revenue. (In fact, they do not.)
Example 68. In Example 63, we showed that bidding own valuation is a BNE in a second-price auction with reserve
price. When reserve price is r > 0, this BNE does not satisfy the conditions of RET when compared to the BNE
of bidding own valuation in a second-price auction without reserve price. In the former BNE, Gi (θi ) = 0 for any
θi ∈ (0, r), whereas in the latter BNE these types have a strictly positive probability of winning the item. Therefore,
we cannot conclude from RET that these two BNEs in two auction formats yield the same expected revenue.
In fact, different reserve prices may lead to different expected revenues. Myerson (1981) tells you how to pick optimal
reserve prices to maximize the expected revenue of an auction.
4.2 Using RET to solve auctions. Sometimes, we can use RET to derive a closed-form expression of the BNE strategy
profile s∗ .
Example 69. As in Example 64, consider a first-price auction with two bidders whose valuations are i.i.d. with
θ_i ∼ U[0, 1]. Assume this auction has a symmetric BNE where s∗_i(θ_i) is strictly increasing in θ_i. Then this BNE is
revenue equivalent to the BNE of the second-price auction where each player bids her own valuation. To see this, note
that since both BNEs feature strategies strictly increasing in type, i of type θ_i wins precisely when player −i has a type
θ_{−i} < θ_i. That is to say, G_i(θ_i) = θ_i = G◦_i(θ_i). At the same time, the expected payment of type 0 is 0 in both BNEs; in
particular, the type 0 bidder in the first-price auction never wins since bids are strictly increasing in type, so she never
pays anything. But in the bid-own-valuation BNE of the second-price auction, R◦_i(θ_i) = θ_i · (θ_i/2), where θ_i is the
probability of being the highest bidder and θ_i/2 is the expected rival bid in the event of winning. By RET, R_i(θ_i) =
θ_i · (θ_i/2) also. In the first-price auction, i pays her own bid s∗_i(θ_i) whenever she wins, which happens with probability
θ_i. Hence s∗_i(θ_i) = R_i(θ_i)/θ_i = θ_i/2. This is the same as what we found using the FOC in Example 64.
While in the above example we used RET to verify a result we already knew from FOC, RET can also be used in lieu
of FOC to find BNEs. This can be particularly helpful when the differential equation from the FOC approach is harder
to solve.
Example 70 (December 2012 Final Exam). Suppose there are two risk-neutral potential buyers of an indivisible good.
It is common knowledge that each buyer i’s valuation is drawn independently from the same distribution on [0, 1] with
distribution function F(θ) = θ3 , but the realizations of the θi ’s are private information. Calculate the expected payment
Ri (θi ) that a buyer with reservation price θi makes in the unique symmetric equilibrium of a second-price auction.
Then, using the revenue equivalence theorem, find the equilibrium bid function in a first-price auction in the same
setting.
Solution:
In the second-price auction, it is weakly dominant to bid one’s own valuation (regardless of the underlying distribution),
so truthful bidding is a BNE. In this BNE,
$$R_i(\theta_i) = \int_0^{\theta_i} \theta_j f(\theta_j)\, d\theta_j = \int_0^{\theta_i} \theta_j (3\theta_j^2)\, d\theta_j = \frac{3}{4}\theta_j^4 \Big|_{\theta_j = 0}^{\theta_j = \theta_i} = \frac{3}{4}\theta_i^4.$$
This symmetric BNE of the second-price auction is revenue equivalent to any BNE of the first-price auction in which
the bid increases strictly with own type. This is because in these BNEs, G_i(θ_i) = G◦_i(θ_i) = θ_i³ (since i of type θ_i wins
exactly when −i is of a type lower than θ_i) and R_i(0) = R◦_i(0) = 0 (since type 0 never wins, so never pays). But in the
first-price auction, R◦_i(θ_i) = s◦_i(θ_i) G◦_i(θ_i), so then s◦_i(θ_i) = (3/4)θ_i⁴ / θ_i³ = (3/4)θ_i.
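A quick simulation sketch (the sampling scheme is ours; NumPy assumed available) confirms the revenue equivalence numerically: with values drawn from F(t) = t³, truthful bidding in the second-price auction and s(t) = (3/4)t in the first-price auction raise the same expected revenue:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000
    # Inverse-cdf sampling: if U ~ U[0, 1] then U**(1/3) has cdf t**3.
    v1 = rng.uniform(0, 1, n) ** (1 / 3)
    v2 = rng.uniform(0, 1, n) ** (1 / 3)

    spa = np.minimum(v1, v2)            # winner pays the losing (truthful) bid
    fpa = 0.75 * np.maximum(v1, v2)     # winner pays her own bid (3/4)*theta
    print(spa.mean(), fpa.mean())       # both approximately 0.643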
Example 71 (December 2016 Final Exam). Consider the high-bid auction with one indivisible good for sale, assuming
the seller sets a reserve price of 1/2. There are two risk-neutral buyers whose valuations are drawn independently from
a distribution on [0, 1] with c.d.f. F(t) = t². Find the buyers’ bid function in the symmetric BNE of this auction game.
Don’t forget to justify your answer.
Solution:
In Example 63, we showed that bidding one’s own valuation is a BNE of a second-price auction with reserve price. Thus,
s◦(θ) = θ, θ ∈ [0, 1], is a symmetric BNE of the second-price auction with reservation price 1/2 (henceforth SPAr). We
conjecture that there is a symmetric BNE of the first-price auction with reservation price 1/2 (henceforth FPAr) with
s(θ) = 0 when θ < 1/2 and s(θ) strictly increasing for θ ∈ [1/2, 1].
Note that in both BNEs, type θ = 0 pays 0 in expectation: R◦(0) = R(0) = 0. Furthermore, in both BNEs the expected
probability of winning is G◦(θ) = G(θ) = 0 for θ ∈ [0, 1/2) and G◦(θ) = G(θ) = θ² for θ ∈ [1/2, 1]. In all, the conditions
for applying RET are satisfied. We calculate for the BNE of SPAr: R◦(θ) = 0 for θ < 1/2, and for θ ≥ 1/2,
$$R^\circ(\theta) = \int_0^{1/2} \frac{1}{2}\cdot 2y\, dy + \int_{1/2}^{\theta} y \cdot 2y\, dy + \int_{\theta}^{1} 0 \cdot 2y\, dy = \frac{2}{3}\theta^3 + \frac{1}{24}.$$
Meanwhile, we have R(θ) = θ² s(θ) for FPAr. Equating R◦(θ) = R(θ), we get
$$s(\theta) = \begin{cases} 0, & \text{if } \theta < \frac{1}{2}, \\[4pt] \dfrac{2}{3}\theta + \dfrac{1}{24\theta^2}, & \text{if } \theta \ge \frac{1}{2}. \end{cases}$$
Finally, we check that the candidate is indeed a BNE of FPAr. It is obvious that types θ ≤ 1/2 have no profitable
deviation: if they bid so as to have a positive probability of winning the auction, they will have a negative expected payoff,
which is worse than the 0 they are getting in equilibrium. We focus now on types θ > 1/2. They would not deviate to
some b > s(1), where s(1) is the highest possible rival bid: such a bid wins just as surely as bidding s(1) but pays more.
If they were to bid b ≤ 1/2, the payoff would be zero, while the equilibrium payoff is
$$\theta^2 \cdot \theta - \theta^2 \cdot \left(\frac{2}{3}\theta + \frac{1}{24\theta^2}\right) = \frac{1}{3}\left(\theta^3 - \frac{1}{8}\right),$$
which is positive for θ > 1/2. Thus, it remains to consider deviations to b ∈ (1/2, s(1)]. Deviating to such a b is equivalent
to imitating some type in (1/2, 1]; this follows because the candidate equilibrium bidding function is strictly increasing
in that range. The payoff from imitating θ̂ ∈ (1/2, 1] is
$$\hat\theta^2 \theta - \hat\theta^2\left(\frac{2}{3}\hat\theta + \frac{1}{24\hat\theta^2}\right) = \hat\theta^2\theta - \frac{2}{3}\hat\theta^3 - \frac{1}{24}.$$
This is a strictly concave function of θ̂ ∈ (1/2, 1], and the FOC with respect to θ̂ is
$$2\hat\theta\theta - 2\hat\theta^2 = 0,$$
which is satisfied exactly at θ̂ = θ. Hence imitating one’s own type is optimal, and the candidate is indeed a BNE of FPAr.
5 Optional: The Universal Type Space
5.1 Higher orders of belief. We have considered a Bayesian game as a model of how a group of Bayesian players
confronts uncertainty. The common prior assumption (CPA) is useful in simplifying the analysis, yet it builds in several
assumptions: (i) Θ is assumed to have a product structure; (ii) it is common knowledge that θ is drawn according to µ.
That is to say, everyone knows µ, everyone knows that everyone else knows µ, etc. What if we relax the common prior
assumption? That is to say, how should a group of Bayesian players in general behave when confronting uncertainty
Θ?
If there is only one player, then the answer is simple. The Bayesian player comes up with a prior µ ∈ ∆(Θ) through
introspection, then chooses some s_1 ∈ S_1 so as to maximize ∫_{θ∈Θ} u_1(s_1, θ) dµ(θ). The prior µ is trivially a common
prior, since there is only one player.
However, in a game involving two players28, the answer becomes far more complex. P1 is uncertain not only about the state of the world θ ∈ Θ, but also about P2's belief over the state of the world. P2's belief matters for P1's decision-making, since P1's utility depends on the pair (P1's action, P2's action), while P2's action depends on his belief. As a Bayesian must form a prior distribution over any relevant uncertainty, P1 should entertain not only a belief about the state of the world, but also a belief about P2's belief, which is also unknown to P1.
To take a more concrete example, suppose there are two players Alice and Bob and the states of the world concern the
weather tomorrow, Θ = {sunny, rain}. Alice believes that there is a 60% chance that it is sunny tomorrow and a 40% chance that it rains, so we say she has a first-order belief µ(1)_Alice ∈ ∆(Θ) with µ(1)_Alice(sunny) = 0.6 and µ(1)_Alice(rain) = 0.4. Now Alice needs to form a belief about Bob's belief regarding tomorrow's weather. Alice happens to know that Bob is a meteorologist who has access to more weather information than she does. In particular, Alice believes Bob's belief about the weather tomorrow is correlated with the actual weather tomorrow. Either it is the case that tomorrow will be sunny and Bob believes today that it will be sunny tomorrow with probability 90%, or it is the case that tomorrow will rain and today Bob believes it will be sunny with probability 20%. Alice assigns 60-40 odds to these two cases. We say Alice has a second-order belief µ(2)_Alice ∈ ∆(Θ × ∆(Θ)), where µ(2)_Alice is supported on two points (sunny, µ(1)_case 1) and (rain, µ(1)_case 2) with µ(2)_Alice[sunny, µ(1)_case 1] = 0.6 and µ(2)_Alice[rain, µ(1)_case 2] = 0.4. Here µ(1)_case 1 and µ(1)_case 2 are elements of ∆(Θ) with µ(1)_case 1(sunny) = 0.9 and µ(1)_case 2(sunny) = 0.2. We are not finished. Surely Bob, like Alice, also holds some second-order belief. Alice is uncertain about Bob's second-order belief, so she must form a third-order belief

µ(3)_Alice ∈ ∆(Θ × ∆(Θ) × ∆(Θ × ∆(Θ)))

that is a joint distribution over (i) the weather tomorrow; (ii) Bob's first-order belief about the weather; (iii) Bob's second-order belief about the weather. Alice further needs a fourth-order belief, a fifth-order belief, and so on.
We highlight the following features of the above example, which will be relevant to the subsequent theory on the
universal type space:
• Alice entertains beliefs of order 1, 2, 3, ... about the state of the world, where the kth-order belief is a joint distribution over the state of the world, Bob's first-order belief, Bob's second-order belief, ..., Bob's (k − 1)th-order belief.
• Alice's second-order belief is consistent with her first-order belief, in the sense that whereas µ(1)_Alice assigns probability 60% to sunny weather tomorrow, µ(2)_Alice marginalized to a distribution over the weather alone also assigns probability 60% to sunny weather tomorrow.
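To make this consistency requirement concrete, here is a minimal Python sketch (the data structures are ours, purely illustrative) encoding Alice's first- and second-order beliefs from the example and checking that the marginal of µ(2)_Alice over the weather equals µ(1)_Alice:

```python
THETA = ["sunny", "rain"]  # states of the world

# Alice's first-order belief over THETA.
mu1_alice = {"sunny": 0.6, "rain": 0.4}

# Alice's second-order belief: a distribution over pairs
# (state, label of Bob's first-order belief), supported on two points.
mu2_alice = {("sunny", "case1"): 0.6, ("rain", "case2"): 0.4}

# Bob's first-order beliefs in the two cases Alice envisions.
mu1_case = {"case1": {"sunny": 0.9, "rain": 0.1},
            "case2": {"sunny": 0.2, "rain": 0.8}}

def marginal_over_theta(mu2):
    """Marginalize a second-order belief to a distribution over the states."""
    marg = {theta: 0.0 for theta in THETA}
    for (theta, _), p in mu2.items():
        marg[theta] += p
    return marg

# Consistency: mu2 marginalized to the weather agrees with mu1.
assert marginal_over_theta(mu2_alice) == mu1_alice
```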
Harsanyi (1967) first conjectured that for each specification of the states of the world Θ, there corresponds an object now called the "universal type space"29, say T. Points in the universal type space correspond to all "reasonable" hierarchies of first-order belief, second-order belief, third-order belief, ... that a player could hold about Θ. Furthermore,
there exists a “natural” homeomorphism
f : T → ∆(Θ × T )
so that each universal type t encodes a joint belief f (t) over the state of the world and opponent’s universal type.
The universal type space is thus “universal” in the senses of (i) capturing all possible hierarchies of beliefs that might
arise under some signal structure about Θ; (ii) putting an end to the seemingly infinite regress of having to resort to
(k + 1)th-order beliefs in order to model beliefs about kth-order beliefs, then having to discuss (k + 2)th-order beliefs
to describe beliefs about the (k + 1)th-order beliefs just introduced, etc.
5.2 Constructing the universal type space. Mertens and Zamir (1985) first constructed the universal type space.
Brandenburger and Dekel (1993) gave an alternative, simpler30 construction, which we sketch here.
There are two players, i and j. The set of states of the world Θ is a Polish space (a complete, separable metric space). For each Polish space Z, write ∆(Z) for the set of probability measures on Z's Borel σ-algebra. It is known that ∆(Z) is metrizable by the Prokhorov metric, which makes ∆(Z) a Polish space in its own right.
Iteratively, define X0 ≡ Θ, X1 ≡ Θ × ∆(X0), X2 ≡ Θ × ∆(X0) × ∆(X1), etc. Each player has a first-order belief µ(1)_i, µ(1)_j ∈ ∆(X0) that describes her belief about the state of the world, a second-order belief µ(2)_i, µ(2)_j ∈ ∆(X1) that describes
28 All of this extends to games with 3 or more players, but with more cumbersome notations.
29 Harsanyi initially called members of such space “attribute vectors”. The word “type” only appeared in a later draft after Harsanyi discussed his
research with Aumann and Maschler, who were also working on problems in information economics.
30 Brandenburger and Dekel’s construction was based on a slightly different set of assumptions than that of Mertens and Zamir. For instance,
Mertens and Zamir assumed Θ is compact, but Brandenburger and Dekel required Θ to be a complete, separable metric space. Neither is strictly
stronger.
her joint belief about the state of the world and the opponent's first-order belief, and in general a kth-order belief µ(k)_i, µ(k)_j ∈ ∆(Xk−1) = ∆(Θ × ∆(X0) × ... × ∆(Xk−2)) that describes her joint belief about the state of the world, the opponent's first-order belief, ..., and the opponent's (k − 1)th-order belief. Since X0 is Polish, each Xk is Polish.
A hierarchy of beliefs is a sequence of beliefs of all orders, (µ(1)_i, µ(2)_i, ...) ∈ ∏_{k=0}^{∞} ∆(Xk) ≡ T0. Note that there is a great deal of redundancy within the hierarchy. Indeed, as µ(k)_i is a distribution over the first k elements of the list Θ, ∆(X0), ∆(X1), ..., each µ(k)_i can be appropriately marginalized to obtain a distribution over the same domain as µ(k′)_i for any 1 ≤ k′ < k.
Call a hierarchy of beliefs consistent if each µ(k)_i marginalized on all except the last dimension equals µ(k−1)_i, and write T1 ⊂ T0 for the subset of consistent hierarchies. Then, the Kolmogorov extension theorem implies that for each consistent hierarchy (µ(1)_i, µ(2)_i, ...) there exists a measure f(µ(1)_i, µ(2)_i, ...) over the infinite product Θ × ∏_{k=0}^{∞} ∆(Xk) such that, for each k = 1, 2, ..., f(µ(1)_i, µ(2)_i, ...) marginalized to the domain of the kth-order belief equals µ(k)_i. But Θ × ∏_{k=0}^{∞} ∆(Xk) is in fact Θ × T0, so f associates each consistent hierarchy with a joint belief over the state of the world and the (possibly inconsistent) hierarchy of the opponent. Further, this association is natural in the sense that f(µ(1)_i, µ(2)_i, ...) describes the same beliefs and higher-order beliefs about Θ as the hierarchy (µ(1)_i, µ(2)_i, ...). One may further verify that the map f : T1 → ∆(Θ × T0) is a homeomorphism. Now iteratively define, for k = 2, 3, ...,

Tk ≡ {t ∈ T1 : [f(t)](Θ × Tk−1) = 1}.
That is, Tk is the subset of consistent types that put probability 1 on the opponent's type being in the subset Tk−1. Let T ≡ ∩_k Tk, which is the class of types with "common knowledge of consistency": i "knows"31 j's type is consistent, i "knows" that j "knows" i's type is consistent, etc. This is the universal type space over Θ. The map f can be restricted to the subset T to give a natural homeomorphism from T to ∆(Θ × T).
5.3 Bayesian game as a belief-closed subset of the universal type space. Here we discuss how the Bayesian game
model relates to the universal type space.
Take a common prior Bayesian game B = ⟨N, (Θi)i∈N, (Ai)i∈N, (ui)i∈N, µ⟩ and suppose for simplicity there are two players, i and j. Each θ∗i ∈ Θi corresponds to a unique point in the universal type space T over Θ, which we write as t(θ∗i) ∈ T. To identify t(θ∗i), note that player i of type θ∗i has a first-order belief µ(1)_i[θ∗i] ∈ ∆(Θ) such that, for E1 ⊆ Θ,

µ(1)_i[θ∗i](E1) ≡ µ(E1 | θ∗i),

where µ(·|θ∗i) ∈ ∆(Θ) is the conditional distribution on Θ derived from the common prior, given that θi = θ∗i.
Furthermore, θ∗i also leads to a second-order belief µ(2)_i[θ∗i] ∈ ∆(Θ × ∆(Θ)), where for E1 ⊆ Θ and E2 ⊆ ∆(Θ),

µ(2)_i[θ∗i](E1 × E2) ≡ µ({θ̂ ∈ Θ : θ̂ ∈ E1 and µ(1)_j[θ̂j] ∈ E2} | θ∗i);

here µ(1)_j : Θj → ∆(Θ) is defined analogously to µ(1)_i. One may similarly construct the entire hierarchy t(θ∗i) = (µ(k)_i[θ∗i])_{k=1}^{∞} and verify that it satisfies common knowledge of consistency. Hence, t(θ∗i) ∈ T. This justifies calling elements of Θi "types" of player i, for indeed they correspond to universal types over the states of the world.
The set of universal types present in the Bayesian game B, namely

T(B) ≡ {t(θ∗k) : θ∗k ∈ Θk, k ∈ {i, j}},

is a belief-closed subset of T. That is, each t ∈ T(B) satisfies [f(t)](Θ × T(B)) = 1, putting probability 1 on the event that the opponent is drawn from the set of universal types T(B).
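A small Python sketch (with a made-up prior, purely illustrative) of the first step of this identification, computing µ(1)_i[θ∗i] = µ(·|θ∗i) from a common prior over Θ = Θ1 × Θ2:

```python
# Common prior over Theta = Theta_1 x Theta_2 (hypothetical numbers).
mu = {("H", "H"): 0.4, ("H", "L"): 0.2, ("L", "H"): 0.2, ("L", "L"): 0.2}

def first_order_belief(theta1_star):
    """Player 1's first-order belief: the prior conditioned on theta_1 = theta1_star."""
    joint = {theta: p for theta, p in mu.items() if theta[0] == theta1_star}
    total = sum(joint.values())
    return {theta: p / total for theta, p in joint.items()}

print(first_order_belief("H"))  # {('H', 'H'): 2/3, ('H', 'L'): 1/3}
```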
Economics 2010a . Section 5: Dynamic Games (I)32 11/21/2021
(1) Subgame-perfect equilibrium; (2) Infinite-horizon games and one-shot deviation;
(3) Rubinstein-Stahl bargaining; (4) Introduction to repeated games; (5) Folk theorem for infinitely repeated games
TF: Chang Liu (chang_liu@g.harvard.edu)
1 Subgame-Perfect Equilibrium
1.1 Nash equilibrium in finite-horizon games. Recall the definition of a finite-horizon extensive form game and the
definition of a strategy in extensive form games from Section 1.
Definition 72. A (finite-horizon) extensive form game Γ consists of the components listed in Section 1: a finite game tree with terminal vertices Z, an assignment of a player (or nature) to each non-terminal vertex, information sets, move sets, probability distributions for nature's moves, and a (Bernoulli) utility function ui : Z → R for each player i ∈ N.
A pure strategy profile s induces a distribution over terminal vertices Z, which we write as p(·|s) ∈ ∆(Z), where the randomness only comes from the moves of nature. Hence we may define, for each player i, Ui : S → R, where

Ui(s) ≡ Σ_{z∈Z} p(z|s)·ui(z).

That is, the extensive game payoff to player i is defined as her expected utility over terminal vertices, according to her Bernoulli utility ui and the distribution over terminal vertices induced by the strategy profile.
More generally, a mixed strategy profile σ also induces a distribution over terminal vertices Z, where now the randomness comes from both the moves of nature and the (independent) randomization of the players. We write p(·|σ) ∈ ∆(Z) for the implied distribution over terminal vertices and extend the domain of Ui to ∏_{k∈N} ∆(Sk), where Ui(σ) ≡ Σ_{z∈Z} p(z|σ)·ui(z).
Note that we always assume that nature randomizes independently from the players.
A Nash equilibrium in an extensive form game is defined in the natural way: a strategy profile where no player has a profitable unilateral deviation, where potential deviations are other extensive form game strategies.
Definition 74. A Nash equilibrium in an extensive form game is a strategy profile σ∗ such that, for every player i, Ui(σ∗i, σ∗−i) ≥ Ui(s′i, σ∗−i) for all s′i ∈ Si.
Example 75 (The ultimatum game33 ). Figure 7 shows the game tree of an ultimatum game, Γ. It models an interaction
between players 1 and 2 who must split two identical, indivisible items. Player 1 proposes an allocation. Then, player
2 Accepts or Rejects the allocation. If the allocation is accepted, it is implemented. If it is rejected, then neither player
gets any of the good.
32 Figure 13 is adapted from Osborne and Rubinstein (1994).
33 The ultimatum game is an experimental economics game in which two parties interact anonymously and only once, so reciprocation is not an
issue. The first player proposes how to divide a sum of money with the second party. If the second player rejects this division, neither gets anything.
Figure 7: The game tree of the ultimatum game. Player 1 moves at the root with moves Give 0, Give 1, Give 2; after each, player 2 chooses A or R.
Player 1 moves at the root of the game tree. Her move set at the root is {0, 1, 2}, which correspond to giving 0, 1, 2
units of the good to player 2. Regardless of which action player 1 chooses, the game moves to a vertex where it is
player 2’s turn to play. His move set at each of his three decision vertices is {A, R}, corresponding to accepting and
rejecting the proposed allocation.
The strategy profile s∗1 (∅) = 2, s∗2 (0) = s∗2 (1) = R, s∗2 (2) = A is a Nash equilibrium. Certainly player 2 has no profitable
deviations since U2 (s∗1 , s∗2 ) = 2, which is the highest he can hope to get in this game. As for player 1, she also has
no profitable unilateral deviations, since offering 0 or 1 to player 2 leads to rejection and no change in her payoff. By
the way, this is why we insist that a strategy in an extensive form game specifies what each player would do at each
information set, even those information sets that are not reached when the game is played. What player 2 would have
done if offered 0 or 1 is crucial in sustaining a Nash equilibrium in which player 1 offers 2.
1.2 Subgames and subgame-perfect equilibrium. In some sense, the Nash equilibrium of Example 75 is artificially
sustained by a non-credible threat. Player 2 threatens to reject the proposal if player 1 offers 1, despite the fact that
he has no incentive to carry out the threat if player 1 really makes this offer. This threat does not harm player 2’s
payoff in the game Γ, since player 2’s unoptimized decision vertex is never reached when the strategy profile (s∗1 , s∗2 )
is played – it is “off the equilibrium path”.
Whether or not strategy profiles like (s∗1 , s∗2 ) make sense as predictions of the game’s outcome depends on the availabil-
ity of commitment devices. If at the start of the game player 2 could somehow make it impossible for himself to accept
the even-split offer, then this Nash equilibrium is a reasonable prediction. In the absence of such commitment devices,
however, we should seek out a refinement of Nash equilibrium in extensive form games to rule out such non-credible
threats.
We begin with the definition of a subgame.
Definition 76 (Subgame). In a finite-horizon extensive form game Γ, any vertex x ∈ V\Z such that every information set is either entirely contained in the subtree starting at x or entirely outside of it defines a subgame, Γ(x). This subgame is an extensive form game that inherits the payoffs, moves, and information structure of the original game Γ in the natural way.
Example 77 (The ultimatum game). The ultimatum game in Example 75 has 4 subgames: Γ(∅) (which is just Γ), as
well as Γ(0), Γ(1), Γ(2). We sometimes call Γ(0), Γ(1), Γ(2) the proper subgames.
Definition 78 (Subgame-perfect equilibrium). A strategy profile σ∗ of Γ is called a subgame-perfect equilibrium
(SPE) if for every subgame Γ(x), σ∗ restricted to Γ(x) forms a Nash equilibrium in Γ(x).
Note that we can rewrite mixed strategies as behavioral strategies in games with perfect recall. Then, for each player,
the restriction of the strategies to a subgame is just the collection of the behavioral strategies corresponding/relevant
to information sets in the subgame.
Γ(∅) = Γ is always a subgame since the root of the game tree is always in a singleton information set. Therefore, every
SPE is an NE, but not conversely.
Example 79 (The ultimatum game). The NE (s∗1, s∗2) from Example 75 is not an SPE, since (s∗1, s∗2) restricted to the subgame Γ(1) is not an NE. However, the following is an SPE: s◦1(∅) = 1, s◦2(0) = R, s◦2(1) = s◦2(2) = A. It is easy to see that restricting (s◦1, s◦2) to each of the subgames Γ(0), Γ(1), Γ(2) forms an NE. Furthermore, (s◦1, s◦2) is an NE in Γ(∅) = Γ. Player 1 gets U1(s◦1, s◦2) = 1 under this strategy profile, while offering 0 leads to rejection and a payoff of 0, and offering 2 leads to acceptance but again a payoff of 0. For player 2, changing s◦2(0) or s◦2(2) does not change his payoff in Γ, since these two vertices are never reached. Changing s◦2(1) from A to R hurts his payoff.
1.3 Backward induction. Backward induction is an algorithm for finding SPE in a finite-horizon extensive form
game of perfect information. The idea is to successively replace subgames with terminal vertices corresponding to
SPE payoffs of the deleted subgames.
Start with a non-terminal vertex furthest away from the root of the game, say v. Since we have picked the deepest
non-terminal vertex, all of J(v)’s moves at this vertex must lead to terminal vertices. Choose one of J(v)’s moves, m∗ ,
that maximizes her payoff in Γ(v), then replace the subgame Γ(v) with the terminal vertex corresponding to m∗ . Repeat
this procedure, working backwards from the vertices further away from the root of the game. Eventually, the game tree will be reduced to a single terminal vertex, whose payoff will be an SPE payoff of the extensive form game, while the moves chosen throughout the deletion process will form an SPE strategy profile.
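The algorithm is easy to implement. Below is a minimal Python sketch (the tree encoding is ours, purely illustrative): a decision vertex is a pair (mover, {move: child}), a terminal vertex is a payoff profile, and ties are broken by the order in which moves are listed.

```python
def backward_induction(node):
    """Return (SPE payoff profile, plan) for the subtree rooted at node."""
    if not isinstance(node[1], dict):
        return node, {}           # terminal vertex: a payoff profile (u1, u2)
    player, children = node       # decision vertex: (mover index, {move: child})
    best_move, best_payoffs, plan = None, None, {}
    for move, child in children.items():
        payoffs, subplan = backward_induction(child)
        plan.update(subplan)
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_move, best_payoffs = move, payoffs
    plan[id(node)] = best_move    # the SPE move chosen at this vertex
    return best_payoffs, plan

# The ultimatum game of Example 75, players indexed 0 and 1:
game = (0, {g: (1, {"A": (2 - g, g), "R": (0, 0)}) for g in (0, 1, 2)})
payoffs, _ = backward_induction(game)
print(payoffs)  # (2, 0): with ties broken toward A, player 2 accepts any offer
```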
Example 80. Figures 8 through 12 display the process of backward induction.

Figure 8: The original game tree. Player 1 moves at the root, player 2 at depth 1, and player 1 again at depth 2; each decision vertex has moves L and R (the terminal payoffs are omitted here).
Backward induction replaces subgames with terminal nodes associated with the SPE payoffs in those subgames. Here
is the resulting game tree after one step of backward induction.
Figure 9: The resulting game tree after one step of backward induction.
Proceeding similarly, after eliminating all nodes at depth 3 in the original tree, the result is as follows:
Figure 10: Backward induction in progress. All nodes at depth 3 in the original tree have been eliminated.
We can continue:
Figure 11: Backward induction in progress. Only nodes with depth 1 remain: player 1 chooses at the root between L, worth (1, 4), and R, worth (2, 2).
Figure 12: Backward induction finds the unique SPE payoff in this game, (2, 2).
This completes backward induction, and we have found the unique SPE of the original game, s∗ : s∗1 (∅) = R, s∗1 (L, L) =
L, s∗1 (L, R) = R, s∗1 (R, L) = L, s∗1 (R, R) = R, s∗2 (L) = R, s∗2 (R) = L.
If ui(z) ≠ ui(z′) for every i and every pair z ≠ z′, then backward induction finds the unique SPE of the extensive form game. Otherwise, the game may have multiple SPEs, and backward induction may involve choosing between several indifferent moves. Depending on the moves chosen, backward induction may lead to different SPEs.
Example 81 (From MWG34 ). Consider a game in which the following simultaneous-move game is played twice. The
players observe the actions chosen in the first period before they play in the second period. What are the pure strategy
SPEs of this game?
b1 b2 b3
a1 10, 10 2, 12 0, 13
a2 12, 2 5, 5 0, 0
a3 13, 0 0, 0 1, 1
Solution:
The pure strategy Nash equilibria of the one-shot game are (a2 , b2 ) and (a3 , b3 ). Thus any pure strategy SPE involves
playing either of these in the second period. We conjecture the following four classes of SPE:
1. Players play (ai, bi) in the first period and (ai, bi) in the second period, i ∈ {2, 3}.
2. Players play (ai, bi) in the first period and (aj, bj) in the second period, i, j ∈ {2, 3} and i ≠ j.
3. Player 1's strategy: play ai, i ∈ {1, 2, 3}, in period 1; play a2 in period 2 if the first-period action profile was (ai, b1), otherwise play a3.
Player 2's strategy: play b1 in period 1; play b2 in period 2 if the first-period action profile was (ai, b1), otherwise play b3.
4. Player 2's strategy: play bi, i ∈ {1, 2, 3}, in period 1; play b2 in period 2 if the first-period action profile was (a1, bi), otherwise play b3.
Player 1's strategy: play a1 in period 1; play a2 in period 2 if the first-period action profile was (a1, bi), otherwise play a3.
Classes 1 and 2 are easy to check. To see that classes 3 and 4 are indeed SPEs, note that by deviating a player loses 4
in the second period and no player can gain more than 3 in any of the described strategy profiles. Equilibrium classes
3 and 4 are implemented through ‘punishment for deviations’.
2 Infinite-Horizon Games and One-Shot Deviation
2.1 Infinite-horizon games. So far, we have only dealt with finite-horizon games. These games are represented by finite-depth game trees and must end within M turns for some finite M. But games such as Rubinstein-Stahl bargaining are not finite-horizon, for players could reject each other's offers forever. We modify Definition 72 to accommodate such infinite-horizon games. For simplicity, we assume the game has perfect information and no chance moves.
Definition 82. An extensive form game with perfect information and no chance moves Γ consists of the components of Definition 72 (with all information sets singletons and no chance moves), together with:
5. A set of infinite histories H ∞ , where each h∞ ∈ H ∞ represents an infinite-length path (v0 , v1 , . . . ) in the tree.
6. A (Bernoulli) utility function u j : Z ∪ H ∞ → R for each j ∈ N.
When an infinite-horizon game is played, it might end at a terminal vertex (such as when one player accepts the other’s
offer in the bargaining game), or it might never reach a terminal vertex (such as when both players use a strategy
involving never accepting any offer in the bargaining game). Therefore, each player must have a preference not only
over the set of terminal vertices, but also over the set of infinite histories. In the bargaining game, for instance, it is
specified that u j (h∞ ) = 0 for any h∞ ∈ H ∞ , j = 1, 2, that is to say every infinite history in the game tree (i.e., never
reaching an agreement) gives 0 utility to each player.
Many definitions from finite-horizon extensive form games directly translate into the infinite-horizon setting. For
instance, any nonterminal vertex x in the perfect-information infinite-horizon game defines a subgame Γ(x). NE is
defined in the obvious way, taking into account distribution over both terminal vertices and infinite histories induced
by a strategy profile. SPE is still defined as those strategy profiles that form an NE when restricted to each of Γ’s
(possibly infinitely many) subgames.
2.2 One-shot deviation principle. It is often difficult to verify directly from definition whether a given strategy profile
forms an SPE in an infinite-horizon game. Indeed, given an SPE candidate s∗ of game Γ, we would have to consider each subgame Γ(x), which is potentially an infinite-horizon extensive form game in its own right, and ask whether player i can improve her payoff in Γ(x) by choosing a different extensive form game strategy s′i, modifying some or all of her choices at various vertices in Vi relative to s∗i. This is not an easy task, since i's set of strategies in Γ(x) is a very
rich set. The one-shot deviation principle says for extensive form games satisfying certain regularity conditions, we
need only check that i does not have a profitable deviation amongst a very restricted set of strategies in each subgame
Γ(x), namely those that differ from s∗i only at x.
Definition 83 (Continuous at infinity). Γ is continuous at infinity if for all ε > 0, there exists an integer T such that for every player i and any two infinite histories h∞, h̃∞ ∈ H∞ that share the first T nodes, |ui(h∞) − ui(h̃∞)| < ε.
Theorem 84 (One-shot deviation principle). If Γ is continuous at infinity, then a strategy profile s∗ is an SPE of Γ if and only if for every player i, every x ∈ Vi, and every strategy s′i such that s∗i(v) = s′i(v) at every v ≠ x, player i cannot improve her payoff in the subgame Γ(x) by playing s′i instead of s∗i against s∗−i.
Continuity at infinity is satisfied by all finite-horizon extensive form games, as well as all infinite-horizon games
studied in lecture, including bargaining and repeated games. Under this condition, to verify whether s∗ is an SPE, we
only need to examine each subgame Γ(x) and consider whether player J(x) can improve her payoff in Γ(x) by changing
her move only at x (a “one-shot deviation”).
3 Rubinstein-Stahl Bargaining
3.1 Bargaining as an extensive form game. The Rubinstein-Stahl bargaining game, or simply “bargaining game”35
for short, is an important example of an infinite-horizon, perfect-information extensive form game. It is comparable to the ultimatum game from Example 75, but with two important differences: (i) the game is infinite-horizon, so that the first rejection does not end the game; instead, players alternate in making offers; (ii) the good that players bargain over is assumed to be infinitely divisible, so that any allocation of the form (x, 1 − x) for x ∈ [0, 1] is feasible.
Figure 13: Part of the bargaining game tree, showing only some of the branches in the first two periods. The root ∅ has (uncountably) infinitely many children of the form (x1, 1 − x1) for x1 ∈ [0, 1]. At each such child, player 2 may play R or A. Playing A leads to a terminal node with payoffs (x1, 1 − x1), while playing R continues the game, with player 2 making the next offer.
Let’s think about what a strategy in the bargaining game looks like. Figure 13 shows a sketch of the bargaining
game tree. Player 1’s strategy specifies s1 (∅), that is to say what player 1 will offer at the start of the game. For
each x1 ∈ [0, 1], player 2’s strategy specifies s2 ((x1 , 1 − x1 )) ∈ {A, R}, that is whether he accepts or rejects a period
1 offer of (x1 , 1 − x1 ). In addition, player 2’s strategy must also specify s2 ((x1 , 1 − x1 ), R) for each x1 ∈ [0, 1],
that is what he offers in period t = 2 if he rejected player 1’s offer in t = 1.36 This offer could in principle could
depend on what player 1 offered in period t = 1. Now for every x1 , x2 ∈ [0, 1], player 1’s strategy must specify
35 Not to be confused with axiomatic Nash bargaining, which we will study in Jerry's part in 2010b.
36 Remember, a strategy for j is a complete contingency plan that specifies a valid move at any vertex in the game tree where it is j’s turn to play,
even those vertices that would never be reached due to how j plays in previous rounds. Even if player 2’s strategy specifies accepting every offer
from player 1 in t = 1, player 2 still needs to specify what he would do after a history of the form ((x1 , 1 − x1 ), R) for each x1 ∈ [0, 1].
s1 ((x1 , 1 − x1 ), R, (x2 , 1 − x2 )) ∈ {A, R}, which could in principle depend on what she herself offered in period t = 1, as
well as what player 2 offered in the current period, (x2 , 1 − x2 ).
3.2 Asymmetric bargaining power. Here is a modified version of the bargaining game that introduces asymmetric
bargaining power between the two players.
Example 86. P1 gets to make offers in periods 3k + 1 and 3k + 2, while P2 gets to make offers in periods 3k + 3. As in
the usual bargaining game, reaching an agreement of (x, 1 − x) in period t yields the payoff profile (δt−1 · x, δt−1 · (1 − x)).
If the players never reach an agreement, then payoffs are (0, 0).
Consider the following strategy profile: whenever P1 makes an offer in period 3k + 1, she offers ((1 + δ)/(1 + δ + δ²), δ²/(1 + δ + δ²)). Whenever P1 makes an offer in period 3k + 2, she offers ((1 + δ²)/(1 + δ + δ²), δ/(1 + δ + δ²)). Whenever P2 makes an offer, he offers ((δ + δ²)/(1 + δ + δ²), 1/(1 + δ + δ²)). Whenever P2 responds to an offer in period 3k + 1, he accepts if and only if he gets at least δ²/(1 + δ + δ²). Whenever P2 responds to an offer in period 3k + 2, he accepts if and only if he gets at least δ/(1 + δ + δ²). Whenever P1 responds to an offer, she accepts if and only if she gets at least (δ + δ²)/(1 + δ + δ²). You may verify that this verbal description indeed defines a strategy profile that plays a valid move at every non-terminal node of the bargaining game tree.
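It is also worth checking the internal consistency of the profile: each responder's acceptance threshold equals δ times the share that rejecting and waiting one period would deliver. A quick Python sketch (variable names ours, purely illustrative):

```python
delta = 0.9                      # any delta in (0, 1) works here
D = 1 + delta + delta**2

# Acceptance threshold of the responder in periods 3k+1, 3k+2, 3k+3:
threshold = {1: delta**2 / D, 2: delta / D, 3: (delta + delta**2) / D}
# Share the responder obtains next period after rejecting:
next_share = {1: delta / D,          # P1 then offers P2 delta/D, accepted
              2: 1 / D,              # P2 then offers himself 1/D, accepted
              3: (1 + delta) / D}    # P1 then offers herself (1+delta)/D, accepted

for t in (1, 2, 3):
    assert abs(threshold[t] - delta * next_share[t]) < 1e-12
```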
We use the one-shot deviation principle to verify that this strategy profile is an SPE. By the principle, we need only ensure that in each subgame, the player to move at the root of the subgame cannot gain by changing her move only at the root. Subgames of this bargaining game may be classified into six families:
1. Subgame starting with P1 making an offer in period 3k + 1.
2. Subgame starting with P1 making an offer in period 3k + 2.
3. Subgame starting with P2 making an offer in period 3k + 3.
4. Subgame starting with P2 responding to an offer (x, 1 − x) in period 3k + 1.
5. Subgame starting with P2 responding to an offer (x, 1 − x) in period 3k + 2.
6. Subgame starting with P1 responding to an offer (x, 1 − x) in period 3k + 3.
We consider these six families one by one, showing that in no subgame is there a profitable one-shot deviation. By the one-shot deviation principle, this shows the strategy profile is an SPE.
1. Subgame starting with P1 making an offer in period 3k + 1.
Not deviating gives P1 δ^{3k} · (1 + δ)/(1 + δ + δ²). Offering P2 more than δ²/(1 + δ + δ²) leads to acceptance but yields strictly less utility to P1. Offering P2 less than δ²/(1 + δ + δ²) leads to rejection. In the next period, P1 will offer herself (1 + δ²)/(1 + δ + δ²), which P2 will accept. Therefore, this deviation gives P1 utility δ^{3k+1} · (1 + δ²)/(1 + δ + δ²) < δ^{3k} · (1 + δ)/(1 + δ + δ²). So P1 has no profitable one-shot deviation.

2. Subgame starting with P1 making an offer in period 3k + 2.
The argument is analogous: not deviating gives P1 δ^{3k+1} · (1 + δ²)/(1 + δ + δ²), while triggering a rejection gives her δ^{3k+2} · (δ + δ²)/(1 + δ + δ²), which is strictly less.

3. Subgame starting with P2 making an offer in period 3k + 3.
Not deviating gives P2 δ^{3k+2} · 1/(1 + δ + δ²), while triggering a rejection gives him δ^{3k+3} · δ²/(1 + δ + δ²), which is strictly less.

4. Subgame starting with P2 responding to an offer (x, 1 − x) in period 3k + 1.
If 1 − x < δ²/(1 + δ + δ²), the strategy for P2 prescribes rejection. In the next period, P1 will offer P2 δ/(1 + δ + δ²), which he will accept, giving him δ^{3k+1} · δ/(1 + δ + δ²) = δ^{3k} · δ²/(1 + δ + δ²) > δ^{3k} · (1 − x), so rejecting is indeed optimal.
If 1 − x ≥ δ²/(1 + δ + δ²), the strategy for P2 prescribes acceptance, giving P2 a utility of δ^{3k} · (1 − x) ≥ δ^{3k} · δ²/(1 + δ + δ²). If P2 rejects instead, then in the next period P1 will offer P2 δ/(1 + δ + δ²), which P2 will accept, giving P2 a utility of δ^{3k+1} · δ/(1 + δ + δ²) = δ^{3k} · δ²/(1 + δ + δ²) ≤ δ^{3k} · (1 − x). So P2 has no profitable one-shot deviation.
5. Subgame starting with P2 responding to an offer (x, 1 − x) in period 3k + 2.
If 1 − x < δ/(1 + δ + δ²), the strategy for P2 prescribes rejection. In the next period, P2 will offer himself 1/(1 + δ + δ²), which P1 will accept, giving P2 a utility of δ^{3k+2} · 1/(1 + δ + δ²) = δ^{3k+1} · δ/(1 + δ + δ²) > δ^{3k+1} · (1 − x), so P2 has no profitable one-shot deviation.
If 1 − x ≥ δ/(1 + δ + δ²), the strategy for P2 prescribes acceptance, giving P2 a utility of δ^{3k+1} · (1 − x) ≥ δ^{3k+1} · δ/(1 + δ + δ²). If P2 rejects instead, then in the next period P2 will offer himself 1/(1 + δ + δ²), which P1 will accept, giving P2 a utility of δ^{3k+2} · 1/(1 + δ + δ²) = δ^{3k+1} · δ/(1 + δ + δ²) ≤ δ^{3k+1} · (1 − x). So P2 has no profitable one-shot deviation.
6. Subgame starting with P1 responding to an offer (x, 1 − x) in period 3k + 3.
If x < (δ + δ²)/(1 + δ + δ²), the strategy for P1 prescribes rejection. In the next period, P1 will offer herself (1 + δ)/(1 + δ + δ²), which P2 will accept, giving P1 a utility of δ^{3k+3} · (1 + δ)/(1 + δ + δ²) = δ^{3k+2} · (δ + δ²)/(1 + δ + δ²) > δ^{3k+2} · x, so P1 has no profitable one-shot deviation.
If x ≥ (δ + δ²)/(1 + δ + δ²), the strategy for P1 prescribes acceptance, giving P1 a utility of δ^{3k+2} · x ≥ δ^{3k+2} · (δ + δ²)/(1 + δ + δ²). If P1 rejects instead, then in the next period P1 will offer herself (1 + δ)/(1 + δ + δ²), which P2 will accept, giving P1 a utility of δ^{3k+3} · (1 + δ)/(1 + δ + δ²) ≤ δ^{3k+2} · x. So P1 has no profitable one-shot deviation.
Along the equilibrium path, P1 offers ((1 + δ)/(1 + δ + δ²), δ²/(1 + δ + δ²)) in t = 1 and P2 accepts. This is a better outcome for P1 than in the symmetric bargaining game, where P1 gets 1/(1 + δ). P1's SPE payoff improves when she has more bargaining power.
One can also show that the payoffs of this SPE are the unique SPE payoffs of the game. The proof idea is to consider
three bargaining games: Γ1, Γ2 and Γ3. Γ1 is the bargaining game exhibited in the statement of the problem. Γ2 is the bargaining game where P1 makes an offer in period one, P2 makes an offer in period two, and if both offers are rejected the bargaining game Γ1 is played. Γ3 is the bargaining game where P2 makes the first offer in period one and, if P1 rejects, the bargaining game Γ1 is played.
Define the minimal, maximal SPE payoffs for both players in all three bargaining games and use an analysis similar
to lecture to find necessary inequalities these minimal, maximal SPE payoffs have to satisfy. Ultimately, one arrives
at ‘enough’ inequalities so that a combination of them gives a tight characterization showing that the minimal and
maximal SPE payoffs for both players are equal to the payoffs in the SPE considered in this example.
4 Introduction to Repeated Games
4.1 What is a repeated game? Many of the normal form and extensive form games studied so far can be viewed as models of one-time encounters. After players finish playing Rubinstein-Stahl bargaining or a high-bid auction, they part ways and never interact again. In many economic situations, however, a group of players may play the same game again and again over a long period of time. For instance, a customer might approach a printing shop every month with a major printing job. While the printing shop has an incentive to shirk and produce low-quality output in a one-shot version of this interaction, in a long-run relationship the shop might never shirk, so as to avoid losing the customer in the future. In general, repeated games study what outcomes can arise in such repeated interactions.
Formally speaking, repeated games (with perfect monitoring37) form an important class of examples of extensive form games with finite or infinite horizon, depending on the length of repetition.
Definition 87 (Finitely repeated game). For a normal form game G = ⟨N, (Ak)k∈N, (uk)k∈N⟩ and a positive integer T, denote by G(T) the extensive form game where G is played in every period for T periods and players observe the action profiles from all previous periods. G is called the stage game and G(T) the T-times repeated game. Terminal vertices of G(T) are of the form h^T = (a^1, a^2, ..., a^T) ∈ A^T, and the payoff to player i at such a terminal vertex is

Ui(h^T) ≡ Σ_{t=1}^{T} ui(a^t).

A pure strategy for player i maps each non-terminal history of action profiles to a stage game action,

si : ∪_{k=0}^{T−1} A^k → Ai.
37 That is to say, actions taken in previous periods are common knowledge. There exists a rich literature on repeated games with coarser monitoring
structures – for instance, all players observe an imperfect public signal of each period’s action profile, or each player privately observes such a signal
– and folk theorems in these generalized settings (Fudenberg, Levine, and Maskin, 1994; Kandori and Matsushima, 1998; Sugaya, Forthcoming).
Definition 88 (Infinitely repeated game). For a normal form game G = ⟨N, (Ak)k∈N, (uk)k∈N⟩ and δ ∈ [0, 1), denote by Gδ(∞) the extensive form game where G is played in every period for infinitely many periods and players act as exponential discounters with discount factor δ. Gδ(∞) is called the infinitely repeated game with discount factor δ. An infinite history of the form h^∞ = (a^1, a^2, ...) ∈ A^∞ gives player i the payoff

Ui(h^∞) ≡ Σ_{t=1}^{∞} δ^{t−1} ui(a^t).

A pure strategy for player i maps each finite history of action profiles to a stage game action,

si : ∪_{k=0}^{∞} A^k → Ai.
A strategy for player i in G(T ) or Gδ (∞) must specify a valid action of the stage game G after any non-terminal history
(a1 , ..., ak ) ∈ Ak , including those histories that would never be reached under i’s strategy. For example, even if P1’s
strategy in repeated prisoner’s dilemma is to always play D, she still needs to specify s1 ((C, C), (C, C)), that is what
she will play in period 3 if both players cooperated in the first two periods.
As defined above, our treatment of repeated games focuses on the simplest case, where payoffs in period t are independent of actions taken in all previous periods. This rules out, for instance, investment games where players choose a level of contribution every period and the utility in period t depends on the sum of all accumulated capital up to period t.
When discussing repeated games, we are often interested in the “average” stage game payoff under a repeated game
strategy profile. The following definitions are just normalizations: they ensure that the (finite or infinite) constant
action profile (a, a, . . . ) leads to an average payoff of ui (a).
Definition 89 (Average payoff). In G(T), the average payoff to i at a terminal vertex h^T = (a^1, a^2, ..., a^T) ∈ A^T is

Ūi(h^T) ≡ (1/T) Σ_{t=1}^{T} ui(a^t).

In Gδ(∞), the (discounted) average payoff to i at the infinite history h^∞ = (a^1, a^2, ...) ∈ A^∞ is

Ūi(h^∞) ≡ (1 − δ) Σ_{t=1}^{∞} δ^{t−1} ui(a^t).
4.2 Some immediate results. The first result is immediate from backward induction.
Proposition 90. If G has a unique NE, then for any finite T , the repeated game G(T ) has a unique SPE. In this SPE,
players play the unique stage game NE after every non-terminal history.
Proof. Let σ∗ be an SPE of G(T ). For any history hT −1 of length T − 1, σ∗ (hT −1 ) must be the unique NE of G. Else,
some player must have a strictly profitable deviation in the last period T . So we deduce σ∗ plays the unique NE of G
in period T regardless of what happened in previous periods.
But this means σ∗(h^{T−2}) must also be the unique NE of G for any history h^{T−2} of length T − 2. Otherwise, consider the
subgame starting at hT −2 . If σ∗ (hT −2 ) does not form an NE, some player i can improve her payoff in the current period
by changing her action in period T − 1, and furthermore this change does not affect her payoff in future periods, since
we have argued the unique NE of G will be played in period T regardless of what happened earlier in the repeated
game. So we have found a strictly profitable deviation for i in the subgame, contradicting the fact that σ∗ is an SPE.
Hence, we have shown σ∗ plays the unique NE of G in the last two periods of G(T ), regardless of what happened
earlier. Continuing this argument shows the unique NE of G is played after any non-terminal history.
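By contrast, when the horizon is infinite, repeating a non-NE action profile can be sustainable. As a warm-up for the folk theorem below, here is a minimal sketch of the one-shot deviation check for grim trigger in a standard prisoner's dilemma (the payoff numbers here are assumed, not from the notes): cooperating yields 2 per period, the best one-period deviation yields 3, and mutual defection yields 1.

```python
def conforming_avg(delta):
    # Discounted average payoff from cooperating forever: (1-d) * sum d^t * 2 = 2.
    return 2.0

def deviation_avg(delta):
    # Deviate once (payoff 3), then mutual defection (payoff 1) forever after.
    return (1 - delta) * 3 + delta * 1

# Grim trigger is an SPE iff there is no profitable one-shot deviation anywhere;
# at punishment histories the stage NE (D, D) is played, so only cooperative
# histories bind: 2 >= 3 - 2*delta, i.e. delta >= 1/2.
for delta in (0.3, 0.5, 0.9):
    print(delta, conforming_avg(delta) >= deviation_avg(delta))
```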
The feasible payoff profiles of G are those in co({u(a) : a ∈ A}), the convex hull of the pure-action payoff profiles. These are the payoffs that can be obtained if players use a public randomization device to correlate their actions. Specifically, as every v ∈ co({u(a) : a ∈ A}) can be written as a weighted average v = Σ_{ℓ=1}^{r} p_ℓ · u(a^{(ℓ)}), where p_ℓ ≥ 0, Σ_{ℓ=1}^{r} p_ℓ = 1, and a^{(ℓ)} ∈ A for each ℓ, one can construct a correlated strategy profile in which all players observe a public random variable that realizes to ℓ with probability p_ℓ, and player i plays a^{(ℓ)}_i upon observing ℓ. The expected payoff profile under this correlated strategy profile is v.
This public randomization device will be used in the construction of the equilibrium in two ways:
1. To realize specific feasible payoffs in certain periods as described above (on or off equilibrium path). It follows
from the optimization property of the equilibrium that agents will follow the prescriptions of the public random-
ization device (i.e., the incentives are given by the continuation play encoded in the equilibrium strategies).
2. To construct SPEs whose payoffs are a mixture of the payoffs from two (or more) SPEs. An illustrative example is as follows: imagine that we have two SPE profiles, σ(1) and σ(2), which give SPE payoff profiles U(σ(1)) and U(σ(2)). Then, given λ ∈ (0, 1), agents can achieve the SPE payoff profile λU(σ(1)) + (1 − λ)U(σ(2)) by using the public randomization device to replicate a coin toss with heads probability λ at time t = 0, before play starts: if heads, play SPE σ(1); otherwise play σ(2). This is an SPE, albeit one constructed via public randomization, because no matter what the public outcome of the coin is, the agents will follow its prescription due to the SPE property of σ(1) and σ(2).
Definition 92 (Minimax payoff). In a normal form game G = ⟨N, (Ak)k∈N, (uk)k∈N⟩, player i's minimax payoff is defined as

v̲i ≡ min_{α−i ∈ ∆(A−i)} max_{ai ∈ Ai} ui(ai, α−i).
Definition 93 (Individually rational). In a normal form game G = ⟨N, (Ak)k∈N, (uk)k∈N⟩, call a payoff profile v ∈ Rn individually rational (IR) if vi ≥ v̲i for every i ∈ N. Call v strictly individually rational if vi > v̲i for every i ∈ N.
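In the two-player case (the main case in this course, per remark 1 below), player i's minimax payoff is the value of a small linear program. A minimal scipy sketch (the helper name is ours, purely illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def minimax_payoff(U1):
    """Player 1's minimax payoff min_{alpha_2} max_{a_1} u_1(a_1, alpha_2)."""
    n1, n2 = U1.shape
    # Decision variables: (t, alpha_2); minimize t subject to
    # U1[a1, :] @ alpha_2 <= t for every pure action a1 of player 1.
    c = np.concatenate(([1.0], np.zeros(n2)))
    A_ub = np.hstack([-np.ones((n1, 1)), U1])
    b_ub = np.zeros(n1)
    A_eq = np.concatenate(([0.0], np.ones(n2))).reshape(1, -1)  # simplex constraint
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * n2
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.fun

# Matching pennies: player 1's minimax payoff is 0.
print(minimax_payoff(np.array([[1.0, -1.0], [-1.0, 1.0]])))
```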
1. The outer minimization in minimax payoff is across the set of correlated strategy profiles of −i. As demon-
strated in the coordination game with an eavesdropper (Example 32), the correlated minimax payoff of a player
could be strictly lower than her independent minimax payoff (when opponents in −i play independently mixed
actions). This distinction is not very important for this course, as we will almost always consider two-player
stage games when studying repeated games, so that the set of “correlated” strategy profiles of −i is just the set
of mixed strategies of −i.
2. We claimed to have described repeated games with perfect monitoring in Definitions 87 and 88, but the mon-
itoring structure as written was less than perfect. Players only observe past actions and cannot always detect
deviations from mixed strategies or correlated strategies, so in particular they do not know for sure if every-
one is faithfully playing a mixed or correlated minimax strategy profile against i.38 To remedy this problem, we
can assume that every coalition (including singleton coalitions) observes a correlating signal at the start of every
period, which they use to implement correlated strategies and mixed strategies. Furthermore, the realizations
of such correlating signals become publicly known the end of each period, so that even correlated strategies
and mixed strategies are “observable”. This remark is again not very important for this course, for the minimax
action profile turns out to be pure in most stage games we examine. In addition, Fudenberg and Maskin (1986)
showed that their folk theorem continues to hold, albeit with a modified proof, even when players only observe
past actions and not the realizations of past correlating devices.
Proposition 94. Suppose σ∗ is a Nash equilibrium for G(T ) or Gδ (∞). Then the average payoff profile associated
with σ∗ is feasible and IR for the stage game G.
Proof. Evidently, the payoff profile in every period of the repeated game must be in co({u(a) : a ∈ A}). In G(T ),
the average payoff profile under σ∗ is the simple average of T such points, while in Gδ (∞) it is a weighted average
of countably many such points, so in both cases the average payoff profile must still be in co({u(a) : a ∈ A}) by the
convexity of this set.
Suppose now player i's average payoff is strictly less than v̲i. Then consider a new repeated game strategy σ′i for i, where σ′i(h) best responds to the (possibly correlated) action profile σ∗−i(h) after every non-terminal history h. Then playing σ′i guarantees i at least v̲i in every period, so that her average payoff will be at least v̲i. This would contradict the optimality of σ∗i in the NE σ∗.
38 Even when there are only 2 players, the minimax strategy against P1 might be a mixed strategy of P2. By observing only past actions, P1 does not know for sure whether P2 faithfully carried out this mixed minimax strategy.
5 Folk Theorem for Infinitely Repeated Games
5.1 The folk theorem for infinitely repeated games. It is natural to ask what payoff profiles can arise in Gδ(∞). Write E(Gδ(∞)) for the set of average payoff profiles attainable in SPEs of Gδ(∞). Since every SPE is an NE, in view of Proposition 94, the most we could hope for are results of the following form: "lim_{δ→1} E(Gδ(∞)) equals the set of feasible and IR payoffs of G." Theorems along this line are usually called "folk theorems", for such results were widely believed and formed part of the economic folklore long before anyone obtained a formal proof.
It is important to remember that folk theorems are not merely efficiency results. They are more correctly characterized
as “anything-goes results”. Not only do they say that there exist SPEs with payoff profiles close to the Pareto frontier,
but they also say there exist other SPEs with payoff profiles close to players’ minimax payoffs.
The following is a folk theorem for infinitely repeated games with perfect monitoring.
Theorem 95 (Fudenberg and Maskin, 1986). Write V∗ for the set of feasible and strictly IR payoff profiles of G. Assume V∗ has full dimensionality. For any v∗ ∈ V∗, there exists δ̲ ∈ (0, 1) such that v∗ ∈ E(Gδ(∞)) for all δ ∈ (δ̲, 1).
5.2 Rewarding minimaxers. The proof of Theorem 95 is constructive and explicitly defines an SPE with average payoff
v∗ . To ensure subgame-perfection, the construction must ensure that −i have an incentive to minimax i in the event
that i deviates. It is possible that the minimax action against i hurts some other player j , i so much that j would
prefer to be minimaxed instead of minimaxing i. The solution, as we saw in lecture, is to promise a reward of ε > 0
in all future periods to players who successfully carry out their roles as minimaxers.39 This way, at a history that calls
for players to minimax i, deviating from the minimax action loses an infinite stream of ε payoffs. As players become
more patient, this infinite stream of strictly positive payoffs matters far more than the utility cost from finitely many
periods of minimaxing i.
39 The strategy profile used in the proof of Theorem 95 is often called the “stick and carrot strategy”. If a player deviates during the normal phase,
the deviator is hit with a “stick” for finitely many periods. Then, all the other players are given a “carrot” for having carried out this sanction.
Economics 2010a . Section 6: Dynamic Games (II) 11/28/2021
(1) Extensions of the folk theorem; (2) Refinements of NE; (3) Signaling games
TF: Chang Liu (chang_liu@g.harvard.edu)
1 Extensions of the Folk Theorem
1.1 Drop minimaxers' rewards. Sometimes, no complicated reward scheme as in Theorem 95 is necessary. This is the case when minimaxing i is not particularly costly for her opponents, as the following example demonstrates.
Example 96 (December 2012 Final Exam). Consider an infinitely repeated game with the following symmetric stage
game.
L C R
T −4, −4 12, −8 3, 1
M −8, 12 8, 8 5, 0
B 1, 3 0, 5 4, 4
Construct a pure strategy profile of the repeated game with the following properties: (i) the strategy profile is an SPE
of the repeated game for all δ close enough to 1; (ii) the average payoffs are (8, 8); (iii) in every subgame, both players’
payoffs are nonnegative in each period.
Solution:
We quickly verify that each player’s pure minimax payoff (i.e., when minimaxers are restricted to using pure strategies)
is 1. P1 minimaxes P2 with T , who best responds with R, leading to the payoff profile (3, 1). Symmetrically, P2
minimaxes P1 with L, who best responds with B, giving us the payoff profile (1, 3). So, (8, 8) is feasible and strictly
individually rational, even when restricting attention to pure strategies.
However, we cannot directly invoke Theorem 95, for the construction there uses a public randomization device in several places – for instance, to give the ε > 0 reward to minimaxers – but the question asks for a pure strategy profile. Even if we were allowed to use public randomization, we would still face the additional restriction that we cannot let any player get a negative payoff in any period, even off-path. If we publicly randomize over some action profiles, then we are restricted to the action profiles in the lower right corner of the payoff matrix in all subgames.
Perhaps the easiest solution is to build a simpler SPE and forget about giving the ε > 0 reward to minimaxers
altogether. This is possible because for this particular stage game, the minimaxer gets utility 3 while the minimaxee
gets utility 1, so it is better to minimax than to get minimaxed. Consider an SPE given by three phases: in normal
phase, play (M, C); in minimax P1 phase, play (B, L); in minimax P2 phase, play (T, R). If player i deviates during
normal phase, go to minimax Pi phase. If player j deviates during minimax Pi phase, go to minimax P j phase, where
possibly j = i. If minimax Pi phase completes without deviations, go to normal phase.
We verify that this strategy profile is an SPE for δ close enough to 1 using the one-shot deviation principle, where each minimax phase lasts one period. Due to symmetry, it suffices to verify that P1 has no profitable one-shot deviation in any subgame. For any subgame in the normal phase, deviating gives at most

12 + δ · 1 + (δ²/(1 − δ)) · 8,   (6)

while not deviating gives

8 + δ · 8 + (δ²/(1 − δ)) · 8.   (7)

Equation (7) minus equation (6) gives −4 + 7δ, which is nonnegative for δ ≥ 4/7.

For a subgame in the minimax P1 phase, deviating not only hurts P1's current period payoff, but also leads to another period of P1 being minimaxed. So P1 has no profitable one-shot deviation in such subgames for any δ.

For a subgame in the minimax P2 phase, deviating gives at most

5 + δ · 1 + (δ²/(1 − δ)) · 8,   (8)

while not deviating gives

3 + δ · 8 + (δ²/(1 − δ)) · 8.   (9)

Equation (9) minus equation (8) gives −2 + 7δ, which is nonnegative for δ ≥ 2/7.

Therefore this strategy profile is an SPE whenever δ ≥ max{4/7, 2/7} = 4/7.
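A quick numeric check of these two thresholds (a sketch; the common continuation term 8δ²/(1 − δ) cancels from both comparisons):

```python
def gap_normal(d):       # (7) minus (6), continuation terms cancelled
    return (8 + 8 * d) - (12 + d)     # = -4 + 7d

def gap_minimax_p2(d):   # (9) minus (8), continuation terms cancelled
    return (3 + 8 * d) - (5 + d)      # = -2 + 7d

assert abs(gap_normal(4 / 7)) < 1e-12 and gap_normal(0.99) > 0
assert gap_minimax_p2(2 / 7) >= -1e-12
```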
It turns out that minimaxer rewards are generally unnecessary when there are only two players40, as the following theorem shows. In particular, this says we can drop the full-dimensionality assumption from the Fudenberg-Maskin theorem when n = 2.
Theorem 97 (Fudenberg and Maskin, 1986). Write V∗ for the set of feasible and strictly IR payoff profiles of G where n = 2. For any v∗ ∈ V∗, there exists δ̲ ∈ (0, 1) such that v∗ ∈ E(Gδ(∞)) for all δ ∈ (δ̲, 1).
Proof. Normalize both players' minimax payoffs to zero, so that v∗i > 0 for i ∈ {1, 2}. Consider the following strategy profile: in the normal phase, play v∗ (via the public randomization device) every period; if either player deviates, both players play the minimax actions against each other for M periods (the mutual minimax phase), restarting the phase after any deviation within it, and then return to the normal phase. Write v̄i for an upper bound on player i's stage game payoff, and choose M large enough so that M·v∗i ≥ 2v̄i for each i ∈ {1, 2}. Write ũi for the payoff to i when i and −i both play the minimax actions against each other. Note that ũi ≤ 0.
Consider a subgame in the normal phase. If player i makes a one-shot deviation, she gets at most

v̄i + δũi + δ²ũi + · · · + δ^M ũi + (δ^{M+1}/(1 − δ))·v∗i,   (10)

while conforming gives

v∗i + δv∗i + δ²v∗i + · · · + δ^M v∗i + (δ^{M+1}/(1 − δ))·v∗i.   (11)

Equation (11) minus equation (10) gives

v∗i − v̄i + (δ + · · · + δ^M)(v∗i − ũi),

which is no less than −v̄i + (δ + · · · + δ^M)·v∗i since v∗i > 0 and ũi ≤ 0. But for δ close to 1, δ + · · · + δ^M ≥ M/2, implying −v̄i + (δ + · · · + δ^M)·v∗i ≥ 0 by the choice of M. So for δ large enough, there are no profitable one-shot deviations in the normal phase.
Consider a subgame in the first period of the mutual minimax phase. If player i deviates, she gets at most

0 + δũi + δ²ũi + · · · + δ^M ũi + (δ^{M+1}/(1 − δ))·v∗i,   (12)

where the opponent playing the minimax strategy against i implies that her payoff in the period of deviation is bounded by 0. On the other hand, conforming gives

ũi + δũi + δ²ũi + · · · + δ^{M−1}ũi + δ^M v∗i + (δ^{M+1}/(1 − δ))·v∗i.   (13)
40 However, the SPE from the proof of Theorem 97 is not allowed in Example 96, as it involves players getting payoffs (−4, −4) in some periods
Equation (13) minus equation (12) gives

(1 − δ^M)ũi + δ^M v∗i,

which is positive for δ sufficiently close to 1, since ũi ≤ 0 and v∗i > 0. This shows i does not have a profitable one-shot deviation in the first period of the mutual minimax phase. A fortiori, she cannot have a profitable one-shot deviation in later periods of the mutual minimax phase either.
This completes the proof.
Example 98 (From old problem sets of Jerry Green). Consider the infinitely repeated game whose stage game is as below and whose common discount factor is 1/2.
A D
A 2, 3 1, 5
D 0, 1 0, 1
Show that ((A, A), (A, A), . . . ) cannot be sustained in any SPE path.
Solution:
Consider the incentives of player 2. For this we can use the one-shot deviation principle. On path, player 2 gets 3 each period. The most profitable one-shot deviation for player 2 is to D, which gives a current gain of 5 − 3 = 2. The heaviest punishment for the deviation is minimaxing player 2 forever after the deviation, which gives a per-period loss of 1 − 3 = −2. Since the discount factor is 1/2, the deviation to D in a single period, followed by the punishment of being minimaxed forever, gives a utility difference of

2 + ((1/2)/(1 − 1/2)) · (−2) = 0.
Any weaker punishment (in terms of giving a higher continuation payoff) would cause player 2 to deviate. Thus, to sustain (A, A) on path forever, in the subgame after player 2 deviates, player 1 has to play D for all eternity. However, for player 1, A dominates D in the stage game, so playing D forever leads to an average payoff strictly lower than her IR payoff, which is a contradiction!
Example 99 (December 2016 Final Exam). Consider the three player game given by the following two tables (player
3 chooses the matrix).
        a3:                     b3:
      a2       b2            a2       b2
a1  2, 2, 2  1, 1, 1   a1  1, 1, 1  1, 1, 1
b1  1, 1, 1  1, 1, 1   b1  1, 1, 1  2, 2, 2
What is the set of payoff triplets that can arise as average equilibrium payoffs of the infinitely repeated game with
discount factor δ when δ is close to 1? Justify your answer.
Solution:
Note that Theorem 95 doesn't help here, because the set of feasible payoffs (the segment connecting (1, 1, 1) and (2, 2, 2)) is one-dimensional and there are three players.
Since (2, 2, 2) is an NE payoff of the stage game, it is automatically attainable as an average SPE payoff, irrespective of the discount factor.
Each player mixing with probabilities 1/2 between her two strategies is another NE of the stage game, which gives payoff (5/4, 5/4, 5/4). Infinite repetition of this Nash profile leads to average SPE payoff (5/4, 5/4, 5/4).
Public randomization over the above two SPEs gives us, as average SPE payoffs, the entire segment between (5/4, 5/4, 5/4) and (2, 2, 2), both endpoints included.
We now show that any payoff strictly below 5/4 cannot be sustained in any SPE. Denote by α the lowest SPE payoff of a player (by symmetry this has to be the same for all players). Take any SPE payoff (v, v, v), and suppose that it is attained when players follow the strategy profile σ. We have seen in lecture that, for any mixture of actions in the first period, there exists a player who can "deviate" and get at least 5/4 in that period. Hence, for σ to be an SPE, this player should find such a one-shot deviation not worthwhile. The deviation yields an average payoff of at least (1 − δ)(5/4) + δα, so v ≥ (1 − δ)(5/4) + δα. But v is arbitrary, so it follows that α ≥ (1 − δ)(5/4) + δα, which implies that α ≥ 5/4.
This completes the proof. The set of possible average SPE payoffs is the segment between (5/4, 5/4, 5/4) and (2, 2, 2), both endpoints included.
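The mixed NE payoff 5/4 used above is easy to verify by enumeration (a sketch):

```python
from itertools import product

def u(profile):
    # Payoff 2 to everyone when all three coordinate (all 'a' or all 'b'), else 1.
    return 2 if len(set(profile)) == 1 else 1

# Each player mixes 1/2-1/2, so each of the 8 pure profiles has probability 1/8.
expected = sum(u(profile) * (0.5 ** 3) for profile in product("ab", repeat=3))
print(expected)  # 1.25 = 5/4
```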
1.2 The folk theorem for finitely repeated games. In view of Proposition 90, the stage game G must have multiple NEs
for G(T ) to admit more than one SPE. Unlike an infinitely repeated game, a finitely repeated game “unravels” because
some NE must be played in the last period. However, if G has multiple NEs, then conditioning which NEs get played
in the last few periods of G(T ) on players’ behavior in the early periods of the repeated game provides incentives for
cooperation. The following result is not the most general one, but it shows how one can use the multiplicity of NEs in
the stage game to incentivize cooperative behavior for most of the T periods.
Proposition 100. Suppose that each player's stage game payoffs from Nash equilibria can vary. That is, for each i ∈ N, ui(ᾱ(i)) = max_{α∈NE(G)} ui(α) > min_{α∈NE(G)} ui(α) = ui(α̲(i)). Write di ≡ max_{ai,a′i∈Ai, a−i∈A−i} [ui(a′i, a−i) − ui(ai, a−i)] for an upper bound on the deviation utility to player i in the stage game. Let the integer Mi be large enough that Mi · [ui(ᾱ(i)) − ui(α̲(i))] ≥ di, and let M ≡ Σ_{i∈N} Mi. For any feasible payoff profile v∗ with v∗i ≥ ui(α̲(i)) for each player i, and any integer T ≥ M, there exists an SPE of G(T) where the average payoff is v∗ for all except the last M periods.
Proof. Consider the following strategy profile. In the first T − M periods, if no one has deviated so far, publicly randomize so that the expected payoff profile is v∗. If some players have deviated and player i was the first to deviate, then play α̲(i) for the remainder of these first T − M periods. In the last M periods, if no one deviated in the first T − M periods, then play ᾱ(1) for M1 periods, followed by ᾱ(2) for M2 periods, ..., and finally ᾱ(n) for Mn periods. If someone deviated in the first T − M periods and i was the first to deviate, then do the same as before, except play α̲(i) in the Mi periods where ᾱ(i) would have been played.
We use the one-shot deviation principle to argue that this strategy profile forms an SPE. At any subgame starting in the first T − M periods without prior deviations, suppose that player i deviates. Compared with conforming to the SPE strategy, player i gains at most di in the current period, but gets weakly worse payoffs for the remainder of these first T − M periods since v∗i ≥ ui(α̲(i)). In addition, i loses at least di utility across the Mi periods in the last M periods of the game, by the choice of Mi. Therefore, player i does not have a profitable one-shot deviation at any subgame starting in the first T − M periods without prior deviations.
At a subgame starting in the first T − M periods with a prior deviation, the SPE specifies playing some NE action profile of the stage game thereafter. Deviation can only hurt the current period payoff, with no effect on the payoffs of any future periods. Similar reasoning holds for subgames starting in the last M periods.
Example 101 (December 2013 Final Exam). Suppose the following game is repeated T times and each player maxi-
mizes the sum of her payoffs in these T plays. Show that, for every ε > 0, we can choose T big enough so that there
exists an SPE of the repeated game in which each player’s average payoff is within ε of 2.
A B C
A 2, 2 −1, 3 0, 0
B 3, −1 1, 1 0, 0
C 0, 0 0, 0 0, 0
Solution:
For a given ε > 0, choose T large enough that (2(T − 1) + 1)/T > 2 − ε. Consider the following strategy profile for both players: in period t < T, play A if (A, A) has been played in all previous periods, else play C. In period T, play B if (A, A) has been played in all previous periods, else play C. At a history in period t ≤ T − 1 where (A, A) has been played in all previous periods, a one-shot deviation gains at most 1 in the current period but loses 2 in each of periods t + 1, t + 2, ..., T − 1, and finally loses 1 in period T. At a history in period t ≤ T − 1 with a prior deviation, a one-shot deviation hurts the current period payoff and does not change future payoffs. At a history in period T, clearly there is no profitable one-shot deviation, as this is the last period of the repeated game and the strategy profile prescribes playing a Nash equilibrium of the stage game.
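The choice of T is explicit: on path the players earn 2 in each of the first T − 1 periods and 1 in period T, so the average payoff is 2 − 1/T. A quick check (a sketch):

```python
eps = 0.01
T = int(1 / eps) + 1                     # any T > 1/eps works
assert (2 * (T - 1) + 1) / T > 2 - eps   # on-path average payoff within eps of 2
```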
2 Refinements of NE in Extensive and Normal Form Games
2.1 Four refinements. In lecture we studied four refinements of NE for extensive form games: perfect Bayesian
equilibrium (PBE), sequential equilibrium (SE), trembling-hand perfect equilibrium (THPE), and strategically
stable equilibrium (SSE). Whereas specifying an NE or SPE just requires writing down a profile of strategies, PBE
and SE are defined in terms of not only a strategy profile, but also a belief system π – that is, a collection of distributions
π j (·|I j ) ∈ ∆(I j ) over the vertices in information set I j for each information set of each player j. The four refinements
differ in terms of some consistency conditions they impose on the belief system.
Definition 102 (Perfect Bayesian equilibrium). A (weak) perfect Bayesian equilibrium (PBE) is a strategy profile
together with a belief system, (σ, π), so that:
1. For every player j ∈ N and information set I j ∈ I j , σ j maximizes expected payoffs starting from information set
I j according to belief system π:
u j (σ j , σ− j |I j , π) ≥ u j (σ′j , σ− j |I j , π) for all σ′j ∈ ∆(S j ).
2. For all on-path41 information sets I j , π j (·|I j ) is derived from Bayes’ rule.
If an information set I j is reached with strictly positive probability under σ, then the conditional probability of having
reached each vertex v ∈ I j given that I j is reached, π j (v|I j ), is well-defined. On the other hand, we cannot use Bayes’
rule to compute the conditional probability of reaching various vertices in an off-path information set, as we would be
dividing by 0. As such, PBE places no restrictions on these off-path beliefs.
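In symbols, at an on-path information set I j the belief is pinned down by
\[
\pi_j(v \mid I_j) = \frac{\mathbb{P}^{\sigma}(v)}{\sum_{v' \in I_j} \mathbb{P}^{\sigma}(v')}, \qquad v \in I_j,
\]
where P^σ (v) denotes the probability that play reaches vertex v under σ together with nature's moves.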
Definition 103 (Sequential equilibrium). A sequential equilibrium (SE) is a strategy profile together with a belief
system, (σ, π), so that:
1. For every player j ∈ N and information set I j ∈ I j , σ j maximizes expected payoffs starting from information set
I j according to belief system π:
u j (σ j , σ− j |I j , π) ≥ u j (σ′j , σ− j |I j , π) for all σ′j ∈ ∆(S j ).
2. There exists a sequence of strictly mixed strategies {σ(m) } so that σ(m) → σ, and furthermore π(m) → π, where
for each m, π(m) is the unique belief system consistent with σ(m) under Bayes' rule.
Though it is not part of the definition, it is easy to show that in an SE, all on-path beliefs are given by Bayes’ rule, just
as in PBE.
Compared to PBE, SE places some additional restrictions on off-path beliefs. Instead of allowing them to be
completely arbitrary, SE insists that these off-path beliefs must be attainable as the limiting beliefs of a sequence
of strictly mixed strategy profiles that converge to σ – hence the name “sequential equilibrium”. Given a strictly
mixed σ(m) , every information set is reached with strictly positive probability. Therefore, the belief system π(m) is
well-defined, as there exists exactly one such system consistent with σ(m) under Bayes’ rule.
Importantly, there are no assumptions of rationality on the sequence of strategies σ(m) . It is merely a device used
to justify how the belief system π might arise. In particular, there is no requirement that σ(m) forms any kind of
equilibrium under beliefs π(m) .
There is one special case where a PBE is automatically an SE.
Proposition 104. If all non-singleton information sets of all players are on-path in a PBE, then that PBE is an SE.
Relatedly, there is a case where an extensive form NE is automatically SE.
Proposition 105. Suppose σ is a strictly mixed Nash equilibrium in an extensive form game. Let π be the unique
belief system consistent with σ under Bayes’ rule. Then (σ, π) is a sequential equilibrium.
The next two equilibrium concepts, THPE and SSE, are defined in terms of trembles. A tremble ε : M → (0, 1] in
an extensive form game associates a small, positive probability to each move in each information set, interpreted as
the minimum weight that any strategy must assign to the move. That is, for every player j ∈ N, information set I j ∈ I j , and move mI j ∈ MI j , we impose the constraint σI j (mI j ) ≥ ε(mI j ).
The strategy profile σ is said to be an ε-constrained equilibrium if at each information set I j , σ j maximizes j’s
expected payoff subject to the constraint of minimum weights from the tremble ε. Again, since such a σ is strictly mixed, there exists exactly one belief system consistent with σ under Bayes' rule.
41 An information set is called on-path if it is reached with strictly positive probability under σ. Else, it is called off-path.
Definition 106 (Trembling-hand perfect equilibrium). A trembling-hand perfect equilibrium (THPE) is a strategy
profile σ so that there exists a sequence of trembles {ε(m) } converging to 0 and a sequence of strictly mixed strategies
{σ(m) } so that σ(m) → σ and, for each m, σ(m) is an ε(m) -constrained equilibrium.
Definition 107 (Strategically stable equilibrium). A strategically stable equilibrium (SSE) is a strategy profile σ so
that for every sequence of trembles {ε(m) } converging to 0, there exists a sequence of strictly mixed strategies {σ(m) }
so that σ(m) → σ and, for each m, σ(m) is an ε(m) -constrained equilibrium.
THPE and SSE are also defined for normal form games, where the tremble ε specifies minimum weights for the
different actions of all players.
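As a concrete normal form illustration (our own sketch, not from the notes; the function name is hypothetical): because expected payoff is linear in probabilities, an ε-constrained best response puts the floor weight on every action and piles all remaining mass on a best pure reply.

import numpy as np

def constrained_best_response(payoffs, opp_mix, floors):
    """Epsilon-constrained best response in a normal form game: `payoffs`
    has own actions in rows and opponent actions in columns; `floors` is
    the minimum weight on each own action. The objective is linear in the
    mixture, so the optimum puts the floor on every action and all
    remaining probability on (one of) the best pure replies."""
    floors = np.asarray(floors, dtype=float)
    values = payoffs @ opp_mix                  # expected payoff of each pure action
    mix = floors.copy()
    mix[np.argmax(values)] += 1.0 - floors.sum()
    return mix

# P1's payoffs in Example 111 below (rows T, B, X; columns L, R, Y):
U1 = np.array([[2, -1, 0], [-1, 0, 1], [0, -2, 0]])
print(constrained_best_response(U1, np.array([0.05, 0.90, 0.05]), [0.01] * 3))
# -> [0.01, 0.98, 0.01]: against a mostly-R opponent, B gets all the slack.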
The following table summarizes some of the key comparisons between these four equilibrium concepts.
        Belief at on-path info. set    Belief at off-path info. set                     Robustness to trembles
PBE     Bayes' rule                    No restriction                                   Not robust
SE      Bayes' rule                    Limit of beliefs associated with one             Not robust
                                       sequence of strictly mixed profiles
THPE    N/A                            N/A                                              Robust to one sequence of trembles
SSE     N/A                            N/A                                              Robust to any sequence of trembles
These concepts are nested: in any finite extensive form game Γ,
NE(Γ) ⊇ PBE(Γ) ⊇ SE(Γ) ⊇ THPE(Γ) ⊇ SSE(Γ), and likewise NE(Γ) ⊇ SPE(Γ) ⊇ SE(Γ),
while PBE(Γ) and SPE(Γ) are not nested in general. Moreover, THPE(Γ) is non-empty in every finite extensive form game; that is, there is always at least one THPE. The immediate implication, via the inclusions above, is that SE, SPE, PBE, and NE are also non-empty equilibrium concepts.
2.2 Some examples. We illustrate these refinement concepts through two examples. The first example shows an
extensive form game where we have strict inclusions: NE(Γ) ⊋ PBE(Γ) ⊋ SE(Γ).
Example 110 (A modified market entry game). Consider the following modification to the entry game. The entrant
(P1) chooses whether to stay out or enter the market. If she enters, nature then determines whether her product is
good or bad, each with 50% probability. The incumbent (P2) observes the entry decision, but not whether the product is good or bad. If the entrant enters, the incumbent can choose to Allow entry, Fight, or Fight Fiercely. This extensive game is depicted in Figure 14. Let's write I2 for P2's information set and abbreviate strategies in the obvious way (e.g., (O, F) is the strategy profile where P1 plays Out and P2 plays Fight). Restrict attention to pure strategy equilibria.
[Figure 14: The modified entry game. P1 chooses O (payoffs (0, 2)) or I; after I, nature draws g or b with probability 0.5 each; P2, at the single information set I2, chooses A, F, or FF.]
Note first that, for any belief with π2 (b|I2 ) ≥ 2/3, ((O, F), π2 (·|I2 )) forms a PBE. The information set I2 is off-path under the strategy profile (O, F), so P2 is allowed to hold any belief there. P2's payoff from F is (1 − π2 (b|I2 )) · (−1) + π2 (b|I2 ) · 2 = 3π2 (b|I2 ) − 1, which, given a belief π2 (b|I2 ) ≥ 2/3, is greater than his payoff from A or FF.43
In addition, ((I, A), π2 (b|I2 ) = 1/2) is another PBE. In fact, this is the only PBE featuring the strategy profile (I, A), since the information set I2 is on-path for this profile, so π2 (·|I2 ) must be derived from Bayes' rule, and nature makes the product good or bad with equal probability regardless of P1's strategy.
No PBE with strategy profile (O, F) is an SE, however. In any SE, consistency forces π2 (b|I2 ) = 1/2: under any strictly mixed profile, the two vertices of I2 are reached in the 50:50 proportion fixed by nature. SE requires that P2's action at the information set maximizes his payoff given this belief, yet under the belief π2 (b|I2 ) = 1/2, P2 finds it strictly profitable to deviate from F to A.
We finally check that ((I, A), π2 (b|I2 ) = 1/2) is an SE.44 It is straightforward to verify that actions maximize expected payoff at each information set given the belief in ((I, A), π2 (b|I2 ) = 1/2). Now, consider the sequence of strictly mixed strategy profiles σ(m) = ((1/m) O ⊕ (1 − 1/m) I, (1 − 1/m) A ⊕ (1/(2m)) F ⊕ (1/(2m)) FF). It is easy to see that σ(m) → (I, A). Furthermore, for each such profile, π2(m) (b|I2 ) = 1/2, so we get π2(m) (b|I2 ) → π2 (b|I2 ).
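A quick numerical check of this belief computation (our own sketch, not part of the notes):

from fractions import Fraction

def posterior_b(m):
    """P2's Bayes posterior on vertex b of I2 under sigma^(m):
    P1 enters with probability 1 - 1/m, then nature draws g or b
    with probability 1/2 each."""
    p_in = 1 - Fraction(1, m)
    reach_g = p_in * Fraction(1, 2)     # probability of reaching vertex g
    reach_b = p_in * Fraction(1, 2)     # probability of reaching vertex b
    return reach_b / (reach_g + reach_b)

# The posterior equals 1/2 along the entire sequence, so its limit is 1/2.
assert all(posterior_b(m) == Fraction(1, 2) for m in range(2, 1000))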
The second example illustrates THPE and SSE in a normal form game.
Example 111 (December 2013 Final Exam). In Example 26, we considered the normal form game
        L        R        Y
T     2, 2    −1, 2     0, 0
B   −1, −1     0, 1    1, −2
X     0, 0    −2, 1     0, 2
which has two pure Nash equilibria, (T, L) and (B, R), as well as infinitely many mixed Nash equilibria, (T, pL ⊕ (1 − p)R) for
p ∈ [1/4, 1). Now find all the THPEs and SSEs of this game.
Solution:
43 Note that this PBE is not an SPE since the strategy profile does not form an NE when restricted to the subgame starting with the chance move.
Recall that in lecture we saw an example of an SPE that is not a PBE. This completes the argument that neither the set of SPEs nor the set of PBEs
nests the other one.
44 This does not follow from the non-emptiness of SE as an equilibrium concept, since we have restricted attention to pure equilibria. A game may have no SE in pure strategies.
First we show that (T, pL ⊕ (1 − p)R) is not a THPE for any p ∈ [1/4, 1] (so we also rule out the pure (T, L)). Suppose it were: then there is a sequence of trembles ε(m) converging to 0 and a sequence of strictly mixed strategy profiles σ(m) → (T, pL ⊕ (1 − p)R), each σ(m) an ε(m) -constrained equilibrium. Since σ1(m) (B) > 0 and σ1(m) (X) > 0 for each m, R is a strictly better response than L against σ1(m) for each m. This means σ2(m) (L) = ε(m) (L) for each m. Since σ2(m) → σ2 , it follows that σ2 (L) = 0 < 1/4, a contradiction. But we know that THPE(G) ⊆ NE(G) and THPE(G) ≠ ∅, so (B, R) must be the unique THPE.
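To spell out the key step: using P2's payoffs from the matrix above, his gain from R over L against any strictly mixed σ1(m) is
\[
u_2\big(\sigma_1^{(m)}, R\big) - u_2\big(\sigma_1^{(m)}, L\big) = \sigma_1^{(m)}(T)\,(2-2) + \sigma_1^{(m)}(B)\,\big(1-(-1)\big) + \sigma_1^{(m)}(X)\,(1-0) = 2\sigma_1^{(m)}(B) + \sigma_1^{(m)}(X) > 0.
\]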
Now we check that (B, R) is also an SSE.45 Consider any sequence of trembles {ε(m) } converging to 0. Suppose P2 plays ε2(m) (L) L ⊕ (1 − ε2(m) (L) − ε2(m) (Y)) R ⊕ ε2(m) (Y) Y. Then P1 gets 2ε2(m) (L) − (1 − ε2(m) (L) − ε2(m) (Y)) from T, −ε2(m) (L) + ε2(m) (Y) from B, and −2(1 − ε2(m) (L) − ε2(m) (Y)) from X. Whenever ε2(m) (L), ε2(m) (Y) < 0.1, it is clear that B is the unique best response, and thus P1 will play ε1(m) (T) T ⊕ (1 − ε1(m) (T) − ε1(m) (X)) B ⊕ ε1(m) (X) X. Similarly, suppose P1 plays ε1(m) (T) T ⊕ (1 − ε1(m) (T) − ε1(m) (X)) B ⊕ ε1(m) (X) X. Then P2 gets 2ε1(m) (T) − (1 − ε1(m) (T) − ε1(m) (X)) from playing L, 2ε1(m) (T) + (1 − ε1(m) (T) − ε1(m) (X)) + ε1(m) (X) from playing R, and −2(1 − ε1(m) (T) − ε1(m) (X)) + 2ε1(m) (X) from playing Y. Whenever ε1(m) (T), ε1(m) (X) < 0.1, it is clear that R is the unique best response, and thus P2 will play ε2(m) (L) L ⊕ (1 − ε2(m) (L) − ε2(m) (Y)) R ⊕ ε2(m) (Y) Y. This means that, given any sequence of trembles {ε(m) } converging to 0, eventually in an ε(m) -constrained equilibrium P1 puts as much weight as possible on B while P2 puts as much weight as possible on R – in fact, this happens as soon as the maximum tremble in ε(m) falls below 0.1. Then σ(m) → (B, R) shows that (B, R) is an SSE.
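The "eventually" step can be checked numerically. Below is a minimal sketch (our own illustration; uniform floors are a simplifying assumption, and all names are ours) verifying that, for any uniform tremble below 0.1, the profile piling all slack on B and R is an ε-constrained equilibrium:

import numpy as np

# Payoffs from Example 111, own actions in rows, opponent's in columns.
U1 = np.array([[2, -1, 0], [-1, 0, 1], [0, -2, 0]])   # P1: rows T, B, X; cols L, R, Y
U2 = np.array([[2, -1, 0], [2, 1, 1], [0, -2, 2]])    # P2: rows L, R, Y; cols T, B, X

def floor_br(U, opp_mix, eps):
    """Constrained best response with a uniform floor eps on each action."""
    mix = np.full(3, eps)
    mix[np.argmax(U @ opp_mix)] += 1 - 3 * eps
    return mix

for eps in np.linspace(1e-6, 0.099, 50):
    trembled_B = np.array([eps, 1 - 2 * eps, eps])    # P1: slack on B
    trembled_R = np.array([eps, 1 - 2 * eps, eps])    # P2: slack on R
    assert np.allclose(floor_br(U1, trembled_R, eps), trembled_B)
    assert np.allclose(floor_br(U2, trembled_B, eps), trembled_R)
# As eps -> 0, these constrained equilibria converge to (B, R).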
3 Signaling Games
3.1 Strategies, beliefs, and PBEs in signaling games. Signaling games form an important class of examples within
extensive form games with incomplete information. For a schematic representation, see Figure 15.
[Figure 15: Schematic of a signaling game: nature draws θ ∈ {θ1 , θ2 }; P1 observes θ and sends a message to P2; P2 observes the message (but not θ) and responds.]
Nature determines state of the world, θ ∈ Θ = {θ1 , θ2 }, according to a common prior. P1 is informed of this state. P1
then selects a message from a possibly infinite message set A1 and sends it to P2. In the buyer-seller example from
class, for instance, the state of the world is the quality of the product while the message is a (price, quantity) pair that
the seller offers to the buyer.
P2 does not observe the state of the world, but observes the message that P1 sends. This means P2 has one information
set for every message in A1 .
A PBE (σ1 , σ2 , π2 ) in the signaling game must then have the following components:
1. σ1 : Θ → ∆(A1 ) for P1, that is, which message to send in each state of the world;
2. σ2 : A1 → ∆(A2 ) for P2, that is, how to respond to every message that P1 could send (even the off-path messages
not sent under P1's strategy);
3. π2 : A1 → ∆(Θ) for P2, that is, what to believe after receiving every message that P1 could send.
45 Since the set of SSE is not always non-empty, we cannot immediately conclude that (B, R) must be an SSE.
The requirements are that:
1. for each state θ ∈ Θ, σ1 (θ) maximizes P1's expected payoff given σ2 ;
2. for each message a1 ∈ A1 , σ2 (a1 ) maximizes P2's expected payoff under the belief π2 (·|a1 );
3. at every on-path message a1 , π2 (·|a1 ) is derived from Bayes' rule.
[Figure 16: The two families of pure PBEs in a two-type signaling game: a separating PBE, in which the two types of P1 send different messages, and a pooling PBE, in which both types send the same message.]
When there are two states of the world (i.e., two “types” of P1), pure PBEs can be classified into two families, as
illustrated in Figure 16. In a separating PBE, the two types of P1 send different messages, say a′1 and a′′1 . By Bayes'
rule, each of these two messages perfectly reveals the state of the world in the PBE. In a pooling PBE, the two types of
P1 send the same message, say a′′′1 . By Bayes' rule, P2 should keep his prior about the state of the world after seeing
a′′′1 in such a PBE. In a PBE from either family, most of P2's information sets (i.e., messages he could receive from
P1) are off-path. PBE allows P2 to hold arbitrary beliefs at these off-path information sets. In fact, we (the analysts)
will often want to pick “pessimistic” off-path beliefs to help support some strategy profile as a PBE. The following
example will illustrate the role of these off-path beliefs in sustaining equilibrium.
3.2 An example. We illustrate separating and pooling PBEs in a civil lawsuit example.
Example 112 (Civil lawsuit). Consider a plaintiff (P1) and a defendant (P2) in a civil lawsuit. Plaintiff knows whether
she has a strong case (θH ) or a weak case (θL ), but the defendant does not. The defendant holds prior belief π(θH ) = 1/3,
π(θL ) = 2/3. The plaintiff can ask for a low settlement or a high settlement, A1 = {1, 2}. The defendant accepts or
refuses, A2 = {y, n}. If the defendant accepts a settlement offer of x, the two players settle out-of-court with payoffs
(x, −x). If defendant refuses, the case goes to trial. If the case is strong (θ = θH ), plaintiff wins for sure and the payoffs
are (3, −4). If the case is weak (θ = θL ), the plaintiff loses for sure and the payoffs are (−1, 0). The extensive form
representation of this example is given by Figure 17.
Focus on pure strategy PBEs.
[Figure 17: Extensive form of the civil lawsuit game. A chance move (c) draws θH with probability 1/3 or θL with probability 2/3; the plaintiff offers 1 or 2; the defendant accepts (y), giving payoffs (x, −x) for an accepted offer x, or refuses (n), sending the case to trial.]
Separating equilibrium: Typically, there are multiple potential separating equilibria, depending on what action each
type of P1 plays. Be sure to check all of them.
1. s1 (θH ) = 2, s1 (θL ) = 1.
In any such PBE we must have π2 (θH |2) = 1, π2 (θL |1) = 1, s2 (2) = y, s2 (1) = n, as is illustrated in Figure 18.
But this means type θL gets −1 in PBE and has a profitable unilateral deviation by playing s01 (θL ) = 2 instead.
Asking for the high settlement makes P2 think P1 has a strong case, so that P2 will settle and P1 will get 2
instead of −1. Therefore no such PBE exists.
[Figure 18: The candidate separating profile s1 (θH ) = 2, s1 (θL ) = 1 with best responses s2 (2) = y, s2 (1) = n.]
2. s1 (θH ) = 1, s1 (θL ) = 2.
In any such PBE we must have π2 (θH |1) = 1 and π2 (θL |2) = 1, and hence s2 (1) = y (accepting the low offer costs P2 only 1, while going to trial against a strong case costs 4) and s2 (2) = n (refusing a weak case costs 0, while accepting costs 2), as is illustrated in Figure 19. But then type θL gets −1 in the PBE and has a profitable unilateral deviation s′1 (θL ) = 1: the low offer is accepted and yields 1 instead of −1. Therefore no such PBE exists.
[Figure 19: The candidate separating profile s1 (θH ) = 1, s1 (θL ) = 2 with best responses s2 (1) = y, s2 (2) = n.]
Pooling equilibrium: In a pooling equilibrium all types of P1 play the same action. When this “pooled” action a∗1
is observed, P2’s posterior belief is the same as the prior, π2 (θ|a∗1 ) = π(θ), since the action carries no additional
information about P1’s type. When any other action is observed (i.e. an off-path action is observed), PBE allows P2’s
belief to be arbitrary. Every member of A1 could serve as a pooled action, so we need to check for all of them
systematically.
1. s1 (θH ) = s1 (θL ) = 1.
In any such PBE we must have π2 (θH |1) = 1/3. Under this belief, P2's expected payoff from a2 = n is (1/3)·(−4) + (2/3)·0 =
−4/3, while playing a2 = y always yields −1. Therefore in any such PBE we must have s2 (1) = y, which gives
both types of P1 payoff 1, as is illustrated in Figure 20. But then the θH type of P1 has a profitable unilateral
deviation of s01 (θH ) = 2, regardless of what s2 (2) is! If s2 (2) = y, that is P2 accepts the high settlement, then
type θH P1’s deviation gives her a payoff of 2 rather than 1. If s2 (2) = n, that is P2 refuses the high settlement,
then this is even better for the type θH P1 as she will get a payoff of 3 when the case goes to court. Therefore no
such PBE exists.
[Figure 20: The candidate pooling profile s1 (θH ) = s1 (θL ) = 1 with best response s2 (1) = y.]
2. s1 (θH ) = s1 (θL ) = 2.
In any such PBE we must have π2 (θH |2) = 1/3. Under this belief, P2's expected payoff from a2 = n is (1/3)·(−4) + (2/3)·0 = −4/3, while playing a2 = y always yields −2. Therefore in any such PBE we must have s2 (2) = n, which gives type θH a payoff of 3 and type θL a payoff of −1. Type θH has no incentive to deviate, as she already gets her maximum payoff. In order to prevent a deviation by type θL , we must ensure s2 (1) = n as well, as is illustrated in Figure 21. Otherwise, if P2 accepts the low settlement offer, θL would have a profitable deviation: offering the low settlement instead of following the pooling action of the high settlement yields her a payoff of 1 instead of −1.
Whether s2 (1) = n is optimal for P2 depends on the belief π2 (θH |1). This is an off-path belief, and PBE allows such beliefs to be arbitrary. Suppose π2 (θH |1) = λ ∈ [0, 1]. Then P2's expected payoff from playing s2 (1) = n is λ·(−4) + (1 − λ)·0 = −4λ, while s2 (1) = y yields −1 for sure. Therefore, to ensure that s2 (1) = n is optimal for P2 given his belief, we need λ ≤ 1/4. If P2's off-path belief is that P1 has a strong case with probability at most 1/4 upon seeing a low-settlement offer, then it is optimal for P2 to reject such low-settlement offers, and θL will not have a profitable deviation. In summary, there is a family of pooling equilibria where s1 (θH ) = s1 (θL ) = 2, s2 (1) = s2 (2) = n, π2 (θH |2) = 1/3, and π2 (θH |1) = λ for λ ∈ [0, 1/4]. Crucially, it is the judicious choice of off-path belief π2 (·|1) that sustains the action s2 (1) = n, which in turn sustains the pooling equilibrium.
[Figure 21: The pooling PBE s1 (θH ) = s1 (θL ) = 2, s2 (1) = s2 (2) = n.]
To sum up, any pure strategy PBE in this game is a pooling equilibrium (s, π) with s1 (θH ) = s1 (θL ) = 2, s2 (1) =
s2 (2) = n, π2 (θH |2) = 1/3, and π2 (θH |1) = λ for λ ∈ [0, 1/4].
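The case-by-case analysis can be double-checked by brute force. Here is a minimal sketch (our own illustration, not part of the notes; all helper names are ours): it enumerates the pure strategy profiles together with a grid of candidate off-path beliefs and keeps the combinations satisfying sequential rationality and Bayes' rule. Since it only scans a grid, it certifies the pooling family at the grid points λ ∈ {0, 1/12, 2/12, 3/12}.

from fractions import Fraction
from itertools import product

prior_H = Fraction(1, 3)
offers, types = [1, 2], ['H', 'L']
trial_u1 = {'H': 3, 'L': -1}     # plaintiff's payoff if the case goes to trial
trial_u2 = {'H': -4, 'L': 0}     # defendant's payoff if the case goes to trial

def u1(theta, x, a2):            # plaintiff's payoff from offer x, response a2
    return x if a2 == 'y' else trial_u1[theta]

def u2(mu_H, x, a2):             # defendant's expected payoff under belief mu_H
    return -x if a2 == 'y' else mu_H * trial_u2['H'] + (1 - mu_H) * trial_u2['L']

found = []
for s1H, s1L, s21, s22 in product(offers, offers, 'yn', 'yn'):
    s1, s2 = {'H': s1H, 'L': s1L}, {1: s21, 2: s22}
    for lam in [Fraction(k, 12) for k in range(13)]:   # grid of off-path beliefs
        mu = {}
        for x in offers:
            mass = sum(p for t, p in (('H', prior_H), ('L', 1 - prior_H)) if s1[t] == x)
            mu[x] = (prior_H if s1['H'] == x else 0) / mass if mass else lam
        rat2 = all(u2(mu[x], x, s2[x]) >= u2(mu[x], x, a) for x in offers for a in 'yn')
        rat1 = all(u1(t, s1[t], s2[s1[t]]) >= u1(t, x, s2[x]) for t in types for x in offers)
        if rat1 and rat2:
            found.append((s1H, s1L, s21, s22, lam))

print(found)   # only pooling on 2 with s2 = (n, n) and lambda <= 1/4 survives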
3.3 Intuitive criterion. Some of the PBEs seem fragile and can be broken using a speech, as we saw in lecture. This
is a heuristic, since it is hard to model speech explicitly. Cho and Kreps (1987) formalize this idea and introduce the
intuitive criterion. It aims to reduce the set of equilibrium outcomes by (i) restricting the type space, at each off-path
message, to the types who could obtain a higher payoff than in equilibrium by deviating to that message, and (ii)
considering, within this subset, the types for which the off-path message is not dominated under the opponent's best
responses.
Definition 113 (Intuitive criterion). A PBE (s1 , s2 , π2 ) in a signaling game satisfies the intuitive criterion if there do
not exist (â1 , â2 , θ̂) such that:
1. â1 is an off-path message, i.e., no type sends â1 under s1 ;
2. type θ̂ strictly prefers the deviation outcome to her equilibrium outcome: u1 (â1 , â2 , θ̂) > u1 (s1 (θ̂), s2 (s1 (θ̂)), θ̂);
3. every other type θ ≠ θ̂ obtains strictly less than her equilibrium payoff by deviating to â1 , under every best response of P2 to â1 ;
4. â2 is a best response of P2 to â1 under the belief π2 (θ̂|â1 ) = 1.
Cho and Kreps (1987): “Despite the name we have given it, the intuitive criterion is not completely intuitive.” P2
is trying to infer P1’s type based on the off-path message â1 . The intuitive criterion makes the following restriction:
If for type θ, every response P2 might make after â1 yields strictly less payoff than equilibrium, then P2 “should be
sure” that type θ would not deviate to â1 . Why not restrict the off-path beliefs directly? One answer is that it causes
existence problems in games in which P1 has actions that are dominated for all types.
In the civil lawsuit example above, all of the pooling PBEs satisfy the intuitive criterion. The only off-path message
â1 in the pooling PBE is the low settlement offer, â1 = 1.
If θ̂ = θL , then by condition 4, â2 = n. But this shows that condition 2 must not hold, since θL gets the same payoff in
the PBE as under (â1 , â2 , θ̂) – the defendant rejects the settlement in both cases.
If θ̂ = θH , then by condition 4, â2 = y. But this shows that condition 2 must not hold, since θH actually gets a higher
payoff in the PBE, where the defendant rejects the settlement (payoff 3), than under (â1 , â2 , θ̂), where the defendant accepts the low settlement (payoff 1).
So there are no (â1 , â2 , θ̂) satisfying conditions 1 through 4, meaning the high settlement pooling PBEs satisfy the
intuitive criterion.
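For completeness, a tiny numeric check of conditions 2 and 4 at the only off-path message (our own sketch; the names are ours):

# Checking the intuitive criterion for the pooling PBE family above.
trial_u1 = {'H': 3, 'L': -1}     # plaintiff's payoff if the case goes to trial
eq_u1 = {'H': 3, 'L': -1}        # equilibrium payoffs: the offer 2 is refused
a1_hat = 1                       # the only off-path message

for theta_hat in ['H', 'L']:
    # Condition 4: a2_hat best-responds to offer 1 under belief pi2(theta_hat|1) = 1;
    # accepting pays -1, refusing pays -4 (strong case) or 0 (weak case).
    mu = 1 if theta_hat == 'H' else 0
    a2_hat = 'y' if -a1_hat >= -4 * mu else 'n'
    # Condition 2: does theta_hat strictly gain by deviating to offer 1?
    dev = a1_hat if a2_hat == 'y' else trial_u1[theta_hat]
    print(theta_hat, a2_hat, dev > eq_u1[theta_hat])   # condition 2 fails: False, False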
4 The End
“Begin at the beginning,” the King said, very gravely, “and go on till you come to the end: then stop.”
7 References
Aumann, Robert J., Sergiu Hart, and Motty Perry, (1997). “The Absent-Minded Driver.” Games and Economic
Behavior 20 (1):102–116.
Brandenburger, Adam and Eddie Dekel, (1993). “Hierarchies of Beliefs and Common Knowledge.” Journal of
Economic Theory 59 (1):189–198.
Cho, In-Koo and David M. Kreps, (1987). “Signaling Games and Stable Equilibria.” The Quarterly Journal of
Economics 102 (2):179–221.
Fudenberg, Drew, David Levine, and Eric Maskin, (1994). “The Folk Theorem with Imperfect Public Information.”
Econometrica 62 (5):997–1039.
Fudenberg, Drew and Eric Maskin, (1986). “The Folk Theorem in Repeated Games with Discounting or with Incom-
plete Information.” Econometrica 54 (3):533–554.
Harsanyi, John C., (1967). “Games with Incomplete Information Played by Bayesian Players, I–III. Part I: The Basic
Model.” Management Science 14 (3):159–182.
———, (1973). “Games with Randomly Disturbed Payoffs: A New Rationale for Mixed-Strategy Equilibrium Points.”
International Journal of Game Theory 2 (1):1–23.
Kandori, Michihiro and Hitoshi Matsushima, (1998). “Private Observation, Communication and Collusion.” Econo-
metrica 66 (3):627–652.
Keynes, John Maynard, (1936). The General Theory of Employment, Interest and Money. Macmillan.
Kuhn, Harold William, (1953). “Extensive Games and the Problem of Information.” In Contributions to the Theory of
Games, vol. 2. Princeton University Press, 193–216.
Ledoux, Alain, (1981). “Concours résultats complets. Les victimes se sont plu à jouer le 14.” Jeux & Stratégie 2 (10):10–11.
Mas-Colell, Andreu, Michael D. Whinston, and Jerry R. Green, (1995). Microeconomic Theory. Oxford University
Press.
Maschler, Michael, Eilon Solan, and Shmuel Zamir, (2013). Game Theory. Cambridge: Cambridge University Press.
Maskin, Eric, (1999). “Nash Equilibrium and Welfare Optimality.” The Review of Economic Studies 66 (1):23–38.
Mertens, Jean-François and Shmuel Zamir, (1985). “Formulation of Bayesian Analysis for Games with Incomplete
Information.” International Journal of Game Theory 14 (1):1–29.
Myerson, Roger B., (1981). “Optimal Auction Design.” Mathematics of Operations Research 6 (1):58–73.
———, (2013). Game Theory. Harvard University Press.
Nash, John, (1950). “Equilibrium Points in N-Person Games.” Proceedings of the National Academy of Sciences
36 (1):48–49.
———, (1951). “Non-Cooperative Games.” Annals of Mathematics 54 (2):286–295.
Osborne, Martin J. and Ariel Rubinstein, (1994). A Course in Game Theory. Cambridge: The MIT Press.
Sugaya, Takuo, (Forthcoming). “Folk Theorem in Repeated Games with Private Monitoring.” The Review of Economic
Studies.
von Neumann, John, (1928). “Zur Theorie der Gesellschaftsspiele.” Mathematische Annalen 100 (1):295–320.