Pierpaolo Battigalli
September 2014
Preface
These lecture notes provide an introduction to game theory, the formal analysis of strategic interaction. Game theory now pervades most
non-elementary models in microeconomic theory and many models in the
other branches of economics and in other social sciences. We introduce
the necessary analytical tools to be able to understand these models, and
illustrate them with some economic applications.
We also aim at developing an abstract analysis of strategic thinking, and a critical and open-minded attitude toward the standard game-theoretic concepts as well as new concepts.
Most of these notes rely on relatively elementary mathematics. Yet,
our approach is formal and rigorous. The reader should be familiar with
mathematical notation about sets and functions, with elementary linear
algebra and topology in Euclidean spaces, and with proofs by mathematical
induction. Elementary calculus is sometimes used in examples.
[Warning: These notes are work in progress! They probably contain some
mistakes or typos. They are not yet complete. Many parts have been translated
by assistants from Italian lecture notes and the English has to be improved.
Examples and additional explanations and clarifications have to be added. So,
the lecture notes are a poor substitute for lectures in the classroom.]
Contents

1 Introduction
 1.1 Decision Theory and Game Theory
 1.2 Why Economists Should Use Game Theory
 1.3 Abstract Game Models
 1.4 Terminology and Classification of Games
 1.5 Rational Behavior
 1.6 Assumptions and Solution Concepts in Game Theory

I Static Games

4 Rationalizability and Iterated Dominance

5 Equilibrium
 5.1 Nash equilibrium
 5.2 Probabilistic Equilibria

II Dynamic Games

Bibliography
1 Introduction
Game theory is the formal analysis of the behavior of interacting
individuals. The crucial feature of an interactive situation is that the
consequences of the actions of an individual depend (also) on the actions
of other individuals. This is typical of many games people play for fun,
such as chess or poker. Hence, interactive situations are called games
and interactive individuals are called players. If a player's behavior is
intentional and he is aware of the interaction (which is not always the
case), he should try to anticipate the behavior of other players. This is
the essence of strategic thinking. In the rest of this chapter we provide a
semi-formal introduction to some key concepts in game theory.
1.1 Decision Theory and Game Theory

1.2 Why Economists Should Use Game Theory
We are going to argue that game theory should be the main analytical
tool used to build formal economic models. More generally, game theory
should be used in all formal models in the social sciences that adhere
to methodological individualism, i.e. try to explain social phenomena as
the result of the actions of many agents, which in turn are freely chosen
according to some consistent criterion.
The reader familiar with the pervasiveness of game theory in economics
may wonder why we want to stress this point. Isn't it well known that game
theory is used in countless applications to model imperfect competition,
bargaining, contracting, political competition and, in general, all social
interactions where the action of each individual has a non-negligible effect
on the social outcome? Yes, indeed! And yet it is often explicitly or
implicitly suggested that game theory is not needed to model situations
where each individual is negligible, such as perfectly competitive markets.
We are going to explain why this is in our view incorrect: the
bottom line will be that every complete formal model of an economic (or
social) interaction must be a game; economic theory has analyzed perfect
competition by taking shortcuts that have been very fruitful, but must be
seen as such, just shortcuts.1
If we subscribe to methodological individualism, as mainstream
economists claim to do, every social or economic observable phenomenon
we are interested in analyzing should be reduced to the actions of the
1 Our view is very similar to the original motivations for the study of game theory offered by the founding fathers von Neumann and Morgenstern in their seminal book The Theory of Games and Economic Behavior.

2 Given a set of n individuals, we always call "profile" a list of elements from a given set, one for each individual i. For instance, a = (a_1, ..., a_n) typically denotes a profile of actions.

3 Ties can be broken at random.

4 The model is still in some sense incomplete: we have not even addressed the issue of what the individuals know about the situation and about each other. But the elements sketched in the main text are sufficient for the discussion. Let us stress that what we call a "complete model" does not include the modeler's hypotheses on how the agents choose, which are key to provide explanations or predictions of economic/social phenomena.
are observed and taken as parametrically given by all agents (including all
firms) before they act, hence before they can affect such prices; this is a
kind of logical short-circuit, but it allows one to determine demand and supply
functions D(p), S(p). Next, (2) market-clearing conditions D(p) = S(p)
determine equilibrium prices. Well, this can only be seen as a (clever)
reduced-form approach; absent an explicit model of price formation (such
as an auction model), the modeler postulates that somehow the choices-prices-choices feedback process has reached a rest point and he describes
this point as a market-clearing equilibrium. In many applications of
economic theory to the study of competitive markets, this is a very
reasonable and useful shortcut, but it remains just a shortcut, forced by
the lack of what we call a complete model of the interactive situation.
So, what do we get when instead we have a complete model? As we are
going to show a bit more formally in the next section, we get what game
theorists call a game. This is why game theory should be a basic tool
in economic modelling, even if one wants to analyze perfectly competitive
situations. To illustrate this point, we will present a purely game-theoretic
analysis of a perfectly competitive market, showing not only how such an
analysis is possible, but also that it adds to our understanding of how
equilibrium can be reached.
1.3 Abstract Game Models
A completely formal definition of the mathematical object called (non-cooperative) game will be given in due time. We start with a semi-formal
introduction to the key concepts, illustrated by a very simple example,
a seller-buyer mini-game. Consider two individuals, S (Seller) and B
(Buyer). Let S be the owner of an object and B a potential buyer. For
simplicity, consider the following bargaining protocol: S can ask one Euro
(1) or two Euros (2) to sell the object, B can only accept (a) or reject (r).
The monetary value of the object for individual i (i = S, B) is denoted V_i.
This situation can be analyzed as a game which can be represented with
a rooted tree with utility numbers attached to terminal nodes (leaves),
player labels attached to non-terminal nodes, and action labels attached
to arcs:

[Figure: game tree of the seller-buyer mini-game. S moves first, asking price 1 or 2; B then accepts (a) or rejects (r). Terminal payoffs (for S, B): (1 − V_S, V_B − 1) after (1, a); (2 − V_S, V_B − 2) after (2, a); (0, 0) after any rejection.]

The game tree represents the formal elements of the analysis: the set
of players (or roles in the game, such as seller and buyer), the actions, the
rules of interaction, the consequences of complete sequences of actions, and
how players value such consequences.
Formally, a game form is given by the following elements:

• I, a set of players.

• For each player i ∈ I, a set A_i of actions which could conceivably be chosen by i at some point of the game as the play unfolds.

• C, a set of consequences.

• E, an extensive form, that is, a mathematical representation of the rules saying whose turn it is to move, what a player knows (i.e., his or her information about past moves and random events), and what are the feasible actions at each point of the game; this determines a set Z of possible paths of play (sequences of feasible actions); a path of play z ∈ Z may also contain some random events, such as the outcomes of throwing dice.

• g : Z → C, a consequence (or outcome) function which assigns to each play z a consequence g(z) in C.
The above elements represent what the layperson would call the "rules
of the game". To complete the description of the actual interactive
situation (which may differ from how the players perceive it) we have
to add players' preferences over consequences (and, via expected utility
calculations, lotteries over consequences):

• (v_i)_{i∈I}, where each v_i : C → ℝ is a von Neumann-Morgenstern utility function representing player i's preferences over consequences; preferences over lotteries over consequences are obtained via expected utility comparisons.
1.4 Terminology and Classification of Games

1.4.1
Suppose that the players (or at least some of them) could meet before the
game is played and sign a binding agreement specifying their course of
action (an external enforcing agency, e.g. the court system, will force
each player to follow the agreement). This could be mutually beneficial
for them and if this possibility exists we should take it into account. But
how? The theory of cooperative games does not model the process
of bargaining (offers and counteroffers) which takes place before the real
game starts; this theory considers instead a simple and parsimonious
representation of the situation, that is, how much of the total surplus
each possible coalition of players can guarantee to itself by means of some
binding agreement. For example, in the seller-buyer situation discussed
above, the (normalized) surplus that each player can guarantee to itself is
zero, while the surplus that the {S, B} coalition can guarantee to itself is
V_B − V_S. This simplified representation is called coalitional game. For
every given coalitional game the theory tries to figure out which division
of the surplus could result, or, at least, the set of allocations that are not
excluded by strategic considerations (see, for example, Part IV of Osborne
and Rubinstein, 1994).
1.4.2
A game is static if each player moves only once and all players move
simultaneously (or at least without any information about other players'
moves). Examples of static games are Matching Pennies and Stone-Scissors-Paper. The normal form of the seller-buyer mini-game of Section 1.3, in which a strategy of B specifies an accept/reject response to each possible price, is:

S \ B    a1a2               a1r2               r1a2               r1r2
p = 1    1 − V_S, V_B − 1   1 − V_S, V_B − 1   0, 0               0, 0
p = 2    2 − V_S, V_B − 2   0, 0               2 − V_S, V_B − 2   0, 0
A strategy in a static game is just the plan to take a specific action, with
no possibility to make this choice contingent on the actions of others, as
such actions are being simultaneously chosen. Therefore, the mathematical
representation of a static game and its normal form must be the same.
For this reason, static games are also called "normal-form games", or
"strategic-form games". This is an unfortunately widespread abuse of
language. In a correct language there should not be "normal-form games";
rather, looking at the normal form of a game is a method of analysis.
1.4.3

1.5 Rational Behavior
The choice is made once and for all. (We will show in the part on dynamic
games that this model is much more general than it seems.) Assume,
just for simplicity, that D_i and Θ_i are both finite and let, for notational
convenience, Θ_i = {θ_i^1, θ_i^2, ..., θ_i^n}. In order to make a rational
choice, i has to assess the probability of the different states in Θ_i. Suppose
that his beliefs about Θ_i are represented by a probability distribution
μ_i ∈ Δ(Θ_i), where

Δ(Θ_i) = { μ_i ∈ ℝ^n_+ : Σ_{k=1}^{n} μ_i(θ_i^k) = 1 }.

The decision problem of i can be summarized by a table with one row for each feasible decision and one column for each state, where entry u_i^{jk} is the utility of decision d_i^j in state θ_i^k:

         θ_i^1      θ_i^2      ⋯
d_i^1    u_i^{11}   u_i^{12}   ⋯
d_i^2    u_i^{21}   u_i^{22}   ⋯
⋮        ⋮          ⋮          ⋱
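To make this decision rule concrete, here is a minimal numerical sketch in Python (the payoff table and the belief μ_i are invented for illustration): it computes the subjective expected utility of each decision against the belief and picks a maximizer.

```python
# Subjective expected utility maximization: a hypothetical decision problem
# with decisions d1, d2 and states t1, t2, t3 (numbers are for illustration).
utility = {
    ("d1", "t1"): 3.0, ("d1", "t2"): 0.0, ("d1", "t3"): 1.0,
    ("d2", "t1"): 1.0, ("d2", "t2"): 2.0, ("d2", "t3"): 1.0,
}
belief = {"t1": 0.5, "t2": 0.3, "t3": 0.2}  # mu_i, a probability on states

def expected_utility(decision, belief, utility):
    # E[u | decision] = sum over states of mu(state) * u(decision, state)
    return sum(p * utility[(decision, s)] for s, p in belief.items())

decisions = {d for d, _ in utility}
best = max(decisions, key=lambda d: expected_utility(d, belief, utility))
for d in sorted(decisions):
    print(d, expected_utility(d, belief, utility))
print("rational choice:", best)   # here d1, with expected utility 1.7
```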
1.6 Assumptions and Solution Concepts in Game Theory
technology and then we assume that economic agents' plans are mutually
consistent best responses to equilibrium prices.
As explained in section 1.2, there is an important difference: the
textbook analysis of competitive markets does not specify a price-formation mechanism and uses equilibrium market-clearing as a theoretical
shortcut to overcome this problem. On the contrary, in a game-theoretic model all the observable variables we try to explain depend on
players' actions (and exogenous shocks) according to an explicitly specified
function, as in auction models.
Yet, there are also similarities with the analysis of competitive markets.
One could say that the role of prices is played (sic) by players' beliefs, since
we assume that they are, in some sense, mutually consistent. The precise
meaning of the statement "beliefs are mutually consistent" is captured by
a solution concept. The simplest and most widely used solution concept
in game theory, Nash equilibrium, assumes that players' beliefs about each
other's strategies are correct and each player best responds to his beliefs; as
a result, each player uses a strategy that is a best response to the strategies
used by other players.
Nash equilibrium is not the only solution concept used in game theory.
Recent developments made clear that solution concepts implicitly capture
expressible⁶ assumptions about players' rationality and beliefs, and some
assumptions that are appropriate in some contexts are too weak, or simply
inappropriate, in different contexts. Therefore, it is very important to
provide convincing motivations for the solution concept used in a specific
application.
Let us somewhat vaguely define as "strategic thinking" what
intelligent agents do when they are fully aware of participating in an
interactive situation and form conjectures by putting themselves in the
shoes of other intelligent agents. As the title of this book suggests,
we mostly (though not exclusively) present game theory as an analysis
of strategic thinking. We will often follow the traditional textbook
game theory and present different solution concepts providing informal
motivations for them, sometimes with a lot of hand-waving. However,
you should always be aware that a more fundamental approach is being
6 We can give a mathematical meaning to the label "expressible" but, at the moment, you should just understand something that can be expressed in a clear and precise language.
1.6.1

Market inverse demand is

p = α − β(Q/n) if Q/n ≤ α/β, and p = 0 if Q/n > α/β

(for notational convenience, we express inverse demand as a function of the
average output Q/n). Each firm i has total cost function C(q) = q²/(2m),
and marginal cost function MC(q) = q/m. Each firm has obvious
preferences: it wants to maximize the difference between revenues and
costs. The technology (cost function) and preferences of each firm are
common knowledge.⁸

Now, we want to represent mathematically the assumption that each
firm is negligible with respect to the (maximum) size of the market.
Instead of assuming a large but finite set of firms given by a fine grid I in
the interval (0, 1], we use a quite standard mathematical idealization and
assume that there is a continuum of firms, normalized to the unit interval
I = (0, 1]. Thus, the average output is q̄ = ∫₀¹ q(i) di instead of (1/n)Σ_{i=1}^{n} q(i).
Each firm understands that its decision cannot affect the total output and
the market price.⁹
Given that all the above is common knowledge, we want to derive
the price implications of the following assumptions about rationality and
beliefs:

R (rationality): each firm has a conjecture about (q̄ and) p and chooses
a best response to such conjecture; for brevity, each firm is rational;¹⁰

B(R) (mutual belief in rationality): all firms believe (with certainty)
that all firms are rational;

B²(R) = B(B(R)) (mutual belief of order 2 in rationality): all firms
believe that all firms believe that all firms are rational;

...

B^k(R) = B(B^{k−1}(R)) (mutual belief of order k in rationality): [all firms
believe that]^k all firms are rational;

...

Note, we are using symbols for these assumptions. In a more formal
mathematical analysis, these symbols would correspond to events in a
space of states of the world. Here they are just useful abbreviations.
Conjunctions of assumptions will be denoted by the symbol ∩, as in
R ∩ B(R). It is a good thing that you get used to these symbols, but
if you find them baffling, just ignore them and focus on the words.
What are the consequences of rationality (R) in this market? Let p(i)
denote the price expected by firm i. A rational firm equates marginal cost
and expected price, so it supplies q(i) = m·p(i); and no expected price can
exceed the demand intercept, so p(i) ≤ α for every i. Hence

q̄ = ∫₀¹ m p(i) di = m ∫₀¹ p(i) di ≤ q^1 := mα,

where q^1 denotes the upper bound on average output when firms are
rational.

By mutual belief in rationality, each firm understands that q̄ ≤ q^1 and
p ≥ max{α − βmα, 0}. If βm ≥ 1, this is not very helpful: assuming belief
in rationality does not reduce the span of possible prices and does not yield
additional implications for the endogenous variables we are interested in. It
is not hard to understand that, in this case, rationality and mutual belief in
rationality of every order yields the same result q̄ ≤ q^1 = mα, because βm ≥ 1.

So, let's assume from now on that βm < 1. Note, this is the so-called
cobweb stability condition. Given belief in rationality, B(R), each firm
expects a price at least as high as the lower bound

p^1 := α − β q^1 = α(1 − βm) > 0.

8 The analysis can be generalized assuming heterogeneous firms, where each firm i is characterized by a marginal cost parameter m(i). Then it is enough to assume that each i knows m(i) and that the average m̄ = (1/n)Σ_{i=1}^{n} m(i) is common knowledge.

9 This example is borrowed from Guesnerie (1992).

10 The symbols R, B(R) etc. below should be read as abbreviations of sentences. They can also be formally defined using mathematics, but this goes beyond the scope of these lecture notes.
Hence each firm supplies at least m·p^1, the average output is at least

q^2 := m p^1 = mα(1 − βm),

and the price cannot exceed the upper bound

p̄^2 := α − β q^2 = α(1 − βm + (βm)²) = α Σ_{k=0}^{2} (−βm)^k.

Thus, R ∩ B(R) ∩ B²(R) implies that q̄ ≤ q^3 := m p̄^2 and the price must be above
the lower bound

p^3 := α − β q^3 = α Σ_{k=0}^{3} (−βm)^k.

Can you guess the consequences for price and average output of
assuming rationality and common belief in rationality (mutual belief in
rationality of every order)? Draw the classical Marshallian cross of demand-supply, trace the upper and lower bounds found above and go on. You will
find the answer.
More formally, define the following sequences of upper bounds and
lower bounds on the price:

p̄^L := α Σ_{k=0}^{L} (−βm)^k,  L even,

p^L := α Σ_{k=0}^{L} (−βm)^k,  L odd.

One can show by induction that R ∩ ⋂_{k=1}^{L−1} B^k(R) (rationality and mutual
belief in rationality up to order L − 1) implies that the price lies within the
bounds of order L. Since βm < 1, the sequence of upper bounds
p̄^{2ℓ} = α Σ_{k=0}^{2ℓ} (−βm)^k is decreasing in ℓ, the sequence of lower bounds
p^{2ℓ+1} = α Σ_{k=0}^{2ℓ+1} (−βm)^k is increasing in ℓ, and they both converge to
the market-clearing price

p* = α/(1 + βm).

[Figure: Marshallian demand-supply cross P(q̄) with the bounds p^1, q^1, q^2, q^3, ... spiraling toward the competitive equilibrium.]
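A quick numerical sketch of this "eductive" iteration (parameter values are illustrative, chosen so that the cobweb stability condition βm < 1 holds): starting from the order-0 bound p̄⁰ = α, it alternates output bounds and price bounds, and the partial sums α Σ_{k=0}^{L} (−βm)^k indeed converge to α/(1 + βm).

```python
# Eductive price bounds in the competitive market example.
# alpha, beta: demand intercept/slope; m: supply slope (q = m * p).
# Values are illustrative; beta*m = 0.6 < 1 (cobweb stability).
alpha, beta, m = 10.0, 0.5, 1.2

p_bound = alpha          # order-0 upper bound on the price
for L in range(1, 12):
    q_bound = m * p_bound            # best-reply bound on average output
    p_bound = alpha - beta * q_bound # demand then bounds the price
    kind = "lower" if L % 2 == 1 else "upper"
    print(f"L={L:2d} ({kind} bound on p): {p_bound:.6f}")

print("limit alpha/(1+beta*m):", alpha / (1 + beta * m))  # 6.25
```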
Part I

Static Games
Sometimes static games are also called "normal form games" or "strategic form games". As we mentioned in the Introduction, this terminology is somewhat misleading. The normal, or strategic, form of a game has the same structure as a static game, but the game itself may have a sequential structure. The normal form of a game shows the payoffs induced by any combination of plans of actions of the players. Some game theorists, including the founding fathers von Neumann and Morgenstern, argue that from a theoretical point of view all strategically relevant aspects of a game are contained in its normal form. Anyway, here by static game we specifically mean a game where players move simultaneously.
The structure ⟨I, C, (A_i)_{i∈I}, g⟩, that is, the game without the utility
functions, is called game form. The game form represents the essential
features of the rules of the game. A game is obtained by adding to the game
form a profile of utility functions (v_i)_{i∈I} = (v_1, ..., v_n), which represent
players' preferences over lotteries of consequences, according to expected
utility calculations.

From the consequence function g and the utility function v_i of player
i, we obtain a function that assigns to each action profile a = (a_j)_{j∈I} the
utility v_i(g(a)) for player i of consequence g(a). This function

u_i = v_i ∘ g : A_1 × A_2 × ... × A_n → ℝ
is called the payoff function of player i. The reason why ui is called
the payoff function is that the early work on game theory assumed that
consequences are distributions of monetary payments, or payoffs, and that
players are risk neutral, so that it is sufficient to specify, for each player i,
the monetary payoff implied by each action profile. But in modern game
theory ui (a) = vi (g(a)) is interpreted as the von Neumann-Morgenstern
utility of outcome g(a) for player i. If there are monetary consequences,
then

g = (g_1, ..., g_n) : A → ℝ^n,

where m_i = g_i(a) is the net gain of player i when a is played. Assuming
that player i is selfish, the function v_i is strictly increasing in m_i and
constant in each m_j with j ≠ i (note that selfishness is not an assumption
of game theory, it is an economic assumption that may be adopted in game-theoretic models). Then the function v_i captures the risk attitudes of player
i. For example, i is strictly risk averse if and only if v_i is strictly concave.
Games are typically represented in the reduced form

G = ⟨I, (A_i, u_i)_{i∈I}⟩
which shows only the payoff functions. We often do the same. However, it
is conceptually important to keep in mind that payoff functions are derived
from a consequence function and utility functions.
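As an illustration, here is a minimal Python sketch (with invented valuations V_S = 0, V_B = 3) that derives the payoff functions u_i = v_i ∘ g of the seller-buyer mini-game from an explicit consequence function and selfish, risk-neutral utility functions.

```python
# Seller-buyer mini-game: payoffs derived as u_i = v_i(g(a)).
# Valuations V_S, V_B are illustrative placeholders.
V = {"S": 0, "B": 3}

def g(a):
    """Consequence function: who ends up with the object, and the payment."""
    price, response = a
    if response == "a":                       # B accepts
        return {"owner": "B", "payment": price}
    return {"owner": "S", "payment": 0}       # B rejects: no trade

def v(i, c):
    """Selfish, risk-neutral utility over consequences (0 at no trade)."""
    if c["owner"] == "B":
        return c["payment"] - V["S"] if i == "S" else V["B"] - c["payment"]
    return 0

def u(i, a):
    return v(i, g(a))                         # the payoff function u_i = v_i o g

for a in [(p, r) for p in (1, 2) for r in ("a", "r")]:
    print(a, {i: u(i, a) for i in ("S", "B")})
```

The printed payoffs match the terminal payoffs of the game tree in Section 1.3.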
For simplicity, we often assume that all the sets Ai (i I) are finite;
when the sets of actions are infinite, we explicitly say so, otherwise they
are tacitly assumed to be finite. This simplifies the analysis of probabilistic
beliefs. A static game with finite action sets is called a finite static game
(or just a finite game, if static is clear from the context). When the
finiteness assumption is important to obtain some results, we will explicitly
mention that the game is finite in the formal statement of the result.
We call profile a list of objects (x_i)_{i∈I} = (x_1, ..., x_n), one object
x_i from some set X_i for each player i ∈ I. In particular, an action
profile is a list a = (a_1, ..., a_n) ∈ A := ×_{i∈I} A_i.² The payoff function
u_i numerically represents the preferences of player i among the different
action profiles a, a′, a″, ... ∈ A. The strategic interdependence is due to the
fact that the outcome function g depends on the entire action profile a and,
consequently, the utility that a generic individual i can achieve depends
not only on his choice, but also on those of other players.³ To stress how
the payoff of i depends on a variable under i's control as well as on a vector
of variables controlled by other individuals, we denote by −i = I\{i} the
set of individuals different from i, we define

A_{−i} := A_1 × ... × A_{i−1} × A_{i+1} × ... × A_n

and we write the payoff of i as a function of the two arguments a_i ∈ A_i
and a_{−i} ∈ A_{−i}, i.e. u_i : A_i × A_{−i} → ℝ.
In order to be able to reach some conclusions regarding players'
behavior in a game G, we impose two minimal assumptions (further

This specification assumes that players are risk-neutral with respect to their consumption of the public good.
This example clarifies that the outcome map a ↦ g(a) may depend on personal
features of the agents playing the game: in this case, the productivity θ_i of
each agent affects how the pair of efforts (a_1, a_2) (say, the hours worked)
is mapped to the output. Therefore, the outcome function g is not always
fully determined by the rules of the game.

Unlike the games people play for fun (and those that experimental
subjects play in most game-theoretic experiments), in many economic
games the rules of the game may not be fully known. In the above example,
for instance, it is possible that player i knows his own productivity
parameter θ_i, but does not know K or θ_{−i}. Thus, assumption [1] is
substantive: i may not know the consequence function g and hence he
may not know u_i = v_i ∘ g.
The complete information assumption that we will consider later on
(e.g. in chapter 4) is much stronger; recall from the Introduction that there
is complete information if the rules of the game and players' preferences
over (lotteries over) consequences are common knowledge. Although, as we
explained above, the rules of the game may be only partially known, there
are still many interactive situations where it is reasonable to assume that
they are not only known, but indeed commonly known. Yet, assuming
common knowledge of players' preferences is often far-fetched. Thus,
complete information should be thought of as an idealization that simplifies
the analysis of strategic thinking. Chapter 7 will introduce the formal tools
necessary to model the absence of complete information.
3.1

1 In general, we use the word "conjecture" to refer to a player's belief about variables that affect his payoff and are beyond his control, such as the actions of other players in a static game.
Δ(X). If X is finite,²

Δ(X) := { μ ∈ ℝ^X_+ : Σ_{x∈X} μ(x) = 1 }.

The definition above is the simplest one for finite domains. But there is
an alternative, equivalent definition that can be more easily generalized
to infinite domains. Since sometimes we will consider probability measures
on infinite domains, we present here also the alternative. First consider the
definition above for a finite X, and fix μ ∈ Δ(X). For each event E ⊆ X,
μ determines the probability of E as follows:

μ(E) = Σ_{x∈E} μ(x).

The set function so defined is countably additive: if (E_k)_{k=1}^∞ is a sequence of
pairwise disjoint events and ⋃_{k=1}^∞ E_k ⊆ X, then

μ( ⋃_{k=1}^∞ E_k ) = Σ_{k=1}^∞ μ(E_k).

2 Recall that ℝ^X_+ is the set of non-negative real-valued functions defined over the domain X. If X is the finite set X = {x_1, ..., x_n}, ℝ^X_+ is isomorphic to ℝ^n_+, the positive orthant of the Euclidean space ℝ^n.
Σ_{c∈C} μ(c)v(c) ≥ Σ_{c∈C} μ′(c)v(c)

(of course, the second equality holds when X is finite). With this, we
obtain a preference relation on Δ(X): for all λ, λ′ ∈ Δ(X),

λ ≿ λ′ ⟺ Σ_{c∈C} f(λ)(c)v(c) ≥ Σ_{c∈C} f(λ′)(c)v(c).

3 If X is a closed and measurable subset of ℝ^m, we define the support of a probability measure μ ∈ Δ(X) as the intersection of all closed sets C such that μ(C) = 1. If X is finite, this definition is equivalent to the previous one.
Since g(λ)(c) = λ(g⁻¹(c)) and we have defined the payoff function
u_i := v_i ∘ g, for each λ ∈ Δ(A) we have

Σ_{c∈C} g(λ)(c)v_i(c) = Σ_{c∈C} Σ_{a∈g⁻¹(c)} λ(a)v_i(g(a)) = Σ_{a∈A} λ(a)u_i(a).
3.2 Conjectures

The reason why one needs to introduce preferences over lotteries and
expected payoffs is that individual i cannot observe other individuals'
actions (a_{−i}) before making his own choice. Hence, he needs to form
a conjecture about such actions. If i were certain of the other players'
choices, then one could represent i's conjecture simply with an action
profile a_{−i} ∈ A_{−i}. However, in general, i might be uncertain about other
players' actions and assign a strictly positive (subjective) probability to
several profiles a_{−i}, a′_{−i}, etc.

Definition 2. A conjecture of player i is a (subjective) probability
measure μ^i ∈ Δ(A_{−i}). A deterministic conjecture is a probability
measure μ^i ∈ Δ(A_{−i}) that assigns probability one to a particular action
profile a_{−i} ∈ A_{−i}.

Note, we call "conjecture" a (probabilistic) belief about the behavior
of other players, while we use the term "belief" to refer to a more general
type of uncertainty.
Given that −i has only two actions, we can represent the conjecture

[Figure: conjectures μ^i ∈ Δ({ℓ, r}) depicted via the probabilities μ^i(ℓ) and μ^i(r).]
3.2.1 Mixed actions

It is conceivable that a player makes his choice by means of a random device,
such as spinning a roulette wheel, or tossing a coin. In other words, a player
could simply choose the probability of playing any given action.

Definition 3. A random choice by player i, also called a mixed action,
is a probability measure α_i ∈ Δ(A_i). An action a_i ∈ A_i is also called a pure
action.

Remark 2. The set of pure actions can be regarded as a subset of the set
of mixed actions, i.e. A_i ⊆ Δ(A_i).
It is assumed that (according to i's beliefs) the random draw of an action
of i is stochastically independent of the other players' actions. For example,
the following situation is excluded: i chooses his action according to the
(random) weather and he thinks his opponents are doing the same, so that
there is correlation between a_i and a_{−i} even though there is no causal link
between a_i and a_{−i} (this type of correlation will be discussed in section
5.2.2 of Chapter 5). More importantly, player i knows that moves are
simultaneous and therefore by changing his actions he cannot cause any
change in the probability distribution of opponents' actions. Hence, if
player i has conjecture μ^i and chooses the mixed action α_i, the subjective
probability of each action profile (a_i, a_{−i}) is α_i(a_i)μ^i(a_{−i}) and i's expected
payoff is

u_i(α_i, μ^i) := Σ_{a_i∈A_i} Σ_{a_{−i}∈A_{−i}} α_i(a_i)μ^i(a_{−i})u_i(a_i, a_{−i}) = Σ_{a_i∈A_i} α_i(a_i)u_i(a_i, μ^i).
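A small sketch of this computation in Python (the 2×2 payoffs, mixed action and conjecture are invented for illustration): it evaluates u_i(α_i, μ^i) both as a double sum over profiles and via the pure-action expected payoffs u_i(a_i, μ^i), and checks that the two agree.

```python
# Expected payoff of a mixed action alpha_i under a conjecture mu_i.
# Payoffs of player i in a hypothetical 2x2 game: rows = own actions,
# columns = opponent's actions 'l' and 'r'.
u = {("T", "l"): 3, ("T", "r"): 0,
     ("B", "l"): 1, ("B", "r"): 2}
alpha = {"T": 0.25, "B": 0.75}   # i's mixed action
mu = {"l": 1/3, "r": 2/3}        # i's conjecture about the opponent

# Double sum over profiles (independence: prob = alpha(a_i) * mu(a_-i)).
double_sum = sum(alpha[ai] * mu[aj] * u[(ai, aj)]
                 for ai in alpha for aj in mu)

# Equivalent computation via pure-action expected payoffs u_i(a_i, mu).
u_pure = {ai: sum(mu[aj] * u[(ai, aj)] for aj in mu) for ai in alpha}
weighted = sum(alpha[ai] * u_pure[ai] for ai in alpha)

print(u_pure)                  # u_i(T, mu) = 1, u_i(B, mu) = 5/3
print(double_sum, weighted)    # the two coincide (= 1.5)
```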
If the opponent has only two feasible actions, it is possible to use a graph to
represent the lotteries corresponding to pure and mixed actions. For each
action a_i, we consider a corresponding point in the Cartesian plane with
coordinates given by the utilities that i obtains for each of the two actions
of the opponent. If the actions of the opponent are ℓ and r, we denote such
coordinates x = u_i(·, ℓ) and y = u_i(·, r). Any pure action a_i corresponds
to the vector (x, y) = (u_i(a_i, ℓ), u_i(a_i, r)) (a row in the payoff matrix of
i). The same holds for the mixed actions: α_i corresponds to the vector
(x, y) = (u_i(α_i, ℓ), u_i(α_i, r)). The set of points (vectors) corresponding to
the mixed actions is simply the convex hull of the points corresponding to
the pure actions.⁴ Figure 3.3 represents such a set for matrix 1.

4 The convex hull of a set of points X ⊆ ℝ^k is the smallest convex set containing X, that is, the intersection of all the convex sets containing X.
[Figure 3.3: the set of expected payoff vectors (u_i(α_i, ℓ), u_i(α_i, r)) for matrix 1, the convex hull of the pure-action payoff points.]
Allowing for mixed actions, the set of feasible (expected) payoff vectors
is a convex polyhedron (as in Figure 2). To see this, note that if α_i and β_i
are mixed actions, then p·α_i + (1 − p)·β_i is also a mixed action, where
p·α_i + (1 − p)·β_i is the function that assigns to each pure action a_i the
weight p·α_i(a_i) + (1 − p)·β_i(a_i). Thus, all the convex combinations of feasible
payoff vectors are feasible payoff vectors, once we allow for randomization.

However, the idea that individuals use coins or roulette wheels to
randomize their choices may seem weird and unrealistic. Furthermore,
as can be gathered from Figure 3.3, for any conjecture μ^i and any mixed
action α_i, there is always a pure action a_i that yields the same or a higher
expected payoff than α_i (check all possible slopes of the iso-expected-payoff
lines and verify how the set of optimal points looks).⁵ Hence, a player
cannot be strictly better off by choosing a mixed action rather than a pure
action.

The point of view we adopt in these lecture notes is that players never
randomize (although their choices might depend on extrinsic, payoff-irrelevant signals, as in Chapter 5, Section 5.2.2). Nonetheless, it will
be shown that in order to evaluate the rationality of a given pure
action it is analytically convenient to introduce mixed actions. (In
Chapter 5, we discuss interpretations of mixed actions that do not involve
randomization.)
3.3

The set of pure actions that are best replies to conjecture μ^i is denoted by⁶

r_i(μ^i) := A_i ∩ arg max_{σ_i∈Δ(A_i)} u_i(σ_i, μ^i).
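Here is a minimal sketch computing r_i(μ^i) by enumeration for the invented 2×2 payoffs used above; since a mixed action can never do strictly better than every pure action, it suffices to compare the pure-action expected payoffs u_i(a_i, μ^i).

```python
# Pure best replies to a conjecture mu_i, by direct enumeration.
u = {("T", "l"): 3, ("T", "r"): 0,
     ("B", "l"): 1, ("B", "r"): 2}

def best_replies(u, mu):
    actions = {ai for ai, _ in u}
    # Expected payoff of each pure action against the conjecture.
    eu = {ai: sum(p * u[(ai, aj)] for aj, p in mu.items()) for ai in actions}
    m = max(eu.values())
    return {ai for ai, v in eu.items() if v == m}, eu

print(best_replies(u, {"l": 0.5, "r": 0.5}))   # both T and B are best replies
print(best_replies(u, {"l": 0.2, "r": 0.8}))   # B is the unique best reply
```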
Proof. This is a rather trivial result. But since this is the first proof
of these notes we will go over it rather slowly.

(Only if) We first show that if Supp α_i is not included in r_i(μ^i),
then α_i is not a best reply to μ^i. Let ā_i be a pure action such that
α_i(ā_i) > 0 and assume that, for some σ_i, u_i(σ_i, μ^i) > u_i(ā_i, μ^i), so
that ā_i ∈ Supp α_i \ r_i(μ^i). Since u_i(σ_i, μ^i) is a weighted average of
the values {u_i(a_i, μ^i)}_{a_i∈A_i}, there must be a pure action a′_i such that
u_i(a′_i, μ^i) > u_i(ā_i, μ^i). But then we can construct a mixed action α′_i that
moves the probability weight of ā_i onto a′_i:

α′_i(a_i) = 0, if a_i = ā_i,
α′_i(a_i) = α_i(a′_i) + α_i(ā_i), if a_i = a′_i,
α′_i(a_i) = α_i(a_i), if a_i ≠ ā_i, a′_i.

α′_i is a mixed action since Σ_{a_i∈A_i} α′_i(a_i) = Σ_{a_i∈A_i} α_i(a_i) = 1. Moreover,
it can be easily checked that

u_i(α′_i, μ^i) − u_i(α_i, μ^i) = α_i(ā_i)[u_i(a′_i, μ^i) − u_i(ā_i, μ^i)] > 0,

where the inequality holds by assumption. Thus α_i is not a best reply to
μ^i.
(If) Next we show that if each pure action in the support of α_i is a
best reply, then α_i is also a best reply. It is convenient to introduce the
following notation:

ū_i(μ^i) := max_{σ_i∈Δ(A_i)} u_i(σ_i, μ^i).

By definition of r_i(μ^i),

∀a_i ∈ r_i(μ^i), ū_i(μ^i) = u_i(a_i, μ^i),   (3.3.1)

∀a_i ∈ A_i, ū_i(μ^i) ≥ u_i(a_i, μ^i).   (3.3.2)

Then

u_i(α_i, μ^i) = Σ_{a_i∈A_i} α_i(a_i)u_i(a_i, μ^i) = Σ_{a_i∈r_i(μ^i)} α_i(a_i)u_i(a_i, μ^i)
= ū_i(μ^i) Σ_{a_i∈r_i(μ^i)} α_i(a_i) = ū_i(μ^i) Σ_{a_i∈A_i} α_i(a_i) = ū_i(μ^i).

The first equality holds by definition, the second follows from Supp α_i ⊆
r_i(μ^i), the third holds by eq. (3.3.1), the fourth and fifth are obvious and
the inequality follows from (3.3.2). ∎

In Matrix 1, for example, any mixed action that assigns positive
probability only to a and/or b is a best reply to the conjecture (1/2, 1/2).
Clearly, the set of pure best replies to (1/2, 1/2) is r_i((1/2, 1/2)) = {a, b}.
39
Note that if at least one pure action is a best reply among all pure and
mixed actions, then the maximum that can be attained by constraining
the choice to pure actions is necessarily equal to what could be attained
choosing among (pure and) mixed actions, i.e.
ri (i ) 6=
i (Ai )
i (Ai )
ai Ai
12 As Figure 2 suggests, a similar result relates undominated mixed actions and mixed best replies: the two sets are the same and correspond to the North-East boundary of the feasible set. But since we take the perspective that players do not actually randomize, we are only interested in characterizing justifiable pure actions.
u_i(a) = k( Σ_{j=1}^{n} a_j ) − a_i.

And risk-neutral.
and recall that k < 1. The profile of dominant actions (0, ..., 0) is Pareto-dominated by any symmetric profile of positive contributions (ε, ..., ε) (with
ε > 0). Indeed, u_i(0, ..., 0) = 0 < (nk − 1)ε = u_i(ε, ..., ε), where the
inequality holds because k > 1/n. Let S(a) = Σ_{i=1}^{n} u_i(a) be the social
surplus; the surplus-maximizing profile is â_i = W_i for each i.
An action could be a best reply only to conjectures that assign zero
probability to some action profiles of the other players. For instance, action
e in matrix 1 is justifiable as a best reply only if i is certain that −i does
not choose r. Let us say that a player i is cautious if his conjecture assigns
positive probability to every a_{−i} ∈ A_{−i}. Let

Δᵒ(A_{−i}) := { μ^i ∈ Δ(A_{−i}) : ∀a_{−i} ∈ A_{−i}, μ^i(a_{−i}) > 0 }

be the set of such conjectures.¹⁴ A rational and cautious player chooses
actions in r_i(Δᵒ(A_{−i})). These considerations motivate the following
definition and result.
Definition 7. A mixed action σ_i weakly dominates another (pure or)
mixed action σ′_i if it yields at least the same expected payoff for every action
profile a_{−i} of the other players and strictly more for at least one ā_{−i}:

∀a_{−i} ∈ A_{−i}, u_i(σ_i, a_{−i}) ≥ u_i(σ′_i, a_{−i}),

∃ā_{−i} ∈ A_{−i}, u_i(σ_i, ā_{−i}) > u_i(σ′_i, ā_{−i}).

The set of pure actions that are not weakly dominated by any mixed action
is denoted by NWD_i.

Lemma 4. (See Pearce [29]) Fix an action a_i ∈ A_i in a finite game.
There exists a conjecture μ^i ∈ Δᵒ(A_{−i}) such that a_i is a best reply to μ^i if
and only if a_i is not weakly dominated by any pure or mixed action:

NWD_i = r_i(Δᵒ(A_{−i})).

14 Δᵒ(A_{−i}) is the relative interior of Δ(A_{−i}) (the superscript o should remind you that this is a relatively open set, i.e. the intersection between the simplex Δ(A_{−i}) ⊆ ℝ^{A_{−i}} and the strictly positive orthant ℝ^{A_{−i}}_{++}, an open set in ℝ^{A_{−i}}).
u_i(a_i, a_{−i}) =
  v_i − max a_{−i},  if a_i > max a_{−i},
  0,  if a_i < max a_{−i},
  (v_i − max a_{−i}) / (1 + |arg max a_{−i}|),  if a_i = max a_{−i},

where max a_{−i} = max_{j≠i}{a_1, ..., a_{i−1}, a_{i+1}, ..., a_n} (recall that, for every
finite set X, |X| is the number of elements of X, or cardinality of X). It
turns out that offering a_i = v_i is the weakly dominant action (can you
prove it?). Hence, if the potential buyer i is rational and cautious, he will
offer exactly v_i. Doing so, he will expect to make some profits. In fact,
being cautious, he will assign a positive probability to the event [max a_{−i} < v_i].
Since by offering v_i he will obtain the object only in the event that the
price paid is lower than his valuation v_i, the expected payoff from this offer
is strictly positive.
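The claim can be spot-checked numerically. Here is a sketch with a discretized bid grid and enumeration of opponent profiles (the grid, valuation and number of opponents are invented): it verifies that no bid ever beats v_i against any opponent profile, and that v_i does strictly better against some.

```python
# Second-price auction: check that bidding one's valuation v_i weakly
# dominates any other bid, on a small illustrative grid.
from itertools import product

v_i = 3             # bidder i's valuation (illustrative)
bids = range(0, 6)  # discrete bid grid
n_others = 2        # two opponents

def u(a_i, others):
    m = max(others)
    if a_i > m:
        return v_i - m
    if a_i < m:
        return 0.0
    ties = sum(1 for b in others if b == m)
    return (v_i - m) / (1 + ties)   # random tie-breaking among highest bids

profiles = list(product(bids, repeat=n_others))
for b in bids:
    if b == v_i:
        continue
    weakly_worse = all(u(b, o) <= u(v_i, o) for o in profiles)
    strictly_somewhere = any(u(b, o) < u(v_i, o) for o in profiles)
    print(f"bid {b}: v_i weakly dominates it: {weakly_worse and strictly_somewhere}")
```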
3.3.1

15 If a buyer were to take into account a potential future resale, things might be rather different. In fact, other potential buyers could hold some relevant information that affects the estimate of how much the artwork could be worth in the future. Similarly, if there were doubts regarding the authenticity of the work, it would be relevant to know the other buyers' valuations.

16 See Vickrey [34].

17 A function f : X → ℝ with convex domain is strictly quasi-concave if for every pair of distinct points x′, x″ ∈ X (x′ ≠ x″) and every t ∈ (0, 1),

f(tx′ + (1 − t)x″) > min{f(x′), f(x″)}.
Moulin, 1984):

Definition 9. A static game G = ⟨I, (A_i, u_i)_{i∈I}⟩ is nice if, for every
player i ∈ I, A_i is a compact interval [a_i, ā_i] ⊆ ℝ, the payoff function
u_i : A → ℝ is continuous (that is, jointly continuous in all its arguments),
and for each a_{−i} ∈ A_{−i}, the function u_i(·, a_{−i}) : [a_i, ā_i] → ℝ is strictly
quasi-concave.

Remark 6. Let A_i = [a_i, ā_i] ⊆ ℝ and fix a_{−i} ∈ A_{−i}. The function
u_i(·, a_{−i}) : [a_i, ā_i] → ℝ is strictly quasi-concave if and only if the following
two conditions hold: (1) there is a unique best reply r_i(a_{−i}) ∈ [a_i, ā_i] and
(2) u_i(·, a_{−i}) is strictly increasing on [a_i, r_i(a_{−i})] and strictly decreasing on
[r_i(a_{−i}), ā_i].
Example 4. Consider Cournot's oligopoly game with capacity constraints:
I is the set of firms, a_i ∈ [0, ā_i] is the output of firm i and ā_i is its capacity.
The payoff function of firm i is

u_i(a_1, ..., a_n) = a_i P( a_i + Σ_{j≠i} a_j ) − C_i(a_i),

where P : [0, Σ_i ā_i] → ℝ_+ is the downward-sloping inverse demand
function and C_i : [0, ā_i] → ℝ_+ is i's cost function. The Cournot game
is nice if P(·) and each cost function C_i(·) are continuous and if, for each
i and a_{−i}, P(a_i + Σ_{j≠i} a_j)a_i − C_i(a_i) is strictly quasi-concave in a_i. The latter
condition is easy to satisfy. For example, if each C_i is strictly increasing
and convex, and P(Σ_{j∈I} a_j) = max{0, p̄ − Σ_{j∈I} a_j}, then strict quasi-concavity holds (prove this as an exercise).¹⁸
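A small numerical sketch of this example (linear inverse demand P(Q) = max{0, p̄ − Q} with p̄ = 10, quadratic cost C_i(q) = q², and the capacity are all invented): it profiles u_i along the own action for a fixed total rival output and finds the unique peak that Remark 6 guarantees.

```python
# Cournot profit along the own output, for fixed rivals' total output.
# Linear inverse demand and quadratic cost are illustrative choices.
p_bar, capacity = 10.0, 8.0

def P(Q):                      # inverse demand
    return max(0.0, p_bar - Q)

def C(q):                      # strictly increasing, convex cost
    return q * q

def u(q_i, q_others):          # firm i's profit
    return q_i * P(q_i + q_others) - C(q_i)

def best_reply(q_others, grid=10001):
    # Strict quasi-concavity => a unique maximizer; a grid search finds it.
    qs = [capacity * k / (grid - 1) for k in range(grid)]
    return max(qs, key=lambda q: u(q, q_others))

for q_others in (0.0, 2.0, 5.0, 12.0):
    print(q_others, round(best_reply(q_others), 3))
# Interior first-order condition: p_bar - q_others - 4*q = 0,
# so r(q_others) = (p_bar - q_others)/4, with a corner at 0.
```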
Nice games allow an easy computation of the sets of justifiable actions
and of dominated actions. We are going to compare best replies to
deterministic conjectures, uncorrelated conjectures, correlated conjectures,
pure actions not dominated by mixed actions, and pure actions not
dominated by other pure actions. First note that, by definition,

Δ^I(A_{−i}) := { μ^i ∈ Δ(A_{−i}) : ∃(μ^{ij})_{j∈I\{i}} ∈ ×_{j∈I\{i}} Δ(A_j), μ^i = ×_{j≠i} μ^{ij} }

is the set of product probability measures on A_{−i}, that is, the set of beliefs
that satisfy independence across opponents. In words, the set of best
replies to deterministic conjectures in A_{−i} is contained in the set of best
replies to probabilistic independent conjectures, which is contained in the
set of best replies to probabilistic (possibly correlated) conjectures, which
is contained in the set of actions not dominated by mixed actions, which
is contained in the set of (pure) actions not dominated by other pure
actions. In each case the inclusion may hold as an equality. The following
very convenient result states that in nice games all these (weak) inclusions
indeed hold as equalities (the proof is left as a rather hard exercise):

Lemma 5. In every nice game, the set of best replies of each player i to
deterministic conjectures coincides with the set of actions not dominated
by other pure actions, and therefore it also coincides with the set of best
replies to independent or correlated probabilistic conjectures, and with the
set of actions not dominated by mixed actions:

r_i(A_{−i}) = r_i(Δ^I(A_{−i})) = r_i(Δ(A_{−i})) = ND_i
= {a_i ∈ A_i : ∄b_i ∈ A_i such that ∀a_{−i} ∈ A_{−i}, u_i(b_i, a_{−i}) > u_i(a_i, a_{−i})}.
As a preliminary step for the proof of Lemma 5, it is necessary to prove
a technical result, i.e., that in nice games (more generally, in games
with compact and convex action sets and continuous payoff functions that
are strictly quasi-concave in the own action) best replies to deterministic
conjectures define continuous best reply functions.¹⁹

19 If you do not know how to deal with probability measures on infinite sets, just consider the simple ones, i.e., those that assign positive probability to a finite set of points. The set of simple probability measures on an infinite domain X is a subset of Δ(X), but under the assumptions considered here it is possible to restrict one's attention to this smaller set.
4 Rationalizability and Iterated Dominance

The analysis, so far, has been based on a set of minimal epistemic
assumptions:¹ every player i knows the sets of possible actions A_1, A_2, ...,
A_n and his own payoff function u_i : A_i × A_{−i} → ℝ. From this analysis we
derived a basic, decision-theoretic principle of rationality: a rational player
should not choose those actions that are dominated by mixed actions. This
principle of rationality can be interpreted in a descriptive way (assuming
that a rational player will not choose dominated actions), or in a normative
way (a player should not choose dominated actions).

The concept of dominance is sufficient to obtain interesting results
in some interactive situations: those where it is not necessary to guess
the other players' actions in order to make a correct decision. Simple
social dilemmas, like the Prisoner's Dilemma or contributing to a public
good, have this feature. But when we analyze strategic reasoning, the
decision-theoretic rationality principle is just a starting point: thinking
strategically means trying to anticipate the co-players' moves and plan a
best response, taking into account that the co-players, too, are intelligent
individuals trying to do just the same.

Strategic thinking is based on knowledge of the rules of the game
4.1
The analysis will proceed in a series of steps. After introducing the concept
of rationality and its behavioral consequences, we will characterize which
actions are consistent not only with rationality, but also with the belief
that everybody is rational. Then we will characterize which actions are
consistent not only with the previous assumptions, but also with the
further assumption that each player believes that each other player believes
that everybody is rational; and so on. Thus, at each further step we
characterize more restrictive assumptions about players' rationality and
beliefs.

Such assumptions can be formally represented as events, i.e. as subsets
of a space of states of the world where every state is a
conceivable configuration of actions, conjectures and beliefs concerning
other players' beliefs. This formal representation provides an expressive
and precise language that allows one to make the analysis more rigorous and

2 Recall that what matters are players' preferences over lotteries of consequences, hence also their risk attitudes. But in some cases (for instance, some oligopolistic models) risk attitudes are irrelevant for the strategic analysis and risk neutrality can therefore be assumed with no loss of generality.
4.2

4 In mathematics, the term "operator" is mostly used for mappings from a space of functions to a space of functions (possibly the same). But sets can always be represented by functions (e.g. indicator functions). Therefore the present use of the term "operator" is consistent with standard mathematical language.

5 For infinite games, we have to specify the collections of events A_i ⊆ 2^{A_i} (i ∈ I) and A ⊆ 2^A, where ×_{i∈I} A_i ⊆ A, and we replace 2^{A_i} (respectively, 2^A) with A_i (respectively, A). For example, if A_i is a subset of a Euclidean space (more generally, a metric space), then A_i = B(A_i) (respectively, A = B(A)) is the collection of Borel subsets of A_i (respectively, A). Standard regularity assumptions, such as compactness and continuity, ensure that ρ(C) is an event, hence ρ(C) ∈ C, for each C ∈ C.
3, 2   0, 1   0, 0
0, 2   3, 1   0, 0
1, 1   1, 2   3, 0
4.3
The set of action profiles consistent with R (rationality of all the players)
is ρ(A). Therefore, if every player is rational and believes R, only those
action profiles that are rationalized by ρ(A) can be chosen. It follows that
the set of action profiles consistent with R ∩ B(R) is ρ(ρ(A)) = ρ²(A).
Iterating the procedure one more step, it is relatively easy to see that the
set of action profiles consistent with R ∩ B(R) ∩ B²(R) is ρ(ρ²(A)) = ρ³(A).
The general relationship between events about rationality and beliefs and
the corresponding sets of action profiles is given in Table 4.1.

Note that, as one should expect, the sequence of subsets (ρ^k(A))_{k∈ℕ}
is weakly decreasing, i.e. ρ^{k+1}(A) ⊆ ρ^k(A) (k = 1, 2, ...). This fact can
be easily derived from the monotonicity of the rationalization operator:
by definition, ρ¹(A) = ρ(A) ⊆ A = ρ⁰(A); if ρ^k(A) ⊆ ρ^{k−1}(A), the
monotonicity of ρ implies ρ^{k+1}(A) = ρ(ρ^k(A)) ⊆ ρ(ρ^{k−1}(A)) = ρ^k(A). By
the induction principle ρ^{k+1}(A) ⊆ ρ^k(A) for every k. Every monotonically
decreasing sequence of subsets has a well defined limit: the intersection of
Table 4.1:

Assumptions                        Behavioral implications
R                                  ρ(A)
R ∩ B(R)                           ρ²(A)
R ∩ B(R) ∩ B²(R)                   ρ³(A)
...                                ...
R ∩ ⋂_{k=1}^{K} B^k(R)             ρ^{K+1}(A)
...                                ...
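Since for finite games ρ coincides with one round of elimination of actions strictly dominated by mixed actions (by the Pearce-type equivalence used throughout this chapter), the sets ρ^k(A) can be computed mechanically. A sketch for two-player games, using a small linear program for the dominance test, run on the 3×3 game displayed in the previous section:

```python
# Iterated elimination of strictly dominated actions (by mixed actions),
# which for finite games computes the iterations of rho.
import numpy as np
from scipy.optimize import linprog

# U1[i, j], U2[i, j]: payoffs for row i (player 1) and column j (player 2).
U1 = np.array([[3, 0, 0], [0, 3, 0], [1, 1, 3]], dtype=float)
U2 = np.array([[2, 1, 0], [2, 1, 0], [1, 2, 0]], dtype=float)

def strictly_dominated(U, a):
    """Is row a of U strictly dominated by a mixed action over the rows?"""
    n, m = U.shape
    # Variables (sigma, eps): maximize eps s.t. sigma @ U[:, j] >= U[a, j] + eps.
    c = np.zeros(n + 1); c[-1] = -1.0
    A_ub = np.hstack([-U.T, np.ones((m, 1))])
    b_ub = -U[a, :]
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(None, None)])
    return res.success and -res.fun > 1e-9   # eps > 0 at the optimum

R, C = list(range(3)), list(range(3))
while True:
    dr = [a for a in R if strictly_dominated(U1[np.ix_(R, C)], R.index(a))]
    dc = [b for b in C if strictly_dominated(U2[np.ix_(R, C)].T, C.index(b))]
    if not dr and not dc:
        break
    R = [a for a in R if a not in dr]
    C = [b for b in C if b not in dc]

print("rationalizable rows:", R, "columns:", C)   # -> [0] and [0]
```

Note that the second round genuinely needs mixed actions: the third row survives pure-action dominance but is strictly dominated by the 50/50 mixture of the first two rows once the third column is gone.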
4, 3   0, 2   0, 0
0, 1   3, 4   0, 0
1, 1   1, 2   5, 0

Example 7. Here is a story for the payoff matrix above: Rowena and Colin
have to decide independently of each other whether to go to a Bach
Proof. (a) If A_i is finite, for every conjecture μ^i the set of best replies
r_i(μ^i) is non-empty. Then for each non-empty Cartesian set C ⊆ A, ρ(C)
is non-empty (for every i, there exists at least a conjecture μ^i ∈ Δ(C_{−i}),
with ∅ ≠ r_i(μ^i) ⊆ ρ_i(C_{−i})). Thus, ρ^k(A) ≠ ∅ implies ρ^{k+1}(A) ≠ ∅. Since
A ≠ ∅, it follows by induction that ρ^k(A) ≠ ∅ for every k.

The sequence of subsets (ρ^k(A))_{k∈ℕ} is weakly decreasing. Also, if
ρ^k(A) = ρ^{k+1}(A), then ρ^k(A) = ρ^ℓ(A) for every ℓ ≥ k. Given that A is
finite, the inclusion ρ^{k+1}(A) ⊆ ρ^k(A) can be strict only for a finite number
of steps K (in particular, K ≤ Σ_{i∈I}|A_i| − n, where |A_i| denotes the number
of elements of A_i: when ρ^{k+1}(A) ⊊ ρ^k(A), at least one action for at least
one player i is eliminated, but at least one action for each player is never
eliminated). All the above implies that ρ^{K+1}(A) = ρ^K(A) = ρ^∞(A) ≠ ∅.

(b) (This part contains more advanced mathematical arguments and
can be omitted.) Since u_i : A_i × A_{−i} → ℝ is continuous, the expected
utility function u_i(a_i, μ^i) is also continuous in both arguments
(Δ(A_{−i}) is assumed to be endowed with the topology of weak convergence
of measures).⁹ Given that A_i is compact, the maximum principle implies
that the best reply correspondence r_i : Δ(A_{−i}) ⇒ A_i is upper hemicontinuous (u.h.c.) with non-empty values. For every non-empty and
compact C_{−i} ⊆ A_{−i}, the set Δ(C_{−i}) is non-empty and compact. It follows
that its image through the u.h.c. correspondence r_i is also non-empty and
compact. Therefore, for any non-empty and compact product set C ∈ C,
ρ(C) = r_1(Δ(C_{−1})) × ... × r_n(Δ(C_{−n})) is a non-empty, compact set. It
follows by induction that (ρ^k(A))_{k∈ℕ} is a weakly decreasing sequence of
non-empty and compact sets. Hence, for every K = 1, 2, ..., the intersection
⋂_{k=1}^{K} ρ^k(A) = ρ^K(A) is non-empty and compact. By the finite intersection
property,¹⁰ the intersection

ρ^∞(A) = ⋂_{k≥1} ρ^k(A)

is non-empty and compact; moreover,

μ^i( ⋂_{k≥1} ρ^k_{−i}(A) ) = lim_{k→∞} μ^i( ρ^k_{−i}(A) ) = 1

whenever μ^i(ρ^k_{−i}(A)) = 1 for every k. ∎

9 Let C ⊆ ℝ^m be a compact set and consider the probability measures defined over the Borel subsets of C. A sequence of probability measures (μ_k) converges weakly to a measure μ if, for every continuous (hence bounded) function f : C → ℝ, lim_k ∫ f dμ_k = ∫ f dμ.

10 Any collection of compact subsets {C^k : k ∈ Q} has the following property (called the finite intersection property): if every finite sub-collection {C^k : k ∈ F} (F ⊆ Q finite) has non-empty intersection (⋂_{k∈F} C^k ≠ ∅), then the whole collection also has non-empty intersection (⋂_{k∈Q} C^k ≠ ∅). Moreover, C := ⋂_{k∈Q} C^k is an intersection of closed sets and hence it is closed. Since C is a closed set contained in a compact set (C ⊆ C^k for every k ∈ Q), C must also be compact.
rationalizable.

Basis step. Since C ⊆ ρ(C), C ⊆ A and ρ is monotone, it follows that
C ⊆ ρ¹(C) ⊆ ρ¹(A).

Inductive step. Suppose that C ⊆ ρ^k(C) ⊆ ρ^k(A). By monotonicity,

ρ(C) ⊆ ρ(ρ^k(C)) ⊆ ρ(ρ^k(A)).

Since C ⊆ ρ(C) and ρ(ρ^k(·)) = ρ^{k+1}(·), we obtain

C ⊆ ρ^{k+1}(C) ⊆ ρ^{k+1}(A). ∎

Remark 11. The proof of Theorem 2 clarifies that, under the assumptions
of Theorem 1, ρ^∞(A) has the best reply property, and for every C ∈ C with
the best reply property, C ⊆ ρ^∞(A); therefore ρ^∞(A) is the largest set with
the best reply property.¹²

We gave a definition of rationalizability based on iterations of the
rationalization operator because it is the most intuitive. An alternative

12 More precisely, ρ^∞(A) is the maximal set with the best reply property with respect to the partial order ⊆ on C.
4.4 Iterated Dominance
Consider the following game, where A_1 = {0, 1} and A_2 = {1 − 1/k : k ∈ ℕ} ∪ {1}:

1\2    1 − 1/k (k ∈ ℕ)    1
0      1, 1 − 1/k         1, 0
1      0, 0               0, 1

Thus, u_2(1 − 1/k, μ²) = μ²(0)(1 − 1/k) and u_2(1, μ²) = μ²(1) = 1 − μ²(0). If
μ²(1) ≥ 1/2 the best reply is a_2 = 1. If μ²(1) < 1/2, then u_2(1 − 1/k, μ²) >
u_2(1, μ²) for k sufficiently large, but there is no best reply because
u_2(1 − 1/k, μ²) is strictly increasing in k. Note also that a_1 = 0 strictly
dominates a_1 = 1. Clearly ND(A) = ND^∞(A) = {0} × {1} and ρ_2({0}) = ∅,
ρ_2({0, 1}) = {1}. Therefore ρ²(A) = ∅ ≠ {0} × {1} = ND²(A). How
is compactness-continuity violated? A_1 is finite, hence compact; A_2 is a
closed and bounded subset of the real line (A_2 contains the limit of the
sequence (1 − 1/k)_{k=1}^∞), hence it is also compact; but u_2 is discontinuous at
(0, 1): lim_{k→∞} u_2(0, 1 − 1/k) = 1 ≠ 0 = u_2(0, 1).
Theorem 3 follows quite easily from Lemma 7: indeed, Lemma 2 implies
that the iteratively undominated actions coincide with the iteratively
justifiable ones, and by Lemma 7 the iteratively justifiable actions coincide
with the rationalizable actions. The details are as follows:

Proof of Theorem 3.
Basis step. Lemma 2 implies that ρ(A) = ND(A).

Inductive step. Assume that ρ^{k−1}(A) = ND^{k−1}(A) (inductive
hypothesis) and consider the game G^{k−1} where the set of actions of
each player i is ρ_i^{k−1}(A), and the payoff functions are obtained from
the original game by restricting their domain to ρ^{k−1}(A). The inductive
hypothesis implies that the set of undominated action profiles in G^{k−1}
is ND(ρ^{k−1}(A)) = ND(ND^{k−1}(A)) = ND^k(A). By Lemma 2,
ND(ND^{k−1}(A)) = ρ(ND^{k−1}(A)). The inductive hypothesis and Lemma
7 yield ρ(ND^{k−1}(A)) = ρ(ρ^{k−1}(A)) = ρ^k(A). Hence ND^k(A) = ρ^k(A). ∎
So far, we have considered a procedure of iterated elimination which
is maximal, in the sense that at any step all the dominated actions of
all players are eliminated (where dominance holds in the restricted game
that resulted from previous iterated eliminations). However, one can show
that to compute the set of action profiles that are iteratively undominated
(and therefore rationalizable) it is sufficient to iteratively eliminate some
actions which are dominated for some player until in the restricted game
so obtained there are no dominated actions left. Formally:

15 For the mathematically savvy: it is easy to change the example so that A_2 is not compact and u_2 is trivially continuous; just endow A_2 with the discrete topology.
Definition 14. An iterated dominance procedure is a sequence
(C^k)_{k=0}^{K} of non-empty Cartesian subsets of A such that (i) C^0 = A,
(ii) for each k = 1, ..., K, ND(C^{k−1}) ⊆ C^k ⊊ C^{k−1} (⊊ is the strict
inclusion), and (iii) ND(C^K) = C^K.

In words, an iterated dominance procedure is a sequence of steps
starting from the full set of action profiles (condition i) and such that
at each step k, for at least one player, at least one of the actions that
are dominated given the previous steps is eliminated (condition ii); the
elimination procedure can stop only when no further elimination is possible
(condition iii).

Theorem 4. Fix a finite game. For every iterated dominance procedure
(C^k)_{k=0}^{K}, C^K is the set of rationalizable action profiles.¹⁶
Proof. Fix an iterated dominance procedure, i.e. a sequence of subsets
(C^k)_{k=0}^{K} that satisfies (i)-(ii)-(iii) of Definition 14.

Claim. For each k = 0, 1, ..., K, ρ^{k+1}(A) ⊆ ρ(C^k) = ND(C^k) and

ρ(C^{k+1}) ⊆ ρ(C^k) = ND(C^k) ⊆ C^{k+1}.

Since the game is finite, there is at least one best reply to every conjecture
and (4.4.2) holds; thus,

ρ(C^{k+1}) = ND(C^{k+1}),

where the equality follows from Lemma 2. Collecting inclusions
and equalities, ρ^{k+2}(A) ⊆ ρ(C^{k+1}) = ND(C^{k+1}).

The claim implies that ρ^{K+1}(A) ⊆ ρ(C^K) = ND(C^K). By (iii),
C^K = ND(C^K). Therefore ρ^∞(A) ⊆ ρ(C^K) = C^K, that is, every
rationalizable profile belongs to C^K, and C^K has the best reply property.
By Theorem 2, C^K must be the set of rationalizable profiles. ∎
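A small sketch of Theorem 4's order-independence on an invented 3×3 game (in this particular game, dominance by pure actions happens to suffice at every step, so the general mixed-action test is not needed): maximal elimination and a slow, one-action-at-a-time elimination end in the same terminal set.

```python
# Order-independence of iterated dominance (Theorem 4), on an
# illustrative 3x3 game where pure-action dominance suffices.
u1 = {("T","L"):3, ("T","C"):1, ("T","R"):1,
      ("M","L"):2, ("M","C"):2, ("M","R"):0,
      ("B","L"):1, ("B","C"):1, ("B","R"):5}
u2 = {("T","L"):2, ("T","C"):3, ("T","R"):1,
      ("M","L"):0, ("M","C"):2, ("M","R"):1,
      ("B","L"):3, ("B","C"):1, ("B","R"):0}

def dominated(u, mine, other, col_player=False):
    """Actions in `mine` strictly dominated by another pure action in `mine`."""
    def val(a, b):  # payoff of own action a against opponent's action b
        return u[(b, a)] if col_player else u[(a, b)]
    return {a for a in mine
            if any(all(val(d, b) > val(a, b) for b in other)
                   for d in mine if d != a)}

def eliminate(A1, A2, maximal):
    A1, A2 = set(A1), set(A2)
    while True:
        d1 = dominated(u1, A1, A2)
        d2 = dominated(u2, A2, A1, col_player=True)
        if not d1 and not d2:
            return sorted(A1), sorted(A2)
        if maximal:                     # delete all dominated actions at once
            A1, A2 = A1 - d1, A2 - d2
        elif d1:                        # delete a single action per step
            A1.remove(sorted(d1)[0])
        else:
            A2.remove(sorted(d2)[0])

print(eliminate("TMB", "LCR", maximal=True))    # (['M'], ['C'])
print(eliminate("TMB", "LCR", maximal=False))   # same terminal set
```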
4.5
Recall that in a nice game the action set of each player is a compact
interval in the real line, payoff functions are continuous and each ui is
strictly quasi-concave in the own action ai (see Definition 9). It turns out
that the analysis of rationalizability and iterated dominance in nice games
is nice indeed!
Let C denote the collection of closed subsets of A with a cross-product
form C = C_1 × ... × C_n (closedness is a technical requirement; for finite
games, this coincides with the earlier definition of C).¹⁷ Define the
following operators, or maps from C to C:

ρ_D(C) = ×_{i∈I} r_i(C_{−i}),   ρ_I(C) = ×_{i∈I} r_i(Δ^I(C_{−i})),

ND^p(C) = ×_{i∈I} {a_i ∈ C_i : ∄b_i ∈ C_i such that ∀a_{−i} ∈ C_{−i}, u_i(b_i, a_{−i}) > u_i(a_i, a_{−i})},

where Δ^I(C_{−i}) is the set of product probability measures on C_{−i}, or
independent (uncorrelated) conjectures (see Section 3.3.1). Note that
ρ_D and ρ_I are monotone, but ND^p (like ND) is not monotone.

17 We could consider all the Cartesian products of Borel subsets, but this is not necessary. We can restrict our attention to closed sets because A is closed, and for every closed set C = C_1 × ... × C_n both ρ and the operators defined above give another closed set. This follows from the continuity of best reply functions of nice games.
18 This was the solution originally called "rationalizability" by Bernheim (1984) and Pearce (1984), before the epistemic analysis proved that the cleaner and more basic concept is the correlated version ρ^∞(A).

19 Point rationalizability obtains if (a) players are rational, (b) they hold deterministic conjectures, and (c) there is common belief of (a) and (b).

20 The result holds more generally for games with compact action sets, where the payoff functions are jointly continuous in the opponents' actions and upper semi-continuous in the own action.
Market demand²² is given by the function D(p; n) = max{0, (n/β)(α − p)},
with 0 < β < 2; therefore price as a function of average quantity is

P( (1/n) Σ_{i=1}^{n} q_i ) = max{ 0, α − β (1/n) Σ_{i=1}^{n} q_i }.

21 4α/β² is a capacity constraint given by the available land.
(1) The payoff function of firm i is

π_i(q_i, q_{−i}) =
  q_i( α − (β/n)(q_i + Σ_{j≠i} q_j) ) − (1/(2m)) q_i²,  if (α/β)n > q_i + Σ_{j≠i} q_j,
  −(1/(2m)) q_i²,  if (α/β)n ≤ q_i + Σ_{j≠i} q_j.

This function is clearly continuous.²³ Now, fix q_{−i} arbitrarily. If Σ_{j≠i} q_j ≥
(α/β)n then π_i(q_i, q_{−i}) = −(1/(2m))q_i², which is strictly decreasing in q_i. If
Σ_{j≠i} q_j < (α/β)n, then π_i is initially increasing in q_i up to the point r_i(q_{−i})
where marginal revenue equals marginal cost (see below), and then it
becomes strictly decreasing.²⁴

(2) The best reply function is easily obtained from the first-order conditions
when α > (β/n)Σ_{j≠i} q_j; in the other case the best reply is the corner solution
q_i = 0:

r_i(q_{−i}) = max{ 0, m( α − (β/n)Σ_{j≠i} q_j ) / (1 + 2βm/n) }.

(3) The monopolistic output is

q^{n,M} := r_i(0, ..., 0) = mα / (1 + 2βm/n).

If every competitor produces at maximum capacity, (1/n)Σ_{j≠i} q̄_j > α/β
(because β < 2 and the capacity is 4α/β²), so the best reply is

r_i( 4α/β², ..., 4α/β² ) = 0.

23 Note that q_i(α − (β/n)(q_i + Σ_{j≠i} q_j)) − (1/(2m))q_i² = −(1/(2m))q_i² if (α/β)n = q_i + Σ_{j≠i} q_j.

24 There is a kink at q̂_i = (α/β)n − Σ_{j≠i} q_j, but this is immaterial. Note also that π_i is not concave in q_i: in the left neighborhood of the kink point q̂_i the derivative is

∂π_i/∂q_i = P( (q_i + Σ_{j≠i} q_j)/n ) + (q_i/n)·P′( (q_i + Σ_{j≠i} q_j)/n ) − q_i/m = −(β/n)q_i − q_i/m,

which is smaller than the derivative −q_i/m in the right neighborhood of q̂_i; this implies that π_i is not concave.
Since the game is nice, ρ_D(A) = [0, q^{n,M}]^n. If this set has the best response
property, this is also the set of rationalizable profiles. To check this, it is
enough to verify whether the best reply to the most pessimistic conjecture
consistent with rationality is zero, that is, r_i(q^{n,M}, ..., q^{n,M}) = 0; in the
affirmative case [0, q^{n,M}]^n has the best reply property. The condition for
this is α ≤ β((n−1)/n)q^{n,M}, or

βm ≥ n/(n − 3).

Therefore, if βm > 1 there is n̄ large enough so that, for each n > n̄,
[0, q^{n,M}]^n has the best reply property, hence there is a huge multiplicity
of rationalizable outcomes. Conversely, if βm < 1, then

βm < n/(n − 3)

for every n; hence, in this case the set [0, q^{n,M}]^n does not have the best
reply property whatever the number n of firms. Furthermore, it can be
shown that if βm < 1 there is a unique rationalizable outcome.²⁵ By
symmetry, each firm has the same rationalizable output q^{n,*}, which must
solve

q = m( α − β((n−1)/n)q ) / (1 + 2βm/n),

that is,

q^{n,*} = mα / (1 + βm(n+1)/n),

with price p^{n,*} = α − β q^{n,*}. As n → ∞, these converge to the market-clearing values

lim_{n→∞} q^{n,*} = mα/(1 + βm) = q*,   lim_{n→∞} p^{n,*} = α/(1 + βm) = p*.

25 The sequence of intervals obtained by iterating the best reply map shrinks to a point.
Equilibrium
In chapter 4, we analyzed the behavioral implications of rationality and
common belief in rationality when there is complete information, that is,
common knowledge of the rules of the game and of players preferences.
There are interesting strategic situations where these assumptions about
rationality, belief and knowledge imply that each player can correctly
predict the opponents behavior, e.g. many oligopoly games. But this
is not the case in general: whenever a game has multiple rationalizable
profiles there is at least one player i such that many conjectures i are
consistent with common belief in rationality, but at most one of them can
be correct. In this chapter, we analyze situations where players predictions
are, in some sense, correct, and we discuss why predictions should (or
should not) be correct. Let us give a preview.
We start with Nashs classic definition of equilibrium (in pure actions):
an action profile a = (ai )iI is an equilibrium if each action ai is a best
reply to the actions of the other players ai .
Nash equilibrium play follows from the assumptions that players are
rational and hold correct conjectures about the behavior of other players.
But why should this be the case? Can we find scenarios that make the
assumption of correct conjectures compelling, or at least plausible? We will
discuss two types of scenarios: (1) those that make Nash equilibrium an
obvious way to play the game and (2) those that make Nash equilibrium
a steady state of an adaptive process. In each case, we will point out
that Nash equilibrium is justified under rather restrictive assumptions,
72
73
and that weakening such assumptions suggests interesting generalizations
of the Nash equilibrium concept.
More specifically, Nash equilibrium may be the obvious way to
play the game under complete information either as the outcome of
sophisticated strategic reasoning, or as the result of a non binding pre-play
agreement. If sophisticated strategic reasoning is given by rationality and
common belief in rationality, then Nash equilibrium is obtained only in
those games where rationalizable actions and equilibrium actions coincide.
This coincidence is implied by some assumptions about action sets and
payoff functions that are satisfied in several economic applications, but
in general the relevant concept under this scenario is rationalizability, not
Nash equilibrium.
What about pre-play agreements? True, a non-binding agreement
to play a particular action profile a has to be a Nash equilibrium,
otherwise the agreement would be would be self-defeating. But it will be
shown that more general probabilistic agreements that make behavior
a function of some extraneous random variables may be more efficient.
Such probabilistic agreements are called correlated equilibria and
generalize the Nash equilibrium concept.
Next we turn to adaptive processes. The general scenario is that a
given game is played recurrently by agents that each time are drawn at
random from large populations corresponding to different players/roles in
the game (e.g. seller and buyer, or male and female). The agents drawn
from each population are matched, play among themselves and then are
separated. If each players conjecture about the opponents behavior in the
current play is based on his observations of the opponents actions in the
past and this player best responds, we obtain a dynamic process in which
at each point in time some agent switches from one action to another. If
the process converges so that each agent playing in a given role ends up
choosing the same action over and over, the steady state must look like a
Nash equilibrium: indeed the experience of those playing in role i is that
the opponents keep playing some profile ai and thus they keep choosing
a best reply, say ai .1 Since this is true for each i, (ai )iI must be a Nash
equilibrium.
But it is possible that the process converges to an heterogeneous
situation: a fraction qi1 of the agents in the population i choose some
1
74
5. Equilibrium
action a1i , another fraction qi2 of agents choose action a2i and so on. If
the process has stabilized (qj1 , qj2 , ...) will (approximately) represent the
observed frequencies of the actions of those playing in role j, and i will
conjecture that a1j , a2j etc. are played with probabilities qj1 , qj2 etc. If each
action a1i , a2i , ... is a best reply to such conjecture, given a bit of inertia, i
will keep choosing whatever he was choosing before and the heterogeneous
behavior within each population will persist. In this case, the steady state
is described by mixed actions whose support is made of best responses to
the mixed actions of the opponents. This is called mixed equilibrium.
It turns out that a mixed equilibrium formally is a Nash equilibrium of
the extended game where each player i chooses in the set (Ai ) of mixed
actions and payoffs are computed by taking expected values (under the
assumption of independence across players). Indeed, this is the standard
definition of mixed equilibrium.
The analysis above relies on the assumption that, when a game is
played recurrently, each agent can observe the frequency of play of different
actions. But often the information feedback is poorer. For example, one
may observe only a variable determined by the players actions, such as
the number of customers of a firm in a price setting oligopoly. In such a
context, it may happen that players conjectures are wrong and yet they
are confirmed by players information feedback, so that they keep best
responding to such wrong conjectures and the system is in a steady state
which is not a Nash (or mixed Nash) equilibrium. A steady state whereby
players best respond to confirmed (non-contradicted) conjectures is called
conjectural, or self-confirming equilibrium. Since correct conjectures
(conjectures that corresponds to the actual behavior of the other players)
are necessarily confirmed, every Nash equilibrium is necessarily selfconfirming.
To sum up, a serious effort to provide reasonable justifications for
the Nash equilibrium concept leads us to think about scenarios where
strategic interaction would plausibly lead to players best responding to
correct conjectures about the opponents. But for each such scenario we
need additional assumptions (on the payoff functions, or on information
feedback) to justify Nash play. Without such additional assumptions we
are left with interesting generalizations of the Nash equilibrium concept.
The rest of the chapter is organized as follows. We analyze Nash
equilibrium in section 5.1, focusing on symmetric equilibria (subsection
75
5.1
Nash equilibrium
John Nash, who has been awarded the Nobel Prize for Economics (joint with John
Harsanyi e Reinhard Selten) in 1994, was the first to give a general definition of this
equilibrium concept. Nash analyzed the equilibrium concept for the mixed extension of
a game, and proved the existence of mixed equilibria for all games with a finite number
of pure actions (see Definition 12 and Theorem 7).
Almost all others equilibrium concepts used in non-cooperative game theory can be
considered as generalizations or refinements of the equilibrium proposed by Nash.
Perhaps, this is the reason why the other equilibrium concepts that have been proposed
do not take their name after one of the researchers who introduced them in the literature.
76
5. Equilibrium
5.1.1
As one can see from the sketch of proof of Theorem 6, rather advanced
mathematical tools are necessary to prove a relatively general existence
theorem. Here, we analyze the more specific case in which all players
are in a symmetric position and have unidimensional action sets. This
allows us to use more elementary tools to show that, under convenient
simplifying assumptions, there exists an equilibrium in which all players
choose the same action. The fact that all players choose the same action
is not part of the thesis of Theorem 6, which did not assume symmetry.
Therefore the next theorem is not a special case of Theorem 6.3
Definition 16. A static game G = hI, (Ai , ui )iI i with |I| = n players
is symmetric if the players have the same set of actions, denoted by B
(hence, for every i I, Ai = B), and if there exists a function v : B n R
which is symmetric4 with respect to the arguments 2, ..., n and such that
i I, (a1 , ..., an ) A = B n , ui (a1 , ..., an ) = v(ai , ai ).
A Nash equilibrium a of a symmetric game G is symmetric if for all
i, j I, ai = aj .
Example 10. The Cournot oligopoly game of Example 4 is symmetric if
all n firms have the same capacity constraint and cost function: for some
a
> 0, and C : [0, a
] R, a
i = a
and Ci () = C() for all i. Then
3
However, adding symmetry to the hypotheses of Theorem 6 one can prove the
existence of a symmetric Nash equilibrium.
4
Invariant to permutations.
77
v : [0, a
]n R is the map5
n
X
aj C(a1 ),
j=2
78
5. Equilibrium
5.1.2
Nash equilibrium is the most well known and applied equilibrium concept
in economic theory, besides the competitive equilibrium. Indeed, we have
argued in the Introduction that, in principle, any economic situation
(and more generally any social interaction) can be represented as a noncooperative game. The property according to which every action is a best
reply to the other players actions seems to be essential in order to have
an equilibrium in a non-cooperative game.
Nonetheless, we should refrain from accepting this conclusion without
further reflection.
Why does the Nash equilibrium represent an
interesting theoretical concept? When should we expect that the actions
simultaneously chosen by the players form an equilibrium? Why should
players hold correct conjectures regarding each others behavior?
We propose a few different interpretations of the Nash equilibrium
concept, each addressing the questions above. Such interpretations can
be classified in two subgroups: (1) a Nash equilibrium represents an
obvious way to play, (2) a Nash equilibrium represents a stationary
(stable) state of an adaptive process. In some cases, we will also
introduce a corresponding generalization of the equilibrium concept which
is appropriate under the given interpretation and will be analyzed in the
next section.
(1) Equilibrium as an obvious way to play : Assume complete
information, i.e. common knowledge of the game G (this assumption is
sufficient to justify the considerations that follow, but it is not strictly
necessary). It could be the case that from common knowledge of G and
shared assumptions about behavior and beliefs, or from some prior events
that occurred before the game, the players could positively conclude that
a specific action profile a represents an obvious way to play G. If a
represents an obvious way to play, every player i expects that everybody
else chooses his action in ai . Moreover, if a is to be played by rational
players, it must be the case that no i has incentives to choose an action
79
80
5. Equilibrium
for instance, that if there is an excess demand the sellers will realize
that they are able to sell their goods for a higher price; conversely, if
there is an excess supply the sellers will lower their prices to be able to
sell the residual unsold goods. Essentially, such arguments rely more or
less explicitly on the existence of a feedback effect that pushes the prices
towards the equilibrium level. These arguments cannot be formalized in
the standard competitive equilibrium model, where market prices are taken
as parametrically given by the economic agents and then determined by the
demand=supply conditions. They nonetheless provide an intuitive support
to the theory.
Similar arguments can be used to explain why the actions played in
a game should eventually reach the equilibrium position. In some sense,
the conceptual framework provided by game theory is better suited to
formalize this kind of arguments. As argued in the Introduction, in a
game model every variable that the analyst tries to explain, i.e. any
endogenous variable, is directly or indirectly determined by the players
actions, according to precise rules that are part of the game (unlike prices
in a competitive market model, where the price-formation mechanism is
not specified). Assuming that the given game represents an interactive
situation that players face recurrently, one can formulate assumptions
regarding how players modify their actions taking into account the
outcomes of previous interactions, thus representing formally the feedback
process.
One can distinguish two different types of adaptive dynamics: learning
and evolutionary dynamics. Here, we will present only a general and brief
description.8
(2.a) Learning dynamics. Assume that a given game G is played
recurrently and that players are interested in maximizing their current
expected payoff (a reason for this may be that they do not value the future
or that they believe that their current actions do not affect in any way
future payoffs). Players have access to information on previous outcomes.
Based on such information, they modify their conjectures about their
opponents behavior in the current period. Let a be a Nash equilibrium.
If in period t every player i expects his opponents to choose ai , then ai
is one of his best replies. It is therefore possible that i chooses ai . If this
8
81
happens, what players observe at the end of period t will confirm their
conjectures, which then will remain unchanged for the following period.
So even a small inertia (that is a preference, coeteris paribus, to repeat the
previous action) will induce players to repeat in period t + 1 the previous
actions a . Analogously, a will be played also in period t + 2 and so on.
Hence, the equilibrium a is a stationary state of the process.
We just argued that every Nash equilibrium is a stationary state of
plausible learning processes (we presented the argument in an informal way,
but a formalization is possible). (i) Do such processes always converge to
a steady state? (ii) Is it true that, for any plausible learning process, every
stationary state is a Nash equilibrium? It is not possible, in general, to give
affirmative answers. First, one has to be more specific about the dynamics
of the recurrent interaction. For instance, one has to specify exactly what
players are able to observe about the outcomes of previous interactions: is
it the action profile of the other players? Is it some variables that depend
on such actions? Is it only their own payoffs? Another relevant issue is
whether game G is played always by the same agents, or in every period
agents are randomly matched with strangers. An exact specification of
these assumptions shows that the process does not always converge to a
stationary state. Moreover, as we will see in the next section, there may
be stationary states that do not satisfy the Nash equilibrium condition
of Definition 15. There are two reasons for this: (1) Players conjectures
can be confirmed by observed outcomes even if they are not correct. (2) If
players are randomly drawn from large populations, then the state variable
of the dynamic process is given by the fractions of agents in the population
that choose each action. But then it is possible that in a stationary state
two different agents, playing in the same role, choose different actions.
Even though no agent actually randomizes, such situations look like mixed
equilibria, in which players choose randomly among some of the best
replies to their probabilistic conjectures, and such conjectures happen to
be correct. We analyze notions of equilibrium corresponding to situations
(1) and (2) in the next section.
(2.b) Evolutionary dynamics. In the analysis of adaptive processes in
games, the analogy with evolutionary biology has often been exploited.
Consider, for instance, a symmetric two-person game. In every period two
agents are drawn from a large population. Then they meet, they interact,
they obtain some payoff and finally they split. Individuals are compared
82
5. Equilibrium
5.2
Probabilistic Equilibria
83
5.2.1
Mixed Equilibria
Consider the following game between home owners and thieves. Each
owner has an apartment where he stores goods worth a total value of V .
Owners have the option to keep an alarm system, which costs c < V .11 The
alarm system is not detectable by thieves. Each thief can decide whether to
attempt a theft (burglary) or not. If there is no alarm system, the thieves
successfully seize the goods and resell them to some dealer, making a profit
of V /2. If there is an alarm system, the attempted burglary is detected
and the police is automatically alerted. The thieves in such event need to
leave all goods in place and try to escape. The probability they get caught
is 12 and in this case they are sanctioned with a monetary fine P and then
released.12
If the situation just described were a simple game between one owner
and one thief, it could be represented using the matrix below:
O\T
Burglary
No
Alarm
V c, P2
V c, 0
No
0, V2
V, 0
We assume for simplicity that c represents the cost of installing an alarm system as
well as the cost of keeping it active. Furthermore, we neglect the possibility to insure
against theft.
12
Prisons do not exist. If thieves cannot pay the amount P , they are inflicted an
equivalent punishment.
84
5. Equilibrium
same size: the population of owners (each one with an apartment) and
the population of thieves. Thieves randomly distribute themselves across
apartments. For any given owner, the probability of an attempted burglary
is equal to the fraction of thieves that decide to attempt a burglary. From
the thieves perspective, the probability that an apartment has an alarm
system is given by the overall fraction of owners keeping an alarm system.
Assume that the game is played recurrently. The fractions of agents
that choose the different actions evolve according to some adaptive process
with the following features. At the end of each period it is possible to
access (reading them on the newspapers) the statistics of the numbers
of successful and unsuccessful burglaries. Players are fairly inert in that
they tend to replicate the actions chosen in the previous period. However,
they occasionally decide to revise their choices on the basis of the previous
period statistics. Since in each period only a few agents revise their choices,
such statistics change slowly.
An owner not equipped with an alarm system decides to install it if and
only if the expected benefit is larger than the cost, i.e. if the proportion of
attempted burglaries is larger than c/V . Conversely, an owner equipped
with an alarm system will decide to get rid of it if and only if the proportion
of attempted burglaries is lower than c/V . The proportion of attempted
burglaries changes only slowly and the owners equate the probability of
being robbed today with the one of the previous period. The probability
that makes an owner indifferent between his two actions is c/V . When
indifferent, an owner sticks to the previous period action.
Analogously, a thief that was active in the previous period decides not
to attempt a burglary in the current period if and only if the fraction of
apartments equipped with an alarm system (which is also the fraction of
unsuccessful burglaries) is larger than V /(V + P ). A thief that was not
active in the previous period attempts a burglary in the current one if and
only if the fraction lower than V /(V + P ).
The state variables of this process are the fraction of installed alarm
systems and the fraction of attempted burglaries. It is not hard to see
that grows (resp., decreases) if and only if > Vc (resp. < c/V ).
Similarly, grows (resp., decreases) if and only if < V /(V + P ) (resp.,
> V /(V + P )). If = V /(V + P ) and = c/V, then the state of
the system does not change, i.e. (, ) = ( V V+P , Vc ) is a stationary state,
or rest point, of the dynamic process. Whether this rest point is stable
85
depends on details that we have not specified. The mixed action pair
(1 , 2 ) = (V /(V + P ), c/V ) is said to be a mixed (Nash) equilibrium of
the matrix game above.
In this example, we have interpreted the mixed action of j, say j , both
as a statistical distribution of actions in population j and as a conjecture
of the agents of population i about the action of the opponent. In a
stable environment every conjecture about j, j , is correct in the sense
that it corresponds to the statistical distribution of actions in population
j. Furthermore, j is such that every agent in population i is indifferent
among the actions that are chosen by a positive fraction of agents.
Next, we present a general definition of mixed equilibrium and then
show that it is characterized by the aforementioned properties.
Definition 17. The mixed extension of a game G = hI, (Ai , ui )iI i is
a game G = hI, ((Ai ), ui )iI i where
X
Y
ui () =
ui (a1 , ..., an )
j (aj ),
(a1 ,...,an )A
jI
86
5. Equilibrium
j (aj ).
j6=i
87
m1
X
k
uk`
2 1
k
uk`
2 1
`
uk`
1 2
uk1
2 1 , ` = m2 + 1, ..., |A2 |;
m2
X
u1`
1 2 , k = 2, ..., m1 ,
(5.2.2)
`=1
m2
`=1
m2
`=1
(5.2.1)
k=1
k=1
m2
X
uk1
2 1 , ` = 2, ..., m2 ,
k=1
m1
k=1
m1
m1
X
`
uk`
1 2
u1`
1 2 , k = m1 + 1, ..., |A1 |.
`=1
88
5. Equilibrium
Such system determines the set of mixed actions of player 2 that make
player 1 indifferent among all the actions in A1 and at the same time make
such actions weakly preferred to all the others. Therefore, the indifference
conditions for player 1 determine the equilibrium randomization(s) of
player 2, the indifference conditions for player 2 determine the equilibrium
randomization(s) of player 1.
In the previous example (Owners and Thieves) the equilibrium is
determined as follows: First, note that the best reply to a deterministic
conjecture is unique. However, no pair of pure actions is an equilibrium.
Hence, the equilibrium is necessarily mixed (existence follows from
Theorem 8). An equilibrium with support A1 = {Alarm, N o}, A2 =
{Robbery, N o} is given by the following (we are writing = 1 (A) and
= 2 (B)):
Indifference condition for i = 1 (Owner):
V c = V (1 ).
Indifference condition for i = 2 (Thief ):
V
P
(1 ) = 0.
2
2
Solving the system, = V V+P , = Vc , which is the stationary state
identified by the (informal) analysis of a plausible dynamic process.
5.2.2
Correlated Equilibria
89
3, 1
0, 0
0, 0
1, 3
Matrix 3 represents the classic Battle of the Sexes (BoS): Rowena (row
player) and Colin (column player) would like to coordinate and go to the
same concert, either B ach or S travinsky, but Rowena prefers Bach and
Colin prefers Stravinsky. Suppose that Rowena and Colin have to agree on
how to play the BoS the next day, when no further communication between
them will be possible. There exists two simple self-enforcing agreements,
(B, B) and (S, S). However, the first favors Rowena, the second favors
Colin, and neither player wants to give up. How to sort this out? Colin
can make the following proposal that would ensure in expected values a
fair distributions of the gains from coordination: If tomorrows weather is
bad then both of us choose B ach, if instead it is sunny then we both choose
S travinsky. Notice that the weather forecasts are uncertain: there is a
50% probability of bad weather and 50% of sunny weather. The agreement
generates an expected payoff of 2 for both players. Rowena understands
that the idea is smart. Indeed, the agreement is self-enforcing as both
players have an incentive to respect it if they expect the other to do the
same. For instance, if Rowena expects that Colin sticks to the agreement
and waking up she observes that the weather is bad, then she expects that
15
90
5. Equilibrium
Colin will go to the B ach concert and she wants to play B. Similarly, if
she observes that the weather is sunny, she expects that Colin will go to
the S travinsky concert and she wants to play S.
In other words, a sophisticated agreement can use exogenous and
not directly relevant random variables to coordinate players beliefs and
behavior. In such an agreement the conjecture of a player about the
behavior of others (and so their best replies) depends on the observed
realization of such random variables and the actions of different players
are correlated, albeit spuriously.
Clearly, it is not always possible to condition the choice on some
commonly observable random variable. Furthermore, even if this were
possible, players could still find it more convenient to base their respective
choices on random variables that are only partially correlated. Consider,
for example, the following bi-matrix game:
a
6, 6
2, 7
7, 2
0, 0
One can generalize and obtain distributions over mixed equilibria if players choose
mixed actions according to the observed realization x.
91
variables, they can do better. For example, suppose that Rowena observes
whether X = a or X = b, and Colin observes whether Y = a or Y = b,
where (X, Y ) is a pair of random variables with following joint distribution:
X\Y
1/3
1/3
1/3
They agree that each one should play a when she or he observes a,
and b when she or he observes b. It can be checked that this agreement
is self-enforcing. Suppose that Rowena observes X = a, then she assigns
the same (conditional) probability,
1
2
1
3
1
+ 13
3
0 : i ( 0 ) = ti
> 0 then
92
5. Equilibrium
(5.2.3)
p(E F )
.
p(F )
93
where
(ai |ai ) := (margAi (|ai ))(ai ) =
94
5. Equilibrium
(5.2.5)
Since ui depends only on actions, the terms in (5.2.5) can be regrouped
to obtain
X
X
p()[ui (ai , ai ) ui a0i , ai ] (5.2.6)
a0i ,
ai ( )1 (ai ,ai )
X
=
[ui (ai , ai ) ui a0i , ai ]
ai
p() 0,
( )1 (ai ,ai )
95
96
5. Equilibrium
ai Ai
that is, ai ri ((|ai )). Since this is true for each player i, it follows that
C has the best reply property.
The concept of correlated equilibrium can be generalized assuming
that different players can assign different probabilities to the states
. This generalization is called subjective correlated equilibrium
(Brandenburger e Dekel, 1987). It can be shown that an action ai is
rationalizable if and only if there exists a subjective correlated equilibrium
in which ai is chosen in at least one state (see section 7.3.5 of Chapter
7).
5.2.3
Selfconfirming Equilibrium
2, 0
2, 1
0, 0
3, 1
97
One could object to that if Rowena is patient, then she will want to experiment with
action b so as to be able to observe (indirectly) Colins behavior, even if that implies an
expected loss in the current period. The objection is only partially valid. It can be shown
that for any degree of patience (discount factor) there exists a set of initial conjectures
that induce Rowena to choose always the safe action a rather than experimenting with
b. It is true, however, that the more Rowena values future payoffs, the less is it plausible
that the process will be stuck in (t, r).
22
For an analysis of how this concept has originated and his relevance for the analysis
98
5. Equilibrium
99
100
5. Equilibrium
same action hold the same conjecture. This restriction is without loss of
generality.
Consider an agent from population i who holds conjecture i and keeps
playing a best reply ai ri (i ). In the case of anonymous interaction, what
an agent playing in role i observes at the end of the game depends on the
behavior of the agents playing in the other roles, but these agents are drawn
at random, therefore each message mi will be observed, in the long run,
with a frequency determined by the fraction of agents in population(s) i
choosing the actions that (together with ai ) yield message mi . Conjecture
i is confirmed if the subjective probabilities that i assigns to each
1
message mi given ai , that is (i (fi,a
(mi ))mi Mi , coincide with the observed
i
long-run frequencies of these messages.
The following example clarifies this point:
1\2
2, 3
2, 1
0, 0
3, 1
Assume that 25% of the column agents choose `, so that 75% choose r.
Then the probability that a row agent receives the message You got zero
euros when she chooses b is 14 . Row agents do not know this fraction
and could, for instance, believe that ` happens with probability 12 , which
would induce them to choose t; alternatively a probability of 51 , would
induce them to play b. It may happen, for instance, that half of the row
agents believe that P(`) = 12 and the other half believe P(`) = 15 . The
former ones choose t, the latter ones b. If the fractions of column agents
playing ` and r remain 25% and 75% respectively, then those who play b
will observe Zero euros 25% of the times and Three euros 75% of the
times. They will then realize that their beliefs were not correct and will
keep on revising them until eventually they will coincide with the objective
proportions, 25%:75%. These row agents will continue to play b. The other
half of the row agents, those that believe that P(`) = 21 and choose t, do
not observe anything new and therefore keep on believing and doing the
same things.
But is it possible that the fractions of column agents playing ` and
101
r stay constant? Indeed it is. Suppose that 25% of them believe that
P(t) = 25 and the rest believe that P(t) = 51 . The former ones will choose
` (expected payoff 65 > 1), the latter ones will choose r (as 1 > 35 ). Those
choosing r do not receive any new information and keep on doing the same
thing. Those choosing ` half of the times observe Three euros, and the
other half Zero Euros. Their conjectures are not confirmed, but they
will keep revising upward the probability of t and best replying with ` ,
until their conjectures converge to the long-run frequencies 50%-50%, i.e.
the actual proportions of row agents choosing t and b.
Then there is a stable situation characterized by the following fractions,
or mixed actions: 1 (t) = 21 , 2 (`) = 41 . These fractions do not form
a mixed equilibrium: in fact, the indifference conditions for a mixed
equilibrium yield 1 (t) = 13 , 2 (`) = 13 .
It should be clear from the discussion that here, as in our interpretation
of the mixed equilibrium concept, mixed actions are interpreted as
statistical distributions of pure actions in populations of agents playing
in the same role.
Let us move to a general definition. Denote by Pfaii ,i [mi ] the probability
of message mi determined by action ai and conjecture i , given the
feedback function fi :24
X
Pfaii ,i [mi ] =
i (ai ).
ai :fi (ai ,ai )=mi
In other words, Pfaii ,i [mi ] and Pfaii ,i [mi ] are, respectively, the pushforward
of i and i through fi,ai : Ai Mi , the section of is feedback function
at ai :
1
1
Pfaii ,i [] = i fi,a
, Pfaii ,i [] = i fi,a
.
i
i
24
102
5. Equilibrium
Definition 22. Fix a game with feedback (G, f ). A profile of mixed actions
and conjectures (i , (iai )ai Suppi )iI is an anonymous self-confirming
(or conjectural) equilibrium of (G, f ) if for every role i I and action
ai Suppi the following conditions hold:
(1) ( rationality) ai ri (iai ),
(2) ( confirmed conjectures) mi Mi , Pfaii ,i [mi ] = Pfaii ,i [mi ].
ai
103
In the case of anonymous interaction, we can consider at least two scenarios: (a) the
individuals observe the statistical distribution of the actions in previous periods, (b) the
individuals observe the long-run frequencies of the opponents actions. In both cases we
can say that a conjecture is correct if it corresponds to the observed frequencies.
104
5. Equilibrium
105
106
5. Equilibrium
Ci Fi (ai )
ai Ci j6=i
ai Ci j6=i
ai
ai Ci
therefore
u
i (ai , i ) =
X
Ci Fi (ai )
ui (ai , Ci )
ai Ci
Learning Dynamics,
Equilibria and
Rationalizability
So far we avoided the details of learning dynamics. The analysis of such
dynamics requires the use of mathematical tools whose knowledge we do
not take for granted in these lecture notes (differential equations, difference
equations, stochastic processes). Nonetheless, it is possible to state some
elementary results about learning dynamics by addressing the following
question: When is it the case that a trajectory, i.e., an infinite sequence of
t
t
action profiles (at )
t=1 (with a = (ai )iI A), is consistent with adaptive
learning? We start with a qualitative answer.
Consider a finite game G that is played recurrently. Assume that all
actions are observable and consider the point of view of a player i that
observes that a certain profile ai has not been played for a very long
time, say for at least T periods. Then it is reasonable to assume that i
assigns to ai a very small probability. If T is sufficiently large, and ai was
not played in the periods t, t+ 1, ..., t+ T , ..., t+ T , then the probability of
ai in t > t+T will be so small that the best reply to is conjecture in t will
also be the best reply to a conjecture that assigns probability zero to ai .
In other words, i will choose in t > t + T only those actions that
are best
replies to conjectures i such that Suppi ai : t < t , i.e. only
107
108
actions in the set ri ai : t < t .1 Notice that this argument
assumes only that i is able to compute best replies to conjectures. The
argument is therefore consistent with a high degree of incompleteness of
information about the rules of the game (the consequence function) and
the opponents preferences.
6.1
Ai is finite and ui (ai , i ) is continuous (in fact linear) with respect to probabilities
( (ai ))ai Ai . It follows that if ai is a best reply to a conjecture that assigns a
sufficiently small probability to ai , then ai is a best reply also to a slightly different
conjecture that assigns probability zero to ai .
2
The following results are adapted from Milgrom and Robert (1991).
i
109
(at )
t=1 converges to a , written a a , if there exists a time t such that,
t
(a) = lim
i.e. a
/ Supp. Therefore, for
every a Supp
there exists a time Ta0
such that, t > t + Ta0 , a a : t < t . Let T 0 := maxaSupp Ta0
(this
because A is finite). Then t > t + T , Supp
is well defined
a : t < t . Moreover, if (a) = 0 there exists a Ta00 such that
3
110
Therefore
there exists
a sufficiently large T for which t > t + T ,
t
a ( a : t < t ).
The next two results show that, in the long run, adaptive learning
induces behavior consistent with the complete-information solution
concepts of earlier chapters, even if consistency with adaptive learning
does not require complete information.
Proposition 2. Let (at )
t=0 be a trajectory consistent with adaptive
learning. Then (1)
k 0, tk , t tk , at k (A),
(6.1.1)
so that only rationalizable actions are chosen in the long run; (2) if
at a , then a is a Nash equilibrium.
Proof.
(1) First recall that since A is finite there exists a K such that k K,
k (A) = (A). Then, (6.1.1) implies that from some time tK onwards
only rationalizable actions are chosen. The proof of (6.1.1) is by induction.
The statement trivially holds for k = 0. Suppose by way of induction that
the statement is true for a given k. By consistency with adaptive learning,
there exists a Tk such that
t > tk + Tk , at ({a : tk < t}).
111
6.2
if for every
that, for every
t there exists a T such
t > t + T and i I,
t
ai i ai : , t < t, fi (a ) = fi (ai , ai ) .
112
(2)
ai : fi (ai , ai ) = fi (ai , ai ) = 1; (ai , i )iI is a self-confirming
equilibrium.
114
100, 100
0, 99
99, 0
99, 99
7.1
Vice versa, recall that the interpretation of a Nash equilibrium (pure or mixed) as a
stationary state of an adaptive process does not require common knowledge, nor mutual
knowledge of the game.
2
The set of the opponents actions could be unknown as well. We omit this source of
uncertainty for the sake of simplicity.
115
assume that there is no such residual uncertainty (in many cases, this
assumption is rather innocuous). In this case 0 is a singleton and it
makes sense to omit it from the notation, writing = (1 , ..., n ).
Example 12. Consider again the team of two players producing a public
good (example 1). Now we express the production function and cost-ofeffort functions in a slightly different parametric form. The parametrized
production function is
y
y
Y = 0 (a1 )1 (a2 )2 .
The parametrized cost of the effort of player i measured in terms of output
is
1
Ci (ic , ai ) = c (ai )2 .
2i
The parametrized consequence function specifies a triple (Y, C1 , C2 ) as a
function of (, a1 , a2 ):
y
y
1
1
g(, a1 , a2 ) = 0 (a1 )1 (a2 )2 , c (a1 )2 , c (a2 )2 .
21
22
The commonly known utility function of player i is vi (Y, C1 , C2 ) = Y Ci .
The parametrized payoff function is then
y
1
(ai )2 .
2ic
116
n
X
qi .
i=1
1
(qi )2 .
2ic
Then
(
ui (, q1 , ..., qn ) = qi max 0, 0
n
X
i=1
)
qi
1
(qi )2 .
2ic
(7.1.1)
0.
Assume, for simplicity, that the marketing department operational cost is equal to
117
(1) firm i can condition its behavior on im ; (2) firm j 6= i may believe that i
will condition its behavior on im and try to exploit the correlation between
jm , 0 and im to infer the behavior of i, and so on.
7.1.1
118
7.1.2
An elementary representation
information rationalizability
of
incomplete
119
1 :
G
4, 0
2, 1
2, 0
0, 1
3, 1
1, 0
0, 1
1, 2
120
i knows. Then, in this case, the set of possible actions of Colin (the
singleton {d }) is independent of .
Example 18. Players 1 and 2 receive an envelope with a money prize
of k thousands Euros, with k = 1, ...K. It is possible that both players
receive the same prize. Every player knows the content of her own envelope
and can either offer to exchange her envelope with that of the other
player (action OE=Offer to Exchange), or not (action N=do Not offer to
exchange). The OE/N actions are taken simultaneously and the exchange
takes place only if it is offered by both. In order to offer an exchange a
player has to pay a small transaction cost . The players payoff is given
by the amount of money they end up with at the end of the game. For
this example, i = {1, ..., K} and ui (, a) is given by the following table:
ai \aj
OE
OE
121
0 ,i ,ai
k1
122
Behavioral implications
( A)
R B(R)
2 ( A)
R B(R) B2 (R)
3 ( A)
...
R
T
K
k
k=1 B (R)
...
...
K+1 (
A)
...
Note that, by finiteness of A, for each i there is at least one action justified by
some conjecture i (i ). Therefore proji i (i ) = i .
123
ai i (i )
i (0 , i , ai )ui (0 , i , i , ai , ai )
0 ,i ,ai
7.2
124
2 :
G
4, 0
2, 1
1, 1
0, 0
3, 1
1, 0
0, 1
2, 0
Of course, equilibrium has yet to be defined in this context. The formal definition
will follow.
125
126
beliefs (p )iI is
structure I, 0 , (i , Ai , ui , pi )iI assuming that all the features of the
interactive situation it describes are transparent, and we can meaningfully
define as equilibrium a profile of choice functions (1 , ..., n ) such that,
for each i and i
X
pi (0 , i )ui (0 , i , i , ai , i (i )),
(7.2.1)
i (i ) arg max
ai Ai
0 ,i
127
It turns out that this game has a symmetric equilibrium in which each
player bids i (i ) = n1
n i .
How can we guess that these functions form an equilibrium? Consider the
following heuristic derivation. If i = 0 it does not make sense (indeed,
it is weakly dominated) to offer more than zero. Now assume i > 0. If
bidder i conjectures that each competitor bids according to the linear rule
j (j ) = kj , where k (0, 1) is a coefficient that we have to determine,
then i believes that, if he bids ai , the probability of winning the object is
ai
P[j 6= i, j (j ) < ai ] = P[j 6= i, j < ]
k
(we can neglect ties because, for each competitor j, the probability of event
[e
aj = ai ] is zero). Since we assumed independent and uniform distributions
of values, we have
a n1
i
ai
, if aki < 1,
k
P[j 6= i, j < ] =
k
1,
if aki 1.
Thus, if bidder i is rational he offers the smallest of the following numbers:
n1
12
arg max0ai <k aki
(i ai ) = n1
n i and k, that is
n1
i }
n
In a symmetric equilibrium each player has a correct conjecture about the
bidding functions of the competitors and each player has the same (best
n1
reply) bidding function, therefore k = n1
n . This implies min{k, n i } =
n1
n1
min{ n1
n , n i } = n i . Therefore, the optimal bid when the symmetric
conjecture of i is that each j bids j (j ) = n1
n j is precisely ai = i (i ) =
n1
n i .
ai = min{k,
7.3
128
129
(i ))),13 from which the first and second-order beliefs can be recovered
by marginalization. At this point, the formalism is already quite complex.
Furthermore, there is no compelling reason to stop at third-order beliefs.
How should we proceed? Is it possible to use a formal and compact
representation of strategic interactions with incomplete information which
is not too complex, but at the same time allows a representation of beliefs
of arbitrarily high order and a meaningful definition of equilibrium? The
answer is Yes.
7.3.1
Bayesian Games
Clearly, the symbol rj used here is not to be confused with the symbol denoting the
best reply correspondence.
14
This fundamental contribution by lead to the award to John Harsanyi of the 1994
Nobel Prize for Economics, jointly with John Nash and Reinhard Selten.
15
This is what Harsanyi calls the random vector model of the Bayesian game. The
only difference is that here (not in his general analysis of incomplete information)
Harsany postulates a common prior, p = p1 = p2 = ... = pn , which represents an
objective probability distribution.
130
pi () > 0
:i ()=ti
pi [E i1 (ti )]
.
pi [i1 (ti )]
With this, the beliefs of player i at state of the world are given by
the distribution pi [|i ()]. Since i and pi are common knowledge, the
function 7 pi [|i ()] () is also common knowledge. It follows
that a state of the world determines a players beliefs about the parameter
and about other players beliefs about it. To verify this claim, let us focus
on the case of two players and distributed knowledge of (this is done
only to simplify the notation) and let us derive the beliefs of i about tj
given signal ti :
tj Tj , pi [tj |ti ] = pi [j1 (tj )|ti ],
where j1 (tj ) = { : j () = tj } . Next we derive the first-order beliefs
about (since i knows i , his beliefs about are determined by his beliefs
about j ):
j j , p1i [j |ti ] := pi [1
j (j )|ti ],
1
where 1
j (j ) = {tj : j (tj ) = j }. The functions tj 7 pj [|tj ] (i )
(j {1, 2}) are also common knowledge. Hence, we can derive the secondorder beliefs about , that is, the joint belief of a player about and about
the opponents first-order beliefs about :
pi [tj |ti ].
It can be shown that this comes without substantial loss of generality provided that
we allow the subjective priors p1 , ..., pn to be different.
17
We denote by pi [] and pi [|], respectively, the prior and conditional probabilities of
events. For instance, pi [ti |ti ] is the probability that event [ti ] = { : i () = ti }
occurs conditional on the event [ti ] = { : i () = ti } .
131
p1j
in words, p11 [|ti ] is the marginal on j of the joint distribution p2i [|ti ]
(j (i )).]
It should be clear by now that is possible to iterate the argument
and compute the functions that assign to each type the corresponding
third-order beliefs about , fourth-order beliefs about , and so on. To
sum up, we can conclude that the information and beliefs of all orders
of player i about are determined by ti according to the function
ti 7 (i (ti ), p1i [|ti ], p2i [|ti ], ...). Signal ti is called the type 18 of player i. We
denote the information and belief hierarchy associated with ti by pi (ti ) .
The information and beliefs of each player i in a given state of the world
are those corresponding to the type ti = i (). The information-type of
player i, i = i (ti ), is just one component of his overall type ti , which also
specifies is beliefs about all the relevant exogenous variables/parameters,
i , p1i , p2i , etc. (exogenous here means something that we are not
trying to explain as game theorists: we are not trying to explain , nor
beliefs about , nor beliefs about such beliefs, nor - more generally - any
beliefs about exogenous things).
Definition 28. A Bayesian Game is a structure
BG = hI, , 0 , 0 , (i , Ti , Ai , i , i , pi , ui )iI i
where, for each i I, pi (), 0 : 0 , i : Ti , i : Ti i ,
pi [ti ] := pi [i1 (ti )] > 0 for each ti Ti ,19 and ui : A R.
The analysis of solution concepts for Bayesian games will clarify that
only the beliefs pi [|ti ] (i I, ti Ti ) really matter. The priors pi are just
a convenient mathematical tool to represent the beliefs of each type.
In what follows, we will often use the expression type ti chooses ai
to mean that if player i were of type ti he would choose action ai .
18
19
132
7.3.2
Bayesian Equilibria
where
p[0 , ti |ti ]
is
the
probability
that event [0 , ti ] = { : 0 () = 0 , i () = ti } occurs conditional
on the event [ti ] = { : i () = ti } .
Example 21. We illustrate the concepts of Bayesian game and equilibrium
2 . Suppose that Rowena does not know Colins first-order
elaborating on G
beliefs. From her point of view such beliefs can assign either probability 31
or 34 to a . The two possibilities are regarded as equally likely by Rowena
and all this is common knowledge. This situation can
be represented
by the
following Bayesian game: = {, , , }, 1 = 1a , 1b , T1 = ta1 , tb1 ,
b
a
a
b
b
1 () = 1 () = ta1 , 1 () =
1 () = t1 , 1 (t1 ) = 1 , 1 (t1 ) = 1 ,
n o
, p1 () = 1 , 2 = 2 , T2 = {t0 , t00 }, 2 () = 2 () = t0 ,
2
t002 ,
p2 () = 83 ,
2.
are as in G
1
6,
1
8,
1
3,
2 () = 2 () =
p2 () = p2 () = p2 () = and where
the functions ui
The probabilistic structure is described in
the table below:
To verify that this Bayesian game represents the situation outlined
above we compute the following:
t002
a , ta1
, 14 , 83
, 14 , 16
b , tb1
, 14 , 18
, 14 , 13
133
p2 ()
3/8
3
=
=
p2 () + p2 ()
3/8 + 1/8
4
p2 ()
1/6
1
p12 [a |t002 ] = p2 [ta1 |t002 ] =
=
=
p2 () + p2 ()
1/6 + 1/3
3
1
p1 [t02 |ta1 ] = p1 [t02 |tb1 ] = .
2
This means that in every state of the world Rowena believes that the two
events [Colin assigns probability 43 to a ] and [Colin assigns probability
1
a
4 to ] are equally likely. We now derive the Bayesian equilibria. Let
= (1 , 2 ) be an equilibrium. Since a is dominant when Rowena knows
that = a , 1 (ta1 ) = a. Hence, the equilibrium expected payoff accruing
to type t02 if he chooses c is
p12 [a |t02 ] = p2 [ta1 |t02 ] =
3
1
1
0+ 1= ;
4
4
4
0
the equilibrium expected payoff for t2 if he chooses d is
p2 [ta1 |t02 ]u2 (a , a,c) + p2 [tb1 |t02 ]u2 (b , 1 (tb1 ), c) =
3
1
3
1+ 0= .
4
4
4
0
It follows that in equilibrium 2 (t2 ) = d. The expected payoffs for the two
actions of type t002 are
p2 [ta1 |t02 ]u2 (a , a,d) + p2 [tb1 |t02 ]u2 (b , 1 (tb1 ), d) =
1
0+
3
1
p2 [ta1 |t002 ]u2 (a , a,d) + p2 [tb1 |t002 ]u2 (b , 1 (tb1 ), d) =
1+
3
Since the maximizing choice for type t002 is c, 2 (t002 ) = c.
determine the equilibrium choice for type tb1 :
p2 [ta1 |t002 ]u2 (a , a,c) + p2 [tb1 |t002 ]u2 (b , 1 (tb1 ), c) =
1
0+
2
1
2+
2
2
2
1= ,
3
3
2
1
0= .
3
3
We can now
1
1
1= ,
2
2
1
0 = 1.
2
134
Therefore, 1 (tb1 ) = b.
7.3.3
X
ti Ti
pi [ti ]
ti Ti
Let i = ATi i (recall, this is the set of functions with domain Ti and range
Ai ). The ex ante strategic form of BG is the static game
AS(BG) = hI, (i , Ui )iI i ,
where Ui is defined by (7.3.1). The reader should be able to prove the
following result by inspection of the definitions and using the assumption
that each type ti is assigned positive probability by the prior pi :
Remark 23. A profile (1 , ..., n ) is a Bayesian equilibrium of BG if and
only if it is a Nash equilibrium of the game AS(BG).
The interim strategic form is based on a different metaphor. Assume
that for each role i in the game there is a set of potential players Ti . A
potential player ti is characterized by the payoff function ui (i (ti ), , , ) :
0 i A R and the beliefs pi [, |ti ] (0 Ti ). In the event
that ti is selected to play the game in the role of agent i, he will assign
probability p[0 , ti |ti ] to the event that the residual uncertainty is 0 and
135
that he is facing exactly the profile of opponents ti = (tj )j6=i . The set
of actions available to ti is Ai , that is, Ati = Ai (i I, ti Ti ). If each
potential player tj Tj chooses the action atj Atj , ti s expected payoff
is computed as follows:
X
uti ((atj )jI,tj Tj ) :=
pi [0 , ti |ti ]ui (0 , i (ti ), i (ti ), ati , (atj )j6=i ).
0 ,ti
(7.3.2)
The
interim
strategic
form
of
BG
is
the
following
static
game
with
P
|T
|
players:
i
iI
*
+
[
Ti , (Ati , uti )iI,ti Ti ,
IS(BG) =
iI
where uti is defined by (7.3.2). The reader should be able to prove the
following result by inspection of the definitions:
Remark 24. A profile (1 , ..., n ) is a Bayesian equilibrium of BG if and
only if the corresponding profile (ati )iI,ti Ti such that ati = i (ti ) (i I,
ti Ti ) is a Nash equilibrium of the game IS(BG).
7.3.4
136
7.3.5
Subjective
correlated
equilibrium,
equilibrium and rationalizability
Bayesian
In a way, one could say that Harsanyi [22] had implicitly defined the correlated
equilibrium concept before Aumann [1], but without being aware of it!
137
138
7.4
139
where p[|] =
1 ( ,..., )]
p[{}1
n
1
0 (0 )
, 1
1 ( ,..., )]
0 (0 )
p[1
(
)
n
0
1
0
0
0
0
{ : (1 ( ) = 1 , ..., n ( ) = n }
= { 0 : 0 ( 0 ) = 0 },
1 (1 , , , .n ) =
(the value of p[|] if
p[] = 0 is irrelevant for expected payoff computations). Keeping in mind
Remark 23, it is easy to verify that a profile of functions (i )iN is a
Bayesian equilibrium of the game BG so obtained if and only if it is an
equilibrium of the asymmetric information game.
Recall that in order to introduce the constituent elements of the
model of the Bayesian game we used the metaphor of the ex ante stage.
The use of such metaphor and the mathematical-formal analogy between
Bayesian and asymmetric information games (and the corresponding
solution concepts) has induced many scholars to neglect the differences
between these two interactive decision problems. We should stress,
however, that they are indeed two different problems. In the incomplete
information problem the ex ante stage does not exist, it is just a useful
theoretical fiction: only represents a set of possible states of the world,
where by state of the world we mean a configuration of information and
subjective beliefs. The so called prior beliefs pi () are simply a useful
mathematical device to determine (along with functions 0 , i and i ) the
players interactive beliefs in a given state of the world.23 Instead, in the
23
Essentially, nothing would have changed if we had only specified the belief functions
i : Ti (0 Ti ) that for any type ti determine a probability measure over the set
of residual uncertainty and other players types. At the end of the day, we are simply
interested in the types beliefs. Moreover, it is always possible to specify a distribution
pi that yields the beliefs pi [|ti ] = i (ti ) for any type ti Ti . Actually, there exist an
infinite number of them, as it is apparent from the following construction. Let (Ti )
be any strictly positive distribution and determine pi as follows
(0 , ti , ti )
140
7.5
141
Recall that we say that a choice is justifiable if it is a best reply to some belief. A
choice is rationalizable if it survives the iterated elimination of non justifiable choices.
142
that such action profiles coincide with the strategy profiles of the opponents
of player i in the ex ante strategic form:
(atj )j6=i,tj Tj j6=i ATj = j6=i j = i .
Hence, the set of conjectures of any type/player ti in the interim strategic
form coincides with the sets of conjectures of player i in the ex ante
strategic form. To be consistent with the notation used for games of
complete information, we denote by Ui (i , i ) and uti (ai , i ) the expected
payoffs of player i and of type ti in the corresponding strategic forms, given
conjecture i (i ).
Let us fix arbitrarily a strategy i and a conjecture i . We now prove
that i is a best reply to i if and only if, for every type ti Ti , action
i (ti ) is a best reply to i . This implies the thesis. Fix i (i ). For
every strategy i , the expected payoff of i given i is
Ui (i , i ) =
i (i )Ui (i , i )
i (i )
ti
pi [ti ]
X
i
ti
pi [ti ]
i (i )
0 ,ti
0 ,ti
ti
ti
if and only if
ti Ti , i (ti ) arg max uti (ai , i ).
ai
The average of the expected payoffs for the different types of i is maximized
if and only if26 the expected payoff of every type of i is maximized.
26
Of course, the only if part holds because we assume that pi [ti ] > 0 for each ti Ti .
143
00
3, 3
0, 2
3, 0
0, 2
2, 0
2, 2
2, 0
2, 2
0, 0
3, 2
0, 3
3, 2
144
00 . It is easy to verify that for each type of Rowena all the actions are
justifiable by some conjecture about Colin. Hence, if choices are evaluated
at the interim stage it is not possible to exclude any action, everything
is interim rationalizable. Instead if choices are evaluated at the ex ante
stage, then the strategy 1 ( 0 ) = a, 1 ( 00 ) = b, denoted by ab, can be
excluded. Indeed, since Rowenas payoff does not depend on the state, ab
could be a best reply to some conjecture 1 only if both a and b were
best replies to 1 . But if Rowena believes 1 (c) 12 , then b is not a best
reply; if Rowena believes 1 (c) 21 , then a is not a best reply. Therefore
strategy ab is not justifiable by any 1 . The same argument shows that
ba is not justifiable either. The ex ante justifiable strategies of Rowena
are: aa,am,ma,mm,bb,bm,mb.27 Given any belief 2 such that 2 (ab) = 0,
action c yields an expected payoff less or equal than 32 (check this); action
d instead yields 2 > 32 . Then, the only rationalizable action for Colin in the
ex ante strategic form is d. It follows that the only rationalizable strategy
of Rowena in the ex ante strategic form is bb.
Although the gap between ex-ante and interim rationalizability was
at first just accepted as a fact, it should be disturbing. Conceptually,
rationalizability is meant to capture the assumptions of rationality (i.e.,
expected utility maximization) and common belief in rationality given
the background transparency of the Bayesian Game BG. Since both
strategic forms are based on the same Bayesian game and since ex-ante
maximization is equivalent to interim maximization,28 why does not exante rationalizability coincide (in terms of behavioral predictions) with
interim rationalizability? Which features of the strategic form create
the gap highlighted in Example 22? Before providing an answer to
these questions, we introduce another puzzling feature of rationalizability
in Bayesian environments: the dependence of interim rationalizability
on apparently irrelevant details of the state space. To understand this
problem, consider the following example.
Example 23. [Dekel et al. 2007] Rowena and Colin are involved in a
27
Given that Rows payoff does not depend on , those strategies that select different
actions for the two states are justifiable only by beliefs that make Row indifferent between
these two actions; the strategies am and ma are among the best replies to the belief
1 (c) = 32 , the strategies bm and mb are among the best replies to the belief 1 (c) = 31 .
28
This equivalence holds under the assumptions that for every player i, all types ti
have positive probability.
145
betting game. Each player can decide to bet (action B ) or not to bet
(action N ). There is an unknown parameter
0 0 = {00 , 000 } and players
have no private information (i = i for every agent i). Rowena (Colin)
wins if both players bet and 0 = 00 (0 = 000 ). The decision to bet entails
a deadweight loss of $4 independently of the opponents action; if both
agents bet, the loser gives $12 to the winner. The corresponding game
can be represented as follows:
with payoff uncertainty G
00
000
8, 16
4, 0
16, 8
4, 0
0, 4
0, 0
0, 4
0, 0
Let us assume that there is common belief that each agent assigns
probability 21 to each payoff state. Thus, each player i has only one
hierarchy of beliefs about 0 .
The simplest state space representing this situation is the following:
= { 0 , 00 } ,,0 ( 0 ) = 00 , 0 ( 00 ) = 000 and for every agent i, Ti = {ti },
functions i and i are trivially defined and pi [ 0 ] = 12 . In this case, the
ex-ante and the interim strategic forms coincide and are represented by
the following game:
4, 4
4, 0
0, 4
0, 0
0 () =
0
0
000
if 1 , 2 , 3 , 4
,
if 5 , 6 , 7 , 8
146
2 () =
0
t2
if 1 , 3 , 5 , 7
t002
if 2 , 4 , 6 , 8
and p1 = p2 = p:
State of the world,
p []
1
4
2
0
1
4
1
4
1
4
One can easily check that this common prior induces a distribution over
residual uncertainty and types which can be represented as follows:
00
t02
t002
000
t02
t002
t01
1
4
t01
1
4
t001
1
4
t001
1
4
and that for both types of both players the following holds: (i) the player
has no private information, (ii) he assigns probability 21 to each state 0 ,
and (iii) there is common belief of this. Thus, this second state space
represents exactly the same belief hierarchies than the former one: each i
assigns probability 12 to each 0 , each i is certain that i assigns probability
1
2 to each 0 , and so on. If we write down the ex-ante strategic form derived
from this state space, we get the following game:
BB
BN
NB
NN
BB
4, 4
4, 2
4, 2
4, 0
BN
2, 4
1, 5
5, 1
2, 0
NB
2, 4
5, 1
1, 5
2, 0
NN
0, 4
0, 2
0, 2
0, 0
147
148
,ai
ai Ai : i ( Ai ) , i : 0 Ti (Ai ) ,
1) ai ri ti , i
ICRik,BG (ti ) =
k1,BG
3) (, ai ) , i (, ai ) = p [ | ti ] i (0 () , i ()) [ai ]
k1,BG
where ICRi
(ti ) = j6=i ICRjk1,BG (tj ) .
In the previous definition, function i (0 , ti ) represents the
conjecture of player i concerning the behavior of his opponents given their
31
This problem would not arise if for every player i and every information and belief
hierarchy he may hold, there were at least |Ai | types representing this information and
belief hierarchy. Stricter requirement can be provided, but this would go beyond the
scope of these notes.
149
= I, 0 , (i , Ai , ui )
Theorem 16. Fix G
iI and consider two Bayesian
Games based on it: BG0 = hI, 0 , 0 , 00 , (i , Ti0 , Ai , i0 , 0i , p0i , ui )iI i and
BG00 = hI, 00 , 0 , 000 , (i , Ti00 , Ai , i00 , 00i , p00i , ui )iI i . Let t0i Ti0 and t00i
0
00
Ti00 . Then, if pi (t0i ) = pi (t00i ) , ICRiBG (t0i ) = ICRiBG (t00i ) .
Proof. Take t0i Ti0 and t00i Ti00 and suppose pi (t0i ) = pi (t00i ) . We will
0
prove by induction that for every i and for every k 0 ICRik,BG (t0i ) =
00
ICRik,BG (t00i ) . To simplify notation, we will not specify the dependence
of the set of interim correlated rationalizable actions on Bayesian Game
(notice that this is also justified by the statement of the Theorem). The
result is trivially true for k = 0. Now, suppose that ICRis (t0i ) = ICRis (t00i )
for every s k 1. We need to show that ICRik (t0i ) = ICRik (t00i ) .
Take any ai ICRik (t0i ) . By definition, we can find it0 (0 Ai )
i
0
and t0i : 0 Ti (Ai ) such that (i) ai ri t0i , it0 , (ii)
i
32
Indeed, it is possible to show that the set of interim rationalizable actions for type
ti is equivalent to the set of actions obtained through an iterative construction similar
to the one we just described, but in which i : Ti (Ai ) .
150
k1
t0i (0 , ti ) [ai ] > 0 implies ai ICRi
(ti ) and (iii) for every pair
i
0
0
0 () [a ] .
, ai , t0 (, ai ) = p [ | ti ] t0i 0 () , i
i
i
k1
Let Pi
=
0j (tj ) , pk1
: tj Tj0 be the set of possible
(tj )
j
j6=i
and
,
a
to denote
0
0
0
i
BG
i
BG0
0
BG
BG0
0
k1,BG0
[0 ]BG0 ai BG0 . Finally, for every 0 , let
i
() =
j0 ()
and, similarly, for every 00 , let
0j j0 () , pk1
j
k1,BG00
i
() = 00j j00 () , pjk1 j00 () .
k1
k1
Then for every 0 0 and i
Pi
, define
h
i
X
k1
=
t0i (, ai ) .
t0i 0 , i
k1
(,ai )[0 ,i
]BG0
h
i
k1
t0i 0 , i
is the probability that type t0i assigns to the event residual
uncertainty is 0 and other agents have
private
h
i information and k 1-th
k1
k1
order beliefs given by i . If t0i 0 , i > 0, let
P
t0i (, ai )
k1 0
(,ai )[0 ,i
,a ]
k1
h i BG0i
;
ti 0 , i
a0i =
k1
t0i 0 , i
h
i
k1
0
if ti 0 , i
= 0, take any t0i =
t0j
Ti
such that
j6
=
i
k1
0j t0j , pk1
t0j
= i
and let
j
j6=i
k1
t0i 0 , i
[ai ] =
k1 0
|ICRi
(ti )|
k1 0
if ai ICRi
ti
otherwise
151
(by the inductive hypothesis, the definition of t0i does not depend on the
actual choice of t0i ).
For every (, ai ) 00 Ai define:
k1,BG00
t00i (, ai ) = p | t00i t0i 000 () ,
i
() [ai ] ,
where the previous expression is well defined since BG0 and BG00 are based
and pi (t0 ) = pi (t00 ) . Moreover, since pi (t0 ) = pi (t00 ):
on the same G
i
i
i
i
h
i
X
k1
t0i 0 , i
=
t0i (, ai ) =
k1
(,ai )[0 ,i ]BG0
hn
o
i
k1,BG0
k1
= p 0 : 0 () = 0 ,
i
() = i
| t0i =
hn
o
i
k1,BG00
k1
= p 00 : 00 () = 0 ,
i
() = i
| t00i .
Then, for every 0 , a0i , we have
X
:00
0 ()=0
hn
0
k1
00
i
()
ai =
p | t00i t0i 000 () ,
i
t00i (, ai ) =
000 ()
= 0 ,
k1,BG00
() =
k1
i
t00i
t0i
k1
0 , i
k1
k1
i
Pi
P
=
h
i
k1
t0i 0 , i
k1
k1
i
Pi
ti (, ai )
i
=
k1 0
(,ai )[0 ,i
,ai ]BG0
k1
t0i 0 , i
t0i (, ai )
0 ,a0i BG0
(,ai )[
a0i =
152
k1
00 () . We conclude that a ICRk (t00 ). The statement
ai ICRi
i
i
i
i
of the Theorem follows by induction.
Notice that, by definition, the conditional independence restriction
implicit in the construction of the interim strategic form has no bite
when there is distributed knowledge of the state (0 is a singleton); in
this case it is easy to verify that interim correlated rationalizability is
equivalent to interim rationalizability. On the contrary, suppose that there
is some residual uncertainty (|0 | > 1) and that we insist on using interim
rationalizability. A natural question arises: can we at least provide an
expressible characterization of this solution concept and, in particular, of
the independence restriction implied by it? A characterization is deemed
expressible if it can be stated in a language based on primitives (that is,
elements contained in the description of the game with payoff uncertainty,
and terms derived from them (such as, hierarchies of beliefs over these
G)
primitives).
The answer to the previous question is affirmative only in some
particular cases and the reason goes to the very heart of Harsanyis
approach. To understand why, recall that interim rationalizability requires
players to regard opponents actions as independent of the residual
uncertainty conditional on their types. But what is a type? A type is a
self-referential object: it is a private information and a belief over residual
uncertainty and other players types. Thus, unless we can establish a 1-to-1
mapping between types and agents information and belief hierarchies over
primitives, the conditional independence assumption is not expressible.
Obviously, in order to assess the existence of such a mapping, we can use
both payoff-relevant and payoff-irrelevant primitives. Thus, even though
two types represent the same private information and belief hierarchy
over payoff-relevant parameters, they can still be distinguished by the
information and beliefs hierarchies over payoff-irrelevant elements that
they capture. Whenever this is the case, the conditional independence
assumption is expressible and we can show that, given the transparency of
BG, interim rationalizability characterizes the behavioral implications of
the following epistemic assumptions: (R) players are rational, (CI) their
beliefs satisfy independence between of the opponents behavior and the
residual uncertainty conditional on the information and belief hierarchy
over primitives of the opponents, and (CB(R CI)) there is common belief
of R and CI.
153
Notice that, insofar players may believe that (i) their opponents
behavior may depend on payoff-irrelevant information, and (ii) this
information is correlated with 0 , the information and belief hierarchies
over payoff-irrelevant parameters may still be strategically relevant. For
instance, in example 23, Rowena may be superstitious and believe
that the payoff-relevant state, 0 , is correlated with a particular dream
Colin may have had (payoff-irrelevant information), which could also affect
Colins decision to bet. Similarly, in example 15, a firm may believe
that the quantity produced by its competitor depends on the analysis
carried out by its marketing department and that this (payoff-irrelevant)
information may be correlated with the actual position of the demand
function (0 ).
As a special case, in which interim rationalizability admits an
expressible characterization, we can consider Bayesian Games with
information types.
Definition 32. Fix a game with payoff uncertainty
= hI, 0 , (i , Ai , ui )iI i .
G
A Bayesian Game
BG = hI, , 0 , 0 , (i , Ti , Ai , i , i , pi , ui )iI i
has information types if Ti = i .33 In this case, the
based on G
functions (i () : Ti i )iI are trivially defined (i = Idi ).
Focusing on Bayesian games with information types, one can show that
interim rationalizability characterizes the behavioral implications of the
following epistemic assumptions: (R) rationality, (CIP I) independence
between players actions and residual uncertainty conditional on players
private information, and (CB(R CIP I)) common belief of R and CIP I.
Besides being interesting per se, Bayesian games with information types
are also widely used in the study of many relevant problems in information
economics, such as adverse selection, moral hazard, mechanism design.
Now, let us turn to the other puzzling result: the gap between ex-ante
and interim rationalizability. As already pointed out, the gap highlighted
by example 22 arises because the interim strategic form regards different
33
154
i (i )
i i
i (, i , i ),
pi ()i (i )U
i i
155
Bayesian game in which the state space has information types; although
such restriction is not necessary from the mathematical point of view,
the ex-ante strategic form interprets types as actual information received
by players concerning the initial move by nature and, consequently, the
assumption of Bayesian games with information types is the most, if not
the only, reasonable one. For every i and i ( i ) let
X
ri i = arg max
i (, i ) i (, i (i ()) , i (i ())) .35
i i
,i
i i : i ( i ) ,
1) i ri i , 2) marg i = p,
ACRik,BG =
k1
3) i (, i ) > 0 = i ACRi
k1
where ACRi
= j6=i ACRjk1 .
156
p [0 , t]
if p [0 , t] = 0,
ti (0 , ti ) [ai ] =
1
k1,BG
(ti )
ICRi
k1,BG
if ai ICRi
(ti )
k1,BG
if ai
/ ICRi
(ti )
ti (, ai ) = p [ | ti ] ti (0 () , i ()) [ai ]
Observe that:
X
(, i ) = p [0 , t] ti (0 , ti ) [ai ]
= p [ti ] p [0 , ti | ti ] ti (0 , ti ) [ai ] ,
157
Thus:
X
i (, i ) i (, i (i ()) , i (i ())) =
,i
(, i ) i (, i (i ()) , i (i ())) =
X
ti
p [ti ]
X
0 ,ti
p [0 , ti | ti ]
ai
Since i ri i and p [ti ] > 0, we conclude that i (ti ) ri (ti ,
ti ) .
Finally, if p [0 , t] = 0, by construction ti (0 () , ti ) [ai ] > 0 only if
k1,BG
ai ICRi
(ti ) . If instead, p [0 , t] > 0, the same result follows
from the definition of ex-ante correlated rationalizability and the inductive
hypothesis. Thus, for every ti , we can conclude that i (ti ) ICRik,BG (ti )
and, consequently, that:
n
o
ACRik,BG i i : ti i , i (ti ) ICRik,BG (ti ) .
Now we will prove the other inclusion. For every ti Ti , let (ati , ti ) be
a pair such that ati ICRik,BG (ti ) . Thus for every ati we can find a
rationalizing belief ati ( Ai ) and a function ati : 0 Ti
(Ai ) satisfying the properties stated in the iterative definition of interim
correlated rationalizability. Now define the belief
i ( i ) as
follows:
p () ai () (0 () , i ()) [i (i ())]
o
i (, i ) = n
k1,BG
0
0
: i (i ()) = i (i ())
i ACRi
if
o
n
0 ACRk1,BG : 0 ( ()) = ( ()) 6= and
i (, i ) =
i
i i
i i
i
0 otherwise. It is immediate to see that
i (, i ) > 0 only if i
k1,BG
0
ACRi
. Now, let [, ai ] = {( , i ) : 0 = , i (i ()) = ai } .
158
Thus, if
i 0 , i =
( 0 ,i )[,ai ]
P
=
p () ai () (0 () , i ()) [ai ]
n
o
=
0
k1,BG
0 ( ()) = a
: i
i ACRi
i
i
k1,BG
i ACRi
:i (i ())=ai
= p () ai () (0 () , i ()) [ai ] .
Also
notice
that
every
n
ofor
0 ACRk1,BG : 0 ( ()) = a
(, ai ) , if i
=
,
the
inductive
i
i i
i
k,BG
hypothesis implies that ai
/ ICRi
(i ()) and consequently
ai () (0 () , i ()) [ai ] = 0. Therefore for every (, ai ) :
X
i (, i ) = p () ai () (0 () , i ()) [ai ]
( 0 ,i )[,ai ]
and
X
i (, i ) =
i (, i ) =
ai ( 0 ,i )[,ai ]
p [] ai () (0 () , i ()) [ai ] = p []
ai
Notice that we have already proved two of the properties in the definition
of ex-ante correlated rationalizability. Finally, for every i i
X
i (, i ) i (, i (i ()) , i (i ())) =
,i
,ai
Since p [ti ] > 0 for every ti , the strategy i defined by i (ti ) = ati for every
ti is such that i ri
i . We conclude that i ACRik,BG . Since pairs
(att , ti ) were chosen arbitrarily, we can write:
n
o
ACRik,BG i i : ti i , i (ti ) ICRik,BG (ti ) .
The statement of the Theorem follows by induction.
7.6
159
(1 ) :
M, M
1, L
L, 1
0, 0
L > M > 1,
0, 0
1, L
L, 1
M, M
< 1/2
160
= {(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3), ...}
= {(q, r) N N : q = r or q = r + 1} .
If the state is (q, q 1) it means that the last message (among those
sent either by C1 or by C2 ) has been sent by C1 . If the state is (r, r)
(with r > 0) it means that the last message sent by C1 has reached C2 ,
but the confirmation by C2 has not reached C1 . The signal functions are
1 (q, r) = q, 2 (q, r) = r. The function that determines is 1 (0) = ,
1 (q) = if q > 0.37 There is a common prior given by p(0, 0) = (1 ),
p(r + 1, r) = (1 )2r , p(r + 1, r + 1) = (1 )2r+1 for any r 0 (that
is, p(q, r) = (1 )q+r1 for any (q, r) \ {(0, 0)}). This information
is summed up in the following table:
The resulting beliefs, or conditional probabilities, are then determined
36
See [31] (or the textbook by Osborne and Rubinstein [28], pp 81-84). The analysis
in terms of interim rationalizability is not contained in the original work.
37
2 is a singleton, hence 2 is a constant.
t2 =r
161
...
0,
...
1,
(1 )
...
2,
(1 )
(1 )
...
3,
(1 )4
(1 )5
...
4,
5,
...
(1 )
(1 )
8
...
9
(1 )
(1 )
...
...
...
...
...
...
...
...
as follows:
p1 [0|0] = 1,
p1 [r|r + 1] =
p2 [0|0] =
p2 [r + 1|r + 1] =
(1 )2r
1
=
,
2r
2r+1
(1 ) + (1 )
2
1
,
1 +
(1 )2r+1
1
=
(r > 0).
(1 )2r+1 + (1 )2r+2
2
1 .
162
163
From steps (i) and (ii) it follows that, for every r > 0, if the
only rationalizable profile in state (r 1, r 1) is (a, a) then the only
rationalizable profile in state (r, r 1) is (a, a); if the only rationalizable
profile in state (r, r 1) is (a, a) then the only rationalizable profile in
state (r, r) is (a, a). Since we have shown that the only rationalizable
profile in state (0, 0) is (a, a), it follows by induction that (a, a) is the only
rationalizable profile in every state.
7.7
7.7.1
Long-run interaction
164
However, we need to be aware that the payoff ui does not necessarily represent a
material gain which can be cashed. Therefore in the theoretical analysis we are not
forced not assume that the realization of ui is observed by i (observable payoffs).
41
Recall that ri (i , i ) is the set of actions of i that maximize her expected payoff
given i and i .
165
The following remark highlights the fact that in the private-value case
(when ui does not depend on 0 and i ) we get the notion of selfconfirming equilibrium already introduced in section 5.2.3, coherently with
the claim made there according to which the self-confirming equilibrium
concept only presumes that a player knows her own payoff function, not
those of the co-players.
Remark 27. Fix a game with payoff uncertainty and feedback (G, f ), a
parameter value and an action profile a , and suppose that the game
with payoff uncertainty G has private values. Then a is part of a selfconfirming equilibrium at if and only if a is part of a self-confirming
equilibrium of the game with feedback (G , f ).
7.7.2
(7.7.1)
where q0 (0 ) , qi (i ), ui : A R (i I).
In this case, however, it is only assumed that each agent in each
population i knows i and ui (i , ), hI, 0 , q0 , (i , qi , Ai , ui )iI i is not
assumed to be common knowledge. Agents play the game recurrently,
each time with a different 0 drawn from 0 and with different coplayers randomly drawn from the respective populations. If the play
stabilizes, at least in a statistical sense, for every j, j , aj , the
fraction of agents in the sub-population j with characteristic j that
choose action aj remains constant; let j (aj |j ) denote this fraction
and write i = (j (|j ))j6=i,j j to denote the profile of fractions for
the populations different from i. By random matching, the long-run
frequency of the profile of others actions and characteristics (i , ai )
166
Q
is given by j6=i j (aj |j )qj (j ) and the long-run frequency of random
shock Q
0 and opponents actions and characteristics (i , ai ) is given by
q0 (0 ) j6=i j (aj |j )qj (j ).
The ex post information feedback of an agent playing in role i is
described by a feedback function fi : A M . Hence, in the long
run, an agent of population i with characteristic i that (always) chooses
ai , observes that the frequency each message mi is
X
q0 (0 )
j (aj |j )qj (j ).
j6=i
i (i , ai ).
iI,i i
(a mixed action i (|i ) for each i and i , and a conjecture for each i, i e
ai Suppi (|i )) is a self-confirming equilibrium of (G, f ) if for every
i I, i i , ai Ai , the following conditions hold:
(1) ( rationality) if i (ai |i ) > 0, then ai ri (i , i ),
(2) ( confirmed conjectures) mi Mi , Pfaii, i [mi |i ] = Pfaii, i ,qi [mi |i ].
Remark 28. The definition needs to be modified and made more stringent
if the distributions qj are known. In this case equilibrium conjectures must
167
42
A special case of equilibrium of this kind obtains when actions are observed
(fi (, a) = a) and conjectures are naive as they do not take into account that
opponents actions depend on their type. For example in the two-agents case without
residual uncertainty, i (j , aj ) = qj (j ) i (aj ) for some i (Aj ). This is called cursed
equilibrium. Such behavior may explain, for example, the so called winners curse in
common value auctions. This refinement of conjectural equilibrium was proposed by
Eyster and Rabin [17], although the link to the conjectural equilibrium concept was not
made explicit.
Part II
Dynamic Games
168
169
In this part we extend the analysis of strategic thinking to interactive
decision situations where players move sequentially. As the play unfolds,
players obtain new information. We consider a world inhabited by
Bayesian players. When a player receives a piece of information that
was possible (had positive probability) according to his previous beliefs,
then he simply updates his beliefs according to the standard rules of
conditional probabilities. When the new piece of information is completely
unexpected, the player forms new subjective beliefs.
The observation of some previous moves by the co-players may provides
information about the strategies they are playing and/or their type. How
such information is interpreted depends, for example, on beliefs about
the co-players rationality. This introduces a new fascinating dimension to
strategic thinking.
We focus on multistage games with observable actions. As in Part
I, we first analyze games with complete information and then move on
to games with incomplete information. Unlike Part I, here we mainly
focus on equilibrium concepts. The reason is that non-equilibrium solution
concepts of the rationalizability/iterated dominance kind are harder to
define and analyze in the context of dynamic games and, so far, such ideas
have had fewer applications to economic models than rationalizability in
static games. But we believe that these ideas are extremely important
and are going to be applied more and more often. If we managed to wet
your appetite for non-equilibrium solution concepts capturing transparent
assumptions on rationality and interactive beliefs, and you want to see
more of it in the context of dynamic games, then maybe you are ready to
start consulting the literature.43
43
You may start from reviews: Battigalli and Bonanno (1999), Brandenburger (2007),
Perea (2001).
171
the converse does not hold in all games. The one-shot deviation principle
states that the one-shot deviation property implies subgame perfection in
a large class of games including all finite horizon games and all games
with discounting. This result is applied to the analysis of repeated games.
Next, we extend the analysis introducing chance moves (Section 8.5) and
two different notions of randomized strategic behavior, mixed strategies
and behavioral strategies; the latter are better suited to define subgame
perfect equilibrium in randomized strategies (Section 8.6).
8.1
Preliminary Definitions
out
1\2
B2
S2
B1
3, 1
0, 0
S1
0, 0
1, 3
(2, 2)
Figure 8.1: Battle of the Sexes with an Outside Option.
172
plays of the game: either the pair of actions (out, wait) is played and
the game ends, or one of the following 4 sequences of actions pairs is
played: ((in, wait), (B1 , B2 )), ((in, wait), (B1 , S2 )), ((in, wait), (S1 , B2 )) and
((in, wait), (S1 , S2 )). A pair of payoffs (u1 (z), u2 (z)) is associated to each
possible play z.
A possible play of the game is called terminal history. Possible
partial plays like (in, wait) are called non-terminal (or partial) histories.
The rules of the game specify which sequences of action profiles are
terminal or non-terminal histories. For each terminal history, viz. z,1
the rules of the game specify a (collective) consequence c = g(z), e.g.,
a distribution of money among the players. Each player i assigns to
each consequence c utility vi (c). As in the first part, vi represents is
preferences over lotteries of consequences ` (C) by way of expected
utility calculations. Given the rules and the utility functions, each
terminal history z is mapped to a profile of payoffs (induced utilities)
(ui (z))iI = (vi (g(z))iI . For example, the payoff pair attached to
terminal history z = ((in, wait), (S1 , S2 )) is (u1 (z), u2 (z)) = (3, 1). Under
the complete information assumption, the rules of the game (including
the consequence function g) and players preferences over lotteries of
consequences (represented by the utility functions vi , i I) are common
knowledge. Thus, also the payoff functions z 7 ui (z) are common
knowledge. In this chapter, we assume complete information and we
directly specify the payoff functions. But, as in the analysis of static
games, it is important to keep in mind that such payoff functions are
derived from preferences and the consequence function g.
When discussing a specific example like the BoS with an Outside Option
it would be simpler to represent the possible plays as sequences like (out),
((in), (B1 , B2 )), ((in), (B1 , S2 )), etc. But we use this awkward notation
for a reason. Having each inactive player automatically choose wait
simplifies the abstract notation for general games: possible plays of the
game are just sequences of action profiles (a1 , a2 , ...) where each element
at of the sequence is a profile (ati )iI and there is no need to keep track in
the formal notation of who is active. Thus, if A denotes the set of all action
profiles, histories are just sequences of elements from set A. To allow for
the theoretical possibility of games that can go on forever, we also consider
1
Since z is the last (i.e., terminal) letter of the alphabet, it seems like a good choice
as a symbol to denote terminal histories.
173
infinite sequences of elements from A. The rules of the game specify which
sequences are possible, i.e., they specify the set of histories.
Since sequences of elements from a given domain are a crucial ingredient
of the formal representation, it is useful to introduce some preliminary
concepts and notation about sequences.
8.1.1
X`
`N0
Similarly, the power set of X, contains the empty set, that is, 2X , or equivalently
{} 2X .
174
The set of all finite and infinite sequences from X (empty sequence
included) is
X N0 := X <N0 X N .
Another worth noting difference between the words of natural
languages and the histories of a game is that the latter form a rooted
tree, i.e. a partially ordered subset T X N0 where (i) the order relation
x y is x is a prefix (initial subsequence) of y, (ii) every prefix of a
sequence in T (including the empty sequence) is also a sequence in T , so
that the empty sequence is the root of the tree, and (iii) (in games that
may never end) if z X N and every (finite) prefix of z is in T then also
z T.
8.1.2
175
preferences]
A multistage game form with observable actions is a structure
hI, C, g, (Ai , Ai ())iI i
given by the following components:
For each i I, Ai is a non-empty set of potentially feasible
actions.
Let A := iI Ai and consider the set A<N0 of finite sequences
of action profiles; then, for each i I, Ai () : A<N0 Ai is a
constraint correspondence that assigns to each finite sequence of
action profiles h` = (at )`t=1 the set Ai (h` ) of actions of i that are
feasible immediately after h` . It is assumed that Ai (h0 ) 6= and, for
all h A<N0 , Ai (h) = if and only if for every j I, Aj ( h) =
(the reason for the latter assumption will be explained below).
Sequence (at )`t=1 (` = 1, 2, ..., ) is a (feasible) history if a1 A(h0 )
and at+1 A(a1 , a2 , ..., at ) for each t {1, ..., ` 1}, where A(h) :=
iI Ai (h) for each h A<N0 . Thus, a history is a sequence of action
profiles whereby each action profile is feasible given the previous
ones. By convention, the empty sequence, denoted h0 , is a history.
A<N0 denote the set of histories; a history z = (at )` H
Let H
t=1
N
is terminal if either z A or A(z) = . Let
n
o
: z AN or A(z) =
Z := z H
denote the set of terminal histories. With this
g:ZC
is the consequence function.
In order to analyze interaction between a specific group of individuals
for given rules of the game, we add their personal preferences over lotteries
of consequences, represented by utility functions:
for each i I, vi : C R is the VonNeumann-Morgenstern utility
function of player i.
176
H := H\Z
denote the set of non-terminal (or partial) histories.
ui = vi g : Z R denote the payoff function of player i.
of histories has the structure of a tree (connected graph
The set H
without cycles) with a distinguished root, when endowed with the following
prefix of precedence relation:
Definition 36. Sequence h precedes sequence h0 , written h h0 , if h
is a prefix (initial subsequence) of h0 , i.e., either h = h0 and h0 6= h0 or
h = (at )kt=1 , h0 = (bt )`t=1 , k < ` and at = bt for each t {1, ..., k}.
Given
should be able to prove that
the definitions above, the reader
0
(1) h H
if h h0 then
(2) for each sequence h A<N and each history h0 H,
h H;
177
8.1.3
Comments
8.1.4
Graphical Representation
I() : A<N 2I has to be introduced among the primitive elements defining the game;
the constraint correspondence Ai () is defined on the subset Hi = {h A<N : i I(h)}.
178
Take
Leave
Take
1
0
Leave
Take
0
2
Leave
Leave
4
0
Take
3
0
0
4
179
Take of the active player, the horizontal arrows are associated to the action
Leave of the active player. The inactive player can only Wait. We explicitly
included the pseudo-action Wait only to clarify the abstract, general
notation introduced above. But from now on we will identify histories
with sequences of actions by active players only: e.g. (Leave, Wait) will be
simply written as (Leave), ((Leave, Wait),(Wait, Leave)) will be written as
(Leave, Leave) , etc.
How would you play this game? Does your answer depend on how large
is T ?
8.1.5
According to some theorists but not ourselves all the relevant aspects of a game
are captured by this derived representation.
180
T.T
T.L
L.T
L.L
T.T
1, 0
1, 0
1, 0
1, 0
T.L
1, 0
1, 0
1, 0
1, 0
L.T
0, 2
0, 2
3, 0
3, 0
L.L
0, 2
0, 2
0, 4
4, 0
181
182
Out.S1 , if q < 41 ,
Out.B1 , if 41 < q < 23 ,
s1 =
In.B1 ,
if q > 23 .
On the other hand, one may object that specifying a complete strategy
is not necessary for rational planning: if q is not high enough, i.e. if
max{3q, 1q} < 2, Ann has no need to plan ahead for the subgame, as she
can see that it is not worth her while to reach it. Therefore her best plan
is just Out, which is a reduced strategy. Since both notions of rational
7
8
If q = 1/4 she is indifferent, and the optimal plan for the subgame is arbitrary.
Again, we ignore ties.
183
planning are meaningful and intuitive, we are not going to endorse only
the second one of them by calling plan of action the reduced strategies.
The solution concepts presented in the next section further clarify why
the notion of strategy (as opposed to reduced strategy) is important in
game theory. Roughly, we are going to elaborate on the folding back
procedure of dynamic programming. While the folding back procedure
refers to a single player and his arbitrarily given subjective beliefs, the
standard theory of multistage games looks for collective and objective
elaborations of the folding back procedure.
[UNDER CONSTRUCTION HERE]
Definition 42. Two strategies si and s0i are realization equivalent if, for
every given strategy profile of the co-players, they induce the same history:
si Si , (si , si ) = (s0i , si ).
Lemma 10. Two strategies si and s0i are realization equivalent if and only
if they are behaviorally equivalent:
si Si , (si , si ) = (s0i , si ) si i s0i .
Definition 43. The reduced strategic (or normal) form of a game
is a static game N r () = hI, (Sri , Uir )iI i where each set Sri of each player
i I is obtained from Si by replacing each class of behaviorally equivalent
strategies with a unique representative element, that is, Sri = Si | i , and
Uir (sr ) = Ui (s) for each sr jI Srj and s sr .
Definition 44. Two strategies si and s0i are payoff-equivalent, written
si i s0i if, for each strategy profile of the co-players, they yield the same
profile of payoffs:
si Si , (si i s0i ) (j I, Uj (si , si ) = Uj (s0i , si )).
Since Uj = uj (j I), Lemma 10 implies the following:
Remark 33. If two strategies are behaviorally equivalent, then they are
payoff equivalent.
Indeed, a difference between payoff-equivalence and behavioral
equivalence may arise only if the game features some ties between payoffs
184
at distinct terminal histories z 6= z 0 : (ui (z))iI = (ui (z 0 ))iI . Such ties are
structural when z and z 0 yield the same (collective) consequence, e.g.
the same allocation, g(z) = g(z 0 ); otherwise they are due to non generic
ties between utility profiles: g(z) 6= g(z 0 ) and (vi (g(z)))iI = (vi (g(z 0 )))iI .
Unfortunately, the consequence function is mostly overlooked in game
theory and very often all ties between payoffs at distinct terminal histories
are called non generic.
The reduced strategic form of the ToL4 game is represented by the
following table:
1\2
L.T
L.L
1, 0
1, 0
1, 0
L.T
0, 2
3, 0
3, 0
L.L
0, 2
0, 4
4, 0
8.2
185
Any solution concept for static games can be applied to the strategic (or
normal) form N () of a multistage game and thus yields a candidate
solution for . For example, we can find the Nash equilibria or the
rationalizable strategies of N () and ask ourselves if they make sense as
solutions of .
There are many examples of rationalizable or Nash equilibrium strategy
profiles that lack obvious credibility properties. Consider the following very
stylized Entry Game : a firm, player 1, has the opportunity to enter in
a (so far) monopolistic market. The incumbent, player 2, may fight the
entry with a price war that damages both, or acquiesce. The game is
represented in Figure 8.5:
1
In
1
1
Out
0
2
1
0
It is easily checked that this game has two pure Nash equilibria:
[Out,(f if In)] and [In,(a if In)] (plus a continuum of randomized equilibria
where 2 chooses strategy (f if In) with probability larger than 12 ). Under
complete information, if the potential entrant believes that the incumbent
is rational, than she expects acquiescence and therefore enters. The only
reasonable equilibrium seems to be [In,(a if In)]. The problem with
the Nash equilibrium is that it implies that players maximize their payoff
on the equilibrium path, i.e. along the play induced by the equilibrium
strategy profile, but it allows players to plan non maximizing actions at
histories that are off the equilibrium path.
The reasonable equilibrium of the Entry Game can be obtained in
two ways.
(1) Backward induction : We consider the position of a player who
186
Where contingencies are non-terminal histories, and hence allow for the possibility
of making mistakes, as discussed in the previous section.
8.2.1
187
Backward Induction
The backward induction solution procedure is well-defined for all the finite
games with perfect information such that different terminal histories yield
different payoffs (for every i I and for every z, z 0 Z, z 6= z 0 implies
ui (z) 6= ui (z 0 )). We refer to the latter assumption about payoffs by saying
that the game is generic. The procedure can be generalized to other perfect
information games with finite horizon10 and to some games with more than
one active player at each stage, such as the finitely Repeated Prisoners
Dilemma.
Backward Induction Procedure. Fix a generic finite game with
perfect information. Let `(h) denote the length of a history and let
d(h) = maxz:hz [`(z) `(h)] if h H, d(z) = 0 if z Z. This is the
depth of history/node h, that is the maximum length of a continuation
history following h. Recall that (h) denotes the player who is active at
R
h H. Let us define for each player i I a value function vi : H
(si (h) is uniquely defined, see above); for all j I, let vj (h) =
vj ((h, s (h))).
10
For example, the genericity property of payoff functions rules out many games to
which the backward induction procedure can be easily applied, such as the ToL game.
Here is a simple generalization satisfied by ToL: a perfect information game has no
relevant ties if, for every pair of distinct terminal histories z 0 and z 00 (z 0 6= z 00 ), the
player who is decisive for z 0 vs z 00 (i.e. the player who is active at the last common
predecessor of z 0 and z 00 ) is not indifferent between z 0 and z 00 .
188
In words, we start from the last stage of the game: h is such that all
feasible actions at h terminate the game, that is d(h) = 1.11 For example,
in the Take-it-or-Leave-it game of Figure 1 there is only one history h with
d(h) = 1, that is, h = (Leave, Leave, Leave). According to the algorithm,
the active player selects the payoff maximizing action. This determines a
profile of payoffs for all players, denoted (vi (h))iI . Now we go backward
to the second-to-last stage, or more precisely we consider histories of
depth 2: d(h) = 2. The value vi ((h, a)) has already been computed for
because such histories correspond to the last stage
all histories (h, a) H,
of the game. According to the algorithm, the active player (h) chooses
. Intuitively, the reason is that the
the feasible action that maximizes v(h)
active player expects that every following player (possibly himself) would
maximize his own payoff in the last stage. The algorithm continues to go
backward in this fashion until it reaches the first stage (h = h0 ).
Try to use the procedure to solve the Take-it-or-Leave-it game. Would
you play according to the backward induction procedure?
8.2.2
189
if also Colin leaves). The revised beliefs must assign zero probability to
every other strategy of Rowena.
We deem a player in a dynamic game is rational if he would make
expected utility maximizing choices given his (updated) beliefs for every
possible history of observed choices of her opponents. Consider ToL4. Can
one say that Rowena is irrational if she leaves three dollars on the table?
No. Rowena may hope that then Colin will be generous and leave her
four dollars. We might argue that such a belief is not very reasonable, in
fact, it is inconsistent with the rationality of Colin; yet, if Rowena had this
belief, leaving three dollars on the table would be rational, i.e. expected
utility maximizing.
In the analysis of static games, one can capture with a solution
concept the following assumptions: all players are rational and there is
common (probability-one) belief of rationality. The solution concept is
rationalizability. An action is rationalizable if and only if it is iteratively
undominated.
Is there an analog of rationalizability for dynamic games? What can
one say about generic finite games with perfect information?
Note that in generic games there is a one-to-one relationship between
terminal histories and payoff vectors. Here we make use of this to
refer to either terminal histories or the corresponding payoff vectors as
outcomes. This allows me to identify corresponding outcomes in the
given perfect information game and in its strategic form. In particular, we
are interested in the outcome induced by the backward induction solution.
It is sometimes argued that the compelling logic of the backward
induction solution rules out all the non-backward-induction outcomes as
inconsistent with common certainty of rationality. If this claim were
correct, backward induction would be the analog of rationalizability for
(generic and finite) games with perfect information.
To assess this claim, let us first restrict our attention to games with
two stages, such as the Entry Game. It is pretty clear that in such
games if the players are rational in the sense specified above and if
the first mover believes that also the opponents are rational, then the
backward induction play obtains. Furthermore, the backward induction
outcome can be obtained on the normal form of the game by first
eliminating the weakly dominated strategies and then eliminating the
(strictly) dominated strategies of the residual normal form. Thus the
190
The reason why we start with the elimination of weakly dominated strategies is that,
if a strategy si prescribes an irrational continuation at a given history h consistent with
si itself, then this shows up in the normal form as si being weakly dominated. The
dominance relation is, in general, only weak because the opponents might behave in
such a way that h does not occur, implying that the flaw of si does not materialize. For
example, the strategy LL of Colin is irrational, because Colin leaves 4 dollars on the
table at his second move. This strategy is weakly, but not strictly, dominated by LT .
LL cannot be strictly dominated because both LL and LT are ex ante best responses to
strategy T L (or T T ) of Rowena: if she takes immediately the strategy of Colin cannot
affect his payoff.
191
192
(...)
SB(R SB(R) SB(R SB(R))...): every player strongly believes the
conjunction of all the assumptions made above,
(...).
Once the best rationalization principle is properly formalized, it can
be shown to yield, for generic games, the iterative deletion of weakly
dominated strategies.15 Since in generic games of perfect information
iterated weak dominance yields the backward induction outcome, we
can deduce that in such games the best rationalization principle
implies the backward induction outcome.16 [Add more here: only
path equivalence. + CB in rational Planning and Continuation
Consistency]
8.2.3
193
194
8.3
195
Recall that, for any non-terminal history h = (a1 , ..., at ) and player i I,
shi is the strategy that agrees with si at all histories h0 that do not precede
h and chooses the actions contained in h at all histories preceding h [that is,
shi selects action aki immediately after the initial sub-history (a1 , ..., ak1 )
of history h, for each k = 1, ..., t]. In particular, the equality shi (h) = si (h)
holds by definition. Now change shi at h so that it chooses action ai Ai (h).
The resulting strategy is denoted (si |h ai ). That is,
0
(si |h ai )(h ) =
ai ,
shi (h0 ),
if h = h0
otherwise
8.3.1
Finite horizon
196
197
8.3.2
It can also be shown that the finitely repeated Prisoners Dilemma has a unique
Nash equilibrium path, which of course must be a sequence of (cheat,cheat).
198
t
t
e
e
e
h, h Z satisfying h = h , we have ui (h) ui (h) < . This means that
what happens in the distant future has little impact on overall payoffs.
Example. At every stage all the action profiles
in
set
the Cartesian
S
t
0
A = iI Ai are feasible. Let H = {h }
t>0 A , Z = Z = A .
For every i I and t = 1, 2, ... there is a period-t (flow) payoff function
vi,t : At R . There is some positive real v such that for all i I,
sup |vi,t (ht )| < v.
ht At
Payoffs are assigned to terminal histories as follows: for each player i there
is a discount factor i (0, 1) such that for all h Z,
vi (h ) = (1 i )
X
(i )t1 vi,t (ht ).
t=1
You
can
check
that
such
games
are
continuous
at infinity. Infinitely repeated games with discounting are a special case
where vi,t (a1 , ..., at1 , at ) = vi (at ) for some stage game function vi .
Theorem 19. ( One-Shot Deviation Principle with continuity at infinity).
Suppose that is continuous at infinity. Then for any strategy profile s
the following conditions are equivalent:
(1) s is a subgame perfect equilibrium
(2) s satisfies the one-shot-deviation property.
Proof. (1) implies (2) by definition. To show that (2) implies (1), we
prove the contrapositive: if s is not a subgame perfect equilibrium, then
The case Z = is included for completeness. This is similar to stipulating that a
function with a finite domain is continuous by definition.
18
199
(8.3.1)
For anyD positive integer TE, let us define the auxiliary truncated game
T,s = I, (ATi (), uT,s
i )iI with horizon T derived from as follows:
ATi (h) = Ai (h) if `(h) < T and ATi (h) = otherwise, thus
T = {h H
: `(h) T } and Z T = {h H
T :either `(h) = T , or
H
`(h) < T and h Z}
for all h Z Z T , uT,s
i (h) = ui (h),
h
h
for all h Z T \Z, uT,s
i (h) = ui ((s )) (recall that (s ) is the
terminal history induced by strategy profile sh , where sh is the
profile consistent with history h that behaves as s at all histories
not preceding h).
200
8.4
Some Simple
Games
Results
About
Repeated
The previous analysis can be applied to study repeated games. For any
static game G = hI, (Ai , vi )iI i where each payoff function vi : A R
is bounded, let ,T (G) denote the T -stage game with observable actions
obtained by repeating G for T times and computing payoffs ui : AT R
as discounted time averages, with common discount factor (0, 1), that
is, for every terminal history z = (a1 , a2 , ...) and each player i
1
ui (a , a , ...) =
T
X
t1 vi (at ).
(8.4.1)
t=1
u
i (a , a , ...) = (1 )
T
X
t1 vi (at ),
(8.4.2)
t=1
201
T
X
t1 vi (
a) = vi (
a).
t=1
T
X
t vi (
a ).
=t+1
Note that the second term in this expression does not depend on ai , because
the equilibria of G played in the future may depend on calendar time, but
do not depend on past actions. Since a
t is an equilibrium of G, for every
ai , vi (ai , a
ti ) vi (
ati , a
ti ). Therefore i has no incentive to deviate from
t
a
i in stage t.
By Theorem 20, any set of assumptions that guarantee the existence
of an equilibrium of G also guarantee the existence of a SPE of ,T (G).
The next result identifies situations in which a SPE must be the
repetition of the Nash equilibrium of G.
19
That is, for all t = 1, 2, ..., ht1 At1 and i I, si (ht1 ) = ati .
202
k
X
=1
vi (a ) vi (ai , si (h)) +
k
X
vi (a ).
=1
Note that the summation term is the same on both sides of the inequality
because the inductive hypothesis implies that if s is followed in the last
k periods, future payoffs are independent of current actions. Therefore
the action profile s(h) is such that for every i I, for every ai Ai ,
vi (s(h)) vi (ai , si (h)), implying that s(h) is a Nash equilibrium of G.
But the only Nash equilibrium of G is a ; thus, s(h) = a .
20
The finitely repeated PD is a very simple game. Since the PD has a dominant action
equilibrium, the subgame perfect equilibrium can be obtained by backward induction.
Another result that holds for the finitely repeated PD is that, although it has many
Nash equilibrium strategy profiles, they are all equivalent, and hence they all induce the
permanent defection path.
203
4, 4
0, 5
1, 0
5, 0
2, 2
1, 0
0, 1
0, 1
0, 0
204
(IC)
ai Ai
that is,
supai Ai vi (ai , ai ) vi (a )
.
supai Ai vi (ai , ai ) vi (a )
supai Ai vi (ai , ai ) vi (a )
< 1.
supai Ai vi (ai , ai ) vi (a )
205
8.5
206
the previous history, this assumption does not entail any loss of generality.
For example, one may model games where chance moves occur before or
after the moves of real players. To minimize on notational changes we also
ascribe a utility function u0 to the chance player 0, but we assume that it
is constant. This is just another notational trick.
A multistage game with observable actions and chance moves
is a structure
n
= Ai , Ai (), ui i=0 , 0 .
The symbols Ai , Ai (), ui have the same meaning as before. From the
constraint correspondences Ai () (i = 0, ..., n) we can derive the set of
the set of terminal histories Z and the set of nonfeasible histories H,
external observer were uncertain about the true strategy profiles, (z|s)
would represent the probability of z conditional on the players following
the strategy profile s, which explains the notation). Therefore the outcome
function when there are chance moves has the form : S (Z). The
Recall that o (X) = { (X) : Supp = X}, the superscript o stands for
(relatively) open in the hyperplane of vectors whose elements sum to 1.
207
action of player i at stage t in history z, let ht1 (z) denote the prefix
(initial subhistory) of z of length t, and recall that `(z) is the length of z.
Then
s S, Z(s) = {z Z : t {1, ..., `(z)}, i I, ati (z) = si (ht1 (z))},
( Q
`(z)
t
t1 (z)), if z Z(s),
In words, (z|s)
is the product of the probabilities of the actions taken by
the chance player in history z.
A Nash equilibrium of is a strategy profile s such that for every
i I and for every si Si ,
X
X
(z|s
)ui (z)
(z|s
i , si )ui (z).
z
z Z, (z|h;
s) =
( Q
`(z)
t
t1 (z)),
t=`(h)+1 0 (a0 (z)|h
0,
if z Z(h, s),
if z
/ Z(h, s).
(z|h,
s )ui (z)
(z|h,
si , si )ui (z).
z
Let (|h,
ai , s) (Z) denote the probability measure on Z that
results if s is followed starting from h, except that player i chooses
208
(z|h,
s)ui (z)
(z|h,
ai , s)ui (z).
z
It can be shown that in every game with finite horizon or with continuity
at infinity (hence in every game with discounting) a strategy profile is a
subgame perfect equilibrium if and only if it has the OSDP.
8.6
Randomized Strategies
8.6.1
Realization-equivalence
behavioral strategies
between
mixed
and
209
i (Si (h, ai ))
i (Si (h))
(8.6.1)
P
[for any subset X Si , I write i (X) = si X i (si )]. If i (Si (h)) = 0,
i (|h) can be specified arbitrarily. Note that (8.6.1) can be written in a
more compact, but slightly less transparent form:
h H, i (ai |h)i (Si (h)) = i (Si (h, ai )).
23
If the game has chance moves the definition in the text must be adapted by including
in si also s0 , the strategy of the chance player.
210
The formula above, or (8.6.1), says that i and i are mutually consistent
in the sense that they jointly satisfy a kind of chain rule for conditional
probabilities.
Definition 48. A mixed strategy i (Si ) and a behavioral strategy
i = (i (|h))hH hH (Ai (h)) are mutually consistent if they satisfy
(8.6.1).
Remark 37. If mixed strategy i is such that i (Si (h)) = 0 for some h
where player i is active, then there is a continuum of behavioral strategies
i consistent with i . If i (Si (h)) > 0 for every h where i is active, then
there is a unique i consistent with i . If i is active at more than one
history, then there is a continuum of mixed strategies i consistent with
any given behavioral strategy i .
The last statement in the remark can be understood with a countingdimensionality argument. Let Hi := {h H : |Ai (h)| 2} denote
the set of histories where
Q i is active. In a finite game, the number of
elements of SQ
i is |Si | =
hHi |Ai (h)| , thus the dimensionality of (Si )
of
is |Si | 1 = hHi |Ai (h)| 1. OnPthe other hand, the dimensionality
P
the set of behavioral strategies is hHi (|Ai (h)| 1) = hHi |Ai (h)|
|H
shown (by induction on the cardinality of Hi ) that
Q i |. It can be P
24
|A
(h)|
i
hHi
hHi |Ai (h)| because |Ai (h)| 2 for each h Hi .
Thus the dimensionality of (Si ) is higher than the dimensionality of
hH (Ai (h)) if |Hi | 2.
Example 25. In the game tree in Figure 3, Rowena (player 1) has
4 strategies, S1 = {out.u, out.d, in.u, in.d}; two of them out.u and out.d
are realization-equivalent and correspond to the plan of action out. The
figure labels terminal histories: v = (out), w = (in, (u, l)) etc. The set of
non-terminal histories is H = {h0 , (in)} (recall that h0 denotes the initial,
empty, history) and S1 (h0 ) = S1 , S1 (in) = {in.u, in.d}.
24
First note that for every integer L 1, 2L1 L. (This can be easily proved
by induction: it is trivially true for L = 1; suppose it is true for some L 1, then
2L = 2 2L1
Q
PL 2 L = L + L L + 1.) Let nk 2 for each k = 1, 2, .... I prove that
L
n
k
k=1
k=1 nk for each L = 1, 2, .... Let n = max{n1 , ..., n` }. Then
L
Y
k=1
nk n 2L1 n L
L
X
k=1
nk .
211
in
1
out
1\2
v
Figure 8.7: A game tree.
p p
1p
, (out.d) =
,
2 1
2
1 p
2
, 1 (in.d) = .
6
6
1 2
1
1 (S1 (in))
= + = ,
1 (S1 )
6 6
2
1 (S1 (in, u))
=
S1 (in)
1
6
1
6
2
6
1
= .
3
212
1 1
1
1 2
2
= , 1 (out.d) = = ,
2 3
6
2 3
6
1
1 2
2
1 1
= , 1 (in.d) = = .
2 3
6
2 3
6
This is one part of what is known as Kuhns theorem for mixed and behavioral
strategies. For a proof, see the appendix.
213
z Z, (z|)
=
i (si ) ,
s:(s)=z
iI
`(z)
z Z, (z|)
=
YY
t=1 iI
(Kuhn)27
Theorem 23.
For any two profiles of mixed and behavioral
strategies and the following statements hold:
(a) Fix any i I, if i and i satisfy either (8.6.1) or (8.6.2) then
i , si ) = (|
i , si ).
si Si , (|
(b) If, for all i I, i and i satisfy either (8.6.1) or (8.6.2) then
(|)
= (|).
Example 27. Again, the example of Figure 3 illustrates. For Rowena
(player 1) consider the randomized strategies 1p and 1 of Example
25. Colin (player 2) is active at only one history, hence there is an
obvious isomorphism between his mixed and behavioral strategies. Let
1 , l) = (|
1 , l), (|
1 , r) = (|
1 , r) and
2 (l) = 2 (l|in) = q. Then (|
(v|)
=
(y|)
=
26
= (v|),
2
q
= (y|),
3
q
1q
(w|)
= = (w|),
(x|)
=
= (x|),
6
6
1q
(z|)
=
= (z|).
3
Recall that (s) is the terminal history induced by strategy profile s. We introduced
and ht1 (z) in the previous section: ati (z) is the action played by i at stage t in
history z, ht1 (z) is the prefix of length t 1 of z.
27
In his seminal (1953) article, Harold Kuhn proved two important results about
general games with a sequential structure (so called extensive form games ), one
concerns the realization-equivalence of mixed and behavioral strategies under the
assumption of perfect recall (a generalization of the observable actions assumption),
the other concerns the existence of equilibria.
ati (z)
214
8.6.2
Randomized equilibria
t=`(h)+1
z Z, (z|h, ) =
0,
otherwise
(z|
)ui (z)
(z|
i , )ui (z).
(z|h,
)ui (z)
(z|h,
i , )ui (z).
(z|h,
)ui (z)
(z|h,
ai , )ui (z).
The formula is
Z, (z|h,
ai , ) =
Q
( Q
Q
`(h)+1
`(z)
t
t1
(z)|h`(h) (z))
(z)),
j6=i j (aj
jN j (aj (z)|h
t=`(h)+2
0,
`(h)+1
if h z, ai
(z) = ai
otherwise
215
P[h| ] > 0
(z|h,
)ui (z)
(z|h,
i , )ui (z).
z
Adapting arguments used for pure strategy profiles one can show that
the One-Shot-Deviation Principle also holds for behavioral profiles:
Theorem 24. ( One-Shot-Deviation Principle) In every finite game a
behavioral strategy profile is a subgame perfect equilibrium if and only if
it has the One-Shot-Deviation Property.
The One-Shot-Deviation Principle allows a relatively simple proof of
existence of randomized equilibria in finite games:
Theorem 25. (Kuhn) Every finite game has at least one subgame perfect
equilibrium in behavioral strategies.
Proof. We provide only a sketch of the proof. Construct a profile
with a kind of backward induction procedure. All histories with
depth one29 define a static last stage game that has at least one mixed
29
216
equilibrium. Thus, to every h of depth one we can associate a stagegame equilibrium (|h) iI (Ai (h))
payoff profile
Q and an equilibrium
P
that a mixed action profile (|h) and a payoff profile (vi (h))iI has been
assigned to each history with depth k or less. Then one can assign a mixed
action profile (|h) and a payoff profile (vi (h))iI to each
history h with
depth k + 1: just pick a mixed equilibrium of the game I, (Ai (h), vi )iI
where for every i I and for every a A(h), vi (a) = vi (h, a) (note that
(h, a) has depth k or less, therefore vi (h, a) is well defined). The profile
constructed in this way must satisfy the One-Shot-Deviation property
and therefore it is a subgame perfect equilibrium.
8.7. Appendix
8.7
8.7.1
217
Appendix
Epistemic Analysis Of The ToL4 Game
218
history h0 and each (other) history where i is active; these beliefs are
represented by the triple (i (t1i |h), i (t2i |h), i (t3i |h)); I am considering
deterministic beliefs just for simplicity. For example, type t11 of player 1
(Rowena), who takes immediately, starts thinking that the type of 2 (Colin)
is t12 , and hence that Colin would take immediately if given the opportunity;
but ti would be forced to change her mind at history h = (L, L). The belief
that t11 would have at this history is that the type of Colin is t22 , and thus
that Colin would take at the next round if given the opportunity. [Note
that Colin must assign probability one to strategy LL of Rowena if history
(L, L, L) occurs, as this is the only strategy of Rowena consistent with
(L, L, L). Since Colin thinks that t31 is the only type of Rowena playing
LL, he assigns probability one to t31 at h = (L, L, L). This explains the
last column of the second table.]
type of 1
plan of 1
beliefs at h0
bel. at h = (L, L)
t11
(1, 0, 0)
(0, 1, 0)
t21
t31
LT
(0, 1, 0)
(0, 1, 0)
LL
(0, 0, 1)
(0, 0, 1)
type of 2
plan of 2
beliefs at h0
bel. at h = (L)
bel. at h = (L, L, L)
t12
t22
t32
(1, 0, 0)
(0, 1, 0)
(0, 0, 1)
LT
(1, 0, 0)
(0, 0, 1)
(0, 0, 1)
LL
(0, 0, 1)
(0, 0, 1)
(0, 0, 1)
By inspection of the tables one can see that all types but t32 are
rational, in the sense they their plan of action maximizes the players
conditional expected payoff for each history consistent with the plan itself
(in this analysis we may neglect the histories inconsistent with the plan
of ti when we check for the rationality of ti , this make things slightly
simpler). Thus, the set of states where (sequential) rationality holds is
R = {t11 , t21 , t31 } {t12 , t22 }.
Now note that type t31 of Rowena is rational, but does not believe in the
rationality of Colin. On the other hand, in each state of the world in the
= {t1 , t2 } {t1 , t2 } R each player initially assigns probability
subset
1 1
2 2
Thus, at each state in
players
one to some type of the other player in .
are not only rational they are also initially certain of rationality.
8.7. Appendix
219
there is rationality
Suppose, by way of induction, that at each state in
and initial mutual belief in rationality up to order k. Since at each state in
the players initially assign probability one to (the opponents component
LT
LL
1, 0
1, 0
1, 0
LT
0, 2
3, 0
3, 0
LL
0, 2
0, 4
4, 0
LT
1, 0
1, 0
LT
0, 2
3, 0
LL
0, 2
0, 4
220
LT
1, 0
1, 0
LT
0, 2
3, 0
8.7.2
H and a
(it is
Proof of Remark 38. Fix i, i , i , h
i Ai (h)
notationally convenient to put a hat on the fixed history and action).
> 0 implies
Suppose that (8.6.2) holds; it must be shown that i (Si (h))
i (
ai |h) = i (Si (h, a
i ))/i (Si (h)). For any h h, let
ai (h) denote the
from h (in other words,
action taken by i at h to reach h
a(h) = (
aj (h))jI
Y
Y
Y
i (si (h)|h) =
i (
ai (h)|h)
i (si (h)|h) ,
hH
hH:hh
hH:hh
a
and for each si Si (h,
i ),
Y
Y
i (si (h)|h) =
i (
ai (h)|h)i (
ai |h)
hH
hH:hh
i (si (h)|h) .
hH:hh
Therefore,
i (Si (h))
=
i (si ) =
si Si (h)
hH:hh
i (si (h)|h)
hH
si Si (h)
i (
ai (h)|h)
hH:hh
si Si (h)
i (si (h)|h) ,
8.7. Appendix
221
X
a
i (Si (h,
i )) =
i (si ) =
i (si (h)|h)
i ) hH
si Si (h,a
i)
si Si (h,a
i (
ai (h)|h) i (
ai |h)
hH:hh
i (si (h)|h) ,
ai ) hH:hh
si Si (h,
in any order,
Count the partial histories that do not weakly precede h
L
X
i (si (h)|h) =
i (ai,k |hk ) = 1
ai ) hH:hh
si Si (h,
i (si (h)|h) =
hH:hh
si Si (h)
i (ai |h)
L
X
i (ai,k |hk ) = 1.
ai Ai (h)
Then
Y
i (Si (h))
=
i (
ai (h)|h),
hH:hh
a
i (Si (h,
i )) =
i (
ai (h)|h) i (
ai |h)
hH:hh
and
i (Si (h))
= i (
i (
ai |h)
ai |h)
hH:hh
a
i (
ai (h)|h) = i (Si (h,
i ))
9.1
223
!
Ai () :
A1 ... An
S t
A
t0
, At
length t;
by convention A0 := {h!0 } where h0 is the empty sequence;
S t
sequences h
A are called histories; Ai (h) Ai is the set of
t0
then (a , ...a ) H if and only if a1 A(h0 ) and ak+1 A(a1 , ..., ak ) for
each k = 1, ..., t 1.]
Whenever we do not say otherwise the following applies:
!
S
t
Assumption (finiteness):
For some T , H
A ,
0tT
is finite.
furthermore H
224
9.2
Dynamic environments
information
with
incomplete
9.2.1
Rationalizability
225
226
9.3
D
E
Structure I, 0 , i , Ai , Ai (), ui iI does not specify the exogenous
interactive beliefs of the players, therefore it is not rich enough to define
standard notions of equilibrium.
As in the case of static games we should add interactive beliefs
structures (type spaces) `
a la Harsanyi.
Since the dynamics involve additional complications of strategic
analysis, we simplify the interactive beliefs aspect: we assume that the
set of types `
a la Harsanyi Ti coincide with (more precisely, is isomorphic
to) i . To emphasize this assumption we use the phrase simple Bayesian
game :
Let pi (|i ) (0 i ) denote the exogenous belief of type i . A
simple dynamic Bayesian game is a structure
D
E
= I, 0 , i , Ai , Ai (), ui , (pi (|i ))i i iI
We could have also specified priors pi () with pi (i ) > 0 for all
i and let pi (0 , i |i ) = pi (0 , i , i )/pi (i ).
9.3.1
Bayesian equilibrium
pi (0 , i |i )ui 0 , i , i , (si , si ) ,
0 ,i
with si
= (sj )jI\{i} Si .
227
In randomized
228
9.4
Bayes Rule
10
[
n=1
:=
k
for some k = 0, ..., n
n
229
P(|x) = P
(BFor)
230
9.5
231
232
P(a|i , h; , ) =
0
0
i (00 , i
|i , h)P(a|00 , i , i
, h; ).
0
00 ,i
P(a|, h; )i (0 , i |i , h)
.
P(a|i , h; , )
233
[... $\prod_{k=2}^{\ell}$ ...]

Then
$$\mathrm{E}[u_i|\theta_i, h, a_i; \sigma, \mu] = \sum_{\theta_0', \theta_{-i}', a_{-i} \in A_{-i}(h)} \mu_i(\theta_0', \theta_{-i}'|\theta_i, h)\, \sigma_{-i}(a_{-i}|\theta_{-i}', h)\, \mathrm{E}[u_i|\theta', (h, (a_i, a_{-i})); \sigma],$$
where $\theta' = (\theta_0', \theta_i, \theta_{-i}')$ and
$$\mathrm{E}[u_i|\theta', (h, (a_i, a_{-i})); \sigma] = \begin{cases} u_i(\theta', (h, (a_i, a_{-i}))), & \text{if } (h, (a_i, a_{-i})) \in Z, \\ \sum_{z : (h, (a_i, a_{-i})) \prec z} \mathrm{P}(z|\theta', (h, (a_i, a_{-i})); \sigma)\, u_i(\theta', z), & \text{otherwise.} \end{cases}$$
9.6 Signaling Games and Perfect Bayesian Equilibrium
In some models the set of feasible actions of player 1 depends on $\theta$ and is denoted by $A_1(\theta)$. The set of potentially feasible actions of player 1 is $A_1 = \bigcup_{\theta \in \Theta} A_1(\theta)$.
We let $\mu(\theta|a_1)$ denote the probability that Bob would assign to $\theta$ upon observing action $a_1$ of Ann. Since Bob chooses $a_2$ after he has observed $a_1$ in order to maximize the expectation of $u_2(\theta, a_1, a_2)$, the system of conditional probabilities $\mu = (\mu(\cdot|a_1))_{a_1 \in A_1}$ is an essential ingredient of equilibrium analysis. In the technical language of game theory, $\mu$ is called a system of beliefs.
Now, if $\mathrm{P}(a_1) = \sum_{\theta'} \sigma_1(a_1|\theta')\pi(\theta') > 0$, then Bayes formula applies and
$$\mu(\theta|a_1) = \frac{\mathrm{P}((\theta, a_1))}{\mathrm{P}(a_1)} = \frac{\sigma_1(a_1|\theta)\pi(\theta)}{\sum_{\theta'} \sigma_1(a_1|\theta')\pi(\theta')}.$$
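A direct computational rendering of this formula, including the degenerate case $\mathrm{P}(a_1) = 0$ in which Bayes formula is silent (a sketch; all names hypothetical):

# Posterior over Ann's types after observing a1 (sketch; names hypothetical).
def posterior(a1, sigma1, prior):
    """mu(.|a1) by Bayes formula; None if P(a1) = 0."""
    p_a1 = sum(sigma1[(a1, th)] * prior[th] for th in prior)
    if p_a1 == 0:
        return None  # off-path beliefs are not pinned down by Bayes rule
    return {th: sigma1[(a1, th)] * prior[th] / p_a1 for th in prior}

prior = {"t_low": 0.5, "t_high": 0.5}
sigma1 = {("left", "t_low"): 1.0, ("left", "t_high"): 0.5,
          ("right", "t_low"): 0.0, ("right", "t_high"): 0.5}
print(posterior("right", sigma1, prior))  # {'t_low': 0.0, 't_high': 1.0}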
Since $\sigma_1$ is endogenous, the system of beliefs $\mu$ is endogenous as well. Thus we have to determine the triple $(\sigma_1, \sigma_2, \mu)$ through equilibrium analysis. In technical game-theoretic language, $(\sigma_1, \sigma_2, \mu)$ (a profile of behavioral strategies plus a system of beliefs) is called an assessment.
For any given assessment $(\sigma_1, \sigma_2, \mu)$ we use the following notation to abbreviate conditional expected payoff formulas:
$$\mathrm{E}[u_1|\theta, a_1; \sigma_2] := \sum_{a_2 \in A_2(a_1)} \sigma_2(a_2|a_1)\, u_1(\theta, a_1, a_2),$$
$$\mathrm{E}[u_2|a_1, a_2; \mu] := \sum_{\theta} \mu(\theta|a_1)\, u_2(\theta, a_1, a_2).$$
$$\forall \theta \in \Theta,\ \forall a_1 \in A_1: \quad \sigma_1(a_1|\theta) > 0 \ \Rightarrow\ a_1 \in \arg\max_{a_1' \in A_1} \mathrm{E}[u_1|\theta, a_1'; \sigma_2], \qquad (\mathrm{BR}_1)$$
$$\forall a_1 \in A_1,\ \forall a_2 \in A_2(a_1): \quad \sigma_2(a_2|a_1) > 0 \ \Rightarrow\ a_2 \in \arg\max_{a_2' \in A_2(a_1)} \mathrm{E}[u_2|a_1, a_2'; \mu], \qquad (\mathrm{BR}_2)$$
$$\forall a_1 \in A_1 \text{ with } \sum_{\theta'} \sigma_1(a_1|\theta')\pi(\theta') > 0,\ \forall \theta: \quad \mu(\theta|a_1) = \frac{\sigma_1(a_1|\theta)\pi(\theta)}{\sum_{\theta'} \sigma_1(a_1|\theta')\pi(\theta')}. \qquad (\mathrm{CONS})$$
Note that each equilibrium condition involves two out of the three vectors of endogenous variables $\sigma_1$, $\sigma_2$ and $\mu$: $(\mathrm{BR}_1)$ says that each mixed action $\sigma_1(\cdot|\theta) \in \Delta(A_1)$ ($\theta \in \Theta$) is a best reply to $\sigma_2$ for type $\theta$ of Ann, $(\mathrm{BR}_2)$ says that each mixed action $\sigma_2(\cdot|a_1) \in \Delta(A_2(a_1))$ ($a_1 \in A_1$) is a best reply for Bob to the conditional belief $\mu(\cdot|a_1) \in \Delta(\Theta)$, and $(\mathrm{CONS})$ says that $\sigma_1$ and $\mu$ (together with the exogenous prior $\pi$) are consistent with Bayes rule.
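To make the three conditions concrete, here is a brute-force checker for finite signaling games (a minimal sketch: it assumes type-independent message sets and dictionary-coded strategies; all names are hypothetical rather than the notes' notation):

# Brute-force test of (BR1), (BR2), (CONS); sketch, names hypothetical.
def is_pbe(sigma1, sigma2, mu, Theta, A1, A2, pi, u1, u2, tol=1e-9):
    def E_u1(th, a1):  # expected payoff of type th from message a1
        return sum(sigma2[a1][a2] * u1(th, a1, a2) for a2 in A2)

    def E_u2(a1, a2):  # Bob's expected payoff given his belief mu[a1]
        return sum(mu[a1][th] * u2(th, a1, a2) for th in Theta)

    for th in Theta:   # (BR1): types only send payoff-maximizing messages
        best = max(E_u1(th, a1) for a1 in A1)
        if any(sigma1[th][a1] > tol and E_u1(th, a1) < best - tol
               for a1 in A1):
            return False
    for a1 in A1:      # (BR2): Bob best-replies to mu[a1] after every a1
        best = max(E_u2(a1, a2) for a2 in A2)
        if any(sigma2[a1][a2] > tol and E_u2(a1, a2) < best - tol
               for a2 in A2):
            return False
    for a1 in A1:      # (CONS): Bayes rule wherever P(a1) > 0
        p_a1 = sum(sigma1[th][a1] * pi[th] for th in Theta)
        if p_a1 > tol:
            for th in Theta:
                if abs(mu[a1][th] - sigma1[th][a1] * pi[th] / p_a1) > tol:
                    return False
    return True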
It should be emphasized that, for some action $a_1$, $\mathrm{P}(a_1) = \sum_{\theta'} \sigma_1(a_1|\theta')\pi(\theta')$ may be zero. For example, suppose that for each $\theta \in \Theta$ there is some action $a_1'$ such that $\mathrm{E}[u_1|\theta, a_1'; \sigma_2] > \mathrm{E}[u_1|\theta, a_1; \sigma_2]$. Then action/message $a_1$ must have zero probability because it is not a best reply for any type of Ann. Yet we assume that the belief $\mu(\cdot|a_1)$ is well defined and Bob takes a best reply to this belief. This is a perfection requirement analogous to the subgame perfection condition for games with observable actions and complete information. A perfect Bayesian equilibrium satisfies perfection and consistency with Bayes rule.
Furthermore, even if $\mu(\cdot|a_1)$ cannot be computed with Bayes formula, it may still be the case that the equilibrium conditions put constraints on the possible values of $\mu(\cdot|a_1)$. The following example illustrates this point.
[Figure: a signaling game. Nature draws $\theta'$ or $\theta''$ with probability $\frac{1}{2}$ each; Ann (player 1) chooses $l$ or $r$; after $r$, Bob (player 2) chooses $u$ or $d$. Payoffs (Ann, Bob): for $\theta'$: $l \to (1,1)$, $(r,u) \to (0,3)$, $(r,d) \to (0,0)$; for $\theta''$: $l \to (1,1)$, $(r,u) \to (2,0)$, $(r,d) \to (0,1)$.]
[...] reply of Bob is down, i.e. $\sigma_2(d|r) = 1$, and the best reply of type $\theta''$ is left, i.e. $\sigma_1(l|\theta'') = 1 - \sigma_1(r|\theta'') = 1$, contradicting our initial assumption. We conclude that in every PBE $r$ is chosen with probability zero and $\mu(\cdot|r)$ cannot be determined with Bayes formula. Yet the equilibrium conditions put a constraint on $\mu(\cdot|r)$: in equilibrium $d$ must be (weakly) preferred to $u$ (if Bob chose $u$ after $r$, then type $\theta''$ would choose $r$, and we have just shown that this cannot happen in equilibrium). Therefore
$$\mu(\theta''|r) \geq 3\mu(\theta'|r), \quad \text{or} \quad \mu(\theta''|r) \geq \frac{3}{4}.$$
The set of equilibrium assessments is
$$\left\{ (\sigma_1, \sigma_2, \mu) : \sigma_1(l|\theta') = \sigma_1(l|\theta'') = 1,\ \sigma_2(d|r) = 1,\ \mu(\theta''|r) > \frac{3}{4} \right\}$$
$$\cup \left\{ (\sigma_1, \sigma_2, \mu) : \sigma_1(l|\theta') = \sigma_1(l|\theta'') = 1,\ \sigma_2(d|r) \geq \frac{1}{2},\ \mu(\theta''|r) = \frac{3}{4} \right\}.$$
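A quick numeric check of the two thresholds just derived, assuming the payoffs of the example as reconstructed above:

# Thresholds of the example: m = mu(theta''|r), q = sigma2(d|r).
def bob_prefers_d(m):
    # E[u2|r,d] = m * 1 versus E[u2|r,u] = (1 - m) * 3
    return 1.0 * m >= 3.0 * (1.0 - m)

def both_types_prefer_l(q):
    theta1_dev = 0.0 * (1.0 - q) + 0.0 * q  # theta' payoff from r
    theta2_dev = 2.0 * (1.0 - q) + 0.0 * q  # theta'' payoff from r
    return theta1_dev <= 1.0 and theta2_dev <= 1.0  # l pays 1 to each type

assert bob_prefers_d(0.75) and not bob_prefers_d(0.74)
assert both_types_prefer_l(0.5) and not both_types_prefer_l(0.49)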
These assessments are examples of pooling equilibria. A pooling equilibrium is a PBE assessment where all types of Ann, the informed player, choose the same pure action with probability one: there exists $\bar{a}_1 \in A_1$ such that $\sigma_1(\bar{a}_1|\theta) = 1$ for every $\theta \in \Theta$. In this case Bayes rule implies that the posterior on $\Theta$ conditional on the equilibrium action $\bar{a}_1$ is the same as the prior: $\mu(\cdot|\bar{a}_1) = \pi(\cdot)$.
The polar case is when different types choose different pure actions: a separating equilibrium is a PBE assessment such that each type $\theta$ of player 1 chooses some action $a_1(\theta)$ with probability one ($\sigma_1(a_1(\theta)|\theta) = 1$) and $a_1(\theta') \neq a_1(\theta'')$ for all $\theta'$ and $\theta''$ with $\theta' \neq \theta''$. A separating equilibrium may exist only if $A_1$ has at least as many elements as $\Theta$. If $A_1$ and $\Theta$ have the same number of elements (cardinality), then in a separating equilibrium each action is chosen with ex ante positive probability (because $\pi(\theta) > 0$ for each $\theta$) and the action of player 1 perfectly reveals her private information (if $A_1$ has more elements than $\Theta$, then the actions that are chosen in equilibrium by some type are perfectly revealing; the others need not be revealing).
The following signaling game provides an example of a separating equilibrium (the payoffs of the informed player are in bold; call the downward action of player 2 $a$ and the upward action $f$):
[Figure: a breakfast signaling game (payoffs of the informed player in bold in the original). Nature draws $\theta_s$ with probability $\frac{9}{10}$ and $\theta_w$ with probability $\frac{1}{10}$; Ann (player 1) chooses breakfast $s$ or $w$; Bob (player 2) then chooses $a$ or $f$. Payoff pairs visible in the figure: $(0,0)$, $(2,1)$, $(1,1)$, $(0,1)$, $(1,1)$, $(1,0)$, $(2,0)$.]
[...] $\sigma_2(a|s) = 1 = \sigma_2(f|w)$, $\mu(\theta_s|s) = \frac{9}{10}$, $\mu(\theta_w|w) \geq \frac{1}{2}$. In the second set of assessments each type has whipped cream for breakfast and player 2 would fight if and only if he observed a sausage breakfast: $\sigma_1(w|\theta_s) = 1 = \sigma_1(w|\theta_w)$, $\sigma_2(a|w) = 1 = \sigma_2(f|s)$, $\mu(\theta_s|w) = \frac{9}{10}$, $\mu(\theta_w|s) \geq \frac{1}{2}$.