ISBN 978-3-642-15202-3
Editorial Board
G.-M. Greuel, Kaiserslautern M. Gromov, Bures-sur-Yvette
J. Jost, Leipzig J. Kollár, Princeton
G. Laumon, Orsay H. W. Lenstra, Jr., Leiden
S. Müller, Bonn J. Tits, Paris
D. B. Zagier, Bonn G. Ziegler, Berlin
Managing Editor R. Remmert, Münster
This volume is the first part of a treatise on Spin Glasses in the series Ergebnisse der Math-
ematik und ihrer Grenzgebiete. The second part is Vol. 55 of the Ergebnisse series. The first
edition of the treatise appeared as Vol. 46 of the same series (978-3-540-00356-4).
Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys
in Mathematics ISSN 0071-1136
Mathematics Subject Classification (2010): Primary: 82D30, 82B44. Secondary: 82C32, 60G15, 60K40
Contents

Introduction
References
Index
Glossary
Introduction
Let us denote by S_N the sphere of ℝ^N of center 0 and radius √N, and by
μ_N the uniform measure on S_N. For i, k ≥ 1, consider independent standard
Gaussian random variables (r.v.s) g_{i,k} and the subset U_k of ℝ^N given by
$$U_k = \Big\{ (x_1,\ldots,x_N) \in \mathbb{R}^N \,;\; \sum_{i\le N} g_{i,k}\,x_i \ge 0 \Big\}\,.$$
The direction of the vector (gi,k )i≤N is random (with uniform distribution
over all possible directions) so that Uk is simply a half-space through the
origin of random direction. (It might not be obvious now why we use Gaussian
r.v.s to define a space of random direction, but this will become gradually
clear.) Consider the set S_N ∩ ⋂_{k≤M} U_k, the intersection of S_N with many such
half-spaces. Denoting by E mathematical expectation, it should be obvious
that
$$E\,\mu_N\Big(S_N \cap \bigcap_{k\le M} U_k\Big) = 2^{-M}\,, \qquad (0.1)$$
because every point of S_N has a probability 2^{-M} to belong to all the sets U_k,
k ≤ M . This however is not really interesting. The fascinating fact is that
when N is large and M/N → α, if α > 2 the set S_N ∩ ⋂_{k≤M} U_k is typically
empty (a classical result), while if α < 2, with probability very close to 1, we
have
$$\frac{1}{N}\,\log \mu_N\Big(S_N \cap \bigcap_{k\le M} U_k\Big) \simeq \mathrm{RS}(\alpha)\,. \qquad (0.2)$$
Here,
$$\mathrm{RS}(\alpha) = \min_{0<q<1}\bigg( \alpha\, E\log N\Big(\frac{z\sqrt{q}}{\sqrt{1-q}}\Big) + \frac{q}{2(1-q)} + \frac{1}{2}\log(1-q) \bigg)\,,$$
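The identity (0.1) invites a quick numerical check. The sketch below (it assumes the numpy library; the values of N, M and the number of samples are arbitrary) uses the rotational invariance of the disorder: the expected measure equals the probability that one fixed point of S_N belongs to all M half-spaces.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, trials = 20, 5, 50_000

# By rotational invariance, E mu_N(S_N ∩ ⋂_k U_k) equals the probability
# that a single fixed point of S_N belongs to every U_k, k <= M.
x = np.zeros(N)
x[0] = np.sqrt(N)                          # a point of S_N
g = rng.standard_normal((trials, M, N))    # fresh disorder g_{i,k} per trial
inside_all = (g @ x >= 0).all(axis=1)      # is x in U_k for all k <= M ?
print(inside_all.mean())                   # close to 2^-M = 0.03125
```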
$$\frac{\exp \beta X_k}{\sum_{i\le M} \exp \beta X_i}\,. \qquad (0.3)$$
is unclear how to estimate it. The (few) large values become more important
as β increases, and predominate over the more numerous smaller values. Thus
the problem of understanding Gibbs’ measure gets typically harder for large
β (low temperature) than for small β (high temperature).
At this stage, the reader has already learned all the statistical
mechanics (s)he needs to know to read this work.
The energy levels HN (σ) are closely related to the “interactions” between
the spins. When we try to model a situation of “disordered interactions” these
energy levels will become random variables, or, equivalently, the Hamiltonian,
and hence Gibbs’ measure, will become random. There are two levels of ran-
domness (a probabilist’s paradise). The “disorder”, that is, the randomness
of the Hamiltonian HN , is given with our sample system. It does not evolve
with the thermal fluctuations. It is frozen, or “quenched” as the physicists
say. The word “glass” of the expression “spin glasses” conveys (among many
others) this idea of frozen disorder.
Probably the reader has met with skepticism the statement that no further
knowledge of statistical mechanics is required to read this book. She might
think that this could be formally true, but that nonetheless it would be
very helpful for her intuition to understand some of the classical models
of statistical mechanics. This is not the case. When one studies systems at
“high temperature” the fundamental picture is that of the model with
random Hamiltonian H_N(σ) = −Σ_{i≤N} h_i σ_i, where the h_i are i.i.d. Gaussian
random variables (that are not necessarily centered). This particular model
is completely trivial because there is no interaction between the sites, so it
reduces to a collection of N models consisting each of one single spin, and
each acting on their own. (All the work is of course to show that this is in
some sense the way things happen in more complicated models.) When one
studies systems at “low temperature”, matters are more complicated, but
this is a completely new subject, and simply nothing of what had rigorously
been proved before is of much help.
In modeling disordered interactions between the spins, the problem is to
understand Gibbs’ measure for a typical realization of the disorder. As we
explained, this is closely related to the problem of understanding the large
values among a typical realization of the family (−HN (σ)). This family is
correlated. One reason for the choice of the index set ΣN is that it is suitable
to create extremely interesting correlation structures with simple formulas.
At the beginning of the already long story of spin glasses are “real” spin
glasses, alloys with strange magnetic properties, which are of considerable
interest, both experimentally and theoretically. It is believed that their re-
markable properties arise from a kind of disorder among the interactions of
magnetic impurities. To explain (at least qualitatively) the behavior of real
spin glasses, theoretical physicists have invented a number of models. They
fall into two broad categories: the “realistic” models, where the interacting
atoms are located at the vertices of a lattice, and where the strength of
the interaction between two atoms decreases when their distance increases;
and the “mean-field” models, where the geometric location of the atoms in
space is forgotten, and where each atom interacts with all the others. The
mean-field models are of special interest to mathematicians because they are
very basic mathematical objects and yet create extremely intricate struc-
tures. (As for the “realistic” models, they appear to be intractable at the
moment.) Moreover, some physicists believe that these structures occur in
a wide range of situations. The breadth, and the ambition, of these physi-
cists’ work can in particular be admired in the book “Spin Glass Theory and
Beyond” by M. Mézard, G. Parisi, M.A. Virasoro, and in the book “Field
Theory, Disorder and Simulation” by G. Parisi. The methods used by the
physicists are probably best described here as a combination of heuristic ar-
guments and numerical simulation. They are probably reliable, but they have
no claim to rigor, and it is often not even clear how to give a precise mathe-
matical formulation to some of the central predictions. The recent book [102]
by M. Mézard and A. Montanari is much more friendly to the mathemati-
cally minded reader. It covers a wide range of topics, and succeeds well at
conveying the depth of the physicists’ ideas.
It was rather paradoxical for a mathematician like the author to see sim-
ple, basic mathematical objects being studied by the methods of theoretical
physics. It was also very surprising, given the obvious importance of what the
physicists have done, and the breadth of the paths they have opened, that
mathematicians had not succeeded yet in proving any of their conjectures.
Despite considerable efforts in recent years, the program of giving a sound
mathematical basis to the physicists’ work is still in its infancy. We already
have tried to make the case that in essence this program represents a new
direction of probability theory. It is hence not surprising that, as of today, one
has not yet been able to find anywhere in mathematics an already developed
set of tools that would bear on these questions. Most of the methods used
in this book belong in spirit to the area loosely known as “high-dimensional
probability”, but they are developed here from first principles. In fact, for
much of the book, the most advanced tool that is not proved in complete
detail is Hölder’s inequality. The book is long because it attempts to fulfill
several goals (that will be described below) but reading the first two chapters
should be sufficient to get a very good idea of what spin glasses are about,
as far as rigorous results are concerned.
The author believes that the present area has a tremendous long-term
potential to yield incredibly beautiful results. There is of course no way of
telling when progress will be made on the really difficult questions, but to
provide an immediate incitement to seriously learn this topic, the author has
stated as research problems a number of interesting questions (the solution
of which would likely deserve to be published) that he believes are within the
reach of the already established methods, but that he purposely did not, and
will not, try to solve. (On the other hand, there is ample warning about the
potentially truly difficult problems.)
This book, together with a forthcoming second volume, forms a second
edition of our previous work [157],“Spin Glasses, a Challenge for Mathemati-
cians”. One of the goals in writing [157] was to increase the chance of signifi-
cant progress by making sure that no stone was left unturned. This strategy
greatly helped the author to obtain the solution of what was arguably at the
time the most important problem about mean-field spin glasses, the validity
of the “Parisi Formula”. This advance occurred a few weeks before [157] ap-
peared, and therefore could not be included there. Explaining this result in
the appropriate context is a main motivation for this second edition, which
also provides an opportunity to reorganize and rewrite with considerably
more details all the material of the first edition.
The programs of conferences on spin glasses include many topics that
are not touched here. This book is not an encyclopedia, but represents the
coherent development of a line of thought. The author feels that the real
challenge is the study of spin systems, and, among those, considers only pure
mean-field models from the “statics” point of view. A popular topic is the
study of “dynamics”. In principle this topic also bears on mean-field models
for spin glasses, but in practice it is as of today entirely disjoint from what
we do here.
This work is divided in two volumes, that total a rather large number
of pages. How is a reader supposed to attack this? The beginning of an
answer is that many of the chapters are largely independent of each other,
so that in practice these two volumes contain several “sub-books” that can
be read somewhat independently of each other. For example, there is the
“perceptron book” (Chapters 2, 3, 8, 9). On the other hand, we must stress
that we progressively learn how to handle technical details. Unless the reader
is already an expert, we highly recommend that she studies most of the first
four chapters before attempting to read anything else in detail.
We now proceed to a more detailed description of the contents of the
present volume. In Chapter 1 we study in great detail the Sherrington-
Kirkpatrick model (SK), the “original” spin glass, at sufficiently high temper-
ature. This model serves as an introduction to the basic ideas and methods.
In the remainder of the present volume we introduce six more models. In
this manner we try to demonstrate that the theory of spin glasses does not
deal only with such and such very specific model, but that the basic phe-
nomena occur again and again, as a kind of new law of nature (or at least
of probability theory). We present enough material to provide a solid under-
standing of these models, but without including any of the really difficult
results. In Chapters 2 and 3, we study the so-called “perceptron capacity
model”. This model is fundamental in the theory of neural networks, but the
underlying mathematical problem is the rather attractive question of com-
puting the “proportion” of the discrete cube (resp. the unit sphere) that is
validity of the Parisi formula, as well as the complete proof itself. We shall
bring attention to the apparently deep mysteries that remain, even for the
SK model, the problem of ultrametricity and the problem of chaos. A final
chapter will be devoted to the case of the p-spin interaction model, for p odd,
for which the validity of the Parisi formula will be proved in a large region
of parameters using mostly the cavity method.
At some point I must apologize for the countless typos, inaccuracies, or
downright mistakes that this book is bound to contain. I have corrected many
of each from the first edition, but doubtlessly I have missed some and created
others. This is unavoidable. I am greatly indebted to Sourav Chatterjee,
Albert Hanen and Marc Yor for laboring through this entire volume and
suggesting literally hundreds of corrections and improvements. Their input
was really invaluable, both at the technical level, and by the moral support
it provided to the author. Special thanks are also due to Tim Austin, David
Fremlin and Fréderique Watbled. Of course, all remaining mistakes are my
sole responsibility.
This book owes its very existence to Gilles Godefroy. While Director of
the Jussieu Institute of Mathematics he went out of his way to secure what
has been in practice unlimited typing support for the author. Without such
support this work would not even have been started.
While writing this book (and, more generally, while devoting a large part
of my life to mathematics) it was very helpful to hold a research position
without any other duties whatsoever. So it is only appropriate that I express
here a life-time of gratitude to three colleagues, who, at crucial junctures,
went far beyond their mere duties to give me a chance to get or to keep this
position: Jean Braconnier, Jean-Pierre Kahane, Paul-André Meyer.
It is customary for authors, at the end of an introduction, to warmly thank
their spouse for having granted them the peaceful time needed to complete
their work. I find that these thanks are far too universal and overly enthu-
siastic to be believable. Yet, I must say simply that I have been privileged
with a life-time of unconditional support. Be jealous, reader, for I yet have
to hear the words I dread the most: “Now is not the time to work”.
1. The Sherrington-Kirkpatrick Model
1.1 Introduction
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3_1, © Springer-Verlag Berlin Heidelberg 2011
Trying to make this large invites making the quantities gij σi σj positive, and
thus invites in turn taking σi and σj of the same sign when gij > 0, and of
opposite signs when gij < 0.
Despite the simplicity of the expression (1.1), the optimization problem of
finding the maximum of this quantity (for a typical realization of the gij ) over
the configuration σ = (σ1 , . . . , σN ) appears to be of extreme difficulty, and
little is rigorously known about it. Equivalently, one can look for a minimum
of the function
$$-\frac{1}{\sqrt{N}} \sum_{i<j} g_{ij}\,\sigma_i\sigma_j\,. \qquad (1.2)$$
That is, we think of the quantity (1.2) as being the energy of the configuration
σ. The purpose of the normalization factor N −1/2 will be apparent after (1.9)
below. The energy level of a given configuration depends on the (gij ), and
this randomness models the “disorder” of the situation.
The minus signs in the Boltzmann factor exp(−βHN (σ)) that arise from
the physical requirement to favor configurations of low energy are a nuisance
for mathematics. This nuisance is greatly decreased if we think that the object
of interest is (−HN ), i.e. that the minus sign is a part of the Hamiltonian.
We will use this strategy throughout the book. Keeping with this convention,
we write formula (1.2) as
$$-H_N(\sigma) = \frac{1}{\sqrt{N}} \sum_{i<j} g_{ij}\,\sigma_i\sigma_j\,. \qquad (1.3)$$
$$E\big(H_N(\sigma^1)H_N(\sigma^2)\big) = \frac{N}{2}\, R_{1,2}^2 - \frac{1}{2}\,. \qquad (1.6)$$
Let us denote by d(σ¹, σ²) the Hamming distance of σ¹ and σ², that is, the
proportion of coordinates where σ¹ and σ² differ,
$$d(\sigma^1,\sigma^2) = \frac{1}{N}\,\operatorname{card}\{\, i\le N \,;\ \sigma_i^1 \ne \sigma_i^2 \,\}\,. \qquad (1.7)$$
Then
$$R_{1,2} = 1 - 2\,d(\sigma^1,\sigma^2)\,, \qquad (1.8)$$
and this shows that R1,2 , and hence the correlation of the family (HN (σ)),
is closely related to the structure of the metric space (ΣN , d), where ΣN =
{−1, 1}N . This structure is very rich, and this explains why the simple ex-
pression (1.3) suffices to create a complex situation. Let us also note that
(1.4) implies
$$E\,H_N^2(\sigma) = \frac{N-1}{2}\,. \qquad (1.9)$$
Here is the place to point out that, to lighten notation, we write EH_N²
rather than E(H_N²), a quantity that should not be confused with (EH_N)².
The reader should remember this when she meets expressions such as
E|X − Y|².
We can explain to the reader having some basic knowledge of Gaus-
sian r.v.s the reason behind the factor √N in (1.3). The 2^N Gaussian r.v.s
−H_N(σ) are not too much correlated; each one is of “size about √N”. Their
maximum should be of size about √(N log 2^N), i.e. about N, see Lemma
A.3.1. If one keeps in mind the physical picture that H_N(σ) is the energy
of the configuration σ, a configuration of an N-spin system, it makes a lot of
sense that as N becomes large the “average energy per spin” H_N(σ)/N re-
mains in a sense bounded independently of N. With the choice (1.3), some of
the terms exp(−βH_N(σ)) will be (on a logarithmic scale) of the same order
as the entire sum Z_N(β) = Σ_σ exp(−βH_N(σ)), a challenging situation.
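The reader with a computer at hand can observe this on a very small system. The following sketch (assuming numpy; the small values of N and β are arbitrary, and only the interaction term of the Hamiltonian is kept) enumerates all 2^N configurations and compares the largest Boltzmann factor with log Z_N on a logarithmic scale.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
N, beta = 12, 2.0

# couplings g_ij / sqrt(N), kept in a strictly upper-triangular matrix
A = np.triu(rng.standard_normal((N, N)), k=1) / np.sqrt(N)
S = np.array(list(product([-1, 1], repeat=N)))   # all 2^N configurations
mH = np.einsum('si,ij,sj->s', S, A, S)           # -H_N(sigma) as in (1.3)

logZ = np.logaddexp.reduce(beta * mH)            # log Z_N(beta), computed stably
# On a logarithmic scale the largest term already accounts for much of the sum:
print(beta * mH.max(), logZ, beta * mH.max() + N * np.log(2))
```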
The reader might observe that the sentence “the Hamiltonian is” preceding
(1.10) is not strictly true, since this formula actually gives the value of −H_N
rather than HN . It seems however harmless to allow such minor slips of
language. The last term in (1.10) represents the action of an “external field”,
that is a magnetic field created by an apparatus outside the sample of matter
we study. The external field favors the + spins over the − spins when h > 0.
With the Hamiltonian (1.10), the Boltzmann factor exp(−βHN (σ)) becomes
$$\exp\Big(\frac{\beta}{\sqrt{N}} \sum_{i<j} g_{ij}\sigma_i\sigma_j + \beta h \sum_{i\le N}\sigma_i\Big)\,. \qquad (1.11)$$
The coefficient βh of Σ_{i≤N} σ_i makes perfect sense to a physicist. However,
when one looks at the mathematical structure of (1.11), one sees that the two
terms N^{−1/2} Σ_{i<j} g_{ij}σ_iσ_j and Σ_{i≤N} σ_i appear to be of different natures.
Therefore, it would be more convenient to have unrelated coefficients in front
of these terms. For example, it is more cumbersome to take derivatives in β
when using the factors (1.11) than when using the factors
$$\exp\Big(\frac{\beta}{\sqrt{N}} \sum_{i<j} g_{ij}\sigma_i\sigma_j + h \sum_{i\le N}\sigma_i\Big)\,.$$
1.2 Notations and Simple Facts
(Thus, despite its name, the partition function is a number, not a function.)
Let us repeat that we are interested in understanding what happens for N
very large. It is very difficult then to study ZN , as there are so many terms,
all random, in the sum. Throughout the book, we keep the letter Z to denote
a partition function.
The Gibbs measure GN on ΣN with Hamiltonian HN is defined by
$$G_N(\{\sigma\}) = \frac{\exp(-H_N(\sigma))}{Z_N}\,. \qquad (1.15)$$
$$\langle f\rangle = \int f(\sigma^1,\ldots,\sigma^n)\,\mathrm{d}G_N(\sigma^1)\cdots \mathrm{d}G_N(\sigma^n)
= \frac{1}{Z_N^n} \sum_{\sigma^1,\ldots,\sigma^n} f(\sigma^1,\ldots,\sigma^n)\, \exp\Big(-\sum_{\ell\le n} H_N(\sigma^\ell)\Big)\,. \qquad (1.17)$$
The notations (1.16) and (1.17) are in agreement. For example, if, say, a
function f(σ¹, σ²) on Σ_N² depends only on σ¹, we can also view it as a
function on Σ_N; and whether we compute ⟨f⟩ using (1.16) or (1.17), we get
the same result.
The formula (1.17) means simply that we integrate f on the space
(Σ_N^n, G_N^{⊗n}). The configurations σ¹, σ², . . . belonging to the different copies
of Σ_N involved there are called replicas. In probabilistic terms, the sequence
(σ^ℓ)_{ℓ≥1} is simply an i.i.d. sequence distributed like the Gibbs measure. Replicas
will play a fundamental role. In physics, they are called “real replicas”,
to distinguish them from the n replicas of the celebrated “replica method”,
where “n is an integer tending to 0”. (There is no need yet to consult your
analyst if the meaning of this last expression is unclear to you.) Through-
out the book we denote replicas by upper indices. Again, this simply means
that these configurations are integrated independently with respect to Gibbs’
measure.
Replicas can be used in particular for “linearization”, i.e. replacing a prod-
uct of brackets ⟨·⟩ by a single bracket. In probabilistic terms, this is simply
the identity EXEY = EXY when X and Y are independent r.v.s. Thus (with
slightly informal but convenient notation) we have, for a function f on Σ_N,
$$\langle f\rangle^2 = \big\langle f(\sigma^1)\,f(\sigma^2)\big\rangle\,. \qquad (1.18)$$
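The identity (1.18) can be checked by brute-force summation on a small system; the sketch below assumes numpy, and the choices N = 8, β = 1 and f(σ) = σ_1σ_2 are arbitrary.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
N = 8
S = np.array(list(product([-1, 1], repeat=N)))
A = np.triu(rng.standard_normal((N, N)), k=1) / np.sqrt(N)
G = np.exp(np.einsum('si,ij,sj->s', S, A, S))    # Boltzmann factors (beta = 1)
G /= G.sum()                                     # the Gibbs measure G_N

f = S[:, 0] * S[:, 1]                            # f(sigma) = sigma_1 sigma_2
lhs = (G * f).sum() ** 2                         # <f>^2
# <f(sigma^1) f(sigma^2)>: two replicas integrated independently w.r.t. G_N
rhs = np.einsum('a,b,a,b->', G, G, f, f)
print(lhs, rhs)                                  # equal, as (1.18) asserts
```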
The partition function ZN is exponentially large. It is better studied on a
logarithmic scale through the quantity N −1 log ZN . This quantity is random;
we denote by pN its expectation
$$p_N = \frac{1}{N}\, E \log Z_N\,. \qquad (1.19)$$
Here, E denotes expectation over the “disorder”, i.e. the randomness of the
Hamiltonian. (Hence in the case of the Hamiltonian (1.12), this means expec-
tation with respect to the r.v.s gij .) One has to prove in principle that the
$$\frac{\partial}{\partial h}\,\frac{1}{N}\log Z_N
= \frac{1}{N}\,\frac{1}{Z_N}\,\frac{\partial Z_N}{\partial h}
= \frac{1}{N}\,\frac{1}{Z_N} \sum_\sigma \frac{\partial(-H_N(\sigma))}{\partial h}\,\exp(-H_N(\sigma))
= \frac{1}{N}\,\frac{1}{Z_N} \sum_\sigma \Big(\sum_{i\le N}\sigma_i\Big)\exp(-H_N(\sigma))
= \frac{1}{N} \sum_{i\le N} \langle\sigma_i\rangle\,, \qquad (1.20)$$
which follows from the fact that E⟨σ_i⟩ does not depend on i by symmetry.
This argument will often be used. It is called “symmetry between sites”. (A
site is simply an i ≤ N , the name stemming from the physical idea that it is
the site of a small magnet.) Therefore
$$\frac{\partial p_N}{\partial h} = E\langle\sigma_1\rangle\,, \qquad (1.22)$$
the “average magnetization.”
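Since (1.20) holds for every realization of the disorder (no expectation is involved yet), it can be checked by a finite-difference computation on a small system. The sketch below assumes numpy; the values of N, β, h are arbitrary.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
N, beta, h, eps = 8, 1.0, 0.3, 1e-5
S = np.array(list(product([-1, 1], repeat=N)))
A = beta * np.triu(rng.standard_normal((N, N)), k=1) / np.sqrt(N)
quad = np.einsum('si,ij,sj->s', S, A, S)      # interaction part of -H_N(sigma)
m = S.sum(axis=1)                             # sum_i sigma_i

def log_Z_over_N(t):                          # N^{-1} log Z_N at external field t
    return np.logaddexp.reduce(quad + t * m) / N

G = np.exp(quad + h * m)
G /= G.sum()
avg_mag = (G * m).sum() / N                   # (1/N) sum_i <sigma_i>
fd = (log_Z_over_N(h + eps) - log_Z_over_N(h - eps)) / (2 * eps)
print(fd, avg_mag)                            # agree, as (1.20) predicts
```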
Since the quantity pN encompasses much information, its exact compu-
tation cannot be trivial, even in the limit N → ∞ (the existence of which
is absolutely not obvious). As a first step one can try to get lower and up-
per bounds. A very useful fact for the purpose of finding bounds is Jensen’s
inequality, that asserts that for a convex function ϕ, one has
$$\varphi(EX) \le E\,\varphi(X)\,. \qquad (1.23)$$
This inequality will be used a great many times (which means, as already
pointed out, that it would be helpful to learn it now). For concave functions
the inequality goes the other way, and the concavity of the log implies that
$$p_N = \frac{1}{N}\, E\log Z_N \le \frac{1}{N}\log E Z_N\,. \qquad (1.24)$$
The right-hand side of (1.24) is not hard to compute, but the bound (1.24)
is not really useful, as the inequality is hardly ever an equality.
Throughout the book we denote by ch(x), sh(x) and th(x) the hyperbolic
cosine, sine and tangent of x, and we write chx, shx, thx when no confusion
is possible.
Exercise 1.2.3. Use (A.6) to prove that for the Hamiltonian (1.12) we have
$$\frac{1}{N}\log E Z_N = \frac{\beta^2}{4}\Big(1-\frac{1}{N}\Big) + \log 2 + \log\operatorname{ch}(h)\,. \qquad (1.25)$$
and taking logarithm and expectation this proves that pN ≥ log 2. Therefore,
combining with (1.24) and (1.25) we have (in the case of the Hamiltonian
(1.12)), and lightening notation by writing chh rather than ch(h),
$$\log 2 \le p_N \le \frac{\beta^2}{4}\Big(1-\frac{1}{N}\Big) + \log 2 + \log\operatorname{ch} h\,. \qquad (1.26)$$
This rather crude bound will be much improved later. Let us also point out
that the computation of pN for every β > 0 provides the solution of the
“zero-temperature problem” of finding
$$\frac{1}{N}\, E \max_\sigma\,(-H_N(\sigma))\,. \qquad (1.27)$$
Indeed,
$$\exp\Big(\beta \max_\sigma(-H_N(\sigma))\Big) \le \sum_\sigma \exp(-\beta H_N(\sigma)) \le 2^N \exp\Big(\beta \max_\sigma(-H_N(\sigma))\Big)$$
and thus
$$0 \le \frac{p_N(\beta)}{\beta} - \frac{1}{N}\, E\max_\sigma(-H_N(\sigma)) \le \frac{\log 2}{\beta}\,. \qquad (1.28)$$
Of course the computation of pN (β) for large β (even in the limit N →
∞) is very difficult but it is not quite as hopeless as a direct evaluation of
E maxσ (−HN (σ)).
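The sandwich (1.28) can be watched at work on a single small realization of the disorder. The following sketch (assuming numpy, keeping only the interaction term of the Hamiltonian, with arbitrary small parameters) shows p_N(β)/β approaching the ground-state energy per spin as β grows.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
N = 10
S = np.array(list(product([-1, 1], repeat=N)))
A = np.triu(rng.standard_normal((N, N)), k=1) / np.sqrt(N)
mH = np.einsum('si,ij,sj->s', S, A, S)        # -H_N(sigma)
ground = mH.max() / N                         # (1/N) max_sigma (-H_N(sigma))

for beta in (1.0, 5.0, 25.0):
    pN = np.logaddexp.reduce(beta * mH) / N   # (1/N) log Z_N(beta)
    gap = pN / beta - ground
    print(beta, gap)                          # 0 <= gap <= log(2)/beta
```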
For the many models we will consider in this book, the computation of
pN will be a central objective. We will be able to perform this computation
in many cases at “high temperature”, but the computation at “low temper-
ature” remains a formidable challenge.
We now pause for a while and introduce a different and simpler Hamilto-
nian. It is not really obvious that this Hamiltonian is relevant to the study of
the SK model, and that this is indeed the case is a truly remarkable feature.
We consider an i.i.d. sequence (zi )i≤N of standard Gaussian r.v.s. Consider
the Hamiltonian
$$-H_N(\sigma) = \sum_{i\le N}\sigma_i\big(\beta z_i\sqrt{q} + h\big)\,, \qquad (1.29)$$
we obtain
$$Z_N = \sum_{\sigma} \exp\Big(\sum_{i\le N}\sigma_i(\beta z_i\sqrt{q}+h)\Big)
= \prod_{i\le N}\Big(\exp(\beta z_i\sqrt{q}+h) + \exp\big(-(\beta z_i\sqrt{q}+h)\big)\Big)
= 2^N \prod_{i\le N} \operatorname{ch}(\beta z_i\sqrt{q}+h)\,, \qquad (1.32)$$
where
$$\langle f_i\rangle_i = \frac{f_i(1)\,a_i(1) + f_i(-1)\,a_i(-1)}{a_i(1) + a_i(-1)}\,. \qquad (1.35)$$
This shows that Gibbs’ measure is a product measure. It is determined by
the averages ⟨σ_i⟩ because a probability measure μ on {−1, 1} is determined
by ∫ x dμ(x). To compute the average ⟨σ_i⟩, we use the case where f(σ) = σ_i
and (1.34), (1.35), (1.31) to obtain
$$\langle\sigma_i\rangle = \frac{\exp(\beta z_i\sqrt{q}+h) - \exp\big(-(\beta z_i\sqrt{q}+h)\big)}{\exp(\beta z_i\sqrt{q}+h) + \exp\big(-(\beta z_i\sqrt{q}+h)\big)}\,,$$
and thus
$$\langle\sigma_i\rangle = \operatorname{th}(\beta z_i\sqrt{q} + h)\,, \qquad (1.36)$$
where thx denotes the hyperbolic tangent of x. Moreover the quantities (1.36)
are probabilistically independent.
In words, we can reduce the study of the system with Hamiltonian (1.29)
to the study of the system with one single spin σ_i taking the possible values
σ_i = ±1, and with Hamiltonian H(σ_i) = −σ_i(βz_i√q + h).
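Both (1.32) and (1.36) are exact identities and can be verified by direct enumeration; the sketch below assumes numpy, and the values of N, β, q, h are arbitrary.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(5)
N, beta, q, h = 6, 1.2, 0.4, 0.3
z = rng.standard_normal(N)
field = beta * z * np.sqrt(q) + h             # beta z_i sqrt(q) + h

S = np.array(list(product([-1, 1], repeat=N)))
w = np.exp(S @ field)                         # Boltzmann factors of (1.29)
Z = w.sum()
print(np.isclose(Z, 2**N * np.prod(np.cosh(field))))   # (1.32): True

G = w / Z
avg = S.T @ G                                 # <sigma_i> for each i <= N
print(np.allclose(avg, np.tanh(field)))       # (1.36): True
```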
Exercise 1.2.4. Given a number a, compute the averages ⟨exp aσ_i⟩ and
⟨exp aσ_i¹σ_i²⟩ for the Hamiltonian (1.29). Of course as usual, the upper
indexes denote different replicas, so ⟨exp aσ_i¹σ_i²⟩ is a “double integral”. As in
the case of (1.36), this reduces to the case of a system with one spin, and it is
surely a good idea to master these before trying to understand systems with
N spins. If you need a hint, look at (1.107) below.
For Hamiltonians that are more complicated than (1.29), and in partic-
ular when different sites interact, the Gibbs measure will not be a product
measure. Remarkably, however, it will often nearly be a product if one looks
only at a “finite number of spins”. That is, given any integer n (that does
not depend on N ), as N → ∞, the law of the Gibbs measure under the map
σ → (σ1 , . . . , σn ) becomes nearly a (random) product measure. Moreover,
the r.v.s (⟨σ_i⟩)_{i≤n} become nearly independent. It will be proved in this work
that this is true at high temperature for many models.
If one thinks about it, this is the simplest possible structure, the default
situation. It is of course impossible to interest a physicist in such a situation.
What else could happen, will he tell you. What else, indeed, but finding proofs
is quite another matter.
Despite the triviality of the situation (1.29), an (amazingly successful)
intuition of F. Guerra is that it will help to compare this situation with that
of the SK model. This will be explained in the next section. This comparison
goes quite far. In particular it will turn out that (when β is not too large)
for each n the sequence (⟨σ_i⟩)_{i≤n} will asymptotically have the same law as
the sequence (th(βz_i√q + h))_{i≤n}, where the z_i are i.i.d. standard Gaussian r.v.s
and where the number q depends on β and h only. This should be compared
to (1.36).
Proof. Let
$$u_i'(t) = \frac{\mathrm{d}}{\mathrm{d}t}\,u_i(t) = \frac{1}{2\sqrt{t}}\,u_i - \frac{1}{2\sqrt{1-t}}\,v_i\,,$$
so that
$$\varphi'(t) = E \sum_{i\le M} u_i'(t)\,\frac{\partial F}{\partial x_i}(u(t))\,. \qquad (1.41)$$
Now
$$E\,u_i'(t)\,u_j(t) = \frac{1}{2}\big(E u_i u_j - E v_i v_j\big)\,,$$
so the Gaussian integration by parts formula (A.17) yields (1.40).
Of course (and this is nearly the last time in this chapter that we worry
about this kind of problem) there is some extra work to do to give a com-
plete ε-δ proof of this statement, and in particular to deduce (1.41) from
(1.39) using Proposition A.2.1. The details of the argument are given in Sec-
tion A.2.
Since Lemma 1.3.1 relies on Gaussian integration by parts, the reader
might have already formed the question of what happens when one deals with
non-Gaussian situations, such as when one replaces the r.v.s gij of (1.12) by,
say, independent Bernoulli r.v.s (i.e. random signs), or by more general r.v.s.
Generally speaking, the question of what happens in a probabilistic situation
when one replaces Gaussian r.v.s by random signs can lead to very difficult
(and interesting) problems, but in the case of the SK model, it is largely
a purely technical question. While progressing through our various models,
we will gradually learn how to address such technical problems. It will then
become obvious that most of the results of the present chapter remain true
in the Bernoulli case.
Even though the purpose of this work is to study spin glasses rather than
to develop abstract mathematics, it might help to make a short digression
about what is really going on in Lemma 1.3.1. The joint law of the Gaussian
family u is determined by the matrix of the covariances aij = Eui uj . This ma-
trix is symmetric, aij = aji , so it is completely determined by the triangular
array a = (aij )1≤i≤j≤n and we can think of the quantity EF (u) as a function
Ψ (a). The domain of definition of Ψ is a convex cone with non-empty interior
(since Ψ (a) is defined if and only if the symmetric matrix (aij )i,j≤n is positive
definite), so it (often) makes sense to think of the derivatives ∂Ψ/∂aij . The
fundamental formula is as follows.
Proposition 1.3.2. If i < j we have
$$\frac{\partial \Psi}{\partial a_{ij}}(a) = E\,\frac{\partial^2 F}{\partial x_i\,\partial x_j}(u)\,, \qquad (1.42)$$
while
$$\frac{\partial \Psi}{\partial a_{ii}}(a) = \frac{1}{2}\, E\,\frac{\partial^2 F}{\partial x_i^2}(u)\,. \qquad (1.43)$$
Let us first explain why this implies Lemma 1.3.1. If one thinks of a Gaussian
family as determined by its matrix of covariance, the magic formula (1.40)
is just the canonical interpolation in Rn(n+1)/2 between the points (aij ) =
(E u_i u_j) and (b_ij) := (E v_i v_j), since
$$a_{ij}(t) := E\,u_i(t)\,u_j(t) = t\,a_{ij} + (1-t)\,b_{ij}\,.$$
Therefore Lemma 1.3.1 follows from (1.42) and the chain rule, as is obvious
if we observe that ϕ(t) = Ψ (a(t)) where a(t) = (aij (t))1≤i≤j≤n and if we
reformulate (1.40) as
$$\varphi'(t) = \sum_{1\le i<j\le n} \big(E u_i u_j - E v_i v_j\big)\, E\,\frac{\partial^2 F}{\partial x_i\,\partial x_j}(u(t))
+ \frac{1}{2} \sum_{i\le n} \big(E u_i^2 - E v_i^2\big)\, E\,\frac{\partial^2 F}{\partial x_i^2}(u(t))\,.$$
where ‖·‖ denotes the Euclidean norm and o(x) a quantity that goes to 0
with x. This concludes the proof.
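To see (1.42) in action, one can take F(x) = exp(x_1 + x_2), for which Ψ has the closed form Ψ(a) = exp((a_{11} + 2a_{12} + a_{22})/2), since u_1 + u_2 is then a centered Gaussian of variance a_{11} + 2a_{12} + a_{22}. For this F we have ∂²F/∂x_1∂x_2 = F, so both sides of (1.42) equal Ψ(a). The sketch below (assuming numpy; the covariance values are arbitrary) compares a finite difference of Ψ with this value.

```python
import numpy as np

# For F(x) = exp(x1 + x2) and centered Gaussian (u1, u2) with covariance a,
# Psi(a) = E exp(u1 + u2) = exp(Var(u1 + u2)/2) in closed form.
def Psi(a11, a12, a22):
    return np.exp(0.5 * (a11 + 2 * a12 + a22))

a11, a12, a22, eps = 1.0, 0.3, 0.7, 1e-6
lhs = (Psi(a11, a12 + eps, a22) - Psi(a11, a12 - eps, a22)) / (2 * eps)
# Here d2F/dx1dx2 = F, so E d2F/dx1dx2 (u) = Psi(a) itself.
rhs = Psi(a11, a12, a22)
print(lhs, rhs)                               # the two agree, as (1.42) predicts
```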
This ends our mathematical digression. To illustrate right away the power
of the smart path method, let us prove a classical result (extensions of which
will be useful in Volume II).
$$\forall\, i \ne j,\qquad \frac{\partial^2 F}{\partial x_i\,\partial x_j} \ge 0$$
and
$$\forall\, i \le M,\ \ E u_i^2 = E v_i^2\,;\qquad \forall\, i \ne j,\ \ E u_i u_j \ge E v_i v_j\,.$$
Then
$$E F(u) \ge E F(v)\,.$$
$$P\big(|F(g) - EF(g)| \ge t\big) \le 2\exp\Big(-\frac{t^2}{4A^2}\Big)\,. \qquad (1.47)$$
The remarkable part of this statement is that (1.47) does not depend on
M . It is a typical occurrence of the “concentration of measure phenomenon”.
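A transparent illustration: the function F(x) = max_{i≤M} x_i satisfies the Lipschitz condition with A = 1 whatever M is, so (1.47) bounds its fluctuations by a constant, even though its mean grows like √(2 log M). A Monte Carlo sketch (assuming numpy; the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
samples = 2_000

# F(x) = max_i x_i is 1-Lipschitz, so (1.47) holds with A = 1 whatever M is.
for M in (10, 300, 5_000):
    F = rng.standard_normal((samples, M)).max(axis=1)
    print(M, round(F.mean(), 3), round(F.std(), 3))
    # the mean grows with M, the standard deviation stays bounded
```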
When F is differentiable (and this will be the case for all the applications
we will consider in this work) (1.46) is equivalent to the following
where ∇F denotes the gradient of F , and x the Euclidean norm of the
vector x.
Proof. Let us first assume that F is infinitely differentiable (a condition
that is satisfied in all the cases where we use this result). Given a parameter
s, we would like to bound
$$E\exp s\big(F(g) - EF(g)\big)\,. \qquad (1.49)$$
$$E u_i u_j - E v_i v_j = 0$$
unless the pair {i, j} is of the type {i, i + M} for some i ≤ M, in which case
$$E u_i u_{i+M} - E v_i v_{i+M} = -1\,.$$
We consider u(t) as in (1.38), and ϕ(t) = EG(u(t)). Using (1.40) for G rather
than F , we get
$$\varphi'(t) = -E \sum_{i\le M} \frac{\partial^2 G}{\partial y_i\,\partial y_{i+M}}(u(t))\,. \qquad (1.50)$$
$$\frac{\partial^2 G}{\partial y_i\,\partial y_{i+M}}(y) = -s^2\, \frac{\partial F}{\partial x_i}\big((y_i)_{i\le M}\big)\, \frac{\partial F}{\partial x_i}\big((y_{i+M})_{i\le M}\big)\, G(y)\,. \qquad (1.51)$$
$$\forall\, x \in \mathbb{R}^M,\quad \sum_{i\le M}\Big(\frac{\partial F}{\partial x_i}(x)\Big)^2 \le A^2\,,$$
As pointed out, the choice of the family (v_i)_{i≤2M} ensures that ϕ(0) = EG(v) =
1, so that (1.52) implies that ϕ(1) ≤ exp(s²A²), i.e.
We use Jensen’s inequality (1.23) for the convex function exp(−sx) while
taking expectation in uM +1 , . . . , u2M , so that
we have
$$E\exp s\big(F(g) - EF(g)\big) \le \exp(s^2A^2)\,.$$
Using Markov’s inequality (A.7) we get that for s, t > 0
$$P\big(F(g) - EF(g) \ge t\big) \le \exp\big(s^2A^2 - st\big) \le \exp\Big(-\frac{t^2}{4A^2}\Big)\,,$$
the last inequality for the choice s = t/(2A²).
so that ‖∇F(x)‖ ≤ max_{s∈S} ‖a(s)‖, and we conclude from (1.47) using the
equivalence of (1.46) and (1.48).
As a first example of application, let us consider the case where M =
N(N − 1)/2, S = Σ_N, and, for s = σ ∈ S, w_σ = exp(h Σ_{i≤N} σ_i) and
$$a(\sigma) = \Big(\frac{\beta}{\sqrt{N}}\,\sigma_i\sigma_j\Big)_{1\le i<j\le N}\,.$$
Therefore
$$\|a(\sigma)\| = \frac{\beta}{\sqrt{N}}\Big(\frac{N(N-1)}{2}\Big)^{1/2} \le \beta\sqrt{\frac{N}{2}}\,.$$
It follows from (1.53) that the partition function ZN of the Hamiltonian
(1.12) satisfies
$$P\big(|\log Z_N - E\log Z_N| \ge t\big) \le 2\exp\Big(-\frac{t^2}{2\beta^2 N}\Big)\,. \qquad (1.54)$$
$$P\big(|U_N - EU_N| \ge t\big) \le 2\exp\Big(-\frac{t^2 N}{2\beta^2}\Big)\,.$$
1.3 Gaussian Interpolation and the Smart Path Method
The right-hand side starts to become small for t about N^{−1/2}, i.e. it is
unlikely that U_N will differ from its expectation by more than a quantity of
order N^{−1/2}. In words, the fluctuations of the quantity U_N = N^{−1} log Z_N
are typically of order at most N^{−1/2}, while the quantity itself is of order 1.
This quantity is “self-averaging”, a fundamental fact, as was first mentioned
on page 7.
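The self-averaging of U_N = N^{−1} log Z_N is easy to observe numerically. The following minimal sketch (not from the text; the function name, parameter values and sample sizes are our own illustrative choices) samples the Gaussian Hamiltonian with an external field h, computes log Z_N by exact enumeration for small N, and checks that the empirical fluctuations stay well below the sub-Gaussian scale 2β/√N implied by the concentration inequality above:

```python
import numpy as np

def u_N(N, beta, h, sigma, rng):
    # One sample of U_N = (1/N) log Z_N for the Hamiltonian
    # -H_N(s) = beta/sqrt(N) * sum_{i<j} g_ij s_i s_j + h * sum_i s_i,
    # with Z_N enumerated exactly over the 2^N configurations (rows of sigma).
    J = np.triu(rng.standard_normal((N, N)), 1)
    energy = beta / np.sqrt(N) * np.einsum('ki,ij,kj->k', sigma, J, sigma) \
        + h * sigma.sum(axis=1)
    m = energy.max()  # log-sum-exp for numerical stability
    return (m + np.log(np.exp(energy - m).sum())) / N

rng = np.random.default_rng(0)
beta, h, samples = 0.4, 0.3, 200
for N in (6, 12):
    # all 2^N spin configurations as rows of +/-1
    sigma = 1 - 2 * ((np.arange(2 ** N)[:, None] >> np.arange(N)) & 1)
    vals = np.array([u_N(N, beta, h, sigma, rng) for _ in range(samples)])
    # concentration gives a standard deviation of order at most 2*beta/sqrt(N)
    print(N, vals.mean(), vals.std(), 2 * beta / np.sqrt(N))
```

The empirical standard deviation shrinks with N while the mean stays of order 1, which is exactly the self-averaging described above.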
Let us try now to use (1.40) to compare two Gaussian Hamiltonians. This
technique is absolutely fundamental. It will be first used to make precise the
intuition of F. Guerra mentioned on page 12, but at this stage we try to obtain
a result that can also be used in other situations. We take M = 2^N = card Σ_N .
We consider two jointly Gaussian families u = (uσ ) and v = (vσ ) (σ ∈ ΣN ),
which we assume to be independent from each other. We recall the notation
u_σ(t) = √t u_σ + √(1 − t) v_σ ; u(t) = (u_σ(t))_σ ,
and we set
U(σ, τ) = 1/2 (Eu_σ u_τ − Ev_σ v_τ) . (1.55)
Then (1.40) asserts that for a (well-behaved) function F on RM , if ϕ(t) =
EF (u(t)) we have
ϕ′(t) = E Σ_{σ,τ} U(σ, τ) (∂²F/∂x_σ ∂x_τ)(u(t)) . (1.56)
Let us assume that we are given numbers wσ > 0. For x = (xσ ) ∈ RM let us
define
F(x) = (1/N) log Z(x) ; Z(x) = Σ_σ w_σ exp x_σ . (1.57)
Thus, if σ ≠ τ we have
∂²F/∂x_σ∂x_τ (x) = −(1/N) w_σ w_τ exp(x_σ + x_τ)/Z(x)² ,
while if σ = τ we have
∂²F/∂x_σ² (x) = (1/N) ( w_σ exp x_σ /Z(x) − w_σ² exp 2x_σ /Z(x)² ) .
Exercise 1.3.6. Prove that the function F and its partial derivatives of
order 1 satisfy the “moderate growth condition” (A.18). (Hint: Use a simple
bound from below on Z(x), such as Z(x) ≥ wτ exp xτ for a given τ in ΣN .)
This exercise shows that it is legitimate to use (1.56) to compute the
derivative of ϕ(t) = EF (u(t)), which is therefore
ϕ′(t) = (1/N) E ( (1/Z(u(t))) Σ_σ U(σ, σ) w_σ exp u_σ(t)
− (1/Z(u(t))²) Σ_{σ,τ} U(σ, τ) w_σ w_τ exp(u_σ(t) + u_τ(t)) ) . (1.58)
Let us now introduce the interpolating Hamiltonian −H_t(σ) = u_σ(t) + log w_σ
(1.59) and denote by ⟨·⟩_t an average for the corresponding Gibbs measure.
The notation ⟨·⟩_t will be used many times in the sequel. It would be nice to
remember now that the index t simply refers to the value of the interpolating
parameter. This will be the case whenever we use an interpolating Hamil-
tonian. If you forget the meaning of a particular notation, you might try to
look for it in the glossary or the index, which attempt to list, for many of the
typical notations, the page where they are defined.
Thus (1.58) simply means that
ϕ′(t) = (1/N) ( E⟨U(σ, σ)⟩_t − E⟨U(σ¹, σ²)⟩_t ) . (1.60)
In the last term the bracket is a double integral for Gibbs’ measure, and the
variables are denoted σ 1 and σ 2 rather than σ and τ .
The very general formula (1.60) applies to the interpolation between any
two Gaussian Hamiltonians, and is rather fundamental in the study of such
Hamiltonians.
We should observe for further use that (1.60) even holds if the quantities
wσ are random, provided their randomness is independent of the randomness
of uσ and vσ . This is seen by proving (1.60) at wσ given, and taking a further
expectation in the randomness of these quantities. (When doing this, we
permute expectation in the r.v.s wσ and differentiation in t. Using Proposition
A.2.1 this is permitted by the fact that the quantity (1.60) is uniformly
bounded over all choices of (wσ ) by (1.65) below.)
The consideration of Hamiltonians such as (1.29) shows that it is natural
to consider “random external fields”. That is, we consider an i.i.d. sequence
(hi )i≤N of random variables, having the same distribution as a given r.v. h
(with moments of all orders). We assume that this sequence is independent
of all the other r.v.s. Rather than the Hamiltonian (1.12) we consider instead
the more general Hamiltonian
−H_N(σ) = (β/√N) Σ_{i<j} g_{ij} σ_i σ_j + Σ_{i≤N} h_i σ_i . (1.61)
ϕ(1) ≤ ϕ(0) + (β²/4)(1 − q)² . (1.67)
When considering an interpolating Hamiltonian Ht we will always lighten
notation by writing H0 rather than Ht=0 . Recalling the choice of vσ in (1.62)
it follows from (1.59) that
−H_0(σ) = Σ_{i≤N} σ_i (βz_i √q + h_i) , (1.68)
Here we have chosen convenient but technically incorrect notation. The no-
tation (1.70) is incorrect, since ZN (β, h) depends on the actual realization
of the r.v.s hi , not only on h. Speaking of incorrect notation, we will go one
step further and write
p_N(β, h) := (1/N) E log Z_N(β, h) . (1.71)
The expectation in the right hand side is over all sources of randomness, in
this case the r.v.s hi , and (despite the notation) the quantity pN (β, h) is a
number depending only on β and the law of h. If L(h) denotes the law of
h, it would probably be more appropriate to write pN (β, L(h)) rather than
pN (β, h). The simpler notation pN (β, h) is motivated by the fact that the
most important case (at least in the sense that it is as hard as the general
case) is the case where h is constant. If this notation disturbs you, please
assume everywhere that h is constant and you will not lose much.
Thus with these notations we have
p_N(β, h) = ϕ(1) . (1.72)
In the statement of the next theorem E stands as usual for expectation in all
sources of randomness, here the r.v.s z and h. This theorem is a consequence
of (1.72), (1.67) and (1.69).
Theorem 1.3.7. (Guerra’s replica-symmetric bound). For any choice
of β, h and q we have
p_N(β, h) ≤ log 2 + E log ch(βz√q + h) + (β²/4)(1 − q)² . (1.73)
Again, despite the notation, the quantity pN (β, h) is a number. The ex-
pression “replica-symmetric” is physics’ terminology. Its meaning might grad-
ually become clear. The choice q = 0, h constant essentially recovers (1.25).
It is now obvious what is the best choice of q: the choice that minimizes
the right-hand side of (1.73), i.e.
0 = E ( (βz/2√q) th(βz√q + h) ) − (β²/2)(1 − q) = (β²/2) E (1/ch²(βz√q + h)) − (β²/2)(1 − q) ,
using Gaussian integration by parts (A.14). Since ch−2 (x) = 1 − th2 (x), this
means that we have the absolutely fundamental relation
q = E th²(βz√q + h) . (1.74)
Of course at this stage this equation looks rather mysterious. The mystery will
gradually recede, in particular in (1.105) below. The reader might wonder at
this stage why we do not give a special name, such as q ∗ , to the fundamental
quantity defined by (1.74), to distinguish it from the generic value of q. The
reason is simply that in the long range it is desirable that the simplest name
goes to the most used quantity, and the case where q is not the solution of
(1.74) is only of some limited interest.
It will be convenient to know that the equation (1.74) has a unique solu-
tion.
Proposition 1.3.8. (Latala-Guerra) The function
Ψ(x) = (1/x) E th²(z√x + h)
is strictly decreasing on ℝ⁺ and vanishes as x → ∞. Consequently, if Eh² > 0
there is a unique solution to (1.74).
The difficult part of the statement is the proof that the function Ψ is
strictly decreasing. Taking this for granted, since lim_{x→0⁺} xΨ(x) = E th²h > 0, we have
lim_{x→0⁺} Ψ(x) = ∞, and since lim_{x→∞} Ψ(x) = 0 there is a unique solution to
the equation Ψ(x) = 1/β², and hence to (1.74) (set x = β²q). But E th²h = 0 only when h = 0
a.e. (in which case when β > 1 there are 2 solutions to (1.74), one of which
is 0).
Proposition 1.3.8 is nice but not really of importance. The proof is very
beautiful, but rather tricky, and the tricky ideas are not used anywhere else.
To avoid distraction, we postpone this proof until Section A.14. At this stage
we give the proof only in the case where β < 1, because the ideas of this
simple argument will be used again and again. Given a (smooth) function f,
the function ψ(x) = Ef(βz√x + h) satisfies
ψ′(x) = βE ( (z/2√x) f′(βz√x + h) ) = (β²/2) E f″(βz√x + h) , (1.75)
using Gaussian integration by parts (A.14). We use this for the function
f(y) = th²y, which satisfies
f′(y) = 2 th y/ch²y ; f″(y) = 2 (1 − 2sh²y)/ch⁴y ≤ 2 .
Thus, if β < 1, we deduce from (1.75) that the function ψ(q) = E th²(βz√q +
h) satisfies ψ′(q) < 1. This function maps the unit interval into itself, so that
it has a unique fixed point.
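For illustration (this is not part of the text), the fixed point of (1.74) can be computed by the very iteration the argument above suggests: for β < 1 the map ψ is a contraction of the unit interval, so plain fixed-point iteration converges. A minimal sketch, with the Gaussian expectation replaced by Gauss-Hermite quadrature and h taken constant (the function name and parameter values are ours):

```python
import numpy as np

def solve_q(beta, h, n_iter=400):
    # Iterate q -> psi(q) = E th^2(beta z sqrt(q) + h), z standard Gaussian.
    # The expectation E f(z) is computed by Gauss-Hermite quadrature:
    # E f(z) = (1/sqrt(pi)) sum_k w_k f(x_k sqrt(2)).
    x, w = np.polynomial.hermite.hermgauss(80)
    z, w = x * np.sqrt(2.0), w / np.sqrt(np.pi)
    q = 0.5
    for _ in range(n_iter):
        q = np.sum(w * np.tanh(beta * z * np.sqrt(q) + h) ** 2)
    return q

q = solve_q(beta=0.4, h=0.3)
print(q)  # the unique solution of (1.74) for these (beta, h)
```

Since ψ′ ≤ β² here, the iteration contracts at rate β² = 0.16 and reaches machine precision quickly.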
Let us denote by SK(β, h) the right-hand side of (1.73) when q is as in
(1.74). As in the case of pN (β, h) this is a number depending only on β and
the law of h. Thus (1.73) implies that
We can hope that when q satisfies (1.74) there is near equality in (1.76),
so that the right hand-side of (1.76) is not simply a good bound for pN (β, h),
but essentially the value of this quantity as N → ∞. Moreover, we have a
clear road to prove this, namely (see (1.65)) to show that ∫₀¹ E⟨(R_{1,2} − q)²⟩_t dt
is small. We will pursue this idea in Section 1.4, where we will prove that
this is indeed the case when β is not too large. The case of large β (low
temperature) is much more delicate, but will be approached in Volume II
through a much more elaborate version of the same ideas.
Theorem 1.3.9. (Guerra-Toninelli [75]) For all values of β, h, the se-
quence (N pN (β, h))N ≥1 is superadditive, that is, for integers N1 and N2 we
have
p_{N₁+N₂}(β, h) ≥ (N₁/(N₁ + N₂)) p_{N₁}(β, h) + (N₂/(N₁ + N₂)) p_{N₂}(β, h) . (1.77)
Consequently, the limit
p(β, h) = lim_{N→∞} p_N(β, h) (1.78)
exists.
Of course this does not tell us what is the value of p(β, h), although we know
by (1.76) that p(β, h) ≤ SK(β, h).
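The passage from superadditivity to existence of the limit is Fekete's lemma: if (a_N) is superadditive then a_N/N → sup_N a_N/N. A toy numeric illustration (not from the text) with the superadditive sequence a_N = −√N, for which a_N/N = −1/√N increases to the limit 0:

```python
import math

a = lambda N: -math.sqrt(N)  # superadditive, since sqrt(N1+N2) <= sqrt(N1)+sqrt(N2)

# superadditivity, the analogue of (1.77) multiplied through by N1+N2
for N1 in range(1, 80):
    for N2 in range(1, 80):
        assert a(N1 + N2) >= a(N1) + a(N2)

# Fekete's lemma: a_N / N increases to sup_N a_N / N = 0
ratios = [a(N) / N for N in range(1, 2001)]
assert all(r1 <= r2 for r1, r2 in zip(ratios, ratios[1:]))
print(ratios[0], ratios[-1])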
Proof. Let N = N1 +N2 . The idea is to compare the SK Hamiltonian of size
N with two non-interacting SK Hamiltonians of sizes N1 and N2 . Consider
uσ as in (1.62) and
v_σ = (β/√N₁) Σ_{i<j≤N₁} g′_{ij} σ_i σ_j + (β/√N₂) Σ_{N₁<i<j≤N} g″_{ij} σ_i σ_j ,
where (g′_{ij}) and (g″_{ij}) are independent standard Gaussian r.v.s, independent of the (g_{ij}),
so that, denoting by R′_{1,2} = (1/N₁) Σ_{i≤N₁} σ_i^1σ_i^2 and R″_{1,2} = (1/N₂) Σ_{N₁<i≤N} σ_i^1σ_i^2
the overlaps computed on the first N₁ and the last N₂ coordinates respectively,
R_{1,2} = (N₁/N) R′_{1,2} + (N₂/N) R″_{1,2} .
The convexity of the function x → x² implies
R²_{1,2} ≤ (N₁/N) R′²_{1,2} + (N₂/N) R″²_{1,2} . (1.79)
Rather than (1.64), a few lines of elementary algebra now yield
(1/N) U(σ¹, σ²) = (β²/4) ( R²_{1,2} − (N₁/N) R′²_{1,2} − (N₂/N) R″²_{1,2} + 1/N ) . (1.80)
Since (1/N) U(σ, σ) = β²/4N, we then get from (1.60) that
ϕ′(t) = −(β²/4) E⟨ R²_{1,2} − (N₁/N) R′²_{1,2} − (N₂/N) R″²_{1,2} ⟩_t ≥ 0 ,
using (1.79) in the last inequality. Therefore ϕ(1) ≥ ϕ(0); since ϕ(1) = p_N(β, h)
and ϕ(0) = (N₁ p_{N₁}(β, h) + N₂ p_{N₂}(β, h))/N, this proves (1.77). □
Generally speaking it seems plausible that “all limits exist”. Some infor-
mation can be gained using an elementary fact known as Griffiths' lemma
in statistical mechanics. That is, if a sequence ϕ_N of convex (differentiable)
functions converges pointwise in an interval to a (necessarily convex) func-
tion ϕ, then lim_{N→∞} ϕ′_N(x) = ϕ′(x) at every point x for which ϕ′(x) exists
(which is everywhere outside a countable set of possible exceptional values).
If Griffiths’ lemma does not seem obvious to you, please do not worry, for the
time being this is only a side story, the real point of it being a pretense to
introduce Lemma 1.3.11 below, a step in our learning of Gaussian integration
by parts. Later on, in Volume II, we will use quantitative versions of Griffiths’
lemma with complete proofs.
It is a special case of Hölder's inequality that the function
β → log ∫ f^β dμ
is convex (whenever f > 0) for any probability measure μ. Indeed, this means
that for 0 < a < 1 and β₁, β₂ > 0 we have
∫ f^{aβ₁+(1−a)β₂} dμ ≤ ( ∫ f^{β₁} dμ )^a ( ∫ f^{β₂} dμ )^{1−a} ,
Thus
N Φ′(β) = (1/Z_N) Σ_σ w_σ u_σ exp βu_σ ( = ⟨u_σ⟩ ) (1.83)
and
N Φ″(β) = (1/Z_N) Σ_σ w_σ u_σ² exp βu_σ − ( (1/Z_N) Σ_σ w_σ u_σ exp βu_σ )²
= ⟨u_σ²⟩ − ⟨u_σ⟩² ≥ 0 ,
where the last inequality is simply the Cauchy-Schwarz inequality used in the
probability space (ΣN , GN ).
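The convexity just invoked can be verified directly on a finite probability space; the following throwaway numeric check (not from the text; the measure and function are random illustrative choices) tests the Hölder inequality above for many random (a, β₁, β₂):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
mu = rng.random(n); mu /= mu.sum()   # a probability measure on n points
f = rng.random(n) + 0.1              # a positive function

L = lambda b: np.log(np.sum(mu * f ** b))  # beta -> log int f^beta dmu

for _ in range(500):
    b1, b2, a = rng.uniform(0, 4), rng.uniform(0, 4), rng.uniform(0, 1)
    # Hoelder: int f^{a b1 + (1-a) b2} dmu <= (int f^{b1} dmu)^a (int f^{b2} dmu)^{1-a}
    assert L(a * b1 + (1 - a) * b2) <= a * L(b1) + (1 - a) * L(b2) + 1e-10
print("log int f^beta dmu is convex in beta")
```

This is the same convexity that, applied to Φ(β) = N^{−1} log Z_N, gives Φ″ ≥ 0 as computed above.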
In particular pN (β, h) is a convex function of β. By Theorem 1.3.9,
p(β, h) = limN →∞ pN (β, h) exists. The function β → p(β, h) is convex and
therefore is differentiable at every point outside a possible countable set of
exceptional values. Now, we have the following important formula.
To compute
E Σ_σ u_σ (w_σ exp βu_σ) / (Σ_τ w_τ exp βu_τ) ,
we first think of the quantities wτ as being fixed numbers, with wτ > 0. We
then apply the Gaussian integration by parts formula (A.17) to the jointly
Gaussian family (uτ )τ and the function
F_σ(x) = (w_σ exp βx_σ) / (Σ_τ w_τ exp βx_τ) ,
to get
Exercise 1.3.12. To make clear the point of the previous remark, derive
formula (1.84) by considering ZN as a function of the r.v.s (gij ).
Exercise 1.3.13. Show that (1.84) can in fact be deduced from (1.60). Hint:
use uσ as in (1.62) but take now vσ = 0.
As the next exercise shows, the formula (1.84) is not an accident, but a
first occurrence of a general principle that we will use a great many times
later. In the long range the reader would do well to really master this result.
Exercise 1.3.14. Consider a jointly Gaussian family of r.v.s (H_N(σ))_{σ∈Σ_N}
and another jointly Gaussian family (H′_N(σ))_{σ∈Σ_N} of r.v.s. These two families
are assumed to be independent of each other. Let
p_N(β) = (1/N) E log Σ_σ exp(−βH_N(σ) − H′_N(σ)) .
N σ
Prove that
(d/dβ) p_N(β) = (β/N) ( E⟨U(σ, σ)⟩ − E⟨U(σ¹, σ²)⟩ ) ,
1.4 Latala’s Argument 29
expression “a few spins” means that we consider a fixed number of spins, and
then take N very large.) For many of the models we will study the proof of
(1.86) will be a major goal, and the key step in the computation of pN .
In this section we present a beautiful (unpublished!!) argument of R.
Latala that probably provides the fastest way to prove (1.86) for the SK
model at high enough temperature (i.e. β small enough). This argument is
however not easy to generalize in some directions, and we will learn a more
versatile method in Section 1.6.
From now on we lighten notation by writing ν(f) for E⟨f⟩. In this section
the Gibbs measure is relative to the Hamiltonian (1.61), that is
−H_N(σ) = (β/√N) Σ_{i<j} g_{ij} σ_i σ_j + Σ_{i≤N} h_i σ_i .
The next theorem provides a precise version of (1.86), in the form of a strong
exponential inequality: if β < 1/2 and q is as in (1.74), then for s + 2β² < 1/2 we have
ν ( exp sN(R_{1,2} − q)² ) ≤ 1/√(1 − 2s − 4β²) . (1.87)
Of course here to lighten notation we write exp sN(R_{1,2} − q)² rather than
exp(sN(R_{1,2} − q)²). Since exp x ≥ x^k/k! for x ≥ 0 and k ≥ 1, this shows that
(1/k!) ν ( (sN)^k (R_{1,2} − q)^{2k} ) ≤ 1/√(1 − 2s − 4β²) ,
so that, since k! ≤ k^k,
ν ( (R_{1,2} − q)^{2k} ) ≤ (1/√(1 − 2s − 4β²)) (k/sN)^k ,
and in particular
ν ( (R_{1,2} − q)^{2k} ) ≤ (Kk/N)^k , (1.88)
where K does not depend on N or k.
The important relationship between growth of moments and exponential
integrability is detailed in Section A.6. This relation is explained there for
probabilities. It is perfectly correct to think of ν (and of its avatar νt defined
below) as being the expectation for a certain probability. This can
be made formal. We do not explain this since it requires an extra level of
abstraction that does not seem very fruitful.
An important special case of (1.88) is:
1.4 Latala’s Argument 31
ν ( (R_{1,2} − q)² ) ≤ K/N . (1.89)
Equation (1.88) is the first of very many that involve an unspecified con-
stant K. There are several reasons why it is desirable to use such constants. A
clean explicit value might be hard to get, or, like here, it might be irrelevant
and rather distracting. When using such constants, it is understood through-
out the book that their value might not be the same at each occurrence. The
use of the word “constant” to describe K is because this number is never,
ever permitted to depend on N . On the other hand, it is typically permitted
to depend on β and h. Of course we will try to be more specific when the
need arises. An unspecified constant that does not depend on any parameter
(a so-called universal constant) will be denoted by L, and the value of this
quantity might also not be the same at each occurrence (as e.g. in the relation
L = L + 2). Of course, K₀, L₁, etc. denote specific quantities. These conventions
will be used throughout the book and it surely would help to remember
them from now on.
It is a very non-trivial question to determine the supremum of the values
of β for which one can control ν(exp sN (R1,2 − q)2 ) for some s > 0, or the
supremum of the values of β for which (1.89) holds. (It is believable that these
are the same.) The method of proof of Theorem 1.4.1 does not allow one to
reach this value, so we do not attempt to push the method to its limit, but
rather to give a clean statement. There is nothing magic about the condition
β < 1/2, which is an artifact of the method of proof. In Volume II, we will
prove that actually (1.88) holds in a much larger region.
We now turn to a general principle of fundamental importance. We go
back to the general case of Gaussian families (uσ ) and (vσ ), and for σ ∈ ΣN
we consider a number w_σ > 0. We recall that we denote by ⟨·⟩_t an average
for the Gibbs measure with Hamiltonian (1.59), that is,
−H_t(σ) = √t u_σ + √(1 − t) v_σ + log w_σ = u_σ(t) + log w_σ .
Then, for a function f on Σ_N^n (= (Σ_N)^n) we have
⟨f⟩_t = Z(u(t))^{−n} Σ_{σ¹,...,σⁿ} f(σ¹, . . . , σⁿ) w_{σ¹} · · · w_{σⁿ} exp Σ_{ℓ≤n} u_{σ^ℓ}(t) ,
where Z(u(t)) = Σ_σ w_σ exp u_σ(t). We write
ν_t(f) = E⟨f⟩_t ; ν′_t(f) = (d/dt) ν_t(f) .
The general principle stated in Lemma 1.4.2 below provides an explicit for-
mula for ν′_t(f). It is in a sense a straightforward application of Lemma 1.3.1.
However, since Lemma 1.3.1 requires computing the second partial deriva-
tives of the function F , when this function is complicated, (e.g. is a quotient
of 2 factors) we must face the unavoidable fact that this will produce for-
mulas that are not as simple as we might wish. We should be well prepared
for this, as we all know that computing derivatives can lead to complicated
expressions.
We recall the function of two configurations U(σ, τ) given by (1.55), that
is, U(σ, τ) = 1/2(Eu_σ u_τ − Ev_σ v_τ). Thus, in the formula below, the quantity
U(σ^ℓ, σ^{ℓ′}) is
U(σ^ℓ, σ^{ℓ′}) = 1/2 ( Eu_{σ^ℓ} u_{σ^{ℓ′}} − Ev_{σ^ℓ} v_{σ^{ℓ′}} ) .
We also point out that in this formula, to lighten notation, f stands for
f(σ¹, . . . , σⁿ).
Lemma 1.4.2. If f is a function on Σ_N^n (= (Σ_N)^n), then
ν′_t(f) = Σ_{1≤ℓ,ℓ′≤n} ν_t ( U(σ^ℓ, σ^{ℓ′})f ) − 2n Σ_{ℓ≤n} ν_t ( U(σ^ℓ, σ^{n+1})f )
− n ν_t ( U(σ^{n+1}, σ^{n+1})f ) + n(n + 1) ν_t ( U(σ^{n+1}, σ^{n+2})f ) . (1.90)
This formula looks scary the first time one sees it, but one should observe
that the right-hand side is a linear combination of terms of the same nature,
each of the type
ν_t ( U(σ^ℓ, σ^{ℓ′})f ) = E⟨U(σ^ℓ, σ^{ℓ′}) f(σ¹, . . . , σⁿ)⟩_t .
The complication is purely algebraic (as it should be). One can observe that
even though f depends only on n replicas, (1.90) involves two new indepen-
dent replicas σ n+1 and σ n+2 .
We will use countless times a principle called symmetry between repli-
cas, a name not to be confused with the expression “replica-symmetric”.
This principle asserts e.g. that ν(f(σ¹)U(σ¹, σ²)) = ν(f(σ¹)U(σ¹, σ³)). The
reason for this is simply that the sequence (σ^ℓ) is an i.i.d. sequence under
Gibbs' measure, so that for any permutation π of the replica indices, and any
function f(σ¹, . . . , σⁿ), one has ⟨f(σ¹, . . . , σⁿ)⟩ = ⟨f(σ^{π(1)}, . . . , σ^{π(n)})⟩, and
hence, taking expectation, ν(f(σ¹, . . . , σⁿ)) = ν(f(σ^{π(1)}, . . . , σ^{π(n)})) .
that we will apply to this function, carefully collecting the terms. Let us set
F₁(x) = Σ_{σ¹,...,σⁿ} w_{σ¹} · · · w_{σⁿ} f(σ¹, . . . , σⁿ) exp Σ_{ℓ≤n} x_{σ^ℓ} ,
so that
F(x) = Z(x)^{−n} F₁(x) ,
and therefore
(∂F/∂x_σ)(x) = Z(x)^{−n} (∂F₁/∂x_σ)(x) − nZ(x)^{−n−1} (∂Z/∂x_σ)(x) F₁(x) .
Consequently,
∂²F/∂x_σ∂x_τ (x) = Z(x)^{−n} (∂²F₁/∂x_σ∂x_τ)(x)
− nZ(x)^{−n−1} ( (∂Z/∂x_σ)(x)(∂F₁/∂x_τ)(x) + (∂Z/∂x_τ)(x)(∂F₁/∂x_σ)(x) )
− nZ(x)^{−n−1} (∂²Z/∂x_σ∂x_τ)(x) F₁(x)
+ n(n + 1)Z(x)^{−n−2} (∂Z/∂x_σ)(x)(∂Z/∂x_τ)(x) F₁(x) . (1.91)
Each of the four terms of (1.91) corresponds to a term in (1.90). We will
explain this in detail for the first and the last terms. We observe first that
(∂Z/∂x_σ)(x) = w_σ exp x_σ ,
so that the last term of (1.91) is
where
C_{ℓ,ℓ′}(σ, τ, x) = Σ_{σ¹,...,σⁿ} 1_{σ^ℓ=σ} 1_{σ^{ℓ′}=τ} w_{σ¹} · · · w_{σⁿ} f(σ¹, . . . , σⁿ) exp Σ_{ℓ₁≤n} x_{σ^{ℓ₁}} ,
and the contribution of the second term of (1.91) is indeed the second term
of (1.90). The case of the other terms is similar.
Exercise 1.4.4. In the proof of Lemma 1.4.2 write in full detail the contri-
bution of the other terms of (1.91).
The reader is urged to complete this exercise, and to meditate the proof
of Lemma 1.4.2 until she fully understands it. The algebraic mechanism at
work in (1.90) will occur on several occasions (since Gibbs’ measures are
intrinsically given by a ratio of two quantities). More generally, calculations
of a similar nature will be needed again and again.
It will often be the case that U (σ, σ) is a number that does not depend
on σ, in which case the third sum in (1.90) cancels the diagonal of the first
one, and (1.90) simplifies to
ν′_t(f) = 2 ( Σ_{1≤ℓ<ℓ′≤n} ν_t ( U(σ^ℓ, σ^{ℓ′})f ) − n Σ_{ℓ≤n} ν_t ( U(σ^ℓ, σ^{n+1})f )
+ (n(n + 1)/2) ν_t ( U(σ^{n+1}, σ^{n+2})f ) ) . (1.95)
Then (1.90) still holds true despite the fact that the numbers wσ are now
random. This is seen by first using (1.90) at a given realization of the r.v.s
hi , and then taking a further expectation in the randomness of these. Let us
next compute in the present setting the quantities U(σ^ℓ, σ^{ℓ′}). Let us define
R_{ℓ,ℓ′} = (1/N) Σ_{i≤N} σ_i^ℓ σ_i^{ℓ′} . (1.96)
This notation will be used in the entire book a countless number of times.
We will also use countless times that by symmetry between replicas, we have
e.g. that ν(R1,2 ) = ν(R1,3 ) or ν(R1,2 R2,3 ) = ν(R1,2 R1,3 ). On the other hand,
ν′_t(f) = (Nβ²/2) ( ν_t ( (R_{1,2} − q)² f ) − 2 Σ_{ℓ≤2} ν_t ( (R_{ℓ,3} − q)² f )
+ 3ν_t ( (R_{3,4} − q)² f ) ) . (1.97)
Up to Corollary 1.4.7 below, the results are true for every value of q, not
only the solution of (1.74).
ν_t ( (R_{3,4} − q)² exp λN(R_{1,2} − q)² ) = Σ_{k≥0} ((Nλ)^k/k!) ν_t ( (R_{3,4} − q)²(R_{1,2} − q)^{2k} )
≤ Σ_{k≥0} ((Nλ)^k/k!) ν_t ( (R_{1,2} − q)^{2k+2} )
= ν_t ( (R_{1,2} − q)² exp λN(R_{1,2} − q)² ) .
Corollary 1.4.7. The function
t → ν_t ( exp (λ − 2tβ²)N(R_{1,2} − q)² ) (1.101)
is non-increasing.
Proof. In the function (1.101) there are two sources of dependence on t,
through ν_t and through the term −2tβ², so that
(d/dt) ν_t ( exp (λ − 2tβ²)N(R_{1,2} − q)² ) = ν′_t ( exp (λ − 2tβ²)N(R_{1,2} − q)² )
− 2Nβ² ν_t ( (R_{1,2} − q)² exp (λ − 2tβ²)N(R_{1,2} − q)² ) ,
and this is ≤ 0 by (1.97) (used for n = 2) together with the preceding computation. □
Proposition 1.4.8. When q is the solution of (1.74), for λ < 1/2 we have
ν₀ ( exp λN(R_{1,2} − q)² ) ≤ 1/√(1 − 2λ) . (1.102)
Whenever, like here, we state a result without proof or reference, the rea-
son is always that (unless it is an obvious corollary of what precedes) the proof
can be found later in the same section, but that we prefer to demonstrate its
use before giving this proof.
At this point we may try to formulate in words the idea underlying the
proof of Theorem 1.4.1: it is to transfer the excellent control (1.102) of R1,2 −q
for ν0 to ν1 using Lemma 1.4.2.
Proof of Theorem 1.4.1. Taking λ = s + 2β² < 1/2 we deduce from (1.102)
and Corollary 1.4.7 that for all 0 ≤ t ≤ 1,
ν_t ( exp (s + 2(1 − t)β²)N(R_{1,2} − q)² ) ≤ 1/√(1 − 2s − 4β²) ,
because this is true for t = 0 and because the left-hand side is a non-increasing
function of t. Since s + 2(1 − t)β² ≥ s this shows that for each t (and in
particular for t = 1) we have
ν_t ( exp sN(R_{1,2} − q)² ) ≤ 1/√(1 − 2s − 4β²) . □
At this point the probabilistically oriented reader should think of the se-
quence (σi1 σi2 )1≤i≤N as (under ν0 ) an i.i.d. sequence of {−1, 1}-valued r.v.s
of expectation q, for which all kinds of estimates are classical. Nonetheless
we give a simple self-contained proof. The main step of this proof is to show
that for every u we have
ν₀ ( exp N u(R_{1,2} − q) ) ≤ exp(N u²/2) . (1.105)
Since (1.105) holds for every value of u it holds when u is a Gaussian r.v.
with Eu2 = 2λ/N , independent of all the other sources of randomness. Taking
expectation in u in (1.105) and using (A.11) yields (1.102).
To prove (1.105), we first evaluate
ν₀ ( exp N u(R_{1,2} − q) ) = ν₀ ( exp u Σ_{i≤N} (σ_i^1σ_i^2 − q) )
= Π_{i≤N} ν₀ ( exp u(σ_i^1σ_i^2 − q) ) , (1.106)
1.4 Latala’s Argument 39
by independence between the sites. Using that when |ε| = 1 we have exp εx =
ch x + sh εx = ch x + ε sh x, we obtain exp uσ_i^1σ_i^2 = ch u + σ_i^1σ_i^2 sh u, and thus
ν₀ ( exp uσ_i^1σ_i^2 ) = ch u + ν₀(σ_i^1σ_i^2) sh u . (1.107)
Therefore (1.104) implies
ν₀ ( exp uσ_i^1σ_i^2 ) = ch u + q sh u ,
and consequently
ν₀ ( exp u(σ_i^1σ_i^2 − q) ) = exp(−qu) (ch u + q sh u) .
It therefore suffices to prove that
(ch u + q sh u) exp(−qu) ≤ exp(u²/2) .
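Before proving the elementary inequality above, one can sanity-check it numerically over a grid (a quick illustrative check, not part of the proof); note that q = ν₀(σ_i^1σ_i^2) satisfies |q| ≤ 1:

```python
import numpy as np

u = np.linspace(-12.0, 12.0, 4001)
for q in np.linspace(-1.0, 1.0, 41):
    lhs = (np.cosh(u) + q * np.sinh(u)) * np.exp(-q * u)
    rhs = np.exp(u ** 2 / 2)
    # the inequality (ch u + q sh u) exp(-qu) <= exp(u^2/2), with a tiny
    # multiplicative tolerance for floating-point round-off at u = 0
    assert np.all(lhs <= rhs * (1 + 1e-12))
print("(ch u + q sh u) exp(-q u) <= exp(u^2/2) on the grid")
```

Equality holds at u = 0, in agreement with f(0) = f′(0) = 0 in the proof that follows.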
Indeed the function f(u) = log(ch u + q sh u) − qu satisfies f(0) = 0, f′(0) = 0, and
f′(u) = (sh u + q ch u)/(ch u + q sh u) − q ; f″(u) = 1 − ( (sh u + q ch u)/(ch u + q sh u) )² ≤ 1 ,
so that f(u) ≤ u²/2 by Taylor's formula. Hence
ν₀ ( exp u(σ_i^1σ_i^2 − q) ) ≤ exp(u²/2)
and (1.106) yields (1.105). This completes the proof.
Let us recall that we denote by SK(β, h) the right-hand side of (1.73)
when q is as in (1.74). As in the case of p_N(β, h) this is a number depending
only on β and the law of h.
Theorem 1.4.10. If β < 1/2, then
|p_N(β, h) − SK(β, h)| ≤ K/N , (1.108)
where K does not depend on N .
Thus, when β < 1/2, (1.73) is a near equality, and in particular p(β, h) =
lim_{N→∞} p_N(β, h) = SK(β, h). Of course, this immediately raises the question
as to which values of (β, h) this equality remains true for. This is a difficult
question that will be investigated later. Suffice it to say now that, given h,
the equality fails for large enough β, but this statement itself is far from being
obvious.
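To see how sharp (1.108) already is at small sizes, one can compare SK(β, h) with a Monte Carlo estimate of p_N for an N small enough that Z_N can be enumerated exactly. A rough sketch (not from the text; the function names, β, h, N and the sample count are our own illustrative choices, with h constant):

```python
import numpy as np

x, w = np.polynomial.hermite.hermgauss(60)
z, w = x * np.sqrt(2.0), w / np.sqrt(np.pi)   # E f(z), z standard Gaussian

def SK(beta, h):
    # right-hand side of (1.73) evaluated at the solution q of (1.74)
    q = 0.5
    for _ in range(300):
        q = np.sum(w * np.tanh(beta * z * np.sqrt(q) + h) ** 2)
    return np.log(2) + np.sum(w * np.log(np.cosh(beta * z * np.sqrt(q) + h))) \
        + beta ** 2 / 4 * (1 - q) ** 2

def p_N(N, beta, h, samples, rng):
    # Monte Carlo average of (1/N) log Z_N, with Z_N enumerated exactly
    sigma = 1 - 2 * ((np.arange(2 ** N)[:, None] >> np.arange(N)) & 1)
    total = 0.0
    for _ in range(samples):
        J = np.triu(rng.standard_normal((N, N)), 1)
        e = beta / np.sqrt(N) * np.einsum('ki,ij,kj->k', sigma, J, sigma) \
            + h * sigma.sum(axis=1)
        m = e.max()
        total += (m + np.log(np.exp(e - m).sum())) / N
    return total / samples

rng = np.random.default_rng(0)
beta, h = 0.4, 0.3
print(SK(beta, h), p_N(10, beta, h, 300, rng))  # already close at N = 10
```

Consistently with (1.76), the Monte Carlo estimate should fall (up to sampling noise) below SK(β, h), and the gap is of the order 1/N.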
We have observed that, as a consequence of Hölder’s inequality, the func-
tion β → pN (β, h) is convex. It then follows from (1.108) that, when β < 1/2,
the function β → SK(β, h) is also convex. Yet, this is not really obvious from
the definition of this function. It should not be very difficult to find a calculus
proof of this fact, but what is needed is to understand really why this is
the case. Much later, we will be able to give a complicated analytic expres-
sion (the Parisi formula) for limN →∞ pN (β, h), which is valid for any value
of β, and it is still not known how to prove by a direct argument that this
analytical expression is a convex function of β.
In a statement such as (1.108) the constant K can in principle depend on
β and h. It will however be shown in the proof that for β ≤ β0 < 1/2, it can
be chosen so that it does not depend on β or h.
Proof of Theorem 1.4.10. We have proved in (1.103) that if β ≤ β₀ < 1/2
then ν_t((R_{1,2} − q)²) ≤ K/N, where K depends on β₀ only. Now (1.65) implies
| ϕ′(t) − (β²/4)(1 − q)² | ≤ K/N ,
and
ϕ(1) = p_N(β, h) ; ϕ(0) = log 2 + E log ch(βz√q + h) .
The result follows by integrating over 0 ≤ t ≤ 1. □
Theorem 1.4.10 controls the expected value (= first moment) of the quan-
tity N −1 log ZN (β, h) − SK(β, h). In Theorem 1.4.11 below we will be able
to accurately compute the higher moments of this quantity. Of course this
requires a bit more work. This result will not be used in the sequel, so it can
in principle be skipped at first reading. However we must mention that one of
the goals of the proof is to further acquaint the reader with the mechanisms
of integration by parts.
Let us denote by a(k) the k-th moment of a standard Gaussian r.v. (so
that a(0) = 1 = a(2), a(1) = 0 and, by integration by parts, a(k) = Egg^{k−1} =
(k − 1)a(k − 2)). Consider q as in (1.74) and Y = βz√q + h. Let
b = E(log ch Y)² − (E log ch Y)² − β²q²/2 .
1.4 Latala’s Argument 41
Theorem 1.4.11. Assume that the r.v. h is Gaussian (not necessarily cen-
tered). Then if β < 1/2, for each k ≥ 1 we have
| E ( (1/N) log Z_N(β, h) − SK(β, h) )^k − (1/N^{k/2}) a(k) b^{k/2} | ≤ K/N^{(k+1)/2} , (1.109)
Lemma 1.4.12. If the r.v. h is Gaussian (not necessarily centered) then for
any value of β we have
E ( (1/N) log Z_N(β, h) − p_N(β, h) )^k = O(k) . (1.113)
ψ(t) = EV(t)^k .
and we compute ∂ϕ(t, a)/∂t using (1.40). This is done by a suitable extension
of (1.60). Keeping the notation of this formula, consider the function
W(x) = (x − a)^k and, for x = (x_σ), consider the function
F(x) = W ( (1/N) log Z(x) ) .
Thus
(∂F/∂x_τ)(x) = (1/N) (w_τ exp x_τ /Z(x)) W′ ( (1/N) log Z(x) ) .
If σ ≠ τ we then have
∂²F/∂x_σ∂x_τ (x) = −(1/N) (w_σ w_τ exp(x_σ + x_τ)/Z(x)²) W′ ( (1/N) log Z(x) )
+ (1/N²) (w_σ w_τ exp(x_σ + x_τ)/Z(x)²) W″ ( (1/N) log Z(x) ) ,
while
∂²F/∂x_σ² (x) = (1/N) ( w_σ exp x_σ /Z(x) − w_σ² exp 2x_σ /Z(x)² ) W′ ( (1/N) log Z(x) )
+ (1/N²) (w_σ² exp 2x_σ /Z(x)²) W″ ( (1/N) log Z(x) ) .
Thus the function ϕ(t, a) = EF(u(t)) = EW(A(t)) satisfies
∂ϕ/∂t (t, a) = (1/N) E ( ( ⟨U(σ, σ)⟩_t − ⟨U(σ¹, σ²)⟩_t ) W′(A(t)) )
+ (1/N²) E ( ⟨U(σ¹, σ²)⟩_t W″(A(t)) ) ,
and replacing W by its value this is
∂ϕ/∂t (t, a) = (k/N) E ( ( ⟨U(σ, σ)⟩_t − ⟨U(σ¹, σ²)⟩_t ) (A(t) − a)^{k−1} )
+ (k(k − 1)/N²) E ( ⟨U(σ¹, σ²)⟩_t (A(t) − a)^{k−2} ) . (1.125)
This is a generalization of (1.60), that corresponds to the case k = 1.
There is an alternate way to explain the structure of the formula (1.125)
(but the proof is identical). It is to say that straightforward (i.e. applying
only the most basic rules of Calculus) differentiation of (1.116) yields
A′(t) = (1/N) ( Σ_σ (−H′_t(σ)) exp(−H_t(σ)) ) / ( Σ_σ exp(−H_t(σ)) ) = (1/N) ⟨−H′_t(σ)⟩_t ,
where
−H′_t(σ) := (d/dt)(−H_t(σ)) = (1/2√t) u_σ − (1/2√(1 − t)) v_σ ,
so that
1.4 Latala’s Argument 45
∂ϕ/∂t (t, a) = kE ( A′(t)(A(t) − a)^{k−1} ) = (k/N) E ( ⟨−H′_t(σ)⟩_t (A(t) − a)^{k−1} ) . (1.126)
One then integrates by parts, while using the key relation U(σ, τ) =
EH′_t(σ)H_t(τ). (Of course making this statement precise amounts basically
to reproducing the previous calculation.) The dependence of the bracket ⟨·⟩_t
on the Hamiltonian creates the first term in (1.125) (we have actually already
done this computation), while the dependence of A(t) on this Hamiltonian
creates the second term.
This method of explanation is convenient to guide the reader (once she
has gained some experience) through the many computations (that will soon
become routine) involving Gaussian integration by parts, without reproduc-
ing the computations in detail (which would be unbearable). For this reason
we will gradually shift (in particular in the next chapters) to this convenient
method of giving a high-level description of these computations. Unfortu-
nately, there is no miracle, and to gain the experience that will make these
formulas transparent to the reader, she has to work through a few of them
in complete detail, and doing in detail the integration by parts in (1.126) is
an excellent start.
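As a warm-up for that exercise, the one-dimensional Gaussian integration by parts formula (A.14), E(gf(g)) = Ef′(g) for a standard Gaussian g, is easy to check numerically (an illustrative check, not from the text), e.g. with f = th:

```python
import numpy as np

# Gauss-Hermite quadrature nodes for E f(g), g standard Gaussian
x, w = np.polynomial.hermite.hermgauss(100)
z, w = x * np.sqrt(2.0), w / np.sqrt(np.pi)

f = np.tanh
fprime = lambda y: 1.0 - np.tanh(y) ** 2

lhs = np.sum(w * z * f(z))    # E g f(g)
rhs = np.sum(w * fprime(z))   # E f'(g)
print(lhs, rhs)               # the two numbers agree to quadrature accuracy
```

The multidimensional version used throughout the computations above replaces f′(g) by a sum over the covariances with the other coordinates, which is precisely what produces the bracket terms of (1.125).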
Using (1.64) and completing the squares in (1.125) yields
∂ϕ/∂t (t, a) = −(β²k/4) E ( ⟨(R_{1,2} − q)²⟩_t (A(t) − a)^{k−1} )
+ (β²/4) k(1 − q)² E(A(t) − a)^{k−1}
+ (β²/4) (k(k − 1)/N) E ( ⟨(R_{1,2} − q)²⟩_t (A(t) − a)^{k−2} )
− (β²/4) (k(k − 1)/N) q² E(A(t) − a)^{k−2} − (β²/4N²) k(k − 1) E(A(t) − a)^{k−2} .
Now, since
(d/dt) ϕ(t, SK(t)) = (∂ϕ/∂t)(t, SK(t)) + SK′(t) (∂ϕ/∂a)(t, SK(t)) ,
and
∂ϕ/∂a (t, a) = −kE(A(t) − a)^{k−1} ; SK′(t) = (β²/4)(1 − q)² ,
one gets
one gets
ψ′(t) = I + II
where
I = −(β²q²/4) (k(k − 1)/N) EV(t)^{k−2}
II = −(β²k/4) E ( ⟨(R_{1,2} − q)²⟩_t V(t)^{k−1} ) + (β²k(k − 1)/4N) E ( ⟨(R_{1,2} − q)²⟩_t V(t)^{k−2} )
− (β²/4N²) k(k − 1) EV(t)^{k−2} .
We claim that
II = O(k + 1) .
To see this we note that by (1.118) (used for 2(k − 1) rather than k) we have
E(V(t)^{2(k−1)}) = O(2(k − 1)) and we write, using (1.103),
E ( ⟨(R_{1,2} − q)²⟩_t V(t)^{k−1} ) ≤ ( E⟨(R_{1,2} − q)⁴⟩_t )^{1/2} ( E V(t)^{2(k−1)} )^{1/2}
= O(2)O(k − 1) = O(k + 1) .
The case of the other terms is similar. Thus, we have proved that ψ′(t) = I +
O(k + 1), and since b′(t) = −β²q²/2 we have also proved (1.120). To complete
the induction it remains only to prove (1.122). With obvious notation,
V(0) = (1/N) Σ_{i≤N} (log ch Y_i − E log ch Y) .
The r.v.s Xi = log chYi − E log chY form an i.i.d. sequence of centered vari-
ables, so the statement in that case is simply (a suitable quantitative version
of) the central limit theorem. We observe that by (1.118), for each k, we have
EV (0)k = O(k). (Of course the use of Lemma 1.4.12 here is an overkill.) To
evaluate EV(0)^k we use symmetry to write
EV(0)^k = E ( X_N V(0)^{k−1} ) = E ( X_N (X_N /N + B)^{k−1} )
where B = N^{−1} Σ_{i≤N−1} X_i. We observe that since B = V(0) − X_N /N, for
each k we have EB^k = O(k). We expand the term (X_N /N + B)^{k−1} and since
EX_N = 0 we get the relation
EV(0)^k = ((k − 1)/N) EX_N² EB^{k−2} + O(k + 1) .
Using again that B = V(0) − X_N /N and since EX_N² = b(0) we then obtain
EV(0)^k = ((k − 1)/N) b(0) EV(0)^{k−2} + O(k + 1) ,
from which the claim follows by induction.
Here is one more exercise to help the reader think about interpolation
between two Gaussian Hamiltonians uσ and vσ .
Our next result makes apparent that the (crucial) property ν((R1,2 −
q)2 ) ≤ K/N implies some independence between the sites.
Proposition 1.4.14. For any p and any q with 0 ≤ q ≤ 1 we have
E ( ⟨σ₁ · · · σ_p⟩ − ⟨σ₁⟩ · · · ⟨σ_p⟩ )² ≤ K(p)ν ( (R_{1,2} − q)² ) , (1.127)
where K(p) depends on p only.
This statement is clearly of importance: it means that when the right-
hand side is small “the spins decorrelate”. (When p = 2, the quantity
⟨σ₁σ₂⟩ − ⟨σ₁⟩⟨σ₂⟩ is the covariance of the spins σ₁ and σ₂, seen as r.v.s on the
probability space (Σ_N , G_N ). The physicists call this quantity the truncated
correlation.) Equation (1.127) is true for any value of q, but we will show
in Proposition 1.9.5 below that essentially the only value of q for which the
quantity ν((R1,2 − q)2 ) might be small is the solution of (1.74).
We denote by “·” the dot product in ℝ^N , so that e.g. R_{1,2} = σ¹ · σ²/N .
A notable feature of the proof of Proposition 1.4.14 is that the only feature
of the model it uses is symmetry between sites, so this proposition can be
applied to many of the models we will study.
Proof of Proposition 1.4.14. Throughout the proof K(p) denotes a num-
ber depending on p only, that need not be the same at each occurrence. The
proof goes by induction on p, and the case p = 1 is obvious. For the induction
from p − 1 to p it suffices to prove that
E(⟨σ_1 ··· σ_p⟩ − ⟨σ_1 ··· σ_{p−1}⟩⟨σ_p⟩)^2 ≤ K(p) ν((R_{1,2} − q)^2). (1.128)
Let σ̇_i = σ_i − ⟨σ_i⟩ and σ̇ = (σ̇_i)_{i≤N}. Therefore
⟨σ_1 ··· σ_p⟩ − ⟨σ_1 ··· σ_{p−1}⟩⟨σ_p⟩ = ⟨σ_1σ_2 ··· σ_{p−1}σ̇_p⟩.
Using replicas, we have
(⟨σ_1 ··· σ_p⟩ − ⟨σ_1 ··· σ_{p−1}⟩⟨σ_p⟩)^2 = ⟨σ_1σ_2 ··· σ_{p−1}σ̇_p⟩^2 = ⟨σ_1^1σ_1^2 ··· σ_{p−1}^1σ_{p−1}^2 σ̇_p^1σ̇_p^2⟩,
so that
E(⟨σ_1 ··· σ_p⟩ − ⟨σ_1 ··· σ_{p−1}⟩⟨σ_p⟩)^2 = ν(σ_1^1σ_1^2 ··· σ_{p−1}^1σ_{p−1}^2 σ̇_p^1σ̇_p^2). (1.129)
Using symmetry between sites,
N(N−1)···(N−p+1) ν(σ_1^1σ_1^2 ··· σ_{p−1}^1σ_{p−1}^2 σ̇_p^1σ̇_p^2)
= Σ_{i_1,...,i_p all different} ν(σ_{i_1}^1σ_{i_1}^2 ··· σ_{i_{p−1}}^1σ_{i_{p−1}}^2 σ̇_{i_p}^1σ̇_{i_p}^2)
≤ Σ_{all i_1,...,i_p} ν(σ_{i_1}^1σ_{i_1}^2 ··· σ_{i_{p−1}}^1σ_{i_{p−1}}^2 σ̇_{i_p}^1σ̇_{i_p}^2)
= N^p ν(R_{1,2}^{p−1} (σ̇^1·σ̇^2)/N) = N^p ν((R_{1,2}^{p−1} − q^{p−1}) (σ̇^1·σ̇^2)/N), (1.130)
48 1. The Sherrington-Kirkpatrick Model
where ⟨σ_{i_1}^1σ_{i_1}^2σ_{i_2}^1σ_{i_2}^2 ··· σ_{i_{p−1}}^1σ_{i_{p−1}}^2 σ̇_{i_p}^1σ̇_{i_p}^2⟩ = ⟨σ_{i_1}σ_{i_2} ··· σ_{i_{p−1}}σ̇_{i_p}⟩^2, so that all terms are ≥ 0, and where the last equality uses that ⟨σ̇^1·σ̇^2⟩ = 0. Of course here σ̇^1·σ̇^2 = Σ_{i≤N} σ̇_i^1σ̇_i^2, and the vector notation is simply for convenience.
Using the inequality |xp−1 − y p−1 | ≤ (p − 1)|x − y| for |x|, |y| ≤ 1 and the
Cauchy-Schwarz inequality we obtain
ν((R_{1,2}^{p−1} − q^{p−1}) (σ̇^1·σ̇^2)/N) ≤ (p−1) ν(|R_{1,2} − q| |σ̇^1·σ̇^2|/N) (1.131)
≤ (p−1) ν((R_{1,2} − q)^2)^{1/2} ν(((σ̇^1·σ̇^2)/N)^2)^{1/2}.
Now we have
⟨((σ̇^1·σ̇^2)/N)^2⟩ = ⟨(((σ^1 − ⟨σ⟩)·(σ^2 − ⟨σ⟩))/N)^2⟩ = ⟨(⟨(σ^1 − σ^3)·(σ^2 − σ^4)/N⟩)^2⟩,
where the inner bracket is the average in σ^3 and σ^4 only.
To bound the right-hand side, we move the averages in σ 3 and σ 4 outside the
square (and we note that the function x → x2 is convex). Jensen’s inequality
(1.23) therefore asserts that
⟨(⟨(σ^1 − σ^3)·(σ^2 − σ^4)/N⟩)^2⟩ ≤ ⟨((σ^1 − σ^3)·(σ^2 − σ^4)/N)^2⟩.
Finally we write
((σ^1 − σ^3)·(σ^2 − σ^4)/N)^2 = (R_{1,2} − R_{1,4} − R_{3,2} + R_{3,4})^2
≤ 4((R_{1,2} − q)^2 + (R_{1,4} − q)^2 + (R_{3,2} − q)^2 + (R_{3,4} − q)^2),
using that (Σ_{i≤4} x_i)^2 ≤ 4 Σ_{i≤4} x_i^2. Combining the three previous inequalities, using symmetry between replicas, and taking expectation and square root we reach
ν(((σ̇^1·σ̇^2)/N)^2)^{1/2} ≤ 2 ν((R_{1,2} − q)^2)^{1/2}.
Therefore
N(N−1)···(N−p+1) E(⟨σ_1 ··· σ_p⟩ − ⟨σ_1 ··· σ_{p−1}⟩⟨σ_p⟩)^2 ≤ 2(p−1) N^p ν((R_{1,2} − q)^2),
and (1.128) follows since
sup_{N≥p} N^p / (N(N−1)···(N−p+1)) < ∞.
A = {σ ∈ ΣN ; ∀i ≤ n, σi = ηi } , (1.133)
where η = (η_1, ..., η_n) and μ_n is the product probability on {−1,1}^n with density Π_{i≤n}(1 + η_i⟨σ_i⟩) with respect to the uniform measure. (Let us observe that μ_n is the only product probability measure on {−1,1}^n such that for each i the average of σ_i for μ_n is equal to ⟨σ_i⟩.)
Formally, we have the following.
Theorem 1.4.15. Assume β < 1/2. Denote by GN,n the law of (σ1 , . . . , σn )
under G_N, and consider μ_n as above, the probability on {−1,1}^n with density Π_{i≤n}(1 + η_i⟨σ_i⟩) with respect to the uniform measure. Then
E‖G_{N,n} − μ_n‖^2 ≤ K(n)/N,
where ‖·‖ denotes the total variation distance.
Thus, to understand well the random measure G_{N,n} it remains only to understand the random sequence (⟨σ_i⟩)_{i≤n}. This will be achieved in Theorem 1.7.1 below.
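Theorem 1.4.15 is easy to illustrate numerically. The following sketch (not from the book; the values of N, n, β, h and the disorder seed are arbitrary) enumerates a tiny SK system, builds μ_n from the observed ⟨σ_i⟩, and compares it to the true marginal G_{N,n}; the matching of the means is exact by construction, while the total variation distance is small at high temperature:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, n, beta, h = 10, 2, 0.2, 0.3
g = rng.standard_normal((N, N))
g = np.triu(g, 1)
g = g + g.T                                  # symmetric couplings, zero diagonal

# Gibbs weights over all 2^N configurations
confs = np.array(list(itertools.product([-1, 1], repeat=N)))
logw = (beta / np.sqrt(N)) * np.einsum('ci,ij,cj->c', confs, g, confs) / 2 + h * confs.sum(axis=1)
w = np.exp(logw - logw.max())
w /= w.sum()
m = w @ confs                                # m_i = <sigma_i>

# marginal G_{N,n} of the first n spins, and the product measure mu_n
eta = np.array(list(itertools.product([-1, 1], repeat=n)))
G = np.array([w[(confs[:, :n] == e).all(axis=1)].sum() for e in eta])
mu = np.prod(1 + eta * m[:n], axis=1) / 2 ** n

assert abs(mu.sum() - 1) < 1e-12             # mu_n is a probability measure
assert np.allclose(eta.T @ mu, m[:n])        # its means match <sigma_i> exactly
print("total variation distance:", np.abs(G - mu).sum())
```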
Now GN,n ({η}) = GN (A) where A is given by (1.133), and the result follows
by making formal the computation (1.134). Namely, we write
(G_N(A) − μ_n({η}))^2 = (2^{−n} Σ_{I⊂{1,...,n}} η_I (⟨σ_I⟩ − Π_{i∈I}⟨σ_i⟩))^2
≤ 2^{−n} Σ_{I⊂{1,...,n}} (⟨σ_I⟩ − Π_{i∈I}⟨σ_i⟩)^2,
where η_I = Π_{i∈I} η_i and σ_I = Π_{i∈I} σ_i.
This result raises all kinds of open problems. Here is an obvious question.
Research Problem 1.4.16. How fast can n(N ) grow so that GN,n(N ) −
μn(N ) → 0 ?
Of course, it will be easy to prove that one can take n(N ) → ∞, but finding
the best rate might be hard. One might also conjecture the following.
Conjecture 1.4.17. When β > 0 we have
lim_{N→∞} E inf ‖G_N − μ‖ = 2,
where the infimum is taken over all product measures μ.
Conjecture 1.4.18. When β < 1/2 we have
lim Ed(GN , μ) = 0 ,
N →∞
E(U − V)^2 = EU^2 + EV^2 − 2EUV,
and, writing g^ℓ = g·x^ℓ/√N,
E(g^ℓ)^2 = ‖x^ℓ‖^2/N ; E(g^1g^2) = x^1·x^2/N.
Using (1.135) and (1.136) we see that, generically (i.e. for most of the points x^1, x^2) we have E(g^ℓ)^2 ≈ ρ and E(g^1g^2) ≈ q. Since the distribution of a finite jointly Gaussian family (g_p) is determined by the quantities E g_p g_{p'}, the pair (g^1, g^2) has nearly the distribution of the pair (z√q + ξ^1√(ρ−q), z√q + ξ^2√(ρ−q)) where z, ξ^1 and ξ^2 are independent standard Gaussian r.v.s. Hence
E f(g·x^1/√N) f(g·x^2/√N) ≈ E f(z√q + ξ^1√(ρ−q)) f(z√q + ξ^2√(ρ−q)) = E(E_ξ f(z√q + ξ√(ρ−q)))^2,
the last equality not being critical here, but preparing for future formulas. This implies that
EU^2 ≈ E(E_ξ f(z√q + ξ√(ρ−q)))^2. (1.139)
The same argument proves that EUV and EV^2 are also nearly equal to the right-hand side of (1.139), so that E(U − V)^2 ≈ 0, completing the argument.
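The representation of the pair (g^1, g^2) used above can be checked mechanically: with the linear representation below, the covariance matrix of the pair is exactly [[ρ, q], [q, ρ]] (the numerical values of ρ and q are arbitrary illustrations):

```python
import numpy as np

rho, q = 1.0, 0.36           # illustrative values with 0 <= q <= rho
# (g^1, g^2) = A (z, xi^1, xi^2) for independent standard Gaussians z, xi^1, xi^2
A = np.array([[np.sqrt(q), np.sqrt(rho - q), 0.0],
              [np.sqrt(q), 0.0, np.sqrt(rho - q)]])
cov = A @ A.T                # covariance matrix of the pair (g^1, g^2)
assert np.allclose(cov, [[rho, q], [q, rho]])
```

Since a centered Gaussian vector is determined by its covariance, this confirms that (z√q + ξ^1√(ρ−q), z√q + ξ^2√(ρ−q)) has the covariance structure required of (g^1, g^2).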
In practice, we will need estimates for quantities such as
E ∫ W f(g·x^1/√N, ..., g·x^n/√N) dμ(x^1)···dμ(x^n), (1.140)
1.6 The Cavity Method 53
where, to lighten the formula, we have written Av U rather than Av(U) both in numerator and denominator. Recalling (1.144) this is
Σ_ρ Av f(σ) exp(−H_N(σ)) / Σ_ρ Av exp(−H_N(σ)).
R⁻_{ℓ,ℓ'} = (1/N) Σ_{i<N} σ_i^ℓ σ_i^{ℓ'}. (1.148)
and the result follows taking expectation, since the randomnesses of Y and
HN −1 are independent.
Lemma 1.6.2 in particular computes ν0 (f ) when f depends only on the
last spin, with formulas such as ν0 (ε1 ε2 ε3 ) = Eth3 Y.
The fundamental tool is as follows, where we recall that ε_ℓ = σ_N^ℓ.
Lemma 1.6.3. Consider a function f on Σ_N^n = (Σ_N)^n; then for 0 < t < 1 we have
ν_t'(f) := (d/dt) ν_t(f) = β^2 Σ_{1≤ℓ<ℓ'≤n} ν_t(f ε_ℓε_{ℓ'}(R⁻_{ℓ,ℓ'} − q))
− β^2 n Σ_{ℓ≤n} ν_t(f ε_ℓε_{n+1}(R⁻_{ℓ,n+1} − q))
+ β^2 (n(n+1)/2) ν_t(f ε_{n+1}ε_{n+2}(R⁻_{n+1,n+2} − q)) (1.151)
and also
ν_t'(f) = β^2 Σ_{1≤ℓ<ℓ'≤n} ν_t(f ε_ℓε_{ℓ'}(R_{ℓ,ℓ'} − q))
− β^2 n Σ_{ℓ≤n} ν_t(f ε_ℓε_{n+1}(R_{ℓ,n+1} − q))
+ β^2 (n(n+1)/2) ν_t(f ε_{n+1}ε_{n+2}(R_{n+1,n+2} − q)). (1.152)
This fundamental formula looks very complicated the first time one sees it,
although the shock should certainly be milder once one has seen (1.95). A
second look reveals that fortunately as in (1.95) the complication is only
algebraic. Counting terms with their order of multiplicity, the right-hand side of (1.151) is the sum of 2n^2 simple terms of the type ±β^2 ν_t(f ε_ℓε_{ℓ'}(R⁻_{ℓ,ℓ'} − q)).
Proof. The formula (1.151) is the special case of formula (1.95) where
u_σ = (β/√N) σ_N Σ_{i<N} g_i σ_i ; v_σ = β σ_N z√q (1.153)
and
U(σ^ℓ, σ^{ℓ'}) = (β^2/2) ε_ℓε_{ℓ'}(R⁻_{ℓ,ℓ'} − q).
Finally (1.152) follows from (1.151) and (1.149), as the extra terms cancel out since ε_ℓ^2 = 1.
The reader has observed that the choice (1.153) is fundamentally different
from the choice (1.62). In words, in (1.153) we decouple the last spin from the
others, rather than “decoupling all the spins at the same time” as in (1.62).
Since the formula (1.151) is the fundamental tool of the cavity method,
we would like to help the reader overcome his expected dislike of this formula
by explaining why, if one leaves aside the algebra, it is very simple. It helps
to think of R⁻_{ℓ,ℓ'} − q as a small quantity. Then all the terms of the right-hand side of (1.151) are small, and thus ν(f) = ν_1(f) ∼ ν_0(f). This is very helpful when f depends only on the last spin, e.g. f(σ) = ε_1ε_2, because in that case
we can calculate ν_0(f) using Lemma 1.6.2. That same lemma lets us also simplify the terms ν_0(f ε_ℓε_{ℓ'}(R⁻_{ℓ,ℓ'} − q)), at least when f does not depend on the last spin. We will then get very interesting information simply by writing that ν(f) ∼ ν_0(f) + ν_0'(f).
For pedagogical reasons, we now derive some of the results of Section 1.4
through the cavity method.
Lemma 1.6.4. For a function f ≥ 0 on Σ_N^n, we have
ν_t(f) ≤ exp(4n^2β^2) ν(f). (1.154)
Proof. By (1.152), |ν_t'(f)| ≤ 4n^2β^2 ν_t(f), and we integrate.
Proposition 1.6.5. Consider a function f on Σ_N^n, and τ_1, τ_2 > 0 with 1/τ_1 + 1/τ_2 = 1. Then we have
|ν(f) − ν_0(f)| ≤ 2n^2 β^2 exp(4n^2β^2) ν(|f|^{τ_1})^{1/τ_1} ν(|R_{1,2} − q|^{τ_2})^{1/τ_2}. (1.156)
Proof. We have
|ν(f) − ν_0(f)| = |∫_0^1 ν_t'(f) dt| ≤ sup_{0<t<1} |ν_t'(f)|.
By Hölder's inequality,
|ν_t(f ε_ℓε_{ℓ'}(R_{ℓ,ℓ'} − q))| ≤ ν_t(|f| |R_{ℓ,ℓ'} − q|) ≤ ν_t(|f|^{τ_1})^{1/τ_1} ν_t(|R_{ℓ,ℓ'} − q|^{τ_2})^{1/τ_2},
and thus by (1.152) (and since n(n+1)/2 + n^2 + n(n−1)/2 = 2n^2) the result follows from Lemma 1.6.4.
The larger the value of β0 , the harder it is to prove the result. It seems
difficult by the cavity method to reach the value β0 = 1/2 that we obtained
with Latala’s argument in (1.87) and (1.88).
Proof. Recalling that ε_ℓ = σ_N^ℓ, we use symmetry among sites to write
ν((R_{1,2} − q)^2) = (1/N) Σ_{i≤N} ν((σ_i^1σ_i^2 − q)(R_{1,2} − q)) = ν(f), (1.157)
where
f = (ε_1ε_2 − q)(R_{1,2} − q).
The idea underlying (1.157) is simply "to bring out as much as possible of the dependence on the last spin". This is very natural, since the cavity method brings forward the influence of this last spin. It is nonetheless extremely effective.
Using (1.149), and since ε_ℓ^2 = 1, we have
f = (ε_1ε_2 − q)(R_{1,2} − q) = (1/N)(1 − ε_1ε_2 q) + (ε_1ε_2 − q)(R⁻_{1,2} − q).
The key point is that Lemma 1.6.2 implies
ν_0((ε_1ε_2 − q)(R⁻_{1,2} − q)) = ν_0(ε_1ε_2 − q) ν_0(R⁻_{1,2} − q) = (E th^2 Y − q) ν_0(R⁻_{1,2} − q) = 0
because ν_0(ε_1ε_2) = E th^2 Y using Lemma 1.6.2 again and since (1.74) means that q = E th^2 Y. Furthermore,
ν_0(f) = (1/N) ν_0(1 − ε_1ε_2 q) = (1/N)(1 − q E th^2 Y) = (1/N)(1 − q^2). (1.158)
We now use (1.156) with τ_1 = τ_2 = 2 and n = 2. Since |ε_1ε_2 − q| ≤ 2, we get
|ν(f) − ν_0(f)| ≤ 16 β^2 exp(16β^2) ν((R_{1,2} − q)^2),
and using (1.157) and (1.158),
ν((R_{1,2} − q)^2) ≤ 1/N + 16 β^2 exp(16β^2) ν((R_{1,2} − q)^2).
Thus, if β_0 is chosen so that
16 β_0^2 exp(16β_0^2) ≤ 1/2,
we obtain
ν((R_{1,2} − q)^2) ≤ 1/N + (1/2) ν((R_{1,2} − q)^2), (1.159)
and thus
ν((R_{1,2} − q)^2) ≤ 2/N.
Proposition 1.6.7. There exists β_0 > 0 such that for β ≤ β_0 and any k ≥ 1 we have
ν((R_{1,2} − q)^{2k}) ≤ (64k/N)^k. (1.160)
In Section A.6 we explain a general principle relating growth of moments
and exponential integrability. This principle shows that (1.160) implies that
for a certain constant L we have
ν(exp((N/L)(R_{1,2} − q)^2)) ≤ 2,
|ν_0(f)| ≤ (4(2k+1)/N)(ν_0(A^{2k}) + ν_0(A_n^{2k})). (1.165)
Assuming β_0 ≤ 1/8, we obtain from (1.154) that
ν_t(f*) ≤ 2ν(f*), (1.166)
whenever f* ≥ 0 is a function on Σ_N^2. Then (1.165) implies
|ν_0(f)| ≤ (8(2k+1)/N)(ν(A^{2k}) + ν(A_n^{2k})).
We now observe that A and An+1 are equal in distribution under ν because
n < N . Thus the induction hypothesis yields
|ν_0(f)| ≤ (16(2k+1)/N)(64k/N)^k ≤ (1/2)(64(k+1)/N)^{k+1}. (1.167)
Using (1.169) in (1.168) yields that for the other values of n as well we have
ν(A_n^{2k+2}) ≤ (64(k+1)/N)^{k+1}.
The proof of Proposition 1.6.8 does not use Guerra’s interpolation (i.e the in-
terpolation of Theorem 1.3.7), but rather an explicit formula ((1.176) below)
that is the most interesting part of this approach. This method is precious
in situations where we do not wish to (or cannot) use interpolation. Several
such situations will occur later. Another positive feature of (1.170) is that it
is valid for any value of β, h and q. The way this provides information is that the average N^{−1} Σ_{M≤N} ν_M(|R_{1,2} − q|) cannot be small unless the left-hand side of (1.171) is small. Let us also remark that combining (1.171) with (1.73)
shows that this average can be small only if A(β, h, q) is close to its infimum
in q, i.e. only if q is a near solution of (1.74).
On the other hand, the best we can do about the right-hand side of (1.170)
is to write (when β ≤ β0 )
ν_N(|R_{1,2} − q|) ≤ ν_N((R_{1,2} − q)^2)^{1/2} ≤ K/√N,
so that (1.170) does not recover (1.108), since it gives only a rate K/√N
instead of K/N . It is possible however to prove a better version of (1.170),
where one replaces the error term ν(|R1,2 − q|) by ν((R1,2 − q)2 ). This essen-
tially requires replacing the “order 1 Taylor expansions” by “order 2 Taylor
expansions”, a technique that will become familiar later.
Proof. In this proof we will consider an (N + 1)-spin system for different
values of β, so we write the Hamiltonian as HN +1,β to make clear which value
of β is actually used. Consider the number β_+ given by β_+ = β√(1 + 1/N), so that
β_+/√(N+1) = β/√N.
We write the Hamiltonian −H_{N+1,β_+}(σ_1, ..., σ_{N+1}) of an (N+1)-spin system with parameter β_+ rather than β, and we gather the terms containing the last spin as in (1.143), so that
−H_{N+1,β_+}(σ_1, ..., σ_{N+1}) = −H_N(σ_1, ..., σ_N) + σ_{N+1}((β/√N) Σ_{i≤N} g_i σ_i + h_{N+1}).
On the left-hand side we have pN +1 (β+ , h) rather than pN +1 (β, h), and we
proceed to relate these two quantities. Consider a new independent sequence (g'_{ij}) of standard Gaussian r.v.s. Then
−H_{N+1,β_+}(σ_1, ..., σ_{N+1}) =_D −H_{N+1,β}(σ_1, ..., σ_{N+1}) + (β/√(N(N+1))) Σ_{i<j≤N+1} g'_{ij} σ_i σ_j, (1.174)
where =_D denotes equality in distribution.
Z_{N+1}(β_+) =_D Z_{N+1}(β) ⟨exp((β/√(N(N+1))) Σ_{i<j≤N+1} g'_{ij} σ_i σ_j)⟩,
To prove (1.170) we will calculate the last two terms of (1.176). The next
exercise provides motivation for the result. The second part of the exercise is
rather challenging, and should be all the more profitable.
Exercise 1.6.9. Convince yourself, using the arguments of Section 1.5, that
one should have
E log ch((β/√N) Σ_{i≤N} g_i σ_i + h_{N+1}) ≈ (β^2/2)(1 − q) + E log ch(βz√q + h).
Then extend the arguments of Section 1.5 to get convinced that one should
have
E log ⟨exp((β/√(N(N+1))) Σ_{i<j≤N+1} g'_{ij} σ_i σ_j)⟩ ≈ (β^2/4)(1 − q^2).
The rigorous computation of the last two terms of (1.176) uses suitable in-
terpolations. This takes about three pages. In case the reader finds the detail
of the arguments tedious, she can simply skip them; they are not important
for the sequel. Let us consider a (well behaved) function f and for σ ∈ ΣN ,
let us consider a number w_σ > 0. For x = (x_σ), let us consider the function
F(x) = log Σ_σ w_σ f(x_σ).
(d/dt) E F(u(t)) = E (1/D(u(t))) Σ_σ w_σ U(σ,σ) f''(u_σ(t)) (1.177)
− E (1/D(u(t))^2) Σ_{σ^1,σ^2} w_{σ^1} w_{σ^2} U(σ^1,σ^2) f'(u_{σ^1}(t)) f'(u_{σ^2}(t)),
where D(u(t)) = Σ_σ w_σ f(u_σ(t)). Let us now consider the average ⟨·⟩ for
the Gibbs measure with Hamiltonian −H(σ) = log wσ . Then (1.177) simply
means that
d U (σ, σ)f (uσ (t))
E log f (uσ (t)) = E
dt f (uσ (t))
U (σ 1 , σ 2 )f (uσ1 (t))f (uσ2 (t))
−E . (1.178)
f (uσ (t)) 2
so that
U(σ^1,σ^2) = (1/2)(E u_{σ^1}u_{σ^2} − E v_{σ^1}v_{σ^2}) = (β^2/2)(R_{1,2} − q).
Let us now define
where the bracket means that the function σ → ch(uσ (t)+hN +1 ) is averaged
for the Gibbs measure.
Let us apply the formula (1.178) to the case where f (x) = ch(x + hN +1 )
and wσ = exp(−HN (σ)), at a given realization of hN +1 and HN . (These
Let us set
R_{1,2} = R_{1,2}(τ^1, τ^2) = (1/N) Σ_{i≤N+1} σ_i^1 σ_i^2,
and thus
U(τ^1,τ^2) = (1/2)(E u_{τ^1}u_{τ^2} − E v_{τ^1}v_{τ^2}) = (β^2/4)((N/(N+1)) R_{1,2}^2 − q^2 − 1/N).
We choose w_τ = exp(−H_{N+1,β}(τ)), and we define
φ(t) = E log ⟨exp u_τ(t)⟩.
Since
U(τ,τ) = (β^2/4)(1 − q^2),
we obtain
|φ'(t) − (β^2/4)(1 − q^2)| ≤ E [⟨|R_{1,2}^2 − q^2| exp(u_{τ^1}(t) + u_{τ^2}(t))⟩ / ⟨exp u_τ(t)⟩^2].
Now R_{1,2} ≤ (N+1)/N ≤ 2 and |q| ≤ 1, so that
|R_{1,2}^2 − q^2| ≤ 3|R_{1,2} − q| + 2/N
and therefore
|φ'(t) − (β^2/4)(1 − q^2)| ≤ K/N + 3E [⟨|R_{1,2} − q| exp(u_{τ^1}(t) + u_{τ^2}(t))⟩ / ⟨exp u_τ(t)⟩^2]. (1.182)
To finish the proof it suffices to bound this last term by Kν(|R_{1,2} − q|), since then (1.182) gives, since φ(0) = 0,
|E log ⟨exp((β/√(N(N+1))) Σ_{1≤i<j≤N+1} g'_{ij} σ_i σ_j)⟩ − (β^2/4)(1 − q^2)|
= |φ(1) − φ(0) − (β^2/4)(1 − q^2)| ≤ K/N + Kν(|R_{1,2} − q|),
and combining with (1.181) and (1.176) finishes the proof.
To bound the last term of (1.182), we consider the function
ψ(t) = 3E [⟨|R_{1,2} − q| exp(u_{τ^1}(t) + u_{τ^2}(t))⟩ / ⟨exp u_τ(t)⟩^2].
Let us set w_τ = exp(−H_{N+1,β}(τ)), and consider the Hamiltonian −H_t(τ) = log w_τ + u_τ(t). Denoting by ⟨·⟩_t an average for this Hamiltonian, we have
1.7 Gibbs’ Measure; the TAP Equations 67
ψ(t) = 3E ⟨|R_{1,2} − q|⟩_t.
We compute ψ'(t) by Lemma 1.4.2, used for N+1 rather than N. The exact expression thus obtained is not important. What matters here is that using the bound |U(τ^1, τ^2)| ≤ K, we find an inequality
|ψ'(t)| ≤ Kψ(t),
and by integration this shows that ψ(t) ≤ Kψ(1). Denoting by ⟨·⟩_+ an average for the Gibbs measure with Hamiltonian H_{N+1,β_+} and using (1.174) we then observe the identity
ψ(1) = 3E ⟨|R_{1,2} − q|⟩_1 = 3E ⟨|R_{1,2} − q|⟩_+.
Next, using the cavity method for the Hamiltonian −H_{N+1,β_+} we obtain
E ⟨|R_{1,2} − q|⟩_+
= E [⟨Av |R_{1,2} − q| exp(σ_{N+1}((β/√N) Σ_{i≤N} g_{i,N+1} σ_i + h_{N+1}))⟩ / ⟨Av exp(σ_{N+1}((β/√N) Σ_{i≤N} g_{i,N+1} σ_i + h_{N+1}))⟩]
≤ 2 E ⟨|R_{1,2} − q| Av exp(σ_{N+1}((β/√N) Σ_{i≤N} g_{i,N+1} σ_i + h_{N+1}))⟩,
Exercise 1.7.2. Assume that (1.132) and Theorem 1.7.1 hold. Prove that ν((R_{1,2} − q)^2) ≤ K/N. (Hint: replace R_{1,2} by its value, expand and use symmetry between sites.)
Research Problem 1.7.3. Find approximation results when n = n(N ) →
∞. (The level of the problem might depend upon how much you ask for.)
We recall the notation ρ = (σ1 , . . . , σN −1 ), and we consider the Hamilto-
nian
−H_{N−1}(ρ) = (β/√N) Σ_{i<j≤N−1} g_{ij} σ_i σ_j + h Σ_{i≤N−1} σ_i. (1.184)
E(⟨σ_1⟩ − ⟨σ_1⟩_−)^2 ≤ K/N. (1.186)
We will prove this at the end of the section as a consequence of a general
principle (Theorem 1.7.11 below), but we can explain right now why (1.185)
is true. The cavity method (i.e. (1.145)) implies
⟨σ_N⟩ = ⟨sh((β/√N) Σ_{i≤N−1} g_i σ_i + h)⟩_− / ⟨ch((β/√N) Σ_{i≤N−1} g_i σ_i + h)⟩_−.
As we have seen in (1.137), under ⟨·⟩_−, the cavity field (1/√N) Σ_{i≤N−1} g_i σ_i is approximately Gaussian with mean (1/√N) Σ_{i≤N−1} g_i ⟨σ_i⟩_− and variance 1 − q; and if z is a Gaussian r.v. with expectation μ and arbitrary variance, one has
E sh z / E ch z = th μ.
Relation (1.185) is rather fundamental. Not only is it the key ingredient to
Theorem 1.7.1, but it is also at the root of the Thouless-Anderson-Palmer
(TAP) equations that are stated in (1.192) below.
then
q'(β) = (∂F/∂β)(β, q(β)) / (1 − (∂F/∂q)(β, q(β))).
E(q − ⟨R⁻_{1,2}⟩_−)^2 = E(⟨R⁻_{1,2} − q⟩_−)^2 ≤ E⟨(R⁻_{1,2} − q)^2⟩_−. (1.189)
Using (1.89) for the (N−1)-spin system yields that E⟨(R⁻_{1,2} − q_−)^2⟩_− ≤ K/N, and (1.187) then implies that E⟨(R⁻_{1,2} − q)^2⟩_− ≤ K/N.
Given n and β0 < 1/2, there exists a number K(n, β0 ) such that for
β ≤ β0 and any N one can find r.v.s (zi )i≤n , depending only on the r.v.s
(gij )1≤i<j≤N such that
E Σ_{i≤n} (⟨σ_i⟩ − th(βz_i√q + h))^2 ≤ K(n,β_0)/N. (1.190)
The reader notices that we assume that the r.v.s (zi )i≤n are functions of the
variables (gij )i<j≤N as part of the induction hypothesis. That this induction
hypothesis is true for n = 1 follows from Lemma 1.7.6, exchanging the sites
1 and N . For the induction step from n to n + 1, we apply the induction
(√q_− − √q)^2 ≤ |q_− − q| ≤ L/N
by (1.187), so that
(β_−√q_− − β√q)^2 ≤ L/N
and, since |th x − th y| ≤ |x − y|, this implies
E(th(β_− z_i √q_− + h) − th(β z_i √q + h))^2 ≤ E z_i^2 (β_−√q_− − β√q)^2 ≤ L/N.
Combining with (1.186) and (1.191) we obtain
E Σ_{i≤n} (⟨σ_i⟩ − th(β z_i √q + h))^2 ≤ K/N,
Corollary 1.7.8. For any β < 1/2, any h, and any ε > 0 we have
E max_{i≤N} |⟨σ_i⟩ − th((β/√N) Σ_{j≠i} g_{ij} ⟨σ_j⟩ + h − β^2(1 − q)⟨σ_i⟩)| ≤ K(β,ε)/N^{1/2−ε}. (1.194)
Proof. Let
Δ_i = ⟨σ_i⟩ − th((β/√N) Σ_{j≠i} g_{ij} ⟨σ_j⟩ + h − β^2(1 − q)⟨σ_i⟩), (1.195)
so that by (1.193) and symmetry between sites we have EΔ_i^{2k} ≤ K(β,k)N^{−k} and
E max_{i≤N} |Δ_i|^{2k} ≤ Σ_{i≤N} EΔ_i^{2k} ≤ K(β,k)/N^{k−1},
so
E max_{i≤N} |Δ_i| ≤ (E max_{i≤N} |Δ_i|^{2k})^{1/2k} ≤ K(β,k)/N^{1/2−1/2k},
and taking k with 2k ≥ 1/ε concludes the proof.
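The TAP equations behind (1.194) can be explored numerically. The sketch below is not from the book: N, β and h are arbitrary small values, the exact magnetizations ⟨σ_i⟩ of a tiny SK system are computed by enumeration, and a damped fixed-point iteration of m_i = th((β/√N) Σ_{j≠i} g_{ij} m_j + h − β²(1−q)m_i) is used, with q = N^{-1} Σ m_i² as a finite-N substitute for the solution of (1.74):

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, beta, h = 10, 0.25, 0.4
g = rng.standard_normal((N, N))
g = np.triu(g, 1)
g = g + g.T                        # symmetric couplings g_ij, zero diagonal

# exact magnetizations <sigma_i> by enumerating the 2^N configurations
confs = np.array(list(itertools.product([-1, 1], repeat=N)))
logw = (beta / np.sqrt(N)) * np.einsum('ci,ij,cj->c', confs, g, confs) / 2 + h * confs.sum(axis=1)
w = np.exp(logw - logw.max())
w /= w.sum()
m_exact = w @ confs

# damped iteration of the TAP equations
m = np.zeros(N)
for _ in range(2000):
    q = np.mean(m ** 2)
    m = 0.5 * m + 0.5 * np.tanh((beta / np.sqrt(N)) * (g @ m) + h - beta ** 2 * (1 - q) * m)

residual = np.max(np.abs(m - np.tanh((beta / np.sqrt(N)) * (g @ m) + h
                                     - beta ** 2 * (1 - np.mean(m ** 2)) * m)))
print("fixed-point residual:", residual)
print("max |m_TAP - m_exact|:", np.max(np.abs(m - m_exact)))
```

At this high temperature the iteration converges and the TAP magnetizations track the exact ones closely, in line with the K/N^{1/2−ε} rate above.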
Research Problem 1.7.9. (Level 1+ ) Is it true that for some K that does
not depend on N one has
E exp(N Δ_N^2 / K) ≤ 2 ?
Research Problem 1.7.10. (Level 1+) Is it true that the r.v. √N Δ_N converges in law to a Gaussian limit?
These are good problems. Even though the SK model is well under control
for β < 1/2, matters seem rather complicated here; that is, until one finds a
good way to look at them.
We turn to the general principle on which much of the section relies.
Let us consider a standard Gaussian r.v. ξ. Let us remind the reader that
throughout the book we denote by Eξ expectation in the r.v. ξ only, that is,
when all other r.v.s are given.
Theorem 1.7.11. Assume β < 1/2. Consider a function U on R, which is
infinitely differentiable. Assume that for all numbers ℓ and k, for any Gaussian r.v. z, we have
E|U^{(ℓ)}(z)|^k < ∞. (1.196)
Consider independent standard Gaussian r.v.s yi and ξ, which are indepen-
dent of the randomness of · . Then, using the notation σ̇i = σi − σi , for
each k we have
E(⟨U((1/√N) Σ_{i≤N} y_i σ̇_i)⟩ − E_ξ U(ξ√(1 − q)))^{2k} ≤ K/N^k, (1.197)
V(x) = U(x) − E_ξ U(ξ√(1 − q)),
so that
E_ξ V(ξ√(1 − q)) = 0. (1.199)
Using replicas, and defining Ṡ^ℓ = N^{−1/2} Σ_{i≤N} y_i σ̇_i^ℓ, the left-hand side of (1.197) is
E(⟨V(Ṡ^1)⟩)^{2k} = E⟨Π_{ℓ≤2k} V(Ṡ^ℓ)⟩. (1.200)
so that the quantity (1.200) is φ(1). To prove the theorem, it suffices to prove that φ^{(r)}(0) = 0 for r < 2k and that |φ^{(2k)}(t)| ≤ KN^{−k}.
For x = (x_ℓ)_{ℓ≤2k}, let us consider the function F(x) given by F(x) = Π_{ℓ≤2k} V(x_ℓ). Let us define X_t = (X_ℓ)_{ℓ≤2k} for X_ℓ = √t Ṡ^ℓ + √(1−t) ξ_ℓ √(1 − q). With this notation we have φ(t) = E⟨F(X_t)⟩. We observe that
This implies that when we consider a non-zero term in the sum (1.204), each number ℓ ≤ 2k occurs at least one time in the list ℓ_1, ℓ_1', ℓ_2, ℓ_2', ..., ℓ_r, ℓ_r'. Let us assume this is the case. We also observe that for ℓ ≠ ℓ' the averages of T_{ℓ,ℓ'} over σ^ℓ and over σ^{ℓ'} are both zero. It follows that when
where K does not depend on N and, given k and h, K stays bounded with
β ≤ β0 .
Proof. To prove (1.206) we use (1.197) with U(x) = exp εβx to get
EA^{4k} ≤ K/N^{2k},
where
A = ⟨exp((εβ/√N) Σ_{i≤N} y_i σ̇_i)⟩ − exp((β^2/2)(1 − q)).
Now, if B = exp((εβ/√N) Σ_{i≤N} y_i ⟨σ_i⟩), (A.6) entails that EB^{4k} ≤ K^k, and therefore
E(AB)^{2k} ≤ (EA^{4k})^{1/2} (EB^{4k})^{1/2} ≤ K/N^k.
This proves (1.206).
We proceed similarly for (1.207), using now U (x) = x exp εβx, noting that
then (using Gaussian integration by parts and (A.6))
The reason why K remains bounded for β ≤ β0 is simply that all the estimates
are uniform over that range.
Lemma 1.7.14. If |A'| ≤ B' and B ≥ 1 we have
|A/B − A'/B'| ≤ |A − A'| + |B − B'|.
Proof. We write
A/B − A'/B' = (A/B − A'/B) + (A'/B − A'/B') = (1/B)(A − A') + (A'/B')((B' − B)/B),
and the result is then obvious.
Corollary 1.7.15. Let
E = exp((εβ/√N) Σ_{i≤N} y_i σ_i + εh). (1.208)
Then
E((1/√N) Σ_{i≤N} y_i ⟨σ_i Av E⟩/⟨Av E⟩ − β(1 − q) ⟨Av εE⟩/⟨Av E⟩ − (1/√N) Σ_{i≤N} y_i ⟨σ_i⟩)^{2k} ≤ K/N^k. (1.210)
Proof. Defining
A(ε) = exp((εβ/√N) Σ_{i≤N} y_i σ_i) − exp((β^2/2)(1 − q)) exp((εβ/√N) Σ_{i≤N} y_i ⟨σ_i⟩),
and
E(⟨Av E⟩ − ch((β/√N) Σ_{i≤N} y_i ⟨σ_i⟩ + h) exp((β^2/2)(1 − q)))^{2k} ≤ K/N^k, (1.211)
from which (1.209) follows using Lemma 1.7.14. From (1.207) we obtain by the same method
E(⟨(1/√N) Σ_{i≤N} y_i σ̇_i Av E⟩ − β(1 − q) exp((β^2/2)(1 − q)) sh((β/√N) Σ_{i≤N} y_i ⟨σ_i⟩ + h))^{2k} ≤ K/N^k.
Combining with (1.211) and Lemma 1.7.14 we get
E(⟨(1/√N) Σ_{i≤N} y_i σ̇_i Av E⟩/⟨Av E⟩ − β(1 − q) th((β/√N) Σ_{i≤N} y_i ⟨σ_i⟩ + h))^{2k} ≤ K/N^k.
Since
⟨(1/√N) Σ_{i≤N} y_i σ̇_i Av E⟩/⟨Av E⟩ = (1/√N) Σ_{i≤N} y_i ⟨σ_i Av E⟩/⟨Av E⟩ − (1/√N) Σ_{i≤N} y_i ⟨σ_i⟩,
E((β/√N) Σ_{i≤N−1} g_i ⟨σ_i⟩ − β^2(1 − q)⟨σ_N⟩ − (β/√N) Σ_{i≤N−1} g_i ⟨σ_i⟩_−)^{2k} ≤ K/N^k.
Therefore if
A = (β/√N) Σ_{i≤N−1} g_i ⟨σ_i⟩_− + h
B = (β/√N) Σ_{i≤N−1} g_i ⟨σ_i⟩ − β^2(1 − q)⟨σ_N⟩ + h,
we have
E(th A − th B)^{2k} ≤ E(A − B)^{2k} ≤ K/N^k,
and combining with (1.212) this yields (1.193).
Proof of Lemma 1.7.4. Since (1.185) follows from (1.212), it remains only
to prove (1.186). Recalling (1.208), it suffices to prove that
E(⟨σ̇_1 Av E⟩/⟨Av E⟩)^2 ≤ K/N. (1.213)
Indeed, we have
⟨σ̇_1 Av E⟩/⟨Av E⟩ = ⟨σ_1 Av E⟩/⟨Av E⟩ − ⟨σ_1⟩.
Using (1.213) for the (N−1)-spin system, and noticing that then by the cavity method the right-hand side is ⟨σ_1⟩ − ⟨σ_1⟩_−, we obtain (1.186). Thus it suffices to prove that
E⟨σ̇_1 Av E⟩^2 ≤ K/N.
Let us define
E_ℓ = exp(ε_ℓ((β/√N) Σ_{i≤N} y_i σ_i^ℓ + h)),
so that using replicas
⟨σ̇_1 Av E⟩^2 = ⟨σ̇_1^1 σ̇_1^2 Av E_1 Av E_2⟩ = ⟨σ̇_1^1 σ̇_1^2 Av E_1 E_2⟩,
and, using symmetry between sites,
E⟨σ̇_1^1 σ̇_1^2 Av E_1 E_2⟩ = (1/N) Σ_{i≤N} E⟨σ̇_i^1 σ̇_i^2 Av E_1 E_2⟩.
where either f(x) = ch(β^2 x) or f(x) = sh(β^2 x), and where the equality follows from the fact that ⟨σ̇_i^1 σ̇_i^2⟩ = 0. Since β ≤ 1 we have |f(x) − f(q)| ≤ L|x − q| and thus
E|⟨(1/N) Σ_{i≤N} σ̇_i^1 σ̇_i^2 (f(R_{1,2}) − f(q))⟩| ≤ L E⟨|(1/N) Σ_{i≤N} σ̇_i^1 σ̇_i^2| |R_{1,2} − q|⟩.
Now, each of the factors on the right "contributes as 1/√N". This is seen by using the Cauchy-Schwarz inequality, (1.89) and the fact that by Jensen's inequality (1.23) and (1.89) again we have
E⟨(N^{−1} Σ_{i≤N} σ̇_i^1 σ̇_i^2)^2⟩ ≤ K/N.
When β < 1/2 we know that ν(|R_{1,2} − q|^k) is small; but, as we see later,
there is some point in making the computation for each value of β. There
are two aspects in the computation; one is to get the correct error terms,
which is very simple; the other is to perform the algebra, and this runs into
(algebraic!) complications. Before we start the computation itself, we explain
its mechanism (which will be used again and again). This will occupy the
next page and a half.
To lighten notation in the argument we denote by R any quantity such that
|R| ≤ K(1/N^{3/2} + ν(|R⁻_{1,2} − q|^3)), (1.217)
where K does not depend on N. Using the inequality xy ≤ x^{3/2} + y^3 for x, y ≥ 0 we observe first that
(1/N) ν(|R⁻_{1,2} − q|) = R. (1.218)
We start the computation of ν((R_{1,2} − q)^2) as usual, recalling the notation ε_ℓ = σ_N^ℓ and writing f = (ε_1ε_2 − q)(R_{1,2} − q), f∼ = (ε_1ε_2 − q)(R⁻_{1,2} − q), so that
ν((R_{1,2} − q)^2) = ν(f) = ν(f∼) + (1/N) ν(1 − ε_1ε_2 q). (1.219)
ν(f∼) = ν_0(f∼) + R,
where
b(ℓ, ℓ') = β^2 ν_0(ε_ℓ ε_{ℓ'} (ε_1ε_2 − q)).
Next, we apply (1.215) to f = (R⁻_{ℓ,ℓ'} − q)(R⁻_{1,2} − q), this time with τ_1 = 3/2 and τ_2 = 3.
Using the formula R⁻_{ℓ,ℓ'} = R_{ℓ,ℓ'} − ε_ℓε_{ℓ'}/N, we obtain (using (1.218))
ν((R⁻_{ℓ,ℓ'} − q)(R⁻_{1,2} − q)) = ν((R_{ℓ,ℓ'} − q)(R_{1,2} − q)) + R. (1.223)
Because of the symmetry between replicas the quantity ν((R_{ℓ,ℓ'} − q)(R_{1,2} − q)) can take only 3 values, namely
U = ν((R_{1,2} − q)^2); (1.224)
V = ν((R_{1,2} − q)(R_{1,3} − q)); (1.225)
W = ν((R_{1,2} − q)(R_{3,4} − q)). (1.226)
V = ν(f∼) + (1/N) ν((ε_1ε_2 − q)ε_1ε_3) (1.228)
W = ν((ε_1ε_2 − q)(R⁻_{3,4} − q)) + (1/N) ν((ε_1ε_2 − q)ε_3ε_4),
where now f∼ = (ε_1ε_2 − q)(R⁻_{1,3} − q) and we proceed as above. In this manner, we get a system of 3 linear equations in U, V, W, the solution of which yields the values of these quantities (at least in the case β < 1/2, where we know that |R| ≤ KN^{−3/2}).
Having finished sketching the method of proof, we now turn to the computation of the actual coefficients in (1.227). It is convenient to consider the quantity
q̂ = ν_0(ε_1ε_2ε_3ε_4) = E th^4 Y = E th^4(βz√q + h). (1.229)
Let (using Lemma 1.6.2)
b(ℓ, ℓ'; x, y) = β^2 ν_0(ε_ℓ ε_{ℓ'} (ε_x ε_y − q)).
Then
ν_0'((ε_x ε_y − q)f⁻) = Σ_{1≤ℓ<ℓ'≤n} b(ℓ, ℓ'; x, y) ν_0(f⁻(R⁻_{ℓ,ℓ'} − q))
− n Σ_{ℓ≤n} b(ℓ, n+1; x, y) ν_0(f⁻(R⁻_{ℓ,n+1} − q))
+ (n(n+1)/2) b(0) ν_0(f⁻(R⁻_{n+1,n+2} − q)). (1.230)
This is of course an immediate consequence of (1.151), Lemma 1.6.2, and the definition of b(ℓ, ℓ'; x, y). The reason why we bring this formula forward is that it contains the entire algebraic structure of our calculations. In particular these calculations will hold for other models provided (1.230) is true (possibly with different values of b(0), b(1) and b(2)). Let us also note that b(0) = b(n+1, n+2; x, y).
Using (1.230) with f⁻ = R⁻_{1,2} − q and n = 2 yields
ν_0'((ε_1ε_2 − q)(R⁻_{1,2} − q)) = b(2) ν_0((R⁻_{1,2} − q)^2)
− 2b(1) Σ_{ℓ≤2} ν_0((R⁻_{1,2} − q)(R⁻_{ℓ,3} − q))
+ 3b(0) ν_0((R⁻_{1,2} − q)(R⁻_{3,4} − q)),
so that going back to (1.221) and recalling the definitions (1.224) to (1.226)
and (1.223) we can fill the coefficients in (1.227):
U = (1 − q^2)/N + b(2)U − 4b(1)V + 3b(0)W + R. (1.231)
To treat the situation (1.228) we use (1.230) with n = 3 and f⁻ = R⁻_{1,3} − q. One needs to be patient in counting how many terms of each type there are; one gets the relation
V = (q − q^2)/N + b(1)U + (b(2) − 2b(1) − 3b(0))V + (6b(0) − 3b(1))W + R (1.232)
and similarly
W = (q̂ − q^2)/N + b(0)U + (4b(1) − 8b(0))V + (b(2) − 8b(1) + 10b(0))W + R. (1.233)
Of course, this is not as simple as one might wish. This brings forward
the matrix
⎛ b(2)   −4b(1)                3b(0)                  ⎞
⎜ b(1)   b(2) − 2b(1) − 3b(0)   6b(0) − 3b(1)          ⎟ . (1.234)
⎝ b(0)   4b(1) − 8b(0)          b(2) − 8b(1) + 10b(0)  ⎠
Rather amazingly, the transpose of this matrix has eigenvectors (1, −2, 1) and
(1, −4, 3) with eigenvalues respectively
b(2) − 2b(1) + b(0) = β^2(1 − 2q + q̂) (1.235)
b(2) − 4b(1) + 3b(0) = β^2(1 − 4q + 3q̂). (1.236)
The second eigenvalue has multiplicity 2, but this multiplicity appears in the
form of a two-dimensional Jordan block so that the corresponding eigenspace
has dimension 1. The amazing point is of course that the eigenvectors do not
depend on the specific values of b(0), b(1), b(2). Not surprisingly the quantities
(1.235) and (1.236) will occur in many formulas.
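The claim about the transpose of the matrix (1.234) does not depend on the specific values of b(0), b(1), b(2), and is easy to verify mechanically for arbitrary numbers:

```python
import numpy as np

b0, b1, b2 = 0.17, 0.29, 0.53     # arbitrary values of b(0), b(1), b(2)
M = np.array([[b2, -4 * b1, 3 * b0],
              [b1, b2 - 2 * b1 - 3 * b0, 6 * b0 - 3 * b1],
              [b0, 4 * b1 - 8 * b0, b2 - 8 * b1 + 10 * b0]])
v1 = np.array([1.0, -2.0, 1.0])
v2 = np.array([1.0, -4.0, 3.0])
lam1 = b2 - 2 * b1 + b0           # eigenvalue (1.235)
lam2 = b2 - 4 * b1 + 3 * b0       # eigenvalue (1.236)
assert np.allclose(M.T @ v1, lam1 * v1)
assert np.allclose(M.T @ v2, lam2 * v2)
```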
Using eigenvectors is certainly superior to brute force in solving a sys-
tem of linear equations, so one should start the computation of U, V, W by
computing first U − 2V + W . There is more however to (1.230) than the
matrix (1.234). This will become much more apparent later in Section 1.10.
The author cannot help feeling that there is some simple underlying alge-
braic structure, probably in the form of an operator between two rather large
spaces.
Research Problem 1.8.3. (Level 2) Clarify the algebraic structure under-
lying (1.230).
Even without solving this problem, the idea of eigenvectors gives the feeling
that matters will simplify considerably if one considers well-chosen combina-
tions of (1.230) for various values of x and y, such as the following, which
brings out the value (1.235).
1.8 Second Moment Computations and the Almeida-Thouless Line 85
ν_0'((ε_1 − ε_2)(ε_3 − ε_4)f⁻)
= (b(2) − 2b(1) + b(0)) ν_0(f⁻(R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4}))
= β^2(1 − 2q + q̂) ν_0(f⁻(R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4})). (1.237)
Proof. The magic here lies in the cancellation of most of the terms in the sums Σ_{1≤ℓ<ℓ'≤n} and Σ_{ℓ≤n} coming from (1.230). We use (1.230) four times for x = 1, 2 and y = 3, 4 and we compute
c(ℓ, ℓ') = b(ℓ, ℓ'; 1, 3) − b(ℓ, ℓ'; 1, 4) − b(ℓ, ℓ'; 2, 3) + b(ℓ, ℓ'; 2, 4).
We see that this is zero except in the following cases:
c(1, 3) = c(2, 4) = −c(1, 4) = −c(2, 3) = b(2) − 2b(1) + b(0).
"Rectangular sums" such as R_{1,3} − R_{1,4} − R_{2,3} + R_{2,4} or R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4} will occur frequently.
Now that we have convinced the reader that the error terms in our com-
putation are actually of the type (1.217) we will for simplicity assume that
β < 1/2, in which case the error terms are O(3), where we recall that O(k)
means a quantity A such that |A| ≤ KN −k/2 where K does not depend on
N.
We will continue the computation of U, V, W later, but to immediately
make the point that (1.237) simplifies the algebra we prove the following,
where we recall that “·” denotes the dot product in RN , so that σ 1 · σ 2 =
N R1,2 . It is worth making the effort to fully understand the mechanism of
the next result, which is a prototype for many of the later calculations.
Proposition 1.8.5. If β < 1/2 we have
ν(((σ^1 − σ^2)·(σ^3 − σ^4)/N)^2) = 4(1 − 2q + q̂)/(N(1 − β^2(1 − 2q + q̂))) + O(3). (1.238)
Proof. Let a_i = (σ_i^1 − σ_i^2)(σ_i^3 − σ_i^4), so that
(σ^1 − σ^2)·(σ^3 − σ^4)/N = R_{1,3} − R_{1,4} − R_{2,3} + R_{2,4} = (1/N) Σ_{i≤N} a_i. (1.239)
Moreover
ν((ε_1 − ε_2)(ε_3 − ε_4)f) = ν((ε_1 − ε_2)(ε_3 − ε_4)f⁻) + (1/N) ν(((ε_1 − ε_2)(ε_3 − ε_4))^2), (1.241)
where f⁻ = R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4}. First we observe that
ν_0(((ε_1 − ε_2)(ε_3 − ε_4))^2) = 4ν_0((1 − ε_1ε_2)(1 − ε_3ε_4)) = 4(1 − 2q + q̂). (1.242)
ν_0'(f*) = ν_0'((ε_1 − ε_2)(ε_3 − ε_4)f⁻) = β^2(1 − 2q + q̂) ν_0((R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4})^2).
Next, we observe that ν((R⁻_{1,2} − q)^4) = O(4), so that ν((R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4})^4) = O(4), and we apply (1.216) with e.g. τ_1 = τ_2 = 2 to obtain
ν_0((R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4})^2) = ν((R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4})^2) + O(3).
We then use the relation R⁻_{ℓ,ℓ'} = R_{ℓ,ℓ'} − ε_ℓε_{ℓ'}/N and expansion to get
ν((R⁻_{1,3} − R⁻_{1,4} − R⁻_{2,3} + R⁻_{2,4})^2) = ν((R_{1,3} − R_{1,4} − R_{2,3} + R_{2,4})^2) + O(3).
β^2(1 − 2q + q̂) = β^2 E(1/ch^4 Y) < 1, (1.243)
i.e. unless we are on the high-temperature side of the AT line (1.214), a point
to which we will return in the next section.
E(⟨σ_1σ_2⟩ − ⟨σ_1⟩⟨σ_2⟩)^2 = β^2(1 − 2q + q̂)^2/(N(1 − β^2(1 − 2q + q̂))) + O(3). (1.244)
and
T_{ℓ,ℓ'} = (1/N) Σ_{i≤N} σ̇_i^ℓ σ̇_i^{ℓ'} ; T_ℓ = (1/N) Σ_{i≤N} σ̇_i^ℓ ⟨σ_i⟩ ; T = (1/N) Σ_{i≤N} ⟨σ_i⟩^2 − q.
where
A^2 = (1 − 2q + q̂)/(N(1 − β^2(1 − 2q + q̂))). (1.248)
T_{1,2}^2 = ((σ^1 − b)·(σ^2 − b)/N) ((σ^1 − b)·(σ^2 − b)/N)
= ⟨((σ^1 − σ^3)·(σ^2 − σ^4)/N) ((σ^1 − σ^5)·(σ^2 − σ^6)/N)⟩,
the bracket averaging the replicas σ^3, σ^4, σ^5, σ^6.
ν(T_{1,2}^2) = (1/N) ν((ε_1 − ε_2)(ε_3 − ε_4)(ε_1 − ε_5)(ε_3 − ε_6)) + ν((ε_1 − ε_2)(ε_3 − ε_4)f⁻),
for f⁻ = R⁻_{1,3} − R⁻_{1,6} − R⁻_{5,3} + R⁻_{5,6}. We observe that
so that
ν_0((ε_1 − ε_2)(ε_3 − ε_4)(ε_1 − ε_5)(ε_3 − ε_6)) = 1 − 2q + q̂.
− −
ν0 ((ε1 − ε2 )ε3 f − ) = (b(2) − b(1))ν0 (f − (R1,3 − R2,3 )) (1.253)
− − − − − −
+ (b(1) − b(0)) ν0 (f (R1, − R2, )) − nν0 (f (R1,n+1 − R2,n+1 )) .
4≤≤n
ν0 ((ε1 − ε2 )ε3 f − )
− −
= (b(2) − 4b(1) + 3b(0))ν0 (f − (R1,3 − R2,3 ))
− − − − −
+ (b(1) − b(0)) ν0 (f (R1, − R2, − R1,n+1 + R2,n+1 ))
4≤≤n
− −
= β (1 − 4q + 3
2
q )ν0 (f − (R1,3 − R2,3 ))
− − − −
+ β 2 (q − q) ν0 (f − (R1, − R2, − R1,n+1 + R2,n+1 )) . (1.254)
4≤≤n
90 1. The Sherrington-Kirkpatrick Model
This is $0$ if $\ell \ge 3$, or if $\ell = 1$, $\ell' = 2$. This proves (1.253). To prove (1.254) when $f^-$ does not depend on the third replica we simply notice that then
$$\nu_0\big(f^-(R^-_{1,3}-R^-_{2,3})\big) = \nu_0\big(f^-(R^-_{1,n+1}-R^-_{2,n+1})\big) \ ,$$
where
$$B^2 = \frac{q-\hat q}{N(1-\beta^2(1-2q+\hat q))(1-\beta^2(1-4q+3\hat q))} \ . \qquad (1.256)$$
Moreover,
$$\nu(T_1 T_2) = \nu(T_1 T) = 0 \ . \qquad (1.257)$$
$$\nu(T_1^2) = \frac{1}{N}(q-\hat q) + \beta^2(1-4q+3\hat q)\,\nu\big((R_{1,3}-R_{2,3})(R_{1,5}-R_{4,5})\big) \qquad (1.259)$$
$$\quad + \beta^2(q-\hat q)\sum_{\ell\neq 4,5}\nu\big((R_{1,5}-R_{4,5})(R_{1,\ell}-R_{2,\ell}-R_{1,6}+R_{2,6})\big) + O(3) \ .$$
where
$$(1-\beta^2(1-4q+3\hat q))\,C^2 = \frac{\hat q-q^2}{N} + \beta^2(\hat q-q^2)A^2 + 2\beta^2(2q+q^2-3\hat q)B^2 \ . \qquad (1.262)$$
Proof. We know exactly how to proceed. We write
where the index M refers to an M -spin system. This fact, however, relies
on an extension of Theorem 1.3.7, and, like this theorem, uses very special
properties of the SK model.
Research Problem 1.9.1. (Level 2) Prove that beyond the AT line we have
in fact for each N that
$$\nu\big(|R_{1,2}-q|\big) > \delta \ . \qquad (1.268)$$
As we will explain later, we know, with considerable work, how to deduce (1.268) from (1.267) in many cases, for example when in the term $\sum_{i\le N} h_i\sigma_i$ of the Hamiltonian the r.v.s $h_i$ are i.i.d. Gaussian with non-zero variance; but we do not know how to prove (1.268) when $h_i = h \neq 0$.
In contrast with the previous arguments, the results of the present section
rely on a very general method, which has the potential to be used for a great
many models, and that provides results for every N . This method simply
analyzes what goes wrong in the proof of (1.238) when (1.266) occurs. The
main result is as follows.
Proposition 1.9.2. Under (1.266), there exists a number δ > 0, that does
not depend on N , such that for N large enough, we have
$$\nu\big(|R_{1,2}-q|^3\big) \ge \delta\,\nu\big((R_{1,2}-q)^2\big) \ge \frac{\delta^2}{N} \ . \qquad (1.269)$$
This is not as nice as (1.268), but this shows something remarkable: the
set where |R1,2 − q| ≥ δ/2 is not exponentially small (in contrast with what
happens in (1.87)). To see this we write, since |R1,2 − q| ≤ 2,
$$|R_{1,2}-q|^3 \le \frac{\delta}{2}(R_{1,2}-q)^2 + 8\cdot\mathbf{1}_{\{|R_{1,2}-q|\ge\delta/2\}} \ , \qquad (1.270)$$
where 1{|R1,2 −q|≥δ/2} is the function of σ 1 and σ 2 that is 1 when |R1,2 − q| ≥
δ/2 and is 0 otherwise. Using the first part of (1.269) in the first inequality
and (1.270) in the second one, we obtain
94 1. The Sherrington-Kirkpatrick Model
$$\delta\,\nu\big((R_{1,2}-q)^2\big) \le \nu\big(|R_{1,2}-q|^3\big) \le \frac{\delta}{2}\,\nu\big((R_{1,2}-q)^2\big) + 8\,\nu\big(\mathbf{1}_{\{|R_{1,2}-q|\ge\delta/2\}}\big) \ .$$
Hence, using the second part of (1.269) in the second inequality,
$$\nu\big(\mathbf{1}_{\{|R_{1,2}-q|\ge\delta/2\}}\big) \ge \frac{\delta}{16}\,\nu\big((R_{1,2}-q)^2\big) \ge \frac{\delta^2}{16N} \ . \qquad (1.271)$$
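The pointwise inequality (1.270) is elementary but worth a sanity check; the sketch below (an illustration of ours, not part of the book's argument) verifies it on a grid of values $x = R_{1,2}-q \in [-2,2]$ and a range of $\delta$:

```python
import numpy as np

# Verify |x|^3 <= (delta/2) x^2 + 8 * 1{|x| >= delta/2} for all |x| <= 2,
# where x stands for R_{1,2} - q (which always satisfies |x| <= 2).
def rhs(x, delta):
    return (delta / 2.0) * x ** 2 + 8.0 * (np.abs(x) >= delta / 2.0)

xs = np.linspace(-2.0, 2.0, 2001)
ok = all(np.all(np.abs(xs) ** 3 <= rhs(xs, d) + 1e-12)
         for d in np.linspace(0.01, 2.0, 50))
print(ok)
```

Indeed, when $|x| < \delta/2$ we have $|x|^3 \le (\delta/2)x^2$, and otherwise $|x|^3 \le 8$ since $|x| \le 2$.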
Lemma 1.9.3. For all values of $\beta$ and $h$, we have
$$\beta^2(1-4q+3\hat q) < 1 \ . \qquad (1.272)$$
and since $q = \Phi(q)$, we have $\Phi'(q) < 1$. Now, using Gaussian integration by parts, and writing as usual $Y = \beta z\sqrt q + h$,
$$\Phi'(q) = \frac{\beta}{\sqrt q}\,\mathrm{E}\,\frac{z\,\mathrm{th}\,Y}{\mathrm{ch}^2 Y} = \beta^2\,\mathrm{E}\left(\frac{1}{\mathrm{ch}^4 Y} - 2\,\frac{\mathrm{sh}^2 Y}{\mathrm{ch}^4 Y}\right) = \beta^2\,\mathrm{E}\left(\frac{3}{\mathrm{ch}^4 Y} - \frac{2}{\mathrm{ch}^2 Y}\right)$$
$$= \beta^2\big(3(1-2q+\hat q) - 2(1-q)\big) = \beta^2(1-4q+3\hat q) \ , \qquad (1.274)$$
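The integration-by-parts computation in (1.274) can be checked numerically for any $q > 0$ (not only the fixed point). Below is a sketch with illustrative values of $\beta$, $h$, $q$; Gauss–Hermite quadrature stands in for the Gaussian expectation, and $q_1$, $q_2$ play the roles of $q$ and $\hat q$:

```python
import numpy as np

# Check (beta/sqrt(q)) E[z th(Y)/ch(Y)^2] = beta^2 (1 - 4 q1 + 3 q2),
# where Y = beta z sqrt(q) + h, q1 = E th(Y)^2, q2 = E th(Y)^4,
# and z is a standard Gaussian r.v. handled by Gauss-Hermite quadrature.
nodes, weights = np.polynomial.hermite_e.hermegauss(80)
weights = weights / weights.sum()  # normalize so that E[1] = 1

beta, h, q = 0.4, 0.3, 0.25        # illustrative values only
Y = beta * np.sqrt(q) * nodes + h
th, ch = np.tanh(Y), np.cosh(Y)

lhs = (beta / np.sqrt(q)) * np.sum(weights * nodes * th / ch ** 2)
q1 = np.sum(weights * th ** 2)
q2 = np.sum(weights * th ** 4)
rhs = beta ** 2 * (1 - 4 * q1 + 3 * q2)
print(abs(lhs - rhs) < 1e-8)  # the two sides agree to quadrature accuracy
```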
where
$$|\mathcal{R}| \le K(\beta,h)\left(\frac{1}{N^{3/2}} + \nu\big(|R_{1,2}-q|^3\big)\right) \ . \qquad (1.277)$$
Proof. As explained at the beginning of Section 1.8, all the computations there are done modulo an error term as in (1.277); and (1.266) and (1.272) show that we are permitted to divide by $(1-\beta^2(1-2q+\hat q))$ and $(1-\beta^2(1-4q+3\hat q))$, so that (1.275) and (1.276) are what we actually proved in (1.238) and (1.265) respectively.
1.9 Beyond the AT Line 95
$$\mathcal{R} \ge -\frac{4(1-2q+\hat q)}{N(1-\beta^2(1-2q+\hat q))} \ge \frac{1}{KN}$$
b) Under (1.266) there exists a number $\delta' > 0$ such that for $N$ large enough
$$\forall x \ge 0 \ , \quad \nu\big(\mathbf{1}_{\{|R_{1,2}-x|\ge\delta'\}}\big) \ge \frac{\delta'}{N} \ . \qquad (1.280)$$
Proof. We use (1.215), but where $\nu_t$ is defined using $x$ rather than $q$, to get
$$|\nu(\varepsilon_1\varepsilon_2) - \nu_0(\varepsilon_1\varepsilon_2)| \le K\,\nu\big(|R^-_{1,2}-x|\big) \ . \qquad (1.281)$$
so (1.282) yields
96 1. The Sherrington-Kirkpatrick Model
$$|\Phi(x)-x| \le |\Phi(x)-\nu(R_{1,2})| + |\nu(R_{1,2})-x| \le K\left(\nu(|R_{1,2}-x|) + \frac{1}{N}\right) \ .$$
Now, the function $\Phi(x)$ satisfies $\Phi(q) = q$ and, as seen in the proof of Lemma 1.9.3, we have $\Phi'(q) < 1$, so that $|x-\Phi(x)| \ge K^{-1}|x-q|$ when $|x-q|$ is small. Since Proposition A.14.1 shows that $\Phi(x)/x$ is decreasing, it follows that $|x-\Phi(x)| \neq 0$ for $x \neq q$, so that the previous inequality holds (with a larger constant $K$) for all $x \ge 0$, and this proves (1.279).
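Since $\Phi(q) = q$ with $\Phi'(q) < 1$, the value of $q$ can be computed in practice by straightforward fixed-point iteration. Here is a minimal numerical sketch (the parameter values are illustrative only):

```python
import numpy as np

# Fixed-point iteration for the replica-symmetric equation q = Phi(q),
# with Phi(q) = E th^2(beta z sqrt(q) + h).  Since |Phi'| < 3 beta^2 < 1
# for beta < 1/sqrt(3), the plain iteration q <- Phi(q) converges.
nodes, weights = np.polynomial.hermite_e.hermegauss(60)
weights = weights / weights.sum()  # expectation against the standard Gaussian

def phi(q, beta, h):
    return np.sum(weights * np.tanh(beta * np.sqrt(q) * nodes + h) ** 2)

beta, h = 0.4, 0.3  # illustrative values only
q = 0.5
for _ in range(200):
    q = phi(q, beta, h)
print(abs(q - phi(q, beta, h)) < 1e-12)  # q now solves q = Phi(q)
```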
To prove (1.280), we observe that if |x − q| ≤ δ/4 then by (1.271) we have
$$\nu\big(\mathbf{1}_{\{|R_{1,2}-x|\ge\delta/4\}}\big) \ge \nu\big(\mathbf{1}_{\{|R_{1,2}-q|\ge\delta/2\}}\big) \ge \frac{\delta^2}{16N} \ ,$$
so it is enough to consider the case |x − q| ≥ δ/4. But then by (1.279) it holds
$$\frac{\delta}{4} \le K\left(\nu(|R_{1,2}-x|) + \frac{1}{N}\right) \ ,$$
so that for $N$ large enough we get $\nu(|R_{1,2}-x|) \ge \delta/(8K) =: 1/K_0$, and thus, since $|R_{1,2}-x| \le 2$, we obtain
$$\frac{1}{K_0} \le \nu(|R_{1,2}-x|) \le 2\,\nu\big(\mathbf{1}_{\{|R_{1,2}-x|\ge 1/(2K_0)\}}\big) + \frac{1}{2K_0} \ .$$
Consequently,
$$\nu\big(\mathbf{1}_{\{|R_{1,2}-x|\ge 1/(2K_0)\}}\big) \ge \frac{1}{4K_0} \ .$$
In the rest of this section we show that (1.280) has consequences with a
nice physical interpretation (although the underlying mathematics is elemen-
tary large deviation theory).
For this we consider the Hamiltonian
$$-H_{N,\lambda}(\sigma^1,\sigma^2) = -H_N(\sigma^1) - H_N(\sigma^2) + \lambda N R_{1,2} \ . \qquad (1.283)$$
This is the Hamiltonian of a system made from two copies of $(\Sigma_N, G_N)$ that interact through the term $\lambda N R_{1,2}$. We define
$$Z_{N,\lambda} = \sum_{\sigma^1,\sigma^2}\exp\big(-H_{N,\lambda}(\sigma^1,\sigma^2)\big) \qquad (1.284)$$
$$\psi_N(\lambda) = \frac{1}{N}\,\mathrm{E}\log Z_{N,\lambda} - \frac{1}{N}\,\mathrm{E}\log Z_{N,0} \ , \qquad (1.285)$$
so that the identity
$$\psi_N(\lambda) = \frac{1}{N}\,\mathrm{E}\log\big\langle\exp\lambda N R_{1,2}\big\rangle \qquad (1.286)$$
holds, where $\langle\cdot\rangle$ denotes an average for the Gibbs measure with Hamiltonian $H_N$. This quantity is natural to consider in order to study the fluctuations of $R_{1,2}$. We denote by $\langle\cdot\rangle_\lambda$ an average for the Gibbs measure with Hamiltonian (1.283); thus
$$\psi_N'(\lambda) = \mathrm{E}\,\frac{\big\langle R_{1,2}\exp\lambda N R_{1,2}\big\rangle}{\big\langle\exp\lambda N R_{1,2}\big\rangle} = \mathrm{E}\,\langle R_{1,2}\rangle_\lambda \ .$$
We also observe that $\psi_N$ is a convex function of $\lambda$, as is obvious from (1.286) and Hölder's inequality. (One can also compute $\psi_N''(\lambda) = N\,\mathrm{E}\big(\langle R_{1,2}^2\rangle_\lambda - \langle R_{1,2}\rangle_\lambda^2\big) \ge 0$.)
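Both the identity $\psi_N'(\lambda) = \mathrm{E}\langle R_{1,2}\rangle_\lambda$ and the convexity of $\psi_N$ are generic facts about exponential tilting, and can be illustrated on a toy overlap distribution (the numbers below are made up for illustration; this is not the SK Gibbs measure):

```python
import numpy as np

# Toy check of psi'(lambda) = <R>_lambda and of convexity, for
# psi(lambda) = (1/N) log < exp(lambda N R) >, where <.> averages a fixed
# discrete distribution over overlap values (illustrative numbers only).
N = 50
R_vals = np.array([-0.2, 0.0, 0.3, 0.6])
p = np.array([0.1, 0.4, 0.3, 0.2])

def psi(lam):
    a = lam * N * R_vals + np.log(p)
    m = a.max()                      # log-sum-exp for numerical stability
    return (m + np.log(np.exp(a - m).sum())) / N

def tilted_mean(lam):
    w = p * np.exp(lam * N * (R_vals - R_vals.max()))
    w /= w.sum()
    return np.sum(w * R_vals)

lam, eps = 0.07, 1e-6
deriv = (psi(lam + eps) - psi(lam - eps)) / (2 * eps)
print(abs(deriv - tilted_mean(lam)) < 1e-6)           # psi' = <R>_lambda
print(psi(-0.1) + psi(0.1) >= 2 * psi(0.0) - 1e-12)   # convexity
```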
Theorem 1.9.6. ψ(λ) = limN →∞ ψN (λ) exists for all β, h and (under
(1.266)) is not differentiable at λ = 0.
The important part of Theorem 1.9.6 is the non-differentiability of the
function ψ. We shall prove the following, of which Theorem 1.9.6 is an imme-
diate consequence once we know the existence of the limit limN →∞ ψN (λ).
The existence of this limit is only a side story in Theorem 1.9.6. It requires
significant work, so we refer the reader to [76] for a proof.
Proposition 1.9.7. Assume (1.266), and consider $\delta'$ as in (1.280). Then for any $\lambda > 0$ we have $\psi_N'(\lambda) - \psi_N'(-\lambda) \ge \delta'/2$ provided $N$ is large enough.
To deduce Theorem 1.9.6, consider the subset $U$ of $\mathbb{R}$ such that $\psi'(\pm\lambda)$ exists for $\lambda \in U$. Since $\psi$ is convex, the complement of $U$ is at most countable. Griffiths' lemma (see page 25) asserts that $\lim_{N\to\infty}\psi_N'(\pm\lambda) = \psi'(\pm\lambda)$ for $\lambda$ in $U$. By Proposition 1.9.7, for any $\lambda \in U$, $\lambda > 0$, we have $\psi'(\lambda) - \psi'(-\lambda) \ge \delta'/2$. Now, since $\psi$ is convex, the limit $\lim_{\lambda\to 0^+,\,\lambda\in U}\psi'(\lambda)$ is the right-derivative $\psi_+'(0)$, and similarly the limit as $\lambda\to 0^-$ is the left-derivative $\psi_-'(0)$. Therefore $\psi_+'(0) - \psi_-'(0) \ge \delta'/2$ and $\psi$ is not differentiable at $0$.
In words, an arbitrarily small change of $\lambda$ around $0$ produces a change in $\mathrm{E}\langle R_{1,2}\rangle_\lambda$ of at least $\delta'/2$, a striking instability.
Proof of Proposition 1.9.7. Let $x_N = \mathrm{E}\langle R_{1,2}\rangle = \psi_N'(0)$. Using (1.280) we see that at least one of the following occurs:
$$\nu\big(\mathbf{1}_{\{R_{1,2}\ge x_N+\delta'\}}\big) \ge \frac{\delta'}{2N} \qquad (1.287)$$
$$\nu\big(\mathbf{1}_{\{R_{1,2}\le x_N-\delta'\}}\big) \ge \frac{\delta'}{2N} \ . \qquad (1.288)$$
We assume (1.287); the proof in the case (1.288) is similar. We have
$$\big\langle\exp\lambda N R_{1,2}\big\rangle \ge \exp\big(\lambda N(x_N+\delta')\big)\,\big\langle\mathbf{1}_{\{R_{1,2}\ge x_N+\delta'\}}\big\rangle$$
so that
$$\frac{1}{N}\log\big\langle\exp\lambda N R_{1,2}\big\rangle \ge \lambda(x_N+\delta') + \frac{1}{N}\log\big\langle\mathbf{1}_{\{R_{1,2}\ge x_N+\delta'\}}\big\rangle \ .$$
The r.v. $X = \big\langle\mathbf{1}_{\{R_{1,2}\ge x_N+\delta'\}}\big\rangle$ satisfies $\mathrm{E}X \ge \delta'/2N$ by (1.287), so that, since $X \le 1$, we have
$$\mathrm{P}\left(X \ge \frac{\delta'}{4N}\right) \ge \frac{\delta'}{4N} \ .$$
(Note that $\mathrm{E}X \le \varepsilon + \mathrm{P}(X\ge\varepsilon)$ for each $\varepsilon$, and take $\varepsilon = \delta'/(4N)$.) Thus
$$\mathrm{P}\left(\frac{1}{N}\log\big\langle\exp\lambda N R_{1,2}\big\rangle \ge \lambda(x_N+\delta') + \frac{1}{N}\log\frac{\delta'}{4N}\right) \ge \frac{\delta'}{4N} \ , \qquad (1.289)$$
and in particular
$$\psi_N(\lambda) \ge \lambda\left(x_N + \frac{\delta'}{2}\right)$$
and therefore, since $\psi_N(0) = 0$ and $\psi_N$ is convex,
$$\psi_N'(\lambda) \ge \frac{\psi_N(\lambda)-\psi_N(0)}{\lambda} \ge x_N + \frac{\delta'}{2} = \psi_N'(0) + \frac{\delta'}{2} \ge \psi_N'(-\lambda) + \frac{\delta'}{2} \ .$$
Theorem 1.10.1. Assume that $\beta < 1/2$. Fix an integer $n$. For $1 \le \ell < \ell' \le n$ consider integers $k(\ell,\ell')$, and for $1 \le \ell \le n$ consider integers $k(\ell)$. Set
$$k_1 = \sum_{1\le\ell<\ell'\le n} k(\ell,\ell') \ ; \qquad k_2 = \sum_{1\le\ell\le n} k(\ell) \ ,$$
is simply any (finite) product of quantities of the type $T_{\ell,\ell'}$, $T_\ell$, $T$, and the rôle of the integer $n$ is simply to record "on how many replicas this product depends", which is needed to apply the cavity method.
One can reformulate (1.290) as follows. Consider independent Gaussian r.v.s $U_{\ell,\ell'}$, $U_\ell$, $U$ and assume
$$\mathrm{E}U_{\ell,\ell'}^2 = NA^2 \ ; \quad \mathrm{E}U_\ell^2 = NB^2 \ ; \quad \mathrm{E}U^2 = NC^2 \ .$$
because the r.v.s $U_{\ell,\ell'}$, $U_\ell$ for $\ell,\ell' \le n$ are independent of the r.v.s $U_{\ell,\ell'}$, $U_\ell$ for $n+1 \le \ell,\ell'$. Consequently,
satisfies
$$\mathrm{E}(W - \mathrm{E}W)^2 = O(1) \ ,$$
because we may use (1.295) and expand this quantity as a sum of terms of the type (1.292). Consider the r.v.s $g_{\ell,\ell'} = U_{\ell,\ell'} + U_\ell + U_{\ell'}$. Expanding and using (1.291) we see that
$$\mathrm{E}W = \mathrm{E}\prod_{\ell<\ell'} g_{\ell,\ell'}^{k(\ell,\ell')} + O(1) \ .$$
1.10 Central Limit Theorem for the Overlaps 101
The Gaussian family $(g_{\ell,\ell'})$ may also be described by the following properties:
$$\mathrm{E}g_{\ell,\ell'}^2 = N(A^2+2B^2) \ , \qquad \mathrm{E}g_{\ell,\ell'}g_{\ell_1,\ell_2} = NB^2$$
if $\mathrm{card}(\{\ell,\ell'\}\cap\{\ell_1,\ell_2\}) = 1$, and
$$\mathrm{E}g_{\ell,\ell'}g_{\ell_1,\ell_2} = 0 \quad\text{if } \{\ell,\ell'\} \text{ and } \{\ell_1,\ell_2\} \text{ are disjoint.}$$
We now prepare for the proof of Theorem 1.10.1. In the next two pages, until we start the proof itself, the letter $k$ denotes any integer. We start the work with the easy part, which is the control of the error terms. By (1.88), for each $k$ we have $\nu\big((R_{1,2}-q)^{2k}\big) \le K/N^k$, and thus
$$\nu\big(|R_{1,2}-q|^k\big) \le \frac{K}{N^{k/2}} \ .$$
Consequently,
$$\nu\big(|R^-_{1,2}-q|^k\big) \le \frac{K}{N^{k/2}} \ , \qquad (1.296)$$
and this entails a similar bound for any quantity $W$ that is a linear combination of the quantities $R^-_{\ell,\ell'} - q$, e.g. $W = R^-_{1,3} - R^-_{2,3}$.
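At $\beta = h = 0$ the bound $\nu(|R_{1,2}-q|^k) \le K/N^{k/2}$ is transparent: then $q = 0$ and $NR_{1,2}$ is a sum of $N$ i.i.d. signs. A quick Monte Carlo sketch (of this trivial non-interacting case only) exhibits the $N^{-k/2}$ scaling:

```python
import numpy as np

# Monte Carlo scaling check of E|R_{1,2}|^k ~ C / N^{k/2} at beta = 0,
# where R_{1,2} = (1/N) sum_i s1_i s2_i for independent uniform signs.
rng = np.random.default_rng(1)
k, reps = 4, 20_000

def moment(N):
    s = rng.choice([-1.0, 1.0], size=(reps, N))
    t = rng.choice([-1.0, 1.0], size=(reps, N))
    R = (s * t).mean(axis=1)
    return np.mean(np.abs(R) ** k)

ratio = moment(50) / moment(200)   # should be about (200/50)^(k/2) = 16
print(10.0 < ratio < 22.0)
```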
In order to avoid repetition, we will spell out the exact property we will use
in the proof of Theorem 1.10.1. The notation is as in Lemma 1.8.2.
Lemma 1.10.2. Consider integers $x, y \le n$ as well as a function $f^-$ on $\Sigma_N^n$ which is the product of $k$ terms of the type $R^-_{\ell,\ell'} - q$. Then the identity
$$\nu\big((\varepsilon_x\varepsilon_y - q)f^-\big) = \sum_{1\le\ell<\ell'\le n} b(\ell,\ell';x,y)\,\nu\big(f^-(R^-_{\ell,\ell'}-q)\big)$$
$$\quad - n\sum_{\ell\le n} b(\ell,n+1;x,y)\,\nu\big(f^-(R^-_{\ell,n+1}-q)\big) + \frac{n(n+1)}{2}\,b(0)\,\nu\big(f^-(R^-_{n+1,n+2}-q)\big) + O(k+2) \qquad (1.300)$$
holds.
Proof. We use (1.299) for $f = f^-(\varepsilon_x\varepsilon_y - q)$, so that $\nu_0(f) = 0$, and we use (1.230) to compute $\nu_0'(f)$. We then use (1.298) with $k+1$ instead of $k$ to see that
$$\nu_0\big(f^-(R^-_{\ell,\ell'}-q)\big) = \nu\big(f^-(R^-_{\ell,\ell'}-q)\big) + O(k+2) \ .$$
Of course Lemma 1.10.2 remains true when $f^-$ is a product of $k$ terms which are linear combinations of terms of the type $R^-_{\ell,\ell'} - q$.
$$\nu\big((\varepsilon_1-\varepsilon_2)(\varepsilon_3-\varepsilon_4)f^-\big) = (b(2)-2b(1)+b(0))\,\nu\big((R^-_{1,3}-R^-_{1,4}-R^-_{2,3}+R^-_{2,4})f^-\big) + O(k+2) \ . \qquad (1.301)$$
Moreover, whenever $f^-$ does not depend on the third replica $\sigma^3$ we also have
$$\nu\big((\varepsilon_1-\varepsilon_2)\varepsilon_3 f^-\big) = (b(2)-4b(1)+3b(0))\,\nu\big((R^-_{1,3}-R^-_{2,3})f^-\big)$$
$$\quad + (b(1)-b(0))\sum_{4\le\ell\le n}\nu\big((R^-_{1,\ell}-R^-_{2,\ell}-R^-_{1,n+1}+R^-_{2,n+1})f^-\big) + O(k+2) \ . \qquad (1.302)$$
terms have already been computed in the previous steps. If one thinks about
it, this is exactly the way we have proceeded in Section 1.8.
We now start the proof of Theorem 1.10.1, the notation of which we use; in particular
$$k = k_1 + k_2 + k_3 \ .$$
Proposition 1.10.4. We have
$$\nu\left(\prod_{1\le\ell<\ell'\le n} T_{\ell,\ell'}^{k(\ell,\ell')}\prod_{1\le\ell\le n} T_\ell^{k(\ell)}\;T^{k_3}\right) \qquad (1.303)$$
$$= \prod_{1\le\ell<\ell'\le n} a(k(\ell,\ell'))\,A^{k_1}\,\nu\left(\prod_{1\le\ell\le n} T_\ell^{k(\ell)}\;T^{k_3}\right) + O(k+1) \ .$$
for $v \le k_1$;
$$\varepsilon(v) = (\varepsilon_{\ell(v)} - \varepsilon_{j(v)})\,\varepsilon_{j'(v)} \qquad (1.307)$$
for $k_1 < v \le k_1+k_2$; and finally, for $k_1+k_2 < v \le k_1+k_2+k_3$, let
where
$$\mathrm{I} = \frac{1}{N}\sum_{2\le u\le k}\nu\left(\varepsilon(1)\varepsilon(u)\prod_{v\neq u} U^-(v)\right) \ . \qquad (1.311)$$
We first study the term $\mathrm{I}$. Since the product $\prod_{v\neq u} U^-(v)$ contains $k-2$ factors, the function $\varepsilon(1)\varepsilon(u)\prod_{v\neq u} U^-(v)$ is a function of order $k-2$ as defined four lines above (1.297); and then (1.298) entails
$$\frac{1}{N}\,\nu\left(\varepsilon(1)\varepsilon(u)\prod_{v\neq u} U^-(v)\right) = \frac{1}{N}\,\nu_0\left(\varepsilon(1)\varepsilon(u)\prod_{v\neq u} U^-(v)\right) + O(k+1) \ .$$
Thus we obtain
$$\mathrm{I} = \frac{k(1,2)-1}{N}\,(1-2q+\hat q)\,\nu\left(\prod_{3\le v\le k} U^-(v)\right) + O(k+1) \ . \qquad (1.312)$$
Next, we use (1.301) (with indices $1, j(1), 2, j'(1)$ rather than $1,2,3,4$, and with $k-1$ rather than $k$) to see that
$$\nu\left(\varepsilon(1)\prod_{2\le v\le k} U^-(v)\right) = \beta^2(1-2q+\hat q)\,\nu\left((R^-_{1,2}-R^-_{1,j'(1)}-R^-_{j(1),2}+R^-_{j(1),j'(1)})\prod_{2\le v\le k} U^-(v)\right) + O(k+1)$$
$$= \beta^2(1-2q+\hat q)\,\nu\left(\prod_{1\le v\le k} U^-(v)\right) + O(k+1) \ .$$
We claim that on the right-hand side we may replace each term $U^-(v)$ by $U(v)$, up to an error of $O(k+1)$. To see this we simply use the relation $U^-(v) = U(v) - \varepsilon(v)/N$ and we expand the products. All the terms except the one where all factors are $U(v)$ are $O(k+1)$, as follows from (1.297).
Recalling (1.309) we have proved that
$$(1-\beta^2(1-2q+\hat q))\,V = \frac{k(1,2)-1}{N}\,(1-2q+\hat q)\,\nu\left(\prod_{3\le v\le k} U(v)\right) + O(k+1) \ . \qquad (1.313)$$
The proof is finished if $k(1,2) = 1$, since $a(1) = 0$. If $k(1,2) \ge 2$, we have
$$\nu\left(\prod_{3\le v\le k} U(v)\right) = \nu\left(\prod_{1\le\ell<\ell'\le n} T_{\ell,\ell'}^{k'(\ell,\ell')}\prod_{1\le\ell\le n} T_\ell^{k(\ell)}\;T^{k_3}\right) \ ,$$
Using that a(k(1, 2)) = (k(1, 2) − 1)a(k (1, 2)), this completes the induction
and the proof of Proposition 1.10.4.
Proposition 1.10.5. With the notation of Theorem 1.10.1 we have
$$\nu\left(\prod_{1\le\ell<\ell'\le n} T_{\ell,\ell'}^{k(\ell,\ell')}\prod_{1\le\ell\le n} T_\ell^{k(\ell)}\;T^{k_3}\right) \qquad (1.314)$$
$$= \prod_{1\le\ell<\ell'\le n} a(k(\ell,\ell'))\prod_{1\le\ell\le n} a(k(\ell))\,A^{k_1}B^{k_2}\,\nu(T^{k_3}) + O(k+1) \ .$$
Proof. We already know from Proposition 1.10.4 that we can assume that
k1 = 0. So we fix k1 = 0 and we prove Proposition 1.10.5 by induction over
k2 . Thus assume k2 > 0 and also without loss of generality that k(1) > 0.
We keep the notation of Proposition 1.10.4. Recalling (1.304) we assume
$$\ell(v) = 1 \iff v \le k(1) \ ,$$
and for $v > k(1)$ we have $\nu_0(\varepsilon(1)\varepsilon(v)) = 0$, because $\varepsilon(v)$ does not depend on either $\varepsilon_1$ or $\varepsilon_{j(1)}$. Thus, instead of (1.312) we now have (recalling that the
term I has been defined in (1.311))
$$\mathrm{I} = \frac{k(1)-1}{N}\,(q-\hat q)\,\nu\left(\prod_{3\le v\le k} U^-(v)\right) + O(k+1) \ . \qquad (1.316)$$
We use (1.302) (with the indices $1, j(1), j'(1)$ rather than $1, 2, 3$, and with $k-1$ rather than $k$) to obtain
$$\nu\left(\varepsilon(1)\prod_{2\le v\le k} U^-(v)\right) = \beta^2(1-4q+3\hat q)\,\nu\left((R^-_{1,j'(1)}-R^-_{j(1),j'(1)})\prod_{2\le v\le k} U^-(v)\right) + \mathrm{II} + O(k+1) \qquad (1.317)$$
for
$$\mathrm{II} = \beta^2(q-\hat q)\sum_\ell\nu\left((R^-_{1,\ell}-R^-_{j(1),\ell}-R^-_{1,n'+1}+R^-_{j(1),n'+1})\prod_{2\le v\le k} U^-(v)\right) \ , \qquad (1.318)$$
where $n'$ is an integer larger than all the indices $j(v), j'(v)$, and where the summation is over $2 \le \ell \le n'$, $\ell \neq j(1)$, $\ell \neq j'(1)$.
Compared to the proof of Proposition 1.10.4 the new (and non-trivial)
part of the argument is to establish the relation
$$\mathrm{II} = (k(1)-1)\,\beta^2(q-\hat q)\,\nu\left(\prod_{3\le v\le k} T(v)\;T^2_{1,n'+1}\right) + O(k+1) \qquad (1.319)$$
and we explain first how to conclude once (1.319) has been established. As usual, in (1.316) and (1.317) we can replace $U^-(v)$ by $U(v)$ and $R^-_{\ell,\ell'}$ by $R_{\ell,\ell'}$, and also
$$\nu\left((R^-_{1,j'(1)}-R^-_{j(1),j'(1)})\prod_{2\le v\le k} U^-(v)\right) = \nu\left(\prod_{1\le v\le k} U^-(v)\right) = \nu\left(\prod_{1\le v\le k} U(v)\right) + O(k+1) = V + O(k+1) \ .$$
Combining with (1.310) and (1.317) we get
$$(1-\beta^2(1-4q+3\hat q))\,V = (k(1)-1)(q-\hat q)\,\frac{1}{N}\,\nu\left(\prod_{3\le v\le k} U(v)\right) + (k(1)-1)\,\beta^2(q-\hat q)\,\nu\left(\prod_{3\le v\le k} T(v)\;T^2_{1,n'+1}\right) + O(k+1) \ . \qquad (1.320)$$
where $k'(\ell) = k(\ell)$ for $\ell > 1$ and $k'(1) = k(1)-2$. Thus the induction hypothesis implies
$$\frac{1}{N}\,\nu\left(\prod_{1\le\ell\le n} T_\ell^{k'(\ell)}\;T^{k_3}\right) = \frac{1}{N}\prod_{1\le\ell\le n} a(k'(\ell))\,B^{k_2-2}\,\nu(T^{k_3}) + O(k+1) \ . \qquad (1.322)$$
We can use Proposition 1.10.4 and the induction hypothesis to compute the term
$$\nu\left(\prod_{3\le v\le k} T(v)\;T^2_{1,n'+1}\right) \ ,$$
and we obtain
$$(1-\beta^2(1-4q+3\hat q))\,V = (k(1)-1)(q-\hat q)\left(\frac{1}{N}+\beta^2 A^2\right)\prod_{1\le\ell\le n} a(k'(\ell))\,B^{k_2-2}\,\nu(T^{k_3}) + O(k+1) \ .$$
and, for 2 ≤ v ≤ k2
This looks complicated, but we shall prove that when we expand the product most of the terms are $O(k+1)$. We know from Proposition 1.10.4 that in order for a term not to be $O(k+1)$, each factor $T_{\ell,\ell'}$ must occur at an even power, because $a(k) = 0$ for odd $k$. In order for the terms $T_{1,\ell}$ (or $T_{j(1),\ell}$, or $T_{1,n'+1}$, or $T_{j(1),n'+1}$) to occur at an even power in the expansion, one has to pick the same term again in one of the factors $U(v)$ for $v \ge 2$. Since all the integers $j(v), j'(v)$ are $\le n'$, this is impossible for the terms $T_{1,n'+1}$ and $T_{j(1),n'+1}$.
Can this happen for the term $T_{j(1),\ell}$? We can never have $\{j(1),\ell\} = \{j(v),j'(v)\}$ for $v \ge 2$, because the integers $j(v), j'(v)$ are all distinct. We can never have $\{j(1),\ell\} = \{\ell(v),j'(v)\}$, because $j(1) \notin \{\ell(v),j'(v)\}$ since $j(1) > n$, $\ell(v) \le n$ and $j(1) \neq j'(v)$; so this cannot happen either for this term $T_{j(1),\ell}$.
Can it happen then for the term $T_{1,\ell}$? Since $j(v), j'(v) > n$, we can never have $\{1,\ell\} = \{j(v),j'(v)\}$. Since $j'(v) > n$, we have $\{1,\ell\} = \{\ell(v),j'(v)\}$ exactly when $\ell(v) = 1$ and $\ell = j'(v)$. Since $\ell(v) = 1$ exactly when $v \le k(1)$, there are exactly $k(1)-1$ possibilities for $v$, namely $v = 2,\ldots,k(1)$. For each of these values of $v$, there is exactly one possibility for $\ell$, namely $\ell = j'(v)$.
So, only for the terms $T_{1,\ell}$ where $\ell \in \{j'(2),\ldots,j'(k(1))\}$ can we pick another copy of this term in the product $\prod_{2\le v\le k} U(v)$, and this term is found in $U(u)$ for the unique $2 \le u \le k(1)$ for which $\ell = j'(u)$. Therefore in that case we have
$$\nu\left((T_{1,\ell}-T_{j(1),\ell}-T_{1,n'+1}+T_{j(1),n'+1})\prod_{2\le v\le k} U(v)\right) = \nu\left(T_{1,\ell}^2\prod_{v\neq u} U(v)\right) \ .$$
Moreover, since $\ell = j'(u)$ we then have $\ell > n$, and since $\ell(v) \le n$ and all the numbers $j(v)$ and $j'(v)$ are distinct, $\ell$ does not belong to any of the sets $\{\ell(v),j(v),j'(v)\}$ for $v \neq u$, so that, by symmetry between replicas,
$$\nu\left(T_{1,\ell}^2\prod_{v\neq u} U(v)\right) = \nu\left(T_{1,n'+1}^2\prod_{3\le v\le k} U(v)\right) = \nu\left(T_{1,n'+1}^2\prod_{3\le v\le k} T(v)\right) \ ,$$
and since there are exactly k(1) − 1 such contributions, this completes the
proof of (1.319), hence of Proposition 1.10.5.
Proof of Theorem 1.10.1. We prove by induction over k that
$$\mathrm{II} = \beta^2(1-4q+3\hat q)\,\nu(T^k) \qquad (1.328)$$
$$\quad + (k-1)\Big(\beta^2(\hat q-q^2)\,\nu(T_{1,2}^2\,T^{k-2}) + 2\beta^2(2q+q^2-3\hat q)\,\nu(T_1^2\,T^{k-2})\Big) \ .$$
Once this has been proved one can compute the last two terms using the
induction hypothesis and Propositions 1.10.4 and 1.10.5, namely
$$\nu(T_{1,2}^2\,T^{k-2}) = A^2\,a(k-2)\,C^{k-2} + O(k+1)$$
and
$$\nu(T_1^2\,T^{k-2}) = B^2\,a(k-2)\,C^{k-2} + O(k+1) \ .$$
Combining this value of II with (1.325) and (1.326), and using (1.262) one
then completes the induction.
It would be nice to have a one-line argument to prove (1.328); maybe
such an argument exists if one finds the correct approach, which probably
means that one has to solve Research Problem 1.8.3. For the time being, one
carefully collects the terms of (1.327). Here are the details of this computation
(a more general version of which will be given in Volume II). In order to compute $\nu\big((R_{\ell,\ell'}-q)f\big)$ we can replace each factor $R_{2v-1,2v}-q$ of $f$ by $T$ whenever $\{2v-1,2v\}\cap\{\ell,\ell'\} = \emptyset$. Thus we see first that
If $w$ is the unique integer $\le k$ such that $\ell \in \{2w-1,2w\}$, then for $w = 1$ (and since $f$ does not contain the factor $R_{1,2}-q$) we have $\nu\big((R_{\ell,n+1}-q)f\big) = \nu(T^k)$, whereas for $w \ge 2$ we have
$$\nu\big((R_{\ell,n+1}-q)f\big) = \nu\big((R_{\ell,n+1}-q)(R_{2w-1,2w}-q)\,T^{k-2}\big) \ ,$$
• If $w = 1$, we find instead
$$\nu\big((R_{\ell,\ell'}-q)f\big) = \nu\big((R_{\ell,\ell'}-q)(R_{2w'-1,2w'}-q)\,T^{k-2}\big) = \nu(T^k) + \nu(T_2^2\,T^{k-2}) + O(k+1) \ .$$
$$\beta^2\left(2(n-2)(q-q^2) + \frac{(n-3)(n-2)}{2}\,(\hat q-q^2) - (n-2)n\,(\hat q-q^2)\right) = 2\beta^2(k-1)(2q+q^2-3\hat q) \ ,$$
Theorem 1.11.1. (A. Hanen [79]) If $\beta < 1/2$, for each $k$ we have
$$\mathrm{E}\big(\langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle\big)^k = a(k)\left(\frac{\beta^2}{N(1-\beta^2(1-2q+\hat q))}\right)^{k/2}\mathrm{E}\,\frac{1}{\mathrm{ch}^{2k}Y} + O(k+1) \ . \qquad (1.329)$$
$$\mathrm{E}\big(\langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle\big)^k = \mathrm{E}\big(\langle\sigma_N\sigma_{N-1}\rangle - \langle\sigma_N\rangle\langle\sigma_{N-1}\rangle\big)^k \qquad (1.331)$$
and that
$$\langle\sigma_N\sigma_{N-1}\rangle - \langle\sigma_N\rangle\langle\sigma_{N-1}\rangle = \frac{1}{2}\big\langle(\sigma_N^1-\sigma_N^2)(\sigma_{N-1}^1-\sigma_{N-1}^2)\big\rangle = \frac{1}{2}\big\langle(\varepsilon_1-\varepsilon_2)(\sigma_{N-1}^1-\sigma_{N-1}^2)\big\rangle \ , \qquad (1.332)$$
where as usual $\varepsilon_\ell = \sigma_N^\ell$. Using replicas, we then have
where $f^- = (\sigma^1_{N-1}-\sigma^2_{N-1})\cdots(\sigma^{2k-1}_{N-1}-\sigma^{2k}_{N-1})$.
For $v \ge 1$, let us set $\eta_v = \varepsilon_{2v-1}-\varepsilon_{2v}$, and for a set $V$ of integers let us set
$$\eta_V = \prod_{v\in V}\eta_v \ .$$
Lemma 1.11.3. If $f$ is a function on $\Sigma_N^n$ we have
$$\big|\nu_t^{(m)}(f)\big| \le \frac{K(m,n)}{N^{m/2}}\,\nu(f^2)^{1/2} \ . \qquad (1.334)$$
1.11 Non Gaussian Behavior: Hanen’s Theorem 115
Proof. This is because "each derivative brings out a factor $R^-_{\ell,\ell'}-q$ that contributes as $N^{-1/2}$." More formally, by (1.151), counting each term with its order of multiplicity, $\nu_t'(f)$ is the sum of $2n^2$ terms of the type
$$\pm\beta^2\,\nu_t\big(\varepsilon_\ell\varepsilon_{\ell'}(R^-_{\ell,\ell'}-q)f\big) \ ,$$
where $\ell,\ell' \le n+2$, so that by iteration $\nu_t^{(m)}(f)$ is the sum of at most
$$2n^2\,\big(2(n+2)^2\big)\cdots\big(2(n+2(m-1))^2\big)$$
terms of the type
$$\pm\beta^{2m}\,\nu_t\left(\prod_{r\le m}\varepsilon_{\ell_r}\varepsilon_{\ell'_r}(R^-_{\ell_r,\ell'_r}-q)\,f\right) \ .$$
A basic idea is that the quantity $\nu_0(\eta_V\varepsilon_J)$ has a great tendency to be zero, because each factor $\eta_v = \varepsilon_{2v-1}-\varepsilon_{2v}$ gives it a chance. And taking the product with $\varepsilon_J$ cannot destroy all these chances if $\mathrm{card}\,J < \mathrm{card}\,V$, as is made formal in the next lemma.
Lemma 1.11.4. Assume that $\mathrm{card}(V^*\cap J) < \mathrm{card}\,V$, and consider a function $f$ of $(\varepsilon_\ell)_{\ell\notin V^*}$. Then
$$\nu_0(\eta_V\varepsilon_J f) = 0 \ .$$
Proof. Recalling the definition of $V^*$ and since $\mathrm{card}(V^*\cap J) < \mathrm{card}\,V$, there exists $v$ in $V$ such that $\{2v-1,2v\}\cap J = \emptyset$. Defining $V' = V\setminus\{v\}$ we get
$$\eta_V\varepsilon_J f = \eta_v\,\eta_{V'}\varepsilon_J f \ ,$$
where $\eta_{V'}\varepsilon_J f$ depends only on $\varepsilon_\ell$ for $\ell\notin\{2v-1,2v\}$. Thus
$$\langle\eta_V\varepsilon_J f\rangle_0 = \langle\eta_v\rangle_0\,\langle\eta_{V'}\varepsilon_J f\rangle_0 = 0$$
because $\langle\eta_v\rangle_0 = 0$.
As a consequence of Lemma 1.11.4 terms ηV create a great tendency for
certain derivatives to vanish.
Lemma 1.11.5. Consider two integers $r, s$ and sets $V$ and $J$ with $\mathrm{card}\,V = s$ and $\mathrm{card}(V^*\cap J) \le r$. Consider a function $f$ of $(\varepsilon_\ell)_{\ell\notin V^*}$ and a function $f^-$ on $\Sigma_{N-1}^n$. Then for $2m+r < s$ we have
$$\nu_0^{(m)}\big(\eta_V\varepsilon_J f f^-\big) = 0 \ .$$
|t|≤1 N (m+1)/2
where
$$W_I = \nu\left(\eta_V\prod_{u\le m,\,u\notin I} C_u\right) \ ,$$
so that
$$W_I = \frac{1}{N^{r'}}\,\nu\big(\eta_V\varepsilon_J f^-\big) \ .$$
We may use Corollary 1.11.6 with $r = 2r'$, $f = 1$ to obtain
$$\big|\nu(\eta_V\varepsilon_J f^-)\big| \le \frac{K(s,n)}{N^{a/2}}\,\nu\big((f^-)^2\big)^{1/2} \ ,$$
where $a = (s+1)/2 - r$ if $s$ is odd and $a = s/2 - r$ if $s$ is even, so that $a = b - r'$.
Also, by (1.103) we have $\nu_t\big((R^-_{1,2}-q)^{2k}\big) \le KN^{-k}$, and Hölder's inequality implies
$$\nu\big((f^-)^2\big)^{1/2} \le \frac{K(m)}{N^{(m-r')/2}} \ .$$
Therefore
$$|W_I| \le \frac{1}{N^{r'}}\,\frac{K(s,n)}{N^{b/2-r'/2}}\,\frac{K(m)}{N^{(m-r')/2}} = \frac{K(s,n,m)}{N^{(b+m)/2}} \ .$$
The uniformity of these estimates over β ≤ β0 < 1/2 should be obvious.
Proof. From (1.151) we know that $\nu_t'(\eta_V f f^-)$ is the sum of $2n^2$ terms of the type
$$\pm\beta^2\,\nu_t\big(\eta_V\varepsilon_\ell\varepsilon_{\ell'}(R^-_{\ell,\ell'}-q)f f^-\big) \ .$$
Now it follows from Lemma 1.11.5, used for $s = 2m$ and $\nu_0^{(m-1)}$ rather than $\nu_0^{(m)}$, that
$$\nu_0^{(m-1)}\big(\eta_V\varepsilon_\ell\varepsilon_{\ell'}(R^-_{\ell,\ell'}-q)f f^-\big) = 0$$
unless $\ell, \ell' \in V^*$. Looking again at (1.151) we observe that the only terms for which this occurs are the terms
$$\beta^2\,\nu_t\big(\eta_V\varepsilon_\ell\varepsilon_{\ell'}(R^-_{\ell,\ell'}-q)f f^-\big) \quad\text{for } \ell < \ell' \le n \ .$$
The next result is the heart of the matter. Given a set $V$ with $\mathrm{card}\,V = 2m$, we denote by $I$ a partition of $V$ into sets $J$ with $\mathrm{card}\,J = 2$. When $J = \{u,v\}$ we consider the "rectangular sums"
$$U_J^- = R^-_{2u-1,2v-1} - R^-_{2u-1,2v} - R^-_{2u,2v-1} + R^-_{2u,2v} \ ,$$
and
$$U_J = R_{2u-1,2v-1} - R_{2u-1,2v} - R_{2u,2v-1} + R_{2u,2v} \ .$$
$$\nu_0^{(m)}\big(\eta_V f f^-\big) = \beta^{2m}\,m!\;\mathrm{E}\left(\frac{\langle f\rangle_0}{\mathrm{ch}^{4m}Y}\right)\sum_I\nu_0\left(f^-\prod_{J\in I} U_J^-\right) \ , \qquad (1.337)$$
unless each of the sets $\{2v-1,2v\}$ for $v\in V$ contains at least one of the points $\ell_r$ or $\ell'_r$ ($r\le m$). There are $2m$ such sets and $2m$ such points; hence each set must contain exactly one point. When this is the case we have
$$\langle\eta_{v_r}\varepsilon_{\ell_r}\rangle_0 = \big\langle(\varepsilon_{2v_r-1}-\varepsilon_{2v_r})\varepsilon_{\ell_r}\big\rangle_0 = \begin{cases} 1-\mathrm{th}^2 Y = 1/\mathrm{ch}^2 Y & \text{if } \ell_r = 2v_r-1 \\ -(1-\mathrm{th}^2 Y) = -1/\mathrm{ch}^2 Y & \text{if } \ell_r = 2v_r \ , \end{cases}$$
and let us define
$$\tau_r = 1 \ \text{ if } \ell_r = 2v_r-1 \ ; \qquad \tau_r = -1 \ \text{ if } \ell_r = 2v_r \ ,$$
where the summation is over all the choices of the partition $\{J_1,\ldots,J_m\}$ of $V$ into sets of two elements, and all choices of $\ell_r$ and $\ell'_r$ as above. Given the set $J_r$, there are two possible choices for $\ell_r$ (namely $\ell_r = 2v_r-1$ and $\ell_r = 2v_r$) and similarly there are two possible choices for $\ell'_r$. Thus, given the sets $J_1,\ldots,J_m$, there are $2^{2m}$ choices for the indices $\ell_r$ and $\ell'_r$, $r\le m$. In the next step, we add the $2^{2m}$ terms in the right-hand side of (1.340) for which the sets $J_1,\ldots,J_m$ take given values. We claim that this gives a combined term of the form
$$\beta^{2m}\;\mathrm{E}\left(\frac{\langle f\rangle_0}{\mathrm{ch}^{4m}Y}\right)\nu_0\left(f^-\prod_{r\le m} U^-_{J_r}\right) \ .$$
formula, we obtain:
$$\nu\left(\eta_V f\prod_{J\in I} U_J^-\right) = \frac{1}{m!}\,\nu_0^{(m)}\left(\eta_V f\prod_{J\in I} U_J^-\right) + O(2m+1) \ . \qquad (1.342)$$
not to be $O(2m+1)$, the following must occur: given any $J_0\in I$, at least one of the terms $T_{\ell,\ell'}$ of the decomposition (1.345) of $U_{J_0}$ must occur in the decomposition of another $U_J$, $J\in I\cup I'$, $J\neq J_0$. (This is because $a(1) = 0$.) The only possibility is that $J_0\in I'$. Since this must hold for any choice of $J_0$, we must have $I' = I$, and thus (1.343) implies
$$\nu_0^{(m)}\left(\eta_V f\prod_{J\in I} U_J^-\right) = \beta^{2m}\,m!\;\mathrm{E}\left(\frac{\langle f\rangle_0}{\mathrm{ch}^{4m}Y}\right)\nu\left(\prod_{J\in I} U_J^2\right) + O(2m+1) \ . \qquad (1.346)$$
Expanding $U_J^2$ using (1.345), and using Theorem 1.10.1, then shows that
$$\nu\left(\prod_{J\in I} U_J^2\right) = (4A^2)^m + O(2m+1) \ ,$$
and combining with (1.342), (1.343) and (1.346) completes the proof.
We shall prove (1.347) by expanding the product and using (1.346) for each
term. We have
$$\prod_{J\in I}\left(U_J^- + \frac{\eta_J}{N}\right) = \sum_{I'\subset I}\,\prod_{J\notin I'}\frac{\eta_J}{N}\,\prod_{J\in I'} U_J^- \ , \qquad (1.348)$$
so that
$$\eta_V\prod_{J\notin I'}\eta_J\prod_{J\in I'} U_J^- = \eta_{V'}\prod_{J\notin I'}\eta_J^2\prod_{J\in I'} U_J^- \ .$$
$$f^- = (\sigma^1_{N-1}-\sigma^2_{N-1})\cdots(\sigma^{2k-1}_{N-1}-\sigma^{2k}_{N-1}) \ . \qquad (1.350)$$
$$\pm\beta^{2m}\,\nu_0\left(\eta_V\,\varepsilon_{\ell_1}\varepsilon_{\ell'_1}\cdots\varepsilon_{\ell_m}\varepsilon_{\ell'_m}\,f^-\prod_{r\le m}(R^-_{\ell_r,\ell'_r}-q)\right)$$
$$= \pm\beta^{2m}\,\nu_0\big(\eta_V\,\varepsilon_{\ell_1}\varepsilon_{\ell'_1}\cdots\varepsilon_{\ell_m}\varepsilon_{\ell'_m}\big)\,\nu_0\left(f^-\prod_{r\le m}(R^-_{\ell_r,\ell'_r}-q)\right) \ .$$
This will be shown by using Corollary 1.11.7 for the $(N-1)$-spin system with Hamiltonian (1.144). First, we observe that if $\langle\cdot\rangle_-$ denotes an average for the Gibbs measure with Hamiltonian (1.144), then for a function $f$ on $\Sigma_{N-1}^n$ we have $\langle f\rangle_0 = \langle f\rangle_-$. So, if $\nu_-(\cdot) = \mathrm{E}\langle\cdot\rangle_-$, (1.353) shall follow from
$$\nu_-\left(f^-\prod_{r\le m}(R^-_{\ell_r,\ell'_r}-q)\right) = O(m+p) \ . \qquad (1.354)$$
$$R^\sim_{\ell,\ell'} = \frac{1}{N-1}\sum_{i\le N-1}\sigma_i^\ell\sigma_i^{\ell'} = \frac{N}{N-1}\,R^-_{\ell,\ell'} \ , \qquad (1.355)$$
Recalling the value (1.350) of $f^-$ we see that indeed (1.356) follows from Corollary 1.11.7, because the estimate in that corollary is uniform over $\beta \le \beta_0 < 1/2$ (and thus the fact that $\beta_-$ in (1.357) depends on $N$ is irrelevant). Thus we have proved (1.352), and combining with (1.351) we get
$$\nu(\eta_V f^-) = \sum_{m\le k-p}\frac{1}{m!}\,\nu_0^{(m)}(\eta_V f^-) + O(k+1) \ . \qquad (1.358)$$
Each choice of $I$ gives the same contribution. To count the number of partitions $I$, we observe that if $1\in J$ and $\mathrm{card}\,J = 2$, then $J$ is determined by its other element, so there are $2p-1$ choices for $J$. In this manner induction over $p$ shows that
Therefore
1.12 The SK Model with d-component Spins 125
$$\nu(\eta_V f^-) = a(k)\,\beta^k\;\mathrm{E}\,\frac{1}{\mathrm{ch}^{2k}Y}\left(4\left(\frac{1}{N}+\beta^2 A^2\right)\right)^{k/2} + O(k+1) \ ,$$
Having succeeded in making this computation, one can of course ask all kinds of questions.
$$\lim_{N\to\infty} N^{k/2}\,\mathrm{E}\big(\langle\sigma_1\sigma_2\sigma_3\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle\langle\sigma_3\rangle\big)^k \ .$$
$$-H_N = \frac{\beta}{\sqrt N}\sum_{1\le i<j\le N} g_{ij}\,(\sigma_i,\sigma_j) \qquad (1.360)$$
where, of course, $(g_{ij})_{i<j}$ are independent standard normal r.v.s. We may rewrite (1.360) as
$$-H_N = \frac{\beta}{\sqrt N}\sum_{u\le d}\sum_{i<j} g_{ij}\,\sigma_{i,u}\sigma_{j,u} \ , \qquad (1.361)$$
or, in words, that $\sigma_i$ belongs to the Euclidean ball $B_d$ centered at $0$, of radius $\sqrt d$. Thus the configuration space is now
$$\mathcal{S}_N = B_d^N \ .$$
$$\exp(\varepsilon_1 h + \varepsilon_2 h + \lambda\varepsilon_1\varepsilon_2)$$
with respect to the uniform measure on $\{-1,1\}^2$. This is the case of "two
coupled copies of the SK model” considered in Section 1.9. This case is of
fundamental importance. It seems connected to some of the deepest remain-
ing mysteries of the low temperature phase of the SK model. For large values
of β, this case of “two coupled copies of the SK model” is far from being
completely understood at the time of this writing. One major reason for this
is that it is not clear how to use arguments in the line of the arguments of
Theorem 1.3.7. The main difficulty is that some of the terms one obtains
when trying to use Guerra’s interpolation have the wrong sign, a topic to
which we will return later.
Let us define
$$Z_N(\beta,\mu) = \int\exp\big(-H_N(\sigma_1,\ldots,\sigma_N)\big)\,\mathrm{d}\mu(\sigma_1)\cdots\mathrm{d}\mu(\sigma_N) \ , \qquad (1.363)$$
where $H_N$ is the Hamiltonian (1.360). (Let us note that in the case where $d = 1$ and $\mu$ is supported by $\{-1,1\}$ this differs from our previous definition of $Z_N$, because we replace a sum over configurations by an average over configurations.) Let us write
$$p_N(\beta,\mu) = \frac{1}{N}\,\mathrm{E}\log Z_N(\beta,\mu) \ . \qquad (1.364)$$
One of our objectives is the computation of limN →∞ pN (β, μ). It will be
achieved when β is small enough. This computation has applications to the
theory of “large deviations”. For example, in the case of “two coupled copies
of the SK model”, computing limN →∞ pN (β, μ) amounts to computing
$$\lim_{N\to\infty}\frac{1}{N}\,\mathrm{E}\log\big\langle\exp\lambda N R_{1,2}\big\rangle \ , \qquad (1.365)$$
where now the bracket is an average for the Gibbs measure of the usual
SK model. "Concentration of measure" (as in Theorem 1.3.4) shows that $N^{-1}\log\langle\exp\lambda N R_{1,2}\rangle$ fluctuates little with the disorder. Thus computing (1.365) amounts to computing the value of $\langle\exp\lambda N R_{1,2}\rangle$ for the typical disorder. Since we can do this for every $\lambda$, this is very much the same as computing $N^{-1}\log G_N^{\otimes 2}(\{R_{1,2}\ge q+a\})$ and $N^{-1}\log G_N^{\otimes 2}(\{R_{1,2}\le q-a\})$ for $a > 0$ and a suitable median value $q$. In summary, the result of (1.365) can be translated into a result about the "large deviations of $R_{1,2}$" for the typical disorder. See [151] and [162] for more on this.
We will be able to compute $\lim_{N\to\infty} p_N(\beta,\mu)$ under the condition $L\beta d \le 1$, where as usual $L$ is a universal constant. Despite what one might think at first, the quality of this result does not decrease as $d$ becomes large. It controls "the same proportion of the high-temperature region independently of $d$". Indeed, if $\mu$ gives mass $1/2$ to each of the two points $(\pm\sqrt d, 0, \ldots, 0)$, the corresponding model is "a clone" of the usual SK model at temperature $\beta d$. The problem of computing $\lim_{N\to\infty} p_N(\beta,\mu)$ is much more difficult (and unsolved) if $\beta d$ is large.
The SK model with d-component spins offers new features compared with
the standard SK model. One of these is that if μ is “spread out” then one can
understand the system up to values of β much larger than 1/d. For example,
if $\mu$ is uniform on $\{-1,1\}^d$, the model simply consists of $d$ replicas of the SK model with $h = 0$, and we understand it for $\beta < 1/2$, independently of the
$$R^{u,v}_{1,2} = \frac{1}{N}\sum_{i\le N}\sigma^1_{i,u}\sigma^2_{i,v} \ . \qquad (1.367)$$
Theorem 1.12.1. If $L\beta d \le 1$, we can find numbers $(q_{u,v})$, $(\rho_{u,v})$ such that
$$\sum_{u,v\le d}\nu\big((R_{u,v}-\rho_{u,v})^2\big) \le \frac{K(d)}{N} \ , \qquad (1.368)$$
$$\sum_{u,v\le d}\nu\big((R^{u,v}_{1,2}-q_{u,v})^2\big) \le \frac{K(d)}{N} \ . \qquad (1.369)$$
$$\lim_{N\to\infty} p_N(\beta,\mu) = -\frac{\beta^2}{4}\sum_{u,v\le d}\big(\rho_{u,v}^2 - q_{u,v}^2\big) + \mathrm{E}\log\int E(x)\,\mathrm{d}\mu(x) \qquad (1.374)$$
$$\boldsymbol\rho_u = (\sigma_{1,u},\ldots,\sigma_{N-1,u}) \ .$$
(One will distinguish between the configuration $\boldsymbol\rho_u$ and the numbers $\rho_{u,v}$.)
We define
$$g(\boldsymbol\rho_u) = \frac{\beta}{\sqrt N}\sum_{i\le N-1} g_{iN}\,\sigma_{i,u} \ ,$$
$$g_t(\boldsymbol\rho_u) = \sqrt t\,g(\boldsymbol\rho_u) + \sqrt{1-t}\,Y_u \ .$$
We consider the Hamiltonian
$$-H_{N,t}(\sigma_1,\ldots,\sigma_N) = \frac{\beta}{\sqrt N}\sum_{u\le d}\sum_{i<j\le N-1} g_{ij}\,\sigma_{i,u}\sigma_{j,u} + \sum_{u\le d}\sigma_{N,u}\,g_t(\boldsymbol\rho_u)$$
$$\quad + (1-t)\,\frac{\beta^2}{2}\sum_{u,v\le d}\sigma_{N,u}\sigma_{N,v}\,(\rho_{u,v}-q_{u,v}) \ . \qquad (1.375)$$
The last term is the new feature compared to the standard case.
For a function $f$ on $\mathcal{S}_N^n$ we write
$$\nu_t(f) = \mathrm{E}\langle f\rangle_t \ ,$$
where $\langle\cdot\rangle_t$ denotes integration with respect to (the $n$-th power of) the Gibbs measure relative to the Hamiltonian (1.375). A function $f$ on $\mathcal{S}_N^n$ depends on the configurations $(\sigma^1_1,\ldots,\sigma^1_N), (\sigma^2_1,\ldots,\sigma^2_N), \ldots, (\sigma^n_1,\ldots,\sigma^n_N)$. We define
$$R^{-,u,v}_{\ell,\ell'} = \frac{1}{N}\sum_{i\le N-1}\sigma^\ell_{i,u}\sigma^{\ell'}_{i,v} \ ,$$
$$\nu_t'(f) = \frac{\beta^2}{2}\sum_{\ell,\ell'\le n}\sum_{u,v\le d}\nu_t\big(f\,\varepsilon^\ell_u\varepsilon^{\ell'}_v(R^{-,u,v}_{\ell,\ell'}-q_{u,v}(\ell,\ell'))\big)$$
$$\quad - n\beta^2\sum_{\ell\le n}\sum_{u,v\le d}\nu_t\big(f\,\varepsilon^\ell_u\varepsilon^{n+1}_v(R^{-,u,v}_{\ell,n+1}-q_{u,v}(\ell,n+1))\big) \ . \qquad (1.376)$$
Differentiating directly in $t$, we obtain
$$\nu_t'(f) = -\frac{\beta^2}{2}\sum_{\ell\le n}\sum_{u,v\le d}\nu_t\big(f\,\varepsilon^\ell_u\varepsilon^\ell_v(\rho_{u,v}-q_{u,v})\big) + \frac{n\beta^2}{2}\sum_{u,v\le d}\nu_t\big(f\,\varepsilon^{n+1}_u\varepsilon^{n+1}_v(\rho_{u,v}-q_{u,v})\big)$$
$$\quad + \frac{1}{2}\sum_{\ell\le n}\sum_{u\le d}\nu_t\left(f\,\varepsilon^\ell_u\Big(\frac{1}{\sqrt t}\,g(\boldsymbol\rho^\ell_u) - \frac{1}{\sqrt{1-t}}\,Y_u\Big)\right) - \frac{n}{2}\sum_{u\le d}\nu_t\left(f\,\varepsilon^{n+1}_u\Big(\frac{1}{\sqrt t}\,g(\boldsymbol\rho^{n+1}_u) - \frac{1}{\sqrt{1-t}}\,Y_u\Big)\right) \ . \qquad (1.377)$$
The first two terms are produced by the last term of the Hamiltonian (1.375), and the last two terms by the dependence of $g_t(\boldsymbol\rho)$ on $t$. One then performs Gaussian integration by parts in the last two terms of (1.377), which yields an expression similar to (1.376), except that one has $q_{u,v}$ rather than $q_{u,v}(\ell,\ell')$ everywhere. Combining this with the first two terms on the right-hand side of (1.377) yields (1.376).
The proof of Theorem 1.12.1 will follow the scheme of that of Proposi-
tion 1.6.6, but getting a dependence on d of the correct order requires some
caution.
Corollary 1.12.5. If $n = 2$, we have
$$|\nu_t'(f)| \le L\beta^2 d\,\nu_t(f^2)^{1/2}\left(\Big(\sum_{u,v\le d}\nu_t\big((R^{-,u,v}_{1,1}-\rho_{u,v})^2\big)\Big)^{1/2} + \Big(\sum_{u,v\le d}\nu_t\big((R^{-,u,v}_{1,2}-q_{u,v})^2\big)\Big)^{1/2}\right) \qquad (1.378)$$
and also
$$|\nu_t'(f)| \le L\beta^2 d^2\,\nu_t(|f|) \ . \qquad (1.379)$$
Here and throughout the book we lighten notation by writing νt (f )1/2 rather
than (νt (f ))1/2 , etc. The quantity νt (f )1/2 cannot be confused with the quan-
tity νt ((f )1/2 ) simply because we will never, ever, consider this latter quantity.
Proof. We write
$$\sum_{u,v\le d}\nu_t\big(f\,\varepsilon^\ell_u\varepsilon^{\ell'}_v(R^{-,u,v}_{\ell,\ell'}-q_{u,v}(\ell,\ell'))\big) = \nu_t\left(f\sum_{u,v\le d}\varepsilon^\ell_u\varepsilon^{\ell'}_v(R^{-,u,v}_{\ell,\ell'}-q_{u,v}(\ell,\ell'))\right) \ .$$
Next, we observe that since we are assuming that for each $i$ we have
$$\sum_{u\le d}\sigma^2_{i,u} \le d \ , \qquad (1.380)$$
The right-hand side takes only two possible values, depending on whether $\ell = \ell'$ or not. This yields (1.378).
To deduce (1.379) from (1.376), it suffices to show that, for each $\ell, \ell'$, we have
$$\left|\sum_{u,v\le d}\varepsilon^\ell_u\varepsilon^{\ell'}_v(R^{-,u,v}_{\ell,\ell'}-q_{u,v}(\ell,\ell'))\right| \le 2d^2 \ .$$
so that
$$\sum_{u,v} q^2_{u,v} \le \mathrm{E}\,\frac{1}{Z^2}\left(\sum_u\int x^2_u\,E(x)\,\mathrm{d}\mu(x)\right)\left(\sum_v\int x^2_v\,E(x)\,\mathrm{d}\mu(x)\right) \le d^2 \ ,$$
since $\sum_{u\le d} x^2_u \le d$ for $x$ in the support of $\mu$. The inequality $\sum_{u,v}\rho^2_{u,v} \le d^2$ is similar.
Proof of Theorem 1.12.1. In this proof we assume the existence of numbers $q_{u,v}$, $\rho_{u,v}$ satisfying (1.372) and (1.373). This existence will be proved later. Symmetry between sites implies
$$A := \sum_{u,v\le d}\nu\big((R^{u,v}_{1,2}-q_{u,v})^2\big) = \nu(f) \ , \qquad (1.383)$$
where
$$f = \sum_{u,v\le d}(\varepsilon^1_u\varepsilon^2_v - q_{u,v})(R^{u,v}_{1,2}-q_{u,v}) \ .$$
Next, as in the case of the ordinary SK model, (1.372) implies that for a function $f^-$ on $\mathcal{S}_{N-1}^n$ we have
$$|\nu_0(f)| \le \frac{K(d)}{N} \ .$$
If $\beta d \le 1$, (1.379) implies that $\nu_t(f) \le L\,\nu_1(f)$ whenever $f \ge 0$. Combining this with (1.378) and the usual relation, we get that
$$\nu(f) \le \frac{K(d)}{N} + L\beta^2 d^2\,\nu(f)^{1/2}\left(\Big(\sum_{u,v\le d}\nu\big((R^{-,u,v}_{1,1}-\rho_{u,v})^2\big)\Big)^{1/2} + \Big(\sum_{u,v\le d}\nu\big((R^{-,u,v}_{1,2}-q_{u,v})^2\big)\Big)^{1/2}\right) \ .$$
K(d)
A≤ + Lβ 2 d2 A1/2 (B 1/2 + A1/2 ) , (1.385)
N
where A is defined in (1.383) and
B=ν (Ru,v − ρu,v )2 .
u,v≤d
The same argument (using now (1.373) rather than (1.372)) yields the relation
$$B \le \frac{K(d)}{N} + L\beta^2 d^2 B^{1/2}\big(B^{1/2}+A^{1/2}\big)\,.$$
Combining with (1.385) we get
$$A + B \le \frac{K(d)}{N} + L_0\beta^2 d^2 (A+B)\,,$$
so that if $L_0\beta^2 d^2 \le 1/2$ this implies that $A + B \le K(d)/N$.
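In more detail, the last combination is elementary: adding (1.385) to the analogous inequality for $B$ and using

```latex
A^{1/2}\big(B^{1/2}+A^{1/2}\big) + B^{1/2}\big(B^{1/2}+A^{1/2}\big)
  = \big(A^{1/2}+B^{1/2}\big)^2 \le 2(A+B)
```

gives $A+B \le 2K(d)/N + 2L\beta^2 d^2(A+B)$, so with $L_0 = 2L$ and $L_0\beta^2 d^2 \le 1/2$ the last term can be absorbed into the left-hand side, at the price of replacing $K(d)$ by a larger constant of the same type.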
The above arguments prove Theorem 1.12.1, except that it remains to
show the existence of solutions to the equations (1.372) and (1.373). It seems
to be a general fact that “the proof of the existence at high temperature of
solutions to the replica-symmetric equations is implicitly part of the proof
of the validity of the replica-symmetric solution”. What we mean here is
that an argument proving the existence of a solution to (1.372) and (1.373)
can be extracted from the smart path method as used in the above proof of
Theorem 1.12.1. The same phenomenon will occur in many places.
Consider a positive definite symmetric matrix $Q = (q_{u,v})_{u,v\le d}$, and a symmetric matrix $Q' = (\rho_{u,v})_{u,v\le d}$. Consider a centered jointly Gaussian family $(Y_u)_{u\le d}$ as in (1.370). Consider the matrices $T(Q,Q')$ and $T'(Q,Q')$ given by the right-hand sides of (1.372) and (1.373) respectively. The proof of the existence of a solution to (1.372) and (1.373) consists in showing that if we provide the set of pairs of matrices $(Q,Q')$ as above with the Euclidean distance (when seen as a subset of $(\mathbb R^{d^2})^2$), the map $(Q,Q') \mapsto (T(Q,Q'), T'(Q,Q'))$ is a contraction provided $L\beta d \le 1$. (Thus it admits a unique fixed point.)
1.12 The SK Model with d-component Spins 135
To see this, considering another pair $(\bar Q, \bar Q')$ of matrices, we move from the pair $(Q,Q')$ to the pair $(\bar Q, \bar Q')$ using the path $t \mapsto (Q(t), Q'(t))$, where $Q(t) = (1-t)Q + t\bar Q$ and $Q'(t) = (1-t)Q' + t\bar Q'$.
The estimates required are very similar to those of Corollary 1.12.5 and the
details are better left to the reader.
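The contraction can be made concrete in the simplest case $d = 1$, where (for the ordinary SK model) the replica-symmetric equation reduces to the scalar fixed-point problem $q = E\,\mathrm{th}^2(\beta z\sqrt q + h)$. The sketch below is our illustration only (the function name and parameter choices are ours, not the book's); it iterates this map using Gauss–Hermite quadrature, and for small $\beta$ the iteration converges geometrically, which is exactly the contraction property:

```python
import numpy as np

def rs_fixed_point(beta, h, tol=1e-12, max_iter=1000):
    """Iterate q -> E tanh^2(beta * z * sqrt(q) + h), z standard Gaussian."""
    # Gauss-Hermite quadrature: E f(z) = (1/sqrt(pi)) * sum_i w_i f(sqrt(2) x_i)
    x, w = np.polynomial.hermite.hermgauss(80)
    z = np.sqrt(2.0) * x
    w = w / np.sqrt(np.pi)
    q = 0.5  # any starting point in (0, 1) works when the map is a contraction
    for _ in range(max_iter):
        q_new = float(np.sum(w * np.tanh(beta * z * np.sqrt(q) + h) ** 2))
        if abs(q_new - q) < tol:
            return q_new
        q = q_new
    raise RuntimeError("no convergence: beta may be too large")

q = rs_fixed_point(beta=0.3, h=0.5)
print(q)  # the solution of the scalar replica-symmetric equation
```

For $\beta$ of order 1 the map may cease to be a contraction, which is precisely the regime the hypothesis $L\beta d \le 1$ excludes.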
Proof of Theorem 1.12.2. We just proved the existence of solutions to the
equations (1.372) and (1.373). The uniqueness follows from Theorem 1.12.1.
We begin our preparations for the proof of Theorem 1.12.3. It seems very
likely that one could use interpolation as in (1.108) or adapt the proof of
(1.170). We sketch yet another approach, which is rather instructive in a
different way. We start with the relation
$$\frac{\partial p_N}{\partial\beta}(\beta,\mu) = \frac{1}{N^{3/2}}\,E\Big\langle\sum_{i<j}\sum_{u\le d} g_{ij}\,\sigma_{i,u}\sigma_{j,u}\Big\rangle\,.$$
Integrating by parts,
$$\frac{\partial p_N}{\partial\beta}(\beta,\mu) = \frac{\beta}{N^2}\sum_{i<j}\sum_{u,v\le d}\big(\nu(\sigma_{i,u}\sigma_{j,u}\sigma_{i,v}\sigma_{j,v}) - \nu(\sigma^1_{i,u}\sigma^1_{j,u}\sigma^2_{i,v}\sigma^2_{j,v})\big)\,,$$
and we obtain
$$\bigg|\frac{\partial p_N}{\partial\beta}(\beta,\mu) - \frac{\beta}{2}\sum_{u,v\le d}\nu\big((R^{u,v}_{1,1})^2 - (R^{u,v}_{1,2})^2\big)\bigg| \le \frac{\beta}{2N^2}\sum_{u,v\le d}\sum_{i\le N}\big|\nu\big((\sigma_{i,u}\sigma_{i,v})^2 - (\sigma^1_{i,u}\sigma^2_{i,v})^2\big)\big| \le \frac{K(d)}{N}\,.$$
Therefore (and since the result is obvious for $\beta = 0$), all we have to check is that the derivative of
$$-\frac{\beta^2}{4}\sum_{u,v\le d}\big(\rho^2_{u,v} - q^2_{u,v}\big) + E\log\int E(x)\,\mathrm d\mu(x) \tag{1.387}$$
with respect to $\beta$ is $\beta\sum_{u,v\le d}\big(\rho^2_{u,v} - q^2_{u,v}\big)/2$. The crucial fact is as follows.
Lemma 1.12.6. The relations (1.372) and (1.373) mean that the partial
derivatives of the quantity (1.387) with respect to qu,v and ρu,v are zero.
The reader will soon observe that each time we succeed in computing the
limiting value of pN for a certain model, we find this limit as a function F of
certain parameters (here β, μ, (qu,v ) and (ρu,v )). Some of these parameters
are intrinsic to the model (here β and μ) while others are “free” (here (qu,v )
and (ρu,v )). It seems to be a general fact that the “free parameters” are
determined by the fact that the partial derivatives of the function F with
respect to these are 0.
Proof of Lemma 1.12.6. The case of the derivative with respect to $\rho_{u,v}$ is completely straightforward, so we explain only the case of the derivative with respect to $q_{u,v}$. We recall the definition (1.371) of $E(x)$:
$$E(x) = \exp\Big(\sum_{u'\le d} x_{u'}\,Y_{u'} + \frac{\beta^2}{2}\sum_{u',v'\le d} x_{u'}x_{v'}\big(\rho_{u',v'} - q_{u',v'}\big)\Big)\,,$$
where the r.v.s $Y_{u'}$ are jointly Gaussian and satisfy $E\,Y_{u'}Y_{v'} = \beta^2 q_{u',v'}$. Let us now consider another jointly Gaussian family $(W_{u'})$ and let $a_{u',v'} = E\,W_{u'}W_{v'}$. Let us define
$$E^*(x) = \exp\Big(\sum_{u'\le d} x_{u'}\,W_{u'} + \frac{\beta^2}{2}\sum_{u',v'\le d} x_{u'}x_{v'}\big(\rho_{u',v'} - q_{u',v'}\big)\Big)\,,$$
which we think of as a function of the families $(q_{u',v'})$ and $(a_{u',v'})$ (the quantities $(\rho_{u',v'})$ being fixed once and for all). The purpose of this is to distinguish the two different manners in which $E(x)$ depends on $q_{u,v}$. Thus we have
$$\frac{\partial}{\partial q_{u,v}}\,E\log\int E(x)\,\mathrm d\mu(x) = \mathrm I + \mathrm{II}\,, \tag{1.388}$$
where
$$\mathrm I = \frac{\partial}{\partial q_{u,v}}\,E\log\int E^*(x)\,\mathrm d\mu(x)\,, \tag{1.389}$$
and
$$\mathrm{II} = \beta^2\,\frac{\partial}{\partial a_{u,v}}\,E\log\int E^*(x)\,\mathrm d\mu(x)\,. \tag{1.390}$$
In both these relations, $E^*(x)$ is computed at the values $a_{u',v'} = \beta^2 q_{u',v'}$. To perform the computation, one has to keep in mind that
$$\sum_{u',v'\le d} x_{u'}x_{v'}\big(\rho_{u',v'} - q_{u',v'}\big) = 2\sum_{1\le u'<v'\le d} x_{u'}x_{v'}\big(\rho_{u',v'} - q_{u',v'}\big) + \sum_{u'\le d} x^2_{u'}\big(\rho_{u',u'} - q_{u',u'}\big)\,.$$
Consider the function
$$G(y_1,\dots,y_d) = \log\int\exp\Big(\sum_{u'\le d} x_{u'}y_{u'} + \frac{\beta^2}{2}\sum_{u',v'\le d}x_{u'}x_{v'}\big(\rho_{u',v'}-q_{u',v'}\big)\Big)\,\mathrm d\mu(x)\,,$$
so that
$$E\log\int E^*(x)\,\mathrm d\mu(x) = E\,G(W_1,\dots,W_d)\,,$$
$$\frac{\partial}{\partial a_{u,v}}\,E\log\int E^*(x)\,\mathrm d\mu(x) = E\,\frac{\partial^2 G}{\partial y_u\partial y_v}(W_1,\dots,W_d)\,,$$
138 1. The Sherrington-Kirkpatrick Model
and when this is computed at the values $a_{u',v'} = \beta^2 q_{u',v'}$ this is
$$E\,\frac{1}{Z}\int x_u x_v\,E(x)\,\mathrm d\mu(x) - E\,\frac{1}{Z^2}\int x_u\,E(x)\,\mathrm d\mu(x)\int x_v\,E(x)\,\mathrm d\mu(x)\,.$$
Recalling (1.388) this yields the formula
$$\frac{\partial}{\partial q_{u,v}}\,E\log\int E(x)\,\mathrm d\mu(x) = -\beta^2\,E\,\frac{1}{Z^2}\int x_u\,E(x)\,\mathrm d\mu(x)\int x_v\,E(x)\,\mathrm d\mu(x)\,,$$
from which the conclusion readily follows.
Proof of Theorem 1.12.3. It follows from Lemma 1.12.6 that to differentiate in $\beta$ the quantity (1.387) we can pretend that $q_{u,v}$ and $\rho_{u,v}$ do not depend on $\beta$. To explain why this is the case in a situation allowing for simpler notation: it is a simple consequence of the chain rule,
$$\frac{\mathrm d}{\mathrm d\beta}F\big(\beta, p(\beta), q(\beta)\big) = \frac{\partial F}{\partial\beta} + \frac{\partial F}{\partial p}\,p'(\beta) + \frac{\partial F}{\partial q}\,q'(\beta)\,, \tag{1.392}$$
so that $\mathrm dF(\beta,p(\beta),q(\beta))/\mathrm d\beta = \partial F/\partial\beta$ when the last two partial derivatives in (1.392) are $0$. Thus it suffices to prove that
$$\frac{\partial}{\partial\beta}\,E\log\int E(x)\,\mathrm d\mu(x) = \beta\sum_{u,v}\big(\rho^2_{u,v} - q^2_{u,v}\big)\,. \tag{1.393}$$
Consider a jointly Gaussian family $(X_u)_{u\le d}$ such that $E\,X_uX_v = q_{u,v}$, and which, like $q_{u,v}$, we may pretend does not depend on $\beta$. We may choose these r.v.s so that $Y_u = \beta X_u$, and now
$$\frac{\partial}{\partial\beta}E(x) = \Big(\sum_{u\le d}x_u X_u + \beta\sum_{u,v\le d}x_u x_v\big(\rho_{u,v}-q_{u,v}\big)\Big)E(x)\,.$$
Using Gaussian integration by parts and (1.372) and (1.373) we then reach that
$$E\,\frac{\int\sum_{u\le d}x_uX_u\,E(x)\,\mathrm d\mu(x)}{\int E(x)\,\mathrm d\mu(x)} = \beta\sum_{u,v\le d}q_{u,v}\,E\,\frac{\int x_ux_v\,E(x)\,\mathrm d\mu(x)}{\int E(x)\,\mathrm d\mu(x)} - \beta\sum_{u,v\le d}q_{u,v}\,E\,\frac{\int x_u\,E(x)\,\mathrm d\mu(x)\int x_v\,E(x)\,\mathrm d\mu(x)}{\big(\int E(x)\,\mathrm d\mu(x)\big)^2} = \beta\sum_{u,v}\big(q_{u,v}\rho_{u,v} - q^2_{u,v}\big)\,,$$
while by (1.373) the contribution of the second term in $\partial E(x)/\partial\beta$ is $\beta\sum_{u,v}(\rho_{u,v}-q_{u,v})\rho_{u,v}$; adding the two contributions yields (1.393).
One should comment that the above method of taking the derivative in
β is rather similar in spirit to the method of (1.108); but unlike the proof of
(1.105) it does not use the “right path”, and as a penalty one would have to
work to get the correct rate of convergence K/N instead of obtaining it for
free.
Exercise 1.12.9. Write down a complete proof of Theorem 1.12.3 using in-
terpolation in the spirit of (1.108).
Research Problem 1.12.10. (Level 1) In this problem ν refers to the
Hamiltonian HN of (1.12). Consider a number λ and the following random
function on $\Sigma_N$:
$$\varphi(\sigma) = \frac{1}{N}\log\sum_{\tau}\exp\big(\lambda N R(\sigma,\tau) - H_N(\tau)\big)\,. \tag{1.394}$$
Develop the tools to be able to compute (when β is small enough) the quantity
ν(ϕ(σ)). Compute also ν(ϕ(σ)2 ).
The relationship with the material of the present section is that by Jensen's inequality we have
$$\langle\varphi(\sigma)\rangle \le \frac{1}{N}\log\sum_{\sigma,\tau}\exp\big(\lambda N R(\sigma,\tau) - H_N(\sigma) - H_N(\tau)\big) - \frac{1}{N}\log\sum_{\sigma}\exp\big(-H_N(\sigma)\big)\,,$$
and that the expected value of this quantity can be computed using (1.374) by a suitable choice of $\mu$ in a 2-component spin model.
A possible solution to Problem 1.12.10 involves developing the cavity
method in a slightly different setting than we have done so far. Carrying
out the details should be a very good exercise for the truly interested reader.
Research Problem 1.12.11. (Level 2). With the notation above, is it true
that at any temperature for large N one has
1.13 The Physicist's Replica Method
Physicists have discovered their results about the SK model using the “replica
method”, a method that has certainly contributed to arousing the interest of
mathematicians in spin glasses. In this section, we largely follow the paper
[81], where the authors attempt as far as possible to make the replica method
rigorous. We start with the following, where we consider only the case of non-
random external field.
Theorem 1.13.1. Consider an integer $n \ge 1$. Then
$$\lim_{N\to\infty}\frac{1}{N}\log E\,Z^n_N(\beta,h) = n\log 2 + \max_q\bigg(\frac{n\beta^2}{4}(1-q)^2 - \frac{n^2\beta^2}{4}\,q^2 + \log E\,\mathrm{ch}^n\big(\beta z\sqrt q + h\big)\bigg)\,, \tag{1.396}$$
where $z$ is a standard Gaussian r.v.
Now we have
$$E\Big(\sum_{i<j} g_{ij}\sum_{\ell\le n}\sigma^\ell_i\sigma^\ell_j\Big)^2 = \sum_{i<j}\Big(\sum_{\ell\le n}\sigma^\ell_i\sigma^\ell_j\Big)^2 = \sum_{\ell,\ell'}\sum_{i<j}\sigma^\ell_i\sigma^{\ell'}_i\sigma^\ell_j\sigma^{\ell'}_j = \sum_{\ell,\ell'}\frac{1}{2}\bigg(\Big(\sum_{i\le N}\sigma^\ell_i\sigma^{\ell'}_i\Big)^2 - N\bigg) = \frac{1}{2}\big(nN^2 - n^2N\big) + \sum_{1\le\ell<\ell'\le n}\big(\sigma^\ell\cdot\sigma^{\ell'}\big)^2\,. \tag{1.397}$$
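The identity (1.397) is purely algebraic and holds for every configuration of the spins; here is a quick numerical verification (our sketch, with randomly chosen signs):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 10
sigma = rng.choice([-1, 1], size=(n, N))  # sigma[l, i] = sigma_i^l

# Left-hand side: sum over pairs i < j of (sum_l sigma_i^l sigma_j^l)^2
lhs = 0
for i in range(N):
    for j in range(i + 1, N):
        lhs += np.sum(sigma[:, i] * sigma[:, j]) ** 2

# Right-hand side of (1.397)
overlaps = sigma @ sigma.T            # matrix of dot products sigma^l . sigma^l'
rhs = (n * N ** 2 - n ** 2 * N) / 2
for l in range(n):
    for lp in range(l + 1, n):
        rhs += overlaps[l, lp] ** 2

assert lhs == rhs
```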
Consider now independent Gaussian r.v.s $(g_{\ell,\ell'})_{1\le\ell<\ell'\le n}$ with $E\,g^2_{\ell,\ell'} = 1/N$. (Despite the similarity in notation these r.v.s play a very different rôle than the interaction r.v.s $(g_{ij})$.) It follows from (A.6) that
$$\sum_{\sigma}\exp\Big(\frac{\beta^2}{2N}\sum_{1\le\ell<\ell'\le n}\big(\sigma^\ell\cdot\sigma^{\ell'}\big)^2 + h\sum_{\ell\le n,\,i\le N}\sigma^\ell_i\Big) = E\sum_{\sigma}\exp\Big(\beta\sum_{1\le\ell<\ell'\le n} g_{\ell,\ell'}\,\sigma^\ell\cdot\sigma^{\ell'} + h\sum_{\ell\le n,\,i\le N}\sigma^\ell_i\Big)$$
$$= E\sum_{\sigma}\exp\Big(\beta\sum_{i\le N}\sum_{1\le\ell<\ell'\le n} g_{\ell,\ell'}\,\sigma^\ell_i\sigma^{\ell'}_i + h\sum_{\ell\le n,\,i\le N}\sigma^\ell_i\Big) = E\bigg(\sum_{\varepsilon_1,\dots,\varepsilon_n=\pm1}\exp\Big(\beta\sum_{1\le\ell<\ell'\le n} g_{\ell,\ell'}\,\varepsilon_\ell\varepsilon_{\ell'} + h\sum_{\ell\le n}\varepsilon_\ell\Big)\bigg)^{N} = E\exp N A(g)\,,$$
where
$$A(g) = \log\sum_{\varepsilon_1,\dots,\varepsilon_n=\pm1}\exp\Big(\beta\sum_{1\le\ell<\ell'\le n} g_{\ell,\ell'}\,\varepsilon_\ell\varepsilon_{\ell'} + h\sum_{\ell\le n}\varepsilon_\ell\Big)\,.$$
Now,
$$E\exp N A(g) = \Big(\frac{N}{2\pi}\Big)^{n(n-1)/4}\int\exp N\Big(A(g) - \frac{1}{2}\sum_{1\le\ell<\ell'\le n} g^2_{\ell,\ell'}\Big)\,\mathrm dg\,,$$
where we define
$$B(g) = A(g) - \frac{1}{2}\sum_{1\le\ell<\ell'\le n} g^2_{\ell,\ell'}\,.$$
We start the proof of Proposition 1.13.2. The proof is pretty but is un-
related to any other argument in this work. It occupies the next two and a
half pages. The fun argument starts again on page 145.
Lemma 1.13.4. Consider numbers a1 , a2 , g. Then
Then if $g \ne 0$ (so that $|v| = |\mathrm{sh}\,g| \ne 0$) there can be equality in (1.402) only if for some $\lambda$ we have
1.13 The Physicist’s Replica Method 143
this implies
$$B(g) \le \frac{1}{2}\big(B(g') + B(g'')\big)\,, \tag{1.405}$$
so that both $g'$ and $g''$ are maximizers. Moreover, since $g$ is a maximizer, we have $B(g) = B(g') = B(g'')$, so in fact
$$A(g) = \frac{1}{2}A(g') + \frac{1}{2}A(g'')\,. \tag{1.406}$$
Let us introduce the notation
$$\alpha = (\varepsilon_3,\dots,\varepsilon_n)\,;\qquad A_j(\alpha) = \beta\sum_{3\le\ell\le n} g_{j,\ell}\,\varepsilon_\ell + h \quad\text{for } j = 1,2\,;$$
$$w(\alpha) = \exp\Big(\beta\sum_{3\le\ell<\ell'\le n} g_{\ell,\ell'}\,\varepsilon_\ell\varepsilon_{\ell'} + h\sum_{3\le\ell\le n}\varepsilon_\ell\Big)\,,$$
where
$$B_j(\alpha) = \mathrm{ch}^2 A_j(\alpha)\,\mathrm{ch}\,\beta|g_{1,2}| + \mathrm{sh}^2 A_j(\alpha)\,\mathrm{sh}\,\beta|g_{1,2}|\,.$$
The Cauchy–Schwarz inequality implies
$$4\sum_\alpha w(\alpha)\,B_1(\alpha)^{1/2}B_2(\alpha)^{1/2} \le \Big(4\sum_\alpha w(\alpha)B_1(\alpha)\Big)^{1/2}\Big(4\sum_\alpha w(\alpha)B_2(\alpha)\Big)^{1/2} = \exp\Big(\frac{1}{2}A(g') + \frac{1}{2}A(g'')\Big)\,,$$
where the equality follows from the computation performed in the first three lines of (1.407). Combining with (1.407) proves (1.404).
In order to have (1.406) we must have equality in (1.407). Since each quantity $w(\alpha)$ is $> 0$, for each $\alpha$ we must have
If $g_{1,2} > 0$, Lemma 1.13.4 shows that $A_1(\alpha) = A_2(\alpha)$ for each $\alpha$, and thus $g_{1,\ell} = g_{2,\ell}$ for each $\ell \ge 3$. If $g_{1,2} < 0$, Lemma 1.13.4 shows that $A_1(\alpha) = -A_2(\alpha)$ for each $\alpha$, so that ($h = 0$ and) $g_{1,\ell} = -g_{2,\ell}$ for each $\ell \ge 3$.
Proof of Proposition 1.13.2. Consider a maximizer $g$. There is nothing to prove if $g = 0$, so we assume that this is not the case. In a first step we prove that $|g_{\ell,\ell'}|$ does not depend on $\ell,\ell'$. Assuming $g_{2,3} \ne 0$, we prove that $|g_{1,2}| = |g_{2,3}|$; this clearly suffices. By Lemma 1.13.5, $g'$ is a maximizer, and by definition $g'_{1,3} = g_{2,3} \ne 0$. Since $g'_{1,3} \ne 0$, and $g'$ is a maximizer, Lemma 1.13.5 shows that $|g'_{1,\ell}| = |g'_{3,\ell}|$ for $\ell \notin \{1,3\}$, and in particular $|g'_{1,2}| = |g'_{3,2}|$, i.e. $|g_{1,2}| = |g_{2,3}|$.
Next, consider a subset $I \subset \{1,\dots,n\}$ with the property that
If no such set exists, $g_{\ell,\ell'} < 0$ for each $\ell,\ell'$ and we are done. Otherwise consider $I$ as large as possible. Without loss of generality, assume that $I = \{1,\dots,m\}$, and note that $m \ge 1$. If $m = n$ we are done. Otherwise, consider first $\ell > m$. We observe by Lemma 1.13.5 that if $\ell_1 < \ell_2 \le m$ we have $g_{\ell_1,\ell} = g_{\ell_2,\ell}$, and since we have assumed that $I$ is as large as possible, we have $g_{\ell_1,\ell} < 0$. Next consider $\ell_1 \le m < \ell < \ell'$. Then, as we have just seen, both $g_{\ell_1,\ell}$ and $g_{\ell_1,\ell'}$ are $< 0$, so that Lemma 1.13.5 shows that $g_{\ell,\ell'} > 0$. Therefore, for a certain number $a \ge 0$ we have, for $\ell < \ell'$,
This proves b). To prove a) we observe that when $h > 0$ we have shown that in fact $g_{\ell,\ell'} \ge 0$ when $g$ is a maximizer.
$$q_n = \frac{E\,\big(\mathrm{ch}^n Y\,\mathrm{th}^2 Y\big)}{E\,\mathrm{ch}^n Y}\,, \tag{1.408}$$
where $Y = \beta z\sqrt{q_n} + h$.
Let us also observe that
$$\lim_{t\to 0^+}\frac{1}{t}\log E\,Z^t_N = E\log Z_N\,, \tag{1.409}$$
as follows from the fact that $Z^t_N \simeq 1 + t\log Z_N$ for small $t$.
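The assertion $Z^t \simeq 1 + t\log Z$ is just the expansion $Z^t = e^{t\log Z} = 1 + t\log Z + O(t^2)$, so that $(1/t)\log E\,Z^t = (1/t)\log\big(1 + t\,E\log Z + O(t^2)\big) \to E\log Z$. A deterministic sanity check with a toy two-point distribution for $Z$ (our choice, purely illustrative):

```python
import math

# Toy random variable: Z = a or Z = b, each with probability 1/2
a, b = 2.0, 8.0
E_log_Z = 0.5 * (math.log(a) + math.log(b))

def moment_rate(t):
    """(1/t) log E Z^t for the two-point distribution above."""
    return math.log(0.5 * (a ** t + b ** t)) / t

# As t -> 0+, the "annealed" rate approaches the "quenched" one
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, moment_rate(t) - E_log_Z)
```

By Jensen's inequality $\log E\,Z^t \ge t\,E\log Z$, so the difference printed above is always nonnegative, and it shrinks linearly in $t$.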
Now we take a deep breath. We pretend that Theorem 1.13.1 is true not only for $n$ integer, but for any number $n > 0$. We rewrite (1.409) as
$$\frac{1}{N}\,E\log Z_N = \lim_{n\to 0^+}\frac{1}{nN}\log E\,Z^n_N\,. \tag{1.410}$$
1.14 Notes and Comments
but the only part of the book I feel I understand is the introduction (on which
Section 1.1 relies heavily). Two later (possibly more accessible) books about
spin glasses written by physicists are [59] and [112]. The recent book by M.
Mézard and A. Montanari [102] is much more accessible to a mathematically
minded reader. It covers a wide range of topics, and remarkably succeeds at
conveying the breadth and depth of the physical ideas.
The first rigorous results on the SK model concern only the case h = 0.
They are proved by Aizenman, Lebowitz and Ruelle in [4] using a “cluster
expansion technique”, which is a common tool in physics. Their methods
seem to apply only to the case h = 0. At about the same time, Fröhlich and
Zegarlinski [61] prove (as a consequence of a more general approach that is
also based on a cluster expansion technique) that the spin correlations vanish
if $\beta \le \beta_0$, even if $h \ne 0$. In fact they prove that
$$E\big(\langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle\big)^2 \le \frac{L}{N}\,. \tag{1.412}$$
A later paper by Comets and Neveu [49] provides more elegant proofs
of several of the main results of [4] using stochastic calculus. Their method
unfortunately does not appear to extend beyond the case h = 0. They prove
a central limit theorem for the overlap R1,2 .
Theorem 1.3.4 is a special occurrence of the general phenomenon of con-
centration of measure. This phenomenon was first discovered by P. Lévy, and
its importance was brought to light largely through the efforts of V.D. Mil-
man [106]. It is arguably one of the truly great ideas of probability theory.
More references, and applications to probability theory can be found in [139]
and [140]. In the fundamental case of Gaussian measure, the optimal result
is already obtained in [86], and Theorem 1.3.4 is a weak consequence of this
result. Interestingly, it took almost 20 years after the paper [86] before re-
sults similar to (1.54) were obtained in the theory of disordered systems, by
Pastur and Shcherbina [120], using martingale difference sequences. A very
nice exposition of most of what is known about concentration of measure can
be found in the book of M. Ledoux [93].
It was not immediately understood that, while the case β < 1, h = 0
of the SK model is not very difficult, the case h ≠ 0 is an entirely different
matter. The first rigorous attempt at justifying the mysterious expression
in the right-hand side of (1.73) is apparently that of Pastur and Shcherbina
[120]. They prove that this formula holds in the domain where
but they do not prove that (1.413) is true for small β. Their proof required
them to add a strange perturbation term to the Hamiltonian. The result was
later clarified by Shcherbina [127], who used the Hamiltonian (1.61) with
hi Gaussian. Using arguments somewhat similar to those of the Ghirlanda-
Guerra identities (which we will study in Volume II), she proved that (1.413) holds.
She did not prove (1.414). She was apparently unaware that (1.414) is proved
in [61] for small β. Since the paper [127] was not published, I was not aware
of it and rediscovered its results in Section 4 of [141] with essentially the same
proof. I also gave a very simple proof of (1.412) for small β. Discovering this
simple proof was an absolute disaster, because I wasted considerable energy
trying to use the same principle in other situations, which invariably led to
difficult proofs of suboptimal results. I will not describe in detail the contents
of [141] or my other papers because this now does not seem so interesting
any more. I hope that the proofs presented here are much cleaner than those
of these previous papers.
In a later paper Shcherbina [128] proved that limN →∞ pN (β, h) = SK(β, h)
in a remarkably large region containing in particular all values β < 1. The
ideas of this paper are not really transparent to me. A later version [129]
is more accessible, but I became aware of its existence too late to have the
energy to analyze it. It would be interesting to decide if this approach suc-
ceeds because of a special trick, or if it contains the germ of a powerful
method. One should however point out that her use of relations similar to
the Ghirlanda-Guerra identities seems to preclude obtaining the correct rates
of convergence.
I proved in [149] an expansion somewhat similar to (1.151), using a more
complicated method that does not seem to extend to the model to be consid-
ered in Chapter 2. This paper proves weaker versions of many of the results
of Section 1.6 and Section 1.8 to Section 1.10. The existence of the limits of
quantities such as N k/2 E A , where A is the product of k terms of the type
R, is proved by a recursion method very similar to the one used here, but
the limit is not computed explicitly.
I do not know who first used the “smart path method”. The proof of
Proposition 1.3.3 is due to J.P. Kahane [87] and that of Theorem 1.3.4 is
due to G. Pisier [124]. I had known these papers since they appeared, but
it took a very, very long time to realize that it was the route to take in the
cavity method. The smart path method was first used in this context in [147],
and then systematically in [158]. Interestingly, Guerra and Toninelli arrived
independently at the very similar idea of interpolating between Hamiltonians
as in Section 1.3. Proposition 1.3.2 must have been known for a very long
time, at least as far back as [137].
The reader might wonder about the purpose of (1.152), since we nearly
always use (1.151) instead. One use is that, using symmetry between sites, we
can get a nice expression for $\nu_1(f)$. This idea will be used in Volume II. We do not use it here because, besides controlling the quantities $R_{1,2}$, it requires controlling $R_{1,2,3,4} = N^{-1}\sum_{i\le N}\sigma^1_i\sigma^2_i\sigma^3_i\sigma^4_i$. To give a specific example, if $f = R_{1,2} - q$, we get from (1.152) that
1.14 Notes and Comments 149
In this way we have fewer error terms to control in the course of proving the
central limit theorems presented here. The drawback is that one must prove
first that ν((R1,2,3,4 − q)2n ) ≤ K/N n (which is not very difficult).
Two months after the present Chapter was widely circulated at the time
of [157] (in a version that already contained the central limit theorems of
Section 1.10), the paper [74] came out, offering very similar results, together
with a CLT for N −1 log ZN (β, h), of which Theorem 1.4.11 is a quantitative
improvement.
I am grateful to M. Mézard for having explained to me the idea of coupling
two copies of the SK model, and the discontinuity this should produce beyond
the A-T line. This led to Theorem 1.9.6.
Guerra’s bound of (1.73) is proved in [71] where Proposition 1.3.8 can
also be found. (This lemma was also proved independently by R. Latala in
an unpublished paper.)
The present work should make self-apparent the amount of energy already
spent in trying to reach a mathematical understanding of mean field models
related to spin glasses. It is unfortunate that some of the most precise results
about the SK model rely on very specific properties of this model. However
fascinating, the SK model is a rather specific object, and as such its impor-
tance can be questioned. I feel that the appeal of the “theory” of spin glasses
does not lie in any particular model, but rather in the apparent generality
of the phenomenon it predicts. About this, we still understand very little,
despite all the examples that will be given in forthcoming chapters.
2. The Perceptron Model
2.1 Introduction
The name of this chapter comes from the theory of neural networks. An ac-
cessible introduction to neural networks is provided in [83], but what these
are is not relevant to our purpose, which is to study the underlying mathematics. Roughly speaking, the basic problem is as follows. What “proportion” of $\Sigma_N = \{-1,1\}^N$ is left when one intersects this set with many random half-spaces? A natural definition for a random half-space is a set $\{x\in\mathbb R^N\,;\ x\cdot v\ge 0\}$ where the random vector $v$ is uniform over the unit sphere of $\mathbb R^N$. More conveniently one can consider the set $\{x\in\mathbb R^N\,;\ x\cdot g\ge 0\}$, where $g$ is a standard Gaussian vector, i.e. $g = (g_i)_{i\le N}$, where the $g_i$ are independent standard Gaussian r.v.s. This is equivalent because the vector $g/\|g\|$ is uniformly distributed on the unit sphere of $\mathbb R^N$. Consider now $M$ such Gaussian vectors $g_k = (g_{i,k})_{i\le N}$, $k\le M$, all independent, and the half-spaces
$$U_k = \{x\,;\ x\cdot g_k \ge 0\} = \Big\{x\,;\ \sum_{i\le N} g_{i,k}\,x_i \ge 0\Big\}\,.$$
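For tiny N the “proportion” in question can be computed by exhaustive enumeration. The following sketch (ours, purely illustrative and exponentially slow in N; the seed and sizes are arbitrary) counts the points of $\Sigma_N$ surviving $M$ random Gaussian half-spaces:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, M = 10, 5

g = rng.standard_normal((M, N))   # row k is the Gaussian vector g_k defining U_k
cube = np.array(list(itertools.product([-1, 1], repeat=N)))  # Sigma_N: 2^N points

# x belongs to every U_k iff x . g_k >= 0 for all k <= M
survives = np.all(cube @ g.T >= 0, axis=1)
print(survives.sum(), "of", len(cube), "points survive; proportion =",
      survives.mean())
# for each fixed x, P(x in U_k) = 1/2 independently over k,
# so the expected proportion is 2^(-M)
```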
The situation with $\Sigma_N$ is called the binary perceptron, while the situation with $S_N$ is usually called the
spherical perceptron. The spherical perceptron will motivate the next chapter.
We will return to both the binary and the spherical perceptron in Volume II,
in Chapter 8 and Chapter 9 respectively. Both the spherical and the binary
perceptron admit another popular version, where the Gaussian r.v.s $g_{i,k}$ are
replaced by independent Bernoulli r.v.s (i.e. independent random signs), and
we will also study these. Thus we will eventually investigate a total of four
related but different models. It is not very difficult to replace the Gaussian
r.v.s by random signs; but it is very much harder to study the case of ΣN
than the case of the sphere.
Research Problem 2.1.1. (Level 3!) Prove that there exists a number $\alpha^*$ and a function $\varphi : [0,\alpha^*)\to\mathbb R$ with the following properties:
1 - If $\alpha > \alpha^*$, then as $N\to\infty$ and $M/N\to\alpha$ the probability that the set (2.1) is not empty is at most $\exp(-N/K(\alpha))$.
2 - If $\alpha < \alpha^*$, $N\to\infty$ and $M/N\to\alpha$, then
$$\frac{1}{N}\log\mathrm{card}\Big(\Sigma_N\cap\bigcap_{k\le M}U_k\Big) \to \varphi(\alpha)\,. \tag{2.3}$$
where u is a function, and where (gi,k ) are independent standard normal r.v.s.
Of course the Hamiltonian depends on u, but the dependence is kept implicit.
The role of the factor $N^{-1/2}$ is to make the quantity $N^{-1/2}\sum_{i\le N}g_{i,k}\sigma_i$ typically of order 1. There is no parameter $\beta$ in the right-hand side of (2.6),
since this parameter can be thought of as being included in the function u.
Since it is difficult to prove anything at all without using integration
by parts we will always assume that u is differentiable. But if we want the
Hamiltonian (2.6) to be a fair approximation of the Hamiltonian (2.5), we will
have to accept that u takes very large values. Then, in the formulas where
u occurs, we will have to show that somehow these large values cancel out.
There is no magic way to do this; one has to work hard and prove delicate
estimates (as we will do in Chapter 8). Another source of difficulty is that we
want to approximate the Hamiltonian (2.5) for large values of β. That makes
it difficult to bound from below a number of quantities that occur naturally
as denominators in our computations.
On the other hand, there is a kind of beautiful “algebraic” structure
connected to the Hamiltonian (2.6), which is uncorrelated to the analytical
problems described above. We feel that it is appropriate, in a first stage,
to bring this structure forward, and to set aside the analytical problems (to
which we will return later). Thus, in this chapter we will assume a very strong
condition on u, namely that for a certain constant D we have
$$\forall\,\ell,\ 0\le\ell\le 3\,,\quad |u^{(\ell)}| \le D\,. \tag{2.7}$$
Given values of N and M we will try to “describe the system generated by the
Hamiltonian (2.6)” within error terms that become small for N large. We will
be able to do this when the ratio α = M/N is small enough, α ≤ α(D). The
notation α = M/N will be used through this chapter and until Chapter 4.
Let us now try to give an overview of what will happen, without getting into details. We recall the notation $R_{\ell,\ell'} = N^{-1}\sum_{i\le N}\sigma^\ell_i\sigma^{\ell'}_i$. As is the case for the SK model, we expect that in the high-temperature regime we have
$$R_{1,2} \simeq q \tag{2.8}$$
After one works some length of time with the system, one gets the irresistible feeling that (in the high-temperature regime) “the quantities $S_k$ behave like individual spins”, and (2.8) has to be complemented by the relation
$$\frac{1}{N}\sum_{k\le M} u'(S^1_k)\,u'(S^2_k) \simeq r \tag{2.10}$$
where $r$ is another number attached to the system. Probably the reader would expect a normalization factor $M$ rather than $N$ in (2.10), but since $M/N\to\alpha > 0$, so that $M$ and $N$ are of the same order, this is really the same. Also, the occurrence of $u'$ will soon become clear.
We will use the cavity method twice. In Section 2.2 we “remove one spin”
as in Chapter 1. This lets us guess what is the correct expression of q as
a function of r. In Section 2.3, we then use the “cavity in M ”, comparing
the system with the similar system where M has been replaced by M − 1.
This lets us guess what the expression of r should be as a function of q. The
two relations between r and q that are obtained in this manner are called
the “replica-symmetric equations” in physics. We prove in Section 2.4 that
these equations do have a solution, and that (2.8) and (2.10) hold for these
values of q and r. For N large and M/N small, we will then (approximately)
compute the value of
$$p_{N,M}(u) = \frac{1}{N}\,E\log\sum_{\sigma}\exp\big(-H_{N,M}(\sigma)\big)\,. \tag{2.11}$$
$\sigma_N Y$, where $Y$ is a Gaussian r.v. independent of all the other r.v.s. (Of course at some point we will have to guess what is the right choice for $r = E Y^2$, but the time will come when this guess will be obvious.) Thus we expect that
$$\sum_{k\le M} u(S_k) \simeq \sum_{k\le M} u(S^0_k) + \sigma_N Y + \text{constant}\,. \tag{2.13}$$
Rather than using power expansions (which are impractical when we do not
have a good control on higher derivatives) it is more fruitful to find a suitable
interpolation between the left and the right-hand sides of (2.13). The first idea
that comes to mind is to use the Hamiltonian
$$\sum_{k\le M} u\Big(S^0_k + \sqrt{\tfrac{t}{N}}\,g_{N,k}\sigma_N\Big) + \sigma_N\sqrt{1-t}\,Y\,. \tag{2.14}$$
This is effective and was used in [157]. However, the variance of the Gaussian r.v. $S^0_k + \sqrt{t/N}\,g_{N,k}\sigma_N$ depends on $t$; when differentiating, this creates terms that we will avoid by being more clever. Let us consider the quantity
$$S_{k,t} = S_{k,t}(\sigma,\xi_k) = S^0_k + \sqrt{\tfrac{t}{N}}\,g_{N,k}\sigma_N + \sqrt{\tfrac{1-t}{N}}\,\xi_k = \frac{1}{\sqrt N}\sum_{i<N} g_{i,k}\sigma_i + \sqrt{\tfrac{t}{N}}\,g_{N,k}\sigma_N + \sqrt{\tfrac{1-t}{N}}\,\xi_k\,. \tag{2.15}$$
In this expression, we should think of (ξk )k≤M not just as random constants
ensuring that the variance of Sk,t is constant but also as “new spins”. That
is, let ξ = (ξk )k≤M ∈ RM , and consider the Hamiltonian
√
− HN,M,t (σ, ξ) = u(Sk,t ) + σN 1 − tY . (2.16)
k≤M
1
f t = ··· f (σ 1 , . . . , σ n , ξ 1 , . . . , ξ n )
Zn
σ 1 ,...,σ n
× exp − HN,M,t dγ(ξ 1 ) · · · dγ(ξ n ) ,
(2.17)
≤n
In these formulas, ξ = (ξk )k≤M , ξk are independent Gaussian r.v.s. One
should think of ξ as being a “replica” of ξ. In this setting, replicas are
simply independent copies.
Exercise 2.2.1. Prove that when $f$ depends on $\sigma^1,\dots,\sigma^n$, but not on $\xi^1,\dots,\xi^n$, then $\langle f\rangle_t$ in (2.19) is exactly the average of $f$ with respect to the Hamiltonian
$$-H = \sum_{k\le M} u_t\Big(\frac{1}{\sqrt N}\sum_{i\le N-1} g_{i,k}\sigma_i + \sqrt{\tfrac{t}{N}}\,g_{N,k}\sigma_N\Big) + \sigma_N\sqrt{1-t}\,Y\,,$$
where $u_t$ is defined by
$$\exp u_t(x) = E\exp u\Big(x + \sqrt{\tfrac{1-t}{N}}\,\xi\Big)\,, \tag{2.20}$$
for $\xi$ a standard normal r.v.
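For a concrete feel for the definition (2.20), take the toy choice $u(x) = -x^2/2$ (ours, not the book's), for which the Gaussian smoothing can be done in closed form: $E\exp\big(-(x+a\xi)^2/2\big) = (1+a^2)^{-1/2}\exp\big(-x^2/2(1+a^2)\big)$, with $a^2 = (1-t)/N$. A numerical check of this formula:

```python
import numpy as np

def exp_ut_quadrature(x, a):
    """E exp u(x + a*xi) for u(y) = -y^2/2, xi standard Gaussian (Gauss-Hermite)."""
    nodes, weights = np.polynomial.hermite.hermgauss(60)
    xi = np.sqrt(2.0) * nodes
    return float(np.sum(weights * np.exp(-(x + a * xi) ** 2 / 2)) / np.sqrt(np.pi))

def exp_ut_exact(x, a):
    """Closed form of E exp(-(x + a*xi)^2 / 2)."""
    return float(np.exp(-x ** 2 / (2 * (1 + a ** 2))) / np.sqrt(1 + a ** 2))

a = np.sqrt(0.7 / 100)   # t = 0.3, N = 100, so a^2 = (1 - t)/N
for x in (-1.5, 0.0, 0.8):
    assert abs(exp_ut_quadrature(x, a) - exp_ut_exact(x, a)) < 1e-8
```

Note how small $a$ is for large $N$: the smoothing $u \mapsto u_t$ is a tiny perturbation, which is why it is harmless to work with $u_t$ in place of $u$.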
The reader might wonder whether it is really worth the effort to introduce
this present setting simply in order to avoid an extra term in Proposition 2.2.3
below, a term with which it is not so difficult to deal anyway. The point is
that the mechanism of “introducing new spins” is fundamental and must be
used in Section 2.3, so we might as well learn it now.
Consistently with our notation, if $f$ is a function on $\Sigma_N^n\times\mathbb R^{Mn}$, we define
$$\nu_t(f) = E\langle f\rangle_t\,;\qquad \nu_t'(f) = \frac{\mathrm d}{\mathrm dt}\nu_t(f)\,. \tag{2.21}$$
2.2 The Smart Path 157
$$\nu_t'(f) = \mathrm I + \mathrm{II}\,, \tag{2.23}$$
where
$$\mathrm I = \alpha\bigg(\sum_{1\le\ell<\ell'\le n}\nu_t\big(\varepsilon_\ell\varepsilon_{\ell'}\,u'(S^\ell_{M,t})u'(S^{\ell'}_{M,t})f\big) - n\sum_{\ell\le n}\nu_t\big(\varepsilon_\ell\varepsilon_{n+1}\,u'(S^\ell_{M,t})u'(S^{n+1}_{M,t})f\big) + \frac{n(n+1)}{2}\,\nu_t\big(\varepsilon_{n+1}\varepsilon_{n+2}\,u'(S^{n+1}_{M,t})u'(S^{n+2}_{M,t})f\big)\bigg) \tag{2.24}$$
and
$$\mathrm{II} = -r\bigg(\sum_{1\le\ell<\ell'\le n}\nu_t(\varepsilon_\ell\varepsilon_{\ell'}f) - n\sum_{\ell\le n}\nu_t(\varepsilon_\ell\varepsilon_{n+1}f) + \frac{n(n+1)}{2}\,\nu_t(\varepsilon_{n+1}\varepsilon_{n+2}f)\bigg)\,. \tag{2.25}$$
This formula is not as complicated as one might think at first. In particular one should observe that by symmetry, and since $\alpha = M/N$, in the expression for I we can replace the term $\alpha\,u'(S^\ell_{M,t})u'(S^{\ell'}_{M,t})$ by
$$\frac{1}{N}\sum_{k\le M} u'(S^\ell_{k,t})\,u'(S^{\ell'}_{k,t})\,,$$
so that if (2.10) is indeed correct, the terms I and II should nearly cancel each other out.
Proof. We could make this computation appear as a consequence of (1.40), but for the rest of the book we will change policy and proceed directly, i.e. we write the value of the derivative and we integrate by parts. It is immediate from (2.19) that
$$\frac{\mathrm d}{\mathrm dt}\langle f\rangle_t = \sum_{\ell\le n}\Big\langle\frac{\mathrm d}{\mathrm dt}\big(-H^\ell_{N,M,t}\big)f\Big\rangle_t - n\,\Big\langle\frac{\mathrm d}{\mathrm dt}\big(-H^{n+1}_{N,M,t}\big)f\Big\rangle_t\,, \tag{2.26}$$
and from (2.15) and (2.16) that
$$\frac{\mathrm d}{\mathrm dt}\big(-H_{N,M,t}\big) = \frac{1}{2\sqrt N}\sum_{k\le M}\Big(\frac{g_{N,k}\,\varepsilon}{\sqrt t} - \frac{\xi_k}{\sqrt{1-t}}\Big)u'(S_{k,t}) - \frac{\varepsilon\,Y}{2\sqrt{1-t}}\,, \tag{2.27}$$
where $\varepsilon = \sigma_N$.
Thus
$$\nu_t'(f) = \mathrm{III} + \mathrm{IV} + \mathrm V\,,$$
where, using symmetry among the values of $k\le M$ (so that the sum over $k$ contributes $M$ times the term $k = M$) and writing $g_M = g_{N,M}$,
$$\mathrm{III} = \frac{\alpha}{2}\sqrt{\frac{N}{t}}\bigg(\sum_{\ell\le n}\nu_t\big(g_M\,\varepsilon_\ell\,u'(S^\ell_{M,t})f\big) - n\,\nu_t\big(g_M\,\varepsilon_{n+1}\,u'(S^{n+1}_{M,t})f\big)\bigg)\,, \tag{2.28}$$
$$\mathrm{IV} = -\frac{\alpha}{2}\sqrt{\frac{N}{1-t}}\bigg(\sum_{\ell\le n}\nu_t\big(\xi^\ell_M\,u'(S^\ell_{M,t})f\big) - n\,\nu_t\big(\xi^{n+1}_M\,u'(S^{n+1}_{M,t})f\big)\bigg)\,,$$
$$\mathrm V = -\frac{1}{2\sqrt{1-t}}\bigg(\sum_{\ell\le n}\nu_t(\varepsilon_\ell\,Yf) - n\,\nu_t(\varepsilon_{n+1}Yf)\bigg)\,.$$
It remains to integrate by parts in these formulas to get the result. The easiest case is that of the term IV, because “different replicas use independent copies of $\xi$”. We write the explicit formula for $\langle\xi^\ell_M u'(S^\ell_{M,t})f\rangle_t$, that is
$$\langle\xi^\ell_M u'(S^\ell_{M,t})f\rangle_t = \frac{1}{Z^n}\,E_\xi\sum_{\sigma^1,\dots,\sigma^n}\xi^\ell_M\,u'(S^\ell_{M,t})\,f(\sigma^1,\dots,\sigma^n)\exp\sum_{\ell'\le n}\big(-H^{\ell'}_{N,M,t}\big)\,,$$
and we see that we only have to integrate by parts in the numerator. The dependence on $\xi^\ell_M$ is through $u'(S^\ell_{M,t})$ and through the term $u(S^\ell_{M,t})$ in the Hamiltonian, and moreover
$$\frac{\partial S^\ell_{M,t}}{\partial\xi^\ell_M} = \sqrt{\frac{1-t}{N}}\,, \tag{2.29}$$
so that
$$\langle\xi^\ell_M u'(S^\ell_{M,t})f\rangle_t = \sqrt{\frac{1-t}{N}}\,\big\langle\big(u''(S^\ell_{M,t}) + u'^2(S^\ell_{M,t})\big)f\big\rangle_t\,,$$
and therefore
$$\mathrm{IV} = -\frac{\alpha}{2}\bigg(\sum_{\ell\le n}\nu_t\big(\big(u''(S^\ell_{M,t}) + u'^2(S^\ell_{M,t})\big)f\big) - n\,\nu_t\big(\big(u''(S^{n+1}_{M,t}) + u'^2(S^{n+1}_{M,t})\big)f\big)\bigg)\,.$$
The second easiest case is that of V, because we have done the same computation (implicitly at least) in Chapter 1; since $E Y^2 = r$, we have $\mathrm V = \mathrm{II}$. Of course, the reader who does not find this formula obvious should simply write
$$\nu_t(\varepsilon_\ell\,Yf) = E\,Y\langle\varepsilon_\ell f\rangle_t\,,$$
and carry out the integration by parts, writing the explicit formula for $\langle\varepsilon_\ell f\rangle_t$. To compute the term III, there is no miracle. We write
$$\nu_t\big(g_M\,\varepsilon_\ell\,u'(S^\ell_{M,t})f\big) = E\,g_M\big\langle\varepsilon_\ell\,u'(S^\ell_{M,t})f\big\rangle_t\,,$$
and we use the integration by parts formula $E\big(g_M F(g_M)\big) = E\,F'(g_M)$ when seeing $\langle\varepsilon_\ell u'(S^\ell_{M,t})f\rangle_t$ as a function of $g_M$. The dependence on $g_M$ is through the quantities $S^{\ell'}_{M,t}$, and
$$\frac{\partial S^{\ell'}_{M,t}}{\partial g_M} = \sqrt{\frac{t}{N}}\,\varepsilon_{\ell'}\,.$$
Writing the (cumbersome) explicit formula for $\langle\varepsilon_\ell u'(S^\ell_{M,t})f\rangle_t$, we get that
$$\frac{\partial}{\partial g_M}\langle\varepsilon_\ell\,u'(S^\ell_{M,t})f\rangle_t = \sqrt{\frac{t}{N}}\bigg(\langle u''(S^\ell_{M,t})f\rangle_t + \sum_{\ell'\le n}\langle\varepsilon_\ell\varepsilon_{\ell'}\,u'(S^\ell_{M,t})u'(S^{\ell'}_{M,t})f\rangle_t - n\,\langle\varepsilon_\ell\varepsilon_{n+1}\,u'(S^\ell_{M,t})u'(S^{n+1}_{M,t})f\rangle_t\bigg)\,.$$
The first term arises from the dependence of the factor $u'(S^\ell_{M,t})$ on $g_M$ and the other terms from the dependence of the Hamiltonian on $g_M$. Consequently we obtain
$$\nu_t\big(g_M\,\varepsilon_\ell\,u'(S^\ell_{M,t})f\big) = \sqrt{\frac{t}{N}}\bigg(\nu_t\big(u''(S^\ell_{M,t})f\big) + \sum_{\ell'\le n}\nu_t\big(\varepsilon_\ell\varepsilon_{\ell'}\,u'(S^\ell_{M,t})u'(S^{\ell'}_{M,t})f\big) - n\,\nu_t\big(\varepsilon_\ell\varepsilon_{n+1}\,u'(S^\ell_{M,t})u'(S^{n+1}_{M,t})f\big)\bigg)\,.$$
Similarly we have
$$\frac{\partial}{\partial g_M}\langle\varepsilon_{n+1}\,u'(S^{n+1}_{M,t})f\rangle_t = \sqrt{\frac{t}{N}}\bigg(\langle u''(S^{n+1}_{M,t})f\rangle_t + \sum_{\ell\le n+1}\langle\varepsilon_\ell\varepsilon_{n+1}\,u'(S^\ell_{M,t})u'(S^{n+1}_{M,t})f\rangle_t - (n+1)\,\langle\varepsilon_{n+1}\varepsilon_{n+2}\,u'(S^{n+1}_{M,t})u'(S^{n+2}_{M,t})f\rangle_t\bigg)\,,$$
and consequently
$$\nu_t\big(g_M\,\varepsilon_{n+1}\,u'(S^{n+1}_{M,t})f\big) = \sqrt{\frac{t}{N}}\bigg(\nu_t\big(u''(S^{n+1}_{M,t})f\big) + \sum_{\ell\le n+1}\nu_t\big(\varepsilon_\ell\varepsilon_{n+1}\,u'(S^\ell_{M,t})u'(S^{n+1}_{M,t})f\big) - (n+1)\,\nu_t\big(\varepsilon_{n+1}\varepsilon_{n+2}\,u'(S^{n+1}_{M,t})u'(S^{n+2}_{M,t})f\big)\bigg)\,.$$
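The Gaussian integration by parts formula $E\big(gF(g)\big) = E\,F'(g)$ used repeatedly above can be illustrated numerically with a toy $F$ (our choice, purely for illustration):

```python
import math
import numpy as np

def gaussian_expectation(f, n_nodes=80):
    """E f(g) for g standard Gaussian, via Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    return float(np.sum(w * f(np.sqrt(2.0) * x)) / np.sqrt(np.pi))

# Toy example: F(g) = sin(g), so F'(g) = cos(g), and both sides equal e^{-1/2}
lhs = gaussian_expectation(lambda g: g * np.sin(g))   # E g F(g)
rhs = gaussian_expectation(np.cos)                    # E F'(g)

assert abs(lhs - rhs) < 1e-10
assert abs(rhs - math.exp(-0.5)) < 1e-10
```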
Exercise 2.2.4. Suppose that we had not been as sleek as we were, and that instead of (2.15) and (2.22) we had defined
$$S_{k,t} = S_{k,t}(\sigma) = S^0_k + \sqrt{\tfrac{t}{N}}\,g_{N,k}\sigma_N = \frac{1}{\sqrt N}\sum_{i<N} g_{i,k}\sigma_i + \sqrt{\tfrac{t}{N}}\,g_{N,k}\sigma_N\,.$$
Prove that then in the formula (2.23) we would get the extra term
$$\mathrm{VI} = \frac{\alpha}{2}\bigg(\sum_{\ell\le n}\nu_t\big(\big(u'(S^\ell_{M,t})^2 + u''(S^\ell_{M,t})\big)f\big) - n\,\nu_t\big(\big(u'(S^{n+1}_{M,t})^2 + u''(S^{n+1}_{M,t})\big)f\big)\bigg)\,.$$
2.3 Cavity in M
To pursue the idea that the terms I and II in (2.23) should nearly cancel each other out, the first thing to do is to try to make sense of the term I, and to understand the influence of the quantities $u'(S^\ell_{M,t})$. The quantities $S^\ell_{M,t}$ also occur in the Hamiltonian, and we should make this dependence explicit. For this we introduce a new Hamiltonian
$$-H_{N,M-1,t}(\sigma,\xi) = \sum_{k\le M-1} u\big(S_{k,t}(\sigma,\xi_k)\big) + \sigma_N\sqrt{1-t}\,Y\,. \tag{2.30}$$
Our best guess now is that the quantities $S^\ell_{M,t}$, when seen as functions of the system with Hamiltonian (2.30), will have a jointly Gaussian behavior under Gibbs' measure, with pairwise correlation $q$, allowing us to approximately compute the right-hand side of (2.32) in Proposition 2.3.5 below. This again will be shown by interpolation. Let us consider a new parameter $0\le q\le 1$ and standard Gaussian r.v.s $(\xi^\ell)$ and $z$ that are independent of all the other r.v.s already considered. (The reader will not confuse the r.v.s $\xi^\ell$ with the r.v.s $\xi^\ell_M$.) Let us set
$$\theta^\ell = z\sqrt q + \xi^\ell\sqrt{1-q}\,. \tag{2.33}$$
Thus these r.v.s share the common randomness $z$ and are independent given that randomness. For $0\le v\le 1$ we define
$$S^\ell_v = \sqrt v\,S^\ell_{M,t} + \sqrt{1-v}\,\theta^\ell\,, \tag{2.34}$$
so that in fact νt,v = E · t,v . Let us observe that the r.v. θ depends also
on z, but this r.v. is not considered as a “new spin”, but rather as “new
randomness”.
The present idea of considering $\xi^\ell$ as a new spin is essential. As we mentioned on page 156, the idea of considering $\xi_1,\ldots,\xi_M$ as new spins was not essential, but since it is the same idea, we decided to make the minimal extra effort to use the setting of (2.19).
First, we reveal the magic of the computation of νt,0 .
and
$$\nu_{t,0}\big(u'(S^1_0)u'(S^2_0)f\big)=\hat r\,\mathsf E\langle f\rangle_{t,\sim}\,. \tag{2.38}$$
This follows from the formula (2.31). The quantities $\theta^\ell$ do not depend on the spins σ, and their randomness "in the variables labeled ξ" is independent of the randomness of the other terms. Now, independence implies
$$\mathsf E_\xi\exp\sum_{\ell\le n}u(\theta^\ell)=\big(\mathsf E_\xi\exp u(\theta)\big)^n\,.$$
Moreover $\langle\exp u(\theta)\rangle_{t,\sim}=\mathsf E_\xi\exp u(\theta)$, as (an obvious) special case of (2.39). This proves (2.37).
To prove (2.38), proceeding in a similar manner and using now that
$$\mathsf E_\xi\,u'(\theta^1)u'(\theta^2)\exp\sum_{\ell\le n}u(\theta^\ell)=\big(\mathsf E_\xi\,u'(\theta)\exp u(\theta)\big)^2\big(\mathsf E_\xi\exp u(\theta)\big)^{n-2}\,,$$
we get
$$\nu_{t,0}\big(u'(S^1_0)u'(S^2_0)f\big)=\mathsf E\,\frac{\big\langle f\,u'(\theta^1)u'(\theta^2)\exp\sum_{\ell\le n}u(\theta^\ell)\big\rangle_{t,\sim}}{\big\langle\exp u(\theta)\big\rangle^n_{t,\sim}}=\hat r\,\mathsf E\langle f\rangle_{t,\sim}\,,$$
Lemma 2.3.2. Consider a function f on $\Sigma^n_N$. This function may depend on the variables $\xi^\ell_k$ for $k<M$ and $\ell\le n$, but it does not depend on the randomness of the variables $z$, $g_{i,M}$, $\xi^\ell_M$ or $\xi^\ell$. Then if $B_v\equiv1$ or $B_v=u'(S^1_v)u'(S^2_v)$, whenever $1/\tau_1+1/\tau_2=1$ we have
$$\Big|\frac d{dv}\nu_{t,v}(B_vf)\Big|\le K(n,D)\Big(\nu_{t,v}(|f|^{\tau_1})^{1/\tau_1}\,\nu_{t,v}\big(|R_{1,2}-q|^{\tau_2}\big)^{1/\tau_2}+\frac1N\,\nu_{t,v}(|f|)\Big)\,. \tag{2.40}$$
Here K(n,D) depends on n and D only.
Therefore the left-hand side is small if we can find q such that $R_{1,2}\simeq q$. The reason why we write a derivative on the left-hand side rather than a partial derivative is that when considering $\nu_{t,v}$ we always think of t as fixed.
Proof. The core of the proof is to compute $d(\nu_{t,v}(B_vf))/dv$ by differentiation and integration by parts, after which the bound (2.40) basically follows from Hölder's inequality. It turns out that if one looks at things the right way, there is a relatively simple expression for $d(\nu_{t,v}(B_vf))/dv$. We will not reveal this magic formula now. Our immediate concern is to explain in great detail the mechanism of integration by parts, which will occur again and again, and for this we decided to use a completely pedestrian approach, writing only absolutely explicit formulas.
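The Gaussian integration by parts formula (A.17), $\mathsf E(gF(g))=\mathsf E(F'(g))$ for a standard Gaussian g, which drives all the computations below, is easy to check numerically. The following Monte Carlo sketch is only an illustration; the test function F(x) = x³ is an arbitrary smooth choice of ours, not one from the text.

```python
import numpy as np

# Check E[g*F(g)] == E[F'(g)] for a standard Gaussian g,
# with the smooth test function F(x) = x**3 (so F'(x) = 3*x**2).
rng = np.random.default_rng(0)
g = rng.standard_normal(2_000_000)

lhs = np.mean(g * g**3)   # E[g*F(g)] = E[g**4] = 3 exactly
rhs = np.mean(3 * g**2)   # E[F'(g)]  = 3*E[g**2] = 3 exactly

print(lhs, rhs)  # both close to 3
```

The same identity, applied coordinate by coordinate to the Gaussian family constructed below, is what produces the overlap factors $(R^t_{\ell,\ell'}-q)$ in (2.53).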
First, we compute $d(\nu_{t,v}(B_vf))/dv$ by straightforward differentiation of the formula (2.35). In the case where $B_v=u'(S^1_v)u'(S^2_v)$, setting
$$S^{\prime\,\ell}_v=\frac1{2\sqrt v}\,S^\ell_{M,t}-\frac1{2\sqrt{1-v}}\,\theta^\ell\,,$$
we find
$$\frac d{dv}\big(\nu_{t,v}(B_vf)\big)=\nu_{t,v}\big(f\,S^{\prime\,1}_vu''(S^1_v)u'(S^2_v)\big)+\nu_{t,v}\big(f\,S^{\prime\,2}_vu'(S^1_v)u''(S^2_v)\big)+\sum_{\ell\le n}\nu_{t,v}\big(f\,S^{\prime\,\ell}_vu'(S^\ell_v)u'(S^1_v)u'(S^2_v)\big)-(n+1)\,\nu_{t,v}\big(f\,S^{\prime\,n+1}_vu'(S^{n+1}_v)u'(S^1_v)u'(S^2_v)\big)\,. \tag{2.41}$$
Of course the first term occurs because of the factor $u'(S^1_v)$ in $B_v$, the second term because of the factor $u'(S^2_v)$, and the other terms because of the dependence of the Hamiltonian on v. The rest of the proof consists in integrating by parts. In some sense it is a straightforward application of the Gaussian integration by parts formula (A.17). However, since we are dealing with complicated expressions, it will take several pages to fill in all the details. The notation is complicated, and this obscures the basic simplicity of the argument. Probably the ambitious reader should try to compute everything on her own in a simple case, and look at our presentation only if she gets stuck.
Even though we have written the previous formula in a compact form
using νt,v , to integrate by parts we have to spell out the dependence of the
Hamiltonian on the variables $S^\ell_v$ by using the formula (2.35). For example, the first term on the right-hand side of (2.41) is
$$\mathsf E\,\frac{\big\langle f\,S^{\prime\,1}_vu''(S^1_v)u'(S^2_v)\exp\sum_{\ell\le n}u(S^\ell_v)\big\rangle_{t,\sim}}{\big\langle\exp u(S^1_v)\big\rangle^n_{t,\sim}}\,. \tag{2.42}$$
To keep the formulas manageable, let us write
$$w=w(\sigma^1,\ldots,\sigma^n,\xi^1,\ldots,\xi^n)=\exp\Big(-\sum_{\ell\le n}H_{N,M-1,t}(\sigma^\ell,\xi^\ell)\Big)$$
$$w^*=w^*(\sigma,\xi)=\exp\big(-H_{N,M-1,t}(\sigma,\xi)\big)\,,$$
and where
$$C=f\,u''(S^1_v)u'(S^2_v)\exp\sum_{\ell\le n}u(S^\ell_v)\,.$$
Let us now make an observation that will be used many times. The r.v. Z is independent of all the r.v.s labeled ξ, so that
$$\mathsf E_\xi\,\frac{\sum_{\sigma^1,\ldots,\sigma^n}w\,S^{\prime\,1}_vC}{Z^n}=\frac{\mathsf E_\xi\sum_{\sigma^1,\ldots,\sigma^n}w\,S^{\prime\,1}_vC}{Z^n}\,,$$
and thus the quantity (2.43) is then equal to
$$\mathsf E\,\mathsf E_\xi\sum_{\sigma^1,\ldots,\sigma^n}w\,S^{\prime\,1}_v\,\frac C{Z^n}=\mathsf E\sum_{\sigma^1,\ldots,\sigma^n}w\,S^{\prime\,1}_v\,\frac C{Z^n}\,. \tag{2.44}$$
Let us now denote by $\mathsf E_0$ integration in the randomness of $g_{i,M}$, $\xi^\ell_M$, $z$ and $\xi^\ell$, given all the other sources of randomness. Therefore, since the quantities w do not depend on any of the variables $g_{i,M}$, $\xi^\ell_M$, $z$ or $\xi^\ell$, the quantity (2.44) equals
$$\mathsf E\sum_{\sigma^1,\ldots,\sigma^n}w\,\mathsf E_0\Big(S^{\prime\,1}_v\,\frac C{Z^n}\Big)\,. \tag{2.45}$$
Condition (2.47) holds simply because to compute $F_{\sigma^1,\ldots,\sigma^n}((z^\ell_\sigma))$, we substitute $z^\ell_\sigma=S^\ell_v$ for $x^\ell_\sigma$ in the previous formula. This construction however does not suffice, because Z cannot be considered as a function of the quantities $z^\ell_\sigma$: the effect of the expectation $\mathsf E_\xi$ is that "the part depending on the r.v.s labeled ξ has been averaged out". The part of $z^\ell_\sigma$ that does not depend on the r.v.s labeled ξ is simply
$$y_\sigma=\sqrt v\,\Big(\frac1{\sqrt N}\sum_{i<N}g_{i,M}\sigma_i+\sqrt{\frac tN}\,g_{N,M}\sigma_N\Big)+\sqrt{1-v}\,\sqrt q\,z\,.$$
Defining
$$\xi^\ell_*=\sqrt v\,\sqrt{\frac{1-t}N}\,\xi^\ell_M+\sqrt{1-v}\,\sqrt{1-q}\,\xi^\ell\,,$$
we then have
$$z^\ell_\sigma=y_\sigma+\xi^\ell_*\,.$$
$$-\,\frac1{2\sqrt{1-v}}\big(\sqrt q\,z+\sqrt{1-q}\,\xi^\ell\big)\,,$$
so that $S^{\prime\,\ell}_v=z^{\prime\,\ell}_\sigma$. The family of all the r.v.s $z^\ell_\sigma$, $y_\sigma$, $\xi^\ell_*$, and $z^{\prime\,\ell}_\sigma$ is a Gaussian family, and this is the family we will use to apply the integration by parts formula. In the upcoming formulas, the reader should take great care to distinguish between the quantities $z^\ell_\sigma$ and $z_{\sigma^\ell}$ (the position of the $\ell$ is not the same).
We note the relations
$$\mathsf E(\theta^\ell)^2=1=\mathsf E\big(S_{M,t}(\sigma,\xi^\ell_M)\big)^2\;;\qquad \ell\neq\ell'\Rightarrow\mathsf E\,\theta^\ell\theta^{\ell'}=q\,,$$
$$\ell\neq\ell'\Rightarrow\mathsf E\,S_{M,t}(\sigma,\xi^\ell_M)\,S_{M,t}(\tau,\xi^{\ell'}_M)=R^t(\sigma,\tau):=\frac1N\sum_{i<N}\sigma_i\tau_i+\frac tN\,\sigma_N\tau_N\,,$$
so that
$$\mathsf E\,z^{\prime\,\ell}_\sigma z^\ell_\sigma=0\;;\qquad \ell\neq\ell'\Rightarrow\mathsf E\,z^{\prime\,\ell}_\sigma z^{\ell'}_\tau=\frac12\big(R^t(\sigma,\tau)-q\big)\,, \tag{2.50}$$
and
$$\mathsf E\,z^{\prime\,\ell}_\sigma y_\tau=\frac12\big(R^t(\sigma,\tau)-q\big)\,. \tag{2.51}$$
We will simply use the integration by parts formula (A.17) and these relations to understand the form of the quantity
$$\mathsf E_0\Big(S^{\prime\,1}_v\,\frac C{Z^n}\Big)=\mathsf E_0\Big(z^{\prime\,1}_{\sigma^1}\,\frac{F_{\sigma^1,\ldots,\sigma^n}\big((z^\ell_\sigma)\big)}{F_1\big((y_\sigma)\big)^n}\Big)\,. \tag{2.52}$$
Let us repeat that this integration by parts takes place given all the sources of randomness other than the r.v.s $g_{i,M}$, $\xi^\ell_M$, $z$ and $\xi^\ell$ (so that it is fine if f depends on some randomness independent of these). The exact result of the computation is not relevant now (it will be given
in Chapter 9). For the present result we simply need the information that $d\nu_{t,v}(B_vf)/dv$ is a sum of terms of the type (using the notation $R^t_{\ell,\ell'}=R^t(\sigma^\ell,\sigma^{\ell'})$)
$$\nu_{t,v}\big(f\,(R^t_{\ell,\ell'}-q)\,A\big)\,, \tag{2.53}$$
where A is a monomial in the quantities $u'(S^m_v)$, $u''(S^m_v)$, $u^{(3)}(S^m_v)$ for $m\le n+2$. So, let us perform the integration by parts in (2.52):
It is convenient to refer to the last term in the above (or similar) formula “as
the term created by the denominator” when performing the integration by
parts in (2.52). (It would be nice to remember this, since we will often use this
expression in our future attempts at describing at a high level computations
similar to the present one.) We first compute this term. We observe that
$$\frac{\partial F_1}{\partial x_\tau}=\mathsf E_\xi\,w^*(\tau,\xi^1)\,u'(x_\tau+\xi^1_*)\exp u\big(x_\tau+\xi^1_*\big)\,.$$
Therefore using (2.51) we see that the term created by the denominator in
(2.52) is
so that, changing the name of τ into $\sigma^{n+1}$, and since $w^{n+1}_*=w^*(\sigma^{n+1},\xi^{n+1})$, the quantity (2.54) is equal to (using (2.46) in the second line)
In a last step we observe that in the above formula we can remove the expectation $\mathsf E_\xi$. This is because the r.v.s labeled ξ that occur in this expectation (namely $\xi^{n+1}$ and $\xi^{n+1}_*$) are independent of the other r.v.s labeled ξ that occur in C and w. In this manner we finally see that the contribution of this quantity to the computation of (2.42) is
$$-\frac n2\,\mathsf E\sum_{\sigma^1,\ldots,\sigma^{n+1}}\frac{C\,(R^t_{1,n+1}-q)\,w\,w^{n+1}_*\,u'(S^{n+1}_v)\exp u(S^{n+1}_v)}{Z^{n+1}}=-\frac n2\,\nu_{t,v}\Big(f\,(R^t_{1,n+1}-q)\,u''(S^1_v)u'(S^2_v)u'(S^{n+1}_v)\Big)\,.$$
In a similar manner we compute the contribution in (2.52) of the dependence of $F_{\sigma^1,\ldots,\sigma^n}$ on the variables $z^\ell_\sigma$ at a given value of ℓ, i.e. of the quantity
$$\sum_\tau\mathsf E_0\big(z^{\prime\,1}_{\sigma^1}z^\ell_\tau\big)\,\mathsf E_0\Big(\frac{\partial F_{\sigma^1,\ldots,\sigma^n}}{\partial x^\ell_\tau}\big((z_\sigma)\big)\,\frac1{F_1\big((y_\sigma)\big)^n}\Big)\,. \tag{2.55}$$
Since $\mathsf E\,z^{\prime\,\ell}_\sigma z^\ell_\sigma=0$ by (2.50) we see that for $\ell=1$ the contribution of this term is 0.
When $\ell\ge3$, we have
$$\frac{\partial F_{\sigma^1,\ldots,\sigma^n}}{\partial x^\ell_\tau}\big((x_\sigma)\big)=f(\sigma^1,\ldots,\sigma^n)\,u''(x_{\sigma^1})u'(x_{\sigma^2})u'(x_{\sigma^\ell})\exp\sum_{\ell'\le n}u(x_{\sigma^{\ell'}})\,,$$
$$\nu_{t,v}\big(f\,S^{\prime\,1}_vu''(S^1_v)u'(S^2_v)\big)=\frac12\,\nu_{t,v}\Big(f\,(R^t_{1,2}-q)\,u''(S^1_v)u''(S^2_v)\Big)+\frac12\sum_{2\le\ell\le n}\nu_{t,v}\Big(f\,(R^t_{1,\ell}-q)\,u''(S^1_v)u'(S^2_v)u'(S^\ell_v)\Big)-\frac n2\,\nu_{t,v}\Big(f\,(R^t_{1,n+1}-q)\,u''(S^1_v)u'(S^2_v)u'(S^{n+1}_v)\Big)\,.$$
$$\Big|\frac d{dv}\nu_{t,v}(B_vf)\Big|\le K(n,D)\bigg(\sum_{1\le\ell<\ell'\le n+2}\nu_{t,v}\big(|f|\,|R^t_{\ell,\ell'}-q|\big)+\frac1N\,\nu_{t,v}(|f|)\bigg)\,. \tag{2.59}$$
To conclude we use Hölder's inequality.
Exercise 2.3.3. Let us recall the notation $S_{k,t}$ of Proposition 2.2.3 and define
$$S'_{k,t}=\frac1{2\sqrt N}\Big(\frac{g_{N,k}\,\varepsilon}{\sqrt t}-\frac{\xi_k}{\sqrt{1-t}}\Big)\,,$$
so that (2.27) becomes
$$\frac d{dt}(-H_{N,M,t})=\sum_{k\le M}S'_{k,t}\,u'(S_{k,t})-\frac{\varepsilon\,Y}{2\sqrt{1-t}}\,.$$
Then get convinced that the term I in (2.23) can be obtained "in one step" rather than by integrating by parts separately over the r.v.s $\xi_k$ and $g_{N,k}$ as was done in the proof of Proposition 2.2.3.
and thus
$$\nu_0(f)=\frac1N\,\nu_0(1-\varepsilon_1\varepsilon_2q)=\frac1N\,(1-q^2)\,. \tag{2.72}$$
To compute $\nu_t(f)$, we use Proposition 2.3.6 with n = 2 and $\tau_1=\tau_2=2$. Since $|f|\le2|R_{1,2}-q|$, we obtain
$$|\nu_t(f)|\le\alpha K(D)\Big(\nu_t\big((R_{1,2}-q)^2\big)+\frac1N\,\nu(|f|)\Big)\,. \tag{2.73}$$
One should observe that in the above argument we never used the uniqueness of the solutions of the equations (2.68) to obtain (2.69), only their existence. In turn, uniqueness of these solutions follows from (2.69).
One may like to think of the present model as a kind of "square". There are two "spin systems", one that consists of the $\sigma_i$ and one that consists of the $S_k$. These are coupled: the $\sigma_i$ determine the $S_k$ and these in turn determine the behavior of the $\sigma_i$. This philosophy underlies the first proof of Theorem 2.4.2 below.
From now on in this section, q and r always denote the solutions of (2.68).
We recall the definition (2.11)
$$p_{N,M}(u)=\frac1N\,\mathsf E\log\sum_\sigma\exp\big(-H_{N,M}(\sigma)\big)\,,$$
and we define
$$p(u)=-\frac r2\,(1-q)+\mathsf E\log\big(2\,\mathrm{ch}(z\sqrt r)\big)+\alpha\,\mathsf E\log\mathsf E_\xi\exp u\big(z\sqrt q+\xi\sqrt{1-q}\big)\,. \tag{2.74}$$
where θk = zk q + ξk 1 − q. In this formula, we should think of zi and zk
as representing new randomness, and of ξk as representing “new spins”, so
that Gibbs averages are given by (2.19), and we define
1
pN,M,s = E log Eξ exp(−HM,N,s ) .
N σ
The variables ξk are not the same as in Section 2.2; we could have denoted
them by ξk to insist on this fact, but we preferred simpler notation.
A key point of the present interpolation is that the equations giving the parameters $q_s$ and $r_s$ corresponding to the parameters q and r in the case s = 1 are now
$$q_s=\mathsf E\,\mathrm{th}^2\big(\sqrt s\,z\sqrt{r_s}+\sqrt{1-s}\,z'\sqrt r\big) \tag{2.77}$$
$$r_s=\alpha\,\mathsf E\,\bigg(\frac{\mathsf E_\xi\,u'(\theta_s)\exp u(\theta_s)}{\mathsf E_\xi\exp u(\theta_s)}\bigg)^2 \tag{2.78}$$
where
$$\theta_s=\sqrt s\,\big(z\sqrt{q_s}+\xi\sqrt{1-q_s}\big)+\sqrt{1-s}\,\big(z'\sqrt q+\xi'\sqrt{1-q}\big)\,.$$
To understand the formula (2.77) one should first understand what happens if we include the action of a random external field in the Hamiltonian, i.e. we add a term $h\sum_{i\le N}g_i\sigma_i$ (where the $g_i$ are i.i.d. standard Gaussian) to the right-hand side of (2.6). Then there is nothing to change in the proof of Theorem 2.4.1; only the first formula of (2.68) becomes
$$q=\mathsf E\,\mathrm{th}^2\big(z\sqrt r+hg\big)\,, \tag{2.79}$$
Let us now explain how to compute $\nu_s(S_{k,s}u'(S_{k,s}))$. Without loss of generality we assume k = M. We make explicit the dependence of the Hamiltonian on $S_{M,s}$ by introducing the Hamiltonian
$$-H_{N,M-1,s}=\sum_{k\le M-1}u\big(\sqrt s\,S_k+\sqrt{1-s}\,\theta_k\big)+\sum_{i\le N}\sigma_i\sqrt{1-s}\,z_i\sqrt r\,.$$
We will not use the fact that the contribution for each k ≤ M is the same, but rather we regroup the terms as
$$\frac d{ds}\,p_{N,M,s}(u)=-\frac r2\,(1-q)-\frac12\,\nu_s\bigg((R_{1,2}-q)\Big(\frac1N\sum_{k\le M}u'(S^1_{k,s})u'(S^2_{k,s})-r\Big)\bigg)\,. \tag{2.83}$$
$$\nu_s(f^2)\le\frac12\,\nu_s\big((R_{1,2}-q)^2\big)+\frac12\,\nu_s(f^2)+\frac{K(D)}N\,,$$
which completes the proof using (2.80).
To prepare for the second proof of Theorem 2.4.2, let us denote by F(α,r,q) the right-hand side of (2.74), i.e.
$$F(\alpha,r,q)=-\frac r2\,(1-q)+\mathsf E\log\big(2\,\mathrm{ch}(z\sqrt r)\big)+\alpha\,\mathsf E\log\mathsf E_\xi\exp u(\theta)\,,$$
where $\theta=z\sqrt q+\xi\sqrt{1-q}$, and let us think of this quantity as a function of three unrelated variables. For convenience, we reproduce the equations (2.68):
$$q=\mathsf E\,\mathrm{th}^2(z\sqrt r)\;;\qquad r=\alpha\,\mathsf E\,\bigg(\frac{\mathsf E_\xi\,u'(\theta)\exp u(\theta)}{\mathsf E_\xi\exp u(\theta)}\bigg)^2\,. \tag{2.87}$$
so that $\partial F/\partial r=0$ if
$$q=1-\mathsf E\,\frac1{\mathrm{ch}^2(z\sqrt r)}=\mathsf E\,\mathrm{th}^2(z\sqrt r)\,.$$
Next, if
$$\theta=z\sqrt q+\xi\sqrt{1-q}\;,\qquad \theta'=\frac z{2\sqrt q}-\frac\xi{2\sqrt{1-q}}\,,$$
we have
$$\frac{\partial F}{\partial q}=\frac r2+\frac\alpha2\,\mathsf E\,\frac{\mathsf E_\xi\,\theta'\,u'(\theta)\exp u(\theta)}{\mathsf E_\xi\exp u(\theta)}\,. \tag{2.88}$$
To integrate by parts, we observe that $F_1(z)=\mathsf E_\xi\exp u(\theta)$ does not depend on ξ and
$$\frac{dF_1}{dz}=\frac d{dz}\,\mathsf E_\xi\exp u\big(z\sqrt q+\xi\sqrt{1-q}\big)=\sqrt q\,\mathsf E_\xi\,u'(\theta)\exp u(\theta)\,.$$
Second proof of Theorem 2.4.2. We define $Z_{N,M}=\sum_\sigma\exp(-H_{N,M}(\sigma))$, and we note the identity
$$Z_{N,M+1}=Z_{N,M}\,\Big\langle\exp u\Big(\frac1{\sqrt N}\sum_{i\le N}g_{i,M+1}\sigma_i\Big)\Big\rangle$$
so that
$$p_{N,M+1}(u)-p_{N,M}(u)=\frac1N\,\mathsf E\log\Big\langle\exp u\Big(\frac1{\sqrt N}\sum_{i\le N}g_{i,M+1}\sigma_i\Big)\Big\rangle\,. \tag{2.90}$$
As usual $\mathsf E_\xi$ denotes expectation in all the r.v.s labeled ξ. Here this expectation is not built in the bracket ⟨·⟩, in contrast with what we did e.g. in (2.35), so that it must be written explicitly.
We note that ϕ(1) − ϕ(0) is controlled by the bound
$$|\varphi(1)-\varphi(0)-\varphi'(0)|\le\sup_{0<v<1}|\varphi''(v)|\,. \tag{2.92}$$
A new differentiation and integration by parts in (2.91) bring out in each term a new factor $(R_{\ell,\ell'}-q)$, so that using (2.69) we now get
$$|\varphi''(v)|\le K(D)\,\nu\big((R_{1,2}-q)^2\big)\le\frac{K(D)}N\,.$$
As a special case of (2.91),
$$\varphi'(0)=-\frac12\,r\,\nu(R_{1,2}-q)\,.$$
We shall prove later (when we learn how to prove central limit theorems in Chapter 9) the non-trivial fact that $|\nu(R_{1,2}-q)|\le K(D)/N$, and (2.92) then implies
$$\Big|p_{N,M+1}(u)-p_{N,M}(u)-\frac1N\,\mathsf E\log\mathsf E_\xi\exp u(\theta)\Big|\le\frac{K(D)}{N^2}\,. \tag{2.93}$$
One can then recover the value of pN,M (u) by summing these relations over
M . This is a non-trivial task, since the value of q (and hence of θ) depends
on M .
Comparing with (2.93) and summing over M then proves (2.75) (and even
better, since the summation is over M , we get a bound αK(D)/N ). This
completes the second proof of Theorem 2.4.2.
It is worth noting that the first proof of Theorem 2.4.2 provides an easy
way to discover the formula (2.74), but that this formula is much harder to
guess if one uses the second proof. In some sense the first proof of Theo-
rem 2.4.2 is more powerful and more elegant than the second proof. However
we will meet situations (in Chapters 3 and 4) where it is not immediate to
apply this method (and whether this is possible remains to be investigated).
In these situations, we shall use instead the argument of the second proof of
Theorem 2.4.2.
$$\nu\big(A^{2k+2}_n\big)\le|\nu_0(f)|+\sup_t|\nu'_t(f)|\,. \tag{2.96}$$
Since by Lemma 2.2.2 we have $\nu_0\big((\varepsilon_1\varepsilon_2-q)A^{2k+1}\big)=0$, using the inequality above we obtain
$$|\nu_0(f)|\le\frac{16(2k+1)}N\Big(\frac{64k}N\Big)^k\le\frac{2k+1}{4(k+1)}\Big(\frac{64(k+1)}N\Big)^{k+1}\,. \tag{2.97}$$
Theorem 2.5.2. Assume that u satisfies (2.7) for a certain number D. Then there is a number K(D), depending on D only, with the following property. For $\alpha K_0(D)\le1$ we have
$$\forall k\ge0\;,\quad \nu\bigg(\Big(\frac1N\sum_{j\le M}u'(S^1_j)u'(S^2_j)-r\Big)^{2k}\bigg)\le\Big(\frac{\alpha kK(D)}N\Big)^k\,. \tag{2.99}$$
so that with the notation (2.87) we have $r=\alpha\hat r$. For $1\le n\le M$ we define
$$C_n=\frac1M\sum_{n\le j\le M}\big(u'(S^1_j)u'(S^2_j)-\hat r\big)\,.$$
where $f^\sim=\big(u'(S^1_M)u'(S^2_M)-\hat r\big)C^{2k+1}_n$. Let us define
$$C=\frac1M\sum_{n\le j\le M-1}\big(u'(S^1_j)u'(S^2_j)-\hat r\big)\,.$$
$$\nu(f^\sim)\le\nu(f^*)+\frac{2(2k+1)D^2}M\big(\nu(C^{2k}_n)+\nu(C^{2k})\big)\,. \tag{2.103}$$
Since n < M, symmetry among the values of j implies $\nu(C^{2k})=\nu(C^{2k}_{n+1})$, and the induction hypothesis yields
$$\nu(f^\sim)\le\nu(f^*)+\frac{8(k+1)D^2}M\Big(\frac{K_1(D)k}M\Big)^k\,. \tag{2.104}$$
Since we work under the condition $\alpha K_0(D)\le1$, we can as well assume that α ≤ 1, so that M ≤ N and
$$|\nu(f^*)|\le K_2(D)\Big(\nu(C^{2k+2})^{1/\tau_1}\,\nu\big((R_{1,2}-q)^{2k+2}\big)^{1/\tau_2}+\frac1M\,\nu(|C|^{2k+1})\Big)\,. \tag{2.105}$$
We recall the inequality $x^{1/\tau_1}y^{1/\tau_2}\le x+y$. Changing x to x/A and y to $A^{\tau_2/\tau_1}y$ in this inequality gives
$$x^{1/\tau_1}y^{1/\tau_2}\le\frac xA+A^{\tau_2/\tau_1}\,y\,.$$
Using this for $A=2K_2(D)$, $x=\nu(C^{2k+2})$ and $y=\nu\big((R_{1,2}-q)^{2k+2}\big)$, we deduce from (2.105) that
$$|\nu(f^*)|\le\frac12\,\nu(C^{2k+2})+K(D)^{2k+1}\,\nu\big((R_{1,2}-q)^{2k+2}\big)+\frac{K(D)}M\,\nu(|C|^{2k+1})\,. \tag{2.106}$$
We now use the inequality
$$\nu(C^{2k+2})\le\nu(C^{2k+2}_n)+\frac{2(2k+2)D^2}M\big(\nu(|C|^{2k+1})+\nu(|C_n|^{2k+1})\big)\,.$$
We combine this with (2.106), we use that $|C_n|^{2k+1}\le2D^2C^{2k}_n$ and $|C|^{2k+1}\le2D^2C^{2k}$ and the induction hypothesis to get
$$|\nu(f^*)|\le\frac12\,\nu(C^{2k+2}_n)+K(D)^{2k+2}\,\nu\big((R_{1,2}-q)^{2k+2}\big)+\frac{(k+1)K(D)}M\Big(\frac{K_1(D)k}M\Big)^k\,,$$
Finally we use (2.94) to conclude the proof that $\nu(C^{2k+2}_n)\le\big(K_1(D)(k+1)/M\big)^{k+1}$ if $K_1(D)$ has been chosen large enough. This completes the induction.
The following central limit theorem describes the fluctuations of $p_{N,M}(u)$ (given by (2.11)). We recall that $a(k)=\mathsf Ez^k$ where z is a standard Gaussian r.v. and that O(k) denotes a quantity $A=A_N$ with $|A|\le KN^{-k/2}$ where K does not depend on N. We recall the notation p(u) of (2.74),
$$p(u)=-\frac r2\,(1-q)+\mathsf E\log\big(2\,\mathrm{ch}(z\sqrt r)\big)+\alpha\,\mathsf E\log\mathsf E_\xi\exp u\big(z\sqrt q+\xi\sqrt{1-q}\big)\,.$$
Theorem 2.5.3. Let
$$b=\mathsf E\big(\log\mathrm{ch}(z\sqrt r)\big)^2-\big(\mathsf E\log\mathrm{ch}(z\sqrt r)\big)^2-qr\,.$$
Proof. This argument resembles that in the proof of Theorem 1.4.11, and
it would probably help the reader to review the proof of that theorem now.
The present proof is organized a bit differently, avoiding the a priori estimate
of Lemma 1.4.12. The interpolation method of the first proof of Theorem
2.4.2 is at the center of the argument, so the reader should feel comfortable
with this proof in order to proceed. We recall the Hamiltonian (2.76) and
we denote by · s an average for the corresponding Gibbs measure. In the
proof O(k) will denote a quantity A = AN such that |A| ≤ KN −k/2 where
K does not depend on N or s, and we will take for granted that Theorems
2.5.1 and 2.5.2 hold uniformly over s. (This fact is left as a good exercise for
the reader.)
Consider the following quantities:
$$A(s)=\frac1N\log\sum_\sigma\mathsf E_\xi\exp\big(-H_{N,M,s}(\sigma)\big)$$
$$\mathrm{RS}(s)=\mathsf E\log\big(2\,\mathrm{ch}(z\sqrt r)\big)+\alpha\,\mathsf E\log\mathsf E_\xi\exp u\big(z\sqrt q+\xi\sqrt{1-q}\big)-\frac s2\,r(1-q)$$
$$V(s)=A(s)-\mathrm{RS}(s)$$
$$b(s)=\mathsf E\big(\log\mathrm{ch}(z\sqrt r)\big)^2-\big(\mathsf E\log\mathrm{ch}(z\sqrt r)\big)^2-rqs\,.$$
The quantities $\mathsf EA(s)$, RS(s) and b(s) are simply the quantities corresponding, for the interpolating system, to the quantities $p_{N,M}(u)$, $p(u)$, and b respectively. Fixing k, we set
$$\psi(s)=\mathsf EV(s)^k\,.$$
We aim at proving by induction over k that $\psi(s)=(b(s)/N)^{k/2}a(k)+O(k+1)$, which, for s = 1, proves the theorem. Consider $\varphi(s,a)=\mathsf E(A(s)-a)^k$, so that $\psi(s)=\varphi(s,\mathrm{RS}(s))$, and by straightforward differentiation $\partial\varphi/\partial s$ is given by the quantity
$$\frac k{2N}\,\mathsf E\bigg(\Big\langle\sum_{j\le M}\Big(\frac{S_j}{\sqrt s}-\frac{\theta_j}{\sqrt{1-s}}\Big)u'(S_{j,s})-\sum_{i\le N}\frac{\sigma_iz_i\sqrt r}{\sqrt{1-s}}\Big\rangle_s\big(A(s)-a\big)^{k-1}\bigg)\,,$$
where $S_{j,s}=\sqrt s\,S_j+\sqrt{1-s}\,\theta_j$. Next, defining $S'_{j,s}$ as usual, we claim that $\partial\varphi/\partial s=\mathrm I+\mathrm{II}$, where
$$\mathrm I=\frac k2\,\mathsf E\bigg(\Big\langle-(R_{1,2}-q)\,\frac1N\sum_{j\le M}u'(S^1_{j,s})u'(S^2_{j,s})-r(1-R_{1,2})\Big\rangle_s\big(A(s)-a\big)^{k-1}\bigg)$$
and
$$\mathrm{IV}=-\frac{kr(1-q)}2\,\mathsf E\big((A(s)-a)^{k-1}\big)\,.$$
Similarly we have also $\mathrm{II}=\mathrm V+\mathrm{VI}$ where V is the quantity
$$\frac{k(k-1)}{2N}\,\mathsf E\bigg(\Big\langle(R_{1,2}-q)\Big(\frac1N\sum_{j\le M}u'(S^1_{j,s})u'(S^2_{j,s})-r\Big)\Big\rangle_s\big(A(s)-a\big)^{k-2}\bigg)$$
and
$$\mathrm{VI}=-\frac{rq}{2N}\,k(k-1)\,\mathsf E\big((A(s)-a)^{k-2}\big)\,.$$
Now,
$$\psi'(s)=\frac d{ds}\,\varphi(s,\mathrm{RS}(s))=\frac{\partial\varphi}{\partial s}(s,\mathrm{RS}(s))+\mathrm{RS}'(s)\,\frac{\partial\varphi}{\partial a}(s,\mathrm{RS}(s))\,. \tag{2.107}$$
Since $\mathrm{RS}'(s)=-r(1-q)/2$ and $\partial\varphi/\partial a\,(s,\mathrm{RS}(s))=-k\,\mathsf EV(s)^{k-1}$, the second term of (2.107) cancels out with the term IV and we get
where
$$\mathrm{VII}=-\frac k2\,\mathsf E\bigg(\Big\langle(R_{1,2}-q)\Big(\frac1N\sum_{j\le M}u'(S^1_{j,s})u'(S^2_{j,s})-r\Big)\Big\rangle_s V(s)^{k-1}\bigg)$$
$$\mathrm{VIII}=\frac{k(k-1)}{2N}\,\mathsf E\bigg(\Big\langle(R_{1,2}-q)\Big(\frac1N\sum_{j\le M}u'(S^1_{j,s})u'(S^2_{j,s})-r\Big)\Big\rangle_s V(s)^{k-2}\bigg)$$
$$\mathrm{IX}=-\frac{rq}{2N}\,k(k-1)\,\mathsf EV(s)^{k-2}\,.$$
The idea is that each of the factors $R_{1,2}-q$, $\big(N^{-1}\sum_{j\le M}u'(S^1_{j,s})u'(S^2_{j,s})-r\big)$ and $V(s)$ "counts as $N^{-1/2}$". This follows from Theorems 2.5.1 and 2.5.2 for the first two terms, but we have not proved it yet in the case of V(s). (In the case of Theorem 1.4.11, the a priori estimate of Lemma 1.4.12 showed that V(s) "counts as $N^{-1/2}$".) Should this indeed be the case, the terms VII and VIII will be of lower order O(k+1). We turn to the proof that this is actually the case.
A first step is to show that
$$\mathrm{VII}\le\frac{K(k)}N\big(\mathsf E|V(s)|^k\big)^{\frac{k-1}k}\;;\qquad \mathrm{VIII}\le\frac{K(k)}{N^2}\big(\mathsf E|V(s)|^k\big)^{\frac{k-2}k}\,. \tag{2.109}$$
In the case of VII, setting $A=R_{1,2}-q$ and
$$B=\frac1N\sum_{j\le M}u'(S^1_{j,s})u'(S^2_{j,s})-r\,,$$
$$\le\frac{K(k)}N\big(\mathsf E|V(s)|^k\big)^{\frac{k-1}k}\,.$$
We proceed in a similar manner for VIII, i.e. we write that
$$\mathsf E\big(\langle AB\rangle_s\,V(s)^{k-2}\big)\le\big(\mathsf E\langle|A|^k\rangle_s\big)^{1/k}\big(\mathsf E\langle|B|^k\rangle_s\big)^{1/k}\big(\mathsf E|V(s)|^k\big)^{\frac{k-2}k}\le\frac{K(k)}{N^2}\big(\mathsf E|V(s)|^k\big)^{\frac{k-2}k}\,,$$
and this proves (2.109).
Since $xy\le x^{\tau_1}+y^{\tau_2}$ for $\tau_2=k/(k-2)$ and $\tau_1=k/2$ we get
$$\frac1N\big(\mathsf E|V(s)|^k\big)^{\frac{k-2}k}\le\frac1{N^{k/2}}+\mathsf E|V(s)|^k\,.$$
This implies in particular
$$\mathrm{IX}\le\frac{K(k)}N\big(\mathsf E|V(s)|^k\big)^{\frac{k-2}k}\le K(k)\Big(\frac1{N^{k/2}}+\mathsf E|V(s)|^k\Big)$$
and
$$\mathrm{VIII}\le\frac{K(k)}N\Big(\frac1{N^{k/2}}+\mathsf E|V(s)|^k\Big)\le K(k)\Big(\frac1{N^{k/2}}+\mathsf E|V(s)|^k\Big)\,.$$
Next, we use that $xy\le x^{\tau_1}+y^{\tau_2}$ for $\tau_2=k/(k-1)$ and $\tau_1=k$ to get
$$\frac1N\big(\mathsf E|V(s)|^k\big)^{\frac{k-1}k}\le\frac1{N^k}+\mathsf E|V(s)|^k\le\frac1{N^{k/2}}+\mathsf E|V(s)|^k\,.$$
When k is even (so that $|V(s)|^k=V(s)^k$ and $\mathsf E|V(s)|^k=\psi(s)$) we have proved that
$$\psi'(s)\le K(k)\Big(\frac1{N^{k/2}}+\psi(s)\Big)\,. \tag{2.110}$$
Thus (2.110) and Lemma A.13.1 imply that
$$\psi(s)\le K(k)\Big(\psi(0)+\frac1{N^{k/2}}\Big)\,.$$
Since it is easy (as the spins decouple) to see that $\psi(0)\le K(k)N^{-k/2}$, we have proved that for k even we have $\mathsf EV(s)^k=O(k)$. Since $\mathsf E|V(s)|^k\le(\mathsf EV(s)^{2k})^{1/2}$ this implies that $\mathsf E|V(s)|^k=O(k)$ for each k, so that by (2.109) we have $\mathrm{VII}=O(k+1)$ and $\mathrm{VIII}=O(k+1)$. Thus (2.108) yields
$$\psi'(s)=-\frac{rq}{2N}\,k(k-1)\,\mathsf EV(s)^{k-2}+O(k+1)=\frac{b'(s)}N\,\frac k2\,(k-1)\,\mathsf EV(s)^{k-2}+O(k+1)\,.$$
Exercise 2.5.4. Rewrite the proof of Theorem 1.4.11 without using the a priori estimate of Lemma 1.4.12. This allows one to cover the case where the r.v. h is not necessarily Gaussian.
This problem has really two parts. The first (easier) part is to prove results
for the present model. For this, the approach of “separating the numerator
from the denominator” as explained in Section 9.1 seems likely to succeed.
The second part (harder) is to find arguments that will carry over when we have much less control over u, as in Chapter 9. For this second part, the work is partially done in [100], but reaching only the rate $1/\sqrt N$ rather than the correct rate 1/N.
Research Problem 2.5.6. (Level 2) For the present model prove the TAP
equations.
In the present model the configuration space is $\mathbb R^N$, that is, the configuration σ can be any point of $\mathbb R^N$. Given another integer M, we will consider the Hamiltonian
$$-H_{N,M}(\sigma)=\sum_{k\le M}u\Big(\frac1{\sqrt N}\sum_{i\le N}g_{i,k}\sigma_i\Big)+h\sum_{i\le N}g_i\sigma_i-\kappa\|\sigma\|^2\,. \tag{3.1}$$
Here $\|\sigma\|^2=\sum_{i\le N}\sigma_i^2$, $(g_{i,k})_{i\le N,k\le M}$ and $(g_i)_{i\le N}$ are independent standard Gaussian r.v.s and κ > 0, h ≥ 0. We will always assume
$$u\le0\;,\quad u\ \text{is concave}\,. \tag{3.2}$$
To get a feeling for this Hamiltonian, let us think of u such that, for a certain number τ, u(x) = −∞ if x < τ and u(x) = 0 if x ≥ τ. Then it is believable that the Hamiltonian (3.1) will teach us something about the region in $\mathbb R^N$ defined by
$$\forall k\le M\;,\quad \frac1{\sqrt N}\sum_{i\le N}g_{i,k}\sigma_i\ge\tau\,. \tag{3.3}$$
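For the degenerate value τ = 0 each condition in (3.3) is an independent half-space through the origin, so for any fixed σ the probability over the $g_{i,k}$ that all M constraints hold is exactly $2^{-M}$. A small Monte Carlo illustration (the values of N, M and the sample size are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, trials = 20, 3, 40_000

sigma = rng.standard_normal(N)                 # any fixed configuration works
sigma *= np.sqrt(N) / np.linalg.norm(sigma)    # put it on the sphere S_N

g = rng.standard_normal((trials, M, N))        # fresh constraints each trial
ok = np.all(g @ sigma / np.sqrt(N) >= 0.0, axis=1)  # tau = 0 in (3.3)

print(ok.mean())   # close to 2**-M = 0.125
```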
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 191
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 3, © Springer-Verlag Berlin Heidelberg 2011
192 3. The Shcherbina and Tirozzi Model
$$\forall x,y\in\mathbb R^N\;,\quad \frac12\big(H_{N,M}(x)+H_{N,M}(y)\big)-H_{N,M}\Big(\frac{x+y}2\Big)\ge\kappa\,\Big\|\frac{x-y}2\Big\|^2\,. \tag{3.4}$$
The beauty of the present model is that it allows the use of powerful tools from convexity, from which a very strong control of the overlaps will follow. The overlaps are defined as usual, by
$$R_{\ell,\ell'}=\frac{\sigma^\ell\cdot\sigma^{\ell'}}N\,.$$
The case $\ell=\ell'$ is now of interest, $R_{\ell,\ell}=\|\sigma^\ell\|^2/N$. Let us consider the Gibbs measure G on $\mathbb R^N$ with Hamiltonian $H_{N,M}$, that is, for any subset B of $\mathbb R^N$,
$$G(B)=\frac1{Z_{N,M}}\int_B\exp\big(-H_{N,M}(\sigma)\big)\,d\sigma\,, \tag{3.5}$$
where dσ denotes Lebesgue's measure and $Z_{N,M}=\int\exp(-H_{N,M}(\sigma))\,d\sigma$ is the normalization factor. As usual, we denote by ⟨·⟩ an average for this Gibbs measure, so that $G(B)=\langle\mathbf 1_B\rangle$. We use the notation $\nu(f)=\mathsf E\langle f\rangle$.
The goal of this section is to prove the following.
$$|u'|\le D\;;\quad |u''|\le D\,. \tag{3.7}$$
Then for k ≤ N/4 we have
$$\nu\big((R_{1,1}-\nu(R_{1,1}))^{2k}\big)\le\Big(\frac{Kk}N\Big)^k \tag{3.8}$$
$$\nu\big((R_{1,2}-\nu(R_{1,2}))^{2k}\big)\le\Big(\frac{Kk}N\Big)^k\,, \tag{3.9}$$
There is of course nothing special in the value N/4 which is just a convenient
choice. We could replace the condition k ≤ N/4 by the condition k ≤ AN
for any number A, with now a constant K(A) depending on A.
The basic reason why in Theorem 3.1.1 one does not control all moments is that moments of high order are very sensitive to what happens on very small sets or very rare events. For example moments of order about N are very sensitive to what happens on "events of size exp(−N/K)". Controlling events that small is difficult, and is quite beside our main goal. Of course one can dream of an entire "large deviation theory" that would describe the extreme situations that can occur with such rarity. In the present model, as well as in the other models considered in the book, such a theory remains entirely to be built.
Theorem 3.1.1 asserts that the overlaps are nearly constant. For many of
the systems studied in this book, it is a challenging task to prove that the
overlaps are nearly constant, and this requires a “high-temperature” condi-
tion. In the present model, no such condition is necessary, so one might say
that the system is always in a high-temperature state. One would expect
that it is then a simple matter to completely understand this system, and in
particular to compute
$$\lim_{N\to\infty,\,M/N\to\alpha}\frac1N\,\mathsf E\log\int\exp\big(-H_{N,M}(\sigma)\big)\,d\sigma\,. \tag{3.10}$$
This, however, does not seem to be the case. At the present time we know how
to handle only very special situations, and the reasons for this will become
apparent as the reader progresses through the present chapter.
Then
$$\int W(x)\,dx\ge\Big(\int U(x)\,dx\Big)^s\Big(\int V(x)\,dx\Big)^{1-s}\,. \tag{3.12}$$
then
$$\int\exp\Big(\frac\kappa{8A^2}\Big(f(x)-\int f\,d\mu\Big)^2\Big)\,d\mu(x)\le4 \tag{3.16}$$
and
$$\forall k\ge1\;,\quad \int\Big(f(x)-\int f\,d\mu\Big)^{2k}d\mu(x)\le4\,\Big(\frac{8kA^2}\kappa\Big)^k\,. \tag{3.17}$$
The most striking feature of the inequalities (3.16) and (3.17) is that they do not depend on the dimension of the underlying space. When $H(x)=\|x\|^2/2$, μ is the canonical Gaussian measure, and (3.17) recovers (1.47) (with worse constants).
Proof. Define the functions W, U, V as follows:
$$W(x)=\exp(-H(x))\;;\qquad V(y)=\exp\Big(\frac\kappa2\,d(y,B)^2-H(y)\Big)$$
and
$$U(x)=0\ \ \text{if}\ x\notin B\;;\qquad U(x)=\exp(-H(x))\ \ \text{if}\ x\in B\,.$$
These functions satisfy (3.11) with s = 1/2. Indeed, it suffices to consider the case where x ∈ B, in which case (3.11) reduces to
$$-H\Big(\frac{x+y}2\Big)\ge\frac12\Big(-H(x)-H(y)+\frac\kappa2\,d(y,B)^2\Big)\,,$$
which follows from (3.4) and the fact that $d(y,B)\le\|x-y\|$. Then (3.12) holds, and for the previous choices it means exactly (3.14).
To prove (3.16) we consider a median m of f for μ, that is, a number m such that μ({f ≤ m}) ≥ 1/2 and μ({f ≥ m}) ≥ 1/2. The set B = {f ≤ m} then satisfies μ(B) ≥ 1/2, and since (3.15) implies
$$f(x)\le m+A\,d(x,B)\,,$$
we get
$$\int_{\{f\ge m\}}\exp\Big(\frac\kappa{2A^2}\,(f(x)-m)^2\Big)\,d\mu(x)\le2\,. \tag{3.18}$$
$$\int\!\!\int\exp\Big(\frac\kappa{8A^2}\,(f(x)-f(y))^2\Big)\,d\mu(x)\,d\mu(y)\le4\,,$$
Let us point out that in Theorem 3.1.4 the function H can take the value +∞. Equivalently, this theorem holds when μ is a probability measure on a convex set C with a density proportional to exp ψ(σ), where ψ satisfies
$$\frac12\big(\psi(x)+\psi(y)\big)-\psi\Big(\frac{x+y}2\Big)\le-\kappa\,\Big\|\frac{x-y}2\Big\|^2\,. \tag{3.21}$$
The argument that allows one to deduce (3.16) from (3.19) is called a symmetrization argument. This argument also proves the following. For each number m, each function f and each probability measure μ we have
$$\int\Big(f-\int f\,d\mu\Big)^{2k}d\mu\le2^{2k}\int(f-m)^{2k}\,d\mu\,. \tag{3.22}$$
To see this we simply write, using Jensen's inequality in the second line and that $(a+b)^{2k}\le2^{2k-1}(a^{2k}+b^{2k})$ in the third line,
$$\int\Big(f-\int f\,d\mu\Big)^{2k}d\mu=\int\Big(\int\big(f(x)-f(y)\big)\,d\mu(y)\Big)^{2k}d\mu(x)\le\int\!\!\int\big(f(x)-f(y)\big)^{2k}d\mu(x)\,d\mu(y)=\int\!\!\int\big((f(x)-m)-(f(y)-m)\big)^{2k}d\mu(x)\,d\mu(y)\le2^{2k}\int(f-m)^{2k}\,d\mu\,.$$
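The inequality (3.22) can be checked directly on a discrete probability measure; the following sketch uses arbitrary random atoms, weights and center m of ours:

```python
import numpy as np

rng = np.random.default_rng(3)
vals = rng.standard_normal(50)            # values of f on 50 atoms
p = rng.uniform(0.1, 1.0, 50)
p /= p.sum()                              # an arbitrary probability mu

k, m = 2, 0.7                             # any k >= 1 and any number m
mean = np.sum(p * vals)
lhs = np.sum(p * (vals - mean) ** (2 * k))             # centered 2k-th moment
rhs = 2 ** (2 * k) * np.sum(p * (vals - m) ** (2 * k))

print(bool(lhs <= rhs))   # True, as (3.22) asserts
```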
The essential feature of the present model is that any realization of the Gibbs measure with Hamiltonian (3.1) satisfies (3.16) and (3.17). We will need to use (3.16) for functions such as $\|x\|^2$ that are not Lipschitz on $\mathbb R^N$, but are Lipschitz when $\|x\|$ is not too large. For this, it is useful to know that the Gibbs measure with Hamiltonian (3.1) essentially lives on a ball of radius about $\sqrt N$, and the next two lemmas prepare for this. In this chapter and the next, we will use many times the fact that
$$\int\exp\big(-t\|\sigma\|^2\big)\,d\sigma=\Big(\int\exp(-tx^2)\,dx\Big)^N=\Big(\frac\pi t\Big)^{N/2}\,. \tag{3.23}$$
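In dimension one, (3.23) reads $\int\exp(-tx^2)\,dx=\sqrt{\pi/t}$, which can be confirmed by direct numerical quadrature (t = 2 is an arbitrary test value of ours):

```python
import numpy as np

t = 2.0
x = np.linspace(-10.0, 10.0, 200_001)   # wide grid; the tails are negligible
dx = x[1] - x[0]
integral = float(np.sum(np.exp(-t * x**2)) * dx)   # Riemann sum of the integral
exact = float(np.sqrt(np.pi / t))

print(integral, exact)   # both ~ 1.2533
```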
Proof. Using the definition of μ in the first line, that U ≤ 0 in the second line, completing the squares in the third line and using (3.23) in the last line, we obtain
$$\int\exp\Big(\frac\kappa2\,\|\sigma\|^2\Big)\,d\mu(\sigma)=\frac1Z\int\exp\Big(U(\sigma)-\frac\kappa2\,\|\sigma\|^2+\sum_{i\le N}a_i\sigma_i\Big)\,d\sigma\le\frac1Z\int\exp\Big(-\frac\kappa2\,\|\sigma\|^2+\sum_{i\le N}a_i\sigma_i\Big)\,d\sigma=\frac1Z\int\exp\Big(-\frac\kappa2\sum_{i\le N}\Big(\sigma_i-\frac{a_i}\kappa\Big)^2+\frac1{2\kappa}\sum_{i\le N}a_i^2\Big)\,d\sigma=\frac1Z\exp\Big(\frac1{2\kappa}\sum_{i\le N}a_i^2\Big)\int\exp\Big(-\frac\kappa2\,\|\sigma\|^2\Big)\,d\sigma=\frac1Z\Big(\frac{2\pi}\kappa\Big)^{N/2}\exp\Big(\frac1{2\kappa}\sum_{i\le N}a_i^2\Big)\,.$$
Lemma 3.1.6. Assume (3.6), that is, u(x) ≥ −D(1+|x|) for a certain number D and all x. Then we have
$$\frac1{Z_{N,M}}\le\Big(\frac\kappa\pi\Big)^{N/2}\exp D\bigg(M+\sqrt{\frac M{\kappa N}\sum_{i\le N,k\le M}g^2_{i,k}}\bigg)\,.$$
Proof. The proof relies on the rotational invariance of the Gaussian measure γ on $\mathbb R^N$ of density $(\kappa/\pi)^{N/2}\exp(-\kappa\|\sigma\|^2)$ with respect to Lebesgue's measure. For $x\in\mathbb R^N$ we have
$$\int|x\cdot\sigma|\,d\gamma(\sigma)=\frac{\|x\|}{\sqrt{\pi\kappa}}\le\frac{\|x\|}{\sqrt\kappa}\,, \tag{3.24}$$
because the rotational invariance of γ reduces this to the case N = 1.
Letting $g^k=(g_{i,k})_{i\le N}$, we have
$$Z_{N,M}=\int\exp\Big(\sum_{k\le M}u\Big(\frac{g^k\cdot\sigma}{\sqrt N}\Big)-\kappa\|\sigma\|^2+h\sum_{i\le N}g_i\sigma_i\Big)\,d\sigma=\Big(\frac\pi\kappa\Big)^{N/2}\int\exp\Big(\sum_{k\le M}u\Big(\frac{g^k\cdot\sigma}{\sqrt N}\Big)+h\sum_{i\le N}g_i\sigma_i\Big)\,d\gamma(\sigma)\ge\Big(\frac\pi\kappa\Big)^{N/2}\exp\int\Big(\sum_{k\le M}u\Big(\frac{g^k\cdot\sigma}{\sqrt N}\Big)+h\sum_{i\le N}g_i\sigma_i\Big)\,d\gamma(\sigma)=\Big(\frac\pi\kappa\Big)^{N/2}\exp\sum_{k\le M}\int u\Big(\frac{g^k\cdot\sigma}{\sqrt N}\Big)\,d\gamma(\sigma)\,,$$
using Jensen's inequality in the third line and since $\int\sigma_i\,d\gamma(\sigma)=0$. Now, using (3.6) and (3.24) for $x=g^k$ yields
$$\sum_{k\le M}\int u\Big(\frac{g^k\cdot\sigma}{\sqrt N}\Big)\,d\gamma(\sigma)\ge-D\Big(M+\frac1{\sqrt\kappa}\sum_{k\le M}\frac{\|g^k\|}{\sqrt N}\Big)\ge-D\bigg(M+\frac1{\sqrt\kappa}\sqrt{\frac MN\sum_{k\le M}\|g^k\|^2}\bigg)\,,$$
We set
$$B^*=N+2+\sqrt{\sum_{i\le N,k\le M}g^2_{i,k}}+\sqrt{\sum_{i\le N}g_i^2}\,. \tag{3.27}$$
using (3.30). The second term on the right-hand side of (3.36) is handled similarly, using now that $\langle b^{2k}\rangle\le(KB^*)^k$ by (3.30) and Jensen's inequality.
To prove (3.35), let us consider a parameter a to be chosen later; we have
$$|\varphi(\sigma)|\le\frac{\|\sigma\|^2}N\,\mathbf 1_{\{\|\sigma\|\ge a\}}\,,$$
and, using (3.22) for m = 0 in the first inequality and the Cauchy-Schwarz inequality in the second line,
$$\big\langle(\varphi-\langle\varphi\rangle)^{2k}\big\rangle\le2^{2k}\,\langle\varphi^{2k}\rangle\le2^{2k}\,\Big\langle\Big(\frac{\|\sigma\|^2}N\Big)^{2k}\mathbf 1_{\{\|\sigma\|\ge a\}}\Big\rangle\le2^{2k}\,\Big\langle\Big(\frac{\|\sigma\|^2}N\Big)^{4k}\Big\rangle^{1/2}\big\langle\mathbf 1_{\{\|\sigma\|\ge a\}}\big\rangle^{1/2}\,. \tag{3.38}$$
Again, here, to understand what this means the reader must keep in mind that the letter K might denote different constants at different occurrences. The complete argument is that if
$$\Big\langle\exp\frac{\kappa\|\sigma\|^2}2\Big\rangle\le\exp K_1B^*\,,$$
then
$$\big\langle\mathbf 1_{\{\|\sigma\|\ge a\}}\big\rangle\le\exp\Big(K_1B^*-\frac{\kappa a^2}2\Big)\,,$$
so that (3.40) holds for $a=K_2\sqrt{B^*}$ whenever $K_2\ge\sqrt{2(K_1+2)/\kappa}$.
Therefore with this choice of a we have, plugging (3.40) and (3.39) into (3.38),
$$\big\langle(\varphi-\langle\varphi\rangle)^{2k}\big\rangle\le\exp(-B^*)\Big(\frac{KB^*}N\Big)^{2k}\,.$$
Since $R_{1,1}=\|\sigma\|^2/N=f+\varphi$, using that $(x+y)^{2k}\le2^{2k}(x^{2k}+y^{2k})$ and (3.37) we get the estimate
$$\big\langle(R_{1,1}-\langle R_{1,1}\rangle)^{2k}\big\rangle\le2^{2k}\Big(\big\langle(f-\langle f\rangle)^{2k}\big\rangle+\big\langle(\varphi-\langle\varphi\rangle)^{2k}\big\rangle\Big)\le\Big(\frac{KB^*k}{N^2}\Big)^k+\exp(-B^*)\Big(\frac{KB^*}N\Big)^{2k}\,,$$
so that
$$\exp(-B^*)\Big(\frac{KB^*}N\Big)^{2k}\le\Big(\frac k{B^*}\Big)^k\Big(\frac{KB^*}N\Big)^{2k}=\Big(\frac{K^2B^*k}{N^2}\Big)^k\,,$$
and, equivalently,
$$\Big(\sum_{i\le N}\Big(\sum_{k\le M}g_{i,k}y_k\Big)^2\Big)^{1/2}\le B\,\Big(\sum_{k\le M}y_k^2\Big)^{1/2}\,. \tag{3.47}$$
has a Lipschitz constant $A\le KB\big(\sum_{k\le M}y_k^2\big)^{1/2}/\sqrt N=KB\,\|y\|/\sqrt N$.
Proof. Since
$$\frac\partial{\partial\sigma_i}\,f(\sigma)=\frac1{\sqrt N}\sum_{k\le M}g_{i,k}\,u'(S_k)\,y_k\,,$$
If we think that B² and B* are basically of order N, this shows that $\|\nabla\|^2$ is about 1/N, i.e. that the functions $\langle R_{1,2}\rangle$ and $\langle R_{1,1}\rangle$ have Lipschitz constants about $1/\sqrt N$.
Proof. With the customary abuse of notation we have
$$\frac\partial{\partial g_{i,k}}\langle R_{1,1}\rangle=\frac1{\sqrt N}\big(\langle R_{1,1}\sigma^1_iu'(S^1_k)\rangle-\langle R_{1,1}\rangle\langle\sigma^1_iu'(S^1_k)\rangle\big)=\frac1{\sqrt N}\,\big\langle f(\sigma^1)\,\sigma^1_iu'(S^1_k)\big\rangle\,,$$
where $f(\sigma^1)=R_{1,1}-\langle R_{1,1}\rangle$. We define $\dot\sigma^1_i=\sigma^1_i-\langle\sigma_i\rangle$ and $\dot u'(S^1_k)=u'(S^1_k)-\langle u'(S^1_k)\rangle$. Since ⟨f⟩ = 0 the identity
holds. Thus
$$\sum_{i,k}\Big(\frac\partial{\partial g_{i,k}}\langle R_{1,1}\rangle\Big)^2=\frac1N\sum_{i,k}\big\langle f(\sigma^1)\,\sigma^1_iu'(S^1_k)\big\rangle^2\le2\,(\mathrm I+\mathrm{II})\,,$$
and
$$\mathrm{II}=\frac1N\sum_{i\le N}\sum_{k\le M}\langle\sigma_i\rangle^2\big\langle f(\sigma^1)\,\dot u'(S^1_k)\big\rangle^2=\frac1N\sum_{i\le N}\langle\sigma_i\rangle^2\sum_{k\le M}\big\langle f(\sigma^1)f(\sigma^2)\,\dot u'(S^1_k)\dot u'(S^2_k)\big\rangle=\frac1N\sum_{i\le N}\langle\sigma_i\rangle^2\,\big\langle f(\sigma^1)f(\sigma^2)\,\big(\mathbf U(\sigma^1)-\langle\mathbf U\rangle\big)\cdot\big(\mathbf U(\sigma^2)-\langle\mathbf U\rangle\big)\big\rangle\,.$$
$$\mathrm I\le\frac{KM}N\,\langle f^2\rangle\le\frac{KB^*}{N^2}\,.$$
We note that (3.30) used for k = 1 implies
$$\langle R_{1,2}\rangle=\frac1N\sum_{i\le N}\langle\sigma_i\rangle^2\le\frac1N\sum_{i\le N}\langle\sigma_i^2\rangle=\frac1N\,\langle\|\sigma\|^2\rangle\le\frac{KB^*}N\,. \tag{3.51}$$
We take care in a similar manner of the term $\partial\langle R_{1,1}\rangle/\partial g_i$, and the case of $\langle R_{1,2}\rangle$ is similar.
$$\mathcal C=\{\mathbf g\,;\ B^*\le LN\,;\ B^2\le LN\}\,,$$
(To see that this is possible we recall Lemma A.9.1 and that $\mathsf E\exp(B^*/4)\le\exp LN$ by (3.33).) Let us think of $\langle R_{1,1}\rangle$ (resp. $\langle R_{1,2}\rangle$) as a function $f(\mathbf g)$, so that by Proposition 3.1.16, on $\mathcal C$ the gradient ∇f of f satisfies $\|\nabla f\|^2\le K/N$. By (3.17) we have
$$\forall k\ge1\;,\quad \int(f-m)^{2k}\,d\gamma'\le\Big(\frac{Kk}N\Big)^k\,, \tag{3.56}$$
where $m=\int f\,d\gamma'$. The rest of the proof consists simply in checking, as expected, that the set $\mathcal C^c$ is so small that (3.52) and (3.53) follow from (3.56). This is tedious and occupies the next half page. By definition of γ′, for any function h we have
$$\int h\,d\gamma'=\frac{\int_{\mathcal C}hW\,d\sigma}{\int_{\mathcal C}W\,d\sigma}=\frac{\int_{\mathcal C}h\,d\gamma}{\gamma(\mathcal C)}\,.$$
Thus
$$\mathsf E\big(\mathbf 1_{\mathcal C}(f-m)^{2k}\big)=\int_{\mathcal C}(f-m)^{2k}\,d\gamma=\gamma(\mathcal C)\int(f-m)^{2k}\,d\gamma'\le\Big(\frac{Kk}N\Big)^k\,,$$
and
$$\mathsf E(f-m)^{2k}=\mathsf E\big(\mathbf 1_{\mathcal C}(f-m)^{2k}\big)+\mathsf E\big(\mathbf 1_{\mathcal C^c}(f-m)^{2k}\big)\le\Big(\frac{Kk}N\Big)^k+\mathsf E\big(\mathbf 1_{\mathcal C^c}(f-m)^{2k}\big)\le\Big(\frac{Kk}N\Big)^k+\mathsf P(\mathcal C^c)^{1/2}\big(\mathsf E(f-m)^{4k}\big)^{1/2}\,.$$
Using (3.51), we see that |f| ≤ KB∗/N, and since γ is supported by C and B∗ ≤ LN on C we have |m| = |∫ f dγ| ≤ K. Also (E f^{4k})^{1/2} ≤ K^k by (3.30) and (3.32). Therefore (E(f − m)^{4k})^{1/2} ≤ K^k. Hence, recalling that by (3.54) we have P(C^c) ≤ exp(−N) and using that exp(−N/2) ≤ (2k/N)^k by (3.41) we obtain
E(f − m)^{2k} ≤ (Kk/N)^k + L exp(−N/2) K^k ≤ (Kk/N)^k .
Theorem 3.1.18. For k ≤ N/4, and assuming (3.6) and (3.7), we have

ν( (R_{1,1} − ν(R_{1,1}))^{2k} ) ≤ (Kk/N)^k   (3.57)

ν( (R_{1,2} − ν(R_{1,2}))^{2k} ) ≤ (Kk/N)^k ,   (3.58)
The notation, similar to that of the previous chapter, should not hide that
the procedure is different. The numbers q and ρ are not defined through
a system of equations, but by “the physical system”. They depend on N
and M . It would help to remember the definition (3.59) now. The purpose
of the present section is to show that q and ρ nearly satisfy the system of
“replica-symmetric” equations (3.69), (3.76) and (3.104) below. These equa-
tions should in principle allow the computation of q and ρ.
Since the cavity method, i.e. the idea of “bringing forward the influence
of the last spin” was successful in previous chapters, let us try it here. The
following approach is quite close to that of Section 2.2 so some familiarity
with that section would certainly help the reader who wishes to follow all
the details. Consider two numbers r and r̄, with r̄ ≤ r. Consider a centered Gaussian r.v. Y, independent of all the other r.v.s already considered, with E Y² = r, and consider 0 < t < 1. We write
S_{k,t} = (1/√N) Σ_{i≤N−1} g_{i,k} σ_i + √(t/N) g_{N,k} σ_N ,   (3.60)
 + (n(n+1)/2) ν_t( ε_{n+1} ε_{n+2} u′(S^{n+1}_{M,t}) u′(S^{n+2}_{M,t}) f )   (3.63)

III = −r ( Σ_{1≤ℓ<ℓ′≤n} ν_t(ε_ℓ ε_{ℓ′} f) − n Σ_{ℓ≤n} ν_t(ε_ℓ ε_{n+1} f) + (n(n+1)/2) ν_t(ε_{n+1} ε_{n+2} f) )   (3.64)

IV = −(r/2) ( Σ_{ℓ≤n} ν_t(ε_ℓ² f) − n ν_t(ε_{n+1}² f) )   (3.65)

V = (1/2)(r − r̄) ( Σ_{ℓ≤n} ν_t(ε_ℓ² f) − n ν_t(ε_{n+1}² f) ) .   (3.66)
(1/M) Σ_{k≤M} u′(S^1_{k,t}) u′(S^2_{k,t}) ,
and we can expect these to behave like constants. If we make the proper choice for r and r̄, the terms II and III will nearly cancel each other, while the term I will nearly cancel out with IV + V. For these choices (that will not be very hard to guess) we will have ν_t′(f) ≈ 0, i.e. ν(f) ≈ ν_0(f). The strategy to prove the replica-symmetric equations will then (predictably) be as follows. Using symmetry between sites, we have ρ = ν(R_{1,1}) = ν(ε_1²), and ν(ε_1²) ≈ ν_0(ε_1²) is easy to compute because the last spin decouples for ν_0.
Before we start the derivation of the replica-symmetric equations, let us
try to describe the overall strategy. This is best done by comparison with
the situation of Chapter 2. There, to compute the quantity q, that contained
information about the spins σi , we needed an auxiliary quantity r, that con-
tained information about the “spins” Sk . We could express r as a function
of q and then q as a function of r. Now we have two quantities q and ρ that contain information about the spins σ_i. To determine them we will need the two auxiliary quantities r and r̄, which “contain information about the spins S_k”. We will express q and ρ as functions of r and r̄, and in a second stage we will express r and r̄ as functions of q and ρ, and reach a system of four equations with four unknowns.
We now define r and r̄ as functions of q and ρ. Of course the forthcoming formulas have been guessed by analyzing the “cavity in M” arguments of Chapter 2. Consider independent standard Gaussian r.v.s ξ, z. Consider numbers 0 ≤ x < y, the r.v. θ = z√x + ξ√(y − x), and define
Ψ(x, y) = α E ( E_ξ( u′(θ) exp u(θ) ) / E_ξ exp u(θ) )²
 = (α/(y − x)) E ( E_ξ( ξ exp u(θ) ) / E_ξ exp u(θ) )² ,   (3.67)

using integration by parts (of course as usual E_ξ denotes averaging in ξ only). We also define

Ψ̄(x, y) = α E ( E_ξ( (u″(θ) + u′(θ)²) exp u(θ) ) / E_ξ exp u(θ) )
 = (α/(y − x)) E ( E_ξ( (ξ² − 1) exp u(θ) ) / E_ξ exp u(θ) ) ,   (3.68)

integrating by parts twice. We set

r = Ψ(q, ρ) ;  r̄ = Ψ̄(q, ρ) .   (3.69)
This makes sense because by the Cauchy-Schwarz inequality R_{1,2} ≤ R_{1,1}^{1/2} R_{2,2}^{1/2} and thus q = ν(R_{1,2}) ≤ ρ = ν(R_{1,1}). We also observe that from the first lines of (3.67) and (3.68) we have r, r̄ ≤ K(D). We first address a technical point by proving that r̄ ≤ r.
= ∫ exp w(x) dx ,

by integration by parts.
i.e.

E( (ξ² − 1) exp v(ξ) ) / E exp v(ξ) ≤ ( E( ξ exp v(ξ) ) / E exp v(ξ) )² .   (3.73)
Now we fix z and we use this inequality for the function v(x) = u(z√q + x√(ρ − q)). Combining with (3.67) and (3.68) yields the result.
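Inequality (3.73) says that under the Gaussian measure tilted by exp v, the variance of ξ is at most 1 whenever v is concave. This is easy to confirm numerically; in the sketch below the concave test function v(x) = −x⁴/4 − x and the grid quadrature are illustrative choices, not from the text.

```python
import numpy as np

# grid quadrature for the standard Gaussian measure
xs = np.linspace(-10.0, 10.0, 20001)
ws = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi) * (xs[1] - xs[0])

def tilted_moments(v):
    """First two moments of xi under the measure tilted by exp v(xi)."""
    w = ws * np.exp(v(xs))
    z = w.sum()
    return (xs * w).sum() / z, (xs**2 * w).sum() / z

m1, m2 = tilted_moments(lambda x: -x**4 / 4 - x)   # concave v, as (3.73) requires
lhs = m2 - 1.0          # E((xi^2 - 1) e^v) / E e^v
rhs = m1**2             # (E(xi e^v) / E e^v)^2
assert lhs <= rhs + 1e-12
```

Equivalently, lhs ≤ rhs states that the tilted variance m2 − m1² is at most 1; for non-concave v this can fail.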
We are now in a position to guess how to express q and ρ as functions of r and r̄.

ρ = 1/(2κ + r − r̄) + (r + h²)/(2κ + r − r̄)² + δ_1   (3.74)

q = (r + h²)/(2κ + r − r̄)² + δ_2 ,   (3.75)

ρ = 1/(2κ + r − r̄) + (r + h²)/(2κ + r − r̄)²   (3.76)

q = (r + h²)/(2κ + r − r̄)² .   (3.77)
These four equations are the “replica-symmetric” equations of the present
model. Please note that (3.76) and (3.77) are exact equations, in contrast
with (3.74) and (3.75). When we write the equations (3.69), (3.76) and (3.77),
we think of q, ρ, r, r as variables, while in (3.74) and (3.75) they are given by
(3.59). This follows our policy that a bit of informality is better than bloated
notation. This will not be confusing. Until the end of this section, q and ρ
keep the meaning (3.59), and afterwards we will revert to the notation qN,M
and ρN,M .
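To see the system in action, one can solve (3.69), (3.76) and (3.77) numerically by damped fixed-point iteration, computing Ψ and Ψ̄ from the first lines of (3.67) and (3.68) by grid quadrature. Everything concrete below — the smoothed-step choice u(x) = −log(1 + exp(c(τ − x))), the parameter values, the damping — is an illustrative assumption, not from the text.

```python
import numpy as np

xs = np.linspace(-8.0, 8.0, 641)          # Gaussian quadrature grid
w = np.exp(-xs**2 / 2)
w /= w.sum()                              # weights of a discretized N(0,1)

def make_u(tau, c):
    """An illustrative concave function u <= 0 (a smoothed step at tau)."""
    sig = lambda y: 1.0 / (1.0 + np.exp(-y))
    u = lambda x: -np.logaddexp(0.0, c * (tau - x))
    du = lambda x: c * sig(c * (tau - x))
    d2u = lambda x: -c**2 * sig(c * (tau - x)) * (1.0 - sig(c * (tau - x)))
    return u, du, d2u

def psi_pair(q, rho, u, du, d2u, alpha):
    """Psi(q, rho) and Psibar(q, rho), first lines of (3.67) and (3.68)."""
    theta = xs[:, None] * np.sqrt(q) + xs[None, :] * np.sqrt(rho - q)  # z, xi
    eu = np.exp(u(theta))
    den = eu @ w                                          # E_xi exp u(theta)
    psi = alpha * np.dot(((du(theta) * eu) @ w / den) ** 2, w)
    psibar = alpha * np.dot(((d2u(theta) + du(theta) ** 2) * eu) @ w / den, w)
    return psi, psibar

alpha, kappa, h, tau, c = 0.5, 1.0, 0.3, 0.0, 2.0         # illustrative values
u, du, d2u = make_u(tau, c)
q, rho = 0.1, 0.6
for _ in range(300):                                      # damped iteration
    r, rbar = psi_pair(q, rho, u, du, d2u, alpha)         # (3.69)
    d = 2 * kappa + r - rbar
    rho = 0.5 * rho + 0.5 * (1 / d + (r + h**2) / d**2)   # (3.76)
    q = 0.5 * q + 0.5 * ((r + h**2) / d**2)               # (3.77)

r, rbar = psi_pair(q, rho, u, du, d2u, alpha)
assert rbar <= r + 1e-9    # the technical point r-bar <= r proved in the text
assert 0 < q < rho         # consistent with q = nu(R_{1,2}) <= rho = nu(R_{1,1})
```

Note that every update produces ρ − q = 1/(2κ + r − r̄) > 0, so the iteration cannot leave the domain 0 < q < ρ where Ψ and Ψ̄ are defined.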
Proof. Symmetry between sites entails ρ = ν(R_{1,1}) = ν(ε_1²), q = ν(R_{1,2}) = ν(ε_1 ε_2), so it suffices to show that ν_0(ε_1²) is given by the right-hand side of (3.76) and ν_0(ε_1 ε_2) is given by the right-hand side of (3.77).
We observe that, for ν_0, the last spin decouples from the others (which is a major reason behind the definition of ν_0) so that

ν_0(ε_1²) = E ( (1/Z) ∫ ε² exp( ε(Y + hg_N) − (ε²/2)(2κ + r − r̄) ) dε )   (3.78)

ν_0(ε_1 ε_2) = E ( (1/Z) ∫ ε exp( ε(Y + hg_N) − (ε²/2)(2κ + r − r̄) ) dε )² ,   (3.79)

where

Z = ∫ exp( ε(Y + hg_N) − (ε²/2)(2κ + r − r̄) ) dε .
We compute these Gaussian integrals as follows. If z is a centered Gaussian r.v. and d is a number, writing z²e^{dz} = z(ze^{dz}), integration by parts yields

E( z e^{dz} ) = d E(z²) E( e^{dz} ) .

Thus

E( z² e^{dz} ) / E( e^{dz} ) = E z² + d² (E z²)² .

Using this for d = Y + hg_N, E z² = 1/(2κ + r − r̄) we get

⟨ε_1²⟩_0 = 1/(2κ + r − r̄) + (Y + hg_N)²/(2κ + r − r̄)² ,
and, taking expectation,

ν_0(ε_1²) = 1/(2κ + r − r̄) + (r + h²)/(2κ + r − r̄)² ,   (3.80)
and we compute ν0 (ε1 ε2 ) similarly.
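The Gaussian integration-by-parts identities used in this computation are easy to verify numerically; the sketch below checks them by quadrature for a centered Gaussian z of variance s (the values d = 0.7, s = 0.4 are arbitrary):

```python
import numpy as np

d, s = 0.7, 0.4
xs = np.linspace(-12.0, 12.0, 40001)
dens = np.exp(-xs**2 / (2 * s)) / np.sqrt(2 * np.pi * s) * (xs[1] - xs[0])
E = lambda f: (f * dens).sum()          # quadrature expectation

e0 = E(np.exp(d * xs))                  # E e^{dz}
ez = E(xs * np.exp(d * xs))             # E z e^{dz}
ez2 = E(xs**2 * np.exp(d * xs))         # E z^2 e^{dz}

assert abs(ez - d * s * e0) < 1e-8               # E(z e^{dz}) = d E z^2 E e^{dz}
assert abs(ez2 / e0 - (s + d**2 * s**2)) < 1e-8  # = E z^2 + d^2 (E z^2)^2
```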
We now start the real work, the proof that when N is large, δ1 and δ2
in Proposition 3.2.4 are small. In order to have a chance to make estimates
using Proposition 3.2.1, we need some integrability properties of ε = σN , and
we address this technical point first. We will prove an exponential inequality,
which is quite stronger than what we really need, but the proof is not any
harder than that of weaker statements. We start by a general principle.
Lemma 3.2.5. Consider a concave function T(σ) ≤ 0 on R^N, numbers (a_i)_{i≤N}, numbers κ, κ′ > 0 and a convex subset C of R^N. Consider the probability measure G on R^N given by

∀B ,  G(B) = (1/Z) ∫_{B∩C} exp( T(σ) − κ‖σ‖² − κ′σ_N² + Σ_{i≤N} a_i σ_i ) dσ ,   (3.81)
C = {ε ∈ R ; ∃ρ ∈ RN −1 , (ρ, ε) ∈ C} .
Then this function is concave and the law μ of σN under G is the probability
measure on C with density proportional to exp w(x), where
The definition of μ as the law of σ_N under G implies that for any function v (writing ε = σ_N),

∫ v(x) dμ(x) = (1/Z) ∫_C v(ε) exp( T(σ) − κ‖σ‖² − κ′σ_N² + Σ_{i≤N} a_i σ_i ) dσ .   (3.84)
3.2 The Replica-Symmetric Equations 213
 = (1/Z) ∫_C v(ε) exp( f(ε) − (κ + κ′)ε² + a_N ε ) dε
 = (1/Z) ∫_C v(ε) exp w(ε) dε .
This proves that μ has a density proportional to exp w(ε). To finish the proof it suffices to show that f is concave. Let us write

C(ε) = {ρ ∈ R^{N−1} ; (ρ, ε) ∈ C} ,
νt (exp((f1 + f2 )2 /K)) ≤ K (of course for a different K). This follows from
the convexity of the function x → exp x2 .
Proof. The Gibbs measure corresponding to the Hamiltonian (3.61) is given by the formula (3.81) for C = R^N, T(σ) = Σ_{k≤M} u(S_{k,t}(σ)), a_i = hg_i if i < N, a_N = hg_N + √(1 − t) Y and κ′ = (1 − t)(r − r̄)/2. Lemma 3.2.5 implies that the function

f(σ_N) = log ∫ exp( Σ_{k≤M} u(S_{k,t}) − κ Σ_{i≤N−1} σ_i² + h Σ_{i≤N−1} g_i σ_i ) dρ   (3.85)
⟨ exp( (σ_N − ⟨σ_N⟩_t)² / K ) ⟩_t ≤ 4 ,   (3.87)
Therefore we get

f′(0) = √(t/N) Σ_{k≤M} g_{N,k} ⟨u′(S_{k,0})⟩ ,
where ⟨·⟩ is a certain Gibbs average that does not depend on the r.v.s g_{N,k}. Let us denote by E_0 expectation in the r.v.s g_{N,k} only. Then, since |u′| ≤ D,

E_0 f′(0)² = (t/N) Σ_{k≤M} ⟨u′(S_{k,0})⟩² ≤ αD² ,

so that, since under E_0 the r.v. f′(0) is Gaussian with variance ≤ αD²,

E_0 exp( f′(0)²/(4αD²) ) ≤ (1 − 1/2)^{−1/2} ≤ 2 .

Therefore

E exp( f′(0)²/(4αD²) ) = E E_0 exp( f′(0)²/(4αD²) ) ≤ 2 .
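The last two bounds rest on the classical formula E exp(λg²) = (1 − 2λv)^{−1/2} for a centered Gaussian g of variance v and λ < 1/(2v); with v ≤ αD² and λ = 1/(4αD²) this gives (1 − 1/2)^{−1/2} = √2 ≤ 2. A quick quadrature check (the variance v = 0.7 is an arbitrary stand-in for αD²):

```python
import math

v = 0.7                    # plays the role of alpha * D^2
lam = 1 / (4 * v)          # so 2 * lam * v = 1/2
step = 0.001
total = sum(math.exp(lam * x * x - x * x / (2 * v))
            for x in (i * step for i in range(-12000, 12001)))
value = total * step / math.sqrt(2 * math.pi * v)   # E exp(lam * g^2)
exact = (1 - 2 * lam * v) ** -0.5                   # = sqrt(2)
assert abs(value - exact) < 1e-6
assert exact <= 2
```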
Despite this excellent control, the fact that σN is not bounded does create
hardship. For example, it does not seem possible to use the argument of
Lemma 2.3.7 to compare ν(f ) and νt (f ) when f ≥ 0.
We turn to the study of terms I and II of Proposition 3.2.1. Let us consider the Hamiltonian

−H_{N,M−1,t}(σ) = Σ_{k≤M−1} u(S_{k,t}(σ)) − κ‖σ‖² + h Σ_{i≤N} g_i σ_i + σ_N √(1 − t) Y − (1 − t)(r − r̄) σ_N²/2 .   (3.88)

The difference with (3.61) is that the summation is over k ≤ M − 1 rather than over k ≤ M. We denote by ⟨·⟩_{t,∼} an average for the corresponding Gibbs measure.
We consider standard Gaussian r.v.s z, (ξ_ℓ) that are independent of all the other r.v.s already considered, and we set

θ_ℓ = z√q + ξ_ℓ√(ρ − q) .   (3.89)

For 0 ≤ v ≤ 1 we define

S_v^ℓ = √v S_{M,t}^ℓ + √(1 − v) θ_ℓ .   (3.90)
Here as usual Eξ means expectation in all the r.v.s labeled ξ, or, equivalently
here, in ξ 1 . This really is the same as definition (2.35). The notation is a bit
different (there is an expectation E_ξ in the denominator) simply because in (2.35) we made the convention that this expectation E_ξ was built into the average ⟨·⟩_{t,∼}, and we do not do it here (for the simple reason that we do
not want to have to remind the reader of this each time we write a similar
formula). Obviously we have
νt,1 (f ) = νt (f ) .
The magic of the definition of νt,v is revealed by the following, whose proof
is nearly identical to that of Lemma 2.3.1.
Lemma 3.2.7. Consider a function f on Σ_N^n. Then we have

∀ℓ , 1 ≤ ℓ ≤ 4 , |u^{(ℓ)}| ≤ D .   (3.95)
The proof is nearly identical to that of (2.59) (except that one does not use
Hölder’s inequality in the last step). The new feature is that there are more
terms when we integrate by parts. Defining
(S_v^ℓ)′ = (1/(2√v)) S_{M,t}^ℓ − (1/(2√(1 − v))) θ_ℓ ,

we then have

E( S_v^ℓ (S_v^ℓ)′ ) = (1/2)(R_{ℓ,ℓ} − ρ) ,

while in (2.59) we had E(S_v S_v′) = 0. This is what creates the new terms ν_{t,v}(|f||R_{ℓ,ℓ} − ρ|) in (3.96) compared to (2.59). In (2.59) these terms do not occur because there R_{ℓ,ℓ} = 1 = ρ.
We have proved in Theorem 3.1.18 that, with respect to ν, we have R_{1,2} ≈ q and R_{1,1} ≈ ρ. If the same is true with respect to ν_{t,v} then (3.96) will go a long way to fulfill our program that the terms of Proposition 3.2.1 nearly cancel out.
The first step will be to prove that in the bound (3.96) we can replace ν_{t,v} by ν_t in the right-hand side. (So that to use this bound it will suffice to know that R_{1,2} ≈ q and R_{1,1} ≈ ρ for ν_t.) Unfortunately we cannot immediately
write a differential inequality such as |dνt,v (f )/dv| ≤ K(n)νt,v (f ) when f ≥ 0
because it is not true that the quantities |R_{ℓ,ℓ} − ρ| and |R_{ℓ,ℓ′} − q| are bounded.
But it is true that they are “almost bounded” in the sense that they are
bounded outside an exponentially small set, namely that we can find K for
which
The reader wishing to skip the proof of this purely technical point can
jump ahead to (3.105) below. To prove these inequalities, we observe from
(3.91) that when f is a function on ΣN (that does not depend on the r.v.s
ξ ) then
νt,v (f ) = E f t,v ,
where ⟨·⟩_{t,v} is a Gibbs average for the Hamiltonian

−H(σ) = Σ_{k<M} u(S_{k,t}(σ)) + u_v( √v S_{M,t}(σ) + √(1 − v) √q z ) − κ‖σ‖² + h Σ_{i≤N} g_i σ_i + σ_N √(1 − t) Y − (1 − t)(r − r̄) σ_N²/2 ,   (3.99)
Another proof of the concavity of u_v is as follows. Writing X = x + ξ√(1 − v)√(ρ − q), the concavity of u_v, i.e. the fact that u_v″ ≤ 0, means that

E( (u′(X)² + u″(X)) e^{u(X)} ) / E e^{u(X)} ≤ ( E( u′(X) e^{u(X)} ) / E e^{u(X)} )² ,

an inequality that we can prove by applying (3.73) to the function v(ξ) = u(x + ξ√(1 − v)√(ρ − q)) and integration by parts as in (3.67) and (3.68).
There is nothing to change in the proof of Lemma 3.2.6 to obtain

ν_{t,v}( exp(σ_N²/K) ) ≤ K ,   (3.100)
where ⟨·⟩_{t,v} denotes an average for the Gibbs measure with Hamiltonian
(3.99). We now prove that
ν_{t,v}( exp(‖σ‖²/K) ) ≤ exp LN .   (3.102)
For this, we recall that E exp(B∗/4) ≤ exp LN by (3.33), and we denote by K_0 the constant K in (3.101). We define K_1 = 8K_0/κ. Using Hölder's inequality in the first inequality and (3.101) in the second inequality we get

ν_{t,v}( exp(‖σ‖²/K_1) ) = E ⟨ exp(‖σ‖²/K_1) ⟩_{t,v} ≤ E ( ⟨ exp( (κ/2)‖σ‖² ) ⟩_{t,v} )^{2/(κK_1)}
 ≤ E exp( (2K_0/(κK_1)) B∗ ) = E exp(B∗/4) ≤ exp LN .
Since

N|R_{1,2}| = |σ^1 · σ^2| ≤ ‖σ^1‖ ‖σ^2‖ ,

we have

|R_{1,2}| ≥ t ⇒ (Nt)² ≤ ‖σ^1‖² ‖σ^2‖² ⇒ ‖σ^1‖² ≥ tN or ‖σ^2‖² ≥ tN ,
Since by (3.28) and (3.32) we have |ρ| ≤ K and |q| ≤ K, (3.97) and (3.98)
follow. Let us also note from (3.102) by a similar argument that
ν_{t,v}( (R_{1,2} − q)^8 ) ≤ K ;  ν_{t,v}( (R_{1,1} − ρ)^8 ) ≤ K ,   (3.104)
using (3.104) and (3.98). We then proceed in a similar manner for |R_{ℓ,ℓ′} − q|. In this fashion, we deduce from (3.96) that, if f is any function on Σ_N^4, then

| (d/dv) ν_{t,v}(f) | ≤ K ν_{t,v}(|f|) + K exp(−N) sup_v ν_{t,v}(f²)^{1/2} .   (3.105)
We will prove only (3.108), since the proof of the other relations is entirely similar. Let ϕ(v) = α ν_{t,v}( ε_ℓ ε_{ℓ′} u′(S_v^ℓ) u′(S_v^{ℓ′}) f ). Lemma 3.2.7 implies

|ϕ(1) − ϕ(0)| = | α ν_t( ε_ℓ ε_{ℓ′} u′(S_{M,t}^ℓ) u′(S_{M,t}^{ℓ′}) f ) − r ν_0( ε_ℓ ε_{ℓ′} f ) | .   (3.109)

On the other hand, |ϕ(1) − ϕ(0)| ≤ sup_v |ϕ′(v)|, and by (3.96) (used for ε_ℓ ε_{ℓ′} f rather than f) we obtain

|ϕ′(v)| ≤ K ( Σ_{ℓ_1≤3} ν_{t,v}( |ε_ℓ ε_{ℓ′} f| |R_{ℓ_1,ℓ_1} − ρ| ) + Σ_{1≤ℓ_1<ℓ_2≤4} ν_{t,v}( |ε_ℓ ε_{ℓ′} f| |R_{ℓ_1,ℓ_2} − q| ) + (1/N) ν_{t,v}( |ε_ℓ ε_{ℓ′} f| ) ) .   (3.110)
Proof. Since q = q_1 it suffices to prove that q_t′ = dq_t/dt satisfies |q_t′| ≤ K/N (and similarly for ρ_t). Since q_t = ν_t(R_{1,2}), q_t′ = ν_t′(R_{1,2}) is given by Proposition 3.2.1 for f = R_{1,2}.
A key observation is that the five terms of this proposition cancel out
(as they should!) if f is a constant, i.e. is not random and does not depend
on σ 1 , σ 2 , . . .. Therefore to evaluate νt (R1,2 ) we can in each of these terms
replace f = R1,2 by R1,2 − qt , because the contributions of qt to the various
terms cancel out.
The point of doing this is that the quantity R_{1,2} − q_t is small (for ν_t), as is shown by (3.112), and therefore each of the terms of Proposition 3.2.1 is at most K/√N. This is seen by using that |u′| ≤ D, |u″| ≤ D, Hölder's inequality, (3.112) (and (3.100) to take care of the terms ε_ℓ ε_{ℓ′}).
This argument is enough to prove (3.112) with a bound K/√N rather than K/N. This is all that is required to prove Proposition 3.2.12 below. The rest of this proof describes the extra work required to reach the correct rate K/N (just for the beauty of it).
We proceed as in the proof of Proposition 3.2.10 (with now f = R_{1,2} − q_t), but in the right-hand side of (3.110) we use Hölder's inequality as in

ν_{t,v}( |ε_ℓ ε_{ℓ′} f (R_{ℓ_1,ℓ_1} − ρ)| ) ≤ ν_{t,v}( (ε_ℓ ε_{ℓ′})² )^{1/2} ν_{t,v}( f^4 )^{1/4} ν_{t,v}( (R_{ℓ_1,ℓ_1} − ρ)^4 )^{1/4} ,
Similarly we get

|ρ_t′| ≤ (K/√N)( |q − q_t| + |ρ − ρ_t| ) + K/N ,

so that if ψ(t) = |q − q_t| + |ρ − ρ_t| the right derivative ψ′(t) satisfies

|ψ′(t)| ≤ K( ψ(t)/√N + 1/N ) ≤ K ψ(t) + K/N .

Since ψ(1) = 0, Lemma A.13.1 shows that ψ(t) ≤ K/N.
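We do not have Lemma A.13.1 in front of us here, but assuming it is the standard Gronwall-type bound, the last step runs as follows (a sketch with generic constants):

```latex
\text{Assume } |\psi'(t)| \le K\psi(t) + K/N \text{ on } [0,1] \text{ with } \psi(1) = 0. \text{ Then}
\frac{d}{dt}\Big( e^{Kt}\big(\psi(t) + \tfrac{1}{N}\big) \Big)
  = e^{Kt}\Big( \psi'(t) + K\psi(t) + \tfrac{K}{N} \Big) \ge 0 ,
\text{so } t \mapsto e^{Kt}(\psi(t) + 1/N) \text{ is non-decreasing, and for } t \le 1,
\psi(t) + \frac{1}{N} \le e^{K(1-t)}\Big( \psi(1) + \frac{1}{N} \Big) = \frac{e^{K(1-t)}}{N} ,
\text{i.e. } \psi(t) \le \frac{e^{K} - 1}{N} = \frac{K'}{N} .
```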
Using (3.113), (3.114) and (3.112) we get ν_t( (R_{1,2} − q)^4 )^{1/4} ≤ K/√N and ν_t( (R_{1,1} − ρ)^4 )^{1/4} ≤ K/√N, so that combining with (3.107) we have proved that |ν_t′(f)| ≤ K/N for f = ε_1² or f = ε_1 ε_2, and therefore the following.
Proposition 3.2.12. We have

|ν(ε_1²) − ν_0(ε_1²)| ≤ K/√N ;  |ν(ε_1 ε_2) − ν_0(ε_1 ε_2)| ≤ K/√N .   (3.115)
Combining with Proposition 3.2.4, this shows that (q, ρ) is a solution of the system of replica-symmetric equations (3.69), (3.76) and (3.104) “with accuracy K/√N”. Letting N → ∞ this proves in particular that this system does have a solution, which did not seem obvious beforehand.
Let us consider the function

F(q, ρ) = α E log E_ξ exp u( z√q + ξ√(ρ − q) ) + (1/2) q/(ρ − q) + (1/2) log(ρ − q) − κρ + (h²/2)(ρ − q) ,   (3.116)

which is defined for 0 ≤ q < ρ. It is elementary (calculus and integration by parts as in Lemma 2.4.4) to show that the conditions ∂F/∂ρ = 0 = ∂F/∂q mean that (3.69), (3.76) and (3.104) are satisfied.
We would like to prove that for large N the quantity

(1/N) E log ∫ exp( −H_{N,M,t}(σ) ) dσ   (3.117)

is nearly F(q, ρ) + log(2eπ)/2. Unfortunately we see no way to do this unless we know something about the uniqueness of the solutions of (3.69), (3.76), (3.104).
Research Problem 3.2.13. (Level 2) Find general conditions under which
the equations (3.69), (3.76), (3.104) have a unique solution.
As we will show in the next section, Shcherbina and Tirozzi managed to
solve this problem in a very important case. Before we turn to this, we must
however address the taste of unfinished work left by Proposition 3.2.12. We
turn to the proof of the correct result.
We have

Σ_{ℓ,ℓ′} | ν_t( f(R_{ℓ,ℓ′} − q) ) − E⟨ f(R_{ℓ,ℓ′} − q) ⟩_{t,∼} | ≤ K/N ,

by proceeding as in (2.65) because, as expected, the extra factor R_{ℓ,ℓ′} − q allows one to gain a factor 1/√N (and similarly for R_{ℓ,ℓ} − ρ). Therefore to prove (3.119) it suffices to prove that

| ν_t( f(R_{ℓ,ℓ′} − q) ) | ≤ K/N ;  | ν_t( f(R_{ℓ,ℓ} − ρ) ) | ≤ K/N .

In the same manner that we have proved the inequality |ν_t′(f)| ≤ K/√N, we show now that |ν_t′( f(R_{ℓ,ℓ′} − q) )| ≤ K/N and |ν_t′( f(R_{ℓ,ℓ} − ρ) )| ≤ K/N (gaining
In Section 3.2 these were denoted simply q and ρ, but in this section we now find it more convenient to denote by q and ρ two “variables” with 0 ≤ q < ρ.
As pointed out in Section 3.1, the case where exp u(x) = 1{x≥τ } is of
special interest. In this case, we will prove that the system of equations (3.69),
(3.76) and (3.104) has a unique solution. The function (3.116) takes the form
F(q, ρ) = α E log P_ξ( z√q + ξ√(ρ − q) ≥ τ ) + (1/2) q/(ρ − q) + (1/2) log(ρ − q) − κρ + (h²/2)(ρ − q) .   (3.121)
We observe that
3.3 Controlling the Solutions of the RS Equations 225
P_ξ( z√q + ξ√(ρ − q) ≥ τ ) = N( (τ − z√q)/√(ρ − q) ) ,
where
N (x) = P(ξ ≥ x) .
We define

RS_0(α) = α E log N( (τ − z√q_0)/√(ρ_0 − q_0) ) + (1/2) q_0/(ρ_0 − q_0) + (1/2) log(ρ_0 − q_0) − κρ_0 + (h²/2)(ρ_0 − q_0) .   (3.123)
The reader will recognize that this is F (q0 , ρ0 ), where F is defined in (3.121).
The value of κ and h will be kept implicit. (We recall that τ has been fixed
once and for all.) The main result of this chapter is as follows.
Theorem 3.3.2. Consider α_0 < 2, 0 < κ_0 < κ_1, h_0 > 0, ε > 0. Then we can find ε′ > 0 with the following property. Consider any concave function u ≤ 0 with the following properties:

x ≥ τ ⇒ u(x) = 0   (3.124)

exp u(τ − ε′) ≤ ε′   (3.125)

u is four times differentiable and |u^{(ℓ)}| is bounded for 1 ≤ ℓ ≤ 4 .   (3.126)

Then for N large enough, and if H_{N,M} denotes the Hamiltonian (3.1), we have

| (1/N) E log ∫ exp( −H_{N,M}(σ) ) dσ − RS_0(M/N) − (1/2) log(2eπ) | ≤ ε   (3.127)

whenever κ_0 ≤ κ ≤ κ_1, h ≤ h_0, M/N ≤ α_0.
In particular, we succeed in computing

lim_{u→1_{x≥τ}} lim_{N→∞, M/N→α} (1/N) E log ∫ exp( −H_{N,M}(σ) ) dσ .   (3.128)
In Volume II we will prove the very interesting fact that the limits can be
interchanged, solving the problem of computing the “part of the sphere SN
that belongs to the intersection of M random half-spaces”.
Besides Theorem 3.3.1, the proof of Theorem 3.3.2 requires the following.
Proposition 3.3.3. Consider κ0 > 0, α0 < 2 and h0 > 0. Then we can find
a number C, depending only on κ0 , α0 and h0 , such that if κ ≥ κ0 , α ≤ α0
and h ≤ h0 , then for any concave function u ≤ 0 that satisfies (3.124), and
whenever q and ρ satisfy the system of equations (3.69), (3.76) and (3.104),
we have
q, ρ ≤ C ;  1/(ρ − q) ≤ C .   (3.129)
We recall that the numbers q0 and ρ0 are given by (3.122).
Corollary 3.3.4. Given 0 < κ_0 < κ_1, α_0 < 2, h_0 > 0 and ε > 0, we can find a number ε′ > 0 such that whenever the concave function u ≤ 0 satisfies (3.124) and (3.125), whenever κ_0 ≤ κ ≤ κ_1, h ≤ h_0 and α ≤ α_0, given any numbers 0 ≤ q ≤ ρ that satisfy the equations (3.69), (3.76) and (3.104) we have

|q − q_0| ≤ ε ;  |ρ − ρ_0| ≤ ε .
It is here that Theorem 3.3.1 is really needed. Without it, it seems very
difficult to control q and ρ.
Proof. This is a simple compactness argument now that we know (3.129). We simply sketch the proof of this “soft” argument. Assume for contradiction that we can find a sequence ε_n′ → 0, a sequence u_n of functions that satisfy (3.124) and (3.125) for ε_n′ rather than ε′, numbers κ_0 ≤ κ_n ≤ κ_1, h_n ≤ h_0, α_n ≤ α_0, and numbers q_n and ρ_n that satisfy the corresponding equations (3.69), (3.76) and (3.104), and are such that |q_n − q_0| ≥ ε and |ρ_n − ρ_0| ≥ ε. By Proposition 3.3.3 we have q_n, ρ_n, 1/(ρ_n − q_n) ≤ C. This boundedness permits us to take converging subsequences. So, without loss of generality we can assume that the sequences κ_n, h_n, α_n, q_n and ρ_n have limits called κ, h, α, q and ρ respectively. Moreover 1/(ρ − q) ≤ C, so in particular q < ρ. Finally we have |q − q_0| ≥ ε and |ρ − ρ_0| ≥ ε. If one writes explicitly the equations (3.122), it is obvious from the fact that (q_n, ρ_n) is a solution to the equations (3.69), (3.76) and (3.104) (for κ_n and h_n rather than for κ and h) that (q, ρ) is a solution to these equations. But this is absurd, since by Theorem 3.3.1 one must then have q = q_0 and ρ = ρ_0.
Once this has been obtained the proof of Theorem 3.3.2 is easy following
the approach of the second proof of Theorem 2.4.2, so we complete it first.
We recall the bracket · t,∼ associated with the Hamiltonian (3.88). To
lighten notation we write · ∼ rather than · 1,∼ .
Lemma 3.3.5. Assume that the function u satisfies (3.7). Writing g_i = g_{i,M}, we have

E log ⟨ exp u( (1/√N) Σ_{i≤N} g_i σ_i ) ⟩_∼ = E log E_ξ exp u( z√q_{N,M} + ξ√(ρ_{N,M} − q_{N,M}) ) + R ,

where

|R| ≤ K(κ_0, h_0, D)/√N .
Now that we have proved Theorem 3.1.18 this is simply an occurrence of the
general principle explained in Section 1.5. We compare a quantity of the type
(1.140) with the corresponding quantity (1.141) when f (x) = exp u(x) and
w(x) = log x, when μ is Gibbs’ measure.
Proof. We consider

S_v = √(v/N) Σ_{i≤N} g_i σ_i + √(1 − v)( z√q_{N,M} + ξ√(ρ_{N,M} − q_{N,M}) )

and

ϕ(v) = E log ⟨ E_ξ exp u(S_v) ⟩_∼ .

We differentiate and integrate by parts to obtain:
and

p_{N,0} = (1/2) log(π/κ) + h²/(4κ) .
Informally, the rest of the proof goes as follows. By Lemma 3.3.5 we have

E log ⟨ exp u( (1/√N) Σ_{i≤N} g_{i,M} σ_i ) ⟩_∼ ≈ E log E_ξ exp u( z√q_{N,M} + ξ√(ρ_{N,M} − q_{N,M}) ) .

Now by Propositions 3.2.4 and 3.2.12 the numbers q_{N,M} and ρ_{N,M} are near solutions of the system of equations (3.69), (3.76) and (3.104). As N → ∞ (u being fixed) these quantities become close (uniformly in α) to a true solution of these equations. Thus, by Corollary 3.3.4, and provided u satisfies (3.124) and (3.125) and ε′ is small enough, we have q_{N,M} ≈ q_0 (= q_0(M/N, κ, h)) and ρ_{N,M} ≈ ρ_0 (= ρ_0(M/N, κ, h)) and thus

E log E_ξ exp u( z√q_{N,M} + ξ√(ρ_{N,M} − q_{N,M}) ) ≈ E log E_ξ exp u( z√q_0 + ξ√(ρ_0 − q_0) ) ≈ E log P_ξ( z√q_0 + ξ√(ρ_0 − q_0) ≥ τ ) ,

using again (3.124) and (3.125). Now (3.130) implies

(1/N) E log P_ξ( z√q_0 + ξ√(ρ_0 − q_0) ≥ τ ) ≈ ∫_{(M−1)/N}^{M/N} (d/dα) RS_0(α) dα = RS_0(M/N, κ, h) − RS_0((M−1)/N, κ, h) .
This chain of approximations yields

p_{N,M} − p_{N,M−1} ≈ RS_0(M/N, κ, h) − RS_0((M−1)/N, κ, h) ,

where ≈ means “with error ≤ ε/N”. Summation over M of these relations together with the case M = 0 yields the desired result.
It is straightforward to write an “ε-δ proof” following the previous scheme,
so there seems to be no point in doing it here.
Our next goal is the proof of Proposition 3.3.3, which will reveal how the initial condition α_0 < 2 comes into play. Preparing for this proof, we consider the function

A(x) = −(d/dx) log N(x) = e^{−x²/2} / ( √(2π) N(x) ) ,   (3.132)
about which we collect simple facts.
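Among the simple facts one expects about A: it is the Gaussian hazard function φ(x)/N(x), it is increasing, and A(x) > x with A(x) − x → 0 as x → ∞. A quick check using erfc (the sample points are arbitrary):

```python
import math

def Nbar(x):                # N(x) = P(xi >= x) for standard Gaussian xi
    return 0.5 * math.erfc(x / math.sqrt(2))

def A(x):                   # A(x) = -(d/dx) log N(x) = phi(x) / N(x)
    return math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * Nbar(x))

pts = [-3.0, -1.0, 0.0, 1.0, 3.0, 6.0]
assert all(A(x) > x for x in pts)                        # A(x) > x
assert all(A(a) < A(b) for a, b in zip(pts, pts[1:]))    # A is increasing
assert A(6.0) - 6.0 < 0.2                                # A(x) - x -> 0

# definition check: A = -(log N)' by central differences
h = 1e-5
num = (math.log(Nbar(0.7 - h)) - math.log(Nbar(0.7 + h))) / (2 * h)
assert abs(num - A(0.7)) < 1e-6
```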
where the last equality uses that r − r̄ = Ψ(q, ρ) − Ψ̄(q, ρ) and (3.144). Integration by parts yields

−E v″(z√q) = −(1/√q) E( z v′(z√q) ) .

q = (r + h²)/(2κ + r − r̄)² ≥ r/(2κ + r − r̄)² = (ρ − q)² r = α(ρ − q)² E( v′(z√q) )² .
q = (r + h²)/(2κ + r − r̄)² ≤ h_0²/(2κ_0)² + r(ρ − q)² .   (3.147)
Since α_0 < 2 we can find a > 1/2 with aα_0 < 1. Then by (3.143) there is a number q(τ, a) satisfying

q ≥ q(τ, a) ⇒ E( (z − τ/√q)² 1_{z≤τ/√q} ) < a ⇒ E( Y² 1_{Y≥0} ) ≤ aq/(ρ − q) .
Thus, using (3.148) we get, using also that ρ − q ≤ 1/(2κ_0) in the second inequality,

q ≥ q(τ, a) ⇒ r(ρ − q)² ≤ αL(ρ − q) + aαq ≤ αL/(2κ_0) + aαq ,
q ≥ q(τ, a) ⇒ q ≤ h_0²/(2κ_0)² + αL/(2κ_0) + aαq ,

so that

q ≥ q(τ, a) ⇒ (1 − aα) q ≤ h_0²/(2κ_0)² + αL/(2κ_0) .
Since aα ≤ aα0 < 1, this proves that q (and hence ρ) is bounded by a number
depending only on h0 , κ0 and α0 .
It remains to prove Theorem 3.3.1. The proof is unrelated to the methods
of this work. While it is not difficult to follow line by line, the author cannot
really explain why it works (or how Shcherbina and Tirozzi could ever find
it). The need for a more general and enlightening approach is rather keen
here.
We make the change of variable x = q/(ρ − q), so that q = xρ/(1 + x), ρ − q = ρ/(1 + x), and

F(q, ρ) = G(x, ρ) := α E log N( τ√(1 + x)/√ρ − z√x ) + x/2 + (1/2) log ρ − (1/2) log(1 + x) − κρ + h²ρ/(2(1 + x)) .   (3.149)
Proposition 3.3.11. For x > 0 and ρ > 0 we have

∂²G/∂ρ² < 0 ;  (∂/∂x)( ((x + 1)/x) ∂G/∂x ) > 0 .   (3.150)
Corollary 3.3.12. a) Given ρ > 0 there exists at most one value x_1 such that (∂G/∂x)(x_1, ρ) = 0. If such a value exists, the function x → G(x, ρ) attains its minimum at x_1.
b) Given ρ > 0 there exists at most one value q1 such that (∂F/∂q)(q1 , ρ) =
0. If such a value exists, the function q → F (q, ρ) attains its minimum at q1 .
Proof. a) By the second part of (3.150) we have ∂G(x, ρ)/∂x < 0 for x < x1
while ∂G(x, ρ)/∂x > 0 for x > x1 .
b) Follows from a) since at given ρ the change of variable x = q/(ρ − q)
is monotonic.
Proof of Theorem 3.3.1. Suppose that we have ∂G/∂x = 0 and ∂G/∂ρ = 0
at the points (x1 , ρ1 ) and (x2 , ρ2 ). Then, since ∂ 2 G/∂ρ2 < 0, we have
G(x2 , ρ1 ) < G(x2 , ρ2 ) unless ρ2 = ρ1 . By the first part of Corollary
3.3.12 used for ρ = ρ1 we have G(x1 , ρ1 ) < G(x2 , ρ1 ) unless x1 = x2 . So
G(x1 , ρ1 ) < G(x2 , ρ2 ) unless (x1 , ρ1 ) = (x2 , ρ2 ). Reversing the argument
shows that (x1 , ρ1 ) = (x2 , ρ2 ).
We write W = τ√(1 + x)/√ρ − z√x.
and thus

(1 + x) E( (z/√x) A(W) ) = ( τ√(1 + x)/√ρ ) E A(W) − E A(W)² .   (3.154)
x ρ
so that if λ²A² ≤ κ,

∫ exp(λf) dμ ≤ (1 − λ²A²/κ)^{−1} ( ∫ exp(λf/2) dμ )² .

By iteration we get

∫ exp(λf) dμ ≤ ( ∏_{0≤ℓ<k} (1 − λ²A²/(κ 2^{2ℓ}))^{−2^ℓ} ) ( ∫ exp(λf/2^k) dμ )^{2^k} .
We go back to the case where the spins take values in {−1, 1}. The Curie-Weiss model is the “canonical” model for mean-field (deterministic) ferromagnetic interaction, i.e. interaction where the spins tend to align with each other. The simplest Hamiltonian that achieves this contains a term σ_i σ_j for each pair of spins, so it is (proportional to) Σ_{i<j} σ_i σ_j. Equivalently, we consider the Hamiltonian

−H_N(σ) = (βN/2) ( (1/N) Σ_{i≤N} σ_i )² = (β/(2N)) ( Σ_{i≤N} σ_i )² .   (4.1)
where " %
Ak = card σ ∈ ΣN ; σi = k .
i≤N
so by (4.2) we have, bounding the sum in the right-hand side by the number
of terms (i.e. 2N + 1) times the largest term,
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 237
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 4, © Springer-Verlag Berlin Heidelberg 2011
238 4. The Hopfield Model
Z_N(β) ≤ (2N + 1) 2^N exp( N max_t ( βt²/2 − I(t) ) ) .   (4.5)
Also, by (A.30), when N + k is even we have

A_k ≥ (2^N/(L√N)) exp( −N I(k/N) ) ,   (4.6)

and thus

Z_N(β) ≥ max_{k+N even} (2^N/(L√N)) exp( −N I(k/N) ) exp( βk²/(2N) ) .
Finally we get

(1/N) log Z_N(β) = log 2 + max_{t∈[−1,1]} ( βt²/2 − I(t) ) + o(1) ,   (4.7)
The maximum in (4.7) is attained at a point t where βt = I′(t), or, equivalently,

(1 + t)/(1 − t) = exp(2βt) ,

i.e.

t = ( exp(2βt) − 1 )/( exp(2βt) + 1 ) = th βt .   (4.9)
If β ≤ 1, the only root of (4.9) is t = 0. For β > 1, there is a unique root
m∗ > 0. That is, m∗ = m∗ (β) satisfies
thβm∗ = m∗ . (4.10)
βm∗ − β³m∗³/3 + o(m∗³) = m∗ ,

so that

m∗(β) ∼ √(3(β − 1)) as β → 1⁺ .   (4.11)
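Both the fixed point (4.10) and the asymptotics (4.11) are easy to confirm numerically; the sketch below iterates m ← th βm (the sample values of β are arbitrary):

```python
import math

def mstar(beta, iters=2000):
    """Positive root of m = tanh(beta * m), found by fixed-point iteration."""
    m = 0.5
    for _ in range(iters):
        m = math.tanh(beta * m)
    return m

m = mstar(2.0)
assert abs(m - math.tanh(2.0 * m)) < 1e-12       # (4.10)

beta = 1.01                                      # beta -> 1+
assert abs(mstar(beta) / math.sqrt(3 * (beta - 1)) - 1) < 0.05   # (4.11)
```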
We define

b∗ = βm∗²/2 − I(m∗)   (4.12)
so (4.7) reads
4.1 Introduction: The Curie-Weiss Model 239
(1/N) log Z_N(β) = log 2 + b∗ + o(1) .   (4.13)
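Equation (4.13) can be checked directly for moderate N from the exact counts A_k = C(N, (N + k)/2), summing in log-space to avoid overflow (the values β = 2 and N = 4000 are arbitrary choices):

```python
import math

def log_ZN(beta, N):
    """log Z_N(beta) = log sum_k A_k exp(beta k^2 / (2N)), with k = 2j - N."""
    lb = lambda j: math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1)
    terms = [lb(j) + beta * (2 * j - N) ** 2 / (2 * N) for j in range(N + 1)]
    mx = max(terms)
    return mx + math.log(sum(math.exp(t - mx) for t in terms))

def bstar(beta):
    """b* = beta m*^2 / 2 - I(m*), with m* = th(beta m*)."""
    m = 0.5
    for _ in range(2000):
        m = math.tanh(beta * m)
    I = (1 + m) / 2 * math.log(1 + m) + (1 - m) / 2 * math.log(1 - m)
    return beta * m * m / 2 - I

beta, N = 2.0, 4000
assert abs(log_ZN(beta, N) / N - (math.log(2) + bstar(beta))) < 0.01
```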
When β > 1, as N → ∞, Gibbs' measure is essentially supported by the set of configurations σ for which N⁻¹ Σ_{i≤N} σ_i ≈ ±m∗. This is because for a subset U of R,

G_N({ σ ; N⁻¹ Σ_{i≤N} σ_i ∈ U }) = Z_N(β)⁻¹ Σ exp( −H_N(σ) ) ,
where the summation is over all sequences for which N⁻¹ Σ_{i≤N} σ_i ∈ U. Thus, using (4.4), and bounding the sum in the second line by (2N + 1) times a bound for the largest term,

G_N({ σ ; N⁻¹ Σ_{i≤N} σ_i ∈ U }) = (1/Z_N(β)) Σ_{k/N∈U} A_k exp( (βN/2)(k/N)² )   (4.14)
 ≤ (2N + 1) (2^N/Z_N(β)) exp( N sup_{t∈U} ( βt²/2 − I(t) ) ) .
If we take

U = {t ; |t ± m∗| ≥ ε}

where ε is given (does not depend on N) then

sup_{t∈U} ( βt²/2 − I(t) ) < max_{t∈[0,1]} ( βt²/2 − I(t) ) ,
where h > 0. To see where Gibbs' measure lies, one should now maximize

f(t) = βt²/2 − I(t) + th .

This maximum is attained at a point 0 < t < 1 because f(t) > f(−t) for t > 0; this point t must satisfy βt + h = I′(t), i.e.

t = th(βt + h) ,   (4.16)
and we will see that there is a unique positive root to this equation. The
external field “breaks the symmetry between the two states”.
Consider now a random sequence (η_i)_{i≤N}, η_i = ±1, and the random Hamiltonian

−H_N(σ) = (β/(2N)) ( Σ_{i≤N} η_i σ_i )² .   (4.17)
When β > 1, the effect of the term βN m2k /2 of (4.18) is to tend to align
the sequence σ with the sequence (ηi,k )i≤N or the sequence (−ηi,k )i≤N . If
the sequences (ηi,k )i≤N are really different as k varies, this creates conflict.
For this reason the case β > 1 seems the most interesting.
The Hopfield model is the system with random Hamiltonian (4.18), where the numbers η_{i,k} are independent Bernoulli r.v.s, that is, are such that P(η_{i,k} = ±1) = 1/2. It simplifies notation to observe that, equivalently, one can assume

η_{i,1} = 1 ∀i ≤ N ; the numbers (η_{i,k})_{i≤N, 2≤k≤M} are independent r.v.s with P(η_{i,k} = ±1) = 1/2 .   (4.20)
This assumption is made throughout this chapter and Chapter 10. The
Hopfield model is already of interest if we fix M and let N → ∞. We shall
however focus on the more challenging case where N → ∞, M → ∞, M/N →
α, α > 0.
The Hopfield model (with Hamiltonian (4.18), that is, without external field) has a “high-temperature phase” somewhat similar to the phase β < 1, h = 0 of the SK model. This phase occurs in the region

β(1 + √α) < 1   (4.21)
and it is quite interesting to see how this condition occurs. We will refer the
reader to Section 2 of [142] for this, because this study does not use the cavity
method and is somewhat distinct from the main theme we pursue here.
Space (and energy!) limitations do not allow the study of this model here.
What is really interesting is not to study this model for β1 , β2 small, but,
given β1 (possibly large) to study the system for β2 as large as possible. The
“replica-symmetric” equations for this model are
μ = E th( β_2 z√q + β_1 μ + h )   (4.23)

q = E th²( β_2 z√q + β_1 μ + h ) .   (4.24)
Throughout the chapter we will consider the Hopfield model with external field, so that the Hamiltonian is

−H_{N,M}(σ) = (Nβ/2) Σ_{k≤M} ( (1/N) Σ_{i≤N} η_{i,k} σ_i )² + h Σ_{i≤N} σ_i
 = (Nβ/2) Σ_{k≤M} m_k² + N h m_1 .   (4.25)
we can expect the value k = 1 to play a special role. Without loss of generality
we can and do assume h ≥ 0.
We observe that the function f(x) = th(βx + h) is concave for x ≥ 0. If h > 0 we have f(0) > 0. If h = 0 and β > 1 we have f(0) = 0 and f′(0) > 1. Thus if β > 1 there is a unique positive solution to (4.16). Throughout the
chapter, we denote by m∗ = m∗ (β, h) this solution, i.e.
m∗ = th(βm∗ + h) . (4.26)
We set

b∗ = log ch(βm∗ + h) − (β/2) m∗² .   (4.27)
The expression of b∗ given here is appropriate for the proof of Lemma 4.1.2
below. It is not obvious that this is the same as the value (4.12), which, in
the presence of the external field, is
βm∗²/2 + m∗h − I(m∗) .   (4.28)
To prove the equality of (4.27) and (4.28), we observe that (A.26) implies
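Assuming (A.26) is the Legendre-type identity I(th y) = y th y − log ch y (we do not have the appendix in front of us here), the verification is a two-line computation:

```latex
\text{With } y = \beta m^* + h \text{ and } m^* = \th y \text{ (by (4.26))}:
  \quad I(m^*) = m^*(\beta m^* + h) - \log\ch(\beta m^* + h) .
\text{Hence}
\frac{\beta m^{*2}}{2} + m^* h - I(m^*)
  = \frac{\beta m^{*2}}{2} + m^* h - \beta m^{*2} - m^* h + \log\ch(\beta m^* + h)
  = \log\ch(\beta m^* + h) - \frac{\beta}{2} m^{*2} ,
\text{which is exactly (4.27).}
```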
Of course ZN,M = ZN,M (β, h) denotes the partition function of the Hamil-
tonian (4.25). Thus ZN,1 is the partition function of the Curie-Weiss model
with external field. The proof serves as an introduction to the method of
Section 4.3. It is much more effective and accurate than the more natural
method leading to (4.13). The result is also true for β < 1 if we define m∗ by
(4.26) for h > 0 and m∗ = 0 for h = 0. This is left as an exercise.
Proof. We start with the identity (see (A.6))
E exp(a g) = exp( a²/2 )
whenever g is a standard Gaussian r.v., so that
Z_{N,1} = E Σ_σ exp( √(Nβ) g m₁ + N h m₁ ) .
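A quick Monte Carlo sanity check of this Gaussian identity (our illustration, not from the book; a = 0.7 and the sample size are arbitrary):

```python
import numpy as np

# Check E exp(a g) = exp(a^2 / 2) for a standard Gaussian g by simulation.
rng = np.random.default_rng(0)
a = 0.7
g = rng.standard_normal(2_000_000)
estimate = float(np.exp(a * g).mean())
exact = float(np.exp(a**2 / 2))
```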
Now, since m₁ = N^{−1} Σ_{i≤N} σ_i we have, using (1.30) in the second equality,
Σ_σ exp( √(Nβ) g m₁ + N h m₁ ) = Σ_σ exp( Σ_{i≤N} σ_i ( √(β/N) g + h ) )
= 2^N ch^N( √(β/N) g + h ) ,
and therefore
Z_{N,1} = 2^N E ch^N( √(β/N) g + h ) .
Thus
Z_{N,1} = 2^N (1/√(2π)) ∫_R exp( N log ch( √(β/N) t + h ) − t²/2 ) dt
= 2^N √(Nβ/(2π)) ∫_R exp( N ( log ch(βz + h) − βz²/2 ) ) dz
with the change of variable t = √(Nβ) z. The function z ↦ log ch(βz + h) − βz²/2 attains its maximum at the point z such that th(βz + h) = z, i.e. z = m*, and this maximum is b*. Thus
Z_{N,1} = 2^N exp(N b*) A_N , (4.29)
where
A_N = √(Nβ/(2π)) ∫_R exp( N ψ(z) ) dz ,
for
ψ(z) = log ch(βz + h) − βz²/2 − b* .
To finish the proof we will show that there is a number K such that
ψ(z) ≤ −(1/K) (z − m*)² . (4.30)
Making the change of variable z = m* + x/√N then implies easily that
log AN stays bounded as N → ∞, and (4.29) concludes the proof.
The proof of (4.30) is elementary and tedious. We observe that the function ψ satisfies ψ(m*) = ψ′(m*) = 0. Also, the function
ψ′(z) = β( th(βz + h) − z )
Since ψ(z) < −β(z − m∗ )2 /4 for large z, it follows that (4.30) holds for all
values of z.
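The conclusion of this proof can be checked numerically: Z_{N,1} is an explicit binomial sum, so (1/N) log Z_{N,1} should approach log 2 + b* as N grows. A small sketch of ours (β = 2, h = 0.1 are arbitrary illustrative values):

```python
import math

# Numerical check of the Laplace-method computation: (1/N) log Z_{N,1}
# approaches log 2 + b* as N grows.
beta, h = 2.0, 0.1

m = 1.0
for _ in range(500):                   # m* solves m* = th(beta*m* + h), (4.26)
    m = math.tanh(beta * m + h)
b_star = math.log(math.cosh(beta * m + h)) - beta * m * m / 2    # (4.27)

def log_Z(N):
    # Z_{N,1} = sum over k of C(N,k) exp(N*beta*m1^2/2 + N*h*m1), m1=(2k-N)/N
    terms = []
    for k in range(N + 1):
        mag = (2 * k - N) / N
        terms.append(math.lgamma(N + 1) - math.lgamma(k + 1)
                     - math.lgamma(N - k + 1)
                     + N * beta * mag**2 / 2 + N * h * mag)
    top = max(terms)                   # stable log-sum-exp
    return top + math.log(sum(math.exp(t - top) for t in terms))

err = abs(log_Z(4000) / 4000 - (math.log(2) + b_star))
```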
The Hopfield model has a kind of singularity for β = 1. In that case,
some understanding has been gained only when M/N → 0, see [154] and
the references therein to earlier work. These results again do not rely on the
cavity method and are not reproduced here. Because of that singularity, we
study the Hopfield model only for β ≠ 1. Our efforts in the next sections
concentrate on the most interesting case, i.e. β > 1. We will explain why
the case β < 1 is several orders of magnitude easier than the case β > 1.
It is still however not trivial. This is because the special methods that allow
the control of the Hopfield model without external field under the condition
(4.21) break down in the presence of an external field.
When studying the Hopfield model, we will think of N and M as large
but fixed. Throughout the chapter we write
α = M/N .
The model then depends on the parameters (N, α, β, h).
Exercise 4.1.3. Prove that there exists a large enough universal constant L
such that one can control the Hopfield model with external field in a region
of the type β < 1, α ≤ (1 − β)²/L.
Of course this exercise should be completed only after reading some of the
present chapter, and in particular Theorem 4.2.4 below. On the other hand,
even if β < 1, when h = 0, reaching the largest possible value of α for which
there is “high-temperature” behavior is likely to be a level 3 problem.
−H_{N,M}(σ) = (Nβ/2) Σ_{k≤M} m_k(σ)² + N h m₁(σ) . (4.31)
W = ( Nβ/(2π) )^{M/2} . (4.33)
This is
(W/Z_{N,M}) exp( −(Nβ/2) ‖z‖² ) Σ_σ exp( Nβ z·m(σ) + N h m₁(σ) ) .
Now
Nβ z·m(σ) + N h m₁(σ) = β Σ_{k≤M} z_k Σ_{i≤N} η_{i,k} σ_i + h Σ_{i≤N} σ_i
= Σ_{i≤N} σ_i β Σ_{k≤M} z_k η_{i,k} + h Σ_{i≤N} σ_i
= Σ_{i≤N} σ_i ( β η_i·z + h ) ,
and therefore
Σ_σ exp( Nβ z·m(σ) + N h m₁(σ) ) = 2^N Π_{i≤N} ch( β η_i·z + h )
= 2^N exp( Σ_{i≤N} log ch( β η_i·z + h ) ) ,
Theorem 4.2.2. There exists a number L with the following property. Given β > 1, there exists a number κ > 0 with the following property. Assume that α ≤ m*⁴/(Lβ). Then there exists a number K such that with probability ≥ 1 − K exp(−N/K), the function z ↦ ψ(z) + κN‖z‖² is concave in the region
{ z ; ‖z − m* e₁‖ ≤ m*/( L(1 + log β) ) } . (4.35)
Here, and everywhere in this chapter, K denotes a number that does not
depend on N or M (so that K never depends on α = M/N ). In the present
case, K depends only on β and h. As usual the letter L denotes a univer-
sal constant, that certainly need not be the same at each occurrence. We
will very often omit the sentence “There exists a number L with the follow-
ing property” and the sentence “There exists a number K” in subsequent
statements.
4.2 Local Convexity and the Hubbard-Stratonovitch Transform
The point of Theorem 4.2.2 is that the function ‖z‖² is convex, so that the meaning of this theorem is that in the region (4.35) the function ψ is sufficiently concave that it will satisfy (3.21), opening the way to the use of Theorem 3.1.4. The conditions α ≤ m*⁴/(Lβ) and (4.35) are by no means intuitive, but are the result of a careful analysis.
Even though Theorem 4.2.2 will not be used before Section 4.5 we will present the proof now, since it is such a crucial result for the present approach (on the other hand, when we return to the study of the Hopfield model in Chapter 10 this result will no longer be needed). We must not hide the fact that this proof uses ideas from probability theory, which, while elementary, have been pushed quite far. This is also the case of the results of Section 4.3. These proofs contain no “spin glasses ideas”. Therefore the reader who finds these proofs difficult should simply skip them all. In Section 4.4, page 272, matters become much easier.
Throughout the book we will use the letter Ω to denote an event (so we
do not follow the standard probability notation, which is to denote by Ω the
entire underlying probability space).
Theorem 4.2.4. Given β < 1, there exists a number κ > 0 with the following property. Assume that α ≤ (β − 1)²/L. Then with overwhelming probability the function ψ(z) + κN‖z‖² is concave.
Again, we have omitted the sentence “There exists a number L such that
the following holds”. In Theorem 4.2.4 the constant K implicit in the words
“with overwhelming probability” depends only on β.
To prove that a function ϕ is concave in a convex domain, we prove that at each point w of this domain the second differential D²_w of ϕ satisfies D²_w(v, v) ≤ 0 for each vector v. If differentials are not familiar to you, the quantity D²_w(v, v) is simply the second derivative at t = 0 of the function t ↦ ϕ(w + tv).
Proof of Theorem 4.2.4. Let us set z = m*e₁ + w, and denote by D²_w
It follows from Corollary A.9.4 that (there exists a number L such that) if M ≤ (1 − β)²N/L, with overwhelming probability one has
∀v , Σ_{i≤N} (η_i · v)² ≤ N( 1 + (1 − β) ) ‖v‖² ,
and consequently,
β(1 − m*²) ≤ β/ch²( βm*(2, 0) ) ≤ L exp( −β/L ) ,
using that 1 + x² ≤ 1 + β².
Theorem 4.2.2 asserts that with overwhelming probability we control the
Hessian (= the second differential) of ψ over the entire region (4.35). Con-
trolling the Hessian at a given point with overwhelming probability is easy,
but controlling at the same time every point of a region is distinctly more
difficult, and it is not surprising that this should require significant work. The
key to our approach is the following.
Proposition 4.2.6. We can find numbers L and L₁ with the following property. Consider 0 < a < 1 and b ≥ L₁√(log(2/a)). Assume that α ≤ a². Then the following event occurs with probability ≥ 1 − L exp(−Na²): for each v, w ∈ R^M, we have
Σ_{i∈J(w)} (η_i · v)² ≤ L N a² ‖v‖² , (4.40)
where
J(w) = { i ≤ N ; |η_i · w| ≥ b‖w‖ } . (4.41)
To understand this statement, we note that E(η_i · z)² = ‖z‖², so that E Σ_{i≤N} (η_i · v)² = N‖v‖². Also, when b ≫ 1, and since E(η_i · w)² = ‖w‖², it is rare that |η_i · w| ≥ b‖w‖, so the set J(w) has a tendency to be a rather small subset of {1, . . . , N}, and it is much easier to control in (4.40) the sum over J(w) rather than the sum over {1, . . . , N}. The difficulty of course is to find a statement that holds for all v and w.
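A small simulation of ours (all dimensions and the value of b are arbitrary) illustrating why the sum over J(w) in (4.40) is so much smaller than the full sum, which is about N‖v‖²:

```python
import numpy as np

# For b large, the set J(w) of indices with |eta_i . w| >= b*||w|| is small,
# and the partial sum of (eta_i . v)^2 over J(w) is a small fraction of the
# full sum over all i <= N.
rng = np.random.default_rng(1)
N, M, b = 4000, 40, 3.0
eta = rng.choice([-1.0, 1.0], size=(N, M))      # the patterns eta_i
w = rng.standard_normal(M)
v = rng.standard_normal(M)
J = np.abs(eta @ w) >= b * np.linalg.norm(w)    # the set J(w) of (4.41)
full_sum = float(np.sum((eta @ v) ** 2))        # about N ||v||^2
partial_sum = float(np.sum((eta @ v)[J] ** 2))  # the sum in (4.40)
```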
In contrast with the case β < 1 we must now take advantage of the fact that the denominators have a tendency to be > 1, or even ≫ 1 for large β. The difficulty is that some of the terms βm* + h + βη_i·w might be close to 0, in which case ch²(βm* + h + βη_i·w) is not large. We have to show somehow that these terms do not contribute too much. The strategy is easier to understand when β is not close to 1. In that case, the only terms that can be troublesome are those for which βm* + h + βη_i·w might be close to 0 (for otherwise ch²(βm* + h + βη_i·w) ≫ 1) and these are such that η_i·w ≤ −m*/2 and in particular |η_i·w| ≥ m*/2. Proposition 4.2.6 is perfectly appropriate to control these terms (as it should, since this is why it was designed).
We first consider the case β ≥ 2. In that case (following the argument of (4.37)), since m* ≥ m*(2, 0), we have
β/ch²( βm*/2 + h ) ≤ L exp( −β/L ) ,
and thus
D²_w(v, v) ≤ −βN‖v‖² + L exp( −β/L ) Σ_{i≤N} (η_i · v)²
+ β² Σ_{i≤N} 1_{{|η_i·w| ≥ m*/2}} (η_i · v)² . (4.43)
To control the second term in the right-hand side, we note that by Corollary A.9.4, with overwhelming probability we have (whenever M ≤ N)
∀v , Σ_{i≤N} (η_i · v)² ≤ L N ‖v‖² . (4.44)
L₃ √(log β) ≥ L₁ √(log(2/a)) .
Thus we can use Proposition 4.2.6 with a as above and b = L₃√(log β). We observe that
‖w‖ ≤ m*/(2b) and |η_i · w| ≥ m*/2 ⇒ |η_i · w| ≥ b‖w‖ .
It then follows from Proposition 4.2.6 that if M ≤ N a² = N/(Lβ) (i.e. α ≤ 1/(Lβ)), then with overwhelming probability, the following occurs:
‖w‖ ≤ m*/( L √(log β) ) ⇒ D²_w(v, v) ≤ −( Nβ₀/4 ) ‖v‖² . (4.45)
This is obvious if |x| ≥ cm* because then the right-hand side is ≥ 1. This is also obvious if x ≥ 0 because this is true for x = 0 and the function f(x) = ch^{−2}( β(m* + x) + h ) decreases. Now,
f′(x) = −2β th( β(m* + x) + h ) / ch²( β(m* + x) + h ) ,
Therefore (4.46) also holds for |x| ≤ cm∗ , and is proved in every case.
We define
d = 1 − m*² + 2βm*² c ,
so that d < 1.
where
I = −βN‖v‖² + β² d Σ_{i≤N} (η_i · v)²
II = β² m*² Σ_{i≤N} 1_{{|η_i·w| ≥ cm*}} (η_i · v)² .
and consequently
I ≤ −βN‖v‖² ( 1 − βd(1 + ρ) ) . (4.49)
By (4.38), we have 1 − β(1 − m*²) ≥ m*²/L₀, so that, recalling the definition of d, that d ≤ 1, and that β ≤ β₀,
1 − βd(1 + ρ) ≥ 1 − βd − β₀ρ
= 1 − β(1 − m*²) − 2β²m*²c − β₀ρ
≥ 1 − β(1 − m*²) − 2β₀²m*²c − β₀ρ
≥ m*²/L₀ − 2β₀²m*²c − β₀ρ .
We make the choices
ρ = m*²/(4β₀L₀) ; c = 1/(8β₀²L₀) .
Then, if
α ≤ ρ²/L = m*⁴/L ,
with overwhelming probability we have
I ≤ −βN‖v‖² m*²/(2L₀) . (4.50)
To take care of the term II we use Proposition 4.2.6 again. We choose a = 1/L₄, where L₄ = 2β₀L₀. We can then apply Proposition 4.2.6 for b = L₁√(log(2/a)) (= L₅). Then, since L₀ is the constant in (4.40), the right-hand side of this inequality is N‖v‖²/(4β₀²L₀). Since when |η·w| ≥ cm* and ‖w‖ ≤ m*c/b = m*/L then |η·w| ≥ b‖w‖, this proves that if M ≤ a²N = N/L₄² then with overwhelming probability II ≤ N‖v‖²m*²/(4L₀) whenever ‖w‖ ≤ m*c/b = m*/L. Consequently, combining with (4.50) we have shown that if β ≤ β₀ and α ≤ m*⁴/L, then with overwhelming probability
‖w‖ ≤ m*/L ⇒ D²_w(v, v) ≤ −βN‖v‖² m*²/(4L₀) .
Combining with (4.45) we have completed the proof, because if the constant L in (4.35) is large enough, the region this condition defines is included in the region we have controlled. This is obvious by distinguishing the cases β ≥ β₀ and β ≤ β₀.
Proof of Proposition 4.2.6. Consider the largest integer N₀ ≤ N with N₀ log(eN/N₀) ≤ Na². In Proposition A.9.5 it is shown that the following event occurs with probability ≥ 1 − L exp(−Na²):
∀J ⊂ {1, · · · , N} , card J ≤ N₀ , ∀w ∈ R^M ,
Σ_{i∈J} (η_i · w)² ≤ ‖w‖² ( N₀ + L max( Na², √(N N₀) a ) ) . (4.51)
If for some w we had card J(w) ≥ N₀, taking J ⊂ J(w) with card J = N₀ in (4.51) would give
b² N₀ ≤ N₀ + L max( Na², √(N N₀) a ) ,
and therefore
(b² − 1) N₀ ≤ L max( Na², √(N N₀) a ) . (4.52)
The idea of the proof is to show that this bound on N0 contradicts the
definition of N0 , by forcing N0 to be too small. It is clear that, given a,
this is the case if b is large enough, but to get values of b of the right order one has to work a bit. Assuming without loss of generality b ≥ 2, we have b² − 1 ≥ b²/2, so that (4.52) implies
b² N₀ ≤ L max( Na², √(N N₀) a ) .
Thus we have either N₀ ≤ LNa²/b² or else b²N₀ ≤ L√(N N₀) a, i.e.
N₀ ≤ LN a²/b⁴ ≤ LN a²/b² .
Therefore we always have N₀ ≤ L₆Na²/b². We show now that we can choose the constant L₁ large enough so that
b ≥ L₁√(log(2/a)) =⇒ (L₆/b²) log( e b²/(L₆ a²) ) ≤ 1/2 . (4.53)
To see that such a number L₁ exists we can assume L₆ ≥ e and we observe that log( e b²/(L₆ a²) ) ≤ 2 log b + 2 log(2/a). We moreover take L₁ large enough such that we also have L₆a²/b² ≤ L₆/b² ≤ 1.
Since the function x ↦ x log(eN/x) increases for x ≤ N, and since N₀ ≤ L₆Na²/b² ≤ N, when b ≥ L₁√(log(2/a)) we deduce from (4.53) that
N₀ log( eN/N₀ ) ≤ L₆ N (a²/b²) log( e b²/(L₆ a²) ) ≤ Na²/2 ,
and therefore since N₀ + 1 ≤ 2N₀ we have
(N₀ + 1) log( eN/(N₀ + 1) ) ≤ Na² .
But this contradicts the definition of N0 .
Thus we have shown that card J(w) < N₀. Then, by (4.51), and since N₀ ≤ Na² we get
Σ_{i∈J(w)} (η_i · v)² ≤ L N a² ‖v‖² .
It is useful to note that the balls of this theorem are disjoint since ρ0 ≤
m∗ /2.
To reformulate Theorem 4.3.2, if we consider the set
A = { z ∈ R^M ; ∀k ≤ M , ∀τ = ±1 , ‖z − τ m* e_k‖ > ρ₀ } ,
Thus,
γ({ ‖z‖² ≥ ρ² }) exp( Nβρ²/4 ) ≤ 2^{M/2} ≤ exp( αN/2 )
and, since α ≤ βρ²/4, we get
γ({ ‖z‖² ≥ ρ² }) ≤ exp( −(N/4)(βρ² − 2α) ) ≤ exp( −Nβρ²/8 ) . (4.55)
and hence
G′(A) ≤ L G(A + B) . (4.57)
To prove that a set A is negligible for G′ it therefore suffices to prove that A + B is negligible for G. Consequently if G is essentially supported by a set C, then G′ is essentially supported by C + B. This is because the complement A of C + B is such that A + B is contained in the complement of C, so that it is negligible for G and hence A is negligible for G′. In particular when G is essentially supported by the union C of the balls of radius ρ₀/2 centered at the points ±m*e_k, then G′ is essentially supported by C + B. When α ≤ m*²ρ₀²/16, we have 2√(α/β) ≤ ρ₀/2 and hence C + B is contained in the union of the balls of radius ρ₀ centered at the points ±m*e_k. Thus it suffices to prove Theorem 4.3.2 for G rather than for G′.
As a consequence of Lemma 4.2.1, for a subset A of R^M, the identity
G(A) = W ∫_A exp ψ(z) dz / ( 2^{−N} Z_{N,M} ) (4.58)
holds, and the strategy to prove that A is negligible is simply to prove that
typically the numerator in (4.58) is much smaller than the denominator. For
this it certainly helps to bound the denominator from below. As is often the
case in this chapter, we need different arguments when β is close to 1 and
when β is away from 1. Of course the choice of the number 2 below is very
much arbitrary.
where
ξ′(x) = th x ; ξ″(x) = 1/ch²x ; ξ‴(x) = −2 th x/ch²x ; |ξ⁽⁴⁾(x)| ≤ 4 . (4.60)
η_i · (m* e₁ + v) = m* + η_i · v ,
ψ∼(v) = ψ( m* e₁ + v )
= −(Nβ/2) ‖m* e₁ + v‖² + Σ_{i≤N} log ch( β η_i·(m* e₁ + v) + h )
= −(Nβ/2) ‖m* e₁ + v‖² + Σ_{i≤N} log ch( b + β η_i·v ) . (4.62)
− ( β³ th b/(3 ch²b) ) Σ_{i≤N} (η_i · v)³ + (β⁴/6) Σ_{i≤N} R_i(v) (η_i · v)⁴ (4.63)
by Jensen’s inequality.
Given a vector x of R^M, we have for a certain constant c_p that
E‖g‖⁴ ≥ ( E‖g‖² )² = M²
When β ≥ 2, we will use a different bound. We will use the vector θ = (θ_k)_{k≤M} given by
θ = (m*/N) Σ_{1≤i≤N} ( η_i − e₁ ) , (4.69)
This bound is true for any realization of the disorder and every value of β, M and N. Since ‖θ‖² is about αm*², this is much worse when β ≤ 2 than the bound (4.59) when La* < 1.
Proof. The convexity of the function log ch, and the fact that, since b = βm* + h, we have th b = m*, imply that log ch(b + x) ≥ log ch b + m* x. Therefore (4.62) implies
ψ∼(v) ≥ −(Nβ/2) ‖m* e₁ + v‖² + N log ch b + m* β Σ_{i≤N} η_i · v
= N b* − (Nβ/2) ‖v‖² + βm* Σ_{i≤N} ( η_i − e₁ ) · v
= N b* − (Nβ/2) ‖v‖² + Nβ θ·v
= N b* + (Nβ/2) ‖θ‖² − (Nβ/2) ‖v − θ‖² .
Thus
W ∫ exp ψ∼(v) dv ≥ exp( N b* + (Nβ/2) ‖θ‖² ) W ∫ exp( −(Nβ/2) ‖v − θ‖² ) dv
= exp( N b* + (Nβ/2) ‖θ‖² ) ,
and the result follows from (4.61).
We have the following convenient consequence of Propositions 4.3.3 and 4.3.4: whenever α ≤ m*⁴,
2^{−N} Z_{N,M} ≥ ( 1/(La*) )^{M/2} exp( b* N ) . (4.71)
Indeed, if β ≤ 2 this follows from Proposition 4.3.3, while if β ≥ 2, by Proposition 4.3.4 we have 2^{−N} Z_{N,M} ≥ exp(b* N), and, since a* remains bounded away from 0 for β ≥ 2, we simply take L large enough that then La* ≥ 1.
The bound (4.71) does not however capture (4.70).
We turn to the task of finding upper bounds for the numerator of (4.58).
For this we will have to find an upper bound for ψ. We will use two rather
distinct bounds, the first of which will rely on the following elementary fact.
Lemma 4.3.5. The function
ϕ(x) = log ch( β√x + h ) (4.72)
E G(A) ≤ P( G(A) ≥ ε ) + ε
since G(A) ≤ 1.
A = { z ∈ R^M ; ‖z‖ ≥ 2m* }
Proof. We write
Σ_{i≤N} log ch( β η_i·z + h ) ≤ Σ_{i≤N} ϕ( (η_i·z)² ) ≤ N ϕ( (1/N) Σ_{i≤N} (η_i·z)² ) (4.74)
by concavity of ϕ.
Using Corollary A.9.4 we see that provided
α ≤ a*²/L (4.75)
then the event
∀ z ∈ R^M , (1/N) Σ_{i≤N} (η_i · z)² ≤ ( 1 + a*/8 ) ‖z‖² (4.76)
Let us consider the function f(t) = log ch(βt + h) − βt²/2, so that ϕ(x) = f(√x) + βx/2 and (4.77) means
ψ(z) ≤ (Nβa*/16) ‖z‖² + N f( √(1 + a*/8) ‖z‖ ) . (4.78)
f(t) ≤ b* − (β/2) a* ( t − m* )² .
Thus for t ≥ 2m*, and since then t − m* ≥ t/2 and thus (t − m*)² ≥ t²/4, we have
f(t) ≤ b* − (βa*/8) t² ,
and therefore
f( √(1 + a*/8) ‖z‖ ) ≤ b* − (βa*/8)( 1 + a*/8 ) ‖z‖² ≤ b* − (βa*/8) ‖z‖² .
It then follows from (4.78) that ψ(z) ≤ N b* − N βa* ‖z‖²/16 for ‖z‖ ≥ 2m*. Thus, under (4.75), with overwhelming probability we have
∫_A exp ψ(z) dz ≤ exp( N b* ) ∫_A exp( −(Nβa*/16) ‖z‖² ) dz (4.79)
≤ exp( N b* − Nβa*m*²/8 ) ∫_A exp( −(Nβa*/32) ‖z‖² ) dz
G(A) ≤ L^M exp( −Nβm*⁴/L ) ≤ exp( N( αL₇ − βm*⁴/L₇ ) ) ≤ exp( −Nβm*⁴/(2L₇) ) ,
provided α ≤ m*⁴/(2L₇²). This completes the proof.
This preliminary result is interesting in itself, and will be very helpful since from now on we need to be concerned only with the values of z such that ‖z‖ ≤ 2m*.
Our further results will be based on the following upper bound for ψ; it
is this bound that is the crucial fact.
Lemma 4.3.7. We have
ψ(z) ≤ N b* + (β/2) ( Σ_{i≤N} (η_i·z)² − N‖z‖² )
− (β/L) Σ_{i≤N} min( 1, ((η_i·z)² − m*²)² ) . (4.80)
The last term in (4.80) has a crucial influence. There are two main steps
to use this term. First, we will learn to control it from below uniformly on
large balls. This control will be achieved by proving that with overwhelming
probability at every point of the ball this term is not too much smaller than its
expectation. In a second but separate step, we will show that this expectation
cannot be small unless z is close to one of the points ±m∗ ei . Therefore with
overwhelming probability this last term can be small only if z is close to one
of the points ±m∗ ei , and this explains why Gibbs’ measure concentrates near
these points.
There is some simple geometry behind the behavior of the expectation of
the last term of (4.80). If we forget the minimum and consider simply the
average
(1/N) Σ_{i≤N} ( (η_i·z)² − m*² )² ,
As a warm-up before the real proof, the reader should convince herself that
this quantity can be small only if one of the zk ’s is approximately ±m∗ and
the rest are nearly zero.
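To support this, a tiny numerical illustration of ours (sizes arbitrary; ms plays the role of m*): the average vanishes exactly at z = ms·e₁ and is of order ms⁴ for a generic z of the same norm.

```python
import numpy as np

# The average (1/N) sum_i ((eta_i . z)^2 - ms^2)^2 is exactly 0 at
# z = ms * e_1 (then (eta_i . z)^2 = ms^2 for every i, since eta_{i,1} = ±1),
# and is of order ms^4 for a generic z with ||z|| = ms.
rng = np.random.default_rng(2)
N, M, ms = 2000, 30, 0.9
eta = rng.choice([-1.0, 1.0], size=(N, M))

def avg(z):
    return float(np.mean(((eta @ z) ** 2 - ms**2) ** 2))

z_corner = ms * np.eye(M)[0]                     # z = ms * e_1
z_generic = rng.standard_normal(M)
z_generic *= ms / np.linalg.norm(z_generic)      # generic z with ||z|| = ms
```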
Proof of Lemma 4.3.7. We recall the function ϕ of Lemma 4.3.5. Assuming for definiteness that x ≥ m*², this lemma implies
ϕ(x) = ϕ(m*²) + ϕ′(m*²)(x − m*²) + ∫_{m*²}^{x} (x − t) ϕ″(t) dt
≤ ϕ(m*²) + ϕ′(m*²)(x − m*²) − (β/L) ∫_{m*²}^{min(x,2)} (x − t) dt
≤ ϕ(m*²) + ϕ′(m*²)(x − m*²) − (β/L) min( 1, (x − m*²)² ) . (4.81)
Now,
ϕ′(m*²) = ( β/(2m*) ) th( βm* + h ) = β/2
and
ϕ(m*²) − (β/2) m*² = log ch( βm* + h ) − (β/2) m*² = b* ,
so that (4.81) implies
ϕ(x) ≤ b* + (β/2) x − (β/L) min( 1, (x − m*²)² ) .
Using this for x = (η_i·z)², summing over i ≤ N and using the first inequality in (4.74) yields the result.
To perform the program outlined after the statement of Lemma 4.3.7, it
helps to introduce a kind of truncation. Given a parameter d ≤ 1, we write
R_d(z) = E min( d, ((η_i·z)² − m*²)² ) , (4.82)
Proof. Let
A₁ = { z ∈ A ; R_d(z) ≥ (4/N)( log card A + log C ) } .
To prove (4.85), it suffices to achieve control over z ∈ A₁. Let us fix z in A₁. The r.v.s
X_i = min( d, ((η_i·z)² − m*²)² )
are i.i.d., 0 ≤ X_i ≤ 1, E X_i = R_d(z). We prove an elementary exponential inequality about these variables. Since exp(−x) ≤ 1 − x/2 ≤ exp(−x/2) for x ≤ 1, we have
E exp(−X_i) ≤ 1 − (E X_i)/2 ≤ exp( −(E X_i)/2 ) = exp( −R_d(z)/2 )
and thus
E exp( −Σ_{i≤N} X_i ) ≤ exp( −N R_d(z)/2 ) ,
so that
P( Σ_{i≤N} X_i ≤ N R_d(z)/4 ) ≤ exp( N R_d(z)/4 ) exp( −N R_d(z)/2 ) ,
and
P( Σ_{i≤N} X_i ≤ N R_d(z)/4 ) ≤ exp( −N R_d(z)/4 ) ≤ 1/(C card A)
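The elementary inequality behind this Chernoff-type bound is easy to check by simulation (our illustration; the Beta distribution is an arbitrary choice of a r.v. with values in [0, 1]):

```python
import numpy as np

# For a r.v. 0 <= X <= 1 one has E exp(-X) <= 1 - E X / 2 <= exp(-E X / 2).
rng = np.random.default_rng(3)
X = rng.beta(2.0, 5.0, size=1_000_000)       # values in [0, 1]
lhs = float(np.exp(-X).mean())
mid = float(1 - X.mean() / 2)
rhs = float(np.exp(-X.mean() / 2))
```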
Next, we relate what happens for two points close to each other.
| min(d, x²) − min(d, y²) |
= | min(√d, |x|)² − min(√d, |y|)² |
= | min(√d, |x|) − min(√d, |y|) | ( min(√d, |x|) + min(√d, |y|) )
≤ 2√d | min(√d, |x|) − min(√d, |y|) | ≤ 2√d |x − y| , (4.88)
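This Lipschitz bound can be verified numerically on random pairs (a sketch of ours; d and the sampling range are arbitrary):

```python
import numpy as np

# Check |min(d, x^2) - min(d, y^2)| <= 2*sqrt(d)*|x - y| on random pairs.
rng = np.random.default_rng(4)
d = 0.3
x = rng.uniform(-3, 3, size=100_000)
y = rng.uniform(-3, 3, size=100_000)
lhs = np.abs(np.minimum(d, x**2) - np.minimum(d, y**2))
rhs = 2 * np.sqrt(d) * np.abs(x - y)
```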
∀ k ≤ M , ‖z ± m* e_k‖ ≥ ξm* . (4.90)
Then if
d = ξ² m*⁴ , (4.91)
we have
R_d(z) ≥ ξ² m*⁴ / L . (4.92)
The proof relies on the following probabilistic estimate.
| ‖z‖² − m*² | ≤ ξm*²/4 (4.96)
so that
| ‖z‖ − m* | = | ‖z‖² − m*² | / ( ‖z‖ + m* ) ≤ ξm*/4 . (4.97)
Next, (4.94) implies
ξ²m*⁴/16 ≥ Σ_{k≠ℓ≤M} z_k² z_ℓ² = ‖z‖⁴ − Σ_{ℓ≤M} z_ℓ⁴
≥ ‖z‖⁴ − ( max_{ℓ≤M} z_ℓ² ) ‖z‖² . (4.98)
Consider k such that z_k² = max_{ℓ≤M} z_ℓ². Then, since ‖z‖² ≥ 3m*²/4 by (4.96), we have from (4.98) that
‖z‖² − z_k² ≤ ξ²m*⁴/(16‖z‖²) ≤ ξ²m*²/12 .
Now ‖z‖² − z_k² = Σ_{ℓ≠k} z_ℓ² = ‖z − z_k e_k‖², so that
‖z − z_k e_k‖ ≤ ξm*/3 (4.99)
and consequently
| ‖z‖ − |z_k| | ≤ ξm*/3 . (4.100)
Moreover, if τ = sign z_k, we have
and thus
E min(d, X) ≥ min( d, (E X)/2 ) P( X ≥ (E X)/2 )
≥ min( d, (E X)/2 ) (E X)² / ( 4 E(X²) ) . (4.104)
We have
X = ( U + a )² , (4.105)
4.3 The Bovier-Gayrard Localization Theorem
E U⁴ ≤ L( E U² )² , (4.107)
but there are much nicer arguments to do this [14]. From (4.105) it follows
that
We now put together the different pieces, and we state our main tool for
the proof of Theorem 4.3.2.
z ∈ A ⇒ ∀ k ≤ M , ‖z ± m* e_k‖ ≥ ξm* .
B = { z ; ‖z‖ ≤ 2m* }
A = { z ; ‖z‖ ≤ 2m* , ∀ k ≤ M , ‖z ± m* e_k‖ ≥ m*/2 } . (4.109)
Thus, we can use (4.108) with ρ = 2m* and ξ = 1/2. Consider a number 0 < c < 1, to be determined later. Using Corollary A.9.4 we see that if α ≤ c²/L, with overwhelming probability we have
G(A) ≤ (La*)^{M/2} W ∫_A exp( Nβ ( cL₈‖z‖² − m*⁴/L₈ + L₈α ) ) dz . (4.110)
‖z‖ ≤ 2m* ⇒ cL₈‖z‖² − m*⁴/L₈ ≤ −m*⁴/(2L₈) − cL₈‖z‖²
and thus (4.110) yields that if α ≤ m*⁴/(4L₈²), with overwhelming probability we have
G(A) ≤ (La*)^{M/2} W exp( −Nβm*⁴/L ) ∫ exp( −(Nβm*²/L) ‖z‖² ) dz
≤ ( La*/m*² )^{M/2} exp( −Nβm*⁴/L ) . (4.111)
so that
Σ_{i≤N} (η_i·z)² − N‖z‖² = Σ_{i≤N} (η_i·v)² − N‖v‖²
+ 2τm* ( Σ_{i≤N} (η_i·v)(η_i·e_k) − N v·e_k ) , (4.113)
Theorem 4.4.3. If β > 1, h > 0, then for α ≤ m*⁴/L, the set
A = { σ ; ‖m(σ) − m* e₁‖ ≥ m*/4 } (4.117)
is negligible.
Theorem 4.4.4. Consider β > 1, h > 0 and ρ₀ ≤ m*/2. If α ≤ m*²ρ₀²/L, then G is essentially supported by the ball in R^M of radius ρ₀ centered at the point m* e₁. Equivalently, the set
{ σ ; ‖m(σ) − m* e₁‖ ≥ ρ₀ } (4.118)
is negligible.
{ σ ; ∀k ≤ M , ‖m(σ) ± m* e_k‖ ≥ ρ₀ }
is negligible. The union of this set and the set (4.117) is the set (4.118), which
is therefore negligible.
For k ≤ M and τ = ±1 we consider the sets
B_{k,τ} = { σ ; ‖m(σ) − τ m* e_k‖ ≤ m*/4 } (4.119)
C_{k,τ} = { σ ; τ m_k(σ) ≥ 3m*/4 } . (4.120)
Let us denote by H⁰_{N,M}(σ) the Hamiltonian (4.18), which corresponds to the case h = 0. We define
S(k, τ) = Σ_{σ∈C_{k,τ}} exp( −H⁰_{N,M}(σ) ) . (4.121)
Lemma 4.4.5. There exists a number a such that for each k ≤ M, each τ = ±1, we have
0 ≤ u ≤ 1 ⇒ P( | (1/N) log S(k, τ) − a | ≥ u ) ≤ K exp( −Nu²/K ) . (4.122)
It suffices to prove this for k = τ = 1, because the r.v.s S(k, τ ) all have
the same distribution. This inequality relies on a “concentration of measure”
principle that is somewhat similar to Theorem 1.3.4. This principle, which
has received numerous applications, is explained in great detail in Section 6 of
[140], and Lemma 4.4.5 is proved exactly as Theorem 6.8 there. The author
believes that learning properly this principle is well worth the effort, and
that this is better done by reading [140] than by reading only the proof of
Lemma 4.4.5. Thus, Lemma 4.4.5 will be one of the few exceptions to our
policy of giving complete self-contained proofs.
Proof of Theorem 4.4.3. We have
G(C_{1,1}) = (1/Z_{N,M}) Σ_{σ∈C_{1,1}} exp( −H_{N,M}(σ) ) .
For σ in C_{1,1}, we have h m₁(σ) ≥ 3hm*/4, so that −H_{N,M}(σ) ≥ −H⁰_{N,M}(σ) + 3Nhm*/4, and
1 ≥ G(C_{1,1}) ≥ exp( 3Nhm*/4 ) S(1, 1)/Z_{N,M} . (4.123)
If (k, τ) ≠ (1, 1), we have h m₁(σ) ≤ hm*/4 for σ in B_{k,τ}, so that
Σ_{σ∈B_{k,τ}} exp( −H_{N,M}(σ) ) ≤ exp( Nhm*/4 ) Σ_{σ∈B_{k,τ}} exp( −H⁰_{N,M}(σ) )
≤ exp( Nhm*/4 ) S(k, τ)
and thus
G(B_{k,τ}) ≤ exp( Nhm*/4 ) S(k, τ)/Z_{N,M} . (4.124)
Taking u = min(1, hm*/8) in Lemma 4.4.5 shows that with overwhelming probability we have
S(1, 1) ≥ exp( Na − Nhm*/8 )
∀ k, τ , S(k, τ) ≤ exp( Na + Nhm*/8 )
and thus
∀ k, τ , S(k, τ) ≤ exp( Nhm*/4 ) S(1, 1) .
Combining with (4.123) and (4.124) yields that, with overwhelming probability,
(k, τ) ≠ (1, 1) ⇒ G(B_{k,τ}) ≤ exp( −Nhm*/4 )
so that B_{k,τ} is negligible. Combining with Theorem 4.3.2 finishes the proof.
4.5 Controlling the Overlaps
From now on we assume h > 0, and we recall Theorem 4.4.4: given ρ₀ ≤ m*/2, if α ≤ m*²ρ₀²/L₈, then G is essentially supported by the set
B₁ = { z ; ‖z − m* e₁‖ ≤ ρ₀ } .
B₂ = { z ; ‖z‖ ≤ ρ₀ } .
B = B₁ + B₂ = { z ; ‖z − m* e₁‖ ≤ 2ρ₀ } .
Using this for ρ₀ = m*/(L√β) proves that if α ≤ m*⁴/(Lβ), then G is essentially supported by the set
B = { z ; ‖z − m* e₁‖ ≤ m*/(L√β) } . (4.125)
Combining with Theorem 4.2.2 yields that moreover there exists κ > 0 such that, with overwhelming probability, the function z ↦ ψ(z) + κN‖z‖² is concave on B (so that ψ satisfies (3.21)). This will permit us to control the model in the region where α ≤ m*⁴/(Lβ). (In Chapter 10 we will be able to control a larger region using a different approach.)
Consider the random probability measure G* on B which has a density proportional to exp ψ(z) with respect to Lebesgue’s measure on B. Then we should be able to study G* using the tools of Section 3.1. Moreover, we can expect that G and G* are “exponentially close”, so we should be able to transfer our understanding of G* to G, and then to G′. We will address this technical point later, and we start the study of G*. As usual, one can expect the overlaps to play a decisive part. The coordinate z₁ plays a special rôle, so we exclude it from the following definition of the overlap of two configurations
U_{ℓ,ℓ′} = U_{ℓ,ℓ′}(z^ℓ, z^{ℓ′}) = Σ_{2≤k≤M} z_k^ℓ z_k^{ℓ′} . (4.126)
is small, and we explain the beautiful argument of Bovier and Gayrard which proves this. For λ ≥ 0, consider the probability G_λ on B that has a density proportional to exp( ψ(z) + λN U_{1,1}(z) ) with respect to Lebesgue’s measure on B; and denote by ⟨·⟩*_λ an average for this probability, so that ⟨·⟩* = ⟨·⟩*₀. The function
z ↦ λ U_{1,1}(z) − (κ/2)‖z‖² = −(κ/2) z₁² − ( κ/2 − λ ) Σ_{2≤k≤M} z_k²
When the function (4.128) is concave, we can use (3.17) (with Nκ/2 instead of κ) to get
∀k ≥ 1 , ⟨ ( U_{1,1} − ⟨U_{1,1}⟩*_λ )^{2k} ⟩*_λ ≤ ( Kk/N )^k , (4.129)
where K depends on κ only, and hence on β only.
The next step is to control the fluctuations of ⟨U_{1,1}⟩*. For this we consider the random function
ϕ(λ) = (1/N) log ∫_B exp( ψ(z) + λN U_{1,1}(z) ) dz (4.130)
and its expectation
ϕ̄(λ) = E ϕ(λ) ,
This function has the following two remarkable properties: first, given any number b, the set {F ≤ b} is a convex set in R^{N×M}. This follows from the convexity of the function log ch and Hölder’s inequality. Second, its Lipschitz constant is ≤ K/√N. Indeed,
∂F/∂x_{i,k} = (β/N) z_k th( β x_i·z + h ) ,
Σ_{i≤N, k≤M} ( ∂F/∂x_{i,k} )² ≤ Lβ²/N ,
i.e. the gradient of F has a norm ≤ K/√N. The abstract principle of [120] implies then that
∀u > 0 , P( |ϕ(λ) − a| ≥ u ) ≤ exp( −Nu²/K ) ,
|ϕ′| ≤ C₀ (4.135)
ϕ̄″ ≤ C₁ (4.136)
ϕ″ ≤ C₁ with probability ≥ 1 − δ (4.137)
E( ϕ(λ) − ϕ̄(λ) )^{2k} ≤ C₂^k . (4.138)
Then when C₂ ≤ λ₀⁴ C₁² we have
E( ϕ′(0) − ϕ̄′(0) )^{2k} ≤ L^k ( δ C₀^{2k} + C₁^k C₂^{k/2} ) . (4.139)
Comment. We expect that (4.141) holds with the better bound (Kk/N)^k. It does not seem to be possible to prove this by the arguments of this section, which create an irretrievable loss of information.
Proof. We are going to apply Lemma 4.5.2 to the function (4.130) with λ₀ = κ/2. Since |U_{1,1}| ≤ L on B the first part of (4.131) implies that |ϕ′| ≤ L so that (4.135) holds for C₀ = L. We see from (4.132) and (4.133) that (4.136) and (4.137) hold for C₁ = K and δ = K exp(−N/K), and from Lemma 4.5.1 that (4.138) holds with C₂ = Kk/N. We conclude from (4.139) that, provided C₂ ≤ λ₀⁴C₁², and in particular whenever Kk/N ≤ κ⁴K₁, we have
E( ⟨U_{1,1}⟩* − E⟨U_{1,1}⟩* )^{2k} ≤ K^k ( exp( −N/K₂ ) + ( Kk/N )^{k/2} ) .
This implies that (4.141) holds whenever exp(−N/K₂) ≤ (Kk/N)^{k/2}. This occurs provided k ≤ N/K₃. To handle the case k ≥ N/K₃, we simply write that
E( ⟨U_{1,1}⟩* − E⟨U_{1,1}⟩* )^{2k} ≤ L^{2k} ≤ ( L⁴K₃ k/N )^{k/2} .
Proposition 4.5.4. For k ≥ 1 we have
E⟨ ( U_{1,1} − E⟨U_{1,1}⟩* )^{2k} ⟩* ≤ ( Kk/N )^{k/2} . (4.142)
Proof. Identical to that of Proposition 4.5.4, using now the Gibbs measure with density proportional to exp( ψ(z) + λN z_j ). The proof can be copied verbatim replacing U_{1,1} by z_j.
using that 2 v_k¹ v_k² ≤ (v_k¹)² + (v_k²)². The proof is then identical to that of
Proposition 4.5.4.
We now turn to the task of transferring our results from G* to G and then to G′. One expects that these measures are very close to each other; still we must check that the exponentially small set B^c does not create trouble. Such lackluster technicalities occupy the rest of this section. Let us denote by ⟨·⟩⁻ an average for G. By definition of ⟨·⟩*, for a function f on R^M we have ⟨1_B f⟩⁻ = G(B) ⟨f⟩*, so that
⟨f⟩⁻ = ⟨1_B f⟩⁻ + ⟨1_{B^c} f⟩⁻ = G(B) ⟨f⟩* + ⟨1_{B^c} f⟩⁻ .
Taking expectation and using the Cauchy-Schwarz inequality in the last term shows that when f ≥ 0 it holds that
E⟨f⟩⁻ ≤ E⟨f⟩* + ( E G(B^c) )^{1/2} ( E⟨f²⟩⁻ )^{1/2} ,
satisfies
∀k , E T^k ≤ L^k ( 1 + (k/N)^k ) , (4.148)
and therefore
∀k ≤ N , E T^k ≤ L^k . (4.149)
This means that for all practical purposes, one can think of T as being
bounded. This quantity occurs in many situations.
Proof. We have
exp( NT/4 ) ≤ Σ_σ exp( (N/4) Σ_{j≤M} m_j²(σ) )
and therefore
E exp( NT/4 ) ≤ E Σ_σ exp( (N/4) Σ_{j≤M} m_j²(σ) )
= Σ_σ Π_{j≤M} E exp( (N/4) m_j²(σ) ) ≤ 2^{N+M} , (4.150)
using independence and (A.24) to see that E exp( N m_j²(σ)/4 ) ≤ 2. We use Lemma 3.1.8 with X = NT/4 to get, since M ≤ N, that
Corollary 4.5.8. For any number C that does not depend on N we have E exp(CT) ≤ K(C).
Proof. We can either use (4.150) and Hölder’s inequality or expand the exponential as a power series and use (4.148).
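The bound E exp(N m_j²/4) ≤ 2 from (A.24) invoked in (4.150) can be checked exactly for moderate N, since m_j is a normalized sum of N independent ±1 signs (our illustration, computed from the binomial distribution):

```python
import math

# If m = S/N with S a sum of N independent +-1 signs, then
# E exp(N m^2 / 4) <= 2, computed here exactly from binomial weights.
def expect_exp(N):
    total = 0.0
    for k in range(N + 1):
        log_p = (math.lgamma(N + 1) - math.lgamma(k + 1)
                 - math.lgamma(N - k + 1) - N * math.log(2))
        m = (2 * k - N) / N
        total += math.exp(log_p + N * m * m / 4)
    return total

vals = [expect_exp(N) for N in (10, 100, 1000)]
```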
Lemma 4.5.9. We have
∀k ≤ N , E⟨ ‖z‖^{2k} ⟩⁻ ≤ L^k . (4.151)
Since
( (Nβ)^N / (4^N N!) ) ∫ ‖y‖^{2N} dγ(y) ≤ 2^{M/2} ,
we get ∫ ‖y‖^{2N} dγ(y) ≤ L^N, and in particular ∫ ‖y‖^{2k} dγ(y) ≤ L^k for k ≤ N by Hölder’s inequality.
By definition G′ is the image of Gibbs’ measure under the map σ ↦ m(σ) = (m_k(σ))_{k≤M}, so that
| E⟨U_{1,1}⟩⁻ − E⟨U_{1,1}⟩* | ≤ K/√N
and (4.154). The proof of (4.155) is similar, and only a small adaptation of
(4.146) to the case of 2 replicas is required to prove (4.156) using the same
scheme.
The measure G itself is a technical tool. What we are really looking for is information about G′, and we are ready to prove it. We denote by ⟨·⟩ an average for G′.
⟨U_{1,1}⟩⁻ = ∫ (x + y, x + y) dG′(x) dγ(y) = ∫ (x, x) dG′(x) + C
= ⟨U_{1,1}⟩ + C ,
where C = ∫ (y, y) dγ(y) is non-random. Thus, using (4.154),
E( ⟨U_{1,1}⟩ − E⟨U_{1,1}⟩ )^{2k} = E( ⟨U_{1,1}⟩⁻ − E⟨U_{1,1}⟩⁻ )^{2k} ≤ ( Kk/N )^{k/2} . (4.160)
Next,
⟨ ( U_{1,1} − U_{2,2} )^{2k} ⟩⁻
= ∫ ( (x¹+y¹, x¹+y¹) − (x²+y², x²+y²) )^{2k} dG′(x¹) dG′(x²) dγ(y¹) dγ(y²)
by using Jensen’s inequality to integrate in γ inside the power (·)^{2k} rather than outside, and using again the fact that ∫ (x, y) dγ(y) = 0. Thus, applying Jensen’s inequality in the second inequality below, we get
⟨ ( U_{1,1} − U_{2,2} )^{2k} ⟩⁻ ≥ ⟨ ( U_{1,1} − U_{2,2} )^{2k} ⟩ ≥ ⟨ ( U_{1,1} − ⟨U_{1,1}⟩ )^{2k} ⟩ .
Since ( U_{1,1} − U_{2,2} )^{2k} ≤ 2^{2k} ( ( U_{1,1} − E⟨U_{1,1}⟩⁻ )^{2k} + ( U_{2,2} − E⟨U_{1,1}⟩⁻ )^{2k} ), using (4.154) yields
E⟨ ( U_{1,1} − ⟨U_{1,1}⟩ )^{2k} ⟩ ≤ ( Kk/N )^{k/2} .
It follows from (A.56) (used for a = 1) that with overwhelming probability T = max_σ Σ_{j≤M} m_j²(σ) ≤ LM/N. Using (4.149) and the Cauchy-Schwarz inequality to control the expectation of T on the rare event where this fails, we obtain that ET ≤ LM/N. Since Σ_{2≤j≤M} z_j² ≤ T by (4.153), this concludes the proof.
We note that
m_k(σ) = n_k(σ) + η_k σ_N / N , (4.173)
where for simplicity we write η_k rather than η_{N,k}.
μ = ν( σ_N ) ; (4.174)
q = ν( σ_N¹ σ_N² ) ; (4.175)
ρ = (M−1)/N + ν( σ_N Σ_{2≤k≤M} η_k n_k(σ) ) ; (4.176)
r = ( (M−1)/N ) q + ν( σ_N¹ Σ_{2≤k≤M} η_k n_k(σ²) ) . (4.177)
ignoring the constant βM/(2N) that plays no role. The strategy we will follow should come as no surprise. We will express the averages ⟨·⟩ in Lemma 4.6.2 using the Hamiltonian (4.178). We will bet that the quantities
Σ_{2≤k≤M} η_k n_k(ρ) (4.179)
have a Gaussian behavior, and to bring this out we will interpolate them with suitable Gaussian r.v.s. The reader may observe that in (4.179) the sum is over 2 ≤ k ≤ M. The quantity n₁(ρ) requires a special treatment. The idea is that for k ≥ 2 the quantity n_k(ρ) should be very small, allowing the quantity (4.179) to have a Gaussian behavior. On the other hand, one should think of the quantity n₁(ρ) as n₁(ρ) ≃ μ ≠ 0.
4.6 Approximate Integration by parts; the Replica-Symmetric Equations

Given replicas $\rho^1,\ldots,\rho^n$, we write $n_k^\ell = n_k(\rho^\ell)$, and given a parameter $t$ we define
$$g_t^\ell = \sqrt{t}\sum_{2\le k\le M}\eta_k n_k^\ell + \sqrt{1-t}\,\big(z^\ell\sqrt{r} + \xi^\ell\sqrt{\rho-r}\big)\,. \tag{4.180}$$
2≤k≤M
We shall show that for the four choices of $f$ occurring in Lemma 4.6.2 we have $\nu_0(f_0) \approx \nu_1(f_1)$, where $\approx$ means that the error is $\le KN^{-1/4}$. This will provide the desired equations for $\mu,\rho,r,q$. The computation of $\nu_0(f_0)$ is fun, so we do it first. We write
$$Y = \beta z\sqrt{r} + \beta\mu + h\,.$$
Lemma 4.6.3. a) If $f(\sigma^1) = \sigma_N^1$ then (4.184) holds.
b) If $f(\sigma^1,\sigma^2) = \sigma_N^1\sigma_N^2$ then (4.185) holds.
c) If $f(\sigma^1,x^1) = \sigma_N^1 x^1$ then (4.186) holds.
d) If $f(\sigma^1,x^1,x^2) = \sigma_N^1 x^2$ then (4.187) holds.

$$\mathsf{E}_\xi\,\mathrm{sh}\,Y = \exp\Big(\frac{\beta(\rho-r)}{2}\Big)\,\mathrm{sh}\,Y$$
$$\nu_0(\sigma_N^1) = \mathsf{E}\,\frac{\mathrm{sh}\,Y}{\mathsf{E}_\xi\,\mathrm{ch}\,Y} = \mathsf{E}\,\frac{\mathsf{E}_\xi\,\mathrm{sh}\,Y}{\mathsf{E}_\xi\,\mathrm{ch}\,Y}\,;$$
this makes (4.184) obvious, and (4.185) is similar. So we prove (4.186). Now
$$\nu_0(f_0) = \mathsf{E}\,\frac{\mathsf{E}_\xi\,(z\sqrt{r}+\xi\sqrt{\rho-r})\,\mathrm{sh}\,Y}{\mathsf{E}_\xi\,\mathrm{ch}\,Y}\,.$$
We rewrite (4.190) as
$$r(1-\beta(1-q))^2 \approx \alpha q(1-\beta(1-q)) + \beta q(\rho-r)(1-\beta(1-q))\,.$$
Using (4.191) in the second term of the right-hand side then yields
We turn to the comparison of $\nu(f_1)$ and $\nu_0(f_0)$, with the goal of proving that for the functions of Lemma 4.6.3 these two quantities are nearly equal. As expected this will be done by controlling the derivative of the function $t\mapsto\nu_t(f_t)$. We define
$$g_t^{\prime\,\ell} = \frac{1}{2\sqrt{t}}\sum_{2\le k\le M}\eta_k n_k^\ell - \frac{1}{2\sqrt{1-t}}\,\big(z^\ell\sqrt{r}+\xi^\ell\sqrt{\rho-r}\big)\,. \tag{4.193}$$
Here of course $\varepsilon_\ell = \sigma_N^\ell$ and
$$\frac{\partial f_t}{\partial x^\ell} = \frac{\partial f}{\partial x^\ell}\,(\sigma^1,\ldots,\sigma^n,g_t^1,\ldots,g_t^n)\,.$$
Proof. This looks complicated, but it is straightforward differentiation. There are three separate reasons why $\nu_t(f_t)$ depends on $t$. First, $f_t$ depends on $t$ through $g_t^\ell$, and this creates the term I. Second, $\nu_t(f_t)$ depends on $t$ because in (4.181) the term $E_t$ depends on $t$ through $g_t^\ell$, and this creates the term II. Finally, $\nu_t(f_t)$ depends on $t$ because in (4.181) the term $E_t$ depends on $t$ through the quantity $(1-t)\mu$, and this creates the term III. Let us also mention that for clarity we have stated this result for general $n$, but that the case $n=2$ suffices.

We would like to integrate by parts in the terms I and II using (4.193). Unfortunately the r.v. $\eta_k$ is not Gaussian; it is a random sign. We now describe the technique, called "approximate integration by parts", that is a substitute for integration by parts for such variables.
The basic fact is that if $v$ is a three times differentiable function on $\mathbb{R}$, then
$$v(1) - v(-1) = v'(1) + v'(-1) + \frac{1}{2}\int_{-1}^{1}(x^2-1)\,v'''(x)\,\mathrm{d}x\,. \tag{4.198}$$
This follows from the fact that $\int_{-1}^{1}(x^2-1)v'''(x)\,\mathrm{d}x = -2\int_{-1}^{1}x\,v''(x)\,\mathrm{d}x$, together with
$$\int_{-1}^{1}x\,v''(x)\,\mathrm{d}x = \Big[x\,v'(x)\Big]_{x=-1}^{x=1} - \int_{-1}^{1}v'(x)\,\mathrm{d}x = v'(1)+v'(-1) - \big(v(1)-v(-1)\big)\,.$$
If $\eta$ is a r.v. such that $\mathsf{P}(\eta=\pm 1) = 1/2$, then (4.198) implies
$$\mathsf{E}\,\eta v(\eta) = \mathsf{E}\,v'(\eta) + \frac{1}{4}\int_{-1}^{1}(x^2-1)\,v'''(x)\,\mathrm{d}x\,. \tag{4.199}$$
We will call $\mathsf{E}\,v'(\eta)$ the main term and the last term the error term. The error term will have a tendency to be small because $v$ will depend little on $\eta$: typically every occurrence of $\eta$ in $v$ is multiplied by a small factor (e.g. $1/\sqrt{N}$). We will always bound the error term through the crude inequality
$$\Big|\frac{1}{4}\int_{-1}^{1}(x^2-1)\,v'''(x)\,\mathrm{d}x\Big| \le \sup_{|x|\le 1}|v'''(x)|\,. \tag{4.200}$$
The contribution of the main term is what we would get if the r.v. $\eta$ had been Gaussian.
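The identity (4.199) is elementary enough to check numerically. The following sketch is not from the book; the test function $v(x)=e^{ax}$ and the value of $a$ are arbitrary choices made only for the check.

```python
import math

# Numerical sanity check of the approximate integration by parts identity
# (4.199): for a random sign eta with P(eta = +1) = P(eta = -1) = 1/2,
#   E[eta v(eta)] = E[v'(eta)] + (1/4) * integral_{-1}^{1} (x^2 - 1) v'''(x) dx.
# The test function v(x) = exp(a*x) is an arbitrary choice.

a = 0.7
v = lambda x: math.exp(a * x)
v1 = lambda x: a * math.exp(a * x)       # v'
v3 = lambda x: a ** 3 * math.exp(a * x)  # v'''

# Left-hand side and main term, averaging over eta = +1 and eta = -1.
lhs = 0.5 * (1 * v(1) + (-1) * v(-1))
main = 0.5 * (v1(1) + v1(-1))

# Error term, computed by a midpoint Riemann sum on [-1, 1].
n = 100_000
h = 2.0 / n
err = 0.25 * h * sum((x * x - 1) * v3(x)
                     for x in (-1 + (i + 0.5) * h for i in range(n)))

assert abs(lhs - (main + err)) < 1e-8
```

For this choice of $v$ the two sides agree to quadrature precision, illustrating why the error term is harmless when every occurrence of $\eta$ carries a small factor.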
We start to apply approximate integration by parts to (4.194). We take care of the main terms first. These terms are the same as if we were integrating by parts for Gaussian r.v.s, and we have learned how to make this calculation in Chapter 3. Let us set
$$S_{\ell,\ell'} = \sum_{2\le k\le M} n_k^\ell n_k^{\ell'}\,,$$
so that (being careful to distinguish between $g_t^{\ell'}$ and $g_t^{\prime\,\ell}$, where the position of the prime is not the same) the relations
$$\ell\ne\ell' \Rightarrow \mathsf{E}\,g_t^{\prime\,\ell} g_t^{\ell'} = S_{\ell,\ell'} - r\;;\qquad \mathsf{E}\,g_t^{\prime\,\ell} g_t^{\ell} = S_{\ell,\ell} - \rho$$
hold, and integration by parts brings out factors $S_{\ell,\ell'}-r$ and $S_{\ell,\ell}-\rho$. The dependence on $g_t^\ell$ is through the Hamiltonian and $f_t$. It then should be clear that the contribution of the main terms to the integration by parts in II is bounded by
$$\mathrm{IV} = K\bigg(\sum_{\ell\ne\ell',\,\ell'\le n+2}\nu_t\Big(\Big(|f_t|+\Big|\frac{\partial f_t}{\partial x^\ell}\Big|\Big)\,|S_{\ell,\ell'}-r|\Big) + \sum_{\ell\le n+1}\nu_t\Big(\Big(|f_t|+\Big|\frac{\partial f_t}{\partial x^\ell}\Big|\Big)\,|S_{\ell,\ell}-\rho|\Big)\bigg)\,. \tag{4.201}$$
where
$$T^- = \sup_\rho\sum_{2\le k\le M} n_k^2(\rho)\,. \tag{4.203}$$
Taking first the expectation $\mathsf{E}_0$ inside the bracket and using the Cauchy-Schwarz inequality we get
$$\nu_t(f^*) \le \mathsf{E}\Big\langle (\mathsf{E}_0 f^{*2})^{1/2}\Big(\mathsf{E}_0\prod_{\ell\le n}\mathrm{ch}^2 Y_{t,\ell}\Big)^{1/2}\Big\rangle^-\,. \tag{4.205}$$
We claim that
$$\mathsf{E}_0\prod_{\ell\le n}\mathrm{ch}^2 Y_{t,\ell} \le K\exp KT^-\,; \tag{4.206}$$
and recalling the definition (4.180) of $g_t^\ell$, using (A.6) and independence, we see that indeed
$$\mathsf{E}_0\exp(\pm 6Y_{t,\ell}) \le K\exp KT^-\,.$$
Combining (4.206) and (4.205) we get
$$\nu_t(f^*) \le K\,\mathsf{E}\Big(\big\langle(\mathsf{E}_0 f^{*2})^{1/2}\big\rangle^-\exp K_1 T^-\Big)\,, \tag{4.207}$$
using that $\mathsf{E}_0(1/X) \ge 1/\mathsf{E}_0 X$ for $X = (\mathrm{ch}\,Y_{1,1})^n$, and using (4.206) for $t=1$. We write this inequality for $f^\sim = (\mathsf{E}_0 f^{*2})^{1/2}$ (which depends only on $\rho^1,\ldots,\rho^n$), we multiply by $\exp((K_1+K_2)T^-)$ and we take expectation to get
$$\mathsf{E}\Big(\big\langle(\mathsf{E}_0 f^{*2})^{1/2}\big\rangle^-\exp K_1 T^-\Big) \le \nu\big((\mathsf{E}_0 f^{*2})^{1/2}\exp KT^-\big)\,.$$
Combining with (4.207) this proves (4.202). The point of (4.204) is that $T^-$ is not bounded, so we write
$$\nu\big((\mathsf{E}_0 f^{*2})^{1/2}\exp KT^-\big) \le \exp(KL)\,\nu\big((\mathsf{E}_0 f^{*2})^{1/2}\big) + \nu\big(\mathbf{1}_{\{T^-\ge L\}}(\mathsf{E}_0 f^{*2})^{1/2}\exp KT^-\big)\,.$$
Using Corollary 4.5.8 for $N-1$ rather than $N$ yields that $\mathsf{E}\exp 4KT^- \le K$. Using (4.150) for $N-1$ rather than $N$ we then obtain that if $L$ is large enough we have $\mathsf{P}(T^-\ge L) \le \exp(-4N)$.
Corollary 4.6.6. If $f$ is one of the functions of Lemma 4.6.3 then the term (4.201) satisfies
$$\mathrm{IV} \le K\,\nu\big(|S_{1,2}-r| + |S_{1,1}-\rho|\big) + K\exp\Big(-\frac{N}{K}\Big)\,,$$
so that
$$\mathsf{E}_0 f^{*2} \le K|S_{\ell,\ell'}-r|^2\,. \tag{4.208}$$
Now, the Cauchy-Schwarz inequality implies $\big|\sum_{2\le k\le M} m_k(\sigma^1)m_k(\sigma^2)\big| \le T$. Proceeding in the same manner for the other terms of (4.201) completes the proof.
Next we deduce from Proposition 4.6.1 that the term IV is $\le KN^{-1/4}$. Using obvious notation,
$$S_{1,2} - \sum_{2\le k\le M} m_k^1 m_k^2 = \sum_{2\le k\le M}\big(n_k^1 n_k^2 - m_k^1 m_k^2\big) = \sum_{2\le k\le M}\big((n_k^1-m_k^1)m_k^2 + m_k^1(n_k^2-m_k^2)\big) + \sum_{2\le k\le M}(n_k^1-m_k^1)(n_k^2-m_k^2)\,,$$
and using that $|n_k-m_k| \le 1/N$, the Cauchy-Schwarz inequality and (4.149) yields
$$\nu\Big(\Big|S_{1,2} - \sum_{2\le k\le M} m_k^1 m_k^2\Big|\Big) \le \frac{K}{\sqrt N}\,.$$
Using (4.170) for k = 1 we then obtain that ν(|S1,2 − r|) ≤ KN −1/4 . We then
proceed similarly for the other terms.
In this manner we can control the main terms produced by approximate
integration by parts in the term II of (4.196). The case of the term I of
(4.196) is entirely similar, and the term III of (4.197) is immediate to control
as in Corollary 4.6.6. We turn to the control of the error terms produced
by approximate integration by parts. Let us fix $2\le k\le M$, and consider the following. Given $t$, $x$, $\sigma^1,\ldots,\sigma^n$ let us define
$$\langle f^*\rangle^-_{t,x} = \frac{\mathsf{E}_\xi\,\mathsf{Av}_{\varepsilon_1,\ldots,\varepsilon_n=\pm1}\,f^*\,\mathcal{E}^-_{t,x}}{\mathsf{E}_\xi\,\mathsf{Av}_{\varepsilon_1,\ldots,\varepsilon_n=\pm1}\,\mathcal{E}^-_{t,x}}\,.$$
We consider the function
We then reproduce the argument of Lemma 4.6.5 to find that this quantity is bounded by
$$K\,\nu\big((n_k)^4\big) + K\exp(-N/K)\,. \tag{4.210}$$
The bound (4.200) implies that the error term created by the approximate integration by parts in the quantity $\nu_t(\eta_k n_k^\ell\varepsilon_\ell f_t)$ is bounded by the quantity (4.210). The sum over all values of $k$ of these errors is bounded by
$$K\sum_{2\le k\le M}\nu\big((n_k)^4\big) + K\exp\Big(-\frac{N}{K}\Big)\,.$$
Now we use (4.171) for $k=3$ to see that $\nu((n_k)^6) \le KN^{-3/2}$ and thus
$$\sum_{2\le k\le M}\nu\big((n_k)^6\big) \le \frac{K}{N^{1/2}}\,.$$
Finally, recalling (4.203) we have $\sum_{2\le k\le M}(n_k)^2 \le T^-$, so that, using (4.149) for $N-1$ rather than $N$, we get $\nu\big(\sum_{2\le k\le M}(n_k)^2\big) \le \mathsf{E}\,T^- \le L$. Therefore
$$\sum_{2\le k\le M}\nu\big((n_k)^4\big) \le \frac{K}{N^{1/4}}\,.$$
This completes the proof that the equations (4.188) and (4.192) are satisfied with error terms $\le KN^{-1/4}$.
The Hopfield model was introduced in [118], but became popular only after Hopfield [79], [80] put it forward as a model of memory. For this aspect as a model of memory, it is the energy landscape, i.e. the function $\sigma\mapsto\sum_{k\le M}m_k^2(\sigma)$, that matters. There are some rigorous results, [112], [97], [142], [132], [56], but they are based on ad hoc methods, none of which deserves to appear in a book. A detailed study of the model from the physicists' point of view appears in [3].
The first attempt at justifying the replica-symmetric equations can be
found in [121]. The authors try to duplicate the results of [120] for the Hopfield
model, i.e. to establish the replica-symmetric equations under the assump-
tion that a certain quantity does not fluctuate with the disorder. This paper
contains many interesting ideas, but one could of course wonder, among other
things, how one could prove anything at all without addressing the question
of uniqueness of the solutions of these equations. See also [122].
My notation differs from the traditional one, as I call $r$ what is traditionally called $r\alpha$. Thus, the replica-symmetric equations usually read
$$q = \mathsf{E}\,\mathrm{th}^2\big(\beta z\sqrt{r\alpha}+\beta\mu+h\big)$$
$$\mu = \mathsf{E}\,\mathrm{th}\big(\beta z\sqrt{r\alpha}+\beta\mu+h\big)$$
$$r = \frac{q}{(1-\beta(1-q))^2}\,.$$
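For readers who want to see how such a system behaves, here is a naive numerical fixed-point iteration for the three traditional equations above. This is only an illustrative sketch: the parameter values $\beta=0.4$, $\alpha=0.05$, $h=0.1$, the plain iteration scheme, and the midpoint quadrature are arbitrary choices, and nothing here addresses the question of uniqueness of the solutions.

```python
import math

def gauss_avg(f, n=2001):
    """E f(z) for a standard Gaussian z, by a midpoint rule on [-8, 8]."""
    step = 16.0 / n
    total = 0.0
    for i in range(n):
        z = -8.0 + (i + 0.5) * step
        total += f(z) * math.exp(-z * z / 2)
    return total * step / math.sqrt(2 * math.pi)

def solve_rs(beta, alpha, h, iters=100):
    """Naive fixed-point iteration for the traditional RS equations above.

    Illustrative only: parameters are assumed to lie in a high-temperature
    region where the plain iteration contracts.
    """
    mu = q = r = 0.5
    for _ in range(iters):
        c = beta * math.sqrt(r * alpha)   # coefficient of z in th(...)
        b = beta * mu + h                 # constant part of the argument
        mu = gauss_avg(lambda z: math.tanh(c * z + b))
        q = gauss_avg(lambda z: math.tanh(c * z + b) ** 2)
        r = q / (1 - beta * (1 - q)) ** 2
    return mu, q, r

beta, alpha, h = 0.4, 0.05, 0.1
mu, q, r = solve_rs(beta, alpha, h)
```

With these high-temperature parameters the iteration contracts quickly; near a phase transition one would need damping or a more careful scheme.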
This might be natural when one derives these equations from the “replica
trick”. The reason for not following this tradition is that the entire approach
starts with studying the sequence (mk (σ))k≤M , and its global behavior (as in
the Bovier-Gayrard localization theorem). Thus it is natural to incorporate
the data about the length of this sequence (i.e. α) in the parameter r. (Maybe
it is not such a good idea after all, but it is too late to change it anyway!)
The Bovier-Gayrard localization theorem is the culmination of a series of
papers of these authors, sometimes with P. Picco. I am unsure as to whether
the alternate approach I give here is better than the original one, but at least
it is different. Bovier and Gayrard put forward the law of (mk (σ)) under
Gibbs’ measure as the central object. This greatly influenced the paper [142]
where I first proved the validity of the replica-symmetric solution, using the
cavity method. Very soon after seeing the paper [142], Bovier and Gayrard
gave a simpler proof [28], based on convexity properties of the function ψ of
(4.34) (which they proved) and on the Brascamp-Lieb inequalities. It is quite
interesting that the convexity of ψ does not seem to hold in the whole region
where there is replica-symmetry (the physicists' way to say that $R_{1,2} \approx q$).
Despite this the Bovier-Gayrard approach is of interest, as will become even
clearer in Section 6.7. I have largely followed it here, rewriting of course some
of the technicalities in the spirit of the rest of the book. In Volume II I
will present my own approach, which is not really that much more difficult,
although it yields much more accurate results.
In her paper [128], Shcherbina claims that her methods allow her to prove
that the replica-symmetric solution holds on a large region. It would indeed be
very nice to have a proof of the validity of the replica-symmetric solution that
does not require to prove first something like the Bovier-Gayrard localization
theorem. It is sad to see how some authors apparently do not care whether
their ideas will be transmitted to the community or will be lost. More likely
than not, in the present case they will be lost.
The paper [17] should not be missed. The interesting paper [20] is also
related to the present chapter.
5. The V -statistics Model
5.1 Introduction
The name of the model is motivated by the fact that the right-hand side of (5.3) resembles an estimator known as a $V$-statistic. However no knowledge about these seems relevant for the present chapter. The case of interest is when $M$ is proportional to $N$. Then $H_{N,M}$ is of order $N$. As in Chapter 2, we will be interested only in the "algebraic" structure connected with the Hamiltonian (5.3), so we will reduce technicalities by making a strong assumption on $u$. We assume that for a certain number $D$,
How, and to which extent, this condition can be relaxed is an open problem, although one could expect that techniques similar to those presented later in Chapter 9 should bear on this question. We assume
$$D \ge 1\,. \tag{5.5}$$
Let us first try to describe what happens at a global level. In the high-temperature regime, we expect to have the usual relation
$$R_{1,2} \approx q\,, \tag{5.6}$$
where $R_{1,2} = N^{-1}\sum_{i\le N}\sigma_i^1\sigma_i^2$, and where the number $q$ depends on the system.

Throughout the chapter we use the notation
$$w = \frac{\partial u}{\partial x}\,. \tag{5.7}$$
Thus the symmetry condition (5.2) implies
$$w(x,y) = \frac{\partial u}{\partial x}(x,y) = \frac{\partial u}{\partial y}(y,x)\,. \tag{5.8}$$
where as usual $S_k^\ell = S_k(\sigma^\ell)$. The new and unexpected feature is that the computation of $r$ seems to require the use of an auxiliary function $\gamma(x)$. Intuitively this function $\gamma$ satisfies
$$\gamma(x) \approx \frac{1}{N}\sum_{k\le M}\mathsf{E}\,\langle u(x,S_k)\rangle = \frac{M}{N}\,\mathsf{E}\,\langle u(x,S_M)\rangle\,,$$
where of course the bracket denotes an average for the Gibbs measure with Hamiltonian (5.3). The reason behind the occurrence of this function is the "cavity in $M$" argument. Going from $M-1$ to $M$ we add the term
$$\frac{1}{N}\sum_{k<M} u(S_k,S_M)$$
to the Hamiltonian, and we will prove that in fact this term acts somewhat as $\gamma(S_M)$. The function $\gamma$ will be determined through a self-consistency equation and will in turn allow the computation of $r$. In the present model, the "replica-symmetric equations" are a system of three equations with three unknowns, one of which is the function $\gamma$.
5.2 The Smart Path
$$r = \mathsf{E}\,Y^2 \tag{5.12}$$
$$f = f(\sigma^1,\ldots,\sigma^n,\xi^1,\ldots,\xi^n)$$
where $Z_t = \mathsf{E}_\xi\sum_\sigma\exp(-H_t(\sigma))$ and $H_t^\ell = H_{t,N,M}(\sigma^\ell,\xi^\ell)$. We write
$$\nu_t(f) = \mathsf{E}\langle f\rangle_t\;;\qquad \nu_t'(f) = \frac{\mathrm{d}}{\mathrm{d}t}\nu_t(f)\,,$$
and, as usual, $\varepsilon_\ell = \sigma_N^\ell$. We also recall that $\nu = \nu_1$.
This interpolation is designed to decouple the last spin. The following is proved as Lemma 1.6.2.

Proposition 5.2.2. For a function $f$ on $\Sigma_N^n$, we have
$$\nu_t'(f) = \mathrm{I} + \mathrm{II}\,, \tag{5.14}$$
where, defining
$$A_{\ell,\ell'} = \frac{1}{N^3}\sum\nu_t\big(\varepsilon_\ell\varepsilon_{\ell'}\,w(S^\ell_{k_1,t},S^\ell_{k_2,t})\,w(S^{\ell'}_{k_1,t},S^{\ell'}_{k_3,t})\,f\big) \tag{5.15}$$
for a summation over $k_1,k_2,k_3\le M$, $k_2\ne k_1$, $k_3\ne k_1$, we have
$$\mathrm{I} = \sum_{1\le\ell<\ell'\le n}A_{\ell,\ell'} - n\sum_{\ell\le n}A_{\ell,n+1} + \frac{n(n+1)}{2}A_{n+1,n+2} \tag{5.16}$$
and
$$\mathrm{II} = -r\Big(\sum_{1\le\ell<\ell'\le n}\nu_t(\varepsilon_\ell\varepsilon_{\ell'}f) - n\sum_{\ell\le n}\nu_t(\varepsilon_\ell\varepsilon_{n+1}f) + \frac{n(n+1)}{2}\nu_t(\varepsilon_{n+1}\varepsilon_{n+2}f)\Big)\,. \tag{5.17}$$
Using that
$$\frac{\partial u}{\partial y}(x,y) = w(y,x)\,,$$
we obtain
$$\frac{\mathrm{d}}{\mathrm{d}t}\big(-H_{N,M,t}(\sigma^\ell)\big) = \frac{1}{N}\sum_{1\le k_1<k_2\le M}\big(S^{\prime\,\ell}_{k_1,t}\,w(S^\ell_{k_1,t},S^\ell_{k_2,t}) + S^{\prime\,\ell}_{k_2,t}\,w(S^\ell_{k_2,t},S^\ell_{k_1,t})\big) - \frac{1}{2\sqrt{1-t}}\,\varepsilon_\ell Y = \frac{1}{N}\sum_{k_1\ne k_2}S^{\prime\,\ell}_{k_1,t}\,w(S^\ell_{k_1,t},S^\ell_{k_2,t}) - \frac{1}{2\sqrt{1-t}}\,\varepsilon_\ell Y\,.$$
$$\nu_t'(f) = \mathrm{III} + \mathrm{IV}\,, \tag{5.18}$$
where
$$\mathrm{III} = \sum_{\ell\le n}\nu_t(D_\ell f) - n\,\nu_t(D_{n+1}f)$$
for
$$D_\ell = \frac{1}{N}\sum_{k_1\ne k_2}S^{\prime\,\ell}_{k_1,t}\,w(S^\ell_{k_1,t},S^\ell_{k_2,t})$$
and where
$$\mathrm{IV} = -\frac{1}{2\sqrt{1-t}}\Big(\sum_{\ell\le n}\nu_t(\varepsilon_\ell Yf) - n\,\nu_t(\varepsilon_{n+1}Yf)\Big)\,.$$
The relations
$$\mathsf{E}\,S^{\prime\,\ell}_{k,t}S^{\ell'}_{k,t} = \frac{\varepsilon_\ell\varepsilon_{\ell'}}{2N}\;;\qquad \mathsf{E}\,S^{\prime\,\ell}_{k,t}S^{\ell'}_{k',t} = 0\ \text{ if }\ k\ne k'$$
hold and (with the usual abuse of notation)
$$\frac{\partial H_{N,M,t}}{\partial S_{k,t}} = \frac{1}{N}\Big(\sum_{k<k_2}w(S_{k,t},S_{k_2,t}) + \sum_{k_1<k}w(S_{k,t},S_{k_1,t})\Big) = \frac{1}{N}\sum_{k'\ne k}w(S_{k,t},S_{k',t})\,.$$
Corollary 5.2.3. Assume that $D^2\alpha^3\le 1$ and $|r|\le 1$. Then for any function $f\ge 0$ on $\Sigma_N^n$ we have
$$\nu_t(f) \le Ln^2\,\nu(f)\,. \tag{5.19}$$
5.3 Cavity in M
We would like, with the appropriate choice of $r$ (i.e. if (5.9) holds), that the terms I and II of (5.14) nearly cancel out. So we need to make sense of the term $A_{\ell,\ell'}$. To lighten notation we assume $\ell=1$, $\ell'=2$. In the summation (5.15) there are at most $M^2$ terms for which $k_2=k_3$. Defining
$$A'_{1,2} = \frac{1}{N^3}\sum_{k_1,k_2,k_3\ \text{all different}}\nu_t\big(\varepsilon_1\varepsilon_2\,w(S^1_{k_1,t},S^1_{k_3,t})\,w(S^2_{k_1,t},S^2_{k_2,t})\,f\big)\,,$$
we get
$$|A_{1,2}-A'_{1,2}| \le \frac{KM^2}{N^3}\,\nu_t(|f|) = \frac{K}{N}\,\alpha^2\,\nu_t(|f|) \le \frac{K}{N}\,\nu_t(|f|)\,, \tag{5.20}$$
where $K$ is a number depending only on $D$ and $\alpha$. Each triplet $(k_1,k_2,k_3)$ brings the same contribution to $A'_{1,2}$, so that
$$A'_{1,2} = \frac{M(M-1)(M-2)}{N^3}\,\nu_t\big(\varepsilon_1\varepsilon_2\,w(S^1_{M,t},S^1_{M-1,t})\,w(S^2_{M,t},S^2_{M-2,t})\,f\big)\,.$$
Therefore, defining
$$C_{1,2} = \nu_t\big(\varepsilon_1\varepsilon_2\,w(S^1_{M,t},S^1_{M-1,t})\,w(S^2_{M,t},S^2_{M-2,t})\,f\big)\,, \tag{5.21}$$
we have
$$A'_{1,2} = \frac{M(M-1)(M-2)}{N^3}\,C_{1,2}\,,$$
so that
$$|A'_{1,2}-\alpha^3 C_{1,2}| \le \frac{K}{N}\,\nu_t(|f|)\,.$$
Combining with (5.20) we reach that
$$|A_{1,2}-\alpha^3 C_{1,2}| \le \frac{K}{N}\,\nu_t(|f|)\,. \tag{5.22}$$
N
To estimate C1,2 it seems a good idea to make explicit the dependence of the
Hamiltonian on SM,t , SM −1,t and SM −2,t . Defining
1 √
− HN,M −3,t = u(Sk1 ,t , Sk2 ,t ) + 1 − tσN Y , (5.23)
N
1≤k1 <k2 ≤M −3
it holds that
− HN,M,t = −HN,M −3,t − H , (5.24)
where
5.3 Cavity in M 303
1
−H = u(Sk1 ,t , Sk2 ,t ) . (5.25)
N
1≤k1 <k2 ≤M,k2 ≥M −2
(5.26)
where Zt,∼ is the normalization factor, Zt,∼ = E− ξ σ exp(−H N,M −3,t (σ))
−
and where Eξ denotes expectation in the r.v.s ξk for ≥ 1 and k ≤ M − 3.
and for the same value of $k$ their pairwise correlation will be a new parameter $q$. So we fix $0\le q\le 1$ (which will be determined later) and for $j=0,1,2$, $\ell\le n$, we consider independent standard Gaussian r.v.s $z_j$ and $\hat\xi_j^\ell$ (that are independent of all the other sources of randomness) and we set
$$\theta_j^\ell = z_j\sqrt{q} + \hat\xi_j^\ell\sqrt{1-q}\,.$$
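The covariance structure of these variables — each $\theta_j^\ell$ is standard Gaussian, and two replicas $\ell\ne\ell'$ are correlated with correlation $q$ through the shared $z_j$ — is easy to confirm by a quick Monte Carlo simulation. This is only a sanity check; $q=0.3$ and the sample size are arbitrary choices.

```python
import math, random

# Monte Carlo check of the covariance structure of the variables
# theta_j^l = z_j sqrt(q) + xi_j^l sqrt(1-q): each one is standard Gaussian,
# and two replicas l != l' share z_j, so their correlation is q.
# (q = 0.3 and the sample size are arbitrary choices for this check.)

random.seed(0)
q = 0.3
n = 200_000
s1 = s2 = s11 = s12 = 0.0
for _ in range(n):
    z = random.gauss(0.0, 1.0)      # shared between the two replicas
    xi1 = random.gauss(0.0, 1.0)    # replica 1
    xi2 = random.gauss(0.0, 1.0)    # replica 2
    t1 = z * math.sqrt(q) + xi1 * math.sqrt(1 - q)
    t2 = z * math.sqrt(q) + xi2 * math.sqrt(1 - q)
    s1 += t1; s2 += t2; s11 += t1 * t1; s12 += t1 * t2

var1 = s11 / n - (s1 / n) ** 2          # should be close to 1
cov12 = s12 / n - (s1 / n) * (s2 / n)   # should be close to q
```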
For $0\le v\le 1$ we define
$$S^\ell_{j,v} = \sqrt{v}\,S^\ell_{M-j,t} + \sqrt{1-v}\,\theta_j^\ell\,, \tag{5.31}$$
keeping the dependence of $S^\ell_{j,v}$ on $t$ implicit. (The reader will observe that, despite the similarity of notation, it is in practice impossible to confuse the quantity $S_{k,t}$ with the quantity $S^\ell_{j,v}$. Here again we choose a bit of informality over heavy notation.) Let us define
$$\nu_{t,v}(h) = \mathsf{E}\,\frac{\langle h\,\mathcal{E}_v\rangle_{t,\sim}}{\langle\mathsf{E}_\xi\,\mathcal{E}_v\rangle_{t,\sim}}\,. \tag{5.33}$$
Moreover, if $\alpha D\le 1$ and
$$B_v = w(S^1_{0,v},S^1_{1,v})\,w(S^2_{0,v},S^2_{2,v})\,, \tag{5.36}$$
we have
$$\Big|\frac{\mathrm{d}}{\mathrm{d}v}\nu_{t,v}(B_v f)\Big| \le Ln^2D^2\,\nu_{t,v}(f^2)^{1/2}\,\nu_{t,v}\big((R_{1,2}-q)^2\big)^{1/2} + \frac{K}{N}\,\nu_{t,v}(|f|)\,. \tag{5.37}$$

and the best we can do is to bound this term by $D^2\,\nu_{t,v}\big(|f|\,|R_{1,2}-q|\big)$.

The factor $\alpha^2$ in (5.35) is not really needed for the rest of the proof. There is a lot of room in the arguments. However it occurs so effortlessly that we see no reason to omit it. This might puzzle the reader.
Exercise 5.3.4. Prove the first part of (5.38) using a differential inequality.
where the dependence on $t$ is kept implicit in the left-hand side. Then, for $v=0$, the quantity $\mathcal{E}_v$ of (5.32) is equal to
$$\mathcal{E}_0 = \mathcal{E}\,\mathcal{E}'\,, \tag{5.39}$$
for
$$\mathcal{E} = \exp\sum_{\ell\le n}\sum_{j=0,1,2}\gamma_\ell(\theta_j^\ell) \tag{5.40}$$
$$\mathcal{E}' = \exp\frac{1}{N}\sum_{\ell\le n}\big(u(\theta_0^\ell,\theta_2^\ell)+u(\theta_0^\ell,\theta_1^\ell)+u(\theta_1^\ell,\theta_2^\ell)\big)\,. \tag{5.41}$$
This is seen by separating in the sum (5.25) the terms for which $k_1=M-2$ or $k_1=M-1$ (these create $\mathcal{E}'$). Of course the influence of $\mathcal{E}'$ will be very small; but to understand the influence of $\mathcal{E}$, we must understand the functions $\gamma_\ell$. We explain first the heuristics.
We hope that the quantities $(S_{k,t})_{k\le M-3}$ behave roughly like independent r.v.s under the averages $\langle\cdot\rangle_{t,\sim}$, so that by the law of large numbers we should have, for each $\ell$,
$$\gamma_\ell(x) \approx \frac{M-3}{N}\,\mathsf{E}\,\langle u(S_{1,t},x)\rangle_{t,\sim} \approx \alpha\,\mathsf{E}\,\langle u(S_{1,t},x)\rangle_{t,\sim}\,. \tag{5.42}$$
This shows that (in the limit $N\to\infty$) $\gamma_\ell$ does not depend on $\ell$ and is not random. We denote by $\gamma$ this non-random function, and we now look for the relation it should satisfy. It seems very plausible that
$$\mathsf{E}\,\langle u(S_{1,t},x)\rangle_{t,\sim} \approx \nu_t(u(S_{1,t},x)) = \nu_t(u(S_{M,t},x)) \tag{5.43}$$
by symmetry. We expect that (5.37) still holds, with a similar proof, if we define now $B_v = u(S_v,x)$. Assuming that as expected $R_{1,2}\approx q$, we should have $\nu_t(B_1)\approx\nu_{t,0}(B_0)$, i.e. (with obvious notation: since there is only one replica, we no longer need replica indices)
$$\nu_t(u(S_{M,t},x)) \approx \nu_{t,0}(u(\theta_0,x)) = \mathsf{E}\,\frac{u(\theta_0,x)\exp\sum_{0\le j\le 2}\gamma(\theta_j)}{\mathsf{E}_\xi\exp\sum_{0\le j\le 2}\gamma(\theta_j)}\,. \tag{5.44}$$
5.4 The New Equation
Lemma 5.4.2. If $LD\alpha\le 1$, then given any value of $q$ there exists a unique function $\gamma = \gamma_{\alpha,q}$ from $\mathbb{R}$ to $[-1,1]$ that satisfies (5.46). Moreover, given any other function $\gamma_*$ from $\mathbb{R}$ to $[-1,1]$ we have
$$\sup_x|\gamma(x)-\gamma_*(x)| \le 2\sup_y\Big|\gamma_*(y) - \alpha\,\mathsf{E}\,\frac{u(\theta,y)\exp\gamma_*(\theta)}{\mathsf{E}_\xi\exp\gamma_*(\theta)}\Big|\,. \tag{5.47}$$
We remind the reader that throughout the book a statement such as “If
LDα ≤ 1...” is a short-hand for “There exists a universal constant L with
the following property. If LDα ≤ 1...”
Proof. This is of course a "contraction argument". Consider the supremum norm $\|\cdot\|_\infty$ on the space $\mathcal{C}$ of functions from $\mathbb{R}$ to $[-1,1]$. Consider the operator $U$ that associates to a function $\psi\in\mathcal{C}$ the function $U(\psi)$ given by
$$U(\psi)(x) = \alpha\,\mathsf{E}\,\frac{u(\theta,x)\exp\psi(\theta)}{\mathsf{E}_\xi\exp\psi(\theta)}\,.$$
Since $1/e \le \exp\psi(\theta) \le e$ and $|u(\theta,x)|\le D$ we have $|U(\psi)(x)|\le\alpha De^2$, so if $\alpha De^2\le 1$ we have $U(\psi)\in\mathcal{C}$. Consider $\psi_1,\psi_2\in\mathcal{C}$ and $\varphi(t) = U(t\psi_1+(1-t)\psi_2)\in\mathcal{C}$, so that, writing $E_t = \exp(t\psi_1(\theta)+(1-t)\psi_2(\theta))$, we get
$$L\alpha D \le 1 \tag{5.49}$$
we have
$$\forall x\,,\quad \mathsf{E}\Big\langle\Big(\frac{1}{N}\sum_{k\le M-3}u(S_{k,t},x)-\gamma(x)\Big)^2\Big\rangle_{t,\sim} \le L(\alpha D)^2\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_{t,\sim} + \frac{K}{N}\,. \tag{5.50}$$
$$\mathsf{E}\,\langle h\rangle_{t,\sim} = \mathsf{E}\,\frac{\langle h\,\mathcal{E}_*\rangle_*}{\langle\mathsf{E}_\xi\,\mathcal{E}_*\rangle_*} \tag{5.54}$$
holds, where
$$\mathcal{E}_* = \exp\frac{1}{N}\sum_{k\le M-4}u(S_{k,t},S_{M-3,t})\,, \tag{5.55}$$
and where $\mathsf{E}_\xi$ denotes expectation in the variables $\xi_k$. Again, we must devise a cavity argument with the underlying belief that $S_{M-3,t}$ will have a Gaussian behavior. So, considering independent standard Gaussian r.v.s $z$ and $\hat\xi$, and defining
$$\theta = z\sqrt{q}+\hat\xi\sqrt{1-q}\,,$$
for $0\le v\le 1$ we set $S_v = \sqrt{v}\,S_{M-3,t}+\sqrt{1-v}\,\theta$. We consider the function
$$\psi(v) = \mathsf{E}\,\frac{\big\langle(\alpha u(S_v,x)-\gamma(x))\,A_*(x)\exp\frac{1}{N}\sum_{k\le M-4}u(S_{k,t},S_v)\big\rangle_*}{\big\langle\mathsf{E}_\xi\exp\frac{1}{N}\sum_{k\le M-4}u(S_{k,t},S_v)\big\rangle_*}\,. \tag{5.56}$$
We will bound $|\psi'(v)|$ (as in Lemma 5.3.2) but let us first look at $\psi(0)$. Defining
$$B(x) = \frac{1}{N}\sum_{k\le M-4}u(S_{k,t},x) = A_*(x)+\gamma(x)\,, \tag{5.58}$$
we have
$$\psi(0) = \mathsf{E}\,\frac{\big\langle(\alpha u(\theta,x)-\gamma(x))\,A_*(x)\exp B(\theta)\big\rangle_*}{\big\langle\mathsf{E}_\xi\exp B(\theta)\big\rangle_*}\,.$$
Since we are following the pattern of Section 5.3, it should not come as a surprise that the value of $\psi(0)$ is not completely trivial to estimate; but a last interpolation will suffice. For $0\le s\le 1$ we consider
$$\psi_*(s) = \mathsf{E}\,\frac{\big\langle(\alpha u(\theta,x)-\gamma(x))\,A_*(x)\exp(sB(\theta)+(1-s)\gamma(\theta))\big\rangle_*}{\big\langle\mathsf{E}_\xi\exp(sB(\theta)+(1-s)\gamma(\theta))\big\rangle_*}\,. \tag{5.59}$$
Since
$$\frac{\mathrm{d}}{\mathrm{d}s}\big(sB(\theta)+(1-s)\gamma(\theta)\big) = B(\theta)-\gamma(\theta) = A_*(\theta)\,,$$
writing $E_s = \exp(sB(\theta)+(1-s)\gamma(\theta))$, we find

we then get
$$|\psi'(v)| \le L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_*^{1/2}\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_*^{1/2} + \frac{K}{N}\,,$$
so that
$$\psi(1) \le L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_*^{1/2}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_*^{1/2} + L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_*^{1/2}\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_*^{1/2} + \frac{K}{N}\,. \tag{5.63}$$
For $L\alpha D\le 1$ we have
$$L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_*^{1/2}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_*^{1/2} \le \frac{1}{16}\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_* + \frac{1}{16}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_*\,,$$
and the inequality $ab\le a^2/t + tb^2$ for $t = L\alpha D$ implies
$$L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_*^{1/2}\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_*^{1/2} \le \frac{1}{16}\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_* + L(\alpha D)^2\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_*\,.$$
Combining with (5.63) we get
$$\psi(1) \le \frac{K}{N} + \frac{1}{8}\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_* + \frac{1}{16}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_* + L(\alpha D)^2\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_*\,.$$
Combining with (5.53) and (5.57) we then obtain
$$\mathsf{E}\big\langle A(x)^2\big\rangle_{t,\sim} \le \frac{K}{N} + \frac{1}{8}\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_* + \frac{1}{16}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_* + L(\alpha D)^2\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_*\,. \tag{5.64}$$
Now since $|A(x)-A_*(x)|\le K/N$ we have $A_*(x)^2 \le A(x)^2 + K/N$ and thus
$$\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_* \le \mathsf{E}\big\langle A(\theta)^2\big\rangle_* + \frac{K}{N}\,.$$
In the quantity $\mathsf{E}\langle A(\theta)^2\rangle_*$, the r.v. $\theta$ is independent of the randomness of $\langle\cdot\rangle_*$. So, denoting by $\mathsf{E}_*$ expectation in the randomness of this bracket only, we have
$$\mathsf{E}_*\big\langle A(\theta)^2\big\rangle_* \le \sup_y\mathsf{E}_*\big\langle A(y)^2\big\rangle_* = \sup_y\mathsf{E}\big\langle A(y)^2\big\rangle_*\,,$$
and thus
$$\frac{3}{4}\sup_x\mathsf{E}\big\langle A(x)^2\big\rangle_{t,\sim} \le \frac{1}{4}\sup_y\mathsf{E}\big\langle A(y)^2\big\rangle_{t,\sim} + \frac{K}{N} + L(\alpha D)^2\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_{t,\sim}\,.$$
Therefore we get
$$\sup_x\mathsf{E}\big\langle A(x)^2\big\rangle_{t,\sim} \le L(\alpha D)^2\,\mathsf{E}\big\langle(R_{1,2}-q)^2\big\rangle_{t,\sim} + \frac{K}{N}\,, \tag{5.66}$$
and recalling the definition (5.51) of $A(x)$ this is exactly (5.50).
Now that we have proved Theorem 5.4.3, we can go back to the study of the quantity $\nu_{t,0}(B_0f)$ of Section 5.3. Given $0\le q\le 1$, and the function $\gamma$ provided by Lemma 5.4.2, we define
$$r_* = \mathsf{E}\bigg(\frac{\mathsf{E}_\xi\,\gamma'(\theta)\exp\gamma(\theta)}{\mathsf{E}_\xi\exp\gamma(\theta)}\bigg)^{\!2}\,. \tag{5.67}$$
$$|\nu_{t,0}(f)-\mathsf{E}\langle f\rangle_{t,\sim}| \le Ln\,\alpha D\,\big(\mathsf{E}\langle f^2\rangle_{t,\sim}\big)^{1/2}\big(\mathsf{E}\langle(R_{1,2}-q)^2\rangle_{t,\sim}\big)^{1/2} + \frac{K}{N}\,\big(\mathsf{E}\langle f^2\rangle_{t,\sim}\big)^{1/2}\,. \tag{5.69}$$
When $B_v$ is given by (5.36) we have
and we consider
$$\psi(s) = \alpha^2\,\mathsf{E}\,\frac{\langle B_0f\,\mathcal{E}(s)\rangle_{t,\sim}}{\langle\mathsf{E}_\xi\,\mathcal{E}(s)\rangle_{t,\sim}}\,. \tag{5.72}$$
The fundamental formula (5.33) shows that $\psi(1) = \alpha^2\nu_{t,0}(B_0f)$. As expected we will compute $\psi(0)$ and bound $\psi'(s)$. Using that $B_0 = w(\theta_0^1,\theta_1^1)\,w(\theta_0^2,\theta_2^2)$, we get, by independence of the $\theta_j^\ell$ and of the randomness of $\langle\cdot\rangle_{t,\sim}$,
$$\psi(0) = \alpha^2\,\mathsf{E}\bigg(\frac{w(\theta_0^1,\theta_1^1)\,w(\theta_0^2,\theta_2^2)\exp\sum_{j=0,1,2,\,\ell\le n}\gamma(\theta_j^\ell)}{\mathsf{E}_\xi\exp\sum_{j=0,1,2,\,\ell\le n}\gamma(\theta_j^\ell)}\bigg)\,\mathsf{E}\langle f\rangle_{t,\sim}\,. \tag{5.73}$$
and
$$\mathsf{E}_\xi\,w(\theta_0^1,\theta_1^1)\,w(\theta_0^2,\theta_2^2)\exp\sum_{j=0,1,2,\,\ell\le n}\gamma(\theta_j^\ell)\,.$$
Therefore
$$\psi(0) = \mathsf{E}(U_1U_2)\,\mathsf{E}\langle f\rangle_{t,\sim}\,, \tag{5.74}$$
where
$$U_1 = \alpha\,\frac{\mathsf{E}_\xi\,w(\theta_0^1,\theta_1^1)\exp(\gamma(\theta_0^1)+\gamma(\theta_1^1))}{\mathsf{E}_\xi\exp\gamma(\theta_0^1)\,\mathsf{E}_\xi\exp\gamma(\theta_1^1)}$$
$$U_2 = \alpha\,\frac{\mathsf{E}_\xi\,w(\theta_0^2,\theta_2^2)\exp(\gamma(\theta_0^2)+\gamma(\theta_2^2))}{\mathsf{E}_\xi\exp\gamma(\theta_0^2)\,\mathsf{E}_\xi\exp\gamma(\theta_2^2)}\,.$$
Let us now recall that $\theta_j^\ell = z_j\sqrt{q}+\hat\xi_j^\ell\sqrt{1-q}$, where the Gaussian r.v.s $z_j$, $\hat\xi_j^\ell$ are all independent of each other. Let us denote by $\mathsf{E}_j$ expectation in $z_j$ only, and by $\mathsf{E}_{\ell,j}$ expectation in $\hat\xi_j^\ell$ only. Then
and
$$\mathsf{E}_1U_1 = \alpha\,\mathsf{E}_1\,\frac{\mathsf{E}_{1,0}\mathsf{E}_{1,1}\,w(\theta_0^1,\theta_1^1)\exp\gamma(\theta_0^1)\exp\gamma(\theta_1^1)}{\mathsf{E}_{1,0}\exp\gamma(\theta_0^1)\,\mathsf{E}_{1,1}\exp\gamma(\theta_1^1)} = \alpha\,\mathsf{E}_1\,\mathsf{E}_{1,0}\,\frac{\exp\gamma(\theta_0^1)}{\mathsf{E}_{1,0}\exp\gamma(\theta_0^1)}\,\mathsf{E}_{1,1}\,\frac{w(\theta_0^1,\theta_1^1)\exp\gamma(\theta_1^1)}{\mathsf{E}_{1,1}\exp\gamma(\theta_1^1)} = \alpha\,\mathsf{E}_{1,0}\,\frac{\exp\gamma(\theta_0^1)}{\mathsf{E}_{1,0}\exp\gamma(\theta_0^1)}\,\mathsf{E}_1\,\mathsf{E}_{1,1}\,\frac{w(\theta_0^1,\theta_1^1)\exp\gamma(\theta_1^1)}{\mathsf{E}_{1,1}\exp\gamma(\theta_1^1)}\,,$$
so that
$$\mathsf{E}_1U_1 = \frac{\mathsf{E}_{1,0}\,\gamma'(\theta_0^1)\exp\gamma(\theta_0^1)}{\mathsf{E}_{1,0}\exp\gamma(\theta_0^1)}\,.$$
In a similar manner,
$$\mathsf{E}_2U_2 = \frac{\mathsf{E}_{2,0}\,\gamma'(\theta_0^2)\exp\gamma(\theta_0^2)}{\mathsf{E}_{2,0}\exp\gamma(\theta_0^2)}\,,$$
so that
$$\mathsf{E}_1U_1 = \mathsf{E}_2U_2 = \frac{\mathsf{E}_\xi\,\gamma'(\theta)\exp\gamma(\theta)}{\mathsf{E}_\xi\exp\gamma(\theta)}\,,$$
and thus $\mathsf{E}\,U_1U_2 = \mathsf{E}(\mathsf{E}_1U_1)^2 = r_*$ by (5.67). Thus we have shown that
$$\psi(0) = r_*\,\mathsf{E}\langle f\rangle_{t,\sim}\,.$$
To bound $\psi'(s)$, we proceed very much as in the proof of (5.61). We define
$$A^\ell(x) = \frac{1}{N}\sum_{k\le M-3}u(S^\ell_{k,t},x) - \gamma(x)\,.$$
$$\cdots + \frac{K}{N}\,\big(\mathsf{E}\langle f^2\rangle_{t,\sim}\big)^{1/2} \le L\alpha D\,\nu_t(f^2)^{1/2}\,\nu_t\big((R_{1,2}-q)^2\big)^{1/2} + \frac{K}{N}\,\nu_t(f^2)^{1/2}\,. \tag{5.78}$$
It follows from (5.37) and Lemma 5.3.3 again that
$$|\alpha^2\nu_{t,0}(B_0f)-\alpha^2\nu_t(B_1f)| \le L\alpha^2D^2\,\nu_t(f^2)^{1/2}\,\nu_t\big((R_{1,2}-q)^2\big)^{1/2} + \frac{K}{N}\,\nu_t(f^2)^{1/2}\,.$$
Moreover from (5.35) we see that the quantity $r_*|\nu_{t,0}(f)-\nu_t(f)|$ satisfies the same bound. Combining with (5.78) we obtain
$$|\alpha^2\nu_t(B_1f)-r_*\nu_t(f)| \le L\alpha D\,\nu_t(f^2)^{1/2}\,\nu_t\big((R_{1,2}-q)^2\big)^{1/2} + \frac{K}{N}\,\nu_t(f^2)^{1/2}\,.$$
Replacing $f$ by $\varepsilon_1\varepsilon_2f$, and since $\nu_t(B_1\varepsilon_1\varepsilon_2f) = C_{1,2}$ by (5.21), the result follows from (5.22).
Corollary 5.5.3. If $f$ is a function on $\Sigma_N^2$, if $L\alpha D\le 1$, and if
$$r = \alpha r_* \tag{5.79}$$
we have
$$|\nu_t'(f)| \le L\alpha^2D\,\nu(f^2)^{1/2}\,\nu\big((R_{1,2}-q)^2\big)^{1/2} + \frac{K}{N}\,\nu(f^2)^{1/2}\,. \tag{5.80}$$
Proof. We combine (5.14) and (5.77).
Theorem 5.5.4. If $L\alpha D\le 1$, $\alpha\le 1$, writing as usual $\theta = z\sqrt{q}+\xi\sqrt{1-q}$, the system of three equations (5.46),
$$r = \alpha\,\mathsf{E}\bigg(\frac{\mathsf{E}_\xi\,\gamma'(\theta)\exp\gamma(\theta)}{\mathsf{E}_\xi\exp\gamma(\theta)}\bigg)^{\!2} \tag{5.81}$$
$$q = \mathsf{E}\,\mathrm{th}^2\big(z\sqrt{r}\big) \tag{5.82}$$
with unknown $(q,r,\gamma)$ has a unique solution, and
$$\nu\big((R_{1,2}-q)^2\big) \le \frac{K}{N}\,. \tag{5.83}$$
Proof. First we will show that (5.46) and (5.81) define $r$ as a continuous function $r(q)$ of $q$. Thinking of $\alpha$ as fixed once and for all, we denote by $\gamma_q$ the solution of (5.46). We will first show that the map $q\mapsto\gamma_q\in\mathcal{C}$ is continuous when $\mathcal{C}$ is provided with the topology induced by the supremum norm $\|\cdot\|_\infty$. Let us write $\theta = \theta_q$ to make explicit the dependence on $q$. Let us fix $q_0$ and consider the function $q\mapsto\psi(q)\in\mathcal{C}$ given by
We will follow the method of the first proof of Theorem 2.4.2. We consider $q$ and $r$ as in Theorem 5.5.4. We consider independent standard Gaussian r.v.s $z$, $(z_k)_{k\le M}$, $(z_i)_{i\le N}$, $(\xi_k)_{k\le M}$, and we write
$$\theta_k = z_k\sqrt{q}+\xi_k\sqrt{1-q}\;;\qquad S_{k,s} = \sqrt{s}\,S_k+\sqrt{1-s}\,\theta_k\,. \tag{5.86}$$
We define
$$p_{N,M,s} = \frac{1}{N}\,\mathsf{E}\log\mathsf{E}_\xi\sum_\sigma\exp(-H_{N,M,s})\,.$$
where
$$p^*_{N,M} = \frac{1}{N}\,\mathsf{E}\log\mathsf{E}_\xi\exp\Big(\frac{1}{N}\sum_{1\le k_1<k_2\le M}u(\theta_{k_1},\theta_{k_2})\Big)\,, \tag{5.89}$$
$$\lim_{N\to\infty,\,M/N\to\alpha}p^*_{N,M}\,.$$
The obstacle here is that it is not clear how to use condition (5.46). Comparing (5.92) and (5.93) we get the relation
$$\frac{\partial}{\partial q}\big(\mathsf{E}\log\mathsf{E}_\xi\exp\gamma_{\alpha,q}(\theta)\big) = -\frac{\partial}{\partial\alpha}\,\frac{r(\alpha,q)}{2}\,. \tag{5.94}$$
A direct proof of this mysterious relation would provide a solution to the exercise. The difficulty is of course that $\gamma_{\alpha,q}$ depends on $q$ and $\alpha$.
Proof of Proposition 5.5.6. From now on until the end of the chapter,
the arguments will be complete but sketchy, as they will rely on simplified
versions of techniques we have already used in this chapter. We define the
function W (α, q) by W (0, q) = 0 and (5.92).
Since the very definition of p∗N,M involves thinking of the variables ξk as
spins, we will approach the problem by the methods we have developed to
study spin systems. We write the identity
$$N(p^*_{N,M+1}-p^*_{N,M}) = \mathsf{E}\log\mathsf{E}_\xi\exp\Big(\frac{1}{N}\sum_{1\le k\le M}u(\theta_k,\theta_{M+1})\Big) \tag{5.95}$$
$$-H_{N,M} = \frac{1}{N}\sum_{1\le k_1<k_2\le M}u(\theta_{k_1},\theta_{k_2})\,. \tag{5.97}$$
$$\mathsf{E}\big\langle A(x)^2\big\rangle \le \mathsf{E}\big\langle(\alpha u(\theta_M,x)-\gamma_{\alpha,q}(x))A_*(x)\big\rangle + \frac{K}{N}\,. \tag{5.99}$$
Let us denote by $\langle\cdot\rangle_*$ an average as in (5.96) but for the Hamiltonian $H_{N,M-1}$. Let
$$B(x) = \frac{1}{N}\sum_{1\le k<M}u(\theta_k,x) = A_*(x)+\gamma_{\alpha,q}(x)\,,$$
so that
$$\big\langle(\alpha u(\theta_M,x)-\gamma_{\alpha,q}(x))A_*(x)\big\rangle = \frac{\big\langle(\alpha u(\theta_M,x)-\gamma_{\alpha,q}(x))A_*(x)\exp B(\theta_M)\big\rangle_*}{\big\langle\mathsf{E}_\xi\exp B(\theta_M)\big\rangle_*}\,.$$
$$\mathsf{E}\big\langle A(x)^2\big\rangle \le L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle_*^{1/2}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle_*^{1/2} + \frac{K}{N}\,.$$
Also, we have $\mathsf{E}\langle h\rangle_* \le L\,\mathsf{E}\langle h\rangle$ when $h$ is a positive function, so that
$$\mathsf{E}\big\langle A(x)^2\big\rangle \le L\alpha D\,\mathsf{E}\big\langle A_*(x)^2\big\rangle^{1/2}\,\mathsf{E}\big\langle A_*(\theta)^2\big\rangle^{1/2} + \frac{K}{N}\,,$$
after which we conclude the proof of (5.98) as in the few lines of the proof of Theorem 5.4.3 that follow (5.64).
Combining (5.98) and (5.95) yields
$$\big|N(p^*_{N,M+1}-p^*_{N,M}) - \mathsf{E}\log\mathsf{E}_\xi\exp\gamma_{\alpha,q}(\theta)\big| \le \frac{K}{\sqrt N}\,, \tag{5.100}$$
i.e.
$$\Big|N\Big(W\Big(\frac{M+1}{N},q\Big)-W\Big(\frac{M}{N},q\Big)\Big) - \mathsf{E}\log\mathsf{E}_\xi\exp\gamma_{\alpha,q}(\theta)\Big| \le \frac{K}{\sqrt N}\,.$$
$$\frac{\partial}{\partial q}p^*_{N,M} = \frac{1}{N^2}\,\mathsf{E}\Big\langle\sum_{1\le k_1<k_2\le M}\big(\theta'_{k_1}w(\theta_{k_1},\theta_{k_2})+\theta'_{k_2}w(\theta_{k_2},\theta_{k_1})\big)\Big\rangle\,,$$
where
$$\theta'_k = \frac{1}{2\sqrt q}\,z_k - \frac{1}{2\sqrt{1-q}}\,\xi_k$$
and where the bracket $\langle\cdot\rangle$ is as in (5.96). Thus
$$\frac{\partial}{\partial q}p^*_{N,M} = \frac{1}{N^2}\,\mathsf{E}\Big\langle\sum_{k_1\ne k_2}\theta'_{k_1}w(\theta_{k_1},\theta_{k_2})\Big\rangle = \frac{1}{N^2}\,\mathsf{E}\,\frac{\mathsf{E}_\xi\sum_{k_1\ne k_2}\theta'_{k_1}w(\theta_{k_1},\theta_{k_2})\exp(-H_{N,M})}{\mathsf{E}_\xi\exp(-H_{N,M})}\,,$$
so that finally
$$\frac{\partial}{\partial q}p^*_{N,M} = -\frac{1}{2N^3}\,\mathsf{E}\Big\langle\sum_{k_1\ne k_2,\,k_1\ne k_3}w(\theta_{k_1},\theta_{k_2})\,w(\theta_{k_1},\theta_{k_3})\Big\rangle\,. \tag{5.101}$$
Proof. Since
$$p_{N,M} = p_{N,M,1} = p_{N,M,0} + \int_0^1\frac{\partial}{\partial s}p_{N,M,s}\,\mathrm{d}s\,,$$
combining with (5.88) and (5.91) it suffices to prove that
$$\Big|\frac{\partial}{\partial s}p_{N,M,s} + \frac{r(1-q)}{2}\Big| \le \frac{K}{\sqrt N}\,. \tag{5.105}$$
First we compute $\partial p_{N,M,s}/\partial s$ using straightforward differentiation. Denoting by $\nu_s$ the average corresponding to the Hamiltonian (5.87) and defining
$$S'_{k,s} = \frac{1}{2\sqrt s}\,S_k - \frac{1}{2\sqrt{1-s}}\,\theta_k\,,$$
we get
$$\frac{\partial}{\partial s}p_{N,M,s} = \mathrm{I}+\mathrm{II}\,,$$
where
$$\mathrm{I} = \frac{1}{N^2}\sum_{k_1\ne k_2}\nu_s\big(S'_{k_1,s}\,w(S_{k_1,s},S_{k_2,s})\big)$$
and
$$\mathrm{II} = -\frac{1}{2\sqrt{1-s}}\sum_{i\le N}\nu_s\big(\sigma_iz_i\sqrt r\big)\,.$$
Exercise 5.5.9. Improve the rate of (5.104) into the usual rate K/N . (This
requires very significant work.)
6. The Diluted SK Model and the K-Sat
Problem
6.1 Introduction
In the SK model, each individual (or spin) interacts with every other indi-
vidual. For large N , this does not make physical sense. Rather, we would like
that, as N → ∞, a given individual typically interacts only with a bounded
number of other individuals. This motivates the introduction of the diluted
SK model. In this model, the Hamiltonian is given by
$$-H_N(\sigma) = \beta\sum_{i<j}g_{ij}\gamma_{ij}\sigma_i\sigma_j\,. \tag{6.1}$$
As usual, (gij )i<j are i.i.d. standard Gaussian r.v.s. The quantities γij ∈
{0, 1} determine which of the interaction terms are actually present in the
Hamiltonian. There is an interaction term between σi and σj only when γij =
1. The natural choice for these quantities is to consider a parameter γ > 0
(that does not depend on N ) indicating “how diluted is the interaction”,
and to decide that the quantities γij are i.i.d. r.v.s with P(γij = 1) = γ/N ,
P(γij = 0) = 1 − γ/N , and are independent from the r.v.s gij . Thus, the
expected number of terms in (6.1) is
$$\frac{\gamma}{N}\cdot\frac{N(N-1)}{2} = \frac{\gamma(N-1)}{2}\,,$$
and the expected number of terms that contain $\sigma_i$ is about $\gamma$. That is, the average number of spins that interact with one given spin is about $\gamma$. One should observe that the usual normalizing factor $1/\sqrt N$ does not occur
in (6.1).
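These counts are easy to confirm by simulation. In the sketch below, $N$, $\gamma$ and the number of samples are arbitrary choices; the check concerns only the bookkeeping above, not the Hamiltonian itself.

```python
import random

# Monte Carlo check of the dilution bookkeeping for (6.1): each pair i < j
# carries an interaction term independently with probability gamma/N, so the
# mean total number of terms is gamma*(N-1)/2 and a given spin belongs to
# about gamma of them. (N, gamma and `samples` are arbitrary choices.)

random.seed(1)
N, gamma, samples = 100, 3.0, 200
p = gamma / N
total_terms = 0
terms_with_spin0 = 0
for _ in range(samples):
    for i in range(N):
        for j in range(i + 1, N):
            if random.random() < p:
                total_terms += 1
                if i == 0:          # pairs containing spin 0 have i == 0
                    terms_with_spin0 += 1
```

The empirical means `total_terms / samples` and `terms_with_spin0 / samples` come out close to $\gamma(N-1)/2$ and $\gamma(N-1)/N \approx \gamma$ respectively.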
If we draw an edge between i and j when γij = 1, the resulting random
graph is well understood [12]. When γ < 1, this graph has only small con-
nected components, so there is no “global interaction” and the situation is
not so interesting. In order to get a challenging model we must certainly allow
the case where γ takes any positive value.
In an apparently unrelated direction, let us remind the reader that the
motivation of Chapter 2 is the problem as to whether certain random subsets
of {−1, 1}N have a non-empty intersection. In Chapter 2, we considered “ran-
dom half-spaces”. These somehow “depend on all coordinates”. What would
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 325
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 6, © Springer-Verlag Berlin Heidelberg 2011
It turns out that the mean number of terms of the Hamiltonian that
depend on a given spin is of particular relevance. This number is γ = αp
(where α is such that EM = αN ), and for simplicity of notation this will be
our main parameter rather than α.
The purpose of this chapter is to describe the behavior of the system
governed by the Hamiltonian (6.4) under a “high-temperature condition”
asserting in some sense that this Hamiltonian is small enough. This condition
will involve the r.v. S given by
Since the mean number of spins interacting with a given spin remains
bounded independently of N , the central limit theorem does not apply, and
the ubiquitous Gaussian behavior of the previous chapters is now absent.
Despite this fundamental difference, and even though this is hard to express
explicitly, there are many striking similarities.
We now outline the organization of this chapter. A feature of our approach
is that, in contrast with what happened for the previous models, we do not
know how to gain control of the model “in one step”. Rather, we will first
prove in Section 6.2 that for large N a small collection of spins are approx-
imately independent under a condition like (6.6). This is the main content
of Theorem 6.2.2. The next main step takes place in Section 6.4, where in
Theorem 6.4.1 we prove that under a condition like (6.6), a few quantities $\langle\sigma_1\rangle,\ldots,\langle\sigma_k\rangle$ are approximately independent with law $\mu_\gamma$, where $\mu_\gamma$ is a probability measure on $[0,1]$ that is described in Section 6.3 as the fixed point of a (complicated) operator. This result is then used in the last part of Section 6.4 to compute $\lim_{N\to\infty}p_N(\gamma)$, where $p_N(\gamma) = N^{-1}\,\mathsf{E}\log\sum_\sigma\exp(-H_N(\sigma))$, still
under a “high-temperature” condition of the type (6.6). In Section 6.5 we
prove under certain conditions an upper bound for pN (γ), that is true for all
values of γ and that asymptotically coincides with the limit previously com-
puted under a condition of the type (6.6). In Section 6.6 we investigate the
case of continuous spins, and in Section 6.7 we demonstrate the very strong
consequences of a suitable concavity hypothesis on the Hamiltonian, and we
point out a number of rather interesting open problems.
The purpose of this section is to show that under (6.6) “the system is in a
pure state”, that is, the spin correlations vanish. In fact we will prove that
E |⟨σ1 σ2⟩ − ⟨σ1⟩⟨σ2⟩| ≤ K/N (6.7)
where K depends only on p and γ. The proof, by induction over N , is similar
in spirit to the argument occurring at the end of Section 1.3. In order to make
the induction work, it is necessary to carry a suitable induction hypothesis,
that will prove a stronger statement than (6.7). This stronger statement will
be useful later in its own right.
Given k ≥ 1 we say that two functions f, f′ on Σ^n_N depend on k coordinates
if we can find indices 1 ≤ i1 < . . . < ik ≤ N and functions f̄, f̄′ from
{−1, 1}^{kn} to R such that

f(σ^1, . . . , σ^n) = f̄((σ^ℓ_{i1}, . . . , σ^ℓ_{ik})_{ℓ≤n}) ,

and similarly for f′. The reason we define this for two functions is to stress
that both functions depend on the same set of k coordinates.
For i ≤ N, consider the transformation Ti of Σ^n_N that, for a point
(σ^1, . . . , σ^n) of Σ^n_N, exchanges the i-th coordinates of σ^1 and σ^2, and leaves
all the other coordinates unchanged.

f ◦ Ti = −f , (6.8)
E |⟨σ^1_1 (σ^1_2 − σ^2_2)⟩| ≤ (2B + B∗)/N ,

i.e.

E |⟨σ1 σ2⟩ − ⟨σ1⟩⟨σ2⟩| ≤ (2B + B∗)/N ,
which is (6.7). More generally, basically the same argument shows that when
condition C(N, γ0 , B, B ∗ ) holds (for each N and numbers B and B ∗ that
do not depend on N ), to compute Gibbs averages of functions that depend
only on a number of spins that remains bounded independently of N , one
can pretend that these spins are independent under Gibbs’ measure. We will
return to this important idea later.
where the sum is over those k ≤ M for which i(k, p) ≤ N − 1, and where
H(σ) is the sum of the other terms of (6.4), those for which i(k, p) = N .
Since the set {i(k, 1), . . . , i(k, p)} is uniformly distributed over the subsets of
{1, . . . , N } of cardinality p, the probability that i(k, p) = N is exactly p/N .
A remarkable property of Poisson r.v.s is as follows: when M is a Poisson
r.v., if (Xk)_{k≥1} are i.i.d. {0, 1}-valued r.v.s then Σ_{k≤M} Xk and
Σ_{k≤M} (1 − Xk) are independent Poisson r.v.s with means respectively EM EXk
and EM E(1 − Xk). The simple proof is given in Lemma A.10.1. Using this
for Xk = 1 if i(k, p) = N and Xk = 0 otherwise implies that the numbers of
terms in H(σ) and HN−1(σ) are independent Poisson r.v.s of means respectively
(p/N)αN = γ and αN − γ. Thus the pair (−HN−1(σ), −H(σ)) has
the same distribution as the pair

( Σ_{k≤M′} θ′_k(σ_{i′(k,1)}, . . . , σ_{i′(k,p)}) , Σ_{j≤r} θ_j(σ_{i(j,1)}, . . . , σ_{i(j,p−1)}, σN) ) . (6.14)
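The splitting property invoked above (Lemma A.10.1) is easy to test empirically: thinning a Poisson number of Bernoulli marks produces two independent Poisson counts. A small sketch (ours, not part of the treatise), checking the means and the vanishing covariance:

```python
import random

def split_poisson(lam, q, n_samples=100_000, seed=0):
    """Sample M ~ Poisson(lam); mark each of the M points independently
    with probability q.  Return the lists of marked / unmarked counts,
    which should be independent Poisson r.v.s of means lam*q and lam*(1-q)."""
    rng = random.Random(seed)
    ones, zeros = [], []
    for _ in range(n_samples):
        # Poisson(lam) via exponential inter-arrival times
        m, t = 0, rng.expovariate(1.0)
        while t < lam:
            m += 1
            t += rng.expovariate(1.0)
        x = sum(rng.random() < q for _ in range(m))
        ones.append(x)
        zeros.append(m - x)
    return ones, zeros

ones, zeros = split_poisson(lam=3.0, q=0.4)
n = len(ones)
mean_ones = sum(ones) / n        # close to 3.0 * 0.4 = 1.2
mean_zeros = sum(zeros) / n      # close to 3.0 * 0.6 = 1.8
cov = sum(a * b for a, b in zip(ones, zeros)) / n - mean_ones * mean_zeros
```

The empirical covariance hovering near zero reflects the independence of the two counts.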
where the ri are i.i.d. Poisson r.v.s (independent of all other sources of randomness)
with

E ri = αM/p .

Prove that the Hamiltonian HN has the same distribution as the Hamiltonian
Σ_i Hi.
where

−HN−1(σ) = Σ_{k≤M′} θ′_k(σ_{i′(k,1)}, . . . , σ_{i′(k,p)}) , (6.16)
6.2 Pure State 331
and

−H(σ) = Σ_{j≤r} θ_j(σ_{i(j,1)}, . . . , σ_{i(j,p−1)}, σN) . (6.17)
Let us stress that in this section and in the next, the letter r will stand
for the number of terms in the summation (6.17), which is a Poisson r.v. of
expectation γ.
We observe from (6.16) that if we write ρ = (σ1, . . . , σN−1) when σ =
(σ1, . . . , σN), then −HN−1(σ) = −HN−1(ρ) is the Hamiltonian of an (N − 1)-spin
system, except that we have replaced γ by a different value γ− . To compute
γ− we recall that the mean number of terms of the Hamiltonian HN −1 is
αN − γ, so that the mean number γ− of terms that contain a given spin is
γ− = (αN − γ) p/(N − 1) = γ (N − p)/(N − 1) , (6.18)
since pα = γ. We note that γ− ≤ γ, so that

γ ≤ γ0 ⇒ γ− ≤ γ0 . (6.19)
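The recursion (6.18) is elementary arithmetic and can be checked exactly, for instance with rational numbers; a sketch of ours:

```python
from fractions import Fraction

def gamma_minus(gamma, p, N):
    """Mean connectivity of the (N-1)-spin system obtained by removing the
    last spin: (alpha*N - gamma) * p / (N - 1), with alpha = gamma / p."""
    alpha = Fraction(gamma) / p
    return (alpha * N - Fraction(gamma)) * p / (N - 1)

g = Fraction(2)
p, N = 3, 100
gm = gamma_minus(g, p, N)
# Matches the closed form gamma * (N - p) / (N - 1), and gamma_minus <= gamma.
```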
⟨f⟩ = ⟨Av f E⟩− / ⟨Av E⟩− (6.20)

holds. Here,

E = E(σ^1, . . . , σ^n) = exp Σ_{ℓ≤n} (−H(σ^ℓ)) , (6.21)
for some functions fs on Σ^n_{N−1}, such that the number of terms does not
depend on N, and such that all pairs (fs, Av f′E) have the property of the pair
(f, f′), but in the (N − 1)-spin system. Since

Av f E / Av f′E = (1/2) Σ_s fs / Av f′E ,
we can now apply the induction hypothesis to each term to get a bound for
the sum and hence for
E− |⟨Av f E⟩− / ⟨Av f′E⟩−| ,
and finally (6.22) completes the induction step.
We now start the proof. We consider a pair (f, f′) as in Definition 6.2.1,
that is, |f| ≤ Qf′, f ◦ Ti = −f for some i ≤ N, and f, f′ depend on k
coordinates. We want to bound E|⟨f⟩/⟨f′⟩|, and for this we study the last
term of (6.22). Without loss of generality, we assume that i = N and that f
and f′ depend on the coordinates 1, . . . , k − 1, N. First, we observe that, since
we assume |f| ≤ Qf′, we have |f E| ≤ Qf′E, so that |Av f E| ≤ Av |f E| ≤
Q Av f′E, and thus
E− |⟨Av f E⟩− / ⟨Av f′E⟩−| ≤ Q . (6.23)
We recall (6.21) and (6.17), and in particular that r is the number of terms
in the summation (6.17) and is a Poisson r.v. of expectation γ. We want to
apply the induction hypothesis to compute the left-hand side of (6.23). The
expectation E− is expectation given E, and it helps to apply the induction
hypothesis if the functions Av f E and Av f′E are not too complicated. To
ensure this it will be desirable that all the points i(j, ℓ) for j ≤ r and ℓ ≤ p − 1
are different and ≥ k. In the rare event Ω (we recall that Ω denotes an event,
and not the entire probability space) where this is not the case, we will simply
use the crude bound (6.23) rather than the induction hypothesis. Recalling
that i(j, 1) < . . . < i(j, p − 1), to prove that Ω is a rare event we write
Ω = Ω1 ∪ Ω2 where

Ω1 = { ∃ j ≤ r , i(j, 1) ≤ k − 1 }

Ω2 = { ∃ j, j′ ≤ r , j ≠ j′ , ∃ ℓ, ℓ′ ≤ p − 1 , i(j, ℓ) = i(j′, ℓ′) } .
These two events depend only on the randomness of E. Let us recall that for
j ≤ r the sets
Ij = {i(j, 1), . . . , i(j, p − 1)} (6.24)
are independent and uniformly distributed over the subsets of {1, . . . , N − 1}
of cardinality p − 1. The probability that any given i ≤ N − 1 belongs to
Ij is therefore (p − 1)/(N − 1). Thus the probability that i(j, 1) ≤ k − 1,
P(Ω1) ≤ E r (p − 1)(k − 1)/(N − 1) ≤ kpγ/N .
Here and below, we do not try to get sharp bounds. There is no point in
doing this, as anyway our methods cannot reach the best possible bounds.
Rather, we aim at writing explicit bounds that are not too cumbersome. For
j < j′ ≤ r, the probability that a given point i ≤ N − 1 belongs to both
sets Ij and Ij′ is ((p − 1)/(N − 1))². Thus the random number U of points
i ≤ N − 1 that belong to two different sets Ij for j ≤ r satisfies
E U = (N − 1) ((p − 1)/(N − 1))² E r(r − 1)/2 ≤ p²γ²/(2N) ,
using that E r(r − 1) = (E r)² since r is a Poisson r.v., see (A.64). Since U is
integer valued, we have P({U ≠ 0}) ≤ EU and since Ω2 = {U ≠ 0} we get
P(Ω2) ≤ p²γ²/(2N) ,
so that finally, since Ω = Ω1 ∪ Ω2, we obtain

P(Ω) ≤ (kpγ + p²γ²)/N . (6.25)
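The bound (6.25) can be compared with a Monte Carlo estimate of P(Ω); the sketch below (ours, not part of the treatise) samples r and the sets Ij directly:

```python
import random

def prob_omega(N, p, gamma, k, trials=50_000, seed=1):
    """Monte Carlo estimate of P(Omega), Omega = Omega1 or Omega2:
    Omega1: some set I_j touches {1, ..., k-1};
    Omega2: two of the sets I_j share a point."""
    rng = random.Random(seed)
    hits = 0
    pop = range(1, N)          # subsets of {1, ..., N-1}
    for _ in range(trials):
        # r ~ Poisson(gamma) via exponential inter-arrival times
        r, t = 0, rng.expovariate(1.0)
        while t < gamma:
            r += 1
            t += rng.expovariate(1.0)
        sets = [frozenset(rng.sample(pop, p - 1)) for _ in range(r)]
        omega1 = any(min(s) <= k - 1 for s in sets)
        union = set().union(*sets) if sets else set()
        omega2 = len(union) < sum(len(s) for s in sets)
        hits += omega1 or omega2
    return hits / trials

N, p, gamma, k = 200, 3, 1.5, 4
est = prob_omega(N, p, gamma, k)
bound = (k * p * gamma + p**2 * gamma**2) / N
```

As expected, the empirical probability sits well below the (deliberately non-sharp) bound.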
Using (6.22), (6.23) and (6.25), we have

E |⟨f⟩/⟨f′⟩| = E 1_Ω E− |⟨Av f E⟩−/⟨Av f′E⟩−| + E 1_{Ω^c} E− |⟨Av f E⟩−/⟨Av f′E⟩−|
≤ Q (kpγ + p²γ²)/N + E 1_{Ω^c} E− |⟨Av f E⟩−/⟨Av f′E⟩−| . (6.26)
The next task is to use the induction hypothesis to study the last term above.
When Ω does not occur (i.e. on Ω^c), all the points i(j, ℓ), j ≤ r, ℓ ≤ p − 1
are different and are ≥ k. Recalling the notation (6.24) we have

J := { i(j, ℓ) ; j ≤ r, ℓ ≤ p − 1 } = ∪_{j≤r} Ij ,

so that

J ∩ {1, . . . , k − 1, N} = ∅ . (6.27)
f ◦ T = f

E ◦ T ◦ TN(σ^1, σ^2, . . . , σ^n) = E(σ^2, σ^1, . . . , σ^n) = E(σ^1, σ^2, . . . , σ^n) ,

and hence

E ◦ T ◦ TN = E .

Combining with (6.29) we get

(f E) ◦ T ◦ TN = (f ◦ T ◦ TN)(E ◦ T ◦ TN) = −f E ,

i.e.

(f E) ◦ T = −(f E) ◦ TN . (6.30)
Now, for any function f we have Av(f ◦ TN) = Av f and Av(f ◦ T) = (Av f) ◦
Π_{i∈J} Ui. Therefore we obtain

Av((f E) ◦ TN) = Av f E ;  Av((f E) ◦ T) = (Av f E) ◦ Π_{i∈J} Ui ,
where

fs = (Av f E) ◦ Π_{u≤s−1} U_{iu} − (Av f E) ◦ Π_{u≤s} U_{iu} . (6.32)
A crucial feature of this bound is that it does not depend on the number n
of replicas.
Proof. Let us write

E′ = exp Σ_{3≤ℓ≤n} (−H(σ^ℓ)) ;  E′′ = exp Σ_{ℓ=1,2} (−H(σ^ℓ)) ,

so that E = E′E′′. Since |H(σ)| ≤ Σ_{j≤r} Sj, we have

E′′ ≥ exp( −2 Σ_{j≤r} Sj ) ,

and therefore

E ≥ E′ exp( −2 Σ_{j≤r} Sj ) . (6.35)

This implies

Av f′E ≥ (Av f′E′) exp( −2 Σ_{j≤r} Sj ) . (6.36)
Next,

fs = (Av f E) ◦ Π_{u≤s−1} T_{iu} applied through the Ui, namely

fs = (Av f E) ◦ Π_{u≤s−1} U_{iu} − (Av f E) ◦ Π_{u≤s} U_{iu}
  = Av( (f E) ◦ Π_{u≤s−1} T_{iu} − (f E) ◦ Π_{u≤s} T_{iu} )
  = Av f ( E ◦ Π_{u≤s−1} T_{iu} − E ◦ Π_{u≤s} T_{iu} ) , (6.37)

using in the last line that f ◦ T_{iu} = f for each u, since f depends only on
the coordinates 1, . . . , k − 1, N. Recalling that E = E′E′′, and observing that
for each i we have E′ ◦ Ti = E′, we get

E ◦ Π_{u≤s−1} T_{iu} − E ◦ Π_{u≤s} T_{iu} = E′ ( E′′ ◦ Π_{u≤s−1} T_{iu} − E′′ ◦ Π_{u≤s} T_{iu} ) ,

and, if we set

Δ = sup | E′′ ◦ Π_{u≤s−1} T_{iu} − E′′ ◦ Π_{u≤s} T_{iu} | = sup |E′′ − E′′ ◦ T_{is}| ,

we get from (6.37) that, using that |f| ≤ Qf′ in the first inequality and (6.35)
in the second one,

|fs| ≤ Δ Av(|f| E′) ≤ QΔ Av(f′E′) ≤ QΔ Av(f′E) exp( 2 Σ_{j≤r} Sj ) . (6.38)
To bound Δ, we write E′′ = Π_{j≤r} Ej, where

Ej = exp Σ_{ℓ=1,2} θj(σ^ℓ_{i(j,1)}, . . . , σ^ℓ_{i(j,p−1)}, σ^ℓ_N) .

Now, using the inequality |e^x − e^y| ≤ |x − y|e^a ≤ 2a e^a for |x|, |y| ≤ a and
a = 2Sv, we get

|Ev − Ev ◦ T_{is}| ≤ 4Sv exp 2Sv .

Since for all j we have Ej ≤ exp 2Sj, we get Δ ≤ 4Sv exp( 2 Σ_{j≤r} Sj ). Combining
with (6.38) completes the proof.
where

U = E(S exp 4S) ;  V = E exp 4S .

Combining with (6.31), and since there are r(p − 1) ≤ rp terms, we get

E_θ E− |⟨Av f E⟩− / ⟨Av f′E⟩−| ≤ (4Qp/N) ( (kr + r²p)B + rB∗ ) U V^{r−1} .
This bound assumes that Ω does not occur; but combining with (6.26) we
obtain the bound

E |⟨f⟩/⟨f′⟩| ≤ (Qp/N) ( kγ + pγ² + 4B ( kU E rV^{r−1} + pU E r²V^{r−1} ) + 4B∗ U E rV^{r−1} ) .
Since r is a Poisson r.v. of expectation γ a straightforward calculation shows
that ErV r−1 = γ exp γ(V − 1). Since ex ≤ 1 + xex for all x ≥ 0 (as is trivial
using power series expansion) we have V ≤ 1+4U , so exp γ(V −1) ≤ exp 4γU
and U ErV r−1 ≤ D exp 4D. The result follows.
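The two Poisson identities used above, E r(r − 1) = γ² and E rV^{r−1} = γ exp γ(V − 1), can be verified by summing the Poisson series directly (a numerical sketch of ours):

```python
import math

gamma, V = 1.3, 1.7
p_r = math.exp(-gamma)       # Poisson pmf at r = 0
lhs = 0.0                    # accumulates E[ r * V**(r-1) ]
second = 0.0                 # accumulates E[ r * (r-1) ]
for r in range(1, 120):
    p_r *= gamma / r         # pmf at r, built recursively to avoid factorials
    lhs += r * V**(r - 1) * p_r
    second += r * (r - 1) * p_r
rhs = gamma * math.exp(gamma * (V - 1))
```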
Proof of Theorem 6.2.2. If
D0 = γ0 E S exp 4S
Thus condition
∀ i ≤ N − 1, ∫ σi dμ(ρ) = ⟨σi⟩− ,
where σ^i_i is the i-th coordinate of the i-th replica ρ^i. The following consequence
of property C(N, γ0, K0, K0) will be used in Section 6.4. It expresses,
in a form that is particularly adapted to the use of the cavity method, the fact
that under property C(N, γ0, K0, K0), a given number of spins (independent
of N) become nearly independent for large N.
and fi similarly. The idea is simply that “we make the spins independent one
at a time”. Thus
⟨Av σN E⟩− / ⟨Av E⟩− = ⟨f1⟩− / ⟨f′1⟩− ;  ⟨Av σN E⟩• / ⟨Av E⟩• = ⟨fN−1⟩− / ⟨f′N−1⟩− , (6.44)
The terms in the summation are zero unless i belongs to the union of the
sets Ij, j ≤ r, for otherwise fi and f′i do not depend on the i-th coordinate
and fi = fi−1, f′i = f′i−1. We then try to bound the terms in the summation
when i ∈ Ij for a certain j ≤ r. Since |fi| ≤ f′i we have
| ⟨fi−1⟩−/⟨f′i−1⟩− − ⟨fi⟩−/⟨f′i⟩− | ≤ | ⟨fi−1⟩− − ⟨fi⟩− | / ⟨f′i−1⟩− + ( ⟨fi⟩−/⟨f′i⟩− ) | ⟨f′i−1⟩− − ⟨f′i⟩− | / ⟨f′i−1⟩−
 ≤ | ⟨fi−1⟩− − ⟨fi⟩− | / ⟨f′i−1⟩− + | ⟨f′i−1⟩− − ⟨f′i⟩− | / ⟨f′i−1⟩−
E′ := exp Σ_{u≠j} Wu(σ^1_1, σ^2_2, . . . , σ^i_i, σ^1_{i+1}, . . . , σ^1_N)
  = exp Σ_{u≠j} Wu(σ^1_1, σ^2_2, . . . , σ^{i−1}_{i−1}, σ^1_i, σ^1_{i+1}, . . . , σ^1_N) .
Then

f′i−1 = Av E(σ^1_1, . . . , σ^{i−1}_{i−1}, σ^1_i, . . . , σ^1_N) ≥ exp(−Sj) Av E′ ,

where Av denotes average over σ^1_N = ±1. In a similar fashion, we get |fi−1| ≤
exp(Sj) Av E′, |fi| ≤ exp(Sj) Av E′, and thus
In the limit N → ∞ the sets Ij = {i(j, 1), . . . , i(j, p − 1)} are disjoint. The
quantity E depends on a number of spins that in essence does not depend
on N . If we know the asymptotic behavior of any fixed number (i.e. of any
number that does not depend on N ) of the spins (σi )i<N , we can then com-
pute the behavior of the spin σN . This behavior has to be the same as the
behavior of the spins σi for i < N , and this gives rise to a “self-consistency
equation”.
To define this equation formally, consider a Poisson r.v. r with E r = γ,
independent of the r.v.s θj. For σ ∈ {−1, 1}^ℕ and ε ∈ {−1, 1} we define

Er = Er(σ, ε) = exp Σ_{1≤j≤r} θj(σ_{(j−1)(p−1)+1}, . . . , σ_{j(p−1)}, ε) . (6.49)
This definition will be used many times in the sequel. We note that Er
depends on σ only through the coordinates of rank ≤ r(p − 1).
Given a sequence x = (xi)_{i≥1} with |xi| ≤ 1 we denote by λx the probability
on {−1, 1}^ℕ that “has a density Π_i (1 + xi σi) with respect to the
uniform measure”. More formally, λx is the product measure such that
∫ σi dλx(σ) = xi for each i. We denote by ⟨·⟩x an average for λx.
Similarly, if x = (xi)_{i≤M} we also denote by λx the probability measure
on ΣM = {−1, 1}^M such that ∫ σi dλx(σ) = xi, and we denote by ⟨·⟩x an
average for λx, so that we have

⟨f⟩x = ∫ Π_{i≤M} (1 + xi σi) f(σ) dσ ,
⟨f⟩• = ⟨f⟩Y , (6.50)
where Y = (⟨σ1⟩−, . . . , ⟨σN−1⟩−).
Consider a probability measure μ on [−1, 1], and an i.i.d. sequence X =
(Xi )i≥1 such that Xi is of law μ. We define T (μ) as the law of the r.v.
Av⟨εEr⟩X / Av⟨Er⟩X , (6.51)
μ = T (μ) .
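The fixed point μγ can be approximated numerically by “population dynamics”: represent μ by a pool of values and apply the update (6.51) repeatedly. The sketch below is ours and makes the illustrative, hypothetical choice θ(σ1, . . . , σ_{p−1}, ε) = β ε σ1 · · · σ_{p−1}; for ±1 spins the averages ⟨·⟩X can then be computed in closed form, and the update becomes new_x = tanh(Σ_j atanh(tanh(β) · (product of sampled pool entries))).

```python
import math, random

def poisson(gamma, rng):
    """Poisson(gamma) sample via exponential inter-arrival times."""
    r, t = 0, rng.expovariate(1.0)
    while t < gamma:
        r += 1
        t += rng.expovariate(1.0)
    return r

def population_dynamics(gamma, p, beta, pool_size=2000, sweeps=40, seed=7):
    """Iterate mu -> T(mu) by population dynamics for the hypothetical
    theta(s_1, ..., s_{p-1}, eps) = beta * eps * s_1 * ... * s_{p-1}."""
    rng = random.Random(seed)
    pool = [0.5] * pool_size          # arbitrary asymmetric starting measure
    tb = math.tanh(beta)
    for _ in range(sweeps):
        new = []
        for _ in range(pool_size):
            h = 0.0
            for _ in range(poisson(gamma, rng)):
                prod = 1.0
                for _ in range(p - 1):
                    prod *= rng.choice(pool)
                h += math.atanh(tb * prod)
            new.append(math.tanh(h))
        pool = new
    return pool

pool = population_dynamics(gamma=0.5, p=2, beta=0.3)
```

At high temperature (small γ tanh β) the iteration contracts towards the symmetric measure concentrated at 0, in line with the contraction estimates of this section.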
The proof will consist of showing that T is a contraction for the Monge-
Kantorovich transportation-cost distance d defined in (A.66) on the set of
probability measures on [−1, 1] provided with the usual distance. In the
present case, this distance is simply given by the formula

d(μ1, μ2) = inf E|X − Y| ,

where the infimum is taken over all pairs of r.v.s (X, Y) such that the law
of X is μ1 and the law of Y is μ2. The very definition of d shows that to
bound d(μ1 , μ2 ) there is no other method than to produce a pair (X, Y ) as
above such that E|X − Y | is appropriately small. Such a pair will informally
be called a coupling of the r.v.s X and Y .
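On the real line the infimum is attained by the monotone (quantile) coupling, a standard fact; for empirical measures with equally many atoms this reduces to pairing sorted samples. A minimal sketch (ours, not part of the treatise):

```python
def w1_empirical(xs, ys):
    """Transportation-cost distance between two empirical measures with the
    same number of atoms on the line: the monotone (quantile) coupling that
    pairs sorted samples is optimal, so d is the mean absolute gap."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

d = w1_empirical([-0.5, 0.0, 0.5], [-0.4, 0.1, 0.6])   # each atom moved by 0.1
```

Shifting every atom by δ moves the measure by exactly δ in this distance.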
∂⟨f⟩x/∂xi = ⟨Δi f⟩x (6.53)

where Δi f(η) = (f(η^+_i) − f(η^−_i))/2, and where η^+_i (resp. η^−_i) is obtained
from η by replacing its i-th coordinate by +1 (resp. −1). Indeed, for a function
f of a single coordinate η = ±1,

⟨f⟩x = ∫ f(η) dλx(η) = (1/2)(f(1) + f(−1)) + (x/2)(f(1) − f(−1)) .
Thus, using in the second equality the trivial fact that a = ⟨a⟩x for any
number a, this implies

(d/dx)⟨f⟩x = (1/2)(f(1) − f(−1)) = ⟨(1/2)(f(1) − f(−1))⟩x . (6.54)
Since λx is a product measure, using (6.54) given all the coordinates different
from i, and then Fubini’s theorem, we obtain (6.53).
where Sj = sup |θj |. For the other values of i the left-hand side of the previous
inequality is 0.
Now
|Δi (Av εEr )| = |Av (εΔi Er )| ≤ Av |Δi Er | .
We write Er = E′E′′, where E′ = exp θj(σ_{(j−1)(p−1)+1}, . . . , σ_{j(p−1)}, ε), and
where E′′ does not depend on σi. Thus, using that |e^x − e^y| ≤ |x − y|e^a ≤ 2ae^a
for |x|, |y| ≤ a, we get (keeping in mind the factor 1/2 in the definition
of Δi, which offsets the factor 2 above) that |Δi E′| ≤ Sj exp Sj, and since
E′′ ≤ Er exp Sj we get
and thus

|⟨Δi(Av εEr)⟩x| / ⟨Av Er⟩x ≤ Sj exp 2Sj .
d(μγ, μγ′) ≤ 4|γ − γ′| .

so that (6.60) implies that d(μγ, μγ′) ≤ d(μγ, μγ′)/2 + 2|γ′ − γ|, hence the
desired result.
Exercise 6.3.5. Consider three functions U, V, W on Σ^n_N. Assume that
V ≥ 0, that for a certain number Q we have |U| ≤ QV, and let S∗ =
sup_{σ^1,...,σ^n} |W|. Prove that for any Gibbs measure ⟨·⟩ we have

| ⟨U exp W⟩/⟨V exp W⟩ − ⟨U⟩/⟨V⟩ | ≤ 2QS∗ exp 2S∗ .
Exercise 6.3.6. Use the idea of Exercise 6.3.5 to control the influence of
E in (6.61) and to show that if γ and γ′ satisfy (6.52) then d(μγ, μγ′) ≤
4|γ − γ′| E S exp 2S.
Then there exists a number K2 (p, γ0 ) such that if we define for n ≥ 0 the
numbers A(n) as follows:
In particular when

80 p³ (γ0 + γ0³) E S exp 2S ≤ 1 , (6.66)

we can replace (6.65) by

E Σ_{i≤k} | ⟨σi⟩ − zi | ≤ (2k³ K2(γ0, p)/N) E exp 2S . (6.67)
The last statement of the Theorem simply follows from the fact that under
(6.66) we have A(n) ≤ 2A(0), so that we can take n very large in (6.90).
When (6.66) need not hold, optimisation over n in (6.65) yields a bound
≤ KkN −α for some α > 0 depending only on γ0 , p and S.
The next problem need not be difficult. This issue came up at the very time
when the book was ready to be sent to the publisher, and it did not seem
appropriate either to delay the publication or to try to make significant changes
in a rush.
Research Problem 6.4.2. (level 1-) Is it true that (6.67) follows from
(6.62)? More specifically, when γ0 1, and when S is constant, does (6.67)
follow from a condition of the type K(p)γ0 S ≤ 1?
Probably the solution of this problem will not require essentially new
ideas. Rather, it should require technical work and improvement of the esti-
mates from Lemma 6.4.3 to Lemma 6.4.7, trying in particular to bring out
more “small factors” such as ES exp 2S, in the spirit of Exercise 6.3.6. It
seems however that it will also be necessary to proceed to a finer study of
what happens on the set Ω defined page 349.
It follows from Theorem 6.2.2 that we can assume throughout the proof
that property C(N, γ0, K0, K0) holds for every N. It will be useful to consider
the metric space [−1, 1]k , provided with the distance d given by
d((xi)_{i≤k}, (yi)_{i≤k}) = Σ_{i≤k} |xi − yi| . (6.68)
L(⟨σ1⟩, . . . , ⟨σk⟩) = L(⟨σN−k+1⟩, . . . , ⟨σN⟩) .
The starting point of the proof of Theorem 6.4.1 is a formula similar to (6.20),
but where we remove the last k coordinates rather than the last one. Writing
now ρ = (σ1 , . . . , σN −k ), we consider the Hamiltonian
−HN−k(ρ) = Σ_s θs(σ_{i(s,1)}, . . . , σ_{i(s,p)}) , (6.70)
γ−(N − k) = pNατ = γNτ ,

and thus

γ− = γ (N − k − 1) · · · (N − k − p + 1) / ((N − 1) · · · (N − p + 1)) . (6.71)
In particular γ− ≤ γ0 whenever γ ≤ γ0 . Let us denote again by · − an
average for the Gibbs measure with Hamiltonian (6.70). (The value of k will
be clear from the context.) Given a function f on ΣN , we then have
⟨f⟩ = ⟨Av f E⟩− / ⟨Av E⟩− , (6.72)
where Av means average over σN −k+1 , . . . , σN = ±1, and where
E = exp Σ_{s≤r} θs(σ_{i(s,1)}, . . . , σ_{i(s,p)}) , (6.73)
where now the sets {i(j, 1), . . . , i(j, p)} are uniformly distributed over the
subsets of {1, . . . , N } of cardinality p that intersect {N − k + 1, . . . , N }, and
where r is a Poisson r.v. The expected value of r is the mean number of terms
in the Hamiltonian −HN that are not included in the summation (6.70), so
that
E r = αN ( 1 − \binom{N−k}{p} / \binom{N}{p} ) = (γN/p) ( 1 − (N − k) · · · (N − k − p + 1) / (N · · · (N − p + 1)) ) . (6.74)
The quantity r will keep this meaning until the end of the proof of The-
orem 6.4.1, and the quantity E will keep the meaning of (6.73). It is good to
note that, since N ≥ 2kp, for ℓ ≤ p we have

(N − k − ℓ)/(N − ℓ) = 1 − k/(N − ℓ) ≥ 1 − 2k/N .
Therefore

(N − k) · · · (N − k − p + 1) / (N · · · (N − p + 1)) ≥ (1 − 2k/N)^p ≥ 1 − 2kp/N , (6.75)
and thus
Er ≤ 2kγ . (6.76)
We observe the identity
L(⟨σN−k+1⟩, . . . , ⟨σN⟩) = L( ⟨Av σN−k+1 E⟩−/⟨Av E⟩− , . . . , ⟨Av σN E⟩−/⟨Av E⟩− ) . (6.77)
The task is now to use the induction hypothesis to approximate the right-
hand side of (6.77); this will yield the desired induction relation. There are
three sources of randomness on the right-hand side of (6.77). There is the ran-
domness associated with the (N − k)-spin system of Hamiltonian (6.70); the
randomness associated to r and the sets {i(j, 1), . . . , i(j, p)}; and the random-
ness associated to the functions θs , s ≤ r. These three sources of randomness
are independent of each other.
6.4 The Replica-Symmetric Solution 349
Ω2 = { ∃ j, j′ ≤ r, j ≠ j′ , ∃ ℓ, ℓ′ ≤ p − 1 , i(j, ℓ) = i(j′, ℓ′) } .
P(Ω) ≤ (4k²/N) (γp + γ²p²) . (6.79)
We recall that, as defined page 341, given a sequence x = (x1, . . . , xN−k)
with |xi| ≤ 1 and a function f on ΣN−k, we denote by ⟨f⟩x the average of f
with respect to the product measure λx on ΣN−k such that ∫ σi dλx(ρ) = xi
for 1 ≤ i ≤ N − k.
We now start a sequence of lemmas that aim at deducing from (6.77) the
desired induction relations among the quantities D(N, k, γ0 ). There will be
four steps in the proof. In the first step below, in each of the brackets in the
right-hand side of (6.77) we replace the Gibbs measure ⟨·⟩− by ⟨·⟩Y where
Y = (⟨σ1⟩−, . . . , ⟨σN−k⟩−). The basic reason why this creates only a small
error is that C(N, γ0 , K0 , K0 ) holds true for each N , a property which is used
as in Proposition 6.2.7.
Y = (⟨σ1⟩−, . . . , ⟨σN−k⟩−) .
Set

uℓ = ⟨σN−k+ℓ⟩ = ⟨Av σN−k+ℓ E⟩− / ⟨Av E⟩− ;  vℓ = ⟨Av σN−k+ℓ E⟩Y / ⟨Av E⟩Y .
Then we have

d( L(u1, . . . , uk), L(v1, . . . , vk) ) ≤ K(p, γ0) (k³/N) E exp 2S . (6.80)
Proof. From now on E− denotes expectation in the randomness of the (N − k)-spin
system only. When Ω does not occur, there is nothing to change in the
proof of Proposition 6.2.7 to obtain that

E− |uℓ − vℓ| ≤ (8r(p − 1)² K0 / (N − k)) exp( 2 Σ_{j≤r} Sj ) ,
E− |uℓ − vℓ| ≤ (8r(p − 1)² K0 / (N − k)) exp( 2 Σ_{j≤r} Sj ) + 2·1_Ω . (6.81)
E |uℓ − vℓ| ≤ (8(p − 1)² K0 / (N − k)) (E exp 2S) E r² + 2P(Ω)
 ≤ (k² K(p, γ0)/N) E exp 2S ,

using (6.79), that N − k ≥ N/2 and that E r² = E r + (E r)² ≤ 2γk + 4γ²k².
Since the left-hand side of (6.80) is bounded by Σ_{ℓ≤k} E|uℓ − vℓ|, the result
follows.
In the second step, we replace the sequence Y by an appropriate i.i.d.
sequence of law μγ− . The basic reason this creates only a small error is the
“induction hypothesis” i.e. the control of the quantities D(N − k, m, γ0 ).
Proposition 6.4.4. Consider an independent sequence X = (X1 , . . . , XN −k )
where each Xi has law μ− := μγ− . We set
wℓ = ⟨Av σN−k+ℓ E⟩X / ⟨Av E⟩X , (6.82)

and we recall the quantities vℓ of the previous lemma. Then we have

d( L(v1, . . . , vk), L(w1, . . . , wk) ) ≤ K(p, γ0) k³/N
 + 4(E S exp 2S) E D(N − k, r(p − 1), γ0) , (6.83)

where the last expectation is taken with respect to the Poisson r.v. r.
The proof will rely on the following lemma.
Lemma 6.4.5. Assume that Ω does not occur. Consider ℓ ≤ k and

Eℓ = exp Σ_j θj(σ_{i(j,1)}, . . . , σ_{i(j,p−1)}, σN−k+ℓ) , (6.84)

Then

∂/∂xi ( ⟨Av σN−k+ℓ E⟩x / ⟨Av E⟩x ) = 0 (6.86)

unless i ∈ Ij for some j with i(j, p) = N − k + ℓ. In that case we have moreover

| ∂/∂xi ( ⟨Av σN−k+ℓ E⟩x / ⟨Av E⟩x ) | ≤ 4Sj exp 2Sj . (6.87)
Av σN −k+ E x = Av σN −k+ E x Av E x
and similarly with (6.89), so that (6.85) follows, of which (6.86) is an obvious
consequence. As for (6.87), it is proved exactly as in Lemma 6.3.3.
Proof of Proposition 6.4.4. The strategy is to construct a specific realization
of X for which the quantity E Σ_{ℓ≤k} |vℓ − wℓ| is small. Consider the set
J = ∪_{j≤r} Ij (so that card J ≤ (p − 1)r). The construction takes place given
the set J. By definition of D(N − k, r(p − 1), γ0), given J we can construct
an i.i.d. sequence (Xi)_{i≤N−k} distributed like μ− that satisfies

E− Σ_{i∈J} |Xi − ⟨σi⟩−| ≤ 2D(N − k, r(p − 1), γ0) . (6.90)

We can moreover assume that the sequence (θj)_{j≥1} is independent of the
randomness generated by J and the variables Xi. The sequence (Xi)_{i≤N−k}
is our specific realization. It is i.i.d. distributed like μ−.
It follows from Lemma 6.4.5 that if Ω does not occur,

|wℓ − vℓ| = | ⟨Av σN−k+ℓ E⟩X/⟨Av E⟩X − ⟨Av σN−k+ℓ E⟩Y/⟨Av E⟩Y |
 ≤ Σ_{j≤r} Σ_{i∈Ij} |Xi − ⟨σi⟩−| 2Sj exp 2Sj ,
Taking expectation E− and using (6.90) implies that when Ω does not occur,

E_θ E− Σ_{ℓ≤k} |wℓ − vℓ| ≤ 4(E S exp 2S) D(N − k, r(p − 1), γ0) ,

i.e.

1_{Ω^c} E_θ E− Σ_{ℓ≤k} |wℓ − vℓ| ≤ 4(E S exp 2S) D(N − k, r(p − 1), γ0) . (6.91)
d( L(w1, . . . , wk), T(μ−) ⊗ · · · ⊗ T(μ−) ) ≤ K(p, γ0) k²/N . (6.92)
Proof. Let us define, for ℓ ≤ k,

r(ℓ) = card{ j ≤ r ; i(j, p − 1) ≤ N − k , i(j, p) = N − k + ℓ } , (6.93)

so that when Ω does not occur, r(ℓ) is the number of terms in the summation
of (6.84), and moreover for different values of ℓ, the sets of indices occurring
in (6.84) are disjoint. The sequence (r(ℓ))_{ℓ≤k} is an i.i.d. sequence of Poisson
r.v.s (and their common mean will soon be calculated).
Since the sequence (r(ℓ))_{ℓ≤k} is an i.i.d. sequence of Poisson r.v.s, the sequence
(wℓ)_{ℓ≤k} is i.i.d. It has almost law T(μ−), but not exactly, because the Poisson
r.v.s r(ℓ) do not have the correct mean. This mean γ̃ = E r(ℓ) is given by

γ̃ = (γN/p) \binom{N−k}{p−1} / \binom{N}{p} = γ (N − k) · · · (N − k − p + 2) / ((N − 1) · · · (N − p + 1)) ≤ γ .

The sequence (w′ℓ)_{ℓ≤k} is i.i.d. and the law of w′ℓ is T(μ−). Thus (6.95) implies:

and

P( s(ℓ) ≠ r(ℓ) ) = P( r′(ℓ) ≠ 0 ) ≤ γ − γ̃ .

Moreover from (6.75) we see that γ − γ̃ ≤ 2γkp/N. The result follows.
The next lemma is the last step. It quantifies the fact that T (μ− ) is nearly
μ.
Lemma 6.4.7. We have
d( T(μ−)^{⊗k}, μ^{⊗k} ) ≤ 4γk²p/N . (6.96)
Proof. The left-hand side is bounded by

k d(T(μ−), μ) = k d(T(μ−), T(μ)) ≤ (k/2) d(μ, μ−) ≤ 2k(γ − γ−) ,

using Lemma 6.3.4. The result follows since by (6.75) we have γ − γ− ≤
2kpγ/N.
Proof of Theorem 6.4.1. We set B = 4ES exp 2S. Using the triangle in-
equality for the transportation-cost distance and the previous estimates, we
have shown that for a suitable value of K2 (γ0 , p) we have (recalling the defi-
nition (6.63) of A(0)),
d( L(⟨σN−k+1⟩, . . . , ⟨σN⟩), μ^{⊗k} ) ≤ k³A(0)/N + B E D(N − k, r(p − 1), γ0) .
(6.97)
Given an integer n we say that property C ∗ (N, γ0 , n) holds if
∀ N′ with p ≤ N′ ≤ N , ∀ k ≤ N′ , D(N′, k, γ0) ≤ 2^{1−n} k + k³ A(n)/N′ . (6.98)
Since D(N′, k, γ0) ≤ 2k, C∗(N, γ0, 0) holds for each N. And since A(n) ≥
A(0), C∗(p, γ0, n) holds as soon as K2(γ0, p) ≥ 2p, since then D(p, k, γ0) ≤
2k ≤ k³A(0)/p ≤ k³A(n)/p. We will prove that
C ∗ (N − 1, γ0 , n) ⇒ C ∗ (N, γ0 , n + 1) , (6.99)
thereby proving that C ∗ (N, γ0 , n) holds for each N and n, which is the content
of the theorem.
To prove (6.99), we assume that C∗(N − 1, γ0, n) holds and we consider
k ≤ N/2. It follows from (6.98), used for N′ = N − k ≤ N − 1 and with
r(p − 1) instead of k, that since k ≤ N/2 we have

d( L(⟨σN−k+1⟩, . . . , ⟨σN⟩), μ^{⊗k} ) ≤ 2^{−n} k + (k³/N) ( A(0) + 40p³(γ + γ³) B A(n) ) ,
and since this holds for each γ ≤ γ0 , the definition of D(N, k, γ0 ) shows that
D(N, k, γ0) ≤ 2^{−n} k + (k³/N) ( A(0) + 40p³(γ0 + γ0³) B A(n) ) = 2^{−n} k + k³A(n + 1)/N .
(6.103)
We have assumed k ≤ N/2, but since D(N, k, γ0 ) ≤ 2k and A(n + 1) ≥ A(0),
(6.103) holds for k ≥ N/2 provided K2 (γ0 , p) ≥ 8. This proves C ∗ (N, γ0 , n+1)
and concludes the proof.
We now turn to the computation of
pN(γ) = (1/N) E log Σ_σ exp(−HN(σ)) . (6.104)
We will only consider the situation where (6.66) holds, leaving it to the reader
to investigate what kind of rates of convergence she can obtain when assum-
ing only (6.62). We consider i.i.d. copies (θj )j≥1 of the r.v. θ, that are inde-
pendent of θ, and we recall the notation (6.49). Consider an i.i.d. sequence
X = (Xi )i≥1 , where Xi is of law μγ (given by Theorem 6.3.1). Recalling the
definition (6.49) of Er we define
p(γ) = log 2 − (γ(p − 1)/p) E log⟨exp θ(σ1, . . . , σp)⟩X + E log Av⟨Er⟩X . (6.105)
As we shall see later, the factor log N above is parasitic and can be removed.
( pN(γ + δ) − pN(γ) ) / δ = (1/(Nδ)) E log⟨exp(−H^δ_N(σ))⟩ . (6.111)
When u = 0, we have H^δ_N ≡ 0 so that log⟨exp(−H^δ_N(σ))⟩ = 0. For very
small δ, the probability that u = 1 is at first order in δ equal to Nδ/p.
The contribution of this case to the right-hand side of (6.111) is, by symmetry
among sites,

(1/p) E log⟨exp θ1(σ_{i(1,1)}, . . . , σ_{i(1,p)})⟩ = (1/p) E log⟨exp θ(σ1, . . . , σp)⟩ .

The contribution of the case u > 1 is of second order in δ, so that taking the
limit in (6.111) as δ → 0 yields (6.109).
Lemma 6.4.12. Recalling that X = (Xi )i≥1 where Xi are i.i.d. of law μγ
we have
| p′N(γ) − (1/p) E log⟨exp θ(σ1, . . . , σp)⟩X | ≤ K/N . (6.112)
Proof. From Lemma 6.4.11 we see that it suffices to prove that
| E log⟨exp θ(σ1, . . . , σp)⟩ − E log⟨exp θ(σ1, . . . , σp)⟩X | ≤ K/N . (6.113)
Let us denote by E0 expectation in the randomness of ⟨·⟩ (but not in θ), and
let S = sup |θ|. It follows from Theorem 6.2.2 (used as in Proposition 6.2.7)
that

E0 | ⟨exp θ(σ1, . . . , σp)⟩ − ⟨exp θ(σ^1_1, . . . , σ^p_p)⟩ | ≤ (K/N) exp S .
Here and below, the number K depends only on p and γ0 , but not on S or
N . Now
exp θ(σ11 , . . . , σpp ) = exp θ(σ1 , . . . , σp ) Y ,
where Y = (⟨σ1⟩, . . . , ⟨σp⟩). Next, since

| ∂/∂xi ⟨exp θ(σ1, . . . , σp)⟩x | ≤ exp S ,

we get

E0 | log⟨exp θ(σ1, . . . , σp)⟩ − log⟨exp θ(σ1, . . . , σp)⟩X | ≤ (K/N) exp 2S ,

and (6.113) follows by taking expectation in the randomness of θ.
Proof of Lemma 6.4.10. We observe that

pN−1(γ) − pN−1(γ−) = ∫_{γ−}^{γ} p′N−1(t) dt .
where (θj )j≥1 are independent distributed like θ, where r is a Poisson r.v.
of expectation γ and where the sets {i(j, 1), . . . , i(j, p − 1)} are uniformly
distributed over the subsets of {1, . . . , N − 1} of cardinality p − 1. All these
randomnesses, as well as the randomness of HN −1 are globally independent.
Thus the identity

E log Σ_σ exp(−HN(σ)) = E log Σ_ρ exp(−HN−1(ρ)) + log 2 + E log⟨Av E⟩− (6.114)

holds, where

E = E(ρ, ε) = exp Σ_{j≤r} θj(σ_{i(j,1)}, . . . , σ_{i(j,p−1)}, ε) .
The term log 2 occurs from the identity a(1) + a(−1) = 2Av a(ε). Moreover
(6.114) implies the equality
∀n ≥ 1, E(−b)n ≥ 0 , (6.118)
and
either f ≥ 0 or p is even. (6.120)
Let us consider two examples where these conditions are satisfied. First,
let
θ(σ1 , . . . , σp ) = βJσ1 · · · σp ,
where J is a symmetric r.v. Then (6.117) holds for a = ch(βJ), b = th(βJ),
f (σ) = σ, (6.118) holds by symmetry and (6.120) holds when p is even.
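The decomposition in this first example rests on the elementary identity exp(βJs) = ch(βJ)(1 + th(βJ)s) for s = ±1, which is easy to verify numerically (a sketch of ours):

```python
import math

def exp_theta(beta_j, s):
    """exp(theta) with theta = beta*J*sigma_1*...*sigma_p collapsed to the
    single +/-1 product s (beta_j stands for beta*J with J fixed)."""
    return math.exp(beta_j * s)

def factored_form(beta_j, s):
    """The form (6.117): a * (1 + b*s) with a = ch(beta J), b = th(beta J)."""
    return math.cosh(beta_j) * (1 + math.tanh(beta_j) * s)

bj = 0.7
pairs = [(exp_theta(bj, s), factored_form(bj, s)) for s in (+1, -1)]
```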
Second, let

θ(σ1, . . . , σp) = −β Π_{j≤p} (1 + ηj σj)/2 ,
∀γ, ∀μ, pN(γ) = (1/N) E log Σ_σ exp(−HN(σ)) ≤ p(γ, μ) + Kγ/N , (6.121)
and let us consider independent copies (Ui,s (1), Ui,s (−1))i,s≥1 of the pair
(U (1), U (−1)).
6.5 The Franz-Leone Bound 361
ϕ′(t) ≤ −(γ(p − 1)/p) E log⟨exp θ(σ1, . . . , σp)⟩X + Kγ/N . (6.123)
This is of course the key fact.
Proof of Theorem 6.5.1. We deduce from Proposition 6.5.3 that

pN(γ) = ϕ(1) ≤ ϕ(0) − (γ(p − 1)/p) E log⟨exp θ(σ1, . . . , σp)⟩X + Kγ/N ,

and

ϕ(0) = log 2 + E log Av⟨Er⟩X ,
Here, as in the rest of the section, we denote by · an average for the Gibbs
measure with Hamiltonian (6.122), keeping the dependence on t implicit. On
the other hand, the number K in (6.124) is of course independent of t.
Proof. In ϕ′(t) there are terms coming from the dependence on t of Mt and
terms coming from the dependence on t of r_{i,t}.
As shown by Lemma 6.4.11, the term created by the dependence of Mt
on t is

(γ/p) E log⟨exp θ(σ1, . . . , σp)⟩ ≤ (γ/(pN^p)) Σ^N_{i1,...,ip=1} E log⟨exp θ(σ_{i1}, . . . , σ_{ip})⟩ + γK/N ,
because all the terms where the indices i1 , . . . , ip are distinct are equal. The
same argument as in Lemma 6.4.11 shows that the term created by the de-
pendence of ri,t on t is −(γ/N )E log exp U (σi ) .
Thus, we have reduced the proof of Proposition 6.5.3 (hence, of Theo-
rem 6.5.1) to the following:
Lemma 6.5.5. We have

(1/N^p) Σ^N_{i1,...,ip=1} E log⟨exp θ(σ_{i1}, . . . , σ_{ip})⟩ − (p/N) Σ_{i≤N} E log⟨exp U(σi)⟩
 + (p − 1) E log⟨exp θ(σ1, . . . , σp)⟩X ≤ 0 . (6.125)
The proof is not really difficult, but it must have been quite another matter
when Franz and Leone discovered it.
Proof. We will get rid of the annoying logarithms by the power expansion

log(1 + x) = −Σ_{n≥1} (−1)^n x^n / n
for |x| < 1. Let us denote by E0 the expectation in the randomness of X and
of the functions fj of (6.117) only. Let us define

Cn = E0 ⟨f1(σ1)⟩X^n (6.126)

A_{j,n} = A_{j,n}(σ^1, . . . , σ^n) = (1/N) Σ_{i≤N} Π_{ℓ≤n} fj(σ^ℓ_i) (6.127)
Bn = Bn(σ^1, . . . , σ^n) = E0 A_{j,n} . (6.128)
so that, taking the average ⟨·⟩X and the logarithm, and using (6.119) to allow
the power expansion in the second line,

log⟨exp θ(σ1, . . . , σp)⟩X = log a + log( 1 + b ⟨Π_{j≤p} fj(σj)⟩X )
 = log a − Σ^∞_{n=1} ((−b)^n/n) ⟨Π_{j≤p} fj(σj)⟩X^n . (6.131)
Now, by independence,

E0 ⟨Π_{j≤p} fj(σj)⟩X^n = Π_{j≤p} E0 ⟨fj(σj)⟩X^n = Cn^p ,

so that

E0 log⟨exp θ(σ1, . . . , σp)⟩X = E0 log a − Σ^∞_{n=1} ((−b)^n/n) Cn^p .
As in (6.131),

(1/N^p) Σ^N_{i1,...,ip=1} log⟨exp θ(σ_{i1}, . . . , σ_{ip})⟩
 = log a − Σ^∞_{n=1} ((−b)^n/n) (1/N^p) Σ^N_{i1,...,ip=1} ⟨Π_{j≤p} fj(σ_{ij})⟩^n .
(1/N^p) Σ^N_{i1,...,ip=1} ⟨Π_{j≤p} fj(σ_{ij})⟩^n = (1/N^p) Σ^N_{i1,...,ip=1} ⟨Π_{ℓ≤n} Π_{j≤p} fj(σ^ℓ_{ij})⟩
 = ⟨Π_{j≤p} A_{j,n}⟩ .
Now from (6.128) and independence we get E0 Π_{j≤p} A_{j,n} = Bn^p, so that

E0 (1/N^p) Σ^N_{i1,...,ip=1} log⟨exp θ(σ_{i1}, . . . , σ_{ip})⟩ = E0 log a − Σ^∞_{n=1} ((−b)^n/n) ⟨Bn^p⟩ .
In this section we consider the situation of the Hamiltonian (6.4) when the
spins are real numbers. There are two motivations for this. First, the “main
parameter” of the system is no longer “a function” but rather “a random
function”. This is both a completely natural and fun situation. Second, this
will let us demonstrate in the next section the power of the convexity tools
we developed in Chapters 3 and 4. We consider a (Borel) function θ on Rp ,
i.i.d. copies (θk )k≥1 of θ, and for σ ∈ RN the quantity HN (σ) given by (6.4).
We consider a given probability measure η on R, and we lighten notation by
writing ηN for η ⊗N , the corresponding product measure on RN . The Gibbs
measure is now defined as the random probability measure on RN which has
a density with respect to ηN that is proportional to exp(−HN (σ)). Let us fix
an integer k and, for large N , let us try to guess the law of (σ1 , . . . , σk ) under
Gibbs’ measure. This is a random probability measure on Rk . We expect that
it has a density Yk,N with respect to ηk = η ⊗k . What is the simplest possible
structure? It would be nice if we had
\[
Y_{k,N}(\sigma_1,\ldots,\sigma_k) \simeq \prod_{\ell\le k} X_\ell(\sigma_\ell)
\]
for suitable random elements X_1, …, X_k of the space D of probability densities with respect to η.
The nicest possible probabilistic structure would be that these random ele-
ments X1 , . . . , Xk be i.i.d, with a common law μ, a probability measure on the
metric space D. This law μ is the central object, the “main parameter”. (If
we wish, we can equivalently think of μ as the law of a random element of D.)
The case of Ising spins is simply the situation where η({1}) = η({−1}) = 1/2. Throughout we assume a boundedness condition
\[
\sup_{\sigma_1,\ldots,\sigma_p}|\theta(\sigma_1,\ldots,\sigma_p)| \le S
\]
for a certain r.v. S. Of course (S_k)_{k≥1} then denotes the sequence of i.i.d. copies of S, with S_k = sup |θ_k(σ_1,…,σ_p)|. Whether or how this boundedness condition can be weakened remains to be investigated. Overall, once one gets used to the higher
level of abstraction necessary compared with the case of Ising spins, the proofs
are really not more difficult in the continuous case. In the present section we
will control the model under a high-temperature condition and the extension
of the methods of the previous sections to this setting is really an exercise. The real point of this exercise is that in the next section we will succeed in partly controlling the model without assuming a high-temperature condition, assuming instead the concavity of θ, a result very much in the spirit of Section 3.1.
Our first task is to construct the “order parameter” μ = μγ . We keep the
notation (6.49), that is we write
\[
E_r = E_r(\sigma,\varepsilon) = \exp\sum_{1\le j\le r}\theta_j(\sigma_{(j-1)(p-1)+1},\ldots,\sigma_{j(p-1)},\varepsilon)\,,
\]
that is, we integrate the generic k-th coordinate with respect to η after making
the change of density Xk .
For consistency with the notation of the previous section, for a function
h(ε) we write
\[
\mathrm{Av}\,h = \int h(\varepsilon)\,d\eta(\varepsilon)\,. \qquad (6.136)
\]
Thus
\[
\mathrm{Av}\,E_r = \int E_r(\sigma,\varepsilon)\,d\eta(\varepsilon)\,,
\]
and we consider the random element of D given by
\[
\frac{\langle E_r\rangle_X}{\langle \mathrm{Av}\,E_r\rangle_X}\,. \qquad (6.137)
\]
Theorem 6.6.1. Assuming (6.52), i.e. 4γp(ES exp 2S) ≤ 1, there exists a
unique probability measure μ on D such that μ = T (μ).
Once this estimate has been obtained we proceed exactly as in the proof of Theorem 6.3.1. Namely, if μ and μ′ are the laws of X and Y respectively, and since the law of the quantity (6.137) is T(μ), the expected value of the left-hand side of (6.139) is an upper bound for the transportation-cost distance d(T(μ), T(μ′)) associated to the distance d of (6.138) (by the very definition of the transportation-cost distance). Thus taking expectation in (6.139) implies that
Since this is true for any choice of X and Y with laws μ and μ′ respectively, we obtain that
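The contraction argument above uses only that the expected cost of *some* coupling of X and Y upper-bounds the transportation-cost distance of their laws. For laws on the real line this is easy to see numerically; the sketch below is illustrative only (the actual space D and the distance (6.138) are richer than this one-dimensional toy):

```python
import random

def w1_sorted(xs, ys):
    """Empirical transportation cost (W1) between equal-size samples on R:
    on the line the optimal coupling matches sorted order."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def w1_coupling(pairs):
    """Cost of an arbitrary coupling (X_i, Y_i): an upper bound on W1."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(1000)]
ys = [random.gauss(0.5, 1) for _ in range(1000)]
# Any coupling dominates the optimal (sorted) one:
assert w1_sorted(xs, ys) <= w1_coupling(list(zip(xs, ys))) + 1e-12
```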
Now we observe that to bound both terms by S_j exp(2S_j)‖X_i − Y_i‖₁ it suffices to prove that
368 6. The Diluted SK Model and the K-Sat Problem
(where both sides are functions of ε). Indeed, to bound the term I using (6.143) we observe that
\[
\Big\|\frac{\langle E_r\rangle_i}{\langle \mathrm{Av}\,E_r\rangle_i}\Big\|_1 = \mathrm{Av}\,\frac{\langle E_r\rangle_i}{\langle \mathrm{Av}\,E_r\rangle_i} = 1 \qquad (6.144)
\]
and
\[
A := \langle E\rangle_i = \langle E\rangle_{i-1}\,.
\]
and therefore
Consequently,
Finally
since exp(−S_j) ≤ E we have exp(−S_j) ≤ B, so that exp(−S_j) ≤ ∫ B X_i(σ_i) dη(σ_i). The first part of (6.145) then implies that A exp(−S_j) ≤ ⟨E_r⟩_i, and combining with (6.147) this finishes the proof of (6.143) and of Lemma 6.6.2.
A suitable extension of Theorem 6.2.2 will be crucial to the study of the
present model. As in the case of Theorem 6.6.1, once we have found the
proper setting, the proof is not any harder than in the case of Ising spins.
Let us consider a probability space (X , λ), an integer n, and a family
(fω )ω∈X of functions on (RN )n . We assume that there exists i ≤ N such that
for each ω we have
fω ◦ Ti = −fω , (6.148)
where Ti is defined as in Section 6.2 i.e. Ti exchanges the ith components σi1
and σi2 of the first two replicas and leaves all the other components unchanged.
Consider another function f ≥ 0 on (RN )n . We assume that f and the
functions fω depend on k coordinates (of course what we mean here is that
they depend on the same k coordinates whatever the choice of ω). We assume
that for a certain number Q, we have
Theorem 6.6.3. Under (6.10), and provided that γ ≤ γ₀, with the previous notation we have
\[
\mathrm{E}\,\frac{\int |\langle f_\omega\rangle|\,d\lambda(\omega)}{\langle f\rangle} \le \frac{K_0\,k\,Q}{N}\,. \qquad (6.150)
\]
Proof. The fundamental identity (6.20) reads
\[
\langle f\rangle = \frac{\langle \mathrm{Av}\,f\,E\rangle_-}{\langle \mathrm{Av}\,E\rangle_-}\,.
\]
We replace (6.34) by
\[
\int |\langle f_{\omega,s}\rangle|\,d\lambda(\omega) \le 4\,Q\,S_v\exp\Big(4\sum_{u\le r} S_u\Big)\,\langle \mathrm{Av}\,f\,E\rangle_-\,,
\]
\[
\mathrm{E}\,\Big\| Y_{N,k} - \prod_{\ell\le k} X_\ell\Big\|_1 \le \frac{k^3\,K(p,\gamma_0)}{N}\,\mathrm{E}\exp 2S\,, \qquad (6.152)
\]
where Y_{N,k} denotes the density with respect to η_k = η^{⊗k} of the law of (σ_1,…,σ_k) under Gibbs' measure, μ is as in Theorem 6.6.1, and K(p,γ₀) depends only on p and γ₀.
It is convenient to denote by \(\prod_{\ell\le k} X_\ell\) the function
\[
(\sigma_1,\ldots,\sigma_k)\mapsto \prod_{\ell\le k} X_\ell(\sigma_\ell)\,,
\]
so that the left-hand side of (6.152) is simply \(\mathrm{E}\,\|Y_{N,k} - \prod_{\ell\le k} X_\ell\|_1\).
Overall the principle of the proof is very similar to that of the proof of
Theorem 6.4.1, but the induction hypothesis will not be based on (6.152).
The starting point of the proof is the fundamental cavity formula (6.72),
where Av now means that σN −k+1 , . . . , σN are averaged independently with
respect to η. When f is a function of k variables, this formula implies that
\[
\langle f(\sigma_{N-k+1},\ldots,\sigma_N)\rangle = \frac{\langle \mathrm{Av}\,f(\sigma_{N-k+1},\ldots,\sigma_N)\,E\rangle_-}{\langle \mathrm{Av}\,E\rangle_-}
= \mathrm{Av}\Big( f(\sigma_{N-k+1},\ldots,\sigma_N)\,\frac{\langle E\rangle_-}{\langle \mathrm{Av}\,E\rangle_-}\Big)\,. \qquad (6.153)
\]
In particular the density with respect to η_k of the law of (σ_{N−k+1},…,σ_N) under Gibbs' measure is the function
\[
\frac{\langle E\rangle_-}{\langle \mathrm{Av}\,E\rangle_-}\,. \qquad (6.154)
\]
Before deciding how to start the proof of Theorem 6.6.4, we will first take
full advantage of Theorem 6.6.3. For a function f on R^{N−k} we denote
\[
f^\bullet = \big\langle f(\sigma_1^1,\sigma_2^2,\ldots,\sigma_{N-k}^{N-k})\big\rangle_-\,,
\]
that is, we average every coordinate in a different replica. We recall the set Ω of (6.79).
\[
j \ne j' \ \Rightarrow\ I_j \cap I_{j'} \subset \{N-k+1,\ldots,N\}\,,
\]
or, equivalently, that the sets I_j ∩ {1,…,N−k} for j ≤ r are all disjoint. Con-
sider functions Wj (σ), depending only on the coordinates in Ij , and assume
that supσ |Wj (σ)| ≤ Sj . Consider
\[
E = \exp\sum_{j\le r} W_j(\sigma)\,.
\]
Then we have
\[
\mathrm{E}\,\mathrm{Av}\,\Big|\frac{\langle E\rangle_-}{\langle \mathrm{Av}\,E\rangle_-} - \frac{E^\bullet}{\mathrm{Av}\,E^\bullet}\Big| \le \frac{4K_0\,k(p-1)}{N-k}\,\exp\Big(2\sum_{j\le r} S_j\Big)\,. \qquad (6.156)
\]
Exchanging the variables σ_i^i and σ_i^1 exchanges E_i and E_{i−1} and changes the sign of the function f = E_i Av(E_i − E_{i−1}). Next we prove the inequality
To prove this we observe that E is of the form AB where A does not depend on
the ith coordinate and exp(−Sj ) ≤ B ≤ exp Sj . Thus with obvious notation
|Bi − Bi−1 | ≤ 2 exp Sj ≤ 2 exp 2Sj Bi−1 and since A does not depend on the
ith coordinate we have Ai = Ai−1 and thus
Therefore
|Av(Ei − Ei−1 )| ≤ (2 exp 2Sj )AvEi−1
and
Av|f | ≤ (2 exp 2Sj )AvEi AvEi−1 . (6.158)
Thinking of Av in the left-hand side as averaging over the parameter ω = (σ_i^ℓ)_{N−k<i≤N, ℓ≤n+1}, we see that (6.158) is (6.149) with Q = 2 exp 2S_j and f = AvE_i AvE_{i−1}. Applying (6.150) to the (N−k)-spin system we then obtain
\[
\mathrm{II} \le (2\exp 2S_j)\,\frac{K_0\,k}{N-k}\,.
\]
Proceeding similarly we get the same bound for the term I (in a somewhat
simpler manner) and this completes the proof of (6.156).
Proof of Proposition 6.6.5. We take expected values in (6.156), and we
remember as in the Ising case (i.e. when σi = ±1) that it suffices to consider
the case N ≥ 2k.
It will be useful to introduce the following random elements V1 , . . . , Vk of
D. (These depend also on N, but the dependence is kept implicit.) The function V_ℓ is the density with respect to η of the law of σ_{N−k+ℓ} under Gibbs' measure. Let us denote by Y_k^* the function (6.154) of σ_{N−k+1},…,σ_N, which, as already noted, is the density with respect to η_k of the law of (σ_{N−k+1},…,σ_N) under Gibbs' measure. Thus V_ℓ is the ℓ-th marginal of Y_k^*, that is, it is obtained by averaging Y_k^* over all σ_{N−k+j} for j ≠ ℓ with respect to η.
Therefore
\[
\frac{E^\bullet}{\mathrm{Av}\,E^\bullet} = \prod_{\ell\le k} U_\ell\,, \qquad (6.162)
\]
where
\[
U_\ell = \frac{E_\ell^\bullet}{\mathrm{Av}\,E_\ell^\bullet}\,.
\]
Let us think of U_ℓ as a function of σ_{N−k+ℓ} only, so we can write for consistency of notation \(\prod_{\ell\le k} U_\ell\) for the function (σ_{N−k+1},…,σ_N) ↦ \(\prod_{\ell\le k} U_\ell(\sigma_{N-k+\ell})\). Thus (6.161) means
\[
\mathrm{E}\,\mathbf{1}_{\Omega^c}\Big\| Y_k^* - \prod_{\ell\le k} U_\ell\Big\|_1 \le \frac{Kk^2}{N}\,\mathrm{E}\exp 2S\,.
\]
Now \(\|Y_k^* - \prod_{\ell\le k} U_\ell\|_1 \le 2\), and combining with (6.79) we get
\[
\mathrm{E}\,\Big\| Y_k^* - \prod_{\ell\le k} U_\ell\Big\|_1 \le \frac{Kk^2}{N}\,\mathrm{E}\exp 2S\,. \qquad (6.163)
\]
then (6.159) shows that to prove Theorem 6.6.4 the following estimate suffices:
\[
D(N,\gamma_0,k) \le \frac{k^3 K}{N}\,\mathrm{E}\exp 2S\,.
\]
For this we relate the N -spin system with the (N − k)-spin system. For this
purpose, the crucial equation is (6.162). The sequence V1 , . . . , Vk is distributed
as (Y1 , . . . , Yk ). Moreover, if for i ≤ N − k we denote by Yi− the density with
respect to η of the law of σi under the Gibbs measure of the (N − k)-spin
system, we have, recalling the notation (6.135)
\[
E_\ell^\bullet = \langle E_\ell\rangle_{Y^-}\,,
\]
We can then complete the proof of Theorem 6.6.8 along the same lines as in
Theorem 6.4.1. The functions (E )≤k do not depend on too many spins. We
can use the induction hypothesis and Lemma 6.6.2 to show that we can find a
sequence X = (X1 , . . . , XN −k+1 ) of identically distributed random elements
of D, of law μ− (= μγ− , where γ− is given by (6.74)), so that
\[
\sum_{\ell\le k}\mathrm{E}\,\mathbf{1}_{\Omega^c}\Big\| V_\ell - \frac{\langle E_\ell\rangle_X}{\mathrm{Av}\,\langle E_\ell\rangle_X}\Big\|_1
\]
is not too large. Then the sequence (⟨E_ℓ⟩_X / Av⟨E_ℓ⟩_X)_{ℓ≤k} is nearly i.i.d. with law T(μ^−), and hence nearly i.i.d. with law μ. Completing the argument really amounts to copying the proof of Theorem 6.4.1, so this is best left as an easy exercise for the motivated reader. There is nothing else to change in the proof of Theorem 6.4.13 either.
We end this section by a challenging technical question. The relevance
of this question might not yet be obvious to the reader, but it will become
clearer in Chapter 8, after we learn how to approach the “spherical model”
through the “Gaussian model”. Let us consider the sphere
\[
S_N = \{\sigma\in\mathbb{R}^N\,;\ \|\sigma\| = \sqrt{N}\}\,. \qquad (6.167)
\]
The situation here is that, even though each of the individual functions
t → θ(tσi(k,1) , . . . , tσi(k,p) ) can be wildly discontinuous, these discontinuities
should be smoothed out by the integration with respect to λ_N. Even the case where θ is not random and p = 1 does not seem obvious.
σ ∈ Uk ⇔ (σi(k,1) , . . . , σi(k,p) ) ∈ Vk .
V = {θ = 0} .
Lemma 6.7.4. The density Y with respect to η of the law of σ1 under Gibbs’
measure satisfies
This lemma is purely deterministic, and is true for any realization of the
disorder. It is good however to observe right away that r is a Poisson r.v.
with Er = γ, where as usual γ = αp and EM = αN .
Proof. Since the density of Gibbs’ measure with respect to η ⊗N is propor-
tional to exp(−HN (σ)), the function Y (σ1 ) is proportional to
satisfies
\[
\mathrm{E}\,\langle (R - \mathrm{E}\langle R\rangle)^2\rangle \le \frac{K}{\sqrt{N}}\,, \qquad (6.181)
\]
where K depends only on κ, n, D and on the quantity (6.178).
The power of this statement might not be intuitive, but soon we will
show that it has remarkable consequences. Throughout the proof, K denotes
a number depending only on κ, n, A, D and on the quantity (6.178).
Lemma 6.7.6. The conditions of Proposition 6.7.5 imply:
\[
\mathrm{E}\,\langle (R - \langle R\rangle)^2\rangle \le \frac{K}{\sqrt{N}}\,. \qquad (6.182)
\]
Proof. The Gibbs measure on R^{Nn} has a density proportional to
\[
\exp\Big(-\sum_{\ell\le n} H_N(\sigma^\ell) - \kappa\sum_{\ell\le n}\|\sigma^\ell\|^2\Big)\,,
\]
so that
\[
\varphi'(0) = \langle R\rangle\,.
\]
We will deduce (6.184) from Lemma 4.5.2 used for k = 1 and δ = 0,
λ0 = 1/K, C0 = K, C1 = K, C2 = K/N , and much of the work consists
in checking conditions (4.135) to (4.138) of this lemma. Denoting by ⟨·⟩_λ an average for the Gibbs measure with density with respect to Lebesgue's measure proportional to
\[
\exp\Big(-\sum_{\ell\le n} H_N(\sigma^\ell) - \kappa\sum_{\ell\le n}\|\sigma^\ell\|^2 + \lambda N R\Big)\,, \qquad (6.185)
\]
and that the left-hand side is zero unless i = j. This implies in turn that at every point the second differential D of R satisfies |D(x,y)| ≤ K‖x‖‖y‖/N for every x, y in R^{Nn}. On the other hand, the second differential D^∼ of the function \(-\kappa\sum_{\ell\le n}\|\sigma^\ell\|^2/2\) satisfies at every point D^∼(x,x) = −κ‖x‖² for every x in R^{Nn}. Therefore if Kλ ≤ κ, at every point the second differential D^* of the function (6.186) satisfies D^*(x,x) ≤ 0 for every x in R^{Nn}, and consequently this function is concave. Then the quantity (6.185) is of the type
\[
\exp\Big(U - \frac{\kappa}{2}\sum_{\ell\le n}\|\sigma^\ell\|^2\Big)
\]
where U is concave; we can then use (6.183) and (3.17) to conclude that
\[
\varphi''(\lambda) = N\,\langle (R - \langle R\rangle_\lambda)^2\rangle_\lambda \le K\,,
\]
and this proves (4.137) with δ = 0 and hence also (4.136). It remains to prove
(4.138). For j ≤ N let us define
\[
-H_j = \sum_{k\le M,\ i(k,p)=j}\theta_k(\sigma_{i(k,1)},\ldots,\sigma_{i(k,p)})\,.
\]
To prove (4.138), it suffices to prove that for any given value of m we have
\[
\mathrm{E}\big(\mathrm{E}_{m+1}\varphi(\lambda) - \mathrm{E}_m\varphi(\lambda)\big)^2 \le \frac{K}{N^2}\,.
\]
Consider the Hamiltonian
\[
-H^\sim(\sigma) = -\sum_{j\ne m+1} H_j \qquad (6.187)
\]
and
\[
\varphi^\sim(\lambda) = \frac{1}{N}\log\int\exp\Big(-\sum_{\ell\le n} H^\sim(\sigma^\ell) - \kappa\sum_{\ell\le n}\|\sigma^\ell\|^2 + \lambda N R\Big)\,d\sigma^1\cdots d\sigma^n\,.
\]
It should be obvious that (since we have omitted the term H_{m+1} in (6.187)) E_m φ^∼(λ) = E_{m+1} φ^∼(λ), so that
\[
\mathrm{E}\big(\mathrm{E}_{m+1}\varphi(\lambda) - \mathrm{E}_m\varphi(\lambda)\big)^2
= \mathrm{E}\big(\mathrm{E}_{m+1}(\varphi(\lambda)-\varphi^\sim(\lambda)) - \mathrm{E}_m(\varphi(\lambda)-\varphi^\sim(\lambda))\big)^2
\le 2\,\mathrm{E}\big(\mathrm{E}_{m+1}(\varphi(\lambda)-\varphi^\sim(\lambda))\big)^2 + 2\,\mathrm{E}\big(\mathrm{E}_m(\varphi(\lambda)-\varphi^\sim(\lambda))\big)^2
\le 4\,\mathrm{E}\big(\varphi(\lambda)-\varphi^\sim(\lambda)\big)^2\,.
\]
\[
r = \mathrm{card}\{k\le M\,;\ i(k,p) = m+1\}\,,
\]
and thus
\[
\big(\varphi(\lambda)-\varphi^\sim(\lambda)\big)^2 \le \frac{1}{N^2}\Big\langle -\sum_{\ell\le n} H_{m+1}(\sigma^\ell)\Big\rangle_\sim^2 \le \frac{1}{N^2}\Big\langle\Big(\sum_{\ell\le n} H_{m+1}(\sigma^\ell)\Big)^2\Big\rangle_\sim\,.
\]
and therefore
\[
|H_{m+1}(\sigma^\ell)| \le \sum_{k\in I} a_k + A\sum_{i\le N} n_i\,|\sigma_i^\ell|\,,
\]
where n_i ∈ N and \(\sum_{i\le N} n_i = rp\), because each of the r terms in H_{m+1} creates at most p terms in the right-hand side. The randomness of H_{m+1} is independent
of the randomness of ⟨·⟩_∼, and since Er² ≤ K and Ea_k² < ∞, by (6.178) it suffices to prove that if i ≤ N then E⟨(σ_i^ℓ)²⟩_∼ ≤ K. This is done by basically copying the proof of Lemma 6.7.3. Using (6.183) the density Y with respect to η of the law of σ_i under Gibbs' measure satisfies
\[
\forall x,y\in\mathbb{R}\,,\quad Y(x) \le Y(y)\exp\big((r_i A + K_0/N)|x-y|\big)\,,
\]
To see this, we first note that without loss of generality we can assume that
|Uj | ≤ 1 for each j. Consider for each j ≤ k a function Uj∼ with |Uj∼ | ≤ 1
and assume that for some number S we have
Then
\[
|U_j(\sigma_1,\ldots,\sigma_n) - U_j^\sim(\sigma_1,\ldots,\sigma_n)| \le \sum_{s\le n}\mathbf{1}_{\{|\sigma_s|\ge S\}}\,,
\]
and therefore
\[
|\langle U_j(\sigma_1,\ldots,\sigma_n)\rangle - \langle U_j^\sim(\sigma_1,\ldots,\sigma_n)\rangle| \le \sum_{s\le n}\langle\mathbf{1}_{\{|\sigma_s|\ge S\}}\rangle\,,
\]
and therefore
\[
|\mathrm{E}C - \mathrm{E}C^\sim| \le \sum_{j\le k} m_j\,\mathrm{E}\sum_{s\le n}\langle\mathbf{1}_{\{|\sigma_s|\ge S\}}\rangle = n\sum_{j\le k} m_j\,\mathrm{E}\langle\mathbf{1}_{\{|\sigma_1|\ge S\}}\rangle\,.
\]
By Lemma 6.7.3, the right-hand side can be made small for S large, and since
we can choose the functions Uj that satisfy (6.196) and Uj (σ1 , . . . , σn ) = 0 if
one of the numbers |σs | is ≥ 2S, this indeed shows that we can assume (6.195).
A function U_j that satisfies (6.195) can be uniformly approximated by a finite sum of functions of the type
\[
f_1(\sigma_1)\cdots f_n(\sigma_n)\,,
\]
where |f_s^{(ℓ)}| is bounded for s ≤ n and ℓ = 0, 1, 2. By expansion we then reduce to the case where
\[
U_j(\sigma_1,\ldots,\sigma_n) = f_{1,j}(\sigma_1)\cdots f_{n,j}(\sigma_n)\,, \qquad (6.198)
\]
and we can furthermore assume that |f_{s,j}^{(ℓ)}| is bounded for ℓ = 0, 1, 2 and s ≤ n. Assuming (6.194) and (6.198) we have
\[
B := \mathrm{E}\,V\big(\langle U_1(\sigma_1,\ldots,\sigma_n)\rangle,\ldots,\langle U_k(\sigma_1,\ldots,\sigma_n)\rangle\big)
= \mathrm{E}\,\langle f_{1,1}(\sigma_1)\cdots f_{n,1}(\sigma_n)\rangle^{m_1}\cdots\langle f_{1,k}(\sigma_1)\cdots f_{n,k}(\sigma_n)\rangle^{m_k}\,.
\]
We will write this expression using replicas. Let m = m1 + · · · + mk . Let
us write {1, . . . , m} as the disjoint union of sets I1 , . . . , Ik with cardIj = mj ;
and for ∈ Ij and s ≤ n let us set
\[
g_{s,\ell} = f_{s,j}\,,
\]
so that in particular for ℓ ∈ I_j we have \(\prod_{s\le n} g_{s,\ell}(\sigma_s) = \prod_{s\le n} f_{s,j}(\sigma_s)\). Then, using independence of replicas in the first equality, we get
\[
\Big\langle\prod_{\ell\le m}\prod_{s\le n} g_{s,\ell}(\sigma_s^\ell)\Big\rangle = \prod_{\ell\le m}\Big\langle\prod_{s\le n} g_{s,\ell}(\sigma_s)\Big\rangle
= \Big\langle\prod_{s\le n} f_{s,1}(\sigma_s)\Big\rangle^{m_1}\cdots\Big\langle\prod_{s\le n} f_{s,k}(\sigma_s)\Big\rangle^{m_k}\,,
\]
and therefore
\[
B = \mathrm{E}\Big\langle\prod_{\ell\le m}\prod_{s\le n} g_{s,\ell}(\sigma_s^\ell)\Big\rangle = \mathrm{E}\Big\langle\prod_{s\le n}\prod_{\ell\le m} g_{s,\ell}(\sigma_s^\ell)\Big\rangle\,.
\]
Defining
\[
R_s = \frac{1}{N}\sum_{i\le N}\prod_{\ell\le m} g_{s,\ell}(\sigma_i^\ell)\,,
\]
Proposition 6.7.5 shows that for each s we have E⟨|R_s − E⟨R_s⟩|⟩ ≤ KN^{−1/4}, so that, replacing in turn each R_s by E⟨R_s⟩ one at a time,
\[
\Big|\,\mathrm{E}\Big\langle\prod_{s\le n} R_s\Big\rangle - \prod_{s\le n}\mathrm{E}\langle R_s\rangle\Big| \le \frac{K}{N^{1/4}}\,,
\]
and therefore
\[
\Big|\, B - \prod_{s\le n}\mathrm{E}\langle R_s\rangle\Big| \le \frac{K}{N^{1/4}}\,. \qquad (6.201)
\]
so that (6.202) is exactly (6.193) in this special case. As we have shown, this
special case implies the general one.
Given n, k, and a number C, inspection of the previous argument shows
that the convergence is uniform over the families of functions U1 , . . . , Uk that
satisfy |U1 |, . . . , |Uk | ≤ C.
We turn to the main result of this section, the proof that “in the limit
μ = T (μ)”. We recall the definition of Er as in (6.49), and that r is a Poisson
r.v. of expectation αp. Let us denote by X = (Xi )i≥1 an i.i.d. sequence,
where Xi ∈ D is a random element of law μ = μN (the law of the density
with respect to η of the law of σ1 under Gibbs’ measure), and let us define
T(μ) as follows: if
\[
Y = \frac{\langle E_r\rangle_X}{\langle \mathrm{Av}\,E_r\rangle_X}\in D\,,
\]
then T(μ) is the law of Y in D. The following asserts in a weak sense that in the limit T(μ_N) = μ_N.
To relate (6.203) with the statement that “T (μ) = μ”, we note that
\[
\frac{\langle \mathrm{Av}\,f_s(\varepsilon)\,E_r\rangle_X}{\langle \mathrm{Av}\,E_r\rangle_X} = \int Y f_s\,d\eta\,,
\]
In a weak sense this asserts that in the limit the laws of X (i.e μ) and Y (i.e.
T (μ)) coincide.
and let us consider a Poisson r.v. r with Er = α p. The letter r keeps this
meaning until the end of this chapter. For j ≥ 1, let us consider independent
copies θj of θ, and sets {i(j, 1), . . . , i(j, p − 1)} that are uniformly distributed
among the subsets of {1, . . . , N } of cardinality p − 1. Of course we assume
that all the randomness there is independent of the randomness of ⟨·⟩. Let us define
\[
-H(\sigma,\varepsilon) = \sum_{j\le r}\theta_j(\sigma_{i(j,1)},\ldots,\sigma_{i(j,p-1)},\varepsilon)\,.
\]
Next, we have
\[
\mathrm{E}\Big\langle\frac{\delta}{\delta+\mathrm{Av}\,E}\Big\rangle \le \sqrt{\delta} + \mathrm{P}\big(\mathrm{Av}\,E \le \sqrt{\delta}\big)\,,
\]
and, writing H = H(σ, ε), Jensen's inequality yields Av E ≥ exp(−Av H), so that
\[
\mathrm{P}\big(\mathrm{Av}\,E \le \sqrt{\delta}\big) \le \mathrm{P}\Big(\mathrm{Av}\,|H| \ge \log\frac{1}{\sqrt{\delta}}\Big) \le \frac{\mathrm{E}\,\mathrm{Av}\,|H|}{\log(1/\sqrt{\delta})}\,,
\]
so that (6.178) and Lemma 6.7.3 imply that sup_N E Av|H| < ∞ and the lemma is proved.
\[
\Omega_1 = \{\exists j\le r\,,\ i(j,1) = 1\}
\]
\[
\Omega_2 = \{\exists j, j'\le r\,,\ j\ne j'\,,\ \exists \ell,\ell'\le p-1\,,\ i(j,\ell) = i(j',\ell')\} \qquad (6.210)
\]
\[
\Omega_3 = \{(p-1)(r+1) \le N\}\,, \qquad (6.211)
\]
Without loss of generality we can assume that |f_s| ≤ 1 for each s. The inequality (6.197) and Lemma 6.7.10 yield
\[
\lim_{\delta\to 0}\sup_N\Big|\,\mathrm{E}\Big\langle\frac{f_1(\sigma_1)\,\mathrm{Av}\,E}{\delta+\mathrm{Av}\,E}\Big\rangle\cdots\Big\langle\frac{f_n(\sigma_1)\,\mathrm{Av}\,E}{\delta+\mathrm{Av}\,E}\Big\rangle
- \mathrm{E}\Big\langle\frac{f_1(\sigma_1)\,\mathrm{Av}\,E}{\mathrm{Av}\,E}\Big\rangle\cdots\Big\langle\frac{f_n(\sigma_1)\,\mathrm{Av}\,E}{\mathrm{Av}\,E}\Big\rangle\Big| = 0\,. \qquad (6.217)
\]
Combining (6.217) and (6.216) proves (6.209), since ⟨f_s(σ_1)⟩ = ⟨f_s(σ_1)⟩_X.
To complete the proof of Theorem 6.7.9, we show the following, where we
lighten notation by writing fs = fs (ε).
Proof. We follow the method of Lemma 6.7.11, keeping its notation. For s ≤ n we define
\[
U_s = \mathrm{Av}\,f_s(\varepsilon)\exp\sum_{1\le j\le r}\theta_j(\sigma_{j(p-1)+1},\ldots,\sigma_{(j+1)(p-1)},\varepsilon)\,.
\]
Since the influence of Ω vanishes in the limit, and exchanging again the limits N → ∞ and δ → 0 as permitted by Lemma 6.7.10 (and a similar argument for the terms E⟨U_s⟩_X/⟨δ + U⟩_X), we obtain
\[
\lim_{N\to\infty}\Big|\,\mathrm{E}\Big\langle\frac{\mathrm{Av}\,f_1\,E}{\mathrm{Av}\,E}\Big\rangle\cdots\Big\langle\frac{\mathrm{Av}\,f_n\,E}{\mathrm{Av}\,E}\Big\rangle
- \mathrm{E}\,\frac{\langle U_1\rangle_X}{\langle U\rangle_X}\cdots\frac{\langle U_n\rangle_X}{\langle U\rangle_X}\Big| = 0\,.
\]
One really wonders what kind of methods could be used to approach this
question. Even if this can be solved, the challenge remains to find situations
where in the relation (see (6.170))
\[
\frac{1}{N}\,\mathrm{E}\log\eta^{\otimes N}\Big(\bigcap_{k\le M} U_k\Big)
= \lim_{\beta\to\infty}\frac{1}{N}\,\mathrm{E}\log\int\exp\Big(\beta\sum_{k\le M}\theta_k(\sigma_{i(k,1)},\ldots,\sigma_{i(k,p)})\Big)\,d\eta^{\otimes N}(\sigma)
\]
\[
f\in D(B) \ \Rightarrow\ \int_{|x|\ge x_0} |f(x)|\,d\eta(x) \le \varepsilon\,,
\]
The map
\[
\nu\mapsto\psi(\nu) := \int\Big(\int f_1 Y\,d\eta\Big)\cdots\Big(\int f_n Y\,d\eta\Big)\,d\nu(Y)
\]
\[
\mathrm{E}\,\frac{\langle \mathrm{Av}\,f_1\,E_r\rangle_X}{\langle \mathrm{Av}\,E_r\rangle_X}\cdots\frac{\langle \mathrm{Av}\,f_n\,E_r\rangle_X}{\langle \mathrm{Av}\,E_r\rangle_X}
= \int\Big(\int f_1 Y\,d\eta\Big)\cdots\Big(\int f_n Y\,d\eta\Big)\,dT(\nu_N)(Y) \qquad (6.223)
\]
and the limit of the previous quantity along the sequence (N (k)) is
\[
\int\Big(\int f_1 Y\,d\eta\Big)\cdots\Big(\int f_n Y\,d\eta\Big)\,dT(\nu)(Y)\,.
\]
We will now show that this identity implies ν = T (ν), a contradiction which
completes the proof of the theorem. Approximating a function on a bounded
set by a polynomial yields that if F is a continuous function of n variables,
then
\[
\int F\Big(\int f_1 Y\,d\eta,\ldots,\int f_n Y\,d\eta\Big)\,d\nu(Y)
= \int F\Big(\int f_1 Y\,d\eta,\ldots,\int f_n Y\,d\eta\Big)\,dT(\nu)(Y)\,.
\]
Consequently,
\[
\int\varphi(Y)\,d\nu(Y) = \int\varphi(Y)\,dT(\nu)(Y)\,, \qquad (6.225)
\]
7.1 Introduction

Given positive numbers c(i, j), i, j ≤ N, the assignment problem is to find
\[
\min_{\sigma}\sum_{i\le N} c(i,\sigma(i))\,, \qquad (7.1)
\]
The link with (7.2) is that it can be shown that if the r.v.s c(i, j) are
i.i.d., and their common distribution has a density f on R+ with respect
to Lebesgue measure, then if f is continuous in a neighborhood of 0, the
limit in (7.2) depends only on f (0). (The intuition for this is simply that all
the numbers c(i, σ(i)) relevant in the computation of the minimum in (7.2)
should be very small for large N , so that only the part of the distribution of
c(i, j) close to 0 matters.) Thus it makes no difference to assume that c(i, j)
is uniform over [0, 1] or is exponential of mean 1.
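As a concrete toy instance of (7.1), the minimum can be computed by brute force over all permutations for small N; the cost matrix below is invented purely for illustration:

```python
from itertools import permutations

def min_assignment(c):
    """Brute-force value of min over bijections sigma of sum_i c[i][sigma(i)];
    only feasible for small N, but enough to illustrate (7.1)."""
    n = len(c)
    return min(sum(c[i][s[i]] for i in range(n)) for s in permutations(range(n)))

c = [[1, 2, 3],
     [2, 4, 6],
     [3, 6, 9]]
assert min_assignment(c) == 10  # attained by sigma = (2, 1, 0)
```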
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 397
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 7, © Springer-Verlag Berlin Heidelberg 2011
398 7. An Assignment Problem
Research Problem 7.1.1. (Level 2) Extend the results of the present chap-
ter to the case β ≤ β0 where β0 is independent of α.
Even in the domain β ≤ β(α) our results are in a sense weaker than those
of the previous chapters. We do not study the model for given large values
of N and M , but only in the limit N → ∞ and M/N → α, and we do not
obtain a rate for several of the convergence results.
One of the challenges of the present situation is that it is not obvious
how to formulate the correct questions. We expect (under our condition that
β is small) that “the spins at two different sites are nearly independent”.
Here this should mean that when i₁ ≠ i₂, under Gibbs' measure the variables σ → σ(i₁) and σ → σ(i₂) are nearly independent. But how could one quantify this phenomenon in a way suitable for a proof by induction?
We consider the partition function
\[
Z_{N,M} = \sum_{\sigma}\exp(-H_{N,M}(\sigma))\,, \qquad (7.5)
\]
so that
\[
Z_{N,M} = \sum_{\sigma}\prod_{i\le N} a(i,\sigma(i))\,.
\]
The cavity method will require removing elements from {1, . . . , N } and
{1, . . . , M }. Given a set A ⊂ {1, . . . , N } and a set B ⊂ {1, . . . , M } such that
N − card A ≤ M − card B, we write
\[
Z_{N,M}(A;B) = \sum_{\sigma}\prod_i a(i,\sigma(i))\,.
\]
The product is taken over i ∈ {1, . . . , N }\A and the sum is taken over
the one-to-one maps σ from {1, . . . , N }\A to {1, . . . , M }\B. Thus ZN,M =
ZN,M (∅; ∅). When A = {i1 , i2 , . . .} and B = {j1 , j2 , . . .} we write
Rather than working directly with Gibbs’ measure, we will prove that
It should be obvious that this is a very strong property, and that it deals with
independence. One can also get convinced that it deals with Gibbs’ measure
by observing that
\[
G(\{\sigma(i)=j\}) = a(i,j)\,\frac{Z_{N,M}(i,j)}{Z_{N,M}}\,.
\]
These quantities occur in the right-hand side of (7.7). The number uN,M (j)
is the Gibbs probability that j does not belong to the image of {1, . . . , N }
under the map σ. In particular we have 0 ≤ uN,M (j) ≤ 1. (On the other
hand we only know that wN,M (i) > 0.)
Having understood that these quantities are important, we would like
to know something about the family (uN,M (j))j≤M (or (wN,M (i))i≤N ). An
optimistic thought is that this family looks like an i.i.d. sequence drawn out
of a certain distribution, that we would like to describe, probably as a fixed
point of a certain operator. Analyzing the problem, it is not very difficult to
guess what the operator should be; the unpleasant surprise is that it does
not seem obvious that this operator has a fixed point, and this contributes
significantly to the difficulty of the problem. In order to state our main result,
let us describe this operator. Of course, the motivation behind this definition
will become clear only gradually.
Consider a standard Poisson point process on R+ (that is, its intensity
measure is Lebesgue’s measure) and denote by (ξi )i≥1 an increasing enumer-
ation of the points it produces. Consider a probability measure η on R+ , and
i.i.d. r.v.s (Yi )i≥1 distributed according to η, which are independent of the
r.v.s ξi . We define
\[
A(\eta) = \mathcal{L}\bigg(\frac{1}{\sum_{i\ge 1} Y_i\exp\big(-\beta\xi_i/(1+\alpha)\big)}\bigg) \qquad (7.9)
\]
\[
B(\eta) = \mathcal{L}\bigg(\frac{1}{1+\sum_{i\ge 1} Y_i\exp(-\beta\xi_i)}\bigg)\,, \qquad (7.10)
\]
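The operators A and B can be simulated directly from the definitions (7.9)–(7.10), truncating the a.s. convergent series at finitely many Poisson points. This is an illustrative sketch only: `eta_sampler`, the truncation level and the parameter values below are arbitrary choices, not from the text:

```python
import math
import random

def poisson_points(n_points):
    """First points xi_1 < xi_2 < ... of a standard Poisson process on R+:
    partial sums of i.i.d. Exp(1) gaps."""
    xs, t = [], 0.0
    for _ in range(n_points):
        t += random.expovariate(1.0)
        xs.append(t)
    return xs

def sample_A(eta_sampler, beta, alpha, n_points=200):
    """One draw from A(eta) of (7.9), truncating the series at n_points terms;
    the tail is negligible since exp(-beta*xi_i/(1+alpha)) decays geometrically."""
    xi = poisson_points(n_points)
    s = sum(eta_sampler() * math.exp(-beta * x / (1 + alpha)) for x in xi)
    return 1.0 / s

def sample_B(eta_sampler, beta, n_points=200):
    """One draw from B(eta) of (7.10), with the same truncation."""
    xi = poisson_points(n_points)
    s = sum(eta_sampler() * math.exp(-beta * x) for x in xi)
    return 1.0 / (1.0 + s)

random.seed(1)
b = sample_B(lambda: random.random(), beta=1.0)
assert 0.0 < b < 1.0          # B(eta) is supported on (0, 1)
a = sample_A(lambda: 0.5 + 0.5 * random.random(), beta=1.0, alpha=1.0)
assert a > 0.0                # A(eta) is supported on R+
```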
where of course L(X) is the law of the r.v. X. The dependence on β and α
is kept implicit.
Theorem 7.1.2. Given α > 0, there exists β(α) > 0 such that for β ≤ β(α)
there exists a unique pair μ, ν where μ is a probability measure on [0, 1] and
ν is a probability measure on R+ such that
\[
\int x\,d\mu(x) = \frac{\alpha}{1+\alpha}\ ;\quad \mu = B(\nu)\ ;\quad \nu = A(\mu)\,. \qquad (7.11)
\]
One intrinsic difficulty is that there exists such a pair for each value of α
(not too small); so one cannot expect that the operator B ◦ A is a contraction
for a certain distance. The way we will prove (7.11) is by showing that a
cluster point of the sequence (L(uN,M (M )), L(wN,M (N ))) is a solution of
these equations.
While it is not entirely obvious what are the relevant questions one should
ask about the system, the following shows that the objects of Theorem 7.1.2
are of central importance.
\[
\lim_{N\to\infty}\frac{1}{N}\,\mathrm{E}\log Z_{N,M} = -(1+\alpha)\int\log x\,d\mu(x) - \int\log x\,d\nu(x)\,. \qquad (7.13)
\]
7.2 Overview of the Proof
In this section we try to describe the overall strategy. The following funda-
mental identities are proved in Lemma 7.3.4 below
\[
u_{N,M}(M) = \frac{1}{1+\sum_{k\le N} a(k,M)\,w_{N,M-1}(k)} \qquad (7.14)
\]
\[
w_{N,M}(N) = \frac{1}{\sum_{\ell\le M} a(N,\ell)\,u_{N-1,M}(\ell)}\,. \qquad (7.15)
\]
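The identities (7.14) and (7.15) are exact for every N and M, so they can be checked mechanically against brute-force partition functions on a tiny instance (the weight matrix below is invented for the test):

```python
from itertools import permutations
import math

def Z(a, A=frozenset(), B=frozenset()):
    """Z_{N,M}(A;B): sum over one-to-one maps sigma from rows not in A
    to columns not in B of the product of a[i][sigma(i)]."""
    rows = [i for i in range(len(a)) if i not in A]
    cols = [j for j in range(len(a[0])) if j not in B]
    total = 0.0
    for choice in permutations(cols, len(rows)):
        prod = 1.0
        for i, j in zip(rows, choice):
            prod *= a[i][j]
        total += prod
    return total

# A small deterministic instance; any positive weights work.
a = [[0.2, 0.7, 0.1, 0.9],
     [0.5, 0.3, 0.8, 0.4],
     [0.6, 0.1, 0.2, 0.7]]
N, M = 3, 4

# (7.14): u_{N,M}(M) against the (N, M-1) system (drop the last column).
u_M = Z(a, B=frozenset({M - 1})) / Z(a)
a_minus = [row[:M - 1] for row in a]
w = [Z(a_minus, A=frozenset({k})) / Z(a_minus) for k in range(N)]
assert math.isclose(u_M, 1.0 / (1.0 + sum(a[k][M - 1] * w[k] for k in range(N))),
                    rel_tol=1e-12)

# (7.15): w_{N,M}(N) against the (N-1, M) system (drop the last row).
w_N = Z(a, A=frozenset({N - 1})) / Z(a)
a_top = a[:N - 1]
u = [Z(a_top, B=frozenset({l})) / Z(a_top) for l in range(M)]
assert math.isclose(w_N, 1.0 / sum(a[N - 1][l] * u[l] for l in range(M)),
                    rel_tol=1e-12)
```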
Observe that in the right-hand side of (7.14) the r.v.s a(k, M ) are independent
of the numbers wN,M −1 (k), and similarly in (7.15). We shall prove that
As a consequence, given the numbers wN,M −2 (k), the r.v.s uN,M (M ) and
uN,M (M − 1) are nearly independent. Their common law depends only on
the empirical measure
\[
\frac{1}{N}\sum_{i\le N}\delta_{w_{N,M-2}(i)}\,,
\]
where La denotes the law in the randomness of the variables a(k), when all
the other sources of randomness are fixed.
Thus, given the numbers w_{N,M}(k), the r.v.s u_{N,M}(M) and u_{N,M}(M − 1) are nearly independent with common law μ̃_N. By symmetry this is true for each pair of r.v.s u_{N,M}(j) and u_{N,M}(k).
Therefore we expect that the empirical measure
\[
\mu_N = \frac{1}{M}\sum_{j\le M}\delta_{u_{N,M}(j)}
\]
is nearly μ̃_N.
Since μ̃_N is a continuous function of ν_N, it follows that if ν_N is concentrated (in the sense that it is nearly non-random), then such is the case of μ̃_N, which is then nearly concentrated around its mean μ̄_N = Eμ̃_N, and therefore μ_N itself is concentrated around μ̄_N.
We can argue similarly that if μ_N is concentrated around μ̄_N, then ν_N must be concentrated around a certain measure ν̄_N that can be calculated from μ̄_N. The hard part of the proof is to get quantitative estimates showing that if β is sufficiently small, then these cross-referential statements can be combined to show that μ_N and ν_N are indeed concentrated around μ̄_N and ν̄_N respectively. Now, the way μ̄_N is obtained from ν_N means in the limit that μ̄_N ≃ B(ν̄_N); similarly, ν̄_N ≃ A(μ̄_N). Also μ̄_N = L(u_{N,M}(M)) and ν̄_N = L(w_{N,M}(N)), so that μ = lim_N L(u_{N,M}(M)) and ν = lim_N L(w_{N,M}(N)) satisfy μ = B(ν) and ν = A(μ).
7.3 The Cavity Method

Lemma 7.3.1. If i ∉ A, we have
\[
Z_{N,M}(A;B) = \sum_{\ell\notin B} a(i,\ell)\,Z_{N,M}(A\cup\{i\};B\cup\{\ell\})\,. \qquad (7.21)
\]
If j ∉ B, we have
\[
Z_{N,M}(A;B) = Z_{N,M}(A;B\cup\{j\}) + \sum_{k\notin A} a(k,j)\,Z_{N,M}(A\cup\{k\};B\cup\{j\})\,. \qquad (7.22)
\]
Proof. One replaces each occurrence of ZN,M (·; ·) by its value and one checks
that the same terms occur in the left-hand and right-hand sides.
The following deserves no proof.
Lemma 7.3.2. If M ∉ B, we have
If N ∉ A, we have
and thus
\[
\sum_{\ell\le M} u_{N,M}(\ell) = M-N\,. \qquad (7.26)
\]
To prove (7.26) we can also observe that u_{N,M}(ℓ) is the Gibbs probability that ℓ does not belong to the image under σ of {1,…,N}, so that the left-hand side of (7.26) is the expected number of integers that do not belong to this image, i.e. M − N. In particular (7.26) implies by symmetry between the values of ℓ that E u_{N,M}(M) = (M − N)/M ≃ α/(1 + α), so that any cluster point μ of the sequence L(u_{N,M}(M)) satisfies ∫x dμ(x) = α/(1 + α).
This proves (7.27). The proof of (7.28) is similar, using now (7.21) and (7.24).
It will be essential to consider the following quantity, where i ≤ N:
\[
L_{N,M}(i) = \frac{Z_{N,M}\,Z_{N,M-1}(i;\emptyset) - Z_{N,M}(i;\emptyset)\,Z_{N,M-1}}{Z_{N,M}^2}\,. \qquad (7.30)
\]
The idea is that (7.7) used for j = M implies that ELN,M (i)2 is small.
(This expectation does not depend on i.) Conversely, if ELN,M (i)2 is small
this implies (7.7) for j = M and hence for all values of j by symmetry.
We will also use the quantity
\[
R_{N,M}(j) = \frac{Z_{N,M}\,Z_{N,M-1}(\emptyset;j) - Z_{N,M}(\emptyset;j)\,Z_{N,M-1}}{Z_{N,M}^2}\,. \qquad (7.31)
\]
It is good to notice that |RN,M (j)| ≤ 2. This follows from (7.23) and the fact
that the quantity ZN,M (A, B) decreases as B increases.
The reason for introducing the quantity R_{N,M}(j) is that it occurs naturally when one tries to express L_{N,M}(i) as a function of a smaller system (as the next lemma shows).
Lemma 7.3.5. We have
\[
L_{N,M}(N) = -\,\frac{\sum_{\ell\le M-1} a(N,\ell)\,R_{N-1,M}(\ell) - a(N,M)\,u_{N-1,M}(M)^2}{\big(\sum_{\ell\le M} a(N,\ell)\,u_{N-1,M}(\ell)\big)^2} \qquad (7.32)
\]
\[
R_{N,M}(M-1) = -\,\frac{\sum_{k\le N} a(k,M)\,L_{N,M-1}(k)}{\big(1+\sum_{k\le N} a(k,M)\,w_{N,M-1}(k)\big)^2}\,. \qquad (7.33)
\]
Proof. Using the definition (7.31) of R_{N,M}(j) with j = M − 1, we have
\[
R_{N,M}(M-1) = \frac{Z_{N,M}\,Z_{N,M-1}(\emptyset;M-1) - Z_{N,M}(\emptyset;M-1)\,Z_{N,M-1}}{Z_{N,M}^2}\,. \qquad (7.34)
\]
As in (7.29), but using now (7.22) with B = {M − 1} and j = M we obtain:
\[
Z_{N,M}(\emptyset;M-1) = Z_{N,M-1}(\emptyset;M-1) + \sum_{k\le N} a(k,M)\,Z_{N,M-1}(k;M-1)\,. \qquad (7.35)
\]
Using this and (7.29) in the numerator of (7.34), and (7.29) in the denomina-
tor, and gathering the terms yields (7.33). The proof of (7.32) is similar.
We end this section by a technical but essential fact.
7.4 Decoupling
In this section, we prove (7.7) and, more precisely, the following.
Theorem 7.4.1. Given α > 0, there exists β(α) > 0 such that if β ≤ β(α) and M = N(1 + α), then for βN ≥ 1
\[
\mathrm{E}\,L_{N,M}(N)^2 \le \frac{K(\alpha)}{N} \qquad (7.37)
\]
\[
\mathrm{E}\,R_{N,M}(M-1)^2 \le \frac{K(\alpha)}{N}\,. \qquad (7.38)
\]
The method of proof consists of using Lemma 7.3.5 to relate ERN,M (M −
1)2 with ELN,M −1 (N )2 and ELN,M (N )2 with ERN −1,M (M − 1)2 , and to it-
erate these relations. In the right-hand sides of (7.32) and (7.33), we will first
take expectation in the quantities a(N, ) and a(k, M ), that are probabilisti-
cally independent of the other quantities (an essential fact). Our first task is
to learn how to do this.
We recall the random sequence a(k) = exp(−βN Xk ) of (7.20), where
(Xk ) are i.i.d., uniform over [0, 1], and independent of the other sources of
randomness. The following lemma is obvious.
Lemma 7.4.2. We have
\[
\mathrm{E}\,a(k)^p = \frac{1}{\beta p N}\big(1 - \exp(-\beta p N)\big) \le \frac{1}{\beta p N}\,. \qquad (7.39)
\]
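Since a(k) = exp(−βN X_k) with X_k uniform on [0, 1], the moment in (7.39) is just the integral ∫₀¹ e^{−βpNx} dx. A quick numerical confirmation (illustrative only; the parameter values are arbitrary):

```python
import math

def moment_numeric(beta, N, p, n_grid=100000):
    """E a^p = integral_0^1 exp(-beta*p*N*x) dx by midpoint quadrature,
    for a = exp(-beta*N*X) with X uniform on [0, 1]."""
    h = 1.0 / n_grid
    return h * sum(math.exp(-beta * p * N * (h * (i + 0.5))) for i in range(n_grid))

beta, N, p = 0.3, 10, 2
closed = (1 - math.exp(-beta * p * N)) / (beta * p * N)
assert math.isclose(moment_numeric(beta, N, p), closed, rel_tol=1e-6)
assert closed <= 1 / (beta * p * N)
```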
Lemma 7.4.3. Consider numbers (x_k)_{k≤N}. Then we have
\[
\mathrm{E}\Big(\sum_{k\le N} a(k)\,x_k\Big)^2 \le \Big(\frac{1}{2\beta^2 N} + \frac{1}{2\beta N}\Big)\sum_{k\le N} x_k^2\,. \qquad (7.40)
\]
The sequence (a(k,M))_{k≤N} has the same distribution as the sequence (a(k))_{k≤N}, so that taking expectation first in this sequence and using (7.40) we get, assuming without loss of generality that β ≤ 1,
\[
\mathrm{E}\,R_{N,M}(M-1)^2 \le \frac{1}{\beta^2 N}\sum_{k\le N}\mathrm{E}\,L_{N,M-1}(k)^2 + \frac{L\beta^3}{b^6 N}\sum_{\ell\le M} y(\ell)^2
= \frac{1}{\beta^2}\,\mathrm{E}\,L_{N,M-1}(N)^2 + \frac{L\beta^3}{b^6 N}\sum_{\ell\le M} y(\ell)^2\,, \qquad (7.41)
\]
As will be apparent later, an essential feature is that the second term of this
bound has a coefficient β 3 (rather than β 2 ).
\[
b := \frac{1}{N}\sum_{\ell\le M} u(\ell) \ge \frac{M-N}{N} \ge \frac{\alpha}{2}
\]
\[
\mathrm{E}_a\,L_{N,M}(N)^2 \le \frac{K(\alpha)}{N} + \frac{L\beta^3}{\alpha^6 N}\sum_{\ell\le M-1} y(\ell)^2\,. \qquad (7.44)
\]
In this definition we assume that the values of Z_{N−k,M} that are relevant for the computation of R_{N−k,M−k} have been computed with the parameter β replaced by the value β′ such that β′(N − k) = βN. We observe that M − k = N(1 + α) − k ≥ (N − k)(1 + α) and M − k ≤ 3(N − k).
Combining Corollaries 7.4.6 and 7.4.4 implies that if β′(N − k) = βN ≥ 1 and β′ ≤ α/80 we have
\[
V(k) \le \frac{L\beta}{\alpha^6}\,V(k+1) + \frac{K(\alpha)}{N}\,. \qquad (7.45)
\]
Let us assume that k ≤ N/2, so that β′ ≤ 2β. Then (7.45) holds whenever β ≤ α/160. Thus if Lβ/α⁶ ≤ 1/2, k ≤ N/2 and βN ≥ 1, we obtain
\[
V(k) \le \frac{1}{2}\,V(k+1) + \frac{K(\alpha)}{N}\,.
\]
Combining these relations yields
\[
V(0) \le 2^{-k}\,V(k) + \frac{K(\alpha)}{N} \le 2^{-k+2} + \frac{K(\alpha)}{N}
\]
since V(k) ≤ 4. Taking k of order log N proves (7.38), and (7.37) then follows by (7.42).
\[
\mathrm{E}\big(u_{N,M}(j) - u_{N,M-1}(j)\big)^2 \le \frac{K(\alpha)}{N} \qquad (7.46)
\]
\[
\mathrm{E}\big(u_{N,M}(j) - u_{N-1,M}(j)\big)^2 \le \frac{K(\alpha)}{N} \qquad (7.47)
\]
\[
\mathrm{E}\big(w_{N,M}(i) - w_{N,M-1}(i)\big)^2 \le \frac{K(\alpha)}{N} \qquad (7.48)
\]
\[
\mathrm{E}\big(w_{N,M}(i) - w_{N-1,M}(i)\big)^2 \le \frac{K(\alpha)}{N}\,. \qquad (7.49)
\]
Proof. The proofs are similar, so we prove only (7.46). We can assume j = M − 1. Using (7.29) and (7.35) we get
\[
u_{N,M}(M-1) = \frac{Z_{N,M}(\emptyset;M-1)}{Z_{N,M}}
= \frac{Z_{N,M-2}}{Z_{N,M-1}}\cdot\frac{1+\sum_{k\le N} a(k,M)\,w_{N,M-2}(k)}{1+\sum_{k\le N} a(k,M)\,w_{N,M-1}(k)}\,.
\]
Moreover
\[
L_{N,M-1}(k) = \frac{Z_{N,M-2}}{Z_{N,M-1}}\big(w_{N,M-2}(k) - w_{N,M-1}(k)\big)\,,
\]
which is obvious from (7.30). Using this identity for M − 1 rather than M, we obtain
\[
u_{N,M}(M-1) - u_{N,M-1}(M-1)
= \frac{Z_{N,M-2}}{Z_{N,M-1}}\Big(\frac{1+\sum_{k\le N} a(k,M)\,w_{N,M-2}(k)}{1+\sum_{k\le N} a(k,M)\,w_{N,M-1}(k)} - 1\Big)
= \frac{\sum_{k\le N} a(k,M)\,L_{N,M-1}(k)}{1+\sum_{k\le N} a(k,M)\,w_{N,M-1}(k)}\,.
\]
There is of course nothing magic about the number 8, this result is true for
any other number (with a different condition on β). As the proof is tedious,
it is postponed to the end of this section.
Proof of Proposition 7.4.5. First we reduce to the case u(ℓ) = u′(ℓ) by using that 2cc′ ≤ c² + c′² for
\[
c = \Big(\sum_{\ell\le M} a(\ell)u(\ell)\Big)^{-2}\ ;\quad c' = \Big(\sum_{\ell\le M} a(\ell)u'(\ell)\Big)^{-2}\,.
\]
Expanding the square in the numerator of the left-hand side, we see that it equals I + II, where
\[
\mathrm{I} = \sum_{\ell'\le M} y(\ell')^2\,\mathrm{E}\,\frac{\dot a(\ell')^2}{\big(\sum_{\ell\le M} a(\ell)u(\ell)\big)^4} \qquad (7.52)
\]
\[
\mathrm{II} = \sum_{\ell_1\ne\ell_2} y(\ell_1)\,y(\ell_2)\,\mathrm{E}\,\frac{\dot a(\ell_1)\,\dot a(\ell_2)}{\big(\sum_{\ell\le M} a(\ell)u(\ell)\big)^4}\,.
\]
To bound the terms of I, let us set S = Σ_{ℓ'≠ℓ} a(ℓ')u(ℓ'), so

$$E\, \frac{\dot a(\ell)^2}{\big(\sum_{\ell'\le M} a(\ell')\, u(\ell')\big)^4} \le E\, \frac{\dot a(\ell)^2}{S^4} = E\,\dot a(\ell)^2 \; E\, S^{-4}$$

by independence. Now since Σ_{ℓ≤M} u(ℓ) ≥ 4 and u(ℓ) ≤ 1, we have

$$\sum_{\ell'\ne\ell} u(\ell') \ge \frac{3}{4} \sum_{\ell\le M} u(\ell) \ge \frac{3}{4}\, b \,, \qquad (7.53)$$

so using (7.50) for M − 1 rather than M and 3b/4 rather than b we get
E S^{−4} ≤ Lβ⁴/b⁴; since E ȧ(ℓ)² ≤ E a(ℓ)² ≤ 1/βN, we have proved that, using
that b ≤ 1 in the second inequality,
$$\mathrm{I} \le \frac{L\beta^3}{N b^4} \sum_{\ell\le M} y(\ell)^2 \le \frac{L\beta^3}{N b^6} \sum_{\ell\le M} y(\ell)^2 \,.$$
To bound the terms of II, given ℓ₁ ≠ ℓ₂ let us set S(ℓ₁,ℓ₂) = Σ_{ℓ≠ℓ₁,ℓ₂} a(ℓ)u(ℓ) and

$$U = a(\ell_1)\, u(\ell_1) + a(\ell_2)\, u(\ell_2) \ge 0 \,.$$

Thus Σ_{ℓ≤M} a(ℓ)u(ℓ) = S(ℓ₁,ℓ₂) + U. Since U ≥ 0, a Taylor expansion yields

$$\frac{1}{\big(\sum_{\ell\le M} a(\ell)\, u(\ell)\big)^4} = \frac{1}{S(\ell_1,\ell_2)^4} - \frac{4U}{S(\ell_1,\ell_2)^5} + \frac{R}{S(\ell_1,\ell_2)^6} \qquad (7.54)$$
where |R| ≤ 15U². Since S(ℓ₁,ℓ₂) is independent of a(ℓ₁) and a(ℓ₂), and since
E ȧ(ℓ₁)ȧ(ℓ₂)U = 0, multiplying (7.54) by ȧ(ℓ₁)ȧ(ℓ₂) and taking expectation
we get

$$E\, \frac{\dot a(\ell_1)\,\dot a(\ell_2)}{\big(\sum_{\ell\le M} a(\ell)\, u(\ell)\big)^4} \le E\, \frac{15\,|\dot a(\ell_1)\,\dot a(\ell_2)|\, U^2}{S(\ell_1,\ell_2)^6} = 15\, E\big(|\dot a(\ell_1)\,\dot a(\ell_2)|\, U^2\big)\; E\, \frac{1}{S(\ell_1,\ell_2)^6} \,.$$

Moreover

$$E\big(|\dot a(\ell_1)\,\dot a(\ell_2)|\, U^2\big) \le \frac{L}{(\beta N)^2} \,.$$
We also have that E S(ℓ₁,ℓ₂)^{−6} ≤ Lβ⁶/b⁶ by (7.50) (used for k = 6 and M − 2
rather than M, and proceeding as in (7.53)). Thus

$$\mathrm{II} \le \frac{L\beta^4}{b^6 N^2} \sum_{\ell_1\ne\ell_2} |y(\ell_1)\, y(\ell_2)| \le \frac{L\beta^4}{b^6 N^2} \Big(\sum_{\ell\le M} |y(\ell)|\Big)^2 \le \frac{L\beta^4}{b^6 N} \sum_{\ell\le M} y(\ell)^2 \,,$$
For t ≤ 1, taking λ = e/t, and since then log λ − λt/e = log(e/t) − 1 = −log t,
we get

$$P\Big( Y \le \frac{tb}{2e\beta} \Big) \le t^{b/2\beta} \,.$$
Exercise 7.4.10. Prove that for a r.v. Y ≥ 0 one has the formula

$$E\, Y^{-k} = \frac{1}{(k-1)!} \int_0^\infty t^{k-1}\, E \exp(-tY)\, dt \,,$$
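As a sanity check of this formula (not part of the exercise), take k = 2 and Y uniform on [1, 2], for which E exp(−tY) = (e^{−t} − e^{−2t})/t and EY^{−2} = 1/2; a short numerical sketch:

```python
import math

# Check E[Y^{-2}] = ∫_0^∞ t · E[exp(-tY)] dt for Y uniform on [1, 2]:
# here E[exp(-tY)] = (exp(-t) - exp(-2t)) / t, so the t factors cancel,
# and E[Y^{-2}] = ∫_1^2 y^{-2} dy = 1/2.
def rhs(n_steps=200000, t_max=40.0):
    h = t_max / n_steps
    total = 0.0
    for i in range(1, n_steps + 1):   # simple Riemann sum
        t = i * h
        total += (math.exp(-t) - math.exp(-2 * t)) * h
    return total

assert abs(rhs() - 0.5) < 1e-3
```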
$$\mu_N = \frac{1}{M} \sum_{j\le M} \delta_{u_{N,M}(j)} \,. \qquad (7.59)$$
We recall the sequence a(k) = exp(−βN X_k), where (X_k) are i.i.d., uniform
over [0, 1] and independent of the other sources of randomness. Consider
the random measure μ_N on [0, 1] given by

$$\mu_N = \mathcal{L}_a\bigg( \frac{1}{1 + \sum_{k\le N} a(k)\, w_{N,M}(k)} \bigg) \,,$$

where L_a denotes the law in the randomness of the variables a(k) with all
the other sources of randomness fixed. Thus, for a continuous function f on
[0, 1] we have

$$\int f\, d\mu_N = E_a\, f\bigg( \frac{1}{1 + \sum_{k\le N} a(k)\, w_{N,M}(k)} \bigg) \,,$$

where E_a denotes expectation in the r.v.s a(k) only. Consider the (non-random)
measure μ̄_N = Eμ_N, so that

$$\int f\, d\bar\mu_N = E\, f\bigg( \frac{1}{1 + \sum_{k\le N} a(k)\, w_{N,M}(k)} \bigg) \,.$$
In the next section we shall make precise the intuition that “νN determines
μN ” and “μN determines νN ” to conclude the proof of Theorem 7.1.2.
It is helpful to consider an appropriate distance for probability measures.
Given two probability measures μ, ν on R, we consider the quantity
Δ(μ, ν) = inf E(X − Y )2 ,
where the infimum is over the pairs (X, Y ) of r.v.s such that X has law
μ and Y has law ν. The quantity Δ1/2 (μ, ν) is a distance. This statement
is not obvious, but is proved in Section A.11, where the reader may find
more information. This distance is called Wasserstein’s distance between μ
and ν. It is of course related to the transportation-cost distance considered in
Chapter 6, but is more convenient here. Let us observe that since E(X −Y )2 ≥
(EX − EY )2 we have
$$\bigg( \int x\, d\mu(x) - \int x\, d\nu(x) \bigg)^2 \le \Delta(\mu,\nu) \,. \qquad (7.61)$$
We observe that the bistochastic matrices are exactly the matrices a_{ij} =
N P(X = x_i, Y = y_j). Thus the left-hand side of (7.63) is

$$\inf \frac{1}{N} \sum_{i,j\le N} a_{ij}\, (x_i - y_j)^2 \,,$$

where the infimum is over all bistochastic matrices (a_{ij}). The infimum is at-
tained at an extreme point, and it is a classical result ("Birkhoff's theorem")
that this extreme point is a permutation matrix.
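By Birkhoff's theorem, the distance Δ between two N-point empirical measures on R is thus a minimum over permutations, and on the line the minimizing permutation pairs the sorted samples. A small sketch (the samples are arbitrary illustrative numbers):

```python
import itertools

def delta_empirical(xs, ys):
    """Δ between two N-point empirical measures on R: brute-force
    minimum of (1/N) Σ (x_i - y_{σ(i)})² over permutations σ."""
    n = len(xs)
    return min(sum((x - ys[s[i]]) ** 2 for i, x in enumerate(xs)) / n
               for s in itertools.permutations(range(n)))

xs = [0.3, 1.7, 0.9, 2.4, 0.1]
ys = [1.1, 0.2, 2.0, 0.5, 1.6]
# In dimension one the optimal permutation matches sorted samples.
sorted_cost = sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
assert abs(delta_empirical(xs, ys) - sorted_cost) < 1e-12
```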
Lemma 7.5.3. Given numbers w(k), w'(k) ≥ 0 we have

$$E\bigg( \frac{1}{1 + \sum_{k\le N} a(k)\, w(k)} - \frac{1}{1 + \sum_{k\le N} a(k)\, w'(k)} \bigg)^2 \le \frac{2}{\beta^2 N} \sum_{k\le N} \big(w(k) - w'(k)\big)^2 \,. \qquad (7.64)$$

Consequently

$$\Delta\bigg( \mathcal{L}\bigg( \frac{1}{1 + \sum_{k\le N} a(k)\, w(k)} \bigg),\; \mathcal{L}\bigg( \frac{1}{1 + \sum_{k\le N} a(k)\, w'(k)} \bigg) \bigg) \le \frac{2}{\beta^2 N} \sum_{k\le N} \big(w(k) - w'(k)\big)^2 \,. \qquad (7.65)$$
$$E\big( u_{N,M}(M) - u \big)^2 \le \frac{K}{N} \,.$$

Exchanging the rôles of M and M − 1 shows that if

$$u' = \frac{1}{1 + \sum_{k\le N} a(k, M-1)\, w_{N,M-2}(k)}$$

we have

$$E\big( u_{N,M}(M-1) - u' \big)^2 = E\big( u_{N,M}(M) - u \big)^2 \le \frac{K}{N} \,.$$

Therefore to prove (7.66) it suffices to prove that

$$\lim_{N\to\infty} E\, \bigg( f(u) - \int f\, d\mu_N \bigg) \bigg( f(u') - \int f\, d\mu_N \bigg) = 0 \,, \qquad (7.67)$$

where

$$u_1 = \frac{1}{1 + \sum_{k\le N} a(k)\, w_{N,M}(k)} \; ; \quad u_1' = \frac{1}{1 + \sum_{k\le N} a'(k)\, w_{N,M}(k)} \,,$$
Let us denote by E_a expectation only in the r.v.s a(k), a'(k), a(k, M) and
a(k, M − 1), which are probabilistically independent of the r.v.s w_{N,M−2}(k).
Then, by independence,

$$E_a\big( f(u) - f(u_2) \big)\big( f(u') - f(u_2') \big) = \big( E_a f(u) - E_a f(u_2) \big)\big( E_a f(u') - E_a f(u_2') \big) \,.$$

Proof. We have

$$\int f\, d\mu_N = \frac{1}{M} \sum_{\ell\le M} f\big( u_{N,M}(\ell) \big) \,,$$
$$E\,\Delta(\mu_N, \bar\mu_N) \le E\,\Delta(\mu_N, \tilde\mu_N) \,. \qquad (7.70)$$

Taking expectation in (7.71) concludes the proof, since μ̃_N is an independent
copy of μ_N.

Let us observe the inequality

$$\Delta(\mu_N, \tilde\mu_N) \le 2\Delta(\mu_N, \bar\mu_N) + 2\Delta(\bar\mu_N, \tilde\mu_N) \,.$$

By (7.70) and Lemma 7.5.8 this proves (7.73). To prove (7.74) we simply use
(7.72) to write that
$$\limsup_{N\to\infty} E\,\Delta(\nu_N, \tilde\nu_N) \le \frac{L\beta^3}{\alpha^6} \limsup_{N\to\infty} E\,\Delta(\mu_N, \tilde\mu_N) \le \frac{L\beta^3}{\alpha^6} \cdot \frac{16}{\beta^2} \limsup_{N\to\infty} E\,\Delta(\nu_N, \tilde\nu_N) \,,$$

so that if 16Lβ/α⁶ < 1 then

$$\limsup_{N\to\infty} E\,\Delta(\nu_N, \tilde\nu_N) = \limsup_{N\to\infty} E\,\Delta(\mu_N, \tilde\mu_N) = 0 \,.$$
Lemma 7.5.11. Consider numbers u(ℓ), u'(ℓ) ≥ 0 for ℓ ≤ M and assume
that Σ_{ℓ≤M} u(ℓ) = Σ_{ℓ≤M} u'(ℓ) ≥ Nα/2. Then we have

$$E\bigg( \frac{1}{\sum_{\ell\le M} a(\ell)\, u(\ell)} - \frac{1}{\sum_{\ell\le M} a(\ell)\, u'(\ell)} \bigg)^2 \le \frac{L\beta^3}{\alpha^6 N} \sum_{\ell\le M} \big(u(\ell) - u'(\ell)\big)^2 \,. \qquad (7.77)$$

Consequently we have

$$\Delta\bigg( \mathcal{L}\bigg( \frac{1}{\sum_{\ell\le M} a(\ell)\, u(\ell)} \bigg),\; \mathcal{L}\bigg( \frac{1}{\sum_{\ell\le M} a(\ell)\, u'(\ell)} \bigg) \bigg) \le \frac{L\beta^3}{\alpha^6 N} \sum_{\ell\le M} \big(u(\ell) - u'(\ell)\big)^2 \,. \qquad (7.78)$$
Proof. We write

$$\bigg( \frac{1}{\sum_{\ell\le M} a(\ell)\, u(\ell)} - \frac{1}{\sum_{\ell\le M} a(\ell)\, u'(\ell)} \bigg)^2 \le \frac{\big( \sum_{\ell\le M} (u(\ell) - u'(\ell))\, a(\ell) \big)^2}{\big( \sum_{\ell\le M} u(\ell)\, a(\ell) \big)^2 \big( \sum_{\ell\le M} u'(\ell)\, a(\ell) \big)^2} \,,$$

and we use (7.41) with y(ℓ) = u(ℓ) − u'(ℓ), so that Σ_{ℓ≤M} y(ℓ) = 0.
Consider the random measure ν_N on R₊ given by

$$\nu_N = \mathcal{L}_a\bigg( \frac{1}{\sum_{\ell\le M} a(\ell)\, u_{N,M}(\ell)} \bigg) \,,$$

and let (ũ_{N,M}(ℓ))_{ℓ≤M} be an independent copy of the family (u_{N,M}(ℓ))_{ℓ≤M}. By Lemma 7.5.2 we can find a
permutation σ with

$$\frac{1}{M} \sum_{\ell\le M} \big( u_{N,M}(\ell) - \tilde u_{N,M}(\sigma(\ell)) \big)^2 = \Delta(\mu_N, \tilde\mu_N) \,.$$
so that

$$E\,\Delta(\nu_{N,b}, \nu_N) \le \frac{K(\alpha)}{b^2} \,, \qquad (7.80)$$

and using such a uniformity, rather than (7.75) it suffices to prove for each b
the corresponding result when in the left-hand side "everything is truncated
at level b". More specifically, defining ν_{N,b} by

$$\int f\, d\nu_{N,b} = E\, f\bigg( \min\bigg( b,\; \frac{1}{\sum_{\ell\le M} a(\ell)\, u_{N,M}(\ell)} \bigg) \bigg) \,,$$
7.6 Operators

The definition of the operators A and B given in (7.9) and (7.10) is pretty,
but it does not reflect the property we need. The fundamental property of
the operator A is that if the measure M^{−1} Σ_{ℓ≤M} δ_{u(ℓ)} approaches the measure μ, the law of (Σ_{ℓ≤M} a_N(ℓ)u(ℓ))^{−1} approaches A(μ), where a_N(ℓ) =
exp(−NβX_ℓ), M/N → 1 + α, and where of course the r.v.s (X_ℓ)_{ℓ≥1} are
i.i.d. uniform over [0, 1]. Since the description of A given in (7.9) will not be
needed, its (non-trivial) equivalence with the definition we will give below in
Proposition 7.6.2 will be left to the reader.
In order to prove the existence of the operator A, we must prove that if
two measures

$$\frac{1}{M} \sum_{\ell\le M} \delta_{u(\ell)} \quad \text{and} \quad \frac{1}{M'} \sum_{\ell\le M'} \delta_{u'(\ell)}$$

$$\le K(\alpha)\bigg( \frac{1}{N} + \frac{1}{N'} + \bigg| \frac{M}{N} - \frac{M'}{N'} \bigg| \bigg) + \frac{L\beta^3}{\alpha^6}\, \Delta(\eta, \eta') + \frac{L\beta^2}{\alpha^4} \bigg( \int x\, d\eta(x) - \int x\, d\eta'(x) \bigg)^2 \,. \qquad (7.81)$$
Proposition 7.6.2. Given a number α > 0 there exists a number β(α) > 0
with the following property. If β ≤ β(α) and if μ is a probability measure on
[0, 1] with ∫x dμ(x) ≥ α/4, there exists a unique probability measure A(μ) on
R₊ with the following property. Consider numbers 0 ≤ u(ℓ) ≤ 1 for ℓ ≤ M,
and set

$$\eta = \frac{1}{M} \sum_{\ell\le M} \delta_{u(\ell)} \,.$$

Then

$$\Delta\bigg( A(\mu),\; \mathcal{L}\bigg( \frac{1}{\sum_{\ell\le M} a_N(\ell)\, u(\ell)} \bigg) \bigg) \le K(\alpha)\bigg( \frac{1}{N} + \bigg| \frac{M}{N} - (1+\alpha) \bigg| \bigg) + \frac{L\beta^2}{\alpha^4} \bigg( \int x\, d\mu(x) - \int x\, d\eta(x) \bigg)^2 + \frac{L\beta^3}{\alpha^6}\, \Delta(\mu, \eta) \,. \qquad (7.82)$$
Moreover, if μ' is another probability measure and if ∫x dμ'(x) ≥ α/4, we
have

$$\Delta\big( A(\mu), A(\mu') \big) \le \frac{L\beta^2}{\alpha^4} \bigg( \int x\, d\mu(x) - \int x\, d\mu'(x) \bigg)^2 + \frac{L\beta^3}{\alpha^6}\, \Delta(\mu, \mu') \,. \qquad (7.83)$$
Proof. The proof uses a truncation argument similar to the one given at
the end of the proof of Proposition 7.5.10. Given a number b > 0 and a
probability measure θ in D(C) we define the truncation θ_b as the image of
θ under the map x → min(x, b). In words, all the mass that θ gives to the
half-line [b, ∞[ is pushed to the point b. Then we have

$$\Delta(\theta, \theta_b) \le \int_0^\infty \big( x - \min(x, b) \big)^2\, d\theta(x) \le \int_b^\infty x^2\, d\theta(x) \le \frac{C}{b^2} \,. \qquad (7.84)$$
Consider now a sequence (θ_n)_{n≥1} in D(C). We want to prove that it has a
subsequence that converges for the distance Δ. Since for each b the set of
probability measures on the interval [0, b] is compact for the distance Δ (as is
explained in Section A.11), we assume, by taking a subsequence if necessary,
that for each integer m the sequence ((θ_n)_m)_{n≥1} converges for Δ to a certain
probability measure λ^m. Next we show that there exists a probability measure
λ in D(C) such that λ_m = λ^m for each m. This is simply because if m < m'
then (λ^{m'})_m = λ^m (the "pieces fit together") and because ∫₀^∞ x⁴ dλ^m(x) ≤ C
for each m. Now, for each m we have lim_{n→∞} Δ((θ_n)_m, λ^m) = 0, and (7.84) and
the triangle inequality imply that lim_{n→∞} Δ(θ_n, λ) = 0.
Proof of Proposition 7.6.2. The basic idea is to define A(μ) "as the limit"
of the law λ of (Σ_{ℓ≤M} a_N(ℓ)u(ℓ))^{−1} as M^{−1} Σ_{ℓ≤M} δ_{u(ℓ)} → μ, M, N →
∞, M/N → (1 + α). We note that by (7.50) used for k = 8, whenever
Σ_{ℓ≤M} u(ℓ) ≥ αN/8 (and β ≤ β(α)) we have ∫x⁴ dλ(x) ≤ L. Thus, recalling
the notation of Lemma 7.6.3, we have λ ∈ D(L), a compact set, and therefore
the family of these measures has a cluster point A(μ), and (7.82) holds by
continuity. Moreover (7.83) is a consequence of (7.82) and continuity (and
shows that the cluster point A(μ) is in fact unique).
We recall the probability measures μ_N, ν_N, ν̄_N, μ̄_N of Section 7.5.
Proposition 7.6.4. We have

$$\lim_{N\to\infty} \Delta\big( \bar\nu_N, A(\bar\mu_N) \big) = 0 \,. \qquad (7.85)$$
Then

$$\Delta\bigg( B(\nu),\; \mathcal{L}\bigg( \frac{1}{1 + \sum_{k\le N} a_N(k)\, w(k)} \bigg) \bigg) \le \frac{K}{N} + \frac{L}{\beta^2}\, \Delta(\nu, \eta) \,. \qquad (7.87)$$

Moreover

$$\Delta\big( B(\nu), B(\nu') \big) \le \frac{L}{\beta^2}\, \Delta(\nu, \nu') \,. \qquad (7.88)$$

Proof. Similar, but simpler than the proof of Proposition 7.6.2.
$$\Delta(\nu, \nu') \le \frac{L\beta^3}{\alpha^6}\, \Delta(\mu', \mu)$$

and by (7.88) we have

$$\Delta(\mu, \mu') \le \frac{L}{\beta^2}\, \Delta(\nu, \nu')$$

so that

$$\Delta(\mu, \mu') \le \frac{L\beta}{\alpha^6}\, \Delta(\mu, \mu')$$

and Δ(μ, μ') = 0 if Lβ/α⁶ < 1. Let us stress the miracle here. The condition
(7.26) forces the relation ∫x dμ(x) = α/(1 + α), and this neutralizes the first
Proof. Without loss of generality we assume that N ≤ N'. Let S =
Σ_{ℓ≤M} a_N(ℓ)u(ℓ) and S' = Σ_{ℓ≤M} a_{N'}(ℓ)u'(ℓ). Then

$$\Delta\bigg( \mathcal{L}\bigg( \frac{1}{S} \bigg),\; \mathcal{L}\bigg( \frac{1}{S'} \bigg) \bigg) \le E\bigg( \frac{1}{S} - \frac{1}{S'} \bigg)^2 = E\, \frac{(S - S')^2}{S^2 S'^2} \le \mathrm{I} + \mathrm{II} \qquad (7.91)$$

where

$$\mathrm{I} = 2E\, \frac{\big( \sum_{\ell\le M} (a_N(\ell) - a_{N'}(\ell))\, u(\ell) \big)^2}{S^2 S'^2} \; ; \quad \mathrm{II} = 2E\, \frac{\big( \sum_{\ell\le M} a_{N'}(\ell)\big( u(\ell) - u'(\ell) \big) \big)^2}{S^2 S'^2} \,.$$
We observe since N ≤ N' that a_N(ℓ) ≥ a_{N'}(ℓ), so that

$$S \ge \tilde S := \sum_{\ell\le M} a_{N'}(\ell)\, u(\ell) \,,$$

and

$$\mathrm{II} \le 2E\, \frac{\big( \sum_{\ell\le M} a_{N'}(\ell)\big( u(\ell) - u'(\ell) \big) \big)^2}{\tilde S^2 S'^2} \,.$$
To bound this quantity we will use the estimate (7.41). The relations
∫x dη(x) ≥ α/4 and ∫x dη'(x) ≥ α/4 mean that Σ_{ℓ≤M} u(ℓ) ≥ αM/4 ≥
αN/4 and Σ_{ℓ≤M} u'(ℓ) ≥ αM/4 ≥ αN/4. Thus in (7.41) we can take b = α/4.
This estimate then yields

$$\mathrm{II} \le \frac{L\beta^2}{\alpha^4} \bigg( \frac{M}{N} \bigg)^2 \bigg( \int x\, d\eta(x) - \int x\, d\eta'(x) \bigg)^2 + \frac{L\beta^3}{\alpha^6} \cdot \frac{1}{N} \sum_{\ell\le M} \big( u(\ell) - u'(\ell) \big)^2 \,. \qquad (7.92)$$
We can assume from Lemma 7.5.2 that we have reordered the terms u'(ℓ) so
that M^{−1} Σ_{ℓ≤M} (u(ℓ) − u'(ℓ))² ≤ Δ(η, η'), and then the bound (7.92) is as
desired, since M ≤ 2N.

To control the term I, we first note that 0 ≤ a_N(ℓ) − a_{N'}(ℓ) ≤ 1 since
N ≤ N'; and Σ_{ℓ≤M} (a_N(ℓ) − a_{N'}(ℓ))u(ℓ) ≤ M since 0 ≤ u(ℓ) ≤ 1. Therefore

$$\mathrm{I} \le 2M\, E \sum_{\ell\le M} \frac{a_N(\ell) - a_{N'}(\ell)}{S^2 S'^2} \,.$$
Lemma 7.6.9. Consider independent r.v.s X_ℓ, X, uniform over [0, 1]. Con-
sider an integer R ≥ 1, a number γ ≥ 2 and the r.v.s

$$a = \exp(-\gamma X) \; ; \quad a' = \sum_{\ell\le R} \exp(-\gamma R X_\ell) \,.$$

Then we can find a pair of r.v.s (Y, Y') such that Y has the same law as the
r.v. a and Y' has the same law as the r.v. a', with

$$E\, |Y - Y'| \le \frac{L}{\gamma^2} \; , \quad E\, (Y - Y')^2 \le \frac{L}{\gamma^2} \,. \qquad (7.94)$$
Proof of Proposition 7.6.8. We use Lemma 7.6.9 for γ = βN, R = M.
Consider independent copies (Y_ℓ, Y'_ℓ) of the pair (Y, Y'). It should be obvious
from the definition of the sequence u'(ℓ) that S' := Σ_{ℓ≤M} Y'_ℓ u(ℓ) equals
Σ_{ℓ≤M} a_M(ℓ)u'(ℓ) in distribution. Writing S = Σ_{ℓ≤M} Y_ℓ u(ℓ), the left-hand
side of (7.93) is

$$\Delta\bigg( \mathcal{L}\bigg( \frac{1}{S} \bigg),\; \mathcal{L}\bigg( \frac{1}{S'} \bigg) \bigg) \le E\bigg( \frac{1}{S} - \frac{1}{S'} \bigg)^2 = E\, \frac{\big( \sum_{\ell\le M} (Y_\ell - Y'_\ell)\, u(\ell) \big)^2}{S^2 S'^2} \le E\, \frac{\big( \sum_{\ell\le M} |Y_\ell - Y'_\ell| \big)^2}{S^2 S'^2} \,.$$

We expand the square, and we use (7.94) for γ = βN and one more time
the method used to control (7.52) to find that this is ≤ K(α)/N.
Proof of Lemma 7.6.9. Given any two r.v.s a, a ≥ 0, there is a canonical
way to construct a coupling of them. Consider the function Y on [0, 1] given
by
Y (x) = inf{t ; P(a ≥ t) ≤ x} .
The law of Y under Lebesgue’s measure is the law of a. Indeed the definition
of Y (x) shows that
This is the area between the graphs of Y and Y', and also the area between
the graphs of the functions t → P(a ≥ t) and t → P(a' ≥ t), because these
two areas are exchanged by symmetry around the diagonal (except maybe
for their boundary). Therefore

$$E\, |Y - Y'| = \int_0^\infty \big| P(a \ge t) - P(a' \ge t) \big|\, dt \,.$$
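The identity above is easy to check numerically in a simplified setting: take a = exp(−γX) and a' = exp(−γ'X') for X, X' uniform on [0, 1] (this replaces the sum a' of the lemma by a single exponential so that the quantile function is explicit: Y(x) = e^{−γx}, Y'(x) = e^{−γ'x}). The values of γ, γ' below are arbitrary:

```python
import math

gamma, gamma2 = 3.0, 6.0   # illustrative values only

# E|Y - Y'| under the quantile coupling: integrate |e^{-γx} - e^{-γ'x}| over [0, 1].
n = 200000
coupled = sum(abs(math.exp(-gamma * (i + 0.5) / n) - math.exp(-gamma2 * (i + 0.5) / n))
              for i in range(n)) / n

# ∫_0^∞ |P(a >= t) - P(a' >= t)| dt, with P(a >= t) = min(1, log(1/t)/γ) on (0, 1].
m = 200000
survival_gap = sum(abs(min(1.0, math.log(m / (j + 0.5)) / gamma)
                       - min(1.0, math.log(m / (j + 0.5)) / gamma2))
                   for j in range(m)) / m

assert abs(coupled - survival_gap) < 1e-3
```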
The rest of the proof consists in elementary (and very tedious) estimates of
this quantity when a and a' are as in Lemma 7.6.9. For t ≤ 1 we have

$$P(a \ge t) = P\big( \exp(-\gamma X) \ge t \big) = P\bigg( X \le \frac{1}{\gamma} \log\frac{1}{t} \bigg) = \min\bigg( 1,\; \frac{1}{\gamma} \log\frac{1}{t} \bigg) \,,$$

and similarly

$$P\big( \exp(-\gamma R X_\ell) \ge t \big) = \min\bigg( 1,\; \frac{1}{\gamma R} \log\frac{1}{t} \bigg) \,.$$
Using (7.96) for t ≤ 1 and (7.97) for t > 1 we obtain, using (7.95) in the
second inequality,

$$\int_0^\infty \big| P(a \ge t) - P(a' \ge t) \big|\, dt \le 2\int_0^1 \big( P(a \ge t) - \psi(t) \big)\, dt + \bigg| \int_0^\infty P(a \ge t)\, dt - \int_0^\infty P(a' \ge t)\, dt \bigg| \le \frac{L}{\gamma^2} + |E a - E a'| \,.$$

Finally we use that by (7.39) we have |Ea − Ea'| ≤ L/γ², and this concludes
the proof that E|Y − Y'| ≤ L/γ².
We turn to the control of E(Y − Y')². First, we observe that

$$A > 0 \Rightarrow A = Y' - 2 \,,$$
so that if t > 0 we have P(A ≥ t) = P(Y' ≥ t + 2). Since Y' and a' have the
same distribution, it holds:

$$E\big( \min(Y', 2) - Y' \big)^2 = E A^2 = 2\int_0^\infty t\, P(Y' \ge t + 2)\, dt = 2\int_0^\infty t\, P(a' \ge t + 2)\, dt \,.$$
so that

$$P(a' \ge t) \le \frac{e^\lambda}{\gamma}\, \exp(-\lambda t) \,.$$

Taking λ = log γ > 0, we get

$$P(a' \ge t) \le L\, \gamma^{-t} \,.$$
$$A_{N,M} - A_{N,M-1} = E \log \frac{Z_{N,M}}{Z_{N,M-1}} = -E \log u_{N,M}(M-1)$$

$$A_{N,M} - A_{N-1,M} = E \log \frac{Z_{N,M}}{Z_{N-1,M}} = -E \log w_{N,M}(N) \,.$$

By Theorem 7.1.2, these quantities have limits −∫log x dμ(x) and −∫log x
dν(x) respectively. (To obtain the required tightness, we observe that from
(7.27), (7.28) and Markov's inequality we have P(u_{N,M}(M − 1) < t) ≤ Kt
and P(w_{N,M}(N) < t) ≤ Kt.) Setting M(R) = R(1 + α), we write

$$A_{N,M} - A_{1,1} = \mathrm{I} + \mathrm{II} \,,$$

where

$$\mathrm{I} = \sum_{2\le R\le N} \big( A_{R,M(R)} - A_{R-1,M(R)} \big) \; ; \quad \mathrm{II} = \sum_{2\le R\le N} \big( A_{R-1,M(R)} - A_{R-1,M(R-1)} \big) \,.$$
where C is a constant and the c(i, j) are as previously. The idea of the Hamil-
tonian is that the term −C card A favors the pairs (A, σ) for which card A is
large. It seems likely that, given C, results of the same nature as those we
proved can be obtained for this model when β ≤ β(C), but that it will be
difficult to prove the existence of a number β₀ such that these results hold
for β ≤ β₀, independently of the value of C, and even more difficult to prove
that (as the results of [169] seem to indicate) they will hold for any value of
C and of β.
A. Appendix: Elements of Probability Theory
This appendix lists some well-known and some less well-known facts about
probability theory. The author does not have the energy to give a reference in
the printed literature for the well known facts, for the simple reason that he
has not opened a single textbook over the last three decades. However all the
statements that come without proof should be in standard textbooks, two of
which are [10] and [161]. Of course the less well-known facts are proved in
detail.
The appendix is not designed to be read from the first line. Rather one
should refer to each section as the need arises. If you do not follow this advice,
you might run into difficulties, such as meeting the notation L before having
learned that this always stands for a universal constant (= a number).
For the purpose of differentiation inside an integral sign, or, equivalently, inside
an expectation, the following result will suffice. It follows from Lebesgue's
dominated convergence theorem. If that is too fancy, much more basic ver-
sions of the same principle suffice, and can be found in Wikipedia.
Proposition A.2.1. Consider a random function ψ(t) defined on an interval
J of R, and assume that E|ψ(t)| < ∞ for each t ∈ J. Assume that the
function ψ(t) is always continuously differentiable, and that for each compact
subinterval I of J one has
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 435
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3, © Springer-Verlag Berlin Heidelberg 2011
Proof. We prove (A.1) with ψ(t) = F(u(t)). We write first that for a
compact subinterval I of ]0, 1[ we have

$$\sup_{t\in I}\, \bigg| u_i'(t)\, \frac{\partial F}{\partial x_i}(u(t)) \bigg| \le \sup_{t\in I} |u_i'(t)|\; \sup_{t\in I} \bigg| \frac{\partial F}{\partial x_i}(u(t)) \bigg| \,,$$

and

$$E\, \bigg( \sup_{t\in I} \bigg| \frac{\partial F}{\partial x_i}(u(t)) \bigg| \bigg)^2 < \infty \,. \qquad (A.5)$$
We prove only the second inequality, since the first one is rather immediate.
Using that ∂F/∂x_i is of moderate growth (as in (A.18)), given any a > 0 we
first see that there is a constant A such that

$$\bigg| \frac{\partial F}{\partial x_i}(x) \bigg| \le A \exp\big( a \|x\|^2 \big) \,,$$

and since

$$\| u(t) \| \le \sqrt t\, \| u \| + \sqrt{1-t}\, \| v \| \le \sqrt 2\, \max(\| u \|, \| v \|)$$

we obtain

$$\sup_{t\in I} \bigg| \frac{\partial F}{\partial x_i}(u(t)) \bigg| \le A \max\big( \exp(2a\|u\|^2),\, \exp(2a\|v\|^2) \big) \le A \exp\bigg( 2a \sum_{i\le M} (u_i^2 + v_i^2) \bigg) \,,$$

so that (A.5) follows from Hölder's inequality and the integrability properties
of Gaussian r.v.s, namely the fact that if g is a Gaussian r.v. then E exp ag² <
∞ for aEg² < 1/2 as follows from (A.11) below.
Changing X into −X and t into −t, we get the following equally useful fact:
This is actually proved in (3.137) page 229, a way to show that this book is
worth what you paid for. There is of course a more precise understanding of
the tails of g than (A.9) and (A.10); but (A.9) and (A.10) will mostly suffice
here. Another fundamental formula is that when Eg 2 = τ 2 then for 2aτ 2 < 1
and any b we have
$$E \exp(ag^2 + bg) = \frac{1}{\sqrt{1 - 2a\tau^2}} \exp\bigg( \frac{\tau^2 b^2}{2(1 - 2a\tau^2)} \bigg) \,. \qquad (A.11)$$
Indeed,

$$E \exp(ag^2 + bg) = \frac{1}{\sqrt{2\pi}\,\tau} \int_{-\infty}^\infty \exp\bigg( at^2 - \frac{t^2}{2\tau^2} + bt \bigg)\, dt \,,$$

and the result follows by the change of variable

$$t = \frac{b\tau^2}{1 - 2a\tau^2} + \frac{u\,\tau}{\sqrt{1 - 2a\tau^2}} \,.$$
1 − 2aτ 2 1 − 2aτ 2
provided
lim F (t) exp(−t2 /2τ 2 ) = 0 . (A.15)
|t|→∞
This formula is used over and over in this work. As a first application, if
Eg 2 = τ 2 and 2aτ 2 < 1 we have
so that

$$(1 - 2a\tau^2)\, E\, g^2 \exp(ag^2) = \tau^2\, E \exp(ag^2) = \frac{\tau^2}{\sqrt{1 - 2a\tau^2}}$$

by (A.11), and E g² exp(ag²) = τ²(1 − 2aτ²)^{−3/2}. As another application, if k ≥ 2
This is probably the single most important formula in this work. For a proof,
consider the r.v.s

$$z_\ell' = z_\ell - g\, \frac{E(z_\ell\, g)}{E\, g^2} \,.$$

They satisfy E z'_ℓ g = 0; thus g is independent of the family (z'_1, …, z'_n). We
then apply (A.14) at (z'_ℓ)_{ℓ≤n} given. Since z_ℓ = z'_ℓ + g E(g z_ℓ)/E g², (A.17) follows
whenever the following is satisfied to make the use of (A.14) legitimate (and
to allow the interchange of the expectation in z and in the family (z'_1, …, z'_n)):
for each number a > 0, we have
for each number a > 0, we have
so that

$$P(X \ge t) \le e^{-\lambda t}\, E \exp\bigg( \lambda \sum_{i\le N} X_i \bigg) = \exp\bigg( -\lambda t + \sum_{i\le N} \log E \exp(\lambda X_i) \bigg) \,. \qquad (A.19)$$

If (η_i)_{i≤N} are independent Bernoulli r.v.s, i.e. P(η_i = ±1) = 1/2, then
E exp(λ Σ_{i≤N} a_iη_i) = Π_{i≤N} ch λa_i, and thus

$$P\bigg( \sum_{i\le N} a_i \eta_i \ge t \bigg) \le \exp\bigg( -\lambda t + \sum_{i\le N} \log \mathrm{ch}\, \lambda a_i \bigg) \,. \qquad (A.20)$$
This inequality is often called the subgaussian inequality. By symmetry,
P(Σ_{i≤N} a_iη_i ≤ −t) is bounded by the same expression, so that

$$P\bigg( \bigg| \sum_{i\le N} a_i \eta_i \bigg| \ge t \bigg) \le 2 \exp\bigg( -\frac{t^2}{2\sum_{i\le N} a_i^2} \bigg) \,. \qquad (A.22)$$
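A quick Monte Carlo sanity check of (A.22) — the values of N, t and the number of trials are arbitrary illustrative choices:

```python
import math, random

random.seed(0)
N, t, trials = 100, 2.0, 10000
a = [1.0 / math.sqrt(N)] * N          # so that sum of a_i**2 equals 1

hits = 0
for _ in range(trials):
    s = sum(ai * (1 if random.random() < 0.5 else -1) for ai in a)
    if abs(s) >= t:
        hits += 1

empirical = hits / trials             # empirical tail probability
bound = 2 * math.exp(-t * t / 2)      # (A.22) with sum a_i**2 = 1
assert empirical <= bound
```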
This is seen by taking a_i = 1/N, by observing that for the uniform measure
on Σ_N² the sequence η_i = σ_i¹σ_i² is an independent Bernoulli sequence and that
R_{1,2} = Σ_{i≤N} a_iη_i. Related to (A.21) is the fact that

$$E \exp\bigg( \frac{1}{2} \bigg( \sum_{i\le N} a_i \eta_i \bigg)^2 \bigg) \le \frac{1}{\sqrt{1 - \sum_{i\le N} a_i^2}} \,. \qquad (A.24)$$
Equivalently,

$$2^{-N} \sum \exp\bigg( \frac{1}{2} \bigg( \sum_{i\le N} a_i \sigma_i \bigg)^2 \bigg) \le \frac{1}{\sqrt{1 - \sum_{i\le N} a_i^2}} \,,$$
where the summation is over all sequences (σi )i≤N with σi = ±1. To prove
(A.24) we consider a standard Gaussian r.v. g independent of the r.v.s ηi
and, using (A.6), we have, denoting by Eg expectation in g only, and using
again that log ch t ≤ t2 /2,
$$E \exp\bigg( \frac{1}{2} \bigg( \sum_{i\le N} a_i \eta_i \bigg)^2 \bigg) = E\, E_g \exp\bigg( g \sum_{i\le N} a_i \eta_i \bigg) = E_g \exp\bigg( \sum_{i\le N} \log \mathrm{ch}\, g a_i \bigg) \le E_g \exp\bigg( \frac{g^2}{2} \sum_{i\le N} a_i^2 \bigg) = \frac{1}{\sqrt{1 - \sum_{i\le N} a_i^2}} \,.$$
It follows from (A.24) that if S = Σ_{i≤N} a_i², then, setting b_i = a_i/√(2S), we have
Σ_{i≤N} b_i² = 1/2 and

$$E \exp\bigg( \frac{1}{4S} \bigg( \sum_{i\le N} a_i \eta_i \bigg)^2 \bigg) = E \exp\bigg( \frac{1}{2} \bigg( \sum_{i\le N} b_i \eta_i \bigg)^2 \bigg) \le \frac{1}{\sqrt{1/2}} \le 2 \,.$$
$$\frac{e^\lambda - e^{-\lambda}}{e^\lambda + e^{-\lambda}} = \frac{e^{2\lambda} - 1}{e^{2\lambda} + 1} = t \,,$$
where

$$I(t) = \frac{1}{2} \big( (1+t)\log(1+t) + (1-t)\log(1-t) \big) \,. \qquad (A.27)$$

The function I(t) is probably better understood by noting that

$$I(0) = I'(0) = 0 \; , \quad I''(t) = \frac{1}{1 - t^2} \,. \qquad (A.28)$$
It follows from (A.26) that

$$P\bigg( \sum_{i\le N} \eta_i \ge N t \bigg) \le \exp\big( -N I(t) \big) \,,$$
It will often occur that for a r.v. X, we know an upper bound for the probabilities
P(X ≥ t), and that we want to deduce an upper bound for EF(X)
for a certain function F. For example, if Y is a r.v., Y ≥ 0, then
$$E\, Y = \int_0^\infty P(Y \ge t)\, dt \,, \qquad (A.32)$$
For a typical application of (A.33) let us assume that X satisfies the following
tail condition:
$$\forall\, t \ge 0 \,, \quad P(|X| \ge t) \le 2 \exp\bigg( -\frac{t^2}{2A^2} \bigg) \,, \qquad (A.34)$$
where A is a certain number. Then, using (A.33) for F (x) = xk and |X|
instead of X we get
$$E\, |X|^k \le 2k \int_0^\infty t^{k-1} \exp\bigg( -\frac{t^2}{2A^2} \bigg)\, dt \,.$$

In particular,

$$E\, X^{2k} \le 2^{k+1}\, k!\, A^{2k} \,.$$
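For instance, a standard Gaussian X satisfies (A.34) with A = 1, and its even moments EX^{2k} = (2k − 1)!! indeed sit below the bound 2^{k+1} k!; a one-line check:

```python
import math

# (2k-1)!! = (2k)! / (2**k * k!) are the even moments of a standard Gaussian,
# which satisfies (A.34) with A = 1; compare with the bound 2**(k+1) * k!.
for k in range(1, 10):
    double_fact = math.factorial(2 * k) // (2**k * math.factorial(k))
    assert double_fact <= 2**(k + 1) * math.factorial(k)
```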
Suppose, conversely, that for a given r.v. X we know that for a certain number
B and any k ≥ 1 we have EX^{2k} ≤ B^{2k} k^k (i.e. an inequality of the type (A.35)
for even moments). Then, using the power expansion exp x² = Σ_{k≥0} x^{2k}/k!,
for any number C we have

$$E \exp\frac{X^2}{C^2} = \sum_{k\ge 0} \frac{E\, X^{2k}}{C^{2k}\, k!} \le \sum_{k\ge 0} \frac{B^{2k}\, k^k}{C^{2k}\, k!} \,.$$

Now, by Stirling's formula, there is a constant L₀ such that k^k ≤ L₀^k k!, and
therefore there is a number L (e.g. L = 2L₀) such that

$$E \exp\frac{X^2}{L B^2} \le 2 \,.$$
Many r.v.s considered in this work satisfy the condition (A.34). The previous
considerations explain why, when convenient, we control these r.v.s through
their moments.
If F is a continuously differentiable non-decreasing function on R, F ≥ 0,
F(−∞) = 0, we have

$$F(X) = \int_{-\infty}^X F'(t)\, dt = \int_{\{t\le X\}} F'(t)\, dt \,.$$

Taking expectation, and using again Fubini's theorem to exchange the inte-
gral in t and the expectation, we get now that

$$E\, F(X) = \int_{-\infty}^\infty F'(t)\, P(X \ge t)\, dt \,. \qquad (A.36)$$
This is seen by using (A.36) for the conditional probability that X ≥ a, and
for the r.v. min(X, b) instead of X.
$$P\bigg( \sum_{i\le N} X_i \ge t \bigg) \le \exp\bigg( -\frac{t^2}{2N\, E X^2} \bigg( 1 - \frac{4A^3 t}{N\, (E X^2)^2} \bigg) \bigg) \,. \qquad (A.40)$$
We have

$$E \exp \lambda X = 1 + E\, \varphi(\lambda X) \qquad (A.42)$$

where φ(x) = e^x − x − 1. We observe that Eφ(|X|/A) ≤ E exp(|X|/A) − 1 =
2 − 1 = 1. Now power series expansion yields that φ(x) ≤ φ(|x|) and that for
x > 0, the function λ → φ(λx)/λ² increases. Thus, for λ ≤ 1/A, we have

$$E\, \varphi(\lambda X) \le \lambda^2 A^2\, E\, \varphi(|X|/A) \le \lambda^2 A^2 \,.$$
$$E \exp \lambda X = 1 + \frac{\lambda^2\, E X^2}{2} + E\, \varphi_1(\lambda X)$$

where φ₁(x) = e^x − x²/2 − x − 1. We observe that Eφ₁(|X|/A) ≤ Eφ(|X|/A) ≤
1. Using again power series expansion yields φ₁(x) ≤ φ₁(|x|) and that for
x > 0 the function λ → φ₁(λx)/λ³ increases. Thus, if λ ≤ 1/A, we get

$$E\, \varphi_1(\lambda X) \le \lambda^3 A^3\, E\, \varphi_1(|X|/A) \le \lambda^3 A^3$$
$$\frac{4A^3 t}{N\, (E X^2)^2} \ge \frac{4A^2}{E X^2} \ge 1$$
|X| ≤ A . (A.43)
$$E\, \varphi(\lambda X) = \sum_{p\ge 2} \frac{\lambda^p}{p!}\, E\, X^p \le \lambda^2\, E\, X^2 \sum_{p\ge 2} \frac{(\lambda A)^{p-2}}{p!} \le \lambda^2\, E\, X^2 \,.$$
$$\forall\, i \le N \,, \quad E_{i-1} \exp\frac{|X_i|}{A} \le 2 \,. \qquad (A.45)$$

Exactly as before, this implies that for |λ|A ≤ 1 we have E_{i−1} exp λX_i ≤
exp λ²A². Thus

$$E_{k-1} \exp\bigg( \lambda \sum_{i\le k} X_i \bigg) = \exp\bigg( \lambda \sum_{i\le k-1} X_i \bigg)\, E_{k-1} \exp(\lambda X_k) \le \exp\bigg( \lambda \sum_{i\le k-1} X_i + \lambda^2 A^2 \bigg) \,.$$

Using this repeatedly and taking expectation yields E exp(λ Σ_{i≤N} X_i) ≤
exp Nλ²A². Use of Chebyshev inequality as before gives

$$P\bigg( \sum_{i\le N} X_i \ge t \bigg) \le \exp\bigg( -\min\bigg( \frac{t^2}{4NA^2},\; \frac{t}{2A} \bigg) \bigg) \,. \qquad (A.46)$$
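A Monte Carlo sanity check of (A.46), with independent X_i uniform on [−1, 1] (for which E exp|X_i| = e − 1 ≤ 2, so A = 1 works; N, t and the number of trials are arbitrary illustrative values):

```python
import math, random

random.seed(1)
N, t, trials = 50, 20.0, 10000
A = 1.0                                # E exp|X_i| = e - 1 <= 2 for X_i uniform on [-1, 1]

hits = sum(1 for _ in range(trials)
           if sum(random.uniform(-1.0, 1.0) for _ in range(N)) >= t)

bound = math.exp(-min(t * t / (4 * N * A * A), t / (2 * A)))
assert hits / trials <= bound
```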
A.8 ε-Nets
In this section we get some control of the norm of certain random matrices.
Much more detailed (and difficult) results are known.
and hence

$$\forall\, (x_i)_{i\le N} \,, \forall\, (y_i)_{i\le N} \in B \,, \quad \bigg| \sum_{i<j} g_{ij}\, x_i y_j \bigg| \le 32 \sqrt N \,,$$
$$\le N\, \bigg| \sum_{k\in I} x_k y_k \bigg| + N L a \bigg( \sum_{k\in I} x_k^2 \bigg)^{1/2} \bigg( \sum_{k\in I} y_k^2 \bigg)^{1/2} \,. \qquad (A.54)$$
$$\le N\, \bigg| \sum_{k\le M} x_k y_k \bigg| + N L a \bigg( \sum_{k\le M} x_k^2 \bigg)^{1/2} \bigg( \sum_{k\le M} y_k^2 \bigg)^{1/2} \,, \qquad (A.55)$$

and

$$\sum_{i\le N} \bigg( \sum_{k\le M} x_k \eta_{i,k} \bigg)^2 \le N (1 + La) \sum_{k\le M} x_k^2 \,. \qquad (A.56)$$

$$\bigg| \sum_{i\le N}\; \sum_{k\ne k',\; k, k'\in I} x_k y_{k'}\, \eta_{i,k}\eta_{i,k'} \bigg| \le L N a \bigg( \sum_{k\in I} x_k^2 \bigg)^{1/2} \bigg( \sum_{k\in I} y_k^2 \bigg)^{1/2} \,. \qquad (A.57)$$
whenever (x_k)_{k∈I} ∈ A and (y_k)_{k∈I} ∈ A. Now, given any such sequences,
(A.52) implies

$$P\bigg( \bigg| \sum_{i\le N}\; \sum_{k\ne k',\; k, k'\in I} x_k y_{k'}\, \eta_{i,k}\eta_{i,k'} \bigg| \ge N u \bigg) \le \exp\bigg( -\frac{N}{L} \min(u^2, u) \bigg) \,, \qquad (A.59)$$

so that taking u = L'a where L' is large enough, all the events (A.58) simulta-
neously occur with a probability at least 1 − exp(−Na²).
Our next result resembles Proposition A.9.3, but rather than restricting
the range of k we now restrict the range of i.
$$\sum_{i\in J} \bigg( \sum_{k\le M} x_k \eta_{i,k} \bigg)^2 \le N_0 \sum_{k\le M} x_k^2 + L \max\big( N a^2,\, \sqrt{N N_0}\, a \big) \sum_{k\le M} x_k^2 \,. \qquad (A.60)$$
Proof. The proof is very similar to the proof of Proposition A.9.3. It suffices
to prove that for all choices of (x_k) and (y_k) we have

$$\bigg| \sum_{i\in J} \sum_{k\ne k'} x_k y_{k'}\, \eta_{i,k}\eta_{i,k'} \bigg| \le L \max\big( N a^2,\, \sqrt{N N_0}\, a \big) \bigg( \sum_{k\le M} x_k^2 \bigg)^{1/2} \bigg( \sum_{k\le M} y_k^2 \bigg)^{1/2} \,. \qquad (A.61)$$

Consider a subset A of R^M, with card A ≤ 5^M, A ⊂ 2B, B ⊂ conv A, where
B is the Euclidean ball Σ_{k≤M} x_k² ≤ 1. To ensure (A.61) it suffices that

$$\bigg| \sum_{i\in J} \sum_{k\ne k'} x_k y_{k'}\, \eta_{i,k}\eta_{i,k'} \bigg| \le L \max\big( N a^2,\, \sqrt{N N_0}\, a \big)$$

whenever card J ≤ N₀, (x_k)_{k≤M}, (y_k)_{k≤M} ∈ A. It follows from (A.52) that
for v > 0,

$$P\bigg( \bigg| \sum_{i\in J} \sum_{k\ne k'} x_k y_{k'}\, \eta_{i,k}\eta_{i,k'} \bigg| \ge v\, \mathrm{card}\, J \bigg) \le \exp\bigg( -\frac{\mathrm{card}\, J}{L} \min(v^2, v) \bigg) \,,$$
The number of possible choices for J and the sequences (x_k)_{k≤M}, (y_k)_{k≤M}
is at most

$$\sum_{n\le N_0} \binom{N}{n} (\mathrm{card}\, A)^2 \le \bigg( \frac{eN}{N_0} \bigg)^{N_0} 25^M \le \exp(5N a^2) \,,$$
$$P\bigg( N \sum_{1\le k<k'\le M} R_{k,k'}^2 \ge (1 - 2\varepsilon)^{-2}\, u \bigg) \le \bigg( 1 + \frac{1}{\varepsilon} \bigg)^{M^2} \exp\bigg( -\frac{u}{2} \bigg( 1 - L\sqrt{\frac{u}{N}} \bigg) \bigg)$$

where R_{k,k'} = N^{−1} Σ_{i≤N} η_{i,k}η_{i,k'}.
Proof. We start the proof by observing that

$$\bigg( \sum_{k<k'} R_{k,k'}^2 \bigg)^{1/2} = \sup \sum_{k<k'} \alpha_{k,k'}\, R_{k,k'} \,,$$

where the supremum is over the sequences (α_{k,k'}) with Σ_{k<k'} α_{k,k'}² ≤ 1, and
that

$$\sup_A \sum_{k<k'} \alpha_{k,k'}\, R_{k,k'} \ge (1 - 2\varepsilon) \bigg( \sum_{k<k'} R_{k,k'}^2 \bigg)^{1/2} \,.$$
Thus

$$P\bigg( N \sum_{k<k'} R_{k,k'}^2 \ge (1 - 2\varepsilon)^{-2}\, u \bigg) = P\bigg( \bigg( \sum_{k<k'} R_{k,k'}^2 \bigg)^{1/2} \ge (1 - 2\varepsilon)^{-1} \sqrt{\frac{u}{N}} \bigg) \le P\bigg( \sup_A \sum_{k<k'} \alpha_{k,k'}\, R_{k,k'} \ge \sqrt{\frac{u}{N}} \bigg) \le \bigg( 1 + \frac{1}{\varepsilon} \bigg)^{M^2} \exp\bigg( -\frac{u}{2} \bigg( 1 - L\sqrt{\frac{u}{N}} \bigg) \bigg) \,,$$

where we use (A.53) for t = √(uN) in the last line.
Corollary A.9.7. We have

$$2^{-Nn}\, \mathrm{card}\, \bigg\{ (\sigma^1, \ldots, \sigma^n)\; ;\; N \sum_{1\le \ell<\ell'\le n} R_{\ell,\ell'}^2 \ge (1 - 2\varepsilon)^{-2}\, u \bigg\} \le \bigg( 1 + \frac{1}{\varepsilon} \bigg)^{n^2} \exp\bigg( -\frac{u}{2} \bigg( 1 - L\sqrt{\frac{u}{N}} \bigg) \bigg)$$

where R_{ℓ,ℓ'} = N^{−1} Σ_{i≤N} σ_i^ℓ σ_i^{ℓ'}.

Proof. This is another way to formulate Lemma A.9.6 when M = n.
and

$$P(X \le a/t) \le \exp\big( -a(t \log t - t - 1) \big) \,.$$

In particular we have

$$P\big( |X - a| \ge a/2 \big) \le \exp\bigg( -\frac{a}{L} \bigg) \,. \qquad (A.65)$$
L
assume that these are independent. We can define a Poisson point process of
intensity measure μ as Π = ∪k≥0 Πk . Then for each a, Π ∩ [a, ∞) is a Poisson
point process, the intensity measure of which is the restriction of μ to [a, ∞).
where the supremum is taken over all functions f from X to R with Lipschitz
constant 1, i.e. that satisfy |f (x) − f (y)| ≤ d(x, y) for all x, y in X . The
classical formula (A.67) is a simple consequence of the Hahn-Banach theorem.
We will not use it in any essential way, so we refer the reader to Lemma A.11.1
below for the complete proof of a similar result.
Another proof that d is a distance uses the classical notion of disintegra-
tion of measures (or, equivalently of conditional probability), and we sketch
it now. Consider a probability measure θ on X 2 with marginals μ1 and μ2
respectively. Then there exists a (Borel measurable) family of probability
measures θx on X such that for any continuous function h on X 2 we have
$$\int h\, d\theta = \int \bigg( \int h(x, y)\, d\theta_x(y) \bigg)\, d\mu_1(x) \,. \qquad (A.68)$$
Consider then the probability measure θ'' on X² such that for any continuous
function h we have

$$\int h\, d\theta'' = \int \bigg( \int\!\!\int h(y, z)\, d\theta_x(y)\, d\theta'_x(z) \bigg)\, d\mu_1(x) \,. \qquad (A.70)$$

Using (A.70) in the case where h(y, z) = f(y) in the first line and (A.68) in
the third line we get that

$$\int f(y)\, d\theta''(y, z) = \int \bigg( \int\!\!\int f(y)\, d\theta_x(y)\, d\theta'_x(z) \bigg)\, d\mu_1(x) = \int \bigg( \int f(y)\, d\theta_x(y) \bigg)\, d\mu_1(x) = \int f\, d\mu_2 \,,$$

using in the last equality that μ₂ is the second marginal of θ. This proves
that the first marginal of θ'' is μ₂, and similarly, its second marginal is μ₃.
Using the triangle inequality
d(y, z) ≤ d(y, x) + d(x, z) ,
and using (A.70), (A.69) and (A.68) we obtain
and in this manner we can easily complete the proof that d is a distance on
M1 (X ).
The topology defined by the distance d is the weak topology on M₁(X).
To see this we observe first that the weak topology is also the weakest topol-
ogy that makes all the maps μ → ∫f(x)dμ(x) continuous, where f is a Lipschitz function
on X with Lipschitz constant ≤ 1. This is simply because the linear span of
the classes of such functions is dense in C(X) for the uniform norm. Therefore
the weak topology is weaker than the topology defined by d. To see that it is
also stronger we note that in (A.67) we can also take the supremum on the
class of Lipschitz functions that take the value 0 at a given point of X. This
class is compact for the supremum norm. Therefore given ε > 0 there is a
finite class F of Lipschitz functions on X such that

$$d(\mu_1, \mu_2) \le \varepsilon + \sup_{f\in F} \bigg| \int f(x)\, d\mu_1(x) - \int f(x)\, d\mu_2(x) \bigg| \,. \qquad (A.71)$$
where the infimum is taken over all pairs of r.v.s (X, Y) with laws μ and
ν respectively. The quantity Δ^{1/2}(μ, ν) is a distance, called Wasserstein's
distance between μ and ν. This is not obvious from the definition, but can be
proved following the scheme we outlined in the case of the Monge-Kantorovich
transportation-cost distance (A.66). It also follows from the duality formula
given in Lemma A.11.1 below. Of course Wasserstein's distance is a close
cousin of the transportation-cost distance; we simply replace the "linear"
measure of the "cost of transportation" by a "quadratic measure" of this
cost.
Denoting by D the diameter of X, i.e.

$$D = \sup\{ d(x, y)\; ;\; x, y \in \mathcal{X} \} \,,$$

so that

$$d(\mu, \nu)^2 \le \Delta(\mu, \nu) \le D\, d(\nu, \mu) \,.$$

Consequently the topology induced by Wasserstein distance on M₁(X) also
coincides with the weak topology. Let us note in particular from (A.71) that,
given a number ε > 0 there exists a finite set F of continuous functions on
X such that

$$\Delta(\mu_1, \mu_2) \le \varepsilon + \sup_{f\in F} \bigg| \int f(x)\, d\mu_1(x) - \int f(x)\, d\mu_2(x) \bigg| \,. \qquad (A.73)$$
then for each pair (X, Y) of r.v.s valued in X we have Ef(X) + Eg(Y) ≤
E d(X, Y)², so that if X has law μ and Y has law ν we have ∫f dμ + ∫g dν ≤
E d(X, Y)². Taking the infimum over all choices of X and Y we see that
∫f dμ + ∫g dν ≤ Δ(μ, ν). Therefore if a denotes the right-hand side of (A.74),
we have proved that a ≤ Δ(μ, ν), and we turn to the proof of the converse.
We consider the subset S of the set C(X × X) of continuous functions on
X × X that consists of the functions w(x, y) such that there exist continuous
functions f and g on X for which

$$\int f\, d\mu + \int g\, d\nu = a \qquad (A.75)$$

and

$$\forall\, x, y \in \mathcal{X} \,, \quad w(x, y) > f(x) + g(y) - d(x, y)^2 \,. \qquad (A.76)$$

It follows from the definition of a that for each function w in S there exist
x and y with w(x, y) > 0. Since S is convex and open, the Hahn-Banach
separation theorem asserts that we can find a linear functional Φ on C(X × X)
such that Φ(w) > 0 for each w in S. If w ∈ S and w' ≥ 0 it follows from the
definition of S that w + λw' ∈ S, so that Φ(w + λw') > 0. Thus Φ(w') ≥ 0,
i.e. Φ is positive; it is a positive measure on X × X. Since it is a matter of
normalization, we can assume that it is a probability, which we denote by θ.
If f and g are as in (A.75), then for each ε > 0 we see by (A.76) that the
function w(x, y) = f(x) + g(y) − d(x, y)² + ε belongs to S and thus
The total variation distance induces the weak topology on M₁(X) only when
X is finite. When this is the case, we have

$$\|\mu - \nu\| = \sum_{x\in\mathcal{X}} |\mu(\{x\}) - \nu(\{x\})| \,. \qquad (A.79)$$
The reader should observe here how tricky we have been: we resisted the
temptation to use Gaussian integration by parts.
To study ϕ, we note first that ϕ(0) = 0 since ϕ is odd, so that, since
ϕ is increasing, ϕ(y) > 0 for y > 0 and ϕ(y) < 0 for y < 0. The function
ψ(y) = yϕ'(y) − ϕ(y) satisfies ψ(0) = 0 and ψ'(y) = yϕ''(y). Thus ψ'(y) < 0
for y ≠ 0 and thus ψ(y) < 0 for y > 0 and ψ(y) > 0 for y < 0. Therefore
ϕ(y)(yϕ'(y) − ϕ(y)) = ϕ(y)ψ(y) < 0 for y ≠ 0 and hence

$$E\, \varphi(Y)\big( Y \varphi'(Y) - \varphi(Y) \big) < 0 \,.$$
$$\frac{1}{\sqrt{2\pi}} \int \varphi(z\sqrt x + h)\, \varphi'(z\sqrt x + h)\, e^{-z^2/2}\, dz = \frac{1}{\sqrt{2\pi x}} \int \varphi(y)\, \varphi'(y) \exp\bigg( -\frac{y^2}{2x} + \frac{hy}{x} - \frac{h^2}{2x} \bigg)\, dy \,. \qquad (A.83)$$

$$\frac{1}{\sqrt{2\pi}} \int \varphi(z\sqrt x + h)\, \varphi'(z\sqrt x + h)\, e^{-z^2/2}\, dz = -\frac{1}{\sqrt{2\pi x}} \int \varphi(y)\, \varphi'(y) \exp\bigg( -\frac{y^2}{2x} - \frac{hy}{x} - \frac{h^2}{2x} \bigg)\, dy \,. \qquad (A.84)$$
This proves the case N = 1. It is then easy to deduce the general case by
induction on N as follows. Suppose N > 1 and assume that the functional
Since

$$\int_{\mathbb{R}^N} W(x)\, dx = \int_{\mathbb{R}^{N-1}} \bigg( \int_{\mathbb{R}} W_q(x)\, dx \bigg)\, dq = \int_{\mathbb{R}^{N-1}} W^*(q)\, dq \,,$$

and similarly for U* and V*, this is the desired result. Theorem 3.1.4 is
established.
References
and with an application to the diffusion equation, J. Funct. Anal. 22, pp.
366–389.
38. Canisius J., van Enter A.C.D., van Hemmen J.L. (1983) On a classical spin
glass model. Z. Phys. B 50, no. 4, pp. 311–336.
39. Carmona P., Hu Y. (2006) Universality in Sherrington-Kirkpatrick’s spin glass
model. Ann. Inst. H. Poincaré Probab. Statist. 42, no. 2, pp. 215–222.
40. Carvalho S., Tindel S. (2007) On the multiple overlap function of the SK
model. Publicacions Matematiques 51, pp. 163–199.
41. Carvalho S., Tindel S. (2007) A central limit theorem for a localized version
of the SK model. Potential Analysis 26, pp. 323–343.
42. Catoni O. (1996) The Legendre transform of two replicas of the Sherrington-
Kirkpatrick spin glass model. A free energy inequality. Probab. Theory Relat.
Fields 105, no. 3, pp. 369–392.
43. Cavagna A., Giardina I., Parisi G. (1997) Structure of metastable states in
spin glasses by means of a three replica potential. J. Phys. A: Math. and Gen.
30, no. 13, pp. 4449–4466.
44. Chatterjee S. (2007) Estimation in spin glasses: a first step. Ann. Statist. 35,
no. 5, pp. 1931–1946.
45. Chatterjee S. (2010) Spin glasses and Stein’s method. Probab. Theory Relat.
Fields 148, pp. 567–600.
46. Chatterjee S., Crawford N. (2009) Central limit theorems for the energy den-
sity in the Sherrington-Kirkpatrick model. J. Stat. Phys. 137, no. 4 pp. 639–
666.
47. Comets F. (1996) A spherical bound for the Sherrington-Kirkpatrick model.
Hommage à P.A. Meyer et J. Neveu. Astérisque 236, pp. 103–108.
48. Comets F. (1998) The martingale method for mean-field disordered systems at
high temperature. Mathematical aspects of spin glasses and neural networks.
Progr. Probab. 41, pp. 91–113. Birkhäuser Boston, Inc., Boston, MA.
49. Comets F., Neveu J. (1995) The Sherrington-Kirkpatrick model of spin glasses
and stochastic calculus: the high temperature case. Comm. Math. Phys. 166,
no. 3, pp. 549–564.
50. Crawford N. (2008) The interaction between multioverlaps in the high tem-
perature phase of the Sherrington-Kirkpatrick spin glass. J. Math. Phys. 49,
125201 (24 pages).
51. Dembo A., Montanari A. (2010) Gibbs measures and phase transitions on
sparse random graphs. Braz. J. Probab. Stat. 24, no. 2, pp. 137–211.
52. Derrida B., Gardner E. (1988) Optimal storage properties of neural network
models. J. Phys. A 21, pp. 271–284.
53. Derrida B., Gardner E. (1989) Three unfinished works on the optimal storage
capacity of networks. Special issue in memory of Elizabeth Gardner (1957–
1988). J. Phys. A 22, no. 12, pp. 1983–1994.
54. Ellis R.S. (1985) Entropy, Large Deviations and Statistical Mechanics.
Grundlehren der Mathematischen Wissenschaften 271. Springer-Verlag, New
York. xiv+364 pp.
55. Feng J., Shcherbina M., Tirozzi B. (2001) On the critical capacity of the
Hopfield model. Comm. Math. Phys. 216, no. 1, pp. 139–177.
57. Feng J., Tirozzi B. (1995) The SLLN for the free-energy of a class of neural
networks. Helv. Phys. Acta 68, no. 4, pp. 365–379.
58. Feng J., Tirozzi B. (1997) Capacity of the Hopfield model. J. Phys. A 30,
no. 10, pp. 3383–3391.
59. Fischer K.H., Hertz J. (1991) Spin Glasses. Cambridge Studies in Magnetism
1. Cambridge University Press, Cambridge. x+408 pp.
60. Franz S., Leone M. (2003) Replica bounds for optimization problems and
diluted spin systems. J. Statist. Phys. 111, no. 3-4, pp. 535–564.
61. Fröhlich J., Zegarlinski B. (1987) Some comments on the Sherrington-
Kirkpatrick model of spin glasses. Comm. Math. Phys. 112, pp. 553–566.
62. Gamarnik, D. (2004) Linear phase transition in random linear constraint sat-
isfaction problems. Probab. Theory Relat. Fields 129, pp. 410–440.
63. Gardner E. (1988) The space of interactions in neural network models. J.
Phys. A 21, pp. 257–270.
64. Gentz B. (1996) An almost sure central limit theorem for the overlap param-
eters in the Hopfield model. Stochastic Process. Appl. 62, no. 2, pp. 243–262.
65. Gentz B. (1996) A central limit theorem for the overlap in the Hopfield model.
Ann. Probab. 24, no. 4, pp. 1809–1841.
66. Gentz B. (1998) On the central limit theorem for the overlap in the Hop-
field model. Mathematical aspects of spin glasses and neural networks Progr.
Probab. 41, pp. 115–149, Birkhäuser Boston, Boston, MA.
67. Gentz B., Löwe M. (1999) The fluctuations of the overlap in the Hopfield
model with finitely many patterns at the critical temperature. Probab. Theory
Relat. Fields 115, no. 3, pp. 357–381.
68. Gentz B., Löwe M. (1999) Fluctuations in the Hopfield model at the critical
temperature. Markov Process. Related Fields 5, no. 4, pp. 423–449.
69. Guerra F. (1995) Fluctuations and thermodynamic variables in mean field
spin glass models. Stochastic Processes, Physics and Geometry, S. Albeverio
et al. editors, World Scientific, Singapore.
70. Guerra F. (1996) About the overlap distribution in mean field spin glass mod-
els. International Journal of Modern Physics B 10, pp. 1675–1684.
71. Guerra F. (2001) Sum rules for the free energy in the mean field spin glass
model. Fields Institute Communications 30, pp. 161–170.
72. Guerra F. (2005) Mathematical aspects of mean field spin glass theory. Euro-
pean Congress of Mathematics, pp. 719–732. Eur. Math. Soc., Zürich.
73. Guerra F., Toninelli F.L. (2002) Quadratic replica coupling for the
Sherrington-Kirkpatrick mean field spin glass model. J. Math. Phys. 43, no. 7,
pp. 3704–3716.
74. Guerra F., Toninelli F.L. (2002) Central limit theorem for fluctuations in the
high temperature region of the Sherrington-Kirkpatrick spin glass model. J.
Math. Phys. 43, no. 12, pp. 6224–6237.
75. Guerra F., Toninelli F.L. (2002) The Thermodynamic Limit in Mean Field
Spin Glass Models. Commun. Math. Phys. 230, pp. 71–79.
76. Guerra F., Toninelli F.L. (2003) The Infinite Volume Limit in Generalized
Mean Field Disordered Models. Markov Process. Related Fields 9, no. 2, pp.
195–207.
77. Guerra F., Toninelli F.L. (2003) Infinite volume limit and spontaneous replica
symmetry breaking in mean field spin glass models. Ann. Henri Poincaré 4,
suppl. 1, S441–S444.
78. Guerra F., Toninelli F.L. (2004) The high temperature region of the Viana-
Bray diluted spin glass model. J. Statist. Phys. 115, no. 1-2, pp. 531–555.
79. Hanen A. (2007) Un théorème limite pour les covariances des spins dans le
modèle de Sherrington-Kirkpatrick avec champ externe. Ann. Probab. 35,
no. 1, pp. 141–179.
80. Hanen A. (2008) A limit theorem for mean magnetization in the Sherrington-
Kirkpatrick model with an external field. To appear.
81. van Hemmen J.L., Palmer R.G. (1979) The replica method and a solvable spin
glass system. J. Phys. A 12, no. 4, pp. 563–580.
82. van Hemmen J.L., Palmer R.G. (1982) The thermodynamic limit and the
replica method for short-range random systems. J. Phys. A 15, no. 12, pp.
3881–3890.
83. Hertz J., Krogh A., Palmer R.G. (1991) Introduction to the theory of neural
computation. Santa Fe Institute Studies in the Sciences of Complexity. Lec-
ture Notes, I. Addison-Wesley Publishing Company, Advanced Book Program,
Redwood City. xxii+327 pp.
84. Hopfield J.J. (1982) Neural networks and physical systems with emergent
collective computational abilities. Proc. Natl. Acad. Sci. USA 79, pp. 2554–
2558.
85. Hopfield J.J. (1984) Neurons with graded response have collective computa-
tional properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA
81, pp. 3088–3092.
86. Ibragimov I.A., Sudakov V.N., Tsirelson B.S. (1976) Norms of Gaussian sam-
ple functions. Proceedings of the Third Japan-USSR Symposium on Proba-
bility Theory. Lecture Notes in Math. 550, Springer-Verlag, pp. 20–41.
87. Kahane J.-P. (1986) Une inégalité du type de Slepian et Gordon sur les pro-
cessus gaussiens. Israel J. Math. 55, no. 1, pp. 109–110.
88. Kösters H. (2006) Fluctuations of the free energy in the diluted SK-model.
Stochastic Process. Appl. 116, no. 9, pp. 1254–1268.
89. Kim J.H., Roche J.R. (1998) Covering cubes by random half cubes, with
applications to binary neural networks: rigorous combinatorial approaches.
Eighth Annual Workshop on Computational Learning Theory, Santa Cruz, 1995.
J. Comput. System Sci. 56, no. 2, pp. 223–252.
90. Krauth W., Mézard M. (1989) Storage capacity of memory networks with
binary couplings. J. Phys. 50, pp. 3057–3066.
91. Kurkova, I. (2005) Fluctuations of the free energy and overlaps in the high-
temperature p-spin SK and Hopfield models. Markov Process. Related Fields
11, no. 1, pp. 55–80.
92. Latala R. (2002) Exponential inequalities for the SK model of spin glasses,
extending Guerra’s method. Manuscript.
93. Ledoux M. (2001) The concentration of measure phenomenon. Mathematical
Surveys and Monographs 89, American Mathematical Society, Providence,
RI, x+181 pp.
94. Ledoux M. (2000) On the distribution of overlaps in the Sherrington-
Kirkpatrick spin glass model, J. Statist. Phys. 100, no. 5-6, pp. 871–892.
95. Ledoux M., Talagrand M. (1991) Probability in Banach Spaces. Springer-
Verlag, Berlin.
96. Linusson S., Wästlund J. (2004) A proof of Parisi’s conjecture on the random
assignment problem. Probab. Theory Relat. Fields 128, no. 3, pp. 419–440.
97. Loukianova D. (1997) Lower bounds on the restitution error of the Hopfield
model. Probab. Theory Relat. Fields 107, pp. 161–176.
98. Löwe M. (1998) On the storage capacity of Hopfield models with correlated
patterns. Ann. Appl. Probab. 8, no. 4, pp. 1216–1250.
99. Löwe M. (1999) The storage capacity of generalized Hopfield models with
semantically correlated patterns. Markov Process. Related Fields 5, no. 1, pp.
1–19.
100. Márquez-Carreras D., Rovira C., Tindel S. (2006) Asymptotic behavior of
the magnetization for the perceptron model. Ann. Inst. H. Poincaré Probab.
Statist. 42, no. 3, pp. 327–342.
101. Mézard M. (1988) The space of interactions in neural networks: Gardner’s
computation with the cavity method. J. Phys. A 22, pp. 2181–2190.
168. Wästlund J. (2009) An easy proof of the ζ(2) limit in the random assignment
problem. Electron. Commun. Probab. 14, pp. 261–269.
169. Wästlund J. (2009) Replica-symmetry and combinatorial optimization.
Manuscript.
Index
Dw², 250
G, 244
H0, 22
K, 31
L, 31
O(k), 41
R−ℓ,ℓ′, 55
R1,1, 192
R1,2, 3
Rℓ,ℓ′, 35
Tℓ,ℓ′, 87
U1,1, 275
ZN, 5
Av, 54
E, 21
Eξ, 51, 73, 156
Er, 341
SN, VII
ΣN, 3
σ, 191
β+, 62
ch, 9
⟨·⟩, 6
⟨·⟩t, 20, 31
⟨·⟩t,∼, 161
log, VII
νt, 156
ν(f), 30
ν(f)^{1/2}, 131
νt, 156, 207
νx, 341
νt,v, 162, 216
G, 245
ρN,M, 207
sh, 9
th, 9
θ, 162
b, 202
ε, 55
∇, 203
θ, 259
ρ, 53
q̂, 83
r̂, 163
b(0), b(1), b(2), 83
b∗, 242
m∗, 242
mk(σ), 240
n → 0, 145
pN, 6
qN,M, 207
A(x), 228
I(t), 237
N(x), VII, 225
R, 81
1A, 49
RS0(α), 225
Aizenman, 147
Aldous, 397
align the spins, 240
allergic, 235
analytic continuation, 146
AT line, 80, 93
atoms, 1, 4
Bernoulli r.v., 13, 152, 240
Birkhoff’s theorem, 415
Boltzmann, IX
Boltzmann factor, IX
Bovier, 246, 254, 255, 275, 296
Brascamp-Lieb, 235, 236, 296
Brunn-Minkowski, 193
cavity, 53
central limit theorem, 41, 186
claims, 296
Comets, 147
concentration of measure, 16, 127, 273
configuration, IX
conflict, 240
contraction argument, 59
Glossary
A(x)  The function A(x) = −(d/dx) log N(x) = (1/√(2π)) e^{−x²/2}/N(x), 228
I(t)  The function
    I(t) = (1/2)((1 + t) log(1 + t) + (1 − t) log(1 − t)) ,
which satisfies I(0) = I′(0) = 0 and I″(t) = 1/(1 − t²), see (A.29), 237
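Both closed forms in the entries for A(x) and I(t) are easy to confirm numerically. The sketch below (not part of the glossary) checks A(x) against a finite-difference derivative of log N(x), and the stated properties of I(t) against finite differences; the test points x = 1.3 and t = 0.4 are arbitrary choices.

```python
import math

# Check of the two glossary formulas by finite differences.
def N(x):                      # N(x) = P(g >= x), g standard Gaussian
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def A(x):                      # closed form A(x) = e^{-x^2/2} / (sqrt(2*pi) N(x))
    return math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * N(x))

def I(t):
    return 0.5 * ((1 + t) * math.log(1 + t) + (1 - t) * math.log(1 - t))

# A(x) = -(d/dx) log N(x): compare with a central difference at x = 1.3.
x, eps = 1.3, 1e-6
fd = -(math.log(N(x + eps)) - math.log(N(x - eps))) / (2 * eps)
assert abs(fd - A(x)) < 1e-5

# I(0) = I'(0) = 0 and I''(t) = 1/(1 - t^2), tested here at t = 0.4.
t, eps2 = 0.4, 1e-5
d1 = (I(eps2) - I(-eps2)) / (2 * eps2)
d2 = (I(t + eps2) - 2 * I(t) + I(t - eps2)) / eps2 ** 2
assert I(0.0) == 0.0
assert abs(d1) < 1e-9 and abs(d2 - 1 / (1 - t * t)) < 1e-4
```
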
N(x)  The probability that a standard Gaussian r.v. g
is ≥ x, VII
R Denotes a quantity which is a remainder, of
“smaller order”, such as in (1.217), 81
1A The indicator function of the set A, 49
Av Typically denotes the average over one or a few
spins that take values ±1, 53
approximate integration by parts  A central technique to handle situations
where the randomness is generated by Bernoulli r.v.s rather than by Gaussian
r.v.s. It relies on the identity (4.198), 289
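The identity (4.198) is not reproduced in this chunk, but the spirit of the technique can be illustrated: for S = (ε₁ + … + ε_N)/√N with i.i.d. Bernoulli (±1) variables, E S f(S) is close to E f′(S), the exact Gaussian integration-by-parts identity, with an error that vanishes as N grows. The sketch below (an illustration only, with f = tanh as an assumed test function) checks this by exact enumeration of the binomial distribution.

```python
import math

# Approximate integration by parts in spirit: E[S f(S)] vs E[f'(S)] for
# S = (eps_1 + ... + eps_N)/sqrt(N), eps_i = +-1 with probability 1/2 each.
# (The book's precise identity (4.198) is not reproduced here.)
def expectations(N, f, df):
    lhs = rhs = 0.0
    for k in range(N + 1):                    # k minus-signs => S = (N - 2k)/sqrt(N)
        p = math.comb(N, k) / 2.0 ** N
        s = (N - 2 * k) / math.sqrt(N)
        lhs += p * s * f(s)
        rhs += p * df(s)
    return lhs, rhs

f = math.tanh
df = lambda t: 1.0 / math.cosh(t) ** 2        # f'

l20, r20 = expectations(20, f, df)
l500, r500 = expectations(500, f, df)
assert abs(l500 - r500) < abs(l20 - r20)      # the discrepancy shrinks with N
```
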
AT line  For the SK model, the line of equation
β²E ch⁻⁴(βz√q + h) = 1, where q is the solu-
tion of (1.74), 80
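As an illustration (not from the text), the AT condition can be evaluated numerically. The sketch below assumes that (1.74) is the replica-symmetric fixed-point equation q = E th²(βz√q + h) for a standard Gaussian z, which is how this computation is usually set up for the SK model; the parameter values are arbitrary.

```python
import numpy as np

# Evaluate the left-hand side of the AT-line equation for given (beta, h),
# assuming (1.74) is the fixed-point equation q = E th^2(beta*z*sqrt(q) + h).
nodes, weights = np.polynomial.hermite_e.hermegauss(81)
weights = weights / np.sqrt(2.0 * np.pi)      # E f(z) ~ sum(weights * f(nodes))

def E(f):                                     # expectation over standard Gaussian z
    return float(np.sum(weights * f(nodes)))

def at_value(beta, h):
    q = 0.5
    for _ in range(200):                      # fixed-point iteration for q
        q = E(lambda z: np.tanh(beta * np.sqrt(q) * z + h) ** 2)
    return beta ** 2 * E(lambda z: np.cosh(beta * np.sqrt(q) * z + h) ** -4)

print(at_value(0.5, 0.3))   # a value well below 1: the high-temperature side
```
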
essentially supported  A random measure G is essentially supported by a
set A (depending on N and M) if the complement Aᶜ of A is negligible, 254
external field  A term h Σ_{1≤i≤N} σi occurring in the Hamilto-
nian, 4
symmetry between sites  A general principle that for many Hamiltonians,
the sites “play the same rôle”, 8