
Ergebnisse der Mathematik und ihrer Grenzgebiete
3. Folge · A Series of Modern Surveys in Mathematics
Volume 54

Editorial Board
G.-M. Greuel, Kaiserslautern M. Gromov, Bures-sur-Yvette
J. Jost, Leipzig J. Kollár, Princeton
G. Laumon, Orsay H. W. Lenstra, Jr., Leiden
S. Müller, Bonn J. Tits, Paris
D. B. Zagier, Bonn G. Ziegler, Berlin
Managing Editor R. Remmert, Münster

For other titles published in this series, go to www.springer.com/series/728
Michel Talagrand

Mean Field Models for Spin Glasses

Volume I: Basic Examples


Michel Talagrand
Université Paris 6
Institut de mathématiques
UMR 7586 CNRS
Place Jussieu 4
75252 Paris Cedex 05
France
michel@talagrand.net

This volume is the first part of a treatise on Spin Glasses in the series Ergebnisse der Mathematik und ihrer Grenzgebiete. The second part is Vol. 55 of the Ergebnisse series. The first edition of the treatise appeared as Vol. 46 of the same series (978-3-540-00356-4).

ISBN 978-3-642-15201-6 e-ISBN 978-3-642-15202-3


DOI 10.1007/978-3-642-15202-3
Springer Heidelberg Dordrecht London New York

Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys
in Mathematics ISSN 0071-1136

Mathematics Subject Classification (2010): Primary: 82D30, 82B44. Secondary: 82C32, 60G15, 60K40

© Springer-Verlag Berlin Heidelberg 2011


This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting,
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from Springer. Violations
are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective
laws and regulations and therefore free for general use.

Cover design: VTEX, Vilnius

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


To Wansoo T. Rhee, for so many reasons.
Contents

Introduction

1. The Sherrington-Kirkpatrick Model
   1.1 Introduction
   1.2 Notations and Simple Facts
   1.3 Gaussian Interpolation and the Smart Path Method
   1.4 Latala's Argument
   1.5 A Kind of Central Limit Theorem
   1.6 The Cavity Method
   1.7 Gibbs' Measure; the TAP Equations
   1.8 Second Moment Computations and the Almeida-Thouless Line
   1.9 Beyond the AT Line
   1.10 Central Limit Theorem for the Overlaps
   1.11 Non Gaussian Behavior: Hanen's Theorem
   1.12 The SK Model with d-component Spins
   1.13 The Physicist's Replica Method
   1.14 Notes and Comments

2. The Perceptron Model
   2.1 Introduction
   2.2 The Smart Path
   2.3 Cavity in M
   2.4 The Replica Symmetric Solution
   2.5 Exponential Inequalities
   2.6 Notes and Comments

3. The Shcherbina and Tirozzi Model
   3.1 The Power of Convexity
   3.2 The Replica-Symmetric Equations
   3.3 Controlling the Solutions of the RS Equations
   3.4 Notes and Comments

4. The Hopfield Model
   4.1 Introduction: The Curie-Weiss Model
   4.2 Local Convexity and the Hubbard-Stratonovitch Transform
   4.3 The Bovier-Gayrard Localization Theorem
   4.4 Selecting a State with an External Field
   4.5 Controlling the Overlaps
   4.6 Approximate Integration by Parts and the Replica-Symmetric Equations
   4.7 Notes and Comments

5. The V-statistics Model
   5.1 Introduction
   5.2 The Smart Path
   5.3 Cavity in M
   5.4 The New Equation
   5.5 The Replica-Symmetric Solution

6. The Diluted SK Model and the K-Sat Problem
   6.1 Introduction
   6.2 Pure State
   6.3 The Functional Order Parameter
   6.4 The Replica-Symmetric Solution
   6.5 The Franz-Leone Bound
   6.6 Continuous Spins
   6.7 The Power of Convexity
   6.8 Notes and Comments

7. An Assignment Problem
   7.1 Introduction
   7.2 Overview of the Proof
   7.3 The Cavity Method
   7.4 Decoupling
   7.5 Empirical Measures
   7.6 Operators
   7.7 Notes and Comments

A. Appendix: Elements of Probability Theory
   A.1 How to Use this Appendix
   A.2 Differentiation Inside an Expectation
   A.3 Gaussian Random Variables
   A.4 Gaussian Integration by Parts
   A.5 Tail Estimates
   A.6 How to Use Tail Estimates
   A.7 Bernstein's Inequality
   A.8 ε-Nets
   A.9 Random Matrices
   A.10 Poisson Random Variables and Point Processes
   A.11 Distances Between Probability Measures
   A.12 The Paley-Zygmund Inequality
   A.13 Differential Inequalities
   A.14 The Latala-Guerra Lemma
   A.15 Proof of Theorem 3.1.4

References

Index

Glossary
Introduction


Let us denote by S_N the sphere of ℝ^N of center 0 and radius √N, and by μ_N the uniform measure on S_N. For i, k ≥ 1, consider independent standard Gaussian random variables (r.v.s) g_{i,k} and the subset U_k of ℝ^N given by

$$U_k = \Big\{ (x_1,\ldots,x_N) \in \mathbb{R}^N \;;\; \sum_{i\le N} g_{i,k}\, x_i \ge 0 \Big\} .$$

The direction of the vector (gi,k )i≤N is random (with uniform distribution
over all possible directions) so that Uk is simply a half-space through the
origin of random direction. (It might not be obvious now why we use Gaussian
r.v.s to define a space of random direction, but this will become gradually
clear.) Consider the set S_N ∩ ⋂_{k≤M} U_k, the intersection of S_N with many such half-spaces. Denoting by E mathematical expectation, it should be obvious that

$$\mathsf{E}\, \mu_N \Big( S_N \cap \bigcap_{k\le M} U_k \Big) = 2^{-M} , \tag{0.1}$$

because every point of S_N has a probability 2^{-M} to belong to all the sets U_k, k ≤ M. This however is not really interesting. The fascinating fact is that when N is large and M/N ≃ α, if α > 2 the set S_N ∩ ⋂_{k≤M} U_k is typically empty (a classical result), while if α < 2, with probability very close to 1, we have

$$\frac{1}{N} \log \mu_N \Big( S_N \cap \bigcap_{k\le M} U_k \Big) \simeq \mathrm{RS}(\alpha) . \tag{0.2}$$

Here,

$$\mathrm{RS}(\alpha) = \min_{0<q<1} \bigg( \alpha\, \mathsf{E} \log \mathrm{N}\Big( \frac{z\sqrt{q}}{\sqrt{1-q}} \Big) + \frac{q}{2(1-q)} + \frac{1}{2} \log(1-q) \bigg) ,$$

where N(x) denotes the probability that a standard Gaussian r.v. g is ≥ x,


and where log x denotes (as everywhere through the book) the natural logarithm of x. Of course you should rush to require medical attention if this formula seems transparent to you. We simply give it now to demonstrate that we deal with a situation whose depth cannot be guessed beforehand.
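Nothing in this book requires a computer, but (0.1) is easy to test numerically. Here is a minimal Monte Carlo sketch of it; we assume Python with the numpy library, and the values of N, M and the sample count are arbitrary choices made only for illustration.

```python
import numpy as np

# Monte Carlo sanity check of (0.1).  For a fixed point x of S_N, each
# half-space U_k contains x with probability 1/2, independently over k,
# so the expected measure of the intersection is 2^{-M}.
rng = np.random.default_rng(0)
N, M, samples = 20, 5, 50_000

hits = 0
for _ in range(samples):
    x = rng.standard_normal(N)
    x *= np.sqrt(N) / np.linalg.norm(x)  # uniform point on the sphere of radius sqrt(N)
    g = rng.standard_normal((M, N))      # the Gaussian directions g_{i,k}
    hits += np.all(g @ x >= 0)
print(hits / samples, 2.0 ** (-M))       # both close to 2^{-5} = 0.03125
```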
The wonderful fact (0.2) was not discovered by a mathematician, but by a physicist, E. Gardner. More generally theoretical physicists have discovered wonderful new areas of mathematics, which they have explored by their methods.
methods. This book is an attempt to correct this anomaly by exploring these
areas using mathematical methods, and an attempt to bring these marvelous
questions to the attention of the mathematical community. This is a book of
mathematics. No knowledge of physics or statistical mechanics whatsoever is
required or probably even useful to read it. If you read enough of this volume
and the next, then in Volume II you will be able to understand (0.2).
More specifically, this is a book of probability theory (mostly). Attempting first a description at a “philosophical” level, a fundamental problem is
as follows. Consider a large finite collection (Xk )k≤M of random variables.
What can we say about the largest of them? More generally, what can we say
about the “few largest” of them? When the variables Xk are probabilistically
independent, everything is rather easy. This is no longer the case when the
variables are correlated. Even when the variables are identically distributed,
the answer depends very much on their correlation structure. What are the
correlation structures of interest? Most of the familiar correlation structures
in Probability are low-dimensional, or even “one-dimensional”. This is because they model random phenomena indexed by time, or, equivalently, by the real line, a one-dimensional object. In contrast with these familiar situations, the correlation structures considered here will be “high-dimensional”
– in a sense that will soon become clear – and will create new and truly
remarkable phenomena. This is a direction of probability theory that has not
yet received the attention it deserves.
A natural idea to study the few largest elements of a given realization
of the family (Xk )k≤M is to assign weights to these elements, giving large
weights to the large elements. Ideas from statistical mechanics suggest that,
considering a parameter β, weights proportional to exp βXk are particularly
appropriate. That is, the (random) weight of the k-th element is

$$\frac{\exp \beta X_k}{\sum_{i\le M} \exp \beta X_i} . \tag{0.3}$$

These weights define a random probability measure on the index set.


Under an appropriate normalization, one can expect that this probability measure will be essentially supported by the indices k for which X_k is approximately a certain value x(β). This is because the number of variables X_k close to a given large value x should decrease as x increases, while the corresponding weights increase, so that an optimal compromise should be reached at a certain level. The number x(β) will increase with β. Thus we have a kind of “scanner” that enables us to look at the values of the family (X_k)_{k≤M} close to the (large) number x(β), and this scanner is tuned with the parameter β.
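This “scanner” is easy to watch at work numerically. The following sketch (again Python with numpy assumed; the Gaussian choice of the X_k and the values of β are arbitrary) shows the weighted average of the X_k under the weights (0.3) moving toward the largest value as β increases.

```python
import numpy as np

# The weights (0.3) for a family (X_k) of i.i.d. Gaussian r.v.s.
rng = np.random.default_rng(1)
M = 10_000
X = rng.standard_normal(M)

for beta in (0.5, 2.0, 8.0):
    w = np.exp(beta * (X - X.max()))   # subtracting the max avoids overflow
    w /= w.sum()                       # the random probability measure (0.3)
    print(beta, w @ X, X.max())        # the weighted mean x(beta) increases with beta
```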
We must stress an essential point. We are interested in what happens for
a typical realization of the family (X_k). This can be very different (and much harder to understand) than what happens on average over all realizations. To understand the difference between typical and average, consider the situation of the Spinland State Lottery. It sells 10^{23} tickets at a unit price of one spin each. One ticket wins the single prize of 10^{23} spins. The average gain of a ticket is 1 spin, but the typical gain is zero. The average value is very different from the typical value because there is a large contribution coming from a set of very small probability. This is exactly the difference between (0.1) and (0.2). If M/N ≃ α < 2, on average, μ_N(S_N ∩ ⋂_{k≤M} U_k) = 2^{-αN}, but typically N^{-1} log μ_N(S_N ∩ ⋂_{k≤M} U_k) ≃ RS(α).
In an apparently unrelated direction, let us consider a physical system that
can be in a (finite) number of possible configurations. In each configuration,
the system has a given energy. It is maintained by the outside world at a given
temperature, and is subject to thermal fluctuations. If we observe the system
after it has been left undisturbed for a long time, what is the probability to
observe it in a given configuration?
The system we will mostly consider is ΣN = {−1, 1}N , where N is a
(large) integer. A configuration σ = {σ1 , . . . , σN } is an element of ΣN . It
tells us the values of the N “spins” σi , each of which can take the values ±1.
When in the configuration σ, the system has an energy HN (σ). Thus HN
is simply a real-valued function on ΣN . It is called the Hamiltonian of the
system. We consider a parameter β (that physically represents the inverse of
the temperature). We weigh each configuration proportionally to its so-called
Boltzmann factor exp(−β HN (σ)). This defines Gibbs’ measure, a probability
measure on ΣN given by
$$G_N(\{\sigma\}) = \frac{\exp(-\beta H_N(\sigma))}{Z_N} \tag{0.4}$$

where the normalizing factor Z_N is given by

$$Z_N = Z_N(\beta) = \sum_{\sigma} \exp(-\beta H_N(\sigma)) . \tag{0.5}$$

The summation is of course over σ in Σ_N. The factor Z_N is called the partition function. Statistical mechanics asserts that Gibbs' measure represents the probability of observing a configuration σ after the system has reached equilibrium with an infinite heat bath at temperature 1/β. (Thus the expression “high temperature” will mean “β small” while the expression “low
temperature” will mean “β large”.) Of course the reader might wonder why
(0.4) is the “correct definition”. This is explained in physics books such as
[102], [126], [125], and is not of real concern to us. That this definition is very
fruitful will soon become self-evident.
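For concreteness, here is a direct implementation of (0.4) and (0.5) by exhaustive enumeration on a small system, assuming Python with numpy; the Hamiltonian below is an arbitrary random function on Σ_N, chosen only to make the example run.

```python
import itertools
import numpy as np

# Gibbs' measure (0.4) and the partition function (0.5) for a toy system.
rng = np.random.default_rng(2)
N, beta = 10, 1.5
configs = np.array(list(itertools.product([-1, 1], repeat=N)))  # all of Sigma_N
H = rng.standard_normal(len(configs))   # an arbitrary energy H_N(sigma) per configuration

weights = np.exp(-beta * H)             # Boltzmann factors
Z = weights.sum()                       # partition function (0.5)
G = weights / Z                         # Gibbs measure (0.4)
print(G.sum())                          # equals 1: G is a probability measure
print(H[np.argmax(G)], H.min())         # the most likely configuration has the lowest energy
```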
The reason for the minus sign in Boltzmann's factor exp(−βH_N(σ)) is that the system favors low (and not high) energy configurations. It should be stressed that the (considerable…) depth of the innocuous-looking definition (0.4) stems from the normalizing factor Z_N. This factor, the partition function, is the sum of many terms of widely different orders of magnitude, and it
is unclear how to estimate it. The (few) large values become more important
as β increases, and predominate over the more numerous smaller values. Thus
the problem of understanding Gibbs’ measure gets typically harder for large
β (low temperature) than for small β (high temperature).
At this stage, the reader has already learned all the statistical
mechanics (s)he needs to know to read this work.
The energy levels HN (σ) are closely related to the “interactions” between
the spins. When we try to model a situation of “disordered interactions” these
energy levels will become random variables, or, equivalently, the Hamiltonian,
and hence Gibbs' measure, will become random. There are two levels of randomness (a probabilist's paradise). The “disorder”, that is, the randomness
of the Hamiltonian HN , is given with our sample system. It does not evolve
with the thermal fluctuations. It is frozen, or “quenched” as the physicists
say. The word “glass” of the expression “spin glasses” conveys (among many
others) this idea of frozen disorder.
Probably the reader has met with skepticism the statement that no further
knowledge of statistical mechanics is required to read this book. She might
think that this could be formally true, but that nonetheless it would be
very helpful for her intuition to understand some of the classical models
of statistical mechanics. This is not the case. When one studies systems at
“high temperature” the fundamental picture is that of the model with random Hamiltonian H_N(σ) = −Σ_{i≤N} h_i σ_i, where the h_i are i.i.d. Gaussian random variables (that are not necessarily centered). This particular model
is completely trivial because there is no interaction between the sites, so it
reduces to a collection of N models consisting each of one single spin, and
each acting on their own. (All the work is of course to show that this is in
some sense the way things happen in more complicated models.) When one
studies systems at “low temperature”, matters are more complicated, but
this is a completely new subject, and simply nothing of what had rigorously
been proved before is of much help.
In modeling disordered interactions between the spins, the problem is to
understand Gibbs’ measure for a typical realization of the disorder. As we
explained, this is closely related to the problem of understanding the large
values among a typical realization of the family (−HN (σ)). This family is
correlated. One reason for the choice of the index set ΣN is that it is suitable
to create extremely interesting correlation structures with simple formulas.
At the beginning of the already long story of spin glasses are “real” spin
glasses, alloys with strange magnetic properties, which are of considerable
interest, both experimentally and theoretically. It is believed that their re-
markable properties arise from a kind of disorder among the interactions of
magnetic impurities. To explain (at least qualitatively) the behavior of real
spin glasses, theoretical physicists have invented a number of models. They
fall into two broad categories: the “realistic” models, where the interacting
atoms are located at the vertices of a lattice, and where the strength of
the interaction between two atoms decreases when their distance increases;
and the “mean-field” models, where the geometric location of the atoms in
space is forgotten, and where each atom interacts with all the others. The
mean-field models are of special interest to mathematicians because they are
very basic mathematical objects and yet create extremely intricate structures. (As for the “realistic” models, they appear to be intractable at the
moment.) Moreover, some physicists believe that these structures occur in
a wide range of situations. The breadth, and the ambition, of these physicists' work can in particular be admired in the book “Spin Glass Theory and
Beyond” by M. Mézard, G. Parisi, M.A. Virasoro, and in the book “Field
Theory, Disorder and Simulation” by G. Parisi. The methods used by the
physicists are probably best described here as a combination of heuristic arguments and numerical simulation. They are probably reliable, but they have no claim to rigor, and it is often not even clear how to give a precise mathematical formulation to some of the central predictions. The recent book [102] by M. Mézard and A. Montanari is much more friendly to the mathematically minded reader. It covers a wide range of topics, and succeeds well at
conveying the depth of the physicists’ ideas.
It was rather paradoxical for a mathematician like the author to see simple, basic mathematical objects being studied by the methods of theoretical
physics. It was also very surprising, given the obvious importance of what the
physicists have done, and the breadth of the paths they have opened, that
mathematicians had not succeeded yet in proving any of their conjectures.
Despite considerable efforts in recent years, the program of giving a sound
mathematical basis to the physicists’ work is still in its infancy. We already
have tried to make the case that in essence this program represents a new
direction of probability theory. It is hence not surprising that, as of today, one
has not yet been able to find anywhere in mathematics an already developed
set of tools that would bear on these questions. Most of the methods used
in this book belong in spirit to the area loosely known as “high-dimensional
probability”, but they are developed here from first principles. In fact, for
much of the book, the most advanced tool that is not proved in complete
detail is Hölder’s inequality. The book is long because it attempts to fulfill
several goals (that will be described below) but reading the first two chapters
should be sufficient to get a very good idea of what spin glasses are about,
as far as rigorous results are concerned.
The author believes that the present area has a tremendous long-term
potential to yield incredibly beautiful results. There is of course no way of
telling when progress will be made on the really difficult questions, but to
provide an immediate incitement to seriously learn this topic, the author has
stated as research problems a number of interesting questions (the solution
of which would likely deserve to be published) that he believes are within the
reach of the already established methods, but that he purposely did not, and
will not, try to solve. (On the other hand, there is ample warning about the
potentially truly difficult problems.)
This book, together with a forthcoming second volume, forms a second
edition of our previous work [157], “Spin Glasses, a Challenge for Mathematicians”. One of the goals in writing [157] was to increase the chance of significant progress by making sure that no stone was left unturned. This strategy
greatly helped the author to obtain the solution of what was arguably at the
time the most important problem about mean-field spin glasses, the validity
of the “Parisi Formula”. This advance occurred a few weeks before [157] appeared, and therefore could not be included there. Explaining this result in
the appropriate context is a main motivation for this second edition, which
also provides an opportunity to reorganize and rewrite with considerably
more details all the material of the first edition.
The programs of conferences on spin glasses include many topics that
are not touched here. This book is not an encyclopedia, but represents the
coherent development of a line of thought. The author feels that the real
challenge is the study of spin systems, and, among those, considers only pure
mean-field models from the “statics” point of view. A popular topic is the
study of “dynamics”. In principle this topic also bears on mean-field models
for spin glasses, but in practice it is as of today entirely disjoint from what
we do here.
This work is divided into two volumes, which total a rather large number
of pages. How is a reader supposed to attack this? The beginning of an
answer is that many of the chapters are largely independent of each other,
so that in practice these two volumes contain several “sub-books” that can
be read somewhat independently of each other. For example, there is the
“perceptron book” (Chapters 2, 3, 8, 9). On the other hand, we must stress
that we progressively learn how to handle technical details. Unless the reader
is already an expert, we highly recommend that she studies most of the first
four chapters before attempting to read anything else in detail.
We now proceed to a more detailed description of the contents of the
present volume. In Chapter 1 we study in great detail the Sherrington-Kirkpatrick model (SK), the “original” spin glass, at sufficiently high temperature. This model serves as an introduction to the basic ideas and methods.
In the remainder of the present volume we introduce six more models. In
this manner we try to demonstrate that the theory of spin glasses does not
deal only with such and such very specific model, but that the basic phenomena occur again and again, as a kind of new law of nature (or at least of probability theory). We present enough material to provide a solid understanding of these models, but without including any of the really difficult results. In Chapters 2 and 3, we study the so-called “perceptron capacity model”. This model is fundamental in the theory of neural networks, but the underlying mathematical problem is the rather attractive question of computing the “proportion” of the discrete cube (resp. the unit sphere) that is
typically contained in the intersection of many random half-spaces, the question to which (0.2) answers in a special case. Despite the fact that the case
of the cube and of the sphere are formally similar, the case of the sphere is
substantially easier, because one can use there fundamental tools from convexity theory. In Chapter 4 we study the Hopfield model, using an approach invented by A. Bovier and V. Gayrard, that relies on the same tools from convexity as Chapter 3. This approach is substantially simpler than the approach first invented by the author, although it yields less complete results,
and in particular does not seem to be able to produce either the correct rates
of convergence or even to control a region of parameters of the correct shape.
Chapter 5 introduces a new model, based on V-statistics. It is connected to
the Perceptron model of Chapter 2, but with a remarkable twist. The last two
chapters present models that are much more different from the previous ones
than those are from each other. They require somewhat different methods,
but illustrate well the great generality of the underlying phenomena. Chapter 6 studies a common generalization of the diluted SK model, and of the
K-sat problem (a fundamental question of computer science). It is essentially
different from the models of the previous chapters, since it is a model with
“finite connectivity”, that is, a spin interacts in average only with a number
of spins that remains bounded as the size of the system increases (so we can
kiss goodbye to the Central Limit Theorem). Chapter 7 is motivated by the
random assignment problem. It is the least understood of all the models presented here, but must be included because of all the challenges it provides.
An appendix recalls many basic facts of probability theory.
Let us now give a preview of the contents of the forthcoming Volume
II. We shall first develop advanced results about the high-temperature behavior of some of the models that we introduce in the present volume. This work is heartfully dedicated to all the physicists who think that the expressions “high-temperature” and “advanced results” are contradictory. We shall
demonstrate the depth of the theory even in this supposedly easier situation,
and we shall present some of its most spectacular results. We shall return
to the Perceptron model, to prove the celebrated “Gardner formula” that
gives the proportion of the discrete cube (resp. the sphere, of which (0.2)
is a special case) that lies in the intersection of many random half spaces.
We shall return to the Hopfield model to present the approach through the
cavity method that yields the correct rates of convergence, as well as a region
of parameters of the correct shape. And we shall return to the SK model
to study in depth the high-temperature phase in the absence of an external
field.
In the rest of Volume II, we shall present low-temperature results. Besides the celebrated Ghirlanda-Guerra identities that hold very generally,
essentially nothing is known outside the case of the SK model and some of its
obvious generalizations, such as the p-spin interaction model for p even. For
these models we shall present the basic ideas that underlie the proof of the
validity of the Parisi formula, as well as the complete proof itself. We shall
bring attention to the apparently deep mysteries that remain, even for the
SK model, the problem of ultrametricity and the problem of chaos. A final
chapter will be devoted to the case of the p-spin interaction model, for p odd,
for which the validity of the Parisi formula will be proved in a large region
of parameters using mostly the cavity method.
At some point I must apologize for the countless typos, inaccuracies, or
downright mistakes that this book is bound to contain. I have corrected many
of each from the first edition, but doubtlessly I have missed some and created
others. This is unavoidable. I am greatly indebted to Sourav Chatterjee,
Albert Hanen and Marc Yor for laboring through this entire volume and
suggesting literally hundreds of corrections and improvements. Their input
was really invaluable, both at the technical level, and by the moral support
it provided to the author. Special thanks are also due to Tim Austin, David
Fremlin and Frédérique Watbled. Of course, all remaining mistakes are my
sole responsibility.
This book owes its very existence to Gilles Godefroy. While Director of
the Jussieu Institute of Mathematics he went out of his way to secure what
has been in practice unlimited typing support for the author. Without such
support this work would not even have been started.
While writing this book (and, more generally, while devoting a large part
of my life to mathematics) it was very helpful to hold a research position
without any other duties whatsoever. So it is only appropriate that I express
here a life-time of gratitude to three colleagues, who, at crucial junctures,
went far beyond their mere duties to give me a chance to get or to keep this
position: Jean Braconnier, Jean-Pierre Kahane, Paul-André Meyer.
It is customary for authors, at the end of an introduction, to warmly thank
their spouse for having granted them the peaceful time needed to complete
their work. I find that these thanks are far too universal and overly enthusiastic to be believable. Yet, I must say simply that I have been privileged
with a life-time of unconditional support. Be jealous, reader, for I yet have
to hear the words I dread the most: “Now is not the time to work”.
1. The Sherrington-Kirkpatrick Model

1.1 Introduction

Consider a large population of individuals (or atoms) that we label from 1


to N . Let us assume that each individual knows all the others. The feelings
of the individual i towards the individual j are measured by a number gij
that can be positive, or, unfortunately, negative. Let us assume symmetry,
gij = gji , so only the numbers (gij )i<j are relevant. We are trying to model a
situation where these feelings are random. We are not trying to make realistic
assumptions, but rather to find the simplest possible model; so let us assume
that the numbers (gij )i<j are independent random variables. (Throughout
the book, the word “independent” should always be understood in the probabilistic sense.) Since we are aiming for simplicity, let us also assume that
these random variables (r.v.s) are standard Gaussian. This is the place to
point out that Gaussian r.v.s will often be denoted by lower case letters.
A very important feature of this model (called frustration, in physics) is
that even if gij > 0 and gjk > 0 (that is, i and j are friends, and j and k
are friends), then i and k are just as likely to be enemies as they are to be
friends. The interactions (gij ) describe a very complex social situation.
Let us now think that we fix a typical realization of the numbers (gij ).
Here and elsewhere we say that an event is “typical” if (for large N ) it occurs
with probability close to 1. For example, the situation where nearly half of
the r.v.s gij are > 0 is typical, but the situation where all of them are < 0
is certainly not typical. Let us choose the goal of separating the population
in two classes, putting, as much as possible, friends together and enemies
apart. It should be obvious that at best this can be done very imperfectly:
some friends will be separated and some enemies will cohabit. To introduce
a quantitative way to measure how well we have succeeded, it is convenient
to assign to each individual i a number σi ∈ {−1, 1}, thereby defining two
classes of individuals. Possibly the simplest measure of how well these two
classes unite friends and separate enemies is the quantity

$$\sum_{i<j} g_{ij}\, \sigma_i \sigma_j . \tag{1.1}$$


Trying to make this large invites making the quantities gij σi σj positive, and
thus invites in turn taking σi and σj of the same sign when gij > 0, and of
opposite signs when gij < 0.
Despite the simplicity of the expression (1.1), the optimization problem of
finding the maximum of this quantity (for a typical realization of the gij ) over
the configuration σ = (σ1 , . . . , σN ) appears to be of extreme difficulty, and
little is rigorously known about it. Equivalently, one can look for a minimum
of the function

$$- \sum_{i<j} g_{ij}\, \sigma_i \sigma_j .$$

Finding the minimum value of a function of the configurations is called in


physics a zero-temperature problem, because at zero temperature a system
is always found in its configuration of lowest energy. To a zero-temperature
problem is often associated a version of the problem “with a temperature”,
here the problem corresponding to the Hamiltonian
$$H_N(\sigma) = - \frac{1}{\sqrt{N}} \sum_{i<j} g_{ij}\, \sigma_i \sigma_j . \tag{1.2}$$

That is, we think of the quantity (1.2) as being the energy of the configuration
σ. The purpose of the normalization factor N −1/2 will be apparent after (1.9)
below. The energy level of a given configuration depends on the (gij ), and
this randomness models the “disorder” of the situation.
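As an illustration, the following sketch (Python with numpy assumed; N and the seed are arbitrary) draws one sample of the disorder (g_ij) and evaluates the energy (1.2) of a random configuration.

```python
import numpy as np

# One disorder sample of the Hamiltonian (1.2) and its value at a configuration.
rng = np.random.default_rng(3)
N = 500
g = rng.standard_normal((N, N))
i, j = np.triu_indices(N, k=1)           # the pairs i < j

def H(sigma):
    # H_N(sigma) = -N^{-1/2} sum_{i<j} g_ij sigma_i sigma_j
    return -(g[i, j] * sigma[i] * sigma[j]).sum() / np.sqrt(N)

sigma = rng.choice([-1, 1], size=N)
print(H(sigma))   # a centered Gaussian of variance (N-1)/2, cf. (1.9) below
```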
The minus signs in the Boltzmann factor exp(−βHN (σ)) that arise from
the physical requirement to favor configurations of low energy are a nuisance
for mathematics. This nuisance is greatly decreased if we think that the object
of interest is (−HN ), i.e. that the minus sign is a part of the Hamiltonian.
We will use this strategy throughout the book. Keeping with this convention,
we write formula (1.2) as
$$-H_N(\sigma) = \frac{1}{\sqrt{N}} \sum_{i<j} g_{ij}\, \sigma_i \sigma_j . \tag{1.3}$$

One goal is to understand the system governed by the Hamiltonian (1.3)


at a given (typical) realization of the disorder (i.e. the r.v.s g_ij), or, equivalently, at a given realization of the (random) Hamiltonian H_N. To understand
better this Hamiltonian, we observe that the energies HN (σ) are centered
Gaussian r.v.s. The energies of two different configurations are however not
independent. In fact, for two configurations σ^1 and σ^2, we have
$$\mathsf{E}\big( H_N(\sigma^1) H_N(\sigma^2) \big) = \frac{1}{N} \sum_{i<j} \sigma_i^1 \sigma_j^1 \sigma_i^2 \sigma_j^2 = \frac{N}{2} \bigg( \frac{1}{N} \sum_{i\le N} \sigma_i^1 \sigma_i^2 \bigg)^2 - \frac{1}{2} . \tag{1.4}$$

(The second equality uses the identity 2 Σ_{i<j} a_i a_j = (Σ_{i≤N} a_i)^2 − Σ_{i≤N} a_i^2 with a_i = σ_i^1 σ_i^2, for which each a_i^2 = 1.)

Here we see the first occurrence of a quantity which plays an essential


part in the sequel, namely
$$R_{1,2} = R_{1,2}(\sigma^1, \sigma^2) = \frac{1}{N} \sum_{i\le N} \sigma_i^1 \sigma_i^2 . \tag{1.5}$$

This quantity is called the overlap of the configurations σ^1, σ^2, because the closer it is to 1, the closer they are to each other. It depends on σ^1 and σ^2,
even though the compact notation keeps the dependence implicit. The words
“which plays an essential part in the sequel” have of course to be understood
as “now is the time to learn and remember this definition, that will be used
again and again”. We can rewrite (1.4) as

$$\mathsf{E}\big( H_N(\sigma^1) H_N(\sigma^2) \big) = \frac{N}{2}\, R_{1,2}^2 - \frac{1}{2} . \tag{1.6}$$
Let us denote by d(σ^1, σ^2) the Hamming distance of σ^1, σ^2, that is the proportion of coordinates where σ^1, σ^2 differ,

$$d(\sigma^1, \sigma^2) = \frac{1}{N}\, \mathrm{card}\{ i \le N \;;\; \sigma_i^1 \neq \sigma_i^2 \} . \tag{1.7}$$
Then
$$R_{1,2} = 1 - 2\, d(\sigma^1, \sigma^2) , \tag{1.8}$$
and this shows that R1,2 , and hence the correlation of the family (HN (σ)),
is closely related to the structure of the metric space (ΣN , d), where ΣN =
{−1, 1}^N. This structure is very rich, and this explains why the simple expression (1.3) suffices to create a complex situation. Let us also note that
(1.4) implies
$$\mathsf{E} H_N^2(\sigma) = \frac{N-1}{2} . \tag{1.9}$$
Here is the place to point out that to lighten notation we write E H_N^2 rather than E(H_N^2), a quantity that should not be confused with (E H_N)^2. The reader should remember this when she will meet expressions such as E|X − Y|^2.
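The identities (1.5), (1.6) and (1.8) can be checked mechanically, as in the following sketch (Python with numpy assumed; N and the configurations are arbitrary). No sampling is needed: since the g_ij are independent standard Gaussian r.v.s, E H_N(σ^1)H_N(σ^2) equals N^{-1} Σ_{i<j} σ_i^1σ_j^1σ_i^2σ_j^2 exactly.

```python
import numpy as np

# Exact check of (1.5), (1.6) and (1.8) for two random configurations.
rng = np.random.default_rng(4)
N = 30
i, j = np.triu_indices(N, k=1)

s1 = rng.choice([-1, 1], size=N)
s2 = rng.choice([-1, 1], size=N)

R12 = (s1 * s2).mean()                           # the overlap (1.5)
print(R12, 1 - 2 * np.mean(s1 != s2))            # identity (1.8)

cov = (s1[i] * s1[j] * s2[i] * s2[j]).sum() / N  # E H_N(s1) H_N(s2)
print(cov, (N / 2) * R12**2 - 0.5)               # identity (1.6)
```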
We can explain to the reader having some basic knowledge of Gaussian r.v.s the reason behind the factor √N in (1.3). The 2^N Gaussian r.v.s −H_N(σ) are not too much correlated; each one is of “size about √N”. Their maximum should be of size about √(N log 2^N), i.e. about N, see Lemma A.3.1. If one keeps in mind the physical picture that H_N(σ) is the energy of the configuration σ, a configuration of an N-spin system, it makes a lot of sense that as N becomes large the “average energy per spin” H_N(σ)/N remains in a sense bounded independently of N. With the choice (1.3), some of the terms exp(−βH_N(σ)) will be (on a logarithmic scale) of the same order as the entire sum Z_N(β) = Σ_σ exp(−βH_N(σ)), a challenging situation.

In the heuristic considerations leading to (1.3), we have made the assumption that any two individuals interact. This is certainly not the case if the
individuals are atoms in a macroscopic sample of matter. Thus (1.3) is not,
by any means, a realistic model for disordered interactions among atoms.
A more realistic model would locate the atoms at the vertices of a lattice
(e.g. Z^2) and would assume that the strength of the interaction between two
atoms decreases as their distance increases. The problem is that these models,
while more interesting from the point of view of physics, are also extremely
difficult. Even if one makes the simplifying assumption that an atom inter-
acts only with its nearest neighbors, they are so difficult that, at the time
of this writing, no consensus has been reached among physicists regarding
their probable behavior. It is to bypass this difficulty that Sherrington and
Kirkpatrick introduced the Hamiltonian (1.3), where the geometric location
of the atoms is forgotten and where they all interact with each other. Such a
simplification is called a “mean-field approximation”, and the corresponding
models are called “mean-field models”.
The Hamiltonian (1.3) presents a very special symmetry. It is invariant
under the transformation σ → −σ, and so is the corresponding Gibbs measure (0.4). This special situation is somewhat misleading. In order not to get
hypnotized by special features, we will consider the version of (1.3) “with an
external field”, i.e. where the Hamiltonian is
$$-H_N(\sigma) = \frac{1}{\sqrt{N}} \sum_{i<j} g_{ij}\, \sigma_i \sigma_j + h \sum_{i\le N} \sigma_i . \tag{1.10}$$

The reader might observe that the sentence “the Hamiltonian is” preceding
(1.10) is not strictly true, since this formula actually gives the value of −HN
rather than HN . It seems however harmless to allow such minor slips of
language. The last term in (1.10) represents the action of an “external field”,
that is a magnetic field created by an apparatus outside the sample of matter
we study. The external field favors the + spins over the − spins when h > 0.
With the Hamiltonian (1.10), the Boltzmann factor exp(−βHN (σ)) becomes
  
$$\exp\bigg( \frac{\beta}{\sqrt{N}} \sum_{i<j} g_{ij}\, \sigma_i \sigma_j + \beta h \sum_{i\le N} \sigma_i \bigg) . \tag{1.11}$$

The coefficient βh of Σ_{i≤N} σ_i makes perfect sense to a physicist. However, when one looks at the mathematical structure of (1.11), one sees that the two terms N^{−1/2} Σ_{i<j} g_{ij} σ_i σ_j and Σ_{i≤N} σ_i appear to be of different natures. Therefore, it would be more convenient to have unrelated coefficients in front of these terms. For example, it is more cumbersome to take derivatives in β when using the factors (1.11) than when using the factors

$$\exp\bigg( \frac{\beta}{\sqrt{N}} \sum_{i<j} g_{ij}\, \sigma_i \sigma_j + h \sum_{i\le N} \sigma_i \bigg) .$$

Thus for the sake of mathematical clarity, it is better to abandon the


physical point of view of having a “main parameter” β. Rather, we will think
of the Hamiltonian as depending upon parameters β, h, . . . That is, we will
write
$$-H_N(\sigma) = -H_N(\sigma, \beta, h) = \frac{\beta}{\sqrt{N}} \sum_{i<j} g_{ij}\, \sigma_i \sigma_j + h \sum_{i\le N} \sigma_i \tag{1.12}$$

for the Hamiltonian of the Sherrington-Kirkpatrick model. The Boltzmann factor exp(−βH_N(σ)) then becomes

$$\exp(-H_N(\sigma, \beta, h))$$

or, with simpler notation, when β and h are implicit it becomes

$$\exp(-H_N(\sigma)) . \tag{1.13}$$
Once one has survived the initial surprise of not seeing the customary term
β in (1.13), this notation works out appropriately. The formulas familiar to
the physicists can be recovered by replacing our term h by βh.

1.2 Notations and Simple Facts


The purpose of this book is to study “spin systems”. The main parameter
on which these depend is the number N of points of the system. To lighten
notation, the number N remains often implicit in the notation, such as in
(1.16) below. For all but the most trivial situations, certain exact computations seem impossibly difficult at a given value of N. Rather, we will obtain
“approximate results” that become asymptotically exact as N → ∞. As far
as possible, we have tried to do quantitative work, that is to obtain optimal
rates of convergence as N → ∞.
Let us recall that throughout the book we write ΣN = {−1, 1}N .
Given a Hamiltonian H_N on Σ_N = {−1, 1}^N, that is, a family of numbers (H_N(σ))_{σ∈Σ_N}, we define its partition function Z_N by

$$Z_N = \sum_{\sigma} \exp(-H_N(\sigma)) . \tag{1.14}$$

(Thus, despite its name, the partition function is a number, not a function.)
Let us repeat that we are interested in understanding what happens for N
very large. It is very difficult then to study ZN , as there are so many terms,
all random, in the sum. Throughout the book, we keep the letter Z to denote
a partition function.
The Gibbs measure GN on ΣN with Hamiltonian HN is defined by
$$G_N(\{\sigma\}) = \frac{\exp(-H_N(\sigma))}{Z_N} . \tag{1.15}$$

Exercise 1.2.1. Characterize the probability measures on ΣN that arise as


the Gibbs measure of a certain Hamiltonian. If the answer is not obvious to
you, start with the case N = 1.
Given a function f on Σ_N, we denote by ⟨f⟩ its average for G_N,

$$\langle f \rangle = \int f(\sigma)\, \mathrm{d}G_N(\sigma) = \frac{1}{Z_N} \sum_{\sigma} f(\sigma) \exp(-H_N(\sigma)) . \tag{1.16}$$
Given a function f on Σ_N^n = (Σ_N)^n, we denote

$$\langle f \rangle = \int f(\sigma^1, \ldots, \sigma^n)\, \mathrm{d}G_N(\sigma^1) \cdots \mathrm{d}G_N(\sigma^n) = \frac{1}{Z_N^n} \sum_{\sigma^1, \ldots, \sigma^n} f(\sigma^1, \ldots, \sigma^n) \exp\bigg( - \sum_{\ell \le n} H_N(\sigma^\ell) \bigg) . \tag{1.17}$$

The notations (1.16) and (1.17) are in agreement. For example, if, say, a function f(σ^1, σ^2) on Σ_N^2 depends only on σ^1, we can also view it as a function on Σ_N; and whether we compute ⟨f⟩ using (1.16) or (1.17), we get the same result.
The formula (1.17) means simply that we integrate f on the space (Σ_N^n, G_N^{⊗n}). The configurations σ^1, σ^2, … belonging to the different copies of Σ_N involved there are called replicas. In probabilistic terms, the sequence (σ^ℓ)_{ℓ≥1} is simply an i.i.d. sequence distributed like the Gibbs measure. Replicas will play a fundamental role. In physics, they are called “real replicas”, to distinguish them from the n replicas of the celebrated “replica method”, where “n is an integer tending to 0”. (There is no need yet to consult your analyst if the meaning of this last expression is unclear to you.) Throughout the book we denote replicas by upper indices. Again, this simply means that these configurations are integrated independently with respect to Gibbs' measure.
Replicas can be used in particular for “linearization”, i.e. replacing a product of brackets ⟨·⟩ by a single bracket. In probabilistic terms, this is simply the identity EX EY = E(XY) when X and Y are independent r.v.s. Thus (with slightly informal but convenient notation) we have, for a function f on Σ_N,

$$\langle f \rangle^2 = \langle f(\sigma^1) f(\sigma^2) \rangle . \tag{1.18}$$
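Here is a check of (1.18) by exhaustive enumeration on a very small system (Python with numpy assumed; the Hamiltonian and the function f are arbitrary choices).

```python
import itertools
import numpy as np

# <f>^2 = <f(sigma^1) f(sigma^2)>: the square of a Gibbs average is a
# double average over two independent replicas.
rng = np.random.default_rng(5)
N = 8
configs = np.array(list(itertools.product([-1, 1], repeat=N)))
G = np.exp(-rng.standard_normal(len(configs)))
G /= G.sum()                                   # a Gibbs measure as in (1.15)

f = configs[:, 0] * configs[:, 1]              # f(sigma) = sigma_1 sigma_2, say
lhs = (G @ f) ** 2                             # <f>^2, computed by (1.16)
rhs = np.einsum('a,b,a,b->', G, G, f, f)       # <f(sigma^1) f(sigma^2)>, by (1.17)
print(lhs, rhs)                                # equal
```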
The partition function ZN is exponentially large. It is better studied on a
logarithmic scale through the quantity N −1 log ZN . This quantity is random;
we denote by pN its expectation
$$p_N = \frac{1}{N}\, \mathsf{E} \log Z_N . \tag{1.19}$$
Here, E denotes expectation over the “disorder”, i.e. the randomness of the
Hamiltonian. (Hence in the case of the Hamiltonian (1.12), this means expec-
tation with respect to the r.v.s gij .) One has to prove in principle that the

expectation exists. A sketch of proof is that the integrability of the function


log ZN can be obtained on the set {ZN ≥ 1} by using that log x ≤ x and that
EZ_N < ∞ and on the set {Z_N ≤ 1} by using that Z_N ≥ exp(−H_N(σ^0)), where σ^0 is any given point of Σ_N. This argument is “semi-trivial” in the
sense that there is “a lot of room”, and that it contains nothing fancy or
clever. We have claimed in the introduction that this is a fully rigorous work.
It seems however better to lighten the exposition in the beginning of this
work by not proving a number of “semi-trivial” facts as above, and a great
many statements will be given without a complete formal proof. Of course
triviality is in the eye of the beholder, but it seems that either the reader is
trained enough in analysis to complete the proofs of these facts without much
effort (in the unlikely event that she feels this is really necessary), or else that
she better take these facts for granted, since in any case they are quite beside
the main issues we try to tackle. We fear that too much technicality at this
early stage could discourage readers before they feel the beauty of the topic
and are therefore better prepared to accept the unavoidable pain of technical
work (which will be necessary soon enough). This policy of skipping some
“details” will be used only at the beginning of this work, when dealing with
“simple situations”. In contrast, when we will later be dealing with more
complicated situations, we will prove everything in complete detail.
The number pN of (1.19) is of fundamental importance, and we first try to
explain in words why. There will be many informal explanations such as this,
in which the statements are a sometimes imprecise and ambiguous description
of what happens, and are usually by no means obvious. Later (not necessarily
in the same section) will come formal statements and complete proofs. If you
find these informal descriptions confusing, please just skip them, and stick to
the formal statements.
In some sense, as N → ∞, the number pN captures much important
information about the r.v. N −1 log ZN . This is because (in all the cases of
interest), this number pN stays bounded below and above independently
of N , while (under rather general conditions) the fluctuations of the r.v.
N −1 log ZN become small as N → ∞ (its variance is about 1/N ). In physics’
terminology, the random quantity N −1 log ZN is “self-averaging”. At a crude
first level of approximation, one can therefore think of the r.v. N −1 log ZN
as being constant and equal to pN . For the SK model, this will be proved on
page 18.
Let us demonstrate another way that pN encompasses much information
about the system. For example, consider pN = pN (β, h) obtained in the case
of the Hamiltonian (1.12). Then we have

$$\frac{\partial}{\partial h} \bigg( \frac{1}{N} \log Z_N \bigg) = \frac{1}{N} \frac{1}{Z_N} \frac{\partial Z_N}{\partial h} = \frac{1}{N} \frac{1}{Z_N} \sum_{\sigma} \frac{\partial (-H_N(\sigma))}{\partial h} \exp(-H_N(\sigma)) = \frac{1}{N} \frac{1}{Z_N} \sum_{\sigma} \bigg( \sum_{i\le N} \sigma_i \bigg) \exp(-H_N(\sigma)) = \bigg\langle \frac{1}{N} \sum_{i\le N} \sigma_i \bigg\rangle , \tag{1.20}$$

and *therefore*, taking expectation,

$$\frac{\partial p_N}{\partial h} = \mathsf{E} \bigg\langle \frac{1}{N} \sum_{i\le N} \sigma_i \bigg\rangle . \tag{1.21}$$

The *therefore* involves the interchange of a derivative and an expectation,


which is in principle a non-trivial fact. Keeping in mind that ZN is a rather
simple function, a finite sum of very simple functions, we certainly do not
expect any difficulty there or in similar cases. We have provided in Proposition
A.2.1 a simple result that is sufficient for our needs, although we will not check
this every time. In the present case the interchange is made legitimate by the
fact that the quantities (1.20) are bounded by 1, so that (A.1) holds. Let
us stress the main point. The interchange of limits is done here at a given
value of N . In contrast, any statement involving limits as N → ∞ (and first
of all the existence of such limits) is typically much more delicate.
Let us note that
$$\mathsf{E} \bigg\langle \frac{1}{N} \sum_{i\le N} \sigma_i \bigg\rangle = \mathsf{E} \langle \sigma_1 \rangle ,$$

which follows from the fact that E⟨σ_i⟩ does not depend on i by symmetry.
This argument will often be used. It is called “symmetry between sites”. (A
site is simply an i ≤ N , the name stemming from the physical idea that it is
the site of a small magnet.) Therefore
$$\frac{\partial p_N}{\partial h} = \mathsf{E} \langle \sigma_1 \rangle , \tag{1.22}$$

the “average magnetization”.
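The identities (1.21) and (1.22) can be tested on a small SK system by comparing a finite difference of p_N with the Gibbs average of the magnetization; in the sketch below (Python with numpy; all parameter values arbitrary) the disorder expectation is replaced by an empirical mean.

```python
import itertools
import numpy as np

# Finite-difference check of (1.21): d p_N / d h = E < N^{-1} sum_i sigma_i >,
# which by symmetry between sites equals E <sigma_1>, i.e. (1.22).
rng = np.random.default_rng(6)
N, beta, h, eps, samples = 8, 0.7, 0.3, 1e-5, 1000
configs = np.array(list(itertools.product([-1, 1], repeat=N)))
i, j = np.triu_indices(N, k=1)
pair_prods = configs[:, i] * configs[:, j]   # sigma_i sigma_j for i < j
spin_sums = configs.sum(axis=1)

def log_Z(g, hh):
    e = (beta / np.sqrt(N)) * pair_prods @ g + hh * spin_sums
    m = e.max()
    return m + np.log(np.exp(e - m).sum())

lhs = rhs = 0.0
for _ in range(samples):
    g = rng.standard_normal(len(i))
    lhs += (log_Z(g, h + eps) - log_Z(g, h - eps)) / (2 * eps * N)
    e = (beta / np.sqrt(N)) * pair_prods @ g + h * spin_sums
    w = np.exp(e - e.max()); w /= w.sum()
    rhs += w @ (spin_sums / N)               # <N^{-1} sum_i sigma_i> for this disorder
print(lhs / samples, rhs / samples)          # equal up to O(eps^2)
```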
Since the quantity p_N encompasses much information, its exact computation cannot be trivial, even in the limit N → ∞ (the existence of which is absolutely not obvious). As a first step one can try to get lower and upper bounds. A very useful fact for the purpose of finding bounds is Jensen's inequality, which asserts that for a convex function φ, one has

$$\varphi(\mathsf{E} X) \le \mathsf{E} \varphi(X) . \tag{1.23}$$

This inequality will be used a great many times (which means, as already
pointed out, that it would be helpful to learn it now). For concave functions
the inequality goes the other way, and the concavity of the log implies that
$$p_N = \frac{1}{N}\, \mathsf{E} \log Z_N \le \frac{1}{N} \log \mathsf{E} Z_N . \tag{1.24}$$

The right-hand side of (1.24) is not hard to compute, but the bound (1.24)
is not really useful, as the inequality is hardly ever an equality.

Exercise 1.2.2. Construct a sequence ZN of r.v.s with ZN ≥ 1 such that


lim_{N→∞} N^{−1} E log Z_N = 0 but lim_{N→∞} N^{−1} log EZ_N = 1.

Throughout the book we denote by ch(x), sh(x) and th(x) the hyperbolic
cosine, sine and tangent of x, and we write chx, shx, thx when no confusion
is possible.

Exercise 1.2.3. Use (A.6) to prove that for the Hamiltonian (1.12) we have

$$\frac{1}{N} \log \mathsf{E} Z_N = \frac{\beta^2}{4} \bigg( 1 - \frac{1}{N} \bigg) + \log 2 + \log \mathrm{ch}(h) . \tag{1.25}$$
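The following sketch (Python with numpy; parameters arbitrary, disorder expectation replaced by an empirical mean) illustrates (1.24) against the exact right-hand side (1.25) on a small system.

```python
import itertools
import numpy as np

# N^{-1} E log Z_N versus the exact value (1.25) of N^{-1} log E Z_N.
rng = np.random.default_rng(7)
N, beta, h, samples = 8, 1.0, 0.2, 2000
configs = np.array(list(itertools.product([-1, 1], repeat=N)))
i, j = np.triu_indices(N, k=1)
pair_prods = configs[:, i] * configs[:, j]
spin_sums = configs.sum(axis=1)

log_Z = []
for _ in range(samples):
    g = rng.standard_normal(len(i))
    e = (beta / np.sqrt(N)) * pair_prods @ g + h * spin_sums
    m = e.max()
    log_Z.append(m + np.log(np.exp(e - m).sum()))

p_N = np.mean(log_Z) / N
annealed = beta**2 / 4 * (1 - 1 / N) + np.log(2) + np.log(np.cosh(h))
print(p_N, annealed)                         # p_N <= annealed, as (1.24) demands
```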

It follows from Jensen's inequality and the convexity of the exponential function that for a random variable X we have E exp X ≥ exp EX. Using this for the uniform probability over Σ_N we get

$$2^{-N} \sum_{\sigma} \exp(-H_N(\sigma)) \ge \exp\bigg( 2^{-N} \sum_{\sigma} (-H_N(\sigma)) \bigg) ,$$

and taking logarithm and expectation this proves that pN ≥ log 2. Therefore,
combining with (1.24) and (1.25) we have (in the case of the Hamiltonian
(1.12)), and lightening notation by writing chh rather than ch(h),

$$\log 2 \le p_N \le \frac{\beta^2}{4} \bigg( 1 - \frac{1}{N} \bigg) + \log 2 + \log \mathrm{ch}\, h . \tag{1.26}$$
This rather crude bound will be much improved later. Let us also point out
that the computation of pN for every β > 0 provides the solution of the
“zero-temperature problem” of finding
$$\frac{1}{N}\, \mathsf{E} \max_{\sigma} (-H_N(\sigma)) . \tag{1.27}$$

Indeed,
    
$$\exp\Big( \beta \max_{\sigma} (-H_N(\sigma)) \Big) \le \sum_{\sigma} \exp(-\beta H_N(\sigma)) \le 2^N \exp\Big( \beta \max_{\sigma} (-H_N(\sigma)) \Big)$$

so that, taking logarithm and expectation we have


$$\frac{\beta}{N}\, \mathsf{E} \max_{\sigma} (-H_N(\sigma)) \le p_N(\beta) := \frac{1}{N}\, \mathsf{E} \log \sum_{\sigma} \exp(-\beta H_N(\sigma)) \le \log 2 + \frac{\beta}{N}\, \mathsf{E} \max_{\sigma} (-H_N(\sigma))$$

and thus
$$0 \le \frac{p_N(\beta)}{\beta} - \frac{1}{N}\, \mathsf{E} \max_{\sigma} (-H_N(\sigma)) \le \frac{\log 2}{\beta} . \tag{1.28}$$
Of course the computation of p_N(β) for large β (even in the limit N → ∞) is very difficult, but it is not quite as hopeless as a direct evaluation of E max_σ(−H_N(σ)).
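The bound (1.28) can also be watched numerically; in the sketch below (Python with numpy; one disorder sample stands in for the expectation, and all sizes are arbitrary) the gap between p_N(β)/β and N^{-1} max_σ(−H_N(σ)) shrinks like (log 2)/β.

```python
import itertools
import numpy as np

# p_N(beta)/beta approaches N^{-1} max_sigma(-H_N(sigma)) at rate (log 2)/beta.
rng = np.random.default_rng(8)
N = 10
configs = np.array(list(itertools.product([-1, 1], repeat=N)))
i, j = np.triu_indices(N, k=1)
g = rng.standard_normal(len(i))
minus_H = (configs[:, i] * configs[:, j]) @ g / np.sqrt(N)

for beta in (1.0, 4.0, 16.0, 64.0):
    m = (beta * minus_H).max()
    p = (m + np.log(np.exp(beta * minus_H - m).sum())) / N
    print(beta, p / beta - minus_H.max() / N, np.log(2) / beta)  # gap in [0, log(2)/beta]
```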
For the many models we will consider in this book, the computation of
pN will be a central objective. We will be able to perform this computation
in many cases at “high temperature”, but the computation at “low temperature” remains a formidable challenge.
We now pause for a while and introduce a different and simpler Hamiltonian. It is not really obvious that this Hamiltonian is relevant to the study of
the SK model, and that this is indeed the case is a truly remarkable feature.
We consider an i.i.d. sequence (z_i)_{i≤N} of standard Gaussian r.v.s. Consider the Hamiltonian

$$-H_N(\sigma) = \sum_{i\le N} \sigma_i (\beta z_i \sqrt{q} + h) , \tag{1.29}$$

where q is a parameter that will be adjusted in due time. The sequence (βz_i√q + h) is simply an i.i.d. sequence of Gaussian r.v.s (that are not centered if h ≠ 0), so the random Hamiltonian (1.29) is rather canonical. It is also rather trivial, because there is no interaction between sites: the Hamiltonian is the sum of the terms σ_i(βz_i√q + h), each of which depends only on the spin at one site. Let us first observe that if we are given numbers a_i(1) and a_i(−1) we have the identity
 
$$\sum_{\sigma} \prod_{i\le N} a_i(\sigma_i) = \prod_{i\le N} \big( a_i(1) + a_i(-1) \big) . \tag{1.30}$$

Using this relation for

$$a_i(\sigma) = \exp\big( \sigma (\beta z_i \sqrt{q} + h) \big) \tag{1.31}$$

we obtain

$$Z_N = \sum_{\sigma} \exp\bigg( \sum_{i\le N} \sigma_i (\beta z_i \sqrt{q} + h) \bigg) = \prod_{i\le N} \Big( \exp(\beta z_i \sqrt{q} + h) + \exp\big( -(\beta z_i \sqrt{q} + h) \big) \Big) = 2^N \prod_{i\le N} \mathrm{ch}(\beta z_i \sqrt{q} + h) , \tag{1.32}$$

where we recall that ch(x) denotes the hyperbolic cosine of x, so that

$$p_N = \log 2 + \mathsf{E} \log \mathrm{ch}(\beta z \sqrt{q} + h) , \tag{1.33}$$

where z is a standard Gaussian r.v.
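In contrast with the SK model, (1.33) is a one-dimensional Gaussian integral, and can be evaluated to high precision, e.g. by Gauss-Hermite quadrature; a sketch follows (Python with numpy; the values of β, q, h are arbitrary).

```python
import numpy as np

# p_N = log 2 + E log ch(beta z sqrt(q) + h), z standard Gaussian, as in (1.33).
beta, q, h = 1.2, 0.4, 0.3

# Gauss-Hermite quadrature for the weight exp(-x^2/2): E f(z) ~ sum_k w_k f(x_k) / sqrt(2 pi)
x, w = np.polynomial.hermite_e.hermegauss(60)
E_log_ch = w @ np.log(np.cosh(beta * np.sqrt(q) * x + h)) / np.sqrt(2 * np.pi)
print(np.log(2) + E_log_ch)

# cross-check by plain Monte Carlo
z = np.random.default_rng(9).standard_normal(10**6)
print(np.log(2) + np.log(np.cosh(beta * np.sqrt(q) * z + h)).mean())
```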



Consider now functions f_i on {−1, 1} and the function

$$f(\sigma) = \prod_{i\le N} f_i(\sigma_i) .$$

Then, using (1.30) yields

$$\sum_{\sigma} f(\sigma) \prod_{i\le N} a_i(\sigma_i) = \sum_{\sigma} \prod_{i\le N} f_i(\sigma_i) a_i(\sigma_i) = \prod_{i\le N} \big( f_i(1) a_i(1) + f_i(-1) a_i(-1) \big) .$$

Combining with (1.32) we get

$$\langle f \rangle = \prod_{i\le N} \langle f_i \rangle_i , \tag{1.34}$$

where

$$\langle f_i \rangle_i = \frac{f_i(1) a_i(1) + f_i(-1) a_i(-1)}{a_i(1) + a_i(-1)} . \tag{1.35}$$
This shows that Gibbs' measure is a product measure. It is determined by the averages ⟨σ_i⟩ because a probability measure μ on {−1, 1} is determined by ∫ x dμ(x). To compute the average ⟨σ_i⟩, we use the case where f(σ) = σ_i and (1.34), (1.35), (1.31) to obtain

$$\langle \sigma_i \rangle = \frac{\exp(\beta z_i \sqrt{q} + h) - \exp\big( -(\beta z_i \sqrt{q} + h) \big)}{\exp(\beta z_i \sqrt{q} + h) + \exp\big( -(\beta z_i \sqrt{q} + h) \big)} ,$$

and thus

σi = th(βzi q + h) , (1.36)
where thx denotes the hyperbolic tangent of x. Moreover the quantities (1.36)
are probabilistically independent.
In words, we can reduce the study of the system with Hamiltonian (1.29)
to the study of the system with one single spin σi taking the possible values

σi = ±1, and with Hamiltonian H(σi ) = −σi (βzi q + h).

Exercise 1.2.4. Given a number a, compute the averages exp aσi and
exp aσi1 σi2 for the Hamiltonian (1.29). Of course as usual, the upper in-
dexes denote different replicas, so exp aσi1 σi2 is a “double integral”. As in
the case of (1.36), this reduces to the case of a system with one spin, and it is
surely a good idea to master these before trying to understand systems with
N spins. If you need a hint, look at (1.107) below.

Exercise 1.2.5. Show that if a Hamiltonian H on ΣN decomposes as


H(σ) = H1 (σ) + H2 (σ) where H1 (σ) depends only on the values of σi for
i ∈ I ⊂ {1, . . . , N }, while H1 (σ) depends only on the values of σi for i in
the complement of I, then Gibbs’ measure is a product measure in a natural
way. Prove the converse of this statement.
12 1. The Sherrington-Kirkpatrick Model

For Hamiltonians that are more complicated than (1.29), and in partic-
ular when different sites interact, the Gibbs measure will not be a product
measure. Remarkably, however, it will often nearly be a product if one looks
only at a “finite number of spins”. That is, given any integer n (that does
not depend on N ), as N → ∞, the law of the Gibbs measure under the map
σ → (σ1 , . . . , σn ) becomes nearly a (random) product measure. Moreover,
the r.v.s ( σi )i≤n become nearly independent. It will be proved in this work
that this is true at high temperature for many models.
If one thinks about it, this is the simplest possible structure, the default
situation. It is of course impossible to interest a physicist in such a situation.
What else could happen, will he tell you. What else, indeed, but finding proofs
is quite another matter.
Despite the triviality of the situation (1.29), an (amazingly successful)
intuition of F. Guerra is that it will help to compare this situation with that
of the SK model. This will be explained in the next section. This comparison
goes quite far. In particular it will turn out that (when β is not too large)
for each n the sequence ( σi )i≤n will asymptotically have the same law as

the sequence (th(βzi q + h))i≤n , where zi are i.i.d. standard Gaussian r.v.s
and where the number q depends on β and h only. This should be compared
to (1.36).

1.3 Gaussian Interpolation and the Smart Path Method

To study a difficult situation one can compare it to a simpler one, by finding


a path between them and controlling derivatives along this path. This is
an old idea. In practice we are given the difficult situation, and the key
to the effectiveness of the method is to find the correct simple situation to
which it should be compared. This can be done only after the problem is
well understood. To insist upon the fact that the choice of the path is the
real issue, we call this method the smart path method . (More precisely, the
real issue is in the choice of the “easy end of the path”. Once this has been
chosen, the choice of the path itself will be rather canonical, except for its
“orientation”. We make the convention that the “smart path” moves from
the “easy end” to the “hard end”) The smart path method, under various
forms, will be the main tool throughout the book.
In the present section, we introduce this method in the case of Gaussian
processes. We obtain a general result of fundamental importance, Theorem
1.3.4 below, as well as two spectacular applications to the SK model. At the
same time, we introduce the reader to some typical calculations.
Consider an integer M and an infinitely differentiable function F on RM
(such that all its partial derivatives are of “moderate growth” in the sense
of (A.18)). Consider two centered jointly Gaussian families u = (ui )i≤M ,
v = (vi )i≤M . How do we compare
1.3 Gaussian Interpolation and the Smart Path Method 13

EF (u) and EF (v) ? (1.37)

Of course the quantity EF (u) is determined by the distribution of u, and


it might help to make sense of the formula (1.40) below to remember that
this distribution is determined by its covariance matrix, i.e. by the quantities
Eui uj (a fundamental property of Gaussian distributions). There is a canon-
ical method to compare the quantities (1.37) (going back to [137]). Since
we are comparing a function of the law of u with a function of the law of
v, we can assume without loss of generality that the families u and v are
independent. We consider u(t) = (ui (t))i≤M where
√ √
ui (t) = tui + 1 − tvi , (1.38)

so that u = u(1) and v = u(0), and we consider the function

ϕ(t) = EF (u(t)) . (1.39)

The following lemma relies on the Gaussian integration by parts formula


(A.17), one of the most constantly used tools in this work.

Lemma 1.3.1. For 0 < t < 1 we have


1 ∂2F
ϕ (t) = (Eui uj − Evi vj )E (u(t)) . (1.40)
2 i,j ∂xi ∂xj

Proof. Let
d 1 1
ui (t) = ui (t) = √ ui − √ vi
dt 2 t 2 1−t
so that
 ∂F
ϕ (t) = E ui (t) (u(t)) . (1.41)
∂xi
i≤M

Now
1
Eui (t)uj (t) =(Eui uj − Evi vj )
2
so the Gaussian integration by parts formula (A.17) yields (1.40). 

Of course (and this is nearly the last time in this chapter that we worry
about this kind of problem) there is some extra work to do to give a com-
plete ε-δ proof of this statement, and in particular to deduce (1.41) from
(1.39) using Proposition A.2.1. The details of the argument are given in Sec-
tion A.2.
Since Lemma 1.3.1 relies on Gaussian integration by parts, the reader
might have already formed the question of what happens when one deals with
non-Gaussian situations, such as when one replaces the r.v.s gij of (1.12) by,
say, independent Bernoulli r.v.s (i.e. random signs), or by more general r.v.s.
Generally speaking, the question of what happens in a probabilistic situation
14 1. The Sherrington-Kirkpatrick Model

when one replaces Gaussian r.v.s by random signs can lead to very difficult
(and interesting) problems, but in the case of the SK model, it is largely
a purely technical question. While progressing through our various models,
we will gradually learn how to address such technical problems. It will then
become obvious that most of the results of the present chapter remain true
in the Bernoulli case.
Even though the purpose of this work is to study spin glasses rather than
to develop abstract mathematics, it might help to make a short digression
about what is really going on in Lemma 1.3.1. The joint law of the Gaussian
family u is determined by the matrix of the covariances aij = Eui uj . This ma-
trix is symmetric, aij = aji , so it is completely determined by the triangular
array a = (aij )1≤i≤j≤n and we can think of the quantity EF (u) as a function
Ψ (a). The domain of definition of Ψ is a convex cone with non-empty interior
(since Ψ (a) is defined if and only if the symmetric matrix (aij )i,j≤n is positive
definite), so it (often) makes sense to think of the derivatives ∂Ψ/∂aij . The
fundamental formula is as follows.
Proposition 1.3.2. If i < j we have

∂Ψ ∂2F
(a) = E (u) , (1.42)
∂aij ∂xi ∂xj

while
∂Ψ 1 ∂2F
(a) = E 2 (u) . (1.43)
∂aii 2 ∂xi
Let us first explain why this implies Lemma 1.3.1. If one thinks of a Gaussian
family as determined by its matrix of covariance, the magic formula (1.40)
is just the canonical interpolation in Rn(n+1)/2 between the points (aij ) =
(Eui uj ) and (bij ) := (Evi vj ), since

aij (t) := Eui (t)ui (t) = tEui uj + (1 − t)Evi vj = taij + (1 − t)bij .

Therefore Lemma 1.3.1 follows from (1.42) and the chain rule, as is obvious
if we observe that ϕ(t) = Ψ (a(t)) where a(t) = (aij (t))1≤i≤j≤n and if we
reformulate (1.40) as
 ∂2F
ϕ (t) = (Eui uj − Evi vj )E (u(t))
∂xi ∂xj
1≤i<j≤n
 1 ∂2F
+ (E u2i − E vi2 )E (u(t)) .
2 ∂x2i
1≤i≤n

Proof of Proposition 1.3.2. Considering the triangular array

b = (bij )1≤i≤j≤n = (Evi vj )1≤i≤j≤n


1.3 Gaussian Interpolation and the Smart Path Method 15

and integrating (1.40) between 0 and 1 we get

Ψ (a) − Ψ (b) = EF (u) − EF (v) = ϕ(1) − ϕ(0)


1 1
∂2F
= (aij − bij ) E (u(t))dt
i,j
2 0 ∂xi ∂xj
 1
∂2F
= (aij − bij ) E (u(t))dt
0 ∂xi ∂xj
1≤i<j≤n
1 1
∂2F
+ (aii − bii ) E (u(t))dt . (1.44)
2 0 ∂x2i
i≤n

Now, as b gets close to a the integral


1
∂2F
E (u(t))dt
0 ∂xi ∂xj
tends to
∂2F
E (u) ,
∂xi ∂xj
because uniformly in t the distribution of u(t) gets close to the distribution
of u. Therefore
 ∂2F 1 ∂2F
Ψ (b) − Ψ (a) = (bij − aij )E (u) + (bii − aii )E (u) ,
∂xi ∂xj 2 ∂x2i
1≤i<j≤n i≤n
+ b − ao(b − a) , (1.45)

where  ·  denotes the Euclidean norm and o(x) a quantity that goes to 0
with x. This concludes the proof. 

This ends our mathematical digression. To illustrate right away the power
of the smart path method, let us prove a classical result (extensions of which
will be useful in Volume II).

Proposition 1.3.3. (Slepian’s lemma) Assume that

∂2F
∀i = j, ≥0
∂xi ∂xj
and
∀i ≤ M, Eu2i = Evi2 ; ∀i = j, Eui uj ≥ Evi vj .
Then
EF (u) ≥ EF (v) .

Proof. It is obvious from (1.40) that ϕ (t) ≥ 0. 



The following is a fundamental property.
16 1. The Sherrington-Kirkpatrick Model

Theorem 1.3.4. Consider a Lipschitz function F on RM , of Lipschitz con-


stant ≤ A. That is, we assume that, given x, y in RM , we have

|F (x) − F (y)| ≤ Ad(x, y) , (1.46)

where d denotes the Euclidean distance on RM . If g1 , . . . , gM are independent


standard Gaussian r.v.s, and if g = (g1 , . . . , gM ), then for each t > 0 we have

t2 
P(|F (g) − EF (g)| ≥ t) ≤ 2 exp − . (1.47)
4A2
The remarkable part of this statement is that (1.47) does not depend on
M . It is a typical occurrence of the “concentration of measure phenomenon”.
When F is differentiable (and this will be the case for all the applications
we will consider in this work) (1.46) is equivalent to the following

∀x ∈ RM , ∇F (x) ≤ A , (1.48)

where ∇F denotes the gradient of F , and x the Euclidean norm of the
vector x.
Proof. Let us first assume that F is infinitely differentiable (a condition
that is satisfied in all the cases where we use this result). Given a parameter
s, we would like to bound
 
E exp s(F (g) − EF (g)) . (1.49)

At the typographic level, a formula as above is on the heavy side, and we


will often omit the outer brackets when this creates no ambiguity, i.e. we
will write exp s(F (g) − EF (g)). To bound the quantity (1.49) it is easier
(using a fundamental idea of probability called symmetrization) to control
first E exp s(F (g) − F (g )) where g is an independent copy of g. (If you have
never seen this, observe as a motivation that F (g) − F (g ) is the difference
between two independent copies of the r.v. F (g) − EF (g).)
Given s, we consider the function G on R2M given by

G((yi )i≤2M ) = exp s(F ((yi )i≤M ) − F ((yi+M )i≤M )) .

We consider a family u = (ui )i≤2M of independent standard Gaussian


r.v.s and we would like to bound EG(u). The idea is to compare this situation
with the much simpler quantity EG(v) where vi = vi+M when i ≤ M (so
that G(v) = 1 and hence EG(v) = 1). So let us consider a family (vi )i≤2M
such that the r.v.s (vi )i≤M are independent standard Gaussian, independent
of the sequence (ui )i≤2M , and such that vi = vi−M if i ≥ M +1, and therefore
vi = vi+M if i ≤ M . We note that

Eui uj − Evi vj = 0

unless j = i + M or i = j + M in which case


1.3 Gaussian Interpolation and the Smart Path Method 17

Eui uj − Evi vj = −1 .

We consider u(t) as in (1.38), and ϕ(t) = EG(u(t)). Using (1.40) for G rather
than F , we get
 ∂2G
ϕ (t) = −E (u(t)) . (1.50)
∂yi ∂yi+M
i≤M

The reason why it is legitimate to use (1.40) is that “exp sF is of moderate


growth” (as defined on page 440) since F is Lipschitz. We compute

∂2G ∂F ∂F
(y) = −s2 ((yi )i≤M ) ((yi+M )i≤M ) G(y) . (1.51)
∂yi ∂yi+M ∂xi ∂xi

Now, (1.48) implies

  ∂F 2
∀x ∈ R M
, (x) ≤ A2 ,
∂xi
i≤M

and (1.50), (1.51) and the Cauchy-Schwarz inequality shows that

ϕ (t) ≤ s2 A2 ϕ(t) . (1.52)

As pointed out the choice of the family (vi )i≤2M ensures that ϕ(0) = EG(v) =
1, so that (1.52) implies that ϕ(1) ≤ exp s2 A2 , i.e.

E exp s(F (u1 , . . . , uM ) − F (uM +1 , . . . , u2M )) ≤ exp s2 A2 .

We use Jensen’s inequality (1.23) for the convex function exp(−sx) while
taking expectation in uM +1 , . . . , u2M , so that

E exp s(F (u1 , . . . , uM ) − EF (uM +1 , . . . , u2M ))


≤ E exp s(F (u1 , . . . , uM ) − F (uM +1 , . . . , u2M )) ≤ exp s2 A2 .

Going back to the notation g of Theorem 1.3.4, and since

EF (uM +1 , · · · , u2M ) = EF (g)

we have
E exp s(F (g) − EF (g)) ≤ exp s2 A2 .
Using Markov’s inequality (A.7) we get that for s, t > 0

P(F (g) − EF (g) ≥ t) ≤ exp(s2 A2 − st)

and, taking s = t/(2A2 ),

t2 
P(F (g) − EF (g) ≥ t) ≤ exp − .
4A2
18 1. The Sherrington-Kirkpatrick Model

Applying the same inequality to −F completes the proof when F is infinitely


differentiable (or even twice continuously differentiable). The general case
(that is not needed in this book) reduces to this special case by convolution
with a smooth function. 
The importance of Theorem 1.3.4 goes well beyond spin glasses, but it
seems appropriate to state a special case that we will use many times.
Proposition 1.3.5. Consider a finite set S and for s ∈ S consider a vector
a(s) ∈ RM and a number ws > 0. Consider the function F on RM given by

F (x) = log ws exp x · a(s) .
s∈S

Then F has a Lipschitz constant ≤ A = maxs∈S a(s).


Consequently if g1 , . . . , gM are independent standard Gaussian r.v.s, and
if g = (g1 , . . . , gM ), then for each t > 0 we have
 
t2
P(|F (g) − EF (g)| ≥ t) ≤ 2 exp − . (1.53)
4 maxs∈S a(s)2

Proof. The gradient ∇F (x) of F at x is given by



s∈S ws a(s) exp x · a(s)
∇F (x) =  ,
s∈S ws exp x · a(s)

so that ∇F (x) ≤ maxs∈S a(s), and we conclude from (1.47) using the
equivalence of (1.46) and (1.48). 

As a first example of application, let us consider the
 case where M =
N (N − 1)/2, S = ΣN , and, for s = σ ∈ S, wσ = exp h i≤N σi and
 
β
a(σ) = √ σi σj .
N 1≤i<j≤N

Therefore  1/2 
β N (N − 1) N
a(σ) = √ ≤β .
N 2 2
It follows from (1.53) that the partition function ZN of the Hamiltonian
(1.12) satisfies

t2 
P(| log ZN − E log ZN | ≥ t) ≤ 2 exp − . (1.54)
2β 2 N

If UN = N −1 log ZN , we can rewrite this as

t2 N 
P(|UN − EUN | ≥ t) ≤ 2 exp − .
2β 2
1.3 Gaussian Interpolation and the Smart Path Method 19

The right hand side starts to become small for t about N −1/2 , i.e. it is
unlikely that UN will differ from its expectation by more than a quantity of
order N −1/2 . In words, the fluctuations of the quantity UN = N −1 log ZN
are typically of order at most N −1/2 , while the quantity itself is of order 1.
This quantity is “self-averaging”, a fundamental fact, as was first mentioned
on page 7.
Let us try now to use (1.40) to compare two Gaussian Hamiltonians. This
technique is absolutely fundamental. It will be first used to make precise the
intuition of F. Guerra mentioned on page 12, but at this stage we try to obtain
a result that can also be used in other situations. We take M = 2N = cardΣN .
We consider two jointly Gaussian families u = (uσ ) and v = (vσ ) (σ ∈ ΣN ),
which we assume to be independent from each other. We recall the notation
√ √
uσ (t) = tuσ + 1 − tvσ ; u(t) = (uσ (t))σ ,

and we set
1
U (σ, τ ) = (Euσ uτ − Evσ vτ ) . (1.55)
2
Then (1.40) asserts that for a (well-behaved) function F on RM , if ϕ(t) =
EF (u(t)) we have
 ∂2F
ϕ (t) = E U (σ, τ ) (u(t)) . (1.56)
σ,τ
∂xσ ∂xτ

Let us assume that we are given numbers wσ > 0. For x = (xσ ) ∈ RM let us
define 
1
F (x) = log Z(x) ; Z(x) = wσ exp xσ . (1.57)
N σ

Thus, if σ = τ we have

∂2F 1 wσ wτ exp(xσ + xτ )
(x) = − ,
∂xσ ∂xτ N Z(x)2

while if σ = τ we have
 
∂2F 1 wσ exp xσ w2 exp 2xσ
(x) = − σ .
∂x2σ N Z(x) Z(x)2

Exercise 1.3.6. Prove that the function F and its partial derivatives of
order 1 satisfy the “moderate growth condition” (A.18). (Hint: Use a simple
bound from below on Z(x), such as Z(x) ≥ wτ exp xτ for a given τ in ΣN .)
This exercise shows that it is legitimate to use (1.56) to compute the
derivative of ϕ(t) = EF (u(t)), which is therefore
20 1. The Sherrington-Kirkpatrick Model
 
1 1
ϕ (t) = E U (σ, σ)wσ exp uσ (t)
N Z(u(t)) σ
 
1
− U (σ, τ )wσ wτ exp(uσ (t) + uτ (t)) . (1.58)
Z(u(t))2 σ,τ

Let us now denote by


· t

an average for the Gibbs measure with Hamiltonian


√ √
− Ht (σ) = uσ (t) + log wσ = tuσ + 1 − tvσ + log wσ . (1.59)

Any function f on ΣN satisfies the formula


 
f (σ) exp(−Ht (σ)) σ wσ f (σ) exp uσ (t)
f t= 
σ
=  .
σ exp(−H t (σ)) σ wσ exp uσ (t)

The notation · t will be used many times in the sequel. It would be nice to
remember now that the index t simply refers to the value of the interpolating
parameter. This will be the case whenever we use an interpolating Hamil-
tonian. If you forget the meaning of a particular notation, you might try to
look for it in the glossary or the index, that attempt to list for many of the
typical notations the page where it is defined.
Thus (1.58) simply means that
1
ϕ (t) = (E U (σ, σ) t − E U (σ 1 , σ 2 ) t ) . (1.60)
N
In the last term the bracket is a double integral for Gibbs’ measure, and the
variables are denoted σ 1 and σ 2 rather than σ and τ .
The very general formula (1.60) applies to the interpolation between any
two Gaussian Hamiltonians, and is rather fundamental in the study of such
Hamiltonians.
We should observe for further use that (1.60) even holds if the quantities
wσ are random, provided their randomness is independent of the randomness
of uσ and vσ . This is seen by proving (1.60) at wσ given, and taking a further
expectation in the randomness of these quantities. (When doing this, we
permute expectation in the r.v.s wσ and differentiation in t. Using Proposition
A.2.1 this is permitted by the fact that the quantity (1.60) is uniformly
bounded over all choices of (wσ ) by (1.65) below.)
The consideration of Hamiltonians such as (1.29) shows that it is natural
to consider “random external fields”. That is, we consider an i.i.d. sequence
(hi )i≤N of random variables, having the same distribution as a given r.v. h
(with moments of all orders). We assume that this sequence is independent
of all the other r.v.s. Rather than the Hamiltonian (1.12) we consider instead
the more general Hamiltonian
1.3 Gaussian Interpolation and the Smart Path Method 21

β  
− HN (σ) = √ gij σi σj + hi σi . (1.61)
N i<j i≤N

There is presently nothing to change either to the notation or the proofs to


consider this more general case, so this will be our setting. Whenever extra
work would be needed to handle this case, we will come back to the case
where hi is non-random.
Since there are now two sources of randomness in the disorder, namely
the gij and the hi , this is the place to mention that throughout the book,
and unless it is explicitly specified otherwise, as is absolutely standard, the
notation E will stand for expectation over all these sources of randomness.
When we have two (or more) independent sources of randomness like here,
and we want to take expectation only on, say, the r.v.s gij , we will say just
that, or (as probabilists often do) that we take expectation conditionally on
the r.v.s hi , or given the r.v.s hi .
To compare (following Guerra) the Hamiltonian (1.61) with the simpler
Hamiltonian (1.29) we use (1.60) in the case
 
β   √
uσ = √ gij σi σj ; vσ = β zi qσi ; wσ = exp hi σi ,
N i<j≤N i≤N i≤N
(1.62)
where 0 ≤ q ≤ 1 is a parameter. Recalling the fundamental notation (1.5),
relation (1.6) implies
β2  
Euσ1 uσ2 = 2
N R1,2 −1 (1.63)
2
and
Evσ1 vσ2 = N β 2 qR1,2 .
Recalling (1.55), we have
 
1 β2 1 β2
U (σ 1 , σ 2 ) = R1,2 −
2
− qR1,2 , (1.64)
N 4 N 2
and since R1,2 (σ, σ) = 1, we get
β2
ϕ (t) = (1 − 2q − E R1,2
2
t + 2qE R1,2 t )
4
β2 β2
= − E (R1,2 − q)2 t + (1 − q)2 . (1.65)
4 4
A miracle has occurred. The difficult term is negative, so that
β2
ϕ (t) ≤ (1 − q)2 . (1.66)
4
Needless to say, such a miracle will not occur for many models
 1 of interest, so
we better enjoy it while we can. The relation ϕ(1) = ϕ(0) + 0 ϕ (t)dt implies
22 1. The Sherrington-Kirkpatrick Model

β2
ϕ(1) ≤ ϕ(0) + (1 − q)2 . (1.67)
4
When considering an interpolating Hamiltonian Ht we will always lighten
notation by writing H0 rather than Ht=0 . Recalling the choice of vσ in (1.62)
it follows from (1.59) that
 √
− H0 (σ) = σi (βzi q + hi ) , (1.68)
i≤N

and, as in (1.33), we obtain



ϕ(0) = log 2 + E log ch(βz q + h) , (1.69)

where of course the expectation is over the randomness of z and h.


Let us now consider the partition function of the Hamiltonian (1.61),
  
β  
ZN (β, h) = exp √ gij σi σj + hi σi . (1.70)
σ N i≤N i≤N

Here we have chosen convenient but technically incorrect notation. The no-
tation (1.70) is incorrect, since ZN (β, h) depends on the actual realization
of the r.v.s hi , not only on h. Speaking of incorrect notation, we will go one
step further and write
1
pN (β, h) := E log ZN (β, h) . (1.71)
N
The expectation in the right hand side is over all sources of randomness, in
this case the r.v.s hi , and (despite the notation) the quantity pN (β, h) is a
number depending only on β and the law of h. If L(h) denotes the law of
h, it would probably be more appropriate to write pN (β, L(h)) rather than
pN (β, h). The simpler notation pN (β, h) is motivated by the fact that the
most important case (at least in the sense that it is as hard as the general
case) is the case where h is constant. If this notation disturbs you, please
assume everywhere that h is constant and you will not lose much.
Thus with these notations we have

pN (β, h) = ϕ(1) . (1.72)

In the statement of the next theorem E stands as usual for expectation in all
sources of randomness, here the r.v.s z and h. This theorem is a consequence
of (1.72), (1.67) and (1.69).
Theorem 1.3.7. (Guerra’s replica-symmetric bound). For any choice
of β, h and q we have
√ β2
pN (β, h) ≤ log 2 + E log ch(βz q + h) + (1 − q)2 . (1.73)
4
1.3 Gaussian Interpolation and the Smart Path Method 23

Again, despite the notation, the quantity pN (β, h) is a number. The ex-
pression “replica-symmetric” is physics’ terminology. Its meaning might grad-
ually become clear. The choice q = 0, h constant essentially recovers (1.25).
It is now obvious what is the best choice of q: the choice that minimizes
the right-hand side of (1.73), i.e.
 
βz √ β2 β2 1
0 = E √ th(βz q + h) − (1 − q) = E 2 √ − (1 − q) ,
2 q 2 2 ch (βz q + h)

using Gaussian integration by parts (A.14). Since ch−2 (x) = 1 − th2 (x), this
means that we have the absolutely fundamental relation

q = Eth2 (βz q + h) . (1.74)

Of course at this stage this equation looks rather mysterious. The mystery will
gradually recede, in particular in (1.105) below. The reader might wonder at
this stage why we do not give a special name, such as q ∗ , to the fundamental
quantity defined by (1.74), to distinguish it from the generic value of q. The
reason is simply that in the long range it is desirable that the simplest name
goes to the most used quantity, and the case where q is not the solution of
(1.74) is only of some limited interest.
It will be convenient to know that the equation (1.74) has a unique solu-
tion.
Proposition 1.3.8. (Latala-Guerra) The function

th2 (z x + h)
Ψ (x) = E
x
is strictly decreasing on R+ and vanishes as x → ∞. Consequently if Eh2 > 0
there is a unique solution to (1.74).
The difficult part of the statement is the proof that the function Ψ is
strictly decreasing. In that case, since limx→0+ xΨ (x) = Eth2 h > 0, we have
limx→0+ Ψ (x) = ∞, and since limx→∞ Ψ (x) = 0 there is a unique solution to
the equation Ψ (x) = 1/β 2 and hence (1.74). But Eth2 h = 0 only when h = 0
a.e. (in which case when β > 1 there are 2 solutions to (1.74), one of which
being 0).
Proposition 1.3.8 is nice but not really of importance. The proof is very
beautiful, but rather tricky, and the tricky ideas are not used anywhere else.
To avoid distraction, we postpone this proof until Section A.14. At this stage
we give the proof only in the case where β < 1, because the ideas of this
simple argument will be used√ again and again. Given a (smooth) function f
the function ψ(x) = Ef (βz x + h) satisfies

z √ β2 √
ψ (x) = βE √ f (βz x + h) = Ef (βz x + h) , (1.75)
2 x 2
24 1. The Sherrington-Kirkpatrick Model

using Gaussian integration by parts (A.14). We use this for the function
f (y) = th2 y, that satisfies

thy 1 − 2sh2 y
f (y) = 2 ; f (y) = 2 ≤2.
ch2 y ch4 y

Thus, if β < 1, we deduce from (1.75) that the function ψ(q) = Eth2 (βz q +
h) satisfies ψ (q) < 1. This function maps the unit interval into itself, so that
it has a unique fixed point.
Let us denote by SK(β, h) the right-hand side of (1.73) when q is as in
(1.74). As in the case of pN (β, h) this is a number depending only on β and
the law of h. Thus (1.73) implies that

pN (β, h) ≤ SK(β, h) . (1.76)

We can hope that when q satisfies (1.74) there is near equality in (1.76),
so that the right hand-side of (1.76) is not simply a good bound for pN (β, h),
but essentially the value of this quantity as N → ∞. Moreover,
1 we have a
clear road to prove this, namely (see (1.65)) to show that 0 E (R1,2 − q)2 t dt
is small. We will pursue this idea in Section 1.4, where we will prove that
this is indeed the case when β is not too large. The case of large β (low
temperature) is much more delicate, but will be approached in Volume II
through a much more elaborated version of the same ideas.
Theorem 1.3.9. (Guerra-Toninelli [75]) For all values of β, h, the se-
quence (N pN (β, h))N ≥1 is superadditive, that is, for integers N1 and N2 we
have
N1 N2
pN1 +N2 (β, h) ≥ pN1 (β, h) + pN (β, h). (1.77)
N1 + N2 N1 + N2 2
Consequently, the limit

p(β, h) = lim pN (β, h) (1.78)


N →∞

exists.
Of course this does not tell us what is the value of p(β, h), although we know
by (1.76) that p(β, h) ≤ SK(β, h).
Proof. Let N = N1 +N2 . The idea is to compare the SK Hamiltonian of size
N with two non-interacting SK Hamiltonians of sizes N1 and N2 . Consider
uσ as in (1.62) and

β  β 
vσ = √ gij σi σj + √ gij σi σj ,
N1 i<j≤N1
N2 N1 <i<j≤N
1.3 Gaussian Interpolation and the Smart Path Method 25

where gij are i.i.d. standard Gaussian r.v.s independent of the


r.v.s gij . Con-
sidering ϕ as in (1.39) with F (x) as in (1.57) and wσ = exp i≤N hi σi , we
have
N1 N2
ϕ(0) = pN1 (β, h) + pN2 (β, h)
N N
ϕ(1) = pN (β, h) .

Let us recall yet another time the fundamental notation (1.9),



R1,2 = N −1 σi1 σi2 ,
i≤N

and let us define similarly


 
R = N1−1 σi1 σi2 ; R = N2−1 σi1 σi2 ,
i≤N1 N1 <i≤N

so that
N1 N2
R1,2 = R + R .
N N
The convexity of the function x → x2 implies
N1 2 N2
2
R1,2 ≤ R + R 2
. (1.79)
N N
Rather than (1.64), a few lines of elementary algebra now yield
 
1 β2 N1 2 N2 2 1
1 2
U (σ , σ ) = R1,2 −
2
R − R + . (1.80)
N 4 N N N

When σ 1 = σ 2 we have R1,2 = R = R = 1, so that (1.60) entails

β2 N1 2 N2
ϕ (t) = − 2
E R1,2 − R − R 2
≥0
4 N N t

by (1.79). The fact that limN →∞ rN /N exists for a superadditive sequence


(rN ) is classical. It is called “Fekete’s lemma” and is even mentioned (with a
reference to the original paper) in Wikipedia. 


Exercise 1.3.10. Carry out the proof of (1.80).

Generally speaking it seems plausible that “all limits exist”. Some infor-
mation can be gained using an elementary fact known as Griffiths’ lemma
in statistical mechanics. This is, if a sequence ϕN of convex (differentiable)
functions converges pointwise in an interval to a (necessarily convex) func-
tion ϕ, then limN →∞ ϕN (x) = ϕ (x) at every point x for which ϕ (x) exists
(which is everywhere outside a countable set of possible exceptional values).
26 1. The Sherrington-Kirkpatrick Model

If Griffiths’ lemma does not seem obvious to you, please do not worry, for the
time being this is only a side story, the real point of it being a pretense to
introduce Lemma 1.3.11 below, a step in our learning of Gaussian integration
by parts. Later on, in Volume II, we will use quantitative versions of Griffiths’
lemma with complete proofs.
It is a special case of Hölder’s inequality that the function

β → log f β dμ

is convex (whenever f > 0) for any probability measure μ. Indeed, this means
that for 0 < a < 1 and β1 , β2 > 0 we have
   
a 1−a
f aβ1 +(1−a)β2 dμ ≤ f β1 dμ f β2 dμ ,

and setting U = f β1 and V = f β2 this is the inequality


 a  1−a
U V a 1−a
dμ ≤ U dμ V dμ .

Consequently (thinking of the sum in (1.70) as an integral) the function


1
β → Φ(β) = log ZN (β, h) (1.81)
N
is a convex random function (a fact that will turn out to be essential much
later). An alternative proof of the convexity of the function Φ, more in line
with the ideas of statistical mechanics, is as follows. As in (1.62) we define
1 
uσ = √ gij σi σj . (1.82)
N i<j≤N
 
Let wσ = exp i≤N hi σi , so that ZN := ZN (β, h) = σ wσ exp βuσ and

N Φ(β) = log wσ exp βuσ .
σ

Thus
1 
N Φ (β) = wσ uσ exp βuσ ( = uσ ) (1.83)
ZN σ
and
 2
1  1 
N Φ (β) = wσ uσ exp βuσ −
2
wσ uσ exp βuσ
ZN σ ZN σ
= u2σ − uσ 2
≥0,
1.3 Gaussian Interpolation and the Smart Path Method 27

where the last inequality is simply the Cauchy-Schwarz inequality used in the
probability space (ΣN , GN ).
In particular pN (β, h) is a convex function of β. By Theorem 1.3.9,
p(β, h) = limN →∞ pN (β, h) exists. The function β → p(β, h) is convex and
therefore is differentiable at every point outside a possible countable set of
exceptional values. Now, we have the following important formula.

Lemma 1.3.11. For any value of β we have


∂ β
pN (β, h) = (1 − E R1,2
2
). (1.84)
∂β 2
2
Thus Griffiths’ lemma proves that, given h, limN →∞ E R1,2 exists for each
value of β where the map β → p(β, h) is differentiable.
It is however typically much more difficult to prove that no such excep-
tional values exist. We will be able to prove it after considerable work in
Volume II.
Proof of Lemma 1.3.11. We recall (1.82). Defining

R(σ, τ ) = N −1 σi τi ,
i≤N

we can rewrite (1.63) (where we take β = 1) as


1
Euσ uτ = (N R(σ, τ )2 − 1) .
2

Let again wσ = exp i≤N hi σi , so that taking expectation in (1.83) yields

∂ 1 σ wσ uσ exp βuσ
pN (β, h) = E 
∂β N τ wτ exp βuτ
1  wσ exp βuσ
= Euσ  . (1.85)
N σ τ wτ exp βuτ

To compute
wσ exp βuσ
Euσ  ,
τ wτ exp βuτ
we first think of the quantities wτ as being fixed numbers, with wτ > 0. We
then apply the Gaussian integration by parts formula (A.17) to the jointly
Gaussian family (uτ )τ and the function

wσ exp βxσ
Fσ (x) =  ,
τ wτ exp βxτ

to get
28 1. The Sherrington-Kirkpatrick Model

1 wσ exp βuσ β 1 wσ exp βuσ


Euσ  = 1− E
N w
τ τ exp βu τ 2 N τ wτ exp βuτ
β  wσ wτ (R(σ, τ )2 − 1/N ) exp β(uσ + uτ )
− E  2 .
2 τ ( τ wτ exp βuτ )

 in the randomness of the r.v.s hi , this equality


Taking a further expectation
remains true if wσ = exp i≤N hi σi , and using this formula in (1.85) we get
the result. 

Although the reader might not have noticed it, in the proof of (1.84) we
have done something remarkable, and it is well worth to spell it out. Looking
at the definition of the Hamiltonian, it would be quite natural to think of the
quantity
1
log ZN (β, h)
N
as a function of the N (N − 1)/2 r.v.s gij . Instead, we have been thinking of
it as a function of the much larger family of the 2N r.v.s uσ . Such a shift in
point of view will be commonplace in many instances where we will use the
Gaussian integration by parts formula (A.17). The use of this formula can
greatly simplify if one uses a clever choice for the Gaussian family of r.v.s.

Exercise 1.3.12. To make clear the point of the previous remark, derive
formula (1.84) by considering ZN as a function of the r.v.s (gij ).

Of course, after having proved (1.60), no great inventiveness was required


to think of basing the integration by parts on the family (uσ ), in particular in
view of the following (that reveals that the only purpose of the direct proof
of Lemma 1.3.11 we gave was to have the reader think a bit more about
Gaussian integration by parts (A.17)).

Exercise 1.3.13. Show that (1.84) can in fact be deduced from (1.60). Hint:
use uσ as in (1.62) but take now vσ = 0.

As the next exercise shows, the formula (1.84) is not an accident, but a
first occurrence of a general principle that we will use a great many times
later. In the long range the reader would do well to really master this result.
Exercise 1.3.14. Consider a jointly Gaussian family of r.v.s (HN (σ))σ∈ΣN
and another family (HN (σ))σ∈ΣN of r.v.s. These two families are assumed
to be independent of each other. Let
1 
pN (β) = E log exp(−βHN (σ) − HN (σ)) .
N σ

Prove that
d β
pN (β) = (E U (σ, σ) − E U (σ 1 , σ 2 ) ) ,
dβ N
1.4 Latala’s Argument 29

where U (σ 1 , σ 2 ) = EHN (σ 1 )HN (σ 2 ), and where the bracket is an average


for the Gibbs measure with Hamiltonian βHN + HN . Prove that this formula
generalizes (1.84).
3
Research Problem 1.3.15. (Level 2) Prove that limN →∞ E R1,2 exists
for all β.
In fact, it does not even seem to have been shown that, given h, there exist
values of β beyond the Almeida-Thouless line of Section 1.9 where this limit
exists.
To help the newcomer to the area, the research problems are ranked level 1
to level 3. The solution to level 1 problems should be suitable for publication,
but the author feels that they can be successfully attacked by the methods
explained in this book, or simple extensions of these methods. To put it
another way, the author feels that he would be rather likely to solve them in
(expected) finite time if he tried (which he won’t). Level 2 problems are more
likely to require ingredients substantially beyond what is found in the book.
On the other hand these problems do not touch what seem to be the central
issues of spin glass theory, and there is no particular reason to think that
they are very hard. Simply, they have not been tried. Level 3 problems seem
to touch essential issues, and there is currently no way of telling how difficult
they might be. It goes without saying that this classification is based on the
author’s current understanding, and comes with no warranty whatsoever. (In
particular, problems labeled level 2 as the above might well turn out to be
level 3.)

1.4 Latala’s Argument


It will turn out in many models that at high temperature “the overlap is
essentially constant”. That is, there exists a number q, depending only on the
system, such that if one picks two configurations σ 1 and σ 2 independently
according to Gibbs’ measure, one observes that typically
R1,2  q . (1.86)
The symbol  stands of course for approximate equality. It will often be
used in our informal explanations, and in each case its precise meaning will
soon become apparent. It is not surprising in the least that a behavior such
as (1.86) occurs. If we remember that N R1,2 = i≤N σi1 σi2 , and if we expect
to have at least some kind of “weak independence” between the sites, then
(1.86) should hold by the law of large numbers. The reader might have also
observed that a condition of the type (1.86) is precisely what is required to
nullify the dangerous term E (R1,2 − q)2 t in (1.65).
What is not intuitive is that (1.86) has very strong consequences. In par-
ticular it implies that at given typical disorder, a few spins are nearly inde-
pendent under Gibbs measure, as is shown in Theorem 1.4.15 below. (The
30 1. The Sherrington-Kirkpatrick Model

expression “a few spins” means that we consider a fixed number of spins, and
then take N very large.) For many of the models we will study the proof of
(1.86) will be a major goal, and the key step in the computation of pN .
In this section we present a beautiful (unpublished!!) argument of R.
Latala that probably provides the fastest way to prove (1.86) for the SK
model at high enough temperature (i.e. β small enough). This argument is
however not easy to generalize in some directions, and we will learn a more
versatile method in Section 1.6.
From now on we lighten notation by writing ν(f ) for E f . In this section
the Gibbs measure is relative to the Hamiltonian (1.61), that is

β  
−HN (σ) = √ gij σi σj + hi σi .
N i<j i≤N

The next theorem provides a precise version of (1.86), in the form of a strong
exponential inequality.

Theorem 1.4.1. Assume β < 1/2. Then for 2s < 1 − 4β 2 we have


  1
ν exp sN (R1,2 − q)2 ≤ , (1.87)
1 − 2s − 4β 2

where q is the unique solution of (1.74), i.e. q = Eth2 (βz q + h).

Of course here to lighten notation we write exp sN (R1,2 − q)2 rather than
exp(sN (R1,2 − q)2 ). Since exp x ≥ xk /k! for x ≥ 0 and k ≥ 1, this shows that

1 1
ν((sN )k (R1,2 − q)2k ) ≤ ,
k! 1 − 2s − 4β 2

so that, since k! ≤ kk ,

1 ks k
ν((R1,2 − q)2k ) ≤ ,
1 − 2s − 4β 2 N

and in particular
  Kk k
ν (R1,2 − q)2k ≤ , (1.88)
N
where K does not depend on N or k.
The important relationship between growth of moments and exponential
integrability is detailed in Section A.6. This relation is explained there for
probabilities. It is perfectly correct to think of ν (and of its avatar νt defined
below) as being the expectation for a certain probability. This can
be made formal. We do not explain this since it requires an extra level of
abstraction that does not seem very fruitful.
An important special case of (1.88) is:
1.4 Latala’s Argument 31
 
ν (R1,2 − q)2 ≤ K/N . (1.89)

Equation (1.88) is the first of very many that involve an unspecified con-
stant K. There are several reasons why it is desirable to use such constants. A
clean explicit value might be hard to get, or, like here, it might be irrelevant
and rather distracting. When using such constants, it is understood through-
out the book that their value might not be the same at each occurrence. The
use of the word “constant” to describe K is because this number is never,
ever permitted to depend on N . On the other hand, it is typically permitted
to depend on β and h. Of course we will try to be more specific when the
need arises. An unspecified constant that does not depend on any parameter
(a so-called universal constant) will be denoted by L,and the value of this
quantity might also not be the same at each occurrence (as e.g. in the relation
L = L + 2). Of course, K0 , L1 , etc. denote specific quantities. These conven-
tions will be used throughout the book and it surely would help to remember
them from now on.
It is a very non-trivial question to determine the supremum of the values
of β for which one can control ν(exp sN (R1,2 − q)2 ) for some s > 0, or the
supremum of the values of β for which (1.89) holds. (It is believable that these
are the same.) The method of proof of Theorem 1.4.1 does not allow one to
reach this value, so we do not attempt to push the method to its limit, but
rather to give a clean statement. There is nothing magic about the condition
β < 1/2, which is an artifact of the method of proof. In Volume II, we will
prove that actually (1.88) holds in a much larger region.
We now turn to a general principle of fundamental importance. We go
back to the general case of Gaussian families (uσ ) and (vσ ), and for σ ∈ ΣN
we consider a number wσ > 0. We recall that we denote by

· t

an average for the Gibbs measure with Hamiltonian (1.59), that is,
√ √
−Ht (σ) = tuσ + 1 − tvσ + log wσ = uσ (t) + log wσ .
n
Then, for a function f on ΣN (= (ΣN )n ) we have
  
−n
f t = Z(u(t)) f (σ , . . . , σ )wσ1 · · · wσn exp
1 n
uσ (t) ,
σ 1 ,...,σ n ≤n

where Z(u(t)) = σ wσ exp uσ (t). We write

d
νt (f ) = E f t ; νt (f ) = (νt (f )) .
dt
The general principle stated in Lemma 1.4.2 below provides an explicit for-
mula for νt (f ). It is in a sense a straightforward application of Lemma 1.3.1.
32 1. The Sherrington-Kirkpatrick Model

However, since Lemma 1.3.1 requires computing the second partial deriva-
tives of the function F , when this function is complicated, (e.g. is a quotient
of 2 factors) we must face the unavoidable fact that this will produce for-
mulas that are not as simple as we might wish. We should be well prepared
for this, as we all know that computing derivatives can lead to complicated
expressions.
We recall the function of two configurations U (σ, τ ) given by (1.55), that
is, U (σ, τ ) = 1/2(Euσ uτ − Evσ vτ ). Thus, in the formula below, the quantity

U (σ  , σ  ) is
 1
U (σ  , σ  ) = (Euσ uσ − Evσ vσ ) .
2
We also point out that in this formula, to lighten notation, f stands for
f (σ 1 , . . . , σ n ).
n
Lemma 1.4.2. If f is a function on ΣN (= (ΣN )n ), then
  
νt (f ) = νt (U (σ  , σ  )f ) − 2n νt (U (σ  , σ n+1 )f )
1≤, ≤n ≤n

− nνt (U (σ n+1
,σ n+1
)f ) + n(n + 1)νt (U (σ n+1 , σ n+2 )f ) . (1.90)

This formula looks scary the first time one sees it, but one should observe
that the right-hand side is a linear combination of terms of the same nature,
each of the type
 
νt (U (σ  , σ  )f ) = E U (σ  , σ  )f (σ 1 , . . . , σ n ) t .

The complication is purely algebraic (as it should be). One can observe that
even though f depends only on n replicas, (1.90) involves two new indepen-
dent replicas σ n+1 and σ n+2 .
We will use countless times a principle called symmetry between repli-
cas, a name not to be confused with the expression “replica-symmetric”.
This principle asserts e.g. that ν(f (σ 1 )U (σ 1 , σ 2 )) = ν(f (σ 1 )U (σ 1 , σ 3 )). The
reason for this is simply that the sequence (σ  ) is an i.i.d. sequence under
Gibbs’ measure, so that for any permutation π of the replica indices, and any
function f (σ 1 , . . . , σ n ), one has f (σ 1 , . . . , σ n ) = f (σ π(1) , . . . , σ π(n) ) , and
hence taking expectation,

ν(f (σ 1 , . . . , σ n )) = ν(f (σ π(1) , . . . , σ π(n) )) .


n
In particular, if f is a function on ΣN , then the value of ν(U (σ  , σ r )f ) does
not depend on the value of r if r ≥ n + 1. Similarly, if  >  > n, the value

of ν(U (σ  , σ  )f ) does not depend on  or  .
n
Exercise 1.4.3. a) Let us take for f the function on ΣN that is constant
equal to 1. Then f t = 1, so that νt (f ) = 1 for each t and hence νt (f ) = 0.
Prove that in that case the right-hand side of (1.90) is 0.
1.4 Latala’s Argument 33

n n
b) A function f on ΣN can also be seen as a function on ΣN for n > n.
Prove that the right-hand sides of (1.90) computed for n and n coincide (the
extra terms in the case of n cancel out).

Proof of Lemma 1.4.2. Consider as before x = (xσ ) and let



Z(x) = wσ exp xσ
σ
  
F (x) = Z(x)−n wσ1 · · · wσn f (σ 1 , . . . , σ n ) exp xσ .
σ 1 ,...,σ n ≤n

We recall the formula (1.56):


 ∂2F
ϕ (t) = E U (σ, τ ) (u(t)) ,
σ,τ
∂xσ xτ

that we will apply to this function, carefully collecting the terms. Let us set
  
F1 (x) = wσ1 · · · wσn f (σ , . . . , σ ) exp
1 n
xσ ,
σ 1 ,...,σ n ≤n

so that
F (x) = Z(x)−n F1 (x) ,
and therefore
∂F ∂F1 ∂Z
(x) = Z(x)−n (x) − nZ −n−1 (x) (x)F1 (x) .
∂xσ ∂xσ ∂xσ
Consequently,

∂2F ∂ 2 F1
(x) = Z(x)−n (x)
∂xσ ∂xτ ∂xσ ∂xτ
 
∂Z ∂F1 ∂Z ∂F1
− nZ(x)−n−1 (x) (x) + (x) (x)
∂xσ ∂xτ ∂xτ ∂xσ
2
∂ Z
− nZ(x)−n−1 (x)F1 (x)
∂xσ ∂xτ
∂Z ∂Z
+ n(n + 1)Z(x)−n−2 (x) (x)F1 (x) . (1.91)
∂xσ ∂xτ
Each of the four terms of (1.91) corresponds to a term in (1.90). We will
explain this in detail for the first and the last terms. We observe first that
∂Z
(x) = wσ exp(xσ ) ,
∂xσ
so that the last term of (1.91) is
34 1. The Sherrington-Kirkpatrick Model

C(σ, τ , x) := n(n + 1)Z(x)−n−2 wσ wτ exp(xσ + xτ )F1 (x) .


Consequently,

U (σ, τ )C(σ, τ , u(t)) (1.92)
σ,τ

= n(n + 1)Z(u(t))−n−2 U (σ, τ )wσ wτ exp(uσ (t) + uτ (t))F1 (u(t)) .
σ,τ

Recalling the value of F1 (x), we obtain


      
F1 (u(t)) = f (σ 1 , . . . , σ n ) wσ exp uσ (t) ,
σ 1 ,...,σ n 1≤≤n 1≤≤n

and using this in the second line below we find



U (σ, τ )wσ wτ exp(uσ (t) + uτ (t))F1 (u(t))
σ,τ

= U (σ n+1 , σ n+2 )wσn+1 wσn+2 exp(uσn+1 (t) + uσn+2 (t))F1 (u(t))
σ n+1 ,σ n+2
    
= U (σ n+1 , σ n+2 )f (σ 1 , . . . , σ n ) wσ exp uσ (t) .
σ 1 ,...,σ n+2 1≤≤n+2 1≤≤n+2

It is of course in this computation that the new independent replicas occur.


Combining with (1.92), we get, by definition of · t ,

U (σ, τ )C(σ, τ , u(t)) = n(n + 1) U (σ n+1 , σ n+2 )f (σ 1 , . . . , σ n ) t ,
σ,τ

so that E σ,τ U (σ, τ )C(σ, τ , u(t)) is indeed the last term of (1.90).
Let us now treat in detail the contribution of the first term of (1.91). We
have
∂ 2 F1 
(x) = C, (σ, τ , x) ,
∂xσ ∂xτ  , ≤n

where

C, (σ, τ , x)
  
= 1{σ =σ} 1{σ =τ } wσ1 · · · wσn f (σ 1 , . . . , σ n ) exp xσ1 ,
σ 1 ,...,σ n 1 ≤n

and where 1{σ =σ} = 1 if σ  = σ and is 0 otherwise. Therefore


1.4 Latala’s Argument 35

Z(u(t))−n U (σ, τ )C, (σ, τ , u(t)) (1.93)
σ,τ
  
−n 
= Z(u(t)) U (σ , σ )wσ1 · · · wσn f (σ , . . . , σ ) exp
 1 n
uσ1 (t)
σ 1 ,...,σ n 1 ≤n
  1 n
= U (σ , σ )f (σ , . . . , σ ) t , (1.94)

and the contribution of the second term of (1.91) is indeed the second term
of (1.90). The case of the other terms is similar. 


Exercise 1.4.4. In the proof of Lemma 1.4.2 write in full detail the contri-
bution of the other terms of (1.91).

The reader is urged to complete this exercise, and to meditate the proof
of Lemma 1.4.2 until she fully understands it. The algebraic mechanism at
work in (1.90) will occur on several occasions (since Gibbs’ measures are
intrinsically given by a ratio of two quantities). More generally, calculations
of a similar nature will be needed again and again.
It will often be the case that U (σ, σ) is a number that does not depend
on σ, in which case the third sum in (1.90) cancels the diagonal of the first
one, and (1.90) simplifies to

  
νt (f ) = 2 νt (U (σ  , σ  )f ) − n νt (U (σ  , σ n+1 )f )
1≤< ≤n ≤n

n(n + 1)
+ νt (U (σ n+1 , σ n+2 )f ) . (1.95)
2

What we have done in Lemma 1.4.2 is very general. We now go back to


the study of the Hamiltonian (1.61) and as in (1.62) we define
 
β   √
uσ = √ gij σi σj ; vσ = β zi qσi ; wσ = exp hi σi .
N i<j≤N i≤N i≤N

Then (1.90) still holds true despite the fact that the numbers wσ are now
random. This is seen by first using (1.90) at a given realization of the r.v.s
hi , and then taking a further expectation in the randomness of these. Let us

next compute in the present setting the quantities U (σ  , σ  ). Let us define
1   
R, = σi σi . (1.96)
N
i≤N

This notation will be used in the entire book a countless number of times.
We will also use countless times that by symmetry between replicas, we have
e.g. that ν(R1,2 ) = ν(R1,3 ) or ν(R1,2 R2,3 ) = ν(R1,2 R1,3 ). On the other hand,
36 1. The Sherrington-Kirkpatrick Model

if a function f depends only on σ 1 and σ 2 , it is true that ν((R1,3 − q)2 f ) =


ν((R1,4 − q)2 f ), but not in general that ν((R1,2 − q)2 f ) = ν((R1,3 − q)2 f ).
As in (1.64) we have
 
1  β2 1 β2
U (σ  , σ  ) = 2
R,  − − qR, .
N 4 N 2

Using (1.95) for n = 2, and completing the squares we get

N β2     
νt (f ) = νt (R1,2 − q)2 f − 2 νt (R,3 − q)2 f
2
≤2
 
+ 3νt (R3,4 − q)2 f . (1.97)

Up to Corollary 1.4.7 below, the results are true for every value of q, not
only the solution of (1.74).

Lemma 1.4.5. Consider any number λ > 0. Then


   
νt (R3,4 −q)2 exp λN (R1,2 −q)2 ≤ νt (R1,2 −q)2 exp λN (R1,2 −q)2 . (1.98)

Proof. First, we observe a general form of Hölder’s inequality,

νt (f1 f2 ) ≤ νt (f1τ1 )1/τ1 νt (f2τ2 )1/τ2 , (1.99)

for f1 , f2 ≥ 0, 1/τ1 + 1/τ2 = 1. This is obtained by using Hölder’s inequality


for the probability νt (or by using it successively for · t and then for E).
Using (1.99) with τ1 = k + 1, τ2 = (k + 1)/k we deduce that, using symmetry
between replicas in the second line,
 
νt (R3,4 − q)2 (R1,2 − q)2k
 1/k+1  k/k+1
≤ νt (R3,4 − q)2k+2 νt (R1,2 − q)2k+2
 
= νt (R1,2 − q)2k+2 .

To prove (1.98), we simply expand exp λN (R1,2 − q)2 as a power series of


(R1,2 − q)2 and we apply the preceding inequality to each term, i.e. we write

   (N λ)k  
νt (R3,4 − q)2 exp λN (R1,2 − q)2 = νt (R3,4 − q)2 (R1,2 − q)2k
k!
k≥0
 (N λ)k  
≤ νt (R1,2 − q)2k+2
k!
k≥0
 
= νt (R1,2 − q)2 exp λN (R1,2 − q)2 . 


Combining with (1.97) we get:


1.4 Latala’s Argument 37

Corollary 1.4.6. If λ > 0 then


   
νt exp λN (R1,2 − q)2 ≤ 2N β 2 νt (R1,2 − q)2 exp λN (R1,2 − q)2 . (1.100)

Corollary 1.4.7. For t < λ/2β 2 we have


 
d  
νt exp (λ − 2tβ )N (R1,2 − q)
2 2
≤0,
dt
or in other words, the function
 
t → νt exp (λ − 2tβ 2 )N (R1,2 − q)2 (1.101)

is non-increasing.
Proof. In the function (1.101) there are two sources of dependence in t,
through νt and through the term −2tβ 2 , so that
 
d  
νt exp (λ − 2tβ )N (R1,2 − q)
2 2
dt
 
= νt exp (λ − 2tβ 2 )N (R1,2 − q)2
 
− 2N β 2 νt (R1,2 − q)2 exp (λ − 2tβ 2 )N (R1,2 − q)2 ,

and we use (1.100). 




Proposition 1.4.8. When q is the solution of (1.74), for λ < 1/2 we have
  1
ν0 exp λN (R1,2 − q)2 ≤ √ . (1.102)
1 − 2λ
Whenever, like here, we state a result without proof or reference, the rea-
son is always that (unless it is an obvious corollary of what precedes) the proof
can be found later in the same section, but that we prefer to demonstrate its
use before giving this proof.
At this point we may try to formulate in words the idea underlying the
proof of Theorem 1.4.1: it is to transfer the excellent control (1.102) of R1,2 −q
for ν0 to ν1 using Lemma 1.4.2.
Proof of Theorem 1.4.1. Taking λ = s+2β 2 < 1/2 we deduce from (1.102)
and Corollary 1.4.7 that for all 0 ≤ t ≤ 1,
  1
νt exp (s + 2(1 − t)β 2 )N (R1,2 − q)2 ≤ ,
1 − 2s − 4β 2
because this is true for t = 0 and because the left-hand side is a non-increasing
function of t. Since s + 2(1 − t)β 2 ≥ s this shows that for each t (and in
particular for t = 1) we have
38 1. The Sherrington-Kirkpatrick Model
  1
νt exp sN (R1,2 − q)2 ≤ . 

1 − 2s − 4β 2

As a consequence (1.88) holds uniformly in t and in β ≤ β0 < 1/2, i.e.


 
νt (R1,2 − q)2k ≤ (Kk/N )k , (1.103)

where K does not depend on t or β.


Exercise 1.4.9. Prove that if ε = ±1 we have
1+q
ν0 (1{σi1 =σi2 =ε} ) =
4
and
1−q
ν0 (1{σi1 =−σi2 =ε} ) =.
4
These relations will never be used, so do not worry if you can’t solve this
exercise. Its purpose is to help learning to manipulate simple objects. Some
hints might be contained in the proof of (1.104) below.
Proof of Proposition 1.4.8. Let us first recall that ν0 is associated to the
Hamiltonian (1.68), so that for ν0 there is no correlation between sites, so

this is a (nice) exercise
 in Calculus. Let Yi = βzi q + hi , so (1.68) means
that −H0 (σ) = i≤N σi Yi . Recalling that · 0 denotes an average for the
Gibbs measure with Hamiltonian H0 , we get that σi 0 = thYi and, since
q = Eth2 Yi by (1.74) we have

ν0 (σi1 σi2 ) = E σi1 σi2 0 = E σi 2


0 = Eth2 Yi = q . (1.104)

At this point the probabilistically oriented reader should think of the se-
quence (σi1 σi2 )1≤i≤N as (under ν0 ) an i.i.d. sequence of {−1, 1}-valued r.v.s
of expectation q, for which all kinds of estimates are classical. Nonetheless
we give a simple self-contained proof. The main step of this proof is to show
that for every u we have
  N u2
ν0 exp N u(R1,2 − q) ≤ exp . (1.105)
2
Since (1.105) holds for every value of u it holds when u is a Gaussian r.v.
with Eu2 = 2λ/N , independent of all the other sources of randomness. Taking
expectation in u in (1.105) and using (A.11) yields (1.102).
To prove (1.105), we first evaluate
  
ν0 (exp N u(R1,2 − q)) = ν0 exp u (σi σi − q)
1 2

i≤N
  
= ν0 exp u(σi1 σi2 − q) , (1.106)
i≤N
1.4 Latala’s Argument 39

by independence between the sites. Using that when |ε| = 1 we have exp εx =
chx + shεx = chx + ε shx, we obtain

exp uσi1 σi2 = chu + σi1 σi2 shu ,

and thus
ν0 (exp uσi1 σi2 ) = chu + ν0 (σi1 σi2 ) shu . (1.107)
Therefore (1.104) implies
 
ν0 exp u(σi1 σi2 ) = chu + q shu ,

and consequently
 
ν0 exp u(σi1 σi2 − q) = exp (−qu) (chu + q shu) .

Now, for q ≥ 0 and all u we have

u2
(chu + qshu) exp(−qu) ≤ exp .
2
Indeed the function

f (u) = log(chu + qshu) − qu

satisfies f (0) = 0,
 2
shu + qchu shu + qchu
f (u) = − q ; f (u) = 1 −
chu + qshu chu + qshu

so that f (0) = 0 and f (u) ≤ 1, and therefore f (u) ≤ u2 /2. Thus

  u2
ν0 exp u(σi1 σi2 − q) ≤ exp
2
and (1.106) yields (1.105). This completes the proof. 

Let us recall that we denote by SK(β, h) the right-hand side of (1.73)
when q is as in (1.74). As in the case of pN (β, h) this is a number depending
only on β and the law of h.

Theorem 1.4.10. If β < 1/2 then

K
|pN (β, h) − SK(β, h)| ≤ , (1.108)
N
where K does not depend on N .
40 1. The Sherrington-Kirkpatrick Model

Thus, when β < 1/2, (1.73) is a near equality, and in particular p(β, h) =
limN →∞ pN (β, h) = SK(β, h). Of course, this immediately raises the question
as for which values of (β, h) this equality remains true. This is a difficult
question that will be investigated later. It suffices to say now that, given h,
the equality fails for large enough β, but this statement itself is far from being
obvious.
We have observed that, as a consequence of Hölder’s inequality, the func-
tion β → pN (β, h) is convex. It then follows from (1.108) that, when β < 1/2,
the function β → SK(β, h) is also convex. Yet, this is not really obvious on
the definition of this function. It should not be very difficult to find a calcu-
lus proof of this fact, but what is needed is to understand really why this is
the case. Much later, we will be able to give a complicated analytic expres-
sion (the Parisi formula) for limN →∞ pN (β, h), which is valid for any value
of β, and it is still not known how to prove by a direct argument that this
analytical expression is a convex function of β.
In a statement such as (1.108) the constant K can in principle depend on
β and h. It will however be shown in the proof that for β ≤ β0 < 1/2, it can
be chosen so that it does not depend on β or h.
Proof of Theorem 1.4.10. We have proved in (1.103) that if β ≤ β0 < 1/2
then νt ((R1,2 −q)2 ) ≤ K/N , where K depends on β0 only. Now (1.65) implies
 
 2 
ϕ (t) − β (1 − q)2  ≤ K ,
 4  N

where K depends on β0 only. Thus


 
 2 
ϕ(1) − ϕ(0) − β (1 − q)2  ≤ K ,
 4  N

and

ϕ(1) = pN (β, h) ; ϕ(0) = log 2 + E log ch(β q + h) . 


Theorem 1.4.10 controls the expected value (= first moment) of the quan-
tity N −1 log ZN (β, h) − SK(β, h). In Theorem 1.4.11 below we will be able
to accurately compute the higher moments of this quantity. Of course this
requires a bit more work. This result will not be used in the sequel, so it can
in principle be skipped at first reading. However we must mention that one of
the goals of the proof is to further acquaint the reader with the mechanisms
of integration by parts.
Let us denote by a(k) the k-th moment of a standard Gaussian r.v. (so
that a(0) = 1 = a(2), a(1) = 0 and, by integration by parts, a(k) = Egg k−1 =

(k − 1)a(k − 2)). Consider q as in (1.74) and Y = βz q + h. Let

β 2 q2
b = E(log chY )2 − (E log chY )2 − .
2
1.4 Latala’s Argument 41

Theorem 1.4.11. Assume that the r.v. h is Gaussian (not necessarily cen-
tered). Then if β < 1/2, for each k ≥ 1 we have
  k 
 
E 1 log ZN (β, h) − SK(β, h) − 1 a(k)bk/2  ≤ K
, (1.109)
 N N k/2  N (k+1)/2

where K does not depend on N .

Let us recall that, to lighten notation, we write


 k
1
E log ZN (β, h) − SK(β, h)
N
instead of  k 
1
E log ZN (β, h) − SK(β, h) .
N
A similar convention will be used whenever there is no ambiguity.
The case k = 1 of (1.109) recovers (1.108). We can view (1.109) as a
“quantitative central limit theorem”. With accuracy about N −1/2 , the k-th
moment of the r.v.
 
√ 1
N log ZN (β, h) − SK(β, h) (1.110)
N

is about that of bz where z is a standard Gaussian √ r.v. In particular the
r.v. (1.110) has in the limit the same law as the r.v. bz.
When dealing with central limit theorems, it will be convenient to denote
by O(k) any quantity A such that |A| ≤ KN −k/2 where K does not depend on
N (of course K will depend on k). This is very different from the “standard”
meaning of this notation (that we will never use). Thus we can write (1.109)
as
 k
1 1
E log ZN (β, h) − SK(β, h) = k/2 a(k)bk/2 + O(k + 1) . (1.111)
N N
Let us also note that

O(k)O() = O(k + ) ; O(2k)1/2 = O(k) . (1.112)

Lemma 1.4.12. If the r.v. h is Gaussian (not necessarily centered) then for
any value of β we have
 k
1 
E  log ZN (β, h) − pN (β, h) = O(k) . (1.113)
N

Moreover the constant K implicit in the notation O(k) remains bounded as


both β and the variance of h remain bounded.
42 1. The Sherrington-Kirkpatrick Model

The hypothesis that h is Gaussian can be considerably weakened; but


here is not the place for such refinements.
Proof. Let us write hi = cyi + d, where yi are i.i.d. centered Gaussian. The
key to the proof is that for t > 0 we have
 
t2
P(| log ZN (β, h) − N pN (β, h)| ≥ t) ≤ 2 exp − 2 .
4c N + 2β 2 (N − 1)
(1.114)
This can be proved by an obvious adaptation of the proof of (1.54). Equiv-
alently, to explain the same argument in a different way, let us think of the
quantity log ZN (β, h) as a function F of the variables (gij )i<j and (yi )i≤N .
It is obvious (by writing the value of the derivative) that
   
 ∂F   
  ≤ √β ;  ∂F  ≤ c .
 ∂gij  N  ∂yi


Thus the gradient ∇F of F satisfies ∇F 2 ≤ c2 N + β 2 (N − 1)/2 and (1.47)


implies (1.114). Then (1.114) yields
 
N t2
P(|X| ≥ t) ≤ 2 exp − 2 , (1.115)
A
where
1
X= log ZN (β, h) − pN (β, h)
N
and A2 = 4c2 +2β 2 . Now (1.113)
 ∞ follows from (1.115) by computing moments
through the formula EY k = 0 ktk−1 P(Y ≥ t)dt for Y ≥ 0, as is explained
in detail in Section A.6. 

Proof of Theorem 1.4.11. We use again the path (1.59). Let
1 
A(t) = log exp(−Ht (σ)) (1.116)
N σ
β2t
SK(t) = log 2 + E log chY + (1 − q)2
4
V (t) = A(t) − SK(t)
β 2 q2 t
b(t) = E(log chY )2 − (E log chY )2 − .
2
Thus, the quantities EA(t), SK(t) and b(t) correspond along the interpolation
respectively to the quantities pN (β, h), SK(β, h) and b. In this proof we write
O(k) for a quantity A such that |A| ≤ KN −k/2 where K does not depend on
N or of the interpolation parameter t.
Let us write explicitly the interpolating Hamiltonian (1.59) using (1.62):

β t   √ √
− Ht (σ) = √ gij σi σj + (hi + β 1 − tzi q)σi . (1.117)
N i<j≤N i≤N
1.4 Latala’s Argument 43

It is of
√the type (1.61), but we have replaced β by β t ≤ β and the r.v. h by

h + β 1 − tz q, where z is a standard Gaussian r.v. independent of h. Thus
(1.113) implies
EV (t)k = O(k) . (1.118)
We will prove by induction over k ≥ 1 that
1
EV (t)k = a(k)b(t)k/2 + O(k + 1) . (1.119)
N k/2
For t = 1 this is (1.111). To start the induction, we observe that by Theorem
1.4.10 and (1.117), this is true for k = 1. For the induction step, let us fix k
and assume that (1.119) has been proved for all k ≤ k − 1. Let us define

ψ(t) = EV (t)k .

The basic idea is to prove that


k(k − 1)
ψ (t) = b (t)EV (t)k−2 + O(k + 1) . (1.120)
2N
The induction hypothesis then yields
k(k − 1)
ψ (t) = b (t)a(k − 2)b(t)k/2−1 + O(k + 1)
2N k/2
1 k 
= k/2 a(k) b (t)b(t)k/2−1 + O(k + 1) . (1.121)
N 2
Assume now that we can prove that
1
ψ(0) = a(k)b(0)k/2 + O(k + 1) . (1.122)
N k/2
Then by integration of (1.121) we get
1
ψ(t) = a(k)b(t)k/2 + O(k + 1) , (1.123)
N k/2
which is (1.119). We now start the real proof, the first step of which is to
compute ψ (t). For a given number a we consider

ϕ(t, a) = E(A(t) − a)k , (1.124)

and we compute ∂ϕ(t, a)/∂t using (1.40). This is done by a suitable extension
of (1.60). Keeping the notation of this formula, as well as the notation (1.60),
consider the function W (x) = (x−a)k and for x = (xσ ), consider the function
 
1
F (x) = W log Z(x) .
N
Thus
44 1. The Sherrington-Kirkpatrick Model
 
∂F 1 wτ exp xτ 1
(x) = W log Z(x) .
∂xτ N Z(x) N
If xσ = xτ we then have
 
∂2F 1 wσ wτ exp(xσ + xτ ) 1
(x) = − W log Z(x)
∂xσ ∂xτ N Z(x)2 N
 
1 wσ wτ exp(xσ + xτ ) 1
+ 2 W log Z(x) ,
N Z(x)2 N

while
   
∂2F 1 wσ exp xσ 2
wσ exp 2xσ 1
(x) = − W log Z(x)
∂x2σ N Z(x) Z(x)2 N
 
1 w2 exp 2xσ 1
+ 2 σ 2
W log Z(x) .
N Z(x) N

Therefore, proceeding as in the proof of (1.60), we conclude that the function


 
1
ϕ(t, a) = EW log Z(u(t)) = EW (A(t))
N

satisfies
∂ϕ 1  
(t, a) = E ( U (σ, σ) t − U (σ 1 , σ 2 ) t )W (A(t))
∂t N
1  
+ 2 E U (σ 1 , σ 2 ) t W (A(t)) ,
N
and replacing W by its value this is
∂ϕ k  
(t, a) = E ( U (σ, σ) t − U (σ 1 , σ 2 ) t )(A(t) − a)k−1
∂t N
k(k − 1)  
+ E U (σ 1 , σ 2 ) t (A(t) − a)k−2 . (1.125)
N2
This is a generalization of (1.60), that corresponds to the case k = 1.
There is an alternate way to explain the structure of the formula (1.125)
(but the proof is identical). It is to say that straightforward (i.e. applying
only the most basic rules of Calculus) differentiation of (1.116) yields

1 σ−Ht (σ) exp(−Ht (σ)) 1
A (t) = = −Ht (σ) t ,
N σ exp(−Ht (σ)) N

where
d 1 1
−Ht (σ) := (−Ht (σ)) = √ uσ − √ vσ ,
dt 2 t 2 1−t
so that
1.4 Latala’s Argument 45

∂ϕ k  
(t, a) = kE(A (t)(A(t)−a)k−1 ) = E −Ht (σ) t (A(t)−a)k−1 . (1.126)
∂t N
One then integrates by parts, while using the key relation U (σ, τ ) =
EHt (σ)Ht (τ ). (Of course making this statement precise amounts basically
to reproducing the previous calculation.) The dependence of the bracket · t
on the Hamiltonian creates the first term in (1.125) (we have actually already
done this computation), while the dependence of A(t) on this Hamiltonian
creates the second term.
This method of explanation is convenient to guide the reader (once she
has gained some experience) through the many computations (that will soon
become routine) involving Gaussian integration by parts, without reproduc-
ing the computations in detail (which would be unbearable). For this reason
we will gradually shift (in particular in the next chapters) to this convenient
method of giving a high-level description of these computations. Unfortu-
nately, there is no miracle, and to gain the experience that will make these
formulas transparent to the reader, she has to work through a few of them
in complete detail, and doing in detail the integration by parts in (1.126) is
an excellent start.
Using (1.64) and completing the squares in (1.125) yields
∂ϕ β2k  
(t, a) = − E (R1,2 − q)2 t (A(t) − a)k−1
∂t 4
β2
+ k(1 − q)2 E(A(t) − a)k−1
4
β 2 k(k − 1)  
+ E (R1,2 − q)2 t (A(t) − a)k−2
4 N
β 2 k(k − 1) 2 β2
− q E(A(t) − a)k−2 − k(k − 1)E(A(t) − a)k−2 .
4 N 4N 2
Now, since
d ∂ϕ ∂ϕ
ϕ(t, SK(t)) = (t, SK(t)) + SK (t) (t, SK(t)) ,
dt ∂t ∂a
and
∂ϕ β2
(t, a) = −kE(A(t) − a)k−1 ; SK (t) = (1 − q)2 ,
∂a 4
one gets
ψ (t) = I + II
where
β 2 q 2 k(k − 1)
I=− EV (t)k−2
4 N
β2k   β 2 k(k − 1)  
II = − E (R1,2 − q)2 t V (t)k−1 + E (R1,2 − q)2 t V (t)k−2
4 4 N
β2
− k(k − 1)EV (t)k−2 .
4N 2
46 1. The Sherrington-Kirkpatrick Model

We claim that
II = O(k + 1) .
To see this we note that by (1.118) (used for 2(k − 1) rather than k) we have
E(V (t)2(k−1) ) = O(2(k − 1)) and we write, using (1.103),
   1/2
E (R1,2 − q)2 t V (t)k−1 ≤ (E (R1,2 − q)4 t )1/2 E V (t)2(k−1)
= O(2)O(k − 1) = O(k + 1) .

The case of the other terms is similar. Thus, we have proved that ψ (t) = I +
O(k +1), and since b (t) = −β 2 q 2 /2 we have also proved (1.120). To complete
the induction it remains only to prove (1.122). With obvious notation,
1 
V (0) = (log chYi − E log chY ) .
N
i≤N

The r.v.s Xi = log chYi − E log chY form an i.i.d. sequence of centered vari-
ables, so the statement in that case is simply (a suitable quantitative version
of) the central limit theorem. We observe that by (1.118), for each k, we have
EV (0)k = O(k). (Of course the use of Lemma 1.4.12 here is an overkill.) To
evaluate EV (0)k we use symmetry to write
  k−1 
k
 k−1
 XN
EV (0) = E XN V (0) = E XN +B
N

where B = N −1 i≤N −1 Xi . We observe that since B = V (0) − XN /N , for
each k we have EB k = O(k). We expand the term (XN /N + B)k−1 and since
EXN = 0 we get the relation
k−1
EV (0)k = 2
EXN EB k−2 + O(k + 1) .
N
Using again that B = V (0) − XN /N and since EXN
2
= b(0) we then obtain

k−1
EV (0)k = b(0)EV (0)k−2 + O(k + 1) ,
N
from which the claim follows by induction. 

Here is one more exercise to help the reader think about interpolation
between two Gaussian Hamiltonians uσ and vσ .

Exercise 1.4.13. Consider a (reasonable) function W (y1 , . . . , ym+1 ) of m+1


n
variables. Consider m functions f1 , . . . , fm on ΣN . Compute the derivative
of
ϕ(t) = W (νt (f1 ), . . . , νt (fm ), N −1 log Z(u(t))) ,
where the notation is as usual.
1.4 Latala’s Argument 47

Our next result makes apparent that the (crucial) property ν((R1,2 −
q)2 ) ≤ K/N implies some independence between the sites.
Proposition 1.4.14. For any p and any q with 0 ≤ q ≤ 1 we have
 
E( σ1 · · · σp − σ1 · · · σp )2 ≤ K(p)ν (R1,2 − q)2 , (1.127)
where K(p) depends on p only.
This statement is clearly of importance: it means that when the right-
hand side is small “the spins decorrelate”. (When p = 2, the quantity
σ1 σ2 − σ1 σ2 is the covariance of the spins σ1 and σ2 , seen as r.v.s on the
probability space (ΣN , GN ). The physicists call this quantity the truncated
correlation.) Equation (1.127) is true for any value of q, but we will show
in Proposition 1.9.5 below that essentially the only value of q for which the
quantity ν((R1,2 − q)2 ) might be small is the solution of (1.74).
We denote by · the dot product in RN , so that e.g. R1,2 = σ 1 · σ 2 /N .
A notable feature of the proof of Proposition 1.4.14 is that the only feature
of the model it uses is symmetry between sites, so this proposition can be
applied to many of the models we will study.
Proof of Proposition 1.4.14. Throughout the proof K(p) denotes a num-
ber depending on p only, that need not be the same at each occurrence. The
proof goes by induction on p, and the case p = 1 is obvious. For the induction
from p − 1 to p it suffices to prove that
 
E( σ1 · · · σp − σ1 · · · σp−1 σp )2 ≤ K(p)ν (R1,2 − q)2 . (1.128)
Let σ̇i = σi − σi and σ̇ = (σ̇i )i≤N . Therefore
σ1 · · · σp − σ1 · · · σp−1 σp = σ1 σ2 · · · σp−1 σ̇p .
Using replicas, we have
( σ1 · · · σp − σ1 · · · σp−1 σp )2 = σ1 σ2 · · · σp−1 σ̇p 2
= σ11 σ12 · · · σp−1
1 2
σp−1 σ̇p1 σ̇p2 ,
so that
E( σ1 · · · σp − σ1 · · · σp−1 σp )2 = ν(σ11 σ12 · · · σp−1
1 2
σp−1 σ̇p1 σ̇p2 ) . (1.129)
Using symmetry between sites,
N (N − 1) · · · (N − p + 1)ν(σ11 σ12 · · · σp−1 1 2
σp−1 σ̇p1 σ̇p2 )

= ν(σi11 σi21 σi12 σi22 · · · σi1p−1 σi2p−1 σ̇i1p σ̇i2p )
i1 ,...,ip all different

≤ ν(σi11 σi21 σi12 σi22 · · · σi1p−1 σi2p−1 σ̇i1p σ̇i2p )
all i1 ,...,ip
   
p−1 σ̇ · σ̇ p−1 σ̇ · σ̇
1 2 1 2
p−1
p
= N ν R1,2 = N ν (R1,2 − q
p
) , (1.130)
N N
48 1. The Sherrington-Kirkpatrick Model

where the inequality follows from the fact that since

σi11 σi21 σi12 σi22 · · · σi1p−1 σi2p−1 σ̇i1p σ̇i2p = σi1 σi2 · · · σip−1 σ̇ip 2

all terms are ≥ 0, and where the last equality uses that σ̇ 1 ·σ̇ 2 = 0. Of course
here σ̇ · σ̇ = i≤N σ̇i1 σ̇i2 , and the vector notation is simply for convenience.
1 2

Using the inequality |xp−1 − y p−1 | ≤ (p − 1)|x − y| for |x|, |y| ≤ 1 and the
Cauchy-Schwarz inequality we obtain
    σ̇ 1 · σ̇ 2 
p−1 σ̇ 1 · σ̇ 2  
ν (R1,2 − q p−1 ) ≤ (p − 1)ν |R1,2 − q|  (1.131)
N N
 1 2  1/2
 
2 1/2 σ̇ · σ̇ 2
≤ (p − 1)ν (R1,2 − q) ν .
N

Now we have
σ̇ 1 · σ̇ 2 2 (σ 1 − σ 1 ) · (σ 2 − σ 2 ) 2
=
N N
(σ − σ ) · (σ 2 − σ 4 ) 2
1 3
= .
N

To bound the right-hand side, we move the averages in σ 3 and σ 4 outside the
square (and we note that the function x → x2 is convex). Jensen’s inequality
(1.23) therefore asserts that

(σ 1 − σ 3 ) · (σ 2 − σ 4 ) 2 (σ 1 − σ 3 ) · (σ 2 − σ 4 ) 2
≤ .
N N

Finally we write

(σ 1 − σ 3 ) · (σ 2 − σ 4 ) 2
= (R1,2 − R1,4 − R3,2 + R3,4 )2
N
≤ 4 (R1,2 − q)2 ,
 
using that ( i≤4 xi )2 ≤ 4 i≤4 x2i . Combining the three previous inequalities
and taking expectation and square root we reach
 1/2
σ̇ 1 · σ̇ 2 2  1/2
ν ≤ 2ν (R1,2 − q)2 .
N

Combining with (1.129), (1.130) and (1.131) we then get

N (N − 1) · · · (N − p + 1)E( σ1 · · · σp − σ1 · · · σp−1 σp )2
 
≤ 2(p − 1)N p ν (R1,2 − q)2 ,

and this finishes the proof since


1.4 Latala’s Argument 49

Np
sup <∞. 

N ≥p N (N − 1) · · · (N − p + 1)

As a consequence, when one looks only at a given number of spins, one


fixes β < 1/2 and lets N → ∞, Gibbs’ measure is asymptotically a product
measure. To see this, we first observe that combining (1.127) and (1.89)
implies
K(p)
E( σ1 · · · σp − σ1 · · · σp )2 ≤ . (1.132)
N
Next, given η1 , . . . , ηn ∈ {−1, 1}, consider the set

A = {σ ∈ ΣN ; ∀i ≤ n, σi = ηi } , (1.133)

where the dependence on η1 , . . . , ηn is kept implicit. Then, denoting by 1A


the function such that 1A (σ) = 1 if σ ∈ A and 1A (σ) = 0 otherwise, we have
 
1A (σ) = 2−n (1 + σi ηi ) = 2−n σI ηI ,
i≤n I⊂{1,...,n}
 
where σI = i∈I σi and ηI = i∈I ηi . Thus, using (1.132),
  
GN (A) = 1A = 2−n ηI σI  2−n ηI σi
I⊂{1,...,n} I⊂{1,...,n} i∈I

= 2−n (1 + ηi σi )
i≤n
= μn ({η}) , (1.134)

where η= (η1 , . . . , ηn ) and μn is the product probability on {−1, 1}n with
density i≤n (1+ηi σi ) with respect to the uniform measure. (Let us observe
that μn is the only probability measure on {0, 1}n such that for each i the
average of σi for μn is equal to σi .)
Formally, we have the following.

Theorem 1.4.15. Assume β < 1/2. Denote by GN,n the law of (σ1 , . . . , σn )
under GN , and consider μn as above, the probability on {−1, 1}n with density

i≤n (1 + ηi σi ) with respect to the uniform measure. Then

K(n)
EGN,n − μn 2 ≤ ,
N
where  ·  denotes the total variation distance.

Thus, to understand well the random measure GN,n it remains only to un-
derstand the random sequence ( σi )i≤n . This will be achieved in Theorem
1.7.1 below.
50 1. The Sherrington-Kirkpatrick Model

Proof. By definition of the total variation distance (and (A.79)), it holds



GN,n − μn  = |GN,n ({η}) − μn ({η})| ,
η

where the summation is over η in {−1, 1}n . Since there are 


2N terms in
the summation,
 using the Cauchy-Schwarz inequality as in ( i∈I ai )2 ≤
2
cardI i∈I ai and taking expectation we get

EGN,n − μn 2 ≤ 2n E(GN,n ({η}) − μn ({η}))2 .
η

Now GN,n ({η}) = GN (A) where A is given by (1.133), and the result follows
by making formal the computation (1.134). Namely, we write
   2
(GN (A) − μn ({η}))2 = 2−n ηI ( σI − σi )
I⊂{1,...,n} i∈I
 
≤ 2−n ( σI − σi )2 ,
I⊂{1,...,n} i∈I

so that, taking expectation and using (1.132),


K(n)
E(GN (A) − μn ({η}))2 ≤ . 

N

This result raises all kinds of open problems. Here is an obvious question.

Research Problem 1.4.16. How fast can n(N ) grow so that GN,n(N ) −
μn(N )  → 0 ?
Of course, it will be easy to prove that one can take n(N ) → ∞, but finding
the best rate might be hard. One might also conjecture the following.
Conjecture 1.4.17. When β > 0 we have
lim E inf GN − μ = 2 ,
N →∞

where the infimum is computed over all the classes of measures μ that are
product measures.
Conjecture 1.4.18. When β < 1/2 we have
lim Ed(GN , μ) = 0 ,
N →∞

where μ is the product measure on ΣN such that for each i ≤ N we have


σi dμ(σ) = σi , and where now d denotes the transportation-cost distance
(see Section A.11) associated with the Hamming distance (1.7).
A solution of the previous conjectures would not yield much information.
Are there more fruitful questions to be asked concerning the global structure
of Gibbs’ measure?
1.5 A Kind of Central Limit Theorem 51

1.5 A Kind of Central Limit Theorem


This short section brings forward a fundamental fact, a kind of random central
limit
 theorem (CLT). The usual CLT asserts (roughly speaking) that a sum
ai Xi is nearly Gaussian provided the r.v.s Xi are independent and none of
the terms of the sum is large compared to the sum itself. The situation here
 the terms Xi are not really independent, but we do not look at
is different:
all sums ai Xi , only at sums where the coefficients ai are random.
More specifically consider a probability measure μ = μN on RN (a Gibbs
measure is the case of interest). Assume that for two numbers q and ρ we
have, for large N ,
 1 2 2
x ·x
− q dμ(x1 )dμ(x2 )  0 (1.135)
N
 2
x2
− ρ dμ(x)  0 . (1.136)
N
Consider the case where μ is Gibbs’ measure for the SK model. Then (1.135)
means that (R1,2 −q)2  0, while (1.136) is automatically satisfiedfor ρ = 1,
because μ is supported by ΣN and for σ ∈ ΣN we have σ2 = i≤N σi2 =
N . (We will later consider systems where the individual spins can take values
in R, and (1.136) will become relevant.) Let
 
b = xdμ(x) = xi dμ(x)
i≤N

be the barycenter of μ. The fundamental fact is as follows. Consider indepen-


dent Gaussian standard r.v.s gi and g = (gi )i≤N . Then for a typical value of
g (i.e. unless we have been unlucky enough to pick g in a small exceptional
set), we have

The image of μ under the map x → g · x/ N is nearly a Gaussian

measure of mean g · b/ N and of variance ρ − q. (1.137)
The reader should not worry about the informality of the statement which
is designed only to create the correct intuition. We shall never need a formal
statement, but certain constructions we shall use are based on the intuition
provided by (1.137). The reason why (1.135) and (1.136) imply (1.137) is
very simple. Let us consider a bounded, continuous function f , and the two
r.v.s
   
g·x g·b √
U= f √ dμ(x) and V = Eξ f √ + ξ ρ − q ,
N N
where ξ is a standard Gaussian r.v. independent of g and where throughout
the book we denote by Eξ expectation in the r.v.s ξ only, that is, for
all the other r.v.s given.
52 1. The Sherrington-Kirkpatrick Model

We will show that, given the function f , with probability close to 1 we


have U  V i.e.
   
g·x g·b √
f √ dμ(x)  Eξ f √ + ξ ρ − q . (1.138)
N N
Therefore, given a finite set F of functions, with probability close to 1 (i.e.
“for a typical value of g”), (1.138) occurs simultaneously for each f in F,
which is what we meant by (1.137).
To prove that U  V , we compute

E(U − V )2 = EU 2 + EV 2 − 2EU V ,

and we show that EU 2  EV 2  EU V . Now,


   
g · x1 g · x2
EU = E f √
2
f √ dμ(x1 )dμ(x2 )
N N
   
g · x1 g · x2
= Ef √ f √ dμ(x1 )dμ(x2 ) .
N N

For  = 1, 2, let g  = g · x / N . These two Gaussian r.v.s are such that

x 2 x 1 · x2
E(g  )2 = ; E(g 1 g 2 ) = .
N N
Using (1.135) and (1.136) we see that, generically (i.e. for most of the points
x1 , x2 ) we have E(g  )2  ρ and E(g 1 g 2 )  q. Since the distribution of a
finite jointly Gaussian family (gp ) is determined by the quantities Egp gp , the
√ √ √
pair (g 1 , g 2 ) has nearly the distribution of the pair (z q + ξ 1 ρ − q, z q +

ξ 2 ρ − q) where z, ξ 1 and ξ 2 are independent standard Gaussian r.v.s. Hence
   
g · x1 g · x2 √ √ √ √
Ef √ f √  Ef (z q + ξ 1 ρ − q)f (z q + ξ 2 ρ − q)
N N
√ √
= E(Eξ f (z q + ξ ρ − q))2 ,

the last equality not being critical here, but preparing for future formulas.
This implies that
√ √
EU 2  E(Eξ f (z q + ξ ρ − q))2 . (1.139)

The same argument proves that EU V and EV 2 are also nearly equal to the
right-hand side of (1.139), so that E(U − V )2  0, completing the argument.
In practice, we will need estimates for quantities such as
   
g · x1 g · xn
EW f √ ,..., √ dμ(x ) · · · dμ(x ) ,
1 n
(1.140)
N N
1.6 The Cavity Method 53

where W is a real-valued function and f is now a function of n variables. We


will compare such a quantity with
 √ √ √ √ 
EW Eξ f (z q + ξ 1 ρ − q, . . . , z q + ξ n ρ − q) , (1.141)
using the standard path to obtain quantitative estimates.
The variables ξ will be “invisible” in the present chapter because they
will occur in terms such as
√ √ a2 (ρ − q) √
Eξ exp a(z q + ξ ρ − q) = exp exp az q . (1.142)
2
They will however be essential in subsequent chapters.

1.6 The Cavity Method


As pointed out, the arguments of Theorems 1.3.7 and 1.3.9 are very spe-
cial; but even Latala’s argument is not easy to extend to other models. The
purpose of this section is to develop an other method, the cavity method,
which we will be able to use for many models that do not share the special
features of the SK model. Moreover, even in the case of the SK model, the
cavity method is essential to obtain certain types of information, as we will
demonstrate in the rest of this chapter.
Originally, the cavity method is simply induction over N . To reduce the
system to a smaller system, one removes a spin, creating a “cavity”. The
basic step is to bring forward the dependence of the Hamiltonian on the last
spin by writing
β  
− HN (σ) = √ gij σi σj + hi σi
N i<j≤N i≤N
 
β 
= −HN −1 (σ) + σN √ gi σi + hN , (1.143)
N i<N
where gi = giN and
β  
− HN −1 (σ) = √ gij σi σj + hi σi . (1.144)
N i<j≤N −1 i≤N −1

Thus, if we write ρ = (σ1 , . . . , σN −1 ), we see that −HN −1 (σ) = −HN −1 (ρ)


depends on ρ only. It is the Hamiltonian of a (N − 1)-spin system (for a
different value of β). Let us denote by · − an average for this Hamiltonian.
Then we have the following absolutely fundamental identity.
Proposition 1.6.1. For a function f on ΣN , it holds that
   
Av f (σ) exp σN √βN i<N gi σi + hN

f =   β   . (1.145)
Av exp σN √N i<N gi σi + hN

54 1. The Sherrington-Kirkpatrick Model

Here, Av means average over σN = ±1; the result of this averaging is a


function of ρ only, which is then integrated with respect to · − . Of course
the denominator is simply
 
β 
ch √ gi σi + hN
N i<N −

and it can only help that it is always ≥ 1.


Proof. There is no magic. One replaces each of the two brackets on the right-
hand side of (1.145) by their definition; each of these brackets is a fraction.
The denominators are the same and cancel out. What remains is
   β   
i<N gi σi + hN − HN −1 (σ)

ρ Avf (σ) exp σN N
   β    ,
i<N gi σi + hN − HN −1 (σ)

ρ Av exp σN N

where, to lighten the formula, we have written AvU rather than Av(U ) both
in numerator and denominator. Recalling (1.144) this is
  
ρ Avf (σ) exp − HN (σ)
   .
ρ Av exp − HN (σ)

Multiplying both numerator and denominator by 2 and recalling the meaning


of Av we see that this quantity is f . 

Let us assume now that the (N −1)-spin system with Hamiltonian −HN −1
behaves well (i.e. satisfies (1.89)). Then, according to the intuition
√  developed
in Section 1.5, we expect that the maps (σ1 , . . . , σN −1 ) → (β/ N ) i<N gi σi
behaves, under Gibbs’ measure like a Gaussian r.v. To compute the right-
hand side of (1.145), we will follow the intuition of comparing a quantity as
in (1.140) to a quantity as in (1.141) (remembering that we can forget about
the variables√ξ because
 of (1.142))). For this, we will replace in (1.145) the

quantity (β/ N ) i<N gi σi by βz q (where 0 ≤ q ≤ 1 will be chosen later),
that is, we will consider the Hamiltonian

− HN −1 (ρ) + σN (βz q + hN ) . (1.146)

This Hamiltonian is the Hamiltonian of an N -spin system, but in which the


last spin is “decoupled” from the first (N − 1)-spins. It turns out to be easier
to compare the N -spin system to this decoupled system rather than to the
(N − 1)-spin system. We will interpolate between the Hamiltonians (1.143)
and (1.146) using
 
√ β  √ √
−Ht (σ) = −HN −1 (ρ)+σN t√ gi σi + 1 − tβz q+hN . (1.147)
N i<N

We denote by · t an average for the corresponding Gibbs measure, and


νt (·) = E · t . The notations are identical to those of the previous sections,
1.6 The Cavity Method 55

for a different interpolating Hamiltonian. This should not create confusion


since from now on, at least in the present chapter, νt will always refer to
the interpolating Hamiltonian (1.147). The Hamiltonian defined by (1.59)
and (1.62) was designed to compare the Hamiltonian of the SK model with
a situation where all the spins are independent of each other. In contrast,
the Hamiltonian (1.147) is designed to compare the SK model to a situation
where the last spin is independent of the first N − 1 spins.

We write, for σ  , σ  ∈ ΣN

− 1   
R,  = σi σi . (1.148)
N
i<N

To lighten notation, when considering replicas σ 1 , σ 2 , . . . we write



ε = σN .

With this notation we have


− ε ε
R, = R,  + . (1.149)
N
The fact that ν0 decouples the last spin is expressed by the following,
where

Y = βz q + h .

Lemma 1.6.2. For any function f − on ΣN n


−1 and any set I ⊂ {1, . . . , n}
we have
    
ν0 f − εi = ν0 εi ν0 (f − ) = E(thY )cardI ν0 (f − ) . (1.150)
i∈I i∈I

Proof. Since when t = 0 the Hamiltonian Ht is the sum of a term depending


only on the first N − 1 spins and of a term depending only on the last spin
it should be obvious that
 
f− εi = f− 0 εi = f − 0 (thY )cardI
i∈I 0 i∈I 0

and the result follows taking expectation, since the randomnesses of Y and
HN −1 are independent. 

Lemma 1.6.2 in particular computes ν0 (f ) when f depends only on the
last spin, with formulas such as ν0 (ε1 ε2 ε3 ) = Eth3 Y.

The fundamental tool is as follows, where we recall that ε = σN .
n
Lemma 1.6.3. Consider a function f on ΣN = (ΣN )n ; then for 0 < t < 1
we have
56 1. The Sherrington-Kirkpatrick Model

d 

νt (f ) := νt (f ) = β 2 νt (f ε ε (R,  − q))
dt
1≤< ≤n


− β2n νt (f ε εn+1 (R,n+1 − q))
≤n
n(n + 1) −
+ β2 νt (f εn+1 εn+2 (Rn+1,n+2 − q)) (1.151)
2
and also

νt (f ) = β 2 νt (f ε ε (R, − q))
1≤< ≤n

− β2n νt (f ε εn+1 (R,n+1 − q))
≤n
n(n + 1)
+ β2 νt (f εn+1 εn+2 (Rn+1,n+2 − q)) . (1.152)
2
This fundamental formula looks very complicated the first time one sees it,
although the shock should certainly be milder once one has seen (1.95). A
second look reveals that fortunately as in (1.95) the complication is only
algebraic. Counting terms with their order of multiplicity, the right-hand side

of (1.151) is the sum of 2n2 simple terms of the type ±β 2 νt (f ε ε (R,  − q)).

Proof. The formula (1.151) is the special case of formula (1.95) where

β  √
uσ = √ σN gi σi ; vσ = βσN z q (1.153)
N i<N

wσ = exp (−HN −1 (ρ) + hN σN ) ,


so that (1.55) implies:

 β2 −
U (σ  , σ  ) = ε ε (R, − q) .
2
Finally (1.152) follows from (1.151) and (1.149), as the extra terms cancel
out since ε2 = 1. 

The reader has observed that the choice (1.153) is fundamentally different
from the choice (1.62). In words, in (1.153) we decouple the last spin from the
others, rather than “decoupling all the spins at the same time” as in (1.62).
Since the formula (1.151) is the fundamental tool of the cavity method,
we would like to help the reader overcome his expected dislike of this formula
by explaining why, if one leaves aside the algebra, it is very simple. It helps

to think of R,  − q as a small quantity. Then all the terms of the right-hand

side of (1.151) are small, and thus, ν(f ) = ν1 (f ) ∼ ν0 (f ). This is very helpful
when f depends only on the last spin, e.g. f (σ) = ε1 ε2 because in that case
1.6 The Cavity Method 57

we can calculate ν0 (f ) using Lemma 1.6.2. That same lemma lets us also

simplify the terms ν0 (f ε ε (R,  − q)), at least when f does not depend on

the last spin. We will get then very interesting information simply by writing
that ν(f ) ∼ ν0 (f ) + ν0 (f ).
For pedagogical reasons, we now derive some of the results of Section 1.4
through the cavity method.
Lemma 1.6.4. For a function f ≥ 0 on ΣN
n
, we have

νt (f ) ≤ exp(4n2 β 2 )ν(f ) . (1.154)



Proof. Here of course as usual ν(f ) = ν1 (f ). Since |R,  | ≤ 1 and q ∈ [0, 1]

we have |R, − q| ≤ 2, so that (1.151) yields

|νt (f )| ≤ 4n2 β 2 νt (f ) (1.155)

and we integrate. 


n
Proposition 1.6.5. Consider a function f on ΣN , and τ1 , τ2 > 0 with
1/τ1 + 1/τ2 = 1. Then we have

|ν(f ) − ν0 (f )| ≤ 2n2 β 2 exp(4n2 β 2 )ν(|f |τ1 )1/τ1 ν(|R1,2 − q|τ2 )1/τ2 . (1.156)

Proof. We have
 
 1 
|ν(f ) − ν0 (f )| =  νt (f ) dt ≤ sup |νt (f )| .
0 0<t<1

Now, Hölder’s inequality for νt implies

|νt (f ε ε (R, − q))| ≤ νt (|f ||R, − q|) ≤ νt (|f |τ1 )1/τ1 νt (|R, − q|τ2 )1/τ2

and thus by (1.152) (and since n(n + 1)/2 + n2 + n(n − 1)/2 = 2n2 ),

|νt (f )| ≤ 2n2 β 2 νt (|f |τ1 )1/τ1 νt (|R1,2 − q|τ2 )1/τ2 .

We then use (1.154) for |f |τ1 and |R1,2 − q|τ2 . 




Proposition 1.6.6. There exists β0 > 0 such that if β ≤ β0 then


  2
ν (R1,2 − q)2 ≤ ,
N
where q is the solution of (1.74).

The larger the value of β0 , the harder it is to prove the result. It seems
difficult by the cavity method to reach the value β0 = 1/2 that we obtained
with Latala’s argument in (1.87) and (1.88).
58 1. The Sherrington-Kirkpatrick Model


Proof. Recalling that ε = σN , we use symmetry among sites to write
  1 
ν (R1,2 − q)2 = ν((σi1 σi2 − q)(R1,2 − q)) = ν(f ) (1.157)
N
i≤N

where
f = (ε1 ε2 − q)(R1,2 − q) .
The simple idea underlying (1.157) is simply “to bring out as much as pos-
sible of the dependence on the last spin”. This is very natural, since the
cavity method brings forward the influence of this last spin. It is nonetheless
extremely effective.
Using (1.149), and since ε2 = 1, we have

1 −
f = (ε1 ε2 − q)(R1,2 − q) = (1 − ε1 ε2 q) + (ε1 ε2 − q)(R1,2 − q) .
N
The key point is that Lemma 1.6.2 implies
− −
ν0 ((ε1 ε2 − q)(R1,2 − q)) = ν0 (ε1 ε2 − q)ν0 (R1,2 − q)

= (E th2 Y − q)ν0 (R1,2 − q)
=0

because ν0 (ε1 ε2 ) = Eth2 Y using Lemma 1.6.2 again and since (1.74) means
that q = E th2 Y . Furthermore,
1 1 1
ν0 (f ) = ν0 (1 − ε1 ε2 q) = (1 − qE th2 Y ) = (1 − q 2 ) . (1.158)
N N N
We now use (1.156) with τ1 = τ2 = 2 and n = 2. Since |ε1 ε2 − q| ≤ 2, we
get    
|ν(f ) − ν0 (f )| ≤ 16 β 2 exp(16β 2 ) ν (R1,2 − q)2
and using (1.157) and (1.158),
  1    
ν (R1,2 − q)2 ≤ + 16 β 2 exp(16β 2 ) ν (R1,2 − q)2 .
N
Thus, if β0 is chosen so that
1
16 β02 exp 16β02 ≤
2
we obtain
  1 1  
ν (R1,2 − q)2 ≤ + ν (R1,2 − q)2 , (1.159)
N 2
and thus
  2
ν (R1,2 − q)2 ≤ . 

N
1.6 The Cavity Method 59

In essence the previous proof is a kind of contraction argument, as is


shown by (1.159). When β is small “the operation of adding one spin improves
the behavior of (R1,2 − q)2 ”. A great many of the arguments we will use to
control various models under a “high-temperature” condition will be of the
same nature, although they will typically require more work.
An elementary inductive argument will allows us to control the higher
moments of (R1,2 − q)2 .

Proposition 1.6.7. There exists β0 > 0 such that for β ≤ β0 and any k ≥ 1
we have  k
  64k
ν (R1,2 − q) 2k
≤ . (1.160)
N
In Section A.6 we explain a general principle relating growth of moments
and exponential integrability. This principle shows that (1.160) implies that
for a certain constant L we have
 
N
ν exp (R1,2 − q)2 ≤ 2 ,
L

a statement similar to (1.102).


Proof of Proposition 1.6.7. For 1 ≤ n ≤ N , let
1 
An = (σi1 σi2 − q) ,
N
n≤i≤N

so that R1,2 − q = A1 . We will prove by induction over k that (provided


β ≤ β0 ) we have
 k
 2k  64k
∀n ≤ N, ν An ≤ . (1.161)
N
This tricky induction hypothesis should not mislead the reader into thinking
that the argument is difficult. It is actually very robust, and the purpose of the
tricky hypothesis is simply to avoid a few lines of unappetizing computations.
To perform the induction from k to k + 1, we observe that we can assume
n < N , since if n = N (1.161) is always true because |AN | ≤ 2/N . Symmetry
between sites implies
  1    N −n+1
ν A2k+2
n = ν (σi1 σi2 − q)A2k+1
n = ν(f ) ≤ |ν(f )| ,
N N
n≤i≤N
(1.162)
where
f = (ε1 ε2 − q)A2k+1
n .
It follows that  
ν A2k+2
n ≤ |ν0 (f )| + sup |νt (f )| . (1.163)
t
60 1. The Sherrington-Kirkpatrick Model

We first evaluate ν0 (f ). Let


1 
A = (σi1 σi2 − q) .
N
n≤i≤N −1

Lemma 1.6.2 implies


 
ν0 (ε1 ε2 − q)A 2k+1 = 0 ,

so that since |ε1 ε2 − q| ≤ 2,


    
|ν0 (f )| = ν0 (ε1 ε2 − q)A2k+1
n − ν0 (ε1 ε2 − q)A 2k+1 
≤ 2ν0 (|A2k+1
n − A 2k+1 |) . (1.164)

We use the inequality

|x2k+1 − y 2k+1 | ≤ (2k + 1)|x − y|(x2k + y 2k )

for x = An and y = A . Since |x − y| ≤ 2/N we deduce from (1.164) that

4(2k + 1)  
|ν0 (f )| ≤ ν0 (A 2k ) + ν0 (A2k
n ) . (1.165)
N
Assuming β0 ≤ 1/8, we obtain from (1.154) that

νt (f ∗ ) ≤ 2ν(f ∗ ) , (1.166)

whenever f ∗ ≥ 0 is a function on ΣN
2
. Then (1.165) implies

8(2k + 1)  
|ν0 (f )| ≤ ν(A 2k ) + ν(A2k
n ) .
N
We now observe that A and An+1 are equal in distribution under ν because
n < N . Thus the induction hypothesis yields
 k  k+1
16(2k + 1) 64k 1 64(k + 1)
|ν0 (f )| ≤ ≤ . (1.167)
N N 2 N

Next, we compute νt (f ) using (1.152) with n = 2. There are 8 terms (counting


them with their order of multiplicity), and in each of them we bound ε1 ε2 − q
by 2. We apply Hölder’s inequality with τ1 = (2k+2)/(2k+1) and τ2 = 2k+2
in the first line and (1.166) in the second line to get
 1/τ1  1/τ2
|νt (f )| ≤ 16β 2 νt A2k+2
n νt (R1,2 − q)2k+2
 1/τ1  1/τ2
≤ 32β 2 ν A2k+2
n ν (R1,2 − q)2k+2
 
≤ 32β 2 ν(A2k+2
n ) + ν((R1,2 − q)2k+2 ) ,
1.6 The Cavity Method 61

using that xy ≤ xτ1 + y τ2 . (A nice feature of using Hölder’s inequality is that


“it separates replicas”, and in the end we only need to consider two replicas.)
Combining with (1.163) and (1.167), when 32β02 ≤ 1/4 we get
 k+1
  1 64(k + 1) 1 
ν A2k+2
n ≤ + ν(A2k+2
n ) + ν((R1,2 − q)2k+2 ) . (1.168)
2 N 4

Since A1 = R1,2 − q, when n = 1 the previous inequality implies:


 k+1
    64(k + 1)
ν (R1,2 − q) 2k+2
=ν A2k+2
1 ≤ . (1.169)
N

Using (1.169) in (1.168) yields that for the other values of n as well we have
 k+1
  64(k + 1)
ν A2k+2
n ≤ . 

N

The following provides another method to estimate pN (β, h).

Proposition 1.6.8. For any choices of β, h, q (with q ∈ [0, 1]) we have


 
1
|(N + 1)pN +1 (β, h) − N pN (β, h) − A(β, h, q)| ≤ K + ν(|R1,2 − q|) ,
N
(1.170)
where
β2 √
A(β, h, q) = log 2 + (1 − q)2 + E log ch(βz q + h)
4
and where K depends only on β and h.

As a consequence of this formula, by summation, and with obvious nota-


tion we get
 
log N 1 
|pN (β, h) − A(β, h, q)| ≤ K + νM (|R1,2 − q|) . (1.171)
N N
M ≤N

The proof of Proposition 1.6.8 does not use Guerra’s interpolation (i.e the in-
terpolation of Theorem 1.3.7), but rather an explicit formula ((1.176) below)
that is the most interesting part of this approach. This method is precious
in situations where we do not wish to (or cannot) use interpolation. Several
such situations will occur later. Another positive feature of (1.170) is that it
is valid for any value
 of β, h and q. The way this provides information is that
the average N −1 M ≤N νM (|R1,2 − q|) cannot be small unless the left-hand
side of (1.171) is small. Let us also remark that combining (1.171) with (1.73)
shows that this average can be small only if A(β, h, q) is close to its infimum
in q, i.e. only if q is a near solution of (1.74).
62 1. The Sherrington-Kirkpatrick Model

On the other hand, the best we can do about the right-hand side of (1.170)
is to write (when β ≤ β0 )
 1/2 K
νN (|R1,2 − q|) ≤ νN (R1,2 − q)2 ≤√ ,
N

so that (1.170) does not recover (1.108), since it gives only a rate K/ N
instead of K/N . It is possible however to prove a better version of (1.170),
where one replaces the error term ν(|R1,2 − q|) by ν((R1,2 − q)2 ). This essen-
tially requires replacing the “order 1 Taylor expansions” by “order 2 Taylor
expansions”, a technique that will become familiar later.
Proof. In this proof we will consider an (N + 1)-spin system for different
values of β, so we write the Hamiltonian as HN +1,β to make clear which value
of β is actually used. Consider the number β+ given by β+ = β 1 + 1/N ,
so that
β β
√ + =√ .
N +1 N
We write the Hamiltonian −HN +1,β+ (σ1 , . . . , σN +1 ) of an (N + 1)-spin
system with parameter β+ rather than β, and we gather the terms containing
the last spin as in (1.143), so that
 
β 
−HN +1,β+ (σ1 , . . . , σN +1 ) = −HN (σ1 , . . . , σN )+σN +1 √ gi σi +hN +1
N i≤N

where gi = gi,N +1 . With obvious notation, the identity


 
β 
ZN +1 (β+ ) = 2ZN (β) ch √ gi σi + hN +1 (1.172)
N i≤N

holds, where of course ZN (β) = σ exp(−HN (σ)). This is obvious if one re-
places the bracket by its value and one writes that 2ch(x) = exp x + exp(−x).
Hence taking logarithm and expectation we obtain

(N + 1)pN +1 (β+ , h) = N pN (β, h) + log 2


 
β 
+ E log ch √ gi σi + hN +1 . (1.173)
N i≤N

On the left-hand side we have pN +1 (β+ , h) rather than pN +1 (β, h), and we
proceed to relate these two quantities. Consider a new independent sequence
gij of standard Gaussian r.v.s. Then
D
− HN +1,β+ (σ1 , . . . , σN +1 ) = −HN +1,β (σ1 , . . . , σN +1 )
β 
+ gij σi σj , (1.174)
N (N + 1) i<j≤N +1
1.6 The Cavity Method 63

where D means equality in distribution. This is because


1 1 D 1
√ gij + gij = √ gij
N +1 N (N + 1) N

since 1/N = 1/(N + 1) + 1/(N (N + 1)). We then observe the identity

D β 
ZN +1 (β+ ) = ZN +1 (β) exp gij σi σj ,
N (N + 1) i<j≤N +1

where · denotes an average for the Gibbs measure with Hamiltonian


−HN +1,β . (Please note that this · does not indicate a derivative of any
kind, but rather a shortage of available symbols.) This is proved as in (1.172).
Thus, taking logarithm and expectation,

(N + 1)pN +1 (β+ , h) = (N + 1)pN +1 (β, h)


β 
+ E log exp gij σi σj . (1.175)
N (N + 1) i<j≤N +1

Comparing (1.175) and (1.173) we get

(N + 1)pN +1 (β, h) − N pN (β, h)


 
β 
= log 2 + E log ch √ gi σi + hN +1
N i≤N
β 
− E log exp gij σi σj . (1.176)
N (N + 1) i<j≤N +1

To prove (1.170) we will calculate the last two terms of (1.176). The next
exercise provides motivation for the result. The second part of the exercise is
rather challenging, and should be all the more profitable.

Exercise 1.6.9. Convince yourself, using the arguments of Section 1.5, that
one should have
 
β  β2 √
E log ch √ gi σi + hN +1  (1 − q) + E log ch(βz q + h) .
N i≤N 2

Then extend the arguments of Section 1.5 to get convinced that one should
have
β  β2
E log exp gij σi σj  (1 − q 2 ) .
N (N + 1) i<j≤N +1 4

The rigorous computation of the last two terms of (1.176) uses suitable in-
terpolations. This takes about three pages. In case the reader finds the detail
64 1. The Sherrington-Kirkpatrick Model

of the arguments tedious, she can simply skip them, they are not important
for the sequel. Let us consider a (well behaved) function f and for σ ∈ ΣN ,
let us consider a number wσ > 0. For x = (xσ ), let us consider the function

F (x) = log wσ f (xσ ) .
σ

Let us consider two independent


√ jointly Gaussian families (uσ ) and (vσ ), and

define as usual uσ (t) = tuσ + 1 − tvσ , u(t) = (uσ (t))σ and
1
U (σ 1 , σ 2 ) := (Euσ1 uσ2 − Evσ1 vσ2 ) .
2
Then, a computation very similar to that leading to (1.58) shows that

d 1 
EF (u(t)) = E wσ U (σ, σ)f (uσ (t)) (1.177)
dt D(u(t)) σ
1 
−E 2
wσ1 wσ2 U (σ 1 , σ 2 )f (uσ1 (t))f (uσ2 (t)) ,
D(u(t)) 1 2
σ ,σ

where D(u(t)) = σ wσ f (uσ (t)). Let us now consider the average · for
the Gibbs measure with Hamiltonian −H(σ) = log wσ . Then (1.177) simply
means that
d U (σ, σ)f (uσ (t))
E log f (uσ (t)) = E
dt f (uσ (t))
U (σ 1 , σ 2 )f (uσ1 (t))f (uσ2 (t))
−E . (1.178)
f (uσ (t)) 2

Let us consider the case where


β  √
uσ = √ gi σi ; vσ = βz q
N i≤N

so that
1 β2
U (σ 1 , σ 2 ) = (Euσ1 uσ2 − Evσ1 vσ2 ) = (R1,2 − q) .
2 2
Let us now define

ϕ(t) = E log ch(uσ (t) + hN +1 ) ,

where the bracket means that the function σ → ch(uσ (t)+hN +1 ) is averaged
for the Gibbs measure.
Let us apply the formula (1.178) to the case where f (x) = ch(x + hN +1 )
and wσ = exp(−HN (σ)), at a given realization of hN +1 and HN . (These
1.6 The Cavity Method 65

quantities are random, but their randomness is independent of the random-


nesses of uσ and vσ .) Let us finally take a further expectation (this time in
the randomness of hN +1 and HN ). We then get
U (σ, σ)ch(uσ (t) + hN +1 )
ϕ (t) = E
ch(uσ (t) + hN +1 )
U (σ 1 , σ 2 )sh(uσ1 (t) + hN +1 )sh(uσ2 (t) + hN +1 )
−E .
ch(uσ (t) + hN +1 ) 2
Since U (σ, σ) = β 2 (1−q)/2 the first term is β 2 (1−q)/2, and since ch(uσ (t)+
hN +1 ) ≥ 1 we obtain
 β2  β2
 
ϕ (t) − (1 − q) ≤ E |R1,2 − q||sh(uσ1 (t) + hN +1 )sh(uσ2 (t) + hN +1 )| .
2 2
(1.179)
Taking first expectations in the r.v.s gi , z and hN +1 (which are independent
of the randomness of · ), we get
 β2 
 
ϕ (t) − (1 − q) ≤ Kν(|R1,2 − q|) , (1.180)
2
where K does not depend on N . Therefore, we have
   
  β2 √ 
E log ch √β g σ + h − (1 − q) − E log ch(βz q + h)
 N i≤N
i i N +1
2 
 β2 
 
= ϕ(1) − ϕ(0) − (1 − q)
2
 β2 
 
≤ supϕ (t) − (1 − q) ≤ Kν(|R1,2 − q|) . (1.181)
t 2
We use a similar procedure to evaluate the last term of (1.176). We will
use (1.178) for the function f (x) = exp x and N + 1 instead of N . We denote
now by τ = (σ1 , . . . , σN +1 ) ∈ ΣN +1 the generic element of ΣN +1 , and we
consider the case where
β  β
uτ = gij σi σj ; vτ = √ qz .
N (N + 1) i<j≤N +1 2

Let us set
1  1 2
R1,2 = R1,2 (τ 1 , τ 2 ) = σi σi .
N
i≤N +1

Using (1.63) for N + 1 rather than N we get


  2 
N + 1 β2 1 1
Euτ 1 uτ 2 = 1 2
σi σi −
N 2 N +1 N +1
i≤N +1
 
β2 N 1
= R1,2 2 − ,
2 N +1 N
66 1. The Sherrington-Kirkpatrick Model

and thus
 
1 β2 N 1
U (τ , τ ) = (Euτ 1 uτ 2 − Evτ 1 vτ 2 ) =
1 2
R1,2 2 − q 2 − .
2 4 N +1 N
We choose wτ = exp(−HN +1,β (τ )), and we define

ϕ(t) = E log exp(uτ (t)) .

Using (1.178) we find that

U (τ , τ ) exp(uτ (t)) U (τ 1 , τ 2 ) exp(uτ 1 (t) + uτ 2 (t))


ϕ (t) = E −E ,
exp uτ (t) exp uτ (t) 2
and since R (τ , τ ) = (N + 1)/N we have

β2
U (τ , τ ) = (1 − q 2 )
4
and thus
 β2  |R1,2 2 − q 2 | exp(uτ 1 (t) + uτ 2 (t))
 
ϕ (t) − (1 − q 2 ) ≤ E .
4 exp uτ (t) 2
Now R1,2 ≤ (N + 1)/N ≤ 2 and |q| ≤ 1 so that

2
|R1,2 2 − q 2 | ≤ 3|R1,2 − q| ≤ 3|R1,2 − q| +
N
and therefore
 β2  K |R1,2 − q| exp(uτ 1 (t) + uτ 2 (t))
 
ϕ (t) − (1 − q 2 ) ≤ + 3E . (1.182)
4 N exp uτ (t) 2
To finish the proof it suffices to bound this last term by Kν(|R1,2 − q|), since
then (1.182) gives, since ϕ(0) = 0,
 
 β  β2 
E log exp gij σi σj − 2 
(1 − q )
 N (N + 1) 1≤i<j≤N +1 4
 β2  K
 
= ϕ(1) − ϕ(0) − (1 − q 2 ) ≤ + Kν(|R1,2 − q|) ,
4 N
and combining with (1.181) and (1.176) finishes the proof.
To bound the last term of (1.182), we consider the function
|R1,2 − q| exp(uτ 1 (t) + uτ 2 (t))
ψ(t) = 3E .
exp uτ (t) 2
Let us set wτ = exp(−HN +1,β (τ )), and consider the Hamiltonian −Ht (τ ) =
log wτ + uτ (t). Denoting by · t an average for this Hamiltonian, we have
1.7 Gibbs’ Measure; the TAP Equations 67

ψ(t) = 3E |R1,2 − q| t .
We compute ψ (t) by Lemma 1.4.2, used for N + 1 rather than N . The exact
expression thus obtained is not important. What matters here is that using
the bound |U (τ 1 , τ 2 )| ≤ K, we find an inequality
|ψ (t)| ≤ Kψ(t) ,
and by integration this shows that ψ(t) ≤ Kψ(1). Denoting by · + an average
for the Gibbs measure with Hamiltonian HN +1,β+ and using (1.174) we then
observe the identity
ψ(1) = 3E |R1,2 − q| 1 = 3E |R1,2 − q| + .
Next, using the cavity method for the Hamiltonian −HN +1,β+ we obtain
E |R1,2 − q| +
   
Av|R1,2 − q| exp ≤2 σN 
+1
√β
N i≤N gi,N +1 σi

+ h N +1
=E   2
Av exp σN +1 √βN i≤N gi,N +1 σi + hN +1
  
β 
≤ E |R1,2 − q|Av exp σN +1 √
 
gi,N +1 σi + hN +1 ,
≤2
N i≤N

+1 = ±1, and since Av exp(σN +1 x) = chx ≥



where Av means average over σN
1. Taking expectation in the r.v.s gi,N +1 and hN +1 we conclude that this is
≤ Kν(|R1,2 − q|). 


1.7 Gibbs’ Measure; the TAP Equations


In this
 section we assume that the external field term of the Hamiltonian
is h i≤N σi , although it probably would not require much more effort to
handle the case of a term i≤N hi σi .
We have shown in Theorem 1.4.15 that when we fix a number n of spins,
and we look at the behavior of these n spins under Gibbs’ measure, it is nearly
determined by the random sequence ( σi )i≤n . What is the behavior of this
sequence? Here, again, the situation is as simple as possible: the sequence
( σi )i≤n is asymptotically independently identically distributed. Moreover
we can provide a precise rate for this.
Theorem 1.7.1. Given β < 1/2, and an integer n, we can find independent
standard Gaussian r.v.s (zi )i≤n such that
 √ 2 K
E σi − th(βzi q + h) ≤ , (1.183)
N
i≤n

where q is the solution of (1.74) i.e. q = Eth2 (βz q + h), and where K does
not depend on N (but will of course depend on n).
68 1. The Sherrington-Kirkpatrick Model

Exercise
 1.7.2.
 Assume that (1.132) and Theorem 1.7.1 hold. Prove that
ν (R1,2 − q)2 ≤ K/N . (Hint: replace R1,2 by its value, expand and use
symmetry between sites.)
Research Problem 1.7.3. Find approximation results when n = n(N ) →
∞. (The level of the problem might depend upon how much you ask for.)
We recall the notation ρ = (σ1 , . . . , σN −1 ), and we consider the Hamilto-
nian
β  
− HN −1 (ρ) = √ gij σi σj + h σi . (1.184)
N i<j≤N −1 i≤N −1

This is the Hamiltonian of the SK model of an (N − 1)-spin system, but the


value of β has been changed into β− such that
β β
√ − =√ .
N −1 N
Let us note that |β − β− | ≤ K/N . We denote by · − an average for the
corresponding Gibbs’ measure. We recall that we write gi = giN for i < N .
The following fact essentially allows us to compute σN as a function of the
(N − 1)-spin system.
Lemma 1.7.4. For β < 1/2 we have
  2
β  K
E σN − th √ gi σi −+h ≤ (1.185)
N i<N N

K
E( σ1 − σ1 −)
2
≤ . (1.186)
N
We will prove this at the end of the section as a consequence of a general
principle (Theorem 1.7.11 below), but we can explain right now why (1.185)
is true. The cavity method (i.e. (1.145)) implies
  β  
sh √N i≤N −1 gi σi + h −
σN =   β   .
ch √N i≤N −1 gi σi + h −

As we have seen in (1.137), under · − , the cavity field √1N i≤N −1 gi σi is

approximately Gaussian with mean √1N i≤N −1 gi σi − and variance 1 − q;
and if z is a Gaussian r.v. with expectation μ and arbitrary variance, one has
Eshz
= thμ .
Echz
Relation (1.185) is rather fundamental. Not only is it the key ingredient to
Theorem 1.7.1, but it is also at the root of the Thouless-Anderson-Palmer
(TAP) equations that are stated in (1.192) below.
1.7 Gibbs’ Measure; the TAP Equations 69

Replacing β by β− = β 1 − 1/N slightly changes q into q− such that



q− = Eth2 (β− z q− + h). The following should not come as a surprise. We
recall that L denotes a universal constant.
Lemma 1.7.5. For β < 1/2 we have
L
|q − q− | ≤ . (1.187)
N
Proof. This proof is straightforward and uninteresting. If we define F (β, q) =

Eth2 (βz q + h) and define q(β) by

q(β) = F (β, q(β))

then
∂F
∂β (β, q(β))
q (β) = .
1 − ∂F
∂q (β, q(β))

Now if f (x) = th2 (x), we have f (x) = (2 − 4 sh2 (x))/ch4 (x) ≤ 2.


Computation using Gaussian integration by parts shows that ∂F
∂β (β, q) =
√ ∂F 2 √
βqEf (βz q + h) remains bounded and that ∂q (β, q) = (β /2)Ef (βz q +
h) ≤ 1/4. Therefore q (β) remains bounded for β ≤ 1/2. 


Lemma 1.7.6. We can find a standard Gaussian r.v. z, depending only on


the r.v.s (gij )i<j≤N , which is independent of the r.v.s (gij )i<j≤N −1 , and such
that
 √ 2 K
E σN − th(βz q + h) ≤ .
N
It is important to read carefully the previous statement. It does not say
(and this is not true) that z depends only on the r.v.s (giN )i<N . One would
certainly wish in this result to have the constant K remain bounded as 0 ≤
h ≤ h0 ; unfortunately our argument does not yield this (there is a kind of
discontinuity as h → 0).
Proof. We can and do assume h = 0, for otherwise q = 0 and σN ≡ 0, so
there is nothing to prove. Looking
 at (1.185) the basic idea is simply that one

should have z q  N −1/2 i≤N −1 gi σi − . However some renormalization
is necessary to ensure that Ez 2 = 1, so that we define
1 
z= gi σi − ,
A
i≤N −1

where A2 = i≤N −1 σi 2− and gi = giN . Thus z depends only upon the r.v.s
(gij )i<j≤N . Conditionally upon the r.v.s (gij )i<j≤N −1 , the r.v. z is standard
Gaussian, because these r.v.s determine the numbers σi − and are indepen-
dent of the r.v.s gi . Therefore (as surprising as this might be the first time
one thinks about this), the r.v. z is independent of the r.v.s (gij )i<j≤N −1 .
70 1. The Sherrington-Kirkpatrick Model

Combining (1.185) and the inequality |th x − th y| ≤ |x − y|, it remains


only to prove that
  2
√ 1 K(h)
E z q− √ gi σi − ≤ . (1.188)
N i≤N −1 N

Taking first the expectation in (gi )i≤N −1 , we obtain


  2   √ 2
√ 1 q 1
E z q− √ gi σi − =E gi σi − −√
N i≤N −1 A N
i≤N −1
  √  2
q 1
=E σi 2− −√
A N
i≤N −1
√ 2  2
q 1 √ A
= EA2 −√ =E q− √ .
A N N
Now,
 2  2  2
√ A q − A2 /N 1 A2
q− √ = √ √ 2 ≤ q− .
N q + A/ N q N
  −
Finally, since A2 = i≤N −1 σi 2− = i≤N −1 σi1 σi2 − = N R1,2 − , we get

 2
A2 − − −
E q− = E(q − R1,2 −)
2
= E R1,2 −q 2
− ≤ E (R1,2 − q)2 − . (1.189)
N

Using (1.89) for the (N − 1)-spin system yields that E (R1,2 − q− )2 − ≤ K/N

and (1.187) then implies that E (R1,2 − q) − ≤ K/N .
2



Proof of Theorem 1.7.1. The proof goes by induction over n. When we


use the cavity method, we replace β by β− , that depends on N , so we cannot
“use (1.183) for β− instead of β”. Since β− ≤ β, this difficulty disappears if
one formulates the induction hypothesis as follows:

Given n and β0 < 1/2, there exists a number K(n, β0 ) such that for
β ≤ β0 and any N one can find r.v.s (zi )i≤n , depending only on the r.v.s
(gij )1≤i<j≤N such that
  √ 2 K(n, β0 )
E σi − th(βzi q + h) ≤ . (1.190)
N
i≤n

The reader notices that we assume that the r.v.s (zi )i≤n are functions of the
variables (gij )i<j≤N as part of the induction hypothesis. That this induction
hypothesis is true for n = 1 follows from Lemma 1.7.6, exchanging the sites
1 and N . For the induction step from n to n + 1, we apply the induction
1.7 Gibbs’ Measure; the TAP Equations 71

hypothesis to the (N − 1)-spin system with Hamiltonian HN −1 given by


(1.184). This amounts to replacing β by β− ≤ β. We then get from (1.190)
that
  √ 2 K(n, β0 )
E σi − − th(β− zi q− + h) ≤ , (1.191)
N −1
i≤n

where the variables (zi )i≤n are i.i.d. standard


√ Gaussian and depend only on

(gij )i<j≤N −1 . We observe that, since | a − b| ≤ |a − b| we have

√ √ L
( q− − q)2 ≤ |q− − q| ≤
N
by (1.187), so that
√ √ L
(β− q− − β q)2 ≤
N
and, since |thx − thy| ≤ |x − y|, this implies
 √ √ 2  √ √ 2 L
E (th(β− zi q− + h) − th(β zi q + h) ≤ E zi (β− q− − β q) ≤ .
N
Combining with (1.186) and (1.191) we obtain
  √ 2 K
E σi − th(β zi q + h) ≤ ,
N
i≤n

where K depends only on β0 , h and n. We now appeal to Lemma 1.7.6.


The r.v. z is standard Gaussian and probabilistically independent of the
r.v.s (zi )i≤n because these are functions of the r.v.s (gij )i<j≤N −1 and z is
independent of these r.v.s. Moreover, setting zN = z, we have
  √ 2 K
E σi − th(β zi q + h) ≤ .
N
i∈{1,···,n,N }

Exchanging the sites N and n + 1 concludes the proof. 



We now turn to the Thouless-Anderson-Palmer (TAP) equations [160].
These equations, at a given disorder (hopefully) determine the numbers σi
(the mean magnetization at site i). They can be stated as
 
β 
σi ≈ th √ gij σj + h − β (1 − q) σi .
2
(1.192)
N j =i

The physicists have no qualms writing exact equalities in (1.192), but it is


certainly not obvious that these equations hold simultaneously for every i,
even approximately. This will be a consequence of the next result, which, as
Lemma 1.7.4, depends on the general principle of Theorem 1.7.11 below.
72 1. The Sherrington-Kirkpatrick Model

Theorem 1.7.7. For β < 1/2, any h, any integer k ≥ 1 we have


   2k
β K
E σN − th √ gi σi + h − β 2 (1 − q) σN ≤ k , (1.193)
N i≤N −1 N

where K depends on β and k but not on N .

In most of the statements of the rest of this section, the constant K is as


above, it might depend on β and will certainly depend on k. Even though we
will not mention this every time, if we fix β0 < 1/2, one can check that for
β ≤ β0 the constant K depends on k only.
There is an obvious relationship between (1.185) and (1.193). We have
introduced a kind of correction term in (1.193), but now all the quantities that
appear are defined in terms of the N -spin system. A big difference however
is that (in order to control all spins at the same time) we need to control
higher moments and that this requires new ideas compared to Section 1.6.

Corollary 1.7.8. For any β < 1/2, any h, and any ε > 0 we have
  

 β   K(β, ε)
E max σi − th √ gij σj + h − β (1 − q) σi  ≤ 1/2−ε . (1.194)
2
i≤N N j =i N

Proof. Let
 
β 
Δi = σi − th √ gij σj + h − β 2 (1 − q) σi , (1.195)
N j =i

−k
i ≤ K(β, k)N
so that by (1.193) and symmetry between sites we have EΔ2k
and 2k  K(β, k)
E max |Δi | ≤ EΔ2ki ≤
i≤N N k−1
i≤N
so
K(β, k)
E max |Δi | ≤ ,
i≤N N 1/2−1/2k
and taking k with 2k ≥ 1/ε concludes the proof. 


Research Problem 1.7.9. (Level 1+ ) Is it true that for some K that does
not depend on N one has

N Δ2N
E exp ≤2?
K

Research Problem 1.7.10. (Level 1+ ) Is it true that the r.v. N ΔN con-
verges in law to a Gaussian limit?
1.7 Gibbs’ Measure; the TAP Equations 73

These are good problems. Even though the SK model is well under control
for β < 1/2, matters seem rather complicated here; that is, until one finds a
good way to look at them.
We turn to the general principle on which much of the section relies.
Let us consider a standard Gaussian r.v. ξ. Let us remind the reader that
throughout the book we denote by Eξ expectation in the r.v. ξ only, that is,
when all other r.v.s are given.
Theorem 1.7.11. Assume β < 1/2. Consider a function U on R, which is
infinitely differentiable. Assume that for all numbers  and k, for any Gaus-
sian r.v. z, we have
E|U () (z)|k < ∞ . (1.196)
Consider independent standard Gaussian r.v.s yi and ξ, which are indepen-
dent of the randomness of · . Then, using the notation σ̇i = σi − σi , for
each k we have
   2k
1  K
E U √ yi σ̇i − Eξ U (ξ 1 − q) ≤ k , (1.197)
N i≤N N

where of course q is the solution of (1.74), and the constant K depends on


k, U, β, but not on N .

According to (1.137) we expect that the r.v. σ → N −1/2 i≤N yi σ̇i should
be approximately
 Gaussian of variance
√ 1−q under · , so that we should have
U (N −1/2 i≤N yi σ̇i )  Eξ U (ξ 1 − q), and (1.197) makes this statement
precise.
As we shall see, the proof of (1.197) is made possible by special sym-
metries. It would be useful to know other statements, such as the following
(which, unfortunately is probably not true).
Research Problem 1.7.12. (Level 1+). Under the preceding conditions,
and recalling that Eξ denotes expectation in ξ only, is it true that
    2k
1  1  K
E U √ yi σi − Eξ U √ yi σi + ξ 1−q ≤ ?
N i≤N N i≤N Nk
(1.198)
The subsequent proofs use many times the following observation. If two
random quantities A and B (depending on N ) satisfy EA2k ≤ K/N k and
EB 2k ≤ K/N k , then E(A + B)2k ≤ K/N k (for a different constant K).
This follows from the inequality (A + B)2k ≤ 22k (A2k + B 2k ). (The reader
observes that in fact the previous inequality also holds with the factor 22k−1
rather than 22k . Our policy however is to often write crude but sufficient
inequalities.)
Proof of Theorem 1.7.11. Consider the function
74 1. The Sherrington-Kirkpatrick Model

V (x) = U (x) − Eξ U (ξ 1 − q)

so that
1 − q) = 0 .
Eξ V (ξ (1.199)

Using replicas, and defining Ṡ  = N −1/2 i≤N yi σ̇i , the left-hand side of
(1.197) is
 2k 
E V (Ṡ 1 ) =E V (Ṡ  ) . (1.200)
≤2k

Consider independent standard Gaussian r.v. (ξ  )≤2k and the function


 √ √
ϕ(t) = E V ( tṠ  + 1 − tξ  1 − q) ,
≤2k

so that the quantity (1.200) is ϕ(1). To prove the theorem, it suffices to prove
that ϕ(r) (0) = 0 for r < 2k and that |ϕ(2k) (t)| ≤ KN −k .
For x = (x )≤2k , let us consider the function √ F (x) given by F (x) =
 √ √
≤2k V (x  ). Let us define X t = (X )
 ≤2k for X  = tṠ  + 1 − tξ  1 − q.
With this notation we have ϕ(t) = E F (Xt ) . We observe that

ϕ(t) = E F (Xt ) = E E0 F (Xt ) ,

where E0 denotes expectation in the r.v.s yi and ξ  only.


Let
1   2
T, = E0 (Ṡ  )2 − E0 (ξ  1 − q)2 = (σ̇i ) − (1 − q)
N
i≤N

and, for  =  , let


  1   
T, = E0 Ṡ  Ṡ  − E0 ξ  ξ  (1 − q) = σ̇i σ̇i .
N
i≤N

We will prove that these quantities satisfy


K
∀r , 2r
E T,  ≤ . (1.201)
Nr
Let us explain this in the case  =  , the case  =  being similar. We observe
that, since (σ̇i )2 = (σi − σi )2 = 1 − 2σi σi + σi 2 ,
  
1   2 1 1 
T, = (σ̇i ) − (1 − q) = −2 σi σi − q +

σi 2 − q .
N N N
i≤N i≤N i≤N

To control the first term on the right-hand side we write


1.7 Gibbs’ Measure; the TAP Equations 75
 2r  2r
1   1  1 2
σi σi − q = σi σi − q ≤ (R1,2 − q)2r ,
N N
i≤N i≤N

by using Jensen’s inequality (1.23) to average in σ 2 outside the power 2r


rather than inside. Then (1.88) implies E (R1,2 − q)2r ≤ KN −r . We control
the other term similarly.
To compute the derivatives of ϕ(t) we apply iteratively (1.40) to the
function t → E0 F (Xt ) (given the randomness of · ). We observe that for
s = (s )≤2k the corresponding partial derivative F (s) of F is given by

F (s) (x) = V (s ) (x ) . (1.202)
≤2k

Consider a list 1 , 1 , 2 , 2 , . . . , r of integers ≤ 2k, and the sequence s =


(s )≤2k that is obtained from this list as follows: for each  ≤ 2k, s counts
the number of times  occurs in the list. Then it follows from (1.40) and
induction on r that ϕ(r) (t) is given by

ϕ(r) (t) = 2−r E T1 ,1 · · · Tr ,r F (s) (Xt ) , (1.203)
1 ,1 ,...,r ,r

where the summation is over all choices of 1 , 1 , . . . , r . If we combine (1.196),


(1.202) and (1.201) with Hölder’s inequality we see that |ϕ(r) (t)| ≤ KN −r/2
(as usual, “each factor T, contributes as N −1/2 ”). Let us now examine
ϕ(r) (0), with the aim of proving that this is 0 unless r ≥ 2k. Since the
randomness of ξ  is independent of the randomness of · , ϕ(r) (0) is of the
type

ϕ(r) (0) = 2−r EF (s) ((ξ  1 − q)≤2k )E T1 ,1 · · · Tr ,r . (1.204)
1 ,1 ,...,r ,r

Using independence and (1.202) we note first that



EF (s) ((ξ  1 − q)≤2k ) = EV (s ) (ξ 1 − q) .
≤2k

Combining with (1.199) we obtain

EF (s) ((ξ  1 − q)≤2k ) = 0 unless ∀ ≤ 2k , s ≥ 1 .

This implies that when we consider a non-zero term in the sum (1.204), each
number  ≤ 2k occurs at least one time in the list 1 , 1 , 2 , 2 , . . . , r , r . Let
us assume this is the case. We also observe that for  =  the averages of

T, over σ  and over σ  are both zero. It follows that when

T1 ,1 · · · Tr ,r = 0 (1.205)


76 1. The Sherrington-Kirkpatrick Model

no number  ≤ 2k can occur exactly once in the list 1 , 1 , 2 , 2 , . . . , r . Since


we assume that each of these numbers occurs at least once in this list, it
must occur at least twice (for otherwise averaging T1 ,1 · · · Tr ,r in σ  would
already be 0). Since the length of the list is 2r we must have 2r ≥ 4k i.e.
r ≥ 2k. Therefore r ≥ 2k whenever ϕ(r) (0) = 0. 


Corollary 1.7.13. Given β ≤ β0 < 1/2, h, ε = ±1 and k ≥ 1 we have


 2k
εβ  β2 εβ  K
E exp √ yi σi − exp (1 − q) exp √ yi σi ≤ k
N i≤N 2 N N
i≤N
(1.206)
and

1  εβ 
E √ yi σ̇i exp √ yi σi
N i≤N N i≤N
2k
β2 εβ  K
−εβ(1 − q) exp (1 − q) exp √ yi σi ≤ k (1.207)
2 N i≤N N

where K does not depend on N and, given k and h, K stays bounded with
β ≤ β0 .
Proof. To prove (1.206) we use (1.197) with U (x) = exp εβx to get
K
EA4k ≤
N 2k
where
εβ  β2
exp √
A= yi σ̇i − exp (1 − q) .
N i≤N 2

Now, if B = exp εβN −1/2 i≤N yi σi , (A.6) entails that EB 4k ≤ K k , and
therefore
 1/2 K
E(AB)2k ≤ EA4k EB 4k ≤ k .
N
This proves (1.206).
We proceed similarly for (1.207), using now U (x) = x exp εβx, noting that
then (using Gaussian integration by parts and (A.6))

Eξ U (ξ 1 − q) = εβ(1 − q) exp(β 2 (1 − q)/2) .

The reason why K remains bounded for β ≤ β0 is simply that all the estimates
are uniform over that range. 

Lemma 1.7.14. If |A | ≤ B and B ≥ 1 we have
 
A A 

 B − B  ≤ |A − A | + |B − B | .
1.7 Gibbs’ Measure; the TAP Equations 77

Proof. We write
A A A A A A
− = − + −
B B B  B B B
A B−B 1
= + (A − A) ,
B B B
and the result is then obvious. 

Corollary 1.7.15. Let
 
εβ 
E = exp √ yi σi + εh . (1.208)
N i≤N

Recalling that Av denotes average over ε = ±1, we have


  2k
AvεE β  K
E − th √ yi σi + h ≤ k (1.209)
AvE N N
i≤N

 2k
1  σi AvE AvεE 1  K
E √ yi − β(1 − q) −√ yi σi ≤ k .
N i≤N AvE AvE N i≤N N
(1.210)
Proof. Defining
εβ  β2 εβ 
A(ε) = exp √ yi σi − exp (1 − q) exp √ yi σi ,
N i≤N 2 N i≤N

we deduce from (1.206) that


1 1 2k K
E A(1) exp h ± A(−1) exp(−h) ≤ k ,
2 2 N
i.e.
   2k
β  β2 K
E AvεE − sh √ yi σi + h exp (1 − q) ≤ k
N i≤N 2 N

and
   2k
β  β2 K
E AvE − ch √ yi σi + h exp (1 − q) ≤ k , (1.211)
N i≤N 2 N

from which (1.209) follows using Lemma 1.7.14. From (1.207) we obtain by
the same method
  2k
1  β2 β 
E √ yi σ̇i AvE − β(1−q) exp (1−q)sh √ yi σi + h
N i≤N 2 N i≤N
78 1. The Sherrington-Kirkpatrick Model

K
≤ .
Nk
Combining with (1.211) and Lemma 1.7.14 we get
 1    2k

i≤N yi σ̇i AvE β  K
E N
− β(1 − q)th √ yi σi + h ≤ k .
AvE N N
i≤N

Since
  
√1
i≤N yi σ̇i AvE 1  σi AvE 1 
N
=√ yi −√ yi σi ,
AvE N i≤N AvE N i≤N

combining with (1.209) proves (1.210). 



Proof of Theorem 1.7.7. The Hamiltonian (1.184) is the Hamiltonian of
an (N − 1)-spin system with parameter

N −1
β− = β ≤β.
N
The cavity method yields
AvεE −
σN = ,
AvE −
where, recalling that gi = giN ,
    
εβ− εβ 
E = exp √ gi σi + εh = exp √ gi σi + εh .
N − 1 i≤N −1 N i≤N −1

We then apply (1.209) to the (N − 1)-spin system with Hamiltonian (1.184),


and to the sequence yi = gi to get
  2k
β  K
E σN − th √ gi σi − + h ≤ k (1.212)
N i≤N −1 N

and in particular (1.185). Similarly we obtain from (1.210) that


   2k
1 1 K
E √ gi σi −β− (1−q− ) σN − √ gi σi − ≤ ,
N − 1 i≤N −1 N − 1 i≤N −1 Nk
√ √
and, since β− = β N − 1/ N , multiplying by β− 2k
and observing that |β−
2

β | ≤ 1/N and that |q − q− | ≤ K/N by (1.187), we get
2

  2k
β β  K
E √ gi σi − β 2 (1 − q) σN − √ gi σi − ≤ .
N i≤N −1 N i≤N −1 Nk
1.7 Gibbs’ Measure; the TAP Equations 79

Therefore if
β 
A= √ gi σi − + h
N i≤N −1
β 
B= √ gi σi − β 2 (1 − q) σN + h ,
N i≤N −1

we have
K
E(thA − thB)2k ≤ E(A − B)2k ≤ ,
Nk
and combining with (1.212) this yields (1.193). 

Proof of Lemma 1.7.4. Since (1.185) follows from (1.212), it remains only
to prove (1.186). Recalling (1.208), it suffices to prove that

σ̇1 AvE 2 K
E ≤ . (1.213)
AvE 2 N
Indeed, we have
σ̇1 AvE σ1 AvE
= − σ1 .
AvE AvE
Using (1.213) for the (N − 1)-spin system, and noticing that then by the
cavity method the right-hand side is σ1 − σ1 − , we obtain (1.186). Thus
it suffices to prove that
K
E σ̇1 AvE 2 ≤ .
N
Let us define  
ε β 
E = exp √ 
yi σi + ε h ,
N i≤N
so that using replicas

σ̇1 AvE 2
= σ̇11 σ̇12 AvE1 AvE2 = σ̇11 σ̇12 AvE1 E2 ,

where from now on Av means average over ε1 , ε2 = ±1.


Using symmetry between sites, and taking first expectation in the r.v.s yi
(that are independent of the randomness of the bracket) we get

1  1 2 1  1 2
E σ̇11 σ̇12 AvE1 E2 = E σ̇i σ̇i AvE1 E2 =E σ̇i σ̇i AvE E1 E2 .
N N
i≤N i≤N

Now, using (A.6),


 
ε1 β  ε2 β 
E E1 E2 = E exp √ yi σi1 + √ yi σi2 + ε1 h + ε2 h
N i≤N N i≤N
= exp(β 2 + β 2 ε1 ε2 R1,2 + ε1 h + ε2 h) .
80 1. The Sherrington-Kirkpatrick Model

We observe that if ε = ±1 we have exp εx = chx + εshx. Writing the above


quantity as a product of four exponentials, to each of which we apply this
formula, we get
 
AvE E1 E2 = exp β 2 ch(β 2 R1,2 )ch2 h + sh(β 2 R1,2 )sh2 h .

Thus it suffices to show that


   
    1  1 2   K
E 1 σ̇i σ̇i f (R1,2 )  = E
1 2
σ̇i σ̇i f (R1,2 ) − f (q) ≤
 N N  N ,
i≤N i≤N

where either f (x) = ch(β 2 x) or f (x) = sh(β 2 x), and where the equality
follows from the fact that σ̇i1 σ̇i2 = 0. Since β ≤ 1 we have |f (x) − f (q)| ≤
L|x − q| and thus
   
      1  1 2
E 1 
σ̇i σ̇i f (R1,2 ) − f (q)  ≤ LE 
1 2  σ̇i σ̇i |R1,2 − q| .
 N N
i≤N i≤N

Now, each of the factors on the right “contributes as 1/ N ”. This is seen
by using the Cauchy-Schwarz inequality, (1.89) and the fact that by Jensen’s
inequality (1.23) and (1.89) again we have
  2
−1 K
E N 1 2
σ̇i σ̇i ≤ . 

N
i≤N

1.8 Second Moment Computations and the


Almeida-Thouless Line
In this section, q is always the solution of (1.74). Theorem 1.4.1 shows that
ν((R1,2 −q)2 ) ≤ K/N for β < 1/2, so we expect that limN →∞ N ν((R1,2 −q)2 )
exists, and we would like to compute it. The present section develops the
machinery to do this. Our computations will be proven to hold true for β <
1/2, but an interesting side story is that it will be obvious that the result of

these calculations can be correct only when β 2 Ech−4 (βz q + h) < 1. It is
conjectured that this is exactly the region where this is the case. When h is
non-random, the line
1
β2E 4 √ =1 (1.214)
ch (βz q + h)
in the (β, h) plane is called the Almeida-Thouless (AT) line. In the SK model,
it is the (conjectured) boundary between the “high-temperature” region
(where the replica-symmetric solution is correct) and the “low-temperature”
region (where the situation is much more complicated).
The basic tool is as follows, where νt is as in Section 1.6, and where we

recall that R1,2 = N −1 i<N σi1 σi2 .
1.8 Second Moment Computations and the Almeida-Thouless Line 81

Proposition 1.8.1. Consider a function f on n replicas. Then, if τ1 , τ2 > 0


and 1/τ1 + 1/τ2 = 1 we have

|ν(f ) − ν0 (f )| ≤ K(n, β)ν(|f |τ1 )1/τ1 ν(|R1,2 − q|τ2 )1/τ2 (1.215)

|ν(f ) − ν0 (f ) − ν0 (f )| ≤ K(n, β)ν(|f | )
τ1 1/τ1
ν(|R1,2 − q| 2τ2 1/τ2
) . (1.216)

Of course K(n, β) does not depend on N .




One should think of |R1,2 −q| as being small (about 1/ N ). The difference
between the right-hand sides of (1.215) and (1.216) is that we have in the
latter an exponent 2τ2 rather than τ2 . Higher-order expansions yield smaller
error terms.
Proof. To prove (1.215) we simply bound νt (f ) using (1.151), (1.154)
and Hölder’s inequality. To prove (1.216) we compute νt (f ) by iteration of

(1.151), observing that the new differentiation “brings a new factor (R,  −q)

in each term”, we bound |νt (f )| as previously, and we use that

|ν(f ) − ν0 (f ) − ν0 (f )| ≤ sup |νt (f )| . 



0≤t≤1


When β < 1/2 we know that ν(|R1,2 − q|k ) is small; but, as we see later,
there is some point in making the computation for each value of β. There
are two aspects in the computation; one is to get the correct error terms,
which is very simple; the other is to perform the algebra, and this runs into
(algebraic!) complications. Before we start the computation itself, we explain
its mechanism (which will be used again and again). This will occupy the
next page and a half.
To lighten notation in the argument we denote by R any quantity such
that  
1  − 
|R| ≤ K + ν |R 1,2 − q| 3
, (1.217)
N 3/2
where K does not depend on N . Using the inequality xy ≤ x3/2 + y 3 for
x, y ≥ 0 we observe first that
1 −
ν(|R1,2 − q|) = R . (1.218)
N
 
We start the computation of ν (R1,2 − q)2 as usual, recalling the notation


ε = σN and writing f = (ε1 ε2 − q)(R1,2 − q), f ∼ = (ε1 ε2 − q)(R1,2 − q), so
that
 
ν (R1,2 − q)2 = ν(f )
1
= ν(f ∼ ) + ν(1 − ε1 ε2 q) . (1.219)
N
82 1. The Sherrington-Kirkpatrick Model

Using (1.215) with τ1 = ∞, τ2 = 1 and (1.218) we obtain


1 1
ν(1 − ε1 ε2 q) = ν0 (1 − ε1 ε2 q) + R . (1.220)
N N
We know that ν0 (1 − ε1 ε2 q) = 1 − q 2 using Lemma 1.6.2.
Next, we apply (1.216) to f ∼ with τ1 = 3, τ2 = 3/2, to get

ν(f ∼ ) = ν0 (f ∼ ) + R ,

because ν0 (f ∼ ) = 0 by Lemma 1.6.2 and (1.74). Therefore we have


  1 − q2 −
ν (R1,2 − q)2 = + ν0 ((ε1 ε2 − q)(R1,2 − q)) + R . (1.221)
N
As is shown by (1.151), the quantity ν0 (f ∼ ) is a sum of terms of the type
 − −

±β 2 ν0 ε ε (ε1 ε2 − q)(R,  − q)(R1,2 − q) .

Using Lemma 1.6.2, such a term is of the form


− −
± b(,  )ν0 ((R,  − q)(R1,2 − q)) (1.222)

where
b(,  ) = β 2 ν0 (ε ε (ε1 ε2 − q)) .
− −
Next, we apply (1.215) to f = (R,  −q)(R1,2 −q), this time with τ1 = 3/2

and τ2 = 3 to get (after a further use of Hölder’s inequality)


− − − −
ν0 ((R,  − q)(R1,2 − q)) = ν((R, − q)(R1,2 − q)) + R .


Using the formula R,  = R, − ε ε /N , we obtain (using (1.218))

− −
ν((R,  − q)(R1,2 − q)) = ν((R, − q)(R1,2 − q)) + R . (1.223)

Because of the symmetry between replicas the quantity ν((R, −q)(R1,2 −q))
can take only 3 values, namely
 
U = ν (R1,2 − q)2 ; (1.224)
V = ν((R1,2 − q)(R1,3 − q)) ; (1.225)
W = ν((R1,2 − q)(R3,4 − q)) . (1.226)

Thus from (1.221) we have obtained the relation


1
U= (1 − q 2 ) + c1 U + c2 V + c3 W + R , (1.227)
N
for certain numbers c1 , c2 , c3 . We repeat this work for V and W ; specifically,
we write
1.8 Second Moment Computations and the Almeida-Thouless Line 83

1
V = ν(f ∼ ) + ν((ε1 ε2 − q)ε1 ε3 ) (1.228)
N
− 1
W = ν((ε1 ε2 − q)(R3,4 − q)) + ν((ε1 ε2 − q)ε2 ε4 ) ,
N

where now f ∼ = (ε1 ε2 −q)(R1,3 −q) and we proceed as above. In this manner,
we get a system of 3 linear equations in U, V, W , the solution of which yields
the values of these quantities (at least in the case β < 1/2, where we know
that |R| ≤ KN −3/2 ).
Having finished to sketch the method of proof, we now turn to the com-
putation of the actual coefficients in (1.227). It is convenient to consider the
quantity

q = ν0 (ε1 ε2 ε3 ε4 ) = Eth4 Y = Eth4 (βz q + h) . (1.229)
Let (using Lemma 1.6.2)

b(2) = β 2 ν0 (ε1 ε2 (ε1 ε2 − q)) = β 2 ν0 (1 − ε1 ε2 q) = β 2 (1 − q 2 )


b(1) = β 2 ν0 (ε1 ε3 (ε1 ε2 − q)) = β 2 ν0 (ε2 ε3 − ε1 ε3 q) = β 2 (q − q 2 )
b(0) = β 2 ν0 (ε3 ε4 (ε1 ε2 − q)) = β 2 ν0 (ε1 ε2 ε3 ε4 − qε3 ε4 ) = β 2 (
q − q2 ) .

For two integers x, y we define

b(,  ; x, y) = b(card({,  } ∩ {x, y})) .

Lemma 1.8.2. Consider a function f − on ΣN n


−1 and two integers x, y ≤ n,
x = y. Then


ν0 ((εx εy − q)f − ) = b(,  ; x, y)ν0 (f − (R,  − q))

1≤< ≤n


−n b(, n + 1; x, y)ν0 (f − (R,n+1 − q))
≤n
n(n + 1) −
+ b(0)ν0 (f − (Rn+1,n+2 − q)) . (1.230)
2
This is of course an immediate consequence of (1.151), Lemma 1.6.2, and
the definition of b(,  ; x, y). The reason why we bring this formula forward is
that it contains the entire algebraic structure of our calculations. In particular
these calculations will hold for other models provided (1.230) is true (possibly
with different values of b(0), b(1) and b(2)). Let us also note that b(0) =
b(n + 1, n + 2; x, y).

Using (1.230) with f − = R1,2 − q and n = 2 yields

 − 
ν0 ((ε1 ε2 − q)(R1,2 − q)) = b(2)ν0 (R1,2 − q)2

− −
− 2b(1) ν0 ((R1,2 − q)(R,3 − q))
≤2
− −
+ 3b(0)ν0 ((R1,2 − q)(R3,4 − q)) ,
84 1. The Sherrington-Kirkpatrick Model

so that going back to (1.221) and recalling the definitions (1.224) to (1.226)
and (1.223) we can fill the coefficients in (1.227):
1 − q2
U= + b(2)U − 4b(1)V + 3b(0)W + R . (1.231)
N

To treat the situation (1.228) we use (1.230) with n = 3 and f − = R1,3 − q.
One needs to be patient in counting how many terms of each type there are;
one gets the relation
q − q2
V = + b(1)U + (b(2) − 2b(1) − 3b(0))V + (6b(0) − 3b(1))W + R (1.232)
N
and similarly
q − q 2
W = +b(0)U +(4b(1)−8b(0))V +(b(2)−8b(1)+10b(0))W +R . (1.233)
N
Of course, this is not as simple as one might wish. This brings forward
the matrix
⎛ ⎞
b(2) −4b(1) 3b(0)
⎝ b(1) b(2) − 2b(1) − 3b(0) 6b(0) − 3b(1) ⎠ . (1.234)
b(0) 4b(1) − 8b(0) b(2) − 8b(1) + 10b(0)
Rather amazingly, the transpose of this matrix has eigenvectors (1, −2, 1) and
(1, −4, 3) with eigenvalues respectively
b(2) − 2b(1) + b(0) = β 2 (1 − 2q + q) (1.235)
b(2) − 4b(1) + 3b(0) = β 2 (1 − 4q + 3q) . (1.236)
The second eigenvalue has multiplicity 2, but this multiplicity appears in the
form of a two-dimensional Jordan block so that the corresponding eigenspace
has dimension 1. The amazing point is of course that the eigenvectors do not
depend on the specific values of b(0), b(1), b(2). Not surprisingly the quantities
(1.235) and (1.236) will occur in many formulas.
Using eigenvectors is certainly superior to brute force in solving a sys-
tem of linear equations, so one should start the computation of U, V, W by
computing first U − 2V + W . There is more however to (1.230) than the
matrix (1.234). This will become much more apparent later in Section 1.10.
The author cannot help feeling that there is some simple underlying alge-
braic structure, probably in the form of an operator between two rather large
spaces.
Research Problem 1.8.3. (Level 2) Clarify the algebraic structure under-
lying (1.230).
Even without solving this problem, the idea of eigenvectors gives the feeling
that matters will simplify considerably if one considers well-chosen combina-
tions of (1.230) for various values of x and y, such as the following, which
brings out the value (1.235).
1.8 Second Moment Computations and the Almeida-Thouless Line 85

Lemma 1.8.4. Consider a function f − on ΣN


n
−1 . Then

ν0 ((ε1 − ε2 )(ε3 − ε4 )f − )
− − − −
= (b(2) − 2b(1) + b(0))ν0 (f − (R13 − R14 − R23 + R24 ))
− − − − −
= β (1 − 2q + q)ν0 (f (R13 − R14 − R23 + R24 )) .
2
(1.237)
Proof. The magic here lies in the cancellation of most of the terms in the
sums 1≤< ≤n and ≤n coming from (1.230). We use (1.230) four times
for x = 1, 2 and y = 3, 4 and we compute
c(,  ) = b(,  ; 1, 3) − b(,  ; 1, 4) − b(,  ; 2, 3) + b(,  ; 2, 4) .
We see that this is zero except in the following cases:
c(1, 3) = c(2, 4) = −c(1, 4) = −c(2, 3) = b(2) − 2b(1) + b(0) . 


− −
“Rectangular sums” such as R1,3 − R1,4 − R2,3 + R2,4 or R1,3 − R1,4 −
− −
R2,3 + R2,4 will occur frequently.
Now that we have convinced the reader that the error terms in our com-
putation are actually of the type (1.217) we will for simplicity assume that
β < 1/2, in which case the error terms are O(3), where we recall that O(k)
means a quantity A such that |A| ≤ KN −k/2 where K does not depend on
N.
We will continue the computation of U, V, W later, but to immediately
make the point that (1.237) simplifies the algebra we prove the following,
where we recall that “·” denotes the dot product in RN , so that σ 1 · σ 2 =
N R1,2 . It is worth making the effort to fully understand the mechanism of
the next result, which is a prototype for many of the later calculations.
Proposition 1.8.5. If β < 1/2 we have
 
(σ 1 − σ 2 ) · (σ 3 − σ 4 ) 2 4(1 − 2q + q)
ν = + O(3) . (1.238)
N N (1 − β 2 (1 − 2q + q))
Proof. Let ai = (σi1 − σi2 )(σi3 − σi4 ), so that
(σ 1 − σ 2 ) · (σ 3 − σ 4 ) 1 
= R1,3 − R1,4 − R2,3 + R2,4 = ai . (1.239)
N N
i≤N

Therefore, if f = R1,3 − R1,4 − R2,3 + R2,4 , we have


 
(σ 1 − σ 2 ) · (σ 3 − σ 4 ) 2  
ν = ν (R1,3 − R1,4 − R2,3 + R2,4 )2
N
1  
= ν ai f = ν(aN f )
N
i≤N
= ν((ε1 − ε2 )(ε3 − ε4 )f ) . (1.240)
86 1. The Sherrington-Kirkpatrick Model

Moreover
1  
ν((ε1 − ε2 )(ε3 − ε4 )f ) = ν((ε1 − ε2 )(ε3 − ε4 )f − ) + ν ((ε1 − ε2 )(ε3 − ε4 ))2 ,
N
(1.241)
− − − −
where f − = R1,3 − R1,4 − R2,3 + R2,4 . First we observe that
 
ν0 ((ε1 − ε2 )(ε3 − ε4 ))2 = 4ν0 ((1 − ε1 ε2 )(1 − ε3 ε4 )) = 4(1 − 2q + q) . (1.242)

We use (1.216) for f ∗ = (ε1 − ε2 )(ε3 − ε4 )f − with τ1 = 3 and τ2 = 3/2 to


obtain

|ν(f ∗ ) − ν0 (f ∗ ) − ν0 (f ∗ )| ≤ Kν(|R1,2 − q|3 ) = O(3) .
Next, by Lemma (1.6.2) we have ν0 (f ∗ ) = ν0 ((ε1 − ε2 )(ε3 − ε4 )f − ) = 0, and
(1.237) implies

ν0 (f ∗ ) = ν0 ((ε1 − ε2 )(ε3 − ε4 )f − )
 − − − − 2

= β 2 (1 − 2q + q)ν0 (R1,3 − R1,4 − R2,3 + R2,4 ) .
− − − −
Next, we observe that ν((R1,2 − q)4 ) = O(4), so that ν((R1,3 − R1,4 − R2,3 +
− 4
R2,4 ) ) = O(4) and we apply (1.216) with e.g τ1 = τ2 = 2 to obtain
 − − − − 2
  − − − − 2

ν0 (R1,3 − R1,4 − R2,3 + R2,4 ) = ν (R1,3 − R1,4 − R2,3 + R2,4 ) + O(3) .

We then use the relation R,  = R, − ε ε /N and expansion to get

 − − − − 2
  
ν (R1,3 − R1,4 − R2,3 + R2,4 ) = ν (R1,3 − R1,4 − R2,3 + R2,4 )2 + O(3) .

Finally we have reached the relation

ν(f ∗ ) = ν((ε1 − ε2 )(ε3 − ε4 )f − )


 
= β 2 (1 − 2q + q)ν (R1,3 − R1,4 − R2,3 + R2,4 )2 + O(3)

and thus combining with (1.240) and (1.241) we get


  4
(1 − β 2 (1 − 2q + q))ν (R1,3 − R1,4 − R2,3 + R2,4 )2 = (1 − 2q + q) + O(3) .
N
Since for β ≤ 1/2 we have β 2 (1 − 2q + q) < 1/4 < 1 the result follows. 

Since error terms are always handled by the same method, this will not
be detailed any more.
One can note the nice (accidental?) expression
1
1 − 2q + q = E .
ch4 Y
We have proved (1.238) for β < 1/2, but we may wonder for which values
of β it might hold. Since the left hand side of (1.238) is ≥ 0, this relation
cannot hold unless
1.8 Second Moment Computations and the Almeida-Thouless Line 87

1
β 2 (1 − 2q + q) = β 2 E <1, (1.243)
ch4 Y
i.e. unless we are on the high-temperature side of the AT line (1.214), a point
to which we will return in the next section.

Corollary 1.8.6. If β < 1/2 we have

  β 2 (1 − 2q + q)2
E ( σ1 σ2 − σ1 σ2 )2 = + O(3) . (1.244)
N (1 − β 2 (1 − 2q + q))

Proof. Recalling (1.239) and using symmetry between sites,


   
(σ 1 − σ 2 ) · (σ 3 − σ 4 ) 2 1  2
ν =ν ai
N N
i≤N
1 N −1
= ν(a2N ) + ν(a1 a2 ) .
N N
Now, (1.242) and (1.215) imply

ν(a2N ) = 4(1 − 2q + q) + O(1)

and

ν(a1 a2 ) = E (σ11 − σ12 )(σ13 − σ14 )(σ21 − σ22 )(σ23 − σ24 )


= E (σ11 − σ12 )(σ21 − σ22 ) 2
= 4E( σ1 σ2 − σ1 σ2 )2 .

The result then follows from (1.238). 



After these parentheses, we can get back to the computation of U, V
and W . If we were interested only in these values, the shortest route would
certainly be to solve the equations (1.231), (1.232), (1.233). We choose a
less direct approach, that will be much easier than the brute force method
to generalize to higher moments in Section 1.10. The computation is very
pretty and natural, but, as we have already discovered, the result of this
computation will be a bit complicated. It is given at the end of this section.
Rather than scaring away the reader with these formulas, we take the gentler
road of gradually discovering how they come into existence.
Pursuing the idea that the computation simplifies if we “use the correct
basis” we introduce the quantities

(σ  − b) · (σ  − b) (σ  − b) · b b·b
T, = ; T = ;T = − q , (1.245)
N N N
where as usual b = σ = ( σi )i≤N . Using the notation σ̇i = σi − σi , we
can also write (1.245) as
88 1. The Sherrington-Kirkpatrick Model

1    1   1 
T, = σ̇i σ̇i ; T = σ̇i σi ; T = σi 2
−q .
N N N
i≤N i≤N i≤N

These quantities will be proved to be “independent” in a certain sense. They


will allow to recover the quantities R, − q by the formula

R, − q = T, + T + T + T . (1.246)

Proposition 1.8.7. If β < 1/2 we have


2
ν(T1,2 ) = A2 + O(3) (1.247)

where
1 − 2q + q
A2 = . (1.248)
N (1 − β 2 (1 − 2q + q))

Proof. A basic problem is that T1,2 is a function of σ 1 and σ 2 , but that it


depends on the disorder through b, so that one cannot directly use results
such as Lemma 1.6.3 for f = T1,2 . There is a basic technique to go around
this difficulty. It will be used again and again in the rest of this section, and
in Section 1.10. It is basically to “replace each occurrence of b by σ  for a
value of  that has never been used before”. For example we have

2 (σ 1 − b) · (σ 2 − b) (σ 1 − b) · (σ 2 − b)
T1,2 =
N N
(σ 1 − σ 3 ) · (σ 2 − σ 4 ) (σ 1 − σ 5 ) · (σ 2 − σ 6 )
= .
N N

To understand this formula we keep in mind that σ  are averaged inde-


pendently for Gibbs’ measure, and that for any given vectors x, y ∈ RN we
have the formula x · (y + σ) = x · (y + b). Applying this four times, to the
integrations in σ  for  = 3, 4, 5, 6 proves the above equality. Therefore we
have
 1 
2 (σ − σ 3 ) · (σ 2 − σ 4 ) (σ 1 − σ 5 ) · (σ 2 − σ 6 )
ν(T1,2 )=ν ,
N N
or to match better with the notation of Proposition 1.8.5, and using symmetry
between replicas,
 1 
2 (σ − σ 2 ) · (σ 3 − σ 4 ) (σ 1 − σ 5 ) · (σ 3 − σ 6 )
ν(T1,2 ) = ν .
N N

Thus, using again (1.239) and symmetry among sites,


2
ν(T1,2 ) = ν((ε1 − ε2 )(ε3 − ε4 )f ) ,

where f = R1,3 − R1,6 − R5,3 + R5,6 , so that


1.8 Second Moment Computations and the Almeida-Thouless Line 89

1
2
ν(T1,2 )= ν((ε1 − ε2 )(ε3 − ε4 )(ε1 − ε5 )(ε3 − ε6 )) + ν((ε1 − ε2 )(ε3 − ε4 )f − ) ,
N
− − − −
for f − = R1,3 − R1,6 − R5,3 + R5,6 . We observe that

(ε1 − ε2 )(ε3 − ε4 )(ε1 − ε5 )(ε3 − ε6 ) 0


= (ε1 − ε2 )(ε1 − ε5 ) = 1 − ε1 ε5 − ε1 ε2 + ε2 ε5
2
0
2
0
= (1 − th2 Y )2 = 1 − 2 th2 Y + th4 Y (1.249)

so that
 
ν0 (ε1 − ε2 )(ε3 − ε4 )(ε1 − ε5 )(ε3 − ε6 ) = 1 − 2 q + q .

One then proceeds exactly as in Proposition 1.8.5 to prove (1.248). 




Proposition 1.8.8. If  <  and (,  ) = (1, 2) we have

ν(T1,2 T, ) = 0 . (1.250)

For any  we have


ν(T1,2 T ) = 0 . (1.251)
Finally we have
ν(T1,2 T ) = 0 . (1.252)

/ {,  } we have that T1,2 T, = 0 by integrating


Proof. For example, if 1 ∈
1
in σ . 

The following is in the spirit of Lemma 1.8.4. It is simpler than (1.230),
yet it allows more computations than (1.237).

Lemma 1.8.9. Consider a function f − on ΣN


n
−1 . Then

− −
ν0 ((ε1 − ε2 )ε3 f − ) = (b(2) − b(1))ν0 (f − (R1,3 − R2,3 )) (1.253)
 

− − − − − −
+ (b(1) − b(0)) ν0 (f (R1, − R2, )) − nν0 (f (R1,n+1 − R2,n+1 )) .
4≤≤n

Moreover, when f does not depend on the third replica we have

ν0 ((ε1 − ε2 )ε3 f − )
− −
= (b(2) − 4b(1) + 3b(0))ν0 (f − (R1,3 − R2,3 ))

− − − − −
+ (b(1) − b(0)) ν0 (f (R1, − R2, − R1,n+1 + R2,n+1 ))
4≤≤n
− −
= β (1 − 4q + 3
2
q )ν0 (f − (R1,3 − R2,3 ))

− − − −
+ β 2 (q − q) ν0 (f − (R1, − R2, − R1,n+1 + R2,n+1 )) . (1.254)
4≤≤n
90 1. The Sherrington-Kirkpatrick Model

Proof. We use again (1.230). For  <  we compute

c(,  ) := b(,  ; 1, 3) − b(,  ; 2, 3) .

This is 0 if  ≥ 3 or  = 1,  = 2. Moreover

c(1, 3) = −c(2, 3) = b(2) − b(1)


c(1,  ) = −c(2,  ) = b(1) − b(0) if  ≥ 4.

This proves (1.253). To prove (1.254) when f does not depend on the third
replica we simply notice that then
− − − −
ν0 (f − (R1,3 − R2,3 )) = ν0 (f − (R1,n+1 − R2,n+1 )) ,

and we move the corresponding terms from (1.254) around. 




Proposition 1.8.10. If β < 1/2 we have

ν(T12 ) = B 2 + O(3) (1.255)

where
1 q − q
B2 = . (1.256)
N (1 − β 2 (1 − 2q + q))(1 − β 2 (1 − 4q + 3
q ))

Moreover,
ν(T1 T2 ) = ν(T1 T ) = 0 . (1.257)

Proof. We start with the observation that


(σ 1 − b) · b (σ 1 − b) · b
T12 =
N N
(σ − σ ) · σ (σ 1 − σ 4 ) · σ 5
1 2 3
=
N N
= (R1,3 − R2,3 )(R1,5 − R4,5 ) . (1.258)

We then write as usual

ν(T12 ) = ν((ε1 − ε2 )ε3 (R1,5 − R4,5 ))


1 − −
= ν((ε1 − ε2 )ε3 (ε1 − ε4 )ε5 ) + ν((ε1 − ε2 )ε3 (R1,5 − R4,5 )) .
N
We have

ν0 ((ε1 − ε2 )ε3 (ε1 − ε4 )ε5 ) = ν0 ((1 − ε1 ε4 − ε2 ε1 + ε2 ε4 )ε3 ε5 ) = q − q .


− −
Using (1.254) for n = 5, f − = R1,5 − R4,5 , our usual scheme of proof yields
1.8 Second Moment Computations and the Almeida-Thouless Line 91

1
ν(T12 ) = (q − q) + β 2 (1 − 4q + 3
q )ν((R1,3 − R2,3 )(R1,5 − R4,5 )) (1.259)
N
  
+ β 2 (q − q) ν (R1,5 − R4,5 )(R1, − R2, − R1,6 + R2,6 ) + O(3) .
=4,5

From (1.258) we deduce that

ν((R1,3 − R2,3 )(R1,5 − R4,5 )) = ν(T12 ) .

To evaluate the last term in (1.259) we write, using (1.246)

R1,5 − R4,5 = T1,5 − T4,5 + T1 − T4


R1, − R2, − R1,6 + R2,6 = T1, − T2, − T1,6 + T2,6 .

Proposition 1.8.8 and (1.247) imply


  
ν (R1,5 − R4,5 )(R1, − R2, − R1,6 + R2,6 ) = ν(T1,5
2
) = A2 + O(3) ,
=4,5

so that (1.259) means that


1
(1 − β 2 (1 − 4q + 3
q ))ν(T12 ) = (q − q) + β 2 (q − q)A2 + O(3) . (1.260)
N
Since q ≤ q as is obvious from the definition of q, we have 1 − 4q + 3 q ≤ 1,
so that β 2 (1 − 4q + 3
q ) < 1 and (1.260) implies (1.256). The rest is obvious
as in Proposition 1.8.8. 

We can then take the last step.

Proposition 1.8.11. If β < 1/2 we have

ν(T 2 ) = C 2 + O(3) (1.261)

where
q − q 2
(1−β 2 (1−4q+3
q ))C 2 = q −q 2 )A2 +2β 2 (2q+q 2 −3
+β 2 ( q )B 2 . (1.262)
N
Proof. We know exactly how to proceed. We write

ν(T 2 ) = ν((R1,2 − q)(R3,4 − q))


= ν((ε1 ε2 − q)(R3,4 − q))
1 −
= ν((ε1 ε2 − q)ε3 ε4 ) + ν((ε1 ε2 − q)(R3,4 − q)) .
N
We observe that
ν0 ((ε1 ε2 − q)ε3 ε4 ) = q − q 2 ,
92 1. The Sherrington-Kirkpatrick Model

and we use (1.230) with n = 4 and f − = R3,4 − q to get, writing b(,  ) =
b(,  ; 1, 2), that
1 
ν(T 2 ) = ( q − q2 ) + b(,  )ν((R, − q)(R3,4 − q))
N
1≤< ≤4

−4 b(, 5)ν((R,5 − q)(R3,4 − q))
≤4
+ 10b(5, 6)ν((R5,6 − q)(R3,4 − q)) + O(3) . (1.263)
Using (1.246) and Propositions 1.8.7, 1.8.8 and 1.8.10 we know that
 
ν (R, − q)2 = A2 + 2 B 2 + ν(T 2 ) + O(3)

ν((R, − q)(R1 ,2 − q)) = B 2 + ν(T 2 ) + O(3)


if card({,  } ∩ {1 , 2 }) = 1, while if {,  } ∩ {1 , 2 } = ∅, then
ν((R, − q)(R1 ,2 − q)) = ν(T 2 ) .
We substitute these expressions in the right-hand side of (1.263) and we
collect the terms. The coefficient of ν(T 2 ) is
  
b(,  ) − 4 b(, 5) + 10 b(5, 6) = β 2 1 − q 2 + 4 q(1 − q) + (
q − q2 )
1≤< ≤4 ≤4

− 4(2 q(1 − q) + 2( q − q2 )
q − q 2 )) + 10(
= β 2 (1 − 4 q + 3 q) .
q − q 2 ), and the coefficient of B 2 is
The coefficient of A2 is β 2 (

b(,  ) + 2b(3, 4) − 4(b(3, 5) + b(4, 5))
=1,2, =3,4
 
= β 2 4(q − q 2 ) + 2(
q − q 2 ) − 8(
q − q 2 ) = 2 β 2 (2 q + q 2 − 3 q) .
We then get
1
ν(T 2 ) = q − q 2 ) + β 2 (1 − 4 q + 3 q) ν(T 2 ) + β 2 (
( q − q 2 ) A2
N
+ 2 β 2 (2 q + q 2 − 3 q) B 2 + O(3) (1.264)
and this implies the result. 

Using (1.246) again, we have proved the following, where A, B, C are given
respectively by (1.248), (1.256) and (1.262).
Theorem 1.8.12. For β < 1/2 we have
 
ν (R1,2 − q)2 = A2 + 2B 2 + C 2 + O(3) (1.265)
ν((R1,2 − q)(R1,3 − q)) = B + C + O(3)
2 2

ν((R1,2 − q)(R3,4 − q)) = C 2 + O(3) .


1.9 Beyond the AT Line 93

1.9 Beyond the AT Line



We recall that q is the solution of (1.74) and that q = Eth4 Y = Eth4 (βz q +
h). We should mention for the specialist that we will (much) later prove that
“beyond the AT line”, that is, when
1
β 2 (1 − 2q + q) = β 2 E >1 (1.266)
ch4 Y
the left-hand side of (1.171) is bounded below independently of N , and that,
consequently, for some number δ that does not depend on N , we have
1 
νM (|R1,2 − q|) > δ > 0 , (1.267)
N
M ≤N

where the index M refers to an M -spin system. This fact, however, relies
on an extension of Theorem 1.3.7, and, like this theorem, uses very special
properties of the SK model.
Research Problem 1.9.1. (Level 2) Prove that beyond the AT line we have
in fact for each N that
ν(|R1,2 − q|) > δ . (1.268)
As we will explain later, we know with considerable work how  to deduce
(1.268) from (1.267) in many cases, for example when in the term i≤N hi σi
of the Hamiltonian the r.v.s hi are i.i.d. Gaussian with a non-zero variance;
but we do not know how to prove (1.268) when hi = h = 0.
In contrast with the previous arguments, the results of the present section
rely on a very general method, which has the potential to be used for a great
many models, and that provides results for every N . This method simply
analyzes what goes wrong in the proof of (1.238) when (1.266) occurs. The
main result is as follows.
Proposition 1.9.2. Under (1.266), there exists a number δ > 0, that does
not depend on N , such that for N large enough, we have
    δ2
ν |R1,2 − q|3 ≥ δν (R1,2 − q)2 ≥ . (1.269)
N
This is not as nice as (1.268), but this shows something remarkable: the
set where |R1,2 − q| ≥ δ/2 is not exponentially small (in contrast with what
happens in (1.87)). To see this we write, since |R1,2 − q| ≤ 2,
δ
|R1,2 − q|3 ≤ (R1,2 − q)2 + 8 · 1{|R1,2 −q|≥δ/2} , (1.270)
2
where 1{|R1,2 −q|≥δ/2} is the function of σ 1 and σ 2 that is 1 when |R1,2 − q| ≥
δ/2 and is 0 otherwise. Using the first part of (1.269) in the first inequality
and (1.270) in the second one, we obtain
94 1. The Sherrington-Kirkpatrick Model
    δ    
δν (R1,2 − q)2 ≤ ν |R1,2 − q|3 ≤ ν (R1,2 − q)2 + 8ν 1{|R1,2 −q|≥δ/2} .
2
Hence, using the second part of (1.269) in the second inequality,

  δ   δ2
ν 1{|R1,2 −q|≥δ/2} ≥ ν (R1,2 − q)2 ≥ . (1.271)
16 16N
Lemma 1.9.3. For each values of β and h, we have

β 2 (1 − 4q + 3
q) < 1 . (1.272)

Proof. Consider the function



Φ(x) = E th2 (βz x + h) . (1.273)

Then Proposition A.14.1 shows that Φ(x)/x is decreasing, so that

x Φ (x) − Φ(x) < 0 ,

and since q = Φ(q), we have Φ (q) < 1. Now, using Gaussian integration by

parts, and writing as usual Y = βz q + h,
   
z th Y 1 sh2 Y
Φ (q) = β E √ 2
=β E −2 4
q ch2 Y ch4 Y ch Y
 
3 2
= β2 E − 2 = β 2 (3(1 − 2 q + q) − 2(1 − q))
ch4 Y ch Y
= β 2 (1 − 4 q + 3 q) , (1.274)

and this finishes the proof. 




Lemma 1.9.4. We recall the quantities A2 , B 2 , C 2 of Section 1.8. Then,


under (1.266) we have
 
(σ 1 − σ 2 )(σ 3 − σ 4 ) 2 4 (1 − 2q + q)
ν = + R (1.275)
N N (1 − β 2 (1 − 2q + q))
ν((R1,2 − q)2 ) = A2 + 2B 2 + C 2 + R (1.276)

where  
1
|R| ≤ K(β, h) + ν(|R1,2 − q| ) .
3
(1.277)
N 3/2
Proof. As explained at the beginning of Section 1.8 all the computations
there are done modulo an error term as in (1.277); and (1.266) and (1.272)
show that we are permitted to divide by (1 − β 2 (1 − 2q + q)) and (1 − β 2 (1 −
4q + q)), so that (1.275) and (1.276) are what we actually proved in (1.238)
and (1.265) respectively. 

1.9 Beyond the AT Line 95

Proof of Proposition 1.9.2. We deduce from (1.275) that

4 (1 − 2q + q) 1
R≥− ≥
N (1 − β 2 (1 − 2q + q)) KN

because 1 − 2q + q = Ech−4 Y > 0 and the denominator is < 0 by (1.266). It


follows from (1.277) that for N large enough
  1
ν |R1,2 − q|3 ≥ . (1.278)
KN
Now since A2 , B 2 , C 2 ≤ K/N , (1.276) shows that, using (1.278) in the third
following inequality,
 
  K 1    
ν (R1,2 − q) ≤
2
+R≤K + ν |R1,2 − q| 3
≤ Kν |R1,2 − q|3 ,
N N
and this proves that there exists δ, that does not depend on N , such that
ν(|R1,2 −q|3 ) ≥ δν((R1,2 −q)2 ). Moreover, using (1.278), and since |R1,2 −q| ≤
2 we have ν((R1,2 − q)2 ) ≥ ν(|R1,2 − q|3 )/2 ≥ 1/(KN ). 

We might think that the unpleasant behavior (1.271) arises from the fact
that ν(|R1,2 − x|)  0 for some x = q. This is not the case.

Proposition 1.9.5. a) There exists K depending on β and h only such that


for all x ≥ 0 we have
 
1
|x − q| ≤ K ν(|R1,2 − x|) + . (1.279)
N

b) Under (1.266) there exists a number δ such that for N large enough
  δ
∀x ≥ 0 , ν 1{|R1,2 −x|≥δ } ≥ . (1.280)
N
Proof. We use (1.215), but where νt is defined using x rather than q, to get

|ν(ε1 ε2 ) − ν0 (ε1 ε2 )| ≤ Kν(|R1,2 − x|) . (1.281)

We have ν0 (ε1 ε2 ) = Φ(x), where Φ is given by (1.273), and ν(ε1 ε2 ) = ν(R1,2 )


by symmetry among sites, and therefore (1.281) implies
 
1
|Φ(x) − ν(R1,2 )| ≤ K ν(|R1,2 − x|) + . (1.282)
N
Jensen’s inequality entails

|ν(R1,2 ) − x| ≤ ν(|R1,2 − x|) ,

so (1.282) yields
96 1. The Sherrington-Kirkpatrick Model
 
1
|Φ(x) − x| ≤ |Φ(x) − ν(R1,2 )| + |ν(R1,2 ) − x| ≤ K ν(|R1,2 − x|) + .
N

Now, the function Φ(x) satisfies Φ(q) = q and, as seen in the proof of Lemma
1.9.3 we have Φ (q) < 1, so that |x − Φ(x)| ≥ K −1 |x − q| when |x − q| is
small. Since Proposition A.14.1 shows that Φ(x)/x is decreasing it follows
that |x − Φ(x)| = 0 for x = q, and the previous inequality holds for all x ≥ 0
and this proves (1.279).
To prove (1.280), we observe that if |x − q| ≤ δ/4 then by (1.271) we have
    δ2
ν 1{|R1,2 −x|≥δ/4} ≥ ν 1{|R1,2 −q|≥δ/2} ≥ ,
16N
so it is enough to consider the case |x − q| ≥ δ/4. But then by (1.279) it holds
 
δ 1
≤ K ν(|R1,2 − x|) + ,
4 N

so that for N large enough we get ν(|R1,2 − x|) ≥ δ/(8K) := 1/K0 and thus
since |R1,2 − x| ≤ 2 we obtain

1   1
≤ ν(|R1,2 − x|) ≤ 2ν 1{|R1,2 −x|≥1/(2K0 )} + .
K0 2K0
Consequently,
  1
ν 1{|R1,2 −x|≥1/2K0 } ≥ . 

4K0

In the rest of this section we show that (1.280) has consequences with a
nice physical interpretation (although the underlying mathematics is elemen-
tary large deviation theory).
For this we consider the Hamiltonian

− HN,λ (σ 1 , σ 2 ) = −HN (σ 1 ) − HN (σ 2 ) + λN R1,2 . (1.283)

This is the Hamiltonian of a system made from two copies of (ΣN , GN ) that
interact through the term λN R1,2 . We define

ZN,λ = exp(−HN,λ (σ 1 , σ 2 )) (1.284)
σ 1 ,σ 2
1 1
ψN (λ) = E log ZN,λ − E log ZN,0 (1.285)
N N
so that the identity
1
ψN (λ) = E log exp λN R1,2 (1.286)
N
1.9 Beyond the AT Line 97

holds, where · denotes an average for the Gibbs measure with Hamiltonian
HN . This quantity is natural to consider to study the fluctuations of R1,2 . We
denote by · λ an average for the Gibbs measure with Hamiltonian (1.283);
thus
R1,2 exp λN R1,2
ψN (λ) = E = E R1,2 λ .
exp λN R1,2
We also observe that ψN is a convex function of λ, as is obvious from (1.286)
λ −
2
and Hölder’s inequality. (One can also compute ψN (λ) = N (E R1,2
E R1,2 λ ) ≥ 0.)
2

Theorem 1.9.6. ψ(λ) = limN →∞ ψN (λ) exists for all β, h and (under
(1.266)) is not differentiable at λ = 0.
The important part of Theorem 1.9.6 is the non-differentiability of the
function ψ. We shall prove the following, of which Theorem 1.9.6 is an imme-
diate consequence once we know the existence of the limit limN →∞ ψN (λ).
The existence of this limit is only a side story in Theorem 1.9.6. It requires
significant work, so we refer the reader to [76] for a proof.
Proposition 1.9.7. Assume (1.266), and consider δ as in (1.280). Then
for any λ > 0 we have ψN (λ) − ψN (−λ) > δ /2 provided N is large enough.
To deduce Theorem 1.9.6, consider the subset U of R such that ψ (±λ)
exists for λ ∈ U . Since ψ is convex, the complement of U is at most
countable. Griffiths’ lemma (see page 25) asserts that limN →∞ ψN (±λ) =
ψ (±λ) for λ in U . By Proposition 1.9.7 for any λ ∈ U , λ > 0, we have
ψ (λ) − ψ (−λ) ≥ δ /2. Now, since ψ is convex, the limit limλ→0+ ,λ∈U ψ (λ)
is the right-derivative ψ+ (0) and similarly, while the limit λ → 0− is the left
derivative ψ− (0). Therefore ψ+ (0) − ψ− (0) > δ /2 and ψ is not differentiable
at 0.
In words, an arbitrarily small change of λ around 0 produces a change in
E R1,2 λ of at least δ /2, a striking instability.
Proof of Proposition 1.9.7. Let xN = E R1,2 = ψN (0). Using (1.280) we
see that at least one of the following occurs
  δ
ν 1{R1,2 ≥xN +δ } ≥ (1.287)
2N
  δ
ν 1{R1,2 ≤xN −δ } ≥ . (1.288)
2N
We assume (1.287); the proof in the case (1.288) is similar. We have
 
exp λN R1,2 ≥ exp λN (xN + δ ) 1{R1,2 ≥xN +δ }

so that
1 1  
log exp λN R1,2 ≥ λ(xN + δ ) + log 1{R1,2 ≥xN +δ } .
N N
98 1. The Sherrington-Kirkpatrick Model

The r.v. X = 1{R1,2 ≥xN +δ } satisfies EX ≥ δ /2N by (1.287), so that, since
X ≤ 1, we have  
δ δ
P X≥ ≥ .
4N 4N
(Note that EX ≤ ε + P(X ≥ ε) for each ε and take ε = δ /(4N ).) Thus
  
1 1 δ δ
P log exp λN R1,2 ≥ λ(xN + δ ) + log ≥ . (1.289)
N N 4N 4N

On the other hand the r.v.


1 1 1
F = log exp λN R1,2 = log ZN,λ − ZN,0
N N N
satisfies
 
N x2
∀x > 0 , P(|F − EF | ≥ x) ≤ 2 exp −
K

by Proposition 1.3.5 (as used in the proof of (1.54)). Taking x = λδ /4 we


get     
1  λδ N

P  log exp λN R1,2 − ψN (λ) ≥  ≤ 2 exp − ,
N 4 K
and in particular
   
1 λδ N
P log exp λN R1,2 ≥ ψN (λ) + ≤ 2 exp − .
N 4 K

Comparing with (1.289) implies that for N large enough we have


 
λδ 1 δ
ψN (λ) + ≥ λ(xN + δ ) + log ,
4 N 4N

and in particular  
δ
ψN (λ) ≥ λ xN +
2
and therefore, since ψN (0) = 0 and ψN is convex,

ψN (λ) − ψN (0) δ δ δ
ψN (λ) ≥ ≥ xN + = ψN (0) + ≥ ψN (−λ) + . 

λ 2 2 2

1.10 Central Limit Theorem for the Overlaps

In this section we continue the work√of Section


√ 1.8 but
√ now for higher mo-
ments. We show that the quantities N T, , N T , N T of (1.245) behave
asymptotically as N → ∞ like an independent family of Gaussian r.v.s. As
1.10 Central Limit Theorem for the Overlaps 99

a consequence, we show that as N → ∞, for the typical disorder, a given


family of variables

(N 1/2 (R, − R1,2 ))1≤< ≤n


n
becomes nearly Gaussian on the probability space (ΣN , GnN ) with an explicit
covariance structure that is independent of the disorder. Moreover, the r.v.s
N 1/2 ( R1,2 − q) are asymptotically Gaussian.
Since all the important ideas were laid down in Section 1.8, the point
is really to write things down properly. This requires significant work. This
work is not related to any further material in this volume, so the reader who
may not enjoy these calculations should simply skip the rest of this section.
We recall that the quantities A, B, C have been defined in (1.248), (1.256),
(1.262), and we use the notation a(k) = Eg k where g is a standard Gaussian
r.v.

Theorem 1.10.1. Assume that β < 1/2. Fix an integer n. For 1 ≤  <  ≤
n consider integers k(,  ), and for 1 ≤  ≤ n consider integers k(). Set
 
k1 = k(,  ) ; k2 = k() ,
1≤< ≤n 1≤≤n

consider a further integer k3 and finally set k = k1 + k2 + k3 . Then


   k() 
k(, )
ν T, T T k3 (1.290)
 1≤<
≤n 1≤≤n
= a(k(,  )) a(k()) a(k3 )Ak1 B k2 C k3 + O(k + 1) .
1≤< ≤n 1≤≤n

Here, as usual, O(k+1) denotes a quantity W with |W | ≤ KN −(k+1)/2 , where


K does not depend on N (but will depend on the integers k(,  ), k(), k3 ).
The left-hand side and the first term in the right-hand side of (1.290) are
both of order N −k/2 . The product
 k(, )
 k()
T, T T k3
1≤< ≤n 1≤≤n

is simply any (finite) product of quantities of the type T, , T , T , and the
rôle of the integer n is simply to record “on how many replicas this product
depends”, which is needed to apply the cavity method.
One can reformulate (1.290) as follows. Consider independent Gaussian
r.v.s U, , U , U and assume
2 2
EU,  = NA ; EU2 = N B 2 ; EU 2 = N C 2 .

The point of this definition is that the quantities N A2 , N B 2 , and N C 2 do


not depend on N . Then (1.290) means
100 1. The Sherrington-Kirkpatrick Model
 
k/2
 k(, )  k() k3
N ν 1≤< ≤n T, T 1≤≤n T
 
 k(, )  k() k3
=E 1≤< ≤n U, 1≤≤n U U + O(1) , (1.291)

where, in agreement with the notation


√ for O(k) (see above) O(1) denotes a
quantity W such that |W | ≤ K/ N .
We now explain why this statement contains the fact that the r.v.s
(N 1/2 (R, − R1,2 ))1≤< ≤n are asymptotically Gaussian under Gibbs’
measure. We still consider numbers  k(,  ) for 1 ≤  <
 ≤ n and num-
bers k() for 1 ≤  ≤ n, and let k = 1≤< ≤n k(,  ) + 1≤≤n k(). First
we show that the quantity
 k(, )
 k()
V = N k/2 T, T (1.292)
1≤< ≤n 1≤≤n

is essentially non-random. Indeed, we use replicas to express V 2 as


 k(, )
 k()
 k(, )
 k()
V 2 = Nk T, T T+n, +n T+n ,
1≤< ≤n 1≤≤n 1≤< ≤n 1≤≤n

and we apply (1.291) to compute EV 2 and EV ; we thus obtain:

EV 2 = (EV )2 + O(1) , (1.293)

because the r.v.s U, , U for ,  ≤ n are independent from the r.v.s U, , U
for n + 1 ≤ ,  . Consequently,

E(V − EV )2 = O(1) . (1.294)

Let q = R1,2 , and observe that T = q − q, so that by (1.246) we obtain

R, − q = T, + T + T . (1.295)


 
When a product , (R,
 − q)k(, ) contains k factors, the quantity
 
W = N k/2 (R, − q)k(, )
,

satisfies
E(W − EW )2 = O(1) ,
because we may use (1.295) and expand this quantity as a sum of terms of
the type (1.292). Consider the r.v.s g, = U, + U + U . Expanding and
using (1.291) we see that
 k(, )
EW = E g, + O(1) .
,
1.10 Central Limit Theorem for the Overlaps 101

The above facts


√ make precise the statement that for the typical disorder, the
quantities ( N (R, − q))1≤< ≤n are asymptotically Gaussian on the space
n
(ΣN , GnN ). Indeed, for the typical disorder,
   k(, )
N k/2 (R, − q)k(, ) E g, .
, ,

The Gaussian family (g, ) may also be described by the following properties:
2
Eg, 2 2
 = N (A + 2B ), Eg, g1 ,2 = N B
2
if card({,  } ∩ {1 , 2 }) = 1 and
Eg, g1 ,2 = 0 if {,  } and {1 , 2 } are disjoint.
We now prepare for the proof of Theorem 1.10.1. In the next two pages,
until we start the proof itself, the letter k denotes any integer. We start the
work with the easy part,
 which is the control of the error terms. By (1.88),
for each k we have ν (R1,2 − q)2k ≤ K/N k , and thus
  K
ν |R1,2 − q|k ≤ .
N k/2
Consequently,
 −  K
ν |R1,2 − q|k ≤ , (1.296)
N k/2
and this entails a similar bound for any quantity W that is a linear combi-
− − −
nation of the quantities R,  − q, e.g. W = R1,3 − R2,3 .

Let us say that a function f is of order k if it is the product of k such


quantities W1 , . . . , Wk and of a function W0 with |W0 | ≤ 4. The reason for
the condition |W0 | ≤ 4 is simply that typical choices for W0 will be W0 = 1
and W0 = (ε1 − ε2 )(ε3 − ε4 ) (and this latter choice satisfies |W0 | ≤ 4). Thus

a typical example of a function of order 1 is (R,  − q)(ε1 − ε2 )(ε3 − ε4 ).

As a consequence of (1.296) and Hölder’s inequality, if f is a function of


order k then we have
ν(f 2 )1/2 = O(k) . (1.297)
In words, each factor W for  = 1, . . . , k contributes as N −1/2 while the
factor W0 contributes as a constant. Consequently, (1.215) and (1.216) used
for τ1 = τ2 = 2 imply that a function f of order k satisfies

ν(f ) = ν0 (f ) + O(k + 1) (1.298)


ν(f ) = ν0 (f ) + ν0 (f ) + O(k + 2) . (1.299)

In order to avoid repetition, we will spell out the exact property we will use
in the proof of Theorem 1.10.1. The notation is as in Lemma 1.8.2.
Lemma 1.10.2. Consider integers x, y ≤ n as well as a function f − on ΣN n
,

which is the product of k terms of the type R,  − q. Then the identity
102 1. The Sherrington-Kirkpatrick Model


ν((εx εy − q)f − ) = b(,  ; x, y)ν(f − (R,  − q))

1≤< ≤n


−n b(, n + 1; x, y)ν(f − (R,n+1 − q))
≤n
n(n + 1) −
+ b(0)ν(f − (Rn+1,n+2 − q)) + O(k + 2) (1.300)
2
holds.
Proof. We use (1.299) for f = f − (εx εy − q), so that ν0 (f ) = 0 and we use
(1.230) to compute ν0 (f ). We then use (1.298) with k + 1 instead of k to see
− −
that ν0 (f − (R,  − q)) = ν(f

(R,  − q)) + O(k + 2). 

Of course Lemma 1.10.2 remains true when f is a product of k terms

which are linear combinations of terms of the type R,  − q.

Corollary 1.10.3. Consider a function f − on ΣN


n
, that is the product of k

terms of the type R, − q. Then we have

ν((ε1 − ε2 )(ε3 − ε4 )f − )
− − − −
= (b(2) − 2b(1) + b(0))ν((R1,3 − R1,4 − R2,3 + R2,4 )f − )
+ O(k + 2) . (1.301)

Moreover, whenever f − does not depend on the third replica σ 3 we also have
− −
ν((ε1 − ε2 )ε3 f − ) = (b(2) − 4b(1) + 3b(0))ν((R1,3 − R2,3 )f − )

− − − −
+ (b(1) − b(0)) ν((R1, − R2, − R1,n+1 + R2,n+1 )f − )
4≤≤n
+ O(k + 2) . (1.302)

Proof. It suffices to reproduce the calculations of Lemmas 1.8.4 and 1.8.9,


using (1.300) in place of (1.230). 

A fundamental idea in the proof of Theorem 1.10.1 is that we should
attack first a term T, (if there is any), using symmetry among sites to
write the left-hand side of (1.290) as ν((ε1 − ε2 )(ε3 − ε4 )f ) for a suitable f .
The goal is then to use (1.301); for this, one has to understand the influence
of the dependence of f on the last coordinate. This requires the knowledge of
(1.290), but where k1 has been decreased, opening the door to induction. If
no term T, is present, one instead attacks a term T (if there is any). We will
then have to use the more complicated formula (1.302) rather than (1.301),
but this is compensated by the fact that, by the previous step, we already
know (1.290) when k1 > 0, so that the values of many of the terms resulting
of the use of (1.302) are already known. Finally, if there is no term T, or T
in the left-hand side of (1.290), we are forced to use the formidable formula
(1.300) itself, but we will be saved by the fact that most of the resulting
1.10 Central Limit Theorem for the Overlaps 103

terms have already been computed in the previous steps. If one thinks about
it, this is exactly the way we have proceeded in Section 1.8.
We now start the proof of Theorem 1.10.1, the notation of which we use;
in particular:
k = k1 + k2 + k3 .
Proposition 1.10.4. We have
   k() 
k(, ) k3
ν T, T T (1.303)
1≤< ≤n 1≤≤n
   
k()
= a(k(,  ))Ak1 ν T T k3 + O(k + 1) .
1≤< ≤n 1≤≤n

Proof. This is true for k1 = 0. The proof goes by induction on k1 . We


assume k1 > 0, and, without loss of generality, we assume k(1, 2) ≥ 1. Before
starting any computation, we must address the fact that T, and T depend
on the disorder, and we must express the right-hand side of (1.303) as the
n
average of a non-random function on ΣN for a certain n . It eases notation to
label properly the terms with which we are working, and to enumerate them
as a sequence. For v ≤ k1 we consider two integers (v) and  (v) in such a
way that each pair (,  ) for 1 ≤  <  ≤ n is equal to the pair ((v),  (v))
for exactly k(,  ) values of v ≤ k1 , and that

((v),  (v)) = (1, 2) ⇔ v ≤ k(1, 2) .

It then holds that  k(, )



T, = T(v), (v) .
, v≤k1

For k1 < v ≤ k1 + k2 we consider an integer (v) such that for each  ≤ n,


we have  = (v) for exactly k() values of v. It then holds that
 k() 
T = T(v) . (1.304)
 k1 <v≤k1 +k2

Now we shall use on a massive scale the technique described on page 88 of


“replacing each copy of b by σ  for a new value of ”. For this purpose for
1 ≤ v ≤ k we consider for two integers j(v), j (v) > n, such that these are all
distinct as v varies. For v ≤ k1 we set
 
(σ (v) − σ j(v) ) · (σ  (v) − σ j (v) )
U (v) =
N
= R(v), (v) − R(v),j  (v) − Rj(v), (v) + Rj(v),j  (v) ;

for k1 < v ≤ k1 + k2 we set



(σ (v) − σ j(v) ) · σ j (v)
U (v) = = R(v),j  (v) − Rj(v),j  (v) ;
N
104 1. The Sherrington-Kirkpatrick Model

and for k1 + k2 < v ≤ k = k1 + k2 + k3 we set



σ j(v) · σ j (v)
U (v) = − q = Rj(v),j  (v) − q .
N
As we just assumed, the integers j(v), j (v) are all distinct for v ≤ k. Aver-
aging first in all the σ  ’s for  any one of these integers we see that
 k(, )
 k()

T, T T k3 = U (v) (1.305)
1≤< ≤n ≤n v≤k

and v≤k U (v) is now independent of the disorder. This quantity can be
n
considered as a function on ΣN , where n is any integer larger than all the
numbers j(v) and j (v).
Let us define

ε(v) = (ε(v) − εj(v) )(ε (v) − εj  (v) ) (1.306)

for v ≤ k1 ;
ε(v) = (ε(v) − εj(v) )εj  (v) (1.307)
for k1 < v ≤ k1 + k2 ; and finally, for k1 + k2 < v ≤ k + k1 + k2 + k3 , let

ε(v) = εj(v) εj  (v) . (1.308)

Using (1.305) and symmetry between sites in the second line,


   k()   
k(, )
V := ν T, T T k3 = ν U (v)
1≤< ≤n ≤n v≤k
  
= ν ε(1) U (v) . (1.309)
2≤v≤k

Now we want to bring out the dependence of 2≤v≤k U (v) on the last spin.
We define U − (v) = U (v) − ε(v)/N , and we expand the product
   ε(v) 
U (v) = + U − (v)
N
2≤v≤k 2≤v≤k
  ε(u) 
= U − (v) + U − (v) + S
N
2≤v≤k 2≤u≤k v =u

where the notation v=u U − (v) means that the product is over 2 ≤ v ≤ k,
v = u, and where S is the sum of all the other terms. Each term W of S is
the product of k − 1 factors, each of which being either ε(v)/N or U − (v),
and at least 2 of these factors are of the type ε(v)/N . √
If we do think of each factor U − (u) as “contributing like 1/ N ” by (1.296)
and each factor ε(u)/N as “contributing like 1/N ”(an argument which is
1.10 Central Limit Theorem for the Overlaps 105

made rigorous by using Hölder’s inequality as in the proof of (1.297)) we see


that ν(|W |) = O(k + 1). Therefore we have
     
ν ε(1) U (v) = ν ε(1) U − (v) + I + O(k + 1) (1.310)
2≤v≤k 2≤v≤k

where  
1  
I= ν ε(1)ε(u) U − (v) . (1.311)
N
2≤u≤k v =u

We first study the term I. Since the product v=u U − (v) contains k − 2

factors, the function ε(1)ε(u) v=u U − (v) is a function of order k − 2 as
defined four lines above (1.297); and then (1.298) entails
     
1 1
ν ε(1)ε(u) U − (v) = ν0 ε(1)ε(u) U − (v) + O(k + 1) .
N N
v =u v =u

Now Lemma 1.6.2 implies


    
− −
ν0 ε(1)ε(u) U (v) = ν0 (ε(1)ε(u))ν0 U (v) .
v =u v =u

For u ≤ k(1, 2) we have

ν0 (ε(1)ε(u)) = ν0 ((ε1 − εj(1) )(ε2 − εj  (1) )(ε1 − εj(u) )(ε2 − εj  (u) ))


= 1 − 2q + q ,

as is seen by computation: we expand the product and compute each of the


16 different terms. One of the terms is 1. Each of the other terms is either
of the type ±ν0 (εj εj  ) for j = j , and hence equal to ±q, or of the type
±ν0 (εj1 εj2 εj3 εj4 ) where the indexes j1 , j2 , j3 , j4 are all different and hence
the term is ± q.
For u > k(1, 2) we now show that ν0 (ε(1)ε(u)) = 0. First one checks on
the various cases (1.306) to (1.308) that ε(u) does not depend on both ε1 and
ε2 . If, say, it does not depend on ε1 , then since ε(u) does not depend on εj(1)
(because the integers j(v), j(v ) are all distinct and > n) the factor ε1 − εj(1)
in ε(1) ensures that ε(1)ε(u) = 0. This proves that, recalling the notation
I of (1.311),
  
1 −
I = (1 − 2q + q) ν0 U (v) + O(k + 1)
N
2≤u≤k(1,2) v =u
  
1 −
= (1 − 2q + q) ν U (v) + O(k + 1) ,
N
2≤u≤k(1,2) v =u

using again (1.298). Moreover, when u ≤ k(1, 2), by symmetry we have


106 1. The Sherrington-Kirkpatrick Model
    
− −
ν U (v) = ν U (v) .
v =u 3≤v≤k

Thus we obtain
  
k(1, 2) − 1
I= (1 − 2q + q )ν U − (v) + O(k + 1) . (1.312)
N
3≤v≤k

Next, we use (1.301) (with indices 1, j(1), 2, j (1) rather than 1, 2, 3, 4, and
with k − 1 rather than k) to see that
  
ν ε(1) U − (v)
2≤v≤k
  
− − − − −
= β (1 − 2q + q )ν
2
(R1,2 − R1,j  (1) − Rj(1),2 + Rj(1),j  (1) ) U (v)
2≤v≤k
+ O(k + 1)
  
= β 2 (1 − 2q + q )ν U − (v) + O(k + 1) .
1≤v≤k

Combining with (1.310) and (1.309) we reach the equality


  
ν ε(1) U (v)
2≤v≤k
  

= β (1 − 2q + q )ν
2
U (v)
1≤v≤k
  
k(1, 2) − 1 −
+ (1 − 2q + q )ν U (v) + O(k + 1) .
N
3≤v≤k

We claim that on the right-hand side we may replace each term U − (v) by
U (v) up to an error of O(k + 1). To see this we simply use the relation
U − (v) = U (v) − ε(v)/N and we expand the products. All the terms except
the one where all factors are U (v) are O(k + 1), as follows from (1.297).
Recalling (1.309) we have proved that
  
k(1, 2) − 1
(1 − β (1 − 2q + q))V =
2
(1 − 2q + q)ν U (v) + O(k + 1) .
N
3≤v≤k
(1.313)
The proof is finished if k(1, 2) = 1, since a(1) = 0. If k(1, 2) ≥ 2, we have
      k() 
k (, ) k3
ν U (v) = ν T, T T ,
3≤v≤k 1≤< ≤n 1≤≤n
1.10 Central Limit Theorem for the Overlaps 107

where k (,  ) = k(,  ) unless  = 1,  = 2, in which case k (1, 2) = k(1, 2) −


2. This term is of the same type as the left-hand side of (1.303), but with
k1 − 2 instead of k1 . We can therefore apply the induction hypothesis to get
 
 k (, )  k() k3
ν  T
1≤< ≤n ,  T
1≤≤n  T
 
 
= 1≤< ≤n a(k (,  ))Ak1 −2 ν
k() k3
T
1≤≤n  T + O(k + 1) .

Combining with (1.313) and using the value of A we get


   
k()
V = (k(1, 2) − 1) a(k (,  ))Ak1 ν T T k3 + O(k + 1) .
1≤< ≤n 1≤≤n

Using that a(k(1, 2)) = (k(1, 2) − 1)a(k (1, 2)), this completes the induction
and the proof of Proposition 1.10.4. 

Proposition 1.10.5. With the notation of Theorem 1.10.1 we have
   k() 
k(, ) k3
ν T, T T (1.314)
1≤< ≤n 1≤≤n
   
= a(k(,  )) a(k())Ak1 B k2 ν T k3 + O(k + 1) .
1≤< ≤n 1≤≤n

Proof. We already know from Proposition 1.10.4 that we can assume that
k1 = 0. So we fix k1 = 0 and we prove Proposition 1.10.5 by induction over
k2 . Thus assume k2 > 0 and also without loss of generality that k(1) > 0.
We keep the notation of Proposition 1.10.4. Recalling (1.304) we assume

(v) = 1 ⇔ v ≤ k(1) .

Using (1.305) we write, using symmetry between sites,


    
k()
V := ν T T k3 = ν U (v)
1≤≤n v≤k
  
= ν ε(1) U (v) (1.315)
2≤v≤k

and (1.310) remains valid. For v ≤ k(1) we have

ν0 (ε(1)ε(v)) = ν0 ((ε1 − εj(1) )εj  (1) (ε1 − εj(v) )εj  (v) )


= q − q

and for v > k(1) we have ν0 (ε(1)ε(v)) = 0 because ε(v) does not depend on
either ε1 or εj(1) . Thus, instead of (1.312) we now have (recalling that the
term I has been defined in (1.311))
108 1. The Sherrington-Kirkpatrick Model
  
k(1) − 1 −
I= (q − q)ν U (v) + O(k + 1) . (1.316)
N
3≤v≤k

We use (1.302) (with the indices 1, j(1), j (1) rather than 1, 2, 3 and with k−1
rather than k) to obtain
  

ν ε(1) U (v)
2≤v≤k
  
− − −
= β (1 − 4q + 3
2
q )ν (R1,j  (1) − Rj(1),j  (1) ) U (v)
2≤v≤k
+ II + O(k + 1) (1.317)
for
   
− − − −
II = β 2 (q − q) ν (R1, − Rj(1), − R1,n  +1 + Rj(1),n +1 ) U − (v) ,
 2≤v≤k
(1.318)
where n is an integer larger than all indices j(v), j (v), and where the sum-
mation is over 2 ≤  ≤ n ,  = j(1),  = j (1).
Compared to the proof of Proposition 1.10.4 the new (and non-trivial)
part of the argument is to establish the relation
  
II = (k(1) − 1)β 2 (q − q)ν 2
T(v) T1,n  +1 + O(k + 1) (1.319)
3≤v≤k

and we explain first how to conclude once (1.319) has been established. As

usual in (1.316) and (1.317) we can replace U − (v) by U (v) and R,  by R,

with an error O(k + 1), so that


  
k(1) − 1
I= (q − q)ν U (v) + O(k + 1) ,
N
3≤v≤k

and also
     
− − − −
ν (R1,j  (1) − Rj(1),j  (1) ) U (v) = ν U (v)
2≤v≤k 1≤v≤k
  
=ν U (v) + O(k + 1)
1≤v≤k
= V + O(k + 1) .
Combining with (1.310) and (1.317) we get
   
1
(1 − β (1 − 4q + 3
2
q ))V = (k(1) − 1)(q − q) ν U (v)
N
3≤v≤k
  
2 2
+β ν T(v) T1,n+1 + O(k + 1) . (1.320)
3≤v≤k
1.10 Central Limit Theorem for the Overlaps 109

This completes the proof if k(1) = 1 since a(1) = 0. If k(1) ≥ 2, we have


    
k () k3
ν U (v) = ν T T , (1.321)
3≤v≤k ≤n

where k () = k() for  > 1 and k (1) = k(1) − 2 if  = 1. Thus the induction
hypothesis implies
  
1 k () k3 1   
ν T T = a(k ())B k2 −2 ν T k3 +O(k +1) . (1.322)
N N
1≤≤n 1≤≤n

We can use Proposition 1.10.4 and the induction hypothesis to compute the
term   
2
ν T(v) T1,n +1
3≤v≤k

because this term contains only k2 − 2 factors T , and we find


     
ν 2
T(v) T1,n  +1 = A2 a(k ())B k2 −2 ν T k3 +O(k +1) . (1.323)
3≤v≤k 1≤≤n

Combining (1.320) to (1.323) we get

(1 − β 2 (1 − 4q + 3
q ))V
  
1  
= (q − q) + β 2 A2 a(k ())B k2 −2 ν T k3 + O(k + 1) .
N
1≤≤n

Using the relation


 
1
(1 − β 2 (1 − 4q + 3
q ))B 2 = (q − q) + β 2 A2 ,
N

and that (k(1) − 1)a(k(1) − 2) = a(k(1)) then completes the induction.


Now we turn to the proof of (1.319). As usual we have
   
II = β (q − q)
2
ν (R1, − Rj(1), − R1,n +1 + Rj(1),n +1 )
 U (v)
 2≤v≤k
+ O(k + 1) ,

where the summation is as in (1.318). Moreover (1.246) implies


   
II = β (q − q)
2
ν (T1, − Tj(1), − T1,n +1 + Tj(1),n +1 )
  U (v)
 2≤v≤k
+ O(k + 1) , (1.324)
110 1. The Sherrington-Kirkpatrick Model

and, for 2 ≤ v ≤ k2

U (v) = R(v),j  (v) − Rj(v),j  (v)


= T(v),j  (v) − Tj(v),j  (v) + T(v) − Tj(v) ,

while, for k2 < v ≤ k3

U (v) = Rj(v),j  (v) − q = Tj(v),j  (v) + Tj(v) + Tj  (v) + T .

This looks complicated, but we shall prove that when we expand the product
most of the terms are O(k + 1). We know from Proposition 1.10.4 that in
order for a term not to be O(k + 1), each factor T, must occur at an even
power because a(k) = 0 for odd k. In order for the terms T1, (or Tj(1), , or
T1,n +1 or Tj(1),n +1 ) to occur at an even power in the expansion, one has
to pick the same term again in one of the factors U (v) for v ≥ 2. Since all
the integers j(v), j (v) are ≤ n , this is impossible for the terms T1,n +1 and
Tj(1),n +1 .
Can this happen for the term Tj(1), ? We can never have {j(1), } =
{j(v), j (v)} for v ≥ 2 because the integers j(v), j (v) are all distinct. We
can never have {j(1), } = {(v), j (v)} because j(1) ∈ / {(v), j (v)} since
j(1) > n, (v) ≤ n and j(1) = j (v), so this cannot happen either for this
term Tj(1), .
Can it happen then for the term T1, ? Since j(v), j (v) ≥ n, we can never
have {1, } = {j(v), j (v)}. Since j (v) > n, we have {1, } = {(v), j (v)}
exactly when (v) = 1 and  = j (v). Since 2 ≤ v ≤ k, there are exactly
k(1) − 1 possibilities for v, namely v = 2, . . . , k(1). For each of these values
of v, there is exactly one possibility for , namely  = j (v).
So, only for the terms T1, where  ∈ {j (2), . . . , j (k(1))} can we pick
another copy of this term in the product 2≤v≤k U (v), and this term is
found in U (u) for the unique 2 ≤ u ≤ k(1) for which  = j (u). Therefore in
that case we have
     
ν (T1, − Tj(1), − T1,n +1 + Tj(1),n +1 ) U (v) = ν T1, 2
U (v) .
2≤v≤k v =u

Moreover, since  = j (u) we then have  > n, and since (v) ≤ n and all
the numbers j(v) and j (v) are distinct,  does not belong to any of the sets
{(v), j(v), j (v)} for v = u, so that, by symmetry between replicas,
        
2 2 2
ν T1, U (v) = ν T1,n  +1 U (v) = ν T1,n +1 T(v) ,
v =u 3≤v≤k 3≤v≤k

and since there are exactly k(1) − 1 such contributions, this completes the
proof of (1.319), hence of Proposition 1.10.5. 

Proof of Theorem 1.10.1. We prove by induction over k that
1.10 Central Limit Theorem for the Overlaps 111

ν(T k ) = a(k)C k + O(k + 1)

where C is as in (1.262). This suffices according to Propositions 1.10.4 and


1.10.5. We write
     
ν(T k ) = ν (R2v−1,2v − q) = ν (ε1 ε2 − q) (R2v−1,2v − q) .
1≤v≤k 2≤v≤k

We proceed as before, using now (1.230) for n = 2k to obtain

ν(T k ) = I + II + O(k + 1) , (1.325)

where, defining b(,  ) = b(card{,  } ∩ {1, 2}), we have


k−1
I= q − q 2 )ν(T k−2 )
( (1.326)
N

II = b(,  )ν((R, − q)f )
1≤< ≤n

−n b(, n + 1)ν((R,n+1 − q)f )
≤n
n(n + 1)
+ b(n + 1, n + 2)ν((Rn+1,n+2 − q)f ) (1.327)
2

for f = 2≤v≤k (R2v−1,2v − q). The key computation is the relation

II = β 2 (1 − 4q + 3
q )ν(T k ) (1.328)
 2 
+ (k − 1) β ( q − q )ν(T1,2 T
2 2 k−2
) + 2β (2q + q − 3
2 2 2 k−2
q )ν(T1 T ) .

Once this has been proved one can compute the last two terms using the
induction hypothesis and Propositions 1.10.4 and 1.10.5, namely
2
ν(T1,2 T k−2 ) = A2 a(k − 2)C k−2 + O(k + 1)

and
ν(T12 T k−2 ) = B 2 a(k − 2)C k−2 + O(k + 1) .
Combining this value of II with (1.325) and (1.326), and using (1.262) one
then completes the induction.
It would be nice to have a one-line argument to prove (1.328); maybe
such an argument exists if one finds the correct approach, which probably
means that one has to solve Research Problem 1.8.3. For the time being, one
carefully collects the terms of (1.327). Here are the details of this computation
(a more general version of which will be given in Volume II). In order to
compute ν((R, − q)f ) we can replace each factor R2v−1,2v − q of f by T
whenever {2v − 1, 2v} ∩ {,  } = ∅. Thus we see first that

ν((Rn+1,n+2 − q)f ) = ν(T k−1 (Rn+1,n+2 − q)) = ν(T k ) .


112 1. The Sherrington-Kirkpatrick Model

If w is the unique integer ≤ k such that  ∈ {2w −1, 2w}, then for w = 1 (and
since f does not contain the factor R1,2 −q) we have ν((R,n+1 −q)f ) = ν(T k ),
whereas for w ≥ 2 we have
   
ν (R,n+1 − q)f = ν (R,n+1 − q)(R2w−1,2w − q)T k−2 ,

as is seen simply by averaging first in σ 2v−1 and σ 2v for v = w. To compute


this term, we use (1.246) to write

R,n+1 − q = T,n+1 + T + Tn+1 + T

R2w−1,2w − q = T2w−1,2w + T2w−1 + T2w + T ,


and we expand the product of these quantities. Since a(1) = 0, the induction
hypothesis shows that

ν((R,n+1 − q)f ) = ν(T k ) + ν(T2 T k−2 ) + O(k + 1) .

To compute ν((R, − q)f ) for 1 ≤  <  ≤ n, we first consider the case


where for some 1 ≤ w ≤ n, we have  = 2w − 1 and  = 2w. If w ≥ 2 we
have

ν((R, − q)f ) = ν((R, − q)2 T k−2 ) .


Using again (1.246), the induction hypothesis, and the fact that a(1) = 0, we
get

ν((R, − q)2 T k−2 ) = ν(T k ) + ν(T,


2
T
k−2
) + ν(T2 T k−2 )
+ ν(T2 T k−2 ) + O(k + 1) .

If w = 1, we have instead ν((R, − q)f ) = ν((R1,2 − q)f ) = ν(T k ).


Next we consider the case where  ∈ {2w − 1, 2w},  ∈ {2w − 1, 2w } for
some 1 ≤ w < w .
• If w ≥ 2 we have
 
ν((R, − q)f ) = ν (R, − q)(R2w−1,2w − q)(R2w −1,2w − q)T k−3 ,

and proceeding as before we get

ν((R, − q)f ) = ν(T k ) + ν(T2 T k−2 ) + ν(T2 T k−2 ) + O(k + 1) .

• If w = 1, we find instead
 
ν((R, − q)f ) = ν (R, − q)(R2w −1,2w − q)T k−2 f
= ν(T k ) + ν(T2 T k−2 ) + O(k + 1) .

It remains to gather these terms as in the right-hand side of (1.327). The


coefficient of ν(T k ) is
1.11 Non Gaussian Behavior: Hanen’s Theorem 113
  n(n + 1)
b(,  ) − n b(, n + 1) + b(n + 1, n + 2)
2
< ≤n ≤n

(n − 2)(n − 1)
= β 2 1 − q 2 + 2(n − 2)(q − q 2 ) + q − q2 )
(
2

n(n + 1)
− 2n(q − q ) − (n − 2)n(
2
q−q )+
2
q−q )
( 2
2
= β 2 ((1 − q 2 ) − 4(q − q 2 ) + 3(
q − q 2 )) = β 2 (1 − 4q + 3
q) .
2 k−2 2
We observe that ν(T,  T ) = ν(T1,2 T k−2 ). The coefficient of ν(T1,2
2
T k−2 )
is 
b(2w − 1, 2w) = (k − 1)β 2 ( q − q2 ) .
2≤w≤k

The coefficient of ν(T12 T k−2 ) is

(n − 3)(n − 2)
β 2 (2(n − 2)(q − q 2 ) + 2 q − q 2 ) − (n − 2)n(
( q − q 2 ))
2
= 2β 2 (k − 1)(2q + q 2 − 3
q) ,

since n − 2 = 2(k − 1).


This completes the proof of (1.328) and of Theorem 1.10.1. 


1.11 Non Gaussian Behavior: Hanen’s Theorem


After reading the previous section one could form the impression that every
simple quantity defined in terms of a few spins will have asymptotic Gaussian
behavior when properly normalized. This however is not quite true. In this
section we prove the following remarkable result of A. Hanen, where, as usual,

Y = βz q + h, q is the root of the equation (1.74), a(k) = Eg k for k ∈ N and
a standard Gaussian r.v. g and q = Eth4 Y .

Theorem 1.11.1. (A. Hanen [79]) If β < 1/2, for each k we have
 k/2  2
β2 1
E( σ1 σ2 − σ1 σ2 )k = a(k) E
N (1 − β (1 − 2q + q))
2
ch2k Y
+ O(k + 1) . (1.329)

Of course O(k + 1) denotes a quantity U with |U | ≤ K/N (k+1)/2 where K


does not depend on N . Since the right-hand side is not of the type a(k)Dk ,
this is not a Gaussian behavior. In fact, the meaning of (1.329) is that in the
limit N → ∞ we have
√ D β 1 1
N ( σ1 σ2 − σ1 σ2 ) = g 2 2 , (1.330)
1 − β (1 − 2q + q) ch Y1 ch Y2
2
114 1. The Sherrington-Kirkpatrick Model

where D means equality in distribution, g, Y1 and Y2 are independent, g is


standard Gaussian and Y1 and Y2 are independent copies of Y .
Research Problem 1.11.2. A decomposition such as (1.330) can hardly be
accidental. Rather, it is likely to arise from some underlying structure. Find
it.
In some sense the proof of Theorem 1.11.1 is not very difficult. It relies on
the cavity method and Taylor’s expansions. On the other hand, it is among
the deepest of this volume, and the reader should be comfortable with argu-
ments such as those used in the proof of Theorem 1.7.11 before attempting
to follow all the details of the proof.
To start this proof we note that

E( σ1 σ2 − σ1 σ2 )k = E( σN σN −1 − σN σN −1 )k (1.331)

and that
1
σN σN −1 − σN σN −1 = (σ 1 − σN
2 1
)(σN −1 − σN −1 )
2
2 N
1
= (ε1 − ε2 )(σN
1
−1 − σN −1 )
2
(1.332)
2

where as usual ε = σN . Using replicas, we then have

(ε1 −ε2 )(σN


1
−1 −σN −1 )
2 k
= (ε1 −ε2 )(ε3 −ε4 ) · · · (ε2k−1 −ε2k )f − , (1.333)

where f − = (σN1
−1 − σN −1 ) · · · (σN −1 − σN −1 ).
2 2k−1 2k

For v ≥ 1, let us set ηv = ε2v−1 − ε2v , and for a set V of integers let us
set 
ηV = ηv .
v∈V

Let us also set !


V∗ = {2v − 1, 2v} ,
v∈V

so that ηV depends only on the variables ε for  ∈ V ∗ . From (1.331) to


(1.333) we see the relevance of studying quantities such as ν(ηV f − ) where
f − is a function on ΣN n
−1 . These quantities will be studied by making a
Taylor expansion of the functions t → νt (ηV f − ) at t = 0, where νt refers
(m)
to the interpolating Hamiltonian (1.147). We denote by νt (f ) the m-th
derivative of the function t → νt (f ), and we first learn how to control these
derivatives.

n
Lemma 1.11.3. If f is a function on ΣN we have

(m) K(m, n)
|νt (f )| ≤ ν(f 2 )1/2 . (1.334)
N m/2
1.11 Non Gaussian Behavior: Hanen’s Theorem 115

Proof. This is because “each derivative brings out a factor R,  − q that
−1/2
contributes as N .” More formally, by (1.151), counting each term with
its order of multiplicity, νt (f ) is the sum of 2n2 terms of the type

±β 2 νt (ε ε (R,  − q)f ) ,

(m)
where ,  ≤ n + 2, so that by iteration νt (f ) is the sum of at most
2n2 (2(n + 2)2 ) · · · (2(n + 2(m − 1))2 )
terms of the type
 
±β 2m
νt εr εr (R−r , − q)f
r
r≤m

and we bound each term through Hölder’s inequality and (1.103). 



For a set J of integers, we define

εJ = ε .
∈J

A basic idea is that the quantity ν0 (ηV εJ ) has a great tendency to be zero,
because each factor ηv = ε2v−1 −ε2v gives it a chance. And taking the product
by εJ cannot destroy all these chances if cardJ < cardV , as is made formal
in the next lemma.

Lemma 1.11.4. Assume that card(V ∗ ∩J) < cardV , and consider a function
f of (ε )∈V
/ ∗ . Then
ν0 (ηV εJ f) = 0 .
Proof. Recalling the definition of V ∗ and since card(V ∗ ∩ J) < cardV , there
exists v in V such that {2v − 1, 2v} ∩ J = ∅. Defining V = V \ {v} we get

ηV εJ f = ηv ηV  εJ f ,
where ηV  εJ f depends only on ε for  = {2v − 1, 2v}. Thus

ηV εJ f 0 = ηv 0 ηV  εJ f 0 = 0
because ηv 0 = 0. 

As a consequence of Lemma 1.11.4 terms ηV create a great tendency for
certain derivatives to vanish.

Lemma 1.11.5. Consider two integers r, s and sets V and J with cardV = s
and card(V ∗ ∩ J) ≤ r. Consider a function f of (ε )∈V
/ ∗ and a function f

n
on ΣN −1 . Then for 2m + r < s we have

(ηV εJ ff − ) = 0 .
(m)
ν0
116 1. The Sherrington-Kirkpatrick Model

Proof. Lemma 1.6.2 implies the following fundamental equality:


ν0 (ηV εJ ff − ) = ν0 (ηV εJ f)ν0 (f − ) .
Therefore for m = 0 the conclusion follows from Lemma 1.11.4 since
(0)
ν0 = ν0 . The proof is then by induction over m. We simply observe that
ν0 (ηV εJ ff − ) is a sum of 2n2 terms

±β 2 ν0 (ηV εJ ε ε f(R,



 − q)f

)
and that εJ ε ε = εJ  with J ⊂ J ∪ {,  }, so that
card(V ∗ ∩ J ) ≤ 2 + card(V ∗ ∩ J) ≤ 2 + r .
Moreover
2(m − 1) + 2 + r = 2m + r < s .
The induction hypothesis then yields

(ηV εJ ε ε f(R,


− −
(m−1)
ν0  − q)f )=0,
and this concludes the proof. 

The next corollary takes advantage of the fact that many derivatives van-
ish through Taylor’s formula.

Corollary 1.11.6. Consider sets V and J with cardV = s and card(V ∗ ∩


J) ≤ r. Consider a function f of (ε )∈V/ ∗ and a function f

on ΣN n
−1 .
 n
Assume that ηJ , εJ , f are functions on ΣN (that is, they depend only on ε
for  ≤ n). Then
K(s, n)
|ν(ηV εJ ff − )| ≤ ν((ff − )2 )1/2 ,
N a/2
where
s−r+1 s−r
a= if s − r is odd; a = if s − r is even. (1.335)
2 2
Proof. Consider the largest integer m with 2m < s − r so
s−r−1 s−r
m= if s − r is odd; m = − 1 if s − r is even.
2 2

Thus a = m + 1. Moreover Lemma 1.11.5 implies that ν0 (ηV εJ ff − ) = 0
(m )

whenever m ≤ m. Taylor’s formula and Lemma 1.11.3 then yield


K(s, n)
|ν(ηV εJ ff − )| ≤ sup |νt (ηV εJ ff − )| ≤ ν((ff − )2 )1/2 .
(m+1)

|t|≤1 N (m+1)/2

The reason why K(s, n) depends only on s and n is simply that m ≤ s. 



1.11 Non Gaussian Behavior: Hanen’s Theorem 117

Corollary 1.11.7. Consider a number q (that may depend on N ) with |q −


q | ≤ L/N . Consider a set V with cardV = s, and for u ≤ m consider integers
(u) <  (u). Then
  
ν ηV (R(u), (u) − q ) = O(b + m)
u≤m

where b = (s + 1)/2 if s is odd and b = s/2 if s is even. Moreover these


estimates are uniform over β ≤ β0 < 1/2.

Proof. Let us write


− ε(u) ε(u )
R(u), (u) − q = R(u), (u) − q +
N

and expand the product u≤m (R(u), (u) − q ) according to this decomposi-
tion. We find
   
ν ηV (R(u), (u) − q ) = WI
u≤m I⊂{1,...,m}

where   
WI = ν ηV Cu
u≤m

with ε(u) ε (u) −


Cu = if u ∈ I; Cu = R(u), (u) − q if u ∈
/ I.
N
Let r := cardI, so that
 1
C u = r  εJ
N
u∈I

where cardJ ≤ 2r . Let


 

f − := Cu = (R(u), (u) − q ) ,

u∈I
/ u∈I
/

so that
1
WI = ν(ηV εJ f − ) .
N r
We may use Corollary 1.11.6 with r = 2r , f = 1 to obtain

K(s, n)
|ν(ηV εJ f − )| ≤ ν((f − )2 )1/2 ,
N a/2
where a = (s + 1)/2 − r if s is odd and a = s/2 − r if s is even so that
a=b−r .
118 1. The Sherrington-Kirkpatrick Model

Also, by (1.103) we have νt ((R1,2 −q )2k ) ≤ KN −k and Hölder’s inequality
implies
K(m)
ν((f − )2 )1/2 ≤ (m−r )/2 .
N
Therefore
1 K(s, n) K(m) K(s, n, m)
|WI | ≤ = .
N r N b/2−r /2 N (m−r )/2 N (b+m)/2
The uniformity of these estimates over β ≤ β0 < 1/2 should be obvious. 


Lemma 1.11.8. Consider a set V with cardV = 2m, a function f of


(ε )∈V
/ ∗ , and a function f

on ΣNn 
−1 . Assume that ηV and f are functions
n
on ΣN . Then
 (m−1)
ν0 (ηV ff − ) = β 2 (ηV ε ε (R, − q)ff − ) .
(m)
ν0 (1.336)
< ≤n

Proof. From (1.151) we know that νt (ηV ff − ) is the sum of 2n2 terms of
the type

±β 2 νt (ηV ε ε (R,  −) .
 − q)f f

(m−1)
Now it follows from Lemma 1.11.5 used for s = 2m and ν0 rather than
(m)
ν0 that
(m−1)
ν0 −
(ηV ε ε (R,  −) = 0
 − q)f f

unless ,  ∈ V ∗ . Looking again at (1.151) we observe that the only terms for
which this occurs are the terms

β 2 νt (ηV ε ε (R,  − ) for  <  ≤ n .
 − q)f f 


The next result is the heart of the matter. Given a set V with cardV = 2m,
we denote by I a partition of V in sets J with cardJ = 2. When J = {u, v}
we consider the “rectangular sums”
UJ− = R2u−1,2v−1
− −
− R2u−1,2v −
− R2u,2v−1 −
+ R2u,2v ,
and
UJ = R2u−1,2v−1 − R2u−1,2v − R2u,2v−1 + R2u,2v .

Theorem 1.11.9. Consider a set V with cardV = 2m, a function f of


− n
(ε )∈V
/ ∗ and a function f on ΣN −1 . Then

     
(m)  − 2m  1 − −
ν0 (ηV f f ) = β m! E f 0 4m ν0 f UJ , (1.337)
I
ch Y J∈I

where the summation is over the possible choices of the partition I of V .


1.11 Non Gaussian Behavior: Hanen’s Theorem 119

When m = 1 and f = 1, this is (1.237).


Proof. We may assume that n is large enough so that ηV and f are functions
n
on ΣN . Iteration of (1.336) and use of Lemma 1.6.2 show that
   
(m)  −
ν0 (ηV f f ) = β 2m 
ν0 (ηV ε1 ε1 · · · εm εm f )ν0 f
 
− −
(Rr , − q) ,
r
r≤m
(1.338)
where the summation is over all choices of 1 ≤ 1 < 1 ≤ n, . . . , 1 ≤ m <
m ≤ n. Now, as shown in the proof of Lemma 1.11.4,

ν0 (ηV ε1 ε1 · · · εm εm f) = 0 (1.339)

unless each of the sets {2v − 1, 2v} for v ∈ V contains at least one of the
points r or r (r ≤ m). There are 2m such sets and 2m such points; hence
each set must contain exactly one point. When this is the case let us define

Jr = {vr , vr } where r ∈ {2vr − 1, 2vr } ; r ∈ {2vr − 1, 2vr } .

Then {J1 , . . . , Jm } forms a partition of V . Moreover,


 
ηV ε1 ε1 · · · εm εm f 0 = f 0 ηvr εr 0 ηvr εr 0
r≤m r≤m

and
"
1 − th2 Y = 1/ch2 Y if r = 2vr −1
ηvr εr 0 = (ε2vr −1 −ε2vr )εr 0 =
−(1 − th Y ) = −1/ch Y if r = 2vr
2 2

and similarly for ηvr εr 0 . Let us then define

τr = 1 if r = 2vr − 1 ; τr = −1 if r = 2vr ,

and τr similarly. Then the quantity (1.338) is


       
 1 − −
2m
β E f 0 4m τr τr ν0 f (Rr , − q) , (1.340)
ch Y r≤m r≤m
r

where the summation is over all the choices of the partition {J1 , . . . , Jm }
of V in sets of two elements, and all choices of r and r as above. Given
the set Jr , there are two possible choices for r (namely r = 2vr − 1 and
r = 2vr ) and similarly there are two possible choices for r . Thus, given the
sets J1 , . . . , Jr , there are 22m choices for the indices r and r , r ≤ m. In the
next step, we add the 22m terms in the right-hand side of (1.340) for which
the sets J1 , . . . , Jm take given values. We claim that this gives a combined
term of the form
120 1. The Sherrington-Kirkpatrick Model
    
1
β 2m
E f 0 ν0 f − −
UJr .
ch4m Y r≤m

To understand this formula, one simply performs the computation when m =


1 and one observes that “there is factorization over the different values of
r ≤ m”. If we keep in mind the fact that there are m! choices of the sequence
J1 , . . . , Jm for which {J1 , . . . , Jm } forms a given partition I of V in sets of 2
elements, we have proved (1.337). 

We are now ready to start the real computation. We recall the notation
A2 of (1.248).

Proposition 1.11.10. Consider a set V with cardV = 2m, and a partition


I of V in sets with 2 elements. Consider a function f of (ε )∈V
/ ∗ . Then
    
1
ν ηV f UJ− = E f 0 4m (4β 2 A2 )m + O(2m + 1) . (1.341)
J∈I
ch Y

Proof. First, since ν(( J∈I UJ− )2 )1/2 = O(m) (because there are m factors

in the product, each counting as 1/ N ), it follows from (1.334), used for

(ηV f J∈I UJ− ) = O(2m + 1) (uniformly in
(m+1)
m + 1 rather than m that νt
t). Next, it follows from Lemma 1.11.5 (used for r = 0 and s = 2m) that for

p < m we have νt (ηV f J∈I UJ− ) = 0. Combining these facts with Taylor’s
(p)

formula, we obtain:
     
1 (m)
ν ηV f UJ− = ν0 ηV f UJ− + O(2m + 1) . (1.342)
m!
J∈I J∈I

From (1.337) we get


       
1
ηV f UJ− = β 2m m!E f 0 UJ− UJ−
(m)
ν0 ν0 ,
J∈I
ch4m Y I J∈I J  ∈I 
(1.343)
where the summation is over all partitions I of V in sets with 2 elements.
Both I and I have m elements. In the previous section we have explained
in detail why
     
− −
ν0 UJ UJ  = ν UJ UJ  + O(2m + 1) . (1.344)
J∈I J  ∈I  J∈I J  ∈I 

Now, if J = {v, v } we obtain, recalling the notation T, of (1.245),

UJ = R2v−1,2v −1 − R2v−1,2v − R2v,2v −1 + R2v,2v


= T2v−1,2v −1 − T2v−1,2v − T2v,2v −1 + T2v,2v . (1.345)
1.11 Non Gaussian Behavior: Hanen’s Theorem 121

In this manner each term UJ is decomposed as the sum of 4 terms ±T, , so


that the right-hand side of (1.343) can be computed through Theorem 1.10.1
(using only the much easier case where k2 = k3 = 0). The fundamental fact
is that if a term T, occurs both in the decompositions of UJ and UJ  then
we must have J = J because  and  determine J by the formula
"# $ # $%
+1  +1
J= , ,
2 2
i.e. J is the two-point set whose elements are the integer parts of ( + 1)/2
and ( + 1)/2 respectively. In order for the quantity
  
ν UJ UJ 
J∈I J  ∈I 

not to be O(2m + 1), the following must occur: given any J0 ∈ I, at least
one of the terms T, of the decomposition (1.345) of UJ0 must occur in the
decomposition of another UJ , J ∈ I ∪ I , J = J0 . (This is because a(1) = 0.)
The only possibility is that J0 ∈ I . Since this must hold for any choice of
J0 , we must have I = I, and thus (1.343) implies
      
1
ηV f UJ− = β 2m m!E f 0 4m
(m)
ν0 ν UJ2 + O(2m + 1) .
J∈I
ch Y J∈I
(1.346)
Expanding UJ2 using (1.345), and using Theorem 1.10.1 then shows that
 
ν UJ = (4A2 )m + O(2m + 1) ,
2

J∈I

and combining with (1.342), (1.343) and (1.346) completes the proof. 


Proposition 1.11.11. Consider a set V with cardV = 2p and a partition I


of V in sets with two elements. Then
     p
1 2 2 1
ν ηV UJ = 4 +β A E 4p + O(2p + 1) . (1.347)
N ch Y
J∈I

Proof. We observe the relation


ηJ
UJ = UJ− +
N
  ηJ 
so that
UJ = UJ− + .
N
J∈I J∈I

We shall prove (1.347) by expanding the product and using (1.346) for each
term. We have
122 1. The Sherrington-Kirkpatrick Model
   
 ηJ    ηJ
UJ− + = −
UJ , (1.348)
N 
N 
J∈I  I J ∈I
/ J∈I

where the sum is over all subsets


& I of I. Consider such a subset with cardI =
m ≤ p = cardI. Let V = {J ; J ∈ I } and observe that

ηV = ηV  ηJ
/ 
J ∈I

so that    
ηV ηJ UJ− = ηV  ηJ2 UJ− .
/ 
J ∈I J∈I  / 
J ∈I J∈I 

We can then use (1.341) with V instead of V , I instead of I, m = cardI



and f = J ∈I 2
/  ηJ . We observe that

f 0 = ηJ2 0
J ∈I 

and that if J = {v, v },

ηJ2 0 = (ε2v−1 − ε2v )2 (ε2v −1 − ε2v )2 0


4
= (ε2v−1 − ε2v )2 20 = (2 − 2th Y )2 =
2
.
ch4 Y
Therefore (1.341) proves that
  
 ηJ   1

4
p−m
1


ν ηV UJ  = p−m E 4 4m (4β 2 A2 )m
N N ch Y ch Y
J ∈I  J  ∈I 
1
+ O(2m + 1)
N p−m
 p 
1 4
= p−m E (β 2 A2 )m + O(2p + 1) ,
N ch4p Y
and performing the summation in (1.348) completes the proof. 

Proof of Theorem 1.11.1. Combining (1.331), (1.332) and (1.333) we
obtain  
ν ( σ1 σ2 − σ1 σ2 )k = 2−k ν(ηV f − ) (1.349)
where V = {1, . . . , k} and

f − = (σN
1
−1 − σN −1 ) · · · (σN −1 − σN −1 ) .
2 2k−1 2k
(1.350)

Using Taylor’s formula and (1.334) proves that


 1 (m)
ν(ηV f − ) = ν (ηV f − ) + O(k + 1) . (1.351)
m! 0
m≤k
1.11 Non Gaussian Behavior: Hanen’s Theorem 123

Let us denote p = (k + 1)/2 when k is odd and p = k/2 when k is even. We


claim that
ν0 (ηV f − ) = O(p + m) .
(m)
(1.352)
To prove this we recall that by (1.151), and Lemma 1.6.2, the quantity
ν0 (ηV f − ) is the sum of terms of the type
(m)

  
− −
±β ν0 ηV ε1 ε1 · · · εm εm f
2m  (Rr , − q)
r
r≤m
  
− −
= ±β ν0 (ηV ε1 ε1 · · · εm εm )ν0 f
2m
(Rr , − q) .
r
r≤m

Thus it suffices to prove that


  
− −
ν0 f (Rr , − q) = O(m + p) . (1.353)
r
r≤m

This will be shown by using Corollary 1.11.7 for the (N − 1)-spin system with
Hamiltonian (1.144). First, we observe that if · − denotes an average for the
n
Gibbs measure with Hamiltonian (1.144) then for a function f on ΣN −1 we
have f 0 = f − . So, if ν− (·) = E · − , (1.353) shall follow from
  
− −
ν− f (Rr , − q) = O(m + p) . (1.354)
r
r≤m

Since the overlaps for the (N − 1)-spin system are given by

1   N

R,  = σi σi = R−  , (1.355)
N −1 N − 1 ,
i≤N −1

it suffices to prove (1.354) to show that


 
 N
ν− f − R∼r ,r − q = O(m + p) . (1.356)
N −1
r≤m

The (N − 1)-spin system with Hamiltonian (1.144) has parameter



N −1
β− = β≤β, (1.357)
N
and the corresponding value q− satisfies |q − q− | ≤ L/N by (1.187), so that
 
 N  L
 
 N − 1 q − q−  ≤ N − 1 .
124 1. The Sherrington-Kirkpatrick Model

Recalling the value (1.350) of f − we see that indeed (1.356) follows from
Corollary 1.11.7, because the estimate in that corollary is uniform over β ≤
β0 < 1/2 (and thus the fact that β− in (1.357) depends on N is irrelevant).
Thus we have proved (1.352), and combining with (1.351) we get
 1 (m)
ν(ηV f − ) = ν (ηV f − ) + O(k + 1) . (1.358)
m! 0
m≤k−p

When k is odd, we have k − p = (k − 1)/2, and for m ≤ k − p we have


2m < k. It then follows from Lemma 1.11.5 (used for r = 0 and s = k) that
ν0 (ηV f − ) = 0 for m ≤ k − p. In that case ν(ηV f − ) = O(k + 1), and since
(m)

a(k) = 0 when k is odd, we have proved (1.329) in that case.


So we assume that k is even, k = 2p. It then follows from Lemma 1.11.5
(used for r = 0 and s = 2p) that ν0 (ηV f − ) = 0 for m < p. Therefore from
(m)

(1.358) we obtain, using (1.337),


1 (p)
ν(ηV f − ) = ν (ηV f − ) + O(k + 1)
p! 0
  1    
− −
= β 2p E ν 0 f UJ + O(k + 1) , (1.359)
I
ch2k Y J∈I

where the summation is over all partitions I of V in sets of two elements.


Now we use (1.347) for the (N − 1)-spin system to see that, using (1.355)
and defining A− in the obvious manner:
     p
− − 1 2 2 1
ν0 f UJ = 4 + β A− E 4p + O(2p + 1)
N −1 ch Y−
J∈I

where Y− = β− z q− + h. It is a very simple matter to check that
  p   p
1 2 2 1 1 2 2 1
4 + β A− E 4p = 4 +β A E 4p + O(2p + 1)
N −1 ch Y− N ch Y
and thus
     p
1 1
ν0 f −
UJ− = 4 2 2
+β A E 4p + O(2p + 1) .
N ch Y
J∈I

Each choice of I gives the same contribution. To count the number of par-
titions I, we observe that if 1 ∈ J, and cardJ = 2, J is determined by its
other element so there are 2p − 1 choices for J. In this manner induction over
p shows that

cardI = (2p − 1)(2p − 3) · · · = a(2p) = a(k) .

Therefore
1.12 The SK Model with d-component Spins 125
 2   k/2
1 1
ν(ηV f − ) = a(k)β k E 4 + β 2 2
A + O(k + 1) ,
ch2k Y N

which completes the proof recalling (1.349) and since


1 1 1
+ β 2 A2 = . 

N N 1 − β (1 − 2q + q)
2

Having succeeded to make this computation one can of course ask all
kinds of questions.

Research Problem 1.11.12. (Level 1) Compute

lim N k/2 E( σ1 σ2 σ3 − σ1 σ2 σ3 )k .
N →∞

Research Problem 1.11.13. (Level 1− ) Recall the notation σ̇i = σi − σi .


Consider a number t, and i.i.d. standard Gaussian r.v.s gi , independent of
the randomness of HN . Compute
 k
 gi σ̇i t2
lim N k/2
E exp t √ − exp (1 − q) .
N →∞ N 2
i≤N

(Hint: read very carefully the proof of Theorem 1.7.11.)

1.12 The SK Model with d-component Spins

A model where spins take only the values ±1 could be an oversimplification.


It is more physical to consider spins as vectors in R3 or Rd . This is what
we will do in this section. The corresponding model is of obvious interest.
It has been investigated in less detail than the standard SK model, so many
questions remain unanswered. On the one hand, this is somewhat specialized
material, and it is not directly related to the rest of this volume. On the
other hand, this is as simple a situation as one might wish to describe a
“replica-symmetric solution” beyond the case of the ordinary SK model.
In the SK model with d-component spins, the individual spin σi is a vector
(σi,1 , . . . , σi,d ) of Rd . We will denote by (·, ·) the dot product in Rd , so that

(σi , σj ) = σi,u σj,u .
u≤d

The Hamiltonian is given by


126 1. The Sherrington-Kirkpatrick Model

β 
− HN = √ gij (σi , σj ) (1.360)
N 1≤i<j≤N

where, of course, (gij )i<j are independent standard normal r.v.s. We may
rewrite (1.360) as

β 
− HN = √ gij σi,u σj,u , (1.361)
N u≤d i<j

a formula that is reminiscent of a Hamiltonian depending on d configurations


σ u = (σ1,u , . . . , σN,u ) for u ≤ d, each of which has the same energy as in the
SK model. A first difference is that now σi,u varies in R rather than in {−1, 1}.
A deeper difference is that σ 1 , . . . , σ d are not independent configurations but
are interacting. (This does not show in the Hamiltonian itself, the interaction
takes place through the measure μ below.) In order to compare with the SK
model, and to accommodate the case σi,u = ±1, we will assume that

∀i, 2
σi,u ≤d, (1.362)
u≤d

or,
√ in words, that σi belongs to the Euclidean ball Bd centered at 0, of radius
d. Thus the configuration space is now

SN = BdN .

We consider a probability measure μ on Bd . We will define Gibbs’ measure as


the probability measure on SN = BdN of density proportional to exp(−HN )
with respect to μ⊗N . The case d = 1 is already of interest. This case is
simply the generalization of the standard SK model where the individual
spin σi is permitted to be any number in the interval [−1, 1]. When moreover
μ is supported by {−1, 1}, and for ε ∈ {−1, 1} has a density proportional to
exp εh with respect to the uniform measure on {−1, 1}, we recover the case
of the standard SK model with non-random external field. (Thus it might be
correct to think of μ as determining a kind of “external field” and to expect
that the behavior of the model will be very sensitive to the value of μ.) Also
of special interest is the case where d = 2, μ is supported by {−1, 1}2 , and
for (ε1 , ε2 ) ∈ {−1, 1}2 has a density proportional to

exp(ε1 h + ε2 h + λε1 ε2 )

with respect to the uniform measure on {−1, 1}2 . This is the case of “two
coupled copies of the SK model” considered in Section 1.9. This case is of
fundamental importance. It seems connected to some of the deepest remain-
ing mysteries of the low temperature phase of the SK model. For large values
of β, this case of “two coupled copies of the SK model” is far from being
completely understood at the time of this writing. One major reason for this
1.12 The SK Model with d-component Spins 127

is that it is not clear how to use arguments in the line of the arguments of
Theorem 1.3.7. The main difficulty is that some of the terms one obtains
when trying to use Guerra’s interpolation have the wrong sign, a topic to
which we will return later.
Let us define

ZN = ZN (β, μ) = exp(−HN )dμ(σ1 ) · · · dμ(σN ) , (1.363)

where HN is the Hamiltonian (1.360). (Let us note that in the case where
d = 1 and μ is supported by {−1, 1} this differs from our previous defini-
tion of ZN because we replace a sum over configurations by an average over
configurations.) Let us write
1
pN (β, μ) = E log ZN (β, μ) . (1.364)
N
One of our objectives is the computation of limN →∞ pN (β, μ). It will be
achieved when β is small enough. This computation has applications to the
theory of “large deviations”. For example, in the case of “two coupled copies
of the SK model”, computing limN →∞ pN (β, μ) amounts to computing
1
lim E log exp λN R1,2 , (1.365)
N →∞ N

where now the bracket is an average for the Gibbs measure of the usual
SK model. “Concentration of measure” (as in Theorem 1.3.4) shows that
N −1 log exp λN R1,2 fluctuates little with the disorder. Thus computing
(1.365) amounts to computing the value of exp λN R1,2 for the typical disor-
der. Since we can do this for every λ this is very much the same as computing
N −1 log G⊗2N ({R1,2 ≥ q + a}) and N
−1
log G⊗2N ({R1,2 ≤ q − a}) for a > 0 and
a suitable median value q. In summary, the result of (1.365) can be transfered
in a result about the “large deviations of R1,2 ” for the typical disorder. See
[151] and [162] for more on this.
We will be able to compute limN →∞ pN (β, μ) under the condition Lβd ≤
1, where as usual L is a universal constant. Despite what one might think
at first, the quality of this result does not decrease as d becomes large. It
controls “the same proportion of the high-temperature region √ independently
of d”. Indeed, if μ gives mass 1/2 to the two points (± d, 0, . . . , 0), the
corresponding model is “a clone” of the usual SK model at temperature
βd. The problem of computing limN →∞ pN (β, μ) is much more difficult (and
unsolved) if βd is large.
The SK model with d-component spins offers new features compared with
the standard SK model. One of these is that if μ is “spread out” then one can
understand the system up to values of β much larger than 1/d. For example,
if μ is uniform on {−1, 1}d , the model simply consists in d replicas of the SK
model with h = 0, and we understand it for β < 1/2, independently of the
128 1. The Sherrington-Kirkpatrick Model

value of d. Comparable results will be proved later in Volume II when μ is


the uniform measure on the boundary of Bd .
The good behavior of the SK model at small β < 1/2 is largely expressed
by (1.89), i.e. the fact that ν((R1,2 − q)2 ) ≤ L/N . The situation is more
complicated here. Consider 1 ≤ u, v ≤ d, and set
1 
Ru,v = σi,u σi,v . (1.366)
N
i≤N

This is a function of a single configuration (σ1 , . . . , σN ) ∈ SN , where σi =


(σi,u )u≤d . Consider now two configurations (σ11 , . . . , σN
1
) and (σ12 , . . . , σN
2
).
Consider the following function of these two configurations

u,v 1  1 2
R1,2 = σi,u σi,v . (1.367)
N
i≤N

In the present context, similar to (1.89), we have the following.

Theorem 1.12.1. If Lβd ≤ 1, we can find numbers (qu,v ), (ρu,v ) such that
   K(d)
ν (Ru,v − ρu,v )2 ≤ , (1.368)
N
u,v≤d

  u,v  K(d)
ν (R1,2 − qu,v )2 ≤ . (1.369)
N
u,v≤d

Here K(d) depends on d only; ν(·) = E · , · is an average for Gibbs’


measure, over one configuration in (1.368), and over two configurations in
(1.369). To get a first feeling for these conditions, consider the case d = 1.
Then (1.369) is the usual assertion
 that R1,2  q, but (1.368) is a new
feature which means that N −1 i≤N σi2  ρ. This of course was automatic
with ρ = 1 when we required that the individual spins be ±1.
In order to give a proper description of these numbers (qu,v )1≤u,v≤d ,
(ρu,v )1≤u,v≤d , let us consider (see Appendix page 439) for each symmet-
ric positive definite matrix (qu,v )1≤u,v≤d , a centered jointly Gaussian family
(Yu )u≤d with covariance
E Yu Yv = β 2 qu,v . (1.370)
For each family of real numbers (ρu,v )1≤u,v≤d and for x = (x1 , . . . , xd ) in
Rd , let us set further
 
β2 
E = E(x) = exp xu Yu + xu xv (ρu,v − qu,v ) . (1.371)
2
u≤d u,v≤d
1.12 The SK Model with d-component Spins 129

Theorem 1.12.2. Assuming that Lβd ≤ 1, and setting Z = E(x) dμ(x),
the following equations have a unique solution, and (1.368) and (1.369) hold
for these numbers:
 
1
qu,v = E xu E(x)dμ(x) xv E(x)dμ(x) , (1.372)
Z2
 
1
ρu,v = E xu xv E(x)dμ(x) . (1.373)
Z
Of course the above theorem subsumes Theorem 1.12.1, and moreover the
proof of Theorem 1.12.1 requires the relations (1.372) and (1.373). But for
pedagogical reasons we will prove first Theorem 1.12.1 and only then obtain
the information above.

Theorem 1.12.3. If βLd ≤ 1, then

β2  2
lim pN (β, μ) = − (ρu,v − qu,v
2
) + E log E(x)dμ(x) (1.374)
N →∞ 4
u,v≤d

where E(x) is given by (1.371).

Our argument gives a rate of convergence in N −1/2 , but it is almost certain


that a little bit more work would yield the usual rate in 1/N .
We will later explain why the solutions to (1.372) and (1.373) exist and
are unique for Lβd ≤ 1, but let us accept this for the time being and turn to
the fun part, the search for the “smart path”. We will compare the system
with a version of it where the last spin is “decoupled”.
We consider the “configurations of the (N − 1)-spin system”

ρu = (σ1,u , . . . , σN −1,u ) .

(One will distinguish between the configuration ρu and the numbers ρu,v .)
We define
β 
g(ρu ) = √ giN σi,u
N i≤N −1
√ √
gt (ρu ) = tg(ρu ) + 1 − tYu .
We consider the Hamiltonian
β   
− HN,t (σ1 , . . . , σN ) = √ gij σi,u σj,u + σN,u gt (ρu )
N u≤d i<j≤N −1 u≤d

β2 
+ (1 − t) σN,u σN,v (ρu,v − qu,v ) . (1.375)
2
u,v≤d

The last term is the new feature compared to the standard case.
130 1. The Sherrington-Kirkpatrick Model

n
For a function f on SN we write

νt (f ) = E f t ,

where · t denotes integration with respect to (the nth power of) the Gibbs
n
measure relative to the Hamiltonian (1.375). A function f on SN depends on
1 1 2 2 n n
configurations (σ1 , . . . , σN ), (σ1 , . . . , σN ), . . . , (σ1 , . . . , σN ). We define

−,u,v 1   
R,  = σi,u σi,v
N
i≤N −1

for ,  ≤ n and u, v ≤ d. As usual, we write νt (f ) = d


dt νt (f ). We define

qu,v (,  ) = qu,v if  = 

qu,v (, ) = ρu,v .

Proposition 1.12.4. We have

β2    −,u,v 
νt (f ) = νt f εu εv (R,  − qu,v (,  ))
2
, ≤n,u,v≤d
  −,u,v 
− nβ 2 νt f εu εn+1
v (R,n+1 − qu,v (, n + 1))
≤n,u,v≤d

β   n+1 n+1 −,u,v


2 
−n νt f εu εv (Rn+1,n+1 − qu,v (n + 1, n + 1)) (1.376)
2
u,v≤d
n(n + 1) 2   n+1 n+2 −,u,v 
+ β νt f εu εv (Rn+1,n+2 − qu,v (n + 1, n + 2)) .
2
u,v≤d

Here we have set εu = σN,u


. First let us explain why (1.376) coincides
with (1.151) in the case of the standard SK model. In such a case we have
−,1,1 −
d = 1, ε1 = ε , R,  = R,  , q1,1 = q, ρ1,1 = 1 (because x
2
= 1 if

x ∈ {−1, 1} cf. (1.373)). Let us also point out that R, = (N − 1)/N , so that
the contribution of the case  =  in the first sum of the right-hand side of
(1.376)
 cancels out
 with the contribution of the third sum. We finally observe
that = = 2 < .
Proof. Of course this formula is yet another avatar of (1.90), the new feature
being the last term of the Hamiltonian (1.375), which creates extra terms.
We leave to the reader the minimal work of deducing (1.376) from (1.90). A
direct proof goes as follows. We use straightforward differentiation (i.e. use
of rules of Calculus) in the definition of νt (f ) to obtain

β2  
νt (f ) = − νt (f εu εv (ρu,v − qu,v ))
2
≤n u,v≤d
1.12 The SK Model with d-component Spins 131

nβ 2 
+ νt (f εn+1
u εn+1
v (ρu,v − qu,v ))
2
u,v≤d
  
1  1 1
+ νt f εu √ g(ρu ) − √ Yu
2
≤n u≤d
t 1−t
  
n 1 1
− νt f εn+1
u √ g(ρn+1
u ) − √ Y u . (1.377)
2
u≤d
t 1−t

The first two terms are produced by the last term of the Hamiltonian (1.375),
and the last 2 terms by the dependence of gt (ρ) on t. One then performs
Gaussian integration by parts in the last two terms of (1.377), which yields
an expression similar to (1.376), except that one has qu,v rather than qu,v (,  )
everywhere. Combining this with the first two terms on the right-hand side
of (1.377) yields (1.376). 
The proof of Theorem 1.12.1 will follow the scheme of that of Proposi-
tion 1.6.6, but getting a dependence on d of the correct order requires some
caution.
Corollary 1.12.5. If n = 2, we have
  1/2
1/2
 −,u,v
|νt (f )| ≤ Lβ 2 d νt (f 2 ) νt (R1,1 − ρu,v )2
u,v≤d
 1/2 
−,u,v
+ νt (R1,2 − qu,v )2
(1.378)
u,v≤d

and also
|νt (f )| ≤ Lβ 2 d2 νt (|f |) . (1.379)
Here and throughout the book we lighten notation by writing νt (f )1/2 rather
than (νt (f ))1/2 , etc. The quantity νt (f )1/2 cannot be confused with the quan-
tity νt ((f )1/2 ) simply because we will never, ever, consider this latter quantity.
Proof. We write
   
  −,u,v    −,u,v
νt f εu εv (R, − qu,v (,  )) = νt f εu εv (R, − qu,v (,  )) .
u,v≤d u,v≤d

Next, we observe that since we are assuming that for each i we have

2
σi,u ≤d, (1.380)
u≤d

taking i = N , for each  we have



(εu )2 ≤ d . (1.381)
u≤d
132 1. The Sherrington-Kirkpatrick Model

Now, by the Cauchy-Schwarz inequality, and using (1.380), we have


 
    −,u,v 
 ε ε (R − q (,  ))
 u v ,  u,v 
u,v≤d
  1/2   1/2
 −,u,v
≤ (εu εv )2 (R,  − qu,v (,  ))2
u,v≤d u,v≤d
 1/2
−,u,v
≤d (R,  − qu,v (,  )) 2
,
u,v≤d

so that use of the Cauchy-Schwarz inequality for νt shows that


  
  
νt f εu εv (R, − qu,v (,  )) 
  −,u,v

u,v≤d
 1/2
1/2 −,u,v
≤ d νt (f 2 )νt (R,  − qu,v (,  ))2
.
u,v≤d

The right-hand side takes only two possible values, depending on whether
 =  or not. This yields (1.378).
To deduce (1.379) from (1.376), it suffices to show that, for each ,  we
have  
    −,u,v 
 εu εv (R, − qu,v (,  )) ≤ 2d2 .

u,v≤d

Using (1.381) and the Cauchy-Schwarz inequality, it suffices to prove that


 −,u,v
(R,  − qu,v (,  ))2 ≤ 4d2 ,
u,v≤d

which follows from


 −,u,v 2

(R,  ) ≤ d2 ; qu,v (,  )2 ≤ d2 . (1.382)
u,v≤d u,v≤d

To prove this we observe first that


 1  2
−,u,v 2  
(R,  ) = σi,u σi,v
N
u,v≤d u,v≤d i≤N −1
1   
≤ 
(σi,u  2
σi,v ) ≤ d2
N
u,v≤d i≤N −1

by (1.380). Next, we observe by (1.372) that


  2  2 
1
qu,v ≤ E
2
x u E(x)dμ(x) xv E(x)dμ(x) .
Z4
1.12 The SK Model with d-component Spins 133

The Cauchy-Schwarz inequality implies


 2
xu E(x)dμ(x) ≤ Z x2u E(x)dμ(x)

so that
    
1
2
qu,v ≤E x2u E(x)dμ(x) x2v E(x)dμ(x) ≤ d2
u,v
Z2 u v
 
since u≤d x2u ≤ d for x in the support of μ. The inequality u,v ρ2u,v ≤ d2
is similar. 
Proof of Theorem 1.12.1. In this proof we assume the existence of numbers
qu,v , ρu,v satisfying (1.372) and (1.373). This existence will be proved later.
Symmetry between sites implies
 
u,v
A := ν (R1,2 − qu,v )2 = ν(f ) , (1.383)
u,v≤d

where  u,v
f= (ε1u ε2v − qu,v )(R1,2 − qu,v ) .
u,v≤d

Using (1.381) and (1.382) we obtain


  
(ε1u ε2v − qu,v )2 ≤ 2 (ε1u ε2v )2 + 2 2
qu,v ≤ 4d2 ,
u,v≤d u,v≤d u,v≤d

and the Cauchy-Schwarz inequality entails


 u,v
f 2 ≤ 4d2 (R1,2 − qu,v )2 . (1.384)
u,v≤d

Next, as in the case of the ordinary SK model, (1.372) implies that for a
function f − on SN
n
−1 , we have

ν0 ((ε1u ε2v − qu,v )f − ) = 0

and thus, as in the case of the ordinary SK model,

K(d)
|ν0 (f )| ≤ .
N
If βd ≤ 1, (1.379) implies that νt (f ) ≤ Lν1 (f ) whenever f ≥ 0. Combining
this with (1.378), and the usual relation

ν(f ) ≤ ν0 (f ) + sup |νt (f )| ,


0<t<1
134 1. The Sherrington-Kirkpatrick Model

we get that
  1/2
K(d)  −,u,v
ν(f ) ≤ 2 2 1/2
+ Lβ d ν(f ) ν (R1,1 − ρu,v )2
N
u,v≤d
 1/2 
−,u,v
+ν (R1,2 − qu,v ) 2
.
u,v≤d

−,u,v u,v −,u,v


Using (1.383), (1.384) and the fact that replacing R1,2 by R1,2 or R1,1
by Ru,v creates an error term of at most K(d)/N , we get the relation

K(d)
A≤ + Lβ 2 d2 A1/2 (B 1/2 + A1/2 ) , (1.385)
N
where A is defined in (1.383) and
 
B=ν (Ru,v − ρu,v )2 .
u,v≤d

The same argument (using now (1.373) rather than (1.372)) yields the relation

K(d)
B≤ + Lβ 2 d2 B 1/2 (B 1/2 + A1/2 ) .
N
Combining with (1.385) we get

K(d)
A+B ≤ + L0 β 2 d2 (A + B) ,
N
so that if L0 β 2 d2 ≤ 1/2 this implies that A + B ≤ K(d)/N . 
The above arguments prove Theorems 1.12.1, except that it remains to
show the existence of solutions to the equations (1.372) and (1.373). It seems
to be a general fact that “the proof of the existence at high temperature of
solutions to the replica-symmetric equations is implicitly part of the proof
of the validity of the replica-symmetric solution”. What we mean here is
that an argument proving the existence of a solution to (1.372) and (1.373)
can be extracted from the smart path method as used in the above proof of
Theorem 1.12.1. The same phenomenon will occur in many places.
Consider a positive definite symmetric matrix Q = (qu,v )u,v≤d , and a
symmetric matrix Q = (ρu,v )u,v≤d . Consider a centered jointly Gaussian
family (Yu )u≤d as in (1.370). Consider the matrices T (Q, Q ) and T (Q, Q )
given by the right-hand sides of (1.372) and (1.373) respectively. The proof of
the existence of a solution to (1.372) and (1.373) consists in showing that if we
provide the set of pairs of matrices (Q, Q ) as above with Euclidean distance
2
(when seen as a subset of (Rd )2 ), the map (Q, Q ) → (T (Q, Q ), T (Q, Q ))
is a contraction provided Lβd ≤ 1. (Thus it admits a unique fixed point.) To
1.12 The SK Model with d-component Spins 135

 Q
see this, considering another pair (Q,  ) of matrices, we move from the pair
 
(Q, Q ) to the pair (Q, Q ) using the path t → (Q(t), Q (t)), where

Q(t) = (tqu,v + (1 − t)


qu,v )u,v≤d (1.386)

Q (t) = (tρu,v + (1 − t)


ρu,v )u,v≤d .
As already observed on page 14 this is very closely related to the smart
path used in the proof of Theorem 1.12.1, since (with obvious notation) the
Gaussian process Yu (t) associated to Q(t) is given by
√ √
Yu (t) = tYu + 1 − tYu

where Yu , Yu are assumed to be independent. This is simply because

E Yu (t)Yv (t) = tE Yu Yv + (1 − t)E Yu Yv .

All we have to do is to compute the derivative of the map t → (Q(t), Q (t))


and to exhibit a convenient upper bound for the modulus of this derivative,
 Q
depending on the distance between the pairs (Q, Q ) and (Q,  ), i.e. on
 1/2

(qu,v − qu,v ) + (ρu,v − ρu,v )
2 2
.
u,v

The estimates required are very similar to those of Corollary 1.12.5 and the
details are better left to the reader.
Proof of Theorem 1.12.2. We just proved the existence of solutions to the
equations (1.372) and (1.373). The uniqueness follows from Theorem 1.12.1.


We begin our preparations for the proof of Theorem 1.12.3. It seems very
likely that one could use interpolation as in (1.108) or adapt the proof of
(1.170). We sketch yet another approach, which is rather instructive in a
different way. We start with the relation
∂pN 1 
(β, μ) = 3/2 E (gij σi,u σj,u )
∂β N i<j u≤d
β  
= (ν(σi,u σj,u σi,v σj,v ) − ν(σi,u
1 1
σj,u 2
σi,v 2
σj,v )),
N 2 i<j
u,v≤d

where the first equality is by straightforward differentiation, and the second


one by integration by parts. Thus, since e.g.
1  1 1 2 2 1 u,v 2 1  1 2 2
2
σi,u σj,u σi,v σj,v = (R1,2 ) − (σi,u σi,v ) ,
N i<j 2 2N 2
i≤N
136 1. The Sherrington-Kirkpatrick Model

we obtain
 
 ∂p β 
 N u,v 2 
 (β, μ) − ν((Ru,v )2 − (R1,2 ) )
 ∂β 2 u,v 
 
β   2 2 
 K(d)
≤ 2  ν((σi,u σi,v )2 − (σi,u
1
σi,v ) ) ≤
N u,v   N
i≤N

and thus, by Theorem 1.12.1,


 
 ∂p β 2  K(d)
 N 
 (β, μ) − (ρu,v − qu,v ) ≤ √
2
.
 ∂β 2 u,v  N

Therefore, (and since the result is obvious for β = 0) all we have to check is
that the derivative of
β2  2
− (ρu,v − qu,v
2
) + E log E(x)dμ(x) (1.387)
4
u,v≤d


with respect to β is β 2
u,v≤d (ρu,v − qu,v
2
)/2. The crucial fact is as follows.
Lemma 1.12.6. The relations (1.372) and (1.373) mean that the partial
derivatives of the quantity (1.387) with respect to qu,v and ρu,v are zero.
The reader will soon observe that each time we succeed in computing the
limiting value of pN for a certain model, we find this limit as a function F of
certain parameters (here β, μ, (qu,v ) and (ρu,v )). Some of these parameters
are intrinsic to the model (here β and μ) while others are “free” (here (qu,v )
and (ρu,v )). It seems to be a general fact that the “free parameters” are
determined by the fact that the partial derivatives of the function F with
respect to these are 0.

Research Problem 1.12.7. Such a phenomenon as just described above


cannot be accidental. Understand the underlying structure.

Proof of Lemma 1.12.6. The case of the derivative with respect to ρu,v
is completely straightforward, so we explain only the case of the derivative
with respect to qu,v . We recall the definition (1.371) of E(x):
 
β2 
E(x) = exp xu Yu + xu xv (ρu ,v − qu ,v ) ,

2  
u ≤d u ,v ≤d

where the r.v.s Yu are jointly Gaussian and satisfy EYu Yv = β 2 qu ,v . Let us
now consider another jointly Gaussian family Wu and let au ,v = EWu Wv .
Let us define
1.12 The SK Model with d-component Spins 137
 
β2 
E ∗ (x) = exp xu Wu + xu xv (ρu ,v − qu ,v ) ,
2  
u ≤d u ,v ≤d

which we think of as a function of the families (qu ,v ) and (au ,v ) (the quanti-
ties (ρu ,v ) being fixed once and for all). The purpose of this is to distinguish
the two different manners in which E(x) depends on qu,v . Thus we have

E log E(x)dμ(x) = I + II , (1.388)
∂qu,v
where

I= E log E ∗ (x)dμ(x) , (1.389)
∂qu,v
and

II = β 2 E log E ∗ (x)dμ(x) . (1.390)
∂au,v
In both these relations, E ∗ (x) is computed at the values au ,v = β 2 qu ,v . To
perform the computation, on has to keep in mind that
 
xu xv (ρu ,v − qu ,v ) = 2 xu xv (ρu ,v − qu ,v )
u ,v  ≤d 1≤u <v  ≤d

+ x2u (ρu ,u − qu ,u ) .
u ≤d

For simplicity we will consider only the


 case u < v. The case u = v is entirely
similar. Recalling the notation Z = E(x)dμ(x), it should be obvious that
 
1
I = −β E
2
xu xv E(x)dμ(x) . (1.391)
Z
To compute the term II we consider the function

G(y1 , . . . , yd )
 
β2 
= log exp xu yu + xu xv (ρu ,v − qu ,v ) dμ(x) ,

2  
u ≤d u ,v ≤d

so that
E log E ∗ (x)dμ(x) = EG(W1 , . . . , Wd ) ,

and to compute the term II we simply appeal to Proposition 1.3.2. Since we


assume u = v we obtain

∂ ∂2G
E log E ∗ (x)dμ(x) = E (W1 , . . . , Wd ) ,
∂au,v ∂yu ∂yv

and when this is computed at the values au ,v = β 2 qu ,v this is
138 1. The Sherrington-Kirkpatrick Model
   
1 1
E xu xv E(x)dμ(x) − E xu E(x)dμ(x) xv E(x)dμ(x) .
Z Z2
Recalling (1.388) this yields the formula
 
∂ 1
E log E(x)dμ(x) = −β E 2
xu E(x)dμ(x) xv E(x)dμ(x) ,
∂qu,v Z2
from which the conclusion readily follows. 

Proof of Theorem 1.12.3. It follows from Lemma 1.12.6 that to differ-
entiate in β the quantity (1.387) we can pretend that qu,v and ρu,v do not
depend on β. To explain why this is the case in a situation allowing for simpler
notation, this is simple consequence of the chain rule,
d ∂F ∂F ∂F
F (β, p(β), q(β)) = + p (β) + q (β) , (1.392)
dβ ∂β ∂p ∂q
so that dF (β, p(β), q(β))/dβ = ∂F /∂β when the last two partial derivatives
of (1.392) are 0. Thus it suffices to prove that
∂ 
E log E(x)dμ(x) = β (ρ2u,v − qu,v
2
). (1.393)
∂β u,v

Consider a jointly Gaussian family (Xu )u≤d such that E Xu Xv = qu,v , and
which, like qu,v , we may pretend does not depend on β. We may choose these
so that Yu = βXu and now
  

E(x) = xu Xu + β xu xu (ρu,v − qu,v ) E(x) .
∂β
u≤d u,v≤d

Therefore, using (1.373) in the third line,



E E(x)dμ(x)
log
∂β
   
u≤d xu Xu + β u,v≤d xu xu (ρu,v − qu,v ) E(x)dμ(x)
=E 
E(x)dμ(x)

 u≤d xu Xu E(x)dμ(x)
=β (ρu,v − ρu,v qu,v ) + E
2  .
E(x)dμ(x)
u,v≤d

Using Gaussian integration by parts and (1.372) and (1.373) we then reach
that
 
u≤d xu Xu E(x)dμ(x)
 xu xv E(x)dμ(x)
E  =β qu,v E 
E(x)dμ(x) E(x)dμ(x)
u,v≤d
 
 xu E(x)dμ(x) xv E(x)dμ(x)
−β qu,v E 
( E(x)dμ(x))2
u,v≤d

=β (qu,v ρu,v − qu,v
2
),
u,v
1.12 The SK Model with d-component Spins 139

and this completes the proof of (1.393). 

Exercise 1.12.8. Find another proof of (1.393) using Proposition 1.3.2 as


in Lemma 1.12.6.

One should comment that the above method of taking the derivative in
β is rather similar in spirit to the method of (1.108); but unlike the proof of
(1.105) it does not use the “right path”, and as a penalty one would have to
work to get the correct rate of convergence K/N instead of obtaining it for
free.
Exercise 1.12.9. Write down a complete proof of Theorem 1.12.3 using in-
terpolation in the spirit of (1.108).
Research Problem 1.12.10. (Level 1) In this problem ν refers to the
Hamiltonian HN of (1.12). Consider a number λ and the following random
function on ΣN
1 
ϕ(σ) = log exp(λN R(σ, τ ) − HN (τ )) . (1.394)
N τ

Develop the tools to be able to compute (when β is small enough) the quantity
ν(ϕ(σ)). Compute also ν(ϕ(σ)2 ).
The relationship with the material of the present section is that by Jensen’s
inequality we have
1 
ϕ(σ) ≤ log exp(λN R(σ, τ ) − HN (σ) − HN (τ ))
N σ,τ
1 
− log exp(−HN (σ)) ,
N σ

and that the expected value of this quantity can be computed using (1.374)
by a suitable choice of μ in a 2-component spin model.
A possible solution to Problem 1.12.10 involves developing the cavity
method in a slightly different setting than we have done so far. Carrying
out the details should be a very good exercise for the truly interested reader.

Research Problem 1.12.11. (Level 2). With the notation above, is it true
that at any temperature for large N one has

ν(ϕ2 )  ν(ϕ)2 ? (1.395)


Quantities similar to those above are considered in physics, see e.g. [43].
The physicists find it natural to assert that the quantity (1.394) is “self-
averaging”, which means here that it is essentially independent of the disorder
and the value of σ (when weighted with the Gibbs measure), which is the
meaning of (1.395).
140 1. The Sherrington-Kirkpatrick Model

1.13 The Physicist’s Replica Method

Physicists have discovered their results about the SK model using the “replica
method”, a method that has certainly contributed to arouse the interest of
mathematicians in spin glasses. In this section, we largely follow the paper
[81], where the authors attempt as far as possible to make the replica method
rigorous. We start with the following, where we consider only the case of non-
random external field.
Theorem 1.13.1. Consider an integer n ≥ 1. Then
1 n
limN →∞ log E ZN (β, h) = n log 2 (1.396)
N 
nβ 2 n2 β 2 2 √
+ max (1 − q)2 − q + log E chn (βz q + h)
q 4 4

where z is standard normal.


We do not know if the arguments we will present extend to the case of
random external field, but (1.396) remains true in that case, and even if n ≥ 1
is not an integer. This is proved in [159]. The proof uses a fundamental prin-
ciple called the Ghirlanda-Guerra identities that we will present in Section
12.5 when we start to concentrate on low-temperature results. In some sense
this general argument is much more interesting than the specific arguments
of the present section, which, however beautiful, look more like tricks than
general principles.
To prove (1.396) we write
  β   
n
ZN = exp √  
gij σi σj + h σi .
σ ≤n
N i<j i≤N

Now we have
  2  2
   
E gij σi σj = σi σj
i<j ≤n i<j ≤n
  
= σi σi σj σj
, i<j
  2 
1  
= σi σi −N
2 
, i≤N
1  
= (nN 2 − n2 N ) + (σ  · σ  )2 , (1.397)
2
1≤< ≤n

so that combining (1.397) with (A.6) we get


1.13 The Physicist’s Replica Method 141
 
β2
n
E ZN = exp n(N − n)
4
  2   
β  2
× exp (σ · σ ) + h

σi , (1.398)
2N 
σ 1≤< ≤n ≤n,i≤N

where σ means that the summation is over (σ 1 , . . . , σ n ) in ΣN n
. Consider
2
now g = (g, )1≤< ≤n where (g, ) are i.i.d. Gaussian r.v.s with E g,  =

1/N . (Despite the similarity in notation these r.v.s play a very different rôle
than the interaction r.v.s (gij ).) It follows from (A.6) that
  2   
β 
exp (σ  · σ  )2 + h σi
2N
σ 1≤< ≤n ≤n,i≤N
    

=E exp β g, σ  · σ  + h σi
σ 1≤< ≤n ≤n,i≤N
    

=E exp β g, σi σi + h σi
σ i≤N 1≤< ≤n ≤n
  N
  
=E exp β g, ε ε + h ε
ε1 ,...,εn =±1 1≤< ≤n ≤n
= E exp N A(g) ,

where
    
A(g) = log exp β g, ε ε + h ε .
ε1 ,...,εn =±1 1≤< ≤n ≤n

Now,
 n(n−1)/4   
N 1
E exp N A(g) = exp N A(g) − 2
g,  dg
2π 2
1≤< ≤n

where the integral is taken with respect to Lebesgue’s measure dg on


Rn(n−1)/2 . Since |A(g)| ≤ K(g + 1), it is elementary to show that
 
1 1 2
lim log E exp N A(g) = max A(g) − g,
N →∞ N g 2  <

and from (1.398) we get


 
1 β2n 1 2
lim n
log E ZN = + max A(g) − g, . (1.399)
N →∞ N 4 g 2  <

Consider the function


142 1. The Sherrington-Kirkpatrick Model

1 
B(g) = A(g) − 2
g,  .
2
1≤< ≤n

We say that g is a maximizer of B if B attains its maximum at g. The


following is based on an idea of [81] (attributed to Elliot Lieb) and further
elaboration of this idea by D. Panchenko.

Proposition 1.13.2. a) If h > 0 and g is a maximizer of B, there exists a


number a ≥ 0 with g, = a for each 1 ≤  <  ≤ n.
b) If h = 0 and g is a maximizer of B, there exists a number a ≥ 0 and a
subset I of {1, . . . , n} such that g, = a if ,  ∈ I or ,  ∈
/ I and g, = −a
in the other cases.

Let us denote by a the sequence such that a, = a for 1 ≤  <  ≤ n. In


the case b), we have B(g) = B(a) as is shown by the transformation ε = ε
if  ∈ I and ε = −ε if  ∈
/ I. Therefore the maximizer cannot be unique for
symmetry reasons. In the case a), this symmetry is broken by the external
field.

Corollary 1.13.3. To compute maxg B(g) it suffices to maximize over the


sequences g where all coordinates are equal.

We start the proof of Proposition 1.13.2. The proof is pretty but is un-
related to any other argument in this work. It occupies the next two and a
half pages. The fun argument starts again on page 145.
Lemma 1.13.4. Consider numbers a1 , a2 , g. Then

cha1 cha2 chg + sha1 sha2 shg


≤ (ch2 a1 ch|g| + sh2 a1 sh|g|)1/2 (ch2 a2 ch|g| + sh2 a2 sh|g|)1/2 . (1.400)

Moreover, if there is equality in (1.400) and if g = 0, we have a1 = a2 if


g > 0 and a1 = −a2 if g < 0.

Proof. For numbers c1 , c2 , u ≥ 0 and s1 , s2 , v, we write, using the Cauchy-


Schwarz inequality in the second line,

c1 c2 u + s1 s2 v ≤ c1 c2 u + |s1 ||s2 ||v| (1.401)


≤ (c21 u + s21 |v|)1/2 (c22 u + s22 |v|)1/2 , (1.402)

and we use this for

cj = chaj ; u = chg ; sj = shaj ; v = shg . (1.403)

Then if g = 0 (so that |v| = |shg| = 0) there can be equality in (1.402) only
if for some λ we have
1.13 The Physicist’s Replica Method 143

(c1 , |s1 |) = λ(c2 , |s2 |)


i.e. we have |tha1 | = |tha2 | and |a1 | = |a2 |. If we moreover have equality in
(1.401) we have sha1 sha2 shg = s1 s2 v ≥ 0. The result follows. 


Lemma 1.13.5. Given g, consider the sequences g (resp. g ) obtained from


g by replacing g1,2 by |g1,2 | and g2, by g1, (resp. g1, by g2, ) for 3 ≤  ≤ n.
Now, if g is a maximizer, then both g and g are maximizers. Moreover if
g1,2 > 0 we have g1, = g2, for  ≥ 3, while if g1,2 < 0 we have g1, = −g2,
for  ≥ 3.

Proof. We will prove that


1 1
A(g) ≤ A(g ) + A(g ) . (1.404)
2 2
Since
  
2 1  
g,  = (g, )2 + (g, )2 ,
2
< 
<  <

this implies
1
B(g) ≤
(B(g ) + B(g )) (1.405)
2
so that both g and g are maximizers. Moreover, since g is a maximizer, we
have B(g) = B(g ) = B(g ) so in fact
1 1
A(g) = A(g ) + A(g ) . (1.406)
2 2
Let us introduce the notation

α = (ε1 , . . . , εn ) ; Aj (α) = β gj, ε + h for j = 1, 2
3≤≤n
   
w(α) = exp β g, ε ε + h ε .
3≤< ≤n 3≤≤n

Then, using (1.400) in the last line, we have


    
exp A(g) = exp β g, ε ε + h ε
ε1 ,...,εn =±1 1≤< ≤n 3≤≤n
 
= w(α) exp(A1 (α)ε1 + A2 (α)ε2 + βg1,2 ε1 ε2 )
α ε1 ,ε2 =±1

=4 w(α)(chA1 (α)chA2 (α)chβg1,2 + shA1 (α)shA2 (α)shβg1,2 )
α

≤4 w(α)B1 (α)1/2 B2 (α)1/2 (1.407)
α
144 1. The Sherrington-Kirkpatrick Model

where
Bj (α) = ch2 Aj (α)chβ|g1,2 | + sh2 Aj (α)shβ|g1,2 | .
The Cauchy-Schwarz inequality implies

   1/2   1/2
4 w(α)B1 (α) 1/2 1/2
B2 (α) ≤ 4 w(α)B1 (α) 4 w(α)B2 (α)
α α α

= (exp A(g ) exp A(g ))1/2 ,

where the equality follows from the computation performed in the first three
lines of (1.407). Combining with (1.407) proves (1.404).
In order to have (1.406) we must have equality in (1.407). Since each
quantity w(α) is > 0, for each α we must have

chA1 (α)chA2 (α)chβg1,2 + shA1 (α)shA2 (α)shβg1,2 = B1 (α)1/2 B2 (α)1/2 .

If g1,2 > 0 Lemma 1.13.4 shows that A1 (α) = A2 (α) for each α, and thus
g1, = g2, for each  ≥ 3. If g1,2 < 0 Lemma 1.13.4 shows that A1 (α) =
−A2 (α) for each α, so that (h = 0 and) g1, = −g2, for each  ≥ 3. 

Proof of Proposition 1.13.2. Consider a maximizer g. There is nothing
to prove if g = 0, so we assume that this is not the case. In a first step we
prove that |g, | does not depend on ,  . Assuming g2,3 = 0, we prove that
|g1,2 | = |g2,3 |; this clearly suffices. By Lemma 1.13.5, g is a maximizer, and
by definition g1,3 = g2,3 = 0. Since g1,3 = 0, and g is a maximizer, Lemma
1.13.5 shows that |g1, | = |g3, | for  ∈ {1, 3}, and in particular |g1,2 | = |g2,3 |,
i.e. |g1,2 | = |g2,3 |.
Next, consider a subset I ⊂ {1, . . . , n} with the property that

 <  , ,  ∈ I ⇒ g, > 0 .

If no such set exists, g, < 0 for each ,  and we are done. Otherwise consider
I as large as possible. Without loss of generality, assume that I = {1, . . . , m},
and note that m ≥ 1. If m = n we are done. Otherwise, consider first  > m.
We observe by Lemma 1.13.5 that if 1 < 2 ≤ m we have g1 , = g2 , , and
since we have assumed that I is as large as possible, we have g1 , < 0. Next
consider 1 < m <  <  . Then as we have just seen both g1 , and g1 ,
are < 0 so that Lemma 1.13.5 shows that g, > 0. Therefore, for a certain
number a ≥ 0 we have, for  < 

g, = a if  <  ≤ m or m <  < 


g, = −a if  ≤ m <  .

This proves b). To prove a) we observe that when h > 0 we have shown that
in fact g, ≥ 0 when g is a maximizer. 

1.13 The Physicist’s Replica Method 145

We go back to the main computation. By Corollary 1.13.3, in equation


(1.399) we can restrict the max to the case where for a certain number q we
have g, = βq for each 1 ≤  <  ≤ n. Then
    
exp β 2 q ε ε + h ε
ε1 ,...,εn =±1 < ≤n
  2
 2  
β q nβ 2 q
= exp ε − +h ε
ε1 ,...,εn =±1
2 2
≤n ≤n
   
√ nβ 2 q
= E exp (βz q + h) ε −
ε1 ,...,εn =±1
2
≤n
 
nβ 2 q √
= exp − E(2 ch(βz q + h))n ,
2
where z is a standard Gaussian r.v. and where the summations are over
ε1 , . . . , εn = ±1. Thus
 
1 2
max A(g) − g,
g 2
<
 
2
nβ q √ n(n − 1) 2 2
= max − + n log 2 + log E chn (βz q + h) − β q .
q 2 4
Combining this with (1.399) proves (1.396). 
It is a simple computation (using of course Gaussian integration by parts)
to see that the maximum in (1.396) is obtained for a value qn such that

E (chn Y th2 Y )
qn = (1.408)
E chn Y

where Y = βz qn + h.
Let us also observe that
1 t
lim log E ZN = E log ZN , (1.409)
t→0+ t
as follows from the fact that ZNt
 1 + t log ZN for small t.
Now we take a deep breath. We pretend that Theorem 1.13.1 is true not
only for n integer, but for any number n > 0. We rewrite (1.409) as
1 1 n
E log ZN = lim log E ZN . (1.410)
N n→0 N n

Let us moreover pretend that we can exchange the limits N → ∞ and n → 0.


Then presumably q = limn→0 qn exists, and (1.396) yields
1 β2
lim E log ZN = (1 − q)2 + log 2 + E log chY , (1.411)
N →∞ N 4
146 1. The Sherrington-Kirkpatrick Model

where Y = βz q + h and (1.408) becomes q = E th2 Y .
When trying to justify this procedure one is tempted to think about
analytic continuation. However the information contained in Theorem 1.13.1
about the large values of n seems to be completely irrelevant to the problem
at hand. To get convinced of this, one can consider the case where h = 0 and
β < 1; then it is not difficult to get convinced that limn→∞ qn = 1 (because
for n large only the large values of chY become relevant for the computation
of Echn Y , and for these values thY gets close to one) and this is hard to
relate to the fact that q = 0.
It is not difficult a posteriori to justify the previous method. The function
ψN (t) = N −1 log EZN t
is convex (by Hölder’s inequality) and for β small
enough its limit ψ(t) as N → ∞ exists and is differentiable at zero. (This
can be shown by generalizing (1.108) for any t = 0 using essentially the same
method). Therefore ψ (0) = limN →∞ ψN (0), which means exactly that the
exchange of the limits N → ∞ and n → 0 in (1.410) is justified; but of course
this has very limited interest since the computation of ψ(t) is not any easier
than that of the limit in (1.411).
Moreover the nice formula (1.411) is wrong for large β (low-temperature).
The book [105], following ground-breaking work of G. Parisi, attempts to ex-
plain how one should (from a physicist’s point of view) modify at low temper-
ature the computation (1.396) when n < 1. (This is particularly challenging
because the number of variables g, , which is n(n − 1)/2, is negative in
that case...) As a mathematician, the author does not feel qualified to try to
explain these ideas or even to comment on them.
Hundreds of papers have been written relying on the replica method; the
authors of these papers seem to have little doubt that this method always
gives the correct answer. Its proponents hope that at some point it will be
made rigorous. At the present time however it is difficult, at least for this
author, to see in it more than a way to guess the correct formulas. Certainly
the predictive power of the method is impressive. The future will tell whether
this is the case because its experts are guided by a set of intuitions that is
correct at a still deeper level, or whether the power comes from the method
itself.

1.14 Notes and Comments


The SK model has a rather interesting history. The paper of Sherrington and
Kirkpatrick [136] that introduces the model is called “Solvable model of a spin
glass”. The authors felt that limN →∞ pN (β, h) = SK(β, h) for all values of β
and h. They however already noticed that something must be wrong with this
result, and this was confirmed soon after [5]. Whatever one may think of the
methods of this branch of theoretical physics (and I do not really know what
I think myself about them), their reliability is not guaranteed. One can find
a description of these methods in the book of Mézard, Parisi, Virasoro [105],
1.14 Notes and Comments 147

but the only part of the book I feel I understand is the introduction (on which
Section 1.1 relies heavily). Two later (possibly more accessible) books about
spin glasses written by physicists are [59] and [112]. The recent book by M.
Mézard and A. Montanari [102] is much more accessible to a mathematically
minded reader. It covers a wide range of topics, and remarkably succeeds at
conveying the breath and depth of the physical ideas.
The first rigorous results on the SK model concern only the case h = 0.
They are proved by Aizenman, Lebowitz and Ruelle in [4] using a “cluster
expansion technique”, which is a common tool in physics. Their methods
seem to apply only to the case h = 0. At about the same time, Fröhlich and
Zegarlinski [61] prove (as a consequence of a more general approach that is
also based on a cluster expansion technique) that the spin correlations vanish
if β ≤ β0 , even if h = 0. In fact they prove that
L
E( σ1 σ2 − σ1 σ2 )2 ≤ . (1.412)
N
A later paper by Comets and Neveu [49] provides more elegant proofs
of several of the main results of [4] using stochastic calculus. Their method
unfortunately does not appear to extend beyond the case h = 0. They prove
a central limit theorem for the overlap R1,2 .
Theorem 1.3.4 is a special occurrence of the general phenomenon of con-
centration of measure. This phenomenon was first discovered by P. Lévy, and
its importance was brought to light largely through the efforts of V.D. Mil-
man [106]. It is arguably one of the truly great ideas of probability theory.
More references, and applications to probability theory can be found in [139]
and [140]. In the fundamental case of Gaussian measure, the optimal result
is already obtained in [86], and Theorem 1.3.4 is a weak consequence of this
result. Interestingly, it took almost 20 years after the paper [86] before re-
sults similar to (1.54) were obtained in the theory of disordered systems, by
Pastur and Shcherbina [120], using martingale difference sequences. A very
nice exposition of most of what is known about concentration of measure can
be found in the book of M. Ledoux [93]
It was not immediately understood that, while the case β < 1, h = 0
of the SK model is not very difficult, the case h = 0 is an entirely different
matter. The first rigorous attempt at justifying the mysterious expression
in the right-hand side of (1.73) is apparently that of Pastur and Shcherbina
[120]. They prove that this formula holds in the domain where

lim Var ( R1,2 ) = lim E( R1,2 − E R1,2 )2 = 0 , (1.413)


N →∞ N →∞

but they do not prove that (1.413) is true for small β. Their proof required
them to add a strange perturbation term to the Hamiltonian. The result was
later clarified by Shcherbina [127], who used the Hamiltonian (1.61) with
hi Gaussian. Using arguments somewhat similar to those of the Ghirlanda-
Guerra identities, (which we will study in Volume II) she proved that (1.413)
148 1. The Sherrington-Kirkpatrick Model

is equivalent (over a certain domain) to

lim E (R1,2 − R1,2 )2 = 0 . (1.414)


N →∞

She did not prove (1.414). She was apparently unaware that (1.414) is proved
in [61] for small β. Since the paper [127] was not published, I was not aware
of it and rediscovered its results in Section 4 of [141] with essentially the same
proof. I also gave a very simple proof of (1.412) for small β. Discovering this
simple proof was an absolute disaster, because I wasted considerable energy
trying to use the same principle in other situations, which invariably led to
difficult proofs of suboptimal results. I will not describe in detail the contents
of [141] or my other papers because this now does not seem so interesting
any more. I hope that the proofs presented here are much cleaner than those
of these previous papers.
In a later paper Shcherbina [128] proved that limN →∞ pN (β, h) = SK(β, h)
in a remarkably large region containing in particular all values β < 1. The
ideas of this paper are not really transparent to me. A later version [129]
is more accessible, but I became aware of its existence too late to have the
energy to analyze it. It would be interesting to decide if this approach suc-
ceeds because of a special trick, or if it contains the germ of a powerful
method. One should however point out that her use of relations similar to
the Ghirlanda-Guerra identities seems to preclude obtaining the correct rates
of convergence.
I proved in [149] an expansion somewhat similar to (1.151), using a more
complicated method that does not seem to extend to the model to be consid-
ered in Chapter 2. This paper proves weaker versions of many of the results
of Section 1.6 and Section 1.8 to Section 1.10. The existence of the limits of
quantities such as N k/2 E A , where A is the product of k terms of the type
R, is proved by a recursion method very similar to the one used here, but
the limit is not computed explicitly.
I do not know who first used the “smart path method”. The proof of
Proposition 1.3.3 is due to J.P. Kahane [87] and that of Theorem 1.3.4 is
due to G. Pisier [124]. I had known these papers since they appeared, but
it took a very, very long time to realize that it was the route to take in the
cavity method. The smart path method was first used in this context in [147],
and then systematically in [158]. Interestingly, Guerra and Toninelli arrived
independently at the very similar idea of interpolating between Hamiltonians
as in Section 1.3. Proposition 1.3.2 must have been known for a very long
time, at least as far back as [137].
The reader might wonder about the purpose of (1.152), since we nearly
always use (1.151) instead. One use is that, using symmetry between sites, we
can get a nice expression for ν1 (f ). This idea will be used in Volume II. We
do not use it here, because, besides
 controlling the quantities R1,2 , it requires
controlling R1,2,3,4 = N −1 i≤N σi1 σi2 σi3 σi4 . To give a specific example, if
f = R1,2 − q, we get from (1.152) that
1.14 Notes and Comments 149

ν1 ((ε1 ε2 − q)f ) = β 2 ν((1 − ε1 ε2 q)(R1,2 − q)f )


− 4β 2 ν((ε2 ε3 − qε1 ε3 )(R1,3 − q)f )
+ 3β 2 ν((ε1 ε2 ε3 ε4 − qε3 ε4 )(R3,4 − q)f )
= β 2 ν((1 − R1,2 q)(R1,2 − q)f )
− 4β 2 ν((R2,3 − qR1,3 )(R1,3 − q)f )
+ 3β 2 ν((R1,2,3,4 − qR3,4 )(R3,4 − q)f ) .
√ √
If we know that ν(|R1,2 − q|3 )1/3 ≤ L/ N and ν(|R1,2,3,4 − q|3 )1/3 ≤ L/ N ,
we get

ν1 ((ε1 ε2 − q)f ) = β 2 (1 − q 2 )ν((R1,2 − q)2 )


− 4β 2 (q − q 2 )ν((R1,2 − q)(R2,3 − q))
q − q 2 )ν((R1,2 − q)(R3,4 − q)) + O(3)
+ 3β 2 (

a relation that we may combine with

ν((R1,2 − q)2 ) = ν((ε1 ε2 − q)f ) = ν0 ((ε1 ε2 − q)f ) + ν1 ((ε1 ε2 − q)f ) + O(3) .

In this way we have fewer error terms to control in the course of proving the
central limit theorems presented here. The drawback is that one must prove
first that ν((R1,2,3,4 − q)2n ) ≤ K/N n (which is not very difficult).
Two months after the present Chapter was widely circulated at the time
of [157] (in a version that already contained the central limit theorems of
Section 1.10), the paper [74] came out, offering very similar results, together
with a CLT for N −1 log ZN (β, h), of which Theorem 1.4.11 is a quantitative
improvement.
I am grateful to M. Mézard for having explained to me the idea of coupling
two copies of the SK model, and the discontinuity this should produce beyond
the A-T line. This led to Theorem 1.9.6.
Guerra’s bound of (1.73) is proved in [71] where Proposition 1.3.8 can
also be found. (This lemma was also proved independently by R. Latala in
an unpublished paper.)
The present work should make self-apparent the amount of energy already
spent in trying to reach a mathematical understanding of mean field models
related to spin glasses. It is unfortunate that some of the most precise results
about the SK model rely on very specific properties of this model. However
fascinating, the SK model is a rather specific object, and as such its impor-
tance can be questioned. I feel that the appeal of the “theory” of spin glasses
does not lie in any particular model, but rather in the apparent generality
of the phenomenon it predicts. About this, we still understand very little,
despite all the examples that will be given in forthcoming chapters.
2. The Perceptron Model

2.1 Introduction
The name of this chapter comes from the theory of neural networks. An ac-
cessible introduction to neural networks is provided in [83], but what these
are is not relevant to our purpose, which is to study the underlying mathe-
matics. Roughly speaking, the basic problem is as follows. What “propor-
tion” of ΣN = {−1, 1}N is left when one intersects this set with many
random half-spaces? A natural definition for a random half-space is a set
{x ∈ RN ; x · v ≥ 0} where the random vector v is uniform over the unit
sphere of RN . More conveniently one can consider the set {x ∈ RN ; x·g ≥ 0},
where g is a standard Gaussian vector, i.e. g = (gi )i≤N , where gi are indepen-
dent standard Gaussian r.v.s. This is equivalent because the vector g/g is
uniformly distributed on the unit sphere of RN . Consider now M such Gaus-
sian vectors gk = (gi,k )i≤N , k ≤ M , all independent, the half-spaces
  
Uk = {x ; x · gk ≥ 0} = x , gi,k xi ≥ 0 ,
i≤N

and the set 


ΣN ∩ Uk . (2.1)
k≤M

A given point of ΣN has exactly a 50% chance to belong to Uk , so that


  
E card ΣN ∩ Uk = 2N −M . (2.2)
k≤M

The case of interest is when N becomes large and M is proportional to N ,


M/N → α > 0. A consequence of (2.2) is that if α > 1 the set (2.1) is typically
empty when N is large, because the expected value of its cardinality is  1.
When α < 1, what is interesting is not however the expected value (2.2) of
the cardinality of the set (2.1), but rather the typical value of this cardinality,
which is likely to be smaller. Our ultimate goal is the computation of this
typical value, which we will achieve only for α small enough.
A similar problem was considered √ in (0.2) where ΣN is replaced by the
sphere SN of center 0 and radius N . The situation with ΣN is usually
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 151
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 2, © Springer-Verlag Berlin Heidelberg 2011
152 2. The Perceptron Model

called the binary perceptron, while the situation with SN is usually called the
spherical perceptron. The spherical perceptron will motivate the next chapter.
We will return to both the binary and the spherical perceptron in Volume II,
in Chapter 8 and Chapter 9 respectively. Both the spherical and the binary
perceptron admit another popular version, where the Gaussian r.v.s gi,j are
replaced by independent Bernoulli r.v.s (i.e. independent random signs), and
we will also study these. Thus we will eventually investigate a total of four
related but different models. It is not very difficult to replace the Gaussian
r.v.s by random signs; but it is very much harder to study the case of ΣN
than the case of the sphere.

Research Problem 2.1.1. (Level 3!) Prove that there exists a number α∗
and a function ϕ : [0, α∗ ) → R with the following properties:
1 - If α > α∗ , then as N → ∞ and M/N → α the probability that the set
(2.1) is not empty is at most exp(−N/K(α)).
2 - If α < α∗ , N → ∞ and M/N → α, then
  
1
log card ΣN ∩ Uk → ϕ(α) (2.3)
N
k≤M

in probability. Compute α∗ and ϕ.

This problem is a typical example of a situation where one expects “reg-


ularity” as N → ∞, but where it is unclear how to even start doing anything
relevant. In Volume II, we will prove (2.3) when α is small enough, and we
will compute ϕ(α) in that case. (We expect that the case of larger α is much
more difficult.) As a corollary, we will prove that there exists a number α0 < 1
such that if M = αN , α > α0 , then the set (2.1) is typically empty for N
large, despite the fact that the expected value of its cardinality is 2N −M  1.
One way to approach the (very difficult) problem mentioned above is
to introduce a version “with a temperature”. We observe that if x ≥ 0 we
havelimβ→∞ exp(−βx) = 0 if x > 0 and = 1 if x = 0. Using this for
x = k≤M 1{σ∈U / k } where σ ∈ ΣN implies
      
card ΣN ∩ Uk = lim exp −β 1{σ∈U
/ k} , (2.4)
β→∞
k≤M σ∈ΣN k≤M

so that to study (2.3) it should be relevant to use the Hamiltonian



− HN,M (σ) = −β 1{σ∈U
/ k} . (2.5)
k≤M

If one can compute the corresponding partition function (and succeed in


exchanging the limits N → ∞ and β → ∞), one will then prove (2.3).
More generally, we will consider Hamiltonians of the type
2.1 Introduction 153
  1  
− HN,M (σ) = u √ gi,k σi , (2.6)
k≤M
N i≤N

where u is a function, and where (gi,k ) are independent standard normal r.v.s.
Of course the Hamiltonian depends on u, but the dependence is kept  implicit.
The role of the factor N −1/2 is to make the quantity N −1/2 i≤N gi,k σi
typically of order 1. There is no parameter β in the right-hand side of (2.6),
since this parameter can be thought of as being included in the function u.
Since it is difficult to prove anything at all without using integration
by parts we will always assume that u is differentiable. But if we want the
Hamiltonian (2.6) to be a fair approximation of the Hamiltonian (2.5), we will
have to accept that u takes very large values. Then, in the formulas where
u occurs, we will have to show that somehow these large values cancel out.
There is no magic way to do this, one has to work hard and prove delicate
estimates (as we will do in Chapter 8). Another source of difficulty is that we
want to approximate the Hamiltonian (2.5) for large values of β. That makes
it difficult to bound from below a number of quantities that occur naturally
as denominators in our computations.
On the other hand, there is a kind of beautiful “algebraic” structure
connected to the Hamiltonian (2.6), which is uncorrelated to the analytical
problems described above. We feel that it is appropriate, in a first stage,
to bring this structure forward, and to set aside the analytical problems (to
which we will return later). Thus, in this chapter we will assume a very strong
condition on u, namely that for a certain constant D we have

∀ , 0 ≤  ≤ 3 , |u() | ≤ D . (2.7)

Given values of N and M we will try to “describe the system generated by the
Hamiltonian (2.6)” within error terms that become small for N large. We will
be able to do this when the ratio α = M/N is small enough, α ≤ α(D). The
notation α = M/N will be used through this chapter and until Chapter 4.
Let us now try to give an overview of what will
 happen,  without getting
into details. We recall the notation R, = N −1 i≤N σi σi . As is the case
for the SK model, we expect that in the high-temperature regime we have

R1,2  q (2.8)

for a certain number q depending on the system. Let us define


1  1 
Sk = √ gi,k σi ; Sk = √ gi,k σi . (2.9)
N i≤N N i≤N

After one works some length of time with the system, one gets the irresistible
feeling that (in the high-temperature regime) “the quantities Sk behave like
individual spins”, and (2.8) has to be complemented by the relation
154 2. The Perceptron Model

1 
u (Sk1 )u (Sk2 )  r (2.10)
N
k≤M

where r is another number attached to the system. Probably the reader would
expect a normalization factor M rather than N in (2.10), but since we should
think of M/N as M/N → α > 0, this is really the same. Also, the occurrence
of u will soon become clear.
We will use the cavity method twice. In Section 2.2 we “remove one spin”
as in Chapter 1. This lets us guess what is the correct expression of q as
a function of r. In Section 2.3, we then use the “cavity in M ”, comparing
the system with the similar system where M has been replaced by M − 1.
This lets us guess what the expression of r should be as a function of q. The
two relations between r and q that are obtained in this manner are called
the “replica-symmetric equations” in physics. We prove in Section 2.4 that
these equations do have a solution, and that (2.8) and (2.10) hold for these
values of q and r. For N large and M/N small, we will then (approximately)
compute the value of
1 
pN,M (u) = E log exp(−HM,N (σ)) , (2.11)
N σ

(for the Hamiltonian defined by (2.6)) by an interpolation method motivated


by the idea that the quantities Sk “behave like individual spins”.

2.2 The Smart Path


It would certainly help to understand how the Hamiltonian (2.6) depends on
the last spin. Let us write
1 
Sk0 = √ gi,k σi ,
N i≤N −1

so that Sk = Sk0 + N −1/2 gN,k σN and if u is differentiable,


   gN,k σ 2  gN,k2
u(Sk ) = u(Sk0 ) + σN √ u (Sk0 ) + N u (Sk0 ) + · · ·
N 2 N
k≤M k≤M k≤M k≤M
(2.12)
The terms · · · are of lower order. We observe that σN
2
= 1. (This will no longer
be the case in Chapter 3, when we will consider spins taking all possible
2
values, so that σN will no longer be constant.) We also observe that the r.v.s
gN,k are independent. So it is reasonable according to the law of large numbers
to expect that the third term on the right-hand side should behave like a
constant and not influence the Hamiltonian. By the central limit theorem, one
should expect the second term on the right-hand side of (2.12) to behave like
2.2 The Smart Path 155

σN Y , where Y is a Gaussian r.v. independent of all the other r.v.s (Of course
at some point we will have to guess what is the right choice for r = EY 2 , but
the time will come when this guess will be obvious.) Thus we expect that
 
u(Sk )  u(Sk0 ) + σN Y + constant . (2.13)
k≤M k≤M

Rather than using power expansions (which are impractical when we do not
have a good control on higher derivatives) it is more fruitful to find a suitable
interpolation between the left and the right-hand sides of (2.13). The first idea
that comes to mind is to use the Hamiltonian

  t


u Sk +0
gN,k σN + σN 1 − tY . (2.14)
N
k≤M

This is effective and was used in [157]. However, the variance of the Gaussian
r.v. Sk0 + t/N gN,k σN depends on t; when differentiating, this creates terms
that we will avoid by being more clever. Let us consider the quantity
 
0 t 1−t
Sk,t = Sk,t (σ, ξk ) = Sk + gN,k σN + ξk
N N
 
1  t 1−t
= √ gi,k σi + gN,k σN + ξk . (2.15)
N N N
i<N

In this expression, we should think of (ξk )k≤M not just as random constants
ensuring that the variance of Sk,t is constant but also as “new spins”. That
is, let ξ = (ξk )k≤M ∈ RM , and consider the Hamiltonian
 √
− HN,M,t (σ, ξ) = u(Sk,t ) + σN 1 − tY . (2.16)
k≤M

The configurations are now points (σ, ξ) in ΣN × RM . Let us denote by γ the


canonical Gaussian measure on RM . We define Gibbs’ measure on ΣN × RM
by the formula
1 
f t = f (σ, ξ) exp(−HN,M,t (σ, ξ))dγ(ξ) ,
Z σ

where f is a function on ΣN × RM and where Z is the normalizing factor,



Z= exp(−HN,M,t (σ, ξ))dγ(ξ) .
σ

More generally for a function f on (ΣN × RM )n = ΣN


n
× RM n , we define
156 2. The Perceptron Model

1 
f t = ··· f (σ 1 , . . . , σ n , ξ 1 , . . . , ξ n )
Zn
σ 1 ,...,σ n
  
× exp − HN,M,t dγ(ξ 1 ) · · · dγ(ξ n ) ,

(2.17)
≤n

where Z is as above and



HN,M,t = HN,M,t (σ  , ξ  ) . (2.18)

Integration of ξ with respect to γ means simply that we think of (ξk )k≤M


as independent Gaussian r.v.s and we take expectation. We recall the con-
vention that Eξ denotes expectation with respect to all r.v.s labeled
ξ (be it with subscripts or superscripts). We thus rewrite (2.17) as
   
1
f t = n Eξ f (σ , . . . , σ , ξ , . . . , ξ ) exp −
1 n 1 n 
HN,M,t ; (2.19)
Z 1 n
σ ,...,σ ≤n

Z = Eξ exp(−HN,M,t (σ, ξ)) .
σ

In these formulas, ξ  = (ξk )k≤M , ξk are independent Gaussian r.v.s. One
should think of ξ  as being a “replica” of ξ. In this setting, replicas are
simply independent copies.
Exercise 2.2.1. Prove that when f depends on σ 1 , . . . , σ n , but not on
ξ 1 , . . . , ξ n , then f t in (2.19) is exactly the average of f with respect to
the Hamiltonian
  
 1  t √
−H = ut √ gi,k σi + gN,k σN + σN 1 − tY ,
N i≤N −1 N
k≤M

where ut is defined by
  
1−t
exp ut (x) = E exp u x + ξ , (2.20)
N
for ξ a standard normal r.v.

The reader might wonder whether it is really worth the effort to introduce
this present setting simply in order to avoid an extra term in Proposition 2.2.3
below, a term with which it is not so difficult to deal anyway. The point is
that the mechanism of “introducing new spins” is fundamental and must be
used in Section 2.3, so we might as well learn it now.
Consistently with our notation, if f is a function on ΣNn
× RM n , we define
d
νt (f ) = E f t ; νt (f ) = νt (f ) , (2.21)
dt
2.2 The Smart Path 157

where f t is given by (2.19).


We also write ν(f ) = ν1 (f ). When f does not depend on the r.v.s ξ  , then
ν(f ) = E f , where · refers to Gibbs’ measure with Hamiltonian (2.6). As

in Chapter 1, we write ε = σN , and we recall the r.v. Y of (2.16).

Lemma 2.2.2. Given a function f − on ΣNn


−1 , and a subset I of {1, . . . , n},
we have
    
 
ν0 f − ε = E (thY )cardI ν0 (f − ) = ν0 ε ν0 (f − ) .
∈I ∈I

This lemma holds whatever the value of r = EY 2 . The proof is identical to


that of Lemma 1.6.2. The Hamiltonian HN,M,0 decouples the last spin from
the first N − 1 spins, which is what it is designed to do.
We now turn to the computation of νt (f ). Throughout the chapter, we
write α = M/N . Implicitly, we think of N and M as being large but fixed.
The model then depends on the parameters N and α (and of course of u). We
recall the definition (2.15) of Sk,t , and consistently with the notation (2.18)
we write  
1  t 1−t 
Sk,t = √
 
gi,k σi + gN,k ε + ξ . (2.22)
N i<N N N k

Proposition 2.2.3. Assume that u is twice differentiable and let r = EY 2 .


n
Then for a function f on ΣN , we have

νt (f ) = I + II (2.23)

  
 
I=α νt ε ε u (SM,t )u (SM,t )f
1≤< ≤n
  
− αn 
νt ε εn+1 u (SM,t n+1
)u (SM,t )f
≤n
n(n + 1)  n+1 n+2

+α νt εn+1 εn+2 u (SM,t )u (SM,t )f . (2.24)
2

  
II = −r νt (ε ε f ) − n νt (ε εn+1 f )
1≤< ≤n ≤n

n(n + 1)
+ νt (εn+1 εn+2 f ) . (2.25)
2

The proposition resembles Lemma 1.6.3, so it should not be so scary


anymore. As in Lemma 1.6.3, the complication is algebraic, and each of the
terms I and II is made up of simple pieces. Moreover both terms have similar
structures. This formula will turn out to be much easier to use than one might
158 2. The Perceptron Model

think at first. In particular one should observe that by symmetry, and since
 
α = M/N , in the expression for I we can replace the term αu (SM,t )u (SM,t )
by
1   
u (Sk,t )u (Sk,t ),
N
k≤M

so that if (2.10) is indeed correct, the terms I and II should have a good will
to cancel each other out.
Proof. We could make this computation appear as a consequence of (1.40),
but for the rest of the book we will change policy, and proceed directly, i.e.
we write the value of the derivative and we integrate by parts. It is immediate
from (2.19) that

d  d d
f t = 
(−HN,M,t )f −n n+1
(−HN,M,t )f , (2.26)
dt dt t dt t
≤n

and, writing gk for gN,k ,

d  1  gk ε ξk

ε Y

(−HN,M,t ) = √ √ − √ 
u (Sk,t )− √ . (2.27)
dt
k≤M
2 N t 1 − t 2 1−t

We observe the symmetry for k ≤ M . All √the values


√ of k bring the same
contribution. There are M of them, and M/ N = α N , so that

νt (f ) = III + IV + V

  
α N     
III = νt gM ε u 
(SM,t )f − nνt gM εn+1 u (SM,t )f
n+1
(2.28)
2 t
≤n
  
N
α     n+1 
IV = − νt ξM u (SM,t )f − nνt ξM u (SM,t )f
  n+1
1−t
2
≤n
 
1 1 
V=− √ νt (ε Y f ) − nνt (εn+1 Y f ) .
2 1−t
≤n

It remains to integrate by parts in these formulas to get the result. The easiest
case is that of the term IV, because “different replicas use independent copies
 
of ξ”. We write the explicit formula for ξM u (SM,t )f t , that is
 
ξM u (SM,t )f t
   
1
= n Eξ ξM
u 
(SM,t )f (σ 1 , . . . , σ n ) exp − 
HM,N,t ,
Z 1 σ ,...,σ n ≤n
2.2 The Smart Path 159

and we see that we only have to integrate by parts in the numerator. The
  
dependence on ξM is through u (SM,t ) and through the term u(SM,t ) in the
Hamiltonian and moreover


∂SM,t 1−t

= , (2.29)
∂ξM N
so that

  1 − t 

ξM u (SM,t )f t = (u (SM,t ) + u 2 (SM,t

))f t ,
N
and therefore
 
α     
IV = − 
νt ((u (SM,t )+u 2 (SM,t

))f −nνt (u (SM,t
n+1 n+1
)+u 2 (SM,t ))f .
2
≤n

The second easiest case is that of V, because we have done the same com-
putation (implicitly at least) in Chapter 1; since EY 2 = r, we have V = II.
Of course, the reader who does not find this formula obvious should simply
write
νt (ε Y f ) = EY ε f t ,
and carry out the integration by parts, writing the explicit formula for ε f t .
To compute the term III, there is no miracle. We write
 
νt (gM ε u (SM,t )f ) = EgM ε u (SM,t )f t

and we use the integration by parts formula E(gM F (gM )) = EF (gM ) when

seeing ε u (SM,t )f t as a function of gM . The dependence on gM is through

the quantities SM,v , and


∂SM,v t
= ε .
∂gM N

Writing the (cumbersome) explicit formula for ε u (SM,t )f t , we get that
 
∂  t 
ε u (SM,t )f t = u (SM,t )f t
∂gM N
 

+ 
ε ε u (SM,t )u (SM,t )f t − n ε εn+1 u (SM,t
 n+1
)u (SM,t )f t .
 ≤n


The first term arises from the dependence of the factor u (SM,t ) on gM and
the other terms from the dependence of the Hamiltonian on gM . Consequently
we obtain
 
 t 
νt (ε u (SM,t )f ) = νt (u (SM,t )f )
N
 

+ νt (ε ε u (SM,t )u (SM,t )f ) − nνt (ε εn+1 u (SM,t )u (SM,t )f ) .
  n+1

 ≤n
160 2. The Perceptron Model

Similarly we have

∂ n+1 t n+1
εn+1 u (SM,t )f t = u (SM,t )f t
∂gM N
  
 n+1
+ ε εn+1 u (SM,t )u (SM,t )f t
 ≤n+1

 
− (n + 1) εn+1 εn+2 u (SM,t
n+1 n+2
)u (SM,t )f t ,

and consequently

n+1 t n+1
νt (εn+1 u (SM,t )f ) = νt (u (SM,t )f )
N

 n+1
+ νt (ε εn+1 u (SM,t )u (SM,t )f )
 ≤n+1

− (n + 1)νt (εn+1 εn+2 u (SM,t
n+1 n+2
)u (SM,t )f ) .

Regrouping the terms, we see that III + IV = I. 




Exercise 2.2.4. Suppose that we had not been as sleek as we were, and that
instead of (2.15) and (2.22) we had defined
 
t 1  t
0
Sk,t = Sk,t (σ) = Sk + gN,k σN = √ gi,k σi + gN,k σN
N N N
i<N

and 
1  t

Sk,t =√ gi,k σi + 
gN,k σN .
N i<N N

Prove that then in the formula (2.23) we would get the extra term

 
α       
VI = 
νt u (SM,t )2 +u (SM,t

) f −nνt u (SM,t
n+1 2 n+1
) +u (SM,t ) f .
2
≤n

2.3 Cavity in M

To pursue the idea that the terms I and II in (2.23) should nearly cancel out
each other, the first thing to do is to try to make sense of the term I, and to
 
understand the influence of the quantities u (SM,t ). The quantities SM,t also
occur in the Hamiltonian, and we should make this dependence explicit. For
this we introduce a new Hamiltonian
2.3 Cavity in M 161
 √
− HN,M −1,t (σ, ξ) = u(Sk,t (σ, ξk )) + σN 1 − tY , (2.30)
k≤M −1

where the dependence on ξ is stressed to point out that it will be handled as


in the case of the Hamiltonian (2.16), that is, an average · t,∼ with respect
to this Hamiltonian will be computed with the formula (2.31) below. Let us
first notice that, even though the right-hand side of (2.30) does not depend
on ξM , we denote for simplicity of notation the Hamiltonian as a function of
σ and ξ. If f is a function on ΣN n
× RM n , we then define
   
1
f t,∼ = n Eξ f (σ , . . . , σ , ξ , . . . , ξ ) exp −
1 n 1 n 
HN,M −1,t ,
Z∼ 1 n
σ ,...,σ ≤n
(2.31)
where 
Z∼ = Eξ exp(−HN,M −1,t (σ, ξ)) ,
σ
  
and where HN,M −1,t = HN,M −1,t (σ , ξ ). There of course Eξ includes ex-

pectation in the r.v.s ξM , even though the Hamiltonian does not depend on
those. Since −HN,M,t

= −HN,M
 
−1,t + u(SM,t ), the identity
 
1 1 1
Z = Eξ exp(−HN,M,t ) = Eξ exp u(SM,t ) exp(−HN,M −1,t )
σ σ
1
= Z∼ exp u(SM,t ) t,∼

holds, and, similarly,


   
Eξ f (σ , . . . , σ , ξ , . . . , ξ ) exp −
1 n 1 n 
HN,M,t
σ 1 ,...,σ n ≤n

n 
= Z∼ f exp u(SM,t ) .
≤n t,∼

Combining these two formulas with (2.31) yields that if f is a function


n
on ΣN × RM n , we have
  

f exp ≤n u(SM,t ) t,∼
f t= 1 ) n . (2.32)
exp u(SM,t t,∼


Our best guess now is that the quantities SM,t , when seen as functions
of the system with Hamiltonian (2.30), will have a jointly Gaussian behavior
under Gibbs’ measure, with pairwise correlation q, allowing us to approx-
imately compute the right-hand side of (2.32) in Proposition 2.3.5 below.
This again will be shown by interpolation. Let us consider a new parameter
0 ≤ q ≤ 1 and standard Gaussian r.v.s (ξ  ) and z that are independent of all
162 2. The Perceptron Model

the other r.v.s already considered. (The reader will not confuse the r.v.s ξ 

with the r.v.s ξM .) Let us set

θ = z q + ξ  1−q . (2.33)

Thus these r.v.s share the common randomness z and are independent given
that randomness. For 0 ≤ v ≤ 1 we define
√  √
Sv = vSM,t + 1 − vθ . (2.34)

The dependence on t is kept implicit; when using Sv we think of t (and M )


as being fixed.
Let us pursue the idea that in (2.31), Eξ denotes expectation in all
r.v.s labeled ξ including the variables ξ  and let us further define with this
convention   

f exp ≤n u(Sv ) t,∼
νt,v (f ) = E . (2.35)
exp u(Sv1 ) nt,∼
Using (2.32) yields
νt,1 (f ) = νt (f ) .
The idea of (2.35) is of course that in certain cases νt,0 (f ) should be much
easier to evaluate than νt (f ) = νt,1 (f ) and that these quantities should be
close to each other if q is appropriately chosen. Before we go into the details
however, we would like to explain the pretty idea that is hidden behind this
construction. The idea is simply that we consider ξ “as a new spin”. To
explain this, consider a spin system where the space of configurations is the
collection of all triplets (σ, ξ, ξ) for σ ∈ ΣN , ξ ∈ RM and ξ ∈ R. Consider
the Hamiltonian

−H(σ, ξ, ξ) = −HN,M −1,t (σ, ξ) + u(Sv ) ,


√ √ √ √
where Sv = vSM,t + 1 − vθ, for θ = z q + 1 − qξ. Then, for a function
f of σ 1 , . . . , σ n , ξ 1 , . . . , ξ n and ξ 1 , . . . , ξ n we can define a quantity f t,v by
a formula similar to (2.19) and (2.31). As in (2.32), we have
  

f exp ≤n u(Sv ) t,∼
f t,v = ,
exp u(Sv1 ) nt,∼

so that in fact νt,v = E · t,v . Let us observe that the r.v. θ depends also
on z, but this r.v. is not considered as a “new spin”, but rather as “new
randomness”.
The present idea of considering ξ as a new spin is essential. As we men-
tioned on page 156, the idea of considering ξ1 , . . . , ξM as new spins was not
essential, but since it is the same idea, we decided to make the minimal extra
effort to use the setting of (2.19).
First, we reveal the magic of the computation of νt,0 .
2.3 Cavity in M 163

Lemma 2.3.1. Consider 0 ≤ q ≤ 1 and define


 2
Eξ u (θ) exp u(θ)
r = E , (2.36)
Eξ exp u(θ)
√ √
where θ = z q + ξ 1 − q for independent standard Gaussian r.v.s z and ξ
n
and where Eξ denotes expectation in ξ only. Consider a function f on ΣN .
This function might depend on the variables ξk for k < M and  ≤ n, but it


does not depend on the randomness of the variables ξM or ξ  . Then

νt,0 (f ) = E f t,∼ , (2.37)

and
νt,0 (u (S01 )u (S02 )f ) = rE f t,∼ . (2.38)

In particular we have νt,0 (u (S01 )u (S02 )f ) = rνt,0 (f ). If such an equality is


nearly true for v = 1 rather than for v = 0, we are in good shape to use
Proposition 2.2.3.
Proof. First we have
   
f exp u(θ ) = f t,∼ Eξ exp u(θ ) . (2.39)
t,∼
≤n ≤n

This follows from the formula (2.31). The quantities θ do not depend on the
spins σ, and their randomness “in the variables labeled ξ” is independent of
the randomness of the other terms. Now, independence implies

Eξ exp u(θ ) = (Eξ exp u(θ))n .
≤n

Moreover exp u(θ) t,∼ = Eξ exp u(θ), as (an obvious) special case of
(2.39). This proves (2.37).
To prove (2.38), proceeding in a similar manner and using now that
   2  n−2
Eξ u (θ1 )u (θ2 ) exp u(θ ) = Eξ u (θ) exp u(θ) Eξ exp u(θ) ,
≤n

we get
  
f u (θ1 )u (θ2 ) exp ≤n u(θ ) t,∼
νt,0 (u (S01 )u (S02 )f ) =E n
exp u(θ) t,∼
= rE f t,∼ ,

and this finishes the proof. 



We now turn to the proof that νt,0 and νt,1 are close. We recall that D is
the constant of (2.7).
164 2. The Perceptron Model

n
Lemma 2.3.2. Consider a function f on ΣN . This function depend on the
variables ξk for k < M and  ≤ n, but it does not depend on the randomness


of the variables z, gi,M , ξM or ξ  . Then if Bv ≡ 1 or Bv = u (Sv1 )u (Sv2 ),
whenever 1/τ1 + 1/τ2 = 1 we have
   
d 
 νt,v (Bv f ) ≤ K(n, D) νt,v (|f |τ1 )1/τ1 νt,v (|R1,2 − q|τ2 )1/τ2 + 1 νt,v (|f |) .
 dv  N
(2.40)
Here K(n, D) depends on n and D only.
Therefore the left-hand side is small if we can find q such that R1,2  q. The
reason why we write a derivative in the left-hand side rather than a partial
derivative is that when considering νt,v we always think of t as fixed.
Proof. The core of the proof is to compute d(νt,v (Bv f ))/dv by differentiation
and integration by parts, after which the bound (2.40) basically follows from
Hölder’s inequality. It turns out that if one looks at things the right way,
there is a relatively simple expression for d(νt,v (Bv f ))/dv. We will not reveal
this magic formula now. Our immediate concern is to explain in great detail
the mechanism of integration by parts, that will occur again and again, and
for this we decided to use a completely pedestrian approach, writing only
absolutely explicit formulas.
First, we compute d(νt,v (Bv f ))/dv by straightforward differentiation of
the formula (2.35). In the case where Bv = u (Sv1 )u (Sv2 ), setting
1  1
Sv = √ SM,t − √ θ ,
2 v 2 1−v
we find
d    
(νt,v (Bv f )) = νt,v f Sv1 u (Sv1 )u (Sv2 ) + νt,v f Sv2 u (Sv1 )u (Sv2 )
dv   
+ νt,v f Sv u (Sv )u (Sv1 )u (Sv2 )
≤n
 
− (n + 1)νt,v f Svn+1 u (Svn+1 )u (Sv1 )u (Sv2 ) . (2.41)

Of course the first term occurs because of the factor u (Sv1 ) in Bv , the second
term because of the factor u (Sv2 ) and the other terms because of the depen-
dence of the Hamiltonian on v. The rest of the proof consists in integrating
by parts. In some sense it is a straight forward application of the Gaussian
integration by parts formula (A.17). However, since we are dealing with com-
plicated expressions, it will take several pages to fill in all the details. The
notation is complicated, and this obscures the basic simplicity of the argu-
ment. Probably the ambitious reader should try to compute everything on
her own in simple case, and look at our presentation only if she gets stuck.
Even though we have written the previous formula in a compact form
using νt,v , to integrate by parts we have to spell out the dependence of the
2.3 Cavity in M 165

Hamiltonian on the variables Sv by using the formula (2.35). For example,
the first term in the right-hand side of (2.41) is
 1  
f Sv u (Sv1 )u (Sv2 ) exp 
≤n u(Sv ) t,∼
E . (2.42)
exp u(Sv1 ) nt,∼
To keep the formulas manageable, let us write
  
w = w(σ 1 , . . . , σ n , ξ 1 , . . . , ξ n ) = exp − 
HN,M −1,t
≤n

and let us define

w∗ = w∗ (σ  , ξ  ) = exp(−HN,M

−1,t ) .

These quantities are probabilistically independent of the randomness of the


variables Sv (which is why we introduced the Hamiltonian HN,M −1,t in the
first place).
The quantity (2.42) is then equal to

Eξ σ1 ,...,σn wSv1 C
E , (2.43)
Zn
where 
Z = Eξ w∗1 exp u(Sv1 ) ,
σ1

and where  
C = f u (Sv1 )u (Sv2 ) exp u(Sv ) .
≤n

Let us now make an observation that will be used many times. The r.v.
Z is independent of all the r.v.s labeled ξ, so that
 
Eξ σ1 ,...,σn w Sv1 C 1
σ 1 ,...,σ n w Sv C
n
= E ξ n
,
Z Z
and thus the quantity (2.43) is then equal to
 C  C
EEξ w Sv1 =E w Sv1 . (2.44)
Zn Zn
σ 1 ,...,σ n σ 1 ,...,σ n


Let us now denote by E0 integration in the randomness of gi,M , ξM , z and

ξ , given all the other sources of randomness. Therefore, since the quantities
w do not depend on any of the variables gi,M , ξk , z or ξ  , the quantity (2.44)
equals
 C
E w E0 Sv1 n . (2.45)
1 n
Z
σ ,...,σ
166 2. The Perceptron Model

The main step in the computation is the calculation of the quantity


E0 Sv1 C/Z n by integration by parts. We advise the reader to study the el-
ementary proof of Lemma 2.4.4 below as a preparation to this computation
in a simpler setting. To apply the Gaussian integration by parts formula
(A.17), we need to find a jointly Gaussian family (g, z1 , . . . , zP ) of r.v.s such
that g = Sv1 and that C/Z n is a function F (z1 , . . . , zP ) of z1 , . . . , zP . The
first idea that comes to mind is to use for the r.v.s (zp ) the following family
of variables, indexed by σ and ,
√ √
zσ = vSM,t (σ, ξM
) + 1 − vθ
   
√ 1  t 1−t 
= v √ gi,M σi + gN,M σN + ξ
N i<N N N M
√ √
+ 1 − v(z q + ξ  1 − q) ,

where σ ∈ ΣN takes all possible values and  is an integer. Of course these


variables depend on v but the dependence is kept implicit because we think
now of v as fixed. We observe that

Sv = zσ  , (2.46)

so that we can think of C as a function of these quantities:

C = Cσ1 ,...,σn = Fσ1 ,...,σn ((zσ )) , (2.47)


where Fσ1 ,...,σn is the function of the variables xσ given by
 
 1 n 1 2 
Fσ1 ,...,σn ((xσ )) = f (σ , . . . , σ )u (xσ1 )u (xσ2 ) exp u(xσ ) . (2.48)
≤n

Condition (2.47) holds simply because to compute Fσ1 ,...,σn ((zσ  )), we sub-
stitute zσ  = Sv to xσ in the previous formula. This construction however
does not suffice, because Z cannot be considered as a function of the quan-
tities zσ : the effect of the expectation Eξ is that “the part depending on the
r.v.s labeled ξ has been averaged out”. The part of zσ that does not depend
on the r.v.s labeled ξ is simply
  
√ 1  t √ √
yσ = v √ gi,M σi + gN,M σN + 1 − v qz .
N N
i<N

Defining 
√ 1−t  √
ξ∗ = v ξM + 1 − v 1 − qξ  ,
N
we then have
zσ = yσ + ξ∗ .
2.3 Cavity in M 167

It is now possible to express Z as a function of the r.v.s yσ . This is shown


by the formula
Z = F1 ((yσ )) ,
where F1 is the function of the variables xσ given by

F1 ((xσ )) = Eξ w∗ (σ, ξ 1 ) exp u(xσ + ξ∗1 ) . (2.49)
σ

Let us now define


1 1
zσ = √ SM,t (σ, ξM

)− √ θ
2 v 2 1−v
   
1 1  t 1−t 
= √ √ gi,M σi + gN,M σN + ξ
2 v N i<N N N M

1 √
− √ ( qz + 1 − qξ  ) ,
2 1−v

so that Sv = zσ  . The family of all the r.v.s zσ , yσ , ξ∗ , and zσ is a Gaussian
family, and this is the family we will use to apply the integration by parts
formula. In the upcoming formulas, the reader should take great care to

distinguish between the quantities zσ and zσ (The position of the  is not
the same).
We note the relations

E(θ )2 = 1 = E(SM,t (σ, ξM

))2 ;  =  ⇒ Eθ θ = q .
 1  t
 =  ⇒ ESM,t (σ, ξM
 
)SM,t (τ , ξM ) = Rt (σ, τ ) := σi τi + σN τN ,
N N
i<N

so that
 1 t
Ezσ zσ = 0 ;  =  ⇒ Ezσ zτ = (R (σ, τ ) − q) , (2.50)
2
and
1 t
(R (σ, τ ) − q) .
Ezσ yτ = (2.51)
2
We will simply use the integration by parts formula (A.17) and these
relations to understand the form of the quantity

C 1 Fσ 1 ,...,σ n ((zσ ))
E0 Sv1 = E 0 zσ 1 . (2.52)
Zn F1 ((yσ ))n

Let us repeat that this integration by parts takes place given all the
sources of randomness other than the r.v.s gi,M , ξk for k < M , z and ξ 
(so that it is fine if f depends on some randomness independent of these).
The exact result of the computation is not relevant now (it will be given
168 2. The Perceptron Model

in Chapter 9). For the present result we simply need the information that
t
dνt,v (Bv f )/dv is a sum of terms of the type (using the notation R,  =

t  
R (σ , σ ))
 − q)A) ,
t
νt,v (f (R, (2.53)
where A is a monomial in the quantities u (Svm ), u (Svm ), u(3) (Svm ) for m ≤
n + 2. So, let us perform the integration by parts in (2.52):

Fσ1 ,...,σn ((zσ ))  ∂Fσ1 ,...,σn 1


E0 zσ1 1 n
= E0 zσ1 1 zτ E0 ((zσ ))
F1 ((yσ )) ∂xτ F1 ((yσ ))n
τ ,
 ∂F1 Fσ1 ,...,σn ((zσ ))
−n E0 zσ1 1 yτ E0 ((yσ )) .
τ
∂xτ F1 ((yσ ))n+1

It is convenient to refer to the last term in the above (or similar) formula “as
the term created by the denominator” when performing the integration by
parts in (2.52). (It would be nice to remember this, since we will often use this
expression in our future attempts at describing at a high level computations
similar to the present one.) We first compute this term. We observe that

∂F1
= Eξ w∗ (τ , ξ 1 )u (xτ + ξ∗1 ) exp u(xτ + ξ∗1 ) .
∂xτ
Therefore using (2.51) we see that the term created by the denominator in
(2.52) is

n  Fσ1 ,...,σn ((zσ ))Eξ w∗ (τ , ξ 1 )u (yτ + ξ∗1 ) exp u(yτ + ξ∗1 )


− E0 (Rt (σ 1,τ )−q) .
2 τ
F1 ((yσ ))n+1

Since yτ + ξ∗1 = zτ1 , the contribution of this term to (2.44) is then

n  Fσ1 ,...,σn ((zσ ))Eξ w∗ (τ , ξ 1 )u (zτ1 ) exp u(zτ1 )


− E w(Rt (σ 1 , τ ) − q) .
2 F1 ((yσ ))n+1
σ 1 ,...,σ n ,τ
(2.54)
Now,

Eξ w∗ (τ , ξ 1 )u (zτ1 ) exp u(zτ1 ) = Eξ w∗ (τ , ξ n+1 )u (zτn+1 ) exp u(zτn+1 ) ,

so that, changing the name of τ into σ n+1 , and since w∗n+1 = w∗ (σ n+1 , ξ n+1 ),
the quantity (2.54) is equal to (using (2.46) in the second line)

n  Fσ1 ,...,σn ((zσ ))Eξ w∗n+1 u (zσn+1 n+1


n+1 ) exp u(zσ n+1 )
=− E t
w(R1,n+1 − q)
2 1 n+1
F1 ((yσ ))n+1
σ ,...,σ

n  CEξ w∗n+1 u (Svn+1 ) exp u(Svn+1 )


=− E t
w(R1,n+1 − q) .
2 Z n+1
σ 1 ,...,σ n+1
2.3 Cavity in M 169

In a last step we observe that in the above formula we can remove the expec-
tation Eξ . This is because the r.v.s labeled ξ that occur in this expectation
(namely ξ n+1 and ξ n+1 ) are independent of the other r.v.s labeled ξ that
occur in C and w. In this manner we finally see that the contribution of this
quantity to the computation of (2.42) is

n  t
C(R1,n+1 − q)ww∗n+1 u (Svn+1 ) exp u(Svn+1 )
− E
2 Z n+1
σ 1 ,...,σ n+1
n  
= − νt,v f (R1,n+1t
− q)u (Sv1 )u (Sv2 )u (Svn+1 ) .
2
In a similar manner we compute the contribution in (2.52) of the dependence
of Fσ1 ,...,σn on the variables zσ at a given value of , i.e of the quantity
 ∂Fσ1 ,...,σn 1
E0 zσ1 1 zτ E0 ((zσ )) . (2.55)
τ
∂xτ F1 ((yσ ))n

We observe in particular from (2.48) that


∂Fσ1 ,...,σn
((zσ )) = 0
∂xτ

unless τ = σ  , so that the quantity (2.55) equals


∂Fσ1 ,...,σn 1
E0 zσ1 1 zσ  E0 ((zσ )) . (2.56)
∂xσ F1 ((yσ ))n

Since Ezσ zσ = 0 by (2.50) we see that for  = 1 the contribution of this term
is 0.
When  ≥ 3, we have
 
∂Fσ1 ,...,σn  1 n 1 2  
((xσ )) = f (σ , . . . , σ )u (xσ1 )u (xσ2 )u (xσ ) exp u(xσ ) ,
∂xτ
≤n

so that the term (2.55) is simply


1  
t
νt,v f (R1, − q)u (Sv1 )u (Sv2 )u (Sv ) .
2
If  = 2, there is another term because of the factor u (Sv2 ), and this term is
2 νt,v f (R1,2 − q)u (Sv )u (Sv ) . So actually we have shown that
1 t 1 2

1  
νt,v (f Sv1 u (Sv1 )u (Sv2 )) = t
νt,v f (R1,2 − q)u (Sv1 )u (Sv2 )
2
1   
+ t
νt,v f (R1, − q)u (Sv1 )u (Sv2 )u (Sv )
2
2≤≤n
n  
− νt,v f (R1,n+1
t
− q)u (Sv1 )u (Sv2 )u (Svn+1 ) .
2
170 2. The Perceptron Model

We strongly suggest to the enterprising reader to compute now all the


other terms of (2.41). This is the best way to really understand the mechanism
at work. There is no difficulty whatsoever, this just requires patience.
Calculations similar to the previous one will be needed again and again.
We will not anymore explain them formally as above. Rather, we will give
the result of the computation with possibly a few words of explanation. It is
worth making now a simple observation that helps finding the result of such
a computation. It is the fact that from (2.51) we have

Ezσ yτ = Ezσ zτn+1 .

In a sense this means that when performing the integration by parts, we


obtain the same result as if Z were actually a function of the variables zσn+1 .
It is useful to formulate this principle as a heuristic rule:

The result of the expectation Eξ in the definition of Z is somehow


“to shift the dependence of Z in Sv on a new replica” . (2.57)

When describing in the future the computation of a quantity such as


νt,v (f Sv1 u (Sv1 )u (Sv2 )) by integration by parts, we will simply say: we inte-
grate by parts using the relations
 1 t
ESv Sv = 0 ; ESv Sv = (R  − q) , (2.58)
2 ,
and we will expect that the reader has understood enough of the algebraic
mechanism at work to be able to check that the result of the computation is
indeed the one we give, and the heuristic rule (2.57) should be precious for
this purpose. There are two more such calculations in the present chapter,
and the algebra in each is much simpler than in the present case. As a good
start to develop the understanding of this mechanism, the reader should at
the very least check the following two formulas involved in the computation
of (2.41):
 
νt,v f Sv3 u (Sv3 )u (Sv1 )u (Sv2 )
1  
= νt,v f (R3,1 t
− q)u (Sv3 )u (Sv1 )u (Sv2 )
2
1  
+ νt,v f (R3,2 t
− q)u (Sv3 )u (Sv1 )u (Sv2 )
2
1   
+ t
νt,v f (R3, − q)u (Sv3 )u (Sv1 )u (Sv2 )u (Sv )
2
=3,≤n
n  
− νt,v f (R3,n+1
t
− q)u (Sv3 )u (Sv1 )u (Sv2 )u (Svn+1 ) ,
2
and
2.3 Cavity in M 171
 
νt,v f Svn+1 u (Svn+1 )u (Sv1 )u (Sv2 )
1  
t
= νt,v f (Rn+1,1 − q)u (Svn+1 )u (Sv1 )u (Sv2 )
2
1  
t
+ νt,v f (Rn+1,2 − q)u (Svn+1 )u (Sv1 )u (Sv2 )
2
1  
+ t
νt,v f (Rn+1, − q)u (Svn+1 )u (Sv1 )u (Sv2 )u (Sv )
2
≤n
n+1  
− t
νt,v f (Rn+1,n+2 − q)u (Svn+1 )u (Sv1 )u (Sv2 )u (Svn+2 ) .
2
We bound a term (2.53) by

K(D)νt,v (|f ||R1,


t
 − q|) ,

and we write |R,


t
 − q| ≤ |R, − q| + 1/N to obtain the inequality

   
d   1
 νt,v (Bv f ) ≤ K(n, D) νt,v (|f ||R, − q|) + νt,v (|f |) .
 dv  N

1≤< ≤n+2
(2.59)
To conclude we use Hölder’s inequality. 


Exercise 2.3.3. Let us recall the notation Sk,t of Proposition 2.2.3 and de-
fine  
1 gk ε ξ

Sk,t = √ √ −√ k ,
2 N t 1−t
so that (2.27) becomes
d  ε Y

(−HN,M,t )= 
Sk,t 
u (Sk,t )− √ .
dt
k≤M
2 1−t

Observe the relations


 1 

ESk,t 
Sk,t 
= 0 ; ESk,t 
Sk,t = ε ε if  =  ; ESk,t 
Sk  ,t = 0 if k = k .
2N
(2.60)
Get convinced that the previously described mechanism yields the formula
(when  ≤ n + 1)
 
  1  
νt (Sk,t u (Sk,t )f ) = νt (ε ε u (Sk,t )u (Sk,t )f )
2N 
 =, ≤n+1

− (n + 1)νt (ε εn+2 u (Sk,t )u (Sk,t )f ) .
 n+2

Then get convinced that the term I in (2.23) can be obtained “in one step”
rather than by integrating by parts separately over the r.v.s ξk, and gk as
was done in the proof of Proposition 2.2.3.
172 2. The Perceptron Model

To follow future computations it is really important to understand the


difference between the situation (2.58) (where integration by parts “brings
 − q)/2 in each term”) and the situation (2.60), where this
t
a factor (R,
integration by parts brings “a factor ε ε /2N in each term”.
Let us point out that the constants K(n, D) and K(D) are simply avatars
of our ubiquitous constant K, and they need not be the same at each occur-
rence. The only difference is that here we make explicit that these constants
depend only on n and D (etc.) simply because this is easier to do when there
are so few parameters. Of course, K1 (D), etc. denote specific constants.

Lemma 2.3.4. If f ≥ 0 is a function on ΣN


n
we have

νt,v (f ) ≤ K(n, D)νt (f ) . (2.61)

Proof. We use (2.40) with Bv ≡ 1, τ1 = 1, τ2 = ∞ to get


 
d 
 νt,v (f ) ≤ K(n, D)νt,v (f ) .
 dv 

We integrate and we use that νt,1 (f ) = νt (f ). 



n
Proposition 2.3.5. Consider a function f on ΣN . This function might be

random, but it does not depend on the randomness of the variables gi,M , ξM ,ξ 
or z. Then, whenever 1/τ1 + 1/τ2 = 1, we have

|νt (f u (SM,t
1 2
)u (SM,t )) − rνt (f )| ≤ K(n, D) νt (|f |τ1 )1/τ1 νt (|R1,2 − q|τ2 )1/τ2

1
+ νt (|f |) . (2.62)
N

This provides a good understanding of the term I of (2.23), provided we can


find q such that the right-hand side is small.
Proof. We consider Bv as in Lemma 2.3.2, we write
 
d 

|νt,1 (B1 f ) − νt,0 (B0 f )| ≤ max  νt,v (Bv f ) , (2.63)
v dv

and we use (2.40) and (2.61) to get

|νt,1 (B1 f ) − νt,0 (B0 f )| ≤ B , (2.64)

where B is a term as in the right-hand side of (2.62). Thus in the case Bv ≡ 1,


and since νt,1 = νt , (2.37) and (2.64) imply that

|νt (f ) − E f t,∼ | ≤B. (2.65)

In the case Bv = u (Sv1 )u (Sv2 ), (2.38) and (2.64) mean


2.4 The Replica Symmetric Solution 173
   
νt f u (SM,t
1 2
)u (SM,t ) − r E f t,∼
≤B

and combining with (2.65) finishes the proof. 



We now set r = α r, and (2.62) implies
   
ανt ε ε f u (SM,t
 
)u (SM,t ) − rνt (ε ε f )
 
1
≤ αK(n, D) νt (|f |τ1 )1/τ1 νt (|R1,2 − q|τ2 )1/τ2 + νt (|f |) .
N
Looking again at the terms I and II of Proposition 2.2.3, we have proved the
following.
n
Proposition 2.3.6. Consider a function f on ΣN (that does not depend on
 
any of the r.v.s gi,M , ξ , ξM or z). Then, whenever 1/τ1 + 1/τ2 = 1, we have
 
1
|νt (f )| ≤ αK(D, n) νt (|f | )
τ1 1/τ1
νt (|R1,2 − q| )
τ2 1/τ2
+ νt (|f |) . (2.66)
N
The following is an obviously helpful way to relate ν and νt .
Lemma 2.3.7. There exists a constant K(D) with the following property. If
αK(D) ≤ 1, whenever f ≥ 0 is a function on ΣN 2
(that does not depend on
 
any of the r.v.s gi,M , ξ , ξM or z), we have
νt (f ) ≤ 2ν(f ) . (2.67)
Proof. We use Proposition 2.3.6 with τ1 = 1 and τ2 = ∞ to see that
|νt (f )| ≤ αK1 (D)νt (f ) ,
from which (2.67) follows by integration if αK1 (D) ≤ log 2. 


2.4 The Replica Symmetric Solution


√ √
We recall the notation θ = z q + ξ 1 − q where z and ξ are independent
standard Gaussian r.v.s, and that Eξ denotes expectation in ξ only.
Theorem 2.4.1. Given D > 0, there is a number K(D) with the following
property. Assume that the function u satisfies (2.7), i.e.
∀ ≤ 3 , |u() | ≤ D .
Then whenever α ≤ 1/K(D) the system of equations
 2
√ Eξ u (θ) exp u(θ)
q = E th2 (z r) ; r = αE (2.68)
Eξ exp u(θ)
in the unknown q and r has a unique solution, and
  L
ν (R1,2 − q)2 ≤ . (2.69)
N
174 2. The Perceptron Model

Proof. Let us write the second equation of (2.68) as r = α r = α


r(q).
Differentiation and integration √by parts show that | r (q)| ≤ K(D) under
(2.7). The function r → E th2 (z r) has a bounded derivative; so the func-
tion q → ψ(q) := Eth2 (z α r(q)) has a derivative ≤ αK2 (D). Therefore if
2αK2 (D) ≤ 1 there is a unique solution to the equation q = ψ(q) because
then the function ψ(q) is valued in [0, 1] with a derivative ≤ 1/2.
Symmetry among sites yields
 
ν (R1,2 − q)2 = ν(f ) (2.70)

where f = (ε1 ε2 − q)(R1,2 − q), and we write

ν(f ) ≤ ν0 (f ) + sup |νt (f )| . (2.71)


0<t<1

Since q = E th2 (z r) = E th2 Y , Lemma 2.2.2 implies
− −
ν0 ((ε1 ε2 − q)(R1,2 − q)) = (E th2 Y − q)ν0 (R1,2 − q) = 0 ,

and thus
1 1
ν0 (f ) = ν0 (1 − ε1 ε2 q) = (1 − q 2 ) . (2.72)
N N
To compute νt (f ), we use Proposition 2.3.6 with n = 2 and τ1 = τ2 = 2.
Since |f | ≤ 2|R1,2 − q|, we obtain
 
  1
|νt (f )| ≤ αK(D) νt (R1,2 − q) + ν(|f |) .
2
(2.73)
N

We substitute in (2.71) and use (2.67) to get the relation


    1  1
ν(f ) = ν (R1,2 − q)2 ≤ αK(D) ν (R1,2 − q)2 + ν(|f |) + (1 − q 2 ) ,
N N
so that since |f | ≤ 4 we obtain
    K(D)(α + 1)
ν (R1,2 − q)2 ≤ αK(D)ν (R1,2 − q)2 + . 

N

One should observe that in the above argument we never used the unique-
ness of the solutions of the equations (2.68) to obtain (2.69), only their exis-
tence. In turn, uniqueness of these solutions follows from (2.69).
One may like to think of the present model as a kind of “square”. There
are two “spin systems”, one that consists of the σi and one that consists of the
Sk . These are coupled: the σi determine the Sk and these in turn determine
the behavior of the σi . This philosophy undermines the first proof of Theorem
2.4.2 below.
From now on in this section, q and r always denote the solutions of (2.68).
We recall the definition (2.11)
2.4 The Replica Symmetric Solution 175

1 
pN,M (u) = E log exp(−HN,M (σ)) ,
N σ

and we define
1 √ √
p(u) = − r(1 − q) + E log(2ch(z r)) + αE log Eξ exp u(z q + ξ 1 − q) .
2
(2.74)

Theorem 2.4.2. Under the conditions of Theorem 2.4.1 we have


K(D)
|pN,M (u) − p(u)| ≤ . (2.75)
N
We will present two proofs of this fact.
First proof of Theorem 2.4.2. We start with the most beautiful proof,
which is somewhat challenging. It implements through interpolation the idea
that “the quantities Sk behave like individual spins”. We consider indepen-
dent standard Gaussian r.v.s z, (zk )k≤M , (zi )i≤N , (ξk )k≤M and for 0 < s < 1
the Hamiltonian
 √ √  √ √
− HM,N,s = u( sSk + 1 − sθk ) + σi 1 − szi r (2.76)
k≤M i≤N

√ √
where θk = zk q + ξk 1 − q. In this formula, we should think of zi and zk
as representing new randomness, and of ξk as representing “new spins”, so
that Gibbs averages are given by (2.19), and we define
1 
pN,M,s = E log Eξ exp(−HM,N,s ) .
N σ

The variables ξk are not the same as in Section 2.2; we could have denoted
them by ξk to insist on this fact, but we preferred simpler notation.
A key point of the present interpolation is that the equations giving the
parameters qs and rs corresponding to the parameters q and r in the case
s = 1 are now
√ √ √ √
qs = Eth2 ( sz rs + 1 − sz r) (2.77)
 2
Eξ u (θs ) exp u(θs )
rs = αE (2.78)
Eξ exp u(θs )

where
√ √ √ √
θs = s(z qs + ξ 1 − qs ) + 1 − s(z q + ξ 1 − q) .

To understand the formula (2.77) one should first understand what hap-
pens if we include theaction of a random external field in the Hamiltonian,
i.e. we add a term h i≤N gi σi (where gi are i.i.d. standard Gaussian) to
176 2. The Perceptron Model

the right-hand side of (2.6). Then there is nothing to change to the proof of
Theorem 2.4.1; only the first formula of (2.68) becomes

q = E th2 (z r + hg) , (2.79)

where g, z are independent standard Gaussian r.v.s. We then observe √ that



the last term in (2.76) is an external field, that creates the term 1 − sz r
√ (2.77). The second term in the definition of θs is created by the terms
in
1 − sθk in the Hamiltonian (2.76), a source of randomness “inside u”.
The values qs = q, rs √= r are solutions
√ of the equations (2.77) and (2.78),
√ √ √
because for these values sz qs + 1 − sz q is distributed like z q (etc.).
One could easily check that the solution of the system of equations (2.77)
and (2.78) is unique when αK(D) ≤ 1, but this is not needed.
We leave to the readers, as an excellent exercise for those who really
want to master the present ideas, the task to prove (2.69) in the case of the
Hamiltonian (2.76). Since we have already made the effort to understand the
effect of the expectations Eξ , there is really not much to change to the proof
we gave.
So, with obvious notation, one has
  L
∀s ∈ [0, 1] , νs (R1,2 − q)2 ≤ . (2.80)
N
Let us define
√ √ 1 1
Sk,s = sSk + 1 − sθk ; Sk,s = √ Sk − √ θk ,
2 s 2 1−s
so that
 
d 1 d
pN,M,s (u) = νs (−HN,M,s )
ds N ds
  
1 1 √
= νs Sk,s u (Sk,s ) − √ σi zi r . (2.81)
N
k≤M
2 1 − s i≤N

The next step is to integrate by parts. It should be obvious how to proceed


for the integration by parts in zi ; this gives
  
1 1 √ r
νs √ σi zi r = (1 − νs (R1,2 )) .
N 2 1 − s i≤N 2

Let us now explain how to compute νs (Sk,s u (Sk,s )). Without loss of general-
ity we assume k = M . We make explicit the dependence of the Hamiltonian
on SM,s by introducing the Hamiltonian
 √ √  √ √
−HM −1,N,s = u( sSk + 1 − sθk ) + σi 1 − szi r .
k≤M −1 i≤N
2.4 The Replica Symmetric Solution 177

Denoting by · ∼ an average for this Hamiltonian, we then have


SM,s u (SM,s ) exp u(SM,s ) ∼
νs (SM,s u (SM,s )) = E . (2.82)
exp u(SM,s ) ∼
Let us denote as usual by an upper index  the fact “that the spins are
in the -th√replica”. For example, (since we think of ξk as a spin) θk =

zk q + ξk 1 − q where ξk are independent standard Gaussian r.v.s, and
√ √

Sk,s = sSk + 1 − sθk , and let us observe the key relations (where the
 
reader will not confuse SM,s with SM,s )
 1

ESM,s 
SM,s = 0 ;  =  ⇒ ESM,s
 
SM,s = (R, − q) .
2
Now we integrate by parts in (2.82). This integration by parts will take
place given the randomness of HM −1,N,s . We have explained in detail in
the proof of Lemma 2.3.2 how to proceed. The present case is significantly
simpler. There is only one term, “the term created by the denominator” (as
defined page 168), and we obtain
1  
νs (SM,s u (SM,s )) = − νs (R1,2 − q)u (SM,s
1 2
)u (SM,s ) .
2
This illustrates again the principle (2.58) that the expectation Eξ in the
denominator “shifts the variables there to a new replica.” Therefore we have
found that
 
d 1 1  r
pN,M,s (u) = − νs (R1,2 − q) 1
u (Sk,s 2
)u (Sk,s ) − (1 − νs (R1,2 )) .
ds 2 N 2
k≤M

We will not use the fact that the contribution for each k ≤ M is the same,
but rather we regroup the terms as
d r
pN,M,s (u) = − (1 − q)
ds 2
 
1 1 
− νs (R1,2 − q) 1
u (Sk,s 2
)u (Sk,s )−r . (2.83)
2 N
k≤M

This formula should be compared to (1.65). There seems to be little hope


to get any kind of positivity argument here. This is unfortunate because
as of today, positivity arguments are almost our only tool to obtain low-
temperature results.
We get, using the Cauchy-Schwarz inequality
 
d   
 pN,M,s (u) + r (1 − q) ≤ νs (R1,2 − q)2 1/2 (2.84)
 ds 2 
 2 1/2
1 
× νs u (Sk,s )u (Sk,s ) − r
1 2
.
N
k≤M
178 2. The Perceptron Model

From (2.80) we see that the right-hand side √ is ≤ K(D)/ N ; but to get the
correct rate K(D)/N (rather than K(D)/ N ) in Theorem 2.4.2, we need to
know the following, that is proved separately in Lemma 2.4.3 below:
 2  K(D)
1 
νs 1
u (Sk,s 2
)u (Sk,s )−r ≤ . (2.85)
N N
k≤M

We combine with (2.80) to obtain from (2.84) that


 
d 
 pN,M,s (u) + r (1 − q) ≤ K(D)
 ds 2  N

so that, since pN,M (u) = pN,M,1 (u),


  K(D)
 r 
pN,M (u) + (1 − q) − pN,M,0 (u) ≤ .
2 N
As the spins decouple in pN,M,0 (u), the computation of this quantity is
straightforward and this yields (2.75). 


Lemma 2.4.3. Inequality (2.85) holds under the conditions of Theorem


2.4.1.

Proof. Let us write


1 
f = 1
u (Sk,s 2
)u (Sk,s )−r
N
k≤M
1 
f− = 1
u (Sk,s 2
)u (Sk,s )−r ,
N
k<M

so that, using symmetry between the values of k ≤ M ,


 
νs (f 2 ) = νs (αu (SM,s
1 2
)u (SM,s ) − r)f
  K(D)
≤ νs (αu (SM,s
1 2
)u (SM,s ) − r)f − + . (2.86)
N
We extend Proposition 2.3.5 to the present setting of the Hamiltonian (2.76)
to get
  
νs (αu (SM,s
1 2
)u (SM,s ) − r)f − 
 1/2  − 2 1/2 1
≤ αK(D) νs (R1,2 − q)2 νs (f ) + .
N

Combining these, and since 2 ab ≤ a + b, for αK(D) ≤ 1 this yields

1   1   K(D)
νs (f 2 ) ≤ νs (R1,2 − q)2 + νs (f − )2 +
2 2 N
2.4 The Replica Symmetric Solution 179

and since |f 2 − (f − )2 | ≤ K(D)/N we get

1   1 K(D)
νs (f 2 ) ≤ νs (R1,2 − q)2 + νs (f 2 ) + ,
2 2 N
which completes the proof using (2.80). 

To prepare for the second proof of Theorem 2.4.2, let us denote by
F (α, r, q) the right-hand side of (2.74), i.e.
1 √
F (α, r, q) = − r(1 − q) + E log(2ch(z r)) + αE log Eξ exp u(θ) ,
2
√ √
where θ = z q + ξ 1 − q and let us think of this quantity as a function of
three unrelated variables. For convenience, we reproduce the equations (2.68):
 2
2 √ Eξ u (θ) exp u(θ)
q = E th (z r) ; r = αE . (2.87)
Eξ exp u(θ)

Lemma 2.4.4. The conditions (2.87) mean respectively that ∂F/∂r = 0,


∂F/∂q = 0.
Proof. This is of course calculus, differentiation and integration by parts,
but it would be nice to really understand why this is true. We give the proof
in complete detail, but we suggest as a simple exercise that the reader tries
first to figure out these details by herself.
Integration by parts yields
   
∂F 1 1 √ 1 1
= q − 1 + √ E zthz r = q−1+E 2 √
∂r 2 r 2 ch (z r)

so that ∂F/∂r = 0 if
1 2 √
q =1−E 2 √ = E th (z r) .
ch (z r)

Next, if
√ z ξ
θ =z q+ξ 1 − q, θ = √ − √ ,
2 q 2 1−q
we have  
∂F r α u (θ) exp u(θ)
= + E θ . (2.88)
∂q 2 2 Eξ exp u(θ)
To integrate by parts, we observe that F1 (z) = Eξ exp u(θ) does not depend
on ξ and
dF1 d √ √
= Eξ exp u(z q + ξ 1 − q) = qEξ u (θ) exp u(θ) .
dz dz
180 2. The Perceptron Model

We appeal to the integration by parts formula (A.17) to find, since E(θ θ) = 0,



E(θ z) = 1/ q that
   
u (θ) exp u(θ) 1
E θ = −E u (θ) exp u(θ)Eξ (u (θ) exp u(θ))
F1 (z) F1 (z)2
(Eξ u (θ) exp u(θ))2
= −E ,
(Eξ exp u(θ))2
so that by (2.88), ∂F/∂q = 0 if and only if the second part of (2.87) holds.  
If q and r are now related by the conditions (2.87), for small α they are
functions q(α) and r(α) of α (since, as shown by Theorem 1.4.1 the equations
(2.87) have a unique solution). The quantity F (α, r(α), q(α)) is function F (α)
of α alone, and
dF ∂F ∂F dq ∂F dr ∂F
= + + = ,
dα ∂α ∂q dα ∂r dα ∂α
since ∂F/∂q = ∂F/∂r = 0 when q = q(α) and r = r(α). Therefore

F (α) = E log Eξ exp u(θ) . (2.89)


Second proof of Theorem 2.4.2. We define ZN,M = σ exp(−HN,M (σ)),
and we note the identity
 
1 
ZN,M +1 = ZN,M exp u √ gi,M +1 σi
N i≤N

so that
 
1 1 
pN,M +1 (u) − pN,M (u) = E log exp u √ gi,M +1 σi . (2.90)
N N i≤N

To compute the right-hand side of (2.90) we introduce



v  √
Sv = gi,M +1 σi + 1 − vθ ,
N
i≤N
√ √
where θ = z q + ξ 1 − q, where (I almost hesitate to say it again) z and
ξ are independent standard Gaussian r.v.s, and where q is as in (2.68) for
α = M/N (so that the value of q depends on M ). We set

ϕ(v) = E log Eξ exp u(Sv ) .

As usual Eξ denotes expectation in all the r.v.s labeled ξ. Here this expecta-
tion is not built in the bracket · , in contrast with what we did e.g in (2.35),
so that it must be written explicitly.
2.4 The Replica Symmetric Solution 181

We note that

ϕ(1) = N (pN,M +1 (u) − pN,M (u)) ; ϕ(0) = E log Eξ exp u(θ) .

With obvious notation we have


Eξ Sv exp u(Sv ) S exp u(Sv )
ϕ (v) = E =E v .
Eξ exp u(Sv ) Eξ exp u(Sv )

We then integrate by parts, exactly as in (2.82). This yields the formula

1 (R1,2 − q)u (Sv1 )u (Sv2 ) exp(u(Sv1 ) + u(Sv2 ))


ϕ (v) = − E , (2.91)
2 Eξ exp(u(Sv1 ) + u(Sv2 ))

where Sv is defined as Sv , but replacing ξ by ξ  and σ by σ  . Now (2.69)


implies
 1/2 K(D)
|ϕ (v)| ≤ K(D)ν(|R1,2 − q|) ≤ K(D)ν (R1,2 − q)2 ≤ √ .
N
This bound unfortunately does not get the proper rate. To get the proper
bound in K(D)/N in (2.75) one must replace the bound

|ϕ(1) − ϕ(0)| ≤ sup |ϕ (v)|

by the bound
|ϕ(1) − ϕ(0) − ϕ (0)| ≤ sup |ϕ (v)| . (2.92)
A new differentiation and integration by parts in (2.91) bring out in each
term a new factor (R, − q), so that using (2.69) we now get
  K(D)
|ϕ (v)| ≤ K(D)ν (R1,2 − q)2 ≤ .
N
As a special case of (2.91),
1
ϕ (0) = − rν(R1,2 − q) .
2
We shall prove later (when we learn how to prove central limit theorems in
Chapter 9) the non-trivial fact that |ν(R1,2 − q)| ≤ K(D)/N , and (2.92) then
implies
 
 
pN,M +1 (u) − pN,M (u) − 1 E log Eξ exp u(θ) ≤ K(D) . (2.93)
 N  N2

One can then recover the value of pN,M (u) by summing these relations over
M . This is a non-trivial task, since the value of q (and hence of θ) depends
on M .
182 2. The Perceptron Model

Let us recall the function F (α) of (2.89). It is tedious but straightforward


to check that F (α) remains bounded as αK(D) ≤ 1, so that (2.89) yields
     
 
F M + 1 − F M − 1 E log Eξ exp u(θ) ≤ K(D) .
 N N N  N2

Comparing with (2.93) and summing over M then proves (2.75) (and even
better, since the summation is over M , we get a bound αK(D)/N ). This
completes the second proof of Theorem 2.4.2. 

It is worth noting that the first proof of Theorem 2.4.2 provides an easy
way to discover the formula (2.74), but that this formula is much harder to
guess if one uses the second proof. In some sense the first proof of Theo-
rem 2.4.2 is more powerful and more elegant than the second proof. However
we will meet situations (in Chapters 3 and 4) where it is not immediate to
apply this method (and whether this is possible remains to be investigated).
In these situations, we shall use instead the argument of the second proof of
Theorem 2.4.2.

2.5 Exponential Inequalities


Our goal is to improve the control of R1,2 −q from second to higher moments.

Theorem 2.5.1. Given D, there is a number K(D) such that if u satisfies


(2.7), i.e. |u() | ≤ D for all 0 ≤  ≤ 3 then for αK(D) ≤ 1, we have
 k
  64k
∀k ≥ 0, ν (R1,2 − q) 2k
≤ . (2.94)
N
Proof. It goes by induction over k, and is nearly identical to that of Propo-
sition 1.6.7. 
For 1 ≤ n ≤ N , we define An = N −1 n≤i≤N (σi1 σi2 − q), and the induc-
tion hypothesis is that for each n ≤ N ,
 k
64k
ν(A2k
n ) ≤ . (2.95)
N
To perform the induction from k to k + 1, we can assume n < N , for
(2.95) holds if n = N . Using symmetry between sites yields
N −n+1
ν(A2k+2
n )= ν(f ) ,
N
where
f = (ε1 ε2 − q)A2k+1
n .
Thus
2.5 Exponential Inequalities 183

ν(A2k+2
n ) ≤ |ν0 (f )| + sup |νt (f )| . (2.96)
t

We first study the term ν0 (f ). Consider


1 
A = (σi1 σi2 − q) .
N
n≤i≤N −1

Since by Lemma 2.2.2 we have ν0 ((ε1 ε2 − q)A 2k+1 ) = 0, using the inequality

|x2k+1 − y 2k+1 | ≤ (2k + 1)|x − y|(x2k + y 2k )

for x = An and y = A we get, since |x − y| ≤ 2/N and |ε1 ε2 − q| ≤ 2,


4(2k + 1)  
|ν0 (f )| ≤ ν0 (A2k
n ) + ν0 (A
2k
) .
N
We use (2.67), the induction hypothesis, and the observation that since n <
N , we have
ν(A 2k ) = ν(A2k
n+1 )

to obtain
 k  k+1
16(2k + 1) 64k 2k + 1 64(k + 1)
|ν0 (f )| ≤ ≤ . (2.97)
N N 4(k + 1) N

To compute νt (f ) we use Proposition 2.3.6 with n = 4,τ1 = (2k + 2)/(2k + 1),


τ2 = 2k + 2 and (2.67) to get
 
 
2k+2 1/τ2 1
|νt (f )| ≤ αK(D) ν(A2k+2
n )1/τ1
ν (R1,2 − q) + ν(|An | 2k+1
) .
N

Using the inequality x1/τ1 y 1/τ2 ≤ x + y for x = ν(A2k+2


n ) and y = ν((R1,2 −
q)2k+2 ) this implies
 
  1
|νt (f )| ≤ αK(D) ν(An ) + ν (R1,2 − q)
2k+2 2k+2
+ ν(|An | 2k+1
) .
N
Combining with (2.96) and (2.97) we get if αK(D) ≤ 1/4,
1  
ν(A2k+2
n )≤ ν(A2k+2
n ) + ν (R1,2 − q)2k+2
4
 k+1
2k + 1 64(k + 1) 1
+ + ν(|An |2k+1 ) . (2.98)
4(k + 1) N N

Since |An | ≤ 2 and hence |An |2k+1 ≤ 2A2k


n , the induction hypothesis implies
that the last term of (2.98) is at most
 k+1
1 64(k + 1)
,
32(k + 1) N
184 2. The Perceptron Model

so the sum of the last 2 terms is at most


 k+1
1 64(k + 1)
.
2 N
Since A1 = R1,2 − q, considering first the case n = 1 provides the required
inequality in that case. Using back this inequality in (2.98) provides the
required inequality for all values of n. 

The following extends Lemma 2.4.3. Its proof is pretty similar to that
of Theorem 2.5.1, and demonstrates the power of this approach. The reader
who does not enjoy the argument should skip the forthcoming proof and make
sure she does not miss the pretty Theorem 2.5.3. We denote by K0 (D) the
constant of Theorem 2.5.1.

Theorem 2.5.2. Assume that u satisfies (2.7) for a certain number D. Then
there is a number K(D), depending on D only, with the following property.
For αK0 (D) ≤ 1 we have
 2k   k
1  αkK(D)
∀k ≥ 0 , ν u (Sj )u (Sj ) − r
1 2
≤ . (2.99)
N N
j≤M

Proof. We recall the definition of r given by (2.36), i.e.


 2
Eξ u (θ) exp u(θ)
r = E ,
Eξ exp u(θ)

r. For 1 ≤ n ≤ M we define
so that with the notation (2.87) we have r = α
1 
Cn = (u (Sj1 )u (Sj2 ) − r) .
M
n≤j≤M

r and 1/N = α/M the left-hand side of (2.99) is α2k ν(C12k ).


Since r = α
We prove by induction over k that if αK0 (D) ≤ 1 then for a suitable
number K1 (D) we have for k ≥ 1 and any n ≤ M that
 k
kK1 (D)
ν(Cn2k ) ≤ . (2.100)
M
Using this for n = 1 concludes the proof. For k = 0 (2.100) is true if one then
understands the right-hand side of (2.99) as being 1. The reader disliking this
can instead start the induction at k = 1. To prove the case k = 1 it suffices
to repeat the proof of Lemma 2.4.3 (while keeping a tighter watch on the
dependence on α). For the induction step from k to k + 1 we can assume that
n < M , and we use symmetry among the values of j to obtain

ν(Cn2k+2 ) = ν(f ∼ ) , (2.101)


2.5 Exponential Inequalities 185

where f ∼ = (u (SM
1 2
)u (SM ) − r)Cn2k+1 . Let us define
1 
C = (u (Sj1 )u (Sj2 ) − r) .
M
n≤j≤M −1

Using the inequality

|x2k+1 − y 2k+1 | ≤ (2k + 1)|x − y|(x2k + y 2k ) (2.102)

for x = Cn and y = C , and since |u (SM 1 2


)u (SM ) − r| ≤ 2D2 , we obtain that
for f ∗ = (u (SM
1 2
)u (SM ) − r)C 2k+1 :

2(2k + 1)D2
ν(f ∼ ) ≤ ν(f ∗ ) + (ν(Cn2k ) + ν(C 2k
)) . (2.103)
M
2k 2k
Since n < M , symmetry among the values of j implies ν(C ) = ν(Cn+1 )
and the induction hypothesis yields
 k
∼ ∗ 8(k + 1)D2 K1 (D)k
ν(f ) ≤ ν(f ) + . (2.104)
M M

Next, we use (2.62) for t = 1, f = C 2k+1 and n = 2. This is permitted



because f does not depend on the randomness of ξM , ξ  or gi,M . We choose
τ1 = (2k + 2)/(2k + 1) and τ2 = 2k + 2 to get
 

 
2k+2 1/τ2 1
|ν(f )| ≤ K2 (D) ν(C 2k+2 1/τ1
) ν (R1,2 − q) + ν(|C | 2k+1
) .
N

Since we work under the condition αK0 (D) ≤ 1, we can as well assume that
α ≤ 1, so that M ≤ N and
 

 
2k+2 1/τ2 1
|ν(f )| ≤ K2 (D) ν(C 2k+2 1/τ1
) ν (R1,2 − q) + ν(|C | 2k+1
) .
M
(2.105)
We recall the inequality x1/τ1 y 1/τ2 ≤ x + y. Changing x to x/A and y to
Aτ2 /τ1 y in this inequality gives
x
x1/τ1 y 1/τ2 ≤ + Aτ2 /τ1 y .
A
Using this for A = 2K2 (D), x = ν(C 2k+2
) and y = ν((R1,2 − q)2k+2 ), we
deduce from (2.105) that

1   K(D)
|ν(f ∗ )| ≤ ν(C) + K(D)2k+1 ν (R1,2 − q)2k+2 +
2k+2
ν(|C |2k+1 ) .
2 M
(2.106)
We now use the inequality

|x2k+2 − y 2k+2 | ≤ (2k + 2)|x − y|(|x|2k+1 + |y|2k+1 )


186 2. The Perceptron Model

for x = C and y = Cn to obtain

2(2k + 2)D2  
ν(C 2k+2
) ≤ ν(Cn2k+2 ) + ν(|C |2k+1 ) + ν(|Cn |2k+1 ) .
M
We combine this with (2.106), we use that |Cn |2k+1 ≤ 2D2 Cn2k and |C |2k+1 ≤
2D2 C 2k and the induction hypothesis to get
1  
|ν(f ∗ )| ≤ ν(Cn2k+2 ) + K(D)2k+2 ν (R1,2 − q)2k+2
2
 k
(k + 1)K(D) K1 (D)k
+ ,
M M

and combining with (2.101) and (2.104) that


1  
ν(Cn2k+2 ) ≤ ν(Cn2k+2 ) + K(D)2k+2 ν (R1,2 − q)2k+2
2
 k
(k + 1)K(D) K1 (D)k
+ .
M M

Finally we use (2.94) to conclude the proof that ν(Cn2k+2 ) ≤ (K1 (D)(k +
1)/M )k+1 if K1 (D) has been chosen large enough. This completes the induc-
tion. 

The following central limit theorem describes the fluctuations of pN,M (u)
(given by (2.11)). We recall that a(k) = Ez k where z is a standard Gaussian
r.v. and that O(k) denotes a quantity A = AN with |A| ≤ KN −k/2 where K
does not depend on N . We recall the notation p(u) of (2.74),
1 √ √
p(u) = − r(1 − q) + E log(2ch(z r)) + αE log Eξ exp u(z q + ξ 1 − q) .
2
Theorem 2.5.3. Let
√ √
b = E(log ch(z r))2 − (E log ch(z r))2 − qr .

Then for each k ≥ 1 we have


 k/2
b
E(pN,M (u) − p(u)) = k
a(k) + O(k + 1) .
N

Proof. This argument resembles that in the proof of Theorem 1.4.11, and
it would probably help the reader to review the proof of that theorem now.
The present proof is organized a bit differently, avoiding the a priori estimate
of Lemma 1.4.12. The interpolation method of the first proof of Theorem
2.4.2 is at the center of the argument, so the reader should feel comfortable
with this proof in order to proceed. We recall the Hamiltonian (2.76) and
we denote by · s an average for the corresponding Gibbs measure. In the
2.5 Exponential Inequalities 187

proof O(k) will denote a quantity A = AN such that |A| ≤ KN −k/2 where
K does not depend on N or s, and we will take for granted that Theorems
2.5.1 and 2.5.2 hold uniformly over s. (This fact is left as a good exercise for
the reader.)
Consider the following quantities
1 
A(s) = log Eξ exp(−HN,M,s (σ))
N σ
√ √ s
RS(s) = E log 2ch(z r) + αE log Eξ exp u(z q + ξ 1 − q) − r(1 − q)
2
V (s) = A(s) − RS(s)
√ √
b(s) = E(log ch(z r))2 − (E log ch(z r))2 − rqs .

The quantities EA(s), RS(s) and b(s) are simply the quantities corresponding
for the interpolating system respectively to the quantities pN,M (u), pu , and
b. Fixing k, we set
ψ(s) = EV (s)k .
We aim at proving by induction over k that ψ(s) = (b(s)/N )k/2 a(k)+O(k+1),
which, for s = 1, proves the theorem. Consider ϕ(s, a) = E(A(s)−a)k , so that
ψ(s) = ϕ(s, RS(s)) and by straightforward differentiation ∂ϕ/∂s is given by
the quantity
 
k   Sj θj
  σi √
E √ −√ u (Sj,s ) − √ zi r (A(s) − a) k−1
,
2N
j≤M
s 1−s i≤N
1−s s

√ √
where Sj,s = sSj + 1 − sθj . Next, defining Sj,s 
as usual we claim that
∂ϕ/∂s = I + II, where
 
k 1 
I= E − (R1,2 − q)u (Sj,s )u (Sj,s ) − r(1 − R1,2 ) (A(s) − a)
1 2 k−1
2 N s
j≤M

and II is the quantity


 
k(k − 1) 1 
E (R1,2 − q)u (Sj,s
1 2
)u (Sj,s ) − rR1,2 (A(s) − a)k−2 .
2N N s
j≤M

This follows by integrating by parts as in the proof of (2.83). The term I


is created by the dependence of the bracket · s on the r.v.s Sj , θj and zi ,
and the term II by the dependence on these variables of A(s). We note the
obvious identity I = III + IV where
   
k 1 
III = − E (R1,2 − q) u (Sj,s )u (Sj,s ) − r
1 2
(A(s) − a)k−1
2 N s
j≤M
188 2. The Perceptron Model

and
kr(1 − q)
IV = − E((A(s) − a)k−1 ) .
2
Similarly we have also II = V + VI where V is the quantity
   
k(k − 1) 1 
E (R1,2 − q) u (Sj,s )u (Sj,s ) − r
1 2
(A(s) − a)k−2
2N N s
j≤M

and
rq
VI = − k(k − 1)E((A(s) − a)k−2 ) .
2N
Now,
d ∂ϕ ∂ϕ
ψ (s) = ϕ(s, RS(s)) = (s, RS(s)) + RS (s) (s, RS(s)) . (2.107)
ds ∂s ∂a
Since RS (s) = −r(1 − q)/2 and ∂ϕ/∂a(s, RS(s)) = −kEv(s)k−1 , the second
term of (2.107) cancels out with the term IV and we get

ψ (s) = VII + VIII + IX (2.108)

where
   
k 1 
VII = − E (R1,2 − q) u (Sj,s )u (Sj,s ) − r
1 2
V (s)k−1
2 N s
j≤M
   
k(k − 1) 1 
VIII = E (R1,2 − q) u (Sj,s )u (Sj,s ) − r
1 2
V (s)k−2
2N N s
j≤M
rq
IX = − k(k − 1)EV (s)k−2 .
2N

The idea is that each of the factors R1,2 − q, (N −1 j≤M u (Sj,s
1 2
)u (Sj,s ) − r)
−1/2
and V (s) “counts as N ”. This follows from Theorems 2.5.1 and 2.5.2 for
the first two terms, but we have not proved it yet in the case of V (s). (In the
case of Theorem 1.4.11, the a priori estimate of Lemma 1.4.12 showed that
V (s) “counts as N −1/2 ”.) Should this be indeed the case, the terms VII and
VIII will be of lower order O(k + 1). We turn to the proof that this is actually
the case.
A first step is to show that
K(k) k−1 K(k) k−2
VII ≤ (E|V (s)|k ) k ; VIII ≤ 2
(E|V (s)|k ) k . (2.109)
N N
In the case of VII, setting A = R1,2 − q and
1 
B= 1
u (Sj,s 2
)u (Sj,s )−r
N
j≤M
2.5 Exponential Inequalities 189

we write, using Hölder’s inequality and Theorems 2.5.1 and 2.5.2:


k−1
E( AB s V (s)k−1 ) ≤ E A2k s1/2k E B 2k s1/2k (E|V (s)|k ) k

K(k) k−1
≤ E|V (s)|k ) k .
N
We proceed in a similar manner for VIII, i.e. we write that
k−2
E( AB s V (s)k−1 ) ≤ E |A|k 1/k
s E |B| s (E|V (s)| )
k 1/k k k

K(k) k−2
≤ (E|V (s)|k ) k ,
N
and this proves (2.109).
Since xy ≤ xτ1 + y τ2 for τ2 = k/(k − 2) and τ1 = k/2 we get
1 k−2 1
(E|V (s)|k ) k ≤ k/2 + E|V (s)|k .
N N
This implies in particular
 
K(k) k−2 1
IX ≤ (E|V (s)|k ) k ≤ K(k) + E|V (s)| k
N N k/2
and
   
K(k) 1 1
VIII ≤ + E|V (s)| k
≤ K(k) + E|V (s)| k
.
N N k/2 N k/2
Next, we use that xy ≤ xτ1 + y τ2 for τ2 = k/(k − 1) and τ1 = k to get
1 k−1 1 1
(E|V (s)|k ) k ≤ k + E|V (s)|k ≤ k/2 + E|V (s)|k .
N N N
When k is even (so that |V (s)|k = V (s)k and E|V (s)|k = ψ(s)) we have
proved that  
1
ψ (s) ≤ K(k) + ψ(s) . (2.110)
N k/2
Thus (2.110) and Lemma A.13.1 imply that
 
1
ψ(s) ≤ K(k) ψ(0) + .
N k/2

Since it is easy (as the spins decouple) to see that ψ(0) ≤ K(k)N k/2 , we
have proved that for k even we have EV (s)k = O(k). Since E|V (s)|k ≤
(EV (s)2k )1/2 this implies that E|V (s)|k = O(k) for each k so that by (2.109)
we have VII = O(k + 1) and VIII = O(k + 1). Thus (2.108) yields
rq
ψ (s) = − k(k − 1)EV (s)k−2 + O(k + 1)
2N
b (s) k
= (k − 1)EV (s)k−2 + O(k + 1) .
N 2
190 2. The Perceptron Model

As in Theorem 1.4.11, one then shows by induction over k that


 k/2
k b(s)
EV (s) = a(k) + O(k + 1) ,
N
using that this is true for s = 0, which is again proved as in Theorem 1.4.11.



Exercise 2.5.4. Rewrite the proof of Theorem 1.4.11 without using the a
priori estimate of Lemma 1.4.12. This allows to cover the case where the r.v.
h is not necessarily Gaussian.

Research Problem 2.5.5. (Level 1+ ) Prove the result corresponding to


Theorem 1.7.1 for the present model.

This problem has really two parts. The first (easier) part is to prove results
for the present model. For this, the approach of “separating the numerator
from the denominator” as explained in Section 9.1 seems likely to succeed.
The second part (harder) is to find arguments that will carry over when we
will have much less control over u as in Chapter 9. For this second
√ part, the
work is partially done in [100], but reaching only the rate 1/ N rather than
the correct rate 1/N .

Research Problem 2.5.6. (Level 2) For the present model prove the TAP
equations.

These equations have two parts. One part expresses σi as a function of


( u (Sk ) )k≤M , and one part expresses u (Sk ) as a function of ( σi )i≤N . It
is (perhaps) not too difficult to prove these equations when one has a good
control over all derivatives of u, but it might be another matter to prove
something as precise as Theorem 1.7.7 in the setting of Chapter 9.

2.6 Notes and Comments


The problems considered in this chapter are studied in [63] and [52].
It is predicted in [90] that the replica-symmetric solution holds up to
α∗ , so Problem 2.1.1 amounts to controlling the entire replica-symmetric
(=“high-temperature”) region, typically a very difficult task.
It took a long time to discover the proof of Theorem 2.4.1. The weaker
methods developed previously [148] for this model or for the SK and the
Hopfield models just would not work. During this struggle, it became clear
that the smart path method as used here was a better way to go for these
three models.
3. The Shcherbina and Tirozzi Model

3.1 The Power of Convexity

In the present model the configuration space is RN , that is, the configuration
σ can be any point in RN . Given another integer M , we will consider the
Hamiltonian
  1   
− HN,M (σ) = u √ gi,k σi + h gi σi − κσ2 . (3.1)
k≤M
N i≤N i≤N


Here σ2 = i≤N σi2 , (gi,k )i≤N,k≤M and (gi )i≤N are independent standard
Gaussian r.v.s and κ > 0, h ≥ 0. We will always assume

u ≤ 0 , u is concave. (3.2)

To get a feeling for this Hamiltonian, let us think of u such that, for a
certain number τ , u(x) = −∞ if x < τ and u(x) = 0 if x ≥ τ . Then it
is believable that the Hamiltonian (3.1) will teach us something about the
region in RN defined by
1 
∀k ≤ M , √ gi,k σi ≥ τ . (3.3)
N i≤N

This region has a natural meaning: it is the intersection of M half-spaces of


random directions, each of which is determined by an hyperplane at distance
(about) τ from the origin. It is for√
the purpose of computing “the proportion”
of the sphere SN = {σ ; σ = N } that belongs to the region (3.3) that
the Hamiltonian (3.1) was introduced in [133]. This generalizes
 the problem
considered in (0.2), where we had τ = 0. The term h i≤N gi σi is not neces-
sary for this computation, but it is not a real trouble either, and there is no
reason to deprive the reader from the added charm it brings to the beautiful
formulas the Hamiltonian (3.1) will create. We will always assume that h ≥ 0.
There are obvious connections between the present model and the model of
Chapter 2. As in Chapter 2 the important case is when M is proportional
to N .
This Hamiltonian is a convex function of σ. In fact

M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 191
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 3, © Springer-Verlag Berlin Heidelberg 2011
192 3. The Shcherbina and Tirozzi Model
 ' '
1 ' x − y '2
x+y
∀x, y ∈ RN , (HN,M (x) + HN,M (y)) − HN,M ≥ κ' ' 2 ' .
'
2 2
(3.4)
The beauty of the present model is that it allows the use of powerful tools
from convexity, from which a very strong control of the overlaps will follow.
The overlaps are defined as usual, by

σ · σ
R, = .
N
The case  =  is now of interest, R, = σ  2 /N . Let us consider the Gibbs’
measure G on RN with Hamiltonian HN,M , that is, for any subset B of RN ,

1
G(B) = exp(−HN,M (σ))dσ , (3.5)
ZN,M B

where dσ denotes Lebesgue’s measure and ZN,M = exp(−HN,M (σ))dσ is
the normalization factor. As usual, we denote by · an average for this Gibbs
measure, so that G(B) = 1B . We use the notation ν(f ) = E f .
The goal of this section is to prove the following.

Theorem 3.1.1. Assume that for a certain number D we have

∀x , u(x) ≥ −D(1 + |x|) . (3.6)

|u | ≤ D ; |u | ≤ D . (3.7)
Then for k ≤ N/4 we have
 k
  Kk
ν (R1,1 − ν(R1,1 )) 2k
≤ (3.8)
N
 k
  Kk
ν (R1,2 − ν(R1,2 ))2k ≤ , (3.9)
N

where K does not depend on N or k.

There is of course nothing special in the value N/4 which is just a convenient
choice. We could replace the condition k ≤ N/4 by the condition k ≤ AN
for any number A, with now a constant K(A) depending on A.
The basic reason why in Theorem 3.1.1 one does not control all moments
is that moments of high orders are very sensitive to what happens on very
small sets or very rare events. For example moments of order about N are
very sensitive to what happens on “events of size exp(−N/K)”. Controlling
events that small is difficult, and is quite besides our main goal. Of course
one can dream of an entire “large deviation theory” that would describe the
extreme situations that can occur with such rarity. In the present model, and
3.1 The Power of Convexity 193

well as in the other models considered in the book, such a theory remains
entirely to be built.
Theorem 3.1.1 asserts that the overlaps are nearly constant. For many of
the systems studied in this book, it is a challenging task to prove that the
overlaps are nearly constant, and this requires a “high-temperature” condi-
tion. In the present model, no such condition is necessary, so one might say
that the system is always in a high-temperature state. One would expect
that it is then a simple matter to completely understand this system, and in
particular to compute
1
lim E log exp(−HN,M (σ))dσ . (3.10)
N →∞,M/N →α N

This, however, does not seem to be the case. At the present time we know how
to handle only very special situations, and the reasons for this will become
apparent as the reader progresses through the present chapter.

Research Problem 3.1.2. (Level 2+). Under the conditions of Theorem


3.1.1, compute the limit in (3.10).

The fundamental fact about convexity theory is the following functional


version of the Brunn-Minkowski theorem. A very clean proof can be found in
([93], Theorem 2.13). For the convenience of the reader, this proof is repro-
duced in Appendix A.15.

Theorem 3.1.3. Consider non-negative functions U, V, W on RN and a


number 0 < s < 1, and assume that for all x, y in RN we have

W (sx + (1 − s)y) ≥ U (x)s V (y)1−s . (3.11)

Then  s  1−s
W (x)dx ≥ U (x)dx V (x)dx . (3.12)

Consider sets A and B. The functions U = 1A , V = 1B and W =


1sA+(1−s)B satisfy (3.11). Writing VolA = A dx, we deduce from (3.12)
that
Vol(sA + (1 − s)B) ≥ (VolA)s (VolB)1−s , (3.13)
the Brunn-Minkowski inequality.
B. Maurey discovered that Theorem 3.1.3 implies the following sweeping
generalization of Theorem 1.3.4.

Theorem 3.1.4. Consider a function H on RN , and assume that for some


number κ > 0 we have (3.4) i.e.
  ' '
1 x+y ' x − y '2
∀x, y ∈ R N
, (H(x) + H(y)) − H '
≥ κ' ' .
2 2 2 '
194 3. The Shcherbina and Tirozzi Model

Consider the probability measure μ on RN with density proportional to


exp(−H(x)) with respect to the Lebesgue measure. Then for any set B ⊂ RN
we have
κ 1
exp d2 (x, B)dμ(x) ≤ , (3.14)
2 μ(B)
where d(x, B) = inf{d(x, y); y ∈ B} is the distance from x to B. Moreover,
if f is a function on RN with Lipschitz constant A, i.e. it satisfies

∀x, y ∈ RN , |f (x) − f (y)| ≤ Ax − y , (3.15)

then  2
κ
exp f (x) − f dμ dμ(x) ≤ 4 (3.16)
8A2
and
 2k  k
8kA2
∀k ≥ 1 , f (x) − f dμ dμ(x) ≤ 4 . (3.17)
κ

The most striking feature of the inequalities (3.16) and (3.17) is that they
do not depend on the dimension of the underlying space. When H(x) =
x2 /2, μ is the canonical Gaussian measure and (3.17) recovers (1.47) (with
worse constants).
Proof. Define the functions W, U, V as follows:
κ 
W (x) = exp(−H(x)) ; V (y) = exp d(y, B)2 − H(y)
2
and
U (x) = 0 if x ∈
/B
U (x) = exp(−H(x)) if x ∈ B .
These functions satisfy (3.11) with s = 1/2. Indeed, it suffices to consider the
case where x ∈ B, in which case (3.11) reduces to
  
x+y 1 κ
−H ≥ −H(x) − H(y) + d(y, B)2 ,
2 2 2

which follows from (3.4) and the fact that d(y, B) ≤ x − y. Then (3.12)
holds, and for the previous choices it means exactly (3.14).
To prove (3.16) we consider a median m of f for μ, that is number m
such that μ({f ≤ m}) ≥ 1/2 and μ({f ≥ m}) ≥ 1/2. The set B = {f ≤ m}
then satisfies μ(B) ≥ 1/2 and since (3.15) implies

f (x) ≤ m + Ad(x, B)

it follows from (3.14) that


3.1 The Power of Convexity 195

κ
exp (f (x) − m)2 dμ(x) ≤ 2 . (3.18)
{f ≥m} 2A2

Proceeding in a similar manner to control the integral over the set {f ≤ m}


we get
κ
exp (f (x) − m)2 dμ(x) ≤ 4 . (3.19)
2A2
The convexity of the map x → exp x2 shows that
1 1
exp (x + y)2 ≤ (exp x2 + exp y 2 ) .
4 2
Since |f (x) − f (y)| ≤ |f (x) − m| + |f (y) − m| we deduce from (3.19) that

κ
exp (f (x) − f (y))2 dμ(x)dμ(y) ≤ 4 ,
8A2

from which (3.16) follows using Jensen’s inequality, averaging in y in the


exponential rather than outside. To prove (3.17) we relate as usual exponen-
tial integrability and growth of moments. We write that if x ≥ 0 we have
xk /k! ≤ exp x so that
xk ≤ k k exp x (3.20)
and hence  k
8kA2 κ 2
y 2k ≤ exp y . 

κ 8A2

Let us point out that in Theorem 3.1.4 the function H can take the value
+∞. Equivalently, this theorem holds when μ is a probability on a convex
set C with a density proportional to exp ψ(σ), where ψ satisfies
  ' '
1 x+y ' x − y '2
(ψ(x) + ψ(y)) − ψ ≤ −κ '
' 2 ' .
' (3.21)
2 2

The argument that allows to deduce (3.16) from (3.19) is called a sym-
metrization argument. This argument proves also the following. For each
number m, each function f and each probability μ we have
 2k
f− f dμ dμ ≤ 22k (f − m)2k dμ . (3.22)

To see this we simply write, using Jensen’s inequality in the second line and
that (a + b)2k ≤ 22k−1 (a2k + b2k ) in the third line,
196 3. The Shcherbina and Tirozzi Model
 2k  2k
f− f dμ dμ = f (x) − f (y)dμ(y) dμ(x)

2k
≤ (f (x) − f (y)) dμ(x)dμ(y)

2k
= ((f (x) − m) − (f (y) − m)) dμ(x)dμ(y)

≤ 22k (f − m)2k dμ .

The essential feature of the present model is that any realization of the
Gibbs measure with Hamiltonian (3.1) satisfies (3.16) and (3.17). We will
need to use (3.16) for functions such as x2 that are not Lipschitz on RN ,
but are Lipschitz when x is not too large. For this, it is useful to know that
the Gibbs
√ measure with Hamiltonian (3.1) essentially lives on a ball of radius
about N , and the next two lemmas prepare for this. In this chapter and
the next, we will use many times the fact that
 N
2 2 π N/2
exp(−tσ )dσ = exp(−tx )dx = . (3.23)
t

Lemma 3.1.5. Consider a probability μ on RN such that for any subset B


of RN we have
  
1
μ(B) = exp U (σ) − κσ2 + ai σi dσ ,
Z B
i≤N

where U ≤ 0 and where Z is the normalizing factor. Then


 N/2  
κ 1 2π 1  2
exp σ2 dμ(σ) ≤ exp ai .
2 Z κ 2κ
i≤N

Proof. Using the definition of μ in the first line, that U ≤ 0 in the second
line, completing the squares in the third line and using (3.23) in the last line,
we obtain
  
κ 1 κ
exp σ2 dμ(σ) = exp U (σ) − σ2 + ai σi dσ
2 Z 2
i≤N
  
1 κ
≤ exp − σ + 2
ai σi dσ
Z 2
i≤N
 
1 κ ai 2 1  2
= exp − σi − + ai dσ
Z 2 κ 2κ
i≤N i≤N
   
1 1  2 κ
= exp ai exp − σ dσ 2
Z 2κ 2
i≤N
3.1 The Power of Convexity 197
 N/2  
1 2π 1  2
= exp ai .
Z κ 2κ
i≤N

This concludes the proof. 



In order to use Lemma 3.1.5 when μ is Gibbs’ measure (3.5) we need an
upper bound for 1/ZN,M .

Lemma 3.1.6. Assume (3.6), that is u(x) ≥ −D(1 + |x|) for a certain num-
ber D and all x. Then we have
 ( 
1 κ N/2 M 
≤ exp D M + 2
gi,k .
ZN,M π κN
i≤N,k≤M

Proof. The proof relies on the rotational invariance of the Gaussian mea-
sure γ on RN of density (κ/π)N/2 exp(−κσ2 ) with respect to Lebesgue’s
measure. For x ∈ RN we have
 
1 1
|x · σ|dγ(σ) = x ≤ x , (3.24)
πκ κ
because the rotational invariance of γ reduces this to the case N = 1.
Letting gk = (gi,k )i≤N , we have
    
gk · σ
ZN,M = exp u √ − κσ + h
2
gi σi dσ
k≤M
N i≤N
   
π N/2 gk · σ 
= exp u √ +h gi σi dγ(σ)
κ N
k≤M i≤N
        
π N/2 gk · σ
≥ exp u √ +h gi σi dγ(σ)
κ N
k≤M i≤N
     
π N/2 gk · σ
= exp u √ dγ(σ) ,
κ N
k≤M

using Jensen’s inequality in the third line and since σi dγ(σ) = 0. Now,
using (3.6) and (3.24) for x = gk yields
   
 gk · σ 1  gk 
u √ dγ(σ) ≥ −D M + √ √
N κ N
k≤M k≤M
  ( 
1 M 
≥ −D M + √ gk 2 ,
κ N
k≤M

using the Cauchy-Schwarz inequality. 



198 3. The Shcherbina and Tirozzi Model

We will often assume that


κ ≥ κ0 , 0 ≤ h ≤ h0 , M ≤ 10N (3.25)
where κ0 and h0 are given numbers. The condition M ≤ 10N is simply to
avoid trivial complications, and it contains the case relevant to the compu-
tation of the part of the sphere SN that belongs to the region (3.3).
In the entire chapter we make the convention that K denotes a number
that might depend on κ0 , h0 , D but that does not depend on M and N or
on any other parameter. This number does not need to be the same at each
occurrence.
The following is an immediate consequence of Lemmas 3.1.5 and 3.1.6.
Corollary 3.1.7. Under (3.6) and (3.25) we have
 (  
 κ  
exp σ ≤ exp K N +
2 2
gi,k + 2
gi . (3.26)
2
i≤N,k≤M i≤N

We set (  
B∗ = N + 2 +
gi,k gi2 . (3.27)
i≤N,k≤M i≤N

It will help in all the forthcoming computations to think of B ∗ as being


≤ KN for all practical purposes. In other words, the event where this is not
the case is so rare as being irrelevant for the questions we pursue. This will
be made precise in Lemma 3.1.10 below.
With the notation (3.27) we rewrite (3.26) as
κσ2
exp ≤ exp KB ∗ . (3.28)
2
This inequality is a sophisticated way to√express that the Gibbs’ measure “is
basically supported by a ball of radius K N ”. The following simple fact from
Probability theory will help to exploit this inequality in terms of moments.
Lemma 3.1.8. Consider a r.v. X ≥ 0 and C = log E exp X. Then for each
k we have
EX k ≤ 2k (k k + C k ) . (3.29)
Proof. By definition of C,
E exp(X − C) = 1
so that if x+ = max(x, 0) we have exp(X − C)+ ≤ 1 + exp(X − C) and hence
E exp(X − C)+ ≤ E(1 + exp(X − C)) = 2 .
Since by (3.20) we have xk ≤ k k ex for x ≥ 0, we get
E(X − C)+k ≤ 2kk .
Now X ≤ (X − C)+ + C and (a + C)k ≤ 2k−1 (ak + C k ). 

3.1 The Power of Convexity 199

Corollary 3.1.9. For k ≤ 4N we have


σ2k ≤ (KB ∗ )k . (3.30)
As in the case of Theorem 3.1.1, there is nothing specific here in the choice
of the number 4 in the inequality k ≤ 4N . We can replace the condition
k ≤ 4N by the condition k ≤ AN for any number A (with a constant K(A)
depending on A). The same comment applies to many results of this section.
Let us also note that the fact that (3.30) holds for each k ≤ 4N is equivalent
to saying that it holds for k = 4N , by Hölder’s inequality.
Proof. We use (3.29) in the probability space given by Gibbs’ measure. If
X = κσ2 /2, then (3.28) implies log exp X ≤ KB ∗ and (3.29) then implies
X k ≤ 2k (k k + (KB ∗ )k ). Since k ≤ 4N ≤ 4B ∗ , we finally get
X k ≤ 2k ((4B ∗ )k + (KB ∗ )k ) ≤ ((8 + 2K)B ∗ )k , (3.31)
and this finishes the proof. 

As the reader is getting used to the technique of denoting by the letter
K an unspecified constant, we will soon no longer fully detail trivial bounds
such as (3.31). Rather we will simply write “since k ≤ 4N ≤ 4B ∗ we have
X k ≤ 2k (k k + (KB ∗ )k ) ≤ (KB ∗ )k ”.
Lemma 3.1.10. For k ≤ N we have
EB ∗k ≤ (KN )k . (3.32)
√ 
Proof. Using that 2 x ≤ x/a + a for x = i≤N,k≤M gi,k and a = N , and
2

then using (A.11) and independence, we get


 
B∗ N N 1  1 2
E exp ≤ exp + + 2
gi,k + gi (3.33)
4 4 2 8 4
i≤N,k≤M i≤N
  M N  N
3N 1 1
≤ exp ≤ exp LN ,
4 1 − 1/4N 1 − 1/2
and we use (3.29) for X = B ∗ /4. 

After these preliminaries, we turn to the central argument, the use of
Theorem 3.1.4 to control the overlaps. The idea is simply
√ that since Gibbs’
measure is essentially supported by a ball of radius B ∗ centered at the
√ pretend that the functions R1,2 and R1,1 have a
origin, we can basically
Lipschitz constant ≤ B ∗ /N and use (3.17).
Theorem 3.1.11. For k ≤ N we have
 k
KkB ∗
(R1,2 − R1,2 )2k ≤ , (3.34)
N2
 k
KkB ∗
(R1,1 − R1,1 ) 2k
≤ . (3.35)
N2
200 3. The Shcherbina and Tirozzi Model

Proof. We write b = σ , so that R1,2 = b2 /N , and


 1 2   
σ · σ σ 1 · b   σ 1 · b b · b 

|R1,2 − R1,2 | ≤  − + − . (3.36)
N N   N N 

If we fix σ 1 , the map f : x → σ 1 · x/N satisfies (3.15) with A = σ 1 /N , so


that by (3.17) we get
 2k  k
σ1 · σ2 σ1 · b Kkσ 1 2
− dG(σ 2 ) ≤ ,
N N N2

and therefore, integrating the previous inequality for σ 1 with respect to G,


 2k  k
σ1 · σ2 σ1 · b KkB ∗
− dG(σ )dG(σ ) ≤
1 2
N N N2

using (3.30). The second term on the right-hand side of (3.36) is handled
similarly, using now that b2k ≤ (KB ∗ )k by (3.30) and Jensen’s inequality.
To prove (3.35), let us consider a parameter a to be chosen later and let

f (σ) = min(σ2 /N, a2 /N ) = (min(σ, a))2 /N .

This function satisfies (3.15) for A = 2a/N , so that by (3.17) we get


 k
Ka2 k
(f − f )2k ≤ . (3.37)
N2

Let ϕ(σ) = σ2 /N − f (σ), so that

σ2
|ϕ(σ)| ≤ 1{σ≥a} ,
N
and, using (3.22) for m = 0 in the first inequality and the Cauchy-Schwarz
inequality in the second line,
 2k
σ2
(ϕ − ϕ ) 2k
≤2 2k
ϕ2k
≤2 2k
1{σ≥a} (3.38)
N
 4k 1/2
σ2
≤ 22k 1{σ≥a} 1/2
.
N

Using (3.30) (for k = 4k ≤ 4N rather than for k) we obtain


 4k 2k 
KB ∗
1/2
σ2
≤ (3.39)
N N

and using (3.28) we see that if “we choose a = K B ∗ ” then
3.1 The Power of Convexity 201

1{σ≥a} ≤ exp(−2B ∗ ) . (3.40)

Again, here, to understand what this means the reader must keep in mind
that the letter K might denote different constants at different occurrences.
The complete argument is that if

κσ2
exp ≤ exp K1 B ∗ ,
2

then
κa2 
1{σ≥a} ≤ exp K1 B ∗ − ,
2

so that (3.40) holds for a = K2 B ∗ whenever K2 ≥ 2(K1 + 2)/κ.
Therefore with this choice of a we have, plugging (3.40) and (3.39) into
(3.38),
 2k
∗ KB ∗
(ϕ − ϕ ) 2k
≤ exp(−B ) .
N
Since R1,1 = σ2 /N = f + ϕ, using that (x + y)2k ≤ 22k (x2k + y 2k ) and
(3.37) we get the estimate
 
(R1,1 − R1,1 )2k ≤ 22k (f − f )2k + (ϕ − ϕ )2k
 k  2k
KB ∗ k ∗ KB ∗
≤ + exp(−B ) .
N2 N

We deduce from (3.20) that


 k
k
exp(−y) ≤ (3.41)
y

so that
 2k  k  2k  k
∗ KB ∗ k KB ∗ K 2B∗k
exp(−B ) ≤ = ,
N B∗ N N2

and the result follows. 



Combining with Lemma 3.1.10 we get the following.

Proposition 3.1.12. For k ≤ N we have


 k
Kk
E (R1,2 − R1,2 ) 2k
≤ (3.42)
N
 k
Kk
E (R1,1 − R1,1 )2k ≤ . (3.43)
N
202 3. The Shcherbina and Tirozzi Model

A further remarkable property to which we turn now is that the random


quantities R1,2 and R1,1 are nearly constant. A general principle that we
will study later (the Ghirlanda-Guerra identities) implies that (in some sense)
this “near constancy”is an automatic consequence of Proposition 3.1.12. On
the other hand, in the specific situation considered here, Shcherbina and
Tirozzi discovered ([134]) a special argument that gives a much better rate
of convergence than general principles. The idea is that if we think of R1,2
and R1,1 as functions of the Gaussian r.v.s (gi,k )i≤N,k≤M and (gi )i≤N√, they
are essentially Lipschitz functions with Lipschitz constant of order 1/ N ; so
that we can use (3.17). We need however to work a bit more before we can
show this. We recall the notation b = σ and we will use not only (3.6) but
also (3.7).
Lemma 3.1.13. For any random function f on RN we have
 (σ − b)f (σ)  ≤ K f 2 1/2
. (3.44)
Consequently
(σ 1 − b) · (σ 2 − b)f (σ 1 )f (σ 2 ) =  (σ − b)f (σ) 2 ≤ K f 2 . (3.45)
Here the function f is permitted to depend on the randomness gi,k , gi .
Proof. For any y ∈ RN , using the Cauchy-Schwarz inequality, we see that
(σ − b)f (σ) · y = (σ − b) · yf (σ)
≤ ((σ − b) · y)2 1/2 f 2 1/2
.
We then use (3.17) for f (σ) = σ · y and k = 1 to see that ((σ − b) · y)2 ≤
Ky2 , so that combining with the above we get
(σ − b)f (σ) · y ≤ Ky f 2 1/2
.
Since this holds for any value of y, (3.44) follows. 

We denote by
B the operator norm of the matrix (gi,k )i≤N,k≤M , (3.46)
so that for any sequences (xi )i≤N and (yk )k≤M we have
   1/2   1/2
gi,k xi yk ≤ B x2i yk2 ,
i≤N,k≤M i≤N k≤M

and, equivalently,
  2 1/2  1/2
gi,k yk ≤B yk2 . (3.47)
i≤N k≤M k≤M

It is useful to think that for all practical purposes we have B 2 ≤ KN , as


is shown in Lemma A.9.1. 
We recall the standard notation Sk = N −1/2 i≤N gi,k σi .
3.1 The Power of Convexity 203

Lemma 3.1.14. Given any y = (yk )k≤M ∈ RM , the function



σ → f (σ) = u (Sk )yk
k≤M

 1/2 √ √
has a Lipschitz constant A ≤ KB k≤M yk2 / N = KB y/ N .
Proof. Since
∂ 1 
f (σ) = √ gi,k u (Sk )yk ,
∂σi N k≤M

the length of the gradient of f is


  2 1/2  1/2
1   KB  2
gi,k u (Sk )yk ≤ √ yk
N N k≤M
i≤N k≤M

by (3.47) and since |u (Sk )| ≤ D. 




Lemma 3.1.15. Let us denote by U = U(σ) the M -dimensional vector


(u (Sk ))k≤M . Then for any random function f on RN we have
KB 2
 (U − U )f  ≤ √ f 1/2
(3.48)
N
and, consequently
2
KB
(U(σ 1 ) − U )(U(σ 2 ) − U )f (σ 1 )f (σ 2 ) ≤ f2 . (3.49)
N
Proof. It is identical to the proof of Lemma 3.1.13. If y ∈ RM , then
(U − U )f · y = (U − U ) · yf
≤ ((U − U ) · y)2 1/2
f2 1/2
.
Using Lemma 3.1.14 and applying√(3.17) to f (σ) = U(σ) · y, we obtain that
((U − U ) · y)2 1/2 ≤ KB y/ N . Therefore
KB y 2
(U − U )f · y ≤ √ f 1/2
,
N
and this yields (3.48). Furthermore, the left-hand side of (3.49) is the square
of the left-hand side of (3.48). 


Proposition 3.1.16. Let us denote by ∇ the gradient of R1,1 (resp.


R1,2 ) when this quantity is seen as a function of the numbers (gi,k )i≤N,k≤M
and (gi )i≤N . Then, recalling the quantities B ∗ of (3.27) and B of (3.46) we
have  ∗ 
B B ∗2 B 2
∇ ≤ K
2
+ .
N2 N4
204 3. The Shcherbina and Tirozzi Model

If we think that B and B 2 are basically of order N , this shows that ∇2
is about √
1/N , i.e. that the functions R1,2 and R1,1 have Lipschitz constants
about 1/ N .
Proof. With the customary abuse of notation we have
∂ 1
R1,1 = √ ( R1,1 σi1 u (Sk1 ) − R1,1 σi1 u (Sk1 ) )
∂gi,k N
1
= √ ( f (σ 1 )σi1 u (Sk1 ) ) ,
N
where f (σ 1 ) = R1,1 − R1,1 . We define σ̇i1 = σi1 − σi and u̇ (Sk1 ) = u (Sk1 ) −
u (Sk1 ) . Since f = 0 the identity

f (σ 1 )σi1 u (Sk1 ) = f (σ 1 )σ̇i1 u (Sk1 ) + σi f (σ 1 )u̇ (Sk1 )

holds. Thus
 ∂ 2
1 
R1,1 = f (σ 1 )σi1 u (Sk1 ) 2
≤ 2(I + II) ,
∂gik N
i,k i,k

where, using replicas


1 
I= f (σ 1 )σ̇i1 u (Sk1 ) 2
N
i≤N,k≤M
1 
= f (σ 1 )f (σ 2 )σ̇i1 σ̇i2 u (Sk1 )u (Sk2 )
N
i≤N,k≤M
1 
= f (σ 1 )f (σ 2 )(σ 1 − b) · (σ 2 − b)u (Sk1 )u (Sk2 ) ,
N
k≤M

and
 
1  2
II = σi f (σ 1 )u̇ (Sk1 ) 2
N
i≤N k≤M
  
1
= σi 2 f (σ 1 )f (σ 2 )u̇ (Sk1 )u̇ (Sk2 )
N
i≤N k≤M
  
1
= σi 2 f (σ 1 )f (σ 2 )(U(σ 1 ) − U ) · (U(σ 2 ) − U ) .
N
i≤N

Using (3.35) for k = 1 we get


KB
f2 ≤ . (3.50)
N2
We use (3.45) with f (σ 1 )u (Sk1 ) instead of f to get, since |u | ≤ D and
M ≤ N,
3.1 The Power of Convexity 205

KM 2 KB ∗
I≤ f ≤ .
N N2
We note that (3.30) used for k = 1 implies

1  1  2 1 KB ∗
R1,2 = σi 2
≤ σi = σ2 ≤ . (3.51)
N N N N
i≤N i≤N

We use (3.49) and (3.50) to get


  
1 2 KB
2
KB 2 B ∗2
II ≤ σi f2 ≤ .
N N N4
i≤N

We take care in a similar manner of the term ∂ R1,1 /∂gi , and the case of
R1,2 is similar. 


Proposition 3.1.17. For k ≤ N/4 we have


 k
Kk
E( R1,1 − E R1,1 )2k ≤ (3.52)
N
 k
Kk
E( R1,2 − E R1,2 ) 2k
≤ . (3.53)
N

Proof. We consider the space RN ×M × RN , in which we denote the generic


point by g = ((gi,k )i≤N,k≤M , (gi )i≤N ). We provide this space with the canon-
ical Gaussian measure γ. Integration with respect to this measure means that
we take expectation in the (gi,k ), (gi ) seen as independent standard Gaussian
r.v.s. Let us consider the convex set

C = {g; B ∗ ≤ LN ; B 2 ≤ LN } ,

where we have chosen the number L large enough that

P(C c ) ≤ L exp(−N ) . (3.54)

(To see that this is possible we recall Lemma A.9.1 and that E exp B ∗ /4 ≤
exp LN by (3.33).) Let us think of R1,1 (resp. R1,2 ) as a function f (g),
√∇f  ≤
so that by Proposition 3.1.16, on C the gradient ∇f of f satisfies 2

K/N , and since C is convex f satisfies (3.15) on C with A = K/ N .


Consider the probability measure γ on C with density proportional to
 
1  1 2
W = exp − gi,k −
2
gi . (3.55)
2 2
i≤N,k≤M i≤N

By (3.17) we have
206 3. The Shcherbina and Tirozzi Model
 k
Kk
∀k ≥ 1 , (f − m)2k dγ ≤ , (3.56)
N

where m = f dγ . The rest of the proof consists simply in checking as
expected that the set C c is so small that (3.52) and (3.53) follow from (3.56).
This is tedious and occupies the next half page. By definition of γ , for any
function h we have
 
hW dσ hdγ
hdγ =  C
= C .
C
W dσ γ(C)

Thus
 k
  Kk
E 1C (f − m)2k = (f − m) dγ = γ(C)
2k
(f − m) dγ ≤ 2k
,
C C N

and
   
E(f − m)2k = E 1C (f − m)2k + E 1C c (f − m)2k
 k
Kk  
≤ + E 1C c (f − m)2k
N
 k
Kk  1/2
≤ + P(C c )1/2 E(f − m)4k .
N

Using (3.51), we see that |f | ≤ KB
 /N , and since γ is4ksupported by C and

B ≤ LN on C we have |m| = | f dγ | ≤ K. Also (Ef )1/2 ≤ K k by (3.30)
and (3.32). Therefore (E(f − m)4k )1/2 ≤ K k . Hence, recalling that by (3.54)
we have P(C c ) ≤ exp(−N ) and using that exp(−N/2) ≤ (2k/N )k by (3.41)
we obtain
 k    k
Kk N Kk
E(f − m)2k ≤ + L exp − Kk ≤
N 2 N

for k ≤ N . The conclusion follows by the symmetrization argument (3.22).




Combining Propositions 3.1.12 and 3.1.17, we have proved the following.

Theorem 3.1.18. For k ≤ N/4, and assuming (3.6) and (3.7) we have
 k
  Kk
ν (R1,1 − ν(R1,1 ))2k ≤ (3.57)
N
 k
  Kk
ν (R1,2 − ν(R1,2 ))2k
≤ , (3.58)
N

where K depends only on κ0 , h0 and D.


3.2 The Replica-Symmetric Equations 207

3.2 The Replica-Symmetric Equations

Theorem 3.1.18 brings forward the importance of the numbers

q = qN,M = ν(R1,2 ) ; ρ = ρN,M = ν(R1,1 ) . (3.59)

The notation, similar to that of the previous chapter, should not hide that
the procedure is different. The numbers q and ρ are not defined through
a system of equations, but by “the physical system”. They depend on N
and M . It would help to remember the definition (3.59) now. The purpose
of the present section is to show that q and ρ nearly satisfy the system of
“replica-symmetric” equations (3.69), (3.76) and (3.104) below. These equa-
tions should in principle allow the computation of q and ρ.
Since the cavity method, i.e. the idea of “bringing forward the influence
of the last spin” was successful in previous chapters, let us try it here. The
following approach is quite close to that of Section 2.2 so some familiarity
with that section would certainly help the reader who wishes to follow all
the details. Consider two numbers r and r, with r ≤ r. Consider a centered
Gaussian r.v. Y , independent of all the other r.v.s already considered, with
E Y 2 = r, and consider 0 < t < 1. We write

1  t
Sk,t = √ gi,k σi + gN,k σN , (3.60)
N N
i≤N −1

and we consider the Hamiltonian


 
− HN,M,t (σ) = u(Sk,t (σ)) − κσ2 + h gi σi
k≤M i≤N
√ σ2
+ σN 1 − tY − (1 − t)(r − r) N . (3.61)
2
Comparing (3.60) with (2.15) we observe that now we do not have the last
term (1 − t)/N ξk of (2.15). The purpose of this term was to ensure that
the variance of the quantity (2.15) does not depend on t. Since it is no longer
2
true that σN = 1 we can no longer use the same device here. Fortunately,
as already pointed out, this device was not essential. The last term in the
Hamiltonian (3.61) also accounts in a more subtle way for the fact that it is
2
not true that σN = 1.
We denote an average for the Gibbs measure with Hamiltonian (3.61) by
· t , and we write νt (f ) = E f t , νt (f ) = dνt (f )/dt. We recall the notation

ε = σN .
Please do not be discouraged by the upcoming formula. Very soon it will
be clear to you that Proposition 3.2.1 is no more complicated to use than
Proposition 2.2.3.
208 3. The Shcherbina and Tirozzi Model

Proposition 3.2.1. If f is a function on (RN )n , then for α = M/N we


have
νt (f ) = I + II + III + IV + V ,
where

α    
I= νt ε2 u 2 (SM,t
 
) + u (SM,t ) f
2
≤n

  
− nνt ε2n+1 u 2 (SM,t
n+1 n+1
) + u (SM,t ) f (3.62)
  
 
II = α νt ε ε u (SM,t )u (SM,t )f
1≤< ≤n
 
−n 
νt ε εn+1 u (SM,t n+1
)u (SM,t )f
≤n

n(n + 1) 
n+1 n+2
+ νt εn+1 εn+2 u (SM,t )u (SM,t )f (3.63)
2
  
III = −r νt (ε ε f ) − n νt (ε εn+1 f )
1≤< ≤n ≤n

n(n + 1)
+ νt (εn+1 εn+2 f ) (3.64)
2
 
r 
IV = − νt (ε2 f ) − nνt (ε2n+1 f ) (3.65)
2
≤n
 
1
V = (r − r) νt (ε2 f ) − nνt (ε2n+1 f ) . (3.66)
2
≤n

We do not merge in this statement the similar terms IV and V since it is


then easier to explain why the formula is true.
Proof. Of course this is obtained by differentiation and integration by parts.
Probably the best way to understand this formula is to compare it with
Proposition 2.2.3. The term V is simply created by the last term of (3.61);
the term IV, created when integrating by parts in Y , was invisible in (2.23)
because there ε2 = 1. The really new feature is the term I, which is created
by the fact that the variances of the quantities Sk,t are not constant. It is
exactly to avoid this term in Proposition 2.2.3 that we introduced the last
term (1 − t)/N ξk in the quantities of (2.15), see Exercise 2.2.4. 

The reason why the formula of Proposition 3.2.1 is manageable is exactly
the same why the formula of Proposition 2.2.3 is manageable. Quantities such
as
 
u (SM,t )u (SM,t )
can be replaced by their averages
3.2 The Replica-Symmetric Equations 209

1   
u (Sk,t )u (Sk,t ),
M
k≤M

and we can expect these to behave like constants. If we make the proper
choice for r and r, the terms II and III will nearly cancel each other, while
the term I will nearly cancel out with IV + V. For these choices (that will
not be very hard to guess) we will have that νt (f )  0, i.e. ν(f )  ν0 (f ). The
strategy to prove the replica-symmetric equations will then (predictably) be
as follows. Using symmetry between sites, we have ρ = ν(R1,1 ) = ν(ε21 ), and
ν(ε21 )  ν0 (ε21 ) is easy to compute because the last spin decouples for ν0 .
Before we start the derivation of the replica-symmetric equations, let us
try to describe the overall strategy. This is best done by comparison with
the situation of Chapter 2. There, to compute the quantity q, that contained
information about the spins σi , we needed an auxiliary quantity r, that con-
tained information about the “spins” Sk . We could express r as a function
of q and then q as a function of r. Now we have two quantities q and ρ that
contain information about the spins σi . To determine them we will need the
two auxiliary quantities r and r, which “contain information about the spins
Sk ”. We will express r and r as functions of q and r, and in a second stage
we will express r and r as functions of q and ρ, and reach a system of four
equations with four unknown.
We now define r and r as functions of q and ρ. Of course the forthcom-
ing formulas have been guessed by analyzing the “cavity in M ” arguments
of Chapter 2. Consider independent √ standard Gaussian r.v.s ξ, z. Consider

numbers 0 ≤ x < y, the r.v. θ = z x + ξ y − x, and define
 2
Eξ (u (θ) exp u(θ))
Ψ (x, y) = αE
Eξ exp u(θ)
 2
α Eξ (ξ exp u(θ))
= E , (3.67)
y−x Eξ exp u(θ)
using integration by parts (of course as usual Eξ denotes averaging in ξ only).
We also define
 
Eξ (u (θ) + u 2 (θ)) exp u(θ)
Ψ (x, y) = αE
Eξ exp u(θ)
 2 
α Eξ (ξ − 1) exp u(θ)
= E , (3.68)
y−x Eξ exp u(θ)
integrating by parts twice. We set
r = Ψ (q, ρ); r = Ψ (q, ρ) . (3.69)
1/2 1/2
This makes sense because by the Cauchy-Schwarz inequality R1,2 ≤ R1,1 R2,2
and thus q = ν(R1,2 ) ≤ ρ = ν(R1,1 ). We also observe that from the first line
of (3.67) and (3.68) we have r, r ≤ K(D). We first address a technical point
by proving that r ≤ r.
210 3. The Shcherbina and Tirozzi Model

Lemma 3.2.2. Consider a number c > 0 and a concave function w. Assume


that w ≤ −c < 0. Consider the unique point x∗ where w (x∗ ) = 0. Then

c (x − x∗ )2 exp w(x)dx ≤ exp w(x)dx . (3.70)

Proof. We have w (x) ≥ −c(x − x∗ ) for x ≤ x∗ and w (x) ≤ −c(x − x∗ ) for


x ≥ x∗ , so that c(x − x∗ )2 ≤ −w (x)(x − x∗ ). Hence

c (x − x∗ )2 exp w(x)dx ≤ −w (x)(x − x∗ ) exp w(x)dx

= exp w(x)dx ,

by integration by parts. 


Lemma 3.2.3. We have r ≤ r.

Proof. If v is a concave function, using (3.70) for w(x) = v(x) − x2 /2 and


c = 1 implies that if ξ is a standard Gaussian r.v., then

E((ξ − x∗ )2 exp v(ξ)) ≤ E exp v(ξ) . (3.71)

Minimization of the left-hand side over x∗ yields


 2
E(ξ exp v(ξ))
E(ξ exp v(ξ)) −
2
≤ E exp v(ξ) (3.72)
E exp v(ξ)

i.e.  2
E((ξ 2 − 1) exp v(ξ)) E(ξ exp v(ξ))
≤ . (3.73)
E exp v(ξ) E exp v(ξ)

Now we fix z and we use this inequality for the function v(x) = u(z q +

x ρ − q). Combining with (3.67) and (3.68) yields the result. 

We are now in a position to guess how to express q and ρ as functions of
r and r.

Proposition 3.2.4. We have

1 r + h2
ρ= + + δ1 (3.74)
2κ + r − r (2κ + r − r)2
r + h2
q= + δ2 , (3.75)
(2κ + r − r)2

with δ1 ≤ |ν(ε21 ) − ν0 (ε21 )| and δ2 ≤ |ν(ε1 ε2 ) − ν0 (ε1 ε2 )|.


3.2 The Replica-Symmetric Equations 211

Presumably δ1 and δ2 will be small when N is large, so that (q, ρ, r, r) is


a near solution of the system of equations (3.69) together with

1 r + h2
ρ= + (3.76)
2κ + r − r (2κ + r − r)2
r + h2
q= . (3.77)
(2κ + r − r)2
These four equations are the “replica-symmetric” equations of the present
model. Please note that (3.76) and (3.77) are exact equations, in contrast
with (3.74) and (3.75). When we write the equations (3.69), (3.76) and (3.77),
we think of q, ρ, r, r as variables, while in (3.74) and (3.75) they are given by
(3.59). This follows our policy that a bit of informality is better than bloated
notation. This will not be confusing. Until the end of this section, q and ρ
keep the meaning (3.59), and afterwards we will revert to the notation qN,M
and ρN,M .
Proof. Symmetry between sites entails ρ = ν(R1,1 ) = ν(ε21 ), q = ν(R1,2 ) =
ν(ε1 ε2 ), so it suffices to show that ν0 (ε21 ) is given by the right-hand side of
(3.76) and ν0 (ε1 ε2 ) is given by the right-hand side of (3.77).
We observe that, for ν0 , the last spin decouples from the others (which is
a major reason behind the definition of ν0 ) so that

1 ε2 
ν0 (ε21 ) = E ε2 exp ε(Y + hgN ) − (2κ + r − r) dε (3.78)
Z 2
  2
1 ε2
ν0 (ε1 ε2 ) = E ε exp ε(Y + hgN ) − (2κ + r − r) dε , (3.79)
Z 2
where 
ε2
Z= exp ε(Y + hgN ) − (2κ + r − r) dε .
2
We compute these Gaussian integrals as follows. If z is a centered Gaussian
r.v., and d is a number, writing z 2 edz = z(zedz ), integration by parts yields

E z 2 edz = E z 2 (E edz + dE zedz )

E zedz = dE z 2 E edz .
Thus
E z 2 edz
= E z 2 + d2 (E z 2 )2 .
E edz
Using this for d = Y + hgN , E z 2 = 1/(2κ + r − r) we get

1 (Y + hgN )2
ε21 0 = + ,
2κ + r − r (2κ + r − r)2
and, taking expectation,
212 3. The Shcherbina and Tirozzi Model

1 r + h2
ν0 (ε21 ) = + , (3.80)
2κ + r − r (2κ + r − r)2
and we compute ν0 (ε1 ε2 ) similarly. 

We now start the real work, the proof that when N is large, δ1 and δ2
in Proposition 3.2.4 are small. In order to have a chance to make estimates
using Proposition 3.2.1, we need some integrability properties of ε = σN , and
we address this technical point first. We will prove an exponential inequality,
which is quite stronger than what we really need, but the proof is not any
harder than that of weaker statements. We start by a general principle.
Lemma 3.2.5. Consider a concave function T (σ) ≤ 0 on RN , numbers
(ai )i≤N , numbers κ, κ > 0 and a convex subset C of RN . Consider the prob-
ability measure G on RN given by
  
1
∀B , G(B) = exp T (σ) − κσ2 − κ σN 2
+ ai σi dσ , (3.81)
Z B∩C
i≤N

where Z is the normalizing factor. Let us denote by ρ the generic point of


RN −1 , so that, keeping the notation σN = ε, we write σ = (ρ, ε). Consider
the projection C of C on the last component of RN , that is

C = {ε ∈ R ; ∃ρ ∈ RN −1 , (ρ, ε) ∈ C} .

Consider the function f on C defined by


   
f (ε) = log exp T (σ) − κ 2
σi + ai σi dρ . (3.82)
(ρ,ε)∈C i≤N −1 i≤N −1

Then this function is concave and the law μ of σN under G is the probability
measure on C with density proportional to exp w(x), where

w(x) = f (x) − (κ + κ )x2 + aN x .

Proof. Let us define


 
F (σ) = T (σ) − κ σi2 + ai σi ,
i≤N −1 i≤N −1

so that (3.82) simply means that

f (ε) = log exp F (σ)dρ . (3.83)


(ρ,ε)∈C

The definition of μ as the law of σN under G implies that for any function v,
  
1
v(x)dμ(x) = v(ε) exp T (σ)−κσ −κ σN +
2 2
ai σi dσ . (3.84)
Z C
i≤N
3.2 The Replica-Symmetric Equations 213

where Z is the normalizing factor. Now, since σN = ε, we have



T (σ) − κσ2 − κ σN2
+ ai σi = F (σ) − (κ + κ )ε2 + aN ε ,
i≤N

Integration in ρ first in the right-hand side of (3.84) gives


 
1
v(x)dμ(x) = v(ε) exp F (σ)dρ exp(−(κ + κ )ε2 + aN ε)dε
Z C (ρ,ε)∈C

1
= v(ε) exp(f (ε) − (κ + κ )ε2 + aN ε)dε
Z C
1
= v(ε) exp w(ε)dε .
Z C

This proves that μ has a density exp w(ε). To finish the proof it suffices to
show that f is concave. Let us write

C(ε) = {ρ ∈ RN −1 ; (ρ, ε) ∈ C} ,

so that recalling (3.83) we get

exp f (ε) = exp F (σ)dρ = 1C(ε) (ρ) exp F (σ)dρ .


C(ε)

Fixing ε1 , ε2 ∈ R and 0 < s < 1, we define the functions

W (ρ) = 1C(sε1 +(1−s)ε2 ) (ρ) exp F (ρ, sε1 + (1 − s)ε2 )


U (ρ) = 1C(ε1 ) (ρ) exp F (ρ, ε1 )
V (ρ) = 1C(ε2 ) (ρ) exp F (ρ, ε2 ) .

We observe that (3.11) holds by concavity of F and we simply use (3.12) to


obtain that f (sε1 +(1−s)ε2 ) ≥ sf (ε1 )+(1−s)f (ε2 ). (The argument actually
proves the general fact that the marginal of a log-concave density function is
log-concave.) 

We return to the problem of controlling σN .

Lemma 3.2.6. Under (3.25) we have


 
σ2
νt exp N ≤ K .
K

Let us remind the reader that K depends only on κ0 , h0 and D, so in


particular it is does not depend on t.
The proof will use several times the following simple observation. If two
quantities f1 , f2 satisfy νt (exp(f12 /K)) ≤ K and νt (exp(f22 /K)) ≤ K then
214 3. The Shcherbina and Tirozzi Model

νt (exp((f1 + f2 )2 /K)) ≤ K (of course for a different K). This follows from
the convexity of the function x → exp x2 .
Proof. The Gibbs measure corresponding to the
 Hamiltonian (3.61) is given
by the formula (3.81) for C = RN , T (σ) = k≤M u(Sk,t (σ)), ai = hgi if

i < N , aN = hgN + 1 − tY and κ = (1 − t)(r − r)/2. Lemma 3.2.5 implies
that the function
   
f (σN ) = log exp u(Sk,t ) − κ 2
σi + h gi σi dρ (3.85)
k≤M i≤N −1 i≤N −1

is concave, and that the law of σN under · t has a density proportional to


exp w(x), where
w(x) = f (x) − κ(t)x2 + Yt x

for κ(t) = κ + (1 − t)(r − r)/2 and Yt = 1 − tY + hgN . We note that since
r ≥ r we have, recalling (3.25), that κ(t) ≥ κ ≥ κ0 .
Consider the point x∗ where the concave function w(x) is maximum (the
dependence of this point on t is kept implicit). It√follows from (3.70) that
(σ√N − x∗ )2 t ≤ 1/2κ0 , so that |σN − x∗ | t ≤ 1/ 2κ0 and | σN t − x∗ | ≤
1/ 2κ0 , and therefore
1
σN t ≤√ + |x∗ | .
2κ0
Now, since w (x) ≤ −2κ, we have |w (x∗ ) − w (0)| ≥ 2κ|x∗ |, and since
w (x∗ ) = 0 this shows that |x∗ | ≤ |w (0)|/2κ. Since |w (0)| = |f (0) + Yt | ≤
|f (0)| + |Yt |, we have shown that
1 1  
| σN t | ≤ √ + |Yt | + |f (0)| . (3.86)
2κ0 2κ0

Also, it follows from (3.16) that

(σN − σN t )2
exp ≤4, (3.87)
K t

so that it suffices to prove that E exp σN 2t /K ≤ K, and, by (3.86) it suffices


to prove that E exp(f (0)2 /K) ≤ K. We compute f (0) by differentiating
(3.85). We observe that the only dependence of the right-hand side on σN is
through the terms u(Sk,t ) and that
 √
∂Sk,t  t
= √ Sk,0 ,
∂σN σN =0 N
where 
1
Sk,0 = √ gi,k σi (= Sk,t |t=0 ) .
N i≤N −1
3.2 The Replica-Symmetric Equations 215

Therefore we get √
t 
f (0) = √ gN,k u (Sk,0 ) ,
N k≤M

where · is a certain Gibbs average that does not depend on the r.v.s gN,k .
Let us denote by E0 expectation in the r.v.s gN,k only. Then, since |u | ≤ D,

t 
E0 f (0)2 = u (Sk,0 ) 2
≤ αD2 ,
N
k≤M

and thus by (A.11) we have

f (0)2 1
E0 exp ≤ ≤2.
4αD2 1 − 1/2

Therefore

f (0)2 f (0)2
E exp = EE 0 exp ≤2. 

4αD2 4αD2

Despite this excellent control, the fact that σN is not bounded does create
hardship. For example, it does not seem possible to use the argument of
Lemma 2.3.7 to compare ν(f ) and νt (f ) when f ≥ 0.
We turn to the study of terms I and II of Proposition 3.2.1. Let us consider
the Hamiltonian
 
− HN,M −1,t (σ) = u(Sk,t (σ)) − κσ2 + h gi σi
k≤M −1 i≤N
√ σ2
+ σN 1 − tY − (1 − t)(r − r) N . (3.88)
2
The difference with (3.61) is that the summation is over k ≤ M − 1 rather
than over k ≤ M . We denote by · t,∼ an average for the corresponding Gibbs
measure.
We consider standard Gaussian r.v.s z, (ξ  ) that are independent of all
the other r.v.s already considered, and we set
√ √
θ = z q + ξ  ρ − q . (3.89)

For 0 ≤ v ≤ 1 we define
√  √
Sv = vSM,t + 1 − vθ . (3.90)

The dependence on t is kept implicit; when using Sv we think of t (and M )


as being fixed. We then define
216 3. The Shcherbina and Tirozzi Model
  

f exp ≤n u(Sv ) t,∼
νt,v (f ) = E n . (3.91)
Eξ exp u(Sv1 ) t,∼

Here as usual Eξ means expectation in all the r.v.s labeled ξ, or, equivalently
here, in ξ 1 . This really is the same as definition (2.35). The notation is a bit
different (there is an expectation Eξ in the denominator) simply because in
(2.35) we made the convention that this expectation Eξ was “built-in” the
average · t,∼ and we do not do it here (for the simple reason that we do
not want to have to remind the reader of this each time we write a similar
formula). Obviously we have

νt,1 (f ) = νt (f ) .

The magic of the definition of νt,v is revealed by the following, whose proof
is nearly identical to that of Lemma 2.3.1.
n
Lemma 3.2.7. Consider a function f on ΣN . Then we have

νt,0 (f ) = E f t,∼ , (3.92)

ανt,0 (u (S01 )u (S02 )f ) = rE f t,∼ = rνt,0 (f ) (3.93)


and

ανt,0 ((u (S01 ) + u (S01 )2 )f ) = rE f t,∼ = rνt,0 (f ) . (3.94)

Throughout the rest of the chapter we reinforce (3.7) into

∀ , 1 ≤  ≤ 4 , |u() | ≤ D . (3.95)

Lemma 3.2.8. If Bv is one of the following: 1, u (Sv1 )u (Sv2 ), u 2 (Sv1 ) +


u (Sv1 ), then for a function f on ΣN n
, we have

d  
 
 νt,v (f Bv ) ≤ K(n, κ0 , h0 , D) νt,v (|f ||R, − ρ|)
dv
≤n+1

 1
+ νt,v (|f ||R, − q|) + νt,v (|f |) . (3.96)

N
1≤< ≤n+2

The proof is nearly identical to that of (2.59) (except that one does not use
Hölder’s inequality in the last step). The new feature is that there are more
terms when we integrate by parts. Defining
1  1
Sv = √ SM,t − √ θ ,
2 v 2 1−v
we then have
3.2 The Replica-Symmetric Equations 217

1
ESv Sv = (R, − ρ) ,
2
while in (2.59) we had ESv Sv = 0. This is what creates the new terms
νt,v (|f ||R, − ρ|) in (3.96) compared to (2.59). In (2.59) these terms do not
occur because there R, = 1 = ρ.
We have proved in Theorem 3.1.18 that, with respect to ν, we have R1,2 
q and R1,2  ρ. If the same is true with respect to νt,v then (3.96) will go
a long way to fulfill our program that the terms of Proposition 3.2.1 nearly
cancel out.
The first step will be to prove that in the bound (3.96) we can replace νt,v
by νt in the right-hand side. (So that it to use this bound it will suffice to know
that R1,2  q and R1,2  ρ for νt ). Unfortunately we cannot immediately
write a differential inequality such as |dνt,v (f )/dv| ≤ K(n)νt,v (f ) when f ≥ 0
because it is not true that the quantities |R, −ρ| and |R, −q| are bounded.
But it is true that they are “almost bounded” in the sense that they are
bounded outside an exponentially small set, namely that we can find K for
which

νt,v (1{|R1,2 −q|≥K} ) ≤ exp(−4N ) (3.97)


νt,v (1{|R1,1 −ρ|≥K} ) ≤ exp(−4N ) . (3.98)

The reader wishing to skip the proof of this purely technical point can
jump ahead to (3.105) below. To prove these inequalities, we observe from
(3.91) that when f is a function on ΣN (that does not depend on the r.v.s
ξ  ) then
νt,v (f ) = E f t,v ,
where · is a Gibbs average for the Hamiltonian
t,v
 √ √ √
− H(σ) = u(Sk,t (σ)) + uv ( vSM,t (σ) + 1 − v qz) − κσ2
k<M
 √ σ2
+h gi σi + σN 1 − tY − (1 − t)(r − r) N , (3.99)
2
i≤N

and where the function uv is given by


√ √
uv (x) = log E exp u(x + ξ 1 − v ρ − q) .

This function is concave because a marginal of the log-concave function is


log-concave, as was shown in the proof of Lemma 3.2.5, and since exp uv (x)
is the marginal of the log concave function
√ √
(x, y) → exp(u(x) + y 1 − v ρ − q − y 2 /2) .

Another
√ proof of the concavity of uv is as follows. Writing X = x +

ξ 1 − v ρ − q, the concavity of uv i.e. the fact uv ≤ 0 means that
218 3. The Shcherbina and Tirozzi Model
 2
E((u (X)2 + u (X))eu(X) ) Eu (X)eu(X)
≤ ,
Eeu(X) Eeu(X)

an inequality
√ that we can prove by applying (3.73) to the function v(ξ) =

u(x + ξ 1 − v ρ − q) and integration by parts as in (3.67) and (3.68).
There is nothing to change to the proof of Lemma 3.2.6 to obtain
 
σ2
νt,v exp N ≤ K . (3.100)
K

Again K is as usual in this chapter, depending only on D and the quantities


κ0 , h0 of (3.25) and in particular it does not depend on t or v. There is very
little to change to the proof of (3.28) to get
 κ 
exp σ2 ≤ exp KB ∗ , (3.101)
2 t,v

where · t,v denotes an average for the Gibbs measure with Hamiltonian
(3.99). We now prove that
 
σ2
νt,v exp ≤ exp LN . (3.102)
K

For this, we recall that E exp(B ∗ /4) ≤ exp LN by (3.33), and we denote
by K0 the constant K in (3.101). We define K1 = 8K0 /κ. Using Hölder’s
inequality in the first inequality and (3.101) in the second inequality we get
   2/κK1
σ2 σ2 κ
νt,v exp = E exp ≤ E exp σ2
K1 K1 t,v 2 t,v
  ∗
2K0 ∗ B
≤ E exp B = E exp ≤ exp LN .
κK1 4

It follows (for yet another constant K) that

νt,v (1{σ2 ≥KN } ) ≤ exp(−4N ) .

Since
N |R1,2 | = |σ 1 · σ 2 | ≤ σ 1 σ 2  ,
we have

|R1,2 | ≥ t ⇒ (N t)2 ≤ σ 1 2 σ 2 2
⇒ σ 1 2 ≥ tN or σ 2 2 ≥ tN ,

and it follows that

νt,v (1{R1,1 ≥K} ) ≤ 2 exp(−4N ) ; νt,v ({|R1,2 | ≥ K}) ≤ 2 exp(−4N ) . (3.103)


3.2 The Replica-Symmetric Equations 219

Since by (3.28) and (3.32) we have |ρ| ≤ K and |q| ≤ K, (3.97) and (3.98)
follow. Let us also note from (3.102) by a similar argument that
   
νt,v (R1,2 − q)8 ≤ K ; νt,v (R1,1 − ρ)8 ≤ K , (3.104)

where of course there is nothing magic in the choice of the number 8.


It seems unlikely that what happens in the exponentially small set where
|R, − q| and |R, − ρ| might be large could be troublesome; nonetheless we
must spend a few lines to check it. We recall that for a function f ∗ we write
ν(f ∗ )1/4 rather than (ν(f ∗ ))1/4 (etc.). We have

νt,v (|f ||R, − ρ|) ≤ νt,v (|f ||R, − ρ|1{|R, −ρ|≤K} )


+ νt,v (|f ||R, − ρ|1{|R, −ρ|>K} )
≤ Kνt,v (|f |)
 1/4
+ νt,v (f 2 )1/2 νt,v (R, − ρ)4 νt,v (1{|R, −ρ|>K} )1/4
≤ Kνt,v (|f |) + K exp(−N )νt,v (f 2 )1/2 ,

using (3.104) and (3.98). We then proceed in a similar manner for |R, − q|.
4
In this fashion, we deduce from (3.96) that, if f is any function on ΣN
then
d 
 
 νt,v (f ) ≤ Kνt,v (|f |) + K exp(−N ) sup νt,v (f 2 )1/2 . (3.105)
dv v

Lemma 3.2.9. Consider a function f ∗ ≥ 0 on ΣN 4


. Then
 
νt,v (f ∗ ) ≤ K νt (f ∗ ) + exp(−N ) sup νt,v (f ∗2 )1/2 . (3.106)
v

Proof. This follows from (3.105) and Lemma A.13.1. 



Proposition 3.2.10. If f = ε21 or f = ε1 ε2 , we have
 
 
2 1/2
 
2 1/2 1
|νt (f )| ≤ K νt (R1,2 − q) + νt (R1,1 − ρ) + . (3.107)
N
Proof. The idea is that we reproduce the proof of (2.66), using Propo-
sition 3.2.1 instead of Proposition 2.2.3 and using Lemma 3.2.7 instead of
Lemma 2.3.1, Lemma 3.2.9 being an appropriate substitute for Lemma 2.3.4.
More specifically, computing νt (f ) through Proposition 3.2.1, and denoting
by R a quantity such that |R| is bounded by the right-hand of (3.107) (with
possibly a different value of K), we will prove that

ανt (ε2 (u 2 (SM,t


 
) + u (SM,t ))f ) = rν0 (ε2 f ) + R ; (3.108)
νt (ε2 f ) = ν0 (ε2 f ) +R;

ανt (ε ε u 
(SM,t )u (SM,t )f )= rν0 (ε ε f ) + R ;
νt (ε ε f ) = ν0 (ε ε f ) + R .
220 3. The Shcherbina and Tirozzi Model

We will prove only (3.108), since the proof of the other relations is entirely

similar. Let ϕ(v) = ανt,v (ε ε u (Sv )u (Sv )f ). Lemma 3.2.7 implies

|ϕ(1) − ϕ(0)| = |ανt (ε ε u (SM,t
 
)u (SM,t )f ) − rν0 (ε ε f )| . (3.109)

On the other hand, |ϕ(1)−ϕ(0)| ≤ supv |ϕ (v)|, and by (3.96) (used for ε ε f
rather than f ) we obtain


|ϕ (v)| ≤ K νt,v (|ε ε f ||R1 ,1 − ρ|)
1 ≤3

 1
+ νt,v (|ε ε f ||R1 ,2 − q|) + νt,v (|ε ε f |) . (3.110)
N
1≤1 <2 ≤4

Now since f = ε1 ε2 or f = ε21 , using Hölder’s inequality and then (3.100) we


get νt,v ((ε ε f )2 ) ≤ νt,v (ε41 ) ≤ K. Using the Cauchy-Schwarz inequality we
then deduce from (3.109) and (3.110) that

|ανt (ε ε u (SM,t
 
)u (SM,t )f ) − rνt (ε ε f )|
 1/2  1/2 1
≤ K sup νt,v (R1,1 − ρ)2 + νt,v (R1,2 − q)2 + .
v N

We finally conclude with (3.106) and (3.104), used for f = R1,2 − q or f =


R1,1 . 


We know from Theorem
√ 3.1.18 that ν((R1,2 − q)2 )1/2 ≤ K/ N and
ν((R1,1 −ρ) )
2 1/2
≤ K/ N , so in the right-hand side of (3.107), we would like
to replace νt by ν. Unfortunately, since σN is not bounded, it is unclear how
one could prove a differential inequality such as |νt (f )| ≤ Kνt (f ) to relate
νt and ν. The crucial observation to bypass this difficulty is that Theorem
3.1.18 holds uniformly over the functionals νt (with the same proof), so that,
if we set
qt = νt (R1,2 ) ; ρt = νt (R1,1 ) ,
we have in particular
  K   K
νt (R1,1 − ρt )4 ≤ 2 ; νt (R1,2 − qt )4 ≤ 2 . (3.111)
N N
Therefore it is of interest to bound q − qt and ρ − ρt .

Lemma 3.2.11. We have


K K
|qt − q| ≤ ; |ρt − ρ| ≤ . (3.112)
N N
3.2 The Replica-Symmetric Equations 221

Proof. Since q = q1 it suffices to prove that qt = dqt /dt satisfies |qt | ≤ K/N
(and similarly for ρ). Since qt = νt (R1,2 ), qt = νt (R1,2 ) is given by Proposition
3.2.1 for f = R1,2 .
A key observation is that the five terms of this proposition cancel out
(as they should!) if f is a constant, i.e. is not random and does not depend
on σ 1 , σ 2 , . . .. Therefore to evaluate νt (R1,2 ) we can in each of these terms
replace f = R1,2 by R1,2 − qt , because the contributions of qt to the various
terms cancel out.
The point of doing this is that the quantity R1,2 − qt is small (for νt )
as is shown by√(3.112), and therefore each of the terms of Proposition 3.2.1
is at most K/ N . This is seen by using that |u | ≤ D, |u | ≤ D, Hölder’s
inequality, (3.112) (and (3.100) to take care of the terms ε ε ). √
This argument is enough to prove (3.112) with a bound K/ N rather
than K/N . This is all what is required to prove Proposition 3.2.12 below.
The rest of this proof describes the extra work required to reach the
correct rate K/N (just for the beauty of it).
We proceed as in the proof of Proposition 3.2.10 (with now f = R1,2 − qt )
but in the right-hand side of (3.110) we use Hölder’s inequality as in
 1/4
νt,v (|ε ε f (R1 ,1 − ρ)|) ≤ νt,v ((ε ε )2 )1/2 νt,v (f 4 )1/4 νt,v (R1 ,1 − ρ)4 ,

and, since f = R1,2 − qt , we get



 1/4
|qt | ≤ sup νt,v (R1,2 − qt )4
v
 
 
4 1/4
 
4 1/4 1
× νt,v (R1,2 − q) + νt,v (R1,1 − ρ) +
N
 
K  
4 1/4
 
4 1/4 1
≤ √ sup νt,v (R1,2 − q) + νt,v (R1,1 − ρ) + ,
N v N
using (3.111) in the second line. Using (3.106) and (3.104) we get
 
K  
4 1/4
 
4 1/4 1
|qt | ≤ √ νt (R1,2 − q) + νt (R1,1 − ρ) + .
N N
Using (3.111) and the triangle inequality we obtain
 1/4  1/4 K
νt (R1,2 − q)4 ≤ |q − qt | + νt (R1,2 − qt )4 ≤ |q − qt | + √ (3.113)
N
 1/4  1/4 K
νt (R1,1 − ρ)4 ≤ |ρ − ρt | + νt (R1,1 − ρt )4 ≤ |ρ − ρt | + √ , (3.114)
N
and we reach that
K K
|qt | ≤ √ (|q − qt | + |ρ − ρt |) + .
N N
222 3. The Shcherbina and Tirozzi Model

Similarly we get
K K
|ρt | ≤ √ (|q − qt | + |ρ − ρt |) + ,
N N
so that if ψ(t) = |q − qt | + |ρ − ρt | the right derivative ψ (t) satisfies
 
ψ(t) 1 K
|ψ (t)| ≤ K √ + ≤ Kψ(t) + .
N N N
Since ψ(1) = 0, Lemma A.13.1 shows that ψ(t) ≤ K/N . 

Using (3.113), (3.114) and (3.112) we get νt ((R1,2 − q) ) 4 1/4
≤ K/N and
νt ((R1,1 − ρ)4 )1/4 ≤ K/N , so that combining with (3.107) we have proved
that |νt (f )| ≤ K/N for f = ε21 or f = ε1 ε2 , and therefore the following.
Proposition 3.2.12. We have
K K
|ν(ε21 ) − ν0 (ε21 )| ≤ √ ; |ν(ε1 ε2 ) − ν0 (ε1 ε2 )| ≤ √ . (3.115)
N N

Combining with Proposition 3.2.4, this shows that (q, ρ) is a solution of the
system of
√ replica-symmetric equations (3.69), (3.76) and (3.104) “with accu-
racy K/ N ”. Letting N → ∞ this proves in particular that this system does
have a solution, which did not seem obvious beforehand.
Let us consider the function
√ √
F (q, ρ) = αE log Eξ exp u(z q + ξ ρ − q)
1 q 1 h2
+ + log(ρ − q) − κρ + (ρ − q) , (3.116)
2ρ−q 2 2
which is defined for 0 ≤ q < ρ. It is elementary (calculus and integration by
parts as in Lemma 2.4.4) to show that the conditions ∂F/∂ρ = 0 = ∂F/∂q
mean that (3.69), (3.76) and (3.104) are satisfied.
We would like to prove that for large N the quantity
1
E log exp(−HN,M,t (σ))d(σ) (3.117)
N
is nearly F (q, ρ) + log(2eπ)/2. Unfortunately we see no way to do this unless
we know something about the uniqueness of the solutions of (3.69), (3.76),
(3.104).
Research Problem 3.2.13. (Level 2) Find general conditions under which
the equations (3.69), (3.76), (3.104) have a unique solution.
As we will show in the next section, Shcherbina and Tirozzi managed to
solve this problem in a very important case. Before we turn to this, we must
however address the taste of unfinished work left by Proposition 3.2.12. We
turn to the proof of the correct result.
3.2 The Replica-Symmetric Equations 223

Theorem 3.2.14. We have


K K
|ν(ε21 ) − ν0 (ε21 )| ≤ ; |ν(ε1 ε2 ) − ν0 (ε1 ε2 )| ≤ . (3.118)
N N

Consequently, (q, ρ) is a solution of the equations (3.69), (3.74) and (3.75)


“with accuracy K/N ”. Of course improving (3.113) into (3.118) is really a
side story; but it is not very difficult, so we cannot resist the pleasure of doing
it.
Proof. The proof we give is complete but sketchy, and filling in all details
should be a nice exercise for the motivated reader. We will obtain the esti-
mates
K K
|νt (ε21 )| ≤ ; |νt (ε1 ε2 )| ≤ .
N N
For this we will prove that when using Proposition 3.2.1 the cancellation of
the various terms occurs with accuracy K/N . Consider f = ε21 or f = ε1 ε2 ,
and Bv as in Lemma 3.2.8. To prove that in Proposition 3.2.1 this cancellation
of the various terms occurs with accuracy K/N we have to show that
K
|νt,1 (B1 f ) − νt,0 (B0 f )| ≤ . (3.119)
N
To prove this we replace the first order estimate
d 
 
|νt,1 (B1 f ) − νt,0 (B0 f )| ≤ sup  νt,v (Bv f )
0<v<1 dv

that we used in the proof of Proposition 3.2.10 by a second order estimate


    d2 
 d    
νt,1 (B1 f ) − νt,0 (B0 f ) − νt,v (Bv f )  ≤ sup  2 νt,v (Bv f ) . (3.120)
dv v=0 0<v<1 dv

Differentiating in v once creates terms that each contains a factor R, − ρ


or R, − q. Differentiating twice brings a second such factor in each term.
We know (3.111) (and a similar result for higher powers) and (3.112). Using
(3.106) shows that the right-hand side of (3.120) is ≤ K/N . Thus to prove
(3.119) the issue is to prove that
  
d
 νt,v (Bv f )  ≤ K .
 dv v=0
 N

Computation of this derivative shows that it is a sum of terms AE f (R, −


q) t,∼ or AE f (R, − ρ) t,∼ (and of a lower order term due to the difference
t
between R,  and R, ), where A is a quantity which does not depend on N .

Therefore it suffices to show that


K K
|E f (R, − ρ) t,∼ | ≤ ; |E f (R, − q) t,∼ | ≤ .
N N
224 3. The Shcherbina and Tirozzi Model

We have
K
|νt (f (R, − q)) − E f (R, − q) t,∼ |
, ≤
N
by proceeding as in (2.65) because,
√ as expected, the extra factor R, − q
allows one to gain a factor 1/ N (and similarly for R, − ρ). Therefore to
prove (3.119) it suffices to prove that

K K
|νt (f (R, − q))| ≤ ; |νt (f (R, − ρ))| ≤ .
N N

In the same manner that we have proved the inequality |νt (f )| ≤ K/ N , we
√ |νt (f (R, −q))| ≤ K/N and |νt (f (R, −ρ))| ≤ K/N (gaining
show now that 

a factor 1/ N because of the extra term R, − q or R, − ρ) so the issue is


to prove that
K K
|ν(f (R, − q))| ≤ ; |ν(f (R, − ρ))| ≤ .
N N
By symmetry among sites, when f = ε21 ,

ν(ε21 (R, − q)) = ν(R1,1 (R, − q)) = ν((R1,1 − ρ)(R, − q))

since ν(R, − q) = 0. Using Theorem 3.1.18 for k = 1 and the Cauchy-


Schwarz inequality we then obtain that ν(ε21 (R, − q)) ≤ K/N . The case of
f = ε1 ε2 is similar. 


3.3 Controlling the Solutions of the RS Equations

We recall the notation (3.59) (where the function u is implicit)

qN,M = ν(R1,2 ); ρN,M = ν(R1,1 ) .

In Section 3.2 these were denoted simply q and ρ, but we now find it more
convenient to denote in this section by q and ρ two “variables” with
0 ≤ q < ρ.
As pointed out in Section 3.1, the case where exp u(x) = 1{x≥τ } is of
special interest. In this case, we will prove that the system of equations (3.69),
(3.76) and (3.104) has a unique solution. The function (3.116) takes the form
√ √ 1 q
F (q, ρ) = αE log Pξ (z q + ξ ρ − q ≥ τ ) +
2ρ−q
1 h2
+ log(ρ − q) − κρ + (ρ − q) . (3.121)
2 2
We observe that
3.3 Controlling the Solutions of the RS Equations 225
 √ 
√ √ τ −z q
Pξ (z q + ξ ρ − q ≥ τ ) = N √ ,
ρ−q
where
N (x) = P(ξ ≥ x) .

We now fix once and for all τ ≥ 0.


Theorem 3.3.1. If α < 2 and F is given by (3.121) there is a unique solu-
tion q0 = q0 (α, κ, h), ρ0 = ρ0 (α, κ, h) to the equations
∂F ∂F
(q0 , ρ0 ) = 0 = (q0 , ρ0 ) . (3.122)
∂q ∂ρ

We define
 √ 
τ − z q0
RS0 (α) = αE log N √
ρ0 − q0
1 q0 1 h2
+ + log(ρ0 − q0 ) − κρ0 + (ρ0 − q0 ) . (3.123)
2 ρ0 − q0 2 2
The reader will recognize that this is F (q0 , ρ0 ), where F is defined in (3.121).
The value of κ and h will be kept implicit. (We recall that τ has been fixed
once and for all.) The main result of this chapter is as follows.
Theorem 3.3.2. Consider α0 < 2, 0 < κ0 < κ1 , h0 > 0, ε > 0. Then we
can find ε > 0 with the following property. Consider any concave function
u ≤ 0, with the following properties:
x ≥ τ ⇒ u(x) = 0 (3.124)
exp u(τ − ε ) ≤ ε (3.125)
u is four times differentiable and |u() | is bounded for 1 ≤  ≤ 4 . (3.126)
Then for N large enough, and if HN,M denotes the Hamiltonian (3.1) we
have
    
 1 
E log exp(−HN,M (σ))dσ − RS0 M + 1 log(2eπ)  ≤ ε (3.127)
 N N 2 

whenever κ0 ≤ κ ≤ κ1 , h ≤ h0 , M/N ≤ α0 .
In particular, we succeed in computing
1
lim lim E log exp(−HN,M (σ))dσ . (3.128)
u→1{x≥τ } N →∞,M/N →α N
In Volume II we will prove the very interesting fact that the limits can be
interchanged, solving the problem of computing the “part of the sphere SN
that belongs to the intersection of M random half-spaces”.
Besides Theorem 3.3.1, the proof of Theorem 3.3.2 requires the following.
226 3. The Shcherbina and Tirozzi Model

Proposition 3.3.3. Consider κ0 > 0, α0 < 2 and h0 > 0. Then we can find
a number C, depending only on κ0 , α0 and h0 , such that if κ ≥ κ0 , α ≤ α0
and h ≤ h0 , then for any concave function u ≤ 0 that satisfies (3.124), and
whenever q and ρ satisfy the system of equations (3.69), (3.76) and (3.104),
we have
1
q, ρ ≤ C ; ≤C. (3.129)
ρ−q
We recall that the numbers q0 and ρ0 are given by (3.122).
Corollary 3.3.4. Given 0 < κ0 < κ1 , α0 < 2, h0 > 0 and ε > 0, we can
find a number ε > 0 such that whenever the concave function u ≤ 0 satisfy
(3.124) and (3.125), whenever κ0 ≤ κ ≤ κ1 , h ≤ h0 and α ≤ α0 , given any
numbers 0 ≤ q ≤ ρ that satisfy the equations (3.69), (3.76) and (3.104) we
have
|q − q0 | ≤ ε ; |ρ − ρ0 | ≤ ε .
It is here that Theorem 3.3.1 is really needed. Without it, it seems very
difficult to control q and ρ.
Proof. This is a simple compactness argument now that we know (3.129).
We simply sketch the proof of this “soft” argument. Assume for contradiction
that we can find a sequence εn → 0, a sequence un of functions that satisfies
(3.124) and (3.125) for εn rather than ε , numbers κ0 ≤ κn ≤ κ1 , hn ≤ h0 ,
αn ≤ α0 , numbers qn and ρn that satisfy the corresponding equations (3.69),
(3.76) and (3.104), and are such that |qn − q0 | ≥ ε and |ρn − ρ0 | ≥ ε. By
Proposition 3.3.3 we have qn , ρn , 1/(qn − ρn ) ≤ C. This boudedness permits
us to take converging subsequences. So, without loss of generality we can
assume that the sequences κn , hn , αn , qn and ρn have limits called κ, h, α,
q and ρ respectively. Moreover 1/(q − ρ) < C, so in particular ρ < q. Finally
we have |q − q0 | ≥ ε and |ρ − ρ0 | ≥ ε. If one writes explicitly the equations
(3.122), it is obvious from the fact that (qn , ρn ) is a solution to the equations
(3.69), (3.76) and (3.104) (for κn and hn rather than for κ and h) that (q, ρ)
is a solution to these equations. But this is absurd, since by Theorem 3.3.1
one must then have q = q0 and ρ = ρ0 . 

Once this has been obtained the proof of Theorem 3.3.2 is easy following
the approach of the second proof of Theorem 2.4.2, so we complete it first.
We recall the bracket · t,∼ associated with the Hamiltonian (3.88). To
lighten notation we write · ∼ rather than · 1,∼ .
Lemma 3.3.5. Assume that the function u satisfies (3.7). Writing gi =
gi,M , we have
 
1 
E log exp u √ gi σi
N i≤N ∼

= E log Eξ exp u(z qN,M + ξ ρN,M − qN,M ) + R ,
3.3 Controlling the Solutions of the RS Equations 227

where
K(κ0 , h0 , D)
|R| ≤ √ .
N
Now that we have proved Theorem 3.1.18 this is simply an occurrence of the
general principle explained in Section 1.5. We compare a quantity of the type
(1.140) with the corresponding quantity (1.141) when f (x) = exp u(x) and
w(x) = log x, when μ is Gibbs’ measure.
Proof. We consider

v  √ √
Sv = gi σi + 1 − v(z qN,M + ξ ρN,M − qN,M )
N
i≤N

and
ϕ(v) = E log Eξ exp u(Sv ) ∼ .
We differentiate and integrate by parts to obtain:

(|R1,1 − ρN,M | + |R1,2 − qN,M |) exp(u(Sv1 ) + u(Sv2 )) ∼


|ϕ (v)| ≤ K(D)E
(Eξ exp u(Sv ) ∼ )2
= K(D)ν1,v (|R1,1 − ρN,M | + |R1,2 − qN,M |) .

We use (3.106) with t = 1 to get

|ϕ (v)| ≤ Kν(|R1,1 − ρN,M | + |R1,2 − qN,M |) + K exp(−N )

and we conclude with Theorem 3.1.18. 




Lemma 3.3.6. We have


 √ 
dRS0 τ − z q0 √ √
(α) = E log N √ = E log Pξ (z q0 + ξ ρ0 − q0 ≥ τ ) .
dα ρ0 − q0
(3.130)

Proof. Obvious by (3.122). 




Proof of Theorem 3.3.2. Let us write


1
pN,M = E log exp(−HN,M (σ))dσ (3.131)
N
and let us first consider the case M = 0. In that case
 
 π N/2 h2  2
exp −κσ + h
2
gi σi dσ = exp gi
κ 4κ
i≤N i≤N

and
1 π  h2
pN,M = log + .
2 κ 4κ
228 3. The Shcherbina and Tirozzi Model

When α = 0, we have r = r = 0, so (3.76) and (3.77) yield ρ0 − q0 = 1/(2κ),


q0 = h2 /(4κ2 ), and thus by straight forward algebra,
 
1 1 1 h2
RS0 (0) = log − + ,
2 2κ 2 4κ
so that (3.127) holds in that case. Next, we observe that
 
1 1 
pN,M − pN,M −1 = E log exp u √ gi,M σi .
N N ∼

Informally, the rest of the proof goes as follows. By Lemma 3.3.5 we have
 
1 
E log exp u √ gi,M σi
N ∼

 E log Eξ exp u(z qN,M + ξ ρN,M − qN,M ) .
Now by Propositions 3.2.4 and 3.2.12 the numbers qN,M and ρN,M are near
solutions of the system of equations (3.69), (3.76) and (3.104).
As N → ∞, (u being fixed) these quantities become close (uniformly on α)
to a true solution of these equations. Thus, by Corollary 3.3.4, and provided
u satisfies (3.124) and (3.125) and ε is small enough, we have qN,M  q0 (=
q0 (M/N, κ, h)) and ρN,M  ρ0 (= ρ0 (M/N, κ, h)) and thus

E log Eξ exp u(z qN,M + ξ ρN,M − qN,M )
√ √ √ √
 E log Eξ u(z q0 + ξ ρ0 − q0 )  E log Pξ (z q0 + ξ ρ0 − q0 ≥ τ ) ,
using again (3.124) and (3.125). Now (3.130) implies
1 √ √ M/N
d
E log Pξ (z q0 + ξ ρ0 − q0 ≥ τ )  RS0 (α)dα
N (M −1)/N dα
   
M M −1
= RS0 , κ, h − RS0 , κ, h .
N N
This chain of approximations yields
   
M M −1
pN,M − pN,M −1  RS0 , κ, h − RS0 , κ, h ,
N N
where  means with error ≤ ε/N . Summation over M of these relations
together with the case M = 0 yields the desired result.
It is straightforward to write an “ε-δ proof” following the previous scheme,
so there seems to be no point in doing it here. 

Our next goal is the proof of Proposition 3.3.3, that will reveal how the
initial condition α0 < 2 comes into play. Preparing for this proof, we consider
the function
1 e−x /2
2
d
A(x) = − log N (x) = √ , (3.132)
dx 2π N (x)
about which we collect simple facts.
3.3 Controlling the Solutions of the RS Equations 229

Lemma 3.3.7. We have


A(v) ≥ v (3.133)
A (v) = A(v) − vA(v) ≥ 0
2
(3.134)
vA(v)A (v) ≤ A(v) 2
(3.135)
vA(v) ≤ 1 + v 2 . (3.136)

Proof. To prove (3.133) we can assume v ≥ 0. Then


∞  2 ∞  2  2
t t v
v exp − dt ≤ t exp − dt = exp −
v 2 v 2 2

which is (3.133). The equality in (3.134) is straightforward, and the inequality


follows from (3.133) since A(v) ≥ 0.
Now (3.134) implies

vA(v)A (v) = A(v)2 (vA(v) − v 2 ) ,

so that (3.135) is equivalent to (3.136). Integrating by parts,


 2  2

t v √
t2 exp − dt = v exp − + 2πN (v)
v 2 2
∞    2
t2 v
t exp − dt = exp −
v 2 2
so that expanding the square and using the previous equalities we get
 2  2

t √ v
0≤ (t − v)2 exp − dt = (1 + v 2 ) 2πN (v) − v exp − .
v 2 2

This proves (3.136). 



let us observe that (3.133) and (3.136) mean that when x ≥ 0
 2  2
x 1 x 1 x
√ exp − ≤ P(ξ ≥ x) ≤ √ exp − , (3.137)
1 + x2 2π 2 x 2π 2

which becomes quite accurate as x → ∞.

Lemma 3.3.8. Consider numbers 0 ≤ q < ρ and a concave function u ≤ 0


with u(x) = 0 for x ≥ τ . Consider independent standard Gaussian r.v.s z
√ √ √ √
and ξ and set θ = z q + ξ ρ − q and Y = (τ − z q)/ ρ − q. Then
 2 "
Eξ ξ exp u(θ) Y2+L if Y ≥0
≤ (3.138)
Eξ exp u(θ) L if Y ≤0,

where L is a universal constant.


230 3. The Shcherbina and Tirozzi Model

Proof. We observe that, integrating by parts and since u ≥ 0,



Eξ ξ exp u(θ) = Eξ ρ − qu (θ) exp u(θ) ≥ 0 . (3.139)
Consider first the case where Y ≥ 0. Let U = Eξ (1{ξ<Y } exp u(θ)). Then,
since u ≤ 0,
Eξ ξ exp u(θ) = Eξ (ξ1{ξ<Y } exp u(θ)) + Eξ (ξ1{ξ≥Y } exp u(θ))
 
1 Y2
≤ Y U + Eξ (ξ1{ξ≥Y } ) = Y U + √ exp − . (3.140)
2π 2
Since u(θ) = 0 for ξ ≥ Y , (3.137) implies
 
Y 1 Y2
Eξ (1{ξ≥Y } exp u(θ)) = Pξ (ξ ≥ Y ) ≥ √ exp − ,
1 + Y 2 2π 2
and thus
Eξ exp u(θ) = U + Eξ (1{ξ≥Y } exp u(θ))
 
Y 1 Y2
≥U+ √ exp − . (3.141)
1 + Y 2 2π 2
Combining (3.140) and (3.141) we get
√ 
Y2
Eξ ξ exp u(θ) 2πY U + exp − 2
0≤ ≤√  Y2 . (3.142)
Eξ exp u(θ) Y
2πU + 1+Y 2 exp − 2

It is elementary that for numbers a, b > 0 we have


aY + b 1
Y
≤Y + .
a + 1+Y 2 b Y

Combining with (3.142) yields


Eξ ξ exp u(θ) 1
0≤ ≤Y + .
Eξ exp u(θ) Y
Taking squares proves (3.138) when Y ≥ 1. When Y ≤ 1 (and Y is not
necessarily ≥ 0) since u ≤ 0 we have

2
0 ≤ Eξ ξ exp u(θ) ≤ E|ξ| =
π
and, since u(x) = 0 for x ≥ τ ,
Eξ exp u(θ) ≥ Pξ (ξ ≥ Y ) ≥ Pξ (ξ ≥ 1)
and this finishes the proof. 

We bring forward the following trivial fact which seems to be at the root
of the condition “α ≤ 2”.
3.3 Controlling the Solutions of the RS Equations 231

Lemma 3.3.9. If z is a standard Gaussian r.v.,


1
lim E((z − b)2 1{z≤b} ) = . (3.143)
b→0 2
The following is also straightforward, where we recall (3.67) and (3.68).

Lemma 3.3.10. Given ρ ≥ q, consider the function



v(x) = log Eξ exp u(x + ξ ρ − q) .

Then, recalling the definitions (3.67) and (3.68) of Ψ and Ψ , we have v ≥ 0


and
√ √
Ψ (ρ, q) = αEv (z q)2 ; Ψ (ρ, q) − Ψ (ρ, q) = αEv (z q) . (3.144)

Proof of Proposition 3.3.3. From (3.76) and (3.77) we have


1 √
= 2κ + r − r = 2κ − αEv (z q) , (3.145)
ρ−q

where the last equality uses that r − r = Ψ (ρ, q) − Ψ (ρ, q) and (3.144). Inte-
gration by parts yields
√ 1 √
−Ev (z q) = − √ Ezv (z q) .
q

A direct computation proves that v ≥ 0 since u ≥ 0. Hence, using the


Cauchy-Schwarz inequality in the second line,
√ √
−Ezv (z q) ≤ −Ez1{z≤0} v (z q)

≤ (Ez 2 1{z≤0} )1/2 (Ev (z q)2 )1/2
1 √
= √ (Ev (z q)2 )1/2 .
2
Thus, combining the previous relations we obtain
1 α √
≤ 2κ + √ √ (Ev (z q)2 )1/2 . (3.146)
ρ−q 2 q

On the other hand, (3.144) implies



r = Ψ (ρ, q) = αEv (z q)2

and from (3.75) and the first part of (3.145) we deduce

r + h2 r √
q= ≥ = (ρ − q)2 r = α(ρ − q)2 Ev (z q)2 .
(2κ + r − r)2 (2κ + r − r)2
232 3. The Shcherbina and Tirozzi Model

This inequality can be rewritten as



α √ α 1
√ √ (Ev (z q)2 )1/2 ≤ ,
2 q 2 ρ−q

and combining with (3.146) yields



1 α 1
≤ 2κ + ,
ρ−q 2 ρ−q

so that (ρ − q)−1 ≤ 2κ0 (1 − α0 /2)−1 . The exact form of the right-hand


side is not relevant, but this shows that 1/(ρ − q) is bounded by a number
depending only on α0 and κ0 .
We now try to bound similarly ρ and q. Since r ≤ r, we have ρ − q ≤
(2κ)−1 ≤ (2κ0 )−1 , so the issue is to bound q. Using (3.104) and (3.145) again,
and since r ≥ r,

r + h2 h20
q= ≤ + r(ρ − q)2 . (3.147)
(2κ + r − r)2 (2κ0 )2

Recalling (3.67) and using Lemma 3.3.8 we get


 2
α Eξ ξ exp u(θ)
r = Ψ (ρ, q) = E
ρ−q Eξ exp u(θ)
α
≤ (L + E(Y 2 1{Y ≥0} )) (3.148)
ρ−q
where √ 
τ −z q q τ 
Y = √ =− z−√
ρ−q ρ−q q
so that Y satisfies
 2 
q τ
E(Y 2 1{Y ≥0} ) = E z−√ 1{z≤τ /√q} .
ρ−q q

Since α0 < 2 we can find a > 1/2 with aα0 < 1. Then by (3.143) there is a
number q(τ, a) satisfying
 2 
τ aq
q ≥ q(τ, a) ⇒ E z−√ 1{z≤τ / q} < a ⇒ E(Y 2 1{Y ≥0} ) ≤
√ .
q ρ−q

Thus, using (3.148) we get, using also that ρ − q ≤ 1/(2κ0 ) in the second
inequality,
αL
q ≥ q(τ, a) ⇒ r(ρ − q)2 ≤ αL(ρ − q) + aαq ≤ + aαq
2κ0
3.3 Controlling the Solutions of the RS Equations 233

and combining with (3.147) yields

h20 αL
q ≥ q(τ, a) ⇒ q ≤ + + aαq ,
(2κ0 )2 2κ0
so that
h20 αL
q ≥ q(τ, a) ⇒ (1 − aα)q ≤ 2
+ .
(2κ0 ) 2κ0
Since aα ≤ aα0 < 1, this proves that q (and hence ρ) is bounded by a number
depending only on h0 , κ0 and α0 . 

It remains to prove Theorem 3.3.1. The proof is unrelated to the methods
of this work. While it is not difficult to follow line by line, the author cannot
really explain why it works (or how Shcherbina and Tirozzi could ever find
it). The need for a more general and enlightening approach is rather keen
here.
We make the change of variable x = q/(ρ − q), so that q = xρ/(1 + x),
ρ − q = ρ/(1 + x), and
 √ 
τ 1+x √ x
F (q, ρ) = G(x, ρ) := α E log N √ −z x +
ρ 2
1 1 h2 ρ
+ log ρ − log(1 + x) − κρ + . (3.149)
2 2 2(1 + x)
Proposition 3.3.11. For x > 0 and ρ > 0 we have
 
∂2G ∂ x + 1 ∂G
< 0 ; >0. (3.150)
∂ρ2 ∂x x ∂x
Corollary 3.3.12. a) Given ρ > 0 there exists at most one value x1 such
that (∂G/∂x)(x1 , ρ) = 0. If such a value exists, the function x → G(q, ρ)
attains its minimum at x1 .
b) Given ρ > 0 there exists at most one value q1 such that (∂F/∂q)(q1 , ρ) =
0. If such a value exists, the function q → F (q, ρ) attains its minimum at q1 .
Proof. a) By the second part of (3.150) we have ∂G(x, ρ)/∂x < 0 for x < x1
while ∂G(x, ρ)/∂x > 0 for x > x1 .
b) Follows from a) since at given ρ the change of variable x = q/(ρ − q)
is monotonic. 

Proof of Theorem 3.3.1. Suppose that we have ∂G/∂x = 0 and ∂G/∂ρ = 0
at the points (x1 , ρ1 ) and (x2 , ρ2 ). Then, since ∂ 2 G/∂ρ2 < 0, we have
G(x2 , ρ1 ) < G(x2 , ρ2 ) unless ρ2 = ρ1 . By the first part of Corollary
3.3.12 used for ρ = ρ1 we have G(x1 , ρ1 ) < G(x2 , ρ1 ) unless x1 = x2 . So
G(x1 , ρ1 ) < G(x2 , ρ2 ) unless (x1 , ρ1 ) = (x2 , ρ2 ). Reversing the argument
shows that (x1 , ρ1 ) = (x2 , ρ2 ). 
√ √ √
We write W = τ 1 + x/ ρ − z x.
234 3. The Shcherbina and Tirozzi Model

Lemma 3.3.13. Recalling the definition (3.132) of the function A(x), we


have
∂G √ 1 h2
2 = ατ 1 + xρ−3/2 E A(W ) + − 2κ + (3.151)
∂ρ ρ 1+x
and  
x+1 ∂G α h2 ρ
2 = − EA2 (W ) + 1 − . (3.152)
x ∂x x x(1 + x)
Proof. We differentiate (3.149) in ρ to obtain (3.151). To prove (3.152) we
differentiate (3.149) in x to obtain
  
∂G z τ
2 = −αE − √ + √ √ A(W )
∂x x 1+x ρ
1 h2 ρ
+ 1− − . (3.153)
1 + x (1 + x)2

Now, by integration by parts and (3.134)


 
z
E √ A(W ) = −E (A (W )) = E (W A(W )) − E A(W )2
x
 √  
τ 1+x √
=E √ − z x A(W ) − E A(W )2
ρ

and thus
  √
z τ 1+x
(1 + x) E √ A(W ) = √ E A(W ) − E A(W )2 . (3.154)
x ρ

Plugging back this value in (3.153) yields (3.152). 

Proof of Proposition 3.3.11. We differentiate (3.151) in ρ to obtain


√  √ 2
∂2G 3 ατ 1 + x τ 1+x 1
2 2
= − 5/2
E A(W ) − α 3/2
EA (W ) − 2
∂ρ 2 ρ ρ ρ

and this is ≤ 0 since A ≥ 0. We differentiate (3.152) in x to get


    
∂ 1 + x ∂G α τ z
2 =− E √ √ − √ A(W )A (W )
∂x x ∂x x 1+x ρ x
α h2 ρ h2 ρ
+ 2
E A(W )2 + 2 − . (3.155)
x x (1 + x)2

Now, we observe the identity


τ z W τ
√ √ −√ = − √ √ ,
1+x ρ x x x 1+x ρ
3.4 Notes and Comments 235

so that (3.155) yields


 
∂ 1+x ατ
2 G = 2√ √ E (A(W )A (W )) (3.156)
∂x x x 1+x ρ
 
α 1 1
+ 2 E (A(W )2 − W A(W )A (W )) + h2 ρ 2

x x (1 + x)2

and all the terms are ≥ 0 by (3.135) and since x ≥ 0. 

3.4 Notes and Comments


The results of this chapter are essentially proved in [133] and [134]. The way
Shcherbina and Tirozzi [133] obtain the replica-symmetric equations seems
genuinely different from what I do. It would be nice to make explicit the
rationale behind their approach, but as I am allergic to the style in which
their papers are written I find it much easier to discover my own proofs than
to decipher their work.
Instead of basing the approach on Theorem 3.1.4 one can also use the
Brascamp-Lieb inequalities [37]. In dimension 1, the inequality is stated in
(3.158) below. In more dimensions, it is convenient to use a simplified form of
these inequalities as follows. Consider a measure μ on RN as in Theorem 3.1.4.
Consider a function f on RN (that need not be Lipschitz). Then, if ∇f
denotes the gradient of f ,
 2
1
f 2 dμ − f dμ ≤ ∇f 2 dμ . (3.157)
κ

This inequality can be iterated. If f has a Lipschitz constant A as in (3.15)


then ∇f  ≤ A. Using (3.157) for exp(λf /2) rather than f yields
   2
λf λ2 A2
exp λf dμ − exp dμ ≤ exp λf dμ
2 κ

so that if λ2 A2 ≤ κ,
 2
1 λf
exp λf dμ ≤ exp dμ .
1− λ 2 A2
κ
2

By iteration we get
 2    2k
 1 λf
exp λf dμ ≤ exp dμ
0≤<k
1− λ 2 A2
κ22
2k

so that when λ2 A2 ≤ κ/2 this implies


236 3. The Shcherbina and Tirozzi Model
    2k
λf
exp λf dμ ≤ L exp dμ .
2k

Now the inequality |ex − x − 1| ≤ x2 ex shows that if f dμ = 0, the right-
hand side goes to L as k → ∞, so that if f has a Lipschitz constant A, we
have  
exp λ f − f dμ dμ ≤ L

whenever |λ| = κ/2A, a result that is not far from (3.16), at least for
our purposes here. The reader should consult [11] for more on the relations
between these different inequalities.
Inequality (3.157) follows from a nice general result, namely that in di-
mension 1,
 2
f2
f dμ −
2
f dμ ≤ dμ . (3.158)
H
(The Brascamp-Lieb inequality in dimension 1.) For f (x) = x, −H(x) =
u(x) − x2 /2 this implies (3.72); the proof we give is just a simplified proof of
(3.158) in the special case we need.
4. The Hopfield Model

4.1 Introduction: The Curie-Weiss Model

We go back to the case where the spins take values in {−1, 1}. The Curie-
Weiss model is the “canonical” model for mean-field (deterministic) ferro-
magnetic interaction, i.e. interaction where the spins tend to align with each
other. The simplest Hamiltonian that will achieve this will  contain a term
σi σj for each pair of spins, so it will be (proportional to) i<j σi σj . Equiv-
alently, we consider the Hamiltonian
 2  2
βN 1  β 
− HN (σ) = σi = σi . (4.1)
2 N 2N
i≤N i≤N

This is a simple, almost trivial model, that can be studied in considerable


detail (see [54]). It is not our purpose to do this, but we will explain the basic
facts that are relevant to this chapter. The partition function is given by
   
β 2
ZN (β) = exp(−HN (σ)) = Ak exp k , (4.2)
2N
σ |k|≤N

where "  %
Ak = card σ ∈ ΣN ; σi = k .
i≤N

Consider the function


1 
I(t) = (1 + t) log(1 + t) + (1 − t) log(1 − t) , (4.3)
2
which is defined for −1 < t < 1, and can be defined for −1 ≤ t ≤ 1 by setting
I(−1) = I(1) = log 2. We recall (see (A.29)) that
  
k
Ak ≤ 2 exp −N I
N
, (4.4)
N

so by (4.2) we have, bounding the sum in the right-hand side by the number
of terms (i.e. 2N + 1) times the largest term,

M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 237
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 4, © Springer-Verlag Berlin Heidelberg 2011
238 4. The Hopfield Model

  
βt2
ZN (β) ≤ (2N + 1)2 exp N max
N
− I(t) . (4.5)
t 2
Also, by (A.30), when N + k is even we have
  
2N k
Ak ≥ √ exp −N I , (4.6)
L N N

and thus
    
2N k β 2
ZN (β) ≥ max √ exp −N I exp k .
k+N even L N N 2N

Finally we get
 2 
1 βt
log ZN (β) = log 2 + max − I(t) + o(1) , (4.7)
N t∈[−1,1] 2

where o(1) is a quantity such that o(1) → 0 as N → ∞. The function


βt2 /2 − I(t) attains its maximum at a point t such that
 
1 1+t
βt = I (t) = log (4.8)
2 1−t

or, equivalently,
1+t
= exp(2βt) ,
1−t
i.e.
exp(2βt) − 1
t= = thβt . (4.9)
exp(2βt) + 1
If β ≤ 1, the only root of (4.9) is t = 0. For β > 1, there is a unique root
m∗ > 0. That is, m∗ = m∗ (β) satisfies

thβm∗ = m∗ . (4.10)

Since thx = x − x3 /3 + x3 o(x), where o(x) → 0 as x → 0, (4.10) implies

β 3 m∗3
βm∗ − + β 3 m∗3 o(βm∗ ) = m∗ ,
3
so that
m∗ (β) ∼ 3(β − 1) as β → 1+ . (4.11)
We define
βm∗2
b∗ = − I(m∗ ) (4.12)
2
so (4.7) reads
4.1 Introduction: The Curie-Weiss Model 239

1
log ZN (β) = log 2 + b∗ + o(1) . (4.13)
N
When β > 1, as N → ∞, Gibbs’ measure
 is essentially supported by the
set of configurations σ for which N −1 i≤N σi  ±m∗ . This is because for
a subset U of R,
" %
1  −1

GN σ; σi ∈ U = ZN (β) exp(−HN (σ)) ,
N
i≤N

where the summation is over all sequences for which N −1 i≤N σi ∈ U . Thus,
using (4.4), and bounding the sum in the second line by (2N + 1) times a
bound for the largest term,
" %  2
1  1  βN k
GN σ; σi ∈ U = Ak exp (4.14)
N ZN (β) 2 N
i≤N k/N ∈U
  2 
2N βt
≤ (2N + 1) exp N sup − I(t) .
ZN (β) t∈U 2
If we take
U = {t ; |t ± m∗ | ≥ ε}
where ε is given (does not depend on N ) then
 2   2 
βt βt
sup − I(t) < max − I(t) ,
t∈U 2 t∈[0,1] 2

so (4.7) shows that the right-hand side of (4.14) goes to zero as N → ∞.


Thus, (when weighted with Gibbs’ measure), the set of configurations
 is
made of two different pieces: the configurations for which N −1 i≤N σi  m∗

and those for which N −1 i≤N σi  −m∗ . The global invariance of Gibbs’
measure by the transformation σ → −σ shows that these two pieces have
the same weight. The system “spontaneously breaks down in two states”.
This situation changes drastically if one adds an external field, i.e. one
considers the Hamiltonian
 2
βN 1  
− HN (σ) = σi + h σi , (4.15)
2 N
i≤N i≤N

where h > 0. To see where Gibbs’ measure lies, one should now maximize
β 2
f (t) = t − I(t) + th .
2
This maximum is attained at a point 0 < t < 1 because f (t) > f (−t) for
t > 0; this point t must satisfy βt + h = I (t), i.e.

t = th(βt + h) , (4.16)
240 4. The Hopfield Model

and we will see that there is a unique positive root to this equation. The
external field “breaks the symmetry between the two states”.
Consider now a random sequence (ηi )i≤N , ηi = ±1, and the random
Hamiltonian  2
β 
− HN (σ) = ηi σi . (4.17)
2N
i≤N

The randomness is not intrinsic, and can be removed by setting σi = ηi σi .


We can describe the Hamiltonian (4.17) by saying that it tends to align the
spins (σi )i≤N with the sequence (ηi )i≤N or the sequence (−ηi )i≤N rather
than with the sequences (1, . . . , 1) or (−1, . . . , −1).
The situation is much more interesting if we put in the Hamiltonian sev-
eral terms that “pull in different directions”. Consider numbers (ηi,k )i≤N,k≤M ,
ηi,k = ±1, and the Hamiltonian
 2
βN  1 
− HN,M (σ) = ηi,k σi . (4.18)
2 N
k≤M i≤N

We will always write


1 
mk = mk (σ) = ηi,k σi . (4.19)
N
i≤N

When β > 1, the effect of the term βN m2k /2 of (4.18) is to tend to align
the sequence σ with the sequence (ηi,k )i≤N or the sequence (−ηi,k )i≤N . If
the sequences (ηi,k )i≤N are really different as k varies, this creates conflict.
For this reason the case β > 1 seems the most interesting.
The Hopfield model is the system with random Hamiltonian (4.18), when
the numbers ηi,k are independent Bernoulli r.v.s, that is are such that P(ηi,k =
±1) = 1/2. It simplifies notation to observe that, equivalently, one can assume
"
ηi,1 = 1 ∀ i ≤ N ; the numbers (ηi,k )i≤N,2≤k≤M
(4.20)
are independent r.v.s with P(ηi,k = ±1) = 1/2 .
This assumption is made throughout this chapter and Chapter 10. The
Hopfield model is already of interest if we fix M and let N → ∞. We shall
however focus on the more challenging case where N → ∞, M → ∞, M/N →
α, α > 0.
The Hopfield model (with Hamiltonian (4.18), that is, without external
field) has a “high -temperature phase” somewhat similar to the phase β < 1,
h = 0 of the SK model. This phase occurs in the region

β(1 + α) < 1 (4.21)

and it is quite interesting to see how this condition occurs. We will refer the
reader to Section 2 of [142] for this, because this study does not use the cavity
method and is somewhat distinct from the main theme we pursue here.
4.1 Introduction: The Curie-Weiss Model 241

Another topic of interest is the “zero-temperature” problem, i.e. the study


of the (random) function
σ → −HN (σ)
on ΣN . We will not study this topic either because we feel that the current
results in this direction are too far from their optimal form. We refer the
reader to [112], [97], [56] for increasingly more sophisticated results.
Compared with the SK model, the Hopfield model brings two kinds of
new features. One is the ferromagnetic interaction (4.1). For β > 1 and β
close to one, this interaction creates difficulties that arise from the fact that
the root of the equation m∗ = thβm∗ is “not so stable”, in the sense that the
slope of the tangent to the function t → thβt at t = m∗ gets close to 1 as
β → 1. This simple fact creates many of the technical difficulties inherent to
the Hopfield model. Another difference between the Hopfield model and the
SK model is that the nature of the disorder is not exactly the same.
It would be more pedagogical, before attacking the Hopfield model, to
study a disordered system that presents the difficulties due to the ferromag-
netic interaction, but with a familiar disorder. Such a model exists. It is the
SK model with ferromagnetic interaction. The Hamiltonian is
 2
β1 N 1  β2  
− HN (σ) = σi + √ gij σi σj + h σi . (4.22)
2 N N i<j
i≤N i≤N

Space (and energy!) limitations do not allow the study of this model here.

Research Problem 4.1.1. (Level 1) Extend the results of Chapter 1 to the


Hamiltonian (4.22).

What is really interesting is not to study this model for β1 , β2 small, but,
given β1 (possibly large) to study the system for β2 as large as possible. The
“replica-symmetric” equations for this model are

μ = E th(β2 z q + β1 μ + h) (4.23)

q = E th2 (β2 z q + β1 μ + h) . (4.24)
Throughout the chapter we will consider the Hopfield model with external
field, so that the Hamiltonian is
 2
Nβ  1  
− HN,M (σ) = ηi,k σi + h σi
2 N
k≤M i≤N i≤N
Nβ  2
= mk + N hm1 . (4.25)
2
k≤M

It is of course important that we have chosen ηi,1 = 1, so that the external


field “pulls in the same direction as m1 ”. Among the values of k, when h = 0
242 4. The Hopfield Model

we can expect the value k = 1 to play a special role. Without loss of generality
we can and do assume h ≥ 0.
We observe that the function f (x) = th(βx + h) is concave for x ≥ 0. If
h > 0 we have f (0) > 0. If h = 0 and β > 1 we have f (0) = 0 and f (0) > 1.
Thus if β > 1 there is a unique positive solution to (4.16). Throughout the
chapter, we denote by m∗ = m∗ (β, h) this solution, i.e.

m∗ = th(βm∗ + h) . (4.26)

We set
β ∗2
b∗ = log ch(βm∗ + h) − m . (4.27)
2
The expression of b∗ given here is appropriate for the proof of Lemma 4.1.2
below. It is not obvious that this is the same as the value (4.12), which, in
the presence of the external field, is

βm∗2
+ m∗ h − I(m∗ ) . (4.28)
2
To prove the equality of (4.27) and (4.28), we observe that (A.26) implies

I(x) = max(λx − log chλ)


λ

and that the maximum is obtained for thλ = x, so that, if x = m∗ =


th(m∗ β + h), λ is m∗ β + h and hence I(m∗ ) = m∗2 β + m∗ h − log ch(m∗ β + h),
so that the quantities (4.27) and (4.28) coincide.
Lemma 4.1.2. If β > 1 we have

| log ZN,1 − N (b∗ + log 2)| ≤ K(β, h) .

Of course ZN,M = ZN,M (β, h) denotes the partition function of the Hamil-
tonian (4.25). Thus ZN,1 is the partition function of the Curie-Weiss model
with external field. The proof serves as an introduction to the method of
Section 4.3. It is much more effective and accurate than the more natural
method leading to (4.13). The result is also true for β < 1 if we define m∗ by
(4.26) for h > 0 and m∗ = 0 for h = 0. This is left as an exercise.
Proof. We start with the identity (see (A.6))

a2
E exp ag = exp
2
whenever g is standard Gaussian r.v., so that

ZN,1 = E exp( N βgm1 + N hm1 ) .
σ

−1

Now, since m1 = N i≤N σi we have, using (1.30) in the second equality,
4.1 Introduction: The Curie-Weiss Model 243

    
β
exp( N βgm1 + N hm1 ) = exp σi g+h
σ σ
N
i≤N
 N
N β
= 2 ch g+h ,
N

and therefore  N
N β
ZN,1 = 2 E ch g+h .
N
Thus
   
1 β t2
ZN,1=2 √
N
exp N log ch t+h − dt
2π R N 2
  
Nβ βz 2
= 2N exp N log ch(βz + h) − dz
2π R 2

with the change of variable t = N βz. The function z → log ch(βz + h) −
βz 2 /2 attains its maximum at the point z such that th(βz + h) = z, i.e.
z = m∗ , and this maximum is b∗ . Thus

ZN,1 = 2N exp(N b∗ )AN (4.29)

where 

AN = exp N ψ(z)dz ,
2π R
for
βz 2
ψ(z) = log ch(βz + h) − − b∗ .
2
To finish the proof we will show that there is a number K such that
1
ψ(z) ≤ − (z − m∗ )2 . (4.30)
K

Making the change of variable z = m∗ + x/ N then implies easily that
log AN stays bounded as N → ∞, and (4.29) concludes the proof.
The proof of (4.30) is elementary and tedious. We observe that the func-
tion ψ satisfies ψ(m∗ ) = ψ (m∗ ) = 0. Also, the function

ψ (z) = β(th(βz + h) − z)

is strictly concave for z ≥ 0 and is ≥ 0 for z < m∗ , so that its derivative


at z = m∗ must be < 0, i.e. ψ (m∗ ) < 0. This proves that (4.30) holds for
z close to m∗ . Next, we observe that ψ(z) < 0 if z = m∗ . For z ≥ 0 this
follows from the fact that ψ (z) > 0 for 0 < z < m∗ while ψ (z) < 0 for
z > m∗ , and for z < 0 this follows from the fact that ψ(z) < ψ(−z) ≤ 0.
244 4. The Hopfield Model

Since ψ(z) < −β(z − m∗ )2 /4 for large z, it follows that (4.30) holds for all
values of z. 
The Hopfield model has a kind of singularity for β = 1. In that case,
some understanding has been gained only when M/N → 0, see [154] and
the references therein to earlier work. These results again do not rely on the
cavity method and are not reproduced here. Because of that singularity, we
study the Hopfield model only for β = 1. Our efforts in the next sections
concentrate on the most interesting case, i.e. β > 1. We will explain why
the case β < 1 is several orders of magnitude easier than the case β > 1.
It is still however not trivial. This is because the special methods that allow
the control of the Hopfield model without external field under the condition
(4.21) break down in the presence of an external field.
When studying the Hopfield model, we will think of N and M as large
but fixed. Throughout the chapter we write
M
α= .
N
The model then depends on the parameters (N, α, β, h).

Exercise 4.1.3. Prove that there exists a large enough universal constant L
such that one can control the Hopfield model with external field in a region
of the type β < 1, α ≤ (1 − β)2 /L.

Of course this exercise should be completed only after reading some of the
present chapter, and in particular Theorem 4.2.4 below. On the other hand,
even if β < 1, when h = 0, reaching the largest possible value of α for which
there is “high-temperature” behavior is likely to be a level 3 problem.

4.2 Local Convexity and the Hubbard-Stratonovitch


Transform

We recall the Hamiltonian (4.25):

Nβ  2
− HN,M (σ) = mk (σ) + N hm1 (σ) . (4.31)
2
k≤M

Since it is defined entirely as a function of the quantities (mk (σ))k≤M (defined


in (4.19)), these should be important objects.
Consider the image G of the Gibbs measure G under the random map

σ → m(σ) := (mk (σ))k≤M . (4.32)

This is a random probability measure on RM .


4.2 Local Convexity and the Hubbard-Stratonovitch Transform 245

As an auxiliary tool, we consider the probability measure γ on RM , of


density W exp(−βN z2 /2) with respect to Lebesgue measure on RM , where
z is the Euclidean norm of z and W is the normalizing factor

W = (N β/2π)M/2 . (4.33)

(This notation W will be used throughout the present chapter.) It will be


useful to replace G by its convolution G = G ∗ γ with γ. This method is
called the Hubbard-Stratonovitch transform. It is an elaboration of the trick
used in Lemma 4.1.2.
It is useful to think of G as a small perturbation of G , an idea that we
will make precise later. The reason why G is more convenient than G is that
it has a simple density with respect to Lebesgue measure. To see this, we
consider the vector
η i = (ηi,k )k≤M
of RM , and the (random) function ψ on RM given by
Nβ 
ψ(z) = − z2 + log ch(βη i · z + h) , (4.34)
2
i≤N
 
where of course z2 = k≤M zk2 and η i · z = k≤M ηi,k zk . This function ψ
is a multidimensional generalization of the function log ch(βz + h) − β 2 z 2 /2
used in the proof of Lemma 4.1.2.
Lemma 4.2.1. The probability G has a density
−1
W 2N ZN,M exp ψ(z)

with respect to Lebesgue’s measure, where ZN,M is the partition function,



ZN,M = exp(−HN,M (σ)) .
σ

Proof. If we consider the positive measure δ consisting of a mass a at a point


x, the density of δ ∗ γ at a point z is given by
 
βN
aW exp − z − x .
2
2
Since for each σ the probability measure G gives mass
 
1 1 Nβ
exp(−HN,M (σ)) = exp m(σ) + N hm1 (σ)
2
ZN,M ZN,M 2

to the point m(σ) = (mk (σ))k≤M , the density at z of G ∗ γ is


 
1  Nβ Nβ
W exp m(σ)2 + N hm1 (σ) − z − m(σ)2 .
ZN,M σ 2 2
246 4. The Hopfield Model

This is
 
W Nβ
exp − z2 exp(N βz · m(σ) + N hm1 (σ)) .
ZN,M 2 σ

Now
   
N βz · m(σ) + N hm1 (σ) = β zk ηi,k σi +h σi
k≤M i≤N i≤N
  
= σi β zk ηi,k + h σi
i≤N k≤M i≤N

= σi (βη i · z + h) ,
i≤N

and therefore
 
exp(N βz · m(σ) + N hm1 (σ)) = 2N ch(βη i · z + h)
σ i≤N

= 2 expN
log ch(βη i · z + h) ,
i≤N

which finishes the proof. 



In the present chapter we largely follow an approach invented by Bovier
and Gayrard. The basic idea is to use the tools of Section 3.1 to control the
overlaps. This approach is made possible by the following convexity property,
that was also discovered by Bovier and Gayrard. We denote by (ek )k≤M the
canonical basis of RM .
Let us recall that everywhere in this chapter we write α = M/N .

Theorem 4.2.2. There exists a number L with the following property. Given
β > 1, there exists a number κ > 0 with the following property. Assume
that α ≤ m∗4 /Lβ. Then there exists a number K such that with probability
≥ 1 − K exp(−N/K), the function z → ψ(z) + κN z2 is concave in the
region " %
∗ m∗
z ; z − m e1  ≤ . (4.35)
L(1 + log β)
Here, and everywhere in this chapter, K denotes a number that does not
depend on N or M (so that K never depends on α = M/N ). In the present
case, K depends only on β and h. As usual the letter L denotes a univer-
sal constant, that certainly need not be the same at each occurrence. We
will very often omit the sentence “There exists a number L with the follow-
ing property” and the sentence “There exists a number K” in subsequent
statements.
4.2 Local Convexity and the Hubbard-Stratonovitch Transform 247

The point of Theorem 4.2.2 is that the function z2 is convex, so that
the meaning of this theorem is that in the region (4.35) the function ψ is
sufficiently concave that it will satisfy (3.21), opening the way to the use
of Theorem 3.1.4. The conditions α ≤ m∗4 /Lβ and (4.35) are by no means
intuitive, but are the result of a careful analysis.
Even though Theorem 4.2.2 will not be used before Section 4.5 we will
present the proof now, since it is such a crucial result for the present approach
(or the other hand, when we return to the study of the Hopfield model in
Chapter 10 this result will no longer be needed). We must not hide the fact
that this proof uses ideas from probability theory, which, while elementary,
have been pushed quite far. This is also the case of the results of Section 4.3.
These proofs contain no “spin glasses ideas”. Therefore the reader who finds
these proofs difficult should simply skip them all. In Section 4.4 page 272,
matters become quite easier.
Throughout the book we will use the letter Ω to denote an event (so we
do not follow the standard probability notation, which is to denote by Ω the
entire underlying probability space).

Definition 4.2.3. We say that an event Ω occurs with overwhelming prob-


ability if P (Ω) ≥ 1 − K exp(−N/K) where K does not depend on N or
M.

Of course the event Ω = ΩN,M depends on N and M , so it would be


more formal to say that “a family of events ΩN,M occurs with overwhelming
probability”, but it seems better to be a bit informal than pedantic.
Using Definition 4.2.3, the second sentence of Theorem 4.2.2 reformu-
lates as “With overwhelming probability the function z → ψ(z) + κN z2 is
concave in the region (4.35).”
Maybe it will help to mention that one of our goals is, given β and h, to
control the Hopfield model uniformly over all the values of M and N with
α = M/N ≤ α0 (β, h) for a certain number α0 (β, h) (as large as we can
achieve). This will be technically important in Section 10.8, and is one of the
reasons why we insist that K does not depend on N or M .
As a warm-up, and in order to make the point that things are so much
simpler when β < 1 we shall prove the following.

Theorem 4.2.4. Given β < 1, there exists a number κ > 0 with the follow-
ing property. Assume that α ≤ (β − 1)2 /L. Then with overwhelming proba-
bility the function ψ(z) + κN z2 is concave.

Again, we have omitted the sentence “There exists a number L such that
the following holds”. In Theorem 4.2.4 the constant K implicit in the words
“with overwhelming probability” depends only on β.
To prove that a function ϕ is concave in a convex domain, we prove
2
that at each point w of this domain the second differential Dw of ϕ satisfies
248 4. The Hopfield Model

Dw2
(v, v) ≤ 0 for each vector v. If differentials are not familiar to you, the
2
quantity Dw (v, v) is simply the second derivative at t = 0 of the function
t → ϕ(w + tv).
Proof of Theorem 4.2.4. Let us set z = m∗ e1 + w, and denote by Dw 2

the second differential of ψ at the point z, so that, for v ∈ R (and since


M

ηi,1 = 1 for each i),


 1
2
Dw (v, v) = −βN v2 + β 2 2 ∗ + h + βη · w)
(η i · v)2
i≤N
ch (βm i
 
≤ β −N v2 + β (η i · v)2 . (4.36)
i≤N

It follows from Corollary A.9.4 that (there exists a number L such that) if
M ≤ (1 − β)2 N/L, with overwhelming probability one has

∀v , (η i · v)2 ≤ N (1 + (1 − β))v2 ,
i≤N

and therefore Dw2


(v, v) ≤ −β(1 − β)2 N v2 , so that if κ = β(1 − β)2 , the
function ψ(z) + κN z2 is concave. 

To continue the study of the case β < 1 and complete Exercise 4.1.3, the
reader can go directly to Section 4.5. Through the rest of this chapter, we
assume that β > 1.
Before the real work starts, we need some simple facts about the behavior
of m∗ . These facts are needed to get a qualitatively correct dependence of α
on β, but are otherwise not fundamental.
Lemma 4.2.5. The quantity m∗ (β, h) increases as β or h increases. More-
over we have
β ≥ 2 ⇒ β(1 − m∗2 ) ≤ L exp(−β/L) (4.37)
m∗2
≤ a∗ := 1 − β(1 − m∗2 ) ≤ m∗2 . (4.38)
L
Proof. We observe that if z ≥ 0 then

z ≤ th(βz + h) ⇐⇒ z ≤ m∗ (β, h) . (4.39)

Now if β ≥ β and h ≥ h we have

m∗ (β, h) = th(βm∗ (β, h) + h) ≤ th(β m∗ (β, h) + h ) ,

and therefore m∗ (β, h) ≤ m∗ (β , h ) by (4.39), so that the quantity m∗ (β, h)


increases as β or h increases. To prove (4.37) we observe that m∗ =
m∗ (β, h) ≥ m∗ (2, 0) and hence

m∗ = th(βm∗ + h) ≥ th(βm∗ (2, 0))


4.2 Local Convexity and the Hubbard-Stratonovitch Transform 249

and consequently,
 
β β
β(1 − m∗2 ) ≤ ≤ L exp − .
ch2 (βm∗ (2, 0)) L

The right-hand side inequality of (4.38) holds since 1 − β(1 − m∗2 ) ≤ 1 −


(1 − m∗2 ) = m∗2 . To prove the left-hand side inequality of (4.38), we observe
that, since this inequality is equivalent to β − 1 ≤ (β − 1/L)m∗2 , and since
m∗ (β, h) increases with h, we can assume h = 0. We then observe that for
x > 0 we have
x 2x sh(2x) − 2x 1 x2
1− 2 = 1 − sh(2x) = ≥ ,
thxch x sh(2x) L 1 + x2
as is seen by studying the behavior at x → 0 and x → ∞. Taking x = βm∗
where m∗ = m∗ (β, 0), we get, since m∗ = thβm∗ ,
β βm∗ βm∗
1 − β(1 − m∗2 ) = 1 − =1− =1−
ch βm∗ 2
m∗ ch βm∗
2
thβm∗ ch2 βm∗
x 1 x2 1 β 2 m∗2 1
= 1− 2 ≥ 2
≥ 2)
≥ m∗2 ,
thxch x L 1 + x L (1 + β L

using that 1 + x2 ≤ 1 + β 2 . 

Theorem 4.2.2 asserts that with overwhelming probability we control the
Hessian (= the second differential) of ψ over the entire region (4.35). Con-
trolling the Hessian at a given point with overwhelming probability is easy,
but controlling at the same time every point of a region is distinctly more
difficult, and it is not surprising that this should require significant work. The
key to our approach is the following.
Proposition 4.2.6. We can find numbers L and L1 with the following prop-
erty. Consider 0 < a < 1 and b ≥ L1 log(2/a). Assume that α ≤ a2 .
Then the following event occurs with probability ≥ 1 − L exp(−N a2 ): for each
w, w ∈ RM , we have

(η i · v)2 ≤ LN a2 v2 , (4.40)
i∈J(w)

where
J(w) = {i ≤ N ; |η i · w| ≥ bw} . (4.41)
Tounderstand this statement, we note that E(η i · z)2 = z2 , so that
E i≤N (η i · v)2 = N v2 . Also when b  1, and since E(η i · w)2 = w2 ,
it is rare that |η i · w| ≥ bw, so the set J(w) has a tendency to be a rather
small subset of {1, . . . , N }, and it is much easier to control in (4.40) the sum
over J(w) rather than the sum over {1, . . . , N }. The difficulty of course is to
find a statement that holds for all v and w.
250 4. The Hopfield Model

Proof of Theorem 4.2.2. As in the proof of Theorem 4.2.4 we set z =


m∗ e1 + w, we denote by Dw
2
the second differential of ψ at the point z, and
we recall (4.36):
 1
2
Dw (v, v) = −βN v2 + β 2 (η i · v)2 . (4.42)
i≤N
ch2
(βm∗ + h + βη i · w)

In contrast with the case β < 1 we must now take advantage of the fact
that the denominators have a tendency to be > 1, or even  1 for large β.
The difficulty is that some of the terms βm∗ + h + βη i · w might be close
to 0, in which case ch2 (βm∗ + h + βη i · w) is not large. We have to show
somehow that these terms do not contribute too much. The strategy is easier
to understand when β is not close to 1. In that case, the only terms that can
be troublesome are those for which βm∗ + h + βη i · w might be close to 0 (for
otherwise ch2 (βm∗ +h+βη i ·w)  1) and these are such that η i ·w ≤ −m∗ /2
and in particular |η i · w| ≥ m∗ /2. Proposition 4.2.6 is perfectly appropriate
to control these terms (as it should, since this is why it was designed).
We first consider the case β ≥ 2. In that case (following the argument of
(4.37)), since m∗ ≥ m∗ (2, 0), we have
 
1 β
β 2
≤ L exp − ,
ch2 (βm∗ /2 + h) L
and thus
 
β 
2
Dw (v, v) ≤ −βN v2 + L exp − (η i · v)2
L
i≤N

+β 2
1{|ηi ·w|≥m∗ /2} (η i · v)2 . (4.43)
i≤N

To control the second term in the right-hand side, we note that by Corol-
lary A.9.4, with overwhelming probability we have (whenever M ≤ N )

∀v, (η i · v)2 ≤ Lv2 . (4.44)
i≤N

Next, denoting by L0 the constant of (4.40), we set L2 = 2L0 , so that if we


define a by a2 = 1/L2 β, the right-hand side of (4.40) is
L0 N v2
L0 N a2 v2 = N v2 = .
L2 β 2β
Moreover since β > 2 there exists a universal constant L3 such that

L3 log β ≥ L1 log(2/a) .

Thus we can use Proposition 4.2.6 with a as above and b = L3 log β. We
observe that
4.2 Local Convexity and the Hubbard-Stratonovitch Transform 251

m∗ m∗
w ≤ and |η i · w| ≥ ⇒ |η i · w| ≥ bw .
2b 2
It then follows from Proposition 4.2.6 that if M ≤ N a2 = N/Lβ (i.e. α ≤
1/Lβ), then with overwhelming probability, the following occurs:

∀w, w ≤ m∗ /2b = m∗ /L log β ⇒


 N
1{|ηi ·w|≥m∗ /2} (η i · v)2 ≤ v2 ,

i≤N

and (4.36) yields


  
β β
2
Dw (v, v) ≤ −N − L exp − v2 .
2 L

Therefore, when β is large enough, say β ≥ β0 , we have shown that if α ≤


1/Lβ, with overwhelming probability we have

m∗ N β0
w ≤ √ ⇒ Dw
2
(v, v) ≤ − v2 . (4.45)
L log β 4

We now turn to the case 1 < β ≤ β0 . We will as before consider separately


the terms for which |η i · w| ≥ cm∗ where 0 < c < 1 is a parameter 0 < c <
1/2β0 (< 1) to be determined later. We first prove the inequality
1 1
≤ 2 + 2m∗2 βc + m∗2 1{|x|≥cm∗ }
ch2 (β(m∗ + x) + h) ch (βm∗ + h)
= 1 − m∗2 + 2m∗2 βc + m∗2 1{|x|≥cm∗ } . (4.46)

This is obvious if |x| ≥ cm∗ because then the right-hand side is ≥ 1. This
is also obvious if x ≥ 0 because this is true for x = 0 and the function
f (x) = ch−2 (β(m∗ + x) + h) decreases. Now,

2βth(β(m∗ + x) + h)
f (x) = − ,
ch2 (β(m∗ + x) + h)

so that for −m∗ ≤ x ≤ 0 we have

|f (x)| ≤ 2βth(β(m∗ + x) + h) ≤ 2βth(βm∗ + h) = 2βm∗ ,

and thus for −cm∗ ≤ x ≤ 0 we get

f (x) ≤ f (0) + 2βm∗2 c = 1 − m∗2 + 2βm∗2 c .

Therefore (4.46) also holds for |x| ≤ cm∗ , and is proved in every case.
We define
d = 1 − m∗2 + 2βm∗2 c ,
252 4. The Hopfield Model

and we note that since c < 1/2β0 and β < β0 we have

d<1.

Now (4.46) implies


1
≤ d + m∗2 1{|ηi ·w|≥cm∗ } , (4.47)
ch2
(β(m∗ + η i · w) + h)

and we deduce from (4.42) that


2
Dw (v, v) ≤ I + II (4.48)

where

I = −βN v2 + β 2 d (η i · v)2
i≤N

∗2
II = β m2
1{|ηi ·w|≥cm∗ } (η i · v)2 .
i≤N

Consider a parameter ρ > 0, to be fixed later. It follows from Corollary


A.9.4 that if α ≤ ρ2 /L, with overwhelming probability we have

∀v, (η i · v)2 ≤ N (1 + ρ)v2 ,
i≤N

and consequently
I ≤ −βN v2 (1 − βd(1 + ρ)) . (4.49)
∗2 ∗2
By (4.38), we have 1 − β(1 − m ) ≥ m /L, so that, recalling the definition
of d, that d ≤ 1, and that β ≤ β0 ,

1 − βd(1 + ρ) ≥ 1 − βd − β0 ρ
= 1 − β(1 − m∗2 ) − 2β 2 m∗2 c − β0 ρ
≥ 1 − β(1 − m∗2 ) − 2β02 m∗2 c − β0 ρ
m∗2
≥ − 2β02 m∗2 c − β0 ρ .
L0
We make the choices
m∗2 1
ρ= ; c= ,
4β0 L0 8β02 L0

so that 1 − βd(1 + ρ) ≥ m∗2 /2L0 and we see that provided that

ρ2 m∗4
α≤ = ,
L L
with overwhelming probability we have
4.2 Local Convexity and the Hubbard-Stratonovitch Transform 253

m∗2
I ≤ −βN v2 . (4.50)
2L0
To take care of the term II we use Proposition 4.2.6 again. We choose
a = 1/L4 , where L4 = 2β0 L0 . We can then apply Proposition 4.2.6 for
b = L1 log(2/a) (= L5 ). Then, since L0 is the constant in (4.40), the right-
hand side of this inequality is N v2 /4β02 L0 . Since when |η · w| ≥ cm∗
and w ≤ m∗ c/b = m∗ /L then |η · w| ≥ bw, this proves that if
M ≤ a2 N = N/L24 then with overwhelming probability II ≤ N v2 m∗2 /4L0
whenever w ≤ m∗ c/b = m∗ /L. Consequently, combining with (4.50) we
have shown that if β ≤ β0 and α ≤ m∗4 /L , then with overwhelming proba-
bility
m∗ m∗2
w ≤ ⇒ Dw (v, v) ≤ −βN v2 .
L 4L0
Combining with (4.45) we have completed the proof, because if the constant
L in (4.35) is large enough, the region this condition defines is included into
the region we have controlled. This is obvious by distinguishing the cases
β ≥ β0 and β ≤ β0 . 

Proof of Proposition 4.2.6. Consider the largest integer N0 ≤ N with
N0 log(eN/N0 ) ≤ N a2 . In Proposition A.9.5 it is shown that the following
event occurs with probability ≥ 1 − L exp(−N a2 ):

∀J ⊂ {1, · · · , N }, cardJ ≤ N0 , ∀w ∈ RM ,

(η i · w)2 ≤ w2 (N0 + L max(N a2 , N N0 a)) . (4.51)
i∈J

This statement is of similar nature as (4.51), except that we have a cardi-


nality restriction on the index set J instead of specifying that it is of the type
J(w) as defined by (4.41). The core of the proof is to show that when (4.51)
holds, then for each w we have cardJ(w) < N0 , after which a straightforward
use of (4.51) will imply (4.40).
To control the cardinality of J(w) suppose, if possible, that there exists
J ⊂ J(w) with cardJ = N0 . Then since |η i · w| ≥ bw for i ∈ J we have

(η i · w)2 ≥ b2 N0 w2 ,
i∈J

and, comparing with (4.51), we see that

b2 N0 ≤ N0 + L max(N a2 , N N0 a) ,

and therefore
(b2 − 1)N0 ≤ L max(N a2 , N N0 a) . (4.52)
The idea of the proof is to show that this bound on N0 contradicts the
definition of N0 , by forcing N0 to be too small. It is clear that, given a,
254 4. The Hopfield Model

this is the case if b is large enough, but to get values of b of the right order
one has to work a bit. Assuming without loss of generality b ≥ 2, we have
b2 − 1 ≥ b2 /2, so that (4.52) implies
b2 N0 ≤ L max(N a2 ,
N N0 a) .

Thus we have either N0 ≤ LN a /b or else b N0 ≤ L N N0 a, i.e.
2 2 2

a2 a2
N0 ≤ LN 4
≤ LN 2 .
b b
Therefore we always have N0 ≤ L6 N a2 /b2 . We show now that we can choose
the constant L1 large enough so that
 2 
L6 eb 1
b ≥ L1 log(2/a) =⇒ 2 log ≤ . (4.53)
b L6 a 2 2
To see that such a number L1 exists we can assume L6 ≥ e and we observe
that log(eb2 /L6 a2 ) ≤ 2 log b + 2 log(2/a). We moreover take L1 large enough
such that we also have L6 a2 /b2 ≤ L6 /b2 ≤ 1.
Since the function x → x log(eN/x) increases for x ≤ N , and since N0 ≤
L6 N a2 /b2 ≤ N , when b ≥ L1 log(2/a) we deduce from (4.53) that
   2 
eN a2 eb N a2
N0 log ≤ L6 N 2 log ≤ ,
N0 b L6 a 2 2
and therefore since N0 + 1 ≤ 2N0 we have
eN
(N0 + 1) log ≤ N a2 .
N0 + 1
But this contradicts the definition of N0 .
Thus we have shown that cardJ(w) < N0 . Then, by (4.51), and since
N0 ≤ N a2 we get 
(η i · v)2 ≤ LN a2 v2 . 

i∈J(w)

4.3 The Bovier-Gayrard Localization Theorem


Theorem 4.2.2 can be really useful only if the region (4.35) is actually relevant
for the computation of G . This is what we shall prove in this section.
Before we state the main result, we introduce some terminology, that
matches the spirit of Definition 4.2.3.
Definition 4.3.1. We say that a set A of RM is negligible if
 
N
E G (A) ≤ K exp − (4.54)
K
where K does not depend on N, M . We say that G is essentially supported
by A if Ac = RM \A is negligible.
4.3 The Bovier-Gayrard Localization Theorem 255

Of course the set A also depends on M and N , so it would be more


formal to say that “a family AN,M of sets is negligible if E G(AN,M ) ≤
K exp(−N/K), where K does not depend on M or N ”.
In a similar manner, we will say that a subset A of ΣN is negligible if
E G(A) ≤ K exp(−N/K), where K does not depend on N or M .

Theorem 4.3.2. (The Bovier-Gayrard localization theorem.) Consider β >


1, h ≥ 0 and ρ0 ≤ m∗ /2. If α ≤ m∗2 ρ20 /L, then G is essentially supported by
the union of the 2M balls in RM of radius ρ0 centered at the points ±m∗ ek ,
k ≤ M.

It is useful to note that the balls of this theorem are disjoint since ρ0 ≤
m∗ /2.
To reformulate Theorem 4.3.2, if we consider the set

A = {z ∈ RM ; ∀k ≤ M , ∀τ = ±1 , z − τ m∗ εk  > ρ0 } ,

then when α ≤ m∗2 ρ20 /L we have EG (A) ≤ K exp(−K/N ) where K depends


only on β, h and ρ0 but certainly not on α or N .
It is intuitive that something of the type of Theorem 4.3.2 should hap-
pen when h = 0. (The case h > 0 is discussed in Section 4.4.) Each of the
terms mk (σ) in the Hamiltonian attempts to create a Curie-Weiss model
“in the direction of (ηi,k )i≤N ”; and in such a model mk (σ)  ±m∗ . What
Theorem 4.3.2 says is that if there are not too many such terms, for (nearly)
each configuration, one of these terms wins over the others. For one k (de-
pending on the configuration) we have |mk (σ) ± m∗ | ≤ ρ0 , and for k = k,
|mk (σ)| ≤ ρ0 is smaller. What is not intuitive is how large α can be taken.
It is a separate problem to know whether the same k “wins” independently
of the configuration.
The Bovier-Gayrard localization theorem is a deep fact, that will require
significant work. The methods are of interest, but they are not related to the
main theme of the book (the cavity method) and will be used only in this
section. Therefore the reader mostly interested in following the main story
should skip this material.
We recall the probability γ introduced page 245. That is, γ has density
W exp(−βN z2 /2) with respect to Lebesgue measure on RM , where W is
given by (4.33).
We first elaborate on the idea that G = G ∗ γ is a small perturbation
of G . One reason is that if α ≤ βρ2 /4, then γ is essentially supported by
the ball centered at the origin, of radius ρ. To see this, we observe that, by
change of variable,
   
βN βN
exp z2 dγ(z) = W exp − z2 dz
RM 4 RM 4
 
βN
=2 M/2
W exp − z dz = 2M/2 .
2
RM 2
256 4. The Hopfield Model

Thus,    
Nβ 2 αN
γ({z2 ≥ ρ2 }) exp ρ ≤ 2M/2 ≤ exp
4 2
and, since α ≤ βρ2 /4, we get
   
N N βρ2
γ({z ≥ ρ }) ≤ exp − (βρ − 2α) ≤ exp −
2 2 2
. (4.55)
4 8

This inequality shows in particular by taking ρ = 2 α/β that if


"  %
α
B = z ; z ≤ 2 (4.56)
β

then γ(B) ≥ 1/L (observe that αN = M ≥ 1, so that N βρ2 /8 = M/2 ≥ 1/2).


Thus, given any subset A of RM , we have

G(A + B) = G ⊗ γ({(x, y) ; x + y ∈ A + B}) ≥ G (A)γ(B) ,

and hence
G (A) ≤ LG(A + B) . (4.57)
To prove that a set A is negligible for G it therefore suffices to prove
that A + B is negligible for G. Consequently if G is essentially supported
by a set C, then G is essentially supported by C + B. This is because the
complement A of C + B is such that A + B is contained in the complement of
C so that it is negligible for G and hence A is negligible for G . In particular
when G is essentially supported by the union C of the balls of radius ρ0 /2
centered at the points ±m∗ ek , then G is essentially supported by C + B.
When α ≤ m∗2 ρ20 /16, we have 2 α/β ≤ ρ0 /2 and hence C + B is contained
in the union of the balls of radius ρ0 centered at the points ±m∗ ek . Thus it
suffices to prove Theorem 4.3.2 for G rather than for G .
As a consequence of Lemma 4.2.1, for a subset A of RM , the identity

W A exp ψ(z)dz
G(A) = (4.58)
2−N ZN,M

holds, and the strategy to prove that A is negligible is simply to prove that
typically the numerator in (4.58) is much smaller than the denominator. For
this it certainly helps to bound the denominator from below. As is often the
case in this chapter, we need different arguments when β is close to 1 and
when β is away from 1. Of course the choice of the number 2 below is very
much arbitrary.

Proposition 4.3.3. If 1 < β ≤ 2 and α ≤ m∗4 , we have


 M/2
−N 1
2 ZN,M ≥ exp N b∗ (4.59)
La∗
4.3 The Bovier-Gayrard Localization Theorem 257

where

b∗ = log ch(βm∗ + h) − βm∗2 /2 , a∗ = 1 − β(1 − m∗2 ) .

This bound is true for any realization of the randomness. It somewhat


resembles Lemma 4.1.2.
Before we start the proof, we mention the following elementary fact. The
function ξ(x) = log chx satisfies

1 thx
ξ (x) = thx; ξ (x) = ; ξ (x) = −2 2 ; |ξ (4) (x)| ≤ 4 . (4.60)
ch2 x ch x

Proof of Proposition 4.3.3. Since G is a probability, Lemma 4.2.1 shows


that
2−N ZN,M = W exp ψ(z)dz ,

so that if we set ψ ∼ (v) = ψ(m∗ e1 + v) then

2−N ZN,M = W exp ψ ∼ (v)dv . (4.61)

Now, since we assume ηi,1 = 1, we have

η i · (m∗ e1 + v) = m∗ + η i · v ,

so that, setting b = βm∗ + h, we get,

ψ ∼ (v) = ψ(m∗ e1 + v)
Nβ 
=− m∗ e1 + v2 + log ch(βη i · (m∗ e1 + v) + h)
2
i≤N
Nβ 
=− m∗ e1 + v2 + log ch(b + βη i · v) . (4.62)
2
i≤N

We make an order 4 Taylor expansion of log ch around b (= βm∗ + h). This


yields
N β ∗2 N β
ψ ∼ (v) = − m − v2 − N βm∗ v · e1
2 2
 β2 
+ N log chb + βthb ηi · v + 2 (η i · v)2
i≤N
2ch b i≤N

β 3 thb  β4 
− 2 (η i · v)3 + Ri (v)(η i · v)4 (4.63)
3 ch b 6
i≤N i≤N

where |Ri (v)| ≤ 1.


258 4. The Hopfield Model

The idea of the proof is to simplify (4.63) by averaging over rotations.


If U denotes a rotation of RM , the invariance of Lebesgue’s measure by U
shows from (4.61) that

2−N ZN,M = W exp ψ ∼ (U (v))dv (4.64)

so that, if dU denotes Haar measure on the group of rotations, we have

2−N ZN,M = W exp ψ ∼ (U (v))dU dv


 

≥W exp ψ (U (v))dU dv (4.65)

by Jensen’s inequality.
Given a vector x of RM , we have for a certain constant cp that

(x · U (v))p dU = cp xp vp (4.66)

because the left-hand side depends only on x and v.


To compute the quantity cp , we consider a vector g = (gk )k≤M where
(gk ) are independent standard Gaussian r.v.s. We apply (4.66) to x = g and
we take expectation. We observe that g · U (v) is a Gaussian r.v. of variance
U (v)2 so that

E((g · U (v))p ) = U (v)p E g p = vp E g p ,

where g is a standard Gaussian r.v. Thus we obtain


E gp
cp = .
E gp

In particular, cp = 0 when p is odd, c2 = 1/M , and, since

E g4 ≥ (E g2 )2 = M 2

and Eg 4 = 3 we get c4 ≤ 3/M 2 .


We observe that η i 2 = M , so that using (4.66) for x = η i , (4.63) implies
 
βm∗2
ψ ∼ (U (v))dU ≥ N log chb −
2
 
Nβ β N β4
− 1− 2 v2 − v4 . (4.67)
2 ch b 2
Since b = βm∗ + h, we have thb = m∗ and thus
β
1− = 1 − β(1 − th2 b) = 1 − β(1 − m∗2 ) = a∗ ,
ch2 b
4.3 The Bovier-Gayrard Localization Theorem 259

so that from (4.65), and since b∗ = log chb − βm∗2 /2,


 
−N ∗ Nβ ∗ N β4
2 ZN,M ≥ (exp N b )W exp − a v −
2
v dv
4
2 2
 M/2  
1 ∗ Nβ N β 4 v4
= (exp N b )W exp − v 2
− dv
a∗ 2 2a∗2

by change of variable. Therefore, the definition of γ implies


 M/2  
−N 1 ∗ N β 4 v4
2 ZN,M ≥ (exp N b ) exp − dγ(v) .
a∗ 2 a∗2

Recalling also that B of (4.56), we get


     4 
N β 4 v4 N β4 α
exp − dγ(v) ≥ γ(B) exp − ∗2 2
2 a∗2 2a β
 
1 LN β 2 α2
≥ exp −
L a∗2
 
1 LM β 2 α
= exp − .
L a∗2

Recalling that m∗4 ≤ La∗2 by (4.38), and since we assume that β ≤ 2 we


have
 M/2
∗4 ∗2 −N 1
α ≤ m ≤ La ⇒ 2 ZN,M ≥ exp N b∗ . (4.68)
La∗


When β ≥ 2, we will use a different bound. We will use the vector θ =
(θk )k≤M given by
m∗ 
θ= (η i − e1 ) , (4.69)
N
1≤i≤N

so that θ1 = 0, whereas for 2 ≤ k ≤ M we have


m∗ 
θk = ηi,k .
N
1≤i≤N

Proposition 4.3.4. We have


 
−N ∗ Nβ
2 ZN,M ≥ exp N b + θ 2
. (4.70)
2
260 4. The Hopfield Model

This bound is true for any realization of the disorder and every value of β, M
and N . Since θ2 is about αm∗ /N this is much worse when β ≤ 2 than the
bound (4.59) when La∗ < 1.
Proof. The convexity of the function log ch, and the fact that since b =
βm∗ + h, we have thb = m∗ imply that log ch(b + x) ≥ log chb + m∗ x.
Therefore (4.62) implies
Nβ 
ψ ∼ (v) ≥ − m∗ e1 + v2 + N log chb + m∗ βη i · v
2
i≤N
 

= N b∗ − v2 + βm∗ (η i − e1 ) · v
2
i≤N

= N b∗ − v2 + N βθ · v
2
Nβ Nβ
= N b∗ + θ2 − v − θ2 .
2 2
Thus
   
Nβ Nβ
W exp ψ ∼ (v)dv ≥ exp N b∗ + θ2 W exp − v − θ2 dv
2 2
 
N β
= exp N b∗ + θ2 ,
2
and the result follows from (4.61). 
We have the following convenient consequence of Propositions 4.3.3 and
4.3.4: whenever α ≤ m∗4 ,
 M/2
1
2−N ZN,M ≥ exp b∗ N . (4.71)
La∗
Indeed, if β ≤ 2 this follows from Proposition 4.3.3, while if β ≥ 2, by Propo-
sition 4.3.4 we have 2−N ZM,N ≥ exp b∗ N , and, since a∗ remains bounded
away from 0 as β ≥ 2, we simply take L large enough that then La∗ ≥ 1.
The bound (4.71) does not however capture (4.70).
We turn to the task of finding upper bounds for the numerator of (4.58).
For this we will have to find an upper bound for ψ. We will use two rather
distinct bounds, the first of which will rely on the following elementary fact.
Lemma 4.3.5. The function

ϕ(x) = log ch(β x + h) (4.72)

is concave. Moreover, if x ≤ 2 then


β
ϕ (x) ≤ − . (4.73)
L
4.3 The Bovier-Gayrard Localization Theorem 261

Proof. Setting y = β x + h, computation shows that
 √   
β β x β y
ϕ (x) = 3/2 − thy ≤ 3/2 − thy
4x ch2 y 4x ch2 y
 
β sh2y − 2y
= − 3/2 ≤0.
8x ch2 y
Moreover, distinguishing the cases y ≤ 1 and y ≥ 1, we obtain
sh2y − 2y 1 1
2 ≥ min(1, y 3 ) ≥ min(1, x3/2 ) .
ch y L L

The result follows. 


Before we prove our first localization result, let us make a simple observa-
tion that we will use many times: to prove that a set A is negligible, it suffices
to show that with overwhelming probability we have G(A) ≤ K exp(−N/K).
This is because for any set A and any ε > 0, we have

E G(A) ≤ P(G(A) ≥ ε) + ε

since G(A) ≤ 1.

Proposition 4.3.6. If α ≤ m∗4 /L, the set

A = {z ∈ RM ; z ≥ 2m∗ }

is negligible for G, that is


 
N
E G(A) ≤ K exp − ,
K

where K depends only on β and h.

Proof. We write
   
1 
log ch(βη i · z + h) ≤ ϕ((η i · z) ) ≤ N ϕ
2
(η i · z)2
(4.74)
N
i≤N i≤N i≤N

by concavity of ϕ.
Using Corollary A.9.4 we see that provided

a∗2
α≤ (4.75)
L
then the event
1  a∗ 
∀ z ∈ RM , (η i · z)2 ≤ 1 + z2 (4.76)
N 8
i≤N
262 4. The Hopfield Model

occurs with overwhelming probability (that is, the probability of failure is at


most K exp(−N/K), where K depends on β, h only). When the event (4.76)
occurs, (4.74) implies
  
Nβ 1
ψ(z) ≤ − z + N ϕ
2
(η i · z)2
2 N
i≤N
 
Nβ a∗ 
≤− z + N ϕ
2
1+ z 2
. (4.77)
2 8

Let√us consider the function f (t) = log ch(βt + h) − βt2 /2, so that ϕ(x) =
f ( x) + βx/2 and (4.77) means
 
N βa∗ a∗
ψ(z) ≤ z + N f
2
1 + z . (4.78)
16 8

The second derivative of f is f (t) = β 2 ch−2 (βt + h) − β, which decreases as


t increases from 0. Moreover
 
β
f (m∗ ) = −β 1 − 2 = −βa∗ ,
ch (βm∗ + h)

where a∗ = 1 − β(1 − m∗2 ). For t ≥ m∗ , we have f (t) ≤ f (m∗ ) = −βa∗ .


Since f (m∗ ) = b∗ , f (m∗ ) = 0, for t ≥ m∗ we get

β ∗
f (t) ≤ b∗ − a (t − m∗ )2 .
2
Thus for t ≥ 2m∗ , and since then t − m∗ ≥ t/2 and thus (t − m∗ )2 ≥ t2 /4,
we have
βa∗ 2
f (t) ≤ b∗ − t ,
8
and therefore
   
a∗ βa∗ a∗ βa∗
f 1 + z ≤ b∗ − 1+ z2 ≤ b∗ − z2 .
8 8 8 8

It then follows from (4.78) that ψ(z) ≤ N b∗ − N βa∗ z2 /16 for z ≥ 2m∗ .
Thus, under (4.75), with overwhelming probability we have
 
Nβ ∗
ψ(z)dz ≤ exp N b∗ exp − a z2 dz (4.79)
A A 8
    
βa∗ m∗2 Nβ ∗
≤ exp N b∗ − exp − a z2 dz
4 A 16

because z2 ≥ 4m∗2 on A. Now, by change of variable,


4.3 The Bovier-Gayrard Localization Theorem 263
   M/2  
Nβ ∗ 8 Nβ
exp − a z2 dz = exp − z 2
dz
16 a∗ 2
 M/2
8
= W −1 ,
a∗
so that
    M/2
∗ βa∗ m∗2 8
ψ(z)dz ≤ exp N b − W −1
A 8 a∗
    M/2
∗ βm∗4 L
≤ exp N b − W −1
L a∗
since a∗ ≥ m∗2 /L by (4.38). Combining with (4.58) and (4.59), we deduce
that with overwhelming probability it holds

     
βm∗4 βm∗4 βm∗4
G(A) ≤ LM exp −N ≤ exp N αL7 − ≤ exp −N ,
L L7 2L7
provided α ≤ m∗4 /2L27 . This completes the proof. 
This preliminary result is interesting in itself, and will be very helpful
since from now on we need to be concerned only with the values of z such
that |z| ≤ 2m∗ .
Our further results will be based on the following upper bound for ψ; it
is this bound that is the crucial fact.
Lemma 4.3.7. We have
 
∗β 
ψ(z) ≤ N b + (η i · z) − N z
2 2
2
i≤N
β   
− min 1, ((η i · z)2 − m∗2 )2 . (4.80)
L
i≤N

The last term in (4.80) has a crucial influence. There are two main steps
to use this term. First, we will learn to control it from below uniformly on
large balls. This control will be achieved by proving that with overwhelming
probability at every point of the ball this term is not too much smaller than its
expectation. In a second but separate step, we will show that this expectation
cannot be small unless z is close to one of the points ±m∗ ei . Therefore with
overwhelming probability this last term can be small only if z is close to one
of the points ±m∗ ei , and this explains why Gibbs’ measure concentrates near
these points.
There is some simple geometry behind the behavior of the expectation of
the last term of (4.80). If we forget the minimum and consider simply the
average
264 4. The Hopfield Model

1  2
(η i · z)2 − m∗2 ,
N
i≤N

its expectation is precisely



(z2 − m∗2 )2 + zk2 z2 .
k=

As a warm-up before the real proof, the reader should convince herself that
this quantity can be small only if one of the zk ’s is approximately ±m∗ and
the rest are nearly zero.
Proof of Lemma 4.3.7. We recall the function ϕ of Lemma 4.3.5. Assuming
for definiteness that x ≥ m∗2 , this lemma implies
x
ϕ(x) = ϕ(m∗2 ) + ϕ (m∗2 )(x − m∗2 ) + (x − t)ϕ (t)dt
m∗2
β min(x,2)
≤ ϕ(m∗2 ) + ϕ (m∗2 )(x − m∗2 ) − (x − t)dt
L m∗2
β  
≤ ϕ(m∗2 ) + ϕ (m∗2 )(x − m∗2 ) − min 1, (x − m∗2 )2 . (4.81)
L
Now,
β β
ϕ (m∗2 ) = ∗
th(βm∗ + h) =
2m 2
and
β ∗2 β
ϕ(m∗2 ) − m = log ch(βm∗ + h) − m∗2 = b∗ ,
2 2
so that (4.81) implies

β β  
ϕ(x) ≤ b∗ + x − min 1, (x − m∗2 )2 .
2 L
Using this for x = (η i ·z)2 , summing over i ≤ N and using the first inequality
in (4.74) yields the result. 
To perform the program outlined after the statement of Lemma 4.3.7, it
helps to introduce a kind of truncation. Given a parameter d ≤ 1, we write
 
Rd (z) = E min d, ((η i · z)2 − m∗2 )2 , (4.82)

a quantity which is of course does not depend on i.

Proposition 4.3.8. Consider a ball B of RM , of radius ρ, and assume that

B ⊂ {z ; z ≤ 2m∗ } . (4.83)

Then, for each ε > 0, with overwhelming probability, we have


4.3 The Bovier-Gayrard Localization Theorem 265
  
∀ z ∈ B, min d, ((η i · z)2 − m∗2 )2
i≤N
N ρ √ 
≥ Rd (z) − 4α log 1 + − Lεm∗ d . (4.84)
4 ε
This is the first part of our program, showing that the last term of (4.80) is
not too much below its expectation. The strange logarithm in the right-hand
side will turn to be harmless, because we will always choose ρ and ε of the
same order.
The proof of Proposition 4.3.8 itself has two steps. In the first we will
show that the left-hand side of (4.84) can be controlled uniformly over many
points. In the second we will show that this implies uniform control over B.

Lemma 4.3.9. Consider a finite subset A of RM and C > 1. Then, with


probability at least 1 − 1/C we have
   N
∀ z ∈ A, min d, ((η i ·z)2 −m∗2 )2 ≥ Rd (z)−log cardA−log C . (4.85)
4
i≤N

Proof. Let
N
A1 = {z ∈ A ; Rd (z) ≥ log cardA + log C} .
4
To prove (4.85), it suffices to achieve control over z ∈ A1 . Let us fix z in A1 .
The r.v.s  
Xi = min d, ((η i · z)2 − m∗2 )2
are i.i.d., 0 ≤ Xi ≤ 1, E Xi = Rd (z). We prove an elementary exponential
inequality about these variables. Since exp(−x) ≤ 1 − x/2 ≤ exp(−x/2) for
x ≤ 1, we have
   
E Xi E Xi Rd (z)
E exp(−Xi ) ≤ 1 − ≤ exp − = exp −
2 2 2
and thus     
N Rd (z)
E exp − Xi ≤ exp − ,
2
i≤N

so that
     
N Rd (z) N Rd (z) N Rd (z)
P Xi ≤ exp − ≤ exp − ,
4 4 2
i≤N

and    
N Rd (z) N Rd (z) 1
P Xi ≤ ≤ exp − ≤
4 4 CcardA
i≤N

since z ∈ A1 . Thus, with probability at least 1 − 1/C, we have


266 4. The Hopfield Model
   N Rd (z)
∀ z ∈ A1 , min d, ((η i · z)2 − m∗2 )2 ≥ . 

4
i≤N

Next, we relate what happens for two points close to each other.

Lemma 4.3.10. We have



|Rd (z1 ) − Rd (z2 )| ≤ L dz1 − z2 (z1  + z2 ) . (4.86)

Moreover, with overwhelming probability it is true that for any z1 and z2 in


RM we have
 
     
 ∗2 2
min d, ((η i · z1 ) − m ) −
2 ∗2 2 
min d, ((η i · z2 ) − m ) 
2

i≤N i≤N

≤ N L dz1 − z2 (z1  + z2 ) . (4.87)

Proof. We start with the observation that, since d ≤ 1,

| min(d, x2 ) − min(d, y 2 )|
√ √
= |min( d, |x|)2 − min( d, |y|)2 |
√ √ √ √
= | min( d, |x|) − min( d, |y|)|(min( d, |x|) + min( d, |y|))
√ √ √ √
≤ 2 d| min( d, |x|) − min( d, |x|)| ≤ 2 d|x − y| , (4.88)

and thus the left-hand side of (4.87) is bounded by


√ 
2 d |(η i · z1 )2 − (η i · z2 )2 |
i≤N
√ 
≤2 d |η i · (z1 − z2 )|(|η i · z1 | + |η i · z2 |)
i≤N
 1/2

≤2 d (η i · (z1 − z2 ))2
i≤N
#  1/2   1/2 $
× (η i · z1 )2 + (η i · z2 )2 . (4.89)
i≤N i≤N

Taking expectation and using the Cauchy-Schwarz inequality proves (4.86)


since the expectation of the left-hand side of (4.87) is N |Rd (z1 ) − Rd (z2 )|.
Moreover, with overwhelming probability (4.44) holds, and then (4.87) is a
simple consequence of (4.89). 
Proof of Proposition 4.3.8. It is shown in Proposition A.8.1 that we can
find a subset A of B such that
ρ M
cardA ≤ 1 +
ε
4.3 The Bovier-Gayrard Localization Theorem 267

and such that each point of B √is within distance 2ε of A. We apply


Lemma 4.3.9 with C = exp(N εm∗ d). We observe that given z2 in B, there
exists z1 in A with z2 − z1  ≤ 2ε, and z1 , z2  ≤ 2m∗ . We then apply
(4.87) and (4.86) to obtain the result. The choice of C is no magic, the √
ex-
ponent is simply small enough that log C is about the error term LN εm∗ d
produced by Lemma 4.3.10. 
To use efficiently (4.84), we need to understand the geometric nature of
Rd (z). We will show that this quantity is small only when z is close to a point
±m∗ ek .

Lemma 4.3.11. Consider a number 0 ≤ ξ ≤ 1. Assume that

∀ k ≤ M, z ± m∗ ek  ≥ ξm∗ . (4.90)

Then if
d = ξ 2 m∗4 , (4.91)
we have
ξ 2 ∗4
Rd (z) ≥ m . (4.92)
L
The proof relies on the following probabilistic estimate.

Lemma 4.3.12. We have


  
1
Rd (z) ≥ min d, (z2 − m∗2 )2 + zk2 z2 . (4.93)
L
k=

Proof of Lemma 4.3.11. Using (4.93), it is enough to prove that if


 ξ 2 m∗4
(z2 − m∗2 )2 + zk2 z2 ≤ (4.94)
16
k=

then we can find k ≤ M and τ = ±1 such that

z − τ m∗ ek  < ξm∗ . (4.95)

First, we observe from (4.94) that

ξm∗2
|z2 − m∗2 | ≤ (4.96)
4
so that
|z2 − m∗2 | ξm∗
|z − m∗ | = ≤ . (4.97)
z + m∗ 4
Next, (4.94) implies
268 4. The Hopfield Model

ξ 2 m∗4  
≥ zk2 z2 = z4 − z4
16
k= ≤M

≥ z − (max z2 )
4
z2
≤M
≤M

= z (z − max z2 ) .


2 2
(4.98)
≤M

Consider k such that zk2 = max≤M z2 . Then, since z2 ≥ 3m∗2 /4 by (4.96),
we have from (4.98) that

ξ 2 m∗4 ξ 2 m∗2
z2 − zk2 ≤ ≤ .
16z2 12

Now z2 − zk2 = 2
=k z = z − zk ek 2 , so that

ξm∗
z − zk ek  ≤ (4.99)
3
and consequently
ξm∗
|z − |zk || ≤ . (4.100)
3
Moreover, if τ = signzk , we have

zk ek − τ m∗ ek  = |zk − τ m∗ | = ||zk | − m∗ |


 
∗ 1 1
≤ |z − m | + |z − |zk || ≤ + ξm∗ , (4.101)
4 3

using (4.97) and (4.100). Combining with (4.99) proves (4.95). 


Proof of Lemma 4.3.12. We consider the r.v.s

X = ((η i · z)2 − m∗2 )2 , (4.102)

so the Paley-Zygmund inequality (A.80) implies


 
1 1 (E X)2
P X ≥ EX ≥ , (4.103)
2 4 E (X 2 )

and thus
   
EX EX
E min(d, X) ≥ min d, P X≥
2 2
  2
E X (E X)
≥ min d, . (4.104)
2 4E (X 2 )

We have
X = (U + a)2 , (4.105)
4.3 The Bovier-Gayrard Localization Theorem 269

where a = z2 − m∗2 , 


U= ηi,k ηi, zk z ,
k=

so that E U = 0, E U 2 = 2 2
k= zk z and thus

EX = zk2 z2 + a2 . (4.106)
k=

It can be checked simply by force (expansion) that

E U 4 ≤ L(E U 2 )2 , (4.107)

but there are much nicer arguments to do this [14]. From (4.105) it follows
that

E X 2 = E U 4 + 4aE U 3 + 6a2 E U 2 + 4a3 E U + a4


≤ E U 4 + 4a(E U 4 )3/4 + 6a2 E U 2 + a4
≤ L((E U 2 )2 + a(E U 2 )3/2 + a2 E U 2 + a4 )

using (4.107). Using that ab ≤ a4 + b4/3 for b = (E U 2 )3/2 , we get E X 2 ≤


L(E U 2 + a2 )2 = L(E X)2 and (4.104) implies
 
1 EX
E min(d, X) ≥ min d, . 

L 2

We now put together the different pieces, and we state our main tool for
the proof of Theorem 4.3.2.

Proposition 4.3.13. Assume that α ≤ m∗4 /L. Consider a ball B of RM of


radius ρ, and assume that

B ⊂ {z ; z ≤ 2m∗ } .

Consider a subset A of B and assume that for some number ξ < 1,

z ∈ A ⇒ ∀ k ≤ M , z ± m∗ ek  ≥ ξm∗ .

Then, with overwhelming probability, we have


  
∗ M/2 β 
G(A) ≤ (La ) W exp (η i · z) − N z
2 2
A 2
i≤N
  
N β 2 ∗4 Lρ
− ξ m − Lα log 1 + dz . (4.108)
L ξm∗
270 4. The Hopfield Model

Proof. We take d = ξ 2 m∗4 . We recall (4.80) and we consider Proposition 4.3.8


and Lemma 4.3.11 to see that, given ε > 0, with overwhelming probability
we have
 
∗ β 
∀ z ∈ A , ψ(z) ≤ N b + (η i · z) − N z
2 2
2
i≤N
N β 2 ∗4 ρ 
− ξ m − Lα log 1 + − L7 εξm∗3 .
L ε
We choose ε = ξm∗ /2L7 and the result then follows from (4.58) and (4.71).

Proof of Theorem 4.3.2. First we consider the case where

B = {z ; z ≤ 2m∗ }
" %
1
A= z ; z ≤ 2m∗ , ∀ k ≤ M , z ± m∗ ek  ≥ m∗ . (4.109)
2
Thus, we can use (4.108) with ρ = 2m∗ and ξ = 1/2. Consider a number
0 < c < 1, to be determined later. Using Corollary A.9.4 we see that if
α ≤ c2 /L, with overwhelming probability we have
 
∗ M/2 m∗4
G(A) ≤ (La ) W exp N β cL8 z −
2
+ L8 α dz . (4.110)
A L8

It appears that a good choice for c is c = m∗2 /16L28 , so that

m∗4 m∗4
z ≤ 2m∗ ⇒ cL8 z2 − ≤− − cL8 z2
L8 2L8
and thus (4.110) yields that if α ≤ m∗4 /4L28 , with overwhelming probability
we have
   
N βm∗4 N βm∗2
G(A) ≤ (La∗ )M/2 W exp − exp − z2 dz
L L
 ∗ M/2  ∗4

La N βm
≤ ∗2
exp − . (4.111)
m L

Since a∗ ≤ Lm∗2 by (4.38) we get


 M/2
La∗
≤ LM = LαN ≤ exp LαN ,
m∗2

so that for α ≤ m∗4 /L with overwhelming probability we have G(A) ≤


exp(−N βm∗4 /L). This implies that A is negligible.
At this stage, we know that G is essentially supported by the sets z ±
m∗ ek  ≤ m∗ /2. To go beyond this, the difficulty in using (4.108) is to control
4.3 The Bovier-Gayrard Localization Theorem 271

the term i≤N (η i · z)2 − N z2 . A simple idea is that it is easier to control
this term when we know that z is not too far from a point ±m∗ ek .
Given an integer  ≥ 1, given τ = ±1, given k ≤ M , we want to apply
(4.108) to the sets
B = {z ; z − τ m∗ ek  ≤ 2− m∗ } ,
A = Ak,τ = {z ; 2−−1 m∗ ≤ z − τ m∗ ek  ≤ 2− m∗ } . (4.112)
−−1
Thus, we can use (4.108) with ξ = 2 , ρ = 2 m . We set v = z − τ m∗ ek ,
− ∗

so that
 
(η i · z)2 − N z2 = (η i · v)2 − N v2 (4.113)
i≤N i≤N
 

+ 2τ m (η i · v)(η i · ek ) − N v · ek ,
i≤N

where we have used that |η i ·ek | = 1. Consider now a parameter c to be chosen


later (depending only on , β, h). Corollary A.9.4 implies that if α ≤ c2 /L,
with overwhelming probability the quantity (4.113) is at most
N c(v2 + m∗ v) ,
and (4.108) implies
β
G(A) ≤ (La∗ )M/2 W exp N C(v)dv , (4.114)
{v≤2− m∗ } 2
where
m∗4 2−2
C(v) = c(v2 + m∗ v) − + Lα .
L
Since α ≤ c2 and v ≤ 2− m∗ , we get
m∗4 2−2
C(v) ≤ L9 c(2− m∗2 ) − + L9 c 2 .
L9
It is a good idea to take c = 2− m∗2 /4L29 because then, for v ≤ 2− m∗ ,
m∗4 2−2 m∗4 2−2 m∗2
C(v) ≤ − ≤ − v2 ,
2L9 4L9 4L9
and, since a∗ ≤ Lm∗2 , (4.114) gives
   
N βm∗4 2−2 N βm∗2
G(A) ≤ (La∗ )M/2 W exp − exp − v2 dv
L L
 ∗ M/2  ∗4 −2

La N βm 2
= exp −
m∗2 L
 ∗4 −2

N βm 2
≤L M/2
exp −
L
   
N βm∗4 2−2 N
= exp M L10 − ≤ K exp − (4.115)
L10 K
272 4. The Hopfield Model

provided α ≤ 2−2 m∗4 /2L210 . Thus, if


!
A∗ = Ak,τ = {z ; ∃k ≤ M, τ = ±1, 2−−1 m∗ ≤ z − τ m∗ ek  ≤ 2− m∗ } ,
k,τ

we have G(A∗ ) ≤ KM exp(−N/K) ≤ K exp(−N/2K) since M ≤ N .


Summarizing, if α ≤ m∗2 ρ20 /L, the set A∗ is negligible whenever  ≥ 0
and m∗ 2− ≥ ρ0 . Combining with Proposition 4.3.6, we have proved The-
orem 4.3.2. 

4.4 Selecting a State with an External Field

We recall the notation m(σ) = (mk (σ))k≤M of (4.32). A consequence of


Theorem 4.3.2 is as follows. Consider, for k ≤ M , the set
" %
m∗
Ck = σ ; m(σ) ± m∗ ek  ≤ . (4.116)
4

Then, if α ≤ m∗4 /L, Gibbs’ measure is essentially supported by the union of


the sets Ck , as k ≤ M .

Conjecture 4.4.1. (Level 2) Assume that h = 0. If M is large, prove that, for


the typical disorder, there is a k ≤ M such that G(Ck ) is nearly 1.

The paper [30] contains some results relevant to this conjecture.

Research Problem 4.4.2. (Level 2) More generally, when h = 0, under-


stand precisely the properties of the random sequence (G(Ck ))k≤M .

Our goal in the present section is to prove the following.

Theorem 4.4.3. If β > 1, h > 0, then for α ≤ m∗4 /L, the set
" %
m∗
A = σ ; m(σ) − m∗ e1  ≥ (4.117)
4

is negligible.

This means of course that E G(A) ≤ K exp(−N/K) where K depends


only on β, h.
The content of Theorem 4.4.3 is as follows. We know from Theorem 4.3.2
that Gibbs’ measure is essentially supported by the union of 2M balls cen-
tered at ±m∗ ek with a certain radius. When h = 0, however small, only the
ball centered at m∗ e1 matters. A precise consequence is as follows.
4.4 Selecting a State with an External Field 273

Theorem 4.4.4. Consider β > 1, h > 0 and ρ0 ≤ m∗ /2. If α ≤ m∗2 ρ20 /L,
then G is essentially supported by the ball in RM of radius ρ0 centered at the
point m∗ e1 . Equivalently, the set

{σ ; m(σ) − m∗ e1  ≥ ρ0 } (4.118)

is negligible.

Proof. By Theorem 4.3.2 the set

{σ ; ∀k ≤ M, m(σ) ± m∗ ek  ≥ ρ0 }

is negligible. The union of this set and the set (4.117) is the set (4.118), which
is therefore negligible. 

For k ≤ M and τ = ±1 we consider the sets
" %
m∗
Bk,τ = σ ; m(σ) − τ m∗ ek  ≤ (4.119)
4
" %
3m∗
Ck,τ = σ ; τ mk (σ) ≥ . (4.120)
4
0
Let us denote by HN,M (σ) the Hamiltonian (4.18), which corresponds to
the case h = 0. We define

0
S(k, τ ) = exp(−HN,M (σ)) . (4.121)
σ∈Ck,τ

The crucial property is as follows.

Lemma 4.4.5. There exists a number a such that for each k ≤ M , each
τ = ±1, we have
    
1  N u2
 
0 ≤ u ≤ 1 ⇒ P  log S(k, τ ) − a ≥ u ≤ K exp − . (4.122)
N K

It suffices to prove this for k = τ = 1, because the r.v.s S(k, τ ) all have
the same distribution. This inequality relies on a “concentration of measure”
principle that is somewhat similar to Theorem 1.3.4. This principle, which
has received numerous applications, is explained in great detail in Section 6 of
[140], and Lemma 4.4.5 is proved exactly as Theorem 6.8 there. The author
believes that learning properly this principle is well worth the effort, and
that this is better done by reading [140] than by reading only the proof of
Lemma 4.4.5. Thus, Lemma 4.4.5 will be one of the few exceptions to our
policy of giving complete self-contained proofs.
Proof of Theorem 4.4.3. We have
274 4. The Hopfield Model

1 
G(C1,1 ) = exp(−HN,M (σ)) .
ZN,M
σ∈C1,1

For σ in C1,1 , we have hm1 (σ) ≥ 3hm∗ /4, so that −HN,M (σ) ≥ −HN,M
0
(σ)

+3N hm /4, and
 
3 ∗ S(1, 1)
1 ≥ G(C1,1 ) ≥ exp N hm . (4.123)
4 ZN,M

If (k, τ ) = (1, 1), we have hm1 (σ) ≤ hm∗ /4 for σ in Bk,τ so that
   
1
exp(−HN,M (σ)) ≤ exp N hm∗ 0
exp(−HN,M (σ))
4
σ∈Bk,τ σ∈Bk,τ
 
1 ∗
≤ exp N hm S(k, τ )
4

and thus  
1 S(k, τ )
G(Bk,τ ) ≤ exp N hm∗ . (4.124)
4 ZN,M
Taking u = min(1, hm∗ /8) in Lemma 4.4.5 shows that with overwhelming
probability we have
 
1
S(1, 1) ≥ exp N a − N hm∗
8
 
1 ∗
∀ k, τ, S(k, τ ) ≤ exp N a + N hm
8
and thus  
1
∀ k, τ, S(k, τ ) ≤ exp N hm∗ S(1, 1) .
4
Combining with (4.123) and (4.124) yields that, with overwhelming proba-
bility,  
1
(k, τ ) = (1, 1) ⇒ G(Bk,τ ) ≤ exp − N hm∗
4
so that Bk,τ is negligible. Combining with Theorem 4.3.2 finishes the proof.


4.5 Controlling the Overlaps

From now on we assume h > 0, and we recall Theorem 4.4.4: given ρ0 ≤ m∗ /2,
if α ≤ m∗2 ρ20 /L8 , then G is essentially supported by the set

B1 = {z ; z − m∗ e1  ≤ ρ0 } .
4.5 Controlling the Overlaps 275

Moreover (assuming without loss of generality that L8 ≥ 4) if follows from


(4.55) that γ is essentially supported by the set

B2 = {z ; z ≤ ρ0 } .

Therefore G = G ∗ γ is essentially supported by

B = B1 + B2 = {z ; z − m∗ e1  ≤ 2ρ0 } .

Using this for ρ0 = m∗ /L β proves that if α ≤ m∗4 /Lβ, then G is essentially
supported by the set
" %
m∗
B = z ; z − m∗ e1  ≤ √ . (4.125)
L β

Combining with Theorem 4.2.2 yields that moreover there exists κ > 0 such
that, with overwhelming probability, the function z → ψ(z) + κN z2 is
concave on B (so that ψ satisfies (3.21)). This will permit us to control the
model in the region where α ≤ m∗4 /Lβ. (In Chapter 10 we will be able to
control a larger region using a different approach.)
Consider the random probability measure G∗ on B which has a density
proportional to exp ψ(z) with respect to Lebesgue’s measure on B. Then we
should be able to study G∗ using the tools of Section 3.1. Moreover, we can
expect that G and G∗ are “exponentially close”, so we should be able to
transfer our understanding of G∗ to G, and then to G . We will address this
technical point later, and we start the study of G∗ . As usual, one can expect
the overlaps to play a decisive part. The coordinate z1 plays a special rôle, so
we exclude it from the following definition of the overlap of two configurations
  
U, = U, (z , z ) = zk zk . (4.126)
2≤k≤M

There is no factor 1/N because we are here in a different normalization than


in the previous chapter. We will also write

U1,1 = U1,1 (z) = (zk )2 .
2≤k≤M

As a first goal, we would like to show that for k ≥ 1,


 ∗
E (U1,1 − E U1,1 ∗ )2k (4.127)

is small, and we explain the beautiful argument of Bovier and Gayrard which
proves this. For λ ≥ 0, consider the probability Gλ on B that has a density
proportional to exp(ψ(z) + λN U1,1 (z)) with respect to Lebesgue’s measure
on B; and denote by · ∗λ an average for this probability, so that · ∗ = · ∗0 .
The function
276 4. The Hopfield Model

κ κ 2  κ 
z → λU1,1 (z) − z = − z1 −
2
− λ zk2
2 2 2
2≤k≤M

is concave for λ ≤ κ/2. Therefore, for every λ ≤ κ/2, the function


κ
z → ψ(z) + λN U1,1 (z) + N z2 (4.128)
2
is concave whenever the function z → ψ(z) + κN z2 is concave, and this
occurs with overwhelming probability. Also, since z ≤ 2 for z ∈ B,

∀x, y ∈ B, |U1,1 (x) − U1,1 (y)| ≤ 4x − y .

When the function (4.128) is concave, we can use (3.17) (with N κ/2 instead
of κ) to get
 k
 ∗ Kk
∀k ≥ 1 , (U1,1 − U1,1 ∗λ )2k λ ≤ , (4.129)
N
where K depends on κ only, and hence on β only.
The next step is to control the fluctuations of U1,1 ∗ . For this we consider
the random function
1
ϕ(λ) = log exp(ψ(z) + λN U1,1 (z))dz (4.130)
N B

so that it is straightforward to obtain that


 ∗
ϕ (λ) = U1,1 ∗λ ; ϕ (λ) = N (U1,1 − U1,1 ∗λ )2 λ . (4.131)

Thus ϕ is convex since ϕ ≥ 0. Taking k = 1 in (4.129) yields


κ
λ< ⇒ ϕ (λ) ≤ K with overwhelming probability. (4.132)
2
Also, since on B we have |U1,1 | ≤ L, relation (4.131) implies that ϕ (λ) ≤ LN
and (4.132) that
κ
λ < ⇒ Eϕ (λ) ≤ K . (4.133)
2
We will deduce the fact that ϕ (0) has small fluctuations from (4.132) and
the fact that ϕ(λ) has small fluctuations. We write

ϕ(λ) = Eϕ(λ) ,

and we show first that ϕ has small fluctuations.


Lemma 4.5.1. We have
 k
Kk
∀k > 1, E(ϕ(λ) − ϕ(λ)) 2k
≤ . (4.134)
N
4.5 Controlling the Overlaps 277

This is really another occurrence of the principle behind Lemma 4.4.5. We


give enough details so that the reader interested in the abstract principle of
[120], Section 6, should find its application to the present situation immediate.
Consider (xi )i≤N , xi ∈ RM , and define

1 Nβ 
F (x1 , . . . , xN ) = log exp − z2 + log ch(βxi · z + h)
N B 2
i≤N

+ λN U1,1 (z) dz .

This function has the following two remarkable properties: first, given any
number b, the set {F ≤ b} is a convex set in RN ×M . This follows from the
convexity of the function
√ log ch and Hölder’s inequality. Second, its Lipschitz
constant is ≤ K/ N . Indeed,

∂F β
= zk th(βxi · z + h) ,
∂xi,k N

where the meaning of the average · should be obvious. Thus


 2
∂F β2 2
≤ z ,
∂xi,k N2 k

and since for z ∈ B we have k≤M zk2 ≤ L this yields

  2
∂F Lβ 2
≤ ,
∂xi,k N
i≤N,k≤M

i.e. the gradient of F has a norm ≤ K/ N . The abstract principle of [120]
implies then that
 
N u2
∀u > 0, P (|ϕ(λ) − a| ≥ u) ≤ exp − ,
K

where a is the median of ϕ(λ). Therefore by (A.35) we have E(ϕ(λ) − a)2k ≤


(Kk/N )k for k ≥ 1, and (4.134) follows by the symmetrization argument
(3.22). 

The following gives an elementary method to control the fluctuations of
the derivative of a random convex function when we control the fluctuations
of the function and the size of its second derivative.
Lemma 4.5.2. Consider λ > 0. Consider a random convex function ϕ :
[0, λ0 ] → R that satisfies the following conditions, where δ, C0 , C1 , C2 are
numbers, where k ≥ 1 is a given integer and where ϕ = Eϕ:
278 4. The Hopfield Model

|ϕ | ≤ C0 (4.135)

ϕ ≤ C1 (4.136)
ϕ ≤ C1 with probability ≥ 1 − δ (4.137)
E(ϕ(λ) − ϕ(λ))2k ≤ C2k . (4.138)
Then when C2 ≤ λ40 C12 we have
k/2
E(ϕ (0) − ϕ (0))2k ≤ Lk (δC02k + C1k C2 ) . (4.139)

Proof. Since ϕ is convex we have ϕ ≥ 0 so when ϕ ≤ C1 we have


|ϕ | ≤ C1 , and for x ≥ 0 we have |ϕ (x) − ϕ (0)| ≤ C1 x. Integration of this
inequality for 0 ≤ x ≤ λ (where λ < λ0 ) yields
 
 
ϕ (0) − ϕ(λ) − ϕ(0)  ≤ C1 λ .
 λ  2

For the same reason, using now (4.136), we get


 
 
ϕ (0) − ϕ(λ) − ϕ(0)  ≤ C1 λ ,
 λ  2
and therefore
|ϕ(λ) − ϕ(λ)| |ϕ(0) − ϕ(0)|
|ϕ (0) − ϕ (0)| ≤ C1 λ + + ,
λ λ
so that

1
(ϕ (0) − ϕ (0)) 2k
≤3 k
(ϕ(λ) − ϕ(λ))2k
(C1 λ)2k +
λ2k

1
+ 2k (ϕ(0) − ϕ(0))2k
. (4.140)
λ

Recalling that (4.140) might fail with probability δ, taking expectation


and using (4.135) and (4.138) we obtain that for λ ≤ λ0
 
1
E(ϕ (0) − ϕ (0))2k ≤ Lk δC02k + (C1 λ)2k + 2k C2k .
λ
1/4 −1/2
Choosing now λ = C2 C1 , this yields that whenever λ ≤ λ0 , we have
k/2
E(ϕ (0) − ϕ (0))2k ≤ Lk (δC02k + C1k C2 ) . 


Corollary 4.5.3. For k ≥ 1 we have


 k/2
∗ ∗ 2k Kk
E( U1,1 − E U1,1 ) ≤ . (4.141)
N
4.5 Controlling the Overlaps 279

Comment. We expect that (4.141) holds with the better bound (Kk/N )k . It
does not seem to be possible to prove this by the arguments of this section,
that create an irretrievable loss of information.
Proof. We are going to apply Lemma 4.5.2 to the function (4.130) with
λ0 = κ/2. Since |U1,1 | ≤ L on B the first part of (4.131) implies that |ϕ | ≤ L
so that (4.135) holds for C0 = L. We see from (4.132) and (4.133) that (4.136)
and (4.137) hold for C1 = K and δ = K exp(−N/K), and from Lemma 4.5.1
that (4.138) holds with C2 = Kk/N . We conclude from (4.139) that, provided
C2 ≤ λ40 C12 , and in particular whenever Kk/N ≤ κ4 K1 , we have
    k/2 
∗ ∗ 2k N Kk
E( U1,1 − E U1,1 ) ≤ K k
exp − + .
K2 N

This implies that (4.141) holds whenever exp(−N/K2 ) ≤ (Kk/N )k/2 . This
occurs provided k ≤ N/K3 . To handle the case k ≥ N/K3 , we simply write
that  4 k/2
∗ ∗ 2k L K3 k
E( U1,1 − E U1,1 ) ≤ L ≤ 2k
. 

N
Proposition 4.5.4. For k ≥ 1 we have
 k/2
  Kk
E (U1,1 − E U1,1 ∗ )2k ≤ . (4.142)
N

Proof. Inequality (4.129) might fail with probability ≤ K exp(−K/N ), but


since |U1,1 | ≤ L, taking expectation in this inequality we get
 k
  Kk
E (U1,1 − U1,1 ∗ )2k ≤ + L2k K exp(−N/K) ,
N

and we show as in Corollary 4.5.3 that this implies in fact that


 k
  Kk
E (U1,1 − U1,1 ∗ )2k ≤ .
N

Combining with (4.141) completes the proof. 




Proposition 4.5.5. For k ≥ 1, we have


 k/2
 ∗ Kk
∀j ≤ M, E (zj − E zj ∗ )2k ≤ . (4.143)
N

Proof. Identical to that of Proposition 4.5.4, using now the Gibbs measure
with density proportional to exp(ψ(z) + λN zj ). The proof can be copied
verbatim replacing U1,1 by zj . 

280 4. The Hopfield Model

Proposition 4.5.6. For k ≥ 1, we have


 k/2
 ∗ Kk
E (U1,2 − E U1,2 ∗ )2k ≤ . (4.144)
N
Proof. We consider the Gibbs measure on B × B with density proportional
to
exp(ψ(z1 ) + ψ(z2 ) + λN U1,2 ) . (4.145)
We observe that for λ ≤ κ the function
κ
(z1 , z2 ) → λU1,2 − (z1 2 + z2 2 )
2
is concave. This is because at every point its second differential D2 satisfies

D2 ((v1 , v2 ), (v1 , v2 )) = 2λ vk1 vk2 − κ(v1 2 + v2 2 ) ≤ 0 ,
2≤k≤M

using that 2vk1 vk2 ≤ (vk1 )2 + (vk2 )2 . The proof is then identical to that of
Proposition 4.5.4. 

We now turn to the task of transferring our results from G∗ to G and then
to G . One expects that these measures are very close to each other; still we
must check that the exponentially small set B c does not create trouble. Such
lackluster technicalities occupy the rest of this section. Let us denote by · −
an average for G. By definition of · ∗ , for a function f on RM we have
1B f − = G(B) f ∗ , so that
− − − ∗ −
f = 1B f + 1B c f = G(B) f + 1B c f .

Taking expectation and using the Cauchy-Schwarz inequality in the last term
shows that when f ≥ 0 it holds
− ∗ − 1/2
Ef ≤E f + (EG(B c ))1/2 (E f 2 ) ,

and, in particular, since G is essentially supported by B,


 
N
E f − ≤ E f ∗ + K exp − (E f 2 − )1/2 . (4.146)
K

To use (4.146) constructively it suffices to show that E f 2 − is not extremely


large. Of course one never doubts that this is the case for the functions we
are interested in, but this has to be checked nonetheless. Inequality (4.151)
below will take care of this.

Lemma 4.5.7. The quantity



T = max m2j (σ) (4.147)
σ
j≤M
4.5 Controlling the Overlaps 281

satisfies
  k 
k
∀k , ET ≤ L 1 +
k k
, (4.148)
N
and therefore
∀k ≤ N , ET k ≤ Lk . (4.149)
This means that for all practical purposes, one can think of T as being
bounded. This quantity occurs in many situations.
Proof. We have
NT  N  2
exp ≤ exp mj (σ)
4 σ
4
j≤M

and therefore
NT  N  2
E exp ≤ E exp mj (σ)
4 σ
4
j≤M
  N 2
= E exp m (σ) ≤ 2N +M , (4.150)
σ j≤M
4 j

using independence and (A.24) to see that E exp(N m2j (σ)/4) ≤ 2. We use
Lemma 3.1.8 with X = N T /4 to get, since M ≤ N , that

EX k ≤ 2k (k k + (LN )k ) ≤ (LN )k + (Lk)k . 




Corollary 4.5.8. For any number C that does not depend on N we have
E exp CT ≤ K(C).
Proof. We can either use (4.150) and Hölder’s inequality or expand the
exponential as a power series and use (4.148). 

Lemma 4.5.9. We have
 −
∀k ≤ N , E z2k ≤ Lk . (4.151)

Proof. Since G is the convolution of G and γ, we have


 −
z2k = x + y2k dG (x)dγ(y)
 
≤2 2k
x dG (x) + y dγ(y) .
2k 2k
(4.152)

Since

exp(βN y2 /4)dγ(y) = W exp(−βN y2 /4)dy = 2M/2 ,


282 4. The Hopfield Model

and since exp x2 ≥ x2k /k!, taking k = N implies

(N β)N
y2N dγ(y) ≤ 2M/2 ,
4N N !
 
so that y2N dγ(y) ≤ LN , and in particular y2k dγ(y) ≤ Lk for k ≤ N
by Hölder’s inequality.
By definition G is the image of G under that map σ → m(σ) =
(mk (σ))k≤M , so that

G ({x ∈ RM ; x2 > T }) = 0 , (4.153)



and hence x2k dG (x) ≤ T k . The result then follows from (4.149). 


Proposition 4.5.10. For k ≤ N we have


 k/2
 
− 2k − Kk
E (U1,1 − E U1,1 ) ≤ (4.154)
N
 k/2
 
− 2k − Kk
∀j ≤ M, E (zj − E zj ) ≤ (4.155)
N
 k/2
 
− 2k − Kk
E (U1,2 − E U1,2 ) ≤ . (4.156)
N

Proof. Condition (4.151) implies that for k ≤ N we have


 −
E (U1,1 − E U1,1 ∗ )2k ≤ Lk ,

so that (4.142) and (4.146) imply


 k/2    k/2
 − Kk K Kk
E (U1,1 − E U1,1 ∗ )2k ≤ + K exp − Lk ≤
N N N

for k ≤ N . This yields in particular

− K
|E U1,1 − E U1,1 ∗ | ≤ √
N
and (4.154). The proof of (4.155) is similar, and only a small adaptation of
(4.146) to the case of 2 replicas is required to prove (4.156) using the same
scheme. 

The measure G itself is a technical tool. What we are really looking for
is information about G , and we are ready to prove it. We denote by · an
average for G .
4.5 Controlling the Overlaps 283

Proposition 4.5.11. For k ≤ N we have


 k/2
  Kk
E (U1,1 − E U1,1 )2k ≤ (4.157)
N
 k/2
  Kk
∀j ≤ M, E (zj − E zj )2k ≤ (4.158)
N
 k/2
  Kk
E (U1,2 − E U1,2 )2k ≤ . (4.159)
N
Proof. The basic reason this follows from Proposition 4.5.10 is that “con-
volution spreads out the measure”, so that statements of Proposition 4.5.10
are stronger than corresponding statements of Proposition 4.5.11. For x, y in
RM , let us write 
(x, y) = xj yj ,
2≤j≤M

so that U1,1 (z) = (z, z). Then, since for all x we have (x, y)dγ(y) = 0, and
since G is the convolution of G and γ,


U1,1 = (x + y, x + y)dG (x)dγ(y) = (x, x)dG (x) + C

= U1,1 + C ,

where C = (y, y)dγ(y) is non-random. Thus, using (4.154),
 k/2
− − 2k Kk
E( U1,1 − E U1,1 ) = E( U1,1 − E U1,1 ) 2k
≤ . (4.160)
N
Next,
 −
(U1,1 − U2,2 )2k =
 1 2k
(x + y1 , x1 + y1 ) − (x2 + y2 , x2 + y2 ) dG (x1 )dG (x2 )dγ(y1 )dγ(y2 )

≥ ((x1 , x1 ) − (x2 , x2 ))2k dG (x1 )dG (x2 ) , (4.161)

by using Jensen’s inequality to integrate in γ inside the power (·)2k rather
than outside, and using again the fact that (x, y)dγ(y) = 0. Thus, applying
Jensen’s inequality in the second inequality below, we get
 −    
(U1,1 − U2,2 )2k ≥ (U1,1 − U2,2 )2k ≥ (U1,1 − U1,1 )2k .

Since (U1,1 − U2,2 )2k ≤ 22k ((U1,1 − E U1,1 − )2k + (U2,2 − E U1,1 − 2k
) ), using
(4.154) yields
 k/2
  Kk
E (U1,1 − U1,1 ) 2k
≤ .
N
284 4. The Hopfield Model

Combining with (4.160) proves (4.157); the rest is similar. 



The following improves on (4.158) when j ≥ 2.
Proposition 4.5.12. For 2 ≤ j ≤ M and k ≤ N we have
 k/2
  Kk
E zj2k ≤ . (4.162)
N

Proof. Using (4.158) it suffices to see that |E zj | ≤ K/ N . Using symme-
try between sites,
) *
 2 1 
(E zj ) ≤ E zj =
2
E 2
zj .
M −1
2≤j≤M

It follows from
 (A.56) (used for a = 1) that with overwhelming probability
T = maxσ j≤M m2j (σ) ≤ LM/N . Using (4.149) and the Cauchy-Schwarz
inequality to control the expectation
 of T on 2the rare event where this fails, we
obtain that ET ≤ LM/N . Since 2≤j≤M zj ≤ T by (4.153), this concludes
the proof. 


4.6 Approximate Integration by Parts and the


Replica-Symmetric Equations

We denote by · an average for the Gibbs measure with Hamiltonian (4.25),


and we write ν(f ) = E f .
Since G is the image of the Gibbs measure under the map σ → m(σ) =
(mk (σ))k≤M , for a function f on RM we have f = f (m1 (σ), . . . , mM (σ)) ,
and similar formulas hold for replicas.
We define the following quantities

μ = ν(m1 (σ)) = E z1 (4.163)


  
ρ=ν m2k (σ) = E U1,1 (4.164)
2≤k≤M
  
r=ν mk (σ 1 )mk (σ 2 ) = E U1,2 (4.165)
2≤k≤M
q = ν(R1,2 ) . (4.166)

As in (3.59) these quantities depend on (β, h) and M, N , although this is


not indicated by the notation. The purpose of this section is to show that these
fundamental quantities attached to the system nearly satisfy the following
system of (exact) equations:
4.6 Approximate Integration by parts; the Replica-Symmetric Equations 285
√ √
μ = Eth(βz r + βμ + h) ; q = Eth2 (βz r + βμ + h) ; r(1 − β(1 − q))2 = αq
(4.167)
and
(ρ − r)(1 − β(1 − q)) = α(1 − q) , (4.168)
where as usual α = M/N and z is a standard Gaussian r.v. The equations
(4.167) are called the replica-symmetric equations. To pursue the study of
the Hopfield model it seems then required to show that the system of replica-
symmetric equations determine the values of μ, r and q. This task is in prin-
ciple elementary, but it is quite tedious and is deferred to Volume II. For the
time being, our study of the Hopfield model will culminate with the proof
that the quantities (4.163) to (4.165) nearly satisfy the replica-symmetric
equations (4.167). The correct result is that the replica-symmetric equations
are satisfied “with accuracy K/N ”. The methods
√ of this chapter do not seem
to be able to reach better than a rate K/ N , for the reasons stated after
Proposition 4.5.4. Even reaching that rate requires significant work. We have
made the choice to prove in this section that the replica-symmetric equations
hold with rate K/N 1/4 , even though the proof that the equations “just hold
in the limit” (without a rate) is simpler. Besides the fact that this choice
is coherent with the use of the quantitative methods that form the core of
this work, it is really a pretense to learn the fundamental technique of ap-
proximate integration by parts that we will use a great many times later.

Before we start the proof we observe that we can reformulate Propositions


4.5.11 and 4.5.12 as follows.

Proposition 4.6.1. For k ≤ N , we have


 2k   k/2
 Kk
ν mj (σ) − ρ
2
≤ ; (4.169)
N
2≤j≤M
 2k   k/2
 Kk
ν mj (σ )mj (σ ) − r
1 2
≤ ; (4.170)
N
2≤j≤M
 k/2
Kk
for all 2 ≤ j ≤ M , ν(mj (σ)2k ) ≤ . (4.171)
N

Given σ = (σ1 , . . . , σN ), we write ρ = (σ1 , . . . , σN −1 ) ∈ ΣN −1 , and


1 
nk = nk (σ) = nk (ρ) = ηi,k σi . (4.172)
N
i≤N −1

We note that
ηk σN
mk (σ) = nk (σ) + , (4.173)
N
where for simplicity we write ηk rather than ηN,k .
286 4. The Hopfield Model

Lemma 4.6.2. We have

μ = ν(σN ) ; (4.174)
1 2
q = ν(σN σN ) ; (4.175)
  
M −1
ρ= + ν σN ηk nk (σ) ; (4.176)
N
2≤k≤M
  
M −1 1
r= q + ν σN ηk nk (σ 2 ) . (4.177)
N
2≤k≤M

Proof. Using (4.173) and symmetry among sites yields


1
ν(m2k (σ)) = ν(ηk σN mk (σ)) = + ν(σN ηk nk (σ)) ,
N
from which (4.176) follows by summation over 2 ≤ k ≤ M . Relation (4.177)
is similar and the rest is obvious. 

To use these formulas, we make the dependence of the Hamiltonian on
σN explicit. We define
Nβ 
−HN −1,M (ρ) = n2k (ρ) + N hn1 (ρ) .
2
1≤k≤M

(Despite the notation, this is not exactly the Hamiltonian of an (N − 1)-spin


system; more specifically, this is the Hamiltonian of an (N − 1)-spin system
where β has been replaced by βN/(N − 1).) Using (4.173) in the definition
of HN,M and expending the squares shows that

− HN,M (σ) = −HN −1,M (ρ) + βσN ηk nk (ρ) + σN h , (4.178)
1≤k≤M

ignoring the constant βM/(2N ) that plays no role. The strategy we will follow
should come as no surprise. We will express the averages · in Lemma 4.6.2
using the Hamiltonian (4.178). We will bet that the quantities

ηk nk (ρ) (4.179)
2≤k≤M

have a Gaussian behavior, and to bring this out we will interpolate them with
suitable Gaussian r.v.s. The reader may observe that in (4.179) the sum is
over 2 ≤ k ≤ M . The quantity n1 (ρ) requires a special treatment. The idea is
that for k ≥ 2 the quantity nk (ρ) should be very small, allowing the quantity
(4.179) to have a Gaussian behavior. On the other hand, one should think of
the quantity n1 (ρ) as n1 (ρ)  μ = 0.
Given replicas, ρ1 , . . . , ρn , we write nk = nk (ρ ), and given a parameter
t we define
4.6 Approximate Integration by parts; the Replica-Symmetric Equations 287
√  √ √ √
gt = t ηk nk + 1 − t(z r + ξ  ρ − r) , (4.180)
2≤k≤M

where z, ξ  ,  ≥ 1 are independent standard Gaussian r.v.s. Denoting by · −


an average for the Gibbs measure with Hamiltonian HN −1,M , and given a
function f = f (σ 1 , . . . , σ n ), that might be random, we define

Avε1 ,...,εn =±1 f Et −


νt (f ) = E , (4.181)
Eξ Avε1 ,...,εn =±1 Et −

where ε = σN , Eξ denotes as usual expectation in ξ 1 , . . . , ξ n , and where
  
Et = exp ε β(gt + tn1 + (1 − t)μ) + h . (4.182)
≤n

As already pointed out, the quantity n1 receives special treatment compared


to the quantities nk , 2 ≤ k ≤ M , and the previous interpolation implements
the idea that n1  μ.
Given a function f = f (σ 1 , . . . , σ n , x1 , . . . , xn ) we write

ft = f (σ 1 , . . . , σ n , gt1 , . . . , gtn ) . (4.183)

We shall show that for the four choices of f occurring in Lemma 4.6.2 we
have ν0 (f0 )  ν1 (f1 ), where  means that the error is ≤ KN −1/4 . This will
provide the desired equations for μ, ρ, r, q. The computation of ν0 (f0 ) is fun,
so we do it first. We write

Y = βz r + βμ + h .

Lemma 4.6.3. a) If f (σ 1 ) = σN
1
then

ν0 (f0 ) = E thY . (4.184)

b) If f (σ 1 , σ 2 ) = σN
1 2
σN then

ν0 (f0 ) = E th2 Y . (4.185)

c) If f (σ 1 , x1 ) = σN
1
x1 then

ν0 (f0 ) = βr(1 − E th2 Y ) + β(ρ − r) . (4.186)

d) If f (σ 1 , x1 , x2 ) = σN
1
x2 then

ν0 (f0 ) = β(ρ − r)q + βr(1 − E th2 Y ) . (4.187)



Proof. Let Y = Y + β ρ − rξ. Then

β(ρ − r)
Eξ shY = exp shY
2
288 4. The Hopfield Model

and similarly for Eξ chY . Since

1 shY Eξ shY
ν0 (σN )=E =E ,
Eξ chY Eξ chY

this makes (4.184) obvious, and (4.185) is similar. So we prove (4.186). Now
√ √
Eξ (z r + ξ ρ − r)shY
ν0 (f0 ) = E ,
Eξ chY

and by integration by parts



Eξ ξ ρ − rshY = β(ρ − r)Eξ chY .

Thus, integrating by parts in the second equality,


√ 1
ν0 (f0 ) = β(ρ − r) + E z rthY = β(ρ − r) + βrE ,
ch2 Y
and the conclusion follows since 1 − th2 Y = 1/ch2 Y . The proof of (4.187) is
similar. 

In the reminder of the chapter we shall prove that ν(f1 ) = ν1 (f1 )  ν0 (f0 )
for the functions of Lemma 4.6.3. Before doing this, we explain why this
implies that (μ, q, r) is nearly a solution of the system of equations (4.167).
Combining the relation ν(f1 ) = ν1 (f1 )  ν0 (f0 ) with Lemma 4.6.2 proves
that the relations
μ  E thY ; q  E th2 Y (4.188)
ρ  α + βr(1 − q) + β(ρ − r) ; (4.189)
r  αq + β(ρ − r)q + βr(1 − q) (4.190)
hold. Subtraction of the last two relations gives

(ρ − r)(1 − β(1 − q))  α(1 − q) . (4.191)

We rewrite (4.190) as

r(1 − β(1 − q))  αq + β(ρ − r)q ,

and we multiply by 1 − β(1 − q) to get

r(1 − β(1 − q))2  αq(1 − β(1 − q)) + βq(ρ − r)(1 − β(1 − q)) .

Using (4.191) in the second term in the right-hand side then yields

r(1 − β(1 − q))2  αq . (4.192)

This shows as promised that (μ, q, r) is nearly a solution of the system of


equations (4.167).
4.6 Approximate Integration by parts; the Replica-Symmetric Equations 289

We turn to the comparison of ν(f1 ) and ν0 (f0 ), with the goal of proving
that for the functions of Lemma 4.6.3 these two quantiles are nearly equal.
As expected this will be done by controlling the derivative of the function
t → νt (ft ). We define
  
1 1 √ √
gt = √ ηk nk − √ (z r + ξ  ρ − r) . (4.193)
2 t 2≤k≤M 2 1−t

Lemma 4.6.4. We have


d
νt (ft ) = I + II + III (4.194)
dt
where
  
∂ft
I= νt gt (4.195)
∂x
≤n
 
II = β νt (ε gt ft ) − nνt (εn+1 gtn+1 ft ) (4.196)
≤n

III = β νt (ε (n1 − μ)ft ) − nνt (εn+1 (nn+1
1 − μ)ft ) . (4.197)
≤n


Here of course ε = σN and
∂ft ∂f
= (σ 1 , . . . , σ n , gt1 , . . . , gtn ) .
∂x ∂x
Proof. This looks complicated, but this is straightforward differentiation.
There are 3 separate reasons why νt (ft ) depends on t. First, ft depends on t
through gt , and this creates the term I. Second, νt (ft ) depends on t because
in (4.181) the term Et depends on t through gt , and this creates the term
II. Finally, νt (ft ) depends on t because in (4.181) the term Et depends on
t through the quantity (1 − t)μ, and this creates the term III. Let us also
mention that for clarity we have stated this result for general n but that the
case n = 2 suffices. 

We would like to integrate by parts in the terms I and II using (4.193).
Unfortunately the r.v. ηk is not Gaussian, it is a random sign. We now de-
scribe the technique, called “approximate integration by parts”, that is a
substitute of integration by parts for such variables.
The basic fact is that if v is a three times differentiable function on R,
then
1
1
v(1) − v(−1) = v (1) + v (−1) + (x2 − 1)v (x)dx . (4.198)
2 −1

This is proved by integrating the last term by parts,


290 4. The Hopfield Model
1 x=1 1
1 2 1 
(x − 1)v (x)dx = (x2 − 1)v (x) − xv (x)dx ,
−1 2 2 x=−1 −1

1 x=1 1

xv (x)dx = xv (x) − v (x)dx
−1 x=−1 −1
= v (1) + v (−1) − (v(1) − v(−1)) .
If η is a r.v. such that P(η = ±1) = 1/2, then (4.198) implies
1
1
E ηv(η) = E v (η) + (x2 − 1)v (x)dx . (4.199)
4 −1

We will call E v (η) the main term and the last term the error term. This term
will have a tendency to be small because v will depend little on η. √Typically
every occurrence of η in v is multiplied by a small factor (e.g. 1/ N ). We
will always bound the error term through the crude inequality
 
1 1 2 
 (x − 1)v (x)dx ≤ sup |v (x)| . (4.200)
4
−1 |x|≤1

The contribution of the main term is what we would get if the r.v. η had
been Gaussian.
We start to apply approximate integration by parts to (4.194). We take
care of the main terms first. These terms are the same as if we were integrating
by parts for Gaussian r.v.s, and we have learned how to make this calculation
in Chapter 3. Let us set
 
S, = nk nk ,
2≤k≤M


so that (being careful to distinguish between gt and gt , where the position
of the  is not the same) the relations

 =  ⇒ E gt gt = S, − r
E gt gt = S, − ρ
hold and integration by parts brings out factors S, − r and S, − ρ. The
dependence on gt is through the Hamiltonian and ft . It then should be clear
that the contribution of the main terms to the integration by parts in II is
bounded by
    
  ∂ft 
IV = K νt 
|ft | +   |S, − r|
∂x 
= ≤n+2

    
 ∂ft 
+ νt 
|ft | +   |S, − ρ| . (4.201)
∂x 
≤n+1
4.6 Approximate Integration by parts; the Replica-Symmetric Equations 291

Here, as well as in the rest of the section the quantity K is permitted to


depend on n. In this bound we would like to have ν rather than νt . Since |ft |
depends on gt ,  ≤ n, we cannot readily use differential inequalities to relate
ν and νt . The next page or so will take care of that technical problem. We
will then show how to control the error terms in the approximate integration
by parts, which is not trivial.
Sophistication is not needed to prove that we can replace νt by ν in
(4.201), but the details are tedious.
Lemma 4.6.5. Consider a function f ∗ ≥ 0 of n replicas ρ1 , . . . , ρn , n ≤ 3,
that might also depend on ηk , z, ξ  for  ≤ n and k ≤ M . Then
 
νt (f ∗ ) ≤ Kν (E0 f ∗2 )1/2 exp KT − (4.202)

where 
T − = sup n2k (ρ) , (4.203)
ρ
2≤k≤M

and where E0 denotes expectation in the r.v.s ηk , z and ξ  . Moreover

νt (f ∗ ) ≤ Kν((E0 f ∗2 )1/2 ) + exp(−N )ν(E0 f ∗2 )1/2 . (4.204)

The restriction n ≤ 3 is simply so that K does not depend on n.


Proof. We write the definition of νt (f ∗ ) as in (4.181). Let

Yt, = β(gt + tn1 + (1 − t)μ) + h



so that Avε1 ,...,εn Et = ≤n chYt, ≥ 1. Since f ∗ is a function of ρ1 , . . . , ρn ,
equality (4.181) implies
 ∗ 
f   
∗ ≤n chYt, −
νt (f ) = E   ≤ E f∗ chYt, .
Eξ ≤n chYt, − −
≤n

Taking first expectation E0 inside the bracket and using the Cauchy-Schwarz
inequality we get
 1/2
νt (f ∗ ) ≤ E (E0 f ∗2 )1/2 E0 ch2 Yt, . (4.205)
≤n −

We claim that 
E0 ch2 Y,t ≤ K exp KT − . (4.206)
≤n

Using that n ≤ 3 and Hölder’s inequality, it suffices to show that

E0 ch6 Yt, ≤ K exp KT − .


292 4. The Hopfield Model

First we observe that ch6 x ≤ L(exp 6x+exp(−6x)). As in the proof of (A.21),


for numbers ak we have
   
E exp ηk ak = chak = exp log chak ≤ exp a2k /2 ,
k≤M k≤M k≤M k≤M

and recalling the definition (4.180) of gt , using (A.6) and independence, we
see that indeed
E0 exp(±6Yt, ) ≤ K exp KT − .
Combining (4.206) and (4.205) we get
 
νt (f ∗ ) ≤ KE (E0 f ∗2 )1/2 − exp K1 T − . (4.207)

For a function f ∼ ≥ 0 that depends only on ρ1 , . . . , ρn ,


 ∼ 

f ≤n chY1, −
E0 f = E0  
≤n chY1, −
1
≥ f∼ − E0 n
chY1,1 −
≥ f∼ − exp(−K2 T ) ,−

using that E0 (1/X) ≥ 1/E0 X for X = chY1,1 n , and using (4.206) for
t = 1. We write this inequality for f ∼ = (E0 f ∗2 )1/2 (that depends only
on ρ1 , . . . , ρn ), we multiply by exp((K1 + K2 )T − ) and we take expectation
to get
   
E (E0 f ∗2 )1/2 − exp K1 T − ≤ ν (E0 f ∗2 )1/2 exp KT − .

Combining with (4.207) this proves (4.202). The point of (4.204) is that T −
is not bounded, so we write
 
ν (E0 f ∗2 )1/2 exp KT − ≤ exp KLν((E0 f ∗2 )1/2 )
 
+ ν 1{T − ≥L} (E0 f ∗2 )1/2 exp KT − .

The last term is  


E 1{T − ≥L} exp KT − (E0 f ∗2 )1/2
and using Hölder’s inequality we bound it by

P(T − ≥ L)1/4 (E exp 4KT − )1/4 ν(E0 f ∗2 )1/2 .

Using Corollary 4.5.8 for N − 1 rather than N yields that E exp 4KT − ≤ K.
Using (4.150) for for N − 1 rather than N we then obtain that if L is large
enough we have P(T − ≥ L) ≤ exp(−4N ). 

4.6 Approximate Integration by parts; the Replica-Symmetric Equations 293

Corollary 4.6.6. If f is one of the functions of Lemma 4.6.3 then the term
(4.201) satisfies
 
N
IV ≤ Kν(|S1,2 − r| + |S1,1 − ρ|) + K exp − .
K

Proof. First we note that E0 ft2 ≤ K and E0 (∂ft /∂x )2 ≤ K. Let


  
 ∂ft 

f = |ft | +    |S, − r| ,
∂x 

so that
E0 f ∗2 ≤ K|S, − r|2 . (4.208)

Now, the Cauchy-Schwarz inequality implies | 2≤k≤M mk (σ )mk (σ )| ≤ T ,
1 2

so that recalling (4.164) and (4.149) we have |r| ≤ E T ≤ K. In a similar


∗2
 ) ≤ K. Thus (4.208) proves that ν(E0 f ) ≤ K, and
2
manner we get ν(S,
(4.204) proves that

νt (f ∗ ) ≤ Kν(|S, − r|) + K exp(−N ) .

Proceeding in the same manner for the other terms of (4.201) completes the
proof. 

Next we deduce from Proposition 4.6.1 that the term IV is ≤ KN −1/4 .
Using obvious notation,
 
S1,2 − m1k m2k = n1k n2k − m1k m2k
2≤k≤M 2≤k≤M

= ((n1k − m1k )m2k + m1k (n2k − m2k ))
2≤k≤M

+ (n1k − m1k )(n2k − m2k )
2≤k≤M

and using that |nk − mk | ≤ 1/N , the Cauchy-Schwarz inequality and (4.149)
yields  
   K
ν S1,2 − m1k m2k  ≤ √ .
2≤k≤M
N

Using (4.170) for k = 1 we then obtain that ν(|S1,2 − r|) ≤ KN −1/4 . We then
proceed similarly for the other terms.
In this manner we can control the main terms produced by approximate
integration by parts in the term II of (4.196). The case of the term I of
(4.196) is entirely similar, and the term III of (4.197) is immediate to control
as in Corollary 4.6.6. We turn to the control of the error terms produced
by approximate integration by parts. Let us fix 2 ≤ k ≤ M , and consider
294 4. The Hopfield Model

approximate integration by parts in ηk using (4.199). Consider e.g. the case


of the term
νt (ηk nk ε ft ) ,
on n ≤ 3 replicas. We have to consider the case of the four functions of
1
Lemma 4.6.3. We consider only the case where f (σ 1 , x1 ) = σN x1 . The other

cases are completely similar. In this case the term νt (ηk nk ε ft ) is simply

νt (ηk nk ε1 ε gt1 ) . (4.209)




Let us define gt,x as gt in (4.180) except that we replace the term tηk nk

by txnk . Recalling (4.182) let us define Et,x as Et but using gt,x 
instead of
gt , so that Et,x does not depend on ηk . For a possible random function f ∗ of


t, x, σ 1 , · · · , σ n let us define
Eξ Avε1 ,...,εn =±1 f ∗ Et,x −
f∗ t,x = .
Eξ Avε1 ,...,εn =±1 Et,x −
We consider the function

v(x) = E nk ε1 ε2 gt,x


1
t,x .

In words, in the definition of the term (4.209), we replace every occurrence


of ηk by x. We note that Eηk v(ηk ) is the quantity (4.209), and that E v (ηk )
is the “main term” in the approximate integration by parts, that we have
already taken into account.
Since there is a factor nk in front of the occurrence of x in gt,x 
, differen-
tiation of v(x) in x brings out such a factor  in each term. It should then be
obvious using the inequality |x1 x2 x3 x4 | ≤ ≤4 x4 that
  
|v (x)| ≤ KE (nk )4 (1 + |gt,x
1
|) .
t,x
≤n+3

We then reproduce the argument of Lemma 4.6.5 to find that this quantity
is bounded by
Kν((nk )4 ) + K exp(−N/K) . (4.210)
The bound (4.200) implies that the error term created by the approximate
integration by parts in the quantity νt (η nk ε ft ) is bounded by the quantity
(4.210). The sum over all values of k of these errors is bounded by
    
N
ν (nk ) + K exp −
4
.
K
2≤k≤M

Writing x4 = x · x3 and using the Cauchy-Schwarz inequality yields


   1/2   1/2
(nk ) ≤
4
(nk )2
(nk )6
,
2≤k≤M 2≤k≤M 2≤k≤M
4.7 Notes and Comments 295

and using the Cauchy-Schwarz inequality for ν we get


     1/2    1/2

ν 4
(nk ) ≤ ν (nk )2
ν (nk )6
.
2≤k≤M 2≤k≤M 2≤k≤M

Now we use (4.171) for k = 3 to see that ν((nk )6 ) ≤ KN −3/2 and thus
  
K
ν (nk )6 ≤ 1/2 .
2≤k≤M
N

Finally, recalling (4.203) we have 2≤k≤M (nk )2 ≤ T − , so that, using (4.149)

for N − 1 rather than N , we get ν( 2≤k≤M (nk )2 ) ≤ E T − ≤ L . Therefore
  
K
ν (nk )4 ≤ .
N 1/4
2≤k≤M

This completes the proof that the equations (4.188) and (4.192) are satisfied
with error terms ≤ KN −1/4 .

4.7 Notes and Comments

The Hopfield model was introduced in [118], but became popular only after
Hopfield [79], [80] put it forward as a model of memory. For this aspect
 a model
as of memory, it is the energy landscape, i.e. the function σ →
2
k≤M m k (σ) that matters. There are some rigorous results, [112], [97], [142],
[132], [56] but they are based on ad hoc methods, none of which deserves to
appear in a book. A detailed study of the model from the physicists’ point
of view appears in [3].
The first attempt at justifying the replica-symmetric equations can be
found in [121]. The authors try to duplicate the results of [120] for the Hopfield
model, i.e. to establish the replica-symmetric equations under the assump-
tion that a certain quantity does not fluctuate with the disorder. This paper
contains many interesting ideas, but one could of course wonder, among other
things, how one could prove anything at all without addressing the question
of uniqueness of the solutions of these equations. See also [122].
My notation differs from the traditional one as I call r what is traditionally
called rα. Thus, the replica-symmetric equations usually read

q = E th2 (βz rα + βμ + h)

μ = E th(βz rα + βμ + h)
q
r= .
(1 − β(1 − q))2
296 4. The Hopfield Model

This might be natural when one derives these equations from the “replica
trick”. The reason for not following this tradition is that the entire approach
starts with studying the sequence (mk (σ))k≤M , and its global behavior (as in
the Bovier-Gayrard localization theorem). Thus it is natural to incorporate
the data about the length of this sequence (i.e. α) in the parameter r. (Maybe
it is not such a good idea after all, but it is too late to change it anyway!)
The Bovier-Gayrard localization theorem is the culmination of a series of
papers of these authors, sometimes with P. Picco. I am unsure as to whether
the alternate approach I give here is better than the original one, but at least
it is different. Bovier and Gayrard put forward the law of (mk (σ)) under
Gibbs’ measure as the central object. This greatly influenced the paper [142]
where I first proved the validity of the replica-symmetric solution, using the
cavity method. Very soon after seeing the paper [142], Bovier and Gayrard
gave a simpler proof [28], based on convexity properties of the function ψ of
(4.34) (which they proved) and on the Brascamp-Lieb inequalities. It is quite
interesting that the convexity of ψ does not seem to hold in the whole region
where there is replica-symmetry (the physicists’ way to say that R1,2  q).
Despite this the Bovier-Gayrard approach is of interest, as will become even
clearer in Section 6.7. I have largely followed it here, rewriting of course some
of the technicalities in the spirit of the rest of the book. In Volume II I
will present my own approach, which is not really that much more difficult,
although it yields much more accurate results.
In her paper [128], Shcherbina claims that her methods allow her to prove
that the replica-symmetric solution holds on a large region. It would indeed be
very nice to have a proof of the validity of the replica-symmetric solution that
does not require to prove first something like the Bovier-Gayrard localization
theorem. It is sad to see how some authors apparently do not care whether
their ideas will be transmitted to the community or will be lost. More likely
than not, in the present case they will be lost.
The paper [17] should not be missed. The interesting paper [20] is also
related to the present chapter.
5. The V -statistics Model

5.1 Introduction

The model presented in this chapter was invented in an effort to discover


natural Hamiltonians of mathematical interest. It illustrates well the power
of the methods we have developed so far. It presents genuinely new features
compared with the previous models, and these new features are the main
motivations for studying it. The discovery of this model raises the question
as to whether the models presented in this book represent well the main
types of possible features of mean-field models, or whether genuinely new
types remain to be discovered.
The model of the present chapter is related to the Perceptron model of
Chapter 2 at the technical level, so we advise the reader to be comfortable
with that chapter before reading the details of the proofs here. We consider
independent standard normal r.v.s (gi,k )i,k≥1 and for σ ∈ ΣN we define as
usual
1 
Sk = Sk (σ) = √ gi,k σi . (5.1)
N i≤N

We consider a function u : R2 → R. We assume that it is symmetric,

u(x, y) = u(y, x) (5.2)

and, given an integer M , we consider the Hamiltonian


1 
− HN,M (σ) = u(Sk1 , Sk2 ) . (5.3)
N
1≤k1 <k2 ≤M

The name of the model is motivated by the fact that the right-hand side of
(5.3) resembles an estimator known as a V -statistics. However no knowledge
about these seems relevant for the present chapter. The case of interest is
when M is a proportion of N . Then HN,M is of order N . As in Chapter
2, we will be interested only in the “algebraic” structure connected with
the Hamiltonian (5.3), so we will decrease technicalities by making a strong
assumption on u. We assume that for a certain number D,

u and all its partial derivatives of order ≤ 6 are bounded by D. (5.4)


M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 297
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 5, © Springer-Verlag Berlin Heidelberg 2011
298 5. The V -statistics Model

How, and to which extent, this condition can be relaxed is an open problem,
although one could expect that techniques similar to those presented later in
Chapter 9 should bear on this question. We assume

D≥1. (5.5)

Let us first try to describe what happens at a global level. In the high-
temperature regime, we expect to have the usual relation

R1,2  q (5.6)

where R1,2 = N −1 i≤N σi1 σi2 , and where the number q depends on the
system.
Throughout the chapter we use the notation
∂u
w= . (5.7)
∂x
Thus the symmetry condition (5.2) implies

∂u ∂u
w(x, y) = (x, y) = (y, x) . (5.8)
∂x ∂y

The relation (5.6) has to be complemented by the relation


1 
w(Sk11 , Sk12 )w(Sk21 , Sk23 )  r , (5.9)
4N 3
k1 ,k2 ,k3 ≤M

where as usual Sk = Sk (σ  ). The new and unexpected feature is that the
computation of r seems to require the use of an auxiliary function γ(x).
Intuitively this function γ satisfies

1  M
γ(x)  E u(x, Sk ) = E u(x, SM ) ,
N N
k≤M

where of course the bracket denotes an average for the Gibbs measure with
Hamiltonian (5.3). The reason behind the occurrence of this function is the
“cavity in M ” argument. Going from M − 1 to M we add the term
1 
u(Sk , SM )
N
k<M

to the Hamiltonian, and we will prove that in fact this term acts somewhat as
γ(SM ). The function γ will be determined through a self-consistency equation
and will in turn allow the computation of r. In the present model, the “replica
symmetric equations” are a system of there equations with three unknowns,
one of which is the function γ.
5.2 The Smart Path 299

5.2 The Smart Path

We use the same interpolation as in Chapter 2. We consider independent


standard Gaussian r.v.s (ξk )k≤M and, as in (2.15), we consider
 
1  t 1−t
Sk,t = √ gi,k σi + gN,k σN + ξk , (5.10)
N N N
i<N

and the Hamiltonian


1  √
− HN,M,t = u(Sk1 ,t , Sk2 ,t ) + σN 1 − tY , (5.11)
N
1≤k1 <k2 ≤M

where Y is a Gaussian r.v. independent of any other randomness, and where

r = EY 2 (5.12)

will be determined later.


Consider independent copies ξ  of ξ = (ξk )k≤M . We recall that as usual,
Eξ denotes expectation in all the r.v.s ξk . For a function

f = f (σ 1 , . . . , σ n , ξ 1 , . . . , ξ n )

we define f t by the formula (2.19), i.e.


   
1
f t = Eξ f (σ 1 , . . . , σ n , ξ 1 , . . . , ξ n ) exp − Ht , (5.13)
Ztn
σ 1 ,...,σ n ≤n


where Zt = Eξ σ exp(−Ht (σ)) and Ht = Ht,N,M (σ  , ξ  ). We write

d
νt (f ) = E f t ; νt (f ) = νt (f ) ,
dt

and, as usual, ε = σN . We also recall that ν = ν1 .
This interpolation is designed to decouple the last spin. The following is
proved as Lemma 1.6.2.

Lemma 5.2.1. For a function f − on ΣNn


−1 , and a subset I of {1, . . . , n} we
have     
ν0 f − ε = E(thY )cardI ν0 (f − ) = ν0 ε ν0 (f − ) .
∈I ∈I

Throughout this chapter, we write α = M/N . We recall that r = EY 2 , and



that w = ∂u/∂x. As usual we use the notation ε = σN .
300 5. The V -statistics Model

n
Proposition 5.2.2. For a function f on ΣN , we have

νt (f ) = I + II , (5.14)

where, defining
1     
A, = 3
νt ε ε w(Sk 1 ,t , Sk 2 ,t )w(Sk 1 ,t , Sk 3 ,t )f (5.15)
N
for a summation over k1 , k2 , k3 ≤ M , k2 = k1 , k3 = k1 , we have
  n(n + 1)
I= A, − n A,n+1 + An+1,n+2 (5.16)
2
1≤< ≤n ≤n

and
   
n(n + 1)
II = −r νt (ε ε f ) − n νt (ε εn+1 f ) + νt (εn+1 εn+2 f ) .
2
1≤< ≤n ≤n
(5.17)

Proof. Let us write


∂  1 1

Sk,t = S = √ gN,k ε − ξk .
∂t k,t 2 tN 2 (1 − t)N

 
The reader should carefully distinguish between Sk,t and Sk,t (the position
of the  is not the same). From (5.8), we get

∂u
(x, y) = w(y, x) ,
∂y

and the relations (5.10) and (5.11) imply

d 1   
(−HN,M,t (σ  )) = Sk 1 ,t w(Sk 1 ,t , Sk 2 ,t ) + Sk 2 ,t w(Sk 2 ,t , Sk 1 ,t )
dt N
1≤k1 <k2 ≤M
1
− √ ε Y
2 1−t
1   1
= Sk1 ,t w(Sk 1 ,t , Sk 2 ,t ) − √ ε Y .
N
k =k
2 1−t
1 2

Therefore, differentiation of the formula (5.13) yields

νt (f ) = III + IV , (5.18)

where 
III = νt (D f ) − nνt (Dn+1 f )
≤n
5.2 The Smart Path 301

for
1  
D = Sk1 ,t w(Sk 1 ,t , Sk 2 ,t )
N
k1 =k2

and where
 
1
IV = − √ νt (ε Y f ) − nνt (εn+1 Y f ) .
2 1 − t ≤n

It remains of course to integrate by parts. The integration by parts in Y in


the term IV has been done many times and IV = II. Concerning the term
III, we have explained in great detail a similar case in Chapter 2. We think
 
of the r.v.s Sk,t , Sk,t as  and k vary as a jointly Gaussian family of r.v.s. The
relations

ESk,t Sk  ,t = 0

imply that, when integrating by parts in the r.v.s Sk,t , only the dependence
of the Hamiltonian on the randomness creates terms (but not the randomness
of w(Sk 1 ,t , Sk 2 ,t )). For  = , the relations

 1 

ESk,t 
Sk,t = ε ε ; 
ESk,t Sk  ,t = 0 if k = k
2N
hold and (with the usual abuse of notation)
 
∂HN,M,t 1  
= w(Sk,t , Sk2 ,t ) + w(Sk,t , Sk1 ,t )
∂Sk,t N
k<k2 k1 <k
1 
= w(Sk,t , Sk ,t ) .
N 
k =k

Then the result follows by carrying out the computation as in Chapter 2,


following the method outlined in Exercise 2.3.3. 

We recall that α = M/N .

Corollary 5.2.3. Assume that D2 α3 ≤ 1 and |r| ≤ 1. Then for any function
f ≥ 0 on ΣN
n
we have
2
νt (f ) ≤ Ln ν(f ) . (5.19)

Of course the conditions D2 α3 ≤ 1 and |r| ≤ 1 are simply convenient


choices and do not have any intrinsic meaning.
Proof. It follows from (5.14) that |νt (f )| ≤ 2n2 νt (f ), and we integrate. 

302 5. The V -statistics Model

5.3 Cavity in M

We would like, with the appropriate choice of r (i.e. if (5.9) holds), that the
terms I and II of (5.14) nearly cancel out. So we need to make sense of the
term A, . To lighten notation we assume  = 1,  = 2.
In the summation (5.15) there are at most M 2 terms for which k2 = k3 .
Defining
1   
A1,2 = νt ε1 ε2 w(Sk11 ,t , Sk13 ,t )w(Sk21 ,t , Sk22 ,t )f
N3
k1 ,k2 ,k3 all different

(and keeping the dependence on t implicit) we get

KM 2 K K
|A1,2 − A1,2 | ≤ νt (|f |) = α2 νt (|f |) ≤ νt (|f |) , (5.20)
N3 N N
where K is a number depending only on D and α. Each triplet (k1 , k2 , k3 )
brings the same contribution to A1,2 , so that

M (M − 1)(M − 2)  1 1 2 2

A1,2 = 3
νt ε1 ε2 w(SM,t , SM −1,t )w(SM,t , SM −2,t )f .
N
Therefore, defining
 1 1 2 2

C1,2 = νt ε1 ε2 w(SM,t , SM −1,t )w(SM,t , SM −2,t )f , (5.21)

we have
M (M − 1)(M − 2)
A1,2 = C1,2 ,
N3
so that
K
|A1,2 − α3 C1,2 | ≤ .
N
Combining with (5.20) we reach that

K
|A1,2 − α3 C1,2 | ≤ νt (|f |) . (5.22)
N
To estimate C1,2 it seems a good idea to make explicit the dependence of the
Hamiltonian on SM,t , SM −1,t and SM −2,t . Defining

1  √
− HN,M −3,t = u(Sk1 ,t , Sk2 ,t ) + 1 − tσN Y , (5.23)
N
1≤k1 <k2 ≤M −3

it holds that
− HN,M,t = −HN,M −3,t − H , (5.24)
where
5.3 Cavity in M 303

1 
−H = u(Sk1 ,t , Sk2 ,t ) . (5.25)
N
1≤k1 <k2 ≤M,k2 ≥M −2

When in this formula we replace σ by σ  and ξ by ξ  we denote the result


by −H  , and when we do the same for HN,M −3,t we denote the result by

HN,M −3,t .
Let us denote by · t,∼ an average for the Hamiltonian HN,M −3,t , in the
sense of (5.13). That is, for a function f = f (σ 1 , . . . , σ n , ξ 1 , . . . , ξ n ) we define
f t,∼ by the formula
  
1 − 
f t,∼ = n Eξ f (σ , . . . , σ , ξ , . . . , ξ ) exp −
1 n 1 n 
HN,M −3,t ,
Zt,∼ 1 nσ ,...,σ ≤n

 (5.26)
where Zt,∼ is the normalization factor, Zt,∼ = E− ξ σ exp(−H N,M −3,t (σ))

and where Eξ denotes expectation in the r.v.s ξk for  ≥ 1 and k ≤ M − 3.


Let us then define  


E = exp −H .
(5.27)
≤n

−j,t for j = 0, 1, 2 and  ≤ n,


1  n
Then for a function h of σ , . . . , σ and of SM
the identity
Eξ hE t,∼
ht= (5.28)
Eξ E t,∼
holds, where, as usual, Eξ denotes expectation in all the r.v.s “labeled ξ”. Here
hE t,∼ and E t,∼ depend only on the r.v.s ξk for k = M − 1, M − 2, M − 3.
Exercise 5.3.1. Rather than (5.26), let us define
   
1
f t,∼ = n Eξ f (σ , . . . , σ , ξ , . . . , ξ ) exp −
1 n 1 n 
HN,M −3,t ,
Zt,∼
σ 1 ,...,σ n ≤n
(5.29)
where Eξ denotes expectation in all the r.v.s ξk . Show that then that rather
than (5.27) we have
hE t,∼
ht= . (5.30)
E t,∼
Of course, (5.28) and (5.30) are simply two different manners to write the
same identity.
The convention used in (5.29) (i.e. that Eξ stands for expectation in all the
r.v.s. ξ) was used in Chapter 2. It is somewhat more natural than the conven-
tion used in (5.28). As in Chapter 3 we shall not use it here, to avoid having
to constantly remind the reader of it.

Our best guess is that the quantities Sk,t ,  ≤ n, k = M, M − 1, M − 2
will have a jointly Gaussian behavior when seen as functions on the system
with Hamiltonian (5.23). For different values of k they will be independent,
304 5. The V -statistics Model

and for the same value of k their pairwise correlation will be a new parameter
q. So we fix 0 ≤ q ≤ 1 (which will be determined later) and for j = 0, 1, 2,
 ≤ n, we consider independent standard Gaussian r.v.s zj and ξˆj (that are
independent of all the other sources of randomness) and we set

θj = zj q + ξˆj 1−q .

For 0 ≤ v ≤ 1 we define
√ √

Sj,v = 
vSM −j,t + 1 − vθj , (5.31)

keeping the dependence of Sj,v on t implicit. (The reader will observe that,
despite the similarity of notation, it is in practice impossible to confuse the
quantity Sk,t with the quantity Sj,v . Here again we choose a bit of informality
over heavy notation.) Let us denote by

Ev the quantity (5.27) when one replaces each occurrence of


−j,t by Sj,v for  ≤ n and j = 0, 1, 2 .
 
SM (5.32)

For any function h of σ 1 , . . . , σ n and of Sj,v



for j = 0, 1, 2 and  ≤ n, we
define h t,∼ by (5.26) and

hEv t,∼
νt,v (h) = E , (5.33)
Eξ Ev t,∼

where (following our usual convention) Eξ now denotes expectation in the


variables ξk and the r.v.s ξˆj . Therefore, if h depends on σ 1 , . . . , σ n only,
taking expectation in (5.28) yields

νt,1 (h) = νt (h) . (5.34)

Lemma 5.3.2. Consider a function f depending on σ 1 , . . . , σ n only. Then


we have
 
d   
 νt,v (f ) ≤ Ln2 α2 D2 νt,v (f 2 )1/2 νt,v (R1,2 − q)2 1/2 + K νt,v (|f |) . (5.35)
 dv  N

Moreover, if αD ≤ 1 and
1 1 2 2
Bv = w(S0,v , S1,v )w(S0,v , S2,v ), (5.36)

we have
 
d   
 νt,v (Bv f ) ≤ Ln2 D2 νt,v (f 2 )1/2 νt,v (R1,2 − q)2 1/2 + K νt,v (|f |) . (5.37)
 dv  N
5.3 Cavity in M 305

Proof. It is as in Lemma 2.3.2. We compute the derivatives of the function


v → νt,v (f ) (resp. v → νt,v (Bv f )) and we integrate by parts. Defining Sj,v

=

dSj,v /dv, we observe the relations (where the reader will carefully distinguish
 
Sj,v from Sj,v )


ESj,v Sj ,v = 0 if j = j
 
ESj,v Sj,v =0
 
 1 1    t   1 t
 =  ⇒ ESj,v
 
Sj,v = σi σi + σN σN − q = (R,  − q) .
2 N N 2
i<N

 − q in each term”. There


t
So, integration by parts “creates a factor R,
are 3M − 6 ≤ 3M terms in the expression (5.25). Each has a factor 1/N .
Before integration by parts, the expression for the derivative of the function
v → νt,v (f ) contains ≤ LnM terms. Each of these terms uses n + 1 replicas,
and its integration by parts creates ≤ LnM terms. All told, there are at most
Ln2 M 2 terms in the expression for dνt,v (f )/dv, and each of them is bounded
by a term of the type
D2
νt,v (|f ||R,
t
 − q|)
N2
for certain values of  and  ,  =  . So the bound (5.35) simply follows from
the Cauchy-Schwarz inequality. We proceed similarly in the case of (5.36).
The reason why we cannot get a factor α in (5.37) is that when computing
1 1 1 2 2
dνt,v (Bv f )/dv we find the term νt,v (S0,v w (S0,v , S1,v ))w(S0,v , S2,v )f ) where
2 2
w (x, y) = ∂w(x, y)/∂x = ∂ u(x, y)/∂x . When integrating by parts, this
creates a term
t
νt,v ((R1,2 − q)w (S0,v
1 1
, S1,v 2
))w (S0,v 2
, S2,v )f ) ,

and the best we can do is to bound this term by D2 νt,v (|f ||R1,2
t
− q|). 

2
The factor α in (5.35) is not really needed for the rest of the proof. There
is a lot of room in the arguments. However it occurs so effortlessly that we
see no reason to omit it. This might puzzle the reader.

Lemma 5.3.3. If αD ≤ 1, we have

νt,v (|h|) ≤ Ln νt (|h|) ; E |h| t,∼ ≤ Ln νt (|h|) . (5.38)

Proof. The quantity −H of (5.25) satisfies | − H| ≤ 3αD ≤ 3 (bounding


each term u(Sk1 ,t , Sk2 ,t ) by D) so that the quantity Ev of (5.36) satisfies
L−n ≤ Ev ≤ Ln . Thus (5.33) (used for |h| rather than h) implies νt,v (|h|) ≤
Ln E |hEv | t,∼ ≤ L2n E |h| t,∼ . Using again (5.33) in the case v = 1 we get
E |h| t,∼ ≤ Ln νt,1 (|h|) = Ln νt (|h|), using (5.34) in the equality. 


Exercise 5.3.4. Prove the first part of (5.38) using a differential inequality.
306 5. The V -statistics Model

5.4 The New Equation


The purpose of the interpolation of the previous section is that we expect
that νt,0 (B0 f ) will be easier to understand than νt,1 (B1 f ) = νt (B1 f ). This is
the case, but the (un)pleasant surprise is that it will still require significant
work to understand νt,0 (B0 f ). Let us consider the random function
1 

γ (x) = u(Sk,t , x) ,
N
k≤M −3

where the dependence on t is kept implicit in the left-hand side. Then, for
v = 0, the quantity Ev of (5.32) is equal to
E0 = E E (5.39)
for
 
E = exp γ (θj ) (5.40)
≤n j=0,1,2
1 
E = exp (u(θ0 , θ2 ) + u(θ0 , θ1 ) + u(θ1 , θ2 )) . (5.41)
N
≤n

This is seen by separating in the sum (5.25) the terms for which k1 = M − 2
or k1 = M − 1 (these create E ).
Of course the influence of E will be very small; but to understand the
influence of E , we must understand the function γ . We explain first the
heuristics.

We hope that the quantities (Sk,t )k≤M −3 behave roughly like independent
r.v.s under the averages · t,∼ , so that by the law of large numbers, we should
have that for each ,
M −3
γ (x)  E u(S1,t , x) t,∼  αE u(S1,t , x) t,∼ . (5.42)
N
This shows that (in the limit N → ∞) γ does not depend on  and is not
random. We denote by γ this non-random function, and we now look for the
relation it should satisfy. It seems very plausible that
E u(S1,t , x) t,∼  νt (u(S1,t , x)) = νt (u(SM,t , x)) (5.43)
by symmetry. We expect that (5.37) still holds, with a similar proof, if we
define now Bv = u(Sv , x). Assuming that as expected R1,2  q, we should
have νt (B1 )  νt,0 (B0 ) i.e. (with obvious notation: since there is only one
replica, we no longer need replica indices)
νt (u(SM,t , x))  νt,0 (u(θ0 , x))

u(θ0 , x) exp 0≤j≤2 γ(θj )
E  . (5.44)
Eξ exp 0≤j≤2 γ(θj )
5.4 The New Equation 307

Now, using independence


 
Eξ u(θ0 , x) exp γ(θj ) = Eξ u(θ0 , x) exp γ(θ0 ) Eξ exp γ(θj )
0≤j≤2 j=1,2
 
Eξ exp γ(θj ) = Eξ exp γ(θj )
0≤j≤2 j=0,1,2

so that from (5.44) we get

u(θ0 , x) exp γ(θ0 )


νt (u(SM,t , x))  E . (5.45)
Eξ exp γ(θ0 )

Exercise 5.4.1. Find a more economical interpolation to reach (5.45). (Hint:


in (5.25) replace M − 2 by M .)

Let us now write



θ= qz + 1 − qξ ,
where z and q are independent standard Gaussian r.v.s, and repeat that Eξ
denotes expectation in ξ only. Combining the previous chain of equations
(5.42) to (5.45) we reach that the non-random function γ should satisfy the
relation
u(θ, x) exp γ(θ)
γ(x) = αE . (5.46)
Eξ exp γ(θ)
The first task is to prove that this functional equation has a solution.

Lemma 5.4.2. If LDα ≤ 1, given any value of q then there exists a unique
function γ = γα,q from R to [0, 1] that satisfies (5.46). Moreover, given any
other function γ∗ from R to [0, 1] we have
 
 u(θ, y) exp γ∗ (θ) 
sup |γ(x) − γ∗ (x)| ≤ 2 sup γ(y) − αE . (5.47)
x y Eξ exp γ∗ (θ) 

We remind the reader that throughout the book a statement such as “If
LDα ≤ 1...” is a short-hand for “There exists a universal constant L with
the following property. If LDα ≤ 1...”
Proof. This is of course a “contraction argument”. Consider the supremum
norm  · ∞ on the space C of functions from R to [−1, 1]. Consider the
operator U that associates to a function ψ ∈ C the function U (ψ) given by

u(θ, x) exp ψ(θ)


U (ψ)(x) = αE .
Eξ exp ψ(θ)

Since 1/e ≤ exp ψ(θ) ≤ e and |u(θ, x)| ≤ D we have |U (ψ)(x)| ≤ αDe2 , so if
αDe2 ≤ 1 we have U (ψ) ∈ C. Consider ψ1 , ψ2 ∈ C and ϕ(t) = U (tψ1 + (1 −
t)ψ2 ) ∈ C, so that, writing Et = exp(tψ1 (θ) + (1 − t)ψ2 (θ)), we get
308 5. The V -statistics Model

u(θ, x)(ψ1 (θ) − ψ2 (θ))Et


ϕ (t) = αE
Eξ Et
u(θ, x)Et Eξ (ψ1 (θ) − ψ2 (θ))Et
− αE ,
(Eξ Et )2

and since tψ1 + (1 − t)ψ2 ∞ ≤ 1, we have 1/e ≤ Et ≤ e and ϕ (t)∞ ≤


L0 αDψ1 − ψ2 ∞ . Therefore ϕ(1) − ϕ(0)∞ ≤ L0 αDψ1 − ψ2 ∞ , i.e.

U (ψ1 ) − U (ψ2 )∞ ≤ L0 αDψ1 − ψ2 ∞ . (5.48)

Thus for 2L0 αD ≤ 1, the map U is a contraction of C and thus it has a


unique fixed point γ.
We turn to the proof of (5.47). We write that, since γ = U (γ), for any
function γ ∗ ∈ C, we have, when 2L0 αD ≤ 1, and using (5.48),

γ∗ − γ∞ ≤ γ∗ − U (γ∗ )∞ + U (γ∗ ) − U (γ)∞


1
≤ γ∗ − U (γ∗ )∞ + γ∗ − γ∞ .
2
Therefore
1
γ∗ − γ∞ ≤ U (γ∗ ) − U (γ)∞ ,
2
which is (5.47). 


Theorem 5.4.3. Consider any value of 0 ≤ q ≤ 1, and γ as provided by


Lemma 5.4.2. Then assuming

LαD ≤ 1 (5.49)

we have
) 2 *
1  K
∀x , E u(Sk,t , x) − γ(x) ≤ L(αD)2 E (R1,2 −q)2 t,∼ + .
N N
k≤M −3 t,∼
(5.50)

Proof. Let us define


1  1 
A(x) = u(Sk,t , x) − γ(x) ; A∗ (x) = u(Sk,t , x) − γ(x) ,
N N
k≤M −3 k≤M −4
(5.51)
so that
K
|A(x) − A∗ (x)| ≤
. (5.52)
N
Using symmetry between the values of k in the first line and (5.52) in the
second line yields
5.4 The New Equation 309
 
M −3
E A(x)2 t,∼ =E u(SM −3,t , x) − γ(x) A(x)
N t,∼
K
≤ + E (αu(SM −3,t , x) − γ(x))A∗ (x) t,∼ . (5.53)
N
We are again in a “cavity in M ” situation. We need to make explicit the influ-
ence of the term SM −3,t in the Hamiltonian. So we introduce the Hamiltonian
HN,M −4,t as in (5.23) and we denote by · ∗ an average for this Hamiltonian,
keeping the dependence on t implicit. Thus, for a function h of σ and of the
quantities Sk,t , k ≤ M − 3, the formula

hE∗ ∗
Eh t,∼ =E (5.54)
Eξ E∗ ∗

holds, where
1 
E∗ = exp u(Sk,t , SM −3,t ) , (5.55)
N
k≤M −4

and where Eξ denotes expectation in the variables ξk . Again, we must devise a
cavity argument with the underlying belief that SM −3,t will have a Gaussian
behavior. So, considering independent standard Gaussian r.v.s z and ξ, ˆ and
defining

θ = z q + ξˆ 1 − q ,
√ √
for 0 ≤ v ≤ 1 we set Sv = vSM −3,t + 1 − vθ. We consider the function
  
(αu(Sv , x) − γ(x))A∗ (x) exp N1 k≤M −4 u(Sk,t , Sv ) ∗
ψ(v) = E    , (5.56)
Eξ exp N1 k≤M −4 u(Sk,t , Sv ) ∗

ˆ This is exactly the same


where Eξ denotes expectation in the r.v.s ξk and ξ.
procedure we used in (5.33). Thus the relations (5.54) and (5.55) imply

ψ(1) = E (αu(SM −3,t , x) − γ(x))A∗ (x) t,∼ . (5.57)

We will bound |ψ (v)| (as in Lemma 5.3.2) but let us first look at ψ(0).
Defining
1 
B(x) = u(Sk,t , x) = A∗ (x) + γ(x) , (5.58)
N
k≤M −4

we have
(αu(θ, x) − γ(x))A∗ (x) exp B(θ) ∗
ψ(0) = E .
Eξ exp B(θ) ∗
Since we are following the pattern of Section 5.3, it should not come as a
surprise that the value of ψ(0) is not completely trivial to estimate; but a
last interpolation will suffice. For 0 ≤ s ≤ 1 we consider
310 5. The V -statistics Model
 
(αu(θ, x) − γ(x))A∗ (x) exp(sB(θ) + (1 − s)γ(θ)) ∗
ψ∗ (s) = E . (5.59)
Eξ exp(sB(θ) + (1 − s)γ(θ)) ∗

Thus ψ∗ (1) = ψ(0). Using independence and recalling (5.46) yields


 
(αu(θ, x) − γ(x))A∗ (x) exp γ(θ) ∗
ψ∗ (0) = E
Eξ exp γ(θ) ∗
(αu(θ, x) − γ(x)) exp γ(t)
= E A∗ (x) ∗ E =0.
Eξ exp γ(x)

We compute ψ∗ (s) in a straightforward manner, observing that

d
(sB(θ) + (1 − s)γ(θ)) = B(θ) − γ(θ) = A∗ (θ) .
ds
Writing Es = exp(sB(θ) + (1 − s)γ(θ)), we find

(αu(θ, x) − γ(x))A∗ (x)A∗ (θ)Es ∗


ψ∗ (s) = E
Eξ Es ∗
(αu(θ, x) − γ(x))A∗ (x)Es ∗ Eξ A∗ (θ)Es ∗
−E .
(Eξ Es ∗ )2

To bound |ψ∗ (s)|, believe it or not, no integration by parts is required! First


we observe that since |B(x)|, |γ(x)| ≤ LαD, we have 1/L ≤ Es ≤ L. Also,

|αu(θ, x) − γ(x)| ≤ LαD . (5.60)

Using the Cauchy-Schwarz inequality it is then straightforward to get the


bound
1/2 1/2
|ψ∗ (s)| ≤ LαDE A∗ (x)2 ∗ E A∗ (θ)2 ∗ . (5.61)
Since ψ∗ (0) = 0 it follows that
1/2 1/2
ψ∗ (1) = ψ(0) ≤ LαDE A∗ (x)2 ∗ E A∗ (θ)2 ∗ . (5.62)

To bound |ψ (v)| we proceed as in Lemma 5.3.2. We compute ψ (v) through


differentiation and integration by parts, and this integration by parts “creates
t
a factor R,  in each term”. Using the Cauchy-Schwarz inequality and (5.60)

we then get

1/2 1/2 K
|ψ (v)| ≤ LαDE A∗ (x)2 ∗ E (R1,2 − q)2 ∗ + ,
N
so that
1/2 1/2
ψ(1) ≤ LαDE A∗ (x)2 ∗ E A∗ (θ)2 ∗
1/2 1/2 K
+ LαDE A∗ (x)2 ∗ E (R1,2 − q)2 ∗ + . (5.63)
N
5.4 The New Equation 311

For L αD ≤ 1 we have
1/2 1/2 1 1
LαDE A∗ (x)2 ∗ E A∗ (θ)2 ∗ ≤ E A∗ (x)2 ∗ + E A∗ (θ)2 ∗ ,
16 16
and the inequality ab ≤ a2 /t + tb2 for t = LαD implies
1/2 1/2 1
LαDE A∗ (x)2 ∗ E (R1,2 −q)2 ∗ ≤ E A∗ (x)2 ∗ +L(αD)2 E (R1,2 −q)2 ∗ .
16
Combining with (5.63) we get
K 1 1
ψ(1) ≤ + E A∗ (x)2 ∗ + E A∗ (θ)2 ∗ + L(αD)2 E (R1,2 − q)2 ∗ .
N 8 16
Combining with (5.53) and (5.57) we then obtain
K 1 1
E A(x)2 t,∼ ≤ + E A∗ (x)2 ∗ + E A∗ (θ)2 ∗
N 8 16
+ L(αD)2 E (R1,2 − q)2 ∗ . (5.64)
Now since |A(x) − A∗ (x)| ≤ K/N we have A∗ (x)2 ≤ A(x)2 + K/N and thus
K
E A∗ (θ)2 ∗ ≤ E A(θ)2 ∗ + .
N
In the quantity E A(θ)2 ∗ , the r.v. θ is independent of the randomness of
· ∗ . So, denoting by E∗ expectation in the randomness of this bracket only,
we have
E∗ A(θ)2 ∗ ≤ sup E∗ A(y)2 ∗ = sup E A(y)2 ∗ ,
y y

and thus, taking expectation,


E A(θ)2 ∗ ≤ sup E A(y)2 ∗ . (5.65)
y

Moreover, as in Lemma 5.3.3, if LαD ≤ 1 we have E |h| ∗ ≤ 2E |h| t,∼ .


Combining these relations we then get from (5.64) that for any x,
K 1 1
E A(x)2 t,∼ ≤ + E A(x)2 t,∼ + sup E A(y)2 t,∼
N 4 4 y
+ L(αD)2 E (R1,2 − q)2 t,∼

and thus
3 1 K
sup E A(x)2 t,∼ ≤ sup E A(y)2 t,∼ + + L(αD)2 E (R1,2 − q)2 t,∼ .
4 x 4 y N
Therefore we get
K
sup E A(x)2 t,∼ ≤ L(αD)2 E (R1,2 − q)2 t,∼ + , (5.66)
x N
and recalling the definition (5.51) of A(x) this is exactly (5.50). 

312 5. The V -statistics Model

5.5 The Replica-Symmetric Solution

Now that we have proved Theorem 5.4.3, we can go back to the study of
the quantity νt,0 (B0 f ) of Section 5.3. Given 0 ≤ q ≤ 1, and the function γ
provided by Lemma 5.4.2, we define
 2
Eξ γ (θ) exp γ(θ)
r∗ = E . (5.67)
Eξ exp γ(θ)

Differentiating in x the relation (5.46) yields

w(x, θ) exp γ(θ)


γ (x) = αE , (5.68)
Eξ exp γ(θ)

and thus |γ (x)| ≤ LαD so that |r∗ | ≤ L when αD ≤ 1.

Proposition 5.5.1. Assume LαD ≤ 1. Then with the notation of Lemma


5.3.2, we have

1/2 K
|νt,0 (f )− f t,∼ | ≤ Ln αD(E f 2 t,∼ )
1/2
(E (R1,2 −q)2 t,∼ ) + (E f2 t,∼ )
1/2
.
N
(5.69)
When Bv is given by (5.36) we have

|α2 νt,0 (B0 f ) − r∗ f t,∼ |


K
≤ Ln αD(E f 2 t,∼ )
1/2
(E (R1,2 − q)2 t,∼ )
1/2
+ (E f 2 t,∼ )
1/2
. (5.70)
N
Proof. We prove (5.70). For 0 ≤ s ≤ 1 we define, recalling the notation
(5.32),
  
E(s) = E0 exp(1 − s)
s 
γ(θj )
j=0,1,2, ≤n

   1  
= exp γ(θj ) +s 
u(Sk,t , θj ) − γ(θj )
N
j=0,1,2, ≤n j=0,1,2
≤n k≤M −3

s    
+ u(θ0 , θ2 ) + u(θ0 , θ1 ) + u(θ1 , θ2 ) , (5.71)
N
≤n

and we consider
B0 f E(s) t,∼
ψ(s) = α2 E . (5.72)
Eξ E(s) t,∼
The fundamental formula (5.33) shows that ψ(1) = α2 νt,0 (B0 f ). As expected
we will compute ψ(0) and bound ψ (s). Using that B0 = w(θ01 , θ11 )w(θ02 , θ22 )
we get, by independence of θj and of the randomness · t,∼
5.5 The Replica-Symmetric Solution 313

w(θ01 , θ11 )w(θ02 , θ22 ) exp j=0,1,2 ≤n γ(θj )
ψ(0) = α2 E  Ef t,∼ . (5.73)
Eξ exp j=0,1,2, ≤n γ(θj )

Using independence between the r.v.s ξˆj we obtain


 
Eξ exp γ(θj ) = Eξ exp γ(θj )
j=0,1,2, ≤n j=0,1,2, ≤n

and

Eξ w(θ01 , θ11 )w(θ02 , θ22 ) exp γ(θj )
j=0,1,2, ≤n

= Eξ w(θ01 , θ11 ) exp(γ(θ01 )


+ γ(θ11 ))Eξ w(θ02 , θ22 ) exp(γ(θ02 ) + γ(θ22 ))
 
×Eξ exp γ(θ21 )Eξ exp γ(θ12 ) Eξ exp γ(θj ) .
3≤≤n j=0,1,2

Therefore
ψ(0) = EU1 U2 E f t,∼ , (5.74)
where
Eξ w(θ01 , θ11 ) exp(γ(θ01 ) + γ(θ11 ))
U1 = α
Eξ exp γ(θ01 )Eξ exp γ(θ11 )
Eξ w(θ02 , θ22 ) exp(γ(θ02 ) + γ(θ22 ))
U2 = α .
Eξ exp γ(θ02 )Eξ exp γ(θ22 )
√ √
Let us now recall that θj = zj q + ξˆj 1 − q, where the Gaussian r.v.s
zj , ξˆj are all independent of each other. Let us denote by Ej expectation in
zj only, and E,j expectation in ξˆj only. Then

EU1 U2 = E(E1 U1 )(E2 U2 )

and
E1,0 E1,1 w(θ01 , θ11 ) exp γ(θ01 ) exp γ(θ11 )
E1 U1 = αE1
E1,0 exp γ(θ01 )E1,1 exp(θ11 )
 
exp γ(θ01 ) w(θ01 , θ11 ) exp γ(θ11 )
= αE1 E1,0 E1,1
E1,0 exp γ(θ01 ) E1,1 exp(θ11 )
 1

exp γ(θ0 ) w(θ01 , θ11 ) exp γ(θ11 )
= αE1,0 E1 E1,1 .
E1,0 exp γ(θ01 ) E1,1 exp γ(θ11 )

Now, using (5.68)

w(θ01 , θ11 ) exp γ(θ11 )


αE1 E1,1 = γ (θ01 ) ,
E1,1 exp γ(θ11 )
314 5. The V -statistics Model

so that
E1,0 γ (θ01 ) exp γ(θ01 )
E1 U1 = .
E1,0 exp γ(θ01 )
In a similar manner,
E2,0 γ (θ02 ) exp γ(θ02 )
E2 U2 = ,
E2,0 exp γ(θ02 )
so that
Eξ γ (θ) exp γ(θ)
E1 U1 = E2 U2 = ,
Eξ exp γ(θ)
and thus EU1 U2 = E(E1 U1 )2 = r∗ by (5.67). Thus we have shown that
ψ(0) = r∗ E f t,∼ .
To bound ψ (s), we proceed very much as in the proof of (5.61). We define
1 
A (x) = 
u(Sk,t , x) − γ(x) .
N
k≤M −3

Comparing with (5.51) yields


E A (θj )2 t,∼ = E A(θ)2 t,∼ . (5.75)
We observe the relation
  
E (s) = A 
(θj ) + C E(s) ,
≤n j=0,1,2
  
where C = N −1 ≤n u(θ0 , θ2 ) + u(θ0 , θ1 ) + u(θ1 , θ2 ) , so that |C| ≤ K/N .
We observe that exp(−Ln) ≤ E(s) ≤ exp(Ln), since αD ≤ 1. We differentiate
the formula (5.72), we use the Cauchy-Schwarz inequality and that α2 |B0 | ≤ 1
(since |B0 | ≤ D and αD ≤ 1) to obtain, using (5.75):
K
|ψ (s)| ≤ Ln (E f 2 t,∼ )
1/2
(E A(θ)2 t,∼ )
1/2
+ E |f | t,∼ . (5.76)
N
The random variable θ is independent of the randomness of · t,∼ , and there-
fore as in (5.65) we have
E A(θ)2 t,∼ ≤ sup E A(y)2 t,∼ .
y

Combining with (5.76) and (5.66) we get


K
|ψ (s)| ≤ Ln αD(E f 2 t,∼ )
1/2
(E (R1,2 − q)2 t,∼ )
1/2
+ (E f 2 t,∼ )
1/2
N
and this proves (5.70). The proof of (5.69) is similar but much simpler. 

We are finally ready to control the terms (5.15) in Proposition 5.2.2. We
consider only the case n = 2 for simplicity.
5.5 The Replica-Symmetric Solution 315

Proposition 5.5.2. Assume that LαD ≤ 1. Then we have


 1/2 K
|A1,2 − αr∗ νt (ε1 ε2 f )| ≤ Lα2 Dνt (f 2 )1/2 νt (R1,2 − q)2 + νt (f 2 )1/2 .
N
(5.77)
Proof. From (5.69) and (5.70) we get that, using Lemma 5.3.3 in the second
inequality,
|α2 νt,0 (B0 f ) − r∗ νt,0 (f )| ≤ LαD(E f 2 t,∼ )1/2 (E (R1,2 − q)2 t,∼ )
1/2

K
+ E f 2 2t,∼
N
 1/2
≤ LαDνt (f 2 )1/2 νt (R1,2 − q)2
K
+ νt (f 2 )1/2 . (5.78)
N
It follows from (5.37) and Lemma 5.3.3 again that
 1/2 K
|α2 νt,0 (B0 f )−α2 νt (B1 f )| ≤ Lα2 D2 νt (f 2 )1/2 νt (R1,2 −q)2 + νt (f 2 )1/2 .
N
Moreover from (5.35) we see that the quantity r∗ |νt,0 (f ) − νt (f )| satisfies the
same bound. Combining with (5.78) we obtain
 1/2 K
|α2 νt (B1 f ) − r∗ νt (f )| ≤ LαDνt (f 2 )νt (R1,2 − q)2 + νt (f 2 )1/2 .
N
Replacing f by ε1 ε2 f , and since νt (B1 ε1 ε2 f ) = C1,2 by (5.21), the result
follows from (5.22). 

2
Corollary 5.5.3. If f is a function on ΣN , if LαD ≤ 1, and if
r = αr∗ (5.79)
we have
 1/2 K
|νt (f )| ≤ Lα2 Dν(f 2 )1/2 ν (R1,2 − q)2 + ν(f 2 )1/2 . (5.80)
N
Proof. We combine (5.14) and (5.77). 

√ √
Theorem 5.5.4. If LαD ≤ 1, α ≤ 1, writing as usual θ = z q + ξ 1 − q,
the system of three equations (5.46),
 2
Eξ γ (θ) exp γ(θ)
r = αE (5.81)
Eξ exp γ(θ)

q = Eth2 z r (5.82)
with unknown (q, r, γ) has a unique solution and
  K
ν (R1,2 − q)2 ≤ . (5.83)
N
316 5. The V -statistics Model

Proof. First we will show that (5.46) and (5.81) define r as a continuous
function r(q) of q. Thinking of α as fixed once and for all, we denote by γq the
solution of (5.46). We will first show that the map q → γq ∈ C is continuous
when C is provided with the topology induced by the supremum norm  · .
Let us write θ = θq to make explicit the dependence on q. Let us fix q0 and
let us consider the function q → ψ(q) ∈ C given by

u(θq , y) exp γq0 (θq )


ψ(q)(y) = αE .
Eξ exp γq0 (θq )

It is straightforward to show that the function ψ is continuous, and by (5.81)


we have ψ(q0 ) = γq0 . It then follows from (5.47) used for γ = γq and γ∗ = γq0
that
γq − γq0  ≤ 2γq − ψ(q0 ) ,
and this shows that the function q → γq is continuous at q = q0 , and hence
everywhere. It follows from (5.68) that the map q → γq is continuous, and
this shows that r is a continuous function of q.
Therefore the map q → Eth2 z r(q) is continuous from [0, 1] to itself and
has a fixed point. This proves the existence of a solution to these equations,
and this solution is unique by (5.83). The rest of the proof follows from (5.80)
through our standard scheme of proof. Namely, we write
 
ν (R1,2 − q)2 = ν((ε1 ε2 − q)(R1,2 − q))
2
≤ + ν((ε1 ε2 − q)f ) , (5.84)
N
− √
where f = R1,2 − q. By Lemma 5.2.1 and since q = Eth2 z r = Eth2 Y we
have
ν0 ((ε1 ε2 − q)f ) = (Eth2 Y − q)ν0 (f ) = 0 ,
and using (5.80) for (ε1 ε2 − q)f we obtain
  K  1/2
ν (ε1 ε2 − q)f ≤ + Lα2 Dν(f 2 )1/2 ν (R1,2 − q)2
N
K  
≤ + Lα2 Dν (R1,2 − q)2 ,
N
so comparing with (5.84) yields
  K  
ν (R1,2 − q)2 ≤ + Lα2 Dν (R1,2 − q)2 , (5.85)
N
and this finishes the proof. 

The last result of this chapter deals with the computation of
1 
pN,M = E log exp(−HN,M (σ)) .
N σ
5.5 The Replica-Symmetric Solution 317

We will follow the method of the first proof of Theorem 2.4.2. We consider q
and r as in Theorem 5.5.4. We consider independent standard Gaussian r.v.s
z, (zk )k≤M , (zi )i≤N , (ξk )k≤M , we write
√ √ √
θk = zk q + ξk 1 − q ; Sk,s = sSk + 1 − sθk , (5.86)

and we consider the following interpolating Hamiltonian for 0 ≤ s ≤ 1:


1   √ √
− HN,M,s = u(Sk1 ,s , Sk2 ,s ) − σi 1 − szi r . (5.87)
N
1≤k1 <k2 ≤M i≤N

We define 
1
pN,M,s = E log Eξ exp(−HN,M,s ) .
N σ

An in the case of Theorem 2.4.2, this interpolation is designed to preserve the


replica-symmetric equations along the interpolation. The interesting twist is
that the computation of pN,M,0 is no longer trivial. It should be obvious that

pN,M,0 = log 2 + E log ch(z r) + p∗N,M , (5.88)

where   
1 1
p∗N,M = E log Eξ exp u(θk1 , θk2 ) , (5.89)
N N
1≤k1 <k2 ≤M

but how should one compute p∗N,M ?

Research Problem 5.5.5. (Level unknown) Consider q, α > 0, and the


√ zk , ξk denote independent standard Gaussian r.v.s,
function u. Recall that

that θk = zk q + ξk 1 − q, and that Eξ denotes expectation in the r.v.s ξk
only. Recalling (5.89), compute

lim p∗N,M .
N →∞ ,M/N →α

We do not assume in Problem 5.5.5 that α is small. When we write (5.89),


we think of the quantities ξk as “spins”, so there is no telling how difficult
this problem might be (although it could well be an exercise for an expert in
large deviation theory). In the present case however, we are concerned only
with the case LαD ≤ 1, and the result in this case is described as follows.

Proposition 5.5.6. There is a number L with the following property. As-


sume that D ≥ 1. For α ≤ 1/LD and q ∈ [0, 1], denote by γα,q the function
obtained by solving (5.46), and define
α
W (α, q) = E log Eξ exp γx,q dx , (5.90)
0
318 5. The V -statistics Model
√ √
where as usual θ = z q + ξ 1 − q, z and ξ are independent standard Gaus-
sian r.v.s, and Eξ denotes expectation in ξ only. Then if LDM ≤ N and
0 ≤ q ≤ 1 we have   
 ∗ 
pN,M − W M , q  ≤ √K . (5.91)
 N  N
The function W satisfies W (0, q) = 0 and
∂W
(α, q) = E log Eξ exp γα,q (θ) . (5.92)
∂α
Moreover
∂W r(α, q)
(α, q) = − (5.93)
∂q 2
where r(α, q) is given by (5.79) and (5.67) for γ = γα,q .

The following question is called an exercise rather than a Research Prob-


lem, because the solution might not be publishable; but the author does not
know this solution.

Exercise 5.5.7. Consider the function W defined by (5.90). Find a direct


proof that W satisfies (5.93).

The obstacle here is that it is not clear how to use condition (5.46).
Comparing (5.92) and (5.93) we get the relation
 
∂ ∂ r(α, q)
(E log Eξ exp γα,q (θ)) = − . (5.94)
∂q ∂α 2
A direct proof of this mysterious relation would provide a solution to the
exercise. The difficulty is of course that γα,q depends on q and α.
Proof of Proposition 5.5.6. From now on until the end of the chapter,
the arguments will be complete but sketchy, as they will rely on simplified
versions of techniques we have already used in this chapter. We define the
function W (α, q) by W (0, q) = 0 and (5.92).
Since the very definition of p∗N,M involves thinking of the variables ξk as
spins, we will approach the problem by the methods we have developed to
study spin systems. We write the identity
1 
N (p∗N,M +1 − p∗N,M ) = E log Eξ exp u(θk , θM +1 ) (5.95)
N
1≤k≤M

where, for a function h(θ1 , . . . , θM ) we define


1
h = Eξ h exp(−HN,M ) , (5.96)
Z
where Z = Eξ exp(−HN,M ) and where
5.5 The Replica-Symmetric Solution 319

1 
− HN,M = u(θk1 , θk2 ) . (5.97)
N
1≤k1 <k2 ≤M

The next step is to prove that (recalling that α = M/N ),


) 2 *
1  K
E u(θk , x) − γα,q (x) ≤ . (5.98)
N N
1≤k≤M

The argument is as in Theorem 5.4.3 but much simpler. We define


1  1 
A(x) = u(θk , x) − γα,q (x) ; A∗ (x) = u(θk , x) − γα,q (x) ,
N N
1≤k≤M 1≤k<M

so that as in (5.53) we have

K
E A(x)2 ≤ E (αu(θM , x) − γα,q (x))A∗ (x) + . (5.99)
N
Let us denote by · ∗ an average as in (5.96) but for the Hamiltonian HN,M −1 .
Let
1 
B(x) = u(θk , θM ) = A∗ (x) + γα,q (x) ,
N
1≤k<M

so that
(αu(θM , x) − γα,q (x))A∗ (x) exp B(θM ) ∗
(αu(θM , x) − γα,q (x))A∗ (x) = .
Eξ exp B(θM ) ∗

Let us then define ψ∗ (s) by the formula (5.59). Proceeding as in (5.62) we


obtain
1/2 1/2
ψ∗ (1) ≤ LαDE A∗ (x)2 ∗ E A∗ (θ)2 ∗ ,
and combining with (5.99) we get

1/2 1/2 K
E A(x)2 ≤ LαDE A∗ (x)2 ∗ E A∗ (θ)2 ∗ + .
N
Also, we have E h ∗ ≤ L h when h is a positive function, so that

K
E A(x)2 ≤ LαDE A∗ (x)2 1/2
E A∗ (θ)2 1/2
+ ,
N
after which we conclude the proof of (5.98) as in the few lines of the proof of
Theorem 5.4.3 that follow (5.64).
Combining (5.98) and (5.95) yields

K
|N (p∗N,M +1 − p∗N,M ) − E log Eξ exp γα,q (θ)| ≤ √ . (5.100)
N
320 5. The V -statistics Model

The right-hand side of (5.92) is a function f (α) of α (since γ is a function of


α), and (5.92) implies
α+1/N
W (α + 1/N, q) − W (α, q) = E log Eξ exp γx,q (θ)dx ,
α
so that
 
 
W (α + 1/N, q) − W (α, q) − 1 E log Eξ exp γα,q (θ) ≤ K ,
 N  N2

i.e.
      
  K
N W M + 1 , q − W M , q − E log Eξ exp γα,q (θ) ≤ .
 N N N

Comparing with (5.100) and summing over M yields (5.91).


It remains only to prove the elusive relation (5.93). For this we compute

∂ ∗ 1 
pN,M = 2 E (θk1 w(θk1 , θk2 ) + θk2 w(θk2 , θk1 ))
∂q N
1≤k1 <k2 ≤M

where
1 1
θk = √ zk − √ ξk
2 q 2 1−q
and where the bracket · is as in (5.96). Thus

∂ ∗ 1 
p = 2E θk1 w(θk1 , θk2 )
∂q N,M N
k1 =k2

1 k1 =k2 θk1 w(θk1 , θk2 ) exp(−HN,M )
= 2E .
N Eξ exp(−HN,M )

We then need to integrate by parts in the r.v.s θk1 i.e. to compute

w(θk1 , θk2 ) exp(−HN,M )


E θk1 .
Eξ exp(−HN,M )

The straightforward method is to replace θk1 by its value and to integrate


by parts in the r.v.s zk and ξk . One can also obtain the formula by using
the heuristic principle (2.58), although of course to really prove the formula
one has to perform the calculations again. Here (2.58) means that we can
pretend to perform the computation
√ that the denominator is a function of

the quantities θk∼ = qzk + 1 − qξk∼ , where ξk∼ are independent copies of
the r.v.s ξk . Since Eθk1 θk2 = 0 and Eθk1 θk∼ = 0 if k = k1 and = 1/2 if k = k1 ,
one then gets that the only terms occurring are created by the denominator,
and this gives
5.5 The Replica-Symmetric Solution 321

w(θk1 , θk2 ) exp(−HN,M ) 1 


Eθk1 =− E w(θk1 , θk2 )w(θk1 , θk3 ) ,
Eξ exp(−HN,M ) 2N
k3 =k1

so that finally

∂ ∗ 1 
pN,M = − E w(θk1 , θk2 )w(θk1 , θk3 ) . (5.101)
∂q 2N 3
k1 =k2 ,k1 =k3

Symmetry between the values of k yields


 
∂ ∗ 3  K
 pN,M + α E w(θM , θM −1 )w(θM , θM −2 ) ≤
 ∂q 2  N .

Using the familiar “cavity in M argument” of (5.100) for M − 3 rather than


M and reproducing the computation following (5.73) we then get
 
∂ ∗ 
 pN,M + r(α, q)  ≤ √K . (5.102)
 ∂q 2  N
For q = 1, we have θk = zk and (5.89) yields
 M (M − 1)

p∗N,M  = Eu(z, z) , (5.103)
q=1 2N 2

and combining with (5.103) gives


 2  
 α 1 1  K
 ∗
r(α, x)dx ≤ √ .
 2 Eu(z, z) − pN,M + 2 N
q

Comparison with (5.91) yields (taking N → ∞ and M/N → α)


1
α2 1
W (α, q) = Eu(z, z) + r(α, x)dx
2 2 q

and this proves that


∂W r(α, q)
(α, q) = − . 

∂q 2
Theorem 5.5.8. Recalling the function W of Proposition 5.5.6 let
r √
RS(α) = W (α, q) − (1 − q) + E log ch(z q) + log 2 ,
2
where γ, q and r are as in Theorem 5.5.4. Then, if LαD ≤ 1 snd α = M/N ,
we have
K
|pN,M − RS(α)| ≤ √ . (5.104)
N
322 5. The V -statistics Model

Proof. Since
1

pN,M = pN,M,1 = pN,M,0 + pN,M,s ds ,
0 ∂s
combining with (5.88) and (5.91) it suffices to prove that
 
∂ 
 pN,M,s + r (1 − q) ≤ √K . (5.105)
 ∂s 2  N
First we compute ∂pN,M,s /∂s using straightforward differentiation. Denoting
by νs the average corresponding to the Hamiltonian (5.87) and defining
1 1
Sk,s = √ Sk − √ θk
2 s 2 1−s
we get

pN,M,s = I + II ,
∂s
where
1 
I= νs (Sk1 ,s w(Sk1 ,s , Sk2 ,s ))
N2
k1 =k2

and  
1 √
II = − √ νs σi zi r .
2 1−s i≤N

We then integrate by parts. This is similar to the integration by parts in


(2.81). This is easy for the term II. We will explain the result of the com-
putation for the term I using the heuristic principle (2.58). The relation
ESk1 ,s Sk2 ,s = 0 shows that as in the derivation of (5.101) “the only terms
created come from the denominator in the expression of νs ”. Moreover, the
action of the expectation Eξ in the denominator amount to “shift the quan-
tities Sk,s to a new replica.” As in the case of (2.81) the definition of replicas
here involves replacing ξk by an independent copy ξk . That is, defining Sk in
the obvious manner, we set
√ √ √

Sk,s = sSk + 1 − s( qzk + 1 − qξk )
1 1 √

Sk,s = √ Sk − √ ( qzk + 1 − qξk ) .
2 s 2 1−s
1
We observe the relation ESk,s 2
Sk,s = R1,2 − q, so that in the terms arising
from the denominator we get the factor R1,2 − q. Therefore we get
  
1 1
I = − νs (R1,2 − q) 3 w(Sk11 ,s , Sk12 ,s )w(Sk21 ,s , Sk22 ,s ) ,
2 N
k1 =k2 ,k1 =k3

and as usual we have


5.5 The Replica-Symmetric Solution 323
r
II = − (1 − ν(R1,2 )) .
2
Finally we have obtained the relation

pN,M,s = I + II
∂s 
  
1 1
= − νs (R1,2 − q) w(Sk11 ,s , Sk12 ,s )w(Sk21 ,s , Sk22 ,s ) −r
2 N3
k1 =k2 ,k1 =k3
r
− (1 − q) .
2
One then extends (5.83) to the interpolating system to obtain (5.105) through
the Cauchy-Schwarz inequality. 


Exercise 5.5.9. Improve the rate of (5.104) into the usual rate K/N . (This
requires very significant work.)
6. The Diluted SK Model and the K-Sat
Problem

6.1 Introduction
In the SK model, each individual (or spin) interacts with every other indi-
vidual. For large N , this does not make physical sense. Rather, we would like
that, as N → ∞, a given individual typically interacts only with a bounded
number of other individuals. This motivates the introduction of the diluted
SK model. In this model, the Hamiltonian is given by

− HN (σ) = β gij γij σi σj . (6.1)
i<j

As usual, (gij )i<j are i.i.d. standard Gaussian r.v.s. The quantities γij ∈
{0, 1} determine which of the interaction terms are actually present in the
Hamiltonian. There is an interaction term between σi and σj only when γij =
1. The natural choice for these quantities is to consider a parameter γ > 0
(that does not depend on N ) indicating “how diluted is the interaction”,
and to decide that the quantities γij are i.i.d. r.v.s with P(γij = 1) = γ/N ,
P(γij = 0) = 1 − γ/N , and are independent from the r.v.s gij . Thus, the
expected number of terms in (6.1) is

γ N (N − 1) γ(N − 1)
= ,
N 2 2
and the expected number of terms that contain σi is about γ/2. That is,
the average number of spins that interact with one given√spin is about γ/2.
One should observe that the usual normalizing factor 1/ N does not occur
in (6.1).
If we draw an edge between i and j when γij = 1, the resulting random
graph is well understood [12]. When γ < 1, this graph has only small con-
nected components, so there is no “global interaction” and the situation is
not so interesting. In order to get a challenging model we must certainly allow
the case where γ takes any positive value.
In an apparently unrelated direction, let us remind the reader that the
motivation of Chapter 2 is the problem as to whether certain random subsets
of {−1, 1}N have a non-empty intersection. In Chapter 2, we considered “ran-
dom half-spaces”. These somehow “depend on all coordinates”. What would
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 325
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 6, © Springer-Verlag Berlin Heidelberg 2011
326 6. The Diluted SK Model and the K-Sat Problem

happen if instead we considered sets depending only on a given number p of


coordinates? For example sets of the type
+ ,
σ ; (σi1 , . . . , σip ) = (η1 , . . . , ηp ) (6.2)

where 1 ≤ i1 < i2 < . . . < ip ≤ N , and η1 , . . . , ηp = ±1?


The question of knowing whether M random independent sets of the type
(6.2) have a non-empty intersection is known in theoretical computer science
as the random K-sat problem, and is of considerable interest. (There K is
just another notation for what we call p. “Sat” stands for “satisfiability”, as
the problem is presented under the equivalent form of whether one can assign
values to N Boolean variables in order to satisfy a collection of M random
logical clauses of a certain type.) By a random subset of the type (6.2), we of
course mean a subset that is chosen uniformly at random among all possible
such subsets. This motivates the introduction of the Hamiltonian

− HN (σ) = −β Wk (σ) (6.3)
k≤M

where Wk (σ) = 0 if (σi(k,1) , . . . , σi(k,p) ) = (ηk,1 , . . . , ηk,p ), and Wk (σ) = 1


otherwise. The indices 1 ≤ i(k, 1) < i(k, 2) < . . . < i(k, p) ≤ N and the
numbers ηk,i = ±1 are chosen randomly uniformly over all possible choices.
The interesting case is when M is proportional to N .
In a beautiful paper, S. Franz and S. Leone [60] observed that many
technicalities disappear (and that one obtains a similar model) if rather than
insisting that the Hamiltonian contains exactly a given number of terms, this
number of terms is a Poisson r.v. M (independent of the other sources of
randomness). Since we are interested in the case where M is proportional to
N we will assume that EM is proportional to N , i.e. EM = αN , where of
course α does not depend on N .
To cover simultaneously the cases of (6.1) and (6.3), we consider a ran-
dom real-valued function θ on {−1, 1}p , i.i.d. copies (θk )k≥1 of θ, and the
Hamiltonian 
− HN (σ) = θk (σi(k,1) , . . . , σi(k,p) ) . (6.4)
k≤M

Here, M is a Poisson r.v. of expectation αN , 1 ≤ i(k, 1) < . . . < i(k, p) ≤


N , the sets {i(k, 1), . . . , i(k, p)} for k ≥ 1 are independent and uniformly
distributed, and the three sources of randomness (these sets, M , and the
θk ) are independent of each other. There is no longer a coefficient β, since
this coefficient can be thought of as a part of θ. For example, a situation
very similar to (6.1) is obtained for p = 2 and θ(σ1 , σ2 ) = βgσ1 σ2 where g
is standard Gaussian. It would require no extra work to allow an external
field in the formula (6.4). We do not do this for simplicity, but we stress
that our approach does not require any special symmetry property. (On the
other hand, precise specific results such as those of [78] seem to rely on such
properties.)
6.1 Introduction 327

It turns out that the mean number of terms of the Hamiltonian that
depend on a given spin is of particular relevance. This number is γ = αp
(where α is such that EM = αN ), and for simplicity of notation this will be
our main parameter rather than α.
The purpose of this chapter is to describe the behavior of the system
governed by the Hamiltonian (6.4) under a “high-temperature condition”
asserting in some sense that this Hamiltonian is small enough. This condition
will involve the r.v. S given by

S = sup |θ(σ1 , . . . , σp )| , (6.5)

where the supremum is of course over all values of σ1 , σ2 , . . . , σp = ±1, and


has the following property: if γ (and p) are given, then the high-temperature
condition is satisfied when S is small enough.
Generally speaking, the determination of exactly under which conditions
there is high-temperature behavior is a formidable problem. The best that
our methods can possibly achieve is to reach qualitatively optimal conditions,
that capture “a fixed proportion of the high-temperature region”. This seems
to be the case of the following condition:

16pγE S exp 4S ≤ 1 . (6.6)

Since the mean number of spins interacting with a given spin remains
bounded independently of N , the central limit theorem does not apply, and
the ubiquitous Gaussian behavior of the previous chapters is now absent.
Despite this fundamental difference, and even though this is hard to express
explicitly, there are many striking similarities.
We now outline the organization of this chapter. A feature of our approach
is that, in contrast with what happened for the previous models, we do not
know how to gain control of the model “in one step”. Rather, we will first
prove in Section 6.2 that for large N a small collection of spins are approx-
imately independent under a condition like (6.6). This is the main content
of Theorem 6.2.2. The next main step takes place in Section 6.4, where in
Theorem 6.4.1 we prove that under a condition like (6.6), a few quantities
σ1 , . . . , σk are approximately independent with law μγ where μγ is a prob-
ability measure on [0, 1], that is described in Section 6.3 as the fixed point of a
(complicated) operator. This result is then used in the last
 part of Section 6.4
to compute limN →∞ pN (γ), where pN (γ) = N −1 E log exp(−HN (σ)), still
under a “high-temperature” condition of the type (6.6). In Section 6.5 we
prove under certain conditions an upper bound for pN (γ), that is true for all
values of γ and that asymptotically coincides with the limit previously com-
puted under a condition of the type (6.6). In Section 6.6 we investigate the
case of continuous spins, and in Section 6.7 we demonstrate the very strong
consequences of a suitable concavity hypothesis on the Hamiltonian, and we
point out a number of rather interesting open problems.
328 6. The Diluted SK Model and the K-Sat Problem

6.2 Pure State

The purpose of this section is to show that under (6.6) “the system is in a
pure state”, that is, the spin correlations vanish. In fact we will prove that
K
E | σ1 σ2 − σ1 σ2 | ≤ (6.7)
N
where K depends only on p and γ. The proof, by induction over N , is similar
in spirit to the argument occurring at the end of Section 1.3. In order to make
the induction work, it is necessary to carry a suitable induction hypothesis,
that will prove a stronger statement than (6.7). This stronger statement will
be useful later in its own right.
Given k ≥ 1 we say that two functions f, f on ΣN n
depend on k coordinates
if we can find indices 1 ≤ i1 < . . . < ik ≤ N and functions f , f from
{−1, 1}kn to R such that

f (σ 1 , . . . , σ n ) = f (σi11 , . . . , σi1k , σi21 , . . . , σi2k , . . . , σin1 , . . . , σink )

and similarly for f . The reason we define this for two functions is to stress
that both functions depend on the same set of k coordinates.
For i ≤ N , consider the transformation Ti of ΣN n
that, for a point
(σ , . . . , σ ) of ΣN , exchanges the i-th coordinates of σ 1 and σ 2 , and leaves
1 n n

all the other coordinates unchanged.


The following technical condition should be interpreted as an “approxi-
mate independence condition”.

Definition 6.2.1. Given three numbers γ0 > 0, B > 0 and B ∗ > 0, we


say that Property C(N, γ0 , B, B ∗ ) holds if the following is true. Consider two
n
functions f, f on ΣN , and assume that they depend on k coordinates. Assume
that f ≥ 0, that for a certain i ≤ N we have

f ◦ Ti = −f , (6.8)

and that for a certain number Q we have |f | ≤ Qf at each point of ΣN


n
.
Then if γ ≤ γ0 we have
 
 f  (kB + B ∗ )Q
E  ≤
 . (6.9)
f N

Condition C(N, γ0 , B, B ∗ ) is not immediately intuitive. It is an “approx-


imate independence condition” because if the spins were really independent,
the condition f ◦ Ti = −f would imply that f = f ◦ Ti = −f so that
f = 0.
To gain intuition, let us relate condition C(N, γ0 , B, B ∗ ) with (6.7). We
take n = 2, f = 1,
f (σ 1 , σ 2 ) = σ11 (σ21 − σ22 ) ,
6.2 Pure State 329

so that (6.8) holds for i = 2, k = 2 and |f | ≤ 2f . Thus under condition


C(N, γ0 , B, B ∗ ) we get by (6.9) that

2B + B ∗
E | σ11 (σ21 − σ22 ) | ≤
N
i.e.
2B + B ∗
E | σ1 σ2 − σ1 σ2 | ≤ ,
N
which is (6.7). More generally, basically the same argument shows that when
condition C(N, γ0 , B, B ∗ ) holds (for each N and numbers B and B ∗ that
do not depend on N ), to compute Gibbs averages of functions that depend
only on a number of spin that remains bounded independently of N , one
can pretend that these spins are independent under Gibbs’ measure. We will
return to this important idea later.

Theorem 6.2.2. There exists a number K0 = K0 (p, γ0 ) such that if γ ≤ γ0


and
16γ0 pE S exp 4S ≤ 1, (6.10)
then Property C(N, γ0 , K0 , K0 ) holds for each N .
n
When property C(N, γ0 , K0 , K0 ) holds, for two functions f, f on ΣN ,
that depend on k coordinates, and with f ≥ 0, |f | ≤ Qf , then under (6.8),
and if γ ≤ γ0 , we have
 
 f  (kK0 + K0 )Q 2kK0 Q
E  ≤ ≤ . (6.11)
f  N N

The point of distinguishing in the definition of C(N, γ0 , B, B ∗ ) the values B


and B ∗ will become apparent during the proofs.
To prove Theorem 6.2.2, we will proceed by induction over N . The small-
est value of N for which the model is defined is N = p. We first observe that
| f | ≤ Q f , so that C(p, γ0 , K1 , K1∗ ) is true if K1 ≥ p. We will show that
if K1 and K1∗ are suitably chosen, then under (6.10) we have

C(N − 1, γ0 , K1 , K1∗ ) ⇒ C(N, γ0 , K1 , K1∗ ) . (6.12)

This will prove Theorem 6.2.2.


The main idea to prove (6.12) is to relate the N -spin system with an
(N − 1)-spin system through the cavity method, and we first need to set up
this method. We write −HN (σ) = −HN −1 (σ) − H(σ), where

− HN −1 (σ) = θk (σi(k,1) , . . . , σi(k,p) ) , (6.13)

where the sum is over those k ≤ M for which i(k, p) ≤ N − 1, and where
H(σ) is the sum of the other terms of (6.4), those for which i(k, p) = N .
330 6. The Diluted SK Model and the K-Sat Problem

Since the set {i(k, 1), . . . , i(k, p)} is uniformly distributed over the subsets of
{1, . . . , N } of cardinality p, the probability that i(k, p) = N is exactly p/N .
A remarkable property of Poisson r.v.s is as follows:  when M is a
Poisson
 r.v., if (X k )k≥1 are i.i.d. {0, 1}-valued r.v.s then k≤M Xk and
k≤M (1−Xk ) are independent Poisson r.v.s with mean respectively EM EXk
and EM E(1 − Xk ). The simple proof is given in Lemma A.10.1. Using this
for Xk = 1 if i(k, p) = N and Xk = 0 otherwise implies that the numbers of
terms in H(σ) and HN −1 (σ) are independent Poisson r.v.s of mean respec-
tively (p/N )αN = γ and αN − γ. Thus the pair (−HN −1 (σ), −H(σ)) has
the same distribution as the pair
  
θk (σi (k,1) , . . . , σi (k,p) ), θj (σi(j,1) , . . . , σi(j,p−1) , σN ) . (6.14)
k≤M  j≤r

Here M and r are Poisson r.v.s of mean respectively αN − γ and γ; θk


and θj are independent copies of θ; i (k, 1) < . . . < i (k, p) and the set
{i (k, 1), . . . , i (k, p)} is uniformly distributed over the subsets of {1, . . . , N −
1} of cardinality p; i(j, 1) < . . . < i(j, p − 1) ≤ N − 1 and the set Ij =
{i(j, 1), . . . , i(j, p−1)} is uniformly distributed over the subsets of {1, . . . , N −
1} of cardinality p − 1; all these random variables are globally independent.
The following exercise describes another way to think of the Hamilto-
nian HN , which provides a different intuition for the fact that the pair
(−HN −1 (σ), −H(σ)) has the same distribution as the pair (6.14).
Exercise 6.2.3. For each p-tuple i = (i1 , . . . , ip ) with 1 ≤ i1 < . . . < ip ≤ N ,
and each j ≥ 1 let us consider independent copies θi,j of θ, and define

−Hi (σ) = θi,j (σi1 , . . . , σip ) ,
j≤ri

where ri are i.i.d. Poisson r.v.s (independent of all other sources of random-
ness) with
αM
Eri = M  .
p

Prove that the Hamiltonian HN has the same distribution as the Hamiltonian

i Hi .

Since the properties of the system governed by the Hamiltonian HN de-


pend only of the distribution of this Hamiltonian, from now on in this section
we will assume that, using the same notation as in (6.14),

− HN (σ) = −HN −1 (σ) − H(σ) , (6.15)

where 
− HN −1 (σ) = θk (σi (k,1) , . . . , σi (k,p) ) , (6.16)
k≤M 
6.2 Pure State 331

and 
− H(σ) = θj (σi(j,1) , . . . , σi(j,p−1) , σN ) . (6.17)
j≤r

Let us stress that in this section and in the next, the letter r will stand
for the number of terms in the summation (6.17), which is a Poisson r.v. of
expectation γ.
We observe from (6.16) that if we write ρ = (σ1 , . . . , σN −1 ) when σ =
(σ1 , . . . , σN ), −HN −1 (σ) = −HN −1 (ρ) is the Hamiltonian of a (N − 1)-spin
system, except that we have replaced γ by a different value γ− . To compute
γ− we recall that the mean number of terms of the Hamiltonian HN −1 is
αN − γ, so that the mean number γ− of terms that contain a given spin is
p N −p
γ− = (αN − γ) = γ , (6.18)
N −1 N −1
since pα = γ. We note that γ− ≤ γ, so that

γ < γ0 ⇒ γ− ≤ γ0 , (6.19)

a fact that will help the induction.


n
Given a function f on ΣN , the algebraic identity

Av f E −
f = (6.20)
Av E −
holds. Here,  
E = E(σ , . . . , σ ) = exp
1 n
−H(σ ) ,
(6.21)
≤n

and as usual Av means average over σN 1


, . . . , σNn
= ±1. Thus Av f E is a
function of (ρ , . . . , ρ ) only, and Av f E − means that it is then averaged
1 n

for Gibbs’ measure relative to the Hamiltonian (6.13).


In the right-hand side of (6.20), we have two distinct sources of random-
ness: the randomness in · − and the randomness in E. It will be essential
that these sources of randomness are probabilistically independent. In the
previous chapters we were taking expectation given · − . We find it more
convenient to now take expectation given E. This expectation is denoted by
E− , so that, according to (6.20) we have
     
 f   Av f E −   Av f E − 

E  = E  = EE−    . (6.22)
f  Av f E −  Av f E − 

After these preparations we describe the structure of the proof. Let us


consider a pair (f , f ) as in Definition 6.2.1. The plan is to write
1
Av f E = f
2 s s
332 6. The Diluted SK Model and the K-Sat Problem

n
for some functions fs on ΣN −1 , such that the number of terms does not
depend on N , and that all pairs (fs , Av f E) have the property of the pair
(f , f ), but in the (N − 1)-spin system. Since

Avf E 1  fs
= ,
Av f E 2 s Av f E

we can now apply the induction hypothesis to each term to get a bound for
the sum and hence for  
 Av f E − 

E−   ,
Av f E − 
and finally (6.22) completes the induction step.
We now start the proof. We consider a pair (f , f ) as in Definition 6.2.1,
that is |f | ≤ Qf , f ◦ Ti = −f for some i ≤ N , and f, f depend on k
coordinates. We want to bound E| f / f |, and for this we study the last
term of (6.22). Without loss of generality, we assume that i = N and that f
and f depend on the coordinates 1, . . . , k −1, N . First, we observe that, since
we assume |f | ≤ Qf , we have |f E| ≤ Qf E, so that |Av f E| ≤ Av |f E| ≤
QAv f E, and thus  
 Av f E − 
E−  ≤Q. (6.23)
Av f E − 
We recall (6.21) and (6.17), and in particular that r is the number of terms
in the summation (6.17) and is a Poisson r.v. of expectation γ. We want to
apply the induction hypothesis to compute the left-hand side of (6.23). The
expectation E− is expectation given E, and it helps to apply the induction
hypothesis if the functions Avf E and Avf E are not too complicated. To
ensure this it will be desirable that all the points i(j, ) for j ≤ r and  ≤ p−1
are different and ≥ k. In the rare event Ω (we recall that Ω denotes an event,
and not the entire probability space) where this not the case, we will simply
use the crude bound (6.23) rather than the induction hypothesis. Recalling
that i(j, 1) < . . . < i(j, p − 1), to prove that Ω is a rare event we write
Ω = Ω1 ∪ Ω2 where
" %
Ω1 = ∃j ≤ r , i(j, 1) ≤ k − 1
" %
Ω2 = ∃j, j ≤ r , j = j , ∃,  ≤ p − 1 , i(j, ) = i(j ,  ) .

These two events depend only on the randomness of E. Let us recall that for
j ≤ r the sets
Ij = {i(j, 1), . . . , i(j, p − 1)} (6.24)
are independent and uniformly distributed over the subsets of {1, . . . , N − 1}
of cardinality p − 1. The probability that any given i ≤ N − 1 belongs to
Ij is therefore (p − 1)/(N − 1). Thus the probability that i(j, 1) ≤ k − 1,
6.2 Pure State 333

i.e. the probability that there exists  ≤ k − 1 that belongs to Ij is at most


(p − 1)(k − 1)/(N − 1). Therefore

(p − 1)(k − 1) kpγ
P(Ω1 ) ≤ Er ≤ .
N −1 N
Here and below, we do not try to get sharp bounds. There is no point in
doing this, as anyway our methods cannot reach the best possible bounds.
Rather, we aim at writing explicit bounds that are not too cumbersome. For
j < j ≤ r, the probability that a given point i ≤ N − 1 belongs to both
sets Ij and Ij  is ((p − 1)/(N − 1))2 . Thus the random number U of points
i ≤ N − 1 that belong to two different sets Ij for j ≤ r satisfies
 2
p−1 r(r − 1) p2 γ 2
E U = (N − 1) E ≤ ,
N −1 2 2N

using that Er(r − 1) = (Er)2 since r is a Poisson r.v., see (A.64). Since U is
integer valued, we have P({U = 0}) ≤ EU and since Ω2 = {U = 0} we get

p2 γ 2
P(Ω2 ) ≤ ,
2N
so that finally, since Ω = Ω1 ∪ Ω2 , we obtain

kpγ + p2 γ 2
P(Ω) ≤ . (6.25)
N
Using (6.22), (6.23) and (6.25), we have
       
 f   Av f E −   Av f E − 

E  = E 1Ω E −   + E 1Ω c E− 
f  Av f E −  Av f E −

  
kpγ + p2 γ 2  Av f E − 
≤ Q + E 1Ω c E−   . (6.26)
N Av f E − 

The next task is to use the induction hypothesis to study the last term above.
When Ω does not occur (i.e. on Ω c ), all the points i(j, ), j ≤ r,  ≤ p − 1
are different and are ≥ k. Recalling the notation (6.24) we have
!
J = {i(j, ); j ≤ r,  ≤ p − 1} = Ij ,
j≤r

so that card J = r(p − 1) and

J ∩ {1, . . . , k − 1, N } = ∅ . (6.27)

For i ≤ N − 1 let us denote by Ui the transformation of ΣN n


−1 that exchanges
1 2 1 n n
the coordinates σi and σi of a point (ρ , . . . , ρ ) of ΣN −1 , and that leaves
all the other coordinates unchanged. That is, Ui is to N − 1 what Ti is to N .
334 6. The Diluted SK Model and the K-Sat Problem

Lemma 6.2.4. Assume that f satisfies (6.8) for i = N , i.e. f ◦ TN = −f


and depends only on the coordinates in {1, . . . , k − 1, N }. Then when Ω does
not occur (i.e. on Ω c ) we have

(Av f E) ◦ Ui = −Av f E . (6.28)
i∈J

Here i∈J Ui denotes the composition of the transformations Ui for i ∈ J
(which does not depend on the order in which this composition is performed).
This (crucial. . .) lemma means that something of the special symmetry of f
(as in (6.8)) is preserved when one replaces f by Av f E.

Proof. Let us write T = i∈J Ti . We observe first that

f ◦T =f

because f depends only on the coordinates in {1, . . . , k − 1, N }, a set disjoint


from J. Thus
f ◦ T ◦ TN = f ◦ TN = −f (6.29)
since f ◦ TN = −f . We observe now that T ◦ TN exchanges σi1 and σi2 for
all i ∈ J ∪ {N }. These values of i are precisely the coordinates of which E
depends, so that

E ◦ T ◦ TN (σ 1 , σ 2 , . . . , σ n ) = E(σ 2 , σ 1 , . . . , σ n ) = E(σ 1 , σ 2 , . . . , σ n ) ,

and hence
E ◦ T ◦ TN = E .
Combining with (6.29) we get

(f E) ◦ T ◦ TN = (f ◦ T ◦ TN )(E ◦ T ◦ TN ) = −f E

so that, since TN2 is the identity,

(f E) ◦ T = −(f E) ◦ TN . (6.30)

Now, for any function f we have Av(f ◦ TN ) = Avf and Av(f ◦ T ) = (Avf ) ◦

i∈J Ui . Therefore we obtain

Av ((f E) ◦ TN ) = Av f E

Av((f E) ◦ T ) = (Av f E) ◦ Ui ,
i∈J

so that applying Av to (6.30) proves (6.28). 


Let us set k = r(p − 1) = card J, and let us enumerate as i1 , . . . , ik the
points of J. Now (6.28) implies
6.2 Pure State 335
  
1 1 
Av f E = Av f E − (Av f E) ◦ Uis = fs , (6.31)
2 
2 
s≤k 1≤s≤k

where  
fs = (Av f E) ◦ Uiu − (Av f E) ◦ Uiu . (6.32)
u≤s−1 u≤s

Since Ui2 is the identity, we have

fs ◦ Uis = −fs . (6.33)

In words, (6.31) decomposes Av f E as a sum of k = r(p − 1) pieces that


possess the symmetry property required to use the induction hypothesis. In
order to apply this induction hypothesis, it remains to establish the property
that will play for the pairs (fs , Avf E) the role the inequality |f | ≤ Qf plays
for the pair (f , f ). This is the purpose of the next lemma. For j ≤ r we set

Sj = sup |θj (ε1 , ε2 , . . . , εp )| ,

where the supremum is over all values of ε1 , ε2 , . . . , εp = ±1. We recall the


notation (6.24).
Lemma 6.2.5. Assume that Ω does not occur and that is ∈ Iv for a certain
(unique) v ≤ r. Then
  
|fs | ≤ 4QSv exp 4 Su Av f E . (6.34)
u≤r

A crucial feature of this bound is that it does not depend on the number n
of replicas.
Proof. Let us write
    
E = exp −H(σ ) ; E = exp

−H(σ ) ,


3≤≤n =1,2

so that E = E E . Since |H(σ)| ≤ j≤r Sj , we have
  
E ≥ exp −2 Sj ,
j≤r

and therefore   
E ≥ E exp −2 Sj . (6.35)
j≤r

This implies   
Av f E ≥ (Av f E ) exp −2 Sj . (6.36)
j≤r
336 6. The Diluted SK Model and the K-Sat Problem

Next,
 
fs = (Av f E) ◦ Uiu − (Av f E) ◦ Uiu
u≤s−1 u≤s
   
= Av (f E) ◦ Tiu − (f E) ◦ Tiu
u≤s−1 u≤s
  
 
= Av f E ◦ Tiu − E ◦ Tiu , (6.37)
u≤s−1 u≤s

using in the last line that f ◦ Tiu = f for each u, since f depends only on
the coordinates 1, . . . , k − 1, N . Recalling that E = E E , and observing that
for each i, we have E ◦ Ti = E , we get
     
E◦ Tiu − E ◦ Tiu = E E ◦ Tiu − E ◦ Tiu ,
u≤s−1 u≤s u≤s−1 u≤s

and, if we set
 
   
Δ = supE ◦ Tiu − E ◦ Tiu  = sup |E − E ◦ Tis | ,
u≤s−1 u≤s

we get from (6.37) that, using that |f | ≤ Qf in the first inequality and (6.35)
in the second one,
  
|fs | ≤ ΔAv (|f |E ) ≤ QΔAv(f E ) ≤ QΔAv(f E) exp 2 Sj . (6.38)
j≤r

To bound Δ, we write E = j≤r Ej , where

Ej = exp 
θj (σi(j,1) 
, . . . , σi(j,p−1) 
, σN ).
=1,2

We note that Ej ◦ Tis = Ej if j = v, because then Ej depends only on the


coordinates in Ij , and is ∈
/ Ij if j = v, since is ∈ Iv and Ij ∩ Iv = ∅. Thus

E − E ◦ Tis = (Ev − Ev ◦ Tis ) Ej .
j =v

Now, using the inequality |ex − ey | ≤ |x − y|ea ≤ 2aea for |x|, |y| ≤ a and
a = 2Sv , we get
|Ev − Ev ◦ Tis | ≤ 4Sv exp 2Sv .

Since for all j we have Ej ≤ exp 2Sj , we get Δ ≤ 4Sv exp 2 j≤r Sj . Com-
bining with (6.38) completes the proof. 
6.2 Pure State 337

Proposition 6.2.6. Assume that N ≥ p + 1 and that condition C(N −


1, γ0 , B, B ∗ ) holds. Consider f and f as in Definition 6.2.1, and assume
that γ ≤ γ0 . Then
  
 f  Qp
E  ≤
 k(γ + 4BD exp 4D) + 4pBU Er2 V r−1 + pγ 2 + 4B ∗ D exp 4D ,
f N
(6.39)
where
D = γE S exp 4S .

Proof. We keep the notation of Lemmas 6.2.4 and 6.2.5. Since γ− ≤ γ, we


can use C(N − 1, γ0 , B, B ∗ ) to conclude from (6.33) and (6.34) that, since
fs and Av Ef depend on k − 1 + r(p − 1) ≤ k + rp coordinates, and since
1/(N − 1) ≤ 2/N because N ≥ 2, on Ω c we have
    
 fs −  8Q

E−   ≤ ∗
((k + rp)B + B )Sv exp 4 Sj .
Av f E −  N
j≤r

Let us denote by Eθ expectation in the r.v.s θ1 , . . . , θr only. Then we get


 
 fs −  8Q
Eθ E−  ≤ ((k + rp)B + B ∗ )U V r−1 ,
Av f E −  N

where
U = E S exp 4S ; V = E exp 4S .
Combining with (6.31), and since there are k = r(p − 1) ≤ rp terms we get
 
 Av f E −  4Qp
Eθ E−  ≤ ((kr + r2 p)B + rB ∗ )U V r−1 .
Av f E −  N

This bound assumes that Ω does not occur; but combining with (6.26) we
obtain the bound
  
 f  Qp  
E  ≤ kγ + pγ 2
+ 4B kU ErV r−1
+ pU Er 2 r−1
V + 4B ∗
U ErV r−1
.
f  N
Since r is a Poisson r.v. of expectation γ a straightforward calculation shows
that ErV r−1 = γ exp γ(V − 1). Since ex ≤ 1 + xex for all x ≥ 0 (as is trivial
using power series expansion) we have V ≤ 1+4U , so exp γ(V −1) ≤ exp 4γU
and U ErV r−1 ≤ D exp 4D. The result follows. 
Proof of Theorem 6.2.2. If

D0 = γ0 E S exp 4S

is small enough that 16pD0 ≤ 1 then

4pD0 exp 4D0 ≤ 1/2 , (6.40)


338 6. The Diluted SK Model and the K-Sat Problem

and (6.39) implies


     
 f  Q B B∗

E  ≤ k pγ0 + 2 2 r−1
+ 4p BU Er V 2 2
+p γ + .
f  N 2 2

Thus condition

C(N, γ0 , pγ0 + B/2, 4p2 BU Er2 V r−1 + p2 γ02 + B ∗ /2)

holds. That is, we have proved under (6.40) that

C(N − 1, γ0 , B, B ∗ ) ⇒ C(N, γ0 , pγ0 + B/2, 4p2 BU Er2 V r−1 + p2 γ02 + B ∗ /2)) .


(6.41)
Now, we observe that U Er2 V r−1 ≤ K ∼ and that if K1 = 2pγ0 and K1∗ =
8p2 K1 K ∼ + 2p2 γ02 , (6.41) shows that (6.12) holds, and we have completed
the induction. 
Probably at this point it is good to stop for a while and to wonder what
is the nature of the previous argument. In essence this is “contraction ar-
gument”. The operation of “adding one spin” essentially acts as a type of
contraction, as is witnessed by the factor 1/2 in front of B and B ∗ in the
right-hand side of (6.41). As it turns out, almost every single argument used
in this work to control a model under a “high-temperature condition” is of
the same type, whether this is rather apparent, as in Section 1.6, or in a
more disguised form as here. (The one exception being Latala’s argument of
Section 1.4.)
We explained at length in Section 1.4 that we expect that at high-
temperature, as long as one considers a number of spins that remains bounded
independently of N , Gibbs’ measure is nearly a product measure. For the
present model, this property follows from Theorem 6.2.2 and we now give
quantitative estimates to that effect, in the setting we need for future uses.
Let us consider the product measure μ on ΣN −1 such that

∀i ≤ N − 1, σi dμ(ρ) = σi − ,

and let us denote by · • an average with respect to μ. Equivalently, for a


function f on ΣN −1 , we have
N −1
f • = f (σ11 , . . . , σN −1 ) − , (6.42)

where σii is the i-th coordinate of the i-th replica ρi . The following conse-
quence of property C(N, γ0 , K0 , K0 ) will be used in Section 6.4. It expresses,
in a form that is particularly adapted to the use of the cavity method the fact
that under property C(N, γ0 , K0 , K0 ), a given number of spins (independent
of N ) become nearly independent for large N .
6.2 Pure State 339

Proposition 6.2.7. If property C(N, γ0 , K0 , K0 ) holds for each N , and if


γ ≤ γ0 , the following occurs. Consider for j ≤ r sets Ij ⊂ {1, . . . , N } with
card Ij = p, N ∈ Ij , and such that j = j ⇒ Ij ∩ Ij  = {N }. For j ≤ r
consider functions Wj on ΣN depending only on the coordinates in Ij and
let Sj = sup |Wj (σ)|. Let

E = exp Wj (σ) .
j≤r

Then, recalling the definition (6.42), we have


 
 Av σN E −
 Av σN E •  8r(p − 1)2 K0 
E−  − ≤ exp 2Sj . (6.43)
Av E − Av E •  N −1
j≤r

This is a powerful principle, since it is very much easier to work with


the averages · • than with the Gibbs averages · − . We will use this result
when r is as usual the number of terms in (6.17) but since in (6.43) the
expectation E− is only in the randomness of · − we can, in the proof, think
of the quantities r and Wj as being non-random.
Proof. Let f = Av σN E and f = Av E. For 0 ≤ i ≤ N − 1, let us define

fi = fi (ρ1 , . . . , ρN −1 ) = f (σ11 , σ22 , . . . , σii , σi+1


1 1
, . . . , σN −1 )

and fi similarly. The idea is simply that “we make the spins independent one
at a time”. Thus
Av σN E − f − Av σN E • f −
= 1 ; = N −1 , (6.44)
Av E − f1 − Av E • fN −1 −

and the left-hand side of (6.43) is bounded by


  
 f − f − 
E−  i−1 − i  .
fi−1 − fi −
2≤i≤N −1

The terms in the summation are zero unless i belongs to the union of the
sets Ij , j ≤ r, for otherwise f and f do not depend on the i-th coordinate
and fi = fi−1 , fi = fi−1 . We then try to bound the terms in the summation
when i ∈ Ij for a certain j ≤ r. Since |fi | ≤ fi we have
     
 fi−1 − fi −   fi−1 − fi −   fi − fi−1 − fi − 
 − ≤ +
 fi−1 − fi −   fi−1 −   fi−1 − fi − 
   
 f − fi −   fi−1 − fi − 
≤  i−1 +
fi−1 −   fi−1 − 

so that, taking expectation in the previous inequality we get


340 6. The Diluted SK Model and the K-Sat Problem
     
 f − f −   f − fi −   fi−1 − fi − 
E−  i−1 − i  ≤ E−  i−1 
 + E−   . (6.45)
fi−1 − fi − fi−1 − fi−1 −

We will use C(N − 1, γ0 , K0 , K0 ) to bound these terms. First, we observe


that the function fi−1 − fi changes sign if we exchange σi1 and σii . Next, we
observe that since Wu does not depend on σi for u = j (where j is defined
by i ∈ Ij ) we have


E := exp Wu (σ11 , σ22 , . . . , σii , σi+1
1 1
, . . . , σN )
u=j

i−1
= exp Wu (σ11 , σ22 , . . . , σi−1 , σi1 , σi+1
1 1
, . . . , σN ).
u=j

Then
fi−1 = Av E(σ11 , . . . , σi−1
i−1
, σi1 , . . . , σN
1
) ≥ exp(−Sj )Av E ,
where Av denotes average over σN 1
= ±1. In a similar fashion, we get |fi−1 | ≤
exp Sj Av E , |fi | ≤ exp Sj Av E , and thus

|fi−1 − fi | ≤ (2 exp 2Sj )fi−1 ,

so that using (6.11) property C(N − 1, γ0 , K0 , K0 ) implies


 
 fi−1 − fi − 

E−   ≤ 4K0 r(p − 1) exp 2Sj , (6.46)
fi−1 −  N − 1

because these functions depend on r(p − 1) coordinates. We proceed similarly


to handle the last term on the right-hand side of (6.45). We then perform the
summation over i ≤ N − 1. A new factor p − 1 occurs because each set Ij
contains p − 1 such values of i. 

6.3 The Functional Order Parameter

As happened in the previous models, we expect that if we fix a number n and


take N very large, at a given disorder, n spins (σ1 , . . . , σn ) will asymptoti-
cally be independent, and that the r.v.s σ1 , . . . , σn will asymptotically be
independent. In the case of the SK model, the limiting law of σi was the

law of th(βz q + h) where z is a standard Gaussian r.v. and thus this law
depended only on the single parameter q.
The most striking feature of the present model is that the limiting law is
now a complicated object, that no longer depends simply on a few parameters.
It is therefore reasonable to think of this limiting law μ as being itself a kind
of parameter (the correct value of which has to be found). This is what the
physicists mean when they say “that the order parameter of the model is a
6.3 The Functional Order Parameter 341

function” because they identify a probability distribution μ on R with the


tail function t → μ([t, ∞)).
The purpose of the present section is to find the correct value of this
parameter. As is the case of the SK model this value will be given as the
solution of a certain equation. The idea of the construction we will perform
is very simple. While using the cavity method in the previous section, we
have seen in (6.34) (used for n = 1 and f (σ) = σN ) that
AvσN E −
σN = , (6.47)
AvE −
where 
E = exp θj (σi(j,1) , . . . , σi(j,p−1) , σN ) . (6.48)
j≤r

In the limit N → ∞ the sets Ij = {i(j, 1), . . . , i(j, p − 1)} are disjoint. The
quantity E depends on a number of spins that in essence does not depend
on N . If we know the asymptotic behavior of any fixed number (i.e. of any
number that does not depend on N ) of the spins (σi )i<N , we can then com-
pute the behavior of the spin σN . This behavior has to be the same as the
behavior of the spins σi for i < N , and this gives rise to a “self-consistency
equation”.
To define formally this equation, consider a Poisson r.v. r with Er = γ,
and independent of the r.v.s θj . For σ ∈ {−1, 1}N and ε ∈ {−1, 1} we define

Er = Er (σ, ε) = exp θj (σ(j−1)(p−1)+1 , . . . , σj(p−1) , ε) . (6.49)
1≤j≤r

This definition will be used many times in the sequel. We note that Er
depends on σ only through the coordinates of rank ≤ r(p − 1).
Given a sequence x = (xi )i≥1 with |x i | ≤ 1 we denote by λx the prob-
ability on {−1, 1}N that “has a density i (1 + xi σi ) with respect to the
uniform measure”. More formally, λx is the product measure such that
σi dλx (σ) = xi for each i. We denote by · x an average for λx .
Similarly, if x = (xi )i≤M we also denote by λx the probability measure
on ΣM = {−1, 1}M such that σi dλx (σ) = xi and we denote by · x an
average for λx , so that we have

f x = (1 + xi σi )f (σ)dσ ,
i≤M

where dσ means average for the uniform measure on ΣM .


These definitions are also of central importance in this chapter. The
idea underlying these definitions has already been used implicitly in (6.42)
since for a function f on ΣN −1 we have

f • = f Y , (6.50)
342 6. The Diluted SK Model and the K-Sat Problem

where Y = ( σ1 − , . . . , σN −1 − ).
Consider a probability measure μ on [−1, 1], and an i.i.d. sequence X =
(Xi )i≥1 such that Xi is of law μ. We define T (μ) as the law of the r.v.

Av εEr X
, (6.51)
Av Er X

where Av denotes the average over ε = ±1. We note that E depends on σ


and ε, so that Av εEr and Av Er depend on σ only and (6.51) makes sense.
The intuition is that if μ is the law of σi for i < N , then T (μ) is the law
of σN . This is simply because if the spins “decorrelate” as we expect, and
if in the limit any fixed number of the averages σi i are i.i.d. of law μ, then
the right-hand side of (6.47) will in the limit have the same distribution as
the quantity (6.51).
Theorem 6.3.1. Assume that

4γpE(S exp 2S) ≤ 1 . (6.52)

Then there exists a unique probability measure μ on [−1, 1] such that

μ = T (μ) .

The proof will consist of showing that T is a contraction for the Monge-
Kantorovich transportation-cost distance d defined in (A.66) on the set of
probability measures on [−1, 1] provided with the usual distance. In the
present case, this distance is simply given by the formula

d(μ1 , μ2 ) = inf E|X − Y | ,

where the infimum is taken over all pairs of r.v.s (X, Y ) such that the law
of X is μ1 and the law of Y is μ2 . The very definition of d shows that to
bound d(μ1 , μ2 ) there is no other method than to produce a pair (X, Y ) as
above such that E|X − Y | is appropriately small. Such a pair will informally
be called a coupling of the r.v.s X and Y .

Lemma 6.3.2. For a function f on {−1, 1}N , we have


f x = Δi f x (6.53)
∂xi
− −
i ) − f (η i ))/2, and where η i (resp. η i ) is obtained by
where Δi f (η) = (f (η + +

replacing the i-th coordinate of η by 1 (resp. −1).



Proof. The measure λx on {−1, 1} such that η dλx (η) = x gives mass
(1 + x)/2 to 1 and mass (1 − x)/2 to −1, so that for a function f on {−1, 1}
we have
6.3 The Functional Order Parameter 343

1 x
f x = f (η) dλx (η) = (f (1) + f (−1)) + (f (1) − f (−1)) .
2 2

Thus, using in the second inequality the trivial fact that a = a x for any
number a implies

d 1 1
f x = (f (1) − f (−1)) = (f (1) − f (−1)) . (6.54)
dx 2 2 x

Since λx is a product measure, using (6.54) given all the coordinates different
from i, and then Fubini’s theorem, we obtain (6.53). 

Lemma 6.3.3. If Er is as in (6.49), if 1 ≤ j ≤ r and if (j − 1)(p − 1) < i ≤


j(p − 1), then  
 ∂ Av εEr x 
 
 ∂xi Av Er x  ≤ 2Sj exp 2Sj

where Sj = sup |θj |. For the other values of i the left-hand side of the previous
inequality is 0.

Proof. Lemma 6.3.2 implies:

∂ Av εEr x Δi (Av εEr ) x Av εEr x Δi Av Er x


= − . (6.55)
∂xi Av Er x Av Er x Av Er 2x

Now
|Δi (Av εEr )| = |Av (εΔi Er )| ≤ Av |Δi Er | .
We write Er = E E , where E = exp θj (σ(j−1)(p−1)+1 , . . . , σj(p−1) , ε), and
where E does not depend on σi . Thus, using that |ex −ey | ≤ |x−y|ea ≤ 2aea
for |x|, |y| ≤ a, we get (keeping in mind the factor 1/2 in the definition
of Δi , that offsets the factor 2 above) that Δi E ≤ Sj exp Sj , and since
E ≤ Er exp Sj we get

|Δi Er | = |E Δi E | ≤ (Sj exp Sj )E ≤ (Sj exp 2Sj )Er

and thus  
 Δi (Av εEr ) x 

 Av Er x  ≤ Sj exp 2Sj .

The last term of (6.55) is bounded similarly. 


Proof of Theorem 6.3.1. This is a fixed point argument. It suffices to prove
that under (6.52), for any two probability measures μ1 and μ2 on [−1, 1], we
have
1
d(T (μ1 ), T (μ2 )) ≤ d(μ1 , μ2 ) . (6.56)
2
First, Lemma 6.3.3 yields that given x, y ∈ [−1, 1]N it holds:
344 6. The Diluted SK Model and the K-Sat Problem
 
 Av εE Av εEr y   
 r x
 − ≤2 Sj exp 2Sj |xi − yi | .
 Av Er x Av Er y 
j≤r (j−1)(p−1)<i≤j(p−1)
(6.57)
Consider a pair (X, Y ) of r.v.s and independent copies (Xi , Yi )i≥1 of this pair.
Let X = (Xi )i≥1 , Y = (Yi )i≥1 , so that from (6.57) we have
 
 Av εEr X Av εEr Y   
 − ≤2 Sj exp 2Sj |Xi − Yi | .
 Av Er Av Er Y 
X j≤r (j−1)(p−1)<i≤j(p−1)
(6.58)
Let us assume that the randomness of the pairs (Xi , Yi ) is independent of the
other sources of randomness in (6.58). Taking expectations in (6.58) we get
 
 Av εEr X Av εEr Y 
E − ≤ 2γ(p − 1)(ES exp 2S)E|X − Y | . (6.59)
Av Er X Av Er Y 

If X and Y have laws μ1 and μ2 respectively, then


Av εEr X Av εEr Y
and
Av Er X Av Er Y

have laws T (μ1 ) and T (μ2 ) respectively, so that (6.59) implies

d(T (μ1 ), T (μ2 )) ≤ 2γ(p − 1)(ES exp 2S)E|X − Y | .

Taking the infimum over all possible choices of X and Y yields

d(T (μ1 ), T (μ2 )) ≤ 2γ(p − 1)d(μ1 , μ2 )ES exp 2S ,

so that (6.52) implies (6.56). 


Let us denote by Tγ the operator T when we want to insist on the de-
pendence on γ. The unique solution of the equation μ = Tγ (μ) depends on
γ, and we denote it by μγ when we want to emphasize this dependence.
Lemma 6.3.4. If γ and γ satisfy (6.52) we have

d(μγ , μγ  ) ≤ 4|γ − γ | .

Proof. Without loss of generality we can assume that γ ≤ γ . Since μγ =


Tγ (μγ ) and μγ  = Tγ  (μγ  ), we have

d(μγ , μγ  ) ≤ d(Tγ (μγ ), Tγ (μγ  )) + d(Tγ (μγ  ), Tγ  (μγ  ))


1
≤ d(μγ , μγ  ) + d(Tγ (μγ  ), Tγ  (μγ  )) , (6.60)
2
using (6.56). To compare Tγ (μ) and Tγ  (μ) the basic idea is that there is
natural coupling between a Poisson r.v. of expectation γ and another Poisson
r.v. of expectation γ (an idea that will be used again in the next section).
6.4 The Replica-Symmetric Solution 345

Namely if r is a Poisson r.v. with Er = γ := γ −γ, and r is independent of


the Poisson r.v. r such that Er = γ then r +r is a Poisson r.v. of expectation
γ . Consider Er as in (6.49) and, with the same notation,

E = exp θj (σ(j−1)(p−1)+1 , . . . , σj(p−1) , ε) ,
r<j≤r+r 

so that Er E = Er+r . Consider an i.i.d. sequence X = (Xi )i≥1 of common


law μ. Then the r.v.s
Av εEr X Av εEr E X
and
Av Er X Av Er E X

have respectively laws Tγ (μ) and Tγ  (μ). Thus


 
 Av εEr X Av εEr E 
d(Tγ (μ), Tγ  (μ)) ≤ E  − X
 (6.61)
Av Er X Av Er E X
−(γ  −γ)
≤ 2P(r = 0) = 2(1 − e ) ≤ 2(γ − γ) ,

so that (6.60) implies that d(μγ , μγ  ) ≤ d(μγ , μγ  )/2 + 2(γ − γ), hence the
desired result. 
n
Exercise 6.3.5. Consider three functions U, V, W on ΣN . Assume that
V ≥ 0, that for a certain number Q, we have |U | ≤ QV , and let S ∗ =
supσ1 ,...,σn |W |. Prove that for any Gibbs measure · we have
 
 U exp W U 
 − ≤ 2QS ∗ exp 2S ∗ .
 V exp W V 
Exercise 6.3.6. Use the idea of Exercise 6.3.5 to control the influence of
E in (6.61) and to show that if γ and γ satisfy (6.52) then d(μγ , μγ  ) ≤
4|γ − γ |ES exp 2S.

6.4 The Replica-Symmetric Solution


In this section we will first prove that asymptotically as N → ∞ any fixed
number of the quantities σi are i.i.d. of law μγ , where μγ was defined
in the last section. We will then compute the quantity limN →∞ pN (γ) =
limN →∞ N −1 E log ZN (γ).

Theorem 6.4.1. Assume that

16pγ0 ES exp 4S ≤ 1 . (6.62)

Then there exists a number K2 (p, γ0 ) such that if we define for n ≥ 0 the
numbers A(n) as follows:
346 6. The Diluted SK Model and the K-Sat Problem

A(0) = K2 (p, γ0 )E exp 2S , (6.63)


 
A(n + 1) = A(0) + 40p3 (γ0 + γ03 )ES exp 2S A(n) , (6.64)
then the following holds. If γ ≤ γ0 , given any integers k ≤ N and n we can
find i.i.d. r.v.s z1 , . . . , zk of law μγ such that
 k 3 A(n)
E | σi − zi | ≤ 21−n k + . (6.65)
N
i≤k

In particular when
80p3 (γ0 + γ03 )ES exp 2S ≤ 1 , (6.66)
we can replace (6.65) by
 2k 3 K2 (γ0 , p)
E | σi − zi | ≤ E exp 2S . (6.67)
N
i≤k

The last statement of the Theorem simply follows from the fact that under
(6.66) we have A(n) ≤ 2A(0), so that we can take n very large in (6.90).
When (6.66) need not hold, optimisation over n in (6.65) yields a bound
≤ KkN −α for some α > 0 depending only on γ0 , p and S.
The next problem need not be difficult. This issue came at the very time
where the book was ready to be sent to the publisher, and it did not seem ap-
propriate to either delay the publication or to try to make significant changes
in a rush.
Research Problem 6.4.2. (level 1-) Is it true that (6.67) follows from
(6.62)? More specifically, when γ0  1, and when S is constant, does (6.67)
follow from a condition of the type K(p)γ0 S ≤ 1?
Probably the solution of this problem will not require essentially new
ideas. Rather, it should require technical work and improvement of the esti-
mates from Lemma 6.4.3 to Lemma 6.4.7, trying in particular to bring out
more “small factors” such as ES exp 2S, in the spirit of Exercice 6.3.6. It
seems however that it will also be necessary to proceed to a finer study of
what happens on the set Ω defined page 349.
It follows from Theorem 6.2.2 that we can assume throughout the proof
that property C(γ0 , N, K0 , K0 ) holds for every N . It will be useful to consider
the metric space [−1, 1]k , provided with the distance d given by

d((xi )i≤k , (yi )i≤k ) = |xi − yi | . (6.68)
i≤k

The Monge-Kantorovich transportation-cost distance on the space of proba-


bility measures on [−1, 1]k that is induced by (6.68) will also be denoted by
d. We define
6.4 The Replica-Symmetric Solution 347
 
D(N, k, γ0 ) = sup d L( σ1 , . . . , σk ), μ⊗k
γ (6.69)
γ≤γ0

where L(X1 , . . . , Xk ) denotes the law of the random vector (X1 , . . . , Xk ).


By definition of the transportation-cost distance in the right-hand side
of (6.69), the content of Theorem 6.4.1 is that if γ0 satisfies (6.62) we have
D(N, k, γ0 ) ≤ 21−n k + k 3 A(n)/N for each k ≤ N and each n. This inequality
will be proved by obtaining a suitable induction relation between the quan-
tities D(N, k, γ0 ). The overall idea of the proof is to use the cavity method
to express σ1 , . . . , σk as functions of a smaller spin system, and to use
Proposition 6.2.7 and the induction hypothesis to perform estimates on the
smaller spin system. 
We start by a simple observation. Since i≤k |xi − yi | ≤ 2k for xi , yi ∈
[−1, 1], we have D(N, k, γ0 ) ≤ 2k. Assuming, as we may, that K2 (p, γ0 ) ≥ 4p,
we see that there is nothing to prove unless N ≥ 2pk2 so in particular N ≥
p + k and N ≥ 2k. We will always assume below that this is the case. We
also observe that, by symmetry,

L( σ1 , . . . , σk ) = L( σN −k+1 , . . . , σN ) .

The starting point of the proof of Theorem 6.4.1 is a formula similar to (6.20),
but where we remove the last k coordinates rather than the last one. Writing
now ρ = (σ1 , . . . , σN −k ), we consider the Hamiltonian

− HN −k (ρ) = θs (σi(s,1) , . . . , σi(s,p) ) , (6.70)
s

where the summation is restricted to those s ≤ M for which i(s, p) ≤ N − k.


This is the Hamiltonian of an (N − k)-spin system, except that we have
replaced γ by a different value γ− . To compute γ− we observe that since
the set {i(s, 1), . . . , i(s, p)} is uniformly distributed among the subsets of
{1, . . . , N } of cardinality p, the probability that i(s, p) ≤ N − k, i.e. the
probability that this set is a subset of {1, . . . , N − k} is exactly

N −k
p
τ=  ,
N
p

so that the mean number of terms of this Hamiltonian is N ατ , and

γ− (N − k) = pN ατ = γN τ ,

and thus
(N − k − 1) · · · (N − k − p + 1)
γ− = γ . (6.71)
(N − 1) · · · (N − p + 1)
In particular γ− ≤ γ0 whenever γ ≤ γ0 . Let us denote again by · − an
average for the Gibbs measure with Hamiltonian (6.70). (The value of k will
be clear from the context.) Given a function f on ΣN , we then have
348 6. The Diluted SK Model and the K-Sat Problem

Av f E −
f = , (6.72)
Av E −
where Av means average over σN −k+1 , . . . , σN = ±1, and where

E = exp θs (σi(s,1) , . . . , σi(s,p) ) ,

for a sum over those values of s ≤ M for which i(s, p) ≥ N − k + 1. As before,


in distribution, 
E = exp θj (σi(j,1) , . . . , σi(j,p) ) , (6.73)
j≤r

where now the sets {i(j, 1), . . . , i(j, p)} are uniformly distributed over the
subsets of {1, . . . , N } of cardinality p that intersect {N − k + 1, . . . , N }, and
where r is a Poisson r.v. The expected value of r is the mean number of terms
in the Hamiltonian −HN that are not included in the summation (6.70), so
that
⎛ ⎞
N −k  
p γN (N − k) · · · (N − k − p + 1)
Er = αN ⎝1 −  ⎠= 1− . (6.74)
N p N · · · (N − p + 1)
p

The quantity r will keep this meaning until the end of the proof of The-
orem 6.4.1, and the quantity E will keep the meaning of (6.73). It is good to
note that, since N ≥ 2kp, for  ≤ p we have
N −k− k 2k
=1− ≥1− .
N − N − N
Therefore
(N − k) · · · (N − k − p − 1) 2k p 2kp
≥ 1− ≥1− , (6.75)
N · · · (N − p + 1) N N
and thus
Er ≤ 2kγ . (6.76)
We observe the identity
 
Av σN −k+1 E − Av σN E −
L( σN −k+1 , . . . , σN ) = L ,..., . (6.77)
Av E − Av E −
The task is now to use the induction hypothesis to approximate the right-
hand side of (6.77); this will yield the desired induction relation. There are
three sources of randomness on the right-hand of (6.77). There is the ran-
domness associated with the (N − k)-spin system of Hamiltonian (6.70); the
randomness associated to r and the sets {i(j, 1), . . . , i(j, p)}; and the random-
ness associated to the functions θs , s ≤ r. These three sources of randomness
are independent of each other.
6.4 The Replica-Symmetric Solution 349

To use the induction hypothesis, it will be desirable that for j ≤ r the


sets
Ij = {i(j, 1), . . . , i(j, p − 1)} (6.78)
are disjoint subsets of {1, . . . , N − k}, so we first control the size of the rare
event Ω where this is not the case. We have Ω = Ω1 ∪ Ω2 , where
" %
Ω1 = ∃j ≤ r, i(j, p − 2) ≥ N − k + 1

" %
Ω2 = ∃ j, j ≤ r, j = j , ∃ ,  ≤ p − 1 , i(j, ) = i(j ,  ) .

Proceeding as in the proof of (6.25) we easily reach the crude bound

4k2
P(Ω) ≤ (γp + γ 2 p2 ) . (6.79)
N
We recall that, as defined page 341, given a sequence x = (x1 , . . . , xN −k )
with |xi | ≤ 1 and a function f on ΣN −k , we denote by f x the average of f
with respect to the product measure λx on ΣN −k such that σi dλx (ρ) = xi
for 1 ≤ i ≤ N − k.
We now start a sequence of lemmas that aim at deducing from (6.77) the
desired induction relations among the quantities D(N, k, γ0 ). There will be
four steps in the proof. In the first step below, in each of the brackets in the
right-hand side of (6.77) we replace the Gibbs measure · − by · Y where
Y = ( σ1 − , . . . , σN −k − ). The basic reason why this creates only a small
error is that C(N, γ0 , K0 , K0 ) holds true for each N , a property which is used
as in Proposition 6.2.7.

Lemma 6.4.3. Consider the sequence

Y = ( σ1 −, . . . , σN −k −) .

Set
Av σN −k+ E − Av σN −k+ E Y
u = σN −k+ = ; v = .
Av E − Av E Y
Then we have
k3
d(L(u1 , . . . , uk ), L(v1 , . . . , vk )) ≤ K(p, γ0 )E exp 2S . (6.80)
N
Proof. From now on E− denotes expectation in the randomness of the N − k
spin system only. When Ω does not occur, there is nothing to change to the
proof of Proposition 6.2.7 to obtain that

8r(p − 1)2 K0 
E− |u − v | ≤ exp 2Sj ,
N −k
j≤r
350 6. The Diluted SK Model and the K-Sat Problem

where we recall that r denotes the number of terms in the summation in


(6.73), and is a Poisson r.v. which satisfies Er ≤ 2kγ. We always have E− |u −
v | ≤ 2, so that

8r(p − 1)2 K0 
E− |u − v | ≤ exp 2Sj + 21Ω . (6.81)
N −k
j≤r

Taking expectation in (6.81) then yields

8(p − 1)2 K0
E|u − v | ≤ E exp 2S Er2 + 2P(Ω)
N −k
k2 K(p, γ0 )
≤ E exp 2S ,
N
using (6.79), that N − k ≥ N/2 and that Er2 = Er + (Er) ≤ 2γk + 4γ k .
2 2 2

Since the left-hand side of (6.80) is bounded by ≤k E|u − v |, the result
follows. 
In the second step, we replace the sequence Y by an appropriate i.i.d.
sequence of law μγ− . The basic reason this creates only a small error is the
“induction hypothesis” i.e. the control of the quantities D(N − k, m, γ0 ).
Proposition 6.4.4. Consider an independent sequence X = (X1 , . . . , XN −k )
where each Xi has law μ− := μγ− . We set

Av σN −k+ E X
w = , (6.82)
AvE X
and we recall the quantities v of the previous lemma. Then we have

k3
d(L(v1 , . . . , vk ), L(w1 , . . . , wk )) ≤ K(p, γ0 ) (6.83)
N
+ 4ES exp 2SED(N − k, r(p − 1), γ0 ) ,

where the last expectation is taken with respect to the Poisson r.v. r.
The proof will rely on the following lemma.
Lemma 6.4.5. Assume that Ω does not occur. Consider  ≤ k and

E = exp θj (σi(j,1) , . . . , σi(j,p−1) , σN −k+ ) , (6.84)

where the summation is over those j ≤ r for which i(j, p) = N − k + . Then


for any sequence x we have
Av σN −k+ E x Av σN −k+ E x
= . (6.85)
Av E x Av E x
Consequently
6.4 The Replica-Symmetric Solution 351

∂ Av σN −k+ E x
=0 (6.86)
∂xi Av E x
unless i ∈ Ij for some j with i(j, p) = N −k+. In that case we have moreover
 
 ∂ Av σN −k+ E x 
  ≤ 4Sj exp 2Sj . (6.87)
 ∂xi Av E x 

Proof. Define E by E = E E . Since Ω does not occur, the quantities


σN −k+ E and E depend on disjoint sets of coordinates. Consequently

Av σN −k+ E = (Av σN −k+ E )(Av E ) (6.88)

Av E = (Av E )(Av E ) . (6.89)


In both (6.88) and (6.89) the two factors on the right depend on disjoint sets
of coordinates. Since · x is a product measure, we get

Av σN −k+ E x = Av σN −k+ E x Av E x

and similarly with (6.89), so that (6.85) follows, of which (6.86) is an obvious
consequence. As for (6.87), it is proved exactly as in Lemma 6.3.3. 
Proof of Proposition 6.4.4. Thestrategy is to construct a specific realiza-
& X for which the quantity E ≤N −k |v −w | is small. Consider the set
tion of
J = j≤r Ij (so that cardJ ≤ (p − 1)r). The construction takes place given
the set J. By definition of D(N − k, r(p − 1), γ0 ), given J we can construct
an i.i.d. sequence (Xi )i≤N −k distributed like μ− that satisfies

E− |Xi − σi − | ≤ 2D(N − k, r(p − 1), γ0 ) . (6.90)
i∈J

We can moreover assume that the sequence (θj )j≥1 is independent of the
randomness generated by J and the variables Xi . The sequence (Xi )i≤N −k
is our specific realization. It is i.i.d. distributed like μ− .
It follows from Lemma 6.4.5 that if Ω does not occur,
 
 Av σN −k+ E X Av σN −k+ E Y 

|w − v | =  −
Av E X Av E Y 
  
≤ |Xi − σi − | 2Sj exp 2Sj ,
i∈Ij

where the first sum is over those j ≤ r for which i(j, p) = N − k + . By


summation over  ≤ k, we get that when Ω does not occur,
 
|w − v | ≤ 2 |Xi − σi − |Sj(i) exp 2Sj(i) ,
≤k i∈J
352 6. The Diluted SK Model and the K-Sat Problem

where j(i) is the unique j ≤ r with i ∈ Ij . Denoting by Eθ expectation in the


r.v.s (θj )j≥1 and using independence we get
 
Eθ |w − v | ≤ 2 |Xi − σi − |ES exp 2S .
≤k i∈J

Taking expectation E− and using (6.90) implies that when Ω does not occur,

Eθ E− |w − v | ≤ 4(ES exp 2S)D(N − k, r(p − 1), γ0 ) ,
≤k

i.e.

1Ω c Eθ E− |w − v | ≤ 4(ES exp 2S)D(N − k, r(p − 1), γ0 ) . (6.91)
≤k

On the other hand, on Ω we have trivially



Eθ E− |w − v | ≤ 2k ,
≤k

and combining with (6.91) we see that



Eθ E− |w − v | ≤ 4(ES exp 2S)D(N − k, r(p − 1), γ0 ) + 2k1Ω .
≤k

Taking expectation and using (6.79) again yields


 k 3 K(p, γ0 )
E |w − v | ≤ + 4(ES exp 2S)ED(N − k, r(p − 1), γ0 ) ,
N
≤k

and this implies (6.83). 


Now comes the key step: by definition of the operator T of (6.51) the r.v.s
w of (6.82) are nearly independent with law T (μ− ).

Proposition 6.4.6. We have

k2
d(L(w1 , . . . , wk ), T (μ− ) ⊗ · · · ⊗ T (μ− )) ≤ K(p, γ0 ) . (6.92)
N
Proof. Let us define, for  ≤ k
+ ,
r() = card j ≤ r; i(j, p − 1) ≤ N − k, i(j, p) = N − k +  , (6.93)

so that when Ω does not occur, r() is the number of terms in the summation
of (6.84), and moreover for different values of , the sets of indices occurring
in (6.84) are disjoint. The sequence (r())≤k is an i.i.d. sequence of Poisson
r.v.s. (and their common mean will soon be calculated).
6.4 The Replica-Symmetric Solution 353

For  ≥ 1 and j ≥ 1 let us consider independent copies θ,j of θ and for


m ≥ 1 let us define, for σ ∈ RN ,

E,m = E,m (σ, ε) = exp θ,j (σ(j−1)(p−1)+1 , . . . , σj(p−1) , ε) ,
1≤j≤m

a formula that should be compared to (6.49).


For  ≤ k, let us consider sequences X = (Xi, )i≥1 , where the r.v.s Xi,
are all independent of law μ− . Let us define w = w when Ω occurs, and
otherwise  
Av εE,r() X

w = . (6.94)
Av E,r() X
The basic fact is that the sequences (w )≤k and (w )≤k have the same law.
This is because they have the same law given the r.v. r and the numbers
i(j, 1), . . . , i(j, p) for j ≤ r. This is obvious when Ω occurs, since then w =
w . When Ω does not occur we simply observe from (6.85) and the definition
of w that
Av σN −k+ E X
w = .
Av E X
We then compare with (6.94), keeping in mind that there are r() terms in
the summation (6.84), and then using symmetry.
Therefore we have shown that

L(w1 , . . . , wk ) = L(w1 , . . . , wk ) . (6.95)

Since the sequence (r())≤k is an i.i.d. sequence of Poisson r.v.s, the sequence
(w )≤k is i.i.d. It has almost law T (μ− ), but not exactly because the Poisson
r.v.s r() do not have the correct mean. This mean γ = Er() is given by

N −k
N γ p−1 (N − k) · · · (N − k − p + 2)
γ =  =γ ≤γ.
p N (N − 1) · · · (N − p + 1)
p

To bound the small error created by the difference between γ and γ we


proceed as in the proof of Lemma 6.3.4. We consider independent Poisson
r.v.s (r ())≤k of mean γ − γ , so that s() = r() + r () is an independent
sequence of Poisson r.v.s of mean γ. Let
 
Av εE,s() X
w =    .
Av E,s() X


The sequence (w )≤k is i.i.d. and the law of w is T (μ− ). Thus (6.95) implies:

d(L(w1 , . . . , wk ), T (μ− ) ⊗ · · · ⊗ T (μ− )) = d(L(w1 , . . . , wk ), L(w1 , . . . , wk ))



≤ E|w − w | .
≤k
354 6. The Diluted SK Model and the K-Sat Problem

Now, since w = w unless Ω occurs or s() = r(), we have


 
E|w − w | ≤ 2 P(s() = r()) + P(Ω)

and
P(s() = r()) = P(r () = 0) ≤ γ − γ .
Moreover from (6.75) we see that γ − γ ≤ 2γkp/N. The result follows. 
The next lemma is the last step. It quantifies the fact that T (μ− ) is nearly
μ.
Lemma 6.4.7. We have
4γk2 p
d(T (μ− )⊗k , μ⊗k ) ≤ . (6.96)
N
Proof. The left-hand side is bounded by
k
kd(T (μ− ), μ) = kd(T (μ− ), T (μ)) ≤ d(μ, μ− ) ≤ 2k(γ − γ− ) ,
2
using Lemma 6.3.4. The result follows since by (6.75) we have γ − γ− ≤
2kpγ/N . 

Proof of Theorem 6.4.1. We set B = 4ES exp 2S. Using the triangle in-
equality for the transportation-cost distance and the previous estimates, we
have shown that for a suitable value of K2 (γ0 , p) we have (recalling the defi-
nition (6.63) of A(0)),
  k 3 A(0)
d L( σN −k+1 , . . . , σN ), μ⊗k ≤ + BED(N − k, r(p − 1), γ0 ) .
N
(6.97)
Given an integer n we say that property C ∗ (N, γ0 , n) holds if

k 3 A(n)
∀p ≤ N ≤ N , ∀k ≤ N , D(N , k, γ0 ) ≤ 21−n k + . (6.98)
N
Since D(N , k, γ0 ) ≤ 2k, C ∗ (N, γ0 , 0) holds for each N . And since A(n) ≥
A(0), C ∗ (p, γ0 , n) holds as soon as K2 (γ0 , p) ≥ 2p, since then D(p, k, γ0 ) ≤
2k ≤ k3 A(0)/p ≤ k3 A(n)/p. We will prove that

C ∗ (N − 1, γ0 , n) ⇒ C ∗ (N, γ0 , n + 1) , (6.99)

thereby proving that C ∗ (N, γ0 , n) holds for each N and n, which is the content
of the theorem.
To prove (6.99), we assume that C ∗ (N − 1, γ0 , n) holds and we consider
k ≤ N/2. It follows from (6.98) used for N = N − k ≤ N − 1 and r(p − 1)
instead of k that since k ≤ N/2 we have

p3 r3 A(n) 2p3 r3 A(n)


D(N −k, r(p−1), γ0 ) ≤ 21−n rp+ ≤ 21−n rp+ , (6.100)
N −k N
6.4 The Replica-Symmetric Solution 355

and going back to (6.97),

  k 3 A(0) 2p3 A(n)


d L( σN −k+1 , . . . , σN ), μ⊗k ≤ 21−n pBEr + + BE(r3 ) .
N N
(6.101)
Since r is a Poisson r.v., (A.64) shows that Er3 = (Er)3 + 3(Er)2 + Er, so
that since Er ≤ 2kγ we have crudely

Er3 ≤ 20(γ + γ 3 )k 3 , (6.102)

using that γ 2 ≤ γ + γ 3 . Since pBEr = 2pBkγ ≤ k/2 by (6.62), using (6.102)


to bound the last term of (6.101) we get

  k3
d L( σN −k+1 , . . . , σN ), μ⊗k ≤ 2−n k + (A(0) + 40p3 (γ + γ 3 )BA(n)) ,
N
and since this holds for each γ ≤ γ0 , the definition of D(N, k, γ0 ) shows that

k3 k 3 A(n + 1)
D(N, k, γ0 ) ≤ 2−n k + (A(0) + 40p3 (γ0 + γ03 )BA(n)) = 2−n k + .
N N
(6.103)
We have assumed k ≤ N/2, but since D(N, k, γ0 ) ≤ 2k and A(n + 1) ≥ A(0),
(6.103) holds for k ≥ N/2 provided K2 (γ0 , p) ≥ 8. This proves C ∗ (N, γ0 , n+1)
and concludes the proof. 

We now turn to the computation of
1 
pN (γ) = E log exp(−HN (σ)) . (6.104)
N σ

We will only consider the situation where (6.66) holds, leaving it to the reader
to investigate what kind of rates of convergence she can obtain when assum-
ing only (6.62). We consider i.i.d. copies (θj )j≥1 of the r.v. θ, that are inde-
pendent of θ, and we recall the notation (6.49). Consider an i.i.d. sequence
X = (Xi )i≥1 , where Xi is of law μγ (given by Theorem 6.3.1). Recalling the
definition (6.49) of Er we define

γ(p − 1)
p(γ) = log 2 − E log exp θ(σ1 , . . . , σp ) X + E log Av Er X . (6.105)
p

Here as usual Av means average over ε = ±1, the notation · X is as in e.g.


(6.51), and r is a Poisson r.v. with Er = γ.
Theorem 6.4.8. Under (6.62) and (6.66), for N ≥ 2, and if γ ≤ γ0 we
have
K log N
|pN (γ) − p(γ)| ≤ , (6.106)
N
where K does not depend on N or γ.
356 6. The Diluted SK Model and the K-Sat Problem

As we shall see later, the factor log N above is parasitic and can be removed.

Let γ− = γ(N − p)/(N − 1) as in (6.18). Theorem 6.4.8 will be a conse-


quence of the following two lemmas, that use the notation (6.104), and where
K does not depend on N or γ.
Lemma 6.4.9. We have
K
|N pN (γ) − (N − 1)pN −1 (γ− ) − log 2 − E log Av Er X| ≤ . (6.107)
N
Lemma 6.4.10. We have


(N − 1)pN −1 (γ) − (N − 1)pN −1 (γ− )


p−1  K
−γ 
E log exp θ(σ1 , . . . , σp ) X ≤ . (6.108)
p N

Proof of Theorem 6.4.8. Combining the two previous relations we get


K
|N pN (γ) − (N − 1)pN −1 (γ) − p(γ)| ≤ ,
N
and by summation over N that

N |pN (γ) − p(γ)| ≤ K log N . 




The following prepares for the proof of Lemma 6.4.10.

Lemma 6.4.11. We have


1
pN (γ) = E log exp θ(σ1 , . . . , σp ) . (6.109)
p

Proof. As N is fixed, it is obvious that pN (γ) exists. A pretty proof of (6.109)


is as follows. Consider δ > 0, i.i.d. copies (θj )j≥1 of θ, sets {i(j, 1), . . . , i(j, p)}
that are independent uniformly distributed over the subsets of {1, . . . , N } of
cardinality p, and define

− HN δ
(σ) = θj (σi(j,1) , . . . , σi(j,p) ) , (6.110)
j≤u

where u is a Poisson r.v. of mean N δ/p. All the sources of randomness in


this formula are independent of each other and of the randomness in HN . In
δ
distribution, HN (σ) + HN (σ) is the Hamiltonian of an N -spin system with
parameter γ + δ, so that

pN (γ + δ) − pN (δ) 1  δ

= E log exp(−HN (σ)) . (6.111)
δ Nδ
6.4 The Replica-Symmetric Solution 357
 
When u = 0, we have HN δ
≡ 0 so that log exp(−HN δ
(σ)) = 0. For very
small δ, the probability that u = 1 is at the first order in δ equal to N δ/p.
The contribution of this case to the right-hand side of (6.111) is, by symmetry
among sites,
1   1
E log exp θ1 (σi(1,1) , . . . , σi(1,p) ) = E log exp θ(σ1 , . . . , σp ) .
p p
The contribution of the case u > 1 is of second order in δ, so that taking the
limit in (6.111) as δ → 0 yields (6.109). 


Lemma 6.4.12. Recalling that X = (Xi )i≥1 where Xi are i.i.d. of law μγ
we have  
 
pN (γ) − 1 E log exp θ(σ1 , . . . , σp )  ≤ K . (6.112)
 p X
N
Proof. From Lemma 6.4.11 we see that it suffices to prove that
  K
E log exp θ(σ1 , . . . , σp ) − E log exp θ(σ1 , . . . , σp ) ≤ . (6.113)
X N
Let us denote by E0 expectation in the randomness of · (but not in θ), and
let S = sup |θ|. It follows from Theorem 6.2.2 (used as in Proposition 6.2.7)
that
   K
E0  exp θ(σ1 , . . . , σp ) − exp θ(σ11 , . . . , σpp )  ≤ exp S .
N
Here and below, the number K depends only on p and γ0 , but not on S or
N . Now  
exp θ(σ11 , . . . , σpp ) = exp θ(σ1 , . . . , σp ) Y ,
where Y = ( σ1 , . . . , σp ). Next, since
 
 ∂ 
 
 ∂xi exp θ(σ1 , . . . , σp ) x  ≤ exp S ,

considering (as provided by Theorem 6.4.1) a joint realization of the sequences


(X, Y) with E0 |X − σ | ≤ K/N for  ≤ p, we obtain as in Section 6.3 that
  K
E0  exp θ(σ1 , . . . , σp ) X − exp θ(σ1 , . . . , σp ) Y
≤ exp S .
N
Combining the previous estimates yields
  K
E0  exp θ(σ1 , . . . , σp ) − exp θ(σ1 , . . . , σp ) X
≤ exp S .
N
Finally for x, y > 0 we have
|x − y|
|log x − log y| ≤
min(x, y)
358 6. The Diluted SK Model and the K-Sat Problem

so that
  K
E0 log exp θ(σ1 , . . . , σp ) − log exp θ(σ1 , . . . , σp ) X
≤ exp 2S ,
N
and (6.113) by taking expectation in the randomness of θ. 

Proof of Lemma 6.4.10. We observe that
γ
pN −1 (γ) − pN −1 (γ− ) = pN −1 (t)dt .
γ−

Combining with Lemma 6.4.12 and Lemma 6.3.4 implies


 
 1  K
γ− ≤ t ≤ γ ⇒ pN −1 (t) − log exp θ(σ1 , . . . , σp ) 
X ≤
p N
and we conclude using that
   
N −p p−1
γ − γ− = γ 1 − =γ . 

N −1 N −1

Proof of Lemma 6.4.9. Let us denote by · − an average for the Gibbs


measure of an (N − 1)-spin system with Hamiltonian (6.13). We recall that
we can write in distribution
D

−HN (σ) = −HN −1 (ρ) + θj (σi(j,1) , . . . , σi(j,p−1) , σN ) ,
j≤r

where (θj )j≥1 are independent distributed like θ, where r is a Poisson r.v.
of expectation γ and where the sets {i(j, 1), . . . , i(j, p − 1)} are uniformly
distributed over the subsets of {1, . . . , N − 1} of cardinality p − 1. All these
randomnesses, as well as the randomness of HN −1 are globally independent.
Thus the identity
 
E log exp(−HN (σ)) = E log exp(−HN −1 (ρ))
σ ρ
+ log 2 + E log Av E − (6.114)

holds, where

E = E(ρ, ε) = exp θj (σi(j,1) , . . . , σi(j,p−1) , ε) .
j≤r

The term log 2 occurs from the identity a(1) + a(−1) = 2Av a(ε). Moreover
(6.114) implies the equality

N pN (γ) − (N − 1)pN −1 (γ− ) = log 2 + E log Av E − .

Thus (6.107) boils down to the fact that


6.5 The Franz-Leone Bound 359
  K
E log Av E − E log Av Er ≤ . (6.115)
− X
N
The reason why the left-hand side is small should be obvious, and the argu-
ments have already been used in the proof of Lemma 6.4.12. Indeed, it follows
from Theorems 6.2.2 and 6.4.1 that if F is a function on the (N − 1)-spin
system that depends only on k spins, the law of the r.v. F − is nearly that of
F Y where Yi are i.i.d. r.v.s of law μ− = μγ− (which is nearly μγ ). The work
consists in showing that the bound in (6.115) is actually in K/N . Writing the
full details is a bit tedious, but completely straightforward. We do not give
these details, since the exact rate in (6.107) will never be used. As we shall
soon see, all we need in (6.106) is a bound that goes to 0 as N → ∞. 


Theorem 6.4.13. Under (6.62) and (6.66)we have in fact


K
|pN (γ) − p(γ)| ≤ . (6.116)
N
Proof. It follows from (6.112) that the functions pN (γ) converge uni-
formly over the interval [0, γ0 ]. On the other hand, Theorem 6.4.8 shows
that p(γ) = lim pN (γ). Thus p(γ) has a derivative p (γ) = limN →∞ pN (γ), so
that (6.112) means that |pN (γ) − p (γ)| ≤ K/N , from which (6.116) follows
by integration. 

Comment. In this argument we have used (6.106) only to prove that
1
p (γ) = E log exp θ(σ1 , . . . , σp ) X .
p
One would certainly wish to find a simple direct proof of this fact from the
definition of (6.105). A complicated proof can be found in [56], Proposi-
tion 7.4.9.

6.5 The Franz-Leone Bound


In the previous section we showed that, under (6.62), the value of pN (γ) is
nearly given by the value (6.105). In the present section we prove a remarkable
fact. If the function θ is nice enough, one can bound pN (γ) by a quantity
similar to (6.105) for all values of γ. Hopefully this bound can be considered
as a first step towards the very difficult problem of understanding the present
model without a high-temperature condition. It is in essence a version of
Guerra’s replica-symmetric bound of Theorem 1.3.7 adapted to the present
setting.
We make the following assumptions on the random function θ. We assume
that there exists a random function f : {−1, 1} → R such that

exp θ(σ1 , . . . , σp ) = a(1 + bf1 (σ1 ) · · · fp (σp )) , (6.117)


360 6. The Diluted SK Model and the K-Sat Problem

where f1 , . . . , fp are independent copies of f , b is a r.v. independent of


f1 , . . . , fp that satisfies the condition

∀n ≥ 1, E(−b)n ≥ 0 , (6.118)

and a is any r.v. Of course (6.118) is equivalent to saying that Eb2k+1 ≤ 0


for k ≥ 0. We also assume two further conditions:

|bf1 (σ1 ) · · · fp (σp )| ≤ 1 a.e., (6.119)

and
either f ≥ 0 or p is even. (6.120)
Let us consider two examples where these conditions are satisfied. First,
let
θ(σ1 , . . . , σp ) = βJσ1 · · · σp ,
where J is a symmetric r.v. Then (6.117) holds for a = ch(βJ), b = th(βJ),
f (σ) = σ, (6.118) holds by symmetry and (6.120) holds when p is even.
Second, let
 (1 + ηj σj )
θ(σ1 , . . . , σp ) = −β ,
2
j≤p

where ηi are independent random signs. This is exactly the Hamiltonian


relevant to the K-sat problem (6.2). We observe that for x ∈ {0, 1} we have
the identity exp(−βx) = 1 + (e−β − 1)x. Let us set fj (σ) = (1 + ηj σ)/2 ∈
{0, 1}. Since θ(σ1 , . . . , σp ) = −βx for x = f1 (σ1 ) · · · fp (σp ) ∈ {0, 1} we see
that (6.117) holds for a = 1, b = e−β − 1 and fj (σ) = (1 + ηj σ)/2; (6.118)
holds since b < 0, and (6.120) holds since f ≥ 0.
Given a probability measure μ on [−1, 1], consider an i.i.d. sequence X
distributed like μ, and let us denote by p(γ, μ) the right-hand side of (6.105).
(Thus, under (6.62), μγ is well defined and p(γ) = p(γ, μγ )).

Theorem 6.5.1. Conditions (6.117) to (6.119) imply

1  Kγ
∀γ, ∀μ, pN (γ) = E log exp(−HN (σ)) ≤ p(γ, μ) + , (6.121)
N σ
N

where K does not depend on N or γ.

Let us introduce for ε = ±1 the r.v.

U (ε) = log exp θ(σ1 , . . . , σp−1 , ε) X ,

and let us consider independent copies (Ui,s (1), Ui,s (−1))i,s≥1 of the pair
(U (1), U (−1)).
6.5 The Franz-Leone Bound 361

Exercise 6.5.2. As a motivation for the introduction of thequantity U


prove that if we consider the 1-spin system with Hamiltonian − s≤r Ui,s (ε),
the average of ε for this Hamiltonian is equal, in distribution, to the quantity
(6.51). (Hence, it is distributed like T (μ).)

For 0 ≤ t ≤ 1 we consider a Poisson r.v. Mt of mean αtN = γtN/p,


and independent Poisson r.v.s ri,t of mean γ(1 − t), independent of Mt . We
consider the Hamiltonian
  
− HN,t (σ) = θk (σi(k,1) , . . . , σi(k,p) ) + Ui,s (σi ) , (6.122)
k≤Mt i≤N s≤ri,t

where as usual the different sources of randomness are independent of each


other, and we set
1 
ϕ(t) = E log exp(−HN,t (σ)) .
N σ

Proposition 6.5.3. We have

γ(p − 1) Kγ
ϕ (t) ≤ − E log exp θ(σ1 , . . . , σp ) X + . (6.123)
p N
This is of course the key fact.
Proof of Theorem 6.5.1. We deduce from (6.5.3) that

γ(p − 1) Kγ
pN (γ) = ϕ(1) ≤ ϕ(0) − E log exp θ(σ1 , . . . , σp ) X + .
p N

Therefore to prove Theorem 6.5.1 it suffices to show that ϕ(0) = log 2 +


E log Av Er X . For t = 0 the spins are decoupled, so this reduces to the case
N = 1. Since r1,0 has the same distribution as r, we simply observe that if
(Xs )s≤r are independent copies of X, the quantity

exp θs (σ1 , . . . , σp−1 , ε) Xs
s≤r

has the same distribution as the quantity Av Er X . Therefore,


   
E log exp U1,s (ε) = E log exp θs (σ1 , . . . , σp−1 , ε) Xs
ε=±1 s≤r ε=±1 s≤r

= log 2 + E log Av Er X ,

and this completes the proof of Theorem 6.5.1. 



We now prepare for the proof of (6.5.3).
362 6. The Diluted SK Model and the K-Sat Problem

Lemma 6.5.4. We have


 
γ 1
N
 
ϕ (t) ≤ E log exp θ(σi1 , . . . , σip )
p Np i1 ,...,ip =1

p  Kγ
− E log exp U (σi ) + . (6.124)
N N
i≤N

Here, as in the rest of the section, we denote by · an average for the Gibbs
measure with Hamiltonian (6.122), keeping the dependence on t implicit. On
the other hand, the number K in (6.124) is of course independent of t.
Proof. In ϕ (t) there are terms coming from the dependence on t of Mt and
terms coming from the dependence on t of ri,t .
As shown by Lemma 6.4.11, the term created by the dependence of Mt
on t is

γ γ 
N
  γK
E log exp θ(σ1 , . . . , σp ) ≤ E log exp θ(σi1 , . . . , σip ) + ,
p pN p i1 ,...,ip =1
N

because all the terms where the indices i1 , . . . , ip are distinct are equal. The
same argument as in Lemma 6.4.11 shows that the term created by the de-
pendence of ri,t on t is −(γ/N )E log exp U (σi ) . 

Thus, we have reduced the proof of Proposition 6.5.3 (hence, of Theo-
rem 6.5.1) to the following:
Lemma 6.5.5. We have

N
1   p 
p
E log exp θ(σi1 , . . . , σip ) − E log exp U (σi )
i1 ,...,ip =1
N N
i≤N
+(p − 1)E log exp θ(σ1 , . . . , σp ) X ≤0. (6.125)

The proof is not really difficult, but it must have been quite another matter
when Franz and Leone discovered it.
Proof. We will get rid of the annoying logarithms by power expansion,
 xn
log(1 + x) = − (−1)n
n
n≥1

for |x| < 1. Let us denote by E0 the expectation in the randomness of X and
of the functions fj of (6.117) only. Let us define
n
Cn = E0 f (σ1 ) (6.126)
1 
X
1 n
Aj,n = Aj,n (σ , . . . , σ ) = fj (σi ) (6.127)
N
i≤N ≤n
6.5 The Franz-Leone Bound 363

Bn = Bn (σ 1 , . . . , σ n ) = E0 Aj,n . (6.128)

We will prove that the left-hand side quantity (6.125) is equal to


∞
E(−b)n  p 
− E Bn − pBn Cnp−1 + (p − 1)Cnp . (6.129)
n=1
n

The function x → xp is convex on R+ , and when p is even it is convex on


R. Therefore xp − pxy p−1 + (p − 1)y p ≥ 0 for all x, y ∈ R+ , and when p is
even this is true for all x, y ∈ R. Now (6.120) shows that either Bn ≥ 0 and
Cn ≥ 0 or p is even, and thus it holds that Bnp − pBn Cnp−1 + (p − 1)Cnp ≥ 0.
Consequently the right-hand side of (6.129) is ≤ 0 because E(−b)n ≥ 0 by
(6.118).
By (6.117) we have

exp θ(σ1 , . . . , σp ) = a(1 + b fj (σj )) , (6.130)
j≤p

so that, taking the average · X and logarithm, and using (6.119) to allow
the power expansion in the second line,
  
log exp θ(σ1 , . . . , σp ) X = log a + log 1 + b fj (σj )
j≤p X

∞
(−b)n 
n
= log a − fj (σj ) . (6.131)
n=1
n X
j≤p

Now, by independence
 n  n
E0 fj (σj ) = E0 fj (σj ) X = Cnp
j≤p X j≤p

so that
∞
(−b)n p
E0 log exp θ(σ1 , . . . , σp ) X = E0 log a − Cn .
n=1
n

As in (6.131),

1   
N

p
log exp θ(σi1 , . . . , σip )
N i ,...,i
1 p


 
 (−b)n 1 N  n
= log a − p
fj (σij ) .
n=1
n N i ,...,i =1
1 p j≤p

Using replicas, we get


364 6. The Diluted SK Model and the K-Sat Problem
 n 
fj (σij ) = fj (σij ) ,
j≤p ≤n j≤p

so that, using (6.127) in the second line yields

1 
N  n
1 
N  n
fj (σij ) = fj (σij )
Np i1 ,...,ip =1
Np i1 ,...,ip =1 ≤n j≤p
j≤p

= Aj,n .
j≤p

Now from (6.128) and independence we get E0 j≤p Aj,n = Bnp , so that

 ∞
1
N
  (−b)n
E0 log exp θ(σi1 , . . . , σip ) = E0 log a − Bnp .
Np i1 ,...,ip =1 n=1
n

In a similar manner, recalling the definition of U , one shows that



1   (−b)n  
E0 log exp U (σi ) = E0 log a − Bn Cnp−1 ,
N n=1
n
i≤n

and this concludes the proof of Lemma 6.5.5. 




6.6 Continuous Spins

In this section we consider the situation of the Hamiltonian (6.4) when the
spins are real numbers. There are two motivations for this. First, the “main
parameter” of the system is no longer “a function” but rather “a random
function”. This is both a completely natural and fun situation. Second, this
will let us demonstrate in the next section the power of the convexity tools
we developed in Chapters 3 and 4. We consider a (Borel) function θ on Rp ,
i.i.d. copies (θk )k≥1 of θ, and for σ ∈ RN the quantity HN (σ) given by (6.4).
We consider a given probability measure η on R, and we lighten notation by
writing ηN for η ⊗N , the corresponding product measure on RN . The Gibbs
measure is now defined as the random probability measure on RN which has
a density with respect to ηN that is proportional to exp(−HN (σ)). Let us fix
an integer k and, for large N , let us try to guess the law of (σ1 , . . . , σk ) under
Gibbs’ measure. This is a random probability measure on Rk . We expect that
it has a density Yk,N with respect to ηk = η ⊗k . What is the simplest possible
structure? It would be nice if we had

Yk,N (σ1 , . . . , σk )  X1 (σ1 ) · · · Xk (σk ) ,


6.6 Continuous Spins 365

where X1 , . . . , Xk ∈ L1 (η) are random elements of L1 (η), which are proba-


bility densities, i.e. Xi ∈ D, where
" %
D = X ∈ L1 (η) ; X ≥ 0 ; Xdη = 1 . (6.132)

The nicest possible probabilistic structure would be that these random ele-
ments X1 , . . . , Xk be i.i.d, with a common law μ, a probability measure on the
metric space D. This law μ is the central object, the “main parameter”. (If
we wish, we can equivalently think of μ as the law of a random element of D.)
The case of Ising spins is simply the situation where η({1}) = η({−1}) = 1/2,
in which case

D = {(x(−1), x(1)) ; x(1), x(−1) ≥ 0 , x(1) + x(−1) = 2}

and

D can be identified with the interval [−1, 1]


by the map (x(−1), x(1)) → (x(1) − x(−1))/2 . (6.133)

Thus, in that case, as we have seen, the main parameter is a probability


measure on the interval [−1, 1].
We will assume in this section that θ is uniformly bounded, i.e.

S= sup |θ(σ1 , . . . , σp )| < ∞ (6.134)


σ1 ,...,σp ∈R

for a certain r.v. S. Of course (Sk )k≥1 denote i.i.d. copies of S with Sk =
sup |θk (σ1 , . . . , σp )|. Whether or how this boundedness condition can be weak-
ened remains to be investigated. Overall, once one gets used to the higher
level of abstraction necessary compared with the case of Ising spins, the proofs
are really not more difficult in the continuous case. In the present section we
will control the model under a high-temperature condition and the extension
of the methods of the previous sections to this setting is really an exercise.
The real point of this exercise is that in the next section, we will succeed
to partly control the model without assuming a high-temperature condition
but assuming instead the concavity of θ, a result very much in the spirit of
Section 3.1.
Our first task is to construct the “order parameter” μ = μγ . We keep the
notation (6.49), that is we write

Er = Er (σ, ε) = exp θj (σ(j−1)(p−1)+1 , . . . , σj(p−1) , ε) ,
1≤j≤r

where now σi and ε are real numbers.


Given a sequence X = (Xi )i≥1 of elements of D, for a function f of
σ1 , . . . , σN , we define (and this will be fundamental)
366 6. The Diluted SK Model and the K-Sat Problem

f X = f (σ1 , . . . , σN )X1 (σ1 ) · · · XN (σN )dη(σ1 ) · · · dη(σN ) , (6.135)

that is, we integrate the generic k-th coordinate with respect to η after making
the change of density Xk .
For consistency with the notation of the previous section, for a function
h(ε) we write
Avh = h(ε)dη(ε) . (6.136)

Thus
AvEr = Er (σ, ε)dη(ε)

is a function of σ only, and AvEr X means that we integrate in σ1 , . . . , σN ,


as in (6.135). We will also need the quantity Er X , where we integrate in
σ1 , . . . , σN as in (6.135), but we do not integrate this factor in ε. Thus Er X
is a function of ε only, and by Fubini’s theorem we have Av Er X = AvEr X .
In particular, the function f of ε given by

Er X
(6.137)
AvEr X

is such that f ≥ 0 and Avf = 1, i.e. f ∈ D.


Consider a probability measure μ on D, and (Xi )i≥1 a sequence of el-
ements of D that is i.i.d. of law μ. We denote by T (μ) the law (in D) of
the random element (6.137) when X = (Xi )i≥1 . When the spins take only
the values ±1, and provided we then perform the identification (6.133), this
coincides with the definition (6.51).

Theorem 6.6.1. Assuming (6.52), i.e. 4γp(ES exp 2S) ≤ 1, there exists a
unique probability measure μ on D such that μ = T (μ).

On D, the natural distance is induced by the L1 norm relative to η, i.e.


for x, y ∈ D
d(x, y) = x − y1 = |x(ε) − y(ε)|dη(ε) . (6.138)

The key to prove Theorem 6.6.1 is the following estimate, where we


consider a pair (X, Y ) of random elements of D, and independent copies
(Xi , Yi )i≥1 of this pair. Let X = (Xi )i≥1 and Y = (Yi )i≥1 .

Lemma 6.6.2. We have


' '  
' Er X Er Y '
' − ' ≤2 Sj exp 2Sj Xi − Yi 1 .
' AvEr X '
AvEr Y 1
j≤r (j−1)(p−1)<i≤j(p−1)
(6.139)
6.6 Continuous Spins 367

Once this estimate has been obtained we proceed exactly as in the proof
of Theorem 6.3.1. Namely, if μ and μ are the laws of X and Y respectively,
and since the law of the quantity (6.137) is T (μ), the expected value of
the left-hand side of (6.139) is an upper bound for the transportation-cost
distance d(T (μ), T (μ )) associated to the distance d of (6.138) (by the very
definition of the transportation-cost distance). Thus taking expectation in
(6.139) implies that

d(T (μ), T (μ )) ≤ 2γp(ES exp 2S)EX − Y 1 .

Since this is true for any choice of X and Y with laws μ and μ respectively,
we obtained that

d(T (μ), T (μ )) ≤ 2γp(ES exp 2S)d(μ, μ ) ,

so that under (6.52) the map T is a contraction for the transportation-cost


distance. This completes the proof of Theorem 6.6.1, modulo the fact that
the set of probability measures on a complete metric space is itself a complete
metric space when provided with the transportation-cost distance.
Proof of Lemma 6.6.2. It is essentially identical to the proof of (6.57),
although we find it convenient to write it a bit differently “replacing Yj by
Xj one at a time”. Let

X(i) = (X1 , . . . , Xi , Yi+1 , Yi+2 . . .) .

To ease notation we write


·i= · X(i) ,
so that
Er X Er r(p−1) Er Y Er 0
= ; = ,
AvEr X AvEr r(p−1) AvEr Y AvEr 0
and to prove (6.136) it suffices to show that if (j − 1)(p − 1) < i ≤ j(p − 1)
we have
' '
' Er i Er i−1 '
' '
' AvEr i − AvEr i−1 ' ≤ (2Sj exp 2Sj )Xi − Yi 1 . (6.140)
1

We bound the left-hand side by I + II, where


' '
' Er i − Er i−1 '
I=' ' ' (6.141)
AvEr i '
' 1
'
' Er i−1 ( AvEr i − AvEr i−1 ) '
II = '
'
' .
' (6.142)
AvEr i AvEr i−1 1

Now we observe that to bound both terms by Sj exp 2Sj Xi − Yi 1 it suffices
to prove that
368 6. The Diluted SK Model and the K-Sat Problem

| Er i − Er i−1 | ≤ Sj exp 2Sj Xi − Yi 1 Er i , (6.143)

(where both sides are functions of ε). Indeed to bound the term I using (6.143)
we observe that ' '
' Er i ' Er i
' '
' AvEr i ' = Av AvEr i = 1 (6.144)
1

and to bound the term II using (6.143) we observe that

| AvEr i − AvEr i−1 | ≤ Av| Er i − Er i−1 |


≤ Sj exp 2Sj Xi − Yi 1 Av Er i

and we use (6.144) again (for i − 1 rather than i).


Thus it suffices to prove (6.143). For this we write Er = E E , where

E = exp θj (σ(j−1)(p−1)+1 , . . . , σj(p−1) , ε) ,

and where E does not depend on σi . Therefore

A := E i = E i−1 .

Since E and E depend on different sets of coordinates, we have

Er i = E i E i =A E i ; Er i−1 = E i−1 E i−1 =A E i−1 .

Let us define B = B(σi , ε) the quantity obtained by integrating E in each


spin σk , k < i, with respect to η, and change of density Xk and each spin σk ,
k > i with respect to η with change of density Yk . Integrating first in the σk
for k = i we obtain

E i = BXi (σi )dη(σi ) ; E i−1 = BYi (σi )dη(σi ) ,

and therefore

Er i =A BXi (σi )dη(σi ) ; Er i−1 =A BYi (σi )dη(σi ) . (6.145)

Consequently,

Er i − Er i−1 =A B(Xi (σi ) − Yi (σi ))dη(σi )

=A (B − 1)(Xi (σi ) − Yi (σi ))dη(σi ) (6.146)


 
because Xi dη = Yi dη = 1. Now, since |θj | ≤ Sj , Jensen’s inequality shows
that | log B| ≤ Sj . Using that | exp x − 1| ≤ |x| exp |x| for x = log B we obtain
that |B − 1| ≤ Sj exp Sj . Therefore (6.146) implies
6.6 Continuous Spins 369

| Er i − Er i−1 | ≤ ASj exp Sj Xi − Yi 1 . (6.147)

Finally
 since exp(−Sj ) ≤ E we have exp(−Sj ) ≤ B, so that exp(−Sj ) ≤
BXi (σi )dη(σi ). The first part of (6.145) then implies that A exp(−Sj ) ≤
Er i and combining with (6.147) this finishes the proof of (6.143) and
Lemma 6.6.2. 

A suitable extension of Theorem 6.2.2 will be crucial to the study of the
present model. As in the case of Theorem 6.6.1, once we have found the
proper setting, the proof is not any harder than in the case of Ising spins.
Let us consider a probability space (X , λ), an integer n, and a family
(fω )ω∈X of functions on (RN )n . We assume that there exists i ≤ N such that
for each ω we have
fω ◦ Ti = −fω , (6.148)
where Ti is defined as in Section 6.2 i.e. Ti exchanges the ith components σi1
and σi2 of the first two replicas and leaves all the other components unchanged.
Consider another function f ≥ 0 on (RN )n . We assume that f and the
functions fω depend on k coordinates (of course what we mean here is that
they depend on the same k coordinates whatever the choice of ω). We assume
that for a certain number Q, we have

|fω |dλ(ω) ≤ Qf . (6.149)

Theorem 6.6.3. Under (6.10), and provided that γ ≤ γ0 , with the previous
notation we have 
| fω |dλ(ω) K0 kQ
E ≤ . (6.150)
f N
Proof. The fundamental identity (6.20):

Av f E −
f =
Av E −

remains true if we define Av as in (6.136). We then copy the proof of Theorem


6.2.2 “by replacing everywhere f by the average of fω in ω” as follows. First,
we define the property C(N, γ0 , B, B ∗ ) by requiring that under the conditions
of Theorem 6.6.3, rather than (6.9):
 
 f  Q(kB + B ∗ )
E  ≤ ,
f  N
we get instead 
| fω |dλ(ω) Q(kB + B ∗ )
E ≤ .
f N
Rather than (6.32) we now define
370 6. The Diluted SK Model and the K-Sat Problem
 
fω,s = (Av fω E) ◦ Uiu − (Av fω E) ◦ Uiu .
u≤s−1 u≤s

We replace (6.34) by
  
|fω,s |dλ(ω) ≤ 4QSv exp 4 Su Av f E ,
u≤r

and again the left-hand side of (6.39) by



| fω |dλ(ω)
E . 

f

We now describe the structure of the Gibbs measure, under a “high-


temperature” condition.
Theorem 6.6.4. There exists a number K1 (p) such that whenever
K1 (p)(γ0 + γ03 )ES exp 4S ≤ 1 , (6.151)
if γ ≤ γ0 , given any integer k, we can find i.i.d. random elements X1 , . . . , Xk
in D of law μ such that

E |YN,k (σ1 , . . . , σk ) − X1 (σ1 ) · · · Xk (σk )|dη(σ1 ) · · · dη(σk )

k3 K(p, γ0 )
≤ E exp 2S , (6.152)
N
where YN,k denotes the density with respect to ηk = η ⊗k of the law of
σ1 , . . . , σk under Gibbs’ measure, and μ is as in Theorem 6.6.1 and where
K(p, γ0 ) depends only on p and γ0 .
-
It is convenient to denote by ≤k X the function

(σ1 , . . . , σk ) → X (σ ) ,
≤k
-
so that the left-hand side of (6.152) is simply EYN,k − ≤k X 1 .
Overall the principle of the proof is very similar to that of the proof of
Theorem 6.4.1, but the induction hypothesis will not be based on (6.152).
The starting point of the proof is the fundamental cavity formula (6.72),
where Av now means that σN −k+1 , . . . , σN are averaged independently with
respect to η. When f is a function of k variables, this formula implies that
Avf (σN −k+1 , . . . , σN )E −
f (σN −k+1 , . . . , σN ) =
AvE −
 
E −
= Av f (σN −k+1 , . . . , σN ) . (6.153)
AvE −
6.6 Continuous Spins 371

The quantity E − is a function of σN −k+1 , . . . , σN only since (σ1 , . . . , σN −k )


is averaged for · − , and (6.153) means that the density with respect to ηk
of the law of σN −k+1 , . . . , σN under Gibbs’ measure is precisely the function

E −
. (6.154)
AvE −

Before deciding how to start the proof of Theorem 6.6.4, we will first take
full advantage of Theorem 6.6.3. For a function f on RN −k we denote
N −k
f • = f (σ11 , σ22 , . . . , σN −k ) − ,

that is, we average every coordinate in a different replica. We recall the set
Ω of (6.79).

Proposition 6.6.5. We have


 
 E − E •  k 2 K

E1Ω c Av  − ≤ E exp 2S . (6.155)
AvE − AvE •  N

Here and in the sequel, K denotes a constant that depends on p and γ0


only. This statement approximates the true density (6.154) by a quantity
which will be much simpler to work with, since it is defined via integration
for the product measure · • .
The proof of Proposition 6.6.5 greatly resembles the proof of Proposition
6.2.7. Let us state the basic principle behind this proof. It will reveal the
purpose of condition (6.149), that might have remained a little bit mysterious.

Lemma 6.6.6. For j ≤ r consider sets Ij ⊂ {1, . . . , N } with cardIj = p,


cardIj ∩ {N − k + 1, . . . , N } = 1, and assume that

j = j ⇒ Ij ∩ Ij  ⊂ {N − k + 1, . . . , N } ,

or, equivalently, that the sets Ij \{1, . . . , N −k} for j ≤ r are all disjoint. Con-
sider functions Wj (σ), depending only on the coordinates in Ij , and assume
that supσ |Wj (σ)| ≤ Sj . Consider

E = exp Wj (σ) .
j≤r

Then we have
 
 E − E •  4K0 k(p − 1) 
E− Av  − ≤ exp 2Sj . (6.156)
AvE − AvE •  N −k
j≤r
372 6. The Diluted SK Model and the K-Sat Problem

Here E− means expectation in the randomness of · − only.


Proof. We “decouple the spins one at a time” for i ≤ N − k, that is, we
write
Ei = E(σ11 , σ22 , . . . , σii , σi+1
1 1
, . . . , σN ),
so that
E − E1 − E • EN −k+1 −
= ; = .
AvE − AvE1 − AvE • AvEN −k+1 −
We bound  
 Ei − Ei−1 − 

E− Av  − . (6.157)
AvEi − AvEi−1 − 
When i belongs to no set Ij this is zero because then Ei = Ei−1 . Suppose
otherwise that i ∈ Ij for a certain j ≤ r. The term (6.157) is bounded by
I + II, where
   
 Ei − Ei−1 −   Ei − Av(Ei − Ei−1 ) − 

I = E− Av   ; 
II = E− Av   .
AvEi −  AvEi − AvEi−1 − 

We first bound the term II. We introduce a “replicated copy” Ei of Ei defined


by
Ei = E(σ1N +1 , σ2N +2 , . . . , σiN +i , σi+1
N +1 N +1
, . . . , σN )
and we write

Ei − Av(Ei − Ei−1 ) − = Ei Av(Ei − Ei−1 ) − .

Exchanging the variables σii and σi1 exchanges Ei and Ei−1 and changes the
sign of the function f = Ei Av(Ei − Ei−1 ). Next we prove the inequality

|Ei − Ei−1 | ≤ (2 exp 2Sj )Ei−1 .

To prove this we observe that E is of the form AB where A does not depend on
the ith coordinate and exp(−Sj ) ≤ B ≤ exp Sj . Thus with obvious notation
|Bi − Bi−1 | ≤ 2 exp Sj ≤ 2 exp 2Sj Bi−1 and since A does not depend on the
ith coordinate we have Ai = Ai−1 and thus

|Ei − Ei−1 | = |Ai Bi − Ai−1 Bi−1 | = Ai−1 |Bi − Bi−1 |


≤ 2 exp 2Sj Ai−1 Bi−1 = (2 exp 2Sj )Ei−1 .

Therefore
|Av(Ei − Ei−1 )| ≤ (2 exp 2Sj )AvEi−1
and
Av|f | ≤ (2 exp 2Sj )AvEi AvEi−1 . (6.158)
Thinking of Av in the left-hand side as averaging over the parameter ω =
(σi )N −k<i≤N,≤N +1 , we see that (6.158) is (6.149) when A = 2 exp 2Sj and
f = AvEi AvEi−1 . Applying (6.150) to the (N −k)-spin system we then obtain
6.6 Continuous Spins 373

K0 k
II ≤ (2 exp 2Sj ) .
N −k
Proceeding similarly we get the same bound for the term I (in a somewhat
simpler manner) and this completes the proof of (6.156). 

Proof of Proposition 6.6.5. We take expected values in (6.156), and we
remember as in the Ising case (i.e. when σi = ±1) that it suffices to consider
the case N ≥ 2k. 

It will be useful to introduce the following random elements V1 , . . . , Vk of
D. (These depend also on N , but the dependence is kept implicit.) The func-
tion V is the density with respect to η of the law of σN −k+ under Gibbs’ mea-
sure. Let us denote by Yk∗ the function (6.154) of σN −k+1 , . . . , σN , which, as
already noted, is the density with respect to ηk of the law of σN −k+1 , . . . , σN
under Gibbs’ measure. Thus V is the th -marginal of Yk∗ , that is, it is ob-
tained by averaging Yk∗ over all σN −k+j for j =  with respect to η.

Proposition 6.6.7. We have


' '
' ∗ . ' Kk 3
'
E'Yk − V ' ≤ E exp 2S . (6.159)
' N
1 ≤k

Moreover, if E is defined as in (6.84), then


' '
' E • ' 2
'
∀ ≤ k , E 'V − ' ≤ Kk E exp 2S . (6.160)
AvE • '1 N
1 1
The L1 norm
- is computed in L (ηk ) in (6.159) and in L (η) in (6.160). The
function ≤k V in (6.159) is of course given by
.  
V (σN −k+1 , . . . , σN ) = V (σN −k+ ) .
≤k 1≤≤k

Proof. Consider the event Ω as in (6.79). Using the L1 -norm notation as in


(6.159), (6.155) means that
' '
' E − E • ' 2
E1Ω c ' − ' ≤ Kk E exp 2S . (6.161)
' AvE − '
AvE • 1 N

When Ω does not occur, we have E = ≤k E , and the quantities E depend
on different coordinates, so that

E •= E • .
≤k

Also, E • depends on σN −k+ but not on σN −k+ for  =  and thus


374 6. The Diluted SK Model and the K-Sat Problem
 
Av E • = Av E • .
≤k ≤k

Therefore
E • 
= U , (6.162)
AvE •
≤k

where
E •
U = .
AvE •
Let us think
of U as a function
- of σN −k+ only, so we can write for consistency
of notation ≤k U = ≤k U . Thus (6.161) means
' '
' ∗ . ' Kk 2
'Yk − U '
E1 Ωc ' ' ≤ N E exp 2S .
1≤k
-
Now Yk∗ − ≤k U 1 ≤ 2, and combining with (6.79) we get
' '
' ∗ . ' Kk 2
'
E'Yk − U '
' ≤ N E exp 2S . (6.163)
1≤k

Now, we have ' '


' ∗ . '
V − U 1 ≤ ' Y
' k − U '
' ,
≤k 1

because the-right-hand side is the average over σN −k+1 , . . . , σN of the quan-


tity |Yk∗ − ≤k U |, and if one averages over σN −k+ for  =  inside the
absolute value rather than outside one gets the left-hand side. Thus (6.160)
follows from (6.163). To deduce (6.159) from (6.163) it suffices to prove that
'. . ' 
' '
' − U '
' V  ' ≤ V − U 1 . (6.164)
≤k ≤k 1 ≤k

This inequality holds whenever V , U ∈ D, and is obvious if “one replaces V


by U one at a time” because

V1 ⊗ · · · ⊗ V ⊗ U+1 ⊗ · · · ⊗ Uk − V1 ⊗ · · · ⊗ V−1 ⊗ U ⊗ · · · ⊗ Uk 1


= V1 ⊗ · · · ⊗ V−1 ⊗ (V − U ) ⊗ U+1 ⊗ · · · ⊗ Uk 1 = V − U 1

since V , U ∈ D for  ≤ k. 



We recall that YN,k denotes the density with respect to ηk of the law of
σ1 , . . . , σk under Gibbs’ measure. Let us denote by Y the density with respect
to η of the law of σ under Gibbs’ measure. We observe that YN,k corresponds
to Yk∗ if we use the coordinates σ1 , . . . , σk rather than σN −k+1 , . . . , σN , and
similarly Y1 , . . . , Yk correspond to V1 , . . . , Vk . Thus (6.159) implies
6.6 Continuous Spins 375
. Kk 3
EYN,k − Y 1 ≤ E exp 2S .
N
Using as in (6.164) that
' '
'. . ' 
' Y − X '
' ≤ Y − X 1 ,
' 
≤k ≤k 1 ≤k

then (6.159) shows that to prove Theorem 6.6.4, the following estimates suf-
fices.

Theorem 6.6.8. Assuming (6.151), if γ ≤ γ0 , given any integer k, we can


find i.i.d. random elements X1 , . . . , Xk in D with law μ such that
 k 3 K(p, γ0 )
E Y − X 1 ≤ E exp 2S . (6.165)
N
≤k

We will prove that statement by induction on N . Denoting by D(N, γ0 , k)


the quantity 
sup inf E Y − X 1 ,
γ≤γ0 X1 ,...,Xk
≤k

one wishes to prove that

k3 K
D(N, γ0 , k) ≤ E exp 2S .
N
For this we relate the N -spin system with the (N − k)-spin system. For this
purpose, the crucial equation is (6.162). The sequence V1 , . . . , Vk is distributed
as (Y1 , . . . , Yk ). Moreover, if for i ≤ N − k we denote by Yi− the density with
respect to η of the law of σi under the Gibbs measure of the (N − k)-spin
system, we have, recalling the notation (6.135)

E • = E Y ,

where Y = (Y1− , . . . , YN−−k ), so that (6.160) implies


' '
 ' E Y ' 3
'
E1Ω c 'V − ' ≤ Kk E exp 2S . (6.166)
AvE Y '1 N
≤k

We can then complete the proof of Theorem 6.6.8 along the same lines as in
Theorem 6.4.1. The functions (E )≤k do not depend on too many spins. We
can use the induction hypothesis and Lemma 6.6.2 to show that we can find a
sequence X = (X1 , . . . , XN −k+1 ) of identically distributed random elements
of D, of law μ− (= μγ− , where γ− is given by (6.74)), so that
376 6. The Diluted SK Model and the K-Sat Problem
 ' '
' E X '
E 1Ω c '
'V −
'
AvE X '1
≤k

is not too large. Then the sequence ( E X / AvE X )≤k is nearly i.i.d. with
law T (μ− ), and hence nearly i.i.d. with law μ. Completing the argument
really amounts to copy the proof of Theorem 6.4.1, so this is best left as an
easy exercise for the motivated reader. There is nothing else to change either
to the proof of Theorem 6.4.13.
We end this section by a challenging technical question. The relevance
of this question might not yet be obvious to the reader, but it will become
clearer in Chapter 8, after we learn how to approach the “spherical model”
through the “Gaussian model”. Let us consider the sphere

SN = {σ ∈ RN ; σ = N } (6.167)

and the uniform probability λN on SN .

Research Problem 6.6.9. Assume that the random function θ is Borel


measurable, but not necessarily continuous. Investigate the regularity prop-
erties of the function
1 
t → ψ(t) = E log exp θk (tσi(k,1) , . . . , tσi(k,p) )dλN (σ) .
N
k≤M

In particular, if M is a proportion on N , M = αN , is it true that


√ for large
N the difference ψ(t) − ψ(1) becomes small whenever |t − 1| ≤ 1/ N ?

The situation here is that, even though each of the individual functions
t → θ(tσi(k,1) , . . . , tσi(k,p) ) can be wildly discontinuous, these discontinuities
should be smoothed out by the integration for λN . Even the case θ is not
random and p = 1 does not seem obvious.

6.7 The Power of Convexity

Consider a random convex set V of Rp , and (Vk )k≥1 an i.i.d. sequence of


random convex sets distributed like V . Consider random integers i(k, 1) <
. . . < i(k, p) such that the sets Ik = {i(k, 1), . . . , i(k, p)} are i.i.d. uniformly
distributed over the subsets of {1, . . . , N } of cardinality p. Consider the i.i.d.
sequence of random convex subsets Uk of RN given by

σ ∈ Uk ⇔ (σi(k,1) , . . . , σi(k,p) ) ∈ Vk .

We recall that λN is the uniform probability measure on the sphere SN ,


and that M is a Poisson r.v. of expectation αN .
6.7 The Power of Convexity 377

Research Problem 6.7.1. (Level 3) Prove that, given p, V and α, there is


a number a∗ such that for N large
  
1
log λN SN ∩ Uk  a∗ (6.168)
N
k≤M

with overwhelming probability, and compute a∗ .

The value a∗ = −∞ is permitted; in that case/ we expect that given any


number a > 0, for N large we have λN (SN ∩ k≤M Uk ) ≤ exp(−aN ) with
overwhelming probability. Problem 6.7.1 makes sense even if the random
set V is not convex, but we fear that this case could be considerably more
difficult.
Consider a number κ > 0, and the probability measure η (= ηκ ) on R
of density κ/π exp(−κx2 ) with respect to Lebesgue measure. After reading
Chapter 8, the reader will be convinced that a good idea to approach Problem
6.7.1 is to first study the following, which in any case is every bit as natural
and appealing as Problem 6.7.1.
Research Problem 6.7.2. (Level 3) Prove that, given p, V, α and κ there
is a number a∗ such for large N we have
 
1 ⊗N
log η Uk  a∗ (6.169)
N
k≤M

with overwhelming probability, and compute a∗ .

Here again, the value a∗ = −∞ is permitted.


Consider a random concave function θ ≤ 0 on Rp and assume that

V = {θ = 0} .

Then, denoting by θ1 , . . . , θM i.i.d. copies of θ, we have


    
η ⊗N Uk = lim exp β θk (σi(k,1) , . . . , σi(k,p) ) dη ⊗N (σ) .
β→∞
k≤M k≤M
(6.170)
Therefore, to prove (6.169) it should be relevant to consider Hamiltonians of
the type 
− HN (σ) = θk (σi(k,1) , . . . , σi(k,p) ) , (6.171)
k≤M

where θ1 , . . . , θk are i.i.d. copies of a random concave function θ ≤ 0. These


Hamiltonians never satisfy a condition supσ1 ,...,σp ∈R |θ(σ1 , . . . , σp )| < ∞ such
as (6.134) unless θ ≡ 0, and we cannot use the results of the previous sec-
tions. The purpose of the present section is to show that certain methods
378 6. The Diluted SK Model and the K-Sat Problem

we have already used in Chapter 4 allow a significant step in the study of


the Hamiltonians (6.171). In particular we will “prove in the limit the funda-
mental self-consistency equation μ = T (μ)”. We remind the reader that we
assume
θ is concave, θ ≤ 0 . (6.172)
We will also assume that there exists a non random number A (possibly very
large) such that θ satisfies the following Lipschitz condition:

∀σ1 , . . . , σp , σ1 , . . . , σp , |θ(σ1 , . . . , σp ) − θ(σ1 , . . . , σp )| ≤ A |σj − σj | .
j≤p
(6.173)
The Gibbs measure is defined as usual as the probability measure on RN
with density with respect to η ⊗N that is proportional to exp(−HN (σ)), and
· denotes an average for this Gibbs measure.

Lemma 6.7.3. There exists a number K (depending on p, A, α and κ) such


that we have
|σ1 |
E exp ≤K. (6.174)
K

Of course it would be nice if we could improve (6.174) into E exp(σ12 /K) ≤


K.

Lemma 6.7.4. The density Y with respect to η of the law of σ1 under Gibbs’
measure satisfies

∀x, y ∈ R , Y (y) ≤ Y (x) exp rA|y − x| (6.175)

where r = card{k ≤ M ; i(k, 1) = 1}.

This lemma is purely deterministic, and is true for any realization of the
disorder. It is good however to observe right away that r is a Poisson r.v.
with Er = γ, where as usual γ = αp and EM = αN .
Proof. Since the density of Gibbs’ measure with respect to η ⊗N is propor-
tional to exp(−HN (σ)), the function Y (σ1 ) is proportional to

f (σ1 ) = exp(−HN (σ))dη(σ2 ) · · · dη(σN ) .

We observe now that the Hamiltonian HN depends on σ1 only through the


terms θk (σi(k,1) , . . . , σi(k,p) ) for which i(k, 1) = 1 so (6.173) implies that
f (σ1 ) ≤ f (σ1 ) exp rA|σ1 − σ1 | and this in turn implies (6.175). 

Proof of Lemma 6.7.3. We use (6.175) to obtain

Y (0) exp(−rA|x|) ≤ Y (x) ≤ Y (0) exp rA|x| . (6.176)


6.7 The Power of Convexity 379

Thus, using Jensen’s inequality:


 
1= Y dη ≥ Y (0) exp(−rA|x|)dη(x) ≥ Y (0) exp −rA |x|dη(x)
 
LrA
≥ Y (0) exp − √
κ
≥ Y (0) exp(−rK) ,

where, throughout the proof K denotes a number depending on A, κ and p


only, that may vary from time to time. Also,
 κ  κ
exp σ12 = exp x2 Y (x)dη(x)
2 2
κx2
≤ Y (0) exp exp rA|x|dη(x)
2
 
√ κx2
= Y (0) κπ exp − exp rA|x|dx
2
≤ KY (0) exp Kr2

by a standard computation, or simply using that −κx2 /2+rA|x| ≤ −κx2 /4+


Kx2 . Combining with (6.176) yields
 κ 
exp σ12 ≤ K exp Kr2 (6.177)
2
so that Markov’s inequality implies
 
κy 2
1{|σ1 |≥y} ≤ K exp Kr −
2
.
2

Using this for y = K x, we obtain

r≤x ⇒ 1{|σ1 |≥Kx} ≤ K exp(−x2 ) .

Now, since r is a Poisson r.v. with Er = αp we have E exp r ≤ K, and thus

E 1{|σ1 |≥Kx} ≤ K exp(−x2 ) + P(r > x) ≤ K exp(−x) ,

from which (6.174) follows. 



The essential fact, to which we turn now, is a considerable generalization
of the statement of Theorem 3.1.11 that “the overlap is essentially constant”.
Throughout the rest of the section, we also assume the following mild condi-
tion:
Eθ2 (0, . . . , 0) < ∞ . (6.178)
380 6. The Diluted SK Model and the K-Sat Problem

Proposition 6.7.5. Consider functions f1 , . . . , fn on R, and assume that


for a certain number D we have
()
|fk (x)| ≤ D (6.179)

for  = 0, 1, 2 and k ≤ n. Then the function


1 
R = R(σ 1 , . . . , σ n ) = f1 (σi1 ) · · · fn (σin ) (6.180)
N
i≤N

satisfies
K
E (R − E R )2 ≤ √ , (6.181)
N
where K depends only on κ, n, D and on the quantity (6.178).

The power of this statement might not be intuitive, but soon we will
show that it has remarkable consequences. Throughout the proof, K denotes
a number depending only on κ, n, A, D and on the quantity (6.178).
Lemma 6.7.6. The conditions of Proposition 6.7.5 imply:
K
(R − R )2 ≤ √ . (6.182)
N
Proof. The Gibbs’ measure on RN n has a density proportional to
   
exp − HN (σ  ) − κ σ  2
≤n ≤n

with respect to Lebesgue’s measure. √


It is straightforward that the gradient
of R at every point has a norm ≤ K/ N , so that
K
R has a Lipschitz constant ≤ . (6.183)
N
Consequently (6.182) follows from (3.17) used for k = 1. 

To complete the proof of Proposition 6.7.5 it suffices to show the following.
Lemma 6.7.7. We have
K
E( R − E R )2 ≤ √ . (6.184)
N
Proof. This proof mimics the Bovier-Gayrard argument of Section 4.5. Writ-
ing ηN = η ⊗N , we consider the random convex function
   
1
ϕ(λ) = log exp − HN (σ ) − κ

σ  + λN R dσ 1 · · · dσ n ,
 2
N
≤n ≤n
6.7 The Power of Convexity 381

so that
ϕ (0) = R .
We will deduce (6.184) from Lemma 4.5.2 used for k = 1 and δ = 0,
λ0 = 1/K, C0 = K, C1 = K, C2 = K/N , and much of the work consists
in checking conditions (4.135) to (4.138) of this lemma. Denoting by · λ
an average for the Gibbs’ measure with density with respect to Lebesgue’s
measure proportional to
   
exp − HN (σ ) − κ

σ  + λN R ,
 2
(6.185)
≤n ≤n

we have ϕ (λ) = R λ , so |ϕ (λ)| ≤ K and (4.135) holds for C0 = K. We now


prove the key fact that for λ ≤ λ0 = 1/K, the function
 κ  2
− HN (σ  ) − σ  + λN R (6.186)
2
≤n ≤n

is concave. We observe that (6.179) implies


 
 ∂2R  K
 
    ≤ ,
 ∂σi ∂σj  N

and that the left-hand side is zero unless i = j. This implies in turn that at
every point the second differential D of R satisfies |D(x, y)| ≤ Kxy/N
for every x, yin RN n . On the other hand, the second differential D∼ of the
function −κ ≤n σ  2 /2 satisfies at every point D∼ (x, x) = −κx2 for
every x in RN n . Therefore if Kλ ≤ κ, at every point the second differential
D∗ of the function (6.186) satisfies D∗ (x, x) ≤ 0 for every x in RN n , and
consequently this function is concave. Then the quantity (6.185) is of the
type  
κ  2
exp U − σ 
2
≤n

where U is concave; we can then use (6.183) and (3.17) to conclude that

ϕ (λ) = N (R − R λ )2 λ ≤K,

and this proves (4.137) with δ = 0 and hence also (4.136). It remains to prove
(4.138). For j ≤ N let us define

−Hj = θk (σi(k,1) , . . . , σi(k,p) ) .
k≤M,i(k,p)=j

The r.v.s Hj are independent, as is made obvious by the representation of HN


given in Exercise 6.2.3. For m ≤ N we denote by Ξm the σ-algebra generated
382 6. The Diluted SK Model and the K-Sat Problem

by the r.v.s Hj for j ≤ m, and we denote by Em the conditional expectation


given Ξm , so that we have the identity

E(ϕ(λ) − Eϕ(λ))2 = E(Em+1 ϕ(λ) − Em ϕ(λ))2 .
0≤m<N

To prove (4.138), it suffices to prove that for any given value of m we have
K
E(Em+1 ϕ(λ) − Em ϕ(λ))2 ≤ .
N2
Consider the Hamiltonian

− H ∼ (σ) = − Hj (6.187)
j =m+1

and
  
1
ϕ∼ (λ) = log exp H ∼ (σ  ) − κ σ  2 + λN R dσ 1 · · · dσ n .
N
≤n ≤n

It should be obvious that (since we have omitted the term Hm+1 in (6.187))

Em ϕ∼ (λ) = Em+1 ϕ∼ (λ) ,

so that
 2
E Em+1 ϕ(λ) − Em ϕ(λ))2 = E(Em+1 (ϕ(λ) − ϕ∼ (λ)) − Em (ϕ(λ) − ϕ∼ (λ))
 2
≤ 2E Em+1 (ϕ(λ) − ϕ∼ (λ))
 2
+ 2E Em (ϕ(λ) − ϕ∼ (λ))
≤ 4E(ϕ(λ) − ϕ∼ (λ))2 .

Therefore, it suffices to prove that


K
E(ϕ(λ) − ϕ∼ (λ))2 ≤ . (6.188)
N2
Thinking of λ as fixed, let us denote by · ∼ an average on RN n with
respect to the probability measure on RN n of density proportional to
   
exp − H ∼ (σ  ) − κ σ  2 + λN R .
≤n ≤n

We observe the identity


)   *
1
ϕ(λ) − ϕ∼ (λ) = log exp − (HN (σ  ) − H ∼ (σ  )) .
N
≤n ∼
6.7 The Power of Convexity 383

Now HN = H ∼ + Hm+1 and therefore


)   *
∼ 1
ϕ(λ) − ϕ (λ) = log exp − 
Hm+1 (σ ) . (6.189)
N
≤n ∼

Since −Hm+1 ≤ 0 we have ϕ(λ) − ϕ (λ) ≤ 0. Let us define

r = card{k ≤ M ; i(k, p) = m + 1} ,

the number of terms in Hm+1 , so that r is a Poisson r.v. with


m
p−1
Er = αN N  ≤ αp .
p

From (6.189) and Jensen’s inequality it follows that


1 
0 ≥ ϕ(λ) − ϕ∼ (λ) ≥ − Hm+1 (σ  ) , (6.190)
N ∼
≤n

and thus
) 2 *
1  2
1 
(ϕ(λ) − ϕ∼ (λ))2 ≤ 2 − Hm+1 (σ  ) ≤ 2 Hm+1 (σ  ) .
N ∼ N
≤n ≤n ∼

Therefore it suffices to prove that for  ≤ n we have

E Hm+1 (σ  )2 ∼ ≤K. (6.191)

Writing ak = |θk (0, . . . , 0)| and using (6.173) we obtain



|θk (σi1 , . . . , σik )| ≤ ak + A |σis | , (6.192)
s≤p

and therefore  
|Hm+1 (σ  )| ≤ ak + A ni |σi | ,
k∈I i≤N

where ni ∈ N and ni = rp, because each of the r terms in Hm+1 creates at
most p terms in the right-hand side. The randomness of Hm+1 is independent
of the randomness of · ∼ , and since Er2 ≤ K and Ea2k < ∞, by (6.178) it
suffices to prove that if i ≤ N then E (σi )2 ∼ ≤ K. This is done by basically
copying the proof of Lemma 6.7.3. Using (6.183) the density Y with respect
to η of the law of σi under Gibbs’ measure satisfies
 
∀x, y ∈ R , Y (x) ≤ Y (y) exp (ri A + K0 /N )|x − y| ,

where ri = card{k ≤ M ; ∃s ≤ p, i(k, s) = i}. The rest is as in Lemma 6.7.3.




384 6. The Diluted SK Model and the K-Sat Problem

The remarkable consequence of Proposition 6.7.5 we promised can be


roughly stated as follows: to make any computation for the Gibbs measure
involving only a finite number of spins, we can assume that different spins
are independent, both for the Gibbs measure and probabilistically. To make
this idea precise, let us recall the notation D of (6.132) (where now η has
density proportional to exp(−κx2 )). Keeping the dependence on N implicit,
let us denote by μ (= μN ) the law in D of the density X with respect to η
of the law of σ1 under Gibbs’ measure. Let us denote by X = (X1 , . . . , XN )
an i.i.d. sequence of random elements of law μ and recall the notation · X
of (6.135).

Theorem 6.7.8. Consider two integers n, k. Consider continuous bounded


functions U1 , . . . , Uk from Rn to R, and a continuous function V : Rk → R.
Then

lim |EV ( U1 (σ1 , . . . , σn ) , U2 (σ1 , . . . , σn ) , . . . , Uk (σ1 , . . . , σn ) )


N →∞
−EV ( U1 (σ1 , . . . , σn ) X, . . . , Uk (σ1 , . . . , σn ) X )| =0. (6.193)

We leave to the reader to formulate and prove an even more general


statement involving functions on several replicas.
Proof. Since U1 , . . . , Uk are bounded, on their range we can uniformly ap-
proximate V by a polynomial, so that it suffices to consider the case where
V is a monomial,
1 · · · xk .
mk
V (x1 , . . . , xk ) = xm 1
(6.194)
The next step is to show that we can assume that for each j ≤ k we have

lim Uj (σ1 , . . . , σn ) = 0 . (6.195)


(σ1 ,...,σn )→∞

To see this, we first note that without loss of generality we can assume that
|Uj | ≤ 1 for each j. Consider for each j ≤ k a function Uj∼ with |Uj∼ | ≤ 1
and assume that for some number S we have

∀i ≤ n , |σi | ≤ S ⇒ Uj∼ (σ1 , . . . , σn ) = Uj (σ1 , . . . , σn ) . (6.196)

Then 
|Uj (σ1 , . . . , σn ) − Uj∼ (σ1 , . . . , σn )| ≤ 1{σs ≥S} ,
s≤n

and therefore

| Uj (σ1 , . . . , σn ) − Uj∼ (σ1 , . . . , σn ) | ≤ 1{σs ≥S} .
s≤n

We note that for numbers x1 , . . . , xk and y1 , . . . , yk , all bounded by 1, we


have the elementary inequality
6.7 The Power of Convexity 385

|xm
1 · · · xk − y1 · · · yk | ≤
1 mk m1 mk
mj |xj − yj | . (6.197)
j≤k

It then follows that if we set


C = U1 (σ1 , . . . , σn ) m1 · · · Uk (σ1 , . . . , σn ) mk
C ∼ = U1∼ (σ1 , . . . , σn ) m1 · · · Uk∼ (σ1 , . . . , σn ) mk
then  
|C − C ∼ | ≤ mj 1{σs ≥S} ,
j≤k s≤n

and therefore
  
|EC − EC ∼ | ≤ mj E 1{σs ≥S} = n mj E 1{σ1 ≥S} .
j≤k s≤n j≤k

By Lemma 6.7.3, the right-hand side can be made small for S large, and since
we can choose the functions Uj that satisfy (6.196) and Uj (σ1 , . . . , σn ) = 0 if
one of the numbers |σs | is ≥ 2S, this indeed shows that we can assume (6.195).
A function Uj that satisfies (6.195) can be uniformly approximated by a
finite sum of functions of the type
f1 (σ1 ) · · · fn (σn ) ,
()
where |fs | is bounded for s ≤ n and  = 0, 1, 2. By expansion we then reduce
to the case where
Uj (σ1 , . . . , σn ) = f1,j (σ1 ) · · · fn,j (σn ) (6.198)
()
and we can furthermore assume that |fs,j | is bounded for  = 0, 1, 2 and
s ≤ n. Assuming (6.194) and (6.198) we have
B := EV ( U1 (σ1 , . . . , σn ) , . . . , Uk (σ1 , . . . , σn ) )
= E f1,1 (σ1 ) · · · fn,1 (σn ) m1
· · · f1,k (σ1 ) · · · fn,k (σn ) mk
.
We will write this expression using replicas. Let m = m1 + · · · + mk . Let
us write {1, . . . , m} as the disjoint union of sets I1 , . . . , Ik with cardIj = mj ;
and for  ∈ Ij and s ≤ n let us set
gs, = fs,j ,
 
so that in particular for  ∈ Ij we have s≤n gs, (σs ) = s≤n fs,j (σs ). Then,
using independence of replicas in the first equality, we get
   
gs, (σs ) = gs, (σs )
≤m s≤n ≤m s≤n
 m1  mk
= fs,1 (σs ) ··· fs,k (σs ) ,
s≤n s≤n
386 6. The Diluted SK Model and the K-Sat Problem

and therefore
   
B=E gs, (σs ) =E gs, (σs ) .
≤m s≤n s≤n ≤m

By symmetry among sites, for any indexes i1 , . . . , in ≤ N , all different, we


have
 
B=E gs, (σis ) . (6.199)
s≤n ≤m

Therefore, for a number K that does not depend on N , we have


 
     K
B − E 1 gs, (σis )  ≤

, (6.200)
 n
N i ,...,i N
1 n s≤n ≤m

where the summation is over all values of i1 , . . . , in . This is seen by using


(6.199) for the terms of the summation where all the indices are different and
by observing that there are at most KN n−1 other terms. Now
1     1   
 
g (σ
s, is ) = g (σ
s, i ) .
N n i ,...,i N
1 n s≤n ≤m s≤n i≤N ≤m

Defining
1  
Rs = gs, (σi ) ,
N
i≤N ≤m

we obtain from (6.200) that


 
   K
B − E Rs ≤
  N .
s≤n

Proposition 6.7.5 shows that for each s we have E |Rs − ERs | ≤ KN −1/4 ,
so that, replacing in turn each Rs by E Rs one at a time,
 
   
E R − E R ≤ K ,
 s s  N 1/4
s≤n s≤n

and therefore  
   K
B − E Rs  ≤ 1/4 . (6.201)
 N
s≤n

Now, using symmetry among sites in the first equality,


  
E Rs = E gs, (σs ) =E gs, (σs ) = E fs,j (σs ) mj
,
≤m ≤m j≤k

and we have shown that


6.7 The Power of Convexity 387
   
 
lim B − E fs,j (σs ) mj 
=0. (6.202)
N →∞
s≤n j≤k

In the special case where V is given by (6.194) and Uj is given by (6.198),


we have
 
EV ( U1 (σ1 , . . . , σn ) X , . . . , Uk (σ1 , . . . , σn ) X ) = E fs,j (σs ) mj ,
s≤n j≤k

so that (6.202) is exactly (6.193) in this special case. As we have shown, this
special case implies the general one. 

Given n, k, and a number C, inspection of the previous argument shows
that the convergence is uniform over the families of functions U1 , . . . , Uk that
satisfy |U1 |, . . . , |Uk | ≤ C.
We turn to the main result of this section, the proof that “in the limit
μ = T (μ)”. We recall the definition of Er as in (6.49), and that r is a Poisson
r.v. of expectation αp. Let us denote by X = (Xi )i≥1 an i.i.d. sequence,
where Xi ∈ D is a random element of law μ = μN (the law of the density
with respect to η of the law of σ1 under Gibbs’ measure), and let us define
T (μ) as follows: if
Er X
Y = ∈D,
AvEr X
then T (μ) is the law of Y in D. The following asserts in a weak sense that in
the limit T (μN ) = μN .

Theorem 6.7.9. Consider an integer n, and continuous bounded functions


f1 , . . . , fn on R. Then
 
 Avfn (ε)Er X 
 Avf1 (ε)Er X
lim E f1 (σ1 ) · · · fn (σ1 ) − E ··· =0.
N →∞ AvEr X AvEr X 
(6.203)

To relate (6.203) with the statement that “T (μ) = μ”, we note that

Avfs (ε)Er X
= Y fs dη ,
AvEr X

so that writing X = X1 , (6.203) means that


 
 
 
lim E f1 Xdη · · · fn Xdη − E f1 Y dη · · · fn Y dη  = 0 . (6.204)
N →∞ 

In a weak sense this asserts that in the limit the laws of X (i.e μ) and Y (i.e.
T (μ)) coincide.
388 6. The Diluted SK Model and the K-Sat Problem

While we do not know how to prove this directly, in a second stage we


will deduce from Theorem 6.7.9 that, as expected,

lim d(μN , T (μN )) = 0 , (6.205)


N →∞

where d is the transportation-cost distance.


Let us now explain the strategy to prove (6.203). The basic idea is to
combine Theorem 6.7.8 with the cavity method. We find convenient to use
the cavity method between an N -spin and an (N + 1)-spin system. Let us
define α by   N
p
α (N + 1) N +1 = αN , (6.206)
p

and let us consider a Poisson r.v. r with Er = α p. The letter r keeps this
meaning until the end of this chapter. For j ≥ 1, let us consider independent
copies θj of θ, and sets {i(j, 1), . . . , i(j, p − 1)} that are uniformly distributed
among the subsets of {1, . . . , N } of cardinality p − 1. Of course we assume
that all the randomness there is independent of the randomness of · . Let us
define 
−H(σ, ε) = θj (σi(j,1) , . . . , σi(j,p−1) , ε)
j≤r 

and E = E(σ, ε) = exp(−H(σ, ε)). Recalling the Hamiltonian (6.171), the


Hamiltonian −H = −HN − H is the Hamiltonian of an (N + 1)-spin system,
where the value of α has been replaced by α given by (6.206). Let us denote
by · an average for the Gibbs measure relative to H . Writing ε = σN +1 ,
symmetry between sites implies

E f1 (σ1 ) · · · fn (σ1 ) = E f1 (ε) · · · fn (ε) . (6.207)

Now, for a function f = f (σ, ε), the cavity formula


Avf E
f =
AvE
holds, where Av means integration in ε with respect to η, and where E =
E(σ, ε) = exp(−H(σ, ε)). We rewrite (6.207) as
f1 (σ1 )AvE fn (σ1 )AvE Avf1 (ε)E Avfn (ε)E
E ··· =E ··· . (6.208)
AvE AvE AvE AvE
We will then use Theorem 6.7.8 to approximately compute both sides of
(6.208) to obtain (6.203). However an obstacle is that the denominators can
be very small, or, in other words, that the function x/y is not continuous at
y = 0. To solve this problem we consider δ > 0 and we will replace these
denominators by δ + AvE .
We will need to take limits as δ → 0, and in order to be able to exchange
these limits with the limits as N → ∞ we need the following.
6.7 The Power of Convexity 389

Lemma 6.7.10. Assume that f = f (σ, ε) is bounded. Then


 
 Avf E Avf E 

lim sup E − =0.
δ→0 N  AvE δ + AvE 

Proof. First, if |f | ≤ C, we have


 
 Avf E Avf E  δ| Avf E |
 Cδ
 − = ≤ .
 AvE δ + AvE  AvE (δ + AvE ) δ + AvE

Next, we have
δ √ √
E ≤ δ + P( AvE ≤ δ) ,
δ + AvE
and, writing H = H(σ, ε),

AvE = Av exp(−H) ≥ exp −AvH ,

so that
 
 √  1 E Av|H|
P AvE ≤ δ ≤ P −AvH ≥ log √ ≤ √ .
δ log(1/ δ)

It follows from (6.173) that


  
|H(σ, ε)| ≤ |θj (0, . . . , 0)| + A |σi(j,s) | + |ε| ,
j≤r  s≤p−1

so that (6.178) and Lemma 6.7.3 imply that supN E Av|H| < ∞ and the
lemma is proved. 


Lemma 6.7.11. We have


 
 f (σ )AvE 
 1 1 fn (σ1 )AvE 
lim E ··· −E f1 (σ1 ) · · · fn (σ1 )  = 0 . (6.209)
N →∞ AvE AvE 

Proof. Consider the event Ω = Ω1 ∪ Ω2 ∩ Ω3 , where

Ω1 = {∃j ≤ r , i(j, 1) = 1}
Ω2 = {∃j, j ≤ r , j = j , ∃,  ≤ p − 1 , i(j, ) = i(j ,  )} (6.210)
Ω3 = {(p − 1)(r + 1) ≤ N } , (6.211)

so that as we have used many times we have


K
P(Ω) ≤ . (6.212)
N
390 6. The Diluted SK Model and the K-Sat Problem

Let us now define



U = Av exp θj (σj(p−1)+1 , . . . , σ(j+1)(p−1) , ε) (6.213)
1≤j≤r 

when (r + 1)(p − 1) ≤ N and U = 1 otherwise. The reader observes that


U depends only on the spins σi for i ≥ p. On Ω c we have i(j, 1) > 1 for all
j < r, and the indexes i(j, ) are all different. Thus symmetry between sites
implies that for any δ > 0,
 
f1 (σ1 )AvE fn (σ1 )AvE
E 1Ω c ···
δ + AvE δ + AvE
 
f1 (σ1 )U fn (σ1 )U
= E 1Ω c ··· . (6.214)
δ+ U δ+ U
We claim that

 f1 (σ1 )U fn (σ1 )U
lim E ···
N →∞ δ+ U δ+ U

f1 (σ1 ) X U X fn (σ1 ) X U X 
−E ··· =0. (6.215)
δ+ U X δ+ U X
To see this we simply use Theorem 6.7.8 given r and the functions θj , j ≤ r .
Since by (6.212) the influence of Ω vanishes in the limit, we get from (6.214)
that

 f (σ )AvE
 1 1 fn (σ1 )AvE
lim E ···
N →∞ δ + AvE δ + AvE

f1 (σ1 ) X U X fn (σ1 ) X U X 
−E ··· =0. (6.216)
δ+ U X δ+ U X 

Without loss of generality we can assume that |fs | ≤ 1 for each s. The
inequality (6.197) and Lemma 6.7.10 yield
 
 f1 (σ1 )AvE fn (σ1 )AvE f1 (σ1 )AvE fn (σ1 )AvE 

lim sup E ··· −E ···
δ→0 N  δ + AvE δ + AvE AvE AvE 
=0. (6.217)

Proceeding as in Lemma 6.7.10, we get


 
 U X 
lim sup E − 1 = 0 , (6.218)
δ→0 N δ+ U X
and proceeding as in (6.217) we obtain
 
 
f1 (σ1 ) X U fn (σ1 ) X U X
lim supE f1 (σ1 ) X · · · fn (σ1 ) X −E
X
···  = 0.
δ→0 N δ+ U X δ+ U X 
6.7 The Power of Convexity 391

Combining with (6.217) and (6.216) proves (6.209) since fs (σ1 ) = fs (σ1 ) X.


To complete the proof of Theorem 6.7.9, we show the following, where we
lighten notation by writing fs = fs (ε).

Lemma 6.7.12. We have


 
 Avf E Avfn E Avf1 Er X Avfn Er X 
 1
lim E ··· −E ··· =0.
N →∞ AvE AvE AvEr X AvEr X 

Proof. We follow the method of Lemma 6.7.11, keeping its notation. For
s ≤ n we define

Us = Avfs (ε) exp θ(σj(p−1)+1 , . . . , σ(j+1)(p−1) , ε)
1≤j≤r 

when (r + 1)(p − 1) ≤ N and Us = 1 otherwise. Consider δ > 0. Recalling


(6.211) and (6.213), symmetry between sites yields
 
Avf1 E Avfn E
E 1Ω c ···
δ + AvE δ + AvE
 
U1 Un
= E 1Ω c ··· . (6.219)
δ+ U δ+ U

Moreover Theorem 6.7.8 implies


 
 
 U1 Un U1 X Un X 
lim E ··· −E ··· =0.
N →∞ δ + U δ+ U δ+ U X δ+ U X

Since the influence of Ω vanishes in the limit, and exchanging again the limits
N → ∞ and δ → 0 as permitted by Lemma 6.7.10 (and a similar argument
for the terms E Us X /(δ + U X )), we obtain
 
 Avf E Avfn E Un X 
 1 U1 X
lim E ··· −E ··· =0.
N →∞ AvE AvE U X U X

It then remains only to show that


 
 U Avf1 Er X Avfn Er X 
 1 X Un X
lim E ··· −E ··· =0,
N →∞ U X U X AvEr X AvEr X 

which should be obvious by the definitions of U , Er and Us and since r is a


Poisson r.v. and, as N → ∞, Er = α p → αp = Er. 

We now state the desired strengthening of Theorem 6.7.9.
392 6. The Diluted SK Model and the K-Sat Problem

Theorem 6.7.13. If d denotes the transportation-cost distance associated to


the L1 norm in D, we have

lim d(μN , T (μN )) = 0 . (6.220)


N →∞

As we shall see, the sequence μ = μN is tight, and (6.220) implies that


any cluster point of this sequence is a solution of the equation μ = T (μ). If
we knew that this equation has a unique solution, we would conclude that
the sequence (μN ) converges to this solution, and we could pursue the study
of the model and in particular we could compute
1
lim E log exp(−HN (σ) − κσ2 )dσ .
N →∞ N

Thus, further results seem to depend on the following.

Research Problem 6.7.14. (Level 2) Prove that the equation μ = T (μ)


has a unique solution.

One really wonders what kind of methods could be used to approach this
question. Even if this can be solved, the challenge remains to find situations
where in the relation (see (6.170))
 
1
E log η ⊗N Uk
N
k≤M
1 
= lim E log exp β θk (σi(k,1) , . . . , σi(k,p) )dη ⊗N (σ)
β→∞ N
k≤M

one can exchange the limits N → ∞ and β → ∞. A similar problem in a


different context will be solved in Chapter 8.
We turn to the technicalities required to prove Theorem 6.7.13. They
are not difficult, although it is hard to believe that these measure-theoretic
considerations are really relevant to spin glasses. For this reason it seems that
the only potential readers for these arguments will be well versed in measure
theory. Consequently the proofs (that use a few basic facts of analysis, which
can be found in any textbook) will be a bit sketchy.

Lemma 6.7.15. Consider a number B and

D(B) = {f ∈ D ; ∀x, y , f (y) ≤ f (x) exp B|y − x|} .

Then D(B) is norm-compact in L1 (η).

Proof. A function f in D(B) satisfies

f (0) exp(−B|x|) ≤ f (x) ≤ f (0) exp B|x| ,


6.7 The Power of Convexity 393

so that since f (x)dη(x) = 1, we have K −1 ≤ f (0) ≤ K where K depends
on B and κ only. Moreover D(B) is equi-continuous on every interval, so a
sequence (fn ) in D(B) has a subsequence that converges uniformly in any
interval; since, given any ε > 0, there exists a number x0 for which

f ∈ D(B) ⇒ |f (x)|dη(x) ≤ ε ,
|x|≥x0

it follows that this subsequence converges in L1 (η). 



We recall the number A of (6.173).

Lemma 6.7.16. For each N and each k we have

μ(D(kA)) ≥ P(r ≤ k) , (6.221)

where r is a Poisson r.v. of mean αp.

Proof. This is a reformulation of Lemma 6.7.4 since (6.175) means that


Y ∈ D(rA). 

Proof of Theorem 6.7.13. The set of probability measures μ on D that
satisfy (6.221) for each k is tight (and consequently is compact for the
transportation-cost distance). Assuming if possible that (6.220) fails, we can
find ε > 0 and a converging subsequence (μN (k) )k≥1 of the sequence (μN )
such that
∀k , d(μN (k) , T (μN (k) )) ≥ ε .
We defined T (ν) for ν = μN . We leave it to the reader to define (in the
same manner) T (ν) for any probability measure ν on D and to show that
the operator T is continuous for d. So that if we define ν = limk μN (k) , then
T (ν) = limk T (μN (k) ) and therefore d(ν, T (ν)) ≥ ε. In particular we have
ν = T (ν). On the other hand, given continuous bounded functions f1 , . . . , fn
on R, since μN is the law of Y (the density with respect to η of the law of σ1
under Gibbs’s measure) in D we have
 
E f1 (σ1 ) · · · fn (σ1 ) = E f1 Y dη · · · fn Y dη
 
= f1 Y dη · · · fn Y dη dμN (Y ) . (6.222)

The map  
ν → ψ(ν) := f1 Y dη · · · fn Y dη dν(Y )

is continuous for the transportation-cost distance; in fact if |fs | ≤ 1 for each


s, one can easily show that |ψ(ν) − ψ(ν )| ≤ nd(ν, ν ). Therefore the limit of
the right-hand side of (6.222) along the sequence (N (k)) is
394 6. The Diluted SK Model and the K-Sat Problem
 
f1 Y dη · · · fn Y dη dν(Y ) .

Also, the definition of T (μN ) implies

Avf1 Er X Avfn Er X
E ···
AvEr X AvEr X
 
= f1 Y dη · · · fn Y dη dT (νN )(Y ) (6.223)

and the limit of the previous quantity along the sequence (N (k)) is
 
f1 Y dη · · · fn Y dη dT (ν)(Y ) .

Using (6.203) we get


 
f1 Y dη · · · fn Y dη dν(Y )
 
= f1 Y dη · · · fn Y dη dT (ν)(Y ) . (6.224)

We will now show that this identity implies ν = T (ν), a contradiction which
completes the proof of the theorem. Approximating a function on a bounded
set by a polynomial yields that if F is a continuous function of n variables,
then
 
F f1 Y dη, . . . , fn Y dη dν(Y )
 
= F f1 Y dη, . . . , fn Y dη dT (ν)(Y ) .

Consequently,
ϕ(Y )dν(Y ) = ϕ(Y )dT (ν)(Y ) , (6.225)

whenever ϕ(Y ) is a pointwise limit of a sequence of uniformly bounded func-


tions of the type
 
Y →F f1 Y dη, . . . , fn Y dη .

These include the functions of the type


 
ϕ(Y ) = min 1, min (ak + Y − Yk 1 ) , (6.226)
k≤k1

where ak are ≥ 0 numbers. This is because


6.8 Notes and Comments 395
   
 
ϕ(Y ) = min 1, min ak + max  f Y dη − f Yk dη  ,
k≤k1

where the maximum is over |f | ≤ 1, f continuous. Any [0, 1]-valued, 1-


Lipschitz function ϕ on D is the pointwise limit of a sequence of functions of
the type (6.226). It then follows that (6.225) implies that ν = T (ν). 


6.8 Notes and Comments

The first paper “solving” a comparable model at high temperature is [153].


A version of Theorem 6.5.1 “with replica symmetry breaking” is presented
in [115], where the proof of Theorem 6.5.1 given here can be found. This proof
is arguably identical to the original proof of [60], but the computations are
much simpler. This is permitted by the identification of which property of θ
is really used (i.e. (6.117)). Another relevant paper is [78], but it deals only
with a very special model.
An interesting feature of the present chapter is that we gain control of
the model “in two steps”, the first of which is Theorem 6.2.2. It would be
esthetically pleasing to find a proof “in one step” of a statement including
both Theorems 6.2.2 and 6.4.1.
There is currently intense interest in specific models of the type considered
in this chapter, see e.g. [51] and [102].
7. An Assignment Problem

7.1 Introduction
Given positive numbers c(i, j), i, j ≤ N , the assignment problem is to find

min c(i, σ(i)) , (7.1)
σ
i≤N

where σ ranges over all permutations of {1, . . . , N }. In words, if c(i, j) rep-


resents the cost of assigning job j to worker i, we want to minimize the total
cost when exactly one job is assigned to each worker.
We shall be interested in the random version of the problem, where the
numbers c(i, j) are independent and uniformly distributed over [0, 1].
Mézard and Parisi [103], [104] studied (7.1) by introducing a suitable
Hamiltonian, and conjectured that
 π2
lim E min c(i, σ(i)) = . (7.2)
N →∞ σ 6
i≤N

This was proved by D. Aldous [2]. Aldous takes advantage of a feature of


the present model, that makes it rather special among the various models we
studied: the existence of a “limiting object” (which he discovered [1]).
In a related direction, G. Parisi conjectured the following remarkable
identity. If the r.v.s c(i, j) are independent exponential i.e. they satisfy
P(c(i, j) ≥ x) = e−x for x ≥ 0, then we have
 1 1
E min c(i, σ(i)) = 1 + + ··· + 2 . (7.3)
σ 22 N
i≤N

The link with (7.2) is that it can be shown that if the r.v.s c(i, j) are
i.i.d., and their common distribution has a density f on R+ with respect
to Lebesgue measure, then if f is continuous in a neighborhood of 0, the
limit in (7.2) depends only on f (0). (The intuition for this is simply that all
the numbers c(i, σ(i)) relevant in the computation of the minimum in (7.2)
should be very small for large N , so that only the part of the distribution of
c(i, j) close to 0 matters.) Thus it makes no difference to assume that c(i, j)
is uniform over [0, 1] or is exponential of mean 1.
M. Talagrand, Mean Field Models for Spin Glasses, Ergebnisse der Mathematik 397
und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics 54,
DOI 10.1007/978-3-642-15202-3 7, © Springer-Verlag Berlin Heidelberg 2011
398 7. An Assignment Problem

Vast generalizations of Parisi’s conjecture have been recently proved [109],


[96]. Yet the disordered system introduced by Mézard and Parisi remains of
interest. This model is obviously akin to the other models we consider; yet it
is rather different. In the author’s opinion, this model demonstrates well the
far-reaching nature of the ideas underlying the theory of mean field models
for spin glasses.
It is a great technical challenge to prove rigorously anything at all concern-
ing the original model of Mézard and Parisi. This challenge has yet to be met.
We will consider a slightly different model, that turns out to be easier, but
still of considerable interest. In this model, we consider two integers M, N ,
M ≥ N . We consider independent r.v.s (c(i, j))i≤N,j≤M that are uniform
over [0, 1]. The configuration space is the set ΣN,M of all one-to-one maps σ
from {1, . . . , N } to {1, . . . , M }. On this space we consider the Hamiltonian

HN,M (σ) = βN c(i, σ(i)) , (7.4)
i≤N

where β is a parameter. The reader observes that there is no minus sign in


this formula, that is, the Boltzmann factor is
  
exp −βN c(i, σ(i)) .
i≤N

Given a number α > 0, we will study the system for N → ∞, M = N (1+α),


and our results will hold for β ≤ β(α), where, unfortunately, limα→0 β(α) = 0.
The original model of Mézard and Parisi is the case M = N , i.e. α = 0. A
step towards understanding this model would be the following.

Research Problem 7.1.1. (Level 2) Extend the results of the present chap-
ter to the case β ≤ β0 where β0 is independent of α.

Even in the domain β ≤ β(α) our results are in a sense weaker than those
of the previous chapters. We do not study the model for given large values
of N and M , but only in the limit N → ∞ and M/N → α, and we do not
obtain a rate for several of the convergence results.
One of the challenges of the present situation is that it is not obvious
how to formulate the correct questions. We expect (under our condition that
β is small) that “the spins at two different sites are nearly independent”.
Here this should mean that when i1 = i2 , under Gibbs’ measure the variables
σ → σ(i1 ) and σ → σ(i2 ) are nearly independent. But how could one quantify
this phenomenon in a way suitable for a proof by induction?
We consider the partition function

ZN,M = exp(−HN,M (σ)) , (7.5)
σ
7.1 Introduction 399

where the summation is over all possible values of σ in ΣN,M . Throughout


the chapter we write

a(i, j) = exp(−βN c(i, j)) , (7.6)

so that 
ZN,M = a(i, σ(i)) .
σ i≤N

The cavity method will require removing elements from {1, . . . , N } and
{1, . . . , M }. Given a set A ⊂ {1, . . . , N } and a set B ⊂ {1, . . . , M } such that
N − card A ≤ M − card B, we write

ZN,M (A; B) = a(i, σ(i)) .
σ

The product is taken over i ∈ {1, . . . , N }\A and the sum is taken over
the one-to-one maps σ from {1, . . . , N }\A to {1, . . . , M }\B. Thus ZN,M =
ZN,M (∅; ∅). When A = {i1 , i2 , . . .} and B = {j1 , j2 , . . .} we write

ZN,M (A, B) = ZN,M (i1 , i2 , . . . ; j1 , j2 , . . .) .

Rather than working directly with Gibbs’ measure, we will prove that

ZN,M (i; j) ZN,M (∅; j) ZN,M (i; ∅)


 . (7.7)
ZN,M ZN,M ZN,M

It should be obvious that this is a very strong property, and that it deals with
independence. One can also get convinced that it deals with Gibbs’ measure
by observing that

ZN,M (i, j)
G({σ(i) = j}) = a(i, j) .
ZN,M

We consider the quantities

ZN,M (∅; j) ZN,M (i; ∅)


uN,M (j) = ; wN,M (i) = . (7.8)
ZN,M ZN,M

These quantities occur in the right-hand side of (7.7). The number uN,M (j)
is the Gibbs probability that j does not belong to the image of {1, . . . , N }
under the map σ. In particular we have 0 ≤ uN,M (j) ≤ 1. (On the other
hand we only know that wN,M (i) > 0.)
Having understood that these quantities are important, we would like
to know something about the family (uN,M (j))j≤M (or (wN,M (i))i≤N ). An
optimistic thought is that this family looks like an i.i.d. sequence drawn out
of a certain distribution, that we would like to describe, probably as a fixed
point of a certain operator. Analyzing the problem, it is not very difficult to
400 7. An Assignment Problem

guess what the operator should be; the unpleasant surprise is that it does
not seem obvious that this operator has a fixed point, and this contributes
significantly to the difficulty of the problem. In order to state our main result,
let us describe this operator. Of course, the motivation behind this definition
will become clear only gradually.
Consider a standard Poisson point process on R+ (that is, its intensity
measure is Lebesgue’s measure) and denote by (ξi )i≥1 an increasing enumer-
ation of the points it produces. Consider a probability measure η on R+ , and
i.i.d. r.v.s (Yi )i≥1 distributed according to η, which are independent of the
r.v.s ξi . We define
 
1
A(η) = L  (7.9)
i≥1 Yi exp (−βξi /(1 + α))
 
1
B(η) = L  , (7.10)
1 + i≥1 Yi exp(−βξi )

where of course L(X) is the law of the r.v. X. The dependence on β and α
is kept implicit.

Theorem 7.1.2. Given α > 0, there exists β(α) > 0 such that for β ≤ β(α)
there exists a unique pair μ, ν where μ is a probability measure on [0, 1] and
ν is a probability measure on R+ such that
α
xdμ(x) = ; μ = B(ν) ; ν = A(μ) . (7.11)
1+α

Moreover if M = N (1 + α), we have

μ = lim L(uN,M (M )) ; ν = lim L(wN,M (N )) . (7.12)


N →∞ N →∞

Research Problem 7.1.3. (Level 2) Find a direct proof of the existence of


the pair (μ, ν) as in (7.11).

One intrinsic difficulty is that there exists such a pair for each value of α
(not too small); so one cannot expect that the operator B ◦ A is a contraction
for a certain distance. The way we will prove (7.11) is by showing that a
cluster point of the sequence (L(uN,M (M )), L(wN,M (N ))) is a solution of
these equations.
While it is not entirely obvious what are the relevant questions one should
ask about the system, the following shows that the objects of Theorem 7.1.2
are of central importance.

Theorem 7.1.4. Given α, for β ≤ β(α) we have

1
lim E log ZN,M = −(1 + α) log x dμ(x) − log x dν(x) . (7.13)
N →∞ N
7.2 Overview of the Proof 401

7.2 Overview of the Proof

In this section we try to describe the overall strategy. The following funda-
mental identities are proved in Lemma 7.3.4 below
1
uN,M (M ) =  (7.14)
1+ k≤N a(k, M ) wN,M −1 (k)

1
wN,M (N ) =  . (7.15)
≤M a(N, ) uN −1,M ()
Observe that in the right-hand side of (7.14) the r.v.s a(k, M ) are independent
of the numbers wN,M −1 (k), and similarly in (7.15). We shall prove that

wN,M (k)  wN,M −1 (k)  wN,M −2 (k) . (7.16)

This fact is not easy. It is intimately connected to equation (7.7), and is


rigorously established in Theorem 7.4.7 below.
Once we have (7.16) we see from (7.14) that
1
uN,M (M )   , (7.17)
1+ k≤N a(k, M ) wN,M −2 (k)

and by symmetry between M and M − 1 that


1
uN,M (M − 1)   . (7.18)
1+ k≤N a(k, M − 1) wN,M −2 (k)

As a consequence, given the numbers wN,M −2 (k), the r.v.s uN,M (M ) and
uN,M (M − 1) are nearly independent. Their common law depends only on
the empirical measure
1 
δwN,M −2 (i) ,
N
i≤N

which, by (7.16), is nearly


1 
νN = δwN,M (i) . (7.19)
N
i≤N

We consider an independent sequence of r.v.s (Xk )k≥1 uniformly dis-


tributed on [0, 1], independent of all the other sources of randomness, and
we set
a(k) = exp(−βN Xk ) . (7.20)
The reason this sequence is of fundamental importance for the present model
is that, given j, the sequence (a(k, j))k of r.v.s has the same distribution
as the sequence (a(k))k , and, given i, this is also the case of the sequence
(a(i, k))k .
402 7. An Assignment Problem

Consider the random measure μN on [0, 1] given by


 
1
μN = La  ,
1 + k≤N a(k)wN,M (k)

where La denotes the law in the randomness of the variables a(k), when all
the other sources of randomness are fixed.
Thus, given the numbers wN,M (k), the r.v.s uN,M (M ) and uN,M (M − 1)
are nearly independent with common law μN . By symmetry this is true for
each pair of r.v.s uN,M (j) and uN,M (k).
Therefore we expect that the empirical measure
1 
μN = δuN,M (j)
M
j≤M

is nearly μN .
Since μN is a continuous function of νN , it follows that if νN is concen-
trated (in the sense that it is nearly non-random), then such is the case of
μN , that is nearly concentrated around its mean μN , and therefore μN itself
is concentrated around μN .
We can argue similarly that if μN is concentrated around μN , then νN
must be concentrated around a certain measure νN that can be calculated
from μN . The hard part of the proof is to get quantitative estimates showing
that if β is sufficiently small, then these cross-referential statements can be
combined to show that both μN and νN are concentrated around μN and
νN respectively. Now, the way μN is obtained from νN means in the limit
that μN  B(νN ). Similarly, νN  A(μN ). Also, μN = L(uN,M (M )) and
νM = L(wN,M (N )), so μ = limN L(uN,M (M )) and ν = limN L(wN,M (N ))
satisfy μ = B(ν) and ν = A(μ).

7.3 The Cavity Method

We first collect some simple facts.

Lemma 7.3.1. If i ∈
/ A, we have

ZN,M (A; B) = a(i, ) ZN,M (A ∪ {i} ; B ∪ {}) . (7.21)
∈B
/

If j ∈
/ B, we have

ZN,M (A; B) = ZN,M (A; B ∪ {j}) + a(k, j) ZN,M (A ∪ {k} ; B ∪ {j}) .
k∈A
/
(7.22)
7.3 The Cavity Method 403

Proof. One replaces each occurrence of ZN,M (·; ·) by its value and one checks
that the same terms occur in the left-hand and right-hand sides. 
The following deserves no proof.

Lemma 7.3.2. If M ∈
/ B, we have

ZN,M (A; B ∪ {M }) = ZN,M −1 (A; B) . (7.23)

If N ∈
/ A, we have

ZN,M (A ∪ {N }; B) = ZN −1,M (A; B) . (7.24)

In (7.24), and in similar situations below, we make the convention that


ZN −1,M (·; ·) is considered for a parameter β such that β (N − 1) = βN .
The following is also obvious from the definitions, yet it is fundamental.

Lemma 7.3.3. We have



ZN,M (∅; ) = (M − N )ZN,M (7.25)
≤M

and thus 
uN,M () = M − N . (7.26)
≤M

To prove (7.26) we can also observe that uN,M () is the Gibbs probability
that  does not belong to the image under σ of {1, · · · , N }, so that the left-
hand side of (7.26) is the expected number of integers that do not belong to
this image, i.e. M − N . In particular (7.26) implies by symmetry between the
values of  that EuN,M (M ) = (M − N )/M  α/(1  + α), so that any cluster
point μ of the sequence L(uN,M (M )) satisfies xdμ(x) = α/(1 + α).

Lemma 7.3.4. We have


ZN,M −1 1
uN,M (M ) = =  (7.27)
ZN,M 1 + k≤N a(k, M ) wN,M −1 (k)
ZN −1,M 1
wN,M (N ) = = . (7.28)
ZN,M ≤M a(N, ) uN −1,M ()

Proof. We use (7.22) with A = B = ∅ and j = M to obtain



ZN,M = ZN,M (∅; M ) + a(k, M ) ZN,M (k; M ) .
k≤N

Using (7.23) with A = ∅ or A = {k} and B = ∅ we get


404 7. An Assignment Problem

ZN,M = ZN,M −1 + a(k, M ) ZN,M −1 (k; ∅)
k≤N
  
= ZN,M −1 1 + a(k, M ) wN,M −1 (k) . (7.29)
k≤N

This proves (7.27). The proof of (7.28) is similar, using now (7.21) and (7.24).

It will be essential to consider the following quantity, where i ≤ N :
ZN,M ZN,M −1 (i; ∅) − ZN,M (i; ∅) ZN,M −1
LN,M (i) = 2 . (7.30)
ZN,M
The idea is that (7.7) used for j = M implies that ELN,M (i)2 is small.
(This expectation does not depend on i.) Conversely, if ELN,M (i)2 is small
this implies (7.7) for j = M and hence for all values of j by symmetry.
We will also use the quantity
ZN,M ZN,M −1 (∅; j) − ZN,M (∅; j) ZN,M −1
RN,M (j) = 2 . (7.31)
ZN,M
It is good to notice that |RN,M (j)| ≤ 2. This follows from (7.23) and the fact
that the quantity ZN,M (A, B) decreases as B increases.
The reason for introducing the quantity RN,M (j) is that it occurs natu-
rally when one tries to express LM,N (i) as a function of a smaller system (as
the next lemma shows).
Lemma 7.3.5. We have

a(N, ) RN −1,M () − a(N, M ) uN −1,M (M )2
≤M −1
LN,M (N ) = −  2 (7.32)
≤M a(N, ) uN −1,M ()

k≤N a(k, M ) LN,M −1 (k)
RN,M (M − 1) = −   2 . (7.33)
1 + k≤N a(k, M ) wN,M −1 (k)
Proof. Using the definition (7.31) of RN,M (j) with j = M − 1, we have
ZN,M ZN,M −1 (∅; M − 1) − ZN,M (∅; M − 1) ZN,M −1
RN,M (M − 1) = 2 .
ZN,M
(7.34)
As in (7.29), but using now (7.22) with B = {M − 1} and j = M we obtain:
ZN,M (∅; M − 1) = ZN,M −1 (∅; M − 1)

+ a(k, M ) ZN,M −1 (k; M − 1) . (7.35)
k≤N

Using this and (7.29) in the numerator of (7.34), and (7.29) in the denomina-
tor, and gathering the terms yields (7.33). The proof of (7.32) is similar. 

We end this section by a technical but essential fact.
7.4 Decoupling 405

Lemma 7.3.6. We have



RN,M (j) = −uN,M (M ) + uN,M (M )2 . (7.36)
j≤M −1

Proof. From (7.25) we have



ZN,M (∅; j) = (M −N ) ZN,M −ZN,M (∅; M ) = (M −N ) ZN,M −ZN,M −1 ,
j≤M −1

and changing M into M − 1 in (7.25) we get



ZN,M −1 (∅, j) = (M − 1 − N ) ZN,M −1 .
j≤M −1

These two relations imply (7.36) in a straightforward manner. 

7.4 Decoupling
In this section, we prove (7.7) and, more precisely, the following.
Theorem 7.4.1. Given α > 0, there exists β(α) > 0 such that if β ≤ β(α)
and M = N (1 + α), then for βN ≥ 1
K(α)
E LN,M (N )2 ≤ (7.37)
N
K(α)
E RN,M (M − 1)2 ≤ . (7.38)
N
The method of proof consists of using Lemma 7.3.5 to relate ERN,M (M −
1)2 with ELN,M −1 (N )2 and ELN,M (N )2 with ERN −1,M (M − 1)2 , and to it-
erate these relations. In the right-hand sides of (7.32) and (7.33), we will first
take expectation in the quantities a(N, ) and a(k, M ), that are probabilisti-
cally independent of the other quantities (an essential fact). Our first task is
to learn how to do this.
We recall the random sequence a(k) = exp(−βN Xk ) of (7.20), where
(Xk ) are i.i.d., uniform over [0, 1], and independent of the other sources of
randomness. The following lemma is obvious.
Lemma 7.4.2. We have
1 1
E a(k)p = (1 − exp(−βpN )) ≤ . (7.39)
βpN βpN
Lemma 7.4.3. Consider numbers (xk )k≤N . Then we have
 2  
1 1
E a(k) xk ≤ + x2k . (7.40)
2β 2 N 2βN
k≤N k≤N
406 7. An Assignment Problem

Proof. Using (7.39) we have


 2  
E a(k) xk = x2k E a(k)2 + xk x E a(k) E a()
k≤N k≤N k=
 2 
1  2 1
≤ xk + |xk | |x | .
2βN βN
k≤N k=

Now, the Cauchy-Schwarz inequality implies:


  2
1  N  2
|xk | |x | ≤ |xk | ≤ xk . 

2 2
k= k≤N k≤N

Corollary 7.4.4. If β ≤ 1 we have


1
E RN,M (M − 1)2 ≤ E LN,M −1 (N )2 .
β2
Proof. From (7.33) we have
 2

RN,M (M − 1)2 ≤ a(k, M ) LN,M −1 (k) .
k≤N

The sequence (a(k, M ))k≤N has the same distribution as the sequence
(a(k))k≤N , so that taking expectation first in this sequence and using (7.40)
we get, assuming without loss of generality that β ≤ 1,
1  1
E RN,M (M − 1)2 ≤ E LN,M −1 (k)2 = E LN,M −1 (N )2
β2N β2
k≤N

by symmetry between the values of k. 


This is very crude because in (7.33) the denominator is not of order 1, but
seems to be typically much larger. In order however to prove this, we need to
know that a proportion of the numbers (wN,M −1 (k))k≤M are large. We will
prove that this is indeed the case if β ≤ β(α), but we do not know it yet. To
improve on the present approach it seems that we would need to have this
information now. We could not overcome this technical difficulty, that seems
related to Research Problem 7.1.1.
We next turn to the task of taking expectation in (7.32). The rela-
tion (7.26) is crucial here. Since 0 ≤ uN,M () ≤ 1 and M − N  N α,
this relation implies that at least a constant proportion of the numbers
(u())≤M = (uN,M ())≤M is not small. To understand what happens, con-
sider an independent sequence X uniformly distributed over [0, 1] and note
that if we reorder the numbers (N X )≤M by increasing order, they look like
the sequence (ξi /(1 + α)) (where (ξi )i≥1 is an enumeration of the points of
7.4 Decoupling 407

 point process on R ). The sum ≤N a()u() then looks like the
+
a Poisson
sum ≤N exp(−βξ /(1 + α))u(σ()) where σ is a random permutation, and
it is easy to get convinced that typically it cannot be too small. The precise
technical result we need is as follows.

Proposition 7.4.5. Consider numbers


 0 ≤ u(), u () ≤ 1, for  ≤ M . As-
sume
 that ≤M u() ≥
 4 and ≤M u () ≥ 4. Consider b with N b ≤
≤M u() and N b ≤ ≤M u (). Then if βN ≥ 1 and if β ≤ b/40, for any
numbers (y())≤M we have
 2  2
a()y()
≤M Lβ 2 1 
E  2  2 ≤ 4 y()
b N
≤M a()u() ≤M a()u () ≤M

Lβ 3 
+ 6 y()2 , (7.41)
b N
≤M

where a() = exp(−βN X ) and L denotes a universal constant.

As will be apparent later, an essential feature is that the second term of this
bound has a coefficient β 3 (rather than β 2 ).

Corollary 7.4.6. If β ≤ α/80, βN ≥ 1, M ≥ N (1 + α), M ≤ 3N , we


have
Lβ 3 K(α)
ELN,M (N )2 ≤ 6 ERN −1,M (M − 1)2 + . (7.42)
α N
Proof. For  ≤ M , let u() = uN −1,M (), and a() = a(N, ). For  ≤ M − 1
let y() = RN −1,M (), and let y(M ) = −uN −1,M (M )2 . By (7.32) we have
 2
2
a()y()
≤M
LN,M (N ) =  4 .
≤M a()u()

We check first that ≤M u() ≥ 4. Then (7.26) implies

u() = M − (N − 1) ≥ N (1 + α) − N = N α ,
≤M

and if β ≤ α/80 and βN ≥ 1, then N α ≥ 80 and this is certainly ≥ 4. Also

1  N α α
b := u() ≥ ≥
N N 2
≤M

if N α ≥ 2 and in particular if N β ≥ 1 and β ≤ α/80. We then have β ≤ b/40.


Taking expectation in the r.v.s a(), we can now use (7.41) with u () = u()
to obtain
408 7. An Assignment Problem
 2
Lβ 2 1  Lβ 3 
Ea LN,M (N )2 ≤ y() + 6 y()2 , (7.43)
α4 N α N
≤M ≤M

where Ea denotes expectation in the r.v.s a() only. By (7.36) we have


 
 
 y() = |uN −1,M (M )| ≤ 1
 
≤M

and y(M )2 = uN −1,M (M )4 ≤ 1. Thus (7.43) implies

K(α) Lβ 3 
Ea LN,M (N )2 ≤ + 6 y()2 . (7.44)
N α N
≤M −1

To prove (7.42) we simply take expectation in (7.44), using that M ≤ 3N


and observing that Ey()2 = ERN −1,M (M − 1)2 for  ≤ M − 1. 

Proof of Theorem 7.4.1. To avoid trivial complications, we assume α ≤ 1.
Let us fix N , let us assume M = N (1 + α), and, for k ≤ N let us define

V (k) = E RN −k,M −k (M − k − 1)2 .

In this definition we assume that the values of ZN −k,M  that are relevant
for the computation of RN −k,M −k have been computed with the parameter
β replaced by the value β such that β (N − k) = βN . We observe that
M − k = N (1 + α) − k ≥ (N − k)(1 + α) and M − k ≤ 3(N − k).
Combining Corollaries 7.4.6 and 7.4.4, implies that if β (N − k) = βN ≥ 1
and β ≤ α/80 we have

Lβ K(α)
V (k) ≤ V (k + 1) + . (7.45)
α6 N
Let us assume that k ≤ N/2, so that b ≤ 2b. Then (7.45) holds whenever
β ≤ α/160. Thus if Lβ/α6 ≤ 1/2, k ≤ N/2 and βN ≥ 1, we obtain

1 K(α)
V (k) ≤ V (k + 1) + .
2 N
Combining these relations yields

K(α) K(α)
V (0) ≤ 2−k V (k) + ≤ 2−k+2 +
N N
since V (k) ≤ 4. Taking k  log N proves (7.38), and (7.37) follows by (7.42).


Theorem 7.4.7. Under the conditions of Theorem 7.4.1, for j ≤ M − 1,


i ≤ N − 1 we have
7.4 Decoupling 409

 2 K(α)
E uN,M (j) − uN,M −1 (j) ≤ (7.46)
N
 2 K(α)
E uN,M (j) − uN −1,M (j) ≤ (7.47)
N
 2 K(α)
E wN,M (i) − wN,M −1 (i) ≤ (7.48)
N
 2 K(α)
E wN,M (i) − wN −1,M (i) ≤ . (7.49)
N
Proof. The proofs are similar, so we prove only (7.46). We can assume j =
M − 1. Using (7.29) and (7.35) we get

ZN,M (∅; M − 1)
uN,M (M − 1) =
ZN,M
  
ZN,M −2 1 + k≤N a(k, M ) wN,M −2 (k)
=  .
ZN,M −1 1 + k≤N a(k, M ) wN,M −1 (k)

We observe the identity


ZN,M −1
LN,M (i) = (wN,M −1 (i) − wN,M (i)) ,
ZN,M

which is obvious from (7.30). Using this identity for M − 1 rather than M ,
we obtain

uN,M (M − 1) − uN,M −1 (M − 1)
  
ZN,M −2 1 + k≤N a(k, N ) wN,M −2 (k)
=  −1
ZN,M −1 1 + k≤N a(k, N ) wN,M −1 (k)

k≤N a(k, N ) LN,M −1 (k)
=  .
1 + k≤N a(k, N ) wN,M −1 (k)

Thus (7.47) follows from (7.37) and Lemma 7.4.3. 


We turn to the proof of Proposition 7.4.5, which occupies the rest of this
section. It relies on the following probabilistic estimate.

Lemma 7.4.8. Consider numbers 0 ≤ u() ≤ 1, and let b = N −1 ≤M u().
Then if βN ≥ 1 and β ≤ b/20 we have for k ≤ 8 that
 −k
 Lβ k
E a()u() ≤ , (7.50)
bk
≤M

where a() is as in (7.20).


410 7. An Assignment Problem

There is of course nothing magic about the number 8, this result is true for
any other number (with a different condition on β). As the proof is tedious,
it is postponed to the end of this section.
Proof of Proposition 7.4.5. First we reduce to the case u() = u () by
using that 2cc ≤ c2 + c 2 for
 −2  −2
c= a()u() ; c = a()u () .
≤M ≤M

Next, let ȧ() = a() − Ea() = a() − Ea(1), so that


   
a()y() = Ea(1) y() + ȧ()y()
≤M ≤M ≤M

and since Ea(1) ≤ 1/(βN ),


 2  2  2
2 1 
a()y() ≤ 2 y() + 2 ȧ()y() .
β N
≤M ≤M ≤M

Using (7.50) for k = 4, it suffices to prove that


 2
≤M ȧ()y() Lβ 3 
E  4 ≤ 6 y()2 . (7.51)
a()u() b N
≤M ≤M

Expending the square in the numerator of the left-hand side, we see that it
equals I + II, where
 ȧ( )2
I= y( )2 E  4 (7.52)
 ≤M ≤M a()u()
 ȧ(1 )ȧ(2 )
II = y(1 )y(2 )E  4 .
1 =2 ≤M a()u()

To bound the terms of I, let us set S = = a()u(), so

ȧ( )2 ȧ( )2 1
4 ≤ E 4 = Eȧ( ) E 4
2
E 
S S
≤M a()u()  

by independence. Now since ≤M u() ≥ 4 and u( ) ≤ 1, we have
 3  3
u() ≥ u() ≥ b , (7.53)
4 4
= ≤M

so using (7.50) for M − 1 rather than M and 3b/4 rather than b we get
ES−4
 ≤ Lβ 4 /b4 ; since Eȧ( )2 ≤ Ea( )2 ≤ 1/βN , we have proved that, using
that b ≤ 1 in the second inequality
7.4 Decoupling 411

Lβ 3  Lβ 3 
I≤ y()2
≤ y()2 .
N b4 N b6
≤M ≤M

To control the term II, let us set



S(1 , 2 ) = a()u()
=1 ,2

and
U = a(1 )u(1 ) + a(2 )u(2 ) ≥ 0 .

Thus ≤M a()u() = S(1 , 2 ) + U . Since U ≥ 0, a Taylor expansion yields

1 1 4U R
 4 = 4
− 5
+ 6
(7.54)
(S( ,  )) S( ,  ) S( 1 , 2 )
≤M a()u()
1 2 1 2

where |R| ≤ 15U 2 . Since S(1 , 2 ) is independent of a(1 ) and a(2 ), and since
Eȧ(1 )ȧ(2 )U = 0, multiplying (7.54) by ȧ(1 )ȧ(2 ) and taking expectation
we get
 
  15|ȧ(1 )ȧ(2 )|U 2
 ȧ(1 )ȧ(2 ) 
E  4  ≤ E
 a()u()  S(1 , 2 )6
≤M
1
= 15E(|ȧ(1 )ȧ(2 )|U 2 )E .
S(1 , 2 )6

Since U 2 ≤ 2(a(1 )2 + a(2 )2 ) and |ȧ(2 )| ≤ 1, independence implies

E(|ȧ(1 )ȧ(2 )|U 2 ) ≤ 4E(|ȧ(1 )||ȧ(2 )|a(2 )2 ) ≤ 4E(|ȧ(1 )|)Ea(2 )2 .

Now, Ea()2 ≤ 1/(2βN ) and E|ȧ()| ≤ 2Ea() ≤ 2/(βN ). Therefore we have

L
E(|ȧ(1 )ȧ(2 )|U 2 ) ≤ .
(βN )2

We also have that ES(1 , 2 )−6 ≤ Lβ 6 /b6 by (7.50) (used for k = 6 and M −2
rather than M , and proceeding as in (7.53)). Thus
 2
Lβ 4  Lβ 4 
II ≤ |y( 1 )y( 2 )| ≤ |y()|
b6 N 2 b6 N 2
1 =2 ≤M

Lβ  4
≤ 6 y()2 ,
b N
≤M

and we conclude using that β ≤ 1. 



The following prepares the proof of Lemma 7.4.8.
412 7. An Assignment Problem

Lemma 7.4.9. If βN ≥ 1 and λ ≥ 1 we have


 
log λ
E exp(−λa(1)) ≤ exp − .
2βN
Proof. Assume first λ ≤ exp βN , so that log λ ≤ βN and
 
log λ log λ
P(λa(1) ≥ 1) = P(exp βN X1 ≤ λ) = P X1 ≤ = .
βN βN

Thus, since exp(−x) ≤ 1/2 for x ≥ 1, we have


1
E exp(−λa(1)) ≤ 1 − P(λa(1) ≥ 1)
2
1 
≤ exp − P(exp(βN X1 ) ≤ λ)
2
log λ 
= exp − .
2βN
Consider next the case λ ≥ exp βN . Observe first that the function θ(x) =
x/ log x increases for x ≥ e so that θ(λ) ≥ θ(exp βN ), i.e. λ/ log(λ) ≥
(exp βN )/βN , that is λ exp(−βN ) ≥ log λ/βN . Now, since a(1) ≥ exp(−βN )
we have
log λ 
E exp(−λa(1)) ≤ E exp(−λ exp(−βN )) ≤ exp − . 

βN

Proof of Lemma 7.4.8. We use the inequality (A.8):

P(Y ≤ t) ≤ (exp λt)E exp(−λY ) (7.55)



for Y = ≤Ma()u() and any λ ≥ 0. We have
   
E exp(−λY ) = E exp −λ a()u() = E exp(−λu a()) .
≤M ≤M

Since u() ≤ 1, Hölder’s inequality implies


 u()  u()
E exp(−λu a()) ≤ E exp(−λa()) = E exp(−λa(1)) .

Therefore, assuming λ ≥ 1, and using Lemma 7.4.9 in the second line,


 P u()
E exp(−λY ) ≤ E exp(−λa(1)) ≤M
   
log λ
≤ exp − u()
2βN
≤M
 
b log λ
= exp − , (7.56)

7.5 Empirical Measures 413

using that bN = ≤M u(). Thus from (7.55) we get
   
tb b λt 
P Y ≤ ≤ exp − log λ − . (7.57)
2eβ 2β e

For t ≤ 1, taking λ = e/t, and since then log λ − λt/e = log e/t − 1 = − log t,
we get  
tb
P Y ≤ ≤ tb/2β .
2eβ

Therefore whenever t ≥ 1, the r.v. X = 1/Y satisfies


 
2teβ
P X≥ ≤ t−b/2β . (7.58)
b

Now we use (A.33) with F (x) = xk to get, making a change of variable


in the second line,

EX k = ktk−1 P(X ≥ t)dt
0
 k ∞  
2eβ 2eβt
= k−1
kt P X≥ dt .
b 0 b

We bound P(X ≥ 2eβt/b) by 1 for t ≤ 1 and using (7.58) for t ≥ 1 to get


 k  ∞   k 
2eβ 2eβ k
EX k ≤ 1+k t−b/(2β)+k−1 dt = 1+ ,
b 1 b b/(2β) − k

from which (7.50) follows since k ≤ 8 and b/(2β) ≥ 10. 




Exercise 7.4.10. Prove that for a r.v. Y ≥ 0 one has the formula

1
EY −k = tk−1 E exp(−tY )dt ,
(k − 1)! 0

and use it to obtain the previous bound on EX k = EY −k directly from (7.56).

7.5 Empirical Measures

Throughout the rest of this section, we assume the conditions of Theorem


7.4.1, that is, βN ≥ 1, M = N (1 + α) and β ≤ β(α).
Let us pursue our intuition that the sequence (uN,M (j))j≤M looks like it
is i.i.d. drawn out of a certain distribution. How do we find this distribution?
The obvious candidate is the empirical measure
414 7. An Assignment Problem

1 
μN = δuN,M (j) . (7.59)
M
j≤M

We will also consider


1 
νN = δwN,M (i) . (7.60)
N
i≤N

We recall the sequence a(k) = exp(−βN Xk ), where (Xk ) are i.i.d., uni-
form over [0, 1] and independent of the other sources of randomness. Consider
the random measure μN on [0, 1] given by
 
1
μN = La  ,
1 + k≤N a(k)wN,M (k)

where La denotes the law in the randomness of the variables a(k) with all
the other sources of randomness fixed. Thus, for a continuous function f on
[0, 1] we have
 
1
f dμN = Ea f  ,
1 + k≤N a(k)wN,M (k)

where Ea denotes expectation in the r.v.s a(k) only. Consider the (non-
random) measure μN = EμN , so that
 
1
f dμN = Ef  .
1 + k≤N a(k)wN,M (k)

In this section we shall show that μN  μN , and that, similarly, νN  νN


where  
1
f dνN = Ef  .
≤M a()uN,M ()

In the next section we shall make precise the intuition that “νN determines
μN ” and “μN determines νN ” to conclude the proof of Theorem 7.1.2.
It is helpful to consider an appropriate distance for probability measures.
Given two probability measures μ, ν on R, we consider the quantity
Δ(μ, ν) = inf E(X − Y )2 ,
where the infimum is over the pairs (X, Y ) of r.v.s such that X has law
μ and Y has law ν. The quantity Δ1/2 (μ, ν) is a distance. This statement
is not obvious, but is proved in Section A.11, where the reader may find
more information. This distance is called Wasserstein’s distance between μ
and ν. It is of course related to the transportation-cost distance considered in
Chapter 6, but is more convenient here. Let us observe that since E(X −Y )2 ≥
(EX − EY )2 we have
 2
xdμ(x) − xdν(x) ≤ Δ(μ, ν) . (7.61)
7.5 Empirical Measures 415

Theorem 7.5.1. The conditions of Theorem 7.4.1 imply

lim EΔ(μN , μN ) = 0 ; lim EΔ(νN , νN ) = 0 . (7.62)


N →∞ N →∞

We first collect some simple facts about Δ.


Lemma 7.5.2. We have
 
1  1  1 
Δ δxi , δyi = inf (xi − yσ(i) )2 , (7.63)
N N σ N
i≤N i≤N i≤N

where the infimum is over all permutations σ of {1, . . . , N }.


We will use this lemma when xi = wN,M (i), and almost surely any two of
these points are distinct. For this reason, we will give the proof only in the
(easier) case where any two of the points xi (resp. yi ) are distinct.
Proof. The inequality ≤ should be obvious.
 To prove the converse inequality,

we observe that if X has law N −1 i≤N δxi and Y has law N −1 i≤N δyi ,
then 
E (X − Y )2 = P (X = xi , Y = yj )(xi − yj )2 .
i,j≤N

We observe that the bistochastic matrices are exactly the matrices aij =
N P(X = xi , Y = yj ). Thus the left-hand side of (7.63) is
1 
inf aij (xi − yj )2 ,
N
i,j≤N

where the infimum is over all bistochastic matrices (aij ). The infimum is at-
tained at an extreme point, and it is a classical result (“Birkhoff’s theorem”)
that this extreme point is a permutation matrix. 
Lemma 7.5.3. Given numbers w(k), w (k) ≥ 0 we have
 2
1 1
E  − 
1 + k≤N a(k)w(k) 1 + k≤N a(k)w (k)
2 
≤ 2 (w(k) − w (k))2 . (7.64)
β N
k≤N

Consequently
    
1 1
Δ L  ,L 
1 + k≤N a(k)w(k) 1+ k≤N a(k)w (k)
2 
≤ (w(k) − w (k))2 . (7.65)
β2N
k≤N
416 7. An Assignment Problem

Proof. We use Lemma 7.4.3 together with the inequality


 2
1 1
 − 
1+ k≤N a(k)w(k)1+ k≤M a(k)w (k)
 2
≤ a(k)(w(k) − w (k)) . 

k≤N

The following fact is crucial.

Lemma 7.5.4. For any continuous function f we have


  
lim E f (uN,M (M )) − f dμN f (uN,M (M − 1)) − f dμN =0.
N →∞
(7.66)

Proof. Recalling the numbers a(k, ) of (7.6), let us consider


1
u=  .
1+ k≤N a(k, M )wN,M −2 (k)

Using (7.27), (7.64) and (7.48) (with M − 1 instead of M ) we obtain

K
E(uN,M (M ) − u)2 ≤ .
N
Exchanging the rôles of M and M − 1 shows that if
1
u = 
1+ k≤N a(k, M − 1)wN,M −2 (k)

we have
K
E(uN,M (M − 1) − u )2 = E(uN,M (M ) − u)2 ≤ .
N
Therefore to prove (7.66) it suffices to prove that
  
lim E f (u) − f dμN f (u ) − f dμN = 0 . (7.67)
N →∞

Now by definition of μN we have


  
E f (u) − f dμN f (u ) − f dμN = E(f (u) − f (u1 ))(f (u ) − f (u1 )) ,

where
1 1
u1 =  ; u1 =  ,
1+ k≤N a(k)wN,M (k) 1 + k≤N a (k)wN,M (k)
7.5 Empirical Measures 417

and where a(k) = exp(−βN Xk ) and a (k) = exp(−βN Xk ) are independent


of all the other r.v.s involved. Let
1 1
u2 =  ; u2 =  .
1+ k≤N a(k)wN,M −2 (k) 1 + k≤M a (k)wN,M −2 (k)

Using again (7.64) and (7.48) we get


K K
E(u1 − u2 )2 ≤ ; E(u1 − u2 )2 ≤ .
N N
Therefore, to prove (7.67) it suffices to show that

lim E(f (u) − f (u2 ))(f (u ) − f (u2 )) = 0 .


N →∞

Let us denote by Ea expectation only in the r.v.s a(k), a (k), a(k, M ) and
a(k, M − 1), which are probabilistically independent of the r.v.s wN,M −2 (k).
Then, by independence,

Ea (f (u) − f (u2 ))(f (u ) − f (u2 )) = (Ea f (u) − Ea f (u2 ))(Ea f (u ) − Ea f (u2 )).

This is 0 because Ea f (u) = Ea f (u2 ), as is obvious from the definitions. 




Corollary 7.5.5. For any continuous function f we have


 2
lim E f dμN − f dμN =0. (7.68)
N →∞

Proof. We have
1 
f dμN = f (uN,M ())
M
≤M

so that, expanding the square and by symmetry


 2  2
1
E f dμN − f dμN = E f (uN,M (M )) − f dμN
M
  
M −1
+ E f (uN,M (M )) − f dμN f (uN,M (M − 1)) − f dμN .
M
We conclude with Lemma 7.5.4. 

It is explained in Section A.11 why Wasserstein distance defines the weak
topology on the set of probability measures on a compact space. Using (A.73)
we see that (7.68) implies the following.

Corollary 7.5.6. We have

lim EΔ(μN , μN ) = 0 . (7.69)


N →∞
418 7. An Assignment Problem

Lemma 7.5.7. Consider an independent copy μ N of the random measure


μN . Then, recalling that μN = EμN , we have

EΔ(μN , μN ) ≤ EΔ(μN , μ
N ) . (7.70)

Proof. Let C be the class of pairs f, g of continuous functions such that

∀x, y , f (x) + g(y) ≤ (x − y)2 ,

so that by the duality formula (A.74) and since μN = EμN = E


μN ,
 
EΔ(μN , μN ) = E sup f dμN + E gd μN
(f,g)∈C
 
≤ E sup f dμN + gd
μN N ) ,
= EΔ(μN , μ
(f,g)∈C

using Jensen’s inequality. 




Lemma 7.5.8. Consider an independent copy νN of the random measure νN
defined in (7.60). Then we have
2 ∼
N ) ≤
EΔ(μN , μ EΔ(νN , νN ).
β2


Proof. Let νN = N −1 k≤N δwN,M∼

(k) , where (wN,M (k))k≤N is an inde-
pendent copy of the family (wN,M (k))k≤N . By Lemma 7.5.2 we can find a
permutation σ with
1  ∼
2 ∼
wN,M (k) − wN,M (σ(k)) = Δ(νN , νN )
N
k≤N

and by Lemma 7.5.3 we get


2 ∼
N ) ≤
Δ(μN , μ Δ(νN , νN ) (7.71)
β2
where
   
1 1
N = La
μ  ∼ = La  ∼ .
1 + k≤N a(k)wN,M (σ(k)) 1 + k≤N a(k)wN,M (k)

N is an independent
Taking expectation in (7.71) concludes the proof, since μ
0N .
copy of μ 

Let us observe the inequality

Δ(μ1 , μ2 ) ≤ 2(Δ(μ1 , μ3 ) + Δ(μ3 , μ2 )) , (7.72)

which is a consequence of the fact that Δ1/2 is a distance.


7.5 Empirical Measures 419

Proposition 7.5.9. We have


4 ∼
lim sup EΔ(μN , μN ) ≤ lim sup EΔ(νN , νN ). (7.73)
N →∞ β 2 N →∞
Consequently, if μ∼
N denotes an independent copy of the random measure μN ,
we have
16
lim sup EΔ(μN , μ∼ ∼
N ) ≤ 2 lim sup EΔ(νN , νN ) . (7.74)
N →∞ β N →∞
Proof. Inequality (7.72) implies
Δ(μN , μN ) ≤ 2Δ(μN , μN ) + 2Δ(μN , μN ) .
Therefore (7.69) yields
lim sup EΔ(μN , μN ) ≤ 2 lim sup EΔ(μN , μN ) .
N →∞ N →∞

By (7.70) and Lemma 7.5.8 this proves (7.73). To prove (7.74) we simply use
(7.72) to write that
Δ(μN , μ∼ ∼
N ) ≤ 2Δ(μN , μN ) + 2Δ(μN , μN ) ,

and we note that EΔ(μN , μ∼


N ) = EΔ(μN , μN ). 

At this point we have done half of the work required to prove Theorem 7.5.1.
The other half is as follows.
Proposition 7.5.10. We have
Lβ 3
lim sup EΔ(νN , νN ) ≤ lim sup EΔ(μN , μ∼
N) (7.75)
N →∞ α6 N →∞
and
∼ Lβ 3
lim sup EΔ(νN , νN )≤ lim sup EΔ(μN , μ∼
N) . (7.76)
N →∞ α6 N →∞

It is essential there to have a coefficient β 3 rather than β 2 . Combining (7.76)


and (7.74) shows that

∼ Lβ 3
lim sup EΔ(νN , νN )≤ lim sup EΔ(μN , μ∼
N)
N →∞ α6 N →∞
Lβ 3 16 ∼
≤ 6 2 lim sup EΔ(νN , νN ),
α β N →∞
so that if 16Lβ/α6 < 1 then

lim sup EΔ(νN , νN ) = lim sup EΔ(μN , μ∼
N) = 0
N →∞ N →∞

and (7.73) and (7.75) prove Theorem 7.5.1.


The proof of Proposition 7.5.10 is similar to the proof of Proposition 7.5.9,
using (7.41) rather than (7.40). Let us first explain the occurrence of the all
important factor β 3 in (7.76).
420 7. An Assignment Problem

Lemma
 7.5.11. Consider
 numbers u(), u () ≥ 0 for  ≤ M and assume
that ≤M u() = ≤M u () ≥ N α/2. Then we have

 2
1 1 Lβ 3 
E  − ≤ (u() − u ())2 .
≤M a()u() ≤M a()u () α6 N
≤M
(7.77)
Consequently we have
    
1 1 Lβ 3 
Δ L  ,L  ≤ 6 (u()−u ())2 .
≤M a()u() ≤M a()u () α N
≤M
(7.78)

Proof. We write
 2
1 1
 −
≤M a()u() ≤M a()u ()
 2
≤M (u() − u ())a()
≤  2  2 ,
≤M u()a() ≤M u ()a()

and we use (7.41) with y() = u() − u (), so that ≤M y() = 0. 

Consider the random measure ν N on R given by +

 
1
ν N = La  ,
≤N a()uN,M ()

so that νN = Eν N . We denote by νN an independent copy of ν N . We recall


that μ∼
N denotes an independent copy of μN .

Lemma 7.5.12. We have


Lβ 3
EΔ(ν N , νN ) ≤ EΔ(μN , μ∼
N) .
α6

Proof. Let μ∼ N = M
−1 ∼
≤M δuN,M () , where (uN,M ())≤M is an inde-

pendent copy of the family (uN,M ())≤M . By Lemma 7.5.2 we can find a
permutation σ with
1  2
uN,M () − u∼
N,M (σ()) = Δ(μN , μ∼
N) .
M
≤M

The essential point now is that (7.26) yields


 
uN,M () = u∼
N,M (σ()) = M − N ≥ αN/2 ,
≤M ≤M
7.5 Empirical Measures 421

so that we can use Lemma 7.5.11 to get


Lβ 3
Δ(ν N , νN ) ≤ Δ(μN , μ∼
N) (7.79)
α6
where
   
1 1
νN = La  ∼ = La  ∼ .
≤M a()uN,M (σ()) ≤M a()uN,M ()

Taking expectation in (7.79) concludes the proof, since νN is an independent


copy of ν0N . 

The rest of the arguments in the proof of Proposition 7.5.10 is very sim-
ilar to the arguments of Proposition 7.5.9. One extra difficulty is that the
distributions νN (etc.) no longer have compact support. This is bypassed by
a truncation argument. Indeed, it follows from (7.28) and (7.50) that
4
EwN,M (i) ≤ K(α) .
If b ≥ 0 is a truncation level, the quantities wN,M,b (i) := min(wN,M (i), b)
satisfy
 2  K(α)
E(wN,M (i) − wN,M,b (i))2 ≤ E wN,M (i)1{wN,M (i)≥b} ≤ .
b2

If we define νN,b = N −1 i≤N δwN,M,b (i) , then
1 
Δ(νN,b , νN ) ≤ (wN,M (i) − wN,M,b (i))2
N
i≤N

so that
K(α)
EΔ(νN,b , νN ) ≤ , (7.80)
b2
and using such a uniformity, rather than (7.75) it suffices to prove for each b
the corresponding result when in the left-hand side “everything is truncated
at level b”. More specifically, defining νN,b by
  
1
f dνN,b = Ef min b,  ,
≤M a()uN,M ()

one proves that


Lβ 3
lim sup EΔ(νN,b , νN,b ) ≤ lim sup EΔ(μN , μ∼
N) ,
N →∞ α6 N →∞
and one uses that (7.80) implies
K(α)
lim sup EΔ(νN , νN ) ≤ lim sup EΔ(νN,b , νN,b ) + .
N →∞ N →∞ b2
The details are straightforward.
422 7. An Assignment Problem

7.6 Operators

The definition of the operators A and B given in (7.9) and (7.10) is pretty,
but it does not reflect the property we need.  The fundamental property of
the operator A is that if the measure M −1 ≤M δu() approaches the mea-
 −1
sure μ, the law of ≤M aN ()u() approaches A(μ), where aN () =
exp(−N βX ), M/N  1 + α, and where of course the r.v.s (X )≥1 are
i.i.d. uniform over [0, 1]. Since the description of A given in (7.9) will not be
needed, its (non-trivial) equivalence with the definition we will give below in
Proposition 7.6.2 will be left to the reader.
In order to prove the existence of the operator A, we must prove that if
two measures
1  1 
δu() and δu ()
M M 
≤M ≤M

both approach μ, and if M/N  M /N , then


   
1 1
L  L  .
≤M aN ()u() ≤M  aN ()u ()


This technical fact is contained in the following estimate.

Proposition 7.6.1. Consider a number α > 0. Consider integers M , N ,


M , N with N ≤ M ≤ 2N , N ≤ M ≤ 2N and numbers 0 ≤ u() ≤ 1 for
 ≤ M , numbers 0 ≤ u () ≤ 1 for  ≤ M . Let
1  1 
η= δu() ; η = δu () .
M M 
≤M ≤M
 
Assume that xdη(x) ≥ α/4 and xdη (x) ≥ α/4. Assume that βN ≥
1, βN ≥ 1 and β ≤ α/80. Then, with aN () = exp(−βN X ) as above, we
have
    
1 1
Δ L  ,L 
≤M aN ()u() ≤M  aN ()u ()


  
1 1 M M 
≤ K(α) + +  −
N N N N 
 2
Lβ 3 Lβ 2
+ 6 Δ(η, η ) + 4 xdη(x) − xdη (x) . (7.81)
α α

Let us state an important consequence.

Proposition 7.6.2. Given a number α > 0 there exists a number β(α) > 0
with the following property. If β ≤ β(α) and if μ is a probability measure on
7.6 Operators 423

[0, 1] with xdμ(x) ≥ α/4, there exists a unique probability measure A(μ) on
R+ with the following property. Consider numbers 0 ≤ u() ≤ 1 for  ≤ M ,
and set
1 
η= δu() .
M
≤M

Then
     
1 1 M 
Δ A(μ), L  + 
≤ K(α) − (1 + α)
≤M N ()u()
a N N
 2
Lβ 2
+ 4 xdμ(x) − xdη(x)
α
Lβ 3
+ 6 Δ(μ, η) . (7.82)
α

Moreover, if μ is another probability measure and if xdμ (x) ≥ α/4, we
have
 2
Lβ 2 Lβ 3
Δ(A(μ), A(μ )) ≤ 4 xdμ(x) − xdμ (x) + 6 Δ(μ, μ ) . (7.83)
α α

A little bit of measure-theoretic technique is required again here, because


we are dealing with probability measures that are not supported by a compact
interval. In the forthcoming lemma, there is really nothing specific about the
power 4.

Lemma 7.6.3. Given a number


 ∞ C, consider the set D(C) of probability mea-
sures θ on R+ that satisfy 0 x4 dθ(x) ≤ C. Then D(C) is a compact metric
space for the distance Δ.

Proof. The proof uses a truncation argument similar to the one given at
the end of the proof of Proposition 7.5.10. Given a number b > 0 and a
probability measure θ in D(C) we define the truncation θb as the image of
θ under the map x → min(x, b). In words, all the mass that θ gives to the
half-line [b, ∞[ is pushed to the point b. Then we have
∞ ∞
C
Δ(θ, θb ) ≤ (x − min(x, b))2 dθ(x) ≤ x2 dθ(x) ≤ . (7.84)
0 b b2

Consider now a sequence (θn )n≥1 in D(C). We want to prove that it has a
subsequence that converges for the distance Δ. Since for each b the set of
probability measures on the interval [0, b] is compact for the distance Δ (as is
explained in Section A.11), we assume, by taking a subsequence if necessary,
that for each integer m the sequence (θnm )n≥1 converges for Δ to a certain
probability measure λm . Next we show that there exists a probability measure
λ in D(C) such that λm = λm for each m. This is simply because if m < m
424 7. An Assignment Problem
 ∞ 4
then λmm = λm (the “pieces fit together”) and because 0 x dλm (x) ≤ C
for each m. Now, for each m we have limn→∞ Δ(θnm , λm ) = 0, and (7.84) and
the triangle inequality imply that limn→∞ Δ(θn , λ) = 0. 

Proof of Proposition 7.6.2. The basic idea is to define A(μ) “as the limit”
 −1 
of the law λ of ≤M aN ()u() as M −1 ≤M δu() → μ, M, N →
∞, M/N → (1 + α). We note that by (7.50)
  4used for k = 8, whenever
≤M u() ≥ αN/8, (and β < β(α)) we have x dλ(x) ≤ L. Thus, recalling
the notation of Lemma 7.6.3, we have λ ∈ D(L), a compact set, and therefore
the family of these measures has a cluster point A(μ), and (7.82) holds by
continuity. Moreover (7.83) is a consequence of (7.82) and continuity (and
shows that the cluster point A(μ) is in fact unique). 

We recall the probability measures μN , νN , νN , μN of Section 7.5.
Proposition 7.6.4. We have
lim Δ(νN , A(μN )) = 0 . (7.85)
N →∞

Proof. First we recall that by (7.26) we have


1  M −N α
xdμN (x) = uN,M () = ≥
M M 2
≤M

for M = N (1 + α) and N large. Since Theorem 7.5.1 asserts that


EΔ(μN , μN ) → 0, (7.61) implies that
 2
E xdμN (x) − xdμN (x) → 0

and thus xdμN (x) ≥ α/4 for N large. Therefore we can use (7.82) for
μ = μN and η = μN to get (using (7.61) again)
  
1
Δ A(μN ), La 
≤M aN ()uN,M ()
 2 
K(α) β β3
≤ +L + 6 Δ(μN , μN ) . (7.86)
N α4 α
The expectation of the right-hand side goes to zero as N → ∞ by Theorem
7.5.1. Since by definition
 
1
νN = ELa  ,
≤M aN ()uN,M ()

taking expectation in (7.86) and using Jensen’s inequality as in (7.70) com-


pletes the proof. 

Proposition 7.6.4 is of course only half of the work because we also have
to define the operators B. These operators B have the following defining
property.
7.6 Operators 425

Proposition 7.6.5. To each probability measure ν on R+ we can attach


a probability measure B(ν) on [0, 1] with the following property. Consider
numbers w(k) ≥ 0 for k ≤ N , and let
1 
η= δw(k) .
N
k≤N

Then
  
1 K L
Δ B(ν), L  ≤ + 2 Δ(ν, η) . (7.87)
1 + k≤N aN (k)w(k) N β

Moreover
L
Δ(B(ν), B(ν )) ≤ Δ(ν, ν ) . (7.88)
β2
Proof. Similar, but simpler than the proof of Proposition 7.6.2. 


Proposition 7.6.6. We have

lim Δ(μN , B(νN )) = 0 . (7.89)


N →∞

Proof. Similar (but simpler) than the proof of (7.85). 



Proof of Theorem 7.1.2. It follows from the definition of νN and (7.50)
that x4 dνN (x) ≤ L, so that, recalling the set D(L) of Lemma 7.6.3, we
have νN ∈ D(L). Since μN lives on [0, 1], we can find a subsequence of the
sequence (μN , νN ) that converges for Δ to a pair (μ, ν). Using (7.85) and
(7.89) we see that this pair satisfies the relations (7.11):
α
xdμ(x) = ; μ = B(ν) , ν = A(μ) . (7.90)
1+α
The equations (7.90) have a unique solution. Indeed, if (μ , ν ) is another
solution (7.83) implies

Lβ 3
Δ(ν, ν ) ≤ Δ(μ , μ)
α6
and by (7.88) we have
L
Δ(μ, μ ) ≤ Δ(ν, ν )
β2
so that

Δ(μ, μ ) ≤ Δ(μ, μ )
α6
and Δ(μ, μ ) = 0 if Lβ/α6 < 1. Let us stress the miracle here. The condition
(7.26) forces the relation xdμ(x) = α/(1 + α), and this neutralizes the first
426 7. An Assignment Problem

term on the right-hand side of (7.83). This term is otherwise devastating,


because the coefficient Lβ 2 /α4 does not compensate the coefficient L/β 2 of
(7.88).
Since the pair (μ, ν) of (7.90) is unique, we have in fact that μ = lim μN ,
ν = lim νN . On the other hand, by definition of μN we have EμN =
L(uN,M (M )), so Jensen’s inequality implies as in (7.70) that

Δ(L(uN,M (M )), μN ) ≤ EΔ(μN , μN ) ,

so limN →∞ L(uN,M (M )) = μ by (7.62). Similarly limN →∞ L(wN,M (N )) = ν.




We turn to the proof of Proposition 7.6.1. Let us start by a simple obser-
vation.

Proposition 7.6.7. The bound (7.81) holds when M = M .

Proof.
 Without loss of generality
 we assume that N ≤ N . Let S =
≤M a N ()u() and S = ≤M a N  ()u (). Then

      2
1 1 1 1 (S − S )2
Δ L ,L ≤E − =E ≤ I + II (7.91)
S S S S S2S 2

where  2
≤M (aN () − aN ())u ()

I = 2E ;
S2S 2
 2
≤M aN ()(u() − u ())
II = 2E .
S2S 2
We observe since N ≤ N that aN () ≥ aN (), so that

S ≥ S ∼ := aN ()u () ,
≤M

and  2
≤M aN ()(u() − u ())
II ≤ 2E .
S 2 S ∼2
To bound this quantity we will use the estimate(7.41). The relations
xdη(x) ≥  α/4 and xdη (x) ≥ α/4 mean that ≤M u() ≥ αM/4 ≥
αN/4 and ≤M u () ≥ αM/4 ≥ αN/4. Thus in (7.41) we can take b = α/4.
This estimate then yields
 2  2
Lβ 2 M Lβ 3 1 
II ≤ 4 xdη(x) − xdη (x) + 6 (u() − u ())2 .
α N α N
≤M
(7.92)
7.6 Operators 427

We can assume
 from Lemma 7.5.2 that we have reordered the terms u () so
that M −1 ≤M (u() − u ())2 ≤ Δ(η, η ), and then the bound (7.92) is as
desired, since M ≤ 2N .
To controlthe term I, we first note that 0 ≤ aN  () − aN () ≤ 1 since
N ≤ N ; and ≤M (aN  () − aN ())u() ≤ M since 0 ≤ u () ≤ 1. Therefore

 aN  () − aN ()
I ≤ 2M E .
S2S 2
≤M

We control this term with the same


 method that we used to control the term
(7.52). Namely, we define S =  = aN ( )u( ) and S similarly, and we
write, using independence and the Cauchy-Schwarz inequality that

aN  () − aN () aN  () − aN ()


E ≤E
2
S S 2 S2 S 2
 1/2  1/2
1 1
≤ E(a N () − aN ()) E 4 E 4 .
S S

Using (7.50), and since = u() ≥ N α/4 − 1 ≥ N α/8 because N β ≥ 1 and
β ≤ α/80, we get
 1/2
1
E 4 ≤ K(α)β 2 ,
S
and similarly for S . Using (7.39) for p = 1, we obtain
 
L 1 1
E(aN  () − aN ()) ≤ − .
β N N

The result follows. 



The main difficulty in the proof of Proposition 7.6.1 is to find how to relate
the different values M and M . Given a sequence (u())≤M and an integer
M , consider the sequence (u∼ ())≤M M  that is obtained by repeating each
term u() exactly M times.

Proposition 7.6.8. We have


    
1 1 K
Δ L  ,L  ∼
≤ . (7.93)
≤M aN ()u() ≤M M  aN M ()u () N


Proof of Proposition 7.6.1. The meaning of (7.93) is that within a small


error (as in (7.81)) we can replace M by M M and N by N M . Similarly, we
replace M by M M and N by N M , so we have reduced the proof to the
case M = M of Proposition 7.6.7 (using that Δ1/2 is a distance). 

The proof of Proposition 7.6.8 relies on the following.
428 7. An Assignment Problem

Lemma 7.6.9. Consider independent r.v.s X , X, uniform over [0, 1]. Con-
sider an integer R ≥ 1, a number γ ≥ 2 and the r.v.s

a = exp(−γX) ; a = exp(−γR X ) .
≤R

Then we can find a pair of r.v.s (Y, Y ) such that Y has the same law as the
r.v. a and Y has the same law as the r.v. a with
L L
E|Y − Y | ≤ 2
, E(Y − Y )2 ≤ 2 . (7.94)
γ γ
Proof of Proposition 7.6.8. We use Lemma 7.6.9 for γ = βN , R = M .
Consider independent copies (Y , Y ) of the pair (Y, Y ). It should be obvious

from
 the definition of the sequence u () that S :=
 ≤M Y u() equals

a
≤M M  M M
 ()u () in distribution. Writing S = ≤M Y u(), the left-
hand side of (7.93) is
      2  2
1 1 1 1 ≤M (Y − Y )u()
Δ L ,L ≤E − =E ,
S S S S S2S 2
 2
≤M |Y − Y |
≤E .
S2S 2
We expand the square, and we use (7.94) for γ = βN and one more time
the method used to control (7.52) to find that this is ≤ K(α)/N . 

Proof of Lemma 7.6.9. Given any two r.v.s a, a ≥ 0, there is a canonical
way to construct a coupling of them. Consider the function Y on [0, 1] given
by
Y (x) = inf{t ; P(a ≥ t) ≤ x} .
The law of Y under Lebesgue’s measure is the law of a. Indeed the definition
of Y (x) shows that

P(a ≥ y) > x ⇒ Y (x) > y


P(a ≥ y) < x ⇒ Y (x) < y ,

so that if λ denotes Lebesgue measure, we have λ({Y (x) ≥ y}) = P(a ≥


y). Moreover “the graph of Y is basically obtained from the graph of the
function t → P(a ≥ t) by making a symmetry around the diagonal”. Define
Y similarly. The pair (Y, Y ) is the pair we look for, although it will require
some work to prove this. First we note that
1
E|Y − Y | = |Y (x) − Y (x)|dx .
0

This is the area between the graphs of Y of Y , and also the area between
the graphs of the functions t → P(a ≥ t) and t → P(a ≥ t) because these
7.6 Operators 429

two areas are exchanged by symmetry around the diagonal (except maybe
for their boundary). Therefore

E|Y − Y | = |P(a ≥ t) − P(a ≥ t)|dt .
0

The rest of the proof consists in elementary (and very tedious) estimates of
this quantity when a and a are as in Lemma 7.6.9. For t ≤ 1 we have
   
1 1 1 1
P(a ≥ t) = P(exp(−γX) ≥ t) = P X ≤ log = min 1, log ,
γ t γ t
and similarly
 
1 1
P(exp(−γRX ) ≥ t) = min 1, log .
γR t

Since a ≥ t as soon as one of the summands exp(−γRX ) exceeds t, inde-


pendence implies
  R
1 1
P(a ≥ t) ≥ 1 − 1 − min 1, log := ψ(t) .
γR t

Since (1 − x)R ≥ 1 − Rx for x ≥ 0, we have


   
1 1 1 1
ψ(t) ≤ R min 1, log = min R, log ,
γR t γ t

and since ψ(t) ≤ 1, we have in fact


 
1 1
ψ(t) ≤ min 1, log = P(a ≥ t) .
γ t
We note that
R2 x2
x ≥ 0 ⇒ (1 − x) ≤ e−Rx ≤ 1 − Rx +
R
.
2
Using this for  
1 1
x = min 1, log
Rγ t
this yields that
R2 x2
ψ(t) = 1 − (1 − x)R ≥ Rx −
2
and  
1 1 R2 x2
0 ≤ P(a ≥ t) − ψ(t) ≤ min 1, log − Rx + .
γ t 2
Since
430 7. An Assignment Problem
 
1 1 1 1
min 1, log ≤ Rx ≤ log ,
γ t γ t
we have proved that
 2
1 1 1
0 ≤ P(a ≥ t) − ψ(t) ≤ log . (7.95)
2 γ t

For a real number y we write y + = max(y, 0), so that |y| = −y + 2y + . We


use this relation for y = P(a ≥ t) − P(a ≥ t), so that since P(a ≥ t) ≥ ψ(t)
we obtain
y + ≤ (P(a ≥ t) − ψ(t))+ = P(a ≥ t) − ψ(t) ,
and

|P(a ≥ t) − P(a ≥ t)| ≤ P(a ≥ t) − P(a ≥ t) + 2(P(a ≥ t) − ψ(t)) . (7.96)

Since a ≤ 1, for t > 1 we then have

|P(a ≥ t) − P(a ≥ t)| = P(a ≥ t) = P(a ≥ t) − P(a ≥ t) . (7.97)

Using (7.96) for t ≤ 1 and (7.97) for t > 1 we obtain, using (7.95) in the
second inequality,
∞ 1
|P(a ≥ t) − P(a ≥ t)| dt ≤ 2 (P(a ≥ t) − ψ(t)) dt
0 0
∞ ∞
+ P(a ≥ t) dt − P(a ≥ t) dt
0 0
L
≤ + Ea − Ea .
γ2

Finally we use that by (7.39) we have |Ea − Ea | ≤ L/γ 2 , and this concludes
the proof that E |Y − Y | ≤ L/γ 2 .
We turn to the control of E(Y − Y )2 . First, we observe that

E(Y − Y )2 ≤ 2E(Y − min(Y , 2))2 + 2E(min(Y , 2) − Y )2 .

Now, since Y ≤ 1, we have

E(Y − min(Y , 2))2 = E(min(Y, 2) − min(Y , 2))2


≤ 2E| min(Y, 2) − min(Y , 2)|
L
≤ 2E|Y − Y | ≤ 2 .
γ

The r.v. A = Y − min(Y , 2) satisfies

A>0⇒A=Y −2,
7.6 Operators 431

so that if t > 0 we have P(A ≥ t) = P(Y ≥ t + 2). Since Y and a have the
same distribution, it holds:

E(min(Y , 2) − Y )2 = EA2 = 2 tP(Y ≥ t + 2)dt
0

=2 tP(a ≥ t + 2)dt .
0

To estimate P(a ≥ t), we write, for λ > 0

P(a ≥ t) ≤ exp(−λ t) E exp λ a


 R
= exp(−λ t) E exp(λ exp(−γ R X))

and, using (7.39) in the second inequality, and a power expansion of eλ to


obtain the third inequality, we get
 λp
E exp(λ exp(−γ R X)) = E exp(−γ R p X)
p!
p≥0
 λp eλ
≤ 1+ ≤1+
p! p γ R γR
p≥1
 λ 
e
≤ exp
γR

so that  

P(a ≥ t) ≤ exp − λt .
γ
Taking λ = log γ > 0, we get

P(a ≥ t) ≤ L γ −t

so that since γ ≥ 2 we obtain



L
t P(a ≥ t + 2) dt ≤ . 

0 γ2

Research Problem 7.6.10. (Level 2) Is it true that given an integer n,


there exists a constant K(α, n), and independent r.v.s U1 , . . . , Un of law μ
with
 K(α, n)
E (uN,M (i) − Ui )2 ≤ ? (7.98)
N
i≤n

Proof of Theorem 7.1.2. We will stay somewhat informal in this proof.
We write A_{N,M} = E log Z_{N,M}, so that

   A_{N,M} − A_{N,M−1} = E log(Z_{N,M}/Z_{N,M−1}) = −E log u_{N,M}(M − 1)

   A_{N,M} − A_{N−1,M} = E log(Z_{N,M}/Z_{N−1,M}) = −E log w_{N,M}(N) .

By Theorem 7.1.1, these quantities have limits −∫ log x dμ(x) and −∫ log x dν(x)
respectively. (To obtain the required tightness, we observe that from
(7.27), (7.28) and Markov's inequality we have P(u_{N,M}(M − 1) < t) ≤ Kt
and P(w_{N,M}(N) < t) ≤ Kt.) Setting M(R) = R(1 + α), we write

   A_{N,M} − A_{1,1} = I + II ,

where

   I = Σ_{2≤R≤M} (A_{R,M(R)} − A_{R−1,M(R)})

   II = Σ_{2≤R≤M} (A_{R−1,M(R)} − A_{R−1,M(R−1)}) .

For large R we have

   A_{R,M(R)} − A_{R−1,M(R)} ≃ −∫ log x dν(x) ,

and since M(R) − 2 ≤ M(R − 1) ≤ M(R) − 1, we also have

   A_{R−1,M(R)} − A_{R−1,M(R−1)} ≃ −(M(R) − M(R − 1)) ∫ log x dμ(x) .

The result follows.                                                      □


A direction that should be pursued is the detailed study of Gibbs' mea-
sure; the principal difficulty might be to discover fruitful formulations. If G
denotes Gibbs' measure, we should note the relation

   G({σ(i) = j}) = a(i, j) Z_{N,M}(i; j)/Z_{N,M} ≃ a(i, j) w_{N,M}(i) u_{N,M}(j) .   (7.99)

Also, if i₁ ≠ i₂ and j₁ ≠ j₂, we have

   G({σ(i₁) = j₁ ; σ(i₂) = j₂}) = a(i₁, j₁) a(i₂, j₂) Z_{N,M}(i₁, i₂; j₁, j₂)/Z_{N,M} .   (7.100)

One can generalize (7.7) to show that

   Z_{N,M}(i₁, i₂; j₁, j₂)/Z_{N,M} ≃ w_{N,M}(i₁) w_{N,M}(i₂) u_{N,M}(j₁) u_{N,M}(j₂) ,

so comparing (7.99) and (7.100) we get

   G({σ(i₁) = j₁ ; σ(i₂) = j₂}) ≃ G({σ(i₁) = j₁}) G({σ(i₂) = j₂}) .

The difficulty in finding a nice formulation, however, is that the previous
relation holds for most values of j₁ and j₂ simply because both sides are
nearly zero!

7.7 Notes and Comments

A recent paper [169] suggests that it could be of interest to investigate the
following model. The configuration space consists of all pairs (A, σ) where
A is a subset of {1, . . . , N}, and where σ is a one-to-one map from A to
{1, . . . , N}. The Hamiltonian is then given by

   H_N((A, σ)) = −C card A + βN Σ_{i∈A} c(i, σ(i)) ,                    (7.101)

where C is a constant and the c(i, j) are as previously. The idea of the Hamil-
tonian is that the term −C card A favors the pairs (A, σ) for which card A is
large. It seems likely that, given C, results of the same nature as those we
proved can be obtained for this model when β ≤ β(C), but that it will be
difficult to prove the existence of a number β₀ such that these results hold
for β ≤ β₀, independently of the value of C, and even more difficult to prove
that (as the results of [169] seem to indicate) they will hold for any value of
C and of β.
A. Appendix: Elements of Probability Theory

A.1 How to Use this Appendix

This appendix lists some well-known and some less well-known facts about
probability theory. The author does not have the energy to give a reference in
the printed literature for the well-known facts, for the simple reason that he
has not opened a single textbook over the last three decades. However all the
statements that come without proof should be in standard textbooks, two of
which are [10] and [161]. Of course the less well-known facts are proved in
detail.
The appendix is not designed to be read from the first line. Rather one
should refer to each section as the need arises. If you do not follow this advice,
you might run into difficulties, such as meeting the notation L before having
learned that this always stands for a universal constant (= a number).

A.2 Differentiation Inside an Expectation

For the purpose of differentiation inside an integral sign, or, equivalently,
inside an expectation, the following result will suffice. It follows from
Lebesgue's dominated convergence theorem. If that is too fancy, much more
basic versions of the same principle suffice, and can be found in Wikipedia.
Proposition A.2.1. Consider a random function ψ(t) defined on an inter-
val J of R, and assume that E|ψ(t)| < ∞ for each t ∈ J. Assume that the
function ψ(t) is always continuously differentiable, and that for each compact
subinterval I of J one has

   E sup_{t∈I} |ψ′(t)| < ∞ .                                            (A.1)

Then the function ϕ(t) = Eψ(t) is continuously differentiable and

   ϕ′(t) = Eψ′(t) .                                                     (A.2)

As an illustration we give a proof of (1.41).


Proposition A.2.2. Consider an infinitely differentiable function F on R^M,
such that all its partial derivatives are of "moderate growth" in the sense of
(A.18). Consider two independent centered jointly Gaussian families u =
(u_i)_{i≤M}, v = (v_i)_{i≤M}, and let u_i(t) = √t u_i + √(1 − t) v_i, u(t) = (u_i(t))_{i≤M}.
Consider the function

   ϕ(t) = EF(u(t)) .                                                    (A.3)

Let

   u_i′(t) = (d/dt) u_i(t) = u_i/(2√t) − v_i/(2√(1 − t)) .

Then

   ϕ′(t) = E Σ_{i≤M} u_i′(t) (∂F/∂x_i)(u(t)) .                          (A.4)

Proof. We prove (A.1) with ψ(t) = F(u(t)). We write first that for a
compact subinterval I of ]0, 1[ we have

   sup_{t∈I} |u_i′(t) (∂F/∂x_i)(u(t))| ≤ sup_{t∈I} |u_i′(t)| sup_{t∈I} |(∂F/∂x_i)(u(t))| .

Using the Cauchy-Schwarz inequality, to prove (A.1) it suffices to prove that

   E sup_{t∈I} |u_i′(t)|² < ∞

and

   E sup_{t∈I} |(∂F/∂x_i)(u(t))|² < ∞ .                                 (A.5)

We prove only the second inequality, since the first one is rather immediate.
Using that ∂F/∂x_i is of moderate growth (as in (A.18)), given any a > 0 we
first see that there is a constant A such that

   |(∂F/∂x_i)(x)| ≤ A exp a‖x‖² ,

and since

   ‖u(t)‖ ≤ √t ‖u‖ + √(1 − t) ‖v‖ ≤ 2 max(‖u‖, ‖v‖)

we obtain

   sup_{t∈I} |(∂F/∂x_i)(u(t))| ≤ A max(exp 2a‖u‖², exp 2a‖v‖²)
                               ≤ A exp 2a Σ_{i≤M} (u_i² + v_i²) ,

so that (A.5) follows from Hölder's inequality and the integrability properties
of Gaussian r.v.s, namely the fact that if g is a Gaussian r.v. then E exp ag² <
∞ for aEg² < 1/2, as follows from (A.11) below.                          □
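As a quick sanity check on (A.4), one may compare its right-hand side with
the exact derivative in a case where everything is computable. The following
Python sketch (an illustration, not part of the original argument; the choices
M = 1, F(x) = x⁴, Eu² = 2, Ev² = 1 are arbitrary) does this; there ϕ(t) =
3(1 + t)², so ϕ′(t) = 6(1 + t).

   # Monte Carlo sketch of (A.4) for M = 1, F(x) = x**4, Eu^2 = 2, Ev^2 = 1.
   import numpy as np

   rng = np.random.default_rng(0)
   n = 2 * 10**6
   u = rng.normal(0.0, np.sqrt(2.0), n)
   v = rng.normal(0.0, 1.0, n)
   t = 0.3
   ut = np.sqrt(t) * u + np.sqrt(1 - t) * v
   ut_prime = u / (2 * np.sqrt(t)) - v / (2 * np.sqrt(1 - t))
   rhs = np.mean(ut_prime * 4 * ut**3)   # Monte Carlo estimate of (A.4)
   print(rhs, 6 * (1 + t))               # both values are close to 7.8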


A.3 Gaussian Random Variables

A (centered) Gaussian r.v. g has a density of the type

   (1/√(2π)τ) exp(−t²/(2τ²)) ,

so that E g² = τ². When τ = 1, g is called standard Gaussian. We hardly ever
use non-centered Gaussian r.v.s, so that the expression "consider a Gaussian
r.v. z" means "consider a centered Gaussian r.v. z". A fundamental fact is
that

   E exp ag = exp(a²τ²/2) .                                             (A.6)

Indeed,

   E exp ag = (1/√(2π)τ) ∫_{−∞}^∞ exp(at − t²/(2τ²)) dt
            = exp(a²τ²/2) (1/√(2π)τ) ∫_{−∞}^∞ exp(−(t − aτ²)²/(2τ²)) dt
            = exp(a²τ²/2) .
For a r.v. Y ≥ 0 and s > 0 we have Markov's inequality

   P(Y ≥ s) ≤ (1/s) EY .                                                (A.7)

Using this for Y = exp(λX), where X is any r.v., we obtain for any λ ≥ 0
the following fundamental inequality:

   P(X ≥ t) = P(exp(λX) ≥ e^{λt}) ≤ e^{−λt} E exp(λX) .                 (A.8)

Changing X into −X and t into −t, we get the following equally useful fact:

   P(X ≤ t) ≤ e^{λt} E exp(−λX) .

Combining (A.6) with (A.8) we get for any t ≥ 0 that

   P(g ≥ t) ≤ exp(−λt + λ²τ²/2) ,

and taking λ = t/τ²,

   P(g ≥ t) ≤ exp(−t²/(2τ²)) .                                          (A.9)

Elementary estimates (to be found in any probability book worth its price)
show that for t > 0 we have, for some number L,
   P(g ≥ t) ≥ (1/(L(1 + t/τ))) exp(−t²/(2τ²)) .                         (A.10)

This is actually proved in (3.137) page 229, a way to show that this book is
worth what you paid for. There is of course a more precise understanding of
the tails of g than (A.9) and (A.10); but (A.9) and (A.10) will mostly suffice
here. Another fundamental formula is that when Eg² = τ², then for 2aτ² < 1
and any b we have

   E exp(ag² + bg) = (1/√(1 − 2aτ²)) exp(τ²b²/(2(1 − 2aτ²))) .          (A.11)

Indeed,

   E exp(ag² + bg) = (1/√(2π)τ) ∫_{−∞}^∞ exp(at² − t²/(2τ²) + bt) dt .

We then complete the squares by writing

   at² − t²/(2τ²) + bt = −((1 − 2aτ²)/(2τ²))(t − bτ²/(1 − 2aτ²))²
                         + b²τ²/(2(1 − 2aτ²))

and conclude by making the change of variable

   t = bτ²/(1 − 2aτ²) + uτ/√(1 − 2aτ²) .
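Both (A.6) and (A.11) are easily confirmed numerically; in the sketch below
(an illustration, not from the text; the values τ = 1.5, a = 0.1, b = 0.7 are
arbitrary, subject to 2aτ² < 1) the empirical averages match the closed forms.

   # Monte Carlo check of (A.6) and (A.11); tau, a, b are arbitrary test values.
   import numpy as np

   rng = np.random.default_rng(1)
   tau, a, b = 1.5, 0.1, 0.7              # chosen so that 2*a*tau**2 < 1
   g = rng.normal(0.0, tau, 10**6)
   print(np.mean(np.exp(a * g)), np.exp(a**2 * tau**2 / 2))     # (A.6)
   d = 1 - 2 * a * tau**2
   print(np.mean(np.exp(a * g**2 + b * g)),
         np.exp(tau**2 * b**2 / (2 * d)) / np.sqrt(d))          # (A.11)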

The following is also important.

Lemma A.3.1. Consider M Gaussian r.v.s (g_i)_{i≤M} with Eg_i² ≤ τ² for each
i ≤ M. We do NOT assume that they are independent. Then we have

   E max_{i≤M} g_i ≤ τ √(2 log M) .                                     (A.12)

Proof. Consider β > 0. Using Jensen's inequality (1.23) as in (1.24) and
(A.6) we have

   E log Σ_{i≤M} exp βg_i ≤ log E Σ_{i≤M} exp βg_i
                          ≤ log(M exp(β²τ²/2))
                          = β²τ²/2 + log M .                            (A.13)

Now

   β max_{i≤M} g_i ≤ log Σ_{i≤M} exp βg_i ,

so that, using (A.13),

   β E max_{i≤M} g_i ≤ E log Σ_{i≤M} exp βg_i ≤ β²τ²/2 + log M .

Taking β = √(2 log M)/τ yields (A.12).                                   □

An important fact is that when the r.v.s g_i are independent, the inequality
(A.12) can essentially be reversed,

   E max_{i≤M} g_i ≥ (τ/L) √(log M) .

We do not provide the simple proof, since we will not use this statement.
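For independent standard Gaussian r.v.s both bounds are easy to observe
numerically (a sketch, not from the text):

   # Average of max of M i.i.d. standard Gaussians versus sqrt(2 log M) of (A.12).
   import numpy as np

   rng = np.random.default_rng(2)
   for M in (10, 100, 1000, 10000):
       avg = rng.normal(size=(500, M)).max(axis=1).mean()
       print(M, avg, np.sqrt(2 * np.log(M)))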
Given independent standard Gaussian r.v.s g₁, . . . , g_M, their joint law
has density (2π)^{−M/2} exp(−‖x‖²/2), where ‖x‖² = Σ_{i≤M} x_i². This density is
invariant by rotation, and, as a consequence, the law of every linear combi-
nation z = Σ_{i≤M} a_i g_i is Gaussian. The set G of these linear combinations is
a vector space, each element of which is a Gaussian r.v. Such a space is often
called a Gaussian space. It has a natural dot product, given with obvious
notation by E zz′ = Σ_{k≤M} a_k a_k′. Given two linear subspaces F₁, F₂ of G, if
these spaces are orthogonal, i.e. E z₁z₂ = 0 whenever z₁ ∈ F₁, z₂ ∈ F₂, they
are probabilistically independent. This is obvious from rotational invariance,
since after a suitable rotation these spaces are spanned by two disjoint subsets
of g₁, . . . , g_M.
   We say that a family z₁, . . . , z_N of r.v.s is jointly Gaussian if the law
of every linear combination Σ_{k≤N} a_k z_k is Gaussian. If z₁, . . . , z_N belong
to a Gaussian space G as above, then obviously the family z₁, . . . , z_N is
jointly Gaussian. All the jointly Gaussian families considered in this book
will obviously be of this type, since they are defined by explicit formulas such
as z_k = Σ_{i≤M} a_{k,i} g_i where g₁, . . . , g_M are independent standard Gaussian
r.v.s, a formula that we abbreviate as z_k = a_k · g where g = (g₁, . . . , g_M),
a_k = (a_{k,1}, . . . , a_{k,M}) and · denotes the dot product in R^M. For the beauty of
it, let us mention that, in distribution, any jointly Gaussian family z₁, . . . , z_N
can be represented as above as z_k = a_k · g (with M = N). This is simply be-
cause the joint law of a jointly Gaussian family z₁, . . . , z_N is determined by the
numbers Ez_k z_ℓ, so that it suffices to find the vectors a_k in such a manner that
Ez_k z_ℓ = a_k · a_ℓ. If we think of the linear span of the r.v.s z₁, . . . , z_N, provided
with the dot product z · z′ = Ezz′, as a Euclidean space, and of z₁, . . . , z_N
as points in this space, they provide exactly such a family of vectors.
   Another interesting fact is the following. If (q_{u,v})_{u,v≤n} is a symmetric
positive definite matrix, there exist jointly Gaussian r.v.s (Y_u)_{u≤n} such that
E Y_u Y_v = q_{u,v}. This is obvious when the matrix (q_{u,v}) is diagonal; the gen-
eral case follows from the fact that a symmetric matrix diagonalizes in an
orthogonal basis.
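This last remark is constructive. A minimal sketch (not from the text; the
matrix q below is an arbitrary example) builds (Y_u) from a given symmetric
positive definite matrix by diagonalizing it, exactly as suggested above.

   # Build jointly Gaussian (Y_u) with E Y_u Y_v = q[u, v] by diagonalization.
   import numpy as np

   q = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])          # symmetric positive definite
   w, P = np.linalg.eigh(q)                 # q = P diag(w) P^T
   root = P @ np.diag(np.sqrt(w)) @ P.T     # symmetric square root of q
   rng = np.random.default_rng(3)
   g = rng.normal(size=(3, 10**6))          # independent standard Gaussians
   Y = root @ g                             # each column is a sample of (Y_u)
   print(np.cov(Y))                         # empirical covariance, close to q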

A.4 Gaussian Integration by Parts


Given a continuously differentiable function F on R (that satisfies the growth
condition at infinity stated below in (A.15)) and a centered Gaussian r.v. g,
we have the integration by parts formula

   E gF(g) = E g² E F′(g) .                                             (A.14)

To see this, if E g² = τ², we have

   E gF(g) = (1/√(2π)τ) ∫_R t exp(−t²/(2τ²)) F(t) dt
           = (τ²/√(2π)τ) ∫_R exp(−t²/(2τ²)) F′(t) dt
           = E g² E F′(g) ,

provided

   lim_{|t|→∞} F(t) exp(−t²/(2τ²)) = 0 .                                (A.15)

This formula is used over and over in this work. As a first application, if
Eg² = τ² and 2aτ² < 1 we have

   Eg² exp ag² = Eg(g exp ag²) = τ² (E exp ag² + 2aEg² exp ag²) ,       (A.16)

so that

   (1 − 2aτ²) Eg² exp ag² = τ² E exp ag² = τ²/√(1 − 2aτ²)

by (A.11), and Eg² exp ag² = τ²(1 − 2aτ²)^{−3/2}. As another application, if
k ≥ 2,

   Eg^k = Eg·g^{k−1} = τ²(k − 1) Eg^{k−2} ,

so that in particular Eg⁴ = 3τ⁴, and one can recursively compute all the
moments of g. All kinds of Gaussian integrals can be computed effortlessly
in this manner.
   Condition (A.15) holds in particular if F is of moderate growth in the
sense that lim_{|t|→∞} F(t) exp(−at²) = 0 for each a > 0. A function F (with a
regular behavior, as will be the case for all the functions we consider) fails to
be of moderate growth if "it grows as fast as exp(at²) for some a > 0". The
functions to which we will apply the integration by parts formula typically
do not "grow faster than exp(At)" for a certain number A (except in the case
of certain very explicit functions such as in (A.16)).
Formula (A.14) generalizes as follows. Given g, z₁, . . . , z_n in a Gaussian
space G, and a function F of n variables (with a moderate behavior at infinity,
to be stated in (A.18) below), we have

   E gF(z₁, . . . , z_n) = Σ_{ℓ≤n} E(gz_ℓ) E (∂F/∂x_ℓ)(z₁, . . . , z_n) .   (A.17)

This is probably the single most important formula in this work. For a proof,
consider the r.v.s

   z_ℓ′ = z_ℓ − g E(z_ℓ g)/E g² .

They satisfy E z_ℓ′ g = 0; thus g is independent of the family (z_ℓ′)_{ℓ≤n}. We
then apply (A.14) at (z_ℓ′)_{ℓ≤n} given. Since z_ℓ = z_ℓ′ + g E(gz_ℓ)/E g², (A.17)
follows whenever the following is satisfied, to make the use of (A.14) legitimate
(and to allow the interchange of the expectation in g and in the family
(z_ℓ′)_{ℓ≤n}): for each number a > 0, we have

   lim_{‖x‖→∞} |F(x)| exp(−a‖x‖²) = 0 .                                 (A.18)
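Since (A.17) is used constantly in this work, a quick numerical confirmation
may reassure the reader. The sketch below (not from the text; the function
F(z₁, z₂) = z₁ exp z₂ and the coefficients are arbitrary test choices) checks it
for a pair of Gaussians correlated with g.

   # Check of (A.17) for F(z1, z2) = z1 * exp(z2); E(g z1) = 0.8, E(g z2) = 0.3.
   import numpy as np

   rng = np.random.default_rng(4)
   g1, g2, g3 = rng.normal(size=(3, 2 * 10**6))   # independent standard Gaussians
   g = g1
   z1 = 0.8 * g1 + 0.6 * g2
   z2 = 0.3 * g1 + 0.9 * g3
   lhs = np.mean(g * z1 * np.exp(z2))
   rhs = 0.8 * np.mean(np.exp(z2)) + 0.3 * np.mean(z1 * np.exp(z2))
   print(lhs, rhs)                                # the two values nearly agree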

A.5 Tail Estimates

We recall that, given any r.v. X and λ > 0, by (A.8) we have

   P(X ≥ t) ≤ e^{−λt} E exp λX .

If X = Σ_{i≤N} X_i where (X_i)_{i≤N} are independent, then

   E exp λX = Π_{i≤N} E exp λX_i ,

so that

   P(X ≥ t) ≤ e^{−λt} Π_{i≤N} E exp λX_i = exp(−λt + Σ_{i≤N} log E exp λX_i) .   (A.19)

If (η_i)_{i≤N} are independent Bernoulli r.v.s, i.e. P(η_i = ±1) = 1/2, then
E exp λa_iη_i = ch λa_i, and thus

   P(Σ_{i≤N} a_iη_i ≥ t) ≤ exp(−λt + Σ_{i≤N} log ch λa_i) .             (A.20)

It is obvious on power series expansions that ch t ≤ exp(t²/2), so that

   P(Σ_{i≤N} a_iη_i ≥ t) ≤ exp(−λt + (λ²/2) Σ_{i≤N} a_i²) ,

and by optimization over λ, for all t ≥ 0,

   P(Σ_{i≤N} a_iη_i ≥ t) ≤ exp(−t²/(2 Σ_{i≤N} a_i²)) .                  (A.21)
This inequality is often called the subgaussian inequality. By symmetry,
P(Σ_{i≤N} a_iη_i ≤ −t) is bounded by the same expression, so that

   P(|Σ_{i≤N} a_iη_i| ≥ t) ≤ 2 exp(−t²/(2 Σ_{i≤N} a_i²)) .              (A.22)

As a consequence of (A.21) we have the following:

   card{(σ¹, σ²) ∈ Σ_N² ; R_{1,2} ≥ t} ≤ 2^{2N} exp(−Nt²/2) .           (A.23)

This is seen by taking a_i = 1/N, by observing that for the uniform measure
on Σ_N² the sequence η_i = σ_i¹σ_i² is an independent Bernoulli sequence and
that R_{1,2} = Σ_{i≤N} a_iη_i .
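For small N the count in (A.23) can be compared with its exact value (a
numerical sketch, not from the text). By symmetry the count equals 2^N
times the number of sequences η with Σ_{i≤N} η_i ≥ tN.

   # Exact tail count of the overlap R_{1,2} versus the bound (A.23).
   from math import comb, exp

   N, t = 20, 0.5
   exact = 2**N * sum(comb(N, m) for m in range(N + 1) if 2 * m - N >= t * N)
   bound = 2**(2 * N) * exp(-N * t**2 / 2)
   print(exact, bound)      # the exact count stays below the bound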
Related to (A.21) is the fact that

   E exp((1/2)(Σ_{i≤N} a_iη_i)²) ≤ 1/√(1 − Σ_{i≤N} a_i²) .              (A.24)

Equivalently,

   2^{−N} Σ exp((1/2)(Σ_{i≤N} a_iσ_i)²) ≤ 1/√(1 − Σ_{i≤N} a_i²) ,

where the summation is over all sequences (σ_i)_{i≤N} with σ_i = ±1. To prove
(A.24) we consider a standard Gaussian r.v. g independent of the r.v.s η_i
and, using (A.6), we have, denoting by E_g expectation in g only, and using
again that log ch t ≤ t²/2,

   E exp((1/2)(Σ_{i≤N} a_iη_i)²) = E E_g exp(g Σ_{i≤N} a_iη_i)
                                 = E_g exp(Σ_{i≤N} log ch ga_i)
                                 ≤ E_g exp((g²/2) Σ_{i≤N} a_i²)
                                 = 1/√(1 − Σ_{i≤N} a_i²) .

It follows from (A.24) that if S = Σ_{i≤N} a_i², then, if b_i = a_i/√(2S), we have
Σ_{i≤N} b_i² = 1/2 and

   E exp((1/(4S))(Σ_{i≤N} a_iη_i)²) = E exp((1/2)(Σ_{i≤N} b_iη_i)²)
                                    ≤ 1/√(1 − 1/2) ≤ 2 .
Since exp x ≥ xⁿ/n! ≥ xⁿ/nⁿ for each n and x ≥ 0, we see that

   E (Σ_{i≤N} a_iη_i)^{2n} ≤ 2(4n)ⁿ Sⁿ = 2(4n)ⁿ (Σ_{i≤N} a_i²)ⁿ ,       (A.25)

a relation known as Khinchin's inequality.


Going back to (A.20), if ai = 1 for each i ≤ N , changing t into N t, we
get  
P ηi ≥ N t ≤ exp N (−λt + log ch λ) .
i≤N

If 0 ≤ t < 1, the exponent is minimized for th λ = t, i.e.

eλ − e−λ e2λ − 1
λ −λ
= 2λ =t,
e +e e +1

so that e2λ = (1 + t)/(1 − t) and


1
λ= (log(1 + t) − log(1 − t)) .
2
Also, ch−2 λ = 1 − th2 λ = 1 − t2 , so that
1
log ch λ = − log(1 − t2 ) ,
2
and
1
min (−λt + log ch λ) = − (t log(1 + t) − t log(1 − t))
λ 2
1 1
− log(1 − t) − log(1 + t)
2 2
= −I(t) (A.26)

where
1
I(t) = ((1 + t) log(1 + t) + (1 − t) log(1 − t)) . (A.27)
2
The function I(t) is probably better understood by noting that
1
I(0) = I (0) = 0 , I (t) = . (A.28)
1 − t2
It follows from (A.26) that

   P(Σ_{i≤N} η_i ≥ Nt) ≤ exp(−NI(t)) ,

or, equivalently, that

   card{σ ∈ Σ_N ; Σ_{i≤N} σ_i ≥ tN} ≤ 2^N exp(−NI(t)) .                 (A.29)

If k is an integer, then Σ_{i≤N} σ_i = k exactly when the sequence (σ_i)_{i≤N}
contains (N + k)/2 times 1 and (N − k)/2 times −1. This is impossible when
N + k is odd. When N + k is even, using Stirling's formula n! ∼ nⁿe^{−n}√(2πn),
we obtain

   card{σ ∈ Σ_N ; Σ_{i≤N} σ_i = k} = N!/(((N + k)/2)! ((N − k)/2)!)

      ≥ (1/L) √(N/((N − k)(N + k))) · N^N/(((N + k)/2)^{(N+k)/2} ((N − k)/2)^{(N−k)/2})

      ≥ (2^N/(L√N)) · 1/((1 + k/N)^{(N+k)/2} (1 − k/N)^{(N−k)/2})

      = (2^N/(L√N)) exp(−N I(k/N)) .                                     (A.30)

This reverses the inequality (A.29) within the factor L√N.
   Since by Lemma 4.3.5 the function t ↦ log ch √t is concave, it follows
from (A.20) that

   P(Σ_{i≤N} a_iη_i ≥ tN) ≤ exp N(−λt + log ch(λ √((1/N) Σ_{i≤N} a_i²)))

and, using (A.26),

   P(Σ_{i≤N} a_iη_i ≥ tN) ≤ exp(−N I(t/√((1/N) Σ_{i≤N} a_i²))) .         (A.31)
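The pair (A.29)-(A.30) brackets this count within a factor L√N, which is
easy to inspect numerically (a sketch, not from the text):

   # card{sigma : sum of sigma_i = k} against 2^N exp(-N I(k/N)); cf. (A.29), (A.30).
   from math import comb, exp, log, sqrt

   def I(t):
       return 0.5 * ((1 + t) * log(1 + t) + (1 - t) * log(1 - t))

   N = 1000
   for k in (0, 100, 200, 400):
       exact = comb(N, (N + k) // 2)
       bound = 2**N * exp(-N * I(k / N))
       print(k, bound / exact, sqrt(N))   # the ratio is of order sqrt(N)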

A.6 How to Use Tail Estimates

It will often occur that for a r.v. X we know an upper bound for the prob-
abilities P(X ≥ t), and that we want to deduce an upper bound for EF(X)
for a certain function F. For example, if Y is a r.v., Y ≥ 0, then

   EY = ∫_0^∞ P(Y ≥ t) dt ,                                             (A.32)

using Fubini's theorem to compute the "area under the graph of Y".
   More generally, if X ≥ 0 and F is a continuously differentiable non-
decreasing function on R⁺ we have
   F(X) = F(0) + ∫_0^X F′(t) dt = F(0) + ∫ 1_{t≤X} F′(t) dt .

Taking expectation, and using Fubini's theorem to exchange the integral in
t and the expectation, we get that

   EF(X) = F(0) + ∫_0^∞ F′(t) P(X ≥ t) dt .                             (A.33)

For a typical application of (A.33) let us assume that X satisfies the following
tail condition:

   ∀ t ≥ 0 ,  P(|X| ≥ t) ≤ 2 exp(−t²/(2A²)) ,                           (A.34)

where A is a certain number. Then, using (A.33) for F(x) = x^k and |X|
instead of X we get

   E|X|^k ≤ 2k ∫_0^∞ t^{k−1} exp(−t²/(2A²)) dt .

The right-hand side can be recursively computed by integration by parts. If
k ≥ 3,

   ∫_0^∞ t^{k−1} exp(−t²/(2A²)) dt = (k − 2)A² ∫_0^∞ t^{k−3} exp(−t²/(2A²)) dt .

In this manner one obtains e.g.

   EX^{2k} ≤ 2^{k+1} k! A^{2k} .

This shows in particular that "the moments of order k of X grow at most
like √k". Indeed, using the crude inequality k! ≤ k^k we obtain

   (E|X|^k)^{1/k} ≤ (EX^{2k})^{1/2k} ≤ 2A√k .                           (A.35)
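As a concrete check (a numerical sketch, not from the text), a centered
Gaussian r.v. with EX² = A² satisfies (A.34) by (A.9), and its even moments
EX^{2k} = (2k − 1)!! A^{2k} indeed stay below 2^{k+1} k! A^{2k}:

   # Even Gaussian moments (2k-1)!! A^(2k) versus the bound 2^(k+1) k! A^(2k).
   from math import factorial

   A = 1.3
   for k in range(1, 8):
       exact = factorial(2 * k) // (2**k * factorial(k)) * A**(2 * k)
       bound = 2**(k + 1) * factorial(k) * A**(2 * k)
       print(k, exact, bound)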

Suppose, conversely, that for a given r.v. X we know that for a certain number
B and any k ≥ 1 we have EX^{2k} ≤ B^{2k} k^k (i.e. an inequality of the type
(A.35) for even moments). Then, using the power expansion exp x² =
Σ_{k≥0} x^{2k}/k!, for any number C we have

   E exp(X²/C²) = Σ_{k≥0} EX^{2k}/(C^{2k} k!) ≤ Σ_{k≥0} B^{2k} k^k/(C^{2k} k!) .

Now, by Stirling's formula, there is a constant L₀ such that k^k ≤ L₀^k k!, and
therefore there is a number L (e.g. L = 2L₀) such that

   E exp(X²/(LB²)) ≤ 2 .

This implies in turn that

   P(X ≥ t) ≤ 2 exp(−t²/(LB²)) .

Many r.v.s considered in this work satisfy the condition (A.34). The previous
considerations explain why, when convenient, we control these r.v.s through
their moments.
If F is a continuously differentiable non-decreasing function on R, F ≥ 0,
F(−∞) = 0, we have

   F(X) = ∫_{−∞}^X F′(t) dt = ∫ 1_{t≤X} F′(t) dt .

Taking expectation, and using again Fubini's theorem to exchange the inte-
gral in t and the expectation, we get now that

   E F(X) = ∫_{−∞}^∞ F′(t) P(X ≥ t) dt .                                (A.36)

This no longer assumes that X ≥ 0. Considering now a < b, we have

   E(F(min(X, b)) 1_{X≥a}) = F(a) P(X ≥ a) + ∫_a^b F′(t) P(X ≥ t) dt .  (A.37)

This is seen by using (A.36) for the conditional probability that X ≥ a, and
for the r.v. min(X, b) instead of X.

A.7 Bernstein’s Inequality

Theorem A.7.1. Consider a r.v. X with EX = 0 and an independent se-
quence (X_i)_{i≤N} distributed like X. Assume that, for a certain number A, we
have

   E exp(|X|/A) ≤ 2 .                                                   (A.38)

Then, for all t > 0 we have

   P(Σ_{i≤N} X_i ≥ t) ≤ exp(−min(t²/(4NA²), t/(2A)))                    (A.39)

   P(Σ_{i≤N} X_i ≥ t) ≤ exp(−(t²/(2NEX²))(1 − 4A³t/(N(EX²)²))) .        (A.40)
A.7 Bernstein’s Inequality 447

Proof. From (A.19) we obtain


 
P Xi ≥ t ≤ exp(−λt + N log E exp λX) . (A.41)
i≤N

We have
E exp λX = 1 + E ϕ(λX) (A.42)
where ϕ(x) = ex − x − 1. We observe that Eϕ(|X|/A) ≤ E exp(|X|/A) − 1 =
2 − 1 = 1. Now power series expansion yields that ϕ(x) ≤ ϕ(|x|) and that for
x > 0, the function λ → ϕ(λx)/λ2 increases. Thus, for λ ≤ 1/A, we have

E ϕ(λX) ≤ λ2 A2 E ϕ(|X|/A) ≤ λ2 A2 .

Combining (A.42) with the inequality log(1+x) ≤ x, we obtain log E exp λX ≤


λ2 A2 . Consequently (A.41) implies
 
P Xi ≥ t ≤ exp(−λt + N λ2 A2 ) .
i≤N

We choose λ = t/2N A2 if t ≤ 2N A (so that λ ≤ 1/A). When t ≥ 2N A, we


choose λ = 1/A, and then
t t
−λt + N λ2 A2 = − +N ≤− .
A 2A
This proves (A.39). To prove (A.40) we replace (A.42) by

λ2 EX 2
E exp λX = 1 + + E ϕ1 (λX)
2
where ϕ1 (x) = ex −x2 /2−x−1. We observe that Eϕ1 (|X|/A) ≤ Eϕ(|X|/A) ≤
1. Using again power series expansion yields ϕ1 (x) ≤ ϕ1 (|x|) and that for
x > 0 the function λ → ϕ1 (λx)/λ3 increases. Thus, if λ ≤ 1/A, we get

E ϕ1 (λX) ≤ λ3 A3 E ϕ1 (|X|/A) ≤ λ3 A3

so that log E exp λX ≤ λ2 EX 2 /2 + λ3 A3 and we choose λ = t/N EX 2 to


obtain (A.40) when t ≤ N EX 2 /A. When t ≥ N EX 2 /A, then

4A3 t 4A2
2 2
≥ ≥1
N (EX ) EX 2

because EX 2 /2A2 ≤ E exp |X|/A ≤ 2. Thus (A.40) is automatically satisfied


in that case since the right-hand side is ≥ 1. 
Another important version of Bernstein’s inequality assumes that

|X| ≤ A . (A.43)
In that case, for p ≥ 2 we have E|X|^p ≤ A^{p−2} EX², so that when λ ≤ 1/A,
and since Σ_{p≥2} 1/p! = e − 2 ≤ 1,

   E ϕ(λX) = Σ_{p≥2} (λ^p/p!) E X^p ≤ λ² EX² Σ_{p≥2} (λA)^{p−2}/p! ≤ λ² EX² .

Proceeding as before, and taking now λ = min(t/(2NEX²), 1/A), we get

   P(Σ_{i≤N} X_i ≥ t) ≤ exp(−min(t²/(4N EX²), t/(2A))) .                (A.44)
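Before moving on, here is a quick simulation of (A.44) (a sketch, not from
the text; the uniform distribution on [−1, 1], for which A = 1 and EX² = 1/3,
is an arbitrary choice):

   # Empirical tail of a sum of N uniform[-1,1] r.v.s versus the bound (A.44).
   import numpy as np

   rng = np.random.default_rng(5)
   N, t = 50, 10.0
   S = rng.uniform(-1.0, 1.0, size=(10**5, N)).sum(axis=1)
   EX2 = 1.0 / 3.0
   emp = np.mean(S >= t)
   bound = np.exp(-min(t**2 / (4 * N * EX2), t / 2))
   print(emp, bound)        # the empirical tail is well below the bound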

We will also need a version of (A.39) for martingale difference sequences.
Assume that we are given an increasing sequence (Ξ_i)_{0≤i≤N} of σ-algebras.
A sequence (X_i)_{1≤i≤N} is called a martingale difference sequence if X_i is Ξ_i-
measurable and E_{i−1}(X_i) = 0, where E_{i−1} denotes conditional expectation
given Ξ_{i−1}. Let us assume that for a certain number A we have

   ∀ i ≤ N ,  E_{i−1} exp(|X_i|/A) ≤ 2 .                                (A.45)

Exactly as before, this implies that for |λ|A ≤ 1 we have E_{i−1} exp λX_i ≤
exp λ²A². Thus

   E_{k−1} exp(λ Σ_{i≤k} X_i) = exp(λ Σ_{i≤k−1} X_i) E_{k−1} exp λX_k
                              ≤ exp(λ Σ_{i≤k−1} X_i + λ²A²) .

By decreasing induction over k, this shows that for each k we have

   E_{k−1} exp(λ Σ_{i≤N} X_i) ≤ exp(λ Σ_{i≤k−1} X_i + (N − k + 1)λ²A²) .

Using this for k = 1 and taking expectation yields E exp(λ Σ_{i≤N} X_i) ≤
exp(Nλ²A²). Use of the Chebyshev inequality as before gives

   P(Σ_{i≤N} X_i ≥ t) ≤ exp(−min(t²/(4NA²), t/(2A))) .                  (A.46)

A.8 ε-Nets

A ball of R^M is a convex balanced set with non-empty interior. The convex
hull of a set A is denoted by conv A.

Proposition A.8.1. Given a ball B of R^M and ε > 0, we can find a subset A
of B such that

   card A ≤ (1 + 1/ε)^M                                                 (A.47)

   ∀ x ∈ B ,  A ∩ (x + 2εB) ≠ ∅                                         (A.48)

   conv A ⊃ (1 − 2ε)B .                                                 (A.49)

Moreover, given a linear functional ϕ on R^M, we have

   sup_{x∈A} ϕ(x) ≥ (1 − 2ε) sup_{x∈B} ϕ(x) .                           (A.50)

As a corollary, we can find a subset A of (1 − 2ε)^{−1}B such that card A ≤
(1 + ε^{−1})^M and B ⊂ conv A. The case ε = 1/4 is of interest: card A ≤ 5^M
and sup_{x∈A} ϕ(x) ≥ (1/2) sup_{x∈B} ϕ(x).
Proof. We simply take for A a maximal subset of B such that the sets x + εB,
x ∈ A, are disjoint. These sets are of volume ε^M Vol B, and are entirely
contained in the set (1 + ε)B, which is of volume (1 + ε)^M Vol B. This proves
(A.47).
   Given x in B, we can find y in A with (x + εB) ∩ (y + εB) ≠ ∅, for otherwise
this would contradict the maximality of A. Thus y ∈ (x + 2εB) ∩ A. This
proves (A.48).
   Using (A.48), given x in B, we can find y₀ in A with x − y₀ ∈ 2εB.
Applying this to (x − y₀)/2ε, we find y₁ in A with x − y₀ − 2εy₁ ∈ (2ε)²B,
and in this manner we find a sequence (y_i) in A with

   x = Σ_{i≥0} (2ε)^i y_i ∈ (1 − 2ε)^{−1} conv A ,

since A is finite. This proves (A.49), of which (A.50) is an immediate conse-
quence.                                                                  □
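The proof is constructive, and the choice of A is easy to implement. The
sketch below (not from the text) builds such a set for the Euclidean ball in
R² by scanning random candidate points; a kept point x is one for which the
sets y + εB, y ∈ A ∪ {x}, remain disjoint, i.e. one at distance ≥ 2ε from
every point kept so far.

   # Greedy net for the Euclidean ball in R^2, in the spirit of Prop. A.8.1.
   import numpy as np

   rng = np.random.default_rng(6)
   eps = 0.25
   A = []
   for _ in range(10**5):
       x = rng.uniform(-1.0, 1.0, 2)
       if x @ x <= 1.0 and all(np.linalg.norm(x - y) >= 2 * eps for y in A):
           A.append(x)
   print(len(A), (1 + 1 / eps)**2)    # card A stays below (1 + 1/eps)^M = 25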

A.9 Random Matrices

In this section we get some control of the norm of certain random matrices.
Much more detailed (and difficult) results are known.

Lemma A.9.1. If (g_{ij})_{1≤i<j≤N} are independent standard Gaussian r.v.s,
then, with probability at least 1 − L exp(−N), we have

   ∀ (x_i)_{i≤N} , ∀ (y_i)_{i≤N} ,
   |Σ_{i<j} g_{ij} x_i y_j| ≤ L√N (Σ_{i≤N} x_i²)^{1/2} (Σ_{i≤N} y_i²)^{1/2} .   (A.51)
Proof. Let us denote by B the Euclidean ball of R^N, and by A a sub-
set of 2B with card A ≤ 5^N and conv A ⊃ B, as provided by Proposi-
tion A.8.1. If (x_i)_{i≤N} and (y_i)_{i≤N} belong to A, then E(Σ_{i<j} g_{ij}x_iy_j)² ≤
Σ_{i≤N} x_i² Σ_{j≤N} y_j² ≤ 16 and (A.9) implies

   P(|Σ_{i<j} g_{ij}x_iy_j| ≥ t) ≤ 2 exp(−t²/32) ,

so that with probability at least 1 − 2(25)^N exp(−32N) it holds that

   ∀ (x_i)_{i≤N} , ∀ (y_i)_{i≤N} ∈ A ,  |Σ_{i<j} g_{ij}x_iy_j| ≤ 32√N ,

and hence

   ∀ (x_i)_{i≤N} , ∀ (y_i)_{i≤N} ∈ B ,  |Σ_{i<j} g_{ij}x_iy_j| ≤ 32√N ,

and this implies (A.51).                                                 □

We consider independent Bernoulli r.v.s (η_{i,k})_{i≤N,k≤M}, that is, P(η_{i,k} =
±1) = 1/2.
Lemma A.9.2. Consider numbers (α_{k,k′})_{k,k′≤M} with Σ α²_{k,k′} ≤ 1. Then,
for t > 0 we have

   P(Σ_{k≠k′} α_{k,k′} Σ_{i≤N} η_{i,k}η_{i,k′} ≥ t) ≤ exp(−min(t²/(NL), t/L))      (A.52)

   P(Σ_{k≠k′} α_{k,k′} Σ_{i≤N} η_{i,k}η_{i,k′} ≥ t) ≤ exp(−(t²/(2N))(1 − Lt/N)) .   (A.53)

Proof. The r.v.s X_i = Σ_{k≠k′} α_{k,k′}η_{i,k}η_{i,k′} are i.i.d., and obviously
EX_i = 0, EX_i² = Σ α²_{k,k′} ≤ 1. An important result of C. Borell [14] implies
that then E exp(|X_i|/L) ≤ 2, so that (A.52) is a consequence of (A.39) and
(A.53) is a consequence of (A.40).                                       □
Proposition A.9.3. Consider a number 0 < a ≤ 1 and n ≤ M. If
n log(eM/n) ≤ Na², the following event occurs with probability at least
1 − exp(−a²N): Given any subset I of {1, . . . , M} with card I = n, and
any sequences (x_k)_{k≤M}, (y_k)_{k≤M}, we have

   Σ_{i≤N} (Σ_{k∈I} x_kη_{i,k}) (Σ_{k∈I} y_kη_{i,k})
        ≤ N Σ_{k∈I} x_ky_k + NLa (Σ_{k∈I} x_k²)^{1/2} (Σ_{k∈I} y_k²)^{1/2} .   (A.54)

Corollary A.9.4. If a ≤ 1 and M ≤ Na², then with probability at least
1 − exp(−a²N/L), for any sequences (x_k)_{k≤M} and (y_k)_{k≤M} we have

   Σ_{i≤N} (Σ_{k≤M} x_kη_{i,k}) (Σ_{k≤M} y_kη_{i,k})
        ≤ N Σ_{k≤M} x_ky_k + NLa (Σ_{k≤M} x_k²)^{1/2} (Σ_{k≤M} y_k²)^{1/2} ,   (A.55)

and

   Σ_{i≤N} (Σ_{k≤M} x_kη_{i,k})² ≤ N(1 + La) Σ_{k≤M} x_k² .             (A.56)

Proof. The case n = M of (A.54) is (A.55), and the case y_k = x_k of (A.55)
is (A.56).                                                               □
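Inequality (A.56) states that the largest eigenvalue of the M × M matrix
N^{−1} Σ_{i≤N} η_{i,k}η_{i,k′} is at most 1 + La. This is easy to observe numerically
(a sketch, not from the text; the constant 3 below simply plays the role of L):

   # Largest eigenvalue of (1/N) H^T H for an N x M matrix H of signs; cf. (A.56).
   import numpy as np

   rng = np.random.default_rng(7)
   N, M = 4000, 160
   H = rng.choice((-1.0, 1.0), size=(N, M))
   lam = np.linalg.eigvalsh(H.T @ H / N).max()
   a = np.sqrt(M / N)
   print(lam, 1 + 3 * a)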

Proof of Proposition A.9.3. We rewrite (A.54) as

   Σ_{i≤N} Σ_{k≠k′, k,k′∈I} x_ky_{k′}η_{i,k}η_{i,k′}
        ≤ LNa (Σ_{k∈I} x_k²)^{1/2} (Σ_{k∈I} y_k²)^{1/2} .               (A.57)

Consider a subset A of R^n, with card A ≤ 5^n, A ⊂ 2B and conv A ⊃ B,
where B is the Euclidean ball Σ_{k≤n} x_k² ≤ 1. To ensure (A.57) it suffices that

   Σ_{i≤N} Σ_{k≠k′, k,k′∈I} x_ky_{k′}η_{i,k}η_{i,k′} ≤ LNa              (A.58)

whenever (x_k)_{k∈I} ∈ A and (y_k)_{k∈I} ∈ A. Now, given any such sequences,
(A.52) implies

   P(Σ_{i≤N} Σ_{k≠k′, k,k′∈I} x_ky_{k′}η_{i,k}η_{i,k′} ≥ Nu) ≤ exp(−(N/L) min(u², u)) .   (A.59)

Since n ≤ M and n log(eM/n) ≤ Na², it holds that n ≤ Na². We observe also
that 25 ≤ e⁴. Thus the number of possible choices for I and the sequences
(x_k)_{k∈I}, (y_k)_{k∈I} is at most

   (M choose n)(card A)² ≤ (eM/n)^n 25^n = 25^n exp(n log(eM/n)) ≤ exp(5Na²) ,

so that taking u = La where L is large enough, all the events (A.58) simulta-
neously occur with probability at least 1 − exp(−Na²).                   □

Our next result resembles Proposition A.9.3, but rather than restricting
the range of k we now restrict the range of i.

Proposition A.9.5. Consider a number 0 < a < 1. Consider a number
N₀ ≤ N such that N₀ log(eN/N₀) ≤ a²N, and assume that M ≤ a²N. Then
the following event occurs with probability at least 1 − exp(−a²N): Given any
subset J of {1, . . . , N} with card J ≤ N₀, and any sequence (x_k)_{k≤M}, we have

   Σ_{i∈J} (Σ_{k≤M} x_kη_{i,k})² ≤ N₀ Σ_{k≤M} x_k²
                                  + L max(Na², √(NN₀) a) Σ_{k≤M} x_k² .   (A.60)

Proof. The proof is very similar to the proof of Proposition A.9.3. It suffices
to prove that for all choices of (x_k) and (y_k) we have

   Σ_{i∈J} Σ_{k≠k′} x_ky_{k′}η_{i,k}η_{i,k′}
        ≤ L max(Na², √(NN₀) a) (Σ_{k≤M} x_k²)^{1/2} (Σ_{k≤M} y_k²)^{1/2} .   (A.61)

Consider a subset A of R^M, with card A ≤ 5^M, A ⊂ 2B, B ⊂ conv A, where
B is the Euclidean ball Σ_{k≤M} x_k² ≤ 1. To ensure (A.61) it suffices that

   Σ_{i∈J} Σ_{k≠k′} x_ky_{k′}η_{i,k}η_{i,k′} ≤ L max(Na², √(NN₀) a)

whenever card J ≤ N₀, (x_k)_{k≤M}, (y_k)_{k≤M} ∈ A. It follows from (A.52) that
for v > 0,

   P(Σ_{i∈J} Σ_{k≠k′} x_ky_{k′}η_{i,k}η_{i,k′} ≥ v card J) ≤ exp(−(card J/L) min(v², v)) ,

and using this for v = uN₀/card J ≥ u entails

   P(Σ_{i∈J} Σ_{k≠k′} x_ky_{k′}η_{i,k}η_{i,k′} ≥ N₀u) ≤ exp(−(N₀/L) min(u², u)) .   (A.62)

The number of possible choices for J and the sequences (x_k)_{k≤M}, (y_k)_{k≤M}
is at most

   Σ_{n≤N₀} (N choose n)(card A)² ≤ (eN/N₀)^{N₀} 25^M ≤ exp(5Na²) ,

so that by taking u = L max(a²N/N₀, a√(N/N₀)) where L is large enough,
all the events (A.61) simultaneously occur with probability at least 1 −
exp(−Na²).                                                               □

Here is another nice consequence of Lemma A.9.2.
Lemma A.9.6. If ε > 0 we have

   P(N Σ_{1≤k<k′≤M} R²_{k,k′} ≥ (1 − 2ε)^{−2} u)
        ≤ (1 + 1/ε)^{M²} exp(−(u/2)(1 − L√(u/N))) ,

where R_{k,k′} = N^{−1} Σ_{i≤N} η_{i,k}η_{i,k′}.
Proof. We start the proof by observing that

   (Σ_{k<k′} R²_{k,k′})^{1/2} = sup Σ_{k<k′} α_{k,k′}R_{k,k′} ,

where the supremum is taken over the subset B of R^{M(M−1)/2} of sequences
(α_{k,k′}) with Σ_{k<k′} α²_{k,k′} ≤ 1. We use Proposition A.8.1 to find a subset A
of B with card A ≤ (1 + ε^{−1})^{M²} such that

   sup_A Σ_{k<k′} α_{k,k′}R_{k,k′} ≥ (1 − 2ε) (Σ_{k<k′} R²_{k,k′})^{1/2} .

Thus

   P(N Σ_{k<k′} R²_{k,k′} ≥ (1 − 2ε)^{−2} u)
      = P((Σ_{k<k′} R²_{k,k′})^{1/2} ≥ (1 − 2ε)^{−1} √(u/N))
      ≤ P(sup_A Σ_{k<k′} α_{k,k′}R_{k,k′} ≥ √(u/N))
      ≤ (1 + 1/ε)^{M²} exp(−(u/2)(1 − L√(u/N))) ,

where we use (A.53) with t = √(uN) in the last line.                     □
Corollary A.9.7. We have

   2^{−Nn} card{(σ¹, . . . , σⁿ) ; N Σ_{1≤ℓ<ℓ′≤n} R²_{ℓ,ℓ′} ≥ (1 − 2ε)^{−2} u}
        ≤ (1 + 1/ε)^{n²} exp(−(u/2)(1 − L√(u/N))) ,

where R_{ℓ,ℓ′} = N^{−1} Σ_{i≤N} σ_i^ℓ σ_i^{ℓ′}.

Proof. This is another way to formulate Lemma A.9.6 when M = n.          □

A.10 Poisson Random Variables and Point Processes

A Poisson random variable X of expectation a is an integer-valued r.v. such
that, for k = 0, 1, . . . ,

   P(X = k) = (a^k/k!) e^{−a} ,

so that

   E exp λX = Σ_{k≥0} (a^k/k!) e^{λk−a} = exp a(e^λ − 1) .              (A.63)

Differentiating this relation 1, 2, or 3 times in λ and setting λ = 0, we see
that

   EX = a ;  EX² = a + a² ;  EX³ = a + 3a² + a³ .                       (A.64)

Using from (A.8) that for λ > 0 and a r.v. Y we have P(Y ≥ t) ≤
e^{−λt} E exp λY and P(Y ≤ t) ≤ e^{λt} E exp(−λY), and optimizing over λ,
we get that for t > 1 we have

   P(X ≥ at) ≤ exp(−a(t log t − t + 1))

and

   P(X ≤ a/t) ≤ exp(−(a/t)(t − 1 − log t)) .

In particular we have

   P(|X − a| ≥ a/2) ≤ exp(−a/L) .                                       (A.65)

Of course, such an inequality holds for any constant instead of 1/2.
   If X₁, X₂ are independent Poisson r.v.s, X₁ + X₂ is a Poisson r.v. The
following lemma proves a less known but remarkable property of these variables.

Lemma A.10.1. Consider a Poisson r.v. X and i.i.d. r.v.s (δ_i)_{i≥1}, inde-
pendent of X, such that P(δ_i = 1) = δ, P(δ_i = 0) = 1 − δ for a certain
number δ. Then the r.v.s

   X₁ = Σ_{i≤X} δ_i ;  X₂ = Σ_{i≤X} (1 − δ_i)

are independent Poisson r.v.s, of expectation respectively δEX and (1 − δ)EX.

In this lemma we "split X in two pieces". In a similar manner, we can split
X into any number of pieces.
Proof. We compute, setting a = EX:

   E exp(λX₁ + μX₂) = E exp(λ Σ_{i≤X} δ_i + μ Σ_{i≤X} (1 − δ_i))
      = Σ_{k≥0} e^{−a} (a^k/k!) E exp(λ Σ_{i≤k} δ_i + μ Σ_{i≤k} (1 − δ_i))
      = Σ_{k≥0} e^{−a} (a^k/k!) (E exp(λδ₁ + μ(1 − δ₁)))^k
      = Σ_{k≥0} e^{−a} (a^k/k!) (δe^λ + (1 − δ)e^μ)^k
      = exp a(δe^λ + (1 − δ)e^μ − 1)
      = exp aδ(e^λ − 1) exp a(1 − δ)(e^μ − 1)
      = E exp λY₁ E exp μY₂ ,

where Y₁ and Y₂ are independent Poisson r.v.s with expectation respectively
aδ and a(1 − δ).                                                         □
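A short simulation makes the splitting property concrete (a sketch, not from
the text):

   # Thinning a Poisson(a) variable: the two pieces are independent Poissons.
   import numpy as np

   rng = np.random.default_rng(8)
   a, delta, n = 5.0, 0.3, 10**6
   X = rng.poisson(a, n)
   X1 = rng.binomial(X, delta)        # sum of X coin flips with P(1) = delta
   X2 = X - X1
   print(X1.mean(), a * delta)        # both close to 1.5
   print(X2.mean(), a * (1 - delta))  # both close to 3.5
   print(np.corrcoef(X1, X2)[0, 1])   # close to 0: the pieces are uncorrelated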

Consider a positive measure μ of finite total mass |μ| (say on R3 ), and
assume for simplicity that μ has no atoms. A Poisson point process of intensity
measure μ is a random finite subset Π = Πμ with the following properties:
1. card Π is a Poisson r.v. of expectation |μ|.
2. Given that card Π = k, Π is distributed like the set {X1 , . . . , Xk } where
X1 , . . . , Xk are i.i.d. r.v.s of law μ/|μ|.
(Some inessential complications occur when μ has atoms, and one has to
count points of the Poisson point process “with their order of multiplicity”.)
We list without proof some of the main properties of Poisson point processes.
(The proofs are all very easy.)
Given two disjoint Borel sets, A, B, Π ∩ A and Π ∩ B are independent
Poisson point processes.
Given two finite measures μ1 , μ2 , if Πμ1 and Πμ2 are independent Poisson
point processes of intensity measure μ1 and μ2 respectively, then Πμ1 ∪ Πμ2
is a Poisson point process of intensity measure μ1 + μ2 .
Given a (continuous) map ϕ, ϕ(Π) is a Poisson point process of intensity
measure ϕ(μ), the image measure of the intensity measure μ of Π by ϕ.
Consider a positive measure μ and a Poisson point process Πμ of intensity
measure μ. If ν is a probability (say on R3 ), and (Uα )α≥1 are i.i.d. r.v.s of
law ν, we can construct a Poisson point process of intensity measure μ ⊗ ν
as follows. We number in a random order the points of Π as x1 , . . . , xk , and
we consider the couples (x1 , U1 ), . . . , (xk , Uk ).
Consider now a positive measure μ on R+ . We do not assume that μ is
finite, but we assume that μ([a, ∞)) is finite for each a ≥ 0. We denote by
μ0 the restriction of μ to [1, ∞), by μk its restriction to [2−k , 2−k+1 [, k ≥ 1.
Consider for k ≥ 0 a Poisson point process Πk of intensity measure μk , and

assume that these are independent. We can define a Poisson point process of
intensity measure μ as Π = ∪k≥0 Πk . Then for each a, Π ∩ [a, ∞) is a Poisson
point process, the intensity measure of which is the restriction of μ to [a, ∞).

A.11 Distances Between Probability Measures


The set M₁(X) of probability measures on a compact metric space (X, d) is
provided with a natural topology, the weakest topology that makes all the
maps μ ↦ ∫ f(x) dμ(x) continuous, where f ∈ C(X), the space of continu-
ous functions on X. For this topology the set M₁(X) is a compact metric
space. The compactness is basically obvious if one knows the fundamental
Riesz representation theorem. This theorem identifies M₁(X) with the set of
positive linear functionals Φ on C(X) that have the property that Φ(1) = 1,
where the function 1 is the function that takes the value 1 at every point.
   The so-called Monge-Kantorovich transportation-cost distance on M₁(X)
is particularly useful. Given a compact metric space (X, d), and two proba-
bility measures μ₁ and μ₂ on X, their transportation-cost distance is defined
as

   d(μ₁, μ₂) = inf E d(X₁, X₂) ,                                        (A.66)

where the infimum is taken over all pairs (X₁, X₂) of r.v.s such that the law
of X_j is μ_j for j = 1, 2. Equivalently,

   d(μ₁, μ₂) = inf ∫ d(x₁, x₂) dθ(x₁, x₂) ,

where the infimum is over all probability measures θ on X² with marginals
μ₁ and μ₂ respectively. It is not immediately clear that the formula (A.66)
defines a distance. This is however obvious due to the (fundamental) "duality
formula"

   d(μ₁, μ₂) = sup (∫ f(x) dμ₁(x) − ∫ f(x) dμ₂(x))                      (A.67)

where the supremum is taken over all functions f from X to R with Lipschitz
constant 1, i.e. that satisfy |f(x) − f(y)| ≤ d(x, y) for all x, y in X. The
classical formula (A.67) is a simple consequence of the Hahn-Banach theorem.
We will not use it in any essential way, so we refer the reader to Lemma A.11.1
below for the complete proof of a similar result.
tion of measures (or, equivalently of conditional probability), and we sketch
it now. Consider a probability measure θ on X 2 with marginals μ1 and μ2
respectively. Then there exists a (Borel measurable) family of probability
measures θx on X such that for any continuous function h on X 2 we have
 
hdθ = h(x, y)dθx (y) dμ1 (x) . (A.68)
A.11 Distances Between Probability Measures 457

Consider another probability measure θ on X 2 with marginals μ1 and μ3


respectively, and a family of probability measures θx on X such that for any
continuous function h on X 2 we have
 
hdθ = h(x, y)dθx (y) dμ1 (x) . (A.69)

Consider then the probability measure θ on X 2 such that for any continuous
function we have
 
hdθ = h(y, z)dθx (y)dθx (z) dμ1 (x) . (A.70)

Using (A.70) in the case where h(y, z) = f (y) in the first line and (A.68) in
the third line we get that
 
f (y)dθ (y, z) = f (y)dθx (y)dθx (z) dμ1 (x)
 
= f (y)dθx (y) dμ1 (x)

= f (y)dθ(x, y) = f (y)dμ2 (y) ,

using in the last inequality that μ2 is the second marginal of θ. This proves
that the first marginal of θ is μ2 , and similarly, its second marginal is μ3 .
Using the triangle inequality
d(y, z) ≤ d(y, x) + d(x, z) ,
and using (A.70), (A.69) and (A.68) we obtain

d(y, z)dθ (y, z) ≤ d(x, y)dθ(x, y) + d(x, z)dθ (x, z) ,

and in this manner we can easily complete the proof that d is a distance on
M1 (X ).
The topology defined by the distance d is the weak topology on M1 (X ).
To see this we observe first that the weak topology is also the weakest topol-
ogy that makes all the maps μ → f (x)dμ(x) where f is a Lipschitz function
on X with Lipschitz constant ≤ 1. This is simply because the linear span of
the classes of such functions is dense in C(X ) for the uniform norm. Therefore
the weak topology is weaker than the topology defined by d. To see that it is
also stronger we note that in (A.67) we can also take the supremum on the
class of Lipschitz functions that take the value 0 at a given point of X . This
class is compact for the supremum norm. Therefore given ε > 0 there is a
finite class F of Lipschitz functions on X such that
 
 

d(μ1 , μ2 ) ≤ ε + sup  f (x) dμ1 (x) − f (x) dμ2 (x) . (A.71)
F
Given two probability measures μ, ν on a compact metric space (X, d),
we consider the quantity

   Δ(μ, ν) = inf E d(X, Y)² ,                                           (A.72)

where the infimum is taken over all pairs of r.v.s (X, Y) with laws μ and
ν respectively. The quantity Δ^{1/2}(μ, ν) is a distance, called Wasserstein's
distance between μ and ν. This is not obvious from the definition, but can be
proved following the scheme we outlined in the case of the Monge-Kantorovich
transportation-cost distance (A.66). It also follows from the duality formula
given in Lemma A.11.1 below. Of course Wasserstein's distance is a close
cousin of the transportation-cost distance; we simply replace the "linear"
measure of the "cost of transportation" by a "quadratic measure" of this
cost.
   Denoting by D the diameter of X, i.e.

   D = sup{d(x, y) ; x, y ∈ X} ,

for any two r.v.s X and Y we have the inequalities

   (E d(X, Y))² ≤ E d(X, Y)² ≤ D E d(X, Y) ,

so that

   d(μ, ν)² ≤ Δ(μ, ν) ≤ D d(μ, ν) .

Consequently the topology induced by Wasserstein's distance on M₁(X) also
coincides with the weak topology. Let us note in particular from (A.71) that,
given a number ε > 0, there exists a finite set F of continuous functions on
X such that

   Δ(μ₁, μ₂) ≤ ε + sup_F |∫ f(x) dμ₁(x) − ∫ f(x) dμ₂(x)| .              (A.73)

The following is the "duality formula" for Wasserstein's distance.


Lemma A.11.1. If μ and ν are two probability measures on X, then

   Δ(μ, ν) = sup{∫ f dμ + ∫ g dν ; f, g continuous,
                 ∀ x, y ∈ X , f(x) + g(y) ≤ d(x, y)²} .                 (A.74)

Proof. If f and g are continuous functions such that

   ∀ x, y ∈ X ,  f(x) + g(y) ≤ d(x, y)² ,

then for each pair (X, Y) of r.v.s valued in X we have Ef(X) + Eg(Y) ≤
E d(X, Y)², so that if X has law μ and Y has law ν we have ∫ f dμ + ∫ g dν ≤
E d(X, Y)². Taking the infimum over all choices of X and Y we see that
∫ f dμ + ∫ g dν ≤ Δ(μ, ν). Therefore if a denotes the right-hand side of (A.74),
we have proved that a ≤ Δ(μ, ν), and we turn to the proof of the converse.
   We consider the subset S of the set C(X × X) of continuous functions on
X × X that consists of the functions w(x, y) such that there exist continuous
functions f and g on X for which

   ∫ f dμ + ∫ g dν = a                                                  (A.75)

and

   ∀ x, y ∈ X ,  w(x, y) > f(x) + g(y) − d(x, y)² .                     (A.76)

It follows from the definition of a that for each function w in S there exist
x and y with w(x, y) > 0. Since S is convex and open, the Hahn-Banach
separation theorem asserts that we can find a linear functional Φ on C(X × X)
such that Φ(w) > 0 for each w in S. If w ∈ S and w′ ≥ 0, it follows from the
definition of S that w + λw′ ∈ S for λ > 0, so that Φ(w + λw′) > 0. Thus
Φ(w′) ≥ 0, i.e. Φ is positive: it is a positive measure on X × X. Since it is a
matter of normalization, we can assume that it is a probability, which we
denote by θ.
   If f and g are as in (A.75), then for each ε > 0 we see by (A.76) that the
function w(x, y) = f(x) + g(y) − d(x, y)² + ε belongs to S and thus

   ∫ d(x, y)² dθ(x, y) ≤ ∫ (f(x) + g(y)) dθ(x, y) .                     (A.77)

Now this holds true if we replace f by f + f′ where ∫ f′ dμ = 0. Thus this
latter condition must imply that ∫ f′(x) dθ(x, y) = 0. It follows that if θ₁ is
the first marginal of θ then ∫ f′(x) dθ₁(x) = 0 whenever ∫ f′ dμ = 0. Using
this for f′(x) = f(x) − ∫ f dμ where f is any continuous function, we see that
θ₁ = μ, i.e. μ is the first marginal of θ. Similarly, ν is the second marginal of
θ, so that

   ∫ (f(x) + g(y)) dθ(x, y) = ∫ f dμ + ∫ g dν = a ,

and (A.77) then implies that ∫ d(x, y)² dθ(x, y) ≤ a. A pair (X, Y) of r.v.s of
joint law θ then witnesses that Δ(μ, ν) ≤ a.                             □
The previous distances must not be confused with the total variation
distance given by

   ‖μ − ν‖ = sup{|∫ f dμ − ∫ f dν| ; |f| ≤ 1} .                         (A.78)

The total variation distance induces the weak topology on M₁(X) only when
X is finite. When this is the case, we have

   ‖μ − ν‖ = Σ_{x∈X} |μ({x}) − ν({x})| .                                (A.79)
Exercise A.11.2. Prove that ‖μ − ν‖ = 2Δ(μ, ν), where Δ^{1/2}(μ, ν) is
Wasserstein's distance when X is provided with the distance d given by
d(x, x) = 0 and d(x, y) = 1 when x ≠ y.
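For a finite space X the exercise can be checked numerically by solving the
small linear program over couplings that defines Δ (a sketch, not from the
text; it relies on scipy.optimize.linprog, and the measures μ, ν below are
arbitrary):

   # Exercise A.11.2 on a 3-point space with d(x, y) = 1 for x != y:
   # Delta(mu, nu) = min over couplings theta of E d(X, Y)^2, a linear program.
   import numpy as np
   from scipy.optimize import linprog

   mu = np.array([0.5, 0.3, 0.2])
   nu = np.array([0.2, 0.2, 0.6])
   n = 3
   cost = 1.0 - np.eye(n)                 # d(x, y)^2 is 1 off the diagonal
   A_eq = np.zeros((2 * n, n * n))
   for i in range(n):
       A_eq[i, i * n:(i + 1) * n] = 1.0   # row sums give the marginal mu
       A_eq[n + i, i::n] = 1.0            # column sums give the marginal nu
   b_eq = np.concatenate([mu, nu])
   res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
   print(2 * res.fun, np.abs(mu - nu).sum())   # both equal 0.8, as in (A.79)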
When X is a metric space, that is not necessarily compact, the formulas
(A.66) and (A.72) still make sense, although the infimum might be infinite.
The corresponding “distances” still satisfy the triangle inequality.

A.12 The Paley-Zygmund Inequality


This simple (yet important) argument is also known as the second moment
method. It goes back to the work of Paley and Zygmund on trigonometric
series.

Proposition A.12.1. Consider a r.v. X ≥ 0. Then

   P(X ≥ EX/2) ≥ (1/4) (EX)²/EX² .                                      (A.80)

Proof. If A = {X ≥ EX/2}, then, since X ≤ EX/2 on the complement A^c
of A, we have

   EX = E(X1_A) + E(X1_{A^c}) ≤ E(X1_A) + EX/2 .

Thus, using the Cauchy-Schwarz inequality,

   EX/2 ≤ E(X1_A) ≤ (EX²)^{1/2} P(A)^{1/2} .                            □

A.13 Differential Inequalities


We will often meet simple differential inequalities, and it is worth learning
how to handle them. The following is a form of the classical Gronwall lemma.

Lemma A.13.1. If a function ϕ ≥ 0 satisfies

   |ϕ′_r(t)| ≤ c₁ϕ(t) + c₂

for 0 < t < 1, where c₁, c₂ ≥ 0 and where ϕ′_r is the right-derivative of ϕ, then

   ϕ(t) ≤ exp(c₁t)(ϕ(0) + c₂/c₁) .                                      (A.81)

Proof. We note that

   |(ϕ(t) + c₂/c₁)′_r| ≤ c₁ (ϕ(t) + c₂/c₁) ,

so that

   ϕ(t) + c₂/c₁ ≤ exp(c₁t)(ϕ(0) + c₂/c₁) .                              □
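The bound (A.81) is saturated (up to the additive constant −c₂/c₁) by the
solution of ϕ′ = c₁ϕ + c₂; a quick numerical sketch (not from the text)
integrates this extremal case:

   # Euler integration of phi' = c1*phi + c2 against the Gronwall bound (A.81).
   import numpy as np

   c1, c2, phi0 = 2.0, 1.0, 0.5
   dt, phi = 1e-4, phi0
   for _ in np.arange(0.0, 1.0, dt):
       phi += dt * (c1 * phi + c2)
   bound = np.exp(c1) * (phi0 + c2 / c1)
   print(phi, bound)     # phi(1) = e^2 (phi0 + c2/c1) - c2/c1, below the bound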
A.14 The Latala-Guerra Lemma

In this section we prove Proposition 1.3.8. We present the proof due to F.
Guerra. M. Yor noticed that essentially the same proof gives a more general
result that is probably less mysterious.

Proposition A.14.1. Consider an increasing bounded function ϕ(y) that
satisfies ϕ(−y) = −ϕ(y) and ϕ″(y) < 0 for y > 0. Then the function Ψ(x) =
Eϕ(z√x + h)²/x is strictly decreasing on R⁺ and vanishes as x → ∞.
Proof. To prove that the function Ψ is strictly decreasing, working condi-
tionally on h, we can assume that h is a number. We set Y = z√x + h. We
have

   x²Ψ′(x) = E(z√x ϕ′(Y)ϕ(Y)) − Eϕ(Y)²
           = E(ϕ(Y)(Yϕ′(Y) − ϕ(Y))) − hE ϕ′(Y)ϕ(Y) .

The reader should observe here how tricky we have been: we resisted the
temptation to use Gaussian integration by parts.
   To study ϕ, we note first that ϕ(0) = 0 since ϕ is odd, so that since
ϕ is increasing, ϕ(y) > 0 for y > 0 and ϕ(y) < 0 for y < 0. The function
ψ(y) = yϕ′(y) − ϕ(y) satisfies ψ(0) = 0 and ψ′(y) = yϕ″(y). Thus ψ′(y) < 0
for y ≠ 0 and thus ψ(y) < 0 for y > 0 and ψ(y) > 0 for y < 0. Therefore
ϕ(y)(yϕ′(y) − ϕ(y)) = ϕ(y)ψ(y) < 0 for y ≠ 0 and hence

   E(ϕ(Y)(Yϕ′(Y) − ϕ(Y))) < 0 .

Consequently all we have to prove is that hE ϕ(Y)ϕ′(Y) ≥ 0. We start by
writing

   E ϕ(Y)ϕ′(Y) = (1/√(2π)) ∫ ϕ(z√x + h)ϕ′(z√x + h) e^{−z²/2} dz .      (A.82)

Now comes the beautiful trick. We make the change of variable

   z = (y − h)/√x ,

so that y = z√x + h and

   (1/√(2π)) ∫ ϕ(z√x + h)ϕ′(z√x + h) e^{−z²/2} dz
      = (1/√(2πx)) ∫ ϕ(y)ϕ′(y) exp(−y²/(2x) + hy/x − h²/(2x)) dy .      (A.83)

Making the change of variable y → −y we get

   (1/√(2π)) ∫ ϕ(z√x + h)ϕ′(z√x + h) e^{−z²/2} dz
      = −(1/√(2πx)) ∫ ϕ(y)ϕ′(y) exp(−y²/(2x) − hy/x − h²/(2x)) dy .     (A.84)

Recalling (A.82) and adding (A.83) and (A.84) we get

   hE ϕ(Y)ϕ′(Y) = (1/√(2πx)) ∫ ϕ(y)ϕ′(y) h sh(hy/x) exp(−y²/(2x) − h²/(2x)) dy ≥ 0 ,

because ϕ′(y) ≥ 0 and hϕ(y) sh(hy/x) ≥ 0. This proves that Ψ is strictly
decreasing.                                                              □
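In the case ϕ = th relevant to Proposition 1.3.8, the monotonicity of Ψ is
easy to see numerically (a sketch, not from the text; h = 0.4 is arbitrary):

   # Psi(x) = E th(z sqrt(x) + h)^2 / x decreases in x.
   import numpy as np

   rng = np.random.default_rng(9)
   z = rng.normal(size=10**6)
   h = 0.4
   for x in (0.5, 1.0, 2.0, 4.0, 8.0):
       print(x, np.mean(np.tanh(z * np.sqrt(x) + h)**2) / x)   # decreasing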


A.15 Proof of Theorem 3.1.4

Proof. We start with N = 1 and then perform induction on the dimension
N. By homogeneity, we may assume that ∫ U dx = ∫ V dx = 1 and by ap-
proximation that U and V are continuous with strictly positive values. Define
x, y : ]0, 1[ → R by

   ∫_{−∞}^{x(t)} U(q) dq = t ,  ∫_{−∞}^{y(t)} V(q) dq = t .

Therefore x and y are increasing and differentiable and

   x′(t)U(x(t)) = y′(t)V(y(t)) = 1 .

Set z(t) = sx(t) + (1 − s)y(t), t ∈ ]0, 1[. By the arithmetic-geometric mean
inequality, for every t,

   z′(t) = sx′(t) + (1 − s)y′(t) ≥ (x′(t))^s (y′(t))^{1−s} .            (A.85)

Now, since z is injective, by the hypothesis (3.11) on W and (A.85),

   ∫ W dx ≥ ∫_0^1 W(z(t)) z′(t) dt
          ≥ ∫_0^1 U(x(t))^s V(y(t))^{1−s} (x′(t))^s (y′(t))^{1−s} dt
          = ∫_0^1 [U(x(t))x′(t)]^s [V(y(t))y′(t)]^{1−s} dt
          = 1 .

This proves the case N = 1. It is then easy to deduce the general case by
induction on N as follows. Suppose N > 1 and assume that the functional
version of the Brunn-Minkowski theorem holds in R^{N−1}. Let U, V, W be non-
negative measurable functions on R^N satisfying (3.11) for some s ∈ [0, 1].
Let q ∈ R be fixed and define U_q : R^{N−1} → [0, ∞[ by U_q(x) = U(x, q), and
similarly for V_q and W_q. Clearly, if q = sq₀ + (1 − s)q₁, q₀, q₁ ∈ R,

   W_q(sx + (1 − s)y) ≥ U_{q₀}(x)^s V_{q₁}(y)^{1−s}

for all x, y ∈ R^{N−1}. Therefore, by the induction hypothesis,

   ∫_{R^{N−1}} W_q(x) dx ≥ (∫_{R^{N−1}} U_{q₀}(x) dx)^s (∫_{R^{N−1}} V_{q₁}(x) dx)^{1−s} .   (A.86)

Let us define W*(q) = ∫_{R^{N−1}} W_q(x) dx, and U*(q), V*(q) similarly. We see
from (A.86) that

   W*(sq₀ + (1 − s)q₁) ≥ U*(q₀)^s V*(q₁)^{1−s} ,

so applying the one-dimensional case shows that

   ∫_R W*(q) dq ≥ (∫_R U*(q) dq)^s (∫_R V*(q) dq)^{1−s} .

Since

   ∫_{R^N} W(x) dx = ∫_R (∫_{R^{N−1}} W_q(x) dx) dq = ∫_R W*(q) dq ,

and similarly for U and V, this is the desired result. Theorem 3.1.4 is
established.                                                             □

References

1. Aldous D. (1992) Asymptotics in the random assignment problem. Probab.


Theory Related Fields 93, no. 4, pp. 507–534.
2. Aldous D. (2001) The ζ(2) limit in the random assignment problem. Random
Structures and Algorithms 18, no. 4, pp. 381–418.
3. Amit D.J., Gutfreund H., Sompolinsky H. (1987) Statistical mechanics of neu-
ral networks near saturation. Annals of Physics 173, pp. 30–67.
4. Aizenman M., Lebowitz J.L., Ruelle D. (1987) Some rigorous results on the
Sherrington-Kirkpatrick model. Comm. Math. Phys. 112, pp. 3–20.
5. Almeida J.R.L., Thouless D.J. (1978) Stability of the Sherrington-Kirkpatrick
solution of a spin glass model. J. Phys. A: Math. Gen. II, pp. 983–990.
6. Albeverio S., Tirozzi B., Zegarlinski B. (1992) Rigorous results for the free
energy in the Hopfield model. Comm. Math. Phys. 150, no. 2, pp. 337–373.
7. Barbina X., Márquez-Carreras D., Rovira C., Tindel S. (2004) Higher order
expansions for the overlap of the SK model, Seminar on Stochastic Anal-
ysis, Random Fields and Applications IV, pp. 21–43, Progr. Probab., 58,
Birkhäuser, Basel, 2004.
8. Bayati M., Gamarnik D., Tetali P. (2009) A combinatorial approach to
Guerra’s interpolation method. Manuscript.
9. Bianchi A., Contucci P., Giardina C. (2003) Thermodynamic Limit for Mean
Field Spin Models. Math. Phys. El. Jour. 9, n. 6, pp. 1–15.
10. Billingsley, P. (1995) Probability and Measure. Second edition. Wiley Series
in Probability and Mathematical Statistics: Probability and Mathematical
Statistics. John Wiley & Sons, Inc., New York. xiv+622 pp.
11. Bobkov S.G., Ledoux M. (2000) From Brunn-Minkowski to Brascamp-Lieb
and to logarithmic Sobolev inequalities. Geom. Funct. Anal. 10, no. 5, pp.
1028–1052.
12. Bollobás B. (2001) Random graphs. Second edition. Cambridge Studies in
Advanced Mathematics 73, Cambridge University Press, Cambridge, 2001,
xviii+498 pp.
13. Bolina O., Wreszinski W.F. (2004) A Self Averaging “Order Parameter” for
the Sherrington-Kirkpatrick Spin Glass Model. Journal Stat. Phys. 116, pp.
1389–1404.
14. Borell C. (1984) On polynomial chaos and integrability. Probab. Math. Statist.
3, no. 2, pp. 191–203.
15. Bouten M. (1988) Replica symmetry instability in perceptron models. Com-
ment on: “Optimal storage properties of neural network models”, J. Phys.
A 21, no. 1, pp. 271–284, by E. Gardner and B. Derrida. With a reply by
Derrida. J. Phys. A 27 (1994), no. 17, pp. 6021-6025.
16. Bovier A. (1994) Self-averaging in a class of generalized Hopfield models. J.
Phys. A 27, no. 21, pp. 7069–7077.


17. Bovier A. (1997) Comment on “capacity of Hopfield model”. J. Phys. A 30,


pp. 7993–7996.
18. Bovier A. (1999) Sharp upper bounds on perfect retrieval in the Hopfield
model. J. Appl. Probab. 36, no. 3, pp. 941–950.
19. Bovier A. (2006) Statistical mechanics of disordered systems. A mathematical
perspective. Cambridge Series in Statistical and Probabilistic Mathematics.
Cambridge University Press, Cambridge. xiv+312 pp.
20. Bovier A., van Enter A.C.D., Niederhauser B. (1999) Stochastic symmetry-
breaking in a Gaussian Hopfield model. J. Statist. Phys. 95, no. 1-2, pp.
181–213.
21. Bovier A., Gayrard V. (1992) Rigorous bounds on the storage capacity of the
dilute Hopfield model. J. Statist. Phys. 69, no. 3-4, pp. 597–627.
22. Bovier A., Gayrard V. (1993) Rigorous results on the thermodynamics of the
dilute Hopfield model. J. Statist. Phys. 72, no. 1-2, pp. 79–112.
23. Bovier A., Gayrard V. (1993) Lower bounds on the memory capacity of the di-
lute Hopfield model. Cellular automata and cooperative systems (Les Houches,
1992), pp. 55–66. NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 396, Kluwer
Acad. Publ., Dordrecht.
24. Bovier A., Gayrard V. (1993) The Hopfield model on random graphs: lower
bounds on the storage capacity. Dynamics of complex and irregular systems
(Bielefeld, 1991), pp. 277–288. Bielefeld Encount. Math. Phys. VIII, World
Sci. Publishing, River Edge, NJ.
25. Bovier A., Gayrard V. (1996) An almost sure large deviation principle for the
Hopfield model. Ann. Probab. 24, no. 3, pp. 1444–1475.
26. Bovier A., Gayrard V. (1997) The retrieval phase of the Hopfield model: a
rigorous analysis of the overlap distribution. Probab. Theory Relat. Fields 107,
no. 1, pp. 61–98.
27. Bovier A., Gayrard V. (1997) An almost sure central limit theorem for the
Hopfield model. Markov Process. Related Fields 3, no. 2, pp. 151–173.
28. Bovier A., Gayrard V. (1998) Hopfield models as generalized random mean
field models. Mathematical aspects of spin glasses and neural networks. Progr.
Probab. 41, pp. 3–89. Birkhäuser Boston, Inc., Boston, MA.
29. Bovier A., Gayrard V. (1998) Metastates in the Hopfield model in the replica
symmetric regime. Math. Phys. Anal. Geom. 1, no. 2, pp. 107–144.
30. Bovier A., Mason D.M. (2001) Extreme value behavior in the Hopfield model.
Ann. Appl. Probab. 11, no. 1, pp. 91–120.
31. Bovier A., Niederhauser B. (2001) The spin-glass phase-transition in the Hop-
field model with p-spin interactions. Adv. Theor. Math. Phys. 5, no. 6, pp.
1001–1046.
32. Bovier A., Gayrard V., Picco P. (1994) Gibbs states for the Hopfield model in
the regime of perfect memory. Prob. Theory Relat. Fields 100, pp. 329–363.
33. Bovier A., Gayrard V., Picco P. (1995) Gibbs states for the Hopfield model
with extensively many patterns. J. Stat. Phys. 79, pp. 395–414.
34. Bovier A., Gayrard V., Picco P. (1995) Gibbs states of the Hopfield model
with extensively many patterns. J. Statist. Phys. 79, no. 1-2, pp. 395–414.
35. Bovier A., Gayrard V., Picco P. (1995) Large deviation principles for the
Hopfield model and the Kac-Hopfield model. Probab. Theory Relat. Fields
101, no. 4, pp. 511–546.
36. Bovier A., Picco P. (editors) (1997) Mathematical aspects of spin glasses and
Neural networks. Progress in Probability 41, Birkhäuser, Boston.
37. Brascamp H., Lieb E. (1976) On extension of the Brunn-Minkowski and the
Prékopa-Leindler Theorems, including inequalities for log-concave functions,

and with an application to the diffusion equation, J. Funct. Anal. 22, pp.
366–389.
38. Canisius J., van Enter A.C.D., van Hemmen J.L. (1983) On a classical spin
glass model. Z. Phys. B 50, no. 4, pp. 311–336.
39. Carmona P., Hu Y. (2006) Universality in Sherrington-Kirkpatrick’s spin glass
model. Ann. Inst. H. Poincaré Probab. Statist. 42, no. 2, pp. 215–222.
40. Carvalho S., Tindel S. (2007) On the multiple overlap function of the SK
model. Publicacions Matematiques 51, pp. 163–199.
41. Carvalho S., Tindel S. (2007) A central limit theorem for a localized version
of the SK model. Potential Analysis 26, pp. 323–343.
42. Catoni O. (1996) The Legendre transform of two replicas of the Sherrington-
Kirkpatrick spin glass model. A free energy inequality. Probab. Theory Relat.
Fields 105, no. 3, pp. 369–392.
43. Cavagna A., Giardina I., Parisi G. (1997) Structure of metastable states in
spin glasses by means of a three replica potential. J. Phys. A: Math. and Gen.
30, no. 13, pp. 4449–4466.
44. Chatterjee S. (2007) Estimation in spin glasses: a first step. Ann. Statist. 35,
no. 5, pp. 1931–1946.
45. Chatterjee S. (2010) Spin glasses and Stein’s method. Probab. Theory Relat.
Fields 148, pp. 567–600.
46. Chatterjee S., Crawford N. (2009) Central limit theorems for the energy den-
sity in the Sherrington-Kirkpatrick model. J. Stat. Phys. 137, no. 4 pp. 639–
666.
47. Comets F. (1996) A spherical bound for the Sherrington-Kirkpatrick model.
Hommage à P.A. Meyer et J. Neveu. Astérisque 236, pp. 103–108.
48. Comets F. (1998) The martingale method for mean-field disordered systems at
high temperature. Mathematical aspects of spin glasses and neural networks.
Progr. Probab. 41, pp. 91–113. Birkhäuser Boston, Inc., Boston, MA.
49. Comets F., Neveu J. (1995) The Sherrington-Kirkpatrick model of spin glasses
and stochastic calculus: the high temperature case. Comm. Math. Phys. 166,
no. 3, pp. 549–564.
50. Crawford N. (2008) The intereaction between multioverlap in the high tem-
perature phase of the Sherrington-Kirkpatrick spin glass. J. Math. Phys. 49,
125201 (24 pages).
51. Dembo A., Montanari A. (2010) Gibbs measures and phase transitions on
sparse random graphs. Braz. J. Probab. Stat. 24, no. 2, pp. 137–211.
52. Derrida B., Gardner E. (1988) Optimal storage properties of neural network
models. J. Phys. A 21, pp. 271–284.
53. Derrida B., Gardner E. (1989) Three unfinished works on the optimal storage
capacity of networks. Special issue in memory of Elizabeth Gardner
(1957–1988). J. Phys. A 22, no. 12, pp. 1983–1994.
54. Ellis R.S. (1985) Entropy, large Deviations and Statistical Mechanics
Grundlehren der Mathematischen Wissenschaften 271. Springer-Verlag, New
York. xiv+364 pp.
55. Feng J., Shcherbina M., Tirozzi B. (2001) On the critical capacity of the
Hopfield model. Comm. Math. Phys. 216, no. 1, pp. 139–177.
57. Feng J., Tirozzi B. (1995) The SLLN for the free-energy of a class of neural
networks. Helv. Phys. Acta 68, no. 4, pp. 365–379.
58. Feng J., Tirozzi B. (1997) Capacity of the Hopfield model. J. Phys. A 30,
no. 10, pp. 3383–3391.

59. Fischer K.H., Hertz J., (1991) Spin glasses. Cambridge Studies in Magnetism,
1. Cambridge University Press, Cambridge, 1991. x+408 pp.
60. Franz S., Leone M. (2003) Replica bounds for optimization problems and
diluted spin systems. J. Statist. Phys. 111, no. 3-4, pp. 535–564.
61. Fröhlich J., Zegarlinski B. (1987) Some comments on the Sherrington-
Kirkpatrick model of spin glasses. Comm. Math. Phys. 112, pp. 553–566.
62. Gamarnik, D. (2004) Linear phase transition in random linear constraint sat-
isfaction problems. Probab. Theory Relat. Fields 129, pp. 410–440.
63. Gardner E. (1988) The space of interactions in neural network models. J.
Phys. A 21, pp. 257–270.
64. Gentz B. (1996) An almost sure central limit theorem for the overlap param-
eters in the Hopfield model. Stochastic Process. Appl. 62, no. 2, pp. 243–262.
65. Gentz B. (1996) A central limit theorem for the overlap in the Hopfield model.
Ann. Probab. 62, no. 4, pp. 1809–1841.
66. Gentz B. (1998) On the central limit theorem for the overlap in the Hop-
field model. Mathematical aspects of spin glasses and neural networks Progr.
Probab. 41, pp. 115–149, Birkhäuser Boston, Boston, MA.
67. Gentz B., Löwe M. (1999) The fluctuations of the overlap in the Hopfield
model with finitely many patterns at the critical temperature. Probab. Theory
Relat. Fields 115, no. 3, pp. 357–381.
68. Gentz B., Löwe M. (1999) Fluctuations in the Hopfield model at the critical
temperature. Markov Process. Related Fields 5, no. 4, pp. 423–449.
69. Guerra F. (1995) Fluctuations and thermodynamic variables in mean field
spin glass models. Stochastic Processes, Physics and Geometry, S. Albeverio
et al. editors, World Scientific, Singapore.
70. Guerra F. (1996) About the overlap distribution in mean field spin glass mod-
els. International Journal of Modern Physics B 10, pp. 1675–1684.
71. Guerra F. (2001) Sum rules for the free energy in the mean field spin glass
model. Fields Institute Communications 30, pp. 161–170.
72. Guerra F. (2005) Mathematical aspects of mean field spin glass theory. Euro-
pean Congress of Mathematics, pp. 719–732, Eur. Math. Soc., Zürich.
73. Guerra F., Toninelli F.L. (2002) Quadratic replica coupling for the
Sherrington-Kirkpatrick mean field spin glass model. J. Math. Phys. 43, no. 7,
pp. 3704–3716.
74. Guerra F., Toninelli F.L. (2002) Central limit theorem for fluctuations in the
high temperature region of the Sherrington-Kirkpatrick spin glass model. J.
Math. Phys. 43, no. 12, pp. 6224–6237.
75. Guerra F., Toninelli F.L. (2002) The Thermodynamic Limit in Mean Field
Spin Glass Models. Commun. Math. Phys. 230, pp. 71–79.
76. Guerra F., Toninelli F.L. (2003) The Infinite Volume Limit in Generalized
Mean Field Disordered Models. Markov Process. Related Fields 9, no. 2, pp.
195–207.
77. Guerra F., Toninelli F.L. (2003) Infinite volume limit and spontaneous replica
symmetry breaking in mean field spin glass models. Ann. Henri Poincaré 4,
suppl. 1, S441–S444.
78. Guerra F., Toninelli F.L. (2004) The high temperature region of the Viana-
Bray diluted spin glass model. J. Statist. Phys. 115, no. 1-2, pp. 531–555.
79. Hanen A. (2007) Un théorème limite pour les covariances des spins dans le
modèle de Sherrington-Kirkpatrick avec champ externe. Ann. Probab. 35,
no. 1, pp. 141–179.
80. Hanen A. (2008) A limit theorem for mean magnetization in the Sherrington-
Kirkpatrick model with an external field. To appear.
81. van Hemmen J.L., Palmer R.G. (1979) The replica method and a solvable spin
glass system. J. Phys. A 12, no. 4, pp. 563–580.
82. van Hemmen J.L., Palmer R.G. (1982) The thermodynamic limit and the
replica method for short-range random systems. J. Phys. A 15, no. 12, pp.
3881–3890.
83. Hertz J., Krogh A., Palmer R.G. (1991) Introduction to the theory of neural
computation. Santa Fe Institute Studies in the Sciences of Complexity. Lec-
ture Notes, I. Addison-Wesley Publishing Company, Advanced Book Program,
Redwood City. xxii+327 pp.
84. Hopfield J.J. (1982) Neural networks and physical systems with emergent
collective computational abilities. Proc. Natl. Acad. Sci. USA 79, pp. 2554–
2558.
85. Hopfield J.J. (1984) Neurons with graded response have collective computa-
tional properties like those of two-state neurons. Proc. Natl. Acad. Sci. USA
81, pp. 3088–3092.
86. Ibragimov I.A., Sudakov V.N., Tsirelson B.S. (1976) Norms of Gaussian sample
functions. Proceedings of the Third Japan-USSR Symposium on Probability
Theory. Lecture Notes in Math. 550, Springer-Verlag, pp. 20–41.
87. Kahane J.-P. (1986) Une inégalité du type de Slepian et Gordon sur les pro-
cessus gaussiens. Israel J. Math. 55, no. 1, pp. 109–110.
88. Kösters H. (2006) Fluctuations of the free energy in the diluted SK-model.
Stochastic Process. Appl. 116, no. 9, pp. 1254–1268.
89. Kim J.H., Roche J.R. (1998) Covering cubes by random half cubes, with
applications to binary neural networks: rigorous combinatorial approaches.
Eighth Annual Workshop on Computational Learning Theory, Santa Cruz, 1995.
J. Comput. System Sci. 56, no. 2, pp. 223–252.
90. Krauth W., Mézard M. (1989) Storage capacity of memory networks with
binary couplings. J. Phys. 50, pp. 3057–3066.
91. Kurkova, I. (2005) Fluctuations of the free energy and overlaps in the high-
temperature p-spin SK and Hopfield models. Markov Process. Related Fields
11, no. 1, pp. 55–80.
92. Latala R. (2002) Exponential inequalities for the SK model of spin glasses,
extending Guerra’s method. Manuscript.
93. Ledoux M. (2001) The concentration of measure phenomenon. Mathematical
Surveys and Monographs 89, American Mathematical Society, Providence,
RI, x+181 pp.
94. Ledoux M. (2000) On the distribution of overlaps in the Sherrington-
Kirkpatrick spin glass model, J. Statist. Phys. 100, no. 5-6, pp. 871–892.
95. Ledoux M., Talagrand M. (1991) Probability in Banach Spaces. Springer-
Verlag, Berlin.
96. Linusson S., Wästlund J. (2004) A proof of Parisi’s conjecture on the random
assignment problem. Probab. Theory Relat. Fields 128, no. 3, pp. 419–440.
97. Loukianova D. (1997) Lower bounds on the restitution error of the Hopfield
model. Probab. Theory Relat. Fields 107, pp. 161–176.
98. Löwe M. (1998) On the storage capacity of Hopfield models with correlated
patterns. Ann. Appl. Probab. 8, no. 4, pp. 1216–1250.
99. Löwe M. (1999) The storage capacity of generalized Hopfield models with
semantically correlated patterns. Markov Process. Related Fields 5, no. 1, pp.
1–19.
100. Márquez-Carreras D., Rovira C., Tindel S. (2006) Asymptotic behavior of
the magnetization for the perceptron model. Ann. Inst. H. Poincaré Probab.
Statist. 42, no. 3, pp. 327–342.
101. Mézard M. (1989) The space of interactions in neural networks: Gardner’s
computation with the cavity method. J. Phys. A 22, pp. 2181–2190.
102. Mézard M., Montanari A. (2009) Information, Physics, and Computation.
Oxford Graduate Texts. Oxford University Press, xiv+569 pp.
103. Mézard M., Parisi G. (1985) Replicas and optimization. J. Physique Lett. 46,
L771.
104. Mézard M., Parisi G. (1986) Mean field equations for the matching and the
traveling salesman problem. Europhys. Lett. 2, pp. 913–918.
105. Mézard M., Parisi G., Virasoro M. (1987) Spin glass theory and beyond, World
Scientific, Singapore.
106. Milman V. (1988) The heritage of P. Lévy in functional analysis. Colloque P.
Lévy sur les processus stochastiques, Astérisque 157-158, pp. 273–301.
107. Milman V., Schechtman G. (1986) Asymptotic theory of finite dimensional
normed spaces. Lecture Notes in Math. 1200, Springer Verlag.
108. Monasson R., Zecchina R. (1997) Statistical mechanics of the random K-sat
model. Phys. Rev. E 56, pp. 1357–1370.
109. Nair C., Prabhakar B., Sharma, M. (2005) Proofs of the Parisi and
Coppersmith-Sorkin random assignment conjectures. Random Structures Al-
gorithms 27, no. 4, pp. 413–444.
110. Newman C. (1988) Memory capacity in neural network models: rigorous lower
bounds. Neural Networks 1, pp. 223–238.
111. Newman C., Stein D. (1993) Chaotic size dependence in spin glasses. Cellular
automata and cooperative systems (Les Houches, 1992). NATO Adv. Sci. Inst.
Ser. C Math. Phys. Sci. 396, Kluwer Acad. Publ., Dordrecht.
112. Nishimori, H. (2001) Statistical physics of spin glasses and information pro-
cessing. An introduction. International Series of Monographs on Physics 111,
Oxford University Press, New York, xii+243 pp.
113. Newman C., Stein D. (1996) Spatial inhomogeneity and thermodynamic chaos.
Phys. Rev. Lett. 76, pp. 4821–4824.
114. Panchenko, D. (2005) A central limit theorem for weighted averages of spins in
the high temperature region of the Sherrington-Kirkpatrick model. Electron.
J. Probab. 10, no. 14, pp. 499–524.
115. Panchenko, D., Talagrand, M. (2004) Bounds for diluted mean-field spin glass
models. Probab. Theory Relat. Fields 130, no. 3, pp. 319–336.
116. Parisi G. (1992) Field theory, disorder, simulation. World Scientific Lecture
Notes in Physics 49, World Scientific Publishing Co., Inc., River Edge, NJ.
vi+503 pp.
117. Parisi G. (1992) Statistical physics and spectral theory of disordered systems:
some recent developments. Mathematical Physics X, (Leipzig, 1991), pp. 70–
86, Springer, Berlin.
118. Pastur L.A., Figotin A. (1977) Exactly soluble model of a spin glass. Sov. J.
Low Temp. Phys. 3, pp. 378–383.
119. Pastur L.A., Figotin A. (1978) On the theory of disordered spin systems.
Theor. Math. Phys. 35, pp. 403–414.
120. Pastur L.A., Shcherbina M.V. (1991) Absence of self-averaging of the order
parameter in the Sherrington-Kirkpatrick model. J. Statist. Phys. 62, no. 1-2,
pp. 1–19.
121. Pastur L.A., Shcherbina M.V., Tirozzi B. (1994) The replica-symmetric solu-
tion without replica trick for the Hopfield model. J. Statist. Phys. 74, no. 5-6,
pp. 1161–1183.
122. Pastur L.A., Shcherbina M.V., Tirozzi B. (1999) On the replica-symmetric
equations for the Hopfield model. J. Math. Phys. 40, no. 8, pp. 3930–3947.
123. Petritis D. (1996) Equilibrium statistical mechanics of frustrated spin glasses:
a survey of mathematical results. Ann. Inst. H. Poincaré Phys. Théor. 64, pp.
255–288.
124. Pisier G. (1986) Probabilistic methods in the geometry of Banach spaces.
Probability and analysis (Varenna, 1985), pp. 167–241, Lecture Notes in Math.
1206, Springer, Berlin.
125. Ruelle, D. (1999) Statistical Mechanics: Rigorous Results. Reprint of the
1989 edition. World Scientific Publishing Co., Inc., River Edge, NJ,
xvi+219 pp.
126. Ruelle, D. (2004) Thermodynamic Formalism: The Mathematical Structure of
Equilibrium Statistical Mechanics. Second edition. Cambridge Mathematical
Library. Cambridge University Press, Cambridge. xx+174 pp.
127. Shcherbina M.V. (1991) More about absence of self averaging of the order
parameter in the Sherrington-Kirkpatrick model. CARR Reports in Mathe-
matical Physics no 3/91, Department of Mathematics, University of Rome “la
Sapienza”.
128. Shcherbina M.V. (1997) On the replica-symmetric solution for the
Sherrington-Kirkpatrick model. Helv. Phys. Acta 70, pp. 838–853.
129. Shcherbina M.V. (1999) Some estimates for the critical temperature of the
Sherrington-Kirkpatrick model with magnetic field. In: Mathematical Results
in Statistical Mechanics, World Scientific, Singapore, pp. 455–474.
130. Shcherbina M.V., Tirozzi B. (1994) The free energy of a class of Hopfield
models. J. Statist. Phys. 72, no. 1-2, pp. 113–125.
131. Shcherbina M.V., Tirozzi B. (1995) A perturbative expansion for the Hopfield
model. Helv. Phys. Acta 68, no. 5, pp. 470–491.
132. Shcherbina M.V., Tirozzi B. (2001) On the critical capacity of the Hopfield
model. Comm. Math. Phys. 216, no. 1, pp. 139–177.
133. Shcherbina M.V., Tirozzi B. (2003) On the rigorous solution of Gardner’s
problem. Comm. Math. Phys. 234, no. 3, pp. 383–422.
134. Shcherbina M.V., Tirozzi B. (2003) Central limit theorems for order param-
eters of the Gardner problem. Markov Process. Related Fields 9, no. 4, pp.
803–828.
135. Shcherbina M.V., Tirozzi B. (2005) Central limit theorems for the free energy
of the modified Gardner model. Markov Process. Related Fields 11, no. 1, pp.
133–144.
136. Sherrington D., Kirkpatrick S. (1975) Solvable model of a spin glass. Phys.
Rev. Lett. 35, pp. 1792–1796.
137. Slepian D. (1962) The one-sided barrier problem for Gaussian noise. Bell
System Tech. J. 41, pp. 463–501.
138. Talagrand M. (1987) Regularity of Gaussian processes. Acta. Math. 159, pp.
99–149.
139. Talagrand M. (1995) Concentration of measure and isoperimetric inequalities
in product spaces. Publ. Math. I.H.E.S. 81, pp. 73–205.
140. Talagrand M. (1996) A new look at independence. Ann. Probab. 24, pp. 1–34.
141. Talagrand M. (1998) The Sherrington-Kirkpatrick model: a challenge to math-
ematicians. Probab. Theory Relat. Fields 110, pp. 109–176.
142. Talagrand M. (1998) Rigorous results for the Hopfield model with many pat-
terns. Probab. Theory Relat. Fields 110, pp. 177–276.
143. Talagrand M. (1998) Huge random structures and mean field models for spin
glasses, Proceedings of the Berlin International Congress of Mathematicians.
Documenta Math., Extra Vol. I, pp. 507–536.
144. Talagrand M. (1999) Intersecting random half cubes. Random Structures and
Algorithms 15, pp. 436–449.
145. Talagrand M. (1999) Self-averaging and the space of interactions in neural
networks. Random Structures and Algorithms 14, pp. 199–213.
146. Talagrand M. (2000) Verres de spin et optimisation combinatoire. Séminaire
Bourbaki, Vol. 1998/99. Astérisque 266, Exp. No. 859, pp. 287–317.
147. Talagrand M. (2000) Rigorous low temperature results for the mean field p-
spin interaction model. Probab. Theor. Relat. Fields 117, pp. 303–360.
148. Talagrand M. (2000) Intersecting random half-spaces: towards the Derrida-
Gardner formula. Ann. Probab. 28, pp. 725–758.
149. Talagrand M. (2000) Exponential inequalities and replica-symmetry breaking
for the Sherrington-Kirkpatrick model. Ann. Probab. 28, pp. 1018–1062.
150. Talagrand M. (2000) Exponential inequalities and convergence of moments
in the replica-symmetric phase of the Hopfield model. Ann. Probab. 28, pp.
1393–1469.
151. Talagrand M. (2000) Large deviation principles and generalized Sherrington-
Kirkpatrick models. Ann. Fac. Sci. Toulouse Math. 9, pp. 203–244.
152. Talagrand M. (2000) Spin glasses: a new direction for probability theory?
Mathematics towards the third millennium (Rome, 1999). Atti Accad. Naz.
Lincei Cl. Sci. Fis. Mat. Natur. Rend. Lincei 9, Mat. Appl. Special Issue, pp.
127–146.
153. Talagrand M. (2001) The high temperature phase of the random K-sat prob-
lem. Probab. Theory Relat. Fields 119, pp. 187–212.
154. Talagrand M. (2001) The Hopfield model at the critical temperature. Probab.
Theory Relat. Fields 121, pp. 237–268.
155. Talagrand M. (2002) On the high temperature phase of the Sherrington-
Kirkpatrick model. Ann. Probab. 30, pp. 364–381.
156. Talagrand M. (2002) On the Gaussian perceptron at high temperature. Math.
Phys. Anal. Geom. 5, no. 1, pp. 77–99.
157. Talagrand M. (2003) Spin glasses: a challenge for mathematicians. Cavity
and mean field models. Ergebnisse der Mathematik und ihrer Grenzgebiete.
3. Folge. A Series of Modern Surveys in Mathematics, 46. Springer-Verlag,
Berlin. x+586 pp.
158. Talagrand M. (2003) Mean field models for spin glasses: a first course, Lectures
on probability theory and statistics (Saint-Flour, 2000), pp. 181–285, Lecture
Notes in Math. 1816, Springer, Berlin.
159. Talagrand M. (2007) Large Deviations, Guerra’s and A.S.S. Schemes, and the
Parisi Hypothesis. J. Stat. Phys. 126, no. 4, pp. 837–894.
160. Thouless D.J., Anderson P.W., Palmer R.G. (1977) Solution of ‘Solvable model
of a spin glass’. Philosophical Magazine 35, no. 3, pp. 593–601.
161. Tijms H. (2007) Understanding Probability. Chance rules in everyday life.
Second edition. Cambridge University Press, Cambridge. x+442 pp.
162. Tindel S. (2003) Quenched large deviation principle for the overlap of a p-spins
system. J. Stat. Phys. 110, pp. 51–72.
163. Tindel S. (2005) On the stochastic calculus method for spins systems. Ann.
Probab. 33, no. 2, pp. 561–581.
164. Toubol A. (1998) High temperature regime for a multidimensional
Sherrington-Kirkpatrick model of spin glass. Probab. Theory Relat. Fields 110,
no. 4, pp. 497–534.
165. Toubol A. (1999) Small random perturbation of a classical mean field model.
Stochastic Process. Appl. 81, no. 1, pp. 1–24.
166. Toulouse G. (1983) Frustration and disorder, new problems in statistical me-
chanics, spin glasses in a historical perspective. Heidelberg Colloquium on Spin
Glasses, J.L. van Hemmen and I. Morgenstern eds. Lecture Notes in Physics
192, Springer Verlag.
167. Wästlund J. (2010) The mean field traveling salesman and related problems.
Acta Mathematica 204, no. 1, pp. 91–150.
168. Wästlund J. (2009) An easy proof of the ζ(2) limit in the random assignment
problem. Electron. Commun. Probab. 14, pp. 261–269.
169. Wästlund J. (2009) Replica-symmetry and combinatorial optimization.
Manuscript.
Index

D_w^2, 250
G, 244
H_0, 22
K, 31
L, 31
O(k), 41
R^-_{ℓ,ℓ′}, 55
R_{1,1}, 192
R_{1,2}, 3
R_{ℓ,ℓ′}, 35
T_{ℓ,ℓ′}, 87
U_{1,1}, 275
Z_N, 5
Av, 54
E, 21
E_ξ, 51, 73, 156
E_r, 341
S_N, VII
Σ_N, 3
σ, 191
β_+, 62
ch, 9
⟨·⟩, 6
⟨·⟩_t, 20, 31
⟨·⟩_{t,∼}, 161
log, VII
ν_t, 156, 207
ν(f), 30
ν(f)^{1/2}, 131
ν_x, 341
ν_{t,v}, 162, 216
Ḡ, 245
ρ_{N,M}, 207
sh, 9
th, 9
b, 202
ε_ℓ, 55
∇, 203
θ, 162, 259
ρ, 53
q̂, 83
r̂, 163
b^{(0)}, b^{(1)}, b^{(2)}, 83
b^*, 242
m^*, 242
m_k(σ), 240
n → 0, 145
p_N, 6
q_{N,M}, 207
A(x), 228
I(t), 237
N(x), VII, 225
R, 81
1_A, 49
RS_0(α), 225

Aizenman, 147
Aldous, 397
align the spins, 240
allergic, 235
analytic continuation, 146
AT line, 80, 93
atoms, 1, 4
Bernoulli r.v., 13, 152, 240
Birkhoff’s theorem, 415
Boltzmann, IX
Boltzmann factor, IX
Bovier, 246, 254, 255, 275, 296
Brascamp-Lieb, 235, 236, 296
Brunn-Minkowski, 193
cavity, 53
central limit theorem, 41, 186
claims, 296
Comets, 147
concentration of measure, 16, 127, 273
configuration, IX
conflict, 240
coupled copies, 96, 126
Curie-Weiss model, 237, 242
decorrelate, 47
decoupled, 54
deep breath, 145
Derrida, 190
differential inequalities, 460
diluted interaction, 325
disorder, X, 6
duality formula, 456, 458
eigenvalues, 84
energy, IX
energy levels, X
enlightening, 233
error term, 290
essentially supported, 254
exchange of the limits, 145
external field, 4
Fekete’s lemma, 25
ferromagnetic interaction, 237, 241
finite connectivity, XIII
Fröhlich, 147
Franz, 326
frustration, 1
fully rigorous, 7
Gardner, VIII, XIII, 190
Gaussian space, 439
Gayrard, 246, 254, 255, 275, 296
Ghirlanda-Guerra, XIII, 140, 147
Gibbs’ measure, IX, 5
Griffiths’ lemma, 25, 27, 97
Gronwall’s lemma, 460
ground breaking, 146
Guerra, 12, 21–24, 148
guess, 146
Hölder’s inequality, 36
Hamming distance, 3
Hanen, 113
high temperature, X
high-temperature region, 80
Hubbard-Stratonovitch transform, 245
inverse temperature, IX
Jensen’s inequality, 8, 48
Krauth, 190
large deviation, 192
Latala, 23, 149
Lebowitz, 147
Ledoux, 147
Leone, 326
level of a problem, 29
Lipschitz constant, 16, 194
Lipschitz function, 16
log-concave, 213, 217
low temperature, X
low-temperature region, 80
Mézard, 146, 149, 190, 397
macroscopic, 4
main term, 290
Markov’s inequality, 437
Maurey, 193
mean-field approximation, 4
mean-field model, 4
Milman, 147
moderate growth, 436, 440
Monge-Kantorovich, 342, 346, 456
negative number of variables, 146
negligible family of sets, 255
negligible set, 254, 255
neural networks, 151
Neveu, 147
no qualms, 71
Omega, 247
operator norm, 202
optimization, 2
overlap, 3
overwhelming probability, 247
Paley-Zygmund inequality, 460
Parisi, 146, 397
Parisi conjecture, 398
partition function, IX, 5
Pastur, 147, 295
Picco, 296
positivity argument, 177
probabilistic correlation, X
random external field, 20
random half-spaces, 151, 191
real replicas, 6
realistic, XI
realistic model, 4
replica method, 146
replica-symmetric, 23
replicas, 6
result without proof, 37
Ruelle, 147
scary formula, 32
second moment method, 460
self-averaging, 7, 19
Shcherbina, 147, 148, 191, 202, 235, 295, 296
site, 8, 10
Slepian’s lemma, 15
smart path, 12
symmetrization, 16, 195
symmetry between replicas, 32, 35
symmetry between sites, 8
TAP equations, 71
temperature, 5
Tirozzi, 191, 202, 235, 296
Toninelli, 24, 148
transportation-cost distance, 342, 346, 367, 392, 456
truncated correlation, 47
ubiquitous, 327
unbearable, 45
universal constant, 31
Virasoro, 146
W, 245
Wasserstein’s distance, 414, 458
weak independence, 29
Zegarlinski, 147
zero-temperature, 2, 9, 241
Glossary

G   In the Hopfield model, the image of the Gibbs’ measure under the map $\sigma \mapsto m(\sigma) = (m_k(\sigma))_{k \le M}$, 244
G_N   Gibbs’ measure on $\Sigma_N$, 5
K A quantity that does not depend on N , although
it might depend on other parameters of the
model. Its value might not be the same at each
occurrence, 30
L A universal constant, i.e. a number, that does
not depend on anything. Its value might not be
the same at each occurrence, 30
O(k)   Any quantity $A$ such that $|A| \le K N^{-k/2}$, where $K$ does not depend on $N$, 41
R_{1,2}   The overlap of configurations $\sigma^1$ and $\sigma^2$, that is the quantity $N^{-1}\sum_{1\le i\le N}\sigma_i^1\sigma_i^2$, 3
R_{ℓ,ℓ′}   The overlap between configurations $\sigma^\ell$ and $\sigma^{\ell'}$, that is $N^{-1}\sum_{1\le i\le N}\sigma_i^\ell\sigma_i^{\ell'}$, 35
R^-_{ℓ,ℓ′}   The quantity $N^{-1}\sum_{1\le i<N}\sigma_i^\ell\sigma_i^{\ell'}$, 55
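For instance, the overlap is just a rescaling of the Hamming distance: writing $d(\sigma^1,\sigma^2)$ for the proportion of coordinates where $\sigma^1$ and $\sigma^2$ differ, each agreeing coordinate contributes $1/N$ to $R_{1,2}$ and each differing coordinate $-1/N$, so that
$$ R_{1,2} = 1 - 2\,d(\sigma^1,\sigma^2). $$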
S_k   In the models of Chapters 2 and 3, this denotes the quantity $S_k = N^{-1/2}\sum_{i\le N} g_{i,k}\sigma_i$. We can also denote this quantity by $S_k(\sigma)$. We then use the short-hand notation $S_k^\ell = S_k(\sigma^\ell)$, 153
S_v   The quantity corresponding to $S_k$ when using the “cavity in M”, where $t$ is now fixed and $v$ is the interpolation parameter, 161
S_{k,t}   The quantity corresponding to $S_k$ when using the cavity method (interpolation along the last spin); e.g. in the case of the perceptron model this quantity is given by (2.15). One then needs the “replicated versions” $S_{k,t}^\ell$ of $S_{k,t}$, such as in (2.22), 155
T_{ℓ,ℓ′}; T_ℓ; T   $T_{\ell,\ell'} = \frac{1}{N}\sum_{i\le N}\dot\sigma_i^\ell\dot\sigma_i^{\ell'}$; $T_\ell = \frac{1}{N}\sum_{i\le N}\dot\sigma_i^\ell\langle\sigma_i\rangle$; $T = \frac{1}{N}\sum_{i\le N}\langle\sigma_i\rangle^2 - q$, 87


W   In the Hopfield model, $W = (N\beta/2\pi)^{M/2}$, 244
Y   Often $Y = \beta z\sqrt{q} + h$, 40
E Mathematical expectation, VII
EY^2   Short-hand for $E(Y^2)$, 3
E_ξ   Expectation only in the r.v. ξ, that is, for all the other r.v.s given. More generally, expectation only in the r.v.s “named from ξ” such as $\xi^\ell$, 51
E   This notation and its avatars such as $E^\ell$, etc., are used throughout the book to denote an exponential term that occurs when using the cavity method, 79
Ω This denotes an event, not the whole of the
probability space, 247
RS   The typical name for a “replica-symmetric formula”, i.e. an expression that gives the limiting value of $p_N$ at high temperature. In the case of the SK model, this is denoted SK(β, h), VII
S_N   The sphere of $\mathbb{R}^N$ of center 0 and radius $\sqrt{N}$, VII
SK(β, h) The expression giving the replica-symmetric for-
mula in the case of the SK model, 24
Σ_N   $\{-1, 1\}^N$, IX
α In a model with two parameters M and N , such
as in Chapters 2 to 4, this often denotes the
ratio M/N . This might also denote a number
> 0, such as in the expression “M/N → α”, 153
ch x   The hyperbolic cosine of x, 9
⟨·⟩   An average for the Gibbs measure or its products, 6
· t A Gibbs average for an interpolating Hamilto-
nian, when the interpolating parameter is equal
to t, 20
log The natural logarithm, VII
th x   The hyperbolic tangent of x, 9
ν′_t(f)   A short-hand for $d\nu_t(f)/dt$, 31
ν(f)   A short-hand for $E\langle f\rangle$, 30
ν(f)^{1/2}   A short-hand for $(\nu(f))^{1/2}$, 131
ν_t(f)   A short-hand for $E\langle f\rangle_t$, 31
νt,v This is the quantity that corresponds to νt when
we interpolate “in the cavity in M ” method
along the parameter v, so νt = νt,1 , and sup-
posedly, νt,0 is easier to compute than νt , 162
Ḡ   In the Hopfield model, the convolution of G with γ, the Gaussian measure of density $W\exp(-\beta N\|z\|^2/2)$ with respect to Lebesgue measure on $\mathbb{R}^M$. It is a small perturbation of G, 244
ψ(z)   In the Hopfield model, the quantity $\psi(z) = -N\beta\|z\|^2/2 + \sum_{i\le N}\log\mathrm{ch}(\beta\eta_i\cdot z + h)$, 245
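The Gaussian density appearing here is the one behind the Hubbard-Stratonovitch transform, a completion-of-squares identity; in its simplest one-dimensional form, for any real number m,
$$ \sqrt{\frac{N\beta}{2\pi}} \int_{\mathbb{R}} \exp\Big(-\frac{N\beta z^2}{2} + N\beta m z\Big)\,dz \;=\; \exp\Big(\frac{N\beta m^2}{2}\Big), $$
which trades a term quadratic in m for an integral over an auxiliary variable z.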
σ̇_i   A short-hand for $\sigma_i - \langle\sigma_i\rangle$, 47
sh x   The hyperbolic sine of x, 9
=^D   Equality in distribution, 62
b   The barycenter of μ, 51
ε_ℓ   A short-hand for $\sigma_N^\ell$, the last spin of the ℓ-th replica, 55
∇F The gradient of F , 16
σ̇ The sequence (σ̇i )1≤i≤N , 47
g A standard Gaussian vector, that is g =
(g1 , . . . , gM ) where g1 , . . . , gM are i.i.d. standard
Gaussian r.v.s, and where M should be clear
from the context, 15
m(σ) In the Hopfield model, m(σ) = (mk (σ))k≤M ,
244
ρ   When considering a configuration $\sigma = (\sigma_1, \ldots, \sigma_N)$, we denote by ρ the configuration $(\sigma_1, \ldots, \sigma_{N-1})$ in the (N − 1)-spin system, 53
σ^ℓ   The standard name for a configuration in the ℓ-th replica, 6

q̂   Most of the time, $\hat q = E\,\mathrm{th}^4 Y = E\,\mathrm{th}^4(\beta z\sqrt{q} + h)$, 83
a(k) The k-th moment of a standard Gaussian r.v.,
except in chapter 7 where the meaning is differ-
ent, 40
a*   In the Hopfield model, $a^* = 1 - \beta(1 - m^{*2})$, 256
b*   In the Hopfield model, $b^* = \log\mathrm{ch}(\beta m^* + h) - \frac{\beta}{2}m^{*2}$, 242
m*   In the Hopfield model, the solution of the equation $m^* = \mathrm{th}(\beta m^* + h)$, 241
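For instance, when h = 0 the map $m \mapsto \mathrm{th}(\beta m)$ is concave on $[0,\infty)$ with slope
$$ \frac{d}{dm}\,\mathrm{th}(\beta m)\Big|_{m=0} = \beta $$
at the origin, so $m^* = 0$ is the only solution when $\beta \le 1$, while for $\beta > 1$ two symmetric non-zero solutions appear. This is the classical picture of the phase transition in the Curie-Weiss model.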
m_k   In the Hopfield model, $m_k = m_k(\sigma) = N^{-1}\sum_{i\le N}\eta_{i,k}\sigma_i$, 240
p_N   $p_N = N^{-1} E\log Z_N$. This quantity is also denoted $p_N(\beta)$ or $p_N(\beta, h)$. In models where there are two parameters N and M, as in Chapters 2, 3, 4, it might be denoted $p_{N,M}$, 6
A(x)   The function $A(x) = -\frac{d}{dx}\log N(x) = \frac{1}{\sqrt{2\pi}}\,\frac{e^{-x^2/2}}{N(x)}$, 228
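For large x, the standard Gaussian tail estimate $N(x) \sim e^{-x^2/2}/(x\sqrt{2\pi})$ yields
$$ A(x) \sim x \quad \text{as } x \to \infty, $$
while $A(x) \to 0$ as $x \to -\infty$ because $N(x) \to 1$ there.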
I(t)   The function $I(t) = \frac{1}{2}\big((1+t)\log(1+t) + (1-t)\log(1-t)\big)$, which satisfies $I(0) = I'(0) = 0$ and $I''(t) = 1/(1-t^2)$, see (A.29), 237
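Both stated properties follow from the direct computation
$$ I'(t) = \frac{1}{2}\log\frac{1+t}{1-t}, \qquad I''(t) = \frac{1}{2}\Big(\frac{1}{1+t} + \frac{1}{1-t}\Big) = \frac{1}{1-t^2}. $$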
N (x) The probability that a standard Gaussian r.v. g
is ≥ x, VII
R Denotes a quantity which is a remainder, of
“smaller order”, such as in (1.217), 81
1_A   The indicator function of the set A, 49
Av Typically denotes the average over one or a few
spins that take values ±1, 53

approximate
integration by parts A central technique to handle situations where
the randomness is generated by Bernoulli r.v.s
rather than by Gaussian r.v.s. It relies on the
identity (4.198), 289
AT line   For the SK model, the line of equation $\beta^2\,E\,\mathrm{ch}^{-4}(\beta z\sqrt{q} + h) = 1$, where q is the solution of (1.74), 80

Bernoulli r.v.   A r.v. η such that $P(\eta = 1) = P(\eta = -1) = 1/2$, 13
Boltzmann factor   At the configuration σ it has the value $\exp(-\beta H_N(\sigma))$, IX

configuration   An element of the configuration space, which is most of the time either $\Sigma_N$ or $S_N$, IX

decorrelate   The spins σ_1 and σ_2 decorrelate when $\lim_{N\to\infty} E|\langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle| = 0$. One expects this behavior at high temperature. One also expects that asymptotically the r.v.s $\langle\sigma_1\rangle$ and $\langle\sigma_2\rangle$ are independent, 47
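Equivalently, with the notation $\dot\sigma_i = \sigma_i - \langle\sigma_i\rangle$ introduced above, decorrelation states that the truncated correlation vanishes asymptotically, since
$$ \langle\dot\sigma_1\dot\sigma_2\rangle = \langle\sigma_1\sigma_2\rangle - \langle\sigma_1\rangle\langle\sigma_2\rangle. $$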
disorder The randomness of the Hamiltonian, X

energy   A number associated to each configuration, which is often random, IX
essentially supported   A random measure G is essentially supported by a set A (depending on N and M) if the complement $A^c$ of A is negligible, 254
external field   A term $h\sum_{1\le i\le N}\sigma_i$ occurring in the Hamiltonian, 4

finite connectivity   A situation where the average number of spins that interact with a given spin remains bounded as the size of the system increases, XII

Gibbs’ measure   The probability on the configuration space with density proportional to the Boltzmann factor, IX
Griffiths’ lemma   If a sequence $\varphi_N$ of convex differentiable functions converges pointwise in an interval to a (necessarily convex) function $\varphi$, then $\lim_{N\to\infty}\varphi_N'(x) = \varphi'(x)$ at every point x for which $\varphi'(x)$ exists, 25
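A typical application: with Boltzmann factor $\exp(-\beta H_N(\sigma))$, one has
$$ \frac{\partial^2}{\partial\beta^2}\,\frac{1}{N}\log Z_N(\beta) = \frac{1}{N}\big(\langle H_N^2\rangle - \langle H_N\rangle^2\big) \ \ge\ 0, $$
so the functions $\beta \mapsto p_N(\beta)$ are convex, and convergence of $p_N$ entails convergence of its derivative (the mean energy per site) wherever the limit is differentiable.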

Hamiltonian   The function that associates to each configuration its energy, IX
Hamming distance The Hamming distance of two sequences σ 1 and
σ 2 of ΣN is the proportion of coordinates where
they differ, 3
high-temperature
behavior The situation where the spins decorrelate, and
where the limiting value of pN is given by the
replica-symmetric formula, 11

independent   This word is always understood in the probabilistic sense, 1
interchange
of limits A very sticky point, 8

Jensen’s inequality   For a convex function φ and a r.v. X, the fact that $\varphi(EX) \le E\varphi(X)$, 8
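A standard use is the annealed bound $E\log Z_N \le \log E Z_N$ (the inequality applied to the concave function log). For the SK model without external field, with the usual normalization where the Boltzmann factor is $\exp\big(\frac{\beta}{\sqrt N}\sum_{i<j} g_{ij}\sigma_i\sigma_j\big)$ for independent standard Gaussian r.v.s $g_{ij}$, the identity $E\exp(tg) = e^{t^2/2}$ gives
$$ p_N \ \le\ \frac{1}{N}\log E Z_N \ =\ \log 2 + \frac{\beta^2}{4}\Big(1 - \frac{1}{N}\Big) \ \le\ \log 2 + \frac{\beta^2}{4}. $$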

negligible   If G is a random measure, a set A (depending on N and M) is negligible if $EG(A) \le K\exp(-N/K)$ where K does not depend on N or M, 254
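If A is negligible, Markov’s inequality shows that G(A) is exponentially small not only on average but with overwhelming probability:
$$ P\big(G(A) \ge e^{-N/(2K)}\big) \ \le\ e^{N/(2K)}\,E G(A) \ \le\ K e^{-N/(2K)}. $$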
overlap   The overlap between two configurations $\sigma = (\sigma_1, \ldots, \sigma_N)$ and $\tau = (\tau_1, \ldots, \tau_N)$ is the quantity $N^{-1}\sigma\cdot\tau = N^{-1}\sum_{1\le i\le N}\sigma_i\tau_i$, 3
overwhelming probability   An event Ω (depending on M and N) occurs with overwhelming probability if $P(\Omega) \ge 1 - K\exp(-N/K)$ for some number K that does not depend on either N or M, 247

partition function   The normalizing factor in Gibbs’ measure, $Z_N = Z_N(\beta) = \sum_{\sigma}\exp(-\beta H_N(\sigma))$, IX
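For instance, for a single spin with Hamiltonian $H_1(\sigma) = -h\sigma$, $\sigma = \pm 1$,
$$ Z_1(\beta) = e^{\beta h} + e^{-\beta h} = 2\,\mathrm{ch}(\beta h). $$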

r.v.s A short-hand for random variables, 1


random external field   A term $\sum_{1\le i\le N} h_i\sigma_i$ in the Hamiltonian, where $(h_i)$ are i.i.d. r.v.s, 20
replica-symmetric In physics’ terminology, this describes “high-
temperature behavior”. The limiting value of
pN is then given by the replica-symmetric for-
mula. This formula depends on some “free pa-
rameters” that are specified by the replica-
symmetric equations. These equations always
seem to express that the free parameters are a
critical point of the replica-symmetric formula.
A simple example of replica-symmetric formula
is given by the right-hand side of (1.73), and
the corresponding replica-symmetric equation is
(1.74), 22
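For the SK model, the replica-symmetric formula and equation just alluded to read, with z a standard Gaussian r.v.,
$$ \mathrm{SK}(\beta, h) = \log 2 + E\log\mathrm{ch}(\beta z\sqrt{q} + h) + \frac{\beta^2}{4}(1-q)^2, $$
where q solves the replica-symmetric equation
$$ q = E\,\mathrm{th}^2(\beta z\sqrt{q} + h). $$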
replicas   Configurations that are averaged independently for Gibbs’ measure. They are typically denoted by $\sigma^1, \ldots, \sigma^\ell, \ldots$, 6

self-averaging   Informally, a random quantity $X_N$ such that $E|X_N| \ge 1/L$ but whose variance goes to 0 as N → ∞. The value of $EX_N$ then “gives all the first-order information about $X_N$”, 7
site An integer 1 ≤ i ≤ N , 8
symmetry between replicas   A consequence of the fact that $(\sigma^\ell)$ is an i.i.d. sequence for Gibbs’ measure. A good place to learn about it is the beginning of the proof of Proposition 1.8.7, 32
symmetry
between sites A general principle that for many Hamiltonians,
the sites “play the same rôle”, 8

typical   A situation that occurs with probability near 1, as opposed to an exceptional situation, which occurs with probability near 0, VIII
