0% found this document useful (0 votes)

118 views342 pages

Free Probability and Random Matrices

Fields Institute Monograph, Volume 35 Springer Science+Business Media 2017 J. A. Mingo and R. Speicher

Uploaded by

Khaeles

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

118 views342 pages

Free Probability and Random Matrices

Fields Institute Monograph, Volume 35 Springer Science+Business Media 2017 J. A. Mingo and R. Speicher

Uploaded by

Khaeles

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 342

J. A. Mingo and R.

Speicher

Free Probability and Random

Matrices

Fields Institute Monograph, Volume 35

Springer Science+Business Media 2017
Dedicated to Dan-Virgil Voiculescu who gave
us freeness and Betina and Jill who gave us
the liberty to work on it.
Contents

1 Asymptotic Freeness of Gaussian Random Matrices . . . . . . . . . . . . . . . . 13

1.1 Moments and cumulants of random variables . . . . . . . . . . . . . . . . . . . 13
1.2 Moments of a Gaussian random variable . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Gaussian vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4 The moments of a standard complex Gaussian random variable . . . . 17
1.5 Wick’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Gaussian random matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 A genus expansion for the GUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.8 Non-crossing partitions and permutations . . . . . . . . . . . . . . . . . . . . . . 21
1.9 Wigner’s semi-circle law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.10 Asymptotic freeness of independent GUE’s . . . . . . . . . . . . . . . . . . . . . 23
1.11 Freeness and asymptotic freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.12 Basic properties of freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.13 Classical moment-cumulant formulas . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.14 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2 The Free Central Limit Theorem and Free Cumulants . . . . . . . . . . . . . 33

2.1 The classical and free central limit theorems . . . . . . . . . . . . . . . . . . . . 33
2.1.1 Classical central limit theorem . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.1.2 Free central limit theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.2 Non-crossing partitions and free cumulants . . . . . . . . . . . . . . . . . . . . . 41
2.3 Products of free random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.4 Functional relation between moment series and cumulant series . . . . 49
2.5 Subordination and the non-commutative derivative . . . . . . . . . . . . . . . 52

3 Free Harmonic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.1 The Cauchy transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2 Moments and asymptotic expansions . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3 Analyticity of the R-transform: compactly supported measures . . . . . 74
3.4 Measures with finite variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

5
6 Contents

3.5 Free additive convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3.6 The R-transform and free additive convolution of arbitrary measures 88

4 Asymptotic Freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.1 Averaged convergence versus almost sure convergence . . . . . . . . . . . 99
4.2 Gaussian random matrices and deterministic matrices . . . . . . . . . . . . 104
4.3 Haar distributed unitary random and deterministic matrices . . . . . . . 108
4.4 Wigner and deterministic random matrices . . . . . . . . . . . . . . . . . . . . . 112
4.5 Examples of random matrix calculations . . . . . . . . . . . . . . . . . . . . . . . 122
4.5.1 Wishart matrices and the Marchenko-Pastur distribution . . . . 122
4.5.2 Sum of random matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.5.3 Product of random matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

5 Second-Order Freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.1 Fluctuations of GUE random matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 128
5.2 Fluctuations of several matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
5.3 second-order probability space and second-order freeness . . . . . . . . . 142
5.4 second-order cumulants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
5.5 Functional relation between moment and cumulant series . . . . . . . . . 154
5.6 Diagonalization of fluctuations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
5.6.1 Diagonalization in the one-matrix case . . . . . . . . . . . . . . . . . . 157
5.6.2 Diagonalization in the multivariate case . . . . . . . . . . . . . . . . . 162

6 Free Group Factors and Freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

6.1 Group (von Neumann) algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
6.2 Free group factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.3 Free product of groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
6.4 Moments and isomorphism of von Neumann algebras . . . . . . . . . . . . 168
6.5 Freeness in the free group factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6.6 The structure of free group factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
6.7 Compression of free group factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6.8 Circular operators and complex Gaussian random matrices . . . . . . . . 173
6.9 Proof of L(F3 )1/2 ' L(F9 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
6.10 The general case L(Fn )1/k ' L(F1+(n−1)k2 ) . . . . . . . . . . . . . . . . . . . . . 178
6.11 Interpolating free group factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6.12 The dichotomy for the free group factor isomorphism problem . . . . . 179

7 Free Entropy χ - the Microstates Approach via Large Deviations . . . . 181

7.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
7.2 Large deviation theory and Cramér’s Theorem . . . . . . . . . . . . . . . . . . 182
7.3 Sanov’s Theorem and entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
7.4 Back to random matrices and one-dimensional free entropy . . . . . . . 188
7.5 Definition of multivariate free entropy . . . . . . . . . . . . . . . . . . . . . . . . . 190
7.6 Some important properties of χ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Contents 7

7.7 Applications of free entropy to operator algebras . . . . . . . . . . . . . . . . 194

7.7.1 The proof of Theorem 7, part (i) . . . . . . . . . . . . . . . . . . . . . . . . 196
7.7.2 The proof of Theorem 7, part (iii) . . . . . . . . . . . . . . . . . . . . . . . 198

8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher

Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.1 Non-commutative derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.2 ∂i as unbounded operator on Chx1 , . . . , xn i . . . . . . . . . . . . . . . . . . . . . . 204
8.3 Conjugate variables and free Fisher information Φ ∗ . . . . . . . . . . . . . . 208
8.4 Additivity of Φ ∗ and freeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
8.5 The non-microstates free entropy χ ∗ . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
8.6 Operator algebraic applications of free Fisher information . . . . . . . . 221
8.7 Absence of atoms for self-adjoint polynomials . . . . . . . . . . . . . . . . . . 223
8.8 Additional exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225

9 Operator-Valued Free Probability Theory and Block Random

Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.1 Gaussian block random matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
9.2 General theory of operator-valued free probability . . . . . . . . . . . . . . . 238
9.3 Relation between scalar-valued and matrix-valued cumulants . . . . . . 243
9.4 Moving between different levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
9.5 A non-self-adjoint example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248

10 Polynomials in Free Variables and Operator-Valued Convolution . . . . 251

10.1 The general concept of a free deterministic equivalent . . . . . . . . . . . . 251
10.2 A motivating example: reduction to multiplicative convolution . . . . . 253
10.3 Reduction to operator-valued additive convolution via the
linearization trick . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
10.4 Analytic theory of operator-valued convolutions . . . . . . . . . . . . . . . . . 259
10.4.1 General notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
10.4.2 Operator-valued additive convolution . . . . . . . . . . . . . . . . . . . 260
10.4.3 Operator-valued multiplicative convolution . . . . . . . . . . . . . . . 261
10.5 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
10.6 The Case of Rational Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
10.7 Additional exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264

11 Brown Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265

11.1 Brown measure for normal operators . . . . . . . . . . . . . . . . . . . . . . . . . . 265
11.2 Brown measure for matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
11.3 Fuglede-Kadison determinant in finite von Neumann algebras . . . . . 268
11.4 Subharmonic functions and their Riesz measures . . . . . . . . . . . . . . . . 269
11.5 Definition of the Brown measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
11.6 Brown measure of R-diagonal operators . . . . . . . . . . . . . . . . . . . . . . . . 272
11.6.1 A little about the proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
8 Contents

11.6.2 Example: circular operator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

11.6.3 The circular law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
11.6.4 The single ring theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
11.7 Brown measure of elliptic operators . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
11.8 Brown measure for unbounded operators . . . . . . . . . . . . . . . . . . . . . . . 276
11.9 Hermitization method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
11.10Brown measure of arbitrary polynomials in free variables . . . . . . . . . 278

Solutions to Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

12.1 Solutions to exercises in Chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
12.2 Solutions to exercises in Chapter 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
12.3 Solutions to exercises in Chapter 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12.4 Solutions to exercises in Chapter 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
12.5 Solutions to exercises in Chapter 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
12.6 Solutions to exercises in Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
12.7 Solutions to exercises in Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
12.8 Solutions to exercises in Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
12.9 Solutions to exercises in Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
12.10Solutions to exercises in Chapter 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
12.11Solutions to exercises in Chapter 11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319

Index of Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333

Introduction

This book is an invitation to the world of free probability theory.

Free probability is a quite young mathematical theory with many avatars. It owes
its existence to the visions of one man, Dan-Virgil Voiculescu, who created it out of
nothing at the beginning of the 1980s and pushed it forward ever since. The subject
had a relatively slow start in its first decade but took on a lot of momentum later on.
It started in the theory of operator algebras, showed its beautiful combinatorial
structure via non-crossing partitions, made contact with the world of random ma-
trices, and reached out to many other subjects: like representation theory of large
groups, quantum groups, invariant subspace problem, large deviations, quantum in-
formation theory, subfactors, or statistical inference. Even in physics and engineer-
ing many people have heard of it and find it useful and exciting.
One of us (RS) has already written, jointly with Alexandru Nica, a monograph
[140] on the combinatorial side of free probability theory. Whereas combinatorics
will also show up in the present book, our intention here is different: we want to
give a flavour of the breadth of the subject, hence this book will cover a variety of
different facets, occurrences and applications of free probability; instead of going in
depth in one of them, our goal is to give the basic ideas and results in many different
directions and show how all this is related.
This means that we have to cover subjects as different as random matrices and
von Neumann algebras. This should, however, not to be considered a peril but as a
promise for the good things to come.
We have tried to make the book accessible to both random matrix and operator
algebra (and many more) communities without requiring too many prerequisites.
Whereas our presentation of random matrices should be mostly self-contained, on
the operator algebraic side we try to avoid the technical parts of the theory as much
as possible. We hope that the main ideas about von Neumann algebras are compre-
hensible even without knowing what a von Neumann algebra is. In particular, in
Chapters 1-5 no von Neumann algebras will make their appearance.

9
10 Contents

The book is a mixture between textbook and research monograph. We actually

cover many of the important developments of the subject in recent years, for which
no coherent introduction in monograph style has existed up to now.
Chapters 1, 2, 3, 4 and 6 describe in a self-contained way the by now well-
established basic body of the theory of free probability. Chapters 1 and 4 deal with
the relation of free probability with random matrices; Chapter 1 is more of a motivat-
ing nature, whereas Chapter 4 provides the rigorous results. Chapter 6 provides the
relation to operator algebras and the free group factor isomorphism problem, which
initiated the whole theory. Chapter 2 presents the combinatorial side of the theory;
as this is dealt with in much more detail in the monograph [140], we sometimes refer
to the latter for details. Chapter 3 gives a quite extensive and self-contained account
of the analytic theory of free convolution. We put there quite some emphasis on the
subordination formulation, which is the modern state of the art for dealing with such
questions, and which cannot be found in this form anywhere else.
The other chapters deal with parts of the theory where the final word is not yet
spoken, but where important progress has been achieved and which surely will sur-
vive in one or the other form in future versions of free probability. In those chapters
we often make references to the original literature for details of the proofs. Never-
theless we try also there to provide intuition and motivation for what and why. We
hope that those chapters invite also some of the readers to do original work in the
subject.
Chapter 5 is on second-order freeness; this theory intends to deal with fluctua-
tions of random matrices in the same way as freeness does this with the average.
Whereas the combinatorial aspect of this theory is far evolved, the analytic status
awaits a better understanding.
Free entropy has at the moment two incarnations with very different flavuor. The
microstates approach is treated in Chapter 7, whereas the non-microstates approach
is in Chapter 8. Both approaches have many interesting and deep results and appli-
cations - however, the final form of free entropy theory (hoping that there is only
one) still has to be found.
Operator-valued free probability has evolved in recent years into a very powerful
generalization of free probability theory; this is made clear by its applicability to
much bigger classes of random matrices, and by its use for calculating the distribu-
tion of polynomials in free variables. The operator-valued theory is developed and
its use demonstrated in Chapters 9 and 10.
In Chapter 11, we present the Brown measure, a generalization of the spectral
distribution from the normal to the non-normal case. In particular, we show how
free probability (in its operator-valued version) allows one to calculate such Brown
measures. Again there is a relation with random matrices; the Brown measure is the
canonical candidate for the eigenvalue distribution of non-normal random matrix
models (where the eigenvalues are not real, but complex).
After having claimed to cover many of the important directions of free probabil-
ity we have now to admit that there are at least as many which unfortunately did not
Contents 11

make it into the book. One reason for this is that free probability is still very fast
evolving with new connections popping up quite unexpectedly.
So we are, for example, not addressing such exciting topics as free stochastic
and Malliavin calculus [40, 108, 114], or the rectangular version of free probability
[28], or the strong version of asymptotic freeness [48, 58, 89], or free monotone
transport [83], or the relation with representation theory [35, 72] or with quantum
groups [16, 17, 44, 73, 110, 118, 146]; or the quite recent new developments around
bifreeness [51, 81, 100, 196], traffic freeness [50, 122] or connections to Ramanujan
graphs via finite free convolution [124]. Instead of trying to add more chapters to a
never-ending (and never-published) book, we prefer just to stop where we are and
leave the remaining parts for others.
We want to emphasize that some of the results in this book owe their existence to
the book writing itself, and our endeavour to fill apparent gaps in the existing theory.
Examples of this are our proof of the asymptotic freeness of Wigner matrices from
deterministic matrices in Section 4.4 (for which there exists now also another proof
in the book [6]), the fact that finite free Fisher information implies the existence of
a density in Proposition 8.18, or the results about the absence of algebraic relations
and zero divisors in the case of finite free Fisher information in Theorems 8.13 and
8.32.
Our presentation benefited a lot from input by others. In particular, we like to
mention Serban Belinschi and Hari Bercovici for providing us with a proof of Propo-
sition 8.18, and Uffe Haagerup for allowing us to use his manuscript of his talk at
the Fields Institute as the basis for Chapter 11. With the exception of Sections 11.9
and 11.10 we are mainly following his notes in Chapter 11. Chapter 3 relied sub-
stantially on input and feedback from the experts on the subject. Many of the results
and proofs around subordination were explained to us by Serban Belinschi, and we
also got a lot of feedback from JC Wang and John Williams. We are also grateful
to N. Raj Rao for help with his RMTool package which was used in our numerical
simulations.
The whole idea of writing this book started from a lectures series on free prob-
ability and random matrices which we gave at the Fields Institute, Toronto, in the
fall of 2007 within the Thematic Programme on Operator Algebras. Notes of our
lectures were taken by Emily Redelmeier and by Jonathan Novak; and the first draft
of the book was based on these notes.
We had the good fortune to have Uffe Haagerup around during this programme
and he agreed to give one of the lectures, on his work on the Brown measure. As
mentioned above, the notes of his lecture became the basis of Chapter 11.
What are now Chapters 5, 8, 9, and 10 were not part of the lectures at the Fields
Institute, but were added later. Those additional chapters cover in big parts also
results which did not yet exist in 2007. So this gives us at least some kind of excuse
that the finishing of the book took so long.
Much of Chapter 8 is based on classes on “Random matrices and free entropy”
and “Non-commutative distributions” which one of us (RS) taught at Saarland Uni-
12 Contents

versity during the winter terms 2013-14 and 2014-15, respectively. The final out-
come of this chapter owes a lot to the support of Tobias Mai for those classes.
Chapter 9 is based on work of RS with Wlodek Bryc, Reza Rashidi Far and Tamer
Oraby on block random matrices in a wireless communications (MIMO) context,
and on various lectures of RS for engineering audiences, where he tried to convince
them of the relevance and usefulness of operator-valued methods in wireless prob-
lems. Chapter 10 benefited a lot from the work of Carlos Vargas on free deterministic
equivalents in his PhD thesis and from the joint work of RS with Serban Belinschi
and Tobias Mai around linearization and the analytic theory of operator-valued free
probability. The algorithms, numerical simulations, and histograms for eigenvalue
distributions in Chapter 10 and Brown measures in Chapter 11 are done with great
expertise and dedication by Tobias Mai.
There are exercises scattered throughout the text. The intention is to give readers
an opportunity to test their understanding. In some cases, where the result is used in
a crucial way or where the calculation illustrates basic ideas, a solution is provided
at the end of the book.
In addition to the already mentioned individuals we owe a lot of thanks to people
who read preliminary versions of the book and gave useful feedback, which helped
to improve the presentation and correct some mistakes. We want to mention in par-
ticular Marwa Banna, Arup Bose, Mario Diaz, Yinzheng Gu, Todd Kemp, Felix
Leid, Josué Vázquez, Hao-Wei Wang, Simeng Wang, and Guangqu Zheng.
Further thanks are due to the Fields Institute for the great environment they of-
fered us during the already mentioned thematic programme on Operator Algebras
and for the opportunity to publish our work in their Monographs series. The writing
of this book, as well as many of the reported results, would not have been possible
without financial support from various sources; in particular, we want to mention a
Killam Fellowship for RS in 2007 and 2008, which allowed him to participate in the
thematic programme at the Fields Institute and thus get the whole project started;
and the ERC Advanced Grant “Non-commutative distributions in free probability”
of RS which provided time and resources for the finishing of the project. Many of
the results we report here were supported by grants from the Canadian and German
Science Foundations NSERC and DFG, respectively; by Humboldt Fellowships for
Serban Belinschi and John Williams for stays at Saarland University; and by DAAD
German-French Procope exchange programmes between Saarland University and
the universities of Besançon and of Toulouse.
As we are covering a wide range of topics there might come a point where one
gets a bit exhausted from our book. There are, however, some alternatives; like the
standard references [97, 140, 197, 198] or survey articles [37, 84, 141, 142, 156,
162, 164, 165, 183, 191, 192] on (some aspects of) free probability. Our advice:
take a break, enjoy those and then come back motivated to learn more from our
book.
Chapter 1
Asymptotic Freeness of Gaussian Random Matrices

In this chapter we shall introduce a principal object of study: Gaussian random

matrices. This is one of the few ensembles of random matrices for which one can
do explicit calculations of the eigenvalue distribution. For this reason the Gaussian
ensemble is one of the best understood. Information about the distribution of the
eigenvalues is carried by it moments: {E(tr(X k ))}k where E is the expectation, tr
denotes the normalized trace (i.e. tr(IN ) = 1), and X is an N × N random matrix.
One of the achievements of the free probability approach to random matrices is
to isolate the property called asymptotic freeness. If X and Y are asymptotically free
then we can approximate the moments of X + Y and XY from the moments of X
and Y ; moreover this approximation becomes exact in the large N limit. In its exact
form this relation is called freeness and we shall give its definition at the end of this
chapter, §1.12. In Chapter 2 we shall explore the basic properties of freeness and
relate these to some new objects called free cumulants. To motivate the definition of
freeness we shall show in this chapter that independent Gaussian random matrices
are asymptotically free, thus showing that freeness arises naturally.
To begin this chapter we shall review enough of the elementary properties of
Gaussian random variables to demonstrate asymptotic freeness.
We want to add right away the disclaimer that we do not attempt to give a com-
prehensive introduction into the vast subject of random matrices. We concentrate
on aspects which have some relevance for free probability theory; still this should
give the uninitiated reader a good idea what random matrices are and why they are
so fascinating and have become a centrepiece of modern mathematics. For more
on its diversity, beauty, and depth one should have a look at [6, 69, 172] or on the
collection of survey articles on various aspects of random matrices in [2].

1.1 Moments and cumulants of random variables

Let ν be a probability measure on R. If R |t|n dν(t) < ∞ we say that ν has a moment
R

of order n, the nth moment is denoted αn = R t n dν(t).

13
14 1 Asymptotic Freeness of Gaussian Random Matrices

Exercise 1. If ν has a moment of order n then ν has all moments of order m for
m < n.
√
The integral ϕ(t) = eist dν(s) (with i = −1) is always convergent and is called
R

the characteristic function of ν. It is always uniformly continuous on R and ϕ(0) =

1, so for |t| small enough ϕ(t) 6∈ (−∞, 0] and we can define the continuous function
log(ϕ(t)). If ν has a moment of order n then ϕ has a derivative of order n, and
conversely if ϕ has a derivative of order n then ν has a moment of order n when n
is even and a moment of order n − 1 when n is odd (see Lukacs [119, Corollary 1 to
Theorem 2.3.1]). Moreover αn = i−n ϕ (n) (0), so if ϕ has a power series expansion it
has to be
(it)n
ϕ(t) = ∑ αn .
n≥0 n!
Thus if ν has a moment of order m + 1 we can write
m
(it)n dn

−n
+ o(t m )

log(ϕ(t)) = ∑ kn with kn = i log(ϕ(t)) .
n=1 n! dt n t=0

The numbers {kn }n are the cumulants of ν. To distinguish them from the free
cumulants, which will be defined in the next chapter, we will call {kn }n the classi-
cal cumulants of ν. The moments {αn }n of ν and the cumulants {kn }n of ν each
determine the other through the moment-cumulant formulas:

n!
αn = ∑ kr1 · · · knrn
r1 · · · (n!)rn r ! · · · r ! 1
(1.1)
1·r1 +···+n·rn =n (1!) 1 n
r1 ,...,rn ≥0

(−1)r1 +···+rn −1 (r1 + · · · + rn − 1)! n! r1

kn = ∑ r1 · · · (n!)rn r ! · · · r !
α1 · · · αnrn . (1.2)
1·r1 +···+n·rn =n (1!) 1 n
r1 ,...,rn ≥0

Both sums are over all non-negative integers r1 , . . . , rn such that 1 · r1 + · · · + n · rn =

n. We shall see below in Exercises 4 and 12 how to use partitions to simplify these
formidable equations.
A very important random variable is the Gaussian or normal random variable. It
has the distribution
Z t2
1 2 /(2σ 2 )
P(t1 ≤ X ≤ t2 ) = √ e−(t−a) dt
2πσ 2 t1

where a is the mean and σ 2 is the variance. The characteristic function of a Gaussian
random variable is
σ 2t 2 (it)1 (it)2
ϕ(t) = exp(iat − ), thus log ϕ(t) = a +σ2 .
2 1! 2!
1.2 Moments of a Gaussian random variable 15

Hence for a Gaussian random variable all cumulants beyond the second are 0.
Exercise 2. Suppose ν has a fifth moment and we write

(it) (it)2 (it)3 (it)4

ϕ(t) = 1 + α1 + α2 + α3 + α4 + o(t 4 )
1! 2! 3! 4!
where α1 , α2 , α3 , and α4 are the first four moments of ν. Let

(it) (it)2 (it)3 (it)4

log(ϕ(t)) = k1 + k2 + k3 + k4 + o(t 4 ).
1! 2! 3! 4!
Using the Taylor series for log(1 + x) find a formula for α1 , α2 , α3 , and α4 in terms
of k1 , k2 , k3 , and k4 .

1.2 Moments of a Gaussian random variable

Let X be a Gaussian random variable with mean 0 and variance 1. Then by definition
Z t2
1 2 /2
P(t1 ≤ X ≤ t2 ) = √ e−t dt.
2π t1

Let us find the moments of X. Clearly, α0 = 1, α1 = 0, and by integration by parts

dt
Z
2 /2
αn = E(X n ) = t n e−t √ = (n − 1)αn−2 for n ≥ 2.
R 2π
Thus
α2n = (2n − 1)(2n − 3) · · · 5 · 3 · 1 =: (2n − 1)!!
and α2n−1 = 0 for all n.
Let us find a combinatorial interpretation of these numbers. For a positive integer
n let [n] = {1, 2, 3, . . . , n}, and P(n) denote all partitions of the set [n], i.e. π =
{V1 , . . . ,Vk } ∈ P(n) means V1 , . . . ,Vk ⊆ [n], Vi 6= 0/ for all i, V1 ∪ · · · ∪ Vk = [n], and
Vi ∩ V j = 0/ for i 6= j; V1 , . . . ,Vk are called the blocks of π. We let #(π) denote the
number of blocks of π and #(Vi ) the number of elements in the block Vi . A partition
is a pairing if each block has size 2. The pairings of [n] will be denoted P2 (n).
Let us count |P2 (2n)|, the number of pairings of [2n]. 1 must be paired with
something and there are 2n − 1 ways of choosing it. Thus

|P2 (n)| = (2n − 1)|P2 (n − 2)| = (2n − 1)!!.

So E(X 2n ) = |P2 (2n)|. There is a deeper connection between moments and parti-
tions known as Wick’s formula (see Section 1.5).
Exercise 3. We say that a partition of [n] has type (r1 , . . . , rn ) if it has ri blocks of
size i. Show that the number of partitions of [n] of type (r1 , r2 , . . . , rn ) is
16 1 Asymptotic Freeness of Gaussian Random Matrices

n!
.
(1!)r1 (2!)r2 · · · (n!)rn r 1 !r2 ! · · · rn !

Using the type of a partition there is a very simple expression for the moment-
cumulant relations above. Moreover this expression is quite amenable for calcula-
tion. If π is a partition of [n] and {ki }i is any sequence, let kπ = k1r1 k2r2 · · · knrn where
ri is the number of blocks of π of size i. Using this notation the first of the moment-
cumulant relations can be written

αn = ∑ kπ . (1.3)
π∈P (n)

The second moment-cumulant relation can be written (see Exercise 13)

kn = ∑ (−1)#(π)−1 (#(π) − 1)! απ . (1.4)

π∈P (n)

The simplest way to do calculations with relations like those above is to use formal
power series (see Stanley [167, §1.1]).
Exercise 4. Let {αn } and {kn } be two sequences satisfying (1.3). In this exercise
we shall show that as formal power series
∞
zn ∞
zn
log 1 + ∑ αn = ∑ kn . (1.5)
n=1 n! n=1 n!

(i) Show that by differentiating both sides of (1.5) it suffices to prove

∞
zn ∞
zn ∞ zn
∑ αn+1 n! = 1 + ∑ αn ∑
n! n=0
kn+1 .
n!
(1.6)
n=0 n=1

(ii) By grouping the terms in ∑π kπ according to the size of the block containing
1 show that
n−1
n−1

αn = ∑ kπ = ∑ km+1 αn−m−1 .
π∈P (n) m=0 m

(iii) Use the result of (ii) to prove (1.6).

1.3 Gaussian vectors

Let X : Ω → Rn , X = (X1 , . . . , Xn ) be a random vector. We say that X is Gaussian if
there is a positive definite n × n real symmetric matrix B such that

exp(−hBt,ti/2)dt
Z
E(Xi1 · · · Xik ) = ti1 · · ·tik
Rn (2π)n/2 det(B)−1/2
1.5 Wick’s formula 17

where h·, ·i denotes the standard inner product onRn . Let C = (ci j ) be the covariance
matrix, that is ci j = E [Xi − E(Xi )] · [X j − E(X j )] .
In fact C = B−1 and if X1 , . . . , Xn are independent then B is a diagonal ma-
trix, see Exercise 5. If Y1 , . . . ,Yn are independent Gaussian random variables, A
is an invertible real matrix, and X = AY , then X is a Gaussian random vector
and every Gaussian random vector is obtained in this way. If X = (X1 , . . . , Xn ) is
a complex random vector we say that X is a complex Gaussian random vector if
(Re(X1 ), Im(X1 ), . . . , Re(Xn ), Im(Xn )) is a real Gaussian random vector.
Exercise 5. Let X = (X1 , . . . , Xn ) be a Gaussian random vector with density
p
det(B)(2π)−n exp(−hBt,ti/2). Let C = (ci j ) = B−1 .
(i) Show that B is diagonal if and only if {X1 , . . . , Xn } are independent.
(ii) By first diagonalizing B show that ci j = E [Xi − E(Xi )] · [X j − E(X j )] .

1.4 The moments of a standard complex Gaussian random variable

Suppose X and Y are independent√real Gaussian random variables with mean 0 and
variance 1. Then Z = (X + iY )/ 2 is a complex Gaussian random variable with
mean 0 and variance E(ZZ) = 12 E(X 2 + Y 2 ) = 1. We call Z a standard complex
Gaussian random variable. Moreover, for such a complex Gaussian variable we
have (
m n 0, m 6= n
E(Z Z ) = .
m!, m = n
√
Exercise 6. Let Z = (X + iY )/ 2 be a standard complex Gaussian random variable
with mean 0 and variance 1.
(i) Show that
1
Z
n 2 2
E(Z m Z ) = (t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2 .
π R2

By switching to polar coordinates show that

Z 2π Z ∞
n 1 2
E(Z m Z ) = rm+n+1 eiθ (m−n) e−r drdθ .
π 0 0

n
(ii) Show that E(Z m Z ) = 0 for m 6= n, and that E(|Z|2n ) = n!.

1.5 Wick’s formula

Let (X1 , . . . Xn ) be a real Gaussian random vector and i1 , . . . , ik ∈ [n]. Wick’s for-
mula gives a simple expression for E(Xi1 · · · Xik ). If k is even and π ∈ P2 (k)
let Eπ (X1 , . . . , Xk ) = ∏(r,s)∈π E(Xr Xs ). For example if π = {(1, 3)(2, 6)(4, 5)} then
Eπ (X1 , X2 , X3 , X4 , X5 , X6 ) = E(X1 X3 )E(X2 X6 )E(X4 X5 ). Eπ is a k-linear functional.
18 1 Asymptotic Freeness of Gaussian Random Matrices

The fact that only pairings arise in Wick’s formula is a consequence of the observa-
tion on page 15 that for a Gaussian random variable, all cumulants above the second
vanish.

Theorem 1. Let (X1 , . . . , Xn ) be a real Gaussian random vector. Then

E(Xi1 · · · Xik ) = ∑ Eπ (Xi1 , . . . , Xik ) for any i1 , . . . , ik ∈ [n]. (1.7)

π∈P2 (k)

Proof: Suppose that the covariance matrix C of (X1 , . . . , Xn ) is diagonal, i.e. the
Xi ’s are independent. Consider (i1 , . . . , ik ) as a function [k] → [n]. Let {a1 , . . . , ar }
be the range of i and A j = i−1 (a j ). Then {A1 , . . . , Ar } is a partition of [k] which
we denote ker(i). Let |At | be the number of elements in At . Then E(Xi1 · · · Xik ) =
r |A |
∏t=1 E(Xat t ). Let us recall that if X is a real Gaussian random variable of mean 0
and variance c, then for k even E(X k ) = ck/2 × |P2 (k)| = ∑π∈P2 (k) Eπ (X, . . . , X),
|A |
and for k odd E(X k ) = 0. Thus we can write the product ∏t E(Xat t ) as a sum
∑π∈P2 (k) Eπ (Xi1 , . . . , Xik ) where the sum runs over all π’s which only connect el-
ements in the same block of ker(i). Since E(Xir Xis ) = 0 for ir 6= is we can relax
the condition that π only connect elements in the same block of ker(i). Hence
E(Xi1 · · · Xik ) = ∑π∈P2 (k) Eπ (Xi1 , . . . , Xik ).
Finally let us suppose that C is arbitrary. Let the density of (X1 , . . . , Xn ) be
exp(−hBt,ti/2)[(2π)n/2 det(B)−1/2 ]−1 and choose an orthogonal matrix O such that
D = O−1 BO is diagonal. Let
   
Y1 X1
 ..  −1  .. 
 .  = O  . .
Yn Xn

Then (Y1 , . . . ,Yn ) is a real Gaussian random vector with the diagonal covariance
matrix D−1 . Then
n
E(Xi1 · · · Xik ) = ∑ oi1 j1 oi2 j2 · · · oik jk E(Y j1 Y j2 · · ·Y jk )
j1 ,..., jk =1
n
= ∑ oi1 j1 · · · oik jk ∑ Eπ (Y j1 , . . . ,Y jk )
j1 ,..., jk =1 π∈P2 (k)

= ∑ Eπ (Xi1 , . . . , Xik ).
π∈P2 (k)

Since both sides of equation (1.7) are k-linear we can extend by linearity to the
complex case.

Corollary 2. Suppose (X1 , . . . , Xn ) is a complex Gaussian random vector then

1.7 A genus expansion for the GUE 19

(ε ) (ε ) (ε ) (ε )
E(Xi1 1 · · · Xik k ) = ∑ Eπ (Xi1 1 , . . . , Xik k ) (1.8)
π∈P2 (k)

for all i1 , . . . , ik ∈ [n] and all ε1 , . . . , εk ∈ {0, 1}; where we have used the notation
(0) (1)
Xi := Xi and Xi := Xi .

Formulas (1.7) and (1.8) are usually referred to as Wick’s formula after the physi-
cist Gian-Carlo Wick [200], who introduced them in 1950 as a fundamental tool in
quantum field theory; one should notice, though, that they had already appeared
much earlier, in 1918, in the work of the statistician Leon Isserlis [101].
Exercise 7. Let Z1 , . . . , Zs be independent standard complex Gaussian random vari-
ables with mean 0 and E(|Zi |2 ) = 1. Show that

E(Zi1 · · · Zin Z j1 · · · Z jn ) = |{σ ∈ Sn | i = j ◦ σ }|.

Sn denotes the symmetric group on [n]. Note that this is consistent with part (iii) of
Exercise 6.

1.6 Gaussian random matrices

√
Let X be an N × N matrix with entries fi j where √ fi j = xi j + −1 yi j is a complex
Gaussian random variable normalized such that N fi j is a standard complex Gaus-
2
sian random variable, i.e. E( fi j ) = 0, E( fi j ) = 1/N and
(i) fi j = f ji ,
(ii) {xi j }i≥ j ∪ {yi j }i> j are independent.
Then X is a self-adjoint Gaussian random matrix. Such a random matrix is often
called a GUE random matrix (GUE = Gaussian unitary ensemble).
√
Exercise 8. Let X be an N × N GUE random matrix, with entries fi j = xi j + −1 yi j
normalized so that E(| fi j |2 ) = 1/N.
(i) Consider the random N 2 -vector

(x11 , . . . , xNN , x12 , . . . , x1N , . . . , xN−1,N , y12 , . . . , yN−1,N ).

2
Show that the density of this vector is c e−NTr(X )/2 dX where c is a constant and
2
dX = ∏Ni=1 dxii ∏i< j dxi j dyi j is Lebesgue measure on RN .
(ii) Evaluate the constant c.

1.7 A genus expansion for the GUE

Let us calculate E(Tr(Y k )), for Y = (gi j ) a N × N GUE random matrix. We first sup-
2
pose for convenience that the entries of Y have been normalized so that E(gi j ) =
1. Now
20 1 Asymptotic Freeness of Gaussian Random Matrices

N
E(Tr(Y k )) = ∑ E(gi1 i2 gi2 i3 · · · gik i1 ).
i1 ,...,ik =1

By Wick’s formula (1.8), E(gi1 i2 gi2 i3 · · · gik i1 ) = 0 whenever k is odd, and otherwise

E(gi1 i2 gi2 i3 · · · gi2k i1 ) = ∑ Eπ (gi1 i2 , gi2 i3 , . . . , gi2k i1 ).

π∈P2 (2k)

Now E(gir ir+1 gis is+1 ) will be 0 unless ir = is+1 and is = ir+1 (using the convention
2
that i2k+1 = i1 ). If ir = is+1 and is = ir+1 then E(gi i gi i ) = E(gi i ) = 1.
r r+1 s s+1 r r+1
Thus given (i1 , . . . , i2k ), E(gi1 i2 gi2 i3 · · · gi2k i1 ) will be the number of pairings π of [2k]
such that for each pair (r, s) of π, ir = is+1 and is = ir+1 .
In order to easily count these we introduce the following notation. We regard the
2k-tuple (i1 , . . . , i2k ) as a function i : [2k] → [N]. A pairing π = {(r1 , s1 )(r2 , s2 ), . . . ,
(rk , sk )} of [2k] will be regarded as a permutation of [2k] by letting (ri , si ) be the
transposition that switches ri with si and π = (r1 , s1 ) · · · (rk , sk ) as the product of
these transpositions. We also let γ2k be the permutation of [2k] which has the one
cycle (1, 2, 3, . . . , 2k). With this notation our condition on the pairings has a simple
expression. Let π be a pairing of [2k] and (r, s) be a pair of π. The condition ir =
is+1 can be written as i(r) = i(γ2k (π(r))) since π(r) = s and γ2k (π(r)) = s + 1.
Thus Eπ (gi1 i2 , gi2 i3 , . . . , gi2k i1 ) will be 1 if i is constant on the orbits of γ2k π and 0
otherwise. For a permutation σ , let #(σ ) denote the number of cycles of σ . Thus

N n i is constant on the o
E(Tr(Y 2k )) = π ∈ P2 (2k)

∑ orbits of γ2k π

i1 ,...,i2k =1
n i is constant on the o
= ∑ i : [2k] → [N]

orbits of γ2k π

π∈P (2k)2

= ∑ N #(γ2k π) .
π∈P2 (2k)

We summarize this result in the statement of the following theorem.

Theorem 3. Let YN = (gi j ) be a N × N GUE random matrix with entries normalized

so that E(|gi j |2 ) = 1 for all i and j. Then

E(Tr(YN2k )) = ∑ N #(γ2k π) .
π∈P2 (2k)

Moreover, for XN = N −1/2YN = ( fi j ), with the normalization E(| fi j |2 ) = 1/N, we

have
E(tr(XN2k )) = ∑ N #(γ2k π)−k−1 .
π∈P2 (2k)
1.8 Non-crossing partitions and permutations 21

Here, Tr denotes the usual, unnormalized, trace, whereas tr = N1 Tr is the normalized

trace.

The expansion in this theorem is usually addressed as genus expansion. In the

next section we will elaborate more on this.
In the mathematical literature this genus expansion appeared for the first time
in the work of Harer and Zagier [91] in 1986, but was mostly overlooked for a
while, until random matrices became main stream also in mathematics in the new
millennium; in physics, on the other side, such expansions were kind of folklore
and at the basis of Feynman diagram calculations in quantum field theory; see for
example [170, 45, 207].

1.8 Non-crossing partitions and permutations

In order to find the limit of E(tr(X 2k )) we have to understand the sign of the quantity
#(γ2k π) − k − 1. We shall show that for all pairings #(γ2k π) − k − 1 ≤ 0 and identify
the pairings for which we have equality. As we shall see that the π’s for which we
have equality are the non-crossing pairings, let us begin by reviewing some material
on non-crossing partitions from [140, Lecture 9].
Let π be a partition of [n]. If we can find i < j < k < l such that i and k are in one
block V of π and j and l are in another block W of π we say that V and W cross.
If no pair of blocks of π cross then we say π is non-crossing. We denote the set of
non-crossing partitions of [n] by NC(n). The set of non-crossing pairings of [n] is
denoted NC2 (n). We discuss this more fully in §2.2
Given a permutation π ∈ Sn we consider all possible factorizations into prod-
ucts of transpositions. For example we can write (1, 2, 3, 4) = (1, 4)(1, 3)(1, 2) =
(1, 2)(1, 4)(2, 3)(1, 4)(3, 4). We let |π| be the least number of transpositions needed
to factor π. In the example above |(1, 2, 3, 4)| = 3. From this definition we see that
|πσ | ≤ |π| + |σ |, |π −1 | = |π|, and |e| = 0, that is | · | is a length function on Sn .
There is a very simple relation between |π| and #(π), namely for π ∈ Sn we have
|π| + #(π) = n. There is a simple observation that will be used to establish this and
many other useful inequalities. Let (i1 , . . . , ik ) be a cycle of a permutation π and
1 ≤ m < n ≤ k. Then (i1 , . . . , ik )(im , in ) = (i1 , . . . , im , in+1 , . . . , ik )(im+1 , . . . , in ). From
this we immediately see that if π is a permutation and τ = (r, s) is a transposition
then #(πτ) = #(π) + 1 if r and s are in the same cycle of π and #(πτ) = #(π) − 1
if r and s are in different cycles of π. Thus we easily deduce that for any trans-
positions τ1 , . . . , τk in Sn we have #(τ1 · · · τk ) ≥ n − k as, starting with the identity
permutation (with n cycles), each transposition τi can reduce the number of cycles
by at most 1. This shows that #(π) ≥ n − |π|. On the other hand we have for any
cycle (i1 , . . . , ik ) = (i1 , ik )(i1 , ik−1 ) · · · (i1 , i2 ) is the product of k − 1 transpositions.
Thus |π| ≤ n − #(π). See [140, Lecture 23] for a further discussion.
Let us return to our original problem and let π be a pairing of [2k]. We regard π
as a permutation in S2k as above. Then #(π) = k, so |π| = k. Also |γ2k | = 2k − 1. The
triangle inequality gives us |γ2k | ≤ |π| + |γ2k π| (since π = π −1 ), or #(γ2k π) ≤ k + 1.
22 1 Asymptotic Freeness of Gaussian Random Matrices

This shows that #(γ2k π) − k − 1 ≤ 0 for all pairings π. Next we have to identify
for which π’s we have equality. For this we use a theorem of Biane which embeds
NC(n) into Sn .
We let γn = (1, 2, 3, . . . , n). Let π be a partition of [n]. We can arrange the elements
of the blocks of π in increasing order and consider these blocks to be the cycles of a
permutation, also denoted π. When we regard π as a permutation #(π) also denotes
the number of cycles of π. Biane’s result is that π is non-crossing, as a partition,
if and only if, the triangle inequality |γn | ≤ |π| + |π −1 γn | becomes an equality. In
terms of cycles this means #(π) + #(π −1 γn ) ≤ n + 1 with equality only if π is non-
crossing. This is a special case of a theorem which states that for π and σ , any two
permutations of [n] such that the subgroup generated by π and σ acts transitively
on [n], there is an integer g ≥ 0 such that #(π) + #(π −1 σ ) + #(σ ) = n + 2(1 − g),
and g is the minimal genus of a surface upon which the ‘graph’ of π relative to σ
can be embedded. See [61, Propriété II.2] and Fig. 1.1. Thus we can say that π is
non-crossing with respect to σ if |σ | = |π| + |π −1 σ |. We shall need this relation in
Chapter 5. An easy corollary of the equation #(π) + #(π −1 σ ) + #(σ ) = n + 2(1 − g)
is that if π is a pairing of [2k] and #(γ2k π) < k + 1 then #(γ2k π) < k.

Example 4. γ = (1, 2, 3, 4, 5, 6), π = (1, 4)(2, 5)(3, 6), #(π) = 3, #(γ6 ) = 1, #(π −1 γ6 )
= 2, #(π) + #(π −1 γ6 ) + #(γ6 ) = 6, therefore g = 1.

Fig. 1.1 A surface of genus 1 with the

pairing (1, 4)(2, 5)(3, 6) drawn on it.

4 5 6
3 2 1

If g = 0 the surface is a sphere and the graph is planar and we say π is planar
relative to γ. When γ has one cycle, ‘planar relative to γ’ is what we have been
calling a non-crossing partition; for a proof of Biane’s theorem see [140, Proposition
23.22].

Proposition 5. Let π ∈ Sn , then π ∈ NC(n) if and only if |π| + |π −1 γn | = |γn |.

Corollary 6. If π is a pairing of [2k] then #(γ2k π) ≤ k − 1 unless π is non-crossing

in which case #(γ2k π) = k + 1.
1.10 Asymptotic freeness of independent GUE’s 23

1.9 Wigner’s semi-circle law

2 1
Consider again our GUE matrices XN = ( fi j ) with normalization E( fi j ) = N.
Then, by Theorem 3, we have

E(tr(XN2k )) = N −(k+1) ∑ N #(γ2k π)

π∈P2 (2k)

= ∑ N −2gπ , (1.9)
π∈P2 (2k)

because #(π −1 γ) = #(γπ −1 ) for any permutations π and γ, and if π is a pairing then
π = π −1 . Thus Ck := limN→∞ E(tr(XN2k )) is the number of non-crossing pairings of
[2k], i.e. the cardinality of NC2 (2k).It is well-known that this is the k-th Catalan
1 2k

number k+1 k (see [140, Lemma 8.9], or (2.5) in the next chapter).

p
Fig. 1.2 The graph of (2π)−1 4 − t 2 . The 2kth mo- 0.3
ment of the Rsemi-circle
√ law is the Catalan number
2 2k
Ck = (2π)−1 −2 t 4 − t 2 dt. 0.2

0.1

-2 -1 0 1 2

Since the Catalan numbers are the moments of the semi-circle distribution, we
have arrived at Wigner’s famous semi-circle law [201], which says that√the spec-
tral measures of {XN }N , relative to the state E(tr(·)), converge to (2π)−1 4 − t 2 dt,
i.e. the expected proportion of eigenvalues of X between a and b is asymptotically
R √
(2π)−1 ab 4 − t 2 dt. See Fig 1.2.

Theorem 7. Let {XN }N be a sequence of GUE random matrices, normalized so that

E(| fi j |2 ) = 1/N for the entries of XN . Then
Z 2
1 p
lim E(tr(XNk )) = tk 4 − t 2 dt.
N 2π −2

If we arrange that all the XN ’s are defined on the same probability space XN : Ω →
MN (C) we can say something stronger: {tr(XNk )}N converges to the kth moment
−1
R 2 k√
(2π) −2 t 4 − t 2 dt almost surely. We shall prove this in Chapters 4 and 5. See
Theorem 4.4 and Remark 5.14.

1.10 Asymptotic freeness of independent GUE’s

Suppose that for each N, X1 , . . . , Xs are independent N × N GUE’s. For notational
simplicity we suppress the dependence on N. Suppose m1 , . . . , mr are positive inte-
gers and i1 , i2 , . . . , ir ∈ [s] such that i1 6= i2 , i2 6= i3 , . . . , ir−1 6= ir . Consider the ran-
dom N × N matrix YN := (Xim1 1 − cm1 I)(Xim2 2 − cm2 I) · · · (Ximr r − cmr I), where cm is the
24 1 Asymptotic Freeness of Gaussian Random Matrices

asymptotic value of the m-th moment of Xi (note that this is the same for all i); i.e.,
cm is zero for m odd and the Catalan number Cm/2 for m even.
Each factor is centred asymptotically and adjacent factors have independent en-
tries. We shall show that E(tr(YN )) → 0 and we shall call this property asymptotic
freeness. This will then motivate Voiculescu’s definition of freeness.
First let us recall the principle of inclusion-exclusion (see Stanley [167, Vol. 1,
Chap. 2]). Let S be a set and E1 , . . . , Er ⊆ S. Then
r
|S \ (E1 ∪ · · · ∪ Er )| = |S| − ∑ |Ei | + ∑ |Ei1 ∩ Ei2 | + · · ·
i=1 i1 6=i2

+ (−1)k Ei ∩ · · · ∩ Ei + · · · + (−1)r |E1 ∩ · · · ∩ Er | ;

∑ 1 k
(1.10)
i1 ,...,ik
distinct

for example, |S \ (E1 ∪ E2 )| = |S| − (|E1 | + |E2 |) + |E1 ∩ E2 |.

We can rewrite the right-hand side of (1.10) as

\
|S \ (E1 ∪ · · · ∪ Er )| = (−1)m |Ei1 ∩ · · · ∩ Eim | = ∑ (−1)|M| Ei

∑
M⊆[r] M⊆[r]
i∈M

M={i1 ,...,im }

= S and (−1)|0|/ = 1.
T
provided we make the convention that i∈0/ Ei

Notation 8 Let i1 , . . . , im ∈ [s]. We regard these labels as the colours of the matrices
Xi1 , Xi2 , . . . , Xim . Given a pairing π ∈ P2 (m), we say that π respects the colours
i := (i1 , . . . , im ), or to be brief: π respects i, if ir = i p whenever (r, p) is a pair of π.
Thus π respects i if and only if π only connects matrices of the same colour.

Lemma 9. Suppose i1 , . . . , im ∈ [s] are positive integers. Then

E(tr(Xi1 · · · Xim )) = |{π ∈ NC2 (m) | π respects i}| + O(N −2 ).

Proof: The proof proceeds essentially in the same way as for the genus expansion
of moments of one GUE matrix.
(i ) (i )
E(tr(Xi1 · · · Xim )) = ∑ E( f j11j2 · · · f jmm, j1 )
j1 ,..., jm
(i ) (i )
= ∑ ∑ Eπ ( f j11, j2 , . . . , f jmm, j1 )
j1 ,..., jm π∈P2 (m)
(i ) (i )
= ∑ ∑ Eπ ( f j11, j2 , . . . , f jmm, j1 )
π∈P2 (m) j1 ,..., jm
π respects i
by (1.9)
= ∑ N −2gπ
π∈P2 (m)
π respects i
1.10 Asymptotic freeness of independent GUE’s 25

= |{π ∈ NC2 (m) | π respects i}| + O(N −2 ).

The penultimate equality follows in the same way as in the calculations leading to
Theorem 3; for this note that we have for π which respects i that
(i ) (i ) (1) (1)
Eπ ( f j11, j2 , . . . , f jmm, j1 ) = Eπ ( f j1 , j2 , . . . , f jm , j1 ),

so for the contribution of such a π which respects i it does not play a role any more
that we have several matrices instead of one.

Theorem 10. If i1 6= i2 , i2 6= i3 , . . . , ir−1 6= ir then limN E(tr(YN )) = 0.

Proof: Let I1 = {1, . . . , m1 }, I2 = {m1 +1, . . . , m1 +m2 }, . . . , Ir = {m1 +· · ·+mr−1 +

1, . . . , m1 + · · · + mr } and m = m1 + · · · + mr . Then

E tr((Xim1 1 − cm1 I) · · · (Ximr r − cmr I))

|M| m
= ∑ (−1) ∏ cmi E tr ∏ Xi j j
M⊆[r] i∈M j∈M
/

= ∑ (−1)|M| ∏ cmi |{π ∈ NC2 (∪ j6∈M I j ) | π respects i}| + O(N −2 ).
M⊆[r] i∈M

Let S = {π ∈ NC2 (m) | π respects i} and E j = {π ∈ S | elements of I j are only

paired amongst themselves }. Then

\
E j = ∏ cm j {π ∈ NC2 (∪ j∈M
/ I j ) | π respects i} .

j∈M j∈M

Thus

E tr((Xim1 1 − cm1 I) · · · (Ximr r − cmr I)) = (−1)|M| E j + O(N −2 ).
\
∑
M⊆[r] j∈M

So we must show that

(−1)|M|
\
∑ E j = 0.
M⊆[r] j∈M

However by inclusion-exclusion this sum equals |S \ (E1 ∪ · · · ∪ Er )|. Now S \ (E1 ∪

· · ·∪Er ) is the set of pairings of [m] respecting i such that at least one element of each
interval is connected to another interval. However this set is empty because elements
of S \ (E1 ∪ · · · ∪ Er ) must connect each interval to at least one other interval in a
non-crossing way and thus form a non-crossing partition of the intervals {I1 , . . . , Ir }
without singletons, in which no pair of adjacent intervals are in the same block, and
this is impossible.
26 1 Asymptotic Freeness of Gaussian Random Matrices

1.11 Freeness and asymptotic freeness

Let XN,1 , . . . , XN,s be independent N × N GUE random matrices. For each N let AN,i
be the polynomials in XN,i with complex coefficients. Let AN be the algebra gener-
ated by AN,1 , . . . , AN,s . For A ∈ AN let ϕN (A) = E(tr(A)). Thus AN,1 , . . . , AN,s are
unital subalgebras of the unital algebra AN with state ϕN .
We have just shown in Theorem 7 that given a polynomial p we have that
limN ϕN (AN,i ) exists where AN,i = p(XN,i ). Moreover we have from Theorem 10
that given polynomials p1 , . . . , pr and positive integers j1 , . . . , jr such that
◦ limN ϕN (AN,i ) = 0 for i = 1, 2, . . . , r
◦ j1 6= j2 , j2 6= j3 , . . . , jr−1 6= jr
that limN ϕN (AN,1 AN,2 · · · AN,r ) = 0, where AN,i = pi (XN, ji ). We thus say that the
subalgebras AN,1 , . . . , AN,s are asymptotically free because, in the limit as N tends
to infinity, they satisfy the freeness property of Voiculescu. We state this below;
in the next chapter we shall explore freeness in detail. Note that asymptotic free-
ness implies that for any polynomials p1 , . . . , pr and i1 , . . . , ir ∈ [s] we have that
limN ϕN (p1 (XN,i1 ) · · · pr (XN,ir )) exists. So the random variables {XN,1 , . . . , XN,s }
have a joint limit distribution and it is the distribution of free random variables.
Definition 11. Let (A, ϕ) be a unital algebra with a unital linear functional. Suppose
A1 , . . . , As are unital subalgebras. We say that A1 , . . . , As are freely independent (or
just free) with respect to ϕ if whenever we have r ≥ 2 and a1 , . . . , ar ∈ A such that
◦ ϕ(ai ) = 0 for i = 1, . . . , r
◦ ai ∈ A ji with 1 ≤ ji ≤ s for i = 1, . . . , r
◦ j1 6= j2 , j2 6= j3 , . . . , jr−1 6= jr
we must have ϕ(a1 · · · ar ) = 0. We can say this succinctly as: the alternating product
of centred elements is centred.

1.12 Basic properties of freeness

We adopt the general philosophy of regarding freeness as a non-commutative ana-
logue of the classical notion of independence in probability theory. Thus we refer to
it often as free independence.
Definition 12. In general we refer to a pair (A, ϕ), consisting of a unital algebra
A and a unital linear functional ϕ : A → C with ϕ(1) = 1, as a non-commutative
probability space. If A is a ∗-algebra and ϕ is a state, i.e., in addition to ϕ(1) = 1
also positive (which means: ϕ(a∗ a) ≥ 0 for all a ∈ A), then we call (A, ϕ) a ∗-
probability space. If A is a C∗ -algebra and ϕ a state, (A, ϕ) is a C∗ -probability
space. Elements of A are called non-commutative random variables or just random
variables.
If (A, ϕ) is a ∗-probability space and ϕ(x∗ x) = 0 only when x = 0 we say that ϕ
is faithful. If (A, ϕ) is a non-commutative probability space, we say that ϕ is non-
degenerate if we have: ϕ(yx) = 0 for all y ∈ A implies that x = 0; and ϕ(xy) = 0
1.12 Basic properties of freeness 27

for all y ∈ A implies that x = 0. By the Cauchy-Schwarz inequality, for a state on

a ∗-probability space “non-degenerate” and “faithful” are equivalent. If A is a von
Neumann algebra and ϕ is a faithful normal state, i.e. continuous with respect to
the weak-* topology, (A, ϕ) is called a W ∗ -probability space. If ϕ is also a trace,
i.e., ϕ(ab) = ϕ(ba) for all a, b ∈ A, then it is a tracial W ∗ -probability space. For a
tracial W ∗ -probability space we will usually write (M, τ) instead of (A, ϕ).
Proposition 13. Let (B, ϕ) be a non-commutative probability space. Consider uni-
tal subalgebras A1 , . . . , As ⊂ B which are free. Let A be the algebra generated by
A1 , . . . , As . Then ϕ|A is determined by ϕ|A1 , . . . , ϕ|As and the freeness condition.

Proof: Elements in the generated algebra A are linear combinations of words of

the form a1 · · · ak with a j ∈ Ai j for some i j ∈ {1, . . . , s} which meet the condition
that neighbouring elements come from different subalgebras. We need to calculate
ϕ(a1 · · · ak ) for such words. Let us proceed in an inductive fashion.
We know how to calculate ϕ(a) for a ∈ Ai for some i ∈ {1, . . . , s}.
Now suppose we have a word of the form a1 a2 with a1 ∈ Ai1 and a2 ∈ Ai2 with
i1 6= i2 . By the definition of freeness, this implies

ϕ[(a1 − ϕ(a1 )1)(a2 − ϕ(a2 )1)] = 0.

But

(a1 − ϕ(a1 )1)(a2 − ϕ(a2 )1) = a1 a2 − ϕ(a2 )a1 − ϕ(a1 )a2 + ϕ(a1 )ϕ(a2 )1.

Hence we have

ϕ(a1 a2 ) = ϕ ϕ(a2 )a1 + ϕ(a1 )a2 − ϕ(a1 )ϕ(a2 )1 = ϕ(a1 )ϕ(a2 ).

Continuing in this fashion, we know that ϕ(å1 · · · åk ) = 0 by the definition of free-
ness, where åi = ai − ϕ(ai )1 is a centred random variable. But then

ϕ(å1 · · · åk ) = ϕ(a1 · · · ak ) + lower order terms in ϕ,

where the lower order terms are already dealt with by induction hypothesis.
Remark 14. Let (A, ϕ) be a non-commutative probability space. For any subalgebra
B ⊂ A we let B̊ = B ∩ ker ϕ. Let A1 and A2 be unital subalgebras of A, we let
A1 ∨ A2 be the subalgebra of A generated algebraically by A1 and A2 . With this
notation we can restate Proposition 13 as follows. If A1 and A2 are free then
⊕ ⊕
ker ϕ|A1 ∨A2 = ∑ ∑ Åα1 Åα2 · · · Åαn (1.11)
n≥1 α1 6=···6=αn

where α1 , . . . , αn ∈ {1, 2}.

For subalgebras C ⊂ B ⊂ A we shall let B C = {b ∈ B | ϕ(cb) = 0 for all c ∈ C}.
When ϕ|C is non-degenerate we have C C = {0}.
28 1 Asymptotic Freeness of Gaussian Random Matrices

Exercise 9. Let (A, ϕ) be a non-commutative probability space. Suppose A1 , A2 ⊂

A are unital subalgebras and are free with respect to ϕ. If ϕ|A1 is non-degenerate
then
⊕ ⊕
(A1 ∨ A2 ) A1 = Å2 ⊕ ∑ ∑ Åα1 Åα2 · · · Åαn . (1.12)
n≥2 α1 6=···6=αn

Definition 15. Let (A, ϕ) be a non-commutative probability space. Elements a1 , . . . ,

as ∈ A are said to be free or freely independent if the generated unital subalge-
bras Ai = alg(1, ai ) (i = 1, . . . , s) are free in A with respect to ϕ. If (A, ϕ) is a
∗-probability space then we say that a1 , . . . , as ∈ A are ∗-free if the generated unital
∗-subalgebras Bi = alg(1, ai , a∗i ) (i = 1, . . . , s) are free in A with respect to ϕ. In the
same way, (∗-)freeness between sets of variables is defined by the freeness of the
generated unital (∗-)subalgebras.

In terms of random variables, Proposition 13 says that mixed moments of free

variables are calculated in a specific way out of the moments of the separate vari-
ables. This is in clear analogy to the classical notion of independence.
Let us look at some examples for such calculations of mixed moments. For ex-
ample, if a, b are freely independent, then ϕ[(a − ϕ(a)1)(b − ϕ(b)1)] = 0, implying
ϕ(ab) = ϕ(a)ϕ(b).
In a slightly more complicated example, let {a1 , a2 } be free from b. Then apply-
ing the state to the corresponding centred word:

ϕ[(a1 − ϕ(a1 )1)(b − ϕ(b)1)(a2 − ϕ(a2 )1)] = 0,

hence the linearity of ϕ gives

ϕ(a1 ba2 ) = ϕ(a1 a2 )ϕ(b). (1.13)

A similar calculation shows that if {a1 , a2 } is free from {b1 , b2 }, then

ϕ(a1 b1 a2 b2 ) = ϕ(a1 a2 )ϕ(b1 )ϕ(b2 )

+ ϕ(a1 )ϕ(a2 )ϕ(b1 b2 ) − ϕ(a1 )ϕ(a2 )ϕ(b1 )ϕ(b2 ). (1.14)

It is important to note that while free independence is analogous to classical

independence, it is not a generalization of the classical case. Classical commuting
random variables a, b are free only in trivial cases: ϕ(aabb) = ϕ(abab), but the left-
hand side is ϕ(aa)ϕ(bb) while the right-hand side is ϕ(a2 )ϕ(b)2 + ϕ(a)2 ϕ(b2 ) −
ϕ(a)2 ϕ(b)2 , which implies ϕ[(a − ϕ(a))2 ] · ϕ[(b − ϕ(b))2 ] = 0. But then (note that
states in classical probability spaces are always positive and faithful) one of the
factors inside ϕ must be 0, so that one of a, b must be a scalar.
Observe that while freeness gives a concrete rule for calculating mixed moments,
this rule is a priori quite complicated. We will come back to this question for a better
1.13 Classical moment-cumulant formulas 29

understanding of this rule in the next chapter. For the moment let us just note the
following.
Proposition 16. Let (A, ϕ) be a non-commutative probability space. The subalge-
bra of scalars C1 is free from any other unital subalgebra B ⊂ A.

Proof: Let a1 · · · ak be an alternating word in centred elements of C1, B. The case

k = 1 is trivial, otherwise we have at least one a j ∈ C1. But then ϕ(a j ) = 0 implies
a j = 0, so a1 · · · ak = 0. Thus obviously ϕ(a1 · · · ak ) = 0.

1.13 Classical moment-cumulant formulas

At the beginning of this chapter we introduced the cumulants of a probability mea-
sure ν via the logarithm of its characteristic function: if {αn }n are the moments of
ν and
zn zn
∑ kn n! = log 1 + ∑ αn n! (1.15)
k≥1 n≥1

is the logarithm of the moment-generating function then {kn }n are the cumulants of
ν. We gave without proof two formulas (1.1) and (1.2) showing how to compute the
nth moment from the first n cumulants and conversely.
In the exercises below we shall prove equations (1.1) and (1.2) as well as showing
the very simple restatements in terms of set partitions

αn = ∑ kπ and kn = ∑ (−1)#(π)−1 (#(π) − 1)! απ .

π∈P (n) π∈P (n)

The simplicity of these formulas, in particular the first, makes them very useful for
computation. Moreover they naturally lead to the moment-cumulant formulas for
the free cumulants in which the set P(n) of all partitions of [n] is replaced by NC(n)
the set of non-crossing partitions of [n]. This will be taken up in Chapter 2.
It was shown in Exercise 4 that if we have two sequences {αn }n and {kn }n such
that αn = ∑π∈P (n) kπ then we have (1.15) as relation between their exponential
power series. In Exercises 11 and 12 this is proved again starting from the formal
power series relation and ending with the first moment-cumulant relation. This can
be regarded as a warm-up for Exercises 13 and 14 when we prove the second half
of the moment-cumulant relation:

kn = ∑ (−1)#(π)−1 (#(π) − 1)!απ .

π∈P (n)

This formula can also be proved by the general theory of Möbius inversion in P(n)
after identifying the Möbius function on P(n) (see [140, Ex. 10.33]).
So far we have only considered cumulants of a single random variable; we need
an extension to several random variables so that kn becomes a n-linear functional.
We begin with mixed moments and extend the notation used in Section 1.5. Let
{Xi }i be a sequence of random variables and π ∈ P(n), we let
30 1 Asymptotic Freeness of Gaussian Random Matrices

Eπ (X1 , . . . , Xn ) = ∏ E(Xi1 Xi2 · · · Xil ).

V ∈π
V =(i1 ,...,il )

Then we set

kn (X1 , . . . , Xn ) = ∑ (−1)#(π)−1 (#(π) − 1)! Eπ (X1 , . . . , Xn ).

π∈P (n)

We then define kπ as above; namely for π ∈ P(n) we set

kπ (X1 , . . . , Xn ) = ∏ kl (Xi1 , . . . , Xil ).

V ∈π
V =(i1 ,...,il )

Our cumulant-moment formula can be recast as a multilinear moment-cumulant

formula
E(X1 · · · Xn ) = ∑ kπ (X1 , . . . , Xn ).
π∈P (n)

Another formula we shall need is the product formula of Leonov and Shiryaev
for cumulants (see [140, Theorem 11.30]). Let n1 , . . . , nr be positive integers and
n = n1 + · · · + nr . Given random variables X1 , . . . , Xn let Y1 = X1 · · · Xn1 , Y2 =
Xn1 +1 · · · Xn1 +n2 , . . . , Yr = Xn1 +···+nr−1 +1 · · · Xn1 +···+nr . Then

kr (Y1 , . . . ,Yr ) = ∑ kπ (X1 , . . . , Xn ) (1.16)

π∈P (n)
π∨τ=1n

where the sum runs over all π ∈ P(n) such that π ∨ τ = 1n and τ ∈ P(n) is the
partition with r blocks

{(1, . . . , n1 ), (n1 + 1, . . . , n1 + n2 ), · · · , (n1 + · · · + nr−1 + 1, . . . , n1 + · · · + nr )}

and 1n ∈ P(n) is the partition with one block. Here ∨ denotes the join in the lattice
of all partitions (see [140, Remark 9.19]).
In the next chapter we will have in (2.19) an analogue of (1.16) for free cumu-
lants.

1.14 Additional exercises

Exercise 10. (i) Let ∑∞ n

n=1 βn z be a formal power series. Using the power series
x
expansion for e show that as a formal power series
∞ ∞ n βl1 · · · βlm n
exp ∑ βn zn = 1+ ∑ ∑ ∑ m!
z .
n=1 n=1 m=1 l1 ,...,lm ≥1
l1 +···+lm =n

(ii) Show
1.14 Additional exercises 31
∞
n
∞ β1r1 · · · βnrn n
exp ∑ n = 1+ ∑
β z ∑ r1 !r2 ! · · · rn !
z .
n=1 n=1 r1 ,...,rn ≥0
1·r1 +···+n·rn =n

Use this to prove equation (1.1).

βn n
Exercise 11. Let ∑∞ n=1 n! z be a formal power series. For a partition π of type
(r1 , r2 , . . . , rn ), let βπ = β1r1 β2r2 · · · βnrn . Show that
∞
βn n ∞ zn
exp ∑ z = 1+ ∑ ∑ βπ .
n=1 n! n=1 π∈P (n) n!

Exercise 12. Let ∑∞ n

n=1 βn z be a formal power series. Using the power series expan-
sion for log(1 + x) show that
∞ ∞ β1r1 · · · βnrn n
log 1 + ∑ βn zn = ∑ ∑ (−1)r1 +···+rn −1 (r1 + · · · + rn − 1)! z .
n=1 n=1 1·r1 +···+n·rn =n r1 · · · rn !

Use this to prove equation (1.2).

z n
Exercise 13. (i) Let ∑∞
n=1 αn n! be a formal power series. Show that

∞
zn ∞
#(π)−1
zn
log 1 + ∑ αn =∑ ∑ (−1) (#(π) − 1)! απ .
n=1 n! n=1 π∈P (n) n!

(ii) Let
kn = ∑ (−1)#(π)−1 (#(π) − 1)! απ .
π∈P (n)

Use the result of Exercise 11 to show that

αn = ∑ kπ .
π∈P (n)

Exercise 14. Suppose ν is a probability measure with moments {αn }n of all orders
and let {kn }n be its sequence of cumulants. Show that

αn = ∑ kπ and kn = ∑ (−1)#(π)−1 (#(π) − 1)! απ .

π∈P (n) π∈P (n)
Chapter 2
The Free Central Limit Theorem and Free Cumulants

Recall from Chapter 1 that if (A, ϕ) is a non-commutative probability space and

A1 , . . . , As are subalgebras of A which are free with respect to ϕ, then freeness
gives us in principle a rule by which we can evaluate ϕ(a1 a2 · · · ak ) for any alternat-
ing word in random variables a1 , a2 , . . . , ak . Thus we can in principle calculate all
mixed moments for a system of free random variables. However, we do not yet have
any concrete idea of the structure of this factorization rule. This situation will be
greatly clarified by the introduction of free cumulants. Classical cumulants appeared
in Chapter 1, where we saw that they are intimately connected with the combina-
torial notion of set partitions. Our free cumulants will be linked in a similar way to
the lattice of non-crossing set partitions; the latter were introduced in combinatorics
by Kreweras [113]. We will motivate the appearance of free cumulants and non-
crossing partition lattices in free probability theory by examining in detail a proof
of the central limit theorem by the method of moments.
The combinatorial approach to free probability was initiated by Speicher in [159,
161], in order to get alternative proofs for the free central limit theorem and the main
properties of the R-transform, which had been treated before by Voiculescu in [176,
177] by more analytic tools. Nica showed a bit later in [135] how this combinatorial
approach connects in general to Voiculescu’s operator-theoretic approach in terms
of creation and annihilation operators on the full Fock space. The combinatorial
path was pursued much further by Nica and Speicher; for more details on this we
refer to the standard reference [140].

2.1 The classical and free central limit theorems

Our setting is that of a non-commutative probability space (A, ϕ) and a sequence
(ai )i∈N ⊂ A of centred and identically distributed random variables. This means that
ϕ(ai ) = 0 for all i ≥ 1, and that ϕ(ani ) = ϕ(anj ) for any i, j, n ≥ 1. We assume that our
random variables ai , i ≥ 1 are either classically independent, or freely independent
as defined in Chapter 1. Either form of independence gives us a factorization rule
for calculating mixed moments in the random variables.

33
34 2 The Free Central Limit Theorem and Free Cumulants

For k ≥ 1, set
1
Sk := √ (a1 + · · · + ak ). (2.1)
k
The Central Limit Theorem is a statement about the limit distribution of the ran-
dom variable Sk in the large k limit. Let us begin by reviewing the kind of conver-
gence we shall be considering.
Recall that given a real-valued random variable X on a probability space we have
a probability measure µX on R, called the distribution of X. The distribution of X is
defined by the equation
Z
E( f (X)) = f (t) dµX (t) for all f ∈ Cb (R) (2.2)

where Cb (R) is the C∗ -algebra of all bounded continuous functions on R. We say

that a probability measure µ on R is determined by it moments if µ has moments
{αk }k of all orders and µ is the only probability measure on R with moments {αk }k .
If the moment generating function of µ has a positive radius of convergence, then µ
is determined by its moments (see Billingsley [41, Theorem 30.1]).
Exercise 1. Show that a compactly supported measure is determined by its moments.

A more general criterion is the Carleman condition (see Akhiezer [3, p. 85])
which says that a measure µ is determined by its moments {αk }k if we have
∑k≥1 (α2k )−1/(2k) = ∞.
Exercise 2. Using the Carleman condition, show that the Gaussian measure is de-
termined by its moments.
A sequence of probability measures {µn }n on R is said to converge weakly to µ
R R
if { f dµn }n converges to f dµ for all f ∈ Cb (R). Given a sequence {Xn }n of real-
valued random variables we say that {Xn }n converges in distribution (or converges
in law) if the probability measures {µXn }n converge weakly.
If we are working in a non-commutative probability space (A, ϕ) we call an
element a of A a non-commutative random variable. Given such an a we may define
R
µa by p dµa = ϕ(p(a)) for all polynomials p ∈ C[x]. At this level of generality
R
we may not be able to define f dµa for all functions f ∈ Cb (R), so we call the
linear functional µa : C[x] → C the algebraic distribution of a, even if it is not a
probability measure. However when it is clear from the context we shall just call µa
the distribution of a. Note that if a is a self-adjoint element of a C∗ -algebra and ϕ is
positive and has norm 1, then µa extends from C[x] to Cb (R) and thus µa becomes
a probability measure on R.
Definition 1. Let (Ak , ϕk ), for k ∈ N, and (A, ϕ) be non-commutative probability
spaces.
1) Let (bk )k∈N be a sequence of non-commutative random variables with bk ∈ Ak ,
distr
and let b ∈ A. We say that bk converges in distribution to b, denoted by bk −→ b, if
2.1 The classical and free central limit theorems 35

lim ϕk (bnk ) = ϕ(bn ) (2.3)

k→∞

for any fixed n ∈ N.

(i)
2) More generally, let I be an index set. For each i ∈ I, let bk ∈ Ak for k ∈ N
(i)
and b(i) ∈ A. We say that (bk )i∈I converges in distribution to (b(i) )i∈I , denoted by
(i) distr
(bk )i∈I −→ (b(i) )i∈I , if
(i ) (i )
lim ϕk (bk 1 · · · bk n ) = ϕ(b(i1 ) · · · b(in ) ) (2.4)
k→∞

for all n ∈ N and all i1 , . . . , in ∈ I.

Note that this definition is neither weaker nor stronger than weak convergence of
the corresponding distributions. For real-valued random variables the convergence
in (2.3) is sometimes called convergence in moments. However there is an impor-
tant case where the two conditions coincide. If we have a sequence of probability
measures {µk }k on R, each having moments of all orders and a probability measure
µ determined by its moments, such that for every n we have t n dµk (t) → t n dµ
R R

as k → ∞, then {µk }k converges weakly to µ (see Billingsley [41, Theorem 30.2]).

To see that weak convergence does not imply convergence in moments consider the
sequence {µk }k where µk = (1 − 1/k)δ0 + (1/k)δk and δk is the probability measure
with an atom at k of mass 1.
Exercise 3. Show that {µk }k converges weakly to δ0 , but that we do not have con-
vergence in moments.
We want to make a statement about convergence in distribution of the ran-
dom variables (Sk )k∈N from (2.1) (which all come from the same underlying non-
commutative probability space). Thus we need to do a moment calculation. Let [k]
= {1, . . . , k} and [n] = {1, . . . , n}. We have

1
ϕ(Skn ) = ∑ ϕ(ar1 · · · arn ).
kn/2 r:[n]→[k]

It turns out that the fact that the random variables a1 , . . . , ak are independent and
identically distributed makes the task of calculating this sum less complex than it
initially appears. The key observation is that because of (classical or free) inde-
pendence of the ai ’s and the fact that they are identically distributed, the value of
ϕ(ar1 . . . arn ) depends not on all details of the multi-index r, but just on the informa-
tion where the indices are the same and where they are different. Let us recall some
notation from the proof of Theorem 1.1.
Notation 2 Let i = (i1 , . . . , in ) be a multi-index. Then its kernel, denoted by ker i, is
that partition in P(n) whose blocks correspond exactly to the different values of the
indices,
k and l are in the same block of ker i ⇐⇒ ik = il .
36 2 The Free Central Limit Theorem and Free Cumulants

Fig. 2.1 Suppose j1 = j3 = j4 and j2 = j5

aj1 aj2 aj3 aj4 aj5 aj6
but { j1 , j2 , j6 } are distinct. Then ker( j) =
{(1, 3, 4), (2, 5), (6)}.

Lemma 3. With this notation we have that ker i = ker j implies ϕ(ai1 · · · ain ) =
ϕ(a j1 · · · a jn ).

Proof: To see this note first that ker i = ker j implies that the i-indices can be ob-
tained from the j-indices by the application of some permutation σ , i.e. ( j1 , . . . , jn )
= (σ (i1 ), . . . , σ (in )). We know that the random variables a1 , . . . , ak are (classically
or freely) independent. This means that we have a factorization rule for calculating
mixed moments in a1 , . . . , ak in terms of the moments of individual ai ’s. In partic-
ular this means that ϕ(ai1 · · · ain ) can be written as some expression in moments
ϕ(ari ), while ϕ(a j1 · · · a jn ) can be written as that same expression except with ϕ(ari )
replaced by ϕ(arσ (i) ). However, since our random variables all have the same distri-
bution, then ϕ(ari ) = ϕ(arσ (i) ) for any i, j, and thus ϕ(ai1 · · · ain ) = ϕ(a j1 · · · a jn ).

Let us denote the common value of ϕ(ai1 · · · ain ) for all i with ker i = π, for some
π ∈ P(n), by ϕ(π). Consequently, we have

1
ϕ(Skn ) = ∑ ϕ(π) · |{i : [n] → [k] | ker i = π}|.
kn/2 π∈P (n)

It is not difficult to see that

#{i : [n] → [k] | ker i = π} = k(k − 1) · · · (k − #(π) + 1)

because we have k choices for the first block of π, k − 1 choices for the second block
of π and so on until the last block where we have k − #(π) + 1.
Then what we have proved is that
1
ϕ(Skn ) = ∑ ϕ(π) · k(k − 1) · · · (k − #(π) + 1).
kn/2 π∈P (n)

The great advantage of this expression over what we started with is that the num-
ber of terms does not depend on k. Thus we are in a position to take the limit as
k → ∞, provided we can effectively estimate each term of the sum.
Our first observation is the most obvious one, namely we have

k(k − 1) · · · (k − #(π) + 1) ∼ k#(π) as k → ∞.

Next observe that if π has a block of size 1, then we will have ϕ(π) = 0. Indeed
suppose that π = {V1 , . . . ,Vm , . . . ,Vs } ∈ P(n) with Vm = {l} for some l ∈ [n]. Then
we will have
2.1 The classical and free central limit theorems 37

ϕ(π) = ϕ(a j1 · · · a jl−1 a jl a jl+1 · · · a jn )

where ker( j) = π and thus jl 6∈ { j1 , . . . , jl−1 , jl+1 , . . . , jn }. Hence we can write
ϕ(π) = ϕ(ba jl c), where b = a j1 · · · a jl−1 and c = a jl+1 · · · a jn and thus

ϕ(π) = ϕ(ba jl c) = ϕ(a jl )ϕ(bc) = 0,

since a jl is (classically or freely) independent of {b, c}. (For the free case this factor-
ization was considered in Equation (1.13) in the last chapter. In the classical case it
is obvious, too.) Of course, for this part of the argument, it is crucial that we assume
our variables ai to be centred.
Thus the only partitions which contribute to the sum are those with blocks of size
at least 2. Note that such a partition can have at most n/2 blocks. Now,
(
k#(π) 1, if #(π) = n/2
lim = .
k→∞ kn/2 0, if #(π) < n/2

Hence the only partitions which contribute to the sum in the k → ∞ limit are those
with exactly n/2 blocks, i.e. partitions each of whose blocks has size 2. Such parti-
tions are called pairings, and the set of pairings is denoted P2 (n).
Thus we have shown that

lim ϕ(Skn ) = ∑ ϕ(π).

k→∞
π∈P2 (n)

Note that in particular if n is odd then P2 (n) = 0,

/ so that the odd limiting moments
vanish. In order to determine the even limiting moments, we must distinguish be-
tween the setting of classical independence and free independence.

2.1.1 Classical central limit theorem

In the case of classical independence, our random variables commute and factorize
completely with respect to ϕ. Thus if we denote by ϕ(a2i ) = σ 2 the common vari-
ance of our random variables, then for any pairing π ∈ P2 (n) we have ϕ(π) = σ n .
Thus we have
(
σ n (n − 1)(n − 3) . . . 5 · 3 · 1, if n even
lim ϕ(Skn ) = ∑ σ n = .
k→∞
π∈P2 (n) 0, if n odd

From Section 1.1, we recognize these as exactly the moments of a Gaussian random
variable of mean 0 and variance σ 2 . Since by Exercise 2 the normal distribution is
determined by its moments, and hence our convergence in moments is the same as
the classical convergence in distribution, we get the following form of the classical
38 2 The Free Central Limit Theorem and Free Cumulants

central limit theorem: if (ai )i∈N are classically independent random variables which
are identically distributed with ϕ(ai ) = 0 and ϕ(a2i ) = σ 2 , and having all moments,
then Sk converges in distribution to a Gaussian random variable with mean 0 and
variance σ 2 . Note that one can see the derivation above also as a proof of the Wick
formula for Gaussian random variables if one takes the central limit theorem for
granted.

2.1.2 Free central limit theorem

Now we want to deal with the case where the random variables are freely indepen-
dent. In this case, ϕ(π) will not be the same for all pair partitions π ∈ P2 (2n) (we
focus on the even moments now because we already know that the odd ones are
zero). Let’s take a look at some examples:

ϕ({(1, 2), (3, 4)}) = ϕ(a1 a1 a2 a2 ) = ϕ(a21 )ϕ(a22 ) = σ 4

ϕ({(1, 4), (2, 3)}) = ϕ(a1 a2 a2 a1 ) = ϕ(a21 )ϕ(a22 ) = σ 4
ϕ({(1, 3), (2, 4)}) = ϕ(a1 a2 a1 a2 ) = 0.

The last equality is just from the definition of freeness, because a1 a2 a1 a2 is an

alternating product of centred free variables.

1 2 3 4 5 6 1 4 5 6 5 6

Fig. 2.2 We start with the pairing {(1, 4), (2, 3), (5, 6)} and remove the pair (2, 3) of adjacent
elements (middle figure). Next we remove the pair (1, 4) of adjacent elements. We are then left
with a single pair; so the pairing must have been non-crossing to start with.

In general, we will get ϕ(π) = σ 2n if we can successively remove neighbouring

pairs of identical random variables in the word corresponding to π so that we end
with a single pair (see Fig. 2.2); if we cannot we will have ϕ(π) = 0 as in the exam-
ple ϕ(a1 a2 a1 a2 ) = 0 above. Thus the only partitions that give a non-zero contribu-
tion are the non-crossing ones (see [140, p. 122] for details). Non-crossing pairings
were encountered already in Chapter 1, where we denoted the set of non-crossing
pairings by NC2 (2n). Then we have as our free central limit theorem that

lim ϕ(Sk2n ) = σ 2n · |NC2 (2n)|.

k→∞

In Chapter 1 we already mentioned that the cardinality Cn := |NC2 (2n)| is given by

the Catalan numbers. We want now to elaborate on the proof of this claim.
2.1 The classical and free central limit theorems 39

A very simple method is to show that the pairings are in a bijective correspon-
dence with
the Dyck paths; by using André’s reflection principle one finds that there
are 2nn − 2n 1 2n
n−1 = n+1 n such paths (see [140, Prop. 2.11] for details).
Our second method for counting non-crossing pairings is to find a simple recur-
rence which they satisfy. The idea is to look at the block of a pairing which contains
the number 1. In order for the pairing to be non-crossing, 1 must be paired with
some even number in the set [2n], else we would necessarily have a crossing. Thus
1 must be paired with 2i for some i ∈ [n]. Now let i run through all possible values
in [n], and count for each the number of non-crossing pairings that contain this pair,
as in the diagram below.

1 2 2i − 1 2i 2i + 1 2n

Fig. 2.3 We have Ci−1 possible pairings on [2, 2i − 1] and Cn−i possible pairings on [2i + 1, 2n].

In this way we see that the cardinality Cn of NC2 (2n) must satisfy the recurrence
relation
n
Cn = ∑ Ci−1Cn−i , (2.5)
i=1

with initial condition C0 = 1. One can then check using a generating function that
1 2n

the Catalan numbers satisfy this recurrence, hence Cn = n+1 n .
n
Exercise 4. Let f (z) = ∑∞ n=0 Cn z be the generating function for {Cn }n , where C0 = 1
and Cn satisfies the recursion (2.5).
(i) Show that 1 + z f (z)2 = f (z). √
(ii) Show that f is also the power series for 1− 2z1−4z .
1 2n

(iii) Show that Cn = n+1 n .
1 2n

We can also prove directly that Cn = n+1 n by finding a bijection between
NC2 (2n) and some standard set of objects which we can see directly is enumerated
by the Catalan numbers. A reasonable choice for this “canonical” set is the collec-
tion of 2 × n standard Young tableaux. A standard Young tableaux of shape 2 × n is
a filling of the squares of a 2 × n grid with the numbers 1, . . . , 2n which is strictly
increasing in each of the two rows and each of the n columns. The number of these
standard Young tableaux is very easy to calculate, using a famous and fundamental
result known as the hook-length formula [167, Vol. 2, Corollary 7.21.6]. The hook-
length formula tells us that the number of standard Young tableaux on the 2 × n
rectangle is
(2n)! 1 2n
= . (2.6)
(n + 1)!n! n + 1 n
40 2 The Free Central Limit Theorem and Free Cumulants

1 2n

Thus we will have proved that |NC2 (2n)| = n+1 n if we can bijectively associate to
each pair partition π ∈ NC2 (2n) a standard Young tableaux on the 2 × n rectangular
grid. This is very easy to do. Simply take the “left-halves” of each pair in π and write
them in increasing order in the cells of the first row. Then take the “right-halves”
of each pair of π and write them in increasing order in the cells of the second row.
Figure 2.4 shows the bijection between NC2 (6) and standard Young tableaux on the
2 × 3 rectangle.

1 2 3 4 5 6 1 3 4

2 5 6

Fig. 2.4 In the bijection between NC2 (6) and 2 × 3 standard Young tableaux the pairing
{(1, 2), (3, 6), (4, 5)} gets mapped to the tableaux on the right.

Definition 4. A self-adjoint random variable s with odd moments ϕ(s2n+1 ) = 0 and

even moments ϕ(s2n ) = σ 2nCn , where Cn is the n-th Catalan number and σ > 0 is
a constant, is called a semi-circular element of variance σ 2 . In the case σ = 1, we
call it the standard semi-circular element.

The argument we have just provided gives us the Free Central Limit Theorem.
Theorem 5. If (ai )i∈N are self-adjoint, freely independent, and identically dis-
tributed with ϕ(ai ) = 0 and ϕ(a2i ) = σ 2 , then Sk converges in distribution to a
semi-circular element of variance σ 2 as k → ∞.
This free central limit theorem was proved as one of the first results in free prob-
ability theory by Voiculescu already in [176]. His proof was much more operator
theoretic; the proof presented here is due to Speicher [159] and was the first hint
at a relation between free probability theory and the combinatorics of non-crossing
partitions. (An early concrete version of the free central limit theorem, before the
notion of freeness was isolated, appeared also in the work of Bożejko [43] in the
context of convolution operators on free groups.)
Recall that in Chapter 1 it was shown that for a random matrix XN chosen from
N × N GUE we have that
(
n 0, if n odd
lim E[tr(XN )] = (2.7)
N→∞ Cn/2 , if n even

so that a GUE random matrix is a semi-circular element in the limit of large matrix
distr
size, XN −→ s.
We can also define a family of semi-circular random variables.
2.2 Non-crossing partitions and free cumulants 41

Definition 6. Suppose (A, ϕ) is a ∗-probability space. A self-adjoint family (si )i∈I ⊂

A is called a semi-circular family of covariance C = (ci j )i, j∈I if C ≥ 0 and for any
n ≥ 1 and any n-tuple i1 , . . . , in ∈ I we have

ϕ(si1 · · · sin ) = ∑ ϕπ [si1 , . . . , sin ],

π∈NC2 (n)

where
ϕπ [si1 , . . . , sin ] = ∏ ci p iq .
(p,q)∈π

If C is diagonal then (si )i∈I is a free semi-circular family.

This is the free analogue of Wick’s formula. In fact, using this language and our
definition of convergence in distribution from Definition 1, it follows directly from
Lemma 1.9 that if X1 , . . . , Xr are matrices chosen independently from GUE, then, in
the large N limit, they converge in distribution to a semi-circular family s1 , . . . , sr of
covariance ci j = δi j .
Exercise 5. Show that if {x1 , . . . , xn } is a semi-circular family and A = (ai j ) is an
invertible matrix with real entries then {y1 , . . . , yn } is a semi-circular family where
yi = ∑ j ai j x j .

Exercise 6. Let {x1 , . . . , xn } be a semi-circular family such that for all i and j we
have ϕ(xi x j ) = ϕ(x j xi ). Show that by diagonalizing the covariance matrix we can
find an orthogonal matrix O = (oi j ) such that {y1 , . . . , yn } is a free semi-circular
family where yi = ∑ j oi j x j .

Exercise 7. Formulate and prove a multidimensional version of the free central limit
theorem.

2.2 Non-crossing partitions and free cumulants

We begin by recalling some relevant definitions concerning non-crossing partitions
from Section 1.8.

Definition 7. A partition π ∈ P(n) is called non-crossing if there do not exist num-

bers i, j, k, l ∈ [n] with i < j < k < l such that: i and k are in the same block of π,
j and l are in the same block of π, but i and j are not in the same block of π. The
collection of all non-crossing partitions of [n] was denoted NC(n).

Fig. 2.5 A crossing in a partition. … i … j … k … l …

42 2 The Free Central Limit Theorem and Free Cumulants

Figure 2.5 should make it clear what a crossing in a partition is; a non-crossing
partition is a partition with no crossings.
Note that P(n) is partially ordered by

π1 ≤ π2 ⇐⇒ each block of π1 is contained in a block of π2 . (2.8)

We also say that π1 is a refinement of π2 . NC(n) is a subset of P(n) and inherits this
partial order, so NC(n) is an induced sub-poset of P(n). In fact both are lattices;
they have well-defined join ∨ and meet ∧ operations (though the join of two non-
crossing partitions in NC(n) does not necessarily agree with their join when viewed
as elements of P(n)). Recall that the join π1 ∨ π2 in a lattice is the smallest σ with
the property that σ ≥ π1 and σ ≥ π2 ; and that the meet π1 ∧ π2 is the largest σ with
the property that σ ≤ π1 and σ ≤ π2 .
We now define the important free cumulants of a non-commutative probability
space (A, ϕ). They were introduced by Speicher in [161]. For other notions of cu-
mulants and the relation between them see [10, 74, 117, 153].

Definition 8. Let (A, ϕ) be a non-commutative probability space. The correspond-

ing free cumulants κn : An → C (n ≥ 1) are defined inductively in terms of moments
by the moment-cumulant formula

ϕ(a1 · · · an ) = ∑ κπ (a1 , . . . , an ), (2.9)

π∈NC(n)

where, by definition, if π = {V1 , . . . ,Vr } then

κπ (a1 , . . . , an ) = ∏ κl (ai1 , . . . , ail ). (2.10)

V ∈π
V =(i1 ,...,il )

Remark 9. In Equation (2.10) and below, we always mean that the elements i1 , . . . , il
of V are in increasing order. Note that Equation (2.9) has a formulation using Möbius
inversion which we might call the cumulant-moment formula. To present this we
need the moment version of Equation (2.10). For a partition π ∈ P(n) with π =
{V1 , . . . ,Vr } we set

ϕπ (a1 , . . . , an ) = ∏ ϕ(ai1 · · · ail ). (2.11)

V ∈π
V =(i1 ,...,il )

We also need the Möbius function µ for NC(n) (see [140, Lecture 10]). Then our
cumulant-moment relation can be written

κn (a1 , . . . , an ) = ∑ µ(π, 1n )ϕπ (a1 , . . . , an ). (2.12)

π∈NC(n)
2.2 Non-crossing partitions and free cumulants 43

One could use Equation (2.12) as the definition of free cumulants, however for prac-
tical calculations Equation (2.9) is usually easier to work with.
Example 10. (1) For n = 1, we have ϕ(a1 ) = κ1 (a1 ), and thus

κ1 (a1 ) = ϕ(a1 ). (2.13)

(2) For n = 2, we have

ϕ(a1 a2 ) = κ{(1,2)} (a1 , a2 ) + κ{(1),(2)} (a1 , a2 ) = κ2 (a1 , a2 ) + κ1 (a1 )κ1 (a2 ).

Since we know from the n = 1 calculation that κ1 (a1 ) = ϕ(a1 ), this yields

κ2 (a1 , a2 ) = ϕ(a1 a2 ) − ϕ(a1 )ϕ(a2 ). (2.14)

(3) For n = 3, we have

ϕ(a1 a2 a3 ) = κ{(1,2,3)} (a1 , a2 , a3 ) + κ{(1,2),(3)} (a1 , a2 , a3 ) + κ{(1),(2,3)} (a1 , a2 , a3 )

+ κ{(1,3),(2)} (a1 , a2 , a3 ) + κ{(1),(2),(3)} (a1 , a2 , a3 )
= κ3 (a1 , a2 , a3 ) + κ2 (a1 , a2 )κ1 (a3 ) + κ2 (a2 , a3 )κ1 (a1 )
+ κ2 (a1 , a3 )κ1 (a2 ) + κ1 (a1 )κ1 (a2 )κ1 (a3 ).

Thus we find that

κ3 (a1 , a2 , a3 ) = ϕ(a1 a2 a3 ) − ϕ(a1 )ϕ(a2 a3 )

− ϕ(a2 )ϕ(a1 a3 ) − ϕ(a3 )ϕ(a1 a2 ) + 2ϕ(a1 )ϕ(a2 )ϕ(a3 ). (2.15)

These three examples outline the general procedure of recursively defining κn in

terms of the mixed moments. It is easy to see that κn is an n-linear function.
Exercise 8. (i) Show the following: if ϕ is a trace then the cumulant κn is, for each
n ∈ N, invariant under cyclic permutations, i.e., for all a1 , . . . , an ∈ A we have

κn (a1 , a2 , . . . , an ) = κn (a2 , . . . , an , a1 ).

(ii) Let us assume that all moments with respect to ϕ are invariant under all
permutations of the entries, i.e., that we have for all n ∈ N and all a1 , . . . , an ∈ A
and all σ ∈ Sn that ϕ(aσ (1) · · · aσ (n) ) = ϕ(a1 · · · an ). Is it then true that also the free
cumulants κn (n ∈ N) are invariant under all permutations?
Let us also point out how the definition appears when a1 = · · · = an = a, i.e. when
all the random variables are the same. Then we have

ϕ(an ) = ∑ κπ (a, . . . , a).

π∈NC(n)

Thus if we write αna := ϕ(an ) and κπa := κπ (a, . . . , a) this reads

44 2 The Free Central Limit Theorem and Free Cumulants

αna = ∑ κπa . (2.16)

π∈NC(n)

Note the similarity to Equation (1.3) for classical cumulants.

Since the Catalan number is the number of non-crossing pairings of [2n] as well
as the number of non-crossing partitions of [n] we can use Equation (2.16) to show
that the cumulants of the standard semi-circle law are all 0 except κ2 = 1.
Exercise 9. Use Equation (2.16) to show that for the standard semi-circle law all
cumulants are 0, except κ2 which equals 1.
As another demonstration of the simplifying power of the moment-cumulant for-
mula (2.16) let us use the formula to find a simple expression for the moments
and free cumulants of the Marchenko-Pastur law. This is a probability measure
on R+ ∪ {0} that is as fundamental as the semi-circle law (see Section 4.5). Let
0 < c < ∞ be a positive real number. For each c we shall construct a probability
√ 2 √
measure νc . Set a = (1 − c)p and b = (1 + c)2 . For c ≥ 1, νc has as support the
interval [a, b] and the density (b − x)(x − a)/(2πx); that is
p
(b − x)(x − a)
dνc (x) = dx.
2πx
For 0 < c < 1, νc has the same density on [a, b] and in addition has an atom at 0 of
mass 1 − c; thus
p
(b − x)(x − a)
dνc (x) = (1 − c)δ0 + dx.
2πx
Note that when c = 1, a = 0 and the density has a “pole” of order 1/2 at 0 and thus
is still integrable.
Exercise 10. In this exercise we shall show that νc is a probability measure for all
c. Let R = −x2 + (a + b)x − ab then write
√
R R 1 −2x + (a + b) 1 a + b ab
= √ = √ + √ − √ .
x x R 2 R 2 R x R
(i) Show that the integral of the first term on [a, b] is 0.
√
(ii) Using the substitution t = (x − (1 + c))/ c, show that the integral of the
second term over [a, b] is π(a + b)/2.
(iii) Let u = (b − a)/(2ab), v = (b + a)/(2ab) and t = u−1 (v − √ x−1 ). With this
substitution show that the integral of the third term over [a, b] is −π ab.
(iv) Using the first three parts show that νc is a probability measure.

Definition 11. The Marchenko-Pastur distribution is the law with distribution νc

with 0 < c < ∞. We shall see in Exercise 11 that all free cumulants of νc are equal
to c. By analogy with the classical cumulants of the Poisson distribution, νc is also
2.2 Non-crossing partitions and free cumulants 45

called the free Poisson law (of rate c). We should also note that we have chosen a dif-
ferent normalization than that used by other authors in order to make the cumulants
simple; see Remark 12 and Exercise 12 below.

Exercise 11. In this exercise we shall find the moments and free cumulants of the
Marchenko-Pastur law.
√
(i) Let αn be the nth moment. Use the substitution t = (x − (1 + c))/ c to show
that
[(n−1)/2]
n − 1 2k

1
αn = ∑ (1 + c)n−2k−1 c1+k .
k=0 k + 1 2k k

(ii) Expand the expression (1 + c)n−2k−1 to obtain that

[(n−1)/2] n−k−1
(n − 1)!
αn = ∑ ∑ cl+1 .
k=0 l=k k! (k + 1)! (l − k)! (n − k − l − 1)!

(iii) Interchange the order of summation and use Vandermonde convolution ([79,
(5.23)]) to show that
n l
c n n
αn = ∑ .
l=1 n l − 1 l
n n
(iv) Finally use the fact ([140, Cor. 9.13]) that n1 l−1 l is the number of non-
crossing partitions of [n] with l blocks to show that

αn = ∑ c#(π) .
π∈NC(n)

Use this formula to show that κn = c for all n ≥ 1.

√ √
Remark 12. Given y > 0, let a0 = (1 − y)2 and b0 = (1 + y)2 . Let ρy be the prob-
ability measurepon R given by (b0 − t)(t − a0 )/(2πyt) dt on [a0 , b0 ] when y ≤ 1 and
p

(1 − y−1 )δ0 + (b0 − t)(t − a0 )/(2πyt) dt on {0} ∪ [a0 , b0 ] when y > 1. As above δ0
is the Dirac mass at 0. This might be called the standard form of the Marchenko-
Pastur law. In the exercise below we shall see that ρy is related to νc in a simple way
and the cumulants of ρy are not as simple as those of νc .

Exercise 12. Show that by setting c = 1/y and making the substitution t = x/c we
have Z Z
xk dνc (x) = ck t k dρy (t).

Show that the free cumulants of ρy are given by κn = c1−n .

There is a combinatorial formula by Krawczyk and Speicher [111] for expand-
ing cumulants whose arguments are products of random variables. For example,
consider the expansion of κ2 (a1 a2 , a3 ). This can be written as
46 2 The Free Central Limit Theorem and Free Cumulants

κ2 (a1 a2 , a3 ) = κ3 (a1 , a2 , a3 ) + κ1 (a1 )κ2 (a2 , a3 ) + κ2 (a1 , a3 )κ1 (a2 ). (2.17)

A more complicated example is given by:

κ2 (a1 a2 , a3 a4 ) (2.18)
= κ4 (a1 , a2 , a3 , a4 ) + κ1 (a1 )κ3 (a2 , a3 , a4 ) + κ1 (a2 )κ3 (a1 , a3 , a4 )
+ κ1 (a3 )κ3 (a1 , a2 , a4 ) + κ1 (a4 )κ3 (a1 , a2 , a3 ) + κ2 (a1 , a4 )κ2 (a2 , a3 )
+ κ2 (a1 , a3 )κ1 (a2 )κ1 (a4 ) + κ2 (a1 , a4 )κ1 (a2 )κ1 (a3 )
+ κ1 (a1 )κ2 (a2 , a3 )κ1 (a4 ) + κ1 (a1 )κ2 (a2 , a4 )κ1 (a3 ).

In general, the evaluation of a free cumulant with products of entries involves

summing over all π which have the property that they connect all different product
strings. Here is the precise formulation, for the proof we refer to [140, Theorem
11.12]. Note that this is the free counter part of the formula (1.16) for classical
cumulants.

Theorem 13. Suppose n1 , . . . , nr are positive integers and n = n1 + · · · + nr . Con-

sider a non-commutative probability space (A, ϕ) and a1 , a2 , . . . , an ∈ A. Let

A1 = a1 · · · an1 , A2 = an1 +1 · · · an1 +n2 , ..., Ar = an1 +···+nr−1 +1 · · · an .

Then
κr (A1 , . . . , Ar ) = ∑ κπ (a1 , . . . , an ) (2.19)
π∈NC(n)
π∨τ=1n

where the summation is over those π ∈ NC(n) which connect the blocks correspond-
ing to A1 , . . . , Ar . More precisely, this means that π ∨ τ = 1n where

τ = {(1, . . . , n1 ), (n1 + 1, . . . , n1 + n2 ), . . . , (n1 + · · · + nr−1 + 1, . . . , n)}

and 1n = {(1, 2, . . . , n)} is the partition with only one block.

Exercise 13. (i) Let τ = {(1, 2), (3)}. List all π ∈ NC(3) such that π ∨ τ = 13 . Check
that these are exactly the terms appearing on the right-hand side of Equation (2.17).
(ii) Let τ = {(1, 2), (3, 4)}. List all π ∈ NC(4) such that π ∨ τ = 14 . Check that
these are exactly the terms on the right-hand side of Equation (2.18)
The most important property of free cumulants is that we may characterize
free independence by the vanishing of “mixed” cumulants. Let (A, ϕ) be a non-
commutative probability space and A1 , . . . , As ⊂ A unital subalgebras. A cumulant
κn (a1 , a2 , . . . , an ) is mixed if each ai is in one of the subalgebras, but a1 , a2 , . . . , an
do not all come from the same subalgebra.

Theorem 14. The subalgebras A1 , . . . , As are free if and only if all mixed cumulants
vanish.
2.2 Non-crossing partitions and free cumulants 47

The proof of this theorem relies on formula (2.19) and on the following proposi-
tion which is a special case of Theorem 14. For the details of the proof of Theorem
14 we refer again to [140, Theorem 11.15].

Proposition 15. Let (A, ϕ) be a non-commutative probability space and let κn ,

n ≥ 1 be the corresponding free cumulants. For n ≥ 2, κn (a1 , . . . , an ) = 0 if 1 ∈
{a1 , . . . , an }.

Proof: We consider the case where the last argument an is equal to 1, and proceed
by induction on n.
For n = 2,
κ2 (a, 1) = ϕ(a1) − ϕ(a)ϕ(1) = 0.
So the base step is done.
Now assume for the induction hypothesis that the result is true for all 1 ≤ k < n.
We have that

ϕ(a1 · · · an−1 1) = ∑ κπ (a1 , . . . , an−1 , 1)

π∈NC(n)

= κn (a1 , . . . , an−1 , 1) + ∑ κπ (a1 , . . . , an−1 , 1).

π∈NC(n)
π6=1n

According to our induction hypothesis, a partition π 6= 1n can have κπ (a1 , . . . , an−1 ,

1) different from zero only if (n) is a one-element block of π, i.e. π = σ ∪ {(n)} for
some σ ∈ NC(n − 1). For such a partition we have

κπ (a1 , . . . , an−1 , 1) = κσ (a1 , . . . , an−1 )κ1 (1) = κσ (a1 , . . . , an−1 ),

hence

ϕ(a1 · · · an−1 1) = κn (a1 , . . . , an−1 , 1) + ∑ κσ (a1 , . . . , an−1 )

σ ∈NC(n−1)

= κn (a1 , . . . , an−1 , 1) + ϕ(a1 · · · an−1 ).

Since ϕ(a1 · · · an−1 1) = ϕ(a1 · · · an−1 ), we have proved that κn (a1 , . . . , an−1 , 1) = 0.

Whereas Theorem 14 gives a useful characterization for the freeness of subalge-

bras, its direct application to the case of random variables would not yield a satisfy-
ing characterization in terms of the vanishing of mixed cumulants in the subalgebras
generated by the variables. By invoking again the product formula for free cumu-
lants, Theorem 13, it is quite straightforward to get the following much more useful
characterization in terms of mixed cumulants of the variables.

Theorem 16. Let (A, ϕ) be a non-commutative probability space. The random vari-
ables a1 , . . . , as ∈ A are free if and only if all mixed cumulants of the a1 , . . . , as
48 2 The Free Central Limit Theorem and Free Cumulants

vanish. That is, a1 , . . . , as are free if and only if whenever we choose i1 , . . . , in ∈

{1, . . . , s} in such a way that ik 6= il for some k, l ∈ [n], then κn (ai1 , . . . , ain ) = 0.

2.3 Products of free random variables

We want to understand better the calculation rule for mixed moments of free vari-
ables. Thus we will now derive the basic combinatorial description for such mixed
moments.
Let {a1 , . . . , ar } and {b1 , . . . , br } be free random variables, and consider

ϕ(a1 b1 a2 b2 · · · ar br ) = ∑ κπ (a1 , b1 , a2 , b2 , · · · , ar , br ).
π∈NC(2r)

Since the a’s are free from the b’s, we only need to sum over those partitions π
which do not connect the a’s with the b’s. Each such partition may be written as π =
πa ∪ πb , where πa denotes the blocks consisting of a’s and πb the blocks consisting
of b’s. Hence by the definition of free cumulants

ϕ(a1 b1 a2 b2 · · · ar br ) = ∑ κπa (a1 , . . . , ar ) · κπb (b1 , . . . , br )

πa ∪πb ∈NC(2r)
!
= ∑ κπa (a1 , . . . , ar ) · ∑ κπb (b1 , . . . , br ) .
πa ∈NC(r) πb ∈NC(r)
πa ∪πb ∈NC(2r)

It is now easy to see that, for a given πa ∈ NC(r), there exists a biggest σ ∈ NC(r)
with the property that πa ∪σ ∈ NC(2r). This σ is called the Kreweras complement of
πa and is denoted by K(πa ), see [140, Def. 9.21]. This K(πa ) is given by connecting
as many b’s as possible in a non-crossing way without getting crossings with the
blocks of πa . The mapping K is an order-reversing bijection on the lattice NC(r).
But then the summation condition on the internal sum above is equivalent to the
condition πb ≤ K(πa ). Summing κπ over all π ∈ NC(r) gives the corresponding r-th
moment, which extends easily to

∑ κπ (b1 , . . . , br ) = ϕσ (b1 , . . . , br ),
π∈NC(r)
π≤σ

where ϕσ denotes, in the same way as in κπ , the product of moments along the
blocks of σ ; see Equation (2.11).
Thus we get as the final conclusion of our calculations that

ϕ(a1 b1 a2 b2 · · · ar br ) = ∑ κπ (a1 , . . . , ar ) · ϕK(π) (b1 , . . . , br ). (2.20)

π∈NC(r)

Let us consider some simple examples for this formula. For r = 1, there is only
one π ∈ NC(1), which is its own complement, and we get
2.4 Functional relation between moment series and cumulant series 49

ϕ(a1 b1 ) = κ1 (a1 )ϕ(b1 ).

As κ1 = ϕ, this gives the usual factorization formula

ϕ(a1 b1 ) = ϕ(a1 )ϕ(b1 ).

For r = 2, there are two elements in NC(2), and , and we have

K( )= and K( ) =

and the formula above gives

ϕ(a1 b1 a2 b2 ) = κ2 (a1 , a2 )ϕ(b1 )ϕ(b2 ) + κ1 (a1 )κ1 (a2 )ϕ(b1 b2 ).

With κ1 (a) = ϕ(a) and κ2 (a1 , a2 ) = ϕ(a1 a2 ) − ϕ(a1 )ϕ(a2 ) this reproduces formula
(1.14).
The formula above is not symmetric between the a’s and the b’s (the former
appear with cumulants, the latter with moments). Of course, one can also exchange
the roles of a and b, in which case one ends up with

ϕ(a1 b1 a2 b2 · · · ar br ) = ∑ ϕK −1 (π) (a1 , . . . , ar ) · κπ (b1 , . . . , br ). (2.21)

π∈NC(r)

Note that K 2 is not the identity, but a cyclic rotation of π.

Formulas (2.20) and (2.21) are particularly useful when one of the sets of vari-
ables has simple cumulants, as is the case for semi-circular random variables bi = s.
Then only the second cumulants κ2 (s, s) = 1 are non-vanishing, i.e. in effect the
sum is only over non-crossing pairings. Thus, if s is semi-circular and free from
{a1 , . . . , ar } then we have

ϕ(a1 sa2 s · · · ar s) = ∑ ϕK −1 (π) (a1 , . . . , ar ). (2.22)

π∈NC2 (r)

Let us also note in passing that one can rewrite the Equations (2.20) and (2.21)
above in the symmetric form (see [140, (14.4)])

κr (a1 b1 , a2 b2 , . . . , ar br ) = ∑ κπ (a1 , . . . , ar ) · κK(π) (b1 , . . . , br ). (2.23)

π∈NC(r)

2.4 Functional relation between moment series and cumulant series

Notice how much more efficient the result on the description of freeness in terms of
cumulants is in checking freeness of random variables than the original definition of
free independence. In the cumulant framework, we can forget about centredness and
weaken “alternating” to “mixed.” Also, the problem of adding two freely indepen-
dent random variables becomes easy on the level of free cumulants. If a, b ∈ (A, ϕ)
are free with respect to ϕ, then
50 2 The Free Central Limit Theorem and Free Cumulants

κna+b = κn (a + b, . . . , a + b)
= κn (a, . . . , a) + κn (b, . . . , b) + (mixed cumulants in a, b)
= κna + κnb .

Thus the problem of calculating moments is shifted to the relation between cu-
mulants and moments. We already know that the moments are polynomials in the
cumulants, according to the moment-cumulant formula (2.16), but we want to put
this relationship into a framework more amenable to performing calculations.
For any a ∈ A, let us consider formal power series in an indeterminate z defined
by
∞
M(z) = 1 + ∑ αna zn , moment series of a
n=1
∞
C(z) = 1 + ∑ κna zn , cumulant series of a.
n=1

We want to translate the moment-cumulant formula (2.16) into a statement about

the relationship between the moment and cumulant series.

Proposition 17. The relation between the moment series M(z) and the cumulant
series C(z) of a random variable is given by

M(z) = C(zM(z)). (2.24)

Proof: The idea is to sum first over the possibilities for the block of π containing
1, as in the derivation of the recurrence for Cn . Suppose that the first block of π
looks like V = {1, v2 , . . . , vs }, where 1 < v1 < · · · < vs ≤ n. Then we build up the
rest of the partition π out of smaller “nested” non-crossing partitions π1 , . . . , πs with
π1 ∈ NC({2, . . . , v2 − 1}), π2 ∈ NC({v2 + 1, . . . , v3 − 1}), etc. Hence if we denote
i1 = |{2, . . . , v2 − 1}|, i2 = |{v2 + 1, . . . , v3 − 1}|, etc., then we have
n
αn = ∑ ∑ ∑ κs κπ1 · · · κπs
s=1 i1 ,...,is ≥0 π=V ∪π1 ∪···∪πs
s+i1 +···+is =n
n
= ∑ ∑ κs ∑ κπ1 · · · ∑ κπs
s=1 i1 ,...,is ≥0 π1 ∈NC(i1 ) πs ∈NC(is )
s+i1 +···+is =n
n
= ∑ ∑ κs αi1 · · · αis .
s=1 i1 ,...,is ≥0
s+i1 +···+is =n

Thus we have
2.4 Functional relation between moment series and cumulant series 51
∞ ∞ n
1 + ∑ αn zn = 1 + ∑ ∑ ∑ κs zs αi1 zi1 . . . αis zis
n=1 n=1 s=1i1 ,...,is ≥0
s+i1 +···+is =n
∞ ∞ s
s i
= 1 + κs z
∑ ∑ αi z .
s=1 i=0

Now consider the Cauchy transform of a:

1 ∞
ϕ(an ) 1
G(z) := ϕ = ∑ n+1 = M(1/z) (2.25)
z−a n=0 z z

and the R-transform of a defined by

C(z) − 1 ∞
a
R(z) := = ∑ κn+1 zn . (2.26)
z n=0

C(z)
Also put K(z) = R(z) + 1z = z . Then we have the relations

1 1 1 1 1
K(G(z)) = C(G(z)) = C M = zG(z) = z.
G(z) G(z) z z G(z)

Note that M and C are in C[[z]], the ring of formal power series in z, G ∈ C[[ 1z ]],
and K ∈ C((z)), the ring of formal Laurent series in z, i.e. zK(z) ∈ C[[z]]. Thus K ◦G ∈
C(( 1z )) and G ◦ K ∈ C[[z]]. We then also have G(K(z)) = z.
Thus we recover the following theorem of Voiculescu, which is the main re-
sult on the R-transform. Voiculescu’s original proof in [177] was much more op-
erator theoretic. One should also note that this computational machinery for the
R-transform was also found independently and about the same time by Woess
[204, 205], Cartwright and Soardi [49], and McLaughlin [125], in a more restricted
setting of random walks on free product of groups. Our presentation here is based
on the approach of Speicher in [161].

Theorem 18. For a random variable a let Ga (z) be its Cauchy transform and define
its R-transform Ra (z) by
Ga [Ra (z) + 1/z] = z. (2.27)
Then, for a and b freely independent, we have

Ra+b (z) = Ra (z) + Rb (z). (2.28)

Let us write, for a and b free, the above as:

z = Ga+b [Ra+b (z) + 1/z] = Ga+b [Ra (z) + Rb (z) + 1/z]. (2.29)
52 2 The Free Central Limit Theorem and Free Cumulants

If we now put w := Ra+b (z) + 1/z, then we have z = Ga+b (w) and we can continue
Equation (2.29) as:

Ga+b (w) = z = Ga [Ra (z) + 1/z] = Ga [w − Rb (z)] = Ga w − Rb [Ga+b (w)] .

Thus we get the subordination functions ωa and ωb given by

ωa (z) = z − Rb [Ga+b (z)] and ωb (z) = z − Ra [Ga+b (z)]. (2.30)

We have ωa , ωb ∈ C(( 1z )), so Ga ◦ ωa ∈ C[[ 1z ]]. These satisfy the subordination rela-
tions
Ga+b (z) = Ga [ωa (z)] = Gb [ωb (z)]. (2.31)
We say that Ga+b is subordinate to both Ga and Gb . The name comes from the theory
of univalent functions; see [65, Ch. 6] for a general discussion.
Exercise 14. Show that ωa (z) + ωb (z) − 1/Ga (ωa (z)) = z.

1
Exercise 15. Suppose we have formal Laurent series ωa (z) and ωb (z) in z such that

Ga (ωa (z)) = Gb (ωb (z)) and ωa (z) + ωb (z) − 1/Ga (ωa (z)) = z. (2.32)

Let G be the formal power series G(z) = Ga (ωa (z)) and R(z) = Gh−1i (z) − z−1 .
(Gh−1i denotes here the inverse under composition of G.) By replacing z by Gh−1i (z)
in the second equation of (2.32) show that R(z) = Ra (z) + Rb (z). These equations
can thus be used to define the distribution of the sum of two free random variables.
At the moment these are identities on the level of formal power series. In the next
chapter, we will elaborate on their interpretation as identities of analytic functions,
see Theorem 3.43.

2.5 Subordination and the non-commutative derivative

One might wonder about the relevance of the subordination formulation in (2.31).
Since it has become more and more evident that the subordination formulation of
free convolution is in many cases preferable to the (equivalent) description in terms
of the R-transform, we want to give here some idea why subordination is a very nat-
ural concept in the context of free probability. When subordination appeared in this
context first in papers of Voiculescu [181] and Biane [34] it was more an ad hoc con-
struction – its real nature was only revealed later in the paper [190] of Voiculescu,
where he related it to the non-commutative version of the derivative operation.
We will now introduce the basics of this non-commutative derivative; as before
in this chapter, we will ignore all analytic questions and just deal with formal power
series. In Chapter 8 we will have more to say about the analytic properties of the
non-commutative derivatives.
Let Chxi be the algebra of polynomials in the variable x. Then we define the
non-commutative derivative ∂x as a linear mapping ∂x : Chxi → Chxi ⊗ Chxi by the
2.5 Subordination and the non-commutative derivative 53

requirements that it satisfies the Leibniz rule

∂x (qp) = ∂x (q) · 1 ⊗ p + q ⊗ 1 · ∂x (p)

and by
∂x 1 = 0, ∂x x = 1 ⊗ 1.
This means that it is given more explicitly as the linear extension of
n−1
∂x x n = ∑ xk ⊗ xn−1−k . (2.33)
k=0

We can also (and will) extend this definition from polynomials to infinite formal
power series.
Exercise 16. (i) Let, for some z ∈ C with z 6= 0, f be the formal power series

1 ∞
xn
f (x) = = ∑ n+1 .
z − x n=0 z

Show that we have then ∂x f = f ⊗ f .

(ii) Let f be a formal power series in x with the property that ∂x f = f ⊗ f . Show
that f must then be either zero or of the form f (x) = 1/(z − x) for some z ∈ C, with
z 6= 0.
We will now consider polynomials and formal power series in two non-commut-
ing variables x and y. In this context, we still have the notion of ∂x (and also of
∂y ) and now their character as “partial” derivatives becomes apparent. Namely, we
define ∂x : Chx, yi → Chx, yi ⊗ Chx, yi by the requirements that it should be a deriva-
tion, i.e., satisfy the Leibniz rule, and by the prescriptions:

∂x x = 1 ⊗ 1, ∂x y = 0, ∂x 1 = 0.

For a monomial xi1 · · · xin in x and y (where we put x1 := x and x2 := y) this means
explicitly
n
∂x xi1 · · · xin = ∑ δ1ik xi1 · · · xik−1 ⊗ xik+1 · · · xin . (2.34)
k=1

Again it is clear that we can extend this definition also to formal power series in
non-commuting variables.
Let us note that we may define the derivation ∂x+y on Chx + yi exactly as we did
∂x . Namely ∂x+y (1) = 0 and ∂x+y (x + y) = 1 ⊗ 1. Note that ∂x+y can be extended
to all of Chx, yi but not in a unique way unless we specify another basis element.
Since Chx + yi ⊂ Chx, yi we may apply ∂x to Chx + yi and observe that ∂x (x + y) =
1 ⊗ 1 = ∂x+y (x + y). Thus
54 2 The Free Central Limit Theorem and Free Cumulants
n
∂x (x + y)n = ∑ (x + y)k−1 ⊗ (x + y)n−k = ∂x+y (x + y)n .
k=1

Hence
∂x |Chx+yi = ∂x+y . (2.35)
If we are given a polynomial p(x, y) ∈ Chx, yi, then we will also consider
Ex [p(x, y)], the conditional expectation of p(x, y) onto a function of just the vari-
able x, which should be the best approximation to p among such functions. There
is no algebraic way of specifying what best approximation means; we need a state
ϕ on the ∗-algebra generated by self-adjoint elements x and y for this. Given such
a state, we will require that the difference between p(x, y) and Ex [p(x, y)] cannot be
detected by functions of x alone; more precisely, we ask that

ϕ q(x) · Ex [p(x, y)] = ϕ q(x) · p(x, y) (2.36)

for all q ∈ Chxi. If we are going from the polynomials Chx, yi over to the Hilbert
space completion L2 (x, y, ϕ) with respect to the inner product given by h f , gi :=
ϕ(g∗ f ) then this amounts just to an orthogonal projection from the space L2 (x, y, ϕ)
onto the subspace L2 (x, ϕ) generated by polynomials in the variable x. (Let us as-
sume that ϕ is positive and faithful so that we get an inner product.) Thus, on the
Hilbert space level the existence and uniqueness of Ex [p(x, y)] is clear. In general,
though, it might not be the case that the projection of a polynomial in x and y is a
polynomial in x – it will just be an L2 -function. If we assume, however, that x and
y are free, then we claim that this projection maps polynomials to polynomials. In
fact for this construction to work at the algebraic level we only need assume that
ϕ|Chxi is non-degenerate as this shows that Ex is well defined by (2.36). It is clear
from Equation (2.36) that ϕ(Ex (a)) = ϕ(a) for all a ∈ Chx, yi.
Let us consider some examples. Assume that x and y are free. Then it is clear that
we have
Ex [xn ym ] = xn ϕ(ym )
and more generally
Ex [xn1 ym xn2 ] = xn1 +n2 ϕ(ym ).
It is not so clear what Ex [yxyx] might be. Before giving the general rule let us make
some simple observations.
Exercise 17. Let A1 = Chxi and A2 = Chyi with x and y free and ϕ|A1 non-degen-
erate.
(i) Show that Ex [Å2 ] = 0.
(ii) For α1 , . . . , αn ∈ {1, 2} with α1 6= · · · 6= αn and n ≥ 2, show that Ex [Åα1 · · · Åαn ] =
0.

Exercise 18. Let A1 and A2 be as in Exercise 17. Since A1 and A2 are free we can
use Equation (1.12) from Exercise 1.9 to write
2.5 Subordination and the non-commutative derivative 55
⊕ ⊕
A1 ∨ A2 = A1 ⊕ Å2 ⊕ ∑ ∑ Åα1 Åα2 · · · Åαn .
n≥2 α1 6=···6=αn

We have just shown that if Ex is a linear map satisfying Equation (2.36) then Ex is
the identity on the first summand and 0 on all remaining summands. Show that by
defining Ex this way we get the existence of a linear mapping from A1 ∨ A2 to A1
satisfying Equation (2.36). An easy consequence of this is that for q1 (x), q2 (x) ∈
Chxi and p(x, y) ∈ Chx, yi we have Ex [q1 (x)p(x, y)q2 (x)] = q1 (x)Ex [p(x, y)]q2 (x).
Let a1 = yn1 , a2 = yn2 and b = xm1 . To compute Ex (yn1 xm1 yn2 ) we follow the same
centring procedure used to compute ϕ(a1 ba2 ) in Section 1.12. From Exercise 17 we
see that

Ex [a1 ba2 ] = Ex [å1 ba2 ] + ϕ(a1 )bϕ(a2 )

= Ex [å1 b̊a2 ] + ϕ(å1 a2 )ϕ(b) + ϕ(a1 )bϕ(a2 )
= ϕ(å1 a2 )ϕ(b) + ϕ(a1 )bϕ(a2 )
= ϕ(a1 a2 )ϕ(b) − ϕ(a1 )ϕ(b)ϕ(a2 ) + ϕ(a1 )bϕ(a2 ).

Thus

Ex [yn1 xm1 yn2 xm2 ] = ϕ(yn1 +n2 )ϕ(xm1 )xm2 + ϕ(yn1 )xm1 ϕ(yn2 )xm2
− ϕ(yn1 )ϕ(xm1 )ϕ(yn2 )xm2 .

The following theorem (essentially in the work [34] of Biane) gives the gen-
eral recipe for calculating such expectations. As usual the formulas are simplified
by using cumulants. To give the rule we need the following bit of notation. Given
σ ∈ P(n) and a1 , . . . , an ∈ A we define ϕ̃σ (a1 , . . . , an ) in the same way as ϕσ in
Equation (2.11) except we do not apply ϕ to the last block, i.e. the block con-
taining n. For example if σ = {(1, 3, 4), (2, 6), (5)} then ϕ̃σ (a1 , a2 , a3 , a4 , a5 , a6 ) =
ϕ(a1 a3 a4 )ϕ(a5 )a2 a6 . More explicitly, for σ = {V1 , . . . ,Vs } ∈ NC(r) with r ∈ Vs we
put
ϕ̃σ (a1 , . . . , ar ) = ϕ ∏ ai1 · · · ϕ ∏ ais−1 · ∏ ais .
i1 ∈V1 is−1 ∈Vs−1 is ∈Vs

Theorem 19. Let x and y be free. Then for r ≥ 1 and n1 , m1 , . . . , nr , mr ≥ 0, we have

Ex [yn1 xm1 · · · ynr xmr ] = ∑ κπ (yn1 , . . . , ynr ) · ϕ̃K(π) (xm1 , . . . , xmr ). (2.37)
π∈NC(r)

Let us check that this agrees with our previous calculation of Ex [yn1 xm1 yn2 xm2 ].

Ex [yn1 xm1 yn2 xm2 ]

= κ{(1,2)} (yn1 , yn2 ) · ϕ̃{(1),(2)} (xm1 , xm2 ) + κ{(1),(2)} (yn1 , yn2 ) · ϕ̃{(1,2)} (xm1 , xm2 )
= κ2 (yn1 , yn2 )ϕ(xm1 )xm2 + κ1 (yn1 )κ1 (yn2 )xm1 +m2
= ϕ(yn1 +n2 ) − ϕ(yn1 )ϕ(yn2 ) ϕ(xm1 ) · xm2 + ϕ(yn1 )ϕ(yn2 ) · xm1 +m2 .

56 2 The Free Central Limit Theorem and Free Cumulants

The proof of the theorem is outlined in the exercise below.

Exercise 19. (i) Given π ∈ NC(n) let π 0 be the non-crossing partition of [n0 ] =
{0, 1, 2, 3, . . . , n} obtained by joining 0 to the block of π containing n. For a0 , a1 , . . . ,
an ∈ A, show that ϕπ 0 (a0 , a1 , a2 , . . . , an ) = ϕ(a0 ϕ̃π (a1 , . . . , an )).
(ii) Suppose that A1 , A2 ⊂ A are unital subalgebras of A which are free with
respect to the state ϕ. Let x0 , x1 , . . . , xn ∈ A1 and y1 , y2 , . . . , yn ∈ A2 . Show that

ϕ(x0 y1 x1 y2 x2 · · · yn xn ) = ∑ κπ (y1 , . . . , yn )ϕK(π)0 (x0 , x1 , . . . , xn ).

π∈NC(n)

Prove Theorem 19 by showing that with the expression given in (2.37) one has for
all m ≥ 0

ϕ xm · Ex [yn1 xm1 · · · ynr xmr ] = ϕ xm · yn1 xm1 · · · ynr xmr .

Exercise 20. Use the method of Exercise 19 to work out Ex [xm1 yn1 · · · xmr ynr ].
By linear extension of Equation (2.37) one can thus get the projection onto one
variable x of any non-commutative polynomial or formal power series in two free
variables x and y. We now want to identify the projection of resolvents in x + y. To
achieve this we need a crucial intertwining relation between the partial derivative
and the conditional expectation.

Lemma 20. Suppose ϕ is a state on Chx, yi such that x and y are free and ϕ|Chxi is
non-degenerate. Then

Ex ⊗ Ex ◦ ∂x+y |Chx+yi = ∂x ◦ Ex |Chx+yi . (2.38)

Proof: We let A1 = Chxi and A2 = Chyi. We use the decomposition from Exer-
cise 1.9
⊕ ⊕
A1 ∨ A2 A1 = Å2 ⊕ ∑ ∑ Åα1 · · · Åαn
n≥2 α1 6=···6=αn

and examine the behaviour of Ex ⊗ Ex ◦ ∂x on each summand. We know that ∂x is 0

on Å2 by definition. For n ≥ 2

Ex ⊗ Ex ◦ ∂x (Åα1 · · · Åαn )
n
⊆ ∑ δ1,αk Ex (Åα1 · · · Åαk−1 (C1 ⊕ Åαk )) ⊗ Ex ((C1 ⊕ Åαk )Åαk+1 · · · Åαn ).
k=1

By Exercise 17, in each term one or both of the factors is 0. Thus Ex ⊗ Ex ◦

∂x |A1 ∨A2 A1 = 0. Hence

Ex ⊗ Ex ◦ ∂x |A1 ∨A2 = Ex ⊗ Ex ◦ ∂x ◦ Ex |A1 ∨A2 = ∂x ◦ Ex |A1 ∨A2 ,

2.5 Subordination and the non-commutative derivative 57

and then by Equation (2.35) we have

Ex ⊗ Ex ◦ ∂x+y |Chx+yi = Ex ⊗ Ex ◦ ∂x |Chx+yi = ∂x ◦ Ex |Chx+yi .

Theorem 21. Let x and y be free. For every z ∈ C with z 6= 0 there exists a w ∈ C
such that
1 1
Ex [ ]= . (2.39)
z − (x + y) w−x
In other words, the best approximation for a resolvent in x + y by a function of x is
again a resolvent.

By applying the state ϕ to both sides of (2.39) one obtains the subordination for
the Cauchy transforms, and thus it is clear that the w from above must agree with
the subordination function from (2.31), w = ω(z).
Proof: We put
1
f (x, y) := .
z − (x + y)
By Exercise 16, part (i), we know that ∂x+y f = f ⊗ f . By Lemma 20 we have that
for functions g of x + y

∂x Ex [g(x + y)] = Ex ⊗ Ex [∂x+y g(x + y)]. (2.40)

By applying (2.40) to f , we obtain

∂x Ex [ f ] = Ex ⊗ Ex [∂x+y f ] = Ex ⊗ Ex [ f ⊗ f ] = Ex [ f ] ⊗ Ex [ f ].

Thus, by the second part of Exercise 16, we know that Ex [ f ] is a resolvent in x and
we are done.
Chapter 3
Free Harmonic Analysis

In this chapter we shall present an approach to free probability based on analytic

functions. At the end of the previous chapter we defined the Cauchy transform of
a random variable a in an algebra A with a state ϕ to be the formal power series
G(z) = 1z M( 1z ) where M(z) = 1 + ∑n≥1 αn zn and αn = ϕ(an ) are the moments of a.
Then R(z), the R-transform of a, was defined to be the formal power series R(z) =
∑n≥1 κn zn−1 determined by the moment-cumulant relation which we have shown to
be equivalent to the equations

G R(z) + 1/z = z = 1/G(z) + R(G(z)). (3.1)

If a is a self-adjoint element of a unital C∗ -algebra A with a state ϕ then there

is a spectral measure ν on R such that the moments of a are the same as the
moments of the probability measure ν. We can then define the analytic function
G(z) = ϕ((z − a)−1 ) = R (z − t)−1 dν(t) on the complex upper half plane, C+ . One
R

can then consider the relation between the formal power series G obtained from the
moment generating function and the analytic function G obtained from the spectral
measure. It turns out that on the exterior of a disc containing the support of ν, the
formal power series converges to the analytic function, and the R-transform becomes
an analytic function on an open set containing 0 whose power series expansion is
the formal power series ∑n≥1 κn zn−1 given in the previous chapter.
When ν does not have all moments there is no formal power series; this corre-
sponds to a being an unbounded self-adjoint operator affiliated with A. However, the
Cauchy transform is always defined. Moreover, one can construct the R-transform
of ν, analytic on some open set, satisfying equation (3.1) — although there may not
be any free cumulants if ν has no moments. However if ν does have moments then
the R-transform has cumulants given by an asymptotic expansion at 0.
If X and Y are classically independent random variables with distributions νX
and νY then the distribution of X +Y is the convolution, νX ∗ νY . We shall construct
the free analogue, νX νY , of the classical convolution. νX νY is called the free
additive convolution of νX and νY ; it is the distribution of the sum X + Y when X

59
60 3 Free Harmonic Analysis

and Y are freely independent. Since X and Y do not commute we cannot do this
with functions as in the classical case. We shall do this on the level of probability
measures.
We shall ultimately show that the R-transform exists for all probability measures.
However, we shall first do this for compactly supported probability measures, then
for probability measures with finite variance, and finally for arbitrary probability
measures. This follows more or less the historical development. The compactly sup-
ported case was treated in [177] by Voiculescu. The case of finite variance was then
treated by Maassen in [120]; this was an important intermediate step, as it promoted
the use of the reciprocal Cauchy transform F = 1/G and of the subordination func-
tion. The general case was then first treated by Bercovici and Voiculescu in [31] by
operator algebraic methods; however, more recent alternative approaches, by Be-
linschi and Bercovici [21, 18] and by Chistyakov and Götze [54, 53], rely on the
subordination formulation. Since this subordination approach seems to be analyti-
cally better controllable than the R-transform, and also best suited for generaliza-
tions to the operator-valued case (see Chapter 10, in particular Section 10.4)), we
will concentrate in our presentation on this approach and try to give a streamlined
and self-contained presentation.

3.1 The Cauchy transform

Definition 1. Let C+ = {z ∈ C | Im(z) > 0} denote the complex upper half-plane,
and C− = {z | Im(z) < 0} denote the lower half-plane. Let ν be a probability mea-
sure on R and for z 6∈ R let
1
Z
G(z) = dν(t);
R z−t
G is the Cauchy transform of the measure ν.

Let us briefly check that the integral converges to an analytic function on C+ .

Lemma 2. G is an analytic function on C+ with range contained in C− .

Proof: Since |z − t|−1 ≤ |Im(z)|−1 and ν is a probability measure the integral is

always convergent. If Im(w) 6= 0 and |z − w| < |Im(w)|/2 then for t ∈ R we have

z − w |Im(w)| 1 1
t −w <
· = ,
2 |Im(w)| 2
n
n=0 ((z − w)/(t − w)) converges uniformly to (t − w)/(t − z) on
so the series ∑∞
−1
|z − w| < |Im(w)|/2. Thus (z − t) = − ∑∞ −(n+1) (z − w)n on |z − w| <
n=0 (t − w)
|Im(w)|/2. Hence
∞ Z
G(z) = − ∑ (t − w)−(n+1) dν(t) (z − w)n
n=0 R
3.1 The Cauchy transform 61

Fig. 3.1 We choose θ1 , the argument 1.2

of z − 2, to be such that 0 ≤ θ1 < 2π. z
1
Similarly we choose θ2 , the argument of
0.8
z + 2, such that 0 ≤ θ2 < 2π. Thus θ1 + θ2
is continuous on C \ [−2, ∞). However 0.6
ei(θ1 +θ2 )/2 is continuous on C \ [−2, 2] 0.4
because ei(0+0)/2 = 1 = ei(2π+2π)/2 , so there 0.2
θ1 (z)
is no jump as the half lines (−∞, −2] and θ2 (z)
[2, ∞) are crossed. -2 -1 1 2

is analytic on |z − w| < |Im(w)|/2.

Finally note that for Im(z) > 0, we have for t ∈ R, Im((z − t)−1 ) < 0, and hence
Im(G(z)) < 0. Thus G maps C+ into C− .
Exercise 1. (i) Let µ be the atomic probability measure with atoms at the real num-
bers {a1 , . . . , an } and let λi = µ({ai }) be the mass of the atom at ai . Find the Cauchy
transform of µ.
(ii) Let ν be the Cauchy distribution, i.e dν(t) = π −1 (1 + t 2 )−1 dt. Show that
G(z) = 1/(z + i).
√
In the next two exercises we need to choose a branch of z2 − 4 for z√in the
upper + z2 − 4 = (z − 2)(z + 2) and define each of z − 2
√ half-plane,+C . We write +
and z + 2 on C . For z ∈ C , let θ1 be the angle between the x-axis and the line
joining z to 2; and θ2 the angle between the x-axis and the line joining z to √ −2. See
Fig. 3.1. Then z − 2 = |z − 2|eiθ1 and z + 2 = |z + 2|eiθ2 and so we define z2 − 4 to
be |z2 − 4|1/2 ei(θ1 +θ2 )/2 .
√
Exercise 2. For z = u +iv ∈ C+ let z = |z|eiθ /2 where 0 < θ < π is the argument
p

of z. Show that
s√ s√
√ u2 + v2 + u √ u2 + v2 − u
Re( z) = and Im( z) = .
2 2

Exercise 3. For z ∈ C+ show that

p p
|Im(z)| < Im z2 − 4 and Re z2 − 4 ≤ |Re(z)|;

with equality in the second relation only when Re(z) = 0.

Exercise 4. In this exercise we shall compute the Cauchy transform of the arc-sine
law using contour integration. Recall that
√ the density of the arc-sine law on the
interval [−2, 2] is given by dν(t) = 1/(π 4 − t 2 ). Let
62 3 Free Harmonic Analysis

(z − t)−1
Z 2
1
G(z) = √ dt.
π −2 4 − t2
(i) Make the substitution t = 2 cos θ for 0 ≤ θ ≤ π. Show that
Z 2π
1
G(z) = (z − 2 cos θ )−1 dθ .
2π 0

(ii) Make the substitution w = eiθ and show that we can write G as the contour
integral
1 1
Z
G(z) = dw
2πi Γ zw − w2 − 1
where Γ = {w ∈ C | |w| = 1}. √
2 2
√ Show that the roots of zw − w − 1 = 0 are w1 = (z − z − 4)/2 and w2 =
(iii)
2
(z + z − 4)/2 and that w1 ∈ int(Γ ) and that w2 6∈ int(Γ ), using the branch defined
above. √
(iv) Using the residue calculus show that G(z) = 1/ z2 − 4.

Exercise 5. In this exercise we shall compute the Cauchy transform of the semi-
circle law using contour integration. Recall that the
√ density of the semi-circle law
on the interval [−2, 2] is given by dν(t) = (2π)−1 4 − t 2 . Let
Z √
1 2 4 − t2
G(z) = dt.
2π −2 z − t
(i) Make the substitution t = 2 cos θ for 0 ≤ θ ≤ π. Show that

4 sin2 θ
Z 2π
1
G(z) = dθ .
4π 0 z − 2 cos θ

(ii) Make the substitution w = eiθ and show that we can write G as the contour
integral
1 (w2 − 1)2
Z
G(z) = dw
4πi Γ w (w2 − zw + 1)
2

where Γ = {w ∈ C | |w| = 1}.

(iii) Using the results from Exercise 3 and the residue calculus, show that
√
z − z2 − 4
G(z) = , (3.2)
2
using the branch defined above.

Exercise 6. In this exercise we shall compute the Cauchy transform of the Mar-
chenko-Pastur law with parameter c using contour integration. We shall start by
supposing that c > 1. Recall that the density of the Marchenko-Pastur law on the
3.1 The Cauchy transform 63
p √
interval [a, b] is given by dνc (t) = (b − t)(t − a)/(2πt)dt with a = (1 − c)2 and
√
b = (1 + c)2 . Let
Z bp
(b − t)(t − a)
G(z) = dt.
a 2πt(z − t)
√
(i) Make the substitution t = 1 + 2 c cos θ + c for 0 ≤ θ ≤ π. Show that

4c sin2 θ
Z 2π
1
G(z) = √ √ dθ .
4π 0 (1 + 2 c cos θ + c)(z − 1 − 2 c cos θ − c)

(ii) Make the substitution w = eiθ and show that we can write G as the contour
integral
1 (w2 − 1)2
Z
G(z) = dw
4πi Γ w(w + f w + 1)(w2 − ew + 1)
2
√ √
where Γ = {w ∈ C | |w| = 1}, f = (1 + c)/ c, and e = (z − (1 + c))/ c.
(iii) Using the results from Exercise 3 and the residue calculus, show that
p
z + 1 − c − (z − a)(z − b)
G(z) = , (3.3)
2z
√
using the branch defined in same way as with z2 − 4 above except a replaces −2
and b replaces 2.

Lemma 3. Let G be the Cauchy transform of a probability measure ν. Then:

lim iy G(iy) = 1 and sup y |G(x + iy)| = 1.

y→∞ y>0,x∈R

Proof: We have
1 −y2
Z Z
y Im(G(iy)) = y Im dν(t) = 2 2
dν(t)
R iy − t R y +t
1
Z Z
=− 2
dν(t) → − dν(t) = −1
R 1 + (t/y) R

as y → ∞; since (1 + (t/y)2 )−1 ≤ 1 we could apply Lebesgue’s dominated conver-

gence theorem.
We have
−yt
Z
y Re(G(iy)) = 2 2
dν(t).
R y +t

But for all y > 0 and for all t

yt 1
y2 + t 2 ≤ 2 ,

64 3 Free Harmonic Analysis

and |yt/(y2 + t 2 )| converges to 0 as y → ∞. Therefore y Re(G(iy)) → 0 as y → ∞,

again by the dominated convergence theorem. This gives the first equation of the
lemma.
For y > 0 and z = x + iy,
y y
Z Z
y |G(z)| ≤ dν(t) = dν(t) ≤ 1.
|z −
p
R t| R (x − t)2 + y2

Thus sup y |G(x + iy)| ≤ 1. By the first part, however, the supremum is 1.
y>0, x∈R
Another frequently used notation is to let m(z) = (t − z)−1 dν(t). We have
R

m(z) = −G(z) and m is usually called the Stieltjes transform of ν. It maps C+ to

C+ .

Notation 4 Let us recall the Poisson kernel from harmonic analysis. Let
1 1 1 ε
P(t) = and Pε (t) = ε −1 P(tε −1 ) = for ε > 0.
π 1 + t2 π t2 + ε2
If ν1 and ν2 are two probability measures on R recall that their convolution is
R∞
defined by ν1 ∗ ν2 (E) = −∞ ν1 (E − t) dν2 (t) (see Rudin [151, Ex. 8.5]). If ν is
a probability measure on R and f ∈ L1 (R, ν) we can define f ∗ ν by f ∗ ν(t) =
R∞
−∞ f (t − s) dν(s). Since P is bounded we can form Pε ∗ ν for any probability mea-
sure ν and any ε > 0. Moreover Pε is the density of a probability measure, namely
a Cauchy distribution with scale parameter ε. We shall denote this distribution by
δ−iε .

Remark 5. Note that δ−iε ∗ ν is a probability measure with density

1
Pε ∗ ν(x) = − Im(G(x + iε)),
π
where G is the Cauchy transform of ν. It is a standard fact that δ−iε ∗ ν converges
weakly to ν as ε → 0+ . (Weak convergence is defined in Remark 12). Thus we can
use the Cauchy transform to recover ν. In the next theorem we write this in terms of
the distribution functions of measures. In this form it is called the Stieltjes inversion
formula.

Theorem 6. Suppose ν is a probability measure on R and G is its Cauchy transform.

For a < b we have
Z b
1 1
− lim Im(G(x + iy)) dx = ν((a, b)) + ν({a, b}).
y→0+ π a 2

If ν1 and ν2 are probability measures with Gν1 = Gν2 , then ν1 = ν2 .

Proof: We have
3.1 The Cauchy transform 65

1 −y
Z Z
Im(G(x + iy)) = Im dν(t) = 2 2
dν(t).
R x − t + iy R (x − t) + y

Thus
Z b Z Z b
−y
Im(G(x + iy)) dx = 2 2
dx dν(t)
a R a (x − t) + y
Z Z (b−t)/y
1
=− 2
d x̃ dν(t)
R (a−t)/y 1 + x̃
Z b−t a − t
=− tan−1 − tan−1 dν(t),
R y y

where we have let x̃ = (x − t)/y.

So let f (y,t) = tan−1 ((b − t)/y) − tan−1 ((a − t)/y) and

0,
 t∈/ [a, b]
f (t) = π/2, t ∈ {a, b} .

t ∈ (a, b)

π,

Then limy→0+ f (y,t) = f (t), and, for all y > 0 and for all t, we have | f (y,t)| ≤ π.
So by Lebesgue’s dominated convergence theorem
Z b Z
lim Im(G(x + iy)) dx = − lim f (y,t) dν(t)
y→0+ a y→0+ R
Z
=− f (t) dν(t)
R
1
= −π(ν((a, b) + ν({a, b})).
2
This proves the first claim.
Now assume that Gν1 = Gν2 . This implies, by the formula just proved, that
ν1 ((a, b)) = ν2 ((a, b)) for all a and b which are atoms neither of ν1 nor of ν2 . Since
there are only countably many atoms of ν1 and ν2 , we can write any interval (a, b)
in the form (a, b) = ∪∞ +
n=1 (a + εn , b − εn ) for a decreasing sequence ε → 0 , such
that all a + εn and all b − εn are atoms neither of ν1 nor of ν2 . But then we get

ν1 ((a, b)) = lim ν1 ((a + εn , b − εn )) = lim ν2 ((a + εn , b − εn )) = ν2 ((a, b)).

εn →0+ εn →0+

This shows that ν1 and ν2 agree on all open intervals and thus are equal.

Example 7 (The semi-circle distribution).

As an example of Stieltjes inversion let us take a familiar example and calculate
its Cauchy transform using a generating function and then using only the Cauchy
transform find the density by using Stieltjes inversion. The density of the semi-circle
law ν := µs is given by
66 3 Free Harmonic Analysis

√
4 − t2
dν(t) = dt on [−2, 2];
2π
and the moments are given by
Z 2
(
n 0, n odd
mn = t dν(t) = ,
−2 Cn/2 , n even
where the Cn ’s are the Catalan numbers:

1 2n
Cn = .
n+1 n

Now let M(z) be the moment-generating function

M(z) = 1 +C1 z2 +C2 z4 + · · ·

then
M(z)2 = CmCn z2(m+n) = CmCn z2k .

∑ ∑ ∑
m,n≥0 k≥0 m+n=k

Now we saw in equation (2.5) that ∑m+n=k CmCn = Ck+1 , so

1
M(z)2 = ∑ Ck+1 z2k = z2 ∑ Ck+1 z2(k+1)
k≥0 k≥0

and therefore

z2 M(z)2 = M(z) − 1 or M(z) = 1 + z2 M(z)2 .

By replacing M(z) by z−1 G(1/z) we get that G satisfies the quadratic equation
zG(z) = 1 + G(z)2 . Solving this we find that
√
z ± z2 − 4
G(z) = .
2
√
We use the branch of z2 − 4 defined before Exercise 2, however we must choose
p iyG(iy) = 1.
the sign in front of the square root. By Lemma 3, we require that limy→∞
p
Note that for y > 0 we have that, using our definition, (iy)2 − 4 = i y2 + 4. Thus
p
iy − (iy)2 − 4
lim (iy) =1
y→∞ 2
and p
iy + (iy)2 − 4
lim (iy) = ∞.
y→∞ 2
Hence
3.1 The Cauchy transform 67
√
z − z2 − 4
G(z) = .
2
Of course, this agrees with the result in Exercise 5.
Returning to the equation zG(z) = 1 + G(z)2 we see that z = G(z) + 1/G(z), so
K(z) = z + 1/z and thus R(z) = z i.e. all cumulants of the semi-circle law are 0
except κ2 , which equals 1, something we observed already in Exercise 2.9.
Now let us apply Stieltjes inversion to G(z). We have
q
1/2
(x + iy)2 − 4 = (x + iy)2 − 4 sin((θ1 + θ2 )/2)

Im

(
|x2 − 4|1/2 · 0 = 0, |x| > 2
q
lim Im (x + iy)2 − 4 = 2 1/2
√
y→0+ 2
|x − 4| · 1 = 4 − x , |x| ≤ 2

and thus
p !
x + iy − (x + iy)2 − 4
lim Im(G(x + iy)) = lim Im
y→0+ y→0+ 2

0, |x| > 2
√
= − 4 − x2 .
 , |x| ≤ 2
2
Therefore

1 0, |x| > 2
√
− lim Im(G(x + iy)) = 2 .
y→0+ π  4 − x , |x| ≤ 2
2π
Hence we recover our original density.

If G is the Cauchy transform of a probability measure we cannot in general expect

G(z) to converge as z converges to a ∈ R. It might be that |G(z)| → ∞ as z → a or that
G behaves as if it has an essential singularity at a. However (z − a)G(z) always has a
limit as z → a if we take a non-tangential limit. Let us recall the definition. Suppose
f : C+ → C and a ∈ R, we say lim^z→a f (z) = b if for every θ > 0, limz→a f (z) = b
when we restrict z to be in the cone {x + iy | y > 0 and |x − a| < θ y} ⊂ C+ .

Proposition 8. Suppose ν is a probability measure on R with Cauchy transform G.

For all a ∈ R we have lim (z − a)G(z) = ν({a}).
^z→a

Proof: Let θ > 0 be given. If z = x + iy and |x − a| < θ y, then for t ∈ R we have

z − a 2 (x − a)2 + y2 1 + ( x−a
y )
2
x − a 2
= = ≤ 1 + < 1 + θ 2.
z−t (x − t)2 + y2 1 + ( x−t
y )2 y
68 3 Free Harmonic Analysis

Let m = ν({a}), δa the Dirac mass at a, and σ = ν − mδa . Then σ is a sub-

probability measure and so
Z Z
z−a z−a
|(z − a)G(z) − m| =
dσ (t) ≤
dσ (t).
z−t z−t

We have |(z − a)/(z − t)| → 0 as z → a for all t 6= a. Since {a} is a set of σ mea-
sure 0, we may apply the dominated convergence theorem to conclude that indeed
lim^z→a (z − a)G(z) = m.
Let f (z) = (z − a)G(z). Suppose f has an analytic extension to a neighbour-
hood of a then G has a meromorphic extension to a neighbourhood of a. If
m = lim^z→a f (z) > 0 then G has a simple pole at a with residue m and ν has an
atom of mass m at a. If m = 0 then G has an analytic extension to a neighbourhood
of a.
Let us illustrate this with the example of the Marchenko-Pastur distribution with
parameter c (see
p the discussion following Exercise 2.9). √ In that case we have G(z) =
√
(z + 1 − c − (z − a)(z − b)/(2z); recall that a =p (1 − c)2 and b = (1 + c)2 . If
we write this as f (z)/z with f (z) = (z + 1 − c − (z − a)(z − b) )/2 then we may
(using the convention of Exercise 6 (ii)) extend f to be analytic on {z | Re(z) < a}
by choosing π/2 < θ1 , θ2 < 3π/2. With this convention we have f (0) = 1 − c when
c < 1 and f (0) = 0 when c > 1. Note that this is exactly the weight of the atom at 0.
For many probability measures arising in free probability G has a meromorphic
extension to a neighbourhood of a given point a. This is due to two results. The
first is a theorem of Greenstein [80, Thm. 1.2] which states that G can be continued
analytically to an open set containing the interval (a, b) if and only if the restriction
of ν to (a, b) is absolutely continuous with respect to Lebesgue measure and that
the density is real analytic. The second is a theorem of Belinschi [19, Thm. 4.1]
which states that the free additive convolution (see §3.5) of two probability measures
(provided neither is a Dirac mass) has no continuous singular part and the density
is real analytic whenever positive and finite. This means that for such measures G
has a meromorphic extension to a neighbourhood of every point where the density
is positive on some open set containing the point.

Remark 9. The proof of the next theorem depends on a fundamental result of

R. Nevanlinna which provides an integral representation for an analytic function
from C+ to C+ . It is the upper half-plane version of a better known theorem about
the harmonic extension of a measure on the boundary of the open unit disc to its
interior. Suppose that ϕ : C+ → C+ is analytic, then the theorem of Nevanlinna as-
serts that there is a unique finite positive Borel measure σ on R and real numbers α
and β , with β ≥ 0 such that for z ∈ C+
1 + tz
Z
ϕ(z) = α + β z + dσ (t).
R t −z
3.1 The Cauchy transform 69

This integral representation is achieved by mapping the upper half-plane to the open
unit disc D, via ξ = (iz+1)/(iz−1),
and then defining ψ on D by ψ(ξ ) = −iϕ(z) =
−iϕ i(1 + ξ )/(1 − ξ ) and obtaining an analytic function ψ mapping the open unit
disc, D, into the complex right half-plane. In the disc version of the problem we
must find a real number β 0 and a positive measure σ 0 on ∂ D = [0, 2π] such that
Z 2π it
e +z
ψ(z) = iβ 0 + dσ 0 (t).
0 eit − z
The measure σ 0 is then obtained as a limit using the Helly selection principle (see
e.g. Lukacs [119, Thm. 3.5.1]). This representation is usually attributed to Herglotz.
The details can be found in Akhiezer and Glazman [4, Ch. VI, §59], Rudin [151,
Thm. 11.9], or Hoffman [99, p. 34].

The next theorem answers the question as to which analytic functions from C+
to C− are the Cauchy transform of a positive Borel measure.

Theorem 10. Suppose G : C+ → C− is analytic and lim supy→∞ y|G(iy)| = c < ∞.

Then there is a unique positive Borel measure ν on R such that
1
Z
G(z) = dν(t) and ν(R) = c.
R z−t

Proof: By the remark above, applied to −G, there is a unique finite positive measure
R
σ on R such that G(z) = α + β z + (1 + tz)/(z − t) dσ (t) with α ∈ R and β ≤ 0.
Considering first the real part of iyG(iy) we get that for all y > 0 large enough

1 + t2
Z
2c ≥ Re(iyG(iy)) = y2 − β + dσ (t) .
y2 + t 2

Since both −β and (1 + t 2 )/(y2 + t 2 ) dσ (t) are non-negative, the right-hand term
R

can only stay bounded if β = 0. Thus for all y > 0 sufficiently large

1 + t2
Z
dσ (t) ≤ 2c.
1 + (t/y)2

Thus by the monotone convergence theorem (1 + t 2 ) dσ (t) ≤ 2c and so σ has a

second moment.
From the imaginary part of iyG(iy) we get that for all y > 0 sufficiently large

t(y2 − 1)
Z

y α +

2 2
dσ (t) ≤ 2c,
R t +y

which implies that

t(y2 − 1)
Z
α = − lim 2 2
dσ (t).
y→∞ R t +y
70 3 Free Harmonic Analysis

Since |(y2 − 1)/(t 2 + y2 )| ≤ 1 for y ≥ 1 and since σ has a second (and hence also
a first) moment we can apply the dominated convergence theorem and conclude that

1 − y−2
Z Z
α = − lim t dσ (t) = − tdσ (t).
y→∞ R 1 + (t/y)2 R

Hence
Z
1 + tz 1 1
Z Z
G(z) = −t + dσ (t) = (1 + t 2 )dσ (t) = dν(t),
R z−t R z−t R z−t

where we have put ν(E) := E (1 + t 2 )dσ (t). This ν is a finite measure since σ has
R

a second moment. So G is the Cauchy transform of the positive Borel measure ν.

Since the imaginary part of iyG(iy) tends to 0, by Lemma 3, and the real part is
positive, we have
Z
c = lim sup |iyG(iy)| = lim Re(iyG(iy)) = (1 + t 2 ) dσ (t) = ν(R).
y→∞ y→∞

Remark 11. Recall that in Definition 2.11 we defined the Marchenko-Pastur law via
the density νc on R. We then showed in Exercise 2.11 the free cumulants of νc
are given by κn = c for all n ≥ 1. We can also approach the Marchenko-Pastur
distribution from the other direction; namely start with the free cumulants and derive
the density using Theorems 6 and 10.
If we assume that κn = c for all n ≥ 1 and 0 < c < ∞, then R(z) = c/(1 − z) and
so by the reverse of equation (2.27)
1
+ R(G(z)) = z (3.4)
G(z)

we conclude that G satisfies the quadratic equation

1 c
+ = z.
G(z) 1 − G(z)
√ √
So using our previous notation: a = (1 − c)2 and b = (1 + c)2 , we have
p
z + 1 − c ± (z − a)(z − b)
G(z) = .
2z
p
Aspin Exercise 6 we choose the branch of the square root defined by (z − a)(z − b)
= |(z − a)(z − b)| ei(θ1 +θ2 )/2 , where 0 < θ1 , θ2 < π and θ1 is the argument of z − b
+
p of z − a. This gives us an analytic function on C . To choose
and θ2 is the argument
the sign in front of (z − a)(z − b) we take our lead from Theorem 6.

Exercise 7. Show the following.

3.1 The Cauchy transform 71

(i) When z = iy with y > 0 we have

p
z+1−c− (z − a)(z − b)
lim z · = 1;
y→∞ 2z
(ii) and p
z+1−c+ (z − a)(z − b)
lim z · = ∞.
y→∞ 2z
(iii) For z ∈ C+ show that
p
z+1−c− (z − a)(z − b)
6∈ R.
2z

This forces the sign, so now we let

p
z + 1 − c − (z − a)(z − b)
G(z) = for z ∈ C+ .
2z
This is our candidate for the Cauchy transform of a probability measure. Since
G(C+ ) is an open connected subset of C \ R we have that G(C+ ) is contained in
either C+ or C− . Part (i) of Exercise 7 shows that G(C+ ) ⊂ C− . So by Theorem 10
there is a probability measure on R for which G is the Cauchy transform.
Exercise 8. As was done in Example 7, show by Stieltjes inversion that the proba-
bility measure of which G is the Cauchy transform is νc .

Exercise 9. Let a and b be real numbers with b ≤ 0. Let G(z) = (z − a − ib)−1 .

Show that G is the Cauchy transform of a probability measure, δa+ib , which has a
density and find its density using Stieltjes inversion. Let ν be a probability measure
on R with Cauchy transform G. Show that, G, e the Cauchy transform of δa+ib ∗ ν is
e = G(z − (a + ib)). Here ∗ denotes the classical convolution, c.f.
the function G(z)
Notation 4.
Note that though G looks like the Cauchy transform of the complex constant
random variable a + ib, it is shown here that it is actually the Cauchy transform of an
(unbounded) real random variable. To be clear, we have defined Cauchy transforms
only for real-valued random variables, i.e., probability measures on R.

Remark 12. If {νn }n is a sequence of finite Borel measures on R we say that {νn }n
converges weakly to the measure ν if for every f ∈ Cb (R) (the continuous bounded
R R
functions on R) we have limn f (t) dνn (t) = f (t) dν(t). We say that {νn }n con-
verges vaguely to ν if for every f ∈ C0 (R) (the continuous functions on R vanish-
R R
ing at infinity) we have limn f (t) dνn (t) = f (t) dν(t). Weak convergence implies
vague convergence but not conversely. However if all νn and ν are probability mea-
sures then the vague convergence of {νn }n to ν does imply that {νn }n converges
72 3 Free Harmonic Analysis

weakly to ν [55, Thm. 4.4.2]. If {νn }n is a sequence of probability measures con-

verging weakly to ν then the corresponding sequence of Cauchy transforms, {Gn }n ,
converges pointwise to the Cauchy transform of ν, as for fixed z ∈ C+ , the function
t 7→ (z − t)−1 is a continuous function on R, bounded by |z − t|−1 ≤ (Im(z))−1 . The
following theorem gives the converse.
Theorem 13. Suppose that {νn }n is a sequence of probability measures on R with
Gn the Cauchy transform of νn . Suppose {Gn }n converges pointwise to G on C+ .
If limy→∞ iy G(iy) = 1 then there is a unique probability measure ν on R such that
νn → ν weakly, and G(z) = (z − t)−1 dν(t).
R

Proof: {Gn }n is uniformly bounded on compact subsets of C+ (as we have |G(z)|

≤ |Im(z)|−1 for the Cauchy transform of any probability measure), so by Montel’s
theorem {Gn }n is relatively compact in the topology of uniform convergence on
compact subsets of C+ , thus, in particular, {Gn }n has a subsequence which con-
verges uniformly on compact subsets of C+ to an analytic function, which must be
G. Thus G is analytic. Now for z ∈ C+ , G(z) ∈ C− . Also for each n ∈ N, x ∈ R and
y > 0, y |Gn (x + iy)| ≤ 1. Thus ∀x ∈ R, ∀y ≥ 0, y |G(x + iy)| ≤ 1. So in particular, G
is non-constant. If for some z ∈ C+ , Im(G(z)) = 0 then by the minimum modulus
principle G would be constant. Thus G maps C+ into C− . Hence by Theorem 10
there is a unique finite measure ν such that G(z) = R (z − t)−1 dν(t) and ν(R) ≤ 1.
R

Since, by assumption, limy→∞ iyG(iy) = 1 we have by Theorem 10 that ν(R) = 1

and thus ν is a probability measure.
Now by the Helly selection theorem there is a subsequence νnk k converging
vaguely to some measure ν̃. For fixed z the function t 7→ (z − t)−1 is in C0 (R).
Thus for Im(z) > 0, Gnk (z) = R (z − t)−1 dνnk (t) → R (z − t)−1 d ν̃(t). Therefore
R R

G(z) = (z − t)−1 d ν̃(t) i.e. ν = ν̃. Thus {νnk }k converges vaguely to ν. Since ν is
R

a probability measure {νnk }k converges weakly to ν. So all weak cluster points of

{νn }n are ν and thus the whole sequence {νn }n converges weakly to ν.
Note that we needed the assumption limy→∞ y(G(iy) = −i in order to ensure that
the limit measure ν is indeed a probability measure. In general, without this assump-
tion, one might lose mass in the limit, and one has only the following statement.
Corollary 14. Suppose {νn }n is a sequence of probability measures on R with
Cauchy transforms {Gn }n . If {Gn }n converges pointwise on C+ , then there is a
finite positive Borel measure ν with ν(R) ≤ 1 such that {νn } converges vaguely to
ν.

Exercise 10. Identify νn and ν for the sequence of Cauchy transforms which are
given by Gn (z) = 1/(z − n).

3.2 Moments and asymptotic expansions

We saw in Lemma 3 that zG(z) approaches 1 as z approaches ∞ in C+ along the
imaginary axis. Thus zG(z) − 1 approaches 0 as z ∈ C+ tends to ∞. Quantifying how
3.2 Moments and asymptotic expansions 73

Fig. 3.2 The Stolz angle Γα,β .

Γα,β

|
|x
=
αy
y=β

fast zG(z) − 1 approaches 0 will be useful in showing that near ∞, G is univalent

and thus has an inverse. If our measure has moments then we get an asymptotic
expansion for the Cauchy transform.

Notation 15 Let α > 0 and let Γα = {x + iy | αy > |x|} and for β > 0 let Γα,β =
{z√∈ Γα | Im(z) > β }. See Fig. 3.2. Note that for z ∈ C+ we have z ∈ Γα if and only
if 1 + α 2 Im(z) > |z|.

Definition 16. If α > 0 is given and f is a function on Γα we say limz→∞,z∈Γα f (z) =

c to mean that for every ε > 0 there is β > 0 so that | f (z) − c| < ε for z ∈ Γα,β . If
this holds for every α we write lim^z→∞ f (z) = c. When it is clear from the context
we shall abbreviate this to limz→∞ f (z) = c. We call Γα a Stolz angle and Γα,β a
truncated Stolz angle. To show convergence in a Stolz angle it is sufficient to show
convergence along a sequence {zn }n in Γα tending to infinity. Hence the usual rules
for sums and products of limits apply.
We extend this definition to the case c = ∞ as follows. If for every α1 < α2 and
every β2 > 0 there is β1 such that f (Γα1 ,β1 ) ⊂ Γα2 ,β2 we say lim f (z) = ∞.
^z→∞

Exercise 11. Show that if lim^z→∞ f (z)/z = 1 then lim^z→∞ f (z) = ∞.

In the following exercises G will be the Cauchy transform of the probability
measure ν.
Exercise 12. Let ν be a probability measure on R and α > 0. In this exercise we
will consider limits as z → ∞ with z ∈ Γα√. Show that:
(i) for z ∈ Γα and t ∈ R, |z − t| ≥ |t|/ √1 + α 2 ;
(ii) for z ∈ Γα and t ∈ R, |z − t| ≥ |z|/ 1 + α 2 ;
R
(iii) limz→∞ R t/(z − t) dν(t) = 0;
(iv) limz→∞ zG(z) = 1.

Exercise 13. Let F : C+ → C+ be analytic and let

1 + tz
Z
F(z) = a + bz + dσ (t)
R t −z
74 3 Free Harmonic Analysis

be its Nevanlinna representation with a real and b ≥ 0. Then for all α > 0 we have
limz→∞ F(z)/z = b for z ∈ Γα .

Exercise 14. Let ν be a probability measure on R. Suppose ν has a moment of order

n, i.e. R |t|n dν(t) < ∞. Let α1 , . . . , αn be the first n moments of ν. Let α > 0 be
R

given. As in Exercise 12 all limits as z → ∞ will be assumed to be in a Stolz angle

as in Notation 15.
(i) Show that
Z n+1
t dν(t) = 0.
lim
z→∞ R z − t

(ii) Show that

n+1 1 α1 α2 αn
lim z G(z) − + + 3 + · · · + n+1 = 0.
z→∞ z z2 z z

Exercise 15. Suppose that α > 0 and ν is a probability measure on R and that for
some n > 0 there are real numbers α1 , α2 , . . . , α2n such that as z → ∞ in Γα

2n+1 1 α1 α2n
lim z G(z) − + + · · · + 2n+1 = 0.
z→∞ z z2 z
2n dν(t) < ∞
R
Show that ν has a moment of order 2n, i.e. Rt and that α1 , α2 , . . . , α2n
are the first 2n moments of ν.

3.3 Analyticity of the R-transform: compactly supported measures

We now turn to the problem of showing that the R-transform is always an analytic
function; recall that in Chapter 2 the R-transform was defined as a formal power se-
ries, satisfying Equation (2.26). The more we assume about the probability measure
ν the better behaved is R(z).
Indeed, if ν is a compactly supported measure, supported in the interval [−r, r],
then R(z) is analytic on the disc with centre 0 and radius 1/(6r). Moreover the
coefficients in the power series expansion R(z) = κ1 + κ2 z + κ3 z2 + · · · are exactly
the free cumulants introduced in Chapter 2.
If ν has variance σ 2 then R(z) is analytic on a disc with centre −i/(4σ ) and
radius 1/(4σ ) (see Theorem 26). Note that 0 is on the boundary of this disc so ν
may fail to have any free cumulants beyond the second. However if ν has moments
of all orders then R(z) has an asymptotic expansion at 0 and the coefficients in this
expansion are the free cumulants of ν.
The most general situation is when ν is not assumed to have any moments. Then
R(z) is analytic on a wedge ∆α,β = {z−1 | z ∈ Γα,β } in the lower half-plane with 0
at its vertex (see Theorem 33).
3.3 Analyticity of the R-transform: compactly supported measures 75

Consider now first the case that ν is a compactly supported probability measure
on R. Then ν has moments of all orders. We will show that the Cauchy transform of
ν is univalent on the exterior of a circle centred at the origin. We can then solve the
equation G(R(z) + 1/z) = z for R(z) to obtain a function R, analytic on the interior
of a disc centred at the origin and with power series given by the free cumulants of
ν. The precise statements are given in the next theorem.

Theorem 17. Let ν be a probability measure on R with support contained in the

interval [−r, r] and let G be its Cauchy transform. Then
(i) G is univalent on {z | |z| > 4r};
(ii) {z | 0 < |z| < 1/(6r)} ⊂ {G(z) | |z| > 4r};
(iii) there is a function R, analytic on {z | |z| < 1/(6r)} such that G(R(z) + 1/z) = z
for 0 < |z| < 1/(6r);
(iv) if {κn }n are the free cumulants of ν then, for |z| < 1/(6r), ∑n≥1 κn zn−1 converges
to R(z).

Proof:
Let {αn }n be the moments of ν and let α0 = 1. Note that |αn | ≤ |t|n dν(t) ≤ rn .
R

Let
1
Z
f (z) = G 1/z = z dν(t).
1 − tz
For |z| < 1/r and t ∈ supp(ν), |zt| < 1 and the series ∑(zt)n converges uniformly
on supp(ν) and thus ∑n≥0 αn zn+1 converges uniformly to f (z) on compact subsets
of {z | |z| < 1/r}. Hence ∑n≥0 αn z−(n+1) converges uniformly to G(z) on compact
subsets of {z | |z| > r}.
Suppose |z1 |, |z2 | < r−1 . Then

f (z1 ) − f (z2 )
≥ Re f (z1 ) − f (z2 )

z1 − z2 z1 − z2
Z 1
d f (z1 + t(z2 − z1 ))

= Re dt
0 dt z2 − z1
Z 1
Re f 0 (z1 + t(z2 − z1 )) dt.

=
0

And

Re( f 0 (z)) = Re(1 + 2zα1 + 3z2 α2 + · · · )

≥ 1 − 2 |z| r − 3 |z|2 r2 − · · ·
= 2 − (1 + 2(|z| r) + 3(|z| r)2 + · · · )
1
= 2− .
(1 − |z| r)2

For |z| < (4r)−1 we have

76 3 Free Harmonic Analysis

1 2
Re( f 0 (z)) ≥ 2 − 2
= .
(1 − 1/4) 9

Hence for |z1 | , |z2 | < (4r)−1 we have | f (z1 ) − f (z2 )| ≥ 2|z1 − z2 |/9. In particular, f
is univalent on {z | |z| < (4r)−1 }. Hence G is univalent on {z | |z| > 4r}. This proves
(i).
For any curve Γ in C and any w not on Γ let IndΓ (w) = Γ (z − w)−1 dz/(2πi)
R

be the index of w with respect to Γ (or the winding number of Γ around w). Now,
as f (0) = 0, the only solution to f (z) = 0 for |z| < (4r)−1 is z = 0. Let Γ be the
curve {z | |z| = (4r)−1 } and f (Γ ) = { f (z) | z ∈ Γ } be the image of Γ under f . By
the argument principle

1 f 0 (z)
Z
Ind f (Γ ) (0) = dz = 1.
2πi Γ f (z)

Also for |z| < (4r)−1

| f (z)| = |z| |1 + α1 z + α2 z2 + · · · |
≥ |z| (2 − (1 + r|z| + r2 |z|2 + · · · ))

1
= |z| 2 −
1 − r |z|

1
≥ |z| 2 −
1 − 1/4
2
= |z|.
3
Thus for |z| = (4r)−1 we have | f (z)| ≥ (6r)−1 . Hence f (Γ ) lies outside the circle
|z| = (6r)−1 and thus {z | |z| < (6r)−1 } is contained in the connected component of
C\ f (Γ ) containing 0. So for w ∈ {z | |z| < (6r)−1 }, Ind f (Γ ) (w) = Ind f (Γ ) (0) = 1, as
the index is constant on connected components of the complement of f (Γ ). Hence

1 f 0 (z)
Z
1 = Ind f (Γ ) (w) = dz,
2πi Γ f (z) − w

so again by the argument principle there is exactly one |z| with z < (4r)−1 such that
f (z) = w. Hence

{z | |z| < (6r)−1 } ⊂ { f (z) | |z| < (4r)−1 }

and thus
{z | 0 < |z| < (6r)−1 } ⊂ {G(z) | |z| > 4r}.
This proves (ii).
3.3 Analyticity of the R-transform: compactly supported measures 77

Let f h−1i be the inverse of f on {z | |z| < (6r)−1 }. Then f h−1i (0) = 0 and
0
f h−1i (0) = 1/ f 0 (0) = 1, so f h−1i has a simple zero at 0. Let K be the meromorphic
function on {z | |z| < (6r)−1 } given by K(z) = 1/ f h−1i (z). Then K has a simple pole
at 0 with residue 1. Hence R(z) = K(z) − 1/z is holomorphic on {z | |z| < (6r)−1 },
and for 0 < |z| < (6r)−1

1
G(R(z) + 1/z) = G(K(z)) = f = f ( f h−1i )(z) = z.
K(z)

This proves (iii).

Let C(z) = 1 + zR(z) = zK(z). Then C is analytic on {z | |z| < (6r)−1 } and so has
a power series expansion ∑n≥0 κ̃n zn , with κ̃0 = 1. We shall have proved (iv) if we
can show that for all n ≥ 1, κ̃n = κn where {κn }n are the free cumulants of ν.
Recall that f (0) = 0, so M(z) := f (z)/z = ∑n≥0 αn zn is analytic on {z | |z| < r−1 }.
For z such that |z| < (4r)−1 and | f (z)| < (6r)−1 we have

f (z) f (z)
C( f (z)) = f (z)K( f (z)) = = = M(z). (3.5)
f h−1i ( f (z)) z

Fix p ≥ 1. Then we may write

p
M(z) = 1 + ∑ αl zl + o(z p ),
l=1

p
C(z) = 1 + ∑ κ̃l zl + o(z p ),
l=1

and
p l
( f (z))l = ∑ αm−1 zm + o(z p ).
m=1
Hence
p p l
C( f (z)) = 1 + ∑ κ̃l ∑ αm−1 zm + o(z p ).
l=1 m=1

Thus by equation (3.5)

p p p l
1 + ∑ αl zl = 1 + ∑ κ̃l ∑ αm−1 zm + o(z p ).
l=1 l=1 m=1

However this is exactly the relation between {αn }n and {κn }n found at the end of
the proof of Proposition 2.17. Given {αn }n there are unique κn ’s that satisfy this
relation, so we must have κ̃n = κn for all n. This proves (iv).

Remark 18. A converse to Theorem 17 was found by F. Benaych-Georges [27].

Namely, if a probability measure ν has an R-transform which is analytic on an open
set containing 0 and for all k ≥ 0, the kth derivative R(k) (0) is a real number, then ν
78 3 Free Harmonic Analysis

has compact support. Note that for the Cauchy distribution R(k) (0) = 0 for k ≥ 1 but
R(0) is not real.

3.4 Measures with finite variance

In the previous section we showed that if ν has compact support then the R-
transform of ν is analytic on an open disc containing 0. If we assume that ν has
finite variance but make no assumption about the support then we can still conclude
that the equation G(R(z) + 1/z) = z has an analytic solution on an open disc in the
lower half plane. The main problem is again demonstrating the univalence of G,
which is accomplished by a winding number argument.
We have already seen in Exercise 12 that zG(z) − 1 → 0 as z → ∞ in some Stolz
angle Γα . Let G1 (z) = z − 1/G(z). Then G1 (z)/z → 0, i.e. G1 (z) = o(z). If ν has a
first moment α1 then z2 (G(z) − (1/z + α1 /z2 )) → 0 and we may write
1 α
1
z2 G(z) − + = zG(z)(G1 (z) − α1 ) + α1 (zG(z) − 1).
z z2

Thus G1 (z) → α1 . Suppose ν has a second moment α2 then

1 α
1 α2
z3 G(z) − + 2 + 3 →0
z z z

or equivalently
1 α1 α2 1
G(z) = + 2 + 3 +o 3
z z z z
and thus
1 α2 − α12 1
G1 (z) = z − 1
= α1 + +o . (3.6)
z + α1
z2
+ α2
z3
+ o( z13 ) z z

The next lemma shows that G1 maps C+ to C− . We shall work with the function
F = 1/G. It will be useful to establish some properties of F (Lemmas 19 and 20) and
then show that these properties characterize the reciprocals of Cauchy transforms of
measures of finite variance (Lemma 21).
Lemma 19. Let ν be a probability measure on R and G its Cauchy transform. Let
F(z) = 1/G(z). Then F maps C+ to C+ and Im(z) ≤ Im(F(z)) for z ∈ C+ , with
equality for some z only if ν is a Dirac mass.

Proof: Im(G(z)) = −Im(z) |z − t|−2 dν(t), so

|z − t|−2 dν(t)
R
Im(F(z)) −Im(G(z)) 1
= = .
Im(z) Im(z) |G(z)|2 |G(z)|2

So our claim reduces to showing that |G(z)|2 ≤ |z − t|−2 dν(t). However by the
R

Cauchy-Schwartz inequality
3.4 Measures with finite variance 79
Z 2 Z
1 2
1
Z Z 1 2
2

z − t dν(t) ≤ 1 dν(t) z − t dν(t) = dν(t),

z−t

with equality only if t 7→ (z − t)−1 is ν-almost constant, i.e. ν is a Dirac mass. This
completes the proof.

Lemma 20. Let ν be a probability measure with finite variance σ 2 and let G1 (z) =
z − 1/G(z), where G is the Cauchy transform of ν. Then there is a probability mea-
sure ν1 such that
1
Z
G1 (z) = α1 + σ 2 dν1 (t)
z−t
where α1 is the mean of ν.

Proof: If σ 2 = 0, then ν is a Dirac mass, thus G1 (z) = α1 and the assertion is

trivially true. So let us assume that σ 2 6= 0. G1 (z) = z − 1/G(z) is analytic on C+
and by the previous lemma G1 (C+ ) ⊂ C− . Let α1 and α2 be the first and second
moment of ν, respectively. Clearly, we also have (G1 − α1 )(C+ ) ⊂ C− and, by
equation (3.6), limz→∞ z(G1 (z) − α1 ) = α2 − α12 = σ 2 > 0. Thus by Theorem 10
there is a probability measure ν1 such that
1
Z
G1 (z) − α1 = σ 2 dν1 (t).
z−t

Lemma 21. Suppose that F : C+ → C+ is analytic and there is C > 0 such that for
z ∈ C+ , |F(z) − z| ≤ C/Im(z). Then there is a probability measure ν with mean 0
and variance σ 2 ≤ C such that 1/F is the Cauchy transform of ν. Moreover σ 2 is
the smallest C such that |F(z) − z| ≤ C/Im(z).

Proof: Let G(z) = 1/F(z). Then G : C+ → C− is analytic and

1 − 1 = |F(z) − z| ≤ C

.
zG(z) |z| |z|Im(z)

Hence limz→∞ zG(z) = 1 in any Stolz angle. Thus by Theorem 10 there is a proba-
bility measure ν such that G is the Cauchy transform of ν. Now

y2 2 y2
Z Z
2

2 2
t dν(t) = y − 2 2
dν(t) + 1 = y Im iy G(iy) (F(iy) − iy) .
y +t y +t

Also, allowing that both sides might equal ∞, we have by the monotone convergence
theorem that
y2 2
Z Z
t 2 dν(t) = lim t dν(t).
y→∞ y + t2
2

However
80 3 Free Harmonic Analysis

y |iy G(iy)|C
|yIm iyG(iy) (F(iy) − iy) | ≤ = C |iy G(iy)|,
Im(iy)

thus t 2 dν(t) ≤ C, and so ν has a second, and thus also a first, moment. Also
R

y2
Z
t dν(t) = −y2 Re(G(iy)) = −Re iyG(iy)(F(iy) − iy) .

y2 + t 2

Since iyG(iy) → 1 and |F(iy) − iy| ≤ C/y we see that the first moment of ν is 0, also
by the monotone convergence theorem.
We now have that σ 2 ≤ C. The inequality |z−F(z)| ≤ C/Im(z) precludes ν being
a Dirac mass other than δ0 . For ν = δ0 we have F(z) = z, and then the minimal C
is clearly 0 = σ 2 . Hence we can restrict to ν 6= δ0 , hence to ν not being a Dirac
mass. Thus by Lemma 19 we have for z ∈ C+ that z − F(z) ∈ C− . By equation
(3.6), limz→∞ z(z − F(z)) = σ 2 in any Stolz angle. Hence by Theorem 10 there is a
probability measure ν̃ such that z − F(z) = σ 2 (z − t)−1 d ν̃(t). Hence
R

1 1 σ2
Z Z
|z − F(z)| ≤ σ 2 d ν̃(t) ≤ σ 2 d ν̃(t) = .
|z − t| Im(z) Im(z)

This proves the last claim.

Exercise 16. Suppose ν has a fourth moment and we write
1 α1 α2 α3 α4 1
G(z) = + 2 + 3 + 4 + 5 +o 5 .
z z z z z z
Show that
1 β0 β1 β2 1
z− = α1 + + 2 + 3 + o 3
G(z) z z z z

where

β0 = α2 − α12 β1 = α3 − 2α1 α2 + α13 β2 = α4 − 2α1 α3 − α22 + 3α12 α2 − α14

and thus conclude that the probability measure ν1 of Lemma 20 has the second
moment β2 /β0 .

Remark 22. We have seen that if ν has a second moment then we may write
1 1
G(z) = 1
= R 1
(z − α1 ) − (α2 − α12 )
R
z−t dν1 (t) (z − a1 ) − b1 z−t dν1 (t)

where ν1 is a probability measure on R, and a1 = α1 , b1 = α2 − α12 . If ν has a fourth

moment then ν1 will have a second moment and we may repeat our construction to
write
3.4 Measures with finite variance 81

1 1
Z
dν1 (t) = 1
z−t
R
(z − a2 ) − b2 z−t dν2 (t)
for some probability measure ν2 , where a2 = (α3 − 2α1 α2 + α13 )/(α2 − α12 ) and
b2 = (α2 α4 + 2α1 α2 α3 − α23 − α12 α4 − α32 )/(α2 − α12 )2 . Thus

1
G(z) = .
b1
z − a1 − R 1
z − a2 − b2 z−t dν2 (t)

If ν has moments of all orders {αn }n then the Cauchy transform of ν has a
continued fraction expansion (often called a J-fraction because of the connection
with Jacobi matrices).

1
G(z) = .
b1
z − a1 −
b2
z − a2 −
z − a3 − · · ·
The coefficients {an }n and {bn }n are obtained from the moments {αn }n as fol-
lows. Let An be the (n + 1) × (n + 1) Hankel matrix

1 α1 · · · αn
 
 α1 α2 · · · αn+1 
An =  . .. .. 
 
 .. . . 
αn αn+1 · · · α2n

and Ãn−1 be the n × n matrix obtained from An by deleting the last row and second
last column and Ã0 = (α1 ). Then let ∆−1 = 1, ∆n = det(An ), ∆˜ −1 = 0, and ∆˜ n =
det(Ãn ). By Hamburger’s theorem (see Shohat and Tamarkin [157, Thm. 1.2]) we
have that for all n, ∆n ≥ 0. Then b1 b2 · · · bn = ∆n /∆n−1 and

a1 + a2 + · · · + an = ∆˜ n−1 /∆n−1 ,
2
or equivalently bn = ∆n−2 ∆n /∆n−1 and an = ∆˜ n−1 /∆n−1 − ∆˜ n−2 /∆n−2 . If for some
n, ∆n = 0 then we only get a finite continued fraction.

Notation 23 For β > 0 let C+

β = {z | Im(z) > β }.

Lemma 24. Suppose F : C+ → C+ is analytic and there is σ > 0 such that for
z ∈ C+ we have |z − F(z)| ≤ σ 2 /Im(z). Then
(i) C+ +
2σ ⊂ F(Cσ );
(ii) for each w ∈ C+ +
2σ there is a unique z ∈ Cσ such that F(z) = w.

Hence there is an analytic function, F h−1i , defined on C+

2σ such that F(F
h−1i (w)) =
+
w. Moreover for w ∈ C2σ
82 3 Free Harmonic Analysis

(iii) Im(F h−1i (w)) ≤ Im(w) ≤ 2 Im(F h−1i (w)), and

(iv) |F h−1i (w) − w| ≤ 2σ 2 /Im(w).

Proof: Suppose Im(w) > 2σ . If |z − w| = σ then

Im(z) ≥ Im(w − iσ ) = Im(w) − σ > 2σ − σ = σ .

Let C be the circle with centre w and radius σ . Then C ⊂ C+

σ . For z ∈ C we have

σ2
|(F(z) − w) − (z − w)| = |F(z) − z| ≤ < σ = |z − w|.
Im(z)

Thus by Rouché’s theorem there is a unique z ∈ int(C) with F(z) = w. This proves
(i).
If z0 ∈ C+ 0
σ and F(z ) = w then

σ2
|w − z0 | = |F(z0 ) − z0 | ≤ < σ,
Im(z0 )

so z0 ∈ int(C) and hence z = z0 . This proves (ii). We define F h−1i (w) = z.

By Lemma 21, 1/F is the Cauchy transform of a probability measure with finite
variance. Thus by Lemma 19 we have that Im(F(z)) ≥ Im(z) thus Im(F h−1i (w)) ≤
Im(w). On the other hand, by replacing σ in (i) by β > σ , one has for w ∈ C+ 2β
that Im(F h−1i (w)) > β . By letting 2β approach Im(w) we get that Im(F h−1i (w)) ≥
1
2 Im(w). This proves (iii).
For w ∈ C+ 2σ let z = F
h−1i (w) ∈ C+ , then by (iii)
σ

σ2 2σ 2
|F h−1i (w) − w| = |z − w| = |F(z) − z| ≤ ≤ .
Im(z) Im(w)

This proves (iv).

Theorem 25. Let ν be a probability measure on R with first and second moments α1
and α2 . Let G(z) = (z −t)−1 dν(t) be the Cauchy transform of ν and σ 2 = α2 −α12
R

be the variance of ν. Let F(z) = 1/G(z), then |F(z)+α1 −z| ≤ σ 2 /Im(z). Moreover
there is an analytic function Gh−1i defined on {z | |z + i(4σ )−1 | < (4σ )−1 } such that
G(Gh−1i (z)) = z.

e = G(z +α1 ). Then G

Proof: Let G(z) e is the Cauchy transform of a centred probabil-
e = 1/G
ity measure. Let F(z) e then Fe : C+ → C+ . By Lemma 20 there is a probability
e = σ 2 (z − t)−1 d ν̃(t). Thus
R
measure ν̃ such that z − F(z)

σ2 σ2 σ2
Z Z
|z − F(z)|
e ≤ d ν̃(t) ≤ d ν̃(t) = .
|z − t| Im(z) Im(z)

Then |F(z) + α1 − z| ≤ σ 2 /Im(z).

3.5 Free additive convolution 83

Fig. 3.3 If a probability measure ν has finite variance σ 2

then the R-transform of ν is analytic on a disc in the lower
half-plane with centre −i(4σ )−1 and passing through 0. −i
4σ

If we apply Lemma 24, we get an inverse for Fe on {z | Im(z) > 2σ }. Note that
|z + i(4σ )−1 | < (4σ )−1 if and only if Im(1/z) > 2σ . Since G(z) = 1/F̃(z − α1 ) we
let Gh−1i (z) = Feh−1i (1/z) + α1 for |z + i(4σ )−1 | < (4σ )−1 . Then
1
G Gh−1i (z) = G Feh−1i (1/z) + α1 = G
e Feh−1i (1/z) = = z.
Fe Feh−1i (1/z)

In the next theorem we show that with the assumption of finite variance σ 2 we
can find an analytic function R which solves the equation G(R(z) + 1/z) = z on the
open disc with centre −i(4σ )−1 and radius (4σ )−1 . This is the R-transform of the
measure.
Theorem 26. Let ν be a probability measure with variance σ 2 . Then on the open
disc with centre −i(4σ )−1 and radius (4σ )−1 there is an analytic function R(z) such
that G(R(z) + 1/z) = z for |z + i(4σ )−1 | < (4σ )−1 where G the Cauchy transform
of ν.

Proof: Let Gh−1i be the inverse provided by Theorem 25 and R(z) = Gh−1i (z) − 1/z.
Then G(R(z) + 1/z) = G(Gh−1i (z)) = z.
One should note that the statements and proofs of Theorems 25 and 26, inter-
preted in the right way, remain also valid for the degenerated case σ 2 = 0, where ν
is a Delta mass. Then Gh−1i and R are defined on the whole lower half-plane C− ;
actually, for ν = δα1 we have R(z) = α1 .

3.5 The free additive convolution of probability measures with finite variance
One of the main ideas of free probability is that if we have two self-adjoint operators
a1 and a2 in a unital C∗ -algebra with state ϕ and if a1 and a2 are free with respect to
ϕ then we can find the moments of a1 +a2 from the moments of a1 and a2 according
to a universal rule. Since a1 , a2 and a1 + a2 are all bounded self-adjoint operators
there are probability measures ν1 , ν2 , and ν such that for i = 1, 2
Z Z
ϕ(aki ) = t k dνi (t) and ϕ((a1 + a2 )k ) = t k dν(t).

We call ν the free additive convolution of ν1 and ν2 and denote it ν1 ν2 . An

important observation is that because of the universal rule, ν1 ν2 only depends
84 3 Free Harmonic Analysis

on ν1 and ν2 and not on the operators a1 and a2 used to construct it. For bounded
operators we also know that the free additive convolution can be described by the
additivity of their R-transforms We shall show in this section how to construct ν1
ν2 without assuming that the measures have compact support and thus without using
Banach algebra techniques. As we have seen in the last section we can still define
an R-transform by analytic means (at least for the case of finite variance); the idea is
then of course to define ν = ν1 ν2 by prescribing the R-transform of ν as the sum
of the R-transforms of ν1 and ν2 . However, it is then not at all obvious that there
actually exists a probability measure with this prescribed R-transform. In order to
see that this is indeed the case, we have to reformulate our description in terms of
the R-transform in a subordination form, as already alluded to in (2.31) at the end
of the last chapter.
Recall that the R-transform in the compactly supported case satisfied the equation
G(R(z) + 1/z) = z for |z| sufficiently small. So letting F(z) = 1/G(z) this becomes
F(R(z) + 1/z) = z−1 . For |z| sufficiently small Gh−1i (z) is defined, and hence also
F h−1i (z−1 ); then for such z we have

R(z) = F h−1i (z−1 ) − z−1 . (3.7)

If ν1 and ν2 are compactly supported with Cauchy transforms G1 and G2 and

corresponding F1 , F2 , R1 and R2 , then we have the equation R(z) = R1 (z) + R2 (z)
for |z| small, this implies
h−1i −1 h−1i −1
F h−1i z−1 − z−1 = F1 − z−1 + F2 − z−1 .

z z
h−1i h−1i
If we let w1 = F1 (z−1 ), w2 = F2 (z−1 ), and w = F h−1i (z−1 ) this equation be-
comes
w − F(w) = w1 − F1 (w1 ) + w2 − F2 (w2 ).
Since z−1 = F(w) = F1 (w1 ) = F2 (w2 ) we can write this as

F1 (w1 ) = F2 (w2 ) and w = w1 + w2 − F1 (w1 ). (3.8)

We shall now show, given two probability measures ν1 and ν2 with finite variance,
we can construct a probability measure ν with finite variance such that R = R1 + R2 ,
the R-transforms of ν, ν1 , and ν2 respectively.
Given w ∈ C+ we shall show in Lemma 27 that there are w1 and w2 in C+ such
that (3.8) holds. Then we define F by F(w) = F1 (w1 ) and show that 1/F is the
Cauchy transform of a probability measure of finite variance. This measure will
then be the free additive convolution of ν1 and ν2 . Moreover the maps w 7→ w1 and
w 7→ w2 will be the subordination maps of equation (2.31).
We need the notion of the degree of an analytic function which we summarize in
the exercise below.
3.5 Free additive convolution 85

Exercise 17. Let X be a Riemann surface and f : X → C an analytic map. Let us

recall the definition of the multiplicity of f at z0 in X (see e.g. Miranda [133, Ch.
II, Def. 4.2]). There is m ≥ 0 and a chart (U, ϕ) of z0 such that ϕ(z0 ) = 0 and
f (ϕ h−1i (z)) = zm + f (z0 ) for z in ϕ(U). We set mult( f , z0 ) = m. For each z in C we
define the degree of f at z, denoted deg f (z), by

deg f (z) = ∑ mult( f , w).

w∈ f h−1i (z)

It is a standard theorem that if X is compact then deg f is constant (see e.g. Miranda
[133, Ch. II, Prop. 4.8]).
(i) Adapt the proof in the compact case to show that if X is not necessarily com-
pact but f is proper, i.e. if the inverse image of a compact set is compact, then deg f
is constant.
(ii) Suppose that F1 , F2 : C+ → C+ are analytic and Fi0 (z) 6= 0 for z ∈ C+ and
i = 1, 2. Let X = {(z1 , z2 ) ∈ C+ × C+ | F1 (z1 ) = F2 (z2 )}. Give X the structure of a
complex manifold so that (z1 , z2 ) 7→ F1 (z1 ) is analytic.
(iii) Suppose F1 , F2 , and X are as in (ii) and in addition there are σ1 and σ2 such
that for i = 1, 2 and z ∈ C+ we have |z − Fi (z)| ≤ σi2 /Im(z). Show that θ : X → C
given by θ (z1 , z2 ) = z1 + z2 − F1 (z1 ) is a proper map.

Lemma 27. Suppose F1 and F2 are analytic maps from C+ to C+ and that there is
r > 0 such that for z ∈ C+ and i = 1, 2 we have |Fi (z) − z| ≤ r2 /Im(z) . Then for
each z ∈ C+ there is a unique pair (z1 , z2 ) ∈ C+ × C+ such that
(i) F1 (z1 ) = F2 (z2 ), and
(ii) z1 + z2 − F1 (z1 ) = z.

Proof: Note that, by Lemma 21, our assumptions imply that, for i = 1, 2, 1/Fi is
the Cauchy transform of some probability measure and thus, by Lemma 19, we also
know that it satisfies Im(z) ≤ Im(Fi (z)).
We first assume that z ∈ C+4r . If (z1 , z2 ) satisfies (i) and (ii),

Im(z1 ) = Im(z) + Im(F2 (z2 ) − z2 ) ≥ Im(z).

Likewise Im(z2 ) ≥ Im(z). Hence, if we are to find a solution to (i) and (ii) we shall
find it in C+ + +
4r × C4r . By Lemma 24, F1 and F2 are invertible on C2r . Thus to find a
+
solution to (i) and (ii) it is sufficient to find u ∈ C2r such that
h−1i h−1i
F1 (u) + F2 (u) − u = z (3.9)
h−1i h−1i
and then let z1 = F1 (u) and z2 = F2 (u). Thus we must show that for every
z ∈ C+ +
4r there is a unique u ∈ C2r satisfying equation (3.9).
Let C be the circle with centre z and radius 2r. Then C ⊂ C+
2r and for u ∈ C we
have by Lemma 24
86 3 Free Harmonic Analysis

h−1i h−1i 4r2 4r2

|F1 (u) − u| + |F2 (u) − u| ≤ < = 2r.
Im(u) 2r

Hence

h−1i h−1i
(z − u) − z − u − (F1 (u) − u) − (F2 (u) − u)

h−1i h−1i
≤ F1 (u) − u + F2 (u) − u < 2r = |z − u|.

Thus by Rouché’s theorem there is a unique u ∈ int(C) such that

h−1i h−1i
z − u = F1 (u) − u + F2 (u) − u .

If there is u0 ∈ C+
2r with

h−1i h−1i
z − u0 = F1 (u0 ) − u0 + F2 (u0 ) − u0 ,

then, again by Lemma 24,

h−1i h−1i
|z − u0 | = F1 (u0 ) − u0 + F2 (u0 ) − u0 < 2r

and thus u0 ∈ int(C) and hence u0 = u. Thus there is a unique u ∈ C+ 2r satisfying

equation (3.9).
Let X = {(z1 , z2 ) | F1 (z1 ) = F2 (z2 )} be the Riemann surface in Exercise 17 and
θ (z1 , z2 ) = z1 + z2 − F1 (z1 ). We have just shown that for z ∈ C+
4r , degθ (z) = 1. But
by Exercise 17, degθ is constant on C+ so there is a unique solution to (i) and (ii)
for all z ∈ C+ .
Exercise 18. Let ν be a probability measure with variance σ 2 and mean m. Let
ν̃(E) = ν(E + m). Show that ν̃ is a probability measure with mean 0 and vari-
ance σ 2 . Let G and G̃ be the corresponding Cauchy transforms. Show that we
have G̃(z) = G(z + m). Let R and R̃ be the corresponding R-transforms. Show that
R(z) = R̃(z) + m for |z + i(4σ )−1 | < (4σ )−1 .

Theorem 28. Let ν1 and ν2 be two probability measures on R with finite variances
and R1 and R2 be the corresponding R-transforms. Then there is a unique prob-
ability measure with finite variance, denoted ν1 ν2 , and called the free additive
convolution of ν1 and ν2 , such that the R-transform of ν1 ν2 is R1 + R2 .
Moreover the first moment of ν1 ν2 is the sum of the first moments of ν1 and ν2
and the variance of ν1 ν2 is the sum of the variances of ν1 and ν2 .

Proof: By Exercise 18 we only have to prove the theorem in the case ν1 and ν2
are centred. Moreover there are probability measures ρ1 and ρ2 such that for z ∈
C+ and i = 1, 2 we have z − Fi (z) = σi2 (z − t)−1 dρi (t). By Lemma 27 for each
R
3.5 Free additive convolution 87

z in C+ there is a unique pair (z1 , z2 ) ∈ C+ × C+ such that F1 (z1 ) = F2 (z2 ) and

z1 + z2 − F1 (z1 ) = z. Define F(z) = F1 (z1 ). Let X = {(z1 , z2 ) | F1 (z1 ) = F2 (z2 )} and
θ : X → C+ be as in Exercise 17. In Lemma 27 we showed that θ is an analytic
bijection, since deg(θ ) = 1. Then F = F1 ◦ π ◦ θ −1 where π(z1 , z2 ) = z1 . Thus F is
analytic on C+ and we have

z − F(z) = z1 − F1 (z1 ) + z2 − F2 (z2 ). (3.10)

Since Im(F1 (z)) ≥ Im(z) we have Im(z) = Im(z2 ) + Im(z1 − F1 (z)) ≤ Im(z2 ).
Likewise Im(z) ≤ Im(z1 ). Thus

σ12 σ2 σ 2 + σ22
|z − F(z)| = |z1 − F1 (z1 ) + z2 − F2 (z2 )| ≤ + ≤ 1 .
Im(z1 ) Im(z2 ) Im(z)

Therefore, by Lemma 21, 1/F is the Cauchy transform of a centred probability

measure with variance σ 2 ≤ σ12 + σ22 . Thus there is, by Lemma 20, a probability
measure ρ such that
1
Z
z − F(z) = σ 2 dρ(t).
z−t
So by equation (3.10)
1 1 1
Z Z Z
σ2 dρ(t) = σ12 dρ1 (t) + σ22 dρ2 (t)
z−t z1 − t z2 − t
and hence
z z z
Z Z Z
σ2 dρ(t) = σ12 dρ1 (t) + σ22 dρ2 (t). (3.11)
z−t z1 − t z2 − t

For z = iy we have z/F(z) → ∞ by Exercise 12 (ii). Also

h−1i
2σ 2 2σ 2
F1 (F(z)) − F(z) ≤ ≤

Im(F(z)) Im(z)

by Lemma 24, parts (iii) and (iv). Thus

h−1i
z1 F (F(z)) − F(z) F(z)
= 1 + .
z z z
The first term goes to 0 and the second term goes to 1 as y → ∞, hence z1 /z → 1.
Likewise z2 /z → 1. Thus
iy iy
Z Z
lim dρ1 (t) = 1 and likewise lim dρ2 (t) = 1.
y→∞ z1 − t y→∞ z2 − t

If we now take limits as y → ∞ in equation (3.11), we get σ 2 = σ12 + σ22 .

88 3 Free Harmonic Analysis

Let Dr = {z | |z+ir| < r}, then D1/(4σ ) ⊂ D1/(4σ1 ) ∩D1/(4σ2 ) . Let z ∈ D1/(4σ ) , then
h−1i h−1i
z−1 is in the domains of F h−1i , F1 , and F2 . Now by Lemma 27, for F h−1i (z−1 )
find z1 and z2 in C+ so that F1 (z1 ) = F2 (z2 ) and F h−1i (z−1 ) = z1 + z2 − F1 (z1 ). By
the construction of F we have z−1 = F(F h−1i (z−1 )) = F1 (z1 ) = F2 (z2 ) and so z1 =
h−1i h−1i
F1 (z−1 ) and z2 = F2 (z−1 ). Thus the equation F h−1i (z−1 ) = z1 + z2 − F1 (z1 )
becomes
h−1i −1 h−1i −1
F h−1i z−1 − z−1 = F1 − z−1 + F2 − z−1 .

z z

Now recall the construction of the R-transform given by Theorem 26, reformulated
as in (3.7) in terms of F: R(z) = F h−1i (z−1 ) − z−1 . Hence R(z) = R1 (z) + R2 (z).

3.6 The R-transform and free additive convolution of arbitrary measures

In this section we consider probability measures on R that may not have any mo-
ments. We first show that for all α > 0 there is β > 0 so that the R-transform can be
defined on the wedge ∆α,β in the lower half-plane:

i 1
∆α,β = {z−1 | z ∈ Γα,β } = w ∈ C− | |Re(w)| < −α Im(w) and |w + |< .
2β 2β

This means that the R-transform is a germ of analytic functions in that for each
α > 0 there is β > 0 and an analytic function R on ∆α,β such that whenever we are
given another α 0 > 0 for which there exists a β 0 > 0 and a second analytic function
R0 on ∆α 0 ,β 0 , the two functions agree on ∆α,β ∩ ∆α 0 ,β 0 . (See Fig. 3.4.)

Fig. 3.4 Two wedges in C− : ∆α1 ,β1 and

∆α2 ,β2 with α1 > α2 and β1 > β2 . We have
∆α1 ,β1
R(1) on ∆α1 ,β1 and R(2) on ∆α2 ,β2 such that
R(1) (z) = R(2) (z) for z ∈ ∆α1 ,β1 ∩ ∆α2 ,β2 . We
shall denote the germ by R.
∆α2 ,β2

Definition 29. Let ν be a probability measure on R, let G be the Cauchy transform

of ν. We define the R-transform of ν as the germ of analytic functions on the do-
mains ∆α,β satisfying equation (3.1). This means that for all α > 0 there is β > 0
such that for all z ∈ ∆α,β we have G(R(z) + 1/z) = z and for all z ∈ Γα,β we have
R(G(z)) + 1/G(z) = z.

Remark 30. When ν is compactly supported we can find a disc centred at 0 on which
there is an analytic function satisfying equation (3.1). This was shown in Theorem
17. When ν has finite variance we showed that there a disc in C− tangent to 0 and
3.6 The R-transform and free additive convolution of arbitrary measures 89

with centre on the imaginary axis (see Fig. 3.3) on which there is a analytic function
satisfying equation (3.1). This was shown in Theorem 26. In the general case we
shall define R(z) by the equation R(z) = F h−1i (z−1 ) − z−1 . The next two lemmas
show that we can find a domain where this definition works.

Lemma 31. Let F be the reciprocal of the Cauchy transform of a probability mea-
sure on R. Suppose 0 < α1 < α2 . Then there is β0 > 0 such that for all β2 ≥ β0 and
β1 ≥ β2 (1 + α2 − α1 ),
(i) we have Γα1 ,β1 ⊆ F(Γα2 ,β2 )
(ii) and F h−1i exists on Γα1 ,β1 , i.e. for each w ∈ Γα1 ,β1 there is a unique z ∈ Γα2 ,β2
such that F(z) = w.

Proof: Let θ = tan−1 α1−1 − tan−1 α2−1 . Choose ε > 0 so that

α2 − α1
ε < sin θ = q q .
1 + α12 1 + α22

Choose β0 > 0 such that |F(z) − z| < ε|z| for z ∈ Γα2 ,β0 (which is possible by Exer-
cise 12). Let β2 ≥ β0 and β1 ≥ β2 (1 + α2 − α1 ).
Let us first show that for w ∈ Γα1 ,β1 and for z ∈ ∂Γα2 ,β2 we have ε|z| < |z − w|.
If z = α2 y + iy ∈ ∂Γα2 then |z − w|/|z| ≥ sin θ > ε. If z = x + iβ2 ∈ ∂Γα2 ,β2 then
q q q
|z − w| > β1 − β2 ≥ β2 (α2 − α1 ) > εβ2 1 + α12 1 + α22 ≥ ε|z| 1 + α12 > ε|z|.

Thus for w ∈ Γα1 ,β1 and z ∈ ∂Γα2 ,β2 we have ε|z| < |z − w|.
Now fix w ∈ Γα1 ,β1 and let r > |w|/(1 − ε). Thus for z ∈ {z̃ | |z̃| = r} ∩ Γα2 ,β2 we
have |z − w| ≥ r − |w| > εr = ε|z|. So let C be the curve

C := ∂Γα2 ,β2 ∩ {z̃| | |z̃| ≤ r} ∪ {z̃ | |z̃| = r} ∩ Γα2 ,β2 .

For z ∈ C we have that ε|z| < |z − w|. Thus for z ∈ C we have

|(F(z) − w) − (z − w)| < ε|z| < |z − w|.

So by Rouché’s theorem there is exactly one z in the interior of C such that F(z) = w.
Since we can make r as large as we want, there is a unique z ∈ Γα2 ,β2 such that
F(z) = w. Hence F has an inverse on Γα1 ,β1 .

Lemma 32. Let F be the reciprocal of the Cauchy transform of a probability mea-
sure on R. Suppose 0 < α1 < α2 . Then there is β0 > 0 such that

F(Γα1 ,β1 ) ⊆ Γα2 ,β1 for all β1 ≥ β0 .

Proof: Choose 1/2 > ε > 0 so that

90 3 Free Harmonic Analysis
√
α1 + ε/ 1 − ε 2
α2 > √ > α1 .
1 − α1 ε/ 1 − ε 2

Then choose β0 > 0 such that |F(z) − z| < ε|z| for z ∈ Γα1 ,β0 .
Suppose β1 ≥ β0 and let z ∈ Γα1 ,β1 with Re(z) ≥ 0, (the case Re(z) < 0 is similar).
Write z = |z|eiϕ . Then ϕ > tan−1 (α1−1 ). Write F(z) = |F(z)|eiψ . We have |z−1 F(z)−
1| < ε. Thus | sin(ψ − ϕ)| < ε, so

ψ > ϕ − sin−1 (ε) > tan−1 (α1−1 ) − sin−1 (ε).

If ψ ≤ π/2, then
√
−1 α1−1 − ε/ 1 − ε 2
(α1−1 ) − sin−1 (ε) > α2−1 .

tan(ψ) > tan tan = √
1 + α1−1 ε/ 1 − ε 2

Thus F(z) ∈ Γα2 .

Suppose ψ ≥ π/2. Then we must show that π − ψ > tan−1 (α2−1 ) or equivalently
that tan(π − ψ) > α2−1 . Since |ψ − ϕ| < sin−1 (ε) and ϕ ≤ π/2 we must then have
π − ψ > π/2 − sin−1 (ε). Thus
p
tan(π − ψ) > tan(π/2 − sin−1 (ε)) = 1 − ε 2 /ε.

On the other hand

p p
α2 > α1 + ε/ 1 − ε 2 > ε/ 1 − ε 2 ,

so tan(π − ψ) > α2−1 as required. Thus in both cases F(z) ∈ Γα2 .

Since we also have Im(F(z)) ≥ Im(z) > β1 we obtain F(Γα1 ,β1 ) ⊆ Γα2 ,β1 .

Theorem 33. Let ν be a probability measure on R with Cauchy transform G and set
F = 1/G. For every α > 0 there is β > 0 so that R(z) = F h−1i (z−1 ) − z−1 is defined
for z ∈ ∆α,β and such that we have
(i) G(R(z) + 1/z) = z for z ∈ ∆α,β and
(ii) R(G(z)) + 1/G(z) = z for z ∈ Γα,β .

Proof: Let F(z) = 1/G(z). Let α > 0 be given and by Lemma 31 choose β0 > 0
so that F h−1i is defined on Γ2α,β0 . For z ∈ ∆2α,β0 , R(z) is thus defined and we have
G(R(z) + 1/z) = G(F h−1i (z−1 )) = z.
Now by Lemma 32 we may choose β > β0 such that F(Γα,β ) ⊆ Γ2α,β . For z ∈
−1
Γα,β we have G(z) = 1/F(z) ∈ Γ2α,β = ∆2α,β ⊆ ∆2α,β0 and so

R(G(z)) + 1/G(z) = F h−1i (F(z)) − F(z) + F(z) = z.

Since ∆α,β ⊂ ∆2α,β0 , we also have G(R(z) + 1/z) = z for z ∈ ∆α,β .

3.6 The R-transform and free additive convolution of arbitrary measures 91

Exercise 19. Let w ∈ C be such that Im(w) ≤ 0. Then we saw in Exercise 9 that
G(z) = (z − w)−1 is the Cauchy transform of a probability measure on R. Show that
the R-transform of this measure is R(z) = w. In this case R is defined on all of C
even though the corresponding measure has no moments (when Im(w) < 0).

Remark 34. We shall now show that given two probability measures ν1 and ν2 with
R-transforms R1 and R2 , respectively, we can find a third probability measure ν with
Cauchy transform G and R-transform R such that R = R1 + R2 . This means that for
all α > 0 there is β > 0 such that all three of R, R1 , and R2 are defined on ∆α,β and
for z ∈ ∆α,β we have R(z) = R1 (z) + R2 (z). We shall denote ν by ν1 ν2 and call it
the free additive convolution of ν1 and ν2 . Clearly, this extends then our definition
for probability measures with finite variance from the last section.
When ν1 is a Dirac mass at a ∈ R we can dispose of this case directly. An easy
calculation shows that R1 (z) = a, c.f. Exercise 1. So R(z) = a + R2 (z) and thus
G(z) = G2 (z − a). Thus ν(E) = ν2 (E − a), c.f. Exercise 18. So for the rest of this
section we shall assume that neither ν1 nor ν2 is a Dirac mass.
There is another case that we can easily deal with. Suppose Im(w) < 0. Let ν1 =
δw be the probability measure with Cauchy transform G1 (z) = (z − w)−1 . This is the
measure we discussed in Notation 4; see also Exercises 9 and 19. Then R1 (z) = w.
Let ν2 be any probability measure on R. We let G2 be the Cauchy transform of ν2
and R2 be its R-transform. So if ν1 ν2 exists its R-transform should be R(z) =
w + R2 (z). Let us now go back to the subordination formula (2.31) in Chapter 2. It
says that if ν1 ν2 exists its Cauchy transform, G, should satisfy G(z) = G2 (ω2 (z))
where ω2 (z) = z − R1 (G(z)) = z − w. Now ω2 maps C+ to C+ and letting G =
G2 ◦ ω2 we have
lim iy G(iy) = 1.
y→∞

So by Theorem 10 there is a measure, which we shall denote ν1 ν2 , of which

G is the Cauchy transform, and thus the R-transform of this measure satisfies, by
construction, the equation R = R1 + R2 . Note that in this special case we have δw
ν2 = δw ∗ ν2 where ∗ means the classical convolution, because they have the same
Cauchy transform G(z) = G2 (z − w); see Notation 4, Theorem 6, and Exercise 9.
We can also solve for ω1 to conclude that ω1 (z) = F2 (z − w) + w. For later reference
we shall summarize this calculation in the theorem below.

Theorem 35. Let w = a + ib ∈ C− and δw be the probability measure on R with

density
1 −b
dδw (t) = dt
π b2 + (t − a)2
when b < 0 and the Dirac mass at a when b = 0. Then for any probability measure
ν we have δw ν = δw ∗ ν.

In the remainder of this Chapter we shall define ν1 ν2 in full generality; for this
we will show that we can always find ω1 and ω2 satisfying (2.32).
92 3 Free Harmonic Analysis

Notation 36 Let ν1 , ν2 be probability measures on R with Cauchy transforms G1

and G2 respectively. Let Fi (z) = 1/Gi (z) and Hi (z) = Fi (z) − z. The functions
F1 , F2 , H1 , and H2 are analytic functions that map the upper half-plane C+ to it-
self.

Corollary 37. Let F1 and F2 be as in Notation 36. Suppose 0 < α2 < α1 . Then there
are β2 ≥ β0 > 0 such that
h−1i h−1i
(i) F1 is defined on Γα1 ,β1 for any β1 ≥ β0 with F1 (Γα1 ,β1 ) ⊆ Γα1 +1,β1 /2 ;
(ii) F2 (Γα2 ,β2 ) ⊆ Γα1 ,β0 .

Proof: Let α = α1 + 1. By Lemma 31 there is β0 /2 > 0 such that for all β ≥ β0 /2

h−1i
and β1 = β (1 + α − α1 ) = 2β ≥ β0 we have Γα1 ,β1 ⊆ F1 (Γα,β ) and F1 exists on
h−1i
Γα1 ,β1 ; thus F1 (Γα1 ,β1 ) ⊆ Γα1 +1,β1 /2 . By Lemma 32 choose now β2 > 0 (and also
β2 ≥ β0 ) so that F2 (Γα2 ,β2 ) ⊆ Γα1 ,β2 ⊆ Γα1 ,β0 .

Definition 38. For any z, w ∈ C+ let g(z, w) = z + H1 (z + H2 (w)). Then g : C+ ×

C+ → C+ is analytic. Let gz (w) = g(z, w).

Remark 39. Choose now some α1 > α2 > 0, and β2 ≥ β0 ≥ 0 according to Corollary
37. In the following we will also need to control Im(F2 (w) − w). Note that, by the
fact that F2 (w)/w → 1, for w → ∞ in Γα2 ,β2 , we have, for any ε < 1, |F2 (w) − w| <
ε|w| for sufficiently large w ∈ Γα2 ,β2 . But then
q
0 ≤ Im(F2 (w) − w) ≤ |F2 (w) − w| < ε|w| < ε 1 + α22 · Im(w);
q
the latter inequality is from Notation 15 for w ∈ Γα2 ,β2 . By choosing 1/ε = 2 1 + α22
we find thus a β > 0 (which we can take β ≥ β2 ) such that we have
1
Im(F2 (w) − w) < Im(w) for all w ∈ Γα2 ,β ⊆ Γα2 ,β2 . (3.12)
2
h−1i
Consider now for w ∈ Γα2 ,β the point z = w + F1 (F2 (w)) − F2 (w). Since F2 (w) ∈
Γα1 ,β0 , this is well defined. Furthermore, we have Im(F2 (w)) ≥ Im(w) > β ≥ β0 ,
and thus actually F2 (w) ∈ Γα1 ,Im(w) , which then yields

h−1i h−1i Im(w)

F1 (w) ∈ Γα1 +1,Im(w)/2 i.e. Im(F1 (w)) > .
2
This together with (3.12) shows that we have z ∈ C+ , whenever we choose w ∈ Γα2 ,β .

Lemma 40. With α2 and β as above let w ∈ Γα2 ,β . Then

h−1i
z = w + F1 (F2 (w)) − F2 (w) ⇐⇒ g(z, w) = w.
3.6 The R-transform and free additive convolution of arbitrary measures 93

h−1i
Proof: Suppose z = w + F1 (F2 (w)) − F2 (w). By Remark 39 we have z ∈ C+ .

g(z, w) = z + H1 (z + H2 (w))
= z + H1 (z + F2 (w) − w)
h−1i
= z + H1 (F1 (F2 (w)))
h−1i h−1i
= z + F1 (F1 (F2 (w))) − F1 (F2 (w))
h−1i
= z + F2 (w) − F1 (F2 (w))
= w.

Suppose g(z, w) = w. Then

w = g(z, w) = w + F1 (z + F2 (w) − w) − F2 (w)

so
F2 (w) = F1 (z + F2 (w) − w)
thus
h−1i
F1 (F2 (w)) = z + F2 (w) − w
as required.

Remark 41. By Lemma 40 the open set

n o
h−1i
Ω = w + F1 (F2 (w)) − F2 (w) | w ∈ Γα2 ,β ⊆ C+

is such that for z ∈ Ω , gz has a fixed point in C+ (even in Γα2 ,β ). Our goal is to
show that for every z ∈ C+ there is w such that gz (w) = w and that w is an analytic
function of z.

Exercise 20. In the next proof we will use the following simple part of the Denjoy-
Wolff Theorem. Suppose f : D → D is a non-constant holomorphic function on the
unit disc D := {z ∈ C | |z| < 1} and it is not an automorphism of D (i.e., not of
the form λ (z − α)/(1 − ᾱz) for some α ∈ D and λ ∈ C with |λ | = 1). If there is a
z0 ∈ D with f (z0 ) = z0 , then for all z ∈ D, f ◦n (z) → z0 . In particular, the fixed point
is unique.
Prove this by an application of the Schwarz Lemma.

Lemma 42. Let g(z, w) be as in Definition 38. Then there is a non-constant analytic
function f : C+ → C+ such that for all z ∈ C+ , g(z, f (z)) = f (z). The analytic
function f is uniquely determined by the fixed point equation.

Proof: As before we set

n o
h−1i
Ω = w + F1 (F2 (w)) − F2 (w) | w ∈ Γα2 ,β ⊆ C+
94 3 Free Harmonic Analysis

and let, for z ∈ C+ , gz : C+ → C+ be gz (u) = g(z, u).

The idea of the proof is to define the fixed point of the function gz by itera-
tions. For z ∈ Ω we already know that we have a fixed point, hence the version of
Denjoy-Wolff mentioned above gives the convergence of the iterates in this case.
The extension of the statement to all z ∈ C+ is then provided by an application
of Vitali’s Theorem. A minor inconvenience comes from the fact that we have to
transcribe our situation from the upper half-plane to the disc, in order to apply the
theorems mentioned above. This is achieved by composing our functions with
1+z z−i
ϕ(z) = i and ψ(z) = ;
1−z z+i
ϕ maps D onto C+ and ψ maps C+ onto D; they are inverses of each other.
Let us first consider z ∈ Ω . Let g̃z : D → D be given by g̃z = ψ ◦ gz ◦ ϕ. Since
h−1i
z ∈ Ω there exists an w ∈ Γα2 ,β with z = w + F1 (F2 (w)) − F2 (w). Let w̃ = ψ(w).
Then
g̃z (w̃) = ψ(gz (ϕ(w̃))) = ψ(gz (w)) = ψ(w) = w̃.
So the map g̃z has a fixed point in D. In order to apply the Denjoy-Wolff theorem,
we have to exclude that g̃z is an automorphism. But since we have for all w ∈ C+

Im(gz (w)) = Im(z) + Im(H1 (z + H2 (w))) ≥ Im(z),

it is clear that gz cannot be an automorphism of the upper half-plane and hence g̃z
cannot be an automorphism of the disc. Hence, by Denjoy-Wolff, g̃◦n z (ũ) → w̃ for
all ũ ∈ D. Converting back to C+ we see that g◦n z (u) → w for all u ∈ C +.
+
Now we define our iterates on all of C , where we choose for concreteness the
initial point as u0 = i. We define a sequence { fn }n of analytic functions from C+ to
C+ by fn (z) = g◦n +
z (i). We claim that for all z ∈ C , limn f n (z) exists. We have shown
h−1i
that already for z ∈ Ω . There z = w + F1 (F2 (w)) − F2 (w) with w ∈ Γα2 ,β , and
g◦n
z (i) → w. Thus for all z ∈ Ω the sequence { f n (z)}n converges to the corresponding
w. Now let Ω̃ = ψ(Ω ) and fñ = ψ ◦ fn ◦ ϕ. then fñ : D → D and for z̃ ∈ Ω̃ , limn fñ (z̃)
exists. Hence, by Vitali’s Theorem, limn fñ (z̃) exists for all z̃ ∈ D. Note that by the
maximum modulus principle this limit cannot take on values on the boundary of D
unless it is constant. Since it is clearly not constant on Ω̃ , the limit takes on only
values in D. Hence limn fn (z) exists for all z ∈ C+ as an element in C+ . So we define
f : C+ → C+ by f (z) = limn fn (z); by Vitali’s Theorem the convergence is uniform
on compact subsets of C+ and f is analytic. Recall that fn (z) = g◦n z (i), so

◦(n+1)
gz ( f (z)) = lim gz ( fn (z)) = lim gz (i) = f (z),
n n

so we have g(z, f (z)) = gz ( f (z)) = f (z).

By Denjoy-Wolff, the function f is uniquely determined by the fixed point equa-
tion on the open set Ω ; by analytic continuation it is then unique everywhere.
3.6 The R-transform and free additive convolution of arbitrary measures 95

Theorem 43. There are analytic functions ω1 , ω2 : C+ → C+ such that for all z ∈
C+
(i) F1 (ω1 (z)) = F2 (ω2 (z)), and
(ii) ω1 (z) + ω2 (z) = z + F1 (ω1 (z)).
The analytic functions ω1 and ω2 are uniquely determined by these two equations.

Proof: Let z ∈ C+ and gz (w) = g(z, w). By Lemma 42, gz has a unique fixed point
f (z). So define the function ω2 by ω2 (z) = f (z) for z ∈ C+ , and the function ω1 by
ω1 (z) = z + F2 (ω2 (z)) − ω2 (z). Then ω1 and ω2 are analytic on C+ and

ω1 (z) + ω2 (z) = z + F2 (ω2 (z)).

h−1i
By Lemma 40, we have that for z ∈ Ω , z = ω2 (z) + F1 (F2 (ω2 (z))) − F2 (ω2 (z))
and by construction z = ω2 (z) + ω1 (z) − F2 (ω2 (z)). Hence for z ∈ Ω , ω1 (z) =
h−1i
F1 (F2 (ω2 (z))). Thus for all z ∈ Ω , and hence by analytic continuation for all
z ∈ C+ , we have F1 (ω1 (z)) = F2 (ω2 (z)) as required.
For the uniqueness one has to observe that the equations (i) and (ii) yield

ω1 (z) = z + F2 (ω2 (z)) − ω2 (z) = z + H2 (ω2 (z))

and
ω2 (z) = z + F1 (ω1 (z)) − ω1 (z) = z + H1 (ω1 (z)),
and thus
ω2 (z) = z + H1 (z + H2 (ω2 (z))) = g(z, ω2 (z)).
By Lemma 42, we know that an analytic solution of this fixed point equation is
unique. Exchanging H1 and H2 gives in the same way the uniqueness of ω1 .
To define the free additive convolution of ν1 and ν2 we shall let F(z) = F1 (ω1 (z))
= F2 (ω2 (z)) and then show that 1/F is the Cauchy transform of a probability mea-
sure, which will be ν1 ν2 . The main difficulty is to show that F(z)/z → 1 as
^z → ∞. For this we need the following lemma.
ω1 (iy) ω2 (iy)
Lemma 44. lim = lim = 1.
y→∞ iy y→∞ iy

Proof: Let us begin by showing that limy→∞ ω2 (iy) = ∞ (in the sense of Definition
16).
We must show that given α, β > 0 there is y0 > 0 such that ω2 (iy) ⊆ Γα,β when-
ever y > y0 . Note that by the previous Theorem we have ω2 (z) = z+H1 (ω1 (z)) ∈ z+
C+ . So we have that Im(ω2 (z)) > Im(z). Since ω2 maps C+ to C+ we have by the
Nevanlinna representation of ω2 (see Exercise 13) that b2 = limy→∞ ω2 (iy)/(iy) ≥ 0.
This means that Im(ω2 (iy))/y → b2 and our inequality Im(ω2 (z)) > Im(z) implies
that b2 ≥ 1. We also have that Re(ω2 (iy))/y → 0. So there is y0 > 0 so that for
y > y0 ≥ β we have
96 3 Free Harmonic Analysis
2 2 2
Re(ω2 (iy)) Im(ω2 (iy)) 2 Im(ω2 (iy))
+ < (α + 1) .
y y y

For such a y we have

|ω2 (iy)|2 Im(ω2 (iy)) 2

2
< (1 + α ) .
y2 y

Thus ω2 (iy) ∈ Γα (see Notation 15). Since Im(ω2 (iy)) > y > y0 , we have that
ω2 (iy) ∈ Γα,β . Thus limy→∞ ω2 (iy) = ∞.
Recall that ω1 (z) = z+H2 (ω2 (z)) ∈ z+C+ , so by repeating our arguments above
we have that b1 = limy→∞ ω1 (iy)/(iy) ≥ 1 and limy→∞ ω1 (iy) = ∞.
F1 (ω1 (iy))
Since lim^z→∞ F1 (z)/z = 1 (see Exercise 12) we now have lim = 1.
y→∞ ω2 (iy)
Moreover the equation ω1 (z) + ω2 (z) = z + F1 (ω1 (z)) means that

ω1 (iy) + ω2 (iy)
b1 + b2 = lim
y→∞ iy
iy + F1 (ω1 (iy))
= lim
y→∞ iy
F1 (ω1 (iy)) ω1 (iy)
= 1 + lim
y→∞ ω1 (iy) iy
= 1 + b1 .

Thus b2 = 1. By the same argument we have b1 = 1.

Theorem 45. Let F = F2 ◦ ω2 . Then F is the reciprocal of the Cauchy transform of
a probability measure.
Proof: We have that F maps C+ to C+ so by Theorem 10 we must show that
limy→∞ F(iy)/(iy) = 1. By Lemma 44

F(iy) F2 (ω2 (iy)) F2 (ω2 (iy)) ω2 (iy)

lim = lim = lim = 1.
y→∞ iy y→∞ iy y→∞ ω2 (iy) iy

Theorem 46. Let ν1 and ν2 be two probability measures on R then there is ν, a

probability measure on R with R-transform R, such that R = R1 + R2 .
Proof: Let F = F2 ◦ ω2 = F1 ◦ ω1 be as in Theorem 45 and ν its corresponding
probability measure. By Theorem 43 (ii) we have

ω1 (F h−1i (z−1 )) + ω2 (F h−1i (z−1 )) − F1 (ω1 (F h−1i (z−1 ))) = F h−1i (z−1 ).
h−1i h−1i
Also ω1 (F h−1i (z−1 )) = F1 (z−1 ) and ω2 (F h−1i (z−1 )) = F2 (z−1 ) so our equa-
tion becomes
3.6 The R-transform and free additive convolution of arbitrary measures 97

h−1i h−1i
F1 (z−1 ) + F2 (z−1 ) − z−1 = F h−1i (z−1 ).
Hence R(z) = R1 (z) + R2 (z).

Definition 47. Let ν1 ν2 be the probability measure whose Cauchy transform is

the reciprocal of F; i.e., for which we have R = R1 + R2 . We call ν1 ν2 the free
additive convolution of ν1 and ν2 .

Remark 48. In the case of bounded operators x and y which are free we saw in Sec-
tion 3.5 that the distribution of their sum gives the free additive convolution of their
distributions. Later we shall see how using the theory of unbounded operators affil-
iated with a von Neumann algebra we can have the same conclusion for probability
measures with non-compact support (see Remark 8.16).

Remark 49. 1) There is also a similar analytic theory of free multiplicative convolu-
tion for the product of free variables; see, for example, [21, 31, 54].
2) There exists a huge body of results around infinitely divisible and stable laws
in the free sense; see, for example, [8, 9, 11, 22, 29, 31, 32, 30, 53, 70, 97, 199].
Chapter 4
Asymptotic Freeness for Gaussian, Wigner, and Unitary Random
Matrices

After having developed the basic theory of freeness we are now ready to have a
more systematic look into the relation between freeness and random matrices. In
chapter 1 we showed the asymptotic freeness between independent Gaussian ran-
dom matrices. This is only the tip of an iceberg. There are many more classes of
random matrices which show asymptotic freeness. In particular, we will present
such results for Wigner matrices, Haar unitary random matrices and treat also the
relation between such ensembles and deterministic matrices. Furthermore, we will
strengthen the considered form of freeness from the averaged version (which we
considered in chapter 1) to an almost sure one.
We should point out that our presentation of the notion of freeness is quite orthog-
onal to its historical development. Voiculescu introduced this concept in an operator
algebraic context (we will say more about this in chapter 6); at the beginning of
free probability, when Voiculescu discovered the R-transform and proved the free
central limit theorem around 1983, there was no relation at all with random matri-
ces. This connection was only revealed later in 1991 by Voiculescu [180]; he was
motivated by the fact that the limit distribution which he found in the free central
limit theorem had appeared before in Wigner’s semi-circle law in the random ma-
trix context. The observation that operator algebras and random matrices are deeply
related had a tremendous impact and was the beginning of new era in the subject of
free probability.

4.1 Asymptotic freeness: averaged convergence versus almost sure

convergence
The most important random matrix is the GUE random matrix ensemble AN . Let us
recall what this means. Each entry of AN is a complex-valued random variable ai j ,
and a ji = ai j for i 6= j, while aii = aii thus implying that aii is in fact a real-valued
random variable. AN is said to be GUE-distributed if each ai j with i < j is of the
form √
ai j = xi j + −1yi j , (4.1)

99
100 4 Asymptotic Freeness

where xi j , yi j , 1 ≤ i < j ≤ N are independent real Gaussian random variables, each

with mean 0 and variance 1/(2N). This also determines the entries below the di-
agonal. Moreover, the GUE requirement means that the diagonal entries aii are real-
valued independent Gaussian random variables which are also independent from the
xi j ’s and the yi j ’s and have mean 0 and variance 1/N.
Let tr be the normalized trace on the full N × N matrix algebra over C. Then
tr(AN ) is a random variable. In Chapter 1 we proved Wigner’s semi-circle law;
namely that (
1 2n

m n+1 n , m = 2n
lim E[tr(AN )] = .
N→∞ 0, m odd
In the language we have developed in Chapter 2 (see Definition 2.1), this means that
distr
AN −→ s, as N → ∞, where the convergence is in distribution with respect to E ◦ tr
and s is a semi-circular element in some non-commutative probability space.
We also saw Voiculescu’s remarkable generalization of Wigner’s semi-circle law:
(1) (p)
if AN , . . . , AN are p independent N × N GUE random matrices (meaning that if we
collect the real and imaginary parts of the above diagonal entries together with the
diagonal entries we get a family of independent real Gaussians with mean 0 and
variances as explained above), then
(1) (p) distr
AN , . . . , AN −→ s1 , . . . , s p as N → ∞, (4.2)

where s1 , . . . , s p is a family of freely independent semi-circular elements. This

amounts to proving that for all m ∈ N and all 1 ≤ i1 , . . . , im ≤ p we have
(i ) (i )
lim E[tr(AN1 · · · ANm ] = ϕ(si1 · · · sim ).
N→∞

Recall that since s1 , . . . , s p are free their mixed cumulants will vanish, and only the
second cumulants of the form κ2 (si , si ) will be non-zero. With the chosen normal-
ization of the variance for our random matrices those second cumulants will be 1.
Thus
ϕ(si1 · · · sim ) = ∑ κπ [si1 , . . . , sim ]
π∈NC2 (m)

is given by the number of non-crossing pairings of the si1 , . . . , sim which connect
only si ’s with the same index. Hence (4.2) follows from Lemma 1.9.
The statements above about the limit distribution of Gaussian random matrices
are in distribution with respect to the averaged trace E[tr(·)]. However, they also hold
in the stronger sense of almost sure convergence. Before formalizing this let us first
look at some numerical simulations in order to get an idea of the difference between
convergence of averaged eigenvalue distribution and almost sure convergence of
eigenvalue distribution.
Consider first our usual setting with respect to E[tr(·)]. To simulate this we have
to average for fixed N the eigenvalue distributions of the sampled N × N matrices.
4.1 Averaged convergence versus almost sure convergence 101

Fig. 4.1 Averaged distribu-

0.3
tion for 10,000 realizations
with N = 5. The dashed line
0.2
is the semi-circle law and the
solid line is the limit as the
0.1
number of realizations tends
to infinity.
-3 -2 -1 0 1 2 3

Fig. 4.2 Averaged distribu-

0.3
tion for 10,000 realizations
with N = 20. The dashed line
0.2
is the semi-circle law.
0.1

-3 -2 -1 0 1 2 3

For the Gaussian ensemble there are infinitely many of those, so we approximate
this averaging by choosing a large number of realizations of our random matrices. In
the following pictures we created 10,000 N × N matrices (by generating the entries
independently and according to a normal distribution), calculated for each of those
10,000 matrices the N eigenvalues and plotted the histogram for the 10, 000 × N
eigenvalues. We show those histograms for N = 5 (see Fig. 4.1) and N = 20 (see
Fig. 4.2). Wigner’s theorem in the averaged version tells us that as N → ∞ these
averaged histograms have to converge to the semi-circle. The numerical simulations
show this very clearly. Note that already for quite small N, for example N = 20, we
have a very good agreement with the semi-circular distribution.
Let us now consider the stronger almost sure version of this. In that case we
produce for each N only one N × N matrix (generated according to the probability
measure for our ensemble) and plot the corresponding histogram of the N eigen-
values. The almost sure version of Wigner’s theorem says that generically, i.e., for
almost all choices of such sequences of N × N matrices, the corresponding sequence
of histograms converges to the semi-circle. This statement is supported by the fol-
lowing pictures of four such samples, for N = 10, N = 100, N = 1000, N = 4000
(see Figures 4.3 and 4.4). Clearly, for small N the histogram depends on the specific
realization of our random matrix, but the larger N gets, the smaller the variations
between different realizations get.
Also for the asymptotic freeness of independent Gaussian random matrices we
have an almost sure version. Consider two independent Gaussian random matrices
distr
AN , BN . We have seen that AN , BN −→ s1 , s2 , where s1 , s2 are free semi-circular
elements.
This means, for example, that

lim E[tr(AN AN BN BN AN BN BN AN )] = ϕ(s1 s1 s2 s2 s1 s2 s2 s1 ).

N→∞
102 4 Asymptotic Freeness

0.3 0.3

0.2 0.2

0.1 0.1

0
-2 -1 0 1 2 -2 -1 0 1 2

Fig. 4.3 One realization of a N = 10 and a N = 100 Gaussian random matrix.

0.3
0.3

0.2
0.2

0.1 0.1

0
-2 -1 0 1 2 -2 -1 0 1 2

Fig. 4.4 One realization of a N = 1000 and a N = 4000 Gaussian random matrix.

We have ϕ(s1 s1 s2 s2 s1 s2 s2 s1 ) = 2, since there are two non-crossing pairings

which respect the indices:
s1 s1 s2 s2 s1 s2 s2 s1 and s1 s1 s2 s2 s1 s2 s2 s1

The numerical simulation in the first part of the following figure shows the aver-
aged (over 1000 realizations) value of tr(AN AN BN BN AN BN BN AN ), plotted against
N, for N between 2 and 30. Again, one sees (Fig. 4.5 left) a very good agreement
with the asymptotic value of 2 for quite small N.

3.0 4.0
2.8
3.5
2.6
3.0
2.4
2.2 2.5

2.0 2.0
1.8 1.5
1.6
1 5 10 15 20 25 30 10 50 100 150 200

Fig. 4.5 On the left we have the averaged trace (averaged over 1000 realizations) of the normalized
trace of XN = AN AN BN BN AN BN BN AN for N from 1 to 30. On the right the normalized trace of XN
for N from 1 to 200 (one realization for each N).
4.1 Averaged convergence versus almost sure convergence 103

For the almost sure version of this we realize for each N just one matrix
AN and (independently) one matrix BN and calculate for this pair the number
tr(AN AN BN BN AN BN BN AN ). We expect that generically, as N → ∞, this should also
converge to 2. The second part of the above figure shows a simulation for this (Fig.
4.5 right).
Let us now formalize our two notions of asymptotic freeness. For notational con-
venience, we restrict here to two sequences of matrices. The extension to more ran-
dom matrices or to sets of random matrices should be clear.
Definition 1. Consider two sequences (AN )N∈N and (BN )N∈N of random N × N ma-
trices such that for each N ∈ N, AN and BN are defined on the same probability space
(ΩN , PN ). Denote by EN the expectation with respect to PN .
1) We say AN and BN are asymptotically free if AN , BN ∈ (AN , EN [tr(·)]) (where
AN is the algebra generated by the random matrices AN and BN ) converge in dis-
tribution to some elements a, b (living in some non-commutative probability space
(A, ϕ)) such that a, b are free.
2) Consider now the product space Ω = ∏N∈N ΩN and let P = ∏N∈N PN be
the product measure of the PN on Ω . Then we say that AN and BN are almost
surely asymptotically free, if there exists a, b (in some non-commutative probabil-
ity space (A, ϕ)) which are free and such that we have for almost all ω ∈ Ω that
AN (ω), BN (ω) ∈ (MN (C), tr(·)) converge in distribution to a, b.
Remark 2. What does this mean concretely? Assume we are given our two se-
quences AN and BN and we want to investigate their convergence to some a and
b, where a and b are free. Then, for any choice of m ∈ N and p1 , q1 , . . . , pm , qm ≥ 0,
we have to consider the trace of the corresponding monomial,

fN := tr(AqN1 BNp1 · · · AqNm BNpm ),

and show that this converges to the corresponding expression

α := ϕ(aq1 b p1 · · · aqm b pm ).

For asymptotic freeness we have to show the convergence of EN [ fN ] to α, whereas

in the almost sure case we have to strengthen this to the almost sure convergence of
{ fN }N . In order to do so one usually shows that the variance of the random variables
fN goes to zero fast enough. Namely, assume that EN [ fN ] converges to α; then the
fact that fN (ω) does not converge to α is equivalent to the fact that the difference
between fN (ω) and αN := EN [ fN ] does not converge to zero. But this is the same
as the statement that for some ε > 0 we have | fN (ω) − αN | ≥ ε for infinitely many
N. Thus the almost sure convergence of { fN }N is equivalent to the fact that for any
ε >0
P({ω | | fN (ω) − αN | ≥ ε infinitely often}) = 0.
As this is the probability of the lim sup of events, we can use the first Borel-Cantelli
lemma which guarantees that this probability is zero if we have
104 4 Asymptotic Freeness

∑ P({ω | | fN (ω) − αN | ≥ ε}) < ∞.

N∈N

Note that, since the fN are independent with respect to P, this is by the second
Borel-Cantelli lemma actually equivalent to the almost sure convergence of fN . On
the other hand, Chebyshev’s inequality gives us the bound (since αN = E[ fN ])

1
PN ({ω | | fN (ω) − αN | ≥ ε}) ≤ var[ fN ].
ε2
So if we can show that ∑N∈N var[ fN ] < ∞, then we are done. Usually, one is able to
bound the order of these variances by a constant times 1/N 2 , which is good enough.
We will come back to the question of estimating the variances in Remark 5.14.
In Theorem 5.13 we will show the variances is of order 1/N 2 , as claimed above.
(Actually we will do more there and provide a non-crossing interpretation of the co-
efficient of this leading order term.) So in the following we will usually only address
the asymptotic freeness of the random matrices under consideration in the averaged
sense and postpone questions about the almost sure convergence to Chapter 5. How-
ever, in all cases considered the averaged convergence can be strengthened to almost
sure convergence, and we will state our theorems directly in this stronger form.
Remark 3. There is actually another notion of convergence which might be more
intuitive than almost sure convergence, namely convergence in probability. Namely,
our random matrices AN and BN converge in probability to a and b (and hence, if a
and b are free, are asymptotically free in probability), if we have for each ε > 0 that

lim PN ({ω | | fN (ω) − αN | ≥ ε}) = 0.

N→∞

As before, we can use Chebyshev’s inequality to insure this convergence if we can

show that limN var[ fN ] = 0.
It is clear that convergence in probability is weaker than almost sure convergence.
Since our variance estimates are usually strong enough to insure almost sure con-
vergence we will usually state our theorems in terms of almost sure convergence.
Almost sure versions of the various theorems were also considered in [160, 96, 173].

4.2 Asymptotic freeness of Gaussian random matrices and deterministic

matrices
Consider a sequence (AN )N∈N of N × N GUE random matrices AN ; then we know
distr
that AN −→ s. Consider now also a sequence (DN )N∈N of deterministic (i.e., non-
random) matrices, DN ∈ MN (C). Assume that

lim tr(Dm
N) (4.3)
N→∞

distr
exists for all m ≥ 1. Then we have DN −→ d, as N → ∞, where d lives in some
non-commutative probability space and where the moments of d are given by the
4.2 Gaussian random matrices and deterministic matrices 105

limit moments (4.3) of the DN . We want to investigate the question whether there is
anything definite to say about the relation between s and d?
In order to answer this question we need to find out whether the limiting mixed
moments
q(1) q(2) q(m)
lim E[tr(DN AN DN · · · DN AN )], (4.4)
N→∞

for all m ≥ 1 (where q(k) can be 0 for some k), exist. In the calculation let us
suppress the dependence on N to reduce the number of indices, and write
q(k) (k)
DN = (di j )Ni,j=1 and AN = (ai j )Ni,j=1 . (4.5)

The Wick formula allows us to calculate mixed moments in the entries of A :

E[ai1 j1 ai2 j2 · · · aim jm ] = ∑ ∏ E[air jr ais js ], (4.6)

π∈P2 (m) (r,s)∈π

where
1
E[ai j akl ] = δil δ jk . (4.7)
N
Thus we have

q(1) q(2) q(m) 1 (1) (2) (m)

E[tr(DN AN DN · · ·DN AN )] = ∑ E[d j1 i1 ai1 j2 d j2 i2 ai2 j3 · · · d jm im aim j1 ]
N i, j:[m]→[N]
1 (1) (m)
= ∑ E[ai1 j2 ai2 j3 · · · aim j1 ]d j1 i1 · · · d jm im
N i, j:[m]→[N]
m
1 (1) (m)
=
N 1+m/2
∑ ∑ ∏ δir jγπ(r) d j1 i1 · · · d jm im
π∈P2 (m) i, j:[m]→[N] r=1
1 (1) (m)
= ∑ ∑ d j1 jγπ(1) · · · d jm jγπ(m) .
N 1+m/2 π∈P
2 (m) j:[m]→[N]

In the calculation above, we regard a pairing π ∈ P2 (m) as a product of disjoint

transpositions in the permutation group Sm (i.e., an involution without fixed point).
Also γ ∈ Sm denotes the long cycle γ = (1, 2, . . . , m) and #(σ ) is the number of
cycles in the factorization of σ ∈ Sm as a product of disjoint cycles.
In order to get a simple formula for the expectation we need a simple expression
for
(1) (m)
∑ d j1 jγπ(1) · · · d jm jγπ(m) .
j:[m]→[N]

As always, tr is the normalized trace, and we extend it multiplicatively as a func-

tional on permutations. For example if σ = (1, 6, 3)(4)(2, 5) ∈ S6 then

trσ [D1 , D2 , D3 , D4 , D5 , D6 ] = tr(D1 D6 D3 )tr(D4 )tr(D2 D5 ).

106 4 Asymptotic Freeness

In terms of matrix elements we have the following which we leave as an easy exer-
cise.
Exercise 1. Let A1 , . . . , An be N × N matrices and let σ ∈ Sn be a permutation. Let
(k)
the entries of Ak be (ai j )Ni,j=1 . Show that

N
(1) (2) (n)
trσ (A1 , . . . , An ) = N −#(σ ) ∑ ai1 iσ (1) ai2 iσ (2) · · · ain iσ (n) .
i1 ,...,in =1

Thus we may write

q(1) q(2) q(m) q(1) q(m)
E[tr(DN AN DN · · · DN AN )] = ∑ N #(γπ)−1−m/2 trγπ [DN , . . . , DN ].
π∈P2 (m)
(4.8)

Now, as pointed out in Corollary 1.6, one has for π ∈ P2 (m) that
(
1, if π ∈ NC2 (m)
lim N #(γπ)−1−m/2 = ,
N→∞ 0, otherwise

so that we finally get

q(1) q(2) q(m)
lim E[tr(DN AN DN · · · DN AN )] = ∑ ϕγπ [d q(1) , . . . , d q(m) ]. (4.9)
N→∞
π∈NC2 (m)

We see that the mixed moments of Gaussian random matrices and deterministic ma-
trices have a definite limit. And moreover, we can recognize this limit as something
familiar. Namely compare (4.9) to the formula (2.22) for a corresponding mixed
moment in free variables d and s, in the case where s is semi-circular:

ϕ[d q(1) sd q(2) s · · · d q(m) s] = ∑ ϕK −1 (π) [d q(1) , . . . , d q(m) ]. (4.10)

π∈NC2 (m)

Both formulas, (4.9) and (4.10), are the same provided K −1 (π) = γπ where K is
the Kreweras complement. But this is indeed true for all π ∈ NC2 (m), see [140, Ex.
18.25]. Consider for example π = {(1, 10), (2, 3), (4, 7), (5, 6), (8, 9)} ∈ NC2 (10).
Regard this as the involution π = (1, 10)(2, 3)(4, 7)(5, 6)(8, 9) ∈ S10 . Then we have
γπ = (1)(2, 4, 8, 10)(3)(5, 7)(6)(9), which corresponds exactly to K −1 (π).
Thus we have proved that Gaussian random matrices and deterministic matrices
become asymptotically free with respect to the averaged trace. The calculations can
of course also be extended to the case of several GUE and deterministic matrices.
By estimating the covariance of the appropriate traces, see Remark 5.14, one can
strengthen this to almost sure asymptotic freeness. So we have the following theo-
rem of Voiculescu [180, 188].
4.2 Gaussian random matrices and deterministic matrices 107

(1) (p)
Theorem 4. Let AN , . . . , AN be p independent N × N GUE random matrices and
(1) (q)
let DN , . . . , DN be q deterministic N × N matrices such that

(1) (q) distr

DN , . . . , DN −→ d1 , . . . , dq as N → ∞.

Then
(1) (p) (1) (q) distr
AN , . . . , AN , DN , . . . , DN −→ s1 , . . . , s p , d1 , . . . , dq as N → ∞,

where each si is semi-circular and s1 , . . . , s p , {d1 , . . . , dq } are free. The conver-

(1) (p)
gence above also holds almost surely, so in particular, we have that AN , . . . , AN ,
(1) (q)
{DN , . . . , DN } are almost surely asymptotically free.
The theorem above can be generalized to the situation where the DN ’s are also
random matrix ensembles. If we assume that the DN and the AN are independent
and that the DN have an almost sure limit distribution, then we get almost sure
asymptotic freeness by the deterministic version above just by conditioning onto
the DN ’s. Hence we have the following random version for the almost sure setting.
(1) (p)
Theorem 5. Let AN , . . . , AN be p independent N × N GUE random matrices and
(1) (q)
let DN , . . . , DN be q random N × N matrices such that almost surely

(1) (q) distr

DN (ω), . . . , DN (ω) −→ d1 , . . . , dq as N → ∞.
(1) (p) (1) (q)
Furthermore, assume that AN , . . . , AN , {DN , . . . , DN } are independent. Then we
have almost surely as N → ∞
(1) (p) (1) (q) distr
AN (ω), . . . , AN (ω), DN (ω), . . . , DN (ω) −→ s1 , . . . , s p , d1 , . . . , dq ,

where each si is semi-circular and s1 , . . . , s p , {d1 , . . . , dq } are free. So in particular,

(1) (p) (1) (q)
we have that AN , . . . , AN , {DN , . . . , DN } are almost surely asymptotically free.
For the averaged version, on the other hand, the assumption of an averaged limit
distribution for random DN is not enough to guarantee asymptotic freeness in the
averaged sense; as the following example shows.
Example 6. Consider a Gaussian random matrix AN and let, for each N, DN be
a random matrix which is independent from AN , and just takes on two values:
P(DN = IN ) = 1/2 and P(DN = −IN ) = 1/2, where IN is the identity matrix. Then
for each N, DN has the averaged eigenvalue distribution 21 δ−1 + 21 δ1 , and thus the
same distribution in the limit; but AN and DN are clearly not asymptotically free.
The problem here is that the fluctuations of DN are too large; there is no almost sure
convergence in that case to 12 δ−1 + 21 δ1 . Of course, we have that IN is asymptotically
free from AN and that −IN is asymptotically free from AN , but this does not imply
the asymptotic freeness of DN from AN .
108 4 Asymptotic Freeness

Let us also remark that in our algebraic framework it is not obvious how to deal
directly with the assumption of almost sure convergence to the limit distribution. We
will actually replace this in the next chapter by the more accessible condition that
the variance of the normalized traces is of order 1/N 2 . Note that this is a stronger
condition in general than almost sure convergence of the eigenvalue distribution, but
this stronger assumption in our theorems will be compensated by the fact that we
can then also show this stronger behaviour in the conclusion.

4.3 Asymptotic freeness of Haar distributed unitary random matrices and

deterministic matrices
Let U(N) denote the group of unitary N × N matrices, i.e. N × N complex matrices
which satisfy U ∗U = UU ∗ = IN . Since U(N) is a compact group, one can take
R
dU to be Haar measure on U(N) normalized so that U (N) dU = 1, which gives
a probability measure on U(N). A Haar distributed unitary random matrix is a
matrix UN chosen at random from U(N) with respect to Haar measure. There is
a useful theoretical and practical way to construct Haar unitaries: take an N × N
(non-self-adjoint!) random matrix whose entries are independent standard complex
Gaussians and apply the Gram-Schmidt orthogonalization procedure; the resulting
matrix is then a Haar unitary.
Exercise 2. Let {Zi j }Ni,j=1 be N 2 independent standard complex Gaussian random
variables with mean 0 and complex variance 1, i.e. E(Zi j Zi j ) = 1. Show that if U =
(ui j )i j is a unitary matrix and Yi j = ∑Nk=1 uik Zk j then {Yi j }Ni,j=1 are N 2 independent
standard complex Gaussian random variables with mean 0 and complex variance 1.

Exercise 3. Let Φ : GLN (C) → U(N) be the map which takes an invertible complex
matrix A and applies the Gram-Schmidt procedure to the columns of A to obtain a
unitary matrix. Show that for any U ∈ U(N) we have Φ(UA) = UΦ(A).

Exercise 4. Let {Zi j }i j be as in Exercise 2 and let Z be the N × N matrix with entries
Zi j . Since Z ∈ GLN (C), almost surely, we may let U = Φ(Z). Show that U is Haar
distributed.
What is the ∗-distribution of a Haar unitary random matrix with respect to the
state ϕ = E ◦ tr? Since UN∗ UN = IN = UN UN∗ , the ∗-distribution is determined by the
values ϕ(UNm ) for m ∈ Z. Note that for any complex number λ ∈ C with |λ | = 1,
λUN is again a Haar unitary random matrix. Thus, ϕ(λ mUNm ) = ϕ(UNm ) for all m ∈ Z.
This implies that we must have ϕ(UNm ) = 0 for m 6= 0. For m = 0, we have of course
ϕ(UN0 ) = ϕ(IN ) = 1.

Definition 7. Let (A, ϕ) be a ∗-probability space. An element u ∈ A is called a Haar

unitary if
◦ u is unitary, i.e. u∗ u = 1A = uu∗ ;
4.3 Haar distributed unitary random and deterministic matrices 109

◦ ϕ(um ) = δ0,m for m ∈ Z.

Thus a Haar unitary random matrix UN ∈ U(N) is a Haar unitary for each N ≥ 1
(with respect to ϕ = E ◦ tr).
We want to see that asymptotic freeness occurs between Haar unitary random
matrices and deterministic matrices, as was the case with GUE random matrices.
The crucial element in the Gaussian setting was the Wick formula, which of course
does not apply when dealing with Haar unitary random matrices, whose entries are
neither independent nor Gaussian. However, we do have a replacement for the Wick
formula in this context, which is known as the Weingarten convolution formula, see
[57, 60].
The Weingarten convolution formula asserts the existence of a sequence of func-
tions (WgN )∞ N=1 with each WgN a central function in the group algebra C[Sn ] of the
symmetric group Sn , for each N ≥ n. The function WgN has the property that for
the entries ui j of a Haar distributed unitary random matrix U = (ui j ) ∈ U(N) and all
index tuples i, j, i0 , j0 : [n] → [N]
n
E[ui1 j1 · · · uin jn ui0 j0 · · · ui0n jn0 ] = ∑ ∏ δir i0σ (r) δ jr jτ(r)
0 WgN (τσ −1 ). (4.11)
1 1
σ ,τ∈Sn r=1

Exercise 5. Let us recall a special factorization of a permutation σ ∈ Sn into a

product of transpositions. Let σ1 = σ and let n1 ≤ n be the largest integer such
that σ1 (n1 ) 6= n1 and k1 = σ1 (n1 ). Let σ2 = (n1 , k1 )σ1 , the product of the transpo-
sition (n1 , k1 ) and σ1 . Then σ2 (n1 ) = n1 . Let n2 be the largest integer such that
σ2 (n2 ) 6= n2 and k2 = σ2 (n2 ). In this way we find n ≥ n1 > n2 > · · · > nl and
k1 , . . . , kl such that ki < ni and such that (nl , kl ) · · · (n1 , k1 )σ = e, the identity of Sn .
Then σ = (n1 , k1 ) · · · (nl , kl ) and this representation is unique, subject to the condi-
tions on ni and ki . Recall that #(σ ) denotes the number of cycles in the cycle de-
composition of σ and |σ | is the minimal number of factors among all factorizations
into a product of transpositions.
Moreover l = |σ | = n − #(σ ) because |σi−1 | = |σi | − 1. Recall the Jucys-Murphy
elements in C[Sn ]; let

J1 = 0, J2 = (1, 2), ... Jk = (1, k) + (2, k) + · · · + (k − 1, k).

Show that Jk and Jl commute for all k and l.

Exercise 6. Let N be an integer. Using the factorization in Exercise 5 show that

(N + J1 ) · · · (N + Jn ) = ∑ N #(σ ) σ .
σ ∈Sn
110 4 Asymptotic Freeness

Exercise 7. Let G ∈ C[Sn ] be the function G(σ ) = N #(σ ) . Thus as operators we have
G = (N + J1 ) · · · (N + Jn ). Show that kJk k ≤ k − 1 and for N ≥ n, G is invertible in
C[Sn ]. Let WgN be the inverse of G.
By writing
N n WgN = (1 + N −1 J1 )−1 · · · (1 + N −1 Jn )−1
show that
N n WgN (σ ) = O(N −|σ | ).

Thus one knows the asymptotic decay

1
WgN (π) ∼ as N → ∞ (4.12)
N 2n−#(π)
for any π ∈ Sn . The convolution formula and the asymptotic estimate allow us to
prove the following result of Voiculescu [180, 188].
(1) (p)
Theorem 8. Let UN , . . . ,UN be p independent N × N Haar unitary random ma-
(1) (q)
trices, and let DN , . . . , DN be q deterministic N × N matrices such that

(1) (q) distr

DN , . . . , DN −→ d1 , . . . , dq as N → ∞.

Then, for N → ∞,
(1) (1)∗ (p) (p)∗ (1) (q) distr
UN ,UN , . . . ,UN ,UN , DN , . . . , DN −→ u1 , u∗1 , . . . , u p , u∗p , d1 , . . . , dq ,

where each ui is a Haar unitary and {u1 , u∗1 }, . . . , {u p , u∗p }, {d1 , . . . , dq } are free.
(1) (1)∗
The above convergence holds also almost surely. In particular, {UN ,UN }, . . . ,
(p) (p)∗ (1) (q)
{UN ,UN }, {DN , . . . , DN } are almost surely asymptotically free.

The proof proceeds in a fashion similar to the Gaussian setting and will not be
given here. We refer to [140, Lecture 23].
Note that in general if u is a Haar unitary such that {u, u∗ } is free from elements
{a, b}, then a and ubu∗ are free. In order to prove this, consider

ϕ p1 (a)q1 (ubu∗ ) · · · pr (a)qr (ubu∗ )

where pi , qi are polynomials such that for all i = 1, . . . , r

ϕ(pi (a)) = 0 = ϕ(qi (ubu∗ )).

Note that by the unitary condition we have qi (ubu∗ ) = uqi (b)u∗ . Thus, by the free-
ness between {u, u∗ } and b,

0 = ϕ(qi (ubu∗ )) = ϕ(uqi (b)u∗ ) = ϕ(uu∗ )ϕ(qi (b)) = ϕ(qi (b)).

4.3 Haar distributed unitary random and deterministic matrices 111

But then

ϕ p1 (a)q1 (ubu∗ ) · · · pr (a)qr (ubu∗ ) = ϕ p1 (a)uq1 (b)u∗ p2 (a) · · · pr (a)uqr (b)u∗

is zero, since {u, u∗ } is free from {a, b} and ϕ vanishes on all the factors in the latter
product.
Thus our Theorem 8 yields also the following as a corollary.

Theorem 9. Let AN and BN be two sequences of deterministic N × N matrices with

distr distr
AN −→ a and BN −→ b. Let UN be a sequence of N × N Haar unitary random
distr
matrices. Then AN ,UN BN UN∗ −→ a, b, where a and b are free. This convergence
holds also almost surely. So in particular, we have that AN and UN BN UN∗ are almost
surely asymptotically free.

The reader might notice that this theorem is, strictly speaking, not a consequence
of Theorem 8, because in order to use the latter we would need the assumption that
also mixed moments in AN and BN converge to some limit; which we do not assume
in Theorem 9. However, the proof of Theorem 8, for the special case where we only
need to consider moments in which UN and UN∗ come alternatingly, reveals that we
never encounter a mixed moment in AN and BN . The structure of the Weingarten
formula ensures that they will never interact. A detailed proof of Theorem 9 can be
found in [140, Lecture 23].
Conjugation by a Haar unitary random matrix corresponds to a random rotation.
Thus the above theorem says that randomly rotated deterministic matrices become
asymptotically free in the limit of large matrix dimension. Another way of saying
this is that random matrix ensembles which are unitarily invariant (i.e., such that
the joint distribution of their entries is not changed by conjugation with any unitary
matrix) are asymptotically free from deterministic matrices.
Note that the eigenvalue distribution of BN is not changed if we consider UN BN UN∗
instead. Only the relation between AN and BN is brought into a generic form by ap-
plying a random rotation between the eigenspaces of AN and of BN .
Again one can generalize Theorems 8 and 9 by replacing the deterministic ma-
trices by random matrices, which are independent from the Haar unitary matrices
and which have an almost sure limit distribution. As outlined at the end of the last
section we will replace in Chapter 5 the assumption of almost sure convergence by
the vanishing of fluctuations var[tr(·), tr(·)] like 1/N 2 . See also our discussions in
Chapter 5 around Remark 5.26 and Theorem 5.29.
Note also that Gaussian random matrices are invariant under conjugation by uni-
tary matrices, i.e., if BN is GUE, then also UN BN UN∗ is GUE. Furthermore the fluctu-
ations of GUE random matrices vanish of the right order and hence we have almost
sure convergence to the semi-circle distribution. Thus Theorem 9 (in the version
where BN is allowed to be a random matrix ensemble with almost sure limit dis-
tribution) contains the asymptotic freeness of Gaussian random matrices and deter-
ministic random matrices (Theorem 4) as a special case.
112 4 Asymptotic Freeness

4.4 Asymptotic freeness between Wigner and deterministic random matrices

Wigner matrices are generalizations of Gaussian random matrices: the entries are,
apart from symmetry conditions, independent and identically distributed, but with
arbitrary, not necessarily Gaussian, distribution. Whereas Gaussian random matri-
ces are unitarily invariant this is not true any more for general Wigner matrices;
thus we cannot use the results about Haar unitary random matrices to derive asymp-
totic freeness results for Wigner matrices. Nevertheless, there are many results in
the literature which show that Wigner matrices behave with respect to eigenvalue
questions in the same way as Gaussian random matrices. For example, their eigen-
value distribution converges always to a semi-circle. In order to provide a common
framework and possible extensions for such investigations it is important to set-
tle the question of asymptotic freeness for Wigner matrices. We will show that in
this respect Wigner matrices also behave like Gaussian random matrices. It turns out
that the estimates for the subleading terms are, compared to the Gaussian case, more
involved. However, there is actually a nice combinatorial structure behind these es-
timates, which depends on a general estimate for sums given in terms of graphs.
This quite combinatorial approach goes back to work of Yin and Krishnaiah who
considered the product of two random matrices, one of them being a covariance ma-
trix (i.e. a Wishart matrix, see Section 4.5). Their moment calculations are special
cases of the general asymptotic freeness calculations which we have to address in
this section.
Another proof for the asymptotic freeness of Wigner matrices which does not
rely on the precise graph sum estimates for the subleading terms can be found in the
book of Anderson, Guionnet, Zeitouni [6]. The special case where the deterministic
matrices are of a block-diagonal form was already treated by Dykema in [66].
We will extend our calculations from Section 4.2 to Wigner matrices. So let
(i)
(AN )N≥1 now be a sequence of Wigner matrices, and (DN )N≥1 sequences of de-
terministic matrices whose joint limit distribution exists. We have to look at alter-
nating moments in Wigner matrices and deterministic matrices. Again, we consider
just one Wigner matrix, but it is clear that the same arguments work also for a family
of independent Wigner matrices, by just decorating the AN with an additional index.
In order to simplify the notation it is also advantageous to consider the case where
the entries of our Wigner matrices are real random variables. So now let us first give
a precise definition what we mean by a Wigner matrix.

Notation 10 Let µ be a probability distribution on R. Let ai j for i, j ∈ N with i ≤ j

be independent identically distributed real random variables with distribution µ. We
also put ai j := a ji for i > j. Then the corresponding N × N Wigner random matrix
ensemble is given by the self-adjoint random matrix
1
AN = √ (ai j )Ni,j=1 .
N
4.4 Wigner and deterministic random matrices 113

Let AN be now such a Wigner matrix; clearly, in our algebraic frame we have to
assume that all moments of µ exist; furthermore, we have to assume that the mean
of µ is zero, and we normalize the variance of µ to be 1.

Remark 11. We want to comment on our assumption that µ has mean zero. In ana-
lytic proofs involving Wigner matrices one usually does not need this assumption.
For example, Wigner’s semi-circle law holds for Wigner matrices, even if the en-
tries have non-vanishing mean. The general case can, by using properties of weak
convergence, be reduced to the case of vanishing mean. However, in our algebraic
frame we cannot achieve this reduction. The reason for this discrepancy is that
our notion of convergence in distribution is actually stronger than weak conver-
gence in situations where mass might escape to infinity. For example, consider a
deterministic diagonal matrix DN , with a11 = N, and all other entries zero. Then
µDN = (1 − 1/N)δ0 + 1/NδN , thus µDN converges weakly to δ0 , for N → ∞. How-
ever, the second and higher moments of DN with respect to tr do not converge, thus
DN does not converge in distribution.
Another simplifying assumption we have made is that the distribution of the di-
agonal entries is the same as that of the off-diagonal entries. With a little more work
the method given here can be made to work without this assumption.

We examine now an averaged alternating moment in our deterministic matrices

(k) (k)
DN = (di j ) and the Wigner matrix AN = √1N (ai j ). We have

h i
(1) (m)
E tr DN AN · · · DN AN
N
1 h
(1) (m)
i
= ∑ E di1 i2 ai2 i3 · · · di2m−1 i2m ai2m i1
N m/2+1 i1 ,...,i2m =1
N
1 (1) (m)
= ∑ E ai2 i3 · · · ai2m i1 di1 i2 · · · di2m−1 i2m
N m/2+1 i1 ,...,i2m =1
N
1 (1) (m)
= ∑ ∑ kσ (ai2 i3 , . . . , ai2m i1 )di1 i2 · · · di2m−1 i2m .
N m/2+1 i1 ,...,i2m =1 σ ∈P (m)

In the last step we have replaced the Wick formula for Gaussian random variables
by the general expansion of moments in terms of classical cumulants. Now we use
the independence of the entries of AN . A cumulant in the ai j is only different from
zero if all its arguments are the same; of course, we have to remember that ai j = a ji .
(Not having to bother about the complex conjugate here, is the advantage of looking
at real Wigner matrices.) Thus, in order that kσ [ai2 i3 , . . . , ai2m i1 ] is different from zero
we must have: if k and l are in the same block of σ then we must have {i2k , i2k+1 } =
{i2l , i2l+1 }. Note that now we do not prescribe whether i2k has to agree with i2l
or with i2l+1 . In order to deal with partitions of the indices i1 , . . . , i2m instead of
partitions of the pairs (i2 , i3 ), (i4 , i5 ) . . . , (i2m , i1 ), we say that a partition π ∈ P(2m)
114 4 Asymptotic Freeness

is a lift of a partition σ ∈ P(m) if we have for all k, l = 1, . . . , m with k 6= l that

n o
k ∼σ l ⇔ [2k ∼π 2l and 2k + 1 ∼π 2l + 1] or [2k ∼π 2l + 1 and 2k + 1 ∼π 2l] .

Here we using the notation k ∼σ l to mean that k and l are in the same block of σ .
Then the condition that kσ (ai2 i3 , . . . , ai2m i1 ) is different from zero can also be para-
phrased as: ker i ≥ π, for some lift π of σ . Note that the value of kσ (ai2 i3 , . . . , ai2m i1 )
depends only on ker(i) because we have assumed that the diagonal and off-diagonal
elements have the same distribution. Let us denote this common value by kker(i) .
Thus we can rewrite the equation above as
h i
(1) (m)
E tr DN AN · · · DN AN
1 (1) (m)
= ∑ ∑ kker(i) di1 i2 · · · di2m−1 i2m . (4.13)
N m/2+1 σ ∈P (m) i:[2m]→[N]
ker i≥π for some lift π of σ

Note that in general there is not a unique lift of a given σ . For example, for the one
block partition σ = {(1, 2, 3)} ∈ P(3) we have the following lifts in P(6):

{(1, 3, 5), (2, 4, 6)}, {(1, 3, 4), (2, 5, 6)}, {(1, 2, 4), (3, 5, 6)},

{(1, 2, 5), (3, 4, 6)}, {(1, 2, 3, 4, 5, 6)}.

If σ consists of several blocks then one can make the corresponding choice for
each block of σ . If σ is a pairing there is a special lift π of σ which we call the
standard lift of σ ; if (r, s) is a block of σ , then π will have the blocks (2r + 1, 2s)
and (2r, 2s + 1).
If we want to rewrite the sum over i in (4.13) in terms of sums of the form
(1) (m)
∑ di1 i2 · · · di2m−1 i2m (4.14)
i:[2m]→[N]
ker i≥π

for fixed lifts π, then we have to notice that in general a multi-index i will show up
with different π’s; indeed, the lifts of a given σ are partially ordered by inclusion
and form a poset; thus we can rewrite the sum over i with ker i ≥ π for some lift π of
σ in terms of sums over fixed lifts, with some well-defined coefficients (given by the
Möbius function of this poset – see Exercise 8). However, the precise form of these
coefficients is not needed since we will show that at most one of the corresponding
sums has the right asymptotic order (namely N m/2+1 ), so all the other terms will play
no role asymptotically. So our main goal will now be to examine the sum (4.14) and
show that for all π ∈ P(2m) which are lifts of σ , a term of the form (4.14) grows
in N with order at most m/2 + 1, and furthermore, this maximal order is achieved
only in the case in which σ is a non-crossing pairing and π is the standard lift of σ .
4.4 Wigner and deterministic random matrices 115

After identifying these terms we must relate them to Equation (4.9); this is achieved
in Exercise 9.
Exercise 8. Let σ be a partition of [m] and M = {π ∈ P(2m) | π is a lift of σ }. For
a subset L of M, let πL = supπ∈L π; here sup denotes the join in the lattice of all
partitions. Use the principle of inclusion-exclusion to show that
(1) (m) (1) (m)
∑ di1 i2 · · · di2m−1 i2m = ∑ (−1)|L|−1 ∑ di1 i2 · · · di2m−1 i2m .
i:[2m]→[N] L⊂M i:[2m]→[N]
ker i≥π for some π∈M ker i≥πL

Exercise 9. Let σ be a pairing of [m] and π be the standard lift of σ . Then

(1) (m) (1) (m)
∑ di1 i2 · · · di2m−1 i2m = Trγm σ (DN , . . . , DN ).
ker(i)≥π

Let us first note that, because of our assumption that the entries of the Wigner
matrices have vanishing mean, first-order cumulants are zero and thus only those
σ which have no singletons will contribute to (4.13). This implies the same prop-
erty for the lifts and in (4.14) we can restrict ourselves to considering π without
singletons.
It turns out that it is convenient to associate to π a graph Gπ . Let us start with
the directed graph Γ2m with 2m vertices labelled 1, 2, . . . , 2m and directed edges
(1, 2), (3, 4), . . . , (2m − 1, 2m); (2i − 1, 2i) starts at 2i and goes to 2i − 1. Given a
π ∈ P(2m) we obtain a directed graph Gπ by identifying the vertices which belong
to the same block of π. We will not identify the edges (actually, the direction of two
edges between identified vertices might even not be the same) so that Gπ will in
general have multiple edges, as well as loops. The sum (4.14) can then be rewritten
in terms of the graph G = Gπ as
(e)
SG (N) := ∑ ∏ dit(e) ,is(e) , (4.15)
i:V (G)→[N] e∈E(G)

where we sum over all functions i : V (G) → [N], and for each such function we take
(e)
the product of dit(e) ,is(e) as e runs over all the edges of the graph and s(e) and t(e)
denote, respectively, the source and terminus of the edge e. Note that we keep all
edges under the identification according to π; thus the m matrices D(1) , . . . , D(m) in
(4.14) show up in (4.15) as the various De for the m edges of Gπ . See Fig. 4.6.
What we have to understand about such graph sums is their asymptotic behaviour
as N → ∞. This problem has a nice answer for arbitrary graphs, namely one can
estimate such graph sums (4.15) in terms of the norms of the matrices corresponding
to the edges and properties of the graph G. The relevant feature of the graph is the
structure of its two-edge connected components.
116 4 Asymptotic Freeness

Definition 12. A cutting edge of a connected graph is an edge whose removal would
disconnect the graph. A connected graph is two-edge connected if it does not contain
a cutting edge, i.e., if it cannot be disconnected by the removal of an edge. A two-
edge connected component of a graph is a two-edge connected subgraph which is
not properly contained is a larger two-edge connected subgraph.
A forest is a graph without cycles. A tree is a connected component of a forest,
i.e., a connected graph without cycles. A tree is trivial if it consists of only one
vertex. A leaf of a non-trivial tree is a vertex which meets only one edge. The sole
vertex of a trivial tree will also be called a trivial leaf.

It is clear that if one shrinks each two-edge connected component of a graph to a

vertex and removes the loops, then one does not have any more cycles, thus one is
left with a forest.

Notation 13 For a graph G we denote by F(G) its forest of two-edge connected

components; the vertices of F(G) consist of the two-edge connected components of
G and two distinct vertices of F(G) are connected by an edge if there is a cutting
edge between vertices from the two corresponding two-edge connected components
in G.

We can now state the main theorem on estimates for graph sums. The special case
for two-edge connected graphs goes back to the work of Yin and Krishnaiah [206],
see also the book of Bai and Silverstein [15]. The general case, which is stronger
than the corresponding statement in [206, 15], is proved in [130].

Theorem 14. Let G be a directed graph, possibly with multiple edges and loops. Let
(e)
for each edge e of G be given an N × N matrix De = (di j )Ni,j=1 . Then the associated
graph sum (4.15) satisfies

|SG (N)| ≤ N r(G) · ∏ kDe k, (4.16)

e∈E(G)

where r(G) is determined as follows from the structure of the graph G. Let F(G) be
the forest of two-edge connected components of G. Then

r(G) = ∑ r(l),
l leaf of F(G)

where (
1, if l is a trivial leaf
r(l) := 1
.
2, if l is a leaf of a non-trivial tree

Note that each tree of the forest F(G) makes at least a contribution of 1 in r(G),
because a non-trivial tree has at least two leaves. One can also make the description
above more uniform by having a factor 1/2 for each leaf, but then counting a trivial
leaf as two actual leaves. Note also that the direction of the edges plays no role for
4.4 Wigner and deterministic random matrices 117

2
D1
1
(3)
D2
3 (1, 4, 6)
(2)
6 D2
D1
D3
D3 4 (5)
5

Fig. 4.6 On the left we have Γ6 . We let π be the partition of [6] with blocks {(1, 4, 6), (2), (3), (5)}.
(1) (2) (3)
The graph on the right is Gπ . We have Sπ (N) = ∑i, j,k,l di j d jk d jl and r(Gπ ) = 3/2.

the estimate above. The direction of an edge is only important in order to define the
contribution of an edge to the graph sum. One direction corresponds to the matrix
De , the other direction corresponds to the transpose Dte . Since the norm of a matrix
is the same as the norm of its transpose, the estimate is the same for all graph sums
which correspond to the same undirected graph.
Let us now apply Theorem 14 to Gπ . We have to show that r(Gπ ) ≤ m/2 + 1 for
our graphs Gπ , π ∈ P(2m). Of course, for general π ∈ P(2m) this does not need to
be true. For example, if π = {(1, 2), (3, 4), . . . , (2m − 1, 2m)} then Gπ consists of m
isolated points and thus r(Gπ ) = m. Clearly, we have to take into account that we
can restrict in (4.13) to lifts of a σ without singletons.

Definition 15. Let G = (V, E) be a graph and w1 , w2 ∈ V . Let us consider the graph
G0 obtained by merging the two vertices w1 and w2 into a single vertex w. This
means that the vertices V 0 of G0 are (V \ {w1 , w2 }) ∪ {w}. Also each edge of G
becomes an edge of G0 , except that if the edge started (or ended) at w1 or w2 then
the corresponding edge of G0 starts (or ends) at w.

Lemma 16. Suppose π1 and π2 are partitions of [2m] and π1 ≤ π2 . Then r(Gπ2 ) ≤
r(Gπ1 ).

Proof: We only have to consider the case where π2 is obtained from π1 by joining
two blocks w1 and w2 of π1 , and then use induction.
We have to consider three cases. Let C1 and C2 be the two-edge connected com-
ponents of Gπ1 containing w1 and w2 respectively. Recall that r(Gπ1 ) is the sum of
the contributions of each connected component and the contribution of a connected
component is either 1 or one half the number of leaves in the corresponding tree of
F(Gπ1 ), whichever is larger.
Case 1. Suppose the connected component of Gπ1 containing w1 is two-edge con-
nected, i.e. C1 becomes the only leaf of a trivial tree in F(Gπ1 ). Then the contribution
of this component to r(Gπ1 ) is 1. If w2 is in C1 then merging w1 and w2 has no effect
on r(Gπ1 ) and thus r(Gπ1 ) = r(Gπ2 ). If w2 is not in C1 , then C1 gets joined to some
118 4 Asymptotic Freeness

Fig. 4.7 Suppose w1 and w2 are in the same connected component of Gπ1 but in different, say C1
and C2 , two-edge connected components of Gπ1 , we collapse the edge (shown here shaded) joining
C1 to C2 in F(Gπ1 ). (See Case 3 in the proof of Lemma 16.)

e e e
… … … …
v v v v v

Fig. 4.8 If we remove the vertex v from a graph we replace the edges e1 and e2 by the edge e. (See
Definition 17.)

other connected component of Gπ1 , which will leave the contribution of this other
component unchanged. In this latter case we shall have r(Gπ2 ) = r(Gπ1 ) − 1.
For the rest of the proof we shall assume that neither w1 nor w2 lies in a connected
component of Gπ1 which has only one two-edge connected component.
Case 2. Suppose w1 and w2 lie in different connected components of Gπ1 . When w1
and w2 are merged the corresponding two-edge connected components are joined.
If either of these corresponded to a leaf in F(Gπ1 ) then the number of leaves would
be reduced by 1 or 2 (depending on whether both two-edge components were leaves
in Fπ1 ). Hence r(Gπ2 ) is either r(Gπ1 ) − 1/2 or r(Gπ1 ) − 1.
Case 3. Suppose that both w1 and w2 are in the same connected component of Gπ1 .
Then the two-edge connected components C1 and C2 become vertices of a tree T in
F(Gπ1 ) (see Fig. 4.7). When we merge w1 and w2 we form a two-edge connected
component C of Gπ2 , which consists of all the two-edge connected components
corresponding to the vertices of T along the unique path from C1 to C2 . On the level
of T this corresponds to collapsing all the edges between C1 and C2 into a single
vertex. This may reduce the number of leaves by 0, 1, or 2. If there were only two
leaves, we might end up with a single vertex but the contribution to r(Gπ1 ) would
still not increase. Thus r(Gπ1 ) can only decrease.

Definition 17. Let G be a directed graph and let v be a vertex of G. Suppose that v
has one incoming edge e1 and one outgoing edge e2 . Let G0 be the graph obtained
by removing e1 , e2 and v and replacing these with an edge e from s(e1 ) to t(e2 ). We
say that G0 is the graph obtained from G by removing the vertex v. See Fig. 4.8.
We say that the degree of a vertex is the number of edges to which it is incident,
using the convention that a loop contributes 2. The total degree of a subgraph is the
sum of the degrees of all its vertices.
Using the usual order on partitions of [2m], we say that a partition π is a minimal
lift of σ if is not larger than some other lift of σ .
4.4 Wigner and deterministic random matrices 119

Lemma 18. Let σ be a partition of [m] without singletons and π ∈ P(2m) be a

minimal lift of σ . Suppose that Gπ contains a two-edge connected component of
total degree strictly less than 3 and which becomes a leaf in F(Gπ ). Then
(i) (k − 1, k) is a block of σ ; and
(ii) (2k − 2, 2k + 1) and (2k − 1, 2k) are blocks of π.
Let σ 0 be the partition obtained by deleting the block (k − 1, k) from σ and π 0 the
partition obtained by deleting (2k − 2, 2k + 1) and (2k − 1, 2k) from π. Then π 0 is a
minimal lift of σ 0 and the graph Gπ 0 is obtained from Gπ by
(a) deleting the connected component (2k − 1, 2k) and;
(b) deleting the vertex obtained from (2k − 2, 2k + 1);
(c) r(Gπ ) = r(Gπ 0 ) + 1.

Proof: Since σ has no singletons, each block of σ contains at least two elements
and thus each block of the lift π contains at least two points. Thus every vertex of
Gπ has degree at least 2. So a two-edge connected component with total degree less
than 3 must consist of a single vertex. Moreover if this vertex has distinct incoming
and outgoing edges then this two-edge connected component cannot become a leaf
in F(Gπ ). Thus Gπ has a two-edge connected component C which consists of a
vertex with a loop. Moreover C will also be a connected component. Since an edge
always goes from 2k − 1 to 2k, π must have a block consisting of the two elements
2k − 1 and 2k. Since π is a lift of σ , σ must have the block (k − 1, k). Since π is a
minimal lift of σ , π has the two blocks (2k − 2, 2k + 1), (2k − 1, 2k). This proves (i)
and (ii).
Now π 0 is a minimal lift of σ 0 because π was minimal on all the other blocks of
σ . Also the block (2k − 2, 2k + 1) corresponds to a vertex of Gπ with one incoming
edge and one outgoing edge. Thus by removing this block from π we remove a
vertex from Gπ , as described in Definition 17. Hence Gπ 0 is obtained from Gπ by
removing the connected component C and the vertex (2k − 2, 2k + 1).
Finally, the contribution of C to r(Gπ ) is 1. If the connected component, C0 , of
Gπ containing the vertex (2k − 2, 2k + 1) has only one other vertex, which would
have to be (2k − 3, 2k + 2), the contribution of this component to r(Gπ ) will be 1
and Gπ 0 will have as a connected component this vertex (2k − 3, 2k + 2) and a loop
whose contribution to r(Gπ 0 ) will still be 1. On the other hand, if C0 has more than
one other vertex then the number of leaves will not be diminished when the vertex
(2k − 1, 2k + 1) is removed and thus also in this case the contribution of C0 to r(Gπ )
is unchanged. Hence in both cases r(Gπ ) = r(Gπ 0 ) + 1.

Lemma 19. Consider σ ∈ P(m) without singletons and let π ∈ P(2m) be a lift of
σ . Then we have for the corresponding graph Gπ that
m
r(Gπ ) ≤ + 1, (4.17)
2
120 4 Asymptotic Freeness

(1,3) (2,4)
D1 D2

(1, 2) (3, 4)
D2

Fig. 4.9 If σ = {(1, 2)} there are two possible minimal lifts: π1 = {(1, 2), (3, 4)} and π2 =
{(1, 3), (2, 4)}. We show Gπ1 on the left and Gπ2 on the right. The graph sum for π1 is
Tr(D1 )Tr(D2 ) and the graph sum for π2 is Tr(D1 Dt2 ). (See the conclusion of the proof of Lemma
19.)

and we have equality if and only if σ is a non-crossing pairing and π the corre-
sponding standard lift

k ∼σ l ⇔ 2k ∼π 2l + 1 and 2k + 1 ∼π 2l .

Proof: By Lemma 16 we may suppose that π is a minimal lift of σ . Let the con-
nected components of Gπ be C1 , . . . ,C p . Let the number of edges in Ci be mi , and
the number of leaves in the tree of F(Gπ ) corresponding to Ci be li . The contribution
of Ci to r(Gπ ) is ri = max{1, li /2}.
Suppose σ has no blocks of the form (k − 1, k). Then by Lemma 18 each two-
edge connected component of Gπ which becomes a leaf in F(Gπ ) must have total
degree at least 3. Thus mi ≥ 2 for each i. Moreover the contribution of each leaf to
the total degree must be at least 3. Thus 3li ≤ 2mi . If li ≥ 2 then ri = li /2 ≤ mi /3. If
li = 1 then, as mi ≥ 2, we have ri = 1 ≤ mi /2. So in either case ri ≤ mi /2. Summing
over all components we have r(Gπ ) ≤ m/2.
If σ does contain a block of the form (k −1, k) and π blocks (2k −2, 2k +1), (2k −
1, 2k), then we may repeatedly remove these blocks from σ and π until we reach
σ 0 and π 0 such that either: (a) σ 0 contains no blocks which are a pair of adjacent
elements; or (b) σ 0 = {(1, 2)} (after renumbering) and π 0 is a minimal lift of σ 0 . In
either case by Lemma 18, r(Gπ ) = r(Gπ 0 ) + q where q is the number of times we
have removed a pair of adjacent elements of σ .
In case (a), we have by the earlier part of the proof that r(Gπ 0 ) ≤ m0 /2. Thus
r(Gπ ) = r(Gπ 0 ) + q ≤ m0 /2 + q = m/2.
In case (b) we have that σ 0 = {(1, 2)} and either π = {(1, 2), (3, 4)} (π is stan-
dard) or π = {(1, 3), (2, 4)} (π is not standard). In the first case, see Fig. 4.9,
Gπ 0 has two vertices, each with a loop and so r(Gπ 0 ) = 2 = m0 /2 + 1, and hence
r(Gπ ) = q + m0 /2 + 1 = m/2 + 1. In the second case Gπ 0 is two-edge connected and
so r(Gπ 0 ) = 1 = m0 /2, and hence r(Gπ ) = q + m0 /2 = m/2. So we can only have
r(Gπ ) = m/2 + 1 when σ is a non-crossing pairing and π is standard; in all other
cases we have r(Gπ ) ≤ m/2.
Equipped with this lemma the investigation of the asymptotic freeness of Wigner
matrices and deterministic matrices is now quite straightforward. Lemma 19 shows
that the sum (4.14) has at most the order N m/2+1 and that the maximal order is
4.4 Wigner and deterministic random matrices 121

achieved exactly for σ which are non-crossing pairings and for π which are the
corresponding standard lifts. But for those we get in (4.13) the same contribution
as for Gaussian random matrices. The other terms in (4.13) will vanish, as long as
we have uniform bounds on the norms of the deterministic matrices. Thus the result
for Wigner matrices is the same as for Gaussian matrices, provided we assume a
uniform bound on the norm of the deterministic matrices.
Moreover the forgoing arguments can be extended to several independent Wigner
matrices. Thus we have proved the following theorem.

Theorem 20. Let µ1 , . . . , µ p be probability measures on R, for which all moments

(1) (p)
exist and for which the means vanish. Let AN , . . . , AN be p independent N × N
Wigner random matrices with entry distributions µ1 , . . . , µ p , respectively, and let
(1) (q)
DN , . . . , DN be q deterministic N × N matrices such that for N → ∞

(1) (q) distr

DN , . . . , DN −→ d1 , . . . , dq

and such that

(r)
sup kDN k < ∞.
N∈N
r=1,...,q

Then, as N → ∞,
(1) (p) (1) (q) distr
AN , . . . , AN , DN , . . . , DN −→ s1 , . . . , s p , d1 , . . . , dq ,

where each si is semi-circular and s1 , . . . , s p , {d1 , . . . , dq } are free. In particular, we

(1) (p) (1) (q)
have that AN , . . . , AN , {DN , . . . , DN } are asymptotically free.

By estimating the variance of the traces one can show that one also has almost
sure convergence in the above theorem; also, one can extend those statements to ran-
(k)
dom matrices DN which are independent from the Wigner matrices, provided one
assumes the almost sure version of a limit distribution and of the norm boundedness
condition. We leave the details to the reader.
Exercise 10. Show that under the same assumptions as in Theorem 20 one can
bound the variance of the trace of a word in Wigner and deterministic matrices as
h
(1) (m) i C
var tr DN AN · · · DN AN ≤ 2 ,
N
where C is a constant, depending on the word.
Show that this implies that Wigner matrices and deterministic matrices are almost
surely asymptotically free under the assumptions of Theorem 20.

Exercise 11. State (and possibly prove) the version of Theorem 20, where the
(1) (q)
DN , . . . , DN are allowed to be random matrices.
122 4 Asymptotic Freeness

4.5 Examples of random matrix calculations

In the following we want to look at some examples which show how the machinery
of free probability can be used to calculate asymptotic eigenvalue distributions of
random matrices.

4.5.1 Wishart matrices and the Marchenko-Pastur distribution

Fig. 4.10 On the left we have the eigenvalue distribution of a Wishart random matrix with N = 100
and M = 200 averaged over 3000 instances and on the right we have one instance with N = 2000
and M = 4000. The solid line is the graph of the density of the limiting distribution.

Besides the Gaussian random matrices the most important random matrix en-
semble are the Wishart random matrices [203]. They are of the form A = N1 XX ∗ ,
where X is an N × M random matrix with independent Gaussian entries. There are
two forms: a complex case when the entries xi j are standard complex Gaussian ran-
dom variables with mean 0 and E(|xi j |2 ) = 1; and a real case where the entries are
real-valued Gaussian random variables with mean 0 and variance 1. Again, one has
an almost sure convergence to a limiting eigenvalue distribution (which is the same
in both cases), if one sends N and M to infinity in such a way that the ratio M/N
is kept fixed. Fig. 4.10 above shows the eigenvalue histograms with M = 2N: for
N = 100 and N = 2000. For N = 200 we have averaged over 3000 realizations.
By similar calculations as for the Gaussian random matrices one can show that in
the limit N, M → ∞ such that the ratio M/N → c, for some 0 < c < ∞, the asymptotic
averaged eigenvalue distribution is given by

lim E[tr(Ak )] = ∑ c#(π) . (4.18)

N,M→∞
M/N→c π∈NC(k)

Exercise 12. Show that for A = N1 XX ∗ , a Wishart matrix as above, we have

N M
1
E(Tr(Ak )) = ∑ ∑ E(xi1 i−1 xi2 i−1 · · · xik i−k xi1 i−k ).
N k i1 ,...,i =1 i−1 ,...,i
k −k =1
4.5 Examples of random matrix calculations 123

Then use Exercise 1.7 to show that, in the case of standard complex Gaussian entries
for X, we have the “genus expansion”

−1 )−(k+1)
M #(σ )
E(tr(Ak )) = ∑ N #(σ )+#(γk σ .
σ ∈Sk N

Then use Proposition 1.5 to prove (4.18).

This means that all free cumulants of the limiting distribution are equal to c.
This qualifies the limiting distribution to be called a free Poisson distribution of
rate c. Since this limiting distribution of Wishart matrices was first calculated by
Marchenko and Pastur [123], it is in the random matrix literature usually called
the Marchenko-Pastur distribution. See Definition 2.11, Exercises 2.10, 2.11, and
Remark 3.11 and the subsequent exercises.
Exercise 13. We have chosen the normalization for Wishart matrices that simplifies
the free cumulants. The standard normalization is M1 XX ∗ . If we let A0 = M1 XX ∗ then
A= M 0
N A so in the limit we have scaled the distribution by c. Using Exercise 2.12,
show that the limiting eigenvalue distribution of A0 is ρy where y = 1/c (using the
notation of Remark 2.12).

4.5.2 Sum of random matrices

Let us now consider the sum of random matrices. If the two matrices are asymptot-
ically free then we can apply the R-transform machinery for calculating the asymp-
totic distribution of their sum. Namely, for each of the two matrices we calculate the
Cauchy transform of their asymptotic eigenvalue distribution, and from this their
R-transform. Then the sum of the R-transforms gives us the R-transform of the sum
of the matrices, and from there we can go back to the Cauchy transform and, via
Stieltjes inversion theorem, to the density of the sum.

Example 21. As an example, consider A +UAU ∗ , where U is a Haar unitary random

matrix and A is a diagonal matrix with N/2 eigenvalues −1 and N/2 eigenvalues 1.
(See Fig. 4.11.)

Fig. 4.11 The eigenvalue

distribution of A + UAU ∗ . In 0.3 0.3

the left graph we have 1000 0.2 0.2

realizations with N = 100, and 0.1 0.1
in the right, one realization 0 0
-2 -1 0 1 2 -2 -1 0 1 2
with N = 1000.
124 4 Asymptotic Freeness

0.25 0.25
0.2 0.2
0.15 0.15
0.1 0.1
0.05 0.05

-1 0 1 2 3 4 -1 0 1 2 3 4

Fig. 4.12 On the left we display the averaged eigenvalue distribution for 3000 realizations of the
sum of a GUE and a complex Wishart random matrix with M = 200 and N = 100. On the right
we display the eigenvalue distribution of a single realization of the sum of a GUE and a complex
Wishart random matrix with M = 8000 and N = 4000.

Thus, by Theorem 9, the asymptotic eigenvalue distribution of the sum is the

same as the distribution of the sum of two free Bernoulli distributions. The latter
can be easily calculated as the arc-sine distribution. See [140, Example 12.8].

Example 22. Consider now independent GUE and Wishart matrices. They are asymp-
totically free, thus the asymptotic eigenvalue distribution of their sum is given by
the free convolution of a semi-circle and a Marchenko-Pastur distribution.
Fig. 4.12 shows the agreement (for c = 2) between numerical simulations and
the predicted distribution using the R-transform. The first is averaged over 3000
realizations with N = 100, and the second is one realization for N = 4000.

4.5.3 Product of random matrices

One can also rewrite the combinatorial description (2.23) of the product of free
variables into an analytic form. The following theorem gives this version in terms
of Voiculescu’s S-transform [178]. For more details and a proof of that theorem we
refer to [140, Lecture 18].

Theorem 23. Put Ma (z) := ∑∞ m m

m=0 ϕ(a )z and define the S-transform of a by

1 + z h−1i
Sa (z) := Ma (z),
z

where M h−1i denotes the inverse under composition of M. Then: if a and b are free,
we have Sab (z) = Sa (z) · Sb (z).

Again, this allows one to do analytic calculations for the asymptotic eigenvalue
distribution of a product of asymptotically free random matrices. One should note
in this context, that the product of two self-adjoint matrices is in general not self-
adjoint, thus is is not clear why all its eigenvalues should be real. (If they are not
real then the S-transform does not contain enough information to recover the eigen-
values.) However, if one makes the restriction that at least one of the matrices has
4.5 Examples of random matrix calculations 125

��
��
��
��

��

Fig. 4.13 The eigenvalue distribution of the product of two independent complex Wishart matrices.
On the left we have one realization with N = 100 and M = 500. On the right we have one realization
with N = 2000 and M = 10000. See Example 24.

positive spectrum, then, because the eigenvalues of AB are the same as those of the
self-adjoint matrix B1/2 AB1/2 , one can be sure that the eigenvalues of AB are real as
well, and one can use the S-transform to recover them. One should also note that a
priori the S-transform of a is only defined if ϕ(a) 6= 0. However, by allowing for-
√
mal power series in z one can also extend the definition of the S-transform to the
case where ϕ(a) = 0, ϕ(a2 ) > 0. For more on this, and the corresponding version
of Theorem 23 in that case, see [144].

Example 24. Consider two independent Wishart matrices. They are asymptotically
free; this follows either by the fact that a Wishart matrix is unitarily invariant or, al-
ternatively, by an easy generalization of the genus expansion from (4.18) to the case
of several independent Wishart matrices. So the asymptotic eigenvalue distribution
of their product is given by the distribution of the product of two free Marchenko-
Pastur distributions.
As an example consider two independent Wishart matrices for c = 5. Fig. 4.13
compares simulations with the analytic formula derived from the S-transform. The
first is one realization for N = 100 and M = 500, the second is one realization for
N = 2000 and M = 10000.
Chapter 5
Fluctuations and Second-Order Freeness

Given an N × N random matrix ensemble we often want to know, in addition to its

limiting eigenvalue distribution, how the eigenvalues fluctuate around the limit. This
is important in random matrix theory because in many ensembles the eigenvalues
exhibit repulsion and this feature is often important in applications (see e.g. [112]).
If we take a diagonal random matrix ensemble with independent entries, then the
eigenvalues are just the diagonal entries of the matrix and by independence do not
exhibit any repulsion. If we take a self-adjoint ensemble with independent entries,
i.e. the Wigner ensemble, the eigenvalues are not independent and appear to spread
evenly, i.e. there are few bald spots and there is much less clumping, see Figure 5.1.
For some simple ensembles one can obtain exact formulas measuring this repul-
sion, i.e. the two-point correlation functions; unfortunately these exact expressions
are usually rather complicated. However just as in the case of the eigenvalue distri-
butions themselves, the large N limit of these distributions is much simpler and can
be analysed.

��

� �
-� -� � � � -� -� � � �

Fig. 5.1 On the left is a histogram of the eigenvalues of an instance of a 50 × 50 GUE random ma-
trix. The tick marks at the bottom show the actual eigenvalues. On the right we have independently
sampled a semi-circular distribution 50 times. We can see that the spacing is more ‘uniform’ in the
eigenvalue plot (on the left). The fluctuation moments are a way of measuring this quantitatively.

127
128 5 Second-Order Freeness

We saw earlier that freeness allows us to find the limiting distributions of XN +YN
or XN YN provided we know the limiting distributions of XN and YN individually and
XN and YN are asymptotically free. The theory of second-order freeness, which was
developed in [59, 128, 129], provides an analogous machinery for calculating the
fluctuations of sums and products from those of the constituent matrices, provided
one has asymptotic second order freeness.
We want to emphasize that on the level of fluctuations the theory is less robust
than on the level of expectations. In particular, whereas on the first order level it
does not make any difference for most results whether we consider real or complex
random matrices, this is not true any more for second order. What we are going to
present here is the theory of second-order freeness for complex random matrices
(modelled according to the GUE). There exists also a real second-order freeness
theory (modelled according to the GOE, i.e., Gaussian orthogonal ensemble); the
general structure of the real theory is the same as in the complex case, but details
are different. In particular, in the real case there will be additional contributions in
the combinatorial formulas, which correspond to non-orientable surfaces. We will
not say more on the real case, but refer to [127, 147].

5.1 Fluctuations of GUE random matrices

To start let us return to our basic example, the GUE. Let XN be an N × N self-adjoint
√
Gaussian random matrix, that is, if we write XN = ( fi j )Ni,j=1 with fi j = xi j + −1 yi j ,
then {xi j }i≤ j ∪ {yi j }i< j is an independent set of Gaussian random variables with

E( fi j ) = 0, E(xii2 ) = 1/N, and E(xi2j ) = E(y2i j ) = 1/(2N) (for i 6= j).

The eigenvalue distribution

√ of XN converges almost surely to Wigner’s semi-
circular law (2π)−1 4 − t 2 dt and in particular if f is a polynomial and tr = N −1 Tr
is the normalized
R2 √ trace, then {tr( f (XN ))}N converges almost surely as N → ∞ to
(2π)−1 −2 f (t) 4 − t 2 dt. Thus, if f is a polynomial centred with respect to the
semi-circle law i.e.
1 2
Z p
f (t) 4 − t 2 dt = 0 (5.1)
2π −2
then {tr( f (XN ))}N converges almost surely to 0; however, if we rescale by multi-
plying by N, {Tr( f (XN ))}N becomes a convergent sequence of random variables,
and the limiting covariances for various f ’s give the fluctuations of XN . Assuming
a growth condition on the first two derivatives of f , Johansson [104] was able to
show the result below for more general functions f , but we shall just state it for
polynomials.
Theorem 1. Let f be a polynomial such that the centredness condition (5.1) is sat-
isfied and let {XN }N be the GUE. Then Tr( f (XN )) converges to a Gaussian ran-
dom variable. Moreover if {Cn }n are the Chebyshev polynomials of the first kind
(rescaled to [−2, 2]) then {Tr(Cn (XN ))}∞
n=1 converge to independent Gaussian ran-
dom variables with limN Tr(Cn (XN )) having mean 0 and variance n.
5.1 Fluctuations of GUE random matrices 129

The Chebyshev polynomials of the first kind are defined by the relation Tn (cos θ )
= cos nθ . They are the orthogonal polynomials on [−1, 1] which are orthogonal with
respect to the arc-sine law π −1 (1 − x2 )−1/2 . Rescaling to the interval [−2, 2] means
using the measure π −1 (4 − x2 )−1/2 dx and setting Cn (x) = 2 Tn (x/2). We thus have
C0 (x) = 2 C3 (x) = x3 − 3x
C1 (x) = x C4 (x) = x4 − 4x2 + 2
C2 (x) = x2 − 2 C5 (x) = x5 − 5x3 + 5x
and for n ≥ 1, Cn+1 (x) = xCn (x) −Cn−1 (x).
The reader will be asked to prove some of the above mentioned properties of Cn (as
well as corresponding properties of the second kind analogue Un ) in Exercise 12.
We will provide a proof of Theorem 1 at the end of this chapter, see Section 5.6.1.
Recall that in the case of first order freeness the moments of the GUE had a
combinatorial interpretation in terms of planar diagrams. These diagrams led to the
notion of free cumulants and the R-transform, which unlocked the whole theory.
For the GUE the moments {αk }k of the limiting eigenvalue distribution are 0 for k
odd and the Catalan numbers for k even. For example when k = 6, α6 = 5, the third
Catalan number, and the corresponding diagrams are the five non-crossing pairings
on [6].
1 1 1 1 1
6 2 6 2 6 2 6 2 6 2

5 3 5 3 5 3 5 3 5 3
4 4 4 4 4

To understand the fluctuations we shall introduce another type of planar diagram,

this time on an annulus. We shall confine our discussion to ensembles that have what
we shall call a second-order limiting distribution.

Definition 2. Let {XN }N be a sequence of random matrices. We say that {XN }N has
a second-order limiting distribution if there are sequences {αk }k and {α p,q } p,q such
that
◦ for all k, αk = limN E(tr(XNk )) and
◦ for all p ≥ 1 and q ≥ 1,

α p.q = lim cov Tr(XNp ), Tr(XNq )

◦ for all r > 2 and all integers p1 , . . . , pr ≥ 1

lim kr Tr(XNp1 ), Tr(XNp2 ), . . . , Tr(XNpr ) = 0.

Here, kr are the classical cumulants; note that the αk are the limits of k1 (which
is the expectation) and α p,q are the limits of k2 (which is the covariance).
130 5 Second-Order Freeness

Remark 3. Note that the first condition says that XN has a limiting eigenvalue dis-
tribution in the averaged sense. By the second condition the variances of normal-
ized traces go asymptotically like 1/N 2 . Thus, by Remark 4.2, the existence of a
second-order limiting distribution implies actually almost sure convergence to the
limit distribution.

We shall next show that the GUE has a second-order limiting distribution. The
numbers {α p,q } p,q that are obtained have an important combinatorial significance
as the number of non-crossing annular pairings. Informally, a pairing of the (p, q)-
annulus is non-crossing or planar if when we arrange the numbers 1, 2, 3, . . . , p
in clockwise order on the outer circle and the numbers p + 1, . . . , p + q in counter-
clockwise order on the inner circle there is a way to draw the pairings so that the
lines do not cross and there is at least one string that connects the two circles. For
example α4,2 = 8 and the eight drawings are shown below.

1 1 1 1

4 5 2 4 5 4 5 2 4 5 2
6 6 2 6 6

3 3 3 3
1 1 1 1

4 5 2 4 5 4 5 4 5
6 6 2 6 2 6 2

3 3 3 3

In Definition 2.7 we defined a partition π of [n] to be non-crossing if a certain

configuration, called a crossing, did not appear. A crossing was defined to be four
points a < b < c < d ∈ [n] such that a and c are in one block of π and b and d
are in another block of π. In [126] a permutation of [p + q] was defined to be a non-
crossing annular permutation if no one of five proscribed configurations appeared. It
was then shown that under a connectedness condition this definition was equivalent
to the algebraic condition #(π) + #(π −1 γ) = p + q, where γ = (1, 2, 3, . . . , p)(p +
1, . . . , p + q). In [129, §2.2] another definition was given.
Here we wish to present a natural topological definition (Definition 5) and show
that it is equivalent to the algebraic condition in [126]. The key idea is to relate a non-
crossing annular permutation to a non-crossing partition and then use an algebraic
condition found by Biane [33]. To state the theorem of Biane (Theorem 4) it is
necessary to regard a partition as a permutation by putting the elements of its blocks
in increasing order. It is also convenient not to distinguish notationally between a
partition and the corresponding permutation.
As before, we denote by #(π) the number of blocks or cycles of π. We let (i, j)
denote the transposition that switches i and j.
5.1 Fluctuations of GUE random matrices 131

The following theorem tells us when a permutation came from a non-crossing

partition. See [140, Prop. 23.23] for a proof. The proof uses induction and two sim-
ple facts about permutations.
◦ If π ∈ Sn and i, j ∈ [n] then

#(π (i, j)) = #(π) + 1 if i and j are in the same cycle of π

#(π (i, j)) = #(π) − 1 if i and j are in different cycles of π.

◦ If |π| is the minimum number of factors among all factorizations of π into a
product of transpositions, then

|π| + #(π) = n. (5.2)

Theorem 4. Let γn denote the permutation in Sn which has the one cycle (1, 2, 3, . . . ,
n). For all π ∈ Sn we have

#(π) + #(π −1 γn ) ≤ n + 1; (5.3)

and π, considered as a partition, is non-crossing if and only if

#(π) + #(π −1 γn ) = n + 1. (5.4)

Definition 5. The (p, q)-annulus is the annulus with the integers 1 to p arranged
clockwise on the outside circle and p + 1 to p + q arranged counterclockwise on the
inner circle. A permutation π in S p+q is a non-crossing permutation on the (p, q)-
annulus (or just: a non-crossing annular permutation) if we can draw the cycles of
π between the circles of the annulus so that
(i) the cycles do not cross,
(ii) each cycle encloses a region between the circles homeomorphic to the disc with
boundary oriented clockwise, and
(iii) at least one cycle connects the two circles.
We denote by SNC (p, q) the set of non-crossing permutations on the (p, q)-annulus.
The subset consisting of non-crossing pairings on the (p, q)-annulus is denoted by
NC2 (p, q).

Example 6. Let p = 5 and q = 3 and π1 = (1, 2, 8, 6, 5)(3, 4, 7) and π2 = (1, 2, 8, 6, 5)

(3, 7, 4). Then π1 is a non-crossing permutation of the (5, 3)-annulus; we can find a
drawing which satisfies (i) and (ii) of Definition 5:
132 5 Second-Order Freeness

1
5 2
6 8
7
4 3

But for π2 we can find a drawing satisfying one of (i) or (ii) but not both. Notice also
that if we try to draw π1 on a disc we will have a crossing, so π1 is non-crossing on
the annulus but not on the disc. See also Fig. 5.2.
Notice that when we have a partition π of [n] and we want to know if π is non-
crossing in the disc sense, property (ii) of Definition 5 is automatic because we
always put the elements of the blocks of π in increasing order.
Remark 7. Note that in general we have to distinguish between non-crossing an-
nular permutations and the corresponding partitions. On the disc the non-crossing
condition ensures that for each π ∈ NC(n) there is exactly one corresponding non-
crossing permutation (by putting the elements in a block of π in increasing order to
read it as a cycle of a permutation). On the annulus, however, this one-to-one cor-
respondence breaks down. Namely, if π ∈ SNC (p, q) has only one through-cycle (a
through-cycle is a cycle which contains elements from both circles), then the block
structure of this cycle is not enough to recover its cycle structure. For example, in
SNC (2, 2) we have the following four non-crossing annular permutations:

(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 4, 2), (1, 4, 3, 2).

As partitions all four are the same, having one block {1, 2, 3, 4}; but as permutations
they are all different. It is indeed the permutations, and not the partitions, which are
relevant for the description of the fluctuations. One should, however, also note that
this difference disappears if one has more than one through-cycle. Also for pairings
there is no difference between non-crossing annular permutations and partitions.
This justifies the notation NC2 (p, q) in this case.

Exercise 1. (i) Let π1 and π2 be two non-crossing annular permutations in SNC (p, q),
which are the same as partitions. Show that if they have more than one through-
cycle, then π1 = π2 .
(ii) Show that the number of non-crossing annular permutations which are the
same as partitions is, in the case of one through-cycle, given by mn, where m and n
are the number of elements of the through-cycle on the first and the second circle,
respectively.

Theorem 8. Let γ = (1, 2, 3, . . . , p)(p + 1, . . . , p + q) and π ∈ S p+q be a permutation

that has at least one cycle that connects the two cycles of γ. Then π is a non-crossing
permutation of the (p, q)-annulus if and only if #(π) + #(π −1 γ) = p + q.
5.1 Fluctuations of GUE random matrices 133

1 1
8 2
5 2
6
7 3
7 8

6 4
4 3
5
Fig. 5.2 Consider the permutation π = (1, 5)(2, 6)(3, 4, 7, 8). As a disc permutation, it cannot be
drawn in a non-crossing way. However on the (5, 3)-annulus it has a non-crossing presentation.
Note that we have π −1 γ5,3 = (1, 6, 4)(2, 8)(3)(5)(7). So #(π) + #(π −1 γ8 ) = 8.

Proof: We must show that the topological property of Definition 5 is equivalent to

the algebraic property #(π) + #(π −1 γ) = p + q. A similar equivalence was given in
Theorem 4; and we shall use this equivalence to prove Theorem 8.
To begin let us observe that if π is a non-crossing partition of [p + q] we can
deform the planar drawing for π on the disc into a drawing on the annulus satisfying
the two first conditions of Definition 5 as follows: we deform the disc so that it
appears as an annulus with a channel with one side between p and p + 1 and the
other between p + q and 1. We then close the channel and obtain a non-crossing
permutation of the (p, q)-annulus.

1
8 2 1 1
2 5 2
7 3 5 8 6 8
6
7 7
6 4
5 4 3 4 3

We have thus shown that every non-crossing partition of [p + q] that satisfies

the connectedness condition gives a non-crossing annular permutation of the (p, q)-
annulus. We now wish to reverse the procedure.
So let us start with π a non-crossing permutation of the (p, q)-annulus. We chose
i such that i and π(i) are on different circles, in fact we can assume that 1 ≤ i ≤ p
and p + 1 ≤ π(i) ≤ p + q. Such an i always exists because π always has at least one
cycle that connects the two circles. We then cut the annulus by making a channel
from i to γ −1 π(i). In the illustration below i = 4.
134 5 Second-Order Freeness

1
1 1 5 2
5 2 5 2
6 8 6 8 6 3
7 7
8 4
4 3 4 3
7

Hence our π is non-crossing in the disc; however, the order of the points on the
disc produced by cutting the annulus is not the standard order – it is the order given
by

γ̃ = γ (i, γ −1 π(i))
= (1, . . . , i, π(i), γ(π(i)), . . . , p + q, p + 1, . . . , γ −1 (π(i)), γ(i), . . . , p).

Thus we must show that for i and π(i) on different circles the following are equiva-
lent
(a) π is non-crossing in the disc with respect to γ̃ = γ (i, γ −1 π(i)), and
(b) #(π) + #(π −1 γ) = p + q.
If i and π(i) are in different cycles of γ, then i and π −1 γ(i) are in the same
cycle of π −1 γ. Hence #(π −1 γ (i, π −1 γ(i))) = #(π −1 γ) + 1. Thus #(π) + #(π −1 γ̃)
= #(π) + #(π −1 γ) + 1. Since γ̃ has only one cycle we know, by Theorem 4, that π
is non-crossing with respect to γ̃ if and only if #(π) + #(π −1 γ̃) = p + q + 1. Thus π
is non-crossing with respect to γ̃ if and only if #(π) + #(π −1 γ) = p + q. This shows
the equivalence of (a) and (b).
This result is part of a more general theory of maps on surfaces found by Jacques
[102] and Cori [61]. Suppose we have two permutations π and γ in Sn and that π
and γ generate a subgroup of Sn that acts transitively on [n]. Suppose also that γ has
k cycles and we draw k discs on a surface of genus g and arrange the points in the
cycles of γ around the circles so that when viewed from the outside the numbers
appear in the same order as in the cycles of γ. We then draw the cycles of π on the
surface such that
◦ the cycles do not cross, and
◦ each cycle of π is the oriented boundary of a region on the sphere, oriented with
an outward pointing normal, homeomorphic to a disc.
The genus of π relative to γ is the smallest g such that the cycles of π can be drawn
on a surface of genus g. When g = 0, i.e. we can draw π on a sphere, we say that π
is γ-planar.
In the example below we let n = 3, γ = (1, 2, 3) and, in the first example π1 =
(1, 2, 3) and in the second π2 = (1, 3, 2).
5.1 Fluctuations of GUE random matrices 135

2 2
3 1 3 1
Since π1 and π2 have only one cycle there is no problem with the blocks crossing;
it is only to get the correct orientation that we must add a handle for π2 .

Theorem 9. Suppose π, γ ∈ Sn generate a subgroup which acts transitively on [n]

and g is the genus of π relative to γ. Then

#(π) + #(π −1 γ) + #(γ) = n + 2(1 − g). (5.5)

Sketch The idea of the proof is to use Euler’s formula for the surface of genus
g on which we have drawn the cycles of π, as in the definition. Each cycle of γ is
a disc numbered according to γ and we shrink each of these to a point to make the
vertices of our simplex. Thus V = #(γ). The resulting surface will have one face for
each cycle of π and one for each cycle of π −1 γ. Thus F = #(π) + #(π −1 γ). Finally
the edges will be the boundaries between the cycles of π and the cycles of π −1 γ,
there will be n of these. Thus 2(1 − g) = F − E +V = #(π) + #(π −1 γ) − n + #(γ).

Remark 10. The requirement that the subgroup generated by π and γ act transitively
is needed to get a connected surface. In the disconnected case we can replace 2(1 −
g) by the Euler characteristic of the union of the surfaces.

Now let us return to our discussion of the second-order limiting distribution of

the GUE.

Theorem 11. Let {XN }N be the GUE. Then {XN }N has a second-order limiting
distribution with fluctuation moments {α p,q } p,q where α p,q is the number of non-
crossing pairings on a (p, q)-annulus.

Proof: We have already seen in Theorem 1.7, that

αk = lim E(tr(XNk ))
N

exists for all k, and is given by the number of non-crossing pairings of [k]. Let us
next fix r ≥ 2 and positive integers p1 , p2 , . . . , pr and we shall find a formula for
kr (Tr(XNp1 ), Tr(XNp2 ), . . . , Tr(XNpr )).
We shall let p = p1 + p2 + · · · + pr and γ be the permutation in S p with the r
cycles
136 5 Second-Order Freeness

γ = (1, 2, 3, . . . , p1 )(p1 + 1, . . . , p1 + p2 ) · · · (p1 + · · · + pr−1 + 1, . . . , p1 + · · · + pr ).

Now, with XN = ( fi j )Ni,j=1 ,

E(Tr(XNp1 ) · · · Tr(XNpr )) = ∑ E( fi1 ,i2 fi2 ,i3 · · · fi p1 ,i1 · fi p1 +1 ,i p1 +2 · · · fi p1 +p2 ,i p1 +1 × · · ·

· · · × fi p1 +···+pr−1 +1 ,i p1 +···+pr−1 +2 · · · fi p1 +···+pr ,i p1 +···+pr−1 +1 )
= ∑ E( fi1 ,iγ(1) · · · fi p ,iγ(p) )

because the indices of the f ’s follow the cycles of γ.

Recall that Wick’s formula (1.8) tells us how to calculate the expectation of a
product of Gaussian random variables. In particular, the expectation will be 0 unless
the number of factors is even. Thus we must have p even and

E( fi1 ,iγ(1) · · · fi p ,iγ(p) ) = ∑ Eπ ( fi1 ,iγ(1) , . . . , fi p ,iγ(p) ).

π∈P2 (p)

Given a pairing π and a pair (s,t) of π, E( fis ,iγ(s) fit ,iγ(t) ) will be 0 unless is = iγ(t) and
it = iγ(s) . Following our usual convention of regarding partitions as permutations and
a p-tuple (i1 , . . . , i p ) as a function i : [p] → [N], this last condition can be written as
i(s) = i(γ(π(s))) and i(t) = i(γ(π(t))). Thus for Eπ ( fi1 ,iγ(1) , . . . , fi p ,iγ(p) ) to be non-
zero we require i = i◦γ ◦π, or the function i to be constant on the cycles of γπ. When
Eπ ( fi1 ,iγ(1) , . . . , fi p ,iγ(p) ) 6= 0 it equals N −p/2 (by our normalization of the variance,
E(| fi j |2 ) = 1/N). An important quantity will then be the number of functions i :
[p] → [N] that are constant on the cycles of γπ; since we can choose the value of the
function arbitrarily on each cycle this number is N #(γπ) . Hence
N
E(Tr(XNp1 ) · · · Tr(XNpr )) = ∑ ∑ Eπ ( fi1 ,iγ(1) , . . . , fi p ,iγ(p) )
i1 ,...,i p =1 π∈P2 (p)
N
= ∑ ∑ Eπ ( fi1 ,iγ(1) , . . . , fi p ,iγ(p) )
π∈P2 (p) i1 ,...,i p =1

N −p/2 · # {i : [p] → [N] | i = i ◦ γ ◦ π}

= ∑
π∈P2 (p)

= ∑ N #(γπ)−p/2 .
π∈P2 (p)

The next step is to find which pairings π contribute to the cumulant kr . Recall
that if Y1 , . . . ,Yr are random variables then

kr (Y1 , . . . ,Yr ) = ∑ Eσ (Y1 ,Y2 , . . . ,Yr ) µ(σ , 1r )

σ ∈P (r)

where µ is the Möbius function of the partially ordered set P(r), see Exercise 1.14.
If σ is a partition of [r] there is an associated partition σ̃ of [p] where each block of
5.1 Fluctuations of GUE random matrices 137

σ̃ is a union of cycles of γ, in fact if s and t are in the same block of σ then the rth
and sth cycles of γ

(p1 + · · · + ps−1 + 1, . . . , p1 + · · · + ps ) and (p1 + · · · + pt−1 + 1, . . . , p1 + · · · + pt )

are in the same block of σ̃ . Using the same calculation as was used above we have
for σ ∈ P(r)

Eσ (Tr(XNp1 ), . . . , Tr(XNpr )) = ∑ N #(γπ)−p/2 .

π∈P2 (p)
π≤σ̃

Now given π ∈ P(p) we let π̂ be the partition of [r] such that s and t are in the same
block of π̂ if there is a block of π that contains both elements of sth and t th cycles
of π. Thus

kr (Tr(XNp1 ), . . . , Tr(XNpr )) = ∑ µ(σ , 1r ) ∑ N #(γπ)−p/2

σ ∈P (r) π∈P2 (p)
π≤σ̃

= ∑ N #(γπ)−p/2 ∑ µ(σ , 1r ).
π∈P2 (p) σ ∈P (r)
σ ≥π̂

A fundamental fact of the Möbius function is that for an interval [σ1 , σ2 ] in P(r)
we have ∑σ1 ≤σ ≤σ2 µ(σ , σ2 ) = 0 unless σ1 = σ2 in which case the sum is 1. Thus
∑σ ≥π̂ µ(σ , 1r ) = 0 unless π̂ = 1r in which case the sum is 1. Hence

kr (Tr(XNp1 ), . . . , Tr(XNpr )) = ∑ N #(γπ)−p/2 .

π∈P2 (p)
π̂=1r

When π̂ = 1r the subgroup generated by γ and π acts transitively on [p] and thus
Euler’s formula (5.5) can be applied. Thus for the π which appear in the sum we
have

#(γπ) = #(π −1 γ)
= p + 2(1 − g) − #π − #γ
= p + 2(1 − g) − p/2 − r
= p/2 + 2(1 − g) − r,

and thus #(γπ) − p/2 = 2 − r − 2g. So the leading order of kr , corresponding to the
γ-planar π, is given by N 2−r . Taking the limit N → ∞ gives the assertion. It shows
that kr goes to zero for r > 2, and for r = 2 the limit is given by the number of
γ-planar π, i.e., by #(NC2 (p, q)).
138 5 Second-Order Freeness

5.2 Fluctuations of several matrices

Up to now we have looked on the limiting second-order distribution of one GUE
random matrix. One can generalize those calculations quite easily to the case of
several independent GUE random matrices.
(N) (N)
Exercise 2. Suppose X1 , . . . , Xs are s independent N × N GUE random matrices.
Then we have, for all p, q ≥ 1 and for all 1 ≤ r1 , . . . , r p+q ≤ s that
(N) (N) (N) (N) (r)
lim k2 Tr(Xr1 · · · Xr p ), Tr(Xr p+1 · · · Xr p+q ) = # NC2 (p, q) ,
N

(r)
where NC2 (p, q) denotes the non-crossing annular pairings which respect the
colour, i.e., those π ∈ NC2 (p, q) such that (k, l) ∈ π only if rk = rl . Furthermore,
all higher order cumulants of unnormalized traces go to zero.
Maybe more interesting is the situation where we also include deterministic ma-
trices. Similarly to the first order case, we expect to see some second-order freeness
structure appearing there. Of course, the calculation of the asymptotic fluctuations
of mixed moments in GUE and deterministic matrices will involve the (first order)
limiting distribution of the deterministic matrices. Let us first recall what we mean
by this.
Definition 12. Suppose that we have, for each N ∈ N, deterministic N × N matrices
(N) (N)
D1 , . . . , Ds ∈ MN (C) and a non-commutative probability space (A, ϕ) with ele-
ments d1 , . . . , ds ∈ A such that we have for each polynomial p ∈ Chx1 , . . . , xs i in s
non-commuting variables
(N) (N)
lim tr p(D1 , . . . , Ds ) = ϕ p(d1 , . . . , ds ) .
N

(N) (N)
Then we say that (D1 , . . . , Ds )N has a limiting distribution given by (d1 , . . . , ds ) ∈
(A, ϕ).
(N) (N)
Theorem 13. Suppose X1 , . . . , Xs are s independent N × N GUE random matri-
(N) (N)
ces. Fix p, q ≥ 1 and let {D1 , . . . , D p+q } ⊆ MN (C) be deterministic N × N matri-
ces with limiting distribution given by d1 , . . . , d p+q ∈ (A, ϕ). Then we have for all
1 ≤ r1 , . . . , r p+q ≤ s that

(N) (N) (N) (N) (N) (N) (N) (N)
lim k2 Tr(D1 Xr1 · · · D p Xr p ), Tr(D p+1 Xr p+1 · · · D p+q Xr p+q )
N
= ∑ ϕγ p,q π (d1 , . . . , d p+q ),
(r)
π∈NC2 (p,q)

where the sum runs over all π ∈ NC2 (p, q) such that (k, l) ∈ π only if rk = rl and
where
γ p,q = (1, . . . , p)(p + 1, . . . , p + q) ∈ S p+q . (5.6)
5.2 Fluctuations of several matrices 139

Proof: Let us first calculate the expectation of the product of the two traces. For
better legibility, we suppress in the following the upper index N. We write as usual
(N) (r) (N) (p) (r)
Xr = ( fi j ) and D p = (di j ). We will denote by P2 (p + q) the pairings of
(r)
[p +q] which respect the colour r = (r1 , . . . , r p+q ), and by P2,c (p +q) the pairings in
(r)
P2 (p+q) where at least one pair connects a point in [p] to a point in [p+1, p+q] =
{p + 1, p + 2, . . . , p + q}.

E Tr(D1 Xr1 · · · D p Xr p )Tr(D p+1 Xr p+1 · · · D p+q Xr p+q )

(1) (r ) (2) (p) (r ) (p+1) (p+q) (r p+q )
= ∑ E di1 j1 f j1 i12 di2 i3 · · · di p j p f j p ip1 · di p+1 j p+1 · · · di p+q j p+q f j p+q i p+1
i1 ,...,i p+q
j1 ,..., j p+q

(r ) (r p+q ) (1) (p+q)
= ∑ E f j1 i12 · · · f j p+q i p+1 di1 j1 · · · di p+q j p+q
i1 ,...,i p+q
j1 ,..., j p+q
(1) (p+q)
= ∑ ∑ N −(p+q)/2 δ j,i◦γ p,q ◦π · di1 j1 · · · di p+q j p+q
i1 ,...,i p+q (r)
π∈P2 (p+q)
j1 ,..., j p+q
(1) (p+q)
= N −(p+q)/2 ∑ ∑ di1 j1 · · · di p+q j p+q
(r) i ,...,i p+q
π∈P2 (p+q) 1
j1 ,..., j p+q
j=i◦γ p,q ◦π
(1) (p+q)
= N −(p+q)/2 ∑ ∑ di1 ,iγ · · · di p+q ,iγ
p,q ◦π(1) p,q ◦π(p+q)
(r) i ,...,i p+q
π∈P2 (p+q) 1

= N −(p+q)/2 ∑ Trγ p,q π (D1 , . . . , D p+q ).

(r)
π∈P2 (p+q)

Thus, by subtracting the disconnected pairings, we get for the covariance

k2 (Tr(D1 Xr1 · · · D p Xr p ),Tr(D p+1 Xr p+1 · · · D p+q Xr p+q ))

= ∑ N −(p+q)/2 Trγ p,q π (D1 , . . . , D p+q )
(r)
π∈P2,c (p+q)

= ∑ N #(γ p,q π)−(p+q)/2 trγ p,q π (D1 , . . . , D p+q ).

(r)
π∈P2,c (p+q)

For π ∈ P2,c (p + q) we have #(π) + #(γ p,q π) + #(γ p,q ) = p + q + 2(1 − g), and
hence #(γ p,q π) − p+q
2 = −2g. The genus g is always ≥ 0 and equal to 0 only when
π is non-crossing. Thus

k2 (Tr(D1 Xr1 · · · D p Xr p ), Tr(D p+1 Xr p+1 · · · D p+q Xr p+q ))

= ∑ trγ p,q π (D1 , . . . , D p+q ) + O(N −1 ),
(r)
π∈NC2 (p,q)
140 5 Second-Order Freeness

and the assertion follows by taking the limit N → ∞.

Remark 14. Note that Theorem 13 shows that the variance of the corresponding nor-
malized traces is O(N −2 ). Indeed the theorem shows that the variance of the unnor-
malized traces converges, so by normalizing the trace we get that the variance of the
normalized traces decreases like N −2 . This proves then the almost sure convergence
claimed in Theorem 4.4.
(N) (N)
We would like to replace the deterministic matrices D1 , . . . , D p+q in Theorem
13 by random matrices and see if we can still conclude that the variances of the
normalized mixed traces decrease like N −2 . As was observed at the end of Section
4.2 we have to assume more than just the existence of a limiting distribution of the
D(N) ’s. In the following definition we isolate this additional property.
(N) (N)
Definition 15. We shall say the random matrix ensemble {D1 , . . . , D p }N has
bounded higher cumulants if we have for all r ≥ 2 and for any unnormalized traces
(N) (N)
Y1 , . . . ,Yr of monomials in D1 , . . . , D p that

sup |kr (Y1 , . . . ,Yr )| < ∞.

Note that this is a property of the algebra generated by the D’s. We won’t prove
it here but for many examples we have kr (Y1 , . . . ,Yr ) = O(N 2−r ) with the Yi ’s as
above. These examples include the GUE, Wishart, and Haar distributed unitary ran-
dom matrices.
(N) (N)
Theorem 16. Suppose X1 , . . . , Xs are s independent N × N GUE random matri-
(N) (N)
ces. Fix p, q ≥ 1 and let {D1 , . . . , D p+q } ⊆ MN (C) be random N × N matrices
with a limiting distribution and with bounded higher cumulants. Then we have for
all 1 ≤ r1 , . . . , r p+q ≤ s that

(N) (N) (N) (N) (N) (N) (N) (N)
k2 tr(D1 Xr1 · · · D p Xr p ), tr(D p+1 Xr p+1 · · · D p+q Xr p+q ) = O(N −2 ).

Proof: We rewrite the proof of Theorem 13 with the change that the D’s are now
random to get

E Tr(D1 Xr1 · · · D p Xr p )Tr(D p+1 Xr p+1 · · · D p+q Xr p+q )

= N −(p+q)/2

∑ E Trγ p,q π (D1 , . . . , D p+q ) ,
(r)
π∈P2 (p+q)

and

E Tr(D1 Xr1 · · · D p Xr p ) · E Tr(D p+1 Xr p+1 · · · D p+q Xr p+q )
= N −(p+q)/2 ∑ E Trγ p π1 (D1 , . . . , D p ) · E Trγq π2 (D p+1 , . . . , D p+q ) .

(r)
π1 ∈P2 (p)
(r)
π2 ∈P2 (q)
5.2 Fluctuations of several matrices 141

Here, γ p denotes as usual the one cycle permutation γ p = (1, 2, . . . , p) ∈ S p , and

(r) (r)
similar for γq . We let P2,d (p+q) be the pairings in P2 (p+q) which do not connect
(r) (r) (r)
[p] to [p + 1, p + q]. Then we can write P2 (p + q) = P2,c (p + q) ∪ P2,d (p + q), as
(r) (r) (r)
a disjoint union. Moreover we can identify P2,d (p + q) with P2 (p) × P2 (q).
Thus, by subtracting the disconnected pairings, we get for the covariance

k2 Tr(D1 Xr1 · · · D p Xr p ), Tr(D p+1 Xr p+1 · · · D p+q Xr p+q )

∑ N −(p+q)/2 E Trγ p,q π (D1 , . . . , D p+q )

=
(r)
π∈P2,c (p+q)

N −(p+q)/2 E Trγ p,q π1 π2 (D1 , . . . , D p+q )

+ ∑
(r)
π1 ∈P2 (p)
(r)
π2 ∈P2 (q)

N −(p+q)/2 E Trγ p π1 (D1 , . . . , D p ) · E Trγq π2 (D p+1 , . . . , D p+q )

− ∑
(r)
π1 ∈P2 (p)
(r)
π2 ∈P2 (q)

N −(p+q)/2 E Trγ p,q π (D1 , . . . , D p+q )

= ∑
(r)
π∈P2,c (p+q)

N −(p+q)/2 k2 Trγ p π1 (D1 , . . . , D p ), Trγq π2 (D p+1 , . . . , D p+q ) .

+ ∑
(r)
π1 ∈P2 (p)
(r)
π2 ∈P2 (q)

We shall show that both of these terms are O(1), and thus after normalizing the
traces k2 = O(N −2 ). For the first term this is the same argument as in the proof
(r) (r)
of Theorem 5.13. So let π1 ∈ P2 (p) and π2 ∈ P2 (q). We let s = #(γ p π1 ) and
t = #(γq π2 ). Since γ p π1 has s cycles we may write Trγ p π1 (D1 , . . . , D p ) = Y1 · · ·Ys with
each Yi of the form Tr(Dl1 · · · Dlk ). Likewise since γq π2 has t cycles we may write
Trγq π2 (D p+1 , . . . , D p+q ) = Ys+1 · · ·Ys+t with the Y ’s of the same form as before. Now
by our assumption on the D’s we know that for u ≥ 2 we have ku (Yi1 , . . . ,Yiu ) = O(1).
Using the product formula for classical cumulants, see Equation (1.16), we have that

k2 (Y1 · · ·Ys ,Ys+1 · · ·Ys+t ) = ∑ kτ (Y1 , . . . ,Ys+t )

τ∈P (s+t)

where τ must connect [s] to [s+1, s+t]. Now kτ (Y1 , . . . ,Ys+t ) = O(N c ) where c is the
number of singletons in τ. Thus the order of N −(p+q)/2 kτ (Y1 , . . . ,Ys+t ) is N c−(p+q)/2 .
So we are reduced to showing that c ≤ (p+q)/2. Since τ connects [s] to [s+1, s+t],
τ must have a block with at least 2 elements. Thus the number of singletons is at
142 5 Second-Order Freeness

most s + t − 2. But s = #(γ p π1 ) ≤ p/2 + 1 and t = #(γq π2 ) ≤ q/2 + 1 by Corol-

lary 1.6. Thus c ≤ (p + q)/2 as claimed.

5.3 second-order probability space and second-order freeness

Recall that a non-commutative probability space (A, ϕ) consists of an algebra over
C and a linear functional ϕ : A → C, with ϕ(1) = 1. Such a non-commutative prob-
ability space is called tracial, if ϕ is also a trace, i.e., if ϕ(ab) = ϕ(ba) for all
a, b ∈ A.
To provide the general framework for second-order freeness we introduce now
the idea of a second-order probability space, (A, ϕ, ϕ2 ).

Definition 17. Let (A, ϕ) be a tracial non-commutative probability space. Suppose

that we have in addition a bilinear functional ϕ2 : A × A → C such that
◦ ϕ2 is symmetric in its two variables, i.e., we have ϕ2 (a, b) = ϕ2 (b, a) for all
a, b ∈ A
◦ ϕ2 is tracial in each variable
◦ ϕ2 (1, a) = 0 = ϕ2 (a, 1) for all a ∈ A.
Then we say that (A, ϕ, ϕ2 ) is a second-order non-commutative probability space.

Usually our second-order limit elements will arise as limits of random matrices;
where ϕ encodes the asymptotic behaviour of the expectation of traces, whereas
ϕ2 does the same for the covariances of traces. As we have seen before, in typical
examples (as the GUE) we should consider the expectation of the normalized trace
tr, but the covariances of the unnormalized traces Tr.
As we have seen in Theorem 16 one usually also needs some control over the
higher order cumulants; requiring bounded higher cumulants for the unnormalized
traces of the D’s was enough to control the variances of the mixed unnormalized
traces. However, as in the case of one matrix (see Definition 2), we will in the
following definition require instead of boundedness of the higher cumulants the
stronger condition that they converge to zero. This definition from [128] makes some
arguments easier, and is usually satisfied in all relevant random matrix models. Let
us point out that, as remarked in [127], the whole theory could also be developed
with the boundedness condition instead.
(N) (N)
Definition 18. Suppose we have a sequence of random matrices {A1 , . . . , As }N
and random variables a1 , . . . , as in a second-order non-commutative probability
(N) (N)
space. We say that (A1 , . . . , As )N has the second order limit (a1 , . . . , as ) if we
have:
◦ for all p ∈ Chx1 , . . . , xs i

(N) (N)
lim E tr(p(A1 , . . . , As )) = ϕ p(a1 , . . . , as ) ;
N
5.3 second-order probability space and second-order freeness 143

◦ for all p1 , p2 ∈ Chx1 , . . . , xs i

(N) (N) (N) (N)
lim cov Tr(p1 (A1 , . . . , As )), Tr(p2 (A1 , . . . , As )) =
N

ϕ2 p1 (a1 , . . . , as ), p2 (a1 , . . . , as ) ;

◦ for all r ≥ 3 and all p1 , . . . , pr ∈ Chx1 , . . . , xs i

(N) (N) (N) (N)
lim kr Tr(p1 (A1 , . . . , As )), . . . , Tr(pr (A1 , . . . , As )) = 0.
N

Remark 19. As in Remark 3, the second condition implies that we have almost sure
(N) (N)
convergence of the (first order) distribution of the {A1 , . . . , As }N . So in particu-
lar, if the a1 , . . . , as are free, then the existence of a second-order limit includes also
(N) (N)
the fact that A1 , . . . , As are almost surely asymptotically free.
Example 20. A trivial example of second-order limit is given by deterministic ma-
(N) (N)
trices. If {D1 , . . . , Ds } are deterministic N × N matrices with limiting distribu-
tion then kr (Y1 , . . . ,Yr ) = 0 for r > 1 and for any polynomials Yi in the D’s. So
(N) (N)
D1 , . . . , Ds has a second-order limiting distribution; ϕ is given by the limiting
distribution and ϕ2 is identically zero.
Example 21. Define (A, ϕ, ϕ2 ) by A = Chsi and

ϕ(sk ) = # NC2 (k) ϕ2 (s p , sq ) = # NC2 (p, q) .

and (5.7)

Then (A, ϕ, ϕ2 ) is a second-order probability space and s is, by Theorem 11, the
second-order limit of a GUE random matrix. In first order s is, of course, just a
semi-circular element in (A, ϕ). We will address a distribution given by (5.7) as a
second-order semi-circle distribution.

Exercise 3. Prove that the second-order limit of a Wishart random matrix with rate
c (see Section 4.5.1) is given by (A, ϕ, ϕ2 ) with A = Chxi and

ϕ(xn ) = ∑ c#(π) and ϕ2 (xm , xn ) = ∑ c#(π) . (5.8)

π∈NC(n) π∈SNC (m,n)

We will address a distribution given by (5.8) as a second-order free Poisson distri-

bution (of rate c).

Example 22. Define (A, ϕ, ϕ2 ) by A = Chu, u−1 i and, for k, p, q ∈ Z,

(
k 0, k 6= 0
ϕ(u ) = and ϕ2 (u p , uq ) = |p|δ p,−q . (5.9)
1, k = 0

Then (A, ϕ, ϕ2 ) is a second-order probability space and u is the second-order limit

of Haar distributed unitary random matrices. In first order u is of course just a Haar
144 5 Second-Order Freeness

unitary in (A, ϕ). We will address a distribution given by (5.9) as a second-order

Haar unitary.

Exercise 4. Prove the statement from the previous example: Show that for Haar
distributed N × N unitary random matrices U we have
(
|p|, if p = −q
lim k2 (Tr(U p ), Tr(U q )) =
N 0, otherwise

and that the higher order cumulants of unnormalized traces of polynomials in U and
U ∗ go to zero.

Example 23. Let us now consider the simplest case of several variables, namely the
limit of s independent GUE. According to Exercise 2 their second-order limit is given
by (A, ϕ, ϕ2 ) where A = Chx1 , . . . , xs i and
(r)
ϕ xr(1) · · · xr(k) = # NC2 (k)

and (r)
ϕ2 xr(1) · · · xr(p) , xr(p+1) · · · xr(p+q) = #(NC2 (p, q)).
In the same way as we used in Chapter 1 the formula for ϕ as our guide to the defi-
nition of the notion of freeness we will now have a closer look on the corresponding
formula for ϕ2 and try to extract from this a concept of second-order freeness.
As in the first order case, let us consider ϕ2 applied to alternating products of
centred variables, i.e., we want to understand
m n
ϕ2 (xim1 1 − cm1 1) · · · (xi p p − cm p 1), (xnj11 − cn1 1) · · · (x jqq − cnq 1) ,

where cm := ϕ(xim ) (which is independent of i). The variables are here assumed to
be alternating in each argument, i.e., we have

i1 6= i2 6= · · · 6= i p−1 6= i p and j1 6= j2 6= · · · =
6 jq−1 6= jq .

In addition, since the whole theory relies on ϕ2 being tracial in each of its arguments
(as the limit of variances of traces) we will actually assume that it is alternating in a
cyclic way, i.e., that we also have i p 6= i1 and jq 6= j1 .
Let us put m := m1 + · · · + m p and n := n1 + · · · + nq . Furthermore, we call the
consecutive numbers corresponding to the factors in our arguments “intervals”; so
the intervals on the first circle are

(1, . . . , m1 ), (m1 + 1, . . . , m1 + m2 ), . . . , (m1 + · · · + m p−1 + 1, . . . , m),

and the intervals on the second circle are

(m + 1, . . . , m + n1 ), . . . , (m + n1 + · · · + nq−1 + 1, . . . , m + n).
5.3 second-order probability space and second-order freeness 145

By the same arguing as in Chapter 1 one can convince oneself that the subtraction
of the means has the effect that instead of counting all π ∈ NC2 (m, n) we count
now only those where each interval is connected to at least one other interval. In
the first order case, because of the non-crossing property, there were no such π and
the corresponding expression was zero. Now, however, we can connect an interval
from one circle to an interval of the other circle, and there are possibilities to do this
m n
in a non-crossing way. Renaming ak := xik k − cmk 1 and bl := x jll − cnl 1 leads then
exactly to the formula which will be our defining property of second-order freeness
in the next definition.

Definition 24. Let (A, ϕ, ϕ2 ) be a second-order non-commutative probability space

and (Ai )i∈I a family of unital subalgebras of A. We say that (Ai )i∈I are free of
second order if (i) the subalgebras (Ai )i∈I are free of first order, i.e., in the sense of
§1.11 and (ii) the fluctuation moments of centred and cyclically alternating elements
can be computed from their ordinary moments in the following way. Recall that
a1 , . . . , an ∈ ∪i Ai are cyclically alternating if ai ∈ A ji and j1 6= j2 6= · · · jn 6= j1 .
The second condition (ii) is that given two tuples a1 , . . . , am and b1 , . . . , bn which are
centred and cyclically alternating then for (m, n) 6= (1, 1)
n−1 n
ϕ2 (a1 · · · am , b1 · · · bn ) = δmn ∑ ∏ ϕ(ai bk−i ), (5.10)
k=0 i=1

where the indices of bi are interpreted modulo n; when m = n = 1 we have

ϕ2 (a1 , b1 ) = 0 if a1 and b1 come from different Ai ’s.
second-order freeness for random variables or for sets is, as usual, defined as
second-order freeness for the unital subalgebras generated by the variables or the
sets, respectively.

Equation (5.10) has the following diagrammatic interpretation. A non-crossing

permutation of an (m, n)-annulus is called a spoke diagram if all cycles have just
two elements (i, j), and the elements are on different circles, i.e. i ∈ [m] and j ∈
[m + 1, m + n], see Fig. 5.3. We can only have a spoke diagram if m = n; the set of
spoke diagrams is denoted by Sp(n). With this notation, Equation (5.10) can also be
written as

ϕ2 (a1 · · · am , b1 · · · bn ) = δmn ∑ ϕπ (a1 , . . . , am , b1 , . . . , bn ). (5.11)

π∈Sp(n)

Exercise 5. Let A1 , A2 ⊂ A be free of second order in (A, ϕ, ϕ2 ) and consider

a1 , a2 ∈ A1 and b1 , b2 ∈ A2 . Show that the definition of second-order freeness im-
plies the following formula for the first non-trivial mixed fluctuation moment:

ϕ2 (a1 b1 , a2 b2 ) = ϕ(a1 a2 )ϕ(b1 b2 ) − ϕ(a1 a2 )ϕ(b1 )ϕ(b2 )

− ϕ(a1 )ϕ(a2 )ϕ(b1 b2 ) + ϕ(a1 )ϕ(a2 )ϕ(b1 )ϕ(b2 )
146 5 Second-Order Freeness

6
2
7
8
12

9 11
10
5
3

Fig. 5.3 The spoke diagram for π = (1, 8)(2, 7)(3, 12)(4, 11)(5, 10)(6, 9). For this permutation we
have ϕπ (a1 , . . . , a12 ) = ϕ(a1 a8 )ϕ(a2 a7 )ϕ(a3 a12 )ϕ(a4 a11 )ϕ(a5 a10 )ϕ(a6 a9 ).

+ ϕ2 (a1 , a2 )ϕ(b1 )ϕ(b2 ) + ϕ(a1 )ϕ(a2 )ϕ2 (b1 , b2 ).

Let us define now the asymptotic version of second-order freeness.

(N) (N) (N) (N)
Definition 25. We say {A1 , . . . , As }N and {B1 , . . . , Bt }N are asymptotically
free of second order if there is a second-order non-commutative probability space
(A, ϕ, ϕ2 ) and elements a1 , . . . , as , b1 , . . . , bt ∈ A such that
(N) (N) (N) (N)
◦ (A1 , . . . , As , B1 , . . . , Bt )N has a second-order limit (a1 , . . . , as , b1 , . . . , bt )
◦ {a1 , . . . , as } and {b1 , . . . , bt } are free of second order.
Remark 26. Note that asymptotic freeness of second order is much stronger than
having almost sure asymptotic freeness (of first order). According to Remark 19,
we can guarantee the latter by the existence of a second order limit plus freeness
of first order in the limit. Having also freeness of second order in the limit makes a
much more precise statement on the asymptotic structure of the covariances.
In Example 23 we showed that several independent GUE random matrices are
asymptotically free of second order. The same is also true if we include deterministic
matrices. This follows from the explicit description in Theorem 13 of the second
order limit in this case. We leave the proof of this as an exercise.
(N) (N)
Theorem 27. Let {X1 , . . . , Xs }N be s independent GUE’s and, in addition, let
(N) (N) (N)
{D1 , . . . , Dt }N be t deterministic matrices with limiting distribution. Then X1 , . . . ,
(N) (N) (N)
Xs , {D1 , . . . , Dt } are asymptotically free of second order.

Exercise 6. Prove Theorem 27 by using the explicit formula for the second-order
limit distribution given in Theorem 13.

Exercise 7. Show that Theorem 27 remains also true if the deterministic matrices are
replaced by random matrices which are independent from the GUE’s and which have
5.4 second-order cumulants 147

a second-order limit distribution. For this, upgrade first Theorem 16 to a situation

(N) (N)
where {D1 , . . . , D p+q } have a second-order limit distribution.
As in the first order case one can also show that Haar unitary random matrices are
asymptotically free of second order from deterministic matrices and, more generally,
from random matrices which have a second-order limit distribution and which are
independent from the Haar unitary random matrices; this can then be used to deduce
the asymptotic freeness of second-order between unitarily invariant ensembles. The
calculations in the Haar case rely again on properties of the Weingarten functions
and get a bit technical. Here we only state the results, we refer to [128] for the details
of the proof.
(k)
Definition 28. Let B1 , . . . , Bt be t N × N random matrices with entries bi j (k =
(k)
1, . . . ,t; i, j = 1, . . . , N). Let U ∈ UN be unitary and UBkU ∗ = (b̃i j )Ni,j=1 . If the joint
(k)
distribution of all entries {bi j | k = 1, . . . ,t; i, j = 1, . . . , N} is, for each U ∈ UN ,
(k)
the same as the joint distribution of all entries in the conjugated matrices {b̃i j |
k = 1, . . . ,t; i, j = 1, . . . , N}, then we say that the joint distribution of the entries of
B1 , . . . , Bt is unitarily invariant.
(N) (N) (N) (N)
Theorem 29. Let {A1 , . . . , As }N and {B1 , . . . , Bt }N be two ensembles of ran-
dom matrices such that
(N) (N)
◦ for each N, all entries of A1 , . . . , As are independent from all entries of
(N) (N)
B1 , . . . , Bt
(N) (N)
◦ for each N, the joint distribution of the entries of B1 , . . . , Bt is unitarily in-
variant
(N) (N) (N) (N)
◦ each of (A1 , . . . , As )N and (B1 , . . . , Bt )N has a second-order limiting dis-
tribution.
(N) (N) (N) (N)
Then {A1 , . . . , As }N and {B1 , . . . , Bt }N are asymptotically free of second or-
der.

5.4 second-order cumulants

In the context of usual (first order) freeness it was advantageous to go over from
moments to cumulants - the latter were easier to use to detect freeness, by the char-
acterization of the vanishing of mixed cumulants. In the same spirit, we will now
try to express also the fluctuations ϕ2 in terms of cumulants. The following theory
of second-order cumulants was developed in [59]. Let us reconsider our combina-
torial description of ϕ2 for two of our main examples. In the case of a second-order
semi-circular element (i.e., for the limit of GUE random matrices; see Example 21)
we have

ϕ2 (sm , sn ) = # NC2 (m, n) =

∑ 1 = ∑ κπ . (5.12)
π∈NC2 (m,n) π∈SNC (m,n)
148 5 Second-Order Freeness

The latter form comes from the fact that the free cumulants κn for semi-circulars
are 1 for n = 2 and zero otherwise, i.e., κπ is 1 for a non-crossing pairing, and zero
otherwise. For the second-order free Poisson (i.e, for the limit of Wishart random
matrices: see Exercise 3) we have

ϕ2 (xm , xn ) = ∑ c#(π) = ∑ κπ . (5.13)

π∈SNC (m,n) π∈SNC (m,n)

The latter form comes here from the fact that the free cumulants for a free Poisson
are all equal to c. So in both cases the value of ϕ2 is expressed as a sum over the an-
nular versions of non-crossing partitions, and each such permutation π is weighted
by a factor κπ , which is given by the product of first order cumulants, one factor κr
for each cycle of π of length r. This is essentially the same formula as for ϕ, the only
difference is that we sum over annular permutations instead over circle partitions.
However, it turns out that in general the term

∑ κπ (a1 , . . . , am , am+1 , . . . , am+n )

π∈SNC (m,n)

is only one part of ϕ2 (a1 · · · am , am+1 · · · am+n ); there will also be another contribu-
tion which involves genuine “second order cumulants”.
To see that we need in general such an additional contribution, let us rewrite the
expression from Exercise 5 for ϕ2 (a1 b1 , a2 b2 ), for {a1 , a2 } and {b1 , b2 } being free
of second order, in terms of first order cumulants.

ϕ2 (a1 b1 , a2 b2 ) = κ2 (a1 , a2 )κ2 (b1 , b2 ) + κ2 (a1 , a2 )κ1 (b1 )κ1 (b2 )

+ κ1 (a1 )κ1 (a2 )κ2 (b1 , b2 ) + something else.

The three displayed terms are the three non-vanishing terms κπ for π ∈ SNC (2, 2)
(there are of course more such π, but they do not contribute because of the vanishing
of mixed cumulants in free variables). But we have some additional contributions
which we write in the form

something else = κ1,1 (a1 , a2 )κ1 (b1 )κ1 (b2 ) + κ1 (a1 )κ1 (a2 )κ1,1 (b1 , b2 )

where we have set

κ1,1 (a1 , a2 ) := ϕ2 (a1 , a2 ) − κ2 (a1 , a2 ).

The general structure of the additional terms is the following. We have second-
order cumulants κm,n which have as arguments m elements from the first circle and
n elements from the second circle. As one already sees in the above simple example,
one only has summands which contain at most one such second-order cumulant as
factor. All the other factors are first order cumulants. So these terms can also be
written as κσ , but now σ is of the form σ = π1 × π2 ∈ NC(m) × NC(n) where one
5.4 second-order cumulants 149

1 9
8 2
7 3 12 10
6 4
5 11

Fig. 5.4 The second-order non-crossing annular partition σ = {(1, 2, 3), (4, 7), (5, 6), (8)} ×
{(9, 12), (10, 11)} . Its contribution in the moment cumulant formula is
κσ (a1 , . . . , a12 ) = κ3,2 (a1 , a2 , a3 , a9 , a12 )κ2 (a4 , a7 )κ2 (a5 , a6 )κ1 (a8 )κ2 (a10 , a11 ).

block of σ1 and one block of σ2 is marked. The two marked blocks go together as
arguments into a second-order cumulant, all the other blocks give just first order
cumulants. Let us make this more rigorous in the following definition.

Definition 30. The second-order non-crossing annular partitions, [NC(m)×NC(n)],

consist of elements σ = (π1 ,W1 ) × (π2 ,W2 ), where π1 × π2 ∈ NC(m) × NC(n) and
where W1 ∈ π1 and W2 ∈ π2 . The blocks W1 and W2 are designated as marked blocks
of π1 and π2 , respectively. In examples, we will often mark those blocks as boldface.

Definition 31. Let (A, ϕ, ϕ2 ) be a second-order probability space. The second-order

cumulants
κm,n : Am × An → C
are m + n-linear functionals on A, where we distinguish the group of the first m
arguments from the group of the last n arguments. Those second-order cumulants
are implicitly defined by the following moment-cumulant formula.

ϕ2 (a1 · · · am , am+1 · · · am+n ) =

∑ κπ (a1 , . . . , am+n ) + ∑ κσ (a1 , . . . , am+n ). (5.14)
π∈SNC (m,n) σ ∈[NC(m)×NC(n)]

Here we have used the following notation. For a π = {V1 , . . . ,Vr } ∈ SNC (m, n) we
put
r
κπ (a1 , . . . , am+n ) := ∏ κ#(Vi ) (ak )i∈Vk ,
i=1

where the κn are the already defined first order cumulants in the probability space
(A, ϕ). For a σ ∈ [NC(m) × NC(n)] we define κσ as follows. If σ = (π1 ,W1 ) ×
(π2 ,W2 ) is of the form π1 = {W1 ,V1 , . . . ,Vr } ∈ NC(m) and π2 = {W2 , Ṽ1 , . . . , Ṽs } ∈
NC(n), where W1 and W2 are the two marked blocks, then
150 5 Second-Order Freeness
r s
κσ (a1 , . . . , am+n ) := ∏ κ#(Vi ) (ak )k∈Vi · ∏ κ#(Ṽ j ) (al )l∈Ṽ j ·
i=1 j=1

κ#(W1 ),#(W2 ) (au )u∈W1 , (av )v∈W2 .

The first sum only involves first order cumulants and in the second sum each term
is a product of one second-order cumulant and some first order cumulants. Thus,
since we already know all first order cumulants, the first sum is totally determined
in terms of moments of ϕ. The second sum, on the other side, contains exactly the
highest order term κm,n (a1 , . . . , am+n ) and some lower order cumulants. Thus, by
recursion, we can again solve the moment-cumulant formulas for the determination
of κm,n (a1 , . . . , am+n ).

Example 32. 1) For m = n = 1, we have one first order contribution

π = (1, 2) ∈ SNC (1, 1)

and one second-order contribution

σ = {(1)} × {(2)} ∈ [NC(1) × NC(1)]

and thus we get

ϕ2 (a1 , a2 ) = κπ (a1 , a2 ) + κσ (a1 , a2 ) = κ2 (a1 , a2 ) + κ1,1 (a1 , a2 ).

By invoking the definition of κ2 , κ2 (a1 , a2 ) = ϕ(a1 a2 ) − ϕ(a1 )ϕ(a2 ), this can be

solved for κ1,1 in terms of moments with respect to ϕ and ϕ2 :

κ1,1 (a1 , a2 ) = ϕ2 (a1 , a2 ) − ϕ(a1 a2 ) + ϕ(a1 )ϕ(a2 ). (5.15)

2) For m = 2 and n = 1 we have four first order contributions in SNC (2, 1),

π1 = (1, 2, 3), π2 = (2, 1, 3), π3 = (1, 3)(2), π4 = (1)(2, 3)

and three second-order contributions in [NC(2) × NC(1)],

σ1 = {(1, 2)} × {(3)}, σ2 = {(1), (2)} × {(3)}, σ3 = {(1), (2)} × {(3)},

resulting in

ϕ2 (a1 a2 , a3 ) = κ3 (a1 , a2 , a3 ) + κ3 (a2 , a1 , a3 ) + κ2 (a1 , a3 )κ1 (a2 )

+ κ2 (a2 , a3 )κ1 (a1 ) + κ2,1 (a1 , a2 , a3 ) + κ1,1 (a1 , a3 )κ1 (a2 ) + κ1,1 (a2 , a3 )κ1 (a1 ).

By using the known formulas for κ1 , κ2 , κ3 , and the formula for κ1,1 from above,
this can be solved for κ2,1 :

κ2,1 (a1 , a2 , a3 ) = ϕ2 (a1 a2 , a3 ) − ϕ(a1 )ϕ2 (a2 , a3 ) − ϕ(a2 )ϕ2 (a1 , a3 )

5.4 second-order cumulants 151

− ϕ(a1 a2 a3 ) − ϕ(a1 a3 a2 ) + 2ϕ(a1 )ϕ(a2 a3 )

+ 2ϕ(a1 a3 )ϕ(a2 ) + 2ϕ(a1 a2 )ϕ(a3 ) − 4ϕ(a1 )ϕ(a2 )ϕ(a3 ).

Example 33. 1) Let s be a second-order semi-circular element, i.e., the second-order

limit of GUE random matrices, with second-order distribution as described in Ex-
ample 21. Then the second-order cumulants all vanish in this case, i.e., we have for
all m, n ∈ N
κn (s, . . . , s) = δn2 and κm,n (s, . . . , s) = 0.
2) For the second-order limit y of Wishart random matrices of parameter c (i.e.,
for a second-order free Poisson element) it follows from Exercise 3 that again all
second-order cumulants vanish and the distribution of y can be described as follows:
for all m, n ∈ N we have

κn (y, . . . , y) = c, and κm,n (y, . . . , y) = 0.

3) For an example with non-vanishing second-order cumulants, let us consider

the square a := s2 of the variable s from above. Then, by Equation (5.15), we have

κ1,1 (s2 , s2 ) = κ1,1 (a, a) = ϕ2 (a, a) − ϕ(aa) + ϕ(a)ϕ(a)

= ϕ2 (s2 , s2 ) − ϕ(s2 s2 ) + ϕ(s2 )ϕ(s2 ) = 2 − 2 + 1 = 1.

√ N
Exercise 8. Let XN = 1/ N xi j i, j=1 be a Wigner random matrix ensemble, where
xi j = x ji for all i, j; all xi j for i ≥ j are independent; all diagonal entries xii are
identically distributed according to a distribution ν; and all off-diagonal entries xi j ,
for i 6= j are identically distributed according to a distribution µ. Show that {XN }N
has a second-order limit x ∈ (A, ϕ, ϕ2 ) which is in terms of cumulants given by: all
µ
first order cumulants are zero but κ2x = k2 ; all second-order cumulants are zero but
µ µ µ
x = k ; where k and k are the 2nd and 4th classical cumulant of µ, respectively.
κ2,2 4 2 4

The usefulness of the notion of second-order cumulants comes from the follow-
ing second-order analogue of the characterization of freeness by the vanishing of
mixed cumulants.

Theorem 34. Let (A, ϕ, ϕ2 ) be a second-order probability space. Consider unital

subalgebras A1 , . . . , As ⊂ A. Then the following statements are equivalent:
(i) The algebras A1 , . . . , As are free of second order.
(ii) Mixed cumulants, both of first and second order, of the subalgebras vanish, i.e.:
◦ whenever we choose, for n ∈ N, a j ∈ Ai j ( j = 1, . . . , n) in such a way that
ik 6= il for some k, l ∈ [n] then the corresponding first order cumulants vanish,
κn (a1 , . . . , an ) = 0;
152 5 Second-Order Freeness

◦ and whenever we choose, for m, n ∈ N, a j ∈ Ai j ( j = 1, . . . , m + n) in such a

way that ik 6= il for some k, l ∈ [m + n] then the corresponding second-order
cumulants vanish, κm,n (a1 , . . . , am+n ) = 0.

Sketch Let us give a sketch of the proof. The statement about the first order
cumulants is just Theorem 2.14.
That the vanishing of mixed cumulants implies second-order freeness follows
quite easily from the moment-cumulant formula. In the case of cyclically alternating
centred arguments the only remaining contributions are given by spoke diagrams
and then the moment cumulant formula (5.14) reduces to the defining formula (5.11)
of second-order freeness.
For the other direction, note first that second-order freeness implies the vanish-
ing of κm,n (a1 , . . . , am+n ) = 0 whenever all the ai are centred and both groups of
arguments are cyclically alternating, i.e., i1 6= i2 6= · · · 6= im 6= i1 and im+1 6= im+2 6=
··· =
6 im+n 6= im+1 . Next, because centring does not change the value of second-
order cumulants, we can drop the assumption of centredness. For also getting rid of
the assumption that neighbours must be from different algebras one has, as in the
first order case (see Theorem 3.14), to invoke a formula for second-order cumulants
which have products as arguments.

In the following theorem we state the formula for the κm,n with products as argu-
ments. For the proof we refer to [132].

Theorem 35. Suppose n1 , . . . , nr , nr+1 , . . . , nr+s are positive integers, m := n1 +· · ·+

nr , n = nr+1 + · · · + nr+s . Given a second-order probability space (A, ϕ, ϕ2 ) and
a1 , a2 , . . . , am+n ∈ A, let

A1 = a1 · · · an1 , A2 = an1 +1 · · · an1 +n2 , , . . . , Ar+s = an1 +···+nr+s−1 +1 · · · am+n .

Then

κr,s (A1 , . . . , Ar , Ar+1 , . . . , Ar+s ) = ∑ κπ (a1 , . . . , am+n )

π∈SNC (m,n) with ...

+ ∑ κσ (a1 , . . . , am+n ), (5.16)

σ ∈[NC(m)×NC(n)] with ...

where the summation is over

(i) those σ = (π1 ,W1 ) × (π2 ,W2 ) ∈ [NC(m) × NC(n)] where π1 connects on one
circle the groups corresponding to A1 , . . . , Ar and π2 connects on the other circle
the groups corresponding to Ar+1 , . . . , Ar+s , where “connecting” is here used in
the same sense as in the first order case (see Theorem 2.13). More precisely, this
means that π1 ∨ {(1, . . . , n1 ), . . . , (n1 + · · · + nr−1 + 1, . . . , m)} = 1m and that
π2 ∨ {(m + 1, . . . , m + nr+1 ), . . . , (m + nr+1 + · · · + nr+s−1 + 1, . . . , m + n)} = 1n ;
note that the marked blocks do not play any role for this condition.
5.4 second-order cumulants 153

(ii) those π ∈ SNC (m, n) which connect the groups corresponding to all Ai on both
circles in the following annular way: for such a π all the groups must be con-
nected, but it is not possible to cut the annulus open by cutting on each of the two
circles between two groups.

Example 36. 1) Let us reconsider the second-order cumulant κ1,1 (A1 , A2 ) for A1 =
A2 = s2 from Example 33, by calculating it via the above theorem. Since all second-
order cumulants and all but the second first order cumulants of s are zero, in
the formula (5.16) there is no contributing σ and the only two possible π’s are
π1 = {(1, 3), (2, 4)} and π2 = {(1, 4), (2, 3)}. Both connect both groups (a1 , a2 ) and
(a3 , a4 ), but whereas π1 does this in an annular way, in the case of π2 the annulus
could be cut open outside these groups. So π1 contributes and π2 does not. Hence

κ1,1 (s2 , s2 ) = κπ1 (s, s, s, s) = κ2 (s, s)κ2 (s, s) = 1,

which agrees with the more direct calculation in Example 33.

2) Consider for general random variables a1 , a2 , a3 the cumulant κ1,1 (a1 a2 , a3 ).
The only contributing annular permutation in Equation (5.16) is π = (1, 3, 2) (note
that (1, 2, 3) connects the two groups (a1 , a2 ) and a3 , but not in an annular way),
whereas all second-order annular partitions in [NC(2) × NC(1)], namely

σ1 = {(1, 2)} × {(3)}, σ2 = {(1), (2)} × {(3)}, σ3 = {(1), (2)} × {(3)}

are permitted and thus we get

3
κ1,1 (a1 a2 , a3 ) = κπ (a1 , a2 , a3 ) + ∑ κσi (a1 , a2 , a3 )
i=1
= κ3 (a1 , a3 , a2 ) + κ2,1 (a1 , a2 , a3 )
+ κ1,1 (a1 , a3 )κ1 (a2 ) + κ1,1 (a2 , a3 )κ1 (a1 ).

As in the first order case one can, with the help of this product formula, also get a
version of the characterization of freeness in terms of vanishing of mixed cumulants
for random variables instead of subalgebras.

Theorem 37. Let (A, ϕ, ϕ2 ) be a second-order probability space. Consider a1 , . . . , as

∈ A. Then the following statements are equivalent:
(i) The variables a1 , . . . , as are free of second order.
(ii) Mixed cumulants, both of first and second-order, of the variables vanish, i.e.:
κn (ai1 , . . . , ain ) = 0 and κm,n (ai1 , . . . , aim+n ) = 0 for all m, n ∈ N and all 1 ≤ ik ≤ s
(for all relevant k) and such that among the variables there are at least two
different ones: there exist k, l such that ik 6= il .

Exercise 9. The main point in reducing this theorem to the version for subalgebras
consists in using the product formula to show that the vanishing of mixed cumulants
154 5 Second-Order Freeness

in the variables implies also the vanishing of mixed cumulants in elements in the
generated subalgebras. As an example of this, show that the vanishing of all mixed
first and second-order cumulants in a1 and a2 implies also the vanishing of the mixed
cumulants κ2,1 (a31 , a1 , a22 ) and κ1,2 (a31 , a1 , a22 ).

5.5 Functional relation between second-order moment and cumulant series

Let us now consider the situation where all our random variables are the same, a1 =
· · · = am+n = a. Then we write as before for the first order quantities αn := ϕ(an )
and κna := κn (a, . . . , a) and on second-order level αm,n := ϕ2 (am , an ) and κm,n
a :=

κm,n (a, . . . , a). The vanishing of mixed cumulants for free variables gives then again
that our cumulants linearize the addition of free variables.

Theorem 38. Let (A, ϕ, ϕ2 ) be a second-order probability space and let a, b ∈ A be

free of second order. Then we have for all m, n ∈ N

κna+b = κna + κnb and a+b

κm,n a
= κm,n b
+ κm,n . (5.17)

As in the first order case one can translate the combinatorial relation between mo-
ments and cumulants into a functional relation between generating power series. In
the following theorem we give this as a relation between the corresponding Cauchy
and R-transforms. Again, we refer to [59] for the proof and more details.

Theorem 39. The moment-cumulant relations

αn = ∑ κπ and αm,n = ∑ κπ + ∑ κσ
π∈NC(n) π∈SNC (m,n) σ ∈[NC(m)×NC(n)]

are equivalent to the functional relations

1
+ R(G(z)) = z (5.18)
G(z)

and
∂2 F(z) − F(w)
G(z, w) = G0 (z)G0 (w)R(G(z), G(w)) + log (5.19)
∂ z∂ w z−w
between the following formal power series: the Cauchy transforms
1 1
G(z) = ∑ αn z−n
z n≥0
and G(z, w) = ∑ αm,n z−m w−n
zw m,n≥1

and the R-transforms

1 1
R(z) = ∑ κn zn
z n≥1
and R(z, w) = ∑ κm,n zm wn ;
zw m,n≥1
5.5 Functional relation between moment and cumulant series 155

and where F(z) = 1/G(z).

Equation (5.19) can also be written in the form

0 0 1 1
G(z, w) = G (z)G (w) R(G(z), G(w)) + 2
− . (5.20)
(G(z) − G(w)) (z − w)2

Equation (5.18) is just the well-known functional relation (2.27) from Chapter
2 between first order moments and cumulants. Equation (5.19) determines a se-
quence of equations relating the first and second-order moments with the second-
order cumulants; if we also express the first order moments in terms of first or-
der cumulants, then this corresponds to the moment-cumulant relation αm,n =
∑π∈SNC (m,n) κπ + ∑σ ∈[NC(m)×NC(n)] κσ .
Note that formally the second term on the right-hand side of (5.19) can also be
written as
∂2 F(z) − F(w) ∂2 G(w) − G(z)
log = log ; (5.21)
∂ z∂ w z−w ∂ z∂ w z−w
but since (G(w) − G(z))/(z − w) has no constant term, the power series expansion
of log[(G(w) − G(z))/(z − w)] is not well-defined.
Below is a table, produced from (5.19), giving the first few equations.

α1,1 = κ1,1 + κ2
α1,2 = κ1,2 + 2κ1 κ1,1 + 2κ3 + 2κ1 κ2
α2,2 = κ2,2 + 4κ1 κ1,2 + 4κ12 κ1,1 + 4κ4 + 8κ1 κ3 + 2κ22 + 4κ12 κ2
α1,3 = κ1,3 + 3κ1 κ1,2 + 3κ2 κ1,1 + 3κ12 κ1,1 + 3κ4 + 6κ1 κ3 + 3κ22 + 3κ12 κ2
α2,3 = κ2,3 + 2κ1 κ1,3 + 3κ1 κ2,2 + 3κ2 κ1,2 + 9κ12 κ1,2 + 6κ1 κ2 κ1,1 + 6κ13 κ1,1
+ 6κ5 + 18κ1 κ4 + 12κ2 κ3 + 18κ12 κ3 + 12κ1 κ22 + 6κ13 κ2
α3,3 = κ3,3 + 6κ1 κ2,3 + 6κ2 κ1,3 + 6κ12 κ1,3 + 9κ12 κ2,2 + 18κ1 κ2 κ1,2 + 18κ13 κ1,2
+ 9κ22 κ1,1 + 18κ12 κ2 κ1,1 + 9κ14 κ1,1 + 9κ6 + 36κ1 κ5 + 27κ2 κ4 + 54κ12 κ4
+ 9κ32 + 72κ1 κ2 κ3 + 36κ13 κ3 + 12κ23 + 36κ12 κ22 + 9κ14 κ2 .

Remark 40. Note that the Cauchy transforms can also be written as

1 1
G(z) = lim E tr( ) =ϕ (5.22)
N→∞ z − AN z−a

and

1 1 1 1
G(z, w) = lim cov Tr( ), Tr( ) = ϕ2 , , (5.23)
N→∞ z − AN w − An z−a w−a

if AN has a as second-order limit distribution.

156 5 Second-Order Freeness

In the case where all the second-order cumulants are zero, i.e., R(z, w) = 0, Equa-
tion (5.19) expresses the second-order Cauchy transform in terms of the first order
Cauchy transform,

∂2 F(z) − F(w)

1 1
ϕ2 , = G(z, w) = log . (5.24)
z−a w−a ∂ z∂ w z−w

This applies then in particular to the GUE and Wishart random matrices; that in those
cases the second-order cumulants vanish follows from equations (5.12) and (5.13);
see also Example 33. In the case of Wishart matrices equation (5.24) (in terms of
G(z) instead of F(z), via (5.21)) was derived by Bai and Silverstein [14, 15].
However, there are also many important situations where the second-order cu-
mulants do not vanish and we need the full version of (5.19) to understand the
fluctuations. The following exercise gives an example for this.

Exercise 10. A circular element in first order is of the form

1
c := √ (s1 + is2 ), (5.25)
2
where s1 and s2 are free standard semi-circular elements, see Section 6.8. There we
will also show that such a circular element is in ∗-distribution the limit of a complex
Gaussian random matrix. Since the same arguments apply also to second order, we
define a circular element of second order by the Equation (5.25), where now s1 and
s2 are two semi-circular elements of second order which are also free of second
order. This means in particular, that all second-order cumulants in c and c∗ are zero.
We will in the following compare such a second-order circular element c with a
second-order semi-circular element s as defined in Example 21.
(i) Show that the first order distribution of s2 and cc∗ is the same, namely both
are free Poisson elements of rate 1.
(ii) Show that the second-order cumulants of s2 do not vanish.
(iii) Show that the second-order cumulants of cc∗ are all zero, hence cc∗ is a
second-order free Poisson element, of rate 1.
This shows that whereas s2 and cc∗ are the same in first order, their second-order
distributions are different.

5.6 Diagonalization of fluctuations

(N) (N)
Consider a sequence of random matrices (A1 , . . . , As )N which has a second or-
der limit (a1 , . . . , as ). Then we have for any polynomial p ∈ Chx1 , . . . , xs i that the
unnormalized trace of its centred version,
(N) (N) (N) (N)
Tr p(A1 , . . . , As ) − E(tr(p(A1 , . . . , As ))) 1N ,
5.6 Diagonalization of fluctuations 157

converges to a Gaussian variable. Actually, all such traces of polynomials converge

jointly to a Gaussian family (this is just the fact that we require in our definition of
a second-order limit distribution that all third and higher classical cumulants go to
zero) and the limiting covariance between two such traces for p1 and p2 is given
by ϕ2 (p1 (a1 , . . . , as ), p2 (a1 , . . . , as )). Often, we have a kind of explicit formula (of
a combinatorial nature) for the covariance between monomials in our variables; but
only in very rare cases this covariance is diagonal in those monomials. (An im-
portant instance where this actually happens is the case of Haar unitary random
matrices, see Example 22. Note that there we are dealing with the ∗-distribution,
and we are getting complex Gaussian distributions.) For a better understanding of
the covariance one usually wants to diagonalize it; this corresponds to going over to
Gaussian variables which are independent.
In the case of one GUE random matrix this diagonalization is one of the main
statements in Theorem 1, which was the starting point of this chapter. In the follow-
ing we want to see how our description of second-order distributions and freeness
allows to understand this theorem and its multivariate generalizations.

5.6.1 Diagonalization in the one-matrix case

Let us first look on the one-variable situation. If all second-order cumulants are
zero (as for example for GUE or Wishart random matrices), so that our second-order
Cauchy transform is given by (5.24), then one can proceed as follows.
In order to extract from G(z, w) some information about the covariance for arbi-
trary polynomials p1 and p2 we use Cauchy’s integral formula to write

1 p1 (z) 1 p2 (w)
Z Z
p1 (a) = dz, p2 (a) = dw,
2πi C1 z−a 2πi C2 w−a

where the contour integrals over C1 and C2 are in the complex plane around the
spectrum of a. We are assuming that a is a bounded self-adjoint operator, thus we
have to integrate around sufficiently large portions of the real line. This gives then,
by using Equation (5.24) and integration by parts,

1 1 1
Z Z
ϕ2 (p1 (a), p2 (a)) = − 2 p1 (z)p2 (w)ϕ2 , dzdw
4π C1 C2 z−x w−x
1
Z Z
=− 2 p1 (z)p2 (w)G(z, w)dzdw
4π C1 C2
1 ∂2 F(z) − F(w)
Z Z
=− 2 p1 (z)p2 (w) log dzdw
4π C1 C2 ∂ z∂ w z−w
1
Z Z F(z) − F(w)
=− 2 p01 (z)p02 (w) log dzdw.
4π C1 C2 z−w
158 5 Second-Order Freeness

We choose now for C1 and C2 rectangles with height going to zero, hence the integra-
tion over each of these contours goes to integrals over the real axis, one approaching
the real line from above, and the other approaching the real line from below. We de-
note the corresponding limits of F(z), when z is approaching x ∈ R from above or
from below, by F(x+ ) and F(x− ), respectively. Since p01 and p02 are continuous at
the real axis, we get

F(x+ ) − F(y+ )

1
Z Z
0 0
ϕ2 (p1 (a), p2 (a)) = − p 1 (x)p2 (y) log
2
4π R R x+ − y+
F(x+ ) − F(y− ) F(x− ) − F(y+ ) F(x− ) − F(y− )
− log − log + log dxdy.
x+ − y− x− − y+ x− − y−

Note that one has for the reciprocal Cauchy transform F(z̄) = F(z), hence
F(x− ) = F(x+ ). Since the contributions of the denominators cancel, we get in the
end
F(x) − F(y) 2

1
Z Z
0 0
ϕ2 (p1 (a), p2 (a)) = − 2 p (x)p2 (y) log
dxdy, (5.26)
4π R R 1 F(x) − F(y)

where F(x) denotes now the usual limit F(x+ ) coming from the complex upper
half-plane.
The diagonalization of this bilinear form (5.26) depends on the actual form of

F(x) − F(y) 2
2
1 = − 1 log G(x) − G(y) .

K(x, y) = − 2
log
2
(5.27)
4π F(x) − F(y) 4π G(x) − G(y)

Example 41. Consider the GUE case. Then G is the Cauchy transform of the semi-
circle, √ √
z − z2 − 4 x − i 4 − x2
G(z) = , thus G(x) = .
2 2
Hence we have

1 x − y − i(√4 − x2 − p4 − y2 ) 2
K(x, y) = − 2 log √

p
4π

x − y − i( 4 − x2 + 4 − y2 )
√ p
1 (x − y)2 + ( 4 − x2 − 4 − y2 )2
= − 2 log √ p
4π (x − y)2 + ( 4 − x2 + 4 − y2 )2
p
1 4 − xy − (4 − x2 )(4 − y2 )
= − 2 log p .
4π 4 − xy + (4 − x2 )(4 − y2 )

In order to relate this to Chebyshev polynomials let us write x = 2 cos θ and y =

2 cos ψ. Then we have
5.6 Diagonalization of fluctuations 159

1 4(1 − cos θ cos ψ − sin θ sin ψ)

K(x, y) = − 2
log
4π 4(1 − cos θ cos ψ + sin θ sin ψ)
1 1 − cos(θ − ψ)
= − 2 log
4π 1 − cos(θ + ψ)
1 1
= − 2 log(1 − cos(θ − ψ)) + 2 log(1 − cos(θ + ψ))
4π 4π
1 ∞ 1
= 2 ∑ (cos(n(θ − ψ)) − cos(n(θ + ψ)))
2π n=1 n
1 ∞ 1
= ∑ n sin(nθ ) sin(nψ).
π 2 n=1

In the penultimate step we have used the expansion (5.31) for log(1 − cos θ ) from
the next exercise.
Similarly as cos(nθ ) is related to x = 2 cos θ via the Chebyshev polynomials Cn
of the first kind, sin(nθ ) can be expressed in terms of x via the Chebyshev polyno-
mials Un of the second kind. Those are defined via

sin((n + 1)θ )
Un (2 cos θ ) = . (5.28)
sin θ
We will address some of its properties in Exercise 12.
We can then continue our calculation above as follows.
1 ∞ 1
K(x, y) = ∑ n Un−1 (x) sin θ ·Un−1 (y) sin ψ
π 2 n=1
∞
1 1 p 1 p
= ∑ Un−1 (x) 4 − x2 ·Un−1 (y) 4 − y2 .
n=1 n 2π 2π

We will now use the following two facts about Chebyshev polynomials:
◦ the Chebyshev polynomials of second kind are orthogonal polynomials with re-
spect to the semi-circular distribution, i.e., for all m, n ≥ 0
Z +2
1 p
Un (x)Um (x) 4 − x2 dx = δnm ; (5.29)
−2 2π
◦ the two kinds of Chebyshev polynomials are related by differentiation,

Cn0 = nUn−1 for n ≥ 0. (5.30)

Then we can recover Theorem 1 by checking that the covariance is diagonal for
the Chebyshev polynomials of first kind.
Z Z
ϕ2 (Cn (a),Cm (a)) = Cn0 (x)Cm0 (y)K(x, y)dxdy
160 5 Second-Order Freeness
Z +2Z +2 ∞
1 1 p
= nUn−1 (x)mUm−1 (y) ∑ Uk−1 (x) 4 − x2
−2 −2 k=1 k 2π
1 p
·Uk−1 (y) 4 − y2 dxdy
2π
∞ Z +2
1 1 p 2
= nm ∑ Un−1 (x)Uk−1 (x) 4 − x dx
k=1 k −2 2π
Z +2
1 p
× Um−1 (y)Uk−1 (y) 4 − y2 dy
−2 2π
∞
1
= nm ∑ δnk δmk
k=1 k
= nδnm .

Note that all our manipulations were formal and we did not address analytic issues,
like the justification of the calculations concerning contour integrals. For this, and
also for extending the formula for the covariance beyond polynomial functions one
should consult the original literature, in particular [104, 15].

Exercise 11. Show the following expansion

∞
1 1 1
− log(1 − 1 cos θ ) = ∑ cos(nθ ) + log 2. (5.31)
2 n=1 n 2

Exercise 12. Let Cn and Un be the Chebyshev polynomials, rescaled to the interval
[−2, +2], of the first and second kind, respectively. (See also Notation 8.33 and
subsequent exercises.)
(i) Show that the definition of the Chebyshev polynomials via recurrence rela-
tions, as given in Notation 8.33, is equivalent to the definition via trigonometric
functions, as given in the discussion following Theorem 1 and in Equation (5.28).
(ii) Show equations (5.29) and (5.30).
(iii) Show that the Chebyshev polynomials of first kind are orthogonal with re-
spect to the arc-sine distribution, i.e., for all n, m ≥ 0 with (m, n) 6= (0, 0) we have
Z +2
dx
Cn (x)Cm (y) √ = δnm . (5.32)
−2 π 4 − x2
Note that the definition for the case n = 0, C0 = 2, is made in order to have it fit
with the recurrence relations; to fit the orthonormality relations, C0 = 1 would be
the natural choice.

Example 42. By similar calculations as for the GUE one can show that in the case of
Wishart matrices the diagonalization of the covariance (5.13) is achieved by going
5.6 Diagonalization of fluctuations 161
√ √
over to shifted Chebyshev polynomials of the first kind, cnCn ((x − (1 + c))/ c).
This result is due to Cabanal-Duvillard [47], see also [14, 115].
Remark 43. We want to address here a combinatorial interpretation of the fact that
the Chebyshev polynomials Ck diagonalize the covariance for a GUE random matrix.
Let s be our second-order semi-circular element; hence ϕ2 (sm , sn ) is given by the
number of annular non-crossing pairings on an (m, n) annulus. This is, of course,
not diagonal in m and n because some points on each circle can be paired among
themselves, and this pairing on both sides has no correlation; so there is no constraint
that m has to be equal to n. However, a quantity which clearly must be the same for
both circles is the number of through-pairs, i.e., pairs which connect both circles.
Thus in order to diagonalize the covariance we should go over from the number of
points on a circle to the number of through-pairs leaving this circle. A nice way
to achieve this is to cut our diagrams in two parts - one part for each circle. These
diagrams will be called non-crossing annular half-pairings. See Figures 5.5 and 5.7.
We will call what is left in a half-pairing of a through-pair after cutting an open pair
- as opposed to closed pairs which live totally on one circle and are thus not affected
by the cutting.
In this pictorial description sm corresponds to the sum over non-crossing an-
nular half-pairings on one circle with m points and sn corresponds to a sum over
non-crossing annular half-pairings on another circle with n points. Then ϕ2 (sm , sn )
corresponds to pairing the non-crossing annular half-pairings for sm with the non-
crossing annular half-pairings for sn . A pairing of two non-crossing annular half-
pairings consists of glueing together their open pairs in all possible planar ways.
This clearly means that both non-crossing annular half-pairings must have the same
number of open pairs, and thus our covariance should become diagonal if we go
over from the number n of points on a circle to the number k of open pairs. Fur-
thermore, there are clearly k possibilities to pair two sets of k open pairs in a planar
way.

1 1
1 1
4 2 4 2
4 2 4 2
3 3
3 3

Fig. 5.5 The 4 non-crossing half-pairings on four points with 2 through strings are shown.

From this point of view the Chebyshev polynomials Ck should describe k open
pairs. If we write xn as a linear combination of the Ck , xn = ∑nk=0 qn,kCk (x), then the
above correspondence suggests that for k > 0, the coefficients qn,k are the number
of non-crossing annular half-pairings of n points with k open pairs. See Fig. 5.6 and
Fig. 5.7.
162 5 Second-Order Freeness

1 q0,0

1 q1,1

2 1 q2,0 q2,2

3 1 q3,1 q3,2

6 4 1 q4,0 q4,2 q4,4

Fig. 5.6 As noted earlier, for the purpose of diagonalizing the fluctuations the constant term of the
polynomials is not important. If we make the small adjustment that C0 (x) = 1 and all the others
are unchanged then the recurrence relation becomes Cn+1 (x) = xCn (x) − 2Cn−1 (x) for n ≥ 2 and
C2 (x) = xC1 (x) − 2C0 (x). From this we obtain qn+1,k = qn,k−1 + qn,k+1
for k ≥ 1 and qn+1,0 = 2qn,1 .
n
From these relations we see that for k ≥ 1 we have qn,k = (n−k)/2 when n − k is even and 0 when
n−1
n − k is odd. When k = 0 we have qn,0 = 2 n/2−1 when n is even and qn,0 = 0 when n is odd.

1 1
5 2 1 1
1 5 2 5
5 2 2
5 2
4 3 4 3 4 3 4 3
4 3

1 1 1
1 1 5 5
2 5 2 2
5 2 5 2

4 3 4 3 4 3
4 3 4 3

Fig. 5.7 When n = 5 and k = 1, q5,1 = 10. The ten non-crossing half-pairings on five points with
one through string.

That this is indeed the correct combinatorial interpretation of the result of Jo-
hansson can be found in [115]. There the main emphasis is actually on the case
of Wishart matrices and the result of Cabanal-Duvillard from Example 42 . The
Wishart case can be understood in a similar combinatorial way; instead of non-
crossing annular half-pairings and through-pairs one has to consider non-crossing
annular half-permutations and through-blocks.

5.6.2 Diagonalization in the multivariate case

Consider now the situation of several variables; then we have to diagonalize the
bilinear form (p1 , p2 ) 7→ ϕ2 (p1 (a1 , . . . , as ), p2 (a1 , . . . , as )). For polynomials in just
5.6 Diagonalization of fluctuations 163

one of the variables this is the same problem as in the previous section. It remains
to understand the mixed fluctuations in more than one variable. If we have that
a1 , . . . , as are free of second order, then this is fairly easy. The following theorem
from [128] follows directly from Definition 24 of second order freeness.

Theorem 44. Assume a1 , . . . , as are free of second order in the second-order prob-
(i)
ability space (A, ϕ, ϕ2 ). Let, for each i = 1, . . . , s, Qk (k ≥ 0) be the orthogonal
(i)
polynomials for the distribution of ai ; i.e., Qk is a polynomial of degree k such that
(i) (i)
ϕ(Qk (ai )Ql (ai )) = δkl for all k, l ≥ 0. Then the fluctuations of mixed words in
(i ) (i )
the ai ’s are diagonalized by cyclically alternating products Qk11 (ai1 ) · · · Qkmm (aim )
(with all kr ≥ 1 and i1 6= i2 , i2 6= i3 , . . . , im 6= i1 ) and the covariances are given by
the number of cyclic matchings of these products:

(i ) (i ) (j ) (j )
ϕ2 Qk11 (ai1 ) · · · Qkmm (aim ), Ql1 1 (a j1 ) · · · Qln n (a jn )
= δmn · #{r ∈ {1, . . . , n} | is = js+r , ks = ls+r ∀s = 1, . . . , n}, (5.33)

where we count s + r modulo n.

Remark 45. Note the different nature of the solution for the one-variate and the
multi-variate case. For example, for independent GUE’s we have that the covariance
is diagonalized by the following set of polynomials:
◦ Chebyshev polynomials Ck of first kind in one of the variables
◦ cyclically alternating products of Chebyshev polynomials Uk of second kind for
different variables.
Again there is a combinatorial way of understanding the appearance of the two dif-
ferent kinds of Chebyshev polynomials. As we have outlined in Remark 43, the
Chebyshev polynomials Ck show up in the one-variate case, because this corre-
sponds to going over to non-crossing annular half-pairings with k through-pairs.
In the multi-variate case one has to realize that having several variables breaks the
circular symmetry of the circle and thus effectively replaces a circular problem by a
linear one. In this spirit, the expansion of xn in terms of Chebyshev polynomials Uk
of second kind counts the number of non-crossing linear half-pairings on n points
with k open pairs.
In the Wishart case there is a similar description by replacing non-crossing an-
nular half-permutations by non-crossing linear half-permutations, resulting in an
analogue appearance of orthogonal polynomials of first and second kind for the
one-variate and multi-variate situation, respectively.
More details and the proofs of the above statements can be found in [115].
Chapter 6
Free Group Factors and Freeness

The concept of freeness was actually introduced by Voiculescu in the context of

operator algebras, more precisely, during his quest to understand the structure of
special von Neumann algebras, related to free groups. We wish to recall here the
relevant context and show how freeness shows up there very naturally and how it
can provide some information about the structure of those von Neumann algebras.
Operator algebras are ∗- algebras of bounded operators on a Hilbert space which
are closed in some canonical topologies. (C∗ -algebras are closed in the operator
norm, von Neumann algebras are closed in the weak operator topology; the first
topology is the operator version of uniform convergence, the latter of pointwise
convergence.) Since the group algebra of a group can be represented on itself by
bounded operators given by left multiplication (this is the regular representation of
a group), one can take the closure in the appropriate topology of the group algebra
and get thus C∗ -algebras and von Neumann algebras corresponding to the group.
The group von Neumann algebra arising from a group G in this way is usually
denoted by L(G). This construction, which goes back to the foundational papers of
Murray and von Neumann in the 1930’s, is, for G an infinite discrete group, a source
of important examples in von Neumann algebra theory, and much of the progress
in von Neumann algebra theory was driven by the desire to understand the relation
between groups and their von Neumann algebras better. The group algebra consists
of finite sums over group elements; going over to a closure means that we allow also
some infinite sums. One should note that the weak closure, in the case of infinite
groups, is usually much larger than the group algebra and it is very hard to control
which infinite sums are added. Von Neumann algebras are quite large objects and
their classification is notoriously difficult.

6.1 Group (von Neumann) algebras

Let G be a discrete group. We want to consider compactly supported continuous
functions a : G → C, equipped with convolution (a, b) 7→ a ∗ b. Note that compactly
supported means just finitely supported in the discrete case and thus the set of such

165
166 6 Free Group Factors and Freeness

functions can be identified with the group algebra C[G] of formal finite linear com-
binations of elements in G with complex coefficients, a = ∑g∈G a(g)g, where only
finitely many a(g) 6= 0. Integration over such functions is with respect to the count-
ing measure, hence the convolution is then written as

a ∗ b = ∑ (a ∗ b)(g)g = ∑ ∑ a(h)b(h−1 g) g = ∑ a(h)h ∑ b(k)k = ab,

g∈G g∈G h∈G h∈G k∈G

and is hence nothing but the multiplication in C[G]. Note that the function δe = 1 · e
is the identity element in the group algebra C[G], where e is the identity element in
G.
Now define an inner product on C[G] by setting
(
1, if g = h
hg, hi = (6.1)
0, if g 6= h

on G and extending sesquilinearly to C[G]. From this inner product we define the 2-
norm on C[G] by kak22 = ha, ai. In this way (C[G], k · k2 ) is a normed vector space.
However, it is not complete in the case of infinite G (for finite G the following
is trivial). The completion of C[G] with respect to k · k2 consists of all functions
a : G → C satisfying ∑g∈G |a(g)|2 < ∞, and is denoted by `2 (G), and is a Hilbert
space.
Now consider the unitary group representation λ : G → U(`2 (G)) defined by

λ (g) · ∑ a(h)h := ∑ a(h)gh. (6.2)

h∈G h∈G

This is the left regular representation of G on the Hilbert space `2 (G). It is obvious
from the definition that each λ (g) is an isometry of `2 (G), but we want to check
that it is in fact a unitary operator on `2 (G). Since clearly hgh, ki = hh, g−1 ki, the
adjoint of the operator λ (g) is λ (g−1 ). But then since λ is a group homomorphism,
we have λ (g)λ (g)∗ = I = λ (g)∗ λ (g), so that λ (g) is indeed a unitary operator on
`2 (G).
Now extend the domain of λ from G to C[G] in the obvious way:

λ (a) = λ ( ∑ a(g)g) = ∑ a(g)λ (g).

g∈G g∈G

This makes λ into an algebra homomorphism λ : C[G] → B(`2 (G)), i.e. λ is a rep-
resentation of the group algebra on `2 (G). We define two new (closed) algebras via
this representation. The reduced group C∗ -algebra Cred∗ (G) of G is the closure of

λ (C[G]) ⊂ B(`2 (G)) in the operator norm topology. The group von Neumann alge-
bra of G, denoted L(G), is the closure of λ (C[G]) in the strong operator topology
on B(`2 (G)).
6.2 Free group factors 167

One knows that for an infinite discrete group G, L(G) is a type II1 von Neumann
algebra, i.e. L(G) is infinite dimensional, but yet there is a trace τ on L(G) defined
by τ(a) := hae, ei for a ∈ L(G), where e ∈ G is the identity element. To see the
trace property of τ it suffices to check it for group elements; this extends then to
the general situation by linearity and normality. However, for g, h ∈ G, the fact that
τ(gh) = τ(hg) is just the statement that gh = e is equivalent to hg = e; this is clearly
true in a group. The existence of a trace shows that L(G) is a proper subalgebra
of B(`2 (G)); this is the case because there does not exist a trace on all bounded
operators on an infinite dimensional Hilbert space. An easy fact is that if G is an
ICC group, meaning that the conjugacy class of each g ∈ G with g 6= e has infinite
cardinality, then L(G) is a factor, i.e. has trivial centre (see [106, Theorem 6.75]).
Another fact is that if G is an amenable group (e.g the infinite permutation group
S∞ = ∪n Sn ), then L(G) is the hyperfinite II1 factor R.
Exercise 1. (i) Show that L(G) is a factor if and only if G is an ICC group.
(ii) Show that the infinite permutation group S∞ = ∪n Sn is ICC. (Note that each
element from S∞ moves only a finite number of elements.)

6.2 Free group factors

Now consider the case where G = Fn , the free group on n generators; n can here be a
natural number n ≥ 1 or n = ∞. Let us briefly recall the definition of Fn and some of
its properties. Consider the set of all words, of arbitrary length, over the 2n + 1-letter
alphabet {a1 , a2 , . . . , an , a−1 −1 −1
1 , a2 , . . . , an } ∪{e}, where the letters of the alphabet
satisfy no relations other than eai = ai e = ai , ea−1 −1 −1 −1 −1
i = ai e = ai , ai ai = ai ai = e.
We say that a word is reduced if its length cannot be reduced by applying one of the
above relations. Then the set of all reduced words in this alphabet together with the
binary operation of concatenating words and reducing constitutes the free group Fn
on n generators. Fn is the group generated by n symbols satisfying no relations other
than those required by the group axioms. Clearly F1 is isomorphic to the abelian
group Z, while Fn is non-abelian for n > 1 and in fact has trivial center. The integer
n is called the rank of the free group; it is fairly easy, though not totally trivial, to
see (e.g., by reducing it via abelianization to a corresponding question about abelian
free groups) that Fn and Fm are isomorphic if and only if m = n.
Exercise 2. Show that Fn is, for n ≥ 2, an ICC group.
Since Fn has the infinite conjugacy class property, one knows that the group
von Neumann algebra L(Fn ) is a II1 factor, called a free group factor. Murray and
von Neumann showed that L(Fn ) is not isomorphic to the hyperfinite factor, but
otherwise nothing was known about the structure of these free group factors, when
free probability was invented by Voiculescu to understand them better.
While as pointed out above we have that Fn ' Fm if and only if m = n, the
corresponding problem for the free group factors is still unknown; see however some
results in this direction in section 6.12.
168 6 Free Group Factors and Freeness

Free Group Factor Isomorphism Problem: Let m, n ≥ 2 (possibly equal to ∞),

n 6= m. Are the von Neumann algebras L(Fn ) and L(Fm ) isomorphic?
The corresponding problem for the reduced group C∗ -algebras was solved by
∗ (F ) 6' C∗ (F ) for
Pimsner and Voiculescu [143] in 1982: they showed that Cred n red m
m 6= n.

6.3 Free product of groups

There is the notion of free product of groups. If G, H are groups, then their free
product G ∗ H is defined to be the group whose generating set is the disjoint union
of G and H, and which has the property that the only relations in G ∗ H are those
inherited from G and H and the identification of the neutral elements of G and H.
That is, there should be no non-trivial algebraic relations between elements of G
and elements of H in G ∗ H. In a more abstract language: the free product is the
coproduct in the category of groups. For example, in the category of groups, the
n-fold direct product of n copies of Z is the lattice Zn ; the n-fold coproduct (free
product) of n copies of Z is the free group Fn on n generators.
In the category of groups we can understand Fn via the decomposition Fn =
Z ∗ Z ∗ · · · ∗ Z. Is there a similar free product of von Neumann algebras that will help
us to understand the structure of L(Fn )? The notion of freeness or free independence
makes this precise. In order to understand what it means for elements in L(G) to be
free we need to deal with infinite sums, so the algebraic notion of freeness will not
do: we need a state.

6.4 Moments and isomorphism of von Neumann algebras

We will try to understand a von Neumann algebra with respect to a state. Let M be
a von Neumann algebra and let ϕ : M → C be a state defined on M, i.e. a positive
linear functional. Select finitely many elements a1 , . . . , ak ∈ M. Let us first recall the
notion of (∗-)moments and (∗-)distribution in such a context.
Definition 1. 1) The collection of numbers gotten by applying the state to words in
the alphabet {a1 , . . . , ak } is called the collection of joint moments of a1 , . . . , ak , or
the distribution of a1 , . . . , ak .
2) The collection of numbers gotten by applying the state to words in the alphabet
{a1 , . . . , ak , a∗1 , . . . , a∗k } is called the collection of joint ∗-moments of a1 , . . . , ak , or the
∗-distribution of a1 , . . . , ak .
Theorem 2. Let M = vN(a1 , . . . , ak ) be generated as von Neumann algebra by ele-
ments a1 , . . . , ak and let N = vN(b1 , . . . , bk ) be generated as von Neumann algebra
by elements b1 , . . . , bk . Let ϕ : M → C and ψ : N → C be faithful normal states.
If a1 , . . . , ak and b1 , . . . , bk have the same ∗-distributions with respect to ϕ and ψ
respectively, then the map ai 7→ bi extends to a ∗-isomorphism of M and N.

Exercise 3. Prove Theorem 2 by observing that the assumptions imply that the GNS-
constructions with respect to ϕ and ψ are isomorphic.
6.5 Freeness in the free group factors 169

Though the theorem is not hard to prove, it conveys the important message that
all information about a von Neumann algebra is, in principle, contained in the ∗ -
moments of a generating set with respect to a faithful normal state.
In the case of the group von Neumann algebras L(G) the canonical state is the
trace τ. This is defined as a vector state so it is automatically normal. It is worth to
notice that it is also faithful (and hence (L(G), τ) is a tracial W ∗ -probability space).
Proposition 3. The trace τ on L(G) is a faithful state.

Proof: Suppose that a ∈ L(G) satisfies 0 = τ(a∗ a) = ha∗ ae, ei = hae, aei, thus ae =
0. So we have to show that ae = 0 implies a = 0. To show that a = 0, it suffices
to show that haξ , ηi = 0 for any ξ , η ∈ `2 (G). It suffices to consider vectors of
the form ξ = g, η = h for g, h ∈ G, since we can get the general case from this by
linearity and continuity. Now, by using the traciality of τ, we have

hag, hi = hage, hei = hh−1 age, ei = τ(h−1 ag) = τ(gh−1 a) = hgh−1 ae, ei = 0,

since the first argument to the last inner product is 0.

6.5 Freeness in the free group factors

We now want to see that the algebraic notion of freeness of subgroups in a free
product of groups translates with respect to the canonical trace τ to our notion of
free independence.
Let us say that a product in an algebra A is alternating with respect to subal-
gebras A1 , . . . , As if adjacent factors come from different subalgebras. Recall that
our definition of free independence says: the subalgebras A1 , . . . , As are free if any
product in centred elements over these algebras which alternates is centred.
Proposition 4. Let G be a group containing subgroups G1 , . . . , Gs such that G =
G1 ∗ · · · ∗ Gs . Let τ be the state τ(a) = hae, ei on C[G]. Then the subalgebras
C[G1 ], . . . , C[Gs ] ⊂ C[G] are free with respect to τ.

Proof: Let a1 a2 · · · ak be an element in C[G] which alternates with respect to the

subalgebras C[G1 ], . . . , C[Gs ], and assume the factors of the product are centred
with respect to τ. Since τ is the “coefficient of the identity” state, this means that if
a j ∈ C[Gi j ], then a j looks like a j = ∑g∈Gi a j (g)g and a j (e) = 0. Thus we have
j

τ(a1 a2 · · · ak ) = ∑ a1 (g1 )a2 (g2 ) · · · ak (gk )τ(g1 g2 · · · gk ).

g1 ∈Gi1 ,...,gk ∈Gik

Now, τ(g1 g2 · · · gk ) 6= 0 only if g1 g2 · · · gk = e. But g1 g2 · · · gk is an alternating prod-

uct in G with respect to the subgroups G1 , G2 , . . . , Gs , and since G = G1 ∗ G2 ∗ · · · ∗
Gs , this can happen only when at least one of the factors, let’s say g j , is equal to e;
but in this case a j (g j ) = a j (e) = 0. So each summand in the sum for τ(a1 a2 · · · ak )
vanishes and we have τ(a1 a2 · · · ak ) = 0, as required.
170 6 Free Group Factors and Freeness

Thus freeness of the subgroup algebras C[G1 ], . . . , C[Gs ] with respect to τ is just
a simple reformulation of the fact that G1 , . . . , Gs are free subgroups of G. However,
a non-trivial fact is that this reformulation carries over to closures of the subalgebras.

Proposition 5. (1) Let A be a C∗ -algebra, ϕ : A → C a state. Let B1 , . . . , Bs ⊂ A

k·k
be unital ∗-subalgebras which are free with respect to ϕ. Put Ai := Bi , the norm
closure of Bi . Then A1 , . . . , As are also free.
(2) Let M be a von Neumann algebra, ϕ : M → C a normal state. Let B1 , . . . , Bs
be unital ∗-subalgebras which are free. Put Mi := vN(Bi ). Then M1 , . . . , Ms are also
free.

Proof: (1) Consider a1 , . . . , ak with ai ∈ A ji , ϕ(ai ) = 0, and ji 6= ji+1 for all i.

We have to show that ϕ(a1 · · · ak ) = 0. Since Bi is dense in Ai , we can, for each i,
(n) (n)
approximate ai in operator norm by a sequence (bi )n∈N , with bi ∈ Bi , for all n.
(n) (n) (n) (n)
Since we can replace bi by bi − ϕ(bi ) (note that ϕ(bi ) converges to ϕ(ai ) =
(n)
0), we can assume, without restriction, that ϕ(bi ) = 0. But then we have
(n) (n)
ϕ(a1 · · · ak ) = lim ϕ(b1 · · · bk ) = 0,
n→∞

(n) (n)
since, by the freeness of B1 , . . . , Bs , we have ϕ(b1 · · · bk ) = 0 for each n.
(2) Consider a1 , . . . , ak with ai ∈ M ji , ϕ(ai ) = 0, and ji 6= ji+1 for all i. We have
to show that ϕ(a1 · · · ak ) = 0. We approximate essentially as in the C∗ -algebra case,
we only have to take care that the multiplication of our k factors is still continuous
in the appropriate topology. More precisely, we can now approximate, for each i,
the operator ai in the strong operator topology by a sequence (or a net, if you must)
(n)
bi . By invoking Kaplansky’s density theorem we can choose those such that we
(n)
keep everything bounded, namely kbi k ≤ kai k for all n. Again we can center the
(n)
sequence, so that we can assume that all ϕ(bi ) = 0. Since the joint multiplication
is on bounded sets continuous in the strong operator topology, we have then still
(n) (n)
the convergence of b1 · · · bk to a1 · · · ak , and thus, since ϕ is normal, also the
(n) (n)
convergence of 0 = ϕ(b1 · · · bk ) to ϕ(a1 · · · ak ).

6.6 The structure of free group factors

What does this tell us for the free group factors? It is clear that each generator of
the free group gives a Haar unitary element in (L(Fn ), τ). By the discussion above,
those elements are ∗-free. Thus the free group factor L(Fn ) is generated by n ∗-free
Haar unitaries u1 , . . . , un . Note that, by Theorem 2, we will get the free group factor
L(Fn ) whenever we find somewhere n Haar unitaries which are ∗-free with respect
to a faithful normal state. Furthermore, since we are working inside von Neumann
algebras, we have at our disposal measurable functional calculus, which means that
we can also deform the Haar unitaries into other, possibly more suitable, generators.
6.6 The structure of free group factors 171

Theorem 6. Let M be a von Neumann algebra and τ a faithful normal state on M.

Assume that x1 , . . . , xn ∈ M generate M, vN(x1 , . . . , xn ) = M, and that
◦ x1 , . . . , xn are ∗-free with respect to τ,
◦ each xi is normal and its spectral measure with respect to τ is diffuse (i.e., has
no atoms).
Then M ' L(Fn ).

Proof:
Let x be a normal element in M which is such that its spectral measure with re-
spect to τ is diffuse. Let A = vN(x) be the von Neumann algebra generated by x.
We want to show that there is a Haar unitary u ∈ A that generates A as a von Neu-
mann algebra. A is a commutative von Neumann algebra and the restriction of τ to
A is a faithful state. A cannot have any minimal projections as that would mean that
the spectral measure of x with respect to τ was not diffuse. Thus there is a normal
∗-isomorphism π : A → L∞ [0, 1] where we put Lebesgue measure on [0, 1]. (This
follows from the well-known fact that any commutative von Neumann algebra is
∗-isomorphic to L∞ (µ) for some measure µ and that all spaces L∞ (µ) for µ without
atoms are ∗-isomorphic; see, for example, [171, Chapter III, Theorem 1.22].
Under π the trace τ becomes a normal state on L∞ [0, 1]. Thus there is a positive
function h ∈ L1 [0, 1] such that for all a ∈ A, τ(a) = 01 π(a)(t)h(t) dt. Since τ is
R

faithful the set {t ∈ [0, 1] | h(t) = 0} has Lebesgue measure 0. Thus H(s) = 0s h(t) dt
R

is a continuous positive strictly increasing function on [0, 1] with range [0, 1]. So
by the Stone-Weierstrass theorem the C∗ -algebra generated by 1 and H is all of
C[0, 1]. Hence the von Neumann algebra generated by 1 and H is all of L∞ [0, 1]. Let
v(t) = exp(2πiH(t)). Then H is in the von Neumann algebra generated by v, so the
von Neumann algebra generated by v is L∞ [0, 1]. Also,
Z 1 Z 1 Z 1
v(t)n h(t) dt = exp(2πinH(t))H 0 (t) dt = e2πins ds = δ0,n .
0 0 0

Thus v is Haar unitary with respect to h. Finally let u ∈ A be such that π(u) = v.
Then the von Neumann algebra generated by u is A and u is a Haar unitary with
respect to the trace τ.
This means that for each i we can find in vN(xi ) a Haar unitary ui which generates
the same von Neumann algebra as xi . By Proposition 5, freeness of the xi goes over
to freeness of the ui . So we have found n Haar unitaries in M which are ∗-free and
which generate M. Thus M is isomorphic to the free group factor L(Fn ).
Example 7. Instead of generating L(Fn ) by n ∗-free Haar unitaries it is also very
common to use n free semi-circular elements. (Note that for self-adjoint elements ∗-
freeness is of course the same as freeness.) This is of course covered by the theorem
above. But let us be a bit more explicit on deforming a semi-circular element into
a Haar unitary. Let s ∈ M be a semi-circular operator. The spectral measure of s is
√
4 − t 2 /(2π) dt, i.e.
172 6 Free Group Factors and Freeness
Z 2
1 p
τ( f (s)) = f (t) 4 − t 2 dt.
2π −2

If
t p 1 1 p
H(t) = 4 − t 2 + sin−1 (t/2) then H 0 (t) = 4 − t 2,
4π π 2π
and u = exp(2πiH(s)) is a Haar unitary, i.e.
Z 2 Z 1/2
τ(uk ) = e2πikH(t) H 0 (t) dt = e2πikr dr = δ0,k ,
−2 −1/2

which generates the same von Neumann subalgebra as s.

6.7 Compression of free group factors

Let M be any II1 factor with faithful normal trace τ and e a projection in M. Let
eMe = {exe | x ∈ M}; eMe is again a von Neumann algebra, actually a II1 factor,
with e being its unit, and it is called the compression of M by e. It is an elementary
fact in von Neumann algebra theory that the isomorphism class of eMe depends
only on t = τ(e) and we denote this isomorphism class by Mt . A deeper fact of
Murray and von Neumann is that (Ms )t = Mst . We can define Mt for all t > 0 as
follows. For a positive integer n let Mn = M ⊗ Mn (C) and for any t, let Mt = eMn e
for any sufficiently large n and any projection e in Mn with trace t, where here we
use the non-normalized trace τ ⊗ Tr on Mn . Murray and von Neumann then defined
the fundamental group of M, F(M), to be {t ∈ R+ | M ' Mt } and showed that it is
a multiplicative subgroup of R+ . (See [106, Ex. 13.4.5 and 13.4.6].) It is a theorem
that when G is an amenable ICC group we have L(G) is the hyperfinite II1 factor
and F(L(G)) = R+ , see [171].
Rădulescu showed that F(L(F∞ )) = R+ , see [149]. For finite n, F(L(Fn )) is
unknown; but it is known to be either R+ or {1}. In the rest of this chapter we will
give the key ideas about those compression results for free group factors.
The first crucial step was taken by Voiculescu who showed in 1990 in [179]
that for integers m, n, k we have L(Fn )1/k ' L(Fm ), where (m − 1)/(n − 1) = k2 , or
equivalently
m−1
L(Fn ) ' Mk (C) ⊗ L(Fm ), where = k2 . (6.3)
n−1
So if we embed L(Fm ) into Mk (C) ⊗ L(Fm ) ' L(Fn ) as x 7→ 1 ⊗ x then L(Fm ) is a
subfactor of L(Fn ) of Jones index k2 , see [105, Example 2.3.1]. Thus, (m − 1)/(n − 1) =
[L(Fn ) : L(Fm )]. Notice the similarity to Schreier’s index formula for free groups.
Indeed, suppose G is a free group of rank n and H is a subgroup of G of finite index.
Then H is necessarily a free group, say of rank m, and Schreier’s index formula says
that (m − 1)/(n − 1) = [G : H].
6.8 Circular operators and complex Gaussian random matrices 173

Rather than proving Voiculescu’s theorem, Equation (6.3), in full generality we

shall first prove a special case which illustrates the main ideas of the proof, and then
sketch the general case.

Theorem 8. We have L(F3 )1/2 ' L(F9 ).

To prove this theorem we must find in L(F3 )1/2 nine free normal elements with
diffuse spectral measure which generate L(F3 )1/2 . In order to achieve this we will
start with normal elements x1 , x2 , x3 , together with a faithful normal state ϕ, such
that
◦ the spectral measure of each xi is diffuse (i.e. no atoms) and
◦ x1 , x2 , x3 are ∗-free.
Let N be the von Neumann algebra generated by x1 , x2 and x3 . Then N ' L(F3 ). We
will then show that there is a projection p in N such that
◦ ϕ(p) = 1/2
◦ there are 9 free and diffuse elements in pN p which generate pN p.
Thus L(F3 )1/2 ' pN p ' L(F9 ).
The crucial issue above is that we will be able to choose our elements x1 , x2 , x3
in such a form that we can easily recognize p and the generating elements of pN p.
(Just starting abstractly with three ∗-free normal diffuse elements will not be very
helpful, as we have then no idea how to get p and the required nine free elements.)
Actually, since our claim is equivalent to L(F3 ) ' M2 (C) ⊗ L(F9 ), it will surely be
a good idea to try to realize x1 , x2 , x3 as 2 × 2 matrices. This will be achieved in the
next section with the help of circular operators.

6.8 Circular operators and complex Gaussian random matrices

To construct the elements x1 , x2 , x3 as required above we need to make a digression
into circular operators. Let X be an 2N × 2N GUE random matrix. Let
√

In 0n
P= and G = 2 PX(1 − P).
0n 0n

Then G is a N × N matrix with independent identically distributed entries which

are centred complex Gaussian random variables with complex variance 1/N; such a
matrix we call a complex Gaussian random matrix. We can determine the limiting
∗-moments of G as follows.
√ √ √
Write Y1 = (G + G∗ )/ 2 and Y2 = −i(G − G∗ )/ 2 then G = (Y1 + iY2 )/ 2 and
Y1 and Y2 are independent N × N GUE random matrices. Therefore by the asymptotic
freeness of independent GUE (see section 1.11), Y1 and Y2 converge as N → ∞ to free
standard semi-circulars s1 and s2 .

Definition√ 9. Let s1 and s2 be free and standard semi-circular. Then we call c =

(s1 + is2 )/ 2 a circular operator.
174 6 Free Group Factors and Freeness

Since s1 and s2 are free we can easily calculate the free cumulants of c. If ε = ±1
let us adopt the following notation for x(ε) : x(−1) = x∗ , and x(1) = x. Recall that for
a standard semi-circular operator s
(
1, n = 2
κn (s, . . . , s) = .
0, n 6= 2

Thus

κn (c(ε1 ) , . . . , c(εn ) ) = 2−n/2 κn (s1 + ε1 is2 , . . . , s1 + iεn s2 )

= 2−n/2 κn (s1 , . . . , s1 ) + in ε1 · · · εn κn (s2 , . . . , s2 )

since all mixed cumulants in s1 and s2 are 0. Thus κn (c(ε1 ) , . . . , c(εn ) ) = 0 for n 6= 2,
and
(
1 − ε 1 ε2 1 ε1 6= ε2
κ2 (c(ε1 ) , c(ε2 ) ) = 2−1 κ2 (s1 , s1 ) − ε1 ε2 κ2 (s2 , s2 ) =

= .
2 0 ε1 = ε2

Hence κ2 (c, c∗ ) = κ2 (c∗ , c) = 1, κ2 (c, c) = κ2 (c∗ , c∗ ) = 0 and all other ∗-cumulants

are 0. Thus

τ((c∗ c)n ) = ∑ κπ (c∗ , c, c∗ , c, . . . , c∗ , c) = ∑ κπ (c∗ , c, c∗ , c, . . . , c∗ , c).

π∈NC(2n) π∈NC2 (2n)

Now note that any π ∈ NC2 (2n) connects, by parity reasons, automatically only c
with c∗ , hence κπ (c∗ , c, c∗ , c, . . . , c∗ , c) = 1 for all π ∈ NC2 (2n) and we have

τ((c∗ c)n ) = |NC2 (2n)| = τ(s2n ),

√
7→ t is a uniform
where s is a standard semi-circular element. Since t √ √ limit of
polynomials in t we have that the moments of |c| = c∗ c and |s| = s2 are the
same and |c| and |s| have the same distribution. The operator |c| = |s| is called a
quarter-circular operator and has moments
Z 2 p
1
τ(|c|k ) = tk 4 − t 2 dt.
π 0

An additional result which we will need is Voiculescu’s theorem on the polar

decomposition of a circular operator.

Theorem 10. Let (M, τ) be a W ∗ -probability space and c ∈ M a circular operator.

If c = u |c| is its polar decomposition in M then
(i) u and |c| are ∗-free,
(ii) u is a Haar unitary,
(iii) |c| is a quarter circular operator.
6.8 Circular operators and complex Gaussian random matrices 175

The proof of (i) and (ii) can either be done using random matrix methods (as was
done by Voiculescu [180]) or by showing that if u is a Haar unitary and q is a quarter-
circular operator such that u and q are ∗-free then uq has the same ∗-moments as
a circular operator (this was done by Nica and Speicher [140]). The latter can be
achieved, for example, by using the formula for cumulants of products, equation
(2.23). For the details of this approach, see [140, Theorem 15.14].

Theorem 11. Let (A, ϕ) be a unital ∗-algebra with a state ϕ. Suppose s1 , s2 , c ∈ A

are ∗-free and s1 and s2 semi-circular and c circular. Then

1 s1 c
x= √ ∗ ∈ (M2 (A), ϕ2 )
2 c s2
is semi-circular.

Here we have used the standard notation M2 (A) = M2 (C) ⊗ A for 2 × 2 matrices
with entries from A and ϕ2 = tr ⊗ ϕ for the composition of the normalized trace
with ϕ.
Proof: Let Chx11 , x12 , x21 , x22 i be the polynomials in the non-commuting variables
x11 , x12 , x21 , x22 . Let
k !
1 x11 x12
pk (x11 , x12 , x21 , x22 ) = Tr .
2 x21 x22

Now let AN = MN (L∞− (Ω )) be the N × N matrices with entries in L∞− (Ω ) :=

p
p≥1 L (Ω ), for some classical probability space Ω . On AN we have the state
T

ϕN (X) = E(N −1 Tr(X)). Now suppose in AN we have S1 , S2 , and C, with S1 and

S2 GUE random matrices and C a complex Gaussian random matrix and with the en-
tries of S1 , S2 , C independent. Then we know that S1 , S2 ,C converge in ∗-distribution
to s1 , s2 , c, i.e., for any polynomial p in four non-commuting variables we have
ϕN (p(S1 ,C,C∗ , S2 )) → ϕ(p(s1 , c, c∗ , s2 )). Now let

1 S1 C
X=√ ∗ .
2 C S2
Then X is in A2N , and
1
ϕ2N (X k ) = ϕN pk (S1 ,C,C∗ , S2 ) → ϕ pk (s1 , c, c∗ , s2 ) = ϕ( Tr(xk )) = tr ⊗ ϕ(xk ).

2
On the other hand X is a 2N × 2N GUE random matrix; so ϕ2N (X k ) converges to the
kth moment of a semi-circular operator. Hence x in M2 (A) is semi-circular.
Exercise 4. Suppose s1 , s2 , c, and x are as in Theorem 11. Show that x is semi-
circular by computing ϕ(tr(xn )) directly using the methods of Lemma 1.9.
176 6 Free Group Factors and Freeness

We can now present the realization of the three generators x1 , x2 , x3 of L(F3 )

which we need for the proof of the compression result.

Lemma 12. Let A be a unital ∗-algebra and ϕ a state on A. Suppose s1 , s2 , s3 , s4 ,

c1 , c2 , u in A are ∗-free, with s1 , s2 , s3 , and s4 semi-circular, c1 and c2 circular and
u a Haar unitary. Let

s1 c1 s3 c2 u 0
x1 = ∗ , x2 = ∗ , x3 = .
c1 s2 c2 s4 0 2u

Then x1 , x2 , x3 are ∗-free in M2 (A) with respect to the state tr ⊗ ϕ; x1 and x2 are
semi-circular and x3 is normal and diffuse.

Proof:
We model x1 by X1 , x2 by X2 and x3 by X3 where

S1 C1 S3 C2 U 0
X1 = , X 2 = , X3 =
C1∗ S2 C2∗ S3 0 2U

and S1 , S2 , S3 , S4 are N × N GUE random matrices, C1 and C2 are N × N complex

Gaussian random matrices, and U is a diagonal deterministic unitary matrix, chosen
so that the entries of X1 are independent from those of X2 and that the diagonal
entries of U converge in distribution to the uniform distribution on the unit circle.
Then X1 , X2 , X3 are asymptotically ∗-free by Theorem 4.4. Thus x1 , x2 , and x3 are
∗-free because they have the same distribution as the limiting distribution of X1 ,
X2 , and X3 . By the previous Theorem 11, x1 and x2 are semi-circular. x3 is clearly
normal and its spectral distribution is given by the uniform distribution on the union
of the circle of radius 1 and the circle of radius 2.

6.9 Proof of L(F3 )1/2 ' L(F9 )

We will now present the proof of Theorem 8.
Proof:
We have shown that if we take four semi-circular operators s1 s2 , s3 , s4 , two
circular operators c1 , c2 , and a Haar unitary u in a von Neumann algebra M with
trace τ such that s1 , s2 , s3 , s4 , c1 , c2 , u are ∗-free then

◦ the elements

s1 c1 s3 c2 u 0
x1 = ∗ , x2 = ∗ , x3 =
c1 s2 c2 s4 0 2u

are ∗-free in (M2 (M), tr ⊗ τ),

◦ x1 and x2 are semi-circular and x3 is normal and has diffuse spectral measure.

Let N = vN(x1 , x2 , x3 ) ⊆ M2 (M). Then, by Theorem 6, N ' L(F3 ). Since

6.9 Proof of L(F3 )1/2 ' L(F9 ) 177

10 10
= x3∗ x3 ∈ N, we also have the spectral projection p= ∈ N,
04 00

and thus px1 (1 − p) ∈ N and px2 (1 − p) ∈ N. We have the polar decompositions

0 c1 0 v1 0 0 0 c2 0 v2 0 0
= · and = · ,
0 0 0 0 0 |c1 | 0 0 0 0 0 |c2 |

where c1 = v1 |c1 | and c2 = v2 |c2 | are the polar decompositions of c1 and c2 , respec-
tively, in M.
Hence we see that N = vN(x1 , x2 , x3 ) is generated by the ten elements

s1 0 0 0 0 v1 0 0 s3 0
y1 = y2 = y3 = y4 = y5 =
0 0 0 s2 0 0 0 |c1 | 0 0

0 0 0 v2 0 0 u0 00
y6 = y7 = y8 = y9 = y10 = .
0 s4 0 0 0 |c2 | 00 0u
Let us put

0 v1 00 10
v := ; then v∗ v = and vv∗ = = p = p2 .
0 0 01 00

Since we can write now any pyi1 · · · yin p in the form pyi1 1yi2 1 · · · 1yin p and re-
place each 1 by p2 + v∗ v, it is clear that 10 ∗ ∗
i=1 {pyi p, pyi v , vyi p, vyi v } generate pN p.
S

This gives for pN p the generators

s1 , v1 s2 v∗1 , v1 v∗1 , v1 |c1 | v∗1 , s3 , v1 s4 v∗1 , v2 v∗1 , v1 |c2 | v∗1 , u, v1 uv∗1 .

Note that v1 v∗1 = 1 can be removed from the set of generators. To check that the
remaining nine elements are ∗-free and diffuse we recall a few elementary facts
about freeness.
Exercise 5. Show the following:
(i) if A1 and A2 are free subalgebras of A, if A11 and A12 are free subalgebras
of A1 , and if A21 and A22 are free subalgebras of A2 ; then A11 , A12 , A21 , A22 are
free;
(ii) if u is a Haar unitary ∗-free from A, then A is ∗-free from uAu∗ ;
(iii) if u1 and u2 are Haar unitaries and u2 is ∗-free from {u1 } ∪ A then u2 u∗1 is a
Haar unitary and is ∗-free from u1 Au∗1 .
By construction s1 , s2 , s3 , s4 , |c1 |, |c2 |, v1 , v2 , u are ∗-free. Thus, in particular, s2 , s4 ,
|c1 |, |c2 |, v2 , u are ∗-free. Hence, by (ii), v1 s2 v∗1 , v1 s4 v∗1 , v1 |c1 |v∗1 , v1 |c2 |v∗1 , v1 uv∗1 are
∗-free and, in addition, ∗-free from u, s1 , s3 , v2 . Thus

u, s1 , s3 , v1 s2 v∗1 , v1 s4 v∗1 , v1 |c1 |v∗1 , v1 |c2 |v∗1 , v1 uv∗1 , v2

178 6 Free Group Factors and Freeness

are ∗-free. Let A = alg(s2 , s4 , |c1 |, |c2 |, u). We have that v2 is ∗-free from {v1 } ∪ A,
so by (iii), v2 v∗1 is ∗-free from v1 Av∗1 . Thus v2 v∗1 is ∗-free from

v1 s2 v∗1 , v1 s4 v∗1 , v1 |c1 |v∗1 , v1 |c2 |v∗1 , v1 uv∗1

and it was already ∗-free from s1 , s3 and u. Thus by (i) our nine elements

s1 , s3 , v1 s2 v∗1 , v1 s4 v∗1 , v1 |c1 |v∗1 , v1 |c2 |v∗1 , u, v1 uv∗1 , v2 v∗1

are ∗-free. Since they are either semi-circular, quarter-circular or Haar elements they
are all normal and diffuse; as they generate pN p,we have that pN p is generated by
nine ∗-free normal and diffuse elements and thus, by Theorem 6, pN p ' L(F9 ).
Hence L(F3 )1/2 ' L(F9 ).

6.10 The general case L(Fn )1/k ' L(F1+(n−1)k2 )

Sketch We sketch now the proof for the general case of Equation (6.3). We write
L(Fn ) = vN(x1 , . . . , xn ) where for 1 ≤ i ≤ n − 1 each xi is a semi-circular element
of the form
 (i) (i) (i) 
s1 c12 . . . c1k

u 0 ... 0

 (i) ∗ (i) (i) 0 2u . . . 0 
1 c12 s2 . . . c2k 

xi = √  and where x =  .. . . ..  ,
 
. . . .. 
 n
k  .. . .  . . .
(i) ∗ (i) 0 0 . . . ku
c 1k ··· ··· s k

(i) (i)
with all s j ( j = 1, . . . , k; i = 1, . . . , n − 1) semi-circular, all c pq (1 ≤ p < q ≤ k;
i = 1, . . . , n − 1) circular, and u a Haar unitary, so that all elements are ∗-free.
So we have (n − 1)k semi-circular operators, (n − 1) 2k circular operators and

one Haar unitary. Each circular operator produces two free elements so we have in
total
k
(n − 1)k + 2(n − 1) + 1 = (n − 1)k2 + 1
2
free and diffuse generators. Thus L(Fn )1/k ' L(F1+(n−1)k2 ).

6.11 Interpolating free group factors

The formula L(Fn )1/k ' L(Fm ), which up to now makes sense only for integer
m, n and k, suggests that one might try to define L(Fr ) also for non-integer r by
compression. A crucial issue is that, by the above formula, different compressions
should give the same result. That this really works and is consistent was shown,
independently, by Dykema [67] and Rădulescu [150].

Theorem 13. Let R be the hyperfinite II1 factor and L(F∞ ) = vN(s1 , s2 , . . . ) be a
free group factor generated by countably many free semicircular elements si , such
that R and L(F∞ ) are free in some W ∗ -probability space (M, τ). Consider orthog-
6.12 The dichotomy for the free group factor isomorphism problem 179

onal projections p1 , p2 , · · · ∈ R and put r := 1 + ∑ j τ(p j )2 ∈ [1, ∞]. Then the von
Neumann algebra
L(Fr ) := vN(R, p j s j p j ( j ∈ N)) (6.4)
is a factor and depends, up to isomorphism, only on r.
These L(Fr ) for r ∈ R, 1 ≤ r ≤ ∞ are the interpolating free group factors. Note
that we do not claim to have non-integer free groups Fr . The notation L(Fr ) cannot
be split into smaller components.
Dykema and Rădulescu showed the following results.
Theorem 14. 1) For r ∈ {2, 3, 4, . . . , ∞} the interpolating free group factor L(Fr ) is
the usual free group factor.
2) We have for all r, s ≥ 1: L(Fr ) ? L(Fs ) ' L(Fr+s ).
3) We have for all r ≥ 1 and all t ∈ (0, ∞) the same compression formula as in
the integer case:
L(Fr ) t ' L(F1+t −2 (r−1) ). (6.5)
The compression formula above is also valid in the case r = ∞; since then 1 +
t −2 (r − 1) = ∞, it yields in this case that any compression of L(F∞ ) is isomorphic
to L(F∞ ); or in other words we have that the fundamental group of L(F∞ ) is equal
to R+ .

6.12 The dichotomy for the free group factor isomorphism problem
Whereas for r = ∞ the compression of L(Fr ) gives the same free group factor (and
thus we know that the fundamental group is maximal in this case), for r < ∞ we get
some other free group factors. Since we do not know whether these are isomorphic
to the original L(Fr ) we cannot decide upon the fundamental group in this case.
However, on the positive side, we can connect different free group factors by com-
pressions; this yields that some isomorphisms among the free group factors will
imply other isomorphisms. For example, if we would know that L(F2 ) ' L(F3 ),
then this would imply that also

L(F5 ) ' L(F2 ) 1/2 ' L(F3 ) 1/2 ' L(F9 ).

The possibility of using arbitrary t ∈ (0, ∞) in our compression formulas allows to

connect any two free group factors by compression, which gives then the following
dichotomy for the free group isomorphism problem. This is again due to Dykema
and Rădulescu.
Theorem 15. We have exactly one of the following two possibilities.
(i) All interpolating free group factors are isomorphic: L(Fr ) ' L(Fs ) for all
1 < r, s ≤ ∞. In this case the fundamental group of each L(Fr ) is equal to R+ .
(ii) The interpolating free group factors are pairwise non-isomorphic: L(Fr ) 6'
L(Fs ) for all 1 < r 6= s ≤ ∞. In this case the fundamental group of each L(Fr ), for
r 6= ∞, is equal to {1}.
Chapter 7
Free Entropy χ - the Microstates Approach via Large Deviations

An important concept in classical probability theory is Shannon’s notion of entropy.

Having developed the analogy between free and classical probability theory, one
hopes to find that a notion of free entropy exists in counterpart to the Shannon en-
tropy. In fact there is a useful notion of free entropy. However, the development
of this new concept is at present far from complete. The current state of affairs is
that there are two distinct approaches to free entropy. These should give isomor-
phic theories, but at present we only know that they coincide in a limited number of
situations.
The first approach to a theory of free entropy is via microstates. This is rooted
in the concept of large deviations. The second approach is microstates free. This
draws its inspiration from the statistical approach to classical entropy via the notion
of Fisher information. The unification problem in free probability theory is to prove
that these two theories of free entropy are consistent. We will in this chapter only talk
about the first approach via microstates; the next chapter will address the microstates
free approach.

7.1 Motivation
Let us return to the connection between random matrix theory and free probability
(1) (p)
theory which we have been developing. We know that a p-tuple (AN , . . . , AN ) of
N × N matrices chosen independently at random with respect to the GUE density
(compare Exercise 1.8), PN (A) = const · exp(−NTr(A2 )/2), on the space of N × N
Hermitian matrices converges almost surely (in moments with respect to the nor-
malized trace) to a freely independent family (s1 , . . . , s p ) of semi-circular elements
lying in a non-commutative probability space, see Theorem 4.4. The von Neumann
algebra generated by p freely independent semi-circulars is the von Neumann alge-
bra L(F p ) of the free group on p generators.
We ask now the following question: How likely is it to observe other distribu-
tions/operators for large N?

181
182 7 Free Entropy χ - the Microstates Approach via Large Deviations

Let us consider the case p = 1 more closely. For a random Hermitian matrix
A = A∗ (distribution as above) with real random eigenvalues λ1 ≤ · · · ≤ λN , denote
by
1
µA = (δλ1 + · · · + δλN ) (7.1)
N
the eigenvalue distribution of A (also known as the empirical eigenvalue distri-
bution), which is a random measure on R. Wigner’s semicircle law states that, as
N → ∞, PN (µA ≈ µW ) → 1, where µW is the (non-random) semi-circular distribu-
tion and µA ≈ µW means that the measures are close in a sense that can be made
precise. We are now interested in the deviations from this. What is the rate of de-
cay of the probability PN (µA ≈ ν), where ν is some measure (not necessarily the
semi-circle)? We expect that
2 I(ν)
PN (µA ≈ ν) ∼ e−N (7.2)

for some rate function I vanishing at µW . By analogy with the classical theory of
large deviations, I should correspond to a suitable notion of free entropy.
We used in the above the notion “≈” for meaning “being close” and “∼” for
“behaves asymptotically (in N) like”; here they should just be taken on an intuitive
level, later, in the actual theorems they will be made more precise.
In the next two sections we will recall some of the basic facts of the classical
theory of large deviations and, in particular, Sanov’s theorem; this standard material
can be found, for example, in the book [64]. In Section 7.4 we will come back to the
random matrix question.

7.2 Large deviation theory and Cramér’s Theorem

Consider a real-valued random variable X with distribution µ. Let X1 , X2 , . . . be
a sequence of independent identically distributed random variables with the same
distribution as X, and put Sn = (X1 + · · · + Xn )/n. Let m = E[X] and σ 2 = var(X) =
E[X 2 ] − m2 . Then the law of large numbers asserts that Sn → m, if E[|X|] < ∞; while
if E[X 2 ] < ∞ the central limit theorem tells us that for large n
σ
Sn ≈ m + √ N(0, 1). (7.3)
n

For example if µ = N(0, 1) is Gaussian then m = 0 and Sn has the Gaussian distri-
bution N(0, 1/n), and hence
√
2 n
P(Sn ≈ x) = P(Sn ∈ [x, x + dx]) ≈ e−nx /2 dx √ ∼ e−nI(x) dx.
2π
Thus the probability that Sn is near the value x decays exponentially in n at a rate
determined by x, namely the rate function I(x) = x2 /2. Note that the convex func-
7.2 Large deviation theory and Cramér’s Theorem 183

tion I(x) has a global minimum at x = 0, the minimum value there being 0, which
corresponds to the fact that Sn approaches the mean 0 in probability.
This behaviour is described in general by the following theorem of Cramér. Let
X, µ, {Xi }i and Sn be as above. There exists a function I(x), the rate function, such
that

P(Sn > x) ∼ e−nI(x) , x>m

−nI(x)
P(Sn < x) ∼ e , x < m.

How does one calculate the rate function I for a given distribution µ? We shall let
X be a random variable with the same distribution as the Xi ’s. For arbitrary x > m,
one has for all λ ≥ 0

P(Sn > x) = P(nSn > nx)

= P(eλ (nSn −nx) ≥ 1)
≤ E[eλ (nSn −nx) ] (by Markov’s inequality)
−λ nx λ (X1 +···+Xn )
=e E[e ]
−λ x λX n
= (e E[e ]) .

Here we are allowing that E[eλ X ] = +∞. Now put

Λ (λ ) := log E[eλ X ], (7.4)

the cumulant generating series of µ, c.f. Section 1.1. We consider Λ to be an ex-

tended real-valued function but here only consider µ for which Λ (λ ) is finite for all
real λ in some open set containing 0; however Cramér’s theorem (Theorem 1) holds
without this assumption. With this assumption Λ has a power series expansion with
radius of convergence λ0 > 0, and in particular all moments exist.
Exercise 1. Suppose that X is a real random variable and there is λ0 > 0 so that
for all |λ | ≤ λ0 we have E(eλ X ) < ∞. Then X has moments of all orders and the
function λ 7→ E(eλ X ) has a power series expansion with a radius of convergence of
at least λ0 .
Then the inequality above reads

P(Sn > x) ≤ e−λ nx+nΛ (λ ) = e−n(λ x−Λ (λ )) , (7.5)

which is valid for all 0 ≤ λ . By Jensen’s inequality we have, for all λ ∈ R,

Λ (λ ) = log E[eλ X ] ≥ E[log eλ X ] = λ m. (7.6)

This implies that for λ < 0 and x > m we have −n(λ x −Λ (λ )) ≥ 0 and so equation
(7.5) is valid for all λ . Thus
184 7 Free Entropy χ - the Microstates Approach via Large Deviations

P(Sn > x) ≤ inf e−n(λ x−Λ (λ )) = exp −n sup(λ x − Λ (λ )) .
λ λ

The function λ 7→ Λ (λ ) is convex, and the Legendre transform of Λ defined by

Λ ∗ (x) := sup(λ x − Λ (λ )) (7.7)

is also a convex function of x, as it is the supremum of a family of convex functions

of x.
Exercise 2. Show that (E(Xeλ X ))2 ≤ E(eλ X )E(Xeλ X ). Show that λ 7→ Λ (λ ) is
convex.
Note that Λ (0) = log 1 = 0, thus Λ ∗ (x) ≥ (0x − Λ (0)) = 0 is non-negative, and
hence equation (7.6) implies that Λ ∗ (m) = 0.
Thus we have proved that, for x > m,
∗ (x)
P(Sn > x) ≤ e−nΛ , (7.8)

where Λ ∗ is the Legendre transform of the cumulant generating function Λ . In the

same way one proves the same estimate for P(Sn < x) for x < m. This gives Λ ∗ as a
candidate for the rate function. Moreover we have by Exercise 3 that limn log[P(Sn >
x)]1/n exists and by Equation (7.8) this limit is less than exp (−Λ ∗ (x)). If we assume
that neither P(X > x) nor P(X < x) is 0, exp (−Λ ∗ (x)) will be the limit. In general
we have
1 1
− inf Λ ∗ (y) ≤ lim inf log P(Sn > x) ≤ lim sup log P(Sn ≥ x) ≤ − inf Λ ∗ (y).
y>x n n n n y≥x

Exercise 3. Let an = log P(Sn > a). Show that

(i) for all m, n: am+n ≥ am + an ;
(ii) for all m
an am
lim inf ≥ ;
n→∞ n m
(iii) lim an /n exists.
n→∞

However, in preparation for the vector valued version we will show that exp (−nΛ ∗ (x))
is asymptotically a lower bound; more precisely, we need to verify that
1
lim inf log P(x − δ < Sn < x + δ ) ≥ −Λ ∗ (x)
n→∞ n

for all x and all δ > 0. By replacing Xi by Xi − x we can reduce this to the case x = 0,
namely showing that
1
−Λ ∗ (0) ≤ lim inf log P(−δ < Sn < δ ). (7.9)
n→∞ n
7.2 Large deviation theory and Cramér’s Theorem 185

Note that −Λ ∗ (0) = infλ Λ (λ ). The idea of the proof of (7.9) is then to perturb
the distribution µ to µ̃ such that x = 0 is the mean of µ̃. Let us only consider the
case where Λ has a global minimum at some point η. This will always be the case if
µ has compact support and both P(X > 0) and P(X < 0) are not 0. The general case
can be reduced to this by a truncation argument. With this reduction Λ (λ ) is finite
for all λ and thus Λ has an infinite radius of convergence (c.f. Exercise 1) and thus
Λ is differentiable. So we have Λ 0 (η) = 0. Now let µ̃ be the measure on R such that

d µ̃(x) = eηx−Λ (η) dµ(x). (7.10)

Note that
Z Z
d µ̃(x) = e−Λ (η) eηx dµ(x) = e−Λ (η) E[eηX ] = e−Λ (η) eΛ (η) = 1,
R R

which verifies that µ̃ is a probability measure. Consider now i.i.d. random variables
{X̃i }i with distribution µ̃, and put S̃n = (X̃1 + · · · + X̃n )/n. Let X̃ have the distribution
µ̃. We have

= e−nε|η| enΛ (η) P(−ε < S̃n < ε).

By the weak law of large numbers, S̃n → E[X̃i ] = 0 in probability, i.e. we have
limn→∞ P(−ε < S̃n < ε) = 1 for all ε > 0. Thus for all 0 < ε < δ

1 1
lim inf log P(−δ < Sn < δ ) ≥ lim inf log P(−ε < Sn < ε)
n→∞ n n→∞ n
≥ Λ (η) − ε|η|, for all ε > 0
≥ Λ (η)
= infΛ (λ )
= −Λ ∗ (0).
186 7 Free Entropy χ - the Microstates Approach via Large Deviations

This sketches the proof of Cramér’s theorem for R. The higher-dimensional form
of Cramér’s theorem can be proved in a similar way.
Theorem 1 (Cramér’s Theorem for Rd ). Let X1 , X2 , . . . be a sequence of i.i.d ran-
dom vectors, i.e. independent Rd -valued random variables with common distribu-
tion µ (a probability measure on Rd ). Put

Λ (λ ) := E[ehλ ,Xi i ], λ ∈ Rd , (7.11)

and
Λ ∗ (x) := sup {hλ , xi − Λ (λ )}. (7.12)
λ ∈Rd

Assume that Λ (λ ) < ∞ for all λ ∈ Rd , and put Sn := (X1 + · · · + Xn )/n.

Then the distribution µSn of the random variable Sn satisfies a large deviation
principle with rate function Λ ∗ , i.e.
◦ x 7→ Λ ∗ (x) is lower semicontinuous (actually convex)
◦ Λ ∗ is good, i.e. {x ∈ Rd : Λ ∗ (x) ≤ α} is compact for all α ∈ R
◦ For any closed set F ⊂ Rd ,
1
lim sup log P(Sn ∈ F) ≤ − inf Λ ∗ (x) (7.13)
n→∞ n x∈F

◦ For any open set G ⊂ Rd ,

1
lim inf log P(Sn ∈ G) ≥ − inf Λ ∗ (x). (7.14)
n→∞ n x∈G

7.3 Sanov’s Theorem and entropy

We have seen Cramér’s theorem for Rd ; in an informal way it says P(Sn ≈ x) ∼
exp (−nΛ ∗ (x)). Actually, we are interested not in Sn , but in the empirical distribu-
tion (δX1 + · · · + δXn )/n.
Let us consider this in the special case of random variables Xi : Ω → A, taking
values in a finite alphabet A = {a1 , . . . , ad }, with pk := P(Xi = ak ). As n → ∞, the
empirical distribution of the Xi ’s should converge to the “most likely” probability
measure (p1 , . . . , pd ) on A.
Now define the vector of indicator functions Yi : Ω → Rd by

Yi := (1{a1 } (Xi ), . . . , 1{ad } (Xi )), (7.15)

so that in particular pk is equal to the probability that Yi will have a 1 in the k-th
spot and 0’s elsewhere. Then the averaged sum (Y1 + · · · + Yn )/n gives the relative
frequency of a1 , . . . , ad , i.e., it contains the same information as the empirical distri-
bution of (X1 , . . . , Xn ).
A probability measure on A is given by a d-tuple (q1 , . . . , qd ) of positive real
numbers satisfying q1 + · · · + qd = 1. By Cramér’s theorem,
7.3 Sanov’s Theorem and entropy 187

Y1 + · · · +Yn

1
P (δX1 + · · · + δXn ) ≈ (q1 , . . . , qd ) = P ≈ (q1 , . . . , qd )
n n
∗ (q ,...,q )
∼ e−nΛ 1 d .

Here
Λ (λ1 , . . . , λd ) = log E[ehλ ,Yi i ] = log(p1 eλ1 + · · · + pd eλd ).
Thus the Legendre transform is given by

Λ ∗ (q1 , . . . , qd ) = sup {λ1 q1 + · · · + λd qd − Λ (λ1 , . . . , λd )}.

(λ1 ,...,λd )

We compute the supremum over all tuples (λ1 , . . . , λd ) by finding the partial deriva-
tive ∂ /∂ λi of λ1 q1 + · · · + λd qd − Λ (λ1 , . . . , λd ) to be

1
qi − pi eλi .
p1 eλ1 + · · · + pd eλd
By concavity the maximum occurs when
qi qi
λi = log + log(p1 eλ1 + · · · + pd eλd ) = log + Λ (λ1 , . . . , λd ),
pi pi
and we compute

Λ ∗ (q1 , . . . , qd )
q1 qd
= q1 log + · · · + qd log + (q1 + · · · + qd )Λ (λ1 , . . . , λd ) − Λ (λ1 , . . . , λd )
p1 pd
q1 qd
= q1 log + · · · + qd log .
p1 pd

The latter quantity is Shannon’s relative entropy, H((q1 , . . . , qd )|(p1 , . . . , pd )), of

(q1 , . . . , qd ) with respect to (p1 , . . . , pd ). Note that H((q1 , . . . , qd )|(p1 , . . . , pd )) ≥ 0,
with equality holding if and only if q1 = p1 , . . . , qd = pd .
Thus (p1 , . . . , pd ) is the most likely realization, with other realizations exponen-
tially unlikely; their unlikelihood is measured by the rate function Λ ∗ ; and this rate
function is indeed Shannon’s relative entropy. This is the statement of Sanov’s the-
orem. We have proved it here for a finite alphabet; it also holds for continuous
distributions.

Theorem 2 (Sanov’s Theorem). Let X1 , X2 , . . . be i.i.d. real-valued random vari-

ables with common distribution µ, and let
1
νn = (δX1 + · · · + δXn ) (7.16)
n
be the empirical distribution of X1 , . . . , Xn , which is a random probability measure
on R. Then {νn }n satisfies a large deviation principle with rate function I(ν) =
188 7 Free Entropy χ - the Microstates Approach via Large Deviations

S(ν, µ) (called the relative entropy) given by

(R
p(t) log p(t)dµ(t), if dν = p dµ
I(ν) = (7.17)
+∞, otherwise .

Concretely, this means the following. Consider the set M of probability mea-
sures on R with the weak topology (which is a metrizable topology, e.g. by the Lévy
metric). Then for closed F and open G in M we have
1
lim sup log P(νn ∈ F) ≤ − inf S(ν, µ) (7.18)
n→∞ n ν∈F
1
lim inf log P(νn ∈ G) ≥ − inf S(ν, µ). (7.19)
n→∞ n ν∈G

7.4 Back to random matrices and one-dimensional free entropy

Consider again the space HN of Hermitian matrices equipped with the probability
measure PN having density
N 2
dPN (A) = const · e− 2 Tr(A ) dA. (7.20)

We let RN≥ = {(x1 , . . . , xN ) ∈ RN | x1 ≤ · · · ≤ xN }. For a self-adjoint matrix A we

write the eigenvalues of A as λ1 (A) ≤ · · · ≤ λN (A). The joint eigenvalue distribution
P̃N on RN≥ is defined by

P̃N (B) := PN {A ∈ HN | (λ1 (A), . . . , λN (A)) ∈ B}. (7.21)

The permutation group SN acts on RN by permuting the coordinates, with RN≥ as

a fundamental domain (ignoring sets of measure 0). So we can use this action to
transport P̃N around RN to get a probability measure on RN .
One knows (see, e.g., [6, Thm. 2.5.2]) that P̃N is absolutely continuous with re-
spect to Lebesgue measure on RN and has density

N
N
N 2
d P̃N (λ1 , . . . , λN ) = CN · e− 2 ∑i=1 λi ∏(λi − λ j )2 ∏ dλi , (7.22)
i< j i=1

where
2 /2
NN
CN = . (7.23)
(2π)N/2 ∏Nj=1 j!
We want to establish a large deviation principle for the empirical eigenvalue dis-
tribution µA = (δλ1 (A) + · · · + δλN (A) )/N of a random matrix in HN .
One can argue heuristically as follows for the expected form of the rate function.
We have
7.4 Back to random matrices and one-dimensional free entropy 189

1
PN {µA ≈ ν} = P̃N (δ + · · · + δλN ) ≈ ν
N λ1
Z
N
N
2
= CN · e− 2 ∑ λi ∏(λi − λ j )2 ∏ dλi .
{ N1 (δλ +···+δλ )≈ν} i< j i=1
1 N

Now for (δλ1 (A) + · · · + δλN (A) )/N ≈ ν,

N N 2 N2 1 N 2
− ∑ i λ = − ∑ λi
2 i=1 2 N i=1

is a Riemann sum for the integral t 2 dν(t). Moreover

∏(λi − λ j )2 = exp( ∑ log |λi − λ j |2 ) = exp( ∑ log |λi − λ j |)

i< j i< j i6= j

is a Riemann sum for N 2

RR
log |s − t|dν(s)dν(t).
Hence, heuristically, we expect that PN (µA ≈ ν) ∼ exp (−N 2 I(ν)), with

1 1
Z Z Z
I(ν) = − log |s − t|dν(s)dν(t) + t 2 dν(t) − lim logCN . (7.24)
2 N→∞ N2
The value of the limit can be explicitly computed as 3/4. Note that by writing
√
2 2 2 2 2 2 s2 + t 2
s + t − 4 log |s − t| = s + t − 2 log(s + t ) + 4 log
|s − t|

and using the inequalities

t − 2 logt ≥ 2 − 2 log 2 for t > 0 and 2(s2 + t 2 ) ≥ (s − t)2

we have for s 6= t that s2 + t 2 − 4 log |s − t| ≥ 2 − 4 log 2. This shows that if ν has

a finite second moment, the integral (s2 + t 2 − 4 log |s − t|) dν(s)dν(t) is always
RR

defined as an extended real number, possibly +∞, in which case we set I(ν) = +∞,
otherwise I(ν) is finite and is given by (7.24).
RR
Voiculescu was thus motivated to use the integral log |s − t| dµx (s)dµx (t) to
define in [181] the free entropy χ (x) for one self-adjoint variable x with distribution
µx , see equation (7.30).
The large deviation argument was then made rigorous in the following theorem
of Ben Arous and Guionnet [26].

Theorem 3. Put
1 3
ZZ Z
I(ν) = − log |s − t|dν(s)dν(t) + t 2 dν(t) − . (7.25)
2 4
Then:
190 7 Free Entropy χ - the Microstates Approach via Large Deviations

(i) I : M → [0, ∞] is a well-defined, convex, good function on the space, M, of

probability measures on R. It has unique minimum value of 0 which occurs at
the Wigner semicircle distribution µW with variance 1.
(ii) The empirical eigenvalue distribution satisfies a large deviation principle with
respect to P̃N with rate function I: we have for any open set G in M

1 δλ + · · · + δλN
lim inflog P̃N ( 1 ∈ G) ≥ − inf I(ν), (7.26)
N2
N→∞ N ν∈G

and for any closed set F in M

1 δλ + · · · + δλN
lim sup 2
log P̃N ( 1 ∈ F) ≤ − inf I(ν). (7.27)
N→∞ N N ν∈F

Exercise 4. The above theorem includes in particular the statement that for a Wigner
semicircle distribution µW with variance 1 we have
1
ZZ
− log |s − t| dµW (s)dµW (t) = . (7.28)
4
Prove this directly!

Exercise 5. (i) Let µ be a probability measure with support in [−2, 2]. Show that we
have Z 2
∞
1
Z Z
log |s − t|dµ(s)dµ(t) = − ∑ Cn (t)dµ(t) ,
R R n=1 2n R

where Cn are the Chebyshev polynomials of the first kind.

(ii) Use part (i) to give another derivation of (7.28).

7.5 Definition of multivariate free entropy

Let (M, τ) be a tracial W ∗ -probability space and x1 , . . . , xn self-adjoint elements in
M. Recall that by definition the joint distribution of the noncommutative random
variables x1 , . . . , xn is the collection of all mixed moments

distr(x1 , . . . , xn ) = {τ(xi1 xi2 · · · xik ) | k ∈ N, i1 , . . . , ik ∈ {1, . . . , n}}.

In this section we want to examine the probability that the distribution of

(x1 , . . . , xn ) occurs in Voiculescu’s multivariable generalization of Wigner’s semi-
circle law.
Let A1 , . . . , An be independent Gaussian random matrices: A1 , . . . , An are cho-
sen independently at random from the sample space MN (C)sa of N × N self-adjoint
matrices over C, equipped with Gaussian probability measure having density pro-
portional to exp(−Tr(A2 )/2) with respect to Lebesgue measure on MN (C)sa . We
distr
know that as N → ∞ we have almost sure convergence (A1 , . . . , An ) −→ (s1 , . . . , sn )
7.5 Definition of multivariate free entropy 191

with respect to the normalized trace, where (s1 , . . . , sn ) is a free semi-circular family.
Large deviations from this limit should be given by
2 I(x ,...,x )
PN {(A1 , . . . , An ) | distr(A1 , . . . , An ) ≈ distr(x1 , . . . , xn )} ∼ e−N 1 n
,

where I(x1 , . . . , xn ) is the free entropy of x1 , . . . , xn . The problem is that this has to
be made more precise and that, in contrast to the one-dimensional case, there is no
analytical formula to calculate this quantity.
We use the equation above as motivation to define free entropy as follows. This is
essentially the definition of Voiculescu from [182], the only difference is that he also
included a cut-off parameter R and required in the definition of the “microstate set”
Γ that kAi k ≤ R for all i = 1, . . . , n. Later it was shown by Belinschi and Bercovici
[20] that removing this cut-off condition gives the same quantity.

Definition 4. Given a tracial W ∗ -probability space (M, τ) and an n-tuple (x1 , . . . , xn )

of self-adjoint elements in M, we define the (microstates) free entropy χ (x1 , . . . , xn )
of the variables x1 , . . . , xn as follows. First, we put

Γ (x1 , . . . , xn ; N, r, ε)
:= (A1 , . . . , An ) ∈ MN (C)nsa | |tr(Ai1 · · · Aik ) − τ(xi1 · · · xik )| ≤ ε

for all 1 ≤ i1 , . . . , ik ≤ n, 1 ≤ k ≤ r .

In words, Γ (x1 , . . . , xn ; N, r, ε), which we call the set of microstates, is the set of all
n-tuples of N × N self-adjoint matrices which approximate the mixed moments of
the self-adjoint elements x1 , . . . , xn of length at most r to within ε.
2
Let Λ denote Lebesgue measure on MN (C)nsa ' RnN . Then we define

χ (x1 , . . . , xn ; r, ε) := lim sup 1 log Λ (Γ (x1 , . . . , xn ; N, r, ε)) + n log(N) ,

N→∞ N2 2

and finally put

χ (x1 , . . . , xn ) := lim χ (x1 , . . . , xn ; r, ε). (7.29)
r→∞
ε→0

It is an important open problem whether the lim sup in the definition above of
χ (x1 , . . . , xn ; r, ε) is actually a limit.
We want to elaborate on the meaning of Λ , the Lebesgue measure on MN (C)nsa '
2
RnN and the normalization constant n log(N)/2. Let us consider the case n = 1.
For a self-adjoint matrix A = (ai j )Ni,j=1 ∈ MN (C)sa we identify the elements on the
diagonal (which are real) and the real and imaginary part of the elements above the
diagonal (which are the adjoints of the corresponding elements below the diagonals)
with an N + 2 N(N−1) 2 = N 2 dimensional vector of real numbers. The actual choice
of this mapping is determined by the fact that we want the Euclidean inner product
2
in RN to correspond on the side of the matrices to the form (A, B) 7→ Tr(AB). Note
that
192 7 Free Entropy χ - the Microstates Approach via Large Deviations

N N
Tr(A2 ) = ai j a ji = ∑ (Reaii )2 + 2 (Reai j )2 + (Imai j )2 .

∑ ∑
i, j=1 i=1 1≤i< j≤N
√
This means that there is a difference of a factor 2 between the diagonal and the
off-diagonal elements. (The same effect made its appearance in Chapter 1, Exercise
8, when we defined the GUE by assigning different values for the covariances for
variables on and off the diagonal — in order to make this choice invariant under
conjugation by unitary matrices.) So our specific choice of a map between MN (C)
2
and RN means that we map the set {A ∈ MN (C)sa | Tr(A2 ) ≤ R2 } to the ball BN 2 (R)
of radius R in N 2 real dimensions. The pull back under this map of the Lebesgue
2
measure on RN is what we call Λ , the Lebesgue measure on MN (C)sa . The situation
for general n is given by taking products.
Note that a microstate (A1 , . . . , An ) ∈ Γ (x1 , . . . , xn ; N, r, ε) satisfies for r ≥ 2

1
Tr(A21 + · · · + A2n ) ≤ τ(x12 + · · · + xn2 ) + nε =: c2 ,
N
√
and thus the set of microstates Γ (x1 , . . . , xn ; N, r, ε) is contained in the ball BnN 2 ( Nc).
The fact that the latter grows logarithmically like
√ √ 2
1 √ 1 ( Nc π)nN n
logΛ BnN 2 ( Nc) = log ∼ − log N,
N2 N2 Γ (1 + nN 2 /2) 2

is the reason for adding the term n log N/2 in the definition of χ (x1 , . . . , xn ; r, ε).

7.6 Some important properties of χ

The free entropy has the following properties:
(i) For n = 1, much more can be said than for general n. In particular, one can show
that the lim sup in the definition of χ is indeed a limit, and that we have the
explicit formula
1 3
Z Z
χ (x) = log |s − t|dµx (s)dµx (t) + log(2π) + . (7.30)
2 4
Thus the definition of χ reduces in this case to the quantity from the previous
section. Our discussion before Theorem 3 shows then that χ (x) ∈ [−∞, ∞). For
n ≥ 2, no formula of this sort is known.
When x is a semi-circular operator with variance 1 we know the value of the
double integral by (7.28), hence for a semi-circular operator s with variance 1 we
have
χ (s) = 1 (1 + log(2π)). (7.31)
2
(ii) χ is subadditive:
χ (x1 , . . . , xn ) ≤ χ (x1 ) + · · · + χ (xn ). (7.32)
7.6 Some important properties of χ 193

This is an easy consequence of the fact that

n
Γ (x1 , . . . , xn ; N, r, ε) ⊂ ∏ Γ (xi ; N, r, ε).
i=1

Thus in particular, by using the corresponding property from (i), we always have:
χ (x1 , . . . , xn ) ∈ [−∞, ∞).
(m) (m) distr
(iii) χ is upper semicontinuous: if (x1 , . . . , xn ) −→ (x1 , . . . , xn ) for m → ∞, then

χ (x1 , . . . , xn ) ≥ lim sup χ (x(m) , . . . , xn(m) ). (7.33)

1
m→∞

This is because if, for arbitrary words of length k with 1 ≤ k ≤ r, we have

(m) (m) ε
|τ(xi1 · · · xik ) − τ(xi1 · · · xik )| <
2
for sufficiently large m, then
(m) (m) ε
Γ (x1 , . . . , xn ; N, r, ) ⊂ Γ (x1 , . . . , xn ; N, r, ε).
2
(iv) If x1 , . . . , xn are free, then χ (x1 , . . . , xn ) = χ (x1 ) + · · · + χ (xn ).
(v) χ (x1 , . . . , xn ), under the constraint ∑ τ(xi2 ) = n, has a unique maximum when
x1 , . . . , xn is a free semi-circular family (s1 , . . . , sn ) with τ(s2i ) = 1. In this case

χ (s1 , . . . , sn ) = n (1 + log(2π)). (7.34)

2
(vi) Consider y j = Fj (x1 , . . . , xn ), for some “convergent” non-commutative power se-
ries Fj , such that the mapping (x1 , . . . , xn ) 7→ (y1 , . . . , yn ) can be inverted by some
other power series. Then

χ (y1 , . . . , yn ) = χ (x1 , . . . , xn ) + n log(| det |J (x1 , . . . , xn )), (7.35)

where J is a non-commutative Jacobian and | det | is the Fuglede-Kadison deter-

minant. (We will provide more information on the Fuglede-Kadison determinant
in Chapter 11.)
With the exception of (ii) and (iii), the statements above are quite non-trivial; for
the proofs we refer to the original papers of Voiculescu [182, 186].
Exercise 6. (i) For an n-tuple (x1 , . . . , xn ) of self-adjoint elements in M and an in-
vertible real matrix T = (ti j )ni, j=1 ∈ Mn (R) we put yi := ∑nj=1 ti j x j ∈ M (i = 1, . . . , n).
Part (vi) of the above says then (by taking into account the meaning of the Fuglede-
Kadison determinant for matrices, see (11.4)) that

χ (y1 , . . . , yn ) = χ (x1 , . . . , xn ) + log | det T |. (7.36)

194 7 Free Entropy χ - the Microstates Approach via Large Deviations

Prove this directly from the definitions.

(ii) Show that χ (x1 , . . . , xn ) = −∞ if x1 , . . . , xn are linearly dependent.

7.7 Applications of free entropy to operator algebras

One hopes that χ can be used to construct invariants for von Neumann algebras. In
particular, we define the free entropy dimension of the n-tuple x1 , . . . , xn by
χ (x1 + εs1 , . . . , xn + εsn )
δ (x1 , . . . , xn ) = n + lim sup , (7.37)
ε&0 | log ε|

where s1 , . . . , sn is a free semi-circular family, free from {x1 , . . . , xn }.

One of the main problems in this context is to establish the validity (or false-
hood) of the following implication (or some variant thereof): if vN(x1 , . . . , xn ) =
vN(y1 , . . . , yn ), does this imply that δ (x1 , . . . , xn ) = δ (y1 , . . . , yn ).
In recent years there have been a number of results which allow one to infer some
properties of a von Neumann algebra from knowledge of the free entropy dimension
for some generators of this algebra. Similar statements can be made on the level of
the free entropy. However, there the actual value of χ is not important, the main
issue is to distinguish finite values of χ from the situation χ = −∞.
Let us note that in the case of free group factors L(Fn ) = vN(s1 , . . . , sn ) we have
of course for the canonical generators χ (s1 , . . . , sn ) > −∞ and δ (s1 , . . . , sn ) = n. (For
the latter one should notice that the sum of two free semi-circulars is just another
semi-circular, where the variances add; hence the numerator in (7.37) stays bounded
for ε → 0 in this case.)
We want now to give the idea how to use free entropy to get statements about a
von Neumann algebra. For this, let P be some property that a von Neumann algebra
M may or may not have. Assume that we can verify that “M has P” implies that
χ (x1 , . . . , xn ) = −∞ for any generating set vN(x1 , . . . , xn ) = M. Then a von Neumann
algebra for which we have at least one generating set with finite free entropy cannot
have this property P. In particular, L(Fn ) cannot have P.
Three such properties where this approach was successful are property Γ , the
existence of a Cartan subalgebra, and the property of being prime.
Let us first recall the definition of p property Γ . We will use here the usual
non-commutative L2 -norm, kxk2 := τ(x∗ x), for elements x in our tracial W ∗ -
probability space (M, τ).
Definition 5. A bounded sequence (tk )k≥0 in (M, τ) is central if limk→∞ k[x,tk ]k2 =
0 for all x ∈ M, where [·, ·] denotes the commutator of two elements, i.e., [x,tk ] =
xtk − tk x. If (tk )k is a central sequence and limk→∞ ktk − τ(tk )1k2 = 0, then (tk )k is
said to be a trivial central sequence. (M, τ) has property Γ if there exists a non-
trivial central sequence in M.
Note that elements from the centre of an algebra always give central sequences;
hence if M does not have property Γ then it is a factor.
7.7 Applications of free entropy to operator algebras 195

Definition 6. 1) Given any von Neumann subalgebra N of a von Neumann algebra

M we let the normalizer of N be the von Neumann subalgebra of M generated by all
the unitaries u ∈ M which normalize N, i.e. uNu∗ = N. A von Neumann subalgebra
N of M is said to be maximal abelian if it is abelian and is not properly contained in
any other abelian subalgebra. A maximal abelian subalgebra is a Cartan subalgebra
of M if its normalizer generates M.
2) Finally we recall that a finite von Neumann algebra M is prime if it cannot
be decomposed as M = M1 ⊗M2 for II1 factors M1 and M2 . Here ⊗ denotes the von
Neumann tensor product of M1 and M2 , see [171, Ch. IV].

The above mentioned strategy is the basis of the proof of the following theorem.

Theorem 7. Let M be a finite von Neumann algebra with trace τ generated by self-
adjoint operators x1 , . . . , xn , where n ≥ 2. Assume that χ (x1 , . . . , xn ) > −∞, where
the free entropy is calculated with respect to the trace τ. Then
(i) M does not have property Γ . In particular, M is a factor.
(ii) M does not have a Cartan subalgebra.
(iii) M is prime.

Corollary 8. All this applies in the case of the free group factor L(Fn ) for 2 ≤ n <
∞, thus:
(i) L(Fn ) does not have property Γ .
(ii) L(Fn ) does not have a Cartan subalgebra.
(iii) L(Fn ) is prime.

Parts (i) and (ii) of the theorem above are due to Voiculescu [185], part (iii)
was proved by Liming Ge [76]. In particular, the absence of Cartan subalgebras
for L(Fn ) was a spectacular result, as it falsified the conjecture, which had been
open for decades, that every II1 factor should possess a Cartan subalgebra. Such
a conjecture was suggested by the fact that von Neumann algebras obtained from
ergodic measurable relations always have Cartan subalgebras and for a while there
was the hope that all von Neumann algebras might arise in this way.
In order to give a more concrete idea of this approach we will present the essential
steps in the proof for part (i) (which is the simplest part of the theorem above) and
say a few words about the proof of part (iii). However, one should note that the
absence of property Γ for L(Fn ) is an old result of Murray and von Neumann which
can be proved more directly without using free entropy. The following follows quite
closely the exposition of Biane [36].
196 7 Free Entropy χ - the Microstates Approach via Large Deviations

7.7.1 The proof of Theorem 7, part (i)

We now give the main arguments and estimates for the proof of Part (i) of Theo-
rem 7. So let M = vN(x1 , . . . , xn ) have property Γ ; we must prove that this implies
χ (x1 , . . . , xn ) = −∞.
Let (tk )k be a non-trivial central sequence in M. Then its real and imaginary
parts are also central sequences (at least one of them non-trivial) and, by applying
functional calculus to this sequence, we may replace the tk ’s with a non-trivial cen-
tral sequence of orthogonal projections (pk )k , and assume the existence of a real
number θ in the open interval (0, 1/2) such that θ < τ(pk ) < 1 − θ for all k and
limk→∞ k[x, pk ]k2 = 0 for all x ∈ M.
We then prove the following key lemma.
Lemma 9. Let (M, τ) be a tracial W ∗ -probability space generated by self-adjoint
elements x1 , . . . , xn satisfying τ(xi2 ) ≤ 1. Let 0 < θ < 12 be a constant and p ∈ M a
projection such that θ < τ(p) < 1 − θ . If there is ω > 0 such that k[p, xi ]k2 < ω for
1 ≤ i ≤ n then there exist positive constants C1 ,C2 depending only on n and θ such
that χ (x1 , . . . , xn ) ≤ C1 +C2 log ω.
Assuming this is proved, choose p = pk . We can take ωk → 0 as k → ∞. Thus
we get χ (x1 , . . . , xn ) ≤ C1 + C2 log ω for all ω > 0, implying χ (x1 , . . . , xn ) = −∞.
(Note that we can achieve the assumption τ(xi2 ) ≤ 1 by rescaling our generators.) It
remains to prove the lemma.
Proof: Take (A1 , . . . , An ) ∈ Γ (x1 , . . . , xn ; N, r, ε) for N, r sufficiently large and ε suf-
ficiently small. As p can be approximated by polynomials in x1 , . . . , xn and by an ap-
plication of the functional calculus, we find a projection matrix Q ∈ MN (C) whose
range is a subspace of dimension q = bNτ(p)c and such that we have (where the
k · k2 -norm is now with respect to tr in MN (C)) k[Ai , Q]k2 < 2ω for all i = 1, . . . , n.
This Q is of the form
Iq 0
Q=U U∗
0 0N−q
for some U ∈ U(N)/U(q) × U(N − q). Write

Bi Ci∗

U ∗ AiU = .
Ci Di

Then k[Ai , Q]k2 ≤ 2ω implies the same for the conjugated matrices, i.e.,
r
0 −Ci∗ Bi Ci∗

2 ∗ 10 = k[Ai , Q]k2 < 2ω,
Tr(CiCi ) = = ,
N Ci 0
2
C i D i 0 0 2

and thus we have for all i = 1, . . . , n

N
Tr(CiCi∗ ) < (2ω)2 = 2Nω 2 .
2
7.7 Applications of free entropy to operator algebras 197

Furthermore, τ(xi2 ) ≤ 1 implies that tr(A2i ) ≤ 1 + ε and hence Tr(A2i ) ≤ (1 + ε)N ≤

2N, since we can take ε ≤ 1. Thus, in particular, we also have Tr(B2i ) ≤ 2N and
Tr(D2i ) ≤ 2N.
Denote now by B p (R) the ball of radius R in R p centred at the origin and consider
the map which sends our matrices Ai ∈ MN (C) to the Euclidean N2
√ space R . Then
the latter conditions mean that each Bi is contained in a ball Bq2 ( 2N) and that each
√
Di is contained in a ball B(N−q)2 ( 2N). For the rectangular q × (N − q) matrix Ci ∈
Mq,N−q (C) ' R√ 2q(N−q) the condition Tr(CC∗ ) ≤ 2Nω 2 means that C is contained in
√
a ball B2q(N−q) ( 4Nω). (Here we get an extra factor 2, because all elements from
Ci correspond to upper triangular elements from Ai .)
Thus, the estimates above show that we can cover Γ (x1 , . . . , xn ; N, r, ε) by a union
of products of balls:

Γ (x1 , . . . , xn ; N, r, ε) ⊆
h √ √ √ in
U Bq2 ( 2N) × B2q(N−q) (ω 4N) × B(N−q)2 ( 2N) U ∗ .
[

U∈
U (N)/U (q)×U (N−q)

This does not give directly an estimate for the volume of our set Γ , as we have here
a covering by infinitely many sets. However, we can reduce this to a finite cover by
approximating the U’s which appear by elements from a finite δ -net.
By a result of Szarek [169], for any δ > 0 there exists a δ -net (Us )s∈S in the
2 2 2
Grassmannian U(N)/U(q)× U(N −q) with |S| ≤ (Cδ −1 )N −q −(N−q) with C a uni-
versal constant.
For (A1 , . . . , An ), Q, and U as above, we have that there exists s ∈ S such that
kU − Us k ≤ δ implies k[Us∗ AiUs ,U ∗ QU]k2 ≤ 2ω + 8δ . Repeating the arguments
above for Us∗ AiUs instead of U ∗ AiU (where we have to replace 2ω by 2ω + 8δ ) we
get

Γ (x1 , . . . , xn ; N, r, ε) ⊆
[h √ √ √ in
Us Bq2 ( 2N) × B2q(N−q) (ω + 4δ ) 4N × B(N−q)2 ( 2N) Us∗ , (7.38)
s∈S

and hence
2 2 2
Λ (Γ (x1 , . . . , xn ; N, r, ε)) ≤ (Cδ −1 )N −q −(N−q)
h √ √ √ in
× Λ Bq2 ( 2N) Λ B2q(N−q) (ω + 4δ ) 4N Λ B(N−q)2 ( 2N) .

By using the explicit form of the Lebesgue measure of B p (R) as

R p π p/2
Λ (B p (R)) = ,
Γ (1 + 2p )
198 7 Free Entropy χ - the Microstates Approach via Large Deviations

this simplifies to the bound

" 2 √ #n
(2Nπ)N /2 [ 2(ω + 4δ )]2q(N−q)
(Cδ −1 )2q(N−q) .
Γ (1 + q2 /2)Γ (1 + q(N − q))Γ (1 + (N − q)2 /2)

Thus

1 n
logΛ (Γ (x1 , . . . , xn ; N, r, ε)) + log N ≤ C̃1 + C̃2 log δ −1 + n log(ω + 4δ ) ,

N 2 2
for positive constants C̃1 , C̃2 depending only on n and θ . Taking now δ = ω gives
the claimed estimate with C1 := C̃1 + n log 5 and C2 := (n − 1)C̃2 .
One should note that our estimates work for all n. However, in order to have C2
strictly positive, we need n > 1. For n = 1 we only get an estimate against a constant
C1 , which is not very useful. This corresponds to the fact that for each i the smallness
of the off-diagonal block Ci of U ∗ AiU in some basis U is not very surprising; how-
ever, if we have the smallness of all such blocks C1 , . . . ,Cn of U ∗ A1U, . . . ,U ∗ AnU
for a common U, then this is a much stronger constraint.

7.7.2 The proof of Theorem 7, part (iii)

The proof of part (iii) proceeds in a similar, though technically more complicated,
fashion. Let us assume that our II1 factor M = vN(x1 , . . . , xn ) has a Cartan subalge-
bra N. We have to show that this implies χ (x1 , . . . , xn ) = −∞.
First one has to rewrite the property of having a Cartan subalgebra in a more
algebraic way, encoding a kind of “smallness”. Voiculescu showed the following.
For each ε > 0 there exist: a finite-dimensional C∗ -subalgebra N0 of N; k( j) ∈ N
(i) (i) (i)
for all 1 ≤ j ≤ n; orthogonal projections p j , q j ∈ N0 and elements x j ∈ M for all
(i) (i) (i) (i)
j = 1, . . . , n and 1 ≤ i ≤ k( j); such that the following holds: x j = p j x j q j for all
j = 1, . . . , n and 1 ≤ i ≤ k( j),
(i) (i)∗
kx j − ∑ (x j + x j )k2 < ε for all j = 1, . . . , n, (7.39)
1≤i≤k( j)

and
(i) (i)
∑ ∑ τ(p j )τ(q j ) < ε.
1≤ j≤n 1≤i≤k( j)

Consider now a microstate (A1 , . . . , An ) ∈ Γ (x1 , . . . , xn ; N, r, ε). Since polynomi-

(i) (i)
als in the generators x1 , . . . , xn approximate the given projections p j , q j ∈ N0 ⊂
M, the same polynomials in the matrices A1 , . . . , An will approximate versions
of these projections in finite matrices. Thus we find a unitary matrix such that
(UA1U ∗ , . . . ,UAnU ∗ ) is of a special form with respect to fixed matrix versions of
7.7 Applications of free entropy to operator algebras 199

the projections. This gives some constraints on the volume of possible microstates.
Again, in order to get rid of the freedom of conjugating by an arbitrary unitary ma-
trix one covers the unitary N × N matrices by a δ -net S and gets so in the end a sim-
ilar bound as in (7.38). Invoking from [169] the result that one can choose a δ -net
2
with |S| < (C/δ )N leads finally to an estimate for χ (x1 , . . . , xn ) as in Lemma 9. The
bound in this estimate goes to −∞ for ε → 0, which proves that χ (x1 , . . . , xn ) = −∞.
Chapter 8
Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher
Information

In classical probability theory there exist two important concepts which measure the
amount of “information” of a given distribution. These are the Fisher information
and the entropy. There exist various relations between these quantities and they form
a cornerstone of classical probability theory and statistics. Voiculescu introduced
free probability analogues of these quantities, called free Fisher information and
free entropy, denoted by Φ and χ , respectively. However, there remain some gaps in
our present understanding of these quantities. In particular, there exist two different
approaches, each of them yielding a notion of entropy and Fisher information. One
hopes that finally one will be able to prove that both approaches give the same result,
but at the moment this is not clear. Thus for the time being we have to distinguish
the entropy χ and the free Fisher information Φ coming from the first approach (via
microstates) and the free entropy χ ∗ and the free Fisher information Φ ∗ coming
from the second, non-microstates approach (via conjugate variables).
Whereas we considered the microstates approach for χ in the previous chapter,
we will in this chapter deal with the second approach, which fits quite nicely with
the combinatorial theory of freeness. In this approach the Fisher information is the
basic quantity (in terms of which the free entropy χ ∗ is defined), so we will restrict
our attention mainly to Φ ∗ .
The concepts of information and entropy are only useful when we consider
states (so that we can use the positivity of ϕ to get estimates for the information
or entropy). Thus in this section we will always work in the framework of a W ∗ -
probability space. Furthermore, it is crucial that we work with a faithful normal
trace. The extension of the present theory to non-tracial situations is unclear.

8.1 Non-commutative derivatives

In Chapter 2 we already encountered non-commutative derivatives, on an informal
level, in connection with the subordination property of free convolution. Here we
will introduce and investigate these non-commutative derivatives more thoroughly.

201
202 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

Definition 1. We denote by ChX1 , . . . , Xn i the algebra of polynomials in n non-

commuting variables X1 , . . . , Xn . On this we define the partial non-commutative
derivatives ∂i (i = 1, . . . , n) as linear mappings

∂i : ChX1 , . . . , Xn i → ChX1 , . . . , Xn i ⊗ ChX1 , . . . , Xn i

by
∂i 1 = 0, ∂i X j = δi j 1 ⊗ 1 ( j = 1, . . . , n),
and by the Leibniz rule

∂i (P1 P2 ) = ∂i (P1 ) · 1 ⊗ P2 + P1 ⊗ 1 · ∂i (P2 ) (P1 , P2 ∈ ChX1 , . . . , Xn i).

This means that ∂i is given on monomials by

m
∂i (Xi(1) · · · Xi(m) ) = ∑ δi,i(k) Xi(1) · · · Xi(k−1) ⊗ Xi(k+1) · · · Xi(m) . (8.1)
k=1

Example 2. Consider the monomial P(X1 , X2 , X3 ) = X2 X13 X3 X1 . Then we have

∂1 P = X2 ⊗ X12 X3 X1 + X2 X1 ⊗ X1 X3 X1 + X2 X12 ⊗ X3 X1 + X2 X13 X3 ⊗ 1

∂2 P = 1 ⊗ X13 X3 X1
∂3 P = X2 X13 ⊗ X1 .

Exercise 1. (i) Prove, for i ∈ {1, . . . , n}, the co-associativity of ∂i

(id ⊗ ∂i ) ◦ ∂i = (∂i ⊗ id) ◦ ∂i . (8.2)

(ii) If one mixes different partial derivatives the situation becomes more compli-
cated. Show that (id ⊗ ∂i ) ◦ ∂ j = (∂ j ⊗ id) ◦ ∂i , but in general for i 6= j (id ⊗ ∂i ) ◦ ∂ j 6=
(∂i ⊗ id) ◦ ∂ j .

Proposition 3. In the case n = 1, we can identify ChXi⊗ChXi with the polynomials

C[X,Y ] in two commuting variables X and Y , via X =X ˆ ⊗ 1 and Y =1
ˆ ⊗ X. With this
identification ∂ := ∂1 is given by the free difference quotient

P(X) − P(Y )
∂ P(X)=
ˆ .
X −Y

Proof: It suffices to consider P(X) = X m ; then we have

∂ P(X) = 1 ⊗ X m−1 + X ⊗ X m−2 + X 2 ⊗ X m−3 + · · · + X m−1 ⊗ 1

and
X m −Y m
= X m−1 + X m−2Y + X m−3Y 2 + · · · +Y m−1 .
X −Y
8.1 Non-commutative derivatives 203

One should note that in the non-commutative world there exists another canonical
derivation into the tensor product, namely the mapping P 7→ P ⊗ 1 − 1 ⊗ P. Actually,
there is an important relation between this derivation and our partial derivatives.
Lemma 4. For all P ∈ ChX1 , . . . , Xn i we have:
n
∑ ∂ j P · X j ⊗ 1 − 1 ⊗ X j · ∂ j P = P ⊗ 1 − 1 ⊗ P. (8.3)
j=1

Exercise 2. Prove Lemma 4 by checking it for monomials P.

This allows an easy proof of the following free version of a Poincaré inequality.
This is an unpublished result of Voiculescu and can be found in [63].
In this inequality we will apply our non-commutative polynomials to operators
x1 , . . . , xn ∈ M. If P = P(X1 , . . . , Xn ) ∈ ChX1 , . . . , Xn i, then P(x1 , . . . , xn ) ∈ M is ob-
tained by replacing each of the variables Xi by the corresponding xi . Note in particu-
lar that this applies also to the right-hand side of the inequality. There ∂i P is an ele-
⊗2 and ∂ P(x , . . . , x ) is to be understood as (∂ P)(x , . . . , x ).
ment in ChX1 , . . . , Xpni i 1 n i 1 n
As usual, kak2 := τ(a∗ a) denotes the non-commutative L2 -norm given by τ, and
with L2 (M) we denote the completion of M with respect to this norm. The L2 -norm
on the right-hand side of the inequality is of course with respect to τ ⊗ τ.
Theorem 5 (Free Poincaré Inequality). Let (M, τ) be a tracial W ∗ -probability
space. Consider self-adjoint x1 , . . . , xn ∈ M. Then we have for all P = P∗ ∈ ChX1 , . . . ,
Xn i the inequality
n
kP(x1 , . . . , xn ) − τ(P(x1 , . . . , xn ))k2 ≤ C · ∑ k∂i P(x1 , . . . , xn )k2 , (8.4)
i=1
√
where C := 2 max j=1,...,n kx j k.

Proof: Let us put p := P(x1 , . . . , xn ) and qi := (∂i P)(x1 , . . . , xn ). It suffices to con-

sider P with τ(p) = 0. Then we get from Lemma 4
n
kp ⊗ 1 − 1 ⊗ pk2 = k ∑ qi · xi ⊗ 1 − 1 ⊗ xi · qi k2
i=1
n
≤ ∑ kqi · xi ⊗ 1k2 +k1 ⊗ xi · qi k2
i=1 | {z }
≤kqi k2 ·kxi ⊗1k
n
≤ 2 max kx j k ∑ kq j k2 .
j=1,...,n i=1

On the other hand we have (recall that τ(p) = 0)

204 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

kp ⊗ 1 − 1 ⊗ pk22 = τ ⊗ τ (p ⊗ 1 − 1 ⊗ p)2

= τ ⊗ τ[p2 ⊗ 1 + 1 ⊗ p2 − 2p ⊗ p]
= 2τ(p2 )
= 2kpk22 .

Corollary 6. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for i =

1, . . . , n. Consider P = P∗ ∈ ChX1 , . . . , Xn i. Assume that (∂i P)(x1 , . . . , xn ) = 0 for all
i = 1, . . . , n. Then p := P(x1 , . . . , xn ) is a constant, p = τ(p) · 1.

8.2 ∂i as unbounded operator on Chx1 , . . . , xn i

Let (M, τ) be a tracial W ∗ -probability space and consider xi = xi∗ ∈ M (i = 1, . . . , n)
and let Chx1 , . . . , xn i be the ∗-subalgebra of M generated by x1 . . . . , xn . We shall con-
tinue to denote by ChX1 , . . . , Xn i the algebra generated by the non-commuting ran-
dom variables X1 , . . . , Xn . We always have a evaluation map eval : ChX1 , . . . , Xn i →
Chx1 , . . . , xn i which sends Xi1 · · · Xik to xi1 · · · xik .
If the evaluation map extends to an algebra isomorphism (i.e. has a trivial kernel),
then we say that the operators x1 , . . . , xn are algebraically free.
In the case that x1 , . . . , xn are algebraically free the operators ∂i can also be de-
fined as derivatives on Chx1 , . . . , xn i ⊂ M, according to the commutative diagram

i∂
ChX1 , . . . , Xn i −→ ChX1 , . . . , Xn i ⊗ ChX1 , . . . , Xn i
↓ ↓
eval eval
↓ ↓
Chx1 , . . . , xn i −→ Chx1 , . . . , xn i ⊗ Chx1 , . . . , xn i

In that case we can consider ∂i as unbounded operator on L2 .

Notation 7 We denote by
k·k p
L p (x1 , . . . , xn ) := Chx1 , . . . , xn i ⊂ L p (M)

the closure of Chx1 , . . . , xn i ⊂ M with respect to the L p norms (1 ≤ p < ∞)

kak pp := τ (|a| p ) = τ (a∗ a) p/2 .

Hence, in the case where x1 , . . . , xn are algebraically free, ∂i is then also an un-
bounded operator on L2 , ∂i : L2 (x1 , . . . , xn ) ⊃ D(∂i ) → L2 (x1 , . . . , xn ) ⊗ L2 (x1 , . . . , xn )
with domain D(∂i ) = Chx1 , . . . , xn i. In order that unbounded operators have a nice
analytic structure they should be closable. In terms of the adjoint, this means that
the adjoint operator
8.2 ∂i as unbounded operator on Chx1 , . . . , xn i 205

∂i∗ : L2 (x1 , . . . , xn ) ⊗ L2 (x1 , . . . , xn ) ⊃ D(∂i∗ ) → L2 (x1 , . . . , xn )

should be densely defined. One simple way to guarantee this is to have 1 ⊗ 1 in

the domain D(∂i∗ ). The following theorem shows that this then implies that all
of Chx1 , . . . , xn i ⊗ Chx1 , . . . , xn i (which is by definition dense in L2 (x1 , . . . , xn ) ⊗
L2 (x1 , . . . , xn )) is in the domain of the adjoint. The proof of this is a direct calcu-
lation, which we leave as an exercise.

Theorem 8. Assume 1 ⊗ 1 ∈ D(∂i∗ ). Then ∂i is closable. We have

Chx1 , . . . , xn i ⊗ Chx1 , . . . , xn i ⊂ D(∂i∗ ) (8.5)

and for elementary tensors p ⊗ q with p, q ∈ Chx1 , . . . , xn i the action of ∂i∗ is given
by

∂i∗ (p ⊗ q) = p · ∂i∗ (1 ⊗ 1) · q − p · (τ ⊗ id)(∂i q) − (id ⊗ τ)(∂i p) · q. (8.6)

In the following we will use the notation ξi := ∂i∗ (1 ⊗ 1) (i = 1, . . . , n). In the

next section we will see that the vectors ξi actually play a quite prominent role in
the definition of the free Fisher information.
Exercise 3. (i) On L2 (x1 , . . . , xn ) we may extend the map x 7→ x∗ to a bounded
conjugate linear operator J, called the modular conjugation operator. For η ∈
L2 (x1 , . . . , xn ) and p ∈ Chx1 , . . . , xn i we have hJ(η), pi = hη, J(p)i = hη, p∗ i. Show
that we have hξi , pi = hξi , p∗ i for all p ∈ Chx1 , . . . , xn i and thus ξi is self-adjoint, i.e.
J(ξi ) = ξi .
(ii) Show that we have for all p ∈ Chx1 , . . . , xn i the identity

(τ ⊗ id)[(∂i p∗ )∗ ] = (id ⊗ τ)(∂i p).

(iii) Recall that the domain of ∂i∗ is

D(∂i∗ ) = {η ∈ L2 ⊗ L2 | ∃η 0 ∈ L2 such that hη 0 , ri = hη, ∂i ri ∀r ∈ Chx1 , . . . , xn i}.

For such an η we set ∂i∗ (η) = η 0 . Prove Theorem 8 by showing that for all r ∈
Chx1 , . . . , xn i we have h∂i∗ (p ⊗ q), ri = hp ⊗ q, ∂i ri when we use the right-hand side
of (8.6) as the definition of ∂i∗ (p ⊗ q).
(iv) Show that

h(id ⊗ τ)(∂i p), (id ⊗ τ)(∂i q)i = h1 ⊗ ξi − ξi ⊗ 1, ∂i p∗ · 1 ⊗ qi.

for all p, q ∈ Chx1 , . . . , xn i.

(v) Show that also the unbounded operator (id ⊗τ)◦∂i , with domain Chx1 , . . . , xn i,
is a closable operator on L2 (x1 , . . . , xn ).
206 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

Although ∂i is an unbounded operator from L2 to L2 it turns out that is has some

surprising boundedness properties in an appropriate sense. This observation is due
to Dabrowski [63]. Our presentation follows essentially his arguments.
Proposition 9. Assume that 1 ⊗ 1 ∈ D(∂i∗ ). Then we have for all p, q ∈ Chx1 , . . . , xn i
the identity
h∂i∗ (p ⊗ 1), ∂i∗ (q ⊗ 1)i = h∂i∗ (1 ⊗ 1), ∂i∗ (p∗ q ⊗ 1)i (8.7)
and thus
k(id ⊗ τ)(∂i p) − pξi k22 = kpξi k22 − hξi ⊗ 1, ∂i (p∗ p)i. (8.8)
Proof: By Eq. (8.6) we have

∂i∗ (p ⊗ 1) = pξi − (id ⊗ τ)(∂i p), ∂i∗ (q ⊗ 1) = qξi − (id ⊗ τ)(∂i q)

and

∂i∗ (p∗ q ⊗ 1) = p∗ qξi − (id ⊗ τ)[∂i (p∗ q)]

= p∗ qξi − (id ⊗ τ)[∂i p∗ · 1 ⊗ q] − p∗ · (id ⊗ τ)[∂i q].

Hence our assertion (8.7) is equivalent to

hpξi − (id ⊗ τ)(∂i p), qξi − (id ⊗ τ)(∂i q)i

= hξi , p∗ qξi − (id ⊗ τ)[∂i p∗ · 1 ⊗ q)] − p∗ · (id ⊗ τ)[∂i q]i.

There are two terms which show up obviously on both sides and thus we are left
with showing

−h(id ⊗ τ)(∂i p), qξi i + h(id ⊗ τ)(∂i p), (id ⊗ τ)(∂i q)i = −hξi , (id ⊗ τ)[∂i p∗ · 1 ⊗ q]i.

If we interpret τ as the operator from L2 to C given by τ(ξ ) = hξ , 1i, then

(id ⊗ τ)∗ (ξ ) = ξ ⊗ 1.

Thus
hξi , (id ⊗ τ)[∂i p∗ · 1 ⊗ q]i = hξi ⊗ 1, ∂i p∗ · 1 ⊗ qi
and

h(id ⊗ τ)(∂i p), qξi i = hξi q∗ , ((id ⊗ τ)[∂i p])∗ i

= hξi q∗ , (τ ⊗ id)[∂i p∗ ]i
= hξi , (τ ⊗ id)[∂i p∗ ] · 1 ⊗ qi,

then (8.7) follows from Exercise 3.

A similar calculation shows that for r ∈ Chx1 , . . . , xn i we have

hpξi − (id ⊗ τ)(∂i p), ri = h∂i∗ (p ⊗ 1), ri.

8.2 ∂i as unbounded operator on Chx1 , . . . , xn i 207

Thus pξi − (id ⊗ τ)(∂i p) = ∂i∗ (p ⊗ 1). This then implies Eq. (8.8) as follows:

k(id ⊗ τ)(∂i p) − pξi k22 = h∂i∗ (p ⊗ 1), ∂i∗ (p ⊗ 1)i

= hξi , ∂i∗ (p∗ p ⊗ 1)i
= hξi , (p∗ p)ξi − (id ⊗ τ)[∂i (p∗ p)]i
= hpξi , pξi i − hξi , (id ⊗ τ)[∂i (p∗ p)]i
= hpξi , pξi i − hξi ⊗ 1, ∂i (p∗ p)i.

Theorem 10. Assume that 1 ⊗ 1 ∈ D(∂i∗ ). Then we have for all p ∈ Chx1 , . . . , xn i the
inequality
k(id ⊗ τ)(∂i p) − pξi k2 ≤ kξi k2 · kpk. (8.9)
Hence, with M = vN(x1 , . . . , xn ), the mapping (id ⊗ τ) ◦ ∂i extends to a bounded
mapping M → L2 (M) and we have

k(id ⊗ τ) ◦ ∂i kM→L2 (M) ≤ 2kξi k2 . (8.10)

Proof: Assume that inequality (8.9) has been proved. Then we have

k(id ⊗ τ)∂i pk2 ≤ kξi k2 · kpk + kpξi k2 ≤ 2kξi k2 · kpk

for all p ∈ Chx1 , . . . , xn i. This says that (id ⊗ τ) ◦ ∂i as a linear mapping from
Chx1 , . . . , xn i ⊂ M to L2 (M) has norm less or equal to 2kξi k2 . It is also easy to
check (see Exercise 3) that (id ⊗ τ) ◦ ∂i is closable as an unbounded operator from
L2 to L2 and hence, by the following Proposition 11, it can be extended to a bounded
mapping on M, with the same bound: 2kξi k2 .
So it remains to prove (8.9). By (8.8), we have

k(id ⊗ τ)∂i p − pξi k22 = h∂i∗ (p ⊗ 1), ∂i∗ (p ⊗ 1)i

= hξi , (p∗ p)ξi − (id ⊗ τ)(∂i (p∗ p))i
≤ kξi k2 · k(id ⊗ τ)(∂i (p∗ p)) − (p∗ p)ξi k2 .

So, by iteration we get

1/2 1/2
k(id ⊗ τ)(∂i p) − pξi k2 ≤ kξi k2 · k(id ⊗ τ)(∂i (p∗ p)) − (p∗ p)ξi k2
1/2 1/4 1/4
≤ kξi k2 · kξi k2 · k(id ⊗ τ)(∂i (p∗ p)2 ) − (p∗ p)2 ξi k2
1/2+1/4+···+1/2n n−1 n−1 1/2n
≤ kξi k2 · k(id ⊗ τ)(∂i (p∗ p)2 ) − (p∗ p)2 ξi k2 .

Now note that the first factor converges, for n → ∞, to kξi k2 , whereas for the second
factor we can bound as follows:
n−1 n−1 1/2n
k(id ⊗ τ)[∂i ((p∗ p)2 )] − (p∗ p)2 ξi k2
208 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information
n−1 n−1
1/2n
≤ k∂i ((p∗ p)2 )k2 + kp∗ pk2 · kξi k2
∗
1/2n
n−1 k∂i (p p)k2

≤ kpk · 2 + kξi k2 ,
kp∗ pk

where we have used the inequality

n−1 n−1 −1
k∂i (p∗ p)2 k2 ≤ 2n−1 kp∗ pk2 k∂i (p∗ p)k2 .

Sending n → ∞ gives now the assertion.

Proposition 11. Let (M, τ) be a tracial W ∗ -probability space with separable pre-
dual and ∆ : L2 (M, τ) ⊃ D(∆ ) → L2 (M, τ) be a closable linear operator. Assume
that D(∆ ) ⊂ M is a ∗-algebra and that we have k∆ (x)k2 ≤ ckxk for all x ∈ D(∆ ).
Then ∆ extends to a bounded mapping ∆ : M → L2 (M, τ) with k∆ kM→L2 (M) ≤ c.

Proof: Since the extension of ∆ to the norm closure of D(∆ ) is trivial, we can as-
sume without restriction that D(∆ ) is a C∗ -algebra. Consider y ∈ M. By Kaplansky’s
density theorem there exists a sequence (xn )n∈N with xn ∈ D(∆ ), kxn k ≤ kyk for all
n, and such that (xn )n converges to y in the strong operator topology. By assumption
we know that the sequence (∆ (xn ))n is bounded by ckyk in the L2 -norm. By the
Banach-Saks theorem we have then a subsequence (∆ (xnk ))k of which the Cesàro
means converge in the L2 -norm, say to some z ∈ L2 (M):

1 m
zm := ∑ ∆ (xnl ) → z ∈ L2 (M).
m l=1

Now put ym := ∑m l=1 xnl /m. Then we have a sequence (ym )m∈N that converges to y in
the strong operator topology, hence also in the L2 -norm, and such that (∆ (ym ))m =
(zm )m converges to some z ∈ L2 (M). Since ∆ is closable, this z is independent of the
chosen sequences and putting ∆ (y) := z gives the extension to M we seek. Since we
have k∆ (ym )k2 ≤ ckyk for all m, this goes also over to the limit: k∆ (y)k2 = kzk2 ≤
ckyk.

8.3 Conjugate variables and free Fisher information Φ ∗

Before we give the definition of the free Fisher information we want to motivate the
form of this by having a look at classical Fisher information.
In classical probability theory the Fisher information I(X) of a random variable
X is the derivative of the entropy of a Brownian motion starting in X. Assume the
probability distribution µX has a density p, then the density pt at time t of such a
Brownian motion is given by the solution of the diffusion equation

∂ pt (u) ∂ 2 pt (u)
=
∂t ∂ u2
8.3 Conjugate variables and free Fisher information Φ ∗ 209

subject to the initial condition p0 (u) = p(u). Let us calculate the derivative of the
classical entropy S(pt ) at t = 0, where we use the explicit formula for classical
entropy Z
S(pt ) = − pt (u) log pt (u)du.

We will in the following just do formal calculations, but all steps can be justified
rigorously. We will also use the notations

∂ ∂
ṗ := p, p0 := p,
∂t ∂u
where p(t, u) = pt (u). Then we have

dS(pt )
Z Z
∂
=− [pt (u) · log pt (u)] du = − [ ṗt log pt + ṗt ] du.
dt ∂t
The second term vanishes,
d
Z Z
ṗt du = pt (u) du = 0
dt
(because pt is a probability density for all t); by invoking the diffusion equation and
by integration by parts the first term gives

(pt0 (u))2
Z Z Z Z
− ṗt log pt du = − pt00 log pt du = pt0 (log pt )0 du = du.
pt (u)

Taking this at t = 0 gives the explicit formula

(p0 (u))2
Z
I(X) = du if dµX (u) = p(u) du
p(u)

for the Fisher information of X.

To get a non-commutative version of this one first needs a conceptual understand-
ing of this formula. For this let us rewrite it in the form

(p0 (u))2 h p0 2 i
Z
I(X) = du = E − (X) = E(ξ 2 ),
p(u) p

where the random variable ξ (usually called the score function) is defined by

p0
ξ := − (X) (which is in L2 (X) if I(X) < ∞).
p

The advantage of this is that the score ξ has some conceptual meaning. Consider a
nice f (X) ∈ L2 (X) and calculate
210 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information
h p0 i Z 0
p (u)
E(ξ f (X)) = −E (X) f (X) = − f (u)p(u) du
p p(u)
Z Z
0
=− p (u) f (u) du = p(u) f 0 (u) du = E( f 0 (X)).

d
In terms of the derivative operator and its adjoint we can also write this in L2 as
du
D d ∗ E
hξ , f (X)i = E(ξ f (X) ) = E( f 0 (X) ) = h1, f 0 (X)i = 1, f (X) ,
du
implying that
d ∗
ξ=1.
du
The above formulas were for the case n = 1 of one variable, but doing the same
in the multivariate case is no problem in the classical case.
Exercise 4. Repeat this formal proof in the multivariate case to show that for a
random vector (X1 , . . . , Xn ) with density p on Rn and a function f : Rn → R we have
∂
p

∂
∂ ui

E f (X1 , . . . , Xn ) = E (X1 , . . . , Xn ) · f (X1 , . . . , Xn ) .
∂ ui p

This can now be made non-commutative by replacing the commutative derivative

∂ /∂ ui by the non-commutative derivative ∂i . The following definitions are due to
Voiculescu [187].

Definition 12. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n.
1) We say ξ1 , . . . , ξn ∈ L2 (M) satisfy the conjugate relations for x1 , . . . , xn if we
have for all P ∈ ChX1 , . . . , Xn i

τ(ξi P(x1 , . . . , xn )) = τ ⊗ τ (∂i P)(x1 , . . . , xn ) (8.11)

where for η ∈ L2 (M) we set τ(η) = hη, 1i or, more explicitly,

m
τ(ξi xi(1) · · · xi(m) ) = ∑ δii(k) τ(xi(1) · · · xi(k−1) )τ(xi(k+1) · · · xi(m) ) (8.12)
k=1

for all m ≥ 0 and all 1 ≤ i, i(1), . . . , i(m) ≤ n.

(m = 0 means here of course: τ(ξi ) = 0.)
2) ξ1 , . . . , ξn is a conjugate system for x1 , . . . , xn , if they satisfy the conjugate
relations (8.11) and if in addition ξi ∈ L2 (x1 , . . . , xn ) for all i = 1, . . . , n.
3) The free Fisher information of x1 , . . . , xn is defined by
8.3 Conjugate variables and free Fisher information Φ ∗ 211
n

∑ kξi k22 , if ξ1 , . . . , ξn is a conjugate system for x1 , . . . , xn


∗
Φ (x1 , . . . , xn ) = i=1

+∞, if no conjugate system exists.
(8.13)

Note the conjugate relations prescribe the inner products of the ξi with a dense
subset in L2 (x1 , . . . , xn ), thus a conjugate system is unique if it exists.
If there exist ξ1 , . . . , ξn ∈ L2 (M) which satisfy the conjugate relations then there
exists a conjugate system; this is given by pξ1 , . . . , pξn where p is the orthogonal
projection from L2 (M) onto L2 (x1 , . . . , xn ). This holds because the left-hand side of
(8.11) is unchanged by replacing ξi by pξi . Furthermore, we have in such a situation
n n
Φ ∗ (x1 , . . . , xn ) = ∑ kpξi k22 ≤ ∑ kξi k22 ,
i=1 i=1

with equality if and only if ξ1 , . . . , ξn is already a conjugate system.

If x and y are free and x has a conjugate variable ξ , then ξ satisfies the conjugate
relation (1) in Definition 12 for x + y. This means that
n
τ(ξ (x + y)n ) = ∑ τ((x + y)l−1 )τ((x + y)n−l ).
l=1

This can be verified from the definition, but there is an easier way to do this using
free cumulants. See Exercise 7 following Remark 21 below. By projecting ξ onto
L2 (x + y) we get η a conjugate vector whose length has not increased. Thus when
x and y are free we have Φ ∗ (x + y) ≤ min{Φ ∗ (x), Φ ∗ (y)}. However the free Stam
inequality (see Theorem 19) is sharper.
Formally, the definition of ξi could also be written as ξi = ∂i∗ (1 ⊗ 1). However, in
order that this makes sense, we need ∂i as an unbounded operator on L2 (x1 , . . . , xn ),
which is the case if and only if x1 , . . . , xn are algebraically free. The next proposi-
tion by Mai, Speicher, Weber [121] shows that the existence of a conjugate system
excludes algebraic relations between the xi , and hence the conjugate variables are,
if they exist, always of the form ξi = ∂i∗ (1 ⊗ 1). This implies then also, by Theorem
8, that the ∂i are closable.

Theorem 13. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n. Assume that a conjugate system ξ1 , . . . , ξn for x1 , . . . , xn exists. Then
x1 , . . . , xn are algebraically free.

Proof: Consider P ∈ ChX1 , . . . , Xn i with P(x1 , . . . , xn ) = 0. We claim that then also

qi := (∂i P)(x1 , . . . , xn ) = 0 for all i = 1, . . . , n. In order to see this let us consider
R1 PR2 for R1 , R2 ∈ ChX1 , . . . , Xn i. We have (R1 PR2 )(x1 , . . . , xn ) = 0 and, because of

∂i (R1 PR2 ) = ∂i R1 · 1 ⊗ PR2 + R1 ⊗ 1 · ∂i P · 1 ⊗ R2 + R1 P ⊗ 1 · ∂i R2 ,

212 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

we get, by putting r1 := R1 (x1 , . . . , xn ) and r2 := R2 (x1 , . . . , xn ),

∂i (R1 PR2 ) (x1 , . . . , xn ) = r1 ⊗ 1 · qi · 1 ⊗ r2 .

Thus we have

0 = τ[ξi · (R1 PR2 )(x1 , . . . , xn )] = τ ⊗ τ ∂i (R1 PR2 ) (x1 , . . . , xn )
= τ ⊗ τ[r1 ⊗ 1 · qi · 1 ⊗ r2 ] = τ ⊗ τ[qi · r1 ⊗ r2 ].

Hence τ ⊗ τ[qi · r1 ⊗ r2 ] = 0 for all r1 , r2 ∈ Chx1 , . . . , xn i, which implies that qi = 0.

So we can get from a given relation new ones by formal differentiation. We
prefer to have relations in Chx1 , . . . , xn i and not in the tensor product; this can be
achieved by applying id ⊗ τ to the qi . Thus we have seen that a relation of the form
P(x1 , . . . , xn ) = 0 implies also the relation (∂i P)(x1 , . . . , xn ) = 0 and in particular
id ⊗ τ[(∂i P)(x1 , . . . , xn )] = 0.
Assume now that we have an algebraic relation between the xi of the form
P(x1 , . . . , xn ) = 0 for P ∈ Chx1 , . . . , xn i. Let m be the degree of P. This means that P
has a highest order term of the form α Xi(1) · · · Xi(m) (α ∈ C); note that there might
be other terms of highest order. Denote by D the operator

D := (id ⊗ τ) ◦ ∂i(1) ◦ (id ⊗ τ) ◦ ∂i(2) ◦ · · · ◦ (id ⊗ τ) ◦ ∂i(m) .

As an application of (id ⊗ τ) ◦ ∂i reduces the degree of a word X j(1) · · · X j(k) by

at least 1, and exactly 1 only when j(k) = i; we have DXi(1) · · · Xi(m) = 1 and the
application of D on other monomials of length m, as well as on monomials of
smaller length, gives 0. This implies that DP = α. On the other hand we know that
DP(x1 , . . . , xn ) = 0. Hence we get α = 0. By dealing with all highest order terms
of P in this fashion, we get in the end that all highest order terms of P are equal
to zero, hence P = 0. This means there are no non-trivial algebraic relations for the
xi .
Let us now look on the free Fisher information Φ ∗ . As in the case of the free
entropy χ one has again quite explicit formulas in the one-dimensional case, but not
in higher dimensions. Before stating the theorem let us review two basic properties
of the Hilbert transform H. Suppose 1 ≤ p < ∞ and f ∈ L p (R), with respect to
Lebesgue measure. For each ε > 0 let
1 s−t
Z
hε (s) = f (t) dt.
π (s − t)2 + ε 2

Then hε ∈ LP (R), hε converges almost everywhere to a function h ∈ L p (R), and

khε − hk p → 0 as ε → 0+ . We call h the Hilbert transform of f and denote it H( f ).
We can also write H( f ) as a Cauchy principal value integral

1 f (t) 1 f (s − t) − f (s + t)
Z Z
H( f )(s) = dt = dt.
π s−t 2π t
8.3 Conjugate variables and free Fisher information Φ ∗ 213

When p = 2, H is an isometry and for general p there is a constant C p such that

kH( f )k p ≤ C p k f k p . See Stein & Weiss [168, Ch. VI, §6, paragraph 6.13].
The Hilbert transform is also related to the Cauchy transform as follows. Recall
from Notation 3.4 that the Poisson kernel P and the conjugate Poisson kernel Q are
given by
1 t 1 s
Pt (s) = and Qt (s) = .
π s2 + t 2 π s2 + t 2
We have Pt (s) + iQt (s) = i(π(s + it))−1 . Let G(z) = f (t)(z − t)−1 dt, then
R

1 −1
hε (s) = (Qε ∗ f )(s) = Re(G(s + iε)) and (Pε ∗ f )(s) = Im(G(s + iε)).
π π
(8.14)
The first term converges to H( f ) and the second to f as ε → 0+ .
The following result is due to Voiculescu [187].
Theorem 14. Consider x = x∗ ∈ M and assume that µx has a density p which is in
L3 (R). Then a conjugate variable exists and is given by

1 p(u)
Z
ξ = 2πH(p)(x), where H(p)(v) = du
π v−u
is the Hilbert transform. The free Fisher information is then
4
Z
Φ ∗ (x) = π 2 p(u)3 du. (8.15)
3

Proof: We just give a sketch by providing the formal calculations. If we put ξ =

2πH(p)(x) then we have

τ(ξ f (x)) = τ(2πH(p)(x) f (x))

Z
= 2π H(p)(v) f (v) p(v)dv
f (v)
Z Z
=2 p(u)p(v) dudv
v−u
f (v) f (u)
Z Z Z Z
= p(u)p(v) dudv + p(v)p(u) dvdu
v−u u−v
f (u) − f (v)
Z Z
= p(u)p(v) dudv
u−v
= τ ⊗ τ(∂ f (x)).

So we have
4
Z Z
Φ ∗ (x) = τ (2πH(p)(x))2 = 4π 2 (H(p)(u))2 p(u) du = π 2 p(u)3 du.

3
The last equality is a general property of the Hilbert transform which follows from
Equation (8.14), see Exercise 5.
214 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

Exercise 5. (i) By replacing H(p) by hε make the formal argument rigorous.

(ii) Show, by doing a contour integral, that with p ∈ L3 (R) we have for the
Cauchy transform G(z) = (z − t)−1 p(t) dt that G(t + iε)3 dt = 0 for all ε > 0.
R R

Then use Equation (8.14) to prove the last step in the proof of Theorem 14.
After [187] it remained open for a while whether the condition on the density in
the last theorem is also necessary. That this is indeed the case is the content of the
next proposition, which is an unpublished result of Belinschi and Bercovici. Before
we get to this we need to consider briefly freeness for unbounded operators.
The notion of freeness we have given so far assumes that our random variables
have moments of all orders. We now see that the use of conjugate variables requires
us to use unbounded operators and these might only have a first and second moment,
so our current definition of freeness cannot be applied. For classical independence
there is no need for the random variables to have any moments; the usual definition
of independence relies on spectral projections. In the non-commutative picture we
also use spectral projections, except now they may not commute. To describe this
we need to review the idea of an operator affiliated to a von Neumann algebra.
Let M be a von Neumann algebra acting on a Hilbert space H and suppose that
t is a closed operator on H. Let t = u|t| be the polar decomposition of t, see, for
example, Reed and Simon [148, Ch. VIII]. Now |t| is a closed self-adjoint operator
and thus has a spectral resolution E|t| . This means that E|t| is a projection valued
measure on R, i.e. we require that for each Borel set B ⊆ R we have that E|t| (B)
is a projection on H and for each pair η1 , η2 ∈ H the measure µη1 ,η2 , defined by
µη1 ,η2 (B) = hE|t| (B)η1 , η2 i, is a complex measure on R. Returning to our t, if both
u and E|t| (B) belong to M for every Borel set B, we say that t is affiliated with M.
Suppose now that M has a faithful trace τ and H = L2 (M). For t self-adjoint and
affiliated with M we let µt , the distribution of t, be given by µt (B) = τ(Et (B)). If
R
t ≥ 0 and λ dµt (λ ) < ∞ we say that t is integrable. For a general closed operator
affiliated with M we say that t is p-integrable if |t| p is integrable, i.e. λ p dµ|t| (λ ) <
R

∞. In this picture L2 (M) is the space of square integrable operators affiliated with
M.

Definition 15. Suppose M is a von Neumann algebra with a faithful trace τ and
t1 , . . . ,ts are closed operators affiliated with M. For each i, let Ai be the von Neumann
subalgebra of M generated by ui and the spectral projections E|ti | (B) where B ⊂ R is
a Borel set and ti = ui |ti | is the polar decomposition of ti . If the subalgebras A1 , . . . , As
are free with respect to τ then we say that the operators t1 , . . . ,ts are free with respect
to τ.

Remark 16. In [134, Thm. XV] Murray and von Neumann showed that the operators
affiliated with M form a ∗-algebra. So if t1 and t2 are self-adjoint operators affiliated
with M we can form the spectral measure µt1 +t2 . When t1 and t2 are free this is the
free additive convolution of µt1 and µt2 . Indeed this was the definition of µt1 µt2
given by Bercovici and Voiculescu [31]. This shows that by passing to self-adjoint
8.3 Conjugate variables and free Fisher information Φ ∗ 215

operators affiliated to a von Neumann algebra one can obtain the free additive con-
volution of two probability measures on R from the addition of two free random
variables, see Remark 3.48.

Remark 17. If x = x∗ ∈ M and |z| > kxk then both

n−1
∑ z−(n+1) xn and ∑ z−(n+1) ∑ xk ⊗ xn−k−1
n≥0 n≥1 k=0

converge in norm to elements of M and M ⊗M respectively. If x has a conjugate vari-

able ξ then we get by applying the conjugate relation termwise and then summing
the equation
τ(ξ (z − x)−1 ) = τ ⊗ τ((z − x)−1 ⊗ (z − x)−1 ). (8.16)
Conversely if ξ ∈ L2 (x) satisfies this equation for |z| > kxk then ξ is the conjugate
variable for x. If x is a self-adjoint random variable affiliated with M and z ∈ C+
then (z − x)−1 ∈ M and we can ask for a self-adjoint operator ξ ∈ L2 (x) such that
Equation (8.16) holds. If such a ξ exists we say that ξ is the conjugate variable for
x, thus extending the definition to the unbounded case.

The following proposition is an unpublished result by Belinschi and Bercovici.

Proposition 18. Consider x = x∗ ∈ M and assume that Φ ∗ (x) < ∞. Then the distri-
bution µx is absolutely continuous with respect to Lebesgue measure and the density
p is in L3 (R); moreover we have

4
Z
Φ ∗ (x) = π 2 p3 (u) du.
3

Proof: Again, we will only provide formal arguments. The main deficiency of the
following is that we have to invoke unbounded operators, and the statements we
are going to use are only established for bounded operators in our presentation.
However, this can be made rigorous by working with operators affiliated with M
and by extending the previous theorem to the unbounded setting.
Let t be a Cauchy distributed random variable which is free from x. (Note that t
is an unbounded operator!) Consider for ε > 0 the random variable xε := x + εt. It
can be shown that adding a free variable cannot increase the free Fisher information,
since one gets the conjugate variable of xε by conditioning the conjugate variable
of x onto the L2 -space generated by xε . See Exercise 7 below for the argument in
the bounded case. For this to make sense in the unbounded case we use resolvents
as above (Remark 17) to say what a conjugate variable is. Hence Φ ∗ (xε ) ≤ Φ ∗ (x)
for all ε > 0. But, for any ε > 0, the distribution of xε is the free convolution of µx
with a scaled Cauchy distribution. By Remark 3.34 we have Gxε (z) = Gx (z + iε),
and hence, by the Stieltjes inversion formula, the distribution of xε has a density pε
which is given by
216 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

1 1
Z
ε
pε (u) = − ImGx (u + iε) = 2 2
dµx (v).
π π R (u − v) + ε

Since this density is always in L3 (R), we know by (the unbounded version of) the
previous theorem that Z
Φ ∗ (xε ) = pε (u)3 du.

So we get
1
Z
sup 3 |ImGx (u + iε)|3 du = sup Φ ∗ (xε ) ≤ Φ ∗ (x).
ε>0 π ε>0

This implies (see, e.g., [109]) that Gx belongs to the Hardy space H 3 (C+ ), and thus
µx is absolutely continuous and its density is in L3 (R).
Some important properties of the free Fisher information are collected in the
following theorem. For the proof we refer to Voiculescu’s original paper [187].

Theorem 19. The free Fisher information Φ ∗ has the following properties (where
all appearing variables are self-adjoint and live in a tracial W ∗ -probability space).
1) Φ ∗ is superadditive:

Φ ∗ (x1 , . . . , xn , y1 , . . . , ym ) ≥ Φ ∗ (x1 , . . . , xn ) + Φ ∗ (y1 , . . . , ym ). (8.17)

2) We have the free Cramér Rao inequality:

n2
Φ ∗ (x1 , . . . , xn ) ≥ . (8.18)
τ(x12 ) + · · · + τ(xn2 )

3) We have the free Stam inequality. If {x1 , . . . , xn } and {y1 , . . . , yn } are free then
we have
1 1 1
≥ + ∗ . (8.19)
Φ ∗ (x1 + y1 , . . . , xn + yn ) Φ ∗ (x 1 , . . . , xn ) Φ (y1 , . . . , yn )

(This is true even if some of Φ ∗ are +∞.)

(k)
4) Φ ∗ is lower semicontinuous. If, for each i = 1, . . . , n, xi converges to xi in the
weak operator topology as k → ∞, then we have
(k) (k)
lim inf Φ ∗ (x1 , . . . , xn ) ≥ Φ ∗ (x1 , . . . , xn ). (8.20)
k→∞

Of course, we expect that additivity of the free Fisher information corresponds

to the freeness of the variables. We will investigate this more closely in the next
section.

8.4 Additivity of Φ ∗ and freeness

Since cumulants are better suited than moments to deal with freeness we will first
rewrite the conjugate relations into cumulant form.
8.4 Additivity of Φ ∗ and freeness 217

Theorem 20. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n. Consider ξ1 , . . . , ξn ∈ L2 (M). The following statements are equivalent:
(i) ξ1 , . . . , ξn satisfy the conjugate relations (8.12).
(ii) We have for all m ≥ 1 and 1 ≤ i, i(1), . . . , i(m) ≤ n that

κ1 (ξi ) = 0
κ2 (ξi , xi(1) ) = δii(1)
κm+1 (ξi , xi(1) , . . . , xi(m) ) = 0 (m ≥ 2).

Remark 21. Note that up to now we considered only cumulants where all arguments
are elements of the algebra M; here we have the situation where one argument is
from L2 , all the other arguments are from L∞ = M. This is well defined by approx-
imation using the normality of the trace, and poses no problems, since multiplying
an element from L2 with an operator from L∞ gives again an element from L2 ; or
one can work directly with the inner product on L2 . Cumulants with more than two
arguments from L2 would be problematic. Moreover one can apply our result, Equa-
tion (2.19), when the entries of our cumulant are products, again provided that there
are at most two elements from L2 .

Exercise 6. Prove Theorem 20.

Exercise 7. Prove the claim following Theorem 12 that if x1 and x2 are free and x1
has a conjugate variable ξ , then ξ satisfies the conjugate relations for x1 + x2 .
We can now prove the easy direction of the relation between free Fisher informa-
tion and freeness. This result is due to Voiculescu [187]; our proof using cumulants
is from [137].

Theorem 22. Let (M, τ) be a tracial W ∗ -probability space and consider xi = xi∗ ∈ M
(i = 1, . . . , n) and y j = y∗j ∈ M ( j = 1, . . . , m). If {x1 , . . . , xn } and {y1 , . . . , ym } are free
then we have

Φ ∗ (x1 , . . . , xn , y1 , . . . , ym ) = Φ ∗ (x1 , . . . , xn ) + Φ ∗ (y1 , . . . , ym ).

(This is true even if some of Φ ∗ are +∞.)

Proof: If Φ ∗ (x1 , . . . , xn ) = ∞ or if Φ ∗ (y1 , . . . , ym ) = ∞, then the statement is clear,

by the superadditivity of Φ ∗ from Theorem 19.
So assume Φ ∗ (x1 , . . . , xn ) < ∞ and Φ ∗ (y1 , . . . , ym ) < ∞. This means that we have
a conjugate system ξ1 , . . . , ξn ∈ L2 (x1 , . . . , xn ) for x1 , . . . , xn and a conjugate sys-
tem η1 , . . . , ηm ∈ L2 (y1 , . . . , ym ) for y1 , . . . , ym . We claim now that ξ1 , . . . , ξn , η1 ,
. . . , ηm is a conjugate system for x1 , . . . , xn , y1 , . . . , ym . It is clear that we have
ξ1 , . . . , ξn , η1 , . . . , ηm ∈ L2 (x1 , . . . , xn , y1 , . . . , ym ), it only remains to check the con-
jugate relations. We do this in terms of cumulants, verifying the relations (ii) using
218 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

Theorem 20. The relations involving only x’s and ξ ’s or only y’s and η’s are sat-
isfied because of the conjugate relations for either x/ξ or y/η. Because of ξi ∈
L2 (x1 , . . . , xn ) and η j ∈ L2 (y1 , . . . , ym ) and the fact that {x1 , . . . , xn } and {y1 , . . . , ym }
are free, we have furthermore the vanishing (see Remark 21) of all cumulants with
mixed arguments from {x1 , . . . , xn , ξ1 , . . . , ξn } and {y1 , . . . , ym , η1 , . . . , ηm }. But this
gives then all the conjugate relations.
The less straightforward implication, namely that additivity of the free Fisher
information implies freeness, relies on the following relation for commutators be-
tween variables and their conjugate variables. This, as well as the consequence for
free Fisher information, was proved by Voiculescu in [189], whereas our proofs use
again adaptations of ideas from [137].

Theorem 23. Let (M, τ) be a tracial W *-probability space and xi = xi∗ ∈ M for
i = 1, . . . , n. Let ξ1 , . . . , ξn ∈ L2 (x1 , . . . , xn ) be a conjugate system for x1 , . . . , xn . Then
we have
n
∑ [xi , ξi ] = 0
i=1

(where [a, b] = ab − ba denotes the commutator of a and b).

Proof: Let us put

n
c := ∑ [xi , ξi ] ∈ L2 (x1 , . . . , xn ).
i=1

Then it suffices to show

τ(cxi(1) · · · xi(m) ) = 0 for all m ≥ 0 and all 1 ≤ i(1), . . . , i(m) ≤ n.

In terms of cumulants this is equivalent to

κm+1 (c, xi(1) , . . . , xi(m) ) = 0 for all m ≥ 0 and all 1 ≤ i(1), . . . , i(m) ≤ n.

By using the formula for cumulants with products as entries, Theorem 2.13, we get

κm+1 (c, xi(1) , . . . , xi(m) )

m
= ∑ κm+1 (xi ξi , xi(1) , . . . , xi(m) ) − κm+1 (ξi xi , xi(1) , . . . , xi(m) )
i=1
m
= ∑ κ2 (ξi , xi(1) )κm (xi , xi(2) , . . . , xi(m) ) − κ2 (ξi , xi(m) )κm (xi , xi(1) , . . . , xi(m−1) )
i=1
= κm (xi(1) , xi(2) , . . . , xi(m) ) − κm (xi(m) , xi(1) , . . . , xi(m−1) )
= 0,

because in the case of the first sum, the only partition, π, that satisfies the two con-
ditions that ξi is in a block of size two and π ∨ {(1, 2), (3), · · · , (m + 2)} = 1m+2 is
8.4 Additivity of Φ ∗ and freeness 219

π = {(1, 4, 5, . . . , m + 2), (2, 3)}; and in the case of the second sum the only par-
tition, σ , that satisfies the two conditions that ξi is in a block of size two and
σ ∨ {(1, 2), (3), · · · , (m + 2)} = 1m+2 is σ = {(1, m + 2), (2, 3, 4, . . . , m + 1)}. The
last equality follows from the fact that τ is a trace, see Exercise 2.8.

Theorem 24. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n and y j = y∗j ∈ M for j = 1, . . . , m. Assume that

Φ ∗ (x1 , . . . , xn , y1 , . . . , ym ) = Φ ∗ (x1 , . . . , xn ) + Φ ∗ (y1 , . . . , ym ) < ∞.

Then, {x1 , . . . , xn } and {y1 , . . . , ym } are free.

Proof: Let ξ1 , . . . , ξn , η1 , . . . , ηm ∈ L2 (x1 , . . . , xn , y1 , . . . , ym ) be the conjugate sys-

tem for x1 , . . . , xn , y1 , . . . , ym . Since this means in particular that ξ1 , . . . , ξn satisfy the
conjugate relations for x1 , . . . , xn we know that Pξ1 , . . . , Pξn is the conjugate system
for x1 , . . . , xn , where P is the orthogonal projection onto L2 (x1 , . . . , xn ). In the same
way, Qη1 , . . . , Qηm is the conjugate system for y1 , . . . , ym , where Q is the orthogonal
projection onto L2 (y1 , . . . , ym ). But then we have
n m
∑ kξi k22 + ∑ kη j k22 = Φ ∗ (x1 , . . . , xn , y1 , . . . , ym )
i=1 j=1

= Φ ∗ (x1 , . . . , xn ) + Φ ∗ (y1 , . . . , ym )
n m
= ∑ kPξi k22 + ∑ kQη j k22 .
i=1 j=1

However, this means that the projection P has no effect on the ξi and the projection
Q has no effect on η j ; hence the additivity of the Fisher information is saying that
ξ1 , . . . , ξn is already the conjugate system for x1 , . . . , xn and η1 , . . . , ηm is already the
conjugate system for y1 , . . . , ym . By Theorem 23, this implies that
n m
∑ [xi , ξi ] = 0 and ∑ [y j , η j ] = 0.
i=1 j=1

In order to prove the asserted freeness we have to check that all mixed cumulants
in {x1 , . . . , xn } and {y1 , . . . , ym } vanish. In this situation a mixed cumulant means
there is at least one xi and at least one y j . Moreover, because we are working with a
tracial state, it suffices to show κr+2 (xi , z1 , . . . , zr , y j ) = 0 for all r ≥ 0; i = 1, . . . , n;
j = 1, . . . , m; and z1 , . . . , zr ∈ {x1 , . . . , xn , y1 , . . . , ym }. Consider such a situation. Then
we have
n
0 = κr+3 ( ∑ [xk , ξk ], xi , z1 , . . . , zr , y j )
k=1
220 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information
n n
= ∑ κ| r+3 (xk ξk , xi{z, z1 , . . . , zr , y j}) − ∑ κ| r+3 (ξk xk , xi{z, z1 , . . . , zr , y j})
k=1 k=1
κ2 (ξk ,xi )·κr+2 (xk ,z1 ,...,zr ,y j ) κ2 (ξk ,y j )·κr+2 (xk ,xi ,z1 ,...,zr )

= κr+2 (xi , z1 , . . . , zr , y j ),

because, by the conjugate relations, κ2 (ξk , xi ) = δki and κ2 (ξk , y j ) = 0 for all k =
1, . . . , n and all j = 1, . . . , m.

8.5 The non-microstates free entropy χ ∗

By analogy with the classical situation we would expect that the free Fisher informa-
tion of x1 , . . . , xn is the derivative of the free entropy for a Brownian motion starting
in x1 , . . . , xn . Reversing this, the free entropy should be the integral over free Fisher
information along Brownian motions. Since we cannot prove this at the moment for
the microstates free entropy χ (which we defined in the last chapter), we use this
idea to define another version of free entropy, which we denote by χ ∗ . Of course,
we hope that at some point in the not too distant future we will be able to show that
χ = χ ∗.

Definition 25. Let (M, τ) be a tracial W ∗ -probability space. For random variables
xi = xi∗ ∈ M (i = 1, . . . , n), the non-microstates free entropy is defined by
Z ∞ √ √

χ ∗ (x1 , . . . , xn ) := 1 n n
− Φ ∗ (x1 + ts1 , . . . , xn + tsn ) dt + log(2πe),
2 0 1+t 2
(8.21)
where s1 , . . . , sn are free semi-circular random variables which are free from {x1 , . . . ,
xn }.

One can now rewrite the properties of Φ ∗ into properties of χ ∗ . In the next the-
orem we collect the most important ones. The proofs are mostly straightforward
(given the properties of Φ ∗ ) and we refer again to Voiculescu’s original papers
[187, 189].

Theorem 26. The non-microstates free entropy has the following properties (where
all variables which appear are self-adjoint and are in a tracial W ∗ -probability
space).
1) For n = 1, we have χ ∗ (x) = χ (x).
2) We have the upper bound

χ ∗ (x1 , . . . , xn ) ≤ n log(2πn−1C2 ), (8.22)

2
where C2 = τ(x12 + · · · + xn2 ).
3) χ ∗ is subadditive:

χ ∗ (x1 , . . . , xn , y1 , . . . , ym ) ≤ χ ∗ (x1 , . . . , xn ) + χ ∗ (y1 , . . . , ym ). (8.23)

8.6 Operator algebraic applications of free Fisher information 221

4) If {x1 , . . . , xn } and {y1 , . . . , ym } are free, then

χ ∗ (x1 , . . . , xn , y1 , . . . , ym ) = χ ∗ (x1 , . . . , xn ) + χ ∗ (y1 , . . . , ym ). (8.24)

5) On the other hand, if

χ ∗ (x1 , . . . , xn , y1 , . . . , ym ) = χ ∗ (x1 , . . . , xn ) + χ ∗ (y1 , . . . , ym ) > −∞

then {x1 , . . . , xn } and {y1 , . . . , ym } are free.

(k)
6) χ ∗ is upper semicontinuous. If, for each i = 1, . . . , n, xi converges for k → ∞
in the weak operator topology to xi , then we have
(k) (k)
lim sup χ ∗ (x1 , . . . , xn ) ≤ χ ∗ (x1 , . . . , xn ). (8.25)
k→∞

7) We have the following log-Sobolev inequality. If Φ ∗ (x1 , . . . , xn ) < ∞ then

χ ∗ (x1 , . . . , xn ) ≥ n log 2πne
. (8.26)
2 Φ ∗ (x1 , . . . , xn )

In particular:

Φ ∗ (x1 , . . . , xn ) < ∞ =⇒ χ ∗ (x1 , . . . , xn ) > −∞. (8.27)

Though we do not know at the moment whether χ = χ ∗ in general, we have at

least one half of this by the following deep result of Biane, Capitaine, and Guionnet
[38].

Theorem 27. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n. Then we have

χ (x1 , . . . , xn ) ≤ χ ∗ (x1 , . . . , xn ). (8.28)

8.6 Operator algebraic applications of free Fisher information

Assume that Φ ∗ (x1 , . . . , xn ) < ∞. Then, by (8.27), we have that χ ∗ (x1 , . . . , xn ) > −∞.
If we believe that χ ∗ = χ , then by our results from the last chapter this would imply
certain properties of the von Neumann algebra generated by x1 , . . . , xn . In particular,
vN(x1 , . . . , xn ) would not have property Γ . (Note that the inequality χ ≤ χ ∗ from
Theorem 27 goes in the wrong direction to obtain this conclusion.)
We will now show directly the absence of property Γ from the assumption
Φ ∗ (x1 , . . . , xn ) < ∞. This result is due to Dabrowski and we will follow quite closely
his arguments from [63].
In the following we will always work in a tracial W ∗ -probability space (M, τ)
and consider xi = xi∗ ∈ M for i = 1, . . . , n. We assume that Φ ∗ (x1 , . . . , xn ) < ∞ and
denote by ξ1 , . . . , ξn the conjugate system for x1 , . . . , xn . Recall also from Theo-
rem 13 that finite Fisher information excludes algebraic relations among x1 , . . . , xn ,
222 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

hence ∂i is defined as an unbounded operator on Chx1 , . . . , xn i. In particular, if

P ∈ ChX1 , . . . , Xn i and p = P(x1 , . . . , xn ) then ∂i p is the same as (∂i P)(x1 , . . . , xn ).
The crucial technical calculations are contained in the following lemma.
Lemma 28. Assume that Φ ∗ (x1 , . . . , xn ) < ∞. Then we have for all p ∈ Chx1 , . . . , xn i
n n
(n − 1)k[p, 1 ⊗ 1]k22 = ∑ h[p, xi ], [p, ξi ]i + 2Re ∑ h∂i p, [1 ⊗ 1, [p, xi ]]i (8.29)
i=1 i=1

(note that [p, 1 ⊗ 1] should here be understood as module operations, i.e., we have
[p, 1 ⊗ 1] = p ⊗ 1 − 1 ⊗ p).

Proof: We write, for arbitrary j ∈ {1, . . . , n},

k[p, 1 ⊗ 1]k22 = h[p, 1 ⊗ 1], [p, 1 ⊗ 1]i

= h∂ j [p, x j ], [p, 1 ⊗ 1]i − h[∂ j p, x j ], [p, 1 ⊗ 1]i.

We rewrite the first term, by using (8.6), as

h∂ j [p, x j ], [p, 1 ⊗ 1]i = h[p, x j ], ∂ j∗ [p, 1 ⊗ 1]i

= h[p, x j ], ∂ j∗ (p ⊗ 1 − 1 ⊗ p)i
= h[p, x j ], pξ j − id ⊗ τ(∂ j p) − ξ j p + τ ⊗ id(∂ j p)i

= [p, x j ], [p, ξ j ] + 1 ⊗ 1, [p, x j ] , ∂ j p ,

and the second term as

h[∂ j p, x j ], [p, 1 ⊗ 1]i = ∂ j p, p, [1 ⊗ 1, x j ] − ∂ j p, 1 ⊗ 1, [p, x j ] .

The first term of the latter is

∂ j p, p, [1 ⊗ 1, x j ] = ∂ j p, (1 ⊗ x j )[p, 1 ⊗ 1] − [p, 1 ⊗ 1](x j ⊗ 1)
= h1 ⊗ x j · ∂ j p − ∂ j p · x j ⊗ 1, [p, 1 ⊗ 1]i.

Note that summing the last expression over j yields, by Lemma 4,

n
∑ h1 ⊗ x j · ∂ j p − ∂ j p · x j ⊗ 1, [p, 1 ⊗ 1]i = h−(p ⊗ 1 − 1 ⊗ p), [p, 1 ⊗ 1]i
j=1

= −h[p, 1 ⊗ 1], [p, 1 ⊗ 1]i.

Summing all our equations over j gives Equation (8.29).

Corollary 29. Assume that Φ ∗ (x1 , . . . , xn ) < ∞. Then we have for all t ∈ vN(x1 , . . . , xn )

1 n n o
(n − 1)kt − τ(t)k22 ≤ ∑ h[t, xi ], [t, ξi ]i + 4k[t, xi ]k2 · kξi k2 · ktk .
2 i=1
8.7 Absence of atoms for self-adjoint polynomials 223

Proof: It suffices to prove the statement for t = p ∈ Chx1 , . . . , xn i. First note that

k[p, 1 ⊗ 1]k22 = hp ⊗ 1 − 1 ⊗ p, p ⊗ 1 − 1 ⊗ pi = 2(τ(p∗ p) − |τ(p)|2 ) = 2kp − τ(p)k22 .

Thus, (8.29) gives

1 n n
(n − 1)kp − τ(p)k22 = ∑ ih[p, x ], [p, ξ i ]i + Re ∑ ih∂ p, [1 ⊗ 1, [p, xi ]]i .
2 i=1 i=1

We write the second summand as

h∂i p, [1 ⊗ 1, [p, xi ]]i = h∂i p, [p, xi ] ⊗ 1 − 1 ⊗ [p, xi ]i

= hid ⊗ τ(∂i p), [p, xi ]i − hτ ⊗ id(∂i p), [p, xi ]i,

hence we can estimate its real part by

Reh∂i p, [1 ⊗ 1, [p, xi ]]i ≤ 2k(id ⊗ τ)∂i pk2 · k[p, xi ]k2 + 2k(τ ⊗ id)∂i pk2 · k[p, xi ]k2

which, by Equation (8.10), gives the assertion.

Recall from Definition 7.5 that a von Neumann algebra has property Γ if it has a
non-trivial central sequence.

Theorem 30. Let n ≥ 2 and Φ ∗ (x1 , . . . , xn ) < ∞. Then vN(x1 , . . . , xn ) does not have
property Γ (and hence is a factor).

Proof: Let (tk )k∈N be a central sequence in vN(x1 , . . . , xn ). (Recall that central se-
quences are, by definition, bounded in operator norm.) This means in particular that
[tk , xi ] converges, for k → ∞, in L2 (M) to 0, for all i = 1, . . . , n. But then, by Corol-
lary 29, we also have ktk − τ(tk )k2 → 0, which means that our central sequence is
trivial. Thus there exists no non-trivial central sequence.

8.7 Absence of atoms for self-adjoint polynomials

In Theorem 13 we have seen that finite Fisher information (i.e., the existence of a
conjugate system) implies that the variables are algebraically free. This means that
for non-trivial P ∈ ChX1 , . . . , Xn i the operator p := P(x1 , . . . , xn ) cannot be zero. The
ideas from the proof of this statement can actually be refined in order to prove a
much deeper statement, namely the absence of atoms for the distribution µ p for any
such self-adjoint polynomial. Note that atoms at position t in the distribution of µ p
correspond to the existence of a non-trivial eigenspace of p for the eigenvalue t.
By replacing our polynomial by p − t1 we shift the atom to 0, and thus asking the
question whether non-trivial polynomials can have non-trivial kernels. This can be
rephrased in a more algebraic language in the form pw = 0 where w is the orthogonal
projection onto this kernel. Whereas p is a polynomial, the projection w will in
general just be an element in the von Neumann algebra. Hence the question of atoms
224 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

is, at least for self-adjoint polynomials, the same as the question of zero divisors in
the following sense.

Definition 31. A zero divisor w for 0 6= p ∈ Chx1 , . . . , xn i is a non-trivial element

0 6= w ∈ vN(x1 , . . . , xn ) such that pw = 0.

Theorem 32. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for i =
1, . . . , n. Assume that Φ ∗ (x1 , . . . , xn ) < ∞. Then for any non-trivial p ∈ Chx1 , . . . , xn i
there exists no zero divisor.

Proof: The rough idea of the proof follows the same line as the proof of Theorem
13; namely assume that we have a zero divisor for some polynomial, then one shows
that by differentiating this statement one also has a zero divisor for a polynomial of
lesser degree. Thus one can reduce the general case to the (non-trivial) degree 0
case, where obviously no zero divisors exist.
More precisely, assume that we have pw = 0 for non-trivial p ∈ Chx1 , . . . , xn i
and w ∈ vN(x1 , . . . , xn ). Furthermore, we can assume that both p and w are self-
adjoint (otherwise, consider p∗ pww∗ = 0). Then pw = 0 implies also wp = 0. We
will now consider the equation wpw = 0 and take the derivative ∂i of this. Of course,
we have now the problem that w is not necessarily in the domain D(∂i ) of our
derivative. However by approximating w by polynomials and controlling norms via
Dabrowski’s inequality from Theorem 10 one can show that the following formal
arguments can be justified rigorously.
From wpw = 0 we get

0 = ∂i (wpw) = ∂i w · 1 ⊗ pw + w ⊗ 1 · ∂i p · 1 ⊗ w + wp ⊗ 1 · ∂i w.

Because of pw = 0 and wp = 0 the first and the third term vanish and we are left
with w ⊗ 1 · ∂i p · 1 ⊗ w = 0. Again we apply τ ⊗ id to this, in order to get an equation
in the algebra instead of the tensor product; we get

[(τ ⊗ id)(w ⊗ 1 · ∂i p)] w = 0.

| {z }
=:q

Hence we have qw = 0 and q is a polynomial of smaller degree. However, this q is

in general not self-adjoint and thus the other equation wq = 0 is now not a conse-
quence. But since we are in a tracial setting, basic theory of equivalence of projec-
tions for von Neumann algebras shows that we have a non-trivial v ∈ vN(x1 , . . . , xn )
such that vq = 0. Indeed, the projections onto ker(q) and ker(q∗ ) are equivalent.
Since qw = 0 we have ker(q) 6= {0} and thus ker(q∗ ) 6= {0}. This means that ran(q)
is not dense and hence there is v 6= 0 with vq = 0. Then we can continue with vqw = 0
in the same way as above and get a further reduction of our polynomial. Of course,
we have to avoid that taking the derivative gives a trivial polynomial, but since the
above works for all ∂i with i = 1, . . . , n, we have enough flexibility to avoid this.
For the details of the proof we refer to the original work [121].
8.8 Additional exercises 225

The condition Φ ∗ (x1 , . . . , xn ) < ∞ is not the weakest possible; in [52] it was
shown that the conclusion of Theorem 32 still holds under the assumption of maxi-
mal free entropy dimension.

8.8 Additional exercises

Exercise 8. (i) Let s1 , . . . , sn be n free semi-circular elements and ∂1 , . . . , ∂n the cor-

responding non-commutative derivatives. Show that one has

∂i∗ (1 ⊗ 1) = si for all i = 1, . . . , n.

(ii) Show that the condition from (i) actually characterizes a family of n free
semi-circulars. Equivalently, let ξ1 , . . . , ξn be the conjugate system for self-adjoint
variables x1 , . . . , xn in some tracial W ∗ -probability space. Assume that ξi = xi for all
i = 1, . . . , n. Show that x1 , . . . , xn are n free semi-circular variables.

Exercise 9. Let s1 , . . . , sn be n free semicircular elements. Fix a natural number m

and let f : {1, . . . , n}m → C be any function that “vanishes on the diagonals”, i.e.,
f (i1 , . . . , im ) = 0 whenever there are k 6= l such that ik = il . Put
n
p := ∑ f (i1 , . . . , im )si1 · · · sim ∈ Chs1 , . . . , sn i.
i1 ,...,im =1

Calculate ∑ni=1 ∂i∗ ∂i p.

Notation 33 In the following (Cn )n∈N0 and (Un )n∈N0 will be the Chebyshev poly-
nomials of the first and second kind, respectively (rescaled to the interval [−2, 2]),
i.e., the sequence of polynomials Cn ,Un ∈ ChXi which are defined recursively by

C0 (X) = 2, C1 (X) = X, Cn+1 (X) = XCn (X) −Cn−1 (X) (n ≥ 1)

and

U0 (X) = 1, U1 (X) = X, Un+1 (X) = XUn (X) −Un−1 (X) (n ≥ 1).

These polynomials already appeared in Chapter 5. See, in particular, Exercise 5.12.

Exercise 10. Let ∂ : ChXi → ChXi ⊗ ChXi be the non-commutative derivative with
respect to X. Show that
n
∂Un (X) = ∑ Uk−1 (X) ⊗Un−k (X) for all n ∈ N.
k=1
226 8 Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher Information

Exercise 11. Let s be a semi-circular variable of variance 1. Let ∂ be the non-

commutative derivative with respect to s, considered as an unbounded operator on
L2 .
(i) Show that the (Un )n∈N0 are the orthogonal polynomials for the semi-circle
distribution, i.e., that
τ(Um (s)Un (s)) = δm,n .
(ii) Show that
∂ ∗ (Un (s) ⊗Um (s)) = Un+m+1 (s).
(iii) Show that for any p ∈ Chsi we have

k∂ ∗ (p ⊗ 1)k2 = kpk2 and k(id ⊗ τ)∂ pk2 ≤ kpk2 .

(Note that the latter is in this case a stronger version of Theorem 10.)
(iv) The statement in (iii) shows that (id ⊗ τ) ◦ ∂ is a bounded operator with
respect to k · k2 . Show that this is not true for ∂ , by proving that kUn (s)k2 = 1 and
√
k∂Un (s)k2 = n.

Exercise 12. (i) Show that we have for all n, m ≥ 0


Un+m +Um−n ,
 n≤m
CnUm = Un+m , n = m+1 .

Un+m −Un−m−2 , n ≥ m + 2


(ii) Let (M, τ) be a tracial W ∗ -probability space and x = x∗ ∈ M. Put αn :=

τ(Un−1 (x)). Assume that
∞
ξ := ∑ αnCn (x) ∈ L2 (M, τ).
n=1

Show that ξ is the conjugate variable for x.

Exercise 13. For P = (P1 , . . . , Pn ) with P1 , . . . , Pn ∈ ChX1 , . . . , Xn i we define the non-

commutative Jacobian

J P = (∂ j Pi )ni, j=1 ∈ Mn (ChX1 , . . . , Xn i⊗2 ).

If Q = (Q1 , . . . , Qn ) with Q1 , . . . , Qn ∈ ChX1 , . . . , Xn i, then we define

P ◦ Q = (P1 ◦ Q, . . . , Pn ◦ Q)

and Pi ◦ Q ∈ ChX1 , . . . , Xn i by

Pi ◦ Q(X1 , . . . , Xn ) := Pi (Q1 (X1 , . . . , Xn ), . . . , Qn (X1 , . . . , Xn )).

Express J (P ◦ Q) in terms of J P and J Q.

8.8 Additional exercises 227

Exercise 14. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for i =
1, . . . , n. Assume Φ ∗ (x1 , . . . , xn ) < ∞.
(i) Show that we have for λ > 0
1 ∗
Φ ∗ (λ x1 , . . . , λ xn ) = Φ (x1 , . . . , xn ).
λ2
(ii) Let now A = (ai j )ni, j=1 ∈ Mn (R) be a real invertible n × n matrix and put
n
yi := ∑ ai j x j .
j=1

Determine the relation between a conjugate system for x1 , . . . , xn and a conjugate

system for y1 , . . . , yn . Conclude from this the following.
◦ If A is orthogonal then we have

Φ ∗ (x1 , . . . , xn ) = Φ ∗ (y1 , . . . , yn ).

◦ For general A we have

1
Φ ∗ (y1 , . . . , yn ) ≤ Φ ∗ (x1 , . . . , xn ) ≤ kAk2 Φ ∗ (y1 , . . . , yn ).
kAk2
Chapter 9
Operator-Valued Free Probability Theory and Block Random
Matrices

Gaussian random matrices fit quite well into the framework of free probability the-
ory, asymptotically they are semi-circular elements and they have also nice freeness
properties with other (e.g., non-random) matrices. Gaussian random matrices are
used as input in many basic models in many different mathematical, physical, or
engineering areas. Free probability theory provides then useful tools for the calcu-
lation of the asymptotic eigenvalue distribution for such models. However, in many
situations, Gaussian random matrices are only the first approximation to the con-
sidered phenomena and one would also like to consider more general kinds of such
random matrices. Such generalizations often do not fit into the framework of our
usual free probability theory. However, there exists an extension, operator-valued
free probability theory, which still shares the basic properties of free probability but
is much more powerful because of its wider domain of applicability. In this chapter
we will first motivate the operator-valued version of a semi-circular element, and
then present the general operator-valued theory. Here we will mainly work on a for-
mal level; the analytic description of the theory, as well as its powerful consequences
will be dealt with in the following chapter.

9.1 Gaussian block random matrices

Consider AN = (ai j )Ni,j=1 . Our usual assumptions for a Gaussian random matrix are
that the entries ai j are, apart from the symmetry condition ai j = a∗ji , independent,
and identically distributed with a centred normal distribution. There are many ways
to relax these conditions, for example, one might consider noncentred normal dis-
tributions, relax the identical distribution by allowing a dependency of the variance
on the entry, or even give up the independence by allowing correlations between the
entries. One possibility for such correlations would be block matrices, where our
random matrix is build up as a d × d matrix out of blocks, where each block is an
ordinary Gaussian random matrix, but we allow that the blocks might repeat. For
example, for d = 3, we might consider a block matrix

229
230 9 Operator-Valued Free Probability Theory and Block Random Matrices
0.35 0.35

0.3 0.3

0.25 0.25

0.2 0.2

0.15 0.15

0.1 0.1

0.05 0.05

−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5 −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5

Fig. 9.1 Histogram of the dN eigenvalues of a random matrix XN , for N = 1000, for two different
realizations

 
AN BN CN
1
XN = √ BN AN BN  , (9.1)
3 C B A
N N N

where AN , BN ,CN are independent self-adjoint Gaussian N × N-random matrices.

As usual we are interested in the asymptotic eigenvalue distribution of XN as N → ∞
As in Chapter 5 we can look at numerical simulations for the eigenvalue distri-
bution of such matrices. In Fig. 9.1 there are two realizations of the random matrix
above for N = 1000. This suggests that again we have almost sure convergence to a
deterministic limit distribution. One sees, however, that this limiting distribution is
not a semicircle.
In this example we have of course the following description of the limiting
distribution. Because the joint distribution of {AN , BN ,CN } converges to that of
{s1 , s2 , s3 }, where {s1 , s2 , s3 } are free standard semi-circular elements, the limit
eigenvalue distribution we seek is the same as the distribution µX of
 
s s s
1  1 2 3
X=√ s2 s1 s2 (9.2)
3 s s s
3 2 1

with respect to tr3 ⊗ ϕ (where ϕ is the state acting on s1 , s2 , s3 ). Actually, because

we have the almost sure convergence of AN , BN ,CN (with respect to trN ) to s1 , s2 , s3 ,
this implies that the empirical eigenvalue distribution of XN converges almost surely
to µX . Thus, free probability yields directly the almost sure existence of a limit-
ing eigenvalue distribution of XN . However, the main problem, namely the concrete
determination of this limit µX , cannot be achieved within usual free probability the-
ory. Matrices of semi-circular elements do in general not behave nicely with respect
to trd ⊗ ϕ. However, there exists a generalization, operator-valued free probability
theory, which is tailor-made to deal with such matrices.
9.1 Gaussian block random matrices 231

In order to see what goes wrong on the usual level and what can be saved on
an “operator-valued” level we will now try to calculate the moments of X in our
usual combinatorial way. To construct our first example we shall need the idea of a
circular family of operators, generalizing the idea of a semi-circular family given in
Definition 2.6

Definition 1. Let {c1 , . . . , cn } be operators in (A, ϕ). If {Re(c1 ), Im(c1 ), . . . , Re(cn ),

Im(cn )} is a semi-circular family we say that {c1 , . . . , cn } is a circular family. We
are allowing the possibility that some of Re(ci ) or Im(ci ) is 0. So a semi-circular
family is a circular family.

Exercise 1. Using the notation of Section 6.8, show that for {c1 , . . . , cn } to be a
circular family it is necessary and sufficient that for every i1 , . . . , im ∈ [n] and every
(ε ) (ε ) (ε ) (ε )
ε1 , . . . , εm ∈ {−1, 1} we have ϕ(ci1 1 · · · cimm ) = ∑π∈NC2 (m) κπ (ci1 1 , . . . , cimm ).
Let us consider the more general situation where X is a d × d matrix X =
(si j )di, j=1 , where {si j } is a circular family with a covariance function σ , i.e.,

ϕ(si j skl ) = σ (i, j; k, l). (9.3)

The covariance function σ can here be prescribed quite arbitrarily, only subject to
some symmetry conditions in order to ensure that X is self-adjoint. Thus we allow
arbitrary correlations between different entries, but also that the variance of the si j
depends on (i, j). Note that we do not necessarily ask that all entries are semi-
circular. Off-diagonal elements can also be circular elements, as long as we have
s∗i j = s ji .
By Exercise 1 we have
d
1
trd ⊗ ϕ(X m ) =

∑ ϕ si1 i2 · · · sim i1
d i(1),...,i(m)=1
d
1
= ∑ ∑ ∏ σ i p , i p+1 ; iq , iq+1 .
d π∈NC (m) i(1),...,i(m)=1 (p,q)∈π
2

We can write this in the form

trd ⊗ ϕ(X m ) = ∑ Kπ ,
π∈NC2 (m)

where
d
1
Kπ := ∑ ∏ σ i p , i p+1 ; iq , iq+1 .
d i1 ,...,im =1 (p,q)∈π

So the result looks very similar to our usual description of semi-circular elements,
in terms of a sum over non-crossing pairings. However, the problem here is that the
Kπ are not multiplicative with respect to the block decomposition of π and thus
232 9 Operator-Valued Free Probability Theory and Block Random Matrices

they do not qualify to be considered as cumulants. Even worse, there does not exist
a straightforward recursive way of expressing Kπ in terms of “smaller” Kσ . Thus
we are outside the realm of the usual recursive techniques of free probability theory.
However, one can save most of those techniques by going to an “operator-valued”
level. The main point of such an operator-valued approach is to write Kπ as the trace
of a d × d-matrix κπ , and then realize that κπ has the usual nice recursive structure.
Namely, let us define the matrix κπ = ([κπ ]i j )di, j=1 by

d
[κπ ]i j := ∑ δii1 δ jim+1 ∏ σ i p , i p+1 ; iq , iq+1 .
i1 ...,im ,im+1 =1 (p,q)∈π

Then clearly we have Kπ = trd (κπ ). Furthermore, the value of κπ can be determined
by an iterated application of the covariance mapping

η : Md (C) → Md (C) given by η(B) := id ⊗ ϕ[XBX],

i.e., for B = (bi j ) ∈ Md (C) we have η(B) = ([η(B)]i j ) ∈ Md (C) with

d
[η(B)]i j = ∑ σ (i, k; l, j)bkl .
k,l=1

The main observation is now that the value of κπ is given by an iterated applica-
tion of this mapping η according to the nesting of the blocks of π. If one identifies
a non-crossing pairing with an arrangement of brackets, then the way that η has to
be iterated is quite obvious. Let us clarify these remarks with an example.
Consider the non-crossing pairing

π = {(1, 4), (2, 3), (5, 6)} ∈ NC2 (6).

The corresponding κπ is given by

d
[κπ ]i j = ∑ σ (i, i2 ; i4 , i5 ) · σ (i2 , i3 ; i3 , i4 ) · σ (i5 , i6 ; i6 , j).
i2 ,i3 ,i4 ,i5 ,i6 =1

We can then sum over the index i3 (corresponding to the block (2, 3) of π) without
interfering with the other blocks, giving
d d
[κπ ]i j = ∑ σ (i, i2 ; i4 , i5 ) · σ (i5 , i6 ; i6 , j) · ∑ σ (i2 , i3 ; i3 , i4 )
i2 ,i4 ,i5 ,i6 =1 i3 =1
d
= ∑ σ (i, i2 ; i4 , i5 ) · σ (i5 , i6 ; i6 , j) · [η(1)]i2 i4 .
i2 ,i4 ,i5 ,i6 =1
9.1 Gaussian block random matrices 233

Effectively we have removed the block (2, 3) of π and replaced it by the matrix
η(1).
Now we can do the summation over i(2) and i(4) without interfering with the
other blocks, thus yielding
d d
[κπ ]i j = ∑ σ (i5 , i6 ; i6 , j) · ∑ σ (i, i2 ; i4 , i5 ) · [η(1)]i2 i4
i5 ,i6 =1 i2 ,i4 =1
d
= ∑ σ (i5 , i6 ; i6 , j) · η η(1) ii .
5
i5 ,i6 =1

We have now removed the block (1, 4) of π and the effect of this was that we had to
apply η to whatever was embraced by this block (in our case, η(1)).
Finally, we can do the summation over i5 and i6 corresponding to the last block
(5, 6) of π; this results in
d d
[κπ ]i, j = ∑ η η(1) ii · ∑ σ (i5 , i6 ; i6 , j)
5
i5 =1 i6 =1
d
= ∑ η η(1) ii · [η(1)]i5 j
5
i5 =1

= η η(1) · η(1) i j .

Thus we finally have κπ = η η(1) · η(1), which corresponds to the bracket
expression (X(XX)X)(XX). In the same way every non-crossing pairing results in
an iterated application of the mapping η. For the five non-crossing pairings of six
elements one gets the following results:
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6

η(1) · η(1) · η(1) η(1) · η η(1) η η η(1)

1 2 3 4 5 6 1 2 3 4 5 6

η η(1) · η(1) η η(1) · η(1)
Thus for m = 6 we get for trd ⊗ ϕ(X 6 ) the expression
n
trd η(1) · η(1) · η(1) + η(1) · η η(1) +
234 9 Operator-Valued Free Probability Theory and Block Random Matrices
o
+ η η(1) · η(1) + η η(1) · η(1) + η η η(1) .

Let us summarize our calculations for general moments. We have

n o
trd ⊗ ϕ(X m ) = trd ∑ κπ ,
π∈NC2 (m)

where each κπ is a d × d matrix, determined in a recursive way as above, by an

iterated application of the mapping η. If we remove trd from this equation then we
get formally the equation for a semi-circular distribution. Define

E := id ⊗ ϕ : Md (C) → Md (C),

then we have that the operator-valued moments of X satisfy

E(X m ) = ∑ κπ . (9.4)
π∈NC2 (m)

An element X whose operator-valued moments E(X m ) are calculated in such a

way is called an operator-valued semi-circular element (because only pairings are
needed).
One can now repeat essentially all combinatorial arguments from the scalar sit-
uation in this case. One only has to take care that the nesting of the blocks of π is
respected. Let us try this for the reformulation of the relation (9.4) in terms of for-
mal power series. We are using the usual argument by doing the summation over all
π ∈ NC2 (m) by collecting terms according to the block containing the first element
1. If π is a non-crossing pairing of m elements, and (1, r) is the block of π containing
1, then the remaining blocks of π must fall into two classes, those making up a non-
crossing pairing of the numbers 2, 3, . . . , r − 1 and those making up a non-crossing
pairing of the numbers r + 1, r + 2, . . . , m. Let us call the former pairing π1 and the
latter π2 , so that we can write π = (1, r) ∪ π1 ∪ π2 . Then the description above of κπ
shows that κπ = η(κπ1 ) · κπ2 . This results in the following recurrence relation for
the operator valued moments:
m−2
E[X m ] = E[X k ] · E[X m−k−2 ].

∑η
k=0

series, M(z) = ∑∞ m m
If we go over to the corresponding generating power m=0 E[X ]z ,
2

then this yields the relation M(z) = 1 + z η M(z) · M(z).
Note that m(z) := trd (M(z)) is the generating power series of the moments trd ⊗
ϕ(X m ), in which we are ultimately interested. Thus it is preferable to go over from
M(z) to the corresponding operator-valued Cauchy transform G(z) := z−1 M(1/z).
For this the equation above takes on the form

zG(z) = 1 + η(G(z)) · G(z). (9.5)

9.1 Gaussian block random matrices 235

Furthermore, we have for the Cauchy transform g of the limiting eigenvalue distri-
bution µX of our block matrices XN that

g(z) = z−1 m(1/z) = trd z−1 M(1/z) = trd (G(z)).

Since the number of non-crossing pairings of 2k elements is given by the Catalan

number Ck , for which one has Ck ≤ 4k , we can estimate the (operator) norm of the
matrix E(X 2k ) by

kE(X 2k )k ≤ kηkk · #(NC2 (2k)) ≤ kηkk · 22k .

Applying trd , this yields that the support of the limiting eigenvalue distribution of XN
is contained in the interval [−2kηk1/2 , +2kηk1/2 ]. Since all odd moments are zero,
the measure is symmetric. Furthermore, the estimate above on the operator-valued
moments E(X m ) shows that
∞
E(X 2k )
G(z) = ∑ 2k+1
k=0 z

is a power series expansion in 1/z of G(z), which converges in a neighbourhood of

∞. Since on bounded sets, {B ∈ Md (C) | kBk ≤ K} for some K > 0, the mapping

B 7→ z−1 1 + z−1 η(B) · B

is a contraction for |z| sufficiently large, G(z) is, for large z, uniquely determined as
the solution of the equation (9.5).
If we write G as G(z) = E (z − X)−1 , then this shows that it is not only a formal

power series, but actually an analytic (Md (C)-valued) function on the whole upper
complex half-plane. Analytic continuation shows then the validity of (9.5) for all z
in the upper half-plane.
Let us summarize our findings in the following theorem, which was proved in
[145].

Theorem 2. Fix d ∈ N. Consider, for each N ∈ N, block matrices

 (11)
. . . A(1d)

A
XN =  ... . . . ...  (9.6)
 

A(d1) ... A(dd)

(i j) N
where, for each i, j = 1, . . . , d, the blocks A(i j) = arp r,p=1
are Gaussian N × N
random matrices such that the collection of all entries
(i j)
{arp | i, j = 1, . . . , d; r, p = 1, . . . , N}

of the matrix XN forms a Gaussian family which is determined by

236 9 Operator-Valued Free Probability Theory and Block Random Matrices

(i j) ( ji)
arp = a pr for all i, j = 1, . . . , d; r, p = 1, . . . , N

and the prescription of mean zero and covariance

(i j) (kl) 1
E[arp aqs ] = δrs δ pq · σ (i, j; k, l), (9.7)
n
where n := dN.
Then, for N → ∞, the n × n matrix XN has a limiting eigenvalue distribution
whose Cauchy transform g is determined by g(z) = trd (G(z)), where G is an Md (C)-
valued analytic function on the upper complex half-plane, which is uniquely deter-
mined by the requirement that for z ∈ C+

lim zG(z) = 1, (9.8)

|z|→∞

(where 1 is the identity of Md (C)) and that for all z ∈ C+ , G satisfies the matrix
equation (9.5).

Note also that in [95] it was shown that there exists exactly one solution of the
fixed point equation (9.5) with a certain positivity property.
There exists a vast literature on dealing with such or similar generalizations of
Gaussian random matrices. Most of them deal with the situation where the entries
are still independent, but not identically distributed; usually, such matrices are re-
ferred to as band matrices. The basic insight that such questions can be treated
within the framework of operator-valued free probability theory is due to Shlyakht-
enko [155]. A very extensive treatment of band matrices (not using the language of
free probability, but the quite related Wigner type moment method) was given by
Anderson and Zeitouni [7].

Example 3. Let us now reconsider the limit (9.2) of our motivating band matrix
(9.1). Since there are some symmetries in the block pattern, the corresponding G
will also have some additional structure. To work this out let us examine η more
carefully. If B ∈ M3 (C), B = (bi j )i j then
 
b + b22 + b33 b12 + b21 + b23 b13 + b31 + b22
1  11
η(B) = b21 + b12 + b32 b11 + b22 + b33 + b13 + b31 b12 + b23 + b32  .
3
b13 + b31 + b22 b23 + b32 + b21 b11 + b22 + b33

We shall see later on that it is important to find the smallest unital subalgebra C of
M3 (C) that is invariant under η. We have

1 0 31
   
001
1
η(1) =  0 1 0  = 1 + H, where H = 0 0 0 ,
1 3
3 0 1 100
9.1 Gaussian block random matrices 237
   
002 000
1 2 2
η(H) = 0 2 0 = H + E, where E = 0 1 0 ,
3 3 3
200 000
and  
101
1 1 1
η(E) = 0 1 0 = 1 + H.
3 3 3
101
Now HE = EH = 0 and H 2 = 1 − E, so C, the span of {1, H, E}, is a three dimen-
sional commutative subalgebra invariant under η. Let us show that if G satisfies
zG(z) = 1 + η(G(z))G(z) and is analytic then G(z) ∈ C for all z ∈ C+ .
Let Φ : M3 (C) → M3 (C) be given by Φ(B) = z−1 (1+η(B)B). One easily checks
that
kΦ(B)k ≤ |z|−1 (1 + kηkkBk2 )
and
kΦ(B1 ) − Φ(B2 )k ≤ |z|−1 kηk(kB1 k + kB2 k)kB1 − B2 k.
Here kηk is the norm of η as a map from M3 (C) to M3 (C). Since η is completely
positive we have kηk = kη(1)k. In this particular example kηk = 4/3.
Now let Dε = {B ∈ M3 (C) | kBk < ε}. If the pair z ∈ C+ and ε > 0 simultane-
ously satisfies
1 + kηkε 2 < |z|ε and 2εkηk < |z|,
then Φ(Dε ) ⊆ Dε and kΦ(B1 ) − Φ(B2 )k ≤ ckB1 − B2 k for B1 , B2 ∈ Dε and c =
2ε|z|−1 kηk < 1. So when |z| is sufficiently large both conditions are satisfied and
Φ has a unique fixed point in Dε . If we choose B ∈ Dε ∩ C then all iterates of Φ
applied to B will remain in C and so the unique fixed point will be in Dε ∩ C.
Since M3 (C) is finite dimensional there are a finite number of linear functionals,
{ϕi }i , on M3 (C) (6 in our particular example) such that C = ∩i ker(ϕi ). Also for
each i, ϕi ◦ G is analytic so it is identically 0 on C+ if it vanishes on a non-empty
open subset of C+ . We have seen above that G(z) ∈ C provided |z| is sufficiently
large; thus G(z) ∈ C for all z ∈ C+ .
Hence G and η(G) must be of the form
   
f 0h 2 f +e 0 e+2h
1
G = 0 e 0 , η(G) =  0 2 f + e + 2 h 0  .
3
h0 f e+2h 0 2 f +e

So Equation (9.5) gives the following system of equations:

238 9 Operator-Valued Free Probability Theory and Block Random Matrices
0.35

0.3

0.25

0.2

0.15

Fig. 9.2 Comparison of the 0.1

histogram of eigenvalues of
0.05
XN , from Fig. 9.1, with the
numerical solution according
to (9.9) and (9.10) −2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5

e ( f + h) + 2 f 2 + h2

zf = 1+ ,
3
e (e + 2 ( f + h)) (9.9)
ze = 1 + ,
3
4 f h + e ( f + h)
zh = .
3
This system of equations can be solved numerically for z close to the real axis; then
dµ(t) 1
g(z) = tr3 G(z) = (2 f (z) + e(z))/3, = − lim Img (t + is) (9.10)
dt π s→0
gives the sought eigenvalue distribution. In Figure 3 we compare this numerical
solution (solid curve) with the histogram for the XN from Fig. 9.1, with blocks of
size 1000 × 1000.

9.2 General theory of operator-valued free probability

Not only semi-circular elements can be lifted to an operator-valued level, but such
a generalization exists for the whole theory. The foundation for this was laid by
Voiculescu in [184], Speicher showed in [163] that the combinatorial description of
free probability resting on the notion of free cumulants extends also to the operator-
valued case. We want to give here a short survey of some definitions and results.

Definition 4. Let A be a unital algebra and consider a unital subalgebra B ⊂ A. A

linear map E : A → B is a conditional expectation if

E(b) = b ∀b ∈ B (9.11)

and
E(b1 ab2 ) = b1 E(a)b2 ∀a ∈ A, ∀b1 , b2 ∈ B. (9.12)
9.2 General theory of operator-valued free probability 239

An operator-valued probability space (A, E, B) consists of B ⊂ A and a conditional

expectation E : A → B.
The operator-valued distribution of a random variable x ∈ A is given by all
operator-valued moments E(xb1 xb2 · · · bn−1 x) ∈ B (n ∈ N, b1 , . . . , bn−1 ∈ B).

Since, by the bimodule property (9.12),

E(b0 xb1 xb2 · · · bn−1 xbn ) = b0 · E(xb1 xb2 · · · bn−1 x) · bn ,

there is no need to include b0 and bn in the operator-valued distribution of x.

Definition 5. Consider an operator-valued probability space (A, E, B) and a family

(Ai )i∈I of subalgebras with B ⊂ Ai for all i ∈ I. The subalgebras (Ai )i∈I are free
with respect to E or free with amalgamation over B if E(a1 · · · an ) = 0 whenever
ai ∈ A ji , j1 6= j2 6= · · · =
6 jn , and E(ai ) = 0 for all i = 1, . . . , n. Random variables in
A or subsets of A are free with amalgamation over B if the algebras generated by
B and the variables or the algebras generated by B and the subsets, respectively, are
so.

Note that the subalgebra generated by B and some variable x is not just the linear
span of monomials of the form bxn , but, because elements from B and our variable
x do not commute in general, we must also consider general monomials of the form
b0 xb1 x · · · bn xbn+1 .
If B = A then any two subalgebras of A are free with amalgamation over B; so
the claim of freeness with amalgamation gets weaker as the subalgebra gets larger
until the subalgebra is the whole algebra at which point the claim is empty.
Operator-valued freeness works mostly like ordinary freeness, one only has to
take care of the order of the variables; in all expressions they have to appear in their
original order!

Example 6. 1) If x and {y1 , y2 } are free, then one has as in the scalar case

E(y1 xy2 ) = E y1 E(x)y2 ; (9.13)

and more general, for b1 , b2 ∈ B,

E(y1 b1 xb2 y2 ) = E y1 b1 E(x)b2 y2 . (9.14)

In the scalar case (where B would just be C and E = ϕ : A → C a unital linear
functional) we write of course ϕ y1 ϕ(x)y2 in the factorized form ϕ(y1 y2 )ϕ(x). In
the operator-valued case this is not possible; we have to leave the E(x) at its position
between y1 and y2 .
2) If {x1 , x2 } and {y1 , y2 } are free over B, then one has the operator-valued ver-
sion of (1.14),

E(x1 y1 x2 y2 ) = E x1 E(y1 )x2 · E(y2 ) + E(x1 ) · E y1 E(x2 )y2
240 9 Operator-Valued Free Probability Theory and Block Random Matrices

− E(x1 )E(y1 )E(x2 )E(y2 ). (9.15)

Definition 7. Consider an operator-valued probability space (A, E, B). We define

the corresponding (operator-valued) free cumulants (κnB )n∈N , κnB : An → B, by the
moment-cumulant formula

E(a1 · · · an ) = ∑ κπB (a1 , . . . , an ), (9.16)

π∈NC(n)

where arguments of κπB are distributed according to the blocks of π, but the cumu-
lants are nested inside each other according to the nesting of the blocks of π.

Example 8. Consider the non-crossing partition

1 2 3 4 5 6 7 8 9 10

π = (1, 10), (2, 5, 9), (3, 4), (6), (7, 8) ∈ NC(10).

The corresponding free cumulant κπB is given by

κπB (a1 , . . . , a10 ) = κ2B a1 · κ3B a2 · κ2B (a3 , a4 ), a5 · κ1B (a6 ) · κ2B (a7 , a8 ), a9 , a10 .

Remark 9. Let us give a more formal definition of the operator-valued free cumu-
lants in the following.
1) First note that the bimodule property (9.12) for E implies for κ B the property

κnB (b0 a1 , b1 a2 , . . . , bn an bn+1 ) = b0 κnB (a1 b1 , a2 b2 , . . . , an )bn+1

for all a1 , . . . , an ∈ A and b0 , . . . , bn+1 ∈ B. This can also stated by saying that κnB is
actually a map on the B-module tensor product A⊗B n = A ⊗B A ⊗B · · · ⊗B A.
2) Let now any sequence {Tn }n of B-bimodule maps: Tn : A⊗B n → B be given. In-
stead of Tn (x1 ⊗B · · · ⊗B xn ) we shall write Tn (x1 , . . . , xn ). Then there exists a unique
extension of T , indexed by non-crossing partitions, so that for every π ∈ NC(n) we
have a map Tπ : A⊗B n → B so that the following conditions are satisfied.
(i) when π = 1n we have Tπ = Tn ;
(ii) whenever π ∈ NC(n) and V = {l + 1, . . . , l + k} is an interval in π then

Tπ (x1 , . . . , xn ) = Tπ 0 (x1 , . . . , xl Tk (xl+1 , . . . , xl+k ), xl+k+1 , . . . , xn )

= Tπ 0 (x1 , . . . , xl , Tk (xl+1 , . . . , xl+k )xl+k+1 , . . . , xn ),

where π 0 ∈ NC(n − k) is the partition obtained by deleting from π the block V .

When l = 0 we interpret this property to mean

Tπ (x1 , . . . , xn ) = Tπ 0 (Tk (x1 , . . . , xk )xk+1 , . . . , xn )

9.2 General theory of operator-valued free probability 241

This second property is called the insertion property. One should notice that ev-
ery non-crossing partition can be reduced to a partition with a single block by the
process of interval stripping. For example with the partition π = {(1, 10), (2, 5, 9),
(3, 4), (6), (7, 8)} from above we strip the interval (3, 4) to obtain {(1, 10), (2, 5, 9),
(6), (7, 8)}. Then we strip the interval (7, 8) to obtain {(1, 10), (2, 5, 9), (6), }, then
we strip the (one element) interval (6) to obtain {(1, 10), (2, 5, 9)}; and finally we
strip the interval (2, 5, 9) to obtain the partition with a single block {(1, 10)}.

1 2 34 5 6 7 8 9 10 1 2 5 6 78 9 10 1 2 5 6 9 10 1 2 5 9 10 1 10

The insertion property requires that the family {Tπ }π be compatible with interval
stripping. Thus if there is an extension satisfying (i) and (ii), it must be unique.
Moreover we can compute Tπ by stripping intervals and the outcome is independent
of the order in which we strip the intervals.
3) Let us call a family {Tπ }π determined as above multiplicative. Then it is quite
straightforward to check the following.
◦ Let {Tπ }π be a multiplicative family of B-bimodule maps and define a new fam-
ily by
Sπ = ∑ Tσ (π ∈ NC(n)). (9.17)
σ ∈NC(n)
σ ≤π

Then the family {Sπ }π is also multiplicative.

◦ The relation (9.17) between two multiplicative families is via Möbius inversions
also equivalent to

Tπ = ∑ µ(σ , π)Sσ (π ∈ NC(n)), (9.18)

σ ∈NC(n)
σ ≤π

where µ is the Möbius function on non-crossing partitions; see Remark 2.9.

Again, multiplicativity of {Sπ }π implies multiplicativity of {Tπ }π , if the latter
is defined in terms of the former via (9.18).
4) Now we can use the previous to define the free cumulants κnB . As a starting
point we use the multiplicative family {Eπ }π which is given by the “moment maps”

En : A⊗B n → B, En (a1 , a2 , . . . , an ) = E(a1 a2 · · · an ).

For π = {(1, 10), (2, 5, 9), (3, 4), (6), (7, 8)} ∈ NC(10) from Example 8 the Eπ is,
for example, given by

Eπ (a1 , . . . , a10 ) = E a1 · E a2 · E(a3 a4 ) · a5 · E(a6 ) · E(a7 a8 ) · a9 · a10 .

Then we define the multiplicative family {κπB }π by

242 9 Operator-Valued Free Probability Theory and Block Random Matrices

κπB = ∑ µ(σ , π)Eσ (π ∈ NC(n)),

σ ∈NC(n)
σ ≤π

which is equivalent to (9.16). In particular, this means that the κnB are given by

κnB (a1 , . . . , an ) = ∑ µ(π, 1n )Eπ (a1 , . . . , an ). (9.19)

π∈NC(n)

Definition 10. 1) For a ∈ A we define its (operator-valued) Cauchy transform Ga :

B → B by
Ga (b) := E[(b − a)−1 ] = ∑ E[b−1 (ab−1 )n ],
n≥0

and its (operator-valued) R-transform Ra : B → B by

B
Ra (b) : = ∑ κn+1 (ab, ab, . . . , ab, a)
n≥0

= κ1B (a) + κ2B (ab, a) + κ3B (ab, ab, a) + · · · .

2) We say that s ∈ A is B-valued semi-circular if κnB (sb1 , sb2 , . . . , sbn−1 , s) = 0

for all n 6= 2, and all b1 , . . . , bn−1 ∈ B.

If s ∈ A is B-valued semi-circular then by the moment-cumulant formula we

have
E(sn ) = ∑ κπ (s, . . . , s).
π∈NC2 (n)

This is consistent with (9.4) of our example A = Md (C) and B = Md (C), where
these κ’s were defined by iterated applications of η(B) = E(XBX) = κ2B (XB, X).
As in the scalar-valued case one has the following properties, see [163, 184, 190].

Theorem 11. 1) The relation between the Cauchy transform and the R-transform is
given by

bG(b) = 1 + R(G(b)) · G(b) or G(b) = (b − R(G(b)))−1 . (9.20)

2) Freeness of x and y over B is equivalent to the vanishing of mixed B-valued

cumulants in x and y. This implies, in particular, the additivity of the R-transform:
Rx+y (b) = Rx (b) + Ry (b), if x and y are free over B.
3) If x and y are free over B, then we have the subordination property

Gx+y (b) = Gx b − Ry Gx+y (b) . (9.21)

4) If s is an operator-valued semi-circular element over B then Rs (b) = η(b),

where η : B → B is the linear map given by η(b) = E(sbs).

Remark 12. 1) As for the moments, one has to allow in the operator-valued cumu-
lants elements from B to spread everywhere between the arguments. So with B-
9.3 Relation between scalar-valued and matrix-valued cumulants 243

valued cumulants in random variables x1 , . . . , xr ∈ A we actually mean all expres-

sions of the form κnB (xi1 b1 , xi2 b2 , . . . , xin−1 bn−1 , xin ) (n ∈ N, 1 ≤ i(1), . . . , i(n) ≤ r,
b1 , . . . , bn−1 ∈ B).
2) One might wonder about the nature of the operator-valued Cauchy and R-
transforms. One way to interpret the definitions and the statements is as convergent
power series. For this one needs a Banach algebra setting, and then everything can be
justified as convergent power series for appropriate b; namely with kbk sufficiently
small in the R-transform case, and with b invertible and kb−1 k sufficiently small in
the Cauchy transform case. In those domains they are B-valued analytic functions
and such F have a series expansion of the form (say F is analytic in a neighbourhood
of 0 ∈ B)
∞
F(b) = F(0) + ∑ Fk (b, . . . , b), (9.22)
k=1

where Fk is a symmetric multilinear function from the k-fold product B × · · · × B to

B. In the same way as for usual formal power series, one can consider (9.22) as a
formal multilinear function series (given by the sequence (Fk )k of the coefficients
of F), with the canonical definitions for sums, products and compositions of such
series. One can then also read Definition 10 and Theorem 11 as statements about
such formal multilinear function series. For a more thorough discussion of this point
of view (and more results about operator-valued free probability) one should consult
the work of Dykema [68].

As illuminated in Section 9.1 for the case of an operator-valued semicircle, many

statements from the scalar-valued version of free probability are still true in the
operator-valued case; actually, on a combinatorial (or formal multilinear function
series) level, the proofs are essentially the same as in the scalar-valued case, one only
has to take care that one respects the nested structure of the blocks of non-crossing
partitions. One can also extend some of the theory to an analytic level. In particular,
the operator-valued Cauchy transform is an analytic operator-valued function (in the
sense of Fréchet-derivatives) on the operator upper half-plane H+ (B) := {b ∈ B |
Im(b) > 0 and invertible}. In the next chapter we will have something more to say
about this, when coming back to the analytic theory of operator-valued convolution.
One should, however, note that the analytic theory of operator-valued free convo-
lution lacks at the moment some of the deeper statements of the scalar-valued the-
ory; developing a reasonable analogue of complex function theory on an operator-
valued level, addressed as free analysis, is an active area in free probability (and also
other areas) at the moment, see, for example, [107, 193, 194, 195, 202].

9.3 Relation between scalar-valued and matrix-valued cumulants

Let us now present a relation from [138] between matrix-valued and scalar-valued
cumulants, which shows that taking matrices of random variables goes nicely with
freeness, at least if we allow for the operator-valued version. The proof follows by
comparing the moment-cumulant formulas for the two situations.
244 9 Operator-Valued Free Probability Theory and Block Random Matrices

Proposition 13. Let (C, ϕ) be a non-commutative probability space and fix d ∈ N.

Then (A, E, B), with

A := Md (C), B := Md (C) ⊂ Md (C), E := id ⊗ ϕ : Md (C) → Md (C),

is an operator-valued probability space. We denote the scalar cumulants with re-

spect to ϕ by κ and the operator-valued cumulants with respect to E by κ B .
Consider now akij ∈ C (i, j = 1, . . . , d; k = 1, . . . , n) and put, for each k = 1, . . . , n,
Ak = (akij )di, j=1 ∈ Md (C). Then the operator-valued cumulants of the Ak are given in
terms of the cumulants of their entries as follows:
d
[κnB (A1 , A2 , . . . , An )]i j = κn a1ii2 , a2i2 i3 , . . . , anin j .

∑ (9.23)
i2 ,...,in =1

Proof: Let us begin by noting that

d
ϕ a1ii2 a2i2 i3 · · · anin j .

[E(A1 A2 · · · An ]i j = ∑
i2 ,...,in =1

Let π ∈ NC(n) is a non-crossing partition, we claim that

d
ϕπ a1ii2 , a2i2 i3 , . . . , anin j .

[Eπ (A1 , A2 , . . . , An ]i j = ∑
i2 ,...,in =1

If π has two blocks: π = (1, . . . , k), (k + 1, . . . , n), then this is just matrix multiplica-
tion. We then get the general case by using the insertion property and induction. By
Möbius inversion we have

[κnB (A1 , A2 , . . . , An ]i j = ∑ µ(π, 1n )[Eπ (A1 , A2 , . . . , An ]i j .

π∈NC(n)
d
µ(π, 1n )ϕπ a1ii2 , a2i2 i3 , . . . , anin j

= ∑ ∑
i2 ,...,in =1 π∈NC(n)
d
κπ a1ii2 , a2i2 i3 , . . . , anin j .

= ∑
i2 ,...,in =1

Corollary 14. If the entries of two matrices are free in (C, ϕ), then the two matrices
themselves are free with respect to E : Md (C) → Md (C).

Proof: Let A1 and A2 be the subalgebras of A which are generated by B and by

the respective matrix. Note that the entries of any matrix from A1 are free from the
entries of any matrix from A2 . We have to show that mixed B-valued cumulants in
9.4 Moving between different levels 245

those two algebras vanish. So consider A1 , . . . , An with Ak ∈ Ar(k) . We shall show

that for all n and all r(1), . . . , r(n) ∈ {1, 2} we have κnB (A1 , . . . , An ) = 0 whenever
the r’s are not all equal. As before we write Ak = (akij ). By freeness of the entries
we have κn (a1ii2 , a2i2 i3 , . . . , anin j ) = 0 whenever the r’s are not all equal. Then by The-
orem 13 the (i, j)-entry of κnB (A1 , . . . , An ) equals 0 and thus κnB (A1 , . . . , An ) = 0 as
claimed.
Example 15. If {a1 , b1 , c1 , d1 } and {a2 , b2 , c2 , d2 } are free in (C, ϕ), then the propo-
sition above says that

a1 b1 a2 b2
X1 = and X2 =
c1 d1 c2 d2

are free with amalgamation over M2 (C) in (M2 (C), id ⊗ϕ). Note that in general they
are not free in the scalar-valued non-commutative probability space (M2 (C), tr ⊗ ϕ).
Let us make this distinction clear by looking on a small moment. We have

a a + b1 c2 a1 b2 + b1 d2
X1 X2 = 1 2 .
c1 a2 + d1 c2 c1 b2 + d1 d2

Applying the trace ψ := tr ⊗ ϕ we get in general

ψ(X1 X2 ) = ϕ(a1 )ϕ(a2 ) + ϕ(b1 )ϕ(c2 ) + ϕ(c1 )ϕ(b2 ) + ϕ(d1 )ϕ(d2 ) /2
6= (ϕ(a1 ) + ϕ(d1 )) · (ϕ(a2 ) + ϕ(d2 ))/4
= ψ(X1 ) · ψ(X2 )

but under the conditional expectation E := id ⊗ ϕ we always have

ϕ(a1 )ϕ(a2 ) + ϕ(b1 )ϕ(c2 ) ϕ(a1 )ϕ(b2 ) + ϕ(b1 )ϕ(d2 )
E(X1 X2 ) =
ϕ(c1 )ϕ(a2 ) + ϕ(d1 )ϕ(c2 ) ϕ(c1 )ϕ(b2 ) + ϕ(d1 )ϕ(d2 )

ϕ(a1 ) ϕ(b1 ) ϕ(a2 ) ϕ(b2 )
=
ϕ(c1 ) ϕ(d1 ) ϕ(c2 ) ϕ(d2 )
= E(X1 ) · E(X2 ).

9.4 Moving between different levels

We have seen that in interesting problems, like random matrices with correlation be-
tween the entries, the scalar-valued distribution usually has no nice structure. How-
ever, often the distribution with respect to an intermediate algebra B has a nice struc-
ture and thus it makes sense to split the problem into two parts. First consider the
distribution with respect to the intermediate algebra B. Derive all (operator-valued)
formulas on this level. Then at the very end, go down to C. This last step usually has
to be done numerically. Since our relevant equations (like (9.5)) are not linear, they
are not preserved under the application of the mapping B → C, meaning that we do
not find closed equations on the scalar-valued level. Thus, the first step is nice and
246 9 Operator-Valued Free Probability Theory and Block Random Matrices

gives us some conceptual understanding of the problem, whereas the second step
does not give much theoretical insight, but is more of a numerical nature. Clearly,
the bigger the last step, i.e., the larger B, the less we win with working on the B-level
first. So it is interesting to understand how symmetries of the problem allow us to
restrict from B to some smaller subalgebra D ⊂ B. In general, the behaviour of an
element as a B-valued random variable might be very different from its behaviour
as a D-valued random variable. This is reflected in the fact that in general the ex-
pression of the D-valued cumulants of a random variable in terms of its B-valued
cumulants is quite complicated. So we can only expect that nice properties with re-
spect to B pass over to D if the relation between the corresponding cumulants is
easy. The simplest such situation is where the D-valued cumulants are the restric-
tion of the B-valued cumulants. It turns out that it is actually quite easy to decide
whether this is the case.

Proposition 16. Consider unital algebras C ⊂ D ⊂ B ⊂ A and conditional ex-

pectations EB : A → B and ED : A → D which are compatible in the sense that
ED ◦ EB = ED . Denote the free cumulants with respect to EB by κ B and the free cu-
mulants with respect to ED by κ D . Consider now x ∈ A. Assume that the B-valued
cumulants of x satisfy

κnB (xd1 , xd2 , . . . , xdn−1 , x) ∈ D ∀n ≥ 1, ∀d1 , . . . , dn−1 ∈ D.

Then the D-valued cumulants of x are given by the restrictions of the B-valued
cumulants: for all n ≥ 1 and all d1 , . . . , dn−1 ∈ D we have

κnD (xd1 , xd2 , . . . , xdn−1 , x) = κnB (xd1 , xd2 , . . . , xdn−1 , x).

This statement is from [137]. Its proof is quite straightforward by comparing the
corresponding moment-cumulant formulas. We leave it to the reader.
Exercise 2. Prove Proposition 16.
Proposition 16 allows us in particular to check whether a B-valued semi-circular
element x is also semi-circular with respect to a smaller D ⊂ B. Namely, all B-
valued cumulants of x are given by nested iterations of the mapping η. Hence, if η
maps D to D, then this property extends to all B-valued cumulants of x restricted to
D.

Corollary 17. Let D ⊂ B ⊂ A be as above. Consider a B-valued semi-circular ele-

ment x. Let η : B → B, η(b) = EB (xbx) be the corresponding covariance mapping.
If η(D) ⊂ D, then x is also a D-valued semi-circular element, with covariance
mapping given by the restriction of η to D.

Remark 18. 1) This corollary allows for an easy determination of the smallest
canonical subalgebra with respect to which x is still semi-circular. Namely, if x is
B-semi-circular with covariance mapping η : B → B, we let D be the smallest unital
9.4 Moving between different levels 247

subalgebra of B which is mapped under η into itself. Note that this D exists because
the intersection of two subalgebras which are invariant under η is again a subalge-
bra invariant under η. Then x is also semi-circular with respect to this D. Note that
the corollary above is not an equivalence, thus there might be smaller subalgebras
than D with respect to which x is still semi-circular; however, there is no systematic
way to detect those.
2) Note also that with some added hypotheses the above corollary might become
an equivalence; for example, in [137] it was shown: Let (A, E, B) be an operator-
valued probability space, such that A and B are C∗ -algebras. Let F : B → C =: D ⊂
B be a faithful state. Assume that τ = F ◦ E is a faithful trace on A. Let x be a
B-valued semi-circular variable in A. Then the distribution of x with respect to τ is
the semicircle law if and only if E(x2 ) ∈ C.

Example 19. Let us see what the statements above tell us about our model case
of d × d self-adjoint matrices with semi-circular entries X = (si j )di, j=1 . In Section
9.1 we have seen that if we allow arbitrary correlations between the entries, then
we get a semi-circular distribution with respect to B = Md (C). (We calculated this
explicitly, but one could also invoke Proposition 13 to get a direct proof of this.) The
mapping η : Md (C) → Md (C) was given by
d
[η(B)]i j = ∑ σ (i, k; l, j)bkl .
k,l=1

Let us first check in which situations we can expect a scalar-valued semi-circular

distribution. This is guaranteed, by the corollary above, if η maps C to itself, i.e., if
η(1) is a multiple of the identity matrix. We have
d
[η(1)]i j = ∑ σ (i, k; k, j).
k=1

Thus if ∑dk=1 σ (i, k; k, j) is zero for i 6= j and otherwise independent from i, then X
is semi-circular. The simplest situation where this happens is if all si j , 1 ≤ i ≤ j ≤ d,
are free and have the same variance.
Let us now consider the more special band matrix situation where si j , 1 ≤ i ≤
j ≤ d are free, but not necessarily of the same variance, i.e., we assume that for
i ≤ j, k ≤ l we have
(
σi j , if i = k, j = l
σ (i, j; k, l) = . (9.24)
0, otherwise

Note that this also means that σ (i, k; k, i) = σik , because we have ski = sik . Then
d
[η(1)]i j = δi j ∑ σik .
k=1
248 9 Operator-Valued Free Probability Theory and Block Random Matrices

We see that in order to get a semi-circular distribution we do not need the same
variance everywhere, but that it suffices to have the same sum over the variances in
each row of the matrix.
However, if this sum condition is not satisfied then we do not have a semi-
circular distribution. Still, having all entries free, gives more structure than just
semi-circularity with respect to Md (C). Namely, we see that with the covariance
(9.24) our η maps diagonal matrices into diagonal matrices. Thus we can pass from
Md (C) over to the subalgebra D ⊂ Md (C) of diagonal matrices, and get that for
such situations X is D-semi-circular. The conditional expectation ED : A → D in
this case is of course given by
   
a11 . . . a1d ϕ(a11 ) . . . 0
 .. . . ..   .. .. ..  .
 . . .  7→  . . . 
ad1 . . . add 0 . . . ϕ(add )

Even if we do not have free entries, we might still have some symmetries in the
correlations between the entries which let us pass to some subalgebra of Md (C).
As pointed out in Remark 18 we should look for the smallest subalgebra which is
invariant under η. This was exactly what we did implicitly in our Example 3. There
we observed that η maps the subalgebra
  
 f 0h 
C :=  0 e 0  | e, f , h ∈ C
h0 f
 

into itself. (And we actually saw in Example 3 that C is the smallest such subalge-
bra, because it is generated from the unit by iterated application of η.) Thus the X
from this example, (9.2), is not only M3 (C)-semi-circular, but actually also C-semi-
circular. In our calculations in Example 3 this was implicitly taken into account,
because there we restricted our Cauchy transform G to values in C, i.e., effectively
we solved the equation (9.5) for an operator-valued semi-circular element not in
M3 (C), but in C.

9.5 A non-self-adjoint example

In order to treat a more complicated example let us look at a non-selfajoint situation
as it often shows up in applications (e.g., in wireless communication, see [174]).
Consider the d × d matrix H = B + C where B ∈ Md (C) is a deterministic matrix
and C = (ci j )di, j=1 has as entries ∗-free circular elements ci j (i, j = 1, . . . , d), without
any symmetry conditions; however with varying variance, i.e. ϕ(ci j c∗i j ) = σi j . What
we want to calculate is the distribution of HH ∗ .
Such an H might arise as the limit of block matrices in Gaussian random matri-
ces, where we also allow a non-zero mean for the Gaussian entries. The means are
separated off in the matrix B. We refer to [174] for more information on the use of
9.5 A non-self-adjoint example 249

such non-mean zero Gaussian random matrices (as Ricean model) and why one is
interested in the eigenvalue distribution of HH ∗ .
One can reduce this to a problem involving self-adjoint matrices by observing
that HH ∗ has the same distribution as the square of

0 H 0 B 0 C
T := = + .
H∗ 0 B∗ 0 C∗ 0

Let us use the notations

0 B 0 C
B̂ := and Ĉ := .
B∗ 0 C∗ 0

The matrix Ĉ is a 2d × 2d self-adjoint matrix with ∗ -free circular entries, thus of the
type we considered in Section 9.1. Hence, by the remarks in Example 19, we know
that it is a D2d -valued semi-circular element, where D2d ⊂ M2d (C) is the subalgebra
of diagonal matrices; one checks easily that the covariance function η : D2d → D2d
is given by
D1 0 η1 (D2 ) 0
η = , (9.25)
0 D2 0 η2 (D1 )
where η1 : Dd → Dd and η2 : Dd → Dd are given by

η1 (D2 ) = id ⊗ ϕ[CD2C∗ ]

η2 (D1 ) = id ⊗ ϕ[C∗ D1C].

Furthermore, by using Propositions 13 and 16, one can easily see that B̂ and Ĉ are
free over D2d .
Let GT and GT 2 be the D2d -valued Cauchy transform of T and T 2 , respectively.
We write the latter as
G1 (z) 0
GT 2 (z) = ,
0 G2 (z)
where G1 and G2 are Dd -valued. Note that one also has the general relation GT (z) =
zGT 2 (z2 ).
By using the general subordination relation (9.21) and the fact that Ĉ is semi-
circular with covariance map η given by (9.25), we can now derive the following
equation for GT 2 :

zGT 2 (z2 ) = GT (z) = GB̂ z − RĈ (GT (z))

" −1 #
G1 (z2 ) 0

0 B
= ED2d z − zη − ∗
0 G2 (z2 ) B 0
250 9 Operator-Valued Free Probability Theory and Block Random Matrices
" −1 #
z − zη1 (G2 (z2 )) −B
= ED2d .
−B∗ z − zη2 (G1 (z2 ))

By using the well-known Schur complement formula for the inverse of 2 × 2 block
matrices (see also next chapter for more on this), this yields finally
" −1 #
1 ∗
zG1 (z) = EDd 1 − η1 (G2 (z)) + B B
z − zη2 (G1 (z))

and " −1 #

1
zG2 (z) = EDd 1 − η2 (G1 (z)) + B∗ B .
z − zη1 (G2 (z))
These equations have actually been derived in [90] as the fixed point equations
for a so-called deterministic equivalent of the square of a random matrix with non-
centred, independent Gaussians with non-constant variance as entries. Thus our cal-
culations show that going over to such a deterministic equivalent consists in replac-
ing the original random matrix by our matrix T . We will come back to this notion
of “deterministic equivalent” in the next chapter.
Chapter 10
Deterministic Equivalents, Polynomials in Free Variables, and
Analytic Theory of Operator-Valued Convolution

The notion of a “deterministic equivalent” for random matrices, which can be found
in the engineering literature, is a non-rigorous concept which amounts to replacing a
random matrix model of finite size (which is usually unsolvable) by another problem
which is solvable, in such a way that, for large N, the distributions of both problems
are close to each other. Motivated by our example in the last chapter we will in this
chapter propose a rigorous definition for this concept, which relies on asymptotic
freeness results. This “free deterministic equivalent” was introduced by Speicher
and Vargas in [166].
This will then lead directly to the problem of calculating the distribution of self-
adjoint polynomials in free variables. We will see that, in contrast to the corre-
sponding classical problem on the distribution of polynomials in independent ran-
dom variables, there exists a general algorithm to deal with such polynomials in
free variables. The main idea will be to relate such a polynomial with an operator-
valued linear polynomial, and then use operator-valued convolution to deal with the
latter. The successful implementation of this program is due to Belinschi, Mai and
Speicher [23]; see also [12].

10.1 The general concept of a free deterministic equivalent

Voiculescu’s asymptotic freeness results on random matrices state that if we con-
sider tuples of independent random matrix ensembles, such as Gaussian, Wigner
or Haar unitaries, their collective behaviour in the large N limit is almost surely
that of a corresponding collection of free (semi-)circular and Haar unitary opera-
tors. Moreover, if we consider these random ensembles along with deterministic
ensembles, having a given asymptotic distribution (with respect to the normalized
trace), then, almost surely, the corresponding limiting operators also become free
from the random elements. This means of course that if we consider a function in
our matrices, then this will, for large N, be approximated by the same function in
our limiting operators. We will in the following only consider functions which are
given by polynomials. Furthermore, all our polynomials should be self-adjoint (in

251
252 10 Polynomials in Free Variables and Operator-Valued Convolution

the sense that if we plug in self-adjoint matrices, we will get as output self-adjoint
matrices), so that the eigenvalue distribution of those polynomials can be recovered
by calculating traces of powers.
To be more specific, let us consider a collection of independent random and de-
terministic N × N matrices:
n o
(N) (N)
XN = X1 , . . . , Xi1 : independent self-adjoint Gaussian matrices,
n o
(N) (N)
YN = Y1 , . . . ,Yi2 : independent non-self-adjoint Gaussian matrices,
n o
(N) (N)
UN = U1 , . . . ,Ui3 : independent Haar distributed unitary matrices,
n o
(N) (N)
DN = D1 , . . . , Di4 : deterministic matrices,

and a self-adjoint polynomial P in non-commuting variables (and their adjoints); we

evaluate this polynomial in our matrices
(N) (N) (N) (N) (N) (N) (N) (N)
P X1 , . . . , Xi1 ,Y1 , . . . ,Yi2 ,U1 , . . . ,Ui3 , D1 , . . . , Di4 =: PN .

Relying on asymptotic freeness results, we can then compute the asymptotic

eigenvalue distribution of PN by going over the limit. We know that we can find
collections S, C, U, D of operators in a non-commutative probability space (A, ϕ),

S = {s1 , . . . , si1 } : free semi-circular elements,

C = {c1 , . . . , ci2 } : ∗-free circular elements,

U = u1 , . . . , ui3 : ∗-free Haar unitaries,
D = {d1 , . . . , di4 } : abstract elements,

such that S, C, U, D are ∗-free and the joint distribution of d1 , . . . , di4 is given by the
(N) (N)
asymptotic joint distribution of D1 , . . . , Di4 . Then, almost surely, the asymptotic

distribution of PN is that of P s1 , . . . , si1 , c1 , . . . , ci2 , u1 , . . . , ui3 , d1 , . . . , di4 =: p∞ , in
the sense that, for all k, we have almost surely

lim tr(PNk ) = ϕ(pk∞ ).

N→∞

In this way we can reduce the problem of the asymptotic distribution of PN to the
study of the distribution of p∞ .
A common obstacle of this procedure is that our deterministic matrices may not
have an asymptotic joint distribution. It is then natural to consider, for a fixed N, the
(N) (N)
corresponding “free model” P s1 , . . . , si1 , c1 , . . . , ci2 , u1 , . . . , ui3 , d1 , . . . , di4 =: pN,
where, just as before, the random matrices are replaced by the corresponding free
(N) (N)
operators in some space (AN , ϕN ), but now we let the distribution of d1 , . . . , di4
(N) (N)
be exactly the same as the one of D1 , . . . , Di4 with respect to tr. The free model
10.2 A motivating example: reduction to multiplicative convolution 253

pN will be called the free deterministic equivalent for PN . This was introduced and
investigated in [166, 175].
(In case one wonders about the notation p N : the symbol is according to [31] the
generic qualifier for denoting the free version of some classical object or operation.)
The difference between the distribution of p N and the (almost sure or expected)
distribution of PN is given by the deviation from freeness of XN , YN , UN , DN , the
deviation of XN , YN from being free (semi)-circular systems, and the deviation of
UN from a free system of Haar unitaries. Of course, for large N these deviations
get smaller and thus the distribution of p N becomes a better approximation for the
distribution of PN
Let us denote by GN the Cauchy transform of PN and by G N the Cauchy trans-
form of the free deterministic equivalent p N . Then the usual asymptotic freeness
estimates show that moments of PN are, for large N, with very high probability
close to corresponding moments of p N (where the estimates involve also the opera-
tor norms of the deterministic matrices). This means that for N → ∞ the difference
between the Cauchy transforms GN and G N goes almost surely to zero, even if there
do not exist individual limits for both Cauchy transforms.
In the engineering literature there exists also a version of the notion of a deter-
ministic equivalent (apparently going back to Girko [78], see also [90]). This deter-
ministic equivalent consists in replacing the Cauchy transform GN of the considered
random matrix model (for which no analytic solution exists) by a function ĜN which
is defined as the solution of a specified system of equations. The specific form of
those equations is determined in an ad hoc way, depending on the considered prob-
lem, by making approximations for the equations of GN , such that one gets a closed
system of equations. In many examples of deterministic equivalents (see, e.g., [62,
Chapter 6]) it turns out that actually the Cauchy transform of our free deterministic
equivalent is the solution to those modified equations, i.e., that ĜN = G N . We saw
one concrete example of this in Section 9.5 of the last chapter.
Our definition of a deterministic equivalent gives a more conceptual approach
and shows clearly how this notion relates with free probability theory. In some sense
this indicates that the only meaningful way to get a closed system of equations when
dealing with random matrices is to replace the random matrices by free variables.
Deterministic equivalents are thus polynomials in free variables and it remains
to develop tools to deal with such polynomials in an effective way. It turns out that
operator-valued free probability theory provides such tools. We will elaborate on
this in the remaining sections of this chapter.

10.2 A motivating example: reduction to multiplicative convolution

In the following we want to see how problems about polynomials in free variables
can be treated by means of operator-valued free probability. The main idea in this
context is that complicated polynomials can be transformed into simpler ones by
going to matrices (and thus go from scalar-valued to operator-valued free probabil-
ity). Since the only polynomials which we can effectively deal with are sums and
254 10 Polynomials in Free Variables and Operator-Valued Convolution

products (corresponding to additive and multiplicative convolution, respectively) we

should aim to transform general polynomials into sums or products.
In this section we will treat one special example from [25] to get an idea how this
can be achieved. In this case we will transform our problem into a product of two
free operator-valued matrices.
Let a1 , a2 , b1 , b2 be self-adjoint random variables in a non-commutative probabil-
ity space (C, ϕ), such that {a1 , a2 } and {b1 , b2 } are free and consider the polynomial
p = a1 b1 a1 + a2 b2 a2 . This p is self-adjoint and its distribution, i.e., the collection of
its moments, is determined by the joint distribution of {a1 , a2 }, the joint distribution
of {b1 , b2 }, and the freeness between {a1 , a2 } and {b1 , b2 }. However, there is no
direct way of calculating this distribution.
We observe now that the distribution µ p of p is the same (modulo a Dirac mass
at zero) as the distribution of the element

a1 b1 a1 + a2 b2 a2 0 a1 a2 b1 0 a1 0
= , (10.1)
0 0 0 0 0 b2 a2 0

in the non-commutative probability space (M2 (C), tr2 ⊗ ϕ). But this element has the
same moments as
2
a1 0 a1 a2 b1 0 a1 a1 a2 b1 0
= =: AB. (10.2)
a2 0 0 0 0 b2 a2 a1 a22 0 b2

So, with µAB denoting the distribution of AB with respect to tr2 ⊗ ϕ, we have
1 1
µAB = µ p + δ0 .
2 2
Since A and B are not free with respect to tr2 ⊗ ϕ, we cannot use scalar-valued
multiplicative free convolution to calculate the distribution of AB. However, with
E : M2 (C) → M2 (C) denoting the conditional expectation onto deterministic 2 × 2
matrices, we have that the scalar-valued distribution µAB is given by taking the trace
tr2 of the operator-valued distribution of AB with respect to E. But on this operator-
valued level the matrices A and B are, by Corollary 9.14, free with amalgamation
over M2 (C). Furthermore, the M2 (C)-valued distribution of A is determined by the
joint distribution of a1 and a2 and the M2 (C)-valued distribution of B is determined
by the joint distribution of b1 and b2 . Hence, the scalar-valued distribution µ p will
be given by first calculating the M2 (C)-valued free multiplicative convolution of
A and B to obtain the M2 (C)-valued distribution of AB and then getting from this
the (scalar-valued) distribution µAB by taking the trace over M2 (C). Thus we have
rewritten our original problem as a problem on the product of two free operator-
valued variables.
10.3 Reduction to operator-valued additive convolution via the linearization trick 255

10.3 The general case: reduction to operator-valued additive convolution via

the linearization trick
Let us now be more ambitious and look at an arbitrary self-adjoint polynomial P ∈
ChX1 , . . . , Xn i, evaluated as p = P(x1 , . . . , xn ) ∈ A in free variables x1 , . . . , xn ∈ A. In
the last section we replaced our original variable by a matrix which has (up to some
atoms), with respect to tr ⊗ ϕ the same distribution and which is actually a product
of matrices in the single operators. It is quite unlikely that we can do the same in
general. However, if we do not insist on using the trace as our state on matrices, but
allow for example the evaluation at the (1, 1) entry, then we gain much flexibility
and can indeed find an equivalent matrix which splits even into a sum of matrices of
the individual variables. What we essentially need for this is, given the polynomial
P, to construct in a systematic way a matrix, such that the entries of this matrix are
polynomials of degree 0 or 1 in our variables and such that the inverse of this matrix
has as (1, 1) entry (z − P)−1 . Let us ignore for the moment the degree condition on
the entries and just concentrate on the invertibility questions. The relevant tool in
this context is the following well-known result about Schur complements.

Proposition 1. Let A be a complex and unital algebra and let elements a, b, c, d ∈ A

be given. We assume that d is invertible in A. Then the following statements are
equivalent:

ab
(i) The matrix is invertible in M2 (C) ⊗ A.
cd
(ii) The Schur complement a − bd −1 c is invertible in A.
If the equivalent conditions (i) and (ii) are satisfied, we have the relation
−1
(a − bd −1 c)−1 0 1 −bd −1

ab 1 0
= . (10.3)
cd −d −1 c 1 0 d −1 0 1

In particular, the (1, 1) entry of the inverse is given by (a − bd −1 c)−1 :

−1
(a − bd −1 c)−1 ∗

ab
= .
cd ∗ ∗

Proof: A direct calculation shows that

1 bd −1 a − bd −1 c 0

ab 1 0
= (10.4)
cd 0 1 0 d d −1 c 1

holds. Since the first and third matrix are both invertible in M2 (C) ⊗ A,
−1 −1
1 bd −1 1 −bd −1

1 0 1 0
= and = ,
0 1 0 1 d −1 c 1 −d −1 c 1
256 10 Polynomials in Free Variables and Operator-Valued Convolution

the stated equivalence of (i) and (ii), as well as formula (10.3), follows from (10.4).

What we now need, given our operator p = P(x1 , . . . , xn ), is to find a block ma-
trix such that the (1, 1) entry of the inverse of this block matrix corresponds to the
resolvent (z − p)−1 and that furthermore all the entries of this block matrix have at
most degree 1 in our variables. More precisely, we are looking for an operator

p̂ = b0 ⊗ 1 + b1 ⊗ x1 + · · · + bn ⊗ xn ∈ MN (C) ⊗ A

for some matrices b0 , . . . , bn ∈ MN (C) of dimension N, such that z − p is invertible

in A if and only if Λ (z) − p̂ is invertible in MN (C) ⊗ A. Hereby, we put

z 0 ...

0
0 0 . . . 0
Λ (z) =  . . . ..  for all z ∈ C. (10.5)
 
 .. .. . . .
0 0 ... 0

As we will see in the following, the linearization in terms of the dimension N ∈ N

and the matrices b0 , . . . , bn ∈ MN (C) usually depends only on the given polynomial
P ∈ ChX1 , . . . , Xn i and not on the special choice of elements x1 , . . . , xn ∈ A.
The first famous linearization trick in the context of operator algebras and random
matrices goes back to Haagerup and Thorbjørnsen [88, 89] and turned out to be a
powerful tool in many different respects. However, there was the disadvantage that,
even if we start from a self-adjoint polynomial P, in general, we will not end up with
a linearization p̂, which is self-adjoint as well. Then, in [5], Anderson presented a
new version of this linearization procedure, which preserved self-adjointness.
One should note, however, that the idea of linearizing polynomial (or actually ra-
tional, see Section 10.6)) problems by going to matrices is actually much older and
is known under different names in different communities; like “Higman’s trick” [98]
or “linearization by enlargement” in non-commutative ring theory [56], “recogniz-
able power series” in automata theory and formal languages [154], , or “descriptor
realization” in control theory [94]. For a survey on linearization, non-commutative
system realization and its use in free probability, see [93].
Here is now our precise definition of linearization.

Definition 2. Let P ∈ ChX1 , . . . , Xn i be given. A matrix

0U
P̂ := ∈ MN (C) ⊗ ChX1 , . . . , Xn i,
V Q

where
◦ N ∈ N is an integer,
◦ Q ∈ MN−1 (C) ⊗ ChX1 , . . . , Xn i is invertible
10.3 Reduction to operator-valued additive convolution via the linearization trick 257

◦ and U is a row vector and V is a column vector, both of size N − 1 with entries
in ChX1 , . . . , Xn i,
is called a linearization of P, if the following conditions are satisfied:
(i) There are matrices b0 , . . . , bn ∈ MN (C), such that

P̂ = b0 ⊗ 1 + b1 ⊗ X1 + · · · + bn ⊗ Xn ,

i.e. the polynomial entries in Q, U and V all have degree ≤ 1.

(ii) It holds true that P = −UQ−1V .

Applying the Schur complement, Proposition 1, to this situation yields then the
following.

Corollary 3. Let A be a unital algebra and let elements x1 , . . . , xn ∈ A be given.

Assume P ∈ ChX1 , . . . , Xn i has a linearization

P̂ = b0 ⊗ 1 + b1 ⊗ X1 + · · · + bn ⊗ Xn ∈ MN (C) ⊗ ChX1 , . . . , Xn i

with matrices b0 , . . . , bn ∈ MN (C). Then the following conditions are equivalent for
any complex number z ∈ C:
(i) The operator z − p with p := P(x1 , . . . , xn ) is invertible in A.
(ii) The operator Λ (z) − p̂ with Λ (z) defined as in (10.5) and

p̂ := b0 ⊗ 1 + b1 ⊗ x1 + · · · + bn ⊗ xn ∈ MN (C) ⊗ A

is invertible in MN (C) ⊗ A.
Moreover, if (i) and (ii) are fulfilled for some z ∈ C, we have that

(Λ (z) − p̂)−1 1,1 = (z − p)−1 .

Proof: By the definition of a linearization, Definition 2, we have a block decompo-

sition of the form
0u
p̂ := ∈ MN (C) ⊗ A
vq
where u = U(x1 , . . . , xn ), v = V (x1 , . . . , xn ) and q = Q(x1 , . . . , xn ). Furthermore we
know that q ∈ MN−1 (C) ⊗ A is invertible and p = −uq−1 v holds. This implies

z −u
Λ (z) − p̂ = ,
−v −q

and the statements follow from Proposition 1.

Now, it only remains to ensure the existence of linearizations of this kind.
258 10 Polynomials in Free Variables and Operator-Valued Convolution

Proposition 4. Any polynomial P ∈ ChX1 , . . . , Xn i admits a linearization P̂ in the

sense of Definition 2. If P is self-adjoint, then the linearization can be chosen to be
self-adjoint.

The proof follows by combining the following simple observations.

Exercise 1. (i) Show that X j ∈ ChX1 , . . . , Xn i has a linearization

0 Xj
X̂ j = ∈ M2 (C) ⊗ ChX1 , . . . , Xn i.
1 −1

(This statement looks simplistic taken for itself, but it will be useful when combined
with the third part.)
(ii) A monomial of the form P := Xi1 Xi2 · · · Xik ∈ ChX1 , . . . , Xn i for k ≥ 2, i1 , . . . , ik ∈
{1, . . . , n} has a linearization
 
Xi1
 Xi2 −1
P̂ =   ∈ Mk (C) ⊗ ChX1 , . . . , Xn i.
 
. .
 .. .. 
Xik −1

(iii) If the polynomials P1 , . . . , Pk ∈ ChX1 , . . . , Xn i have linearizations

0 Uj
P̂j = ∈ MN j (C) ⊗ ChX1 , . . . , Xn i
Vj Q j

for j = 1, . . . , n, then their sum P := P1 + · · · + Pk has the linearization


0 U1 . . . Uk

V1 Q1 
P̂ =  .  ∈ MN (C) ⊗ ChX1 , . . . , Xn i
 
 .. ..
. 
Vk Qk

with N := (N1 + · · · + Nk ) − k + 1.
(iv) If
0U
∈ MN (C) ⊗ ChX1 , . . . , Xn i
V Q
is a linearization of P, then

0 U V∗
 
U ∗ 0 Q∗  ∈ M2N−1 (C) ⊗ ChX1 , . . . , Xn i
V Q 0

is a linearization of P + P∗ .
10.4 Analytic theory of operator-valued convolutions 259

10.4 Analytic theory of operator-valued convolutions

In the last two sections we indicated how problems in free variables can be trans-
formed into operator-valued simpler problems. In particular, the distribution of a
self-adjoint polynomial p = P(x1 , . . . , xn ) in free variables x1 , . . . , xn can be deduced
from the operator-valued distribution of a corresponding linearization

p̂ := b0 ⊗ 1 + b1 ⊗ x1 + · · · + bn ⊗ xn ∈ MN (C) ⊗ A.

Note that for this linearization the freeness of the variables plays no role. Where it
becomes crucial is the observation that the freeness of x1 , . . . , xn implies, by Corol-
lary 9.14, the freeness over MN (C) of b1 ⊗ x1 , . . . , bn ⊗ xn . (Note that there is no
classical counter part of this for the case of independent variables.) Hence the distri-
bution of p̂ is given by the operator-valued free additive convolution of the distribu-
tions of b1 ⊗ x1 , . . . , bn ⊗ xn . Furthermore, since the distribution of xi determines also
the MN (C)-valued distribution of bi ⊗ xi , we have finally reduced the determination
of the distribution of P(x1 , . . . , xn ) to a problem involving operator-valued additive
free convolution. As pointed out in Section 9.2 we can in principle deal with such a
convolution.
However, in the last chapter we treated the relevant tools, in particular the
operator-valued R-transform, only as formal power series and it is not clear how one
should be able to derive explicit solutions from such formal equations. But worse,
even if the operator-valued Cauchy and R-transforms are established as analytic ob-
jects, it is not clear how to solve operator-valued equations like the one in Theorem
9.11. There are rarely any non-trivial operator-valued examples where an explicit
solution can be written down; and also numerical methods for such equations are
problematic — a main obstacle being, that those equations usually have many so-
lutions, and it is apriori not clear how to isolate the one with the right positivity
properties. As we have already noticed in the scalar-valued case, it is the subordina-
tion formulation of those convolutions which comes to the rescue. From an analytic
and also a numerical point of view, the subordination function is a much nicer object
than the R-transform.
So, in order to make good use of our linearization algorithm, we need also a well-
developed subordination theory of operator-valued free convolution. Such a theory
exists and we will present in the following the relevant statements. For proofs and
more details we refer to the original papers [23, 25].

10.4.1 General notations

A C∗ -operator-valued probability space (M, E, B) is an operator-valued probability

space, where M is a C∗ -algebra, B is a C∗ -subalgebra of M and E is completely
positive. In such a setting we use for x ∈ M the notation x > 0 for the situation
260 10 Polynomials in Free Variables and Operator-Valued Convolution

where x ≥ 0 and x is invertible; note that this is equivalent to the fact that there
exists a real ε > 0 such that x ≥ ε1. . Any element x ∈ M can be uniquely written
as x = Re(x) + i Im(x), where Re(x) = (x + x∗ )/2 and Im(x) = (x − x∗ )/(2i) are
self-adjoint. We call Re(x) and Im(x) the real and imaginary part of x.
The appropriate domain for the operator-valued Cauchy transform Gx for a self-
adjoint element x = x∗ is the operator upper half-plane

H+ (B) := {b ∈ B : Im(b) > 0}.

Elements in this open set are all invertible, and H+ (B) is invariant under conjugation
by invertible elements in B, i.e. if b ∈ H+ (B) and c ∈ GL(B) is invertible, then
cbc∗ ∈ H+ (B).
We shall use the following analytic mappings, all defined on H+ (B); all trans-
forms have a natural Schwarz-type analytic extension to the lower half-plane given
by f (b∗ ) = f (b)∗ ; in all formulas below, x = x∗ is fixed in M:
◦ the moment-generating function:

Ψx (b) = E (1 − bx)−1 − 1 = E (b−1 − x)−1 b−1 − 1 = Gx (b−1 )b−1 − 1;

(10.6)
◦ the reciprocal Cauchy transform:
−1
Fx (b) = E (b − x)−1 = Gx (b)−1 ;

(10.7)

◦ the eta transform:

ηx (b) = Ψx (b)(1 +Ψx (b))−1 = 1 − bFx (b−1 ); (10.8)

◦ the h transform:
−1
hx (b) = E (b − x)−1

− b = Fx (b) − b. (10.9)

10.4.2 Operator-valued additive convolution

Here is now the main theorem from [23] on operator-valued free additive convolu-
tion.

Theorem 5. Assume that (M, E, B) is a C∗ -operator-valued probability space and

x, y ∈ M are two self-adjoint operator-valued random variables which are free over
B. Then there exists a unique pair of Fréchet (and thus also Gateaux) analytic maps
ω1 , ω2 : H+ (B) → H+ (B) so that
(i) Im(ω j (b)) ≥ Im(b) for all b ∈ H+ (B), j ∈ {1, 2};
(ii) Fx (ω1 (b)) + b = Fy (ω2 (b)) + b = ω1 (b) + ω2 (b) for all b ∈ H+ (B);
10.5 Numerical Example 261

(iii) Gx (ω1 (b)) = Gy (ω2 (b)) = Gx+y (b) for all b ∈ H+ (B).
Moreover, if b ∈ H+ (B), then ω1 (b) is the unique fixed point of the map

fb : H+ (B) → H+ (B), fb (w) = hy (hx (w) + b) + b,

and
ω1 (b) = lim fb◦n (w) for any w ∈ H+ (B),
n→∞

where fb◦n denotes the n-fold composition of fb with itself. Same statements hold for
ω2 , with fb replaced by w 7→ hx (hy (w) + b) + b.

10.4.3 Operator-valued multiplicative convolution

There is also an analogous theorem for treating the operator-valued multiplicative

free convolution, see [25].

Theorem 6. Let (M, E, B) be a W ∗ -operator-valued probability space; i.e., M is

a von Neumann algebra and B a von Neumann subalgebra. Let x > 0, y = y∗ ∈ M
be two random variables with invertible expectations, free over B. There exists a
Fréchet holomorphic map ω2 : {b ∈ B : Im(bx) > 0} → H+ (B), such that
(i) ηy (ω2 (b)) = ηxy (b), Im(bx) > 0;
(ii) ω2 (b) and b−1 ω2 (b) are analytic around zero;
(iii) for any b ∈ B so that Im(bx) > 0, the map gb : H+ (B) → H+ (B), gb (w) =
bhx (hy (w)b) is well defined, analytic and for any fixed w ∈ H+ (B),

ω2 (b) = lim g◦n

b (w),
n→∞

in the weak operator topology.

Moreover, if one defines ω1 (b) := hy (ω2 (b))b, then

ηxy (b) = ω2 (b)ηx (ω1 (b))ω2 (b)−1 , Im(bx) > 0.

10.5 Numerical Example

Let us present a numerical example for the calculation of self-adjoint polynomials
in free variables. We consider the polynomial p = P(x, y) = xy + yx + x2 in the free
variables x and y. This p has a linearization

x y + 2x
 
0
p̂ =  x 0 −1  ,
y + 2x −1 0
262 10 Polynomials in Free Variables and Operator-Valued Convolution

which means that the Cauchy transform of p can be recovered from the operator-
valued Cauchy transform of p̂, namely we have
!
z 00
ϕ((z − p)−1 ) ∗

−1
G p̂ (b) = (id ⊗ ϕ)((b − p̂) ) = for b = 0 0 0 .
∗ ∗ 000

But this p̂ can now be written as

0 x 2x
   
0 0 y
p̂ =  x 0 −1 + 0 0 0 = X̃ + Ỹ
x
2 −1 0 y 0 0

and hence is the sum of two self-adjoint variables X̃ and Ỹ , which are free over
M3 (C). So we can use the subordination result from Theorem 5 in order to calculate
the Cauchy transform G p of p:

G p (z) ∗
= G p̂ (b) = GX̃+Ỹ (b) = GX̃ (ω1 (b)),
∗ ∗

where ω1 (b) is determined by the fixed point equation from Theorem 5.

There are no explicit solutions of those fixed point equations in M3 (C), but a
numerical implementation relying on iterations is straightforward. One point to note
is that b as defined above is not in the open set H+ (M3 (C)), but lies on its boundary.
Thus, in order to be in the frame as needed in Theorem 5, one has to move inside
the upper half-plane, by replacing
   
z 0 0 z 0 0
b = 0 0 0 by 0 iε 0 
0 0 0 0 0 iε

and send ε > 0 to zero at the end.

Figure 10.1 shows the agreement between the achieved theoretic result and the
histogram of the eigenvalues of a corresponding random matrix model.

10.6 The Case of Rational Functions

As we mentioned before the linearization procedure works as well in the case of
non-commutative rational functions. Here is an example of such a case.
Consider the following self-adjoint rational function
−1
r(x1 , x2 ) = (4−x1 )−1 +(4−x1 )−1 x2 (4 − x1 ) − x2 (4 − x1 )−1 x2 x2 (4−x1 )−1

in two free variables x1 and x2 . The fact that we can write it as

10.6 The Case of Rational Functions 263

7
0.35

6
0.3

0.25 5

0.2 4

0.15 3

0.1 2

0.05
1

0
−5 0 5 10 0
0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65

Fig. 10.1 Plots of the distribution of p(x, y) = xy + yx + x2 (left) for free x, y, where x is semi-
circular and y Marchenko-Pastur, and of the rational function r(x1 , x2 ) (right) for free semicircular
elements x1 and x2 ; in both cases the theoretical limit curve is compared with the histogram of the
eigenvalues of a corresponding random matrix model

1 − 1 x1 − 1 x2 −1 1

1 4 4 2
r(x1 , x2 ) = 0
2 − 14 x2 1 − 14 x1 0

gives us immediately a self-adjoint linearization of the form

 1 
0 2 0
r̂(x1 , x2 ) =  12 −1 + 41 x1 1
4 x2

1 1
0 4 x2 −1 + 4 x1

 1   
0 2 0 0 0 0
1 
=  12 −1 + 41 x1 0  + 0 0 4 x2 .
1
0 0 −1 + 14 x1 0 4 x2 0

So again, we can write the linearization as the sum of two M3 (C)-free variables
and we can invoke Theorem 5 for the calculation of its operator-valued Cauchy
transform. In Figure 10.1, we compare the histogram of eigenvalues of r(X1 , X2 ) for
one realization of independent Gaussian random matrices X1 , X2 of size 1000×1000
with the distribution of r(x1 , x2 ) for free semi-circular elements x1 , x2 , calculated
according to this algorithm.
264 10 Polynomials in Free Variables and Operator-Valued Convolution

Other examples for the use of operator-valued free probability methods can be
found in [12].

10.7 Additional exercise

Exercise 2. Consider the C∗ -algebra Mn (C) of n × n matrices over C. By definition

we have
H+ (Mn (C)) := {B ∈ Mn (C) | ∃ε > 0 : Im(B) ≥ ε1},
where Im(B) := (B − B∗ )/(2i).
(i) In the case n = 2, show that in fact

+ b11 b12 1 2
H (M2 (C)) := Im(b11 ) > 0, Im(b11 )Im(b22 ) > |b12 − b21 | .
b21 b22 4

(ii) For general n ∈ N, prove: if a matrix B ∈ Mn (C) belongs to H+ (Mn (C)) then
all eigenvalues of B lie in the complex upper half-plane C+ . Is the converse also
true?
Chapter 11
Brown Measure

The Brown measure is a generalization of the eigenvalue distribution for a gen-

eral (not necessarily normal) operator in a finite von Neumann algebra (i.e, a von
Neumann algebra which possesses a trace). It was introduced by Larry Brown in
[46], but fell into obscurity soon after. It was revived by Haagerup and Larsen [85],
and played an important role in Haagerup’s investigations around the invariant sub-
space problem [87]. By using a “hermitization” idea one can actually calculate the
Brown measure by M2 (C)-valued free probability tools. This leads to an extension
of the algorithm from the last chapter to the calculation of arbitrary polynomials in
free variables. For generic non-self-adjoint random matrix models their asymptotic
complex eigenvalue distribution is expected to converge to the Brown measure of
the (∗-distribution) limit operator. However, because the Brown measure is not con-
tinuous with respect to convergence in ∗-moments this is an open problem in the
general case.

11.1 Brown measure for normal operators

Let (M, τ) be a W ∗ -probability space and consider an operator a ∈ M. The relevant
information about a is contained in its ∗-distribution which is by definition the col-
lection of all ∗-moments of a with respect to τ. In the case of self-adjoint or normal
a we can identify this distribution with an analytic object, a probability measure µa
on the spectrum of a. Let us first recall these facts.
If a = a∗ is self-adjoint, there exists a uniquely determined probability measure
µa on R such that for all n ∈ N
Z
τ(an ) = t n dµa (t)
R

and the support of µa is the spectrum of a; see also the discussion after equation
(2.2) in Chapter 2.
More general, if a ∈ M is normal (i.e., aa∗ = a∗ a), then the spectral theorem
provides us with a projection valued spectral measure Ea and the Brown measure is

265
266 11 Brown Measure

just the spectral measure µa = τ ◦ Ea . Note that in the normal case µa may not be
determined by the moments of a. Indeed, if a = u is a Haar unitary then the moments
of u are the same as the moments of the zero operator. Of course, their ∗-moments
are different. For a normal operator a its spectral measure µa is uniquely determined
by Z
τ(an a∗m ) = zn z̄m dµa (z) (11.1)
C
for all m, n ∈ N. The support of µa is again the spectrum of a.
We will now try to assign to any operator a ∈ M a probability measure µa on
its spectrum, which contains relevant information about the ∗-distribution of a. This
µa will be called the Brown measure of a. One should note that for non-normal
operators there are many more ∗-moments of a than those appearing in (11.1). There
is no possibility to capture all the ∗-moments of a by the ∗-moments of a probability
measure. Hence, we will necessarily loose some information about the ∗-distribution
of a when we go over to the Brown measure of a. It will also turn out that we
need our state τ to be a trace in order to define µa . Hence in the following we will
only work in tracial W ∗ -probability spaces (M, τ). Recall that this means that τ is
a faithful and normal trace. Von Neumann algebras which admit such faithful and
normal traces are usually addressed as finite von Neumann algebras. If M is a finite
factor, then a tracial state τ : M → C is unique on M and is automatically normal
and faithful.

11.2 Brown measure for matrices

In the finite-dimensional case M = Mn (C), the Brown measure µT for a normal
matrix T ∈ Mn (C), determined by (11.1), really is the eigenvalue distribution of the
matrix. It is clear that in the case of matrices we can extend this definition to the
general, non-normal case. For a general matrix T ∈ Mn (C), the spectrum σ (T ) is
given by the roots of the characteristic polynomial

P(λ ) = det(λ I − T ) = (λ − λ1 ) · · · (λ − λn ),

where λ1 , . . . , λn are the roots repeated according to algebraic multiplicity. In this

case we have as eigenvalue distribution (and thus as Brown measure)
1
µT = (δλ1 + · · · + δλn ).
n
We want to extend this definition of µT to an infinite dimensional situation. Since
the characteristic polynomial does not make sense in such a situation we have to find
an analytic way of determining the roots of P(λ ) which survives also in an infinite
dimensional setting.
Consider
n
log |P(λ )| = log | det(λ I − T )| = ∑ log |λ − λi |.
i=1
11.2 Brown measure for matrices 267

We claim that the function λ 7→ log |λ | is harmonic in C\{0} and that in general
it has Laplacian
∇2 log |λ | = 2πδ0 (11.2)
in the distributional sense. Here the Laplacian is given by

∂2 ∂2
∇2 = + ,
∂ λr2 ∂ λi2

where λr and λi are the real and imaginary part of λ ∈ C. (Note that we use the sym-
bol ∇2 for the Laplacian, since we reserve the symbol ∆ for the Fuglede-Kadison
determinant of the next section.)
Let us prove this claim on the behaviour of log |λ |. For λ 6= 0 we write ∇2 in
terms of polar coordinates,

∂2 1 ∂ 1 ∂2
∇2 = + +
∂ r2 r ∂ r r2 ∂ θ 2
and have
∂2

1 ∂ 1 1
∇2 log |λ | = 2
+ log r = − 2 + 2 = 0.
∂r r ∂r r r
Ignoring the singularity at 0 we can write formally
Z Z
∇2 log |λ |dλr dλi = div(grad log |λ |)dλr dλi
B(0,r) B(0,r)
Z
= grad log |λ | · ndA
∂ B(0,r)
n
Z
= · ndA
∂ B(0,r) r
1
= · 2πr
r
= 2π.

That is Z
∇2 log |λ |dλr dλi = 2π,
B(0,r)

independent of r > 0. Hence ∇2 log |λ | must be 2πδ0 .

Exercise 1. By integrating against a test function show rigorously that ∇2 log |λ | =
2πδ0 as distributions.
Given the fact (11.2), we can now rewrite the eigenvalue distribution µT in the
form
268 11 Brown Measure

1 1 2 n 1 2
µT = (δλ1 + · · · + δλn ) = ∇ ∑ log |λ − λi | = ∇ log | det(T − λ I)|.
n 2πn i=1 2πn
(11.3)
As there exists a version of the determinant in an infinite dimensional setting we can
use this formula to generalize the definition of µT .

11.3 Fuglede-Kadison determinant in finite von Neumann algebras

In order to use (11.3) in infinite dimensions we need a generalization of the deter-
minant. Such a generalization was provided by Fuglede and Kadison [75] in 1952
for operators in a finite factor M; the case of a general finite von Neumann algebra
is an straightforward extension.

Definition 1. Let (M, τ) be a tracial W ∗ -probability space and consider a ∈ M. Its

Fuglede-Kadison determinant ∆ (a) is defined as follows. If a is invertible one can
put
∆ (a) = exp[τ(log |a|)] ∈ (0, ∞),
where |a| = (a∗ a)1/2 . More generally, we define

∆ (a) = lim exp τ log(a∗ a + ε)1/2 ∈ [0, ∞).

ε&0

By functional calculus and the monotone convergence theorem, the limit always
exists.

This determinant ∆ has the following properties:

◦ ∆ (ab) = ∆ (a)∆ (b) for all a, b ∈ M.
◦ ∆ (a) = ∆ (a∗ ) = ∆ (|a|) for all a ∈ M.
◦ ∆ (u) = 1 when u is unitary.
◦ ∆ (λ a) = |λ |∆ (a) for all λ ∈ C and a ∈ M.
◦ a 7→ ∆ (a) is upper semicontinuous in norm-topology and in k · k p -norm for all
p > 0.
Let us check what this definition gives in the case of matrices, M = Mn (C), τ = tr.
If T is invertible, then we can write
 
t1 . . . 0
|T | = U  ... . . . ...  U ∗ ,
 

0 . . . tn

with ti > 0. Then we have

 
logt1 . . . 0
log |T | = U  ... . . . ...  U ∗
 

0 . . . logtn
11.4 Subharmonic functions and their Riesz measures 269

and
√

1 p p
∆ (T ) = exp (logt1 + · · · + logtn ) = n t1 · · ·tn = n det |T | = n | det T |. (11.4)
n

Note that det |T | = | det T |, because we have the polar decomposition T = V |T |,

where V is unitary and hence | detV | = 1.
Thus we have in finite dimensions
1 2 1 2
µT = ∇ log | det(T − λ I)| = ∇ (log ∆ (T − λ I)).
2πn 2π
So we are facing the question whether it is possible to make sense out of
1 2
∇ (log ∆ (a − λ )) (11.5)
2π
for operators a in general finite von Neumann algebras, where ∆ denotes the
Fuglede-Kadison determinant. (Here and in the following we will write a − λ for
a − λ 1.)

11.4 Subharmonic functions and their Riesz measures

Definition 2. A function f : R2 → [−∞, ∞) is called subharmonic if
(i) f is upper semicontinuous, i.e.,

f (z) ≥ lim sup f (zn ), whenever zn → z;

n→∞

(ii) f satisfies the submean inequality: for every circle the value of f at the centre is
less or equal to the mean value of f over the circle, i.e.
Z 2π
1
f (z) ≤ f (z + reiθ )dθ ;
2π 0

(iii) f is not constantly equal to −∞.

If f is subharmonic then f is Borel measurable, f (z) > −∞ almost everywhere

1 (R2 ). One has the following classical
with respect to Lebesgue measure and f ∈ Lloc
theorem for subharmonic functions; see, e.g., [13, 92]

Theorem 3. If f is subharmonic on R2 ≡ C, then ∇2 f exists in the distributional

sense and it is a positive Radon measure ν f ; i.e., ν f is uniquely determined by

1
Z Z
f (λ ) · ∇2 ϕ(λ )dλr dλi = ϕ(z)dν f (z) for all ϕ ∈ Cc∞ (R2 ).
2π R2 C

If ν f has compact support then

270 11 Brown Measure

1
Z
f (λ ) = log |λ − z|dν f (z) + h(λ ),
2π C

where h is a harmonic function on C.

Definition 4. The measure ν f = ∇2 f is called the Riesz measure of the subharmonic
function f .

11.5 Definition of the Brown measure

If we apply this construction to our question about (11.5), we get the construction of
the Brown measure as follows. This was defined by L. Brown in [46] (for the case
of factors); for more information see also [85].
Theorem 5. Let (M, τ) be a tracial W ∗ -probability space. Then we have:
(i) The function λ 7→ log ∆ (a − λ ) is subharmonic.
(ii) The corresponding Riesz measure

1 2
µa := ∇ log ∆ (a − λ ) (11.6)
2π
is a probability measure on C with support contained in the spectrum of a.
(iii) Moreover, one has for all λ ∈ C
Z
log |λ − z|dµa (z) = log ∆ (a − λ ) (11.7)
C

and this characterizes µa among all probability measures on C.

Definition 6. The measure µa from Theorem 5 is called the Brown measure of a.

Proof: [Sketch of Proof of Theorem 5(i)] Suppose a ∈ M. We want to show that

f (λ ) := log ∆ (a − λ ) is subharmonic. We have

∆ (a) = lim exp τ log(a∗ a + ε)1/2 .

ε&0

Thus
1
log ∆ (a) = lim τ(log(a∗ a + ε)),
2 ε&0
as a decreasing limit as ε & 0. So, with the notations
1
aλ := a − λ , fε (λ ) := τ(log(a∗λ aλ + ε)),
2
we have
f (λ ) = lim fε (λ ).
ε&0

For ε > 0, the function fε is a C2 -function, and therefore fε being subharmonic is

equivalent to ∇2 fε ≥ 0 as a function. But ∇2 fε can be computed explicitly:
11.5 Definition of the Brown measure 271

∇2 fε (λ ) = 2ετ (aλ a∗λ + ε)−1 (a∗λ aλ + ε)−1 .

(11.8)

Since we have for general positive operators x and y that τ(xy) = τ(x1/2 yx1/2 ) ≥ 0,
we see that ∇2 fε (λ ) ≥ 0 for all λ ∈ C and thus fε is subharmonic.
The fact that fε & f implies then that f is upper semicontinuous and satisfies
the submean inequality. Furthermore, if λ 6∈ σ (a) then a − λ is invertible, hence
∆ (a − λ ) > 0, and thus f (λ ) 6= −∞. Hence f is subharmonic.

Exercise 2. We want to prove here (11.8). We consider fε (λ ) as a function in λ and

λ̄ ; hence the Laplacian is given by (where as usual λ = λr + iλi )

∂2 ∂2 ∂2
∇2 = 2
+ 2
=4
∂ λr ∂ λi ∂ λ̄ ∂ λ

where
∂ 1 ∂ ∂ ∂ 1 ∂ ∂
= −i , = +i .
∂λ 2 ∂ λr ∂ λi ∂ λ̄ 2 ∂ λr ∂ λi
(i) Show that we have for each n ∈ N (by relying heavily on the fact that τ is a
trace)
∂
τ[(a∗λ aλ )n ] = −nτ[(a∗λ aλ )n−1 a∗λ ]
∂λ
and
n
∂
τ[(a∗λ aλ )n a∗λ ] = − ∑ τ[(aλ a∗λ ) j (a∗λ aλ )n− j ].
∂ λ̄ j=0

(ii) Prove (11.8) by using the power series expansion of

a∗ aλ

log(a∗λ aλ + ε) = log ε + log 1 + λ .
ε

In the case of a normal operator the Brown measure is just the spectral measure
τ ◦ Ea , where Ea is the projection valued spectral measure according to the spectral
theorem. In that case µa is determined by the equality of the ∗-moments of µa and
of a, i.e., by Z
zn zm dµa (z) = τ(an a∗m ) if a is normal
C
for all m, n ∈ N. If a is not normal, then this equality does not hold anymore. Only
the equality of the moments is always true, i.e., for all n ∈ N
Z Z
zn dµa (z) = τ(an ) and z̄n dµa (z) = τ(a∗n ).
C C

One should note, however, that the Brown measure of a is in general actually
determined by the ∗-moments of a. This is the case, since τ is faithful and the
272 11 Brown Measure

Brown measure depends only on τ restricted to the von Neumann algebra generated
by a; the latter is uniquely determined by the ∗-moments of a, see also Chapter 6,
Theorem 6.2.
What one can say in general about the relation between the ∗-moments of µa and
of a is the following generalized Weyl Inequality of Brown [46]. For any a ∈ M and
0 < p < ∞ we have Z
|z| p dµa (z) ≤ kak pp = τ(|a| p ).
C
This was strengthened by Haagerup and Schultz [87] in the following way: If Minv
denotes the invertible elements in M, then we actually have for all a ∈ M and every
p > 0 that Z
|z| p dµa (z) = inf kbab−1 k pp .
C b∈Minv

Note here that because of ∆ (bab−1 ) = ∆ (a) we have µbab−1 = µa for b ∈ Minv .
Exercise 3. Let (M, τ) be a tracial W ∗ -probability space and a ∈ M. Let p(z) be a
polynomial in the variable z (not involving z̄), hence p(a) ∈ M. Show that the Brown
measure of p(a) is the push-forward of the Brown measure of a, i.e., µ p(a) = p∗ (µa ),
where the push-forward p∗ (ν) of a measure ν is defined by p∗ (ν)(E) = ν(p−1 (E))
for any measurable set E.
The calculation of the Brown measure of concrete non-normal operators is usu-
ally quite hard and there are not too many situations where one has explicit solutions.
We will in the following present some of the main concrete results.

11.6 Brown measure of R-diagonal operators

R-diagonal operators were introduced by Nica and Speicher [139]. They provide
a class of, in general non-normal, operators which are usually accessible to con-
crete calculations. In particular, one is able to determine their Brown measure quite
explicitly.
R-diagonal operators can be considered in general ∗-probability spaces, but we
will restrict here to the tracial W ∗ -probability space situation; only there the notion
of Brown measure makes sense.

Definition 7. An operator a in a tracial W ∗ -probability space (M, τ) is called R-

diagonal if its only non-vanishing ∗-cumulants (i.e. cumulants where each argu-
ment is either a or a∗ ) are alternating, i.e., of the form κ2n (a, a∗ , a, a∗ , . . . , a, a∗ ) =
κ2n (a∗ , a, a∗ , a . . . , a∗ , a) for some n ∈ N.

Main examples for R-diagonal operators are Haar unitaries and Voiculescu’s cir-
cular operator. With the exception of multiples of Haar unitaries, R-diagonal opera-
tors are not normal. One main characterization [139] of R-diagonal operators is the
following: a is R-diagonal if and only if a has the same ∗-distribution as up where
u is a Haar unitary, p ≥ 0, and u and p are ∗-free. If ker(a) = {0}, then this can be
11.6 Brown measure of R-diagonal operators 273

refined to the characterization that R-diagonal operators have a polar decomposition

of the form a = u|a|, where u is Haar unitary and |a| is ∗-free from u.
The Brown measure of R-diagonal operators was calculated by Haagerup and
Larsen [85]. The following theorem contains their main statements on this.

Theorem 8. Let (M, τ) be a tracial W ∗ -probability space and a ∈ M be R-diagonal.

Assume that ker(a) = {0} and that a∗ a is not a constant operator. Then we have the
following.
(i) The support of the Brown measure µa is given by

supp(µa ) = {z ∈ C | ka−1 k−1

2 ≤ |z| ≤ kak2 }, (11.9)

where we put ka−1 k−1

2 = 0 if a
−1 6∈ L2 (M, τ).

(ii) µa is invariant under rotations about 0 ∈ C.

(iii) For 0 < t < 1, we have

1
µa (B(0, r)) = t for r= p , (11.10)
Sa∗ a (t − 1)

where Sa∗ a is the S-transform of the operator a∗ a and B(0, r) is the open disc
with radius r.
(iv) The conditions (i), (ii), and (iii) determine µa uniquely.
(v) The spectrum of an R-diagonal operator a coincides with supp(µa ) unless a−1 ∈
L2 (M, τ)\M in which case supp(µa ) is the annulus (11.9), while the spectrum of
a is the full closed disc with radius kak2 .

For the third part, one has to note that

1
t 7→ p
Sa∗ a (t − 1)

maps (0, 1) onto (ka−1 k−1

2 , kak2 ).

11.6.1 A little about the proof

We give some key ideas of the proof from [85]; for another proof see [158].
Consider λ ∈ C and put α := |λ |. A key point is to find a relation between µ|a|
and µ|a−λ | . For a probability measure σ , we denote its symmetrized version by σ̃ ,
i.e., for any measurable set E we have σ̃ (E) = (σ (E) + σ (−E))/2. Then one has
the relation
1
µ̃|a−λ | = µ̃|a| (δα + δ−α ), (11.11)
2
or in terms of the R-transforms:
274 11 Brown Measure
√
1 + 4α 2 z2 − 1
Rµ̃|a−λ | (z) = Rµ̃|a| (z) + .
2z
Hence µ|a| determines µ|a−λ | , which determines
Z Z ∞
log |λ − z|dµa (z) = log ∆ (a − λ ) = log ∆ (|a − λ |) = log(t)dµ|a−λ | (t).
C 0

Exercise 4. Prove (11.11) by showing that if a is R-diagonal then the matrices

0 a 0λ
and
a∗ 0 λ̄ 0

are free in the (M2 (C) ⊗ M, tr ⊗ τ).

11.6.2 Example: circular operator

√
Let us consider, as a concrete example, the circular operator c = (s1 + is2 )/ 2,
where s1 and s2 are free standard semi-circular elements. √
The distribution of c∗ c is free Poisson with rate 1, given by the density 4 − t/2πt
on [0, 4], and thus the distribution µ|c| of the absolute value |c| is the quarter-circular
√
distribution with density 4 − t 2 /π on [0, 2]. We have kck2 = 1 and kc−1 k2 = ∞,
and hence the support of the Brown measure of c is the closed unit disc, supp(µc ) =
B(0, 1). This coincides with the spectrum of c.
In order to apply Theorem 8, we need to calculate the S-transform of c∗ c. We have
Rc c (z) = 1/(1−z), and thus Sc∗ c (z) = 1/(1+z) (because z 7→ zR(z) and w 7→ wS(w)
∗

are inverses of each other; see [140, Remark 16.18] and also the discussion √ around
[140, Eq. (16.8)]). So, for 0 < t < 1, we have Sc∗ c (t − 1) = 1/t. Thus µc (B(0, t)) =
t, or, for 0 < r < 1, µc (B(0, r)) = r2 . Together with the rotation invariance this shows
that µc is the uniform measure on the unit disc B(0, 1).

11.6.3 The circular law

The circular law is the non-self-adjoint version of Wigner’s semicircle law. Con-
sider an N × N matrix where all entries are independent and identically distributed.
If the distribution of the entries is Gaussian then this ensemble is also called Gini-
bre ensemble. It is very easy to check that the ∗-moments of the Ginibre random
matrices converge to the corresponding ∗-moments of the circular operator. So it is
quite plausible to expect that the Brown measure (i.e., the eigenvalue distribution)
of the Ginibre random matrices converges to the Brown measure of the circular op-
11.6 Brown measure of R-diagonal operators 275

erator, i.e., to the uniform distribution on the disc. This statement is known as the
circular law. However, one has to note that the above is not a proof for the circu-
lar law, because the Brown measure is not continuous with respect to our notion of
convergence in ∗-distribution. One can construct easily examples where this fails.
Exercise 5. Consider the sequence (TN )N≥2 of nilpotent N × N matrices
 
  0 1 0 0 0
  0 1 0 0
010 0 0 1 0 0
01 0 0 1 0  
T2 = , T3 = 0 0 1 , T4 =  ,···
, T5 = 
00 0 0 0 1 0 0 0 1 0
000 0 0 0 0 1
0 0 0 0
0 0 0 0 0

Show that,
◦ with respect to tr, TN converges in ∗-moments to a Haar unitary element,
◦ the Brown measure of a Haar unitary element is the uniform distribution on the
circle of radius 1
◦ but the asymptotic eigenvalue distribution of TN is given by δ0 .

However, for nice random matrix ensembles the philosophy of convergence of

the eigenvalue distribution to the Brown measure of the limit operator seems to be
correct. For the Ginibre ensemble one can write down quite explicitly its eigenvalue
distribution and then it is easy to check the convergence to the circular law. If the
distribution of the entries is not Gaussian, then one still has convergence to the cir-
cular law under very general assumptions (only second moment of the distribution
has to exist), but proving this in full generality has only been achieved recently. For
a survey on this see [42, 172].

11.6.4 The single ring theorem

There are also canonical random matrix models for R-diagonal operators. If one
considers on (non-self-adjoint) N × N matrices a density of the form
N ∗ A))
PN (A) = const · e− 2 Tr( f (A

then one can check, under suitable assumptions on the function f , that the ∗-
distribution of the corresponding random matrix A converges to an R-diagonal op-
erator (whose concrete form is of course determined in terms of f ). So again one
expects that the eigenvalue distribution of those random matrices converges to the
Brown measure of the limit R-diagonal operator, whose form is given in Theorem
8. (In particular, this limiting eigenvalue distribution lives on an, possibly degener-
ate, annulus, i.e. a single ring, even if f has several minima.) This has been proved
recently by Guionnet, Krishnapur, and Zeitouni [82].
276 11 Brown Measure

11.7 Brown measure of elliptic operators

An elliptic operator is of the form a = αs1 + iβ s2 , where α, β > 0 and s1 and s2 are
free standard semi-circular operators. An elliptic operator is not R-diagonal, unless
α = β (in which case it is a circular operator). The following theorem was proved
by Larsen [116] and by Biane and Lehner [39].

Theorem 9. Consider the elliptic operator

π
a = (cos θ )s1 + i(sin θ )s2 , 0<θ < .
2
Put γ := cos(2θ ) and λ = λr + iλi . Then the spectrum of a is the ellipse

λr2 λi2

σ (a) = λ ∈ C | + ≤ 1 ,
(1 + γ)2 (1 − γ)2

and the Brown measure µa is the measure with constant density on σ (a):

1
dµa (λ ) = 1 (λ )dλr dλi .
π(1 − γ 2 ) σ (a)

11.8 Brown measure for unbounded operators

The Brown measure can also be extended to unbounded operators which are affil-
iated to a tracial W ∗ -probability space; for the notion of “affiliated operators” see
our discussion before Definition 8.15 in Chapter 8. This extension of the Brown
measure was done by Haagerup and Schultz in [86].
∆ and µa can be defined for unbounded a provided 1∞ log(t)dµ|a| (t) < ∞, in
R

which case Z ∞
∆ (a) = exp log(t)dµ|a| (t) ∈ [0, ∞),
0

and the Brown measure µa is still determined by (11.7).

Example 10. Let c1 and c2 be two ∗-free circular elements and consider a := c1 c−1 2 .
If c1 , c2 live in the tracial W ∗ -probability space (M, τ), then a ∈ L p (M, τ) for 0 < p <
1. In this case, ∆ (a − λ ) and µa are well defined. In order to calculate µa , one has to
extend the class of R-diagonal operators and the formulas for their Brown measure
to unbounded operators. This was done in [86]. Since the product of an R-diagonal
element with a ∗ free element is R-diagonal, too, we have that a is R-diagonal. So
to use (the unbounded version of) Theorem 8 we need to calculate the S-transform
of a∗ a. Since with c2 , also its inverse c−12 is R-diagonal, we have S|a|2 = S|c1 |2 S|c−1 2.
2 |
The S-transform of the first factor is S|c1 |2 (z) = 1/(1 + z), compare Section 11.6.2.
Furthermore, the S-transforms of x and x−1 are, for positive x, in general related by
Sx (z) = 1/Sx−1 (−1 − z). Since |c−1 2 ∗ −2 and since c∗ has the same distribution
2 | = |c2 | 2
as c2 , we have that S|c−1 |2 = S|c2 |−2 and thus
2
11.9 Hermitization method 277

1 1
S|c−1 |2 (z) = S|c2 |−2 = = 1
= −z.
2 S|c2 |2 (−1 − z) 1−1−z

This gives then S|a|2 (z) = −z/(1 + z), for −1 < z < 0, or S|a|2 (t − 1) = (1 − t)/t for
p
0 < t < 1. So our main formula (11.10) from Theorem 8 gives µa (B(0, t/(1 − t))) =
t or µa (B(0, r)) = r2 /(1 + r2 ). We have kak2 = ∞ = ka−1 k2 , and thus supp(µa ) = C.
The above formula for the measure of balls gives then the density
1 1
dµa (λ ) = dλr dλi . (11.12)
π (1 + |λ |2 )2

For more details and, in particular, the proofs of the above used facts about R-
diagonal elements and the relation between Sx and Sx−1 one should see the original
paper of Haagerup and Schultz [86].

11.9 Hermitization method: using operator-valued free probability for

calculating the Brown measure
Note that formula (11.7) for determining the Brown measure can also be written as
Z Z ∞
log |λ − z|dµa (z) = log ∆ (a − λ ) = log ∆ (|a − λ |) = log(t)dµ|a−λ | (t).
C 0
(11.13)
This tells us that we can understand the Brown measure of a non-normal operator
a if we understand the distributions of all Hermitian operators |a − λ | for all λ ∈ C
sufficiently well. In the random matrix literature this idea goes back at least to Girko
[77] and is usually addressed as hermitization method. A contact of this idea with the
world of free probability was made on a formal level in the works of Janik, Nowak,
Papp, Zahed [103] and of Feinberg, Zee [71]. In [24] it was shown that operator-
valued free probability is the right frame to deal with this rigorously. (Examples
for explicit operator-valued calculations were also done before in [1].) Combining
this hermitization idea with the subordination formulation of operator-valued free
convolution allows then to calculate the Brown measure of any (not just self-adjoint)
polynomial in free variables.
In order to make this connection between Brown measure and operator-valued
quantities more precise we first have to rewrite our description of the Brown mea-
sure. In Section 11.5 we have seen that we get the Brown measure of a as the limit
for ε → 0 of

∇2 fε (λ ) = 2ετ (aλ a∗λ + ε)−1 (a∗λ aλ + ε)−1 ,

where aλ := a − λ .

This can also be reformulated in the following form (compare [116], or Lemma 4.2
in [1]: Let us define
−1
Gε,a (λ ) := τ (λ − a)∗ (λ − a)(λ − a)∗ + ε 2 . (11.14)
278 11 Brown Measure

Then
1 ∂
µε,a = Gε,a (λ ) (11.15)
π ∂ λ̄
is a probability measure on the complex plane (whose density is given by ∇2 fε ),
which converges weakly for ε → 0 to the Brown measure of a.
In order to calculate the Brown measure we need Gε,a (λ ) as defined in (11.14).
Let now
0 a
A= ∗ ∈ M2 (M).
a 0
Note that A is self-adjoint. Consider A in the M2 (C)-valued probability space with
respect to E = id ⊗ τ : M2 (M) → M2 (C) given by

a11 a12 τ(a11 ) τ(a12 )
E = .
a21 a22 τ(a21 ) τ(a22 )

For the argument

iε λ
Λε = ∈ M2 (C)
λ̄ iε
consider now the M2 (C)-valued Cauchy transform of A

g (ε, λ ) g12 (ε, λ )
GA (Λε ) = E (Λε − A)−1 =: 11

.
g21 (ε, λ ) g22 (ε, λ )

One can easily check that (Λε − A)−1 is actually given by

−iε((λ − a)(λ − a)∗ + ε 2 )−1 (λ − a)((λ − a)∗ (λ − a) + ε 2 )−1

,
(λ − a)∗ ((λ − a)(λ − a)∗ + ε 2 )−1 −iε((λ − a)∗ (λ − a) + ε 2 )−1

and thus we are again in the situation that our quantity of interest is actually one
entry of an operator-valued Cauchy transform: Gε,a (λ ) = g21 (ε, λ ) = [GA (Λε )]21 .

11.10 Brown measure of arbitrary polynomials in free variables

So in order to calculate the Brown measure of some polynomial p in self-adjoint free
variables we should first hermitize the problem by going over to self-adjoint 2 × 2
matrices over our underlying space, then we should linearize the problem on this
level and use finally our subordination description of operator-valued free convolu-
tion to deal with this linear problem. It might be not so clear whether hermitization
and linearization go together well, but this is indeed the case. Essentially we do here
a linearization of an operator-valued model instead of a scalar-valued one: we have
to linearize a polynomial in matrices. But the linearization algorithm works in this
case as well. As the end is near, let us illustrate this just with an example. For more
details, see [93].
11.10 Brown measure of arbitrary polynomials in free variables 279

Example 11. Consider the polynomial a = xy in the free self-adjoint variables x = x∗

and y = y∗ . For the Brown measure of this a we have to calculate the operator-valued
Cauchy transform of
0 a 0 xy
A= ∗ = .
a 0 yx 0
In order to linearize this we should first write it as a polynomial in matrices of x
and matrices of y. This can be achieved as follows:

0 xy x0 0y x0
= = XY X,
yx 0 01 y0 01

which is a self-adjoint polynomial in the self-adjoint variables

x0 0y
X= and Y= .
01 y0

This self-adjoint polynomial XY X has a self-adjoint linearization

 
0 0 X
 0 Y −1 .
X −1 0

Plugging in back the 2 × 2 matrices for X and Y we get finally the self-adjoint
linearization of A as
     
00 0 0 x 0 00 0 0 x 0 000000
0 0 0 0 0 1  0 0 0 0 0 1  0 0 0 0 0 0
     
0 0 0 y −1 0  0 0 0 0 −1 0  0 0 0 y 0 0
0 0 y 0 0 −1 0 0 0 0 0 −1 0 0 y 0 0 0 .
 = + 
     
x 0 −1 0 0 0  x 0 −1 0 0 0  0 0 0 0 0 0
0 1 0 −1 0 0 0 1 0 −1 0 0 000000

We have written this as the sum of two M6 (C)-free matrices, both of them being
selfadjoint. For calculating the Cauchy transform of this sum we can then use again
the subordination algorithm for the operator-valued free convolution from Theorem
10.5. Putting all the steps together gives an algorithm for calculating the Brown
measure of a = xy. One might note that in the case where both x and y are even
elements (i.e., all odd moments vanish), the product is actually R-diagonal, see [140,
Theorem 15.17]. Hence in this case we even have an explicit formula for the Brown
measure of xy, given by Theorem 8 and the fact that we can calculate the S-transform
of a∗ a in terms of the S-transforms of x and of y.

Of course, we expect that the eigenvalue distribution of our polynomial evalu-

ated in asymptotically free matrices (like independent Wigner or Wishart matrices)
should converge to the Brown measure of the polynomial in the corresponding free
280 11 Brown Measure

variables. However, as was already pointed out before (see the discussion around
Exercise 5) this is not automatic from the convergence of all ∗-moments and one
actually has to control probabilities of small eigenvalues during all the calculations.
Such controls have been achieved in the special cases of the circular law or the single
ring theorem. However, for an arbitrary polynomial in asymptotically free matrices,
this is an open problem at the moment.
In Figs. 11.1, 11.2, and 11.3, we give for some polynomials the Brown measure
calculated according to the algorithm outlined above and we also compare this with
histograms of the complex eigenvalues of the corresponding polynomials in inde-
pendent random matrices.

0.25

50
0.2

0.15 40

0.1
30

0.05
20
0

10
-0.05
4
0
2 3
2
0 1 2
0 1
-2 -1 2
-2 0 1
-4 -3 0
-1
-1
-2 -2

Fig. 11.1 Brown measure (left) of p(x, y, z) = xyz − 2yzx + zxy with x, y, z free semicircles, com-
pared to histogram (right) of the complex eigenvalues of p(X,Y, Z) for independent Wigner matri-
ces with N = 5000

250
0.8

200
0.6

0.4 150

0.2 100

0 50

-0.2 0
0

-1 4 -1 3.5
3 3
-2 -2 2.5
2 2
-3 1.5
1 -3 1
-4 0.5
0

Fig. 11.2 Brown measure (left) of p(x, y) = x + iy with x, y free Poissons of rate 1, compared to
histogram (right) of the complex eigenvalues of p(X,Y ) for independent Wishart matrices X and
Y with N = 5000
11.10 Brown measure of arbitrary polynomials in free variables 281

0.14
60
0.12
50
0.1

0.08 40
0.06
30
0.04

0.02 20

0
10
-0.02
4 0
2 3
2 2
0 1 1 2
0
-2 -1 0 1
-2 -1 0
-4 -3 -1
-2 -2

Fig. 11.3 Brown measure (left) of p(x1 , x2 , x3 , x4 ) = x1 x2 + x2 x3 + x3 x4 + x4 x1 with x1 , x2 , x3 , x4

free semicircles, compared to histogram (right) of the complex eigenvalues of p(X1 , X2 , X3 , X4 ) for
independent Wigner matrices X1 , X2 , X3 , X4 with N = 5000
Solutions to Exercises

12.1 Solutions to exercises in Chapter 1

n
R
1. Let ν be a probability measure on R such that R |t| dν(t) < ∞. For m ≤ n,
Z Z Z
|t|m dν(t) = |t|m dν(t) + |t|m dν(t)
R |t|≤1 |t|>1
Z Z
≤ 1 dν(t) + |t|n dν(t)
|t|≤1 |t|>1
Z
≤ ν(R) + |t|n dν(t)
R
< ∞.

2. Since ν has a fifth moment we can write

(it) (it)2 (it)3 (it)4
ϕ(t) = 1 + α1 + α2 + α3 + α4 + o(t 4 )
1! 2! 3! 4!
and
(it) (it)2 (it)3 (it)4
log(ϕ(t)) = k1 + k2 + k3 + k4 + o(t 4 ).
1! 2! 3! 4!
The expansion for log(1 + x) is x − x2 /2 + x3 /3 − x4 /4 + o(x4 ). Let s = it. Thus
n s s2 s3 s4 o 1 n s s2 s3 o2
log(ϕ(t)) = α1 + α2 + α3 + α4 − α1 + α2 + α3
1! 2! 3! 4! 2 1! 2! 3!
1n s s2 o3 1 s 4
+ α1 + α2 − {α1 } + o(s4 ).
3 1! 2! 4 1!
The only term of degree 1 is α1 so k1 = α1 . The terms of degree 2 are
α2 1 2 1
− α = (α2 − α12 ), so k2 = α2 − α12 .
2! 2 1 2!

283
284 Solutions to Exercises

The terms of degree 3 are

α3 1 α2 α13 1
− 2α1 + = (α3 − 3α1 α2 + 2α13 ), so k3 = α3 − 3α1 α2 + 2α13 .
3! 2 2! 3 3!
The terms of degree 4 are

α4 1 α22 α3 1 2 α2 1 4
− 2
+ 2α1 + 3α1 − α1
4! 2 (2!) 3! 3 2! 4
1
= α4 − 4α1 α3 − 3α22 + 12α12 α2 − 6α14 .
4!
Summarizing let us put this in a table.

k1 = α1
k2 = α2 − α12
k3 = α3 − 3α2 α1 + 2α13
k4 = α4 − 4α3 α1 − 3α23 + 12α2 α12 − 6α14 ;

α1 = k1
α2 = k2 + k12
α3 = k3 + 3k2 k1 + k13
α4 = k4 + 4k3 k1 + 3k22 + 6k2 k12 + k14 .

3. Suppose (r1 , . . . , rn ) is a type, i.e. r1 , . . . , rn ≥ 0 and 1 · r1 + · · · + n · rn = n. Let

us count the number of partitions of [n] with type (r1 , . . . , rn ). Let m = r1 + · · · + rn
be the number of blocks and l1 , . . . , lm the size of the blocks. Then (l1 , . . . , lm ) is a
composition of the integer n with type (r1 , . . . , rn ). There are ln1 ways of choosing
the elements of the first block, n−l

1 ways of choosing the elements of the second
l2
block and finally n−l1 −l2l−···−l m−1

m
ways of choosing the elements of the last block.
Multiplying these out we get

n − l1 n − l1 − · · · − lm−1

n n!
×···× = .
l1 l2 lm l1 !l2 ! · · · lm !

However this overcounts because we don’t distinguish between permutations of the

r1 blocks of size 1, the r2 blocks of size 2, etc. Thus we must divide by r1 ! · · · rn !.
Also we may write l1 ! · · · lm ! as (1!)r1 · · · (n!)rn . Hence the number of partitions of
[n] of type (r1 , . . . , rn ) is

n!
.
(1!)r1 (2!)r2 · · · (n!)rn r1 ! · · · rn !
12.1 Solutions to exercises in Chapter 1 285

4. (i) Write
zn zm
log 1 + ∑ αn = ∑ βm . (12.1)
n≥1 n! m≥1 m!
n
Then by differentiating both sides and multiplying by 1 + ∑n≥1 αn zn! we have

zn−1 zm−1 zn
∑ αn (n − 1)! = ∑ βm (m − 1)! 1 + ∑ αn
n!
n≥1 m≥1 n≥1

and by reindexing

zn zm zn
∑ αn+1 n! = ∑ βm+1 m! 1 + ∑ αn
n!
.
n≥0 m≥0 n≥1

Next let us expand the right-hand side. For convenience of notation we let α0 = 1.

zm zn zm zm+n
∑ βm+1 1 + ∑ αn = ∑ βm+1 + ∑ ∑ βm+1 αn
m≥0 m! n≥1 n! m≥0 m! m≥0 n≥1 m!n!
m
N
z N z
= ∑ βm+1 + ∑ ∑ βm+1 αn
m≥0 m! N≥1 m≥0,n≥1 m N!
m+n=N
N−1
zN
N
N z
= ∑ βN+1 + ∑ ∑ βm+1 αN−m
N≥0 N! N≥1 m=0 m N!
N N
N z
= ∑ ∑ βm+1 αN−m .
N≥0 m=0 m N!

Thus (12.1) is equivalent to

n−1
n−1

αn = ∑ βm+1 αn−m−1 . (12.2)
m=0 m

(ii) Now let us start with the equation αn = ∑π∈P (n) kπ . We shall show that this
implies that
n−1
n−1

αn = ∑ km+1 αn−m−1 . (12.3)
m=0 m
We shall adopt the following notation; given π ∈ P(n) we let V1 denote the block of
π containing 1.
n−1
∑ kπ = ∑ ∑ kπ
π∈P (n) m=0 π∈P (n)
|V1 |=m+1
286 Solutions to Exercises

n−1
n−1

= ∑ km+1 ∑ kσ
m=0 m σ ∈P (n−m−1)
n−1
n−1

= ∑ km+1 αn−m−1 .
m=0 m

Where the second inequality follows because there are n−1

m ways to choose the
m elements of {2, 3, 4, . . . , n} needed to make a block of size m + 1 containing 1,
and then σ , what remains of π after V1 is removed, is a partition of the remaining
n − m − 1 elements.
(iii) Since k1 = α1 = β1 we can use equations (12.2) and (12.3) and induction to
conclude that βn = kn for all n.
5. (i) First note that for a Gaussian random vector, as we have defined it, the entries
are centred, i.e.
exp(−hBt,ti/2)
Z
E(Xi ) = ti dt = 0
R n (2π)n/2 det(B)−1/2
as the integrand is odd. Let σi2 = E(Xi2 ) be the variance of Xi .
If {X1 , . . . , Xn } are independent then the joint distribution of {X1 , . . . , Xn } is
2 2 2 2
e−t1 /(2σ1 ) e−tn /(2σn ) exp(−hBt,ti/2)
q ··· q × dt1 · · · dtn = q dt
2πσ12 2πσn2 (2π)n/2 σ12 · · · σn2

where B is the diagonal matrix with diagonal entries σ1−2 , . . . , σn−2 .

Conversely suppose that B is diagonal with diagonal entries σ1−2 , . . . , σn−2 . Then
the density is the product:
2 2 2 2
e−t1 /(2σ1 ) e−tn /(2σn )
q ··· q × dt1 · · · dtn
2πσ12 2πσn2

and so {X1 , . . . , Xn } are independent.

(ii) Let C = B−1 . As noted above, when {X1 , . . . , Xn } are independent Bi j =
δi j σi−2 = (C−1 )i j . So the result holds for independent Xi ’s.
B is a positive definite real symmetric matrix, so there is an orthogonal matrix O
such that D = O−1 BO is diagonal. Let Y = O−1 X. Write O−1 = (pi j ) and s = O−1t
or t = Os . Then dt = ds by the orthogonality of O.
n
E(Yi1 · · ·Yik ) = ∑ pi1 j1 · · · pik jk E(X j1 · · · X jk )
j1 ,..., jk =1
n
exp(−hBt,ti/2)
Z
= ∑ pi1 j1 · · · pik jk t j1 · · ·t jk dt
j1 ,..., jk =1 Rn (2π)n/2 det(B)−1/2
12.1 Solutions to exercises in Chapter 1 287

exp(−hBOs, Osi/2)
Z
= si1 · · · sik ds
Rn (2π)n/2 det(B)−1/2
exp(−hDs, si/2)
Z
= si1 · · · sik ds.
Rn (2π)n/2 det(D)−1/2

Thus {Y1 , . . . ,Yn } are independent and Gaussian. Hence E(YiY j ) = (D−1 )i j . Thus

n n
ci j = E(Xi X j ) = ∑ oik o jl E(YkYl ) = ∑ oik o jl (D−1 )kl
k,l=1 k,l=1
n
= ∑ oik (D−1 )kl ol j = (OD−1 O−1 )i j = (B−1 )i j .
k,l=1

6. (i) We have 1
2 0 20
C= 1 , so B= .
0 2 02
So the first claim follows from the formula for the density, and the second from the
usual conversion to polar coordinates.
(ii) Note that the integral in polar coordinates factors as an integral over θ and
one over r. Thus for any θ
Z
2 2
(t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2
R2
Z
2 2
= eiθ (m−n) (t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2 .
R2

Hence
Z
n 2 2
E(Z m Z ) = (t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2 = 0 for m 6= n.
R2

Furthermore, we have
Z 2πZ ∞
1 1
Z
2 2 2
E(|Z|2n ) = (t12 + t22 )n e−(t1 +t2 ) dt1 dt2 = r2n e−r r dr dθ
π R2 π 0 0
Z ∞ Z ∞
2 2
= r2n d(−e−r ) = n r2(n−1) d(−e−r ) = · · · = n!.
0 0

7. We have seen that E(Zi1 · · · Zin Z j1 · · · Z jn ) is the number of pairings π of [2n] such
that for each pair (r, s) of π (with r < s) we have that r ≤ n and n + 1 ≤ s ≤ 2n and
ir = js−n . For such a π let σ be the permutation with σ (r) = s − n; we then have
i = j ◦ σ . Conversely let σ be a permutation of [n] with i = j ◦ σ . Let π be the pairing
with pairs (r, n + σ (r)); then ir = js−n for s = n + σ (r).
288 Solutions to Exercises

8. E(| fi j |2 ) = 1/N, so E(xii2 ) = 1/N and for i 6= j, E(xi2j ) = E(y2i j ) = 1/(2N).

Thus the covariance matrix C is the N 2 × N 2 diagonal matrix with diagonal entries
(1/N, . . . , 1/N, 1/(2N), . . . , 1/(2N)) (here the entry 1/N appears N times). Thus the
density matrix B is the diagonal matrix with diagonal entries (N, . . . , N, 2N, . . . , 2N).
Hence
N
hBX, Xi = N ∑ xii2 + 2 ∑ (xi2j + y2i j )
i=1 1≤i< j≤N
N
=N ∑ xii2 + ∑ (xi2j + y2i j )
i=1 1≤i, j≤N
i6= j
N √ √
∑ xii2 + ∑

=N (xi j + −1yi j )(xi j − −1yi j )
i=1 1≤i, j≤N
i6= j

= N Tr(X 2 ).
2 2 −N
Thus exp(−hBX, Xi/2) = exp(−NTr(X 2 )/2). Next det(B) = N N 2N . Thus
N N 2 /2 1 N/2
c= .
π 2

9. Note that (A1 ∨ A2 ) A1 ⊂ ker ϕ because A1 is unital. By the non-degeneracy

of ϕ, Å1 ∩ ((A1 ∨ A2 ) A1 ) = {0}. So by equation (1.11) the left-hand side of
(1.12) is contained in the right-hand side. To prove the reverse containment let
a1 · · · an ∈ Åα1 · · · Åαn for some α1 6= · · · = 6 αn . Let a ∈ A1 ; for αn 6= 1, we have
ϕ(a1 · · · an a) = ϕ(a1 · · · an å) + ϕ(a1 · · · an )ϕ(a) = 0 by freeness; and if αn = 1 we
have ϕ(a1 · · · an a) = ϕ(a1 · · · an−1 (an a)◦ )+ϕ(a1 · · · an−1 )ϕ(an a) = 0, again by free-
ness. Thus a1 · · · an ∈ (A1 ∨ A2 ) A1 .
10. (i) Let ∑∞ n
n=1 βn z be a formal power series. Using the series for exp we have

∞ ∞ ∞ n
1
exp ∑ βn zn = ∑ n! ∑ βl zl
n=1 n=0 l=1
∞
1 ∞ ∞
= 1+ ∑ ∑ · · · ∑ βl1 · · · βln zl1 +···+ln
n=1 n! l1 =1 ln =1
∞ n βl1 · · · βlm n
= 1+ ∑ ∑ ∑ z .
n=1 m=1 l ,...,lm ≥1 m!
1
l1 +···+lm =n

(ii) We continue from the solution to (i). First we shall work with the sum
12.1 Solutions to exercises in Chapter 1 289
n βl1 · · · βlm
S= ∑ ∑ .
m=1 l1 ,...,lm ≥1 m!
l1 +···+lm =n

We are summing over all tuples l = (l1 , . . . , lm ) of positive integers such that
l1 + · · · + lm = n, i.e. over all compositions of the integer n. By the type of the
composition l = (l1 , . . . , ln ) we mean the n-tuple r = (r1 , . . . , rn ) where the ri ’s
are integers, ri ≥ 0, and ri is the number of l j ’s that equal i. We must have
1 · r1 + 2 · r2 + · · · + n · rn = n and m = r1 + · · · + rn is the number of parts of
l = (l1 , . . . , lm ). Note that βl1 · · · βlm = β1r1 · · · βnrn depends only on the type of
l = (l1 , . . . , lm ). Hence we can group the compositions by their type and thus S be-
comes
β1r1 · · · βnrn
S= ∑ × no. compositions of n of type (r1 , . . . , rn ).
1r1 +···+nrn =n (r1 + · · · + rn )!

Given a type r = (r1 , . . . , rn ) there are r1 + · · · + rn parts which can be permuted

in (r1 + · · · + rn )! ways, however we don’t distinguish between permutations that
change li ’s which are equal; thus we must divide by r1 !r2 ! · · · rn !. Hence the number
of compositions of the integer n of type (r1 , . . . , rn ) is

(r1 + · · · + rn )!
r1 !r2 ! · · · rn !
thus
β1r1 · · · βnrn
S= ∑ .
1r1 +···+nrn =n r1 !r2 ! · · · rn !

Hence
∞
n
∞ β1r1 · · · βnrn n
exp ∑ n = 1+ ∑
β z ∑ r1 !r2 ! · · · rn !
z .
n=1 n=1 r1 ,...,rn ≥0
1r1 +···+nrn =n
kn
By replacing βn by n! we obtain the equation
∞
n! n ∞ zn
r1 rn z
∑ ∑ k · · · kn = exp ∑ k n .
n=0 r1 ,...,rn ≥0 (1!)r1 · · · (n!)rn r1 !r2 ! · · · rn ! 1 n! n=1 n!
1·r1 +···+n·rn =n

Then we compare this with the defining equation

zn zn
log 1 + ∑ αn = ∑ kn
n≥1 n! n≥1 n!

to conclude that equation (1.1) holds.

11. If we replace the ordinary generating function ∑n≥1 βn zn by the exponential
generating function ∑n≥1 βn zn /(n!) we get from Exercise 10 (ii)
290 Solutions to Exercises
∞
βn n ∞ β1r1 · · · βnrn
exp ∑ n! z = 1 + ∑ ∑ zn
n=1 n=1 r1 ,...,rn ≥0 (1!)r1 · · · (n!)rn r1 !r2 ! · · · rn !
1r1 +···+nrn =n
∞
n! zn
= 1+ ∑ ∑ r r
β1r1 · · · βnrn .
n=1 r1 ,...,rn ≥0 (1!) 1 · · · (n!) n r1 !r2 ! · · · rn ! n!
1r1 +···+nrn =n

From Exercise 3 we know

n!
(1!)r1 · · · (n!)rn r1 !r2 ! · · · rn !

counts the number of partitions of the set [n] of type (r1 , . . . , rn ). If π = {V1 , . . . ,Vm }
is a partition of [n] we let βπ = β|V1 | β|V2 | · · · β|Vm | where |Vi | is the number of elements
in the block Vi . If the type of the partition π is (r1 , . . . , rn ) then β1r1 β2r2 · · · βnrn = βπ .
Thus we can write
∞ β ∞ zn
n
exp ∑ zn = 1 + ∑ ∑ π n! .
β
n=1 n! n=1 π∈P (n)

12. Using log(1 − x) = − ∑n≥1 xn /n we have

∞ ∞ ∞ n
1
− log(1 − ∑ βn zn ) = ∑ n ∑ βl zl
n=1 n=1 l=1
∞ ∞ ∞
1
= ∑ n ∑ · · · ∑ βl1 · · · βln zl1 +···+ln
n=1 l1 =1 ln =1
∞ m
1
= ∑ ∑ ∑ βl1 · · · βln zm
m=1 n=1 n l 1 ,...,ln ≥1
l1 +···+ln =m
∞ n
1
= ∑∑ ∑ βl1 · · · βlm zn .
m
n=1 m=1 l 1 ,...,lm ≥1
l1 +···+lm =n

Now let S be the sum

n
1
S= ∑ ∑ βl1 · · · βlm .
m
m=1 l 1 ,...,lm ≥1
l1 +···+lm =n

As with the exponential, this is a sum over all compositions of the integer n so we
group the terms according to their type, as was done in the solution to Exercise 10 .

β1r1 · · · βnrn
S= ∑ × no. compositions of n of type (r1 , . . . , rn )
1r1 +···+nrn =n r1 + · · · + rn
12.1 Solutions to exercises in Chapter 1 291

(r1 + · · · + rn − 1)!
= ∑ β1r1 · · · βnrn .
1r1 +···+nrn =n r1 ! · · · rn !

Putting this in the equation for − log(1 − ∑n≥1 βn zn ) we get

∞ β1r1 · · · βnrn n
− log(1 − ∑ βn zn ) = ∑ ∑ (r1 + · · · + rn − 1)! z .
n≥1 n=1 1r1 +···+nrn =n r1 ! · · · rn !

Replacing βn by −βn we obtain

∞ β1r1 · · · βnrn n
log(1 + ∑ βn zn ) = ∑ ∑ (−1)r1 +···+rn −1 (r1 + · · · + rn − 1)! z .
n≥1 n=1 1r1 +···+nrn =n r1 ! · · · rn !

13. (i) We replace βn with αn /(n!) in Exercise 12 to obtain

αn n
log(1 + ∑ z )
n≥1 n!
∞ α r1 · · · α rn n! zn
= ∑ ∑ (−1)r1 +···+rn −1 (r1 + · · · + rn − 1)! (1!)r1 · ·1· (n!)rnn r1 ! · · · rn ! n! .
n=1 1r1 +···+nrn =n

We then turn this into a sum over partitions recalling that

n!
(1!)r1 · · · (n!)rn r1 ! · · · rn !

is the number of partitions of [n] of type (r1 , . . . , rn ) and if π is a partition of [n] we

denote by #(π) the number of blocks of π. Then as

(−1)r1 +···+rn −1 (r1 + · · · + rn − 1)!α1r1 · · · αnrn = (−1)#(π)−1 (#(π) − 1)! απ

only depends on the type of π we have

∞
αn n ∞
zn
log(1 + ∑ z ) = ∑ ∑ (−1)#(π)−1 (#(π) − 1)! απ . (12.4)
n=1 n! n=1 π∈P (n) n!

(ii) Note that αn only appears once in

kn = ∑ (−1)#(π)−1 (#(π) − 1)! απ

π∈P (n)

so each of the sequences {αn }n and {kn }n determines the other. Thus we may write
the result of (i) as
∞
zn ∞
zn
∑ kn n! = log 1 + ∑ αn n! .
n=1 n=1

On the other hand replacing the sequence {βn }n by {kn }n in Exercise 11 we have
292 Solutions to Exercises
∞
zn ∞ zn ∞ zn
1 + ∑ αn = exp ∑ kn = 1+ ∑ ∑ π n!k
n=1 n! n=1 n! n=1 π∈P (n)

and so we get the other half of the moment-cumulant relation

αn = ∑ kπ .
π∈P (n)

14. Since ν has moments of all orders, ϕ, the characteristic function of ν, has
derivatives of all orders. Fix n > 0. We may write
n
sr
ϕ(t) = 1 + ∑ αr + o(sn )
r=1 r!

where s = it and αr is the rth moment of ν. We can also write

n
zr
log(1 + z) = ∑ (−1)r+1 r + o(zn ).
r=1

Now for l ≥ 1
n n
sr l sr l
∑ αr r! + o(sn ) = ∑ αr r! + o(sn ).
r=1 r=1

Thus
n
(−1)l+1 n sr l
log(ϕ(t)) = ∑ ∑ αr + o(sn )
l=1 l r=1 r!
and hence
n n
sl (−1)l+1 n sr l
∑ kl l! + o(sn ) = ∑ l ∑ αr
r!
+ o(sn ).
l=1 l=1 r=1

By Exercise 12 we have

kn = ∑ (−1)#(π)−1 (#(π) − 1)! απ

π∈P (n)

and
αn = ∑ kπ .
π∈P (n)

12.2 Solutions to exercises in Chapter 2

8. (i) This follows from applying a cyclic rotation to the moment-cumulant formula
and observing that non-crossing partitions are mapped to non-crossing partitions
under rotations.
12.2 Solutions to exercises in Chapter 2 293

(ii) This is not true, since the property non-crossing is not preserved under arbi-
trary permutations. For example, in the calculation of κ4 (a1 , a2 , a3 , a4 ) the cross-
ing term ϕ(a1 a3 )ϕ(a2 a4 ) does not show up. However, in κ4 (a1 , a3 , a2 , a4 ) this
term becomes non-crossing and will make a contribution. Hence κ4 (a1 , a2 , a3 , a4 ) 6=
κ4 (a1 , a3 , a2 , a4 ) in general, even if all ai commute.
9. For the semi-circle law we have th
th 1 2k
that all odd moments are 0 and the 2k moment
is the k Catalan number k+1 k which is also the cardinality of NC2 (2k), the non-
crossing pairings of [2k]. Since α1 = 0 we have κ1 = 0; and since α2 = κ12 + κ2 we
have κ2 = α2 = 1. Now let NC∗ (n) be the set of non-crossing partitions which are
not pairings. For n = 2k we have

αn = ∑ κπ = ∑ κπ + ∑ κπ = αn + ∑ κπ .
π∈NC(n) π∈NC2 (n) π∈NC∗ (n) π∈NC∗ (n)

Thus for n even ∑π∈NC∗ (n) κπ = 0; and also for n odd because there are no pairings
of [n]. When n = 3, this forces κ3 = 0. Then for general n we write

0= ∑ κπ = κn + ∑ κπ ,
π∈NC∗ (n) π∈NC∗∗ (n)

where NC∗∗ (n) is all the partitions in NC∗ (n) with more than one block. By induc-
tion ∑π∈NC∗∗ (n) κπ = 0; so κn = 0 for n ≥ 3.
11. (iv) We have
∑ c#(π) = αn = ∑ κπ . (12.5)
π∈NC(n) π∈NC(n)

When n = 1 this gives κ1 = c. If we have shown that κ1 = · · · = κn−1 = c then

∑∗∗ κπ = ∑∗∗ c#(π)

π∈NC (n) π∈NC (n)

where NC∗∗ (n) is all non-crossing partitions of [n] with more than one block. Thus
(12.5) shows that κn = c.
14. We have

ωa (z) + ωb (z) = 2z − Ra (Ga+b (z)) + Rb (Ga+b (z))
= 2z − Ra+b (Ga+b (z))
= 2z − (z − 1/Ga+b (z))
= z + 1/Ga+b (z)
= z + 1/Ga (ωa (z)).

h−1i
15. By inverting the first equation in (2.32) we have ωa (Gh−1i (z)) = Ga (z) and
h−1i
ωb (Gh−1i (z)) = Gb (z) By the second equation in (2.32) we have
294 Solutions to Exercises

R(z) + 1/z = Gh−1i (z)

= ωa (Gh−1i (z)) + ωb (Gh−1i (z)) − 1/Ga (ωa (Gh−1i (z)))
h−1i h−1i h−1i
= Ga (z) + Gb (z) − 1/Ga (Ga (z))
= Ra (z) + 1/z + Rb (z) + 1/z − 1/z.

Hence R(z) = Ra (z) + Rb (z).

17. (i) Let a2 ∈ Å2 and a1 ∈ A1 . Then ϕ(a1 Ex [a2 ]) = ϕ(a1 a2 ) = ϕ(a1 )ϕ(a2 ) = 0,
by freeness. Thus Ex [a2 ] = 0.
(ii) Let a1 · · · an ∈ Åα1 · · · Åαn and a ∈ A1 . First suppose α1 6= 1. Then

ϕ(aEx [a1 · · · an ]) = ϕ(aa1 · · · an ) = ϕ(åa1 · · · an ) + ϕ(a)ϕ(a1 · · · an ) = 0

by freeness, thus Ex [a1 · · · an ] = 0. If α1 = 1 then we write

ϕ(aEx [a1 · · · an ]) = ϕ((aa1 )a2 · · · an ) = ϕ((aa˚ 1 )a2 · · · an ) + ϕ(aa1 )ϕ(a2 · · · an ) = 0

by freeness and hence again Ex [a1 · · · an ] = 0.

18. Let p(x, y) ∈ Chx, yi be given we must show that using the definition of Ex
given in the exercise we have that Equation (2.36) holds for all q(x) ∈ Chxi, i.e.
ϕ(q(x)p(x, y)) = ϕ(q(x)Ex [p(x, y)]). This equation is linear in p so we only need to
check it for p in each of the summands of the decomposition of A1 ∨ A2 . It is imme-
diate for p ∈ A1 . It is then an easy consequence of freeness that ϕ(q(x)p(x, y)) = 0
for p in any other of the summands.
19. (i) Recall the definition of ϕ̃π . We have for π = {V1 , . . . ,Vs } with n ∈ Vs ,

ϕ̃π (a1 , a2 , . . . , an ) = ϕ ∏ ai1 · · · ϕ ∏ ais−1 ∏ ais
i1 ∈V1 is−1 ∈Vs−1 is ∈Vs

so

ϕ(a0 ϕ̃π (a1 , a2 , . . . , an )) = ϕ ∏ ai1 · · · ϕ ∏ ais−1 ϕ a0 ∏ ais .
i1 ∈V1 is−1 ∈Vs−1 is ∈Vs

Now the right-hand side is exactly ϕπ 0 (a0 , a1 , a2 , . . . , an ) where π 0 is the non-

crossing partition obtained by adding 0 to the block Vs of π containing n.
(ii) For the purposes of this solution we shall introduce the following nota-
tion. Let [n̄] = {1̄, 2̄, . . . , n̄}. Let [n̄0 ] = {0̄, 1̄, 2̄, . . . , n̄} and [2n] = {1, 1̄, 2, 2̄, . . . , n, n̄}
and [2n0 ] = {0̄, 1, 1̄, . . . , n, n̄}. Let σ ∈ NC(2n0 ), since x0 , x1 , . . . , xn are free from
y1 , . . . , yn we have that κσ (x0 , y1 , x1 , . . . , yn , xn ) = 0 unless we can write σ = π ∪ τ
with π ∈ NC(n) and τ ∈ NC(n̄0 ). Let us recall the definition of the Kreweras com-
plement from section 2.3. For π a non-crossing partition of [n], K(π) is the largest
non-crossing partition of [n̄] so that π ∪ K(π) is a non-crossing partition of [2n].
Thus K(π)0 is the largest non-crossing partition of [n̄0 ] such that π ∪ K(π)0 is a non-
12.3 Solutions to exercises in Chapter 3 295

crossing partition of [2n0 ]. Thus for π ∈ NC(n) and τ ∈ NC(n̄0 ) we have that π ∪ τ
is a non-crossing partition of [2n0 ] if and only if τ ≤ K(π)0 . Thus

ϕ(x0 y1 x1 · · · yn xn ) = ∑ κσ (x0 , y1 , x1 , . . . , yn , xn )
σ ∈NC(2n0 )

= ∑ κπ (y1 , . . . , yn ) ∑ κτ (x0 , x1 , . . . , xn )
π∈NC(n) τ∈NC(n̄0 )
π∪τ∈NC(2n0 )

= ∑ κπ (y1 , . . . , yn ) ∑ κτ (x0 , x1 , . . . , xn )
π∈NC(n) τ∈NC(n̄0 )
τ≤K(π)0

= ∑ κπ (y1 , . . . , yn )ϕK(π)0 (x0 , x1 , . . . , xn )

π∈NC(n)

= ∑ κπ (y1 , . . . , yn )ϕ(x0 ϕ̃K(π) (x1 , . . . , xn ))

π∈NC(n)

= ϕ(x0 ∑ κπ (y1 , . . . , yn )ϕ̃K(π) (x1 , . . . , xn )).

π∈NC(n)

Hence by the non-degeneracy of ϕ we have

∑ κπ (y1 , . . . , yn )ϕ̃K(π) (x1 , . . . , xn ) = Ex (y1 x1 · · · yn xn ).

π∈NC(n)

12.3 Solutions to exercises in Chapter 3

1. (i) Let δa be the probability measure with an atom of mass 1 at a. Then

R
1/(z − t) dδa (t) = 1/(z − a). We have
n n
λi
µ = ∑ λi δai , thus G(z) = ∑ .
i=1 i=1 z − ai

(ii) Fix z ∈ C+ . Let

1
Z ∞
f (w) = , then G(z) = f (t) dt.
π(z − w)(w − i)(w + i) −∞

Since f is a rational function such that limw→∞ w f (w) = 0 and by the residue theo-
rem we have
Z ∞ Z
G(z) = f (t) dt = lim f (w) dw = 2πi(Res( f , z) + Res( f , i)),
−∞ R→∞ CR

where CR is the closed curve formed by joining part of the circle |w| = R in C+ to
the interval [−R, R].
296 Solutions to Exercises

−1 1
Res( f , z) = and Res( f , i) = .
π(z − i)(z + i) 2πi(z − i)

Thus G(z) = 1/(z + i).

2 2 2
√ Let z = u + iv, with v > 0, then z − 4 = u − v − 4 + 2iuv and so (by Ex.
4. (iii)
2
3.2) z − 4 = x + iy with
sp
(u2 − v2 − 4)2 + 4u2 v2 + (u2 − v2 − 4)
x = sgn(u) and
2
sp
(u2 − v2 − 4)2 + 4u2 v2 − (u2 − v2 − 4)
y= .
2
Note that x and u always have the same sign. From Exercise 3.3 we have

0 < v < y and 0 ≤ |x| < |u|.

√ √
z − z2 − 4 z + z2 − 4
Recall that w1 = and w2 = .
2 2
Claim: |Re(w1 )| ≤ |Re(w2 )| and |Im(w1 )| < |Im(w2 )|. Since w1 w2 = 1 this implies
|w1 | < 1.

u−x v−y u+x v+y

Re(w1 ) = , Im(w1 ) = , Re(w2 ) = , Im(w2 ) =
2 2 2 2
We have
y−v y+v
|Im(w1 )| = −Im(w1 ) = < = Im(w2 ) = |Im(w2 )|
2 2
When u > 0 we have
u−x u+x
|Re(w1 )| = ≤ = Re(w2 ) = |Re(w2 )|.
2 2
When u < 0 we have
x − u −u − x
|Re(w1 )| = ≤ = −Re(w2 ) = |Re(w2 )|.
2 2

5. (iii) Use the same idea as in Exercise 3.4 (iii) to identify the roots inside Γ .
9. The density is given by

1 −b
dν(t) = dt.
π b2 + (t − a)2
12.3 Solutions to exercises in Chapter 3 297

11. Let 0 < α1 < α2 and β2 > 0 be given, we must find β1 > 0 so that f (Γα1 ,β1 ) ⊂
Γα2 ,β2 . Choose ε > 0 so that
q
1 + α22 1+ε
q > q .
1 + α12 1 − ε 1 + α12

Choose β1 > 0 so that for z ∈ Γα1 ,β1 we have | f (z) − z| < ε|z|. Then

Im( f (z)) = Im(z) + Im( f (z) − z)

> Im(z) − | f (z) − z|
> Im(z) − ε|z|
> ((1 + α1 )−1/2 − ε)|z|
|z| + ε|z|
= ((1 + α1 )−1/2 − ε)
1+ε
|z| + | f (z) − z|
> ((1 + α1 )−1/2 − ε)
1+ε
((1 + α1 )−1/2 − ε)
≥ | f (z)|.
1+ε
q
Thus 1 + α22 Im( f (z)) > | f (z)|, so f (z) ∈ Γα2 . We now have
q q
Im( f (z)) > Im(z) − ε|z| > 1 − ε 1 + α12 Im(z) > 1 − ε 1 + α12 β1 .
q
So by choosing β1 still larger we may have 1 − ε 1 + α12 β1 > β2 . Thus f (z) ∈
Γα2 ,β2 .
12. (i) The result is trivial when t = 0. By symmetry we only need consider the
case t > 0. Since Γα is convex the minimum √ of |z − t| occurs when z is in
√ ∂Γα . The
distance from t to the line x − αy = 0 is t/ 1 + α 2 . Hence |z − t| ≥ |t|/ 1 + α 2 .
(ii) Write z = |z|eiθ with tan−1 (α −1 ) < θ < π − tan−1 (α −1 ). If t = 0 the inequal-
ity is trivially true. Suppose t > 0 then
p
|z − t| = |z − t| = ||z| − teiθ | ≥ |z|/ 1 + α 2

by (i) since teiθ ∈ Γα . If t < 0 then

p
|z − t| = ||z| − te−iθ | ≥ |z|/ 1 + α 2

by (i) since te−iθ ∈ Γα . √

(iii) By (i), |t/(z − t)| ≤ 1 + α 2 for z ∈ Γα . Since σ is a finite measure we may
apply the dominated convergence theorem.
(iv) Now
298 Solutions to Exercises

z t
Z Z
zG(z) = dν(t) so zG(z) − 1 = dν(t).
R z − t R z − t
Thus we can apply the result from (iii).
13. By Exercise 12 we have, for Im(z) ≥ 1,
1 + tz 1 |t| p
≤ + ≤ 2 1 + α 2.

z(t − z) |t − z| |t − z|

Write
F(z) a 1 + tz
Z
= +b+ dσ (t).
z z z(t − z)
For a fixed t we have
1 + tz t + z−1
= −→ 0
z(t − z) t −z
as z → ∞. Since |(1 + tz)/(z(t − z))| is bounded independently of t and z then we can
apply the dominated convergence theorem to conclude that F(z)/z → b as z → ∞ in
Γα .
14. (i) By assumption the function t 7→ |t|n is integrable with respect to ν. By
Exercise 12 we have for z ∈ Γα

|t|n+1 p
≤ |t|n 1 + α 2 .
|z − t|

Thus by the dominated convergence theorem

t n+1
Z
lim dν(t) = 0.
z→∞ z−t
(ii) We have
1 1 1 t tn
Z
α1 αn
G(z) − + + · · · + = − + 2 + · · · + n+1 dν(t)
z z2 zn+1 R z−t z z z
Z n+1
1 t
= n+1 dν(t).
z R z−t

Thus Z n+1
1 α αn t
n+1 1
z G(z) − + + · · · + n+1 = dν(t)
z z2 z R z−t
and this integral converges to 0 as z → ∞ in Γα by (i).
15. We shall proceed by induction on n. To begin the induction process let us show
that α1 and α2 are, respectively, the first and second moments of ν. Note that for
any 1 ≤ k ≤ 2n we have that as z → ∞ in Γα
1 α αk
1
lim zk+1 G(z) − + 2 + · · · + k+1 = 0.
z→∞ z z z
12.3 Solutions to exercises in Chapter 3 299
R
Also by Exercise 12, R |t/(z − t)| dν(t) < ∞, so we may let

1 t
Z
G1 (z) = z G(z) − = dν(t).
z R z − t
Then since n is a least 1 we have
α2 1 α α2
1
lim z zG1 (z) − α1 − = lim z3 G(z) − + 2 + 3 = 0.
z→∞ z z→∞ z z z
Hence
lim z zG1 (z) − α1 = α2 .
z→∞

Since α1 and α2 are real

lim Re z zG1 (z) − α1 = α2 .
z→∞

Now let z = iy with y > 0 then

Re z zG1 (z) − α1 = Re − y2 G1 (iy) − iα1 y = −y2 Re(G1 (iy))
t −t 2
Z Z
= −y2 Re dν(t) = −y2 2 2
dν(t)
R iy − t R y +t
t2
Z
= 2
dν(t).
R 1 + (t/y)

Thus
t2
Z
lim 2
dν(t) = α2 ,
y→∞ R 1 + (t/y)

so by the monotone convergence theorem R t 2 dν(t) = α2 . Hence ν has a first and

second moment and the second moment is α2 .

Since limz→∞ z(zG1 (z) − α1 ) = α2 we must have limz→∞ zG1 (z) = α1 . Letting
z = iy with y > 0 we have α1 = limy→∞ iyG1 (iy) and thus
Z iyt
α1 = lim Re(iyG1 (iy)) = lim Re dν(t)
y→∞ y→∞ R iy − t
y2t t
Z Z
= lim dν(t) = lim dν(t).
y→∞ R y2 + t 2 y→∞ R 1 + (t/y)2

Now |t/(1 + (t/y)2 )| ≤ |t| and R |t| dν(t) < ∞ so by the dominated convergence
R
R
theorem α1 = R t dν(t).
Suppose that we have shown that ν has moments up to order 2n − 2 and αk , for
1 ≤ k ≤ 2n − 2, is the kth moment. Thus R |t 2n−1 /(z − t)| dν(t) < ∞ by Exercise 12
R

(i). Let us write

300 Solutions to Exercises
1 α α2n−2
1
G2n−1 (z) = z2n−1 G(z) − + 2 + · · · + 2n−1
z z z
1 1 t t 2n−2
Z
= z2n−1 − + 2 + · · · + 2n−1 dν(t)
R z−t z z z
Z 2n−1
t
= dν(t).
R z−t
α2n−1
By our hypothesis limz→∞ z2 (G2n−1 (z) − ( z + αz2n
2 )) = 0 or equivalently

lim z(zG2n−1 (z) − α2n−1 ) = α2n . (12.6)

z→∞

Let z = iy with y > 0. Since α2n−1 and α2n are real

α2n = lim Re iy(iyG2n−1 (iy) − α2n−1 ) = lim −y2 Re(G2n−1 (iy))
y→∞ y→∞
Z t 2n−1 Z
y2t 2n
= lim −y2 Re dν(t) = lim dν(t)
y→∞ R iy − t y→∞ R y2 + t 2

t 2n
Z
= lim dν(t).
y→∞ R 1 + (t/y)2

So again by the monotone convergence theorem we have R t 2n dν(t) = α2n and

thus ν has a moment of order 2n and this moment is α2n . Thus ν has a moment of
order 2n − 1 and from Equation (12.6) we have limz→∞ zG2n−1 (z) = α2n−1 . Then by
letting z = iy and taking real parts we obtain that
Z iyt 2n−1
α2n−1 = lim Re iyG2n−1 (iy) = lim Re dν(t)
y→∞ y→∞ R iy − t
t 2n−1
Z
= lim dν(t)
y→∞ R 1 + (t/y)2

2n−1 dν(t).
R
Thus by the dominated convergence theorem α2n−1 = Rt This com-
pletes the induction step.
16. Let us write
1 α1 α2 α3 α4
G(z) = + + 3 + 4 + 5 + r(z)
z z2 z z z
where r(z) = o( z15 ). Then

1
α1
z + αz22 + αz33 + αz44 + zr(z)
z− = 1
.
G(z) z + αz21 + αz32 + αz43 + αz54 + r(z)

Let us equate this with

β0 β1 β2
α1 + + 2 + 3 + q(z)
z z z
12.3 Solutions to exercises in Chapter 3 301

and solve for β0 , β1 , β2 , and q(z). After cross multiplication we find that

α2 = α12 + β0 α3 = α1 α2 + β0 α1 + β1 α4 = α1 α3 + α2 β0 + α1 β1 + β2 .

Thus

β0 = α2 − α12 β1 = α3 − 2α1 α2 + α13 β2 = α4 − 2α1 α3 − α22 + 3α12 α2 − α14

and q(z) = o(z−3 ).

17. (i) Note that since f is proper, each point has only a finite number of preimages.
So let w0 ∈ C and let z1 , . . . , zr be the preimages of w0 . We shall treat the case when
w0 has no preimages separately. For each i choose a chart (Ui , ϕi ) of zi and an integer
mi so that f (ϕ h−1i (z)) = zmi . By shrinking the Ui , if necessary, we may assume that
they are disjoint. If we can show that there is a neighbourhood V of w0 such that all
preimages of points in V are in the union U1 ∪ · · · ∪ Ur , then deg f (w) = m1 + · · · + mr
for w ∈ V. This will show that the integer valued function deg f is locally constant
and by the connectedness of C we shall have that deg f is constant.
So let us suppose that no such V exists and reach a contradiction. If no such
V exists then there is a sequence {wn }n converging to w0 such that each wn has a
preimage zn not in U1 ∪ · · · ∪ Ur . By shrinking V if necessary we may suppose that
V is compact. By the properness of f , we must have a subsequence {znk }k of {zn }n
which has a limit z, say. Then f (z) = limk f (znk ) = limk wnk = w0 . So z = zi for some
i and thus the subsequence {znk }k must penetrate the open set Ui contradicting our
assumption.
If w0 has no preimages then we must show that there is a neighbourhood of w0
with no preimages. If not then there is a sequence {wn }n converging to w0 such
that each wn has a preimage. But {w0 } ∪ {wn }n is a compact set so we can extract
from these preimages a convergent sequence of preimages whose limit can only be
a preimage of w0 , contradicting our assumption. This proves (i).
(ii) First let us note that since Fi0 (z) 6= 0 for i = 1, 2 and z ∈ C+ for each z ∈
+
C there is neighbourhood of z on which both F1 and F2 are one-to-one. So for
(z1 , z2 ) ∈ X let w = F1 (z1 ). Then there is U, a neighbourhood of w, and two analytic
maps f1 and f2 defined on U such that for u ∈ U we have Fi ◦ fi = id. We then let
V = {( f1 (u), f2 (u)) | u ∈ U} and define ϕ : V → U by ϕ(w1 , w2 ) = F1 (w1 ).
To show these charts define a complex structure on X we must show that given
two charts (V, ϕ) and (V 0 , ϕ 0 ) we have that ϕ 0 ◦ϕ h−1i is analytic on ϕ(V ∩V 0 ). So by
construction we have two points (z1 , z2 ) and (z01 , z02 ) in X and two neighbourhoods
U and U 0 of F1 (z1 ) and F1 (z01 ), respectively and on these neighbourhoods we have
analytic maps f1 , f2 : U → C and f10 , f20 : U 0 → C such that Fi ◦ fi = id and Fi ◦ fi0 = id.
Then on ϕ(V ∩ V 0 ), we have that ϕ 0 ◦ ϕ h−1i (u) = ϕ( f10 (u), f20 (u)) = F1 ( f10 (u)) = u.
So ϕ 0 ◦ ϕ h−1i = id is analytic.
(iii) To show that θ is proper we must show that the inverse image of a compact
subset of C is compact. So let K = B(z, r) be given. We must show that θ h−1i (K) is
compact. Since θ is continuous, we have that θ h−1i (K) is closed. So we only have
302 Solutions to Exercises

to show that every sequence in θ h−1i (K) contains a convergent subsequence. Let
{(z1,n , z2,n )}n be a sequence in θ h−1i (K). Then

|z1,n | ≤ |θ (z1,n , z2n )| + |z2,n − F2 (z2,n )| ≤ |z| + r + σ22 /r.

Likewise |z2,n | ≤ |z| + r + σ12 /r. By Lemma 19,

Im(z1,n ), Im(z2,n ) ≥ Im(θ (z1,n , z2,n )) ≥ Im(z) − r.

So there is a subsequence {(z1,nk , z2,nk )}k such that both {z1,nk }k converges to z1 ,
say and {z2,nk }k converges to z2 , say. Then

F1 (z1 ) = lim F1 (z1,nk ) = lim F2 (z2,nk ) = F2 (z2 )

k k

so (z1 , z2 ) ∈ X. Also θ (z1 , z2 ) = limk θ (z1,nk , z2,nk ) ∈ K. Hence (z1 , z2 ) ∈ θ h−1i (K)
as required.

12.4 Solutions to exercises in Chapter 4

5. The commutativity of Jk and Jl is a special case of the fact that Jl commutes with
C[Sl−1 ]. For the latter note that for k < l and σ ∈ Sl−1

σ · (k, l) · σ −1 = (σ (k), l).

Thus we have

σ Jl σ −1 = σ ((1, l) + · · · + (l − 1, l))σ −1 = (σ (1), l) + · · · + (σ (l − 1), l) = Jl .

7. Hint: Using the convention that J10 = 1, write

(1 + N −1 J1 )−1 (1 + N −1 J2 )−1 · · · (1 + N −1 Jn )−1

as
∑ (−N)−l ∑ J1k1 J2k2 · · · Jnkn
l≥0 k1 ,...,kn ≥0
k1 +···+kn =l

and observe that J1k1 · · · Jnkn is a linear combination of permutations of length at most
k1 + · · · + kn .
9. Recall that γ = γm . Given i : [2m] → [n] such that ker(i) ≥ π, let j : [m] → [n]
be defined by j(γ −1 (k)) = i(2k − 1) and j(σ (k)) = i(2k). To show that such a j is
well defined we must show that when σ (k) = γ −1 (l) we have i(2k) = i(2l − 1). If
σ (k) = γ −1 (l) we have that (k, γ −1 (l)) is a pair of σ , and thus (2l − 1, 2k) is a pair
of π. Since we have assumed that ker(i) ≥ π we have i(2l − 1) = i(2k) as required.
Conversely if we have j : [m] → [n], let i(2k − 1) = j(γ −1 (k)) and i(2k) = j(σ (k)).
12.5 Solutions to exercises in Chapter 5 303

Then ker i ≥ π. This gives us a bijection of indices so

n n
(1) (m) (1) (m)
∑ di1 i2 · · · di2m−1 i2m = ∑ d j −1 j · · · d j −1 j .
γ (1) σ (1) γ (m) σ (m)
i1 ,...,i2m =1 j1 ,..., jm =1
ker(i)≥π

By a change of variables we have

n n
(1) (m) (1) (m)
∑ d j −1 j · · · d j −1 jσ (m) = ∑ d j1 jγσ (1) · · · d jm jγσ (m) .
γ (1) σ (1) γ (m)
j1 ,..., jm =1 j1 ,..., jm =1

12. The first part is just the expansion of the product of matrices. Now let us
write xα(l) = xil i−l and xβ (l) = xiγ(l) ,i−l , where γ is the permutation with one cycle
(1, 2, 3, . . . , k). With this notation we have by Exercise 1.7

E(xi1 i−1 xi2 i−1 · · · xik i−k xi1 i−k ) = E(xα(1) · · · xα(k) xβ (1) · · · xβ (k) )
= |{σ ∈ Sk | α = β ◦ σ }|.

If α = β ◦ σ then il = iγ(σ (l)) and i−l = i−σ (l) for 1 ≤ l ≤ k. Thus for a fixed σ there
are N #(γσ ) ways to choose the k-tuple (i1 , . . . , ik ) so that il = iγ(σ (l)) and M #(σ ) ways
of choosing the k-tuple (i−1 , . . . , i−k ) so that i−l = i−σ (l) . Hence

−1 γ)−k
M #(σ )
E(Tr(Ak )) = ∑ N #(γσ )−k M #(σ ) = ∑ N #(σ )+#(σ .
σ ∈Sk σ ∈Sk N

Thus M #(σ )
−1 γ)−(k+1)
E(tr(Ak )) = ∑ N #(σ )+#(σ
σ ∈Sk N
and by Proposition 1.5 the only σ ’s for which the exponent of N is not negative are
those σ ’s which are non-crossing partitions. Thus lim E(tr(Ak )) = ∑σ ∈NC(k) c#(σ ) .

12.5 Solutions to exercises in Chapter 5

1. One has to realize that the order on a through-cycle of a non-crossing annular

permutation has to be of the following form: one moves at one point p from the
first circle to the second circle, and moves then on the second circle in cyclically
increasing order, moves then back to the first circle, moves then on the first circle in
cyclically increasing order, until we are back to the first point p.
(i) If one has at least two through-cycles then the positions where one has to
move to the other circle, as well as the order on the cycles lying on just one circle,
are uniquely determined by the annular non-crossing condition.
(ii) In the case of just one through-cycle the order on this is not uniquely deter-
mined, but depends on the choice of a point p on the first circle and a point q on
the second circle of this through-cycle. We can then fix the order by sending p to q,
304 Solutions to Exercises

and the order on this through block as well as the order on all other blocks is then
determined. So we have mn choices, each giving a different permutation.
10. (i) Calculate the free cumulants with the help of the product formula (2.19)
and observe that in both cases there is for each n exactly one contributing pairing in
(2.19); thus κn (s2 , . . . , s2 ) = 1 = κn (cc∗ , . . . , cc∗ ).
(ii) In Example 5.33 (and in Example 5.36) it was shown that κ1,1 (s2 , s2 ) = 1.
(iii) Use the second order version (5.16) of the product formula to see that all
second order moments of cc∗ are zero. It is instructive to do this for the case
κ1,1 (cc∗ , cc∗ ) = 0 and compare this with the calculation of κ1,1 (s2 , s2 ) = 1 in Ex-
ample 5.36. In both cases we have the term corresponding to π1 , whereas it makes
the contribution κ2 (s, s)κ2 (s, s) = 1 in the first case, in the second case its contribu-
tion is κ2 (c, c)κ2 (c∗ , c∗ ) = 0.

12.6 Solutions to exercises in Chapter 6

1. (i) Begin by recalling that every element x ∈ L(G) defines a function on G as

follows: λ (x)δe ∈ `2 (G) and for convenience we call this square summable function
x. If x is in the centre of L(G) then x must be constant on all conjugacy classes;
so if G has the ICC property then every x in the centre of L(G) vanishes on all
conjugacy classes, except possibly the one containing e. Such an x must then be a
scalar multiple of the identity. This shows that if G has the ICC property then L(G)
is a factor. If G does not have the ICC property and X ⊂ G is a finite conjugacy class
not containing e then the indicator function of X is in the centre of L(G) and is not
a scalar multiple of the identity, and thus L(G) is not a factor.
(ii) Suppose we are given σ ∈ Sn with σ 6= e. Then there is k ≤ n such that
σ (k) 6= k. Let τm = (k, m) for m > n. Note that τm σ τm−1 moves m but fixes all l > m.
Thus {τm σ τm−1 }m is infinite.
2. It suffices to consider the case F2 . A reduced word in F2 can be written g =
aεi11 · · · aεinn where whenever ir = ir+1 for 1 ≤ r < n we must have εr = εr+1 . Let us
−m
show that the conjugacy class of g is infinite. If g is a power of a1 then am 2 ga2 are all
distinct for m = 1, 2, 3, . . . and hence g has an infinite conjugacy class, likewise if g
is a power of a2 . So now we can suppose that there is k such that i1 = · · · = ik 6= ik+1 .
Let hm = amε −1
i1 . We claim that all hm ghm (m ≥ 1) are distinct. If we could find r < s
1

−1 −1 −1
with hr ghr = hs ghs then g = h p gh p with p = s − r. Let us consider the reduced
−ε1
form of h p gh−1 p . The p copies of ai1 on the right of h p gh−1
p cannot cancel off
aik+1 because i1 = ik 6= ik+1 . Thus in reduced form h p gh−1
εk+1
p starts with the letter aεi11
repeated p + k times. However in reduced form g starts with the letter ai1 repeated
k times. Hence, in reduced form, the words in {hm gh−1 m }m are distinct, and thus the
conjugacy class of g is infinite.
4. We shall just compute tr ⊗ ϕ(xn ) directly using the moment-cumulant formula.
For this calculation we will need to rewrite
12.6 Solutions to exercises in Chapter 6 305

1 s1 c 1 a11 a12
x= √ ∗ as √ .
2 c s2 2 a21 a22
Then
2
1
tr ⊗ ϕ(xn ) = ϕ(Tr(xn )) = 2−(1+n/2) ∑ ϕ(ai1 i2 · · · ain i1 ).
2 i1 ,...,in =1

Now given i1 , . . . , in

ϕ(ai1 i2 · · · ain i1 ) = ∑ κπ (ai1 i2 , . . . , ain i1 ) = ∑ κπ (ai1 i2 , . . . , ain i1 )

π∈NC(n) π∈NC2 (n)

because (i) all mixed cumulants vanish so each block of π must consist either of all
a11 ’s or all a22 ’s or a mixture of a21 and a12 , (ii) the only non-zero cumulant of aii is
κ2 so any blocks that contain aii must be κ2 , and (iii) the only non-zero ∗-cumulants
of ai j (for i 6= j) are κ2 (ai j , a∗i j ) and κ2 (a∗i j , ai j ). Thus we have a sum over pairings.
Moreover if π ∈ NC2 (n) is a pairing and if (r, s) is a pair of π then κπ (ai1 i2 , . . . , ain i1 )
will be 0 unless air ir+1 = (ais is+1 )∗ i.e. ir = is+1 and is = ir+1 . For such a π the con-
tribution is 1 since s1 , s2 , and c all have variance 1. Hence, letting γ = (1, 2, 3, . . . , n)
as in Chapter 1 we have ϕ(ai1 i2 · · · ain i1 ) = |{π ∈ NC2 (n) | i = i ◦ γ ◦ π}|. Thus

tr ⊗ ϕ(xn ) = 2−(1+n/2) ∑ |{i : [n] → [2] | i = i ◦ γ ◦ π}|

π∈NC2 (n)

= 2−(1+n/2) ∑ 2#(γπ) .
π∈NC2 (n)

Now recall from Chapter 1 that for any pairing (interpreted as a permutation in
Sn )
#(π) + #(γπ) + #(γ) = n + 2(1 − g)
and π ∈ NC2 (n) if and only if g = 0. Thus for any π ∈ NC2 (n)

#(γπ) = n + 2 − #(γ) − #(π) = 1 + n/2.

Hence tr ⊗ ϕ(xn ) = |NC2 (n)| is the nth moment of a semi-circular operator.

5. (i) A product of alternating centred elements from A11 , A12 , A21 , A22 can, by
multiplying neighbours from A1 and from A2 , be read as a product of alternating
elements from A1 and A2 ; that those elements are also centred follows from the
freeness of A11 and A12 in A1 and the freeness of A21 and A22 in A2 .
(ii) Compare the remarks after Theorem 4.8
(iii) It is clear that u2 u∗1 is a unitary and, by ∗-freeness of u1 and u2 and the
centredness of u1 and u2 , that ϕ((u2 u∗1 ) p ) = δ0p for any p ∈ Z. For the ∗-freeness
between u2 u∗1 and u1 Au∗1 it suffices to show that alternating products in elements of
the form (u2 u∗1 ) p (p ∈ Z\{0}) and centred elements from u1 Au∗1 are also centred.
But this is clear, since ϕ(u1 au∗1 ) = ϕ(a)ϕ(u1 u∗1 ) = ϕ(a) and thus centred elements
from u1 Au∗1 are of the form u1 au∗1 with centred a.
306 Solutions to Exercises

12.7 Solutions to exercises in Chapter 7

1. Let µ be the distribution of X, then E(eλ X ) = eλ x dµ(x). We notice that for

R
R 0 λx
λ > 0 we have −∞ e dµ(x) < ∞ because the integrand is bounded by 1 and µ is a
probability measure. If E(eλ X ) < ∞ for some λ > 0 we must have 0∞ eλ x dµ(x) < ∞.
R

Now expand eλ x into a power series; for λ ≥ 0 all the terms are positive. Hence
R∞ n
0 x dµ(x) < ∞ for all n. Likewise if for some λ < 0 we have E(eλ X ) < ∞ then for
R0 n
all n, −∞ x dµ(x) < ∞. Hence if E(eλ X ) < ∞ for all |λ | ≤ λ0 then X has moments
of all orders and E(eλ0 |X| ) < ∞. Thus by the dominated convergence theorem λ 7→
E(eλ X ) has a convergent power series expansion in λ with a radius of convergence
of at least λ0 . In fact the proof shows that if there are λ1 < 0 and λ2 > 0 with
E(eλ1 X ) < ∞ and E(eλ2 X ) < ∞ then for all λ1 ≤ λ ≤ λ2 we have E(eλ X ) < ∞ and
we may choose λ0 = min{−λ1 , λ2 }.
3. (i) We have

am+n = log[P(X1 + · · · + Xm+n > (m + n)a)]

≥ log[P(X1 + · · · + Xm > ma and Xm+1 + · · · + Xm+n > na)]
= log[P(X1 + · · · + Xm > ma) · P(Xm+1 + · · · + Xm+n > na)]
= am + an .

(ii) Fix m; for n ≥ m write n = rm + s with 0 ≤ s < m, then

an ram + as rm am as am
≥ = + → .
n n n m n m
(iii) We have
an am an
lim sup ≤ sup ≤ lim inf .
n n m m n n

5. We have learned this statement and its proof from an unpublished manuscript of
Uffe Haagerup.
(i) By using the Taylor series expansion
∞
zn
log(1 − z) = − ∑ ,
n=1 n

which converges for every complex number z 6= 1 with |z| ≤ 1, we derive an expan-
sion for log |s − t|, by substituting s = 2 cos u and t = 2 cos v:

log |s − t| = log |eiu + e−iu − eiv − e−iv |

= log |e−iu (1 − ei(u+v) )(1 − ei(u−v) )|
= log |1 − ei(u+v) | + log |1 − ei(u−v) |
= Re(log(1 − ei(u+v) ) + log(1 − ei(u−v) ))
12.7 Solutions to exercises in Chapter 7 307
∞
1
= −Re ∑ (ein(u+v) + ein(u−v) )
n=1 n
∞
2
= −Re ∑ einu cos nv
n=1 n
∞
2
=−∑ cos nu cos nv
n=1 n
∞
1
=−∑ Cn (s)Cn (t).
n=1 2n

Then one has to show (which is not trivial) that the convergence is strong enough
to allow term-by-term integration.
(ii) For this one has to show that

Z +2 2,
 n=0
Cn (t)dµW (t) = −1, n = 2 .
−2 
0, otherwise


6. (i) Let us first see that the mapping T ⊗ IN : (MNsa )n → (MNsa )n transports mi-
crostates for (x1 , . . . , xn ) into microstates for (y1 , . . . , yn ). Namely, let A = (A1 , . . . , An ) ∈
Γ (x1 , . . . , xn ; N, r, ε) be a microstate for (x1 , . . . , xn ) and consider B = (B1 , . . . , Bn ) :=
(T ⊗ IN )A, i.e., Bi = ∑nj=1 ti j A j . Then we have for each k ≤ r:

|τ(yi1 · · · yik )−tr(Bi1 · · · Bik )|

≤ (cn)r ε,

where c := maxi, j {|ti j |}. Thus we have shown

(T ⊗ IN )(Γ (x1 , . . . , xn ; N, r, ε)) ⊆ Γ (y1 , . . . , yn ; N, r, (cn)r ε).

2
The Lebesgue measure Λ on MN (C)nsa ' RnN scales under the linear map T ⊗ IN
as

Λ [(T ⊗ IN )(Γ (x1 , . . . , xn ; N, r, ε))] = Λ [Γ (x1 , . . . , xn ; N, r, ε)] · | det(T ⊗ IN )|

2
= Λ [Γ (x1 , . . . , xn ; N, r, ε)] · | det T |N .

This yields then for the free entropies the estimate

308 Solutions to Exercises

χ (y1 , . . . , yn ) ≥ χ (x1 , . . . , xn ) + log | det T |.

In order to get the reverse inequality, we do the same argument for the inverse
map, (x1 , . . . , xn ) = T −1 (y1 , . . . , yn ), which gives

χ (x1 , . . . , xn ) ≥ χ (y1 , . . . , yn ) + log | det T −1 | = χ (y1 , . . . , yn ) − log | det T |.

(ii) If (x1 , . . . , xn ) are linear dependent there are (α1 , . . . , αn ) ∈ Cn \ {0} such that
0 = α1 x1 + · · · + αn xn . Since the xi are selfadjoint, the αi can be chosen real. Without
restriction we can assume that α1 6= 0.
Now consider T = In + β T 0 , where T 0 = (ti j )ni, j=1 with ti j = δ1i α j . Then T is
invertible for any β 6= −α1−1 and det T = 1 + α1 β .
On the other hand we also have T (x1 , . . . , xn ) = (x1 , . . . , xn ). Hence, by (i),

χ (x1 , . . . , xn ) = χ (x1 , . . . , xn ) + log | det T | = χ (x1 , . . . , xn ) + log |1 + α1 β |.

Since β is arbitrary and χ ∈ [−∞, +∞), we must have χ (x1 , . . . , xn ) = −∞.

12.8 Solutions to exercises in Chapter 8

2. We have
k
∂ j (Xi1 · · · Xik )(X j ⊗ 1) = ∑ δ j,il Xi1 · · · Xil ⊗ Xil+1 · · · Xik
l=1

where we have adopted the convention that we have Xi1 · · · Xil ⊗ Xil+1 · · · Xik =
Xi1 · · · Xik ⊗ 1 when l = k. Similarly

k
(1 ⊗ X j )∂ j (Xi1 · · · Xik ) = ∑ δ j,il Xi1 · · · Xil−1 ⊗ Xil · · · Xik ,
l=1

and we have adopted the convention that Xi1 · · · Xil−1 ⊗ Xil · · · Xik = 1 ⊗ Xi1 · · · Xik
when l = 1. Thus

∑ ∂ j (Xi1 · · · Xik )(X j ⊗ 1) − (1 ⊗ X j )∂ j (Xi1 · · · Xik )

j
k
= ∑ ∑ δ j,il Xi1 · · · Xil ⊗ Xil+1 · · · Xik − δ j,il Xi1 · · · Xil−1 ⊗ Xil · · · Xik
j l=1
k
= ∑ Xi1 · · · Xil ⊗ Xil+1 · · · Xik − Xi1 · · · Xil−1 ⊗ Xil · · · Xik
l=1
= Xi1 · · · Xik ⊗ 1 − 1 ⊗ Xi1 · · · Xik

because ∑ j δ j,il = 1 for all l.

12.8 Solutions to exercises in Chapter 8 309

3. (i) By linearity we are reduced to checking identities on monomials. So consider

p = xi1 · · · xik , hence p∗ = xik · · · xi1 . Then

k k
∂i p = ∑ δi,il xi1 · · · xil−1 ⊗ xil+1 · · · xik , ∂i p ∗ = ∑ δi,il xik · · · xil+1 ⊗ xil−1 · · · xi1 .
l=1 l=1

Thus
k
hξi , pi = h∂i∗ (1 ⊗ 1), pi = h1 ⊗ 1, ∂i pi = ∑ δi,il τ(xi1 · · · xil−1 )τ(xil+1 · · · xik )
l=1

and
k
hξi , p∗ i = h∂i∗ (1 ⊗ 1), p∗ i = h1 ⊗ 1, ∂i p∗ i ∑ δi,il τ(xik · · · xil+1 )τ(xil−1 · · · xi1 ).
l=1

(ii) Consider again a monomial p = xi1 · · · xik as above. Then

k
(∂i p∗ )∗ = ∑ δi,il xil+1 · · · xik ⊗ xi1 · · · xil−1 .
l=1

(iii) First we note that for r ∈ Chx1 , . . . , xn i we have by the Leibniz rule

hp · ∂i∗ (1 ⊗ 1) · q, ri = h∂i∗ (1 ⊗ 1), p∗ rq∗ i = h1 ⊗ 1, ∂i (p∗ rq∗ )i

= h1 ⊗ 1, ∂i (p∗ ) · 1 ⊗ rq∗ i + h1 ⊗ 1, p∗ ⊗ 1 · ∂i r · 1 ⊗ q∗ i + h1 ⊗ 1, p∗ r ⊗ 1 · ∂i (q∗ )i.

The first term becomes h(id ⊗τ)(∂i p)·q, ri, the middle term becomes h∂i∗ (p⊗q), ri,
and the last term becomes hp · (τ ⊗ id)(∂i q), ri.
(iv) We write p = xi1 · · · xik and q = x j1 · · · x jn . Then using the expansion in (i) we
have

hid ⊗ τ(∂i p), id ⊗ τ(∂i q)i

k n
= ∑ ∑ δi,il δi, jm τ[τ(x jn · · · x jm+1 )x jm−1 · · · x j1 xi1 · · · xil−1 τ(xil+1 · · · xik )].
l=1 m=1

h1 ⊗ ξi , ∂i (p∗ ) · 1 ⊗ qi
k n
= ∑ ∑ δi,il δi, jm τ[τ(x jn · · · x jm+1 )x jm−1 · · · x j1 xi1 · · · xil−1 τ(xil+1 · · · xik )]
l=1 m=1
k l−1
+∑ ∑ δi,il δi,ir τ[x jn · · · x j1 xi1 · · · xir−1 τ(xir+1 · · · xil−1 )τ(xil+1 · · · xik )]
l=1 r=1
310 Solutions to Exercises

and

hξi ⊗ 1, ∂i (p∗ ) · 1 ⊗ qi
k k
= ∑ ∑ δi,il δi,ir τ[x jn · · · x j1 xi1 · · · xil−1 τ(xil+1 · · · xir−1 )τ(xir+1 · · · xik )].
l=1 r=l+1

(v) Check that for p, r ∈ Chx1 , . . . , xn i we have h(id ⊗ τ)(∂i r), pi = hr, ∂i∗ (p ⊗ 1)i.
This shows that Chx1 , . . . , xn i is in the domain of the adjoint of (id ⊗ τ) ◦ ∂i , hence
this adjoint has a dense domain and thus (id ⊗ τ) ◦ ∂i is itself closable.
5. (i) Since we assumed that we have p ∈ L3 (R) we have that hε and H(p) are in
L3 (R) and khε − H(p)k3 → 0. Thus by Hölder’s inequality
Z
|hε (s) − H(p)(s)|2 p(s) ds ≤ khε − H(p)k23 kpk3 .

If f is a polynomial, it is bounded on the support of p which is contained in

[−kxk, kxk]. Thus
Z
| f (s)|2 |hε (s) − H(p)(s)|2 p(s) ds → 0

as ε → 0+ . Thus
Z Z
2π f (s)hε (s)p(s) ds → 2π f (s)H(p)(s)p(s) ds = 2π τ( f (x)H(p)(x)) = τ( f (x)ξ ).

For s and t real and f a polynomial we have for ε > 0

(s − t)( f (s) − f (t)) f (s) − f (t)
≤
(s − t)2 + ε 2 s−t

and the right-hand side is bounded on compact subsets of R2 . Thus

(s − t)( f (s) − f (t))

ZZ
lim p(s)p(t) ds dt
ε→0+ (s − t)2 + ε 2
f (s) − f (t)
ZZ
= p(s)p(t) ds dt = τ ⊗ τ(∂ f (x)).
s−t
On the other hand
(s − t)( f (s) − f (t))
ZZ
p(s)p(t) ds dt
(s − t)2 + ε 2
(s − t) f (s) (s − t) f (t)
ZZ ZZ
= p(s)p(t) ds dt − p(s)p(t) ds dt
(s − t)2 + ε 2 (s − t)2 + ε 2
(s − t) f (s) (t − s) f (s)
ZZ ZZ
= p(s)p(t) ds dt − p(s)p(t) ds dt
(s − t)2 + ε 2 (s − t)2 + ε 2
12.8 Solutions to exercises in Chapter 8 311

(s − t) f (s)
ZZ
= 2 p(s)p(t) ds dt
(s − t)2 + ε 2
s−t
Z Z
= 2 f (s)p(s) p(t) dt ds
(s − t)2 + ε 2
Z
= 2π f (s)p(s)hε (s) ds
→ τ( f (x)ξ ) for ε → 0.

Thus τ( f (x)ξ ) = τ ⊗ τ(∂ f (x)) so ξ satisfies the conjugate relation. Since ξ is a

function of x, ξ is the conjugate variable for x.
(ii) Let γ be the curve {iε + Reiθ | 0 ≤ θ ≤ π} ∪ {x + iε | −R ≤ x ≤ R} ⊂ C+ . As
G is analytic on C+ we have that the integral γ G(z)3 dz = 0. Thus
R

Z R
Z π
G(x + iε)3 dx = −i G(iε + Reiθ )3 Reiθ dθ .
−R 0

Now for c = kxk and for R > c we have

Z c Z c
p(t) 1 1
|G(iε + Reiθ )| ≤ iθ
dt ≤ p(t) dt = .
−c |iε + Re − t| R−c −c R−c

Hence
Z R Z π π
G(x + iε)3 dx = G(iε + Reiθ )3 Reiθ dθ ≤ →0 as R → ∞.

(R − c)3

−R 0

Thus G(x + iε)3 dx = 0. By taking the imaginary part of this equality we get that
R

Z Z
hε (s)2 p(s) ds = 3 p(s)3 ds.

6. We begin by extending τ to vectors in L2 by setting τ(η) = hη, 1i. If η ∈ M

then hη, 1i = τ(1∗ η) so the two ways of computing τ(η) agree. If π is any partition
we define τπ (η, a2 , . . . , an ) to be the product, along the blocks of π, of τ applied to
the product of elements of each block. One block will contain η, but η is the only
argument that is unbounded and it is in L2 so all factors are defined and finite. We
can also use the cumulant-moment formula

κn (η, a2 , . . . , an ) = ∑ µ(π, 1n )τπ (η, a2 , . . . , an )

π∈NC(n)

to extend the definition of κn (η, a2 , . . . , an ); κπ (η, a2 , . . . , an ) is then defined as the

product of cumulants along the blocks of π.
Let us first show that (ii) implies (i). If we have (ii), then κπ (ξi , xi(1) , . . . , xi(m) )
is only different from 0 if 1 belongs to a block of size 2. This means that the only
312 Solutions to Exercises

contributing partitions in the moment-cumulant formula for τ(ξi xi(1) · · · xi(m) ) are of
the form π = {(1, k)} ∪ σ1 ∪ σ2 , where σ1 is a non-crossing partition of [1, k − 1]
and σ2 is a non-crossing partition of [k + 1, m]. Then we have

τ(ξi xi(1) · · · xi(m) ) = ∑ κ2 (ξi , xi(k) )κσ1 (xi(1) , . . . , xi(k−1) )κσ2 (xi(k+1) , . . . , xi(m) )
(1,k)∪σ1 ∪σ2
! !
= ∑ κ2 (ξi , xi(k) ) ∑ κσ1 (xi(1) , . . . , xi(k−1) ) ∑ κσ2 (xi(k+1) , . . . , xi(m) )
k σ1 σ2

= ∑ δii(k) τ(xi(1) · · · xi(k−1) )τ(xi(k+1) · · · xi(m) ).

Let us now show that (i) implies (ii). We do this by induction on m. It is clear that
the conjugate relations (8.12) for m = 0 and m = 1 are equivalent to the cumulant
relations for m = 0 and m = 1 from (ii). So remains to consider the cases m ≥ 3.
Assume (i) and that we have already shown the conditions (ii) up to m − 1. We have
to show it for m. By our induction hypothesis we know that in

τ(ξi xi(1) · · · xi(m) ) = ∑ κπ (ξi , xi(1) , . . . , xi(m) )

π∈NC(m+1)

the cumulants involving ξi are either of length 2 or the maximal one, κm+1 (ξi , xi(1) ,
. . . , xi(m) ); hence

τ(ξi xi(1) · · · xi(m) ) = ∑ κπ (ξi , xi(1) , . . . , xi(m) ) + κm+1 (ξi , xi(1) , . . . , xi(m) )
π=(1,k)∪σ1 ∪σ2

= ∑ δii(k) τ(xi(1) · · · xi(k−1) )τ(xi(k+1) · · · xi(m) ) + κm+1 (ξi , xi(1) , . . . , xi(m) ).

Since the first sum gives by our assumption (i) the value τ(ξi xi(1) · · · xi(m) ) it follows
that κm+1 (ξi , xi(1) , . . . , xi(m) ) = 0.
7. By Theorem 8.20 we have to show that κ1 (ξ ) = 0, κ2 (ξ , x1 + x2 ) = 1 and
κm+1 (ξ , x1 + x2 , . . . , x1 + x2 ) = 0 for all m ≥ 2. However, this follows directly from
the facts that ξ is conjugate variable for x1 (hence we have κ1 (ξ ) = 0, κ2 (ξ , x1 ) = 1
and κm+1 (ξ , x1 , . . . , x1 ) = 0 for all m ≥ 2) and that mixed cumulants in {x1 , ξ } and
x2 vanish; for this note that ξ as a conjugate variable is in L2 (x1 ) and the vanishing
of mixed cumulants in free variables goes also over to a situation, where one of the
variables is in L2 .
8. By Theorem 8.20, the condition that for a conjugate system we have ξi =
xi is equivalent to the cumulant conditions: κ1 (xi ) = 0, κ2 (xi , xi(1) ) = δii(1) , and
κm+1 (xi , xi(1) , . . . , xi(m) ) = 0 for m ≥ 2 and all 1 ≤ i, i(1), . . . , i(m) ≤ n. But these are
just the cumulants of a free semi-circular family.
9. Note that in the special case where i 6∈ {i(1), . . . , i(k − 1), i(k + 1), . . . i(m)} we
have
12.8 Solutions to exercises in Chapter 8 313

∂i∗ si(1) · · · si(k−1) ⊗ si(k+1) · · · si(m) = si(1) · · · si(k−1) si si(k+1) · · · si(m) .

This follows by noticing that in this case in the formula (8.6) for the action of ∂i∗
only the first term is different from zero and gives, by also using ∂i∗ (1 ⊗ 1) = si ,
exactly the above result.
Thus we get in the case where all i(1), . . . , i(m) are different
n n m
∑ ∂i∗ ∂i si(1) · · · si(m) = ∑ ∑ δii(k) ∂i∗ si(1) · · · si(k−1) ⊗ si(k+1) · · · si(m)
i=1 i=1 k=1
m
∗
= ∑∂i(k) si(1) · · · si(k−1) ⊗ si(k+1) · · · si(m)
k=1
m
= ∑ si(1) · · · si(k−1) si(k) si(k+1) · · · si(m)
k=1
= msi(1) · · · si(k−1) si(k) si(k+1) · · · si(m) .

Thus we have ∑ni=1 ∂i∗ ∂i p = mp.

12. (ii) We have to show that τ(ξ p(x)) = τ ⊗ τ(∂ p(x)) for all p(x) ∈ Chxi. By
linearity, it suffices to treat the cases p(x) = Um (x) for all m ≥ 0. So fix such an m.
Thus we have to show

∑ αn τ(Cn (x)Um (x)) = τ ⊗ τ(∂Um (x)).

n≥0

For the left-hand side we have

∑ αn τ(CnUm ) = ∑ αn τ(Un+m ) + τ(Um−n ) + αm+1 τ(U2m+1 )
n n≤m

+ ∑ αn τ(Un+m ) − τ(Un−m−2 )
n≥m+2

= ∑ αn αn+m+1 + αm−n+1 + αm+1 α2m+2
n≤m

+ ∑ αn αn+m+1 − αn−m−1
n≥m+2
= ∑ αn αn+m+1 − ∑ αn αn−m−1 + ∑ αn αm−n+1 .
n n≥m+2 n≤m

But the first two sums cancel and thus we remain with exactly the same as in
m−1 m−1
τ ⊗ τ(∂Um (x)) = ∑ τ(Uk )τ(Um−k−1 ) = ∑ αk+1 αm−k .
k=0 k=0

For the relevance of this in the context of Schwinger-Dyson equations, see [131].
314 Solutions to Exercises

12.9 Solutions to exercises in Chapter 9

2. We have

EB (xd1 · · · xdn−1 x) = ∑ κπB (xd1 , . . . , xdn−1 , x).

π∈NC(n)

Note that the assumption implies that also all κπB (xd1 , . . . , xdn−1 , x) for π ∈ NC(n)
are in D. Applying ED to the equation above gives thus

ED (xd1 · · · xdn−1 x) = ∑ κπB (xd1 , . . . , xdn−1 , x).

π∈NC(n)

If we compare this with the moment-cumulant formula on the D-level,

ED (xd1 · · · xdn−1 x) = ∑ κπD (xd1 , . . . , xdn−1 , x),

π∈NC(n)

then we get the equality of the B-valued and the D-valued cumulants by induction.

12.10 Solutions to exercises in Chapter 10

2. Note that in general

H+ (Mn (C)) = B ∈ Mn (C) | ∃ε > 0 : Im(B) ≥ ε1

= B ∈ Mn (C) | Im(B) is positive definite .

(i) Recall that any self-adjoint matrix

αβ
∈ M2 (C)
β γ

is positive definite if and only if α > 0 and αγ − |β |2 > 0.

Now, for
1

b b
B = 11 12 ∈ M2 (C) we have Im(b) = 1
Im(b11 ) 2i (b12 − b21 ) .
b21 b22 2i 21 − b12 )
(b Im(b22 )

Hence Im(B) is positive definite, if and only if

1
Im(b11 ) > 0 and Im(b11 )Im(b22 ) − |b12 − b21 |2 > 0.
4
(ii) Assume that λ ∈ C is an eigenvalue of B ∈ H+ (Mn (C)). We want to show
that Im(λ ) > 0. Let η ∈ Cn with kηk = 1 be a corresponding eigenvector of B, i.e.
Bη = λ η. Since Im(B) is positive definite, it follows
12.11 Solutions to exercises in Chapter 11 315

1 1
hBη, ηi − hB∗ η, ηi =

0 < hIm(B)η, ηi = hBη, ηi − hη, Bηi = Im(λ ),
2i 2i
as desired.
The converse is not true as shown by the following counterexample for n = 2.
Take a matrix of the form
λ1 ρ
B=
0 λ2
with Im(λ1 ) > 0, Im(λ2 ) > 0 and some ρ ∈ C. B satisfies the condition that
all its eigenvalues belong to the upper half-plane C+ . However, if in addition
|ρ| ≥ 2 Im(λ1 )Im(λ2 ) holds, it cannot belong to H+ (M2 (C)), since the second
p

characterizing condition of H+ (M2 (C)), Im(b11 )Im(b22 ) > |b12 − b21 |2 /4, is vio-
lated.

12.11 Solutions to exercises in Chapter 11

1. We shall show that while ∇2 log |z| = 0 as a function, ∇2 log |z| = 2πδ0 as a
distribution, where δ0 is the distribution which evaluates a test function at (0, 0).
1
In other words, G(z, w) = 2π log |z − w| is the Green function of the the Laplacian
on R2 . To see what this means first note that by writing log |z| dxdy = r log r drdθ ,
where (r, θ ) are polar coordinates, we see that log |z| is a locally integrable function
on R2 . Thus it determines (see Rudin [152, Ch. 6]) a distribution
ZZ p
f 7→ f (x, y) log x2 + y2 dxdy
R2

where f is a test function, i.e. a C∞ -function with compact support. By definition,

the Laplacian of this distribution, ∇2 log |z|, is the distribution
ZZ p
f 7→ ∇2 f (x, y) log x2 + y2 dxdy.
R2

Hence our claim is that for a test function f

ZZ p
∇2 f (x, y) log x2 + y2 dxdy = 2π f (0, 0).
R2

We denote the gradient of f by ∇ f = ( ∂∂ xf , ∂∂ ff ) and the divergence of a vector field F

p p
by ∇ · F. Let Dr = {(x, y) | x2 + y2 < r}, and Dr,R = {(x, y) | r < x2 + y2 < R}.
We proceed in three steps.
(i) Let f , g be C2 -functions on R2 , then

∂ f ∂g ∂ 2g ∂ f ∂ g ∂ 2g
∇ · f ∇g = +f 2+ + f 2 = f ∇2 g + ∇ f · ∇g
∂x ∂x ∂x ∂y ∂y ∂y
so that
f ∇2 g − g∇2 f = ∇ · ( f ∇g − g∇ f ).
316 Solutions to Exercises
p
(ii) Let g(x, y) = log x2 + y2 and f be a test function. Choose R large enough
so that supp( f ) ⊂ DR . We show that for all 0 < r < R
ZZ p Z 1 ∂f
∇2 f (x, y) log x2 + y2 dxdy = f − log r ds.
Dr,R ∂ Dr r ∂r

Let D be an open connected region in R2 and ∂ D its boundary. Suppose that ∂ D

is the union of a finite number of Jordan curves which do not intersect each other.
Green’s theorem asserts that for a vector field F
ZZ Z
∇ · F(x, y) dxdy = F · n ds
D ∂D

where n is the outward pointing unit normal of ∂ D. So in particular, if we let F = ∇ f

we have ZZ Z
∇2 f (x, y) dxdy = ∇ f · n ds.
D ∂D
By assumption both f and ∇ f vanish on ∂ DR , and by our earlier observation that
log |z| is harmonic, ∇2 g = 0 on Dr,R . Hence
ZZ p ZZ
∇2 f (x, y) log x2 + y2 dxdy = − ( f ∇2 g − g∇2 f ) dxdy
Dr,R Dr,R
ZZ
=− ∇ · ( f ∇g − g∇ f ) dxdy
Dr,R
Z Z
= ( f ∇g − g∇ f ) · n ds − ( f ∇g − g∇ f ) · n ds
∂ Dr ∂ DR
Z
= ( f ∇g − g∇ f ) · n ds.
∂ Dr
p
Now ∇g = (x, y)/(x2 +y2 ) and on ∂ Dr we have n = (x, y)/ x2 + y2 , so ∇g·n = 1/r.
Also g = log r on ∂ Dr , and on Dr , ∇ f · n = ∂∂ rf , by the chain rule. Thus

1 ∂f
ZZ p Z Z
∇2 f (x, y) log x2 + y2 dxdy = f ds − log r ds.
Dr,R r ∂ Dr ∂ Dr ∂ r

(iii) Finally we show that for a test function f

ZZ p
∇2 f (x, y) log x2 + y2 dxdy = 2π f (0, 0).
R2

above let us parameterize ∂ Dr with x(θ ) = r cos θ and

To calculate the integrals p
y(θ ) = r sin θ . Then ds = x0 (θ )2 + y0 (θ )2 dθ = r dθ . So
Z 2π
1
Z
f ds = f (r cos θ , r sin θ ) dθ
r ∂ Dr 0
12.11 Solutions to exercises in Chapter 11 317

which converges to 2π f (0, 0) as r → 0. Also

Z 2π
∂f ∂f
Z
log r ds = r log r (r cos θ , r sin θ ) dθ .
∂ Dr ∂r 0 ∂r

Now as r → 0, 02π ∂f
converges to 2π ∂∂ rf (0, 0) and r log r con-
R
∂ r (r cos θ , r sin θ ) dθ
verges to 0. Thus
ZZ p ZZ p
∇2 f (x, y) log x2 + y2 dxdy = ∇2 f (x, y) log x2 + y2 dxdy
R2 DR
ZZ p
= lim ∇2 f (x, y) log x2 + y2 dxdy
r→0 Dr,R

= 2π f (0, 0)

as claimed.
4. Let us put
0 a 0λ
A := , Λ := .
a∗ 0 λ̄ 0
Note that both A and Λ are selfadjoint and have with respect to tr ⊗ τ the distribu-
tions µ̃|a| , and (δα + δ−α )/2, respectively, and that A −Λ has the distribution µ̃|a−λ | .
(It is of course important that we are in a tracial setting, so that aa∗ and a∗ a have the
same distribution.)
It remains to show that A and Λ are free with respect to tr ⊗ τ. For this note that
the kernel of tr ⊗ τ on the unital algebra generated by A is spanned by matrices of
the form
(aa∗ )k−1 a
∗ k
(aa ) − τ((aa∗ )k )

0 0
or
(a∗ a)k−1 a∗ 0 0 (a∗ a)k − τ((a∗ a)k )
(12.7)
for some k ≥ 1; whereas the kernel of tr ⊗ τ on the algebra generated by Λ is just
spanned by the off-diagonal matrices of the form

0 |λ |k λ

= |λ |k Λ
|λ |k λ̄ 0

for some k ≥ 1. Hence we have to check that we have

tr ⊗ τ[A1Λ A2Λ · · · AnΛ ] = 0 and tr ⊗ τ[A1Λ A2Λ · · · An ] = 0,

for all n and all choices of A1 , . . . , An from the collection (12.7). Multiplication with
Λ has on the Ai the effect that we get matrices from the collection
∗ k−1
(aa∗ )k − τ((aa∗ )k )

(aa ) a 0 0
or .
0 (a∗ a)k−1 a∗ (a∗ a)k − τ((a∗ a)k ) 0
(12.8)
318 Solutions to Exercises

Hence we have to see that whenever we multiply matrices from the collection (12.8)
in any order we get only matrices where all entries vanish under the application of
τ. Let us denote the non-trivial entries in the matrices from (12.8) as follows.

pk11 := (aa∗ )k−1 a, pk12 := (aa∗ )k − τ((aa∗ )k ),

pk21 := (a∗ a)k − τ((a∗ a)k ), pk22 := (a∗ a)k−1 a∗ .
k
With this notation we have to show that τ(pki11i2 pki22i3 · · · pin−1
n−1 kn
in pin in+1 ) = 0 for all n, k ≥
1 and all i1 , . . . , in+1 ∈ {1, 2}. Now we use the fact that an R-diagonal element a
has the property that its ∗-distribution is invariant under the multiplication with a
free Haar unitary; this means we can replace a by au, where u is a Haar unitary
which is ∗-free from a. But then our operators pkij go over to pk11 u, pk12 , u∗ pk21 u, and
u∗ pk22 . If we multiply those elements as required then we always get words which are
alternating in factors from the pkij and {u, u∗ }; all those factors are centred, hence,
by the ∗-freeness between a and u, the whole product is centred.
For more details, see also [136].
References

1. Lars Aagaard and Uffe Haagerup. Moment formulas for the quasi-nilpotent
DT-operator. Internat. J. Math., 15(6):581–628, 2004.
2. Gernot Akemann, Jinho Baik, and Philippe Di Francesco, editors. The Oxford
handbook of random matrix theory. Oxford University Press, 2011.
3. Naum I. Akhiezer. The classical moment problem and some related questions
in analysis. Hafner Publishing Co., New York, 1965.
4. Naum I. Akhiezer and Izrail’ M. Glazman. Theory of linear operators in
Hilbert space. Vol. II. Pitman, Boston, 1981.
5. Greg W. Anderson. Convergence of the largest singular value of a polynomial
in independent Wigner matrices. Ann. Probab., 41(3B):2103–2181, 2013.
6. Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to ran-
dom matrices, volume 118 of Cambridge Studies in Advanced Mathematics.
Cambridge University Press, Cambridge, 2010.
7. Greg W. Anderson and Ofer Zeitouni. A CLT for a band matrix model. Probab.
Theory Relat. Fields, 134(2):283–338, 2006.
8. Michael Anshelevich. Free martingale polynomials. J. Funct. Anal.,
201(1):228–261, 2003.
9. Michael V. Anshelevich, Serban T. Belinschi, Marek Bożejko, and Franz
Lehner. Free infinite divisibility for q-Gaussians. Math. Res. Lett., 17(5):905–
916, 2010.
10. Octavio Arizmendi, Takahiro Hasebe, Franz Lehner, and Carlos Vargas. Rela-
tions between cumulants in noncommutative probability. Adv. Math., 282:56–
92, 2015.
11. Octavio Arizmendi, Takahiro Hasebe, and Noriyoshi Sakuma. On the law of
free subordinators. ALEA, Lat. Am. J. Probab. Math. Stat., 10(1):271–291,
2013.
12. Octavio Arizmendi, Ion Nechita, and Carlos Vargas. On the asymptotic dis-
tribution of block-modified random matrices. J. Math. Phys., 57(1):015216,
2016.

319
320 References

13. David H. Armitage and Stephen J. Gardiner. Classical potential theory. Lon-
don: Springer, 2001.
14. Zhidong Bai and Jack W. Silverstein. CLT for linear spectral statistics of
large-dimensional sample covariance matrices. Ann. Probab., 32(1A):553–
605, 2004.
15. Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional
random matrices. Springer Series in Statistics. Springer, New York, second
edition, 2010.
16. Teodor Banica, Serban Teodor Belinschi, Mireille Capitaine, and Benoit
Collins. Free Bessel laws. Canad. J. Math, 63(1):3–37, 2011.
17. Teodor Banica and Roland Speicher. Liberation of orthogonal Lie groups.
Adv. Math., 222(4):1461–1501, 2009.
18. Serban T Belinschi. Complex Analysis Methods In Noncommutative Proba-
bility. PhD thesis, Indiana University, 2005.
19. Serban T Belinschi. The Lebesgue decomposition of the free additive convo-
lution of two probability distributions. Probab. Theory Related Fields, 142(1-
2):125–150, 2008.
20. Serban T. Belinschi and H. Bercovici. A property of free entropy. Pacific J.
Math., 211(1):35–40, 2003.
21. Serban T Belinschi and Hari Bercovici. A new approach to subordination
results in free probability. J. Anal. Math., 101(1):357–365, 2007.
22. Serban T Belinschi, Marek Bożejko, Franz Lehner, and Roland Speicher. The
normal distribution is -infinitely divisible. Adv. Math., 226(4):3677–3698,
2011.
23. Serban T Belinschi, Tobias Mai, and Roland Speicher. Analytic subordina-
tion theory of operator-valued free additive convolution and the solution of
a general random matrix problem. J. Reine Angew. Math., published online:
2015-04-12.
24. Serban T Belinschi, Piotr Śniady, and Roland Speicher. Eigenvalues of non-
hermitian random matrices and Brown measure of non-normal operators: Her-
mitian reduction and linearization method. arXiv preprint arXiv:1506.02017,
2015.
25. Serban T. Belinschi, Roland Speicher, John Treilhard, and Carlos Vargas.
Operator-valued free multiplicative convolution: analytic subordination the-
ory and applications to random matrix theory. Int. Math. Res. Not. IMRN,
2015(14):5933–5958, 2015.
26. Gérard Ben Arous and Alice Guionnet. Large deviations for Wigner’s law
and Voiculescu’s non-commutative entropy. Probab. Theory Related Fields,
108(4):517–542, 1997.
27. Florent Benaych-Georges. Taylor expansions of R-transforms: application to
supports and moments. Indiana Univ. Math. J., 55(2):465–481, 2006.
28. Florent Benaych-Georges. Rectangular random matrices, related convolution.
Probab. Theory Related Fields, 144(3-4):471–515, 2009.
References 321

29. Hari Bercovici and Vittorino Pata. Stable laws and domains of attraction in
free probability theory. Ann. of Math. (2), 149:1023–1060, 1999.
30. Hari Bercovici and Dan-Virgil Voiculescu. Lévy-Hinčin type theorems for
multiplicative and additive free convolution. Pac. J. Math., 153(2):217–248,
1992.
31. Hari Bercovici and Dan-Virgil Voiculescu. Free convolution of measures with
unbounded support. Indiana Univ. Math. J., 42(3):733–773, 1993.
32. Hari Bercovici and Dan-Virgil Voiculescu. Superconvergence to the central
limit and failure of the Cramér theorem for free random variables. Probab.
Theory Related Fields, 103(2):215–222, 1995.
33. Philippe Biane. Some properties of crossings and partitions. Discrete Math.,
175(1-3):41–53, 1997.
34. Philippe Biane. Processes with free increments. Math. Z., 227(1):143–174,
1998.
35. Philippe Biane. Representations of symmetric groups and free probability.
Adv. Math., 138(1):126–181, 1998.
36. Philippe Biane. Entropie libre et algèbres d’opérateurs. Séminaire Bourbaki,
43:279–300, 2002.
37. Philippe Biane. Free probability and combinatorics. Proc. ICM 2002, Beijing,
Vol. II:765–774, 2002.
38. Philippe Biane, Mireille Capitaine, and Alice Guionnet. Large deviation
bounds for matrix Brownian motion. Invent. Math., 152(2):433–459, 2003.
39. Philippe Biane and Franz Lehner. Computation of some examples of Brown’s
spectral measure in free probability. Colloq. Math., 90(2):181–211, 2001.
40. Philippe Biane and Roland Speicher. Stochastic calculus with respect to free
Brownian motion and analysis on Wigner space. Probab. Theory Related
Fields, 112(3):373–409, 1998.
41. Patrick Billingsley. Probability and Measure. Wiley Series in Probability and
Mathematical Statistics. John Wiley & Sons, Inc., New York, third edition,
1995.
42. Charles Bordenave and Djalil Chafaı̈. Around the circular law. Probab. Surv.,
9:1–89, 2012.
43. Marek Bożejko. On Λ (p) sets with minimal constant in discrete noncommu-
tative groups. Proc. Amer. Math. Soc., 51(2):407–412, 1975.
44. Michael Brannan. Approximation properties for free orthogonal and free uni-
tary quantum groups. J. Reine Angew. Math., 2012(672):223–251, 2012.
45. Edouard Brézin, Claude Itzykson, Giorgio Parisi, and Jean-Bernard Zuber.
Planar diagrams. Comm. Math. Phy., 59(1):35–51, 1978.
46. Lawrence G. Brown. Lidskiı̆’s theorem in the type II case. In Geometric
methods in operator algebras (Kyoto, 1983), volume 123 of Pitman Res. Notes
Math. Ser., pages 1–35. Longman Sci. Tech., Harlow, 1986.
47. Thierry Cabanal-Duvillard. Fluctuations de la loi empirique de grandes matri-
ces aléatoires. Ann. Inst. H. Poincaré Probab. Statist., 37(3):373–402, 2001.
322 References

48. Mireille Capitaine and Catherine Donati-Martin. Strong asymptotic freeness

for Wigner and Wishart matrices. Indiana Univ. Math. J., 56(2):767–803,
2007.
49. Donald I. Cartwright and Paolo M. Soardi. Random walks on free products,
quotients and amalgams. Nagoya Math. J., 102:163–180, 1986.
50. Guillame Cébron, Antoine Dahlqvist, and Camille Male. Universal construc-
tions for spaces of traffics. ArXiv e-prints, January 2016.
51. Ian Charlesworth, Brent Nelson, and Paul Skoufranis. Combinatorics of bi-
freeness with amalgamation. Comm. Math. Phy., 338(2):801–847, 2015.
52. Ian Charlesworth and Dimitri Shlyakhtenko. Free entropy dimension and reg-
ularity of non-commutative polynomials. J. Funct. Anal., 271(8):2274–2292,
2016.
53. Gennadii P. Chistyakov and Friedrich Götze. Limit theorems in free probabil-
ity theory. I. Ann. Probab., pages 54–90, 2008.
54. Gennadii P. Chistyakov and Friedrich Götze. The arithmetic of distributions
in free probability theory. Open Math. J., 9(5):997–1050, 2011.
55. Kai Lai Chung. A Course in Probability Theory. Academic Press, Inc., San
Diego, CA, third edition, 2001.
56. Paul M. Cohn. Free ideal rings and localization in general rings. Cambridge:
Cambridge University Press, 2006.
57. Benoı̂t Collins. Moments and cumulants of polynomial random variables on
unitary groups, the Itzykson-Zuber integral, and free probability. Int. Math.
Res. Not. IMRN, 2003(17):953–982, 2003.
58. Benoı̂t Collins and Camille Male. Liberté asymptotique forte pour les ma-
trices de Haar et les matrices déterministes. Ann. Sci. Éc. Norm. Supér. (4),
47(1):147–163, 2014.
59. Benoı̂t Collins, James A. Mingo, Piotr Śniady, and Roland Speicher. Second
order freeness and fluctuations of random matrices. III. Higher order freeness
and free cumulants. Doc. Math., 12:1–70, 2007.
60. Benoı̂t Collins and Piotr Śniady. Integration with respect to the Haar measure
on unitary, orthogonal and symplectic group. Comm. Math. Phy., 264(3):773–
795, 2006.
61. Robert Cori. Un code pour les graphes planaires et ses applications.
Astérisque, No. 27. Société Mathématique de France, Paris, 1975.
62. Romain Couillet and Mérouane Debbah. Random matrix methods for wireless
communications. Cambridge University Press, Cambridge, 2011.
63. Yoann Dabrowski. A note about proving non-Γ under a finite non-microstates
free Fisher information assumption. J. Funct. Anal., 258(11):3662–3674,
2010.
64. Amir Dembo and Ofer Zeitouni. Large deviations techniques and applica-
tions, volume 38 of Stochastic Modelling and Applied Probability. Springer-
Verlag, Berlin, 2010.
References 323

65. Peter L. Duren. Univalent functions, volume 259 of Grundlehren der Math-
ematischen Wissenschaften [Fundamental Principles of Mathematical Sci-
ences]. Springer-Verlag, New York, 1983.
66. Kenneth J. Dykema. On certain free product factors via an extended matrix
model. J. Funct. Anal., 112(1):31–60, 1993.
67. Kenneth J. Dykema. Interpolated free group factors. Pacific J. Math.,
163(1):123–135, 1994.
68. Kenneth J. Dykema. Multilinear function series and transforms in free proba-
bility theory. Adv. Math., 208(1):351–407, 2007.
69. Alan Edelman and N Raj Rao. Random matrix theory. Acta Numer., 14:233–
297, 2005.
70. Wiktor Ejsmont, Uwe Franz, and Kamil Szpojankowski. Convolution, subor-
dination and characterization problems in noncommutative probability. Indi-
ana Univ. Math. J., 66:237–257, 2017.
71. Joshua Feinberg and Anthony Zee. Non-Hermitian random matrix theory:
method of Hermitian reduction. Nuclear Phys. B, 504(3):579–608, 1997.
72. Valentin Féray and Piotr Śniady. Asymptotics of characters of symmetric
groups related to Stanley character formula. Ann. of Math. (2), 173(2):887–
906, 2011.
73. Amaury Freslon and Moritz Weber. On the representation theory of partition
(easy) quantum groups. J. Reine Angew. Math., 720:155–197, 2016.
74. Roland M. Friedrich and John McKay. Homogeneous Lie Groups and Quan-
tum Probability. ArXiv e-prints, June 2015.
75. Bent Fuglede and Richard V. Kadison. Determinant theory in finite factors.
Ann. of Math. (2), 55:520–530, 1952.
76. Liming Ge. Applications of free entropy to finite von Neumann algebras. II.
Ann. of Math. (2), 147(1):143–157, 1998.
77. Vyacheslav L. Girko. The circular law. Teor. Veroyatnost. i Primenen.,
29(4):669–679, 1984.
78. Vyacheslav L. Girko. Theory of stochastic canonical equations. Vol. I, vol-
ume 535 of Mathematics and its Applications. Kluwer Academic Publishers,
Dordrecht, 2001.
79. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete mathe-
matics. Addison-Wesley Publishing Company, Reading, MA, second edition,
1994.
80. David S. Greenstein. On the analytic continuation of functions which map the
upper half plane into itself. J. Math. Anal. Appl., 1:355–362, 1960.
81. Yinzheng Gu, Hao-Wei Huang, and James A. Mingo. An analogue of the
Lévy-Hinčin formula for bi-free infinitely divisible distributions. Indiana
Univ. Math. J., 65(5):1795–1831, 2016.
82. Alice Guionnet, Manjunath Krishnapur, and Ofer Zeitouni. The single ring
theorem. Ann. of Math. (2), 174(2):1189–1217, 2011.
324 References

83. Alice Guionnet and Dimitri Shlyakhtenko. Free monotone transport. Invent.
Math., 197(3):613–661, 2014.
84. Uffe Haagerup. Random matrices, free probability and the invariant subspace
problem relative to a von Neumann algebra. Proc. ICM 2002, Beijing, Vol.
I:273–290, 2002.
85. Uffe Haagerup and Flemming Larsen. Brown’s spectral distribution measure
for R-diagonal elements in finite von Neumann algebras. J. Funct. Anal.,
176(2):331–367, 2000.
86. Uffe Haagerup and Hanne Schultz. Brown measures of unbounded operators
affiliated with a finite von Neumann algebra. Math. Scand., 100(2):209–263,
2007.
87. Uffe Haagerup and Hanne Schultz. Invariant subspaces for operators in a
general II1 -factor. Publ. Math. Inst. Hautes Études Sci., 109(1):19–111, 2009.
88. Uffe Haagerup, Hanne Schultz, and Steen Thorbjørnsen. A random matrix
∗ (F ). Adv. Math., 204(1):1–83, 2006.
approach to the lack of projections in Cred 2
89. Uffe Haagerup and Steen Thorbjørnsen. A new application of random matri-
∗ (F )) is not a group. Ann. of Math. (2), 162(2):711–775, 2005.
ces: Ext(Cred 2
90. Walid Hachem, Philippe Loubaton, and Jamal Najim. Deterministic equiv-
alents for certain functionals of large random matrices. Ann. Appl. Probab.,
17(3):875–930, 2007.
91. John Harer and Don Zagier. The Euler characteristic of the moduli space of
curves. Invent. Math., 85(3):457–485, 1986.
92. Walter K. Hayman and Patrick B. Kennedy. Subharmonic functions. Vol. I.
London Mathematical Society Monographs. No. 9. Academic Press, London-
New York, 1976.
93. J. William Helton, Tobias Mai, and Roland Speicher. Applications of realiza-
tions (aka linearizations) to free probability. arXiv preprint arXiv:1511.05330,
2015.
94. J. William Helton, Scott A. McCullough, and Victor Vinnikov. Noncom-
mutative convexity arises from linear matrix inequalities. J. Funct. Anal.,
240(1):105–191, 2006.
95. J. William Helton, Reza Rashidi Far, and Roland Speicher. Operator-valued
semicircular elements: solving a quadratic matrix equation with positivity con-
straints. Int. Math. Res. Not. IMRN, 2007(22):Art. ID rnm086, 15, 2007.
96. Fumio Hiai and Dénes Petz. Asymptotic freeness almost everywhere for ran-
dom matrices. Acta Sci. Math. (Szeged), 66(3-4):809–834, 2000.
97. Fumio Hiai and Dénes Petz. The semicircle law, free random variables and
entropy, volume 77 of Mathematical Surveys and Monographs. American
Mathematical Society, Providence, RI, 2000.
98. Graham Higman. The units of group-rings. Proc. Lond. Math. Soc., 2(1):231–
248, 1940.
99. Kenneth Hoffman. Banach spaces of analytic functions. Dover Publications,
Inc., New York, 1988.
References 325

100. Hao-Wei Huang and Jiun-Chau Wang. Analytic aspects of the bi-free partial
R-transform. J. Funct. Anal., 271(4):922–957, 2016.
101. Leon Isserlis. On a formula for the product-moment coefficient of any order
of a normal frequency distribution in any number of variables. Biometrika,
12(1/2):134–139, 1918.
102. Alain Jacques. Sur le genre d’une paire de substitutions. C. R. Acad. Sci. Paris
Sér. A-B, 267:A625–A627, 1968.
103. Romuald A. Janik, Maciej A. Nowak, Gábor Papp, and Ismail Zahed. Non-
Hermitian random matrix models. Nuclear Phys. B, 501(3):603–642, 1997.
104. Kurt Johansson. On fluctuations of eigenvalues of random Hermitian matrices.
Duke Math. J., 91(1):151–204, 1998.
105. Vaughan F. R. Jones. Index for subfactors. Invent. Math., 72(1):1–25, 1983.
106. Richard V. Kadison and John R. Ringrose. Fundamentals of the theory of
operator algebras. Vol. II. American Mathematical Society, Providence, RI,
1997.
107. Dmitry S. Kaliuzhnyi-Verbovetskyi and Victor Vinnikov. Foundations of free
noncommutative function theory, volume 199 of Mathematical Surveys and
Monographs. American Mathematical Society, Providence, RI, 2014.
108. Todd Kemp, Ivan Nourdin, Giovanni Peccati, and Roland Speicher. Wigner
chaos and the fourth moment. Ann. Probab., 40(4):1577–1635, 2012.
109. Paul Koosis. Introduction to H p spaces, volume 115 of Cambridge Tracts in
Mathematics. Cambridge University Press, Cambridge, second edition, 1998.
110. Claus Köstler and Roland Speicher. A noncommutative de Finetti theorem:
invariance under quantum permutations is equivalent to freeness with amalga-
mation. Comm. Math. Phy., 291(2):473–490, 2009.
111. Bernadette Krawczyk and Roland Speicher. Combinatorics of free cumulants.
J. Combin. Theory Ser. A, 90(2):267–292, 2000.
112. Milan Krbalek, Petr Šeba, and Peter Wagner. Headways in traffic flow: Re-
marks from a physical perspective. Phys. Rev. E, 64(6):066119, 2001.
113. Germain Kreweras. Sur les partitions non croisées d’un cycle. Discrete Math.,
1(4):333–350, 1972.
114. Burkhard Kümmerer and Roland Speicher. Stochastic integration on the Cuntz
algebra O∞ . J. Funct. Anal., 103(2):372–408, 1992.
115. Timothy Kusalik, James A. Mingo, and Roland Speicher. Orthogonal polyno-
mials and fluctuations of random matrices. J. Reine Angew. Math., 604:1–46,
2007.
116. Flemming Larsen. Brown measures and R-diagonal elements in finite von
Neumann algebras. PhD thesis, University of Southern Denmark, 1999.
117. Franz Lehner. Cumulants in noncommutative probability theory I. Noncom-
mutative exchangeability systems. Math. Z., 248(1):67–100, 2004.
118. François Lemeux and Pierre Tarrago. Free wreath product quantum groups:
the monoidal category, approximation properties and free probability. J. Funct.
Anal., 270(10):3828–3883, 2016.
326 References

119. Eugene Lukacs. Characteristic functions. Griffin’s Statistical Monographs &

Courses, No. 5. Hafner Publishing Co., New York, 1960.
120. Hans Maassen. Addition of freely independent random variables. J. Funct.
Anal., 106(2):409–438, 1992.
121. Tobias Mai, Roland Speicher, and Moritz Weber. Absence of algebraic rela-
tions and of zero divisors under the assumption of full non-microstates free
entropy dimension. Adv. Math., 304:1080–1107, 2017.
122. Camille Male. The distributions of traffics and their free product. arXiv
preprint arXiv:1111.4662, 2015.
123. Vladimir A. Marchenko and Leonid A. Pastur. Distribution of eigenvalues for
some sets of random matrices. Mat. Sb., 114(4):507–536, 1967.
124. Adam Marcus, Daniel A. Spielman, and Nikhil Srivastava. Finite free convo-
lutions of polynomials. arXiv preprint arXiv:1504.00350, 2015.
125. John C. McLaughlin. Random walks and convolution operators on free prod-
ucts. PhD thesis, New York University, 1986.
126. James A. Mingo and Alexandru Nica. Annular noncrossing permutations and
partitions, and second-order asymptotics for random matrices. Int. Math. Res.
Not., 2004(28):1413–1460, 2004.
127. James A. Mingo and Mihai Popa. Real second order freeness and Haar or-
thogonal matrices. J. Math. Phys., 54(5):051701, 35, 2013.
128. James A. Mingo, Piotr Śniady, and Roland Speicher. Second order freeness
and fluctuations of random matrices. II. Unitary random matrices. Adv. Math.,
209(1):212–240, 2007.
129. James A. Mingo and Roland Speicher. Second order freeness and fluctuations
of random matrices. I. Gaussian and Wishart matrices and cyclic Fock spaces.
J. Funct. Anal., 235(1):226–270, 2006.
130. James A. Mingo and Roland Speicher. Sharp bounds for sums associated to
graphs of matrices. J. Funct. Anal., 262(5):2272–2288, 2012.
131. James A. Mingo and Roland Speicher. Schwinger-Dyson equations: classical
and quantum. Probab. Math. Stat., 33(2):275–285, 2013.
132. James A. Mingo, Roland Speicher, and Edward Tan. Second order cumulants
of products. Trans. Amer. Math. Soc., 361(9):4751–4781, 2009.
133. Rick Miranda. Algebraic curves and Riemann surfaces, volume 5 of Gradu-
ate Studies in Mathematics. American Mathematical Society, Providence, RI,
1995.
134. Francis J. Murray and John von Neumann. On rings of operators. Ann. of
Math. (2), 37(1):116–229, 1936.
135. Alexandru Nica. R-transforms of free joint distributions and non-crossing
partitions. J. Funct. Anal., 135(2):271–296, 1996.
136. Alexandru Nica, Dimitri Shlyakhtenko, and Roland Speicher. R-diagonal el-
ements and freeness with amalgamation. Canad. J. Math., 53(2):355–381,
2001.
References 327

137. Alexandru Nica, Dimitri Shlyakhtenko, and Roland Speicher. Operator-valued

distributions. I. Characterizations of freeness. Int. Math. Res. Not. IMRN,
2002(29):1509–1538, 2002.
138. Alexandru Nica, Dimitri Shlyakhtenko, and Roland Speicher. R-cyclic fami-
lies of matrices in free probability. J. Funct. Anal., 188(1):227–271, 2002.
139. Alexandru Nica and Roland Speicher. R-diagonal pairs—a common approach
to Haar unitaries and circular elements. In Free probability theory (Waterloo,
ON, 1995), volume 12 of Fields Inst. Commun., pages 149–188. Amer. Math.
Soc., Providence, RI, 1997.
140. Alexandru Nica and Roland Speicher. Lectures on the combinatorics of free
probability, volume 335 of London Mathematical Society Lecture Note Series.
Cambridge University Press, Cambridge, 2006.
141. Jonathan Novak. Three lectures on free probability. In Random matrix the-
ory, interacting particle systems and integrable systems, pages 309–383. New
York, NY: Cambridge University Press, 2014.
142. Jonathan Novak and Piotr Śniady. What is ... a free cumulant? Notices Amer.
Math. Soc., 58(2):300–301, 2011.
143. Mihai Pimsner and Dan-Virgil Voiculescu. K-groups of reduced crossed prod-
ucts by free groups. J. Operator Theory, 8(1):131–156, 1982.
144. N. Raj Rao and Roland Speicher. Multiplication of free random variables and
the S-transform: the case of vanishing mean. Electron. Commun. Probab.,
12:248–258, 2007.
145. Reza Rashidi Far, Tamer Oraby, Wlodek Bryc, and Roland Speicher. On slow-
fading MIMO systems with nonseparable correlation. IEEE Trans. Inform.
Theory, 54(2):544–553, 2008.
146. Sven Raum and Moritz Weber. The full classification of orthogonal easy quan-
tum groups. Comm. Math. Phy., 341(3):751–779, 2016.
147. Catherine Emily Iska Redelmeier. Real second-order freeness and the asymp-
totic real second-order freeness of several real matrix models. Int. Math. Res.
Not. IMRN, 2014(12):3353–3395, 2014.
148. Michael Reed and Barry Simon. Methods of modern mathematical physics. I:
Functional analysis. Academic Press, New York-London, 1980.
149. Florin Rădulescu. The fundamental group of the von Neumann algebra of a
free group with infinitely many generators is R+ \{0}. J. Amer. Math. Soc.,
5(3):517–532, 1992.
150. Florin Rădulescu. Random matrices, amalgamated free products and subfac-
tors of the von Neumann algebra of a free group, of noninteger index. Invent.
Math., 115(2):347–389, 1994.
151. Walter Rudin. Real and complex analysis. 3rd ed. McGraw-Hill,New York,
1987.
152. Walter Rudin. Functional analysis. McGraw-Hill, Inc., New York, second
edition, 1991.
328 References

153. Michael Schürmann and Stefan Voß. Schoenberg correspondence on dual

groups. Comm. Math. Phys., 328(2):849–865, 2014.
154. Marcel-Paul Schützenberger. On the definition of a family of automata. Inf.
Control, 4:245–270, 1961.
155. Dimitri Shlyakhtenko. Random Gaussian band matrices and freeness with
amalgamation. Int. Math. Res. Not. IMRN, 1996(20):1013–1025, 1996.
156. Dimitri Shlyakhtenko. Free probability, planar algebras, subfactors and ran-
dom matrices. Proc. ICM 2010 Hyderabad, Vol. III:1603–1623, 2010.
157. James A. Shohat and Jacob D. Tamarkin. The problem of moments, Mathe-
matical Surveys. I. American Mathematical Society, New York, 1943.
158. Piotr Śniady and Roland Speicher. Continuous family of invariant subspaces
for R–diagonal operators. Invent. Math., 146(2):329–363, 2001.
159. Roland Speicher. A new example of “independence” and “white noise”.
Probab. Theory Related Fields, 84(2):141–159, 1990.
160. Roland Speicher. Free convolution and the random sum of matrices. Publ.
Res. Inst. Math. Sci., 29(5):731–744, 1993.
161. Roland Speicher. Multiplicative functions on the lattice of non-crossing parti-
tions and free convolution. Math. Ann., 298(1):611–628, 1994.
162. Roland Speicher. Free probability theory and non-crossing partitions. Sém.
Lothar. Combin, 39:38, 1997.
163. Roland Speicher. Combinatorial theory of the free product with amalgamation
and operator-valued free probability theory. Mem. Am. Math. Soc., 627:88,
1998.
164. Roland Speicher. Free probability and random matrices. Proc. ICM 2014, Vol.
III:477–501, 2014.
165. Roland Speicher. Free probability theory: and its avatars in representation the-
ory, random matrices, and operator algebras; also featuring: non-commutative
distributions. Jahresber. Dtsch. Math.-Ver., 119(1):3–30, 2017.
166. Roland Speicher and Carlos Vargas. Free deterministic equivalents, rectangu-
lar random matrix models, and operator-valued free probability theory. Ran-
dom Matrices Theory Appl., 1(02):1150008, 2012.
167. Richard P. Stanley. Enumerative combinatorics. Vol. 1. Cambridge: Cam-
bridge University Press, 1997.
168. Elias M. Stein and Guido Weiss. Introduction to Fourier analysis on Euclidean
spaces. Princeton University Press, Princeton, 1971.
169. Stanislaw J. Szarek. Nets of Grassmann manifold and orthogonal group. Proc.
Res. Workshop on Banach space theory, Univ. Iowa 1981, 169-185, 1982.
170. Gerard ’t Hooft. A planar diagram theory for strong interactions. Nuclear
Phys. B, 72:461–473, 1974.
171. Masamichi Takesaki. Theory of operator algebras I. Berlin: Springer, 2002.
172. Terence Tao. Topics in random matrix theory, volume 132 of Graduate Studies
in Mathematics. American Mathematical Society, Providence, RI, 2012.
References 329

173. Steen Thorbjørnsen. Mixed moments of Voiculescu’s Gaussian random ma-

trices. J. Funct. Anal., 176(2):213–246, 2000.
174. Antonia M. Tulino and Sergio Verdú. Random matrix theory and wireless
communications. Found. Trends Commun. Inf. Theory, 1(1):184, 2004.
175. Carlos Vargas Obieta. Free probability theory: deterministic equivalents and
combinatorics. PhD thesis, Saarbrücken, Universität des Saarlandes, 2015.
176. Dan-Virgil Voiculescu. Symmetries of some reduced free product C∗ -algebras.
In Operator algebras and their connections with topology and ergodic the-
ory (Buşteni, 1983), volume 1132 of Lecture Notes in Math., pages 556–588.
Springer, Berlin, 1985.
177. Dan-Virgil Voiculescu. Addition of certain noncommuting random variables.
J. Funct. Anal., 66(3):323–346, 1986.
178. Dan-Virgil Voiculescu. Multiplication of certain non-commuting random vari-
ables. J. Operator Theory, 18(2):223–235, 1987.
179. Dan-Virgil Voiculescu. Circular and semicircular systems and free product
factors. In Operator algebras, unitary representations, enveloping algebras,
and invariant theory (Paris, 1989), volume 92 of Progr. Math., pages 45–60.
Birkhäuser, Boston, 1990.
180. Dan-Virgil Voiculescu. Limit laws for random matrices and free products.
Invent. Math., 104(1):201–220, 1991.
181. Dan-Virgil Voiculescu. The analogues of entropy and of Fisher’s information
measure in free probability theory. I. Comm. Math. Phys., 155(1):71–92, 1993.
182. Dan-Virgil Voiculescu. The analogues of entropy and of Fisher’s information
measure in free probability theory. II. Invent. Math., 118(3):411–440, 1994.
183. Dan-Virgil Voiculescu. Free probability theory: Random matrices and von
Neumann algebras. In Proceedings of the International Congress of Mathe-
maticians, ICM, volume 94, pages 3–11, 1994.
184. Dan-Virgil Voiculescu. Operations on certain non-commutative operator-
valued random variables. In Recent advances in operator algebras (Orléans,
1992), volume 232 of Astérisque, pages 243–275. Soc. Math. France, 1995.
185. Dan-Virgil Voiculescu. The analogues of entropy and of Fisher’s information
measure in free probability theory. III. The absence of Cartan subalgebras.
Geom. Funct. Anal., 6(1):172–199, 1996.
186. Dan-Virgil Voiculescu. The analogues of entropy and of Fisher’s information
measure in free probability theory. IV. Maximum entropy and freeness. In Free
Probability Theory (Waterloo, ON, 1995), volume 12 of Fields Inst. Commun.,
pages 293–302. Amer. Math. Soc., Providence, RI, 1997.
187. Dan-Virgil Voiculescu. The analogues of entropy and of Fisher’s information
measure in free probability theory. V. Noncommutative Hilbert transforms.
Invent. Math., 132(1):189–227, 1998.
188. Dan-Virgil Voiculescu. A strengthened asymptotic freeness result for ran-
dom matrices with applications to free entropy. Int. Math. Res. Not. IMRN,
1998:41–63, 1998.
330 References

189. Dan-Virgil Voiculescu. The analogues of entropy and of Fisher’s information

measure in free probability theory. VI. Liberation and mutual free information.
Adv. Math., 146(2):101–166, 1999.
190. Dan-Virgil Voiculescu. The coalgebra of the free difference quotient and free
probability. Int. Math. Res. Not. IMRN, 2000(2):79–106, 2000.
191. Dan-Virgil Voiculescu. Lectures on free probability. In Lectures on probability
theory and statistics (Saint-Flour, 1998), volume 1738 of Lecture Notes in
Math., pages 279–349. Springer, Berlin, 2000.
192. Dan-Virgil Voiculescu. Free entropy. Bull. Lond. Math. Soc., 34(3):257–278,
2002.
193. Dan-Virgil Voiculescu. Free analysis questions. I. Duality transform for the
coalgebra of ∂X : B . Int. Math. Res. Not. IMRN, 2004(16):793–822, 2004.
194. Dan-Virgil Voiculescu. Aspects of free analysis. Jpn. J. Math., 3(2):163–183,
2008.
195. Dan-Virgil Voiculescu. Free analysis questions II: the Grassmannian com-
pletion and the series expansions at the origin. J. Reine Angew. Math.,
2010(645):155–236, 2010.
196. Dan-Virgil Voiculescu. Free probability for pairs of faces I. Comm. Math.
Phy., 332(3):955–980, 2014.
197. Dan-Virgil Voiculescu, Kenneth J. Dykema, and Alexandru Nica. Free ran-
dom variables, volume 1 of CRM Monograph Series. American Mathematical
Society, Providence, RI, 1992.
198. Dan-Virgil Voiculescu, Nicolai Stammeier, and Moritz Weber. Free Proba-
bility and Operator Algebras, EMS Münster Lectures in Mathematics, Vol. 1.
European Mathematical Society, Zürich, 2016.
199. Jiun-Chau Wang. Local limit theorems in free probability theory. Ann.
Probab., 38(4):1492–1506, 2010.
200. Gian-Carlo Wick. The evaluation of the collision matrix. Phys. Rev, 80(2):268,
1950.
201. Eugene P. Wigner. Characteristic vectors of bordered matrices with infinite
dimensions. Ann. of Math. (2), 62:548–564, 1955.
202. John D. Williams. Analytic function theory for operator-valued free probabil-
ity. J. Reine Angew. Math., published online: 2015-01-20.
203. John Wishart. The generalised product moment distribution in samples from
a normal multivariate population. Biometrika, 20A(1-2):32–52, 1928.
204. Wolfgang Woess. Nearest neighbour random walks on free products of dis-
crete groups. Boll. Unione Mat. Ital., VI. Ser., B, 5:961–982, 1986.
205. Wolfgang Woess. Random walks on infinite graphs and groups, volume 138.
Cambridge University Press, 2000.
206. Yong-Quan Yin and Paruchuri R. Krishnaiah. A limit theorem for the eigen-
values of product of two random matrices. J. Multivariate Anal., 13:489–507,
1983.
References 331

207. Alexander Zvonkin. Matrix integrals and map enumeration: an accessible in-
troduction. Math. Comput. Modelling, 26(8):281–304, 1997.
Index of Exercises

Chapter 1 10 ... 44 14 ... 74 4 ... 144

Ex. 1 page 14 11 ... 45 15 ... 74 5 ... 145
2 ... 15 12 ... 45 16 ... 80 6 ... 146
3 ... 15 13 ... 46 17 ... 85 7 ... 146
4 ... 16 14 ... 52 18 ... 86 8 ... 151
5 ... 17 15 ... 52 19 ... 91 9 ... 153
6 ... 17 16 ... 53 20 ... 93 10 . . . 156
7 ... 19 17 ... 54 11 . . . 160
Chapter 4
8 ... 19 18 ... 54 12 . . . 160
19 ... 56 1 ... 106
9 ... 28 Chapter 6
20 ... 56 2 ... 108
10 . . . 30
3 ... 108 1 ... 167
11 . . . 31 Chapter 3 4 ... 108 2 ... 167
12 . . . 31
1 ... 61 5 ... 109 3 ... 168
13 . . . 31
2 ... 61 6 ... 109 4 ... 175
14 . . . 31
3 ... 61 7 ... 110 5 ... 177
Chapter 2 4 ... 61 8 ... 115
Chapter 7
1 ... 34 5 ... 62 9 ... 115
6 ... 62 10 . . . 121 1 ... 183
2 ... 34
7 ... 70 11 . . . 121 2 ... 184
3 ... 35
8 ... 71 12 . . . 122 3 ... 184
4 ... 39
9 ... 71 13 . . . 123 4 ... 190
5 ... 41
10 . . . 72 5 ... 190
6 ... 41 Chapter 5
11 . . . 73 6 ... 193
7 ... 41
12 . . . 73 1 . . . 132
8 ... 43 Chapter 8
13 . . . 73 2 . . . 138
9 ... 44 1 . . . 202
3 . . . 143
2 . . . 203

333
334 Index of Exercises

3 ... 205
4 ... 210
5 ... 214
6 ... 217
7 ... 217
8 ... 225
9 ... 225
10 . . . 225
11 . . . 226
12 . . . 226
13 . . . 226
14 . . . 227
Chapter 9
1 . . . 231
2 . . . 246
Chapter 10
1 . . . 258
2 . . . 264
Chapter 11
1 ... 267
2 ... 271
3 ... 272
4 ... 274
5 ... 275
Index

(A, ϕ): non-commutative probability H+ (B): operator upper half-plane,

space, 26 260
å: centred version of a, 27 ker(i): kernel of multi-index, 35
(A, E, B): operator-valued kn : classical cumulant, 14
probability space, 239 κn : free cumulant, 42
B C, 27 κm,n : second-order cumulant, 149
C+ β : {z | Im(z) > β }, 81
Λ : cumulant generating series, 183
Cn : Catalan numbers, 39 L(Fr ): interpolating free group
Cn : Chebyshev polynomials first factor, 179
kind, 129 L(Fn ): free group factor, 167
∆α,β : wedge in lower half-plane, 88 Λ ∗ : Legendre transform, 184
∆ (a): Fuglede-Kadison determinant, Mt : compression of vN-algebra, 172
268 (M, E, B): C∗ -operator-valued
probability space, 259
δ : free entropy dimension, 194
(M, τ): tracial W ∗ -probability space,
∂x , ∂i : non-commutative derivative,
27, 266
52, 202
µa : distribution of a, 34
Eπ , 17
µa : Brown measure of a, 266
F(z): reciprocal Cauchy transform,
NC(n): non-crossing partitions, 21,
78
41
Fn : free group, 167
[NC(m) × NC(n)]: second-order
G(z): Cauchy transform, 51 non-crossing annular
Γα : Stolz angle, 73 partitions, 149
Γα,β : truncated Stolz angle, 73 NC2 (n): non-crossing pairings, 21
Γ : property Γ of vN algebra, 194 NC2 (p, q): non-crossing annular
γn : permutation with one cycle, 20 pairings, 131
2
∇ : Laplacian, 267
γ p,q : permutation with two cycles,
138 ω(z): subordination function, 52

335
336 INDEX

P(n): all partitions, 15 asymptotic freeness

P2 (n): all pairings, 15 of Haar unitary random matrices,
P̂: linearization of P, 257 110
Φ ∗ : free Fisher information, 210 of independent GUE, 26
: free additive convolution, 59 and deterministic, 106
R(z): R-transform, 51 and independent random
S(z): S-transform, 124 matrices, 107
SNC (p, q): non-crossing annular of randomly rotated matrices, 111
permutations, 131 of unitarily invariant matrices, 111
: free multiplicative convolution, of Wigner matrices, 121
97 asymptotically free, 26, 103
Tr: unnormalized trace, 21 almost surely, 103
tr: normalized trace, 21 in probability, 104
Un : Chebyshev polynomials second of second order, 146
kind, 159
χ : microstates free entropy, 191 band matrix, 236
χ ∗ : non-microstates free entropy, block matrix, 229
220 bounded higher cumulants, 140
ϕ(t): characteristic function, 14 Brown measure, 270
vN: von Neumann algebra generated calculation
by, 168 for R-diagonal operator, 273
ξi : conjugate system, 210 for circular operator, 274
1n : maximal partition, 46 for elliptic operator, 276
K(π): Kreweras complement, 48 for unbounded operators, 276
[n]: {1, 2, . . . , n}, 15 motivation, 266
#(π): number of blocks of π, 15
|π|: length function on Sn , 21 Carleman condition, 34
∧: meet, 42 Cartan subalgebra, 195
∨: joint, 42 Catalan number
as moment of semi-circle, 23
additivity explicit formula, 39
of χ , 193 generating function, 39
of χ ∗ , 221 recursion, 39
of Φ ∗ , 217, 219 Cauchy distribution, 64
of R-transform, 51 Cauchy transform of, 61
operator-valued, 242 Cauchy transform
affiliated operator, 214 as analytic function, 60
algebraically free, 204 as formal power series, 51
almost surely asymptotically free, operator-valued, 242
103 Cauchy transform of
alternating product, 169 arc-sine, 62
arc-sine law, 61 atomic measure, 61
Cauchy transform of, 62 Cauchy distribution, 61
INDEX 337

Marchenko-Pastur, 63 in law, 34
semi-circle, 62, 67 vague, 71
central limit theorem weak, 71
classical, 37 of probability measures, 34
free, 40 covariance of operator-valued
central sequence, 194 semicircular element, 232
characteristic function, 14 Cramér’s theorem, 186
Chebyshev polynomials C∗ -operator-valued probability
of first kind, 129 space, 259
as orthogonal polynomials, 160 C∗ -probability space, 26
combinatorial interpretation, 161 cumulant
recursion, 225 classical, 14
of second kind, 159 free, 42
as orthogonal polynomials, 159 mixed, 46
combinatorial interpretation, 163 second-order, 149
recursion, 225 cumulant generating series, 183
circular element, 173 cumulant series, 50
free cumulants of, 174 cutting edge of graph, 116
of second order, 156 cyclically alternating, 145
circular family, 231
free cumulants of, 231 Denjoy-Wolff theorem, 93
circular law, 275 deterministic equivalent, 250, 253
closed pair, 161 free, 253
complex Gaussian random matrix, distribution, 168
173 algebraic, 34
complex Gaussian random variable, arc-sine, 61
17 Cauchy, 64
compression of von Neumann determined by moments, 34
algebra, 172 Carleman condition, 34
conditional expectation, 238 elliptic, 276
conjugate Poisson kernel, 213 free Poisson, 44, 123
conjugate relations, 210 Haar unitary, 108
conjugate system, 210 Marchenko-Pastur, 44
convergence of random variable, 34
almost sure for random matrices, quarter-circular, 174
100 R-diagonal, 272
in distribution, 34 second-order limiting, 129
in moments, 34 semi-circle, 23
in probability, 104
of averaged eigenvalue eigenvalue distribution
distribution, 100 joint, 188
of real-valued random variables empirical, 182
in distribution, 34 elliptic operator, 276
338 INDEX

empirical eigenvalue distribution, general, 181

182 unification problem, 201
eta transform, 260 free entropy χ , 191
additivity, 193
factor, 167 one-dimensional case, 192
free group, 167 subadditivity, 192
hyperfinite, 167 free entropy χ ∗ , 220
faithful state, 26 additivity, 221
finite von Neumann algebra, 266 one-dimensional case, 220
Fisher information subadditivity, 220
free, 210 log-Sobolev inequality, 221
classical, 209 free entropy dimension δ , 194
fluctuations, 128 free Fisher information Φ ∗ , 210
forest of two-edge connected additivity, 217, 219
components, 116 Cramér Rao inequality, 216
free one-dimensional case, 213
with amalgamation, 239 Stam inequality, 216
asymptotically, 103 superadditivity, 216
almost surely, 103 free group, 167
for elements, 28 rank, 167
for subalgebras, 26 free group factor, 167
for unbounded operators, 214 interpolating, 179
of second order, 145 isomorphism problem, 168
asymptotically, 146 free independence, 26
free additive convolution, 59 free multiplicative convolution, 97
for finite variance, 83, 86 free Poincare inequality, 203
for general measures, 91, 97 free Poisson distribution, 44, 123
free analysis, 243 free cumulants of, 44
free convolution, see free additive of second order, 143
convolution free Stam inequality, 216
free Cramér Rao inequality, 216 freely independent, 26
free cumulant, 42 freeness, 26
operator-valued, 240 operator-valued, 239
free cumulants of Fuglede-Kadison determinant, 268
circular element, 174 fundamental group of von Neumann
circular family, 231 algebra, 172
free Poisson, 44
limit of Wigner matrices, 151 Gaussian
Marchenko-Pastur, 44 random matrix, 19
semi-circle, 44 complex, 173
free deterministic equivalent, 253 random variable, 14
free difference quotient, 202 complex, 17
free entropy random vector, 16
INDEX 339

generalized Weyl inequality, 272 Kreweras complement, 48

genus expansion, 21
for Wishart matrices, 123 large deviation principle, 186
genus of permutation, 134 large deviations, 182
Ginibre ensemble, 274 leaf, 116
good rate function, 186 left regular representation, 166
group C∗ -algebra Legendre transform, 184
reduced, 166 length function on Sn , 21
group algebra, 166 lift of partition, 114
group von Neumann algebra, 165, limiting distribution, 138
166 linearization, 257
GUE, see Gaussian random matrix log-Sobolev inequality, 221

Haar unitary, 108 Marchenko-Pastur distribution, 44

second-order, 144 by Stieltjes inversion, 70
Haar unitary random matrix, 108 Cauchy transform of, 63
half-pairing for random matrices, 123
non-crossing annular, 161 free cumulants of, 44
non-crossing linear, 163 marked block, 149
half-permutation meet, 42
non-crossing annular, 162 merging of vertices, 117
non-crossing linear, 163 microstates, 191
Herglotz representation, 69 microstates free entropy, see free
hermitization method, 277 entropy χ
Hilbert transform, 212 Möbius inversion
h transform, 260 in P(n), 29
hyperfinite II1 factor, 167 in NC(n), 42
modular conjugation operator, 205
ICC group, 167 moment (joint), 168
inclusion-exclusion, 24 moment series, 50
insertion property, 241 operator-valued, 260
integrable operator, 214 moment-cumulant formula
interpolating free group factor, 179 operator-valued, 240
interval stripping, 241 classical, 14, 29
inversion formula cumulant-moment, 42
Stieltjes, 64 free, 42
isomorphism problem for free group second-order, 149
factors, 168 multiplicative family, 241

join, 42 Nevanlinna representation, 68

joint eigenvalue distribution, 188 non-commutative derivative, 52, 202
Jucys-Murphy element, 109 non-commutative probability space,
26
kernel of a multi-index, 35 second-order, 142
340 INDEX

tracial, 142 on all partitions, 42

non-commutative random variable, on non-crossing partitions, 42
26 partition, 15
non-crossing non-crossing, 21, 41
annular half-pairing, 161 second-order non-crossing
annular half-permutation, 162 annular, 149
annular pairing, 130 type of, 15
annular permutation, 131 permutation
linear half-pairing, 163 γ-planar, 134
linear half-permutation, 163 non-crossing annular, 131
pairing, 21 planar, 22
partition, 41 planar
non-degenerate state, 26 γ-, 134
non-microstates free entropy, see annular pairing, 130
free entropy χ ∗ permutation, 22
non-tangential limit, 67 Poisson kernel, 64, 213
normal operator, 265 conjugate, 213
normal random variable, 14 prime von Neumann algebra, 195
probability space
open pair, 161 ∗-, 26
operator C∗ -, 26
affiliated with an algebra, 214 W ∗ -, 27
integrable, 214 tracial, 27
operator upper half-plane, 260 non-commutative, 26
operator-valued property Γ , 194
R-transform, 242
moment-cumulant formula, 240 quarter-circular element, 174
Cauchy transform, 242
distribution, 239 R-transform
free cumulant, 240 as formal power series, 51
freeness, 239 for arbitrary measures, 88
moment, 239 for compactly supported
probability space, 239 measures, 75
C∗ -, 259 for finite variance, 83
semi-circular element, 242 operator-valued, 242
random matrix
pairing, 15, 37 deterministic, 104
non-crossing, 21 Gaussian, 19
non-crossing annular, 130 Ginibre, 274
partial non-commutative derivative, Haar unitary, 108
see non-commutative unitarily invariant, 111, 147
derivative Wigner, 112
partial order Wishart, 122
INDEX 341

random rotation, 111 by Stieltjes inversion, 65

random variable free cumulants of, 44
Gaussian, 14 of second order, 143
complex, 17 semi-circle law, 23
non-commutative, 26 semi-circular element, 40
rank of free group, 167 operator-valued, 242
rate function, 182, 186 covariance, 232
good, 186 standard, 40
R-diagonal operator, 272 semi-circular family, 41
random matrix model, 275 spoke diagram, 145
single ring theorem, 275 standard
reciprocal Cauchy transform, 78, 89 complex Gaussian random
operator-valued, 260 variable, 17
reduced group C∗ -algebra, 166 lift of partition, 114
reduced word, 167 semi-circular element, 40
refinement of partition, 42 ∗-probability space, 26
relative entropy, 188 ∗-distribution, 168
removing a vertex, 118 ∗-free, 28
Riesz measure, 270 ∗-moment (joint), 168
state, 26
S-transform, 124 faithful, 26
Sanov’s theorem, 187 non-degenerate, 26
Schreier’s index formula, 172 tracial, 27
Schur complement, 255 Stieltjes inversion formula, 64
score function, 209 Stieltjes transform, 64
second-order Stolz angle, 73
cumulants, 149 truncated, 73
limit, 142 subharmonic function, 269
limiting distribution, 129 submean inequality, 269
non-commutative probability subordination function
space, 142 as analytic function, 94
non-crossing annular partitions, as formal power series, 52
149 operator-valued
circular element, 156 as analytic function, 260
free Poisson, 143 as formal power series, 242
Haar unitary, 144 multiplicative version, 261
semi-circle, 143
second-order cumulants of through-block, 162
free Poisson, 151 through-cycle, 132
limit of Wigner matrices, 151 through-pair, 161
semi-circle, 151 trace, 142
semi-circle distribution, 23 tracial W ∗ -probability space, 27
Cauchy transform of, 62, 67 tree, 116
342 INDEX

trivial prime, 195

leaf, 116 type II1 , 167
tree, 116
truncated Stolz angle, 73 weak convergence, 71
two-edge connected graph, 116 Weingarten convolution formula,
type of partition, 15 109
Weyl inequality
unitarily invariant, 111, 147
generalized, 272
upper semicontinuous, 269
Wick’s formula, 19
vague convergence, 71 Wigner random matrix, 112
von Neumann algebra Wigner’s semi-circle law, see
compression, 172 semi-circle law
factor, 167 Wishart random matrix, 122
finite, 266 genus expansion, 123
fundamental group, 172 W ∗ -probability space, 27
generated by, 168
group vN algebra, 166 zero divisor, 224

vi.i.mmxix

1000 Solved Problems in Modern Physics - WordPress - Com - Get A (PDFDrive)
No ratings yet
1000 Solved Problems in Modern Physics - WordPress - Com - Get A (PDFDrive)
651 pages
Devoe - Thermodynamics and Chemistry 2e (2012) - Small Page Size
50% (2)
Devoe - Thermodynamics and Chemistry 2e (2012) - Small Page Size
533 pages
Kuznetsov, Reitmann - Attractor Dimension Estimates For Dynamical Systems - Theory and Computation
No ratings yet
Kuznetsov, Reitmann - Attractor Dimension Estimates For Dynamical Systems - Theory and Computation
555 pages
Journal of Intellectual Property Law
No ratings yet
Journal of Intellectual Property Law
514 pages
Ergodic Theory
No ratings yet
Ergodic Theory
56 pages
The Material Culture of Steamboat Passengers - Archaeological Evidence From The Missouri River
No ratings yet
The Material Culture of Steamboat Passengers - Archaeological Evidence From The Missouri River
256 pages
Epigraphia Carnatica. by B. Lewis Rice, Director of Archaeological Researches in Mysore (PDFDrive)
No ratings yet
Epigraphia Carnatica. by B. Lewis Rice, Director of Archaeological Researches in Mysore (PDFDrive)
992 pages
160.101 CALCULUS I (Distance) Assignment 5 - Solutions: Massey University Institute of Fundamental Sciences Mathematics
No ratings yet
160.101 CALCULUS I (Distance) Assignment 5 - Solutions: Massey University Institute of Fundamental Sciences Mathematics
8 pages
The Chemistry and Technology of Paints 1916
No ratings yet
The Chemistry and Technology of Paints 1916
404 pages
The Journal of San Diego History Vol. 53
No ratings yet
The Journal of San Diego History Vol. 53
89 pages
S6_MATH_1_EXAM_2025
No ratings yet
S6_MATH_1_EXAM_2025
3 pages
(Encyclopedia of Mathematics and Its Applications) Magurn B.A. - An Algebraic Introduction To K-Theory-Cambridge University Press (2002)
No ratings yet
(Encyclopedia of Mathematics and Its Applications) Magurn B.A. - An Algebraic Introduction To K-Theory-Cambridge University Press (2002)
694 pages
Hardy, G.H. Collected Papers of e J. E Littlewood
No ratings yet
Hardy, G.H. Collected Papers of e J. E Littlewood
708 pages
Stress Gradient Calculations at Notches: Mauro Filippini
No ratings yet
Stress Gradient Calculations at Notches: Mauro Filippini
13 pages
Thailand 14 Northern Thailand
No ratings yet
Thailand 14 Northern Thailand
113 pages
Coorg Inscriptions. Epigraphia Carnatica (PDFDrive)
No ratings yet
Coorg Inscriptions. Epigraphia Carnatica (PDFDrive)
164 pages
Fibonacci Numbers and Binet Formula: (An Introduction To Number Theory)
No ratings yet
Fibonacci Numbers and Binet Formula: (An Introduction To Number Theory)
27 pages
MA105: Calculus Lecture 5 (D1) : Shripad M. Garge IIT Bombay, Mumbai
No ratings yet
MA105: Calculus Lecture 5 (D1) : Shripad M. Garge IIT Bombay, Mumbai
21 pages
Polyhedral Geometry
100% (1)
Polyhedral Geometry
442 pages
Aptitude Shortcuts For Trigonometry With Solved Examples
No ratings yet
Aptitude Shortcuts For Trigonometry With Solved Examples
15 pages
MATH283 L08 Wk2 Web
No ratings yet
MATH283 L08 Wk2 Web
14 pages
Na 10
No ratings yet
Na 10
15 pages
Extensions of Calculus Assignment
No ratings yet
Extensions of Calculus Assignment
2 pages
Gelfand - (Vol 1) Generalized Functions
0% (2)
Gelfand - (Vol 1) Generalized Functions
17 pages
e 09 DFCF 4
No ratings yet
e 09 DFCF 4
60 pages
New Soliton Solutions of Nonlinear Space-Time Frac
No ratings yet
New Soliton Solutions of Nonlinear Space-Time Frac
6 pages
Lie Algebra
No ratings yet
Lie Algebra
16 pages
Sequence and Series: Single Correct Type Questions
No ratings yet
Sequence and Series: Single Correct Type Questions
8 pages
MIT Algebraic Topology
No ratings yet
MIT Algebraic Topology
135 pages
Assignment 2
No ratings yet
Assignment 2
2 pages
5 Power Series
No ratings yet
5 Power Series
10 pages
LFT FS: Lfs FT
No ratings yet
LFT FS: Lfs FT
8 pages
Chapter 4 Cost Minimization
No ratings yet
Chapter 4 Cost Minimization
6 pages
Project Calculus
No ratings yet
Project Calculus
17 pages
Matrices and Determinants: Why Study The Matrix... ?
No ratings yet
Matrices and Determinants: Why Study The Matrix... ?
7 pages
Terjemahan Buku
No ratings yet
Terjemahan Buku
3 pages
Quiz 1 - Q2
No ratings yet
Quiz 1 - Q2
2 pages
Root Locus Notes - Matlab
No ratings yet
Root Locus Notes - Matlab
4 pages
Simultaneous Equations: X X X X
No ratings yet
Simultaneous Equations: X X X X
7 pages
Exponential Operators and Parameter Differentiation in Quantum Physics
No ratings yet
Exponential Operators and Parameter Differentiation in Quantum Physics
22 pages
Lecture 2 - STAT - 2022
No ratings yet
Lecture 2 - STAT - 2022
6 pages
Chapter 7: Supervised Hebbian Learning: Brandon Morgan 1/13/2021
No ratings yet
Chapter 7: Supervised Hebbian Learning: Brandon Morgan 1/13/2021
2 pages
OCE 421 Midterm
No ratings yet
OCE 421 Midterm
2 pages
Numerical Solution of Singular Integral Equations Using Haar Wavelet
No ratings yet
Numerical Solution of Singular Integral Equations Using Haar Wavelet
4 pages
(Encyclopedia of Mathematics and Its Applications) H. Salzmann, T. Grundhöfer, H. Hähl, R. Löwen - The Classical Fields_ Structural Features of the Real and Rational Numbers (Encyclopedia of Mathemati
100% (1)
(Encyclopedia of Mathematics and Its Applications) H. Salzmann, T. Grundhöfer, H. Hähl, R. Löwen - The Classical Fields_ Structural Features of the Real and Rational Numbers (Encyclopedia of Mathemati
418 pages
Artin - Galois Title
No ratings yet
Artin - Galois Title
2 pages
Using Matlab: Page 1 of 3 Spring Semester 2012
No ratings yet
Using Matlab: Page 1 of 3 Spring Semester 2012
3 pages
A Brief History of Functional Analysis
100% (1)
A Brief History of Functional Analysis
23 pages
Rolles & Mean Value Theorem - M
No ratings yet
Rolles & Mean Value Theorem - M
26 pages
Finite Group Theory M. Aschbacher
No ratings yet
Finite Group Theory M. Aschbacher
316 pages
Classical Covariant Fields
100% (2)
Classical Covariant Fields
554 pages
B. Ozbagci and A. I. Stipsicz - Surgery On Contact 3-Manifolds and Stein Surfaces
100% (1)
B. Ozbagci and A. I. Stipsicz - Surgery On Contact 3-Manifolds and Stein Surfaces
285 pages
CF PDF
No ratings yet
CF PDF
197 pages
Lecture Notes Methods of Mathematical Physics MATH
No ratings yet
Lecture Notes Methods of Mathematical Physics MATH
71 pages
(Gregory Karpilovsky) Topics in Field Theory (BookFi)
No ratings yet
(Gregory Karpilovsky) Topics in Field Theory (BookFi)
559 pages
Ergodic Notes
No ratings yet
Ergodic Notes
115 pages
Probability and Geometry On Groups Lecture Notes For A Graduate Course
No ratings yet
Probability and Geometry On Groups Lecture Notes For A Graduate Course
209 pages
Sieve Methods
From Everand
Sieve Methods
Heine Halberstam
No ratings yet
SD
No ratings yet
SD
5 pages
Kreisel and Lawvere On Category Theory and The Foundations of Mathematics
100% (2)
Kreisel and Lawvere On Category Theory and The Foundations of Mathematics
46 pages
Lectures On Ergodic Theory
No ratings yet
Lectures On Ergodic Theory
153 pages
Entire and Meromorphic Functions
No ratings yet
Entire and Meromorphic Functions
88 pages
(ESI Lectures in Mathematics and Physics) Christian GÃ©rard - Microlocal Analysis of Quantum Fields On Curved Spacetimes-European Mathematical Society (2019)
No ratings yet
(ESI Lectures in Mathematics and Physics) Christian GÃ©rard - Microlocal Analysis of Quantum Fields On Curved Spacetimes-European Mathematical Society (2019)
230 pages
Algebraic Geometry
No ratings yet
Algebraic Geometry
331 pages
RoughPaths 2
100% (1)
RoughPaths 2
263 pages
Homotopy Theory: Elementary Basic Concepts.
No ratings yet
Homotopy Theory: Elementary Basic Concepts.
63 pages
Theory of Finite Groups
No ratings yet
Theory of Finite Groups
254 pages
Kahn Topology PDF
100% (1)
Kahn Topology PDF
220 pages
Recurrence and Ergodicity
No ratings yet
Recurrence and Ergodicity
19 pages
9780203755419
100% (1)
9780203755419
431 pages
Brodmann M.P., Sharp R.Y.-local Cohomology
No ratings yet
Brodmann M.P., Sharp R.Y.-local Cohomology
516 pages
Lebesgue Integration and Measure
No ratings yet
Lebesgue Integration and Measure
290 pages
D.H. Fremlin - Measure Theory - Measure Algebras (Vol. 3) (2002)
No ratings yet
D.H. Fremlin - Measure Theory - Measure Algebras (Vol. 3) (2002)
672 pages
(Allyn and Bacon Series in Advanced Mathematics) James Dugundji - Topology-Allyn and Bacon, Inc. (1966)
No ratings yet
(Allyn and Bacon Series in Advanced Mathematics) James Dugundji - Topology-Allyn and Bacon, Inc. (1966)
464 pages
P-Adic Hodge Theory (Brinon and Conrad)
No ratings yet
P-Adic Hodge Theory (Brinon and Conrad)
290 pages
J. H. Wells, L. R. Williams Auth. Embeddings and Extensions in Analysis PDF
No ratings yet
J. H. Wells, L. R. Williams Auth. Embeddings and Extensions in Analysis PDF
116 pages
Paul C. Shields The Ergodic Theory of Discrete Sample Paths Graduate Studies in Mathematics 13 1996
100% (1)
Paul C. Shields The Ergodic Theory of Discrete Sample Paths Graduate Studies in Mathematics 13 1996
259 pages
The Calculus of Variations
No ratings yet
The Calculus of Variations
52 pages
Lesson 0: Martingales: Le Thi Xuan Mai
No ratings yet
Lesson 0: Martingales: Le Thi Xuan Mai
50 pages
Fourier-Mukai and Nahm Transforms in Geometry and Mathematical Physics (Claudio Bartocci, Ugo Bruzzo Etc.)
100% (1)
Fourier-Mukai and Nahm Transforms in Geometry and Mathematical Physics (Claudio Bartocci, Ugo Bruzzo Etc.)
434 pages
Nonlinear Functional Analysis
100% (1)
Nonlinear Functional Analysis
391 pages
978-3-030-91029-7
No ratings yet
978-3-030-91029-7
253 pages
An Introduction To The Theory of Automorphic Functions
100% (1)
An Introduction To The Theory of Automorphic Functions
112 pages
Free Probability and Operator Algebras
100% (1)
Free Probability and Operator Algebras
144 pages
(Dimitris N. Politis, Joseph P. Romano, Michael Subsampling
No ratings yet
(Dimitris N. Politis, Joseph P. Romano, Michael Subsampling
180 pages
Foundation of Algebra Geometry
No ratings yet
Foundation of Algebra Geometry
826 pages
Hypergeometric Functions of Two Variables
100% (1)
Hypergeometric Functions of Two Variables
201 pages
(Series in Mathematical Analysis and Applications 9) Leszek Gasinski, Nikolaos S. Papageorgiou - Nonlinear Analysis-Chapman & Hall - CRC (2006)
100% (1)
(Series in Mathematical Analysis and Applications 9) Leszek Gasinski, Nikolaos S. Papageorgiou - Nonlinear Analysis-Chapman & Hall - CRC (2006)
960 pages
Bok 3A978 3 319 29977 8
No ratings yet
Bok 3A978 3 319 29977 8
191 pages
Algebraic Geometry
100% (3)
Algebraic Geometry
260 pages
Milnor Thom
No ratings yet
Milnor Thom
18 pages
Cvitanovic Et Al. Classical and Quantum Chaos Book (Web Version 9.2.3, 2002) (750s) - PNC
No ratings yet
Cvitanovic Et Al. Classical and Quantum Chaos Book (Web Version 9.2.3, 2002) (750s) - PNC
750 pages
Forsyth - Theory of Differential Equation - Vol. 5 PDF
100% (3)
Forsyth - Theory of Differential Equation - Vol. 5 PDF
508 pages
Finite Dimensional Vector Spaces Paul Halmos
No ratings yet
Finite Dimensional Vector Spaces Paul Halmos
204 pages
Cohomology of Arithmetic Groups, L-Functions and Automorphic - T. Venkatamarana PDF
No ratings yet
Cohomology of Arithmetic Groups, L-Functions and Automorphic - T. Venkatamarana PDF
132 pages
B L Van Der Waerden Modern Algebra Vol 2 PDF
No ratings yet
B L Van Der Waerden Modern Algebra Vol 2 PDF
227 pages
Class 12 Revision Notes Inverse Trigonometric Functions
No ratings yet
Class 12 Revision Notes Inverse Trigonometric Functions
17 pages
Tao T. Higher Order Fourier Analysis (Draft, 2011) (233s) - MT
100% (4)
Tao T. Higher Order Fourier Analysis (Draft, 2011) (233s) - MT
233 pages
Methods of Modern Mathematical Physics Vol 1 - Functional Analysis 2nd. Ed. - M. Reed
No ratings yet
Methods of Modern Mathematical Physics Vol 1 - Functional Analysis 2nd. Ed. - M. Reed
208 pages
Multivariate Approximation
100% (1)
Multivariate Approximation
296 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.