Free Probability and Random Matrices
Speicher

Contents
4 Asymptotic Freeness
  4.1 Averaged convergence versus almost sure convergence
  4.2 Gaussian random matrices and deterministic matrices
  4.3 Haar distributed unitary random and deterministic matrices
  4.4 Wigner and deterministic random matrices
  4.5 Examples of random matrix calculations
    4.5.1 Wishart matrices and the Marchenko-Pastur distribution
    4.5.2 Sum of random matrices
    4.5.3 Product of random matrices
References
make it into the book. One reason for this is that free probability is still evolving very quickly, with new connections popping up quite unexpectedly.
So we are, for example, not addressing such exciting topics as free stochastic
and Malliavin calculus [40, 108, 114], or the rectangular version of free probability
[28], or the strong version of asymptotic freeness [48, 58, 89], or free monotone
transport [83], or the relation with representation theory [35, 72] or with quantum
groups [16, 17, 44, 73, 110, 118, 146]; or the quite recent developments around bifreeness [51, 81, 100, 196], traffic freeness [50, 122], or connections to Ramanujan graphs via finite free convolution [124]. Instead of trying to add more chapters to a
never-ending (and never-published) book, we prefer just to stop where we are and
leave the remaining parts for others.
We want to emphasize that some of the results in this book owe their existence to
the book writing itself, and our endeavour to fill apparent gaps in the existing theory.
Examples of this are our proof of the asymptotic freeness of Wigner matrices from
deterministic matrices in Section 4.4 (for which there exists now also another proof
in the book [6]), the fact that finite free Fisher information implies the existence of
a density in Proposition 8.18, or the results about the absence of algebraic relations
and zero divisors in the case of finite free Fisher information in Theorems 8.13 and
8.32.
Our presentation benefited a lot from input by others. In particular, we would like to mention Serban Belinschi and Hari Bercovici for providing us with a proof of Propo-
sition 8.18, and Uffe Haagerup for allowing us to use his manuscript of his talk at
the Fields Institute as the basis for Chapter 11. With the exception of Sections 11.9
and 11.10 we are mainly following his notes in Chapter 11. Chapter 3 relied sub-
stantially on input and feedback from the experts on the subject. Many of the results
and proofs around subordination were explained to us by Serban Belinschi, and we
also got a lot of feedback from JC Wang and John Williams. We are also grateful
to N. Raj Rao for help with his RMTool package which was used in our numerical
simulations.
The whole idea of writing this book started from a lecture series on free probability and random matrices which we gave at the Fields Institute, Toronto, in the
fall of 2007 within the Thematic Programme on Operator Algebras. Notes of our
lectures were taken by Emily Redelmeier and by Jonathan Novak; and the first draft
of the book was based on these notes.
We had the good fortune to have Uffe Haagerup around during this programme
and he agreed to give one of the lectures, on his work on the Brown measure. As
mentioned above, the notes of his lecture became the basis of Chapter 11.
What are now Chapters 5, 8, 9, and 10 were not part of the lectures at the Fields
Institute, but were added later. Those additional chapters also cover, in large part, results which did not yet exist in 2007. So this gives us at least some kind of excuse for why finishing the book took so long.
Much of Chapter 8 is based on classes on “Random matrices and free entropy”
and “Non-commutative distributions” which one of us (RS) taught at Saarland University during the winter terms 2013-14 and 2014-15, respectively. The final outcome of this chapter owes a lot to the support of Tobias Mai for those classes.
Chapter 9 is based on work of RS with Wlodek Bryc, Reza Rashidi Far and Tamer
Oraby on block random matrices in a wireless communications (MIMO) context,
and on various lectures of RS for engineering audiences, where he tried to convince
them of the relevance and usefulness of operator-valued methods in wireless prob-
lems. Chapter 10 benefited a lot from the work of Carlos Vargas on free deterministic
equivalents in his PhD thesis and from the joint work of RS with Serban Belinschi
and Tobias Mai around linearization and the analytic theory of operator-valued free
probability. The algorithms, numerical simulations, and histograms for eigenvalue
distributions in Chapter 10 and Brown measures in Chapter 11 are done with great
expertise and dedication by Tobias Mai.
There are exercises scattered throughout the text. The intention is to give readers
an opportunity to test their understanding. In some cases, where the result is used in
a crucial way or where the calculation illustrates basic ideas, a solution is provided
at the end of the book.
In addition to the already mentioned individuals we owe a lot of thanks to people
who read preliminary versions of the book and gave useful feedback, which helped
to improve the presentation and correct some mistakes. We want to mention in par-
ticular Marwa Banna, Arup Bose, Mario Diaz, Yinzheng Gu, Todd Kemp, Felix
Leid, Josué Vázquez, Hao-Wei Wang, Simeng Wang, and Guangqu Zheng.
Further thanks are due to the Fields Institute for the great environment they of-
fered us during the already mentioned thematic programme on Operator Algebras
and for the opportunity to publish our work in their Monographs series. The writing
of this book, as well as many of the reported results, would not have been possible
without financial support from various sources; in particular, we want to mention a
Killam Fellowship for RS in 2007 and 2008, which allowed him to participate in the
thematic programme at the Fields Institute and thus get the whole project started;
and the ERC Advanced Grant “Non-commutative distributions in free probability”
of RS which provided time and resources for the finishing of the project. Many of
the results we report here were supported by grants from the Canadian and German
Science Foundations NSERC and DFG, respectively; by Humboldt Fellowships for
Serban Belinschi and John Williams for stays at Saarland University; and by DAAD
German-French Procope exchange programmes between Saarland University and
the universities of Besançon and of Toulouse.
As we are covering a wide range of topics, there might come a point where one gets a bit exhausted from our book. There are, however, some alternatives, like the
standard references [97, 140, 197, 198] or survey articles [37, 84, 141, 142, 156,
162, 164, 165, 183, 191, 192] on (some aspects of) free probability. Our advice:
take a break, enjoy those and then come back motivated to learn more from our
book.
Chapter 1
Asymptotic Freeness of Gaussian Random Matrices
Exercise 1. If ν has a moment of order n then ν has all moments of order m for
m < n.
The integral $\varphi(t) = \int_{\mathbb R} e^{ist}\,d\nu(s)$ (with $i = \sqrt{-1}$) is always convergent and is called the characteristic function of $\nu$. If $\nu$ has moments of all orders, then $\log\varphi(t)$ has the expansion $\log\varphi(t) = \sum_{n\ge1} k_n \frac{(it)^n}{n!}$.
The numbers {kn }n are the cumulants of ν. To distinguish them from the free
cumulants, which will be defined in the next chapter, we will call {kn }n the classi-
cal cumulants of ν. The moments {αn }n of ν and the cumulants {kn }n of ν each
determine the other through the moment-cumulant formulas:
$$\alpha_n = \sum_{\substack{1\cdot r_1+\cdots+n\cdot r_n=n\\ r_1,\dots,r_n\ge0}} \frac{n!}{(1!)^{r_1}\cdots(n!)^{r_n}\,r_1!\cdots r_n!}\; k_1^{r_1}\cdots k_n^{r_n} \qquad (1.1)$$
where $a$ is the mean and $\sigma^2$ is the variance. The characteristic function of a Gaussian random variable is
$$\varphi(t) = \exp\Bigl(iat - \frac{\sigma^2 t^2}{2}\Bigr), \quad\text{thus}\quad \log\varphi(t) = a\,\frac{(it)^1}{1!} + \sigma^2\,\frac{(it)^2}{2!}.$$
Hence for a Gaussian random variable all cumulants beyond the second are 0.
Exercise 2. Suppose ν has a fifth moment and we write
$$\alpha_n = E(X^n) = \int_{\mathbb R} t^n\,e^{-t^2/2}\,\frac{dt}{\sqrt{2\pi}} = (n-1)\,\alpha_{n-2} \quad\text{for } n\ge2.$$
Thus
$$\alpha_{2n} = (2n-1)(2n-3)\cdots5\cdot3\cdot1 =: (2n-1)!!$$
and $\alpha_{2n-1} = 0$ for all $n$.
Let us find a combinatorial interpretation of these numbers. For a positive integer $n$ let $[n] = \{1,2,3,\dots,n\}$, and let $\mathcal P(n)$ denote all partitions of the set $[n]$, i.e. $\pi = \{V_1,\dots,V_k\}\in\mathcal P(n)$ means $V_1,\dots,V_k\subseteq[n]$, $V_i\neq\emptyset$ for all $i$, $V_1\cup\cdots\cup V_k = [n]$, and $V_i\cap V_j = \emptyset$ for $i\neq j$; $V_1,\dots,V_k$ are called the blocks of $\pi$. We let $\#(\pi)$ denote the number of blocks of $\pi$ and $\#(V_i)$ the number of elements in the block $V_i$. A partition is a pairing if each block has size 2. The pairings of $[n]$ will be denoted $\mathcal P_2(n)$.
Let us count $|\mathcal P_2(2n)|$, the number of pairings of $[2n]$. The element 1 must be paired with something, and there are $2n-1$ ways of choosing its partner; the remaining $2n-2$ elements can then be paired in $|\mathcal P_2(2n-2)|$ ways. Thus
$$|\mathcal P_2(2n)| = (2n-1)\,|\mathcal P_2(2n-2)| = (2n-1)(2n-3)\cdots5\cdot3\cdot1 = (2n-1)!!.$$
So E(X 2n ) = |P2 (2n)|. There is a deeper connection between moments and parti-
tions known as Wick’s formula (see Section 1.5).
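As a quick illustration of this count, here is a small Python check (an addition of ours, not from the text; it assumes numpy is available): it enumerates all pairings of $[2n]$ and compares their number with $(2n-1)!!$ and with a Monte Carlo estimate of $E(X^{2n})$.

```python
# Enumerate pairings of [2n], compare the count with (2n-1)!! and with
# a Monte Carlo estimate of the Gaussian moment E(X^{2n}).
import numpy as np

def pairings(elements):
    """Yield all pairings of a list of elements (as lists of pairs)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for k in range(len(rest)):
        for p in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, rest[k])] + p

rng = np.random.default_rng(0)
x = rng.standard_normal(10**6)
for n in range(1, 5):
    count = sum(1 for _ in pairings(list(range(2 * n))))
    double_fact = int(np.prod(np.arange(2 * n - 1, 0, -2)))
    print(2 * n, count, double_fact, np.mean(x ** (2 * n)))
```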
Exercise 3. We say that a partition of [n] has type (r1 , . . . , rn ) if it has ri blocks of
size i. Show that the number of partitions of [n] of type (r1 , r2 , . . . , rn ) is
$$\frac{n!}{(1!)^{r_1}(2!)^{r_2}\cdots(n!)^{r_n}\;r_1!\,r_2!\cdots r_n!}.$$
Using the type of a partition there is a very simple expression for the moment-cumulant relations above; moreover, this expression is quite amenable to calculation. If $\pi$ is a partition of $[n]$ and $\{k_i\}_i$ is any sequence, let $k_\pi = k_1^{r_1}k_2^{r_2}\cdots k_n^{r_n}$, where $r_i$ is the number of blocks of $\pi$ of size $i$. Using this notation the first of the moment-cumulant relations can be written
$$\alpha_n = \sum_{\pi\in\mathcal P(n)} k_\pi. \qquad (1.3)$$
The simplest way to do calculations with relations like those above is to use formal
power series (see Stanley [167, §1.1]).
Exercise 4. Let {αn } and {kn } be two sequences satisfying (1.3). In this exercise
we shall show that as formal power series
$$\log\Bigl(1 + \sum_{n=1}^{\infty}\alpha_n\frac{z^n}{n!}\Bigr) = \sum_{n=1}^{\infty} k_n\frac{z^n}{n!}. \qquad (1.5)$$
(ii) By grouping the terms in ∑π kπ according to the size of the block containing
1 show that
$$\alpha_n = \sum_{\pi\in\mathcal P(n)} k_\pi = \sum_{m=0}^{n-1}\binom{n-1}{m} k_{m+1}\,\alpha_{n-m-1}.$$
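For small orders the relation (1.5) and the recursion in part (ii) can be checked symbolically. The following sketch (our addition, assuming sympy is available) does this for the Gaussian cumulants $k_1 = a$, $k_2 = \sigma^2$, $k_n = 0$ for $n\ge3$:

```python
# Check (1.5) and the recursion alpha_n = sum_m C(n-1,m) k_{m+1} alpha_{n-m-1}
# for Gaussian cumulants, using sympy formal power series.
import sympy as sp

z = sp.symbols('z')
a, s = sp.symbols('a sigma')
N = 6  # check moments up to order 6

k = {1: a, 2: s**2}          # Gaussian: all higher cumulants vanish
kn = lambda n: k.get(n, 0)

# Moments from exp(sum k_n z^n / n!) = 1 + sum alpha_n z^n / n!.
series = sp.exp(sum(kn(n) * z**n / sp.factorial(n) for n in range(1, N + 1)))
alpha = {0: 1}
for n in range(1, N + 1):
    alpha[n] = sp.expand(series.diff(z, n).subs(z, 0))

for n in range(1, N + 1):
    rhs = sum(sp.binomial(n - 1, m) * kn(m + 1) * alpha[n - m - 1]
              for m in range(n))
    assert sp.simplify(alpha[n] - rhs) == 0
print(alpha[4])  # a**4 + 6*a**2*sigma**2 + 3*sigma**4; for a = 0 this is 3*sigma^4
```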
$$E(X_{i_1}\cdots X_{i_k}) = \int_{\mathbb R^n} t_{i_1}\cdots t_{i_k}\,\frac{\exp(-\langle Bt,t\rangle/2)\,dt}{(2\pi)^{n/2}\det(B)^{-1/2}}$$
where $\langle\cdot,\cdot\rangle$ denotes the standard inner product on $\mathbb R^n$. Let $C = (c_{ij})$ be the covariance matrix, that is $c_{ij} = E\bigl([X_i - E(X_i)]\cdot[X_j - E(X_j)]\bigr)$.
In fact $C = B^{-1}$, and if $X_1,\dots,X_n$ are independent then $B$ is a diagonal matrix, see Exercise 5. If $Y_1,\dots,Y_n$ are independent Gaussian random variables, $A$ is an invertible real matrix, and $X = AY$, then $X$ is a Gaussian random vector, and every Gaussian random vector is obtained in this way. If $X = (X_1,\dots,X_n)$ is a complex random vector we say that $X$ is a complex Gaussian random vector if $(\operatorname{Re}(X_1),\operatorname{Im}(X_1),\dots,\operatorname{Re}(X_n),\operatorname{Im}(X_n))$ is a real Gaussian random vector.
Exercise 5. Let $X = (X_1,\dots,X_n)$ be a Gaussian random vector with density $\sqrt{\det(B)(2\pi)^{-n}}\,\exp(-\langle Bt,t\rangle/2)$. Let $C = (c_{ij}) = B^{-1}$.
(i) Show that $B$ is diagonal if and only if $\{X_1,\dots,X_n\}$ are independent.
(ii) By first diagonalizing $B$ show that $c_{ij} = E\bigl([X_i - E(X_i)]\cdot[X_j - E(X_j)]\bigr)$.
(ii) Show that $E(Z^m\bar Z^{\,n}) = 0$ for $m\neq n$, and that $E(|Z|^{2n}) = n!$.
The fact that only pairings arise in Wick’s formula is a consequence of the observa-
tion on page 15 that for a Gaussian random variable, all cumulants above the second
vanish.
Proof: Suppose that the covariance matrix $C$ of $(X_1,\dots,X_n)$ is diagonal, i.e. the $X_i$'s are independent. Consider $(i_1,\dots,i_k)$ as a function $i:[k]\to[n]$. Let $\{a_1,\dots,a_r\}$ be the range of $i$ and $A_j = i^{-1}(a_j)$. Then $\{A_1,\dots,A_r\}$ is a partition of $[k]$ which we denote $\ker(i)$. Let $|A_t|$ be the number of elements in $A_t$. Then $E(X_{i_1}\cdots X_{i_k}) = \prod_{t=1}^{r} E(X_{a_t}^{|A_t|})$. Let us recall that if $X$ is a real Gaussian random variable of mean 0 and variance $c$, then for $k$ even $E(X^k) = c^{k/2}\cdot|\mathcal P_2(k)| = \sum_{\pi\in\mathcal P_2(k)} E_\pi(X,\dots,X)$, and for $k$ odd $E(X^k) = 0$. Thus we can write the product $\prod_t E(X_{a_t}^{|A_t|})$ as a sum $\sum_{\pi\in\mathcal P_2(k)} E_\pi(X_{i_1},\dots,X_{i_k})$ where the sum runs over all $\pi$'s which only connect elements in the same block of $\ker(i)$. Since $E(X_{i_r}X_{i_s}) = 0$ for $i_r\neq i_s$ we can relax the condition that $\pi$ only connect elements in the same block of $\ker(i)$. Hence $E(X_{i_1}\cdots X_{i_k}) = \sum_{\pi\in\mathcal P_2(k)} E_\pi(X_{i_1},\dots,X_{i_k})$.
Finally let us suppose that $C$ is arbitrary. Let the density of $(X_1,\dots,X_n)$ be $\exp(-\langle Bt,t\rangle/2)\,[(2\pi)^{n/2}\det(B)^{-1/2}]^{-1}$ and choose an orthogonal matrix $O$ such that $D = O^{-1}BO$ is diagonal. Let
$$\begin{pmatrix} Y_1\\ \vdots\\ Y_n\end{pmatrix} = O^{-1}\begin{pmatrix} X_1\\ \vdots\\ X_n\end{pmatrix}.$$
Then $(Y_1,\dots,Y_n)$ is a real Gaussian random vector with the diagonal covariance matrix $D^{-1}$. Then
$$E(X_{i_1}\cdots X_{i_k}) = \sum_{j_1,\dots,j_k=1}^{n} o_{i_1j_1}o_{i_2j_2}\cdots o_{i_kj_k}\,E(Y_{j_1}Y_{j_2}\cdots Y_{j_k}) = \sum_{j_1,\dots,j_k=1}^{n} o_{i_1j_1}\cdots o_{i_kj_k}\sum_{\pi\in\mathcal P_2(k)} E_\pi(Y_{j_1},\dots,Y_{j_k}) = \sum_{\pi\in\mathcal P_2(k)} E_\pi(X_{i_1},\dots,X_{i_k}).$$
Since both sides of equation (1.7) are k-linear we can extend by linearity to the
complex case.
$$E\bigl(X_{i_1}^{(\varepsilon_1)}\cdots X_{i_k}^{(\varepsilon_k)}\bigr) = \sum_{\pi\in\mathcal P_2(k)} E_\pi\bigl(X_{i_1}^{(\varepsilon_1)},\dots,X_{i_k}^{(\varepsilon_k)}\bigr) \qquad (1.8)$$
for all $i_1,\dots,i_k\in[n]$ and all $\varepsilon_1,\dots,\varepsilon_k\in\{0,1\}$; where we have used the notation $X_i^{(0)} := X_i$ and $X_i^{(1)} := \bar X_i$.
Formulas (1.7) and (1.8) are usually referred to as Wick’s formula after the physi-
cist Gian-Carlo Wick [200], who introduced them in 1950 as a fundamental tool in
quantum field theory; one should notice, though, that they had already appeared
much earlier, in 1918, in the work of the statistician Leon Isserlis [101].
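Wick's formula is easy to test numerically. The following sketch (an added illustration, assuming numpy; the covariance matrix C here is an arbitrary choice) compares the pairing sum with a Monte Carlo estimate of a mixed fourth moment:

```python
# Compare the Wick pairing sum with a Monte Carlo estimate of
# E(X_{i_1} X_{i_2} X_{i_3} X_{i_4}) for a small correlated Gaussian vector.
import numpy as np

def pairings(idx):
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for k in range(len(rest)):
        for p in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, rest[k])] + p

C = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])   # arbitrary covariance matrix (assumption)
rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros(3), C, size=1_000_000)

i = (0, 1, 1, 2)                  # indices i_1, ..., i_4
wick = sum(np.prod([C[i[r], i[s]] for (r, s) in p])
           for p in pairings(list(range(len(i)))))
mc = np.mean(np.prod(samples[:, list(i)], axis=1))
print(wick, mc)                   # agree up to Monte Carlo error
```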
Exercise 7. Let $Z_1,\dots,Z_s$ be independent standard complex Gaussian random variables with mean 0 and $E(|Z_i|^2) = 1$. Show that
Sn denotes the symmetric group on [n]. Note that this is consistent with part (iii) of
Exercise 6.
$$E(\mathrm{Tr}(Y^k)) = \sum_{i_1,\dots,i_k=1}^{N} E(g_{i_1i_2}g_{i_2i_3}\cdots g_{i_ki_1}).$$
By Wick's formula (1.8), $E(g_{i_1i_2}g_{i_2i_3}\cdots g_{i_ki_1}) = 0$ whenever $k$ is odd, and otherwise
Now $E(g_{i_ri_{r+1}}g_{i_si_{s+1}})$ will be 0 unless $i_r = i_{s+1}$ and $i_s = i_{r+1}$ (using the convention that $i_{2k+1} = i_1$). If $i_r = i_{s+1}$ and $i_s = i_{r+1}$ then $E(g_{i_ri_{r+1}}g_{i_si_{s+1}}) = E(|g_{i_ri_{r+1}}|^2) = 1$. Thus given $(i_1,\dots,i_{2k})$, $E(g_{i_1i_2}g_{i_2i_3}\cdots g_{i_{2k}i_1})$ will be the number of pairings $\pi$ of $[2k]$ such that for each pair $(r,s)$ of $\pi$, $i_r = i_{s+1}$ and $i_s = i_{r+1}$.
In order to easily count these we introduce the following notation. We regard the $2k$-tuple $(i_1,\dots,i_{2k})$ as a function $i:[2k]\to[N]$. A pairing $\pi = \{(r_1,s_1),(r_2,s_2),\dots,(r_k,s_k)\}$ of $[2k]$ will be regarded as a permutation of $[2k]$ by letting $(r_i,s_i)$ be the transposition that switches $r_i$ with $s_i$, and $\pi = (r_1,s_1)\cdots(r_k,s_k)$ as the product of these transpositions. We also let $\gamma_{2k}$ be the permutation of $[2k]$ which has the one cycle $(1,2,3,\dots,2k)$. With this notation our condition on the pairings has a simple expression. Let $\pi$ be a pairing of $[2k]$ and $(r,s)$ a pair of $\pi$. The condition $i_r = i_{s+1}$ can be written as $i(r) = i(\gamma_{2k}(\pi(r)))$ since $\pi(r) = s$ and $\gamma_{2k}(\pi(r)) = s+1$. Thus $E_\pi(g_{i_1i_2},g_{i_2i_3},\dots,g_{i_{2k}i_1})$ will be 1 if $i$ is constant on the orbits of $\gamma_{2k}\pi$ and 0 otherwise. For a permutation $\sigma$, let $\#(\sigma)$ denote the number of cycles of $\sigma$. Thus
$$E(\mathrm{Tr}(Y^{2k})) = \sum_{i_1,\dots,i_{2k}=1}^{N}\bigl|\{\pi\in\mathcal P_2(2k)\mid i\text{ is constant on the orbits of }\gamma_{2k}\pi\}\bigr| = \sum_{\pi\in\mathcal P_2(2k)}\bigl|\{i:[2k]\to[N]\mid i\text{ is constant on the orbits of }\gamma_{2k}\pi\}\bigr| = \sum_{\pi\in\mathcal P_2(2k)} N^{\#(\gamma_{2k}\pi)}.$$
This shows that $\#(\gamma_{2k}\pi) - k - 1 \le 0$ for all pairings $\pi$. Next we have to identify for which $\pi$'s we have equality. For this we use a theorem of Biane which embeds $NC(n)$ into $S_n$.
We let $\gamma_n = (1,2,3,\dots,n)$. Let $\pi$ be a partition of $[n]$. We can arrange the elements of the blocks of $\pi$ in increasing order and consider these blocks to be the cycles of a permutation, also denoted $\pi$. When we regard $\pi$ as a permutation, $\#(\pi)$ also denotes the number of cycles of $\pi$. Biane's result is that $\pi$ is non-crossing, as a partition, if and only if the triangle inequality $|\gamma_n| \le |\pi| + |\pi^{-1}\gamma_n|$ becomes an equality. In terms of cycles this means $\#(\pi) + \#(\pi^{-1}\gamma_n) \le n+1$, with equality if and only if $\pi$ is non-crossing. This is a special case of a theorem which states that for $\pi$ and $\sigma$, any two permutations of $[n]$ such that the subgroup generated by $\pi$ and $\sigma$ acts transitively on $[n]$, there is an integer $g\ge0$ such that $\#(\pi) + \#(\pi^{-1}\sigma) + \#(\sigma) = n + 2(1-g)$, and $g$ is the minimal genus of a surface upon which the 'graph' of $\pi$ relative to $\sigma$ can be embedded. See [61, Propriété II.2] and Fig. 1.1. Thus we can say that $\pi$ is non-crossing with respect to $\sigma$ if $|\sigma| = |\pi| + |\pi^{-1}\sigma|$. We shall need this relation in Chapter 5. An easy corollary of the equation $\#(\pi) + \#(\pi^{-1}\sigma) + \#(\sigma) = n + 2(1-g)$ is that if $\pi$ is a pairing of $[2k]$ and $\#(\gamma_{2k}\pi) < k+1$ then $\#(\gamma_{2k}\pi) < k$.
Example 4. $\gamma_6 = (1,2,3,4,5,6)$, $\pi = (1,4)(2,5)(3,6)$; then $\#(\pi) = 3$, $\#(\gamma_6) = 1$, $\#(\pi^{-1}\gamma_6) = 2$, so $\#(\pi) + \#(\pi^{-1}\gamma_6) + \#(\gamma_6) = 6 = n + 2(1-g)$ with $n = 6$, therefore $g = 1$.
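The cycle counts in Example 4 can be verified mechanically. The following small Python script (added here for illustration) represents permutations of $[n]$ in 0-based form and recovers the genus from $\#(\pi)+\#(\pi^{-1}\gamma)+\#(\gamma) = n+2(1-g)$:

```python
# Verify Example 4: pi = (1,4)(2,5)(3,6), gamma_6 = (1,2,...,6), genus g = 1.
def compose(p, q):
    """(p o q)(i) = p(q(i)), permutations as tuples of images."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def num_cycles(p):
    seen, count = set(), 0
    for i in range(len(p)):
        if i not in seen:
            count += 1
            while i not in seen:
                seen.add(i)
                i = p[i]
    return count

n = 6
gamma = tuple((i + 1) % n for i in range(n))   # the cycle (1,2,...,6), 0-based
pi = (3, 4, 5, 0, 1, 2)                        # (1,4)(2,5)(3,6), 0-based
total = num_cycles(pi) + num_cycles(compose(inverse(pi), gamma)) + num_cycles(gamma)
g = (n + 2 - total) / 2
print(num_cycles(pi), num_cycles(compose(inverse(pi), gamma)), num_cycles(gamma), g)
# prints 3 2 1 1.0, matching #(pi)=3, #(pi^{-1} gamma_6)=2, #(gamma_6)=1, genus 1
```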
If g = 0 the surface is a sphere and the graph is planar and we say π is planar
relative to γ. When γ has one cycle, ‘planar relative to γ’ is what we have been
calling a non-crossing partition; for a proof of Biane’s theorem see [140, Proposition
23.22].
$$E(\mathrm{tr}(X_N^{2k})) = \sum_{\pi\in\mathcal P_2(2k)} N^{-2g_\pi}, \qquad (1.9)$$
because $\#(\pi^{-1}\gamma) = \#(\gamma\pi^{-1})$ for any permutations $\pi$ and $\gamma$, and if $\pi$ is a pairing then $\pi = \pi^{-1}$. Thus $C_k := \lim_{N\to\infty} E(\mathrm{tr}(X_N^{2k}))$ is the number of non-crossing pairings of $[2k]$, i.e. the cardinality of $NC_2(2k)$. It is well known that this is the $k$-th Catalan number $\frac{1}{k+1}\binom{2k}{k}$ (see [140, Lemma 8.9], or (2.5) in the next chapter).
Fig. 1.2 The graph of $(2\pi)^{-1}\sqrt{4-t^2}$. The $2k$-th moment of the semi-circle law is the Catalan number $C_k = (2\pi)^{-1}\int_{-2}^{2} t^{2k}\sqrt{4-t^2}\,dt$.
Since the Catalan numbers are the moments of the semi-circle distribution, we have arrived at Wigner's famous semi-circle law [201], which says that the spectral measures of $\{X_N\}_N$, relative to the state $E(\mathrm{tr}(\cdot))$, converge to $(2\pi)^{-1}\sqrt{4-t^2}\,dt$, i.e. the expected proportion of eigenvalues of $X$ between $a$ and $b$ is asymptotically $(2\pi)^{-1}\int_a^b\sqrt{4-t^2}\,dt$. See Fig. 1.2.
If we arrange that all the $X_N$'s are defined on the same probability space, $X_N:\Omega\to M_N(\mathbb C)$, we can say something stronger: $\{\mathrm{tr}(X_N^k)\}_N$ converges to the $k$-th moment $(2\pi)^{-1}\int_{-2}^{2} t^k\sqrt{4-t^2}\,dt$ almost surely. We shall prove this in Chapters 4 and 5. See Theorem 4.4 and Remark 5.14.
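The semi-circle law is easily visualized by simulation. The following sketch (our addition, assuming numpy and matplotlib are available) samples one GUE matrix, normalized so that the entries have variance $1/N$, and compares the eigenvalue histogram with the density $(2\pi)^{-1}\sqrt{4-t^2}$:

```python
# One large GUE sample versus the semi-circle density on [-2, 2].
import numpy as np
import matplotlib.pyplot as plt

N = 2000
rng = np.random.default_rng(42)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
X = (A + A.conj().T) / (2 * np.sqrt(N))   # self-adjoint, entries of variance 1/N
eigs = np.linalg.eigvalsh(X)

t = np.linspace(-2, 2, 400)
plt.hist(eigs, bins=60, density=True)
plt.plot(t, np.sqrt(4 - t**2) / (2 * np.pi))
plt.show()
```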
asymptotic value of the $m$-th moment of $X_i$ (note that this is the same for all $i$); i.e., $c_m$ is zero for $m$ odd and the Catalan number $C_{m/2}$ for $m$ even.
Each factor is centred asymptotically and adjacent factors have independent en-
tries. We shall show that E(tr(YN )) → 0 and we shall call this property asymptotic
freeness. This will then motivate Voiculescu’s definition of freeness.
First let us recall the principle of inclusion-exclusion (see Stanley [167, Vol. 1,
Chap. 2]). Let S be a set and E1 , . . . , Er ⊆ S. Then
$$|S\setminus(E_1\cup\cdots\cup E_r)| = |S| - \sum_{i=1}^{r}|E_i| + \sum_{i_1\neq i_2}|E_{i_1}\cap E_{i_2}| + \cdots = \sum_{M\subseteq[r]}(-1)^{|M|}\Bigl|\bigcap_{i\in M} E_i\Bigr|,$$
provided we make the convention that $\bigcap_{i\in\emptyset} E_i = S$ and $(-1)^{|\emptyset|} = 1$.
Notation 8 Let i1 , . . . , im ∈ [s]. We regard these labels as the colours of the matrices
Xi1 , Xi2 , . . . , Xim . Given a pairing π ∈ P2 (m), we say that π respects the colours
i := (i1 , . . . , im ), or to be brief: π respects i, if ir = i p whenever (r, p) is a pair of π.
Thus π respects i if and only if π only connects matrices of the same colour.
Proof: The proof proceeds essentially in the same way as for the genus expansion
of moments of one GUE matrix.
$$E(\mathrm{tr}(X_{i_1}\cdots X_{i_m})) = \sum_{j_1,\dots,j_m} E\bigl(f^{(i_1)}_{j_1j_2}\cdots f^{(i_m)}_{j_mj_1}\bigr) = \sum_{j_1,\dots,j_m}\sum_{\pi\in\mathcal P_2(m)} E_\pi\bigl(f^{(i_1)}_{j_1j_2},\dots,f^{(i_m)}_{j_mj_1}\bigr) = \sum_{\substack{\pi\in\mathcal P_2(m)\\ \pi\text{ respects }i}}\ \sum_{j_1,\dots,j_m} E_\pi\bigl(f^{(i_1)}_{j_1j_2},\dots,f^{(i_m)}_{j_mj_1}\bigr) = \sum_{\substack{\pi\in\mathcal P_2(m)\\ \pi\text{ respects }i}} N^{-2g_\pi}\qquad\text{by (1.9)}.$$
The penultimate equality follows in the same way as in the calculations leading to Theorem 3; for this note that for a $\pi$ which respects $i$ we have
$$E_\pi\bigl(f^{(i_1)}_{j_1j_2},\dots,f^{(i_m)}_{j_mj_1}\bigr) = E_\pi\bigl(f^{(1)}_{j_1j_2},\dots,f^{(1)}_{j_mj_1}\bigr),$$
so for the contribution of such a $\pi$ which respects $i$ it no longer plays a role that we have several matrices instead of one.
Thus
$$E\Bigl(\mathrm{tr}\bigl((X_{i_1}^{m_1}-c_{m_1}I)\cdots(X_{i_r}^{m_r}-c_{m_r}I)\bigr)\Bigr) = \sum_{M\subseteq[r]}(-1)^{|M|}\Bigl|\bigcap_{j\in M}E_j\Bigr| + O(N^{-2}).$$
But
$$(a_1-\varphi(a_1)1)(a_2-\varphi(a_2)1) = a_1a_2 - \varphi(a_2)a_1 - \varphi(a_1)a_2 + \varphi(a_1)\varphi(a_2)1.$$
Hence we have
$$\varphi(a_1a_2) = \varphi\bigl(\varphi(a_2)a_1 + \varphi(a_1)a_2 - \varphi(a_1)\varphi(a_2)1\bigr) = \varphi(a_1)\varphi(a_2).$$
Continuing in this fashion, we know that ϕ(å1 · · · åk ) = 0 by the definition of free-
ness, where åi = ai − ϕ(ai )1 is a centred random variable. But then
where the lower order terms are already dealt with by induction hypothesis.
Remark 14. Let $(\mathcal A,\varphi)$ be a non-commutative probability space. For any subalgebra $\mathcal B\subset\mathcal A$ we let $\mathring{\mathcal B} = \mathcal B\cap\ker\varphi$. Let $\mathcal A_1$ and $\mathcal A_2$ be unital subalgebras of $\mathcal A$; we let $\mathcal A_1\vee\mathcal A_2$ be the subalgebra of $\mathcal A$ generated algebraically by $\mathcal A_1$ and $\mathcal A_2$. With this notation we can restate Proposition 13 as follows. If $\mathcal A_1$ and $\mathcal A_2$ are free then
$$\ker\varphi|_{\mathcal A_1\vee\mathcal A_2} = \sum_{n\ge1}{}^{\oplus}\ \sum_{\alpha_1\neq\cdots\neq\alpha_n}{}^{\oplus}\ \mathring{\mathcal A}_{\alpha_1}\mathring{\mathcal A}_{\alpha_2}\cdots\mathring{\mathcal A}_{\alpha_n} \qquad (1.11)$$
understanding of this rule in the next chapter. For the moment let us just note the
following.
Proposition 16. Let (A, ϕ) be a non-commutative probability space. The subalge-
bra of scalars C1 is free from any other unital subalgebra B ⊂ A.
is the logarithm of the moment-generating function then {kn }n are the cumulants of
ν. We gave without proof two formulas (1.1) and (1.2) showing how to compute the
nth moment from the first n cumulants and conversely.
In the exercises below we shall prove equations (1.1) and (1.2) as well as showing
the very simple restatements in terms of set partitions
The simplicity of these formulas, in particular the first, makes them very useful for
computation. Moreover they naturally lead to the moment-cumulant formulas for
the free cumulants in which the set P(n) of all partitions of [n] is replaced by NC(n)
the set of non-crossing partitions of [n]. This will be taken up in Chapter 2.
It was shown in Exercise 4 that if we have two sequences $\{\alpha_n\}_n$ and $\{k_n\}_n$ such that $\alpha_n = \sum_{\pi\in\mathcal P(n)} k_\pi$, then we have (1.15) as the relation between their exponential power series. In Exercises 11 and 12 this is proved again, starting from the formal power series relation and ending with the first moment-cumulant relation. This can be regarded as a warm-up for Exercises 13 and 14, where we prove the second half of the moment-cumulant relation:
This formula can also be proved by the general theory of Möbius inversion in P(n)
after identifying the Möbius function on P(n) (see [140, Ex. 10.33]).
So far we have only considered cumulants of a single random variable; we need an extension to several random variables so that $k_n$ becomes an $n$-linear functional. We begin with mixed moments and extend the notation used in Section 1.5. Let $\{X_i\}_i$ be a sequence of random variables and $\pi\in\mathcal P(n)$; we let
Then we set
Another formula we shall need is the product formula of Leonov and Shiryaev for cumulants (see [140, Theorem 11.30]). Let $n_1,\dots,n_r$ be positive integers and $n = n_1+\cdots+n_r$. Given random variables $X_1,\dots,X_n$ let $Y_1 = X_1\cdots X_{n_1}$, $Y_2 = X_{n_1+1}\cdots X_{n_1+n_2}$, ..., $Y_r = X_{n_1+\cdots+n_{r-1}+1}\cdots X_{n_1+\cdots+n_r}$. Then
where the sum runs over all $\pi\in\mathcal P(n)$ such that $\pi\vee\tau = 1_n$, and $\tau\in\mathcal P(n)$ is the partition with $r$ blocks
and $1_n\in\mathcal P(n)$ is the partition with one block. Here $\vee$ denotes the join in the lattice of all partitions (see [140, Remark 9.19]).
In the next chapter we will have in (2.19) an analogue of (1.16) for free cumu-
lants.
(ii) Show
$$\exp\Bigl(\sum_{n=1}^{\infty}\beta_n z^n\Bigr) = 1 + \sum_{n=1}^{\infty}\ \sum_{\substack{r_1,\dots,r_n\ge0\\ 1\cdot r_1+\cdots+n\cdot r_n=n}}\frac{\beta_1^{r_1}\cdots\beta_n^{r_n}}{r_1!\,r_2!\cdots r_n!}\,z^n.$$
Exercise 13. (i) Let $\sum_{n=1}^{\infty}\alpha_n\frac{z^n}{n!}$ be a formal power series. Show that
$$\log\Bigl(1+\sum_{n=1}^{\infty}\alpha_n\frac{z^n}{n!}\Bigr) = \sum_{n=1}^{\infty}\Bigl(\sum_{\pi\in\mathcal P(n)}(-1)^{\#(\pi)-1}(\#(\pi)-1)!\,\alpha_\pi\Bigr)\frac{z^n}{n!}.$$
(ii) Let
$$k_n = \sum_{\pi\in\mathcal P(n)}(-1)^{\#(\pi)-1}(\#(\pi)-1)!\,\alpha_\pi.$$
Show that
$$\alpha_n = \sum_{\pi\in\mathcal P(n)} k_\pi.$$
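The inversion formulas of Exercise 13 can be tested by brute force for small $n$. The sketch below (an added check) enumerates all set partitions of $[n]$ and applies part (ii) to the moments of a standard Gaussian, for which only $k_2 = 1$ should survive:

```python
# Brute-force check of the cumulant formula in Exercise 13 (ii) for small n.
from itertools import combinations
from math import factorial, prod

def set_partitions(s):
    s = list(s)
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    # choose the block containing `first`, then partition the remainder
    for r in range(len(rest) + 1):
        for tail in combinations(rest, r):
            block = [first] + list(tail)
            remaining = [x for x in rest if x not in tail]
            for p in set_partitions(remaining):
                yield [block] + p

# moments of a standard Gaussian: 0, 1, 0, 3, 0, 15
alpha = {1: 0, 2: 1, 3: 0, 4: 3, 5: 0, 6: 15}
a_pi = lambda p: prod(alpha[len(b)] for b in p)

for n in range(1, 7):
    k_n = sum((-1)**(len(p) - 1) * factorial(len(p) - 1) * a_pi(p)
              for p in set_partitions(range(n)))
    print(n, k_n)   # expect k_2 = 1 and all other cumulants 0
```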
Exercise 14. Suppose ν is a probability measure with moments {αn }n of all orders
and let {kn }n be its sequence of cumulants. Show that
Chapter 2
The Free Central Limit Theorem and Free Cumulants
For $k\ge1$, set
$$S_k := \frac{1}{\sqrt k}(a_1+\cdots+a_k). \qquad (2.1)$$
The Central Limit Theorem is a statement about the limit distribution of the ran-
dom variable Sk in the large k limit. Let us begin by reviewing the kind of conver-
gence we shall be considering.
Recall that given a real-valued random variable X on a probability space we have
a probability measure µX on R, called the distribution of X. The distribution of X is
defined by the equation
$$E(f(X)) = \int f(t)\,d\mu_X(t) \quad\text{for all } f\in C_b(\mathbb R) \qquad (2.2)$$
A more general criterion is the Carleman condition (see Akhiezer [3, p. 85]) which says that a measure $\mu$ is determined by its moments $\{\alpha_k\}_k$ if we have $\sum_{k\ge1}(\alpha_{2k})^{-1/(2k)} = \infty$.
Exercise 2. Using the Carleman condition, show that the Gaussian measure is de-
termined by its moments.
A sequence of probability measures $\{\mu_n\}_n$ on $\mathbb R$ is said to converge weakly to $\mu$ if $\{\int f\,d\mu_n\}_n$ converges to $\int f\,d\mu$ for all $f\in C_b(\mathbb R)$. Given a sequence $\{X_n\}_n$ of real-valued random variables we say that $\{X_n\}_n$ converges in distribution (or converges in law) if the probability measures $\{\mu_{X_n}\}_n$ converge weakly.
If we are working in a non-commutative probability space (A, ϕ) we call an
element $a$ of $\mathcal A$ a non-commutative random variable. Given such an $a$ we may define $\mu_a$ by $\int p\,d\mu_a = \varphi(p(a))$ for all polynomials $p\in\mathbb C[x]$. At this level of generality we may not be able to define $\int f\,d\mu_a$ for all functions $f\in C_b(\mathbb R)$, so we call the
linear functional µa : C[x] → C the algebraic distribution of a, even if it is not a
probability measure. However when it is clear from the context we shall just call µa
the distribution of a. Note that if a is a self-adjoint element of a C∗ -algebra and ϕ is
positive and has norm 1, then µa extends from C[x] to Cb (R) and thus µa becomes
a probability measure on R.
Definition 1. Let (Ak , ϕk ), for k ∈ N, and (A, ϕ) be non-commutative probability
spaces.
1) Let $(b_k)_{k\in\mathbb N}$ be a sequence of non-commutative random variables with $b_k\in\mathcal A_k$, and let $b\in\mathcal A$. We say that $b_k$ converges in distribution to $b$, denoted by $b_k\xrightarrow{\text{distr}}b$, if
Note that this definition is neither weaker nor stronger than weak convergence of
the corresponding distributions. For real-valued random variables the convergence
in (2.3) is sometimes called convergence in moments. However there is an impor-
tant case where the two conditions coincide. If we have a sequence of probability measures $\{\mu_k\}_k$ on $\mathbb R$, each having moments of all orders, and a probability measure $\mu$ determined by its moments, such that for every $n$ we have $\int t^n\,d\mu_k(t)\to\int t^n\,d\mu(t)$, then $\{\mu_k\}_k$ converges weakly to $\mu$.
$$\varphi(S_k^n) = \frac{1}{k^{n/2}}\sum_{r:[n]\to[k]}\varphi(a_{r_1}\cdots a_{r_n}).$$
It turns out that the fact that the random variables a1 , . . . , ak are independent and
identically distributed makes the task of calculating this sum less complex than it
initially appears. The key observation is that because of (classical or free) inde-
pendence of the ai ’s and the fact that they are identically distributed, the value of
ϕ(ar1 . . . arn ) depends not on all details of the multi-index r, but just on the informa-
tion where the indices are the same and where they are different. Let us recall some
notation from the proof of Theorem 1.1.
Notation 2 Let i = (i1 , . . . , in ) be a multi-index. Then its kernel, denoted by ker i, is
that partition in P(n) whose blocks correspond exactly to the different values of the
indices,
k and l are in the same block of ker i ⇐⇒ ik = il .
Lemma 3. With this notation we have that ker i = ker j implies ϕ(ai1 · · · ain ) =
ϕ(a j1 · · · a jn ).
Proof: To see this note first that $\ker i = \ker j$ implies that the $i$-indices can be obtained from the $j$-indices by the application of some permutation $\sigma$, i.e. $(j_1,\dots,j_n) = (\sigma(i_1),\dots,\sigma(i_n))$. We know that the random variables $a_1,\dots,a_k$ are (classically or freely) independent. This means that we have a factorization rule for calculating mixed moments in $a_1,\dots,a_k$ in terms of the moments of individual $a_i$'s. In particular this means that $\varphi(a_{i_1}\cdots a_{i_n})$ can be written as some expression in moments $\varphi(a_i^r)$, while $\varphi(a_{j_1}\cdots a_{j_n})$ can be written as that same expression except with $\varphi(a_i^r)$ replaced by $\varphi(a_{\sigma(i)}^r)$. However, since our random variables all have the same distribution, we have $\varphi(a_i^r) = \varphi(a_{\sigma(i)}^r)$ for all $i$ and $r$, and thus $\varphi(a_{i_1}\cdots a_{i_n}) = \varphi(a_{j_1}\cdots a_{j_n})$.
Let us denote the common value of ϕ(ai1 · · · ain ) for all i with ker i = π, for some
π ∈ P(n), by ϕ(π). Consequently, we have
$$\varphi(S_k^n) = \frac{1}{k^{n/2}}\sum_{\pi\in\mathcal P(n)}\varphi(\pi)\cdot\bigl|\{i:[n]\to[k]\mid\ker i = \pi\}\bigr|.$$
$$\bigl|\{i:[n]\to[k]\mid\ker i = \pi\}\bigr| = k(k-1)\cdots(k-\#(\pi)+1),$$
because we have $k$ choices for the first block of $\pi$, $k-1$ choices for the second block of $\pi$, and so on until the last block, where we have $k-\#(\pi)+1$ choices.
Then what we have proved is that
$$\varphi(S_k^n) = \frac{1}{k^{n/2}}\sum_{\pi\in\mathcal P(n)}\varphi(\pi)\cdot k(k-1)\cdots(k-\#(\pi)+1).$$
The great advantage of this expression over what we started with is that the num-
ber of terms does not depend on k. Thus we are in a position to take the limit as
k → ∞, provided we can effectively estimate each term of the sum.
Our first observation is the most obvious one, namely that $k(k-1)\cdots(k-\#(\pi)+1)\sim k^{\#(\pi)}$ as $k\to\infty$.
Next observe that if π has a block of size 1, then we will have ϕ(π) = 0. Indeed
suppose that π = {V1 , . . . ,Vm , . . . ,Vs } ∈ P(n) with Vm = {l} for some l ∈ [n]. Then
we will have
since $a_{j_l}$ is (classically or freely) independent of $\{b,c\}$. (For the free case this factorization was considered in Equation (1.13) in the last chapter. In the classical case it is obvious, too.) Of course, for this part of the argument, it is crucial that we assume our variables $a_i$ to be centred.
Thus the only partitions which contribute to the sum are those with blocks of size
at least 2. Note that such a partition can have at most n/2 blocks. Now,
$$\lim_{k\to\infty}\frac{k^{\#(\pi)}}{k^{n/2}} = \begin{cases}1, & \text{if }\#(\pi) = n/2\\ 0, & \text{if }\#(\pi) < n/2.\end{cases}$$
Hence the only partitions which contribute to the sum in the k → ∞ limit are those
with exactly n/2 blocks, i.e. partitions each of whose blocks has size 2. Such parti-
tions are called pairings, and the set of pairings is denoted P2 (n).
Thus we have shown that
$$\lim_{k\to\infty}\varphi(S_k^n) = \sum_{\pi\in\mathcal P_2(n)}\varphi(\pi).$$
In the case of classical independence, our random variables commute and factorize completely with respect to $\varphi$. Thus if we denote by $\varphi(a_i^2) = \sigma^2$ the common variance of our random variables, then for any pairing $\pi\in\mathcal P_2(n)$ we have $\varphi(\pi) = \sigma^n$. Thus we have
Thus we have
(
σ n (n − 1)(n − 3) . . . 5 · 3 · 1, if n even
lim ϕ(Skn ) = ∑ σ n = .
k→∞
π∈P2 (n) 0, if n odd
From Section 1.1, we recognize these as exactly the moments of a Gaussian random
variable of mean 0 and variance σ 2 . Since by Exercise 2 the normal distribution is
determined by its moments, and hence our convergence in moments is the same as
the classical convergence in distribution, we get the following form of the classical
central limit theorem: if $(a_i)_{i\in\mathbb N}$ are classically independent random variables which are identically distributed with $\varphi(a_i) = 0$ and $\varphi(a_i^2) = \sigma^2$, and having all moments, then $S_k$ converges in distribution to a Gaussian random variable with mean 0 and
variance σ 2 . Note that one can see the derivation above also as a proof of the Wick
formula for Gaussian random variables if one takes the central limit theorem for
granted.
Now we want to deal with the case where the random variables are freely indepen-
dent. In this case, ϕ(π) will not be the same for all pair partitions π ∈ P2 (2n) (we
focus on the even moments now because we already know that the odd ones are
zero). Let’s take a look at some examples:
Fig. 2.2 We start with the pairing {(1, 4), (2, 3), (5, 6)} and remove the pair (2, 3) of adjacent elements (middle figure). Next we remove the pair (1, 4) of adjacent elements. We are then left with a single pair; so the pairing must have been non-crossing to start with.
A very simple method is to show that the pairings are in a bijective correspondence with the Dyck paths; by using André's reflection principle one finds that there are $\binom{2n}{n}-\binom{2n}{n-1} = \frac{1}{n+1}\binom{2n}{n}$ such paths (see [140, Prop. 2.11] for details).
Our second method for counting non-crossing pairings is to find a simple recur-
rence which they satisfy. The idea is to look at the block of a pairing which contains
the number 1. In order for the pairing to be non-crossing, 1 must be paired with
some even number in the set [2n], else we would necessarily have a crossing. Thus
1 must be paired with 2i for some i ∈ [n]. Now let i run through all possible values
in [n], and count for each the number of non-crossing pairings that contain this pair,
as in the diagram below.
Fig. 2.3 We have $C_{i-1}$ possible pairings on $[2, 2i-1]$ and $C_{n-i}$ possible pairings on $[2i+1, 2n]$.
In this way we see that the cardinality Cn of NC2 (2n) must satisfy the recurrence
relation
$$C_n = \sum_{i=1}^{n} C_{i-1}C_{n-i}, \qquad (2.5)$$
with initial condition $C_0 = 1$. One can then check using a generating function that the Catalan numbers satisfy this recurrence, hence $C_n = \frac{1}{n+1}\binom{2n}{n}$.
Exercise 4. Let $f(z) = \sum_{n=0}^{\infty} C_n z^n$ be the generating function for $\{C_n\}_n$, where $C_0 = 1$ and $C_n$ satisfies the recursion (2.5).
(i) Show that $1 + zf(z)^2 = f(z)$.
(ii) Show that $f$ is also the power series for $\frac{1-\sqrt{1-4z}}{2z}$.
(iii) Show that $C_n = \frac{1}{n+1}\binom{2n}{n}$.
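The recursion (2.5) and the closed form can be checked against each other in a few lines (an added sketch, using only the standard library):

```python
# Catalan numbers: recursion (2.5) versus the closed form binom(2n,n)/(n+1).
from math import comb

C = [1]
for n in range(1, 12):
    C.append(sum(C[i - 1] * C[n - i] for i in range(1, n + 1)))

for n, c in enumerate(C):
    assert c == comb(2 * n, n) // (n + 1)
print(C)  # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786]
```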
We can also prove directly that $C_n = \frac{1}{n+1}\binom{2n}{n}$ by finding a bijection between $NC_2(2n)$ and some standard set of objects which we can see directly is enumerated by the Catalan numbers. A reasonable choice for this “canonical” set is the collection of $2\times n$ standard Young tableaux. A standard Young tableau of shape $2\times n$ is a filling of the squares of a $2\times n$ grid with the numbers $1,\dots,2n$ which is strictly increasing in each of the two rows and each of the $n$ columns. The number of these standard Young tableaux is very easy to calculate, using a famous and fundamental result known as the hook-length formula [167, Vol. 2, Corollary 7.21.6]. The hook-length formula tells us that the number of standard Young tableaux on the $2\times n$ rectangle is
$$\frac{(2n)!}{(n+1)!\,n!} = \frac{1}{n+1}\binom{2n}{n}. \qquad (2.6)$$
Thus we will have proved that $|NC_2(2n)| = \frac{1}{n+1}\binom{2n}{n}$ if we can bijectively associate to each pair partition $\pi\in NC_2(2n)$ a standard Young tableau on the $2\times n$ rectangular grid. This is very easy to do. Simply take the “left-halves” of each pair in $\pi$ and write them in increasing order in the cells of the first row. Then take the “right-halves” of each pair of $\pi$ and write them in increasing order in the cells of the second row. Figure 2.4 shows the bijection between $NC_2(6)$ and standard Young tableaux on the $2\times3$ rectangle.
Fig. 2.4 In the bijection between $NC_2(6)$ and $2\times3$ standard Young tableaux the pairing {(1, 2), (3, 6), (4, 5)} gets mapped to the tableau with first row 1, 3, 4 and second row 2, 5, 6.
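The bijection can be implemented directly. The following sketch (our addition) generates all non-crossing pairings of $[2n]$, maps each one to a $2\times n$ filling via the left-halves/right-halves rule, and checks that every filling is indeed a standard Young tableau:

```python
# Non-crossing pairings of [2n] mapped to 2 x n standard Young tableaux.
def nc_pairings(elems):
    """Yield all non-crossing pairings of a sorted list of elements."""
    if not elems:
        yield []
        return
    first = elems[0]
    for j in range(1, len(elems), 2):   # partner must leave an even-sized gap
        inside, outside = elems[1:j], elems[j + 1:]
        for p1 in nc_pairings(inside):
            for p2 in nc_pairings(outside):
                yield [(first, elems[j])] + p1 + p2

def to_tableau(pairing):
    top = sorted(l for (l, r) in pairing)      # left-halves -> first row
    bottom = sorted(r for (l, r) in pairing)   # right-halves -> second row
    return top, bottom

n = 3
tableaux = set()
for p in nc_pairings(list(range(1, 2 * n + 1))):
    top, bottom = to_tableau(p)
    # rows are sorted, so standardness only needs increasing columns
    assert all(t < b for t, b in zip(top, bottom))
    tableaux.add((tuple(top), tuple(bottom)))
print(len(tableaux))  # 5 = C_3, as predicted by the hook-length formula
```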
The argument we have just provided gives us the Free Central Limit Theorem.
Theorem 5. If $(a_i)_{i\in\mathbb N}$ are self-adjoint, freely independent, and identically distributed with $\varphi(a_i) = 0$ and $\varphi(a_i^2) = \sigma^2$, then $S_k$ converges in distribution to a semi-circular element of variance $\sigma^2$ as $k\to\infty$.
This free central limit theorem was proved as one of the first results in free prob-
ability theory by Voiculescu already in [176]. His proof was much more operator
theoretic; the proof presented here is due to Speicher [159] and was the first hint
at a relation between free probability theory and the combinatorics of non-crossing
partitions. (An early concrete version of the free central limit theorem, before the
notion of freeness was isolated, appeared also in the work of Bożejko [43] in the
context of convolution operators on free groups.)
Recall that in Chapter 1 it was shown that for a random matrix XN chosen from
N × N GUE we have that
$$\lim_{N\to\infty} E[\mathrm{tr}(X_N^n)] = \begin{cases}0, & \text{if }n\text{ odd}\\ C_{n/2}, & \text{if }n\text{ even}\end{cases} \qquad (2.7)$$
so that a GUE random matrix is a semi-circular element in the limit of large matrix size, $X_N\xrightarrow{\text{distr}}s$.
We can also define a family of semi-circular random variables.
where
$$\varphi_\pi[s_{i_1},\dots,s_{i_n}] = \prod_{(p,q)\in\pi} c_{i_pi_q}.$$
This is the free analogue of Wick’s formula. In fact, using this language and our
definition of convergence in distribution from Definition 1, it follows directly from
Lemma 1.9 that if X1 , . . . , Xr are matrices chosen independently from GUE, then, in
the large N limit, they converge in distribution to a semi-circular family s1 , . . . , sr of
covariance ci j = δi j .
Exercise 5. Show that if {x1 , . . . , xn } is a semi-circular family and A = (ai j ) is an
invertible matrix with real entries then {y1 , . . . , yn } is a semi-circular family where
yi = ∑ j ai j x j .
Exercise 6. Let {x1 , . . . , xn } be a semi-circular family such that for all i and j we
have ϕ(xi x j ) = ϕ(x j xi ). Show that by diagonalizing the covariance matrix we can
find an orthogonal matrix O = (oi j ) such that {y1 , . . . , yn } is a free semi-circular
family where yi = ∑ j oi j x j .
Exercise 7. Formulate and prove a multidimensional version of the free central limit
theorem.
Figure 2.5 should make it clear what a crossing in a partition is; a non-crossing
partition is a partition with no crossings.
Note that P(n) is partially ordered by
We also say that π1 is a refinement of π2 . NC(n) is a subset of P(n) and inherits this
partial order, so NC(n) is an induced sub-poset of P(n). In fact both are lattices;
they have well-defined join ∨ and meet ∧ operations (though the join of two non-
crossing partitions in NC(n) does not necessarily agree with their join when viewed
as elements of P(n)). Recall that the join π1 ∨ π2 in a lattice is the smallest σ with
the property that σ ≥ π1 and σ ≥ π2 ; and that the meet π1 ∧ π2 is the largest σ with
the property that σ ≤ π1 and σ ≤ π2 .
We now define the important free cumulants of a non-commutative probability
space (A, ϕ). They were introduced by Speicher in [161]. For other notions of cu-
mulants and the relation between them see [10, 74, 117, 153].
Remark 9. In Equation (2.10) and below, we always mean that the elements i1 , . . . , il
of V are in increasing order. Note that Equation (2.9) has a formulation using Möbius
inversion which we might call the cumulant-moment formula. To present this we
need the moment version of Equation (2.10). For a partition π ∈ P(n) with π =
{V1 , . . . ,Vr } we set
We also need the Möbius function µ for NC(n) (see [140, Lecture 10]). Then our
cumulant-moment relation can be written
One could use Equation (2.12) as the definition of free cumulants, however for prac-
tical calculations Equation (2.9) is usually easier to work with.
Example 10. (1) For n = 1, we have ϕ(a1 ) = κ1 (a1 ), and thus
Since we know from the n = 1 calculation that κ1 (a1 ) = ϕ(a1 ), this yields
κn (a1 , a2 , . . . , an ) = κn (a2 , . . . , an , a1 ).
(ii) Let us assume that all moments with respect to ϕ are invariant under all
permutations of the entries, i.e., that we have for all n ∈ N and all a1 , . . . , an ∈ A
and all σ ∈ Sn that ϕ(aσ (1) · · · aσ (n) ) = ϕ(a1 · · · an ). Is it then true that also the free
cumulants κn (n ∈ N) are invariant under all permutations?
Let us also point out how the definition appears when a1 = · · · = an = a, i.e. when
all the random variables are the same. Then we have
called the free Poisson law (of rate c). We should also note that we have chosen a dif-
ferent normalization than that used by other authors in order to make the cumulants
simple; see Remark 12 and Exercise 12 below.
Exercise 11. In this exercise we shall find the moments and free cumulants of the
Marchenko-Pastur law.
(i) Let $\alpha_n$ be the $n$-th moment. Use the substitution $t = (x-(1+c))/\sqrt c$ to show that
$$\alpha_n = \sum_{k=0}^{[(n-1)/2]}\binom{n-1}{2k}\frac{1}{k+1}\binom{2k}{k}\,(1+c)^{n-2k-1}c^{1+k}.$$
(ii) Expand the factor $(1+c)^{n-2k-1}$ to show that
$$\alpha_n = \sum_{k=0}^{[(n-1)/2]}\ \sum_{l=k}^{n-k-1}\frac{(n-1)!}{k!\,(k+1)!\,(l-k)!\,(n-k-l-1)!}\,c^{l+1}.$$
(iii) Interchange the order of summation and use Vandermonde convolution ([79, (5.23)]) to show that
$$\alpha_n = \sum_{l=1}^{n}\frac{c^l}{n}\binom{n}{l-1}\binom{n}{l}.$$
(iv) Finally use the fact ([140, Cor. 9.13]) that $\frac1n\binom{n}{l-1}\binom{n}{l}$ is the number of non-crossing partitions of $[n]$ with $l$ blocks to show that
$$\alpha_n = \sum_{\pi\in NC(n)} c^{\#(\pi)}.$$
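For small $n$ the chain of identities in Exercise 11 can be verified by brute force. The sketch below (an added check; the rate $c = 3/2$ is an arbitrary choice) enumerates the non-crossing partitions of $[n]$ and compares $\sum_{\pi\in NC(n)} c^{\#(\pi)}$ with the binomial expression from part (iii):

```python
# Compare sum over NC(n) of c^{#blocks} with (1/n) sum_l c^l C(n,l-1) C(n,l).
from itertools import combinations
from fractions import Fraction
from math import comb

def set_partitions(s):
    s = list(s)
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for r in range(len(rest) + 1):
        for tail in combinations(rest, r):
            block = [first] + list(tail)
            remaining = [x for x in rest if x not in tail]
            for p in set_partitions(remaining):
                yield [block] + p

def is_noncrossing(p):
    # a crossing is a < b < c < d with a, c in one block and b, d in another
    for b1, b2 in combinations(p, 2):
        for a, c in combinations(b1, 2):
            if any(a < x < c for x in b2) and any(x < a or x > c for x in b2):
                return False
    return True

c = Fraction(3, 2)   # arbitrary rate parameter (assumption)
for n in range(1, 7):
    lhs = sum(c ** len(p) for p in set_partitions(range(n)) if is_noncrossing(p))
    rhs = sum(c ** l * comb(n, l - 1) * comb(n, l) for l in range(1, n + 1)) / n
    assert lhs == rhs
    print(n, lhs)
```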
Remark 12. Given $y>0$, let $a_0 = (1-\sqrt y)^2$ and $b_0 = (1+\sqrt y)^2$. Let $\rho_y$ be the probability measure on $\mathbb R$ given by $\sqrt{(b_0-t)(t-a_0)}/(2\pi yt)\,dt$ on $[a_0,b_0]$ when $y\le1$, and by $(1-y^{-1})\delta_0 + \sqrt{(b_0-t)(t-a_0)}/(2\pi yt)\,dt$ on $\{0\}\cup[a_0,b_0]$ when $y>1$. As above $\delta_0$ is the Dirac mass at 0. This might be called the standard form of the Marchenko-Pastur law. In the exercise below we shall see that $\rho_y$ is related to $\nu_c$ in a simple way and the cumulants of $\rho_y$ are not as simple as those of $\nu_c$.
Exercise 12. Show that by setting $c = 1/y$ and making the substitution $t = x/c$ we have
$$\int x^k\,d\nu_c(x) = c^k\int t^k\,d\rho_y(t).$$
$$\kappa_2(a_1a_2,a_3a_4) = \kappa_4(a_1,a_2,a_3,a_4) + \kappa_1(a_1)\kappa_3(a_2,a_3,a_4) + \kappa_1(a_2)\kappa_3(a_1,a_3,a_4) + \kappa_1(a_3)\kappa_3(a_1,a_2,a_4) + \kappa_1(a_4)\kappa_3(a_1,a_2,a_3) + \kappa_2(a_1,a_4)\kappa_2(a_2,a_3) + \kappa_2(a_1,a_3)\kappa_1(a_2)\kappa_1(a_4) + \kappa_2(a_1,a_4)\kappa_1(a_2)\kappa_1(a_3) + \kappa_1(a_1)\kappa_2(a_2,a_3)\kappa_1(a_4) + \kappa_1(a_1)\kappa_2(a_2,a_4)\kappa_1(a_3). \qquad (2.18)$$
Then
$$\kappa_r(A_1,\dots,A_r) = \sum_{\substack{\pi\in NC(n)\\ \pi\vee\tau = 1_n}}\kappa_\pi(a_1,\dots,a_n) \qquad (2.19)$$
where the summation is over those $\pi\in NC(n)$ which connect the blocks corresponding to $A_1,\dots,A_r$. More precisely, this means that $\pi\vee\tau = 1_n$ where
Exercise 13. (i) Let $\tau = \{(1,2),(3)\}$. List all $\pi\in NC(3)$ such that $\pi\vee\tau = 1_3$. Check that these are exactly the terms appearing on the right-hand side of Equation (2.17).
(ii) Let $\tau = \{(1,2),(3,4)\}$. List all $\pi\in NC(4)$ such that $\pi\vee\tau = 1_4$. Check that these are exactly the terms on the right-hand side of Equation (2.18).
The most important property of free cumulants is that we may characterize
free independence by the vanishing of “mixed” cumulants. Let (A, ϕ) be a non-
commutative probability space and A1 , . . . , As ⊂ A unital subalgebras. A cumulant
κn (a1 , a2 , . . . , an ) is mixed if each ai is in one of the subalgebras, but a1 , a2 , . . . , an
do not all come from the same subalgebra.
Theorem 14. The subalgebras A1 , . . . , As are free if and only if all mixed cumulants
vanish.
The proof of this theorem relies on formula (2.19) and on the following proposi-
tion which is a special case of Theorem 14. For the details of the proof of Theorem
14 we refer again to [140, Theorem 11.15].
Proof: We consider the case where the last argument an is equal to 1, and proceed
by induction on n.
For $n = 2$,
$$\kappa_2(a,1) = \varphi(a\cdot1) - \varphi(a)\varphi(1) = 0.$$
So the base step is done.
Now assume for the induction hypothesis that the result is true for all 1 ≤ k < n.
We have that
hence
Since ϕ(a1 · · · an−1 1) = ϕ(a1 · · · an−1 ), we have proved that κn (a1 , . . . , an−1 , 1) = 0.
Theorem 16. Let $(\mathcal A,\varphi)$ be a non-commutative probability space. The random variables $a_1,\dots,a_s\in\mathcal A$ are free if and only if all mixed cumulants of $a_1,\dots,a_s$ vanish.
$$\varphi(a_1b_1a_2b_2\cdots a_rb_r) = \sum_{\pi\in NC(2r)}\kappa_\pi(a_1,b_1,a_2,b_2,\dots,a_r,b_r).$$
Since the a’s are free from the b’s, we only need to sum over those partitions π
which do not connect the a’s with the b’s. Each such partition may be written as π =
πa ∪ πb , where πa denotes the blocks consisting of a’s and πb the blocks consisting
of b’s. Hence by the definition of free cumulants
It is now easy to see that, for a given πa ∈ NC(r), there exists a biggest σ ∈ NC(r)
with the property that πa ∪σ ∈ NC(2r). This σ is called the Kreweras complement of
πa and is denoted by K(πa ), see [140, Def. 9.21]. This K(πa ) is given by connecting
as many b’s as possible in a non-crossing way without getting crossings with the
blocks of πa . The mapping K is an order-reversing bijection on the lattice NC(r).
But then the summation condition on the internal sum above is equivalent to the
condition πb ≤ K(πa ). Summing κπ over all π ∈ NC(r) gives the corresponding r-th
moment, which extends easily to
$$\sum_{\substack{\pi\in NC(r)\\ \pi\le\sigma}}\kappa_\pi(b_1,\dots,b_r) = \varphi_\sigma(b_1,\dots,b_r),$$
where ϕσ denotes, in the same way as in κπ , the product of moments along the
blocks of σ ; see Equation (2.11).
Thus we get as the final conclusion of our calculations that
Let us consider some simple examples for this formula. For r = 1, there is only
one π ∈ NC(1), which is its own complement, and we get
With κ1 (a) = ϕ(a) and κ2 (a1 , a2 ) = ϕ(a1 a2 ) − ϕ(a1 )ϕ(a2 ) this reproduces formula
(1.14).
The formula above is not symmetric between the a’s and the b’s (the former
appear with cumulants, the latter with moments). Of course, one can also exchange
the roles of a and b, in which case one ends up with
Let us also note in passing that one can rewrite the Equations (2.20) and (2.21)
above in the symmetric form (see [140, (14.4)])
$$\kappa_n^{a+b} = \kappa_n(a+b,\dots,a+b) = \kappa_n(a,\dots,a) + \kappa_n(b,\dots,b) + (\text{mixed cumulants in } a, b) = \kappa_n^a + \kappa_n^b.$$
Thus the problem of calculating moments is shifted to the relation between cu-
mulants and moments. We already know that the moments are polynomials in the
cumulants, according to the moment-cumulant formula (2.16), but we want to put
this relationship into a framework more amenable to performing calculations.
For any a ∈ A, let us consider formal power series in an indeterminate z defined
by
$$M(z) = 1 + \sum_{n=1}^{\infty}\alpha_n^a z^n \qquad\text{(moment series of $a$)},$$
$$C(z) = 1 + \sum_{n=1}^{\infty}\kappa_n^a z^n \qquad\text{(cumulant series of $a$)}.$$
Proposition 17. The relation between the moment series $M(z)$ and the cumulant series $C(z)$ of a random variable is given by
$$M(z) = C(zM(z)).$$
Proof: The idea is to sum first over the possibilities for the block of $\pi$ containing 1, as in the derivation of the recurrence for $C_n$. Suppose that the first block of $\pi$ looks like $V = \{1,v_2,\dots,v_s\}$, where $1 < v_2 < \cdots < v_s \le n$. Then we build up the rest of the partition $\pi$ out of smaller “nested” non-crossing partitions $\pi_1,\dots,\pi_s$ with $\pi_1\in NC(\{2,\dots,v_2-1\})$, $\pi_2\in NC(\{v_2+1,\dots,v_3-1\})$, etc. Hence if we denote $i_1 = |\{2,\dots,v_2-1\}|$, $i_2 = |\{v_2+1,\dots,v_3-1\}|$, etc., then we have
$$\alpha_n = \sum_{s=1}^{n}\ \sum_{\substack{i_1,\dots,i_s\ge0\\ s+i_1+\cdots+i_s=n}}\ \sum_{\pi=V\cup\pi_1\cup\cdots\cup\pi_s}\kappa_s\,\kappa_{\pi_1}\cdots\kappa_{\pi_s} = \sum_{s=1}^{n}\ \sum_{\substack{i_1,\dots,i_s\ge0\\ s+i_1+\cdots+i_s=n}}\kappa_s\Bigl(\sum_{\pi_1\in NC(i_1)}\kappa_{\pi_1}\Bigr)\cdots\Bigl(\sum_{\pi_s\in NC(i_s)}\kappa_{\pi_s}\Bigr) = \sum_{s=1}^{n}\ \sum_{\substack{i_1,\dots,i_s\ge0\\ s+i_1+\cdots+i_s=n}}\kappa_s\,\alpha_{i_1}\cdots\alpha_{i_s}.$$
Thus we have
$$1 + \sum_{n=1}^{\infty}\alpha_n z^n = 1 + \sum_{n=1}^{\infty}\sum_{s=1}^{n}\ \sum_{\substack{i_1,\dots,i_s\ge0\\ s+i_1+\cdots+i_s=n}}\kappa_s z^s\,\alpha_{i_1}z^{i_1}\cdots\alpha_{i_s}z^{i_s} = 1 + \sum_{s=1}^{\infty}\kappa_s z^s\Bigl(\sum_{i=0}^{\infty}\alpha_i z^i\Bigr)^s,$$
i.e. $M(z) = C(zM(z))$.
$$R(z) := \frac{C(z)-1}{z} = \sum_{n=0}^{\infty}\kappa_{n+1}^a\,z^n. \qquad (2.26)$$
Also put $K(z) = R(z) + \frac1z = \frac{C(z)}{z}$. Then we have the relations
$$K(G(z)) = \frac{1}{G(z)}\,C(G(z)) = \frac{1}{G(z)}\,C\Bigl(\frac1z\,M\Bigl(\frac1z\Bigr)\Bigr) = \frac{1}{G(z)}\,M\Bigl(\frac1z\Bigr) = \frac{1}{G(z)}\,zG(z) = z.$$
Note that $M$ and $C$ are in $\mathbb C[[z]]$, the ring of formal power series in $z$, $G\in\mathbb C[[1/z]]$, and $K\in\mathbb C((z))$, the ring of formal Laurent series in $z$, i.e. $zK(z)\in\mathbb C[[z]]$. Thus $K\circ G\in\mathbb C((1/z))$ and $G\circ K\in\mathbb C[[z]]$. We then also have $G(K(z)) = z$.
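For the standard semi-circle distribution these relations can be tested numerically: there $G(z) = (z-\sqrt{z^2-4})/2$, all free cumulants vanish except $\kappa_2 = 1$, so $R(z) = z$ and $K(z) = z + 1/z$. The following sketch (our addition, assuming numpy is available) checks $K(G(z)) = z$ at a few points of the upper half-plane:

```python
# K(G(z)) = z for the standard semi-circle, where K(z) = z + 1/z.
import numpy as np

def G(z):
    # sqrt(z-2)*sqrt(z+2) is the branch of sqrt(z^2-4) behaving like z
    # at infinity on the upper half-plane, so G(z) ~ 1/z there
    return (z - np.sqrt(z - 2) * np.sqrt(z + 2)) / 2

def K(z):
    return z + 1 / z   # K(z) = R(z) + 1/z with R(z) = z for the semi-circle

for z in [3.0 + 0.5j, 1.0 + 2.0j, -2.5 + 1.0j]:
    print(z, K(G(z)))  # K(G(z)) reproduces z
```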
Thus we recover the following theorem of Voiculescu, which is the main re-
sult on the R-transform. Voiculescu’s original proof in [177] was much more op-
erator theoretic. One should also note that this computational machinery for the
R-transform was also found independently and about the same time by Woess
[204, 205], Cartwright and Soardi [49], and McLaughlin [125], in a more restricted
setting of random walks on free product of groups. Our presentation here is based
on the approach of Speicher in [161].
Theorem 18. For a random variable a let Ga (z) be its Cauchy transform and define
its R-transform Ra (z) by
$$G_a[R_a(z) + 1/z] = z. \qquad (2.27)$$
Then, for $a$ and $b$ freely independent, we have
$$R_{a+b}(z) = R_a(z) + R_b(z). \qquad (2.28)$$
$$z = G_{a+b}[R_{a+b}(z) + 1/z] = G_{a+b}[R_a(z) + R_b(z) + 1/z]. \qquad (2.29)$$
If we now put w := Ra+b (z) + 1/z, then we have z = Ga+b (w) and we can continue
Equation (2.29) as:
$$G_{a+b}(w) = z = G_a[R_a(z) + 1/z] = G_a[w - R_b(z)] = G_a\bigl(w - R_b[G_{a+b}(w)]\bigr).$$
We have $\omega_a, \omega_b\in\mathbb C((1/z))$, so $G_a\circ\omega_a\in\mathbb C[[1/z]]$. These satisfy the subordination relations
$$G_{a+b}(z) = G_a[\omega_a(z)] = G_b[\omega_b(z)]. \qquad (2.31)$$
We say that Ga+b is subordinate to both Ga and Gb . The name comes from the theory
of univalent functions; see [65, Ch. 6] for a general discussion.
Exercise 14. Show that ωa (z) + ωb (z) − 1/Ga (ωa (z)) = z.
Exercise 15. Suppose we have formal Laurent series $\omega_a(z)$ and $\omega_b(z)$ in $\frac1z$ such that
$$G_a(\omega_a(z)) = G_b(\omega_b(z)) \quad\text{and}\quad \omega_a(z) + \omega_b(z) - 1/G_a(\omega_a(z)) = z. \qquad (2.32)$$
Let $G$ be the formal power series $G(z) = G_a(\omega_a(z))$ and $R(z) = G^{\langle-1\rangle}(z) - z^{-1}$ ($G^{\langle-1\rangle}$ denotes here the inverse under composition of $G$). By replacing $z$ by $G^{\langle-1\rangle}(z)$ in the second equation of (2.32) show that $R(z) = R_a(z) + R_b(z)$. These equations can thus be used to define the distribution of the sum of two free random variables.
At the moment these are identities on the level of formal power series. In the next
chapter, we will elaborate on their interpretation as identities of analytic functions,
see Theorem 3.43.
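As a concrete example (an added numerical sketch, assuming numpy is available), take $a$ and $b$ free standard semi-circular elements. Then $R_b(w) = w$, so $\omega_a(z) = z - R_b(G_{a+b}(z)) = z - G_{a+b}(z)$, and $a+b$ is semi-circular with variance 2; both relations in (2.32) can be checked pointwise:

```python
# Subordination for the sum of two free standard semi-circular elements.
import numpy as np

def G_semi(z, var):
    # Cauchy transform of the semi-circle law of variance `var`; the product
    # of square roots picks the branch with G(z) ~ 1/z at infinity on C^+
    r = 2 * np.sqrt(var)
    return (z - np.sqrt(z - r) * np.sqrt(z + r)) / (2 * var)

for z in [1.0 + 1.0j, -0.7 + 0.3j, 2.5 + 0.1j]:
    G_sum = G_semi(z, 2.0)          # G_{a+b}: semi-circle of variance 2
    omega = z - G_sum               # omega_a(z) = z - R_b(G_{a+b}(z)), R_b(w) = w
    print(abs(G_semi(omega, 1.0) - G_sum),              # G_a(omega_a(z)) = G_{a+b}(z)
          abs(2 * omega - 1 / G_semi(omega, 1.0) - z))  # second relation in (2.32)
```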
and by
$$\partial_x 1 = 0, \qquad \partial_x x = 1\otimes1.$$
This means that it is given more explicitly as the linear extension of
$$\partial_x x^n = \sum_{k=0}^{n-1} x^k\otimes x^{n-1-k}. \qquad (2.33)$$
We can also (and will) extend this definition from polynomials to infinite formal
power series.
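To make the definition concrete, here is a tiny implementation (our addition) of $\partial_x$ on one-variable polynomials, with a dictionary {n: c_n} for $\sum_n c_n x^n$ and keys (k, l) standing for $x^k\otimes x^l$:

```python
# The non-commutative derivative (2.33) on polynomials in one variable x.
def d_x(poly):
    """partial_x applied to a polynomial {n: c_n}; returns {(k, l): coeff}
    with (k, l) standing for x^k (tensor) x^l."""
    out = {}
    for n, c in poly.items():
        for k in range(n):                 # x^n -> sum_k x^k (x) x^{n-1-k}
            key = (k, n - 1 - k)
            out[key] = out.get(key, 0) + c
    return out

print(d_x({3: 1}))        # x^3 |-> 1(x)x^2 + x(x)x + x^2(x)1
print(d_x({0: 5, 1: 2}))  # constants die, x |-> 2 * 1(x)1
```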
Exercise 16. (i) Let, for some $z\in\mathbb C$ with $z\neq0$, $f$ be the formal power series
$$f(x) = \frac{1}{z-x} = \sum_{n=0}^{\infty}\frac{x^n}{z^{n+1}}.$$
∂x x = 1 ⊗ 1, ∂x y = 0, ∂x 1 = 0.
For a monomial $x_{i_1}\cdots x_{i_n}$ in $x$ and $y$ (where we put $x_1 := x$ and $x_2 := y$) this means explicitly
$$\partial_x\, x_{i_1}\cdots x_{i_n} = \sum_{k=1}^{n}\delta_{1i_k}\, x_{i_1}\cdots x_{i_{k-1}}\otimes x_{i_{k+1}}\cdots x_{i_n}. \qquad (2.34)$$
Again it is clear that we can extend this definition also to formal power series in
non-commuting variables.
Let us note that we may define the derivation $\partial_{x+y}$ on $\mathbb C\langle x+y\rangle$ exactly as we did $\partial_x$. Namely $\partial_{x+y}(1) = 0$ and $\partial_{x+y}(x+y) = 1\otimes1$. Note that $\partial_{x+y}$ can be extended to all of $\mathbb C\langle x,y\rangle$, but not in a unique way unless we specify another basis element. Since $\mathbb C\langle x+y\rangle\subset\mathbb C\langle x,y\rangle$ we may apply $\partial_x$ to $\mathbb C\langle x+y\rangle$ and observe that $\partial_x(x+y) = 1\otimes1 = \partial_{x+y}(x+y)$. Thus
$$\partial_x(x+y)^n = \sum_{k=1}^{n}(x+y)^{k-1}\otimes(x+y)^{n-k} = \partial_{x+y}(x+y)^n.$$
Hence
$$\partial_x|_{\mathbb C\langle x+y\rangle} = \partial_{x+y}. \qquad (2.35)$$
If we are given a polynomial $p(x,y)\in\mathbb C\langle x,y\rangle$, then we will also consider $E_x[p(x,y)]$, the conditional expectation of $p(x,y)$ onto a function of just the variable $x$, which should be the best approximation to $p$ among such functions. There is no algebraic way of specifying what best approximation means; we need a state $\varphi$ on the $*$-algebra generated by self-adjoint elements $x$ and $y$ for this. Given such a state, we will require that the difference between $p(x,y)$ and $E_x[p(x,y)]$ cannot be detected by functions of $x$ alone; more precisely, we ask that
$$\varphi\bigl(q(x)\cdot E_x[p(x,y)]\bigr) = \varphi\bigl(q(x)\cdot p(x,y)\bigr) \qquad (2.36)$$
for all $q\in\mathbb C\langle x\rangle$. If we are going from the polynomials $\mathbb C\langle x,y\rangle$ over to the Hilbert space completion $L^2(x,y,\varphi)$ with respect to the inner product given by $\langle f,g\rangle := \varphi(g^*f)$, then this amounts just to an orthogonal projection from the space $L^2(x,y,\varphi)$ onto the subspace $L^2(x,\varphi)$ generated by polynomials in the variable $x$. (Let us assume that $\varphi$ is positive and faithful so that we get an inner product.) Thus, on the Hilbert space level the existence and uniqueness of $E_x[p(x,y)]$ is clear. In general, though, it might not be the case that the projection of a polynomial in $x$ and $y$ is a polynomial in $x$ – it will just be an $L^2$-function. If we assume, however, that $x$ and $y$ are free, then we claim that this projection maps polynomials to polynomials. In fact, for this construction to work at the algebraic level we only need assume that $\varphi|_{\mathbb C\langle x\rangle}$ is non-degenerate, as this shows that $E_x$ is well defined by (2.36). It is clear from Equation (2.36) that $\varphi(E_x(a)) = \varphi(a)$ for all $a\in\mathbb C\langle x,y\rangle$.
Let us consider some examples. Assume that $x$ and $y$ are free. Then it is clear that we have
$$E_x[x^ny^m] = x^n\varphi(y^m)$$
and more generally
$$E_x[x^{n_1}y^mx^{n_2}] = x^{n_1+n_2}\varphi(y^m).$$
It is not so clear what Ex [yxyx] might be. Before giving the general rule let us make
some simple observations.
Exercise 17. Let $\mathcal A_1 = \mathbb C\langle x\rangle$ and $\mathcal A_2 = \mathbb C\langle y\rangle$ with $x$ and $y$ free and $\varphi|_{\mathcal A_1}$ non-degenerate.
(i) Show that $E_x[\mathring{\mathcal A}_2] = 0$.
(ii) For $\alpha_1,\dots,\alpha_n\in\{1,2\}$ with $\alpha_1\neq\cdots\neq\alpha_n$ and $n\ge2$, show that $E_x[\mathring{\mathcal A}_{\alpha_1}\cdots\mathring{\mathcal A}_{\alpha_n}] = 0$.
Exercise 18. Let $\mathcal A_1$ and $\mathcal A_2$ be as in Exercise 17. Since $\mathcal A_1$ and $\mathcal A_2$ are free we can use Equation (1.12) from Exercise 1.9 to write
$$\mathcal A_1\vee\mathcal A_2 = \mathcal A_1\oplus\mathring{\mathcal A}_2\oplus\sum_{n\ge2}{}^{\oplus}\ \sum_{\alpha_1\neq\cdots\neq\alpha_n}{}^{\oplus}\ \mathring{\mathcal A}_{\alpha_1}\mathring{\mathcal A}_{\alpha_2}\cdots\mathring{\mathcal A}_{\alpha_n}.$$
We have just shown that if Ex is a linear map satisfying Equation (2.36) then Ex is
the identity on the first summand and 0 on all remaining summands. Show that by
defining Ex this way we get the existence of a linear mapping from A1 ∨ A2 to A1
satisfying Equation (2.36). An easy consequence of this is that for $q_1(x), q_2(x)\in\mathbb C\langle x\rangle$ and $p(x,y)\in\mathbb C\langle x,y\rangle$ we have $E_x[q_1(x)p(x,y)q_2(x)] = q_1(x)\,E_x[p(x,y)]\,q_2(x)$.
Let $a_1 = y^{n_1}$, $a_2 = y^{n_2}$ and $b = x^{m_1}$. To compute $E_x(y^{n_1}x^{m_1}y^{n_2})$ we follow the same centring procedure used to compute $\varphi(a_1ba_2)$ in Section 1.12. From Exercise 17 we see that
Thus
$$E_x[y^{n_1}x^{m_1}y^{n_2}x^{m_2}] = \varphi(y^{n_1+n_2})\varphi(x^{m_1})x^{m_2} + \varphi(y^{n_1})x^{m_1}\varphi(y^{n_2})x^{m_2} - \varphi(y^{n_1})\varphi(x^{m_1})\varphi(y^{n_2})x^{m_2}.$$
The following theorem (essentially in the work [34] of Biane) gives the gen-
eral recipe for calculating such expectations. As usual the formulas are simplified
by using cumulants. To give the rule we need the following bit of notation. Given
σ ∈ P(n) and a1 , . . . , an ∈ A we define ϕ̃σ (a1 , . . . , an ) in the same way as ϕσ in
Equation (2.11) except we do not apply ϕ to the last block, i.e. the block con-
taining n. For example if σ = {(1, 3, 4), (2, 6), (5)} then ϕ̃σ (a1 , a2 , a3 , a4 , a5 , a6 ) =
$\varphi(a_1a_3a_4)\varphi(a_5)\,a_2a_6$. More explicitly, for $\sigma = \{V_1,\dots,V_s\}\in NC(r)$ with $r\in V_s$ we put
$$\tilde\varphi_\sigma(a_1,\dots,a_r) = \varphi\Bigl(\prod_{i_1\in V_1}a_{i_1}\Bigr)\cdots\varphi\Bigl(\prod_{i_{s-1}\in V_{s-1}}a_{i_{s-1}}\Bigr)\cdot\prod_{i_s\in V_s}a_{i_s}.$$
$$E_x[y^{n_1}x^{m_1}\cdots y^{n_r}x^{m_r}] = \sum_{\pi\in NC(r)}\kappa_\pi(y^{n_1},\dots,y^{n_r})\cdot\tilde\varphi_{K(\pi)}(x^{m_1},\dots,x^{m_r}). \qquad (2.37)$$
Let us check that this agrees with our previous calculation of $E_x[y^{n_1}x^{m_1}y^{n_2}x^{m_2}]$.
Prove Theorem 19 by showing that with the expression given in (2.37) one has for all $m\ge0$
Exercise 20. Use the method of Exercise 19 to work out $E_x[x^{m_1}y^{n_1}\cdots x^{m_r}y^{n_r}]$.
By linear extension of Equation (2.37) one can thus get the projection onto one
variable x of any non-commutative polynomial or formal power series in two free
variables x and y. We now want to identify the projection of resolvents in x + y. To
achieve this we need a crucial intertwining relation between the partial derivative
and the conditional expectation.
Lemma 20. Suppose $\varphi$ is a state on $\mathbb C\langle x,y\rangle$ such that $x$ and $y$ are free and $\varphi|_{\mathbb C\langle x\rangle}$ is non-degenerate. Then
Proof: We let $\mathcal A_1 = \mathbb C\langle x\rangle$ and $\mathcal A_2 = \mathbb C\langle y\rangle$. We use the decomposition from Exercise 1.9
$$\mathcal A_1\vee\mathcal A_2\ominus\mathcal A_1 = \mathring{\mathcal A}_2\oplus\sum_{n\ge2}{}^{\oplus}\ \sum_{\alpha_1\neq\cdots\neq\alpha_n}{}^{\oplus}\ \mathring{\mathcal A}_{\alpha_1}\cdots\mathring{\mathcal A}_{\alpha_n}$$
and obtain
$$E_x\otimes E_x\circ\partial_x\bigl(\mathring{\mathcal A}_{\alpha_1}\cdots\mathring{\mathcal A}_{\alpha_n}\bigr) \subseteq \sum_{k=1}^{n}\delta_{1,\alpha_k}\,E_x\bigl(\mathring{\mathcal A}_{\alpha_1}\cdots\mathring{\mathcal A}_{\alpha_{k-1}}(\mathbb C1\oplus\mathring{\mathcal A}_{\alpha_k})\bigr)\otimes E_x\bigl((\mathbb C1\oplus\mathring{\mathcal A}_{\alpha_k})\mathring{\mathcal A}_{\alpha_{k+1}}\cdots\mathring{\mathcal A}_{\alpha_n}\bigr).$$
Theorem 21. Let $x$ and $y$ be free. For every $z\in\mathbb C$ with $z\neq0$ there exists a $w\in\mathbb C$ such that
$$E_x\Bigl[\frac{1}{z-(x+y)}\Bigr] = \frac{1}{w-x}. \qquad (2.39)$$
In other words, the best approximation for a resolvent in $x+y$ by a function of $x$ is again a resolvent.
By applying the state ϕ to both sides of (2.39) one obtains the subordination for
the Cauchy transforms, and thus it is clear that the w from above must agree with
the subordination function from (2.31), w = ω(z).
Proof: We put
$$f(x,y) := \frac{1}{z-(x+y)}.$$
By Exercise 16, part (i), we know that $\partial_{x+y}f = f\otimes f$. By Lemma 20 we have that for functions $g$ of $x+y$
$$\partial_x E_x[f] = E_x\otimes E_x[\partial_{x+y}f] = E_x\otimes E_x[f\otimes f] = E_x[f]\otimes E_x[f].$$
Thus, by the second part of Exercise 16, we know that $E_x[f]$ is a resolvent in $x$ and we are done.
Chapter 3
Free Harmonic Analysis
can then consider the relation between the formal power series G obtained from the
moment generating function and the analytic function G obtained from the spectral
measure. It turns out that on the exterior of a disc containing the support of ν, the
formal power series converges to the analytic function, and the R-transform becomes
an analytic function on an open set containing 0 whose power series expansion is
the formal power series $\sum_{n\ge1}\kappa_n z^{n-1}$ given in the previous chapter.
When ν does not have all moments there is no formal power series; this corre-
sponds to a being an unbounded self-adjoint operator affiliated with A. However, the
Cauchy transform is always defined. Moreover, one can construct the R-transform
of ν, analytic on some open set, satisfying equation (3.1) — although there may not
be any free cumulants if ν has no moments. However if ν does have moments then
the R-transform has cumulants given by an asymptotic expansion at 0.
If X and Y are classically independent random variables with distributions νX and νY then the distribution of X + Y is the convolution, νX ∗ νY. We shall construct the free analogue, νX ⊞ νY, of the classical convolution. νX ⊞ νY is called the free additive convolution of νX and νY; it is the distribution of the sum X + Y when X
and Y are freely independent. Since X and Y do not commute we cannot do this
with functions as in the classical case. We shall do this on the level of probability
measures.
We shall ultimately show that the R-transform exists for all probability measures.
However, we shall first do this for compactly supported probability measures, then
for probability measures with finite variance, and finally for arbitrary probability
measures. This follows more or less the historical development. The compactly sup-
ported case was treated in [177] by Voiculescu. The case of finite variance was then
treated by Maassen in [120]; this was an important intermediate step, as it promoted
the use of the reciprocal Cauchy transform F = 1/G and of the subordination func-
tion. The general case was then first treated by Bercovici and Voiculescu in [31] by
operator algebraic methods; however, more recent alternative approaches, by Be-
linschi and Bercovici [21, 18] and by Chistyakov and Götze [54, 53], rely on the
subordination formulation. Since this subordination approach seems to be analyti-
cally better controllable than the R-transform, and also best suited for generaliza-
tions to the operator-valued case (see Chapter 10, in particular Section 10.4)), we
will concentrate in our presentation on this approach and try to give a streamlined
and self-contained presentation.
of z. Show that

Re(√z) = √( (√(u² + v²) + u)/2 )  and  Im(√z) = √( (√(u² + v²) − u)/2 ).
Exercise 4. In this exercise we shall compute the Cauchy transform of the arc-sine law using contour integration. Recall that the density of the arc-sine law on the interval [−2, 2] is given by dν(t) = 1/(π√(4 − t²)) dt. Let

G(z) = (1/π) ∫_{−2}^{2} (z − t)^{−1}/√(4 − t²) dt.
(i) Make the substitution t = 2 cos θ for 0 ≤ θ ≤ π. Show that

G(z) = (1/2π) ∫_0^{2π} (z − 2 cos θ)^{−1} dθ.
(ii) Make the substitution w = e^{iθ} and show that we can write G as the contour integral

G(z) = (1/2πi) ∫_Γ (zw − w² − 1)^{−1} dw

where Γ = {w ∈ C | |w| = 1}.
(iii) Show that the roots of zw − w² − 1 = 0 are w1 = (z − √(z² − 4))/2 and w2 = (z + √(z² − 4))/2, and that w1 ∈ int(Γ) and w2 ∉ int(Γ), using the branch defined above.
(iv) Using the residue calculus show that G(z) = 1/√(z² − 4).
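These contour-integral results are easy to check numerically. The following small Python sketch (our own illustration, not part of the exercises; it assumes numpy and scipy are available) compares the integral form after the substitution of part (i) with the closed form 1/√(z² − 4), using the branch √(z − 2)√(z + 2) of Exercise 3:

```python
# Numerical sanity check for Exercise 4 (an illustration, not from the text).
import numpy as np
from scipy.integrate import quad

def G_integral(z):
    # after t = 2*cos(theta): G(z) = (1/(2*pi)) * int_0^{2pi} dtheta / (z - 2*cos(theta))
    f = lambda th: 1.0 / (z - 2 * np.cos(th))
    re = quad(lambda th: f(th).real, 0, 2 * np.pi)[0]
    im = quad(lambda th: f(th).imag, 0, 2 * np.pi)[0]
    return (re + 1j * im) / (2 * np.pi)

def G_closed(z):
    # the branch sqrt(z-2)*sqrt(z+2) defined in Exercise 3
    return 1.0 / (np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j))

z = 0.7 + 0.3j
print(G_integral(z))   # both print the same complex number
print(G_closed(z))
```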
Exercise 5. In this exercise we shall compute the Cauchy transform of the semi-circle law using contour integration. Recall that the density of the semi-circle law on the interval [−2, 2] is given by dν(t) = (2π)^{−1}√(4 − t²) dt. Let

G(z) = (1/2π) ∫_{−2}^{2} √(4 − t²)/(z − t) dt.
(i) Make the substitution t = 2 cos θ for 0 ≤ θ ≤ π. Show that

G(z) = (1/4π) ∫_0^{2π} 4 sin²θ/(z − 2 cos θ) dθ.

(ii) Make the substitution w = e^{iθ} and show that we can write G as the contour integral

G(z) = (1/4πi) ∫_Γ (w² − 1)²/(w²(w² − zw + 1)) dw.
Exercise 6. In this exercise we shall compute the Cauchy transform of the Marchenko-Pastur law with parameter c using contour integration. We shall start by supposing that c > 1. Recall that the density of the Marchenko-Pastur law on the interval [a, b] is given by dνc(t) = √((b − t)(t − a))/(2πt) dt with a = (1 − √c)² and b = (1 + √c)². Let

G(z) = ∫_a^b √((b − t)(t − a))/(2πt(z − t)) dt.
(i) Make the substitution t = 1 + 2√c cos θ + c for 0 ≤ θ ≤ π. Show that

G(z) = (1/4π) ∫_0^{2π} 4c sin²θ/((1 + 2√c cos θ + c)(z − 1 − 2√c cos θ − c)) dθ.

(ii) Make the substitution w = e^{iθ} and show that we can write G as the contour integral

G(z) = (1/4πi) ∫_Γ (w² − 1)²/(w(w² + f w + 1)(w² − ew + 1)) dw

where Γ = {w ∈ C | |w| = 1}, f = (1 + c)/√c, and e = (z − (1 + c))/√c.
(iii) Using the results from Exercise 3 and the residue calculus, show that

G(z) = (z + 1 − c − √((z − a)(z − b)))/(2z),  (3.3)

using the branch defined in the same way as with √(z² − 4) above, except that a replaces −2 and b replaces 2.
Proof: We have

y Im(G(iy)) = y ∫_R Im(1/(iy − t)) dν(t) = ∫_R −y²/(y² + t²) dν(t) = − ∫_R 1/(1 + (t/y)²) dν(t) → − ∫_R dν(t) = −1

as y → ∞. Moreover |G(x + iy)| ≤ ∫_R |x + iy − t|^{−1} dν(t) ≤ 1/y, thus sup_{y>0, x∈R} y|G(x + iy)| ≤ 1. By the first part, however, the supremum is 1.
Another frequently used notation is to let m(z) = ∫_R (t − z)^{−1} dν(t); note that m(z) = −G(z).
Notation 4 Let us recall the Poisson kernel from harmonic analysis. Let

P(t) = (1/π) · 1/(1 + t²) and Pε(t) = ε^{−1}P(tε^{−1}) = (1/π) · ε/(t² + ε²) for ε > 0.

If ν1 and ν2 are two probability measures on R recall that their convolution is defined by ν1 ∗ ν2(E) = ∫_{−∞}^{∞} ν1(E − t) dν2(t) (see Rudin [151, Ex. 8.5]). If ν is a probability measure on R and f ∈ L¹(R, ν) we can define f ∗ ν by f ∗ ν(t) = ∫_{−∞}^{∞} f(t − s) dν(s). Since P is bounded we can form Pε ∗ ν for any probability measure ν and any ε > 0. Moreover Pε is the density of a probability measure, namely a Cauchy distribution with scale parameter ε. We shall denote this distribution by δ_{−iε}.
Proof: We have

Im(G(x + iy)) = ∫_R Im(1/(x − t + iy)) dν(t) = ∫_R −y/((x − t)² + y²) dν(t).

Thus

∫_a^b Im(G(x + iy)) dx = ∫_R ∫_a^b −y/((x − t)² + y²) dx dν(t)
= − ∫_R ∫_{(a−t)/y}^{(b−t)/y} 1/(1 + x̃²) dx̃ dν(t)
= − ∫_R ( tan^{−1}((b − t)/y) − tan^{−1}((a − t)/y) ) dν(t),

where we have substituted x̃ = (x − t)/y. Let f(y, t) := tan^{−1}((b − t)/y) − tan^{−1}((a − t)/y), and let f(t) = π for t ∈ (a, b), f(t) = π/2 for t ∈ {a, b}, and f(t) = 0 otherwise. Then lim_{y→0⁺} f(y, t) = f(t), and, for all y > 0 and for all t, we have |f(y, t)| ≤ π. So by Lebesgue's dominated convergence theorem

lim_{y→0⁺} ∫_a^b Im(G(x + iy)) dx = − lim_{y→0⁺} ∫_R f(y, t) dν(t) = − ∫_R f(t) dν(t) = −π( ν((a, b)) + ½ ν({a, b}) ).
This proves the first claim.
Now assume that G_{ν1} = G_{ν2}. This implies, by the formula just proved, that ν1((a, b)) = ν2((a, b)) for all a and b which are atoms neither of ν1 nor of ν2. Since there are only countably many atoms of ν1 and ν2, we can write any interval (a, b) in the form (a, b) = ∪_{n=1}^{∞} (a + εn, b − εn) for a decreasing sequence εn → 0⁺ such that all a + εn and all b − εn are atoms neither of ν1 nor of ν2. But then we get

ν1((a, b)) = lim_n ν1((a + εn, b − εn)) = lim_n ν2((a + εn, b − εn)) = ν2((a, b)).

This shows that ν1 and ν2 agree on all open intervals and thus are equal.
As an example let us apply this to the semi-circle distribution

dν(t) = (√(4 − t²)/(2π)) dt on [−2, 2];

and the moments are given by

mn = ∫_{−2}^{2} t^n dν(t) = 0 for n odd, and mn = C_{n/2} for n even,

where the Cn's are the Catalan numbers:

Cn = (1/(n + 1)) (2n choose n).

With M(z) = ∑_{n≥0} mn z^n = ∑_{k≥0} Ck z^{2k} the Catalan recurrence gives

M(z)² = ∑_{k≥0} C_{k+1} z^{2k} = z^{−2} ∑_{k≥0} C_{k+1} z^{2(k+1)}
and therefore z²M(z)² = M(z) − 1. Since G(z) = M(1/z)/z this means G(z)² − zG(z) + 1 = 0, so that G(z) = (z − √(z² − 4))/2, with the branch of √(z² − 4) fixed as in Exercise 3 (so that G(z) → 0 as z → ∞). Now

lim_{y→0⁺} Im √((x + iy)² − 4) = |x² − 4|^{1/2} · 0 = 0 for |x| > 2, and = |x² − 4|^{1/2} · 1 = √(4 − x²) for |x| ≤ 2,
and thus

lim_{y→0⁺} Im(G(x + iy)) = lim_{y→0⁺} Im( (x + iy − √((x + iy)² − 4))/2 ) = 0 for |x| > 2, and = −√(4 − x²)/2 for |x| ≤ 2.

Therefore

−(1/π) lim_{y→0⁺} Im(G(x + iy)) = 0 for |x| > 2, and = √(4 − x²)/(2π) for |x| ≤ 2.
Hence we recover our original density.
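This Stieltjes inversion can also be carried out numerically: for small ε > 0 the function −Im G(x + iε)/π approximates the density. A minimal Python sketch (our own illustration, assuming numpy):

```python
# Stieltjes inversion for the semi-circle law (an illustration, not from the text).
import numpy as np

def G(z):
    # G(z) = (z - sqrt(z^2 - 4))/2, with the branch sqrt(z-2)*sqrt(z+2)
    return (z - np.sqrt(z - 2 + 0j) * np.sqrt(z + 2 + 0j)) / 2

xs = np.linspace(-3, 3, 601)
eps = 1e-6
density = -G(xs + 1j * eps).imag / np.pi
exact = np.where(np.abs(xs) <= 2, np.sqrt(np.maximum(4 - xs**2, 0)) / (2 * np.pi), 0.0)
print(np.max(np.abs(density - exact)))   # small; the error is largest near the edges +-2
```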
For z = x + iy with |x − a| < θy we have

|(z − a)/(z − t)|² = ((x − a)² + y²)/((x − t)² + y²) = (1 + ((x − a)/y)²)/(1 + ((x − t)/y)²) ≤ 1 + ((x − a)/y)² < 1 + θ².

We have |(z − a)/(z − t)| → 0 as z → a for all t ≠ a. Since {a} is a set of σ measure 0, we may apply the dominated convergence theorem to conclude that indeed lim_{∢z→a}(z − a)G(z) = m.
Let f(z) = (z − a)G(z). If f has an analytic extension to a neighbourhood of a, then G has a meromorphic extension to a neighbourhood of a. If m = lim_{∢z→a} f(z) > 0 then G has a simple pole at a with residue m and ν has an atom of mass m at a. If m = 0 then G has an analytic extension to a neighbourhood of a.
Let us illustrate this with the example of the Marchenko-Pastur distribution with parameter c (see the discussion following Exercise 2.9). In that case we have G(z) = (z + 1 − c − √((z − a)(z − b)))/(2z); recall that a = (1 − √c)² and b = (1 + √c)². If we write this as f(z)/z with f(z) = (z + 1 − c − √((z − a)(z − b)))/2 then we may (using the convention of Exercise 6 (ii)) extend f to be analytic on {z | Re(z) < a} by choosing π/2 < θ1, θ2 < 3π/2. With this convention we have f(0) = 1 − c when c < 1 and f(0) = 0 when c > 1. Note that this is exactly the weight of the atom at 0.
For many probability measures arising in free probability G has a meromorphic
extension to a neighbourhood of a given point a. This is due to two results. The
first is a theorem of Greenstein [80, Thm. 1.2] which states that G can be continued
analytically to an open set containing the interval (a, b) if and only if the restriction
of ν to (a, b) is absolutely continuous with respect to Lebesgue measure and that
the density is real analytic. The second is a theorem of Belinschi [19, Thm. 4.1]
which states that the free additive convolution (see §3.5) of two probability measures
(provided neither is a Dirac mass) has no continuous singular part and the density
is real analytic whenever positive and finite. This means that for such measures G
has a meromorphic extension to a neighbourhood of every point where the density
is positive on some open set containing the point.
This integral representation is achieved by mapping the upper half-plane to the open unit disc D, via ξ = (iz + 1)/(iz − 1), and then defining ψ on D by ψ(ξ) = −iϕ(z) = −iϕ(i(1 + ξ)/(1 − ξ)), thus obtaining an analytic function ψ mapping the open unit disc D into the complex right half-plane. In the disc version of the problem we must find a real number β′ and a positive measure σ′ on ∂D = [0, 2π] such that

ψ(z) = iβ′ + ∫_0^{2π} (e^{it} + z)/(e^{it} − z) dσ′(t).

The measure σ′ is then obtained as a limit using the Helly selection principle (see e.g. Lukacs [119, Thm. 3.5.1]). This representation is usually attributed to Herglotz. The details can be found in Akhiezer and Glazman [4, Ch. VI, §59], Rudin [151, Thm. 11.9], or Hoffman [99, p. 34].
The next theorem answers the question as to which analytic functions from C⁺ to C⁻ are the Cauchy transform of a positive Borel measure.

Theorem 10. Suppose G : C⁺ → C⁻ is analytic and lim sup_{y→∞} y|G(iy)| = c < ∞. Then there is a unique positive Borel measure ν on R such that G(z) = ∫_R (z − t)^{−1} dν(t), and ν(R) = c.

Proof: By the remark above, applied to −G, there is a unique finite positive measure σ on R such that G(z) = α + βz + ∫ (1 + tz)/(z − t) dσ(t) with α ∈ R and β ≤ 0.
Considering first the real part of iyG(iy) we get that for all y > 0 large enough

2c ≥ Re(iyG(iy)) = y²( −β + ∫ (1 + t²)/(y² + t²) dσ(t) ).

Since both −β and ∫ (1 + t²)/(y² + t²) dσ(t) are non-negative, the right-hand side can only stay bounded if β = 0. Thus for all y > 0 sufficiently large

∫ (1 + t²)/(1 + (t/y)²) dσ(t) ≤ 2c,

and letting y → ∞ we conclude by monotone convergence that ∫ (1 + t²) dσ(t) ≤ 2c; in particular σ has a second moment.
From the imaginary part of iyG(iy) we get that for all y > 0 sufficiently large

| y( α + ∫_R t(y² − 1)/(t² + y²) dσ(t) ) | ≤ 2c.

Since |(y² − 1)/(t² + y²)| ≤ 1 for y ≥ 1 and since σ has a second (and hence also a first) moment we can apply the dominated convergence theorem and conclude that

α = − lim_{y→∞} ∫_R t (1 − y^{−2})/(1 + (t/y)²) dσ(t) = − ∫_R t dσ(t).
Hence

G(z) = ∫_R ( −t + (1 + tz)/(z − t) ) dσ(t) = ∫_R (1 + t²)/(z − t) dσ(t) = ∫_R 1/(z − t) dν(t),

where we have put ν(E) := ∫_E (1 + t²) dσ(t). This ν is a finite measure since σ has a second moment.
Remark 11. Recall that in Definition 2.11 we defined the Marchenko-Pastur law via the density νc on R. We then showed in Exercise 2.11 that the free cumulants of νc are given by κn = c for all n ≥ 1. We can also approach the Marchenko-Pastur distribution from the other direction; namely start with the free cumulants and derive the density using Theorems 6 and 10.
If we assume that κn = c for all n ≥ 1 and 0 < c < ∞, then R(z) = c/(1 − z) and so by the reverse of equation (2.27)

1/G(z) + R(G(z)) = z.  (3.4)
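With R(z) = c/(1 − z), equation (3.4) becomes the quadratic equation zG(z)² − (z + 1 − c)G(z) + 1 = 0 for G; picking the root which maps C⁺ to C⁻ and applying Stieltjes inversion recovers the Marchenko-Pastur density. A numerical sketch of this (our own illustration, assuming numpy):

```python
# Marchenko-Pastur density from the R-transform via (3.4) (an illustration).
import numpy as np

c = 2.0
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

def G(z):
    # roots of z*G^2 - (z+1-c)*G + 1 = 0; keep the root with Im G < 0 on C^+
    s = np.sqrt((z + 1 - c)**2 - 4 * z)
    r1, r2 = ((z + 1 - c) + s) / (2 * z), ((z + 1 - c) - s) / (2 * z)
    return np.where(r1.imag < 0, r1, r2)

xs = np.linspace(a + 0.01, b - 0.01, 400)
dens = -G(xs + 1e-8j).imag / np.pi
exact = np.sqrt((b - xs) * (xs - a)) / (2 * np.pi * xs)
print(np.max(np.abs(dens - exact)))   # small
```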
Remark 12. If {νn}n is a sequence of finite Borel measures on R we say that {νn}n converges weakly to the measure ν if for every f ∈ Cb(R) (the continuous bounded functions on R) we have limn ∫ f(t) dνn(t) = ∫ f(t) dν(t). We say that {νn}n converges vaguely to ν if for every f ∈ C0(R) (the continuous functions on R vanishing at infinity) we have limn ∫ f(t) dνn(t) = ∫ f(t) dν(t). Weak convergence implies vague convergence but not conversely. However if all νn and ν are probability measures then the vague convergence of {νn}n to ν does imply that {νn}n converges weakly to ν.
G(z) = ∫ (z − t)^{−1} dν̃(t), i.e. ν = ν̃. Thus {νn_k}k converges vaguely to ν. Since ν is a probability measure, the convergence is then even weak.
Exercise 10. Identify νn and ν for the sequence of Cauchy transforms which are
given by Gn (z) = 1/(z − n).
[Fig. 3.2: the region Γα,β, bounded by the lines |x| = αy and y = β.]

Notation 15 Let α > 0 and let Γα = {x + iy | αy > |x|} and for β > 0 let Γα,β = {z ∈ Γα | Im(z) > β}. See Fig. 3.2. Note that for z ∈ C⁺ we have z ∈ Γα if and only if √(1 + α²) Im(z) > |z|.
be its Nevanlinna representation with a real and b ≥ 0. Then for all α > 0 we have
limz→∞ F(z)/z = b for z ∈ Γα .
Exercise 15. Suppose that α > 0 and ν is a probability measure on R and that for some n > 0 there are real numbers α1, α2, . . . , α2n such that as z → ∞ in Γα

lim_{z→∞} z^{2n+1}( G(z) − (1/z + α1/z² + · · · + α2n/z^{2n+1}) ) = 0.

Show that ν has a moment of order 2n, i.e. ∫_R t^{2n} dν(t) < ∞, and that α1, α2, . . . , α2n are the first 2n moments of ν.
Consider now first the case that ν is a compactly supported probability measure
on R. Then ν has moments of all orders. We will show that the Cauchy transform of
ν is univalent on the exterior of a circle centred at the origin. We can then solve the
equation G(R(z) + 1/z) = z for R(z) to obtain a function R, analytic on the interior
of a disc centred at the origin and with power series given by the free cumulants of
ν. The precise statements are given in the next theorem.
Proof: Let {αn}n be the moments of ν and let α0 = 1. Note that |αn| ≤ ∫ |t|^n dν(t) ≤ r^n. Let

f(z) = G(1/z) = ∫ z/(1 − tz) dν(t).
For |z| < 1/r and t ∈ supp(ν), |zt| < 1 and the series ∑(zt)n converges uniformly
on supp(ν) and thus ∑n≥0 αn zn+1 converges uniformly to f (z) on compact subsets
of {z | |z| < 1/r}. Hence ∑n≥0 αn z−(n+1) converges uniformly to G(z) on compact
subsets of {z | |z| > r}.
Suppose |z1|, |z2| < r^{−1}. Then

|f(z1) − f(z2)|/|z1 − z2| ≥ Re( (f(z1) − f(z2))/(z1 − z2) ) = Re ∫_0^1 (d/dt) f(z1 + t(z2 − z1))/(z2 − z1) dt = ∫_0^1 Re f′(z1 + t(z2 − z1)) dt.

And for |z| < (4r)^{−1}

Re(f′(z)) ≥ 2 − 1/(1 − 1/4)² = 2/9.
Hence for |z1 | , |z2 | < (4r)−1 we have | f (z1 ) − f (z2 )| ≥ 2|z1 − z2 |/9. In particular, f
is univalent on {z | |z| < (4r)−1 }. Hence G is univalent on {z | |z| > 4r}. This proves
(i).
For any curve Γ in C and any w not on Γ let IndΓ(w) = (1/2πi) ∫_Γ (z − w)^{−1} dz be the index of w with respect to Γ (or the winding number of Γ around w). Now, as f(0) = 0, the only solution to f(z) = 0 for |z| < (4r)^{−1} is z = 0. Let Γ be the curve {z | |z| = (4r)^{−1}} and f(Γ) = {f(z) | z ∈ Γ} be the image of Γ under f. By the argument principle

Ind_{f(Γ)}(0) = (1/2πi) ∫_Γ f′(z)/f(z) dz = 1.
|f(z)| = |z| |1 + α1 z + α2 z² + · · · | ≥ |z| (2 − (1 + r|z| + r²|z|² + · · · )) = |z| (2 − 1/(1 − r|z|)) ≥ |z| (2 − 1/(1 − 1/4)) = (2/3)|z|.
Thus for |z| = (4r)−1 we have | f (z)| ≥ (6r)−1 . Hence f (Γ ) lies outside the circle
|z| = (6r)−1 and thus {z | |z| < (6r)−1 } is contained in the connected component of
C\ f (Γ ) containing 0. So for w ∈ {z | |z| < (6r)−1 }, Ind f (Γ ) (w) = Ind f (Γ ) (0) = 1, as
the index is constant on connected components of the complement of f (Γ ). Hence
1 = Ind_{f(Γ)}(w) = (1/2πi) ∫_Γ f′(z)/(f(z) − w) dz,

so again by the argument principle there is exactly one z with |z| < (4r)^{−1} such that f(z) = w. Hence

{z | |z| < (6r)^{−1}} ⊆ {f(z) | |z| < (4r)^{−1}}

and thus

{z | 0 < |z| < (6r)^{−1}} ⊂ {G(z) | |z| > 4r}.
This proves (ii).
Let f^{⟨−1⟩} be the inverse of f on {z | |z| < (6r)^{−1}}. Then f^{⟨−1⟩}(0) = 0 and (f^{⟨−1⟩})′(0) = 1/f′(0) = 1, so f^{⟨−1⟩} has a simple zero at 0. Let K be the meromorphic function on {z | |z| < (6r)^{−1}} given by K(z) = 1/f^{⟨−1⟩}(z). Then K has a simple pole at 0 with residue 1. Hence R(z) = K(z) − 1/z is holomorphic on {z | |z| < (6r)^{−1}}, and for 0 < |z| < (6r)^{−1}

G(R(z) + 1/z) = G(K(z)) = f(1/K(z)) = f(f^{⟨−1⟩}(z)) = z.
Thus

C(f(z)) = f(z)K(f(z)) = f(z)/f^{⟨−1⟩}(f(z)) = f(z)/z = M(z).  (3.5)

Write, for a given p,

C(z) = 1 + ∑_{l=1}^{p} κ̃l z^l + o(z^p) and (f(z))^l = ( ∑_{m=1}^{p} α_{m−1} z^m )^l + o(z^p).

Hence

C(f(z)) = 1 + ∑_{l=1}^{p} κ̃l ( ∑_{m=1}^{p} α_{m−1} z^m )^l + o(z^p).
However this is exactly the relation between {αn }n and {κn }n found at the end of
the proof of Proposition 2.17. Given {αn }n there are unique κn ’s that satisfy this
relation, so we must have κ̃n = κn for all n. This proves (iv).
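This relation between moments and free cumulants can be solved recursively. The following Python sketch (our own illustration; the recursion mn = ∑_{s=1}^{n} κs ∑_{i1+···+is=n−s} m_{i1} · · · m_{is} is the form of the relation we use here) recovers κ2 = 1 and κn = 0 otherwise from the Catalan moments of the semi-circle:

```python
# Free cumulants from moments by the recursion
#   m_n = sum_{s=1}^n kappa_s * sum_{i_1+...+i_s = n-s} m_{i_1}...m_{i_s}
# (an illustration; for the semi-circle: kappa_2 = 1, all other kappa_n = 0).
from math import comb

cat = lambda k: comb(2 * k, k) // (k + 1)
m = [cat(k // 2) if k % 2 == 0 else 0 for k in range(9)]   # m[0], ..., m[8]

def compositions(total, parts):
    if parts == 1:
        yield (total,)
    else:
        for first in range(total + 1):
            for rest in compositions(total - first, parts - 1):
                yield (first,) + rest

kappa = {}
for n in range(1, 9):
    known = 0
    for s in range(1, n):
        for comp in compositions(n - s, s):
            term = kappa[s]
            for i in comp:
                term *= m[i]
            known += term
    kappa[n] = m[n] - known      # the s = n term is kappa_n itself
print(kappa)                     # {1: 0, 2: 1, 3: 0, 4: 0, ...}
```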
has compact support. Note that for the Cauchy distribution R^{(k)}(0) = 0 for k ≥ 1 but R(0) is not real.
or equivalently

G(z) = 1/z + α1/z² + α2/z³ + o(1/z³),

and thus

G1(z) = z − 1/( 1/z + α1/z² + α2/z³ + o(1/z³) ) = α1 + (α2 − α1²)/z + o(1/z).  (3.6)
The next lemma shows that G1 maps C+ to C− . We shall work with the function
F = 1/G. It will be useful to establish some properties of F (Lemmas 19 and 20) and
then show that these properties characterize the reciprocals of Cauchy transforms of
measures of finite variance (Lemma 21).
Lemma 19. Let ν be a probability measure on R and G its Cauchy transform. Let
F(z) = 1/G(z). Then F maps C+ to C+ and Im(z) ≤ Im(F(z)) for z ∈ C+ , with
equality for some z only if ν is a Dirac mass.
Proof: We have

Im(F(z))/Im(z) = −Im(G(z))/(Im(z)|G(z)|²) = ( ∫_R |z − t|^{−2} dν(t) )/|G(z)|².

So our claim reduces to showing that |G(z)|² ≤ ∫ |z − t|^{−2} dν(t). However by the Cauchy-Schwarz inequality
| ∫ 1/(z − t) dν(t) |² ≤ ∫ 1² dν(t) · ∫ |1/(z − t)|² dν(t) = ∫ 1/|z − t|² dν(t),
with equality only if t 7→ (z − t)−1 is ν-almost constant, i.e. ν is a Dirac mass. This
completes the proof.
Lemma 20. Let ν be a probability measure with finite variance σ² and let G1(z) = z − 1/G(z), where G is the Cauchy transform of ν. Then there is a probability measure ν1 such that

G1(z) = α1 + σ² ∫ 1/(z − t) dν1(t),

where α1 is the mean of ν.
Lemma 21. Suppose that F : C+ → C+ is analytic and there is C > 0 such that for
z ∈ C+ , |F(z) − z| ≤ C/Im(z). Then there is a probability measure ν with mean 0
and variance σ 2 ≤ C such that 1/F is the Cauchy transform of ν. Moreover σ 2 is
the smallest C such that |F(z) − z| ≤ C/Im(z).
Hence limz→∞ zG(z) = 1 in any Stolz angle. Thus by Theorem 10 there is a proba-
bility measure ν such that G is the Cauchy transform of ν. Now
∫ y²t²/(y² + t²) dν(t) = y²( 1 − ∫ y²/(y² + t²) dν(t) ) = y Im( iy G(iy)(F(iy) − iy) ).

Also, allowing that both sides might equal ∞, we have by the monotone convergence theorem that

∫ t² dν(t) = lim_{y→∞} ∫ y²t²/(y² + t²) dν(t).
However

|y Im( iyG(iy)(F(iy) − iy) )| ≤ y |iy G(iy)| C/Im(iy) = C |iy G(iy)|,

thus ∫ t² dν(t) ≤ C, and so ν has a second, and thus also a first, moment. Also

∫ ty²/(y² + t²) dν(t) = −y² Re(G(iy)) = −Re( iyG(iy)(F(iy) − iy) ).
Since iyG(iy) → 1 and |F(iy) − iy| ≤ C/y we see that the first moment of ν is 0, also
by the monotone convergence theorem.
We now have that σ² ≤ C. The inequality |z − F(z)| ≤ C/Im(z) precludes ν being a Dirac mass other than δ0. For ν = δ0 we have F(z) = z, and then the minimal C is clearly 0 = σ². Hence we can restrict to ν ≠ δ0, hence to ν not being a Dirac mass. Thus by Lemma 19 we have for z ∈ C⁺ that z − F(z) ∈ C⁻. By equation (3.6), lim_{z→∞} z(z − F(z)) = σ² in any Stolz angle. Hence by Theorem 10 there is a probability measure ν̃ such that z − F(z) = σ² ∫ (z − t)^{−1} dν̃(t). Hence

|z − F(z)| ≤ σ² ∫ 1/|z − t| dν̃(t) ≤ σ² (1/Im(z)) ∫ dν̃(t) = σ²/Im(z).
where
and thus conclude that the probability measure ν1 of Lemma 20 has the second
moment β2 /β0 .
Remark 22. We have seen that if ν has a second moment then we may write

G(z) = 1/( (z − α1) − (α2 − α1²) ∫ (z − t)^{−1} dν1(t) ) = 1/( (z − a1) − b1 ∫ (z − t)^{−1} dν1(t) ).

If ν1 in turn has a second moment we may iterate:

∫ 1/(z − t) dν1(t) = 1/( (z − a2) − b2 ∫ (z − t)^{−1} dν2(t) )

for some probability measure ν2, where a2 = (α3 − 2α1α2 + α1³)/(α2 − α1²) and b2 = (α2α4 + 2α1α2α3 − α2³ − α1²α4 − α3²)/(α2 − α1²)². Thus

G(z) = 1/( z − a1 − b1/( z − a2 − b2 ∫ (z − t)^{−1} dν2(t) ) ).
If ν has moments of all orders {αn}n then the Cauchy transform of ν has a continued fraction expansion (often called a J-fraction because of the connection with Jacobi matrices):

G(z) = 1/( z − a1 − b1/( z − a2 − b2/( z − a3 − · · · ) ) ).
The coefficients {an}n and {bn}n are obtained from the moments {αn}n as follows. Let An be the (n + 1) × (n + 1) Hankel matrix

An =
( 1     α1     · · ·   αn   )
( α1    α2     · · ·   αn+1 )
( ⋮     ⋮              ⋮   )
( αn    αn+1   · · ·   α2n  )

and let Ãn−1 be the n × n matrix obtained from An by deleting the last row and second-last column, with Ã0 = (α1). Then let ∆−1 = 1, ∆n = det(An), ∆̃−1 = 0, and ∆̃n = det(Ãn). By Hamburger's theorem (see Shohat and Tamarkin [157, Thm. 1.2]) we have that for all n, ∆n ≥ 0. Then b1 b2 · · · bn = ∆n/∆n−1 and a1 + a2 + · · · + an = ∆̃n−1/∆n−1, or equivalently bn = ∆n−2 ∆n/∆n−1² and an = ∆̃n−1/∆n−1 − ∆̃n−2/∆n−2. If for some n, ∆n = 0 then we only get a finite continued fraction.
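These determinant formulas are easy to try out. In the following Python sketch (our own illustration, assuming numpy) we feed in the semi-circle moments, i.e. the Catalan numbers, and recover the Jacobi parameters an = 0 and bn = 1 for all n:

```python
# J-fraction coefficients from Hankel determinants (an illustration).
import numpy as np
from math import comb

cat = lambda k: comb(2 * k, k) // (k + 1)
m = [cat(k // 2) if k % 2 == 0 else 0 for k in range(12)]   # semi-circle moments

A = lambda n: np.array([[m[i + j] for j in range(n + 1)] for i in range(n + 1)], float)
def A_tilde(n):
    # A-tilde_n: delete the last row and second-last column of A_{n+1}
    M = np.delete(A(n + 1), n + 1, axis=0)
    return np.delete(M, n, axis=1)

D, Dt = {-1: 1.0}, {-1: 0.0}
for n in range(4):
    D[n] = np.linalg.det(A(n))
    Dt[n] = np.linalg.det(A_tilde(n))

for n in range(1, 4):
    a_n = Dt[n - 1] / D[n - 1] - Dt[n - 2] / D[n - 2]
    b_n = D[n - 2] * D[n] / D[n - 1] ** 2
    print(n, round(a_n, 10), round(b_n, 10))   # expect a_n = 0, b_n = 1
```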
Lemma 24. Suppose F : C⁺ → C⁺ is analytic and there is σ > 0 such that for z ∈ C⁺ we have |z − F(z)| ≤ σ²/Im(z). Then
(i) C⁺_{2σ} ⊂ F(C⁺_σ);
(ii) for each w ∈ C⁺_{2σ} there is a unique z ∈ C⁺_σ such that F(z) = w.

Proof: Let w ∈ C⁺_{2σ} and let C be the circle with centre w and radius σ; note that C and its interior lie in C⁺_σ. For z ∈ C we have

|(F(z) − w) − (z − w)| = |F(z) − z| ≤ σ²/Im(z) < σ = |z − w|.

Thus by Rouché's theorem there is a unique z ∈ int(C) with F(z) = w. This proves (i).
If z′ ∈ C⁺_σ and F(z′) = w then

|w − z′| = |F(z′) − z′| ≤ σ²/Im(z′) < σ,

so z′ ∈ int(C) and hence z′ = z; this proves (ii). Writing z = F^{⟨−1⟩}(w) we moreover have

|F^{⟨−1⟩}(w) − w| = |z − w| = |F(z) − z| ≤ σ²/Im(z) ≤ 2σ²/Im(w).
Theorem 25. Let ν be a probability measure on R with first and second moments α1 and α2. Let G(z) = ∫ (z − t)^{−1} dν(t) be the Cauchy transform of ν and σ² = α2 − α1² be the variance of ν. Let F(z) = 1/G(z); then |F(z) + α1 − z| ≤ σ²/Im(z). Moreover there is an analytic function G^{⟨−1⟩} defined on {z | |z + i(4σ)^{−1}| < (4σ)^{−1}} such that G(G^{⟨−1⟩}(z)) = z.

Proof: Let F̃(z) = F(z + α1) and G̃(z) = G(z + α1) = 1/F̃(z). By Lemma 20 there is a probability measure ν̃ with z − F̃(z) = σ² ∫ (z − t)^{−1} dν̃(t), so

|z − F̃(z)| ≤ σ² ∫ 1/|z − t| dν̃(t) ≤ σ² ∫ 1/Im(z) dν̃(t) = σ²/Im(z).

If we apply Lemma 24, we get an inverse for F̃ on {z | Im(z) > 2σ}. Note that |z + i(4σ)^{−1}| < (4σ)^{−1} if and only if Im(1/z) > 2σ. Since G(z) = 1/F̃(z − α1) we let G^{⟨−1⟩}(z) = F̃^{⟨−1⟩}(1/z) + α1 for |z + i(4σ)^{−1}| < (4σ)^{−1}. Then

G(G^{⟨−1⟩}(z)) = G(F̃^{⟨−1⟩}(1/z) + α1) = G̃(F̃^{⟨−1⟩}(1/z)) = 1/F̃(F̃^{⟨−1⟩}(1/z)) = z.
In the next theorem we show that with the assumption of finite variance σ 2 we
can find an analytic function R which solves the equation G(R(z) + 1/z) = z on the
open disc with centre −i(4σ )−1 and radius (4σ )−1 . This is the R-transform of the
measure.
Theorem 26. Let ν be a probability measure with variance σ². Then on the open disc with centre −i(4σ)^{−1} and radius (4σ)^{−1} there is an analytic function R such that G(R(z) + 1/z) = z for |z + i(4σ)^{−1}| < (4σ)^{−1}, where G is the Cauchy transform of ν.
Proof: Let G^{⟨−1⟩} be the inverse provided by Theorem 25 and R(z) = G^{⟨−1⟩}(z) − 1/z. Then G(R(z) + 1/z) = G(G^{⟨−1⟩}(z)) = z.
One should note that the statements and proofs of Theorems 25 and 26, interpreted in the right way, remain valid also in the degenerate case σ² = 0, where ν is a Dirac mass. Then G^{⟨−1⟩} and R are defined on the whole lower half-plane C⁻; in fact, for ν = δ_{α1} we have R(z) = α1.
3.5 The free additive convolution of probability measures with finite variance
One of the main ideas of free probability is that if we have two self-adjoint operators
a1 and a2 in a unital C∗ -algebra with state ϕ and if a1 and a2 are free with respect to
ϕ then we can find the moments of a1 +a2 from the moments of a1 and a2 according
to a universal rule. Since a1 , a2 and a1 + a2 are all bounded self-adjoint operators
there are probability measures ν1 , ν2 , and ν such that for i = 1, 2
ϕ(aᵢᵏ) = ∫ tᵏ dνᵢ(t) and ϕ((a1 + a2)ᵏ) = ∫ tᵏ dν(t).

The measure ν then depends only on ν1 and ν2 and not on the operators a1 and a2 used to construct it. For bounded
operators we also know that the free additive convolution can be described by the
additivity of their R-transforms. We shall show in this section how to construct ν1 ⊞ ν2 without assuming that the measures have compact support and thus without using
Banach algebra techniques. As we have seen in the last section we can still define
an R-transform by analytic means (at least for the case of finite variance); the idea is
then of course to define ν = ν1 ⊞ ν2 by prescribing the R-transform of ν as the sum of the R-transforms of ν1 and ν2. However, it is then not at all obvious that there
actually exists a probability measure with this prescribed R-transform. In order to
see that this is indeed the case, we have to reformulate our description in terms of
the R-transform in a subordination form, as already alluded to in (2.31) at the end
of the last chapter.
Recall that the R-transform in the compactly supported case satisfied the equation
G(R(z) + 1/z) = z for |z| sufficiently small. So letting F(z) = 1/G(z) this becomes
F(R(z) + 1/z) = z^{−1}. For |z| sufficiently small G^{⟨−1⟩}(z) is defined, and hence also F^{⟨−1⟩}(z^{−1}); then for such z we have

R(z) = F^{⟨−1⟩}(z^{−1}) − z^{−1}.  (3.7)
We shall now show, given two probability measures ν1 and ν2 with finite variance, that we can construct a probability measure ν with finite variance such that R = R1 + R2, where R, R1, and R2 are the R-transforms of ν, ν1, and ν2 respectively.
Given w ∈ C+ we shall show in Lemma 27 that there are w1 and w2 in C+ such
that (3.8) holds. Then we define F by F(w) = F1 (w1 ) and show that 1/F is the
Cauchy transform of a probability measure of finite variance. This measure will
then be the free additive convolution of ν1 and ν2 . Moreover the maps w 7→ w1 and
w 7→ w2 will be the subordination maps of equation (2.31).
We need the notion of the degree of an analytic function which we summarize in
the exercise below.
It is a standard theorem that if X is compact then deg f is constant (see e.g. Miranda
[133, Ch. II, Prop. 4.8]).
(i) Adapt the proof in the compact case to show that if X is not necessarily com-
pact but f is proper, i.e. if the inverse image of a compact set is compact, then deg f
is constant.
(ii) Suppose that F1, F2 : C⁺ → C⁺ are analytic and Fi′(z) ≠ 0 for z ∈ C⁺ and i = 1, 2. Let X = {(z1, z2) ∈ C⁺ × C⁺ | F1(z1) = F2(z2)}. Give X the structure of a complex manifold so that (z1, z2) ↦ F1(z1) is analytic.
(iii) Suppose F1 , F2 , and X are as in (ii) and in addition there are σ1 and σ2 such
that for i = 1, 2 and z ∈ C+ we have |z − Fi (z)| ≤ σi2 /Im(z). Show that θ : X → C
given by θ (z1 , z2 ) = z1 + z2 − F1 (z1 ) is a proper map.
Lemma 27. Suppose F1 and F2 are analytic maps from C⁺ to C⁺ and that there is r > 0 such that for z ∈ C⁺ and i = 1, 2 we have |Fi(z) − z| ≤ r²/Im(z). Then for
each z ∈ C+ there is a unique pair (z1 , z2 ) ∈ C+ × C+ such that
(i) F1 (z1 ) = F2 (z2 ), and
(ii) z1 + z2 − F1 (z1 ) = z.
Proof: Note that, by Lemma 21, our assumptions imply that, for i = 1, 2, 1/Fi is
the Cauchy transform of some probability measure and thus, by Lemma 19, we also
know that it satisfies Im(z) ≤ Im(Fi (z)).
We first assume that z ∈ C⁺_{4r}. If (z1, z2) satisfies (i) and (ii), then z1 = z + F2(z2) − z2 and thus, since Im(F2(z2)) ≥ Im(z2), we get Im(z1) ≥ Im(z). Likewise Im(z2) ≥ Im(z). Hence, if we are to find a solution to (i) and (ii) we shall
find it in C⁺_{4r} × C⁺_{4r}. By Lemma 24, F1 and F2 are invertible on C⁺_{2r}. Thus to find a solution to (i) and (ii) it is sufficient to find u ∈ C⁺_{2r} such that

F1^{⟨−1⟩}(u) + F2^{⟨−1⟩}(u) − u = z  (3.9)

and then let z1 = F1^{⟨−1⟩}(u) and z2 = F2^{⟨−1⟩}(u). Thus we must show that for every z ∈ C⁺_{4r} there is a unique u ∈ C⁺_{2r} satisfying equation (3.9).
Let C be the circle with centre z and radius 2r. Then C ⊂ C⁺_{2r} and for u ∈ C we have by Lemma 24

|Fi^{⟨−1⟩}(u) − u| ≤ 2r²/Im(u) < r for i = 1, 2.

Hence

|(z − u) − ( z − (F1^{⟨−1⟩}(u) + F2^{⟨−1⟩}(u) − u) )| ≤ |F1^{⟨−1⟩}(u) − u| + |F2^{⟨−1⟩}(u) − u| < 2r = |z − u|,

so by Rouché's theorem there is exactly one u in the interior of C satisfying equation (3.9). If there is u0 ∈ C⁺_{2r} with

z − u0 = (F1^{⟨−1⟩}(u0) − u0) + (F2^{⟨−1⟩}(u0) − u0),

then |z − u0| < 2r, so u0 ∈ int(C) and thus u0 = u.
Theorem 28. Let ν1 and ν2 be two probability measures on R with finite variances
and R1 and R2 be the corresponding R-transforms. Then there is a unique prob-
ability measure with finite variance, denoted ν1 ⊞ ν2, and called the free additive convolution of ν1 and ν2, such that the R-transform of ν1 ⊞ ν2 is R1 + R2.
Moreover the first moment of ν1 ⊞ ν2 is the sum of the first moments of ν1 and ν2 and the variance of ν1 ⊞ ν2 is the sum of the variances of ν1 and ν2.
Proof: By Exercise 18 we only have to prove the theorem in the case ν1 and ν2 are centred. Moreover there are probability measures ρ1 and ρ2 such that for z ∈ C⁺ and i = 1, 2 we have z − Fi(z) = σi² ∫ (z − t)^{−1} dρi(t). By Lemma 27, for each z ∈ C⁺ there is a unique pair (z1, z2) ∈ C⁺ × C⁺ with F1(z1) = F2(z2) and z1 + z2 − F1(z1) = z; we define F(z) = F1(z1).
Since Im(F1(z1)) ≥ Im(z1) we have Im(z) = Im(z2) + Im(z1 − F1(z1)) ≤ Im(z2).
Likewise Im(z) ≤ Im(z1). Thus

|z − F(z)| = |z1 − F1(z1) + z2 − F2(z2)| ≤ σ1²/Im(z1) + σ2²/Im(z2) ≤ (σ1² + σ2²)/Im(z).
Let Dr = {z | |z + ir| < r}; then, with σ² = σ1² + σ2², D_{1/(4σ)} ⊂ D_{1/(4σ1)} ∩ D_{1/(4σ2)}. Let z ∈ D_{1/(4σ)}; then z^{−1} is in the domains of F^{⟨−1⟩}, F1^{⟨−1⟩}, and F2^{⟨−1⟩}. Now by Lemma 27, for F^{⟨−1⟩}(z^{−1}) find z1 and z2 in C⁺ so that F1(z1) = F2(z2) and F^{⟨−1⟩}(z^{−1}) = z1 + z2 − F1(z1). By the construction of F we have z^{−1} = F(F^{⟨−1⟩}(z^{−1})) = F1(z1) = F2(z2) and so z1 = F1^{⟨−1⟩}(z^{−1}) and z2 = F2^{⟨−1⟩}(z^{−1}). Thus the equation F^{⟨−1⟩}(z^{−1}) = z1 + z2 − F1(z1) becomes

F^{⟨−1⟩}(z^{−1}) − z^{−1} = (F1^{⟨−1⟩}(z^{−1}) − z^{−1}) + (F2^{⟨−1⟩}(z^{−1}) − z^{−1}).
Now recall the construction of the R-transform given by Theorem 26, reformulated as in (3.7) in terms of F: R(z) = F^{⟨−1⟩}(z^{−1}) − z^{−1}. Hence R(z) = R1(z) + R2(z).
This means that the R-transform is a germ of analytic functions in that for each
α > 0 there is β > 0 and an analytic function R on ∆α,β such that whenever we are
given another α 0 > 0 for which there exists a β 0 > 0 and a second analytic function
R0 on ∆α 0 ,β 0 , the two functions agree on ∆α,β ∩ ∆α 0 ,β 0 . (See Fig. 3.4.)
Remark 30. When ν is compactly supported we can find a disc centred at 0 on which there is an analytic function satisfying equation (3.1). This was shown in Theorem 17. When ν has finite variance we showed that there is a disc in C⁻ tangent to 0 and with centre on the imaginary axis (see Fig. 3.3) on which there is an analytic function satisfying equation (3.1). This was shown in Theorem 26. In the general case we shall define R(z) by the equation R(z) = F^{⟨−1⟩}(z^{−1}) − z^{−1}. The next two lemmas show that we can find a domain where this definition works.
Lemma 31. Let F be the reciprocal of the Cauchy transform of a probability measure on R. Suppose 0 < α1 < α2. Then there is β0 > 0 such that for all β2 ≥ β0 and β1 ≥ β2(1 + α2 − α1):
(i) we have Γα1,β1 ⊆ F(Γα2,β2);
(ii) F^{⟨−1⟩} exists on Γα1,β1, i.e. for each w ∈ Γα1,β1 there is a unique z ∈ Γα2,β2 such that F(z) = w.
Proof: Let θ be the angle between the boundary lines of Γα1 and Γα2 and choose ε > 0 such that

ε < sin θ = (α2 − α1)/( √(1 + α1²) √(1 + α2²) ).

Choose β0 > 0 such that |F(z) − z| < ε|z| for z ∈ Γα2,β0 (which is possible by Exercise 12). Let β2 ≥ β0 and β1 ≥ β2(1 + α2 − α1).
Let us first show that for w ∈ Γα1,β1 and for z ∈ ∂Γα2,β2 we have ε|z| < |z − w|. If z = α2 y + iy ∈ ∂Γα2 then |z − w|/|z| ≥ sin θ > ε. If z = x + iβ2 ∈ ∂Γα2,β2 then

|z − w| > β1 − β2 ≥ β2(α2 − α1) > εβ2 √(1 + α1²) √(1 + α2²) ≥ ε|z| √(1 + α1²) > ε|z|.

Thus for w ∈ Γα1,β1 and z ∈ ∂Γα2,β2 we have ε|z| < |z − w|.
Now fix w ∈ Γα1,β1 and let r > |w|/(1 − ε). Thus for z ∈ {z̃ | |z̃| = r} ∩ Γα2,β2 we have |z − w| ≥ r − |w| > εr = ε|z|. So let C be the curve

C := ( ∂Γα2,β2 ∩ {z̃ | |z̃| ≤ r} ) ∪ ( {z̃ | |z̃| = r} ∩ Γα2,β2 ).

Then for all z on C we have |F(z) − z| < ε|z| < |z − w|.
So by Rouché’s theorem there is exactly one z in the interior of C such that F(z) = w.
Since we can make r as large as we want, there is a unique z ∈ Γα2 ,β2 such that
F(z) = w. Hence F has an inverse on Γα1 ,β1 .
Lemma 32. Let F be the reciprocal of the Cauchy transform of a probability measure on R. Suppose 0 < α1 < α2. Then there is β0 > 0 such that F(Γα1,β1) ⊆ Γα2,β1 for all β1 ≥ β0.

Proof: Choose ε > 0 small enough that tan^{−1}(α1^{−1}) − sin^{−1}(ε) > tan^{−1}(α2^{−1}). Then choose β0 > 0 such that |F(z) − z| < ε|z| for z ∈ Γα1,β0.
Suppose β1 ≥ β0 and let z ∈ Γα1,β1 with Re(z) ≥ 0 (the case Re(z) < 0 is similar). Write z = |z|e^{iϕ}. Then ϕ > tan^{−1}(α1^{−1}). Write F(z) = |F(z)|e^{iψ}. We have |z^{−1}F(z) − 1| < ε. Thus |sin(ψ − ϕ)| < ε, so ψ > ϕ − sin^{−1}(ε) > tan^{−1}(α1^{−1}) − sin^{−1}(ε). If ψ ≤ π/2, then

tan(ψ) > tan( tan^{−1}(α1^{−1}) − sin^{−1}(ε) ) = (α1^{−1} − ε/√(1 − ε²))/(1 + α1^{−1} ε/√(1 − ε²)) > α2^{−1},

so F(z) ∈ Γα2; if ψ > π/2 then ψ < π/2 + sin^{−1}(ε) < π − tan^{−1}(α2^{−1}), so again F(z) ∈ Γα2. Since moreover Im(F(z)) ≥ Im(z) > β1 we conclude F(z) ∈ Γα2,β1.
Theorem 33. Let ν be a probability measure on R with Cauchy transform G and set F = 1/G. For every α > 0 there is β > 0 so that R(z) = F^{⟨−1⟩}(z^{−1}) − z^{−1} is defined for z ∈ ∆α,β and such that we have
(i) G(R(z) + 1/z) = z for z ∈ ∆α,β and
(ii) R(G(z)) + 1/G(z) = z for z ∈ Γα,β.

Proof: Let F(z) = 1/G(z). Let α > 0 be given and by Lemma 31 choose β0 > 0 so that F^{⟨−1⟩} is defined on Γ2α,β0. For z ∈ ∆2α,β0, R(z) is thus defined and we have G(R(z) + 1/z) = G(F^{⟨−1⟩}(z^{−1})) = z.
Now by Lemma 32 we may choose β > β0 such that F(Γα,β) ⊆ Γ2α,β. For z ∈ Γα,β we have G(z) = 1/F(z) ∈ (Γ2α,β)^{−1} = ∆2α,β ⊆ ∆2α,β0 and so

R(G(z)) + 1/G(z) = F^{⟨−1⟩}(F(z)) = z.
Exercise 19. Let w ∈ C be such that Im(w) ≤ 0. Then we saw in Exercise 9 that
G(z) = (z − w)−1 is the Cauchy transform of a probability measure on R. Show that
the R-transform of this measure is R(z) = w. In this case R is defined on all of C
even though the corresponding measure has no moments (when Im(w) < 0).
Remark 34. We shall now show that given two probability measures ν1 and ν2 with
R-transforms R1 and R2 , respectively, we can find a third probability measure ν with
Cauchy transform G and R-transform R such that R = R1 + R2 . This means that for
all α > 0 there is β > 0 such that all three of R, R1 , and R2 are defined on ∆α,β and
for z ∈ ∆α,β we have R(z) = R1(z) + R2(z). We shall denote ν by ν1 ⊞ ν2 and call it the free additive convolution of ν1 and ν2. Clearly, this then extends our definition for probability measures with finite variance from the last section.
When ν1 is a Dirac mass at a ∈ R we can dispose of this case directly. An easy
calculation shows that R1 (z) = a, c.f. Exercise 1. So R(z) = a + R2 (z) and thus
G(z) = G2 (z − a). Thus ν(E) = ν2 (E − a), c.f. Exercise 18. So for the rest of this
section we shall assume that neither ν1 nor ν2 is a Dirac mass.
There is another case that we can easily deal with. Suppose Im(w) < 0. Let ν1 =
δw be the probability measure with Cauchy transform G1 (z) = (z − w)−1 . This is the
measure we discussed in Notation 4; see also Exercises 9 and 19. Then R1 (z) = w.
Let ν2 be any probability measure on R. We let G2 be the Cauchy transform of ν2
and R2 be its R-transform. So if ν1 ⊞ ν2 exists its R-transform should be R(z) = w + R2(z). Let us now go back to the subordination formula (2.31) in Chapter 2. It says that if ν1 ⊞ ν2 exists its Cauchy transform, G, should satisfy G(z) = G2(ω2(z)) where ω2(z) = z − R1(G(z)) = z − w. Now ω2 maps C⁺ to C⁺ and letting G = G2 ∘ ω2 we have

lim_{y→∞} iy G(iy) = 1,

so that, by Theorem 10, G is indeed the Cauchy transform of a probability measure, which we then take as ν1 ⊞ ν2.
In the remainder of this chapter we shall define ν1 ⊞ ν2 in full generality; for this we will show that we can always find ω1 and ω2 satisfying (2.32).
Corollary 37. Let F1 and F2 be as in Notation 36. Suppose 0 < α2 < α1. Then there are β2 ≥ β0 > 0 such that
(i) F1^{⟨−1⟩} is defined on Γα1,β1 for any β1 ≥ β0, with F1^{⟨−1⟩}(Γα1,β1) ⊆ Γα1+1,β1/2;
(ii) F2(Γα2,β2) ⊆ Γα1,β0.
Remark 39. Choose now some α1 > α2 > 0, and β2 ≥ β0 > 0 according to Corollary 37. In the following we will also need to control Im(F2(w) − w). Note that, by the fact that F2(w)/w → 1 for w → ∞ in Γα2,β2, we have, for any ε < 1, |F2(w) − w| < ε|w| for sufficiently large w ∈ Γα2,β2. But then

0 ≤ Im(F2(w) − w) ≤ |F2(w) − w| < ε|w| < ε √(1 + α2²) · Im(w);

the latter inequality is from Notation 15 for w ∈ Γα2,β2. By choosing 1/ε = 2√(1 + α2²) we thus find a β > 0 (which we can take with β ≥ β2) such that we have

Im(F2(w) − w) < ½ Im(w) for all w ∈ Γα2,β ⊆ Γα2,β2.  (3.12)
Consider now for w ∈ Γα2,β the point z = w + F1^{⟨−1⟩}(F2(w)) − F2(w). Since F2(w) ∈ Γα1,β0, this is well defined. Furthermore, we have Im(F2(w)) ≥ Im(w) > β ≥ β0, and thus actually F2(w) ∈ Γα1,Im(w), which then yields, by Corollary 37, Im(F1^{⟨−1⟩}(F2(w))) > Im(w)/2 and hence, by (3.12), z ∈ C⁺.
Lemma 40. Let w ∈ Γα2,β and z ∈ C⁺. Then

z = w + F1^{⟨−1⟩}(F2(w)) − F2(w) ⟺ g(z, w) = w.
Proof: Suppose z = w + F1^{⟨−1⟩}(F2(w)) − F2(w). By Remark 39 we have z ∈ C⁺. Then

g(z, w) = z + H1(z + H2(w))
= z + H1(z + F2(w) − w)
= z + H1(F1^{⟨−1⟩}(F2(w)))
= z + F1(F1^{⟨−1⟩}(F2(w))) − F1^{⟨−1⟩}(F2(w))
= z + F2(w) − F1^{⟨−1⟩}(F2(w))
= w.

Conversely, suppose g(z, w) = w. Then w = z + H1(z + F2(w) − w) = z + F1(z + F2(w) − w) − (z + F2(w) − w), so

F2(w) = F1(z + F2(w) − w),

thus

F1^{⟨−1⟩}(F2(w)) = z + F2(w) − w,

as required.
Remark 41. The set Ω := {w + F1^{⟨−1⟩}(F2(w)) − F2(w) | w ∈ Γα2,β} is such that for z ∈ Ω, gz has a fixed point in C⁺ (even in Γα2,β). Our goal is to show that for every z ∈ C⁺ there is w such that gz(w) = w and that w is an analytic function of z.
Exercise 20. In the next proof we will use the following simple part of the Denjoy-
Wolff Theorem. Suppose f : D → D is a non-constant holomorphic function on the
unit disc D := {z ∈ C | |z| < 1} and it is not an automorphism of D (i.e., not of
the form λ (z − α)/(1 − ᾱz) for some α ∈ D and λ ∈ C with |λ | = 1). If there is a
z0 ∈ D with f (z0 ) = z0 , then for all z ∈ D, f ◦n (z) → z0 . In particular, the fixed point
is unique.
Prove this by an application of the Schwarz Lemma.
Lemma 42. Let g(z, w) be as in Definition 38. Then there is a non-constant analytic
function f : C+ → C+ such that for all z ∈ C+ , g(z, f (z)) = f (z). The analytic
function f is uniquely determined by the fixed point equation.
it is clear that gz cannot be an automorphism of the upper half-plane and hence g̃z cannot be an automorphism of the disc. Hence, by Denjoy-Wolff, g̃z^{∘n}(ũ) → w̃ for all ũ ∈ D. Converting back to C⁺ we see that gz^{∘n}(u) → w for all u ∈ C⁺.
Now we define our iterates on all of C⁺, where we choose for concreteness the initial point as u0 = i. We define a sequence {fn}n of analytic functions from C⁺ to C⁺ by fn(z) = gz^{∘n}(i). We claim that for all z ∈ C⁺, limn fn(z) exists. We have shown
that already for z ∈ Ω. There z = w + F1^{⟨−1⟩}(F2(w)) − F2(w) with w ∈ Γα2,β, and gz^{∘n}(i) → w. Thus for all z ∈ Ω the sequence {fn(z)}n converges to the corresponding w. Now let Ω̃ = ψ(Ω) and f̃n = ψ ∘ fn ∘ ϕ. Then f̃n : D → D and for z̃ ∈ Ω̃, limn f̃n(z̃)
exists. Hence, by Vitali’s Theorem, limn f˜n (z̃) exists for all z̃ ∈ D. Note that by the
maximum modulus principle this limit cannot take on values on the boundary of D
unless it is constant. Since it is clearly not constant on Ω̃ , the limit takes on only
values in D. Hence limn fn (z) exists for all z ∈ C+ as an element in C+ . So we define
f : C+ → C+ by f (z) = limn fn (z); by Vitali’s Theorem the convergence is uniform
on compact subsets of C⁺ and f is analytic. Recall that fn(z) = gz^{∘n}(i), so

gz(f(z)) = limn gz(fn(z)) = limn gz^{∘(n+1)}(i) = f(z),

so f(z) is indeed a fixed point of gz.
Theorem 43. There are analytic functions ω1 , ω2 : C+ → C+ such that for all z ∈
C+
(i) F1 (ω1 (z)) = F2 (ω2 (z)), and
(ii) ω1 (z) + ω2 (z) = z + F1 (ω1 (z)).
The analytic functions ω1 and ω2 are uniquely determined by these two equations.
Proof: Let z ∈ C⁺ and gz(w) = g(z, w). By Lemma 42, gz has a unique fixed point f(z). So define the function ω2 by ω2(z) = f(z) for z ∈ C⁺, and the function ω1 by ω1(z) = z + F2(ω2(z)) − ω2(z). Then ω1 and ω2 are analytic on C⁺; since g(z, ω2(z)) = ω2(z), the computation in the proof of Lemma 40 shows F2(ω2(z)) = F1(z + F2(ω2(z)) − ω2(z)) = F1(ω1(z)), which is (i), and then ω1(z) + ω2(z) = z + F2(ω2(z)) = z + F1(ω1(z)), which is (ii).
Suppose now that ω1 and ω2 are analytic functions on C⁺ satisfying (i) and (ii). Then ω1(z) = z + F2(ω2(z)) − ω2(z) = z + H2(ω2(z)) and

ω2(z) = z + F1(ω1(z)) − ω1(z) = z + H1(ω1(z)),

and thus

ω2(z) = z + H1(z + H2(ω2(z))) = g(z, ω2(z)).
By Lemma 42, we know that an analytic solution of this fixed point equation is
unique. Exchanging H1 and H2 gives in the same way the uniqueness of ω1 .
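The fixed point equation of Theorem 43 is also a practical algorithm: iterating w ↦ gz(w) converges to ω2(z) by the Denjoy-Wolff argument of Lemma 42. The following Python sketch (our own illustration, assuming numpy) computes the free additive convolution of two standard semi-circles this way and compares with the semi-circle of variance 2:

```python
# Subordination fixed point for nu1 = nu2 = semi-circle (an illustration).
import numpy as np

def G_sc(z, v=1.0):                       # semi-circle of variance v
    r = 2 * np.sqrt(v)
    return (z - np.sqrt(z - r + 0j) * np.sqrt(z + r + 0j)) / (2 * v)

F = lambda z: 1.0 / G_sc(z)               # here F = F1 = F2
H = lambda z: F(z) - z

def omega2(z, iters=500):
    w = 1j                                # initial point u0 = i, as in Lemma 42
    for _ in range(iters):
        w = z + H(z + H(w))               # w -> g_z(w)
    return w

z = 0.5 + 0.8j
w2 = omega2(z)
w1 = z + F(w2) - w2
print(1.0 / F(w1))                        # G of the free additive convolution at z
print(G_sc(z, v=2.0))                     # semi-circle of variance 2: same value
```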
To define the free additive convolution of ν1 and ν2 we shall let F(z) = F1(ω1(z)) = F2(ω2(z)) and then show that 1/F is the Cauchy transform of a probability measure, which will be ν1 ⊞ ν2. The main difficulty is to show that F(z)/z → 1 as ∢z → ∞. For this we need the following lemma.

Lemma 44. lim_{y→∞} ω1(iy)/(iy) = lim_{y→∞} ω2(iy)/(iy) = 1.
Proof: Let us begin by showing that limy→∞ ω2 (iy) = ∞ (in the sense of Definition
16).
We must show that given α, β > 0 there is y0 > 0 such that ω2(iy) ∈ Γα,β whenever y > y0. Note that by the previous theorem we have ω2(z) = z + H1(ω1(z)) ∈ z +
C+ . So we have that Im(ω2 (z)) > Im(z). Since ω2 maps C+ to C+ we have by the
Nevanlinna representation of ω2 (see Exercise 13) that b2 = limy→∞ ω2 (iy)/(iy) ≥ 0.
This means that Im(ω2 (iy))/y → b2 and our inequality Im(ω2 (z)) > Im(z) implies
that b2 ≥ 1. We also have that Re(ω2 (iy))/y → 0. So there is y0 > 0 so that for
y > y0 ≥ β we have
( Re(ω2(iy))/y )² + ( Im(ω2(iy))/y )² < (α² + 1) ( Im(ω2(iy))/y )².
Thus ω2 (iy) ∈ Γα (see Notation 15). Since Im(ω2 (iy)) > y > y0 , we have that
ω2 (iy) ∈ Γα,β . Thus limy→∞ ω2 (iy) = ∞.
Recall that ω1 (z) = z+H2 (ω2 (z)) ∈ z+C+ , so by repeating our arguments above
we have that b1 = limy→∞ ω1 (iy)/(iy) ≥ 1 and limy→∞ ω1 (iy) = ∞.
Since lim_{∢z→∞} F1(z)/z = 1 (see Exercise 12) we now have lim_{y→∞} F1(ω1(iy))/ω1(iy) = 1.
Moreover the equation ω1(z) + ω2(z) = z + F1(ω1(z)) means that

b1 + b2 = lim_{y→∞} (ω1(iy) + ω2(iy))/(iy)
= lim_{y→∞} (iy + F1(ω1(iy)))/(iy)
= 1 + lim_{y→∞} ( F1(ω1(iy))/ω1(iy) ) · ( ω1(iy)/(iy) )
= 1 + b1.

Hence b2 = 1, and by symmetry also b1 = 1.
ω1(F^{⟨−1⟩}(z^{−1})) + ω2(F^{⟨−1⟩}(z^{−1})) − F1(ω1(F^{⟨−1⟩}(z^{−1}))) = F^{⟨−1⟩}(z^{−1}).

Also ω1(F^{⟨−1⟩}(z^{−1})) = F1^{⟨−1⟩}(z^{−1}) and ω2(F^{⟨−1⟩}(z^{−1})) = F2^{⟨−1⟩}(z^{−1}), so our equation becomes

F1^{⟨−1⟩}(z^{−1}) + F2^{⟨−1⟩}(z^{−1}) − z^{−1} = F^{⟨−1⟩}(z^{−1}).

Hence R(z) = R1(z) + R2(z).
Remark 48. In the case of bounded operators x and y which are free we saw in Sec-
tion 3.5 that the distribution of their sum gives the free additive convolution of their
distributions. Later we shall see how using the theory of unbounded operators affil-
iated with a von Neumann algebra we can have the same conclusion for probability
measures with non-compact support (see Remark 8.16).
Remark 49. 1) There is also a similar analytic theory of free multiplicative convolu-
tion for the product of free variables; see, for example, [21, 31, 54].
2) There exists a huge body of results around infinitely divisible and stable laws
in the free sense; see, for example, [8, 9, 11, 22, 29, 31, 32, 30, 53, 70, 97, 199].
Chapter 4
Asymptotic Freeness for Gaussian, Wigner, and Unitary Random
Matrices
After having developed the basic theory of freeness we are now ready to have a
more systematic look into the relation between freeness and random matrices. In
chapter 1 we showed the asymptotic freeness between independent Gaussian ran-
dom matrices. This is only the tip of an iceberg. There are many more classes of
random matrices which show asymptotic freeness. In particular, we will present
such results for Wigner matrices, Haar unitary random matrices and treat also the
relation between such ensembles and deterministic matrices. Furthermore, we will
strengthen the considered form of freeness from the averaged version (which we
considered in chapter 1) to an almost sure one.
We should point out that our presentation of the notion of freeness is quite orthog-
onal to its historical development. Voiculescu introduced this concept in an operator
algebraic context (we will say more about this in chapter 6); at the beginning of
free probability, when Voiculescu discovered the R-transform and proved the free
central limit theorem around 1983, there was no relation at all with random matri-
ces. This connection was only revealed later in 1991 by Voiculescu [180]; he was
motivated by the fact that the limit distribution which he found in the free central
limit theorem had appeared before in Wigner’s semi-circle law in the random ma-
trix context. The observation that operator algebras and random matrices are deeply
related had a tremendous impact and was the beginning of a new era in the subject of
free probability.
Recall that since s1 , . . . , s p are free their mixed cumulants will vanish, and only the
second cumulants of the form κ2 (si , si ) will be non-zero. With the chosen normal-
ization of the variance for our random matrices those second cumulants will be 1.
Thus
ϕ(s_{i1} · · · s_{im}) = ∑_{π∈NC2(m)} κπ[s_{i1}, . . . , s_{im}]
is given by the number of non-crossing pairings of the si1 , . . . , sim which connect
only si ’s with the same index. Hence (4.2) follows from Lemma 1.9.
The statements above about the limit distribution of Gaussian random matrices
are in distribution with respect to the averaged trace E[tr(·)]. However, they also hold
in the stronger sense of almost sure convergence. Before formalizing this let us first
look at some numerical simulations in order to get an idea of the difference between
convergence of averaged eigenvalue distribution and almost sure convergence of
eigenvalue distribution.
Consider first our usual setting with respect to E[tr(·)]. To simulate this we have
to average for fixed N the eigenvalue distributions of the sampled N × N matrices.
For the Gaussian ensemble there are infinitely many of those, so we approximate
this averaging by choosing a large number of realizations of our random matrices. In
the following pictures we created 10,000 N × N matrices (by generating the entries
independently and according to a normal distribution), calculated for each of those
10,000 matrices the N eigenvalues and plotted the histogram for the 10, 000 × N
eigenvalues. We show those histograms for N = 5 (see Fig. 4.1) and N = 20 (see
Fig. 4.2). Wigner’s theorem in the averaged version tells us that as N → ∞ these
averaged histograms have to converge to the semi-circle. The numerical simulations
show this very clearly. Note that already for quite small N, for example N = 20, we
have a very good agreement with the semi-circular distribution.
Let us now consider the stronger almost sure version of this. In that case we
produce for each N only one N × N matrix (generated according to the probability
measure for our ensemble) and plot the corresponding histogram of the N eigen-
values. The almost sure version of Wigner’s theorem says that generically, i.e., for
almost all choices of such sequences of N × N matrices, the corresponding sequence
of histograms converges to the semi-circle. This statement is supported by the fol-
lowing pictures of four such samples, for N = 10, N = 100, N = 1000, N = 4000
(see Figures 4.3 and 4.4). Clearly, for small N the histogram depends on the specific
realization of our random matrix, but the larger N gets, the smaller the variations
between different realizations get.
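The following Python sketch (our own illustration; the sampler and sizes are our choices) mimics this experiment, comparing the empirical second and fourth moments of a single realization with the Catalan numbers C1 = 1 and C2 = 2 instead of plotting histograms:

```python
# One GUE realization per N (an illustration of almost sure convergence).
import numpy as np
rng = np.random.default_rng(0)

def gue(N):
    X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (X + X.conj().T) / (2 * np.sqrt(N))      # entries of variance ~ 1/N

for N in (10, 100, 1000):
    ev = np.linalg.eigvalsh(gue(N))
    print(N, np.mean(ev**2).round(3), np.mean(ev**4).round(3))  # -> 1 and 2
```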
Also for the asymptotic freeness of independent Gaussian random matrices we
have an almost sure version. Consider two independent Gaussian random matrices
AN, BN −→^{distr} s1, s2, where s1, s2 are free semi-circular elements.
This means, for example, that

lim_{N→∞} E[tr(AN AN BN BN AN BN BN AN)] = ϕ(s1 s1 s2 s2 s1 s2 s2 s1) = 2.
[Fig. 4.3 One realization of a N = 10 and a N = 100 Gaussian random matrix.]
Fig. 4.4 One realization of a N = 1000 and a N = 4000 Gaussian random matrix.
The numerical simulation in the first part of the following figure shows the aver-
aged (over 1000 realizations) value of tr(AN AN BN BN AN BN BN AN ), plotted against
N, for N between 2 and 30. Again, one sees (Fig. 4.5 left) a very good agreement
with the asymptotic value of 2 for quite small N.
Fig. 4.5 On the left we have the averaged trace (averaged over 1000 realizations) of the normalized
trace of XN = AN AN BN BN AN BN BN AN for N from 1 to 30. On the right the normalized trace of XN
for N from 1 to 200 (one realization for each N).
For the almost sure version of this we realize for each N just one matrix
AN and (independently) one matrix BN and calculate for this pair the number
tr(AN AN BN BN AN BN BN AN ). We expect that generically, as N → ∞, this should also
converge to 2. The second part of the above figure shows a simulation for this (Fig.
4.5 right).
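This experiment is easy to reproduce. The sketch below (our own illustration; seed and matrix sizes are arbitrary choices) computes tr(AN AN BN BN AN BN BN AN) for one realization per N:

```python
# tr(A A B B A B B A) for independent GUE matrices, one realization per N
# (an illustration of the experiment shown in Fig. 4.5).
import numpy as np
rng = np.random.default_rng(1)

def gue(N):
    X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (X + X.conj().T) / (2 * np.sqrt(N))

for N in (10, 100, 400):
    A, B = gue(N), gue(N)
    W = A @ A @ B @ B @ A @ B @ B @ A
    print(N, (np.trace(W).real / N).round(3))       # approaches 2
```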
Let us now formalize our two notions of asymptotic freeness. For notational con-
venience, we restrict here to two sequences of matrices. The extension to more ran-
dom matrices or to sets of random matrices should be clear.
Definition 1. Consider two sequences (AN )N∈N and (BN )N∈N of random N × N ma-
trices such that for each N ∈ N, AN and BN are defined on the same probability space
(ΩN , PN ). Denote by EN the expectation with respect to PN .
1) We say AN and BN are asymptotically free if AN , BN ∈ (AN , EN [tr(·)]) (where
AN is the algebra generated by the random matrices AN and BN ) converge in dis-
tribution to some elements a, b (living in some non-commutative probability space
(A, ϕ)) such that a, b are free.
2) Consider now the product space Ω = ∏N∈N ΩN and let P = ∏N∈N PN be
the product measure of the PN on Ω . Then we say that AN and BN are almost
surely asymptotically free, if there exists a, b (in some non-commutative probabil-
ity space (A, ϕ)) which are free and such that we have for almost all ω ∈ Ω that
AN (ω), BN (ω) ∈ (MN (C), tr(·)) converge in distribution to a, b.
Remark 2. What does this mean concretely? Assume we are given our two se-
quences AN and BN and we want to investigate their convergence to some a and
b, where a and b are free. Then, for any choice of m ∈ N and p1 , q1 , . . . , pm , qm ≥ 0,
we have to consider the trace of the corresponding monomial,
α := ϕ(aq1 b p1 · · · aqm b pm ).
Note that, since the fN are independent with respect to P, this is by the second Borel-Cantelli lemma actually equivalent to the almost sure convergence of fN. On the other hand, Chebyshev's inequality gives us the bound (since αN = E[fN])

PN({ω | |fN(ω) − αN| ≥ ε}) ≤ var[fN]/ε².
So if we can show that ∑N∈N var[ fN ] < ∞, then we are done. Usually, one is able to
bound the order of these variances by a constant times 1/N 2 , which is good enough.
We will come back to the question of estimating the variances in Remark 5.14.
In Theorem 5.13 we will show that the variance is of order 1/N², as claimed above.
(Actually we will do more there and provide a non-crossing interpretation of the co-
efficient of this leading order term.) So in the following we will usually only address
the asymptotic freeness of the random matrices under consideration in the averaged
sense and postpone questions about the almost sure convergence to Chapter 5. How-
ever, in all cases considered the averaged convergence can be strengthened to almost
sure convergence, and we will state our theorems directly in this stronger form.
Remark 3. There is actually another notion of convergence which might be more
intuitive than almost sure convergence, namely convergence in probability. Namely,
our random matrices AN and BN converge in probability to a and b (and hence, if a
and b are free, are asymptotically free in probability), if we have for each ε > 0 that
lim_{N→∞} tr(D_N^m)  (4.3)

exists for all m ≥ 1. Then we have DN −→^{distr} d, as N → ∞, where d lives in some non-commutative probability space and where the moments of d are given by the
limit moments (4.3) of the DN. We want to investigate the question whether there is anything definite to say about the relation between s and d.
In order to answer this question we need to find out whether the limiting mixed moments

lim_{N→∞} E[tr(D_N^{q(1)} AN D_N^{q(2)} · · · D_N^{q(m)} AN)],  (4.4)

for all m ≥ 1 (where q(k) can be 0 for some k), exist. In the calculation let us suppress the dependence on N to reduce the number of indices, and write

D_N^{q(k)} = (d_{ij}^{(k)})_{i,j=1}^{N} and AN = (a_{ij})_{i,j=1}^{N},  (4.5)

where

E[a_{ij} a_{kl}] = (1/N) δ_{il} δ_{jk}.  (4.7)
Thus we have
In terms of matrix elements we have the following which we leave as an easy exer-
cise.
Exercise 1. Let A1, . . . , An be N × N matrices and let σ ∈ Sn be a permutation. Let the entries of Ak be (a_{ij}^{(k)})_{i,j=1}^{N}. Show that

trσ(A1, . . . , An) = N^{−#(σ)} ∑_{i1,...,in=1}^{N} a^{(1)}_{i1 i_{σ(1)}} a^{(2)}_{i2 i_{σ(2)}} · · · a^{(n)}_{in i_{σ(n)}}.
Now, as pointed out in Corollary 1.6, one has for π ∈ P2(m) that

lim_{N→∞} N^{#(γπ)−1−m/2} = 1 if π ∈ NC2(m), and = 0 otherwise.
We see that the mixed moments of Gaussian random matrices and deterministic ma-
trices have a definite limit. And moreover, we can recognize this limit as something
familiar. Namely compare (4.9) to the formula (2.22) for a corresponding mixed
moment in free variables d and s, in the case where s is semi-circular:
Both formulas, (4.9) and (4.10), are the same provided K −1 (π) = γπ where K is
the Kreweras complement. But this is indeed true for all π ∈ NC2 (m), see [140, Ex.
18.25]. Consider for example π = {(1, 10), (2, 3), (4, 7), (5, 6), (8, 9)} ∈ NC2 (10).
Regard this as the involution π = (1, 10)(2, 3)(4, 7)(5, 6)(8, 9) ∈ S10 . Then we have
γπ = (1)(2, 4, 8, 10)(3)(5, 7)(6)(9), which corresponds exactly to K −1 (π).
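Such cycle computations can be checked mechanically. In the Python sketch below (our own illustration; permutations are 0-based dicts and we use the convention (γπ)(i) = γ(π(i))) we recover exactly the cycles stated above:

```python
# Cycles of gamma*pi for pi = (1,10)(2,3)(4,7)(5,6)(8,9) and the long cycle
# gamma = (1,2,...,10) (an illustration).
def compose(s, t):                     # (s*t)(i) = s(t(i))
    return {i: s[t[i]] for i in t}

n = 10
gamma = {i: (i + 1) % n for i in range(n)}
pi = {}
for a, b in [(1, 10), (2, 3), (4, 7), (5, 6), (8, 9)]:
    pi[a - 1], pi[b - 1] = b - 1, a - 1

gp = compose(gamma, pi)
seen, cycles = set(), []
for i in range(n):
    if i not in seen:
        c, j = [], i
        while j not in seen:
            seen.add(j)
            c.append(j + 1)            # report 1-based labels
            j = gp[j]
        cycles.append(tuple(c))
print(cycles)   # [(1,), (2, 4, 8, 10), (3,), (5, 7), (6,), (9,)]
```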
Thus we have proved that Gaussian random matrices and deterministic matrices
become asymptotically free with respect to the averaged trace. The calculations can
of course also be extended to the case of several GUE and deterministic matrices.
By estimating the covariance of the appropriate traces, see Remark 5.14, one can
strengthen this to almost sure asymptotic freeness. So we have the following theo-
rem of Voiculescu [180, 188].
Theorem 4. Let A_N^{(1)}, . . . , A_N^{(p)} be p independent N × N GUE random matrices and let D_N^{(1)}, . . . , D_N^{(q)} be q deterministic N × N matrices such that

D_N^{(1)}, . . . , D_N^{(q)} −→^{distr} d1, . . . , dq as N → ∞.

Then

A_N^{(1)}, . . . , A_N^{(p)}, D_N^{(1)}, . . . , D_N^{(q)} −→^{distr} s1, . . . , sp, d1, . . . , dq as N → ∞,

where each si is semi-circular and s1, . . . , sp, {d1, . . . , dq} are free. The convergence above also holds almost surely, so in particular we have almost sure asymptotic freeness.
Let us also remark that in our algebraic framework it is not obvious how to deal
directly with the assumption of almost sure convergence to the limit distribution. We
will actually replace this in the next chapter by the more accessible condition that
the variance of the normalized traces is of order 1/N 2 . Note that this is a stronger
condition in general than almost sure convergence of the eigenvalue distribution, but
this stronger assumption in our theorems will be compensated by the fact that we
can then also show this stronger behaviour in the conclusion.
Exercise 3. Let Φ : GLN (C) → U(N) be the map which takes an invertible complex
matrix A and applies the Gram-Schmidt procedure to the columns of A to obtain a
unitary matrix. Show that for any U ∈ U(N) we have Φ(UA) = UΦ(A).
Exercise 4. Let {Zi j }i j be as in Exercise 2 and let Z be the N × N matrix with entries
Zi j . Since Z ∈ GLN (C), almost surely, we may let U = Φ(Z). Show that U is Haar
distributed.
What is the ∗-distribution of a Haar unitary random matrix with respect to the
state ϕ = E ◦ tr? Since UN∗ UN = IN = UN UN∗ , the ∗-distribution is determined by the
values ϕ(UNm ) for m ∈ Z. Note that for any complex number λ ∈ C with |λ | = 1,
λUN is again a Haar unitary random matrix. Thus, ϕ(λ mUNm ) = ϕ(UNm ) for all m ∈ Z.
This implies that we must have ϕ(UNm ) = 0 for m 6= 0. For m = 0, we have of course
ϕ(UN0 ) = ϕ(IN ) = 1.
Thus a Haar unitary random matrix UN ∈ U(N) is a Haar unitary for each N ≥ 1
(with respect to ϕ = E ◦ tr).
We want to see that asymptotic freeness occurs between Haar unitary random
matrices and deterministic matrices, as was the case with GUE random matrices.
The crucial element in the Gaussian setting was the Wick formula, which of course
does not apply when dealing with Haar unitary random matrices, whose entries are
neither independent nor Gaussian. However, we do have a replacement for the Wick
formula in this context, which is known as the Weingarten convolution formula, see
[57, 60].
The Weingarten convolution formula asserts the existence of a sequence of func-
tions (WgN )∞ N=1 with each WgN a central function in the group algebra C[Sn ] of the
symmetric group Sn , for each N ≥ n. The function WgN has the property that for
the entries ui j of a Haar distributed unitary random matrix U = (ui j ) ∈ U(N) and all
index tuples i, j, i0 , j0 : [n] → [N]
E[u_{i1 j1} · · · u_{in jn} ū_{i′1 j′1} · · · ū_{i′n j′n}] = ∑_{σ,τ∈Sn} ∏_{r=1}^{n} δ_{ir i′_{σ(r)}} δ_{jr j′_{τ(r)}} WgN(τσ^{−1}).  (4.11)

In the group algebra C[Sn] one has the factorization

(N + J1) · · · (N + Jn) = ∑_{σ∈Sn} N^{#(σ)} σ.
Exercise 7. Let G ∈ C[Sn] be the function G(σ) = N^{#(σ)}. Thus as operators we have G = (N + J1) · · · (N + Jn). Show that ‖Jk‖ ≤ k − 1 and that, for N ≥ n, G is invertible in C[Sn]. Let WgN be the inverse of G.
By writing
N n WgN = (1 + N −1 J1 )−1 · · · (1 + N −1 Jn )−1
show that
N n WgN (σ ) = O(N −|σ | ).
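For small n the Weingarten function can be computed directly by inverting the Gram matrix [N^{#(σ^{−1}τ)}]_{σ,τ}, which represents convolution by G. A Python sketch (our own illustration, assuming numpy; the known values for n = 2 are Wg(e) = 1/(N² − 1) and Wg((12)) = −1/(N(N² − 1))):

```python
# Weingarten function for n = 2 by matrix inversion (an illustration).
import numpy as np
from itertools import permutations

N, n = 5, 2
Sn = list(permutations(range(n)))

def cycles(p):
    seen, c = set(), 0
    for i in range(len(p)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return c

inv = lambda p: tuple(sorted(range(len(p)), key=lambda i: p[i]))
mult = lambda p, q: tuple(p[q[i]] for i in range(len(p)))

M = np.array([[float(N) ** cycles(mult(inv(s), t)) for t in Sn] for s in Sn])
Wg = np.linalg.inv(M)[Sn.index(tuple(range(n)))]   # row indexed by the identity
print(dict(zip(Sn, Wg.round(8))))
print(1 / (N**2 - 1), -1 / (N * (N**2 - 1)))       # expected values
```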
Theorem 8. Let U_N^{(1)}, . . . , U_N^{(p)} be p independent N × N Haar unitary random matrices and let D_N^{(1)}, . . . , D_N^{(q)} be q deterministic N × N matrices with limit distribution. Then, for N → ∞,

U_N^{(1)}, U_N^{(1)∗}, . . . , U_N^{(p)}, U_N^{(p)∗}, D_N^{(1)}, . . . , D_N^{(q)} −→^{distr} u1, u1∗, . . . , up, up∗, d1, . . . , dq,

where each ui is a Haar unitary and {u1, u1∗}, . . . , {up, up∗}, {d1, . . . , dq} are free. The above convergence holds also almost surely. In particular, {U_N^{(1)}, U_N^{(1)∗}}, . . . , {U_N^{(p)}, U_N^{(p)∗}}, {D_N^{(1)}, . . . , D_N^{(q)}} are almost surely asymptotically free.
The proof proceeds in a fashion similar to the Gaussian setting and will not be
given here. We refer to [140, Lecture 23].
Note that in general if u is a Haar unitary such that {u, u∗} is free from elements {a, b}, then a and ubu∗ are free. In order to prove this, consider an alternating product

p1(a) q1(ubu∗) p2(a) q2(ubu∗) · · ·

in centred polynomials pi(a) and qi(ubu∗). Note that by the unitary condition we have qi(ubu∗) = u qi(b) u∗. Thus, by the freeness between {u, u∗} and b,

ϕ(qi(b)) = ϕ(u qi(b) u∗) = ϕ(qi(ubu∗)) = 0,

i.e. the qi(b) are centred as well. But then

ϕ( p1(a) · u · q1(b) · u∗ · p2(a) · u · q2(b) · u∗ · · · )

is zero, since {u, u∗} is free from {a, b} and ϕ vanishes on all the factors in the latter product.
Thus our Theorem 8 yields also the following as a corollary.
Theorem 9. Let $A_N$ and $B_N$ be two sequences of deterministic N × N matrices with limit distributions, and let $U_N$ be Haar unitary N × N random matrices. Then $A_N$ and $U_N B_N U_N^*$ are almost surely asymptotically free.
The reader might notice that this theorem is, strictly speaking, not a consequence
of Theorem 8, because in order to use the latter we would need the assumption that
also mixed moments in AN and BN converge to some limit; which we do not assume
in Theorem 9. However, the proof of Theorem 8, for the special case where we only
need to consider moments in which UN and UN∗ come alternatingly, reveals that we
never encounter a mixed moment in AN and BN . The structure of the Weingarten
formula ensures that they will never interact. A detailed proof of Theorem 9 can be
found in [140, Lecture 23].
Conjugation by a Haar unitary random matrix corresponds to a random rotation.
Thus the above theorem says that randomly rotated deterministic matrices become
asymptotically free in the limit of large matrix dimension. Another way of saying
this is that random matrix ensembles which are unitarily invariant (i.e., such that
the joint distribution of their entries is not changed by conjugation with any unitary
matrix) are asymptotically free from deterministic matrices.
Note that the eigenvalue distribution of BN is not changed if we consider UN BN UN∗
instead. Only the relation between AN and BN is brought into a generic form by ap-
plying a random rotation between the eigenspaces of AN and of BN .
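To illustrate this (a simulation sketch, not from the book's own experiments): take A_N and B_N diagonal with eigenvalues ±1, each with multiplicity N/2, so that both have symmetric Bernoulli limit distribution. The free convolution of two symmetric Bernoulli distributions is the arcsine law on [−2, 2], with density $1/(\pi\sqrt{4-x^2})$, and a random rotation makes this visible.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 2000                                           # even
A = np.diag(np.repeat([1.0, -1.0], N // 2))
B = np.diag(np.tile([1.0, -1.0], N // 2))
Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Q, R = np.linalg.qr(Z)
U = Q * (np.diagonal(R) / np.abs(np.diagonal(R)))  # Haar unitary
eigs = np.linalg.eigvalsh(A + U @ B @ U.conj().T)

x = np.linspace(-1.99, 1.99, 400)
plt.hist(eigs, bins=60, density=True)
plt.plot(x, 1 / (np.pi * np.sqrt(4 - x ** 2)))     # arcsine density on [-2, 2]
plt.show()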
Again one can generalize Theorems 8 and 9 by replacing the deterministic ma-
trices by random matrices, which are independent from the Haar unitary matrices
and which have an almost sure limit distribution. As outlined at the end of the last
section we will replace in Chapter 5 the assumption of almost sure convergence by
the vanishing of fluctuations, i.e., of the covariances cov[tr(·), tr(·)], at order $1/N^2$. See also our discussions in
Chapter 5 around Remark 5.26 and Theorem 5.29.
Note also that Gaussian random matrices are invariant under conjugation by uni-
tary matrices, i.e., if BN is GUE, then also UN BN UN∗ is GUE. Furthermore the fluctu-
ations of GUE random matrices vanish at the right order and hence we have almost
sure convergence to the semi-circle distribution. Thus Theorem 9 (in the version
where BN is allowed to be a random matrix ensemble with almost sure limit dis-
tribution) contains the asymptotic freeness of Gaussian random matrices and deter-
ministic random matrices (Theorem 4) as a special case.
Let $A_N$ now be such a Wigner matrix; clearly, in our algebraic frame we have to
assume that all moments of µ exist; furthermore, we have to assume that the mean
of µ is zero, and we normalize the variance of µ to be 1.
Remark 11. We want to comment on our assumption that µ has mean zero. In ana-
lytic proofs involving Wigner matrices one usually does not need this assumption.
For example, Wigner’s semi-circle law holds for Wigner matrices, even if the en-
tries have non-vanishing mean. The general case can, by using properties of weak
convergence, be reduced to the case of vanishing mean. However, in our algebraic
frame we cannot achieve this reduction. The reason for this discrepancy is that
our notion of convergence in distribution is actually stronger than weak conver-
gence in situations where mass might escape to infinity. For example, consider a
deterministic diagonal matrix DN , with a11 = N, and all other entries zero. Then
µDN = (1 − 1/N)δ0 + 1/NδN , thus µDN converges weakly to δ0 , for N → ∞. How-
ever, the second and higher moments of DN with respect to tr do not converge, thus
DN does not converge in distribution.
Another simplifying assumption we have made is that the distribution of the di-
agonal entries is the same as that of the off-diagonal entries. With a little more work
the method given here can be made to work without this assumption.
Let us now calculate mixed moments in Wigner and deterministic matrices; for deterministic N × N matrices $D_N^{(1)},\dots,D_N^{(m)}$ we have
$$E\big[\operatorname{tr}\big(D_N^{(1)} A_N \cdots D_N^{(m)} A_N\big)\big] = \frac{1}{N^{m/2+1}} \sum_{i_1,\dots,i_{2m}=1}^{N} E\big[d^{(1)}_{i_1 i_2}\, a_{i_2 i_3} \cdots d^{(m)}_{i_{2m-1} i_{2m}}\, a_{i_{2m} i_1}\big]$$
$$= \frac{1}{N^{m/2+1}} \sum_{i_1,\dots,i_{2m}=1}^{N} E\big[a_{i_2 i_3} \cdots a_{i_{2m} i_1}\big]\, d^{(1)}_{i_1 i_2} \cdots d^{(m)}_{i_{2m-1} i_{2m}}$$
$$= \frac{1}{N^{m/2+1}} \sum_{i_1,\dots,i_{2m}=1}^{N}\ \sum_{\sigma\in\mathcal P(m)} k_\sigma\big(a_{i_2 i_3}, \dots, a_{i_{2m} i_1}\big)\, d^{(1)}_{i_1 i_2} \cdots d^{(m)}_{i_{2m-1} i_{2m}}.$$
In the last step we have replaced the Wick formula for Gaussian random variables by the general expansion of moments in terms of classical cumulants. Now we use the independence of the entries of $A_N$. A cumulant in the $a_{ij}$ is only different from zero if all its arguments are the same; of course, we have to remember that $a_{ij} = a_{ji}$. (Not having to bother about complex conjugates here is the advantage of looking at real Wigner matrices.) Thus, in order that $k_\sigma(a_{i_2 i_3}, \dots, a_{i_{2m} i_1})$ is different from zero we must have: if k and l are in the same block of σ then we must have $\{i_{2k}, i_{2k+1}\} = \{i_{2l}, i_{2l+1}\}$ (counting indices modulo 2m). Note that now we do not prescribe whether $i_{2k}$ has to agree with $i_{2l}$ or with $i_{2l+1}$. In order to deal with partitions of the indices $i_1, \dots, i_{2m}$ instead of partitions of the pairs $(i_2, i_3), (i_4, i_5), \dots, (i_{2m}, i_1)$, we say that a partition π ∈ P(2m) is a lift of σ ∈ P(m) if for all k ∼σ l (with k ≠ l) we have either $2k \sim_\pi 2l$ and $2k+1 \sim_\pi 2l+1$, or $2k \sim_\pi 2l+1$ and $2k+1 \sim_\pi 2l$.
Here we are using the notation k ∼σ l to mean that k and l are in the same block of σ. Then the condition that $k_\sigma(a_{i_2 i_3}, \dots, a_{i_{2m} i_1})$ is different from zero can also be paraphrased as: ker i ≥ π, for some lift π of σ. Note that the value of $k_\sigma(a_{i_2 i_3}, \dots, a_{i_{2m} i_1})$ depends only on ker(i) because we have assumed that the diagonal and off-diagonal elements have the same distribution. Let us denote this common value by $k_{\ker(i)}$. Thus we can rewrite the equation above as
Thus we can rewrite the equation above as
h i
(1) (m)
E tr DN AN · · · DN AN
1 (1) (m)
= ∑ ∑ kker(i) di1 i2 · · · di2m−1 i2m . (4.13)
N m/2+1 σ ∈P (m) i:[2m]→[N]
ker i≥π for some lift π of σ
Note that in general there is not a unique lift of a given σ. For example, for the one block partition σ = {(1, 2, 3)} ∈ P(3) the minimal lifts in P(6) are
$$\{(1,3,5),(2,4,6)\},\quad \{(1,3,4),(2,5,6)\},\quad \{(1,2,4),(3,5,6)\},\quad \{(1,2,5),(3,4,6)\};$$
every coarsening of one of these is a lift as well. If we want to rewrite (4.13) in terms of sums of the form
$$\sum_{\substack{i:[2m]\to[N]\\ \ker i\,\ge\,\pi}} d^{(1)}_{i_1 i_2} \cdots d^{(m)}_{i_{2m-1} i_{2m}} \tag{4.14}$$
for fixed lifts π, then we have to notice that in general a multi-index i will show up with different π's; indeed, the lifts of a given σ are partially ordered by inclusion and form a poset; thus we can rewrite the sum over i with ker i ≥ π for some lift π of σ in terms of sums over fixed lifts, with some well-defined coefficients (given by the
Möbius function of this poset – see Exercise 8). However, the precise form of these
coefficients is not needed since we will show that at most one of the corresponding
sums has the right asymptotic order (namely $N^{m/2+1}$), so all the other terms will play
no role asymptotically. So our main goal will now be to examine the sum (4.14) and
show that for all π ∈ P(2m) which are lifts of σ , a term of the form (4.14) grows
in N with order at most m/2 + 1, and furthermore, this maximal order is achieved
only in the case in which σ is a non-crossing pairing and π is the standard lift of σ .
After identifying these terms we must relate them to Equation (4.9); this is achieved
in Exercise 9.
Exercise 8. Let σ be a partition of [m] and M = {π ∈ P(2m) | π is a lift of σ}. For a subset L of M, let $\pi_L = \sup_{\pi\in L}\pi$; here sup denotes the join in the lattice of all partitions. Use the principle of inclusion-exclusion to show that
$$\sum_{\substack{i:[2m]\to[N]\\ \ker i\,\ge\,\pi\ \text{for some}\ \pi\in M}} d^{(1)}_{i_1 i_2} \cdots d^{(m)}_{i_{2m-1} i_{2m}} = \sum_{\emptyset\neq L\subseteq M} (-1)^{|L|-1} \sum_{\substack{i:[2m]\to[N]\\ \ker i\,\ge\,\pi_L}} d^{(1)}_{i_1 i_2} \cdots d^{(m)}_{i_{2m-1} i_{2m}}.$$
Let us first note that, because of our assumption that the entries of the Wigner
matrices have vanishing mean, first-order cumulants are zero and thus only those
σ which have no singletons will contribute to (4.13). This implies the same prop-
erty for the lifts and in (4.14) we can restrict ourselves to considering π without
singletons.
It turns out that it is convenient to associate to π a graph Gπ . Let us start with
the directed graph Γ2m with 2m vertices labelled 1, 2, . . . , 2m and directed edges
(1, 2), (3, 4), . . . , (2m − 1, 2m); (2i − 1, 2i) starts at 2i and goes to 2i − 1. Given a
π ∈ P(2m) we obtain a directed graph Gπ by identifying the vertices which belong
to the same block of π. We will not identify the edges (actually, the direction of two
edges between identified vertices might even not be the same) so that Gπ will in
general have multiple edges, as well as loops. The sum (4.14) can then be rewritten
in terms of the graph $G = G_\pi$ as
$$S_G(N) := \sum_{i:V(G)\to[N]}\ \prod_{e\in E(G)} d^{(e)}_{i_{t(e)},\,i_{s(e)}}, \tag{4.15}$$
where we sum over all functions i : V(G) → [N], and for each such function we take the product of $d^{(e)}_{i_{t(e)}, i_{s(e)}}$ as e runs over all the edges of the graph, and s(e) and t(e) denote, respectively, the source and terminus of the edge e. Note that we keep all edges under the identification according to π; thus the m matrices $D^{(1)}, \dots, D^{(m)}$ in (4.14) show up in (4.15) as the various $D_e$ for the m edges of $G_\pi$. See Fig. 4.6.
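As a concrete check (a sketch; the matrices chosen here are ours), the graph sum of Fig. 4.6 can be evaluated with numpy's einsum, and its growth compared with the bound $N^{3/2}$ coming from Theorem 14 below; orthogonal matrices are used so that every $\|D_e\| = 1$.

import numpy as np

rng = np.random.default_rng(0)

def graph_sum(D1, D2, D3):
    # S_pi(N) = sum_{i,j,k,l} d1_{ij} d2_{jk} d3_{jl}  (the graph of Fig. 4.6)
    return np.einsum('ij,jk,jl->', D1, D2, D3)

for N in [50, 100, 200, 400]:
    Ds = [np.linalg.qr(rng.standard_normal((N, N)))[0] for _ in range(3)]
    print(N, abs(graph_sum(*Ds)) / N ** 1.5)   # stays bounded: r(G_pi) = 3/2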
What we have to understand about such graph sums is their asymptotic behaviour
as N → ∞. This problem has a nice answer for arbitrary graphs, namely one can
estimate such graph sums (4.15) in terms of the norms of the matrices corresponding
to the edges and properties of the graph G. The relevant feature of the graph is the
structure of its two-edge connected components.
Definition 12. A cutting edge of a connected graph is an edge whose removal would
disconnect the graph. A connected graph is two-edge connected if it does not contain
a cutting edge, i.e., if it cannot be disconnected by the removal of an edge. A two-
edge connected component of a graph is a two-edge connected subgraph which is
not properly contained in a larger two-edge connected subgraph.
A forest is a graph without cycles. A tree is a connected component of a forest,
i.e., a connected graph without cycles. A tree is trivial if it consists of only one
vertex. A leaf of a non-trivial tree is a vertex which meets only one edge. The sole
vertex of a trivial tree will also be called a trivial leaf.
We can now state the main theorem on estimates for graph sums. The special case
for two-edge connected graphs goes back to the work of Yin and Krishnaiah [206],
see also the book of Bai and Silverstein [15]. The general case, which is stronger
than the corresponding statement in [206, 15], is proved in [130].
Theorem 14. Let G be a directed graph, possibly with multiple edges and loops. Suppose that for each edge e of G we are given an N × N matrix $D_e = (d^{(e)}_{ij})_{i,j=1}^N$. Then the associated graph sum (4.15) satisfies
$$|S_G(N)| \le N^{r(G)} \prod_{e\in E(G)} \|D_e\|,$$
where r(G) is determined as follows from the structure of the graph G. Let F(G) be the forest of two-edge connected components of G. Then
$$r(G) = \sum_{l\ \text{leaf of}\ F(G)} r(l), \qquad\text{where}\qquad r(l) := \begin{cases} 1, & \text{if } l \text{ is a trivial leaf,}\\ \tfrac12, & \text{if } l \text{ is a leaf of a non-trivial tree.} \end{cases}$$
Note that each tree of the forest F(G) makes at least a contribution of 1 in r(G),
because a non-trivial tree has at least two leaves. One can also make the description
above more uniform by having a factor 1/2 for each leaf, but then counting a trivial
leaf as two actual leaves. Note also that the direction of the edges plays no role for
Fig. 4.6 On the left we have $\Gamma_6$. We let π be the partition of [6] with blocks {(1, 4, 6), (2), (3), (5)}. The graph on the right is $G_\pi$. We have $S_\pi(N) = \sum_{i,j,k,l} d^{(1)}_{ij}\, d^{(2)}_{jk}\, d^{(3)}_{jl}$ and $r(G_\pi) = 3/2$.
the estimate above. The direction of an edge is only important in order to define the
contribution of an edge to the graph sum. One direction corresponds to the matrix
$D_e$, the other direction corresponds to the transpose $D_e^t$. Since the norm of a matrix
is the same as the norm of its transpose, the estimate is the same for all graph sums
which correspond to the same undirected graph.
Let us now apply Theorem 14 to Gπ . We have to show that r(Gπ ) ≤ m/2 + 1 for
our graphs Gπ , π ∈ P(2m). Of course, for general π ∈ P(2m) this does not need to
be true. For example, if π = {(1, 2), (3, 4), . . . , (2m − 1, 2m)} then Gπ consists of m
isolated points and thus r(Gπ ) = m. Clearly, we have to take into account that we
can restrict in (4.13) to lifts of a σ without singletons.
Definition 15. Let G = (V, E) be a graph and w1 , w2 ∈ V . Let us consider the graph
G0 obtained by merging the two vertices w1 and w2 into a single vertex w. This
means that the vertices V 0 of G0 are (V \ {w1 , w2 }) ∪ {w}. Also each edge of G
becomes an edge of G0 , except that if the edge started (or ended) at w1 or w2 then
the corresponding edge of G0 starts (or ends) at w.
Lemma 16. Suppose π1 and π2 are partitions of [2m] and π1 ≤ π2 . Then r(Gπ2 ) ≤
r(Gπ1 ).
Proof: We only have to consider the case where π2 is obtained from π1 by joining
two blocks w1 and w2 of π1 , and then use induction.
We have to consider three cases. Let C1 and C2 be the two-edge connected com-
ponents of Gπ1 containing w1 and w2 respectively. Recall that r(Gπ1 ) is the sum of
the contributions of each connected component and the contribution of a connected
component is either 1 or one half the number of leaves in the corresponding tree of
F(Gπ1 ), whichever is larger.
Case 1. Suppose the connected component of Gπ1 containing w1 is two-edge con-
nected, i.e. C1 becomes the only leaf of a trivial tree in F(Gπ1 ). Then the contribution
of this component to r(Gπ1 ) is 1. If w2 is in C1 then merging w1 and w2 has no effect
on r(Gπ1 ) and thus r(Gπ1 ) = r(Gπ2 ). If w2 is not in C1 , then C1 gets joined to some
Fig. 4.7 If $w_1$ and $w_2$ are in the same connected component of $G_{\pi_1}$ but in different two-edge connected components, say $C_1$ and $C_2$, we collapse the edges (shown here shaded) joining $C_1$ to $C_2$ in $F(G_{\pi_1})$. (See Case 3 in the proof of Lemma 16.)
Fig. 4.8 If we remove the vertex v from a graph we replace the edges $e_1$ and $e_2$ by the edge e. (See Definition 17.)
other connected component of Gπ1 , which will leave the contribution of this other
component unchanged. In this latter case we shall have r(Gπ2 ) = r(Gπ1 ) − 1.
For the rest of the proof we shall assume that neither w1 nor w2 lies in a connected
component of Gπ1 which has only one two-edge connected component.
Case 2. Suppose w1 and w2 lie in different connected components of Gπ1 . When w1
and w2 are merged the corresponding two-edge connected components are joined.
If either of these corresponded to a leaf in F(Gπ1 ) then the number of leaves would
be reduced by 1 or 2 (depending on whether both two-edge components were leaves
in $F(G_{\pi_1})$). Hence $r(G_{\pi_2})$ is either $r(G_{\pi_1}) - 1/2$ or $r(G_{\pi_1}) - 1$.
Case 3. Suppose that both w1 and w2 are in the same connected component of Gπ1 .
Then the two-edge connected components C1 and C2 become vertices of a tree T in
F(Gπ1 ) (see Fig. 4.7). When we merge w1 and w2 we form a two-edge connected
component C of Gπ2 , which consists of all the two-edge connected components
corresponding to the vertices of T along the unique path from C1 to C2 . On the level
of T this corresponds to collapsing all the edges between C1 and C2 into a single
vertex. This may reduce the number of leaves by 0, 1, or 2. If there were only two
leaves, we might end up with a single vertex but the contribution to r(Gπ1 ) would
still not increase. Thus r(Gπ1 ) can only decrease.
Definition 17. Let G be a directed graph and let v be a vertex of G. Suppose that v
has one incoming edge e1 and one outgoing edge e2 . Let G0 be the graph obtained
by removing e1 , e2 and v and replacing these with an edge e from s(e1 ) to t(e2 ). We
say that G0 is the graph obtained from G by removing the vertex v. See Fig. 4.8.
We say that the degree of a vertex is the number of edges to which it is incident,
using the convention that a loop contributes 2. The total degree of a subgraph is the
sum of the degrees of all its vertices.
Using the usual order on partitions of [2m], we say that a partition π is a minimal lift of σ if there is no lift of σ which is strictly smaller than π.
Proof: Since σ has no singletons, each block of σ contains at least two elements
and thus each block of the lift π contains at least two points. Thus every vertex of
Gπ has degree at least 2. So a two-edge connected component with total degree less
than 3 must consist of a single vertex. Moreover if this vertex has distinct incoming
and outgoing edges then this two-edge connected component cannot become a leaf
in F(Gπ ). Thus Gπ has a two-edge connected component C which consists of a
vertex with a loop. Moreover C will also be a connected component. Since an edge
always goes from 2k − 1 to 2k, π must have a block consisting of the two elements
2k − 1 and 2k. Since π is a lift of σ , σ must have the block (k − 1, k). Since π is a
minimal lift of σ , π has the two blocks (2k − 2, 2k + 1), (2k − 1, 2k). This proves (i)
and (ii).
Now π′ is a minimal lift of σ′ because π was minimal on all the other blocks of σ. Also the block (2k − 2, 2k + 1) corresponds to a vertex of $G_\pi$ with one incoming edge and one outgoing edge. Thus by removing this block from π we remove a vertex from $G_\pi$, as described in Definition 17. Hence $G_{\pi'}$ is obtained from $G_\pi$ by removing the connected component C and the vertex (2k − 2, 2k + 1).
Finally, the contribution of C to $r(G_\pi)$ is 1. If the connected component, C′, of $G_\pi$ containing the vertex (2k − 2, 2k + 1) has only one other vertex, which would have to be (2k − 3, 2k + 2), the contribution of this component to $r(G_\pi)$ will be 1 and $G_{\pi'}$ will have as a connected component this vertex (2k − 3, 2k + 2) with a loop, whose contribution to $r(G_{\pi'})$ will still be 1. On the other hand, if C′ has more than one other vertex then the number of leaves will not be diminished when the vertex (2k − 2, 2k + 1) is removed and thus also in this case the contribution of C′ to $r(G_\pi)$ is unchanged. Hence in both cases $r(G_\pi) = r(G_{\pi'}) + 1$.
Lemma 19. Consider σ ∈ P(m) without singletons and let π ∈ P(2m) be a lift of σ. Then we have for the corresponding graph $G_\pi$ that
$$r(G_\pi) \le \frac{m}{2} + 1, \tag{4.17}$$
Fig. 4.9 If σ = {(1, 2)} there are two possible minimal lifts: $\pi_1$ = {(1, 2), (3, 4)} and $\pi_2$ = {(1, 3), (2, 4)}. We show $G_{\pi_1}$ on the left and $G_{\pi_2}$ on the right. The graph sum for $\pi_1$ is $\operatorname{Tr}(D_1)\operatorname{Tr}(D_2)$ and the graph sum for $\pi_2$ is $\operatorname{Tr}(D_1 D_2^t)$. (See the conclusion of the proof of Lemma 19.)
and we have equality if and only if σ is a non-crossing pairing and π the corresponding standard lift:
$$k \sim_\sigma l \iff 2k \sim_\pi 2l+1 \ \text{ and }\ 2k+1 \sim_\pi 2l.$$
Proof: By Lemma 16 we may suppose that π is a minimal lift of σ . Let the con-
nected components of Gπ be C1 , . . . ,C p . Let the number of edges in Ci be mi , and
the number of leaves in the tree of F(Gπ ) corresponding to Ci be li . The contribution
of Ci to r(Gπ ) is ri = max{1, li /2}.
Suppose σ has no blocks of the form (k − 1, k). Then by Lemma 18 each two-
edge connected component of Gπ which becomes a leaf in F(Gπ ) must have total
degree at least 3. Thus mi ≥ 2 for each i. Moreover the contribution of each leaf to
the total degree must be at least 3. Thus 3li ≤ 2mi . If li ≥ 2 then ri = li /2 ≤ mi /3. If
li = 1 then, as mi ≥ 2, we have ri = 1 ≤ mi /2. So in either case ri ≤ mi /2. Summing
over all components we have r(Gπ ) ≤ m/2.
If σ does contain a block of the form (k − 1, k) and π the blocks (2k − 2, 2k + 1), (2k − 1, 2k), then we may repeatedly remove these blocks from σ and π until we reach σ′ and π′ such that either: (a) σ′ contains no blocks which are a pair of adjacent elements; or (b) σ′ = {(1, 2)} (after renumbering) and π′ is a minimal lift of σ′. In either case, by Lemma 18, $r(G_\pi) = r(G_{\pi'}) + q$ where q is the number of times we have removed a pair of adjacent elements of σ.
In case (a), we have by the earlier part of the proof that $r(G_{\pi'}) \le m'/2$. Thus $r(G_\pi) = r(G_{\pi'}) + q \le m'/2 + q = m/2$.
In case (b) we have that σ′ = {(1, 2)} and either π′ = {(1, 2), (3, 4)} (π′ is standard) or π′ = {(1, 3), (2, 4)} (π′ is not standard). In the first case, see Fig. 4.9, $G_{\pi'}$ has two vertices, each with a loop, and so $r(G_{\pi'}) = 2 = m'/2 + 1$, and hence $r(G_\pi) = q + m'/2 + 1 = m/2 + 1$. In the second case $G_{\pi'}$ is two-edge connected and so $r(G_{\pi'}) = 1 = m'/2$, and hence $r(G_\pi) = q + m'/2 = m/2$. So we can only have $r(G_\pi) = m/2 + 1$ when σ is a non-crossing pairing and π is standard; in all other cases we have $r(G_\pi) \le m/2$.
Equipped with this lemma the investigation of the asymptotic freeness of Wigner
matrices and deterministic matrices is now quite straightforward. Lemma 19 shows
that the sum (4.14) has at most the order N m/2+1 and that the maximal order is
4.4 Wigner and deterministic random matrices 121
achieved exactly for σ which are non-crossing pairings and for π which are the
corresponding standard lifts. But for those we get in (4.13) the same contribution
as for Gaussian random matrices. The other terms in (4.13) will vanish, as long as
we have uniform bounds on the norms of the deterministic matrices. Thus the result
for Wigner matrices is the same as for Gaussian matrices, provided we assume a
uniform bound on the norm of the deterministic matrices.
Moreover the foregoing arguments can be extended to several independent Wigner
matrices. Thus we have proved the following theorem.
Theorem 20. Let $A_N^{(1)},\dots,A_N^{(p)}$ be p independent N × N Wigner matrices (whose entry distributions have all moments, mean zero, and variance one) and let $D_N^{(1)},\dots,D_N^{(q)}$ be deterministic N × N matrices with limit distribution and uniformly bounded operator norms. Then, as N → ∞,
$$A_N^{(1)},\dots,A_N^{(p)},\,D_N^{(1)},\dots,D_N^{(q)} \xrightarrow{\ \mathrm{distr}\ } s_1,\dots,s_p,\,d_1,\dots,d_q,$$
where each $s_i$ is semi-circular and $s_1, \dots, s_p, \{d_1, \dots, d_q\}$ are free.
By estimating the variance of the traces one can show that one also has almost sure convergence in the above theorem; also, one can extend those statements to random matrices $D_N^{(k)}$ which are independent from the Wigner matrices, provided one assumes the almost sure version of a limit distribution and of the norm boundedness condition. We leave the details to the reader.
Exercise 10. Show that under the same assumptions as in Theorem 20 one can bound the variance of the trace of a word in Wigner and deterministic matrices as
$$\operatorname{var}\big[\operatorname{tr}\big(D_N^{(1)} A_N \cdots D_N^{(m)} A_N\big)\big] \le \frac{C}{N^2},$$
where C is a constant, depending on the word.
Show that this implies that Wigner matrices and deterministic matrices are almost surely asymptotically free under the assumptions of Theorem 20.
Exercise 11. State (and possibly prove) the version of Theorem 20, where the $D_N^{(1)}, \dots, D_N^{(q)}$ are allowed to be random matrices.
Fig. 4.10 On the left we have the eigenvalue distribution of a Wishart random matrix with N = 100
and M = 200 averaged over 3000 instances and on the right we have one instance with N = 2000
and M = 4000. The solid line is the graph of the density of the limiting distribution.
Besides the Gaussian random matrices the most important random matrix ensembles are the Wishart random matrices [203]. They are of the form $A = \frac{1}{N} X X^*$, where X is an N × M random matrix with independent Gaussian entries. There are two forms: a complex case, where the entries $x_{ij}$ are standard complex Gaussian random variables with mean 0 and $E(|x_{ij}|^2) = 1$; and a real case, where the entries are real-valued Gaussian random variables with mean 0 and variance 1. Again, one has almost sure convergence to a limiting eigenvalue distribution (which is the same in both cases), if one sends N and M to infinity in such a way that the ratio M/N is kept fixed. Fig. 4.10 above shows the eigenvalue histograms with M = 2N, for N = 100 and N = 2000. For N = 100 we have averaged over 3000 realizations.
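A sketch of such a simulation (with our normalization $A = XX^*/N$ and c = M/N = 2; the curve below is the Marchenko-Pastur density for c ≥ 1):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N, M = 2000, 4000                # c = M/N = 2
X = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
A = (X @ X.conj().T) / N
eigs = np.linalg.eigvalsh(A)

c = M / N
lo, hi = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(lo, hi, 400)
plt.hist(eigs, bins=60, density=True)
plt.plot(x, np.sqrt((x - lo) * (hi - x)) / (2 * np.pi * x))  # MP density, c >= 1
plt.show()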
By similar calculations as for the Gaussian random matrices one can show that in the limit N, M → ∞ such that the ratio M/N → c, for some 0 < c < ∞, the asymptotic averaged eigenvalue distribution is given by the Marchenko-Pastur distribution with rate c. The starting point is to write
$$E\big(\operatorname{Tr}(A^k)\big) = \frac{1}{N^k} \sum_{i_1,\dots,i_k=1}^{N}\ \sum_{i_{-1},\dots,i_{-k}=1}^{M} E\big(x_{i_1 i_{-1}}\,\bar x_{i_2 i_{-1}} \cdots x_{i_k i_{-k}}\,\bar x_{i_1 i_{-k}}\big).$$
Then use Exercise 1.7 to show that, in the case of standard complex Gaussian entries for X, we have the "genus expansion"
$$E\big(\operatorname{tr}(A^k)\big) = \sum_{\sigma\in S_k} N^{\#(\sigma)+\#(\gamma_k\sigma^{-1})-(k+1)}\,\Big(\frac{M}{N}\Big)^{\#(\sigma)}. \tag{4.18}$$
Let us now consider the sum of random matrices. If the two matrices are asymptot-
ically free then we can apply the R-transform machinery for calculating the asymp-
totic distribution of their sum. Namely, for each of the two matrices we calculate the
Cauchy transform of their asymptotic eigenvalue distribution, and from this their
R-transform. Then the sum of the R-transforms gives us the R-transform of the sum
of the matrices, and from there we can go back to the Cauchy transform and, via
the Stieltjes inversion theorem, to the density of the sum.
Fig. 4.12 On the left we display the averaged eigenvalue distribution for 3000 realizations of the
sum of a GUE and a complex Wishart random matrix with M = 200 and N = 100. On the right
we display the eigenvalue distribution of a single realization of the sum of a GUE and a complex
Wishart random matrix with M = 8000 and N = 4000.
Example 22. Consider now independent GUE and Wishart matrices. They are asymp-
totically free, thus the asymptotic eigenvalue distribution of their sum is given by
the free convolution of a semi-circle and a Marchenko-Pastur distribution.
Fig. 4.12 shows the agreement (for c = 2) between numerical simulations and
the predicted distribution using the R-transform. The first is averaged over 3000
realizations with N = 100, and the second is one realization for N = 4000.
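The right panel of Fig. 4.12 can be reproduced with a few lines (a sketch; we only draw the histogram and do not recompute the R-transform prediction here):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N, M = 1000, 2000                                       # Wishart with c = 2
Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
G = (Z + Z.conj().T) / np.sqrt(2 * N)                   # GUE, E|g_ij|^2 = 1/N
X = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
W = (X @ X.conj().T) / N
plt.hist(np.linalg.eigvalsh(G + W), bins=60, density=True)
plt.show()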
One can also rewrite the combinatorial description (2.23) of the product of free
variables into an analytic form. The following theorem gives this version in terms
of Voiculescu’s S-transform [178]. For more details and a proof of that theorem we
refer to [140, Lecture 18].
Theorem 23. Let a be a random variable with ϕ(a) ≠ 0, and let $M_a(z) := \sum_{m\ge 1} \varphi(a^m)\,z^m$ denote its moment series. Define its S-transform by
$$S_a(z) := \frac{1+z}{z}\, M_a^{\langle -1\rangle}(z),$$
where $M^{\langle -1\rangle}$ denotes the inverse under composition of M. Then: if a and b are free, we have $S_{ab}(z) = S_a(z)\cdot S_b(z)$.
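As an illustration of how Theorem 23 is used (a standard computation, not spelled out in the text): for a Marchenko-Pastur element a of rate c one has $M_a(z) = cz + (c+c^2)z^2 + \cdots$, and inverting this power series gives
$$S_a(z) = \frac{1}{c+z}.$$
Hence for two free Marchenko-Pastur elements a and b of the same rate c we get $S_{ab}(z) = (c+z)^{-2}$, which can then be inverted to recover the density appearing in Fig. 4.13 below.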
Again, this allows one to do analytic calculations for the asymptotic eigenvalue distribution of a product of asymptotically free random matrices. One should note in this context that the product of two self-adjoint matrices is in general not self-adjoint, thus it is not clear why all its eigenvalues should be real. (If they are not real then the S-transform does not contain enough information to recover the eigenvalues.) However, if one makes the restriction that at least one of the matrices has
Fig. 4.13 The eigenvalue distribution of the product of two independent complex Wishart matrices.
On the left we have one realization with N = 100 and M = 500. On the right we have one realization
with N = 2000 and M = 10000. See Example 24.
positive spectrum, then, because the eigenvalues of AB are the same as those of the
self-adjoint matrix B1/2 AB1/2 , one can be sure that the eigenvalues of AB are real as
well, and one can use the S-transform to recover them. One should also note that a priori the S-transform of a is only defined if ϕ(a) ≠ 0. However, by allowing formal power series in $\sqrt{z}$ one can also extend the definition of the S-transform to the case where ϕ(a) = 0, ϕ(a²) > 0. For more on this, and the corresponding version of Theorem 23 in that case, see [144].
Example 24. Consider two independent Wishart matrices. They are asymptotically
free; this follows either by the fact that a Wishart matrix is unitarily invariant or, al-
ternatively, by an easy generalization of the genus expansion from (4.18) to the case
of several independent Wishart matrices. So the asymptotic eigenvalue distribution
of their product is given by the distribution of the product of two free Marchenko-
Pastur distributions.
As an example consider two independent Wishart matrices for c = 5. Fig. 4.13
compares simulations with the analytic formula derived from the S-transform. The
first is one realization for N = 100 and M = 500, the second is one realization for
N = 2000 and M = 10000.
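A sketch of the corresponding simulation; since both matrices are positive semi-definite, we histogram the eigenvalues of $B^{1/2}AB^{1/2}$, which are those of AB:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def wishart(N, M):
    X = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    return (X @ X.conj().T) / N

N, M = 1000, 5000                          # c = 5
A, B = wishart(N, M), wishart(N, M)
w, V = np.linalg.eigh(B)
Broot = (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T   # B^{1/2}
plt.hist(np.linalg.eigvalsh(Broot @ A @ Broot), bins=60, density=True)
plt.show()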
Chapter 5
Fluctuations and Second-Order Freeness
Fig. 5.1 On the left is a histogram of the eigenvalues of an instance of a 50 × 50 GUE random ma-
trix. The tick marks at the bottom show the actual eigenvalues. On the right we have independently
sampled a semi-circular distribution 50 times. We can see that the spacing is more ‘uniform’ in the
eigenvalue plot (on the left). The fluctuation moments are a way of measuring this quantitatively.
We saw earlier that freeness allows us to find the limiting distributions of XN +YN
or XN YN provided we know the limiting distributions of XN and YN individually and
XN and YN are asymptotically free. The theory of second-order freeness, which was
developed in [59, 128, 129], provides an analogous machinery for calculating the
fluctuations of sums and products from those of the constituent matrices, provided
one has asymptotic second order freeness.
We want to emphasize that on the level of fluctuations the theory is less robust
than on the level of expectations. In particular, whereas on the first order level it
does not make any difference for most results whether we consider real or complex
random matrices, this is not true any more for second order. What we are going to
present here is the theory of second-order freeness for complex random matrices
(modelled according to the GUE). There exists also a real second-order freeness
theory (modelled according to the GOE, i.e., Gaussian orthogonal ensemble); the
general structure of the real theory is the same as in the complex case, but details
are different. In particular, in the real case there will be additional contributions in
the combinatorial formulas, which correspond to non-orientable surfaces. We will
not say more on the real case, but refer to [127, 147].
The Chebyshev polynomials of the first kind are defined by the relation $T_n(\cos\theta) = \cos n\theta$. They are the orthogonal polynomials on [−1, 1] with respect to the arc-sine law $\pi^{-1}(1-x^2)^{-1/2}\,dx$. Rescaling to the interval [−2, 2] means using the measure $\pi^{-1}(4-x^2)^{-1/2}\,dx$ and setting $C_n(x) = 2\,T_n(x/2)$. We thus have
$$C_0(x) = 2,\quad C_1(x) = x,\quad C_2(x) = x^2 - 2,$$
$$C_3(x) = x^3 - 3x,\quad C_4(x) = x^4 - 4x^2 + 2,\quad C_5(x) = x^5 - 5x^3 + 5x,$$
and for n ≥ 1, $C_{n+1}(x) = x\,C_n(x) - C_{n-1}(x)$.
The reader will be asked to prove some of the above mentioned properties of Cn (as
well as corresponding properties of the second kind analogue Un ) in Exercise 12.
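The defining relation and the recurrence are easy to check numerically; a small sketch:

import numpy as np

def C(n, x):
    # rescaled Chebyshev polynomials: C_0 = 2, C_1 = x, C_{n+1} = x C_n - C_{n-1}
    c_prev, c_cur = 2.0 * np.ones_like(x), x
    if n == 0:
        return c_prev
    for _ in range(n - 1):
        c_prev, c_cur = c_cur, x * c_cur - c_prev
    return c_cur

theta = np.linspace(0.1, 3.0, 7)
for n in range(6):
    # C_n(2 cos(theta)) = 2 cos(n theta), since C_n(x) = 2 T_n(x/2)
    assert np.allclose(C(n, 2 * np.cos(theta)), 2 * np.cos(n * theta))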
We will provide a proof of Theorem 1 at the end of this chapter, see Section 5.6.1.
Recall that in the case of first order freeness the moments of the GUE had a
combinatorial interpretation in terms of planar diagrams. These diagrams led to the
notion of free cumulants and the R-transform, which unlocked the whole theory.
For the GUE the moments {αk }k of the limiting eigenvalue distribution are 0 for k
odd and the Catalan numbers for k even. For example when k = 6, α6 = 5, the third
Catalan number, and the corresponding diagrams are the five non-crossing pairings
on [6].
Definition 2. Let {XN }N be a sequence of random matrices. We say that {XN }N has
a second-order limiting distribution if there are sequences {αk }k and {α p,q } p,q such
that
◦ for all k, $\alpha_k = \lim_N E(\operatorname{tr}(X_N^k))$;
◦ for all p ≥ 1 and q ≥ 1,
$$\alpha_{p,q} = \lim_N k_2\big(\operatorname{Tr}(X_N^p), \operatorname{Tr}(X_N^q)\big);$$
◦ and, for all r ≥ 3 and all $p_1, \dots, p_r \ge 1$,
$$\lim_N k_r\big(\operatorname{Tr}(X_N^{p_1}), \dots, \operatorname{Tr}(X_N^{p_r})\big) = 0.$$
Here, kr are the classical cumulants; note that the αk are the limits of k1 (which
is the expectation) and α p,q are the limits of k2 (which is the covariance).
Remark 3. Note that the first condition says that XN has a limiting eigenvalue dis-
tribution in the averaged sense. By the second condition the variances of normal-
ized traces go asymptotically like $1/N^2$. Thus, by Remark 4.2, the existence of a
second-order limiting distribution implies actually almost sure convergence to the
limit distribution.
We shall next show that the GUE has a second-order limiting distribution. The
numbers {α p,q } p,q that are obtained have an important combinatorial significance
as the number of non-crossing annular pairings. Informally, a pairing of the (p, q)-
annulus is non-crossing or planar if when we arrange the numbers 1, 2, 3, . . . , p
in clockwise order on the outer circle and the numbers p + 1, . . . , p + q in counter-
clockwise order on the inner circle there is a way to draw the pairings so that the
lines do not cross and there is at least one string that connects the two circles. For
example α4,2 = 8 and the eight drawings are shown below.
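The value $\alpha_{4,2} = 8$ can also be verified by brute force (a sketch; the function names are ours). Anticipating the Euler formula (5.5) below: a pairing π of [p + q] with at least one through-string is non-crossing on the (p, q)-annulus precisely when $\#(\pi) + \#(\pi^{-1}\gamma_{p,q}) = p + q$.

def cycle_count(perm):
    # perm: dict mapping [n] -> [n]
    seen, count = set(), 0
    for start in perm:
        if start not in seen:
            count += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def annular_pairings(p, q):
    n = p + q
    gamma = {i: (i % p) + 1 for i in range(1, p + 1)}
    gamma.update({i: p + 1 + (i - p) % q for i in range(p + 1, n + 1)})

    def pairings(elems):
        if not elems:
            yield []
            return
        first, rest = elems[0], elems[1:]
        for k, other in enumerate(rest):
            for rec in pairings(rest[:k] + rest[k + 1:]):
                yield [(first, other)] + rec

    count = 0
    for pr in pairings(list(range(1, n + 1))):
        pi = {a: b for a, b in pr} | {b: a for a, b in pr}
        through = any(a <= p < b for a, b in pr)
        pig = {i: pi[gamma[i]] for i in pi}   # pi^{-1} gamma (pi is an involution)
        if through and cycle_count(pi) + cycle_count(pig) == n:
            count += 1
    return count

print(annular_pairings(4, 2))   # 8
print(annular_pairings(2, 2))   # 2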
Theorem 4. Let $\gamma_n$ denote the permutation in $S_n$ which has the one cycle (1, 2, 3, . . . , n). For all π ∈ $S_n$ we have
$$\#(\pi) + \#(\pi^{-1}\gamma_n) \le n + 1,$$
and equality holds if and only if π is non-crossing with respect to $\gamma_n$.
Definition 5. The (p, q)-annulus is the annulus with the integers 1 to p arranged
clockwise on the outside circle and p + 1 to p + q arranged counterclockwise on the
inner circle. A permutation π in S p+q is a non-crossing permutation on the (p, q)-
annulus (or just: a non-crossing annular permutation) if we can draw the cycles of
π between the circles of the annulus so that
(i) the cycles do not cross,
(ii) each cycle encloses a region between the circles homeomorphic to the disc with
boundary oriented clockwise, and
(iii) at least one cycle connects the two circles.
We denote by SNC (p, q) the set of non-crossing permutations on the (p, q)-annulus.
The subset consisting of non-crossing pairings on the (p, q)-annulus is denoted by
NC2 (p, q).
But for π2 we can find a drawing satisfying one of (i) or (ii) but not both. Notice also
that if we try to draw π1 on a disc we will have a crossing, so π1 is non-crossing on
the annulus but not on the disc. See also Fig. 5.2.
Notice that when we have a partition π of [n] and we want to know if π is non-
crossing in the disc sense, property (ii) of Definition 5 is automatic because we
always put the elements of the blocks of π in increasing order.
Remark 7. Note that in general we have to distinguish between non-crossing an-
nular permutations and the corresponding partitions. On the disc the non-crossing
condition ensures that for each π ∈ NC(n) there is exactly one corresponding non-
crossing permutation (by putting the elements in a block of π in increasing order to
read it as a cycle of a permutation). On the annulus, however, this one-to-one cor-
respondence breaks down. Namely, if π ∈ SNC (p, q) has only one through-cycle (a
through-cycle is a cycle which contains elements from both circles), then the block
structure of this cycle is not enough to recover its cycle structure. For example, in
$S_{NC}(2, 2)$ we have four different non-crossing annular permutations with one through-cycle. As partitions all four are the same, having one block {1, 2, 3, 4}; but as permutations
they are all different. It is indeed the permutations, and not the partitions, which are
relevant for the description of the fluctuations. One should, however, also note that
this difference disappears if one has more than one through-cycle. Also for pairings
there is no difference between non-crossing annular permutations and partitions.
This justifies the notation NC2 (p, q) in this case.
Exercise 1. (i) Let π1 and π2 be two non-crossing annular permutations in SNC (p, q),
which are the same as partitions. Show that if they have more than one through-
cycle, then π1 = π2 .
(ii) Show that the number of non-crossing annular permutations which are the
same as partitions is, in the case of one through-cycle, given by mn, where m and n
are the number of elements of the through-cycle on the first and the second circle,
respectively.
Fig. 5.2 Consider the permutation π = (1, 5)(2, 6)(3, 4, 7, 8). As a disc permutation, it cannot be drawn in a non-crossing way. However on the (5, 3)-annulus it has a non-crossing presentation. Note that we have $\pi^{-1}\gamma_{5,3} = (1, 6, 4)(2, 8)(3)(5)(7)$. So $\#(\pi) + \#(\pi^{-1}\gamma_{5,3}) = 8$.
Hence our π is non-crossing in the disc; however, the order of the points on the
disc produced by cutting the annulus is not the standard order – it is the order given
by
$$\tilde\gamma = \gamma\,(i,\ \gamma^{-1}\pi(i)) = (1, \dots, i, \pi(i), \gamma(\pi(i)), \dots, p+q,\ p+1, \dots, \gamma^{-1}(\pi(i)), \gamma(i), \dots, p).$$
Thus we must show that for i and π(i) on different circles the following are equiva-
lent
(a) π is non-crossing in the disc with respect to γ̃ = γ (i, γ −1 π(i)), and
(b) #(π) + #(π −1 γ) = p + q.
If i and π(i) are in different cycles of γ, then i and π −1 γ(i) are in the same
cycle of π −1 γ. Hence #(π −1 γ (i, π −1 γ(i))) = #(π −1 γ) + 1. Thus #(π) + #(π −1 γ̃)
= #(π) + #(π −1 γ) + 1. Since γ̃ has only one cycle we know, by Theorem 4, that π
is non-crossing with respect to γ̃ if and only if #(π) + #(π −1 γ̃) = p + q + 1. Thus π
is non-crossing with respect to γ̃ if and only if #(π) + #(π −1 γ) = p + q. This shows
the equivalence of (a) and (b).
This result is part of a more general theory of maps on surfaces found by Jacques
[102] and Cori [61]. Suppose we have two permutations π and γ in Sn and that π
and γ generate a subgroup of Sn that acts transitively on [n]. Suppose also that γ has
k cycles and we draw k discs on a surface of genus g and arrange the points in the
cycles of γ around the circles so that when viewed from the outside the numbers
appear in the same order as in the cycles of γ. We then draw the cycles of π on the
surface such that
◦ the cycles do not cross, and
◦ each cycle of π is the oriented boundary of a region on the sphere, oriented with
an outward pointing normal, homeomorphic to a disc.
The genus of π relative to γ is the smallest g such that the cycles of π can be drawn
on a surface of genus g. When g = 0, i.e. we can draw π on a sphere, we say that π
is γ-planar.
In the example below we let n = 3, γ = (1, 2, 3) and, in the first example π1 =
(1, 2, 3) and in the second π2 = (1, 3, 2).
Since π1 and π2 have only one cycle there is no problem with the blocks crossing;
it is only to get the correct orientation that we must add a handle for π2 .
Sketch. The idea of the proof is to use Euler's formula for the surface of genus
g on which we have drawn the cycles of π, as in the definition. Each cycle of γ is
a disc numbered according to γ and we shrink each of these to a point to make the
vertices of our simplex. Thus V = #(γ). The resulting surface will have one face for
each cycle of π and one for each cycle of π −1 γ. Thus F = #(π) + #(π −1 γ). Finally
the edges will be the boundaries between the cycles of π and the cycles of π −1 γ,
there will be n of these. Thus 2(1 − g) = F − E +V = #(π) + #(π −1 γ) − n + #(γ).
Remark 10. The requirement that the subgroup generated by π and γ act transitively
is needed to get a connected surface. In the disconnected case we can replace 2(1 −
g) by the Euler characteristic of the union of the surfaces.
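A small sketch that computes the genus from the Euler formula above (assuming, as required, that π and γ generate a transitive subgroup); it reproduces the two examples just discussed:

def cycle_count(perm):
    # perm: tuple, perm[i] = image of i (0-indexed)
    n, seen, count = len(perm), [False] * len(perm), 0
    for i in range(n):
        if not seen[i]:
            count += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count

def genus(pi, gamma):
    # solve 2(1 - g) = #(pi) + #(pi^{-1} gamma) + #(gamma) - n for g
    n = len(pi)
    pi_inv = [0] * n
    for i, image in enumerate(pi):
        pi_inv[image] = i
    pig = tuple(pi_inv[gamma[i]] for i in range(n))
    return 1 - (cycle_count(pi) + cycle_count(pig) + cycle_count(gamma) - n) // 2

gamma = (1, 2, 0)                    # the cycle (1,2,3), written 0-indexed
print(genus((1, 2, 0), gamma))       # pi_1 = (1,2,3): genus 0, gamma-planar
print(genus((2, 0, 1), gamma))       # pi_2 = (1,3,2): genus 1, needs a handle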
Theorem 11. Let {XN }N be the GUE. Then {XN }N has a second-order limiting
distribution with fluctuation moments {α p,q } p,q where α p,q is the number of non-
crossing pairings on a (p, q)-annulus.
Proof: We have already seen that
$$\alpha_k = \lim_N E\big(\operatorname{tr}(X_N^k)\big)$$
exists for all k, and is given by the number of non-crossing pairings of [k]. Let us next fix r ≥ 2 and positive integers $p_1, p_2, \dots, p_r$; we shall find a formula for $k_r(\operatorname{Tr}(X_N^{p_1}), \operatorname{Tr}(X_N^{p_2}), \dots, \operatorname{Tr}(X_N^{p_r}))$.
We shall let $p = p_1 + p_2 + \cdots + p_r$ and γ be the permutation in $S_p$ with the r cycles
$$\gamma = (1, \dots, p_1)(p_1+1, \dots, p_1+p_2)\cdots(p_1+\cdots+p_{r-1}+1, \dots, p).$$
Given a pairing π and a pair (s, t) of π, $E(f_{i_s, i_{\gamma(s)}}\, f_{i_t, i_{\gamma(t)}})$ will be 0 unless $i_s = i_{\gamma(t)}$ and $i_t = i_{\gamma(s)}$. Following our usual convention of regarding partitions as permutations and a p-tuple $(i_1, \dots, i_p)$ as a function i : [p] → [N], this last condition can be written as i(s) = i(γ(π(s))) and i(t) = i(γ(π(t))). Thus for $E_\pi(f_{i_1,i_{\gamma(1)}}, \dots, f_{i_p,i_{\gamma(p)}})$ to be non-zero we require i = i ◦ γ ◦ π, or the function i to be constant on the cycles of γπ. When $E_\pi(f_{i_1,i_{\gamma(1)}}, \dots, f_{i_p,i_{\gamma(p)}}) \neq 0$ it equals $N^{-p/2}$ (by our normalization of the variance, $E(|f_{ij}|^2) = 1/N$). An important quantity will then be the number of functions i : [p] → [N] that are constant on the cycles of γπ; since we can choose the value of the function arbitrarily on each cycle this number is $N^{\#(\gamma\pi)}$. Hence
$$E\big(\operatorname{Tr}(X_N^{p_1})\cdots\operatorname{Tr}(X_N^{p_r})\big) = \sum_{i_1,\dots,i_p=1}^{N}\ \sum_{\pi\in\mathcal P_2(p)} E_\pi\big(f_{i_1,i_{\gamma(1)}}, \dots, f_{i_p,i_{\gamma(p)}}\big) = \sum_{\pi\in\mathcal P_2(p)} N^{\#(\gamma\pi)-p/2}.$$
The next step is to find which pairings π contribute to the cumulant $k_r$. Recall that if $Y_1, \dots, Y_r$ are random variables then
$$k_r(Y_1, \dots, Y_r) = \sum_{\sigma\in\mathcal P(r)} \mu(\sigma, 1_r)\, E_\sigma(Y_1, \dots, Y_r),$$
where µ is the Möbius function of the partially ordered set P(r), see Exercise 1.14.
If σ is a partition of [r] there is an associated partition σ̃ of [p] where each block of
σ̃ is a union of cycles of γ; in fact if s and t are in the same block of σ then the sth and tth cycles of γ are in the same block of σ̃. Using the same calculation as was used above we have for σ ∈ P(r)
$$E_\sigma\big(\operatorname{Tr}(X_N^{p_1}), \dots, \operatorname{Tr}(X_N^{p_r})\big) = \sum_{\substack{\pi\in\mathcal P_2(p)\\ \pi\le\tilde\sigma}} N^{\#(\gamma\pi)-p/2}.$$
Now given π ∈ P(p) we let π̂ be the partition of [r] such that s and t are in the same block of π̂ if there is a block of π that contains elements of both the sth and the tth cycles of γ. Thus
$$k_r\big(\operatorname{Tr}(X_N^{p_1}), \dots, \operatorname{Tr}(X_N^{p_r})\big) = \sum_{\pi\in\mathcal P_2(p)} N^{\#(\gamma\pi)-p/2} \sum_{\substack{\sigma\in\mathcal P(r)\\ \sigma\ge\hat\pi}} \mu(\sigma, 1_r).$$
A fundamental fact of the Möbius function is that for an interval $[\sigma_1, \sigma_2]$ in P(r) we have $\sum_{\sigma_1\le\sigma\le\sigma_2} \mu(\sigma, \sigma_2) = 0$ unless $\sigma_1 = \sigma_2$, in which case the sum is 1. Thus $\sum_{\sigma\ge\hat\pi} \mu(\sigma, 1_r) = 0$ unless $\hat\pi = 1_r$, in which case the sum is 1. Hence
$$k_r\big(\operatorname{Tr}(X_N^{p_1}), \dots, \operatorname{Tr}(X_N^{p_r})\big) = \sum_{\substack{\pi\in\mathcal P_2(p)\\ \hat\pi = 1_r}} N^{\#(\gamma\pi)-p/2}.$$
When $\hat\pi = 1_r$ the subgroup generated by γ and π acts transitively on [p] and thus Euler's formula (5.5) can be applied. Thus for the π which appear in the sum we have
$$\#(\gamma\pi) = \#(\pi^{-1}\gamma) = p + 2(1-g) - \#(\pi) - \#(\gamma) = p + 2(1-g) - p/2 - r = p/2 + 2(1-g) - r,$$
and thus $\#(\gamma\pi) - p/2 = 2 - r - 2g$. So the leading order of $k_r$, corresponding to the γ-planar π, is given by $N^{2-r}$. Taking the limit N → ∞ gives the assertion. It shows that $k_r$ goes to zero for r > 2, and for r = 2 the limit is given by the number of γ-planar π, i.e., by $\#(NC_2(p, q))$.
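The theorem can also be observed numerically; for instance $k_2(\operatorname{Tr}(X_N^4), \operatorname{Tr}(X_N^2))$ should be close to $\alpha_{4,2} = 8$ already for moderate N (a Monte Carlo sketch; the estimate carries sampling noise):

import numpy as np

rng = np.random.default_rng(1)

def gue(N):
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    return (Z + Z.conj().T) / np.sqrt(2 * N)     # GUE with E|x_ij|^2 = 1/N

N, samples = 300, 4000
t2 = np.empty(samples)
t4 = np.empty(samples)
for s in range(samples):
    X = gue(N)
    X2 = X @ X
    t2[s] = np.trace(X2).real
    t4[s] = np.trace(X2 @ X2).real
cov = (t2 * t4).mean() - t2.mean() * t4.mean()   # k_2 of unnormalized traces
print(cov)                                       # approx alpha_{4,2} = 8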
where $NC_2^{(r)}(p, q)$ denotes the non-crossing annular pairings which respect the colour, i.e., those π ∈ $NC_2(p, q)$ such that (k, l) ∈ π only if $r_k = r_l$. Furthermore, all higher order cumulants of unnormalized traces go to zero.
Maybe more interesting is the situation where we also include deterministic ma-
trices. Similarly to the first order case, we expect to see some second-order freeness
structure appearing there. Of course, the calculation of the asymptotic fluctuations
of mixed moments in GUE and deterministic matrices will involve the (first order)
limiting distribution of the deterministic matrices. Let us first recall what we mean
by this.
Definition 12. Suppose that we have, for each N ∈ ℕ, deterministic N × N matrices $D_1^{(N)}, \dots, D_s^{(N)} \in M_N(\mathbb{C})$ and a non-commutative probability space (A, ϕ) with elements $d_1, \dots, d_s \in A$ such that we have for each polynomial $p \in \mathbb{C}\langle x_1, \dots, x_s\rangle$ in s non-commuting variables
$$\lim_N \operatorname{tr}\big(p(D_1^{(N)}, \dots, D_s^{(N)})\big) = \varphi\big(p(d_1, \dots, d_s)\big).$$
Then we say that $(D_1^{(N)}, \dots, D_s^{(N)})_N$ has a limiting distribution given by $(d_1, \dots, d_s) \in (A, \varphi)$.
Theorem 13. Suppose $X_1^{(N)}, \dots, X_s^{(N)}$ are s independent N × N GUE random matrices. Fix p, q ≥ 1 and let $\{D_1^{(N)}, \dots, D_{p+q}^{(N)}\} \subseteq M_N(\mathbb{C})$ be deterministic N × N matrices with limiting distribution given by $d_1, \dots, d_{p+q} \in (A, \varphi)$. Then we have for all $1 \le r_1, \dots, r_{p+q} \le s$ that
$$\lim_N k_2\Big(\operatorname{Tr}\big(D_1^{(N)} X_{r_1}^{(N)} \cdots D_p^{(N)} X_{r_p}^{(N)}\big),\ \operatorname{Tr}\big(D_{p+1}^{(N)} X_{r_{p+1}}^{(N)} \cdots D_{p+q}^{(N)} X_{r_{p+q}}^{(N)}\big)\Big) = \sum_{\pi\in NC_2^{(r)}(p,q)} \varphi_{\gamma_{p,q}\pi}(d_1, \dots, d_{p+q}),$$
where the sum runs over all π ∈ $NC_2(p, q)$ such that (k, l) ∈ π only if $r_k = r_l$ and where
$$\gamma_{p,q} = (1, \dots, p)(p+1, \dots, p+q) \in S_{p+q}. \tag{5.6}$$
Proof: Let us first calculate the expectation of the product of the two traces. For better legibility, we suppress in the following the upper index (N). We write as usual $X_r = (f_{ij}^{(r)})$ and $D_p = (d_{ij}^{(p)})$. We will denote by $\mathcal P_2^{(r)}(p+q)$ the pairings of [p + q] which respect the colour $r = (r_1, \dots, r_{p+q})$, and by $\mathcal P_{2,c}^{(r)}(p+q)$ the pairings in $\mathcal P_2^{(r)}(p+q)$ where at least one pair connects a point in [p] to a point in $[p+1, p+q] = \{p+1, p+2, \dots, p+q\}$.
$$E\big(\operatorname{Tr}(D_1 X_{r_1}\cdots D_p X_{r_p})\,\operatorname{Tr}(D_{p+1} X_{r_{p+1}}\cdots D_{p+q} X_{r_{p+q}})\big)$$
$$= \sum_{\substack{i_1,\dots,i_{p+q}\\ j_1,\dots,j_{p+q}}} E\big(d^{(1)}_{i_1 j_1} f^{(r_1)}_{j_1 i_2} d^{(2)}_{i_2 j_2}\cdots d^{(p)}_{i_p j_p} f^{(r_p)}_{j_p i_1}\cdot d^{(p+1)}_{i_{p+1} j_{p+1}}\cdots d^{(p+q)}_{i_{p+q} j_{p+q}} f^{(r_{p+q})}_{j_{p+q} i_{p+1}}\big)$$
$$= \sum_{\substack{i_1,\dots,i_{p+q}\\ j_1,\dots,j_{p+q}}} E\big(f^{(r_1)}_{j_1 i_2}\cdots f^{(r_{p+q})}_{j_{p+q} i_{p+1}}\big)\ d^{(1)}_{i_1 j_1}\cdots d^{(p+q)}_{i_{p+q} j_{p+q}}$$
$$= \sum_{\substack{i_1,\dots,i_{p+q}\\ j_1,\dots,j_{p+q}}}\ \sum_{\pi\in\mathcal P_2^{(r)}(p+q)} N^{-(p+q)/2}\,\delta_{j,\,i\circ\gamma_{p,q}\circ\pi}\ d^{(1)}_{i_1 j_1}\cdots d^{(p+q)}_{i_{p+q} j_{p+q}}$$
$$= N^{-(p+q)/2} \sum_{\pi\in\mathcal P_2^{(r)}(p+q)}\ \sum_{i_1,\dots,i_{p+q}} d^{(1)}_{i_1,\,i_{\gamma_{p,q}\pi(1)}}\cdots d^{(p+q)}_{i_{p+q},\,i_{\gamma_{p,q}\pi(p+q)}}$$
$$= N^{-(p+q)/2} \sum_{\pi\in\mathcal P_2^{(r)}(p+q)} \operatorname{Tr}_{\gamma_{p,q}\pi}(D_1,\dots,D_{p+q}).$$
For $\pi \in \mathcal P_{2,c}^{(r)}(p+q)$ we have $\#(\pi) + \#(\gamma_{p,q}\pi) + \#(\gamma_{p,q}) = p + q + 2(1-g)$, and hence $\#(\gamma_{p,q}\pi) - \frac{p+q}{2} = -2g$. The genus g is always ≥ 0 and equal to 0 only when π is non-crossing. Thus
Note that this is a property of the algebra generated by the D's. We won't prove it here, but for many examples we have $k_r(Y_1, \dots, Y_r) = O(N^{2-r})$ with the $Y_i$'s as above. These examples include the GUE, Wishart, and Haar distributed unitary random matrices.
Theorem 16. Suppose $X_1^{(N)}, \dots, X_s^{(N)}$ are s independent N × N GUE random matrices. Fix p, q ≥ 1 and let $\{D_1^{(N)}, \dots, D_{p+q}^{(N)}\} \subseteq M_N(\mathbb{C})$ be random N × N matrices with a limiting distribution and with bounded higher cumulants. Then we have for all $1 \le r_1, \dots, r_{p+q} \le s$ that
$$k_2\Big(\operatorname{tr}\big(D_1^{(N)} X_{r_1}^{(N)} \cdots D_p^{(N)} X_{r_p}^{(N)}\big),\ \operatorname{tr}\big(D_{p+1}^{(N)} X_{r_{p+1}}^{(N)} \cdots D_{p+q}^{(N)} X_{r_{p+q}}^{(N)}\big)\Big) = O(N^{-2}).$$
Proof: We rewrite the proof of Theorem 13 with the change that the D's are now random to get
$$E\big(\operatorname{Tr}(D_1 X_{r_1}\cdots D_p X_{r_p})\,\operatorname{Tr}(D_{p+1} X_{r_{p+1}}\cdots D_{p+q} X_{r_{p+q}})\big) = N^{-(p+q)/2} \sum_{\pi\in\mathcal P_2^{(r)}(p+q)} E\big(\operatorname{Tr}_{\gamma_{p,q}\pi}(D_1,\dots,D_{p+q})\big),$$
and
$$E\big(\operatorname{Tr}(D_1 X_{r_1}\cdots D_p X_{r_p})\big)\cdot E\big(\operatorname{Tr}(D_{p+1} X_{r_{p+1}}\cdots D_{p+q} X_{r_{p+q}})\big) = N^{-(p+q)/2} \sum_{\substack{\pi_1\in\mathcal P_2^{(r)}(p)\\ \pi_2\in\mathcal P_2^{(r)}(q)}} E\big(\operatorname{Tr}_{\gamma_p\pi_1}(D_1,\dots,D_p)\big)\cdot E\big(\operatorname{Tr}_{\gamma_q\pi_2}(D_{p+1},\dots,D_{p+q})\big).$$
We shall show that both of these terms are O(1), and thus after normalizing the traces $k_2 = O(N^{-2})$. For the first term this is the same argument as in the proof of Theorem 13. So let $\pi_1 \in \mathcal P_2^{(r)}(p)$ and $\pi_2 \in \mathcal P_2^{(r)}(q)$. We let $s = \#(\gamma_p\pi_1)$ and $t = \#(\gamma_q\pi_2)$. Since $\gamma_p\pi_1$ has s cycles we may write $\operatorname{Tr}_{\gamma_p\pi_1}(D_1, \dots, D_p) = Y_1\cdots Y_s$ with each $Y_i$ of the form $\operatorname{Tr}(D_{l_1}\cdots D_{l_k})$. Likewise since $\gamma_q\pi_2$ has t cycles we may write $\operatorname{Tr}_{\gamma_q\pi_2}(D_{p+1}, \dots, D_{p+q}) = Y_{s+1}\cdots Y_{s+t}$ with the Y's of the same form as before. Now by our assumption on the D's we know that for u ≥ 2 we have $k_u(Y_{i_1}, \dots, Y_{i_u}) = O(1)$. Using the product formula for classical cumulants, see Equation (1.16), we have that
$$E(Y_1\cdots Y_{s+t}) - E(Y_1\cdots Y_s)\,E(Y_{s+1}\cdots Y_{s+t}) = \sum_\tau k_\tau(Y_1,\dots,Y_{s+t}),$$
where τ must connect [s] to [s + 1, s + t]. Now $k_\tau(Y_1, \dots, Y_{s+t}) = O(N^c)$ where c is the number of singletons in τ. Thus the order of $N^{-(p+q)/2} k_\tau(Y_1, \dots, Y_{s+t})$ is $N^{c-(p+q)/2}$. So we are reduced to showing that c ≤ (p+q)/2. Since τ connects [s] to [s+1, s+t], τ must have a block with at least 2 elements. Thus the number of singletons is at most s + t − 2; since s ≤ p/2 + 1 and t ≤ q/2 + 1, this gives c ≤ (p + q)/2, as required.
Usually our second-order limit elements will arise as limits of random matrices;
where ϕ encodes the asymptotic behaviour of the expectation of traces, whereas
ϕ2 does the same for the covariances of traces. As we have seen before, in typical
examples (as the GUE) we should consider the expectation of the normalized trace
tr, but the covariances of the unnormalized traces Tr.
As we have seen in Theorem 16 one usually also needs some control over the
higher order cumulants; requiring bounded higher cumulants for the unnormalized
traces of the D’s was enough to control the variances of the mixed unnormalized
traces. However, as in the case of one matrix (see Definition 2), we will in the
following definition require instead of boundedness of the higher cumulants the
stronger condition that they converge to zero. This definition from [128] makes some
arguments easier, and is usually satisfied in all relevant random matrix models. Let
us point out that, as remarked in [127], the whole theory could also be developed
with the boundedness condition instead.
Definition 18. Suppose we have a sequence of random matrices $\{A_1^{(N)}, \dots, A_s^{(N)}\}_N$ and random variables $a_1, \dots, a_s$ in a second-order non-commutative probability space. We say that $(A_1^{(N)}, \dots, A_s^{(N)})_N$ has the second-order limit $(a_1, \dots, a_s)$ if we have:
◦ for all $p \in \mathbb{C}\langle x_1, \dots, x_s\rangle$
$$\lim_N E\big(\operatorname{tr}(p(A_1^{(N)}, \dots, A_s^{(N)}))\big) = \varphi\big(p(a_1, \dots, a_s)\big);$$
Remark 19. As in Remark 3, the second condition implies that we have almost sure convergence of the (first order) distribution of the $\{A_1^{(N)}, \dots, A_s^{(N)}\}_N$. So in particular, if the $a_1, \dots, a_s$ are free, then the existence of a second-order limit includes also the fact that $A_1^{(N)}, \dots, A_s^{(N)}$ are almost surely asymptotically free.
Example 20. A trivial example of a second-order limit is given by deterministic matrices. If $\{D_1^{(N)}, \dots, D_s^{(N)}\}$ are deterministic N × N matrices with limiting distribution then $k_r(Y_1, \dots, Y_r) = 0$ for r > 1 and for any polynomials $Y_i$ in the D's. So $D_1^{(N)}, \dots, D_s^{(N)}$ has a second-order limiting distribution; ϕ is given by the limiting distribution and $\varphi_2$ is identically zero.
Example 21. Define $(A, \varphi, \varphi_2)$ by $A = \mathbb{C}\langle s\rangle$ and
$$\varphi(s^k) = \#\big(NC_2(k)\big), \qquad \varphi_2(s^p, s^q) = \#\big(NC_2(p, q)\big). \tag{5.7}$$
Then $(A, \varphi, \varphi_2)$ is a second-order probability space and s is, by Theorem 11, the second-order limit of a GUE random matrix. In first order s is, of course, just a semi-circular element in (A, ϕ). We will address a distribution given by (5.7) as a second-order semi-circle distribution.
Exercise 3. Prove that the second-order limit of a Wishart random matrix with rate c (see Section 4.5.1) is given by $(A, \varphi, \varphi_2)$ with $A = \mathbb{C}\langle x\rangle$ and
Exercise 4. Prove the statement from the previous example: Show that for Haar distributed N × N unitary random matrices U we have
$$\lim_N k_2\big(\operatorname{Tr}(U^p), \operatorname{Tr}(U^q)\big) = \begin{cases} |p|, & \text{if } p = -q,\\ 0, & \text{otherwise,} \end{cases}$$
and that the higher order cumulants of unnormalized traces of polynomials in U and $U^*$ go to zero.
Example 23. Let us now consider the simplest case of several variables, namely the limit of s independent GUE. According to Exercise 2 their second-order limit is given by $(A, \varphi, \varphi_2)$ where $A = \mathbb{C}\langle x_1, \dots, x_s\rangle$ and
$$\varphi\big(x_{r(1)}\cdots x_{r(k)}\big) = \#\big(NC_2^{(r)}(k)\big)$$
and
$$\varphi_2\big(x_{r(1)}\cdots x_{r(p)},\ x_{r(p+1)}\cdots x_{r(p+q)}\big) = \#\big(NC_2^{(r)}(p, q)\big).$$
In the same way as we used in Chapter 1 the formula for ϕ as our guide to the definition of the notion of freeness, we will now have a closer look at the corresponding formula for $\varphi_2$ and try to extract from this a concept of second-order freeness.
As in the first order case, let us consider $\varphi_2$ applied to alternating products of centred variables, i.e., we want to understand
$$\varphi_2\big((x_{i_1}^{m_1} - c_{m_1}1)\cdots(x_{i_p}^{m_p} - c_{m_p}1),\ (x_{j_1}^{n_1} - c_{n_1}1)\cdots(x_{j_q}^{n_q} - c_{n_q}1)\big),$$
where $c_m := \varphi(x_i^m)$ (which is independent of i). The variables are here assumed to be alternating in each argument, i.e., we have
$$i_1 \neq i_2 \neq \cdots \neq i_{p-1} \neq i_p \qquad\text{and}\qquad j_1 \neq j_2 \neq \cdots \neq j_{q-1} \neq j_q.$$
In addition, since the whole theory relies on $\varphi_2$ being tracial in each of its arguments (as the limit of variances of traces), we will actually assume that it is alternating in a cyclic way, i.e., that we also have $i_p \neq i_1$ and $j_q \neq j_1$.
Let us put $m := m_1 + \cdots + m_p$ and $n := n_1 + \cdots + n_q$. Furthermore, we call the consecutive numbers corresponding to the factors in our arguments "intervals"; so the intervals on the first circle are
$$(1, \dots, m_1),\ (m_1+1, \dots, m_1+m_2),\ \dots,\ (m_1+\cdots+m_{p-1}+1, \dots, m),$$
and the intervals on the second circle are
$$(m+1, \dots, m+n_1),\ \dots,\ (m+n_1+\cdots+n_{q-1}+1, \dots, m+n).$$
By the same arguing as in Chapter 1 one can convince oneself that the subtraction of the means has the effect that instead of counting all $\pi \in NC_2(m, n)$ we count now only those where each interval is connected to at least one other interval. In the first order case, because of the non-crossing property, there were no such π and the corresponding expression was zero. Now, however, we can connect an interval from one circle to an interval of the other circle, and there are possibilities to do this in a non-crossing way. Renaming $a_k := x_{i_k}^{m_k} - c_{m_k}1$ and $b_l := x_{j_l}^{n_l} - c_{n_l}1$ leads then exactly to the formula which will be our defining property of second-order freeness in the next definition.
Fig. 5.3 The spoke diagram for π = (1, 8)(2, 7)(3, 12)(4, 11)(5, 10)(6, 9). For this permutation we
have ϕπ (a1 , . . . , a12 ) = ϕ(a1 a8 )ϕ(a2 a7 )ϕ(a3 a12 )ϕ(a4 a11 )ϕ(a5 a10 )ϕ(a6 a9 ).
Exercise 6. Prove Theorem 27 by using the explicit formula for the second-order
limit distribution given in Theorem 13.
Exercise 7. Show that Theorem 27 remains also true if the deterministic matrices are
replaced by random matrices which are independent from the GUE's and which have a second-order limit distribution.
The latter form comes from the fact that the free cumulants κn for semi-circulars
are 1 for n = 2 and zero otherwise, i.e., κπ is 1 for a non-crossing pairing, and zero
otherwise. For the second-order free Poisson (i.e., for the limit of Wishart random matrices; see Exercise 3) we have
The latter form comes here from the fact that the free cumulants for a free Poisson
are all equal to c. So in both cases the value of ϕ2 is expressed as a sum over the an-
nular versions of non-crossing partitions, and each such permutation π is weighted
by a factor κπ , which is given by the product of first order cumulants, one factor κr
for each cycle of π of length r. This is essentially the same formula as for ϕ, the only
difference is that we sum over annular permutations instead over circle partitions.
However, it turns out that in general the term
$$\sum_{\pi\in S_{NC}(m,n)} \kappa_\pi(a_1, \dots, a_{m+n})$$
is only one part of $\varphi_2(a_1\cdots a_m,\ a_{m+1}\cdots a_{m+n})$; there will also be another contribution which involves genuine "second-order cumulants".
To see that we need in general such an additional contribution, let us rewrite the
expression from Exercise 5 for ϕ2 (a1 b1 , a2 b2 ), for {a1 , a2 } and {b1 , b2 } being free
of second order, in terms of first order cumulants.
The three displayed terms are the three non-vanishing terms κπ for π ∈ SNC (2, 2)
(there are of course more such π, but they do not contribute because of the vanishing
of mixed cumulants in free variables). But we have some additional contributions
which we write in the form
something else = κ1,1 (a1 , a2 )κ1 (b1 )κ1 (b2 ) + κ1 (a1 )κ1 (a2 )κ1,1 (b1 , b2 )
The general structure of the additional terms is the following. We have second-
order cumulants κm,n which have as arguments m elements from the first circle and
n elements from the second circle. As one already sees in the above simple example,
one only has summands which contain at most one such second-order cumulant as
factor. All the other factors are first order cumulants. So these terms can also be
written as κσ , but now σ is of the form σ = π1 × π2 ∈ NC(m) × NC(n) where one
Fig. 5.4 The second-order non-crossing annular partition σ = {(1, 2, 3), (4, 7), (5, 6), (8)} ×
{(9, 12), (10, 11)} . Its contribution in the moment cumulant formula is
κσ (a1 , . . . , a12 ) = κ3,2 (a1 , a2 , a3 , a9 , a12 )κ2 (a4 , a7 )κ2 (a5 , a6 )κ1 (a8 )κ2 (a10 , a11 ).
block of σ1 and one block of σ2 is marked. The two marked blocks go together as
arguments into a second-order cumulant, all the other blocks give just first order
cumulants. Let us make this more rigorous in the following definition.
Here we have used the following notation. For a $\pi = \{V_1, \dots, V_r\} \in S_{NC}(m, n)$ we put
$$\kappa_\pi(a_1, \dots, a_{m+n}) := \prod_{i=1}^r \kappa_{\#(V_i)}\big((a_k)_{k\in V_i}\big),$$
where the $\kappa_n$ are the already defined first order cumulants in the probability space (A, ϕ). For a $\sigma \in [NC(m)\times NC(n)]$ we define $\kappa_\sigma$ as follows. If $\sigma = (\pi_1, W_1)\times(\pi_2, W_2)$ is of the form $\pi_1 = \{W_1, V_1, \dots, V_r\} \in NC(m)$ and $\pi_2 = \{W_2, \tilde V_1, \dots, \tilde V_s\} \in NC(n)$, where $W_1$ and $W_2$ are the two marked blocks, then
$$\kappa_\sigma(a_1, \dots, a_{m+n}) := \prod_{i=1}^r \kappa_{\#(V_i)}\big((a_k)_{k\in V_i}\big)\cdot \prod_{j=1}^s \kappa_{\#(\tilde V_j)}\big((a_l)_{l\in\tilde V_j}\big)\cdot \kappa_{\#(W_1),\#(W_2)}\big((a_u)_{u\in W_1},\ (a_v)_{v\in W_2}\big).$$
The first sum only involves first order cumulants and in the second sum each term
is a product of one second-order cumulant and some first order cumulants. Thus,
since we already know all first order cumulants, the first sum is totally determined
in terms of moments of ϕ. The second sum, on the other hand, contains exactly the
highest order term κm,n (a1 , . . . , am+n ) and some lower order cumulants. Thus, by
recursion, we can again solve the moment-cumulant formulas for the determination
of κm,n (a1 , . . . , am+n ).
2) For m = 2 and n = 1 we have four first order contributions in SNC (2, 1),
resulting in
By using the known formulas for κ1 , κ2 , κ3 , and the formula for κ1,1 from above,
this can be solved for κ2,1 :
Exercise 8. Let $X_N = \frac{1}{\sqrt N}\,(x_{ij})_{i,j=1}^N$ be a Wigner random matrix ensemble, where $x_{ij} = x_{ji}$ for all i, j; all $x_{ij}$ for i ≥ j are independent; all diagonal entries $x_{ii}$ are identically distributed according to a distribution ν; and all off-diagonal entries $x_{ij}$, for i ≠ j, are identically distributed according to a distribution µ. Show that $\{X_N\}_N$ has a second-order limit $x \in (\mathcal A, \varphi, \varphi_2)$ which is given in terms of cumulants by: all first order cumulants are zero except $\kappa_2^x = k_2^\mu$; all second-order cumulants are zero except $\kappa_{2,2}^x = k_4^\mu$; where $k_2^\mu$ and $k_4^\mu$ are the second and fourth classical cumulants of µ, respectively.
The usefulness of the notion of second-order cumulants comes from the follow-
ing second-order analogue of the characterization of freeness by the vanishing of
mixed cumulants.
Sketch: Let us give a sketch of the proof. The statement about the first order cumulants is just Theorem 2.14.
That the vanishing of mixed cumulants implies second-order freeness follows
quite easily from the moment-cumulant formula. In the case of cyclically alternating
centred arguments the only remaining contributions are given by spoke diagrams
and then the moment cumulant formula (5.14) reduces to the defining formula (5.11)
of second-order freeness.
For the other direction, note first that second-order freeness implies the vanishing of κm,n(a1, . . . , am+n) whenever all the ai are centred and both groups of arguments are cyclically alternating, i.e., $i_1 \ne i_2 \ne \cdots \ne i_m \ne i_1$ and $i_{m+1} \ne i_{m+2} \ne \cdots \ne i_{m+n} \ne i_{m+1}$. Next, because centring does not change the value of second-order cumulants, we can drop the assumption of centredness. To also get rid of the assumption that neighbours must be from different algebras one has, as in the first order case (see Theorem 3.14), to invoke a formula for second-order cumulants which have products as arguments.
In the following theorem we state the formula for the κm,n with products as argu-
ments. For the proof we refer to [132].
Then
(ii) those π ∈ SNC (m, n) which connect the groups corresponding to all Ai on both
circles in the following annular way: for such a π all the groups must be con-
nected, but it is not possible to cut the annulus open by cutting on each of the two
circles between two groups.
Example 36. 1) Let us reconsider the second-order cumulant κ1,1 (A1 , A2 ) for A1 =
A2 = s² from Example 33, by calculating it via the above theorem. Since all second-order cumulants of s vanish, and all first order cumulants of s except κ2 vanish, in the formula (5.16) there is no contributing σ and the only two possible π's are
π1 = {(1, 3), (2, 4)} and π2 = {(1, 4), (2, 3)}. Both connect both groups (a1 , a2 ) and
(a3 , a4 ), but whereas π1 does this in an annular way, in the case of π2 the annulus
could be cut open outside these groups. So π1 contributes and π2 does not. Hence
As in the first order case one can, with the help of this product formula, also get a
version of the characterization of freeness in terms of vanishing of mixed cumulants
for random variables instead of subalgebras.
Exercise 9. The main point in reducing this theorem to the version for subalgebras consists in using the product formula to show that the vanishing of mixed cumulants in the variables implies also the vanishing of mixed cumulants in elements of the generated subalgebras. As an example of this, show that the vanishing of all mixed first and second-order cumulants in a1 and a2 implies also the vanishing of the mixed cumulants $\kappa_{2,1}(a_1^3, a_1, a_2^2)$ and $\kappa_{1,2}(a_1^3, a_1, a_2^2)$.
κm,n (a, . . . , a). The vanishing of mixed cumulants for free variables gives then again
that our cumulants linearize the addition of free variables.
As in the first order case one can translate the combinatorial relation between mo-
ments and cumulants into a functional relation between generating power series. In
the following theorem we give this as a relation between the corresponding Cauchy
and R-transforms. Again, we refer to [59] for the proof and more details.
The moment-cumulant relations
$$\alpha_n = \sum_{\pi \in NC(n)} \kappa_\pi \qquad\text{and}\qquad \alpha_{m,n} = \sum_{\pi \in S_{NC}(m,n)} \kappa_\pi + \sum_{\sigma \in [NC(m)\times NC(n)]} \kappa_\sigma$$
are equivalent to the relations
$$\frac{1}{G(z)} + R(G(z)) = z \tag{5.18}$$
and
$$G(z, w) = G'(z)\,G'(w)\,R\big(G(z), G(w)\big) + \frac{\partial^2}{\partial z\,\partial w}\log\frac{F(z) - F(w)}{z - w} \tag{5.19}$$
between the following formal power series: the Cauchy transforms
$$G(z) = \frac1z\sum_{n \ge 0} \alpha_n z^{-n} \qquad\text{and}\qquad G(z, w) = \frac{1}{zw}\sum_{m,n \ge 1} \alpha_{m,n}\, z^{-m} w^{-n},$$
the corresponding first and second-order R-transforms R(z) and R(z, w) (whose coefficients are the cumulants κn and κm,n, respectively), and the reciprocal Cauchy transform F(z) = 1/G(z).
Equation (5.18) is just the well-known functional relation (2.27) from Chapter
2 between first order moments and cumulants. Equation (5.19) determines a se-
quence of equations relating the first and second-order moments with the second-
order cumulants; if we also express the first order moments in terms of first or-
der cumulants, then this corresponds to the moment-cumulant relation $\alpha_{m,n} = \sum_{\pi \in S_{NC}(m,n)} \kappa_\pi + \sum_{\sigma \in [NC(m)\times NC(n)]} \kappa_\sigma$.
Note that formally the second term on the right-hand side of (5.19) can also be
written as
$$\frac{\partial^2}{\partial z\,\partial w}\log\frac{F(z) - F(w)}{z - w} = \frac{\partial^2}{\partial z\,\partial w}\log\frac{G(w) - G(z)}{z - w}; \tag{5.21}$$
but since (G(w) − G(z))/(z − w) has no constant term, the power series expansion
of log[(G(w) − G(z))/(z − w)] is not well-defined.
Below is a table, produced from (5.19), giving the first few equations.
$$\begin{aligned}
\alpha_{1,1} &= \kappa_{1,1} + \kappa_2\\
\alpha_{1,2} &= \kappa_{1,2} + 2\kappa_1\kappa_{1,1} + 2\kappa_3 + 2\kappa_1\kappa_2\\
\alpha_{2,2} &= \kappa_{2,2} + 4\kappa_1\kappa_{1,2} + 4\kappa_1^2\kappa_{1,1} + 4\kappa_4 + 8\kappa_1\kappa_3 + 2\kappa_2^2 + 4\kappa_1^2\kappa_2\\
\alpha_{1,3} &= \kappa_{1,3} + 3\kappa_1\kappa_{1,2} + 3\kappa_2\kappa_{1,1} + 3\kappa_1^2\kappa_{1,1} + 3\kappa_4 + 6\kappa_1\kappa_3 + 3\kappa_2^2 + 3\kappa_1^2\kappa_2\\
\alpha_{2,3} &= \kappa_{2,3} + 2\kappa_1\kappa_{1,3} + 3\kappa_1\kappa_{2,2} + 3\kappa_2\kappa_{1,2} + 9\kappa_1^2\kappa_{1,2} + 6\kappa_1\kappa_2\kappa_{1,1} + 6\kappa_1^3\kappa_{1,1}\\
&\quad + 6\kappa_5 + 18\kappa_1\kappa_4 + 12\kappa_2\kappa_3 + 18\kappa_1^2\kappa_3 + 12\kappa_1\kappa_2^2 + 6\kappa_1^3\kappa_2\\
\alpha_{3,3} &= \kappa_{3,3} + 6\kappa_1\kappa_{2,3} + 6\kappa_2\kappa_{1,3} + 6\kappa_1^2\kappa_{1,3} + 9\kappa_1^2\kappa_{2,2} + 18\kappa_1\kappa_2\kappa_{1,2} + 18\kappa_1^3\kappa_{1,2}\\
&\quad + 9\kappa_2^2\kappa_{1,1} + 18\kappa_1^2\kappa_2\kappa_{1,1} + 9\kappa_1^4\kappa_{1,1} + 9\kappa_6 + 36\kappa_1\kappa_5 + 27\kappa_2\kappa_4 + 54\kappa_1^2\kappa_4\\
&\quad + 9\kappa_3^2 + 72\kappa_1\kappa_2\kappa_3 + 36\kappa_1^3\kappa_3 + 12\kappa_2^3 + 36\kappa_1^2\kappa_2^2 + 9\kappa_1^4\kappa_2.
\end{aligned}$$
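These first equations are easy to test against random matrices. The following is a minimal numerical sketch (in Python, assuming NumPy; the sizes N and K are arbitrary choices of ours): for GUE matrices all second-order cumulants vanish and κ2 = 1 while all other first order cumulants vanish, so the table predicts cov(Tr A, Tr A) → 1, cov(Tr A², Tr A²) → 2 and cov(Tr A, Tr A³) → 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def gue(N):
    # GUE matrix normalized so that tr(A^2) -> 1 (tr = normalized trace)
    H = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    return (H + H.conj().T) / (2 * np.sqrt(N))

N, K = 200, 4000
t1, t2, t3 = np.empty(K), np.empty(K), np.empty(K)
for k in range(K):
    A = gue(N)
    A2 = A @ A
    t1[k] = np.trace(A).real     # unnormalized traces carry the fluctuations
    t2[k] = np.trace(A2).real
    t3[k] = np.trace(A2 @ A).real

cov = lambda u, v: np.mean(u * v) - np.mean(u) * np.mean(v)
print(cov(t1, t1))  # alpha_{1,1} = kappa_2        ~ 1
print(cov(t2, t2))  # alpha_{2,2} = 2 kappa_2^2    ~ 2
print(cov(t1, t3))  # alpha_{1,3} = 3 kappa_2^2    ~ 3
```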
Remark 40. Note that the Cauchy transforms can also be written as
$$G(z) = \lim_{N\to\infty} \mathbf E\Big[\mathrm{tr}\Big(\frac{1}{z - A_N}\Big)\Big] = \varphi\Big(\frac{1}{z-a}\Big) \tag{5.22}$$
and
$$G(z, w) = \lim_{N\to\infty} \mathrm{cov}\Big(\mathrm{Tr}\Big(\frac{1}{z - A_N}\Big),\, \mathrm{Tr}\Big(\frac{1}{w - A_N}\Big)\Big) = \varphi_2\Big(\frac{1}{z-a}, \frac{1}{w-a}\Big). \tag{5.23}$$
In the case where all the second-order cumulants are zero, i.e., R(z, w) = 0, Equa-
tion (5.19) expresses the second-order Cauchy transform in terms of the first order
Cauchy transform,
$$\varphi_2\Big(\frac{1}{z-a}, \frac{1}{w-a}\Big) = G(z, w) = \frac{\partial^2}{\partial z\,\partial w}\log\frac{F(z) - F(w)}{z - w}. \tag{5.24}$$
This applies then in particular to the GUE and Wishart random matrices; that in those
cases the second-order cumulants vanish follows from equations (5.12) and (5.13);
see also Example 33. In the case of Wishart matrices equation (5.24) (in terms of
G(z) instead of F(z), via (5.21)) was derived by Bai and Silverstein [14, 15].
However, there are also many important situations where the second-order cu-
mulants do not vanish and we need the full version of (5.19) to understand the
fluctuations. The following exercise gives an example for this.
Let us first look at the one-variable situation. If all second-order cumulants are
zero (as for example for GUE or Wishart random matrices), so that our second-order
Cauchy transform is given by (5.24), then one can proceed as follows.
In order to extract from G(z, w) some information about the covariance for arbi-
trary polynomials p1 and p2 we use Cauchy’s integral formula to write
$$p_1(a) = \frac{1}{2\pi i}\int_{C_1}\frac{p_1(z)}{z-a}\,dz, \qquad p_2(a) = \frac{1}{2\pi i}\int_{C_2}\frac{p_2(w)}{w-a}\,dw,$$
where the contour integrals over C1 and C2 are in the complex plane around the
spectrum of a. We are assuming that a is a bounded self-adjoint operator, thus we
have to integrate around sufficiently large portions of the real line. This gives then,
by using Equation (5.24) and integration by parts,
$$\begin{aligned}
\varphi_2(p_1(a), p_2(a)) &= -\frac{1}{4\pi^2}\int_{C_1}\int_{C_2} p_1(z)\,p_2(w)\,\varphi_2\Big(\frac{1}{z-a}, \frac{1}{w-a}\Big)\,dz\,dw\\
&= -\frac{1}{4\pi^2}\int_{C_1}\int_{C_2} p_1(z)\,p_2(w)\,G(z, w)\,dz\,dw\\
&= -\frac{1}{4\pi^2}\int_{C_1}\int_{C_2} p_1(z)\,p_2(w)\,\frac{\partial^2}{\partial z\,\partial w}\log\frac{F(z) - F(w)}{z - w}\,dz\,dw\\
&= -\frac{1}{4\pi^2}\int_{C_1}\int_{C_2} p_1'(z)\,p_2'(w)\,\log\frac{F(z) - F(w)}{z - w}\,dz\,dw.
\end{aligned}$$
We choose now for C1 and C2 rectangles with height going to zero; hence the integration over each of these contours reduces to two integrals over the real axis, one approaching the real line from above, and the other approaching the real line from below. We de-
note the corresponding limits of F(z), when z is approaching x ∈ R from above or
from below, by F(x+ ) and F(x− ), respectively. Since p01 and p02 are continuous at
the real axis, we get
$$\varphi_2(p_1(a), p_2(a)) = -\frac{1}{4\pi^2}\int_{\mathbb R}\int_{\mathbb R} p_1'(x)\,p_2'(y)\Big[\log\frac{F(x^+) - F(y^+)}{x^+ - y^+} - \log\frac{F(x^+) - F(y^-)}{x^+ - y^-} - \log\frac{F(x^-) - F(y^+)}{x^- - y^+} + \log\frac{F(x^-) - F(y^-)}{x^- - y^-}\Big]\,dx\,dy.$$
Note that one has for the reciprocal Cauchy transform $F(\bar z) = \overline{F(z)}$, hence $F(x^-) = \overline{F(x^+)}$. Since the contributions of the denominators cancel, we get in the end
$$\varphi_2(p_1(a), p_2(a)) = -\frac{1}{4\pi^2}\int_{\mathbb R}\int_{\mathbb R} p_1'(x)\,p_2'(y)\,\log\Big|\frac{F(x) - F(y)}{F(x) - \overline{F(y)}}\Big|^2\,dx\,dy, \tag{5.26}$$
where F(x) denotes now the usual limit F(x+ ) coming from the complex upper
half-plane.
The diagonalization of this bilinear form (5.26) depends on the actual form of the kernel
$$K(x, y) = -\frac{1}{4\pi^2}\log\Big|\frac{F(x) - F(y)}{F(x) - \overline{F(y)}}\Big|^2 = -\frac{1}{4\pi^2}\log\Big|\frac{G(x) - G(y)}{G(x) - \overline{G(y)}}\Big|^2. \tag{5.27}$$
Example 41. Consider the GUE case. Then G is the Cauchy transform of the semi-circle,
$$G(z) = \frac{z - \sqrt{z^2 - 4}}{2}, \qquad\text{thus}\qquad G(x) = \frac{x - i\sqrt{4 - x^2}}{2}.$$
Hence we have
$$\begin{aligned}
K(x, y) &= -\frac{1}{4\pi^2}\log\Big|\frac{x - y - i\big(\sqrt{4-x^2} - \sqrt{4-y^2}\big)}{x - y - i\big(\sqrt{4-x^2} + \sqrt{4-y^2}\big)}\Big|^2\\
&= -\frac{1}{4\pi^2}\log\frac{(x-y)^2 + \big(\sqrt{4-x^2} - \sqrt{4-y^2}\big)^2}{(x-y)^2 + \big(\sqrt{4-x^2} + \sqrt{4-y^2}\big)^2}\\
&= -\frac{1}{4\pi^2}\log\frac{4 - xy - \sqrt{(4-x^2)(4-y^2)}}{4 - xy + \sqrt{(4-x^2)(4-y^2)}}.
\end{aligned}$$
In the penultimate step we have used the expansion (5.31) for log(1 − cos θ ) from
the next exercise.
Similarly as cos(nθ ) is related to x = 2 cos θ via the Chebyshev polynomials Cn
of the first kind, sin(nθ ) can be expressed in terms of x via the Chebyshev polyno-
mials Un of the second kind. Those are defined via
$$U_n(2\cos\theta) = \frac{\sin((n+1)\theta)}{\sin\theta}. \tag{5.28}$$
We will address some of its properties in Exercise 12.
We can then continue our calculation above as follows.
$$K(x, y) = \frac{1}{\pi^2}\sum_{n=1}^\infty\frac1n\,U_{n-1}(x)\sin\theta\cdot U_{n-1}(y)\sin\psi = \sum_{n=1}^\infty\frac1n\,\frac{1}{2\pi}U_{n-1}(x)\sqrt{4-x^2}\cdot\frac{1}{2\pi}U_{n-1}(y)\sqrt{4-y^2},$$
where x = 2 cos θ and y = 2 cos ψ.
We will now use the following two facts about Chebyshev polynomials:
◦ the Chebyshev polynomials of second kind are orthogonal polynomials with re-
spect to the semi-circular distribution, i.e., for all m, n ≥ 0
$$\int_{-2}^{+2} U_n(x)\,U_m(x)\,\frac{1}{2\pi}\sqrt{4-x^2}\,dx = \delta_{nm}; \tag{5.29}$$
◦ the two kinds of Chebyshev polynomials are related by differentiation,
$$C_n'(x) = n\,U_{n-1}(x). \tag{5.30}$$
Then we can recover Theorem 1 by checking that the covariance is diagonal for the Chebyshev polynomials of first kind:
$$\begin{aligned}
\varphi_2(C_n(a), C_m(a)) &= \int\!\!\int C_n'(x)\,C_m'(y)\,K(x, y)\,dx\,dy\\
&= \int_{-2}^{+2}\!\int_{-2}^{+2} n\,U_{n-1}(x)\; m\,U_{m-1}(y)\sum_{k=1}^\infty\frac1k\,\frac{1}{2\pi}U_{k-1}(x)\sqrt{4-x^2}\cdot\frac{1}{2\pi}U_{k-1}(y)\sqrt{4-y^2}\,dx\,dy\\
&= nm\sum_{k=1}^\infty\frac1k\int_{-2}^{+2} U_{n-1}(x)\,U_{k-1}(x)\,\frac{1}{2\pi}\sqrt{4-x^2}\,dx \times \int_{-2}^{+2} U_{m-1}(y)\,U_{k-1}(y)\,\frac{1}{2\pi}\sqrt{4-y^2}\,dy\\
&= nm\sum_{k=1}^\infty\frac1k\,\delta_{nk}\,\delta_{mk}\\
&= n\,\delta_{nm}.
\end{aligned}$$
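This diagonalization can also be observed numerically; here is a small sketch (assuming NumPy; clipping the finitely many eigenvalues that fall slightly outside [−2, 2] is a crude device of ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def C(n, x):
    # rescaled Chebyshev polynomials of the first kind: C_n(2 cos t) = 2 cos(n t)
    return 2.0 * np.cos(n * np.arccos(np.clip(x / 2.0, -1.0, 1.0)))

N, K = 150, 2000
tr = {n: np.empty(K) for n in (1, 2, 3)}
for k in range(K):
    H = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    ev = np.linalg.eigvalsh((H + H.conj().T) / (2 * np.sqrt(N)))  # GUE spectrum
    for n in tr:
        tr[n][k] = C(n, ev).sum()   # Tr C_n(A) = sum of C_n over the eigenvalues

cov = lambda u, v: np.mean(u * v) - np.mean(u) * np.mean(v)
for n in (1, 2, 3):
    print([round(cov(tr[n], tr[m]), 2) for m in (1, 2, 3)])  # ~ diag(1, 2, 3)
```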
Note that all our manipulations were formal and we did not address analytic issues,
like the justification of the calculations concerning contour integrals. For this, and
also for extending the formula for the covariance beyond polynomial functions one
should consult the original literature, in particular [104, 15].
Exercise 12. Let Cn and Un be the Chebyshev polynomials, rescaled to the interval
[−2, +2], of the first and second kind, respectively. (See also Notation 8.33 and
subsequent exercises.)
(i) Show that the definition of the Chebyshev polynomials via recurrence rela-
tions, as given in Notation 8.33, is equivalent to the definition via trigonometric
functions, as given in the discussion following Theorem 1 and in Equation (5.28).
(ii) Show equations (5.29) and (5.30).
(iii) Show that the Chebyshev polynomials of first kind are orthogonal with respect to the arc-sine distribution, i.e., for all n, m ≥ 0 with (m, n) ≠ (0, 0) we have
$$\int_{-2}^{+2} C_n(x)\,C_m(x)\,\frac{dx}{\pi\sqrt{4-x^2}} = \delta_{nm}. \tag{5.32}$$
Note that the definition for the case n = 0, C0 = 2, is made in order to have it fit
with the recurrence relations; to fit the orthonormality relations, C0 = 1 would be
the natural choice.
Example 42. By similar calculations as for the GUE one can show that in the case of Wishart matrices the diagonalization of the covariance (5.13) is achieved by going over to shifted Chebyshev polynomials of the first kind, $\sqrt{c}^{\,n}\,C_n\big((x - (1+c))/\sqrt{c}\,\big)$. This result is due to Cabanal-Duvillard [47]; see also [14, 115].
Remark 43. We want to address here a combinatorial interpretation of the fact that
the Chebyshev polynomials Ck diagonalize the covariance for a GUE random matrix.
Let s be our second-order semi-circular element; hence ϕ2 (sm , sn ) is given by the
number of annular non-crossing pairings on an (m, n) annulus. This is, of course,
not diagonal in m and n because some points on each circle can be paired among
themselves, and this pairing on both sides has no correlation; so there is no constraint
that m has to be equal to n. However, a quantity which clearly must be the same for
both circles is the number of through-pairs, i.e., pairs which connect both circles.
Thus in order to diagonalize the covariance we should go over from the number of
points on a circle to the number of through-pairs leaving this circle. A nice way
to achieve this is to cut our diagrams in two parts - one part for each circle. These
diagrams will be called non-crossing annular half-pairings. See Figures 5.5 and 5.7.
What is left of a through-pair after the cutting will be called an open pair, as opposed to the closed pairs, which live entirely on one circle and are thus not affected by the cutting.
In this pictorial description sm corresponds to the sum over non-crossing an-
nular half-pairings on one circle with m points and sn corresponds to a sum over
non-crossing annular half-pairings on another circle with n points. Then ϕ2 (sm , sn )
corresponds to pairing the non-crossing annular half-pairings for sm with the non-
crossing annular half-pairings for sn . A pairing of two non-crossing annular half-
pairings consists of glueing together their open pairs in all possible planar ways.
This clearly means that both non-crossing annular half-pairings must have the same
number of open pairs, and thus our covariance should become diagonal if we go
over from the number n of points on a circle to the number k of open pairs. Fur-
thermore, there are clearly k possibilities to pair two sets of k open pairs in a planar
way.
Fig. 5.5 The 4 non-crossing half-pairings on four points with 2 through strings are shown.
From this point of view the Chebyshev polynomials Ck should describe k open
pairs. If we write xn as a linear combination of the Ck , xn = ∑nk=0 qn,kCk (x), then the
above correspondence suggests that for k > 0, the coefficients qn,k are the number
of non-crossing annular half-pairings of n points with k open pairs. See Fig. 5.6 and
Fig. 5.7.
162 5 Second-Order Freeness
Fig. 5.6 The triangle of the coefficients qn,k for n = 0, 1, 2, 3. As noted earlier, for the purpose of diagonalizing the fluctuations the constant term of the polynomials is not important. If we make the small adjustment that C0(x) = 1 and all the others are unchanged, then the recurrence relation becomes $C_{n+1}(x) = xC_n(x) - C_{n-1}(x)$ for n ≥ 2 and $C_2(x) = xC_1(x) - 2C_0(x)$. From this we obtain $q_{n+1,k} = q_{n,k-1} + q_{n,k+1}$ for k ≥ 1 and $q_{n+1,0} = 2q_{n,1}$. From these relations we see that for k ≥ 1 we have $q_{n,k} = \binom{n}{(n-k)/2}$ when n − k is even and 0 when n − k is odd. When k = 0 we have $q_{n,0} = 2\binom{n-1}{n/2-1}$ when n is even and $q_{n,0} = 0$ when n is odd.
Fig. 5.7 When n = 5 and k = 1, q5,1 = 10: the ten non-crossing half-pairings on five points with one through string.
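The formulas for the coefficients qn,k from Fig. 5.6 can be verified mechanically; the following sketch (assuming NumPy; the encoding is ours) checks the expansion $x^n = \sum_k q_{n,k}C_k(x)$ at sample points:

```python
import numpy as np
from math import comb

def C(k, x):
    # rescaled Chebyshev of the first kind, with the adjustment C_0 = 1
    return np.ones_like(x) if k == 0 else 2.0 * np.cos(k * np.arccos(x / 2.0))

def q(n, k):
    # the counting formulas from Fig. 5.6
    if (n - k) % 2 == 1:
        return 0
    if k == 0:
        return 2 * comb(n - 1, n // 2 - 1) if n % 2 == 0 else 0
    return comb(n, (n - k) // 2)

x = np.linspace(-1.99, 1.99, 7)
for n in range(1, 9):
    assert np.allclose(x**n, sum(q(n, k) * C(k, x) for k in range(n + 1)))
print("x^n = sum_k q_{n,k} C_k(x) verified for n = 1, ..., 8")
```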
That this is indeed the correct combinatorial interpretation of the result of Jo-
hansson can be found in [115]. There the main emphasis is actually on the case
of Wishart matrices and the result of Cabanal-Duvillard from Example 42. The
Wishart case can be understood in a similar combinatorial way; instead of non-
crossing annular half-pairings and through-pairs one has to consider non-crossing
annular half-permutations and through-blocks.
Consider now the situation of several variables; then we have to diagonalize the
bilinear form (p1 , p2 ) 7→ ϕ2 (p1 (a1 , . . . , as ), p2 (a1 , . . . , as )). For polynomials in just
5.6 Diagonalization of fluctuations 163
one of the variables this is the same problem as in the previous section. It remains
to understand the mixed fluctuations in more than one variable. If we have that
a1 , . . . , as are free of second order, then this is fairly easy. The following theorem
from [128] follows directly from Definition 24 of second order freeness.
Theorem 44. Assume a1, . . . , as are free of second order in the second-order probability space (A, ϕ, ϕ2). Let, for each i = 1, . . . , s, $Q_k^{(i)}$ (k ≥ 0) be the orthogonal polynomials for the distribution of ai; i.e., $Q_k^{(i)}$ is a polynomial of degree k such that $\varphi\big(Q_k^{(i)}(a_i)\,Q_l^{(i)}(a_i)\big) = \delta_{kl}$ for all k, l ≥ 0. Then the fluctuations of mixed words in the ai's are diagonalized by cyclically alternating products $Q_{k_1}^{(i_1)}(a_{i_1})\cdots Q_{k_m}^{(i_m)}(a_{i_m})$ (with all kr ≥ 1 and i1 ≠ i2, i2 ≠ i3, . . . , im ≠ i1), and the covariances are given by the number of cyclic matchings of these products:
$$\varphi_2\Big(Q_{k_1}^{(i_1)}(a_{i_1})\cdots Q_{k_m}^{(i_m)}(a_{i_m}),\; Q_{l_1}^{(j_1)}(a_{j_1})\cdots Q_{l_n}^{(j_n)}(a_{j_n})\Big) = \delta_{mn}\cdot\#\{r \in \{1, \dots, n\} \mid i_s = j_{s+r},\ k_s = l_{s+r}\ \forall s = 1, \dots, n\}, \tag{5.33}$$
where the indices s + r are counted modulo n.
Remark 45. Note the different nature of the solution for the one-variate and the
multi-variate case. For example, for independent GUE’s we have that the covariance
is diagonalized by the following set of polynomials:
◦ Chebyshev polynomials Ck of first kind in one of the variables
◦ cyclically alternating products of Chebyshev polynomials Uk of second kind for
different variables.
Again there is a combinatorial way of understanding the appearance of the two dif-
ferent kinds of Chebyshev polynomials. As we have outlined in Remark 43, the
Chebyshev polynomials Ck show up in the one-variate case, because this corre-
sponds to going over to non-crossing annular half-pairings with k through-pairs.
In the multi-variate case one has to realize that having several variables breaks the
circular symmetry of the circle and thus effectively replaces a circular problem by a
linear one. In this spirit, the expansion of xn in terms of Chebyshev polynomials Uk
of second kind counts the number of non-crossing linear half-pairings on n points
with k open pairs.
In the Wishart case there is a similar description by replacing non-crossing an-
nular half-permutations by non-crossing linear half-permutations, resulting in an
analogue appearance of orthogonal polynomials of first and second kind for the
one-variate and multi-variate situation, respectively.
More details and the proofs of the above statements can be found in [115].
Chapter 6
Free Group Factors and Freeness
functions can be identified with the group algebra C[G] of formal finite linear combinations of elements in G with complex coefficients, $a = \sum_{g\in G} a(g)\,g$, where only finitely many a(g) ≠ 0. Integration over such functions is with respect to the counting measure, hence the convolution is then written as
$$(a * b)(g) = \sum_{h\in G} a(h)\,b(h^{-1}g),$$
and is hence nothing but the multiplication in C[G]. Note that the function δe = 1 · e is the identity element in the group algebra C[G], where e is the identity element in G.
Now define an inner product on C[G] by setting
$$\langle g, h\rangle = \begin{cases} 1, & \text{if } g = h,\\ 0, & \text{if } g \ne h \end{cases} \tag{6.1}$$
on G and extending sesquilinearly to C[G]. From this inner product we define the 2-norm on C[G] by $\|a\|_2^2 = \langle a, a\rangle$. In this way (C[G], ‖·‖2) is a normed vector space. However, it is not complete in the case of infinite G (for finite G the following is trivial). The completion of C[G] with respect to ‖·‖2 consists of all functions a : G → C satisfying $\sum_{g\in G}|a(g)|^2 < \infty$; it is denoted by ℓ²(G) and is a Hilbert space.
Now consider the unitary group representation λ : G → U(ℓ²(G)) defined by
$$\lambda(g)\,h = gh \qquad (g, h \in G),$$
extended by continuity to all of ℓ²(G). This is the left regular representation of G on the Hilbert space ℓ²(G).
from the definition that each λ (g) is an isometry of `2 (G), but we want to check
that it is in fact a unitary operator on `2 (G). Since clearly hgh, ki = hh, g−1 ki, the
adjoint of the operator λ (g) is λ (g−1 ). But then since λ is a group homomorphism,
we have λ (g)λ (g)∗ = I = λ (g)∗ λ (g), so that λ (g) is indeed a unitary operator on
`2 (G).
Now extend the domain of λ from G to C[G] by linearity:
$$\lambda(a) = \sum_{g\in G} a(g)\,\lambda(g) \qquad\text{for } a = \sum_{g\in G} a(g)\,g.$$
This makes λ into an algebra homomorphism λ : C[G] → B(ℓ²(G)), i.e. λ is a representation of the group algebra on ℓ²(G). We define two new (closed) algebras via
this representation. The reduced group C∗-algebra $C^*_{\mathrm{red}}(G)$ of G is the closure of
λ (C[G]) ⊂ B(`2 (G)) in the operator norm topology. The group von Neumann alge-
bra of G, denoted L(G), is the closure of λ (C[G]) in the strong operator topology
on B(`2 (G)).
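For a finite group all of this is finite-dimensional and completely concrete. The following toy sketch (assuming NumPy; the choice of the cyclic group Z/4 is ours) realizes the left regular representation by permutation matrices and checks unitarity, τ(λ(g)) = δg,e, and the trace property discussed below:

```python
import numpy as np

n = 4  # the cyclic group Z/4; group element h <-> standard basis vector e_h

def lam(g):
    # left regular representation: lam(g) e_h = e_{(g+h) mod n}
    L = np.zeros((n, n))
    for h in range(n):
        L[(g + h) % n, h] = 1.0
    return L

tau = lambda a: a[0, 0]  # tau(a) = <a e, e>, with e = 0 the group identity

for g in range(n):
    L = lam(g)
    assert np.allclose(L @ L.T, np.eye(n))   # lam(g) is unitary
    print(g, tau(L))                          # = 1 iff g = e

a, b = lam(1) + 2.0 * lam(3), lam(2) - lam(1)
assert np.isclose(tau(a @ b), tau(b @ a))     # trace property of tau
```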
One knows that for an infinite discrete group G, L(G) is a type II1 von Neumann
algebra, i.e. L(G) is infinite dimensional, but yet there is a trace τ on L(G) defined
by τ(a) := hae, ei for a ∈ L(G), where e ∈ G is the identity element. To see the
trace property of τ it suffices to check it for group elements; this extends then to
the general situation by linearity and normality. However, for g, h ∈ G, the fact that
τ(gh) = τ(hg) is just the statement that gh = e is equivalent to hg = e; this is clearly
true in a group. The existence of a trace shows that L(G) is a proper subalgebra
of B(`2 (G)); this is the case because there does not exist a trace on all bounded
operators on an infinite-dimensional Hilbert space. An easy fact is that if G is an ICC group, meaning that the conjugacy class of each g ∈ G with g ≠ e has infinite cardinality, then L(G) is a factor, i.e. has trivial centre (see [106, Theorem 6.75]). Another fact is that if G is an amenable group (e.g. the infinite permutation group
S∞ = ∪n Sn ), then L(G) is the hyperfinite II1 factor R.
Exercise 1. (i) Show that L(G) is a factor if and only if G is an ICC group.
(ii) Show that the infinite permutation group S∞ = ∪n Sn is ICC. (Note that each
element from S∞ moves only a finite number of elements.)
Exercise 3. Prove Theorem 2 by observing that the assumptions imply that the GNS-
constructions with respect to ϕ and ψ are isomorphic.
6.5 Freeness in the free group factors 169
Though the theorem is not hard to prove, it conveys the important message that
all information about a von Neumann algebra is, in principle, contained in the ∗ -
moments of a generating set with respect to a faithful normal state.
In the case of the group von Neumann algebras L(G) the canonical state is the
trace τ. This is defined as a vector state, so it is automatically normal. It is worth noticing that it is also faithful (and hence (L(G), τ) is a tracial W∗-probability space).
Proposition 3. The trace τ on L(G) is a faithful state.
Proof: Suppose that a ∈ L(G) satisfies 0 = τ(a∗a) = ⟨a∗ae, e⟩ = ⟨ae, ae⟩; thus ae = 0. So we have to show that ae = 0 implies a = 0. To show that a = 0, it suffices to show that ⟨aξ, η⟩ = 0 for any ξ, η ∈ ℓ²(G). It suffices to consider vectors of the form ξ = g, η = h for g, h ∈ G, since we can get the general case from this by linearity and continuity. Now, by using the traciality of τ, we have
$$\langle ag, h\rangle = \langle age, he\rangle = \langle h^{-1}age, e\rangle = \tau(h^{-1}ag) = \tau(gh^{-1}a) = \langle gh^{-1}ae, e\rangle = 0,$$
since ae = 0.
Thus freeness of the subgroup algebras C[G1 ], . . . , C[Gs ] with respect to τ is just
a simple reformulation of the fact that G1 , . . . , Gs are free subgroups of G. However,
a non-trivial fact is that this reformulation carries over to closures of the subalgebras.
since, by the freeness of $B_1, \dots, B_s$, we have $\varphi\big(b_1^{(n)}\cdots b_k^{(n)}\big) = 0$ for each n.
(2) Consider a1 , . . . , ak with ai ∈ M ji , ϕ(ai ) = 0, and ji 6= ji+1 for all i. We have
to show that ϕ(a1 · · · ak) = 0. We approximate essentially as in the C∗-algebra case; we only have to take care that the multiplication of our k factors is still continuous in the appropriate topology. More precisely, we can now approximate, for each i, the operator ai in the strong operator topology by a sequence (or a net, if you must) $b_i^{(n)}$. By invoking Kaplansky's density theorem we can choose these such that we keep everything bounded, namely $\|b_i^{(n)}\| \le \|a_i\|$ for all n. Again we can centre the sequence, so that we can assume that all $\varphi(b_i^{(n)}) = 0$. Since the multiplication is, on bounded sets, jointly continuous in the strong operator topology, we still have the convergence of $b_1^{(n)}\cdots b_k^{(n)}$ to $a_1\cdots a_k$, and thus, since ϕ is normal, also the convergence of $0 = \varphi(b_1^{(n)}\cdots b_k^{(n)})$ to $\varphi(a_1\cdots a_k)$.
Proof:
Let x be a normal element in M which is such that its spectral measure with re-
spect to τ is diffuse. Let A = vN(x) be the von Neumann algebra generated by x.
We want to show that there is a Haar unitary u ∈ A that generates A as a von Neu-
mann algebra. A is a commutative von Neumann algebra and the restriction of τ to
A is a faithful state. A cannot have any minimal projections as that would mean that
the spectral measure of x with respect to τ was not diffuse. Thus there is a normal
∗-isomorphism π : A → L∞[0, 1], where we put Lebesgue measure on [0, 1]. (This follows from the well-known fact that any commutative von Neumann algebra is ∗-isomorphic to L∞(µ) for some measure µ and that all spaces L∞(µ) for µ without atoms are ∗-isomorphic; see, for example, [171, Chapter III, Theorem 1.22].)
Under π the trace τ becomes a normal state on L∞[0, 1]. Thus there is a positive function h ∈ L¹[0, 1] such that for all a ∈ A, $\tau(a) = \int_0^1 \pi(a)(t)\,h(t)\,dt$. Since τ is faithful, the set {t ∈ [0, 1] | h(t) = 0} has Lebesgue measure 0. Thus $H(s) = \int_0^s h(t)\,dt$ is a continuous positive strictly increasing function on [0, 1] with range [0, 1]. So
by the Stone-Weierstrass theorem the C∗ -algebra generated by 1 and H is all of
C[0, 1]. Hence the von Neumann algebra generated by 1 and H is all of L∞ [0, 1]. Let
v(t) = exp(2πiH(t)). Then H is in the von Neumann algebra generated by v, so the
von Neumann algebra generated by v is L∞ [0, 1]. Also,
$$\int_0^1 v(t)^n\,h(t)\,dt = \int_0^1 \exp(2\pi i n H(t))\,H'(t)\,dt = \int_0^1 e^{2\pi i n s}\,ds = \delta_{0,n}.$$
Thus v is a Haar unitary with respect to the state given by the density h. Finally let u ∈ A be such that π(u) = v.
Then the von Neumann algebra generated by u is A and u is a Haar unitary with
respect to the trace τ.
This means that for each i we can find in vN(xi ) a Haar unitary ui which generates
the same von Neumann algebra as xi . By Proposition 5, freeness of the xi goes over
to freeness of the ui . So we have found n Haar unitaries in M which are ∗-free and
which generate M. Thus M is isomorphic to the free group factor L(Fn ).
Example 7. Instead of generating L(Fn ) by n ∗-free Haar unitaries it is also very
common to use n free semi-circular elements. (Note that for self-adjoint elements ∗-
freeness is of course the same as freeness.) This is of course covered by the theorem
above. But let us be a bit more explicit on deforming a semi-circular element into
a Haar unitary. Let s ∈ M be a semi-circular operator. The spectral measure of s is $\frac{1}{2\pi}\sqrt{4 - t^2}\,dt$, i.e.
172 6 Free Group Factors and Freeness
$$\tau(f(s)) = \frac{1}{2\pi}\int_{-2}^{2} f(t)\,\sqrt{4 - t^2}\,dt.$$
If
$$H(t) = \frac{t}{4\pi}\sqrt{4 - t^2} + \frac1\pi\sin^{-1}(t/2), \qquad\text{then}\qquad H'(t) = \frac{1}{2\pi}\sqrt{4 - t^2},$$
and u = exp(2πiH(s)) is a Haar unitary, i.e.
$$\tau(u^k) = \int_{-2}^{2} e^{2\pi i k H(t)}\,H'(t)\,dt = \int_{-1/2}^{1/2} e^{2\pi i k r}\,dr = \delta_{0,k}.$$
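This computation is easily confirmed numerically; a minimal sketch (assuming NumPy; the Riemann-sum discretization is ours):

```python
import numpy as np

def H(t):
    return t * np.sqrt(4 - t**2) / (4 * np.pi) + np.arcsin(t / 2) / np.pi

t = np.linspace(-2, 2, 400001)
w = np.sqrt(4 - t**2) / (2 * np.pi)   # semi-circular density, equal to H'(t)
dt = t[1] - t[0]
for k in range(4):
    mom = np.sum(np.exp(2j * np.pi * k * H(t)) * w) * dt   # tau(u^k)
    print(k, np.round(mom, 4))   # 1 for k = 0, approximately 0 otherwise
```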
To prove this theorem we must find in $L(F_3)_{1/2}$ nine free normal elements with
diffuse spectral measure which generate L(F3 )1/2 . In order to achieve this we will
start with normal elements x1 , x2 , x3 , together with a faithful normal state ϕ, such
that
◦ the spectral measure of each xi is diffuse (i.e. no atoms) and
◦ x1 , x2 , x3 are ∗-free.
Let N be the von Neumann algebra generated by x1 , x2 and x3 . Then N ' L(F3 ). We
will then show that there is a projection p in N such that
◦ ϕ(p) = 1/2
◦ there are 9 free and diffuse elements in pN p which generate pN p.
Thus $L(F_3)_{1/2} \simeq pNp \simeq L(F_9)$.
The crucial issue above is that we will be able to choose our elements x1 , x2 , x3
in such a form that we can easily recognize p and the generating elements of pN p.
(Just starting abstractly with three ∗-free normal diffuse elements will not be very
helpful, as we have then no idea how to get p and the required nine free elements.)
Actually, since our claim is equivalent to L(F3 ) ' M2 (C) ⊗ L(F9 ), it will surely be
a good idea to try to realize x1 , x2 , x3 as 2 × 2 matrices. This will be achieved in the
next section with the help of circular operators.
Since s1 and s2 are free we can easily calculate the free cumulants of c. If ε = ±1, let us adopt the following notation for $x^{(\varepsilon)}$: $x^{(-1)} = x^*$ and $x^{(1)} = x$. Recall that for a standard semi-circular operator s
$$\kappa_n(s, \dots, s) = \begin{cases} 1, & n = 2,\\ 0, & n \ne 2. \end{cases}$$
Thus
$$\kappa_n\big(c^{(\varepsilon_1)}, \dots, c^{(\varepsilon_n)}\big) = 2^{-n/2}\big(\kappa_n(s_1, \dots, s_1) + i^{\,n}\,\varepsilon_1\cdots\varepsilon_n\,\kappa_n(s_2, \dots, s_2)\big),$$
since all mixed cumulants in s1 and s2 are 0. Thus $\kappa_n(c^{(\varepsilon_1)}, \dots, c^{(\varepsilon_n)}) = 0$ for n ≠ 2, and
$$\kappa_2\big(c^{(\varepsilon_1)}, c^{(\varepsilon_2)}\big) = 2^{-1}\big(\kappa_2(s_1, s_1) - \varepsilon_1\varepsilon_2\,\kappa_2(s_2, s_2)\big) = \frac{1 - \varepsilon_1\varepsilon_2}{2} = \begin{cases} 1, & \varepsilon_1 \ne \varepsilon_2,\\ 0, & \varepsilon_1 = \varepsilon_2. \end{cases}$$
Now note that any π ∈ NC₂(2n) connects, by parity reasons, automatically only c with c∗; hence $\kappa_\pi(c^*, c, c^*, c, \dots, c^*, c) = 1$ for all π ∈ NC₂(2n) and we have
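The moment-cumulant formula thus gives $\tau((c^*c)^n) = \#NC_2(2n) = C_n$, the n-th Catalan number. One can test this against a complex Ginibre matrix (a random matrix model whose ∗-distribution converges to that of a circular element); a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)

N = 1000
# complex Ginibre matrix, normalized so that tr(G* G) -> 1
G = (rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))) / np.sqrt(2 * N)
W = G.conj().T @ G
P = np.eye(N, dtype=complex)
for n in range(1, 5):
    P = P @ W
    print(n, round(np.trace(P).real / N, 3))   # Catalan numbers: 1, 2, 5, 14
```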
The proof of (i) and (ii) can either be done using random matrix methods (as was
done by Voiculescu [180]) or by showing that if u is a Haar unitary and q is a quarter-
circular operator such that u and q are ∗-free then uq has the same ∗-moments as
a circular operator (this was done by Nica and Speicher [140]). The latter can be
achieved, for example, by using the formula for cumulants of products, equation
(2.23). For the details of this approach, see [140, Theorem 15.14].
Here we have used the standard notation M2 (A) = M2 (C) ⊗ A for 2 × 2 matrices
with entries from A and ϕ2 = tr ⊗ ϕ for the composition of the normalized trace
with ϕ.
Proof: Let C⟨x11, x12, x21, x22⟩ be the polynomials in the non-commuting variables x11, x12, x21, x22. Let
$$p_k(x_{11}, x_{12}, x_{21}, x_{22}) = \frac12\,\mathrm{Tr}\begin{pmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{pmatrix}^{\!k}.$$
Then x1 , x2 , x3 are ∗-free in M2 (A) with respect to the state tr ⊗ ϕ; x1 and x2 are
semi-circular and x3 is normal and diffuse.
Proof:
We model x1 by X1, x2 by X2 and x3 by X3, where
$$X_1 = \begin{pmatrix} S_1 & C_1\\ C_1^* & S_2 \end{pmatrix}, \qquad X_2 = \begin{pmatrix} S_3 & C_2\\ C_2^* & S_4 \end{pmatrix}, \qquad X_3 = \begin{pmatrix} U & 0\\ 0 & 2U \end{pmatrix}$$
◦ the elements
$$x_1 = \begin{pmatrix} s_1 & c_1\\ c_1^* & s_2 \end{pmatrix}, \qquad x_2 = \begin{pmatrix} s_3 & c_2\\ c_2^* & s_4 \end{pmatrix}, \qquad x_3 = \begin{pmatrix} u & 0\\ 0 & 2u \end{pmatrix},$$
where c1 = v1 |c1 | and c2 = v2 |c2 | are the polar decompositions of c1 and c2 , respec-
tively, in M.
Hence we see that N = vN(x1 , x2 , x3 ) is generated by the ten elements
$$y_1 = \begin{pmatrix} s_1 & 0\\ 0 & 0 \end{pmatrix},\quad y_2 = \begin{pmatrix} 0 & 0\\ 0 & s_2 \end{pmatrix},\quad y_3 = \begin{pmatrix} 0 & v_1\\ 0 & 0 \end{pmatrix},\quad y_4 = \begin{pmatrix} 0 & 0\\ 0 & |c_1| \end{pmatrix},\quad y_5 = \begin{pmatrix} s_3 & 0\\ 0 & 0 \end{pmatrix},$$
$$y_6 = \begin{pmatrix} 0 & 0\\ 0 & s_4 \end{pmatrix},\quad y_7 = \begin{pmatrix} 0 & v_2\\ 0 & 0 \end{pmatrix},\quad y_8 = \begin{pmatrix} 0 & 0\\ 0 & |c_2| \end{pmatrix},\quad y_9 = \begin{pmatrix} u & 0\\ 0 & 0 \end{pmatrix},\quad y_{10} = \begin{pmatrix} 0 & 0\\ 0 & u \end{pmatrix}.$$
Let us put
$$v := \begin{pmatrix} 0 & v_1\\ 0 & 0 \end{pmatrix}; \qquad\text{then}\qquad v^*v = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix} \quad\text{and}\quad vv^* = \begin{pmatrix} 1 & 0\\ 0 & 0 \end{pmatrix} = p = p^2.$$
Since we can now write any $p\,y_{i_1}\cdots y_{i_n}\,p$ in the form $p\,y_{i_1}1\,y_{i_2}1\cdots 1\,y_{i_n}\,p$ and replace each 1 by $p^2 + v^*v$, it is clear that $\bigcup_{i=1}^{10}\{p y_i p,\ p y_i v^*,\ v y_i p,\ v y_i v^*\}$ generates pN p.
Note that v1 v∗1 = 1 can be removed from the set of generators. To check that the
remaining nine elements are ∗-free and diffuse we recall a few elementary facts
about freeness.
Exercise 5. Show the following:
(i) if A1 and A2 are free subalgebras of A, if A11 and A12 are free subalgebras
of A1 , and if A21 and A22 are free subalgebras of A2 ; then A11 , A12 , A21 , A22 are
free;
(ii) if u is a Haar unitary ∗-free from A, then A is ∗-free from uAu∗ ;
(iii) if u1 and u2 are Haar unitaries and u2 is ∗-free from {u1 } ∪ A then u2 u∗1 is a
Haar unitary and is ∗-free from u1 Au∗1 .
By construction s1, s2, s3, s4, |c1|, |c2|, v1, v2, u are ∗-free. Thus, in particular, s2, s4, |c1|, |c2|, v2, u are ∗-free. Hence, by (ii), v1s2v1∗, v1s4v1∗, v1|c1|v1∗, v1|c2|v1∗, v1uv1∗ are ∗-free and, in addition, ∗-free from u, s1, s3, v2. Thus
$$u,\ s_1,\ s_3,\ v_2,\ v_1s_2v_1^*,\ v_1s_4v_1^*,\ v_1|c_1|v_1^*,\ v_1|c_2|v_1^*,\ v_1uv_1^*$$
are ∗-free. Let A = alg(s2, s4, |c1|, |c2|, u). We have that v2 is ∗-free from {v1} ∪ A, so by (iii), v2v1∗ is ∗-free from v1Av1∗. Thus v2v1∗ is ∗-free from
$$v_1s_2v_1^*,\ v_1s_4v_1^*,\ v_1|c_1|v_1^*,\ v_1|c_2|v_1^*,\ v_1uv_1^*,$$
and it was already ∗-free from s1, s3 and u. Thus by (i) our nine elements
$$s_1,\ s_3,\ u,\ v_1s_2v_1^*,\ v_1s_4v_1^*,\ v_1|c_1|v_1^*,\ v_1|c_2|v_1^*,\ v_1uv_1^*,\ v_2v_1^*$$
are ∗-free. Since they are either semi-circular, quarter-circular or Haar elements, they are all normal and diffuse; as they generate pN p, we have that pN p is generated by nine ∗-free normal and diffuse elements and thus, by Theorem 6, pN p ≃ L(F9). Hence $L(F_3)_{1/2} \simeq L(F_9)$.
with all $s_j^{(i)}$ (j = 1, . . . , k; i = 1, . . . , n−1) semi-circular, all $c_{pq}^{(i)}$ (1 ≤ p < q ≤ k; i = 1, . . . , n−1) circular, and u a Haar unitary, so that all elements are ∗-free. So we have (n−1)k semi-circular operators, $(n-1)\binom k2$ circular operators and one Haar unitary. Each circular operator produces two free elements, so we have in total
$$(n-1)k + 2(n-1)\binom k2 + 1 = (n-1)k^2 + 1$$
free and diffuse generators. Thus $L(F_n)_{1/k} \simeq L(F_{1+(n-1)k^2})$.
Theorem 13. Let R be the hyperfinite II1 factor and L(F∞ ) = vN(s1 , s2 , . . . ) be a
free group factor generated by countably many free semicircular elements si , such
that R and L(F∞) are free in some W∗-probability space (M, τ). Consider orthogonal projections p1, p2, · · · ∈ R and put $r := 1 + \sum_j \tau(p_j)^2 \in [1, \infty]$. Then the von Neumann algebra
$$L(F_r) := \mathrm{vN}\big(R,\ p_j s_j p_j\ (j \in \mathbb N)\big) \tag{6.4}$$
is a factor and depends, up to isomorphism, only on r.
These L(Fr) for r ∈ R, 1 ≤ r ≤ ∞, are the interpolating free group factors. Note that we do not claim to have non-integer free groups Fr; the symbol L(Fr) has to be taken as a whole, it cannot be split into smaller components.
Dykema and Rădulescu showed the following results.
Theorem 14. 1) For r ∈ {2, 3, 4, . . . , ∞} the interpolating free group factor L(Fr ) is
the usual free group factor.
2) We have for all r, s ≥ 1: $L(F_r) * L(F_s) \simeq L(F_{r+s})$.
3) We have for all r ≥ 1 and all t ∈ (0, ∞) the same compression formula as in
the integer case:
$$L(F_r)_t \simeq L\big(F_{1+t^{-2}(r-1)}\big). \tag{6.5}$$
The compression formula above is also valid in the case r = ∞; since then 1 +
t −2 (r − 1) = ∞, it yields in this case that any compression of L(F∞ ) is isomorphic
to L(F∞ ); or in other words we have that the fundamental group of L(F∞ ) is equal
to R+ .
6.12 The dichotomy for the free group factor isomorphism problem
Whereas for r = ∞ the compression of L(Fr ) gives the same free group factor (and
thus we know that the fundamental group is maximal in this case), for r < ∞ we get
some other free group factors. Since we do not know whether these are isomorphic
to the original L(Fr ) we cannot decide upon the fundamental group in this case.
However, on the positive side, we can connect different free group factors by com-
pressions; this yields that some isomorphisms among the free group factors will
imply other isomorphisms. For example, if we knew that L(F2) ≃ L(F3), then this would imply that also
$$L(F_5) \simeq L(F_2)_{1/2} \simeq L(F_3)_{1/2} \simeq L(F_9).$$
Chapter 7
Free Entropy χ - the Microstates Approach via Large Deviations

7.1 Motivation
Let us return to the connection between random matrix theory and free probability
theory which we have been developing. We know that a p-tuple $(A_N^{(1)}, \dots, A_N^{(p)})$ of N × N matrices chosen independently at random with respect to the GUE density (compare Exercise 1.8), $P_N(A) = \mathrm{const}\cdot\exp\big(-N\,\mathrm{Tr}(A^2)/2\big)$, on the space of N × N
Hermitian matrices converges almost surely (in moments with respect to the nor-
malized trace) to a freely independent family (s1 , . . . , s p ) of semi-circular elements
lying in a non-commutative probability space, see Theorem 4.4. The von Neumann
algebra generated by p freely independent semi-circulars is the von Neumann alge-
bra L(F p ) of the free group on p generators.
We ask now the following question: How likely is it to observe other distribu-
tions/operators for large N?
Let us consider the case p = 1 more closely. For a random Hermitian matrix
A = A∗ (distribution as above) with real random eigenvalues λ1 ≤ · · · ≤ λN , denote
by
$$\mu_A = \frac1N\big(\delta_{\lambda_1} + \cdots + \delta_{\lambda_N}\big) \tag{7.1}$$
the eigenvalue distribution of A (also known as the empirical eigenvalue distri-
bution), which is a random measure on R. Wigner’s semicircle law states that, as
N → ∞, PN (µA ≈ µW ) → 1, where µW is the (non-random) semi-circular distribu-
tion and µA ≈ µW means that the measures are close in a sense that can be made
precise. We are now interested in the deviations from this. What is the rate of de-
cay of the probability PN (µA ≈ ν), where ν is some measure (not necessarily the
semi-circle)? We expect that
$$P_N(\mu_A \approx \nu) \sim e^{-N^2 I(\nu)} \tag{7.2}$$
for some rate function I vanishing at µW . By analogy with the classical theory of
large deviations, I should correspond to a suitable notion of free entropy.
We used in the above the notion “≈” for meaning “being close” and “∼” for
“behaves asymptotically (in N) like”; here they should just be taken on an intuitive
level, later, in the actual theorems they will be made more precise.
In the next two sections we will recall some of the basic facts of the classical
theory of large deviations and, in particular, Sanov’s theorem; this standard material
can be found, for example, in the book [64]. In Section 7.4 we will come back to the
random matrix question.
For example if µ = N(0, 1) is Gaussian then m = 0 and Sn has the Gaussian distribution N(0, 1/n), and hence
$$P(S_n \approx x) = P(S_n \in [x, x + dx]) \approx e^{-nx^2/2}\,\frac{\sqrt n}{\sqrt{2\pi}}\,dx \sim e^{-nI(x)}\,dx.$$
Thus the probability that Sn is near the value x decays exponentially in n at a rate
determined by x, namely the rate function I(x) = x²/2. Note that the convex function I(x) has a global minimum at x = 0, the minimum value there being 0, which corresponds to the fact that Sn approaches the mean 0 in probability.
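This rate can be checked directly from the exact Gaussian tail; a small sketch using only the Python standard library:

```python
from math import erfc, log, sqrt

x = 1.0
for n in (10, 100, 1000):
    p = 0.5 * erfc(x * sqrt(n / 2.0))   # P(S_n > x) for S_n ~ N(0, 1/n)
    print(n, log(p) / n)                # tends to -x^2/2 = -0.5
```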
This behaviour is described in general by the following theorem of Cramér. Let
X, µ, {Xi }i and Sn be as above. There exists a function I(x), the rate function, such
that
How does one calculate the rate function I for a given distribution µ? We shall let X be a random variable with the same distribution as the Xi's. For arbitrary x > m, one has for all λ ≥ 0, by Markov's inequality,
$$P(S_n > x) \le e^{-n\lambda x}\,\mathbf E\big[e^{\lambda(X_1 + \cdots + X_n)}\big] = e^{-n(\lambda x - \Lambda(\lambda))}, \tag{7.5}$$
where $\Lambda(\lambda) := \log \mathbf E[e^{\lambda X}]$.
This implies that for λ < 0 and x > m we have −n(λ x −Λ (λ )) ≥ 0 and so equation
(7.5) is valid for all λ . Thus
184 7 Free Entropy χ - the Microstates Approach via Large Deviations
$$P(S_n > x) \le \inf_{\lambda} e^{-n(\lambda x - \Lambda(\lambda))} = \exp\Big(-n\,\sup_{\lambda}\big(\lambda x - \Lambda(\lambda)\big)\Big) = e^{-n\Lambda^*(x)},$$
where $\Lambda^*(x) := \sup_\lambda(\lambda x - \Lambda(\lambda))$ is the Legendre transform of Λ.
However, in preparation for the vector valued version we will show that exp (−nΛ ∗ (x))
is asymptotically a lower bound; more precisely, we need to verify that
$$\liminf_{n\to\infty}\frac1n\log P(x - \delta < S_n < x + \delta) \ge -\Lambda^*(x)$$
for all x and all δ > 0. By replacing Xi by Xi − x we can reduce this to the case x = 0,
namely showing that
$$-\Lambda^*(0) \le \liminf_{n\to\infty}\frac1n\log P(-\delta < S_n < \delta). \tag{7.9}$$
Note that −Λ ∗ (0) = infλ Λ (λ ). The idea of the proof of (7.9) is then to perturb
the distribution µ to µ̃ such that x = 0 is the mean of µ̃. Let us only consider the
case where Λ has a global minimum at some point η. This will always be the case if
µ has compact support and both P(X > 0) and P(X < 0) are not 0. The general case
can be reduced to this by a truncation argument. With this reduction Λ (λ ) is finite
for all λ and thus Λ has an infinite radius of convergence (c.f. Exercise 1) and thus
Λ is differentiable. So we have Λ′(η) = 0. Now let µ̃ be the measure on R given by
$$d\tilde\mu(x) = e^{-\Lambda(\eta)}\,e^{\eta x}\,d\mu(x).$$
Note that
$$\int_{\mathbb R} d\tilde\mu(x) = e^{-\Lambda(\eta)}\int_{\mathbb R} e^{\eta x}\,d\mu(x) = e^{-\Lambda(\eta)}\,\mathbf E[e^{\eta X}] = e^{-\Lambda(\eta)}\,e^{\Lambda(\eta)} = 1,$$
which verifies that µ̃ is a probability measure. Consider now i.i.d. random variables
{X̃i }i with distribution µ̃, and put S̃n = (X̃1 + · · · + X̃n )/n. Let X̃ have the distribution
µ̃. We have
$$\mathbf E[\tilde X] = \int_{\mathbb R} x\,d\tilde\mu(x) = e^{-\Lambda(\eta)}\int_{\mathbb R} x\,e^{\eta x}\,d\mu(x) = e^{-\Lambda(\eta)}\,\frac{d}{d\lambda}\Big(\int_{\mathbb R} e^{\lambda x}\,d\mu(x)\Big)\Big|_{\lambda=\eta} = e^{-\Lambda(\eta)}\,\frac{d}{d\lambda}e^{\Lambda(\lambda)}\Big|_{\lambda=\eta} = e^{-\Lambda(\eta)}\,\Lambda'(\eta)\,e^{\Lambda(\eta)} = \Lambda'(\eta) = 0.$$
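Here is a concrete discrete illustration of this exponential tilting (a sketch assuming NumPy; the two-point measure µ is an arbitrary example of ours). We locate the minimizer η of Λ on a grid and check that µ̃ is a probability measure with mean zero:

```python
import numpy as np

xs = np.array([1.0, -1.0])   # atoms of mu
ps = np.array([0.3, 0.7])    # mu({1}) = 0.3, mu({-1}) = 0.7, so m = -0.4 < 0
lam = np.linspace(-3.0, 3.0, 600001)
Lam = np.log(np.exp(np.outer(lam, xs)) @ ps)   # Lambda(l) = log E[e^{l X}]
eta = lam[np.argmin(Lam)]                      # global minimizer of Lambda
tilted = ps * np.exp(eta * xs - Lam.min())     # density e^{-Lambda(eta)} e^{eta x}
print(tilted.sum())   # ~ 1 : mu~ is a probability measure
print(tilted @ xs)    # ~ 0 : mu~ has mean zero
```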
Now, for all ε > 0, we have $\exp(\eta\sum x_i) \le \exp(n\varepsilon|\eta|)$ whenever $|\sum x_i| \le n\varepsilon$, and so
$$\begin{aligned}
P(-\varepsilon < S_n < \varepsilon) &= \int_{|\sum_{i=1}^n x_i| < n\varepsilon} d\mu(x_1)\cdots d\mu(x_n)\\
&\ge e^{-n\varepsilon|\eta|}\int_{|\sum_{i=1}^n x_i| < n\varepsilon} e^{\eta\sum x_i}\,d\mu(x_1)\cdots d\mu(x_n)\\
&= e^{-n\varepsilon|\eta|}\,e^{n\Lambda(\eta)}\int_{|\sum_{i=1}^n x_i| < n\varepsilon} d\tilde\mu(x_1)\cdots d\tilde\mu(x_n)\\
&= e^{-n\varepsilon|\eta|}\,e^{n\Lambda(\eta)}\,P(-\varepsilon < \tilde S_n < \varepsilon).
\end{aligned}$$
By the weak law of large numbers, S̃n → E[X̃i] = 0 in probability, i.e. we have limn→∞ P(−ε < S̃n < ε) = 1 for all ε > 0. Thus for all 0 < ε < δ
$$\liminf_{n\to\infty}\frac1n\log P(-\delta < S_n < \delta) \ge \liminf_{n\to\infty}\frac1n\log P(-\varepsilon < S_n < \varepsilon) \ge \Lambda(\eta) - \varepsilon|\eta|,$$
and since ε > 0 was arbitrary,
$$\liminf_{n\to\infty}\frac1n\log P(-\delta < S_n < \delta) \ge \Lambda(\eta) = \inf_\lambda\Lambda(\lambda) = -\Lambda^*(0).$$
This sketches the proof of Cramér’s theorem for R. The higher-dimensional form
of Cramér’s theorem can be proved in a similar way.
Theorem 1 (Cramér's Theorem for Rd). Let X1, X2, . . . be a sequence of i.i.d. random vectors, i.e. independent Rd-valued random variables with common distribution µ (a probability measure on Rd). Put
$$\Lambda(\lambda) := \log \mathbf E\big[e^{\langle\lambda, X_1\rangle}\big] \qquad (\lambda \in \mathbb R^d) \tag{7.11}$$
and
$$\Lambda^*(x) := \sup_{\lambda\in\mathbb R^d}\big\{\langle\lambda, x\rangle - \Lambda(\lambda)\big\}. \tag{7.12}$$
so that in particular pk is equal to the probability that Yi will have a 1 in the k-th
spot and 0’s elsewhere. Then the averaged sum (Y1 + · · · + Yn )/n gives the relative
frequency of a1 , . . . , ad , i.e., it contains the same information as the empirical distri-
bution of (X1 , . . . , Xn ).
A probability measure on A is given by a d-tuple (q1 , . . . , qd ) of positive real
numbers satisfying q1 + · · · + qd = 1. By Cramér’s theorem,
$$P\Big(\frac1n\big(\delta_{X_1} + \cdots + \delta_{X_n}\big) \approx (q_1, \dots, q_d)\Big) = P\Big(\frac{Y_1 + \cdots + Y_n}{n} \approx (q_1, \dots, q_d)\Big) \sim e^{-n\Lambda^*(q_1, \dots, q_d)}.$$
Here
$$\Lambda(\lambda_1, \dots, \lambda_d) = \log \mathbf E\big[e^{\langle\lambda, Y_i\rangle}\big] = \log\big(p_1 e^{\lambda_1} + \cdots + p_d e^{\lambda_d}\big).$$
Thus the Legendre transform is given by
Thus the Legendre transform is given by
We compute the supremum over all tuples (λ1, . . . , λd) by finding the partial derivative ∂/∂λi of λ1q1 + · · · + λdqd − Λ(λ1, . . . , λd) to be
$$q_i - \frac{p_i e^{\lambda_i}}{p_1 e^{\lambda_1} + \cdots + p_d e^{\lambda_d}}.$$
By concavity the maximum occurs when
$$\lambda_i = \log\frac{q_i}{p_i} + \log\big(p_1 e^{\lambda_1} + \cdots + p_d e^{\lambda_d}\big) = \log\frac{q_i}{p_i} + \Lambda(\lambda_1, \dots, \lambda_d),$$
and we compute
$$\Lambda^*(q_1, \dots, q_d) = q_1\log\frac{q_1}{p_1} + \cdots + q_d\log\frac{q_d}{p_d} + (q_1 + \cdots + q_d)\,\Lambda(\lambda_1, \dots, \lambda_d) - \Lambda(\lambda_1, \dots, \lambda_d) = q_1\log\frac{q_1}{p_1} + \cdots + q_d\log\frac{q_d}{p_d}.$$
Concretely, this means the following. Consider the set M of probability mea-
sures on R with the weak topology (which is a metrizable topology, e.g. by the Lévy
metric). Then for closed F and open G in M we have
$$\limsup_{n\to\infty}\frac1n\log P(\nu_n \in F) \le -\inf_{\nu\in F} S(\nu, \mu) \tag{7.18}$$
$$\liminf_{n\to\infty}\frac1n\log P(\nu_n \in G) \ge -\inf_{\nu\in G} S(\nu, \mu). \tag{7.19}$$
$$d\tilde P_N(\lambda_1, \dots, \lambda_N) = C_N\cdot e^{-\frac N2\sum_{i=1}^N\lambda_i^2}\prod_{i<j}(\lambda_i - \lambda_j)^2\prod_{i=1}^N d\lambda_i, \tag{7.22}$$
where
$$C_N = \frac{N^{N^2/2}}{(2\pi)^{N/2}\,\prod_{j=1}^N j!}. \tag{7.23}$$
We want to establish a large deviation principle for the empirical eigenvalue dis-
tribution µA = (δλ1 (A) + · · · + δλN (A) )/N of a random matrix in HN .
One can argue heuristically as follows for the expected form of the rate function.
We have
We have
$$P_N\{\mu_A \approx \nu\} = \tilde P_N\Big(\frac1N\big(\delta_{\lambda_1} + \cdots + \delta_{\lambda_N}\big) \approx \nu\Big) = C_N\int_{\{\frac1N(\delta_{\lambda_1}+\cdots+\delta_{\lambda_N}) \approx \nu\}} e^{-\frac N2\sum\lambda_i^2}\prod_{i<j}(\lambda_i - \lambda_j)^2\prod_{i=1}^N d\lambda_i.$$
Now
$$-\frac N2\sum_{i=1}^N\lambda_i^2 = -\frac{N^2}{2}\cdot\frac1N\sum_{i=1}^N\lambda_i^2 \approx -\frac{N^2}{2}\int t^2\,d\nu(t),$$
and similarly $\prod_{i<j}(\lambda_i - \lambda_j)^2 = \exp\big(2\sum_{i<j}\log|\lambda_i - \lambda_j|\big) \approx \exp\big(N^2\iint\log|s-t|\,d\nu(s)\,d\nu(t)\big)$, which suggests heuristically the rate function
$$I(\nu) = -\iint\log|s-t|\,d\nu(s)\,d\nu(t) + \frac12\int t^2\,d\nu(t) - \lim_{N\to\infty}\frac{1}{N^2}\log C_N. \tag{7.24}$$
The value of the limit can be explicitly computed as 3/4. Note that by writing
$$s^2 + t^2 - 4\log|s-t| = \big(s^2 + t^2 - 2\log(s^2 + t^2)\big) + 4\log\frac{\sqrt{s^2 + t^2}}{|s-t|},$$
where both terms on the right-hand side are bounded from below, one sees that the double integral appearing in I(ν) is always well defined as an extended real number, possibly +∞, in which case we set I(ν) = +∞; otherwise I(ν) is finite and is given by (7.24).
Voiculescu was thus motivated to use the integral $\iint\log|s-t|\,d\mu_x(s)\,d\mu_x(t)$ to define in [181] the free entropy χ(x) for one self-adjoint variable x with distribution µx; see equation (7.30).
The large deviation argument was then made rigorous in the following theorem
of Ben Arous and Guionnet [26].
Theorem 3. Put
$$I(\nu) = -\iint\log|s-t|\,d\nu(s)\,d\nu(t) + \frac12\int t^2\,d\nu(t) - \frac34. \tag{7.25}$$
Then:
for any open set G and any closed set F of probability measures on R (with respect to the weak topology),
$$\liminf_{N\to\infty}\frac{1}{N^2}\log\tilde P_N\Big(\frac{\delta_{\lambda_1} + \cdots + \delta_{\lambda_N}}{N} \in G\Big) \ge -\inf_{\nu\in G} I(\nu), \tag{7.26}$$
$$\limsup_{N\to\infty}\frac{1}{N^2}\log\tilde P_N\Big(\frac{\delta_{\lambda_1} + \cdots + \delta_{\lambda_N}}{N} \in F\Big) \le -\inf_{\nu\in F} I(\nu). \tag{7.27}$$
Exercise 4. The above theorem includes in particular the statement that for a Wigner
semicircle distribution µW with variance 1 we have
$$-\iint\log|s-t|\,d\mu_W(s)\,d\mu_W(t) = \frac14. \tag{7.28}$$
Prove this directly!
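Alternatively, one can confirm (7.28) by simulation, sampling from the semicircle by rejection (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)

def semicircle(k):
    # rejection sampling from the density sqrt(4 - x^2)/(2 pi) on [-2, 2]
    out = np.empty(0)
    while out.size < k:
        x = rng.uniform(-2, 2, 2 * k)
        u = rng.uniform(0, 1, 2 * k)
        out = np.concatenate([out, x[u < np.sqrt(4 - x**2) / 2]])
    return out[:k]

s, t = semicircle(10**6), semicircle(10**6)
print(-np.mean(np.log(np.abs(s - t))))   # should be close to 1/4
```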
Exercise 5. (i) Let µ be a probability measure with support in [−2, 2]. Show that we have
$$\int_{\mathbb R}\int_{\mathbb R}\log|s-t|\,d\mu(s)\,d\mu(t) = -\sum_{n=1}^\infty\frac{1}{2n}\Big(\int_{\mathbb R} C_n(t)\,d\mu(t)\Big)^2,$$
with respect to the normalized trace, where (s1 , . . . , sn ) is a free semi-circular family.
Large deviations from this limit should be given by
$$P_N\big\{(A_1, \dots, A_n)\ \big|\ \mathrm{distr}(A_1, \dots, A_n) \approx \mathrm{distr}(x_1, \dots, x_n)\big\} \sim e^{-N^2 I(x_1, \dots, x_n)},$$
where I(x1 , . . . , xn ) is the free entropy of x1 , . . . , xn . The problem is that this has to
be made more precise and that, in contrast to the one-dimensional case, there is no
analytical formula to calculate this quantity.
We use the equation above as motivation to define free entropy as follows. This is
essentially the definition of Voiculescu from [182], the only difference is that he also
included a cut-off parameter R and required in the definition of the “microstate set”
Γ that kAi k ≤ R for all i = 1, . . . , n. Later it was shown by Belinschi and Bercovici
[20] that removing this cut-off condition gives the same quantity.
$$\Gamma(x_1, \dots, x_n; N, r, \varepsilon) := \big\{(A_1, \dots, A_n) \in M_N(\mathbb C)^n_{sa}\ \big|\ |\mathrm{tr}(A_{i_1}\cdots A_{i_k}) - \tau(x_{i_1}\cdots x_{i_k})| \le \varepsilon\ \text{for all } 1 \le i_1, \dots, i_k \le n,\ 1 \le k \le r\big\}.$$
In words, Γ (x1 , . . . , xn ; N, r, ε), which we call the set of microstates, is the set of all
n-tuples of N × N self-adjoint matrices which approximate the mixed moments of
the self-adjoint elements x1 , . . . , xn of length at most r to within ε.
Let Λ denote Lebesgue measure on $M_N(\mathbb C)^n_{sa} \simeq \mathbb R^{nN^2}$. Then we define
$$\chi(x_1, \dots, x_n; r, \varepsilon) := \limsup_{N\to\infty}\Big(\frac{1}{N^2}\log\Lambda\big(\Gamma(x_1, \dots, x_n; N, r, \varepsilon)\big) + \frac n2\log N\Big),$$
It is an important open problem whether the lim sup in the definition above of
χ (x1 , . . . , xn ; r, ε) is actually a limit.
We want to elaborate on the meaning of Λ, the Lebesgue measure on $M_N(\mathbb C)^n_{sa} \simeq \mathbb R^{nN^2}$, and the normalization constant n log(N)/2. Let us consider the case n = 1. For a self-adjoint matrix $A = (a_{ij})_{i,j=1}^N \in M_N(\mathbb C)_{sa}$ we identify the elements on the diagonal (which are real) and the real and imaginary parts of the elements above the diagonal (which are the adjoints of the corresponding elements below the diagonal) with an $N + 2\cdot\frac{N(N-1)}{2} = N^2$-dimensional vector of real numbers. The actual choice of this mapping is determined by the fact that we want the Euclidean inner product in $\mathbb R^{N^2}$ to correspond on the side of the matrices to the form (A, B) ↦ Tr(AB). Note that
$$\mathrm{Tr}(A^2) = \sum_{i,j=1}^N a_{ij}\,a_{ji} = \sum_{i=1}^N(\mathrm{Re}\,a_{ii})^2 + 2\sum_{1\le i<j\le N}\big((\mathrm{Re}\,a_{ij})^2 + (\mathrm{Im}\,a_{ij})^2\big).$$
This means that there is a difference of a factor √2 between the diagonal and the off-diagonal elements. (The same effect made its appearance in Chapter 1, Exercise 8, when we defined the GUE by assigning different values for the covariances for variables on and off the diagonal, in order to make this choice invariant under conjugation by unitary matrices.) So our specific choice of a map between $M_N(\mathbb C)_{sa}$ and $\mathbb R^{N^2}$ means that we map the set $\{A \in M_N(\mathbb C)_{sa} \mid \mathrm{Tr}(A^2) \le R^2\}$ to the ball $B_{N^2}(R)$ of radius R in N² real dimensions. The pull-back under this map of the Lebesgue measure on $\mathbb R^{N^2}$ is what we call Λ, the Lebesgue measure on $M_N(\mathbb C)_{sa}$. The situation for general n is given by taking products.
Note that a microstate (A1, . . . , An) ∈ Γ(x1, . . . , xn; N, r, ε) satisfies for r ≥ 2
$$\frac1N\,\mathrm{Tr}(A_1^2 + \cdots + A_n^2) \le \tau(x_1^2 + \cdots + x_n^2) + n\varepsilon =: c^2,$$
and thus the set of microstates Γ(x1, . . . , xn; N, r, ε) is contained in the ball $B_{nN^2}(\sqrt N c)$. The fact that the latter grows logarithmically like
$$\frac{1}{N^2}\log\Lambda\big(B_{nN^2}(\sqrt N c)\big) = \frac{1}{N^2}\log\frac{(\sqrt N c\,\sqrt\pi)^{nN^2}}{\Gamma(1 + nN^2/2)} \sim -\frac n2\log N$$
is the reason for adding the term n log N/2 in the definition of χ(x1, . . . , xn; r, ε).
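The stated asymptotics is easy to check numerically (a sketch using only the Python standard library; the values of n and c are arbitrary choices of ours):

```python
from math import lgamma, log, pi, sqrt

n, c = 2, 1.5
for N in (10, 100, 1000, 10000):
    p = n * N * N                  # real dimension of M_N(C)^n_sa
    R = sqrt(N) * c
    log_vol = p * log(R) + (p / 2) * log(pi) - lgamma(1 + p / 2)
    # adding (n/2) log N compensates the -(n/2) log N decay:
    print(N, log_vol / N**2 + (n / 2) * log(N))   # stabilizes to a constant
```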
Thus in particular, by using the corresponding property from (i), we always have χ(x1, . . . , xn) ∈ [−∞, ∞).
(iii) χ is upper semicontinuous: if $(x_1^{(m)}, \dots, x_n^{(m)}) \xrightarrow{\ \mathrm{distr}\ } (x_1, \dots, x_n)$ for m → ∞, then
$$\chi(x_1, \dots, x_n) \ge \limsup_{m\to\infty}\chi\big(x_1^{(m)}, \dots, x_n^{(m)}\big).$$
The above mentioned strategy is the basis of the proof of the following theorem.
Theorem 7. Let M be a finite von Neumann algebra with trace τ generated by self-
adjoint operators x1 , . . . , xn , where n ≥ 2. Assume that χ (x1 , . . . , xn ) > −∞, where
the free entropy is calculated with respect to the trace τ. Then
(i) M does not have property Γ . In particular, M is a factor.
(ii) M does not have a Cartan subalgebra.
(iii) M is prime.
Corollary 8. All this applies in the case of the free group factor L(Fn ) for 2 ≤ n <
∞, thus:
(i) L(Fn ) does not have property Γ .
(ii) L(Fn ) does not have a Cartan subalgebra.
(iii) L(Fn ) is prime.
Parts (i) and (ii) of the theorem above are due to Voiculescu [185], part (iii)
was proved by Liming Ge [76]. In particular, the absence of Cartan subalgebras
for L(Fn ) was a spectacular result, as it falsified the conjecture, which had been
open for decades, that every II1 factor should possess a Cartan subalgebra. Such
a conjecture was suggested by the fact that von Neumann algebras obtained from
ergodic measurable relations always have Cartan subalgebras and for a while there
was the hope that all von Neumann algebras might arise in this way.
In order to give a more concrete idea of this approach we will present the essential
steps in the proof for part (i) (which is the simplest part of the theorem above) and
say a few words about the proof of part (iii). However, one should note that the
absence of property Γ for L(Fn ) is an old result of Murray and von Neumann which
can be proved more directly without using free entropy. The following follows quite
closely the exposition of Biane [36].
We now give the main arguments and estimates for the proof of Part (i) of Theo-
rem 7. So let M = vN(x1 , . . . , xn ) have property Γ ; we must prove that this implies
χ (x1 , . . . , xn ) = −∞.
Let (tk )k be a non-trivial central sequence in M. Then its real and imaginary
parts are also central sequences (at least one of them non-trivial) and, by applying
functional calculus to this sequence, we may replace the tk ’s with a non-trivial cen-
tral sequence of orthogonal projections (pk )k , and assume the existence of a real
number θ in the open interval (0, 1/2) such that θ < τ(pk ) < 1 − θ for all k and
limk→∞ k[x, pk ]k2 = 0 for all x ∈ M.
We then prove the following key lemma.
Lemma 9. Let (M, τ) be a tracial W∗-probability space generated by self-adjoint elements x1, . . . , xn satisfying $\tau(x_i^2) \le 1$. Let 0 < θ < 1/2 be a constant and p ∈ M a projection such that θ < τ(p) < 1 − θ. If there is ω > 0 such that ‖[p, xi]‖2 < ω for 1 ≤ i ≤ n, then there exist positive constants C1, C2, depending only on n and θ, such that χ(x1, . . . , xn) ≤ C1 + C2 log ω.
Assuming this is proved, choose p = pk . We can take ωk → 0 as k → ∞. Thus
we get χ (x1 , . . . , xn ) ≤ C1 + C2 log ω for all ω > 0, implying χ (x1 , . . . , xn ) = −∞.
(Note that we can achieve the assumption τ(xi2 ) ≤ 1 by rescaling our generators.) It
remains to prove the lemma.
Proof: Take (A1, . . . , An) ∈ Γ(x1, . . . , xn; N, r, ε) for N, r sufficiently large and ε sufficiently small. As p can be approximated by polynomials in x1, . . . , xn, and by an application of the functional calculus, we find a projection matrix Q ∈ MN(C) whose range is a subspace of dimension q = ⌊Nτ(p)⌋ and such that we have (where the ‖·‖2-norm is now with respect to tr in MN(C)) ‖[Ai, Q]‖2 < 2ω for all i = 1, . . . , n. This Q is of the form
$$Q = U\begin{pmatrix} I_q & 0\\ 0 & 0_{N-q} \end{pmatrix}U^*$$
for some U ∈ U(N)/(U(q) × U(N − q)). Write
$$U^*A_iU = \begin{pmatrix} B_i & C_i^*\\ C_i & D_i \end{pmatrix}.$$
Then ‖[Ai, Q]‖2 < 2ω implies the same for the conjugated matrices, i.e.,
$$\sqrt{\frac2N\,\mathrm{Tr}(C_iC_i^*)} = \Big\|\begin{pmatrix} 0 & -C_i^*\\ C_i & 0 \end{pmatrix}\Big\|_2 = \Big\|\Big[\begin{pmatrix} B_i & C_i^*\\ C_i & D_i \end{pmatrix}, \begin{pmatrix} I_q & 0\\ 0 & 0 \end{pmatrix}\Big]\Big\|_2 = \big\|[A_i, Q]\big\|_2 < 2\omega,$$
so that the off-diagonal blocks satisfy $2\,\mathrm{Tr}(C_iC_i^*) < 4\omega^2 N$, while the diagonal blocks satisfy $\mathrm{Tr}(B_i^2) + \mathrm{Tr}(D_i^2) \le \mathrm{Tr}(A_i^2) \le 2N$ (for small ε, since τ(x_i²) ≤ 1). Hence
$$\Gamma(x_1, \dots, x_n; N, r, \varepsilon) \subseteq \bigcup_{U \in U(N)/U(q)\times U(N-q)} U\Big[B_{q^2}\big(\sqrt{2N}\big) \times B_{2q(N-q)}\big(\omega\sqrt{4N}\big) \times B_{(N-q)^2}\big(\sqrt{2N}\big)\Big]^n U^*.$$
This does not give directly an estimate for the volume of our set Γ , as we have here
a covering by infinitely many sets. However, we can reduce this to a finite cover by
approximating the U’s which appear by elements from a finite δ -net.
By a result of Szarek [169], for any δ > 0 there exists a δ-net (Us)s∈S in the Grassmannian U(N)/U(q) × U(N − q) with $|S| \le (C\delta^{-1})^{N^2 - q^2 - (N-q)^2}$, with C a universal constant.
For (A1, . . . , An), Q, and U as above, there exists s ∈ S with ‖U − Us‖ ≤ δ; this implies ‖[Us∗AiUs, U∗QU]‖2 ≤ 2ω + 8δ. Repeating the arguments above for Us∗AiUs instead of U∗AiU (where we have to replace 2ω by 2ω + 8δ) we get
$$\Gamma(x_1, \dots, x_n; N, r, \varepsilon) \subseteq \bigcup_{s\in S} U_s\Big[B_{q^2}\big(\sqrt{2N}\big) \times B_{2q(N-q)}\big((\omega + 4\delta)\sqrt{4N}\big) \times B_{(N-q)^2}\big(\sqrt{2N}\big)\Big]^n U_s^*, \tag{7.38}$$
and hence
$$\Lambda\big(\Gamma(x_1, \dots, x_n; N, r, \varepsilon)\big) \le (C\delta^{-1})^{N^2 - q^2 - (N-q)^2} \times \Big[\Lambda\big(B_{q^2}(\sqrt{2N})\big)\,\Lambda\big(B_{2q(N-q)}\big((\omega + 4\delta)\sqrt{4N}\big)\big)\,\Lambda\big(B_{(N-q)^2}(\sqrt{2N})\big)\Big]^n.$$
Recall that the volume of the ball of radius R in $\mathbb R^p$ is
$$\Lambda(B_p(R)) = \frac{R^p\,\pi^{p/2}}{\Gamma(1 + \frac p2)}.$$
Thus
$$\frac{1}{N^2}\log\Lambda\big(\Gamma(x_1, \dots, x_n; N, r, \varepsilon)\big) + \frac n2\log N \le \tilde C_1 + \tilde C_2\log\delta^{-1} + n\log(\omega + 4\delta),$$
for positive constants C̃1 , C̃2 depending only on n and θ . Taking now δ = ω gives
the claimed estimate with C1 := C̃1 + n log 5 and C2 := (n − 1)C̃2 .
One should note that our estimates work for all n. However, in order to have C2
strictly positive, we need n > 1. For n = 1 we only get an estimate against a constant
C1 , which is not very useful. This corresponds to the fact that for each i the smallness
of the off-diagonal block Ci of U ∗ AiU in some basis U is not very surprising; how-
ever, if we have the smallness of all such blocks C1 , . . . ,Cn of U ∗ A1U, . . . ,U ∗ AnU
for a common U, then this is a much stronger constraint.
The proof of part (iii) proceeds in a similar, though technically more complicated,
fashion. Let us assume that our II1 factor M = vN(x1 , . . . , xn ) has a Cartan subalge-
bra N. We have to show that this implies χ (x1 , . . . , xn ) = −∞.
First one has to rewrite the property of having a Cartan subalgebra in a more
algebraic way, encoding a kind of “smallness”. Voiculescu showed the following.
For each ε > 0 there exist: a finite-dimensional C∗-subalgebra N0 of N; k(j) ∈ N for all 1 ≤ j ≤ n; orthogonal projections $p_j^{(i)}, q_j^{(i)} \in N_0$ and elements $x_j^{(i)} \in M$ for all j = 1, . . . , n and 1 ≤ i ≤ k(j); such that the following holds: $x_j^{(i)} = p_j^{(i)}\,x_j^{(i)}\,q_j^{(i)}$ for all j = 1, . . . , n and 1 ≤ i ≤ k(j),
$$\Big\|x_j - \sum_{1\le i\le k(j)}\big(x_j^{(i)} + x_j^{(i)*}\big)\Big\|_2 < \varepsilon \quad\text{for all } j = 1, \dots, n, \tag{7.39}$$
and
$$\sum_{1\le j\le n}\ \sum_{1\le i\le k(j)} \tau\big(p_j^{(i)}\big)\,\tau\big(q_j^{(i)}\big) < \varepsilon.$$
Given a microstate for x1, . . . , xn one can then, similarly as in the proof of Lemma 9, find compatible matricial versions of this decomposition, by approximating also the projections. This gives some constraints on the volume of possible microstates.
Again, in order to get rid of the freedom of conjugating by an arbitrary unitary ma-
trix one covers the unitary N × N matrices by a δ -net S and gets so in the end a sim-
ilar bound as in (7.38). Invoking from [169] the result that one can choose a δ-net with $|S| < (C/\delta)^{N^2}$ leads finally to an estimate for χ(x1, . . . , xn) as in Lemma 9. The bound in this estimate goes to −∞ for ε → 0, which proves that χ(x1, . . . , xn) = −∞.
Chapter 8
Free Entropy χ ∗ - the Non-Microstates Approach via Free Fisher
Information
In classical probability theory there exist two important concepts which measure the
amount of “information” of a given distribution. These are the Fisher information
and the entropy. There exist various relations between these quantities and they form
a cornerstone of classical probability theory and statistics. Voiculescu introduced
free probability analogues of these quantities, called free Fisher information and
free entropy, denoted by Φ and χ , respectively. However, there remain some gaps in
our present understanding of these quantities. In particular, there exist two different
approaches, each of them yielding a notion of entropy and Fisher information. One
hopes that finally one will be able to prove that both approaches give the same result,
but at the moment this is not clear. Thus for the time being we have to distinguish
the entropy χ and the free Fisher information Φ coming from the first approach (via
microstates) and the free entropy χ ∗ and the free Fisher information Φ ∗ coming
from the second, non-microstates approach (via conjugate variables).
Whereas we considered the microstates approach for χ in the previous chapter,
we will in this chapter deal with the second approach, which fits quite nicely with
the combinatorial theory of freeness. In this approach the Fisher information is the
basic quantity (in terms of which the free entropy χ ∗ is defined), so we will restrict
our attention mainly to Φ ∗ .
The concepts of information and entropy are only useful when we consider
states (so that we can use the positivity of ϕ to get estimates for the information
or entropy). Thus in this section we will always work in the framework of a W ∗ -
probability space. Furthermore, it is crucial that we work with a faithful normal
trace. The extension of the present theory to non-tracial situations is unclear.
8.1 Non-commutative derivatives
The basic object of the non-microstates approach is the non-commutative derivative: for each i = 1, . . . , n, the partial derivative ∂i : C⟨X1, . . . , Xn⟩ → C⟨X1, . . . , Xn⟩ ⊗ C⟨X1, . . . , Xn⟩ is the linear map determined by
$$\partial_i 1 = 0, \qquad \partial_i X_j = \delta_{ij}\,1\otimes 1 \quad (j = 1,\dots,n),$$
and by the Leibniz rule
$$\partial_i(PQ) = \partial_i P\cdot(1\otimes Q) + (P\otimes 1)\cdot\partial_i Q.$$
(ii) If one mixes different partial derivatives the situation becomes more complicated. Show that (id ⊗ ∂i) ◦ ∂j = (∂j ⊗ id) ◦ ∂i, but that in general, for i ≠ j, (id ⊗ ∂i) ◦ ∂j ≠ (∂i ⊗ id) ◦ ∂j.
For a polynomial P in one variable X the derivative has the suggestive difference-quotient form (where the second leg of the tensor product is written as a second commuting variable Y)
$$\partial P(X) \;\hat{=}\; \frac{P(X)-P(Y)}{X-Y},$$
and indeed
$$\frac{X^m-Y^m}{X-Y} = X^{m-1} + X^{m-2}Y + X^{m-3}Y^2 + \cdots + Y^{m-1}.$$
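The action of ∂i on monomials can also be made quite concrete in code. The following is a minimal sketch (our own toy illustration, not from the original text; monomials are encoded as tuples of variable indices and an elementary tensor as a pair of such tuples), implementing ∂i(X_{j(1)} · · · X_{j(m)}) = Σ_{k: j(k)=i} X_{j(1)} · · · X_{j(k−1)} ⊗ X_{j(k+1)} · · · X_{j(m)}:

    def nc_derivative(i, monomial):
        # d_i applied to X_{j(1)} ... X_{j(m)}: split at every occurrence of X_i
        return [(monomial[:k], monomial[k+1:])
                for k, j in enumerate(monomial) if j == i]

    # Example: d_1(X_1 X_2 X_1) = 1 (x) X_2 X_1 + X_1 X_2 (x) 1
    print(nc_derivative(1, (1, 2, 1)))   # [((), (2, 1)), ((1, 2), ())]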
One should note that in the non-commutative world there exists another canonical
derivation into the tensor product, namely the mapping P 7→ P ⊗ 1 − 1 ⊗ P. Actually,
there is an important relation between this derivation and our partial derivatives.
Lemma 4. For all P ∈ C⟨X1, . . . , Xn⟩ we have
$$\sum_{j=1}^n \big(\partial_j P\cdot(X_j\otimes 1) - (1\otimes X_j)\cdot\partial_j P\big) = P\otimes 1 - 1\otimes P. \qquad (8.3)$$
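On monomials, (8.3) is a telescoping identity, and one can confirm it mechanically with the toy implementation of ∂i from above (again our own sketch, not from the text):

    from collections import Counter

    def lemma4_check(word):
        # collect sum_j [ d_j(word) . (X_j (x) 1) - (1 (x) X_j) . d_j(word) ]
        terms = Counter()
        for j in set(word):
            for left, right in nc_derivative(j, word):
                terms[(left + (j,), right)] += 1
                terms[(left, (j,) + right)] -= 1
        # compare with word (x) 1 - 1 (x) word
        terms.subtract({(word, ()): 1, ((), word): -1})
        return all(v == 0 for v in terms.values())

    print(lemma4_check((1, 2, 1, 3)))   # True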
Note also that, for self-adjoint p with τ(p) = 0,
$$\|p\otimes 1 - 1\otimes p\|_2^2 = \tau\otimes\tau\big((p\otimes 1-1\otimes p)^2\big) = \tau\otimes\tau\big[p^2\otimes 1 + 1\otimes p^2 - 2\,p\otimes p\big] = 2\tau(p^2) = 2\|p\|_2^2.$$
$$\begin{array}{ccc}
\mathbb{C}\langle X_1,\dots,X_n\rangle & \overset{\partial_i}{\longrightarrow} & \mathbb{C}\langle X_1,\dots,X_n\rangle\otimes\mathbb{C}\langle X_1,\dots,X_n\rangle\\
\downarrow{\scriptstyle\mathrm{eval}} & & \downarrow{\scriptstyle\mathrm{eval}\otimes\mathrm{eval}}\\
\mathbb{C}\langle x_1,\dots,x_n\rangle & \overset{\partial_i}{\longrightarrow} & \mathbb{C}\langle x_1,\dots,x_n\rangle\otimes\mathbb{C}\langle x_1,\dots,x_n\rangle
\end{array}$$
Notation 7 We denote by
$$L^p(x_1,\dots,x_n) := \overline{\mathbb{C}\langle x_1,\dots,x_n\rangle}^{\,\|\cdot\|_p} \subset L^p(M)$$
the closure of C⟨x1, . . . , xn⟩ with respect to the norm ‖·‖_p.
Hence, in the case where x1, . . . , xn are algebraically free, ∂i is then also an unbounded operator on L², ∂i : L²(x1, . . . , xn) ⊃ D(∂i) → L²(x1, . . . , xn) ⊗ L²(x1, . . . , xn), with domain D(∂i) = C⟨x1, . . . , xn⟩. In order that unbounded operators have a nice analytic structure they should be closable. In terms of the adjoint, this means that the adjoint operator ∂i* should have a dense domain.
and for elementary tensors p ⊗ q with p, q ∈ C⟨x1, . . . , xn⟩ the action of ∂i* is given by
$$\partial_i^*(p\otimes q) = p\,\xi_i\,q - p\cdot(\tau\otimes\mathrm{id})(\partial_i q) - (\mathrm{id}\otimes\tau)(\partial_i p)\cdot q. \qquad (8.6)$$
For such an η we set ∂i*(η) = η′. Prove Theorem 8 by showing that for all r ∈ C⟨x1, . . . , xn⟩ we have ⟨∂i*(p ⊗ q), r⟩ = ⟨p ⊗ q, ∂i r⟩ when we use the right-hand side of (8.6) as the definition of ∂i*(p ⊗ q).
(iv) Show that
and
There are two terms which show up obviously on both sides and thus we are left with showing
$$-\big\langle(\mathrm{id}\otimes\tau)(\partial_i p),\,q\,\xi_i\big\rangle + \big\langle(\mathrm{id}\otimes\tau)(\partial_i p),\,(\mathrm{id}\otimes\tau)(\partial_i q)\big\rangle = -\big\langle\xi_i,\,(\mathrm{id}\otimes\tau)\big[\partial_i p^*\cdot(1\otimes q)\big]\big\rangle.$$
$$(\mathrm{id}\otimes\tau)^*(\xi) = \xi\otimes 1.$$
Thus
$$\big\langle\xi_i,\,(\mathrm{id}\otimes\tau)\big[\partial_i p^*\cdot(1\otimes q)\big]\big\rangle = \big\langle\xi_i\otimes 1,\ \partial_i p^*\cdot(1\otimes q)\big\rangle$$
and
Thus pξi − (id ⊗ τ)(∂i p) = ∂i∗ (p ⊗ 1). This then implies Eq. (8.8) as follows:
Theorem 10. Assume that 1 ⊗ 1 ∈ D(∂i*). Then we have for all p ∈ C⟨x1, . . . , xn⟩ the inequality
$$\big\|(\mathrm{id}\otimes\tau)(\partial_i p) - p\,\xi_i\big\|_2 \le \|\xi_i\|_2\cdot\|p\|. \qquad (8.9)$$
Hence, with M = vN(x1, . . . , xn), the mapping (id ⊗ τ) ◦ ∂i extends to a bounded mapping M → L²(M) and we have
$$\big\|(\mathrm{id}\otimes\tau)\circ\partial_i\big\|_{M\to L^2(M)} \le 2\,\|\xi_i\|_2.$$
Proof: Assume that inequality (8.9) has been proved. Then we have
$$\big\|(\mathrm{id}\otimes\tau)(\partial_i p)\big\|_2 \le \|p\,\xi_i\|_2 + \|\xi_i\|_2\,\|p\| \le 2\,\|\xi_i\|_2\cdot\|p\|$$
for all p ∈ C⟨x1, . . . , xn⟩. This says that (id ⊗ τ) ◦ ∂i, as a linear mapping from C⟨x1, . . . , xn⟩ ⊂ M to L²(M), has norm less than or equal to 2‖ξi‖₂. It is also easy to check (see Exercise 3) that (id ⊗ τ) ◦ ∂i is closable as an unbounded operator from L² to L² and hence, by the following Proposition 11, it can be extended to a bounded mapping on M, with the same bound 2‖ξi‖₂.
So it remains to prove (8.9). By (8.8), applied to powers of p*p, one can estimate the quantity in question by a product of two factors. Now note that the first factor converges, for n → ∞, to ‖ξi‖₂, whereas the second factor we can bound as follows:
$$\Big\|(\mathrm{id}\otimes\tau)\big[\partial_i\big((p^*p)^{2^{n-1}}\big)\big] - (p^*p)^{2^{n-1}}\xi_i\Big\|_2^{1/2^n}
\le \Big(\big\|\partial_i\big((p^*p)^{2^{n-1}}\big)\big\|_2 + \|p^*p\|^{2^{n-1}}\cdot\|\xi_i\|_2\Big)^{1/2^n}
\le \|p\|\cdot\Big(2^{n-1}\,\frac{\|\partial_i(p^*p)\|_2}{\|p^*p\|} + \|\xi_i\|_2\Big)^{1/2^n},$$
and the latter converges, for n → ∞, to ‖p‖; this gives (8.9).
Proposition 11. Let (M, τ) be a tracial W ∗ -probability space with separable pre-
dual and ∆ : L2 (M, τ) ⊃ D(∆ ) → L2 (M, τ) be a closable linear operator. Assume
that D(∆ ) ⊂ M is a ∗-algebra and that we have k∆ (x)k2 ≤ ckxk for all x ∈ D(∆ ).
Then ∆ extends to a bounded mapping ∆ : M → L2 (M, τ) with k∆ kM→L2 (M) ≤ c.
Proof: Since the extension of ∆ to the norm closure of D(∆ ) is trivial, we can as-
sume without restriction that D(∆ ) is a C∗ -algebra. Consider y ∈ M. By Kaplansky’s
density theorem there exists a sequence (xn )n∈N with xn ∈ D(∆ ), kxn k ≤ kyk for all
n, and such that (xn )n converges to y in the strong operator topology. By assumption
we know that the sequence (∆ (xn ))n is bounded by ckyk in the L2 -norm. By the
Banach-Saks theorem we have then a subsequence (∆(x_{n_k}))_k whose Cesàro means converge in the L²-norm, say to some z ∈ L²(M):
$$z_m := \frac{1}{m}\sum_{l=1}^m \Delta(x_{n_l}) \to z \in L^2(M).$$
Now put $y_m := \frac{1}{m}\sum_{l=1}^m x_{n_l}$. Then we have a sequence (ym)_{m∈N} that converges to y in the strong operator topology, hence also in the L²-norm, and such that (∆(ym))_m = (zm)_m converges to z ∈ L²(M). Since ∆ is closable, this z is independent of the chosen sequences, and putting ∆(y) := z gives the extension to M we seek. Since we have ‖∆(ym)‖₂ ≤ c‖y‖ for all m, this also passes to the limit: ‖∆(y)‖₂ = ‖z‖₂ ≤ c‖y‖.
$$\frac{\partial p_t(u)}{\partial t} = \frac{\partial^2 p_t(u)}{\partial u^2},$$
subject to the initial condition p₀(u) = p(u). Let us calculate the derivative of the classical entropy S(pt) at t = 0, where we use the explicit formula for the classical entropy
$$S(p_t) = -\int p_t(u)\log p_t(u)\,du.$$
We will in the following just do formal calculations, but all steps can be justified rigorously. We will also use the notations
$$\dot p := \frac{\partial}{\partial t}p, \qquad p' := \frac{\partial}{\partial u}p,$$
where p(t, u) = p_t(u). Then we have
$$\frac{dS(p_t)}{dt} = -\int\frac{\partial}{\partial t}\big[p_t(u)\log p_t(u)\big]\,du = -\int\big[\dot p_t\log p_t + \dot p_t\big]\,du.$$
The second term vanishes,
$$\int\dot p_t\,du = \frac{d}{dt}\int p_t(u)\,du = 0$$
(because pt is a probability density for all t); by invoking the diffusion equation and by integration by parts the first term gives
$$-\int\dot p_t\log p_t\,du = -\int p_t''\log p_t\,du = \int p_t'\,(\log p_t)'\,du = \int\frac{(p_t'(u))^2}{p_t(u)}\,du.$$
The latter quantity is the Fisher information,
$$I(X) = \int\frac{(p'(u))^2}{p(u)}\,du \qquad\text{if } d\mu_X(u) = p(u)\,du.$$
We can rewrite this as
$$I(X) = \int\frac{(p'(u))^2}{p(u)}\,du = E\Big[\Big(-\frac{p'}{p}(X)\Big)^2\Big] = E(\xi^2),$$
where the random variable ξ (usually called the score function) is defined by
$$\xi := -\frac{p'}{p}(X) \qquad(\text{which is in } L^2(X) \text{ if } I(X) < \infty).$$
The advantage of this is that the score ξ has some conceptual meaning. Consider a
nice f (X) ∈ L2 (X) and calculate
$$E(\xi\,f(X)) = -E\Big[\frac{p'}{p}(X)\,f(X)\Big] = -\int\frac{p'(u)}{p(u)}\,f(u)\,p(u)\,du = -\int p'(u)f(u)\,du = \int p(u)f'(u)\,du = E(f'(X)).$$
In terms of the derivative operator $\frac{d}{du}$ and its adjoint we can also write this in L² as
$$\langle\xi,\,f(X)\rangle = E(\xi\,f(X)) = E(f'(X)) = \langle 1,\,f'(X)\rangle = \Big\langle\Big(\frac{d}{du}\Big)^*1,\ f(X)\Big\rangle,$$
implying that
$$\xi = \Big(\frac{d}{du}\Big)^*1.$$
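The relation E(ξ f(X)) = E(f′(X)) is easy to test numerically; here is a quick Monte Carlo check (our own illustration, not from the text), with X standard Gaussian, where the score is ξ = X, and f = sin as a test function:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal(2_000_000)
    # score of the standard Gaussian density p(u) ~ exp(-u^2/2): -p'/p(u) = u
    xi = X
    print(np.mean(xi * np.sin(X)), np.mean(np.cos(X)))
    # both are close to E(cos X) = exp(-1/2) = 0.6065...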
The above formulas were for the case n = 1 of one variable, but doing the same
in the multivariate case is no problem in the classical case.
Exercise 4. Repeat this formal proof in the multivariate case to show that for a random vector (X1, . . . , Xn) with density p on Rⁿ and a nice function f : Rⁿ → R we have
$$E\Big[\frac{\partial}{\partial u_i}f(X_1,\dots,X_n)\Big] = -\,E\Big[\frac{\partial p/\partial u_i}{p}(X_1,\dots,X_n)\cdot f(X_1,\dots,X_n)\Big].$$
Definition 12. Let (M, τ) be a tracial W∗-probability space and xi = xi* ∈ M for i = 1, . . . , n.
1) We say ξ1, . . . , ξn ∈ L²(M) satisfy the conjugate relations for x1, . . . , xn if we have for all P ∈ C⟨X1, . . . , Xn⟩
$$\tau\big(\xi_i\,P(x_1,\dots,x_n)\big) = \tau\otimes\tau\big((\partial_iP)(x_1,\dots,x_n)\big). \qquad (8.11)$$
2) A conjugate system for x1, . . . , xn is a family ξ1, . . . , ξn ∈ L²(x1, . . . , xn) which satisfies the conjugate relations.
3) If a conjugate system ξ1, . . . , ξn exists, the free Fisher information of x1, . . . , xn is defined as
$$\Phi^*(x_1,\dots,x_n) := \sum_{i=1}^n\|\xi_i\|_2^2;$$
otherwise we put Φ*(x1, . . . , xn) := ∞.
Note the conjugate relations prescribe the inner products of the ξi with a dense
subset in L2 (x1 , . . . , xn ), thus a conjugate system is unique if it exists.
If there exist ξ1 , . . . , ξn ∈ L2 (M) which satisfy the conjugate relations then there
exists a conjugate system; this is given by pξ1 , . . . , pξn where p is the orthogonal
projection from L2 (M) onto L2 (x1 , . . . , xn ). This holds because the left-hand side of
(8.11) is unchanged by replacing ξi by pξi . Furthermore, we have in such a situation
$$\Phi^*(x_1,\dots,x_n) = \sum_{i=1}^n\|p\,\xi_i\|_2^2 \le \sum_{i=1}^n\|\xi_i\|_2^2.$$
This can be verified from the definition, but there is an easier way to do this using free cumulants; see Exercise 7 following Remark 21 below. By projecting ξ onto L²(x + y) we get a conjugate vector η whose length has not increased. Thus when x and y are free we have Φ*(x + y) ≤ min{Φ*(x), Φ*(y)}. However, the free Stam inequality (see Theorem 19) is sharper.
Formally, the definition of ξi could also be written as ξi = ∂i*(1 ⊗ 1). However, in order that this makes sense, we need ∂i as an unbounded operator on L²(x1, . . . , xn), which is the case if and only if x1, . . . , xn are algebraically free. The next theorem, due to Mai, Speicher, and Weber [121], shows that the existence of a conjugate system excludes algebraic relations between the xi, and hence the conjugate variables are, if they exist, always of the form ξi = ∂i*(1 ⊗ 1). This implies then also, by Theorem 8, that the ∂i are closable.
Theorem 13. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n. Assume that a conjugate system ξ1 , . . . , ξn for x1 , . . . , xn exists. Then
x1 , . . . , xn are algebraically free.
Thus we have
$$0 = \tau\big[\xi_i\cdot(R_1PR_2)(x_1,\dots,x_n)\big] = \tau\otimes\tau\big(\partial_i(R_1PR_2)(x_1,\dots,x_n)\big) = \tau\otimes\tau\big[(r_1\otimes 1)\cdot q_i\cdot(1\otimes r_2)\big] = \tau\otimes\tau\big[q_i\cdot(r_1\otimes r_2)\big].$$
$$H(f)(s) = \frac{1}{\pi}\int\frac{f(t)}{s-t}\,dt = \frac{1}{2\pi}\int\frac{f(s-t)-f(s+t)}{t}\,dt.$$
$$h_\varepsilon(s) = (Q_\varepsilon * f)(s) = \frac{1}{\pi}\,\mathrm{Re}\big(G(s+i\varepsilon)\big) \qquad\text{and}\qquad (P_\varepsilon * f)(s) = \frac{-1}{\pi}\,\mathrm{Im}\big(G(s+i\varepsilon)\big). \qquad (8.14)$$
The first term converges to H(f) and the second to f as ε → 0⁺.
The following result is due to Voiculescu [187].
Theorem 14. Consider x = x* ∈ M and assume that µx has a density p which is in L³(R). Then a conjugate variable exists and is given by
$$\xi = 2\pi H(p)(x), \qquad\text{where}\quad H(p)(v) = \frac{1}{\pi}\int\frac{p(u)}{v-u}\,du$$
is the Hilbert transform. The free Fisher information is then
$$\Phi^*(x) = \frac{4}{3}\pi^2\int p(u)^3\,du. \qquad (8.15)$$
So we have
$$\Phi^*(x) = \tau\big((2\pi H(p)(x))^2\big) = 4\pi^2\int\big(H(p)(u)\big)^2\,p(u)\,du = \frac{4}{3}\pi^2\int p(u)^3\,du.$$
The last equality is a general property of the Hilbert transform which follows from Equation (8.14), see Exercise 5.
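As a consistency check of (8.15) (our own numerical illustration, not from the text): for the standard semicircle density p(u) = √(4 − u²)/(2π) one gets Φ*(x) = 1, in accordance with the fact that a standard semi-circular element is its own conjugate variable:

    import numpy as np

    u = np.linspace(-2, 2, 200001)
    p = np.sqrt(np.clip(4 - u**2, 0, None)) / (2 * np.pi)
    phi_star = (4 / 3) * np.pi**2 * np.trapz(p**3, u)
    print(phi_star)   # ~ 1.0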
Then use Equation (8.14) to prove the last step in the proof of Theorem 14.
After [187] it remained open for a while whether the condition on the density in
the last theorem is also necessary. That this is indeed the case is the content of the
next proposition, which is an unpublished result of Belinschi and Bercovici. Before
we get to this we need to consider briefly freeness for unbounded operators.
The notion of freeness we have given so far assumes that our random variables
have moments of all orders. We now see that the use of conjugate variables requires
us to use unbounded operators and these might only have a first and second moment,
so our current definition of freeness cannot be applied. For classical independence
there is no need for the random variables to have any moments; the usual definition
of independence relies on spectral projections. In the non-commutative picture we
also use spectral projections, except now they may not commute. To describe this
we need to review the idea of an operator affiliated to a von Neumann algebra.
Let M be a von Neumann algebra acting on a Hilbert space H and suppose that
t is a closed operator on H. Let t = u|t| be the polar decomposition of t, see, for
example, Reed and Simon [148, Ch. VIII]. Now |t| is a closed self-adjoint operator
and thus has a spectral resolution E|t| . This means that E|t| is a projection valued
measure on R, i.e. we require that for each Borel set B ⊆ R we have that E|t| (B)
is a projection on H and for each pair η1 , η2 ∈ H the measure µη1 ,η2 , defined by
µη1 ,η2 (B) = hE|t| (B)η1 , η2 i, is a complex measure on R. Returning to our t, if both
u and E|t| (B) belong to M for every Borel set B, we say that t is affiliated with M.
Suppose now that M has a faithful trace τ and H = L²(M). For t self-adjoint and affiliated with M we let µt, the distribution of t, be given by µt(B) = τ(E_t(B)). If t ≥ 0 and $\int\lambda\,d\mu_t(\lambda) < \infty$ we say that t is integrable. For a general closed operator t affiliated with M we say that t is p-integrable if |t|^p is integrable, i.e. $\int\lambda^p\,d\mu_{|t|}(\lambda) < \infty$. In this picture L²(M) is the space of square integrable operators affiliated with M.
Definition 15. Suppose M is a von Neumann algebra with a faithful trace τ and
t1 , . . . ,ts are closed operators affiliated with M. For each i, let Ai be the von Neumann
subalgebra of M generated by ui and the spectral projections E|ti | (B) where B ⊂ R is
a Borel set and ti = ui |ti | is the polar decomposition of ti . If the subalgebras A1 , . . . , As
are free with respect to τ then we say that the operators t1 , . . . ,ts are free with respect
to τ.
Remark 16. In [134, Thm. XV] Murray and von Neumann showed that the operators affiliated with M form a ∗-algebra. So if t₁ and t₂ are self-adjoint operators affiliated with M we can form the spectral measure µ_{t₁+t₂}. When t₁ and t₂ are free this is the free additive convolution of µ_{t₁} and µ_{t₂}. Indeed this was the definition of µ_{t₁} ⊞ µ_{t₂} given by Bercovici and Voiculescu [31]. This shows that by passing to self-adjoint
operators affiliated to a von Neumann algebra one can obtain the free additive con-
volution of two probability measures on R from the addition of two free random
variables, see Remark 3.48.
Proposition 18. Let x = x* ∈ M and assume that Φ*(x) < ∞. Then µx is absolutely continuous, its density p is in L³(R), and we have
$$\Phi^*(x) = \frac{4}{3}\pi^2\int p(u)^3\,du.$$
Proof: Again, we will only provide formal arguments. The main deficiency of the
following is that we have to invoke unbounded operators, and the statements we
are going to use are only established for bounded operators in our presentation.
However, this can be made rigorous by working with operators affiliated with M
and by extending the previous theorem to the unbounded setting.
Let t be a Cauchy distributed random variable which is free from x. (Note that t
is an unbounded operator!) Consider for ε > 0 the random variable xε := x + εt. It
can be shown that adding a free variable cannot increase the free Fisher information,
since one gets the conjugate variable of xε by conditioning the conjugate variable
of x onto the L2 -space generated by xε . See Exercise 7 below for the argument in
the bounded case. For this to make sense in the unbounded case we use resolvents
as above (Remark 17) to say what a conjugate variable is. Hence Φ ∗ (xε ) ≤ Φ ∗ (x)
for all ε > 0. But, for any ε > 0, the distribution of xε is the free convolution of µx
with a scaled Cauchy distribution. By Remark 3.34 we have Gxε (z) = Gx (z + iε),
and hence, by the Stieltjes inversion formula, the distribution of xε has a density pε
which is given by
$$p_\varepsilon(u) = -\frac{1}{\pi}\,\mathrm{Im}\,G_x(u+i\varepsilon) = \frac{1}{\pi}\int_{\mathbb{R}}\frac{\varepsilon}{(u-v)^2+\varepsilon^2}\,d\mu_x(v).$$
Since this density is always in L³(R), we know by (the unbounded version of) the previous theorem that
$$\Phi^*(x_\varepsilon) = \frac{4}{3}\pi^2\int p_\varepsilon(u)^3\,du.$$
So we get
$$\sup_{\varepsilon>0}\,\frac{4}{3\pi}\int\big|\mathrm{Im}\,G_x(u+i\varepsilon)\big|^3\,du = \sup_{\varepsilon>0}\Phi^*(x_\varepsilon) \le \Phi^*(x) < \infty.$$
This implies (see, e.g., [109]) that Gx belongs to the Hardy space H 3 (C+ ), and thus
µx is absolutely continuous and its density is in L3 (R).
Some important properties of the free Fisher information are collected in the following theorem. For the proof we refer to Voiculescu's original paper [187].
Theorem 19. The free Fisher information Φ* has the following properties (where all appearing variables are self-adjoint and live in a tracial W∗-probability space).
1) Φ* is superadditive:
$$\Phi^*(x_1,\dots,x_n) \ge \Phi^*(x_1) + \cdots + \Phi^*(x_n).$$
2) We have the free Cramér-Rao inequality:
$$\Phi^*(x_1,\dots,x_n) \ge \frac{n^2}{\tau(x_1^2)+\cdots+\tau(x_n^2)}. \qquad (8.18)$$
3) We have the free Stam inequality. If {x1, . . . , xn} and {y1, . . . , yn} are free then we have
$$\frac{1}{\Phi^*(x_1+y_1,\dots,x_n+y_n)} \ge \frac{1}{\Phi^*(x_1,\dots,x_n)} + \frac{1}{\Phi^*(y_1,\dots,y_n)}. \qquad (8.19)$$
Theorem 20. Let (M, τ) be a tracial W∗-probability space and xi = xi* ∈ M for i = 1, . . . , n. Consider ξ1, . . . , ξn ∈ L²(M). The following statements are equivalent:
(i) ξ1, . . . , ξn satisfy the conjugate relations (8.12).
(ii) We have for all m ≥ 1 and 1 ≤ i, i(1), . . . , i(m) ≤ n that
$$\kappa_1(\xi_i) = 0, \qquad \kappa_2\big(\xi_i, x_{i(1)}\big) = \delta_{i\,i(1)}, \qquad \kappa_{m+1}\big(\xi_i, x_{i(1)},\dots,x_{i(m)}\big) = 0 \quad (m\ge 2).$$
Remark 21. Note that up to now we considered only cumulants where all arguments
are elements of the algebra M; here we have the situation where one argument is
from L2 , all the other arguments are from L∞ = M. This is well defined by approx-
imation using the normality of the trace, and poses no problems, since multiplying
an element from L2 with an operator from L∞ gives again an element from L2 ; or
one can work directly with the inner product on L2 . Cumulants with more than two
arguments from L2 would be problematic. Moreover one can apply our result, Equa-
tion (2.19), when the entries of our cumulant are products, again provided that there
are at most two elements from L2 .
Exercise 7. Prove the claim following Definition 12: if x1 and x2 are free and x1 has a conjugate variable ξ, then ξ satisfies the conjugate relations for x1 + x2.
We can now prove the easy direction of the relation between free Fisher informa-
tion and freeness. This result is due to Voiculescu [187]; our proof using cumulants
is from [137].
Theorem 22. Let (M, τ) be a tracial W∗-probability space and consider xi = xi* ∈ M (i = 1, . . . , n) and yj = yj* ∈ M (j = 1, . . . , m). If {x1, . . . , xn} and {y1, . . . , ym} are free then we have
$$\Phi^*(x_1,\dots,x_n,y_1,\dots,y_m) = \Phi^*(x_1,\dots,x_n) + \Phi^*(y_1,\dots,y_m).$$
Proof: We may assume that conjugate systems ξ1, . . . , ξn for x1, . . . , xn and η1, . . . , ηm for y1, . . . , ym exist (otherwise both sides are infinite), and we check that ξ1, . . . , ξn, η1, . . . , ηm satisfies, for x1, . . . , xn, y1, . . . , ym, the cumulant conditions of Theorem 20. The relations involving only x's and ξ's or only y's and η's are satisfied because of the conjugate relations for either x/ξ or y/η. Because of ξi ∈ L²(x1, . . . , xn) and ηj ∈ L²(y1, . . . , ym) and the fact that {x1, . . . , xn} and {y1, . . . , ym} are free, we have furthermore the vanishing (see Remark 21) of all cumulants with mixed arguments from {x1, . . . , xn, ξ1, . . . , ξn} and {y1, . . . , ym, η1, . . . , ηm}. But this gives then all the conjugate relations.
The less straightforward implication, namely that additivity of the free Fisher
information implies freeness, relies on the following relation for commutators be-
tween variables and their conjugate variables. This, as well as the consequence for
free Fisher information, was proved by Voiculescu in [189], whereas our proofs use
again adaptations of ideas from [137].
Theorem 23. Let (M, τ) be a tracial W∗-probability space and xi = xi* ∈ M for i = 1, . . . , n. Let ξ1, . . . , ξn ∈ L²(x1, . . . , xn) be a conjugate system for x1, . . . , xn. Then we have
$$\sum_{i=1}^n [x_i,\xi_i] = 0.$$
Proof: Consider c := Σᵢ[xᵢ, ξᵢ] ∈ L²(x1, . . . , xn). It suffices to show that all cumulants of c with the x's vanish, i.e., that
$$\kappa_{m+1}\big(c, x_{i(1)},\dots,x_{i(m)}\big) = 0 \qquad\text{for all } m\ge 0 \text{ and all } 1\le i(1),\dots,i(m)\le n.$$
By using the formula for cumulants with products as entries, Theorem 2.13, we get the claimed vanishing, because in the case of the first sum the only partition π that satisfies the two conditions that ξi is in a block of size two and π ∨ {(1,2),(3), . . . ,(m+2)} = 1_{m+2} is π = {(1,4,5, . . . ,m+2),(2,3)}; and in the case of the second sum the only partition σ that satisfies the two conditions that ξi is in a block of size two and σ ∨ {(1,2),(3), . . . ,(m+2)} = 1_{m+2} is σ = {(1,m+2),(2,3,4, . . . ,m+1)}. The last equality follows from the fact that τ is a trace, see Exercise 2.8.
Theorem 24. Let (M, τ) be a tracial W∗-probability space and xi = xi* ∈ M for i = 1, . . . , n and yj = yj* ∈ M for j = 1, . . . , m. Assume that
$$\Phi^*(x_1,\dots,x_n,y_1,\dots,y_m) = \Phi^*(x_1,\dots,x_n) + \Phi^*(y_1,\dots,y_m) < \infty.$$
Then {x1, . . . , xn} and {y1, . . . , ym} are free.
Proof: Let ξ1, . . . , ξn, η1, . . . , ηm ∈ L²(x1, . . . , xn, y1, . . . , ym) be the conjugate system for x1, . . . , xn, y1, . . . , ym, and let P and Q be the orthogonal projections onto L²(x1, . . . , xn) and L²(y1, . . . , ym), respectively. Then Pξ1, . . . , Pξn is the conjugate system for x1, . . . , xn and Qη1, . . . , Qηm the one for y1, . . . , ym, and our assumption reads
$$\sum_{i=1}^n\|\xi_i\|_2^2 + \sum_{j=1}^m\|\eta_j\|_2^2 = \Phi^*(x_1,\dots,x_n) + \Phi^*(y_1,\dots,y_m) = \sum_{i=1}^n\|P\xi_i\|_2^2 + \sum_{j=1}^m\|Q\eta_j\|_2^2.$$
However, this means that the projection P has no effect on the ξi and the projection Q has no effect on the ηj; hence the additivity of the Fisher information is saying that ξ1, . . . , ξn is already the conjugate system for x1, . . . , xn and η1, . . . , ηm is already the conjugate system for y1, . . . , ym. By Theorem 23, this implies that
$$\sum_{i=1}^n [x_i,\xi_i] = 0 \qquad\text{and}\qquad \sum_{j=1}^m [y_j,\eta_j] = 0.$$
In order to prove the asserted freeness we have to check that all mixed cumulants
in {x1 , . . . , xn } and {y1 , . . . , ym } vanish. In this situation a mixed cumulant means
there is at least one xi and at least one y j . Moreover, because we are working with a
tracial state, it suffices to show κr+2 (xi , z1 , . . . , zr , y j ) = 0 for all r ≥ 0; i = 1, . . . , n;
j = 1, . . . , m; and z1 , . . . , zr ∈ {x1 , . . . , xn , y1 , . . . , ym }. Consider such a situation. Then
we have
$$0 = \kappa_{r+3}\Big(\sum_{k=1}^n [x_k,\xi_k],\ x_i, z_1,\dots,z_r, y_j\Big) = \sum_{k=1}^n \kappa_{r+3}\big(x_k\xi_k,\, x_i, z_1,\dots,z_r, y_j\big) - \sum_{k=1}^n \kappa_{r+3}\big(\xi_k x_k,\, x_i, z_1,\dots,z_r, y_j\big).$$
By the formula for cumulants with products as entries, each term of the first sum reduces to κ₂(ξk, xi) · κ_{r+2}(xk, z1, . . . , zr, yj) and each term of the second sum to κ₂(ξk, yj) · κ_{r+2}(xk, xi, z1, . . . , zr); hence the above equals
$$\kappa_{r+2}\big(x_i, z_1,\dots,z_r, y_j\big),$$
because, by the conjugate relations, κ₂(ξk, xi) = δ_{ki} and κ₂(ξk, yj) = 0 for all k = 1, . . . , n and all j = 1, . . . , m.
Definition 25. Let (M, τ) be a tracial W∗-probability space. For random variables xi = xi* ∈ M (i = 1, . . . , n), the non-microstates free entropy is defined by
$$\chi^*(x_1,\dots,x_n) := \frac{1}{2}\int_0^\infty\Big(\frac{n}{1+t} - \Phi^*\big(x_1+\sqrt{t}\,s_1,\dots,x_n+\sqrt{t}\,s_n\big)\Big)\,dt + \frac{n}{2}\log(2\pi e), \qquad (8.21)$$
where s1, . . . , sn are free semi-circular random variables which are free from {x1, . . . , xn}.
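As a quick illustration of this definition (our own worked example, not from the text): for a single standard semi-circular variable s, the variable s + √t s₁ is again semi-circular, of variance 1 + t, so the scaling property of Φ* (see Exercise 14 below) together with Φ*(s) = 1 gives Φ*(s + √t s₁) = 1/(1 + t), and hence
$$\chi^*(s) = \frac12\int_0^\infty\Big(\frac{1}{1+t}-\frac{1}{1+t}\Big)\,dt + \frac12\log(2\pi e) = \frac12\log(2\pi e),$$
which matches the known value of χ(s), in accordance with item 1) of the next theorem.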
One can now rewrite the properties of Φ* into properties of χ*. In the next theorem we collect the most important ones. The proofs are mostly straightforward (given the properties of Φ*) and we refer again to Voiculescu's original papers [187, 189].
Theorem 26. The non-microstates free entropy has the following properties (where all variables which appear are self-adjoint and are in a tracial W∗-probability space).
1) For n = 1, we have χ*(x) = χ(x).
2) We have the upper bound
$$\chi^*(x_1,\dots,x_n) \le \frac{n}{2}\log\big(2\pi e\,n^{-1}C^2\big), \qquad\text{where } C^2 = \tau\big(x_1^2+\cdots+x_n^2\big).$$
In particular:
Theorem 27. Let (M, τ) be a tracial W ∗ -probability space and xi = xi∗ ∈ M for
i = 1, . . . , n. Then we have
(note that [p, 1 ⊗ 1] should here be understood as module operations, i.e., we have
[p, 1 ⊗ 1] = p ⊗ 1 − 1 ⊗ p).
Corollary 29. Assume that Φ*(x1, . . . , xn) < ∞. Then we have for all t ∈ vN(x1, . . . , xn)
$$(n-1)\,\|t-\tau(t)\|_2^2 \le \frac{1}{2}\sum_{i=1}^n\Big\{\big\langle[t,x_i],[t,\xi_i]\big\rangle + 4\,\|[t,x_i]\|_2\cdot\|\xi_i\|_2\cdot\|t\|\Big\}.$$
Proof: It suffices to prove the statement for t = p ∈ C⟨x1, . . . , xn⟩. First note that
$$(n-1)\,\|p-\tau(p)\|_2^2 = \frac{1}{2}\sum_{i=1}^n\big\langle[p,x_i],[p,\xi_i]\big\rangle + \mathrm{Re}\sum_{i=1}^n\big\langle\partial_i p,\,[1\otimes 1,[p,x_i]]\big\rangle.$$
Moreover,
$$\mathrm{Re}\,\big\langle\partial_i p,[1\otimes 1,[p,x_i]]\big\rangle \le 2\,\big\|(\mathrm{id}\otimes\tau)(\partial_i p)\big\|_2\cdot\big\|[p,x_i]\big\|_2 + 2\,\big\|(\tau\otimes\mathrm{id})(\partial_i p)\big\|_2\cdot\big\|[p,x_i]\big\|_2.$$
Theorem 30. Let n ≥ 2 and Φ ∗ (x1 , . . . , xn ) < ∞. Then vN(x1 , . . . , xn ) does not have
property Γ (and hence is a factor).
Proof: Let (tk )k∈N be a central sequence in vN(x1 , . . . , xn ). (Recall that central se-
quences are, by definition, bounded in operator norm.) This means in particular that
[tk , xi ] converges, for k → ∞, in L2 (M) to 0, for all i = 1, . . . , n. But then, by Corol-
lary 29, we also have ktk − τ(tk )k2 → 0, which means that our central sequence is
trivial. Thus there exists no non-trivial central sequence.
is, at least for self-adjoint polynomials, the same as the question of zero divisors in
the following sense.
Theorem 32. Let (M, τ) be a tracial W∗-probability space and xi = xi* ∈ M for i = 1, . . . , n. Assume that Φ*(x1, . . . , xn) < ∞. Then any non-trivial p ∈ C⟨x1, . . . , xn⟩ has no zero divisor, i.e., pw = 0 for some w ∈ vN(x1, . . . , xn) implies w = 0.
Proof: The rough idea of the proof follows the same line as the proof of Theorem
13; namely assume that we have a zero divisor for some polynomial, then one shows
that by differentiating this statement one also has a zero divisor for a polynomial of
lesser degree. Thus one can reduce the general case to the (non-trivial) degree 0
case, where obviously no zero divisors exist.
More precisely, assume that we have pw = 0 for non-trivial p ∈ Chx1 , . . . , xn i
and w ∈ vN(x1 , . . . , xn ). Furthermore, we can assume that both p and w are self-
adjoint (otherwise, consider p∗ pww∗ = 0). Then pw = 0 implies also wp = 0. We
will now consider the equation wpw = 0 and take the derivative ∂i of this. Of course,
we have now the problem that w is not necessarily in the domain D(∂i ) of our
derivative. However by approximating w by polynomials and controlling norms via
Dabrowski’s inequality from Theorem 10 one can show that the following formal
arguments can be justified rigorously.
From wpw = 0 we get
$$0 = \partial_i(wpw) = \partial_i w\cdot(1\otimes pw) + (w\otimes 1)\cdot\partial_i p\cdot(1\otimes w) + (wp\otimes 1)\cdot\partial_i w.$$
Because of pw = 0 and wp = 0 the first and the third term vanish and we are left with (w ⊗ 1) · ∂ip · (1 ⊗ w) = 0. Again we apply τ ⊗ id to this, in order to get an equation in the algebra instead of the tensor product; we get
$$(\tau\otimes\mathrm{id})\big[(w\otimes 1)\cdot\partial_i p\big]\cdot w = 0.$$
The condition Φ ∗ (x1 , . . . , xn ) < ∞ is not the weakest possible; in [52] it was
shown that the conclusion of Theorem 32 still holds under the assumption of maxi-
mal free entropy dimension.
(ii) Show that the condition from (i) actually characterizes a family of n free
semi-circulars. Equivalently, let ξ1 , . . . , ξn be the conjugate system for self-adjoint
variables x1 , . . . , xn in some tracial W ∗ -probability space. Assume that ξi = xi for all
i = 1, . . . , n. Show that x1 , . . . , xn are n free semi-circular variables.
Notation 33 In the following (Cn)_{n∈N₀} and (Un)_{n∈N₀} will be the Chebyshev polynomials of the first and second kind, respectively (rescaled to the interval [−2, 2]), i.e., the sequences of polynomials Cn, Un ∈ C⟨X⟩ which are defined recursively by
$$C_0(X) = 2,\quad C_1(X) = X,\quad C_{n+1}(X) = X\,C_n(X) - C_{n-1}(X)$$
and
$$U_0(X) = 1,\quad U_1(X) = X,\quad U_{n+1}(X) = X\,U_n(X) - U_{n-1}(X).$$
Exercise 10. Let ∂ : C⟨X⟩ → C⟨X⟩ ⊗ C⟨X⟩ be the non-commutative derivative with respect to X. Show that
$$\partial U_n(X) = \sum_{k=1}^n U_{k-1}(X)\otimes U_{n-k}(X) \qquad\text{for all } n\in\mathbb{N}.$$
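In the one-variable situation the tensor leg can be treated as a second commuting variable, so this identity can be checked numerically via the difference quotient; a small sketch (ours, not from the text):

    def U(n, x):
        # Chebyshev polynomials of the 2nd kind on [-2, 2]:
        # U_0 = 1, U_1 = x, U_{k+1} = x U_k - U_{k-1}
        a, b = 1.0, x
        if n == 0:
            return a
        for _ in range(n - 1):
            a, b = b, x * b - a
        return b

    n, x, y = 5, 1.3, -0.7
    lhs = (U(n, x) - U(n, y)) / (x - y)
    rhs = sum(U(k - 1, x) * U(n - k, y) for k in range(1, n + 1))
    print(lhs, rhs)   # equal up to rounding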
(Note that the latter is in this case a stronger version of Theorem 10.)
(iv) The statement in (iii) shows that (id ⊗ τ) ◦ ∂ is a bounded operator with respect to ‖·‖₂. Show that this is not true for ∂, by proving that ‖Un(s)‖₂ = 1 and
$$\|\partial U_n(s)\|_2 = \sqrt{n}.$$
P ◦ Q = (P1 ◦ Q, . . . , Pn ◦ Q)
and Pi ◦ Q ∈ ChX1 , . . . , Xn i by
Exercise 14. Let (M, τ) be a tracial W∗-probability space and xi = xi* ∈ M for i = 1, . . . , n. Assume Φ*(x1, . . . , xn) < ∞.
(i) Show that we have for λ > 0
$$\Phi^*(\lambda x_1,\dots,\lambda x_n) = \frac{1}{\lambda^2}\,\Phi^*(x_1,\dots,x_n).$$
(ii) Let now A = (a_{ij})_{i,j=1}^n ∈ Mn(R) be a real invertible n × n matrix and put
$$y_i := \sum_{j=1}^n a_{ij}\,x_j.$$
Show that, if A is orthogonal, then
$$\Phi^*(x_1,\dots,x_n) = \Phi^*(y_1,\dots,y_n).$$
Chapter 9
Operator-Valued Free Probability Theory and Block Random Matrices
Gaussian random matrices fit quite well into the framework of free probability theory: asymptotically they are semi-circular elements, and they have also nice freeness
properties with other (e.g., non-random) matrices. Gaussian random matrices are
used as input in many basic models in many different mathematical, physical, or
engineering areas. Free probability theory provides then useful tools for the calcu-
lation of the asymptotic eigenvalue distribution for such models. However, in many
situations, Gaussian random matrices are only the first approximation to the con-
sidered phenomena and one would also like to consider more general kinds of such
random matrices. Such generalizations often do not fit into the framework of our
usual free probability theory. However, there exists an extension, operator-valued
free probability theory, which still shares the basic properties of free probability but
is much more powerful because of its wider domain of applicability. In this chapter
we will first motivate the operator-valued version of a semi-circular element, and
then present the general operator-valued theory. Here we will mainly work on a for-
mal level; the analytic description of the theory, as well as its powerful consequences
will be dealt with in the following chapter.
Fig. 9.1 Histogram of the dN eigenvalues of a random matrix XN , for N = 1000, for two different
realizations
$$X_N = \frac{1}{\sqrt{3}}\begin{pmatrix} A_N & B_N & C_N\\ B_N & A_N & B_N\\ C_N & B_N & A_N \end{pmatrix}, \qquad (9.1)$$
In order to see what goes wrong on the usual level and what can be saved on
an “operator-valued” level we will now try to calculate the moments of X in our
usual combinatorial way. To construct our first example we shall need the idea of a
circular family of operators, generalizing the idea of a semi-circular family given in
Definition 2.6.
Exercise 1. Using the notation of Section 6.8, show that for {c1, . . . , cn} to be a circular family it is necessary and sufficient that for every i1, . . . , im ∈ [n] and every ε1, . . . , εm ∈ {−1, 1} we have
$$\varphi\big(c_{i_1}^{(\varepsilon_1)}\cdots c_{i_m}^{(\varepsilon_m)}\big) = \sum_{\pi\in NC_2(m)}\kappa_\pi\big(c_{i_1}^{(\varepsilon_1)},\dots,c_{i_m}^{(\varepsilon_m)}\big).$$
Let us consider the more general situation where X is a d × d matrix X = (s_{ij})_{i,j=1}^d, where {s_{ij}} is a circular family with a covariance function σ, i.e.,
$$\varphi\big(s_{ij}\,s_{kl}\big) = \sigma(i, j;\, k, l).$$
The covariance function σ can here be prescribed quite arbitrarily, only subject to
some symmetry conditions in order to ensure that X is self-adjoint. Thus we allow
arbitrary correlations between different entries, but also that the variance of the si j
depends on (i, j). Note that we do not necessarily ask that all entries are semi-
circular. Off-diagonal elements can also be circular elements, as long as we have
s∗i j = s ji .
By Exercise 1 we have
$$\mathrm{tr}_d\otimes\varphi(X^m) = \frac{1}{d}\sum_{i_1,\dots,i_m=1}^d \varphi\big(s_{i_1 i_2}\,s_{i_2 i_3}\cdots s_{i_m i_1}\big) = \frac{1}{d}\sum_{\pi\in NC_2(m)}\ \sum_{i_1,\dots,i_m=1}^d\ \prod_{(p,q)\in\pi}\sigma\big(i_p, i_{p+1};\, i_q, i_{q+1}\big),$$
where the indices are counted modulo m, i.e., i_{m+1} := i₁.
Thus we can write
$$\mathrm{tr}_d\otimes\varphi(X^m) = \sum_{\pi\in NC_2(m)} K_\pi, \qquad\text{where}\qquad K_\pi := \frac{1}{d}\sum_{i_1,\dots,i_m=1}^d\ \prod_{(p,q)\in\pi}\sigma\big(i_p, i_{p+1};\, i_q, i_{q+1}\big).$$
So the result looks very similar to our usual description of semi-circular elements,
in terms of a sum over non-crossing pairings. However, the problem here is that the
Kπ are not multiplicative with respect to the block decomposition of π and thus
they do not qualify to be considered as cumulants. Even worse, there does not exist
a straightforward recursive way of expressing Kπ in terms of “smaller” Kσ . Thus
we are outside the realm of the usual recursive techniques of free probability theory.
However, one can save most of those techniques by going to an “operator-valued”
level. The main point of such an operator-valued approach is to write Kπ as the trace
of a d × d-matrix κπ , and then realize that κπ has the usual nice recursive structure.
Namely, let us define the matrix κπ = ([κπ]_{ij})_{i,j=1}^d by
$$[\kappa_\pi]_{ij} := \sum_{i_1,\dots,i_{m+1}=1}^d \delta_{i\,i_1}\,\delta_{j\,i_{m+1}}\ \prod_{(p,q)\in\pi}\sigma\big(i_p, i_{p+1};\, i_q, i_{q+1}\big).$$
Then clearly we have Kπ = tr_d(κπ). Furthermore, the value of κπ can be determined by an iterated application of the covariance mapping
$$\eta: M_d(\mathbb{C})\to M_d(\mathbb{C}), \qquad [\eta(B)]_{ij} = \sum_{k,l=1}^d \sigma(i,k;\,l,j)\,b_{kl}.$$
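For concrete computations it is convenient to have η available as a function. The following is a small implementation sketch (ours, not from the text), with σ stored as a d × d × d × d array sigma[i, k, l, j]:

    import numpy as np

    def make_eta(sigma):
        # [eta(B)]_{ij} = sum_{k,l} sigma(i,k; l,j) b_{kl}
        return lambda B: np.einsum('iklj,kl->ij', sigma, B)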
The main observation is now that the value of κπ is given by an iterated applica-
tion of this mapping η according to the nesting of the blocks of π. If one identifies
a non-crossing pairing with an arrangement of brackets, then the way that η has to
be iterated is quite obvious. Let us clarify these remarks with an example.
Consider the non-crossing pairing
$$\pi = \{(1,4),\,(2,3),\,(5,6)\} \in NC_2(6).$$
Then, with i₁ = i and i₇ = j,
$$[\kappa_\pi]_{ij} = \sum_{i_2,i_3,i_4,i_5,i_6=1}^d \sigma(i,i_2;\,i_4,i_5)\cdot\sigma(i_2,i_3;\,i_3,i_4)\cdot\sigma(i_5,i_6;\,i_6,j).$$
We can then sum over the index i₃ (corresponding to the block (2,3) of π) without interfering with the other blocks, giving
$$[\kappa_\pi]_{ij} = \sum_{i_2,i_4,i_5,i_6=1}^d \sigma(i,i_2;\,i_4,i_5)\cdot\sigma(i_5,i_6;\,i_6,j)\cdot\sum_{i_3=1}^d\sigma(i_2,i_3;\,i_3,i_4) = \sum_{i_2,i_4,i_5,i_6=1}^d \sigma(i,i_2;\,i_4,i_5)\cdot\sigma(i_5,i_6;\,i_6,j)\cdot[\eta(1)]_{i_2 i_4}.$$
Effectively we have removed the block (2,3) of π and replaced it by the matrix η(1).
Now we can do the summation over i₂ and i₄ without interfering with the other blocks, thus yielding
$$[\kappa_\pi]_{ij} = \sum_{i_5,i_6=1}^d \sigma(i_5,i_6;\,i_6,j)\cdot\sum_{i_2,i_4=1}^d\sigma(i,i_2;\,i_4,i_5)\,[\eta(1)]_{i_2 i_4} = \sum_{i_5,i_6=1}^d \sigma(i_5,i_6;\,i_6,j)\cdot\big[\eta\big(\eta(1)\big)\big]_{i\,i_5}.$$
We have now removed the block (1,4) of π and the effect of this was that we had to apply η to whatever was embraced by this block (in our case, η(1)).
Finally, we can do the summation over i₅ and i₆ corresponding to the last block (5,6) of π; this results in
$$[\kappa_\pi]_{ij} = \sum_{i_5=1}^d\big[\eta\big(\eta(1)\big)\big]_{i\,i_5}\cdot\sum_{i_6=1}^d\sigma(i_5,i_6;\,i_6,j) = \sum_{i_5=1}^d\big[\eta\big(\eta(1)\big)\big]_{i\,i_5}\,[\eta(1)]_{i_5 j} = \big[\eta\big(\eta(1)\big)\cdot\eta(1)\big]_{ij}.$$
Thus we finally have κπ = η(η(1)) · η(1), which corresponds to the bracket expression (X(XX)X)(XX). In the same way every non-crossing pairing results in an iterated application of the mapping η. For the five non-crossing pairings of six elements one gets the following results:
{(1,2),(3,4),(5,6)} ↦ η(1)·η(1)·η(1),   {(1,2),(3,6),(4,5)} ↦ η(1)·η(η(1)),   {(1,6),(2,5),(3,4)} ↦ η(η(η(1))),
{(1,6),(2,3),(4,5)} ↦ η(η(1)·η(1)),   {(1,4),(2,3),(5,6)} ↦ η(η(1))·η(1).
Thus for m = 6 we get for tr_d ⊗ ϕ(X⁶) the expression
$$\mathrm{tr}_d\Big\{\eta(1)\cdot\eta(1)\cdot\eta(1) + \eta(1)\cdot\eta(\eta(1)) + \eta(\eta(1))\cdot\eta(1) + \eta(\eta(1)\cdot\eta(1)) + \eta(\eta(\eta(1)))\Big\}.$$
It is clear how this generalizes: for the conditional expectation
$$E := \mathrm{id}\otimes\varphi: M_d(\mathbb{C})\otimes\mathcal{A}\to M_d(\mathbb{C}),$$
the operator-valued moments of X are given by
$$E(X^m) = \sum_{\pi\in NC_2(m)}\kappa_\pi. \qquad (9.4)$$
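One checks easily that (9.4) can also be evaluated recursively, by splitting off the pair that contains the first letter: E(X^m) = Σ_{k=0}^{m−2} η(E(X^k)) E(X^{m−2−k}). A short sketch of this recursion (ours, not from the text):

    import numpy as np

    def ov_moments(eta, d, m_max):
        # operator-valued moments of an M_d(C)-valued semi-circular element
        E = {0: np.eye(d), 1: np.zeros((d, d))}
        for m in range(2, m_max + 1):
            E[m] = sum(eta(E[k]) @ E[m - 2 - k] for k in range(m - 1))
        return E

For m = 6 this reproduces exactly the five nested-η terms listed above.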
If we go over to the corresponding generating power series, $M(z) = \sum_{m=0}^\infty E[X^m]z^m$, then this yields the relation
$$M(z) = 1 + z^2\,\eta\big(M(z)\big)\cdot M(z).$$
Note that m(z) := tr_d(M(z)) is the generating power series of the moments tr_d ⊗ ϕ(X^m), in which we are ultimately interested. Thus it is preferable to go over from M(z) to the corresponding operator-valued Cauchy transform G(z) := z⁻¹M(1/z). For this the equation above takes on the form
$$zG(z) = 1 + \eta\big(G(z)\big)\cdot G(z). \qquad (9.5)$$
Furthermore, we have for the Cauchy transform g of the limiting eigenvalue distribution µX of our block matrices XN that
$$g(z) = \mathrm{tr}_d\big(G(z)\big).$$
Applying tr_d, this yields that the support of the limiting eigenvalue distribution of XN is contained in the interval [−2‖η‖^{1/2}, +2‖η‖^{1/2}]. Since all odd moments are zero, the measure is symmetric. Furthermore, the estimate above on the operator-valued moments E(X^m) shows that
$$G(z) = \sum_{k=0}^\infty\frac{E(X^{2k})}{z^{2k+1}}$$
converges for |z| sufficiently large. Since the fixed point map coming from (9.5) is a contraction for |z| sufficiently large, G(z) is, for large z, uniquely determined as the solution of the equation (9.5).
If we write G as G(z) = E[(z − X)⁻¹], then this shows that it is not only a formal power series, but actually an analytic (M_d(C)-valued) function on the whole upper complex half-plane. Analytic continuation shows then the validity of (9.5) for all z in the upper half-plane.
Let us summarize our findings in the following theorem, which was proved in
[145].
Theorem 2. Consider a self-adjoint dN × dN block matrix $X_N = \big(A^{(ij)}\big)_{i,j=1}^d$, where, for each i, j = 1, . . . , d, the blocks $A^{(ij)} = \big(a^{(ij)}_{rp}\big)_{r,p=1}^N$ are Gaussian N × N random matrices such that the collection of all entries
$$\big\{a^{(ij)}_{rp} \mid i,j = 1,\dots,d;\ r,p = 1,\dots,N\big\}$$
forms a Gaussian family which is determined by the self-adjointness condition
$$a^{(ij)}_{rp} = \overline{a^{(ji)}_{pr}} \qquad\text{for all } i,j = 1,\dots,d;\ r,p = 1,\dots,N$$
and the prescribed covariance
$$E\big[a^{(ij)}_{rp}\,a^{(kl)}_{qs}\big] = \delta_{rs}\,\delta_{pq}\cdot\frac{1}{n}\,\sigma(i,j;\,k,l), \qquad (9.7)$$
where n := dN.
Then, for N → ∞, the n × n matrix XN has a limiting eigenvalue distribution whose Cauchy transform g is determined by g(z) = tr_d(G(z)), where G is an M_d(C)-valued analytic function on the upper complex half-plane, which is uniquely determined by the requirement that for z ∈ C⁺
$$\lim_{|z|\to\infty} z\,G(z) = 1$$
(where 1 is the identity of M_d(C)) and that for all z ∈ C⁺, G satisfies the matrix equation (9.5).
Note also that in [95] it was shown that there exists exactly one solution of the
fixed point equation (9.5) with a certain positivity property.
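Numerically, the fixed point equation (9.5) can be solved by iterating G ↦ (z·1 − η(G))⁻¹, which is equivalent to (9.5); a sketch (ours, using make_eta from above; close to the real axis some damping of the iteration may be needed):

    import numpy as np

    def solve_cauchy(eta, z, d, iterations=2000):
        G = np.eye(d, dtype=complex) / z      # start from G(z) ~ 1/z
        for _ in range(iterations):
            G = np.linalg.inv(z * np.eye(d) - eta(G))
        return G

    # density at t via Stieltjes inversion: -Im tr_d G(t + i*eps) / pi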
There exists a vast literature on dealing with such or similar generalizations of
Gaussian random matrices. Most of them deal with the situation where the entries
are still independent, but not identically distributed; usually, such matrices are re-
ferred to as band matrices. The basic insight that such questions can be treated
within the framework of operator-valued free probability theory is due to Shlyakht-
enko [155]. A very extensive treatment of band matrices (not using the language of
free probability, but the quite related Wigner type moment method) was given by
Anderson and Zeitouni [7].
Example 3. Let us now reconsider the limit (9.2) of our motivating band matrix (9.1). Since there are some symmetries in the block pattern, the corresponding G will also have some additional structure. To work this out let us examine η more carefully. If B ∈ M₃(C), B = (b_{ij})_{i,j=1}^3, then
$$\eta(B) = \frac{1}{3}\begin{pmatrix}
b_{11}+b_{22}+b_{33} & b_{12}+b_{21}+b_{23} & b_{13}+b_{31}+b_{22}\\
b_{21}+b_{12}+b_{32} & b_{11}+b_{22}+b_{33}+b_{13}+b_{31} & b_{12}+b_{23}+b_{32}\\
b_{13}+b_{31}+b_{22} & b_{23}+b_{32}+b_{21} & b_{11}+b_{22}+b_{33}
\end{pmatrix}.$$
We shall see later on that it is important to find the smallest unital subalgebra C of M₃(C) that is invariant under η. We have
$$\eta(1) = \begin{pmatrix} 1 & 0 & \frac13\\ 0 & 1 & 0\\ \frac13 & 0 & 1\end{pmatrix} = 1 + \frac13 H, \qquad\text{where}\quad H = \begin{pmatrix} 0&0&1\\ 0&0&0\\ 1&0&0\end{pmatrix},$$
$$\eta(H) = \frac13\begin{pmatrix} 0&0&2\\ 0&2&0\\ 2&0&0\end{pmatrix} = \frac23 H + \frac23 E, \qquad\text{where}\quad E = \begin{pmatrix} 0&0&0\\ 0&1&0\\ 0&0&0\end{pmatrix},$$
and
$$\eta(E) = \frac13\begin{pmatrix} 1&0&1\\ 0&1&0\\ 1&0&1\end{pmatrix} = \frac13\,1 + \frac13 H.$$
Now HE = EH = 0 and H² = 1 − E, so C, the span of {1, H, E}, is a three-dimensional commutative subalgebra invariant under η. Let us show that if G satisfies zG(z) = 1 + η(G(z))G(z) and is analytic, then G(z) ∈ C for all z ∈ C⁺.
Let Φ : M3 (C) → M3 (C) be given by Φ(B) = z−1 (1+η(B)B). One easily checks
that
kΦ(B)k ≤ |z|−1 (1 + kηkkBk2 )
and
kΦ(B1 ) − Φ(B2 )k ≤ |z|−1 kηk(kB1 k + kB2 k)kB1 − B2 k.
Here kηk is the norm of η as a map from M3 (C) to M3 (C). Since η is completely
positive we have kηk = kη(1)k. In this particular example kηk = 4/3.
Now let Dε = {B ∈ M3 (C) | kBk < ε}. If the pair z ∈ C+ and ε > 0 simultane-
ously satisfies
1 + kηkε 2 < |z|ε and 2εkηk < |z|,
then Φ(Dε ) ⊆ Dε and kΦ(B1 ) − Φ(B2 )k ≤ ckB1 − B2 k for B1 , B2 ∈ Dε and c =
2ε|z|−1 kηk < 1. So when |z| is sufficiently large both conditions are satisfied and
Φ has a unique fixed point in Dε . If we choose B ∈ Dε ∩ C then all iterates of Φ
applied to B will remain in C and so the unique fixed point will be in Dε ∩ C.
Since M3 (C) is finite dimensional there are a finite number of linear functionals,
{ϕi }i , on M3 (C) (6 in our particular example) such that C = ∩i ker(ϕi ). Also for
each i, ϕi ◦ G is analytic so it is identically 0 on C+ if it vanishes on a non-empty
open subset of C+ . We have seen above that G(z) ∈ C provided |z| is sufficiently
large; thus G(z) ∈ C for all z ∈ C+ .
Hence G and η(G) must be of the form
$$G = \begin{pmatrix} f & 0 & h\\ 0 & e & 0\\ h & 0 & f\end{pmatrix}, \qquad \eta(G) = \frac13\begin{pmatrix} 2f+e & 0 & e+2h\\ 0 & 2f+e+2h & 0\\ e+2h & 0 & 2f+e\end{pmatrix}.$$
Fig. 9.2 Comparison of the histogram of eigenvalues of XN, from Fig. 9.1, with the numerical solution according to (9.9) and (9.10)
Plugging this form into (9.5) gives the system of equations
$$zf = 1 + \frac{e(f+h) + 2f^2 + 2h^2}{3}, \qquad ze = 1 + \frac{e\,\big(e + 2(f+h)\big)}{3}, \qquad zh = \frac{4fh + e(f+h)}{3}. \qquad (9.9)$$
This system of equations can be solved numerically for z close to the real axis; then
$$g(z) = \mathrm{tr}_3\,G(z) = \frac{2f(z)+e(z)}{3}, \qquad \frac{d\mu(t)}{dt} = -\frac{1}{\pi}\lim_{s\to 0}\mathrm{Im}\,g(t+is) \qquad (9.10)$$
gives the sought eigenvalue distribution. In Fig. 9.2 we compare this numerical solution (solid curve) with the histogram for the XN from Fig. 9.1, with blocks of size 1000 × 1000.
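For the reader who wants to reproduce this computation, here is a small sketch (ours, not from the text) which iterates the system (9.9) in fixed point form and then applies (9.10); near the real axis the iteration may converge slowly, in which case damping helps:

    import numpy as np

    def solve_gfh(z, iterations=5000):
        f = e = h = 1.0 / z                       # G(z) ~ 1/z for large z
        for _ in range(iterations):
            f = (1 + (e * (f + h) + 2 * f**2 + 2 * h**2) / 3) / z
            e = (1 + e * (e + 2 * (f + h)) / 3) / z
            h = (4 * f * h + e * (f + h)) / (3 * z)
        return f, e, h

    eps = 1e-3
    for t in np.linspace(-2.5, 2.5, 11):
        f, e, h = solve_gfh(t + 1j * eps)
        g = (2 * f + e) / 3                       # g = tr_3 G, cf. (9.10)
        print(t, -g.imag / np.pi)                 # approximate density at t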
$$E(b) = b \quad \forall\,b\in B \qquad (9.11)$$
and
$$E(b_1 a b_2) = b_1\,E(a)\,b_2 \quad \forall\,a\in\mathcal{A},\ \forall\,b_1,b_2\in B. \qquad (9.12)$$
Note that the subalgebra generated by B and some variable x is not just the linear
span of monomials of the form bxn , but, because elements from B and our variable
x do not commute in general, we must also consider general monomials of the form
b0 xb1 x · · · bn xbn+1 .
If B = A then any two subalgebras of A are free with amalgamation over B; so
the claim of freeness with amalgamation gets weaker as the subalgebra gets larger
until the subalgebra is the whole algebra at which point the claim is empty.
Operator-valued freeness works mostly like ordinary freeness; one only has to take care of the order of the variables: in all expressions they have to appear in their original order!
Example 6. 1) If x and {y1, y2} are free, then one has as in the scalar case
$$E(y_1 x y_2) = E\big(y_1\,E(x)\,y_2\big). \qquad (9.13)$$
In the scalar case (where B would just be C and E = ϕ : A → C a unital linear functional) we write of course ϕ(y1 ϕ(x) y2) in the factorized form ϕ(y1y2)ϕ(x). In the operator-valued case this is not possible; we have to leave the E(x) at its position between y1 and y2.
2) If {x1, x2} and {y1, y2} are free over B, then one has the operator-valued version of (1.14),
$$E(x_1y_1x_2y_2) = E\big(x_1E(y_1)x_2\big)\cdot E(y_2) + E(x_1)\cdot E\big(y_1E(x_2)y_2\big) - E(x_1)\,E(y_1)\,E(x_2)\,E(y_2).$$
where the arguments of κπ^B are distributed according to the blocks of π, but the cumulants are nested inside each other according to the nesting of the blocks of π.
Example 8. Consider the non-crossing partition
$$\pi = \{(1,10),\,(2,5,9),\,(3,4),\,(6),\,(7,8)\} \in NC(10).$$
Remark 9. Let us give a more formal definition of the operator-valued free cumulants in the following.
1) First note that the bimodule property (9.12) for E implies for κ^B the property
$$\kappa_n^B\big(b_0a_1b_1,\ a_2b_2,\ \dots,\ a_nb_n\big) = b_0\,\kappa_n^B\big(a_1,\ b_1a_2,\ \dots,\ b_{n-1}a_n\big)\,b_n$$
for all a1, . . . , an ∈ A and b0, . . . , bn ∈ B. This can also be stated by saying that κn^B is actually a map on the B-module tensor product A^{⊗_B n} = A ⊗_B A ⊗_B · · · ⊗_B A.
2) Let now any sequence {Tn}n of B-bimodule maps Tn : A^{⊗_B n} → B be given. Instead of Tn(x1 ⊗_B · · · ⊗_B xn) we shall write Tn(x1, . . . , xn). Then there exists a unique extension of T, indexed by non-crossing partitions, so that for every π ∈ NC(n) we have a map Tπ : A^{⊗_B n} → B, such that the following conditions are satisfied:
(i) when π = 1n we have Tπ = Tn;
(ii) whenever π ∈ NC(n) and V = {l+1, . . . , l+k} is an interval in π, then
$$T_\pi(a_1,\dots,a_n) = T_{\pi\setminus V}\big(a_1,\dots,a_l\cdot T_k(a_{l+1},\dots,a_{l+k}),\ a_{l+k+1},\dots,a_n\big).$$
This second property is called the insertion property. One should notice that ev-
ery non-crossing partition can be reduced to a partition with a single block by the
process of interval stripping. For example with the partition π = {(1, 10), (2, 5, 9),
(3, 4), (6), (7, 8)} from above we strip the interval (3, 4) to obtain {(1, 10), (2, 5, 9),
(6), (7, 8)}. Then we strip the interval (7, 8) to obtain {(1, 10), (2, 5, 9), (6), }, then
we strip the (one element) interval (6) to obtain {(1, 10), (2, 5, 9)}; and finally we
strip the interval (2, 5, 9) to obtain the partition with a single block {(1, 10)}.
The insertion property requires that the family {Tπ }π be compatible with interval
stripping. Thus if there is an extension satisfying (i) and (ii), it must be unique.
Moreover we can compute Tπ by stripping intervals and the outcome is independent
of the order in which we strip the intervals.
3) Let us call a family {Tπ}π determined as above multiplicative. Then it is quite straightforward to check the following.
◦ Let {Tπ}π be a multiplicative family of B-bimodule maps and define a new family by
$$S_\pi = \sum_{\substack{\sigma\in NC(n)\\ \sigma\le\pi}} T_\sigma \qquad (\pi\in NC(n)). \qquad (9.17)$$
Then {Sπ}π is multiplicative, too.
For π = {(1,10),(2,5,9),(3,4),(6),(7,8)} ∈ NC(10) from Example 8 the Eπ is, for example, given by
$$E_\pi(a_1,\dots,a_{10}) = E\Big(a_1\cdot E\big(a_2\cdot E(a_3a_4)\cdot a_5\cdot E(a_6)\cdot E(a_7a_8)\cdot a_9\big)\cdot a_{10}\Big).$$
which is equivalent to (9.16). In particular, this means that the κn^B are given by
$$\kappa_n^B(a_1,\dots,a_n) = \sum_{\sigma\in NC(n)}\mu(\sigma,1_n)\,E_\sigma(a_1,\dots,a_n),$$
where µ denotes the Möbius function on non-crossing partitions. This is consistent with (9.4) of our example in Section 9.1 with B = M_d(C), where these κ's were defined by iterated applications of η(B) = E(XBX) = κ₂^B(XB, X).
As in the scalar-valued case one has the following properties, see [163, 184, 190].
Theorem 11. 1) The relation between the Cauchy transform and the R-transform is given by
$$b\,G(b) = 1 + R\big(G(b)\big)\cdot G(b), \qquad\text{or equivalently}\qquad G(b) = \big(b - R(G(b))\big)^{-1}.$$
Remark 12. 1) As for the moments, one has to allow in the operator-valued cumu-
lants elements from B to spread everywhere between the arguments. So with B-
If π has two blocks, π = {(1, . . . , k),(k+1, . . . , n)}, then this is just matrix multiplication. We then get the general case by using the insertion property and induction. By Möbius inversion we have
Corollary 14. If the entries of two matrices are free in (C, ϕ), then the two matrices
themselves are free with respect to E : Md (C) → Md (C).
are free with amalgamation over M2 (C) in (M2 (C), id ⊗ϕ). Note that in general they
are not free in the scalar-valued non-commutative probability space (M2 (C), tr ⊗ ϕ).
Let us make this distinction clear by looking at a small moment. We have
$$X_1X_2 = \begin{pmatrix} a_1a_2+b_1c_2 & a_1b_2+b_1d_2\\ c_1a_2+d_1c_2 & c_1b_2+d_1d_2\end{pmatrix}.$$
gives us some conceptual understanding of the problem, whereas the second step does not give much theoretical insight, but is more of a numerical nature. Clearly, the bigger the last step, i.e., the larger B, the less we gain by working on the B-level first. So it is interesting to understand how symmetries of the problem allow us to
restrict from B to some smaller subalgebra D ⊂ B. In general, the behaviour of an
element as a B-valued random variable might be very different from its behaviour
as a D-valued random variable. This is reflected in the fact that in general the ex-
pression of the D-valued cumulants of a random variable in terms of its B-valued
cumulants is quite complicated. So we can only expect that nice properties with re-
spect to B pass over to D if the relation between the corresponding cumulants is
easy. The simplest such situation is where the D-valued cumulants are the restric-
tion of the B-valued cumulants. It turns out that it is actually quite easy to decide
whether this is the case.
Then the D-valued cumulants of x are given by the restrictions of the B-valued cumulants: for all n ≥ 1 and all d1, . . . , d_{n−1} ∈ D we have
$$\kappa_n^D\big(xd_1,\ xd_2,\ \dots,\ xd_{n-1},\ x\big) = \kappa_n^B\big(xd_1,\ xd_2,\ \dots,\ xd_{n-1},\ x\big).$$
This statement is from [137]. Its proof is quite straightforward by comparing the corresponding moment-cumulant formulas. We leave it to the reader.
Exercise 2. Prove Proposition 16.
Proposition 16 allows us in particular to check whether a B-valued semi-circular
element x is also semi-circular with respect to a smaller D ⊂ B. Namely, all B-
valued cumulants of x are given by nested iterations of the mapping η. Hence, if η
maps D to D, then this property extends to all B-valued cumulants of x restricted to
D.
Remark 18. 1) This corollary allows for an easy determination of the smallest
canonical subalgebra with respect to which x is still semi-circular. Namely, if x is
B-semi-circular with covariance mapping η : B → B, we let D be the smallest unital
9.4 Moving between different levels 247
subalgebra of B which is mapped under η into itself. Note that this D exists because
the intersection of two subalgebras which are invariant under η is again a subalge-
bra invariant under η. Then x is also semi-circular with respect to this D. Note that
the corollary above is not an equivalence, thus there might be smaller subalgebras
than D with respect to which x is still semi-circular; however, there is no systematic
way to detect those.
2) Note also that with some added hypotheses the above corollary might become
an equivalence; for example, in [137] it was shown: Let (A, E, B) be an operator-
valued probability space, such that A and B are C∗ -algebras. Let F : B → C =: D ⊂
B be a faithful state. Assume that τ = F ◦ E is a faithful trace on A. Let x be a
B-valued semi-circular variable in A. Then the distribution of x with respect to τ is
the semicircle law if and only if E(x2 ) ∈ C.
Example 19. Let us see what the statements above tell us about our model case
of d × d self-adjoint matrices with semi-circular entries X = (si j )di, j=1 . In Section
9.1 we have seen that if we allow arbitrary correlations between the entries, then
we get a semi-circular distribution with respect to B = Md (C). (We calculated this
explicitly, but one could also invoke Proposition 13 to get a direct proof of this.) The
mapping η : M_d(C) → M_d(C) was given by
$$[\eta(B)]_{ij} = \sum_{k,l=1}^d \sigma(i,k;\,l,j)\,b_{kl}.$$
Thus if $\sum_{k=1}^d\sigma(i,k;\,k,j)$ is zero for i ≠ j and otherwise independent from i, then X is semi-circular. The simplest situation where this happens is if all s_{ij}, 1 ≤ i ≤ j ≤ d, are free and have the same variance.
Let us now consider the more special band matrix situation where the s_{ij}, 1 ≤ i ≤ j ≤ d, are free, but not necessarily of the same variance, i.e., we assume that for i ≤ j, k ≤ l we have
$$\sigma(i,j;\,k,l) = \begin{cases} \sigma_{ij}, & \text{if } i=k,\ j=l\\ 0, & \text{otherwise}.\end{cases} \qquad (9.24)$$
Note that this also means that σ(i,k; k,i) = σ_{ik}, because we have s_{ki} = s_{ik}*. Then
$$[\eta(1)]_{ij} = \delta_{ij}\sum_{k=1}^d\sigma_{ik}.$$
We see that in order to get a semi-circular distribution we do not need the same variance everywhere; it suffices to have the same sum over the variances in each row of the matrix.
However, if this sum condition is not satisfied then we do not have a semi-circular distribution. Still, having all entries free gives more structure than just semi-circularity with respect to M_d(C). Namely, we see that with the covariance (9.24) our η maps diagonal matrices into diagonal matrices. Thus we can pass from M_d(C) over to the subalgebra D ⊂ M_d(C) of diagonal matrices, and get that for such situations X is D-semi-circular. The conditional expectation E_D : A → D in this case is of course given by
$$\begin{pmatrix} a_{11} & \dots & a_{1d}\\ \vdots & \ddots & \vdots\\ a_{d1} & \dots & a_{dd}\end{pmatrix} \mapsto \begin{pmatrix} \varphi(a_{11}) & \dots & 0\\ \vdots & \ddots & \vdots\\ 0 & \dots & \varphi(a_{dd})\end{pmatrix}.$$
Even if we do not have free entries, we might still have some symmetries in the
correlations between the entries which let us pass to some subalgebra of Md (C).
As pointed out in Remark 18 we should look for the smallest subalgebra which is
invariant under η. This was exactly what we did implicitly in our Example 3. There
we observed that η maps the subalgebra
f 0h
C := 0 e 0 | e, f , h ∈ C
h0 f
into itself. (And we actually saw in Example 3 that C is the smallest such subalge-
bra, because it is generated from the unit by iterated application of η.) Thus the X
from this example, (9.2), is not only M3 (C)-semi-circular, but actually also C-semi-
circular. In our calculations in Example 3 this was implicitly taken into account,
because there we restricted our Cauchy transform G to values in C, i.e., effectively
we solved the equation (9.5) for an operator-valued semi-circular element not in
M3 (C), but in C.
such non-mean zero Gaussian random matrices (the Ricean model) and why one is interested in the eigenvalue distribution of HH*.
One can reduce this to a problem involving self-adjoint matrices by observing that HH* has the same distribution as the square of
$$T := \begin{pmatrix} 0 & H\\ H^* & 0\end{pmatrix} = \begin{pmatrix} 0 & B\\ B^* & 0\end{pmatrix} + \begin{pmatrix} 0 & C\\ C^* & 0\end{pmatrix}.$$
The matrix Ĉ is a 2d × 2d self-adjoint matrix with ∗-free circular entries, thus of the type we considered in Section 9.1. Hence, by the remarks in Example 19, we know that it is a D_{2d}-valued semi-circular element, where D_{2d} ⊂ M_{2d}(C) is the subalgebra of diagonal matrices; one checks easily that the covariance function η : D_{2d} → D_{2d} is given by
$$\eta\begin{pmatrix} D_1 & 0\\ 0 & D_2\end{pmatrix} = \begin{pmatrix} \eta_1(D_2) & 0\\ 0 & \eta_2(D_1)\end{pmatrix}, \qquad (9.25)$$
where η₁ : D_d → D_d and η₂ : D_d → D_d are given by
$$\eta_1(D_2) = \mathrm{id}\otimes\varphi\big[CD_2C^*\big] \qquad\text{and}\qquad \eta_2(D_1) = \mathrm{id}\otimes\varphi\big[C^*D_1C\big].$$
By using the well-known Schur complement formula for the inverse of 2 × 2 block matrices (see also the next chapter for more on this), this yields finally
$$zG_1(z) = E_{D_d}\bigg[\Big(1 - \eta_1\big(G_2(z)\big) + B\,\frac{1}{z - z\,\eta_2(G_1(z))}\,B^*\Big)^{-1}\bigg].$$
Chapter 10
Polynomials in Free Variables and Operator-Valued Convolution
The notion of a "deterministic equivalent" for random matrices, which can be found in the engineering literature, is a non-rigorous concept which amounts to replacing a random matrix model of finite size (which is usually unsolvable) by another problem which is solvable, in such a way that, for large N, the distributions of both problems are close to each other. Motivated by our example in the last chapter we will in this chapter propose a rigorous definition for this concept, which relies on asymptotic freeness results. This "free deterministic equivalent" was introduced by Speicher and Vargas in [166].
This will then lead directly to the problem of calculating the distribution of self-
adjoint polynomials in free variables. We will see that, in contrast to the corre-
sponding classical problem on the distribution of polynomials in independent ran-
dom variables, there exists a general algorithm to deal with such polynomials in
free variables. The main idea will be to relate such a polynomial with an operator-
valued linear polynomial, and then use operator-valued convolution to deal with the
latter. The successful implementation of this program is due to Belinschi, Mai and
Speicher [23]; see also [12].
the sense that if we plug in self-adjoint matrices, we will get as output self-adjoint matrices), so that the eigenvalue distribution of those polynomials can be recovered by calculating traces of powers.
To be more specific, let us consider a collection of independent random and deterministic N × N matrices:
$$\mathbf{X}_N = \big\{X_1^{(N)},\dots,X_{i_1}^{(N)}\big\}:\ \text{independent self-adjoint Gaussian matrices},$$
$$\mathbf{Y}_N = \big\{Y_1^{(N)},\dots,Y_{i_2}^{(N)}\big\}:\ \text{independent non-self-adjoint Gaussian matrices},$$
$$\mathbf{U}_N = \big\{U_1^{(N)},\dots,U_{i_3}^{(N)}\big\}:\ \text{independent Haar distributed unitary matrices},$$
$$\mathbf{D}_N = \big\{D_1^{(N)},\dots,D_{i_4}^{(N)}\big\}:\ \text{deterministic matrices},$$
and a fixed self-adjoint polynomial P, evaluated on these matrices as $P_N := P\big(X_1^{(N)},\dots,X_{i_1}^{(N)}, Y_1^{(N)},\dots,Y_{i_2}^{(N)}, U_1^{(N)},\dots,U_{i_3}^{(N)}, D_1^{(N)},\dots,D_{i_4}^{(N)}\big)$. By asymptotic freeness, the limit is modelled by families S = {s1, . . . , s_{i₁}} of free semi-circular elements, C = {c1, . . . , c_{i₂}} of ∗-free circular elements, U = {u1, . . . , u_{i₃}} of ∗-free Haar unitaries, and D = {d1, . . . , d_{i₄}}, such that S, C, U, D are ∗-free and the joint distribution of d1, . . . , d_{i₄} is given by the asymptotic joint distribution of $D_1^{(N)},\dots,D_{i_4}^{(N)}$. Then, almost surely, the asymptotic distribution of PN is that of $p_\infty := P\big(s_1,\dots,s_{i_1}, c_1,\dots,c_{i_2}, u_1,\dots,u_{i_3}, d_1,\dots,d_{i_4}\big)$, in the sense that, for all k, we have almost surely
$$\lim_{N\to\infty}\mathrm{tr}\big(P_N^k\big) = \tau\big(p_\infty^k\big).$$
In this way we can reduce the problem of the asymptotic distribution of PN to the study of the distribution of p∞.
A common obstacle of this procedure is that our deterministic matrices may not have an asymptotic joint distribution. It is then natural to consider, for a fixed N, the corresponding "free model" $p_N^\square := P\big(s_1,\dots,s_{i_1}, c_1,\dots,c_{i_2}, u_1,\dots,u_{i_3}, d_1^{(N)},\dots,d_{i_4}^{(N)}\big)$, where, just as before, the random matrices are replaced by the corresponding free operators in some space (A_N, ϕ_N), but now we let the distribution of $d_1^{(N)},\dots,d_{i_4}^{(N)}$ be exactly the same as the one of $D_1^{(N)},\dots,D_{i_4}^{(N)}$ with respect to tr. The free model
$p_N^\square$ will be called the free deterministic equivalent for PN. This was introduced and investigated in [166, 175].
(In case one wonders about the notation $p_N^\square$: the symbol □ is according to [31] the generic qualifier for denoting the free version of some classical object or operation.)
The difference between the distribution of $p_N^\square$ and the (almost sure or expected) distribution of PN is given by the deviation from freeness of X_N, Y_N, U_N, D_N, the deviation of X_N, Y_N from being free (semi-)circular systems, and the deviation of U_N from a free system of Haar unitaries. Of course, for large N these deviations get smaller and thus the distribution of $p_N^\square$ becomes a better approximation for the distribution of PN.
Let us denote by G_N the Cauchy transform of PN and by $G_N^\square$ the Cauchy transform of the free deterministic equivalent $p_N^\square$. Then the usual asymptotic freeness estimates show that moments of PN are, for large N, with very high probability close to corresponding moments of $p_N^\square$ (where the estimates involve also the operator norms of the deterministic matrices). This means that for N → ∞ the difference between the Cauchy transforms G_N and $G_N^\square$ goes almost surely to zero, even if there do not exist individual limits for both Cauchy transforms.
do not exist individual limits for both Cauchy transforms.
In the engineering literature there exists also a version of the notion of a deter-
ministic equivalent (apparently going back to Girko [78], see also [90]). This deter-
ministic equivalent consists in replacing the Cauchy transform GN of the considered
random matrix model (for which no analytic solution exists) by a function ĜN which
is defined as the solution of a specified system of equations. The specific form of
those equations is determined in an ad hoc way, depending on the considered prob-
lem, by making approximations for the equations of GN , such that one gets a closed
system of equations. In many examples of deterministic equivalents (see, e.g., [62,
Chapter 6]) it turns out that actually the Cauchy transform of our free deterministic
equivalent is the solution to those modified equations, i.e., that ĜN = G N . We saw
one concrete example of this in Section 9.5 of the last chapter.
Our definition of a deterministic equivalent gives a more conceptual approach
and shows clearly how this notion relates with free probability theory. In some sense
this indicates that the only meaningful way to get a closed system of equations when
dealing with random matrices is to replace the random matrices by free variables.
Deterministic equivalents are thus polynomials in free variables and it remains
to develop tools to deal with such polynomials in an effective way. It turns out that
operator-valued free probability theory provides such tools. We will elaborate on
this in the remaining sections of this chapter.
in the non-commutative probability space (M2 (C), tr2 ⊗ ϕ). But this element has the
same moments as
2
a1 0 a1 a2 b1 0 a1 a1 a2 b1 0
= =: AB. (10.2)
a2 0 0 0 0 b2 a2 a1 a22 0 b2
So, with µAB denoting the distribution of AB with respect to tr2 ⊗ ϕ, we have
1 1
µAB = µ p + δ0 .
2 2
Since A and B are not free with respect to tr2 ⊗ ϕ, we cannot use scalar-valued
multiplicative free convolution to calculate the distribution of AB. However, with
E : M2 (C) → M2 (C) denoting the conditional expectation onto deterministic 2 × 2
matrices, we have that the scalar-valued distribution µAB is given by taking the trace
tr2 of the operator-valued distribution of AB with respect to E. But on this operator-
valued level the matrices A and B are, by Corollary 9.14, free with amalgamation
over M2 (C). Furthermore, the M2 (C)-valued distribution of A is determined by the
joint distribution of a1 and a2 and the M2 (C)-valued distribution of B is determined
by the joint distribution of b1 and b2 . Hence, the scalar-valued distribution µ p will
be given by first calculating the M2 (C)-valued free multiplicative convolution of
A and B to obtain the M2 (C)-valued distribution of AB and then getting from this
the (scalar-valued) distribution µAB by taking the trace over M2 (C). Thus we have
rewritten our original problem as a problem on the product of two free operator-
valued variables.
10.3 Reduction to operator-valued additive convolution via the linearization trick 255
1 bd −1 a − bd −1 c 0
ab 1 0
= (10.4)
cd 0 1 0 d d −1 c 1
holds. Since the first and third matrix are both invertible in M2 (C) ⊗ A,
−1 −1
1 bd −1 1 −bd −1
1 0 1 0
= and = ,
0 1 0 1 d −1 c 1 −d −1 c 1
256 10 Polynomials in Free Variables and Operator-Valued Convolution
the stated equivalence of (i) and (ii), as well as formula (10.3), follows from (10.4).
What we now need, given our operator p = P(x1 , . . . , xn ), is to find a block ma-
trix such that the (1, 1) entry of the inverse of this block matrix corresponds to the
resolvent (z − p)−1 and that furthermore all the entries of this block matrix have at
most degree 1 in our variables. More precisely, we are looking for an operator
p̂ = b0 ⊗ 1 + b1 ⊗ x1 + · · · + bn ⊗ xn ∈ MN (C) ⊗ A
where
◦ N ∈ N is an integer,
◦ Q ∈ MN−1 (C) ⊗ ChX1 , . . . , Xn i is invertible
10.3 Reduction to operator-valued additive convolution via the linearization trick 257
◦ and U is a row vector and V is a column vector, both of size N − 1 with entries
in ChX1 , . . . , Xn i,
is called a linearization of P, if the following conditions are satisfied:
(i) There are matrices b0 , . . . , bn ∈ MN (C), such that
P̂ = b0 ⊗ 1 + b1 ⊗ X1 + · · · + bn ⊗ Xn ,
Applying the Schur complement, Proposition 1, to this situation yields then the
following.
P̂ = b0 ⊗ 1 + b1 ⊗ X1 + · · · + bn ⊗ Xn ∈ MN (C) ⊗ ChX1 , . . . , Xn i
with matrices b0 , . . . , bn ∈ MN (C). Then the following conditions are equivalent for
any complex number z ∈ C:
(i) The operator z − p with p := P(x1 , . . . , xn ) is invertible in A.
(ii) The operator Λ (z) − p̂ with Λ (z) defined as in (10.5) and
p̂ := b0 ⊗ 1 + b1 ⊗ x1 + · · · + bn ⊗ xn ∈ MN (C) ⊗ A
is invertible in MN (C) ⊗ A.
Moreover, if (i) and (ii) are fulfilled for some z ∈ C, we have that
(This statement looks simplistic taken for itself, but it will be useful when combined
with the third part.)
(ii) A monomial of the form P := Xi1 Xi2 · · · Xik ∈ ChX1 , . . . , Xn i for k ≥ 2, i1 , . . . , ik ∈
{1, . . . , n} has a linearization
Xi1
Xi2 −1
P̂ = ∈ Mk (C) ⊗ ChX1 , . . . , Xn i.
. .
.. ..
Xik −1
with N := (N1 + · · · + Nk ) − k + 1.
(iv) If
0U
∈ MN (C) ⊗ ChX1 , . . . , Xn i
V Q
is a linearization of P, then
0 U V∗
U ∗ 0 Q∗ ∈ M2N−1 (C) ⊗ ChX1 , . . . , Xn i
V Q 0
is a linearization of P + P∗ .
10.4 Analytic theory of operator-valued convolutions 259
p̂ := b0 ⊗ 1 + b1 ⊗ x1 + · · · + bn ⊗ xn ∈ MN (C) ⊗ A.
Note that for this linearization the freeness of the variables plays no role. Where it
becomes crucial is the observation that the freeness of x1 , . . . , xn implies, by Corol-
lary 9.14, the freeness over MN (C) of b1 ⊗ x1 , . . . , bn ⊗ xn . (Note that there is no
classical counter part of this for the case of independent variables.) Hence the distri-
bution of p̂ is given by the operator-valued free additive convolution of the distribu-
tions of b1 ⊗ x1 , . . . , bn ⊗ xn . Furthermore, since the distribution of xi determines also
the MN (C)-valued distribution of bi ⊗ xi , we have finally reduced the determination
of the distribution of P(x1 , . . . , xn ) to a problem involving operator-valued additive
free convolution. As pointed out in Section 9.2 we can in principle deal with such a
convolution.
However, in the last chapter we treated the relevant tools, in particular the
operator-valued R-transform, only as formal power series and it is not clear how one
should be able to derive explicit solutions from such formal equations. But worse,
even if the operator-valued Cauchy and R-transforms are established as analytic ob-
jects, it is not clear how to solve operator-valued equations like the one in Theorem
9.11. There are rarely any non-trivial operator-valued examples where an explicit
solution can be written down; and also numerical methods for such equations are
problematic — a main obstacle being, that those equations usually have many so-
lutions, and it is apriori not clear how to isolate the one with the right positivity
properties. As we have already noticed in the scalar-valued case, it is the subordina-
tion formulation of those convolutions which comes to the rescue. From an analytic
and also a numerical point of view, the subordination function is a much nicer object
than the R-transform.
So, in order to make good use of our linearization algorithm, we need also a well-
developed subordination theory of operator-valued free convolution. Such a theory
exists and we will present in the following the relevant statements. For proofs and
more details we refer to the original papers [23, 25].
where x ≥ 0 and x is invertible; note that this is equivalent to the fact that there
exists a real ε > 0 such that x ≥ ε1. . Any element x ∈ M can be uniquely written
as x = Re(x) + i Im(x), where Re(x) = (x + x∗ )/2 and Im(x) = (x − x∗ )/(2i) are
self-adjoint. We call Re(x) and Im(x) the real and imaginary part of x.
The appropriate domain for the operator-valued Cauchy transform Gx for a self-
adjoint element x = x∗ is the operator upper half-plane
Elements in this open set are all invertible, and H+ (B) is invariant under conjugation
by invertible elements in B, i.e. if b ∈ H+ (B) and c ∈ GL(B) is invertible, then
cbc∗ ∈ H+ (B).
We shall use the following analytic mappings, all defined on H+ (B); all trans-
forms have a natural Schwarz-type analytic extension to the lower half-plane given
by f (b∗ ) = f (b)∗ ; in all formulas below, x = x∗ is fixed in M:
◦ the moment-generating function:
(10.6)
◦ the reciprocal Cauchy transform:
−1
Fx (b) = E (b − x)−1 = Gx (b)−1 ;
(10.7)
◦ the h transform:
−1
hx (b) = E (b − x)−1
− b = Fx (b) − b. (10.9)
Here is now the main theorem from [23] on operator-valued free additive convolu-
tion.
(iii) Gx (ω1 (b)) = Gy (ω2 (b)) = Gx+y (b) for all b ∈ H+ (B).
Moreover, if b ∈ H+ (B), then ω1 (b) is the unique fixed point of the map
and
ω1 (b) = lim fb◦n (w) for any w ∈ H+ (B),
n→∞
where fb◦n denotes the n-fold composition of fb with itself. Same statements hold for
ω2 , with fb replaced by w 7→ hx (hy (w) + b) + b.
x y + 2x
0
p̂ = x 0 −1 ,
y + 2x −1 0
262 10 Polynomials in Free Variables and Operator-Valued Convolution
which means that the Cauchy transform of p can be recovered from the operator-
valued Cauchy transform of p̂, namely we have
!
z 00
ϕ((z − p)−1 ) ∗
−1
G p̂ (b) = (id ⊗ ϕ)((b − p̂) ) = for b = 0 0 0 .
∗ ∗ 000
0 x 2x
0 0 y
p̂ = x 0 −1 + 0 0 0 = X̃ + Ỹ
x
2 −1 0 y 0 0
and hence is the sum of two self-adjoint variables X̃ and Ỹ , which are free over
M3 (C). So we can use the subordination result from Theorem 5 in order to calculate
the Cauchy transform G p of p:
G p (z) ∗
= G p̂ (b) = GX̃+Ỹ (b) = GX̃ (ω1 (b)),
∗ ∗
7
0.35
6
0.3
0.25 5
0.2 4
0.15 3
0.1 2
0.05
1
0
−5 0 5 10 0
0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65
Fig. 10.1 Plots of the distribution of p(x, y) = xy + yx + x2 (left) for free x, y, where x is semi-
circular and y Marchenko-Pastur, and of the rational function r(x1 , x2 ) (right) for free semicircular
elements x1 and x2 ; in both cases the theoretical limit curve is compared with the histogram of the
eigenvalues of a corresponding random matrix model
1 − 1 x1 − 1 x2 −1 1
1 4 4 2
r(x1 , x2 ) = 0
2 − 14 x2 1 − 14 x1 0
1
0 2 0 0 0 0
1
= 12 −1 + 41 x1 0 + 0 0 4 x2 .
1
0 0 −1 + 14 x1 0 4 x2 0
So again, we can write the linearization as the sum of two M3 (C)-free variables
and we can invoke Theorem 5 for the calculation of its operator-valued Cauchy
transform. In Figure 10.1, we compare the histogram of eigenvalues of r(X1 , X2 ) for
one realization of independent Gaussian random matrices X1 , X2 of size 1000×1000
with the distribution of r(x1 , x2 ) for free semi-circular elements x1 , x2 , calculated
according to this algorithm.
264 10 Polynomials in Free Variables and Operator-Valued Convolution
Other examples for the use of operator-valued free probability methods can be
found in [12].
(ii) For general n ∈ N, prove: if a matrix B ∈ Mn (C) belongs to H+ (Mn (C)) then
all eigenvalues of B lie in the complex upper half-plane C+ . Is the converse also
true?
Chapter 11
Brown Measure
and the support of µa is the spectrum of a; see also the discussion after equation
(2.2) in Chapter 2.
More general, if a ∈ M is normal (i.e., aa∗ = a∗ a), then the spectral theorem
provides us with a projection valued spectral measure Ea and the Brown measure is
265
266 11 Brown Measure
just the spectral measure µa = τ ◦ Ea . Note that in the normal case µa may not be
determined by the moments of a. Indeed, if a = u is a Haar unitary then the moments
of u are the same as the moments of the zero operator. Of course, their ∗-moments
are different. For a normal operator a its spectral measure µa is uniquely determined
by Z
τ(an a∗m ) = zn z̄m dµa (z) (11.1)
C
for all m, n ∈ N. The support of µa is again the spectrum of a.
We will now try to assign to any operator a ∈ M a probability measure µa on
its spectrum, which contains relevant information about the ∗-distribution of a. This
µa will be called the Brown measure of a. One should note that for non-normal
operators there are many more ∗-moments of a than those appearing in (11.1). There
is no possibility to capture all the ∗-moments of a by the ∗-moments of a probability
measure. Hence, we will necessarily loose some information about the ∗-distribution
of a when we go over to the Brown measure of a. It will also turn out that we
need our state τ to be a trace in order to define µa . Hence in the following we will
only work in tracial W ∗ -probability spaces (M, τ). Recall that this means that τ is
a faithful and normal trace. Von Neumann algebras which admit such faithful and
normal traces are usually addressed as finite von Neumann algebras. If M is a finite
factor, then a tracial state τ : M → C is unique on M and is automatically normal
and faithful.
P(λ ) = det(λ I − T ) = (λ − λ1 ) · · · (λ − λn ),
We claim that the function λ 7→ log |λ | is harmonic in C\{0} and that in general
it has Laplacian
∇2 log |λ | = 2πδ0 (11.2)
in the distributional sense. Here the Laplacian is given by
∂2 ∂2
∇2 = + ,
∂ λr2 ∂ λi2
where λr and λi are the real and imaginary part of λ ∈ C. (Note that we use the sym-
bol ∇2 for the Laplacian, since we reserve the symbol ∆ for the Fuglede-Kadison
determinant of the next section.)
Let us prove this claim on the behaviour of log |λ |. For λ 6= 0 we write ∇2 in
terms of polar coordinates,
∂2 1 ∂ 1 ∂2
∇2 = + +
∂ r2 r ∂ r r2 ∂ θ 2
and have
∂2
1 ∂ 1 1
∇2 log |λ | = 2
+ log r = − 2 + 2 = 0.
∂r r ∂r r r
Ignoring the singularity at 0 we can write formally
Z Z
∇2 log |λ |dλr dλi = div(grad log |λ |)dλr dλi
B(0,r) B(0,r)
Z
= grad log |λ | · ndA
∂ B(0,r)
n
Z
= · ndA
∂ B(0,r) r
1
= · 2πr
r
= 2π.
That is Z
∇2 log |λ |dλr dλi = 2π,
B(0,r)
1 1 2 n 1 2
µT = (δλ1 + · · · + δλn ) = ∇ ∑ log |λ − λi | = ∇ log | det(T − λ I)|.
n 2πn i=1 2πn
(11.3)
As there exists a version of the determinant in an infinite dimensional setting we can
use this formula to generalize the definition of µT .
By functional calculus and the monotone convergence theorem, the limit always
exists.
0 . . . tn
0 . . . logtn
11.4 Subharmonic functions and their Riesz measures 269
and
√
1 p p
∆ (T ) = exp (logt1 + · · · + logtn ) = n t1 · · ·tn = n det |T | = n | det T |. (11.4)
n
(ii) f satisfies the submean inequality: for every circle the value of f at the centre is
less or equal to the mean value of f over the circle, i.e.
Z 2π
1
f (z) ≤ f (z + reiθ )dθ ;
2π 0
1
Z Z
f (λ ) · ∇2 ϕ(λ )dλr dλi = ϕ(z)dν f (z) for all ϕ ∈ Cc∞ (R2 ).
2π R2 C
1
Z
f (λ ) = log |λ − z|dν f (z) + h(λ ),
2π C
1 2
µa := ∇ log ∆ (a − λ ) (11.6)
2π
is a probability measure on C with support contained in the spectrum of a.
(iii) Moreover, one has for all λ ∈ C
Z
log |λ − z|dµa (z) = log ∆ (a − λ ) (11.7)
C
Thus
1
log ∆ (a) = lim τ(log(a∗ a + ε)),
2 ε&0
as a decreasing limit as ε & 0. So, with the notations
1
aλ := a − λ , fε (λ ) := τ(log(a∗λ aλ + ε)),
2
we have
f (λ ) = lim fε (λ ).
ε&0
Since we have for general positive operators x and y that τ(xy) = τ(x1/2 yx1/2 ) ≥ 0,
we see that ∇2 fε (λ ) ≥ 0 for all λ ∈ C and thus fε is subharmonic.
The fact that fε & f implies then that f is upper semicontinuous and satisfies
the submean inequality. Furthermore, if λ 6∈ σ (a) then a − λ is invertible, hence
∆ (a − λ ) > 0, and thus f (λ ) 6= −∞. Hence f is subharmonic.
∂2 ∂2 ∂2
∇2 = 2
+ 2
=4
∂ λr ∂ λi ∂ λ̄ ∂ λ
where
∂ 1 ∂ ∂ ∂ 1 ∂ ∂
= −i , = +i .
∂λ 2 ∂ λr ∂ λi ∂ λ̄ 2 ∂ λr ∂ λi
(i) Show that we have for each n ∈ N (by relying heavily on the fact that τ is a
trace)
∂
τ[(a∗λ aλ )n ] = −nτ[(a∗λ aλ )n−1 a∗λ ]
∂λ
and
n
∂
τ[(a∗λ aλ )n a∗λ ] = − ∑ τ[(aλ a∗λ ) j (a∗λ aλ )n− j ].
∂ λ̄ j=0
a∗ aλ
log(a∗λ aλ + ε) = log ε + log 1 + λ .
ε
In the case of a normal operator the Brown measure is just the spectral measure
τ ◦ Ea , where Ea is the projection valued spectral measure according to the spectral
theorem. In that case µa is determined by the equality of the ∗-moments of µa and
of a, i.e., by Z
zn zm dµa (z) = τ(an a∗m ) if a is normal
C
for all m, n ∈ N. If a is not normal, then this equality does not hold anymore. Only
the equality of the moments is always true, i.e., for all n ∈ N
Z Z
zn dµa (z) = τ(an ) and z̄n dµa (z) = τ(a∗n ).
C C
One should note, however, that the Brown measure of a is in general actually
determined by the ∗-moments of a. This is the case, since τ is faithful and the
272 11 Brown Measure
Brown measure depends only on τ restricted to the von Neumann algebra generated
by a; the latter is uniquely determined by the ∗-moments of a, see also Chapter 6,
Theorem 6.2.
What one can say in general about the relation between the ∗-moments of µa and
of a is the following generalized Weyl Inequality of Brown [46]. For any a ∈ M and
0 < p < ∞ we have Z
|z| p dµa (z) ≤ kak pp = τ(|a| p ).
C
This was strengthened by Haagerup and Schultz [87] in the following way: If Minv
denotes the invertible elements in M, then we actually have for all a ∈ M and every
p > 0 that Z
|z| p dµa (z) = inf kbab−1 k pp .
C b∈Minv
Note here that because of ∆ (bab−1 ) = ∆ (a) we have µbab−1 = µa for b ∈ Minv .
Exercise 3. Let (M, τ) be a tracial W ∗ -probability space and a ∈ M. Let p(z) be a
polynomial in the variable z (not involving z̄), hence p(a) ∈ M. Show that the Brown
measure of p(a) is the push-forward of the Brown measure of a, i.e., µ p(a) = p∗ (µa ),
where the push-forward p∗ (ν) of a measure ν is defined by p∗ (ν)(E) = ν(p−1 (E))
for any measurable set E.
The calculation of the Brown measure of concrete non-normal operators is usu-
ally quite hard and there are not too many situations where one has explicit solutions.
We will in the following present some of the main concrete results.
Main examples for R-diagonal operators are Haar unitaries and Voiculescu’s cir-
cular operator. With the exception of multiples of Haar unitaries, R-diagonal opera-
tors are not normal. One main characterization [139] of R-diagonal operators is the
following: a is R-diagonal if and only if a has the same ∗-distribution as up where
u is a Haar unitary, p ≥ 0, and u and p are ∗-free. If ker(a) = {0}, then this can be
11.6 Brown measure of R-diagonal operators 273
1
µa (B(0, r)) = t for r= p , (11.10)
Sa∗ a (t − 1)
where Sa∗ a is the S-transform of the operator a∗ a and B(0, r) is the open disc
with radius r.
(iv) The conditions (i), (ii), and (iii) determine µa uniquely.
(v) The spectrum of an R-diagonal operator a coincides with supp(µa ) unless a−1 ∈
L2 (M, τ)\M in which case supp(µa ) is the annulus (11.9), while the spectrum of
a is the full closed disc with radius kak2 .
We give some key ideas of the proof from [85]; for another proof see [158].
Consider λ ∈ C and put α := |λ |. A key point is to find a relation between µ|a|
and µ|a−λ | . For a probability measure σ , we denote its symmetrized version by σ̃ ,
i.e., for any measurable set E we have σ̃ (E) = (σ (E) + σ (−E))/2. Then one has
the relation
1
µ̃|a−λ | = µ̃|a| (δα + δ−α ), (11.11)
2
or in terms of the R-transforms:
274 11 Brown Measure
√
1 + 4α 2 z2 − 1
Rµ̃|a−λ | (z) = Rµ̃|a| (z) + .
2z
Hence µ|a| determines µ|a−λ | , which determines
Z Z ∞
log |λ − z|dµa (z) = log ∆ (a − λ ) = log ∆ (|a − λ |) = log(t)dµ|a−λ | (t).
C 0
√
Let us consider, as a concrete example, the circular operator c = (s1 + is2 )/ 2,
where s1 and s2 are free standard semi-circular elements. √
The distribution of c∗ c is free Poisson with rate 1, given by the density 4 − t/2πt
on [0, 4], and thus the distribution µ|c| of the absolute value |c| is the quarter-circular
√
distribution with density 4 − t 2 /π on [0, 2]. We have kck2 = 1 and kc−1 k2 = ∞,
and hence the support of the Brown measure of c is the closed unit disc, supp(µc ) =
B(0, 1). This coincides with the spectrum of c.
In order to apply Theorem 8, we need to calculate the S-transform of c∗ c. We have
Rc c (z) = 1/(1−z), and thus Sc∗ c (z) = 1/(1+z) (because z 7→ zR(z) and w 7→ wS(w)
∗
are inverses of each other; see [140, Remark 16.18] and also the discussion √ around
[140, Eq. (16.8)]). So, for 0 < t < 1, we have Sc∗ c (t − 1) = 1/t. Thus µc (B(0, t)) =
t, or, for 0 < r < 1, µc (B(0, r)) = r2 . Together with the rotation invariance this shows
that µc is the uniform measure on the unit disc B(0, 1).
The circular law is the non-self-adjoint version of Wigner’s semicircle law. Con-
sider an N × N matrix where all entries are independent and identically distributed.
If the distribution of the entries is Gaussian then this ensemble is also called Gini-
bre ensemble. It is very easy to check that the ∗-moments of the Ginibre random
matrices converge to the corresponding ∗-moments of the circular operator. So it is
quite plausible to expect that the Brown measure (i.e., the eigenvalue distribution)
of the Ginibre random matrices converges to the Brown measure of the circular op-
11.6 Brown measure of R-diagonal operators 275
erator, i.e., to the uniform distribution on the disc. This statement is known as the
circular law. However, one has to note that the above is not a proof for the circu-
lar law, because the Brown measure is not continuous with respect to our notion of
convergence in ∗-distribution. One can construct easily examples where this fails.
Exercise 5. Consider the sequence (TN )N≥2 of nilpotent N × N matrices
0 1 0 0 0
0 1 0 0
010 0 0 1 0 0
01 0 0 1 0
T2 = , T3 = 0 0 1 , T4 = ,···
, T5 =
00 0 0 0 1 0 0 0 1 0
000 0 0 0 0 1
0 0 0 0
0 0 0 0 0
Show that,
◦ with respect to tr, TN converges in ∗-moments to a Haar unitary element,
◦ the Brown measure of a Haar unitary element is the uniform distribution on the
circle of radius 1
◦ but the asymptotic eigenvalue distribution of TN is given by δ0 .
There are also canonical random matrix models for R-diagonal operators. If one
considers on (non-self-adjoint) N × N matrices a density of the form
N ∗ A))
PN (A) = const · e− 2 Tr( f (A
then one can check, under suitable assumptions on the function f , that the ∗-
distribution of the corresponding random matrix A converges to an R-diagonal op-
erator (whose concrete form is of course determined in terms of f ). So again one
expects that the eigenvalue distribution of those random matrices converges to the
Brown measure of the limit R-diagonal operator, whose form is given in Theorem
8. (In particular, this limiting eigenvalue distribution lives on an, possibly degener-
ate, annulus, i.e. a single ring, even if f has several minima.) This has been proved
recently by Guionnet, Krishnapur, and Zeitouni [82].
276 11 Brown Measure
λr2 λi2
σ (a) = λ ∈ C | + ≤ 1 ,
(1 + γ)2 (1 − γ)2
and the Brown measure µa is the measure with constant density on σ (a):
1
dµa (λ ) = 1 (λ )dλr dλi .
π(1 − γ 2 ) σ (a)
which case Z ∞
∆ (a) = exp log(t)dµ|a| (t) ∈ [0, ∞),
0
Example 10. Let c1 and c2 be two ∗-free circular elements and consider a := c1 c−1 2 .
If c1 , c2 live in the tracial W ∗ -probability space (M, τ), then a ∈ L p (M, τ) for 0 < p <
1. In this case, ∆ (a − λ ) and µa are well defined. In order to calculate µa , one has to
extend the class of R-diagonal operators and the formulas for their Brown measure
to unbounded operators. This was done in [86]. Since the product of an R-diagonal
element with a ∗ free element is R-diagonal, too, we have that a is R-diagonal. So
to use (the unbounded version of) Theorem 8 we need to calculate the S-transform
of a∗ a. Since with c2 , also its inverse c−12 is R-diagonal, we have S|a|2 = S|c1 |2 S|c−1 2.
2 |
The S-transform of the first factor is S|c1 |2 (z) = 1/(1 + z), compare Section 11.6.2.
Furthermore, the S-transforms of x and x−1 are, for positive x, in general related by
Sx (z) = 1/Sx−1 (−1 − z). Since |c−1 2 ∗ −2 and since c∗ has the same distribution
2 | = |c2 | 2
as c2 , we have that S|c−1 |2 = S|c2 |−2 and thus
2
11.9 Hermitization method 277
1 1
S|c−1 |2 (z) = S|c2 |−2 = = 1
= −z.
2 S|c2 |2 (−1 − z) 1−1−z
This gives then S|a|2 (z) = −z/(1 + z), for −1 < z < 0, or S|a|2 (t − 1) = (1 − t)/t for
p
0 < t < 1. So our main formula (11.10) from Theorem 8 gives µa (B(0, t/(1 − t))) =
t or µa (B(0, r)) = r2 /(1 + r2 ). We have kak2 = ∞ = ka−1 k2 , and thus supp(µa ) = C.
The above formula for the measure of balls gives then the density
1 1
dµa (λ ) = dλr dλi . (11.12)
π (1 + |λ |2 )2
For more details and, in particular, the proofs of the above used facts about R-
diagonal elements and the relation between Sx and Sx−1 one should see the original
paper of Haagerup and Schultz [86].
This can also be reformulated in the following form (compare [116], or Lemma 4.2
in [1]: Let us define
−1
Gε,a (λ ) := τ (λ − a)∗ (λ − a)(λ − a)∗ + ε 2 . (11.14)
278 11 Brown Measure
Then
1 ∂
µε,a = Gε,a (λ ) (11.15)
π ∂ λ̄
is a probability measure on the complex plane (whose density is given by ∇2 fε ),
which converges weakly for ε → 0 to the Brown measure of a.
In order to calculate the Brown measure we need Gε,a (λ ) as defined in (11.14).
Let now
0 a
A= ∗ ∈ M2 (M).
a 0
Note that A is self-adjoint. Consider A in the M2 (C)-valued probability space with
respect to E = id ⊗ τ : M2 (M) → M2 (C) given by
a11 a12 τ(a11 ) τ(a12 )
E = .
a21 a22 τ(a21 ) τ(a22 )
and thus we are again in the situation that our quantity of interest is actually one
entry of an operator-valued Cauchy transform: Gε,a (λ ) = g21 (ε, λ ) = [GA (Λε )]21 .
Plugging in back the 2 × 2 matrices for X and Y we get finally the self-adjoint
linearization of A as
00 0 0 x 0 00 0 0 x 0 000000
0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 y −1 0 0 0 0 0 −1 0 0 0 0 y 0 0
0 0 y 0 0 −1 0 0 0 0 0 −1 0 0 y 0 0 0 .
= +
x 0 −1 0 0 0 x 0 −1 0 0 0 0 0 0 0 0 0
0 1 0 −1 0 0 0 1 0 −1 0 0 000000
We have written this as the sum of two M6 (C)-free matrices, both of them being
selfadjoint. For calculating the Cauchy transform of this sum we can then use again
the subordination algorithm for the operator-valued free convolution from Theorem
10.5. Putting all the steps together gives an algorithm for calculating the Brown
measure of a = xy. One might note that in the case where both x and y are even
elements (i.e., all odd moments vanish), the product is actually R-diagonal, see [140,
Theorem 15.17]. Hence in this case we even have an explicit formula for the Brown
measure of xy, given by Theorem 8 and the fact that we can calculate the S-transform
of a∗ a in terms of the S-transforms of x and of y.
variables. However, as was already pointed out before (see the discussion around
Exercise 5) this is not automatic from the convergence of all ∗-moments and one
actually has to control probabilities of small eigenvalues during all the calculations.
Such controls have been achieved in the special cases of the circular law or the single
ring theorem. However, for an arbitrary polynomial in asymptotically free matrices,
this is an open problem at the moment.
In Figs. 11.1, 11.2, and 11.3, we give for some polynomials the Brown measure
calculated according to the algorithm outlined above and we also compare this with
histograms of the complex eigenvalues of the corresponding polynomials in inde-
pendent random matrices.
0.25
50
0.2
0.15 40
0.1
30
0.05
20
0
10
-0.05
4
0
2 3
2
0 1 2
0 1
-2 -1 2
-2 0 1
-4 -3 0
-1
-1
-2 -2
Fig. 11.1 Brown measure (left) of p(x, y, z) = xyz − 2yzx + zxy with x, y, z free semicircles, com-
pared to histogram (right) of the complex eigenvalues of p(X,Y, Z) for independent Wigner matri-
ces with N = 5000
250
0.8
200
0.6
0.4 150
0.2 100
0 50
-0.2 0
0
-1 4 -1 3.5
3 3
-2 -2 2.5
2 2
-3 1.5
1 -3 1
-4 0.5
0
Fig. 11.2 Brown measure (left) of p(x, y) = x + iy with x, y free Poissons of rate 1, compared to
histogram (right) of the complex eigenvalues of p(X,Y ) for independent Wishart matrices X and
Y with N = 5000
11.10 Brown measure of arbitrary polynomials in free variables 281
0.14
60
0.12
50
0.1
0.08 40
0.06
30
0.04
0.02 20
0
10
-0.02
4 0
2 3
2 2
0 1 1 2
0
-2 -1 0 1
-2 -1 0
-4 -3 -1
-2 -2
283
284 Solutions to Exercises
α3 1 α2 α13 1
− 2α1 + = (α3 − 3α1 α2 + 2α13 ), so k3 = α3 − 3α1 α2 + 2α13 .
3! 2 2! 3 3!
The terms of degree 4 are
α4 1 α22 α3 1 2 α2 1 4
− 2
+ 2α1 + 3α1 − α1
4! 2 (2!) 3! 3 2! 4
1
= α4 − 4α1 α3 − 3α22 + 12α12 α2 − 6α14 .
4!
Summarizing let us put this in a table.
k1 = α1
k2 = α2 − α12
k3 = α3 − 3α2 α1 + 2α13
k4 = α4 − 4α3 α1 − 3α23 + 12α2 α12 − 6α14 ;
α1 = k1
α2 = k2 + k12
α3 = k3 + 3k2 k1 + k13
α4 = k4 + 4k3 k1 + 3k22 + 6k2 k12 + k14 .
n − l1 n − l1 − · · · − lm−1
n n!
×···× = .
l1 l2 lm l1 !l2 ! · · · lm !
n!
.
(1!)r1 (2!)r2 · · · (n!)rn r1 ! · · · rn !
12.1 Solutions to exercises in Chapter 1 285
4. (i) Write
zn zm
log 1 + ∑ αn = ∑ βm . (12.1)
n≥1 n! m≥1 m!
n
Then by differentiating both sides and multiplying by 1 + ∑n≥1 αn zn! we have
zn−1 zm−1 zn
∑ αn (n − 1)! = ∑ βm (m − 1)! 1 + ∑ αn
n!
n≥1 m≥1 n≥1
and by reindexing
zn zm zn
∑ αn+1 n! = ∑ βm+1 m! 1 + ∑ αn
n!
.
n≥0 m≥0 n≥1
Next let us expand the right-hand side. For convenience of notation we let α0 = 1.
zm zn zm zm+n
∑ βm+1 1 + ∑ αn = ∑ βm+1 + ∑ ∑ βm+1 αn
m≥0 m! n≥1 n! m≥0 m! m≥0 n≥1 m!n!
m
N
z N z
= ∑ βm+1 + ∑ ∑ βm+1 αn
m≥0 m! N≥1 m≥0,n≥1 m N!
m+n=N
N−1
zN
N
N z
= ∑ βN+1 + ∑ ∑ βm+1 αN−m
N≥0 N! N≥1 m=0 m N!
N N
N z
= ∑ ∑ βm+1 αN−m .
N≥0 m=0 m N!
(ii) Now let us start with the equation αn = ∑π∈P (n) kπ . We shall show that this
implies that
n−1
n−1
αn = ∑ km+1 αn−m−1 . (12.3)
m=0 m
We shall adopt the following notation; given π ∈ P(n) we let V1 denote the block of
π containing 1.
n−1
∑ kπ = ∑ ∑ kπ
π∈P (n) m=0 π∈P (n)
|V1 |=m+1
286 Solutions to Exercises
n−1
n−1
= ∑ km+1 ∑ kσ
m=0 m σ ∈P (n−m−1)
n−1
n−1
= ∑ km+1 αn−m−1 .
m=0 m
exp(−hBOs, Osi/2)
Z
= si1 · · · sik ds
Rn (2π)n/2 det(B)−1/2
exp(−hDs, si/2)
Z
= si1 · · · sik ds.
Rn (2π)n/2 det(D)−1/2
Thus {Y1 , . . . ,Yn } are independent and Gaussian. Hence E(YiY j ) = (D−1 )i j . Thus
n n
ci j = E(Xi X j ) = ∑ oik o jl E(YkYl ) = ∑ oik o jl (D−1 )kl
k,l=1 k,l=1
n
= ∑ oik (D−1 )kl ol j = (OD−1 O−1 )i j = (B−1 )i j .
k,l=1
6. (i) We have 1
2 0 20
C= 1 , so B= .
0 2 02
So the first claim follows from the formula for the density, and the second from the
usual conversion to polar coordinates.
(ii) Note that the integral in polar coordinates factors as an integral over θ and
one over r. Thus for any θ
Z
2 2
(t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2
R2
Z
2 2
= eiθ (m−n) (t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2 .
R2
Hence
Z
n 2 2
E(Z m Z ) = (t1 + it2 )m (t1 − it2 )n e−(t1 +t2 ) dt1 dt2 = 0 for m 6= n.
R2
Furthermore, we have
Z 2πZ ∞
1 1
Z
2 2 2
E(|Z|2n ) = (t12 + t22 )n e−(t1 +t2 ) dt1 dt2 = r2n e−r r dr dθ
π R2 π 0 0
Z ∞ Z ∞
2 2
= r2n d(−e−r ) = n r2(n−1) d(−e−r ) = · · · = n!.
0 0
7. We have seen that E(Zi1 · · · Zin Z j1 · · · Z jn ) is the number of pairings π of [2n] such
that for each pair (r, s) of π (with r < s) we have that r ≤ n and n + 1 ≤ s ≤ 2n and
ir = js−n . For such a π let σ be the permutation with σ (r) = s − n; we then have
i = j ◦ σ . Conversely let σ be a permutation of [n] with i = j ◦ σ . Let π be the pairing
with pairs (r, n + σ (r)); then ir = js−n for s = n + σ (r).
288 Solutions to Exercises
= N Tr(X 2 ).
2 2 −N
Thus exp(−hBX, Xi/2) = exp(−NTr(X 2 )/2). Next det(B) = N N 2N . Thus
N N 2 /2 1 N/2
c= .
π 2
∞ ∞ ∞ n
1
exp ∑ βn zn = ∑ n! ∑ βl zl
n=1 n=0 l=1
∞
1 ∞ ∞
= 1+ ∑ ∑ · · · ∑ βl1 · · · βln zl1 +···+ln
n=1 n! l1 =1 ln =1
∞ n βl1 · · · βlm n
= 1+ ∑ ∑ ∑ z .
n=1 m=1 l ,...,lm ≥1 m!
1
l1 +···+lm =n
(ii) We continue from the solution to (i). First we shall work with the sum
12.1 Solutions to exercises in Chapter 1 289
n βl1 · · · βlm
S= ∑ ∑ .
m=1 l1 ,...,lm ≥1 m!
l1 +···+lm =n
We are summing over all tuples l = (l1 , . . . , lm ) of positive integers such that
l1 + · · · + lm = n, i.e. over all compositions of the integer n. By the type of the
composition l = (l1 , . . . , ln ) we mean the n-tuple r = (r1 , . . . , rn ) where the ri ’s
are integers, ri ≥ 0, and ri is the number of l j ’s that equal i. We must have
1 · r1 + 2 · r2 + · · · + n · rn = n and m = r1 + · · · + rn is the number of parts of
l = (l1 , . . . , lm ). Note that βl1 · · · βlm = β1r1 · · · βnrn depends only on the type of
l = (l1 , . . . , lm ). Hence we can group the compositions by their type and thus S be-
comes
β1r1 · · · βnrn
S= ∑ × no. compositions of n of type (r1 , . . . , rn ).
1r1 +···+nrn =n (r1 + · · · + rn )!
(r1 + · · · + rn )!
r1 !r2 ! · · · rn !
thus
β1r1 · · · βnrn
S= ∑ .
1r1 +···+nrn =n r1 !r2 ! · · · rn !
Hence
∞
n
∞ β1r1 · · · βnrn n
exp ∑ n = 1+ ∑
β z ∑ r1 !r2 ! · · · rn !
z .
n=1 n=1 r1 ,...,rn ≥0
1r1 +···+nrn =n
kn
By replacing βn by n! we obtain the equation
∞
n! n ∞ zn
r1 rn z
∑ ∑ k · · · kn = exp ∑ k n .
n=0 r1 ,...,rn ≥0 (1!)r1 · · · (n!)rn r1 !r2 ! · · · rn ! 1 n! n=1 n!
1·r1 +···+n·rn =n
counts the number of partitions of the set [n] of type (r1 , . . . , rn ). If π = {V1 , . . . ,Vm }
is a partition of [n] we let βπ = β|V1 | β|V2 | · · · β|Vm | where |Vi | is the number of elements
in the block Vi . If the type of the partition π is (r1 , . . . , rn ) then β1r1 β2r2 · · · βnrn = βπ .
Thus we can write
∞ β ∞ zn
n
exp ∑ zn = 1 + ∑ ∑ π n! .
β
n=1 n! n=1 π∈P (n)
∞ ∞ ∞ n
1
− log(1 − ∑ βn zn ) = ∑ n ∑ βl zl
n=1 n=1 l=1
∞ ∞ ∞
1
= ∑ n ∑ · · · ∑ βl1 · · · βln zl1 +···+ln
n=1 l1 =1 ln =1
∞ m
1
= ∑ ∑ ∑ βl1 · · · βln zm
m=1 n=1 n l 1 ,...,ln ≥1
l1 +···+ln =m
∞ n
1
= ∑∑ ∑ βl1 · · · βlm zn .
m
n=1 m=1 l 1 ,...,lm ≥1
l1 +···+lm =n
As with the exponential, this is a sum over all compositions of the integer n so we
group the terms according to their type, as was done in the solution to Exercise 10 .
β1r1 · · · βnrn
S= ∑ × no. compositions of n of type (r1 , . . . , rn )
1r1 +···+nrn =n r1 + · · · + rn
12.1 Solutions to exercises in Chapter 1 291
(r1 + · · · + rn − 1)!
= ∑ β1r1 · · · βnrn .
1r1 +···+nrn =n r1 ! · · · rn !
so each of the sequences {αn }n and {kn }n determines the other. Thus we may write
the result of (i) as
∞
zn ∞
zn
∑ kn n! = log 1 + ∑ αn n! .
n=1 n=1
On the other hand replacing the sequence {βn }n by {kn }n in Exercise 11 we have
292 Solutions to Exercises
∞
zn ∞ zn ∞ zn
1 + ∑ αn = exp ∑ kn = 1+ ∑ ∑ π n!k
n=1 n! n=1 n! n=1 π∈P (n)
αn = ∑ kπ .
π∈P (n)
14. Since ν has moments of all orders, ϕ, the characteristic function of ν, has
derivatives of all orders. Fix n > 0. We may write
n
sr
ϕ(t) = 1 + ∑ αr + o(sn )
r=1 r!
Now for l ≥ 1
n n
sr l sr l
∑ αr r! + o(sn ) = ∑ αr r! + o(sn ).
r=1 r=1
Thus
n
(−1)l+1 n sr l
log(ϕ(t)) = ∑ ∑ αr + o(sn )
l=1 l r=1 r!
and hence
n n
sl (−1)l+1 n sr l
∑ kl l! + o(sn ) = ∑ l ∑ αr
r!
+ o(sn ).
l=1 l=1 r=1
By Exercise 12 we have
and
αn = ∑ kπ .
π∈P (n)
8. (i) This follows from applying a cyclic rotation to the moment-cumulant formula
and observing that non-crossing partitions are mapped to non-crossing partitions
under rotations.
12.2 Solutions to exercises in Chapter 2 293
(ii) This is not true, since the property non-crossing is not preserved under arbi-
trary permutations. For example, in the calculation of κ4 (a1 , a2 , a3 , a4 ) the cross-
ing term ϕ(a1 a3 )ϕ(a2 a4 ) does not show up. However, in κ4 (a1 , a3 , a2 , a4 ) this
term becomes non-crossing and will make a contribution. Hence κ4 (a1 , a2 , a3 , a4 ) 6=
κ4 (a1 , a3 , a2 , a4 ) in general, even if all ai commute.
9. For the semi-circle law we have th
th 1 2k
that all odd moments are 0 and the 2k moment
is the k Catalan number k+1 k which is also the cardinality of NC2 (2k), the non-
crossing pairings of [2k]. Since α1 = 0 we have κ1 = 0; and since α2 = κ12 + κ2 we
have κ2 = α2 = 1. Now let NC∗ (n) be the set of non-crossing partitions which are
not pairings. For n = 2k we have
αn = ∑ κπ = ∑ κπ + ∑ κπ = αn + ∑ κπ .
π∈NC(n) π∈NC2 (n) π∈NC∗ (n) π∈NC∗ (n)
Thus for n even ∑π∈NC∗ (n) κπ = 0; and also for n odd because there are no pairings
of [n]. When n = 3, this forces κ3 = 0. Then for general n we write
0= ∑ κπ = κn + ∑ κπ ,
π∈NC∗ (n) π∈NC∗∗ (n)
where NC∗∗ (n) is all the partitions in NC∗ (n) with more than one block. By induc-
tion ∑π∈NC∗∗ (n) κπ = 0; so κn = 0 for n ≥ 3.
11. (iv) We have
∑ c#(π) = αn = ∑ κπ . (12.5)
π∈NC(n) π∈NC(n)
where NC∗∗ (n) is all non-crossing partitions of [n] with more than one block. Thus
(12.5) shows that κn = c.
14. We have
ωa (z) + ωb (z) = 2z − Ra (Ga+b (z)) + Rb (Ga+b (z))
= 2z − Ra+b (Ga+b (z))
= 2z − (z − 1/Ga+b (z))
= z + 1/Ga+b (z)
= z + 1/Ga (ωa (z)).
h−1i
15. By inverting the first equation in (2.32) we have ωa (Gh−1i (z)) = Ga (z) and
h−1i
ωb (Gh−1i (z)) = Gb (z) By the second equation in (2.32) we have
294 Solutions to Exercises
so
ϕ(a0 ϕ̃π (a1 , a2 , . . . , an )) = ϕ ∏ ai1 · · · ϕ ∏ ais−1 ϕ a0 ∏ ais .
i1 ∈V1 is−1 ∈Vs−1 is ∈Vs
crossing partition of [2n0 ]. Thus for π ∈ NC(n) and τ ∈ NC(n̄0 ) we have that π ∪ τ
is a non-crossing partition of [2n0 ] if and only if τ ≤ K(π)0 . Thus
ϕ(x0 y1 x1 · · · yn xn ) = ∑ κσ (x0 , y1 , x1 , . . . , yn , xn )
σ ∈NC(2n0 )
= ∑ κπ (y1 , . . . , yn ) ∑ κτ (x0 , x1 , . . . , xn )
π∈NC(n) τ∈NC(n̄0 )
π∪τ∈NC(2n0 )
= ∑ κπ (y1 , . . . , yn ) ∑ κτ (x0 , x1 , . . . , xn )
π∈NC(n) τ∈NC(n̄0 )
τ≤K(π)0
Since f is a rational function such that limw→∞ w f (w) = 0 and by the residue theo-
rem we have
Z ∞ Z
G(z) = f (t) dt = lim f (w) dw = 2πi(Res( f , z) + Res( f , i)),
−∞ R→∞ CR
where CR is the closed curve formed by joining part of the circle |w| = R in C+ to
the interval [−R, R].
296 Solutions to Exercises
−1 1
Res( f , z) = and Res( f , i) = .
π(z − i)(z + i) 2πi(z − i)
5. (iii) Use the same idea as in Exercise 3.4 (iii) to identify the roots inside Γ .
9. The density is given by
1 −b
dν(t) = dt.
π b2 + (t − a)2
12.3 Solutions to exercises in Chapter 3 297
11. Let 0 < α1 < α2 and β2 > 0 be given, we must find β1 > 0 so that f (Γα1 ,β1 ) ⊂
Γα2 ,β2 . Choose ε > 0 so that
q
1 + α22 1+ε
q > q .
1 + α12 1 − ε 1 + α12
Choose β1 > 0 so that for z ∈ Γα1 ,β1 we have | f (z) − z| < ε|z|. Then
z t
Z Z
zG(z) = dν(t) so zG(z) − 1 = dν(t).
R z − t R z − t
Thus we can apply the result from (iii).
13. By Exercise 12 we have, for Im(z) ≥ 1,
1 + tz 1 |t| p
≤ + ≤ 2 1 + α 2.
z(t − z) |t − z| |t − z|
Write
F(z) a 1 + tz
Z
= +b+ dσ (t).
z z z(t − z)
For a fixed t we have
1 + tz t + z−1
= −→ 0
z(t − z) t −z
as z → ∞. Since |(1 + tz)/(z(t − z))| is bounded independently of t and z then we can
apply the dominated convergence theorem to conclude that F(z)/z → b as z → ∞ in
Γα .
14. (i) By assumption the function t 7→ |t|n is integrable with respect to ν. By
Exercise 12 we have for z ∈ Γα
|t|n+1 p
≤ |t|n 1 + α 2 .
|z − t|
t n+1
Z
lim dν(t) = 0.
z→∞ z−t
(ii) We have
1 1 1 t tn
Z
α1 αn
G(z) − + + · · · + = − + 2 + · · · + n+1 dν(t)
z z2 zn+1 R z−t z z z
Z n+1
1 t
= n+1 dν(t).
z R z−t
Thus Z n+1
1 α αn t
n+1 1
z G(z) − + + · · · + n+1 = dν(t)
z z2 z R z−t
and this integral converges to 0 as z → ∞ in Γα by (i).
15. We shall proceed by induction on n. To begin the induction process let us show
that α1 and α2 are, respectively, the first and second moments of ν. Note that for
any 1 ≤ k ≤ 2n we have that as z → ∞ in Γα
1 α αk
1
lim zk+1 G(z) − + 2 + · · · + k+1 = 0.
z→∞ z z z
12.3 Solutions to exercises in Chapter 3 299
R
Also by Exercise 12, R |t/(z − t)| dν(t) < ∞, so we may let
1 t
Z
G1 (z) = z G(z) − = dν(t).
z R z − t
Then since n is a least 1 we have
α2 1 α α2
1
lim z zG1 (z) − α1 − = lim z3 G(z) − + 2 + 3 = 0.
z→∞ z z→∞ z z z
Hence
lim z zG1 (z) − α1 = α2 .
z→∞
Thus
t2
Z
lim 2
dν(t) = α2 ,
y→∞ R 1 + (t/y)
Now |t/(1 + (t/y)2 )| ≤ |t| and R |t| dν(t) < ∞ so by the dominated convergence
R
R
theorem α1 = R t dν(t).
Suppose that we have shown that ν has moments up to order 2n − 2 and αk , for
1 ≤ k ≤ 2n − 2, is the kth moment. Thus R |t 2n−1 /(z − t)| dν(t) < ∞ by Exercise 12
R
t 2n
Z
= lim dν(t).
y→∞ R 1 + (t/y)2
thus ν has a moment of order 2n and this moment is α2n . Thus ν has a moment of
order 2n − 1 and from Equation (12.6) we have limz→∞ zG2n−1 (z) = α2n−1 . Then by
letting z = iy and taking real parts we obtain that
Z iyt 2n−1
α2n−1 = lim Re iyG2n−1 (iy) = lim Re dν(t)
y→∞ y→∞ R iy − t
t 2n−1
Z
= lim dν(t)
y→∞ R 1 + (t/y)2
2n−1 dν(t).
R
Thus by the dominated convergence theorem α2n−1 = Rt This com-
pletes the induction step.
16. Let us write
1 α1 α2 α3 α4
G(z) = + + 3 + 4 + 5 + r(z)
z z2 z z z
where r(z) = o( z15 ). Then
1
α1
z + αz22 + αz33 + αz44 + zr(z)
z− = 1
.
G(z) z + αz21 + αz32 + αz43 + αz54 + r(z)
and solve for β0 , β1 , β2 , and q(z). After cross multiplication we find that
α2 = α12 + β0 α3 = α1 α2 + β0 α1 + β1 α4 = α1 α3 + α2 β0 + α1 β1 + β2 .
Thus
to show that every sequence in θ h−1i (K) contains a convergent subsequence. Let
{(z1,n , z2,n )}n be a sequence in θ h−1i (K). Then
So there is a subsequence {(z1,nk , z2,nk )}k such that both {z1,nk }k converges to z1 ,
say and {z2,nk }k converges to z2 , say. Then
so (z1 , z2 ) ∈ X. Also θ (z1 , z2 ) = limk θ (z1,nk , z2,nk ) ∈ K. Hence (z1 , z2 ) ∈ θ h−1i (K)
as required.
5. The commutativity of Jk and Jl is a special case of the fact that Jl commutes with
C[Sl−1 ]. For the latter note that for k < l and σ ∈ Sl−1
Thus we have
as
∑ (−N)−l ∑ J1k1 J2k2 · · · Jnkn
l≥0 k1 ,...,kn ≥0
k1 +···+kn =l
and observe that J1k1 · · · Jnkn is a linear combination of permutations of length at most
k1 + · · · + kn .
9. Recall that γ = γm . Given i : [2m] → [n] such that ker(i) ≥ π, let j : [m] → [n]
be defined by j(γ −1 (k)) = i(2k − 1) and j(σ (k)) = i(2k). To show that such a j is
well defined we must show that when σ (k) = γ −1 (l) we have i(2k) = i(2l − 1). If
σ (k) = γ −1 (l) we have that (k, γ −1 (l)) is a pair of σ , and thus (2l − 1, 2k) is a pair
of π. Since we have assumed that ker(i) ≥ π we have i(2l − 1) = i(2k) as required.
Conversely if we have j : [m] → [n], let i(2k − 1) = j(γ −1 (k)) and i(2k) = j(σ (k)).
12.5 Solutions to exercises in Chapter 5 303
12. The first part is just the expansion of the product of matrices. Now let us
write xα(l) = xil i−l and xβ (l) = xiγ(l) ,i−l , where γ is the permutation with one cycle
(1, 2, 3, . . . , k). With this notation we have by Exercise 1.7
E(xi1 i−1 xi2 i−1 · · · xik i−k xi1 i−k ) = E(xα(1) · · · xα(k) xβ (1) · · · xβ (k) )
= |{σ ∈ Sk | α = β ◦ σ }|.
If α = β ◦ σ then il = iγ(σ (l)) and i−l = i−σ (l) for 1 ≤ l ≤ k. Thus for a fixed σ there
are N #(γσ ) ways to choose the k-tuple (i1 , . . . , ik ) so that il = iγ(σ (l)) and M #(σ ) ways
of choosing the k-tuple (i−1 , . . . , i−k ) so that i−l = i−σ (l) . Hence
−1 γ)−k
M #(σ )
E(Tr(Ak )) = ∑ N #(γσ )−k M #(σ ) = ∑ N #(σ )+#(σ .
σ ∈Sk σ ∈Sk N
Thus M #(σ )
−1 γ)−(k+1)
E(tr(Ak )) = ∑ N #(σ )+#(σ
σ ∈Sk N
and by Proposition 1.5 the only σ ’s for which the exponent of N is not negative are
those σ ’s which are non-crossing partitions. Thus lim E(tr(Ak )) = ∑σ ∈NC(k) c#(σ ) .
and the order on this through block as well as the order on all other blocks is then
determined. So we have mn choices, each giving a different permutation.
10. (i) Calculate the free cumulants with the help of the product formula (2.19)
and observe that in both cases there is for each n exactly one contributing pairing in
(2.19); thus κn (s2 , . . . , s2 ) = 1 = κn (cc∗ , . . . , cc∗ ).
(ii) In Example 5.33 (and in Example 5.36) it was shown that κ1,1 (s2 , s2 ) = 1.
(iii) Use the second order version (5.16) of the product formula to see that all
second order moments of cc∗ are zero. It is instructive to do this for the case
κ1,1 (cc∗ , cc∗ ) = 0 and compare this with the calculation of κ1,1 (s2 , s2 ) = 1 in Ex-
ample 5.36. In both cases we have the term corresponding to π1 , whereas it makes
the contribution κ2 (s, s)κ2 (s, s) = 1 in the first case, in the second case its contribu-
tion is κ2 (c, c)κ2 (c∗ , c∗ ) = 0.
−1 −1 −1
with hr ghr = hs ghs then g = h p gh p with p = s − r. Let us consider the reduced
−ε1
form of h p gh−1 p . The p copies of ai1 on the right of h p gh−1
p cannot cancel off
aik+1 because i1 = ik 6= ik+1 . Thus in reduced form h p gh−1
εk+1
p starts with the letter aεi11
repeated p + k times. However in reduced form g starts with the letter ai1 repeated
k times. Hence, in reduced form, the words in {hm gh−1 m }m are distinct, and thus the
conjugacy class of g is infinite.
4. We shall just compute tr ⊗ ϕ(xn ) directly using the moment-cumulant formula.
For this calculation we will need to rewrite
12.6 Solutions to exercises in Chapter 6 305
1 s1 c 1 a11 a12
x= √ ∗ as √ .
2 c s2 2 a21 a22
Then
2
1
tr ⊗ ϕ(xn ) = ϕ(Tr(xn )) = 2−(1+n/2) ∑ ϕ(ai1 i2 · · · ain i1 ).
2 i1 ,...,in =1
Now given i1 , . . . , in
because (i) all mixed cumulants vanish so each block of π must consist either of all
a11 ’s or all a22 ’s or a mixture of a21 and a12 , (ii) the only non-zero cumulant of aii is
κ2 so any blocks that contain aii must be κ2 , and (iii) the only non-zero ∗-cumulants
of ai j (for i 6= j) are κ2 (ai j , a∗i j ) and κ2 (a∗i j , ai j ). Thus we have a sum over pairings.
Moreover if π ∈ NC2 (n) is a pairing and if (r, s) is a pair of π then κπ (ai1 i2 , . . . , ain i1 )
will be 0 unless air ir+1 = (ais is+1 )∗ i.e. ir = is+1 and is = ir+1 . For such a π the con-
tribution is 1 since s1 , s2 , and c all have variance 1. Hence, letting γ = (1, 2, 3, . . . , n)
as in Chapter 1 we have ϕ(ai1 i2 · · · ain i1 ) = |{π ∈ NC2 (n) | i = i ◦ γ ◦ π}|. Thus
= 2−(1+n/2) ∑ 2#(γπ) .
π∈NC2 (n)
Now recall from Chapter 1 that for any pairing (interpreted as a permutation in
Sn )
#(π) + #(γπ) + #(γ) = n + 2(1 − g)
and π ∈ NC2 (n) if and only if g = 0. Thus for any π ∈ NC2 (n)
Now expand eλ x into a power series; for λ ≥ 0 all the terms are positive. Hence
R∞ n
0 x dµ(x) < ∞ for all n. Likewise if for some λ < 0 we have E(eλ X ) < ∞ then for
R0 n
all n, −∞ x dµ(x) < ∞. Hence if E(eλ X ) < ∞ for all |λ | ≤ λ0 then X has moments
of all orders and E(eλ0 |X| ) < ∞. Thus by the dominated convergence theorem λ 7→
E(eλ X ) has a convergent power series expansion in λ with a radius of convergence
of at least λ0 . In fact the proof shows that if there are λ1 < 0 and λ2 > 0 with
E(eλ1 X ) < ∞ and E(eλ2 X ) < ∞ then for all λ1 ≤ λ ≤ λ2 we have E(eλ X ) < ∞ and
we may choose λ0 = min{−λ1 , λ2 }.
3. (i) We have
5. We have learned this statement and its proof from an unpublished manuscript of
Uffe Haagerup.
(i) By using the Taylor series expansion
∞
zn
log(1 − z) = − ∑ ,
n=1 n
which converges for every complex number z 6= 1 with |z| ≤ 1, we derive an expan-
sion for log |s − t|, by substituting s = 2 cos u and t = 2 cos v:
Then one has to show (which is not trivial) that the convergence is strong enough
to allow term-by-term integration.
(ii) For this one has to show that
Z +2 2,
n=0
Cn (t)dµW (t) = −1, n = 2 .
−2
0, otherwise
6. (i) Let us first see that the mapping T ⊗ IN : (MNsa )n → (MNsa )n transports mi-
crostates for (x1 , . . . , xn ) into microstates for (y1 , . . . , yn ). Namely, let A = (A1 , . . . , An ) ∈
Γ (x1 , . . . , xn ; N, r, ε) be a microstate for (x1 , . . . , xn ) and consider B = (B1 , . . . , Bn ) :=
(T ⊗ IN )A, i.e., Bi = ∑nj=1 ti j A j . Then we have for each k ≤ r:
≤ (cn)r ε,
In order to get the reverse inequality, we do the same argument for the inverse
map, (x1 , . . . , xn ) = T −1 (y1 , . . . , yn ), which gives
(ii) If (x1 , . . . , xn ) are linear dependent there are (α1 , . . . , αn ) ∈ Cn \ {0} such that
0 = α1 x1 + · · · + αn xn . Since the xi are selfadjoint, the αi can be chosen real. Without
restriction we can assume that α1 6= 0.
Now consider T = In + β T 0 , where T 0 = (ti j )ni, j=1 with ti j = δ1i α j . Then T is
invertible for any β 6= −α1−1 and det T = 1 + α1 β .
On the other hand we also have T (x1 , . . . , xn ) = (x1 , . . . , xn ). Hence, by (i),
2. We have
k
∂ j (Xi1 · · · Xik )(X j ⊗ 1) = ∑ δ j,il Xi1 · · · Xil ⊗ Xil+1 · · · Xik
l=1
where we have adopted the convention that we have Xi1 · · · Xil ⊗ Xil+1 · · · Xik =
Xi1 · · · Xik ⊗ 1 when l = k. Similarly
k
(1 ⊗ X j )∂ j (Xi1 · · · Xik ) = ∑ δ j,il Xi1 · · · Xil−1 ⊗ Xil · · · Xik ,
l=1
and we have adopted the convention that Xi1 · · · Xil−1 ⊗ Xil · · · Xik = 1 ⊗ Xi1 · · · Xik
when l = 1. Thus
k k
∂i p = ∑ δi,il xi1 · · · xil−1 ⊗ xil+1 · · · xik , ∂i p ∗ = ∑ δi,il xik · · · xil+1 ⊗ xil−1 · · · xi1 .
l=1 l=1
Thus
k
hξi , pi = h∂i∗ (1 ⊗ 1), pi = h1 ⊗ 1, ∂i pi = ∑ δi,il τ(xi1 · · · xil−1 )τ(xil+1 · · · xik )
l=1
and
k
hξi , p∗ i = h∂i∗ (1 ⊗ 1), p∗ i = h1 ⊗ 1, ∂i p∗ i ∑ δi,il τ(xik · · · xil+1 )τ(xil−1 · · · xi1 ).
l=1
k
(∂i p∗ )∗ = ∑ δi,il xil+1 · · · xik ⊗ xi1 · · · xil−1 .
l=1
(iii) First we note that for r ∈ Chx1 , . . . , xn i we have by the Leibniz rule
The first term becomes h(id ⊗τ)(∂i p)·q, ri, the middle term becomes h∂i∗ (p⊗q), ri,
and the last term becomes hp · (τ ⊗ id)(∂i q), ri.
(iv) We write p = xi1 · · · xik and q = x j1 · · · x jn . Then using the expansion in (i) we
have
Next
h1 ⊗ ξi , ∂i (p∗ ) · 1 ⊗ qi
k n
= ∑ ∑ δi,il δi, jm τ[τ(x jn · · · x jm+1 )x jm−1 · · · x j1 xi1 · · · xil−1 τ(xil+1 · · · xik )]
l=1 m=1
k l−1
+∑ ∑ δi,il δi,ir τ[x jn · · · x j1 xi1 · · · xir−1 τ(xir+1 · · · xil−1 )τ(xil+1 · · · xik )]
l=1 r=1
310 Solutions to Exercises
and
hξi ⊗ 1, ∂i (p∗ ) · 1 ⊗ qi
k k
= ∑ ∑ δi,il δi,ir τ[x jn · · · x j1 xi1 · · · xil−1 τ(xil+1 · · · xir−1 )τ(xir+1 · · · xik )].
l=1 r=l+1
(v) Check that for p, r ∈ Chx1 , . . . , xn i we have h(id ⊗ τ)(∂i r), pi = hr, ∂i∗ (p ⊗ 1)i.
This shows that Chx1 , . . . , xn i is in the domain of the adjoint of (id ⊗ τ) ◦ ∂i , hence
this adjoint has a dense domain and thus (id ⊗ τ) ◦ ∂i is itself closable.
5. (i) Since we assumed that we have p ∈ L3 (R) we have that hε and H(p) are in
L3 (R) and khε − H(p)k3 → 0. Thus by Hölder’s inequality
Z
|hε (s) − H(p)(s)|2 p(s) ds ≤ khε − H(p)k23 kpk3 .
as ε → 0+ . Thus
Z Z
2π f (s)hε (s)p(s) ds → 2π f (s)H(p)(s)p(s) ds = 2π τ( f (x)H(p)(x)) = τ( f (x)ξ ).
(s − t) f (s)
ZZ
= 2 p(s)p(t) ds dt
(s − t)2 + ε 2
s−t
Z Z
= 2 f (s)p(s) p(t) dt ds
(s − t)2 + ε 2
Z
= 2π f (s)p(s)hε (s) ds
→ τ( f (x)ξ ) for ε → 0.
Z R
Z π
G(x + iε)3 dx = −i G(iε + Reiθ )3 Reiθ dθ .
−R 0
Hence
Z R Z π π
G(x + iε)3 dx = G(iε + Reiθ )3 Reiθ dθ ≤ →0 as R → ∞.
(R − c)3
−R 0
Thus G(x + iε)3 dx = 0. By taking the imaginary part of this equality we get that
R
Z Z
hε (s)2 p(s) ds = 3 p(s)3 ds.
contributing partitions in the moment-cumulant formula for τ(ξi xi(1) · · · xi(m) ) are of
the form π = {(1, k)} ∪ σ1 ∪ σ2 , where σ1 is a non-crossing partition of [1, k − 1]
and σ2 is a non-crossing partition of [k + 1, m]. Then we have
τ(ξi xi(1) · · · xi(m) ) = ∑ κ2 (ξi , xi(k) )κσ1 (xi(1) , . . . , xi(k−1) )κσ2 (xi(k+1) , . . . , xi(m) )
(1,k)∪σ1 ∪σ2
! !
= ∑ κ2 (ξi , xi(k) ) ∑ κσ1 (xi(1) , . . . , xi(k−1) ) ∑ κσ2 (xi(k+1) , . . . , xi(m) )
k σ1 σ2
Let us now show that (i) implies (ii). We do this by induction on m. It is clear that
the conjugate relations (8.12) for m = 0 and m = 1 are equivalent to the cumulant
relations for m = 0 and m = 1 from (ii). So remains to consider the cases m ≥ 3.
Assume (i) and that we have already shown the conditions (ii) up to m − 1. We have
to show it for m. By our induction hypothesis we know that in
the cumulants involving ξi are either of length 2 or the maximal one, κm+1 (ξi , xi(1) ,
. . . , xi(m) ); hence
τ(ξi xi(1) · · · xi(m) ) = ∑ κπ (ξi , xi(1) , . . . , xi(m) ) + κm+1 (ξi , xi(1) , . . . , xi(m) )
π=(1,k)∪σ1 ∪σ2
Since the first sum gives by our assumption (i) the value τ(ξi xi(1) · · · xi(m) ) it follows
that κm+1 (ξi , xi(1) , . . . , xi(m) ) = 0.
7. By Theorem 8.20 we have to show that κ1 (ξ ) = 0, κ2 (ξ , x1 + x2 ) = 1 and
κm+1 (ξ , x1 + x2 , . . . , x1 + x2 ) = 0 for all m ≥ 2. However, this follows directly from
the facts that ξ is conjugate variable for x1 (hence we have κ1 (ξ ) = 0, κ2 (ξ , x1 ) = 1
and κm+1 (ξ , x1 , . . . , x1 ) = 0 for all m ≥ 2) and that mixed cumulants in {x1 , ξ } and
x2 vanish; for this note that ξ as a conjugate variable is in L2 (x1 ) and the vanishing
of mixed cumulants in free variables goes also over to a situation, where one of the
variables is in L2 .
8. By Theorem 8.20, the condition that for a conjugate system we have ξi =
xi is equivalent to the cumulant conditions: κ1 (xi ) = 0, κ2 (xi , xi(1) ) = δii(1) , and
κm+1 (xi , xi(1) , . . . , xi(m) ) = 0 for m ≥ 2 and all 1 ≤ i, i(1), . . . , i(m) ≤ n. But these are
just the cumulants of a free semi-circular family.
9. Note that in the special case where i 6∈ {i(1), . . . , i(k − 1), i(k + 1), . . . i(m)} we
have
12.8 Solutions to exercises in Chapter 8 313
This follows by noticing that in this case in the formula (8.6) for the action of ∂i∗
only the first term is different from zero and gives, by also using ∂i∗ (1 ⊗ 1) = si ,
exactly the above result.
Thus we get in the case where all i(1), . . . , i(m) are different
n n m
∑ ∂i∗ ∂i si(1) · · · si(m) = ∑ ∑ δii(k) ∂i∗ si(1) · · · si(k−1) ⊗ si(k+1) · · · si(m)
i=1 i=1 k=1
m
∗
= ∑∂i(k) si(1) · · · si(k−1) ⊗ si(k+1) · · · si(m)
k=1
m
= ∑ si(1) · · · si(k−1) si(k) si(k+1) · · · si(m)
k=1
= msi(1) · · · si(k−1) si(k) si(k+1) · · · si(m) .
But the first two sums cancel and thus we remain with exactly the same as in
m−1 m−1
τ ⊗ τ(∂Um (x)) = ∑ τ(Uk )τ(Um−k−1 ) = ∑ αk+1 αm−k .
k=0 k=0
For the relevance of this in the context of Schwinger-Dyson equations, see [131].
314 Solutions to Exercises
2. We have
Note that the assumption implies that also all κπB (xd1 , . . . , xdn−1 , x) for π ∈ NC(n)
are in D. Applying ED to the equation above gives thus
then we get the equality of the B-valued and the D-valued cumulants by induction.
1
Im(b11 ) > 0 and Im(b11 )Im(b22 ) − |b12 − b21 |2 > 0.
4
(ii) Assume that λ ∈ C is an eigenvalue of B ∈ H+ (Mn (C)). We want to show
that Im(λ ) > 0. Let η ∈ Cn with kηk = 1 be a corresponding eigenvector of B, i.e.
Bη = λ η. Since Im(B) is positive definite, it follows
12.11 Solutions to exercises in Chapter 11 315
1 1
hBη, ηi − hB∗ η, ηi =
0 < hIm(B)η, ηi = hBη, ηi − hη, Bηi = Im(λ ),
2i 2i
as desired.
The converse is not true as shown by the following counterexample for n = 2.
Take a matrix of the form
λ1 ρ
B=
0 λ2
with Im(λ1 ) > 0, Im(λ2 ) > 0 and some ρ ∈ C. B satisfies the condition that
all its eigenvalues belong to the upper half-plane C+ . However, if in addition
|ρ| ≥ 2 Im(λ1 )Im(λ2 ) holds, it cannot belong to H+ (M2 (C)), since the second
p
characterizing condition of H+ (M2 (C)), Im(b11 )Im(b22 ) > |b12 − b21 |2 /4, is vio-
lated.
1. We shall show that while ∇2 log |z| = 0 as a function, ∇2 log |z| = 2πδ0 as a
distribution, where δ0 is the distribution which evaluates a test function at (0, 0).
1
In other words, G(z, w) = 2π log |z − w| is the Green function of the the Laplacian
on R2 . To see what this means first note that by writing log |z| dxdy = r log r drdθ ,
where (r, θ ) are polar coordinates, we see that log |z| is a locally integrable function
on R2 . Thus it determines (see Rudin [152, Ch. 6]) a distribution
ZZ p
f 7→ f (x, y) log x2 + y2 dxdy
R2
∂ f ∂g ∂ 2g ∂ f ∂ g ∂ 2g
∇ · f ∇g = +f 2+ + f 2 = f ∇2 g + ∇ f · ∇g
∂x ∂x ∂x ∂y ∂y ∂y
so that
f ∇2 g − g∇2 f = ∇ · ( f ∇g − g∇ f ).
316 Solutions to Exercises
p
(ii) Let g(x, y) = log x2 + y2 and f be a test function. Choose R large enough
so that supp( f ) ⊂ DR . We show that for all 0 < r < R
ZZ p Z 1 ∂f
∇2 f (x, y) log x2 + y2 dxdy = f − log r ds.
Dr,R ∂ Dr r ∂r
1 ∂f
ZZ p Z Z
∇2 f (x, y) log x2 + y2 dxdy = f ds − log r ds.
Dr,R r ∂ Dr ∂ Dr ∂ r
Now as r → 0, 02π ∂f
converges to 2π ∂∂ rf (0, 0) and r log r con-
R
∂ r (r cos θ , r sin θ ) dθ
verges to 0. Thus
ZZ p ZZ p
∇2 f (x, y) log x2 + y2 dxdy = ∇2 f (x, y) log x2 + y2 dxdy
R2 DR
ZZ p
= lim ∇2 f (x, y) log x2 + y2 dxdy
r→0 Dr,R
= 2π f (0, 0)
as claimed.
4. Let us put
0 a 0λ
A := , Λ := .
a∗ 0 λ̄ 0
Note that both A and Λ are selfadjoint and have with respect to tr ⊗ τ the distribu-
tions µ̃|a| , and (δα + δ−α )/2, respectively, and that A −Λ has the distribution µ̃|a−λ | .
(It is of course important that we are in a tracial setting, so that aa∗ and a∗ a have the
same distribution.)
It remains to show that A and Λ are free with respect to tr ⊗ τ. For this note that
the kernel of tr ⊗ τ on the unital algebra generated by A is spanned by matrices of
the form
(aa∗ )k−1 a
∗ k
(aa ) − τ((aa∗ )k )
0 0
or
(a∗ a)k−1 a∗ 0 0 (a∗ a)k − τ((a∗ a)k )
(12.7)
for some k ≥ 1; whereas the kernel of tr ⊗ τ on the algebra generated by Λ is just
spanned by the off-diagonal matrices of the form
0 |λ |k λ
= |λ |k Λ
|λ |k λ̄ 0
for all n and all choices of A1 , . . . , An from the collection (12.7). Multiplication with
Λ has on the Ai the effect that we get matrices from the collection
∗ k−1
(aa∗ )k − τ((aa∗ )k )
(aa ) a 0 0
or .
0 (a∗ a)k−1 a∗ (a∗ a)k − τ((a∗ a)k ) 0
(12.8)
318 Solutions to Exercises
Hence we have to see that whenever we multiply matrices from the collection (12.8)
in any order we get only matrices where all entries vanish under the application of
τ. Let us denote the non-trivial entries in the matrices from (12.8) as follows.
1. Lars Aagaard and Uffe Haagerup. Moment formulas for the quasi-nilpotent
DT-operator. Internat. J. Math., 15(6):581–628, 2004.
2. Gernot Akemann, Jinho Baik, and Philippe Di Francesco, editors. The Oxford
handbook of random matrix theory. Oxford University Press, 2011.
3. Naum I. Akhiezer. The classical moment problem and some related questions
in analysis. Hafner Publishing Co., New York, 1965.
4. Naum I. Akhiezer and Izrail’ M. Glazman. Theory of linear operators in
Hilbert space. Vol. II. Pitman, Boston, 1981.
5. Greg W. Anderson. Convergence of the largest singular value of a polynomial
in independent Wigner matrices. Ann. Probab., 41(3B):2103–2181, 2013.
6. Greg W. Anderson, Alice Guionnet, and Ofer Zeitouni. An introduction to ran-
dom matrices, volume 118 of Cambridge Studies in Advanced Mathematics.
Cambridge University Press, Cambridge, 2010.
7. Greg W. Anderson and Ofer Zeitouni. A CLT for a band matrix model. Probab.
Theory Relat. Fields, 134(2):283–338, 2006.
8. Michael Anshelevich. Free martingale polynomials. J. Funct. Anal.,
201(1):228–261, 2003.
9. Michael V. Anshelevich, Serban T. Belinschi, Marek Bożejko, and Franz
Lehner. Free infinite divisibility for q-Gaussians. Math. Res. Lett., 17(5):905–
916, 2010.
10. Octavio Arizmendi, Takahiro Hasebe, Franz Lehner, and Carlos Vargas. Rela-
tions between cumulants in noncommutative probability. Adv. Math., 282:56–
92, 2015.
11. Octavio Arizmendi, Takahiro Hasebe, and Noriyoshi Sakuma. On the law of
free subordinators. ALEA, Lat. Am. J. Probab. Math. Stat., 10(1):271–291,
2013.
12. Octavio Arizmendi, Ion Nechita, and Carlos Vargas. On the asymptotic dis-
tribution of block-modified random matrices. J. Math. Phys., 57(1):015216,
2016.
319
320 References
13. David H. Armitage and Stephen J. Gardiner. Classical potential theory. Lon-
don: Springer, 2001.
14. Zhidong Bai and Jack W. Silverstein. CLT for linear spectral statistics of
large-dimensional sample covariance matrices. Ann. Probab., 32(1A):553–
605, 2004.
15. Zhidong Bai and Jack W. Silverstein. Spectral analysis of large dimensional
random matrices. Springer Series in Statistics. Springer, New York, second
edition, 2010.
16. Teodor Banica, Serban Teodor Belinschi, Mireille Capitaine, and Benoit
Collins. Free Bessel laws. Canad. J. Math, 63(1):3–37, 2011.
17. Teodor Banica and Roland Speicher. Liberation of orthogonal Lie groups.
Adv. Math., 222(4):1461–1501, 2009.
18. Serban T. Belinschi. Complex Analysis Methods In Noncommutative Proba-
bility. PhD thesis, Indiana University, 2005.
19. Serban T. Belinschi. The Lebesgue decomposition of the free additive convo-
lution of two probability distributions. Probab. Theory Related Fields, 142(1-
2):125–150, 2008.
20. Serban T. Belinschi and Hari Bercovici. A property of free entropy. Pacific J.
Math., 211(1):35–40, 2003.
21. Serban T. Belinschi and Hari Bercovici. A new approach to subordination
results in free probability. J. Anal. Math., 101(1):357–365, 2007.
22. Serban T. Belinschi, Marek Bożejko, Franz Lehner, and Roland Speicher. The
normal distribution is ⊞-infinitely divisible. Adv. Math., 226(4):3677–3698,
2011.
23. Serban T. Belinschi, Tobias Mai, and Roland Speicher. Analytic subordina-
tion theory of operator-valued free additive convolution and the solution of
a general random matrix problem. J. Reine Angew. Math., published online:
2015-04-12.
24. Serban T. Belinschi, Piotr Śniady, and Roland Speicher. Eigenvalues of non-
hermitian random matrices and Brown measure of non-normal operators: Her-
mitian reduction and linearization method. arXiv preprint arXiv:1506.02017,
2015.
25. Serban T. Belinschi, Roland Speicher, John Treilhard, and Carlos Vargas.
Operator-valued free multiplicative convolution: analytic subordination the-
ory and applications to random matrix theory. Int. Math. Res. Not. IMRN,
2015(14):5933–5958, 2015.
26. Gérard Ben Arous and Alice Guionnet. Large deviations for Wigner’s law
and Voiculescu’s non-commutative entropy. Probab. Theory Related Fields,
108(4):517–542, 1997.
27. Florent Benaych-Georges. Taylor expansions of R-transforms: application to
supports and moments. Indiana Univ. Math. J., 55(2):465–481, 2006.
28. Florent Benaych-Georges. Rectangular random matrices, related convolution.
Probab. Theory Related Fields, 144(3-4):471–515, 2009.
29. Hari Bercovici and Vittorino Pata. Stable laws and domains of attraction in
free probability theory. Ann. of Math. (2), 149:1023–1060, 1999.
30. Hari Bercovici and Dan-Virgil Voiculescu. Lévy-Hinčin type theorems for
multiplicative and additive free convolution. Pac. J. Math., 153(2):217–248,
1992.
31. Hari Bercovici and Dan-Virgil Voiculescu. Free convolution of measures with
unbounded support. Indiana Univ. Math. J., 42(3):733–773, 1993.
32. Hari Bercovici and Dan-Virgil Voiculescu. Superconvergence to the central
limit and failure of the Cramér theorem for free random variables. Probab.
Theory Related Fields, 103(2):215–222, 1995.
33. Philippe Biane. Some properties of crossings and partitions. Discrete Math.,
175(1-3):41–53, 1997.
34. Philippe Biane. Processes with free increments. Math. Z., 227(1):143–174,
1998.
35. Philippe Biane. Representations of symmetric groups and free probability.
Adv. Math., 138(1):126–181, 1998.
36. Philippe Biane. Entropie libre et algèbres d’opérateurs. Séminaire Bourbaki,
43:279–300, 2002.
37. Philippe Biane. Free probability and combinatorics. Proc. ICM 2002, Beijing,
Vol. II:765–774, 2002.
38. Philippe Biane, Mireille Capitaine, and Alice Guionnet. Large deviation
bounds for matrix Brownian motion. Invent. Math., 152(2):433–459, 2003.
39. Philippe Biane and Franz Lehner. Computation of some examples of Brown’s
spectral measure in free probability. Colloq. Math., 90(2):181–211, 2001.
40. Philippe Biane and Roland Speicher. Stochastic calculus with respect to free
Brownian motion and analysis on Wigner space. Probab. Theory Related
Fields, 112(3):373–409, 1998.
41. Patrick Billingsley. Probability and Measure. Wiley Series in Probability and
Mathematical Statistics. John Wiley & Sons, Inc., New York, third edition,
1995.
42. Charles Bordenave and Djalil Chafaı̈. Around the circular law. Probab. Surv.,
9:1–89, 2012.
43. Marek Bożejko. On Λ(p) sets with minimal constant in discrete noncommu-
tative groups. Proc. Amer. Math. Soc., 51(2):407–412, 1975.
44. Michael Brannan. Approximation properties for free orthogonal and free uni-
tary quantum groups. J. Reine Angew. Math., 2012(672):223–251, 2012.
45. Edouard Brézin, Claude Itzykson, Giorgio Parisi, and Jean-Bernard Zuber.
Planar diagrams. Comm. Math. Phys., 59(1):35–51, 1978.
46. Lawrence G. Brown. Lidskiı̆’s theorem in the type II case. In Geometric
methods in operator algebras (Kyoto, 1983), volume 123 of Pitman Res. Notes
Math. Ser., pages 1–35. Longman Sci. Tech., Harlow, 1986.
47. Thierry Cabanal-Duvillard. Fluctuations de la loi empirique de grandes matri-
ces aléatoires. Ann. Inst. H. Poincaré Probab. Statist., 37(3):373–402, 2001.
65. Peter L. Duren. Univalent functions, volume 259 of Grundlehren der Math-
ematischen Wissenschaften [Fundamental Principles of Mathematical Sci-
ences]. Springer-Verlag, New York, 1983.
66. Kenneth J. Dykema. On certain free product factors via an extended matrix
model. J. Funct. Anal., 112(1):31–60, 1993.
67. Kenneth J. Dykema. Interpolated free group factors. Pacific J. Math.,
163(1):123–135, 1994.
68. Kenneth J. Dykema. Multilinear function series and transforms in free proba-
bility theory. Adv. Math., 208(1):351–407, 2007.
69. Alan Edelman and N. Raj Rao. Random matrix theory. Acta Numer., 14:233–
297, 2005.
70. Wiktor Ejsmont, Uwe Franz, and Kamil Szpojankowski. Convolution, subor-
dination and characterization problems in noncommutative probability. Indi-
ana Univ. Math. J., 66:237–257, 2017.
71. Joshua Feinberg and Anthony Zee. Non-Hermitian random matrix theory:
method of Hermitian reduction. Nuclear Phys. B, 504(3):579–608, 1997.
72. Valentin Féray and Piotr Śniady. Asymptotics of characters of symmetric
groups related to Stanley character formula. Ann. of Math. (2), 173(2):887–
906, 2011.
73. Amaury Freslon and Moritz Weber. On the representation theory of partition
(easy) quantum groups. J. Reine Angew. Math., 720:155–197, 2016.
74. Roland M. Friedrich and John McKay. Homogeneous Lie Groups and Quan-
tum Probability. ArXiv e-prints, June 2015.
75. Bent Fuglede and Richard V. Kadison. Determinant theory in finite factors.
Ann. of Math. (2), 55:520–530, 1952.
76. Liming Ge. Applications of free entropy to finite von Neumann algebras. II.
Ann. of Math. (2), 147(1):143–157, 1998.
77. Vyacheslav L. Girko. The circular law. Teor. Veroyatnost. i Primenen.,
29(4):669–679, 1984.
78. Vyacheslav L. Girko. Theory of stochastic canonical equations. Vol. I, vol-
ume 535 of Mathematics and its Applications. Kluwer Academic Publishers,
Dordrecht, 2001.
79. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete mathe-
matics. Addison-Wesley Publishing Company, Reading, MA, second edition,
1994.
80. David S. Greenstein. On the analytic continuation of functions which map the
upper half plane into itself. J. Math. Anal. Appl., 1:355–362, 1960.
81. Yinzheng Gu, Hao-Wei Huang, and James A. Mingo. An analogue of the
Lévy-Hinčin formula for bi-free infinitely divisible distributions. Indiana
Univ. Math. J., 65(5):1795–1831, 2016.
82. Alice Guionnet, Manjunath Krishnapur, and Ofer Zeitouni. The single ring
theorem. Ann. of Math. (2), 174(2):1189–1217, 2011.
83. Alice Guionnet and Dimitri Shlyakhtenko. Free monotone transport. Invent.
Math., 197(3):613–661, 2014.
84. Uffe Haagerup. Random matrices, free probability and the invariant subspace
problem relative to a von Neumann algebra. Proc. ICM 2002, Beijing, Vol.
I:273–290, 2002.
85. Uffe Haagerup and Flemming Larsen. Brown’s spectral distribution measure
for R-diagonal elements in finite von Neumann algebras. J. Funct. Anal.,
176(2):331–367, 2000.
86. Uffe Haagerup and Hanne Schultz. Brown measures of unbounded operators
affiliated with a finite von Neumann algebra. Math. Scand., 100(2):209–263,
2007.
87. Uffe Haagerup and Hanne Schultz. Invariant subspaces for operators in a
general II1 -factor. Publ. Math. Inst. Hautes Études Sci., 109(1):19–111, 2009.
88. Uffe Haagerup, Hanne Schultz, and Steen Thorbjørnsen. A random matrix
approach to the lack of projections in $C^*_{\mathrm{red}}(F_2)$. Adv. Math., 204(1):1–83, 2006.
89. Uffe Haagerup and Steen Thorbjørnsen. A new application of random matri-
ces: $\operatorname{Ext}(C^*_{\mathrm{red}}(F_2))$ is not a group. Ann. of Math. (2), 162(2):711–775, 2005.
90. Walid Hachem, Philippe Loubaton, and Jamal Najim. Deterministic equiv-
alents for certain functionals of large random matrices. Ann. Appl. Probab.,
17(3):875–930, 2007.
91. John Harer and Don Zagier. The Euler characteristic of the moduli space of
curves. Invent. Math., 85(3):457–485, 1986.
92. Walter K. Hayman and Patrick B. Kennedy. Subharmonic functions. Vol. I.
London Mathematical Society Monographs. No. 9. Academic Press, London-
New York, 1976.
93. J. William Helton, Tobias Mai, and Roland Speicher. Applications of realiza-
tions (aka linearizations) to free probability. arXiv preprint arXiv:1511.05330,
2015.
94. J. William Helton, Scott A. McCullough, and Victor Vinnikov. Noncom-
mutative convexity arises from linear matrix inequalities. J. Funct. Anal.,
240(1):105–191, 2006.
95. J. William Helton, Reza Rashidi Far, and Roland Speicher. Operator-valued
semicircular elements: solving a quadratic matrix equation with positivity con-
straints. Int. Math. Res. Not. IMRN, 2007(22):Art. ID rnm086, 15, 2007.
96. Fumio Hiai and Dénes Petz. Asymptotic freeness almost everywhere for ran-
dom matrices. Acta Sci. Math. (Szeged), 66(3-4):809–834, 2000.
97. Fumio Hiai and Dénes Petz. The semicircle law, free random variables and
entropy, volume 77 of Mathematical Surveys and Monographs. American
Mathematical Society, Providence, RI, 2000.
98. Graham Higman. The units of group-rings. Proc. Lond. Math. Soc., 2(1):231–
248, 1940.
99. Kenneth Hoffman. Banach spaces of analytic functions. Dover Publications,
Inc., New York, 1988.
100. Hao-Wei Huang and Jiun-Chau Wang. Analytic aspects of the bi-free partial
R-transform. J. Funct. Anal., 271(4):922–957, 2016.
101. Leon Isserlis. On a formula for the product-moment coefficient of any order
of a normal frequency distribution in any number of variables. Biometrika,
12(1/2):134–139, 1918.
102. Alain Jacques. Sur le genre d’une paire de substitutions. C. R. Acad. Sci. Paris
Sér. A-B, 267:A625–A627, 1968.
103. Romuald A. Janik, Maciej A. Nowak, Gábor Papp, and Ismail Zahed. Non-
Hermitian random matrix models. Nuclear Phys. B, 501(3):603–642, 1997.
104. Kurt Johansson. On fluctuations of eigenvalues of random Hermitian matrices.
Duke Math. J., 91(1):151–204, 1998.
105. Vaughan F. R. Jones. Index for subfactors. Invent. Math., 72(1):1–25, 1983.
106. Richard V. Kadison and John R. Ringrose. Fundamentals of the theory of
operator algebras. Vol. II. American Mathematical Society, Providence, RI,
1997.
107. Dmitry S. Kaliuzhnyi-Verbovetskyi and Victor Vinnikov. Foundations of free
noncommutative function theory, volume 199 of Mathematical Surveys and
Monographs. American Mathematical Society, Providence, RI, 2014.
108. Todd Kemp, Ivan Nourdin, Giovanni Peccati, and Roland Speicher. Wigner
chaos and the fourth moment. Ann. Probab., 40(4):1577–1635, 2012.
109. Paul Koosis. Introduction to H p spaces, volume 115 of Cambridge Tracts in
Mathematics. Cambridge University Press, Cambridge, second edition, 1998.
110. Claus Köstler and Roland Speicher. A noncommutative de Finetti theorem:
invariance under quantum permutations is equivalent to freeness with amalga-
mation. Comm. Math. Phys., 291(2):473–490, 2009.
111. Bernadette Krawczyk and Roland Speicher. Combinatorics of free cumulants.
J. Combin. Theory Ser. A, 90(2):267–292, 2000.
112. Milan Krbalek, Petr Šeba, and Peter Wagner. Headways in traffic flow: Re-
marks from a physical perspective. Phys. Rev. E, 64(6):066119, 2001.
113. Germain Kreweras. Sur les partitions non croisées d’un cycle. Discrete Math.,
1(4):333–350, 1972.
114. Burkhard Kümmerer and Roland Speicher. Stochastic integration on the Cuntz
algebra O∞ . J. Funct. Anal., 103(2):372–408, 1992.
115. Timothy Kusalik, James A. Mingo, and Roland Speicher. Orthogonal polyno-
mials and fluctuations of random matrices. J. Reine Angew. Math., 604:1–46,
2007.
116. Flemming Larsen. Brown measures and R-diagonal elements in finite von
Neumann algebras. PhD thesis, University of Southern Denmark, 1999.
117. Franz Lehner. Cumulants in noncommutative probability theory I. Noncom-
mutative exchangeability systems. Math. Z., 248(1):67–100, 2004.
118. François Lemeux and Pierre Tarrago. Free wreath product quantum groups:
the monoidal category, approximation properties and free probability. J. Funct.
Anal., 270(10):3828–3883, 2016.
207. Alexander Zvonkin. Matrix integrals and map enumeration: an accessible in-
troduction. Math. Comput. Modelling, 26(8):281–304, 1997.
Index of Exercises
3 . . . 205
4 . . . 210
5 . . . 214
6 . . . 217
7 . . . 217
8 . . . 225
9 . . . 225
10 . . . 225
11 . . . 226
12 . . . 226
13 . . . 226
14 . . . 227
Chapter 9
1 . . . 231
2 . . . 246
Chapter 10
1 . . . 258
2 . . . 264
Chapter 11
1 . . . 267
2 . . . 271
3 . . . 272
4 . . . 274
5 . . . 275
Index
    Marchenko-Pastur, 63
    semi-circle, 62, 67
central limit theorem
    classical, 37
    free, 40
central sequence, 194
characteristic function, 14
Chebyshev polynomials
    of first kind, 129
        as orthogonal polynomials, 160
        combinatorial interpretation, 161
        recursion, 225
    of second kind, 159
        as orthogonal polynomials, 159
        combinatorial interpretation, 163
        recursion, 225
circular element, 173
    free cumulants of, 174
    of second order, 156
circular family, 231
    free cumulants of, 231
circular law, 275
closed pair, 161
complex Gaussian random matrix, 173
complex Gaussian random variable, 17
compression of von Neumann algebra, 172
conditional expectation, 238
conjugate Poisson kernel, 213
conjugate relations, 210
conjugate system, 210
convergence
    almost sure for random matrices, 100
    in distribution, 34
    in moments, 34
    in probability, 104
    of averaged eigenvalue distribution, 100
    of real-valued random variables in distribution, 34
    in law, 34
    vague, 71
    weak, 71
    of probability measures, 34
covariance of operator-valued semicircular element, 232
Cramér’s theorem, 186
C*-operator-valued probability space, 259
C*-probability space, 26
cumulant
    classical, 14
    free, 42
    mixed, 46
    second-order, 149
cumulant generating series, 183
cumulant series, 50
cutting edge of graph, 116
cyclically alternating, 145
Denjoy-Wolff theorem, 93
deterministic equivalent, 250, 253
    free, 253
distribution, 168
    algebraic, 34
    arc-sine, 61
    Cauchy, 64
    determined by moments, 34
        Carleman condition, 34
    elliptic, 276
    free Poisson, 44, 123
    Haar unitary, 108
    Marchenko-Pastur, 44
    of random variable, 34
    quarter-circular, 174
    R-diagonal, 272
    second-order limiting, 129
    semi-circle, 23
eigenvalue distribution
    joint, 188
    empirical, 182
elliptic operator, 276