Spectral and Algebraic Graph Theory
Incomplete Draft, dated February 6, 2025
Current version available at http://cs-www.cs.yale.edu/homes/spielman/sagt.
Preface
Please note that this is a rapidly evolving draft. You will find warning messages at
the start of sections that need substantial editing.
This book is about how combinatorial properties of graphs are related to algebraic properties of
associated matrices, as well as applications of those connections. One’s initial excitement over this
material usually stems from its counter-intuitive nature. I hope to convey this initial amazement,
but then make the connections seem intuitive. After gaining intuition, I hope the reader will
appreciate the material for its beauty.
This book is mostly based on lecture notes from the “Spectral Graph Theory” course that I have
taught at Yale, with notes from “Graphs and Networks” and “Spectral Graph Theory and its
Applications” mixed in. I love the material in these courses, and find that I can never teach
everything I want to cover within one semester. This is why I wrote this book. As this book is
based on lecture notes, it does not contain the tightest or most recent results. Rather, my goal is
to introduce the main ideas and to provide intuition.
There are three tasks that one must accomplish in the beginning of a course on Spectral Graph
Theory:
• One must convey how the coordinates of eigenvectors correspond to vertices in a graph.
This is obvious to those who understand it, but it can take a while for students to grasp.
• One must introduce necessary linear algebra and show some interesting interpretations of
graph eigenvalues.
• One must derive the eigenvalues of some example graphs to ground the theory.
I find that one has to do all these at once. For this reason my first few lectures jump between
developing theory and examining particular graphs. For this book I have decided to organize the
material differently, mostly separating examinations of particular graphs from the development of
the theory. To help the reader reconstruct the flow of my courses, I give three orders that I have
used for the material:
put orders here
There are many terrific books on Spectral Graph Theory. The four that influenced me the most
are
Other books that I find very helpful and that contain related material include
• “Modern Graph Theory” by Bela Bollobas,
• “Eigenspaces of Graphs” by Dragos Cvetkovic, Peter Rowlinson, and Slobodan Simic,
• “Numerical Linear Algebra” by Lloyd N. Trefethen and David Bau, III, and
• “Applied Numerical Linear Algebra” by James W. Demmel.
For those needing an introduction to linear algebra, a perspective that is compatible with this book is contained in Gil Strang's “Introduction to Linear Algebra.” For more advanced topics in linear algebra, I recommend “Matrix Analysis” by Roger Horn and Charles Johnson, as well as their “Topics in Matrix Analysis.” For treatments of physical systems related to graphs, the topic of Part III, I recommend Gil Strang's “Introduction to Applied Mathematics”, Sydney H. Gould's “Variational Methods for Eigenvalue Problems”, and “Markov Chains and Mixing Times” by Levin, Peres and Wilmer.
I have gained a lot of intuition for spectral and algebraic graph theory by examining examples. I include many examples so that you can play with them, develop your own intuition, and test your own ideas. My preferred environment for computational experiments is a Jupyter notebook written in the Julia programming language. All of the code used in this book may be found at this GitHub repository: https://github.com/danspielman/sagt_code. If you want to start running the code in this book, you should begin by importing a few packages and setting some defaults with the lines

using Laplacians, LinearAlgebra, Plots, SparseArrays, FileIO, JLD2, Random
gr(); default(fmt=:png)
Random.seed!(0)
Part I

Introduction and Background

Chapter 1

Introduction

In this chapter we present essential background on graphs and spectral theory. We also introduce
some spectral and algebraic graph theory, describe some of the topics covered in this book, and
try to give some useful intuition about graph spectra.
First, we recall that a graph G = (V, E) is specified by its vertex set, V, and edge set, E. In an
undirected graph, the edge set is a set of unordered pairs of vertices. We use the notation (a, b) to
indicate an edge between vertices a and b. As this edge is undirected, this is the same as edge
(b, a). Some prefer to write undirected edges using set notation, like {a, b}; but, we won’t do that.
Unless otherwise specified, all graphs discussed in this book will be undirected, simple (having no
loops or multiple edges) and finite. We will sometimes assign weights to edges. These will usually
be positive real numbers. If no weights have been specified, we will assume all edges have weight
1. This is an arbitrary choice, and we should remember that it has an impact.
Graphs (also called “networks”) are typically used to model connections or relations between
things, where “things” are vertices. When the edges in a graph are more important than the
vertices, we may just specify an edge set E and omit the ambient vertex set.
Common “natural” examples of graphs are:
• Friendship graphs: people are vertices, edges exist between pairs of people who are friends
(assuming the relation is symmetric).
• Network graphs: devices, routers and computers are vertices, edges exist between pairs that
are connected.
• Circuit graphs: electronic components, such as transistors, are vertices; edges exist between
pairs connected by wires.
(I will use the words “vertex” and “node” interchangeably. Sorry about that.)
• Protein-Protein Interaction graphs: proteins are vertices. Edges exist between pairs that interact. These should really have weights indicating the strength and nature of interaction. So should most other graphs.
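To make these conventions concrete, here is a small sketch (ours, not from the text; the example graph is made up) that builds the weighted adjacency matrix of an undirected graph from an edge list, giving every edge the default weight 1. It uses the SparseArrays package imported in the preface.

using SparseArrays

edges = [(1, 2), (2, 3), (3, 1), (3, 4)]   # the edge set E; (a, b) is the same edge as (b, a)
n = 4                                      # the vertices are 1, ..., n
w = ones(length(edges))                    # no weights specified, so every edge gets weight 1

a = [e[1] for e in edges]
b = [e[2] for e in edges]
A = sparse([a; b], [b; a], [w; w], n, n)   # enter each edge in both directions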
While the adjacency matrix is the most natural matrix to associate with a graph, I find it the least useful. Eigenvalues and eigenvectors are most meaningful when used to understand a natural operator or a natural quadratic form. The adjacency matrix provides neither.

One of the purposes of spectral theory is to provide an understanding of what happens when one repeatedly applies a linear operator like W_G.

1.2.3 A quadratic form

The most natural quadratic form associated with a graph is defined in terms of its Laplacian matrix,

L_G def= D_G − M_G.

Given a function on the vertices, x ∈ IR^V, the value of the Laplacian quadratic form of a weighted graph in which edge (a, b) has weight w_{a,b} > 0 is

x^T L_G x = Σ_{(a,b)∈E} w_{a,b} (x(a) − x(b))^2.   (1.1)

This form measures the smoothness of the function x. It will be small if the function x does not vary too much between the vertices connected by any edge.

We will occasionally want to consider the Laplacians of graphs that have both positively and negatively weighted edges. As there are many reasonable definitions of these Laplacians, we will only define them when we need them.

1.3 Spectral Theory

We now review the highlights of the spectral theory for symmetric matrices. Almost all of the matrices we consider will be symmetric or will be similar to symmetric matrices. (A matrix M is similar to a matrix B if there is a non-singular matrix X such that X^{−1} M X = B. In this case, M and B have the same eigenvalues. See the exercises at the end of this section.) The eigenvalues of M are the roots of its characteristic polynomial, det(xI − M).

Theorem 1.3.1. [The Spectral Theorem] For every n-by-n, real, symmetric matrix M, there exist real numbers λ1, . . . , λn and n mutually orthogonal unit vectors ψ1, . . . , ψn such that ψi is an eigenvector of M of eigenvalue λi, for each i.

This is the great fact about symmetric matrices. If the matrix is not symmetric, it might not have n eigenvalues. And, even if it has n eigenvalues, their eigenvectors will not be orthogonal. (You can prove that if the eigenvectors are orthogonal, then the matrix is symmetric.) If M is not symmetric, its eigenvalues and eigenvectors might be the wrong thing to study.

Recall that the eigenvectors are not uniquely determined, although the eigenvalues are. If ψ is an eigenvector, then −ψ is as well. Some eigenvalues can be repeated. If λi = λi+1, then ψi + ψi+1 will also be an eigenvector of eigenvalue λi. The eigenvectors of a given eigenvalue are only determined up to an orthogonal transformation.

Definition 1.3.2. A matrix is positive definite, written M ≻ 0, if it is symmetric and all of its eigenvalues are positive. It is positive semidefinite, written M ⪰ 0, if it is symmetric and all of its eigenvalues are nonnegative.

Fact 1.3.3. The Laplacian matrix of a graph is positive semidefinite.

Proof. Let ψ be a unit eigenvector of L of eigenvalue λ. Then,

ψ^T L ψ = ψ^T λψ = λ(ψ^T ψ) = λ = Σ_{(a,b)∈E} w_{a,b} (ψ(a) − ψ(b))^2 ≥ 0.

We always number the eigenvalues of the Laplacian from smallest to largest. Thus, λ1 = 0. We will refer to λ2, and in general λk for small k, as low-frequency eigenvalues. λn is a high-frequency eigenvalue. We will see why soon.

1.4 Some examples

Before we start proving theorems, we will see examples that should convince you that the eigenvalues and eigenvectors of graphs are meaningful.

M = path_graph(4)
Matrix(M)

0.0  1.0  0.0  0.0
1.0  0.0  1.0  0.0
0.0  1.0  0.0  1.0
0.0  0.0  1.0  0.0

And, here is its Laplacian matrix

Matrix(lap(M))

 1.0  -1.0   0.0   0.0
-1.0   2.0  -1.0   0.0
 0.0  -1.0   2.0  -1.0
 0.0   0.0  -1.0   1.0
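As a quick numerical check of the quadratic form (1.1) and of Fact 1.3.3 (this snippet is ours, not the book's; it only uses the lap and path_graph functions shown above), we can compare x^T L x with the edge-by-edge sum and confirm that the eigenvalues of L are nonnegative.

using Laplacians, LinearAlgebra

M = path_graph(4)              # weighted adjacency matrix of the path
L = Matrix(lap(M))             # its Laplacian, L = D - M
x = [1.0, -2.0, 0.5, 3.0]      # an arbitrary function on the vertices

# the Laplacian quadratic form, summed edge by edge as in (1.1)
qf = sum(M[a, b] * (x[a] - x[b])^2 for a in 1:4 for b in (a+1):4 if M[a, b] != 0)

println(x' * L * x ≈ qf)              # true
println(all(eigvals(L) .>= -1e-12))   # the eigenvalues are (numerically) nonnegative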
L = lap(path_graph(10))
E = eigen(Matrix(L))
E.values'

0.0  0.097887  0.381966  0.824429  1.38197  2.0  2.61803  3.17557  3.61803  3.90211

E.vectors[:,1]

0.31622776601683755
0.31622776601683716
0.31622776601683766
0.3162277660168381
0.31622776601683855
0.3162277660168381
0.3162277660168385
0.31622776601683805
0.3162277660168378
0.3162277660168378

v2 = E.vectors[:,2]

0.44170765403093926
0.3984702312962002
0.31622776601683794
0.20303072371134548
0.0699596195707542
-0.06995961957075394
-0.2030307237113458
-0.3162277660168378
-0.39847023129619985
-0.44170765403093826

Let's plot that.

plot(v2,marker=5,legend=false)
xlabel!("vertex number")
ylabel!("value in eigenvector")

The x-axis is the name/number of the vertex, and the y-axis is the value of the eigenvector at that vertex. Now, let's look at the next few eigenvectors.

plot(E.vectors[:,2:4],label=["v2" "v3" "v4"],marker = 5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")

You may now understand why we refer to these as the low-frequency eigenvectors. The curves they trace out resemble the low-frequency modes of vibration of a string. The reason for this is that the path graph can be viewed as a discretization of the string, and its Laplacian matrix is a discretization of the Laplace operator.
We will relate the low-frequency eigenvalues to connectivity.

In contrast, the highest frequency eigenvalue alternates positive and negative with every vertex. We will see that the high-frequency eigenvectors may be related to problems of graph coloring and finding independent sets.

Plots.plot(E.vectors[:,10],label="v10",marker=5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")

[Figure: the eigenvector v10 plotted against "Vertex Number", with "Value in Eigenvector" on the vertical axis.]

1.5 Highlights

We now attempt to motivate this book, and the course on which it is based, by surveying some of its highlights.

1.5.1 Spectral Graph Drawing

[Output: the two eigenvectors ψ2 and ψ3, printed as the columns of V.]

In the figure below, we use these eigenvectors to draw the graph. Vertex a has been plotted at coordinates ψ2(a), ψ3(a). That is, we use ψ2 to provide a horizontal coordinate for every vertex, and ψ3 to obtain a vertical coordinate. We then draw the edges as straight lines.

plot_graph(M,V[:,1],V[:,2])

[Figure: the resulting spectral drawing of the graph.]

That's a great way to draw a graph if you start out knowing nothing about it. (It's the first thing I do whenever I meet a strange graph.) Note that the middle of the picture is almost planar, although edges do cross near the boundaries.

It is important to note that the eigenvalues do not change if we relabel the vertices. Moreover, if we permute the vertices then the entries of the eigenvectors are similarly permuted. That is, if P is a permutation matrix, then ψ is an eigenvector of the Laplacian L if and only if Pψ is an eigenvector of P L P^T, with the same eigenvalue, because P^T P = I. To prove it by experiment, let's randomly permute the vertices, and plot the permuted graph.
plot_graph(a,xy[:,1],xy[:,2])

We generate coordinates by computing two eigenvectors, and using each as a coordinate. In Fig. 1.4, we plot vertex a at position ψ2(a), ψ3(a), and again draw the edges as straight lines.
Random.seed!(3)
p = randperm(size(a,1))
M = a[p,p]
E = eigen(Matrix(lap(M)))
V = E.vectors[:,2:3]
plot_graph(M,V[:,1],V[:,2], dots=false);

Note that this picture is slightly different from the previous one: it has flipped vertically. That's because eigenvectors are only determined up to signs, and that's only if they have multiplicity 1.

This gives us a powerful heuristic for testing if one graph is a permutation of another (this is the famous "Graph Isomorphism Testing Problem"). First, check if the two graphs have the same sets of eigenvalues. If they don't, then they are not isomorphic. If they do, and the eigenvalues have multiplicity one, then draw the pictures above. If the pictures are the same, up to horizontal or vertical flips, and no vertex is mapped to the same location as another, then by lining up the pictures we can recover the permutation.

As some vertices can map to the same location, this heuristic doesn't always work. We will learn about the extent to which it does. In particular, we will see in Chapter 39 that if every eigenvalue of two graphs G and H has multiplicity 1, then we can efficiently test whether or not they are isomorphic.
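The first step of this heuristic, comparing the two spectra, is easy to code. The sketch below is ours (it is not from the text, and the function name and the tolerance tol are our own choices); it uses the lap function from the Laplacians package.

using Laplacians, LinearAlgebra

# Returns false if the two adjacency matrices cannot belong to isomorphic graphs,
# because their Laplacian spectra differ. Agreement of the spectra does not prove
# isomorphism; it only lets the drawing-based heuristic proceed.
function spectra_match(A, B; tol = 1e-8)
    size(A) == size(B) || return false
    sa = eigvals(Matrix(lap(A)))
    sb = eigvals(Matrix(lap(B)))
    return maximum(abs.(sa - sb)) < tol
end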
These algorithms have been extended to handle graphs in which the multiplicity of every eigenvalue is bounded by a constant [BGM82]. But, there are graphs in which every non-trivial eigenvalue has large multiplicity. In Chapter 9 we will learn how to construct and analyze some, as they constitute fundamental examples and counter-examples to many natural conjectures. For example, here are the eigenvalues of a Latin Square Graph on 25 vertices. These are a type of Strongly Regular Graph.

M = latin_square_graph(5);
println(eigvals(Matrix(lap(M))))

[0.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]

All Latin Square Graphs of the same size have the same eigenvalues, whether or not they are isomorphic. We will learn some surprisingly fast (but still not polynomial time) algorithms for checking whether or not Strongly Regular Graphs are isomorphic.

Of course, some graphs are not meant to be drawn in two dimensions. For example let's try drawing the skeleton of the dodecahedron using ψ2 and ψ3.

M = read_graph("dodec.txt")
E = eigen(Matrix(lap(M)))
x = E.vectors[:,2]
y = E.vectors[:,3]
plot_graph(M, x, y; setaxis=false);

Figure 1.5: The 1-skeleton of the dodecahedron.

You will notice that this looks like what you would get if you squashed the dodecahedron down to the plane. The reason is that we really shouldn't be drawing this picture in two dimensions: the smallest non-zero eigenvalue of the Laplacian has multiplicity three.

E = eigen(Matrix(lap(M)))
E.values'

3.55271e-15  0.763932  0.763932  0.763932  2.0  2.0  2.0  2.0  2.0  3.0  3.0  3.0  3.0  5.0  5.0  5.0  5.0  5.23607  5.23607  5.23607
So, we can’t reasonably choose just two eigenvectors. We should be choosing three that span the
eigenspace. When we do, we get the canonical representation of the dodecahedron in three
dimensions.
x = E.vectors[:,20]
y = E.vectors[:,19]
z = E.vectors[:,18]
plot_graph(M, x, y, z; setaxis=false);
may be removed by cutting many fewer than |Si | edges. This spectral graph partitioning heuristic Error-correcting codes and expander graphs are both fundamental objects of study in the field of
has proved very successful in practice. extremal combinatorics and are extremely useful. We will also use error-correcting codes to
construct crude expander graphs. In Chapter 30 we will see a simple construction of good
In general, it will be interesting to turn qualitative statements like “G is connected if and only if
expanders. The best expanders are the Ramanujan graphs. These were first constructed by
λ2 > 0” into quantitative ones. For example, the smallest eigenvalue of the diffusion matrix is
Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88]. In Chapters 44 and 43 we will
zero if and only if the graph is bipartite. Trevisan [Tre09] showed that the magnitude of this
prove that there exist infinite families of bipartite Ramanujan graphs.
eigenvalue is related to how far a graph is from being bipartite.
We will prove that graphs that can be drawn nicely must have small Fiedler value, and we will
1.5.10 Solving equations in and computing eigenvalues of Laplacians
prove very tight results for planar graphs (Chapter 25).
In Chapter 15 we will see how to use the graph Laplacian to draw planar graphs: Tutte [Tut63] We will also ask how well a graph can be approximated by a tree, and see in Chapter 36 that
showed that if one reasonably fixes the locations of the vertices on a face of a planar graph and low-stretch spanning-trees provide good approximations under this measure.
then lets the others settle into the positions obtained by treating the edges as springs, then one
Our motivation for this material is the need to design fast algorithms for solving systems of linear
obtains a planar drawing of the graph!
equations in Laplacian matrices and for computing their eigenvectors. This first problem arises in
numerous contexts, including the solution of elliptic PDEs by the finite element method, the
1.5.7 Random Walks on Graphs solution of network flow problems by interior point algorithms, and in classification problems in
Machine Learning.
Spectral graph theory is one of the main tools we use for analyzing random walks on graphs. We In fact, our definition of graph approximation is designed to suit the needs of the Preconditioned
will devote a few chapters to this theory, connect it to Cheeger’s inequality, and use tools Conjugate Gradient algorithm.
developed to study random walks to derive a fascinating proof of Cheeger’s inequality
(Chapter 16).
1.5.11 Advice on reading this book
1.5.8 Expanders Throughout this book, we have tried to strike a balance between the simplicity and generality of
the results that we prove. But, whenever you want to understand a proof, you should try to make
We will be particularly interested in graphs that are very well connected. These are called as many simplifying assumptions as are reasonable. For example, that the graph under
expanders. Roughly speaking, expanders are sparse graphs (say having a number of edges linear consideration is connected, all of its edges have weight 1, and that of its eigenvalues have
in the number of vertices), in which λ2 is bounded away from zero by a constant. They are among multiplicity one.
the most important examples of graphs, and play a prominent role in Theoretical Computer
Science. When seeking generalizations of the material in this book, you should consult the source material
of the notes at the end of each chapter.
Expander graphs have numerous applications. We will see how to use random walks on expander
graphs to construct pseudo-random generators about which one can actually prove something. We
will also use them to construct good error-correcting codes.
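To make Section 1.5.10 slightly more concrete, the snippet below (ours; it is emphatically not one of the fast solvers the book goes on to develop) solves a Laplacian system L x = b for a small connected graph directly, using the fact that L is singular only in the direction of the all-ones vector.

using Laplacians, LinearAlgebra

L = Matrix(lap(path_graph(6)))
b = randn(6)
b .-= sum(b) / 6                   # a solution exists only if b is orthogonal to the all-ones vector
x = pinv(L) * b                    # dense pseudoinverse: fine for tiny examples, hopeless at scale
println(norm(L * x - b) < 1e-8)    # true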
Let Π be a permutation matrix. That is, there is a permutation π : V → V so that

Π(u, v) = 1 if u = π(v), and 0 otherwise.

Prove that if

M ψ = λψ,

then

Π M Π^T (Πψ) = λ(Πψ).

That is, permuting the coordinates of the matrix merely permutes the coordinates of the eigenvectors, and does not change the eigenvalues.

3. Invariance under rotations.

Let Q be an orthogonal matrix. That is, a matrix such that Q^T Q = I. Prove that if

M ψ = λψ,

then

Q M Q^T (Qψ) = λ(Qψ).

4. Similar Matrices.

A matrix M is similar to a matrix B if there is a non-singular matrix X such that X^{−1} M X = B. Prove that similar matrices have the same eigenvalues.

5. Spectral decomposition.

Let M be a symmetric matrix with eigenvalues λ1, . . . , λn and let ψ1, . . . , ψn be a corresponding set of orthonormal column eigenvectors. Let Ψ be the orthogonal matrix whose ith column is ψi. Prove that

Ψ^T M Ψ = Λ,

where Λ is the diagonal matrix with λ1, . . . , λn on its diagonal. Conclude that

M = Ψ Λ Ψ^T = Σ_{i∈V} λi ψi ψi^T.

where λ1, . . . , λn are the eigenvalues of A. You are probably familiar with this fact about the trace, or it may have been the definition you were given. This is why I want you to remember how to prove it.

7. The Characteristic Polynomial

Let M be a symmetric matrix. Recall that the eigenvalues of M are the roots of the characteristic polynomial of M:

p(x) def= det(xI − M) = Π_{i=1}^n (x − µi).

Write

p(x) = Σ_{k=0}^n x^{n−k} ck (−1)^k.

Prove that

ck = Σ_{S⊆[n], |S|=k} det(M(S, S)).

Here, we write [n] to denote the set {1, . . . , n}, and M(S, S) to denote the submatrix of M with rows and columns indexed by S.

8. Reversing products.

Let M be a d-by-n matrix. Prove that the multiset of nonzero eigenvalues of M M^T is the same as the multiset of nonzero eigenvalues of M^T M.
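A quick numerical illustration of Exercise 8 (ours, not from the text): the nonzero eigenvalues of M M^T and of M^T M agree; the larger of the two products just picks up extra zero eigenvalues.

using LinearAlgebra, Random
Random.seed!(0)

M = randn(3, 5)
ev1 = eigvals(Symmetric(M * M'))      # 3 eigenvalues
ev2 = eigvals(Symmetric(M' * M))      # 5 eigenvalues; the two extra ones are (numerically) zero
println(maximum(abs.(sort(ev1) - sort(ev2)[3:5])) < 1e-8)   # true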
Chapter 2

Eigenvalues and Optimization

where the maximization and minimization are over subspaces S and T of IR^n. The corresponding eigenvectors satisfy

ψ_1 ∈ arg max_{‖x‖=1} x^T M x,

and for 2 ≤ k ≤ n

ψ_k ∈ arg min_{‖x‖=1, x^T ψ_j = 0 for j > k} x^T M x,   and   ψ_k ∈ arg max_{‖x‖=1, x^T ψ_j = 0 for j < k} x^T M x.   (2.2)
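The variational characterization in (2.2) is easy to test numerically. The following check is ours (not from the text): for a random symmetric matrix, the Rayleigh quotient of the top eigenvector equals the largest eigenvalue, and no other unit vector does better.

using LinearAlgebra, Random
Random.seed!(0)

A = randn(6, 6); M = (A + A') / 2        # a random symmetric matrix
E = eigen(M)                             # eigenvalues in increasing order
rq(x) = (x' * M * x) / (x' * x)          # the Rayleigh quotient

println(rq(E.vectors[:, end]) ≈ E.values[end])                        # true
println(all(rq(randn(6)) <= E.values[end] + 1e-12 for _ in 1:1000))   # true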
Lemma 2.1.1. Let M be a symmetric matrix with eigenvalues µ1, . . . , µn and a corresponding set of orthonormal eigenvectors ψ1, . . . , ψn, and let x = Σ_i ci ψi. Then,

x^T M x = Σ_{i=1}^n c_i^2 µi.

Proof. Compute:

x^T M x = (Σ_i ci ψi)^T M (Σ_j cj ψj)
        = (Σ_i ci ψi)^T (Σ_j cj M ψj)
        = (Σ_i ci ψi)^T (Σ_j cj µj ψj)
        = Σ_{i,j} ci cj µj ψi^T ψj
        = Σ_i c_i^2 µi,

as

ψi^T ψj = 0 for i ≠ j, and 1 for i = j.

So,

min_{x∈S} (x^T M x)/(x^T x) ≥ µk.

To show that this is in fact the maximum, we will prove that for all subspaces S of dimension k,

min_{x∈S} (x^T M x)/(x^T x) ≤ µk.

Let T be the span of ψk, . . . , ψn. As T has dimension n − k + 1, every subspace S of dimension k has an intersection with T of dimension at least 1. In particular, S ∩ T is non-empty so we can write

min_{x∈S} (x^T M x)/(x^T x) ≤ min_{x∈S∩T} (x^T M x)/(x^T x) ≤ max_{x∈T} (x^T M x)/(x^T x).

Let x be a vector in T at which this maximum is achieved, and expand x in the form

x = Σ_{i=k}^n ci ψi.

Again applying Lemma 2.1.1 we obtain

(x^T M x)/(x^T x) = (Σ_{i=k}^n µi c_i^2)/(Σ_{i=k}^n c_i^2) ≤ (µk Σ_{i=k}^n c_i^2)/(Σ_{i=k}^n c_i^2) = µk.

As ψk^T M ψk = µk, and T = {x : x^T ψj = 0, for j < k},
Proof. We first observe that the maximum is achieved: as the Rayleigh quotient is homogeneous, it suffices to consider unit vectors x. As the Rayleigh quotient is continuous on the set of unit vectors and this set is closed and compact, the maximum is achieved on this set. (Here's an explanation for those not familiar with analysis: we need to avoid the situation in which there are x on which the function is arbitrarily close to its maximum, but there are none on which it is achieved. We also need to avoid the situation in which the maximum is undefined. These conditions guarantee that the maximum is defined and that there is a unit vector x at which it is achieved. You can read almost all of this book without knowing analysis, as long as you are willing to accept this result.)

Now, let x be any non-zero vector that maximizes the Rayleigh quotient. We recall that the gradient of a function at its maximum must be the zero vector. Let's compute that gradient. We have

∇ x^T x = 2x,

and

∇ x^T M x = 2 M x.

(In case you are not used to computing gradients of functions of vectors, you can derive these directly by reasoning like ∂/∂x(a) x^T x = ∂/∂x(a) Σ_b x(b)^2 = 2 x(a).)

So,

∇ (x^T M x)/(x^T x) = ((x^T x)(2 M x) − (x^T M x)(2x)) / (x^T x)^2.

In order for this to be zero, we must have

(x^T x) M x = (x^T M x) x,

If x^T M x > 0, we can use Theorem 2.2.1 to obtain an eigenvector ψ of M with eigenvalue µ > 0. If x^T M x < 0, then apply Theorem 2.2.1 to −M to obtain an eigenvector ψ of −M with eigenvalue ν > 0, and observe that ψ is an eigenvector of M with eigenvalue µ = −ν < 0.

We now give a self-contained proof of the Spectral Theorem for symmetric matrices. The idea of the proof is to use Theorem 2.2.1 to obtain ψ1 and µ1, and then proceed by induction.

Theorem 2.2.3. For every real symmetric matrix M of rank r, there exist non-zero real numbers µ1, . . . , µr and orthonormal vectors ψ1, . . . , ψr such that

M = Σ_{i=1}^r µi ψi ψi^T.   (2.4)

The content of this theorem is equivalent to that of Theorem 1.3.1, because multiplying Eq. (2.4) on the right by ψi gives M ψi = µi ψi.

We first recall an elementary property of symmetric matrices.

Theorem 2.2.4. The span of a symmetric matrix is orthogonal to its nullspace.

Proof. Let M be a symmetric matrix. Recall that its span is the set of vectors of the form M x, and its nullspace is the set of vectors z for which M z = 0. For y = M x, we have

z^T y = z^T M x = (z^T M) x = 0^T x = 0,

M̂ x = M x − µ ψ ψ^T x = 0 − ψ·0 = 0.

As µ ≠ 0, ψ is not in the nullspace of M. However, our construction of M̂ forces ψ to lie in its nullspace:

M̂ ψ = M ψ − µ ψ (ψ^T ψ) = µψ − µψ = 0,

where the second equality uses the fact that ψ is a unit vector.

Now, Theorem 2.2.4 tells us that ψ is orthogonal to the span of M̂. Together with the fact that M = M̂ + µ ψ ψ^T, this implies that the span of M equals the span of M̂ and ψ, and thus is one dimension larger than the span of M̂.

Proof of Theorem 2.2.3. We proceed by induction on the rank of M. If M is the zero matrix, then the theorem is trivial.
We now assume that the theorem has been proved for all matrices of rank r, and prove it for matrices of rank r + 1. Let M be a symmetric matrix of rank r + 1. We know from Corollary 2.2.2 that there is a unit eigenvector ψ of M of eigenvalue µ ≠ 0. Let M̂ = M − µ ψ ψ^T.

By Lemma 2.2.5, the rank of M̂ is r. Our inductive hypothesis now implies that there are orthonormal vectors ψ1, . . . , ψr and non-zero µ1, . . . , µr such that

M̂ = Σ_{i=1}^r µi ψi ψi^T.

Setting ψ_{r+1} = ψ and µ_{r+1} = µ, we have

M = Σ_{i=1}^{r+1} µi ψi ψi^T.

To show that ψ_{r+1} is orthogonal to ψi for i ≤ r, note that ψi is in the span of M̂ and ψ_{r+1} is in its nullspace.

2.3 Singular Values for Asymmetric Matrices

The characterization of eigenvalues by maximizing or minimizing the Rayleigh quotient only works for symmetric matrices. The analogous quantities for non-symmetric matrices A are the singular vectors and singular values of A, which are the eigenvectors of A A^T and A^T A, and the square roots of the eigenvalues of those matrices.

Definition 2.3.1. The singular value decomposition of a matrix A is an expression of the form

A = U Σ V^T,

where U and V are matrices with orthonormal columns and Σ is a diagonal matrix with non-negative entries. The diagonal entries of Σ are the singular values of A, and the columns of U and V are its left and right singular vectors.

Even rectangular matrices have singular value decompositions. If A is an m-by-n matrix and r = min(m, n), we can assume that Σ is square of dimension r, and that U and V are m-by-r and n-by-r matrices with orthonormal columns. Let σ1 ≥ . . . ≥ σr be the diagonal entries of Σ, and let u1, . . . , ur and v1, . . . , vr be the columns of U and V. Then, the above decomposition can be written

A = Σ_{i=1}^r σi ui vi^T.

As the columns of V are orthonormal, it follows that

A vi = σi ui

for any singular vector vi.

We can use techniques similar to those we used to prove the Courant-Fischer Theorem to obtain the following characterization of the singular values.

Theorem 2.3.2. Let A be an arbitrary real m-by-n matrix, and let σ1 ≥ . . . ≥ σr be its singular values, where r = min(m, n). Then,

σ1 = max_{‖u‖=1, ‖v‖=1} u^T A v,

and

σk = max_{dim(S)=k, dim(T)=k}  min_{u∈S, v∈T, ‖u‖=1, ‖v‖=1} u^T A v,

where in the minima above, u ∈ IR^m, v ∈ IR^n, S is a subspace of IR^m and T is a subspace of IR^n.

2.4 Exercises

1. Prove Theorem 2.3.2.

2. A tighter characterization.

Tighten Theorem 2.2.3 by proving that for every sequence of vectors x1, . . . , xn such that

xi ∈ arg max_{‖x‖=1, x^T xj = 0 for j < i} x^T M x,

each xi is an eigenvector of M.
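A numerical check of Definition 2.3.1 and of the identity A vi = σi ui (this snippet is ours, not from the text):

using LinearAlgebra, Random
Random.seed!(0)

A = randn(3, 5)
F = svd(A)                                   # thin SVD
println(A ≈ F.U * Diagonal(F.S) * F.Vt)      # A = U Σ V^T
println(all(A * F.V[:, i] ≈ F.S[i] * F.U[:, i] for i in 1:3))   # A v_i = σ_i u_i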
Chapter 3

The Laplacian and Graph Drawing

3.1 The Laplacian Matrix

We begin this section by establishing the equivalence of multiple expressions for the Laplacian. The Laplacian Matrix of a weighted graph G = (V, E, w), w : E → IR+, is designed to capture the Laplacian quadratic form:

x^T L_G x = Σ_{(a,b)∈E} w_{a,b} (x(a) − x(b))^2.   (3.1)

We will now use this quadratic form to derive the structure of the matrix. To begin, consider a graph with just two vertices and one edge of weight 1. Let's call it G_{1,2}. We have

x^T L_{G_{1,2}} x = (x(1) − x(2))^2.   (3.2)

Consider the vector δ1 − δ2, where δa is the elementary unit vector with a 1 in coordinate a. We have

x(1) − x(2) = δ1^T x − δ2^T x = (δ1 − δ2)^T x,

so

(x(1) − x(2))^2 = ((δ1 − δ2)^T x)^2 = x^T (δ1 − δ2)(δ1 − δ2)^T x = x^T [1 −1; −1 1] x.

Thus,

L_{G_{1,2}} = [1 −1; −1 1].

Now, let G_{a,b} be the graph with just one edge between vertices a and b. It can have as many other vertices as you like. The Laplacian of G_{a,b} can be written in the same way:

L_{G_{a,b}} = (δa − δb)(δa − δb)^T.

This is the matrix that is zero except at the intersection of rows and columns indexed by a and b, where it looks like

[1 −1; −1 1].

Summing the matrices for every edge, we obtain

L_G = Σ_{(a,b)∈E} w_{a,b} (δa − δb)(δa − δb)^T = Σ_{(a,b)∈E} w_{a,b} L_{G_{a,b}}.

You should check that this agrees with the definition of the Laplacian from Section 1.2.3:

L_G = D_G − A_G,

where

D_G(a, a) = Σ_b w_{a,b}.

This formula turns out to be useful when we view the Laplacian as an operator. For every vector x we have

(L_G x)(a) = d(a) x(a) − Σ_{(a,b)∈E} w_{a,b} x(b) = Σ_{(a,b)∈E} w_{a,b} (x(a) − x(b)),   (3.3)

because d(a) = Σ_{(a,b)∈E} w_{a,b}.

From (3.1), we see that if all entries of x are the same, then x^T L x equals zero. From (3.3), we can immediately see that L 1 = 0, so the constant vectors are eigenvectors of eigenvalue zero. If the graph is connected, these are the only eigenvectors of eigenvalue zero.

Lemma 3.1.1. Let G = (V, E) be a weighted graph, and let 0 = λ1 ≤ λ2 ≤ · · · ≤ λn be the eigenvalues of its Laplacian matrix, L. Then, λ2 > 0 if and only if G is connected.

Proof. We first show that λ2 = 0 if G is disconnected. If G is disconnected, then it can be described as the disjoint union of two graphs, G1 and G2, with no edges between them. After suitably reordering the vertices, we can write

L = [L_{G1} 0; 0 L_{G2}].

So, L has at least two orthogonal eigenvectors of eigenvalue zero:

[1; 0]  and  [0; 1],

where we have partitioned the vectors as we did the matrix L.

On the other hand, assume that G is connected and that ψ is an eigenvector of L of eigenvalue 0. As

L ψ = 0,
we have ψ^T L ψ = 0, and so by (3.1) every edge (a, b) must satisfy ψ(a) = ψ(b). As every pair of vertices a and b are connected by a path, we may inductively apply this fact along the path to show that ψ(a) = ψ(b) for all vertices a and b. Thus, ψ must be a constant vector. We conclude that the eigenspace of eigenvalue 0 has dimension 1.

This turns out not to be so different from minimizing (3.4), as it equals

Σ_{(a,b)∈E} ((x(a) − x(b))^2 + (y(a) − y(b))^2) = x^T L x + y^T L y.

We can assume that ψ1 = 1/√n. For i ≤ k, ψ1^T xi = 0; so, Lemma 2.1.1 implies

xi^T L xi = Σ_{j=2}^n λj (ψj^T xi)^2
          = λ_{k+1} + Σ_{j=2}^n (λj − λ_{k+1})(ψj^T xi)^2,   by Σ_{j=2}^n (ψj^T xi)^2 = 1
          ≥ λ_{k+1} + Σ_{j=2}^{k+1} (λj − λ_{k+1})(ψj^T xi)^2,

as λj ≥ λ_{k+1} for j > k + 1. This inequality is only tight when (ψj^T xi)^2 = 0 for all j such that λj > λ_{k+1}.

Summing over i we obtain

Σ_{i=1}^k xi^T L xi ≥ k λ_{k+1} + Σ_{j=2}^{k+1} (λj − λ_{k+1}) Σ_{i=1}^k (ψj^T xi)^2
                    ≥ k λ_{k+1} + Σ_{j=2}^{k+1} (λj − λ_{k+1})
                    = Σ_{j=2}^{k+1} λj,

where the second inequality follows from the facts that λj − λ_{k+1} ≤ 0 and Σ_{i=1}^k (ψj^T xi)^2 ≤ 1. This inequality is tight under the same conditions as the previous one.

The beautiful pictures that we sometimes obtain from Hall's graph drawing should convince you that eigenvectors of the Laplacian should reveal a lot about the structure of graphs. But, it is worth pointing out that there are many graphs for which this approach does not produce nice images, and there are in fact graphs that can not be nicely drawn. Expander graphs are good examples of these.

Many other approaches to graph drawing borrow ideas from Hall's work: they try to minimize some function of the distances of the edges subject to some constraints that keep the vertices well separated. However, very few of these have compactly describable solutions, or even solutions that can provably be computed in polynomial time. The algorithms that implement them typically use a gradient based method to attempt to minimize the function of the distances subject to constraints, but can not guarantee that they actually minimize it. For many of these methods, relabeling the vertices could produce very different drawings! Thus, one must be careful before using these images to infer some truth about a graph.

Chapter 4

Adjacency matrices, Eigenvalue Interlacing, and the Perron-Frobenius Theorem

In this chapter, we examine the meaning of the smallest and largest eigenvalues of the adjacency matrix of a graph. Note that the largest eigenvalue of the adjacency matrix corresponds to the smallest eigenvalue of the Laplacian. Our focus in this chapter will be on the features that adjacency matrices possess but which Laplacians do not. Where the smallest eigenvector of the Laplacian is a constant vector, the largest eigenvector of an adjacency matrix, called the Perron vector, need not be. The Perron-Frobenius theory tells us that the largest eigenvector of an adjacency matrix is non-negative, and that its eigenvalue is an upper bound on the absolute value of the smallest eigenvalue. These are equal precisely when the graph is bipartite.

We will examine the relation between the largest adjacency eigenvalue and the degrees of vertices in the graph. This is made more meaningful by the fact that we can apply Cauchy's Interlacing Theorem to adjacency matrices. We will use it to prove a theorem of Wilf [Wil67] which says that a graph can be colored using at most 1 + ⌊µ1⌋ colors. We will learn more about eigenvalues and graph coloring in Chapter 19.

4.1 The Adjacency Matrix

Let M be the adjacency matrix of a (possibly weighted) graph G. As an operator, M acts on a vector x ∈ IR^V by

(M x)(a) = Σ_{(a,b)∈E} w_{a,b} x(b).   (4.1)

When the edge set E is understood, we use the notation a ∼ b as shorthand for (a, b) ∈ E. Thus, we may write (M x)(a) = Σ_{b∼a} w_{a,b} x(b).

We will denote the eigenvalues of M by µ1, . . . , µn. But, we order them in the opposite direction
than we did for the Laplacian: we assume

µ1 ≥ µ2 ≥ · · · ≥ µn.

The reason for this convention is so that µi corresponds to the ith Laplacian eigenvalue, λi. If G is a d-regular graph, then D = dI,

L = dI − M,

and so

λi = d − µi.

Thus the largest adjacency eigenvalue of a d-regular graph is d, and its corresponding eigenvector is the constant vector. We could also prove that the constant vector is an eigenvector of eigenvalue d by considering the action of M as an operator (4.1): if x(a) = 1 for all a ∈ V, then (M x)(b) = d for all b ∈ V.

4.2 The Largest Eigenvalue, µ1

We now examine µ1 for graphs which are not necessarily regular. Let G be a graph, let dmax be the maximum degree of a vertex in G and let dave be the average degree of a vertex in G. We will show that µ1 lies between these. This naturally holds when we measure degrees by weight.

Lemma 4.2.1. For a weighted graph G with n vertices, we define

dave def= (1/n) Σ_a d(a)   and   dmax def= max_a d(a).

If µ1 is the largest adjacency eigenvalue of G, then

dave ≤ µ1 ≤ dmax.

Proof. The lower bound follows by considering the Rayleigh quotient of 1 with respect to M:

µ1 = max_x (x^T M x)/(x^T x) ≥ (1^T M 1)/(1^T 1) = (1^T d)/n = (Σ_a d(a))/n = dave.

To prove the upper bound, let ϕ1 be an eigenvector of eigenvalue µ1. We may assume without loss of generality that ϕ1 has a positive value, because we can replace it by −ϕ1 if it does not. Let a be the vertex on which ϕ1 takes its maximum value. We show that µ1 ≤ d(a):

µ1 = (1/ϕ1(a)) (M ϕ1)(a) = (1/ϕ1(a)) Σ_{b:b∼a} w_{a,b} ϕ1(b) ≤ (1/ϕ1(a)) Σ_{b:b∼a} w_{a,b} ϕ1(a) = d(a) ≤ dmax.   (4.2)

Lemma 4.2.2. If G is connected and µ1 = dmax, then G is dmax-regular.

Proof. If we have equality in (4.2), then it must be the case that d(a) = dmax and ϕ1(b) = ϕ1(a) for all (a, b) ∈ E. Thus, we may apply the same argument to every neighbor of a. As the graph is connected, we may keep applying this argument to neighbors of vertices to which it has already been applied to show that ϕ1(c) = ϕ1(a) and d(c) = dmax for all c ∈ V.

The following is a commonly used extension of Lemma 4.2.1.

Theorem 4.2.3. Let M be an arbitrary n-by-n matrix with complex entries, and let λ be an eigenvalue of M. Then,

|λ| ≤ max_a Σ_b |M(a, b)| def= ∥M∥∞.

Proof. Let ϕ be an eigenvector of M with eigenvalue λ, and let a be the index on which ϕ has largest magnitude. We have

|λ| |ϕ(a)| = |λ ϕ(a)| = |(M ϕ)(a)| = |Σ_b M(a, b) ϕ(b)|
  ≤ Σ_b |M(a, b)| |ϕ(b)| ≤ Σ_b |M(a, b)| |ϕ(a)| = |ϕ(a)| Σ_b |M(a, b)|.

So, |λ| ≤ Σ_b |M(a, b)|.

There are graphs for which the bounds in Lemma 4.2.1 are very loose. Consider a large complete d-ary tree, T. It will have maximum degree d, an average degree just a little under 2, and µ1 close to 2√(d − 1). The following theorem establishes a tight upper bound on µ1 by re-scaling the matrix to average out the low and high degree vertices. We save the lower bound for an exercise.

Theorem 4.2.4. Let d ≥ 2 and let T be a tree in which every vertex has degree at most d. Then, all adjacency eigenvalues of T have absolute value at most 2√(d − 1).

Proof. Let M be the adjacency matrix of T. Choose some vertex to be the root of the tree, and define its height to be 0. For every other vertex a, define its height, h(a), to be its distance to the root. Define D to be the diagonal matrix with

D(a, a) = (√(d − 1))^{h(a)}.

Recall that multiplying a matrix by a diagonal matrix from the left scales its rows, that multiplying by a diagonal matrix from the right scales its columns, and that the eigenvalues of M are the same as the eigenvalues of D M D^{−1}. Theorem 4.2.3 tells us that the magnitude of these eigenvalues is at most the largest row sum of D M D^{−1}.

So, we need to prove that all row sums of D M D^{−1} are at most 2√(d − 1). There are three types of vertices to consider. First, the row of the root has up to d entries that are all 1/√(d − 1). For d ≥ 2, d/√(d − 1) ≤ 2√(d − 1). Second, every leaf has exactly one nonzero entry in its row, and that entry equals √(d − 1). The rest of the vertices have one entry in their row that equals √(d − 1), and up to d − 1 entries that are equal to 1/√(d − 1), for a total of 2√(d − 1).
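A small numerical check of Lemma 4.2.1 (ours, not from the text; the example graph, a triangle with a pendant vertex, is built by hand):

using Laplacians, LinearAlgebra, SparseArrays

# a triangle on vertices {2,3,4} with a pendant vertex 1 attached to vertex 2
ei = [1, 2, 2, 3, 3, 4, 2, 4]
ej = [2, 1, 3, 2, 4, 3, 4, 2]
A = sparse(ei, ej, ones(8), 4, 4)

d = vec(sum(A, dims = 2))                       # weighted degrees
μ1 = maximum(eigvals(Symmetric(Matrix(A))))
println(sum(d)/4 - 1e-12 <= μ1 <= maximum(d) + 1e-12)   # true: d_ave ≤ μ1 ≤ d_max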
4.3 Eigenvalue Interlacing

We can strengthen the lower bound in Lemma 4.2.1 by proving that µ1 is at least the average degree of every subgraph of G. We will prove this by applying Cauchy's Interlacing Theorem.

For a graph G = (V, E) and S ⊂ V, we define the subgraph induced by S, written G(S), to be the graph with vertex set S and all edges in E connecting vertices in S:

{(a, b) ∈ E : a ∈ S and b ∈ S}.

For a symmetric matrix M whose rows and columns are indexed by a set V and subset S of V, we write M(S) for the symmetric submatrix with rows and columns in S.

Theorem 4.3.1 (Cauchy's Interlacing Theorem). Let A be an n-by-n symmetric matrix and let B be a principal submatrix of A of dimension n − 1 (that is, B is obtained by deleting the same row and column from A). Then,

α1 ≥ β1 ≥ α2 ≥ β2 ≥ · · · ≥ α_{n−1} ≥ β_{n−1} ≥ αn,

where α1 ≥ α2 ≥ · · · ≥ αn and β1 ≥ β2 ≥ · · · ≥ β_{n−1} are the eigenvalues of A and B, respectively.

Proof. Without loss of generality we will assume that B is obtained from A by removing its first row and column. We now apply the Courant-Fischer Theorem, which tells us that

αk = max_{S⊆IR^n, dim(S)=k}  min_{x∈S} (x^T A x)/(x^T x).

Applying this to B gives

βk = max_{S⊆IR^{n−1}, dim(S)=k}  min_{x∈S} (x^T B x)/(x^T x) = max_{S⊆IR^{n−1}, dim(S)=k}  min_{x∈S} ([0; x]^T A [0; x])/(x^T x).

We see that the right-hand expression is a maximum over the special family of subspaces of IR^n of dimension k in which all vectors in the subspaces must have first coordinate 0. As the maximum over all subspaces of dimension k can only be larger,

αk ≥ βk.

We may prove the inequalities in the other direction, such as βk ≥ α_{k+1}, by replacing A and B with −A and −B, or by using the other characterization of eigenvalues in the Courant-Fischer Theorem as a minimum over subspaces.

Lemma 4.3.2. For any S ⊆ V, let dave(S) be the average degree of G(S). Then,

dave(S) ≤ µ1.

Proof. If M is the adjacency matrix of G, then M(S) is the adjacency matrix of G(S). Lemma 4.2.1 says that dave(S) is at most the largest eigenvalue of the adjacency matrix of G(S), and Theorem 4.3.1 says that this is at most µ1.

We can also prove this without using Cauchy's Interlacing Theorem. Consider the Rayleigh quotient in the characteristic vector of S, 1_S, where

1_S(a) = 1 if a ∈ S, and 0 otherwise.

Theorem 2.2.1 tells us that µ1 is at least

(1_S^T M 1_S)/(1_S^T 1_S) = (Σ_{a,b∈S, (a,b)∈E} w_{a,b})/|S| = dave(S).

If we remove the vertex of smallest degree from a graph, the average degree can increase. On the other hand, Cauchy's Interlacing Theorem says that µ1 can only decrease when we remove a vertex.

Lemma 4.3.2 is a good demonstration of Cauchy's Theorem. But, using Cauchy's Theorem to prove it was overkill. A more direct way to prove it is to emulate the proof of Lemma 4.2.1, but computing the quadratic form in the characteristic vector of S instead of 1.
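A numerical check of Theorem 4.3.1 (ours, not from the text): delete the first row and column of a random symmetric matrix and verify that its eigenvalues interlace those of the original.

using LinearAlgebra, Random
Random.seed!(0)

A = randn(6, 6); A = (A + A') / 2
B = A[2:end, 2:end]
α = sort(eigvals(A), rev = true)     # α_1 ≥ ... ≥ α_6
β = sort(eigvals(B), rev = true)     # β_1 ≥ ... ≥ β_5
println(all(α[k] + 1e-12 >= β[k] >= α[k+1] - 1e-12 for k in 1:5))   # true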
4.4 Wilf's Theorem

We now apply Lemma 4.3.2 to obtain an upper bound on the chromatic number of a graph. Recall that a coloring of a graph is an assignment of colors to vertices in which adjacent vertices have distinct colors. A graph is said to be k-colorable if it can be colored with only k colors. (To be precise, we often identify these k colors with the integers 1 through k. A k-coloring is then a function c : V → {1, . . . , k} such that c(a) ≠ c(b) for all (a, b) ∈ E.) The chromatic number of a graph, written χ(G), is the least k for which G is k-colorable. The bipartite graphs are exactly the graphs of chromatic number 2.

It is easy to show that every graph is (dmax + 1)-colorable. Assign colors to the vertices one-by-one. As each vertex has at most dmax neighbors, there is always some color one can assign that vertex that is different than those assigned to its neighbors. The following theorem of Wilf [Wil67] improves upon this bound.

Recall that for a real number x, ⌊x⌋ is the largest integer less than or equal to x.

Theorem 4.4.1.

χ(G) ≤ ⌊µ1⌋ + 1.

Proof. We prove this by induction on the number of vertices in the graph. To ground the induction, consider the graph with one vertex and no edges. It has chromatic number 1 and largest adjacency eigenvalue zero. Now, assume the theorem is true for all graphs on n − 1 vertices, and let G be a graph on n vertices. Let a be the vertex of minimum degree in G. Lemma 4.2.1 tells us that the average degree of G is at most µ1, and so the degree of a can be no larger. As the degree of a must be an integer, it can be at most ⌊µ1⌋. Let S = V \ {a}. By Lemma 4.3.2, the largest eigenvalue of G(S) is at most µ1, and so our induction hypothesis implies that G(S) has a coloring with at most ⌊µ1⌋ + 1 colors. Let c be any such coloring. We just need to show that we can extend c to a. As a has at most ⌊µ1⌋ neighbors, there is some color in {1, . . . , ⌊µ1⌋ + 1} that does not appear among its neighbors, and which it may be assigned. Thus, G has a coloring with ⌊µ1⌋ + 1 colors.

The simplest example in which this theorem improves over the naive bound of dmax + 1 is the path graph on 3 vertices: it has dmax = 2 but µ1 < 2. Thus, Wilf's theorem tells us that it can be colored with 2 colors, which is tight.

Star graphs provide more extreme examples. A star graph with n vertices has dmax = n − 1 but µ1 = √(n − 1). So, the upper bound on the chromatic number given by Wilf's theorem is much better than the naive dmax + 1. But it is far from the actual chromatic number of a star graph, 2.

c. µ1 > µ2.

Lemma 4.5.2. Under the conditions of Theorem 4.5.1, assume that some non-negative vector ϕ is an eigenvector of M. Then, ϕ is strictly positive.

Proof. If ϕ is not strictly positive, there is some vertex a for which ϕ(a) = 0. As G is connected, and ϕ is not identically zero, there must be some edge (b, c) for which ϕ(b) = 0 but ϕ(c) > 0. Let µ be the eigenvalue of ϕ. As ϕ(b) = 0, we obtain a contradiction from

0 = µ ϕ(b) = (M ϕ)(b) = Σ_{(b,z)∈E} w_{b,z} ϕ(z) ≥ w_{b,c} ϕ(c) > 0,

where the inequalities follow from the fact that the terms w_{b,z} and ϕ(z) are non-negative. So, we conclude that ϕ must be strictly positive.

Proof of Theorem 4.5.1. Let ϕ1 be an eigenvector of µ1 of norm 1, and construct the vector x such that

x(u) = |ϕ1(u)|, for all u.

We will show that x is an eigenvector of eigenvalue µ1.

µ2 = ϕ2^T M ϕ2 ≤ y^T M y ≤ µ1.

Finally, we show that for a connected graph G, µn = −µ1 if and only if G is bipartite. In fact, if µn = −µ1, then µ_{n−i} = −µ_{i+1} for every i.

Proposition 4.5.3. If G is a connected graph and µn = −µ1, then G is bipartite.

Proof. Consider the conditions necessary to achieve equality in (4.3). First, y must be an eigenvector of eigenvalue µ1. Thus, y must be strictly positive, ϕn can not have any zero values, and there must be an edge (a, b) for which ϕn(a) < 0 < ϕn(b). It must also be the case that all of the terms in

Σ_{(a,b)∈E} M(a, b) ϕn(a) ϕn(b)

have the same sign, and we have established that this sign must be negative. Thus, for every edge (a, b), ϕn(a) and ϕn(b) must have different signs. That is, the signs provide the bipartition of the vertices.

Proposition 4.5.4. If G is bipartite then the eigenvalues of its adjacency matrix are symmetric about zero.

Proof. As G is bipartite, we may divide its vertices into sets S and T so that all edges go between S and T. Let ϕ be an eigenvector of M with eigenvalue µ. Define the vector x by

x(a) = ϕ(a) if a ∈ S, and −ϕ(a) if a ∈ T.

If G is bipartite, with all edges going between the sets S and T, then the adjacency matrix of G can be written in the form

M = [0 B; B^T 0],

where the first set of rows and columns are indexed by S and the second by T. Instead of examining the entire matrix M, we can instead understand G in terms of the spectral properties of B. But, B is not necessarily symmetric nor even square. Instead of its eigenvalues, we consider its singular values and singular vectors, defined in Section 2.3.

Returning to bipartite graphs, let s = |S|, t = |T|, and assume without loss of generality that s ≤ t. One can show that the singular values of B are the square roots of the eigenvalues of B B^T.

The multiset of eigenvalues of M is comprised of t − s zeros and {±σ1, . . . , ±σs} where σ1, . . . , σs are the singular values of B. The matrix of form M is often referred to as the dilation of B.

For a treatment of the general Perron-Frobenius theory, we recommend Seneta [Sen06] or Bapat and Raghavan [BR97].

4.7 Exercises

1. Trees.

Prove that the bound proved in Theorem 4.2.4 is tight. Begin by figuring out how to state that formally.
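A numerical illustration (ours, not from the text) of Proposition 4.5.4 and of the dilation described above: the adjacency spectrum of a weighted bipartite graph consists of plus and minus the singular values of B, padded with t − s zeros.

using LinearAlgebra, Random
Random.seed!(0)

B = rand(2, 3)                          # edge weights between S (2 vertices) and T (3 vertices)
M = [zeros(2, 2) B; B' zeros(3, 3)]     # the bipartite adjacency matrix, the dilation of B
ev = sort(eigvals(Symmetric(M)))
σ = svdvals(B)
println(maximum(abs.(ev - sort([σ; -σ; 0.0]))) < 1e-8)   # true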
Part II

The Zoo of Graphs

Chapter 5

Fundamental Graphs

We will bound and derive the eigenvalues of the Laplacian matrices of some fundamental graphs, including complete graphs, star graphs, ring graphs, path graphs, and products of these that yield grids and hypercubes. As all these graphs are connected, they all have eigenvalue zero with multiplicity one. We will have to do some work to compute the other eigenvalues.

We will see in Part IV that the Laplacian eigenvalues that reveal the most about a graph are the
smallest and largest ones. To interpret the smallest eigenvalues, we will exploit a relation between
λ2 and the isoperimetric ratio of a graph that is derived in Chapter 20, and which we state here
for convenience. The boundary of a set of vertices S, written ∂(S), is the set of edges with exactly
one endpoint in S. The isoperimetric ratio of S, written θ(S), is the size of the boundary of S
divided by the size of S:

θ(S) def= |∂(S)| / |S|.

In Theorem 20.1.1, we show

θ(S) ≥ λ2 (1 − s),   where s = |S|/n.   (5.1)
Lemma 5.1.1. The Laplacian of Kn has eigenvalue 0 with multiplicity 1 and n with multiplicity
n − 1.
Proof. To compute the non-zero eigenvalues, let ψ be any non-zero vector orthogonal to the all-1s vector, so

Σ_a ψ(a) = 0.   (5.2)

We now compute the first coordinate of L_{K_n} ψ. Using (3.3), the expression for the action of the
Laplacian as an operator, we find Proof. Applying Lemma 5.2.1 to vertices i and i + 1 for 2 ≤ i < n, we find n − 2 linearly
n
independent eigenvectors of the form δ i − δ i+1 , all with eigenvalue 1. As 0 is also an eigenvalue,
X X
(LKn ψ) (1) = (ψ(1) − ψ(b)) = (n − 1)ψ(1) − ψ(b) = nψ(1), by (5.2). only one eigenvalue remains to be determined.
b≥2 b=2 Recall that the trace of a matrix equals both the sum of its diagonal entries and the sum of its
As the choice of coordinate was arbitrary, we have Lψ = nψ. So, every vector orthogonal to the eigenvalues. We know that the trace of LSn is 2n − 2, and we have identified n − 1 eigenvalues
all-1s vector is an eigenvector of eigenvalue n. that sum to n − 2. So, the remaining eigenvalue must be n.
Alternative approach. Observe that LKn = nI − 11T . To determine the corresponding eigenvector, recall that it must be orthogonal to the other
eigenvectors we have identified. This tells us that it must have the same value at each of the
points of the star. Let this value be 1, and let x be the value at vertex 1. As the eigenvector is
We often think of the Laplacian of the complete graph as being a scaling of the identity. For
orthogonal to the constant vectors, it must be that
every x orthogonal to the all-1s vector, Lx = nx .
Now, let’s see how our bound on the isoperimetric ratio works out. Let S ⊂ [n]. Every vertex in S (n − 1) + x = 0,
has n − |S| edges connecting it to vertices not in S. So,
so x = −(n − 1).
|S| (n − |S|)
θ(S) = = n − |S| = λ2 (LKn )(1 − s),
|S|
5.3 Products of graphs
where s = |S| /n. Thus, Theorem 20.1.1 is sharp for the complete graph.
We now define a product on graphs. If we apply this product to two paths, we obtain a grid. If
we apply it repeatedly to one edge, we obtain a hypercube.
5.2 The star graphs
Definition 5.3.1. Let G = (V, E, v) and H = (W, F, w) be weighted graphs. Then G × H is the
The star graph on n vertices, denoted Sn , has edge set {(1, a) : 2 ≤ a ≤ n}. The degrees of graph with vertex set V × W and edge set
vertices in star graphs vary considerably, and their Laplacian and adjacency matrices have very
different eigenvalues. a, b) with weight va,ba , where (a, b
(a, b), (b a) ∈ E and
To determine the eigenvalues of Sn , we first observe that each vertex a ≥ 2 has degree 1, and that
each of these degree-one vertices has the same neighbor. We now show that, in every graph, (a, b), (a, bb) with weight wb,bb , where (b, bb) ∈ F .
whenever two whenever two degree-one vertices share the same neighbor, they provide an
eigenvector of eigenvalue 1.
Lemma 5.2.1. Let G = (V, E) be a graph, and let a and b be vertices of degree one with the same
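For readers who like to check such computations by machine, the following small sketch (Python with numpy; the helper functions and names are ours, not part of the text) confirms the spectra claimed in Lemmas 5.1.1 and 5.2.2 before we move on to products of graphs.

# A numerical check of Lemmas 5.1.1 and 5.2.2: build the Laplacians of K_n and
# S_n and inspect their eigenvalues.
import numpy as np

def complete_graph_laplacian(n):
    A = np.ones((n, n)) - np.eye(n)          # adjacency matrix of K_n
    return np.diag(A.sum(axis=1)) - A

def star_graph_laplacian(n):
    A = np.zeros((n, n))
    A[0, 1:] = A[1:, 0] = 1                  # vertex 0 joined to all others
    return np.diag(A.sum(axis=1)) - A

n = 7
for name, L in [("K_n", complete_graph_laplacian(n)),
                ("S_n", star_graph_laplacian(n))]:
    eigs = np.round(np.linalg.eigvalsh(L), 8)
    print(name, sorted(eigs))
# K_n: eigenvalue 0 once and n with multiplicity n - 1.
# S_n: eigenvalue 0 once, 1 with multiplicity n - 2, and n once.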
5.3 Products of graphs

We now define a product on graphs. If we apply this product to two paths, we obtain a grid. If
we apply it repeatedly to one edge, we obtain a hypercube.

Definition 5.3.1. Let G = (V, E, v) and H = (W, F, w) be weighted graphs. Then G × H is the
graph with vertex set V × W and edge set

    ((a, b), (â, b)) with weight v_{a,â}, where (a, â) ∈ E, and
    ((a, b), (a, b̂)) with weight w_{b,b̂}, where (b, b̂) ∈ F.

Figure 5.1: An m-by-n grid graph is the product of a path on m vertices with a path on n vertices.
This is a drawing of a 5-by-4 grid made using Hall's algorithm from Section 3.2.
Theorem 5.3.2. Let G = (V, E, v) and H = (W, F, w) be weighted graphs with Laplacian
eigenvalues λ1, . . . , λn and µ1, . . . , µm, and eigenvectors α1, . . . , αn and β1, . . . , βm, respectively.
Then, for each 1 ≤ i ≤ n and 1 ≤ j ≤ m, the Laplacian of G × H has an eigenvector γ_{i,j} of
eigenvalue λi + µj such that

    γ_{i,j}(a, b) = αi(a)βj(b).

Proof. Let α be an eigenvector of L_G of eigenvalue λ, let β be an eigenvector of L_H of eigenvalue
µ, and let γ be defined as above.

To see that γ is an eigenvector of eigenvalue λ + µ, we compute

    (L_{G×H} γ)(a, b) = Σ_{(a,â)∈E} v_{a,â} (γ(a, b) − γ(â, b)) + Σ_{(b,b̂)∈F} w_{b,b̂} (γ(a, b) − γ(a, b̂))
        = Σ_{(a,â)∈E} v_{a,â} (α(a)β(b) − α(â)β(b)) + Σ_{(b,b̂)∈F} w_{b,b̂} (α(a)β(b) − α(a)β(b̂))
        = Σ_{(a,â)∈E} v_{a,â} β(b)(α(a) − α(â)) + Σ_{(b,b̂)∈F} w_{b,b̂} α(a)(β(b) − β(b̂))
        = β(b)λα(a) + α(a)µβ(b)
        = (λ + µ)(α(a)β(b)).

An alternative approach to defining the graph product and proving Theorem 5.3.2 is via
Kronecker products. Recall that the Kronecker product A ⊗ B of an m1-by-n1 matrix A and an
m2-by-n2 matrix B is the (m1 m2)-by-(n1 n2) matrix of the form

    [ A(1, 1)B    A(1, 2)B    · · ·   A(1, n1)B
      ...         ...                 ...
      A(m1, 1)B   A(m1, 2)B   · · ·   A(m1, n1)B ].

G × H is the graph with Laplacian matrix

    (L_G ⊗ I_W) + (I_V ⊗ L_H).

5.3.1 The Hypercube

The d-dimensional hypercube graph, H_d, is the graph with vertex set {0, 1}^d that has edges
between vertices whose names differ in exactly one coordinate. The hypercube may also be
expressed as the product of the one-edge graph with itself d − 1 times.

Let H_1 be the graph with vertex set {0, 1} and one edge between those vertices. Its Laplacian
matrix has eigenvalues 0 and 2. As H_d = H_{d−1} × H_1, we may use this to calculate the eigenvalues
and eigenvectors of H_d for every d.

The eigenvectors of H_1 are

    (1, 1) and (1, −1),

with eigenvalues 0 and 2, respectively. Thus, if ψ is an eigenvector of H_{d−1} with eigenvalue λ, then

    (ψ; ψ) and (ψ; −ψ)

are eigenvectors of H_d with eigenvalues λ and λ + 2, respectively. This means that H_d has
eigenvalue 2i for each 0 ≤ i ≤ d with multiplicity (d choose i). Moreover, each eigenvector of H_d can be
identified with a vector y ∈ {0, 1}^d:

    ψ_y(x) = (−1)^{y^T x},

where x ∈ {0, 1}^d ranges over the vertices of H_d. Each y ∈ {0, 1}^{d−1} indexing an eigenvector of
H_{d−1} leads to the eigenvectors of H_d indexed by (y, 0) and (y, 1).

Using Theorem 20.1.1 and the fact that λ2(H_d) = 2, we can immediately prove the following
isoperimetric theorem for the hypercube.

Corollary 5.3.3.

    θ_{H_d} ≥ 1.

In particular, for every set of at most half the vertices of the hypercube, the number of edges on
the boundary of that set is at least the number of vertices in that set.

This result is tight, as you can see by considering one face of the hypercube, such as all the
vertices whose labels begin with 0. It is possible to prove this by more concrete combinatorial
means. In fact, very precise analyses of the isoperimetry of sets of vertices in the hypercube can
be obtained. See [Har76] or [Bol86].
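The recursion H_d = H_{d−1} × H_1 is easy to check numerically. The following sketch (Python with numpy; not part of the original text) builds L_{H_d} via the Kronecker-sum formula above and confirms that eigenvalue 2i appears with multiplicity (d choose i).

# Build the hypercube Laplacian from repeated products,
# L_{G x H} = L_G (x) I + I (x) L_H, and tally the eigenvalue multiplicities.
import numpy as np
from math import comb

L1 = np.array([[1.0, -1.0], [-1.0, 1.0]])    # Laplacian of the one-edge graph H_1

def hypercube_laplacian(d):
    L = L1
    for _ in range(d - 1):
        m = L.shape[0]
        L = np.kron(L, np.eye(2)) + np.kron(np.eye(m), L1)
    return L

d = 4
eigs = np.round(np.linalg.eigvalsh(hypercube_laplacian(d))).astype(int)
counts = {v: int((eigs == v).sum()) for v in sorted(set(eigs))}
print(counts)                                      # {0: 1, 2: 4, 4: 6, 6: 4, 8: 1} for d = 4
print({2 * i: comb(d, i) for i in range(d + 1)})   # matches binomial(d, i)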
5.4 Bounds on λ2 by test vectors

If we can guess an approximation of ψ_2, we can often plug it in to the Laplacian quadratic form
to obtain a good upper bound on λ2. The Courant-Fischer Theorem tells us that every vector v
orthogonal to 1 provides an upper bound on λ2:

    λ2 ≤ (v^T L v) / (v^T v).

When we use a vector v in this way, we call it a test vector.

Let's see what a natural test vector can tell us about λ2 of a path graph on n vertices. I would
like to use the vector that simply maps each vertex to its index on the path, but that vector is not
orthogonal to 1. So, we will use the next best thing. Let x be the vector such that

    x(a) = (n + 1) − 2a,   for 1 ≤ a ≤ n.

This vector is orthogonal to 1, and consecutive entries differ by 2. So,

    λ2(P_n) ≤ Σ_{1≤a<n} (x(a) − x(a + 1))² / Σ_a x(a)²
            = Σ_{1≤a<n} 2² / Σ_a x(a)²
            = 4(n − 1) / ((n + 1)n(n − 1)/3)
            = 12 / (n(n + 1)).    (5.3)

(Clearly, the denominator is at least n³/c for some constant c.)

We will soon see that this bound is of the right order of magnitude. This means that the relation
between λ2 and the isoperimetric ratio given in Eq. (5.1) (and proved in Theorem 20.1.1) is very
loose for the path graph: the isoperimetric ratio is minimized by the set S = {1, . . . , n/2}, which
has θ(S) = 2/n. However, the lower bound provided by Eq. (5.1) is of the form c/n². Cheeger's
inequality, which appears in Chapter 21, will tell us that the error of this approximation can not
be worse than quadratic.
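A quick numerical sanity check of (5.3) (a Python/numpy sketch, not part of the text): the Rayleigh quotient of the test vector equals 12/(n(n + 1)), while the true λ2(P_n) is roughly π²/n².

# Compare the test-vector bound (5.3) with the true value of lambda_2 for P_n.
import numpy as np

def path_laplacian(n):
    L = np.zeros((n, n))
    for a in range(n - 1):
        L[a, a] += 1; L[a + 1, a + 1] += 1
        L[a, a + 1] -= 1; L[a + 1, a] -= 1
    return L

for n in [10, 50, 200]:
    L = path_laplacian(n)
    lam2 = np.linalg.eigvalsh(L)[1]
    x = (n + 1) - 2 * np.arange(1, n + 1)      # the test vector x(a) = (n+1) - 2a
    rq = x @ L @ x / (x @ x)                   # its Rayleigh quotient
    print(n, lam2, 12 / (n * (n + 1)), rq)
# lambda_2 is about pi^2/n^2, and the bound 12/(n(n+1)) equals the Rayleigh quotient.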
The Courant-Fischer theorem is not as helpful when we want to prove lower bounds on λ2. To
prove lower bounds, we need the form with a maximum on the outside, which gives

    λ2 ≥ max_{S : dim(S)=n−1} min_{v∈S} (v^T L v) / (v^T v).

This is not too helpful, as it is difficult to prove lower bounds on

    min_{v∈S} (v^T L v) / (v^T v)

over a space S of large dimension. We will see a technique that lets us prove such lower bounds
in the next chapter. But, first we compute the eigenvalues and eigenvectors of the path graph exactly.

5.5 The Ring Graph

The ring graph on n vertices, R_n, may be viewed as having a vertex set corresponding to the
integers modulo n. In this case, we view the vertices as the numbers 0 through n − 1, with edges
(a, a + 1), computed modulo n.

Theorem 5.5.1. The Laplacian of R_n has eigenvectors

    x_k(a) = cos(2πka/n), for 0 ≤ k ≤ n/2, and
    y_k(a) = sin(2πka/n), for 1 ≤ k < n/2.

Eigenvectors x_k and y_k have eigenvalue 2 − 2 cos(2πk/n).

Figure 5.2: (a) The ring graph on 9 vertices. (b) The eigenvectors for k = 2.

Note that x_0 is the all-ones vector. When n is even the vector x_{n/2} exists, and it alternates ±1.

Proof. We will first see that x_1 and y_1 are eigenvectors by drawing the ring graph on the unit
circle in the natural way: plot vertex a at point (cos(2πa/n), sin(2πa/n)).

You can see that the average of the neighbors of a vertex is a vector pointing in the same
direction as the vector associated with that vertex. This should make it obvious that both the x
and y coordinates in this figure are eigenvectors of the same eigenvalue. The same holds for all k.

Alternatively, we can verify that these are eigenvectors by a simple computation, recalling that
cos(a + b) = cos a cos b − sin a sin b and sin(a + b) = sin a cos b + cos a sin b:

    (L_{R_n} x_k)(a) = 2x_k(a) − x_k(a + 1) − x_k(a − 1)
        = 2 cos(2πka/n) − cos(2πk(a + 1)/n) − cos(2πk(a − 1)/n)
        = 2 cos(2πka/n) − cos(2πka/n) cos(2πk/n) + sin(2πka/n) sin(2πk/n)
                        − cos(2πka/n) cos(2πk/n) − sin(2πka/n) sin(2πk/n)
        = 2 cos(2πka/n) − 2 cos(2πka/n) cos(2πk/n)
        = (2 − 2 cos(2πk/n)) x_k(a).

The computation for y_k follows similarly.

Of course, rotating these pairs of eigenvectors results in another eigenvector of the same
eigenvalue. To see this algebraically, observe that

    cos(2πka/n + θ) = cos(2πka/n) cos θ − sin(2πka/n) sin θ = (cos θ)x_k(a) − (sin θ)y_k(a),

and thus the rotated vector is in the span of x_k and y_k.

5.6 The Path Graph

We will derive the eigenvalues and eigenvectors of the path graph from those of the ring graph.
To begin, I will number the vertices of the ring a little differently, as in Figure 5.3.

Figure 5.3: The ring on 8 = 2n vertices, numbered differently. In this picture, vertex a appears
above vertex n + a, and every edge appears above another edge.

Theorem 5.6.1. The Laplacian of P_n has eigenvalues 2 − 2 cos(πk/n) and eigenvectors

    v_k(a) = cos(πka/n − πk/2n),

for 0 ≤ k < n.

Proof. We derive the eigenvectors and eigenvalues by treating P_n as a quotient of R_{2n}: we will
identify vertex a of P_n with vertices a and a + n of R_{2n} (under the new numbering of R_{2n}). These
are pairs of vertices that are above each other in the figure that I drew.

Let I_n be the n-dimensional identity matrix. You should check that

    [I_n  I_n] L_{R_{2n}} [I_n; I_n] = 2 L_{P_n},

where [I_n; I_n] denotes two copies of I_n stacked vertically.

For any eigenvector ψ of R_{2n} with eigenvalue λ for which ψ(a) = ψ(a + n) for 1 ≤ a ≤ n, the
above equation gives us a way to turn this into an eigenvector of P_n. Let ϕ ∈ IR^n be the vector
for which

    ϕ(a) = ψ(a), for 1 ≤ a ≤ n.

Then,

    [I_n; I_n] ϕ = ψ,   L_{R_{2n}} [I_n; I_n] ϕ = λψ = λ [I_n; I_n] ϕ,   and   [I_n  I_n] L_{R_{2n}} [I_n; I_n] ϕ = 2λϕ.

So, if we can find such a vector ψ, then the corresponding ϕ is an eigenvector of P_n of eigenvalue
λ.

As you've probably guessed, we can find such vectors ψ by rotating those derived in
Theorem 5.5.1. For each of the n − 1 two-dimensional eigenspaces of R_{2n}, we get one such vector.
I've drawn Figure 5.3 so that the horizontal coordinate provides one.

For each 1 ≤ k < n, we rotate the eigenvectors x_k and y_k of R_{2n} by θ = −πk/2n to obtain
ψ = (cos θ)x_k − (sin θ)y_k, so that, under the new ordering of the ring,

    ψ(a) = cos(2πka/(2n) − πk/2n)             for 1 ≤ a ≤ n,
    ψ(a) = cos(2πk(2n + 1 − a)/(2n) − πk/2n)  for n < a ≤ 2n.

We now set

    v_k(a) = ψ(a) = cos(πka/n − πk/2n).

The type of quotient used in the above argument is known as an equitable partition. You can find
an extensive exposition of these in Godsil's book [God93].
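Readers can verify these eigenpairs numerically. The following sketch (Python with numpy; not part of the text) checks that v_k is an eigenvector of L_{P_n} with eigenvalue 2 − 2 cos(πk/n) for every k.

# Verify the path-graph eigenpairs of Theorem 5.6.1 on a small example.
import numpy as np

n = 8
L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i] += 1; L[i + 1, i + 1] += 1
    L[i, i + 1] -= 1; L[i + 1, i] -= 1

a = np.arange(1, n + 1)
for k in range(n):
    v = np.cos(np.pi * k * a / n - np.pi * k / (2 * n))   # v_k(a)
    lam = 2 - 2 * np.cos(np.pi * k / n)
    assert np.allclose(L @ v, lam * v)                    # holds for every 0 <= k < n
print("all", n, "eigenpairs verified")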
Chapter 6

Comparing Graphs

It is rare that one can analytically determine the eigenvalues of an abstractly defined graph.
Often the best one can do is prove loose bounds on some eigenvalues. It is usually easy to prove
lower bounds on the largest eigenvalues or upper bounds on the smallest eigenvalues: one need
merely compute the value of the quadratic form in a suitably chosen test vector, and then apply
the Courant-Fischer Theorem (see Theorem 2.0.1 and Section 5.4). Proving bounds in the other
direction is more difficult.

In this chapter we will see a powerful technique that allows one to compare one graph with
another, and thereby prove things like lower bounds on the second-smallest eigenvalue of a
Laplacian. The technique often goes by the name "Poincaré Inequalities" (see
[DS91, SJ89, GLM99]), or "Graphic inequalities".

6.2 The Loewner order

I begin by recalling an extremely useful piece of notation that is used in the optimization
community. For a symmetric matrix A, we write

    A ≽ 0

if A is positive semidefinite. That is, if all of the eigenvalues of A are nonnegative. This is
equivalent to

    v^T A v ≥ 0,

for all v. We similarly write

    A ≽ B

if

    A − B ≽ 0,

which is equivalent to

    v^T A v ≥ v^T B v

for all v.

The relation ≽ is called the Loewner partial order. It is partial because it orders some pairs of
symmetric matrices, while others are incomparable. But, for all pairs to which it does apply, it
acts like an order. For example, we have

    A ≽ B and B ≽ C implies A ≽ C.

We overload this notation by writing G ≽ H for graphs G and H if L_G ≽ L_H. When we write
this, we are always describing an inequality on Laplacian matrices.

For example, if G = (V, E) is a graph and H = (V, F) is a subgraph of G with the same vertex
set, then

    L_G ≽ L_H.

To see this, recall the Laplacian quadratic form:

    x^T L_G x = Σ_{(u,v)∈E} w_{u,v} (x(u) − x(v))².

It is clear that dropping edges can only decrease the value of the quadratic form, because all
edge weights are positive. The same holds for decreasing the weights of edges.

This notation is particularly useful when we consider some multiple of a graph, such as when we
write

    G ≽ c · H,

for some c > 0. What is c · H? It is the same graph as H, but the weight of every edge is
multiplied by c. More formally, it is the graph such that L_{c·H} = cL_H.

We usually use this notation for the inequalities it implies on the eigenvalues of L_G and L_H.

Lemma 6.2.1. For any c > 0, if G and H are graphs such that

    G ≽ c · H,

then

    λ_k(G) ≥ cλ_k(H),

for all k.
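Here is a small numerical illustration of Lemma 6.2.1 with c = 1 (a Python/numpy sketch, not part of the text): deleting edges from a graph G produces a subgraph H with L_G ≽ L_H, and every Laplacian eigenvalue of H is at most the corresponding eigenvalue of G.

# Check that a subgraph H of G satisfies L_G - L_H PSD and lambda_k(G) >= lambda_k(H).
import numpy as np
rng = np.random.default_rng(0)

n = 12
A = rng.integers(0, 2, size=(n, n))
A = np.triu(A, 1); A = A + A.T                          # a random simple graph G
B = A * (rng.random((n, n)) < 0.5)
B = np.triu(B, 1); B = B + B.T                          # H: a random subgraph of G

LG = np.diag(A.sum(axis=1)) - A
LH = np.diag(B.sum(axis=1)) - B

print(np.all(np.linalg.eigvalsh(LG - LH) >= -1e-9))     # L_G - L_H is PSD
print(np.all(np.linalg.eigvalsh(LG) >= np.linalg.eigvalsh(LH) - 1e-9))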
6.4 The Path Inequality

By now you should be wondering, "how do we prove that G ≽ c · H for some graphs G and H?"
Not too many ways are known. We'll do it by proving some inequalities of this form for some of
the simplest graphs, and then extending them to more general graphs. For example, we will prove

    (n − 1) · P_n ≽ G_{1,n},    (6.1)

where P_n is the path from vertex 1 to vertex n, and G_{1,n} is the graph with just the edge (1, n).
All of these edges are unweighted.

The following very simple proof of this inequality was discovered by Sam Daitch.

Lemma 6.4.1.

    (n − 1) · P_n ≽ G_{1,n}.

In Theorem 5.6.1 we proved that λ2(P_n) = 2(1 − cos(π/n)), which is approximately π²/n² for
large n. We now demonstrate the power of Lemma 6.4.1 by using it to prove a lower bound on
λ2(P_n) that is very close to this.

To prove a lower bound on λ2(P_n), we will prove that some multiple of the path is at least the
complete graph. To use this bound, we need to know the eigenvalues of the complete graph. In
Lemma 5.1.1, we show that all the non-zero eigenvalues of the Laplacian of the complete graph
are n, and in particular λ2(K_n) = n.

Let G_{a,b} denote the graph containing only edge (a, b), and write

    L_{K_n} = Σ_{a<b} L_{G_{a,b}}.

For every a < b, let P_{a,b} be the subgraph of the path graph induced on vertices with indices
between a and b. Note that this is a path graph of length b − a.

For every edge (a, b) in the complete graph, we apply the only inequality available in the path:

    G_{a,b} ≼ (b − a) · P_{a,b} ≼ (b − a) · P_n.    (6.2)

This inequality says that G_{a,b} is at most (b − a) times the part of the path connecting a to b, and
that this part of the path is less than the whole.

Summing inequality (6.2) over all edges (a, b) ∈ K_n gives

    K_n = Σ_{a<b} G_{a,b} ≼ Σ_{a<b} (b − a) P_n.

To finish the proof, we compute

    Σ_{1≤a<b≤n} (b − a) = Σ_{c=1}^{n−1} c(n − c) = n(n + 1)(n − 1)/6.

So,

    L_{K_n} ≼ (n(n + 1)(n − 1)/6) L_{P_n}.

Rewriting this inequality in the form

    L_{P_n} ≽ (6 / (n(n + 1)(n − 1))) L_{K_n},

recalling that λ2(K_n) = n, and applying Lemma 6.2.1, gives us a pretty good lower bound on the
second-smallest eigenvalue of P_n:

    λ2(P_n) ≥ 6 / ((n + 1)(n − 1)).

We now generalize the inequality in Lemma 6.4.1 to weighted path graphs. Allowing for
weights on the edges of the path greatly extends its applicability.

Figure 6.1: T_3, T_7 and T_15. Node 1 is at the top, 2 and 3 are its children. Some other nodes have
been labeled as well.

Figure 6.2: The test vector we use to upper bound λ2(T_15).

We will again prove a lower bound by comparing T_n to the complete graph. For each a < b, let
T_{a,b} denote the unique path in T from a to b. This path will have length at most 2d ≤ 2 log2 n.
So, we have

    K_n = Σ_{a<b} G_{a,b} ≼ Σ_{a<b} (2d) T_{a,b} ≼ Σ_{a<b} (2 log2 n) T_n = (n choose 2)(2 log2 n) T_n.

Theorem 6.7.1.

    λ2(T_n) ≥ 1/(2n).
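The comparison bounds above are easy to test numerically. The following sketch (Python with numpy; not part of the text) compares λ2 of the path and of the complete binary tree against the bounds 6/((n + 1)(n − 1)) and 1/(2n).

# Compare exact values of lambda_2 with the lower bounds from graph comparisons.
import numpy as np

def laplacian_from_edges(n, edges):
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    return L

n = 63
path_edges = [(a, a + 1) for a in range(n - 1)]
tree_edges = [(c, (c - 1) // 2) for c in range(1, n)]   # complete binary tree on n = 2^d - 1 nodes

lam2_path = np.linalg.eigvalsh(laplacian_from_edges(n, path_edges))[1]
lam2_tree = np.linalg.eigvalsh(laplacian_from_edges(n, tree_edges))[1]
print(lam2_path, 6 / ((n + 1) * (n - 1)))   # exact value vs. the comparison bound
print(lam2_tree, 1 / (2 * n))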
The Paley graphs are Cayley graphs over the group of integers modulo a prime, p, where p is
equivalent to 1 modulo 4. Such a group is often written Z/p.
I should begin by reminding you a little about the integers modulo p. The first thing to remember
Chapter 7 is that the integers modulo p are actually a field, written Fp . That is, they are closed under both
addition and multiplication (completely obvious), have identity elements under addition and
multiplication (0 and 1), and have inverses under addition and multiplication. It is obvious that
the integers have inverses under addition: −x modulo p plus x modulo p equals 0. It is a little less
Cayley Graphs obvious that the integers modulo p have inverses under multiplication (except that 0 does not
have a multiplicative inverse). That is, for every x ̸= 0, there is a y such that xy = 1 modulo p.
When we write 1/x, we mean this element y.
The generators of the Paley graphs are the squares modulo p (usually called the quadratic
residues). That is, the set of numbers s such that there exists an x for which x² ≡ s (mod p). Thus, the
7.1 Cayley Graphs vertex set is {0, . . . , p − 1}, and there is an edge between vertices u and v if u − v is a square
modulo p. I should now prove that −s is a quadratic residue if and only if s is. This will hold
Ring graphs and hypercubes are types of Cayley graphs. In general, the vertices of a Cayley provided that p is equivalent to 1 modulo 4. To prove that, I need to tell you one more thing
graph are the elements of some group Γ. In the case of the ring, the group is the set of integers about the integers modulo p: their multiplicative group is cyclic.
modulo n. The edges of a Cayley graph are specified by a set S ⊂ Γ, which are called the
generators of the Cayley graph. The set of generators must be closed under inversion. That is, if Fact 7.2.1. For every prime p, there exists a number g such that for every number x between 1
s ∈ S, then s−1 ∈ S. Vertices a, b ∈ Γ are connected by an edge if there is an s ∈ S such that and p − 1, there is a unique i between 1 and p − 1 such that
a ◦ s = b, x ≡ gi mod p.
where ◦ is the group operation. In the case of Abelian groups, like the integers modulo n, this In particular, g p−1 ≡ 1. And, the mapping between x and i is a bijection.
would usually be written a + s = b. The generators of the ring graph are {1, −1}.
Corollary 7.2.2. If p is a prime equivalent to 1 modulo 4, then −1 is a square modulo p.
The d-dimensional hypercube, Hd , is a Cayley graph over the additive group (Z/2)d : that is the
set of vectors in {0, 1}d under addition modulo 2. The generators are given by the vectors in Proof. Let i be the number between 1 and p − 1 such that g i ≡ −1 modulo p. Then, g 2i ≡ 1
{0, 1}d that have a 1 in exactly one position. This set is closed under inverse, because every modulo p and so by Fact 7.2.1 we know that 2i must be equivalent to p − 1 modulo p. The only
element of this group is its own inverse. number between 1 and p that satisfies this relation is i = (p − 1)/2.
We require S to be closed under inverse so that the graph is undirected: As 4 divides p − 1, (p − 1)/4 is an integer. So, we can set s = g (p−1)/4 , and finish the proof by
observing that s2 ≡ g (p−1)/2 ≡ −1 modulo p.
a+s=b ⇐⇒ b + (−s) = a.
We now understand a lot about the squares modulo p (formally called quadratic residues). The
Cayley graphs over Abelian groups are particularly convenient because we can find an
squares are exactly the elements g i where i is even. As g i g j = g i+j , the fact that −1 is a square
orthonormal basis of eigenvectors without knowing the set of generators. They just depend on the
implies that s is a square if and only if −s is a square. So, S is closed under negation, and the
group1 . Knowing a basis of eigenvectors makes it much easier to compute the eigenvalues. We
Cayley graph of Z/p with generator set S is in fact a graph. As |S| = (p − 1)/2, it is regular of
give the computations of the eigenvectors in sections 7.4 and 7.8.
degree
We will now examine two exciting types of Cayley graphs: Paley graphs and generalized p−1
d= .
hypercubes. 2
1
More precisely, the characters always form an orthonormal set of eigenvectors, and the characters just depend
upon the group. When two different characters have the same eigenvalue, we obtain an eigenspace of dimension
greater than 1. These eigenspaces do depend upon the choice of generators.
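Before analyzing their spectra, here is a small construction sketch (Python with numpy; not part of the text). It builds the Paley graph for a prime p ≡ 1 mod 4 and checks that it is (p − 1)/2-regular; the next section shows that its nonzero Laplacian eigenvalues are (p ± √p)/2.

# Build a Paley graph and inspect its degrees and Laplacian eigenvalues.
import numpy as np

p = 13                                            # a prime congruent to 1 mod 4
squares = {(x * x) % p for x in range(1, p)}      # the quadratic residues
A = np.array([[1.0 if (u - v) % p in squares else 0.0 for v in range(p)]
              for u in range(p)])
L = np.diag(A.sum(axis=1)) - A

print(set(A.sum(axis=1)))                         # every degree equals (p - 1)/2 = 6
print(sorted(set(np.round(np.linalg.eigvalsh(L), 6))))
# eigenvalues: 0, (p - sqrt(p))/2 ~ 4.697, (p + sqrt(p))/2 ~ 8.303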
7.3 Eigenvalues of the Paley Graphs As we are considering a non-diagonal entry, w ̸= 0. The term in the sum for y = 0 is zero. When
y ̸= 0, χ(y) ∈ ±1, so
Let L be the Laplacians matrix of the Paley graph on p vertices. A remarkable feature of Paley χ(y)χ(w + y) = χ(w + y)/χ(y) = χ(w/y + 1).
graph is that L2 can be written as a linear combination of L, J and I , where J is the all-1’s As y varies over {1, . . . , p − 1}, w/y also varies over all of {1, . . . , p − 1}. So, w/y + 1 varies over
matrix. We will prove that all elements other than 1. This means that
p−1 p(p − 1) !
L2 = pL + J− I. (7.1) p−1
X p−1
X
4 4
χ(y)χ(w + y) = χ(z) − χ(1) = 0 − 1 = −1.
The proof will be easiest if we express L in terms of a matrix X defined by the quadratic y=0 z=0
character :
1 if x is a quadratic residue modulo p So, every off-diagonal entry in X 2 is −1.
χ(x) = 0 if x = 0, and Corollary 7.3.2. The nonzero eigenvalues of the Laplacian of the Paley graph on p vertices are
−1 otherwise.
1 √
This is called a character because it satisfies χ(xy) = χ(x)χ(y). We will use this to define a (p ± p) .
2
matrix X by
X (a, b) = χ(a − b). Proof. Let ϕ be an eigenvector of L of eigenvalue λ ̸= 0. As ϕ is orthogonal to the all-1s vector,
Using the fact that J ϕ = 0. So, Equation (7.1) implies
p−1
2 if a = b, p(p − 1)
L(a, b) = −1 if χ(a − b) = 1, and λ2 ϕ = L2 ϕ = pLϕ − I ϕ = (pλ − p(p − 1)/4)ϕ.
4
0 otherwise,
This equation tells us that λ satisfies
we find that
2L = pI − J − X . (7.2) p(p − 1)
λ2 − pλ + = 0,
4
Equation (7.1) follows from this relation, the following lemma, and the relations J 2 = pJ and which implies
X J = J X = 0. The latter follows from the fact that each row and column of X has exactly 1 √
λ∈ (p ± p) .
(p − 1)/2 entries that are 1, (p − 1)/2 entries that are −1, and one entry that is 0. 2
Lemma 7.3.1.
X 2 = pI − J .
The fact that Paley graphs have only two nonzero eigenvalues means that they are strongly
Proof. The diagonal entries of X 2 are the squares of the norms of the columns of X . As each regular graphs. We will discuss those further in Chapter 9. The fact that those eigenvalues are
contains (p − 1)/2 entries that are 1, (p − 1)/2 entries that are −1, and one entry that is 0, its very close to each other means that
p 2 times a Paley graph (the graph we obtain by assigning
squared norm is p − 1. weight 2 to every edge) is a 1 + 1/p approximation of the complete graph, up to a very small
factor. Paley graphs have also been shown by Chung, Graham, and Wilson to have many
To handle the off-diagonal entries, we observe that X is symmetric, and so the off-diagonal properties in common with random graphs [CGW89].
entries of X 2 are the inner products of columns of X . That is,
p−1
X p−1
X
X (a, b) = χ(a − x)χ(b − x) = χ(y)χ((b − a) + y), 7.4 Generalizing Hypercubes
x=0 y=0
To generalize the hypercube, we will consider Cayley graphs over the same group, but with more
where we have set y = a − x. For convenience, set w = b − a, so we can write this more simply as
generators. Recall that we view the vertex set as the vectors in {0, 1}d , modulo 2. Each
p−1
X generator, g 1 , . . . , g k , is in the same group. As g + g = 0 modulo 2 for all g ∈ {0, 1}d , the set of
χ(y)χ(w + y). generators is automatically closed under negation.
y=0
CHAPTER 7. CAYLEY GRAPHS 65 CHAPTER 7. CAYLEY GRAPHS 66
Let G be the Cayley graph with these generators. To be concrete, set V = {0, 1}d , and note that 7.5 A random set of generators
G has edge set
(x , x + g j ) : x ∈ V, 1 ≤ j ≤ k . We will now show that if we choose the set of generators uniformly at random, for k some
constant multiple of the dimension, then we obtain a graph that is a good approximation of the
Using the analysis of products of graphs in Theorem 5.3.2, we derived a set of eigenvectors of Hd .
complete graph. That is, all the eigenvalues of the Laplacian will be close to k. This construction
We will now verify that these are eigenvectors for all generalized hypercubes. Knowing these will
comes from the work of Alon and Roichman [AR94]. We will set k = cd, for some c > 1. Think of
make it easy to describe the eigenvalues.
c = 2, c = 10, or c = 1 + ϵ.
For each b ∈ {0, 1}d , define the function ψ b from V to the reals given by
For b ∈ {0, 1}d but not the zero vector, and for g chosen uniformly at random from {0, 1}d , b T g
bT x modulo 2 is uniformly distributed in {0, 1}, and so
ψ b (x ) = (−1) .
T
When we write b T x , you might wonder if we mean to take the sum over the reals or modulo 2. (−1)b g
As both b and x are {0, 1}-vectors, and the result is used as the exponent of −1, you get the
is uniformly distributed in ±1. So, if we pick g 1 , . . . , g k independently and uniformly from
same answer either way you do it.
{0, 1}d , the eigenvalue corresponding to the eigenvector ψ b is
While it is natural to think of b as being a vertex, that is the wrong perspective. Instead, you k
X
def T
should think of b as indexing a Fourier coefficient (if you don’t know what a Fourier coefficient is, λb = k − (−1)b gi
.
just don’t think of it as a vertex). i=1
The eigenvectors and eigenvalues of the graph are determined by the following theorem. As this The right-hand part is a sum of independent, uniformly chosen ±1 random variables. So, we
graph is k-regular, the eigenvectors of the adjacency and Laplacian matrices will be the same. know it is concentrated around 0, and thus λb will be concentrated around k. To determine how
Lemma 7.4.1. For each b ∈ {0, 1}d the vector ψ b is a Laplacian matrix eigenvector with concentrated the sum actually is, we use a Chernoff bound. There are many forms of Chernoff
eigenvalue bounds. We will not use the strongest, but settle for one which is simple and which gives results
X k
T
that are qualitatively correct.
k− (−1)b g i .
Theorem 7.5.1. Let x1 , . . . , xk be independent ±1 random variables. Then, for all t > 0,
i=1 " #
X 2
Proof. We begin by observing that Pr xi ≥ t ≤ 2e−t /2k .
T T T
i
ψ b (x + y ) = (−1)b (x +y )
= (−1)b x
(−1)b y
= ψ b (x )ψ b (y ). (7.3)
This becomes very small when t is a constant fraction of k. In fact, it becomes so small that it is
Let L be the Laplacian matrix of the graph. For any b ∈ {0, 1}d and any vertex x ∈ V , we use unlikely that any eigenvalue deviates from k by more than t.
(7.3) to compute Theorem 7.5.2. With high probability, all of the nonzero eigenvalues of the generalized hypercube
k
X differ from k by at most r
(Lψ b )(x ) = kψ b (x ) − ψ b (x + g i ) 2
k ,
i=1 c
Xk where k = cd.
= kψ b (x ) − ψ b (x )ψ b (g i )
i=1 p
! Proof. Let t = k 2/c. Then, for every nonzero b,
k
X 2 /2k
= ψ b (x ) k − ψ b (g i ) . Pr [|k − λb | ≥ t] ≤ 2e−t ≤ 2e−k/c = 2e−d .
i=1
Now, the probability that there is some b for which λb violates these bounds is at most the sum
So, ψ b is an eigenvector of eigenvalue
of these terms:
k
X k
X X
k− ψ b (g i ) = k − (−1)b
T
gi
. Pr [∃b : |k − λb | ≥ t] ≤ Pr [|k − λb | ≥ t] ≤ (2d − 1)2e−d ,
i=1 i=1 b∈{0,1}d ,b̸=0d
which is always less than 1 and goes to zero exponentially quickly as d grows.
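The concentration in Theorem 7.5.2 can be observed directly. The following sketch (Python with numpy; not part of the text) samples k = cd random generators and compares the largest deviation |k − λ_b| against the bound k√(2/c).

# Sample a random generalized hypercube and check the eigenvalue concentration.
import numpy as np
rng = np.random.default_rng(1)

d, c = 10, 10
k = c * d
G = rng.integers(0, 2, size=(k, d))               # generators g_1, ..., g_k in {0,1}^d

# Eigenvalues indexed by nonzero b in {0,1}^d, using Lemma 7.4.1.
lams = []
for idx in range(1, 2 ** d):
    b = np.array([(idx >> j) & 1 for j in range(d)])
    lams.append(k - np.sum((-1.0) ** (G @ b)))
lams = np.array(lams)

print(np.max(np.abs(k - lams)), k * np.sqrt(2 / c))   # observed deviation vs. the bound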
We initially suggested thinking of c = 2 or c = 10. The above bound works for c = 10. To get a where i is an integer that satisfies i2 = −1 modulo p.
useful bound for c = 2, we need to sharpen the analysis. A naive sharpening will work down to
Even more explicit constructions, which do not require solving equations, may be found
c = 2 ln 2. To go lower than that, you need a stronger Chernoff bound.
in [ABN+ 92].
7.6 Conclusion
7.8 Eigenvectors of Cayley Graphs of Abelian Groups
We have now seen that a random generalized hypercube of degree k probably has all non-zero
The wonderful thing about Cayley graphs of Abelian groups is that we can construct an
Laplacian eigenvalues between k(1 − √(2/c)) and k(1 + √(2/c)).
orthonormal basis of eigenvectors for these graphs without even knowing the set of generators S.
That is, the eigenvectors only depend upon the group. Related results also hold for Cayley graphs
of arbitrary groups, and are related to representations of the groups. See [Bab79] for details.
If we let n be the number of vertices, and we now multiply the weight of every edge by n/k, we
obtain a graph with all nonzero Laplacian eigenvalues between As Cayley graphs are regular, it won’t matter which matrix we consider. For simplicity, we will
p p consider adjacency matrices.
n(1 − 2/c) and n(1 + 2/c).
p Let n be an integer and let G be a Cayley graph on Z/n with generator set S. When S = {±1},
Thus, this is essentially a 1 + 2/c approximation of the complete graph on n vertices. But, the
degree of every vertex is only c log2 n. Expanders are infinite families of graphs that are we get the ring graphs. For general S, I think of these as generalized Ring graphs. Let’s first see
constant-factor approximations of complete graphs, but with constant degrees. that they have the same eigenvectors as the Ring graphs.
We know that random regular graphs are probably expanders. If we want explicit constructions, Recall that we proved that the vectors x k and y k were eigenvectors of the ring graphs, where
we need to go to non-Abelian groups. x k (u) = sin(2πku/n), and
Explicit constructions that achieve bounds approaching those of random generalized hypercubes y k (u) = cos(2πku/n),
come from error-correcting codes. for 1 ≤ k ≤ n/2.
Explicit constructions allow us to use these graphs in applications that require us to implicitly Let’s just do the computation for the x k , as the y k are similar. For every u modulo n, we have
deal with a very large graph. In Chapter 31, we will see how to use such graphs to construct X
pseudo-random generators. (Ax k )(u) = x k (u + g)
g∈S
1 X
7.7 Non-Abelian Groups = x k (u + g) + x k (u − g)
2
g∈S
In the homework, you will show that it is impossible to make constant-degree expander graphs 1 X
from Cayley graphs of Abelian groups. The best expanders are constructed from Cayley graphs of = sin(2πk(u + g)/n) + sin(2πk(u − g)/n)
2
2-by-2 matrix groups. In particular, the Ramanujan expanders of Margulis [Mar88] and g∈S
Lubotzky, Phillips and Sarnak [LPS88] are Cayley graphs over the Projective Special Linear
1 X
Groups PSL(2, p), where p is a prime. These are the 2-by-2 matrices modulo p with determinant = 2 sin(2πku/n) cos(2πkg/n)
1, in which we identify A with −A. 2
g∈S
X
They provided a very concrete set of generators. For a prime q congruent to 1 modulo 4, it is = sin(2πku/n) cos(2πkg/n)
known that there are p + 1 solutions to the equation g∈S
X
a21 + a22 + a23 + a24 = p, = x k (u) cos(2πkg/n).
g∈S
where a1 is odd and a2 , a3 and a4 are even. We obtain a generator for each such solution of the
So, the corresponding eigenvalue is X
form:
1 a0 + ia1 a2 + ia3 cos(2πkg/n).
√ ,
p −a2 + ia3 a0 − ia1 g∈S
Chapter 8
8.1 Transformation
The reason we write it this way is that the expectation of every entry of R, and thus of R, is zero.
We will show that with high probability all of the eigenvalues of R are small, and thus
we can view M as being approximately p(J − I ).
As p(J − I ) has one eigenvalue of p(n − 1) and n − 1 eigenvalues of −p, the bulk of the as x is a unit vector.
distribution of eigenvalues of M is very close to the distribution of eigenvalues of R, minus p. To
We thereby obtain the following bound on x T Rx .
see this, first subtract pI from R. This shifts all the eigenvalues down by p. We must now add
pJ . As J is a rank-1 matrix, we can show that the eigenvalues of pJ + (R − pI ) interlace the Lemma 8.2.2. For every unit vector x ,
eigenvalues of R − pI (see the exercise at the end of the chapter). So, the largest eigenvalue 2
moves up a lot, and all the other n − 1 move up to at most the next eigenvalue. PrR x T Rx ≥ t ≤ 2e−t .
Our first analysis will be a crude but simple upper bound on the norm of R.
Proof. The expectation of x T Rx is 0. The preceding argument tells us that
Pr x T Rx ≥ t ≤ Pr x T Rx ≥ t + Pr x T Rx ≤ −t
8.2 The extreme eigenvalues T T
≤ Pr x Rx ≥ t + Pr x (−R)x ≥ t
2
≤ 2e−t ,
For this section, we make the simplifying assumption that p = 1/2. At the end, we will explain
how to handle the general case. where we have exploited the fact that R and −R are identically distributed.
Recall that
x T Rx
∥R∥ = max . 8.2.1 Vectors near v 1
x xTx
Each R(i, j) is a random variable that is independently and uniformly distributed in ±1/2. You might be wondering what good the previous argument will do us. We have shown that it is
To begin, we fix any unit vector x , and consider unlikely that the Rayleigh quotient of any given x is large. But, we have to reason about all x of
X unit norm.
x T Rx = 2R(i, j)x (i)x (j).
Lemma 8.2.3. Let R be a symmetric matrix and let v be a unit eigenvector of R whose
i<j
eigenvalue has absolute value ∥R∥. If x is another unit vector such that
This is a sum of independent random variables, and so may be proved to be tightly concentrated √
around its expectation, which in this case is zero. There are many types of concentration bounds, v T x ≥ 3/2,
with the most popular being the Chernoff and Hoeffding bounds. In this case we will apply
then
Hoeffding’s inequality. 1
x T Rx ≥ ∥R∥ .
2
Theorem 8.2.1 (Hoeffding’s Inequality). Let a1 , . . . , am and b1 , . . . , bm be real numbers and let
X1 , . . . ,P
Xm be independent random variables such that Xi takes values between ai and bi . Let Proof. Let λ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of R and let v 1 , . . . , v n be a corresponding set
µ = E [ i Xi ]. Then, for every t > 0, of orthonormal eigenvectors. Assume without loss of generality that λ1 ≥ |λn | and that v = v 1 .
hX i Expand x in the eigenbasis as
2t2 X
Pr Xi ≥ µ + t ≤ exp − P 2
. x = ci v i .
i (bi − ai ) i
√ 2
P
To apply this theorem, we view We know that c1 ≥ 3/2 and
i ci = 1. This implies that
Xi,j = 2ri,j x (i)x (j)
X X X
as our random variables. As ri,j takes values in ±1/2, we can set
T
x Rx = 2 2
ci λi ≥ c1 λ1 − 2 2
ci |λ1 | = λ1 c1 − 2
ci = λ1 (2c21 − 1) ≥ λ1 /2.
i i≥2 i≥2
ai,j = −x (i)x (j) and bi,j = x (i)x (j).
We then compute
! ! We will bound the probability that ∥R∥ is large by taking Rayleigh quotients with random unit
X X X X X
2
(bi − ai ) = 2 2
4x (i) x (j) = 2 2 2
x (i) x (j) ≤ 2 2
x (i) x (j) 2
= 2, vectors. Let’s examine the probability that a random unit vector x satisfies the conditions of
i<j i<j i̸=j i i Lemma 8.2.3.
Lemma 8.2.4. Let v be an arbitrary unit vector, and let x be a random unit vector. Then, Proof. Let R be a fixed symmetric matrix. By applying Lemma 8.2.4 to any eigenvector of R
h √ i whose eigenvalue has maximal absolute value, we find
1
Pr v T x ≥ 3/2 ≥ √
πn2n−1 1 1
Prx x T Rx ≥ ∥R∥ ≥ √ .
2 πn2n−1
Proof. Let B n denote the unit ball in IRn , and let C denote the cap on the surface of B n
Thus, for a random R we find
containing all vectors x such that √
v T x ≥ 3/2.
1 1
PrR,x ∥R∥ ≥ t and x T Rx ≥ ∥R∥ ≥ PrR [∥R∥ ≥ t] √ .
We need to lower bound the ratio of the surface area of the cap C to the surface area of B n . 2 πn2n−1
Now, consider the (n − 1)-dimensional hypersphere whose boundary is the boundary of the cap C. where the last inequality follows from Lemma 8.2.2.
As the cap C lies above this hypersphere, the (n − 1)-dimensional volume of this hypersphere is a Combining these inequalities, we obtain
lower bound on the surface area of the cap C. Recall that the volume of a sphere in IRn of radius
r is 1 2
√ 2 /4
Pr [∥R∥ ≥ t] ≤ πn2n e−t . One of the easiest ways to reason about the distribution of the eigenvalues is to estimate the
expectations of the traces of powers of R. This is called Wigner’s trace method.
Before proceeding with our analysis, we recall a formula for the entries of a power of matrices. 8.4 Expectation of the trace of a power
For matrices A and B whose rows and columns are indexed by V ,
X Recall that the trace is the sum of the diagonal entries in a matrix. By expanding the formula for
(AB)(a, b) = A(a, c)B(c, b).
matrix multiplication, one can also show
c∈V
X l−1
Y
Applying this formula inductively, we find that
Rl (a0 , a0 ) = R(a0 , al−1 ) R(ai−1 , ai ),
X
(Al )(a, b) = A(a, c1 )A(c1 , c2 ) · · · A(cl−1 , b). a1 ,...,al−1 ∈V i=1
c1 ,...,cl−1 ∈V
and so
X l−1
Y
We can use good bounds on the moments of the eigenvalues of R to obtain a good upper bound ERl (a0 , a0 ) = ER(a0 , al−1 ) R(ai−1 , ai ).
on the norm of R that holds with high probability. We will do this by estimating the trace of a a1 ,...,al−1 ∈V i=1
high power of R. For an even power l, this should be relatively close to ∥R∥. In particular, we To simplify this expression, we will recall that if X and Y are independent random variables, then
use the fact that for every even l E(XY ) = E(X)E(Y ). So, to the extent that the terms in this product are independent, we can
∥R∥l ≤ Tr Rl distribute this expectation across this product. As the entries of R are independent, up to the
symmetry condition, the only terms that are dependent are those that are identical. So, if
and thus 1/l {bj , cj }j is the set of pairs that occur in
∥R∥ ≤ Tr Rl .
{a0 , a1 } , {a1 , a2 } , . . . , {al−2 , al−1 } , {al−1 , a0 } , (8.1)
We will prove in Theorem 8.4.2 that for every even l such that np(1 − p) ≥ 2l8 ,
and pair {bj , cj } appears dj times, then
ETr Rl ≤ 2n(4np(1 − p))l/2 . l−1
Y Y
ER(a0 , al−1 ) R(ai−1 , ai ) = ER(bj , cj )dj .
This will allow us to show that the norm of R is usually less than u where i=1 j
1/l p
def As each entry of R has expectation 0,
u = 2n(4np(1 − p))l/2 = (2n)1/l 2 np(1 − p).
ER(bj , cj )dj
We establish this by an application of Markov’s inequality. For all ϵ > 0,
is zero if dj is 1. In general
h i
Pr [∥R∥ > (1 + ϵ)u] ≤ Pr Tr Rl > (1 + ϵ)l ul h i
h i ERd(bj ,cj ) = p(1 − p) (1 − p)d−1 − (−p)d−1 ≤ p(1 − p), (8.2)
≤ Pr Tr Rl > (1 + ϵ)l ETr Rl
for d ≥ 2.
≤ (1 + ϵ)−l ,
So, ERl (a0 , a0 ) is at most the sum over sequences a1 , . . . , al−1 such that each pair in (8.1) appears
by Markov’s inequality. at least twice, times p(1 − p) for each pair that appears in the sequence.
To understand this probability, remember that for small ϵ (1 + ϵ) is approximately exp(ϵ). So, To describe this more carefully, we say that a sequence a0 , a1 , . . . , al is a closed walk of length l on
(1 + ϵ)−l is approximately exp(−ϵl). This probability becomes small when l > 1/ϵ. Concretely, for n vertices if each ai ∈ {1, . . . , n} and al = a0 . In addition, we say that it is significant if for every
ϵ < 1/2, 1 + ϵ < exp(4ϵ/5). Thus, we can take ϵ approximately (n/2)−1/8 . While this bound is not tuple {b, c} there are at least two indices i for which {ai , ai+1 } = {b, c}. Let Wn,l,k denote the
very useful for n that we encounter in practice, it is nice asymptotically. The bound can be number of significant closed walks of length l on n vertices such that a1 , . . . , al−1 contains exactly
substantially improved by more careful arguments, as we explain at the end of the Chapter. k distinct elements. As a sequence with k distinct elements must contain at least k distinct pairs,
we obtain the following upper bound on the trace.
We should also examine the term (2n)1/l :
Lemma 8.4.1.
(2n)1/l = exp(ln(2n)/l) ≤ 1 + 1.1 ln(2n)/l, X l/2
ETr Rl ≤ Wn,l,k (p(1 − p))k .
for ln(2n)/l < 1/2. Thus, for l >> ln(2n) this term is close to 1. k=1
In the next section, we prove that There are at most l−1k choices for the set S. Given S, we wish to record the identities of the
elements ai for i ∈ S. Each is an element of [n], and we record them in the order in which they
Wn,l,k ≤ nk+1 2l l4(l−2k) . appear in the walk. You may wish to think of this data as a map
We will show that the sequence tk is geometrically increasing, and thus it is dominated by its τ : [l − 1] \ S → S ∪ {0} .
largest term. We compute
There are at most (k + 1)l−1−k choices for τ . See figure 8.1 for an example.
tk nk+1 2l l4(l−2k) (p(1 − p))k
= k l 4(l−2k+2)
tk−1 n 2l (p(1 − p))k−1
n(p(1 − p))
=
l8
≥ 2.
Thus,
While not every choice of a0 , S, σ and τ corresponds to a significant walk, every significant walk
with k distinct elements corresponds to some a0 , S, σ and τ . Thus,
Of course, better bounds on Wn,l,k provide better bounds on the trace. Vu [Vu07] proves that
l − 1 k+1
l Wn,l,k ≤ n (k + 1)l−1−k . (8.3)
Wn,l,k ≤ nk+1 (k + 1)2(l−2k) 22k . k
2k
This bound
p is too loose to obtain the result we desire. It merely allows us to prove that
This bound allows us to apply much higher powers of the matrix.
∥R∥ ≤ c np(1 − p) log n for some constant c. This bound is loosest when k = l/2. This is
fortunate both because this is the case in which it is easiest to tighten the bound, and because the
computation in Theorem 8.4.2 is dominated by this term.
8.5 The number of walks
Consider the graph with edges (ai−1 , ai ) for i ∈ S. This graph must be a tree because it contains
Our goal is to prove an upper bound on Wn,l,k . We will begin by proving a crude upper bound, exactly the edges from which the walk first hits each vertex. Formally, this is because the graph
and then refine it. contains k edges, touches k + 1 vertices, and we can prove by induction on the elements in S that
it is connected, starting with a0 . See Figure 8.2.
As it is tricky to obtain a clean formula for the number of such walks, we will instead derive ways
of describing such walks, and then count how many such descriptions there can be. We can use this tree to show that, when k = l/2, every pair {ai−1 , ai } that appears in the walk
must appear exactly twice: the walk only takes l steps, and each pair of the k = l/2 in the tree
Let S ⊂ {1, . . . , l − 1} be the set of i for which ai does not appear earlier in the walk: must be covered at least twice.
ai ̸∈ {aj : j < i} . We now argue that when l = 2k the map τ is completely unnecessary: the walk is determined by
S and σ alone. That is, for every i ̸∈ S there is only one edge that the walk can follow. For i ̸∈ S,
exactly once and the walk follows that edge. That is, the edge is {ai−1 , ai } and we can infer ai
given i ∈ T . If ai−1 is adjacent to exactly one tree edge that has been used exactly once but the
walk does not follow that edge, then step i is extra: it either follows an edge that is not in the
tree or it follows a tree edge that has been used at least twice.
This leaves us to account for the steps in which ai−1 is adjacent to more than one tree edge that
has been used exactly once and follows such an edge. In this case, we call step i ambiguous,
because we need some way to indicate which of those edges it used. In ambiguous steps we also
use τ to record ai . Every step not in S or T is extra or ambiguous. So,
τ : ([l − 1] \ (S ∪ T )) → S ∪ {0} .
Figure 8.2: The tree edges for this walk.
The data a0 , S, T , σ, and τ determine the walk. It remains to determine how many ways we can
the tuple {ai−1 , ai } must have appeared exactly once before in the walk. We will show that at choose them.
step i the vertex ai−1 is adjacent to exactly one edge with this property.
We will show that the number of ambiguous steps is at most the number of extra steps. This
To this end, we keep track of the graph of edges that have appeared exactly once before in the implies that |V \ (S ∪ T )| ≤ 2x. Thus, the number of possible maps τ is at most
walk. We could show by induction that at step i this graph is precisely a path from a0 to ai−1 .
But, we take an alternate approach. Consider the subgraph of the tree edges that have been used (k + 1)2x .
exactly once up to step i. We will count both its number of vertices, v, and its number of edges,
The number of choices for S and T may be upper bounded by
f . At step i we include ai in this subgraph regardless of whether it is adjacent to any of the
edges, so initially v = 1 and e = 0. The walk ends in the same state.
l−1 l−1−k
≤ 2l−1 (l − 1 − k)2x .
For steps in which i ∈ S, both v and f increase by 1: the edge {ai−1 , ai } and the vertex ai is k 2x
added to the subgraph. When i ̸∈ S and {ai−1 , ai } is the only tree edge adjacent to ai−1 that has
Thus
been used exactly once, both e and v decrease: e because we use the pair {ai−1 , ai } a second time
Wn,l,k ≤ nk+1 2l−1 (l − k − 1)2x (k + 1)2x ≤ nk+1 2l−1 (lk)2(l−2k) ≤ nk+1 2l l4(l−2k) . (8.4)
and v because ai−1 is no longer adjacent to any tree edge that has been used exactly once, and
the walk moves to ai .
We now finish by arguing that the number of ambiguous edges is at most the number of extra
If for some i ̸∈ S it were the case that ai−1 was adjacent to two tree edges that had been used edges. As before, keep track of the subgraph of the tree edges that have been used exactly once
exactly once, then e would decrease but v would not. As the process starts and ends with up to step i. We will count both its number of vertices, v, and its number of edges, e. At step i
v − e = 1, this is not possible. we include ai in this subgraph regardless of whether it is adjacent to any of the graphs edges, so
initially v = 1 and e = 0. The walk ends in the same state.
Thus
l For steps in which i ∈ S, both v and e increase by 1. For steps in which i ∈ T , the vertex ai−1 has
Wn,l,l/2 ≤ nl/2+1 .
l/2 degree one in this graph. When we follow the edge (ai−1 , ai ), we remove it from this graph. As
ai−1 is no longer adjacent to any edge of the graph, both v and e decrease by 1.
Now that we know Wn,l,l/2 is much less than the bound suggested by (8.3), we should suspect
that Wn,l,k is also much lower when k is close to l/2. To show this, we extend the idea used in the At ambiguous steps i, we decrement e. But, because ai−1 was adjacent to at least two tree edges
previous argument to show that with very little information we can determine where the walk that had been used exactly once, it is not removed from the graph and v does not decrease. The
goes for almost every step not in S. ambiguous steps may be compensated by extra steps. An extra step does not change f , but it can
decrease v. This happens when ai−1 is not adjacent to any tree edges that have been used exactly
We say that the ith step in the walk is extra if the pair {ai−1 , ai } is not a tree edge or if it appears once, but ai is. Thus, ai−1 contributes 1 to v during step i − 1, but is removed from the count as
at least twice in the walk before step i. Let x denote the number of extra steps. As each of the soon as the walk moves to ai . As the walk starts and ends with v − f = 1, neither the steps in S
tree edges appears in at least two steps, the number of extra steps is at most l − 2k. We will use τ nor T change this difference, ambiguous steps increase it, and extra steps can decrease it, the
to record the destination vertex ai of each extra step, again by indicating its position in S. number of extra steps must be at least the number of ambiguous steps.
During the walk, we keep track of the set of tree edges that have been used exactly once. Let T
be the set of steps in which in which ai−1 is adjacent to exactly one tree edge that has been used
8.6 Notes
The proof in this chapter is a slight simplification and weakening of result due to Vu [Vu07]. The
result was first claimed by Füredi and Komlos [FK81]. However, there were a few mistakes in
their paper. Vu’s paper also provides concentration results that lower bound µ2 , whereas the
argument in this chapter merely provides an upper bound.
8.7 Exercise
1. Interlacing.
Let A be a symmetric matrix with eigenvalues α1 ≥ α2 ≥ . . . ≥ αn . Let B = A + x x T for some
vector x and let β1 ≥ β2 ≥ . . . ≥ βn be the eigenvalues of B. Prove that for all i
βi ≥ αi ≥ βi+1 .
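To see the scale of the norm bound from this chapter numerically, here is a sketch (Python with numpy; not part of the text) for the case p = 1/2: it samples the random symmetric matrix R with independent ±1/2 entries above the diagonal and compares ∥R∥ with 2√(np(1 − p)) = √n.

# Sample R and compare its spectral norm with sqrt(n).
import numpy as np
rng = np.random.default_rng(2)

n = 500
U = np.triu(rng.choice([-0.5, 0.5], size=(n, n)), 1)
R = U + U.T                                        # symmetric, zero diagonal, entries +-1/2
print(np.linalg.norm(R, 2), np.sqrt(n))            # the norm is close to sqrt(n)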
These conditions are very strong, and it might not be obvious that there are any non-trivial
graphs that satisfy these conditions. Of course, the complete graph and disjoint unions of
complete graphs satisfy these conditions.
For the rest of this lecture, we will only consider strongly regular graphs that are connected and
that are not the complete graph. I will now give you some examples.
Chapter 9
9.3 The Pentagon
Strongly Regular Graphs The simplest strongly-regular graph is the pentagon. It has parameters
n = 5, k = 2, λ = 0, µ = 1.
For a positive integer n, the lattice graph Ln is the graph with vertex set {1, . . . n}2 in which
9.1 Introduction vertex (a, b) is connected to vertex (c, d) if a = c or b = d. Thus, the vertices may be arranged at
the points in an n-by-n grid, with vertices being connected if they lie in the same row or column.
In this and the next lecture, I will discuss strongly regular graphs. Strongly regular graphs are Alternatively, you can understand this graph as the line graph of a bipartite complete graph
extremal in many ways. For example, their adjacency matrices have only three distinct between two sets of n vertices.
eigenvalues. If you are going to understand spectral graph theory, you must have these in mind. It is routine to see that the parameters of this graph are:
In many ways, strongly-regular graphs can be thought of as the high-degree analogs of expander
graphs. However, they are much easier to construct. k = 2(n − 1), λ = n − 2, µ = 2.
Let me remark that the number of different latin squares of size n grows very quickly, at least as
Formally, a graph G is strongly regular if
fast as n!(n − 1)!(n − 2)! . . . 2!.
1. it is k-regular, for some integer k; From such a latin square, we construct a Latin Square Graph. It will have n2 nodes, one for each
cell in the square. Two nodes are joined by an edge if
2. there exists an integer λ such that for every pair of vertices x and y that are neighbors in G,
there are λ vertices z that are neighbors of both x and y;
1. they are in the same row,
3. there exists an integer µ such that for every pair of vertices x and y that are not neighbors
2. they are in the same column, or
in G, there are µ vertices z that are neighbors of both x and y.
3. they hold the same number. For the lattice graph Ln , we have
r = n − 2, s = −2.
So, such a graph has degree k = 3(n − 1). Any two nodes in the same row will both be neighbors For the latin square graphs of order n, we have
with every other pair of nodes in their row. They will have two more common neighors: the nodes
in their columns holding the other’s number. So, they have n common neighbors. The same r = n − 3, s = −3.
obviously holds for columns, and is easy to see for nodes that have the same number. So, every
pair of nodes that are neighbors have exactly λ = n common neighbors.
On the other hand, consider two vertices that are not neighbors, say (1, 1) and (2, 2). They lie in
9.7 Regular graphs with three eigenvalues
different rows, lie in different columns, and hold different numbers. The vertex (1, 1) has two
common neighbors of (2, 2) in its row: the vertex (1, 2) and the vertex holding the same number We will now show that every regular connected graph with at most 3 eigenvalues must be a
as (2, 2). Similarly, it has two common neighbors of (2, 2) in its column. Finally, we can find two strongly regular graph. Let G be k-regular, and let its eigenvalues other than k be r and s. As G
more common neighbors of (2, 2) that are in different rows and columns by looking at the nodes is connected, its adjacency eigenvalue k has multiplicty 1.
that hold the same number as (1, 1), but which are in the same row or column as (2, 2). So, µ = 6.
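These parameters are easy to confirm by machine. The following sketch (Python with numpy; the particular Latin square, the addition table of Z_n, is our choice and not the text's) builds a Latin square graph and checks that k = 3(n − 1), λ = n, and µ = 6.

# Build a Latin square graph and verify its strongly-regular parameters.
import numpy as np

n = 5
cells = [(r, c, (r + c) % n) for r in range(n) for c in range(n)]   # (row, column, symbol)
N = len(cells)
A = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1, N):
        if sum(a == b for a, b in zip(cells[i], cells[j])) == 1:    # same row, column, or symbol
            A[i, j] = A[j, i] = 1

deg = set(A.sum(axis=1))
common = A @ A                                     # entry (i, j) counts common neighbors
lam = {int(common[i, j]) for i in range(N) for j in range(N) if i != j and A[i, j]}
mu = {int(common[i, j]) for i in range(N) for j in range(N) if i != j and not A[i, j]}
print(deg, lam, mu)                                # {12.0} {5} {6} for n = 5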
Then, for every vector orthogonal to 1, we have
(A − rI)(A − sI)v = 0.
9.6 The Eigenvalues of Strongly Regular Graphs
Thus, for some β,
(A − rI)(A − sI) = βJ,
We will consider the adjacency matrices of strongly regular graphs. Let A be the adjacency
matrix of a strongly regular graph with parameters (k, λ, µ). We already know that A has an which gives
eigenvalue of k with multiplicity 1. We will now show that A has just two other eigenvalues.
A2 − (r + s)A + rsI = βJ =⇒
To prove this, first observe that the (u, v) entry of A2 is the number of common neighbors of
A2 = (r + s)A − rsI + βJ
vertices u and v. For u = v, this is just the degree of vertex u. We will use this fact to write A2 as
a linear combination of A, I and J. To this end, observe that the adjacency matrix of the = (r + s + β)A + β(J − A − I) + (rs + β)I.
complement of A (the graph with non-edges where A has edges) is J − I − A. So,
So, the number of common neighbors of two nodes just depends on whether or not they are
A2 = λA + µ(J − I − A) + kI = (λ − µ)A + µJ + (k − µ)I. neighbors, which implies that A is strongly regular.
So, every eigenvalue θ of A other than k satisfies We will now see that, unless f = g, both r and s must be integers. We do this by observing a few
identities that they both must satisfy. First, from the quadratic equation above, we know that
θ2 = (λ − µ)θ + k − µ.
r+s=λ−µ (9.1)
The eigenvalues of A other than k are those θ that satisfy this quadratic equation, and so are
given by and
p
λ − µ ± (λ − µ)2 + 4(k − µ) rs = µ − k. (9.2)
.
2 As the trace of an adjacency matrix is zero, and is also the sum of the eigenvalues times their
These eigenvalues are always denoted r and s, with r > s. By convention, the multiplicty of the multiplicites, we know
eigenvalue r is always denoted f , and the multiplicty of s is always denoted g. k + f r + gs = 0. (9.3)
For example, for the pentagon we have So, it must be the case that s < 0. Equation 9.1 then gives r > 0.
√ √ If f ̸= g, then equations (9.3) and (9.1) provide independent constraints on r and s, and so
5−1 5+1
r= , s=− . together they determine r and s. As the coefficients in both equations are integers, they tell us
2 2
that both r and s are rational numbers. From this, and the fact that r and s are the roots of a Let W be the analogous matrix for the eigenvalue s. We then have
quadratic equation with integral coefficients, we may conclude that r and s are in fact integers.
1
Let me remind you as to why. A = rU U T + sW W T + k J.
n
Lemma 9.8.1. If θ is a rational number that satisfies 1
As each of the matrices U U T , W W T and nJ are projections (having all eigenvalues 0 or 1), and
θ2 + bθ + c = 0, are mutually orthogonal, we also have
Let u 1 , . . . u f be an orthonormal set of eigenvectors of the eigenvalue r, and let U be the matrix In particular, this means that the point set x 1 , . . . , x n is a two-distance point set: a set of points
containing these vectors as columns. Recall that U is only determined up to an orthnormal such that there are only two different distances between them. Next lecture, we will use this fact
transformation. That is, we could equally take U Q for any f -by-f orthnormal matrix Q. to prove a lower bound on the dimensions f and g.
While the vectors U are determined only up to orthogonal transformations, these transformations For a positive integer n, the triangular graph Tn may be defined to be the line graph of the
don't affect the geometry of these vectors. For example, for vertices i and j, the distance between
x i and x j is {1, . . . , n}. Two of these sets are connected by an edge if their intersection has size 1.
∥x i − x j ∥ ,
You are probably familiar with some triangular graphs. T3 is the triangle, T4 is the skeleton of the
and octahedron, and T5 is the complement of the Petersen graph.
2 2 2
∥x i − x j ∥ = ∥x i ∥ + ∥xxj ∥ − 2x i x Tj .
Let’s verify that these are strongly-regular, and compute their parameters. As the construction is
On the other hand, competely symmetric, we may begin by considering any vertex, say the one labeled by the set
{1, 2}. Every vertex labeled by a set of form {1, i} or {2, i}, for i ≥ 3, will be connected to this
∥x i Q − x j Q∥2 = ∥x i Q∥2 +∥xxj Q∥2 −2(x i Q)(x j Q)T = ∥x i Q∥2 +∥xxj Q∥2 −2x i QQT x Tj = ∥x i ∥2 +∥xxj ∥2 −2x i x Tj .
set. So, this vertex, and every vertex, has degree 2(n − 2).
In fact, all the geometrical information about the vectors x i is captured by their Gram matrix, For any neighbor of {1, 2}, say {1, 3}, every other vertex of the form {1, i} for i ≥ 4 will be a neighbor
whose (i, j) entry is x i x Tj . This matrix is also given by of both of these, as will the set {2, 3}. Carrying this out in general, we find that
λ = (n − 3) + 1 = n − 2.
UUT .
CHAPTER 9. STRONGLY REGULAR GRAPHS 89 CHAPTER 9. STRONGLY REGULAR GRAPHS 90
Finally, any non-neighbor of {1, 2}, say {3, 4}, will have 4 common neighbors with {1, 2}: coefficients γ1 , . . . , γn with γ1 ̸= 0 and
X
{1, 3} , {1, 4} , {2, 3} , {2, 4} . γi pi (y) = 0.
i
So, µ = 4.
To obtain a contradiction, plug in y = x 1 , to find
X
9.11 Two-distance point sets γi pi (x 1 ) = γ1 p1 (x 1 ) ̸= 0.
i
Recall from last lecture that each eigenspace of a strongly regular graph supply a set of points on Thus, we may conclude
the unit sphere such that the distance between a pair of points just depends on whether or not f
n ≤ 1 + 2f + ,
they are adjacent. If the graph is connected and not the complete graph, then we can show that 2
these distances are greater than zero, so no two vertices map to the same unit vector. If we take which implies √
the corresponding point sets for two strongly regular graphs with the same parameters, we can f≥ 2n − 2.
show that the graphs are isomorphic if and only if there is an orthogonal transformation that
maps one point set to the other. In low dimensions, it is easy to find such an orthogonal
transformation if one exists.
Consider the eigenspace of r, which we recall has dimension f . Fix any set of f independent
vectors corresponding to f vertices. An orthogonal transformation is determined by its action on
these vectors. So, if there is an orthogonal transformation that maps one vector set onto the
other, we will find it by examining all orthogonal transformations determined
by mapping these f
vectors to f vectors in the other set. Thus, we need only examine nf f ! transformations. This
would be helpful √if f were small. Unfortunately, it is not. We will now prove that both f and g
must be at least 2n − 2.
Let x 1 , . . . , x n be a set of unit vectors in IRf such that there are two values α, β < 1 such that
x i , x j = α or β.
pi (y ) = (y T x i − α)(y T x i − β),
for y ∈ IRf . We first note that each polynomial pi is a polynomial of degree 2 in f variables (the
coordinates of y ). As each f -variate polynomial of degree 2 can be expressed in the form
X X
a+ bi yi + ci,j yi yj ,
i i≤k
we see that the vector space of degree-2 polynomials in f variables has dimension
f
1 + 2f + .
2
To prove a lower bound on f , we will show that these polynomials are linearly independent.
Assume by way of contradiction that they are not. Then, without loss of generality, there exist
Chapter 10
Part III We will examine how the eigenvalues of a graph govern the convergence of a random walk.
Our initial probability distribution, p 0 , will typically be concentrated one vertex. That is, there
will be some vertex a for which p 0 (a) = 1. In this case, we say that the walk starts at a.
To derive a p t+1 from p t , note that the probability of being at a vertex a at time t + 1 is the sum
over the neighbors b of a of the probability that the walk was at b at time t, times the probability
it moved from b to a in time t + 1. We can state this algebraically as
X w(a, b)
p t+1 (a) = p (b), (10.1)
d (b) t
b:(a,b)∈E
P
where d (b) = a w(a, b) is the weighted degree of vertex b.
91 92
CHAPTER 10. RANDOM WALKS ON GRAPHS 93 CHAPTER 10. RANDOM WALKS ON GRAPHS 94
We may write this in matrix form using the walk matrix of the graph, which is given by f has the same eigenvectors as W .
Of course, W
def We next observe that the degree vector, d , is a Perron vector of W of eigenvalue 1:
W = M D −1 .
We then have M D −1 d = M 1 = d .
p t+1 = W p t .
So, the Perron-Frobenius theorem (Theorem 4.5.1) tells us that all the eigenvalues of W lie
To see why this holds, consider how W acts as an operator on an elementary unit vector. between −1 and 1. As we did in Proposition 4.5.3, one can show that G is bipartite if and only if
X −1 is an eigenvalue of A.
M D −1 δ b = M (δ b /d (b)) = (wa,b /d (b))δ a .
As Wf = W /2 + I /2, this implies that all the eigenvalues of W
f lie between 0 and 1. We denote
a∼b
f and I /2 + A/2 by
the eigenvalues of W
We will often consider lazy random walks, which are the variant of random walks that stay put
1 = ω1 ≥ ω2 ≥ · · · ≥ ωn ≥ 0.
with probability 1/2 at each time step, and walk to a random neighbor the other half of the time.
These evolve according to the equation While the letter ω is not a greek equivalent “w”, we use it because it looks like one.
X w(a, b)
p t+1 (a) = (1/2)p t (a) + (1/2) p (b), (10.2) From Claim 10.2.1, we now know that
d (b) t
b:(a,b)∈E
def d 1/2
ψ1 =
and satisfy d 1/2
f p t,
p t+1 = W
f is the lazy walk matrix , given by
where W is the unit-norm Perron vector of A, where
def
f def
W
1 1 1 1
= I + W = I + M D −1 . d 1/2 (a) = d (a)1/2 .
2 2 2 2
We will usually work with lazy random walks. 10.3 The stable distribution
Regardless of the starting distribution, the lazy random walk on a connected graph always
10.2 Spectra of Walk Matrices converges to one distribution: the stable distribution. This is the other reason that we forced our
random walk to be lazy. Without laziness1 , there can be graphs on which the random walks never
While the walk matrices are not symmetric, they are similar to symmetric matrices. We will see converge. For example, consider a non-lazy random walk on a bipartite graph. Every-other step
that this implies that they have n real eigenvalues, although their eigenvectors are generally not will bring it to the other side of the graph. So, if the walk starts on one side of the graph, its
orthogonal. Define the normalized adjacency matrix by limiting distribution at time t will depend upon the parity of t.
def In the stable distribution, every vertex is visited with probability proportional to its weighted
A = D −1/2 W D 1/2 = D −1/2 M D −1/2 .
degree. We denote the vector encoding this distribution by π, where
So, A is symmetric.
def
π = d /(1T d ).
Claim 10.2.1. The vector ψ is an eigenvector of A of eigenvalue ω if and only if D 1/2 ψ is an
eigenvector of W of eigenvalue ω. We have already seen that π is a right-eigenvector of eigenvalue 1. To show that the lazy random
walk converges to π, we will exploit the fact that all the eigenvalues other than 1 are in [0, 1).
Proof. As A = D−1/2 W D 1/2 , D 1/2 A = W D 1/2 . Thus, Aψ = ωψ if and only if And, we expand the vectors p t in the eigenbasis of A, after first multiplying by D −1/2 .
1
D 1/2 Aψ = D 1/2 ωψ = ω(D 1/2 ψ) = W (D 1/2 ψ). Strictly speaking, any nonzero probability of staying put at any vertex in a connected graph will guarantee
convergence. We don’t really need a half probability at every vertex.
CHAPTER 10. RANDOM WALKS ON GRAPHS 95 CHAPTER 10. RANDOM WALKS ON GRAPHS 96
f (caution:
Let ψ 1 , . . . , ψ n be the eigenvectors of A corresponding to eigenvalues ω1 , . . . , ωn of W Theorem 10.4.1. For all a, b and t, if p 0 = δ a , then
the corresponding eigenvalues of A are 2ωi − 1). For any initial distribution p 0 , write s
X d (b) t
|p t (b) − π(b)| ≤ ω .
D −1/2 p 0 = ci ψ i , where ci = ψ Ti D −1/2 . d (a) 2
i
X
i Analyzing the right-hand part of this last expression, we find
= D 1/2 ωit ci ψ i
i
X X
X δ Tb ωit ψ i ψ Ti δ a = ωit δ Tb ψ i ψ Ti δ a
= D 1/2 c1 ψ 1 + D 1/2 ωit ci ψ i i≥2 i≥2
i≥2 X
≤ ωit δ Tb ψ i ψ Ti δ a
i≥2
X
As 0 ≤ ωi < 1 for i ≥ 2, the right-hand term must go to zero. On the other hand, ≤ ω2t δ Tb ψ i ψ Ti δ a
ψ 1 = d 1/2 /∥d 1/2 ∥, so i≥2
X
! ≤ ω2t δ Tb ψ i ψ Ti δ a
1/2 1/2 1 d 1/2 d d i≥1
D c1 ψ 1 = D = =P = π. sX sX
∥d 1/2 ∥ ∥d 1/2 ∥ ∥d 1/2 ∥2 a d (a) 2 2
≤ ω2t δ Tb ψ i δ Ta ψ i by Cauchy-Schwartz
This is a perfect example of one of the main uses of spectral theory: to understand what happens i≥1 i≥1
when we repeatedly apply an operator. = ω2t ∥δ b ∥ ∥δ a ∥ , as the eigenvectors form an orthonormal basis,
= ω2t
The rate of convergence of a lazy random walk to the stable distribution is dictated by ω2 : a
small value of ω2 implies fast convergence. 10.5 Relation to the Normalized Laplacian
There are many ways of measuring convergence of a random walk. We will do so point-wise.
The walk matrix is closely related to the normalized Laplacian, which is defined by
Assume that the random walk starts at some vertex a ∈ V . For every vertex b, we will bound how
far p t (b) can be from π(b). N = D −1/2 LD −1/2 = I − D −1/2 M D −1/2 = I − A.
CHAPTER 10. RANDOM WALKS ON GRAPHS 97 CHAPTER 10. RANDOM WALKS ON GRAPHS 98
We let 0 ≤ ν1 ≤ ν2 ≤ · · · ≤ νn denote the eigenvalues of N , and note that they have the same Proof. The Courant-Fischer theorem tells us that
eigenvectors as A. Other useful relations include
xTN x
νi = min max .
νi = 2 − 2ωi , ωi = 1 − νi /2, dim(S)=i x ∈S xTx
and −1/2
As the change of variables y = D x is non-singular, this equals
f = I − 1 D 1/2 N D −1/2 .
W
2 y T Ly
min max .
The normalized Laplacian is positive semidefinite and has the same rank as the ordinary dim(T )=i y ∈T y T Dy
(sometimes called “combinatorial”) Laplacian. There are many advantages of working with the So,
normalized Laplacian: the mean of its eigenvalues is 1, so they are always on a degree-independent y T Ly y T Ly 1 y T Ly λi
scale. One can prove that νn ≤ 2, with equality if and only if the graph is bipartite. min max ≥ min max = min max = .
dim(T )=i y ∈T y T Dy dim(T )=i y ∈T dmax y T y dmax dim(T )=i y ∈T yT y dmax
The bound in Theorem 10.4.1 can be expressed in the eigenvalues of the normalized Laplacian as The other bound may be proved similarly.
s
d (b)
|p t (b) − π(b)| ≤ (1 − ν2 /2)t .
d (a) 10.6 Examples
We will say that a walk has mixed if
We now do some examples. For each we think about the random walk in two ways: by reasoning
|p t (b) − π(b)| ≤ π(b)/2, directly about how a random walk should behave and by examining ν2 .
for all vertices b. Using the approximation 1 − x ≈ exp(−x), we see that this should happen once
10.6.1 The Path
s
d (b)
(1 − ν2 /2)t ≤ d (b)/2d (V ) ⇐⇒ As every vertex in the path on n vertices has degree 1 or 2, ν2 is approximately λ2 , which is
d (a)
p approximately c/n2 for some constant c.
(1 − ν2 /2)t ≤ d (b)d (a)/2d (V ) ⇐⇒
p To understand the random walk on the path, think about what happens when the walk starts in
exp(−tν2 /2) ≤ d (b)d (a)/2d (V ) ⇐⇒ the middle. Ignoring the steps on which it stays put, it will either move to the left or the right
p
−tν2 /2 ≤ ln d (b)d (a)/2d (V ) ⇐⇒ with probability 1/2. So, the position of the walk after t steps is distributed as the sum of t
p random
√ variables taking values
√ in {1, −1}. Recall that the standard deviation of such a sum is
t ≥ 2 ln 2d (V )/ d (b)d (a) /ν2 . t. So, we need to have t comparable to n/4 for there to be a reasonable chance that the walk
is on the left or right n/4 vertices.
So, for graphs in which all degrees are approximately constant, this upper bound on the time to
mix is approximately ln(n)/ν2 . For some graphs the ln n term does not appear. Note that
multiplying all edge weights by a constant does not change any of these expressions. 10.6.2 The Complete Binary Tree
While we have explicitly worked out λ2 for many graphs, we have not done this for ν2 . The As with the path, ν2 for the tree is within a constant of λ2 for the tree, and so is approximately
following lemma will allow us to relate bounds on λ2 to bounds on ν2 : c/n for some constant c. To understand the random walk on Tn , first note that whenever it is at a
vertex, it is twice as likely to step towards a leaf as it is to step towards the root. So, if the walk
Lemma 10.5.1. Let L be the Laplacian matrix of a graph, with eigenvalues λ1 ≤ λ2 ≤ · · · ≤ λn ,
starts at a leaf, there is no way the walk can mix until it reaches the root. The height of the walk
and let N be its normalized Laplacian, with eigenvalues ν1 ≤ ν2 ≤ · · · ≤ ν2 . Then, for all i
is like a sum of ±1 random variables, except that they are twice as likely to be −1 as they are to
λi λi be 1, and that their sum never goes below 0. One can show that we need to wait approximately n
≥ νi ≥ ,
dmin dmax steps before such a walk will hit the root. Once it does hit the root, the walk mixes rapidly.
where dmin and dmax are the minimum and maximum degrees of vertices in the graph.
CHAPTER 10. RANDOM WALKS ON GRAPHS 99 CHAPTER 10. RANDOM WALKS ON GRAPHS 100
The dumbell graph Dn consists of two complete graphs on n vertices, joined by one edge called We define the bolas2 graph Bn to be a graph containing two n-cliques connected by a path of
the “bridge”. So, there are 2n vertices in total, and all vertices have degree n − 1 or n. length n. The bolas graph has a value of ν2 that is almost as small as possible. Equivalently,
random walks on a bolas graph mix almost as slowly as possible.
To understand the random walk on this graph, consider starting it at some vertex that is not
attached to the bridge edge. After the first step the walk will be well mixed on the vertices in the The analysis of the random walk on a bolas is similar to that on a dumbbell, except that when
side on which it starts. Because of this, the chance that it finds the edge going to the other side is the walk is on the first vertex of the path the chance that it gets to the other end before moving
only around 1/n2 : there is only a 1/n chance of being at the vertex attached to the bridge edge, back to the clique at which we started is only 1/n. So, we must wait around n3 steps before there
and only a 1/n chance of choosing that edge when at that vertex. So, we must wait some multiple is a reasonable chance of getting to the other side.
of n2 steps before there is a reasonable chance that the walk reaches the other side of the graph.
Next lecture, we will learn that we can upper bound ν2 with a test vector using the fact that
The isoperimetric ratio of this graph is
1 x T Lx
θDn ∼ . ν2 = min .
n x ⊥d x T Dx
Using the test vector that is 1 on one complete graph and −1 on the other, we can show that
To prove an upper bound on ν2 , form a test vector that is n/2 on one clique, −n/2 on the other,
λ2 (Dn ) ⪅ 1/n. and increases by 1 along the path. We can use the symmetry of the construction to show that this
vector is orthogonal to d . The numerator of the generalized Rayleigh quotient is n, and the
Lemma 10.5.1 then tells us that denominator is the sum of the squares of the entries of the vectors times the degrees of the
ν2 (Dn ) ⪅ 1/n2 . vertices, which is some constant times n4 . This tells us that ν2 is at most some constant over n3 .
To prove that this bound is almost tight, we use the following lemma. To see that ν2 must be at least some constant over n3 , and in fact that this must hold for every
graph, apply Lemmas 10.5.1 and 10.6.1.
Lemma 10.6.1. Let G be an unweighted graph of diameter at most r. Then,
2
λ2 (G) ≥ . 10.7 Diffusion
r(n − 1)
Proof. For every pair of vertices (a, b), let P (a, b) be a path in G of length at most r. We have There are a few types of diffusion that people study in a graph, but the most common is closely
related to random walks. In a diffusion process, we imagine that we have some substance that can
L(a,b) ≼ r · LP (a,b) ≼ rLG . occupy the vertices, such as a gas or fluid. At each time step, some of the substance diffuses out
of each vertex. If we say that half the substance stays at a vertex at each time step, and the other
So, half is distributed among its neighboring vertices, then the distribution of the substance will
n
Kn ≼ r G, evolve according to equation (10.2). That is, probability mass obeys this diffusion equation.
2
People often consider finer time steps in which smaller fractions of the mass leave the vertices. In
and
n the limit, this results in continuous random walks that are modeled by the matrix exponential: if
n≤r λ2 (G), the walk stays put with probability 1 − ϵ in each step, and we view each step as taking time ϵ,
2
then the transition matrix of the walk after time t will be
from which the lemma follows.
((1 − ϵ)I + ϵW )t/ϵ → exp(t(W − I )).
The diameter of Dn is 3, so we have λ2 (Dn ) ≥ 2/3(n − 1). As every vertex of Dn has degree at
least n − 1, we may conclude ν2 (Dn ) ⪆ 2/3(n − 1)2 . These are in many ways more natural than discrete time random walks.
2
A bolas is a hunting weapon consisting of two balls or rocks tied together with a cord.
CHAPTER 10. RANDOM WALKS ON GRAPHS 101
11.1 Overview
In this lecture we will see how the analysis of random walks, spring networks, and resistor
networks leads to the consideration of systems of linear equations in Laplacian matrices. The
main purpose of this lecture is to introduce concepts and language that we will use extensively in
the rest of the course.
The theme of this whole lecture will be harmonic functions on graphs. These will be defined in
terms of a weighted graph G = (V, E, w) and a set of boundary vertices B ⊆ V . We let S = V − B
(I use “-” for set-minus). We will assume throughout this lecture that G is connected and that B
is nonempty.
A function x : V → R is said to be harmonic at a vertex a if the value of x at a is the weighted
average of its values at the neighbors of a where the weights are given by w:
1 X
x (a) = wa,b x (b). (11.1)
da
b∼a
102
CHAPTER 11. WALKS, SPRINGS, AND RESISTOR NETWORKS 103 CHAPTER 11. WALKS, SPRINGS, AND RESISTOR NETWORKS 104
11.3 Random Walks on Graphs should be its weight. If a rubber band connects vertices a and b, then Hooke’s law tells us that
the force it exerts at node a is in the direction of b and is proportional to the distance between a
Consider the standard (not lazy) random walk on the graph G. Recall that when the walk is at a and b. Let x (a) be the position of each vertex a. You should begin by thinking of x (a) being in
vertex a, the probability it moves to a neighbor b is IR, but you will see that it is just as easy to make it a vector in IR2 or IRk for any k.
wa,b The force the rubber band between a and b exerts on a is
.
da
x (b) − x (a).
Distinguish two special nodes in the graph that we will call s and t, and run the random walk In a stable configuration, all of the vertices that have not been nailed down must experience a
until it hits either s or t. We view s and t as the boundary, so B = {s, t}. zero net force. That is
X X
Let x (a) be the probability that a walk that starts at a will stop at s, rather than at t. We have (x (b) − x (a)) = 0 ⇐⇒ x (b) = da x (a)
the boundary conditions x (s) = 1 and x (t) = 0. For every other node a the chance that the walk b∼a b∼a
stops at s is the sum over the neighbors b of a of the chance that the walk moves to b, times the 1 X
⇐⇒ x (b) = x (a).
chance that a walk from b stops at s. That is, da
b∼a
X wa,b
x (a) = x (b). In a stable configuration, every vertex that is not on the boundary must be the average of its
da neighbors.
b∼a
So, the function x is harmonic at every vertex in V − B. In the weighted case, we would have for each a ∈ V − B
For example, consider the path graph Pn . Let’s make s = n and t = 1. So, the walk stops at 1 X
wa,b x (b) = x (a).
either end. We then have x (n) = 1, x (1) = 0. It is easy to construction at least one solution to da
b∼a
the harmonic equations (11.1): we can set
That is, x is harmonic on V − B.
a−1
x (a) = . We will next show that the equations (11.1) have a solution, and that it is unique1 if the
n−1
underlying graph is connected and B is nonempty But first, consider again the path graph Pn
It essentially follows from the definitions that there can be only one vector x that solves these with the endpoints fixed: B = {1, n}. Let us fix them to the values f (1) = 1 and f (n) = n. The
equations. But, we will prove this algebraically later in lecture. only solution to the equations (11.1) is the obvious one: vertex i is mapped to i: x (i) = i for all i.
These solutions tell us that if the walk starts at node a, the chance that it ends at node n is
(a − 1)/(n − 1). This justifies some of our analysis of the Bolas graph from Lecture 10.
11.5 Laplacian linear equations
Of course, the exact same analysis goes through for the lazy random walks: those give
X wa,b X wa,b
x (a) = (1/2)x (a) + (1/2) x (b) ⇐⇒ x (a) = x (b). If we rewrite equation (11.1) as
da da X
b∼a b∼a
da x (a) − wa,b x (b) = 0, (11.2)
b∼a
11.4 Spring Networks we see that it corresponds to the row of the Laplacian matrix corresponding to vertex a. So, we
may find a solution to the equations (11.1) by solving a system of equations in the submatrix of
We begin by imagining that every edge of a graph G = (V, E) is an ideal spring or rubber band. the Laplacian indexed by vertices in V − B.
They are joined together at the vertices. Given such a structure, we will pick a subset of the To be more concete, I will set up those equations. For each vertex a ∈ B, let its position be fixed
vertices B ⊆ V and fix the location of every vertex in B. For example, you could nail each vertex to f (a). Then, we can re-write equation (11.2) as
in B onto a point in the real line, or onto a board in IR2 . We will then study where the other X X
vertices wind up. da x (a) − wa,b x (b) = wa,b f (b),
b̸∈B:(a,b)∈E b∈B:(a,b)∈E
We can use Hooke’s law to figure this out. To begin, assume that each rubber band is an ideal
1
spring with spring constant 1. If your graph is weighted, then the spring constant of each edge It can only fail to be unique if there is a connected component that contains no vertices of B.
CHAPTER 11. WALKS, SPRINGS, AND RESISTOR NETWORKS 105 CHAPTER 11. WALKS, SPRINGS, AND RESISTOR NETWORKS 106
for each a ∈ V − B. So, all of the boundary terms wind up in the right-hand vector. Proof. Let S1 , . . . , Sk be the connected components of vertices of G(S). We can use these to write
L(S, S) as a block matrix with blocks equal to L(Si , Si ). Each of these blocks can be written
Let S = V − B. We now see that this is an equation of the form
L(Si , Si ) = LGSi + XSi .
L(S, S)x (S) = r , with r = M (S, :)f .
As G is connected, there must be some vertex in Si with an edge to a vertex not in Si . This
By L(S, S) I mean the submatrix of L indexed by rows and columns of S, and by x (S) I mean
implies that XSi is not the zero matrix, and so we can apply Lemma 11.5.2 to prove that L(Si , Si )
the sub-vector of x indexed by S.
is invertible.
We can then write the condition that entries of B are fixed to f by
As the matrix L(S, S) is invertible, the equations have a solution, and it must be unique.
x (B) = f (B).
We have reduced the problem to that of solving a system of equations in a submatrix of the 11.6 Energy
Laplacian.
Submatrices of Laplacians are a lot like Laplacians, except that they are positive definite. To see Physics also tells us that the vertices will settle into the position that minimizes the potential
this, note that all of the off-diagonals of the submatrix of L agree with all the off-diagonals of the energy. The potential energy of an ideal linear spring with constant w when stretched to length l
Laplacian of the induced subgraph on the internal vertices. But, some of the diagonals are larger: is
1 2
the diagonals of nodes in the submatrix account for both edges in the induced subgraph and wl .
2
edges to the vertices in B.
So, the potential energy in a configuration x is given by
Claim 11.5.1. Let L be the Laplacian of G = (V, E, w), let B ⊆ V , and let S = V − B. Then,
def 1 X
E (x ) = wa,b (x (a) − x (b))2 . (11.3)
L(S, S) = LG(S) + X S , 2
(a,b)∈E
where G(S) is the subgraph induced on the vertices in S and X S is the diagonal matrix with
entries For any x that minimizes the energy, the partial derivative of the energy with respect to each
X
X S (a, a) = wa,b , for a ∈ S. variable must be zero. In this case, the variables are x (a) for a ∈ S. The partial derivative with
b∼a,b∈B
respect to x (a) is
1X X
wa,b 2(x (a) − x (b)) = wa,b (x (a) − x (b)).
Lemma 11.5.2. Let L be the Laplacian matrix of a connected graph and let X be a nonnegative, 2
b∼a b∼a
diagonal matrix with at least one nonzero entry. Then, L + X is positive definite.
Setting this to zero gives the equations we previously derived: (11.1).
Proof. We will prove that x T (L + X )x > 0 for every nonzero vector x . As both L and X are For future reference, we state this result as a theorem.
positive semidefinite, we have
Theorem 11.6.1. Let G = (V, E, w) be a connected, weighted graph, let B ⊂ V , and let
x T (L + X )x ≥ min x T Lx , x T X x . S = V − B. Given x (B), E (x ) is minimized by setting x (S) so that x is harmonic on S.
Recall Ohm’s law: It is often helpful to think of the nodes a for which i ext (a) ̸= 0 as being boundary nodes. We will
V = IR. call the other nodes internal. Let’s see what the equation
That is, the potential drop across a resistor (V ) is equal to the current flowing over the resistor i ext = Lv .
(I) times the resistance (R). To apply this in a graph, we will define for each edge (a, b) the
current flowing from a to b to be i (a, b). As this is a directed quantity, we define means for the internal nodes. If the graph is unweighted and a is an internal node, then the ath
row of this equation is
i (b, a) = −i (a, b). X X
0 = (δ Ta L)v = (v (a) − v (b)) = da v (a) − v (b).
a∼b a∼b
I now let v ∈ IRV be a vector of potentials (voltages) at vertices. Given these potentials, we can
That is,
figure out how much current flows on each edge by the formula 1 X
v (a) = v (b),
1 da
i (a, b) = (v (a) − v (b)) = wa,b (v (a) − v (b)) . a∼b
ra,b which means that v is harmonic at a. Of course, the same holds in weighted graphs.
That is, we adopt the convention that current flows from high voltage to low voltage. We would
like to write this equation in matrix form. The one complication is that each edge comes up twice
in i . So, to treat i as a vector we will have each edge show up exactly once as (a, b) when a < b. 11.8 Solving for currents
We now define the signed edge-vertex adjacency matrix of the graph U to be the matrix with
rows indexed by edges and columns indexed by vertices such that We are often interested in applying (11.4) in the reverse: given a vector of external currents i ext
we solve for the induced voltages by
1 if a = c v = L−1 i ext .
U ((a, b), c) = −1 if b = c
This at first appears problematic, as the Laplacian matrix does not have an inverse. The way
0 otherwise. around this problem is to observe that we are only interested in solving these equations for vectors
i ext for which the system has a solution. In the case of a connected graph, this equation will have
Thus the row of U corresponding to edge (a, b) is U ((a, b), :) = δ Ta − δ Tb . a solution if the sum of the values of i ext is zero. That is, if the current going in to the circuit
Define W to be the diagonal matrix with rows and columns indexed by edges with the weights of equals the current going out. These are precisely the vectors that are in the span of the Laplacian.
the edges on the diagonals. We then have To obtain the solution to this equation, we multiply i ext by the Moore-Penrose pseudo-inverse of
L.
i = W U v.
Definition 11.8.1. The pseudo-inverse of a symmetric matrix L, written L+ , is the matrix that
Also recall that resistor networks cannot hold current. So, all the current entering a vertex a from has the same span as L and that satisfies
edges in the graph must exit a to an external source. Let i ext ∈ IRV denote the external currents, LL+ = Π,
where i ext (a) is the amount of current entering the graph through node a. We then have
X
i ext (a) = i (a, b). where Π is the symmetric projection onto the span of L.
b∼a
In matrix form, this becomes I remind you that a matrix Π is a symmetric projetion if Π is symmetric and Π2 = Π. This is
T T
i ext = U i = U W U v . (11.4) equivalent to saying that all of its eigenvalues are 0 or 1. We also know that Π = (1/n)LKn .
The matrix The symmetric case is rather special. As LΠ = L, the other following properties of the
def Moore-Penrose pseudo inverse follow from this one:
L = UTWU
is, of course, the Laplacian. This is another way of writing the expression that we derived in L+ L = Π,
Lecture 3: X LL+ L = L
L= wa,b (δ a − δ b )(δ a − δ b )T . L+ LL+ = L+ .
a∼b
CHAPTER 11. WALKS, SPRINGS, AND RESISTOR NETWORKS 109
It is easy to find a formula for the pseudo-inverse. First, let Ψ be the matrix whose ith column is
ψ i and let Λ be the diagonal matrix with λi on the ith diagonal. Recall that
X
L = ΨΛΨT = λi ψ i ψ Ti .
i
Claim 11.8.2.
L+ =
X
(1/λi )ψ i ψ Ti . Chapter 12
i>1
Moreover, this holds for any symmetric matrix. Not just Laplacians.
The effective resistance between two vertices a and b in an electrical network is the resistance of
the entire network when we treat it as one complex resistor. That is, we reduce the rest of the
network to a single edge. In general, we will see that if we wish to restrict our attention to a
subset of the vertices, B, and if we require all other vertices to be internal, then we can construct
a network just on B that factors out the contributions of the internal vertices. The process by
which we do this is Gaussian elimination, and the Laplacian of the resulting network on B is
called a Schur complement.
We now know that if a resistor network has external currents i ext , then the voltages induced at
the vertices will be given by
v = L+ i ext .
Consider what this means when i ext corresponds to a flow of one unit from vertex a to vertex b.
The resulting voltages are
v = L+ (δ a − δ b ).
Now, let c and d be two other vertices. The potential difference between c and d is
v (c) − v (d) = (δ c − δ d )T v = (δ c − δ d )T L+ (δ a − δ b ).
(δ a − δ b )T L+ (δ c − δ d ).
So, the potential difference between c and d when we flow one unit from a to b is the same as the
potential difference between a and b when we flow one unit from c to d.
110
CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 111 CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 112
The effective resistance between vertices a and b is the resistance between a and b when we view when x (s) is fixed to 0 and x (t) is fixed to 1. From Theorem 11.6.1, we know that this vector will
the entire network as one complex resistor. be harmonic on V − {s, t}.
To figure out what this is, recall the equation Fortunately, we already know how compute such a vector x . Set
v (a) − v (b) y = L+ (δ t − δ s )/Reff (t, s).
i (a, b) = ,
ra,b
We have
which holds for one resistor. We use the same equation to define the effective resistance of the y (t) − y (s) = (δ t − δ s )T L+ (δ t − δ s )/Reff (s, t) = 1,
whole network between a and b. That is, we consider an electrical flow that sends one unit of
and y is harmonic on V − {s, t}. So, we choose
current into node a and removes one unit of current from node b. We then measure the potential
difference between a and b that is required to realize this current, define this to be the effective x = y − 1y (s).
resistance between a and b, and write it Reff (a, b). As it equals the potential difference between a
and b in a flow of one unit of current from a to b: The vector x satisfies x (s) = 0, x (t) = 1, and it is harmonic on V − {s, t}. So, it is the vector
that minimizes the energy subject to the boundary conditions.
def
Reff (a, b) = (δ a − δ b )T L+ (δ a − δ b ).
To finish, we compute the energy to be
We will eventually show that effective resistance is a distance. For now, we observe that effective x T Lx = y T Ly
resistance is the square of a Euclidean distance. 1 T
= L+ (δ t − δ s ) L L+ (δ t − δ s )
To this end, let L+/2 denote the square root of L+ . Recall that every positive semidefinite matrix (Reff (s, t))2
has a square root: the square root of a symmetric matrix M is the symmetric matrix M 1/2 such 1
= (δ t − δ s )T L+ LL+ (δ t − δ s )
that (M 1/2 )2 = M . If (Reff (s, t))2
X 1
M = λi ψ i ψ T = (δ t − δ s )T L+ (δ t − δ s )
i (Reff (s, t))2
is the spectral decomposition of M , then 1
= .
X Reff (s, t)
1/2
M 1/2 = λi ψ i ψ T .
i As the weights of edges are the reciprocals of their resistances, and the spring constant
corresponds to the weight, this is the formula we would expect.
We now have
Resistor networks have an analogous quantity: the energy dissipation (into heat) when current
T 2 flows through the network. It has the same formula. The reciprocal of the effective resistance is
T + +/2 +/2 +/2
(δ a − δ b ) L (δ a − δ b ) = L (δ a − δ b ) L (δ a − δ b ) = L (δ a − δ b ) sometimes called the effective conductance.
2
= L+/2 δ a − L+/2 δ b = dist(L+/2 δ a , L+/2 δ b )2 .
12.3 Monotonicity
12.2 Effective Resistance through Energy Minimization
Rayleigh’s Monotonicity Principle tells us that if we alter the spring network by decreasing some
of the spring constants, then the effective spring constant between s and t will not increase. In
As you would imagine, we can also define the effective resistance through effective spring
terms of effective resistance, this says that if we increase the resistance of some resistors then the
constants. In this case, we view the network of springs as one large compound network. If we
effective resistance can not decrease. This sounds obvious. But, it is in fact a very special
define the effective spring constant of s, t to be the number w so that when s and t are stretched
property of linear elements like springs and resistors.
to distance l the potential energy in the spring is wl2 /2, then we should define the effective spring
constant to be twice the minimum possible energy of the network, b = (V, E, w)
Theorem 12.3.1. Let G = (V, E, w) be a weighted graph and let G b be another
X weighted graph with the same edges and such that
2E (x ) = wa,b (x (a) − x (b))2 ,
(a,b)∈E ba,b ≤ wa,b
w
CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 113 CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 114
for all (a, b) ∈ E. For vertices s and t, let cs,t be the effective spring constant between s and t in 12.5 Equivalent Networks, Elimination, and Schur Complements
G and let b b Then,
cs,t be the analogous quantity in G.
We have shown that the impact of the entire network on two vertices can be reduced to a network
cs,t ≤ cs,t .
b
with one edge between them. We will now see that we can do the same for a subset of the
vertices. We will do this in two ways: first by viewing L as an operator, and then by considering
Proof. Let x be the vector of minimum energy in G such that x (s) = 0 and x (t) = 1. Then, the it as a quadratic form.
b is no greater:
energy of x in G
Let B be the subset of nodes that we would like to understand (B stands for boundary). All
1 X 1 X
ba,b (x (a) − x (b))2 ≤
w wa,b (x (a) − x (b))2 = cs,t . nodes not in B will be internal. Call them I = V − B.
2 2
(a,b)∈E (a,b)∈E
As an operator, the Laplacian maps vectors of voltages to vectors of external currents. We want
b such that x (s) = 0 and x (t) = 1 will be at most cs,t ,
So, the minimum energy of a vector x in G to examine what happens if we fix the voltages at vertices in B, and require the rest to be
cs,t ≤ cs,t .
and so b harmonic. Let v (B) ∈ IRB be the voltages at B. We want the matrix LB such that
i B = LB v (B)
Similarly, if we let R b eff (s, t) be the effective resistance in G between s and t, then
b eff (s, t) ≥ Reff (s, t). That is, increasing the resistance of resistors in the network cannot decrease
R is the vector of external currents a vertices in B when we impose voltages v (B) at vertices of B.
effective resistances. As the internal vertices will have their voltages set to be harmonic, they will not have any
external currents.
While this principle seems very simple and intuitively obvious, it turns out to fail in just slightly
more complicated situations. The remarkable fact that we will discover is that LB is in fact a Laplacian matrix, and that it is
obtained by performing Gaussian elimination to remove the internal vertices. Warning: LB is
not a submatrix of L. To prove this, we will move from V to B by removing one vertex at a time.
12.4 Examples: Series and Parallel We’ll start with a graph G = (V, E, w), and we will set B = {2, . . . , n}, and we will treat vertex 1
as internal. Let N denote the set of neighbors of vertex 1.
In the case of a path graph with n vertices and edges of weight 1, the effective resistance between We want to compute Lv given v (b) for b ∈ B, and that
the extreme vertices is n − 1.
1 X
In general, if a path consists of edges of resistance r1,2 , . . . , rn−1,n then the effective resistance v (1) = w1,a v (a). (12.1)
d (1)
between the extreme vertices is a∈N
r1,2 + · · · + rn−1,n .
That is, we want to substitute the value on the right-hand side for v (1) everywhere that it
To see this, set the potential of vertex i to appears in the equation i ext = Lv . The variable v (1) only appears in the equation for i ext (a)
when a ∈ N . When it does, it appears with coefficient w1,a . Recall that the equation for i ext (b) is
v (i) = ri,i+1 + · · · + rn−1,n .
X
Ohm’s law then tells us that the current flow over the edge (i, i + 1) will be i ext (b) = d (b)v (b) − wb,c v (c).
c∼b
(v (i) − v (i + 1)) /ri,i+1 = 1.
For b ∈ N we expand this by making the substitution for v (1) given by (12.1).
X
If we have k parallel edges between two nodes s and t of resistances r1 , . . . , rk , then the effective i ext (b) = d (b)v (b) − wb,1 v (1) − wb,c v (c)
resistance is c∼b,c̸=1
1
Reff (s, t) = . 1 X X
1/r1 + · · · + 1/rk = d (b)v (b) − wb,1 w1,a v (a) − wb,c v (c)
d (1)
To see this, impose a potential difference of 1 between s and t. This will induce a flow of a∈N c∼b,c̸=1
X wb,1 wa,1 X
1/ri = wi on edge i. So, the total flow will be = d (b)v (b) − v (a) − wb,c v (c).
X X d (1)
a∈N c∼b,c̸=1
1/ri = wi .
i i
CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 115 CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 116
To finish, observe that b ∈ N , so we are counting b in the middle sum above. Removing the Thus, we can write the quadratic form as
double-count gives. T
X wb,1 wa,1 X −(1/L(1, 1))L(1, B)v (B) −(1/L(1, 1))L(1, B)v (B)
2 L .
i ext (b) = (d (b) − wb,1 /d (1))v (b) − v (a) − wb,c v (c). v (B) v (B)
d (1)
a∈N,a̸=b c∼b,c̸=1
Let’s look at exactly how the matrix has changed. In the row for vertex b, the edge to vertex 1 = v (B)T L(B, B)v (B) + (L(1, B)v (B))2 /L(1, 1) − 2 (L(1, B)v (B))2 /L(1, 1)
w wa,1
was removed, and edges to every vertex a ∈ N were added with weights b,1 d (1) . And, the = v (B)T L(B, B)v (B) − (L(1, B)v (B))2 /L(1, 1).
wb,1 wb,1
diagonal was decreased by d (1) . Overall, the star of edges based at 1 were removed, and a
clique on N was added in which edge (a, b) has weight Thus,
L(B, 1)L(1, B)
wb,1 w1,a LB = L(B, B) − .
. L(1, 1)
d (1)
To see that this is the matrix that appears in rows and columns 2 through n when we eliminate
To see that this new system of equations comes from a Laplacian, we observe that the entries in the first column of L by adding multiples of the first row, note that we eliminate
entry L(a, 1) by adding −L(a, 1)/L(1, 1) times the first row of the matrix to L(a, :). Doing this
1. It is symmetric. for all rows in B = {2, . . . , n} results in this formula.
2. The off-diagonal entries that have been added are negative. We can again check that LB is a Laplacian matrix. It is clear from the formula that it is
symmetric and that the off-diagonal entries are negative. To check that the constant vectors are
3. The sum of the changes in diagonal and off-diagonal entries is zero, so the row-sum is still in the nullspace, we can show that the quadratic form is zero on those vectors. If v (B) is a
zero. This follows from constant vector, then v (1) must equal this constant, and so v is a constant vector and the value
2
wb,1 X wb,1 wa,1
− = 0. of the quadratic form is 0.
d (1) d (1)
a∈N
We now do this in terms of the quadratic form. That is, we will compute the matrix LB so that We can of course use the same procedure to eliminate many vertices. We begin by partitioning
the vertex set into boundary vertices B and internal vertices I. We can then use Gaussian
v (B)T LB v (B) = v T Lv , elimination to eliminate all of the internal vertices. You should recall that the submatrices
produced by Gaussian elimination do not depend on the order of the eliminations. So, you may
given that v is harmonic at vertex 1 and agrees with v (B) elsewhere. The quadratic form that we conclude that the matrix LB is uniquely defined.
want to compute is thus given by
Or, observe that to eliminate the entries in row a ∈ B and columns in S, using the rows in S, we
1 P T 1 P need to add those rows, L(S, :) to row L(a, :) with coefficients c so that
b∼1 w1,b v (b) w1,b v (b)
d (1) L d (1) b∼1 .
v (B) v (B) L(a, S) + cL(S, S) = 0.
So that we can write this in terms of the entries of the Laplacian matrix, note that This gives
d (1) = L(1, 1), and so c = −L(a, S)L(S, S)−1 ,
1 X and thus row a becomes
v (1) = w1,b v (b) = −(1/L(1, 1))L(1, B)v (B).
d (1)
b∼1 L(a, :) − L(a, S)L(S, S)−1 L(S, :).
CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 117 CHAPTER 12. EFFECTIVE RESISTANCE AND SCHUR COMPLEMENTS 118
Restricting to rows and columns in B, we are left with the matrix We claim that the effective resistance is a distance. The only non-trivial part to prove is the
triangle inequality, (4).
−1
L(B, B) − L(B, S)L(S, S) L(S, B).
From the previous section, we know that it suffices to consider graphs with only three vertices: we
This is called the Schur complement on B (or with respect to S). can reduce any graph to one on just vertices a, b and c without changing the effective resistances
between them.
To see that this is equivalent to requiring that the variables in S be harmonic. Partition a vector
v into v (B) and v (S). The harmonic equations become Lemma 12.8.1. Let a, b and c be vertices in a graph. Then
L(S, S)v (S) + L(S, B)v (B) = 0, Reff (a, b) + Reff (b, c) ≥ Reff (a, c).
which implies
Proof. Let
v (S) = −L(S, S)−1 L(S, B)v (B) = L(S, S)−1 M (S, B)v (B),
z = wa,b , y = wa,c , and x = wb,c .
as M (S, B) = −L(S, B) because off-diagonal blocks of the Laplacian equal the negative of the
If we eliminate vertex c, we create an edge between vertices a and b of weight
corresponding blocks in the adjacency matrix. This gives
xy
.
i ext (B) = L(B, S)v (S) + L(B, B)v (B) = −L(B, S)L(S, S)−1 L(S, B)v (B) + L(B, B)v (B), x+y
xy
and so Adding this to the edge that is already there produces weight z + x+y , for
−1
i ext (B) = LB v (B), where LB = L(B, B) − L(B, S)L(S, S) L(S, B)
1 1 x+y
is the Schur complement. Reff a,b = xy = zx+zy+xy =
z + x+y x+y
zx + zy + xy
Working symmetrically, we find that we need to prove that for all positive x, y, and z
12.7 An interpretation of Gaussian elimination
x+y y+z x+z
+ ≥ ,
zx + zy + xy zx + zy + xy zx + zy + xy
This gives us a way of understand how Gaussian elimination solves a system of equations like
i ext = Lv . It constructs a sequence of graphs, G2 , . . . , Gn , so that Gi is the effective network on which is of course true.
vertices i, . . . , n. It then solves for the entries of v backwards. Given v (i + 1), . . . , v (n) and
i ext (i), we can solve for v (i). If i ext (i) = 0, then v (i) is set to the weighted average of its
neighbors. If not, then we need to take i ext (i) into account here and in the elimination as well. In
the case in which we fix some vertices and let the rest be harmonic, there is no such complication.
The second comes from the observation that the determinant is the volume P of the parallelepiped
with axes a 1 , . . . , a n : the polytope whose corners are the origin and i∈S a i for every
S ⊆ {1, . . . , n}. Let
Πa 1
be the symmetric projection orthogonal to a 1 . As this projection amounts to subtracting off a
multiple of a 1 and elementary row operations do not change the determinant,
Chapter 13
det a 1 , a 2 , . . . , a n = det a 1 , Πa 1 a 2 , . . . , Πa 1 a n .
The volume of this parallelepiped is ∥aa 1 ∥ times the volume of the parallelepiped formed by the
Random Spanning Trees vectors Πa 1 a 2 , . . . , Πa 1 a n . I would like to write this as a determinant, but must first deal with
the fact that these are n − 1 vectors in an n dimensional space. The way we first learn to handle
this is to project them into an n − 1 dimensional space where we can take the determinant.
Instead, we will employ other elementary symmetric functions of the eigenvalues.
13.1 Introduction
13.3 Characteristic Polynomials
In this chapter we present one of the most fundamental results in Spectral Graph Theory: the
Matrix-Three Theorem. It relates the number of spanning trees of a connected graph to the Recall that the characteristic polynomial of a matrix A is
determinants of principal minors of the Laplacian. We then extend this result to relate the
fraction of spanning trees that contain a given edge to the effective resistance of the entire graph det(xI − A).
between the edge’s endpoints.
I will write this as
n
X
xn−k (−1)k σk (A),
13.2 Determinants k=0
where σk (A) is the kth elementary symmetric function of the eigenvalues of A, counted with
To begin, we review some facts about determinants of matrices and characteristic polynomials. algebraic multiplicity: X Y
We first recall the Leibniz formula for the determinant of a square matrix A: σk (A) = λi .
n
! |S|=k i∈S
X Y
det(A) = sgn(π) A(i, π(i)) , (13.1) Thus, σ1 (A) is the trace and σn (A) is the determinant. From this formula, we know that these
π i=1 functions are invariant under similarity transformations.
where the sum is over all permutations π of {1, . . . , n}. In Exercise 3 from Lecture 2, you were asked to prove that
Also recall that the determinant is multiplicative, so for square matrices A and B X
σk (A) = det(A(S, S)). (13.3)
|S|=k
det(AB) = det(A) det(B). (13.2)
This follows from applying the Leibnitz formula (13.1) to det(xI − A).
Elementary row operations do not change the determinant. If the columns of A are the vectors
a 1 , . . . , a n , then for every c If we return to the vectors Πa 1 a 2 , . . . , Πa 1 a n from the previous section, we see that the volume of
their parallelepiped may be written
det a 1 , a 2 , . . . , a n = det a 1 , a 2 , . . . , a n + caa 1 .
σn−1 0n , Πa 1 a 2 , . . . , Πa 1 a n ,
This fact gives us two ways of computing the determinant. The first comes from the fact that we
can apply elementary row operations to transform A into an upper triangular matrix, and (13.1) as this will be the product of the n − 1 nonzero eigenvalues of this matrix.
tells us that the determinant of an upper triangular matrix is the product of its diagonal entries.
119
CHAPTER 13. RANDOM SPANNING TREES 121 CHAPTER 13. RANDOM SPANNING TREES 122
Recall that the matrices BB T and B T B have the same eigenvalues, up to some zero eigenvalues We will prove that for every a ∈ V ,
if they are rectangular. So, Y
det(LG (Sa , Sa )) = we .
σk (BB T ) = σk (B T B).
e∈E
This gives us one other way of computing the absolute value of the product of the nonzero Write LG = U T W U , where U is the signed edge-vertex adjacency matrix and W is the
eigenvalues of the matrix diagonal matrix of edge weights. Write B = W 1/2 U , so
Πa 1 a 2 , . . . , Πa 1 a n .
LG (Sa , Sa ) = B(:, Sa )T B(:, Sa ),
We can instead compute their square by computing the determinant of the square matrix
and
Πa 1 a 2 det(LG (Sa , Sa )) = det(B(:, Sa ))2 ,
..
. Πa 1 a 2 , . . . , Πa 1 a n . where we note that B(:, Sa ) is square because a tree has n − 1 edges and so B has n − 1 rows.
Πa 1 a n To see what is going on, first consider the case in which G is a weighted path and a is the first
vertex. Then,
When B is a singular matrix of rank k, σk (B) acts as the determinant of B restricted to its span. √
1 −1 0 · · · 0 − w1 0 ··· 0
Thus, there are situations in which σk is multiplicative. For example, if A and B both have rank 0 1 −1 · · · 0 √w2 −√w2 · · ·
0
k and the range of A is orthogonal to the nullspace of B, then U = . .. , and B(:, S1 ) = .. .. .
.. . . .
σk (BA) = σk (B)σk (A). (13.4) √
0 0 0 · · · −1 0 0 · · · − wn−1
We will use this identity in the case that A and B are symmetric and have the same nullspace. We see that B(:, S1 ) is a lower-triangular matrix, and thus its determinant is the product of its
√
diagonal entries, − wi .
To see that the same happens for every tree, renumber the vertices (permute the columns) so that
13.4 The Matrix Tree Theorem a comes first, and that the other vertices are ordered by increasing distance from 1, breaking ties
arbitrarily. This permutations can change the sign of the determinant, but we do not care
We will state a slight variant of the standard Matrix-Tree Theorem. Recall that a spanning tree because we are going to square it. For every vertex c ̸= 1, the tree now has exactly one edge (b, c)
of a graph is a subgraph that is a tree. with b < c. Put such an edge in position c − 1 in the ordering, and let wc indicate its weight.
Now, when we remove the first column to form B(:, S1 ), we produce a lower triangular matrix
Theorem 13.4.1. Let G = (V, E, w) be a connected, weighted graph. Then √
with the entry − wc on the cth diagonal. So, its determinant is the product of these terms and
X Y
σn−1 (LG ) = n we . n
Y
spanning trees T e∈T det(B(:, Sa ))2 = wc .
c=2
Thus, the eigenvalues allow us to count the sum over spanning trees of the product of the weights
of edges in those trees. When all the edge weights are 1, we just count the number of spanning
trees in G. Proof of Theorem 13.4.1 . As in the previous lemma, let LG = U T W U and B = W 1/2 U . So,
We first prove this in the case that G is just a tree. σn−1 (LG ) = σn−1 (B T B)
Lemma 13.4.2. Let G = (V, E, w) be a weighted tree. Then, = σn−1 (BB T )
X
Y = σn−1 (B(S, :)B(S, :)T ) (by (13.3) )
σn−1 (LG ) = n we .
|S|=n−1,S⊆E
e∈E X
= σn−1 (B(S, :)T B(S, :))
Proof. For a ∈ V , let Sa = V − {a}. We know from (13.3) |S|=n−1,S⊆E
X
X = σn−1 (LGS ),
σn−1 (LG ) = det(LG (Sa , Sa ).
|S|=n−1,S⊆E
a∈V
CHAPTER 13. RANDOM SPANNING TREES 123 CHAPTER 13. RANDOM SPANNING TREES 124
where by GS we mean the graph containing just the edges in S. As S contains n − 1 edges, this Proof. The matrix Γ is clearly symmetric. To show that it is a projection, it suffices to show that
graph is either disconnected or a tree. If it is disconnected, then its Laplacian has at least two all of its eigenvalues are 0 or 1. This is true because, excluding the zero eigenvalues, Γ has the
zero eigenvalues and σn−1 (LGS ) = 0. If it is a tree, we apply the previous lemma. Thus, the sum same eigenvalues as
equals X X Y L+ T +
G B B = LG LG = Π,
σn−1 (LGT ) = n we . where Π is the projection orthogonal to the all 1 vector. As Π has n − 1 eigenvalues that are 1,
spanning trees T ⊆E spanning trees T e∈T
so does Γ.
The leverage score of an edge, written ℓe is defined to be we Reff (e). That is, the weight of the This is a good sanity check on Theorem 13.5.1: every spanning tree has n − 1 edges, and thus the
edge times the effective resistance between its endpoints. The leverage score serves as a measure probabilities that each edge is in the tree must sum to n − 1.
of how important the edge is. For example, if removing an edge disconnects the graph, then
Reff (e) = 1/we , as all current flowing between its endpoints must use the edge itself, and ℓe = 1. We also obtain another formula for the leverage score. As a symmetric projection is its own
square,
Consider sampling a random spanning tree with probability proportional to the product of the Γ(e, e) = Γ(e, :)Γ(e, :)T = ∥Γ(e, :)∥2 .
weights of its edges. We will now show that the probability that edge e appears in the tree is
This is the formula I introduced in Section ??. If we flow 1 unit from a to b, the potential
exactly its leverage score.
difference between c and d is (δ a − δ b )T L+
G (δ c − δ d ). If we plug these potentials into the
Theorem 13.5.1. If we choose a spanning tree T with probability proportional to the product of Laplacian quadratic form, we obtain the effective resistance. Thus this formula says
its edge weights, then for every edge e X 2
wa,b Reff a,b = wa,b wc,d (δ a − δ b )T L+
G (δ c − δ d ) .
Pr [e ∈ T ] = ℓe . (c,d)∈E
Proof of Theorem 13.5.1. Let Span(G) denote the set of spanning trees of G. For an edge e,
For simplicity, you might want to begin by thinking about the case where all edges have weight 1.
X σn−1 (LGT )
Recall that the effective resistance of edge e = (a, b) is PrT [e ∈ T ] =
σn−1 (LG )
T ∈Span(G):e∈T
(δa − δb )T L+ X
G (δa − δb ), = σn−1 (LGT )σn−1 (L+
G)
and so T ∈Span(G):e∈T
X
ℓa,b = wa,b (δa − δb )T L+
G (δa − δb ). = σn−1 (LGT L+
G ),
T ∈Span(G):e∈T
We can write a matrix Γ that has all these terms on its diagonal by letting U be the edge-vertex
adjacency matrix, W be the diagonal edge weight matrix, B = W 1/2 U , and setting by (13.4). Recalling that the subsets of n − 1 edges that are not spanning trees contribute 0
allows us to re-write this sum as
Γ = BL+ T
GB . X
σn−1 (LGS L+
G ).
The rows and columns of Γ are indexed by edges, and for each edge e, |S|=n−1,e∈S
σn−1 (Γ(S, :)Γ(:, S)) = ∥γ e ∥2 σn−2 (Γ(S, :)Πγ e Γ(:, S)) = ∥γ e ∥2 σn−2 ((ΓΠγ e Γ)(S, S)).
= ∥γ e ∥2 σn−2 (ΓΠγ e Γ)
= ∥γ e ∥2 In this chapter, we will see how to use the Johnson-Lindenstrauss Lemma, one of the major
techniques for dimension reduction, to approximately represent and compute effective resistances.
= ℓe .
Throughout this chapter, G = (V, E, w) will be a connected, weighted graph with n vertices and
m edges.
We begin by considering the problem of building a data structure from which one can quickly
estimate the effective resistance between every pair of vertices a, b ∈ V . To do this, we exploit the
fact from Section 12.1 that effective resistances can be expressed as squares of Euclidean distances:
Reff (a, b) = (δ a − δ b )T L+ (δ a − δ b )
2
= L+/2 (δ a − δ b )
2
= L+/2 δ a − L+/2 δ b
= dist(L+/2 δ a , L+/2 δ b )2 .
One other way of expressing the above terms is through a matrix norm . For a positive
semidefinite matrix A, the matrix norm in A is defined by
√
∥x ∥A = x T Ax = A1/2 x .
It is worth observing that this is in fact a norm: it is zero when x is zero, it is symmetric, and it
obeys the triangle inequality: for x + y = z ,
126
CHAPTER 14. APPROXIMATING EFFECTIVE RESISTANCES 127 CHAPTER 14. APPROXIMATING EFFECTIVE RESISTANCES 128
The Johnson-Lindenstrauss Lemma [JL84] tells us that every Euclidean metric on n points is
well-approximated by a Euclidean metric in O(log n) dimensions, regardless of the original
dimension of the points. Johnson and Lindenstrauss proved this by applying a random orthogonal
projection to the points. As is now common, we will analyze the simpler operation of applying a
random matrix of Gaussian random variables (also known as Normal variables). All Gaussian
random variables that appear in this chapter will have mean 0.

We recall that a Gaussian random variable of variance 1 has probability density
p(x) = (1/√(2π)) exp(−x^2/2),
and that a Gaussian random variable of variance σ^2 has probability density
p(x) = (1/(√(2π) σ)) exp(−x^2/2σ^2).

The distribution of such a variable is written N(0, σ^2), where the 0 corresponds to the mean
being 0. A variable with distribution N(0, σ^2) may be obtained by sampling one with distribution
N(0, 1), and then multiplying it by σ. Gaussian random variables have many special properties,
some of which we will see in this chapter. For those who are not familiar with them, we begin by
mentioning that they are the limit of a binomial distribution. If X is the sum of n ±1 random
variables for large n, then
Pr[X/√n = t] → p(t).

Theorem 14.1.1. Let x_1, ..., x_n be vectors in IR^k. For any ϵ, δ > 0, let d = 8 ln(n^2/δ)/ϵ^2. If R
is a d-by-k matrix of independent N(0, 1/d) variables, then with probability at least 1 − δ, for all
a ≠ b,
(1 − ϵ) dist(x_a, x_b)^2 ≤ dist(Rx_a, Rx_b)^2 ≤ (1 + ϵ) dist(x_a, x_b)^2.

Thus, if we set d = 8 ln(n^2/δ)/ϵ^2, let R be a d-by-n matrix of independent N(0, 1/d) variables,
and set y_a = RL^{+/2} δ_a for each a ∈ V, then with probability at least 1 − δ we will have that for
every a and b, Reff(a, b) is within a 1 ± ϵ factor of dist(y_a, y_b)^2. Whereas writing all effective
resistances would require n^2 numbers, storing y_1, ..., y_n only requires nd numbers.

We remark that the 8 in the theorem can be replaced with a constant that tends towards 4 as ϵ
goes to zero.

14.2 Computing Effective Resistances

Note that the naive way of computing one effective resistance requires solving one Laplacian
system: (δ_a − δ_b)^T L^+ (δ_a − δ_b). We will see that we can approximate all of them by solving a
logarithmic number of such systems.

If we could quickly multiply a vector by L^{+/2}, then this would give us a fast way of approximately
computing all effective resistances. All we would need to do is multiply each of the d rows of R by
L^{+/2}. This would provide the matrix RL^{+/2}, from which we could compute RL^{+/2} δ_a by selecting
the ath column. This leads us to ask how quickly we can multiply a vector by L^{+/2}. Cheng,
Cheng, Liu, Peng and Teng [CCL+15] show that this can be done in nearly-linear time. In this
section, we will present a more elementary approach that merely requires solving systems of
equations in Laplacian matrices. We will see in Chapter ?? that this can be done very quickly.

The key is to realize that we do not actually need to multiply by the square root of the
pseudoinverse of the Laplacian. Any matrix M such that M^T M = L^+ will suffice.

Recall that we can write L = U^T W U, where U is the signed edge-vertex adjacency matrix and
W is the diagonal matrix of edge weights. We then have
L^+ U^T W^{1/2} W^{1/2} U L^+ = L^+ L L^+ = L^+.
So,
∥W^{1/2} U L^+ (δ_a − δ_b)∥^2 = Reff(a, b).

Now, we let R be a d-by-m matrix of independent N(0, 1/d) entries, and compute
R W^{1/2} U L^+ = (R W^{1/2} U) L^+.
This requires multiplying d vectors in IR^m by W^{1/2} U, and solving d systems of linear equations
in L. We then set
y_a = (R W^{1/2} U) L^+ δ_a.
Each of these is a vector in d dimensions, and with high probability ∥y_a − y_b∥^2 is a good
approximation of Reff(a, b).

14.3 Properties of Gaussian random variables

The sum of Gaussian random variables is another Gaussian random variable.

Claim 14.3.1. If r_1, ..., r_n are independent Gaussian random variables of variances σ_1^2, ..., σ_n^2,
respectively, then
Σ_{i=1}^{n} r_i
is a Gaussian random variable of variance
Σ_{i=1}^{n} σ_i^2.

One way to remember this is to recall that for a N(0, σ^2) random variable r, E r^2 = σ^2, and the
variance of the sum of independent random variables is the sum of their variances. The above
claim adds the fact that the sum is also Gaussian.

In particular, if x is an arbitrary vector and r is a vector of independent N(0, 1) random
variables, then x^T r is a Gaussian random variable of variance ∥x∥^2. This follows because
x(i)r(i) has variance x(i)^2, and
x^T r = Σ_i x(i) r(i).
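The procedure above is easy to prototype. The following sketch is my own illustration, not the author's code: it uses a dense pseudoinverse as a stand-in for the fast Laplacian solvers the chapter refers to, and the helper names and example graph are made up.

    import numpy as np

    def incidence_and_weights(edges, weights, n):
        """Signed edge-vertex incidence matrix U (m-by-n) and diagonal weight matrix W."""
        m = len(edges)
        U = np.zeros((m, n))
        for i, (a, b) in enumerate(edges):
            U[i, a], U[i, b] = 1.0, -1.0
        return U, np.diag(weights)

    def approx_effective_resistances(edges, weights, n, eps=0.3, delta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        U, W = incidence_and_weights(edges, weights, n)
        L = U.T @ W @ U
        m = len(edges)
        d = int(np.ceil(8 * np.log(n * n / delta) / eps**2))
        # d-by-m matrix of independent N(0, 1/d) entries.
        R = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, m))
        # Each row of (R W^{1/2} U) L^+ requires one Laplacian solve; a dense
        # pseudoinverse stands in for a nearly-linear-time solver here.
        Y = (R @ np.sqrt(W) @ U) @ np.linalg.pinv(L)   # d-by-n; column a is y_a
        return Y

    # Example: an unweighted cycle on 5 vertices.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
    Y = approx_effective_resistances(edges, [1.0] * 5, n=5)
    print(np.sum((Y[:, 0] - Y[:, 3]) ** 2))   # close to the exact value 1.2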
If x ∈ IR^k and R is a matrix of independent N(0, σ^2) variables, then each entry of Rx is an
independent N(0, σ^2 ∥x∥^2) random variable. They are independent because each entry comes
from a separate row of R, and the variables in different rows are independent from each other.

The norm of a vector of identical independent N(0, 1) random variables is called a χ random
variable, and its square is a χ^2 random variable. A lot is known about the distribution of χ^2
random variables. If the vector has dimension d, then its expectation is d. It is very unlikely to
deviate too much from this.

Finally, the probability that X − d > ϵd or X − d < −ϵd is at most the sum of these probabilities,
which is at most 2 exp(−t).

We remark that for small ϵ the term 2ϵd/√8 dominates, and the upper bound of ϵd approaches
ϵd/√2. If one pushes this into the proof below, we see that it suffices to project into a space of
dimension of just a little more than 4 ln(n^2/δ)/ϵ^2, instead of 8 ln(n^2/δ)/ϵ^2.

Proof of Theorem 14.1.1. First consider an arbitrary a and b, and let ∆ = ∥x_a − x_b∥^2. Then
R(x_a − x_b) is a d-dimensional vector of N(0, σ^2) variables, where σ^2 = ∆/d. Thus,
Corollary 14.3.3 tells us that
Pr[ |dist(Rx_a, Rx_b)^2 − dist(x_a, x_b)^2| > ϵ dist(x_a, x_b)^2 ]
= Pr[ |∥R(x_a − x_b)∥^2 − ∆| ≥ ϵ∆ ] ≤ 2 exp(−ϵ^2 d/8).

Thus the choice of d = 8 ln(n^2/δ)/ϵ^2 makes this probability at most
2 exp(−ϵ^2 d/8) ≤ 2 exp(−ln(n^2/δ)) = 2δ/n^2.
As there are at most n^2/2 possible choices for a and b, the probability that there is one such that
∥R(x_a − x_b)∥^2 ∉ (1 ± ϵ) ∥x_a − x_b∥^2
is at most δ.
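An empirical check of the statement being proved can be reassuring. The following is a small experiment I am adding for illustration (the parameters and point set are arbitrary): project random high-dimensional points with a Gaussian matrix and measure the worst pairwise distortion.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k, eps, delta = 50, 1000, 0.5, 0.1
    d = int(np.ceil(8 * np.log(n * n / delta) / eps**2))

    X = rng.normal(size=(n, k))                       # n points in IR^k
    R = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, k))
    Y = X @ R.T                                       # projected points in IR^d

    worst = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            orig = np.sum((X[a] - X[b]) ** 2)
            proj = np.sum((Y[a] - Y[b]) ** 2)
            worst = max(worst, abs(proj - orig) / orig)
    print(d, worst)   # with high probability, worst < eps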
Chapter 15

Tutte's Theorem: How to draw a graph

We prove Tutte's theorem [Tut63], which shows how to use spring embeddings to obtain planar
drawings of 3-connected planar graphs. One begins by selecting a face, and then nailing down the
positions of its vertices to the corners of a strictly convex polygon. Of course, the edges of the
face should line up with the edges of the polygon. Every other vertex goes where the springs say
they should—to the center of gravity of their neighbors. Tutte proved that the result is a planar
embedding of the planar graph. Here is an image of such an embedding.

The presentation in this lecture is based on notes given to me by Jim Geelen. I begin by
recalling some standard results about planar graphs that we will assume.

15.1 3-Connected, Planar Graphs

A graph G = (V, E) is k-connected if there is no set of k − 1 vertices whose removal disconnects
the graph. That is, for every S ⊂ V with |S| < k, G(V − S) is connected. In a classical graph
theory course, one usually spends a lot of time studying things like 3-connectivity.

A planar drawing of a graph G = (V, E) consists of a mapping from the vertices to the plane,
z : V → IR^2, along with interior-disjoint curves for each edge. The curve for edge (a, b) starts at
z(a), ends at z(b), never crosses itself, and its interior does not intersect the curve for any other
edge. A graph is planar if it has a planar drawing. There can, of course, be many planar drawings
of a graph.

If one removes the curves corresponding to the edges in a planar drawing, one divides the plane
into connected regions called faces. In a 3-connected planar graph, the sets of vertices and edges
that border each face are the same in every planar drawing. There are planar graphs that are not
3-connected, like those in Figures 15.1 and 15.2, in which different planar drawings result in
combinatorially different faces. We will only consider 3-connected planar graphs.

Figure 15.1: Planar graphs that are merely one-connected. Edge (c, d) appears twice on a face in
each of them.

Figure 15.2: Two different planar drawings of a planar graph that is merely two-connected. Vertices
g and h have switched positions, and thus appear in different faces in each drawing.

We state a few properties of 3-connected planar graphs that we will use. We will not prove these
properties, as we are more concerned with algebra and these properly belong in a class on
combinatorial graph theory.

Claim 15.1.1. Let G = (V, E) be a 3-connected planar graph. Then, there exists a set of faces F,
each of which corresponds to a cycle in G, so that no vertex appears twice in a face, no edge
appears twice in a face, and every edge appears in exactly two faces.

We call the face on the outside of the drawing the outside face. The edges that lie along the
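The spring-embedding computation described in the chapter introduction is a single linear solve. The sketch below is my own illustration, not the author's code: it nails one face of a small example graph to a square and places every other vertex at the average of its neighbors; the helper name and graph are assumptions for the example.

    import numpy as np

    def tutte_embedding(n, edges, outer, outer_positions):
        """Fix the outer face at given convex positions and put every other vertex
        at the center of gravity of its neighbors by solving a linear system."""
        nbrs = [[] for _ in range(n)]
        for a, b in edges:
            nbrs[a].append(b)
            nbrs[b].append(a)
        z = np.zeros((n, 2))
        fixed = set(outer)
        for v, p in zip(outer, outer_positions):
            z[v] = p
        interior = [v for v in range(n) if v not in fixed]
        idx = {v: i for i, v in enumerate(interior)}
        # deg(a) z(a) - sum over interior neighbors z(b) = sum over fixed neighbors z(b)
        A = np.zeros((len(interior), len(interior)))
        rhs = np.zeros((len(interior), 2))
        for a in interior:
            A[idx[a], idx[a]] = len(nbrs[a])
            for b in nbrs[a]:
                if b in fixed:
                    rhs[idx[a]] += z[b]
                else:
                    A[idx[a], idx[b]] -= 1.0
        z[interior] = np.linalg.solve(A, rhs)
        return z

    # The cube graph (3-connected and planar) with the face 0-1-2-3 nailed to a square.
    edges = [(0,1),(1,2),(2,3),(3,0),(4,5),(5,6),(6,7),(7,4),(0,4),(1,5),(2,6),(3,7)]
    square = [(-1,-1),(1,-1),(1,1),(-1,1)]
    z = tutte_embedding(8, edges, outer=[0,1,2,3], outer_positions=square)
    print(z)   # vertices 4..7 land strictly inside the square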
outside face are the boundary edges. We will use one other important fact about planar graphs, whose utility in this context was
observed by Jim Geelen.
Lemma 15.1.3. Let (a, b) be an edge of a 3-connected planar graph and let S1 and S2 be the sets
of vertices on the two faces containing (a, b). Let P be a path in G that starts at a vertex of
S1 − {a, b}, ends at a vertex of S2 − {a, b}, and that does not intersect a or b. Then, every path in
G from a to b either intersects a vertex of P or the edge (a, b).
Proof. Let s1 and s2 be the vertices at the ends of the path P . Consider a planar drawing of G
and the closed curve in the plane that follows the path P from s1 to s2 , and then connects s1 to
s2 by moving inside the faces S1 and S2 , where the path only intersects the curve for edge (a, b).
Figure 15.3: 3-connected planar graphs. Some faces of the graph on the left are abf , f gh, and This curve separates vertex a from vertex b. Thus, every path in G that connects a to b must
af he. The outer face is abcde. The graph on the right is obtained by contracting edge (g, h). intersect this curve. This means that it must either consist of just edge (a, b), or it must intersect
a vertex of P . See Figure 15.5.
Another standard fact about planar graphs is that they remain planar under edge contractions.
Contracting an edge (a, b) creates a new graph in which a and b become the same vertex, and all
edges that went from other vertices to a or b now go to the new vertex. Contractions also preserve
3-connectivity. Figure 15.3 depicts a 3-connected planar graph and the result of contracting an
edge.
A graph H = (W, F ) is a minor of a graph G = (V, E) if H can be obtained from G by
contracting some edges and possibly deleting other edges and vertices. This means that each
vertex in W corresponds to a connected subset of vertices in G, and that there is an edge between
two vertices in W precisely when there is some edge between the two corresponding subsets. This
leads to Kuratowski’s Theorem [Kur30], one of the most useful characterizations of planar graphs.
Theorem 15.1.2. A graph G is planar if and only if it does not have a minor isomorphic to the
complete graph on 5 vertices, K5 , or the bipartite complete graph between two sets of 3 vertices, Figure 15.5: A depiction of Lemma 15.1.3. S1 = abcde, S2 = abf , and the path P starts at d, ends
K3,3 . at f , and contains the other unlabeled vertices.
This is a good time to remind you what exactly a convex polygon is. A subset C ⊆ IR2 is convex
if for every two points x and y in C, the line segment between x and y is also in C. A convex
polygon is a convex region of IR2 whose boundary is comprised of a finite number of straight lines.
It is strictly convex if in addition the angle at every corner is less than π. We will always assume
that the corners of a strictly convex polygon are distinct. Two corners form an edge of the
polygon if the interior of the polygon is entirely on one side of the line through those corners.
This leads to another definition of a strictly convex polygon: a convex polygon is strictly convex if
for every edge, all of the corners of the polygon other than those two defining the edge lie entirely
Figure 15.4: The Petersen graph appears on the left. On the right is a minor of the Petersen graph on one side of the polygon. In particular, none of the other corners lie on the line.
that is isomorphic to K5 , proving that the Petersen graph is not planar.
Definition 15.2.1. Let G = (V, E) be a 3-connected planar graph. We say that z : V → IR2 is a
Tutte embedding if
Claim 15.3.2. All vertices not in F must lie strictly inside the convex hull of the polygon of
which the vertices in F are the corners.
Proof. For every vertex a not in F , we can show that the position of a is a weighted average of
the positions of vertices in F by eliminating every vertex not in F ∪ {a}. As we learned in Lecture
13, this results in a graph in which all the neighbors of a are in F , and thus the position of a is
(a) A polygon (b) A convex polygon (c) A strictly convex some weighted average of the position of the vertices in F . As the graph is 3-connected, we can
polygon
show that this average must assign nonzero weights to at least 3 of the vertices in F .
Figure 15.6: Polygons
Note that it is also possible to prove Claim 15.3.2 by showing that one could reduce the potential
energy by moving vertices inside the polygon. See Claim 8.8.1 from my lecture notes from 2015.
a. There is a face F of G such that z maps the vertices of F to the corners of a strictly convex Lemma 15.3.3. Let H be a halfspace in IR2 (that is, everything on one side of some line). Then
polygon so that every edge of the face joins consecutive corners of the polygon; the subgraph of G induced on the vertices a such that z (a) ∈ H is connected.
b. Every vertex not in F lies at the center of gravity of its neighbors.
Proof. Let t be a vector so that we can write the line ℓ in the form t T x = µ, with the halfspace
We will prove Tutte’s theorem by proving that every face of G is embedded as a strictly convex consisting of those points x for which t T x ≥ µ. Let a be a vertex such that z (a) ∈ H and let b be
polygon. In fact, we will not use the fact that every non-boundary vertex is exactly the average of a vertex that maximizes t T z (b). So, z (b) is as far from the line defining the halfspace as possible.
its neighbors. We will only use the fact that every non-boundary vertex is inside the convex hull By Claim 15.3.2, b must be on the outside face, F .
of its neighbors. This corresponds to allowing arbitrary spring constants in the embedding.
For every vertex c, define t(c) = t T z (c). We will see that there is a path in G from a to b along
Theorem 15.2.2. Let G = (V, E) be a 3-connected planar graph, and let z be a Tutte embedding which the function t never decreases, and thus all the vertices along the path lie in the halfspace.
of G. If we represent every edge of G as the straight line between the embedding of its endpoints, We first consider the case in which t(a) = t(b). In this case, we also know that a ∈ F . As the
then we obtain a planar drawing of G. vertices in F embed to a strictly convex polygon, this implies that (a, b) is an edge of that
polygon, and thus the path from a to b.
Note that if the graph were not 3-connected, then the embedding could be rather degenerate. If
If t(a) < t(b), it suffices to show that there is a path from a to some other vertex c for which
there are two vertices a and b whose removal disconnects the graph into two components, then all
t(c) > t(a) and along which t never decreases: we can then proceed from c to obtain a path to b.
of the vertices in one of those components will embed on the line segment from a to b.
Let U be the set of all vertices u reachable from a for which t(u) = t(a). As the graph is
Henceforth, G will always be a 3-connected planar graph and z will always be a Tutte embedding. connected, there must be a vertex u ∈ U that has a neighbor c ̸∈ U . By Claim 15.3.1 u must have
a neighbor c for which t(c) > t(u). Thus, a path from a through U to c suffices.
We will now obtain a contradiction by showing that G has a minor isomorphic to K3,3 . The three
vertices on one side are w1 , w2 , and w3 . The other three are obtained by contracting the vertex
sets S + , S − , and U .
that edge. This is all we need to know to prove Tutte’s Theorem. We finish the argument in the
Figure 15.7: An illustration of the proof of Lemma 15.3.4. proof below.
Proof of Theorem 15.2.2. We say that a point of the plane is generic if it does not lie on any z (a)
nor on any segment of the plane corresponding to an edge (a, b). We first prove that every generic
15.4 All faces are convex point lies in exactly one face of G.
Begin with a point that is outside the polygon on which F is drawn. Such a point lies only in the
We now prove that every face of G embeds as a strictly convex polygon. outside face. For any other generic point we can draw a curve between these points that never
intersects a z (a) and never crosses the intersection of the drawings of edges. That is, it only
Lemma 15.4.1. Let (a, b) be any non-boundary edge of the graph, and let ℓ be a line through crosses drawings of edges in their interiors. By Lemma 15.4.1, when the curve does cross such an
z (a) and z (b) (there is probably just one). Let F0 and F1 be the faces that border edge (a, b) and edge it moves from one face to another. So, at no point does it ever appear in two faces.
let S0 and S1 be the vertices on those faces, other than a and b. Then all the vertices of S0 and S1
lie on opposite sides of ℓ, and none lie on ℓ. Now, assume by way of contradiction that the drawings of two edges cross. There must be some
generic point near their intersection that lies in at least two faces. This would be a
contradiction.
Note: if z (a) = z (b), then we can find a line passing through them and one of the vertices of S0 .
This leads to a contradiction, and thus rules out this type of degeneracy.
15.5 Notes
Proof. Assume by way of contradiction that the lemma is false. Without loss of generality, we
may then assume that there are vertices of both S0 and S1 on or below the line ℓ. Let s0 and s1
be such vertices. By Lemma 15.3.4 and Claim 15.3.1, we know that both s0 and s1 have This is the simplest proof of Tutte’s theorem that I have seen. Over the years, I have taught
neighbors that lie strictly below the line ℓ. By Lemma 15.3.3, we know that there is a path P many versions of Tutte’s proof by building on expositions by Lovász [LV99] and Geelen [Gee12],
that connects s0 and s1 on which all vertices other than s0 and s1 lie strictly below ℓ. and an alternative proof of Gortler, Gotsman and Thurston [GGT06].
On the other hand, we can similarly show that both a and b have neighbors above the line ℓ,
and that they are joined by a path that lies strictly above ℓ. Thus, this path cannot consist of the
edge (a, b) and must be disjoint from P . This contradicts Lemma 15.1.3.
So, we now know that the embedding z contains no degeneracies, that every face is embedded as
a strictly convex polygon, and that the two faces bordering each edge embed on opposite sides of
I remark that this theorem has a very clean extension to irregular, weighted graphs. I just present
this version to simplify the exposition.
We can use this theorem to bound the rate of convergence of random walks in a graph. Let p t be
the probability distribution of the walk after t steps, and plot the curves p t {x}. The theorem tells
us that these curves lie beneath each other, and that each curve lies beneath a number of chords
Chapter 16 drawn across the previous. The walk is uniformly mixed when the curve reaches a straight line
from (0, 0) to (n, 1). This theorem tells us how quickly the walks approach the straight line.
Today, we will use the theorem to prove a variant of Cheeger’s inequality.
For a vector f and an integer k, we define f{k} to be the sum of the largest k entries of f. For
convenience, we define f{0} = 0. Symbolically, you can define this by setting π to be a
permutation for which
f(π(1)) ≥ f(π(2)) ≥ ... ≥ f(π(n)),
and then setting
f{k} = Σ_{i=1}^{k} f(π(i)).
Then,
Σ_i α_i f(i) ≤ f{k}.
This should be obvious, and most of you proved something like this when solving problem 2 on
homework 1. It is true because the way one would maximize this sum is by setting x to 1 for the
largest values.
Throughout this lecture, we will only consider lazy random walks on regular graphs. For a set S
For real number x between 0 and n, we define f {x} by making it be piece-wise linear between and a vertex a, we define γ(a, S) to be the probability that a walk that is at vertex a moves to S
consecutive integers. This means that for x between integers k and k + 1, the slope of f {} at x is in one step. If a is not in S, this equals one half the fraction of edges from a to S. It is one half
f (π(k + 1)). As these slopes are monotone nonincreasing, the function f {x} is concave. because there is a one half probability that the walk stays at a. Similarly, if a is in S, then γ(a, S)
equals one half plus one half the fraction of edges of a that end in S.
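To make the curve f{x} concrete, here is a small sketch I am adding for illustration (not from the text): it evaluates f{x} by sorting, taking partial sums, and interpolating piecewise linearly between consecutive integers.

    import numpy as np

    def ls_curve(f, x):
        """Evaluate f{x}: the sum of the largest floor(x) entries of f, extended
        piecewise linearly between consecutive integers (with f{0} = 0)."""
        s = np.sort(np.asarray(f, dtype=float))[::-1]    # f(pi(1)) >= ... >= f(pi(n))
        partial = np.concatenate(([0.0], np.cumsum(s)))  # partial[k] = f{k}
        k = int(np.floor(x))
        if k >= len(s):
            return partial[-1]
        return partial[k] + (x - k) * s[k]               # slope on [k, k+1] is f(pi(k+1))

    p = np.array([0.4, 0.3, 0.2, 0.1])    # e.g., a probability vector
    for x in [0, 1, 1.5, 2, 4]:
        print(x, ls_curve(p, x))
    # 0 -> 0.0, 1 -> 0.4, 1.5 -> 0.55, 2 -> 0.7, 4 -> 1.0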
We will prove the following theorem of Lovász and Simonovits [LS90] on the behavior of W f .
Theorem 16.1.1. Let W be the transition matrix of the lazy random walk on a d-regular graph
with conductance at least ϕ. Let g = W f. Then for all integers 0 ≤ k ≤ n,
g{k} ≤ (1/2) (f{k − ϕh} + f{k + ϕh}),
where h = min(k, n − k).

16.3 Warm up

We warm up by proving that the curves must lie under each other.
For a vector f and a set S, we define
f(S) = Σ_{a∈S} f(a).
For every k there is at least one set S for which
f(S) = f{k}.
If the values of f are distinct, then the set S is unique.

Lemma 16.3.1. Let f be a vector and let g = W f. Then for every x ∈ [0, n],
g{x} ≤ f{x}.

Our proof of the main theorem improves the previous argument by exploiting the conductance
through the following lemma.

Proof. To ease notation, define γ(a) = γ(a, S). We prove the theorem by rearranging the formula
g(S) = Σ_{a∈V} γ(a) f(a).
Recall that Σ_{a∈V} γ(a) = k,
and write
Σ_{a∈V} (2α(a)) = k − 2z   and   Σ_{a∈V} (2β(a)) = k + 2z.
Lemma 16.4.1. Let S be any set of k vertices. Then
Σ_{a∉S} γ(a, S) = (ϕ(S)/2) min(k, n − k).

Proof. For a ∉ S, γ(a, S) equals half the fraction of the edges from a that land in S. And, the
number of edges leaving S equals dϕ(S) min(k, n − k). So,

Lemma 16.4.2. Let W be the transition matrix of the lazy random walk on a d-regular graph,
and let g = W f. For every set S of size k with conductance at least ϕ,
g(S) ≤ (1/2) (f{k − ϕh} + f{k + ϕh}),
where h = min(k, n − k).

Lemma 16.4.1 implies that
z ≥ ϕh/2.
By Claim 16.2.2,
g(S) ≤ (1/2) (f{k − z} + f{k + z}).
Claim 16.2.1 implies
g(S) ≤ (1/2) (f{k − ϕh} + f{k + ϕh}).

Theorem 16.1.1 follows by applying Lemma 16.4.2 to sets S for which f(S) = f{k}, for each
integer k between 0 and n.
Reid Andersen observed that the technique of Lovász and Simonovits can be used to give a new
proof of Cheeger’s inequality. I will state and prove the result for the special case of d-regular
graphs that we consider in this lecture. But, one can of course generalize this to irregular,
weighted graphs.
Theorem 16.5.1. Let G be a d-regular graph with lazy random walk matrix W, and let
ω2 = 1 − λ be the second-largest eigenvalue of W. Then there is a subset of vertices S for which
ϕ(S) ≤ √(8λ).

Chapter 17

Monotonicity and its Failures
we should consider what happens if the displacement between s and t is something other than 1. This formula tells us that if we have one resistor between a and b and we fix the voltage of a to 1
If we fix the position of s to 0 and the position of t to y, then the homogeneity of the expression and the voltage of b to 0, then the amount of current that will flow from a to b is the reciprocal of
for energy (17.1) tells us that the vector yx will minimize the energy subject to the boundary the resistance. It also tells us that if we want to flow one unit of current, then we need to place a
conditions. Moreover, the energy in this case will be y 2 /2 times the effective spring constant. potential difference of ra,b between a and b. Recall that we define the weight of an edge to be the
reciprocal of its resistance, as high resistance corresponds to poor connectivity. We can use this
formula to define the effective resistance between two vertices s and t in an arbitrary complex
17.3 Monotonicity network of resistors: we define the effective resistance between s and t to be the potential
difference needed to flow one unit of current from s to t.
Rayleigh’s Monotonicity Principle tells us that if we alter the spring network by decreasing some Algebraically, define i ext to be the vector
of the spring constants, then the effective resistance between s and t will not increase.
1 if a = s
b = (V, E, w)
Theorem 17.3.1. Let G = (V, E, w) be a weighted graph and let G b be another
i ext (a) = −1 if a = t .
weighted graph with the same edges and such that
0 otherwise
ba,b ≤ wa,b
w
This corresponds to a flow of 1 from s to t. We then solve for the voltages that realize this flow:
for all (a, b) ∈ E. For vertices s and t, let cs,t be the effective spring constant between s and t in
G and let b b Then,
cs,t be the analogous quantity in G. Lv = i ext ,
cs,t ≤ cs,t .
b by
v = L+ i ext .
Proof. Let x be the vector of minimum energy in G such that x (s) = 0 and x (t) = 1. Then, the We thus have
def
b is no greater:
energy of x in G v (s) − v (t) = i Text v = i Text L+ i ext = Reff (s, t).
1 X 1 X
ba,b (x (a) − x (b))2 ≤
w wa,b (x (a) − x (b))2 = cs,t . This agrees with the other natural approach to defining effective resistance: twice the energy
2 2 dissipation when we flow one unit of current from s to t.
(a,b)∈E (a,b)∈E
b such that x (s) = 0 and x (t) = 1 will be at most cs,t , Theorem 17.4.1. Let i be the electrical flow of one unit from vertex s to vertex t in a graph G.
So, the minimum energy of a vector x in G
Then,
cs,t ≤ cs,t .
and so b
Reff s,t = E (i ) .
While this principle seems very simple and intuitively obvious, it turns out to fail in just slightly
Proof. Recalling that i ext = Lv , we have
more complicated situations. Before we examine them, I will present the analogous material for
electrical networks. Reff s,t = i Text L+ i ext = v T LL+ Lv = v T Lv = E (v ) .
17.5 Examples I now cut the small wire connecting point b to point c. While you would expect that removing
material from the supporting structure would cause the weight to go down, it will in fact move
In the case of a path graph with n vertices and edges of weight 1, the effective resistance between up. To see why, let’s analyze the resulting structure. It consists of two supports in parallel. One
the extreme vertices is n − 1. consists of a spring from point a to point b followed by a wire of length 1 + ϵ from point b to d.
The other has a wire of length 1 + ϵ from point a to point c followed by a spring from point c to
In general, if a path consists of edges of resistance r(1, 2), . . . , r(n − 1, n) then the effective point d. Each of these is supporting the weight, and so each carries half the weight. This means
resistance between the extreme vertices is that the length of the springs will be 1/2. So, the distance from a to d should be essentially 3/2.
r(1, 2) + · · · + r(n − 1, n). This sounds like a joke, but we will see in class that it is true. The measurements that we get will
not be exactly 2 and 3/2, but that is because it is difficult to find ideal springs at Home Depot.
To see this, set the potential of vertex i to
In the example with resistors and diodes, one can increase electrical flow between two points by
v (i) = r(i, i + 1) + · · · + r(n − 1, n). cutting a wire!
Ohm’s law then tells us that the current flow over the edge (i, i + 1) will be
17.7 Traffic Networks
(v (i) − v (i + 1)) /r(i, i + 1) = 1.
I will now explain some analogous behavior in traffic networks. We will examine them more
If we have k parallel edges between two nodes s and t of resistances r1 , . . . , rk , then the effective formally in the next lecture.
resistance is
1 We will use a very simple model of a road in a traffic network. It will be a directed edge between
Reff (s, t) = .
1/r1 + · · · + 1/rk two vertices. The rate at which traffic can flow on a road will depend on how many cars are on
Again, to see this, note that the flow over the ith edge will be the road: the more cars, the slower the traffic. I will assume that our roads are linear. That is,
when a road has flow f , the time that it takes traffic to traverse the road is
1/ri
, af + b,
1/r1 + · · · + 1/rk
so the total flow will be 1. for some nonnegative constants a and b. I call this the characteristic function of the road.
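The series and parallel formulas above are easy to confirm with the Laplacian pseudoinverse. The sketch below is added for illustration only (the helper name reff and the dense pinv-based solve are my own choices for small examples).

    import numpy as np

    def reff(edges, weights, n, s, t):
        """Effective resistance via the Laplacian pseudoinverse (dense, small graphs)."""
        L = np.zeros((n, n))
        for (a, b), w in zip(edges, weights):
            L[a, a] += w; L[b, b] += w
            L[a, b] -= w; L[b, a] -= w
        i_ext = np.zeros(n); i_ext[s], i_ext[t] = 1.0, -1.0
        return i_ext @ np.linalg.pinv(L) @ i_ext

    # Series: a path with resistances 1, 2, 3 (edge weights are reciprocals of resistances).
    print(reff([(0, 1), (1, 2), (2, 3)], [1, 1/2, 1/3], 4, 0, 3))   # about 6 = 1 + 2 + 3

    # Parallel: three edges of resistances 1, 2, 3 between the same pair of vertices.
    print(reff([(0, 1), (0, 1), (0, 1)], [1, 1/2, 1/3], 2, 0, 1))   # about 1/(1 + 1/2 + 1/3)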
We first consider an example of Pigou consisting of two roads between two vertices, s and t. The
slow road will have characteristic function 1: think of a very wide super-highway that goes far out
17.6 Breakdown of Monotonicity of the way. No matter how many cars are on it, the time from s to t will always be 1. The fast
road is better: its characteristic is f . Now, assume that there is 1 unit of traffic that would like to
We will now exhibit a breakdown of monotonicity in networks of nonlinear elements. In this case, go from s to t.
we will consider a network of springs and wires. For examples in electrical networks with resistors
and diodes or for networks of pipes with valves, see [PP03] and [CH91]. A global planner that could dictate the route that everyone takes could minimize the average time
of the traffic going from s to t by assigning half of the traffic to take the fast road and half of the
There will be 4 important vertices in the network that I will describe, a, b, c and d. Point a is traffic to take the slow road. In this case, half of the traffic will take time 1 and half will take time
fixed in place at the top of my apparatus. Point d is attached to an object of weight 1. The 1/2, for an average travel time of 3/4. To see that this is optimal, let f be the fraction of traffic
network has two springs of spring constant 1: one from point a to point b and one from point c to that takes the fast road. Then, the average travel time will be
point d. There is a very short wire connecting point b to point c.
f · f + (1 − f ) · 1 = f 2 − f + 1.
As each spring is supporting one unit of weight, each is stretched to length 1. So, the distance
from point a to point d is 2. Taking derivatives, we see that this is minimized when
I now add two more wires to the network. One connects point a to point c and the other connects
2f − 1 = 0,
point b to point d. Both have lengths 1 + ϵ, and so are slack. Thus, the addition of these wires
does not change the position of the weight. which is when f = 1/2.
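As a tiny numerical check of the Pigou computation above (added for illustration, not from the text), the following script evaluates the social cost f^2 + (1 − f) over all splits and compares the optimum to the Nash outcome in which everyone takes the fast road.

    import numpy as np

    f = np.linspace(0.0, 1.0, 10001)
    social_cost = f * f + (1.0 - f) * 1.0

    opt = social_cost.min()          # 3/4, attained at f = 1/2
    nash = social_cost[-1]           # f = 1: everyone on the fast road, cost 1
    print(opt, nash, nash / opt)     # price of anarchy 4/3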
On the other hand, this is not what people will naturally do if they have perfect information and more than 4/3 when the cost functions are linear. If there is time today, I will begin a more
freedom of choice. If an f < 1 fraction of the flow is going along the fast road, then those travelling formal analysis of Opt(G) and Nash(G) that we will need in our proof.
on the fast road will get to t faster than those going on the slow road. So, anyone going on the
slow road would rather take the fast road. So, all of the traffic will wind up on the fast road, and
it will become not-so-fast. All of the traffic will take time 1. 17.10 Nash optimum
We call this the Nash Optimal solution, because it is what everyone will do if they are only
maximizing their own benefit. You should be concerned that this is not as well as they would do Let the set of s-t paths be P1 , . . . , Pk , and let αi be the fraction of the traffic that flows on path
if they allowed some authority to dictate their routes. For example, the authority could dictate Pi . In the Nash equilibrium, no car will go along a sub-optimal path. Assuming that each car has
that half the cars go each way every-other day, or one way in the morning and another at night. a negligible impact on the traffic flow, this means that every path Pi that has non-zero flow must
have minimal cost. That is, for all i such that αi > 0 and all j
Let’s see an even more disturbing example.
c(Pi ) ≤ c(Pj ).
and scdt, respectively. The cost of route sct is p1 + p3 + 1. The cost of route sdt is p2 + p3 + 1. Theorem 17.11.1. All local minima of the social cost function are global minima. Moreover, the
And, the cost of route scdt is p3 + p3 . So, as long as p3 is less than 1, the cheapest route will be set of global minima is convex.
scdt. So, all the traffic will go that way, and the cost of every route will be 2.
Proof. This becomes easy once we re-write the cost function as
X X
17.9 The Price of Anarchy ce (fe )fe = ae fe2 + be fe
e e
In any traffic network, we can measure the average amount of time it takes traffic to go from s to and recall that we assumed that ae and be are both at least zero. The cost function on each edge
t under the optimal flow. We call this the cost of the social optimum, and denote it by Opt(G). is convex. It is strictly convex if ae > 0, but that does not matter for this theorem.
When we let everyone pick the route that is best for themselves, the resulting solution is a Nash
If you take two flows, say f 0 and f 1 , the line segments of flows between them contains the flows of
Equilibrium, and we denote it by Nash(G).
the form f t where
The “Price of Anarchy” is the cost to society of letting everyone do their own thing. That is, it is fet = tfe1 + (1 − t)fe0 ,
the ratio
Nash(G) for 0 ≤ t ≤ 1.
.
Opt(G) By the convexity of each cost function, we know that the cost of any flow f t is at most the
In these examples, the ratio was 4/3. In the next lecture, we will show that the ratio is never maximum of the costs of f 0 and f 1 . So, if f 1 is the global optimum and f 0 is any other flow with
higher cost, the flow f ϵ will have a social cost lower than f 0 . This means that f 0 cannot be a
local optimum. Similarly, if both f 0 and f 1 are global optima, then f t must be as well.
Chapter 18
18.1 Overview
In this lecture we will consider two generalizations of resistor networks: resistor networks with
non-linear resistors and networks whose resistances change over time. While they were introduced
over 50 years ago, non-linear resistor networks seem to have been recently rediscovered in the
Machine Learning community. We will discuss how they can be used to improve the technique we
learned in Lecture 13 for semi-supervised learning.
The material on time-varying networks that I will present comes from Cameron Musco’s senior
thesis from 2012.
A non-linear resistor network, as defined by Duffin [Duf47], is like an ordinary resistor network
but the resistances depend on the potential differences across them. In fact, it might be easier not
to talk about resistances, and just say that the amount of flow across an edge increases as the
potential difference across the edge does. For every resistor e, there is a function
ϕe (v)
that gives the flow over resistor e when there is a potential difference of v between its terminals.
We will restrict our attention to functions ϕ that are
a. continuous, Theorem 18.3.1. Let G = (V, E) be a non-linear resistor network with functions fe satisfying
conditions a, b and c for every e ∈ E. For every set S ⊆ V and fixed voltages wa for a ∈ S, there
b. monotone increasing, exists a setting of voltages va for a ̸∈ S that result in a flow of current that satisfies the flow-in
c. symmetric, by which I mean ϕe (−v) = −ϕe (v). equals flow-out conditions at every a ̸∈ S. Moreover, these voltages are unique.
Note that condition c implies that ϕe (0) = 0. For an ordinary resistor of resistance r, we have Proof. For a vector of voltages v, define
X
Φ(v) = Φ(a,b) (va − vb ).
ϕe (v) = v/r.
(a,b)∈E
However, we can and will consider more interesting functions. As each of the functions Φ(a,b) are strictly convex, Φ is as well. So, Φ has a minimum subject to
If the graph is connected and we fix the voltages at some of the vertices, then there exists a the fixed voltages. At this minimum point, we know that for every a ̸∈ S
setting of voltages at the other vertices that results in a flow satisfying flow-in equals flow-out at ∂Φ(v)
0=
all non-boundary vertices. Moreover, this flow is unique. ∂va
X ∂Φ(a,b) (va − vb )
We will prove this in the next section through the use of a generalization of energy dissipation. =
∂va
b:(a,b)∈E
X
= ϕ(a,b) (va − vb ).
18.3 Energy
b:(a,b)∈E
We define the energy dissipation of an edge that has a potential difference of v to be
Φe(v) = ∫_0^v ϕe(t) dt.
We may now set
f(a,b) = ϕ(a,b)(va − vb).
This is a valid flow because for every vertex a ∉ S the sum of the flows out of va, taken with
appropriate signs, is zero.
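The energy Φe and the flow rule f(a,b) = ϕ(a,b)(va − vb) can be made concrete with a small sketch. This is my own illustration under stated assumptions, not the author's code: it fixes voltages on a set S and minimizes the total energy by gradient descent, using ϕe(v) = sign(v)|v|^{p−1} (for some p > 1) as one example of a function satisfying conditions a, b and c.

    import numpy as np

    def solve_nonlinear_network(edges, n, fixed, p=1.5, steps=20000, lr=0.05):
        """Minimize the total energy with the voltages in `fixed` held constant.
        The gradient contribution of each edge is phi_e(v_a - v_b)."""
        v = np.zeros(n)
        for a, val in fixed.items():
            v[a] = val
        free = np.array([a for a in range(n) if a not in fixed])
        for _ in range(steps):
            grad = np.zeros(n)
            for a, b in edges:
                flow = np.sign(v[a] - v[b]) * np.abs(v[a] - v[b]) ** (p - 1)
                grad[a] += flow
                grad[b] -= flow
            v[free] -= lr * grad[free]
        return v

    # A 4-cycle with two opposite vertices held at voltages 0 and 1.
    v = solve_nonlinear_network([(0, 1), (1, 2), (2, 3), (3, 0)], 4, {0: 0.0, 2: 1.0})
    print(np.round(v, 3))   # by symmetry, vertices 1 and 3 settle near 0.5

At the minimum the gradient at every free vertex vanishes, which is exactly the flow-in equals flow-out condition of Theorem 18.3.1.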
We will show that the setting of the voltages that minimizes the total energy provides the flow I Conversely, for any setting of voltages that results in a flow that has no loss or gain at any a ̸∈ S,
claimed exists. we can reverse the above equalities to show that the partial derivatives of Φ(v) are zero. As Φ(v)
In the case of linear resistors, where ϕe (v) = v/r, is strictly convex, this can only happen at the unique minimum of Φ(v).
1 v2
Φe (v) = ,
2 r 18.4 Uses in Semi-Supervised Learning
which is exactly the energy function we introduced in Lecture 13.
In Lecture 13, I suggested an approach to estimating a function f on the vertices of a graph given
its values at a set S ⊆ V:
min_{x : x(a) = f(a) for a ∈ S} Σ_{(a,b)∈E} (x(a) − x(b))^2.
The conditions on ϕe imply that
d. Φe is strictly convex¹,
e. Φe (0) = 0, and Moreover, we saw that we can minimize such a function by solving a system of linear equations.
Unfortunately, there are situations in which this approach does not work very well. In general,
f. Φe (−x) = Φe (x).
this should not be surprising: sometimes the problem is just unsolvable. But, there are cases in
which it would be reasonable to solve the learning problem in which this approach fails.
We remark that a function that is strictly convex has a unique minimum, and that a sum of
strictly convex functions is strictly convex. Better results are sometimes obtained by modifying the penalty function. For example, Bridle
1
and Zhu [BZ13] (and, essentially, Herbster and Guy [HL09]) suggest
That is, for all x ̸= y and all 0 < λ < 1, Φe (λx + (1 − λ)y) < λΦe (x) + (1 − λ)Φe (y). X
min_{x : x(a) = f(a) for a ∈ S} Σ_{(a,b)∈E} |x(a) − x(b)|^p,
for 1 < p < 2. minimality of the flow. Because Ψ(f ) is strictly convex, small changes to the optimum have a
negligible effect on its value (that is, the first derivative is zero). So, pushing an ϵ amount of flow
While a well-selected p will often improve accuracy, the drawback of this approach is that we
around any cycle will not change the value of Ψ(f ). That is, the sum of the derivatives around
cannot perform the minimization nearly as quickly as we can when p = 2.
any cycle will be zero. As
∂
Ψe (f ) = ψe (f ),
∂f
18.5 Dual Energy
this means that the sum of the desired potential differences around every cycle is zero.
We can establish a corresponding, although different, energy for the flows. Let ψ be the inverse of Theorem 18.5.2. If f = ϕ(v), then
ϕ. We then define the flow-energy of an edge that carries a flow of f to be
Z f Φ(v) + Ψ(f ) = vf.
def
Ψ(f ) = ψ(t)dt.
0 Proof. One can prove this theorem through “integration by parts”. But, I prefer a picture. In the
If we minimize the sum of the flow-energies over the space of flows, we again recover the unique following two figures, the curve is the plot of ϕ. In the first figure, the shaded region is the
valid flow in the network. (The function Φ is implicit in the work of Duffin. The dual Ψ comes integral of ϕ between 0 and v (2 in this case). In the second figure, the shaded region is the
from Millar [Mil51]). integral of ψ between 0 and ϕ(v) (just turn the picture on its side). It is clear that these are
complementary parts of the rectangle between the axes and the point (v, ϕ(v)).
In the classical case, Φ and Ψ are the same. While they are not the same here, their sum is. We
will later prove that when v = ψ(f ),
Ψ(f ) + Φ(v) = f v.
Ψ(f ) + Φ(v) ≥ f v,
resulting from the induced voltages. Let f be the flow on the edges that is compatible with f_ext.
Then, f is the flow induced by the voltages shown to exist in Theorem 18.3.1.

Sketch. We first show that f is a potential flow. That is, that there exist voltages v so that for
every edge (a, b), f(a,b) = ϕ(a,b)(va − vb). The theorem then follows by the uniqueness established
in Theorem 18.3.1.
in Theorem 18.3.1.
To prove that f is a potential flow, we consider the potential difference that the flow “wants” to
induce on each edge, ψ(f(a,b) ). There exist vertex potentials that agree with these desired
potential differences if and only if for every pair of vertices and for every pair of paths between
them, the sum of the desired potential differences along the edges in the paths is the same. To see
this, arbitrarily fix the potential of one vertex, such as s. We may then set the potential of any
other vertex a by summing the desired potential differences along the edges in any path from s.
[Figure: (a) Φ(v); (b) Ψ(f) when f = ϕ(v).]
Equivalently, the desired potential differences are realizable if and only if the sum of these desired
The bottom line is that almost all of the classical theory can be carried over to nonlinear networks.
potential differences is zero around every cycle. To show that this is the case, we use the
18.6 Thermistor Networks If the system converges, that is if the voltages at the nodes converge along with the potential
drops and flows across edges, then
We now turn our attention to networks of resistors whose resistance changes over time. We ∂Te
consider a natural model in which edges get “worn out”: as they carry more flow their resistance 0= = αe Te fe2 − (Te − TA ).
∂t
increases. One physical model that does this is a thermistor. A thermistor is a resistor whose
resistance increases with its temperature. These are used in thermostats. To turn this into a relationship between fe and ve , we apply the identity fe re = ve , which
Remember the “energy dissipation” of a resistor? The energy dissipates as heat. So, the becomes fe αe Te = ve , to obtain
temperature of resistor increases as its resistance times the square of the flow through it. To 0 = ve fe − Te + TA .
prevent the temperatures of the resistors from going to infinity, we will assume that there is an To eliminate the last occurrence of Te , we then multiply by fe and apply the same identity to
ambient temperature TA , and that they tend to the ambient temperature. I will denote by Te the produce
temperature of resistor e, and I will assume that there is a constant αe for each resistor so that its 0 = ve fe2 − ve /αe + fe TA .
resistance
The solutions of this equation in fe are given by
re = αe Te . (18.1)
We do not allow temperatures to be negative.
fe = ±√(1/αe + (TA/(2ve))^2) − TA/(2ve).
Now, assume that we would like to either flow a current between two vertices s and t, or that we
have fixed the potentials of s and t. Given the temperature of every resistor at some moment, we The correct choice of sign is the one that gives this the same sign as ve :
can compute all their resistances, and then compute the resulting electrical flow as we did in
Lecture 13. Let fe be the resulting flow on resistor e. The temperature of e will increase by re fe^2,
and it will also increase in proportion to the difference between its present temperature and the
ambient temperature.
fe = (1/(2ve)) (√((2ve)^2/αe + TA^2) − TA).    (18.3)
This gives us the following differential equation for the change in the temperature of a resistor:
∂Te/∂t = re fe^2 − (Te − TA).    (18.2)
Ok, there should probably be some constant multiplying the (Te − TA) term. But, since I haven’t
specified the units of temperature we can just assume that the constant is 1.
By substituting in (18.1) we can eliminate the references to resistance. We thus obtain
∂Te/∂t = αe Te fe^2 − (Te − TA).
When ve is small this approaches zero, so we define it to be zero when ve is zero. As ve becomes
large this expression approaches αe^{−1/2}. Similarly, when ve becomes very negative, this approaches
−αe^{−1/2}. If we now define
ϕe(ve) = (1/(2ve)) (√((2ve)^2/αe + TA^2) − TA),
we see that this function satisfies properties a, b and c. Theorem 18.3.1 then tells us that a stable
solution exists.
18.7 Low Temperatures
There are now two natural questions to ask: does the system converge, and if so, what does it
converge to? If we choose to impose a current flow between s and t, the system does not need to We now observe that when the ambient temperature is low, a thermistor network produces a
converge. For example, consider just one resistor e between vertices s and t with αe = 2. We then minimum s-t cut in a graph. The weights of the edges in the graph are related to αe . For
find simplicity, we will just examine the case when all αe = 1. If we take the limit as TA approaches
∂Te/∂t = αe Te fe^2 − (Te − TA) = 2Te − (Te − TA) = Te + TA.
zero, then the behavior of ϕe is
So, the temperature of the resistor will go to infinity.
ϕe(ve) = 0 if ve = 0,  1 if ve > 0,  −1 if ve < 0.
prove that the system will converge. While I do not have time to prove this, we can examine what
it will converge to. We will obtain similar behavior for small TA : if there is a non-negligible potential drop across an
edge, then the flow on that edge will be near 1. So, every edge will either have a flow near 1 or a
negligible potential drop. When an edge has a flow near 1, its energy will be near 1. On the other
hand, the energy of edges with negligible potential drop will be near 0.
So, in the limit of small temperatures, the energy minimization problem becomes
min_{v : v(s)=0, v(t)=1} Σ_{(a,b)∈E} |v(a) − v(b)|.
One can show that the minimum is achieved when all of the voltages are 0 or 1, in which case the
energy is the number of edges going between voltage 0 and 1. That is, the minimum is achieved
by a minimum s-t cut.
Part IV
The problem of finding large independent sets in a graph is NP-Complete, and it is very difficult
to even approximate the size of the largest independent set in a graph [FK98, Hås99]. However,
for some carefully chosen graphs, spectral analysis provides very good bounds on the sizes of
independent sets.
One of the first results in spectral graph theory was Hoffman’s [Hof70] proof of the following upper
bound on the size of an independent set in a graph G.
Independent Sets and Coloring

Theorem 19.3.1. Let G = (V, E) be a d-regular graph, and let µn be its smallest adjacency
matrix eigenvalue. Then
α(G) ≤ n · (−µn)/(d − µn).
19.1 Introduction
Recall that µn < 0. Otherwise this theorem would not make sense. We will prove a generalization
of Hoffman’s theorem due to Godsil and Newman [GN08]:
We will see how high-frequency eigenvalues of the Laplacian and Adjacency matrix can be related
to independent sets and graph coloring. Recall that we number the Laplacian matrix eigenvalues Theorem 19.3.2. Let G = (V, E) be a graph, let S be an independent set in G, and let dave (S)
in increasing order: be the average degree of a vertex in S. Then,
0 = λ1 ≤ λ2 ≤ · · · ≤ λn .
|S| ≤ n (1 − dave(S)/λn).
We call the adjacency matrix eigenvalues µ1, . . . , µn, and number them in the reverse order:
µ1 ≥ · · · ≥ µn.
The reason we reverse the order of indexing is that for d-regular graphs, µi = d − λi . For a This is a generalization of Theorem 19.3.1 because in the d-regular case dave = d and λn = d − µn .
non-empty graph, µn will be negative. So, these bounds are the same for regular graphs:
where the third equality follows from the independence of S. The reason that we subtracted s1
from 1S is that this minimizes the norm of the result. We compute
x^T x = |S|(1 − s)^2 + (|V| − |S|)s^2 = |S|(1 − 2s + s^2) + |S|s − |S|s^2 = |S|(1 − s) = n(s − s^2).
Thus,
λn ≥ x^T L x / x^T x = dave(S)|S| / (n(s − s^2)) = dave(S) s n / (n(s − s^2)) = dave(S)/(1 − s).
Re-arranging terms, this gives
1 − dave(S)/λn ≥ s,
which is equivalent to the claim of the theorem.

Corollary 7.3.2 tells us that if G is a Paley graph on n = p vertices of degree d = (p − 1)/2, then
λn = (p + √p)/2. So, for an independent set S, Hoffman’s bound tells us that
|S| ≤ n (1 − dave(S)/λn)
    = p (1 − ((p − 1)/2) / ((p + √p)/2))
    = p (√p + 1)/(p + √p)
    = √p.
One can also show that every clique in a Paley graph has size at most √p.
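For a quick numerical illustration of Hoffman's bound (Theorem 19.3.1), here is a check I am adding on the Petersen graph, which is 3-regular with smallest adjacency eigenvalue −2; the bound gives 10 · 2/(3 + 2) = 4, which happens to equal its independence number. The construction below is my own, not from the text.

    import numpy as np

    outer = [(i, (i + 1) % 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    A = np.zeros((10, 10))
    for a, b in outer + inner + spokes:
        A[a, b] = A[b, a] = 1.0

    mu = np.linalg.eigvalsh(A)
    d, mu_n = 3.0, mu[0]
    print(mu_n, 10 * (-mu_n) / (d - mu_n))   # -2.0 and 4.0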
Remarkably, this theorem holds for weighted graphs, even though edge weights do not play a role A graph is called a k-Ramsey graph if it contains no clique or independent set of size k. It is a
in independence of subsets of vertices. challenge to find large k-Ramsey graphs. Equivalently, it is challenging to find k-Ramsey graphs
on n vertices for which k is small. In one of the first papers on the Probabilistic Method in
We will use the computation of the norm of x often, so we will make it a claim. Combinatorics, Erdös proved that a random graph on n vertices in which each edge is included
with probability 1/2 is probably 2 log2 n Ramsey [Erd47].
Claim 19.3.3. Let S ⊆ V have size s |V |. Then
However, constructing explicit Ramsey graphs has proved much more challenging. Until a decade
∥1S − s1∥2 = s(1 − s) |V | . ago, Paley graphs were among the best known. A recent construction of Chattopadhyay and
Zuckerman [CZ19] provides explicit graphs on n vertices that do not have cliques or independent
Claim 19.3.4. For a vector x of length n, the value of t that minimizes the norm of x − t1 is O(1)
sets of size 2(log log n) .
t = 1T x /n.
Proof. The derivative of the square of the norm is 19.5 Lower Bound on the chromatic number
d X X
(x (a) − t)2 = 2 (x (a) − t).
dt a a
As a k-colorable graph must have an independent set of size at least n/k, an upper bound on the
sizes of independent sets gives a lower bound on its chromatic number. However, this bound is
When the norm is minimized the derivative is zero, which implies not always a good one.
X
nt = x (a) = 1T x . For example, consider a graph on 2n vertices consisting of a clique (complete graph) on n vertices
a and n vertices of degree 1, each of which is connected to a different vertex in the clique. The
chromatic number of this graph is n, because each of the vertices in the clique must have a
different color. However, the graph also has an independent set of size n, which would only give a
lower bound of 2 on the chromatic number.
19.4 Application to Paley graphs Hoffman [Hof70] proved the following lower bound on the chromatic number of a graph that does
not require the graph to be regular, and which can be applied to weighted graphs. Numerically, it
is obtained by dividing n by the bound in Theorem 19.3.1. But, the proof is very different
Let’s examine what Hoffman’s bound on the size of the largest independent set tells us about
because that theorem only applies to regular graphs.
Paley graphs.
Theorem 19.5.1.
χ(G) ≥ (µ1 − µn)/(−µn) = 1 + µ1/(−µn).
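As a small added check of this bound (my own example, not from the text): for the Petersen graph µ1 = 3 and µn = −2, so the bound gives χ ≥ 1 + 3/2 = 2.5, hence χ ≥ 3, which is its actual chromatic number.

    import numpy as np

    edges = ([(i, (i + 1) % 5) for i in range(5)] +
             [(5 + i, 5 + (i + 2) % 5) for i in range(5)] +
             [(i, i + 5) for i in range(5)])
    A = np.zeros((10, 10))
    for a, b in edges:
        A[a, b] = A[b, a] = 1.0
    mu = np.linalg.eigvalsh(A)
    print(1 + mu[-1] / (-mu[0]))   # 2.5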
The proof of this theorem relies on the following inequality whose proof we defer to Section 19.6.
To state it, we introduce the notation λmax (M ) and λmin (M ) to indicate the largest and smallest Lemma 19.6.1. Let
eigenvalues of the matrix M . B C
A=
CT D
Lemma 19.5.2. Let
M 1,1 M 1,2 · · · M 1,k be a symmetric matrix. Then
M T1,2 M 2,2 · · · M 2,k
λmin (A) + λmax (A) ≤ λmax (B) + λmax (D).
M = .. .. .. ..
. . . .
T T
M 1,k M 2,k · · · M k,k
x1
Proof. Let x be a unit eigenvector of A of eigenvalue λmax (A). Write x = , using the same
be a block-partitioned symmetric matrix with k ≥ 2. Then x2
X partition as we did for A.
(k − 1)λmin (M ) + λmax (M ) ≤ λmax (M i,i ).
i
We first consider the case in which neither x 1 nor x 2 is an all-zero vector. In this case, we set
∥x 2 ∥
!
Proof of Theorem 19.5.1. Let G be a k-colorable graph. After possibly re-ordering the vertices, ∥x 1 ∥ x 1
y= .
the adjacency matrix of G can be written − ∥x 1∥
∥x 2 ∥ x 2
0 M 1,2 ··· M 1,k The reader may verify that y is also a unit vector, so
M T1,2 0 ··· M 2,k
.. .. .. .. . (19.1) y T Ay ≥ λmin (A).
. . . .
M T1,k M T2,k · · · 0 We have
Each block corresponds to a color. λmax (A) + λmin (A) ≤ x T Ax + y T Ay
As each diagonal block is all-zero, Lemma 19.5.2 implies = x T1 Bx 1 + x T1 C x 2 + x T2 C T x 1 + x T2 Dx 2 +
(k − 1)λmin (M ) + λmax (M ) ≤ 0. ∥x 2 ∥2 T ∥x 1 ∥2 T
+ x 1 Bx 1 − x T1 C x 2 − x T2 C T x 1 + x 2 Dx 2
∥x 1 ∥2 ∥x 2 ∥2
Recalling that λmin (M ) = µn < 0, and λmax (M ) = µ1 , a little algebra yields
∥x 2 ∥2 T ∥x 1 ∥2 T
= x T1 Bx 1 + x T2 Dx 2 + 2 x 1 Bx 1 + x 2 Dx 2
µ1 ∥x 1 ∥ ∥x 2 ∥2
1+ ≤ k. ! !
−µn
∥x 2 ∥2 ∥x 1 ∥2
≤ 1+ 2 x T1 Bx 1 + 1 + x T2 Dx 2
∥x 1 ∥ ∥x 2 ∥2
≤ λmax (B) ∥x 1 ∥2 + ∥x 2 ∥2 + λmax (D) ∥x 1 ∥2 + ∥x 2 ∥2
To return to our example of the n clique with n degree-1 vertices attached, I examined an
example with n = 6. We find µ1 = 5.19 and µ12 = −1.62. This gives a lower bound on the = λmax (B) + λmax (D),
chromatic number of 4.2, which implies a lower bound of 5. We can improve the lower bound by
re-weighting the edges of the graph. For example, if we give weight 2 to all the edges in the clique as x is a unit vector.
and weight 1 to all the others, we obtain a bound of 5.18, which agrees with the chromatic
We now return to the case in which ∥x 2 ∥ = 0 (or ∥x 1 ∥ = 0, which is really the same case).
number of this graph which is 6.
Theorem 4.3.1 tells us that λmax (B) ≤ λmax (A). So, it must be the case that x 1 is an eigenvector
of eigenvalue λmax (A) of B, and thus λmax (B) = λmax (A). To finish the proof, also observe that
Theorem 4.3.1 implies
19.6 Proofs for Hoffman’s lower bound on chromatic number λmax (D) ≥ λmin (D) ≥ λmin (A).
To prove Lemma 19.5.2, we begin with the case of k = 2. The general case follows from this one
by induction. While the lemma in the case k = 2 when there are zero blocks on the diagonal
follows from Proposition 4.5.4, we require the general statement for induction.
Proof of Lemma 19.5.2. For k = 2, this is exactly Lemma 19.6.1. For k > 2, we apply induction.
Let
M 1,1 M 1,2 · · · M 1,k−1
M T1,2 M · · · M
2,2 2,k−1
B = .. .. .. .. .
. . . .
T T
M 1,k−1 M 2,k−1 · · · M k−1,k−1
Theorem 4.3.1 now implies
λmin(B) ≥ λmin(M).

Chapter 20

Graph Partitioning
Applying Lemma 19.6.1 to B and the kth row and column of M , we find
We are less interested in the total number of edges on the boundary than in the ratio of this
number to the size of S itself. For now, we will measure this in the most natural way—by the
number of vertices in S. We will call this ratio the isoperimetric ratio of S, and define it by
θ(S) = |∂(S)| / |S|.
The isoperimetric ratio of a graph1 is the minimum isoperimetric ratio over all sets of at most
half the vertices:
θG = min_{|S| ≤ n/2} θ(S).
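These quantities are easy to compute directly for small examples. The sketch below is added by me for illustration (the helper name and graph are assumptions); it reports |∂(S)|, the isoperimetric ratio θ(S) defined above, and the conductance ϕ(S) defined later in this chapter, for an unweighted graph.

    import numpy as np

    def boundary_and_ratios(edges, n, S):
        """|∂(S)|, isoperimetric ratio θ(S), and conductance ϕ(S) for an unweighted graph."""
        S = set(S)
        deg = np.zeros(n)
        boundary = 0
        for a, b in edges:
            deg[a] += 1; deg[b] += 1
            if (a in S) != (b in S):
                boundary += 1
        dS = deg[list(S)].sum()
        dV = deg.sum()
        return boundary, boundary / len(S), boundary / min(dS, dV - dS)

    # A ring of 8 vertices cut into two arcs of 4: two boundary edges.
    edges = [(i, (i + 1) % 8) for i in range(8)]
    print(boundary_and_ratios(edges, 8, S=[0, 1, 2, 3]))   # (2, 0.5, 0.25)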
We will now derive a lower bound on θG in terms of λ2 . We will present an approximate converse
to this lower bound, known as Cheeger’s Inequality, in Chapter 21.
1
Other authors call this the isoperimetric number.
Theorem 20.1.1. For every S ⊂ V,
θ(S) ≥ λ2 (1 − s),
where s = |S| / |V|. In particular,
θG ≥ λ2 /2.

This theorem says that if λ2 is big, then G is very well connected: the boundary of every small set
of vertices is at least λ2 times something just slightly smaller than the number of vertices in the
set.
Re-arranging terms slightly, Theorem 20.1.1 can be stated as
θ(S)/(1 − s) = |V| · |∂(S)| / (|S| |V − S|) ≥ λ2.
We sometimes favor the quantity in the middle above over the isoperimetric ratio because
|∂(S)| / (|S| |V − S|)
eliminates the need to restrict |S| ≤ |V|/2.

Proof. As
λ2 = min_{x : x^T 1 = 0} (x^T LG x)/(x^T x),
for every non-zero x orthogonal to 1 we know that
x T LG x ≥ λ2 x T x .
To exploit this inequality, we need a vector related to the set S. A natural choice is 1S , the
characteristic vector of S,
20.2 Conductance
1S(a) = 1 if a ∈ S, and 0 otherwise.
Conductance is a variant of the isoperimetric ratio that applies to weighted graphs, and that
measures sets of vertices by the sum of their weighted degrees. Instead of counting the edges on
the boundary, it counts the sum of their weights. We write d(S) for the sum of the degrees of the
We find X vertices in S. d(V ) is twice the sum of the weights of edges in the graph, because each edge is
1TS LG 1S = (1S (a) − 1S (b))2 = |∂(S)| .
attached to two vertices. For a set of edges F , we write w(F ) for the sum of the weights of edges
(a,b)∈E
in F . We can now define the conductance of S to be
However, χS is not orthogonal to 1. To fix this, use
def w(∂(S))
ϕ(S) = .
x = 1S − s1, min(d(S), d(V − S))
Note that many slightly different definitions appear in the literature. For example, we could
so ( instead use
1−s for a ∈ S, and w(∂(S))
x (a) = d(V ) ,
−s otherwise. d(S)d(V − S)
which appears below in (20.3).
We have x T 1 = 0, and
We define the conductance of a graph G to be
X
x T LG x = ((1S (a) − s) − (1S (b) − s))2 = |∂(S)| . def
ϕG = min ϕ(S).
(a,b)∈E S⊂V
The conductance of a graph is more useful in many applications than the isoperimetric ratio. I
Claim 19.3.3 tells us that the square of the norm of x is
usually find that conductance is the more useful quantity when you are concerned about edges,
x T x = n(s − s2 ). and that isoperimetric ratio is most useful when you are concerned about vertices. Conductance
is particularly useful when studying random walks in graphs.
So,
1TS LG 1S |∂(S)|
λ2 ≤ = . 20.3 The Normalized Laplacian
1TS 1S |S| (1 − s)
As
1TS L1S w(∂(S))
= ,
1TS D1S d(S)
it seems natural to try to relate the conductance to the following generalized Rayleigh quotient: where
σ = d(S)/d(V ).
y T Ly
. (20.1) You should now check that yT d = 0:
y T Dy
T
If we make the change of variables y d= 1TS d − σ1T d = d(S) − (d(S)/d(V ))d(V ) = 0.
1/2
D y = x,
We already know that
then this ratio becomes y T Ly = w(∂(S)).
x T D −1/2 LD −1/2 x
.
xTx It remains to compute y T Dy . If you remember the computation in Claim 19.3.3, you would
This is an ordinary Rayleigh quotient, which we understand a little better. The matrix in the guess that it is d(S)(1 − σ) = d(S)d(V − S)/d(V ), and you would be right:
middle is called the normalized Laplacian (see [Chu97]). We reserve the letter N for this matrix: X X
y T Dy = d(u)(1 − σ)2 + d(u)σ 2
def
N = D −1/2 LD −1/2 . u∈S u̸∈S
x T d 1/2 = y T D1/2 d 1/2 = y T d , Proof. As the larger of d(S) and d(V − S) is at least half of d(V ), we find
we find w(∂(S))
y T Ly ν2 ≤ 2 .
ν2 = min . min(d(S), d(V − S))
y ⊥d y T Dy
ν2 /2 ≤ ϕG . (20.2)
20.4 Notes
Lemma 20.3.1. For every S ⊂ V ,
There are many variations on the definitions used in this chapter. For example, sometimes one
w(∂(S)) wants to measure the number of vertices on the boundary of a set, rather than the number of
ν2 ≤ d(V ) .
d(S)d(V − S) edges. The ratio of the number of boundary vertices to internal vertices is often called expansion.
But, authors are not consistent about these and related terms. Cut ratio is sometimes used
Proof. We would again like to again use 1S as a test vector. But, we need to shift it so that it is instead of isoperimetric ratio. When reading anything in this area, be sure to check the formulas
orthogonal to d . Set for the definitions.
y = 1S − σ1,
CHAPTER 21. CHEEGER’S INEQUALITY 174
We then set
Cheeger’s Inequality z = y − y (j)1.
This vector z satisfies z (j) = 0. And, the following lemma tells us that
z T Lz y T Ly
T
≤ T .
z Dz y Dy
In the last chapter we learned that ϕ(S) ≥ ν2 /2 for every S ⊆ V . Cheeger’s inequality is a partial
converse. It says that there exists a set of vertices S for which Lemma 21.1.2. Let v s = y + s1. Then, the minimum of v Ts Dv s is achieved at the s for which
√ v Ts d = 0.
ϕ(S) ≤ 2ν2 ,
and provides an algorithm for using the eigenvector of ν2 to find such a set. Proof. The derivative with respect to s is 2d T v s , and this is zero at the minimum.
Cheeger [Che70] first proved his famous inequality for manifolds. Many discrete versions of Theorem 21.1.3. Let G be a weighted graph, let L be its Laplacian, and let d be its vector of
Cheeger’s inequality were proved in the late 80’s [SJ89, LS88, AM85, Alo86, Dod84, Var85]. Some weighted degrees. Let z be a vector that is centered with respect to d . Then, there is a number τ
of these consider the walk matrix instead of the normalized Laplacian, and some consider the for which the set Sτ = {a : z (a) < τ } satisfies
isoperimetic ratio instead of conductance. The proof in this Chapter follows an approach r
developed by Trevisan [Tre11]. z T Lz
ϕ(Sτ ) ≤ 2 T .
z Dz
z (1)2 + z (n)2 = 1.
Cheeger’s inequality proves that if we have a vector y , orthogonal to d , for which the generalized
Rayleigh quotient (20.1) is small, then one can obtain a set of small conductance from y . We This can be achieved by multiplying z by a constant. We begin our proof of Cheeger’s inequality
obtain such a set by carefully choosing a real number τ , and setting by defining
z T Lz
Sτ = {a : y (a) ≤ τ } . ρ= T .
z Dz
√
So, we need to show that there is a τ for which ϕ(Sτ ) ≤ 2ρ.
We should think of deriving y from an eigenvector of ν2 of the normalized Laplacian. If ψ 2 is an
eigenvector of ν2 , then y = D 1/2 ψ 2 is orthogonal to d and the generalized Rayleigh quotient Recall that
w(∂(S))
(20.1) of y with respect to L and D equals ν2 . But, the theorem can make use of any vector that ϕ(S) = .
is orthogonal to d that makes the generalized Rayleigh quotient small. In fact, we prefer vectors min(d(S), d(V − S))
that are centered with respect to d . We will define a distribution on τ for which we can prove that
p
Definition 21.1.1. A vector y is centered with respect to d if E [w(∂(Sτ ))] ≤ 2ρ E [min(d(Sτ ), d(V − Sτ ))] .
X X
d (a) ≤ d (V )/2 and d (a) ≤ d (V )/2.
a:y (a)>0 a:y (a)<0
173
CHAPTER 21. CHEEGER’S INEQUALITY 175 CHAPTER 21. CHEEGER’S INEQUALITY 176
This implies1 that there is some τ for which Regardless of the signs,
p
w(∂(Sτ )) ≤ 2ρ min(d(Sτ ), d(V − Sτ )), z (a)2 − z (b)2 = |(z (a) − z (b))(z (a) + z (b))| ≤ |z (a) − z (b)| (|z (a)| + |z (b)|).
√
which means ϕ(S) ≤ 2ρ. When sgn(a) = −sgn(b),
To switch from working with y to working with z , define We will set Sτ = {a : z (a) ≤ τ }. z (a)2 + z (b)2 ≤ (z (a) − z (b))2 = |z (a) − z (b)| (|z (a)| + |z (b)|).
Trevisan had the remarkable idea of choosing τ between z (1) and z (n) with probability density
2 |t|. That is, the probability that τ lies in the interval [a, b] is
Z b
2 |t| dt. We now derive a formula for the expected denominator of ϕ.
t=a
Lemma 21.1.5.
To see that the total probability is 1, observe that T
Et [min(d(Sτ ), d(V − Sτ ))] = z Dz .
Z z (n) Z 0 Z z (n)
2 |t| dt = 2 |t| dt + 2 |t| dt = z (n)2 + z (1)2 = 1,
t=z (1) t=z (1) t=0 Proof. Observe that
X X
as z (1) ≤ 0 ≤ z (n). Et [d(Sτ )] = Prt [a ∈ Sτ ] d(a) = Prt [z (a) ≤ τ ] d(a).
a a
Similarly, the probability that τ lies in the interval [a, b] is
Z The result of our centering of z at j is that
b
2 2
2 |t| dt = sgn(b)b − sgn(a)a ,
t=a
τ < 0 =⇒ d(S) = min(d(S), d(V − S)), and
τ ≥ 0 =⇒ d(V − S) = min(d(S), d(V − S)).
where
1 if x > 0
That is, for a < j, a is in the smaller set if τ < 0; and, for a ≥ j, a is in the smaller set if τ ≥ 0.
sgn(x) = 0 if x = 0, and So,
−1 if x < 0. X X
Et [min(d(Sτ ), d(V − Sτ ))] = Pr [z (a) < τ and τ < 0] d(a) + Pr [z (a) > τ and τ ≥ 0] d(a)
Lemma 21.1.4. a<j a≥j
X X X X
Et [w(∂(Sτ ))] = wa,b Prt [(a, b) ∈ ∂(Sτ )] ≤ wa,b |z (a) − z (b)| (|z (a)| + |z (b)|). (21.1) = Pr [z (a) < τ < 0] d(a) + Pr [z (a) > τ ≥ 0] d(a)
(a,b)∈E (a,b)∈E a<j a≥j
X X
= z (a)2 d(a) + z (a)2 d(a)
Proof. An edge (a, b) with z (a) ≤ z (b) is on the boundary of S if a<j a≥j
X
= z (a)2 d(a)
z (a) ≤ τ < z (b).
a
and that X
Et [w(∂(Sτ ))] ≤ wa,b |z (a) − z (b)| (|z (a)| + |z (b)|).
(a,b)∈E
We may use the Cauchy-Schwartz inequality to upper bound the term above by
s X s X
wa,b (z (a) − z (b))2 wa,b (|z (a)| + |z (b)|)2 . (21.2) Chapter 22
(a,b)∈E (a,b)∈E
We have defined ρ so that the term under the left-hand square root is at most
The input to the algorithm is a target set size, s, a conductance bound ϕ, and a seed vertex, a.
We will prove that if G contains a set S with d (S) ≤ s ≤ d (V )/32 and ϕ(S) ≤ ϕ, then there is an
a ∈ S such that when thepalgorithm is run with these parameters, it will return a set T with
d (T ) ≤ 16s and ϕ(T ) ≤ 8 ln(8s)ϕ. For the rest of this chapter we will assume that G does
contain a set S that satisfies these conditions.
Here is the algorithm.
1. Set p 0 = δ a .
2. Set t = 1/2ϕ (we will assume that t is an integer).
178
CHAPTER 22. LOCAL GRAPH CLUSTERING 179 CHAPTER 22. LOCAL GRAPH CLUSTERING 180
t
f p 0.
3. Set y = D −1 W Proof. We will upper bound the probability that the lazy walk leaves S in each step by ϕ(S)/2. In
the first step, the probability that the lazy walk leaves S is exactly the sum over vertices a in S of
4. Return the set of the form Tτ = {b : y (b) > τ } that has least conductance among those with the probability the walk begins at a times the probability it follows an edge to a vertex not in S:
d (Tτ ) ≤ 8s.
X 1 X wa,b 1 1 X 1 w(∂(S)) 1
p S (a) = wa,b = = ϕ(S).
Recall that the stable distribution of the random walk on a graph is d /(1T d ).
So, to measure 2 d (a) 2 d (S) 2 d (S) 2
a∈S b∼a a∈S
how close a probability distribution p is to the stable distribution, we could ask how close D −1 p b̸∈S b̸∈S
is to being constant. In this chapter, we will measure this by the generalized Rayleigh quotient We now wish to show that in every future step the probability that the lazy walk leaves S is at
p T D −1 LD −1 p most this large. To this end, let p 0 = p S , and define
.
p T D −1 p f p i−1 .
pi = W
−1
When we want to apply Cheeger’s inequality, we will change variables to y = D p. In these
We now show by induction that for every a ∈ V , p i (a) ≤ d (a)/d (S). This is true for p 0 , and in
variables, the above quotient becomes f and
fact the inequality is tight for a ∈ S. To establish the induction, note that all entries of W
y T Ly
. p i−1 are nonnegative. So, the assumption that p i−1 is entrywise at most d (a)/d (S) implies that
y T Dy
for a ∈ S
f p i−1 ≤ δ T W
δ Ta p i = δ Ta W f d /d (S) = δ T d /d (S) = d (a)/d (S).
We will work with the lazy random walk matrix a a
Thus, the probability that the walk transitions from a vertex in S to a vertex not in S at step i
f = 1 (I + W ) = 1 (I + M D −1 ).
W satisfies
2 2 X 1 X wa,b X 1 X wa,b 1
p i (a) ≤ p S (a) = ϕ(S).
2 d (a) 2 d (a) 2
a∈S b∼a a∈S b∼a
b̸∈S b̸∈S
22.2 Good choices for a
We will say that a vertex a ∈ S is good for S if Lemma 22.2.2. The set S contains a good vertex a.
d (a) 1 f t δ a ≥ 1/2.
≥ and 1TS W Proof. After we expand p S in an elementary unit basis as
d (S) 2 |S|
X d (a)
The second inequality says that after t steps the lazy walk that starts at a will be in S with pS = δa,
probability at least 1/2. In this section we show that S contains a good vertex. We will then show d (S)
a∈S
that the local clustering algorithm succeeds if it begins at a good vertex.
Lemma 22.2.1 tells us that
Consider the distribution on vertices that corresponds to choosing a vertex at random from S X d (a)
1T W f t δ a ≤ tϕ(S)/2.
with probability proportional to its degree: d (S) V −S
a∈S
(
def d (a)/d (S), for a ∈ S Define fa to be the indicator1 for the event that
pS =
0, otherwise. t
f δ a > tϕ(S)
1TV −S W
The following lemma says that if we start a walk from a random vertex in S chosen with and let ba be the indicator for the event that
probability proportional to degree, then the probability it is outside S on the tth step of the lazy
walk is at most tϕ(S)/2. d (a) 1
< .
d (S) 2 |S|
t
f p S . Then
Lemma 22.2.1. Let S be a set with d (S) ≤ d (V )/2. Let p t = W 1
That is, fa = 1 if the event holds, and fa = 0 otherwise.
1TV −S p t ≤ tϕ(S)/2.
CHAPTER 22. LOCAL GRAPH CLUSTERING 181 CHAPTER 22. LOCAL GRAPH CLUSTERING 182
By an application of what is essentially Markov’s inequality, we conclude 22.4 Bounding the Generalized Rayleigh Quotient
X d (a) 1
fa < . The following lemma allows us to measure how close a walk is to convergence merely in terms of
d (S) 2
a∈S the quadratic form p Tt D −1 p t and the number of steps t.
As t
f p 0 for some probability vector p 0 . Then
X d (a) X 1 X 1 Lemma 22.4.1. Let p t = W
ba < ba ≤ = 1/2.
d (S) 2 |S| 2 |S|
a∈S a∈S a∈S p Tt D −1 LD −1 p t 1 p 0 D −1 p 0
T −1 ≤ ln −1 .
Thus, there is a vertex for which neither fa nor ba hold. As pt D pt t p tD p t
f t δ a = 1 − 1T W
1TS W f tδa The proof of Lemma 22.4.1 rests on the following standard inequality.
V −S
and tϕ(S) ≤ 1/2, such a vertex is good. Theorem 22.4.2. [Power Means Inequality] For k > h > 0, nonnegative numbers w1 , . . . , wn that
sum to 1, and nonnegative numbers λ1 , . . . , λn ,
By slightly loosening the constants in the definition of “good”, we could prove that most vertices !1/k !1/h
n
X n
X
of S are good, where “most” is defined by sampling with probability proportional to degree.
wi λki ≥ wi λhi
i=1 i=1
Thus, X
zt = ci ωit ψ i ,
i
CHAPTER 22. LOCAL GRAPH CLUSTERING 183 CHAPTER 22. LOCAL GRAPH CLUSTERING 184
R1/2t = exp(− ln(1/R)/2t) ≥ 1 − ln(1/R)/2t. Proof. We observe that for every number y and ϵ,
Moreover, as shifting y and rounding entries to zero can not increase the length of any edge,
x T Lx ≤ y T Ly .
Chapter 23
Together these imply
x T Lx y T Ly
T
≤2 T ≤ 4 ln(8 |S|)ϕ.
x Dx y Dy
As y is balanced with respect to d , we may apply Cheeger’s inequality to obtain a set T of Spectral Partitioning in a Stochastic
conductance at most p
8 ln(8 |S|)ϕ. Block Model
22.6 Notes
In this chapter, show how eigenvectors can be used to partition graphs drawn from certain
Explain where these come from, and give some references to where they are used in practice. natural models. These are called stochastic block models or a planted partition model, depending
on community and application.
The simplest model of this form is for the graph bisection problem. This is the problem of
partitioning the vertices of a graph into two equal-sized sets while minimizing the number of
edges bridging the sets. To create an instance of the planted bisection problem, we first choose a
partition of the vertices into equal-sized sets X and Y . When then choose probabilities p > q, and
place edges between vertices with the following probabilities:
p if u ∈ X and v ∈ X
Pr [(u, v) ∈ E] = p if u ∈ Y and v ∈ Y
q otherwise.
The expected number of edges crossing between X and Y will be q |X| |Y |. If p is sufficiently
√
larger than q, for example if p = 1/2 and q = p − 24/ n, we will show that the partition can be
approximately recovered from the second eigenvector of the adjacency matrix of the graph. The
result, of course, extends to over values of p and q. This will be a crude version of an analysis of
McSherry [McS01].
If p is too close to q, then the partition given by X and Y will not be the smallest. For example,
√
if q = p − ϵ/ n for small ϵ then one cannot hope to distinguish between X and Y .
McSherry analyzed more general models than this, including planted coloring problems, and
sharp results have been obtained in a rich line of work. See, for example,
[MNS14, DKMZ11, BLM15, Mas14, Vu14].
McSherry’s analysis treats the adjacency matrix of the generated graph as a perturbation of one
ideal probability matrix. In the probability matrix the second eigenvector provides a clean
partition of the two blocks. McSherry shows that the difference between the generated matrix and
186
CHAPTER 23. SPECTRAL PARTITIONING IN A STOCHASTIC BLOCK MODEL 187 CHAPTER 23. SPECTRAL PARTITIONING IN A STOCHASTIC BLOCK MODEL 188
the ideal one is small, and so the generated matrix can be viewed as a small perturbation of the c and M are the same, and so are those of A
Note that the eigenvectors of M b and A. So
idea one. He then uses matrix perturbation theory to show that the second eigenvector of the c and A
considering M b won’t change our analysis at all. The matrix A
b is convenient because it has
generated matrix will probably be close to the second eigenvector of the original, and so it reveals rank 2. We now consider the difference between Mc and A: b
the partition. The idea of using perturbation theory to analyze random objects generated from c −A
b = M − A.
R=M
nice models has been very powerful.
Warning: stochastic block models have been the focus of a lot of research lately, and there are b Of course, the constant
To see why this would be useful, let’s look at the eigenvectors of A.
now very good algorithms for solving problems on graphs generated from these models. But, b We have
vectors are eigenvectors of A.
these are just models and very little real data resembles that produced by these models. So, there
b = n (p + q)1,
A1
is no reason to believe that algorithms that are optimized for these models will be useful in 2
practice. Nevertheless, some of them are. and so the corresponding eigenvalue is
def n
α1 = (p + q).
2
23.1 The Perturbation Approach
The second eigenvector of A b has two values: one on X and one on Y . Let’s be careful to make
this a unit vector. We take ( 1
As long as we don’t tell our algorithm, we can choose X = {1, . . . , n/2} and
√ a∈X
Y = {n/2 + 1, . . . , n}. Let’s do this for simplicity. ϕ2 (a) = n
− √1n a ∈ Y.
Define the matrix
Then,
0 p ··· p p q q ··· q q b 2 = n (p − q)ϕ2 ,
Aϕ
p 0 ··· p p q q ··· q q 2
.. .. and the corresponding eigenvalue is
. . n
def
α2 = (p − q).
p p ··· 0 p q q ··· q q 2
p ··· ··· q q
A=
p p 0 q q = pJ n/2 qJ n/2 − pI n , b has rank 2, all the other eigenvalues of A
As A b are zero.
q q ··· q q 0 p ··· p p qJ n/2 pJ n/2
q q ··· q q p 0 ··· p p For (a, b) in the same component,
. ..
.. . Pr [R(a, b) = 1 − p] = p and
q q ··· q q p p ··· 0 p Pr [R(a, b) = −p] = 1 − p,
q q ··· q q p p ··· p 0
and for (a, b) in different components,
where we write J n/2 for the square all-1s matrix of size n/2. Pr [R(a, b) = 1 − q] = q and
The adjacency matrix of the planted partition graph is obtained by setting M (a, b) = 1 with Pr [R(a, b) = −q] = 1 − q.
probability A(a, b), subject to M (a, b) = M (b, a) and M (a, a) = 0. So, this is a random graph,
but the probabilities of some edges are different from others. We can use bounds similar to that proved in Chapter 8 to show that it is unlikely that R has
large norm. The bounds that we proved on the norm of a matrix in which entries are chosen from
We will study a very simple algorithm for finding an approximation of the planted bisection:
{1 − p, −p} applies equally well if each entry (a, b) is chosen from {1 − qa,b , −qa,b } as long as
compute ψ 2 , the eigenvector of the second-largest eigenvalue of M . Then, set
qa,b < p and all have expectation 0, because (8.2) still applies. For a sharp result, we appeal to a
S = {a : ψ 2 (a) < 0}. We guess that S is one of the sets in the bisection. We will show that under
theorem of Vu [Vu07, Theorem 1.4], which implies the following.
reasonable conditions on p and q, S will be mostly right. For example, we might consider p = 1/2
√
and q = 1/2 − 12/ n. Intuitively, the reason this works is that M is a slight perturbation of A, Theorem 23.1.1. There exist constants c1 and c2 such that with probability approaching 1,
and so the eigenvectors of M should look like the eigenvectors of A. p
∥R∥ ≤ 2 p(1 − p)n + c1 (p(1 − p)n)1/4 ln n,
To simplify some formulas, we henceforth work with
provided that
c def b def ln4 n
M = M + pI and A = A + pI p ≥ c2 .
n
CHAPTER 23. SPECTRAL PARTITIONING IN A STOCHASTIC BLOCK MODEL 189 CHAPTER 23. SPECTRAL PARTITIONING IN A STOCHASTIC BLOCK MODEL 190
We use a crude corollary of this result. and let θ be the angle between them. For every vertex a that is misclassified by ψ 2 , we have
|δ(a)| ≥ √1n . So, if ψ 2 misclassifies k vertices, then
Corollary 23.1.2. There exists a constant c0 such that with probability approaching 1,
r
√ k
∥R∥ ≤ 3 pn, ∥δ∥ ≥ .
n
provided that
As ϕ2 and ψ 2 are unit vectors, we may apply the crude inequality
ln4 n
p ≥ c0 . √
n ∥δ∥ ≤ 2 sin θ
√
In fact, Alon, Krivelevich and Vu [AKV02] prove that the probability that the norm of R exceeds (the 2 disappears as θ gets small).
this value by more than t is exponentially small in t. However, we will not need that fact for this
lecture. To combine this with the perturbation bound, we assume q > p/3, and find
n
min |α2 − αj | = (p − q).
j̸=2 2
23.2 Perturbation Theory for Eigenvectors √
Assuming that ∥R∥ ≤ 3 pn, we find
c , and let α1 > α2 > 0 = α3 = · · · = αn be the
Let µ1 ≥ µ2 ≥ · · · ≥ µn be the eigenvalues of M √ √
2 ∥R∥ 2 · 3 pn 12 p
b Weyl’s inequality, which one can prove using the Courant-Fischer theorem, says
eigenvalues of A. sin θ ≤ n ≤ n =√ .
2 (p− q) 2 (p − q) n(p − q)
that
|µi − αi | ≤ ∥R∥ . (23.1) So, the number k of misclassified vertices satisfies
r √ √
So, we can view µ2 as a perturbation of α2 . We need a stronger fact, which is that we can view k 2 · 12 p
≤√ ,
ψ 2 as a perturbation of ϕ2 . n n(p − q)
The Davis-Kahan theorem [DK70] says that ψ 2 will be close to ϕ2 , in angle, if the norm of R is which implies
b That is, the
significantly less than the distance between α2 and the other eigenvalues of A. 288p
k≤ .
eigenvector does not move too much if its corresponding eigenvalue is isolated. (p − q)2
So, we expect to misclassify at most a constant number of vertices if p and q remain constant as n
Theorem 23.2.1. Let A and B be symmetric matrices. Let R = A − B. Let α1 ≥ · · · ≥ αn be √
grows large. An interesting case to consider is p = 1/2 and q = p − 24/ n. This gives
the eigenvalues of A with corresponding eigenvectors ϕ1 , . . . , ϕn and let Let β1 ≥ · · · ≥ βn be the
eigenvalues of B with corresponding eigenvectors ψ 1 , . . . , ψ n . Let θi be the angle between ±ψ i 288p n
and ±ϕi . Then, = ,
(p − q)2 4
2 ∥R∥
sin 2θi ≤ .
minj̸=i |αi − αj | so we expect to misclassify at most a constant fraction of the vertices. Of course, once one gets
most of the vertices correct is should be possible to use them to better classify the rest. Many of
The angle is never more than π/2, because this theorem is bounding the angle between the the advances in the study of algorithms for this problem involve better and more rigorous ways of
eigenspaces rather than a particular choice of eigenvectors. We will prove and use a slightly doing this.
weaker statement in which we replace 2θ with θ.
more than 1, we may also assume that αi has multiplicity 1 as an eigenvalue, and that ψ i is a 23.5 Further Reading
unit vector in the nullspace of B.
Our assumption that αi = 0 also leads to |βi | ≤ ∥R∥ by Weyl’s inequality (23.1). If you would like to know more about bounding norms and eigenvalues of random matrices, I
recommend [Ver10] and [Tro12].
Expand ψ i in the eigenbasis of A, as
X
ψi = cj ϕj , where cj = ϕTj ψ i .
j
Setting
δ = min |αj | ,
j̸=i
we compute
X
∥Aψ i ∥2 = c2j αj2
j
X
≥ c2j δ 2
j̸=i
X
= δ2 c2j
j̸=i
= δ 2 (1 − c2i )
= δ 2 sin2 θi .
So,
2 ∥R∥
sin θi ≤ .
δ
It may seem surprising that the amount by which eigenvectors move depends upon how close
their respective eigenvalues are to the other eigenvalues. However, this dependence is necessary.
To see why, first consider the matrices
1+ϵ 0 1 0
and .
0 1 0 1+ϵ
1 0
While these two matrices are very close, their leading eigenvectors are and , which are
0 1
90 degrees from each other.
The heart of the problem is that there is no unique eigenvector of an eigenvalue that has
multiplicity greater than 1.
CHAPTER 24. NODAL DOMAINS 194
0.4 0.4
v2 v10
v3
v4
Value in Eigenvector
Value in Eigenvector
0.2 0.2
0.0 0.0
-0.4 -0.4
2 4 6 8 10 2 4 6 8 10
This remains true even when the edges of the path graphs have weights. Here are the analogous
plots for a path graph with edge weights randomly chosen in [0, 1]:
24.1 Overview
0.4 v2 v11
v3
v4 0.50
The goal of this section is to rigorously explain some of the behavior we observed when using
Value in Eigenvector
Value in Eigenvector
0.2
eigenvectors to draw graphs in the introduction First, recall some of the drawings we made of 0.25
graphs: 0.0
0.00
-0.2
-0.25
-0.4
-0.50
2 4 6 8 10 2 4 6 8 10
Vertex Number Vertex Number
These images were formed by computing the eigenvectors corresponding to the second and third
smallest eigenvalues of the Laplacian for each graph, ψ 2 and ψ 3 , and then using ψ 2 to assign a
horizontal coordinate to each vertex and ψ 3 to assign a vertical coordinate. The edges are drawn
as straight lines between their endpoints
We will show that the subgraphs obtained in the right and left halves of each image are connected.
Path graphs exhibited more interesting behavior: their kth eigenvector changes sign k times:
193
CHAPTER 24. NODAL DOMAINS 195 CHAPTER 24. NODAL DOMAINS 196
Theorem 24.2.2 (Sylvester’s Law of Inertia). Let M be any symmetric matrix and let B be any
v2 non-singular matrix. Then, the matrix BM B T has the same number of positive, negative and
v3
0.50 v4 zero eigenvalues as M .
Value in Eigenvector
0.25 Note that if the matrix B were orthogonal, or if we used B −1 in place of B T , then these matrices
would have the same eigenvalues. What we are doing here is different, and corresponds to a
0.00 change of variables.
-0.25 Proof. It is clear that M and BM B T have the same rank, and thus the same number of zero
eigenvalues.
-0.50
We will prove that M has at least as many positive eigenvalues as BM B T . One can similarly
prove that that M has at least as many negative eigenvalues, which proves the theorem.
2 4 6 8 10
Vertex Number Let γ1 ≥ . . . ≥ γk be the positive eigenvalues of BM B T and let Yk be the span of the
corresponding eigenvectors. Now, let Sk be the span of the vectors B T y , for y ∈ Yk . As B is
Random.seed!(1)
non-singluar, Sk has dimension k. Let α1 ≥ · · · ≥ αn be the eigenvalues of M . By the
M = spdiagm(1=>rand(10))
Courant-Fischer Theorem, we have
M = M + M’
L = lap(M) xTM x xTM x y T BM B T y γk y T y
E = eigen(Matrix(L)) αk = max min ≥ min = min ≥ T > 0.
S⊆IRn x ∈S xTx x ∈Sk x T x y ∈Yk y T BB T y y BB T y
Plots.plot(E.vectors[:,2],label="v2",marker = 5) dim(S)=k
Plots.plot!(E.vectors[:,3],label="v3",marker = 5)
Plots.plot!(E.vectors[:,4],label="v4",marker = 5) So, M has at least k positive eigenvalues (The point here is that the denominators are always
xlabel!("Vertex Number") positive, so we only need to think about the numerators.)
ylabel!("Value in Eigenvector") To finish, either apply the symmetric argument to the negative eigenvalues, or apply the same
savefig("rpath2v24.pdf") argument with B −1 to reverse the roles of A and BAB T .
We see that the kth eigenvector still changes sign k times. We will prove that this always
happens. These are some of Fiedler’s theorems about “nodal domains”. Nodal domains are the 24.3 Weighted Trees
connected parts of a graph on which an eigenvector is negative or positive.
We will now a slight simplification of a theorem of Fiedler [Fie75a].
24.2 Sylvester’s Law of Inertia Theorem 24.3.1. Let T be a weighted tree graph on n vertices, let LT have eigenvalues
0 = λ1 < λ2 · · · ≤ λn , and let ψ k be an eigenvector of λk . If there is no vertex a for which
ψ k (a) = 0, then there are exactly k − 1 edges (a, b) for which ψ k (a)ψ k (b) < 0.
Let’s begin with something obvious.
Claim 24.2.1. If A is positive semidefinite, then so is B T AB for every matrix B. One can extend this theorem to accommodate zero entries. We will just prove this theorem for
weighted path graphs, which are a special case of weighted trees. At the beginning of this section,
Proof. For every vector x , we plotted the eigenvectors of some weighted paths by using the index of a vertex along the path
x T B T ABx = (Bx )T A(Bx ) ≥ 0, as the horizontal coordinate, and the value of the eigenvector at that vertex as the vertical
coordinate. When we draw the edges as straight lines, the number of sign changes equals the
since A is positive semidefinite.
number of times the plot crosses the horizontal axis.
We will make use of Sylvester’s Law of Inertia, which is a powerful generalization of this fact. I Our analysis will rest on an understanding of Laplacians of paths that are allowed to have
will state and prove it now. negative edges weights.
CHAPTER 24. NODAL DOMAINS 197 CHAPTER 24. NODAL DOMAINS 198
Lemma 24.3.2. Let M be the Laplacian matrix of a weighted path that can have negative edge I claim that X
weights: X X = wa,b ψ k (a)ψ k (b)La,b .
M = wa,a+1 La,a+1 , (a,b)∈E
1≤a<n
To see this, first check that this agrees with the previous definition on the off-diagonal entries. To
where the weights wa,a+1 are non-zero and we recall that La,b is the Laplacian of the edge (a, b). verify that these expressions agree on the diagonal entries, we will show that the sum of the
The number of negative eigenvalues of M equals the number of negative edge weights. entries in each row of both expressions agree. In fact, they are zero. As we know that all the
off-diagonal entries agree, this will imply that the diagonal entries agree. We compute
Note that this is also true for weighted trees.
Ψk (LP − λk I )Ψk 1 = Ψk (LP − λk I )ψ k = Ψk (λk ψ k − λk ψ k ) = 0.
Proof. Note that X As Lu,v 1 = 0 and X 1 = 0, the row sums agree. Lemma 24.3.2 now tells us that the matrix X ,
xTM x = wu,v (x (u) − x (v))2 . and thus LP − λk I , has as many negative eigenvalues as there are edges (a, b) for which
(u,v)∈E ψ k (a)ψ k (b) < 0.
We now perform a change of variables that will diagonalize the matrix M . Let δ(1) = x (1), and
for every a > 1 let δ(a) = x (a) − x (a − 1).
24.4 The Perron-Frobenius Theorem for Laplacians
Every variable x (1), . . . , x (n) can be expressed as a linear combination of the variables
δ(1), . . . , δ(n). In particular,
In Theorem 4.5.1 we stated the Perron-Frobenius Theorem for non-negative matrices. I wish to
x (a) = δ(1) + δ(2) + · · · + δ(a). quickly observe that this theory may also be applied to Laplacian matrices, to principal
sub-matrices of Laplacian matrices, and to any matrix with non-positive off-diagonal entries. The
So, there is a square matrix B of full rank such that difference is that it then involves the eigenvector of the smallest eigenvalue, rather than the
largest eigenvalue.
x = Bδ.
Corollary 24.4.1. Let M be a matrix with non-positive off-diagonal entries, such that the graph
By Sylvester’s law of inertia, we know that of the non-zero off-diagonally entries is connected. Let λ1 be the smallest eigenvalue of M and let
v 1 be the corresponding eigenvector. Then v 1 may be taken to be strictly positive, and λ1 has
BT M B multiplicity 1.
has the same number of positive, negative, and zero eigenvalues as M . On the other hand,
Proof. Consider the matrix B = σI − M , for some large σ. For σ sufficiently large, this matrix
X
δ T B T M Bδ = wa,a+1 (δ(a + 1))2 . will be non-negative, and the graph of its non-zero entries is connected. So, we may apply the
1≤a<n Perron-Frobenius theory to B to conclude that its largest eigenvalue α1 has multiplicity 1, and
the corresponding eigenvector v 1 may be assumed to be strictly positive. We then have
So, this matrix clearly has one zero eigenvalue, and as many negative eigenvalues as there are λ1 = σ − α1 , and v 1 is an eigenvector of λ1 .
negative wa,a+1 .
Proof of Theorem 24.3.1. We assume that λk has multiplicity 1. One can prove it, but 24.5 Fiedler’s Nodal Domain Theorem
we will skip it.
Let Ψk denote the diagonal matrix with ψ k on the diagonal, and let λk be the corresponding Given a graph G = (V, E) and a subset of vertices, W ⊆ V , recall that the graph induced by G on
eigenvalue. Consider the matrix W is the graph with vertex set W and edge set
X = Ψk (LP − λk I )Ψk .
{(i, j) ∈ E, i ∈ W and j ∈ W } .
The matrix LP − λk I has one zero eigenvalue and k − 1 negative eigenvalues. As we have
assumed that ψ k has no zero entries, Ψk is non-singular, and so we may apply Sylvester’s Law of This graph is sometimes denoted G(W ).
Intertia to show that the same is true of X .
CHAPTER 24. NODAL DOMAINS 199 CHAPTER 24. NODAL DOMAINS 200
Theorem 24.5.1 ([Fie75b]). Let G = (V, E, w) be a weighted connected graph, and let L be its and so the smallest eigenvalue of B i is less than λk . On the other hand, if x i has any zero entries,
Laplacian matrix. Let 0 = λ1 < λ2 ≤ · · · ≤ λn be the eigenvalues of LG and let ψ 1 , . . . , ψ n be the then the Perron-Frobenius theorem tells us that x i cannot be the eigenvector of B i of smallest
corresponding eigenvectors. For any k ≥ 2 and any t ≤ 0, let eigenvalue, and so the smallest eigenvalue of B i is less than λk . Thus, the matrix
Wk = {i ∈ V : ψ k (i) ≥ t} . B1 0 · · · 0
0 B2 · · · 0
Then, the graph induced by G on Wk has at most k − 1 connected components.
.. .. .. ..
. . . .
Proof. We prove this theorem for the case that t = 0. Some additional work is needed to handle 0 0 ··· Bc
the general case.
has at least c eigenvalues less than λk . By the eigenvalue interlacing theorem, this implies that L
To see that Wk is non-empty, recall that ψ 1 = 1 and that ψ k is orthogonal ψ 1 . So, ψ k must have has at least c eigenvalues less than λk . We may conclude that c, the number of connected
both positive and negative entries. components of G(Wk ), is at most k − 1.
Assume that G(Wk ) has c connected components. After re-ordering the vertices so that the
vertices in each connected component of G(Wk ) are contiguous, we may assume that L and ψ k This theorem breaks down if we instead consider the set
have the forms
B1 0 0 ··· C1 x1 W = {i : ψ k (i) > 0} .
0 B 2 0 · · · C 2 x 2
The star graphs provide counter-examples.
.. ψ = .. ,
L = ... ..
.
..
.
..
. . k .
0 0 · · · B c C c x c 1
T T T
C1 C2 ··· Cc D y
0 −3
and
B1 0 0 ··· C1 x1 x1 1
0 B2 0 ···
C 2 x 2 x 2
.. .. .. .. .. .. = λ .. .
. . . . . k .
.
0 0 ··· Bc C c x c x c 1
C T1 C T2 ··· C Tc D y y
The first c sets of rows and columns correspond to the c connected components. So, x i ≥ t for
1 ≤ i ≤ c and y < t ≤ 0 (when I write this for a vector, I mean it holds for each entry). We also Figure 24.1: The star graph on 5 vertices, with an eigenvector of λ2 = 1.
know that the graph of non-zero entries in each B i is connected, and that each C i is non-positive
and has at least one negative entry (otherwise the graph G would be disconnected).
We will now prove that the smallest eigenvalue of each B i is smaller than λk . We know that
B i x i + C i y = λk x i .
As each entry in C i is non-positive and y is strictly negative, each entry of C i y is non-negative.
As each C i has at least one negative entry, some entry of C i y is positive. This implies that x i
cannot be the zero vector. As we assumed that x i ≥ t = 0, we can multiply the equation
B i x i = λk x i − C i y ≤ λk x i
by x i to get
x Ti B i x i ≤ λk x Ti x i .
If x i is strictly positive, then x Ti C i y > 0, and this inequality is strict:
x Ti B i x i = λk x Ti x i − x Ti C i y < λk x Ti x i ,
CHAPTER 25. THE SECOND EIGENVALUE OF PLANAR GRAPHS 202
We typically upper bound λ2 by evidencing a test vector. Here, we will upper bound λ2 by
evidencing a test embedding. The bound we apply is:
Similarly, X X X X
∥v i ∥2 = x2i + yi2 + · · · + zi2 .
This Chapter Needs Editing i i i i
201
CHAPTER 25. THE SECOND EIGENVALUE OF PLANAR GRAPHS 203 CHAPTER 25. THE SECOND EIGENVALUE OF PLANAR GRAPHS 204
(25.1) = Θ(1).
These graphs come from triangles inside triangles inside triangles. . . Such a graph is depicted
below:
Graph
Discs P
If we had i v i = 0, the rest of the computation would be easy. For each i, ∥v i ∥ = 1, so the
denominator of (25.1) is n. Let ri denote the straight-line distance from v i to the boundary of
Di . We then have (see Figure 25.2)
∥v i − v j ∥2 ≤ (ri + rj )2 ≤ 2ri2 + 2rj2 .
We will fix this problem by lifting the planar embeddings to the sphere by stereographic P
So, the numerator of (25.1) is at most 2d i ri2 . On the other hand, a theorem of Archimedes
projection. Given a plane, IR2 , and a sphere S tangent to the plane, we can define the tells us that the area of the cap encircled by Di is at exactly πri2 . Rather than proving it, I will
stereographic projection map, Π, from the plane to the sphere as follows: let s denote the point convince you that it has
where the sphere touches the plane, and let n denote the opposite point on the sphere. For any √ to be true because it is true when ri is small, it is true when the cap is a
hemisphere and ri = 2, and it is true when the cap is the whole sphere and ri = 2.
point x on the plane, consider the line from x to n. It will intersect the sphere somewhere. We
let this point of intersection be Π(x ). As the caps are disjoint, we have X
πri2 ≤ 4π,
The fundamental fact that we will exploit about stereographic projection is that it maps circles to i
circles! So, by applying stereographic projection to a kissing disk embedding of a graph in the
which implies that the numerator of (25.1) is at most
plane, we obtain a kissing disk embedding of that graph on the sphere. Let Di = Π(Ci ) denote X X
the image of circle Ci on the sphere. We will now let v i denote the center of Di , on the sphere. ∥v a − v b ∥2 ≤ 2ra2 + 2rb2 ≤ 2d ra2 ≤ 8d.
(a,b)∈E a
CHAPTER 25. THE SECOND EIGENVALUE OF PLANAR GRAPHS 205 CHAPTER 25. THE SECOND EIGENVALUE OF PLANAR GRAPHS 206
Instead of proving that we can achieve (25.2), I will prove a slightly simpler theorem. The proof
of the theorem we really want is similar, but about just a few minutes too long for class. We will
prove
Theorem 25.3.1. Let v 1 , . . . , v n be points on the unit-sphere. Then, there exists an ω such that
P
i fω (v i ) = 0.
The reason that this theorem is different from the one that we want to prove is that if we apply a
circle-preserving map from the sphere to itself, the center of the circle might not map to the
center of the image circle.
P
To show that we can achieve i v i = 0, we will use the following topological lemma, which
Figure 25.3: A Spherical Cap. follows immediately from Brouwer’s fixed point theorem. In the following, we let B denote the
ball of points of norm less than 1, and S the sphere of points of norm 1.
Lemma 25.3.2. If ϕ : B → B be a continuous map that is the identity on S. Then, there exists
Putting these inequalities together, we see that an ω ∈ B such that
P ϕ(ω) = 0.
(i,j)∈E ∥v i − v j ∥2 8d
min P P 2 ≤ .
v 1 ,...,v n ∈IRd : v i =0 i ∥v i ∥ .
n
We will prove this lemma using Brouwer’s fixed point theorem:
Thus, we merely need to verify that we can ensure that Theorem 25.3.3 (Brouwer). If g : B → B is continuous, then there exists an α ∈ B such that
X g(α) = α.
v i = 0. (25.2)
i
Proof of Lemma 25.3.2. Let b be the map that sends z ∈ B to z / ∥z ∥. The map b is continuous
Note that there is enough freedom in our construction to believe that we could prove such a at every point other than 0. Now, assume by way of contradiction that 0 is not in the image of ϕ,
thing: we can put the sphere anywhere on the plane, and we could even scale the image in the and let g(z ) = −b(ϕ(z )). By our assumption, g is continuous and maps B to B. However, it is
plane before placing the sphere. By carefully combining these two operations, it is clear that we clear that g has no fixed point, contradicting Brouwer’s fixed point theorem.
can place the center of gravity of the v i s close to any point on the boundary of the sphere. It
turns out that this is sufficient to prove that we can place it at the origin.
Lemma 25.3.2, was our motivation for defining the maps fω in terms of ω ∈ B. Now consider
setting
1X
25.3 The center of gravity ϕ(ω) = fω (v i ).
n
i
The only thing that stops us from applying Lemma 25.3.2 at this point is that ϕ is not defined on
We need a nice family of maps that transform our kissing disk embedding on the sphere. It is
S, because fω was not defined for ω ∈ S. To fix this, we define for α ∈ S
particularly convenient to parameterize these by a point ω inside the sphere. For any point α on
the surface of the unit sphere, I will let Πα denote the stereographic projection from the plane (
α if z ̸= −α
tangent to the sphere at α. fα (z ) =
−α otherwise.
I will also define Π−1 −1
α . To handle the point −α, I let Πα (−α) = ∞, and Πα (∞) = −α. We also
define the map that dilates the plane tangent to the sphere at α by a factor a: Dαa . We then We then encounter the problem that fα (z ) is not a continuous function of α because it is
define the following map from the sphere to itself discontinuous at α = −v i . But, this shouldn’t be a problem because the point ω at which
ϕ(ω) = 0 won’t be on or near the boundary. The following argument makes this intuition formal.
def 1−∥ω∥
fω (x ) = Πω/∥ω∥ Dω/∥ω∥ Π−1 ω/∥ω∥ (x ) .
We set (
For α ∈ S and ω = aα, this map pushes everything on the sphere to a point close to α. As a 1 if dist(ω, z ) < 2 − ϵ, and
hω (z ) =
approaches 1, the mass gets pushed closer and closer to α. (2 − dist(ω, z ))/ϵ otherwise.
CHAPTER 25. THE SECOND EIGENVALUE OF PLANAR GRAPHS 207
To finish the proof, we need to get rid of this ϵ. That is, we wish to show that ω is bounded away
from S, say by µ, for all sufficiently small ϵ. If that is the case, then we will have Planar Graphs 2, the Colin de
dist(ω, v i ) ≥ µ > 0 for all sufficiently small ϵ. So, for ϵ < µ and sufficiently small, hω (v i ) = 1 for
all i, and we recover the ϵ = 0 case. Verdière Number
One can verify that this holds provided that the points v i are distinct and there are at least 3 of
them.
Finally, recall that this is not exactly the theorem we wanted to prove: this theorem deals with This Chapter Needs Editing
v i , and not the centers of caps. The difficulty with centers of caps is that they move as the caps
move. However, this can be overcome by observing that the centers remain inside the caps, and
move continuously with ω. For a complete proof, see [ST07, Theorem 4.2] 26.1 Introduction
In this lecture, I will introduce the Colin de Verdière number of a graph, and sketch the proof that
25.4 Further progress
it is three for planar graphs. Along the way, I will recall two important facts about planar graphs:
This result has been improved in many ways. Jonathan Kelner [Kel06] generalized this result to
1. Three-connected planar graphs are the skeletons of three-dimensional convex polytopes.
graphs of bounded genus. Kelner, Lee, Price and Teng [KLPT09] obtained analogous bounds for
λk for k ≥ 2. Biswal, Lee and Rao [BLR10] developed an entirely new set of techniques to prove 2. Planar graphs are the graphs that do not have K5 or K3,3 minors.
these results. Their techniques improve these bounds, and extend them to graphs that do not
have Kh minors for any constant h.
26.2 Colin de Verdière invariant
The Colin de Verdière graph parameter essentially measures the maximum multiplicity of the
second eigenvalue of a generalized Laplacian matrix of the graph. It is less than or equal to three
precisely for planar graphs.
We say that M is a Generalized Laplacian Matrix of a graph G = (V, E) if M can be expressed as
M = L + D where L is a the Laplacian matrix of a weighted version of G and D is an arbitrary
diagonal matrix. That is, we impose the restrictions:
208
CHAPTER 26. PLANAR GRAPHS 2, THE COLIN DE VERDIÈRE NUMBER 209 CHAPTER 26. PLANAR GRAPHS 2, THE COLIN DE VERDIÈRE NUMBER 210
The Colin de Verdière graph parameter, which we denote cdv(G) is the maximum multiplicity of of a convex polytope are the line segments on the boundary of the polytope that go between two
the second-smallest eigenvalue of a Generalized Laplacian Matrix M of G satisfying the following vertices of the polytope and such that every point on the edge cannot be expressed non-trivially
condition, known as the Strong Arnold Property. as the convex hull of any vertices other than these two.
Theorem 26.3.1 (Steinitz’s Theorem). For every three-connected planar graph G = (V, E), there
For every non-zero n-by-n matrix X such that X(i, j) = 0 for i = j and (i, j) ∈ E, exists a set of vectors x 1 , . . . , x n ∈ IR3 such that the line segment from x i to x j is an edge of the
M X ̸= 0. convex hull of the vectors if and only if (i, j) ∈ E.
That is, every planar graph may be represented by the edges of a three-dimensional convex
That later restriction will be unnecessary for the results we will prove in this lecture.
polytope. We will use this representation to construct a Generalized Laplacian Matrix M whose
Colin de Verdière [dV90] proved that cdv(G) is at most 2 if and only if the graph G is second-smallest eigenvalue has multiplicity 3.
outerplanar. That is, it is a planar graph in which every vertex lies on one face. He also proved
that it is at most 3 if and only if G is planar. Lovàsz and Schrijver [LS98] proved that it is at
most 4 if and only if the graph is linkless embeddable. 26.4 The Colin de Verdière Matrix
In this lecture, I will sketch proofs from two parts of this work:
Let G = (V, E) be a planar graph, and let x 1 , . . . , x n ∈ IR3 be the vectors given by Steinitz’s
1. If G is a three-connected planar graph, then cdv(G) ≥ 3. Theorem. For 1 ≤ i ≤ 3, let v i ∈ IRn be the vector given by
v i (j) = x j (i).
2. If G is a three-connected planar graph, then cdv(G) ≤ 3.
So, the vector v i contains the ith coordinate of each vector x 1 , . . . , x n .
The first requires the construction of a matrix, which we do using the representation of the graph
We will now see how to construct a generalized Laplacian matrix M having the vectors v 1 , v 2 and
as a convex polytope. The second requires a proof that no Generalized Laplacian Matrix of the
v 3 in its nullspace. One can also show that the matrix M has precisely one negative eigenvalue.
graph has a second eigenvalue of high multiplicity. We prove this by using graph minors.
But, we won’t have time to do that in this lecture. You can find the details in [Lov01].
Our construction will exploit the vector cross product. Recall that for two vectors x and y in IR3
26.3 Polytopes and Planar Graphs that it is possible to define a vector x × y that is orthogonal to both x and y , and whose length
is the area of the parallelogram with sizes x and y . This determines the cross product up to sign.
You should recall that the sign is determined by an ordering of the basis of IR3 , or by the right
Let me begin by giving two definitions of convex polytope: as the convex hull of a set of points
hand rule. Also recall that
and as the intersection of half-spaces.
x × y = −y × x ,
Let x 1 , . . . , x n ∈ IRd (think d = 3). Then, the convex hull of x 1 , . . . , x n is the set of points
( ) (x 1 + x 2 ) × y = x 1 × y + x 2 × y , and
X X x × y = 0 if and only if x and y are parallel.
ai x i : ai = 1 and all ai ≥ 0 .
i
We will now specify the entries M (i, j) for (i, j) ∈ E. An edge (i, j) is on the boundary of two
Every convex polytope is the convex hull of its extreme vertices. faces of the polytope. Let’s say that the vectors defining these faces are y a and y b . So,
A convex polytope can also be defined by its faces. For example, given vectors y 1 , . . . , y l , the set y Ta x i = y Ta x j = y Tb x i = y Tb x j = 1.
of points
x : y Ti x ≤ 1, for all i So,
(y a − y b )T x i = (y a − y b )T x j = 0.
is a convex polytope. Moreover, every convex polytope containing the origin in its interior can be
This implies that y a − y b is parallel to x i × x j .
described in this way. Each vector y i defines a face of the polytope consisting of those points x in
the polytope such that y Ti x = 1. Assume y a comes before y b in the clockwise order about vertex x i . So, y b − y a points the same
direction as x i × x j . Set M (i, j) so that
The vertices of a convex polytope are those points x in the polytope that cannot be expressed
non-trivially as a convex combination of any points other than themselves. The edges (or 1-faces) M (i, j)x i × x j = y a − y b
CHAPTER 26. PLANAR GRAPHS 2, THE COLIN DE VERDIÈRE NUMBER 211 CHAPTER 26. PLANAR GRAPHS 2, THE COLIN DE VERDIÈRE NUMBER 212
and M (i, j) < 0. multiplicity greater than or equal to 4. We will do this by showing that if G is three-connected
and cdv(G) ≥ 4, then G contains a K3,3 minor. Without loss of generality, we can assume λ2 = 0
I will now show that we can choose the diagonal entries M (i, i) so that the coordinate vectors are
(by just adding a diagonal matrix).
in the nullspace of M . First, set X
x̂ i = M (i, j)x j . Our proof will exploit a variant of Fiedler’s Nodal Domain Theorem, which we proved back in the
j∼i beginning of the semester. That theorem considered any eigenvector v of λ2 (of a Laplacian), and
We will show that x̂ i is parallel to x i by observing that x̂ i × x i = 0. We compute proved that the set of vertices that are non-negative in v is connected. The variant we use is due
X X to van der Holst [van95], which instead applies to eigenvectors v of λ2 of minimal support. These
x i × x̂ i = x i × M (i, j)x j = M (i, j)x i × x j . are the eigenvectors of v of λ2 for which there is no other eigenvector w of λ2 such that the zeros
j∼i j∼i of v are a subset of the zeros of w . That is, v has as many zero entries as possible. One can then
This sum counts the difference y b − y a between each adjacent pair of faces that touch x i . By prove that the set of vertices that are positive in v is connected. And, one can of course do the
going around x i in counter-clockwise order, we see that each of these vectors occurs once same for the vertices that are negative.
positively and once negatively in the sum, so the sum is zero.
Now, let F be any face of G, and let a, b and c be three vertices in F . As λ2 has multiplicity at
Thus, x i and x̂ i are parallel, and we may set M (i, i) so that least 4, it has some eigenvector that is zero at each of a, b and c. Let v be an eigenvector of λ2
with minimal support that is zero at each of a, b, and c. Let d be any vertex for which v (d) > 0.
M (i, i)x i + x̂ i = 0. As the graph is three-connected, it contains three vertex-disjoint paths from d to a, b, and c (this
This implies that the coordinate vectors are in the nullspace of M , as follows from Menger’s Theorem, which I have not covered). As v (d) > 0 and v (a) = 0, there is
some vertex a′ on the path from d to a for which v (a′ ) = 0 but a′ has a neighbor a+ for which
x1 v (a+ ) > 0. As λ2 = 0, a′ must also have a neighbor a− for which v (a− ) < 0. Construct similar
x 2 X
vertices for b and c.
M .. = M (i, i)x i + M (i, j)x j = M (i, i)x i + x̂ i .
.
j∼i
xn i b+
One can also show that the matrix M has precisely one negative eigenvalue, so the multiplicity of d a+ a−
its second-smallest eigenvalue is 3. a′
Now, contract every edge on the path from a to a′ , on the path from b to b′ and on the path from
26.6 cdv(G) ≤ 3 c to c′ . Also, contract all the vertices for which v is positive and contract all the vertices for
which v is negative (which we can do because these sets are connected). Finally, contract every
We will now prove that if G is a 3-connected planar graph, then cdv(G) ≤ 3. Assume, by way of edge in the face F that does not involve one of a, b, or c. We obtain a graph with a triangular
contradiction, that there is generalized Laplacian matrix M of G whose second eigenvalue λ2 has face abc such that each of a, b, and c have an edge to the positive supervertex and the negative
CHAPTER 26. PLANAR GRAPHS 2, THE COLIN DE VERDIÈRE NUMBER 213 CHAPTER 26. PLANAR GRAPHS 2, THE COLIN DE VERDIÈRE NUMBER 214
b+
d a+ a−
a′
a b’
b
f
b−
+
c c
+
′ a′
c
c−
Figure 26.2: The set of positive and negative vertices that will be contracted. Vertex f has been a b’
inserted. b
f
−
To do this, we add one additional vertex f inside the face and connected to each of a, b, and c. c
This does not violate planarity because a, b, and c were contained in a face. In fact, we can add f
before we do the contractions. By throwing away all other edges, we have constructed a K3,3
minor, so the graph cannot be planar.
c′
Figure 26.3: The edges in the cycle have been contracted, as have all the positive and negative
vertices. After contracting the paths between a and a′ , between b and b′ and between c and c′ , we
obtain a K3,3 minor.
Chapter 27
27.1 Overview
Expander Graphs
We say that a d-regular graph is a good expander if all of its adjacency matrix eigenvalues are
small. To quantify this, we set a threshold ϵ > 0, and require that each adjacency matrix
eigenvalue, other than d, has absolute value at most ϵd. This is equivalent to requiring all
non-zero eigenvalues of the Laplacian to be within ϵd of d.
In this lecture, we will:
Random d-regular graphs are expander graphs. Explicitly constructed expander graphs have
proved useful in a large number of algorithms and theorems. We will see some applications of
them next week.
One way of measuring how well two matrices A and B approximate each other is to measure the
operator norm of their difference: A − B. Since I consider the operator norm by default, I will
215 216
CHAPTER 27. PROPERTIES OF EXPANDER GRAPHS 217 CHAPTER 27. PROPERTIES OF EXPANDER GRAPHS 218
just refer to it as the norm. Recall that the norm of a matrix M is defined to be its largest 27.3 Quasi-Random Properties of Expanders
singular value:
∥M x ∥
∥M ∥ = max , There are many ways in which expander graphs act like random graphs. Conversely, one can prove
x ∥x ∥ that a random d-regular graph is an expander graph with reasonably high probability [Fri08].
where the norms in the fraction are the standard Euclidean vector norms. The norm of a
symmetric matrix is just the largest absolute value of one of its eigenvalues. It can be very We will see that all sets of vertices in an expander graph act like random sets of vertices. To make
different for a non symmetric matrix. this precise, imagine creating a random set S ⊂ V by including each vertex in S independently
with probability α. How many edges do we expect to find between vertices in S? Well, for every
For this lecture, we define an ϵ-expander to be a d-regular graph whose adjacency matrix edge (u, v), the probability that u ∈ S is α and the probability that v ∈ S is α, so the probability
eigenvalues satisfy |µi | ≤ ϵd for µi ≥ 2. As the Laplacian matrix eigenvalues are given by that both endpoints are in S is α2 . So, we expect an α2 fraction of the edges to go between
λi = d − µi , this is equivalent to |d − λi | ≤ ϵd for i ≥ 2. It is also equivalent to vertices in S. We will show that this is true for all sufficiently large sets S in an expander.
∥LG − (d/n)LKn ∥ ≤ ϵd. In fact, we will prove a stronger version of this statement for two sets S and T . Imagine including
each vertex in S independently with probability α and each vertex in T with probability β. We
allow vertices to belong to both S and T . For how many ordered pairs (u, v) ∈ E do we expect to
For this lecture, I define a graph G to be an ϵ-approximation of a graph H if
have u ∈ S and v ∈ T ? Obviously, it should hold for an αβ fraction of the pairs.
(1 − ϵ)H ≼ G ≼ (1 + ϵ)H, For a graph G = (V, E), define
⃗
E(S, S)
(1 − ϵ)dx T x ≤ x T LG x ≤ (1 + ϵ)dx T x .
Let H = nd Kn . As G is a good approximation of H, let’s compute Theorem 27.4.1 ([Tan84]). Let G = (V, E) be a d-regular graph on n vertices that
ϵ-approximates nd Kn . Then, for all S ⊆ V ,
d d
χTS LH χT = χTS dI − J χT = d |S ∩ T | − |S| |T | = d |S ∩ T | − αβdn.
n n |S|
|N (S)| ≥ ,
ϵ2 (1 − α) + α
So,
⃗
E(S, T ) − αβdn = χTS LG χT − χTS LH χT . where |S| = αn.
As
∥LG − LH ∥ ≤ ϵd, Note that when α is much less than ϵ2 , the term on the right is approximately |S| /ϵ2 , which can
be much larger than |S|. We will derive Tanner’s theorem from Theorem 27.3.1.
χTS LH χT − χTS LG χT = χTS (LH − LG )χT
≤ ∥χS ∥ ∥(LH − LG )χT ∥ Proof. Let R = N (S) and let T = V − R. Then, there are no edges between S and T . Let
|T | = βn and |R| = γn, so γ = 1 − β. By Theorem 27.3.1, it must be the case that
≤ ∥χS ∥ ∥LH − LG ∥ ∥χT ∥
p
≤ ϵd ∥χS ∥ ∥χT ∥ αβdn ≤ ϵdn (α − α2 )(β − β 2 ).
p
= ϵdn αβ.
The lower bound on γ now follows by re-arranging terms. Dividing through by dn and squaring
This is almost as good as the bound we are trying to prove. To prove the claimed bound, recall both sides gives
that LH x = LH (x + c1) for all c. So, let x S and x T be the result of orthogonalizing χS and χT
α2 β 2 ≤ ϵ2 (α − α2 )(β − β 2 ) ⇐⇒
with respect to the constant vectors. By Claim 2.4.2 (from Lecture 2), ∥x S ∥ = n(α − α2 ). So, we
obtain the improved bound αβ ≤ ϵ2 (1 − α)(1 − β) ⇐⇒
β ϵ2 (1 − α)
x TS (LH − LG )x T = χTS (LH − LG )χT , ≤ ⇐⇒
1−β α
while p 1−γ ϵ2 (1 − α)
≤ ⇐⇒
∥x S ∥ ∥x T ∥ = n (α − α2 )(β − β 2 ). γ α
So, we may conclude 1 ϵ2 (1 − α) + α
p ≤ ⇐⇒
⃗
E(S, T ) − αβdn ≤ ϵdn (α − α2 )(β − β 2 ). γ α
α
γ≥ 2 .
ϵ (1 − α) + α
We remark that when S and T are disjoint, the same proof goes through even if G is irregular and
⃗
weighted if we replace E(S, T ) with
X If instead of N (S) we consider N (S) − S, then T and S are disjoint, so the same proof goes
w(S, T ) = w(u, v). through for weighted, irregular graphs that ϵ-approximate nd Kn .
(u,v)∈E,u∈S,v∈T
d
We only need the fact that G ϵ-approximates n Kn . See [BSS12] for details. 27.5 How well can a graph approximate the complete graph?
Consider applying Tanner’s Theorem with S = {v} for some vertex v. As v has exactly d
27.4 Vertex Expansion
neighbors, we find
ϵ2 (1 − 1/n) + 1/n ≥ 1/d,
The reason for the name expander graph is that small sets of vertices in expander graphs have p √
unusually large numbers of neighbors. For S ⊂ V , let N (S) denote the set of vertices that are from which we see that ϵ must be at least 1/ d(n − 1)/n, which is essentially 1/ d. But, how
neighbors of vertices in S. The following theorem, called Tanner’s Theorem, provides a lower small can it be?
bound on the size of N (S).
CHAPTER 27. PROPERTIES OF EXPANDER GRAPHS 221 CHAPTER 27. PROPERTIES OF EXPANDER GRAPHS 222
The Ramanujan graphs, constructed by Margulis [Mar88] and Lubotzky, Phillips and
Sarnak [LPS88] achieve

ϵ ≤ 2√(d − 1)/d.

We will see that if we keep d fixed while we let n grow, ϵ cannot be asymptotically smaller than
this bound. We will prove a lower bound on ϵ by constructing a suitable test function.

As a first step, choose two vertices v and u in V whose neighborhoods do not overlap. Consider
the vector x defined by

x (i) = 1 if i = u,  1/√d if i ∈ N (u),  −1 if i = v,  −1/√d if i ∈ N (v),  and 0 otherwise.

Now, compute the Rayleigh quotient with respect to x . The numerator is the sum over all edges
of the squares of differences across the edges. This gives (1 − 1/√d)^2 for the edges attached to u
and v, and 1/d for the edges attached to N (u) and N (v) but not to u or v, for a total of

2d(1 − 1/√d)^2 + 2d(d − 1)/d = 2(d − 2√d + 1 + (d − 1)) = 2(2d − 2√d).

On the other hand, the denominator is 4, so we find

x^T L x / x^T x = d − √d.

If we use instead the vector

y (i) = 1 if i = u,  −1/√d if i ∈ N (u),  −1 if i = v,  1/√d if i ∈ N (v),  and 0 otherwise,

we find

y^T L y / y^T y = d + √d.

This is not so impressive, as it merely tells us that ϵ ≥ 1/√d, which we already knew. But, we
can improve this argument by pushing it further. We do this by modifying it in two ways. First,
we extend x to neighborhoods of neighborhoods of u and v. Second, instead of basing the
construction at vertices u and v, we base it at two edges. This way, each vertex has d − 1 edges to
those that are farther away from the centers of the construction.

The following theorem is attributed to A. Nilli [Nil91], but we suspect it was written by N. Alon.

Theorem 27.5.1. Let G be a d-regular graph containing two edges (u0 , u1 ) and (v0 , v1 ) that are
at distance at least 2k + 2. Then

λ2 ≤ d − 2√(d − 1) + (2√(d − 1) − 1)/(k + 1).

Figure 27.1: The construction of x .

Proof. Define the following neighborhoods.

U0 = {u0 , u1 },   Ui = N (Ui−1 ) − ∪_{j<i} Uj , for 0 < i ≤ k,
V0 = {v0 , v1 },   Vi = N (Vi−1 ) − ∪_{j<i} Vj , for 0 < i ≤ k.

That is, Ui consists of exactly those vertices at distance i from U0 . Note that there are no edges
between any vertices in any Ui and any Vj .

Our test vector for λ2 will be given by

x (a) = 1/(d − 1)^{i/2} for a ∈ Ui ,  −β/(d − 1)^{i/2} for a ∈ Vi ,  and 0 otherwise.

We choose β so that x is orthogonal to 1.

We now find that the Rayleigh quotient of x with respect to L is at most

(X0 + β^2 Y0 ) / (X1 + β^2 Y1 ),

where

X0 = Σ_{i=0}^{k−1} |Ui | (d − 1) ( (1 − 1/√(d − 1)) / (d − 1)^{i/2} )^2 + |Uk | (d − 1)^{−k+1} ,  and  X1 = Σ_{i=0}^{k} |Ui | (d − 1)^{−i}
To upper bound the Rayleigh quotient, we observe that the left-most of these terms contributes

( Σ_{i=0}^{k} (|Ui |/(d − 1)^i) (d − 2√(d − 1)) ) / ( Σ_{i=0}^{k} |Ui | (d − 1)^{−i} ) = d − 2√(d − 1).
A naive way of doing this would be for the transmitter to send every bit 3 times. If only 1 bit
were flipped during transmission, then the receiver would be able to figure out which one it was.
But, this is a very inefficient coding scheme. Much better approaches exist.
28.2 Notation
Chapter 28
When x is a vector, we let
def
|x | = |{a : x (a) ̸= 0}|
A brief introduction to Coding denote the hamming weight of x . This is often called the 0-norm, and written ∥x ∥0 .
For a prime p, we denote the integers modulo p by Fp . The reason is that the integers modulo p
Theory form the field with p elements: they may be summed and multiplied, have identities under
addition and multiplication (0 and 1), the have inverses under addition (−x), and all but zero
have inverses under multiplication. We say the field because it is unique up to the names of the
elements. In this chapter we mostly deal with the field of two elements F2 , which we write F2 .
This chapter gives a short introduction to the combinatorial view of error-correcting codes. Our
motivation is twofold: good error-correcting codes provide choices for the generators of
generalized hypercubes that have high expansion, and in the next chapter we learn how to use 28.3 Connection with Generalized Hypercubes
expander graphs to construct good error-correcting codes.
Recall that the Generalized Hypercubes we encountered in Section 7.4 have vertex set Fk2 and are
We begin and end the chapter with a warning: the combinatorial, worst-case view of coding
defined by d ≥ k generators, g 1 , . . . , g d ∈ Fk2 . For each b ∈ Fk2 , the graph defined by these
theory presented herein was very useful in the first few decades of the field. But, the problem of
generators has an adjacency matrix eigenvalue given by
error-correction is at its heart probabilistic and great advances have been made by avoiding the
worst-case formulation. For readers who would like to understand this perspective, we recommend
“Modern Coding Theory” by Richardson and Urbanke. For those who wish to learn more about
the worst-case approach, we recommend “The Theory of Error-Correcting Codes” by

µb = Σ_{i=1}^{d} (−1)^{g_i^T b} .
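To make the connection concrete, here is a small check (added as an illustration; the generators chosen below are arbitrary toy values, not from the text) that the adjacency eigenvalues of a generalized hypercube match the character formula µb = Σ_i (−1)^{g_i^T b}.

import itertools
import numpy as np

k, gens = 3, [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]   # toy generators in F_2^3
verts = list(itertools.product([0, 1], repeat=k))
idx = {v: i for i, v in enumerate(verts)}
A = np.zeros((2**k, 2**k))
for v in verts:
    for g in gens:
        w = tuple((vi + gi) % 2 for vi, gi in zip(v, g))
        A[idx[v], idx[w]] += 1
# Eigenvalues predicted by the character formula mu_b = sum_i (-1)^{g_i . b}
predicted = sorted(sum((-1)**(np.dot(g, b) % 2) for g in gens) for b in verts)
computed = sorted(np.linalg.eigvalsh(A))
print(np.allclose(predicted, computed))   # True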
Hamming codes combine parity bits in an interesting way to enable the receiver to correct one for n larger than m. Every string in the image of C is called a codeword. We will also abuse
error. Let’s consider the first interesting Hamming code, which transmits 4-bit messages by notation by identifying C with the set of codewords.
sending 7 bits in such a way that any one error can be corrected. Note that this is much better
than repeating every bit 3 times, which would require 12 bits.
We define the rate of the code to be
r = m/n.
For reasons that will be clear soon, we will let b3 , b5 , b6 , and b7 be the bits that the transmitter
would like to send. The parity bits will be chosen by the rules The rate of a code tells you how many bits of information you receive for each codeword bit. Of
course, codes of higher rate are more efficient.
b4 = b5 + b6 + b7
The Hamming distance between two words c 1 and c 2 is the number of bits in which they differ.
b2 = b3 + b6 + b7 It will be written
b1 = b3 + b5 + b7 . dist(c 1 , c 2 ) = c 1 − c 2 .
All additions, of course, are modulo 2. The transmitter will send the codeword b1 , . . . , b7 .
The minimum distance of a code is
If we write the bits as a vector, then we see that they satisfy the linear equations

[ 0 0 0 1 1 1 1 ]
[ 0 1 1 0 0 1 1 ]  (b1 , b2 , b3 , b4 , b5 , b6 , b7 )^T = 0.
[ 1 0 1 0 1 0 1 ]

For example, to transmit the message 1010, we set

The minimum distance of a code is

d = min_{c1 ≠ c2 ∈ C} dist(c1 , c2 )

(here we have used C to denote the set of codewords). It should be clear that if a code has large
minimum distance then it is possible to correct many errors. In particular, it is possible to correct
any number of errors less than d/2. To see why, let c be a codeword, and let r be the result of
flipping e < d/2 bits of c. As dist(c, r ) < d/2, c will be the closest codeword to r . This is
because for every c1 ≠ c,

d ≤ dist(c1 , c) ≤ dist(c1 , r ) + dist(r , c) < dist(c1 , r ) + d/2, which implies d/2 < dist(c1 , r ).
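The parity rules above are easy to exercise in code. The sketch below (added for illustration; the function and variable names are mine) encodes a 4-bit message, flips one bit, and recovers it by syndrome decoding, which is exactly the "correct any single error" guarantee discussed here.

import numpy as np

# Parity-check matrix of the (7,4) Hamming code: column j is j written in binary.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def encode(b3, b5, b6, b7):
    b4 = (b5 + b6 + b7) % 2
    b2 = (b3 + b6 + b7) % 2
    b1 = (b3 + b5 + b7) % 2
    return np.array([b1, b2, b3, b4, b5, b6, b7])

c = encode(1, 0, 1, 0)                 # the message 1010
assert not (H @ c % 2).any()           # codewords satisfy H c = 0 over F_2
r = c.copy(); r[4] ^= 1                # flip one bit (position 5, 1-indexed)
syndrome = H @ r % 2                   # reads off the binary index of the flipped position
pos = int("".join(map(str, syndrome)), 2)
r[pos - 1] ^= 1                        # correct it
print(np.array_equal(r, c))            # True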
A big step in this direction was the use of linear codes. In the same way that we defined
Hamming codes, we may define a linear code as the set of vectors c ∈ F_2^n such that M c = 0, for
some (n − m)-by-n matrix M . In this chapter, we will instead define a code by its generator
matrix. Given an n-by-m matrix G, we define the code CG to be the set of vectors of the form Gb,
where b ∈ F_2^m . One may view b as the message to be transmitted, and Gb as its encoding.

A linear code is called linear because the sum of two codewords in the code is always another
codeword. In particular, 0 is always a codeword and the minimum distance of the code equals the
minimum Hamming weight of a non-zero codeword, as

dist(c1 , c2 ) = |c1 − c2 | = |c1 + c2 |

over F2 .

We now pause to make the connection back to generalized hypercubes: if CG has minimum
relative distance δ and maximum relative distance 1 − δ, then the corresponding generalized
hypercube is a (1 − 2δ)-expander.

28.6 Random Linear Codes

In the early years of coding theory, there were many papers published that contained special
constructions of codes such as the Hamming code. But, as the number of bits to be transmitted
became larger and larger, it became more and more difficult to find such exceptional codes. Thus,
an asymptotic approach became reasonable. In his paper introducing coding theory, Shannon
[Sha48] proved that random codes are asymptotically good. A few years later, Elias [Eli55]
suggested using random linear codes.

We will now see that random linear codes are asymptotically good with high probability. We
consider a code of the form CG , where G is an n-by-m matrix with independent uniformly chosen
F2 entries. Clearly, the rate of the code will be m/n.

So, the minimum distance of CG is

min_{0≠b∈F_2^m} dist(0, Gb) = min_{0≠b∈F_2^m} |Gb| ,

where by |c| we mean the number of 1s in c. This is sometimes called the weight of c.

Here's what we can say about the minimum distance of a random linear code. The following
argument is a refinement of the Chernoff based argument that appears in Section 7.5.

Lemma 28.6.1. Let G be a random n-by-m matrix. For any d, the probability that CG has
minimum distance at least d is at least

1 − (2^m / 2^n) Σ_{i=0}^{d} (n choose i).

Proof. It suffices to upper bound the probability that there is some non-zero b ∈ F_2^m for which
|Gb| ≤ d.

To this end, fix some non-zero vector b in F_2^m . Each entry of Gb is the inner product of a column
of G with b. As each column of G consists of random F2 entries, each entry of Gb is chosen
uniformly from F2 . As the columns of G are chosen independently, we see that Gb is a uniform
random vector in F_2^n . Thus, the probability that |Gb| is at most d is precisely

(1/2^n) Σ_{i=0}^{d} (n choose i).

As the probability that one of a number of events holds is at most the sum of the probabilities
that each holds (the “union bound”),

Pr_G [∃b ∈ F_2^m , b ≠ 0 : |Gb| ≤ d] ≤ Σ_{0≠b∈F_2^m} Pr_G [|Gb| ≤ d]
≤ (2^m − 1) (1/2^n) Σ_{i=0}^{d} (n choose i)
≤ (2^m / 2^n) Σ_{i=0}^{d} (n choose i).

To see how this behaves asymptotically, recall that for a constant p,

(n choose pn) ≈ 2^{nH(p)} ,

where

H(p) := −p log2 p − (1 − p) log2 (1 − p)

is the binary entropy function. If you are not familiar with this, you may derive it from Stirling's
formula. For our purposes 2^{nH(p)} ≈ Σ_{i=0}^{pn} (n choose i). Actually, we will just use the fact that for
β > H(p),

Σ_{i=0}^{pn} (n choose i) / 2^{nβ} → 0

as n goes to infinity.

If we set m = rn and d = δn, then Lemma 28.6.1 tells us that CG probably has rate r and
minimum relative distance δ if

(2^{rn} / 2^n) 2^{nH(δ)} < 1,

which happens when

H(δ) < 1 − r.

For any constant r < 1, we can find a δ for which H(δ) < 1 − r, so there exist asymptotically
good sequences of codes of every non-zero rate. This is called the Gilbert-Varshamov bound. It is
still not known if binary codes exist whose relative minimum distance satisfies H(δ) > 1 − r. This
is a big open question in coding theory.

Of course, this does not tell us how to choose such a code in practice, how to efficiently check if a
given code has large minimum distance, or how to efficiently decode such a code.
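The Gilbert-Varshamov calculation can be explored numerically. The sketch below (added as an illustration; the parameters are small toy values, so the numbers are only suggestive) samples a random generator matrix, computes the minimum distance of CG by brute force, and compares H(δ) with 1 − r.

import itertools
import numpy as np

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def min_distance(G):
    n, m = G.shape
    best = n
    for b in itertools.product([0, 1], repeat=m):
        if any(b):
            best = min(best, int((G @ np.array(b) % 2).sum()))
    return best

rng = np.random.default_rng(0)
n, m = 18, 6                                  # rate r = 1/3
G = rng.integers(0, 2, (n, m))
d = min_distance(G)
print("rate r =", round(m / n, 3), " relative distance delta =", round(d / n, 3),
      " H(delta) =", round(entropy(d / n), 3), " GV threshold 1 - r =", round(1 - m / n, 3))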
28.7 Reed-Solomon Codes However, Reed-Solomon codes do not provide an asymptotically good family. If one represents
each field element by log2 p bits in the obvious way, then the code has length p log2 p, but can
Reed-Solomon Codes are one of the workhorses of coding theory. The are simple to describe, and only correct at most p errors. That said, one can find an asymptotically good family by encoding
easy to encode and decode. each field element with its own small error-correcting code.
However, Reed-Solomon Codes are not binary codes. Rather, they are codes whose symbols are Next lecture, we will see how to make asymptotically good codes out of expander graphs. In the
elements of a finite field. If you don’t know what a finite field is, don’t worry (yet). For now, we following lecture, we will use good error-correcting codes to construct graphs.
will just consider prime fields, Fp . These are the numbers modulo a prime p. Recall that such
numbers may be added, multiplied, and divided.
28.8 Caution
A message in a Reed-Solomon code over a field Fp is identified with a polynomial of degree m − 1.
That is, the message f1 , . . . , fm is viewed as providing the coefficients of the polynomial
Explain defects of the worst-case view.
m−1
X
i
Q(x) = fi+1 x .
i=0
A Reed-Solomon code is encoded by evaluating it over every element of the field. That is, the
codeword is
Q(0), Q(1), Q(2), . . . , Q(p − 1).
Sometimes, it is evaluated at a subset of the field elements.
We will now see that the minimum distance of such a Reed-Solomon code is p − m. We show this
using the following standard fact from algebra.
Lemma 28.7.1. Let Q be a polynomial of degree at most m − 1 over a field Fp . If there exists
distinct field elements x1 , . . . , xm such that
Q(xi ) = 0
then Q is identically zero.
Theorem 28.7.2. The minimum distance of the Reed-Solomon code is at least p − m.
Proof. Let Q1 and Q2 be two different polynomials of degree at most m − 1. For a polynomial Q,
let
E(Q) = (Q(0), Q(1), . . . , Q(p))
be its encoding. If
dist(E(Q1 ), E(Q2 )) ≤ p − k,
then there exists field elements x1 , . . . , xk such that
Q1 (xj ) = Q2 (xj ).
Now, consider the polynomial
Q1 (x) − Q2 (x).
It also has degree at most m − 1, and it is zero at k field elements. Lemma 28.7.1 tells us that if
k ≥ m, then Q1 − Q2 is exactly zero, which means that Q1 = Q2 . Thus, for distinct Q1 and Q2 , it
must be the case that
dist(E(Q1 ), E(Q2 )) > p − m.
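A small brute-force check of this bound (added here as an illustration, with a small prime chosen for convenience): since the code is linear, its minimum distance is the minimum weight of a non-zero codeword, which the script below computes directly.

import itertools

p, m = 7, 3   # field size and message length (polynomials of degree at most m - 1)

def encode(msg):
    # evaluate Q(x) = msg[0] + msg[1]*x + ... + msg[m-1]*x^(m-1) at every field element
    return [sum(c * pow(x, i, p) for i, c in enumerate(msg)) % p for x in range(p)]

weights = [sum(v != 0 for v in encode(msg))
           for msg in itertools.product(range(p), repeat=m) if any(msg)]
print("minimum distance =", min(weights), " (the claim: at least p - m =", p - m, ")")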
Chapter 29
Figure 29.1: The cycle on 4 vertices, and its double-cover
Expander Codes
Proposition 29.1.2. Let H be the double-cover of G. Then, for every eigenvalue λi of the
Laplacian of G, H has a pair of eigenvalues,
λi and 2d − λi .
In this Chapter we will learn how to use expander graphs to construct and decode asymptotically
good error correcting codes.
The easiest way to prove this is to observe that if A is the adjacency matrix of G, then the
adjacency matrix of H looks like
29.1 Bipartite Expander Graphs 0 A
.
A 0
Our construction of error-correcting codes will exploit bipartite expander graphs (as these give a Our analysis of error-correcting codes will exploit the following theorem, which is analogous to
much cleaner construction than the general case). Let’s begin by examining what a bipartite Theorem 10.2.1.
expander graph should look like. Its vertex set will have two parts, U and V , each having n
vertices. Every vertex will have degree d, and every edge will go from a vertex in U to a vertex in
V.
Theorem 29.1.3. Let G = (U ∪ V, E) be a d-regular bipartite graph that ϵ-approximates (d/n) Kn,n .
Then, for all S ⊆ U and T ⊆ V ,
| |E(S, T )| − (d/n) |S| |T | | ≤ ϵd √( |S| |T | ).
In the same way that we view ordinary expanders as approximations of complete graphs, we will
view bipartite expanders as approximations of complete bipartite graphs1 . That is, if we let Kn,n n
denote the complete bipartite graph, then we want a d-regular bipartite graph G such that
(1 − ϵ) (d/n) Kn,n ≼ G ≼ (1 + ϵ) (d/n) Kn,n .
Proof. Similar to the proof of Theorem 27.3.1.
Let G(S ∪ T ) denote the graph induced on vertex set S ∪ T . We use the following simple corollary
As the eigenvalues of the Laplacian of nd Kn,n are 0 and 2d with multiplicity 1 each, and d of Theorem 29.1.3.
otherwise, this means that we want a d-regular graph G whose Laplacian spectrum satisfies
Corollary 29.1.4. For S ⊆ U with |S| = σn and and T ⊆ V with |T | = τ n, the average degree of
λ1 = 0, λ2n = 2d, and |λi − d| ≤ ϵd, for all 1 < i < 2n. vertices in G(S ∪ T ) is at most
2dστ
We can obtain such a graph by taking the double-cover of an ordinary expander graph. + ϵd.
σ+τ
Definition 29.1.1. Let G = (V, E) be a graph. The double-cover of G is the graph with vertex set
V × {0, 1} and edges Proof. The average degree of a graph is twice its number of edges, divided by the number of
((a, 0), (b, 1)) , for (a, b) ∈ E. vertices. In our case, this is at most
(2d/n) · |S| |T | / (|S| + |T |) + 2ϵd √( |S| |T | ) / (|S| + |T |).
It is easy to determine the eigenvalues of the double-cover of a graph.
1
The complete bipartite graph contains all edges between U and V
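Proposition 29.1.2 is easy to verify numerically. The sketch below (added as an illustration; networkx is used only to produce a small 3-regular example) builds the double-cover's Laplacian and compares its spectrum with the predicted pairs λi and 2d − λi.

import networkx as nx
import numpy as np

G = nx.petersen_graph()                      # a 3-regular example
d = 3
A = nx.to_numpy_array(G)
L = d * np.eye(len(A)) - A
# Double cover: vertex set V x {0,1}, adjacency [[0, A], [A, 0]].
A2 = np.block([[np.zeros_like(A), A], [A, np.zeros_like(A)]])
L2 = d * np.eye(len(A2)) - A2

lam = np.linalg.eigvalsh(L)
expected = np.sort(np.concatenate([lam, 2 * d - lam]))
print(np.allclose(expected, np.linalg.eigvalsh(L2)))   # True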
The left-hand term is So, the codewords form a vector space of dimension dn(2r0 − 1), and so there is a matrix G with
2dστ dn(2r0 − 1) columns and dn rows for which the codewords are precisely the vectors Gx , for
,
σ+τ x ∈ {0, 1}dn(2r0 −1) . In fact, there are many such matrices G, and they are called generator
and the right-hand term is at most matrices for the code. Such a matrix G may be computed from M by elementary linear algebra.
ϵd.
We will convert an algorithm that corrects errors in C0 into an algorithm for correcting errors in Lemma 29.5.2. Assume that ϵ ≤ δ0 /3. Let F be the set of edges in error after a U -decoding
C. The construction is fairly simple. We first apply the decoding algorithm at every vertex in U . step, and let S be the set of vertices in U attached to F . Now, perform a V -decoding step and let
We then do it at every vertex in V . We alternate in this fashion until we produce a codeword. T be the set of vertices in V attached to edges in error afterwards. If
To make this more concrete, assume that we have an algorithm A that corrects up to δ0 d/2 errors |S| ≤ δ0 n/9,
in the code C0 . That is, on input any word r ∈ {0, 1}d , A outputs another word in {0, 1}d with
the guarantee that if there is a c ∈ C0 such that dist(c, r ) ≤ δ0 d/2, then A outputs c. We apply then
the transformation A independently to the edges attached to each vertex of U . We then do the 3
|T | ≤ |S| .
same for V , and then alternate sides for a logarithmic number of iterations. We refer to these 4
alternating operations as U - and V -decoding steps.
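To fix ideas, here is a schematic version of this alternating procedure (added as an illustration only; decode_local stands for the assumed decoder A for C0, and the data layout is hypothetical rather than taken from the text).

import math

def expander_decode(word, U, V, decode_local, n):
    """word: dict mapping each edge of G to a bit; U and V: lists of vertices,
    each given as the list of its d incident edges in a fixed order."""
    for _ in range(int(math.log(n, 4 / 3)) + 1):
        for side in (U, V):                        # a U-decoding step, then a V-decoding step
            for edges in side:
                local = [word[e] for e in edges]
                corrected = decode_local(local)    # nearest codeword of C0 if within delta0*d/2
                for e, bit in zip(edges, corrected):
                    word[e] = bit
    return word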
Proof. Every vertex in V that outputs an error after the V -decoding step must be attached to at
We will prove that if ϵ ≤ δ0 /3 then this algorithm will correct up to δ02 dn/18 errors in at most least δ0 d/2 edges of F . Moreover, each of these edges is attached to a vertex of S. Thus, the
log4/3 n iterations. The idea is to keep track of which vertices are attached to edges that contain lemma follows immediately from Lemma 29.5.1.
errors, rather than keeping track of the errors themselves. We will exploit the fact that any vertex
that is attached to few edges in error will correct those errors. Let S be the set of vertices Theorem 29.5.3. If ϵ ≤ δ0 /3, then the proposed decoding algorithm will correct every set of at
attached to edges in error after a U -decoding step. We will show that the set T of vertices most
δ02
attached to edges in error after the next V -decoding step will be much smaller. dn
18
Lemma 29.5.1. Assume that ϵ ≤ δ0 /3. Let F ⊂ E be a set of edges, let S be the subset of errors.
vertices in U attached to edges in F and let T be the subset of vertices in V attached to at least
δ0 d/2 edges in F . If Proof. Let F denote the set of edges that are initially in error. Let S denote the set of vertices
|S| ≤ δ0 n/9, that output errors after the first U -decoding step. Every vertex in S must be adjacent to at least
δ0 d/2 edges in F , so
then
|T | ≤ (3/4) |S| .
|F | ≤ (δ0^2 /18) dn  =⇒  |S| ≤ |F | / (δ0 d/2) ≤ δ0 n/9.
After this point, we may apply Lemma 29.5.2 to show that the decoding process converges in at
Proof. Let |S| = σn and |T | = τ n. We have |F | ≥ (δ0 d/2) |T |. As the average degree of G(S ∪ T ) most log4/3 n iterations.
is twice the number of edges in the subgraph divided by the number of vertices, it is at least
δ0 d |T | / (|S| + |T |) = δ0 d τ / (σ + τ).
Applying Corollary 29.1.4, we find
δ0 d τ / (σ + τ) ≤ 2dστ / (σ + τ) + ϵd.
29.6 Historical Notes
Gallager [Gal63] first used graphs to construct error-correcting codes. His graphs were also
bipartite, with one set of vertices representing bits and the other set of vertices representing
constraints. Tanner [Tan81] was the first to put the vertices on the edges. The use of expansion in
analyzing these codes was pioneered by Sipser and Spielman [SS96]. The construction we present
here is due to Zemor [Zem01], although he presents a tighter analysis. Improved constructions
and analyses may be found in [BZ02, BZ05, BZ06, AS06].
Surprisingly, encoding these codes is slower than decoding them. As the matrix G will be dense,
leading to an encoding algorithm that takes time Θ((dn)2 ). Of course, one would prefer to encode
them in time O(dn). Using Ramanujan expanders and the Fast Fourier Transform over the
appropriate groups, Lafferty and Rockmore [LR97] reduced the time for encoding to O(d2 n4/3 ).
Chapter 30
Spielman [Spi96a] modifies the code construction to obtain codes with similar performance that
may be encoded in linear time.
Related ideas have been used to design codes that approach channel capacity. See
[LMSS01, RSU01, RU08].
A simple construction of expander
graphs
30.1 Overview
Our goal is to prove that for every ϵ > 0 there is a d for which we can efficiently construct an
infinite family of d-regular ϵ-expanders. I recall that these are graphs whose adjacency matrix
eigenvalues satisfy |µi | ≤ ϵd for i ≥ 2 and whose Laplacian matrix eigenvalues satisfy |d − λi | ≤ ϵd for i ≥ 2. Viewed
as a function of ϵ, the d that we obtain in this construction is rather large. But, it is a constant.
The challenge here is to construct infinite families with fixed d and ϵ.
Before we begin, I remind you that in Lecture 5 we showed that random generalized hypercubes
were ϵ expanders of degree f (ϵ) log n, for some function f . The reason they do not solve today’s
problem is that their degrees depend on the number of vertices. However, today’s construction
will require some small expander graph, and these graphs or graphs like them can serve in that
role. So that we can obtain a construction for every number of vertices n, we will exploit random
generalized ring graphs. Their analysis is similar to that of random generalized hypercubes.
Claim 30.1.1. There exists a function f (ϵ) so that for every ϵ > 0 and every sufficiently large n
the Cayley graph with group Z/n and a random set of at least f (ϵ) log n generators is an
ϵ-expander with high probability.
I am going to present the simplest construction of expanders that I have been able to find. By
“simplest”, I mean optimizing the tradeoff of simplicity of construction with simplicity of
analysis. It is inspired by the Zig-Zag product and replacement product constructions presented
by Reingold, Vadhan and Wigderson [RVW02].
For those who want the quick description, here it is. Begin with an expander. Take its line graph.
Observe that the line graph is a union of cliques. So, replace each clique by a small expander. We
need to improve the expansion slightly, so square the graph. Square one more time. Repeat. So, by squaring enough times, we can convert a family of β expanders for any β < 1 into a family
of ϵ expanders.
The analysis will be simple because all of the important parts are equalities, which I find easier to
understand than inequalities.
While this construction requires the choice of two expanders of constant size, it is explicit in the 30.3 The Relative Spectral Gap
sense that we can obtain a simple implicit representation of the graph: if the name of a vertex in
the graph is written using b bits, then we can compute its neighbors in time polynomial in b. To measure the qualities of the graphs that appear in our construction, we define a quantity that
we will call the relative spectral gap of a d-regular graph:
30.2 Squaring Graphs
r(G) := min( λ2 (G)/d , (2d − λn (G))/d ).
We will first show that we can obtain a family of ϵ expanders from a family of β-expanders for The graphs with larger relative spectral gaps are better expanders. An ϵ-expander has relative
any β < 1. The reason is that squaring a graph makes it a better expander, although at the cost spectral gap at least 1 − ϵ, and vice versa. Because we can square graphs, we know that it suffices
of increasing its degree. to find an infinite family of graphs with relative spectral gap strictly greater than 0.
Given a graph G, we define the graph G2 to be the graph in which vertices u and v are connected We now state exactly how squaring impacts the relative spectral gap of a graph.
if they are at distance 2 in G. Formally, G2 should be a weighted graph in which the weight of an
edge is the number of such paths. When first thinking about this, I suggest that you ignore the Corollary 30.3.1. If G has relative spectral gap β, then G2 has relative spectral gap at least
issue. When you want to think about it, I suggest treating such weighted edges as multiedges.
2β − β 2 .
We may form the adjacency matrix of G2 from the adjacency matrix of G. Let M be the
adjacency matrix of G. Then M 2 (u, v) is the number of paths of length 2 between u and v in G,
and M 2 (v, v) is always d. We will eliminate those self-loops. So, Note that when β is small, this gap is approximately 2β.
M G2 = M 2G − dIn .
30.4 Line Graphs
If G has no cycles of length up to 4, then all of the edges in its square will have weight 1. The
following claim is immediate from this definition.
Our construction will leverage small expanders to make bigger expanders. To begin, we need a
Claim 30.2.1. The adjacency matrix eigenvalues of G2 are precisely way to make a graph bigger and still say something about its spectrum.
µ2i − d, We use the line graph of a graph. Let G = (V, E) be a graph. The line graph of G is the graph
whose vertices are theedges of G in which two are connected if they share an endpoint in G.
where µ1 , . . . , µn are the adjacency matrix eigenvalues of G. That is, (u, v), (w, z) is an edge of the line graph if one of {u, v} is the same as one of {w, z}.
√ The line graph is often written L(G), but we won’t do that in this class so that we can avoid
Lemma 30.2.2. If {Gi }i is an infinite family of d-regular β-expanders for β ≥ 1/√(d − 1), then
{Gi^2 }i is an infinite family of d(d − 1)-regular β^2 -expanders.

We remark that the case of β > 1/√(d − 1), or even larger, is the case of interest. We are not
expecting to work with graphs that beat the Ramanujan bound, 2√(d − 1)/d.

(µ^2 − d) / (d(d − 1)) = (µ^2 − d) / (d^2 − d) ≤ µ^2 / d^2 ≤ β^2 .

(a) A graph (b) Its line graph.

On the other hand, every adjacency eigenvalue of Gi^2 is at least −d, which is at least
−β^2 d(d − 1).
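Claim 30.2.1 and the computation above can be checked on a small example. The sketch below (added as an illustration) forms M_{G^2} = M_G^2 − dI for a random d-regular graph and compares its eigenvalues with µi^2 − d.

import networkx as nx
import numpy as np

d, n = 5, 20
G = nx.random_regular_graph(d, n, seed=1)
M = nx.to_numpy_array(G)
M2 = M @ M - d * np.eye(n)                   # adjacency matrix of G^2, self-loops removed
mu = np.linalg.eigvalsh(M)
print(np.allclose(np.sort(mu**2 - d), np.linalg.eigvalsh(M2)))  # eigenvalues are mu_i^2 - d
beta = max(abs(mu[:-1])) / d                 # expansion of G, ignoring the trivial eigenvalue d
beta_sq = max(abs(np.linalg.eigvalsh(M2)[:-1])) / (d * (d - 1))  # expansion of G^2
print(beta, beta_sq)                         # beta_sq is roughly beta**2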
Let G be a d-regular graph with n vertices, and let H be its line graph1 .As G has dn/2 edges, H Proof of Lemma 30.4.1. First, let λi be an eigenvalue of LG . We see that
has dn/2 vertices. Each vertex of H, say (u, v), has degree 2(d − 1): d − 1 neighbors for the other
edges attached to u and d − 1 for v. In fact, if we just consider one vertex u in V , then all vertices λi is an eigenvalue of D G − M G =⇒
in H of form (u, v) of G will be connected. That is, H contains a d-clique for every vertex in V . d − λi is an eigenvalue of M G =⇒
We see that each vertex of H is contained in exactly two of these cliques. 2d − λi is an eigenvalue of D G + M G =⇒
Here is the great fact about the spectrum of the line graph. 2d − λi is an eigenvalue of 2Ind/2 + M H =⇒
Lemma 30.4.1. Let G be a d-regular graph with n vertices, and let H be its line graph. Then the 2(d − 1) − λi is an eigenvalue of M H =⇒
spectrum of the Laplacian of H is the same as the spectrum of the Laplacian of G, except that it λi is an eigenvalue of D H − M H .
has dn/2 − n extra eigenvalues of 2d.
Of course, this last matrix is the Laplacian matrix of H. We can similarly show that the extra
Before we prove this lemma, we need to recall the factorization of a Laplacian as the product of dn/2 − n zero eigenvalues of 2Ind/2 + M H become 2d in LH .
the signed edge-vertex adjacency matrix times its transpose. We reserved the letter U for this
matrix, and defined it by While the line graph operation preserves λ2 , it causes the degree of the graph to grow. So, we are
going to need to do more than just take line graphs to construct expanders.
1 if a = c
U ((a, b), c) = −1 if b = c Proposition 30.5.1. Let G be a d-regular graph with d ≥ 7 and let H be its line graph. Then,
0 otherwise.
λ2 (G)
For an unweighted graph, we have r(H) = ≥ r(G)/2.
2(d − 1)
LG = U T U .
Recall that each edge indexes one column, and that we made an arbitrary choice when we ordered Proof. For G a d-regular graph other than Kd+1 , λ2 (G) ≤ d + 1. By the Perron-Frobenius
the edge (a, b) rather than (b, a). But, this arbitrary choice factors out when we multiply by U T . theorem (Lemma 6.A.1) λmax (G) ≤ 2d (with equality if and only G is bipartite). So,
λmax (H) = 2d and λ2 (H) = λ2 (G) ≤ d. So, the term in the definition of the relative spectral gap
corresponding to the largest eigenvalue of H satisfies
30.5 The Spectrum of the Line Graph
2(2d − 2) − λmax (H) 2(2d − 2) − 2d 2
= = 1 − ≥ 5/7,
2d − 2 2d − 2 d
Define the matrix |U | to be the matrix obtained by replacing every entry of U by its absolute
value. Now, consider |U |T |U |. It looks just like the Laplacian, except that all of its off-diagonal as d ≥ 7. On the other hand,
entries are 1 instead of −1. So, λ2 (H) d
≤ ≤ 2/3.
2d − 2 2d − 2
T
|U | |U | = D G + M G = dI + M G , As 2/3 < 5/7,
as G is d-regular. We will also consider the matrix |U | |U |T . This is a matrix with nd/2 rows
min( λ2 (H)/(2d − 2) , (2(2d − 2) − λmax (H))/(2d − 2) ) = λ2 (H)/(2d − 2) = λ2 (G)/(2d − 2) ≥ r(G)/2.
and nd/2 columns, indexed by edges of G. The entry at the intersection of row (u, v) and column
(w, z) is
(δ u + δ v )T (δ w + δ z ).
So, it is 2 if these are the same edge, 1 if they share a vertex, and 0 otherwise. That is
While the line graph of G has more vertices, its degree is higher and its relative spectral gap is
|U | |U |T = 2Ind/2 + M H . approximately half that of G. We can improve the relative spectral gap by squaring. In the next
section, we show how to lower the degree.
Moreover, |U | |U |T and |U |T |U | have the same eigenvalues, except that the later matrix has
nd/2 − n extra eigenvalues of 0.
1
If G has multiedges, which is how we interpret integer weights, then we include a vertex in the line graph for
each of those multiedges. These will be connected to each other by edges of weight two—one for each vertex that
they share. All of the following statements then work out.
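Lemma 30.4.1 can likewise be verified directly. The sketch below (added as an illustration) compares the Laplacian spectrum of a small d-regular graph with that of its line graph, checking for the dn/2 − n extra eigenvalues equal to 2d.

import networkx as nx
import numpy as np

d, n = 4, 10
G = nx.random_regular_graph(d, n, seed=0)
H = nx.line_graph(G)                          # dn/2 vertices, (2d - 2)-regular
LG = nx.laplacian_matrix(G).toarray()
LH = nx.laplacian_matrix(H).toarray()
lam_G = np.linalg.eigvalsh(LG)
expected = np.sort(np.concatenate([lam_G, np.full(d * n // 2 - n, 2 * d)]))
print(np.allclose(expected, np.linalg.eigvalsh(LH)))   # True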
Let G be a d-regular graph and let Z be a graph on d vertices of degree k (we will use a
low-degree expander). We define the graph So, the relative spectral gap of G⃝Z
L is a little less than half that of G. But, the degree of G⃝Z
L
is 2k, which we will arrange to be much less than the degree of G, d.
G⃝Z
L
We will choose k and d so that squaring this graph improves its relative spectral gap, but still
to be the graph obtained by forming the edge graph of G, H, and then replacing every d-clique in leaves its degree less than d. If G has relative spectral gap β, then G2 has relative spectral gap at
H by a copy of Z. Actually, this does not uniquely define G⃝Z,
L as there are many ways to least
replace a d-clique by a copy of Z. But, any choice will work. Note that every vertex of G⃝Z
L has 2β − β 2 .
degree 2k.
It is easy to see that when β is small, this gap is approximately 2β. This is not quite enough to
Lemma 30.6.1. Let G be a d-regular graph, let H be the line graph of G, and let Z be a compensate for the loss of (1 − ϵ)/2 in the corollary above, so we will have to square the graph
k-regular α-expander. Then, once more.
k k
(1 − α) H ≼ G⃝Z
L ≼ (1 + α) H
d d 30.7 The whole construction
Proof. As H is a sum of d-cliques, let H1 , . . . , Hn be those d-cliques. So,
To begin, we need a “small” k-regular expander graph Z on
n
X
LH = LHi . def
d = (2k(2k − 1))2 − 2k(2k − 1)
i=1
vertices. It should be an ϵ-expander for some small ϵ. I believe that ϵ = 1/6 would suffice. The
Let Zi be the graph obtained by replacing Hi with a copy of Z, on the same set of vertices. To
other graph we will need to begin our construction will be a small d-regular expander graph G0 .
prove the lower bound, we compute
We use Claim 30.1.1 to establish the existence of both of these. Let β be the relative spectral gap
L_{G⃝L Z} = Σ_{i=1}^{n} L_{Zi} ≽ (1 − α) (k/d) Σ_{i=1}^{n} L_{Hi} = (1 − α) (k/d) LH .
of G0 . We will assume that β is small, but greater than 0. I believe that β = 1/5 will work. Of
course, it does not hurt to start with a graph of larger relative spectral gap.
We then construct G0 ⃝Z.
L The degree of this graph is 2k, and its relative spectral gap is a little
The upper bound is proved similarly. less than β/2. So, we square the resulting graph, to obtain
Corollary 30.6.2. Under the conditions of Lemma 30.6.1, 2
(G0 ⃝Z)
L .
1−α It has degree approximately 4k 2 ,
and relative spectral gap slightly less than β. But, for induction,
r(G⃝Z)
L ≥ r(G).
2 we need it to be more than β. So, we square one more time, to get a relative spectral gap a little
less than 2β. We now set
Proof. The proof is similar to the proof of Proposition 30.5.1. We have 2
2
G1 = (G0 ⃝Z)
L .
kλ2 (G)
λ2 (G⃝Z)
L ≥ (1 − α) , The graph G1 is at least as good an approximation of a complete graph as G0 , and it has degree
d approximately 16k 4 . In general, we set
and 2
2
λmax (G⃝Z)
L ≤ (1 + α)2k. Gi+1 = (Gi ⃝Z)
L .
To make the inductive construction work, we need for Z to be a graph of degree k whose number
of vertices equals the degree of G. This is approximately 16k 4 , and is exactly
I’ll now carry out the computation of relative spectral gaps with more care. Let’s assume that G0
has a relative spectral gap of β ≥ 4/5, and assume, by way of induction, that ρ(Gi ) ≥ 4/5. Also Chapter 31
assume that Z is a 1/6-expander. We then find
r(Gi ⃝Z)
L ≥ (1 − ϵ)(4/5)/2 = 1/3.
So, Gi ⃝Z
L is a 2/3-expander. Our analysis of graph squares then tells us that Gi+1 is a
PSRGs via Random Walks on Graphs
(2/3)4 -expander. So,
r(Gi+1 ) ≥ 1 − (2/3)4 = 65/81 > 4/5.
By induction, we conclude that every Gi has relative spectral gap at least 4/5.
31.1 Overview
To improve their relative spectral gaps of the graphs we produce, we can just square them a few
times.
There are three major approaches to designing pseudo-random generators (PSRGs). The most
common is to use quick procedures that seem good enough. This is how the PSRGs that are
standard in most languages arise. Cryptographers and Complexity Theorists try to design PSRGs
30.8 Better Constructions
that work for every polynomial-time algorithm. For example, one can construct PSRGs from
cryptographic functions with the guarantee that if the output of a polynomial-time algorithm
There is a better construction technique, called the Zig-Zag product [RVW02]. The Zig-Zag differs from random when using the PSRG, then one can use it to break the cryptographic
construction is a little trickier to understand, but it achieves better expansion. I chose to present function (see [HILL99, Gol07]). In this chapter we consider the construction of PSRGs that can
the line-graph based construction because its analysis is very closely related to an analysis of the be proved to work for specific algorithms or algorithms of specific forms. In particular, we will see
Zig-Zag product. w Impagliazzo and Zuckerman’s [IZ89] approach of using of random walks on expanders to run
the same algorithm many times. We are going to perform a very crude analysis that is easy to
present. Rest assured that much tighter analyses are possible and much better PSRGs have been
constructed since.
Pseudo-random number generators take a seed which is presumably random (or which has a lot of
randomness in it), and then generate a long string of random bits that are supposed to act
random. We should first discuss why we would actually want such a thing. I can think of two
reasons.
1. Random bits are scarce. This might be surprising. After all, if you look at the last few bits
of the time that I last hit a key, it is pretty random. Similarly, the low-order bits of the
temperature of the processor in my computer seem pretty random. While these bits are
pretty random, there are not too many of them.
Many randomized algorithms need a lot of random bits. Sources such as these just do not
produce random bits with a frequency sufficient for many applications.
2. If you want to re-run an algorithm, say to de-bug it, it is very convenient to be able to use Since we will not make any assumptions about the black box, we will use truly random bits the
the same set of random bits by re-running the PSRG with the same seed. If you use truly first time we test it. But, we will show that we only need 9 new random bits for each successive
random bits, you can’t do this. test. In particular, we will show that if we use our PSRG to generate bits for t + 1 test, then the
probability that majority answer is wrong decreases exponentially in t.
You may also wonder how good the standard pseudo-random number generators are. The first You are probably wondering why we would want to do such a thing. The reason is to increase the
answer is that the default ones, such as rand in C, are usually terrible. There are many accuracy of randomized algorithms. There are many randomized algorithms that provide weak
applications, such as those in my thesis, for which these generators produce behavior that is very guarantees, such as being correct 99% or 51% of the time. To obtain accurate answers from such
different from what one would expect from truly random bits (yes, this is personal). On the other algorithms, we run them many times with fresh random bits. You can view such an algorithm has
hand, one can use cryptographic functions to create bits that will act random for most purposes, having two inputs: the problem to be solved and its random bits. The black box is the behavior
unless one can break the underlying cryptography [HILL99]. But, the resulting generators are of the algorithm when the problem to be solved is fixed, so it is just working on the random bits.
usually much slower than the fastest pseudo-random generators. Fundamentally, it comes down to
a time-versus-quality tradeoff. The longer you are willing to wait, the better the pseudo-random
bits you can get. 31.5 The Random Walk Generator
31.3 Expander Graphs Let r be the number of bits that our black box takes as input. So, the space of random bits is
{0, 1}r . Let X ⊂ {0, 1}r be the settings of the random bits on which the box gives the minority
answer, and let Y be the settings on which it gives the majority answer.
In today’s lecture we will require an infinite family of d-regular 1/10-expander graphs. We require
that d be a constant, that the graphs have 2r vertices for all sufficiently large r, and that we can Our pseudo-random generator will use a random walk on a 1/10-expander graph whose vertex set
construct the neighbors of a vertex in time polynomial in r. That is, we need the graphs to have a is {0, 1}r . Recall that we can use d = 400. For the first input we feed to the black box, we will
simple explicit description. One can construct expanders families of this form using the require r truly random bits. We treat these bits as a vertex of our graph. For each successive test,
techniques from last lecture. For today’s purposes, the best expanders are the Ramanujan graphs we choose a random neighbor of the present vertex, and feed the corresponding bits to the box.
produced by Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88]. Ramanujan graphs of That is, we choose a random i between 1 and 400, and move to the ith neighbor of the present
degree d = 400 are 1/10-expanders. See also the work of Alon, Bruck, Naor, Naor and vertex. Note that we only need log2 400 ≈ 9 random bits to choose the next vertex. So, we will
Roth [ABN+ 92] for even more explicit constructions. only need 9 new bits to generate each input we feed to the box after the first.
While the explicit Ramanujan graphs only exist in certain sizes, none of which do have exactly 2r
vertices, some of them have just a little more that 2r vertices. It is possible to trim these to make 31.6 Formalizing the problem
them work, say by ignoring all steps in which the vertex does not correspond to r bits.
Assume that we are going to test the box t + 1 times. Our pseudo-random generator will begin at
31.4 Today’s Application : repeating an experiment a truly random vertex v, and then take t random steps. Recall that we defined X to be the set of
vertices on which the box outputs the minority answer, and we assume that |X| ≤ 2r /100. If we
report the majority of the outcomes of the t + 1 outputs of the box, we will return the correct
Imagine you are given a black box that takes r bits as input and then outputs either 0 or 1. answer as long as the random walk is inside X less than half the time. To analyze this, let v0 be
Moreover, let’s assume that the black box is very consistent: we know that it returns the same the initial random vertex, and let v1 , . . . , vt be the vertices produced by the t steps of the random
answer at least 99% of the time. If it almost always returns 0, we will call it a 0-box and if it walk. Let T = {0, . . . , t} be the time steps, and let S = {i : vi ∈ X}. We will prove
almost always returns 1, we will call it a 1-box. Our job is to determine whether a given box is a
0 or 1 box. We assume that r is big, so we don't have time to test the box on all 2^r settings of r
bits. Instead, we could pick r bits at random, and check what the box returns. If it says “1”, then
Pr [ |S| > t/2 ] ≤ (2/√5)^{t+1} .
it is probably a 1-box. But, what if we want more than 99% confidence? We could check the box
on many choices of r random bits, and report the majority value returned by the box.1 . But, this To begin our analysis, recall that the initial distribution of our random walk is p 0 = 1/n. Let χX
seems to require a new set of random bits for each run. In this lecture, we will prove that 9 new and χY be the characteristic vectors of X and Y , respectively, and let D X = diag(χX ) and
bits per run suffice. Note that the result would be interesting for any constant other than 9. D Y = diag(χY ). Let
1 Check for yourself that running it twice doesn’t help.
W = (1/d) M      (31.1)
be the transition matrix of the ordinary random walk on G. We are not using the lazy random The matrix norm measures how much a vector can increase in size when it is multiplied by M .
walk: it would be silly to use the lazy random walk for this problem, as there is no benefit to When M is symmetric, the 2-norm is just the largest absolute value of an eigenvalue of M (prove
re-running the experiment with the same random bits as before. Let ω1 , . . . , ωn be the eigenvalues this for yourself). It is also immediate that
of W . As the graph is a 1/10-expander, |ωi | ≤ 1/10 for all i ≥ 2.
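The savings in random bits can be simulated. The sketch below (added as an illustration; it uses a random regular graph as a stand-in for the explicit expanders discussed above, and a toy minority set X of 1% of the vertices) estimates how often a t-step walk spends more than half of its time in X.

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n, d, t, trials = 1024, 20, 20, 2000
G = nx.random_regular_graph(d, n, seed=0)
nbrs = [list(G.neighbors(v)) for v in range(n)]
X = set(range(n // 100))                  # the minority-answer inputs: 1% of the vertices

bad = 0
for _ in range(trials):
    v = rng.integers(n)                   # r truly random bits for the first test
    hits = int(v in X)
    for _ in range(t):                    # only ~log2(d) fresh random bits per later test
        v = nbrs[v][rng.integers(d)]
        hits += int(v in X)
    bad += hits > (t + 1) / 2
print("empirical Pr[majority wrong] ~", bad / trials)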
∥M 1 M 2 ∥ ≤ ∥M 1 ∥ ∥M 2 ∥ .
Let’s see how we can use these matrices to understand the probabilities under consideration. For
a probability vector p on vertices, the probability that a vertex chosen according to p is in X You should also verify this yourself. As D X , D Y and W are symmetric, they each have norm 1.
may be expressed
χTX p = 1T D X p. Warning 31.7.1. While the largest eigenvalue of a walk matrix is 1, the norm of an asymmetric
walk matrix can be larger√than 1. For instance, consider the walk matrix of the path on 3 vertices.
The second form will be more useful, as Verify that it has norm 2.
DXp
is the vector obtained by zeroing out the events in which the vertices are not in X. If we then Our analysis rests upon the following bound on the norm of D X W .
want to take a step in the graph G, we multiply by W . That is, the probability that the walk
starts at vertex in X, and then goes to a vertex i is q (i) where Lemma 31.7.2.
∥D X W ∥ ≤ 1/5.
q = W D X p 0.
Let’s see why this implies the theorem. For any set R, let Zi be as defined above. As p 0 = W p 0 ,
Continuing this way, we see that the probability that the walk is in X at precisely the times i ∈ R we have
is
1T D Zt W D Zt−1 W · · · D Z1 W D Z0 p 0 , 1T D Zt W D Zt−1 W · · · D Z1 W D Z0 p 0 = 1T (D Zt W ) D Zt−1 W · · · (D Z0 W ) p 0 .
where ( Now,
X if i ∈ R (
Zi = 1/5 for i ∈ R, and
Y otherwise. D Zt−1 W ≤
1 for i ∈
̸ R.
We will prove that this probability is at most (1/5)^{|R|} . It will then follow that

Pr [|S| > t/2] ≤ Σ_{|R| > t/2} Pr [the walk is in X at precisely the times in R]
≤ Σ_{|R| > t/2} (1/5)^{|R|}
≤ 2^{t+1} (1/5)^{(t+1)/2}
= (2/√5)^{t+1} .

So,

∥ (D_{Zt} W ) (D_{Zt−1} W ) · · · (D_{Z0} W ) ∥ ≤ (1/5)^{|R|} .

As ∥p 0 ∥ = 1/√n and ∥1∥ = √n, we may conclude

1^T (D_{Zt} W ) (D_{Zt−1} W ) · · · (D_{Z0} W ) p 0 ≤ ∥1∥ ∥ (D_{Zt} W ) (D_{Zt−1} W ) · · · (D_{Z0} W ) p 0 ∥
≤ ∥1∥ (1/5)^{|R|} ∥p 0 ∥
= (1/5)^{|R|} .
This implies p √
∥D X W c1∥ = c ∥χX ∥ = c |X| ≤ c n/10.
We will now show that ∥W y ∥ ≤ ∥y ∥ /10. The easiest way to see this is to consider the matrix
W − J /n,
where we recall that J is the all-1 matrix. This matrix is symmetric and all of its eigenvalues
have absolute value at most 1/10. So, it has norm at most 1/10. Moreover, (W − J /n)y = W y ,
which implies ∥W y ∥ ≤ ∥y ∥ /10. Another way to prove this is to expand y in the eigenbasis of
W , as in the proof of Lemma 2.1.3.
Finally, as 1 is orthogonal to y , q
∥x ∥ = c2 n + ∥y ∥2 .
So,
∥D X W x ∥ ≤ ∥D X W c1∥ + ∥D X W y ∥ ≤ c √n /10 + ∥y ∥ /10 ≤ ∥x ∥ /10 + ∥x ∥ /10 ≤ ∥x ∥ /5.
Part VI
Algorithms
31.9 Conclusion
Observe that this is a very strange proof. When considering probabilities, it seems that it would
be much more natural to sum them. But, here we consider 2-norms of probability vectors.
31.10 Notes
For the best results on the number of bits one needs for each run of an algorithm, see [?].
For tighter results on the concentration on variables drawn from random walks on expanders, see
Gillman [Gil98]. For matrices, see [GLSS18].
We will prove this by using a very simple random construction. We first carefully1 choose a
probability pa,b for each edge (a, b). We then include each edge (a, b) with probabilty pa,b ,
independently. If we do include edge (a, b), we give it weight wa,b /pa,b . We will show that our
choice of probabilities ensures that the resulting graph H has at most 4n ln n/ϵ2 edges and is an ϵ
Chapter 32 approximation of G with high probability.
The reason we employ this sort of sampling–blowing up the weight of an edge by dividing by the
probability that we choose it—is that it preserves the matrix in expectation. Let La,b denote the
Sparsification by Random Sampling elementary Laplacian on edge (a, b) with weight 1, so that
X
LG = wa,b La,b .
(a,b)∈E
Two weeks ago, we learned that expander graphs are sparse approximations of the complete
graph. This week we will learn that every graph can be approximated by a sparse graph. Today,
we will see how a sparse approximation can be obtained by careful random sampling: every graph
32.3 Matrix Chernoff Bounds
on n vertices has an ϵ-approximation with only O(ϵ−2 n log n) edges (a result of myself and
Srivastava [SS11]). We will prove this using a matrix Chernoff bound due to Tropp [Tro12]. The main tool that we will use in our analysis is a theorem about the concentration of random
matrices. These may be viewed as matrix analogs of the Chernoff bound that we saw in Lecture
We originally proved this theorem using a concentration bound of Rudelson [Rud99]. This 5. These are a surprisingly recent development, with the first ones appearing in the work of
required an argument that used sampling with replacement. When I taught this result in 2012, I Rudelson and Vershynin [Rud99, RV07] and Ahlswede and Winter [AW02]. The best present
asked if one could avoid sampling with replacement. Nick Harvey pointed out to me the argument source for these bounds is Tropp [Tro12], in which the following result appears as Corollary 5.2.
that avoids replacement that I am presenting today.
Theorem 32.3.1. Let X 1 , . . . , X m be independent random n-dimensional
P symmetric positive
In the next lecture, we will see that the log n term is unnecessary. In fact, almost every graph can semidefinite matrices so that ∥X i ∥ ≤ R almost surely. Let X = i X i and let µmin and µmax be
be approximated by a sparse graph almost as well as the Ramanujan graphs approximate the minimum and maximum eigenvalues of
complete graphs.
X
E [X ] = E [X i ] .
i
32.2 Sparsification
Then,

Pr [ λmin ( Σ_i X i ) ≤ (1 − ϵ) µmin ] ≤ n ( e^{−ϵ} / (1 − ϵ)^{1−ϵ} )^{µmin /R} , for 0 < ϵ < 1, and
Pr [ λmax ( Σ_i X i ) ≥ (1 + ϵ) µmax ] ≤ n ( e^{ϵ} / (1 + ϵ)^{1+ϵ} )^{µmax /R} , for 0 < ϵ.

For this lecture, I define a graph H to be an ϵ-approximation of a graph G if
(1 − ϵ)LG ≼ LH ≼ (1 + ϵ)LG .
We will show that every graph G has a good approximation by a sparse graph. This is a very
strong statement, as graphs that approximate each other have a lot in common. For example,
1. the effective resistance between all pairs of vertices are similar in the two graphs, It is important to note that the matrices X 1 , . . . , X m can have different distributions. Also note
that as the norms of these matrices get bigger, the bounds above become weaker. As the
2. the eigenvalues of the graphs are similar,
1
For those who can’t stand the suspense, we reveal that we will choose the probabilities to be proportional to
3. the boundaries of all sets are similar, as these are given by χTS LG χS , and leverage scores of the edges.
expressions above are not particularly easy to work with, we often use the following
approximations.

e^{−ϵ} / (1 − ϵ)^{1−ϵ} ≤ e^{−ϵ^2 /2} , for 0 < ϵ < 1, and
e^{ϵ} / (1 + ϵ)^{1+ϵ} ≤ e^{−ϵ^2 /3} , for 0 < ϵ < 1.

so that
L_G^{+/2} LH L_G^{+/2} = Σ_{(a,b)∈E} X a,b .
We will choose the probabilities to be
p a,b := (1/R) ∥ w a,b L_G^{+/2} L_{(a,b)} L_G^{+/2} ∥ ,
Chernoff (and Hoeffding and Bernstein) bounds rarely come in exactly the form you want. for an R to be chosen later. Thus, when edge (a, b) is chosen, ∥Xa,b ∥ = R. Making this value
Sometimes you can massage them into the needed form. Sometimes you need to prove your own. uniform for every edge optimizes one part of Theorem 32.3.1.
For this reason, you may some day want to spend a lot of time reading how these are proved. You may wonder what we should do if one of these probabilities pa,b exceeds one. There are many
ways of addressing this issue. For now, pretend that it does not happen. We will then explain
how to deal with this at the end of lecture.
32.4 The key transformation
Recall that the leverage score of edge (a, b) written ℓa,b was defined in Lecture 14 to be the weight
of an edge times the effective resistance between its endpoints:
Before applying the matrix Chernoff bound, we make a transformation that will cause
µmin = µmax = 1. ℓa,b = wa,b (δ a − δ b )T L+
G (δ a − δ b ).
For positive definite matrices A and B, we have
To see the relation between the leverage score and pa,b , compute
A ≼ (1 + ϵ)B ⇐⇒ B −1/2 AB −1/2 ≼ (1 + ϵ)I .
∥ L_G^{+/2} L_{(a,b)} L_G^{+/2} ∥ = ∥ L_G^{+/2} (δ a − δ b )(δ a − δ b )^T L_G^{+/2} ∥
= (δ a − δ b )^T L_G^{+/2} L_G^{+/2} (δ a − δ b )
= (δ a − δ b )^T L_G^{+} (δ a − δ b )
= Reff (a, b).
The same thing holds for singular semidefinite matrices that have the same nullspace:
LH ≼ (1 + ϵ)LG ⟺ L_G^{+/2} LH L_G^{+/2} ≼ (1 + ϵ) L_G^{+/2} LG L_G^{+/2} ,
where L_G^{+/2} is the square root of the pseudo-inverse of LG . Let
Π = L_G^{+/2} LG L_G^{+/2} ,
which is the projection onto the range of LG . We now know that LG is an ϵ-approximation of LH
+/2 +/2 As we can quickly approximate the effective resistance of every edge, we can quickly compute
if and only if LG LH LG is an ϵ-approximation of Π.
sufficient probabilities.
As multiplication by a fixed matrix is a linear operation and expectation commutes with linear
Recall that the leverage score of an edge equals the probability that the edge appears in a random
operations,
+/2 +/2 +/2 +/2 +/2 +/2 spanning tree. As every spanning tree has n − 1 edges, this means that the sum of the leverage
ELG LH LG = LG (ELH ) LG = ELG LG LG = Π. scores is n − 1, and thus
X n−1 n
So, we really just need to show that this random matrix is probably close to its expectation, Π. It pa,b = ≤ .
R R
would probably help to pretend that Π is in fact the identity, as it will make it easier to (a,b)∈E
understand the analysis. In fact, you don’t have to pretend: you could project all the vectors and This is a very clean bound on the expected number of edges in H. One can use a Chernoff bound
matrices onto the span of Π and carry out the analysis there. (on real variables rather than matrices) to prove that it is exponentially unlikely that the number
of edges in H is more than any small multiple of this.
Let
X a,b = (w a,b /p a,b ) L_G^{+/2} L_{(a,b)} L_G^{+/2} with probability p a,b , and X a,b = 0 otherwise,
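For concreteness, here is a compact simulation of the whole sampling scheme (added as an illustration; it computes exact effective resistances from a dense pseudoinverse, which is fine at this toy scale but is not the fast method referred to in the text).

import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n, eps = 300, 0.9                     # toy sizes; the constants follow the analysis above
G = nx.gnp_random_graph(n, 0.5, seed=1)
L = nx.laplacian_matrix(G).toarray().astype(float)
Lpinv = np.linalg.pinv(L)

R = eps**2 / (3.5 * np.log(n))
LH = np.zeros_like(L)
kept = 0
for a, b in G.edges():
    reff = Lpinv[a, a] + Lpinv[b, b] - 2 * Lpinv[a, b]   # leverage score (unit weights)
    p = min(1.0, reff / R)
    if rng.random() < p:
        w = 1.0 / p                                      # keep the edge, reweighted by 1/p
        LH[a, a] += w; LH[b, b] += w
        LH[a, b] -= w; LH[b, a] -= w
        kept += 1

# Compare L_H with L_G on the space orthogonal to the constant vector.
w_L, Q = np.linalg.eigh(L)
inv_sqrt = Q @ np.diag([0.0 if x < 1e-9 else x**-0.5 for x in w_L]) @ Q.T
rel = np.linalg.eigvalsh(inv_sqrt @ LH @ inv_sqrt)
print("kept", kept, "of", G.number_of_edges(), "edges;",
      "nontrivial relative eigenvalues in [%.2f, %.2f]" % (rel[1], rel[-1]))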
For your convenience, I recall another proof that the sum of the leverage scores is n − 1: We finally return to deal with the fact that there might be some edges for which pa,b ≥ 1 and so
X X definitely appear in H. There are two natural ways to deal with these—one that is easiest
ℓa,b = wa,b Reff (a, b) algorithmically and one that simplifies the proof. The algorithmically natural way to handle these
(a,b)∈E (a,b)∈E is to simply include these edges in H, and remove them from the analysis above. This requires a
X
wa,b (δ a − δ b )T L+ small adjustment to the application of the Matrix Chernoff bound, but it does go through.
= G (δ a − δ b )
(a,b)∈E From the perspective of the proof, the simplest way to deal with these is to split each such X a,b
X
= wa,b Tr L+
G (δ a − δ b )(δ a − δ b )
T into many independent random edges: k = ⌊ℓa,b /R⌋ that appear with probability exactly 1, and
(a,b)∈E one more that appears with probability ℓa,b /R − k. This does not change the expectation of their
sum, or the expected number of edges once we remember to add together the weights of edges
X
= Tr L+ T
− δ b )(δ a − δ b ) that appear multiple times. The rest of the proof remains unchanged.
G wa,b (δ a
(a,b)∈E
X 32.7 Open Problem
= Tr L+
G wa,b La,b
(a,b)∈E
If I have time in class, I will sketch a way to quickly approximate the effective resistances of every
= Tr L+
G LG
edge in the graph. The basic idea, which can be found in [SS11] and which is carried out better in
= Tr (Π)
[KLP12], is that we can compute the effective resistance of an edge (a, b) from the solution to a
= n − 1. logarithmic number of systems of random linear equations in LG . That is, after solving a
logarithmic number of systems of linear equations in LG , we have information from which we can
estimates all of the effective resistances.
32.6 The analysis
In order to sparsify graphs, we do not actually need estimates of effective resistances that are
always accurate. We just need a way to identify many edges of low effective resistance, without
We will choose listing any that have high effective resistance. I believe that better algorithms for doing this
ϵ2
R= . remain to be found. Current fast algorithms that make progress in this direction and that exploit
3.5 ln n
such estimates may be found in [KLP12, Kou14, CLM+ 15, LPS15]. These, however, rely on fast
Thus, the number of edges in H will be at most 4(ln n)ϵ−2 with high probability. Laplacian equation solvers. It would be nice to be able to estimate effective resistances without
We have these. A step in this direction was recently taken in the works [CGP+ 18, LSY18], which quickly
X
EX a,b = Π. decompose graphs into the union of short cycles plus a few edges.
(a,b)∈E
For the lower bound, we need to remember that we can just work orthogonal to the all-1s vector,
and so treat the smallest eigenvalue of Π as 1. We then find that
X
Pr X a,b ≤ (1 − ϵ)Π ≤ n exp −ϵ2 /2R = n exp (−(3.5/2) ln n) = n−3/2 ,
a,b
where we recall that Π = n1 LKn is the projection orthogonal to the constant vectors.
The problem of sparsification is then the problem of finding a small subset of these vectors,
S ⊆ E, along with scaling factors, c : S → IR, so that
X
(1 − ϵ)Π ≼ ca,b v (a,b) v T(a,b) ≼ (1 + ϵ)Π
Chapter 33 (a,b)∈S
If we project onto the span of the Laplacian, then the sum of the outer products of vectors v (a,b)
becomes the identity, and our goal is to find a set S and scaling factors ca,b so that
That is, so that all the eigenvalues of the matrix in the middle lie between (1 − ϵ) and (1 + ϵ).
33.1 Overview
33.3 The main theorem
In this lecture, we will prove a slight simplification of the main result of [BSS12, BSS14]. This will
tell us that every graph with n vertices has an ϵ-approximation with approximately 4ϵ−2 n edges. Theorem 33.3.1. Let v 1 , . . . , v m be vectors in IRn so that
To translate this into a relation between approximation quality and average degree, note that
X
such a graph has average degree dave = 8ϵ−2 . So, v i v Ti = I .
√ i
2 2
ϵ≈ √ ,
d Then, for every ϵ > 0 there exists a set S along with scaling factors ci so that
X
which is about twice what you would get from a Ramanujan graph. Interestingly, this result even (1 − ϵ)2 I ≼ ci v i v Ti ≼ (1 + ϵ)2 I ,
works for average degree just a little bit more than 1. i∈S
and
|S| ≤ n/ϵ2 .
33.2 Turning edges into vectors
The condition that the sum of the outer products of the vectors sums to the identity has a name,
In the last lecture, we considered the Laplacian matrix of a graph G times the square root of the
isotropic position. I now mention one important property of vectors in isotropic position
pseudoinverse on either side. That is,
Lemma 33.3.2. Let v 1 , . . . , v m be vectors in isotropic position. Then, for every matrix M ,
+/2
X +/2 X
LG wa,b L(a,b) LG . v Ti M v i = Tr (M ) .
(a,b)∈E i
Today, it will be convenient to view this as a sum of outer products of vectors. Set
v (a,b) = √(w a,b ) L_G^{+/2} (δ a − δ b ).
Then,
Σ_{(a,b)∈E} L_G^{+/2} w a,b L_{(a,b)} L_G^{+/2} = Σ_{(a,b)∈E} v (a,b) v (a,b)^T = Π,
Proof. We have
v^T M v = Tr ( v v^T M ) ,
so
Σ_i v i^T M v i = Tr ( Σ_i v i v i^T M ) = Tr ( (Σ_i v i v i^T ) M ) = Tr ( I M ) = Tr ( M ) .
261
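A small numerical check (not from the text) that the vectors v_{(a,b)} just defined are indeed in isotropic position on the space orthogonal to the constant vectors: their outer products sum to Π. The example graph is arbitrary.

    import numpy as np

    def laplacian(n, edges):
        L = np.zeros((n, n))
        for a, b, w in edges:
            L[a, a] += w; L[b, b] += w
            L[a, b] -= w; L[b, a] -= w
        return L

    n = 5
    edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 3.0), (1, 3, 1.0)]
    L = laplacian(n, edges)

    lam, U = np.linalg.eigh(L)
    # L^{+/2}: invert the square roots of the nonzero eigenvalues
    half_pinv = U @ np.diag([1/np.sqrt(x) if x > 1e-9 else 0.0 for x in lam]) @ U.T

    Pi = np.eye(n) - np.ones((n, n)) / n
    S = np.zeros((n, n))
    for a, b, w in edges:
        v = np.sqrt(w) * half_pinv @ (np.eye(n)[a] - np.eye(n)[b])
        S += np.outer(v, v)
    print(np.linalg.norm(S - Pi))   # essentially zero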
we will add a vector to S more than once, in which case we will increase its scaling factor each time. Throughout the argument we will maintain the invariant that the eigenvalues of the scaled sum of outer products lie in the interval [l, u], where l and u are quantities that will change with each addition to S. At the start of the algorithm, when S is empty, we will have

    l_0 = −n   and   u_0 = n.

Every time we add a vector to S, we increase l by δ_L and u by δ_U, where

    δ_L = 1/3   and   δ_U = 2.

After we have done this 6n times, we will have l = n and u = 13n.

33.4 Rank-1 updates

We will need to understand what happens to a matrix when we add the outer product of a vector.

Theorem 33.4.1 (Sherman-Morrison). Let A be a nonsingular symmetric matrix, let v be a vector, and let c be a real number. Then,

    (A − c v v^T)^{−1} = A^{−1} + c · (A^{−1} v v^T A^{−1}) / (1 − c v^T A^{−1} v).

33.5 Barrier functions

For an upper bound u on the eigenvalues λ_1 ≤ · · · ≤ λ_n of A, define the upper barrier function Φ^u(A) = Σ_i 1/(u − λ_i). This is positive for all upper bounds u, goes to infinity as u approaches the largest eigenvalue, decreases as u grows, and is convex for u > λ_n. In particular, we will use

    Φ^{u+δ}(A) < Φ^u(A),   for δ > 0.    (33.1)

Also, observe that

    λ_n ≤ u − 1/Φ^u(A).    (33.2)

We will exploit the following formula for the upper barrier function:

    Φ^u(A) = Tr((uI − A)^{−1}).

For a lower bound on the eigenvalues l, we will define an analogous lower barrier function

    Φ_l(A) = Σ_i 1/(λ_i − l) = Tr((A − lI)^{−1}).

This is positive whenever l is smaller than all the eigenvalues, goes to infinity as l approaches the smallest eigenvalue, and decreases as l becomes smaller. In particular,

    l + 1/Φ_l(A) ≤ λ_1.    (33.3)

The analog of (33.1) is the following.

Claim 33.5.1. Let l be a lower bound on A and let δ < 1/Φ_l(A). Then,

    Φ_{l+δ}(A) ≤ 1 / (1/Φ_l(A) − δ).

If we fix the vector v and an increment δ_L, then this gives a lower bound on the scaling factor by which we need to multiply it for the lower barrier function not to increase.

By Sherman-Morrison, writing û = u + δ_U,

    Φ^{u+δ_U}(A + c v v^T) = Φ^{u+δ_U}(A) + c · (v^T (ûI − A)^{−2} v) / (1 − c v^T (ûI − A)^{−1} v)
                           = Φ^u(A) − (Φ^u(A) − Φ^{u+δ_U}(A)) + (v^T (ûI − A)^{−2} v) / (1/c − v^T (ûI − A)^{−1} v).

We would like for this to be less than Φ^u(A). If we commit to how much we are going to increase u, then this gives an upper bound on how large c can be. We want

    Φ^u(A) − Φ^{u+δ_U}(A) ≥ (v^T (ûI − A)^{−2} v) / (1/c − v^T (ûI − A)^{−1} v),

which is equivalent to

    1/c ≥ (v^T (ûI − A)^{−2} v) / (Φ^u(A) − Φ^{u+δ_U}(A)) + v^T (ûI − A)^{−1} v.

Define

    U_A = ((u + δ_U)I − A)^{−2} / (Φ^u(A) − Φ^{u+δ_U}(A)) + ((u + δ_U)I − A)^{−1}.

We have established a clean condition for when we can add c v v^T to S and increase u by δ_U without increasing the upper barrier function.

33.7 The inductive argument

It remains to show that there exists a vector v and a scaling factor c so that

    Φ^{u+δ_U}(A + c v v^T) ≤ Φ^u(A)   and   Φ_{l+δ_L}(A + c v v^T) ≤ Φ_l(A).

That is, we need to show that there is a vector v_i so that

    v_i^T U_A v_i ≤ v_i^T L_A v_i.

Once we know this, we can set c so that

    v_i^T U_A v_i ≤ 1/c ≤ v_i^T L_A v_i.

Lemma 33.7.1.

    Σ_i v_i^T U_A v_i ≤ 1/δ_U + Φ^u(A).
Proof. By Lemma 33.3.2, we know

    Σ_i v_i^T U_A v_i = Tr(U_A).

To bound this, we break it into two parts:

    Tr((ûI − A)^{−2}) / (Φ^u(A) − Φ^{u+δ_U}(A))

and

    Tr((ûI − A)^{−1}).

The second term is easiest:

    Tr((ûI − A)^{−1}) = Φ^{u+δ_U}(A) ≤ Φ^u(A).

To bound the first term, consider the derivative of the barrier function with respect to u:

    (∂/∂u) Φ^u(A) = (∂/∂u) Σ_i 1/(u − λ_i) = −Σ_i 1/(u − λ_i)² = −Tr((uI − A)^{−2}).

As Φ^u(A) is convex in u, we may conclude that

    Φ^u(A) − Φ^{u+δ_U}(A) ≥ −δ_U (∂/∂u) Φ^{u+δ_U}(A) = δ_U Tr((ûI − A)^{−2}).

So the first term is at most 1/δ_U, which proves the lemma.

The analysis for the lower barrier is similar, but the second term is slightly more complicated.

Lemma 33.7.2.

    Σ_i v_i^T L_A v_i ≥ 1/δ_L − 1 / (1/Φ_l(A) − δ_L).

Proof. As before, we bound

    Φ_{l+δ_L}(A) − Φ_l(A)

against

    Tr((A − (l + δ_L)I)^{−2})

by recalling that

    (∂/∂l) Φ_l(A) = Tr((A − lI)^{−2}).

As Φ_l(A) is convex in l, we have

    Φ_{l+δ_L}(A) − Φ_l(A) ≤ δ_L (∂/∂l) Φ_{l+δ_L}(A) = δ_L Tr((A − (l + δ_L)I)^{−2}).

To bound the other term, we use Claim 33.5.1 to prove

    Tr((A − (l + δ_L)I)^{−1}) ≤ 1 / (1/Φ_l(A) − δ_L).

So, for there to exist a v_i that we can add to S with scale factor c so that neither barrier function increases, we just need that

    1/δ_U + Φ^u(A) ≤ 1/δ_L − 1 / (1/Φ_l(A) − δ_L).

If this holds, then there is a v_i so that

    v_i^T U_A v_i ≤ v_i^T L_A v_i.

We then set c so that

    v_i^T U_A v_i ≤ 1/c ≤ v_i^T L_A v_i.

We now finish the proof by checking that the numbers I gave earlier satisfy the necessary conditions. At the start both barrier functions are less than 1, and we need to show that this holds throughout the algorithm. At every step, we will have by induction

    1/δ_U + Φ^u(A) ≤ 1/2 + 1 = 3/2,

and

    1/δ_L − 1 / (1/Φ_l(A) − δ_L) ≥ 3 − 1/(1 − 1/3) = 3/2.

So, there is always a v_i that we can add to S and a scaling factor c so that both barrier functions remain upper bounded by 1. If we now do this for 6n steps, we will have

    l = −n + 6n/3 = n   and   u = n + 2 · 6n = 13n.

The bound stated at the beginning of the lecture comes from tightening the analysis. In particular, it is possible to improve Lemma 33.7.2 so that it says

    Σ_i v_i^T L_A v_i ≥ 1/δ_L − 1/(1/Φ_l(A)).

I recommend the paper for details.

33.8 Progress and Open Problems

• It is possible to generalize this result to sums of positive semidefinite matrices, instead of outer products of vectors [dCSHS11].

• It is now possible to compute sparsifiers that are almost this good in something close to linear time [AZLO15, LS15].

• Given last lecture, it seems natural to conjecture that the scaling factors of edges should be proportional to their weights times effective resistances. Similarly, one might conjecture that if all vectors v_i have the same norm, then the scaling factors are unnecessary. This is true, but not obvious. In fact, it is essentially equivalent to the Kadison-Singer problem [MSS14, MSS15c].
Chapter 34

Iterative solvers for linear equations

We introduce basic iterative solvers for systems of linear equations: Richardson iteration and Chebyshev's method. We discuss Conjugate Gradient in the next Chapter, and iterative refinement and preconditioning in Chapter 36.

34.1 Why iterative methods?

One is first taught to solve linear systems like

    Ax = b

by direct methods such as Gaussian elimination, computing the inverse of A, or the LU factorization. However, elimination algorithms can be very slow. This is especially true when A is sparse. Just writing down the inverse takes O(n²) space, and computing the inverse takes O(n³) time if we do it naively. This might be OK if A is dense. But, it is very wasteful if A only has O(n) non-zero entries.

In general, we prefer algorithms whose running time is proportional to the number of non-zero entries in the matrix A, and which do not require much more space than that used to store A.

Iterative algorithms solve linear equations while only performing multiplications by A, and performing a few vector operations. Unlike the direct methods, which are based on elimination, the iterative algorithms do not find exact solutions. Rather, they get closer and closer to the solution the longer they work. The advantage of these methods is that they need to store very little, and are often much faster than the direct methods. When A is symmetric, the running times of these methods are determined by the eigenvalues of A.

Throughout this lecture we will assume that A is positive definite or positive semidefinite.

To get started, we will examine a simple, but sub-optimal, iterative method, Richardson's iteration. The idea of the method is to find an iterative process that has the solution to Ax = b as a fixed point, and which converges. We observe that if Ax = b, then for any α,

    x = (I − αA)x + αb.

This leads us to the following iterative process:

    x_t = (I − αA)x_{t−1} + αb,    (34.1)

where we will take x_0 = 0. We will show that this converges if I − αA has norm less than 1, and that the convergence rate depends on how much the norm is less than 1. This is analogous to our analysis of random walks on graphs from Chapter 10.

As we are assuming A is symmetric, I − αA is symmetric as well, and so its norm is the maximum absolute value of its eigenvalues. Let 0 < λ_1 ≤ λ_2 ≤ · · · ≤ λ_n be the eigenvalues of A. Then, the eigenvalues of I − αA are 1 − αλ_i, and the norm of I − αA is

    max_i |1 − αλ_i| = max(|1 − αλ_1|, |1 − αλ_n|).

This is minimized by taking

    α = 2 / (λ_n + λ_1),

in which case the smallest and largest eigenvalues of I − αA become

    ±(λ_n − λ_1)/(λ_n + λ_1),

and the norm of I − αA becomes

    1 − 2λ_1/(λ_n + λ_1).

While we might not know λ_n + λ_1, a good guess is often sufficient. If we choose an α < 2/(λ_n + λ_1), then the norm of I − αA is at most 1 − αλ_1.

To show that x_t converges to the solution, x, consider the difference x − x_t. We have

    x − x_t = ((I − αA)x + αb) − ((I − αA)x_{t−1} + αb)
            = (I − αA)(x − x_{t−1}).
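As a concrete illustration (not part of the text), here is a minimal numpy sketch of the Richardson iteration (34.1). The step size uses the optimal α from above, computed by an eigenvalue decomposition only because the demo matrix is tiny; in practice one would use an estimate.

    import numpy as np

    def richardson(A, b, alpha=None, iters=500):
        """x_t = (I - alpha*A) x_{t-1} + alpha*b, starting from x_0 = 0."""
        if alpha is None:
            lam = np.linalg.eigvalsh(A)          # fine for a small demo
            alpha = 2.0 / (lam[0] + lam[-1])     # the optimal step size derived above
        x = np.zeros(len(b))
        for _ in range(iters):
            x = x - alpha * (A @ x) + alpha * b  # (I - alpha A) x + alpha b
        return x

    rng = np.random.default_rng(0)
    M = rng.standard_normal((50, 50))
    A = M @ M.T + 50 * np.eye(50)                # positive definite
    b = rng.standard_normal(50)
    print(np.linalg.norm(richardson(A, b) - np.linalg.solve(A, b)))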
In contrast, Gaussian elimination on expanders is slow: it takes time Ω(n³) and requires space Ω(n²) [LRT79].

¹For general matrices, the condition number is defined to be the ratio of the largest to smallest singular value.

To get some idea of why this should be an approximation of A⁻¹, consider the limit as t goes to infinity. Assuming that the infinite sum converges, we obtain

    α Σ_{i=0}^∞ (I − αA)^i = α (I − (I − αA))^{−1} = α(αA)^{−1} = A^{−1}.

So, the Richardson iteration can be viewed as a truncation of this infinite summation.
In general, a polynomial p_t will enable us to compute a solution to precision ϵ if

    ∥p_t(A)b − x∥ ≤ ϵ ∥x∥.

As b = Ax, this is equivalent to

    ∥p_t(A)Ax − x∥ ≤ ϵ ∥x∥,

which is equivalent to

    ∥A p_t(A) − I∥ ≤ ϵ.

34.6 Better Polynomials

This leads us to the question of whether we can find better polynomial approximations to A⁻¹. The reason I ask is that the answer is yes! As A, p_t(A) and I all commute, the matrix A p_t(A) − I has eigenvalues q_t(λ_i), where q_t(x) = 1 − x p_t(x). So, it suffices to find a polynomial q_t that is small on the eigenvalues of A and has value 1 at zero.

Theorem 34.6.1. For every t ≥ 1 there is a polynomial q_t(x) of degree t so that

    1. |q_t(x)| ≤ ϵ for all x ∈ [λ_min, λ_max], and
    2. q_t(0) = 1,

for

    ϵ ≤ 2(1 + 2/√κ)^{−t} ≤ 2e^{−2t/√κ},

where

    κ = λ_max / λ_min.

34.7 Chebyshev Polynomials

I'd now like to explain how we find these better polynomials. The key is to transform one of the most fundamental families of polynomials: the Chebyshev polynomials. These polynomials are as small as possible on [−1, 1], and grow quickly outside this interval. We will translate the interval [−1, 1] to obtain the polynomials we need.

The t-th Chebyshev polynomial, T_t(x), has degree t, and may be defined by setting

    T_0(x) = 1,   T_1(x) = x,

and for t ≥ 2

    T_t(x) = 2x T_{t−1}(x) − T_{t−2}(x).

These polynomials are best understood by realizing that they are the polynomials for which

    T_t(cos θ) = cos(tθ).

To compute the values of the Chebyshev polynomials outside [−1, 1], we use the hyperbolic cosine function. Hyperbolic cosine maps the real line to [1, ∞) and is symmetric about the origin. So, the inverse of hyperbolic cosine may be viewed as a map from [1, ∞) to [0, ∞), and satisfies

    acosh(x) = ln(x + √(x² − 1)),   for x ≥ 1.
Claim 34.7.2. For γ > 0,

    T_t(1 + γ) ≥ (1 + √(2γ))^t / 2.

Proof. Setting x = 1 + γ, we compute

    T_t(x) = (1/2)(e^{t·acosh(x)} + e^{−t·acosh(x)})
           ≥ (1/2) e^{t·acosh(x)}
           = (1/2)(x + √(x² − 1))^t
           = (1/2)(1 + γ + √((1 + γ)² − 1))^t
           = (1/2)(1 + γ + √(2γ + γ²))^t
           ≥ (1/2)(1 + √(2γ))^t.

34.8 Proof of Theorem 34.6.1

We will exploit the following properties of the Chebyshev polynomials:

1. T_t has degree t.

2. T_t(x) ∈ [−1, 1], for x ∈ [−1, 1].

3. T_t(x) is monotonically increasing for x ≥ 1.

4. T_t(1 + γ) ≥ (1 + √(2γ))^t / 2, for γ > 0.

To express q_t(x) in terms of a Chebyshev polynomial, we should map the range on which we want q_t to be small, [λ_min, λ_max], to [−1, 1]. We will accomplish this with the linear map:

    l(x) := (λ_max + λ_min − 2x) / (λ_max − λ_min).

Note that

    l(x) = −1 if x = λ_max,
           1 if x = λ_min,
           (λ_max + λ_min)/(λ_max − λ_min) if x = 0.

To guarantee that the constant coefficient in q_t(x) is one (q_t(0) = 1), we should set

    q_t(x) := T_t(l(x)) / T_t(l(0)).

We know that |T_t(l(x))| ≤ 1 for x ∈ [λ_min, λ_max]. To find q(x) for x in this range, we must compute T_t(l(0)). We have

    l(0) ≥ 1 + 2/κ(A),

and so by properties 3 and 4 of Chebyshev polynomials,

    T_t(l(0)) ≥ (1 + 2/√κ)^t / 2.

Thus,

    |q(x)| ≤ 2(1 + 2/√κ)^{−t},

for x ∈ [λ_min, λ_max], and so all eigenvalues of q(A) will have absolute value at most 2(1 + 2/√κ)^{−t}.

34.9 Laplacian Systems

One might at first think that these techniques do not apply to Laplacian systems, as these are always singular. However, we can apply these techniques without change if b is in the span of L. That is, if b is orthogonal to the all-1s vector and the graph is connected. In this case the eigenvalue λ_1 = 0 has no role in the analysis, and it is replaced by λ_2. One way of understanding this is to just view L as an operator acting on the space orthogonal to the all-1s vector.

By considering the example of the Laplacian of the path graph, one can show that it is impossible to do much better than the √κ iteration bound that I claimed at the end of the last section. To see this, first observe that when one multiplies a vector x by L, the entry (Lx)(i) just depends on x(i − 1), x(i), and x(i + 1). So, if we apply a polynomial of degree at most t, x_t(i) will only depend on b(j) with i − t ≤ j ≤ i + t. This tells us that we will need a polynomial of degree on the order of n to solve such a system.

On the other hand, √(λ_n/λ_2) is on the order of n as well. So, we should not be able to solve the system with a polynomial whose degree is significantly less than √(λ_n/λ_2).

34.10 Warning

The polynomial-based approach that I have described here only works in infinite precision arithmetic. In finite precision arithmetic one has to be more careful about how one implements these algorithms. This is why the descriptions of methods such as the Chebyshev method found in Numerical Linear Algebra textbooks are more complicated than that presented here. The algorithms that are actually used are mathematically identical in infinite precision, but they actually work. The problem with the naive implementations is typified by the following experience: in double-precision arithmetic the polynomial approach to Chebyshev will fail to solve linear systems in random positive definite matrices in 60 dimensions!
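As a small illustration (not from the text) of the construction in Section 34.8, the sketch below evaluates q_t(x) = T_t(l(x))/T_t(l(0)) by the three-term recurrence and compares its maximum over [λ_min, λ_max] with the bound 2(1 + 2/√κ)^{−t}. At this small degree double precision causes no trouble; the values of λ_min, λ_max and t are arbitrary.

    import numpy as np

    def cheb(t, x):
        """Evaluate T_t(x) via T_t = 2x T_{t-1} - T_{t-2}."""
        x = np.asarray(x, dtype=float)
        prev, cur = np.ones_like(x), x.copy()
        if t == 0:
            return prev
        for _ in range(t - 1):
            prev, cur = cur, 2 * x * cur - prev
        return cur

    lmin, lmax, t = 1.0, 100.0, 25
    l = lambda x: (lmax + lmin - 2 * x) / (lmax - lmin)

    xs = np.linspace(lmin, lmax, 1000)
    q = cheb(t, l(xs)) / cheb(t, l(0.0))           # q_t(0) = 1 by construction

    kappa = lmax / lmin
    print(np.abs(q).max(), 2 * (1 + 2 / np.sqrt(kappa)) ** (-t))   # observed max vs. bound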
Chapter 35

The Conjugate Gradient and Diameter

We introduce the matrix norm as the measure of convergence of iterative methods, and show how the Conjugate Gradient method efficiently minimizes it. We finish by relating the rate of convergence of any iterative method on a Laplacian matrix to the diameter of the underlying graph.

My description of the Conjugate Gradient method is inspired by Vishnoi's [Vis12]. It is the simplest explanation of the Conjugate Gradient that I have seen.

35.1 The Matrix Norm

Recall from Chapter 14 that for a positive semidefinite matrix A, the matrix norm in A is defined by

    ∥x∥_A = √(x^T A x) = ∥A^{1/2} x∥.

For many applications, the right way to measure the quality of approximation of a system of linear equations Ax = b is by ∥x − x_t∥_A. Many algorithms naturally produce bounds on the error in the matrix norm. And, for many applications that use linear equation solvers as subroutines, this is the measure of accuracy in the subroutine that most naturally translates to accuracy of the outside algorithm.

We should observe that both the Richardson and Chebyshev methods achieve ϵ error in the A-norm. Let p be a polynomial such that

    ∥p(A)A − I∥ ≤ ϵ.

Then,

    ∥p(A)b − x∥_A = ∥A^{1/2}p(A)Ax − A^{1/2}x∥ = ∥(p(A)A − I)A^{1/2}x∥ ≤ ϵ ∥A^{1/2}x∥ = ϵ ∥x∥_A.

The analysis above works because these methods produce x_t by applying a linear operator, p(A), to b that commutes with A. While most of the algorithms we use to solve systems of equations in A will be linear operators, they will typically not commute with A. But, they will produce small error in the A-norm.

The following theorem shows that a linear operator Z is an ϵ approximation of A⁻¹ if and only if it produces at most ϵ error in the A-norm when used to solve systems of linear equations in A.

Theorem 35.1.1. Let A and Z be positive definite matrices. Then

    ∥ZAx − x∥_A ≤ ϵ ∥x∥_A    (35.1)

holds for all x if and only if (1 − ϵ)A^{−1} ≼ Z ≼ (1 + ϵ)A^{−1}.

Inequality (35.1) is equivalent to

    ∥A^{1/2}(ZA − I)x∥ ≤ ϵ ∥A^{1/2}x∥.

Setting y = A^{1/2}x, this becomes equivalent to saying that for all y,

    ∥(A^{1/2} Z A^{1/2} − I)y∥ ≤ ϵ ∥y∥,

which we usually write

    ∥A^{1/2} Z A^{1/2} − I∥ ≤ ϵ.

This is in turn equivalent to

    −ϵI ≼ A^{1/2} Z A^{1/2} − I ≼ ϵI  ⟺  (1 − ϵ)I ≼ A^{1/2} Z A^{1/2} ≼ (1 + ϵ)I  ⟺  (1 − ϵ)A^{−1} ≼ Z ≼ (1 + ϵ)A^{−1},

where the last statement follows from multiplying on the left and right by A^{−1/2}.

35.2 Application: Approximating Fiedler Vectors

Approximately computing eigenvectors of the smallest eigenvalues of matrices, such as Fiedler vectors, is one application in which approximation in the A-norm is the right thing to do. In problem [?], we saw that the largest eigenvalue of a matrix can be approximated using the power method. If we want the smallest eigenvalue, it is natural to use the power method on the inverse of the matrix.

As we are only going to compute an approximation of the eigenvalue and its corresponding eigenvector, we might as well use an approximation of the matrix inverse. If Z is an operator that ϵ-approximates A⁻¹, then the largest eigenvalue of Z is within 1 ± ϵ of the largest eigenvalue of A⁻¹, and the corresponding eigenvector has large Rayleigh quotient with respect to A⁻¹. As we learned in problem [?], if there is a gap between this and the next eigenvalue, then this vector makes a small angle with the eigenvector. See [ST14, Section 7] for a more detailed discussion.
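A small numpy check (not from the text) of the equivalence in Theorem 35.1.1: an operator Z is an ϵ-approximation of A⁻¹ exactly when every eigenvalue of A^{1/2} Z A^{1/2} lies within ϵ of 1. The perturbation used to build Z is just an illustrative way of producing an "approximate inverse".

    import numpy as np

    def approx_quality(A, Z):
        """Return max |eigenvalue of A^{1/2} Z A^{1/2} - I|, the epsilon of Theorem 35.1.1."""
        lam, U = np.linalg.eigh(A)
        Ahalf = U @ np.diag(np.sqrt(lam)) @ U.T
        return np.abs(np.linalg.eigvalsh(Ahalf @ Z @ Ahalf) - 1).max()

    rng = np.random.default_rng(3)
    M = rng.standard_normal((20, 20))
    A = M @ M.T + 20 * np.eye(20)
    Z = np.linalg.inv(A + 0.05 * np.eye(20))   # a deliberately imperfect inverse
    print(approx_quality(A, Z))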
35.3 Optimality in the A-norm

The iterative methods that we consider begin with the vector b, and then perform multiplications by A and take linear combinations with vectors that have already been produced. So, after t iterations they produce a vector that is in the span of

    b, Ab, A²b, ..., A^t b.

This subspace is called the (t + 1)st Krylov subspace generated by A and b.

The Conjugate Gradient will find the vector x_t in this subspace that minimizes the error in the A-norm. It will do so by computing a very useful basis of this subspace. But, before we describe this basis, let's examine the error in the A-norm. We have

    ∥x_t − x∥²_A = x_t^T A x_t − 2x^T A x_t + x^T A x = x_t^T A x_t − 2b^T x_t + b^T x.

While we do not know b^T x, we do know that ∥x_t − x∥²_A is minimized when we minimize

    (1/2) x_t^T A x_t − b^T x_t.    (35.2)

So, we will work to minimize (35.2).

Let p_0, ..., p_t be a basis of the (t + 1)st Krylov subspace, and let

    x_t = Σ_{i=0}^t c_i p_i.

We would like to find the coefficients c_i that minimize (35.2). Expanding x_t gives

    (1/2) x_t^T A x_t − b^T x_t = (1/2)(Σ_i c_i p_i)^T A (Σ_i c_i p_i) − b^T (Σ_i c_i p_i)
                                = (1/2) Σ_i c_i² p_i^T A p_i − Σ_i c_i b^T p_i + (1/2) Σ_{i≠j} c_i c_j p_i^T A p_j.

To simplify the selection of the optimal constants c_i, the Conjugate Gradient will compute a basis p_0, ..., p_t that makes the rightmost term 0. That is, it will compute a basis such that p_i^T A p_j = 0 for all i ≠ j. Such a basis is called an A-orthogonal basis.

When the last term is zero, the objective function becomes

    Σ_i ( (1/2) c_i² p_i^T A p_i − c_i b^T p_i ),

which is minimized by setting its derivative in c_i equal to zero, which gives

    c_i = b^T p_i / (p_i^T A p_i).

It remains to describe how we compute this A-orthogonal basis. The algorithm begins by setting

    p_0 = b.

The next vector should be Ap_0, but A-orthogonalized with respect to p_0. That is,

    p_1 = Ap_0 − ((Ap_0)^T A p_0 / (p_0^T A p_0)) p_0.

It is immediate that

    p_0^T A p_1 = 0.

In general, we set

    p_{t+1} = Ap_t − Σ_{i=0}^t ((Ap_t)^T A p_i / (p_i^T A p_i)) p_i.    (35.3)

Let's verify that p_{t+1} is A-orthogonal to p_j for j ≤ t, assuming that p_0, ..., p_t are A-orthogonal. We have

    p_j^T A p_{t+1} = p_j^T A A p_t − Σ_{i=0}^t ((Ap_t)^T A p_i / (p_i^T A p_i)) p_j^T A p_i
                    = p_j^T A² p_t − ((Ap_t)^T A p_j / (p_j^T A p_j)) p_j^T A p_j
                    = 0.

The computation of p_{t+1} is greatly simplified by the observation that all but two of the terms in the sum (35.3) are zero: for i < t − 1,

    (Ap_t)^T A p_i = 0.

To see this, note that

    (Ap_t)^T A p_i = p_t^T A (A p_i),

and that Ap_i is in the span of p_0, ..., p_{i+1}. So, this term will be zero if i + 1 < t.
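Here is a direct numpy transcription (not part of the text) of the procedure just described: build the A-orthogonal directions of (35.3), using the fact that only the last two previous directions contribute, and combine them with the optimal coefficients. This is the naive form; as the Caution that follows explains, real implementations re-arrange the computation.

    import numpy as np

    def cg_krylov(A, b, t):
        p = [b.astype(float)]
        for _ in range(t):
            Apt = A @ p[-1]
            w = Apt.copy()
            # only the last two directions give nonzero terms in (35.3)
            for q in p[-2:]:
                w -= (Apt @ (A @ q)) / (q @ (A @ q)) * q
            p.append(w)
        # optimal coefficients c_i = (b^T p_i) / (p_i^T A p_i)
        return sum(((b @ q) / (q @ (A @ q))) * q for q in p)

    rng = np.random.default_rng(1)
    M = rng.standard_normal((40, 40))
    A = M @ M.T + 40 * np.eye(40)       # well conditioned, so few iterations suffice
    b = rng.standard_normal(40)
    x = cg_krylov(A, b, 12)
    print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))   # small relative residual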
The computation of x_t by

    x_t = Σ_{i=0}^t p_i · (b^T p_i) / (p_i^T A p_i)

only requires an additional O(t) such operations.

In fact, only t multiplications by A are required to compute p_0, ..., p_t and x_1, ..., x_t: every term in the expressions for these vectors can be derived from the products Ap_i. Thus, the Conjugate Gradient algorithm can find the x_t in the (t + 1)st Krylov subspace that minimizes the error in the A-norm in time O(tn) plus the time required to perform t multiplications by A.

Caution: the algorithm that I have presented here differs from the implemented Conjugate Gradient in that the implemented Conjugate Gradient re-arranges this computation to keep the norms of the vectors involved reasonably small. Without this adjustment, the algorithm that I've described will fail in practice as the vectors p_i will become too large.

35.4 How Good is CG?

The Conjugate Gradient is at least as good as the Chebyshev iteration, in that it finds a vector of smaller error in the A-norm in any given number of iterations. The optimality property of the Conjugate Gradient causes it to perform remarkably well.

For example, one can see that it should never require more than n iterations. The vector x is always in the nth Krylov subspace. Here's an easy way to see this. Let the distinct eigenvalues of A be λ_1, ..., λ_k. Now, consider the polynomial

    q(x) := Π_{i=1}^k (λ_i − x) / Π_{i=1}^k λ_i.

You can verify that q is a degree k polynomial such that

    q(0) = 1, and q(λ_i) = 0, for all i.

So, CG should be able to find the exact answer to a system in A in k − 1 iterations. I say "should" because, while this statement is true with infinite precision arithmetic, it doesn't work out quite this well in practice.

Ignoring for now issues of finite arithmetic, let's consider the importance of this for sparse matrices A. By a sparse matrix, I mean one with at most cn non-zero entries, for some constant c. That's not a rigorous definition, but it will help guide our discussion. Multiplication by a sparse matrix can be done in time O(n). So, CG can solve a system of equations in a sparse matrix in time O(n²). Note that this is proportional to how long it would take to just write the inverse of A, and will probably be faster than any algorithm for computing the inverse. On the other hand, it only provides the solution to one system in A.

For another interesting example, consider the hypercube graph on n vertices. It only has log₂ n distinct eigenvalues. So, CG will only need log₂ n iterations to solve linear systems in the Laplacian of the hypercube. While there are other fast algorithms that exploit the special structure of the hypercube, CG works well when one has a graph that is merely very close to the hypercube.

In general, CG works especially quickly on matrices in which the eigenvalues appear in just a few clusters, and on matrices in which there are just a few extreme eigenvalues. We will learn more about this in the next lecture.

35.5 Laplacian Systems, again

This would be a good time to re-examine what we want when our matrix is a Laplacian. The Laplacian does not have an inverse. Rather, we want a polynomial in the Laplacian that approximates its pseudo-inverse (which we defined back in Lecture 8). If we were exactly solving the system of linear equations, we would have found a polynomial p such that

    p(L)b = x,

where b = Lx, so this gives

    p(L)Lx = x.

Of course, this is only reasonable if x is in the span of L. If the underlying graph is connected, this only happens if x is orthogonal to the all-1s vector. Of course, L sends constant vectors to zero. So, we want

    p(L)L = Π,

where Π is the projection matrix that sends the constant vectors to zero, and acts as an identity on the vectors that are orthogonal to the constant vectors. Recall that Π = (1/n)L_{K_n}.

Similarly, p gives an ϵ-approximation of the pseudo-inverse if

    ∥p(L)L − Π∥ ≤ ϵ.

35.6 Bounds on the Diameter

Our intuition tells us that if we can quickly solve linear equations in the Laplacian matrix of a graph by an iterative method, then the graph should have small diameter. We now make that intuition precise.

If s and t are vertices that are at distance greater than d from each other, then

    χ_s^T L^d χ_t = 0.

On the other hand, if L only has k distinct eigenvalues other than 0, then we can form a polynomial p of degree k − 1 such that

    Lp(L) = Π.

This allows us to prove the following theorem.
Theorem 35.6.1. Let G be a connected graph whose Laplacian has at most k distinct eigenvalues other than 0. Then, the diameter of G is at most k.

Proof. Let d be the diameter of the graph and let s and t be two vertices at distance d from each other. We have

    e_s^T Π e_t = −1/n.

On the other hand, we have just described a polynomial in L with zero constant term, given by Lp(L), that has degree k and such that

    Lp(L) = Π.

If the degree of this polynomial were less than d, we would have

    e_s^T Lp(L) e_t = 0.

As this is not the case, we have d ≤ k.

We can similarly obtain bounds on the diameter from approximate pseudo-inverses. If p is a polynomial such that

    ∥p(L)L − Π∥ ≤ ϵ,

then

    |e_s^T (p(L)L − Π) e_t| ≤ ∥e_s∥ ∥p(L)L − Π∥ ∥e_t∥ ≤ ϵ.

If s and t are at distance d from each other in the graph, and if p(L)L has degree less than d, then

    |e_s^T (p(L)L − Π) e_t| = |e_s^T (−Π) e_t| = 1/n.

This is a contradiction if ϵ < 1/n. So, the polynomials we constructed from Chebyshev polynomials imply the following theorem of Chung, Faber and Manteuffel [CFM94].

Theorem 35.6.2. Let G = (V, E) be a connected graph, and let λ_2 ≤ · · · ≤ λ_n be its Laplacian eigenvalues. Then, the diameter of G is at most

    ( (1/2)√(λ_n/λ_2) + 1 ) ln 2n.

Chapter 36

Preconditioning Laplacians

A preconditioner for a positive semidefinite matrix A is a positive semidefinite matrix B such that it is easy to solve systems of linear equations in B and the condition number of B⁻¹A is small. A good preconditioner allows one to quickly solve systems of equations in A.

In this lecture, we will measure the quality of preconditioners in terms of the ratio

    κ(A, B) := β/α,

where α is the largest number and β is the smallest such that

    αB ≼ A ≼ βB.

Lemma 36.0.1. Let α and β be as defined above. Then, α and β are the smallest and largest eigenvalues of B⁻¹A, excluding possible zero eigenvalues corresponding to a common nullspace of A and B.

We need to exclude the common nullspace when A and B are the Laplacian matrices of connected graphs. If these matrices have different nullspaces, then α = 0 or β = ∞ and the condition number β/α is infinite.

Proof of Lemma 36.0.1. We just prove the statement for β, in the case where neither matrix is singular. We have

    λ_max(B^{−1}A) = max_x (x^T A x) / (x^T B x),

which equals β. Recall that the eigenvalues of B⁻¹A are the same as those of B^{−1/2}AB^{−1/2} and A^{1/2}B^{−1}A^{1/2}.
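A small numerical illustration (not from the text) of κ(A, B) = β/α via the nonzero eigenvalues of B⁻¹A, here with the Laplacian of a cycle preconditioned by the Laplacian of the path obtained by deleting one edge. The helper and example graph are arbitrary choices.

    import numpy as np

    def laplacian(n, edges):
        L = np.zeros((n, n))
        for a, b in edges:
            L[a, a] += 1; L[b, b] += 1
            L[a, b] -= 1; L[b, a] -= 1
        return L

    n = 8
    cycle = [(i, (i + 1) % n) for i in range(n)]
    LG, LT = laplacian(n, cycle), laplacian(n, cycle[:-1])   # the path is a spanning tree

    lam = np.linalg.eigvals(np.linalg.pinv(LT) @ LG).real
    lam = np.sort(lam[lam > 1e-9])          # drop the zero from the common nullspace
    print(lam[0], lam[-1], lam[-1] / lam[0])   # alpha (= 1, since T is a subgraph), beta, kappa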
36.1 Approximate Solutions

Recall the A-norm:

    ∥x∥_A = √(x^T A x) = ∥A^{1/2}x∥.

We say that x̃ is an ϵ-approximate solution to the problem Ax = b if

    ∥x̃ − x∥_A ≤ ϵ ∥x∥_A.

36.2 Iterative Refinement

We will now see how to use a very good preconditioner to solve a system of equations. Let's consider a preconditioner B that satisfies

    (1 − ϵ)B ≼ A ≼ (1 + ϵ)B.

So, all of the eigenvalues of

    A^{1/2} B^{−1} A^{1/2} − I

have absolute value at most ϵ.

The vector B⁻¹b is a good approximation of x in the A-norm. We have

    ∥B^{−1}b − x∥_A = ∥A^{1/2}B^{−1}b − A^{1/2}x∥
                    = ∥A^{1/2}B^{−1}Ax − A^{1/2}x∥
                    = ∥A^{1/2}B^{−1}A^{1/2}(A^{1/2}x) − A^{1/2}x∥
                    ≤ ∥A^{1/2}B^{−1}A^{1/2} − I∥ ∥A^{1/2}x∥
                    ≤ ϵ ∥A^{1/2}x∥
                    = ϵ ∥x∥_A.

Remark: This result crucially depends upon the use of the A-norm. It fails under the Euclidean norm.

If we want a better solution, we can just compute the residual and solve the problem in the residual. That is, we set

    x_1 = B^{−1}b,

and compute

    r_1 = b − Ax_1 = A(x − x_1).

We then use one solve in B to compute a vector x_2 such that

    ∥(x − x_1) − x_2∥_A ≤ ϵ ∥x − x_1∥_A ≤ ϵ² ∥x∥_A.

So, x_1 + x_2, our new estimate of x, differs from x by at most an ϵ² factor. Continuing in this way, we can find an ϵ^k approximation of x after solving k linear systems in B. This procedure is called iterative refinement.

36.3 Iterative Methods in the Matrix Norm

The iterative methods we studied last class can also be shown to produce good approximate solutions in the matrix norm. Given a matrix A, these produce ϵ-approximate solutions after t iterations if there is a polynomial q of degree t for which q(0) = 1 and |q(λ_i)| ≤ ϵ for all eigenvalues of A. To see this, recall that we can define p(x) so that q(x) = 1 − xp(x), and set

    x̃ = p(A)b,

to get

    ∥x̃ − x∥_A = ∥p(A)b − x∥_A = ∥p(A)Ax − x∥_A.

As I, A, p(A) and A^{1/2} all commute, this equals

    ∥A^{1/2}p(A)Ax − A^{1/2}x∥ = ∥p(A)A A^{1/2}x − A^{1/2}x∥
                              ≤ ∥p(A)A − I∥ ∥A^{1/2}x∥
                              ≤ ϵ ∥x∥_A.

36.4 Preconditioned Iterative Methods

Preconditioned iterative methods can be viewed as the extension of Iterative Refinement by algorithms like Chebyshev iteration and the Preconditioned Conjugate Gradient. These usually work with condition numbers much larger than 2.

In each iteration of a preconditioned method we will solve a system of equations in B, multiply a vector by A, and perform a constant number of other vector operations. For this to be worthwhile, the cost of solving equations in B has to be low.

We begin by seeing how the analysis with polynomials translates. Let λ_i be the ith eigenvalue of B⁻¹A. If q_t(x) = 1 − x p_t(x) is a polynomial such that |q_t(λ_i)| ≤ ϵ for all i, then

    x_t := p_t(B^{−1}A)B^{−1}b
will be an ϵ-approximate solution to Ax = b:

    ∥x − x_t∥_A = ∥A^{1/2}x − A^{1/2}x_t∥
                = ∥A^{1/2}x − A^{1/2}p_t(B^{−1}A)B^{−1}b∥
                = ∥A^{1/2}x − A^{1/2}p_t(B^{−1}A)B^{−1}Ax∥
                = ∥A^{1/2}x − A^{1/2}p_t(B^{−1}A)B^{−1}A^{1/2}(A^{1/2}x)∥
                ≤ ∥I − A^{1/2}p_t(B^{−1}A)B^{−1}A^{1/2}∥ ∥A^{1/2}x∥.

The Preconditioned Conjugate Gradient (PCG) is a magical algorithm that after t steps (each of which involves solving a system in B, multiplying a vector by A, and performing a constant number of vector operations) produces the vector x_t that minimizes

    ∥x_t − x∥_A

over all vectors x_t that can be written in the form p_t(B^{−1}A)B^{−1}b for a polynomial of degree at most t. That is, the algorithm finds the best possible solution among all iterative methods of the form we have described. We first bound the quality of PCG by saying that it is at least as good as Preconditioned Chebyshev, but it has the advantage of not needing to know α and β. We will then find an improved analysis.

36.5 Preconditioning by Trees

Vaidya [Vai90] had the remarkable idea of preconditioning the Laplacian matrix of a graph by the Laplacian matrix of a subgraph. If H is a subgraph of G, then

    L_H ≼ L_G,

so all eigenvalues of L_H^{−1}L_G are at least 1. Thus, we only need to find a subgraph H such that L_H is easy to invert and such that the largest eigenvalue of L_H^{−1}L_G is not too big.

It is relatively easy to show that linear equations in the Laplacian matrices of trees can be solved exactly in linear time. One can either do this by finding an LU-factorization with a linear number of non-zeros, or by viewing the process of solving the linear equation as a dynamic program that passes up once from the leaves of the tree to a root, and then back down.

We will now show that a special type of tree, called a low-stretch spanning tree, provides a very good preconditioner. To begin, let T be a spanning tree of G. Write

    L_G = Σ_{(u,v)∈E} w_{u,v} L_{u,v} = Σ_{(u,v)∈E} w_{u,v} (χ_u − χ_v)(χ_u − χ_v)^T.

We will actually consider the trace of L_T^{−1}L_G. As the trace is linear, we have

    Tr(L_T^{−1}L_G) = Σ_{(u,v)∈E} w_{u,v} Tr(L_T^{−1}L_{u,v})
                    = Σ_{(u,v)∈E} w_{u,v} Tr(L_T^{−1}(χ_u − χ_v)(χ_u − χ_v)^T).

To evaluate this last term, we need to know the value of (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v). You already know something about it: it is the effective resistance in T between u and v. In a tree, this equals the distance in T between u and v, when we view the length of an edge as the reciprocal of its weight. This is because it is the resistance of a path of resistors in series. Let T(u, v) denote the path in T from u to v, and let w_1, ..., w_k denote the weights of the edges on this path. As we view the weight of an edge as the reciprocal of its length,

    (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v) = Σ_{i=1}^k 1/w_i.    (36.1)

Even better, the term (36.1) is something that has been well-studied. It was defined by Alon, Karp, Peleg and West [AKPW95] to be the stretch of the unweighted edge (u, v) with respect to the tree T. Moreover, the stretch of the edge (u, v) with weight w_{u,v} with respect to the tree T is defined to be exactly

    w_{u,v} Σ_{i=1}^k 1/w_i,

where again w_1, ..., w_k are the weights on the edges of the unique path in T from u to v. A sequence of works, beginning with [AKPW95], has shown that every graph G has a spanning tree in which the sum of the stretches of the edges is low. The best result so far is due to [AN12], who prove the following theorem.

Theorem 36.5.1. Every weighted graph G has a spanning tree subgraph T such that the sum of the stretches of all edges of G with respect to T is at most

    O(m log n log log n),

where m is the number of edges in G. Moreover, one can compute this tree in time

    O(m log n log log n).
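A small sketch (not from the text) that computes the stretch (36.1) of each edge of a weighted graph with respect to a given spanning tree, by walking the tree path and summing reciprocal weights. The tree is represented as an adjacency list; the example values are arbitrary.

    from collections import deque

    def stretches(n, tree_edges, graph_edges):
        adj = {i: [] for i in range(n)}
        for a, b, w in tree_edges:
            adj[a].append((b, w)); adj[b].append((a, w))

        def tree_resistance(u, v):
            # BFS from u, accumulating sum of 1/w_i along tree paths
            dist, seen, q = {u: 0.0}, {u}, deque([u])
            while q:
                x = q.popleft()
                for y, w in adj[x]:
                    if y not in seen:
                        seen.add(y); dist[y] = dist[x] + 1.0 / w; q.append(y)
            return dist[v]

        # stretch of (u,v) with weight w is w times the tree resistance between u and v
        return {(u, v): w * tree_resistance(u, v) for u, v, w in graph_edges}

    tree = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
    extra = [(0, 3, 2.0)]
    print(stretches(4, tree, tree + extra))   # tree edges have stretch 1; (0,3) has stretch 6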
Thus, if we choose a low-stretch spanning tree T, we will ensure that

    Tr(L_T^{−1}L_G) = Σ_{(u,v)∈E} w_{u,v} (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v) ≤ O(m log n log log n).

In particular, this tells us that λ_max(L_T^{−1}L_G) is at most O(m log n log log n), and so the Preconditioned Conjugate Gradient will require at most O(m^{1/2} log n) iterations, each of which requires one multiplication by L_G and one linear solve in L_T. This gives an algorithm that runs in time O(m^{3/2} log n log 1/ϵ), which is much lower than the O(n³) of Gaussian elimination when m, the number of edges in G, is small.

This result is due to Boman and Hendrickson [BH01].

36.6 Improving the Bound on the Running Time

We can show that the Preconditioned Conjugate Gradient will actually run in closer to O(m^{1/3}) iterations. Since the trace is the sum of the eigenvalues, we know that for every β > 0, L_T^{−1}L_G has at most

    Tr(L_T^{−1}L_G) / β

eigenvalues that are larger than β.

To exploit this fact, we use the following lemma. It basically says that we can ignore the largest eigenvalues of B⁻¹A if we are willing to spend one iteration for each.

Lemma 36.6.1. Let λ_1, ..., λ_n be positive numbers such that all of them are at least α and at most k of them are more than β. Then, for every t ≥ k, there exists a polynomial p(X) of degree t such that p(0) = 1 and

    |p(λ_i)| ≤ 2 (1 + 2/√(β/α))^{−(t−k)},

for all λ_i.

Proof. Let r(X) be the polynomial we constructed using Chebyshev polynomials of degree t − k for which

    |r(X)| ≤ 2 (1 + 2/√(β/α))^{−(t−k)},

for all X between α and β. Now, set

    p(X) = r(X) Π_{i: λ_i > β} (1 − X/λ_i).

This new polynomial is zero at every λ_i greater than β, and for X between α and β

    |p(X)| = |r(X)| Π_{i: λ_i > β} |1 − X/λ_i| ≤ |r(X)|,

as we always have X < λ_i in the product.

Applying this lemma to the analysis of the Preconditioned Conjugate Gradient, with β = Tr(L_T^{−1}L_G)^{2/3} and k = Tr(L_T^{−1}L_G)^{1/3}, we find that the algorithm produces ϵ-approximate solutions within

    O(Tr(L_T^{−1}L_G)^{1/3} ln(1/ϵ)) = O(m^{1/3} log n ln 1/ϵ)

iterations.

This result is due to Spielman and Woo [SW09].

36.7 Further Improvements

We now have three families of algorithms for solving systems of equations in Laplacian matrices in nearly-linear time.

• By subgraph preconditioners. These basically work by adding back edges to the low-stretch trees. The resulting systems can no longer be solved directly in linear time. Instead, we use Gaussian elimination to eliminate the degree 1 and 2 vertices to reduce to a smaller system, and then solve that system recursively. The first nearly linear time algorithm of this form ran in time O(m log^c n log 1/ϵ), for some constant c [ST14]. An approach of this form was first made practical (and much simpler) by Koutis, Miller, and Peng [KMP11]. The asymptotically fastest method also works this way. It runs in time O(m log^{1/2} m log^c log n log 1/ϵ) [CKM+14] (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu).

• By sparsification (see my notes from Lecture 19 from 2015). These algorithms work rather differently, and do not exploit low-stretch spanning trees. They appear in the papers [PS14, KLP+16].

• Accelerating Gaussian elimination by random sampling, by Kyng and Sachdeva [KS16]. This is the most elegant of the algorithms. While the running time of the algorithm, O(m log² n log 1/ϵ), is not the asymptotically best, the algorithm is so simple that it is the best in practice. An optimized implementation appears in the package Laplacians.jl.

There are other algorithms that are often fast in practice, but for which we have no theoretical analysis. I suggest the Algebraic Multigrid of Livne and Brandt, and the Combinatorial Multigrid of Yiannis Koutis.

36.8 Questions

I conjecture that it is possible to construct spanning trees of even lower stretch. Does every graph have a spanning tree of average stretch 2 log₂ n? I do not see any reason this should not be true. I also believe that this should be achievable by a practical algorithm. The best code that I know for computing low-stretch spanning trees, and which I implemented in Laplacians.jl, is a heuristic based on the algorithm of Alon, Karp, Peleg and West. However, I do not know an analysis of their algorithm that gives stretch better than O(m 2^{√(log n)}). The theoretically better low-stretch
trees of Abraham and Neiman are obtained by improving constructions of [EEST08, ABN08]. However, they seem too complicated to be practical.

The eigenvalues of L_H^{−1}L_G are called generalized eigenvalues. The relation between generalized eigenvalues and stretch is the first result of which I am aware that establishes a combinatorial interpretation of generalized eigenvalues. Can you find any others?
Chapter 37

Augmented Spanning Tree Preconditioners

37.1 Recursion

Let H be obtained by adding a few edges back to a spanning tree T of G. As a large fraction of the vertices of T will have degree 1 or 2, the same is true of H. We can eliminate these degree 1

Lemma 37.1.1. Let T be a tree on n vertices. Then, more than half the vertices of T have degree 1 or 2.

Proof. The number of edges in T is n − 1, so the average degree of vertices in T is less than 2. Thus T must contain at least one degree 1 vertex for every vertex of degree at least 3. The other vertices have degree 2.

We learned last lecture that if we keep eliminating degree 1 vertices from trees, then we will eventually eliminate all the vertices. An analogous fact is true for a graph that equals a tree plus k edges.

¹Whether this matrix is actually upper triangular depends on the ordering of the vertices. We assume, without loss of generality, that the vertices are ordered so that the matrix is upper triangular.

Lemma 37.2.2. For every weighted graph G = (V, E, w), spanning tree T = (V, F, w) of G, and a, b ∈ V,

    Reff_G(a, b) ≤ Stretch_T(a, b).

Proof. Rayleigh's Monotonicity Theorem tells us that Reff_G(a, b) ≤ Reff_T(a, b), and this latter term equals Stretch_T(a, b).

The problem with sampling edges with probability proportional to their effective resistance, or stretches, is that this will produce too many edges. Koutis, Miller, and Peng solve this problem by multiplicatively increasing the weights of the edges in a low-stretch spanning tree of G. Define

    G̃ = G + (s − 1)T.

That is, G̃ is the same as G, but every edge in the tree T has its weight multiplied by s.
Thus, for (a, b) not in the tree,

    Reff_{G̃}(a, b) ≤ Reff_{sT}(a, b) ≤ (1/s) Stretch_T(a, b).

For every edge (a, b) in T we set p_{a,b} = 1, and for every edge (a, b) ∉ T we set

    p_{a,b} ≥ min( 1, (4 ln n / (s ϵ²)) w_{a,b} Stretch_T(a, b) ).

Define σ to be the average stretch of edges of G with respect to T:

    σ = (1/m) Stretch_T(G) = (1/m) Σ_{(a,b)∈E} w_{a,b} Stretch_T(a, b),

and recall that σ ≤ O(log n). If we now form H by including edge (a, b) with probability p_{a,b}, then Theorem 37.2.1 tells us that with high probability H is an ϵ approximation of G̃ and that the number of edges of H that are not in T is at most

    Σ_{(a,b)∉T} p_{a,b} ≤ 4mσ ln n / (s ϵ²).

So, by making s a little more than some constant times σ ln n, we can make sure that the number of edges of H not in T is less than the number of edges of G not in T.

But, we need to solve systems in G, not G̃. To this end, we use the following multiplicative property of condition numbers.

Claim 37.2.3.

    κ(L_G, L_H) ≤ κ(L_G, L_{G̃}) κ(L_{G̃}, L_H).

As G̃ differs from G by having the weights of some edges multiplied by s, κ(L_G, L_{G̃}) ≤ s. Thus, we will have κ(L_G, L_H) ≤ s(1 + ϵ)/(1 − ϵ), and to get ϵ accurate solutions to systems in L_G we will need to solve some constant times κ(L_G, L_H)^{1/2} systems in L_H. As we are going to keep ϵ constant, this will be around s^{1/2}.

To make an efficient algorithm for solving systems in G out of an algorithm for solving systems in H, it would be easiest if the cost of the solves in H is less than the cost of a multiply by G. As we will solve the system in H around s^{1/2} times, it seems natural to ensure that the number of edges of H that are not in T is at most the number of edges in G divided by s^{1/2}. That is, we want

    s^{1/2} · 4mσ ln n / (s ϵ²) ≤ m,

which requires

    s ≥ c(σ ln n)²,

for some constant c. We will now show that such a choice of c yields an algorithm for solving linear equations in L_G to constant accuracy in time O(m log² n).

We now describe the recursion. Let G_0 = G, the input graph. We will eventually solve systems in G_i by recursively solving systems in G_{i+1}. Each system G_{i+1} will have fewer edges than G_i, and thus we can use a brute force solve when the system becomes small enough. We will bound the running time of solvers for systems in G_i in terms of the number of edges that are not in their spanning trees. We denote this by o_i = m_i − (n_i − 1). There is some issue with o_0, so let's assume without much loss of generality that G_0 does not have any degree 1 or 2 vertices, and thus o_0 ≥ n_0.

Form G̃_i by multiplying a low-stretch spanning tree of G_i by s, and use random sampling to produce H_i. We know that the number of off-tree edges in H_i is at most a 1/(cσ ln n) fraction of the number of off-tree edges in G_i. If the number of off-tree edges in H_i is less than n_i/4, then we know that after eliminating degree 1 and 2 vertices we will be left with a graph having at most 4n_i vertices and 5n_i edges. We let G_{i+1} be this graph. If the number of off-tree edges in H_i is more than n_i/4, then we just set G_{i+1} = H_i.

In this way, we ensure that o_{i+1} ≤ o_i/(cσ ln n). We can now prove by backwards induction on i that the time required to solve systems of equations in L_{G_i} is at most O(o_i σ ln n). A solve in G_i to constant accuracy requires performing O(s^{1/2}) solves in G_{i+1} and as many multiplies by L_{G_i}. By induction we know that this takes time at most

    O(s^{1/2}(o_i + o_{i+1} σ ln n)) ≤ O(s^{1/2}(2o_i)) ≤ O(o_i σ ln n).

37.3 Saving a log
Chapter 38

Fast Laplacian Solvers by Sparsification

This Chapter Needs Editing

38.1 Overview

We will see how sparsification allows us to solve systems of linear equations in Laplacian matrices and their sub-matrices in nearly linear time. By "nearly-linear", I mean time O(m log^c(nκ⁻¹) log ϵ⁻¹) for systems with m nonzero entries, n dimensions, condition number κ, and accuracy ϵ.

This algorithm comes from [PS14].

38.2 Today's notion of approximation

In today's lecture, I will find it convenient to define matrix approximations slightly differently from previous lectures. Today, I define A ≈_ϵ B to mean

    e^{−ϵ} A ≼ B ≼ e^{ϵ} A.

Note that this relation is symmetric in A and B, and that for ϵ small e^ϵ ≈ 1 + ϵ. The advantage of this definition is that

    A ≈_α B and B ≈_β C implies A ≈_{α+β} C.

I begin by describing the idea behind the algorithm. This idea won't quite work. But, we will see how to turn it into one that does.

We will work with matrices that look like M = L + X where L is a Laplacian and X is a non-zero, non-negative diagonal matrix. Such matrices are called M-matrices. A symmetric M-matrix is a matrix M with nonpositive off-diagonal entries such that M1 is nonnegative and nonzero. We have encountered M-matrices before without naming them. If G = (V, E) is a graph, S ⊂ V, and G(S) is connected, then the submatrix of L_G indexed by rows and columns in S is an M-matrix. Algorithmically, the problems of solving systems of equations in Laplacians and symmetric M-matrices are equivalent.

The sparsification results that we learned for Laplacians translate over to M-matrices. Every M-matrix M can be written in the form X + L where L is a Laplacian and X is a nonnegative diagonal matrix. If L̂ ≈_ϵ L, then it is easy to show (too easy for homework) that

    X + L̂ ≈_ϵ X + L.

In Lecture 7, Lemma 7.3.1, we proved that if X has at least one nonzero entry and if L is connected, then X + L is nonsingular. We write such a matrix in the form M = D − A where D is positive diagonal and A is nonnegative, and note that its being nonsingular and positive semidefinite implies

    D − A ≻ 0  ⟺  D ≻ A.    (38.1)

Using the Perron-Frobenius theorem, one can also show that

    D ≻ −A.    (38.2)

Multiplying M by D^{−1/2} on either side, we obtain

    I − D^{−1/2} A D^{−1/2}.

Define

    B = D^{−1/2} A D^{−1/2},

and note that inequalities (38.1) and (38.2) imply that all eigenvalues of B have absolute value strictly less than 1.

It suffices to figure out how to solve systems of equations in I − B. One way to do this is to exploit the power series expansion:

    (I − B)^{−1} = I + B + B² + B³ + · · ·

However, this series might need many terms to converge. We can figure out how many. If the largest eigenvalue of B is (1 − κ) < 1, then we need at least 1/κ terms.

We can write a series with fewer terms if we express it as a product instead of as a sum:

    Σ_{i≥0} B^i = Π_{j≥0} (I + B^{2^j}).
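A small numerical check (not from the text) of this product expansion: for a symmetric B with spectral radius below 1, a handful of factors (I + B^{2^j}) already approximate (I − B)⁻¹ well. The matrix and the number of factors are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(2)
    Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
    B = Q @ np.diag(rng.uniform(-0.9, 0.9, 40)) @ Q.T     # symmetric, |eigenvalues| < 1

    n = B.shape[0]
    approx = np.eye(n) + B          # the j = 0 factor, I + B
    P = B @ B                        # B^(2^j), starting at j = 1
    for _ in range(5):
        approx = approx @ (np.eye(n) + P)
        P = P @ P

    exact = np.linalg.inv(np.eye(n) - B)
    print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))   # small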
To see why this works, look at the first few terms:

    (I + B)(I + B²)(I + B⁴) = (I + B + B² + B³)(I + B⁴) = (I + B + B² + B³) + B⁴(I + B + B² + B³).

We only need O(log κ⁻¹) terms of this product to obtain a good approximation of (I − B)⁻¹.

The obstacle to quickly applying a series like this is that the matrices I + B^{2^j} are probably dense. We know how to solve this problem: we can sparsify them! I'm not saying that flippantly. We actually do know how to sparsify matrices of this form.

But, simply sparsifying the matrices I + B^{2^j} does not solve our problem because approximation is not preserved by products. That is, even if A ≈_ϵ Â and B ≈_ϵ B̂, ÂB̂ could be a very poor approximation of AB. In fact, since the product ÂB̂ is not necessarily symmetric, we haven't even defined what it would mean for it to approximate AB.

38.5 D and A

Unfortunately, we are going to need to stop writing matrices in terms of I and B, and return to writing them in terms of D and A. The reason this is unfortunate is that it makes for longer expressions.

The analog of (38.3) is

    (D − A)^{−1} = (1/2) ( D^{−1} + (I + D^{−1}A)(D − AD^{−1}A)^{−1}(I + AD^{−1}) ).    (38.4)

In order to be able to work with this expression inductively, we need to check that the middle matrix is an M-matrix.

Proof. We have

    M = D − A ≼ (1 + 1/4)D ≤ e^{1/4} D,

and

    M = D − A ≽ D − (1/4)D = (3/4)D ≽ e^{−1/3} D.

Lemma 38.5.3. If (D, A) is an (α, α)-pair, then (D, AD^{−1}A) is an (α², 0)-pair.

Proof. From Lecture 14, Lemma 3.1, we know that the condition of the lemma is equivalent to the assertion that all eigenvalues of D^{−1}A have absolute value at most α, and that the conclusion is equivalent to the assertion that all eigenvalues of D^{−1}AD^{−1}A lie between 0 and α², which is immediate as they are the squares of the eigenvalues of D^{−1}A.

So, if we start with matrices D and A that are a (1 − κ, 1 − κ)-pair, then after applying this transformation approximately log κ⁻¹ + 2 times we obtain a (1/4, 0)-pair. But, the matrices in this pair could be dense. To keep them sparse, we need to figure out how approximating D − A degrades its quality.

Lemma 38.5.4. If

a. (D, A) is a (1 − κ, 0) pair,

b. D − A ≈_ϵ D̂ − Â, and

c. D ≈_ϵ D̂,

then (D̂, Â) is a (1 − κe^{−2ϵ}, 3ϵ)-pair.

Proof. First observe that

    (1 − κ)D ≽ A  ⟺  D − A ≽ κD.

So,

    D̂ − Â ≽ e^{−ϵ}(D − A) ≽ e^{−ϵ}κD ≽ e^{−2ϵ}κD̂.

For the other side, compute

    e^{2ϵ}D̂ ≽ e^{ϵ}D ≽ e^{ϵ}(D − A) ≽ (D̂ − Â).

For ϵ ≤ 1/3, 3ϵ ≥ e^{2ϵ} − 1, so

    3ϵD̂ ≽ (e^{2ϵ} − 1)D̂ ≽ −Â.

It remains to confirm that sparsification satisfies the requirements of this lemma. The reason this might not be obvious is that we allow A to have nonnegative diagonal elements. While this does not interfere with condition b, you might be concerned that it would interfere with condition c. It need not.

Let C be the diagonal of A, and let L be the Laplacian of the graph with adjacency matrix A − C, and set X so that X + L = D − A. Let L̃ be a sparse ϵ-approximation of L. By computing the quadratic form in elementary unit vectors, you can check that the diagonals of L and L̃ approximate each other. If we now write L̃ = D̃ − Ã, where Ã has zero diagonal, and set

    D̂ = D̃ + C   and   Â = Ã + C,

you can now check that D̂ and Â satisfy the requirements of Lemma 38.5.4.

You might wonder why we bother to keep diagonal elements in a matrix like A. It seems simpler to get rid of them. However, we want (D, A) to be an (α, β) pair, and subtracting C from both of them would make β worse. This might not matter too much as we have good control over β. But, I don't yet see a nice way to carry out a proof that exploits this.

We begin with an M-matrix M_0 = D_0 − A_0. Since this matrix is nonsingular, there is a κ_0 > 0 so that (D_0, A_0) is a (1 − κ_0, 1 − κ_0) pair.

We now know that the matrix

    D_0 − A_0 D_0^{−1} A_0

is an M-matrix and that (D_0, A_0 D_0^{−1} A_0) is a ((1 − κ_0)², 0)-pair. Define κ_1 so that 1 − κ_1 = (1 − κ_0)², and note that κ_1 is approximately 2κ_0. Lemma 38.5.4 and the discussion following it tells us that there is a (1 − κ_1 e^{−2ϵ}, 3ϵ)-pair (D_1, A_1) so that

    D_1 − A_1 ≈_ϵ D_0 − A_0 D_0^{−1} A_0.

Continuing inductively for some number k of steps, we find (1 − κ_i, 3ϵ) pairs (D_i, A_i) so that

    M_i = D_i − A_i

has O(n/ϵ²) nonzero entries, and

    M_i ≈_ϵ D_{i−1} − A_{i−1} D_{i−1}^{−1} A_{i−1}.

For the i such that κ_i is small, κ_{i+1} is approximately twice κ_i. So, for k = 2 + log₂ 1/κ and ϵ close to zero, we can guarantee that (D_k, A_k) is a (1/4, 1/4) pair.

We now see how this construction allows us to approximately solve systems of equations in D_0 − A_0, and how we must set ϵ for it to work. For every 0 ≤ i < k, we have

    (D_i − A_i)^{−1} = (1/2) ( D_i^{−1} + (I + D_i^{−1}A_i)(D_i − A_iD_i^{−1}A_i)^{−1}(I + A_iD_i^{−1}) )
                     ≈_ϵ (1/2) ( D_i^{−1} + (I + D_i^{−1}A_i)(D_{i+1} − A_{i+1})^{−1}(I + A_iD_i^{−1}) ),

and

    (D_k − A_k)^{−1} ≈_{1/3} D_k^{−1}.

Chapter 39

Testing Isomorphism of Graphs with Distinct Eigenvalues

Recall that two graphs G = (V, E) and H = (V, F) are isomorphic if there exists a permutation π of V such that

    (a, b) ∈ E  ⟺  (π(a), π(b)) ∈ F.

Of course, we can express this relation in terms of matrices associated with the graphs. It doesn't matter much which matrices we use. So for this lecture we will use the adjacency matrices.
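A tiny numpy sketch (not from the text) of the matrix form of this relation: given a permutation, build the permutation matrix defined below and check ΠAΠ^T = B for the relabelled adjacency matrix. As the footnote below warns, it is easy to confuse a permutation with its inverse; this sketch simply fixes the convention Π(a, b) = 1 when π(a) = b and defines B consistently with it.

    import numpy as np

    def perm_matrix(pi):
        """Permutation matrix with Pi[a, b] = 1 when pi(a) = b."""
        P = np.zeros((len(pi), len(pi)))
        for a, b in enumerate(pi):
            P[a, b] = 1.0
        return P

    # adjacency matrix of a 4-cycle and a relabelling of it
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]], dtype=float)
    pi = [2, 0, 3, 1]
    B = np.array([[A[pi[a], pi[b]] for b in range(4)] for a in range(4)])  # relabelled graph

    P = perm_matrix(pi)
    print(np.array_equal(P @ A @ P.T, B))   # True: Pi A Pi^T = B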
Every permutation may be realized by a permutation matrix. For the permutation π, this is the matrix Π with entries given by

    Π(a, b) = 1 if π(a) = b, and 0 otherwise.

For a vector ψ, we see¹ that

    (Πψ)(a) = ψ(π(a)).

Let A be the adjacency matrix of G and let B be the adjacency matrix of H. We see that G and H are isomorphic if and only if there exists a permutation matrix Π such that

    ΠAΠ^T = B.

39.3 Using Eigenvalues and Eigenvectors

If G and H are isomorphic, then A and B must have the same eigenvalues. However, there are many pairs of graphs that are non-isomorphic but which have the same eigenvalues. We will see some tricky ones next lecture. But, for now, we note that if A and B have different eigenvalues, then we know that the corresponding graphs are non-isomorphic, and we don't have to worry about them.

For the rest of this lecture, we will assume that A and B have the same eigenvalues, and that each of these eigenvalues has multiplicity 1. We will begin our study of this situation by considering some cases in which testing isomorphism is easy.

Recall that we can write

    A = ΨΛΨ^T,

where Λ is the diagonal matrix of eigenvalues of A and Ψ is an orthonormal matrix holding its eigenvectors. If B has the same eigenvalues, we can write

    B = ΦΛΦ^T.

If Π is the matrix of an isomorphism from G to H, then

    ΠΨΛΨ^TΠ^T = ΦΛΦ^T.

As each entry of Λ is distinct, this looks like it would imply ΠΨ = Φ. But, the eigenvectors (columns of Φ and Ψ) are only determined up to sign. So, it just implies

    ΠΨ = ΦS,

where S is a diagonal matrix with ±1 entries on its diagonal.

¹I hope I got that right. It's very easy to confuse the permutation and its inverse.

Lemma 39.3.1. Let A = ΨΛΨ^T and B = ΦΛΦ^T where Λ is a diagonal matrix with distinct entries and Ψ and Φ are orthogonal matrices. A permutation matrix Π satisfies ΠAΠ^T = B if and only if there exists a diagonal ±1 matrix S for which

    ΠΨ = ΦS.

Proof. Let ψ_1, ..., ψ_n be the columns of Ψ and let ϕ_1, ..., ϕ_n be the columns of Φ. Assuming there is a Π for which ΠAΠ^T = B,

    ΦΛΦ^T = Σ_{i=1}^n ϕ_i λ_i ϕ_i^T = Σ_{i=1}^n (Πψ_i) λ_i (ψ_i^T Π^T),

which implies that for all i

    ϕ_i ϕ_i^T = (Πψ_i)(Πψ_i)^T.

This in turn implies that

    ϕ_i = ±Πψ_i.

To go the other direction, assume ΠΨ = ΦS. Then,

    ΠAΠ^T = ΠΨΛΨ^TΠ^T = ΦSΛSΦ^T = ΦΛSSΦ^T = ΦΛΦ^T = B,

as S and Λ are diagonal and thus commute, and S² = I.

Our algorithm for testing isomorphism will determine all such matrices S. Let S be the set of all diagonal ±1 matrices. We will find diagonal matrices S ∈ S such that the set of rows of ΦS is the same as the set of rows of Ψ. As the rows of Ψ are indexed by vertices a ∈ V, we will write the row indexed by a as the row-vector

    v_a := (ψ_1(a), ..., ψ_n(a)).

Similarly denote the rows of Φ by vectors u_a. In this notation, we are searching for matrices S ∈ S for which the set of vectors {v_a}_{a∈V} is identical to the set of vectors {u_a S}_{a∈V}. We have thus transformed the graph isomorphism problem into a problem about vectors.

39.4 An easy case

I will say that an eigenvector ψ_i is helpful if for all a ≠ b ∈ V, |ψ_i(a)| ≠ |ψ_i(b)|. In this case, it is very easy to test if G and H are isomorphic, because this helpful vector gives us a canonical name for every vertex. If Π is an isomorphism from G to H, then Πψ_i must be an eigenvector of B. In fact, it must be ±ϕ_i. If the sets of absolute values of entries of ψ_i and ϕ_i are the same, then we may find the permutation that maps A to B by mapping every vertex a to the vertex b for which |ψ_i(a)| = |ϕ_i(b)|.

The reason that I put absolute values in the definition of helpful, rather than just taking values, is that eigenvectors are only determined up to sign. On the other hand, a single eigenvector
determines the isomorphism if ψ_i(a) ≠ ψ_i(b) for all a ≠ b and there is a canonical way to choose a sign for the vector ψ_i. For example, if the sum of the entries in ψ_i is not zero, we can choose its sign to make the sum positive. In fact, unless ψ_i and −ψ_i have exactly the same set of values, there is a canonical choice of the sign for this vector.

Even if there is no canonical choice of sign for this vector, it leaves at most two choices for the isomorphism.

39.5 All the Automorphisms

The graph isomorphism problem is complicated by the fact that there can be many isomorphisms from one graph to another. So, any algorithm for finding isomorphisms must be able to find many of them.

Recall that an automorphism of a graph is an isomorphism from the graph to itself. These form a group which we denote aut(G): if Π and Γ are automorphisms of A then so is ΠΓ. Let A ⊆ S denote the corresponding set of diagonal ±1 matrices. The set A is in fact a group and is isomorphic to aut(G).

Here is a way to make this isomorphism very concrete: Lemma 39.3.1 implies that the Π ∈ aut(G) and the S ∈ A are related by

    Π = ΨSΨ^T   and   S = Ψ^TΠΨ.

As diagonal matrices commute, we have that for every Π_1 and Π_2 in aut(G) and for S_1 = Ψ^TΠ_1Ψ and S_2 = Ψ^TΠ_2Ψ,

    Π_1Π_2 = ΨS_1Ψ^TΨS_2Ψ^T = ΨS_1S_2Ψ^T = ΨS_2S_1Ψ^T = ΨS_2Ψ^TΨS_1Ψ^T = Π_2Π_1.

Thus, the automorphism group of a graph with distinct eigenvalues is commutative, and it is isomorphic to a subgroup of S.

It might be easier to think about these subgroups by realizing that they are isomorphic to subspaces of (Z/2Z)^n. Let f : S → (Z/2Z)^n be the function that maps the group of diagonal matrices with ±1 entries to vectors t modulo 2 by setting t(i) so that S(i, i) = (−1)^{t(i)}. You should check that this is a group homomorphism: f(S_1S_2) = f(S_1) + f(S_2). You should also confirm that f is invertible.

For today's lecture, we will focus on the problem of finding the group of automorphisms of a graph with distinct eigenvalues. We will probably save the slight extension to finding isomorphisms for homework. Note that we will not try to list all the isomorphisms, as there could be many. Rather, we will give a basis of the corresponding subspace of (Z/2Z)^n.

vertices to which it can be mapped by automorphisms. We will discover the orbits by realizing that the orbit of a vertex a is the set of b for which v_a S = v_b for some S ∈ A.

The set of orbits of vertices forms a partition of the vertices. We say that a partition of the vertices is valid if every orbit is contained entirely within one set in the partition. That is, each class of the partition is a union of orbits. Our algorithm will proceed by constructing a valid partition of the vertices and then splitting classes in the partition until each is exactly an orbit.

Recall that a set is stabilized by a group if the set is unchanged when the group acts on all of its members. We will say that a group G ⊆ S stabilizes a set of vertices C if it stabilizes the set of vectors {v_a}_{a∈C}. Thus, A is the group that stabilizes V.

An orbit is stabilized by A, and so are unions of orbits and thus classes of valid partitions. We would like to construct the subgroup of S that stabilizes each orbit C_j. However, I do not yet see how to do that directly. Instead, we will construct a particular valid partition of the vertices, and find for each class C_j in the partition the subgroup A_j ⊆ S that stabilizes C_j, where here we are considering the actions of matrices S ∈ S on vectors v_a. In fact, A_j will act transitively² on the class C_j. As A stabilizes every orbit, and thus every union of orbits, it is a subgroup of A_j. In fact, A is exactly the intersection of all the groups A_j.

We now observe that we can use linear algebra to efficiently construct A from the groups A_j by exploiting the isomorphism between S and (Z/2)^n. Each subgroup A_j is isomorphic to a subgroup of (Z/2)^n. Each subgroup of (Z/2)^n is precisely a vector space modulo 2, and thus may be described by a basis. It will eventually become clear that by "compute A_j" we mean to compute such a basis. From the basis, we may compute a basis of the nullspace. The subgroup of (Z/2)^n corresponding to A is then the nullspace of the span of the nullspaces of the subspaces corresponding to the A_j. We can compute all these using Gaussian elimination.

39.7 The first partition

We may begin by dividing vertices according to the absolute values of their entries in eigenvectors. That is, if |ψ_i(a)| ≠ |ψ_i(b)| for some i, then we may place vertices a and b in different classes, as there can be no S ∈ S for which v_a S = v_b. The partition that we obtain this way is thus valid, and is the starting point of our algorithm.

39.8 Unbalanced vectors

We say that an eigenvector ψ_i is unbalanced if there is some value x for which
|{a : ψ i (a) = x}| =
̸ |{a : ψ i (a) = −x}| .
39.6 Equivalence Classes of Vertices Such vectors cannot change sign in an automorphism. That is, S (i, i) must equal 1. The reason is
that an automorphism with S (i, i) = −1 must induce a bijection between the two sets above, but
Recall that the orbit of an element under the action of a group is the set of elements to which it is this is impossible if their sizes are different.
2
mapped by the elements of the group. Concretely, the orbit of a vertex a in the graph is the set of That is, for every a and b in Cj , there is an S ∈ Aj for which v a S = b b .
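The first partition of Section 39.7 is easy to compute numerically. The sketch below is my own illustration, not the text's code: the function name, the example graph, and the rounding guard against floating-point error are assumptions.

```python
import numpy as np
from collections import defaultdict

def first_partition(A, digits=8):
    """Partition the vertices of the graph with adjacency matrix A by the
    absolute values of their entries in all eigenvectors (Section 39.7).
    Vertices a and b can only be related by a +/-1 diagonal matrix S
    if |psi_i(a)| = |psi_i(b)| for every i."""
    _, Psi = np.linalg.eigh(A)
    classes = defaultdict(list)
    for a in range(A.shape[0]):
        key = tuple(np.round(np.abs(Psi[a, :]), digits))
        classes[key].append(a)
    return list(classes.values())

# Example: the 4-cycle 0-1-2-3 with a pendant vertex 4 attached to vertex 0.
A = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4)]:
    A[u, v] = A[v, u] = 1
# Vertices 1 and 3, which are swapped by an automorphism, land in the same class.
print(first_partition(A))
```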
Thus, an unbalanced vector tells us that all vertices for which ψ_i(a) = x are in different orbits from those for which ψ_i(a) = −x. This lets us refine classes.

We now extend this idea in two ways. First, we say that ψ_i is unbalanced on a class C if there is some value x for which

|{a ∈ C : ψ_i(a) = x}| ≠ |{a ∈ C : ψ_i(a) = −x}|.

By the same reasoning, we can infer that the sign of S(i,i) must be fixed to 1. Assuming, as will be the case, that C is a class in a valid partition and thus a union of orbits, we are now able to split C into two smaller classes

C_0 = {a ∈ C : ψ_i(a) = x} and C_1 = {a ∈ C : ψ_i(a) = −x}.

The partition we obtain by splitting C into C_0 and C_1 is thus also valid. Of course, it is only useful if both sets are non-empty.

Finally, we consider vectors formed from products of eigenvectors. For R ⊆ {1, ..., n}, define ψ_R to be the component-wise product of the ψ_i for i ∈ R:

ψ_R(a) = ∏_{i∈R} ψ_i(a).

We say that the vector ψ_R is unbalanced on class C if there is some value x for which

|{a ∈ C : ψ_R(a) = x}| ≠ |{a ∈ C : ψ_R(a) = −x}|.

Let C_j be a balanced class. By definition, the product of every subset of eigenvectors is either constant or balanced on C_j. We say that a subset of eigenvectors Q is independent on C_j if all products of subsets of eigenvectors in Q are balanced on C_j (except for the empty product). In particular, none of these eigenvectors is zero or constant on C_j. Construct a matrix M_{C_j,Q} whose rows are indexed by vertices a ∈ C_j, whose columns are indexed by subsets R ⊆ Q, and whose entries are given by

M_{C_j,Q}(a, R) = sgn(ψ_R(a)), where I recall sgn(x) = 1 if x > 0, −1 if x < 0, and 0 if x = 0.

Lemma 39.9.1. If Q is independent on C then the columns of M_{C,Q} are orthogonal.

Proof. Let R_1 and R_2 index two columns of M_{C,Q}. That is, R_1 and R_2 are two different subsets of Q. Let R_0 be their symmetric difference. We have

M_{C,Q}(a, R_1) M_{C,Q}(a, R_2) = sgn(ψ_{R_1}(a)) sgn(ψ_{R_2}(a)) = ∏_{i∈R_1} sgn(ψ_i(a)) ∏_{i∈R_2} sgn(ψ_i(a)) = ∏_{i∈R_0} sgn(ψ_i(a)) = sgn(ψ_{R_0}(a)) = M_{C,Q}(a, R_0).

As all the nonempty products of subsets of eigenvectors in Q are balanced on C, M_{C,Q}(a, R_0) is positive for half the a ∈ C and negative for the other half. So,

M_{C,Q}(:, R_1)^T M_{C,Q}(:, R_2) = Σ_{a∈C} M_{C,Q}(a, R_1) M_{C,Q}(a, R_2) = Σ_{a∈C} M_{C,Q}(a, R_0) = 0.
their inner product:

Σ_{R⊆Q} sgn(ψ_R(a) ψ_R(b)) = Σ_{R⊆Q−{i}} sgn(ψ_R(a) ψ_R(b)) + sgn(ψ_{R∪{i}}(a) ψ_{R∪{i}}(b))
= Σ_{R⊆Q−{i}} sgn(ψ_R(a) ψ_R(b)) + sgn(ψ_R(a) ψ_i(a) ψ_R(b) ψ_i(b))
= Σ_{R⊆Q−{i}} sgn(ψ_R(a) ψ_R(b)) + sgn(ψ_R(a) ψ_R(b)) sgn(ψ_i(a) ψ_i(b))
= Σ_{R⊆Q−{i}} sgn(ψ_R(a) ψ_R(b)) − sgn(ψ_R(a) ψ_R(b))
= 0.

Corollary 39.9.4. Let C be a balanced subset of vertices. Then the size of C is a power of 2. If Q is an independent set of eigenvectors on C, then |Q| ≤ log_2 |C|.

Proof. Let C be an orbit and let Q be a maximal set of eigenvectors that are independent on C. As the rows and columns of M_{C,Q} are both orthogonal, M_{C,Q} must be square. This implies that |C| = 2^{|Q|}. If we drop the assumption that Q is maximal, we still know that all the columns of M_{C,Q} are orthogonal. This matrix has 2^{|Q|} columns. As they are vectors in |C| dimensions, there can be at most |C| of them.

We can now describe the structure of a balanced subset of vertices C. We call a maximal set of eigenvectors that are independent on C a base for C. Every other eigenvector j is either constant on C or becomes constant when multiplied by the product of some subset R of eigenvectors in Q. In either case, we can write

ψ_j(a) = γ ∏_{i∈R} ψ_i(a) for all a ∈ C, (39.1)

for some constant γ.

Let v_a(Q) denote the vector (v_a(i))_{i∈Q}, the restriction of the vector v_a to the coordinates in Q. I claim that every one of the 2^{|Q|} ± sign patterns of length |Q| must appear in exactly one of the vectors v_a(Q). The reason is that there are |C| = 2^{|Q|} of these vectors, and we established in Lemma 39.9.2 that v_a(Q) ≠ v_b(Q) for all a ≠ b in C. Thus, for every diagonal ± matrix S_Q of dimension |Q|, we have

{v_a(Q) S_Q : a ∈ C} = {v_a(Q) : a ∈ C}.

That is, this set of vectors is stabilized by ±1 diagonal matrices.

As equation (39.1) gives a formula for the value taken on C by every eigenvector not in Q in terms of the eigenvectors in Q, we have described the structure of the subgroup of S that stabilizes C: the diagonals corresponding to Q are unconstrained, and every other diagonal is some product of these. This structure is something that you are used to seeing in subspaces.

Apply f to map this subgroup of S to (Z/2)^n, and let B be an n-by-log_2(|C|) matrix containing a basis of the subspace in its columns. Any independent subset of log_2(|C|) rows of B will form a basis of the row-space, and is isomorphic to a base for C of the eigenvectors.

39.10 Algorithms

Let C_j be a balanced class. We just saw how to compute A_j, assuming that we know C_j and a base Q for it. Of course, by "compute" we mean computing a basis of f(A_j). We now show how to find a base for a balanced class C_j. We do this by building up a set Q of eigenvectors that are independent on C_j. To do this, we go through the eigenvectors in order. For each eigenvector ψ_i, we must determine whether or not its values on C_j can be expressed as a product of eigenvectors already present in Q. If it can be, then we record this product as part of the structure of A_j. If not, we add i to Q.

The eigenvector ψ_i is a product of eigenvectors in Q on C_j if and only if there is a constant γ and y_h ∈ {0, 1} for h ∈ Q such that

ψ_i(a) = γ ∏_{h∈Q} (ψ_h(a))^{y_h},

for all vertices a ∈ C_j. This happens if and only if

sgn(ψ_i(a)) = ∏_{h∈Q} sgn(ψ_h(a))^{y_h}.

We can tell whether or not these equations have a solution using linear algebra modulo 2. Let B be the matrix over Z/2 such that

sgn(ψ_i(a)) = (−1)^{B(i,a)}.

Then, the above equations become

B(i, a) = Σ_{h∈Q} y_h B(h, a) for all a ∈ C_j.

Thus, we can solve for the coefficients y_h in polynomial time, if they exist. If they do not, we add i to Q.

Once we have determined a base Q and how to express on C_j the values of every other eigenvector as a product of eigenvectors in Q, we have determined A_j.

It remains to explain how we partition the vertices into balanced classes. Consider applying the above procedure to a class C_j that is not balanced. We will discover that C_j is not balanced by finding a product of eigenvectors that is neither constant nor balanced on C_j. Every time we add an eigenvector ψ_i to Q, we will examine every product of vectors in Q to check if any are unbalanced on C_j. We can do this efficiently, because there are at most 2^{|Q|} ≤ |C_j| such products to consider. As we have added ψ_i to Q, none of the products of vectors in Q can be constant on C_j. If we find a product that is not balanced on C_j, then it must also be non-constant, and thus provide a way of splitting class C_j into two.
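The mod-2 solving step in the procedure above is ordinary Gaussian elimination. The sketch below is my own illustration under my own naming conventions (it is not the text's code, and it solves only the displayed sign system, ignoring the constant γ): it decides whether a ±1 sign pattern is a product of the sign patterns of a candidate base.

```python
import numpy as np

def signs_to_bits(values):
    """Map signs to bits: +1 -> 0, -1 -> 1, so products of signs become sums mod 2."""
    return (np.sign(values) < 0).astype(np.int8)

def solve_mod2(base_bits, target):
    """Find y in {0,1}^k with y @ base_bits = target over Z/2, or return None.
    Rows of base_bits are the 0/1 sign patterns of the base eigenvectors on a class."""
    B = np.array(base_bits, dtype=np.int8) % 2
    t = np.array(target, dtype=np.int8) % 2
    k = B.shape[0]
    M = np.concatenate([B.T, t.reshape(-1, 1)], axis=1) % 2   # augmented system
    pivots, row = [], 0
    for col in range(k):
        nz = np.nonzero(M[row:, col])[0]
        if len(nz) == 0:
            continue
        pr = row + nz[0]
        M[[row, pr]] = M[[pr, row]]
        for r in range(M.shape[0]):
            if r != row and M[r, col] == 1:
                M[r] = (M[r] + M[row]) % 2
        pivots.append(col)
        row += 1
        if row == M.shape[0]:
            break
    for r in range(M.shape[0]):               # inconsistent row => no solution
        if not M[r, :k].any() and M[r, k] == 1:
            return None
    y = np.zeros(k, dtype=np.int8)
    for r, col in enumerate(pivots):
        y[col] = M[r, k]
    return y

# Toy usage: is the target pattern a product of the two base patterns?
base = [[0, 1, 0, 1], [0, 0, 1, 1]]
print(solve_mod2(base, [0, 1, 1, 0]))   # [1 1]: it is the product of both
print(solve_mod2(base, [1, 0, 0, 0]))   # None: it is not
```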
We can now summarize the entire algorithm. We first compute the partition by absolute values of
entries described in section 39.7. We then go through the classes of the partition one-by-one. For
each, we use the above procedure until we have either split it in two or we have determined that it
is balanced and we have computed its automorphism group. If we do split the class in two, we
refine the partition and start over. As the total number of times we split classes is at most n, this
algorithm runs in polynomial time.
After we have computed a partition into balanced classes and have computed their automorphism groups, we combine them to find the automorphism group of the entire graph as described at the end of section 39.6.

Chapter 40

Testing Isomorphism of Strongly Regular Graphs
40.1 Introduction
In the last lecture we saw how to test isomorphism of graphs in which every eigenvalue is distinct.
So, in this lecture we will consider the opposite case: graphs that only have 3 distinct eigenvalues.
These are the strongly regular graphs.
Our algorithm for testing isomorphism of these will not run in polynomial time. Rather, it takes
time n^{O(n^{1/2} log n)}. This is at least much faster than the naive algorithm of checking all n! possible
permutations. In fact, this was the best known running time for general algorithms for graph
isomorphism until three years ago.
40.2 Definitions
A strongly regular graph with parameters (d, α, β) is a d-regular graph in which every pair of adjacent vertices has exactly α common neighbors and every pair of non-adjacent vertices has exactly β common neighbors.

These conditions are very strong, and it might not be obvious that there are any non-trivial graphs that satisfy these conditions. Of course, the complete graph and disjoint unions of
complete graphs satisfy these conditions. Before proceeding, I warn you that there is a standard notation in the literature about strongly regular graphs, and I am trying not to use it. In this literature, d becomes k, α becomes λ and β becomes µ. Many other letters are bound as well.

For the rest of this lecture, we will only consider strongly regular graphs that are connected and that are not the complete graph. I will now give you some examples.

40.3 Paley Graphs and The Pentagon

The Paley graphs we encountered are strongly regular. The simplest of these is the pentagon. It has parameters

n = 5, d = 2, α = 0, β = 1.

40.4 Lattice Graphs

For a positive integer n, the lattice graph L_n is the graph with vertex set {1, ..., n}^2 in which vertex (a, b) is connected to vertex (c, d) if a = c or b = d. Thus, the vertices may be arranged at the points in an n-by-n grid, with vertices being connected if they lie in the same row or column. Alternatively, you can understand this graph as the product of two complete graphs on n vertices. The parameters of this graph are:

d = 2(n − 1), α = n − 2, β = 2.

40.5 Latin Square Graphs

A Latin square is an n-by-n grid, each entry of which is a number between 1 and n, such that no number appears twice in any row or column. For example,

1 2 3 4      1 2 3 4        1 2 3 4
4 1 2 3      2 1 4 3        2 4 1 3
3 4 1 2  ,   3 4 1 2  , and 3 1 4 2
2 3 4 1      4 3 2 1        4 3 2 1

are Latin squares. Let me remark that the number of different Latin squares of size n grows very quickly, at least as fast as n!(n − 1)!(n − 2)! ... 2!. Two Latin squares are said to be isomorphic if there is a renumbering of their rows, columns, and entries, or a permutation of these, that makes them the same. As this provides 6(n!)^3 isomorphisms, and this is much less than the number of Latin squares, there must be many non-isomorphic Latin squares of the same size. Two of the Latin squares above are isomorphic, but one is not.

From such a Latin square, we construct a Latin square graph. It will have n^2 nodes, one for each cell in the square. Two nodes are joined by an edge if

1. they are in the same row,
2. they are in the same column, or
3. they hold the same number.

So, such a graph has degree d = 3(n − 1). Any two nodes in the same row will both be neighbors of the n − 2 other nodes in their row. They will have two more common neighbors: the nodes in their columns holding the other's number. So, they have n common neighbors. The same obviously holds for columns, and is easy to see for nodes that have the same number. So, every pair of nodes that are neighbors have exactly α = n common neighbors.

On the other hand, consider two vertices that are not neighbors, say (1, 1) and (2, 2). They lie in different rows, lie in different columns, and we are assuming that they hold different numbers. The vertex (1, 1) has two common neighbors of (2, 2) in its row: the vertex (1, 2) and the vertex holding the same number as (2, 2). Similarly, it has two common neighbors of (2, 2) in its column. Finally, we can find two more common neighbors of (2, 2) that are in different rows and columns by looking at the nodes that hold the same number as (1, 1), but which are in the same row or column as (2, 2). So, β = 6.

40.6 The Eigenvalues of Strongly Regular Graphs

We will consider the adjacency matrices of strongly regular graphs. Let A be the adjacency matrix of a strongly regular graph with parameters (d, α, β). We already know that A has an eigenvalue of d with multiplicity 1. We will now show that A has just two other eigenvalues.

To prove this, first observe that the (a, b) entry of A^2 is the number of common neighbors of vertices a and b. For a = b, this is just the degree of vertex a. We will use this fact to write A^2 as a linear combination of A, I and J, the all-1s matrix. To this end, observe that the adjacency matrix of the complement of A (the graph with non-edges where A has edges) is J − I − A. So,

A^2 = αA + β(J − I − A) + dI = (α − β)A + βJ + (d − β)I.

For every vector v orthogonal to 1,

A^2 v = (α − β)Av + (d − β)v.

So, every eigenvalue λ of A other than d satisfies

λ^2 = (α − β)λ + d − β.

Thus, these are given by

λ = (α − β ± sqrt((α − β)^2 + 4(d − β))) / 2.

These eigenvalues are traditionally denoted r and s, with r > s. By convention, the multiplicity of the eigenvalue r is always denoted f, and the multiplicity of s is always denoted g.
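This derivation is easy to check numerically. The following sketch is my own (the function name is an assumption, not from the text): it computes r and s from (d, α, β) and compares them with the spectrum of the pentagon.

```python
import numpy as np

def srg_eigenvalues(d, alpha, beta):
    """The two non-trivial adjacency eigenvalues of a strongly regular graph
    with parameters (d, alpha, beta): lambda^2 = (alpha - beta) lambda + d - beta."""
    disc = np.sqrt((alpha - beta) ** 2 + 4 * (d - beta))
    return (alpha - beta + disc) / 2, (alpha - beta - disc) / 2

# The pentagon: n = 5, d = 2, alpha = 0, beta = 1.
print(srg_eigenvalues(2, 0, 1))            # ((sqrt(5)-1)/2, -(sqrt(5)+1)/2)

A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
print(np.round(np.linalg.eigvalsh(A), 6))  # d = 2 once, r twice, s twice
```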
For example, for the pentagon we have

r = (sqrt(5) − 1)/2, s = −(sqrt(5) + 1)/2.

For the lattice graph L_n, we have

r = n − 2, s = −2.

For the Latin square graphs of order n, we have

r = n − 3, s = −3.

One can prove that every connected regular graph whose adjacency (or Laplacian) matrix has just three distinct eigenvalues is a strongly regular graph.

40.7 Testing Isomorphism by Individualization and Refinement

The problem of testing isomorphism of graphs is often reduced to the problem of giving each vertex in a graph a unique name. If we have a way of doing this that does not depend upon the initial ordering of the vertices, then we can use it to test graph isomorphism: find the unique names of vertices in both graphs, and then see if it provides an isomorphism. For example, consider the graph below.

We could begin by labeling every vertex by its degree.

[Figure: a small example graph with each vertex labeled by its degree (labels 1, 2, 3, 3, 2, 1).]

The degrees distinguish between many nodes, but not all of them. We may refine this labeling by appending the labels of every neighbor of a node.

[Figure: the same graph with the refined labels 1,{3}; 2,{1,3}; 3,{2,2,3}; 3,{1,2,3}; 2,{3,3}; 1,{2}.]

Now, every vertex has its own unique label. If we were given another copy of this graph, we could use these labels to determine the isomorphism between them. This procedure is called refinement, and it can be carried out until it stops producing new labels. However, it is clear that this procedure will fail to produce unique labels if the graph has automorphisms, or if it is a regular graph. In these cases, we need a way to break symmetry.

The procedure called individualization breaks symmetry arbitrarily. It chooses some nodes in the graph, arbitrarily, to give their own unique names. Ideally, we pick one vertex to give a unique name, and then refine the resulting labeling. We could then pick another troubling vertex, and continue. We call a set of vertices S ⊂ V a distinguishing set if individualizing this set of nodes results in a unique name for every vertex, after refinement. How would we use a distinguishing set to test isomorphism? Assume that S is a distinguishing set for G = (V, E). To test if H = (W, F) is isomorphic to G, we could enumerate over every possible set of |S| vertices of W, and check if they are a distinguishing set for H. If G and H are isomorphic, then H will also have an isomorphic distinguishing set that we can use to find an isomorphism between G and H. We would have to check (n choose |S|) sets, and try |S|! labelings for each, so we had better hope that S is small.

40.8 Distinguishing Sets for Strongly Regular Graphs

We will now prove a result of Babai [Bab80] which says that every strongly regular graph has a distinguishing set of size O(sqrt(n) log n). Babai's result won't require any refinement beyond naming every vertex by the set of individualized nodes that are its neighbors. So, we will prove that a set of nodes S is a distinguishing set by proving that for every pair of distinct vertices a and b, either there is an s ∈ S that is a neighbor of a but not of b, or the other way around. This will suffice to distinguish a and b. As our algorithm will work in a brute-force fashion, enumerating over all sets of a given size, we merely need to show that such a set S exists. We will do so by proving that a random set of vertices probably works.

I first observe that it suffices to consider strongly regular graphs with d < n/2, as the complement of a strongly regular graph is also a strongly regular graph (that would have been too easy to assign as a homework problem). We should also observe that every strongly regular graph has diameter 2, and so d ≥ sqrt(n − 1).

Lemma 40.8.1. Let G = (V, E) be a connected strongly regular graph with n vertices and degree d < n/2. Then for every pair of vertices a and b, there are at least d/3 vertices that are neighbors of a but not b.

Before I prove this, let me show how we may use it to prove the theorem. This lemma tells us that there are at least sqrt(n − 1)/3 nodes that are neighbors of a but not of b. Let T be the set of nodes that are neighbors of a but not neighbors of b. So, if we choose a vertex at random, the probability that it is in T is at least

|T|/n ≥ sqrt(n − 1)/(3n) ≥ 1/(3 sqrt(n + 2)).
If we choose a set S of 3 sqrt(n + 2) ln n^2 vertices at random, the probability that none of them is in T is

(1 − 1/(3 sqrt(n + 2)))^{3 sqrt(n+2) ln n^2} ≤ 1/n^2.

So, the probability that a random set of this many nodes fails to distinguish all n^2 pairs is at most 1/2.

Combining, we find

Clearly, every w that is a neighbor of a but not of b lies in either Z_0 or Z_1. As z is neither a neighbor of a nor of b,

|Z_0| = |Z_1| = d − β.

So,

d − α − 1 ≤ 2(d − β) =⇒ 2β ≤ d + α + 1. (40.1)

So,

d − β ≤ 2(d − α − 1) =⇒ 2(α + 1) ≤ d + β. (40.2)

This tells us that if α is close to d, then β is also.

We require one more relation between α and β. We obtain this relation by picking any vertex a, and counting the pairs b, z such that b ∼ z, a ∼ b and a ≁ z.

On the other hand, there are n − d − 1 nodes z that are not neighbors of a, and each of them has β neighbors in common with a, giving

(n − d − 1)β = d(d − α − 1). (40.3)

As d < n/2, this equation tells us

d(d − α − 1) ≥ dβ =⇒ d − α − 1 ≥ β. (40.4)

Adding inequality (40.1) to (40.4) gives

Thus, for every a ≠ b the number of vertices that are neighbors of a but not of b is at least

min(d − α − 1, d − β) ≥ d/3.
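Babai's argument is also easy to probe empirically. The sketch below is my own illustration, not code from the text: it builds a Paley graph, then checks whether a random set S distinguishes every pair of vertices by their neighborhoods in S.

```python
import numpy as np

def paley_graph(q):
    """Adjacency matrix of the Paley graph on q vertices (q a prime, q = 1 mod 4)."""
    squares = {(x * x) % q for x in range(1, q)}
    A = np.zeros((q, q), dtype=int)
    for a in range(q):
        for b in range(q):
            if a != b and (a - b) % q in squares:
                A[a, b] = 1
    return A

def distinguishes_all_pairs(A, S):
    """True if every pair of distinct vertices differs in adjacency to some s in S,
    i.e. the rows of A restricted to the columns S are all distinct."""
    keys = {tuple(A[a, S]) for a in range(A.shape[0])}
    return len(keys) == A.shape[0]

rng = np.random.default_rng(1)
A = paley_graph(101)
n = A.shape[0]
# The theorem guarantees that O(sqrt(n) log n) random vertices suffice;
# in practice much smaller random sets often already work.
for size in range(1, n + 1):
    S = rng.choice(n, size=size, replace=False)
    if distinguishes_all_pairs(A, S):
        print("a random set of size", size, "distinguishes all pairs of Paley(101)")
        break
```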
40.9 Notes
You should wonder if we can make this faster by analyzing refinement steps. In [Spi96b], I improved the running time bound to 2^{O(n^{1/3} log n)} by analyzing two refinement phases. The
algorithm required us to handle certain special families of strongly regular graphs separately:
Latin square graphs and Steiner graphs. Algorithms for testing isomorphism of strongly regular
graphs were recently improved by Babai, Chen, Sun, Teng, and Wilmes [BCS+ 13, BW13, SW15].
The running times of all these algorithms are subsumed by that in Babai’s breakthrough
algorithm for testing graph isomorphism [Bab16].
Part VII
Interlacing Families
Chapter 41

Expected Characteristic Polynomials

This Chapter Needs Editing

41.1 Overview

Over the next few lectures, we will see two different proofs that infinite families of bipartite Ramanujan graphs exist. Both proofs will use the theory of interlacing polynomials, and will consider the expected characteristic polynomials of random matrices. In today's lecture, we will see a proof that some of these polynomials are real rooted.

At present, we do not know how to use these techniques to prove the existence of infinite families of non-bipartite Ramanujan graphs.

The material in today's lecture comes from [MSS15d], but the proof is inspired by the treatment of that work in [HPS15].

41.2 Random sums of graphs

We will build Ramanujan graphs on n vertices of degree d, for every d and even n. We begin by considering a random graph on n vertices of degree d. When n is even, the most natural way to generate such a graph is to choose d perfect matchings uniformly at random, and to then take their sum. I should mention one caveat: some edge could appear in many of the matchings. In this case, we add the weights of the corresponding edges together. So, the weight of an edge is the number of matchings in which it appears.

Let M be the adjacency matrix of some perfect matching on n vertices. We can generate the adjacency matrix of a random perfect matching by choosing a permutation matrix Π uniformly at random, and then forming ΠMΠ^T. The sum of d independent uniform random perfect matchings is then

Σ_{i=1}^{d} Π_i M Π_i^T.

In today's lecture, we will consider the expected characteristic polynomial of such a graph. For a matrix M, we let

χ_x(M) := det(xI − M)

denote the characteristic polynomial of M in the variable x.

For simplicity, we will consider the expected polynomial of the sum of just two graphs. For generality, we will let them be any graphs, or any symmetric matrices.

Our goal for today is to prove that these expected polynomials are real rooted.

Theorem 41.2.1. Let A and B be symmetric n-by-n matrices and let Π be a uniform random permutation. Then,

E_Π χ_x(A + ΠBΠ^T)

has only real roots.

So that you will be surprised by this, I remind you that the sum of real rooted polynomials might have no real roots. For example, both (x − 2)^2 and (x + 2)^2 have only real roots, but their sum, 2x^2 + 8, has no real roots.

Theorem 41.2.1 also holds for sums of many matrices. But, for simplicity, we restrict ourselves to considering the sum of two.

41.3 Interlacing

Our first tool for establishing real rootedness of polynomials is interlacing.

If p(x) is a real rooted polynomial of degree n and q(x) is a real rooted polynomial of degree n − 1, then we say that p and q interlace if p has roots λ_1 ≥ λ_2 ≥ ... ≥ λ_n and q has roots µ_1 ≥ µ_2 ≥ ... ≥ µ_{n−1} that satisfy

λ_1 ≥ µ_1 ≥ λ_2 ≥ µ_2 ≥ ... ≥ λ_{n−1} ≥ µ_{n−1} ≥ λ_n.

We have seen two important examples of interlacing in this class so far. A real rooted polynomial and its derivative interlace. Similarly, the characteristic polynomial of a symmetric matrix and the characteristic polynomial of a principal submatrix interlace.

When p and q have the same degree, we also say that they interlace if their roots alternate. But, now there are two ways in which their roots can do so, depending on which polynomial has the largest root. If

p(x) = ∏_{i=1}^{n} (x − λ_i) and q(x) = ∏_{i=1}^{n} (x − µ_i),
we write q → p if p and q interlace and for every i the ith root of p is at least as large as the ith root of q. That is, if

λ_1 ≥ µ_1 ≥ λ_2 ≥ µ_2 ≥ ... ≥ λ_n ≥ µ_n.

Lemma 41.3.1. Let p and q be polynomials of degree n and n − 1 that interlace and have positive leading coefficients. For every t > 0, define p_t(x) = p(x) − t q(x). Then, p_t(x) is real rooted and

p(x) → p_t(x).

Proof Sketch. For simplicity, I consider the case in which all of the roots of p and q are distinct. One can prove the general case by dividing out the common repeated roots.

To see that the largest root of p_t is larger than λ_1, note that q(x) is positive for all x > µ_1, and λ_1 > µ_1. So, p_t(λ_1) = p(λ_1) − t q(λ_1) < 0. As p_t is monic, it is eventually positive and it must have a root larger than λ_1.

We will now show that for every i ≥ 1, p_t has a root between λ_{i+1} and λ_i. As this gives us n − 1 more roots, it accounts for all n roots of p_t. For i odd, we know that q(λ_i) > 0 and q(λ_{i+1}) < 0. As p is zero at both of these points, p_t(λ_i) > 0 and p_t(λ_{i+1}) < 0, which means that p_t has a root between λ_i and λ_{i+1}. The case of even i is similar.

The converse of this theorem is also true.

Lemma 41.3.2. Let p and q be polynomials of degree n and n − 1, and let p_t(x) = p(x) − t q(x). If p_t is real rooted for all t ∈ ℝ, then p and q interlace.

Proof Sketch. Recall that the roots of a polynomial are continuous functions of its coefficients, and thus the roots of p_t are continuous functions of t. We will use this fact to obtain a contradiction. For simplicity,¹ I again just consider the case in which all of the roots of p and q are distinct.

If p and q do not interlace, then p must have two roots that do not have a root of q between them. Let these roots of p be λ_{i+1} and λ_i. Assume, without loss of generality, that both p and q are positive between these roots. We now consider the behavior of p_t for positive t.

As we have assumed that the roots of p and q are distinct, q is positive at these roots, and so p_t is negative at λ_{i+1} and λ_i. If t is very small, then p_t will be close to p in value, and so there must be some small t_0 for which p_{t_0}(x) > 0 for some λ_{i+1} < x < λ_i. This means that p_{t_0} must have two roots between λ_{i+1} and λ_i.

As q is positive on the entire closed interval [λ_{i+1}, λ_i], when t is large p_t will be negative on this entire interval, and thus have no roots inside. As we vary t between t_0 and infinity, the two roots at t_0 must vary continuously and cannot cross λ_{i+1} or λ_i. This means that they must become complex, contradicting our assumption that p_t is always real rooted.

The following example will be critical.

¹I thank Sushant Sachdeva for helping me work out this particularly simple proof.

Lemma 41.3.3. Let A be an n-dimensional symmetric matrix and let v be a vector. Let

p_t(x) = χ_x(A + t v v^T).

Then there is a degree n − 1 polynomial q(x) so that

p_t(x) = χ_x(A) − t q(x).

Proof. Consider the case in which v = δ_1. It suffices to consider this case as determinants, and thus characteristic polynomials, are unchanged by multiplication by rotation matrices. Then, we know that

χ_x(A + t δ_1 δ_1^T) = det(xI − A − t δ_1 δ_1^T).

Now, the matrix t δ_1 δ_1^T is zeros everywhere except for the element t in the upper left entry. So,

det(xI − A − t δ_1 δ_1^T) = det(xI − A) − t det(xI^{(1)} − A^{(1)}) = χ_x(A) − t χ_x(A^{(1)}),

where A^{(1)} is the submatrix of A obtained by removing its first row and column.

We know that χ_x(A + t v v^T) is real rooted for all t, and we can easily show using the Courant-Fischer Theorem that for t > 0 it interlaces χ_x(A) from above. Lemmas 44.7.2 and 44.7.1 tell us that these facts imply each other.

We need one other fact about interlacing polynomials.

Lemma 41.3.4. Let p_0(x) and p_1(x) be two degree n monic polynomials for which there is a third polynomial r(x) that has the same degree as p_0 and p_1 and so that

p_0(x) → r(x) and p_1(x) → r(x).

Then for all 0 ≤ s ≤ 1,

p_s(x) := s p_1(x) + (1 − s) p_0(x)

is a real rooted polynomial.

Sketch. Assume for simplicity that all the roots of r are distinct. Let µ_1 > µ_2 > ... > µ_n be the roots of r. Our assumptions imply that both p_0 and p_1 are positive at µ_i for odd i and negative for even i. So, the same is true of their sum p_s. This tells us that p_s must have at least n − 1 real roots.

We can also show that p_s has a root that is less than µ_n. One way to do it is to recall that the complex roots of a polynomial with real coefficients come in conjugate pairs. So, p_s cannot have only one complex root.

Together, Lemmas 44.7.2 and 44.7.1 are known as Obreschkoff's Theorem [Obr63].
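Lemma 41.3.4 is easy to probe numerically. Below is a small sketch of my own, with made-up polynomials: it checks that convex combinations of two polynomials that both interlace a common r(x) stay real rooted, and contrasts this with the (x − 2)^2 + (x + 2)^2 example from the start of the chapter.

```python
import numpy as np

def is_real_rooted(coeffs, tol=1e-7):
    return np.all(np.abs(np.roots(coeffs).imag) < tol)

# r has roots 1 < 3 < 5; p0 and p1 each interlace r from below (p -> r).
r  = np.poly([1.0, 3.0, 5.0])
p0 = np.poly([0.5, 2.0, 4.0])
p1 = np.poly([0.9, 2.9, 4.9])

for s in np.linspace(0.0, 1.0, 11):
    ps = s * p1 + (1 - s) * p0
    assert is_real_rooted(ps), f"combination at s={s} is not real rooted"
print("all tested convex combinations of p0 and p1 are real rooted, as Lemma 41.3.4 predicts")

# Without a common interlacing, real rootedness can fail:
print(is_real_rooted(np.poly([2.0, 2.0]) + np.poly([-2.0, -2.0])))   # 2x^2 + 8 -> False
```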
Lemma 41.4.2. Let σ be such that for all symmetric matrices A and B,

p_x(A, B) := Σ_{Π∈S} σ(Π) χ_x(A + ΠBΠ^T)

is real rooted. Then, for every 0 < s < 1 and pair of vectors u and v, for every symmetric A and B the polynomial

(1 − s) p_x(A, B) + s p_x(A − uu^T + vv^T, B)

is real rooted.

41.5 Random Swaps

We will build a random permutation out of random swaps. A random swap is specified by coordinates i and j and a swap probability s. It is a random matrix that is equal to the identity with probability 1 − s and Γ_{i,j} with probability s. Let S be a random swap.

In the language of random swaps, we can express Corollary 41.5.1 as follows.
Corollary 41.5.1. Let Π be a random permutation matrix drawn from a distribution so that for all symmetric matrices A and B,

E χ_x(A + ΠBΠ^T)

is real rooted. Let S be a random swap. Then,

E χ_x(A + SΠBΠ^T S^T)

is real rooted for every symmetric A and B.

All that remains is to show that a uniform random permutation can be assembled out of random swaps. The trick to doing this is to choose the random swaps with swap probabilities other than 1/2. If you didn't do this, it would be impossible as there are n! permutations, which is not a power of 2.

Lemma 41.5.2. For every n, there exists a finite sequence of random swaps S_1, ..., S_k so that

S_1 S_2 ... S_k

is a uniform random permutation.

Proof. We prove this by induction. We can generate a random permutation on 1, ..., n by first choosing which item maps to n, and then generating a random permutation on those that remain. To this end, we first form a sequence that gives a random permutation on the first n − 1 elements. We then compose this with a random swap that exchanges elements 1 and n with probability 1 − 1/n. At this point, the element that maps to n will be uniformly random. We then compose with yet another sequence that gives a random permutation on the first n − 1 elements.

Chapter 42

Quadrature for the Finite Free Convolution

This Chapter Needs Editing

42.1 Overview

The material in today's lecture comes from [MSS15d] and [MSS15a]. My goal today is to prove simple analogs of the main quadrature results, and then give some indication of how the other quadrature statements are proved. I will also try to explain what led us to believe that these results should be true.

Recall that last lecture we considered the expected characteristic polynomial of a random matrix of the form A + ΠBΠ^T, where A and B are symmetric. We do not know a nice expression for this expected polynomial for general A and B. However, we will see that there is a very nice expression when A and B are Laplacian matrices or the adjacency matrices of regular graphs.

In Free Probability [Voi97], one studies operations on matrices in a large dimensional limit. These matrices are determined by the moments of their spectrum, and thus the operations are independent of the eigenvectors of the matrices. We consider a finite dimensional analog.

For n-dimensional symmetric matrices A and B, we consider the expected characteristic polynomial

E_{Q∈O(n)} χ_x(A + QBQ^T),

where O(n) is the group of n-by-n orthonormal matrices, and Q is a random orthonormal matrix chosen according to the Haar measure. In case you are not familiar with "Haar measure", I'll quickly explain the idea. It captures our most natural idea of a random orthonormal matrix. For
example, if A is a Gaussian random symmetric matrix, and V is its matrix of eigenvectors, then We describe this theorem as a quadrature result, because it obtains an integral over a continuous
V is a random orthonormal matrix chosen according to Haar measure. Formally, it is the space as a sum over a finite number of points.
measure that is invariant under group operations, which in this case are multiplication by
Before going in to the proof of the theorem, I would like to explain why one might think
orthnormal matrices. That is, the Haar measure is the measure under which for every S ⊆ O(n)
something like this could be true. The first answer is that it was a lucky guess. We hoped that
and P ∈ O(n), S has the same measure as {QP : Q ∈ S}.
this expectation would have a nice formula. The nicest possible formula would be a bi-linear map:
This expected characteristic polynomial does not depend on the eigenvectors of A and B, and a function that is linear in p when q is held fixed, and vice versa. So, we computed some examples
thus can be written as a function of the characteristic polynomials of these matrices. To see this, by holding B and q fixed and varying A. We then observed that the coefficients of the resulting
write A = V DV T and B = U C U T where U and V are the orthnormal eigenvectors matrices expected polynomial are in fact a linear functions of the coefficients of p. Once we knew this, it
and C and D are the diagonal matrices of eigenvalues. We have didn’t take too much work to guess the formula.
χx (V DV T + QU C U T Q T ) = χx (D + V T QU C U T Q T V ) = χx (D + (V T QU )C (V T QU )T ). I now describe the main quadrature result we will prove today. Let B(n) be the nth
hyperoctahedral group. This is the group of symmetries of the generalized octahedron in n
If Q is distributed according to the Haar measure on O(n), then so is V T QU . dimensions. It may be described as the set of matrices that can be written in the form DΠ, where
D is a diagonal matrix of ±1 entries and Π is a permutation. It looks like the family of
If p(x) and q(x) are the characteristic polynomials of A and B, then we define their finite free permutation matrices, except that both 1 and −1 are allowed as nonzero entries. B(n) is a
convolution to be the polynomial subgroup of O(n).
def
p(x) n q(x) = EQ∈O(n) χx (A + QBQ T ) . Theorem 42.2.3. For all symmetric matrices A and B,
T
T
In today’s lecture, we will establish the following formula for the finite free convolution. EQ∈O(n) χx (A + QBQ ) = EP∈B(n) χx (A + PBP ) .
Theorem 42.2.1. Let We will use this result to prove Theorem 42.2.1. The proof of Theorem 42.2.2 is similar to the
n
X n
X proof of Theorem 42.2.3. So, we will prove Theorem 42.2.3 and then explain the major differences.
p(x) = xn−i (−1)i ai and q(x) = xn−i (−1)i bi .
i=0 i=0
Then,
42.3 Quadrature
n
X X (n − i)!(n − j)!
p(x) n q(x) = xn−k (−1)k ai bj . (42.1) In general, quadrature formulas allow one to evaluate integrals of a family of functions over a
n!(n − i − j)!
k=0 i+j=k
fixed continuous domain by summing the values of those functions at a fixed number of points.
There is an intimate connection between families of orthogonal polynomials and quadrature
This convolution was studied by Walsh [Wal22], who proved that when p and q are real rooted, so formulae that we unfortunately do not have time to discuss.
is p n q.
The best known quadrature formula allows us to evaluate the integral of a polynomial around the
Our interest in the finite free convolution comes from the following theorem, whose proof we will unit circle in the complex plane. For a polynomial p(x) of degree less than n,
also sketch today.
Z n−1
2π
1X
Theorem 42.2.2. Let A and B be symmetric matrices with constant row sums. If A1 = a1 and p(eiθ )dθ = p(ω k ),
B1 = b1, we may write their characteristic polynomials as θ=0 n
k=0
χx (A) = (x − a)p(x) and χx (B) = (x − b)q(x). where ω = e2πi/n is a primitive nth root of unity.
We then have We may prove this result by establishing it separately for each monomial. For p(x) = xk with
T
EΠ∈Sn χx (A + ΠBΠ ) = (x − (a + b))(p(x) n−1 q(x)). k ̸= 0, Z 2π Z 2π
p(eiθ )dθ = eiθk dθ = 0.
We know that 1 is an eigenvector of eigenvalue a + b of A + ΠBΠT for every permutation matrix θ=0 θ=0
Π. Once we work orthogonal to this vector, we discover the finite free convolution.
And, for |k| < n, the corresponding sum is the sum of nth roots of unity distributed So, "Z # Z
symmetrically about the unit circle. So,
EP∈B(n) f (QP) = f (Q).
n−1 Q∈O(n) Q∈O(n)
X
jk
ω = 0. On the other hand, as B(n) is discrete we can reverse the order of integration to obtain
j=0
Z Z Z
We used this fact in the start of the semester when we computed the eigenvectors of the ring f (Q) = EP∈B(n) [f (QP)] = EP∈B(n) [f (P)] = EP∈B(n) [f (P)] ,
Q∈O(n) Q∈O(n) Q∈O(n)
graph and observed that all but the dominant are orthogonal to the all-1s vector.
where the second equality follows from Theorem 42.4.1.
On the other hand, for p(x) = 1 both the integral and sum are 1.
We will use an alternative approach to quadrature on groups, encapsulated by the following lemma.
P 42.5 Structure of the Orthogonal Group
Lemma 42.3.1. For every n and function p(x) = |k|<n ck xk , and every θ ∈ [0, 2π],
n
X n
X To prove Theorem 42.4.1, we need to know a little more about the orthogonal group. We divide
p(ei(2πj/n+θ) ) = p(ei(2πj/n) ). the orthonormal matrices into two types, those of determinant 1 and those of determinant −1.
j=0 j=0 The orthonormal matrices of determinant 1 form the special orthogonal group, SO(n), and every
matrix in O(n) may be written in the form DQ where Q ∈ SO(n) and D is a diagonal matrix in
This identity implies the quadrature formula above, and has the advantage that it can be which the first entry is ±1 and all others are 1. Every matrix in SO(n) may be expressed as a
experimentally confirmed by evaluating both sums for a random θ. product of 2-by-2 rotation matrices. That is, for every Q ∈ SO(n) there are matrices Q i,j for
1 ≤ i < j ≤ n so that Q i,j is a rotation in the span of δ i and δ j and so that
Proof. We again evaluate the sums monomial-by-monomial. For p(x) = xk , with |k| < n, we have
Q = Q 1,2 Q 1,3 · · · Q 1,n Q 2,3 · · · Q 2,n · · · Q n−1,n .
n
X n
X
i(2πj/n+θ) k iθk
(e ) =e (ei(2πj/n) )k . If you learned the QR-factorization of a matrix, then you learned an algorithm for computing this
j=0 j=0 decomposition.
For k ̸= 0, the latter sum is zero. For k = 0, eiθk = 1. These facts about the structure of O(n) tell us that it suffices to prove Theorem 42.4.1 for the
special cases in which Q = diag(−1, 1, 1, . . . , 1) and when Q is rotation of the plane spanned by δ i
and δ j . As the diagonal matrix is contained in B(n), the result is immediate in that case.
42.4 Quadrature by Invariance For simplicity, consider the case i = 1 and j = 2, and let Rθ denote the rotation by angle θ in the
first two coordinates:
cos θ sin θ 0
For symmetric matrices A and B, define the function def
Rθ = − sin θ cos θ 0 .
fA,B (Q) = det(A + QBQ T ). 0 0 I n−2
The hyperoctahedral group B(n) contains the matrices Rθ for θ ∈ {0, π/2, π, 3π/2}. As B(n) is a
We will derive Theorem 42.2.3 from the following theorem. group, for these θ we know
Theorem 42.4.1. For all Q ∈ O(n),
EP∈B(n) [f (P)] = EP∈B(n) [f (Rθ P)] ,
EP∈B(n) [f (P)] = EP∈B(n) [f (QP)] . as the set of matrices in the expectations are identical. This identity implies
3
Proof of Theorem 42.2.3. First, observe that it suffices to consider determinants. For every 1X
P ∈ B(n), we have EP∈B(n) fA,B (R2πj/4 P) = EP∈B(n) [f (P)] .
4
j=0
Z Z Z
det(A + QBQ T ) = f (Q) = f (QP).
Q∈O(n) Q∈O(n) Q∈O(n) We will prove the following lemma, and then show it implies Theorem 42.4.1.
Lemma 42.5.1. For every symmetric A and B, and every θ The term eiθ only appears in the first row and column of this matrix, and the term e−iθ only
appears in the second row and column. As a determinant can be expressed as a sum of products
3 3
1X 1X of matrix entries with one in each row and column, it is immediate that this determinant can be
fA,B (Rθ+2πj/4 ) = fA,B (R2πj/4 ).
4 4 expressed in terms of ekiθ for |k| ≤ 4. As each such product can have at most 2 terms of the form
j=0 j=0
eiθ and at most two of the form e−iθ , we have |k| ≤ 2.
This lemma implies that for every Q 1,2 ,
The difference between Theorem 42.2.3 and Theorem 42.2.2 is that the first involves a sum over
EP∈B(n) [f (P)] = EP∈B(n) f (Q 1,2 P) . the isometries of hyperoctahedron, while the second involves a sum over the symmetries of the
regular n-simplex in n − 1 dimensions. The proof of the appropriate quadrature theorem for the
This, in turn, implies Theorem 42.4.1 and thus Theorem 42.2.3. symmetries of the regular simplex is very similar to the proof we just saw, except that rotations of
the plane through δ i and δ j are replaced by rotations of the plane parallel to the affine subspace
We can use Lemma 42.3.1 to derive Lemma 42.5.1 follows from the following.
spanned by triples of vertices of the simplex.
Lemma 42.5.2. For every symmetric A and B, there exist c−2 , c−1 , c0 , c1 , c2 so that
2
X 42.6 The Formula
fA,B (Rθ ) = ck (eiθ )k .
k=−2
To establish the formula in Theorem 42.2.1, we observe that it suffices to compute the formula for
Proof. We need to express f (Rθ ) as a function of eiθ . To this end, recall that diagonal matrices, and that Theorem 42.2.3 makes this simple. Every matrix in B(n) can be
written as a product ΠD where D is a ±1 diagonal matrix. If B is the diagonal matrix with
1 −i iθ entries µ1 , . . . , µn , then ΠDBDΠT = ΠBΠT , which is the diagonal matrix with entries
cos θ = (eiθ + e−iθ ) and sin θ = (e − e−iθ ).
2 2 µπ(1) , . . . , µπ(n) , where π is the permutation corresponding to Π.
From these identities, we see that all two-by-two rotation matrices can be simultaneously Let A be diagonal with entries λ1 , . . . , λn . For a subset S of {1, . . . , n}, define
diagonalized by writing iθ Y
cos θ sin θ
=U
e 0
U ∗, λS = λi .
− sin θ cos θ 0 e −iθ
i∈S
Now, examine As opposed to expanding this out, let’s just figure out how often the product λS µT appears. We
must have |T | = n − |S|, and then this term appears for each permutation such that π(T ) ∩ S = ∅.
fA,B (Rθ ) = det(A + Rθ BR∗θ ) This happens 1/ ni fraction of the time, giving the formula
= det(A + U n D θ U ∗n BU n D ∗θ U ∗n ) n n n
X 1 X S X X 1 X i!(n − i)!
= det(U ∗n AU n + D θ U ∗n BU n D ∗θ ) cn = n
λ µT =
n ai bn−i = ai bn−i .
n!
= det(U ∗n AU n D θ + D θ U ∗n BU n ). i
i=0 |S|=i i
|T |=n−i i=0 i=0
For general ck and i + j = k, we see that λS and µT appear whenever µ(T ) is disjoint from S.
The probability of this happening is
n−i
j (n − i)!(n − j)!j! (n − i)!(n − j)!
n
= = ,
j
n!(n − i − j)!j! n!(n − i − j)!
and so
X (n − i)!(n − j)!
Chapter 43
ck = ai bj .
n!(n − i − j)!
i+j=k
42.7 Question
Ramanujan Graphs of Every Size
For which discrete subgroups of O(n) does a result like Theorem 42.2.3 hold? Can it hold for a
substantially smaller subgroup than the symmetries of the simplex (which has size (n + 1)! in n
dimensions). This Chapter Needs Editing
43.1 Overview
We will mostly prove that there are Ramanujan graphs of every number of vertices and degree.
The material in today’s lecture comes from [MSS15d] and [MSS15a]. In those papers, we prove
that for every even n and degree d < n there is a bipartite Ramanujan graph of degree d on n
vertices. A bipartite Ramanujan graph of degree d is an approximation of a complete bipartite
√ matrix thus has eigenvalues d and −d, and all other eigenvalues bounded in
graph. It’s adjacency
absolute value by 2 d − 1.
The difference between this result and that which we prove today is that we will show that for
every d√< n there is a d-regular (multi) graph in whose second adjacency matrix eigenvalue is at
most 2 d − 1. This bound is sufficient for many applications of expanders, but not all. We will
not control the magnitude of the negative eigenvalues. The reason will simply be for simplicity:
the proofs to bound the negative eigenvalues would take more lectures.
Next week we will see a different technique that won’t produce a multigraph and that will
produce a bipartite Ramanujan graph.
We will consider the sum of d random perfect matchings on n vertices. This produces a d-regular
graph that might be a multigraph. Friedman [Fri08] proves that such a graph is probably very
close to being Ramanujan if n is big enough relative to d. In particular, he proves that for all d
and ϵ > 0 there is an n0 so that for all n > n0√
, such a graph will probably have all eigenvalues
other than µ1 bounded in absolute value by 2 d − 1 + ϵ. We remove the asymptotics and the ϵ,
but merely prove the existence of one such graph. We do not estimate the probability with which
such a graph is Ramanujan. But, it is predicted to be a constant [?]. Lemma 43.3.1. Let p1 , . . . , pm be polynomials so that pi (x) → r(x), and let s1 , . . . , sm ≥ 0 be not
identically zero. Define
The fundamental difference between our technique and that of Friedman is that Friedman bounds m
X
the moments of the distribution of the eigenvalues of such a random graph. I suspect that there is p∅ (x) = si pi (x).
no true bound on these moments that would allow one to conclude that a random graph is i=1
probably Ramanujan. We consider the expected characteristic polynomial. Then, there is an i so that the largest root of pi (x) is at most the largest root of p∅ (x). In general,
for every j there is an i so that the jth largest root of pi (x) is at most the jth largest root of p∅ (x).
Let M be the adjacency matrix of a perfect matching, and let Π1 , . . . , Πd be independent uniform
random permutation matrices. We will consider the expected characteristic polynomial
Proof. We prove this for the largest root. The proof for the others is similar. Let λ1 and λ2 be
T T
EΠ1 ,...,Πd χx (Π1 M Π1 + · · · + Πd M Πd ) . the largest and second-largest roots of r(x). Each polynomial pi (x) has exactly one root between
λ1 and λ2 , and is positive at all x > λ1 . Now, let µ be the largest root of p∅ (x). We can see that
In Lecture 22, we learned that this polynomial is real rooted. In Lecture 23, we learned a µ must lie between λ1 and λ2 . We also know that
technique that allows us to compute√this polynomial. Today we will prove that the second largest X
root of this polynomial is at most 2 d − 1. First, we show why this matters: it implies that there pi (µ) = 0.
is some√choice of the matrices Π1 , . . . , Πd so that resulting polynomial has second largest root at i
most 2 d − 1. These matrices provide the desired graph. If pi (µ) = 0 for some i, then we are done. If not, there is an i for which pi (µ) > 0. As pi only has
one root larger than λ2 , and it is eventually positive, the largest root of pi must be less than µ.
43.3 Interlacing Families of Polynomials Our polynomials do not all have a common interlacing. However, they satisfy a property that is
just as useful: they form an interlacing family. We say that a set of polynomials p1 , . . . , pm forms
The general problem we face is the following. We have a large family of polynomials, say an interlacing family if there is a rooted tree T in which
p1 (x), . . . , pm (x), for which we know each pi is real-rooted and such that their sum is real rooted.
We would like to show that there is some polynomial pi whose largest root is at most the largest a. every leaf is labeled by some polynomial pi ,
root of the sum, or rather we want to do this for the second-largest root. This is not true in b. every internal vertex is labeled by a nonzero, nonnegative combination of its children, and
general. But, it is true in our case. We will show that it is true whenever the polynomials form
what we call an interlacing family. c. all siblings have a common interlacing.
Recall from Lecture 22 that we say that for monic degree n polynomials p(x) and r(x),
The last condition guarantees that every internal vertex is labeled by a real rooted polynomial.
p(x) → r(x) if the roots of p and r interlace, with the roots of r being larger. We proved that if
Note that the same label is allowed to appear at many leaves.
p1 (x) → r(x) and p2 (x) → r(x), then every convex combination of p1 and p2 is real rooted. If we
go through the proof, we will also see that for all 0 ≤ s ≤ 1, Lemma 43.3.2. Let p1 , . . . , pm be an interlacing family, let T be the tree witnessing this, and let
p∅ be the polynomial labeling the root of the tree. Then, for every j there exists an i for which the
sp2 (x) + (1 − s)p1 (x) → r(x). jth largest root of pi is at most the jth largest root of p∅ .
Proceeding by induction, we can show that if pi (x) → r(x) for each i, then every convex Proof. By Lemma 44.5.2, there is a child of the root whose label has a jth largest root that is
combination of these polynomials interlaces r(x), and is thus real rooted. That is, for every smaller than the jth largest root of p∅ . If that child is not a leaf, then we can proceed down the
s1 , . . . , sm so that si ≥ 0 (but not all are zero), tree until we reach a leaf, at each step finding a node labeled by a polynomial whose jth largest
X root is at most the jth largest root of the previous polynomial.
si pi (x) → r(x).
i
Our construction of permutations by sequences of random swaps provides the required interlacing
Polynomials that satisfy this condition are said to have a common interlacing. By a technique family.
analogous to the one we used to prove Lemma 22.3.2, one can prove that the polynomials
Theorem 43.3.3. For permutation matrices Π1 , . . . , Πd , let
p1 , . . . , pm have a common interlacing if and only if every convex combination of these
polynomials is real rooted. pΠ1 ,...,Πd (x) = χx (Π1 M ΠT1 + · · · + Πd M ΠTd ).
These polynomials form an interlacing family.
CHAPTER 43. RAMANUJAN GRAPHS OF EVERY SIZE 341 CHAPTER 43. RAMANUJAN GRAPHS OF EVERY SIZE 342
We will finish this lecture by proving that the second-largest root of For a real rooted polynomial p, and thus for real λ1 , . . . , λd , it is the value of x that is larger than
all the λi for which Gp (x) = w. For w = ∞, it is the largest root of p. But, it is larger for finite w.
E [pΠ1 ,...,Πd (x)]
√ We will prove the following bound on the Cauchy transforms.
is at most 2 d − 1. This implies that there is a d-regular
√ multigraph on n vertices in our family
Theorem 43.4.1. For degree n polynomials p and q and for w > 0,
with second-largest adjacency eigenvalue at most 2 d − 1.
Kp nq (w) ≤ Kp (w) + Kq (w) − 1/w.
43.4 Root Bounds for Finite Free Convolutions For w = ∞, this says that the largest root of p n q is at most the sum of the largest roots of p
and q. But, this is obvious.
Recall from the last lecture that for n-dimensional symmetric matrices A and B with uniform
To explain the 1/w term in the above expression, consider q(x) = xn . As this is the characteristic
row sums a and b and characteristic polynomials (x − a)p(x) and (x − b)q(x),
polynomial of the all-zero matrix, p n q = p(x). We have
T
EΠ χx (A + ΠBΠ ) = (x − (a + b))p(x) n−1 q(x). 1 nxn−1 1
Gq (x) = = .
This formula extends to sums of many such matrices. It is easy to show that n xn x
So,
def
χx (M ) = (x − 1)n/2 (x + 1)n/2 = (x − 1)p(x), where p(x) = (x − 1)n/2−1 (x + 1)n/2 . Kq (w) = max {x : 1/x = w} = 1/w.
So, Thus,
def Kq (w) − 1/w = 0.
p∅ (x) = E [pΠ1 ,...,Πd (x)] = (x − d) (p(x) n−1 p(x) n−1 p(x) n−1 ··· n−1 p(x)) ,
where p(x) appears d times above. I will defer the proof of this theorme to next lecture (or maybe the paper [MSS15a]), and now just
show how we use it.
We would like to prove a bound on the largest root of this polynomial in terms of the largest
roots of p(x). This effort turns out not to be productive. To see why, consider matrices A = aI
and B = bI . It is clear that A + ΠBΠT = (a + b)I for every Π. This tells us that
43.5 The Calculation
(x − a)n (x − b)n = (x − (a + b))n .
For p(x) = (x − 1)n/2−1 (x + 1)n/2 ,
So, the largest roots can add. This means that if we are going to obtain useful bounds on the
roots of the sum, we are going to need to exploit facts about the distribution of the roots of p(x). 1 n/2 − 1 n/2 1 n/2 n/2
Gp (x) = + ≤ + ,
As in Lecture ??, we will use the barrier functions, just scaled a little differently. n−1 x−1 x+1 n x−1 x+1
For, for x ≥ 1. This latter expression is simple to evaluate. It is
n
Y
p(x) = (x − λi ), x
= Gχ(M ) (x) .
i=1 x2 − 1
define the Cauchy transform of p at x to be We also see that
Kp (w) ≤ Kχ(M ) (w) ,
d
X
1 1 1 p′ (x)
Gp (x) = = . for all w ≥ 0.
d x − λi d p(x)
i=1
Theorem 43.4.1 tells us that
For those who are used to Cauchy transforms, I remark that this is the Cauchy transform of the d−1
uniform distribution on the roots of p(x). As we will be interested in upper bounds on the Kp n−1 ··· p (w) ≤ dKp (w) − .
w
Cauchy transform, we will want a number u so that for all x > u, Gp (x) is less than some
specified value. That is, we want the inverse Cauchy transform, which we define to be Using the above inequality, we see that this is at most
d−1
Kp (w) = max {x : Gp (x) = w} . dKχ(M ) (w) − .
w
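A small numpy sketch of the quantities just used (the helper name and the choice n = 10 are mine): it evaluates G_p for p(x) = (x − 1)^{n/2−1}(x + 1)^{n/2} and checks the inequality G_p(x) ≤ x/(x² − 1) = G_{χ(M)}(x) for x ≥ 1.

    import numpy as np

    def cauchy_transform(roots, x):
        """G_p(x) = (1/deg p) * sum_i 1/(x - lambda_i)."""
        roots = np.asarray(roots, dtype=float)
        return np.mean(1.0 / (x - roots))

    n = 10
    p_roots = [1.0] * (n // 2 - 1) + [-1.0] * (n // 2)   # roots of (x-1)^{n/2-1}(x+1)^{n/2}
    for x in np.linspace(1.2, 6.0, 20):
        Gp = cauchy_transform(p_roots, x)
        GchiM = x / (x**2 - 1)          # Cauchy transform of (x-1)^{n/2}(x+1)^{n/2}
        assert Gp <= GchiM + 1e-12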
As d·K_{χ(M)}(w) − (d−1)/w is an upper bound on the largest root of p ⊞_{n−1} · · · ⊞_{n−1} p, we wish to set w to minimize this expression. As

    G_{χ(M)}(x) = x/(x² − 1),

we have

    K_{χ(M)}(w) = x  if and only if  w = x/(x² − 1).

So,

    d·K_{χ(M)}(w) − (d−1)/w ≤ d·x − (d−1)(x² − 1)/x.

The choice of x that minimizes this is √(d−1), at which point it becomes

    d√(d−1) − (d−1)(d−2)/√(d−1) = d√(d−1) − (d−2)√(d−1) = 2√(d−1).

43.6 Some explanation of Theorem 43.4.1

I will not have time to go through the proof of Theorem 43.4.1. So, I'll just tell you a little about it. We begin by transforming statements about the inverse Cauchy transform into statements about the roots of polynomials.

As G_p(x) = (1/d) p′(x)/p(x),

    G_p(x) = w  ⟺  p(x) − (1/(wd)) p′(x) = 0.

This tells us that

    K_p(w) = maxroot( p(x) − p′(x)/(wd) ) = maxroot( (1 − (1/(wd)) ∂_x) p ).

As this sort of operator appears a lot in the proof, we give it a name:

    U_α = 1 − α ∂_x.

In this notation, Theorem 43.4.1 becomes

    maxroot( U_α (p ⊞_n q) ) ≤ maxroot( U_α p ) + maxroot( U_α q ) − nα.    (43.1)

From this, one can derive a formula that plays better with derivatives:

    p(x) ⊞_n q(x) = (1/n!) Σ_{i=0}^n (n − i)! b_i p^{(i)}(x),

where b_i is the coefficient of x^{n−i} in q(x). This equation allows us to understand what happens when p and q have different degrees.

Lemma 43.6.1. If p(x) has degree n and q(x) = x^{n−1}, then

    p(x) ⊞_n q(x) = (1/n) ∂_x p(x).

For the special case of q(x) = x^{n−1}, we have

    U_α q(x) = x^{n−1} − α(n−1) x^{n−2},

so

    maxroot( U_α q(x) ) = α(n−1).

So, in this case (43.1) says

    maxroot( U_α ∂_x p ) ≤ maxroot( U_α p ) + maxroot( U_α q ) − nα = maxroot( U_α p ) − α.

The proof of Theorem 43.4.1 has two major ingredients. We begin by proving the above inequality. We then show that the extreme case for the inequality is when q(x) = (x − b)^n for some b. To do this, we consider an arbitrary real rooted polynomial q, and then modify it to make two of its roots the same. This leads to an induction on degree, which is essentially handled by the following result.

Lemma 43.6.2. If p(x) has degree n and the degree of q(x) is less than n, then

    p ⊞_n q = (1/n) (∂_x p) ⊞_{n−1} q.

Its proof is fairly straightforward, and only requires 2 pages. Recall that our goal is to apply these bounds to the expected characteristic polynomials of sums such as

    A + Π_1 A Π_1^T + Π_2 A Π_2^T.
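The special case derived above, maxroot(U_α ∂_x p) ≤ maxroot(U_α p) − α, is easy to test numerically. The sketch below (my own, using numpy's polynomial routines on a random real-rooted p) does exactly that.

    import numpy as np

    def max_real_root(coeffs):
        """Largest real root of a polynomial given by its coefficient vector."""
        r = np.roots(coeffs)
        return max(z.real for z in r if abs(z.imag) < 1e-6)

    rng = np.random.default_rng(0)
    n, alpha = 8, 0.3
    p = np.poly(np.sort(rng.normal(size=n)))     # a real-rooted p of degree n
    dp = np.polyder(p)                           # p'

    Up  = np.polysub(p,  alpha * dp)                 # U_alpha p  = p  - alpha p'
    Udp = np.polysub(dp, alpha * np.polyder(dp))     # U_alpha p' = p' - alpha p''

    assert max_real_root(Udp) <= max_real_root(Up) - alpha + 1e-9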
Chapter 44

Bipartite Ramanujan Graphs

44.1 Overview

Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88] presented the first explicit constructions of infinite families of Ramanujan graphs. These had degrees p + 1, for primes p. There have been a few other explicit constructions [Piz90, Chi92, JL97, Mor94], all of which produce graphs of degree q + 1 for some prime power q. Over this lecture and the next we will prove the existence of infinite families of bipartite Ramanujan graphs of every degree. While today's proof of existence does not lend itself to an explicit construction, it is easier to understand than the presently known explicit constructions.

We think that much stronger results should be true. There is good reason to think that random d-regular graphs should be Ramanujan [MNS08]. And, Friedman [Fri08] showed that a random d-regular graph is almost Ramanujan: for sufficiently large n such a graph is a 2√(d−1) + ϵ approximation of the complete graph with high probability, for every ϵ > 0.

In today's lecture, we will use the method of interlacing families of polynomials to prove (half) a conjecture of Bilu and Linial [BL06]: every bipartite Ramanujan graph has a 2-lift that is also Ramanujan. This theorem comes from [MSS15b], but today's proof is informed by the techniques of [HPS15]. We will use theorems about the matching polynomials of graphs that we will prove next lecture.

In the same way that a Ramanujan graph approximates the complete graph, a bipartite Ramanujan graph approximates a complete bipartite graph. We say that a d-regular graph is a bipartite Ramanujan graph if all of its adjacency matrix eigenvalues, other than d and −d, have absolute value at most 2√(d−1). The eigenvalue of d is a consequence of being d-regular, and the eigenvalue of −d is a consequence of being bipartite. In particular, recall that the adjacency matrix eigenvalues of a bipartite graph are symmetric about the origin. This is a special case of the following claim, which you can prove when you have a sparse moment.
Claim 44.1.1. The eigenvalues of a symmetric matrix of the form

    [ 0    A ]
    [ A^T  0 ]

are symmetric about the origin.

We remark that one can derive bipartite Ramanujan graphs from ordinary Ramanujan graphs—just take the double cover. However, we do not know any way to derive ordinary Ramanujan graphs from the bipartite ones.

As opposed to reasoning directly about eigenvalues, we will work with characteristic polynomials. For a matrix M, we write its characteristic polynomial in the variable x as

    χ_x(M) := det(xI − M).

44.2 2-Lifts

We saw 2-lifts of graphs in Problem 3 from Problem Set 2:

    We define a signed adjacency matrix of G to be a symmetric matrix S with the same nonzero pattern as the adjacency matrix A, but such that each nonzero entry is either 1 or −1.

    We will use it to define a graph G_S. Like the double-cover, the graph G_S will have two vertices for every vertex of G and two edges for every edge of G. For each edge (u, v) ∈ E, if S(u, v) = −1 then G_S has the two edges

        (u_1, v_2) and (v_1, u_2),

    just like the double-cover. If S(u, v) = 1, then G_S has the two edges

        (u_1, v_1) and (u_2, v_2).

    You should check that G_{−A} is the double-cover of G and that G_A consists of two disjoint copies of G.

    Prove that the eigenvalues of the adjacency matrix of G_S are the union of the eigenvalues of A and the eigenvalues of S.

The graphs G_S that we form this way are called 2-lifts of G.

Bilu and Linial [BL06] conjectured that every d-regular graph G has a signed adjacency matrix S so that ∥S∥ ≤ 2√(d−1). This would give a simple procedure for constructing infinite families of Ramanujan graphs. We would begin with any small d-regular Ramanujan graph, such as the complete graph on d + 1 vertices. Then, given any d-regular Ramanujan graph we could construct a new Ramanujan graph on twice as many vertices by using G_S where ∥S∥ ≤ 2√(d−1).

Theorem 44.2.1. Every d-regular graph G has a signed adjacency matrix S for which the minimum eigenvalue of S is at least −2√(d−1).

We can use this theorem to build infinite families of bipartite Ramanujan graphs, because their eigenvalues are symmetric about the origin. Thus, if µ_n ≥ −2√(d−1), then we know that |µ_i| ≤ 2√(d−1) for all 1 < i < n. Note that every 2-lift of a bipartite graph is also a bipartite graph.

44.3 Random 2-Lifts

We will prove Theorem 44.2.1 by considering a random 2-lift. In particular, we consider the expected characteristic polynomial of a random signed adjacency matrix S:

    E_S [χ_x(S)].    (44.1)

Godsil and Gutman [GG81] proved that this is equal to the matching polynomial of G! We will learn more about the matching polynomial next lecture. For now, we just need the following bound on its zeros, which was proved by Heilmann and Lieb [HL72].

Theorem 44.3.1. The zeros of the matching polynomial of a graph of maximum degree at most d are real and have absolute value at most 2√(d−1).

Now that we know that the smallest zero of (44.1) is at least −2√(d−1), all we need to do is to show that there is some signed adjacency matrix whose smallest eigenvalue is at least this bound. This is not necessarily as easy as it sounds, because the smallest zero of the average of two polynomials is not necessarily related to the smallest zeros of those polynomials. We will show that, in this case, it is.

44.4 Laplacianized Polynomials

Instead of directly reasoning about the characteristic polynomials of signed adjacency matrices S, we will work with characteristic polynomials of dI − S. It suffices for us to prove that there exists an S for which the largest eigenvalue of dI − S is at most d + 2√(d−1).

Fix an ordering on the m edges of the graph, associate each S with a vector σ ∈ {±1}^m, and define

    p_σ(x) = χ_x(dI − S).

The expected polynomial is the average of all these polynomials.

We define two vectors for each edge in the graph. If the ith edge is (a, b), then we define

    v_{i,1} = δ_a − δ_b  and  v_{i,−1} = δ_a + δ_b.
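A small numerical check (mine; the choice of K_4 and all names are illustrative) of the identity stated next, namely that for any signing σ these rank-one terms sum to dI − S, and of the relation between the spectra of dI − S and S.

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 4, 3
    A = np.ones((n, n)) - np.eye(n)                 # K_4 is 3-regular
    edges = [(a, b) for a in range(n) for b in range(a + 1, n)]
    sigma = rng.choice([-1, 1], size=len(edges))

    S = np.zeros((n, n))
    M = np.zeros((n, n))
    for (a, b), s in zip(edges, sigma):
        S[a, b] = S[b, a] = s
        e_a, e_b = np.eye(n)[a], np.eye(n)[b]
        v = e_a - e_b if s == 1 else e_a + e_b      # v_{i,1} or v_{i,-1}
        M += np.outer(v, v)

    assert np.array_equal(np.abs(S), A)             # same nonzero pattern as A
    assert np.allclose(M, d * np.eye(n) - S)
    # The largest eigenvalue of dI - S is d minus the smallest eigenvalue of S.
    assert np.isclose(np.linalg.eigvalsh(d * np.eye(n) - S).max(),
                      d - np.linalg.eigvalsh(S).min())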
For every σ ∈ {±1}^m, we have

    Σ_{i=1}^m v_{i,σ_i} v_{i,σ_i}^T = dI − S,

where S is the signed adjacency matrix corresponding to σ. So, for every σ ∈ {±1}^m,

    p_σ(x) = χ_x( Σ_{i=1}^m v_{i,σ_i} v_{i,σ_i}^T ).

44.5 Interlacing

For a degree n polynomial p(x) with zeros λ_1 ≥ λ_2 ≥ · · · ≥ λ_n and a degree n polynomial r(x) with zeros µ_1 ≥ µ_2 ≥ · · · ≥ µ_n, we write r(x) → p(x) to indicate that

    µ_n ≤ λ_n ≤ µ_{n−1} ≤ · · · ≤ λ_2 ≤ µ_1 ≤ λ_1.

That is, if the zeros of p and r interlace, with the zeros of p being larger. We say that a degree n − 1 polynomial q(x) with zeros µ_1 ≥ · · · ≥ µ_{n−1} interlaces p(x) if λ_n ≤ µ_{n−1} ≤ λ_{n−1} ≤ · · · ≤ µ_1 ≤ λ_1. We also make these statements if they hold of positive multiples of p, r and q.

The following lemma gives the examples of interlacing polynomials that motivate us.

Lemma 44.5.1. Let A be a symmetric matrix and let v be a vector. For a real number t let

    p_t(x) = χ_x(A + t v v^T).

Then, for t > 0, p_0(x) → p_t(x) and there is a monic¹ degree n − 1 polynomial q(x) so that for all t

    p_t(x) = χ_x(A) − t q(x).

¹A monic polynomial is one whose leading coefficient is 1.

Proof. The fact that p_0(x) → p_t(x) for t > 0 follows from the Courant-Fischer Theorem.

We first establish the existence of q(x) in the case that v = δ_1. As the matrix t δ_1 δ_1^T is zeros everywhere except for the element t in the upper left entry, and the determinant is linear in each entry of the matrix,

    χ_x(A + t δ_1 δ_1^T) = det(xI − A − t δ_1 δ_1^T) = det(xI − A) − t det(xI^{(1)} − A^{(1)}) = χ_x(A) − t χ_x(A^{(1)}),

where A^{(1)} is the submatrix of A obtained by removing its first row and column. The polynomial q(x) = χ_x(A^{(1)}) has degree n − 1.

For arbitrary v, let Q be a rotation matrix for which Q v = δ_1. As determinants, and thus characteristic polynomials, are unchanged by multiplication by rotation matrices,

    χ_x(A + t v v^T) = χ_x(Q(A + t v v^T)Q^T) = χ_x(QAQ^T + t δ_1 δ_1^T) = χ_x(QAQ^T) − t q(x) = χ_x(A) − t q(x).

Lemma 44.5.2. Let p_1 and p_2 be monic degree n polynomials for which there is a polynomial r(x) with r(x) → p_1(x) and r(x) → p_2(x). Then the largest zero of at least one of p_1 and p_2 is at most the largest zero of (p_1(x) + p_2(x))/2.

Proof. Let µ_1 be the largest zero of r, and let λ be the largest zero of the average (p_1 + p_2)/2. Each p_i is eventually positive and has exactly one zero that is at least µ_1, so λ ≥ µ_1. As the average vanishes at λ, either p_i(λ) = 0 for some i, or p_1(λ) and p_2(λ) have opposite signs. If p_i(λ) = 0 for some i, then we are done. If not, there is an i for which p_i(λ) > 0. As p_i only has one zero larger than µ_1, and it is eventually positive, the largest zero of p_i must be less than λ.

If p_1, . . . , p_m are polynomials such that there exists an r(x) for which r(x) → p_i(x) for all i, then these polynomials are said to have a common interlacing. Such polynomials satisfy the natural generalization of Lemma 44.5.2.

The polynomials p_σ(x) do not all have a common interlacing. However, they satisfy a property that is just as useful: they form an interlacing family. Rather than defining these in general, we will just explain the special case we need for today's theorem.

We define polynomials that correspond to fixing the signs of the first k edges and then choosing the rest at random. We indicate these by shorter sequences σ ∈ {±1}^k. For k < m and σ ∈ {±1}^k we define

    p_σ(x) := E_{ρ∈{±1}^{m−k}} [p_{σ,ρ}(x)].

So,

    p_∅(x) = E_{σ∈{±1}^m} [p_σ(x)].

We view the strings σ, and thus the polynomials p_σ, as vertices in a complete binary tree. The nodes with σ of length m are the leaves, and ∅ corresponds to the root. For σ of length less than m, the children of σ are (σ, 1) and (σ, −1). We call such a pair of nodes siblings. We will eventually prove in Lemma 44.6.1 that all the polynomials p_σ(x) are real rooted and in Corollary 44.6.2 that every pair of siblings has a common interlacing.
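Lemma 44.5.1 is easy to check numerically. The sketch below (my own, with a random symmetric A and vector v) verifies both conclusions: the polynomial q(x) in χ_x(A + t v v^T) = χ_x(A) − t q(x) does not depend on t, and the zeros of χ_x(A) interlace those of χ_x(A + t v v^T) from below.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5
    B = rng.normal(size=(n, n))
    A = (B + B.T) / 2
    v = rng.normal(size=n)

    def char_poly(M):
        return np.poly(np.linalg.eigvalsh(M))       # coefficients of det(xI - M)

    p0 = char_poly(A)
    # q(x) = (chi_x(A) - chi_x(A + t v v^T)) / t should be the same for every t.
    q1 = (p0 - char_poly(A + 1.0 * np.outer(v, v))) / 1.0
    q2 = (p0 - char_poly(A + 2.5 * np.outer(v, v))) / 2.5
    assert np.allclose(q1, q2)

    # Interlacing (ascending order): lam[i] <= mu[i] <= lam[i+1].
    lam = np.linalg.eigvalsh(A)
    mu = np.linalg.eigvalsh(A + np.outer(v, v))
    assert np.all(lam <= mu + 1e-9) and np.all(mu[:-1] <= lam[1:] + 1e-9)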
But first, we show that this implies that there is a leaf indexed by σ ∈ {±1}^m for which

    λ_max(p_σ) ≤ λ_max(p_∅).

This implies Theorem 44.2.1, as we know from Theorem 44.3.1 that λ_max(p_∅) ≤ d + 2√(d−1).

Lemma 44.5.3. There is a σ ∈ {±1}^m for which

    λ_max(p_σ) ≤ λ_max(p_∅).

Proof. Corollary 44.6.2 and Lemma 44.5.2 imply that every non-leaf node in the tree has a child whose largest zero is at most the largest zero of that node. Starting at the root of the tree, we find a node whose largest zero is at most the largest zero of p_∅. We then proceed down the tree until we reach a leaf, at each step finding a node labeled by a polynomial whose largest zero is at most the largest zero of the previous polynomial. The leaf we reach, σ, satisfies the desired inequality.

44.6 Common Interlacings

We can now use Lemmas 44.5.1 and 44.5.2 to show that every σ ∈ {±1}^{m−1} has a child (σ, s) for which λ_max(p_{σ,s}) ≤ λ_max(p_σ). Let

    A = Σ_{i=1}^{m−1} v_{i,σ_i} v_{i,σ_i}^T.

The children of σ, (σ, 1) and (σ, −1), have polynomials p_{(σ,1)} and p_{(σ,−1)} that equal

    χ_x(A + v_{m,1} v_{m,1}^T)  and  χ_x(A + v_{m,−1} v_{m,−1}^T).

By Lemma 44.5.1, χ_x(A) → χ_x(A + v_{m,s} v_{m,s}^T) for s ∈ {±1}, and Lemma 44.5.2 implies that there is an s for which the largest zero of p_{(σ,s)} is at most the largest zero of their average, which is p_σ.

To extend this argument to nodes higher up in the tree, we will prove the following statement.

Lemma 44.6.1. Let A be a symmetric matrix and let w_{i,s} be vectors for 1 ≤ i ≤ k and s ∈ {0, 1}. Then the polynomial

    Σ_{ρ∈{0,1}^k} χ_x( A + Σ_{i=1}^k w_{i,ρ_i} w_{i,ρ_i}^T )

is real rooted, and for each s ∈ {0, 1},

    Σ_{ρ∈{0,1}^{k−1}} χ_x( A + Σ_{i=1}^{k−1} w_{i,ρ_i} w_{i,ρ_i}^T )  →  Σ_{ρ∈{0,1}^{k−1}} χ_x( A + Σ_{i=1}^{k−1} w_{i,ρ_i} w_{i,ρ_i}^T + w_{k,s} w_{k,s}^T ).

Corollary 44.6.2. For every k < m and σ ∈ {±1}^k, the polynomials p_{σ,s}(x) for s ∈ {±1} are real rooted and have a common interlacing.

44.7 Real Rootedness

To prove Lemma 44.6.1, we use the following two lemmas, which are known collectively as Obreschkoff's Theorem [Obr63].

Lemma 44.7.1. Let p and q be polynomials of degree n and n − 1, and let p_t(x) = p(x) − t q(x). If p_t is real rooted for all t ∈ ℝ, then q interlaces p.

Proof Sketch. Recall that the roots of a polynomial are continuous functions of its coefficients, and thus the roots of p_t are continuous functions of t. We will use this fact to obtain a contradiction.

For simplicity,² I just consider the case in which all of the roots of p and q are distinct. If they are not, one can prove this by dividing out their common divisors.

²I thank Sushant Sachdeva for helping me work out this particularly simple proof.

If p and q do not interlace, then p must have two roots that do not have a root of q between them. Let these roots of p be λ_{i+1} and λ_i. Assume, without loss of generality, that both p and q are positive between these roots. We now consider the behavior of p_t for positive t.

As we have assumed that the roots of p and q are distinct, q is positive at these roots, and so p_t is negative at λ_{i+1} and λ_i. If t is very small, then p_t will be close to p in value, and so there must be some small t_0 for which p_{t_0}(x) > 0 for some λ_{i+1} < x < λ_i. This means that p_{t_0} must have two roots between λ_{i+1} and λ_i.

As q is positive on the entire closed interval [λ_{i+1}, λ_i], when t is large p_t will be negative on this entire interval, and thus have no roots inside. As we vary t between t_0 and infinity, the two roots at t_0 must vary continuously and cannot cross λ_{i+1} or λ_i. This means that they must become complex, contradicting our assumption that p_t is always real rooted.

Lemma 44.7.2. Let p and q be polynomials of degree n and n − 1 that interlace and have positive leading coefficients. For every t > 0, define p_t(x) = p(x) − t q(x). Then, p_t(x) is real rooted and p(x) → p_t(x).

Proof Sketch. For simplicity, I consider the case in which all of the roots of p and q are distinct. One can prove the general case by dividing out the common repeated roots.

To see that the largest root of p_t is larger than λ_1, note that q(x) is positive for all x > µ_1, and λ_1 > µ_1. So, p_t(λ_1) = p(λ_1) − t q(λ_1) < 0. As p_t is monic, it is eventually positive and it must have a root larger than λ_1.

We will now show that for every i ≥ 1, p_t has a root between λ_{i+1} and λ_i. As this gives us n − 1 more roots, it accounts for all n roots of p_t. For i odd, we know that q(λ_i) > 0 and q(λ_{i+1}) < 0. As p is zero at both of these points, p_t(λ_i) < 0 and p_t(λ_{i+1}) > 0, which means that p_t has a root between λ_i and λ_{i+1}. The case of even i is similar.
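A quick numerical illustration of Lemma 44.7.2 (my own example): p and q below interlace, and for several t > 0 the polynomial p_t = p − t q stays real rooted with each of its roots at least the corresponding root of p.

    import numpy as np

    p = np.poly([1.0, 3.0, 5.0])        # roots 1, 3, 5
    q = np.poly([2.0, 4.0])             # roots 2, 4: q interlaces p

    for t in [0.1, 1.0, 10.0]:
        roots = np.sort(np.roots(np.polysub(p, t * q)))
        assert np.all(np.abs(roots.imag) < 1e-8)                 # real rooted
        # p(x) -> p_t(x): the roots of p_t dominate those of p termwise.
        assert np.all(np.array([1.0, 3.0, 5.0]) <= roots.real + 1e-9)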
Lemma 44.7.3. Let p_0(x) and p_1(x) be degree n monic polynomials for which there is a third polynomial r(x) such that

    r(x) → p_0(x)  and  r(x) → p_1(x).

Then

    r(x) → (1/2) p_0(x) + (1/2) p_1(x),

and the latter is a real rooted polynomial.

Sketch. Assume for simplicity that all the roots of r are distinct and different from the roots of p_0 and p_1. Let µ_n < µ_{n−1} < · · · < µ_1 be the roots of r. Our assumptions imply that both p_0 and p_1 are negative at µ_i for odd i and positive for even i. So, the same is true of their average. This tells us that their average must have at least n − 1 real roots between µ_n and µ_1. As their average is monic, it must be eventually positive and so must have a root larger than µ_1. That accounts for all n of its roots.

Proof of Lemma 44.6.1. We prove this by induction on k. Assuming that we have proved it for k − 1, we now prove it for k. Let u be any vector and let t ∈ ℝ. Define

    p_t(x) = Σ_{ρ∈{0,1}^{k−1}} χ_x( A + Σ_{i=1}^{k−1} w_{i,ρ_i} w_{i,ρ_i}^T + t u u^T ).

By Lemma 44.5.1, we can express this polynomial in the form

    p_t(x) = p_0(x) − t q(x),

where q has positive leading coefficient and degree n − 1. By absorbing t u u^T into A we may use induction on k to show that p_t(x) is real rooted for all t. Thus, Lemma 44.7.1 implies that q(x) interlaces p_0(x), and Lemma 44.7.2 tells us that for t > 0

    p_0(x) → p_t(x).

So, we may conclude that for every s ∈ {0, 1},

    Σ_{ρ∈{0,1}^{k−1}} χ_x( A + Σ_{i=1}^{k−1} w_{i,ρ_i} w_{i,ρ_i}^T )  →  Σ_{ρ∈{0,1}^{k−1}} χ_x( A + Σ_{i=1}^{k−1} w_{i,ρ_i} w_{i,ρ_i}^T + w_{k,s} w_{k,s}^T ).

So, Lemma 44.7.3 implies that

    Σ_{ρ∈{0,1}^{k−1}} χ_x( A + Σ_{i=1}^{k−1} w_{i,ρ_i} w_{i,ρ_i}^T )  →  Σ_{ρ∈{0,1}^k} χ_x( A + Σ_{i=1}^k w_{i,ρ_i} w_{i,ρ_i}^T )

and that the latter polynomial is real rooted.

The major open problem left by this work is establishing the existence of regular (non-bipartite) Ramanujan graphs. The reason we can not prove this using the techniques in this lecture is that the interlacing techniques only allow us to reason about the largest or smallest eigenvalue of a matrix, but not both.

To see related papers establishing the existence of Ramanujan graphs, see [MSS15d, HPS15]. For a survey on this and related material, see [MSS14].

Chapter 45

The Matching Polynomial

45.1 Overview

The coefficients of the matching polynomial of a graph count the numbers of matchings of various sizes in that graph. It was first defined by Heilmann and Lieb [HL72], who proved that it has some amazing properties, including that it is real rooted. They also proved that all roots of the matching polynomial of a graph of maximum degree d have absolute value at most 2√(d−1). In the next lecture, we will use this fact to derive the existence of Ramanujan graphs.

Our proofs today come from a different approach to the matching polynomial that appears in the work of Godsil [God93, God81]. My hope is that someone can exploit Godsil's approach to connect the 2√(d−1) bound from today's lecture with that from last lecture. In today's lecture, 2√(d−1) appears as an upper bound on the spectral radius of a d-ary tree. Infinite d-ary trees appear as the graphs of free groups in free probability. I feel like there must be a formal relation between these that I am missing.

45.2 The Matching Polynomial

A matching in a graph G = (V, E) is a subgraph of G in which every vertex has degree 1. We say that a matching has size k if it has k edges. We let

    m_k(G)

denote the number of matchings in G of size k. Throughout this lecture, we let |V| = n. Observe that m_1(G) is the number of edges in G, and that m_{n/2}(G) is the number of perfect matchings in G. By convention we set m_0(G) = 1, as the empty set is a matching with no edges. Computing the number of perfect matchings is a #P-hard problem. This means that it is much harder than solving NP-hard problems, so you shouldn't expect to do it quickly on large graphs.
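For small graphs one can compute the m_k(G) by brute force. The sketch below (my own, using itertools) counts matchings of each size in the 4-cycle.

    from itertools import combinations

    def matching_counts(n, edges):
        """m_k(G) for k = 0, 1, ..., obtained by enumerating edge subsets."""
        counts = [1]                                   # m_0 = 1: the empty matching
        for k in range(1, n // 2 + 1):
            m_k = 0
            for subset in combinations(edges, k):
                vertices = [v for e in subset for v in e]
                if len(set(vertices)) == 2 * k:        # no vertex used twice
                    m_k += 1
            counts.append(m_k)
        return counts

    # The 4-cycle: m_0 = 1, m_1 = 4 (its edges), m_2 = 2 (the perfect matchings).
    print(matching_counts(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))   # [1, 4, 2]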
The matching polynomial of G is defined to be

    µ_x[G] := Σ_{k≥0} (−1)^k m_k(G) x^{n−2k}.

This is a fundamental example of a polynomial that is defined so that its coefficients count something. When the “something” is interesting, the polynomial usually is as well.

45.3 Properties of the Matching Polynomial

Lemma 45.3.1. For graphs G and H on disjoint vertex sets,

    µ_x[G ∪ H] = µ_x[G] · µ_x[H].

Proof. Every matching in G ∪ H is the union of a matching in G and a matching in H. Thus,

    m_k(G ∪ H) = Σ_{j=0}^k m_j(G) m_{k−j}(H).

The lemma follows.

For a vertex a of G = (V, E), we write G − a for the graph G(V − {a}). This notation will prove very useful when reasoning about matching polynomials. Fix a vertex a of G, and divide the matchings in G into two classes: those that involve vertex a and those that do not. The number of matchings of size k that do not involve a is m_k(G − a). On the other hand, those that do involve a connect a to one of its neighbors. To count these, we enumerate the neighbors b of a. A matching of size k that includes edge (a, b) can be written as the union of (a, b) and a matching of size k − 1 in G − a − b. So, the number of matchings that involve a is

    Σ_{b∼a} m_{k−1}(G − a − b).

So,

    m_k(G) = m_k(G − a) + Σ_{b∼a} m_{k−1}(G − a − b).

To turn this into a recurrence for µ_x[G], write

    x^{n−2k} (−1)^k m_k(G) = x · x^{(n−1)−2k} (−1)^k m_k(G − a) − x^{(n−2)−2(k−1)} (−1)^{k−1} m_{k−1}(G − a − b).

Summing over k, and over the neighbors b of a in the last term, yields the following recurrence.

Lemma 45.3.2. For every vertex a of G,

    µ_x[G] = x · µ_x[G − a] − Σ_{b∼a} µ_x[G − a − b].

Theorem 45.3.3. Let G be a tree. Then

    µ_x[G] = χ_x(A_G).

Proof. Expand

    χ_x(A_G) = det(xI − A_G) = Σ_π sgn(π) ∏_a (xI − A_G)(a, π(a)),

where the sum is over permutations π of the vertices. We will prove that the only permutations that contribute to this sum are those for which π(π(a)) = a for every a. And, these correspond to matchings.

If π is a permutation for which there is an a so that π(π(a)) ≠ a, then there are a = a_1, . . . , a_k with k > 2 so that π(a_i) = a_{i+1} for 1 ≤ i < k, and π(a_k) = a_1. For this term to contribute, it must be the case that A_G(a_i, a_{i+1}) = 1 for all i, and that A_G(a_k, a_1) = 1. For k > 2, this would be a cycle of length k in G. However, G is a tree and so cannot have a cycle.

So, the only permutations that contribute are the involutions: the permutations π that are their own inverse. An involution has only fixed points and cycles of length 2. Each cycle of length 2 that contributes a nonzero term corresponds to an edge in the graph. Thus, the number of permutations with k cycles of length 2 is equal to the number of matchings with k edges. As the sign of an involution with k cycles of length 2 is (−1)^k, the coefficient of x^{n−2k} is (−1)^k m_k(G).

45.4 The Path Tree

Godsil proves that the matching polynomial of a graph is real rooted by proving that it divides the matching polynomial of a tree. As the matching polynomial of a tree is the same as the characteristic polynomial of its adjacency matrix, it is real rooted. Thus, the matching polynomial of the graph is as well. The tree that Godsil uses is the path tree of G starting at a vertex of G. For a vertex a of G, the path tree of G starting at a, written T_a(G), is a tree whose vertices correspond to paths in G that start at a and do not contain any vertex twice. One path is connected to another if one extends the other by one vertex. For example, here is a graph and its path tree starting at a.
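A small computational version of that example (my own sketch): the path tree can be built by enumerating paths directly. For the 4-cycle rooted at vertex 0 it has 1 + 2 + 2 + 2 = 7 vertices.

    def path_tree(adj, a):
        """Vertices of T_a(G): all paths from a with no repeated vertex,
        together with the edges from each path to its one-vertex extensions."""
        paths, edges = [(a,)], []
        frontier = [(a,)]
        while frontier:
            new_frontier = []
            for path in frontier:
                for w in adj[path[-1]]:
                    if w not in path:
                        ext = path + (w,)
                        paths.append(ext)
                        edges.append((path, ext))
                        new_frontier.append(ext)
            frontier = new_frontier
        return paths, edges

    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # the 4-cycle
    paths, tree_edges = path_tree(adj, 0)
    print(len(paths), len(tree_edges))                   # 7 vertices, 6 edges: a tree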
When G is a tree, T_a(G) is isomorphic to G.

Godsil's proof begins by deriving a somewhat strange equality. Since I haven't yet found a better proof, I'll take this route too.

Theorem 45.4.1. For every graph G and vertex a of G,

    µ_x[G − a] / µ_x[G] = µ_x[T_a(G) − a] / µ_x[T_a(G)].

The term on the upper-right hand side is a little odd. It is a forest obtained by removing the root of the tree T_a(G). We may write it as a disjoint union of trees as

    T_a(G) − a = ∪_{b∼a} T_b(G − a).

Proof. If G is a tree, then the left and right sides are identical, and so the equality holds. As the only graphs on less than 3 vertices are trees, the theorem holds for all graphs on at most 2 vertices. We will now prove it by induction on the number of vertices.

We may use Lemma 45.3.2 to expand the reciprocal of the left-hand side:

    µ_x[G] / µ_x[G − a] = ( x µ_x[G − a] − Σ_{b∼a} µ_x[G − a − b] ) / µ_x[G − a] = x − Σ_{b∼a} µ_x[G − a − b] / µ_x[G − a].

By applying the inductive hypothesis to G − a, we see that this equals

    x − Σ_{b∼a} µ_x[T_b(G − a) − b] / µ_x[T_b(G − a)].    (45.1)

To simplify this expression, we examine these graphs carefully. By the observation we made before the proof,

    T_b(G − a) − b = ∪_{c∼b, c≠a} T_c(G − a − b).

Similarly,

    T_a(G) − a = ∪_{c∼a} T_c(G − a),

which implies

    µ_x[T_a(G) − a] = ∏_{c∼a} µ_x[T_c(G − a)].

Let ab be the vertex in T_a(G) corresponding to the path from a to b. We also have

    T_a(G) − a − ab = ( ∪_{c∼a, c≠b} T_c(G − a) ) ∪ ( ∪_{c∼b, c≠a} T_c(G − a − b) )
                    = ( ∪_{c∼a, c≠b} T_c(G − a) ) ∪ ( T_b(G − a) − b ),

which implies

    µ_x[T_a(G) − a − ab] = ( ∏_{c∼a, c≠b} µ_x[T_c(G − a)] ) · µ_x[T_b(G − a) − b].

Thus,

    µ_x[T_a(G) − a − ab] / µ_x[T_a(G) − a] = ( ∏_{c∼a, c≠b} µ_x[T_c(G − a)] · µ_x[T_b(G − a) − b] ) / ∏_{c∼a} µ_x[T_c(G − a)] = µ_x[T_b(G − a) − b] / µ_x[T_b(G − a)].

Plugging this in to (45.1), we obtain

    µ_x[G] / µ_x[G − a] = x − Σ_{b∼a} µ_x[T_a(G) − a − ab] / µ_x[T_a(G) − a]
                        = ( x µ_x[T_a(G) − a] − Σ_{b∼a} µ_x[T_a(G) − a − ab] ) / µ_x[T_a(G) − a]
                        = µ_x[T_a(G)] / µ_x[T_a(G) − a].

We obtain the equality claimed in the theorem by taking the reciprocals of both sides.

Theorem 45.4.2. For every vertex a of G, the polynomial µ_x[G] divides the polynomial µ_x[T_a(G)].

Proof. We again prove this by induction on the number of vertices in G, using as our base case graphs with at most 2 vertices. We then know, by induction, that for b ∼ a,

    µ_x[G − a] divides µ_x[T_b(G − a)].

As

    T_a(G) − a = ∪_{b∼a} T_b(G − a),

    µ_x[T_b(G − a)] divides µ_x[T_a(G) − a].

Thus,

    µ_x[G − a] divides µ_x[T_a(G) − a],
and so

    µ_x[T_a(G) − a] / µ_x[G − a]

is a polynomial in x. To finish the proof, we apply Theorem 45.4.1, which implies

    µ_x[T_a(G)] = µ_x[G] · ( µ_x[T_a(G) − a] / µ_x[G − a] ).

As the second factor is a polynomial in x, µ_x[G] divides µ_x[T_a(G)].

45.5 Root bounds

If every vertex of G has degree at most d, then the same is true of T_a(G). We showed in Theorem 4.2.4 that every eigenvalue of a tree of maximum degree d is at most 2√(d−1). When combined with Theorem 45.4.2, this tells us that the matching polynomial of a graph with maximum degree at most d has all of its roots bounded in absolute value by 2√(d−1).

Bibliography

[ABN+92] Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509–516, March 1992. 68, 249

[ABN08] I. Abraham, Y. Bartal, and O. Neiman. Nearly tight low stretch spanning trees. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science, pages 781–790, Oct. 2008. 291

[AC88] Noga Alon and Fan Chung. Explicit construction of linear sized tolerant networks. Discrete Mathematics, 72:15–19, 1988. 218

[AH77a] Kenneth Appel and Wolfgang Haken. Every planar map is four colorable part i. discharging. Illinois Journal of Mathematics, 21:429–490, 1977. 161

[AH77b] Kenneth Appel and Wolfgang Haken. Every planar map is four colorable part ii. reducibility. Illinois Journal of Mathematics, 21:491–567, 1977. 161

[AKPW95] Noga Alon, Richard M. Karp, David Peleg, and Douglas West. A graph-theoretic game and its application to the k-server problem. SIAM Journal on Computing, 24(1):78–100, February 1995. 288

[AKV02] Noga Alon, Michael Krivelevich, and Van H. Vu. On the concentration of eigenvalues of random symmetric matrices. Israel Journal of Mathematics, 131(1):259–267, 2002. 189

[AM85] Noga Alon and V. D. Milman. λ1, isoperimetric inequalities for graphs, and superconcentrators. J. Comb. Theory, Ser. B, 38(1):73–88, 1985. 173

[AN12] Ittai Abraham and Ofer Neiman. Using petal-decompositions to build a low stretch spanning tree. In Proceedings of the 44th Annual ACM Symposium on the Theory of Computing (STOC '12), pages 395–406, 2012. 288

[AR94] Noga Alon and Yuval Roichman. Random cayley graphs and expanders. Random Structures & Algorithms, 5(2):271–284, 1994. 66
[AS06] A. Ashikhmin and V. Skachek. Decoding of expander codes at rates close to capacity. [BLR10] P. Biswal, J. Lee, and S. Rao. Eigenvalue bounds, spectral partitioning, and metrical
IEEE Transactions on Information Theory, 52(12):5475–5485, Dec. 2006. 239 deformations via flows. Journal of the ACM, 2010. to appear. 207
[AW02] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels. [BMS93] Richard Beigel, Grigorii Margulis, and Daniel A. Spielman. Fault diagnosis in a small
Information Theory, IEEE Transactions on, 48(3):569–579, 2002. 256 constant number of parallel testing rounds. In SPAA ’93: Proceedings of the fifth
annual ACM symposium on Parallel algorithms and architectures, pages 21–29, New
[AZLO15] Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and York, NY, USA, 1993. ACM. 218
regret minimization beyond matrix multiplicative updates. In Proceedings of the
Forty-Seventh Annual ACM on Symposium on Theory of Computing, pages 237–245. [Bol86] Béla Bollobás. Combinatorics: set systems, hypergraphs, families of vectors, and
ACM, 2015. 268 combinatorial probability. Cambridge University Press, 1986. 48
[Bab79] László Babai. Spectra of cayley graphs. Journal of Combinatorial Theory, Series B, [BR97] R. B. Bapat and T. E. S. Raghavan. Nonnegative Matrices and Applications.
pages 180–189, 1979. 68 Number 64 in Encyclopedia of Mathematics and its Applications. Cambridge
University Press, 1997. 42
[Bab80] László Babai. On the complexity of canonical labeling of strongly regular graphs.
SIAM Journal on Computing, 9(1):212–216, 1980. 318 [BSS12] Joshua Batson, Daniel A Spielman, and Nikhil Srivastava. Twice-Ramanujan
sparsifiers. SIAM Journal on Computing, 41(6):1704–1721, 2012. 219, 261
[Bab81] László Babai. Moderately exponential bound for graph isomorphism. In
Fundamentals of Computation Theory, number 117 in Lecture Notes in Math, pages [BSS14] Joshua Batson, Daniel A Spielman, and Nikhil Srivastava. Twice-ramanujan
34–50. Springer-Verlag, Berlin-Heidelberg-New York, 1981. 304 sparsifiers. SIAM Review, 56(2):315–334, 2014. 261
[Bab16] László Babai. Graph isomorphism in quasipolynomial time. In Proceedings of the [BW13] László Babai and John Wilmes. Quasipolynomial-time canonical form for steiner
forty-eighth annual ACM symposium on Theory of Computing, pages 684–697. ACM, designs. In Proceedings of the forty-fifth annual ACM symposium on Theory of
2016. 304, 321 computing, pages 261–270. ACM, 2013. 321
[Bar82] Earl R. Barnes. An algorithm for partitioning the nodes of a graph. SIAM Journal [BZ02] A. Barg and G. Zemor. Error exponents of expander codes. IEEE Transactions on
on Algebraic and Discrete Methods, 3(4):541–550, 1982. 201 Information Theory, 48(6):1725–1729, Jun 2002. 239
[BCS+ 13] László Babai, Xi Chen, Xiaorui Sun, Shang-Hua Teng, and John Wilmes. Faster [BZ05] A. Barg and G. Zemor. Concatenated codes: serial and parallel. IEEE Transactions
canonical forms for strongly regular graphs. In 2013 IEEE 54th Annual Symposium on Information Theory, 51(5):1625–1634, May 2005. 239
on Foundations of Computer Science, pages 157–166. IEEE, 2013. 321
[BZ06] A. Barg and G. Zemor. Distance properties of expander codes. IEEE Transactions on
[BGM82] László Babai, D Yu Grigoryev, and David M Mount. Isomorphism of graphs with Information Theory, 52(1):78–90, Jan. 2006. 239
bounded eigenvalue multiplicity. In Proceedings of the fourteenth annual ACM
symposium on Theory of computing, pages 310–324. ACM, 1982. 13, 304 [BZ13] Nick Bridle and Xiaojin Zhu. p-voltages: Laplacian regularization for semi-supervised
learning on high-dimensional data. In Eleventh Workshop on Mining and Learning
[BH01] Erik Boman and B. Hendrickson. On spanning tree preconditioners. Manuscript, with Graphs (MLG2013), 2013. 154
Sandia National Lab., 2001. 289
[CCL+ 15] Dehua Cheng, Yu Cheng, Yan Liu, Richard Peng, and Shang-Hua Teng. Efficient
[BL83] László Babai and Eugene M Luks. Canonical labeling of graphs. In Proceedings of the sampling for gaussian graphical models via spectral sparsification. In Peter Grünwald,
fifteenth annual ACM symposium on Theory of computing, pages 171–183. ACM, Elad Hazan, and Satyen Kale, editors, Proceedings of The 28th Conference on
1983. 304 Learning Theory, volume 40 of Proceedings of Machine Learning Research, pages
364–390, Paris, France, 03–06 Jul 2015. PMLR. 128
[BL06] Yonatan Bilu and Nathan Linial. Lifts, discrepancy and nearly optimal spectral gap*.
Combinatorica, 26(5):495–519, 2006. 346, 347 [CFM94] F. R. K. Chung, V. Faber, and T. A. Manteuffel. On the diameter of a graph from
eigenvalues associated with its Laplacian. SIAM Journal on Discrete Mathematics,
[BLM15] Charles Bordenave, Marc Lelarge, and Laurent Massoulié. Non-backtracking 7:443–457, 1994. 283
spectrum of random graphs: community detection and non-regular ramanujan
graphs. arXiv preprint arXiv:1501.06087, 2015. 186
[CGP+ 18] Timothy Chu, Yu Gao, Richard Peng, Sushant Sachdeva, Saurabh Sawlani, and [DS91] Persi Diaconis and Daniel Stroock. Geometric bounds for eigenvalues of Markov
Junxing Wang. Graph sparsification, spectral sketches, and faster resistance chains. The Annals of Applied Probability, 1(1):36–61, 1991. 53
computation, via short cycle decompositions. arXiv preprint arXiv:1805.12051, 2018.
[Duf47] R. J. Duffin. Nonlinear networks. IIa. Bull. Amer. Math. Soc, 53:963–971, 1947. 152
260
[dV90] Colin de Verdière. Sur un nouvel invariant des graphes et un critère de planarité. J.
[CGW89] F. R. K. Chung, R. L. Graham, and R. M. Wilson. Quasi-random graphs.
Combin. Theory Ser. B, 50:11–21, 1990. 209
Combinatorica, 9(4):345–362, 1989. 64
[EEST08] Michael Elkin, Yuval Emek, Daniel A. Spielman, and Shang-Hua Teng. Lower-stretch
[CH91] Joel E Cohen and Paul Horowitz. Paradoxical behaviour of mechanical and electrical
spanning trees. SIAM Journal on Computing, 32(2):608–628, 2008. 291
networks. 1991. 147
[Eli55] Peter Elias. Coding for noisy channels. IRE Conv. Rec., 3:37–46, 1955. 229
[Che70] J. Cheeger. A lower bound for smallest eigenvalue of the Laplacian. In Problems in
Analysis, pages 195–199, Princeton University Press, 1970. 173 [Erd47] Paul Erdös. Some remarks on the theory of graphs. Bulletin of the American
Mathematical Society, 53(4):292–294, 1947. 164
[Chi92] Patrick Chiu. Cubic Ramanujan graphs. Combinatorica, 12(3):275–285, 1992. 346
[Fan49] Ky Fan. On a theorem of weyl concerning eigenvalues of linear transformations i.
[Chu97] F. R. K. Chung. Spectral Graph Theory. American Mathematical Society, 1997. 171
Proceedings of the National Academy of Sciences of the United States of America,
[CKM+ 14] Michael B. Cohen, Rasmus Kyng, Gary L. Miller, Jakub W. Pachocki, Richard Peng, 35(11):652, 1949. 32
Anup B. Rao, and Shen Chen Xu. Solving sdd linear systems in nearly mlog1/2n
[Fie73] M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal,
time. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing,
23(98):298–305, 1973. 16
STOC ’14, pages 343–352, New York, NY, USA, 2014. ACM. 290, 303
[Fie75a] M. Fiedler. Eigenvectors of acyclic matices. Czechoslovak Mathematical Journal,
[CLM+ 15] Michael B Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng,
25(100):607–618, 1975. 196
and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of
the 2015 Conference on Innovations in Theoretical Computer Science, pages 181–190. [Fie75b] M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its
ACM, 2015. 260 applications to graph theory. Czechoslovak Mathematical Journal, 25(100):618–633,
1975. 199
[CZ19] Eshan Chattopadhyay and David Zuckerman. Explicit two-source extractors and
resilient functions. Annals of Mathematics, 189(3):653 – 705, 2019. 164 [FK81] Z. Füredi and J. Komlós. The eigenvalues of random symmetric matrices.
Combinatorica, 1(3):233–241, 1981. 74, 81
[dCSHS11] Marcel K. de Carli Silva, Nicholas J. A. Harvey, and Cristiane M. Sato. Sparse sums
of positive semidefinite matrices. CoRR, abs/1107.0088, 2011. 268 [FK98] Uriel Feige and Joe Kilian. Zero knowledge and the chromatic number. Journal of
Computer and System Sciences, 57(2):187–199, 1998. 162
[DH72] W. E. Donath and A. J. Hoffman. Algorithms for partitioning graphs and computer
logic based on eigenvectors of connection matrices. IBM Technical Disclosure [Fri08] Joel Friedman. A Proof of Alon’s Second Eigenvalue Conjecture and Related
Bulletin, 15(3):938–944, 1972. 201 Problems. Number 910 in Memoirs of the American Mathematical Society. American
Mathematical Society, 2008. 218, 338, 346
[DH73] W. E. Donath and A. J. Hoffman. Lower bounds for the partitioning of graphs. IBM
Journal of Research and Development, 17(5):420–425, September 1973. 201 [Fro12] Georg Frobenius. Über matrizen aus nicht negativen elementen. 1912. 39
[DK70] Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a [Gal63] R. G. Gallager. Low Density Parity-Check Codes. MIT Press, Cambridge, MA, 1963.
perturbation. iii. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970. 189 238
[DKMZ11] Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová. [Gee12] Jim Geelen. On how to draw a graph.
Asymptotic analysis of the stochastic block model for modular networks and its https://www.math.uwaterloo.ca/∼jfgeelen/Publications/tutte.pdf, 2012. 138
algorithmic applications. Physical Review E, 84(6):066106, 2011. 186
[GG81] C. D. Godsil and I. Gutman. On the matching polynomial of a graph. In L. Lovász
[Dod84] Jozef Dodziuk. Difference equations, isoperimetric inequality and transience of and Vera T. Sós, editors, Algebraic Methods in graph theory, volume I of Colloquia
certain random walks. Transactions of the American Mathematical Society, Mathematica Societatis János Bolyai, 25, pages 241–249. János Bolyai Mathematical
284(2):787–794, 1984. 173 Society, 1981. 348
[GGT06] Steven J Gortler, Craig Gotsman, and Dylan Thurston. Discrete one-forms on meshes [IZ89] R. Impagliazzo and D. Zuckerman. How to recycle random bits. In 30th annual IEEE
and applications to 3d mesh parameterization. Computer Aided Geometric Design, Symposium on Foundations of Computer Science, pages 248–253, 1989. 248
23(2):83–112, 2006. 138
[JL84] William B Johnson and Joram Lindenstrauss. Extensions of lipschitz mappings into a
[Gil98] David Gillman. A chernoff bound for random walks on expander graphs. SIAM hilbert space. Contemporary mathematics, 26(189-206):1, 1984. 127
Journal on Computing, 27(4):1203–1220, 1998. 253
[JL97] Bruce W Jordan and Ron Livné. Ramanujan local systems on graphs. Topology,
[GLM99] S. Guattery, T. Leighton, and G. L. Miller. The path resistance method for bounding 36(5):1007–1024, 1997. 346
the smallest nontrivial eigenvalue of a Laplacian. Combinatorics, Probability and
Computing, 8:441–460, 1999. 53 [Kar72] Richard M. Karp. Reducibility among combinatorial problems. In Complexity of
Computer Computations: Proceedings of a symposium on the Complexity of
[GLSS18] Ankit Garg, Yin Tat Lee, Zhao Song, and Nikhil Srivastava. A matrix expander Computer Computations, held March 20–22, 1972, at the IBM Thomas J. Watson
chernoff bound. In Proceedings of the 50th Annual ACM SIGACT Symposium on Research Center, Yorktown Heights, New York, pages 85–103, Boston, MA, 1972.
Theory of Computing, pages 1102–1114. ACM, 2018. 253 Springer US. 161
[GN08] C.D. Godsil and M.W. Newman. Eigenvalue bounds for independent sets. Journal of [Kel06] Jonathan A. Kelner. Spectral partitioning, eigenvalue bounds, and circle packings for
Combinatorial Theory, Series B, 98(4):721 – 734, 2008. 162 graphs of bounded genus. SIAM J. Comput., 35(4):882–902, 2006. 207
[God81] C. D. Godsil. Matchings and walks in graphs. Journal of Graph Theory, [KLL16] Tsz Chiu Kwok, Lap Chi Lau, and Yin Tat Lee. Improved Cheeger’s inequality and
5(3):285–297, 1981. 354 analysis of local graph partitioning using vertex expansion and expansion profile. In
Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete
[God93] Chris Godsil. Algebraic Combinatorics. Chapman & Hall, 1993. 52, 354
algorithms, pages 1848–1861. Society for Industrial and Applied Mathematics, 2016.
[Gol07] Oded Goldreich. Foundations of cryptography: volume 1, basic tools. Cambridge 178
university press, 2007. 248
[KLP12] Ioannis Koutis, Alex Levin, and Richard Peng. Improved spectral sparsification and
[Hal70] K. M. Hall. An r-dimensional quadratic placement algorithm. Management Science, numerical algorithms for sdd matrices. In STACS’12 (29th Symposium on Theoretical
17:219–229, 1970. 31 Aspects of Computer Science), volume 14, pages 266–277. LIPIcs, 2012. 260
[Har76] Sergiu Hart. A note on the edges of the n-cube. Discrete Mathematics, 14(2):157–163, [KLP+ 16] Rasmus Kyng, Yin Tat Lee, Richard Peng, Sushant Sachdeva, and Daniel A
1976. 48 Spielman. Sparsified cholesky and multigrid solvers for connection laplacians. In
Proceedings of the forty-eighth annual ACM symposium on Theory of Computing,
[Hås99] Johan Håstad. Clique is hard to approximate within n1−ϵ . Acta Mathematica, pages 842–850. ACM, 2016. 290
182(1):105 – 142, 1999. 162
[KLPT09] Jonathan A. Kelner, James Lee, Gregory Price, and Shang-Hua Teng. Higher
[HILL99] Johan Håstad, Russell Impagliazzo, Leonid A. Levin, and Michael Luby. A
eigenvalues of graphs. In Proceedings of the 50th IEEE Symposium on Foundations of
pseudorandom generator from any one-way function. SIAM Journal on Computing,
Computer Science, 2009. 207
28(4):1364–1396, 1999. 248, 249
[KMP10] I. Koutis, G.L. Miller, and R. Peng. Approaching optimality for solving sdd linear
[HL72] Ole J Heilmann and Elliott H Lieb. Theory of monomer-dimer systems.
systems. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE
Communications in Mathematical Physics, 25(3):190–232, 1972. 348, 354
Symposium on, pages 235 –244, 2010. 292
[HL09] Mark Herbster and Guy Lever. Predicting the labelling of a graph via minimum
[KMP11] I. Koutis, G.L. Miller, and R. Peng. A nearly-mlogn time solver for sdd linear
p-seminorm interpolation. In Proceedings of the 2009 Conference on Learning Theory
systems. In Foundations of Computer Science (FOCS), 2011 52nd Annual IEEE
(COLT), 2009. 154
Symposium on, pages 590–598, 2011. 290, 292
[Hof70] A. J. Hoffman. On eigenvalues and colorings of graphs. In Graph Theory and its
[Kou14] Ioannis Koutis. Simple parallel and distributed algorithms for spectral graph
Applications, pages 79–92. Academic Press, New York, 1970. 162, 164
sparsification. In Proceedings of the 26th ACM Symposium on Parallelism in
[HPS15] Chris Hall, Doron Puder, and William F Sawin. Ramanujan coverings of graphs. Algorithms and Architectures, SPAA ’14, pages 61–66, New York, NY, USA, 2014.
arXiv preprint arXiv:1506.02335, 2015. 323, 346, 353 ACM. 260
[KS16] Rasmus Kyng and Sushant Sachdeva. Approximate gaussian elimination for [LSY18] Yang P Liu, Sushant Sachdeva, and Zejun Yu. Short cycles via low-diameter
laplacians-fast, sparse, and simple. In Foundations of Computer Science (FOCS), decompositions. arXiv preprint arXiv:1810.05143, 2018. 260
2016 IEEE 57th Annual Symposium on, pages 573–582. IEEE, 2016. 290
[LV99] László Lovász and Katalin Vesztergombi. Geometric representations of graphs. Paul
[Kur30] Casimir Kuratowski. Sur le probleme des courbes gauches en topologie. Fundamenta Erdos and his Mathematics, 1999. 138
mathematicae, 15(1):271–283, 1930. 133
[Mar88] G. A. Margulis. Explicit group theoretical constructions of combinatorial schemes
[LM82] F. Tom Leighton and Gary Miller. Certificates for graphs with distinct eigenvalues. and their application to the design of expanders and concentrators. Problems of
Manuscript, 1982. 304 Information Transmission, 24(1):39–46, July 1988. 18, 67, 221, 249, 346
[LM00] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional [Mas14] Laurent Massoulié. Community detection thresholds and the weak ramanujan
by model selection. Annals of Statistics, pages 1302–1338, 2000. 129 property. In Proceedings of the 46th Annual ACM Symposium on Theory of
Computing, pages 694–703. ACM, 2014. 186
[LMSS01] Michael G Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, and Daniel A
Spielman. Efficient erasure correcting codes. IEEE Transactions on Information [McS01] F. McSherry. Spectral partitioning of random graphs. In FOCS ’01: Proceedings of
Theory, 47(2):569–584, 2001. 239 the 42nd IEEE symposium on Foundations of Computer Science, page 529, 2001. 186
[Lov01] Làszlò Lovàsz. Steinitz representations of polyhedra and the Colin de Verdière [Mil51] William Millar. Cxvi. some general theorems for non-linear systems possessing
number. Journal of Combinatorial Theory, Series B, 82(2):223 – 236, 2001. 15, 210 resistance. Philosophical Magazine, 42(333):1150–1160, 1951. 155
[LPS88] A. Lubotzky, R. Phillips, and P. Sarnak. Ramanujan graphs. Combinatorica, [MNS08] Steven J. Miller, Tim Novikoff, and Anthony Sabelli. The distribution of the largest
8(3):261–277, 1988. 18, 67, 221, 249, 346 nontrivial eigenvalues in families of random regular graphs. Experiment. Math.,
17(2):231–244, 2008. 346
[LPS15] Yin Tat Lee, Richard Peng, and Daniel A. Spielman. Sparsified cholesky solvers for
SDD linear systems. CoRR, abs/1506.08204, 2015. 260, 303 [MNS14] Elchanan Mossel, Joe Neeman, and Allan Sly. Belief propagation, robust
reconstruction and optimal recovery of block models. In Proceedings of The 27th
[LR97] John D. Lafferty and Daniel N. Rockmore. Spectral techniques for expander codes. In Conference on Learning Theory, pages 356–370, 2014. 186
STOC ’97: Proceedings of the twenty-ninth annual ACM symposium on Theory of
computing, pages 160–167, New York, NY, USA, 1997. ACM. 239 [Mor94] M. Morgenstern. Existance and explicit constructions of q + 1 regular Ramanujan
graphs for every prime power q. Journal of Combinatorial Theory, Series B,
[LRT79] Richard J Lipton, Donald J Rose, and Robert Endre Tarjan. Generalized nested 62:44–62, 1994. 346
dissection. SIAM journal on numerical analysis, 16(2):346–358, 1979. 271
[MSS14] Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava. Ramanujan graphs
[LS88] Gregory F. Lawler and Alan D. Sokal. Bounds on the l2 spectrum for Markov chains and the solution of the Kadison-Singer problem. In Proceedings of the International
and Markov processes: A generalization of Cheeger’s inequality. Transactions of the Congress of Mathematicians, 2014. 268, 353
American Mathematical Society, 309(2):557–580, 1988. 173
[MSS15a] A. W. Marcus, D. A. Spielman, and N. Srivastava. Finite free convolutions of
[LS90] L. Lovàsz and M. Simonovits. The mixing rate of Markov chains, an isoperimetric polynomials. arXiv preprint arXiv:1504.00350, April 2015. 330, 338, 342
inequality, and computing the volume. In IEEE, editor, Proceedings: 31st Annual
Symposium on Foundations of Computer Science: October 22–24, 1990, St. Louis, [MSS15b] Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families I:
Missouri, volume 1, pages 346–354, 1109 Spring Street, Suite 300, Silver Spring, MD Bipartite Ramanujan graphs of all degrees. Ann. of Math., 182-1:307–325, 2015. 346
20910, USA, 1990. IEEE Computer Society Press. 139
[MSS15c] Adam W. Marcus, Daniel A. Spielman, and Nikhil Srivastava. Interlacing families II:
[LS98] Làszlò Lovàsz and Alexander Schrijver. A borsuk theorem for antipodal links and a Mixed characteristic polynomials and the Kadison-Singer problem. Ann. of Math.,
spectral characterization of linklessly embeddable graphs. Proceedings of the 182-1:327–350, 2015. 268
American Mathematical Society, 126(5):1275–1285, 1998. 209
[MSS15d] Adam W Marcus, Nikhil Srivastava, and Daniel A Spielman. Interlacing families IV:
[LS15] Yin Tat Lee and He Sun. Constructing linear-sized spectral sparsification in Bipartite Ramanujan graphs of all sizes. arXiv preprint arXiv:1505.08010, 2015.
almost-linear time. arXiv preprint arXiv:1508.03261, 2015. 268 appeared in Proceedings of the 56th IEEE Symposium on Foundations of Computer
Science. 323, 330, 338, 353
[Nil91] A. Nilli. On the second eigenvalue of a graph. Discrete Math, 91:207–210, 1991. 221 [Spi96a] D.A. Spielman. Linear-time encodable and decodable error-correcting codes. IEEE
Transactions on Information Theory, 42(6):1723–1731, Nov 1996. 239
[Obr63] Nikola Obrechkoff. Verteilung und berechnung der Nullstellen reeller Polynome. VEB
Deutscher Verlag der Wissenschaften, Berlin, 1963. 325, 352 [Spi96b] Daniel A. Spielman. Faster isomorphism testing of strongly regular graphs. In STOC
’96: Proceedings of the twenty-eighth annual ACM symposium on Theory of
[Per07] Oskar Perron. Zur theorie der matrices. Mathematische Annalen, 64(2):248–263, computing, pages 576–584, New York, NY, USA, 1996. ACM. 321
1907. 39
[SS96] M. Sipser and D.A. Spielman. Expander codes. IEEE Transactions on Information
[Piz90] Arnold K Pizer. Ramanujan graphs and Hecke operators. Bulletin of the AMS, 23(1), Theory, 42(6):1710–1722, Nov 1996. 239
1990. 346
[SS11] D.A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM
[PP03] Claude M Penchina and Leora J Penchina. The braess paradox in mechanical, traffic, Journal on Computing, 40(6):1913–1926, 2011. 255, 260
and other networks. American Journal of Physics, 71:479, 2003. 147
[ST04] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for graph
[PS14] Richard Peng and Daniel A. Spielman. An efficient parallel solver for SDD linear partitioning, graph sparsification, and solving linear systems. In Proceedings of the
systems. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, thirty-sixth annual ACM Symposium on Theory of Computing, pages 81–90, 2004.
May 31 - June 03, 2014, pages 333–342, 2014. 290, 297 Full version available at http://arxiv.org/abs/cs.DS/0310051. 178
[PSL90] A. Pothen, H. D. Simon, and K.-P. Liou. Partitioning sparse matrices with [ST07] Daniel A. Spielman and Shang-Hua Teng. Spectral partitioning works: Planar graphs
eigenvectors of graphs. SIAM J. Matrix Anal. Appl., 11:430–452, 1990. 201 and finite element meshes. Linear Algebra and its Applications, 421:284–305, 2007.
201, 207
[RSU01] Thomas J Richardson, Mohammad Amin Shokrollahi, and Rüdiger L Urbanke.
Design of capacity-approaching irregular low-density parity-check codes. IEEE [ST13] Daniel A Spielman and Shang-Hua Teng. A local clustering algorithm for massive
transactions on information theory, 47(2):619–637, 2001. 239 graphs and its application to nearly linear time graph partitioning. SIAM Journal on
Computing, 42(1):1–26, 2013. 178
[RU08] Tom Richardson and Rüdiger Urbanke. Modern coding theory. Cambridge university
press, 2008. 239 [ST14] Daniel A. Spielman and Shang-Hua Teng. Nearly-linear time algorithms for
preconditioning and solving symmetric, diagonally dominant linear systems. SIAM.
[Rud99] M. Rudelson. Random vectors in the isotropic position,. Journal of Functional
J. Matrix Anal. & Appl., 35:835–885, 2014. 278, 290, 292
Analysis, 164(1):60 – 72, 1999. 255, 256
[SW09] Daniel A. Spielman and Jaeoh Woo. A note on preconditioning by low-stretch
[RV07] Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach
spanning trees. CoRR, abs/0903.2816, 2009. Available at
through geometric functional analysis. J. ACM, 54(4):21, 2007. 256
http://arxiv.org/abs/0903.2816. 290
[RVW02] Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy waves, the zig-zag graph
[SW15] Xiaorui Sun and John Wilmes. Faster canonical forms for primitive coherent
product, and new constant-degree expanders. Annals of Mathematics,
configurations. In Proceedings of the forty-seventh annual ACM symposium on
155(1):157–187, 2002. 240, 247
Theory of computing, pages 693–702. ACM, 2015. 321
[Sen06] Eugene Seneta. Non-negative matrices and Markov chains. Springer Science &
[Tan81] R. Michael Tanner. A recursive approach to low complexity codes. IEEE
Business Media, 2006. 42
Transactions on Information Theory, 27(5):533–547, September 1981. 238
[Sha48] Claude Elwood Shannon. A mathematical theory of communication. Bell system
[Tan84] R. Michael Tanner. Explicit concentrators from generalized n-gons. SIAM Journal
technical journal, 27(3):379–423, 1948. 229
Alg. Disc. Meth., 5(3):287–293, September 1984. 220
[Sim91] Horst D. Simon. Partitioning of unstructured problems for parallel processing.
[Tre09] Luca Trevisan. Max cut and the smallest eigenvalue. In STOC ’09: Proceedings of
Computing Systems in Engineering, 2:135–148, 1991. 201
the 41st annual ACM symposium on Theory of computing, pages 263–272, 2009. 17
[SJ89] Alistair Sinclair and Mark Jerrum. Approximate counting, uniform generation and
[Tre11] Luca Trevisan. Lecture 4 from cs359g: Graph partitioning and expanders, stanford
rapidly mixing Markov chains. Information and Computation, 82(1):93–133, July
university, January 2011. available at
1989. 53, 173
http://theory.stanford.edu/ trevisan/cs359g/lecture04.pdf. 173
[Tro12] Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of
Computational Mathematics, 12(4):389–434, 2012. 192, 255, 256
[Tut63] W. T. Tutte. How to draw a graph. Proc. London Mathematical Society, 13:743–768,
1963. 17, 131
Index
[Vai90] Pravin M. Vaidya. Solving linear equations with symmetric diagonally dominant
matrices by constructing good preconditioners. Unpublished manuscript UIUC 1990.
A talk based on the manuscript was presented at the IMA Workshop on Graph
α, 161 GofS, 37
Theory and Sparse Matrix Computation, October 1991, Minneapolis., 1990. 287, 292
χ, 161 graphpgeq, 54
[van95] Hein van der Holst. A short proof of the planarity characterization of Colin de δ, 4
Verdière. Journal of Combinatorial Theory, Series B, 65(2):269 – 272, 1995. 212 µ1 , 35 hamming weight, 226
dave , 35 hypercube, 3
[Var85] N. Th. Varopoulos. Isoperimetric inequalities and Markov chains. Journal of dmax , 35
Functional Analysis, 63(2):215 – 239, 1985. 173 independence number, 161
approximation of graphs, 55 isoperimetric ratio, 44, 168
[Ver10] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices.
arXiv preprint arXiv:1011.3027, 2010. 192 bdry, 44 Laplacian, 5
bdry2, 168 lazy random walk, 4, 93
[Vis12] Nisheeth K. Vishnoi. Lx = b, 2012. available at boundary, 44, 168 lazy walk matrix, 93
http://research.microsoft.com/en-us/um/people/nvishno/Site/Lxb-Web.pdf. linear codes, 229
277 Cauchy’s Interlacing Theorem, 37 LLG, 5
Cayley graph, 61 Loewner partial order, 53
[Voi97] Dan V Voiculescu. Free probability theory. American Mathematical Society, 1997. 330
cdotG, 54
centered vector, 173 matrix norm, 126, 277
[Vu07] Van Vu. Spectral norm of random matrices. Combinatorica, 27(6):721–736, 2007. 74,
Characteristic Polynomial, 20 MMG, 3
77, 81, 188
chiG, 38 MofS, 37
[Vu14] Van Vu. A simple svd algorithm for finding hidden partitions. arXiv preprint chromatic number, 38, 161 mu, 35
arXiv:1404.3918, 2014. 186 coloring, 38
n, 3
combinatorial degree, 4
[Wal22] JL Walsh. On the location of the roots of certain types of polynomials. Transactions NN, 171
Conductance, 170
of the American Mathematical Society, 24(3):163–180, 1922. 331 normalized adjacency matrix, 93
Courant-Fischer Theorem, 21
normalized Laplacian, 96, 171
[Wig58] Eugene P Wigner. On the distribution of the roots of certain symmetric matrices. normInf, 36
dd, 4
Ann. Math, 67(2):325–327, 1958. 69 nui, 171
ddelta, 4
[Wil67] Herbert S. Wilf. The eigenvalues of a graph and its chromatic number. J. London DDG, 4
ooneS, 38
math. Soc., 42:330–332, 1967. 34, 38 ddhalf, 171
orthogonal matrix, 19, 22
degree, 4
[Zem01] G. Zemor. On expander codes. IEEE Transactions on Information Theory, delta, 29 Paley graph, 62
47(2):835–837, Feb 2001. 239 diffusion matrix, 4 path, 3
dilation, 42 path graph, 6
[ZKT85] V. M. Zemlyachenko, N. M. Kornienko, and R. I. Tyshkevich. Graph isomorphism
problem. Journal of Soviet Mathematics, 29:1426–1481, 1985. 304 permutation matrix, 19
E, 2
Perron vector, 34
Fiedler value, 16 pgeq, 53
floor, 38 positive definite, 6
positive semidefinite, 6
Rayleigh quotient, 21
regular, 4
ring, 3
theta, 44
theta2, 168
trace, 20
V, 2
vertex-induced subgraph, 37
walk matrix, 4, 93
weighted degree, 4
WWG, 4
WWtil, 4