
Spectral and Algebraic Graph Theory

Incomplete Draft, dated February 6, 2025
Current version available at http://cs-www.cs.yale.edu/homes/spielman/sagt.

Daniel A. Spielman
Yale University

Copyright ©2025 by Daniel A. Spielman. All rights reserved.

Chapter List

Preface v

Contents vi

Notation xxii

I Introduction and Background 1

1 Introduction 2
2 Eigenvalues and Optimization: The Courant-Fischer Theorem 21
3 The Laplacian and Graph Drawing 29
4 Adjacency matrices, Eigenvalue Interlacing, and the Perron-Frobenius Theorem 34

II The Zoo of Graphs 43

5 Fundamental Graphs 44
6 Comparing Graphs 53
7 Cayley Graphs 61
8 Eigenvalues of Random Graphs 69
9 Strongly Regular Graphs 83

III Physical Metaphors 91

10 Random Walks on Graphs 92
11 Walks, Springs, and Resistor Networks 102
12 Effective Resistance and Schur Complements 110
13 Random Spanning Trees 119
14 Approximating Effective Resistances 126
15 Tutte’s Theorem: How to draw a graph 131
16 The Lovász-Simonovits Approach to Random Walks 139
17 Monotonicity and its Failures 144
18 Dynamic and Nonlinear Networks 152

IV Spectra and Graph Structure 160

19 Independent Sets and Coloring 161
20 Graph Partitioning 168
21 Cheeger’s Inequality 173
22 Local Graph Clustering 178
23 Spectral Partitioning in a Stochastic Block Model 186
24 Nodal Domains 193
25 The Second Eigenvalue of Planar Graphs 201
26 Planar Graphs 2, the Colin de Verdière Number 208

V Expander Graphs 215

27 Properties of Expander Graphs 216
28 A brief introduction to Coding Theory 225
29 Expander Codes 233
30 A simple construction of expander graphs 240
31 PSRGs via Random Walks on Graphs 248

VI Algorithms 254

32 Sparsification by Random Sampling 255
33 Linear Sized Sparsifiers 261
34 Iterative solvers for linear equations 269
35 The Conjugate Gradient and Diameter 277
36 Preconditioning Laplacians 284
37 Augmented Spanning Tree Preconditioners 292
38 Fast Laplacian Solvers by Sparsification 297
39 Testing Isomorphism of Graphs with Distinct Eigenvalues 304
40 Testing Isomorphism of Strongly Regular Graphs 314

VII Interlacing Families 322

41 Expected Characteristic Polynomials 323
42 Quadrature for the Finite Free Convolution 330
43 Ramanujan Graphs of Every Size 338
44 Bipartite Ramanujan Graphs 346
45 The Matching Polynomial 354

Bibliography 360
Preface

Please note that this is a rapidly evolving draft. You will find warning messages at
the start of sections that need substantial editing.
This book is about how combinatorial properties of graphs are related to algebraic properties of
associated matrices, as well as applications of those connections. One’s initial excitement over this
material usually stems from its counter-intuitive nature. I hope to convey this initial amazement,
but then make the connections seem intuitive. After gaining intuition, I hope the reader will
appreciate the material for its beauty.
This book is mostly based on lecture notes from the “Spectral Graph Theory” course that I have
taught at Yale, with notes from “Graphs and Networks” and “Spectral Graph Theory and its
Applications” mixed in. I love the material in these courses, and find that I can never teach
everything I want to cover within one semester. This is why I wrote this book. As this book is
based on lecture notes, it does not contain the tightest or most recent results. Rather, my goal is
to introduce the main ideas and to provide intuition.
There are three tasks that one must accomplish in the beginning of a course on Spectral Graph
Theory:

• One must convey how the coordinates of eigenvectors correspond to vertices in a graph.
This is obvious to those who understand it, but it can take a while for students to grasp.

• One must introduce necessary linear algebra and show some interesting interpretations of
graph eigenvalues.

• One must derive the eigenvalues of some example graphs to ground the theory.

I find that one has to do all these at once. For this reason my first few lectures jump between
developing theory and examining particular graphs. For this book I have decided to organize the
material differently, mostly separating examinations of particular graphs from the development of
the theory. To help the reader reconstruct the flow of my courses, I give three orders that I have
used for the material:
put orders here
There are many terrific books on Spectral Graph Theory. The four that influenced me the most
are


“Algebraic Graph Theory” by Norman Biggs,

“Spectral Graph Theory” by Fan Chung,

“Algebraic Combinatorics” by Chris Godsil, and

“Algebraic Graph Theory” by Chris Godsil and Gordon Royle.

Other books that I find very helpful and that contain related material include
“Modern Graph Theory” by Bela Bollobas,

“Probability on Trees and Networks” by Russell Lyons and Yuval Peres,

“Spectra of Graphs” by Dragos Cvetkovic, Michael Doob, and Horst Sachs, and

“Eigenspaces of Graphs” by Dragos Cvetkovic, Peter Rowlinson, and Slobodan Simic

“Non-negative Matrices and Markov Chains” by Eugene Seneta

“Nonnegative Matrices and Applications” by R. B. Bapat and T. E. S. Raghavan

“Numerical Linear Algebra” by Lloyd N. Trefethen and David Bau, III

“Applied Numerical Linear Algebra” by James W. Demmel

For those needing an introduction to linear algebra, a perspective that is compatible with this book is contained in Gil Strang’s “Introduction to Linear Algebra.” For more advanced topics in linear algebra, I recommend “Matrix Analysis” by Roger Horn and Charles Johnson, as well as their “Topics in Matrix Analysis.” For treatments of physical systems related to graphs, the topic of Part III, I recommend Gil Strang’s “Introduction to Applied Mathematics”, Sydney H. Gould’s “Variational Methods for Eigenvalue Problems”, and “Markov Chains and Mixing Times” by Levin, Peres and Wilmer.

I have gained a lot of intuition for spectral and algebraic graph theory by examining examples. I include many examples so that you can play with them, develop your own intuition, and test your own ideas. My preferred environment for computational experiments is a Jupyter notebook written in the Julia programming language. All of the code used in this book may be found at this GitHub repository: https://github.com/danspielman/sagt_code. If you want to start running the code in this book, you should begin by importing a few packages and setting some defaults with the lines

using Laplacians, LinearAlgebra, Plots, SparseArrays, FileIO, JLD2, Random
gr(); default(fmt=:png)
Random.seed!(0)
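For example, once those packages are loaded, a first experiment in the style of Chapter 1 might look like the following sketch (it uses the path_graph and lap helpers that the Laplacians package provides and that appear again in Chapter 1):

# A small experiment: the spectrum of the path graph on 10 vertices.
M = path_graph(10)        # adjacency matrix of the path
L = lap(M)                # its Laplacian matrix
E = eigen(Matrix(L))      # dense eigendecomposition
E.values                  # eigenvalues, from smallest to largest
plot(E.vectors[:, 2], marker=5, legend=false)   # plot the second eigenvector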
Contents

Preface v
Contents vi
Notation xxii

I Introduction and Background 1

1 Introduction 2
1.1 Graphs 2
1.2 Matrices for Graphs 3
1.2.1 A spreadsheet 3
1.2.2 An operator 4
1.2.3 A quadratic form 5
1.3 Spectral Theory 5
1.4 Some examples 6
1.4.1 Paths 6
1.5 Highlights 9
1.5.1 Spectral Graph Drawing 9
1.5.2 Graph Isomorphism 12
1.5.3 Platonic Solids 14
1.5.4 The Fiedler Value 16
1.5.5 Bounding Eigenvalues 17
1.5.6 Planar Graphs 17
1.5.7 Random Walks on Graphs 17
1.5.8 Expanders 17
1.5.9 Approximations of Graphs 18
1.5.10 Solving equations in and computing eigenvalues of Laplacians 18
1.5.11 Advice on reading this book 18
1.6 Exercises 19

2 Eigenvalues and Optimization: The Courant-Fischer Theorem 21
2.1 The First Proof 22
2.2 Proof of the Spectral Theorem by Optimization 24
2.3 Singular Values for Asymmetric Matrices 27
2.4 Exercise 28

3 The Laplacian and Graph Drawing 29
3.1 The Laplacian Matrix 29
3.2 Drawing with Laplacian Eigenvalues 31

4 Adjacency matrices, Eigenvalue Interlacing, and the Perron-Frobenius Theorem 34
4.1 The Adjacency Matrix 34
4.2 The Largest Eigenvalue, µ1 35
4.3 Eigenvalue Interlacing 37
4.4 Wilf’s Theorem 38
4.5 Perron-Frobenius Theory for symmetric matrices 39
4.6 Singular Values and Directed Graphs 41
4.7 Exercises 42

II The Zoo of Graphs 43

5 Fundamental Graphs 44
5.1 The complete graph 44
5.2 The star graphs 45
5.3 Products of graphs 46
5.3.1 The Hypercube 47
5.4 Bounds on λ2 by test vectors 48
5.5 The Ring Graph 49
5.6 The Path Graph 51

6 Comparing Graphs 53
6.1 Overview 53
6.2 The Loewner order 53
6.3 Approximations of Graphs 55
6.4 The Path Inequality 55
6.4.1 Lower bounding λ2 of a Path Graph 56
6.5 The Complete Binary Tree 57
6.6 The weighted path 59
6.7 A better lower bound on λ2(Tn) 60

7 Cayley Graphs 61
7.1 Cayley Graphs 61
7.2 Paley Graphs 62
7.3 Eigenvalues of the Paley Graphs 63
7.4 Generalizing Hypercubes 64
7.5 A random set of generators 66
7.6 Conclusion 67
7.7 Non-Abelian Groups 67
7.8 Eigenvectors of Cayley Graphs of Abelian Groups 68

8 Eigenvalues of Random Graphs 69
8.1 Transformation 70
8.2 The extreme eigenvalues 71
8.2.1 Vectors near v1 72
8.2.2 The Probabilistic Argument 73
8.3 The Trace Method 74
8.4 Expectation of the trace of a power 76
8.5 The number of walks 77
8.6 Notes 81
8.7 Exercise 81

9 Strongly Regular Graphs 83
9.1 Introduction 83
9.2 Definitions 83
9.3 The Pentagon 84
9.4 Lattice Graphs 84
9.5 Latin Square Graphs 84
9.6 The Eigenvalues of Strongly Regular Graphs 85
9.7 Regular graphs with three eigenvalues 86
9.8 Integrality of the eigenvalues 86
9.9 The Eigenspaces of Strongly Regular Graphs 87
9.10 Triangular Graphs 88
9.11 Two-distance point sets 89

III Physical Metaphors 91

10 Random Walks on Graphs 92
10.1 Random Walks 92
10.2 Spectra of Walk Matrices 93
10.3 The stable distribution 94
10.4 The Rate of Convergence 95
10.5 Relation to the Normalized Laplacian 96
10.6 Examples 98
10.6.1 The Path 98
10.6.2 The Complete Binary Tree 98
10.6.3 The Dumbbell 99
10.6.4 The Bolas Graph 100
10.7 Diffusion 100
10.8 Final Notes 101

11 Walks, Springs, and Resistor Networks 102
11.1 Overview 102
11.2 Harmonic Functions 102
11.3 Random Walks on Graphs 103
11.4 Spring Networks 103
11.5 Laplacian linear equations 104
11.6 Energy 106
11.7 Resistor Networks 106
11.8 Solving for currents 108
11.9 Exercise 109

12 Effective Resistance and Schur Complements 110
12.1 Electrical Flows and Effective Resistance 110
12.2 Effective Resistance through Energy Minimization 111
12.3 Monotonicity 112
12.4 Examples: Series and Parallel 113
12.5 Equivalent Networks, Elimination, and Schur Complements 114
12.5.1 In matrix form by energy 115
12.6 Eliminating Many Vertices 116
12.7 An interpretation of Gaussian elimination 117
12.8 Effective Resistance is a Distance 117

13 Random Spanning Trees 119
13.1 Introduction 119
13.2 Determinants 119
13.3 Characteristic Polynomials 120
13.4 The Matrix Tree Theorem 121
13.5 Leverage Scores and Marginal Probabilities 123

14 Approximating Effective Resistances 126
14.1 Representing Effective Resistances 126
14.2 Computing Effective Resistances 127
14.3 Properties of Gaussian random variables 128
14.4 Proof of Johnson-Lindenstrauss 129

15 Tutte’s Theorem: How to draw a graph 131
15.1 3-Connected, Planar Graphs 131
15.2 Strictly Convex Polygons 134
15.3 Possible Degeneracies 135
15.4 All faces are convex 137
15.5 Notes 138

16 The Lovász-Simonovits Approach to Random Walks 139
16.1 Introduction 139
16.2 Definitions and Elementary Observations 140
16.3 Warm up 140
16.4 The proof 141
16.5 Andersen’s proof of Cheeger’s inequality 143

17 Monotonicity and its Failures 144
17.1 Overview 144
17.2 Effective Spring Constants 144
17.3 Monotonicity 145
17.4 Effective Resistance 145
17.5 Examples 147
17.6 Breakdown of Monotonicity 147
17.7 Traffic Networks 148
17.8 Braess’s Paradox 149
17.9 The Price of Anarchy 149
17.10 Nash optimum 150
17.11 Social optimum 150

18 Dynamic and Nonlinear Networks 152
18.1 Overview 152
18.2 Non-Linear Networks 152
18.3 Energy 153
18.4 Uses in Semi-Supervised Learning 154
18.5 Dual Energy 155
18.6 Thermistor Networks 157
18.7 Low Temperatures 158

IV Spectra and Graph Structure 160

19 Independent Sets and Coloring 161
19.1 Introduction 161
19.2 Graph Coloring and Independent Sets 161
19.3 Hoffman’s Bound 162
19.4 Application to Paley graphs 163
19.5 Lower Bound on the chromatic number 164
19.6 Proofs for Hoffman’s lower bound on chromatic number 165

20 Graph Partitioning 168
20.1 Isoperimetry and λ2 168
20.2 Conductance 170
20.3 The Normalized Laplacian 170
20.4 Notes 172

21 Cheeger’s Inequality 173
21.1 Cheeger’s Inequality 173

22 Local Graph Clustering 178
22.1 The Algorithm 178
22.2 Good choices for a 179
22.3 Bounding the D-norm 181
22.4 Bounding the Generalized Rayleigh Quotient 182
22.5 Rounding 184
22.6 Notes 185

23 Spectral Partitioning in a Stochastic Block Model 186
23.1 The Perturbation Approach 187
23.2 Perturbation Theory for Eigenvectors 189
23.3 Partitioning 189
23.4 Proof of the Davis-Kahan Theorem 190
23.5 Further Reading 192

24 Nodal Domains 193
24.1 Overview 193
24.2 Sylvester’s Law of Inertia 195
24.3 Weighted Trees 196
24.4 The Perron-Frobenius Theorem for Laplacians 198
24.5 Fiedler’s Nodal Domain Theorem 198

25 The Second Eigenvalue of Planar Graphs 201
25.1 Overview 201
25.2 Geometric Embeddings 202
25.3 The center of gravity 205
25.4 Further progress 207

26 Planar Graphs 2, the Colin de Verdière Number 208
26.1 Introduction 208
26.2 Colin de Verdière invariant 208
26.3 Polytopes and Planar Graphs 209
26.4 The Colin de Verdière Matrix 210
26.5 Minors of Planar Graphs 211
26.6 cdv(G) ≤ 3 211

V Expander Graphs 215

27 Properties of Expander Graphs 216
27.1 Overview 216
27.2 Expanders as Approximations of the Complete Graph 216
27.3 Quasi-Random Properties of Expanders 218
27.4 Vertex Expansion 219
27.5 How well can a graph approximate the complete graph? 220
27.6 Open Problems 224

28 A brief introduction to Coding Theory 225
28.1 Coding 225
28.2 Notation 226
28.3 Connection with Generalized Hypercubes 226
28.4 Hamming Codes 226
28.5 Terminology and Linear Codes 228
28.6 Random Linear Codes 229
28.7 Reed-Solomon Codes 231
28.8 Caution 232

29 Expander Codes 233
29.1 Bipartite Expander Graphs 233
29.2 Building Codes 235
29.3 Encoding 235
29.4 Minimum Distance 236
29.5 Decoding 237
29.6 Historical Notes 238

30 A simple construction of expander graphs 240
30.1 Overview 240
30.2 Squaring Graphs 241
30.3 The Relative Spectral Gap 242
30.4 Line Graphs 242
30.5 The Spectrum of the Line Graph 243
30.6 Approximations of Line Graphs 245
30.7 The whole construction 246
30.8 Better Constructions 247

31 PSRGs via Random Walks on Graphs 248
31.1 Overview 248
31.2 Why Study PSRGs? 248
31.3 Expander Graphs 249
31.4 Today’s Application: repeating an experiment 249
31.5 The Random Walk Generator 250
31.6 Formalizing the problem 250
31.7 Matrix Norms 251
31.8 The norm of D X W 252
31.9 Conclusion 253
31.10 Notes 253

VI Algorithms 254

32 Sparsification by Random Sampling 255
32.1 Overview 255
32.2 Sparsification 255
32.3 Matrix Chernoff Bounds 256
32.4 The key transformation 257
32.5 The probabilities 257
32.6 The analysis 259
32.7 Open Problem 260

33 Linear Sized Sparsifiers 261
33.1 Overview 261
33.2 Turning edges into vectors 261
33.3 The main theorem 262
33.4 Rank-1 updates 263
33.5 Barrier Function Arguments 263
33.6 Barrier Function Updates 265
33.7 The inductive argument 266
33.8 Progress and Open Problems 268

34 Iterative solvers for linear equations 269
34.1 Why iterative methods? 269
34.2 First-Order Richardson Iteration 270
34.3 Expanders 271
34.4 The norm of the residual 272
34.5 A polynomial approximation of the inverse 272
34.6 Better Polynomials 273
34.7 Chebyshev Polynomials 274
34.8 Proof of Theorem 34.6.1 275
34.9 Laplacian Systems 276
34.10 Warning 276

35 The Conjugate Gradient and Diameter 277
35.1 The Matrix Norm 277
35.2 Application: Approximating Fiedler Vectors 278
35.3 Optimality in the A-norm 279
35.4 How Good is CG? 281
35.5 Laplacian Systems, again 282
35.6 Bounds on the Diameter 282

36 Preconditioning Laplacians 284
36.1 Approximate Solutions 285
36.2 Iterative Refinement 285
36.3 Iterative Methods in the Matrix Norm 286
36.4 Preconditioned Iterative Methods 286
36.5 Preconditioning by Trees 287
36.6 Improving the Bound on the Running Time 289
36.7 Further Improvements 290
36.8 Questions 290

37 Augmented Spanning Tree Preconditioners 292
37.1 Recursion 292
37.2 Heavy Trees 294
37.3 Saving a log 296

38 Fast Laplacian Solvers by Sparsification 297
38.1 Overview 297
38.2 Today’s notion of approximation 297
38.3 The Idea 298
38.4 A symmetric expansion 299
38.5 D and A 300
38.6 Sketch of the construction 302
38.7 Making the construction efficient 303
38.8 Improvements 303

39 Testing Isomorphism of Graphs with Distinct Eigenvalues 304
39.1 Introduction 304
39.2 Graph Isomorphism 304
39.3 Using Eigenvalues and Eigenvectors 305
39.4 An easy case 306
39.5 All the Automorphisms 307
39.6 Equivalence Classes of Vertices 307
39.7 The first partition 308
39.8 Unbalanced vectors 308
39.9 The structure of the balanced classes 309
39.10 Algorithms 312

40 Testing Isomorphism of Strongly Regular Graphs 314
40.1 Introduction 314
40.2 Definitions 314
40.3 Paley Graphs and The Pentagon 315
40.4 Lattice Graphs 315
40.5 Latin Square Graphs 315
40.6 The Eigenvalues of Strongly Regular Graphs 316
40.7 Testing Isomorphism by Individualization and Refinement 317
40.8 Distinguishing Sets for Strongly Regular Graphs 318
40.9 Notes 321

VII Interlacing Families 322

41 Expected Characteristic Polynomials 323
41.1 Overview 323
41.2 Random sums of graphs 323
41.3 Interlacing 324
41.4 Sums of polynomials 327
41.5 Random Swaps 328

42 Quadrature for the Finite Free Convolution 330
42.1 Overview 330
42.2 The Finite Free Convolution 330
42.3 Quadrature 332
42.4 Quadrature by Invariance 333
42.5 Structure of the Orthogonal Group 334
42.6 The Formula 336
42.7 Question 337

43 Ramanujan Graphs of Every Size 338
43.1 Overview 338
43.2 The Approach 338
43.3 Interlacing Families of Polynomials 339
43.4 Root Bounds for Finite Free Convolutions 341
43.5 The Calculation 342
43.6 Some explanation of Theorem 43.4.1 343
43.7 Some thoughts 344

44 Bipartite Ramanujan Graphs 346
44.1 Overview 346
44.2 2-Lifts 347
44.3 Random 2-Lifts 348
44.4 Laplacianized Polynomials 348
44.5 Interlacing Families of Polynomials 349
44.6 Common Interlacings 351
44.7 Real Rootedness 352
44.8 Conclusion 353

45 The Matching Polynomial 354
45.1 Overview 354
45.2 The Matching Polynomial 354
45.3 Properties of the Matching Polynomial 355
45.4 The Path Tree 356
45.5 Root bounds 359

Bibliography 360
Notation

This section lists the notation that I try to use throughout the book. I sometimes fall into different notations when the conventions surrounding a topic are so strong that failing to follow them would make it difficult for experts to understand this book, or would cause cognitive stress.

I almost always treat vectors as functions, and thus write x(i) for the ith component of the vector x. In spectral and algebraic graph theory, we usually treat vectors as functions from vertices to the real numbers, so you are more likely to encounter x(a) for a vertex named a. Similarly, we denote the entry in row a and column b of a matrix M by M(a, b). I place subscripts on vectors, like x_i, to indicate the ith vector in a set of vectors.

I   An identity matrix
J   An all-1s matrix
D   The diagonal matrix of weighted degrees of a graph
L   Laplacian Matrix
M   Adjacency Matrix or a generic matrix, Page 3
N   Normalized Laplacian Matrix
W   The diagonal matrix of edge weights, or
W   The Walk Matrix, M D^{-1}
W̃   Lazy Walk Matrix, I/2 + W/2
A^+   The Moore-Penrose pseudoinverse of A
A^{+/2}   The square root of A^+
V   The set of vertices, Page 2
E   The set of edges, Page 2
n   usually the number of vertices in a graph, Page 3
a, b   vertices
(a, b)   an edge
w(a, b) or w_{a,b}   the weight of edge (a, b)
w(e)   the weight of edge e
x(a)   the component of the vector x corresponding to vertex a
d(a)   the weighted degree of vertex a
d   the vector of weighted degrees of the vertices in a graph
w(F)   the sum of the weights of edges in F ⊂ E
λ   an eigenvalue, usually of a graph Laplacian
µ   an eigenvalue, usually of an adjacency matrix, Page 35
ν   an eigenvalue of a normalized Laplacian matrix
ω   an eigenvalue of a walk matrix
ϕ   an eigenvector, usually of M
ψ_i   the ith eigenvector, associated with eigenvalue λ_i, usually of a Laplacian
δ_a   the elementary unit vector in direction a; δ_a(a) = 1, Page 29
Ψ   the orthogonal matrix with columns ψ_i
π   the stable distribution of a random walk
λ_max(M)   the largest eigenvalue of M
λ_min(M)   the smallest eigenvalue of M
µ_k(M)   the kth largest eigenvalue of M
λ_k(L)   the kth smallest eigenvalue of L
Tr(M)   the trace of the matrix M, Page 20
∥M∥   the operator norm of the matrix M
|x|   the Hamming weight of the vector x
1_S   the characteristic vector of the set S, Page 38
a ∼ b   a is a neighbor of b, Page 34
G(S)   the subgraph induced on the vertices in S, Page 37
M(S)   the submatrix of M induced on the rows and columns in S, Page 37
χ(G)   the chromatic number of the graph G, Page 161
α(G)   the independence number of the graph G, Page 161
∂(S)   the boundary of a set of vertices, Pages 44 and 168
θ(S)   the isoperimetric ratio of S, Pages 44 and 168
ϕ(S)   the conductance of S
θ_G   the isoperimetric ratio of G
ϕ_G   the conductance of G
A ≽ B   A − B is positive semidefinite, Page 53
G ≽ H   L_G ≽ L_H, Page 54
c · G   the result of multiplying all edge weights of G by c, Page 54
[n]   the set {1, 2, . . . , n}
⌊x⌋   the “floor” of x, Page 38
F_p   the field with p elements, aka the integers modulo the prime p
Part I

Introduction and Background

Chapter 1

Introduction

In this chapter we present essential background on graphs and spectral theory. We also introduce some spectral and algebraic graph theory, describe some of the topics covered in this book, and try to give some useful intuition about graph spectra.

1.1 Graphs

First, we recall that a graph G = (V, E) is specified by its vertex [1] set, V, and edge set, E. In an undirected graph, the edge set is a set of unordered pairs of vertices. We use the notation (a, b) to indicate an edge between vertices a and b. As this edge is undirected, this is the same as edge (b, a). Some prefer to write undirected edges using set notation, like {a, b}; but, we won’t do that.

Unless otherwise specified, all graphs discussed in this book will be undirected, simple (having no loops or multiple edges) and finite. We will sometimes assign weights to edges. These will usually be positive real numbers. If no weights have been specified, we will assume all edges have weight 1. This is an arbitrary choice, and we should remember that it has an impact.

Graphs (also called “networks”) are typically used to model connections or relations between things, where “things” are vertices. When the edges in a graph are more important than the vertices, we may just specify an edge set E and omit the ambient vertex set.

Common “natural” examples of graphs are:

• Friendship graphs: people are vertices, edges exist between pairs of people who are friends (assuming the relation is symmetric).

• Network graphs: devices, routers and computers are vertices, edges exist between pairs that are connected.

• Circuit graphs: electronic components, such as transistors, are vertices; edges exist between pairs connected by wires.

[1] I will use the words “vertex” and “node” interchangeably. Sorry about that.
• Protein-Protein Interaction graphs: proteins are vertices. Edges exist between pairs that interact. These should really have weights indicating the strength and nature of interaction. So should most other graphs.

It is much easier to study abstract, mathematically defined graphs. For example,

• The path on n vertices. The vertices are {1, . . . , n}. The edges are (i, i + 1) for 1 ≤ i < n.

• The ring on n vertices. The vertices are {1, . . . , n}. The edges are all those in the path, plus the edge (1, n).

• The hypercube on 2^k vertices. The vertices are elements of {0, 1}^k. Edges exist between vertices that differ in only one coordinate.

1.2 Matrices for Graphs

The naive view of a matrix is that it is essentially a spreadsheet: a table we use to organize numbers. This is like saying that a car is an enclosed metal chair with wheels. It says nothing about what it does!

We will use matrices to do two things. First, we will view a matrix M as providing a function that maps a vector x to the vector M x. That is, we view M as an operator. Second, we use the matrix M to define a quadratic form: a function that maps a vector x to the number x^T M x.

1.2.1 A spreadsheet

We will usually write V for the set of vertices of a graph, and let n denote the number of vertices. There are times that we will need to order the vertices and assign numbers to them. In this case, they will usually be {1, . . . , n}. For example, if we wish to draw a matrix as a table, then we need to decide which vertex corresponds to which row and column.

The most natural matrix to associate with a graph G is its adjacency matrix [2], M_G, whose entries M_G(a, b) are given by

M_G(a, b) = 1 if (a, b) ∈ E, and 0 otherwise.

If G is weighted with edge (a, b) having weight w_{a,b}, we set M_G(a, b) = w_{a,b}.

It is important to realize that we index the rows and columns of the matrix by vertices, rather than by numbers. Almost every statement that we make will remain true under renaming of vertices. The first row of a matrix has no special importance. To understand this better see the exercises at the end of this section.

[2] I am going to try to always use the letter M for the adjacency matrix, in contrast with my past practice which was to use A. I will almost always use letters like a and b to denote vertices.

While the adjacency matrix is the most natural matrix to associate with a graph, I find it the least useful. Eigenvalues and eigenvectors are most meaningful when used to understand a natural operator or a natural quadratic form. The adjacency matrix provides neither.

1.2.2 An operator

The most natural operator associated with a graph G is probably its diffusion operator. This operator describes the diffusion of stuff among the vertices of a graph. Imagine a process in which each vertex can contain some amount of stuff, such as a gas or the probability that a random walk is at that vertex. At each time step, the stuff at a vertex will be uniformly distributed to its neighbors. None of the stuff that was at a vertex remains at the vertex, but stuff can enter from other vertices. This is a discrete-time and slightly unnatural notion of diffusion, but it provides a nice matrix. We define the operator for the continuous-time process below.

To construct the matrix realizing this process, which we call the walk matrix or the diffusion matrix, let D_G be the diagonal matrix in which D_G(a, a) is the degree of vertex a. We will usually write d(a) for the degree of vertex a. In an unweighted graph, the degree of a vertex is the number of edges attached to it. In the case of a weighted graph, we call the number of edges attached to a vertex its combinatorial degree, and the sum of the weights of the edges attached to it the weighted degree. When we refer to the degree of a vertex in a weighted graph, you should assume we mean the weighted degree. Algebraically, we can obtain the vector of degrees from the expression

d := M_G 1,

where 1 is the all-ones vector. We then set

W_G = M_G D_G^{-1}.

When the graph is regular, that is when every vertex has the same degree, W_G is merely a rescaling of M_G [3]. In the event that a vertex a has degree 0, we adopt the convention that W_G(a, a) = 0.

Formally [4], we use a vector p ∈ IR^V to indicate how much “stuff” is at each vertex, with p(a) being the amount of stuff at vertex a. After one time step, the distribution of stuff at each vertex will be W_G p. To see this, first consider the case when p is an elementary unit vector, δ_a, where we define δ_a to be the vector for which δ_a(a) = 1, and for every other vertex b, δ_a(b) = 0. The vector D_G^{-1} δ_a has the value 1/d(a) at vertex a, and is zero everywhere else. So, the vector M_G D_G^{-1} δ_a has value 1/d(a) at every vertex b that is a neighbor of a, and is zero everywhere else. If this is not immediately obvious, think about it until it is.

It is sometimes more convenient to consider a lazy random walk. These are usually defined to be walks that stay put with probability one half and take a step with probability one half. The matrix corresponding to this operator is given by

W̃_G := I/2 + W_G/2.

[3] I think this is why researchers got away with studying the adjacency matrix for so long.

[4] We write IR^V instead of IR^n to emphasize that each coordinate of the vector corresponds to a vertex of the graph.
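As a quick illustration of these definitions, here is a minimal sketch (using the path_graph helper from the Laplacians package, as elsewhere in this chapter) that builds the walk matrix of the path on 4 vertices and checks that stuff placed at one vertex spreads evenly to its neighbors:

M = path_graph(4)                 # adjacency matrix M_G of the path on 4 vertices
d = M * ones(4)                   # degree vector d = M_G 1
D = Diagonal(d)                   # diagonal degree matrix D_G
W = M * inv(D)                    # walk (diffusion) matrix W_G = M_G D_G^{-1}
Wlazy = I / 2 + W / 2             # lazy walk matrix
p = [0.0, 1.0, 0.0, 0.0]          # one unit of stuff at vertex 2
W * p                             # half goes to vertex 1 and half to vertex 3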
One of the purposes of spectral theory is to provide an understanding of what happens when one repeatedly applies a linear operator like W_G.

1.2.3 A quadratic form

The most natural quadratic form associated with a graph is defined in terms of its Laplacian matrix,

L_G := D_G − M_G.

Given a function on the vertices, x ∈ IR^V, the value of the Laplacian quadratic form of a weighted graph in which edge (a, b) has weight w_{a,b} > 0 is

x^T L_G x = Σ_{(a,b)∈E} w_{a,b} (x(a) − x(b))^2.    (1.1)

This form measures the smoothness of the function x. It will be small if the function x does not vary too much between the vertices connected by any edge.

We will occasionally want to consider the Laplacians of graphs that have both positively and negatively weighted edges. As there are many reasonable definitions of these Laplacians, we will only define them when we need them.

1.3 Spectral Theory

We now review the highlights of the spectral theory for symmetric matrices. Almost all of the matrices we consider will be symmetric or will be similar [5] to symmetric matrices.

We recall that a vector ψ is an eigenvector of a matrix M with eigenvalue λ if ψ is not identically zero and

M ψ = λψ.    (1.2)

That is, λ is an eigenvalue if and only if λI − M is a singular matrix. Thus, the eigenvalues are the roots of the characteristic polynomial of M:

det(xI − M).

Theorem 1.3.1. [The Spectral Theorem] For every n-by-n, real, symmetric matrix M, there exist real numbers λ_1, . . . , λ_n and n mutually orthogonal unit vectors ψ_1, . . . , ψ_n such that ψ_i is an eigenvector of M of eigenvalue λ_i, for each i.

This is the great fact about symmetric matrices. If the matrix is not symmetric, it might not have n eigenvalues. And, even if it has n eigenvalues, their eigenvectors will not be orthogonal [6]. If M is not symmetric, its eigenvalues and eigenvectors might be the wrong thing to study.

[5] A matrix M is similar to a matrix B if there is a non-singular matrix X such that X^{-1} M X = B. In this case, M and B have the same eigenvalues. See the exercises at the end of this section.

[6] You can prove that if the eigenvectors are orthogonal, then the matrix is symmetric.

Recall that the eigenvectors are not uniquely determined, although the eigenvalues are. If ψ is an eigenvector, then −ψ is as well. Some eigenvalues can be repeated. If λ_i = λ_{i+1}, then ψ_i + ψ_{i+1} will also be an eigenvector of eigenvalue λ_i. The eigenvectors of a given eigenvalue are only determined up to an orthogonal transformation.

Definition 1.3.2. A matrix is positive definite, written M ≻ 0, if it is symmetric and all of its eigenvalues are positive. It is positive semidefinite, written M ⪰ 0, if it is symmetric and all of its eigenvalues are nonnegative.

Fact 1.3.3. The Laplacian matrix of a graph is positive semidefinite.

Proof. Let ψ be a unit eigenvector of L of eigenvalue λ. Then,

ψ^T L ψ = ψ^T λψ = λ(ψ^T ψ) = λ = Σ_{(a,b)∈E} w_{a,b} (ψ(a) − ψ(b))^2 ≥ 0.

We always number the eigenvalues of the Laplacian from smallest to largest. Thus, λ_1 = 0. We will refer to λ_2, and in general λ_k for small k, as low-frequency eigenvalues. λ_n is a high-frequency eigenvalue. We will see why soon.

1.4 Some examples

Before we start proving theorems, we will see examples that should convince you that the eigenvalues and eigenvectors of graphs are meaningful.

1.4.1 Paths

A path graph on n vertices has vertices {1, . . . , n} and edges (i, i + 1) for 1 ≤ i < n. Here is the adjacency matrix of a path graph on 4 vertices.

M = path_graph(4)
Matrix(M)

0.0 1.0 0.0 0.0
1.0 0.0 1.0 0.0
0.0 1.0 0.0 1.0
0.0 0.0 1.0 0.0

And, here is its Laplacian matrix

Matrix(lap(M))

1.0 -1.0 0.0 0.0
-1.0 2.0 -1.0 0.0
0.0 -1.0 2.0 -1.0
0.0 0.0 -1.0 1.0
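Equation (1.1) is easy to check numerically. Here is a minimal sketch (an illustrative check, using the same path graph) comparing x^T L x with the sum of squared differences across the path's edges:

M = path_graph(4)
L = lap(M)
x = [1.0, 3.0, 0.0, 2.0]                          # an arbitrary function on the vertices
x' * L * x                                        # the Laplacian quadratic form, equals 17.0
(x[1]-x[2])^2 + (x[2]-x[3])^2 + (x[3]-x[4])^2     # the same value, summed over the edges (1,2), (2,3), (3,4)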
Here are the eigenvalues of a longer path.

L = lap(path_graph(10))
E = eigen(Matrix(L))
E.values'

0.0 0.097887 0.381966 0.824429 1.38197 2.0 2.61803 3.17557 3.61803 3.90211

The eigenvector of the zero-eigenvalue is a constant vector (up to numerical issues):

E.vectors[:,1]

0.31622776601683755
0.31622776601683716
0.31622776601683766
0.3162277660168381
0.31622776601683855
0.3162277660168381
0.3162277660168385
0.31622776601683805
0.3162277660168378
0.3162277660168378

The eigenvector of λ2 is the lowest frequency eigenvector. As we can see, it varies monotonically along the path:

v2 = E.vectors[:,2]

0.44170765403093926
0.3984702312962002
0.31622776601683794
0.20303072371134548
0.0699596195707542
-0.06995961957075394
-0.2030307237113458
-0.3162277660168378
-0.39847023129619985
-0.44170765403093826

Let's plot that.

plot(v2,marker=5,legend=false)
xlabel!("vertex number")
ylabel!("value in eigenvector")

The x-axis is the name/number of the vertex, and the y-axis is the value of the eigenvector at that vertex. Now, let's look at the next few eigenvectors.

plot(E.vectors[:,2:4],label=["v2" "v3" "v4"],marker = 5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")

You may now understand why we refer to these as the low-frequency eigenvectors. The curves they trace out resemble the low-frequency modes of vibration of a string. The reason for this is that the path graph can be viewed as a discretization of the string, and its Laplacian matrix is a discretization of the Laplace operator. We will relate the low-frequency eigenvalues to connectivity.
CHAPTER 1. INTRODUCTION 9 CHAPTER 1. INTRODUCTION 10

discretization of the Laplace operator. We will relate the low-frequency eigenvalues to -0.15623 0.353553
connectivity. 0.15623 0.353553
0.377172 0.353553
In contrast, the highest frequency eigenvalue alternates positive and negative with every vertex.
-0.377172 -1.66533e-16
We will see that the high-frequency eigenvectors may be related to problems of graph coloring
-0.15623 -4.16334e-16
and finding independent sets.
0.15623 -5.82867e-16
0.377172 2.77556e-16
-0.377172 -0.353553
0.4
v10 -0.15623 -0.353553
0.15623 -0.353553
0.377172 -0.353553
Value in Eigenvector

0.2

In the figure below, we use these eigenvectors to draw the graph. Vertex a has been plotted at
0.0
coordinates ψ 2 (a), ψ 3 (a). That is, we use ψ 2 to provide a horizontal coordinate for every vertex,
and ψ 3 to obtain a vertical coordinate. We then draw the edges as straight lines.
-0.2

-0.4

2 4 6 8 10
Vertex Number

Plots.plot(E.vectors[:,10],label="v10",marker=5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")
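The numbers printed above for the 10-vertex path are not arbitrary: they agree with the closed form 2 − 2 cos(kπ/n), k = 0, . . . , n − 1, of the sort we derive when we study the fundamental graphs. The following sketch checks this numerically; it is self-contained Julia that builds the path Laplacian by hand rather than with path_graph.

using LinearAlgebra

n = 10
L = SymTridiagonal([1.0; fill(2.0, n-2); 1.0], fill(-1.0, n-1))   # path Laplacian
computed = sort(eigvals(Matrix(L)))
formula  = sort([2 - 2cos(k*π/n) for k in 0:n-1])

println(maximum(abs.(computed .- formula)))   # should be on the order of 1e-15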

1.5 Highlights

We now attempt to motivate this book, and the course on which it is based, by surveying some of its highlights.

1.5.1 Spectral Graph Drawing

We can often use the low-frequency eigenvectors to obtain a nice drawing of a graph. For example, here is a 3-by-4 grid graph and its first two non-trivial eigenvectors. Looking at them suggests that they might provide nice coordinates for the vertices.

M = grid2(3,4)
L = lap(M)
E = eigen(Matrix(L))
V = E.vectors[:,2:3]

-0.377172 0.353553
-0.15623 0.353553
0.15623 0.353553
0.377172 0.353553
-0.377172 -1.66533e-16
-0.15623 -4.16334e-16
0.15623 -5.82867e-16
0.377172 2.77556e-16
-0.377172 -0.353553
-0.15623 -0.353553
0.15623 -0.353553
0.377172 -0.353553

In the figure below, we use these eigenvectors to draw the graph. Vertex a has been plotted at coordinates ψ_2(a), ψ_3(a). That is, we use ψ_2 to provide a horizontal coordinate for every vertex, and ψ_3 to obtain a vertical coordinate. We then draw the edges as straight lines.

plot_graph(M,V[:,1],V[:,2])

Figure 1.1: A 3-by-4 grid graph.

Let's do a fancier example that should convince you something interesting is going on. We begin in Fig. 1.2 by generating points by sampling them from the Yale logo.

@load "yale.jld2"
scatter(xy[:,1],xy[:,2],legend=false)

Figure 1.2: Dots sampled from the Yale logo

We then construct a graph on the points by forming their Delaunay triangulation (while it does not make sense to cover Delaunay triangulations in this book, they are fascinating and I recommend that you look them up), and use the edges of the triangles to define a graph on the points. We draw those edges as straight lines in Fig. 1.3.

plot_graph(a,xy[:,1],xy[:,2])

Figure 1.3: Dots sampled from the Yale logo

Since the vertices came with coordinates, it was easy to draw a nice picture of the graph. But, what if we just knew the graph, and not the coordinates? As we did with the grid, we could generate coordinates by computing two eigenvectors, and using each as a coordinate. In Fig. 1.4, we plot vertex a at position ψ_2(a), ψ_3(a), and again draw the edges as straight lines.

E = eigen(Matrix(lap(a)))
V = E.vectors[:,2:3]
plot_graph(a,V[:,1],V[:,2], dots=false);

Figure 1.4: The spectral drawing of the graph of the Delaunay triangulation.

That's a great way to draw a graph if you start out knowing nothing about it (it's the first thing I do whenever I meet a strange graph). Note that the middle of the picture is almost planar, although edges do cross near the boundaries.

1.5.2 Graph Isomorphism

It is important to note that the eigenvalues do not change if we relabel the vertices. Moreover, if we permute the vertices then the entries of the eigenvectors are similarly permuted. That is, if P is a permutation matrix, then

    Lψ = λψ   if and only if   (PLP^T)(Pψ) = PLψ = λ(Pψ),

because P^T P = I. To prove it by experiment, let's randomly permute the vertices, and plot the permuted graph.

Random.seed!(3)
p = randperm(size(a,1))
M = a[p,p]
E = eigen(Matrix(lap(M)))
V = E.vectors[:,2:3]
plot_graph(M,V[:,1],V[:,2], dots=false);

Note that this picture is slightly different from the previous one: it has flipped vertically. That's because eigenvectors are only determined up to signs, and that's only if they have multiplicity 1.

This gives us a powerful heuristic for testing if one graph is a permutation of another (this is the famous "Graph Isomorphism Testing Problem"). First, check if the two graphs have the same sets of eigenvalues. If they don't, then they are not isomorphic. If they do, and the eigenvalues have multiplicity one, then draw the pictures above. If the pictures are the same, up to horizontal or vertical flips, and no vertex is mapped to the same location as another, then by lining up the pictures we can recover the permutation.
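Here is a small numerical illustration of the first step of this heuristic. It is a sketch that uses only LinearAlgebra and Random, and enters the Laplacian of a small example graph (a 4-cycle) by hand rather than using the book's helper functions: the sorted Laplacian eigenvalues of a graph and of a random relabeling of its vertices agree to machine precision.

using LinearAlgebra, Random

# Laplacian of a 4-cycle, entered by hand.
L = [ 2.0 -1.0  0.0 -1.0;
     -1.0  2.0 -1.0  0.0;
      0.0 -1.0  2.0 -1.0;
     -1.0  0.0 -1.0  2.0]

p  = randperm(4)       # a random relabeling of the vertices
Lp = L[p, p]           # the Laplacian of the relabeled graph

println(sort(eigvals(Symmetric(L))) ≈ sort(eigvals(Symmetric(Lp))))   # true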
As some vertices can map to the same location, this heuristic doesn't always work. We will learn about the extent to which it does. In particular, we will see in Chapter 39 that if every eigenvalue of two graphs G and H has multiplicity 1, then we can efficiently test whether or not they are isomorphic.

These algorithms have been extended to handle graphs in which the multiplicity of every eigenvalue is bounded by a constant [BGM82]. But, there are graphs in which every non-trivial eigenvalue has large multiplicity. In Chapter 9 we will learn how to construct and analyze some, as they constitute fundamental examples and counter-examples to many natural conjectures. For example, here are the eigenvalues of a Latin Square Graph on 25 vertices. These are a type of Strongly Regular Graph.

M = latin_square_graph(5);
println(eigvals(Matrix(lap(M))))

[0.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0,
15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0]

All Latin Square Graphs of the same size have the same eigenvalues, whether or not they are isomorphic. We will learn some surprisingly fast (but still not polynomial time) algorithms for checking whether or not Strongly Regular Graphs are isomorphic.

1.5.3 Platonic Solids

Of course, some graphs are not meant to be drawn in two dimensions. For example, let's try drawing the skeleton of the dodecahedron using ψ_2 and ψ_3.

M = read_graph("dodec.txt")
E = eigen(Matrix(lap(M)))
x = E.vectors[:,2]
y = E.vectors[:,3]
plot_graph(M, x, y; setaxis=false);

Figure 1.5: The 1-skeleton of the dodecahedron.

You will notice that this looks like what you would get if you squashed the dodecahedron down to the plane. The reason is that we really shouldn't be drawing this picture in two dimensions: the smallest non-zero eigenvalue of the Laplacian has multiplicity three.

E = eigen(Matrix(lap(M)))
E.values'

3.55271e-15 0.763932 0.763932 0.763932 2.0 2.0 2.0 2.0 2.0
3.0 3.0 3.0 3.0 5.0 5.0 5.0 5.0 5.23607 5.23607 5.23607

So, we can't reasonably choose just two eigenvectors. We should be choosing three that span the eigenspace. When we do, we get the canonical representation of the dodecahedron in three dimensions.

x = E.vectors[:,2]
y = E.vectors[:,3]
z = E.vectors[:,4]
plot_graph(M, x, y, z; setaxis=false)

As you would guess, this happens for all Platonic solids. In fact, if you properly re-weight the edges, it happens for every graph that is the one-skeleton of a convex polytope [Lov01].

We finish this section by contemplating an image of the high-frequency eigenvectors of the dodecahedron. This code plots them in three dimensions, although we can only print them in two. Observe that vertices are approximately opposite their neighbors.

x = E.vectors[:,20]
y = E.vectors[:,19]
z = E.vectors[:,18]
plot_graph(M, x, y, z; setaxis=false);

1.5.4 The Fiedler Value

We prove in Lemma 3.1.1 that the second-smallest eigenvalue of the Laplacian matrix of a graph is zero if and only if the graph is disconnected. If G is disconnected, then we can partition it into two graphs G1 and G2 with no edges between them, and then write

    L_G = [ L_{G1}    0
            0      L_{G2} ].

As the eigenvalues of L_G are the union, with multiplicity, of the eigenvalues of L_{G1} and L_{G2}, we see that L_G inherits a zero eigenvalue from each. Conversely, if G is connected, then we can show that the only vectors x for which x^T L_G x = 0 are the constant vectors: if x is not constant and G is connected, then there must be an edge (a, b) for which x(a) ≠ x(b). This edge will contribute a positive term to the sum (1.1).

Fiedler suggested that we make this qualitative observation quantitative by considering λ_2 as a measure of how well connected the graph is. For this reason, he called it the "Algebraic Connectivity" of a graph, and we call it the "Fiedler value".

Fiedler [Fie73] proved that the further λ_2 is from 0, the better connected the graph is. In Chapter 21 we will prove the ultimate extension of this result: Cheeger's inequality.

In short, we say that a graph is poorly connected if one can cut off many vertices by removing only a few edges. We measure how poorly connected it is by the ratio of these quantities (almost). Cheeger's inequality gives a tight connection between this ratio and λ_2. If λ_2 is small, then for some t, the set of vertices

    S_t := {i : ψ_2(i) < t}

may be removed by cutting many fewer than |S_t| edges. This spectral graph partitioning heuristic has proved very successful in practice.
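The qualitative statement behind all of this is easy to see numerically. The following sketch uses only plain LinearAlgebra, with the Laplacians entered by hand rather than built from the book's helpers; it computes λ_2 for a disconnected graph and for the same graph with one edge added to connect it, and shows that λ_2 moves away from zero.

using LinearAlgebra

# Two disjoint edges: vertices {1,2} and {3,4}. The graph is disconnected.
L_disconnected = [ 1.0 -1.0  0.0  0.0;
                  -1.0  1.0  0.0  0.0;
                   0.0  0.0  1.0 -1.0;
                   0.0  0.0 -1.0  1.0]

# Add the edge (2,3) to connect the two components.
L_connected = copy(L_disconnected)
L_connected[2,2] += 1; L_connected[3,3] += 1
L_connected[2,3] -= 1; L_connected[3,2] -= 1

λ2(L) = sort(eigvals(Symmetric(L)))[2]

println(λ2(L_disconnected))   # ~0: the graph is disconnected
println(λ2(L_connected))      # > 0: the graph is connected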
In general, it will be interesting to turn qualitative statements like "G is connected if and only if λ_2 > 0" into quantitative ones. For example, the smallest eigenvalue of the diffusion matrix is zero if and only if the graph is bipartite. Trevisan [Tre09] showed that the magnitude of this eigenvalue is related to how far a graph is from being bipartite.

1.5.5 Bounding Eigenvalues

We will often be interested in the magnitudes of certain eigenvalues. For this reason, we will learn multiple techniques for proving bounds on eigenvalues. The most prominent of these will be proofs by test vectors (Section 5.4) and proofs by comparison with simpler graphs (Chapter 6).

1.5.6 Planar Graphs

We will prove that graphs that can be drawn nicely must have small Fiedler value, and we will prove very tight results for planar graphs (Chapter 25).

In Chapter 15 we will see how to use the graph Laplacian to draw planar graphs: Tutte [Tut63] showed that if one reasonably fixes the locations of the vertices on a face of a planar graph and then lets the others settle into the positions obtained by treating the edges as springs, then one obtains a planar drawing of the graph!

1.5.7 Random Walks on Graphs

Spectral graph theory is one of the main tools we use for analyzing random walks on graphs. We will devote a few chapters to this theory, connect it to Cheeger's inequality, and use tools developed to study random walks to derive a fascinating proof of Cheeger's inequality (Chapter 16).

1.5.8 Expanders

We will be particularly interested in graphs that are very well connected. These are called expanders. Roughly speaking, expanders are sparse graphs (say having a number of edges linear in the number of vertices) in which λ_2 is bounded away from zero by a constant. They are among the most important examples of graphs, and play a prominent role in Theoretical Computer Science.

Expander graphs have numerous applications. We will see how to use random walks on expander graphs to construct pseudo-random generators about which one can actually prove something. We will also use them to construct good error-correcting codes.

Error-correcting codes and expander graphs are both fundamental objects of study in the field of extremal combinatorics and are extremely useful. We will also use error-correcting codes to construct crude expander graphs. In Chapter 30 we will see a simple construction of good expanders. The best expanders are the Ramanujan graphs. These were first constructed by Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88]. In Chapters 44 and 43 we will prove that there exist infinite families of bipartite Ramanujan graphs.

1.5.9 Approximations of Graphs

We will ask what it means for one graph to approximate another. Given graphs G and H, we will measure how well G approximates H by the closeness of their Laplacian quadratic forms. We will see that expanders are precisely the sparse graphs that provide good approximations of the complete graph, and we will use this perspective for most of our analysis of expanders in Chapter 27. In Chapters 32 and 33 we show that every graph can be well-approximated by a sparse graph through a process called sparsification.

1.5.10 Solving equations in and computing eigenvalues of Laplacians

We will also ask how well a graph can be approximated by a tree, and see in Chapter 36 that low-stretch spanning trees provide good approximations under this measure.

Our motivation for this material is the need to design fast algorithms for solving systems of linear equations in Laplacian matrices and for computing their eigenvectors. The first problem arises in numerous contexts, including the solution of elliptic PDEs by the finite element method, the solution of network flow problems by interior point algorithms, and in classification problems in Machine Learning.

In fact, our definition of graph approximation is designed to suit the needs of the Preconditioned Conjugate Gradient algorithm.

1.5.11 Advice on reading this book

Throughout this book, we have tried to strike a balance between the simplicity and generality of the results that we prove. But, whenever you want to understand a proof, you should try to make as many simplifying assumptions as are reasonable: for example, that the graph under consideration is connected, that all of its edges have weight 1, and that all of its eigenvalues have multiplicity one.

When seeking generalizations of the material in this book, you should consult the source material cited in the notes at the end of each chapter.

1.6 Exercises

The following exercises are intended to help you get back in practice at doing linear algebra. You should solve all of them.

1. Orthogonal eigenvectors. Let M be a symmetric matrix, and let ψ and ϕ be vectors so that

    M ψ = µψ   and   M ϕ = νϕ.

Prove that if µ ≠ ν then ψ must be orthogonal to ϕ. Your proof should exploit the symmetry of M, as this statement is false otherwise.

2. Invariance under permutations.
Let Π be a permutation matrix. That is, there is a permutation π : V → V so that

    Π(u, v) = 1 if u = π(v), and 0 otherwise.

Prove that if

    M ψ = λψ,

then

    (ΠM Π^T)(Πψ) = λ(Πψ).

That is, permuting the coordinates of the matrix merely permutes the coordinates of the eigenvectors, and does not change the eigenvalues.

3. Invariance under rotations.
Let Q be an orthogonal matrix. That is, a matrix such that Q^T Q = I. Prove that if

    M ψ = λψ,

then

    (QM Q^T)(Qψ) = λ(Qψ).

4. Similar Matrices.
A matrix M is similar to a matrix B if there is a non-singular matrix X such that X^{−1} M X = B. Prove that similar matrices have the same eigenvalues.

5. Spectral decomposition.
Let M be a symmetric matrix with eigenvalues λ_1, . . . , λ_n and let ψ_1, . . . , ψ_n be a corresponding set of orthonormal column eigenvectors. Let Ψ be the orthogonal matrix whose ith column is ψ_i. Prove that

    Ψ^T M Ψ = Λ,

where Λ is the diagonal matrix with λ_1, . . . , λ_n on its diagonal. Conclude that

    M = ΨΛΨ^T = Σ_{i∈V} λ_i ψ_i ψ_i^T.

6. Traces.
Recall that the trace of a matrix A, written Tr(A), is the sum of the diagonal entries of A. Prove that for two matrices A and B,

    Tr(AB) = Tr(BA).

Note that the matrices do not need to be square for this to be true: they can be rectangular matrices of dimensions n × m and m × n.

Use this fact and the previous exercise to prove that

    Tr(A) = Σ_{i=1}^n λ_i,

where λ_1, . . . , λ_n are the eigenvalues of A. You are probably familiar with this fact about the trace, or it may have been the definition you were given. This is why I want you to remember how to prove it.

7. The Characteristic Polynomial.
Let M be a symmetric matrix. Recall that the eigenvalues of M are the roots of the characteristic polynomial of M:

    p(x) := det(xI − M) = Π_{i=1}^n (x − µ_i).

Write

    p(x) = Σ_{k=0}^n x^{n−k} c_k (−1)^k.

Prove that

    c_k = Σ_{S⊆[n], |S|=k} det(M(S, S)).

Here, we write [n] to denote the set {1, . . . , n}, and M(S, S) to denote the submatrix of M with rows and columns indexed by S.

8. Reversing products.
Let M be a d-by-n matrix. Prove that the multiset of nonzero eigenvalues of M M^T is the same as the multiset of nonzero eigenvalues of M^T M.
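If you want to sanity-check your answers numerically before proving them, a sketch like the following may help. It uses only the standard LinearAlgebra and Random packages, and the random matrices are just examples.

using LinearAlgebra, Random

Random.seed!(1)
B = randn(4, 4); A = Symmetric(B + B')   # a random symmetric matrix
println(tr(A) ≈ sum(eigvals(A)))          # Exercise 6: trace = sum of eigenvalues

M = randn(3, 5)                           # Exercise 8: nonzero spectra agree
e1 = sort(eigvals(Symmetric(M * M')))     # 3 eigenvalues
e2 = sort(eigvals(Symmetric(M' * M)))[3:end]   # drop the two zero eigenvalues
println(e1 ≈ e2)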

where the maximization and minimization are over subspaces S and T of IRn .
The corresponding eigenvectors satisfy

ψ 1 ∈ arg max x T M x , and ψ n ∈ arg min x T M x ,


∥x ∥=1 ∥x ∥=1

and for 2 ≤ k ≤ n
Chapter 2
ψ k ∈ arg min xTM x, and ψ k ∈ arg max xTM x. (2.2)
∥x ∥=1 ∥x ∥=1
x T ψ j =0, for j > k x T ψ j =0, for j < k

Eigenvalues and Optimization: The 2.1 The First Proof


Courant-Fischer Theorem
As with many proofs in spectral theory, we begin by expanding a vector x in the basis of
eigenvectors of M . Let’s recall how this is done.
Let ψ 1 , . . . , ψ n be an orthonormal basis of eigenvectors of M corresponding to µ1 , . . . , µn . For
One of the reasons that the eigenvalues of matrices have meaning is that they arise as the solution every1 vector x , we may write
to natural optimization problems. The formal statement of this is given by the Courant-Fischer
Theorem. We begin by using the Spectral Theorem to prove the Courant-Fischer Theorem. We X
x = ci ψ i , where ci = ψ Ti x . (2.3)
then give a self-contained proof of the Spectral Theorem for symmetric matrices by leveraging a
i
special case of the Courant-Fischer Theorem.
There are many ways to verify this. Let Ψ be the matrix whose columns are ψ 1 , . . . , ψ n , and
The Rayleigh quotient of a vector x with respect to a matrix M is defined to be
recall that the matrix Ψ is said to be orthogonal if its columns are mutually orthogonal unit
xTM x vectors. Also recall that the orthogonal matrices are exactly those matrices Ψ for which
. (2.1) ΨΨT = I , and that this implies that ΨT Ψ = I . We now verify (2.3) by
xTx
!
The Rayleigh quotient of an eigenvector is its eigenvalue: if M ψ = µψ, then X X X X 
ci ψ i = (ψ Ti x )ψ i = ψ i (ψ Ti x ) = ψ i ψ Ti x = ΨΨT x = I x = x .
ψT M ψ ψ T µψ i i i i
= T = µ.
ψT ψ ψ ψ As you gain comfort with linear algebra, you will avoid summation over indices and instead write
c = ΨT x and x = Ψc. Until you get used to orthonormal bases, just pretend that they are the
The Courant-Fischer Theorem tells us that the vectors that maximize the Rayleigh quotient are basis of elementary unit vectors. For example, you know that
exactly the eigenvectors of the largest eigenvalue of M . In fact it supplies a similar X
characterization of all the eigenvalues of a symmetric matrix. x (i) = δ Ti x , and that x = x (i)δ i .
i
Theorem 2.0.1 (Courant-Fischer Theorem). Let M be a symmetric matrix with eigenvalues
µ1 ≥ µ2 ≥ · · · ≥ µn . Then, The first step in the proof of Theorem 2.0.1 is to express the Laplacian quadratic form of x in
terms of the expansion of x in the eigenbasis. We will use this expansion often.
xTM x xTM x
µ1 = max , µn = min , Lemma 2.1.1. Let M be a symmetric matrix with eigenvalues µ1 , . . . , µn and a corresponding
x ̸=0 xTx x ̸=0 xTx
orthonormal basis of eigenvectors ψ 1 , . . . , ψ n . Let x be a vector whose expansion in this basis is
and for all k ≥ 1, n
X
x = ci ψ i .
xTM x xTM x i=1
µk = maxn min = minn max ,
S⊆IR x ∈S x T x T ⊆IR x ∈T xTx 1
dim(S)=k x ̸=0 dim(T )=n−k+1 x ̸=0 When we say “every”, assume we mean every vector of the dimension of M .

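As a numerical illustration of this characterization of eigenvalues by Rayleigh quotients, here is a sketch using a random symmetric matrix and only the standard LinearAlgebra and Random packages: the top eigenvector attains the quotient µ_1, and no random test vector exceeds it.

using LinearAlgebra, Random

Random.seed!(0)
B = randn(5, 5); M = Symmetric(B + B')
vals, vecs = eigen(M)                      # eigenvalues in ascending order
rayleigh(x) = (x' * M * x) / (x' * x)

println(rayleigh(vecs[:, end]) ≈ vals[end])                               # the top eigenvector attains µ1
println(maximum(rayleigh(randn(5)) for _ in 1:10_000) <= vals[end] + 1e-12)   # no vector exceeds it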

Then, So, 2
n
X xTM x
xTM x = c2i µi . min ≥ µk .
x ∈S xTx
i=1

To show that this is in fact the maximum, we will prove that for all subspaces S of dimension k,
Proof. Compute:
!T   xTM x
X X min ≤ µk .
x ∈S xTx
xTM x = ci ψ i M cj ψ j 
i j Let T be the span of ψ k , . . . , ψ n . As T has dimension n − k + 1, every subspace S of dimension k
!T  
has an intersection with T of dimension at least 1. In particular, S ∩ T is non-empty so we can
X X
= ci ψ i  cj M ψ j  write
xTM x xTM x xTM x
i j min ≤ min ≤ max .
!T   x ∈S x T x x ∈S∩T x T x x ∈T xTx
X X
= ci ψ i  cj µj ψ j  Let x be a vector in T at which this maximum is achieved, and expand x in the form
i j n
X X
= ci cj µj ψ Ti ψ j x = ci ψ i ,
i,j i=k
X
= c2i µi , Again applying Lemma 2.1.1 we obtain
i Pn P
xTM x µi c2i µk n c2
as ( T
= Pi=k
n 2 ≤ Pn i=k 2 i = µk .
x x i=k ci i=k ci
0 for i ̸= j
ψ Ti ψ j = 
1 for i = j. As ψ Tk M ψ k = µk , and T = x : x T ψ j = 0, for j < k ,

ψ k ∈ arg max xTM x.


∥x ∥=1
Proof of Theorem 2.0.1. Let ψ 1 , . . . , ψ n be an orthonormal set of eigenvectors of M xT ψ j =0, for j < k

corresponding to µ1 , . . . , µn . We will just verify the identity 


The other expression for ψ k follows from S = x : x T ψ j = 0, for j > k .
xTM x
µk = max min ,
S⊆IRn x ∈S xTx
̸ 0
dim(S)=k x =
2.2 Proof of the Spectral Theorem by Optimization
as the expressions for µ1 and µn are special cases of this, and the proof of the other
characterization is similar. We now give a self-contained proof that the Rayleigh quotient is maximized at an eigenvector of
M.
First, let’s verify that µk is achievable. Let S be the span of ψ 1 , . . . , ψ k . We can expand every
x ∈ S as Theorem 2.2.1. Let M be a symmetric matrix and let x be a non-zero vector that maximizes
Xk
the Rayleigh quotient with respect to M :
x = ci ψ i .
i=1 xTM x
Applying Lemma 2.1.1 we obtain .
xTx
Pk P Then, M x = µ1 x , where µ1 is the largest eigenvalue of M . Conversely, the minimum is achieved
xTM x i=1 µi ci
2 µk ki=1 c2i
= P ≥ P = µk . by eigenvectors of the smallest eigenvalue of M .
xTx k 2
i=1 ci
k 2
i=1 ci
2
Be warned that we will often neglect to mention the condition x ̸= 0, but we always intend it.

Proof. We first observe that the maximum is achieved: as the Rayleigh quotient is homogeneous, Theorem 2.2.3. For every real symmetric matrix M of rank r, there exist non-zero real numbers
it suffices to consider unit vectors x . As the Rayleigh quotient is continuous on the set of unit µ1 , . . . , µr and orthonormal vectors ψ 1 , . . . , ψ r such that
vectors and this set is closed and compact, the maximum is achieved on this set3 .
r
X
Now, let x be any non-zero vector that maximizes the Rayleigh quotient. We recall that the M = µi ψ i ψ Ti . (2.4)
gradient of a function at its maximum must be the zero vector. Let’s compute that gradient. i=1

We have4
The content of this theorem is equivalent to that of Theorem 1.3.1, because multiplying Eq. (2.4)
∇x T x = 2x ,
on the right by ψ i gives M ψ i = µi ψ i .
and
We first recall an elementary property of symmetric matrices.
∇x T M x = 2M x .
So, Theorem 2.2.4. The span of a symmetric matrix is orthogonal to its nullspace.
xTM x (x T x )(2M x ) − (x T M x )(2x )
∇ T = .
x x (x T x )2 Proof. Let M be a symmetric matrix. Recall that its span is the set of vectors of the form M x ,
In order for this to be zero, we must have and its nullspace is the set of vectors z for which M z = 0. For y = M x , we have

(x T x )M x = (x T M x )x , z T y = z T M x = (z T M )x = 0T x = 0,

which implies because M is symmetric.


xTM x
Mx = x.
xTx The following lemma is the key to the inductive proof of Theorem 2.2.3.
That is, if and only if x is an eigenvector of M with eigenvalue equal to its Rayleigh quotient. As
x maximizes the Rayleigh quotient, this eigenvalue must be the largest of M . Lemma 2.2.5. Let M be a symmetric matrix, and let ψ be a unit eigenvector of M with
non-zero eigenvalue µ. Let
Corollary 2.2.2. Every non-zero symmetric matrix M has at least one eigenvector with c = M − µψψ T .
M
non-zero eigenvalue.
c contains the span of ψ and the nullspace of M . And, the rank of M is
Then, the nullspace of M
c.
larger by one than the rank of M
Proof. We first show that there is some vector x for which xTM x
̸= 0. If M (i, i) ̸= 0 for some i,
then δ i , the elementary unit vector in direction i, is such a vector. If all diagonals of M are zero,
c . Let x be a
Proof. We first show that the nullspace of M is contained in the nullspace of M
let M (i, j) be a non-zero entry of M and set x = δ i + δ j . This suffices because
x T M x = 2M (i, j) ̸= 0. vector for which M x = 0. As ψ is in the span of M , x is orthogonal to ψ. So,

If x T M x > 0, we can use Theorem 2.2.1 to obtain an eigenvector ψ of M with eigenvalue µ > 0. c x = M x − µψψ T x = 0 − ψ0 = 0.
M
If x T M x < 0, then apply Theorem 2.2.1 to −M to obtain an eigenvector ψ of −M with
eigenvalue ν > 0, and observe that ψ is then an eigenvector of M with eigenvalue µ = −ν < 0.
As µ ̸= 0, ψ is not in the nullspace of M . However, our construction of M
nullspace:
We now give a self-contained proof of the Spectral Theorem for symmetric matrices. The idea of c ψ = M ψ − µψ(ψ T ψ) = µψ − µψ = 0,
M
the proof is to use Theorem 2.2.1 to obtain ψ 1 and µ1 , and then proceed by induction.
where the second equality uses the fact that ψ is a unit vector.
3
Here’s an explanation for those not familiar with analysis: we need to avoid the situation in which there are x
on which the function is arbitrarily close to its maximum, but there are none on which it is achieved. We also need c . Together with the fact that
Now, Theorem 2.2.4 tells us that ψ is orthogonal to the span of M
to avoid the situation in which the maximum is undefined. These conditions guarantee that the maximum is defined M =M c + µψψ T , this implies that the span of M equals the span of Mc and ψ, and thus is 1
and that there is a unit vector x at which it is achieved. You can read almost all of this book without knowing c.
dimensional larger than the span of M
analysis, as long as you are willing to accept this result.
4
In case you are not used to computing gradients of functions of vectors, you can derive these directly by reasoning
like Proof of Theorem 2.2.3. We proceed by induction on the rank of M . If M is the zero matrix,
∂ ∂ X
xT x = x (b)2 = 2x (a). then the theorem is trivial.
∂x (a) ∂x (a)
b

We now assume that the theorem has been proved for all matrices of rank r, and prove it for Theorem 2.3.2. Let A be an arbitrary real m-by-n matrix, and let σ1 ≥ . . . ≥ σr be its singular
matrices of rank r + 1. Let M be a symmetric matrix of rank r + 1. We know from Corollary 2.2.2 values, where r = min(m, n). Then,
c = M − µψψ T .
that there is a unit eigenvector ψ of M of eigenvalue µ ̸= 0. Let M
σ1 = max u T Av ,
By Lemma 2.2.5, the rank of M c is r. Our inductive hypothesis now implies that there are ∥u∥=1
∥v ∥=1
orthonormal vectors ψ 1 , . . . , ψ r and non-zero µ1 , . . . , µr such that
r
X and
c =
M µi ψ i ψ Ti . σk = max min u T Av ,
dim(S)=k u∈S,v ∈T
i=1
dim(T )=k ∥u∥=1,∥v ∥=1
Setting ψ r+1 = ψ and µr+1 = µ, we have
where in the minima above, u ∈ IRm , v ∈ IRn , S is a subspace of IRm and T is a subspace of IRn .
r+1
X
M = µi ψ i ψ Ti .
i=1 2.4 Exercise
c and ψ r+1 is in
To show that ψ r+1 is orthogonal to ψ i for i ≤ r, note that ψ i is in the span of M
its nullspace. 1. Prove Theorem 2.3.2.
2. A tighter characterization.
2.3 Singular Values for Asymmetric Matrices Tighten Theorem 2.2.3 by proving that for every sequence of vectors x 1 , . . . , x n such that

The characterization of eigenvalues by maximizing or minimizing the Rayleigh quotient only x i ∈ arg max xTM x,
∥x ∥=1
works for symmetric matrices. The analogous quantities for non-symmetric matrices A are the xT x j =0,for j<i

singular vectors and singular values of A, which are the eigenvectors of AAT and AT A, and the
square roots of the eigenvalues of those matrices. each x i is an eigenvector of M .

Definition 2.3.1. The singular value decomposition of a matrix A is an expression of the form
A = U ΣV T ,
where U and V are matrices with orthonormal columns and Σ is a diagonal matrix with
non-negative entries. The diagonal entries of Σ are the singular values of A, and the columns of
U and V are its left and right singular vectors.

Even rectangular matrices have singular value decompositions. If A is an m-by-n matrix and
r = min(m, n), we can assume that Σ is square of dimension r, and that U and V are m-by-r
and n-by-r matrices with orthonormal columns. Let σ1 ≥ . . . ≥ σr be the diagonal entries of Σ,
and let u 1 , . . . , u r and v 1 , . . . , v r be the columns of U and V . Then, the above decomposition
can be written
Xr
A= σi u i v Ti .
i=1
As the columns of V are orthonormal, it follows that
Av i = σi u i
for any singular vector v i .
We can use techniques similar to those we used to prove the Courant-Fischer Theorem to obtain
the following characterization of the singular values.

This is the matrix that is zero except at the intersection of rows and columns indexed by a and b,
where it looks like

    [  1  −1 ]
    [ −1   1 ].
Summing the matrices for every edge, we obtain
X X
Chapter 3 LG = wa,b (δ a − δ b )(δ a − δ b )T = wa,b LGa,b .
(a,b)∈E (a,b)∈E

You should check that this agrees with the definition of the Laplacian from Section 1.2.3:
The Laplacian and Graph Drawing LG = D G − AG ,

where X
D G (a, a) = wa,b .
3.1 The Laplacian Matrix b

This formula turns out to be useful when we view the Laplacian as an operator. For every vector
We begin this section by establishing the equivalence of multiple expressions for the Laplacian. x we have X X
The Laplacian Matrix of a weighted graph G = (V, E, w), w : E → IR+ , is designed to capture the (LG x )(a) = d(a)x (a) − wa,b x (b) = wa,b (x (a) − x (b)), (3.3)
Laplacian quadratic form: (a,b)∈E (a,b)∈E
X P
x T LG x = wa,b (x (a) − x (b))2 . (3.1) because d(a) = (a,b)∈E wa,b .
(a,b)∈E
From (3.1), we see that if all entries of x are the same, then x T Lx equals zero. From (3.3), we
We will now use this quadratic form to derive the structure of the matrix. To begin, consider a can immediately see that L1 = 0, so the constant vectors are eigenvectors of eigenvalue zero. If
graph with just two vertices and one edge of weight 1. Let’s call it G1,2 . We have the graph is connected, these are the only eigenvectors of eigenvalue zero.

x T LG1,2 x = (x (1) − x (2))2 . (3.2) Lemma 3.1.1. Let G = (V, E) be a weighted graph, and let 0 = λ1 ≤ λ2 ≤ · · · ≤ λn be the
eigenvalues of its Laplacian matrix, L. Then, λ2 > 0 if and only if G is connected.
Consider the vector δ 1 − δ 2 , where δ a is the elementary unit vector with a 1 in coordinate a. We
have Proof. We first show that λ2 = 0 if G is disconnected. If G is disconnected, then it can be
x (1) − x (2) = δ T1 x − δ T2 x = (δ 1 − δ 2 )T x , described as the disjoint union of two graphs, G1 and G2 , with no edges between them. After
so suitably reordering the vertices, we can write
 
2 L 0
(x (1) − x (2))2 = (δ 1 − δ 2 )T x = x T (δ 1 − δ 2 ) (δ 1 − δ 2 )T x L = G1 .
    0 LG2
1   1 −1
= xT 1 −1 x = x T x.
−1 −1 1 So, L has at least two orthogonal eigenvectors of eigenvalue zero:
   
Thus,   0 1
and .
1 −1 1 0
LG1,2 = .
−1 1
where we have partitioned the vectors as we did the matrix L.
Now, let Ga,b be the graph with just one edge between vertices a and b. It can have as many
On the other hand, assume that G is connected and that ψ is an eigenvector of L of eigenvalue 0.
other vertices as you like. The Laplacian of Ga,b can be written in the same way:
As
LGa,b = (δ a − δ b )(δ a − δ b )T . Lψ = 0,

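A short numerical sketch of this edge-by-edge construction may be helpful. It is self-contained Julia (the weighted triangle below is just an example): summing w_{a,b} (δ_a − δ_b)(δ_a − δ_b)^T over the edges reproduces the degree-minus-adjacency matrix.

using LinearAlgebra

n = 3
edges   = [(1,2), (2,3), (1,3)]
weights = [1.0, 2.0, 0.5]

δ(a) = (e = zeros(n); e[a] = 1.0; e)        # elementary unit vector

L = sum(w * (δ(a) - δ(b)) * (δ(a) - δ(b))' for ((a,b), w) in zip(edges, weights))

# Compare with the degree-minus-adjacency form.
A = zeros(n, n)
for ((a,b), w) in zip(edges, weights)
    A[a,b] = w; A[b,a] = w
end
D = Diagonal(vec(sum(A, dims=2)))
println(L ≈ D - A)                           # should print true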

we have X the squares of the lengths of the edges in the drawing:


xT Lx = wa,b (ψ(a) − ψ(b))2 = 0.
X x (a) x (b) 2
(a,b)∈E − .
y (a) y (b)
Thus, for every pair of vertices (a, b) connected by an edge, we have1 ψ(a) = ψ(b). As every pair (a,b)∈E

of vertices a and b are connected by a path, we may inductively apply this fact along the path to This turns out not to be so different from minimizing (3.4), as it equals
show that ψ(a) = ψ(b) for all vertices a and b. Thus, ψ must be a constant vector. We conclude X
that the eigenspace of eigenvalue 0 has dimension 1. (x (a) − x (b))2 + (y (a) − y (b))2 = x T Lx + y T Ly .
(a,b)∈E

As before, we impose the scale conditions


3.2 Drawing with Laplacian Eigenvalues
∥x ∥2 = 1 and ∥y ∥2 = 1,
The idea of drawing graphs using eigenvectors demonstrated in Section 1.5.1 was suggested by and the centering constraints
Hall [Hal70] in 1970. 1T x = 0 and 1T y = 0.
To explain Hall’s approach, we first consider the problem of drawing a graph on a line. That is,
mapping each vertex to a real number. It isn’t easy to see what a graph looks like when you do However, this still leaves us with the degenerate solution x = y = ψ 2 . To ensure that the two
this, as all of the edges sit on top of one another. One can fix this either by drawing the edges of coordinates are different, Hall introduced the restriction that x be orthogonal to y . This is how
the graph as curves, or by wrapping the line around a circle. we drew the pictures in Figs. 1.1, 1.4 and 1.5. To draw a graph in k dimensions, Hall P suggested
choosing k orthonormal vectors x 1 , . . . , x k that are orthogonal to 1 and minimize i x Ti Lx i . A
Let x ∈ IRV be the vector that describes the assignment of a real number to each vertex. We natural choice for these is ψ 2 through ψ k+1 , and this choice achieves objective function value
would like vertices that are neighbors to be close to one another. For an unweighted graph, Hall Pk+1
i=2 λi .
suggested choosing an x minimizing
X The following theorem says that this choice is optimal. It is a variant of [Fan49, Theorem 1].
x T Lx = (x (a) − x (b))2 . (3.4)
Theorem 3.2.1. Let L be a Laplacian matrix with eigenvalues 0 = λ1 < λ2 ≤ · · · ≤ λn
(a,b)∈E
corresponding to orthonormal eigenvectors ψ 1 , . . . , ψ n , and let x 1 , . . . , x k be any orthonormal
Unless we place restrictions on x , the solution will be degenerate. For example, all of the vertices vectors that are all orthogonal to 1. Then
could map to 0. To avoid this, and to fix the scale of the drawing overall, we require k
X k+1
X
X x Ti Lx i ≥ λi ,
x (a)2 = ∥x ∥2 = 1. (3.5) i=1 i=2
a∈V
and this inequality is tight only when x Ti ψ j = 0 for all j such that λj > λk+1 .
Even with this restriction, another degenerate solution is possible: it could be that every vertex

maps to 1/√n. To prevent this from happening, we impose the additional restriction that

    Σ_a x(a) = 1^T x = 0.        (3.6)

We would have liked to have said "this inequality is tight only when x_i^T ψ_j = 0 for all j > k + 1," but this could be wrong if there are repeated eigenvalues.
Proof. Let x k+1 , . . . , x n be vectors such that x 1 , . . . , x n is an orthonormal basis. We can find
On its own, this restriction fixes the shift of the drawing along the line. When combined with these by choosing x k+1 , . . . , x n to be an orthonormal basis of the space orthogonal to x 1 , . . . , x k .
(3.5), it guarantees that we get something interesting. We now know that for all 1 ≤ i ≤ n
n
X n
X
As 1 is the eigenvector of smallest eigenvalue of the Laplacian, Theorem 2.0.1 implies that a unit
(ψ Tj x i )2 = 1 and (x Tj ψ i )2 = 1.
eigenvector of λ2 minimizes x T Lx subject to (3.5) and (3.6).
j=1 j=1
Of course, we really want to draw a graph in two dimensions. So, we will assign two coordinates
to each vertex given by x and y . As opposed to minimizing (3.4), we will minimize the sum of That is, the matrix with i, j entry (ψ Tj x i )2 is doubly-stochastic 2 .
2
1
Recall that edges weights are assumed to be positive. This theorem is really about majorization, which is easily established through multiplication by a doubly-
stochastic matrix.


We can assume that ψ_1 = (1/√n) 1, the normalized all-ones vector. For i ≤ k, ψ_1^T x_i = 0; so, Lemma 2.1.1 implies
n
X
x Ti Lx i = λj (ψ Tj x i )2
j=2
n
X n
X
= λk+1 + (λj − λk+1 )(ψ Tj x i )2 , by (ψ Tj x i )2 = 1
j=2 j=2 Chapter 4
k+1
X
≥ λk+1 + (λj − λk+1 )(ψ Tj x i )2 ,
j=2

as λj ≥ λk+1 for j > k + 1. This inequality is only tight when (ψ Tj x i )2 = 0 for all j such that
Adjacency matrices, Eigenvalue
λj > λk+1 .
Interlacing, and the Perron-Frobenius
Summing over i we obtain
k
X k+1
X k
X
Theorem
x Ti Lx i ≥ kλk+1 + (λj − λk+1 ) (ψ Tj x i )2
i=1 j=2 i=1
k+1
X
≥ kλk+1 + (λj − λk+1 ) In this chapter, we examine the meaning of the smallest and largest eigenvalues of the adjacency
j=2 matrix of a graph. Note that the largest eigenvalue of the adjacency matrix corresponds to the
k+1
X smallest eigenvalue of the Laplacian. Our focus in this chapter will be on the features that
= λj , adjacency matrices possess but which Laplacians do not. Where the smallest eigenvector of the
j=2 Laplacian is a constant vector, the largest eigenvector of an adjacency matrix, called the Perron
Pk T 2
vector, need not be. The Perron-Frobenius theory tells us that the largest eigenvector of an
where the second inequality follows from the facts that λj − λk+1 ≤ 0 and i=1 (ψ j x i ) ≤ 1.
adjacency matrix is non-negative, and that its value is an upper bound on the absolute value of
This inequality is tight under the same conditions as the previous one.
the smallest eigenvalue. These are equal precisely when the graph is bipartite.
We will examine the relation between the largest adjacency eigenvalue and the degrees of vertices
in the graph. This is made more meaningful by the fact that we can apply Cauchy’s Interlacing
The beautiful pictures that we sometimes obtain from Hall’s graph drawing should convince you Theorem to adjacency matrices. We will use it to prove a theorem of Wilf [Wil67] which says that
that eigenvectors of the Laplacian should reveal a lot about the structure of graphs. But, it is a graph can be colored using at most 1 + ⌊µ1 ⌋ colors. We will learn more about eigenvalues and
worth pointing out that there are many graphs for which this approach does not produce nice graph coloring in Chapter 19.
images, and there are in fact graphs that can not be nicely drawn. Expander graphs are good
examples of these.
Many other approaches to graph drawing borrow ideas from Hall’s work: they try to minimize 4.1 The Adjacency Matrix
some function of the distances of the edges subject to some constraints that keep the vertices well
separated. However, very few of these have compactly describable solutions, or even solutions Let M be the adjacency matrix of a (possibly weighted) graph G. As an operator, M acts on a
that can provably be computed in polynomial time. The algorithms that implement them vector x ∈ IRV by X
typically use a gradient based method to attempt to minimize the function of the distances (M x )(a) = wa,b x (b). (4.1)
subject to constraints, but can not guarantee that they actually minimize it. For many of these (a,b)∈E
methods, relabeling the vertices could produce very different drawings! Thus, one must be careful
before using these images to infer some truth about a graph. When the edge set E is understood,
P we use the notation a ∼ b as shorthand for (a, b) ∈ E. Thus,
we may write (M x )(a) = b∼a wa,b x (b).
We will denote the eigenvalues of M by µ1 , . . . , µn . But, we order them in the opposite direction


than we did for the Laplacian: we assume Proof. If we have equality in (4.2), then it must be the case that d(a) = dmax and ϕ1 (b) = ϕ1 (a)
for all (a, b) ∈ E. Thus, we may apply the same argument to every neighbor of a. As the graph is
µ1 ≥ µ2 ≥ · · · ≥ µ n . connected, we may keep applying this argument to neighbors of vertices to which it has already
been applied to show that ϕ1 (c) = ϕ1 (a) and d(c) = dmax for all c ∈ V .
The reason for this convention is so that µi corresponds to the ith Laplacian eigenvalue, λi . If G
is a d-regular graph, then D = I d,
The following is a commonly used extension of Lemma 4.2.1.
L = dI − M ,
and so Theorem 4.2.3. Let M be an arbitrary n-by-n matrix with complex entries, and let λ be an
λi = d − µ i . eigenvalue of M . Then,
X def
|λ| ≤ max |M (a, b)| = ∥M ∥∞ .
a
Thus the largest adjacency eigenvalue of a d-regular graph is d, and its corresponding eigenvector b

is the constant vector. We could also prove that the constant vector is an eigenvector of
eigenvalue d by considering the action of M as an operator (4.1): if x (a) = 1 for all a ∈ V , then Proof. Let ϕ be an eigenvector of M with eigenvalue λ, and let a be the index on which ϕ has
(M x )(b) = d for all b ∈ V . largest magnitude. We have

X
|λ| |ϕ(a)| = |λϕ(a)| = |(M ϕ)(a)| = M (a, b)ϕ(b)
4.2 The Largest Eigenvalue, µ1 b
X X X
≤ |M (a, b)| |ϕ(b)| ≤ |M (a, b)| |ϕ(a)| = |ϕ(a)| |M (a, b)| .
We now examine µ1 for graphs which are not necessarily regular. Let G be a graph, let dmax be b b b
the maximum degree of a vertex in G and let dave be the average degree of a vertex in G. We will P
So, |λ| ≤ b |M (a, b)|.
show that µ1 lies between these. This naturally holds when we measure degrees by weight.

Lemma 4.2.1. For a weighted graph G with n vertices, we define There are graphs for which the bounds in Lemma 4.2.1 are very loose. Consider a large complete
d-ary
√ tree, T . It will have maximum degree d, an average degree just a little under 2, and µ1 close
1X
def def
dave = d (a) and dmax = max d (a). to 2 d − 1. The following theorem establishes a tight upper bound on µ1 by re-scaling the matrix
n a a
to average out the low and high degree vertices. We save the lower bound for an exercise.
If µ1 is the largest adjacency eigenvalue of G, then Theorem 4.2.4. Let d ≥ 2 and let T be a tree in which every √ vertex has degree at most d. Then,
all adjacency eigenvalues of T have absolute value at most 2 d − 1.
dave ≤ µ1 ≤ dmax .
Proof. Let M be the adjacency matrix of T . Choose some vertex to be the root of the tree, and
Proof. The lower bound follows by considering the Rayleigh quotient of 1 with respect to M : define its height to be 0. For every other vertex a, define its height, h(a), to be its distance to the
P root. Define D to be the diagonal matrix with
xTM x 1T M 1 1T d a d (a)
µ1 = max ≥ = = = dave . √ h(a)
x xTx 1T 1 n n
D(a, a) = d−1 .
To prove the upper bound, let ϕ1 be an eigenvector of eigenvalue µ1 . We may assume without Recall that multiplying a matrix by a diagonal matrix from the left scales its rows, that
loss of generality that ϕ1 has a positive value, because we can replace it by −ϕ1 if it does not. multiplying by a diagonal matrix from the right scales its columns, and that the eigenvalues of M
Let a be the vertex on which ϕ1 takes its maximum value. We show that µ1 ≤ d (a): are the same as the eigenvalues of DM D −1 , Theorem 4.2.3 tells us that the magnitude of these
1 1 X 1 X eigenvalues is at most the largest row sum of DM D −1 .
µ1 = (M ϕ1 )(a) = wa,b ϕ1 (b) ≤ wa,b ϕ1 (a) = d (a) ≤ dmax . (4.2) √
ϕ1 (a) ϕ1 (a)
b:b∼a
ϕ1 (a)
b:b∼a
So, we need to prove that all row sums of DMD^{−1} are at most 2√(d − 1). There are three types of vertices to consider. First, the row of the root has up to d entries that are all 1/√(d − 1). For d ≥ 2, d/√(d − 1) ≤ 2√(d − 1). Second, every leaf has exactly one nonzero entry in its row, and that entry equals √(d − 1). The rest of the vertices have one entry in their row that equals √(d − 1), and up to d − 1 entries that are equal to 1/√(d − 1), for a total of at most 2√(d − 1).

Lemma 4.2.2. If G is connected and µ_1 = d_max, then G is d_max-regular.
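Before moving on, here is a quick numerical check of the bounds d_ave ≤ µ_1 ≤ d_max from Lemma 4.2.1 on a random graph. This is a sketch: the Erdős–Rényi-style sampling below is just an example, and only the standard LinearAlgebra and Random packages are used.

using LinearAlgebra, Random

Random.seed!(2)
n = 40
A = Float64.(rand(n, n) .< 0.1); A = triu(A, 1); A = A + A'   # a random simple graph
degrees = vec(sum(A, dims=2))
μ1 = maximum(eigvals(Symmetric(A)))

println(sum(degrees)/n <= μ1 <= maximum(degrees))   # should print true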

4.3 Eigenvalue Interlacing Proof. If M is the adjacency matrix of G, then M (S) is the adjacency matrix of G(S). Lemma
4.2.1 says that dave (S) is at most the largest eigenvalue of the adjacency matrix of G(S), and
We can strengthen the lower bound in Lemma 4.2.1 by proving that µ1 is at least the average Theorem 4.3.1 says that this is at most µ1 .
degree of every subgraph of G. We will prove this by applying Cauchy’s Interlacing Theorem. We can also prove this without using Cauchy’s Interlacing Theorem. Consider the Rayleigh
For a graph G = (V, E) and S ⊂ V , we define the vertex-induced subgraph induced by S, written quotient in the characteristic vector of S, 1S , where
G(S), to be the graph with vertex set S and all edges in E connecting vertices in S: (
1 if a ∈ S
1S (a) =
{(a, b) ∈ E : a ∈ S and b ∈ S} . 0 otherwise.

For a symmetric matrix M whose rows and columns are indexed by a set V and subset S of V , Theorem 2.2.1 tells us that µ1 is at least
we write M (S) for the symmetric submatrix with rows and columns in S. P
1TS M 1S a,b∈S,(a,b)∈E wa,b
Theorem 4.3.1 (Cauchy’s Interlacing Theorem). Let A be an n-by-n symmetric matrix and let = = dave (S).
1TS 1S |S|
B be a principal submatrix of A of dimension n − 1 (that is, B is obtained by deleting the same
row and column from A). Then,
If we remove the vertex of smallest degree from a graph, the average degree can increase. On the
α1 ≥ β1 ≥ α2 ≥ β2 ≥ · · · ≥ αn−1 ≥ βn−1 ≥ αn , other hand, Cauchy’s Interlacing Theorem says that µ1 can only decrease when we remove a
vertex.
where α1 ≥ α2 ≥ · · · ≥ αn and β1 ≥ β2 ≥ · · · ≥ βn−1 are the eigenvalues of A and B, respectively.
Lemma 4.3.2 is a good demonstration of Cauchy’s Theorem. But, using Cauchy’s Theorem to
prove it was overkill. A more direct way to prove it is to emulate the proof of Lemma 4.2.1, but
Proof. Without loss of generality we will assume that B is obtained from A by removing its first computing the quadratic form in the characteristic vector of S instead of 1.
row and column. We now apply the Courant-Fischer Theorem, which tells us that

x T Ax
αk = max min
S⊆IRn x ∈S xTx
. 4.4 Wilf ’s Theorem
dim(S)=k

We now apply Lemma 4.3.2 to obtain an upper bound on the chromatic number of a graph.
Applying this to B gives Recall that a coloring of a graph is an assignment of colors to vertices in which adjacent vertices
 T   have distinct colors. A graph is said to be k-colorable if it can be colored with only k colors1 . The
0 0 chromatic number of a graph, written χ(G), is the least k for which G is k-colorable. The
A
x T Bx x x bipartite graphs are exactly the graph of chromatic number 2.
βk = max min T = max min .
S⊆IRn−1 x ∈S x x S⊆IRn−1 x ∈S xTx
dim(S)=k dim(S)=k It is easy to show that every graph is (dmax + 1)-colorable. Assign colors to the vertices
one-by-one. As each vertex has at most dmax neighbors, there is always some color one can assign
We see that the right-hand expression is a maximum over the special family of subspaces of IRn of that vertex that is different than those assigned to its neighbors. The following theorem of Wilf
dimension k in which all vectors in the subspaces must have first coordinate 0. As the maximum [Wil67] improves upon this bound.
over all subspaces of dimension k can only be larger,
Recall that for a real number x, ⌊x⌋ is the largest integer less than or equal to x.
αk ≥ βk .
Theorem 4.4.1.
χ(G) ≤ ⌊µ1 ⌋ + 1.
We may prove the inequalities in the other direction, such as βk ≥ αk+1 , by replacing A and B
with −A and −B, or by using the other characterization of eigenvalues in the Courant-Fischer
Proof. We prove this by induction on the number of vertices in the graph. To ground the
Theorem as a minimum over subspaces.
induction, consider the graph with one vertex and no edges. It has chromatic number 1 and
Lemma 4.3.2. For any S ⊆ V , let dave (S) be the average degree of G(S). Then, 1
To be precise, we often identify these k colors with the integers 1 through k. A k-coloring is then a function
c : {1, . . . , k} → V such that c(a) ̸= c(b) for all (a, b) ∈ E.
dave (S) ≤ µ1 .

largest adjacency eigenvalue zero2 . Now, assume the theorem is true for all graphs on n − 1 Lemma 4.5.2. Under the conditions of Theorem 4.5.1, assume that some non-negative vector ϕ
vertices, and let G be a graph on n vertices. Let a be the vertex of minimum degree in G. is an eigenvector of M . Then, ϕ is strictly positive.
Lemma 4.2.1 tells us that the average degree of a vertex of G is at most µ_1, and so
the degree of a can be no larger. As the degree of a must be an integer, it can be at most ⌊µ1 ⌋. Proof. If ϕ is not strictly positive, there is some vertex a for which ϕ(a) = 0. As G is connected,
Let S = V \ {a}. By Lemma 4.3.2, the largest eigenvalue of G(S) is at most µ1 , and so our and ϕ is not identically zero, there must be some edge (b, c) for which ϕ(b) = 0 but ϕ(c) > 0. Let
induction hypothesis implies that G(S) has a coloring with at most ⌊µ1 ⌋ + 1 colors. Let c be any µ be the eigenvalue of ϕ. As ϕ(b) = 0, we obtain a contradiction from
such coloring. We just need to show that we can extend c to a. As a has at most ⌊µ1 ⌋ neighbors, X
there is some color in {1, . . . , ⌊µ1 ⌋ + 1} that does not appear among its neighbors, and which it 0 = µϕ(b) = (M ϕ)(b) = wb,z ϕ(z) ≥ wb,c ϕ(c) > 0,
may be assigned. Thus, G has a coloring with ⌊µ1 ⌋ + 1 colors. (b,z)∈E

The simplest example in which this theorem improves over the naive bound of dmax + 1 is the where the inequalities follow from the fact that the terms wb,z and ϕ(z) are non-negative.
path graph on 3 vertices: it has dmax = 2 but µ1 < 2. Thus, Wilf’s theorem tells us that it can be So, we conclude that ϕ must be strictly positive.
colored with 2 colors, which is tight.
Star graphs provide more extreme examples. A star graph with n vertices has d_max = n − 1 but µ_1 = √(n − 1). So, the upper bound on the chromatic number given by Wilf's theorem is much better than the naive d_max + 1. But it is far from the actual chromatic number of a star graph, which is 2.

Proof of Theorem 4.5.1. Let ϕ_1 be an eigenvector of µ_1 of norm 1, and construct the vector x such that

    x(u) = |ϕ_1(u)|, for all u.
We will show that x is an eigenvector of eigenvalue µ1 .

4.5 Perron-Frobenius Theory for symmetric matrices We have x T x = ϕT1 ϕ1 . Moreover,


X X
µ1 = ϕT1 M ϕ1 = M (a, b)ϕ1 (a)ϕ1 (b) ≤ M (a, b) |ϕ1 (a)| |ϕ1 (b)| = x T M x .
The eigenvector corresponding to the largest eigenvalue of the adjacency matrix of a graph is a,b a,b
usually not a constant vector. However, it is always a positive vector if the graph is connected.
This follows from the Perron-Frobenius theory (discovered independently by Perron [Per07] and So, the Rayleigh quotient of x is at least µ1 . As µ1 is the maximum possible Rayleigh quotient for
Frobenius [Fro12]). In fact, the Perron-Frobenius theory says much more, and it can be applied to a unit vector, the Rayleigh quotient of x must be µ1 and Theorem 2.2.1 implies that x must be
adjacency matrices of strongly connected directed graphs. Note that these need not even be an eigenvector of µ1 . As x is non-negative, Lemma 4.5.2 implies that it is strictly positive and
diagonalizable! satisfies the promise of part a.
In the symmetric case, the theory is made much easier by both the spectral theory and the To prove part b, let ϕn be the eigenvector of µn and let y be the vector for which y (u) = |ϕn (u)|.
characterization of eigenvalues as extreme values of Rayleigh quotients. In the spirit of the previous argument, we can again show that
X
Theorem 4.5.1. [Perron-Frobenius, Symmetric Case] Let G be a connected weighted graph, let A |µn | = |ϕn M ϕn | ≤ M (a, b)y (a)y (b) = y T M y ≤ µ1 y T y = µ1 . (4.3)
be its adjacency matrix, let H be a nonnegative diagonal matrix, let M = A + H , and let a,b
µ1 ≥ µ2 ≥ · · · ≥ µn be its eigenvalues. Then
To show that the multiplicity of µ1 is 1 (that is, µ2 < µ1 ), consider an eigenvector ϕ2 . As ϕ2 is
a. The eigenvalue µ1 has a strictly positive eigenvector, orthogonal to ϕ1 , it must contain both positive and negative values. We now construct the vector
y such that y (u) = |ϕ2 (u)| and repeat the argument that we used for x . We find that
b. µ1 ≥ −µn , and

c. µ1 > µ2 . µ2 = ϕT2 M ϕ2 ≤ y T M y ≤ µ1 .

If µ2 = µ1 , then y is a nonnegative eigenvector of eigenvalue µ1 , and so Lemma 4.5.2 says that it


For simplicity, we just prove the theorem in the case where H is the zero matrix. is strictly positive. Thus, ϕ2 does not have any zero entries. As it has both positive and negative
We begin with a helpful lemma. It says that a non-negative eigenvector of the adjacency matrix entries and the graph is connected, there must be some edge (a, b) for which ϕ2 (a) < 0 < ϕ2 (b).
of a connected graph must be strictly positive. This forces the inequality ϕT2 M ϕ2 < y T M y to be strict because the edge (a, b) will make a
2
negative contribution to ϕT2 M ϕ2 and a positive contribution to y T M y . This contradicts our
If this makes you uncomfortable, you could use both graphs on two vertices
assumption that µ2 = µ1 .
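Here is a quick numerical illustration of part a on a small connected graph. It is a sketch: the path on 5 vertices is just an example, only LinearAlgebra is used, and the sign of the eigenvector returned by eigen is normalized by hand since eigenvectors are only determined up to sign.

using LinearAlgebra

# Adjacency matrix of the path graph on 5 vertices (connected).
A = diagm(1 => ones(4), -1 => ones(4))
vals, vecs = eigen(Symmetric(A))
perron = vecs[:, end]                  # eigenvector of the largest eigenvalue
perron *= sign(perron[1])              # fix the overall sign
println(all(perron .> 0))              # true: the Perron vector is strictly positive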

Finally, we show that for a connected graph G, µn = −µ1 if and only if G is bipartite. In fact, if The multiset of eigenvalues of M is comprised of t − s zeros and {±σ1 , . . . , ±σs } where σ1 , . . . , σs
µn = −µ1 , then µn−i = −µi+1 for every i. are the singular values of B. The matrix of form M is often referred to as the dilation of B.

Proposition 4.5.3. If G is a connected graph and µn = −µ1 , then G is bipartite. For a treatment of the general Perron-Frobenius theory, we recommend Seneta [Sen06] or Bapat
and Raghavan [BR97].
Proof. Consider the conditions necessary to achieve equality in (4.3). First, y must be an
eigenvector of eigenvalue µ1 . Thus, y must be strictly positive, ϕn can not have any zero values,
and there must be an edge (a, b) for which ϕn (a) < 0 < ϕn (b). It must also be the case that all of 4.7 Exercises
the terms in X
M (a, b)ϕn (a)ϕn (b) 1. Trees.
(a,b)∈E
Prove that the bound proved in Theorem 4.2.4 is tight. Begin by figuring out how to state that
have the same sign, and we have established that this sign must be negative. Thus, for every edge formally.
(a, b), ϕn (a) and ϕn (b) must have different signs. That is, the signs provide the bipartition of the
vertices.

Proposition 4.5.4. If G is bipartite then the eigenvalues of its adjacency matrix are symmetric
about zero.

Proof. As G is bipartite, we may divide its vertices into sets S and T so that all edges go between
S and T . Let ϕ be an eigenvector of M with eigenvalue µ. Define the vector x by
(
ϕ(a) if a ∈ S, and
x (a) =
−ϕ(a) if a ∈ T .

To see that x is an eigenvector with eigenvalue −µ, note that for a ∈ S,


X X
(M x )(a) = M (a, b)x (b) = M (a, b)(−ϕ(b)) = −(M ϕ)(a) = −µϕ(a) = −µx (a).
(a,b)∈E (a,b)∈E

We may similarly show that (M x )(a) = −µx (a) for a ∈ T .

4.6 Singular Values and Directed Graphs

If G is bipartite, with all edges going between the sets S and T , then the adjacency matrix of G
can be written in the form  
0 B
M = T ,
B 0
where the first set of rows and columns are indexed by S and the second by T . Instead of
examining the entire matrix M , we can instead understand G in terms of the spectral properties
of B. But, B is not necessarily symmetric nor even square. Instead of its eigenvalues, we consider
its singular values and singular vectors, defined in Section 2.3.
Returning to bipartite graphs, let s = |S|, t = |T |, and assume without loss of generality that
s ≤ t. One can show that the singular values of B are the square roots of the eigenvalues of BB T .
Part II

The Zoo of Graphs

Chapter 5

Fundamental Graphs

We will bound and derive the eigenvalues of the Laplacian matrices of some fundamental graphs, including complete graphs, star graphs, ring graphs, path graphs, and products of these that yield grids and hypercubes. As all these graphs are connected, they all have eigenvalue zero with multiplicity one. We will have to do some work to compute the other eigenvalues.

We will see in Part IV that the Laplacian eigenvalues that reveal the most about a graph are the
smallest and largest ones. To interpret the smallest eigenvalues, we will exploit a relation between
λ2 and the isoperimetric ratio of a graph that is derived in Chapter 20, and which we state here
for convenience. The boundary of a set of vertices S, written ∂(S), is the set of edges with exactly
one endpoint in S. The isoperimetric ratio of S, written θ(S), is the size of the boundary of S
divided by size of S:
    θ(S) def= |∂(S)| / |S| .
In Theorem 20.1.1, we show
θ(S) ≥ λ2 (1 − s). (5.1)

5.1 The complete graph

The complete graph on n vertices, Kn , has edge set {(a, b) : a ̸= b}.

Lemma 5.1.1. The Laplacian of Kn has eigenvalue 0 with multiplicity 1 and n with multiplicity
n − 1.

Proof. To compute the non-zero eigenvalues, let ψ be any non-zero vector orthogonal to the all-1s
vector, so
    Σ_a ψ(a) = 0.    (5.2)

We now compute the first coordinate of LKn ψ. Using (3.3), the expression for the action of the

43 44
CHAPTER 5. FUNDAMENTAL GRAPHS 45 CHAPTER 5. FUNDAMENTAL GRAPHS 46

Laplacian as an operator, we find
    (LKn ψ)(1) = Σ_{b≥2} (ψ(1) − ψ(b)) = (n − 1)ψ(1) − Σ_{b=2}^{n} ψ(b) = nψ(1),  by (5.2).
As the choice of coordinate was arbitrary, we have Lψ = nψ. So, every vector orthogonal to the all-1s vector is an eigenvector of eigenvalue n.
Proof. Applying Lemma 5.2.1 to vertices i and i + 1 for 2 ≤ i < n, we find n − 2 linearly independent eigenvectors of the form δ_i − δ_{i+1}, all with eigenvalue 1. As 0 is also an eigenvalue, only one eigenvalue remains to be determined.
Recall that the trace of a matrix equals both the sum of its diagonal entries and the sum of its eigenvalues. We know that the trace of LSn is 2n − 2, and we have identified n − 1 eigenvalues that sum to n − 2. So, the remaining eigenvalue must be n.

Alternative approach. Observe that LKn = nI − 11T . To determine the corresponding eigenvector, recall that it must be orthogonal to the other
eigenvectors we have identified. This tells us that it must have the same value at each of the
points of the star. Let this value be 1, and let x be the value at vertex 1. As the eigenvector is
We often think of the Laplacian of the complete graph as being a scaling of the identity. For
orthogonal to the constant vectors, it must be that
every x orthogonal to the all-1s vector, Lx = nx .
Now, let’s see how our bound on the isoperimetric ratio works out. Let S ⊂ [n]. Every vertex in S (n − 1) + x = 0,
has n − |S| edges connecting it to vertices not in S. So,
so x = −(n − 1).
|S| (n − |S|)
θ(S) = = n − |S| = λ2 (LKn )(1 − s),
|S|
5.3 Products of graphs
where s = |S| /n. Thus, Theorem 20.1.1 is sharp for the complete graph.
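As a quick numerical check of Lemma 5.1.1 and of the identity LKn = nI − 11ᵀ (a sketch in Julia using only the LinearAlgebra standard library; it is an illustration, not part of the development):

using LinearAlgebra

n = 6
L = n * I(n) - ones(n, n)            # L_{K_n} = nI - 11^T: diagonal entries n - 1, off-diagonal entries -1
evals = sort(eigvals(Symmetric(Matrix(L))))
println(round.(evals; digits = 8))   # expect a single 0 followed by n - 1 copies of n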
We now define a product on graphs. If we apply this product to two paths, we obtain a grid. If
we apply it repeatedly to one edge, we obtain a hypercube.
5.2 The star graphs
Definition 5.3.1. Let G = (V, E, v) and H = (W, F, w) be weighted graphs. Then G × H is the
The star graph on n vertices, denoted Sn , has edge set {(1, a) : 2 ≤ a ≤ n}. The degrees of graph with vertex set V × W and edge set
vertices in star graphs vary considerably, and their Laplacian and adjacency matrices have very  
different eigenvalues.
    ((a, b), (â, b)) with weight v_{a,â} , where (a, â) ∈ E, and
    ((a, b), (a, b̂)) with weight w_{b,b̂} , where (b, b̂) ∈ F.
To determine the eigenvalues of Sn , we first observe that each vertex a ≥ 2 has degree 1, and that each of these degree-one vertices has the same neighbor. We now show that, in every graph,
whenever two degree-one vertices share the same neighbor, they provide an
eigenvector of eigenvalue 1.

Lemma 5.2.1. Let G = (V, E) be a graph, and let a and b be vertices of degree one with the same
neighbor. Then, the vector ψ = δ a − δ b is an eigenvector of LG of eigenvalue 1.

Proof. Just multiply LG by ψ, and check (using (3.3)) vertex-by-vertex that it equals ψ.

While the star graph has (n−1 choose 2) pairs of degree-one vertices with the same neighbor, the span of the corresponding eigenvectors only has dimension n − 2.
As eigenvectors of different eigenvalues are orthogonal, for every eigenvector ψ of the Laplacian of
the star with eigenvalue other than 1 and for every a and b of degree one, we have
ψ T (δ a − δ b ) = 0, and thus that ψ(a) = ψ(b). Figure 5.1: An m-by-n grid graph is the product of a path on m vertices with a path on n vertices.
Lemma 5.2.2. The graph Sn has eigenvalue 0 with multiplicity 1, eigenvalue 1 with multiplicity This is a drawing of a 5-by-4 grid made using Hall’s algorithm from Section 3.2.
n − 2, and eigenvalue n with multiplicity 1.
CHAPTER 5. FUNDAMENTAL GRAPHS 47 CHAPTER 5. FUNDAMENTAL GRAPHS 48

Theorem 5.3.2. Let G = (V, E, v) and H = (W, F, w) be weighted graphs with Laplacian The eigenvectors of H1 are    
eigenvalues λ1 , . . . , λn and µ1 , . . . , µm , and eigenvectors α1 , . . . , αn and β 1 , . . . , β m , respectively. 1 1
and ,
Then, for each 1 ≤ i ≤ n and 1 ≤ j ≤ m, the Laplacian of G × H has an eigenvector γ i,j of 1 −1
eigenvalue λi + µj such that with eigenvalues 0 and 2, respectively. Thus, if ψ is an eigenvector of Hd−1 with eigenvalue λ, then
γ i,j (a, b) = αi (a)β j (b).
   
ψ ψ
and ,
Proof. Let α be an eigenvector of LG of eigenvalue λ, let β be an eigenvector of LH of eigenvalue ψ −ψ
µ, and let γ be defined as above.
are eigenvectors of Hd with eigenvalues λ and λ + 2, respectively. This means that Hd has
To see that γ is an eigenvector of eigenvalue λ + µ, we compute eigenvalue 2i for each 0 ≤ i ≤ d with multiplicity di . Moreover, each eigenvector of Hd can be
X X   identified with a vector y ∈ {0, 1}d :
(LG×H γ)(a, b) = va,ba (γ(a, b) − γ(b
a, b)) + wb,bb γ(a, b) − γ(a, bb)
Tx
(a,b
a)∈E (b,b
b)∈F ψ y (x ) = (−1)y ,
X X  
= va,ba (α(a)β(b) − α(b
a)β(b)) + wb,bb α(a)β(b) − α(a)β(bb)
where x ∈ {0, 1} ranges over the vertices of Hd . Each y ∈ {0, 1}d−1 indexing an eigenvector of
d
(a,b
a)∈E (b,b
b)∈F
X X   Hd−1 leads to the eigenvectors of Hd indexed by (y , 0) and (y , 1).
= va,ba β(b) (α(a) − α(b
a)) + wb,bb α(a) β(b) − β(bb)
Using Theorem 20.1.1 and the fact that λ2 (Hd ) = 2, we can immediately prove the following
(a,b
a)∈E (b,b
b)∈F
isoperimetric theorem for the hypercube.
= β(b)λα(a) + α(a)µβ(b)
Corollary 5.3.3.
= (λ + µ)(α(a)β(b)).
θHd ≥ 1.
In particular, for every set of at most half the vertices of the hypercube, the number of edges on
the boundary of that set is at least the number of vertices in that set.
An alternative approach to defining the graph product and proving Theorem 5.3.2 is via
Kronecker products. Recall that the Kronecker product A ⊗ B of an m1 -by-n1 matrix A and an This result is tight, as you can see by considering one face of the hypercube, such as all the
m2 -by-n2 matrix B is the (m1 m2 )-by-(n1 n2 ) matrix of form vertices whose labels begin with 0. It is possible to prove this by more concrete combinatorial
  means. In fact, very precise analyses of the isoperimetry of sets of vertices in the hypercube can
A(1, 1)B A(1, 2)B · · · A(1, n1 )B
 .. .. .  be obtained. See [Har76] or [Bol86].
 .. .. .
. . .
A(m1 , 1)B A(m1 , 2)B ··· A(m1 , n1 )B
5.4 Bounds on λ2 by test vectors
G × H is the graph with Laplacian matrix

(LG ⊗ I W ) + (I V ⊗ LH ). If we can guess an approximation of ψ 2 , we can often plug it in to the Laplacian quadratic form
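This Kronecker-product description makes Theorem 5.3.2 easy to verify numerically. The following is a hedged sketch (not from the text); the two small path Laplacians are written out by hand and the grid Laplacian is assembled exactly as in the displayed formula:

using LinearAlgebra

LG = [1.0 -1 0; -1 2 -1; 0 -1 1]                       # Laplacian of a path on 3 vertices
LH = [1.0 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 1]      # Laplacian of a path on 4 vertices
Lgrid = kron(LG, Matrix(1.0I, 4, 4)) + kron(Matrix(1.0I, 3, 3), LH)   # (L_G ⊗ I_W) + (I_V ⊗ L_H)

lhs = sort(eigvals(Symmetric(Lgrid)))
rhs = sort(vec([λ + μ for λ in eigvals(Symmetric(LG)), μ in eigvals(Symmetric(LH))]))
println(maximum(abs.(lhs - rhs)))   # ≈ 0: the eigenvalues of the product are the pairwise sums λ_i + μ_j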
to obtain a good upper bound on λ2 . The Courant-Fischer Theorem tells us that every vector v
orthogonal to 1 provides an upper bound on λ2 :
5.3.1 The Hypercube
v T Lv
d λ2 ≤ .
The d-dimensional hypercube graph, Hd , is the graph with vertex set {0, 1} that has edges vT v
between vertices whose names differ in exactly one coordinate. The hypercube may also be
When we use a vector v in this way, we call it a test vector.
expressed as the product of the one-edge graph with itself d − 1 times.
Let’s see what a natural test vector can tell us about λ2 of a path graph on n vertices. I would
Let H1 be the graph with vertex set {0, 1} and one edge between those vertices. Its Laplacian
like to use the vector that simply maps each vertex to its index on the path, but that vector is not
matrix has eigenvalues 0 and 2. As Hd = Hd−1 × H1 , we may use this to calculate the eigenvalues
and eigenvectors of Hd for every d.
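Here is a brief check of that recursion (a sketch, not from the text; hypercube_laplacian is an ad hoc helper that applies the product construction d − 1 times):

using LinearAlgebra

# Laplacian of H_d, built by repeatedly taking the product with the one-edge graph H_1.
function hypercube_laplacian(d)
    L1 = [1.0 -1.0; -1.0 1.0]
    L = L1
    for _ in 2:d
        n = size(L, 1)
        L = kron(L, Matrix(1.0I, 2, 2)) + kron(Matrix(1.0I, n, n), L1)
    end
    return L
end

d = 4
evals = round.(Int, eigvals(Symmetric(hypercube_laplacian(d))))
println([(2i, count(==(2i), evals)) for i in 0:d])   # eigenvalue 2i should appear binomial(d, i) times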
CHAPTER 5. FUNDAMENTAL GRAPHS 49 CHAPTER 5. FUNDAMENTAL GRAPHS 50

orthogonal to 1. So, we will use the next best thing. Let x be the vector such that x (a) = (n + 1) − 2a, for 1 ≤ a ≤ n. This vector satisfies x ⊥ 1, so

    λ2 (Pn ) ≤ ( Σ_{1≤a<n} (x(a) − x(a + 1))² ) / ( Σ_a x(a)² )
             = ( Σ_{1≤a<n} 2² ) / ( Σ_a (n + 1 − 2a)² )
             = 4(n − 1) / ( (n + 1)n(n − 1)/3 )
             = 12 / ( n(n + 1) ).    (5.3)

(Clearly, the denominator is at least n³/c for some c.)

Figure 5.2: (a) The ring graph on 9 vertices. (b) The eigenvectors for k = 2.

We will soon see that this bound is of the right order of magnitude. This means that the relation between λ2 and the isoperimetric ratio given in Eq. (5.1) (and proved in Theorem 20.1.1) is very
loose for the path graph: The isoperimetric ratio is minimized by the set S = {1, . . . , n/2}, which
has θ(S) = 2/n. However, the lower bound provided by Eq. (5.1) is of the form c/n2 . Cheeger’s
inequality, which appears in Chapter 21, will tell us that the error of this approximation can not Note that x 0 is the all-ones vector. When n is even the vector x n/2 exists, and it alternates ±1.
be worse than quadratic.
Proof. We will first see that x 1 and y 1 are eigenvectors by drawing the ring graph on the unit
The Courant-Fischer theorem is not as helpful when we want to prove lower bounds on λ2 . To circle in the natural way: plot vertex a at point (cos(2πa/n), sin(2πa/n)).
prove lower bounds, we need the form with a maximum on the outside, which gives
You can see that the average of the neighbors of a vertex is a vector pointing in the same
v T Lv direction as the vector associated with that vertex. This should make it obvious that both the x
λ2 ≥ max min .
S:dim(S)=n−1 v ∈S vT v and y coordinates in this figure are eigenvectors of the same eigenvalue. The same holds for all k.
This is not too helpful, as it is difficult to prove lower bounds on Alternatively, we can verify that these are eigenvectors by a simple computation, recalling that
v T Lv cos(a + b) = cos a cos b − sin a sin b, and
min T
v ∈S v v
sin(a + b) = sin a cos b + cos a sin b.
over a space S of large dimension. We will see a technique that lets us prove such lower bounds
in the next chapter.
But, first we compute the eigenvalues and eigenvectors of the path graph exactly.
(LRn x k ) (a) = 2x k (a) − x k (a + 1) − x k (a − 1)
= 2 cos(2πka/n) − cos(2πk(a + 1)/n) − cos(2πk(a − 1)/n)
5.5 The Ring Graph = 2 cos(2πka/n) − cos(2πka/n) cos(2πk/n) + sin(2πka/n) sin(2πk/n)
− cos(2πka/n) cos(2πk/n) − sin(2πka/n) sin(2πk/n)
The ring graph on n vertices, Rn , may be viewed as having a vertex set corresponding to the = 2 cos(2πka/n) − cos(2πka/n) cos(2πk/n) − cos(2πka/n) cos(2πk/n)
integers modulo n. In this case, we view the vertices as the numbers 0 through n − 1, with edges
= (2 − 2 cos(2πk/n)) cos(2πka/n)
(a, a + 1), computed modulo n.
= (2 − 2 cos(2πk/n))x k (a).
Theorem 5.5.1. The Laplacian of Rn has eigenvectors
The computation for y k follows similarly.
x k (a) = cos(2πka/n), for 0 ≤ k ≤ n/2, and
y k (a) = sin(2πka/n), for 1 ≤ k < n/2. Of course, rotating these pairs of eigenvectors results in another eigenvector of the same
Eigenvectors x k and y k have eigenvalue 2 − 2 cos(2πk/n).
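Theorem 5.5.1 is easy to confirm numerically (a sketch, not from the text; ring_laplacian is an ad hoc helper that builds L_{R_n} from the edge list):

using LinearAlgebra

function ring_laplacian(n)
    L = zeros(n, n)
    for a in 0:n-1
        b = mod(a + 1, n)                     # the edge (a, a+1), computed modulo n
        L[a+1, a+1] += 1; L[b+1, b+1] += 1
        L[a+1, b+1] -= 1; L[b+1, a+1] -= 1
    end
    return L
end

n = 9
computed = sort(eigvals(Symmetric(ring_laplacian(n))))
predicted = sort([2 - 2cos(2π * k / n) for k in 0:n-1])   # each nonzero eigenvalue appears twice
println(maximum(abs.(computed - predicted)))              # ≈ 0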
CHAPTER 5. FUNDAMENTAL GRAPHS 51 CHAPTER 5. FUNDAMENTAL GRAPHS 52

eigenvalue. To see this algebraically, observe For any eigenvector ψ of R2n with eigenvalue λ for which ψ(a) = ψ(a + n) for 1 ≤ a ≤ n, the
above equation gives us a way to turn this into an eigenvector of Pn . Let ϕ ∈ IRn be the vector
cos(2πka/n + θ) = cos(2πka/n) cos θ − sin(2πka/n) sin θ for which
= (cos θ)x k − (sin θ)y k . ϕ(a) = ψ(a), for 1 ≤ a ≤ n.
and thus is in the span of x k and y k . Then,
         
    (I_n ; I_n ) ϕ = ψ ,    L_{R2n} (I_n ; I_n ) ϕ = λψ = λ (I_n ; I_n ) ϕ,    and    (I_n  I_n ) L_{R2n} ψ = 2λϕ,
where (I_n ; I_n ) denotes the 2n-by-n matrix obtained by stacking two copies of I_n , and (I_n  I_n ) is its transpose.
5.6 The Path Graph
So, if we can find such a vector ψ, then the corresponding ϕ is an eigenvector of Pn of eigenvalue
We will derive the eigenvalues and eigenvectors of the path graph from those of the ring graph. λ.
To begin, I will number the vertices of the ring a little differently, as in Figure 5.3.
As you’ve probably guessed, we can find such vectors ψ by rotating those derived in
Theorem 5.5.1. For each of the n − 1 two-dimensional eigenspaces of R2n , we get one such vector.
3 2 I’ve drawn Figure 5.3 so that the horizontal coordinate provides one.
For each 1 ≤ k < n, we rotate the eigenvectors x k and y k of R2n by θ = −πk/2n to obtain
4 1 ψ = (cos θ)x k − (sin θ)y k , so that, under the new ordering of the ring,
(
cos(2πka/(2n) − πk/2n) for 1 ≤ a ≤ n,
ψ(a) =
cos(2πk(2n + 1 − a)/(2n) − πk/2n) for n < a ≤ 2n.

8 5 To verify that ψ(a + n) = ψ(a), we use cos(−x) = cos x and

cos(2πk(2n + 1 − a)/(2n) − πk/2n) = cos(2π − 2πka/(2n) + πk/n − πk/2n)


7 6 = cos(−2πka/(2n) + πk/2n)
= cos(2πka/(2n) − πk/2n).

Figure 5.3: The ring on 8 = 2n vertices, numbered differently. In this picture, vertex a appears We now set
above vertex n + a, and every edge appears above another edge. v k (a) = ψ(a) = cos(πka/n − πk/2n).

The corresponding eigenvalue is


Theorem 5.6.1. Let Pn = (V, E) where V = {1, . . . , n} and E = {(a, a + 1) : 1 ≤ a < n}. The
Laplacian of Pn has the same eigenvalues as R2n , excluding 2. That is, Pn has eigenvalues 2 − 2 cos(2πk/(2n)) = 2(1 − cos(πk/n)),
2(1 − cos(πk/n)), with corresponding eigenvectors
for 1 ≤ k < n.
v k (a) = cos(πka/n − πk/2n).
We now know n − 1 distinct eigenvalues. The last, of course, is zero and comes from the constant
for 0 ≤ k < n vector.
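As a numerical sanity check of this theorem (a sketch, not from the text; path_laplacian is an ad hoc helper):

using LinearAlgebra

path_laplacian(n) = Matrix(SymTridiagonal([1.0; fill(2.0, n - 2); 1.0], fill(-1.0, n - 1)))

n = 8
computed = sort(eigvals(Symmetric(path_laplacian(n))))
predicted = sort([2 * (1 - cos(π * k / n)) for k in 0:n-1])
println(maximum(abs.(computed - predicted)))   # ≈ 0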

Proof. We derive the eigenvectors and eigenvalues by treating Pn as a quotient of R2n : we will The type of quotient used in the above argument is known as an equitable partition. You can find
identify vertex a of Pn with vertices a and a + n of R2n (under the new numbering of R2n ). These an extensive exposition of these in Godsil's book [God93].
are pairs of vertices that are above each other in the figure that I drew.
Let I n be the n-dimensional identity matrix. You should check that
 
    (I_n  I_n ) L_{R2n} (I_n ; I_n ) = 2 L_{Pn} .
CHAPTER 6. COMPARING GRAPHS 54

which is equivalent to
v T Av ≥ v T Bv
for all v .
The relation ≽ is called the Loewner partial order. It is partial because it orders some pairs of
symmetric matrices, while others are incomparable. But, for all pairs to which it does apply, it
Chapter 6 acts like an order. For example, we have

A ≽ B and B ≽ C implies A ≽ C ,

Comparing Graphs and


A ≽ B implies A + C ≽ B + C ,
for symmetric matrices A, B and C .
We will overload this notation by defining it for graphs as well. Thus, we write
6.1 Overview G≽H

It is rare that one can analytically determine the eigenvalues of an abstractly defined graph. if LG ≽ LH . When we write this, we are always describing an inequality on Laplacian matrices.
Often the best one can do is prove loose bounds on some eigenvalues. It is usually easy to prove
For example, if G = (V, E) is a graph and H = (V, F ) is a subgraph of G with the same vertex
lower bounds on the largest eigenvalues or upper bounds on the smallest eigenvalues: one need
set, then
merely compute the value of the quadratic form in a suitably chosen test vector, and then apply
LG ≽ LH .
the Courant-Fischer Theorem (see Theorem 2.0.1 and Section 5.4). Proving bounds in the other
direction is more difficult. To see this, recall the Laplacian quadratic form:
X
In this chapter we will see a powerful technique that allows one to compare one graph with x T LG x = wu,v (x (u) − x (v))2 .
another, and thereby prove things like lower bounds on the second-smallest eigenvalue of a (u,v)∈E
Laplacian. The technique often goes by the name “Poincaré Inequalities” (see
[DS91, SJ89, GLM99]), or “Graphic inequalities”. It is clear that dropping edges can only decrease the value of the quadratic form, because all
edge weights are positive. The same holds for decreasing the weights of edges.
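The following sketch (not from the text; laplacian is an ad hoc helper) illustrates the point numerically: dropping an edge produces a Laplacian that is dominated in the Loewner order, and its eigenvalues are no larger.

using LinearAlgebra

# Laplacian from a list of weighted edges (a, b, w) on vertices 1..n.
function laplacian(n, edges)
    L = zeros(n, n)
    for (a, b, w) in edges
        L[a, a] += w; L[b, b] += w
        L[a, b] -= w; L[b, a] -= w
    end
    return L
end

LG = laplacian(4, [(1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (1, 4, 1.0)])   # a 4-cycle
LH = laplacian(4, [(1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)])                # the same graph with one edge removed
println(minimum(eigvals(Symmetric(LG - LH))))   # ≥ 0, i.e. L_G ≽ L_H
println(all(sort(eigvals(Symmetric(LG))) .>= sort(eigvals(Symmetric(LH))) .- 1e-12))   # λ_k(G) ≥ λ_k(H) for all k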
This notation is particularly useful when we consider some multiple of a graph, such as when we
6.2 The Loewner order write
G ≽ c · H,
I begin by recalling an extremely useful piece of notation that is used in the optimization
for some c > 0. What is c · H? It is the same graph as H, but the weight of every edge is
community. For a symmetric matrix A, we write
multiplied by c. More formally, it is the graph such that Lc·H = cLH .
A≽0 We usually use this notation for the inequalities it implies on the eigenvalues of LG and LH .
if A is positive semidefinite. That is, if all of the eigenvalues of A are nonnegative. This is Lemma 6.2.1. For any c > 0, if G and H are graphs such that
equivalent to
v T Av ≥ 0, G ≽ c · H,
for all v . We similarly write then
A≽B λk (G) ≥ cλk (H),
if for all k.
A − B ≽ 0,

53
CHAPTER 6. COMPARING GRAPHS 55 CHAPTER 6. COMPARING GRAPHS 56

Proof. The Courant-Fischer Theorem tells us that

    λk (G) = min_{S⊆IRⁿ : dim(S)=k} max_{x ∈S} (xᵀ LG x)/(xᵀ x)
           ≥ min_{S⊆IRⁿ : dim(S)=k} max_{x ∈S} c (xᵀ LH x)/(xᵀ x)
           = c · min_{S⊆IRⁿ : dim(S)=k} max_{x ∈S} (xᵀ LH x)/(xᵀ x) = c λk (H).

Proof. As

    xᵀ LPn x = Σ_{a=1}^{n−1} (x (a + 1) − x (a))²    and    xᵀ LG1,n x = (x (n) − x (1))²,

we need to show that for every x ∈ IRn ,


n−1
X
Corollary 6.2.2. Let G be a graph and let H be obtained by either removing an edge from G or (n − 1) (x (a + 1) − x (a))2 ≥ (x (n) − x (1))2 .
decreasing the weight of an edge in G. Then, for all k a=1

λk (G) ≥ λk (H). For 1 ≤ a ≤ n − 1, set


∆(a) = x (a + 1) − x (a),
P
and note that a ∆(a) = x (n) − x (1). The inequality we need to prove then becomes
6.3 Approximations of Graphs !2
n−1
X n−1
X
(n − 1) ∆(a)2 ≥ ∆(a) .
We consider one graph to be a good approximation of another if their Laplacian quadratic forms a=1 a=1
are similar. For example, we will say that H is a c-approximation of G if
This follows from the Cauchy-Schwarz inequality, which I remind you is the fact that the inner
cH ≽ G ≽ H/c. product of two vectors is at most the product of their norms. In this case, those vectors are ∆
and the all-ones vector of length n − 1:
We consider this approximation to be good if c is just slightly larger than 1. Surprisingly good n−1
!2 n−1
approximations exist. For example, random regular and random Erdös-Rényi graphs are good X 2 X
∆(a) = 1Tn−1 ∆ ≤ (∥1n−1 ∥ ∥∆∥)2 = ∥1n−1 ∥2 ∥∆∥2 = (n − 1) ∆(i)2 .
approximations of complete graphs (see Chapter 8). Infinite families of expander graphs provide a=1 i=1
explicit sparse graphs that are good approximations of complete graphs, even though they have
many fewer edges. For every ϵ > 0 there is a d > 0 such that for all n > 0 there is a d-regular
graph Gn that is a (1 + ϵ)-approximation of Kn .
In Chapters 32 and 33 we will see that every graph can be well-approximated by a sparse graph. 6.4.1 Lower bounding λ2 of a Path Graph

In Theorem 5.6.1 we proved that λ2 (Pn ) = 2(1 − cos(π/n)), which is approximately π 2 /n2 for
6.4 The Path Inequality large n. We now demonstrate the power of Lemma 6.4.1 by using it to prove a lower bound on
λ2 (Pn ) that is very close to this.
By now you should be wondering, “how do we prove that G ≽ c · H for some graph G and H?” To prove a lower bound on λ2 (Pn ), we will prove that some multiple of the path is at least the
Not too many ways are known. We’ll do it by proving some inequalities of this form for some of complete graph. To use this bound, we need to know the eigenvalues of the complete graph. In
the simplest graphs, and then extending them to more general graphs. For example, we will prove Lemma 5.1.1, we show that all the non-zero eigenvalues of the Laplacian of the complete graph
are n, and in particular λ2 (Kn ) = n.
(n − 1) · Pn ≽ G1,n , (6.1)
Let Ga,b denote the graph containing only edge (a, b), and write
where Pn is the path from vertex 1 to vertex n, and G1,n is the graph with just the edge (1, n). X
All of these edges are unweighted. LKn = LGa,b .
a<b
The following very simple proof of this inequality was discovered by Sam Daitch.
For every a < b, let Pa,b be the subgraph of the path graph induced on vertices with indices
Lemma 6.4.1. between a and b. Note that this a path graph of length b − a.
(n − 1) · Pn ≽ G1,n .
For every edge (a, b) in the complete graph, we apply the only inequality available in the path:

Ga,b ≼ (b − a)Pa,b ≼ (b − a)Pn . (6.2)


CHAPTER 6. COMPARING GRAPHS 57 CHAPTER 6. COMPARING GRAPHS 58

This inequality says that Ga,b is at most (b − a) times the part of the path connecting a to b, and that this part of the path is less than the whole.
Summing inequality (6.2) over all edges (a, b) ∈ Kn gives
    Kn = Σ_{a<b} Ga,b ≼ Σ_{a<b} (b − a) Pn .
To finish the proof, we compute
    Σ_{1≤a<b≤n} (b − a) = Σ_{c=1}^{n−1} c(n − c) = n(n + 1)(n − 1)/6.
So,
    LKn ≼ ( n(n + 1)(n − 1)/6 ) LPn .
Rewriting this inequality in the form
    LPn ≽ ( 6/(n(n + 1)(n − 1)) ) LKn ,
recalling that λ2 (Kn ) = n, and applying Lemma 6.2.1, gives us a pretty good lower bound on the second-smallest eigenvalue of Pn :
    λ2 (Pn ) ≥ 6/((n + 1)(n − 1)).

6.5 The Complete Binary Tree

Let's do the same analysis for the complete binary tree.
One way of understanding the complete binary tree of depth d + 1 is to identify the vertices of the tree with strings over {0, 1} of length at most d. The root of the tree is the empty string. Every non-leaf node has two children, obtained by appending one character to its label. And, every node other than the root has one ancestor, obtained by removing the last character. For example, the node with label 10 has children labeled 100 and 101, and its ancestor has label 1.
Alternatively, you can describe it as the graph on n = 2^{d+1} − 1 nodes with edges of the form (i, 2i) and (i, 2i + 1) for i < n. We will name this graph Tn . See figure 6.1 for pictures of these.

Figure 6.1: T3 , T7 and T15 . Node 1 is at the top, 2 and 3 are its children. Some other nodes have been labeled as well.

Let's first upper bound λ2 (Tn ) by constructing a test vector x . Set x (1) = 0, x (2) = 1, and x (3) = −1. Then, for every vertex u that we can reach from node 2 without going through node 1, we set x (a) = 1. For all the other nodes, we set x (a) = −1.

Figure 6.2: The test vector we use to upper bound λ2 (T15 ).

We have constructed x symmetrically, so that 1ᵀ x = 0. Thus, by the Courant-Fischer Theorem (Theorem 2.0.1),
    λ2 ≤ (xᵀ L x)/(xᵀ x) = Σ_{a∼b} (x (a) − x (b))² / Σ_a x (a)² = ( (x (1) − x (2))² + (x (1) − x (3))² ) / (n − 1) = 2/(n − 1).
We will again prove a lower bound by comparing Tn to the complete graph. For each a < b, let T_{a,b} denote the unique path in T from a to b. This path will have length at most 2d ≤ 2 log2 n. So, we have
    Kn = Σ_{a<b} Ga,b ≼ Σ_{a<b} (2d) T_{a,b} ≼ Σ_{a<b} (2 log2 n) Tn = (n choose 2)(2 log2 n) Tn .
This gives the bound
    (n choose 2)(2 log2 n) λ2 (Tn ) ≥ λ2 (Kn ) = n,
which implies
    λ2 (Tn ) ≥ 1/((n − 1) log2 n).
You should now wonder which bound is closer to the truth: the lower bound of 1/((n − 1) log2 n) or the upper bound of 2/(n − 1). A small experiment suggests that the correct answer is close to 1/n:

julia> d = 12; n = 2^d - 1
julia> e = fiedler(complete_binary_tree(n))
julia> e[1][1], 1/n
(0.0002453419413893766, 0.0002442002442002442)

Using the generalization of Lemma 6.4.1 presented in the next section, we will improve the lower bound to 1/2n.
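Both the path bound of Section 6.4.1 and the tree bounds above are easy to check against exact eigenvalues (a sketch, not from the text; the two helpers are ad hoc constructions of L_{P_n} and L_{T_n}):

using LinearAlgebra

path_laplacian(n) = Matrix(SymTridiagonal([1.0; fill(2.0, n - 2); 1.0], fill(-1.0, n - 1)))

# Complete binary tree on n = 2^(d+1) - 1 vertices, with edges (i, 2i) and (i, 2i+1).
function tree_laplacian(n)
    L = zeros(n, n)
    for i in 1:n, j in (2i, 2i + 1)
        j > n && continue
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    end
    return L
end

n = 2^7 - 1
λ2_path = sort(eigvals(Symmetric(path_laplacian(n))))[2]
λ2_tree = sort(eigvals(Symmetric(tree_laplacian(n))))[2]
println((6 / ((n + 1) * (n - 1)), λ2_path, 12 / (n * (n + 1))))   # lower bound, λ2(P_n), upper bound (5.3)
println((1 / ((n - 1) * log2(n)), λ2_tree, 2 / (n - 1)))          # lower bound, λ2(T_n), upper bound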
CHAPTER 6. COMPARING GRAPHS 59 CHAPTER 6. COMPARING GRAPHS 60

6.6 The weighted path 6.7 A better lower bound on λ2 (Tn )

We now generalize the inequality in Lemma 6.4.1 to weighted path graphs. Allowing for Theorem 6.7.1.
weights on the edges of the path greatly extends its applicability. λ2 (Tn ) ≥ 1/2n.

Lemma 6.6.1. Let w1 , . . . , wn−1 be positive. Then


Proof. Let n = 2d+1 − 1. Define the depth of a vertex in Tn to be its distance from the root, and
n−1
! define the depth of an edge to be the maximum of the depths of its vertices. So, the lowest
X 1 n−1 X
G1,n ≼ wa Ga,a+1 . possible depth of an edge is 1 and the largest possible is d.
wa
a=1 a=1
For each pair of vertices (a, b), we let T a,b contain exactly the edges on the unique path from a to
Proof. Let x ∈ IRn . As in the proof of Lemma 6.4.1, set ∆(a) = x (a + 1) − x (a). We again have b in Tn , but we assign weight 2i to the edges at depth i. Note that each such path contains at
that most two edges from each depth, 1 through d, and so Lemma 6.6.1 implies
!2
X
x T LG1,n x = (x (1) − x (n))2 = ∆(a) . Ga,b ≼ 2(2−1 + 2−2 + · · · + 2−(d−1 )) · T a,b ≼ 2 · T a,b .
a
So, X X
But, we will need to weight the entries in ∆ before applying Cauchy-Schwarz. Set
Kn = Ga,b ≼ 2 · T a,b .
√ a<b a<b
γ(a) = ∆(a) wa ,
It now remains to upper bound the right-hand term by some multiple of Tn . To this end, observe
and let w −1/2 denote the vector for which that each depth i edge has 2d+1−i − 1 vertices beneath it. So, it appears on
1
w −1/2 (a) = √ . (2d+1−i − 1)(n − 2d+1−i + 1) ≤ (2d+1−i − 1)n
wa
paths between pairs (a, b). As the weight of such an edge in the sum is 2i each time it appears, it
Then, X
T −1/2 appears with total weight at most (2d+1 − 2i )n ≤ n2 . Thus, we find
∆(a) = γ w ,
a X
T a,b ≼ n2 Tn ,
2 X 1
w −1/2 = , a<b
a
wa
and may conclude
and X Kn ≼ 2n2 Tn .
∥γ∥2 = ∆(a)2 wa .
a As λ2 (Kn ) = n, it follows from Lemma 6.2.1 that
So, n
λ2 (Tn ) ≥ = 1/2n.
!2 2n2
X  2
x T LG1,n x = ∆(a) = γ T w −1/2
a
! ! !
 2 X 1 X X 1 n−1
X
≤ ∥γ∥ w −1/2 = ∆(a)2 wa = xT wa LGa,a+1 x.
a
wa a a
wa
a=1
CHAPTER 7. CAYLEY GRAPHS 62

7.2 Paley Graphs

The Paley graphs are Cayley graphs over the group of integers modulo a prime, p, where p is
equivalent to 1 modulo 4. Such a group is often written Z/p.
I should begin by reminding you a little about the integers modulo p. The first thing to remember
Chapter 7 is that the integers modulo p are actually a field, written Fp . That is, they are closed under both
addition and multiplication (completely obvious), have identity elements under addition and
multiplication (0 and 1), and have inverses under addition and multiplication. It is obvious that
the integers have inverses under addition: −x modulo p plus x modulo p equals 0. It is a little less
Cayley Graphs obvious that the integers modulo p have inverses under multiplication (except that 0 does not
have a multiplicative inverse). That is, for every x ̸= 0, there is a y such that xy = 1 modulo p.
When we write 1/x, we mean this element y.
The generators of the Paley graphs are the squares modulo p (usually called the quadratic
residues). That is, the set of numbers s such that there exists an x for which x2 ≡p s. Thus, the
7.1 Cayley Graphs vertex set is {0, . . . , p − 1}, and there is an edge between vertices u and v if u − v is a square
modulo p. I should now prove that −s is a quadratic residue if and only if s is. This will hold
Ring graphs and hypercubes are types of Cayley graphs. In general, the vertices of a Cayley provided that p is equivalent to 1 modulo 4. To prove that, I need to tell you one more thing
graph are the elements of some group Γ. In the case of the ring, the group is the set of integers about the integers modulo p: their multiplicative group is cyclic.
modulo n. The edges of a Cayley graph are specified by a set S ⊂ Γ, which are called the
generators of the Cayley graph. The set of generators must be closed under inversion. That is, if Fact 7.2.1. For every prime p, there exists a number g such that for every number x between 1
s ∈ S, then s−1 ∈ S. Vertices a, b ∈ Γ are connected by an edge if there is an s ∈ S such that and p − 1, there is a unique i between 1 and p − 1 such that

a ◦ s = b, x ≡ gi mod p.

where ◦ is the group operation. In the case of Abelian groups, like the integers modulo n, this In particular, g p−1 ≡ 1. And, the mapping between x and i is a bijection.
would usually be written a + s = b. The generators of the ring graph are {1, −1}.
Corollary 7.2.2. If p is a prime equivalent to 1 modulo 4, then −1 is a square modulo p.
The d-dimensional hypercube, Hd , is a Cayley graph over the additive group (Z/2)d : that is the
set of vectors in {0, 1}d under addition modulo 2. The generators are given by the vectors in Proof. Let i be the number between 1 and p − 1 such that g i ≡ −1 modulo p. Then, g 2i ≡ 1
{0, 1}d that have a 1 in exactly one position. This set is closed under inverse, because every modulo p and so by Fact 7.2.1 we know that 2i must be equivalent to p − 1 modulo p. The only
element of this group is its own inverse. number between 1 and p that satisfies this relation is i = (p − 1)/2.
We require S to be closed under inverse so that the graph is undirected: As 4 divides p − 1, (p − 1)/4 is an integer. So, we can set s = g (p−1)/4 , and finish the proof by
observing that s2 ≡ g (p−1)/2 ≡ −1 modulo p.
a+s=b ⇐⇒ b + (−s) = a.

We now understand a lot about the squares modulo p (formally called quadratic residues). The
Cayley graphs over Abelian groups are particularly convenient because we can find an
squares are exactly the elements g i where i is even. As g i g j = g i+j , the fact that −1 is a square
orthonormal basis of eigenvectors without knowing the set of generators. They just depend on the
implies that s is a square if and only if −s is a square. So, S is closed under negation, and the
group1 . Knowing a basis of eigenvectors makes it much easier to compute the eigenvalues. We
Cayley graph of Z/p with generator set S is in fact a graph. As |S| = (p − 1)/2, it is regular of
give the computations of the eigenvectors in sections 7.4 and 7.8.
degree
We will now examine two exciting types of Cayley graphs: Paley graphs and generalized p−1
d= .
hypercubes. 2
1
More precisely, the characters always form an orthonormal set of eigenvectors, and the characters just depend
upon the group. When two different characters have the same eigenvalue, we obtain an eigenspace of dimension
greater than 1. These eigenspaces do depend upon the choice of generators.

61
CHAPTER 7. CAYLEY GRAPHS 63 CHAPTER 7. CAYLEY GRAPHS 64

7.3 Eigenvalues of the Paley Graphs As we are considering a non-diagonal entry, w ̸= 0. The term in the sum for y = 0 is zero. When
y ̸= 0, χ(y) ∈ ±1, so
Let L be the Laplacian matrix of the Paley graph on p vertices. A remarkable feature of Paley χ(y)χ(w + y) = χ(w + y)/χ(y) = χ(w/y + 1).
graphs is that L2 can be written as a linear combination of L, J and I , where J is the all-1's As y varies over {1, . . . , p − 1}, w/y also varies over all of {1, . . . , p − 1}. So, w/y + 1 varies over
matrix. We will prove that all elements other than 1. This means that
p−1 p(p − 1) !
L2 = pL + J− I. (7.1) p−1
X p−1
X
4 4
χ(y)χ(w + y) = χ(z) − χ(1) = 0 − 1 = −1.
The proof will be easiest if we express L in terms of a matrix X defined by the quadratic y=0 z=0
character : 

1 if x is a quadratic residue modulo p So, every off-diagonal entry in X 2 is −1.
χ(x) = 0 if x = 0, and Corollary 7.3.2. The nonzero eigenvalues of the Laplacian of the Paley graph on p vertices are


−1 otherwise.
1 √
This is called a character because it satisfies χ(xy) = χ(x)χ(y). We will use this to define a (p ± p) .
2
matrix X by
X (a, b) = χ(a − b). Proof. Let ϕ be an eigenvector of L of eigenvalue λ ̸= 0. As ϕ is orthogonal to the all-1s vector,
Using the fact that J ϕ = 0. So, Equation (7.1) implies
 p−1

 2 if a = b, p(p − 1)
L(a, b) = −1 if χ(a − b) = 1, and λ2 ϕ = L2 ϕ = pLϕ − I ϕ = (pλ − p(p − 1)/4)ϕ.
 4

0 otherwise,
This equation tells us that λ satisfies
we find that
2L = pI − J − X . (7.2) p(p − 1)
λ2 − pλ + = 0,
4
Equation (7.1) follows from this relation, the following lemma, and the relations J 2 = pJ and which implies
X J = J X = 0. The latter follows from the fact that each row and column of X has exactly 1 √
λ∈ (p ± p) .
(p − 1)/2 entries that are 1, (p − 1)/2 entries that are −1, and one entry that is 0. 2
Lemma 7.3.1.
X 2 = pI − J .
The fact that Paley graphs have only two nonzero eigenvalues means that they are strongly
Proof. The diagonal entries of X 2 are the squares of the norms of the columns of X . As each regular graphs. We will discuss those further in Chapter 9. The fact that those eigenvalues are
contains (p − 1)/2 entries that are 1, (p − 1)/2 entries that are −1, and one entry that is 0, its very close to each other means that
p 2 times a Paley graph (the graph we obtain by assigning
squared norm is p − 1. weight 2 to every edge) is a 1 + 1/p approximation of the complete graph, up to a very small
factor. Paley graphs have also been shown by Chung, Graham, and Wilson to have many
To handle the off-diagonal entries, we observe that X is symmetric, and so the off-diagonal properties in common with random graphs [CGW89].
entries of X 2 are the inner products of columns of X . That is,
p−1
X p−1
X
X (a, b) = χ(a − x)χ(b − x) = χ(y)χ((b − a) + y), 7.4 Generalizing Hypercubes
x=0 y=0
To generalize the hypercube, we will consider Cayley graphs over the same group, but with more
where we have set y = a − x. For convenience, set w = b − a, so we can write this more simply as
generators. Recall that we view the vertex set as the vectors in {0, 1}d , modulo 2. Each
p−1
X generator, g 1 , . . . , g k , is in the same group. As g + g = 0 modulo 2 for all g ∈ {0, 1}d , the set of
χ(y)χ(w + y). generators is automatically closed under negation.
y=0
CHAPTER 7. CAYLEY GRAPHS 65 CHAPTER 7. CAYLEY GRAPHS 66

Let G be the Cayley graph with these generators. To be concrete, set V = {0, 1}d , and note that 7.5 A random set of generators
G has edge set 
(x , x + g j ) : x ∈ V, 1 ≤ j ≤ k . We will now show that if we choose the set of generators uniformly at random, for k some
constant multiple of the dimension, then we obtain a graph that is a good approximation of the
Using the analysis of products of graphs in Theorem 5.3.2, we derived a set of eigenvectors of Hd .
complete graph. That is, all the eigenvalues of the Laplacian will be close to k. This construction
We will now verify that these are eigenvectors for all generalized hypercubes. Knowing these will
comes from the work of Alon and Roichman [AR94]. We will set k = cd, for some c > 1. Think of
make it easy to describe the eigenvalues.
c = 2, c = 10, or c = 1 + ϵ.
For each b ∈ {0, 1}d , define the function ψ b from V to the reals given by
For b ∈ {0, 1}d but not the zero vector, and for g chosen uniformly at random from {0, 1}d , b T g
bT x modulo 2 is uniformly distributed in {0, 1}, and so
ψ b (x ) = (−1) .
T
When we write b T x , you might wonder if we mean to take the sum over the reals or modulo 2. (−1)b g

As both b and x are {0, 1}-vectors, and the result is used as the exponent of −1, you get the
is uniformly distributed in ±1. So, if we pick g 1 , . . . , g k independently and uniformly from
same answer either way you do it.
{0, 1}d , the eigenvalue corresponding to the eigenvector ψ b is
While it is natural to think of b as being a vertex, that is the wrong perspective. Instead, you k
X
def T
should think of b as indexing a Fourier coefficient (if you don’t know what a Fourier coefficient is, λb = k − (−1)b gi
.
just don’t think of it as a vertex). i=1

The eigenvectors and eigenvalues of the graph are determined by the following theorem. As this The right-hand part is a sum of independent, uniformly chosen ±1 random variables. So, we
graph is k-regular, the eigenvectors of the adjacency and Laplacian matrices will be the same. know it is concentrated around 0, and thus λb will be concentrated around k. To determine how
Lemma 7.4.1. For each b ∈ {0, 1}d the vector ψ b is a Laplacian matrix eigenvector with concentrated the sum actually is, we use a Chernoff bound. There are many forms of Chernoff
eigenvalue bounds. We will not use the strongest, but settle for one which is simple and which gives results
X k
T
that are qualitatively correct.
k− (−1)b g i .
Theorem 7.5.1. Let x1 , . . . , xk be independent ±1 random variables. Then, for all t > 0,
i=1 " #
X 2
Proof. We begin by observing that Pr xi ≥ t ≤ 2e−t /2k .
T T T
i
ψ b (x + y ) = (−1)b (x +y )
= (−1)b x
(−1)b y
= ψ b (x )ψ b (y ). (7.3)
This becomes very small when t is a constant fraction of k. In fact, it becomes so small that it is
Let L be the Laplacian matrix of the graph. For any b ∈ {0, 1}d and any vertex x ∈ V , we use unlikely that any eigenvalue deviates from k by more than t.
(7.3) to compute Theorem 7.5.2. With high probability, all of the nonzero eigenvalues of the generalized hypercube
k
X differ from k by at most r
(Lψ b )(x ) = kψ b (x ) − ψ b (x + g i ) 2
k ,
i=1 c
Xk where k = cd.
= kψ b (x ) − ψ b (x )ψ b (g i )
i=1 p
! Proof. Let t = k 2/c. Then, for every nonzero b,
k
X 2 /2k
= ψ b (x ) k − ψ b (g i ) . Pr [|k − λb | ≥ t] ≤ 2e−t ≤ 2e−k/c = 2e−d .
i=1
Now, the probability that there is some b for which λb violates these bounds is at most the sum
So, ψ b is an eigenvector of eigenvalue
of these terms:
k
X k
X X
k− ψ b (g i ) = k − (−1)b
T
gi
. Pr [∃b : |k − λb | ≥ t] ≤ Pr [|k − λb | ≥ t] ≤ (2d − 1)2e−d ,
i=1 i=1 b∈{0,1}d ,b̸=0d

which is always less than 1 and goes to zero exponentially quickly as d grows.
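The concentration in Theorem 7.5.2 can be observed directly. The following sketch (not from the text) samples k = cd random generators and computes every eigenvalue via Lemma 7.4.1:

using LinearAlgebra, Random

Random.seed!(1)
d, c = 10, 10
k = c * d
gens = [rand(0:1, d) for _ in 1:k]                    # k generators chosen uniformly from {0,1}^d

eigenvalue(b) = k - sum((-1)^(dot(b, g) % 2) for g in gens)   # λ_b from Lemma 7.4.1

bs = [digits(x, base = 2, pad = d) for x in 1:2^d-1]  # all nonzero b
dev = maximum(abs(k - eigenvalue(b)) for b in bs)
println((dev, k * sqrt(2 / c)))                       # observed max deviation vs. the bound k√(2/c)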
CHAPTER 7. CAYLEY GRAPHS 67 CHAPTER 7. CAYLEY GRAPHS 68

We initially suggested thinking of c = 2 or c = 10. The above bound works for c = 10. To get a where i is an integer that satisfies i2 = −1 modulo p.
useful bound for c = 2, we need to sharpen the analysis. A naive sharpening will work down to
Even more explicit constructions, which do not require solving equations, may be found
c = 2 ln 2. To go lower than that, you need a stronger Chernoff bound.
in [ABN+ 92].

7.6 Conclusion
7.8 Eigenvectors of Cayley Graphs of Abelian Groups
We have now seen that a random generalized hypercube of degree k probably has all non-zero
The wonderful thing about Cayley graphs of Abelian groups is that we can construct an
Laplacian eigenvalues between
p p orthornormal basis of eigenvectors for these graphs without even knowing the set of generators S.
k(1 − 2/c) and k(1 + 2/c). That is, the eigenvectors only depend upon the group. Related results also hold for Cayley graphs
of arbitrary groups, and are related to representations of the groups. See [Bab79] for details.
If we let n be the number of vertices, and we now multiply the weight of every edge by n/k, we
obtain a graph with all nonzero Laplacian eigenvalues between As Cayley graphs are regular, it won’t matter which matrix we consider. For simplicity, we will
p p consider adjacency matrices.
n(1 − 2/c) and n(1 + 2/c).
p Let n be an integer and let G be a Cayley graph on Z/n with generator set S. When S = {±1},
Thus, this is essentially a 1 + 2/c approximation of the complete graph on n vertices. But, the
degree of every vertex is only c log2 n. Expanders are infinite families of graphs that are we get the ring graphs. For general S, I think of these as generalized Ring graphs. Let’s first see
constant-factor approximations of complete graphs, but with constant degrees. that they have the same eigenvectors as the Ring graphs.

We know that random regular graphs are probably expanders. If we want explicit constructions, Recall that we proved that the vectors x k and y k were eigenvectors of the ring graphs, where
we need to go to non-Abelian groups. x k (u) = sin(2πku/n), and
Explicit constructions that achieve bounds approaching those of random generalized hypercubes y k (u) = cos(2πku/n),
come from error-correcting codes. for 1 ≤ k ≤ n/2.
Explicit constructions allow us to use these graphs in applications that require us to implicitly Let’s just do the computation for the x k , as the y k are similar. For every u modulo n, we have
deal with a very large graph. In Chapter 31, we will see how to use such graphs to construct X
pseudo-random generators. (Ax k )(u) = x k (u + g)
g∈S
 
1 X
7.7 Non-Abelian Groups = x k (u + g) + x k (u − g)
2
g∈S
 
In the homework, you will show that it is impossible to make constant-degree expander graphs 1 X
from Cayley graphs of Abelian groups. The best expanders are constructed from Cayley graphs of = sin(2πk(u + g)/n) + sin(2πk(u − g)/n)
2
2-by-2 matrix groups. In particular, the Ramanujan expanders of Margulis [Mar88] and g∈S
 
Lubotzky, Phillips and Sarnak [LPS88] are Cayley graphs over the Projective Special Linear
1 X
Groups PSL(2, p), where p is a prime. These are the 2-by-2 matrices modulo p with determinant = 2 sin(2πku/n) cos(2πkg/n)
1, in which we identify A with −A. 2
g∈S
X
They provided a very concrete set of generators. For a prime q congruent to 1 modulo 4, it is = sin(2πku/n) cos(2πkg/n)
known that there are p + 1 solutions to the equation g∈S
X
a21 + a22 + a23 + a24 = p, = x k (u) cos(2πkg/n).
g∈S
where a1 is odd and a2 , a3 and a4 are even. We obtain a generator for each such solution of the
So, the corresponding eigenvalue is X
form:  
1 a0 + ia1 a2 + ia3 cos(2πkg/n).
√ ,
p −a2 + ia3 a0 − ia1 g∈S
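This calculation is easy to confirm on an example (a sketch, not from the text; cayley_adjacency is an ad hoc helper, and S = {±1, ±3} in Z/12 is an arbitrary symmetric generator set):

using LinearAlgebra

# Adjacency matrix of the Cayley graph on Z/n with generator set S (assumed closed under negation).
function cayley_adjacency(n, S)
    A = zeros(n, n)
    for u in 0:n-1, g in S
        A[u + 1, mod(u + g, n) + 1] = 1
    end
    return A
end

n, S = 12, [1, -1, 3, -3]
computed = sort(eigvals(Symmetric(cayley_adjacency(n, S))))
predicted = sort([sum(cos(2π * k * g / n) for g in S) for k in 0:n-1])
println(maximum(abs.(computed - predicted)))   # ≈ 0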
CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 70

Chapter 8

Eigenvalues of Random Graphs


Note that the eigenvalues are almost never outside −10, 10. We are going to prove something like
that today.
Notation: at many points during this chapter, we will write [n] to indicate the set {1, 2, . . . , n}. Here is a histogram of the second-largest eigenvalues of 10,000 matrices on 1,000 vertices.
In this chapter we examine the adjacency matrix eigenvalues of Erdös-Rényi random graphs.
These are graphs in which each edge is chosen to appear with probability p, and the choices are
made independently for each edge. We will find that these graphs typically have one large
eigenvalue around pn, and that all of the others probably have absolute value at most
(1 + o(1)) · 2√(p(1 − p)n). In fact, their distribution within this region follows Wigner's [Wig58]
semicircle law: their histogram looks like a semicircle.
For example, let’s consider p = 1/2. Here is the histogram of all but the largest eigenvalue of a
random graph on 4,000 vertices.

8.1 Transformation

Let M be the adjacency matrix of an Erdös-Rényi random graph. We understand M by writing


it in the form
M = p(J − I ) + R,
where J is the all-1s matrix, and R is a random symmetric matrix whose diagonal entries are zero and
whose off-diagonal entries have distribution
(
The following image is the histogram of the 99 smallest adjacency eigenvalues of 10,000 random 1 − p with probability p, and
graphs on 100 vertices. R(a, b) =
−p with probability 1 − p.

The reason we write it this way is that the expectation of every entry of R, and thus of R itself, is zero.
We will show that with high probability all of the eigenvalues of R are small, and thus
we can view M as being approximately p(J − I ).
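For intuition, one can sample such an R and compare its norm with 2√(p(1 − p)n), which is √n when p = 1/2 (a sketch, not from the text):

using LinearAlgebra, Random

Random.seed!(1)
n, p = 1000, 1/2
U = triu((rand(n, n) .< p) .- p, 1)     # strictly-upper entries: 1 - p with probability p, -p otherwise
R = U + U'                              # symmetric, zero diagonal, entries distributed as above
println((maximum(abs.(eigvals(Symmetric(R)))), 2 * sqrt(p * (1 - p) * n)))   # ‖R‖ vs. 2√(p(1-p)n)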

69
CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 71 CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 72

As p(J − I ) has one eigenvalue of p(n − 1) and n − 1 eigenvalues of −p, the bulk of the as x is a unit vector.
distribution of eigenvalues of M is very close to the distribution of eigenvalues of R, minus p. To
We thereby obtain the following bound on x T Rx .
see this, first subtract pI from R. This shifts all the eigenvalues down by p. We must now add
pJ . As J is a rank-1 matrix, we can show that the eigenvalues of pJ + (R − pI ) interlace the Lemma 8.2.2. For every unit vector x ,
eigenvalues of R − pI (see the exercise at the end of the chapter). So, the largest eigenvalue   2
moves up a lot, and all the other n − 1 move up to at most the next eigenvalue. PrR x T Rx ≥ t ≤ 2e−t .

Our first analysis will be a crude but simple upper bound on the norm of R.
Proof. The expectation of x T Rx is 0. The preceding argument tells us that
     
Pr x T Rx ≥ t ≤ Pr x T Rx ≥ t + Pr x T Rx ≤ −t
8.2 The extreme eigenvalues  T   T
≤ Pr x Rx ≥ t + Pr x (−R)x ≥ t

2
≤ 2e−t ,
For this section, we make the simplifying assumption that p = 1/2. At the end, we will explain
how to handle the general case. where we have exploited the fact that R and −R are identically distributed.
Recall that
x T Rx
∥R∥ = max . 8.2.1 Vectors near v 1
x xTx
Each R(i, j) is a random variable that is independently and uniformly distributed in ±1/2. You might be wondering what good the previous argument will do us. We have shown that it is
To begin, we fix any unit vector x , and consider unlikely that the Rayleigh quotient of any given x is large. But, we have to reason about all x of
X unit norm.
x T Rx = 2R(i, j)x (i)x (j).
Lemma 8.2.3. Let R be a symmetric matrix and let v be a unit eigenvector of R whose
i<j
eigenvalue has absolute value ∥R∥. If x is another unit vector such that
This is a sum of independent random variables, and so may be proved to be tightly concentrated √
around its expectation, which in this case is zero. There are many types of concentration bounds, v T x ≥ 3/2,
with the most popular being the Chernoff and Hoeffding bounds. In this case we will apply
then
Hoeffding’s inequality. 1
x T Rx ≥ ∥R∥ .
2
Theorem 8.2.1 (Hoeffding’s Inequality). Let a1 , . . . , am and b1 , . . . , bm be real numbers and let
X1 , . . . ,P
Xm be independent random variables such that Xi takes values between ai and bi . Let Proof. Let λ1 ≥ λ2 ≥ · · · ≥ λn be the eigenvalues of R and let v 1 , . . . , v n be a corresponding set
µ = E [ i Xi ]. Then, for every t > 0, of orthonormal eigenvectors. Assume without loss of generality that λ1 ≥ |λn | and that v = v 1 .
hX i   Expand x in the eigenbasis as
2t2 X
Pr Xi ≥ µ + t ≤ exp − P 2
. x = ci v i .
i (bi − ai ) i
√ 2
P
To apply this theorem, we view We know that c1 ≥ 3/2 and
i ci = 1. This implies that
Xi,j = 2ri,j x (i)x (j)  
X X X
as our random variables. As ri,j takes values in ±1/2, we can set
T
x Rx = 2 2
ci λi ≥ c1 λ1 − 2  2
ci |λ1 | = λ1 c1 − 2
ci = λ1 (2c21 − 1) ≥ λ1 /2.
i i≥2 i≥2
ai,j = −x (i)x (j) and bi,j = x (i)x (j).

We then compute
! ! We will bound the probability that ∥R∥ is large by taking Rayleigh quotients with random unit
X X X X X
2
(bi − ai ) = 2 2
4x (i) x (j) = 2 2 2
x (i) x (j) ≤ 2 2
x (i) x (j) 2
= 2, vectors. Let’s examine the probability that a random unit vector x satisfies the conditions of
i<j i<j i̸=j i i Lemma 8.2.3.
CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 73 CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 74

Lemma 8.2.4. Let v be an arbitrary unit vector, and let x be a random unit vector. Then, Proof. Let R be a fixed symmetric matrix. By applying Lemma 8.2.4 to any eigenvector of R
h √ i whose eigenvalue has maximal absolute value, we find
1
Pr v T x ≥ 3/2 ≥ √  
πn2n−1 1 1
Prx x T Rx ≥ ∥R∥ ≥ √ .
2 πn2n−1
Proof. Let B n denote the unit ball in IRn , and let C denote the cap on the surface of B n
Thus, for a random R we find
containing all vectors x such that √
v T x ≥ 3/2.  
1 1
PrR,x ∥R∥ ≥ t and x T Rx ≥ ∥R∥ ≥ PrR [∥R∥ ≥ t] √ .
We need to lower bound the ratio of the surface area of the cap C to the surface area of B n . 2 πn2n−1

Recall that the surface area of B n is On the other hand,


nπ n/2  
, 1  
Γ( n2 + 1) PrR,x ∥R∥ ≥ t and x T Rx ≥ ∥R∥ ≤ PrR,x ∥R∥ ≥ t and x T Rx ≥ t/2
2
where I recall that for positive integers n  
≤ PrR,x x T Rx ≥ t/2
  T 
Γ(n) = (n − 1)!, ≤ Prx PrR x Rx ≥ t/2
2

and that Γ(x) is an increasing function for real x ≥ 1. ≤ 2e−(t/2) ,

Now, consider the (n − 1)-dimensional hypersphere whose boundary is the boundary of the cap C. where the last inequality follows from Lemma 8.2.2.
As the cap C lies above this hypersphere, the (n − 1)-dimensional volume of this hypersphere is a Combining these inequalities, we obtain
lower bound on the surface area of the cap C. Recall that the volume of a sphere in IRn of radius
r is 1 2

π n/2 PrR [∥R∥ ≥ t] √ ≤ e−(t/2) ,


rn n . πn2n−1
Γ( 2 + 1)
which implies the claimed result.
In our case, the radius of the hypersphere is
√ 2 √
r = sin(acos 3/2) = 1/2. The probability in Theorem 8.2.5 becomes small once et /4 exceeds πn2n . As n grows large, this
happens for √ √ √
So, the ratio of the (n − 1)-dimensional volume of the hypersphere to the surface area of B n is at t > 2 ln 2 n ∼ (5/3) n.
least √
π (n−1)/2
rn−1 Γ( n−1
+1) rn−1 Γ( n2 + 1) 1
It is known that the norm of R is unlikely to be much more than n. This is proved by Füredi
2
=√ ≥√ . and Komlós [FK81] and Vu [Vu07], using a very different technique that we introduce next.
nπ n/2
n
πn Γ( n−1
2 + 1)
πn2n−1
Γ( 2 +1)

8.3 The Trace Method


8.2.2 The Probabilistic Argument A good way to characterize the general shape of a distribution is through its moments. Let
ρ1 , . . . , ρn be the eigenvalues of R. Their first moment is simply their sum, which is the trace of
I’m going to do the following argument very slowly, because it is both very powerful and very R and thus zero. Their kth moment is
subtle.
n
X  
Theorem 8.2.5. Let R be a symmetric matrix with zero diagonal and off-diagonal entries ρki = Tr Rk .
uniformly chose from ±1/2. Then, i=1

√ 2 /4
Pr [∥R∥ ≥ t] ≤ πn2n e−t . One of the easiest ways to reason about the distribution of the eigenvalues is to estimate the
expectations of the traces of powers of R. This is called Wigner’s trace method.
CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 75 CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 76

Before proceeding with our analysis, we recall a formula for the entries of a power of matrices. 8.4 Expectation of the trace of a power
For matrices A and B whose rows and columns are indexed by V ,
X Recall that the trace is the sum of the diagonal entries in a matrix. By expanding the formula for
(AB)(a, b) = A(a, c)B(c, b).
matrix multiplication, one can also show
c∈V

X l−1
Y
Applying this formula inductively, we find that
Rl (a0 , a0 ) = R(a0 , al−1 ) R(ai−1 , ai ),
X
(Al )(a, b) = A(a, c1 )A(c1 , c2 ) · · · A(cl−1 , b). a1 ,...,al−1 ∈V i=1

c1 ,...,cl−1 ∈V
and so
X l−1
Y
We can use good bounds on the moments of the eigenvalues of R to obtain a good upper bound ERl (a0 , a0 ) = ER(a0 , al−1 ) R(ai−1 , ai ).
on the norm of R that holds with high probability. We will do this by estimating the trace of a a1 ,...,al−1 ∈V i=1
high power of R. For an even power l, this should be relatively close to ∥R∥. In particular, we To simplify this expression, we will recall that if X and Y are independent random variables, then
use the fact that for every even l   E(XY ) = E(X)E(Y ). So, to the extent that the terms in this product are independent, we can
∥R∥l ≤ Tr Rl distribute this expectation across this product. As the entries of R are independent, up to the
symmetry condition, the only terms that are dependent are those that are identical. So, if
and thus   1/l {bj , cj }j is the set of pairs that occur in
∥R∥ ≤ Tr Rl .
{a0 , a1 } , {a1 , a2 } , . . . , {al−2 , al−1 } , {al−1 , a0 } , (8.1)
We will prove in Theorem 8.4.2 that for every even l such that np(1 − p) ≥ 2l8 ,
and pair {bj , cj } appears dj times, then
 
ETr Rl ≤ 2n(4np(1 − p))l/2 . l−1
Y Y
ER(a0 , al−1 ) R(ai−1 , ai ) = ER(bj , cj )dj .
This will allow us to show that the norm of R is usually less than u where i=1 j
 1/l p
def As each entry of R has expectation 0,
u = 2n(4np(1 − p))l/2 = (2n)1/l 2 np(1 − p).
ER(bj , cj )dj
We establish this by an application of Markov’s inequality. For all ϵ > 0,
is zero if dj is 1. In general
h   i
Pr [∥R∥ > (1 + ϵ)u] ≤ Pr Tr Rl > (1 + ϵ)l ul h i
h    i ERd(bj ,cj ) = p(1 − p) (1 − p)d−1 − (−p)d−1 ≤ p(1 − p), (8.2)
≤ Pr Tr Rl > (1 + ϵ)l ETr Rl
for d ≥ 2.
≤ (1 + ϵ)−l ,
So, ERl (a0 , a0 ) is at most the sum over sequences a1 , . . . , al−1 such that each pair in (8.1) appears
by Markov’s inequality. at least twice, times p(1 − p) for each pair that appears in the sequence.
To understand this probability, remember that for small ϵ (1 + ϵ) is approximately exp(ϵ). So, To describe this more carefully, we say that a sequence a0 , a1 , . . . , al is a closed walk of length l on
(1 + ϵ)−l is approximately exp(−ϵl). This probability becomes small when l > 1/ϵ. Concretely, for n vertices if each ai ∈ {1, . . . , n} and al = a0 . In addition, we say that it is significant if for every
ϵ < 1/2, 1 + ϵ < exp(4ϵ/5). Thus, we can take ϵ approximately (n/2)−1/8 . While this bound is not tuple {b, c} there are at least two indices i for which {ai , ai+1 } = {b, c}. Let Wn,l,k denote the
very useful for n that we encounter in practice, it is nice asymptotically. The bound can be number of significant closed walks of length l on n vertices such that a1 , . . . , al−1 contains exactly
substantially improved by more careful arguments, as we explain at the end of the Chapter. k distinct elements. As a sequence with k distinct elements must contain at least k distinct pairs,
we obtain the following upper bound on the trace.
We should also examine the term (2n)1/l :
Lemma 8.4.1.
(2n)1/l = exp(ln(2n)/l) ≤ 1 + 1.1 ln(2n)/l,   X l/2
ETr Rl ≤ Wn,l,k (p(1 − p))k .
for ln(2n)/l < 1/2. Thus, for l >> ln(2n) this term is close to 1. k=1
CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 77 CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 78


In the next section, we prove that There are at most l−1k choices for the set S. Given S, we wish to record the identities of the
elements ai for i ∈ S. Each is an element of [n], and we record them in the order in which they
Wn,l,k ≤ nk+1 2l l4(l−2k) . appear in the walk. You may wish to think of this data as a map

Theorem 8.4.2. If l is even and np(1 − p) ≥ 2l8 , then σ : S → [n].


 
There are at most nk such maps. We also record the identity of a0 .
ETr Rl ≤ 2n(4np(1 − p))l/2 .
For each i ̸∈ S there is a j ∈ S ∪ {0} for which aj = ai . We record which element of S this is, or
Proof. Let we record 0 if it is a0 . As S has k elements, we only need one of k + 1 numbers to record this j.
tk = nk+1 2l l4(l−2k) (p(1 − p))k . Again, you may wish to think of this data as a map

We will show that the sequence tk is geometrically increasing, and thus it is dominated by its τ : [l − 1] \ S → S ∪ {0} .
largest term. We compute
There are at most (k + 1)l−1−k choices for τ . See figure 8.1 for an example.
tk nk+1 2l l4(l−2k) (p(1 − p))k
= k l 4(l−2k+2)
tk−1 n 2l (p(1 − p))k−1
n(p(1 − p))
=
l8
≥ 2.

Thus,

  X l/2 l/2 l/2


X X
ETr Rl ≤ W (n, l, k)(p(1 − p))k ≤ tk ≤ tl/2 21−k ≤ 2tl/2 ≤ 2n2l (np(1 − p))l/2 .
k=1 k=1 k=1
Figure 8.1: An example walk along with S, σ, and τ .

While not every choice of a0 , S, σ and τ corresponds to a significant walk, every significant walk
with k distinct elements corresponds to some a0 , S, σ and τ . Thus,
Of course, better bounds on Wn,l,k provide better bounds on the trace. Vu [Vu07] proves that
 
  l − 1 k+1
l Wn,l,k ≤ n (k + 1)l−1−k . (8.3)
Wn,l,k ≤ nk+1 (k + 1)2(l−2k) 22k . k
2k
This bound
p is too loose to obtain the result we desire. It merely allows us to prove that
This bound allows us to apply much higher powers of the matrix.
∥R∥ ≤ c np(1 − p) log n for some constant c. This bound is loosest when k = l/2. This is
fortunate both because this is the case in which it is easiest to tighten the bound, and because the
computation in Theorem 8.4.2 is dominated by this term.
8.5 The number of walks
Consider the graph with edges (ai−1 , ai ) for i ∈ S. This graph must be a tree because it contains
Our goal is to prove an upper bound on Wn,l,k . We will begin by proving a crude upper bound, exactly the edges from which the walk first hits each vertex. Formally, this is because the graph
and then refine it. contains k edges, touches k + 1 vertices, and we can prove by induction on the elements in S that
it is connected, starting with a0 . See Figure 8.2.
As it is tricky to obtain a clean formula for the number of such walks, we will instead derive ways
of describing such walks, and then count how many such descriptions there can be. We can use this tree to show that, when k = l/2, every pair {ai−1 , ai } that appears in the walk
must appear exactly twice: the walk only takes l steps, and each pair of the k = l/2 in the tree
Let S ⊂ {1, . . . , l − 1} be the set of i for which ai does not appear earlier in the walk: must be covered at least twice.

ai ̸∈ {aj : j < i} . We now argue that when l = 2k the map τ is completely unnecessary: the walk is determined by
S and σ alone. That is, for every i ̸∈ S there is only one edge that the walk can follow. For i ̸∈ S,
CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 79 CHAPTER 8. EIGENVALUES OF RANDOM GRAPHS 80

exactly once and the walk follows that edge. That is, the edge is {ai−1 , ai } and we can infer ai
given i ∈ T . If ai−1 is adjacent to exactly one tree edge that has been used exactly once but the
walk does not follow that edge, then step i is extra: it either follows an edge that is not in the
tree or it follows a tree edge that has been used at least twice.
This leaves us to account for the steps in which ai−1 is adjacent to more than one tree edge that
has been used exactly once and follows such an edge. In this case, we call step i ambiguous,
because we need some way to indicate which of those edges it used. In ambiguous steps we also
use τ to record ai . Every step not in S or T is extra or ambiguous. So,

τ : ([l − 1] \ (S ∪ T )) → S ∪ {0} .
Figure 8.2: The tree edges for this walk.
The data a0 , S, T , σ, and τ determine the walk. It remains to determine how many ways we can
the tuple {ai−1 , ai } must have appeared exactly once before in the walk. We will show that at choose them.
step i the vertex ai−1 is adjacent to exactly one edge with this property.
We will show that the number of ambiguous steps is at most the number of extra steps. This
To this end, we keep track of the graph of edges that have appeared exactly once before in the implies that |V \ (S ∪ T )| ≤ 2x. Thus, the number of possible maps τ is at most
walk. We could show by induction that at step i this graph is precisely a path from a0 to ai−1 .
But, we take an alternate approach. Consider the subgraph of the tree edges that have been used (k + 1)2x .
exactly once up to step i. We will count both its number of vertices, v, and its number of edges,
The number of choices for S and T may be upper bounded by
f . At step i we include ai in this subgraph regardless of whether it is adjacent to any of the
edges, so initially v = 1 and e = 0. The walk ends in the same state.   
l−1 l−1−k
≤ 2l−1 (l − 1 − k)2x .
For steps in which i ∈ S, both v and f increase by 1: the edge {ai−1 , ai } and the vertex ai is k 2x
added to the subgraph. When i ̸∈ S and {ai−1 , ai } is the only tree edge adjacent to ai−1 that has
Thus
been used exactly once, both e and v decrease: e because we use the pair {ai−1 , ai } a second time
Wn,l,k ≤ nk+1 2l−1 (l − k − 1)2x (k + 1)2x ≤ nk+1 2l−1 (lk)2(l−2k) ≤ nk+1 2l l4(l−2k) . (8.4)
and v because ai−1 is no longer adjacent to any tree edge that has been used exactly once, and
the walk moves to ai .
We now finish by arguing that the number of ambiguous edges is at most the number of extra
If for some i ̸∈ S it were the case that ai−1 was adjacent to two tree edges that had been used edges. As before, keep track of the subgraph of the tree edges that have been used exactly once
exactly once, then e would decrease but v would not. As the process starts and ends with up to step i. We will count both its number of vertices, v, and its number of edges, e. At step i
v − e = 1, this is not possible. we include ai in this subgraph regardless of whether it is adjacent to any of the graphs edges, so
initially v = 1 and e = 0. The walk ends in the same state.
Thus  
l For steps in which i ∈ S, both v and e increase by 1. For steps in which i ∈ T , the vertex ai−1 has
Wn,l,l/2 ≤ nl/2+1 .
l/2 degree one in this graph. When we follow the edge (ai−1 , ai ), we remove it from this graph. As
ai−1 is no longer adjacent to any edge of the graph, both v and e decrease by 1.
Now that we know Wn,l,l/2 is much less than the bound suggested by (8.3), we should suspect
that Wn,l,k is also much lower when k is close to l/2. To show this, we extend the idea used in the At ambiguous steps i, we decrement e. But, because ai−1 was adjacent to at least two tree edges
previous argument to show that with very little information we can determine where the walk that had been used exactly once, it is not removed from the graph and v does not decrease. The
goes for almost every step not in S. ambiguous steps may be compensated by extra steps. An extra step does not change f , but it can
decrease v. This happens when ai−1 is not adjacent to any tree edges that have been used exactly
We say that the ith step in the walk is extra if the pair {ai−1 , ai } is not a tree edge or if it appears once, but ai is. Thus, ai−1 contributes 1 to v during step i − 1, but is removed from the count as
at least twice in the walk before step i. Let x denote the number of extra steps. As each of the soon as the walk moves to ai . As the walk starts and ends with v − f = 1, neither the steps in S
tree edges appears in at least two steps, the number of extra steps is at most l − 2k. We will use τ nor T change this difference, ambiguous steps increase it, and extra steps can decrease it, the
to record the destination vertex ai of each extra step, again by indicating its position in S. number of extra steps must be at least the number of ambiguous steps.
During the walk, we keep track of the set of tree edges that have been used exactly once. Let T
be the set of steps in which in which ai−1 is adjacent to exactly one tree edge that has been used

8.6 Notes

The proof in this chapter is a slight simplification and weakening of a result due to Vu [Vu07]. The
result was first claimed by Füredi and Komlós [FK81]. However, there were a few mistakes in
their paper. Vu’s paper also provides concentration results that lower bound µ2 , whereas the
argument in this chapter merely provides an upper bound.

8.7 Exercise

1. Interlacing.
Let A be a symmetric matrix with eigenvalues α1 ≥ α2 ≥ . . . ≥ αn . Let B = A + x x T for some
vector x and let β1 ≥ β2 ≥ . . . ≥ βn be the eigenvalues of B. Prove that for all i

βi ≥ αi ≥ βi+1 .
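The following numpy snippet is not part of the exercise; it merely illustrates the claimed interlacing numerically on a random symmetric matrix, and is of course not a proof.

import numpy as np

# Numerical illustration of the interlacing claim (not a proof).
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
A = (A + A.T) / 2                        # a random symmetric matrix
x = rng.normal(size=6)
B = A + np.outer(x, x)

alpha = np.sort(np.linalg.eigvalsh(A))[::-1]   # alpha_1 >= ... >= alpha_n
beta = np.sort(np.linalg.eigvalsh(B))[::-1]

print(np.all(beta >= alpha - 1e-12))            # beta_i >= alpha_i
print(np.all(alpha[:-1] >= beta[1:] - 1e-12))   # alpha_i >= beta_{i+1}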

Figure 8.3: An example walk along with v and e


Chapter 9

Strongly Regular Graphs

This Chapter Needs Editing

9.1 Introduction

In this and the next lecture, I will discuss strongly regular graphs. Strongly regular graphs are extremal in many ways. For example, their adjacency matrices have only three distinct eigenvalues. If you are going to understand spectral graph theory, you must have these in mind. In many ways, strongly-regular graphs can be thought of as the high-degree analogs of expander graphs. However, they are much easier to construct.

The Paley graphs we encountered in Chapter 7 are Strongly Regular Graphs.

Many times someone has asked me for a matrix of 0s and 1s that “looked random”, and strongly regular graphs provided a reasonable answer.

Warning: I will use the letters that are standard when discussing strongly regular graphs. So λ and µ will not be eigenvalues in this lecture.

9.2 Definitions

Formally, a graph G is strongly regular if

1. it is k-regular, for some integer k;

2. there exists an integer λ such that for every pair of vertices x and y that are neighbors in G, there are λ vertices z that are neighbors of both x and y;

3. there exists an integer µ such that for every pair of vertices x and y that are not neighbors in G, there are µ vertices z that are neighbors of both x and y.
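Before turning to examples, here is a small numpy sketch, not from the original notes, that checks these three conditions by brute force for a given 0/1 adjacency matrix. The function name srg_parameters and the use of the 5-cycle (the pentagon of the next section) are illustrative choices only.

import numpy as np

def srg_parameters(A):
    """Check the three conditions of Section 9.2 by brute force.

    Returns (n, k, lam, mu) if the graph with 0/1 adjacency matrix A is
    strongly regular, and raises ValueError otherwise."""
    n = A.shape[0]
    degrees = A.sum(axis=1)
    if len(set(degrees)) != 1:
        raise ValueError("graph is not regular")
    k = int(degrees[0])
    common = A @ A            # entry (u, v) counts common neighbors of u and v
    lams = {int(common[u, v]) for u in range(n) for v in range(n)
            if u != v and A[u, v] == 1}
    mus = {int(common[u, v]) for u in range(n) for v in range(n)
           if u != v and A[u, v] == 0}
    if len(lams) > 1 or len(mus) > 1:
        raise ValueError("common-neighbor counts are not constant")
    return n, k, lams.pop(), mus.pop()

# The pentagon: the cycle on 5 vertices.
A = np.zeros((5, 5), dtype=int)
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1

print(srg_parameters(A))      # expected: (5, 2, 0, 1)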

These conditions are very strong, and it might not be obvious that there are any non-trivial graphs that satisfy these conditions. Of course, the complete graph and disjoint unions of complete graphs satisfy these conditions.

For the rest of this lecture, we will only consider strongly regular graphs that are connected and that are not the complete graph. I will now give you some examples.

9.3 The Pentagon

The simplest strongly-regular graph is the pentagon. It has parameters

n = 5, k = 2, λ = 0, µ = 1.

9.4 Lattice Graphs

For a positive integer n, the lattice graph Ln is the graph with vertex set {1, . . . , n}^2 in which vertex (a, b) is connected to vertex (c, d) if a = c or b = d. Thus, the vertices may be arranged at the points in an n-by-n grid, with vertices being connected if they lie in the same row or column. Alternatively, you can understand this graph as the line graph of a bipartite complete graph between two sets of n vertices.

It is routine to see that the parameters of this graph are:

k = 2(n − 1), λ = n − 2, µ = 2.

9.5 Latin Square Graphs

A Latin Square is an n-by-n grid, each entry of which is a number between 1 and n, such that no number appears twice in any row or column. For example,

1 2 3
2 3 1
3 1 2

Let me remark that the number of different latin squares of size n grows very quickly, at least as fast as n!(n − 1)!(n − 2)! . . . 2!.

From such a latin square, we construct a Latin Square Graph. It will have n^2 nodes, one for each cell in the square. Two nodes are joined by an edge if

1. they are in the same row,

2. they are in the same column, or

3. they hold the same number.

So, such a graph has degree k = 3(n − 1). Any two nodes in the same row will both be neighbors of every other node in their row. They will have two more common neighbors: the nodes in their columns holding the other's number. So, they have n common neighbors. The same obviously holds for columns, and is easy to see for nodes that have the same number. So, every pair of nodes that are neighbors has exactly λ = n common neighbors.

On the other hand, consider two vertices that are not neighbors, say (1, 1) and (2, 2). They lie in different rows, lie in different columns, and hold different numbers. The vertex (1, 1) has two common neighbors of (2, 2) in its row: the vertex (1, 2) and the vertex holding the same number as (2, 2). Similarly, it has two common neighbors of (2, 2) in its column. Finally, we can find two more common neighbors of (2, 2) that are in different rows and columns by looking at the nodes that hold the same number as (1, 1), but which are in the same row or column as (2, 2). So, µ = 6.

9.6 The Eigenvalues of Strongly Regular Graphs

We will consider the adjacency matrices of strongly regular graphs. Let A be the adjacency matrix of a strongly regular graph with parameters (k, λ, µ). We already know that A has an eigenvalue of k with multiplicity 1. We will now show that A has just two other eigenvalues.

To prove this, first observe that the (u, v) entry of A^2 is the number of common neighbors of vertices u and v. For u = v, this is just the degree of vertex u. We will use this fact to write A^2 as a linear combination of A, I and J. To this end, observe that the adjacency matrix of the complement of A (the graph with non-edges where A has edges) is J − I − A. So,

A^2 = λA + µ(J − I − A) + kI = (λ − µ)A + µJ + (k − µ)I.

For every vector v orthogonal to 1,

A^2 v = (λ − µ)Av + (k − µ)v.

So, every eigenvalue θ of A other than k satisfies

θ^2 = (λ − µ)θ + k − µ.

The eigenvalues of A other than k are those θ that satisfy this quadratic equation, and so are given by

(λ − µ ± sqrt((λ − µ)^2 + 4(k − µ))) / 2.

These eigenvalues are always denoted r and s, with r > s. By convention, the multiplicity of the eigenvalue r is always denoted f, and the multiplicity of s is always denoted g.

For example, for the pentagon we have

r = (√5 − 1)/2,   s = −(√5 + 1)/2.
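As a sanity check, the following numpy sketch, again not part of the notes, computes r and s from the quadratic above and compares them with the actual spectrum of the pentagon; the helper name srg_eigenvalues is an arbitrary choice.

import numpy as np

def srg_eigenvalues(k, lam, mu):
    """The two eigenvalues other than k, from theta^2 = (lam - mu) theta + k - mu."""
    disc = np.sqrt((lam - mu) ** 2 + 4 * (k - mu))
    return ((lam - mu) + disc) / 2, ((lam - mu) - disc) / 2

# Pentagon: parameters (n, k, lam, mu) = (5, 2, 0, 1).
r, s = srg_eigenvalues(2, 0, 1)
print(r, s)                       # (sqrt(5)-1)/2 and -(sqrt(5)+1)/2

# Cross-check against the spectrum of the pentagon's adjacency matrix.
A = np.zeros((5, 5))
for i in range(5):
    A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
print(np.round(np.linalg.eigvalsh(A), 6))   # k = 2 once, r twice, s twice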
For the lattice graph Ln, we have

r = n − 2, s = −2.

For the latin square graphs of order n, we have

r = n − 3, s = −3.

9.7 Regular graphs with three eigenvalues

We will now show that every regular connected graph with at most 3 eigenvalues must be a strongly regular graph. Let G be k-regular, and let its eigenvalues other than k be r and s. As G is connected, its adjacency eigenvalue k has multiplicity 1.

Then, for every vector v orthogonal to 1, we have

(A − rI)(A − sI)v = 0.

Thus, for some β,

(A − rI)(A − sI) = βJ,

which gives

A^2 − (r + s)A + rsI = βJ  =⇒
A^2 = (r + s)A − rsI + βJ
    = (r + s + β)A + β(J − A − I) + (β − rs)I.

So, the number of common neighbors of two nodes just depends on whether or not they are neighbors, which implies that G is strongly regular.

9.8 Integrality of the eigenvalues

We will now see that, unless f = g, both r and s must be integers. We do this by observing a few identities that they both must satisfy. First, from the quadratic equation above, we know that

r + s = λ − µ     (9.1)

and

rs = µ − k.     (9.2)

As the trace of an adjacency matrix is zero, and is also the sum of the eigenvalues times their multiplicities, we know

k + f r + gs = 0.     (9.3)

So, it must be the case that s < 0. Equation 9.1 then gives r > 0.

If f ≠ g, then equations (9.3) and (9.1) provide independent constraints on r and s, and so together they determine r and s. As the coefficients in both equations are integers, they tell us

that both r and s are rational numbers. From this, and the fact that r and s are the roots of a Let W be the analogous matrix for the eigenvalue s. We then have
quadratic equation with integral coefficients, we may conclude that r and s are in fact integers.
1
Let me remind you as to why. A = rU U T + sW W T + k J.
n
Lemma 9.8.1. If θ is a rational number that satisfies 1
As each of the matrices U U T , W W T and nJ are projections (having all eigenvalues 0 or 1), and
θ2 + bθ + c = 0, are mutually orthogonal, we also have

where b and c are integers, then θ must be an integer. 1


A2 = r2 U U T + s2 W W T + k 2 J.
n
Proof. Write θ = x/y, where the greatest common divisor of x and y is 1. We then have Consider the polynomial
(X − s)(X − k)
(x/y)2 + b(x/y) + c = 0, P (X) = .
(r − s)(r − k)
which implies We have 
x2 + bxy + cy 2 = 0, 
1 if X = r
P (X) = 0 if X = s, and
which implies that y divides x2 . As we have assumed the greatest common divisor of x and y is 1, 

this implies y = 1. 0 if X = k.
So,
1
P (A) = P (r)U U T + P (s)W W T + P (k) J = U U T .
9.9 The Eigenspaces of Strongly Regular Graphs n
That is, the Gram matrix of the point set x 1 , . . . , x n is a linear combination of the identity, A
It is natural to ask what the eigenspaces can tell us about a strongly regular graph. But, we will and A2 . So, the distance between any pair of points in this set just depends on whether or not the
find that they don’t tell us anything we don’t already know. corresponding vertices are neighbors in G.

Let u 1 , . . . u f be an orthonormal set of eigenvectors of the eigenvalue r, and let U be the matrix In particular, this means that the point set x 1 , . . . , x n is a two-distance point set: a set of points
containing these vectors as columns. Recall that U is only determined up to an orthonormal such that there are only two different distances between them. Next lecture, we will use this fact
transformation. That is, we could equally take U Q for any f -by-f orthonormal matrix Q. to prove a lower bound on the dimensions f and g.

To the ith vertex, we associate the vector


def 9.10 Triangular Graphs
x i = (u 1 (i), . . . , u f (i)).

While the vectors U are determined only up to orthogonal transformations, these transformations For a positive integer n, the triangular graph Tn may be defined to be the line graph of the
don’t affect the geometry of these vectors. For example, for vertices i and j, the distance between complete graph on n vertices. To be more concrete, its vertices are the subsets of size 2 of
x i and x j is {1, . . . , n}. Two of these sets are connected by an edge if their intersection has size 1.
∥x i − x j ∥ ,
You are probably familiar with some triangular graphs. T3 is the triangle, T4 is the skeleton of the
and octahedron, and T5 is the complement of the Petersen graph.
2 2 2
∥x i − x j ∥ = ∥x i ∥ + ∥xxj ∥ − 2x i x Tj .
Let’s verify that these are strongly-regular, and compute their parameters. As the construction is
On the other hand, competely symmetric, we may begin by considering any vertex, say the one labeled by the set
{1, 2}. Every vertex labeled by a set of form {1, i} or {2, i}, for i ≥ 3, will be connected to this
∥x i Q − x j Q∥2 = ∥x i Q∥2 +∥xxj Q∥2 −2(x i Q)(x j Q)T = ∥x i Q∥2 +∥xxj Q∥2 −2x i QQT x Tj = ∥x i ∥2 +∥xxj ∥2 −2x i x Tj .
set. So, this vertex, and every vertex, has degree 2(n − 2).
In fact, all the geometrical information about the vectors x i is captured by their Gram matrix, For any neighbor of {1, 2}, say {1, 3}, every other vertex of the form {1, i} for i ≥ 4 will be a neighbor
whose (i, j) entry is x i x Tj . This matrix is also given by of both of these, as will the set {2, 3}. Carrying this out in general, we find that
λ = (n − 3) + 1 = n − 2.
UUT .

Finally, any non-neighbor of {1, 2}, say {3, 4}, will have 4 common neighbors with {1, 2}: coefficients γ1 , . . . , γn with γ1 ̸= 0 and
X
{1, 3} , {1, 4} , {2, 3} , {2, 4} . γi pi (y) = 0.
i
So, µ = 4.
To obtain a contradiction, plug in y = x 1 , to find
X
9.11 Two-distance point sets γi pi (x 1 ) = γ1 p1 (x 1 ) ̸= 0.
i

Recall from last lecture that each eigenspace of a strongly regular graph supplies a set of points on
the unit sphere such that the distance between a pair of points just depends on whether or not f
n ≤ 1 + 2f + ,
they are adjacent. If the graph is connected and not the complete graph, then we can show that 2
these distances are greater than zero, so no two vertices map to the same unit vector. If we take which implies √
the corresponding point sets for two strongly regular graphs with the same parameters, we can f≥ 2n − 2.
show that the graphs are isomorphic if and only if there is an orthogonal transformation that
maps one point set to the other. In low dimensions, it is easy to find such an orthogonal
transformation if one exists.
Consider the eigenspace of r, which we recall has dimension f . Fix any set of f independent
vectors corresponding to f vertices. An orthogonal transformation is determined by its action on
these vectors. So, if there is an orthogonal transformation that maps one vector set onto the
other, we will find it by examining all orthogonal transformations determined
 by mapping these f
vectors to f vectors in the other set. Thus, we need only examine nf f ! transformations. This
would be helpful √if f were small. Unfortunately, it is not. We will now prove that both f and g
must be at least 2n − 2.
Let x 1 , . . . , x n be a set of unit vectors in IRf such that there are two values α, β < 1 such that

x i , x j = α or β.

We will prove a lower bound on f in terms of n.


The key to our proof is to define an f -variate polynomial for each point. In particular, we set

pi (y ) = (y T x i − α)(y T x i − β),

for y ∈ IRf . We first note that each polynomial pi is a polynomial of degree 2 in f variables (the
coordinates of y ). As each f -variate polynomial of degree 2 can be expressed in the form
X X
a+ bi yi + ci,j yi yj ,
i i≤k

we see that the vector space of degree-2 polynomials in f variables has dimension
 
f
1 + 2f + .
2

To prove a lower bound on f , we will show that these polynomials are linearly independent.
Assume by way of contradiction that they are not. Then, without loss of generality, there exist
Part III

Physical Metaphors

Chapter 10

Random Walks on Graphs

We will examine how the eigenvalues of a graph govern the convergence of a random walk.

10.1 Random Walks

We will consider random walks on undirected graphs. Let's begin with the definitions. Let G = (V, E, w) be a weighted undirected graph. A random walk on a graph is a process that begins at some vertex, and at each time step moves to another vertex. When the graph is unweighted, the vertex the walk moves to is chosen uniformly at random among the neighbors of the present vertex. When the graph is weighted, it moves to a neighbor with probability proportional to the weight of the corresponding edge. While the transcript (the list of vertices in the order they are visited) of a particular random walk is sometimes of interest, it is often more productive to reason about the expected behavior of a random walk. To this end, we will investigate the probability distribution over vertices after a certain number of steps.

We will let the vector p_t ∈ IR^V denote the probability distribution at time t. We write p_t(a) to indicate the value of p_t at a vertex a—the probability of being at vertex a at time t. A probability vector p is a vector such that p(a) ≥ 0 for all a ∈ V, and

Σ_a p(a) = 1.

Our initial probability distribution, p_0, will typically be concentrated on one vertex. That is, there will be some vertex a for which p_0(a) = 1. In this case, we say that the walk starts at a.

To derive p_{t+1} from p_t, note that the probability of being at a vertex a at time t + 1 is the sum over the neighbors b of a of the probability that the walk was at b at time t, times the probability it moved from b to a in time t + 1. We can state this algebraically as

p_{t+1}(a) = Σ_{b:(a,b)∈E} (w(a, b)/d(b)) p_t(b),     (10.1)

where d(b) = Σ_a w(a, b) is the weighted degree of vertex b.
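To make equation (10.1) concrete, here is a minimal numpy sketch, not taken from the text, that iterates the walk in matrix form on a small, arbitrarily chosen weighted graph; the comparison vector d/(1^T d) anticipates the stable distribution discussed later in this chapter.

import numpy as np

# A small weighted, undirected example graph given by its symmetric weight
# matrix M (the particular weights are an arbitrary illustration).
M = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])

d = M.sum(axis=0)          # weighted degrees d(b)
W = M / d                  # column b of W is M[:, b] / d(b), i.e. W = M D^{-1}

p = np.array([1., 0., 0., 0.])    # the walk starts at vertex 0
for _ in range(100):
    p = W @ p              # equation (10.1) in matrix form: p_{t+1} = W p_t

print(p)
print(d / d.sum())         # compare with the stable distribution pi = d / (1^T d)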

We may write this in matrix form using the walk matrix of the graph, which is given by f has the same eigenvectors as W .
Of course, W
def We next observe that the degree vector, d , is a Perron vector of W of eigenvalue 1:
W = M D −1 .

We then have M D −1 d = M 1 = d .
p t+1 = W p t .
So, the Perron-Frobenius theorem (Theorem 4.5.1) tells us that all the eigenvalues of W lie
To see why this holds, consider how W acts as an operator on an elementary unit vector. between −1 and 1. As we did in Proposition 4.5.3, one can show that G is bipartite if and only if
X −1 is an eigenvalue of A.
M D −1 δ b = M (δ b /d (b)) = (wa,b /d (b))δ a .
As Wf = W /2 + I /2, this implies that all the eigenvalues of W
f lie between 0 and 1. We denote
a∼b
f and I /2 + A/2 by
the eigenvalues of W
We will often consider lazy random walks, which are the variant of random walks that stay put
1 = ω1 ≥ ω2 ≥ · · · ≥ ωn ≥ 0.
with probability 1/2 at each time step, and walk to a random neighbor the other half of the time.
These evolve according to the equation While the letter ω is not a greek equivalent “w”, we use it because it looks like one.
X w(a, b)
p t+1 (a) = (1/2)p t (a) + (1/2) p (b), (10.2) From Claim 10.2.1, we now know that
d (b) t
b:(a,b)∈E
def d 1/2
ψ1 =
and satisfy d 1/2
f p t,
p t+1 = W
f is the lazy walk matrix , given by
where W is the unit-norm Perron vector of A, where
def
f def
W
1 1 1 1
= I + W = I + M D −1 . d 1/2 (a) = d (a)1/2 .
2 2 2 2

We will usually work with lazy random walks. 10.3 The stable distribution

Regardless of the starting distribution, the lazy random walk on a connected graph always
10.2 Spectra of Walk Matrices converges to one distribution: the stable distribution. This is the other reason that we forced our
random walk to be lazy. Without laziness1 , there can be graphs on which the random walks never
While the walk matrices are not symmetric, they are similar to symmetric matrices. We will see converge. For example, consider a non-lazy random walk on a bipartite graph. Every-other step
that this implies that they have n real eigenvalues, although their eigenvectors are generally not will bring it to the other side of the graph. So, if the walk starts on one side of the graph, its
orthogonal. Define the normalized adjacency matrix by limiting distribution at time t will depend upon the parity of t.
def In the stable distribution, every vertex is visited with probability proportional to its weighted
A = D −1/2 W D 1/2 = D −1/2 M D −1/2 .
degree. We denote the vector encoding this distribution by π, where
So, A is symmetric.
def
π = d /(1T d ).
Claim 10.2.1. The vector ψ is an eigenvector of A of eigenvalue ω if and only if D 1/2 ψ is an
eigenvector of W of eigenvalue ω. We have already seen that π is a right-eigenvector of eigenvalue 1. To show that the lazy random
walk converges to π, we will exploit the fact that all the eigenvalues other than 1 are in [0, 1).
Proof. As A = D−1/2 W D 1/2 , D 1/2 A = W D 1/2 . Thus, Aψ = ωψ if and only if And, we expand the vectors p t in the eigenbasis of A, after first multiplying by D −1/2 .
1
D 1/2 Aψ = D 1/2 ωψ = ω(D 1/2 ψ) = W (D 1/2 ψ). Strictly speaking, any nonzero probability of staying put at any vertex in a connected graph will guarantee
convergence. We don’t really need a half probability at every vertex.

f (caution:
Let ψ 1 , . . . , ψ n be the eigenvectors of A corresponding to eigenvalues ω1 , . . . , ωn of W Theorem 10.4.1. For all a, b and t, if p 0 = δ a , then
the corresponding eigenvalues of A are 2ωi − 1). For any initial distribution p 0 , write s
X d (b) t
|p t (b) − π(b)| ≤ ω .
D −1/2 p 0 = ci ψ i , where ci = ψ Ti D −1/2 . d (a) 2
i

Note that Proof. Observe that


(d 1/2 )T 1T p 0 1 p t (b) = δ Tb p t .
c1 = ψ T1 (D −1/2 p 0 ) = (D −1/2
p 0) = = ,
∥d 1/2 ∥ ∥d 1/2 ∥ ∥d 1/2 ∥ From the analysis in the previous section, we know
as p 0 is a probability vector. One of the reasons we do not expand in a basis of eigenvectors of X
f is that it, not being orthogonal, it does not allow such a nice expression for the coefficients. p t (b) = δ Tb p = π(b) + δ Tb D 1/2 ωit ci ψ i .
W
i≥2
We have
We need merely prove an upper bound on the magnitude of the right-hand term. To this end,
f tp 0
pt = W recall that
t 1
f D 1/2 D −1/2 p 0
= D 1/2 D −1/2 W ci = ψ Ti D −1/2 δ a = p ψ Ti δ a .
 t d (a)
= D 1/2 D −1/2 Wf D 1/2 D −1/2 p 0 So, s
1/2 t −1/2 X d (b) T X t
=D (I /2 + A/2) D p0 δ Tb D 1/2 ωit ci ψ i = δ ωi ψ i ψ Ti δ a .
X d (a) b
= D 1/2 (I /2 + A/2)t ci ψ i i≥2 i≥2

X
i Analyzing the right-hand part of this last expression, we find
= D 1/2 ωit ci ψ i
i
X X  
X δ Tb ωit ψ i ψ Ti δ a = ωit δ Tb ψ i ψ Ti δ a
= D 1/2 c1 ψ 1 + D 1/2 ωit ci ψ i i≥2 i≥2
i≥2 X
≤ ωit δ Tb ψ i ψ Ti δ a
i≥2
X
As 0 ≤ ωi < 1 for i ≥ 2, the right-hand term must go to zero. On the other hand, ≤ ω2t δ Tb ψ i ψ Ti δ a
ψ 1 = d 1/2 /∥d 1/2 ∥, so i≥2
X
! ≤ ω2t δ Tb ψ i ψ Ti δ a
1/2 1/2 1 d 1/2 d d i≥1
D c1 ψ 1 = D = =P = π. sX sX
∥d 1/2 ∥ ∥d 1/2 ∥ ∥d 1/2 ∥2 a d (a) 2 2
≤ ω2t δ Tb ψ i δ Ta ψ i by Cauchy-Schwarz
This is a perfect example of one of the main uses of spectral theory: to understand what happens i≥1 i≥1
when we repeatedly apply an operator. = ω2t ∥δ b ∥ ∥δ a ∥ , as the eigenvectors form an orthonormal basis,
= ω2t

10.4 The Rate of Convergence

The rate of convergence of a lazy random walk to the stable distribution is dictated by ω2 : a
small value of ω2 implies fast convergence. 10.5 Relation to the Normalized Laplacian
There are many ways of measuring convergence of a random walk. We will do so point-wise.
The walk matrix is closely related to the normalized Laplacian, which is defined by
Assume that the random walk starts at some vertex a ∈ V . For every vertex b, we will bound how
far p t (b) can be from π(b). N = D −1/2 LD −1/2 = I − D −1/2 M D −1/2 = I − A.
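Here is an illustrative numpy check, not part of the text, of the bound of Theorem 10.4.1 on a small, arbitrary weighted graph; it simply verifies that |p_t(b) − π(b)| never exceeds sqrt(d(b)/d(a)) ω_2^t for the lazy walk.

import numpy as np

# Empirical check of the bound of Theorem 10.4.1 using the lazy walk matrix
# W~ = I/2 + W/2 on an arbitrary example graph.
M = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
d = M.sum(axis=0)
W = M @ np.diag(1 / d)
W_lazy = 0.5 * np.eye(4) + 0.5 * W
pi = d / d.sum()

# omega_2: second largest eigenvalue of the lazy walk matrix (all eigenvalues
# are real, since W~ is similar to a symmetric matrix).
omega = np.sort(np.linalg.eigvals(W_lazy).real)[::-1]
omega2 = omega[1]

a = 0
p = np.eye(4)[a]                          # p_0 = delta_a
for t in range(1, 31):
    p = W_lazy @ p
    bound = np.sqrt(d / d[a]) * omega2 ** t
    assert np.all(np.abs(p - pi) <= bound + 1e-12)
print("bound of Theorem 10.4.1 holds for t = 1, ..., 30")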

We let 0 ≤ ν1 ≤ ν2 ≤ · · · ≤ νn denote the eigenvalues of N , and note that they have the same Proof. The Courant-Fischer theorem tells us that
eigenvectors as A. Other useful relations include
xTN x
νi = min max .
νi = 2 − 2ωi , ωi = 1 − νi /2, dim(S)=i x ∈S xTx

and −1/2
As the change of variables y = D x is non-singular, this equals
f = I − 1 D 1/2 N D −1/2 .
W
2 y T Ly
min max .
The normalized Laplacian is positive semidefinite and has the same rank as the ordinary dim(T )=i y ∈T y T Dy
(sometimes called “combinatorial”) Laplacian. There are many advantages of working with the So,
normalized Laplacian: the mean of its eigenvalues is 1, so they are always on a degree-independent y T Ly y T Ly 1 y T Ly λi
scale. One can prove that νn ≤ 2, with equality if and only if the graph is bipartite. min max ≥ min max = min max = .
dim(T )=i y ∈T y T Dy dim(T )=i y ∈T dmax y T y dmax dim(T )=i y ∈T yT y dmax
The bound in Theorem 10.4.1 can be expressed in the eigenvalues of the normalized Laplacian as The other bound may be proved similarly.
s
d (b)
|p t (b) − π(b)| ≤ (1 − ν2 /2)t .
d (a) 10.6 Examples
We will say that a walk has mixed if
We now do some examples. For each we think about the random walk in two ways: by reasoning
|p t (b) − π(b)| ≤ π(b)/2, directly about how a random walk should behave and by examining ν2 .

for all vertices b. Using the approximation 1 − x ≈ exp(−x), we see that this should happen once
10.6.1 The Path
s
d (b)
(1 − ν2 /2)t ≤ d (b)/2d (V ) ⇐⇒ As every vertex in the path on n vertices has degree 1 or 2, ν2 is approximately λ2 , which is
d (a)
p approximately c/n2 for some constant c.
(1 − ν2 /2)t ≤ d (b)d (a)/2d (V ) ⇐⇒
p To understand the random walk on the path, think about what happens when the walk starts in
exp(−tν2 /2) ≤ d (b)d (a)/2d (V ) ⇐⇒ the middle. Ignoring the steps on which it stays put, it will either move to the left or the right
p 
−tν2 /2 ≤ ln d (b)d (a)/2d (V ) ⇐⇒ with probability 1/2. So, the position of the walk after t steps is distributed as the sum of t
 p  random
√ variables taking values
√ in {1, −1}. Recall that the standard deviation of such a sum is
t ≥ 2 ln 2d (V )/ d (b)d (a) /ν2 . t. So, we need to have t comparable to n/4 for there to be a reasonable chance that the walk
is on the left or right n/4 vertices.
So, for graphs in which all degrees are approximately constant, this upper bound on the time to
mix is approximately ln(n)/ν2 . For some graphs the ln n term does not appear. Note that
multiplying all edge weights by a constant does not change any of these expressions. 10.6.2 The Complete Binary Tree

While we have explicitly worked out λ2 for many graphs, we have not done this for ν2 . The As with the path, ν2 for the tree is within a constant of λ2 for the tree, and so is approximately
following lemma will allow us to relate bounds on λ2 to bounds on ν2 : c/n for some constant c. To understand the random walk on Tn , first note that whenever it is at a
vertex, it is twice as likely to step towards a leaf as it is to step towards the root. So, if the walk
Lemma 10.5.1. Let L be the Laplacian matrix of a graph, with eigenvalues λ1 ≤ λ2 ≤ · · · ≤ λn ,
starts at a leaf, there is no way the walk can mix until it reaches the root. The height of the walk
and let N be its normalized Laplacian, with eigenvalues ν1 ≤ ν2 ≤ · · · ≤ ν2 . Then, for all i
is like a sum of ±1 random variables, except that they are twice as likely to be −1 as they are to
λi λi be 1, and that their sum never goes below 0. One can show that we need to wait approximately n
≥ νi ≥ ,
dmin dmax steps before such a walk will hit the root. Once it does hit the root, the walk mixes rapidly.

where dmin and dmax are the minimum and maximum degrees of vertices in the graph.
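Lemma 10.5.1 is easy to check numerically. The sketch below, not from the text, builds a random unweighted graph (a cycle plus random edges, an arbitrary construction) and verifies that each ν_i lies between λ_i/d_max and λ_i/d_min.

import numpy as np

# Illustrative numerical check of Lemma 10.5.1.
rng = np.random.default_rng(0)
n = 30
M = np.zeros((n, n))
for i in range(n):                       # start from a cycle, so every degree is >= 2
    M[i, (i + 1) % n] = M[(i + 1) % n, i] = 1.0
extra = np.triu((rng.random((n, n)) < 0.1).astype(float), k=1)
M = np.maximum(M, extra + extra.T)

d = M.sum(axis=0)
L = np.diag(d) - M                                  # combinatorial Laplacian
N = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)     # normalized Laplacian

lam = np.linalg.eigvalsh(L)                         # both in ascending order
nu = np.linalg.eigvalsh(N)

print(np.all(nu <= lam / d.min() + 1e-9))           # nu_i <= lambda_i / dmin
print(np.all(nu >= lam / d.max() - 1e-9))           # nu_i >= lambda_i / dmax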

10.6.3 The Dumbbell 10.6.4 The Bolas Graph

The dumbell graph Dn consists of two complete graphs on n vertices, joined by one edge called We define the bolas2 graph Bn to be a graph containing two n-cliques connected by a path of
the “bridge”. So, there are 2n vertices in total, and all vertices have degree n − 1 or n. length n. The bolas graph has a value of ν2 that is almost as small as possible. Equivalently,
random walks on a bolas graph mix almost as slowly as possible.
To understand the random walk on this graph, consider starting it at some vertex that is not
attached to the bridge edge. After the first step the walk will be well mixed on the vertices in the The analysis of the random walk on a bolas is similar to that on a dumbbell, except that when
side on which it starts. Because of this, the chance that it finds the edge going to the other side is the walk is on the first vertex of the path the chance that it gets to the other end before moving
only around 1/n2 : there is only a 1/n chance of being at the vertex attached to the bridge edge, back to the clique at which we started is only 1/n. So, we must wait around n3 steps before there
and only a 1/n chance of choosing that edge when at that vertex. So, we must wait some multiple is a reasonable chance of getting to the other side.
of n2 steps before there is a reasonable chance that the walk reaches the other side of the graph.
Next lecture, we will learn that we can upper bound ν2 with a test vector using the fact that
The isoperimetric ratio of this graph is
1 x T Lx
θDn ∼ . ν2 = min .
n x ⊥d x T Dx
Using the test vector that is 1 on one complete graph and −1 on the other, we can show that
To prove an upper bound on ν2 , form a test vector that is n/2 on one clique, −n/2 on the other,
λ2 (Dn ) ⪅ 1/n. and increases by 1 along the path. We can use the symmetry of the construction to show that this
vector is orthogonal to d . The numerator of the generalized Rayleigh quotient is n, and the
Lemma 10.5.1 then tells us that denominator is the sum of the squares of the entries of the vectors times the degrees of the
ν2 (Dn ) ⪅ 1/n2 . vertices, which is some constant times n4 . This tells us that ν2 is at most some constant over n3 .

To prove that this bound is almost tight, we use the following lemma. To see that ν2 must be at least some constant over n3 , and in fact that this must hold for every
graph, apply Lemmas 10.5.1 and 10.6.1.
Lemma 10.6.1. Let G be an unweighted graph of diameter at most r. Then,
2
λ2 (G) ≥ . 10.7 Diffusion
r(n − 1)

Proof. For every pair of vertices (a, b), let P (a, b) be a path in G of length at most r. We have There are a few types of diffusion that people study in a graph, but the most common is closely
related to random walks. In a diffusion process, we imagine that we have some substance that can
L(a,b) ≼ r · LP (a,b) ≼ rLG . occupy the vertices, such as a gas or fluid. At each time step, some of the substance diffuses out
of each vertex. If we say that half the substance stays at a vertex at each time step, and the other
So,   half is distributed among its neighboring vertices, then the distribution of the substance will
n
Kn ≼ r G, evolve according to equation (10.2). That is, probability mass obeys this diffusion equation.
2
People often consider finer time steps in which smaller fractions of the mass leave the vertices. In
and  
n the limit, this results in continuous random walks that are modeled by the matrix exponential: if
n≤r λ2 (G), the walk stays put with probability 1 − ϵ in each step, and we view each step as taking time ϵ,
2
then the transition matrix of the walk after time t will be
from which the lemma follows.
((1 − ϵ)I + ϵW )t/ϵ → exp(t(W − I )).
The diameter of Dn is 3, so we have λ2 (Dn ) ≥ 2/3(n − 1). As every vertex of Dn has degree at
least n − 1, we may conclude ν2 (Dn ) ⪆ 2/3(n − 1)2 . These are in many ways more natural than discrete time random walks.
2
A bolas is a hunting weapon consisting of two balls or rocks tied together with a cord.
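The following numpy sketch, not part of the text, computes ν_2 for a few dumbbell graphs and confirms that ν_2 n^2 stays bounded, matching the discussion above; the constructor name dumbbell is an illustrative helper.

import numpy as np

def dumbbell(n):
    """Adjacency matrix of D_n: two n-cliques joined by a single 'bridge' edge."""
    A = np.zeros((2 * n, 2 * n))
    A[:n, :n] = 1 - np.eye(n)
    A[n:, n:] = 1 - np.eye(n)
    A[n - 1, n] = A[n, n - 1] = 1          # the bridge
    return A

for n in [5, 10, 20, 40]:
    A = dumbbell(n)
    d = A.sum(axis=0)
    N = np.diag(d ** -0.5) @ (np.diag(d) - A) @ np.diag(d ** -0.5)
    nu2 = np.sort(np.linalg.eigvalsh(N))[1]
    print(n, nu2, nu2 * n ** 2)            # nu2 * n^2 stays bounded, as claimed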

10.8 Final Notes

The procedure we have described—repeatedly multiplying a vector by the lazy walk matrix and showing that the result approximates π—is known in Numerical Linear Algebra as the power method. It is one of the common ways of approximately computing eigenvectors.

In the proof of Theorem 10.4.1, we were a little loose with some of the terms. The slack comes from two sources. First, we upper bounded ωi by ω2 for all i, while many of the ωi are probably significantly less than ω2. This phenomenon is often called “eigenvalue decay”, and it holds in many graphs. This sloppiness essentially costs us a multiplicative factor of log n in the number of steps t we need to achieve the claimed bound. You will note that in the examples above, the time to approximate convergence is typically on the order of 1/ν2, not (log n)/ν2. This is because of eigenvalue decay.

The second source of slack appeared when we upper bounded the absolute value of a sum by the sum of the absolute values.
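For concreteness, here is a minimal power-method sketch, not from the text, on the same arbitrary example graph: repeated multiplication by the lazy walk matrix converges to the stable distribution π.

import numpy as np

# Power method sketch: iterate the lazy walk matrix to recover its dominant
# eigenvector, which is the stable distribution pi (illustrative example).
M = np.array([[0., 2., 1., 0.],
              [2., 0., 1., 0.],
              [1., 1., 0., 3.],
              [0., 0., 3., 0.]])
d = M.sum(axis=0)
W_lazy = 0.5 * np.eye(4) + 0.5 * (M @ np.diag(1 / d))

p = np.array([1., 0., 0., 0.])
for _ in range(200):
    p = W_lazy @ p
    p = p / p.sum()        # renormalize; for a probability vector this is a no-op

print(np.allclose(p, d / d.sum()))   # True: the iteration has converged to pi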

Chapter 11

Walks, Springs, and Resistor Networks

Lecture 12 from October 8, 2018

This Chapter Needs Editing

11.1 Overview

In this lecture we will see how the analysis of random walks, spring networks, and resistor networks leads to the consideration of systems of linear equations in Laplacian matrices. The main purpose of this lecture is to introduce concepts and language that we will use extensively in the rest of the course.

11.2 Harmonic Functions

The theme of this whole lecture will be harmonic functions on graphs. These will be defined in terms of a weighted graph G = (V, E, w) and a set of boundary vertices B ⊆ V. We let S = V − B (I use “-” for set-minus). We will assume throughout this lecture that G is connected and that B is nonempty.

A function x : V → R is said to be harmonic at a vertex a if the value of x at a is the weighted average of its values at the neighbors of a, where the weights are given by w:

x(a) = (1/d_a) Σ_{b∼a} w_{a,b} x(b).     (11.1)

The function x is harmonic on S if it is harmonic for all a ∈ S.
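To make the definition concrete, the following numpy sketch, not from the text, computes the harmonic extension of boundary values on a 6-vertex path by solving the linear system that equation (11.1) induces on the interior vertices; the graph and boundary values are arbitrary illustrations, and the same system reappears in Section 11.5.

import numpy as np

# Harmonic extension on the path with 6 vertices, boundary B = {0, 5}.
n = 6
M = np.zeros((n, n))
for i in range(n - 1):
    M[i, i + 1] = M[i + 1, i] = 1.0
d = M.sum(axis=0)
L = np.diag(d) - M

B = [0, n - 1]
S = [i for i in range(n) if i not in B]
fB = np.array([0.0, 1.0])             # fixed boundary values at vertices 0 and 5

# Make x harmonic at every vertex of S: L(S,S) x(S) = - L(S,B) f(B).
LSS = L[np.ix_(S, S)]
LSB = L[np.ix_(S, B)]
xS = np.linalg.solve(LSS, -LSB @ fB)

x = np.zeros(n)
x[B] = fB
x[S] = xS
print(x)   # linear interpolation between the boundary values, as expected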

11.3 Random Walks on Graphs should be its weight. If a rubber band connects vertices a and b, then Hooke’s law tells us that
the force it exerts at node a is in the direction of b and is proportional to the distance between a
Consider the standard (not lazy) random walk on the graph G. Recall that when the walk is at a and b. Let x (a) be the position of each vertex a. You should begin by thinking of x (a) being in
vertex a, the probability it moves to a neighbor b is IR, but you will see that it is just as easy to make it a vector in IR2 or IRk for any k.
wa,b The force the rubber band between a and b exerts on a is
.
da
x (b) − x (a).
Distinguish two special nodes in the graph that we will call s and t, and run the random walk In a stable configuration, all of the vertices that have not been nailed down must experience a
until it hits either s or t. We view s and t as the boundary, so B = {s, t}. zero net force. That is
X X
Let x (a) be the probability that a walk that starts at a will stop at s, rather than at t. We have (x (b) − x (a)) = 0 ⇐⇒ x (b) = da x (a)
the boundary conditions x (s) = 1 and x (t) = 0. For every other node a the chance that the walk b∼a b∼a
stops at s is the sum over the neighbors b of a of the chance that the walk moves to b, times the 1 X
⇐⇒ x (b) = x (a).
chance that a walk from b stops at s. That is, da
b∼a
X wa,b
x (a) = x (b). In a stable configuration, every vertex that is not on the boundary must be the average of its
da neighbors.
b∼a

So, the function x is harmonic at every vertex in V − B. In the weighted case, we would have for each a ∈ V − B
For example, consider the path graph Pn . Let’s make s = n and t = 1. So, the walk stops at 1 X
wa,b x (b) = x (a).
either end. We then have x (n) = 1, x (1) = 0. It is easy to construct at least one solution to
b∼a
the harmonic equations (11.1): we can set
That is, x is harmonic on V − B.
a−1
x (a) = . We will next show that the equations (11.1) have a solution, and that it is unique1 if the
n−1
underlying graph is connected and B is nonempty But first, consider again the path graph Pn
It essentially follows from the definitions that there can be only one vector x that solves these with the endpoints fixed: B = {1, n}. Let us fix them to the values f (1) = 1 and f (n) = n. The
equations. But, we will prove this algebraically later in lecture. only solution to the equations (11.1) is the obvious one: vertex i is mapped to i: x (i) = i for all i.
These solutions tell us that if the walk starts at node a, the chance that it ends at node n is
(a − 1)/(n − 1). This justifies some of our analysis of the Bolas graph from Lecture 10.
11.5 Laplacian linear equations
Of course, the exact same analysis goes through for the lazy random walks: those give
X wa,b X wa,b
x (a) = (1/2)x (a) + (1/2) x (b) ⇐⇒ x (a) = x (b). If we rewrite equation (11.1) as
da da X
b∼a b∼a
da x (a) − wa,b x (b) = 0, (11.2)
b∼a
11.4 Spring Networks we see that it corresponds to the row of the Laplacian matrix corresponding to vertex a. So, we
may find a solution to the equations (11.1) by solving a system of equations in the submatrix of
We begin by imagining that every edge of a graph G = (V, E) is an ideal spring or rubber band. the Laplacian indexed by vertices in V − B.
They are joined together at the vertices. Given such a structure, we will pick a subset of the To be more concete, I will set up those equations. For each vertex a ∈ B, let its position be fixed
vertices B ⊆ V and fix the location of every vertex in B. For example, you could nail each vertex to f (a). Then, we can re-write equation (11.2) as
in B onto a point in the real line, or onto a board in IR2 . We will then study where the other X X
vertices wind up. da x (a) − wa,b x (b) = wa,b f (b),
b̸∈B:(a,b)∈E b∈B:(a,b)∈E
We can use Hooke’s law to figure this out. To begin, assume that each rubber band is an ideal
1
spring with spring constant 1. If your graph is weighted, then the spring constant of each edge It can only fail to be unique if there is a connected component that contains no vertices of B.

for each a ∈ V − B. So, all of the boundary terms wind up in the right-hand vector. Proof. Let S1 , . . . , Sk be the connected components of vertices of G(S). We can use these to write
L(S, S) as a block matrix with blocks equal to L(Si , Si ). Each of these blocks can be written
Let S = V − B. We now see that this is an equation of the form
L(Si , Si ) = LGSi + XSi .
L(S, S)x (S) = r , with r = M (S, :)f .
As G is connected, there must be some vertex in Si with an edge to a vertex not in Si . This
By L(S, S) I mean the submatrix of L indexed by rows and columns of S, and by x (S) I mean
implies that XSi is not the zero matrix, and so we can apply Lemma 11.5.2 to prove that L(Si , Si )
the sub-vector of x indexed by S.
is invertible.
We can then write the condition that entries of B are fixed to f by
As the matrix L(S, S) is invertible, the equations have a solution, and it must be unique.
x (B) = f (B).

We have reduced the problem to that of solving a system of equations in a submatrix of the 11.6 Energy
Laplacian.
Submatrices of Laplacians are a lot like Laplacians, except that they are positive definite. To see Physics also tells us that the vertices will settle into the position that minimizes the potential
this, note that all of the off-diagonals of the submatrix of L agree with all the off-diagonals of the energy. The potential energy of an ideal linear spring with constant w when stretched to length l
Laplacian of the induced subgraph on the internal vertices. But, some of the diagonals are larger: is
1 2
the diagonals of nodes in the submatrix account for both edges in the induced subgraph and wl .
2
edges to the vertices in B.
So, the potential energy in a configuration x is given by
Claim 11.5.1. Let L be the Laplacian of G = (V, E, w), let B ⊆ V , and let S = V − B. Then,
def 1 X
E (x ) = wa,b (x (a) − x (b))2 . (11.3)
L(S, S) = LG(S) + X S , 2
(a,b)∈E

where G(S) is the subgraph induced on the vertices in S and X S is the diagonal matrix with
entries For any x that minimizes the energy, the partial derivative of the energy with respect to each
X
X S (a, a) = wa,b , for a ∈ S. variable must be zero. In this case, the variables are x (a) for a ∈ S. The partial derivative with
b∼a,b∈B
respect to x (a) is
1X X
wa,b 2(x (a) − x (b)) = wa,b (x (a) − x (b)).
Lemma 11.5.2. Let L be the Laplacian matrix of a connected graph and let X be a nonnegative, 2
b∼a b∼a
diagonal matrix with at least one nonzero entry. Then, L + X is positive definite.
Setting this to zero gives the equations we previously derived: (11.1).
Proof. We will prove that x T (L + X )x > 0 for every nonzero vector x . As both L and X are For future reference, we state this result as a theorem.
positive semidefinite, we have
 Theorem 11.6.1. Let G = (V, E, w) be a connected, weighted graph, let B ⊂ V , and let
x T (L + X )x ≥ min x T Lx , x T X x . S = V − B. Given x (B), E (x ) is minimized by setting x (S) so that x is harmonic on S.

As the graph is connected, x T Lx is positive unless x is a constant vector. If x = c1 for some


c ̸= 0, then we obtain 11.7 Resistor Networks
X
c2 1T (L + X )1 = c2 1T X 1 = c2 X (i, i) > 0.
i
We now consider a related physical model of a graph in which we treat every edge as a resistor. If
the graph is unweighted, we will assume that each resistor has resistance 1. If an edge e has
weight we , we will give the corresponding resistor resistance re = 1/we . The reason is that when
the weight of an edge is very small, the edge is barely there, so it should correspond to very high
Lemma 11.5.3. Let L be the Laplacian matrix of a connected graph G = (V, E, w), let B be a
resistance. Having no edge corresponds to having a resistor of infinite resistance.
nonempty, proper subset of V , and let S = V − B. Then, L(S, S) is positive definite.

Recall Ohm’s law: It is often helpful to think of the nodes a for which i ext (a) ̸= 0 as being boundary nodes. We will
V = IR. call the other nodes internal. Let’s see what the equation
That is, the potential drop across a resistor (V ) is equal to the current flowing over the resistor i ext = Lv .
(I) times the resistance (R). To apply this in a graph, we will define for each edge (a, b) the
current flowing from a to b to be i (a, b). As this is a directed quantity, we define means for the internal nodes. If the graph is unweighted and a is an internal node, then the ath
row of this equation is
i (b, a) = −i (a, b). X X
0 = (δ Ta L)v = (v (a) − v (b)) = da v (a) − v (b).
a∼b a∼b
I now let v ∈ IRV be a vector of potentials (voltages) at vertices. Given these potentials, we can
That is,
figure out how much current flows on each edge by the formula 1 X
v (a) = v (b),
1 da
i (a, b) = (v (a) − v (b)) = wa,b (v (a) − v (b)) . a∼b
ra,b which means that v is harmonic at a. Of course, the same holds in weighted graphs.
That is, we adopt the convention that current flows from high voltage to low voltage. We would
like to write this equation in matrix form. The one complication is that each edge comes up twice
in i . So, to treat i as a vector we will have each edge show up exactly once as (a, b) when a < b. 11.8 Solving for currents
We now define the signed edge-vertex adjacency matrix of the graph U to be the matrix with
rows indexed by edges and columns indexed by vertices such that We are often interested in applying (11.4) in the reverse: given a vector of external currents i ext
 we solve for the induced voltages by

1 if a = c v = L−1 i ext .
U ((a, b), c) = −1 if b = c

 This at first appears problematic, as the Laplacian matrix does not have an inverse. The way
0 otherwise. around this problem is to observe that we are only interested in solving these equations for vectors
i ext for which the system has a solution. In the case of a connected graph, this equation will have
Thus the row of U corresponding to edge (a, b) is U ((a, b), :) = δ Ta − δ Tb . a solution if the sum of the values of i ext is zero. That is, if the current going in to the circuit
Define W to be the diagonal matrix with rows and columns indexed by edges with the weights of equals the current going out. These are precisely the vectors that are in the span of the Laplacian.
the edges on the diagonals. We then have To obtain the solution to this equation, we multiply i ext by the Moore-Penrose pseudo-inverse of
L.
i = W U v.
Definition 11.8.1. The pseudo-inverse of a symmetric matrix L, written L+ , is the matrix that
Also recall that resistor networks cannot hold current. So, all the current entering a vertex a from has the same span as L and that satisfies
edges in the graph must exit a to an external source. Let i ext ∈ IRV denote the external currents, LL+ = Π,
where i ext (a) is the amount of current entering the graph through node a. We then have
X
i ext (a) = i (a, b). where Π is the symmetric projection onto the span of L.
b∼a

In matrix form, this becomes I remind you that a matrix Π is a symmetric projection if Π is symmetric and Π2 = Π. This is
T T
i ext = U i = U W U v . (11.4) equivalent to saying that all of its eigenvalues are 0 or 1. We also know that Π = (1/n)LKn .

The matrix The symmetric case is rather special. As LΠ = L, the other following properties of the
def Moore-Penrose pseudo inverse follow from this one:
L = UTWU
is, of course, the Laplacian. This is another way of writing the expression that we derived in L+ L = Π,
Lecture 3: X LL+ L = L
L= wa,b (δ a − δ b )(δ a − δ b )T . L+ LL+ = L+ .
a∼b

It is easy to find a formula for the pseudo-inverse. First, let Ψ be the matrix whose ith column is ψ_i and let Λ be the diagonal matrix with λ_i on the ith diagonal. Recall that

L = ΨΛΨ^T = Σ_i λ_i ψ_i ψ_i^T.

Claim 11.8.2.

L^+ = Σ_{i>1} (1/λ_i) ψ_i ψ_i^T.

11.9 Exercise

Prove that for every p > 0,

L^p = ΨΛ^p Ψ^T = Σ_i λ_i^p ψ_i ψ_i^T.

Moreover, this holds for any symmetric matrix. Not just Laplacians.
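The claim is easy to verify numerically. The following sketch, not from the text, assembles the sum over the nonzero eigenvalues for the Laplacian of a small, arbitrarily chosen connected graph and compares it with numpy's pseudo-inverse.

import numpy as np

# Numerical check of Claim 11.8.2 on a small connected graph (illustrative).
M = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
L = np.diag(M.sum(axis=0)) - M

lam, Psi = np.linalg.eigh(L)      # lam[0] is (numerically) zero for a connected graph
Lplus = sum((1 / lam[i]) * np.outer(Psi[:, i], Psi[:, i]) for i in range(1, 4))

print(np.allclose(Lplus, np.linalg.pinv(L)))   # True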
Chapter 12

Effective Resistance and Schur Complements

The effective resistance between two vertices a and b in an electrical network is the resistance of the entire network when we treat it as one complex resistor. That is, we reduce the rest of the network to a single edge. In general, we will see that if we wish to restrict our attention to a subset of the vertices, B, and if we require all other vertices to be internal, then we can construct a network just on B that factors out the contributions of the internal vertices. The process by which we do this is Gaussian elimination, and the Laplacian of the resulting network on B is called a Schur complement.

12.1 Electrical Flows and Effective Resistance

We now know that if a resistor network has external currents i_ext, then the voltages induced at the vertices will be given by

v = L^+ i_ext.

Consider what this means when i_ext corresponds to a flow of one unit from vertex a to vertex b. The resulting voltages are

v = L^+ (δ_a − δ_b).

Now, let c and d be two other vertices. The potential difference between c and d is

v(c) − v(d) = (δ_c − δ_d)^T v = (δ_c − δ_d)^T L^+ (δ_a − δ_b).

Note the amazing reciprocity here: as L is symmetric this is equal to

(δ_a − δ_b)^T L^+ (δ_c − δ_d).

So, the potential difference between c and d when we flow one unit from a to b is the same as the potential difference between a and b when we flow one unit from c to d.
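The next numpy sketch, not from the text, illustrates this reciprocity and the effective resistance defined in the next few paragraphs on a small, arbitrary graph, computing both quadratic forms with the pseudo-inverse.

import numpy as np

# Reciprocity and effective resistance via the pseudo-inverse (illustrative).
M = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
L = np.diag(M.sum(axis=0)) - M
Lplus = np.linalg.pinv(L)

def delta(i, n=4):
    e = np.zeros(n)
    e[i] = 1.0
    return e

# Potential difference across (c,d) for a unit a->b flow equals the potential
# difference across (a,b) for a unit c->d flow.
a, b, c, d = 0, 3, 1, 2
lhs = (delta(c) - delta(d)) @ Lplus @ (delta(a) - delta(b))
rhs = (delta(a) - delta(b)) @ Lplus @ (delta(c) - delta(d))
print(np.isclose(lhs, rhs))                       # True

# Effective resistance between a and b.
print((delta(a) - delta(b)) @ Lplus @ (delta(a) - delta(b)))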

The effective resistance between vertices a and b is the resistance between a and b when we view when x (s) is fixed to 0 and x (t) is fixed to 1. From Theorem 11.6.1, we know that this vector will
the entire network as one complex resistor. be harmonic on V − {s, t}.
To figure out what this is, recall the equation Fortunately, we already know how compute such a vector x . Set
v (a) − v (b) y = L+ (δ t − δ s )/Reff (t, s).
i (a, b) = ,
ra,b
We have
which holds for one resistor. We use the same equation to define the effective resistance of the y (t) − y (s) = (δ t − δ s )T L+ (δ t − δ s )/Reff (s, t) = 1,
whole network between a and b. That is, we consider an electrical flow that sends one unit of
and y is harmonic on V − {s, t}. So, we choose
current into node a and removes one unit of current from node b. We then measure the potential
difference between a and b that is required to realize this current, define this to be the effective x = y − 1y (s).
resistance between a and b, and write it Reff (a, b). As it equals the potential difference between a
and b in a flow of one unit of current from a to b: The vector x satisfies x (s) = 0, x (t) = 1, and it is harmonic on V − {s, t}. So, it is the vector
that minimizes the energy subject to the boundary conditions.
def
Reff (a, b) = (δ a − δ b )T L+ (δ a − δ b ).
To finish, we compute the energy to be

We will eventually show that effective resistance is a distance. For now, we observe that effective x T Lx = y T Ly
resistance is the square of a Euclidean distance. 1 T 
= L+ (δ t − δ s ) L L+ (δ t − δ s )
To this end, let L+/2 denote the square root of L+ . Recall that every positive semidefinite matrix (Reff (s, t))2
has a square root: the square root of a symmetric matrix M is the symmetric matrix M 1/2 such 1
= (δ t − δ s )T L+ LL+ (δ t − δ s )
that (M 1/2 )2 = M . If (Reff (s, t))2
X 1
M = λi ψ i ψ T = (δ t − δ s )T L+ (δ t − δ s )
i (Reff (s, t))2
is the spectral decomposition of M , then 1
= .
X Reff (s, t)
1/2
M 1/2 = λi ψ i ψ T .
i As the weights of edges are the reciprocals of their resistances, and the spring constant
corresponds to the weight, this is the formula we would expect.
We now have
Resistor networks have an analogous quantity: the energy dissipation (into heat) when current
 T 2 flows through the network. It has the same formula. The reciprocal of the effective resistance is
T + +/2 +/2 +/2
(δ a − δ b ) L (δ a − δ b ) = L (δ a − δ b ) L (δ a − δ b ) = L (δ a − δ b ) sometimes called the effective conductance.
2
= L+/2 δ a − L+/2 δ b = dist(L+/2 δ a , L+/2 δ b )2 .
12.3 Monotonicity
12.2 Effective Resistance through Energy Minimization
Rayleigh’s Monotonicity Principle tells us that if we alter the spring network by decreasing some
of the spring constants, then the effective spring constant between s and t will not increase. In
As you would imagine, we can also define the effective resistance through effective spring
terms of effective resistance, this says that if we increase the resistance of some resistors then the
constants. In this case, we view the network of springs as one large compound network. If we
effective resistance can not decrease. This sounds obvious. But, it is in fact a very special
define the effective spring constant of s, t to be the number w so that when s and t are stretched
property of linear elements like springs and resistors.
to distance l the potential energy in the spring is wl2 /2, then we should define the effective spring
constant to be twice the minimum possible energy of the network, b = (V, E, w)
Theorem 12.3.1. Let G = (V, E, w) be a weighted graph and let G b be another
X weighted graph with the same edges and such that
2E (x ) = wa,b (x (a) − x (b))2 ,
(a,b)∈E ba,b ≤ wa,b
w

for all (a, b) ∈ E. For vertices s and t, let c_{s,t} be the effective spring constant between s and t in G and let ĉ_{s,t} be the analogous quantity in Ĝ. Then,

\[ \hat{c}_{s,t} \;\le\; c_{s,t} . \]

Proof. Let x be the vector of minimum energy in G such that x(s) = 0 and x(t) = 1. Then, the energy of x in Ĝ is no greater:

\[ \frac{1}{2} \sum_{(a,b)\in E} \hat{w}_{a,b}\,(x(a) - x(b))^2 \;\le\; \frac{1}{2} \sum_{(a,b)\in E} w_{a,b}\,(x(a) - x(b))^2 \;=\; c_{s,t} . \]

So, the minimum energy of a vector x in Ĝ such that x(s) = 0 and x(t) = 1 will be at most c_{s,t}, and so ĉ_{s,t} ≤ c_{s,t}.

Similarly, if we let R̂_eff(s, t) be the effective resistance in Ĝ between s and t, then R̂_eff(s, t) ≥ R_eff(s, t). That is, increasing the resistance of resistors in the network cannot decrease effective resistances.

While this principle seems very simple and intuitively obvious, it turns out to fail in just slightly more complicated situations.

12.4 Examples: Series and Parallel

In the case of a path graph with n vertices and edges of weight 1, the effective resistance between the extreme vertices is n − 1.

In general, if a path consists of edges of resistance r_{1,2}, . . . , r_{n−1,n} then the effective resistance between the extreme vertices is

\[ r_{1,2} + \cdots + r_{n-1,n} . \]

To see this, set the potential of vertex i to

\[ v(i) = r_{i,i+1} + \cdots + r_{n-1,n} . \]

Ohm's law then tells us that the current flow over the edge (i, i + 1) will be

\[ (v(i) - v(i+1))\,/\,r_{i,i+1} = 1 . \]

If we have k parallel edges between two nodes s and t of resistances r_1, . . . , r_k, then the effective resistance is

\[ R_{\mathrm{eff}}(s,t) = \frac{1}{1/r_1 + \cdots + 1/r_k} . \]

To see this, impose a potential difference of 1 between s and t. This will induce a flow of 1/r_i = w_i on edge i. So, the total flow will be

\[ \sum_i 1/r_i = \sum_i w_i . \]

12.5 Equivalent Networks, Elimination, and Schur Complements

We have shown that the impact of the entire network on two vertices can be reduced to a network with one edge between them. We will now see that we can do the same for a subset of the vertices. We will do this in two ways: first by viewing L as an operator, and then by considering it as a quadratic form.

Let B be the subset of nodes that we would like to understand (B stands for boundary). All nodes not in B will be internal. Call them I = V − B.

As an operator, the Laplacian maps vectors of voltages to vectors of external currents. We want to examine what happens if we fix the voltages at vertices in B, and require the rest to be harmonic. Let v(B) ∈ IR^B be the voltages at B. We want the matrix L_B such that

\[ i_B = L_B\, v(B) \]

is the vector of external currents at vertices in B when we impose voltages v(B) at vertices of B. As the internal vertices will have their voltages set to be harmonic, they will not have any external currents.

The remarkable fact that we will discover is that L_B is in fact a Laplacian matrix, and that it is obtained by performing Gaussian elimination to remove the internal vertices. Warning: L_B is not a submatrix of L. To prove this, we will move from V to B by removing one vertex at a time. We'll start with a graph G = (V, E, w), and we will set B = {2, . . . , n}, and we will treat vertex 1 as internal. Let N denote the set of neighbors of vertex 1.

We want to compute Lv given v(b) for b ∈ B, and that

\[ v(1) = \frac{1}{d(1)} \sum_{a \in N} w_{1,a}\, v(a) . \qquad (12.1) \]

That is, we want to substitute the value on the right-hand side for v(1) everywhere that it appears in the equation i_ext = Lv. The variable v(1) only appears in the equation for i_ext(a) when a ∈ N. When it does, it appears with coefficient w_{1,a}. Recall that the equation for i_ext(b) is

\[ i_{\mathrm{ext}}(b) = d(b)\, v(b) - \sum_{c \sim b} w_{b,c}\, v(c) . \]

For b ∈ N we expand this by making the substitution for v(1) given by (12.1).

\[ i_{\mathrm{ext}}(b) = d(b)v(b) - w_{b,1} v(1) - \sum_{c\sim b,\, c\ne 1} w_{b,c} v(c)
 = d(b)v(b) - \frac{w_{b,1}}{d(1)} \sum_{a\in N} w_{1,a} v(a) - \sum_{c\sim b,\, c\ne 1} w_{b,c} v(c)
 = d(b)v(b) - \sum_{a\in N} \frac{w_{b,1} w_{a,1}}{d(1)}\, v(a) - \sum_{c\sim b,\, c\ne 1} w_{b,c} v(c) . \]
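Before finishing the bookkeeping for this substitution, here is a small numerical sketch (my own, not part of the text; the example graph, vertex labels, and use of numpy are all assumptions) of where the computation is heading: one step of Gaussian elimination on the internal vertex produces the Laplacian in which the star at that vertex is replaced by a clique on its neighbors N, with edge (a, b) getting weight w_{1,a} w_{1,b} / d(1).

```python
import numpy as np

# A small weighted graph; vertex 0 plays the role of the internal vertex "1".
edges = {(0, 1): 2.0, (0, 2): 1.0, (0, 3): 3.0, (1, 2): 1.0}
n = 4
L = np.zeros((n, n))
for (a, b), w in edges.items():
    L[a, a] += w; L[b, b] += w
    L[a, b] -= w; L[b, a] -= w

# One step of Gaussian elimination, pivoting on entry (0, 0).
LB = L[1:, 1:] - np.outer(L[1:, 0], L[0, 1:]) / L[0, 0]

# Direct construction of the reduced graph: keep the edges not touching vertex 0,
# and add a clique on the neighbors N of 0, edge (a, b) getting weight w0a*w0b/d(0).
red_edges = {}
for (a, b), w in edges.items():
    if a != 0 and b != 0:
        red_edges[(a, b)] = red_edges.get((a, b), 0.0) + w
nbrs = [b for (a, b) in edges if a == 0]
d0 = sum(edges[(0, b)] for b in nbrs)
for i in range(len(nbrs)):
    for j in range(i + 1, len(nbrs)):
        a, b = nbrs[i], nbrs[j]
        red_edges[(a, b)] = red_edges.get((a, b), 0.0) + edges[(0, a)] * edges[(0, b)] / d0

Ldirect = np.zeros((n - 1, n - 1))
for (a, b), w in red_edges.items():
    i, j = a - 1, b - 1
    Ldirect[i, i] += w; Ldirect[j, j] += w
    Ldirect[i, j] -= w; Ldirect[j, i] -= w

print(np.allclose(LB, Ldirect))   # True: elimination produces exactly this Laplacian
```

The check passing on this toy example is, of course, no proof; the argument continues below.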

To finish, observe that b ∈ N , so we are counting b in the middle sum above. Removing the Thus, we can write the quadratic form as
double-count gives.  T  
X wb,1 wa,1 X −(1/L(1, 1))L(1, B)v (B) −(1/L(1, 1))L(1, B)v (B)
2 L .
i ext (b) = (d (b) − wb,1 /d (1))v (b) − v (a) − wb,c v (c). v (B) v (B)
d (1)
a∈N,a̸=b c∼b,c̸=1

If we expand this out, we find that it equals


We will show that these revised equations have two interesting properties: they are the result of
applying Gaussian elimination to eliminate vertex 1, and the resulting equations are Laplacian. v (B)T L(B, B)v (B) + L(1, 1) (−(1/L(1, 1))L(1, B)v (B))2 + 2v (1)L(1, B) (−(1/L(1, 1))L(1, B)v (B))

Let’s look at exactly how the matrix has changed. In the row for vertex b, the edge to vertex 1 = v (B)T L(B, B)v (B) + (L(1, B)v (B))2 /L(1, 1) − 2 (L(1, B)v (B))2 /L(1, 1)
w wa,1
was removed, and edges to every vertex a ∈ N were added with weights b,1 d (1) . And, the = v (B)T L(B, B)v (B) − (L(1, B)v (B))2 /L(1, 1).
wb,1 wb,1
diagonal was decreased by d (1) . Overall, the star of edges based at 1 were removed, and a
clique on N was added in which edge (a, b) has weight Thus,
L(B, 1)L(1, B)
wb,1 w1,a LB = L(B, B) − .
. L(1, 1)
d (1)
To see that this is the matrix that appears in rows and columns 2 through n when we eliminate
To see that this new system of equations comes from a Laplacian, we observe that the entries in the first column of L by adding multiples of the first row, note that we eliminate
entry L(a, 1) by adding −L(a, 1)/L(1, 1) times the first row of the matrix to L(a, :). Doing this
1. It is symmetric. for all rows in B = {2, . . . , n} results in this formula.

2. The off-diagonal entries that have been added are negative. We can again check that LB is a Laplacian matrix. It is clear from the formula that it is
symmetric and that the off-diagonal entries are negative. To check that the constant vectors are
3. The sum of the changes in diagonal and off-diagonal entries is zero, so the row-sum is still in the nullspace, we can show that the quadratic form is zero on those vectors. If v (B) is a
zero. This follows from constant vector, then v (1) must equal this constant, and so v is a constant vector and the value
2
wb,1 X wb,1 wa,1
− = 0. of the quadratic form is 0.
d (1) d (1)
a∈N

12.6 Eliminating Many Vertices


12.5.1 In matrix form by energy

We now do this in terms of the quadratic form. That is, we will compute the matrix LB so that We can of course use the same procedure to eliminate many vertices. We begin by partitioning
the vertex set into boundary vertices B and internal vertices I. We can then use Gaussian
v (B)T LB v (B) = v T Lv , elimination to eliminate all of the internal vertices. You should recall that the submatrices
produced by Gaussian elimination do not depend on the order of the eliminations. So, you may
given that v is harmonic at vertex 1 and agrees with v (B) elsewhere. The quadratic form that we conclude that the matrix LB is uniquely defined.
want to compute is thus given by
Or, observe that to eliminate the entries in row a ∈ B and columns in S, using the rows in S, we
 1 P T  1 P  need to add those rows, L(S, :) to row L(a, :) with coefficients c so that
b∼1 w1,b v (b) w1,b v (b)
d (1) L d (1) b∼1 .
v (B) v (B) L(a, S) + cL(S, S) = 0.
So that we can write this in terms of the entries of the Laplacian matrix, note that This gives
d (1) = L(1, 1), and so c = −L(a, S)L(S, S)−1 ,
1 X and thus row a becomes
v (1) = w1,b v (b) = −(1/L(1, 1))L(1, B)v (B).
d (1)
b∼1 L(a, :) − L(a, S)L(S, S)−1 L(S, :).

Restricting to rows and columns in B, we are left with the matrix We claim that the effective resistance is a distance. The only non-trivial part to prove is the
triangle inequality, (4).
−1
L(B, B) − L(B, S)L(S, S) L(S, B).
From the previous section, we know that it suffices to consider graphs with only three vertices: we
This is called the Schur complement on B (or with respect to S). can reduce any graph to one on just vertices a, b and c without changing the effective resistances
between them.
To see that this is equivalent to requiring that the variables in S be harmonic, partition a vector
v into v (B) and v (S). The harmonic equations become Lemma 12.8.1. Let a, b and c be vertices in a graph. Then

L(S, S)v (S) + L(S, B)v (B) = 0, Reff (a, b) + Reff (b, c) ≥ Reff (a, c).

which implies
Proof. Let
v (S) = −L(S, S)−1 L(S, B)v (B) = L(S, S)−1 M (S, B)v (B),
z = wa,b , y = wa,c , and x = wb,c .
as M (S, B) = −L(S, B) because off-diagonal blocks of the Laplacian equal the negative of the
If we eliminate vertex c, we create an edge between vertices a and b of weight
corresponding blocks in the adjacency matrix. This gives
xy
.
i ext (B) = L(B, S)v (S) + L(B, B)v (B) = −L(B, S)L(S, S)−1 L(S, B)v (B) + L(B, B)v (B), x+y
xy
and so Adding this to the edge that is already there produces weight z + x+y , for
−1
i ext (B) = LB v (B), where LB = L(B, B) − L(B, S)L(S, S) L(S, B)
is the Schur complement.

\[ R_{\mathrm{eff}}(a,b) \;=\; \frac{1}{\,z + \frac{xy}{x+y}\,} \;=\; \frac{1}{\,\frac{zx+zy+xy}{x+y}\,} \;=\; \frac{x+y}{zx+zy+xy} , \]

Working symmetrically, we find that we need to prove that for all positive x, y, and z
12.7 An interpretation of Gaussian elimination
x+y y+z x+z
+ ≥ ,
zx + zy + xy zx + zy + xy zx + zy + xy
This gives us a way of understanding how Gaussian elimination solves a system of equations like
i ext = Lv . It constructs a sequence of graphs, G2 , . . . , Gn , so that Gi is the effective network on which is of course true.
vertices i, . . . , n. It then solves for the entries of v backwards. Given v (i + 1), . . . , v (n) and
i ext (i), we can solve for v (i). If i ext (i) = 0, then v (i) is set to the weighted average of its
neighbors. If not, then we need to take i ext (i) into account here and in the elimination as well. In
the case in which we fix some vertices and let the rest be harmonic, there is no such complication.
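To make the boundary/interior picture above concrete, here is a minimal numerical sketch (mine, not from the text; the graph, the choice of B and I, and the numpy-based linear algebra are all assumptions): it forms the Schur complement L_B = L(B, B) − L(B, I) L(I, I)⁻¹ L(I, B), checks that it is again a Laplacian, and checks that it reproduces the external currents of the full network when the interior is harmonic.

```python
import numpy as np

def laplacian(n, edges):
    # Build the weighted graph Laplacian from a dict {(a, b): weight}.
    L = np.zeros((n, n))
    for (a, b), w in edges.items():
        L[a, a] += w; L[b, b] += w
        L[a, b] -= w; L[b, a] -= w
    return L

# A toy connected graph; B = boundary vertices, I = internal vertices.
n = 6
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (3, 4): 1.0, (4, 5): 2.0,
         (5, 0): 1.0, (1, 4): 0.5}
L = laplacian(n, edges)
B = [0, 2, 3]
I = [1, 4, 5]

# Schur complement onto B: eliminate the internal block.
LB = L[np.ix_(B, B)] - L[np.ix_(B, I)] @ np.linalg.solve(L[np.ix_(I, I)], L[np.ix_(I, B)])

# It is again a Laplacian: symmetric, nonpositive off-diagonals, zero row sums.
assert np.allclose(LB, LB.T)
assert np.all(LB - np.diag(np.diag(LB)) <= 1e-12)
assert np.allclose(LB.sum(axis=1), 0)

# Fix voltages on B, extend harmonically to I, and compare external currents.
vB = np.array([1.0, 0.0, 0.5])
vI = -np.linalg.solve(L[np.ix_(I, I)], L[np.ix_(I, B)] @ vB)   # harmonic interior
v = np.zeros(n); v[B] = vB; v[I] = vI
print(np.allclose((L @ v)[B], LB @ vB))   # True
print(np.allclose((L @ v)[I], 0))         # no external current at internal vertices
```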

12.8 Effective Resistance is a Distance

A distance is any function on pairs of vertices such that

1. δ(a, a) = 0 for every vertex a,

2. δ(a, b) ≥ 0 for all vertices a, b,

3. δ(a, b) = δ(b, a), and

4. δ(a, c) ≤ δ(a, b) + δ(b, c).
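The following is a quick numerical check of these four properties for effective resistance (a sketch of my own, not from the text; the random test graph and the dense pseudoinverse are assumptions made only for illustration). It computes R_eff(a, b) = (δ_a − δ_b)ᵀ L⁺ (δ_a − δ_b) and verifies the triangle inequality over all triples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
# Random connected weighted graph: a cycle plus a few random chords.
edges = {(i, (i + 1) % n): rng.uniform(0.5, 2.0) for i in range(n)}
for _ in range(6):
    a, b = rng.choice(n, size=2, replace=False)
    edges[(min(a, b), max(a, b))] = rng.uniform(0.5, 2.0)

L = np.zeros((n, n))
for (a, b), w in edges.items():
    L[a, a] += w; L[b, b] += w; L[a, b] -= w; L[b, a] -= w
Lpinv = np.linalg.pinv(L)

def reff(a, b):
    d = np.zeros(n); d[a] = 1; d[b] = -1
    return d @ Lpinv @ d

# Triangle inequality Reff(a, c) <= Reff(a, b) + Reff(b, c) for all triples.
ok = all(reff(a, c) <= reff(a, b) + reff(b, c) + 1e-12
         for a in range(n) for b in range(n) for c in range(n))
print(ok)   # True
```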



The second comes from the observation that the determinant is the volume P of the parallelepiped
with axes a 1 , . . . , a n : the polytope whose corners are the origin and i∈S a i for every
S ⊆ {1, . . . , n}. Let
Πa 1
be the symmetric projection orthogonal to a 1 . As this projection amounts to subtracting off a
multiple of a 1 and elementary row operations do not change the determinant,
Chapter 13  
det a 1 , a 2 , . . . , a n = det a 1 , Πa 1 a 2 , . . . , Πa 1 a n .

The volume of this parallelepiped is ∥aa 1 ∥ times the volume of the parallelepiped formed by the
Random Spanning Trees vectors Πa 1 a 2 , . . . , Πa 1 a n . I would like to write this as a determinant, but must first deal with
the fact that these are n − 1 vectors in an n dimensional space. The way we first learn to handle
this is to project them into an n − 1 dimensional space where we can take the determinant.
Instead, we will employ other elementary symmetric functions of the eigenvalues.

13.1 Introduction
13.3 Characteristic Polynomials
In this chapter we present one of the most fundamental results in Spectral Graph Theory: the
Matrix-Tree Theorem. It relates the number of spanning trees of a connected graph to the Recall that the characteristic polynomial of a matrix A is
determinants of principal minors of the Laplacian. We then extend this result to relate the
fraction of spanning trees that contain a given edge to the effective resistance of the entire graph det(xI − A).
between the edge’s endpoints.
I will write this as
n
X
xn−k (−1)k σk (A),
13.2 Determinants k=0

where σk (A) is the kth elementary symmetric function of the eigenvalues of A, counted with
To begin, we review some facts about determinants of matrices and characteristic polynomials. algebraic multiplicity: X Y
We first recall the Leibniz formula for the determinant of a square matrix A: σk (A) = λi .
n
! |S|=k i∈S
X Y
det(A) = sgn(π) A(i, π(i)) , (13.1) Thus, σ1 (A) is the trace and σn (A) is the determinant. From this formula, we know that these
π i=1 functions are invariant under similarity transformations.
where the sum is over all permutations π of {1, . . . , n}. In Exercise 3 from Lecture 2, you were asked to prove that
Also recall that the determinant is multiplicative, so for square matrices A and B X
σk (A) = det(A(S, S)). (13.3)
|S|=k
det(AB) = det(A) det(B). (13.2)
This follows from applying the Leibniz formula (13.1) to det(xI − A).
Elementary row operations do not change the determinant. If the columns of A are the vectors
a 1 , . . . , a n , then for every c If we return to the vectors Πa 1 a 2 , . . . , Πa 1 a n from the previous section, we see that the volume of
  their parallelepiped may be written
det a 1 , a 2 , . . . , a n = det a 1 , a 2 , . . . , a n + caa 1 .

σn−1 0n , Πa 1 a 2 , . . . , Πa 1 a n ,
This fact gives us two ways of computing the determinant. The first comes from the fact that we
can apply elementary row operations to transform A into an upper triangular matrix, and (13.1) as this will be the product of the n − 1 nonzero eigenvalues of this matrix.
tells us that the determinant of an upper triangular matrix is the product of its diagonal entries.


Recall that the matrices BB T and B T B have the same eigenvalues, up to some zero eigenvalues We will prove that for every a ∈ V ,
if they are rectangular. So, Y
det(LG (Sa , Sa )) = we .
σk (BB T ) = σk (B T B).
e∈E

This gives us one other way of computing the absolute value of the product of the nonzero Write LG = U T W U , where U is the signed edge-vertex adjacency matrix and W is the
eigenvalues of the matrix  diagonal matrix of edge weights. Write B = W 1/2 U , so
Πa 1 a 2 , . . . , Πa 1 a n .
LG (Sa , Sa ) = B(:, Sa )T B(:, Sa ),
We can instead compute their square by computing the determinant of the square matrix
and
 
Πa 1 a 2 det(LG (Sa , Sa )) = det(B(:, Sa ))2 ,
 ..  
 .  Πa 1 a 2 , . . . , Πa 1 a n . where we note that B(:, Sa ) is square because a tree has n − 1 edges and so B has n − 1 rows.
Πa 1 a n To see what is going on, first consider the case in which G is a weighted path and a is the first
vertex. Then,
When B is a singular matrix of rank k, σk (B) acts as the determinant of B restricted to its span.    √ 
1 −1 0 · · · 0 − w1 0 ··· 0
Thus, there are situations in which σk is multiplicative. For example, if A and B both have rank 0 1 −1 · · · 0   √w2 −√w2 · · · 
   0 
k and the range of A is orthogonal to the nullspace of B, then U = . ..  , and B(:, S1 ) =  .. .. .
 .. .   . . 
σk (BA) = σk (B)σk (A). (13.4) √
0 0 0 · · · −1 0 0 · · · − wn−1
We will use this identity in the case that A and B are symmetric and have the same nullspace. We see that B(:, S1 ) is a lower-triangular matrix, and thus its determinant is the product of its

diagonal entries, − wi .
To see that the same happens for every tree, renumber the vertices (permute the columns) so that
13.4 The Matrix Tree Theorem a comes first, and that the other vertices are ordered by increasing distance from 1, breaking ties
arbitrarily. This permutations can change the sign of the determinant, but we do not care
We will state a slight variant of the standard Matrix-Tree Theorem. Recall that a spanning tree because we are going to square it. For every vertex c ̸= 1, the tree now has exactly one edge (b, c)
of a graph is a subgraph that is a tree. with b < c. Put such an edge in position c − 1 in the ordering, and let wc indicate its weight.
Now, when we remove the first column to form B(:, S1 ), we produce a lower triangular matrix
Theorem 13.4.1. Let G = (V, E, w) be a connected, weighted graph. Then √
with the entry − wc on the cth diagonal. So, its determinant is the product of these terms and
X Y
σn−1 (LG ) = n we . n
Y
spanning trees T e∈T det(B(:, Sa ))2 = wc .
c=2
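As a sanity check on Theorem 13.4.1, here is a small numerical sketch (mine, not from the text; the test graph and the brute-force comparison are assumptions made for illustration): it computes σ_{n−1}(L) as the sum of the (n−1)×(n−1) principal minors and compares it with n times the number of spanning trees found by enumeration.

```python
import numpy as np
from itertools import combinations

# Small unweighted test graph: K4 with one edge removed.
n = 4
edge_list = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
L = np.zeros((n, n))
for a, b in edge_list:
    L[a, a] += 1; L[b, b] += 1; L[a, b] -= 1; L[b, a] -= 1

# sigma_{n-1}(L) is the sum of all (n-1)x(n-1) principal minors; by the theorem
# it equals n times the (weighted) number of spanning trees.
sigma = sum(np.linalg.det(np.delete(np.delete(L, a, 0), a, 1)) for a in range(n))

# Brute force: a set of n-1 edges spans a tree iff its Laplacian has rank n-1.
def is_spanning_tree(S):
    M = np.zeros((n, n))
    for a, b in S:
        M[a, a] += 1; M[b, b] += 1; M[a, b] -= 1; M[b, a] -= 1
    return np.linalg.matrix_rank(M) == n - 1

count = sum(is_spanning_tree(S) for S in combinations(edge_list, n - 1))
print(int(round(sigma)), n * count)   # both equal 4 * 8 = 32 for this graph
```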

Thus, the eigenvalues allow us to count the sum over spanning trees of the product of the weights
of edges in those trees. When all the edge weights are 1, we just count the number of spanning
trees in G. Proof of Theorem 13.4.1 . As in the previous lemma, let LG = U T W U and B = W 1/2 U . So,
We first prove this in the case that G is just a tree. σn−1 (LG ) = σn−1 (B T B)
Lemma 13.4.2. Let G = (V, E, w) be a weighted tree. Then, = σn−1 (BB T )
X
Y = σn−1 (B(S, :)B(S, :)T ) (by (13.3) )
σn−1 (LG ) = n we .
|S|=n−1,S⊆E
e∈E X
= σn−1 (B(S, :)T B(S, :))
Proof. For a ∈ V , let Sa = V − {a}. We know from (13.3) |S|=n−1,S⊆E
X
X = σn−1 (LGS ),
σn−1 (LG ) = det(LG (Sa , Sa ).
|S|=n−1,S⊆E
a∈V

where by GS we mean the graph containing just the edges in S. As S contains n − 1 edges, this Proof. The matrix Γ is clearly symmetric. To show that it is a projection, it suffices to show that
graph is either disconnected or a tree. If it is disconnected, then its Laplacian has at least two all of its eigenvalues are 0 or 1. This is true because, excluding the zero eigenvalues, Γ has the
zero eigenvalues and σn−1 (LGS ) = 0. If it is a tree, we apply the previous lemma. Thus, the sum same eigenvalues as
equals X X Y L+ T +
G B B = LG LG = Π,
σn−1 (LGT ) = n we . where Π is the projection orthogonal to the all 1 vector. As Π has n − 1 eigenvalues that are 1,
spanning trees T ⊆E spanning trees T e∈T
so does Γ.

As the trace of Γ is n − 1, so is the sum of the leverage scores:


\[ \sum_{e} \ell_e \;=\; n - 1 . \]

13.5 Leverage Scores and Marginal Probabilities

The leverage score of an edge, written ℓe is defined to be we Reff (e). That is, the weight of the This is a good sanity check on Theorem 13.5.1: every spanning tree has n − 1 edges, and thus the
edge times the effective resistance between its endpoints. The leverage score serves as a measure probabilities that each edge is in the tree must sum to n − 1.
of how important the edge is. For example, if removing an edge disconnects the graph, then
Reff (e) = 1/we , as all current flowing between its endpoints must use the edge itself, and ℓe = 1. We also obtain another formula for the leverage score. As a symmetric projection is its own
square,
Consider sampling a random spanning tree with probability proportional to the product of the Γ(e, e) = Γ(e, :)Γ(e, :)T = ∥Γ(e, :)∥2 .
weights of its edges. We will now show that the probability that edge e appears in the tree is
This is the formula I introduced in Section ??. If we flow 1 unit from a to b, the potential
exactly its leverage score.
difference between c and d is (δ a − δ b )T L+
G (δ c − δ d ). If we plug these potentials into the
Theorem 13.5.1. If we choose a spanning tree T with probability proportional to the product of Laplacian quadratic form, we obtain the effective resistance. Thus this formula says
its edge weights, then for every edge e X 2
wa,b Reff a,b = wa,b wc,d (δ a − δ b )T L+
G (δ c − δ d ) .
Pr [e ∈ T ] = ℓe . (c,d)∈E

Proof of Theorem 13.5.1. Let Span(G) denote the set of spanning trees of G. For an edge e,
For simplicity, you might want to begin by thinking about the case where all edges have weight 1.
X σn−1 (LGT )
Recall that the effective resistance of edge e = (a, b) is PrT [e ∈ T ] =
σn−1 (LG )
T ∈Span(G):e∈T
(δa − δb )T L+ X
G (δa − δb ), = σn−1 (LGT )σn−1 (L+
G)
and so T ∈Span(G):e∈T
X
ℓa,b = wa,b (δa − δb )T L+
G (δa − δb ). = σn−1 (LGT L+
G ),
T ∈Span(G):e∈T
We can write a matrix Γ that has all these terms on its diagonal by letting U be the edge-vertex
adjacency matrix, W be the diagonal edge weight matrix, B = W 1/2 U , and setting by (13.4). Recalling that the subsets of n − 1 edges that are not spanning trees contribute 0
allows us to re-write this sum as
Γ = BL+ T
GB . X
σn−1 (LGS L+
G ).
The rows and columns of Γ are indexed by edges, and for each edge e, |S|=n−1,e∈S

Γ(e, e) = ℓe . To evaluate the terms in the sum, we compute


σn−1 (LGS L+ T +
G ) = σn−1 (B(:, S)B(:, S) LG )
For off-diagonal entries corresponding to edges (a, b) and (c, d), we have
= σn−1 (B(:, S)T L+
G B(:, S))
√ √
Γ((a, b), (c, d)) = wa,b wc,d (δ a − δ b )T L+
G (δ c − δ d ). = σn−1 (Γ(S, S))
Claim 13.5.2. The matrix Γ is a symmetric projection matrix and has trace n − 1. = σn−1 (Γ(S, :)Γ(:, S)).

Let γ e = Γ(e, :) and let Πγ e denote the projection orthogonal to γ e . As e ∈ S, we have

σn−1 (Γ(S, :)Γ(:, S)) = ∥γ e ∥2 σn−2 (Γ(S, :)Πγ e Γ(:, S)) = ∥γ e ∥2 σn−2 ((ΓΠγ e Γ)(S, S)).

As γ e is in the span on Γ, the matrix ΓΠγ e Γ is a symmetric projection onto an n − 2 dimensional


space, and so
σn−2 (ΓΠγ e Γ) = 1. Chapter 14
To exploit this identity, we return to our summation:
X X
σn−1 (LGS L+G) = ∥γ e ∥2 σn−2 ((ΓΠγ e Γ)(S, S))
|S|=n−1,e∈S |S|=n−1,e∈S
X
Approximating Effective Resistances
= ∥γ e ∥2 σn−2 ((ΓΠγ e Γ)(S, S))
|S|=n−1,e∈S

= ∥γ e ∥2 σn−2 (ΓΠγ e Γ)
= ∥γ e ∥2 In this chapter, we will see how to use the Johnson-Lindenstrauss Lemma, one of the major
techniques for dimension reduction, to approximately represent and compute effective resistances.
= ℓe .
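The matrix Γ and the leverage scores are easy to examine numerically. Here is a minimal sketch (my own, not from the text; the toy graph and variable names are assumptions): it builds Γ = W^{1/2} U L⁺ Uᵀ W^{1/2}, checks that it is a symmetric projection of trace n − 1, and checks that its diagonal entries are the leverage scores w_e R_eff(e).

```python
import numpy as np

# Toy weighted graph.
n = 5
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 4, 1.0), (4, 0, 0.5), (1, 3, 1.5)]
m = len(edges)

U = np.zeros((m, n))            # signed edge-vertex incidence matrix
W = np.diag([w for _, _, w in edges])
for i, (a, b, w) in enumerate(edges):
    U[i, a], U[i, b] = 1.0, -1.0

L = U.T @ W @ U
Lpinv = np.linalg.pinv(L)
B = np.sqrt(W) @ U
Gamma = B @ Lpinv @ B.T

# Gamma is a symmetric projection with trace n - 1 ...
evals = np.linalg.eigvalsh(Gamma)
print(np.allclose(np.sort(evals), [0] * (m - n + 1) + [1] * (n - 1)))  # True
print(np.isclose(np.trace(Gamma), n - 1))                              # True

# ... and its diagonal entries are the leverage scores  ell_e = w_e * Reff(e).
for i, (a, b, w) in enumerate(edges):
    d = np.zeros(n); d[a] = 1; d[b] = -1
    assert np.isclose(Gamma[i, i], w * d @ Lpinv @ d)
```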
Throughout this chapter, G = (V, E, w) will be a connected, weighted graph with n vertices and
m edges.

14.1 Representing Effective Resistances

We begin by considering the problem of building a data structure from which one can quickly
estimate the effective resistance between every pair of vertices a, b ∈ V . To do this, we exploit the
fact from Section 12.1 that effective resistances can be expressed as squares of Euclidean distances:

Reff (a, b) = (δ a − δ b )T L+ (δ a − δ b )
2
= L+/2 (δ a − δ b )
2
= L+/2 δ a − L+/2 δ b
= dist(L+/2 δ a , L+/2 δ b )2 .

One other way of expressing the above terms is through a matrix norm . For a positive
semidefinite matrix A, the matrix norm in A is defined by

∥x ∥A = x T Ax = A1/2 x .

It is worth observing that this is in fact a norm: it is zero when x is zero, it is symmetric, and it
obeys the triangle inequality: for x + y = z ,

∥z ∥A = A1/2 z = A1/2 (x + y ) ≤ A1/2 x + A1/2 y = ∥x ∥A + ∥y ∥A .


The Johnson-Lindenstrauss Lemma [JL84] tells us that every Euclidean metric on n points is the ath column. This leads us to ask how quickly we can multiply a vector by L+/2 . Cheng,
well-approximated by a Euclidean metric in O(log n) dimensions, regardless of the original Cheng, Liu, Peng and Teng [CCL+ 15] show that this can be done in nearly-linear time. In this
dimension of the points. Johnson and Lindenstrauss proved this by applying a random orthogonal section, we will present a more elementary approach that merely requires solving systems of
projection to the points. As is now common, we will analyze the simpler operation of applying a equations in Laplacian matrices. We will see in Chapter ?? that this can be done very quickly.
random matrix of Gaussian random variables (also known as Normal variables). All Gaussian
The key is to realize that we do not actually need to multiply by the square root of the
random variables that appear in this chapter will have mean 0.
pseudoinverse of the Laplacian. Any matrix M such that M T M = L+ will suffice.
We recall that a Gaussian random variable of variance 1 has probability density
Recall that we can write L = U T W U , where U is the signed edge-vertex adjacency matrix and
1 W is the diagonal matrix of edge weights. We then have
p(x) = √ exp(−x2 /2),
2π L+ U T W 1/2 W 1/2 U L+ = L+ LL+ = L+ .
and that a Gaussian random variable of variance σ 2 has probability density So,
2
1 W 1/2 U L+ (δ a − δ b ) = Reff (a, b).
p(x) = √ exp(−x2 /2σ 2 ).
2πσ
Now, we let R be a d-by-m matrix of independent N (0, 1/d) entries, and compute
The distribution of such a variable is written N (0, σ 2 ), where the 0 corresponds to the mean
RW 1/2 U L+ = (RW 1/2 U )L+ .
being 0. A variable with distribution N (0, σ 2 ) may be obtained by sampling one with distribution
N (0, 1), and then multiplying it by σ. Gaussian random variables have many special properties, This requires multiplying d vectors in IRm by W 1/2 U , and solving d systems of linear equations
some of which we will see in this chapter. For those who are not familiar with them, we begin by in L. We then set
mentioning that they are the limit of a binomial distribution. If X is the sum of n ±1 random y a = (RW 1/2 U )L+ δ a .
variables for large n, then  √  Each of these is a vector in d dimensions, and with high probability ∥y a − y b ∥2 is a good
Pr X/ n = t → p(t). approximation of Reff (a, b).
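Here is the pipeline just described in a few lines of numpy (a sketch of my own, not the text's code; the random test graph is an assumption, and the dense pseudoinverse merely stands in for the fast Laplacian solvers discussed later): it forms y_a = (R W^{1/2} U) L⁺ δ_a and compares ∥y_a − y_b∥² to the exact effective resistance.

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 40, 0.2
# Random connected weighted graph: a cycle plus random chords.
edges = [(i, (i + 1) % n, 1.0) for i in range(n)]
edges += [(rng.integers(n), rng.integers(n), 1.0) for _ in range(60)]
edges = [(a, b, w) for a, b, w in edges if a != b]
m = len(edges)

U = np.zeros((m, n)); W = np.diag([w for *_, w in edges])
for i, (a, b, w) in enumerate(edges):
    U[i, a], U[i, b] = 1.0, -1.0
L = U.T @ W @ U
Lpinv = np.linalg.pinv(L)          # stand-in for a fast Laplacian solver

d = int(8 * np.log(n**2 / 0.1) / eps**2)
R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, m))   # independent N(0, 1/d) entries
Y = (R @ np.sqrt(W) @ U) @ Lpinv                     # column a is y_a

a, b = 0, n // 2
delta = np.zeros(n); delta[a] = 1; delta[b] = -1
true_reff = delta @ Lpinv @ delta
approx = np.sum((Y[:, a] - Y[:, b]) ** 2)
print(true_reff, approx)           # approx is within roughly 1 +/- eps of the truth
```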
Theorem 14.1.1. Let x_1, . . . , x_n be vectors in IR^k. For any ϵ, δ > 0, let d = 8 ln(n²/δ)/ϵ². If R
is a d-by-k matrix of independent N (0, 1/d) variables, then with probability at least 1 − δ, for all
a ̸= b, 14.3 Properties of Gaussian random variables
(1 − ϵ)dist(x a , x b )2 ≤ dist(Rx a , Rx b )2 ≤ (1 + ϵ)dist(x a , x b )2 .
The sum of independent Gaussian random variables is another Gaussian random variable.
Thus, if we set d = 8 ln(n²/δ)/ϵ², let R be a d-by-n matrix of independent N (0, 1/d) variables, Claim 14.3.1. If r_1, . . . , r_n are independent Gaussian random variables of variances σ_1², . . . , σ_n²,
and set y a = RL+/2 δ a for each a ∈ V , then with probability at least 1 − δ we will have that for respectively, then
every a and b, Reff (a, b) is within
a 1 ± ϵ factor of dist(y_a, y_b)². Whereas writing all effective resistances would require n² numbers, storing y_1, . . . , y_n only requires nd numbers.

We remark that the 8 in the theorem can be replaced with a constant that tends towards 4 as ϵ goes to zero.

\[ \sum_{i=1}^{n} r_i \]

is a Gaussian random variable of variance

\[ \sum_{i=1}^{n} \sigma_i^2 . \]
14.2 Computing Effective Resistances
One way to remember this is to recall that for a N (0, σ 2 ) random variable r, Er2 = σ 2 , and the
Note that the naive way of computing one effective resistance requires solving one Laplacian variance of the sum of independent random variables is the sum of their variances. The above
system: (δ a − δ b )T L+ (δ a − δ b ). We will see that we can approximate all of them by solving a claim adds the fact that the sum is also Gaussian.
logarithmic number of such systems. In particular, if x is an arbitrary vector and r is a vector of independent N (0, 1) random
If we could quickly multiply a vector by L+/2 , then this would give us a fast way of approximately variables, then x T r is a Gaussian random variable of variance ∥x ∥2 . This follows because
computing all effective resistances. All we would need to do is multiply each of the d rows of R by x (i)r (i) has variance x (i)2 , and X
L+/2 . This would provide the matrix RL+/2 , from which we could compute RL+/2 δ a by selecting xTr = x (i)r (i).

If x ∈ IRk and R is a matrix of independent N (0, σ 2 ) variables, then each entry of Rx is an Thus the choice of d = 8(ln(n2 /δ)/ϵ2 makes this probability at most
independent N (0, σ 2 ∥x∥2 ) random variable. They are independent because each entry comes
from a separate row of R, and the variables in different rows are independent from each other. 2δ
2 exp(−ϵ2 d/8) ≤ 2 exp(− ln(n2 /δ)) = .
n2
The norm of a vector of identical independent N (0, 1) random variables is called a χ random 
n
variable, and its square is a χ2 random variable. A lot is known about the distribution of χ2 As there 2 possible choices for a and b, the probability that there is one such that
random variables. If the vector has dimension d, then its expectation is d. It is very unlikely to
deviate too much from this. ∥R(x a − x b )∥2 ̸∈ (1 ± ϵ) ∥x a − x b ∥2

For example, the following bound appears as Lemma 1 of [LM00]. is at most  


P n 2δ
2 < δ.
Lemma 14.3.2. Let r1 , . . . , rd be independent N (0, 1) random variables and let X = i ri . 2 n2
Then, for all t > 0,
h √ i
Pr X ≥ d + 2 dt + 2t ≤ exp(−t), and
h √ i
Pr X ≤ d − 2 dt ≤ exp(−t).

We use the following corollary.

Corollary 14.3.3. For ϵ < 1,

Pr [|X − d| ≥ ϵd] ≤ 2 exp(−ϵ2 d/8).

Proof. Set t = ϵ2 d/8. This gives


√ ϵd ϵ2 d ϵd ϵd
2 dt + 2t ≤ 2 √ + ≤ 2√ + < ϵd.
8 4 8 4

Finally, the probability that X − d > ϵd or X − d < −ϵd is at most the sum of these probabilities,
which is at most 2 exp(−t).

We remark that for small ϵ the term 2ϵd/√8 dominates, and the upper bound of ϵd approaches ϵd/√2. If one pushes this into the proof below, we see that it suffices to project into a space of dimension of just a little more than 4 ln(n²/δ)/ϵ², instead of 8 ln(n²/δ)/ϵ².
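Corollary 14.3.3 is also easy to see empirically. The following tiny Monte Carlo sketch (my own; the parameters are arbitrary) estimates the deviation probability of a χ² variable with d degrees of freedom and compares it with the bound 2 exp(−ϵ²d/8).

```python
import numpy as np

rng = np.random.default_rng(2)
d, eps, trials = 200, 0.3, 100000
X = (rng.normal(size=(trials, d)) ** 2).sum(axis=1)   # chi-squared with d dof
empirical = np.mean(np.abs(X - d) >= eps * d)
bound = 2 * np.exp(-eps**2 * d / 8)
print(empirical, bound)   # the empirical failure probability lies well below the bound
```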

14.4 Proof of Johnson-Lindenstrauss

Proof of Theorem 14.1.1. First consider an arbitrary a and b, and let ∆ = ∥x a − x b ∥2 . Each
entry of R(x a − x b ) is a d-dimensional vector of N (0, σ 2 ) variables, where σ 2 = ∆/d. Thus,
Corollary 14.3.3 tells us that
 
Pr dist(Rx a , Rx b )2 − dist(x a , x b )2 > ϵdist(x a , x b )2 =
h i
Pr ∥R(x a − x b )∥2 − ∆ ≥ ϵ∆ ≤ 2 exp(−ϵ2 d/8).

A planar drawing of a graph G = (V, E) consists of mapping from the vertices to the plane,
z : V → IR2 , along with interior-disjoint curves for each edge. The curve for edge (a, b) starts at
z (a), ends at z (b), never crosses itself, and its interior does not intersect the curve for any other
edge. A graph is planar if it has a planar drawing. There can, of course, be many planar drawings
of a graph.

Chapter 15 If one removes the curves corresponding to the edges in a planar drawing, one divides the plane
into connected regions called faces. In a 3-connected planar graph, the sets of vertices and edges
that border each face are the same in every planar drawing. There are planar graphs that are not
3-connected, like those in Figures 15.1 and 15.2, in which different planar drawings result in

Tutte’s Theorem: How to draw a combinatorially different faces. We will only consider 3-connected planar graphs.

graph

We prove Tutte’s theorem [Tut63], which shows how to use spring embeddings to obtain planar
drawings of 3-connected planar graphs. One begins by selecting a face, and then nailing down the
positions of its vertices to the corners of a strictly convex polygon. Of course, the edges of the
face should line up with the edges of the polygon. Every other vertex goes where the springs say Figure 15.1: Planar graphs that are merely one-connected. Edge (c, d) appears twice on a face in
they should—to the center of gravity of their neighbors. Tutte proved that the result is a planar each of them.
embedding of the planar graph. Here is an image of such an embedding
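Computing such an embedding is itself just a Laplacian solve. Here is a minimal sketch (mine, not from the text; the cube graph, the choice of outer square, and the numpy calls are assumptions): nail the outer face to a convex polygon and place every other vertex at the average of its neighbors by solving one linear system per coordinate.

```python
import numpy as np

# The 3-cube graph: 3-connected and planar; outer face fixed to a unit square.
n = 8
edges = [(0, 1), (1, 2), (2, 3), (3, 0),          # outer face
         (4, 5), (5, 6), (6, 7), (7, 4),          # inner face
         (0, 4), (1, 5), (2, 6), (3, 7)]
L = np.zeros((n, n))
for a, b in edges:
    L[a, a] += 1; L[b, b] += 1; L[a, b] -= 1; L[b, a] -= 1

F = [0, 1, 2, 3]                                   # outer face vertices
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
inner = [v for v in range(n) if v not in F]

# Each free vertex sits at the average of its neighbors:
#   L(inner, inner) z(inner) = -L(inner, F) z(F),   one system per coordinate.
z = np.zeros((n, 2))
z[F] = corners
z[inner] = np.linalg.solve(L[np.ix_(inner, inner)], -L[np.ix_(inner, F)] @ corners)
print(z)   # the inner 4-cycle lands strictly inside the outer square
```

Allowing different (positive) spring constants changes the linear system but not the structure of the computation.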

Figure 15.2: Two different planar drawings of a planar graph that is merely two-connected. Vertices
g and h have switched positions, and thus appear in different faces in each drawing.

The presentation in this lecture is based on notes given to me by Jim Geelen. I begin by
recalling some standard results about planar graphs that we will assume. We state a few properties of 3-connected planar graphs that we will use. We will not prove these
properties, as we are more concerned with algebra and these properly belong in a class on
combinatorial graph theory.
15.1 3-Connected, Planar Graphs Claim 15.1.1. Let G = (V, E) be a 3-connected planar graph. Then, there exists a set of faces F ,
each of which corresponds to a cycle in G, so that no vertex appears twice in a face, no edge
A graph G = (V, E) is k-connected if there is no set of k − 1 vertices whose removal disconnects appears twice in a face, and every edge appears in exactly two faces.
the graph. That is, for every S ⊂ V with |S| < k, G(V − S) is connected. In a classical graph
theory course, one usually spends a lot of time studying things like 3-connectivity. We call the face on the outside of the drawing the outside face. The edges that lie along the


outside face are the boundary edges. We will use one other important fact about planar graphs, whose utility in this context was
observed by Jim Geelen.
Lemma 15.1.3. Let (a, b) be an edge of a 3-connected planar graph and let S1 and S2 be the sets
of vertices on the two faces containing (a, b). Let P be a path in G that starts at a vertex of
S1 − {a, b}, ends at a vertex of S2 − {a, b}, and that does not intersect a or b. Then, every path in
G from a to b either intersects a vertex of P or the edge (a, b).

Proof. Let s1 and s2 be the vertices at the ends of the path P . Consider a planar drawing of G
and the closed curve in the plane that follows the path P from s1 to s2 , and then connects s1 to
s2 by moving inside the faces S1 and S2 , where the path only intersects the curve for edge (a, b).
Figure 15.3: 3-connected planar graphs. Some faces of the graph on the left are abf , f gh, and This curve separates vertex a from vertex b. Thus, every path in G that connects a to b must
af he. The outer face is abcde. The graph on the right is obtained by contracting edge (g, h). intersect this curve. This means that it must either consist of just edge (a, b), or it must intersect
a vertex of P . See Figure 15.1.

Another standard fact about planar graphs is that they remain planar under edge contractions.
Contracting an edge (a, b) creates a new graph in which a and b become the same vertex, and all
edges that went from other vertices to a or b now go to the new vertex. Contractions also preserve
3-connectivity. Figure 15.1 depicts a 3-connected planar graph and the result of contracting an
edge.
A graph H = (W, F ) is a minor of a graph G = (V, E) if H can be obtained from G by
contracting some edges and possibly deleting other edges and vertices. This means that each
vertex in W corresponds to a connected subset of vertices in G, and that there is an edge between
two vertices in W precisely when there is some edge between the two corresponding subsets. This
leads to Kuratowski’s Theorem [Kur30], one of the most useful characterizations of planar graphs.

Theorem 15.1.2. A graph G is planar if and only if it does not have a minor isomorphic to the
complete graph on 5 vertices, K5 , or the bipartite complete graph between two sets of 3 vertices, Figure 15.5: A depiction of Lemma 15.1.3. S1 = abcde, S2 = abf , and the path P starts at d, ends
K3,3 . at f , and contains the other unlabeled vertices.

15.2 Strictly Convex Polygons

This is a good time to remind you what exactly a convex polygon is. A subset C ⊆ IR2 is convex
if for every two points x and y in C, the line segment between x and y is also in C. A convex
polygon is a convex region of IR2 whose boundary is comprised of a finite number of straight lines.
It is strictly convex if in addition the angle at every corner is less than π. We will always assume
that the corners of a strictly convex polygon are distinct. Two corners form an edge of the
polygon if the interior of the polygon is entirely on one side of the line through those corners.
This leads to another definition of a strictly convex polygon: a convex polygon is strictly convex if
for every edge, all of the corners of the polygon other than those two defining the edge lie entirely
Figure 15.4: The Petersen graph appears on the left. On the right is a minor of the Petersen graph on one side of the polygon. In particular, none of the other corners lie on the line.
that is isomorphic to K5 , proving that the Petersen graph is not planar.
Definition 15.2.1. Let G = (V, E) be a 3-connected planar graph. We say that z : V → IR2 is a
Tutte embedding if

Claim 15.3.2. All vertices not in F must lie strictly inside the convex hull of the polygon of
which the vertices in F are the corners.

Proof. For every vertex a not in F , we can show that the position of a is a weighted average of
the positions of vertices in F by eliminating every vertex not in F ∪ {a}. As we learned in Lecture
13, this results in a graph in which all the neighbors of a are in F , and thus the position of a is
(a) A polygon (b) A convex polygon (c) A strictly convex some weighted average of the position of the vertices in F . As the graph is 3-connected, we can
polygon
show that this average must assign nonzero weights to at least 3 of the vertices in F .
Figure 15.6: Polygons
Note that it is also possible to prove Claim 15.3.2 by showing that one could reduce the potential
energy by moving vertices inside the polygon. See Claim 8.8.1 from my lecture notes from 2015.
a. There is a face F of G such that z maps the vertices of F to the corners of a strictly convex Lemma 15.3.3. Let H be a halfspace in IR2 (that is, everything on one side of some line). Then
polygon so that every edge of the face joins consecutive corners of the polygon; the subgraph of G induced on the vertices a such that z (a) ∈ H is connected.
b. Every vertex not in F lies at the center of gravity of its neighbors.
Proof. Let t be a vector so that we can write the line ℓ in the form t T x = µ, with the halfspace
We will prove Tutte’s theorem by proving that every face of G is embedded as a strictly convex consisting of those points x for which t T x ≥ µ. Let a be a vertex such that z (a) ∈ H and let b be
polygon. In fact, we will not use the fact that every non-boundary vertex is exactly the average of a vertex that maximizes t T z (b). So, z (b) is as far from the line defining the halfspace as possible.
its neighbors. We will only use the fact that every non-boundary vertex is inside the convex hull By Claim 15.3.2, b must be on the outside face, F .
of its neighbors. This corresponds to allowing arbitrary spring constants in the embedding.
For every vertex c, define t(c) = t T z (c). We will see that there is a path in G from a to b along
Theorem 15.2.2. Let G = (V, E) be a 3-connected planar graph, and let z be a Tutte embedding which the function t never decreases, and thus all the vertices along the path lie in the halfspace.
of G. If we represent every edge of G as the straight line between the embedding of its endpoints, We first consider the case in which t(a) = t(b). In this case, we also know that a ∈ F . As the
then we obtain a planar drawing of G. vertices in F embed to a strictly convex polygon, this implies that (a, b) is an edge of that
polygon, and thus the path from a to b.
Note that if the graph were not 3-connected, then the embedding could be rather degenerate. If
If t(a) < t(b), it suffices to show that there is a path from a to some other vertex c for which
there are two vertices a and b whose removal disconnects the graph into two components, then all
t(c) > t(a) and along which t never decreases: we can then proceed from c to obtain a path to b.
of the vertices in one of those components will embed on the line segment from a to b.
Let U be the set of all vertices u reachable from a for which t(u) = t(a). As the graph is
Henceforth, G will always be a 3-connected planar graph and z will always be a Tutte embedding. connected, there must be a vertex u ∈ U that has a neighbor c ̸∈ U . By Claim 15.3.1 u must have
a neighbor c for which t(c) > t(u). Thus, the a path from a through U to c suffices.

Lemma 15.3.4. No vertex is colinear with all of its neighbors.


15.3 Possible Degeneracies
Proof. This is trivially true for vertices in F , as no three of them are colinear.
The proof of Theorem 15.2.2 will be easy once we rule out certain degeneracies. There are two
types of degeneracies that we must show can not happen. The most obvious is that we can not Assume by way of contradiction that there is a vertex a that is colinear with all of its neighbors.
have z (a) = z (b) for any edge (a, b). The fact that this degeneracy can not happen will be a Let ℓ be that line, and let S + and S − be all the vertices that lie above and below the line,
consequence of Lemma 15.4.1. respectively. Lemma 15.3.3 tells us that both sets S + and S − are connected. Let U be the set of
vertices u reachable from a and such that all of u's neighbors lie on ℓ. The vertex a is in U . Let W
The other type of degeneracy is when there is a vertex a such that all of its neighbors lie on one be the set of nodes that lie on ℓ that are neighbors of vertices in U , but which themselves are not
line in IR2 . We will rule out such degeneracies in this section. in U . As vertices in W are not in U , Claim 15.3.1 implies that each vertex in W has neighbors in
We first observe two simple consequences of the fact that every vertex must lie at the average of both S + and S − . As the graph is 3-connected, and removing the vertices in W would disconnect
its neighbors. U from the rest of the graph, there are at least 3 vertices in W . Let w1 , w2 and w3 be three of the
vertices in W .
Claim 15.3.1. Let a be a vertex and let ℓ be any line in IR2 through z (a). If a has a neighbor
that lies on one side of ℓ, then it has a neighbor that lies on the other.

We will now obtain a contradiction by showing that G has a minor isomorphic to K3,3 . The three
vertices on one side are w1 , w2 , and w3 . The other three are obtained by contracting the vertex
sets S + , S − , and U .

Figure 15.8: An illustration of the proof of Lemma 15.4.1.

that edge. This is all we need to know to prove Tutte’s Theorem. We finish the argument in the
Figure 15.7: An illustration of the proof of Lemma 15.3.4. proof below.

Proof of Theorem 15.2.2. We say that a point of the plane is generic if it does not lie on any z (a)
or on any segment of the plane corresponding to an edge (a, b). We first prove that every generic
15.4 All faces are convex point lies in exactly one face of G.
Begin with a point that is outside the polygon on which F is drawn. Such a point lies only in the
We now prove that every face of G embeds as a strictly convex polygon. outside face. For any other generic point we can draw a curve between these points that never
intersects a z (a) and never crosses the intersection of the drawings of edges. That is, it only
Lemma 15.4.1. Let (a, b) be any non-boundary edge of the graph, and let ℓ be a line through crosses drawings of edges in their interiors. By Lemma 15.4.1, when the curve does cross such an
z (a) and z (b) (there is probably just one). Let F0 and F1 be the faces that border edge (a, b) and edge it moves from one face to another. So, at no point does it ever appear in two faces.
let S0 and S1 be the vertices on those faces, other than a and b. Then all the vertices of S0 and S1
lie on opposite sides of ℓ, and none lie on ℓ. Now, assume by way of contradiction that the drawings of two edges cross. There must be some
generic point near their intersection that lies in at least two faces. This would be a
contradiction.
Note: if z (a) = z (b), then we can find a line passing through them and one of the vertices of S0 .
This leads to a contradiction, and thus rules out this type of degeneracy.

15.5 Notes
Proof. Assume by way of contradiction that the lemma is false. Without loss of generality, we
may then assume that there are vertices of both S0 and S1 on or below the line ℓ. Let s0 and s1
be such vertices. By Lemma 15.3.4 and Claim 15.3.1, we know that both s0 and s1 have This is the simplest proof of Tutte’s theorem that I have seen. Over the years, I have taught
neighbors that lie strictly below the line ℓ. By Lemma 15.3.3, we know that there is a path P many versions of Tutte’s proof by building on expositions by Lovász [LV99] and Geelen [Gee12],
that connects s0 and s1 on which all vertices other than s0 and s1 lie strictly below ℓ. and an alternative proof of Gortler, Gotsman and Thurston [GGT06].

On the other hand, we can similarly show that both a and b have neighbors above the line ℓ,
and that they are joined by a path that lies strictly above ℓ. Thus, this path cannot consist of the
edge (a, b) and must be disjoint from P . This contradicts Lemma 15.1.3.

So, we now know that the embedding z contains no degeneracies, that every face is embedded as
a strictly convex polygon, and that the two faces bordering each edge embed on opposites sides of

I remark that this theorem has a very clean extension to irregular, weighted graphs. I just present
this version to simplify the exposition.
We can use this theorem to bound the rate of convergence of random walks in a graph. Let p t be
the probability distribution of the walk after t steps, and plot the curves p t {x}. The theorem tells
us that these curves lie beneath each other, and that each curve lies beneath a number of chords
Chapter 16 drawn across the previous. The walk is uniformly mixed when the curve reaches a straight line
from (0, 0) to (n, 1). This theorem tells us how quickly the walks approach the straight line.
Today, we will use the theorem to prove a variant of Cheeger’s inequality.

The Lovàsz - Simonovits Approach to 16.2 Definitions and Elementary Observations


Random Walks
We believe that larger conductance should imply faster mixing. In the case of Theorem 16.1.1, it
should imply lower curves. This is because wider chords lie beneath narrower ones.
Claim 16.2.1. Let h(x) be a convex function, and let z > y > 0. Then,
This Chapter Needs Editing 1 1
(h(x − z) + h(x + z)) ≤ (h(x − y) + h(x + y)) .
2 2
16.1 Introduction Claim 16.2.2. Let f be a vector, let k ∈ [0, n], and let α1 , . . . , αn be numbers between 0 and 1
such that X
αi = k.
These notes are still very rough, and will be finished later. i

For a vector f and an integer k, we define f {k} to be the sum of the largest k entries of f . For Then, X
convenience, we define f {0} = 0. Symbolically, you can define this by setting π to be a αi f (i) ≤ f {k}.
permutation for which i
f (π(1)) ≥ f (π(2)) ≥ ... ≥ f (π(n)),
This should be obvious, and most of you proved something like this when solving problem 2 on
and then setting homework 1. It is true because the way one would maximize this sum is by setting x to 1 for the
k
X largest values.
f {k} = f (π(i)).
i=1 Throughout this lecture, we will only consider lazy random walks on regular graphs. For a set S
For real number x between 0 and n, we define f {x} by making it be piece-wise linear between and a vertex a, we define γ(a, S) to be the probability that a walk that is at vertex a moves to S
consecutive integers. This means that for x between integers k and k + 1, the slope of f {} at x is in one step. If a is not in S, this equals one half the fraction of edges from a to S. It is one half
f (π(k + 1)). As these slopes are monotone nonincreasing, the function f {x} is concave. because there is a one half probability that the walk stays at a. Similarly, if a is in S, then γ(a, S)
equals one half plus one half the fraction of edges of a that end in S.
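The curve f{·} and the effect of one lazy-walk step are easy to compute directly. Here is a minimal sketch (my own, not from the text; the cycle graph and the numpy implementation are assumptions) that builds f{k} as the sum of the k largest entries and checks the domination g{k} ≤ f{k} for g = W f, in the spirit of Lemma 16.3.1 below.

```python
import numpy as np

n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1          # the cycle C_8 (2-regular)
W = 0.5 * np.eye(n) + 0.5 * A / 2                      # lazy random walk matrix

def curve(f, k):
    # f{k}: sum of the k largest entries of f (integer k); f{0} = 0.
    return np.sort(f)[::-1][:k].sum()

f = np.zeros(n); f[0] = 1.0                            # walk started at vertex 0
g = W @ f
print(all(curve(g, k) <= curve(f, k) + 1e-12 for k in range(n + 1)))   # True
```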
We will prove the following theorem of Lovàsz and Simonovits [LS90] on the behavior of W f .

Theorem 16.1.1. Let W be the transition matrix of the lazy random walk on a d-regular graph
with conductance at least ϕ. Let g = W f . Then for all integers 0 ≤ k ≤ n 16.3 Warm up
1
g {k} ≤ (f {k − ϕh} + f {k + ϕh}) , We warm up by proving that the curves must lie under each other.
2
For a vector f and a set S, we define
where h = min(k, n − k). X
f (S) = f (a).
a∈S


For every k there is at least one set S for which Proof. To ease notation, define γ(a) = γ(a, S). We prove the theorem by rearranging the formula
X
f (S) = f {k}. g (S) = γ(a)f (a).
a∈V
If the values of f are distinct, then the set S is unique.
P
Lemma 16.3.1. Let f be a vector and let g = W f . Then for every x ∈ [0, n], Recall that a∈V γ(a) = k.

g {x} ≤ f {x}. For every vertex a define


( (
γ(a) − 1/2 if a ∈ S 1/2 if a ∈ S
Proof. As the function g {x} is piecewise linear between integers, it suffices to prove it at integers α(a) = and β(a) =
k. Let k be an integer and let S be a set of size k for which f (S) = f {k}. As g = W f , 0 if a ∈
̸ S γ(a) if a ∈
̸ S.
X
g (S) = γ(a, S)f (a). As α(a) + β(a) = γ(a), X X
a∈V g (S) = α(a)f (a) + β(a)f (a).
a∈V a∈V
As the graph is regular, X
γ(a, S) = k. We now come to the point in the argument where we exploit the laziness of the random walk,
a∈V
which manifests as the fact that γ(a) ≥ 1/2 for a ∈ S, and so 0 ≤ α(a) ≤ 1/2 for all a. Similarly,
0 ≤ β(a) ≤ 1/2 for all a. So, we can write
Thus, Claim 16.2.2 implies X
γ(a, S)f (a) ≤ f {k}. X 1X X 1X
a∈V
α(a)f (a) = (2α(a))f (a), and β(a)f (a) = (2β(a))f (a)
2 2
a∈V a∈V a∈V a∈V

with all coefficients 2α(a) and 2β(a) between 0 and 1. As


X k X
16.4 The proof β(a) = + γ(a),
2
a∈V a̸∈S
Recall that the conductance of a subset of vertices S in a d-regular graph is defined to be
we can set X
def |∂(S)| z= γ(a)
ϕ(S) = .
d min(|S| , n − |S|) a̸∈S

and write X X
Our proof of the main theorem improves the previous argument by exploiting the conductance (2α(a)) = k − 2z and (2β(a)) = k + 2z.
through the following lemma. a∈V a∈V

Lemma 16.4.1. Let S be any set of k vertices. Then Lemma 16.4.1 implies that
X z ≥ ϕh/2.
γ(a, S) = (ϕ(S)/2) min(k, n − k).
a̸∈S By Claim 16.2.2,
1
g (S) ≤ (f {k − z} + f {k + z}) .
Proof. For a ̸∈ S, γ(a, S) equals half the fraction of the edges from a that land in S. And, the 2
number of edges leaving S equals dϕ(S) min(k, n − k). So, Claim 16.2.1 implies
1
g (S) ≤ (f {k − ϕh} + f {k + ϕh}) .
Lemma 16.4.2. Let W be the transition matrix of the lazy random walk on a d-regular graph, 2
and let g = W f . For every set S of size k with conductance at least ϕ,
1
g (S) ≤ (f {k − ϕh} + f {k + ϕh}) , Theorem 16.1.1 follows by applying Lemma 16.4.2 to sets S for which f (S) = f {k}, for each
2
integer k between 0 and n.
where h = min(k, n − k).

16.5 Andersen’s proof of Cheeger’s inequality

Reid Andersen observed that the technique of Lovàsz and Simonovits can be used to give a new
proof of Cheeger’s inequality. I will state and prove the result for the special case of d-regular
graphs that we consider in this lecture. But, one can of course generalize this to irregular,
weighted graphs.
Theorem 16.5.1. Let G be a d-regular graph with lazy random walk matrix W , and let
Chapter 17
ω2 = 1 − λ be the second-largest eigenvalue of W . Then there is a subset of vertices S for which

ϕ(S) ≤ √(8λ).

Proof. Let ψ be the eigenvector corresponding to ω2 . As ψ is orthogonal to the constant vectors,


Monotonicity and its Failures
ψ{n} = 0. Define
ψ{k}
k = arg max p .
0≤k≤n min(k, n − k)
Then, set γ to be the maximum value obtained:
This Chapter Needs Editing
ψ{k} These notes are not necessarily an accurate representation of what happened in class. They are a
γ=p . combination of what I intended to say with what I think I said. They have not been carefully
min(k, n − k)
edited.
We will assume without loss of generality that k ≤ n/2: if it is not then we replace ψ by −ψ to make it so and obtain the same γ. Now, ψ{k} = γ√k.
We let S be a set (there is probably only one) for which 17.1 Overview
ψ(S) = ψ{k}.
As ψ is an eigenvector with positive eigenvalue, we also know that 17.2 Effective Spring Constants
(W ψ)(S) = W ψ{k}.
Consider a spring network. As in last lecture, we model it by a weighted graph G = (V, E, w),
We also know that √ where wa,b is the spring constant of the edge (a, b). Recall that a stronger spring constant results
(W ψ)(S) = (1 − λ)ψ(S) = (1 − λ)γ k. in a stronger connection between a and b.
Let ϕ be the conductance of S. Lemma 16.4.2 tells us that Now, let s and t be arbitrary vertices in V . We can view the network as a large, complex spring
1 connecting s to t. We then ask for the spring constant of this complex spring. We call it the
(W ψ)(S) ≤ (ψ{k − ϕk} + ψ{k + ϕk}) . effective spring constant between s and t.
2
By the construction of k and γ at the start of the proof, we know this quantity is at most To determine what it is, we recall the definition of the spring constant for an ordinary spring: the
1 p p  √ 1 p p 
potential energy in a spring connecting a to b is the spring constant times times the square of the
γ k − ϕk + γ k + ϕk = γ k 1−ϕ+ 1+ϕ .
2 2 length of the spring, divided by 2. We use this definition to determine the effective spring
Combining the inequalities derived so far yields constant between s and t.
1 p p 
Recall again that if we fix the positions of s and t on the real line, say to 0 and 1, then the
(1 − λ) ≤ 1−ϕ+γ 1+ϕ .
2 positions x of the other vertices will minimize the total energy:
An examination of the Taylor series for the last terms reveals that def 1
X
E (x ) = wa,b (x (a) − x (b))2 . (17.1)
1 p p  2
1 − ϕ + γ 1 + ϕ ≤ 1 − ϕ2 /8. (a,b)∈E
2
√ As s and t are separated by a distance of 1, we may define twice this quantity to be the effective
This implies λ ≥ ϕ2 /8, and thus ϕ(S) ≤ 8λ.
spring constant of the entire network between s and t. To verify that this definition is consistent,


we should consider what happens if the displacement between s and t is something other than 1. This formula tells us that if we have one resistor between a and b and we fix the voltage of a to 1
If we fix the position of s to 0 and the position of t to y, then the homogeneity of the expression and the voltage of b to 0, then the amount of current that will flow from a to b is the reciprocal of
for energy (17.1) tells us that the vector yx will minimize the energy subject to the boundary the resistance. It also tells us that if we want to flow one unit of current, then we need to place a
conditions. Moreover, the energy in this case will be y 2 /2 times the effective spring constant. potential difference of ra,b between a and b. Recall that we define the weight of an edge to be the
reciprocal of its resistance, as high resistance corresponds to poor connectivity. We can use this
formula to define the effective resistance between two vertices s and t in an arbitrary complex
17.3 Monotonicity network of resistors: we define the effective resistance between s and t to be the potential
difference needed to flow one unit of current from s to t.
Rayleigh’s Monotonicity Principle tells us that if we alter the spring network by decreasing some Algebraically, define i ext to be the vector
of the spring constants, then the effective resistance between s and t will not increase. 

1 if a = s
b = (V, E, w)
Theorem 17.3.1. Let G = (V, E, w) be a weighted graph and let G b be another
i ext (a) = −1 if a = t .
weighted graph with the same edges and such that 

0 otherwise
ba,b ≤ wa,b
w
This corresponds to a flow of 1 from s to t. We then solve for the voltages that realize this flow:
for all (a, b) ∈ E. For vertices s and t, let cs,t be the effective spring constant between s and t in
G and let b b Then,
cs,t be the analogous quantity in G. Lv = i ext ,

cs,t ≤ cs,t .
b by
v = L+ i ext .
Proof. Let x be the vector of minimum energy in G such that x (s) = 0 and x (t) = 1. Then, the We thus have
def
b is no greater:
energy of x in G v (s) − v (t) = i Text v = i Text L+ i ext = Reff (s, t).

1 X 1 X
ba,b (x (a) − x (b))2 ≤
w wa,b (x (a) − x (b))2 = cs,t . This agrees with the other natural approach to defining effective resistance: twice the energy
2 2 dissipation when we flow one unit of current from s to t.
(a,b)∈E (a,b)∈E

b such that x (s) = 0 and x (t) = 1 will be at most cs,t , Theorem 17.4.1. Let i be the electrical flow of one unit from vertex s to vertex t in a graph G.
So, the minimum energy of a vector x in G
Then,
cs,t ≤ cs,t .
and so b
Reff s,t = E (i ) .
While this principle seems very simple and intuitively obvious, it turns out to fail in just slightly
Proof. Recalling that i ext = Lv , we have
more complicated situations. Before we examine them, I will present the analogous material for
electrical networks. Reff s,t = i Text L+ i ext = v T LL+ Lv = v T Lv = E (v ) .

17.4 Effective Resistance


Rayleigh’s Monotonicity Theorem was originally stated for electrical networks.
There are two (equivalent) ways to define the effective resistance between two vertices in a
network of resistors. The first is to start with the formula Theorem 17.4.2 (Rayleigh’s Monotonicity). The effective resistance between a pair of vertices
cannot be decreased by increasing the resistance of some edges.
V = IR,

or, as I prefer to write it,


v (a) − v (b)
i (a, b) = ,
ra,b

17.5 Examples I now cut the small wire connecting point b to point c. While you would expect that removing
material from the supporting structure would cause the weight to go down, it will in fact move
In the case of a path graph with n vertices and edges of weight 1, the effective resistance between up. To see why, let’s analyze the resulting structure. It consists of two suppors in parallel. One
the extreme vertices is n − 1. consists of a spring from point a to point b followed by a wire of length 1 + ϵ from point b to d.
The other has a wire of length 1 + ϵ from point a to point c followed by a spring from point c to
In general, if a path consists of edges of resistance r(1, 2), . . . , r(n − 1, n) then the effective point d. Each of these is supporting the weight, and so each carries half the weight. This means
resistance between the extreme vertices is that the length of the springs will be 1/2. So, the distance from a to d should be essentially 3/2.

r(1, 2) + · · · + r(n − 1, n). This sounds like a joke, but we will see in class that it is true. The measurements that we get will
not be exactly 2 and 3/2, but that is because it is difficult to find ideal springs at Home Depot.
To see this, set the potential of vertex i to
In the example with resistors and diodes, one can increase electrical flow between two points by
v (i) = r(i, i + 1) + · · · + r(n − 1, n). cutting a wire!

Ohm’s law then tells us that the current flow over the edge (i, i + 1) will be
17.7 Traffic Networks
(v (i) − v (i + 1)) /r(i, i + 1) = 1.
I will now explain some analogous behavior in traffic networks. We will examine the more
If we have k parallel edges between two nodes s and t of resistances r1 , . . . , rk , then the effective formally in the next lecture.
resistance is
1 We will use a very simple model of a road in a traffic network. It will be a directed edge between
Reff (s, t) = .
1/r1 + · · · + 1/rk two vertices. The rate at which traffic can flow on a road will depend on how many cars are on
Again, to see this, note that the flow over the ith edge will be the road: the more cars, the slower the traffic. I will assume that our roads are linear. That is,
when a road has flow f , the time that it takes traffic to traverse the road is
1/ri
, af + b,
1/r1 + · · · + 1/rk

so the total flow will be 1. for some nonnegative constants a and b. I call this the characteristic function of the road.
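The series and parallel formulas above are easy to confirm against the Laplacian definition of effective resistance. Here is a small numerical sketch (my own, not from the text; the helper function and test values are assumptions).

```python
import numpy as np

def reff(n, edges, s, t):
    # edges: dict {(a, b): resistance}; edge weights are reciprocals of resistances.
    L = np.zeros((n, n))
    for (a, b), r in edges.items():
        w = 1.0 / r
        L[a, a] += w; L[b, b] += w; L[a, b] -= w; L[b, a] -= w
    d = np.zeros(n); d[s] = 1; d[t] = -1
    return d @ np.linalg.pinv(L) @ d

# Series: a path with resistances 1, 2, 3 has Reff = 1 + 2 + 3 between its ends.
print(reff(4, {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 3.0}, 0, 3))       # 6.0

# Parallel: resistances 1, 2, 3 between the same two nodes give 1/(1 + 1/2 + 1/3).
L = np.zeros((2, 2))
for r in [1.0, 2.0, 3.0]:
    w = 1.0 / r
    L += np.array([[w, -w], [-w, w]])
d = np.array([1.0, -1.0])
print(d @ np.linalg.pinv(L) @ d, 1 / (1 + 1/2 + 1/3))               # both about 0.545
```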
We first consider an example of Pigou consisting of two roads between two vertices, s and t. The
slow road will have characteristic function 1: think of a very wide super-highway that goes far out
17.6 Breakdown of Monotonicity of the way. No matter how many cars are on it, the time from s to t will always be 1. The fast
road is better: its characteristic is f . Now, assume that there is 1 unit of traffic that would like to
We will now exhibit a breakdown of monotonicity in networks of nonlinear elements. In this case, go from s to t.
we will consider a network of springs and wires. For examples in electrical networks with resistors
and diodes or for networks of pipes with valves, see [PP03] and [CH91]. A global planner that could dictate the route that everyone takes could minimize the average time
of the traffic going from s to t by assigning half of the traffic to take the fast road and half of the
There will be 4 important vertices in the network that I will describe, a, b, c and d. Point a is traffic to take the slow road. In this case, half of the traffic will take time 1 and half will take time
fixed in place at the top of my aparatus. Point d is attached to an object of weight 1. The 1/2, for an average travel time of 3/4. To see that this is optimal, let f be the fraction of traffic
network has two springs of spring constant 1: one from point a to point b and one from point c to that takes the fast road. Then, the average travel time will be
point d. There is a very short wire connecting point b to point c.
f · f + (1 − f ) · 1 = f 2 − f + 1.
As each spring is supporting one unit of weight, each is stretched to length 1. So, the distance
from point a to point d is 2. Taking derivatives, we see that this is minimized when
I now add two more wires to the network. One connects point a to point c and the other connects
2f − 1 = 0,
point b to point d. Both have lengths 1 + ϵ, and so are slack. Thus, the addition of these wires
does not change the position of the weight. which is when f = 1/2.
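This little optimization is easy to reproduce. The throwaway snippet below evaluates the average travel time as a function of the fraction f on the fast road, and confirms that the social optimum is f = 1/2 with cost 3/4 while the all-on-the-fast-road outcome has cost 1:

import numpy as np

def average_time(f):
    """Average travel time in the Pigou example: a fraction f takes the
    fast road (cost f each) and 1 - f takes the slow road (cost 1 each)."""
    return f * f + (1 - f) * 1.0

fs = np.linspace(0, 1, 10001)
costs = average_time(fs)

print(fs[np.argmin(costs)], costs.min())   # ~0.5 and 0.75: the social optimum
print(average_time(1.0))                   # 1.0: everyone on the fast road
print(average_time(1.0) / costs.min())     # ratio 4/3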

On the other hand, this is not what people will naturally do if they have perfect information and
freedom of choice. If an f < 1 fraction of the flow is going along the fast road, then those travelling
on the fast road will get to t faster than those going on the slow road. So, anyone going on the
slow road would rather take the fast road. So, all of the traffic will wind up on the fast road, and
it will become not-so-fast. All of the traffic will take time 1.

We call this the Nash Optimal solution, because it is what everyone will do if they are only
maximizing their own benefit. You should be concerned that this is not as good as they could do
if they allowed some authority to dictate their routes. For example, the authority could dictate
that half the cars go each way every other day, or one way in the morning and another at night.

Let's see an even more disturbing example.

17.8 Braess's Paradox

We now examine Braess's Paradox, which is analogous to the troubling example we saw with
springs and wires. This involves a network with 4 vertices, s, c, d, and t. All the traffic starts at
s and wants to go to t. There are fast roads from s to c and from d to t, and slow roads from
s to d and from c to t. If half of the traffic goes through route sct and the other half goes
through route sdt, then all the traffic will go from s to t in time 3/2. Moreover, no one can
improve their lot by taking a different route, so this is a Nash equilibrium.

We now consider what happens if some well-intentioned politician decides to build a very fast
road connecting c to d. Let's say that its characteristic function is 0. This opens up a faster
route: traffic can go from s to c to d to t. If no one else has changed route, then this traffic will
reach t in 1 unit of time. Unfortunately, once everyone realizes this all the traffic will take this
route, and everyone will now require 2 units of time to reach t.

Let's prove that formally. Let p1, p2 and p3 be the fractions of traffic going over routes sct, sdt,
and scdt, respectively. The cost of route sct is p1 + p3 + 1. The cost of route sdt is p2 + p3 + 1.
And, the cost of route scdt is (p1 + p3) + (p2 + p3). So, as long as p3 is less than 1, the cheapest
route will be scdt. So, all the traffic will go that way, and the cost of every route will be 2.

17.9 The Price of Anarchy

In any traffic network, we can measure the average amount of time it takes traffic to go from s to
t under the optimal flow. We call this the cost of the social optimum, and denote it by Opt(G).
When we let everyone pick the route that is best for themselves, the resulting solution is a Nash
Equilibrium, and we denote it by Nash(G).

The "Price of Anarchy" is the cost to society of letting everyone do their own thing. That is, it is
the ratio

Nash(G) / Opt(G).

In these examples, the ratio was 4/3. In the next lecture, we will show that the ratio is never
more than 4/3 when the cost functions are linear. If there is time today, I will begin a more
formal analysis of Opt(G) and Nash(G) that we will need in our proof.

17.10 Nash optimum

Let the set of s-t paths be P1, . . . , Pk, and let αi be the fraction of the traffic that flows on path
Pi. In the Nash equilibrium, no car will go along a sub-optimal path. Assuming that each car has
a negligible impact on the traffic flow, this means that every path Pi that has non-zero flow must
have minimal cost. That is, for all i such that αi > 0 and all j,

c(Pi) ≤ c(Pj).

17.11 Social optimum

Society in general cares more about the average time it takes to get from s to t. If we have a flow
that makes this average time low, everyone could rotate through all the routes and decrease the
total time that they spend in traffic. So, the social cost of the flow f is

c(α1, . . . , αk) = Σ_i αi c(Pi) = Σ_i αi Σ_{e∈Pi} ce(fe) = Σ_e ce(fe) Σ_{i: e∈Pi} αi = Σ_e ce(fe) fe.

Theorem 17.11.1. All local minima of the social cost function are global minima. Moreover, the
set of global minima is convex.

Proof. This becomes easy once we re-write the cost function as

Σ_e ce(fe) fe = Σ_e ae fe^2 + be fe

and recall that we assumed that ae and be are both at least zero. The cost function on each edge
is convex. It is strictly convex if ae > 0, but that does not matter for this theorem.

If you take two flows, say f^0 and f^1, the line segment of flows between them contains the flows of
the form f^t where

fe^t = t fe^1 + (1 − t) fe^0,

for 0 ≤ t ≤ 1.

By the convexity of each cost function, we know that the cost of any flow f^t is at most the
maximum of the costs of f^0 and f^1. So, if f^1 is the global optimum and f^0 is any other flow with

higher cost, the flow f ϵ will have a social cost lower than f 0 . This means that f 0 cannot be a
local optimum. Similarly, if both f 0 and f 1 are global optima, then f t must be as well.
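Before moving on, the Braess example can be checked by a small computation. The sketch below is only illustrative: it evaluates the social cost of a flow (p1, p2, p3) in the Braess network with the new road, minimizes it over a grid on the simplex, and compares the result with the Nash cost of 2; the ratio is the 4/3 quoted above.

import numpy as np

def social_cost(p1, p2, p3):
    """Average travel time in the Braess network with the zero-cost road c-d.
    Fast roads s-c and d-t have cost equal to their flow; slow roads cost 1."""
    cost_sct = (p1 + p3) + 1.0            # fast s-c, slow c-t
    cost_sdt = 1.0 + (p2 + p3)            # slow s-d, fast d-t
    cost_scdt = (p1 + p3) + (p2 + p3)     # fast s-c, free c-d, fast d-t
    return p1 * cost_sct + p2 * cost_sdt + p3 * cost_scdt

best = min(
    (social_cost(p1, p2, 1 - p1 - p2), p1, p2)
    for p1 in np.linspace(0, 1, 201)
    for p2 in np.linspace(0, 1, 201)
    if p1 + p2 <= 1
)
print(best)                    # optimum ~1.5 at p1 = p2 = 1/2, p3 = 0
print(social_cost(0, 0, 1))    # Nash outcome: everyone on scdt, cost 2
print(social_cost(0, 0, 1) / best[0])   # price of anarchy ~4/3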

Chapter 18

Dynamic and Nonlinear Networks

This Chapter Needs Editing


These notes are not necessarily an accurate representation of what happened in class. They are a
combination of what I intended to say with what I think I said. They have not been carefully
edited.

18.1 Overview

In this lecture we will consider two generalizations of resistor networks: resistor networks with
non-linear resistors and networks whose resistances change over time. While they were introduced
over 50 years ago, non-linear resistor networks seem to have been recently rediscovered in the
Machine Learning community. We will discuss how they can be used to improve the technique we
learned in Lecture 13 for semi-supervised learning.
The material on time-varying networks that I will present comes from Cameron Musco’s senior
thesis from 2012.

18.2 Non-Linear Networks

A non-linear resistor network, as defined by Duffin [Duf47], is like an ordinary resistor network
but the resistances depend on the potential differences across them. In fact, it might be easier not
to talk about resistances, and just say that the amount of flow across an edge increases as the
potential difference across the edge does. For every resistor e, there is a function

ϕe (v)

that gives the flow over resistor e when there is a potential difference of v between its terminals.
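To make this concrete, here is a small illustrative sketch, not taken from the text: it uses the made-up choice ϕ(v) = v + v^3, which is continuous, increasing, and odd, fixes the voltages at the two ends of a path, and finds the remaining voltages by minimizing the convex energy introduced later in this chapter with plain gradient descent.

import numpy as np

# Hypothetical example: phi(v) = v + v**3 is continuous, increasing and odd.
# Its energy is Phi(v) = v**2/2 + v**4/4.
phi = lambda v: v + v**3

# A path s = 0 -- 1 -- 2 -- 3 = t with boundary voltages v_0 = 1, v_3 = 0.
edges = [(0, 1), (1, 2), (2, 3)]
n, fixed = 4, {0: 1.0, 3: 0.0}

v = np.zeros(n)
for a, val in fixed.items():
    v[a] = val

# Minimize the (strictly convex) total energy by gradient descent on the
# free voltages; the gradient at a free vertex is its net out-flow.
for _ in range(20000):
    grad = np.zeros(n)
    for a, b in edges:
        f = phi(v[a] - v[b])
        grad[a] += f
        grad[b] -= f
    for a in fixed:
        grad[a] = 0.0
    v -= 0.05 * grad

# At the minimum, flow in equals flow out at the free vertices 1 and 2.
print(np.round(v, 4))
print([phi(v[a] - v[b]) for a, b in edges])   # equal flow on every edge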
We will restrict our attention to functions ϕ that are


a. continuous, Theorem 18.3.1. Let G = (V, E) be a non-linear resistor network with functions fe satisfying
conditions a, b and c for every e ∈ E. For every set S ⊆ V and fixed voltages wa for a ∈ S, there
b. monotone increasing, exists a setting of voltages va for a ̸∈ S that result in a flow of current that satisfies the flow-in
c. symmetric, by which I mean ϕe (−v) = −ϕe (v). equals flow-out conditions at every a ̸∈ S. Moreover, these voltages are unique.

Note that condition c implies that ϕe (0) = 0. For an ordinary resistor of resistance r, we have Proof. For a vector of voltages v, define
X
Φ(v) = Φ(a,b) (va − vb ).
ϕe (v) = v/r.
(a,b)∈E

However, we can and will consider more interesting functions. As each of the functions Φ(a,b) are strictly convex, Φ is as well. So, Φ has a minimum subject to
If the graph is connected and we fix the voltages at some of the vertices, then there exists a the fixed voltages. At this minimum point, we know that for every a ̸∈ S
setting of voltages at the other vertices that results in a flow satisfying flow-in equals flow-out at ∂Φ(v)
0=
all non-boundary vertices. Moreover, this flow is unique. ∂va
X ∂Φ(a,b) (va − vb )
We will prove this in the next section through the use of a generalization of energy dissipation. =
∂va
b:(a,b)∈E
X
= ϕ(a,b) (va − vb ).
18.3 Energy
b:(a,b)∈E

We define the energy dissipation of an edge that has a potential difference of v to be We may now set
Z v f(a,b) = ϕ(a,b) (va − vb ).
def
Φe (v) = ϕe (t)dt. This is a valid flow because for every vertex a ̸∈ S the sum of the flows out of va , taken with
0 appropriate signs, is zero.
We will show that the setting of the voltages that minimizes the total energy provides the flow I Conversely, for any setting of voltages that results in a flow that has no loss or gain at any a ̸∈ S,
claimed exists. we can reverse the above equalities to show that the partial derivatives of Φ(v) are zero. As Φ(v)
In the case of linear resistors, where ϕe (v) = v/r, is strictly convex, this can only happen at the unique minimum of Φ(v).

1 v2
Φe (v) = ,
2 r 18.4 Uses in Semi-Supervised Learning
which is exactly the energy function we introduced in Lecture 13.
In Lecture 13, I suggested an approach to estimating a function f on the vertices of a graph given
The conditions on ϕe imply that its values at a set S ⊆ V : X
min (x(a) − x(b))2 .
x:f (a)=x(a) for a∈S
d. Φe is strictly convex1 , (a,b)∈E

e. Φe (0) = 0, and Moreover, we saw that we can minimize such a function by solving a system of linear equations.
Unfortunately, there are situations in which this approach does not work very well. In general,
f. Φe (−x) = Φe (x).
this should not be surprising: sometimes the problem is just unsolvable. But, there are cases in
which it would be reasonable to solve the learning problem in which this approach fails.
We remark that a function that is strictly convex has a unique minimum, and that a sum of
strictly convex functions is strictly convex. Better results are sometimes obtained by modifying the penalty function. For example, Bridle
1
and Zhu [BZ13] (and, essentially, Herbster and Guy [HL09]) suggest
That is, for all x ̸= y and all 0 < λ < 1, Φe (λx + (1 − λ)y) < λΦe (x) + (1 − λ)Φe (y). X
min |x(a) − x(b)|p ,
x:f (a)=x(a) for a∈S
(a,b)∈E

for 1 < p < 2. minimality of the flow. Because Ψ(f ) is strictly convex, small changes to the optimum have a
negligible effect on its value (that is, the first derivative is zero). So, pushing an ϵ amount of flow
While a well-selected p will often improve accuracy, the drawback of this approach is that we
around any cycle will not change the value of Ψ(f ). That is, the sum of the derivatives around
cannot perform the minimization nearly as quickly as we can when p = 2.
any cycle will be zero. As

Ψe (f ) = ψe (f ),
∂f
18.5 Dual Energy
this means that the sum of the desired potential differences around every cycle is zero.

We can establish a corresponding, although different, energy for the flows. Let ψ be the inverse of Theorem 18.5.2. If f = ϕ(v), then
ϕ. We then define the flow-energy of an edge that carries a flow of f to be
Z f Φ(v) + Ψ(f ) = vf.
def
Ψ(f ) = ψ(t)dt.
0 Proof. One can prove this theorem through “integration by parts”. But, I prefer a picture. In the
If we minimize the sum of the flow-energies over the space of flows, we again recover the unique following two figures, the curve is the plot of ϕ. In the first figure, the shaded region is the
valid flow in the network. (The function Φ is implicit in the work of Duffin. The dual Ψ comes integral of ϕ between 0 and v (2 in this case). In the second figure, the shaded region is the
from Millar [Mil51]). integral of ψ between 0 and ϕ(v) (just turn the picture on its side). It is clear that these are
complementary parts of the rectangle between the axes and the point (v, ϕ(v)).
In the classical case, Φ and Ψ are the same. While they are not the same here, their sum is. We
will later prove that when v = ψ(f ),

Ψ(f ) + Φ(v) = f v.

In fact, one can show that for all f and v,

Ψ(f ) + Φ(v) ≥ f v,

with equality only when v = ψ(f ).
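The identity and the inequality are an instance of convex (Legendre) duality, and they are easy to check for a concrete choice of ϕ. The snippet below is only an illustrative check using the made-up example ϕ(v) = v^3, for which ψ(f) = f^{1/3}, Φ(v) = v^4/4 and Ψ(f) = (3/4) f^{4/3}:

import numpy as np

Phi = lambda v: v ** 4 / 4             # integral of phi(v) = v**3
Psi = lambda f: 3 * f ** (4 / 3) / 4   # integral of psi(f) = f**(1/3)

v = np.linspace(0.1, 2.0, 5)
f = v ** 3                             # f = phi(v): the equality case
print(Phi(v) + Psi(f) - v * f)         # ~0

f_other = 1.5 * v ** 3                 # f != phi(v): strict inequality
print(Phi(v) + Psi(f_other) - v * f_other)   # > 0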


Theorem 18.5.1. Under the conditions of Theorem 18.3.1, let fext be the vector of external flows
resulting from the induced voltages. Let f be the flow on the edges that is compatible with fext and
that minimizes

Ψ(f) = Σ_{(a,b)∈E} Ψ_{(a,b)}(f_{(a,b)}).

Then, f is the flow induced by the voltages shown to exist in Theorem 18.3.1.

Sketch. We first show that f is a potential flow. That is, that there exist voltages v so that for
every edge (a, b), f_{(a,b)} = ϕ_{(a,b)}(v_a − v_b). The theorem then follows by the uniqueness established
in Theorem 18.3.1.

To prove that f is a potential flow, we consider the potential difference that the flow "wants" to
induce on each edge, ψ(f_{(a,b)}). There exist vertex potentials that agree with these desired
potential differences if and only if for every pair of vertices and for every pair of paths between
them, the sum of the desired potential differences along the edges in the paths is the same. To see
this, arbitrarily fix the potential of one vertex, such as s. We may then set the potential of any
other vertex a by summing the desired potential differences along the edges in any path from s.

[Figure: (a) Φ(v); (b) Ψ(f) when f = ϕ(v).]

Equivalently, the desired potential differences are realizable if and only if the sum of these desired
potential differences is zero around every cycle. To show that this is the case, we use the

The bottom line is that almost all of the classical theory can be carried over to nonlinear networks.

18.6 Thermistor Networks If the system converges, that is if the voltages at the nodes converge along with the potential
drops and flows across edges, then
We now turn our attention to networks of resistors whose resistance changes over time. We ∂Te
consider a natural model in which edges get “worn out”: as they carry more flow their resistance 0= = αe Te fe2 − (Te − TA ).
∂t
increases. One physical model that does this is a thermistor. A thermistor is a resistor whose
resistance increases with its temperature. These are used in thermostats. To turn this into a relationship between fe and ve , we apply the identity fe re = ve , which
Remember the “energy dissipation” of a resistor? The energy dissipates as heat. So, the becomes fe αe Te = ve , to obtain
temperature of resistor increases as its resistance times the square of the flow through it. To 0 = ve fe − Te + TA .
prevent the temperatures of the resistors from going to infinity, we will assume that there is an To eliminate the last occurence of Te , we then multiply by fe and apply the same identity to
ambient temperature TA , and that they tend to the ambient temperature. I will denote by Te the produce
temperature of resistor e, and I will assume that there is a constant αe for each resistor so that its 0 = ve fe2 − ve /αe + fe TA .
resistance
The solutions of this equation in fe are given by
re = αe Te . (18.1)
s  
1 TA 2 TA
We do not allow temperatures to be negative. fe = ± + − .
αe 2ve 2ve
Now, assume that we would like to either flow a current between two vertices s and t, or that we
have fixed the potentials of s and t. Given the temperature of every resistor at some moment, we The correct choice of sign is the one that gives this the same sign as ve :
can compute all their resistances, and then compute the resulting electrical flow as we did in s 
Lecture 13. Let fe be the resulting flow on resistor e. The temperature of e will increase by re fe2 , 1  (2ve )2
fe = + TA − TA  .
2 (18.3)
and it will also increase in proportion to the difference between its present temperature and the 2ve αe
ambient temperature.
This gives us the following differential equation for the change in the temperature of a resistor: When ve is small this approaches zero, so we define it to be zero when ve is zero. As ve becomes
−1/2
large this expression approaches αe . Similarly, when ve becomes very negative, this approaches
∂Te −1/2
= re fe2 − (Te − TA ). (18.2) −αe . If we now define s 
∂t
1  (2ve )2
Ok, there should probably be some constant multiplying the (Te − TA ) term. But, since I haven’t ϕe (ve ) = + TA2 − TA  ,
2ve αe
specified the units of temperature we can just assume that the constant is 1.
we see that this function satisfies properties a, b and c. Theorem 18.3.1 then tells us that a stable
By substituting in (18.1) we can eliminate the references to resistance. We thus obtain
solution exists.
∂Te
= αe Te fe2 − (Te − TA ).
∂t
18.7 Low Temperatures
There are now two natural questions to ask: does the system converge, and if so, what does it
converge to? If we choose to impose a current flow between s and t, the system does not need to We now observe that when the ambient temperature is low, a thermistor network produces a
converge. For example, consider just one resistor e between vertices s and t with αe = 2. We then minimum s-t cut in a graph. The weights of the edges in the graph are related to αe . For
find simplicity, we will just examine the case when all αe = 1. If we take the limit as TA approaches
∂Te
= αe Te fe2 − (Te − TA ) = 2Te − (Te − TA ) = Te + TA . zero, then the behavior of ϕe is 
∂t

0 if ve = 0
So, the temperature of the resistor will go to infinity.
ϕe (ve ) = 1 if ve > 0


For this reason, I prefer to just fix the voltages of certain vertices. Under these conditions, we can −1 if ve < 0.
prove that the system will converge. While I do not have time to prove this, we can examine what
it will converge to. We will obtain similar behavior for small TA : if there is a non-negligible potential drop across an
edge, then the flow on that edge will be near 1. So, every edge will either have a flow near 1 or a

negligible potential drop. When an edge has a flow near 1, its energy will be near 1. On the other
hand, the energy of edges with negligible potential drop will be near 0.
So, in the limit of small temperatures, the energy minimization problem becomes
X
min |v(a) − v(b)| .
v:v(s)=0,v(t)=1
(a,b)∈E

One can show that the minimum is achieved when all of the voltages are 0 or 1, in which case the
energy is the number of edges going between voltage 0 and 1. That is, the minimum is achieved
by a minimum s-t cut.
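The flattening of the flow function at low temperature is easy to see numerically. The throwaway computation below uses the flow function derived above (formula (18.3) with αe = 1) at a few arbitrarily chosen potential drops: for |ve| much larger than TA the flow is essentially ±1, while for |ve| much smaller than TA it is close to ve/TA, i.e. negligible.

import numpy as np

def thermistor_flow(v, alpha=1.0, TA=1e-3):
    """Flow across a thermistor edge with potential drop v, formula (18.3)."""
    if v == 0:
        return 0.0
    return (np.sqrt((2 * v) ** 2 / alpha + TA ** 2) - TA) / (2 * v)

for v in [1e-4, 1e-2, 0.5, 1.0, -1.0]:
    print(v, thermistor_flow(v))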

Part IV

Spectra and Graph Structure


The problem of finding large independent sets in a graph is NP-Complete, and it is very difficult
to even approximate the size of the largest independent set in a graph [FK98, Hås99]. However,
for some carefully chosen graphs, spectral analysis provides very good bounds on the sizes of
independent sets.

Chapter 19 19.3 Hoffman’s Bound

One of the first results in spectral graph theory was Hoffman's [Hof70] proof of the following upper
bound on the size of an independent set in a graph G.
Independent Sets and Coloring

Theorem 19.3.1. Let G = (V, E) be a d-regular graph, and let µn be its smallest adjacency
matrix eigenvalue. Then

α(G) ≤ n · (−µn) / (d − µn).
19.1 Introduction
Recall that µn < 0. Otherwise this theorem would not make sense. We will prove a generalization
of Hoffman’s theorem due to Godsil and Newman [GN08]:
We will see how high-frequency eigenvalues of the Laplacian and Adjacency matrix can be related
to independent sets and graph coloring. Recall that we number the Laplacian matrix eigenvalues
in increasing order: be the average degree of a vertex in S. Then,
0 = λ1 ≤ λ2 ≤ · · · ≤ λn.

|S| ≤ n (1 − dave(S)/λn).

We call the adjacency matrix eigenvalues µ1, . . . , µn, and number them in the reverse order:
µ1 ≥ · · · ≥ µn .
The reason we reverse the order of indexing is that for d-regular graphs, µi = d − λi . For a This is a generalization of Theorem 19.3.1 because in the d-regular case dave = d and λn = d − µn .
non-empty graph, µn will be negative. So, these bounds are the same for regular graphs:

dave (S) λn − d −µn


1− = = .
19.2 Graph Coloring and Independent Sets λn λn d − µn

Proof. The Courant-Fischer Theorem tells us that


A coloring of a graph is an assignment of one color to every vertex in a graph so that each edge
connects vertices of different colors. We are interested in coloring graphs while using as few colors x T Lx
λn = max .
as possible. Formally, a k-coloring of a graph is a function c : V → {1, . . . , k} so that for all x xTx
(u, v) ∈ E, c(u) ̸= c(v). A graph is k-colorable if it has a k-coloring. The chromatic number of a
graph, written χ(G), is the least k for which G is k-colorable. A graph G is 2-colorable if and Let S be an independent set of vertices, let 1S be the characteristic vector of S, and let d(S) be
only if it is bipartite. Determining whether or not a graph is 3-colorable is an NP-complete the sum of the degrees of vertices in S. Consider the vector
problem [Kar72]. The famous 4-Color Theorem [AH77a, AH77b] says that every planar graph is
4-colorable. x = 1S − s1,
A set of vertices S is independent if there are no edges between vertices in S. In particular, each where s = |S| /n. The vector x is the result of projecting 1S onto the subspace orthogonal to 1.
color class in a coloring is an independent set. The size of the largest independent set in a graph, We will often apply this natural operation to vectors that we plan to multiply by a Laplacian. As
which we call its independence number is written α(G). As a k-colorable graph with n vertices S is independent and L1 = 0, we have
must have a color class of size at least n/k, we have X
n x T Lx = 1S L1S = (1S (a) − 1S (b))2 = d(S) = dave (S) |S| ,
α(G) ≥ .
χ(G) a∼b


where the third equality follows from the independence of S. The reason that we subtracted s1 Corollary 7.3.2 tells us that if G is a Paley graph on n = p vertices of degree d = (p − 1)/2, then

from 1S is that this minimizes the norm of the result. We compute λn = (p + p)/2. So, for an independent set S, Hoffman’s bound tells us that
 
x T x = |S| (1 − s)2 + (|V | − |S|)s2 = |S| (1 − 2s + s2 ) + |S| s − |S| s2 = |S| (1 − s) = n(s − s2 ). dave (S)
|S| ≤ n 1 −
λn
Thus,  
p−1
x T Lx dave (S) |S| dave (S)sn dave (S) =p 1− √
λn ≥ = = = . p+ p
xTx n(s − s2 ) n(s − s2 ) 1−s √ 
p+1
Re-arranging terms, this gives =p √
p+ p
dave (S) √
1− ≥ s, = p.
λn
which is equivalent to the claim of the theorem. √
One can also show that every clique in a Paley graph has size at most p.

Remarkably, this theorem holds for weighted graphs, even though edge weights do not play a role A graph is called a k-Ramsey graph if it contains no clique or independent set of size k. It is a
in independence of subsets of vertices. challenge to find large k-Ramsey graphs. Equivalently, it is challenging to find k-Ramsey graphs
on n vertices for which k is small. In one of the first papers on the Probabilistic Method in
We will use the computation of the norm of x often, so we will make it a claim. Combinatorics, Erdös proved that a random graph on n vertices in which each edge is included
with probability 1/2 is probably 2 log2 n Ramsey [Erd47].
Claim 19.3.3. Let S ⊆ V have size s |V |. Then
However, constructing explicit Ramsey graphs has proved much more challenging. Until a decade
∥1S − s1∥2 = s(1 − s) |V | . ago, Paley graphs were among the best known. A recent construction of Chattopadhyay and
Zuckerman [CZ19] provides explicit graphs on n vertices that do not have cliques or independent
Claim 19.3.4. For a vector x of length n, the value of t that minimizes the norm of x − t1 is O(1)
sets of size 2(log log n) .
t = 1T x /n.

Proof. The derivative of the square of the norm is 19.5 Lower Bound on the chromatic number
d X X
(x (a) − t)2 = 2 (x (a) − t).
dt a a
As a k-colorable graph must have an independent set of size at least n/k, an upper bound on the
sizes of independent sets gives a lower bound on its chromatic number. However, this bound is
When the norm is minimized the derivative is zero, which implies not always a good one.
X
nt = x (a) = 1T x . For example, consider a graph on 2n vertices consisting of a clique (complete graph) on n vertices
a and n vertices of degree 1, each of which is connected to a different vertex in the clique. The
chromatic number of this graph is n, because each of the vertices in the clique must have a
different color. However, the graph also has an independent set of size n, which would only give a
lower bound of 2 on the chromatic number.

19.4 Application to Paley graphs Hoffman [Hof70] proved the following lower bound on the chromatic number of a graph that does
not require the graph to be regular, and which can be applied to weighted graphs. Numerically, it
is obtained by dividing n by the bound in Theorem 19.3.1. But, the proof is very different
Let’s examine what Hoffman’s bound on the size of the largest independent set tells us about
because that theorem only applies to regular graphs.
Paley graphs.
Theorem 19.5.1.
χ(G) ≥ (µ1 − µn) / (−µn) = 1 + µ1 / (−µn).
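Before turning to the proof, it may help to see both of Hoffman's bounds on a concrete graph. The snippet below, an illustrative check only, builds the Petersen graph (10 vertices, 3-regular, adjacency eigenvalues 3, 1 and −2). The independence bound of Theorem 19.3.1 gives α(G) ≤ 4 and the bound above gives χ(G) ≥ 2.5, hence χ(G) ≥ 3; both are tight for this graph.

import numpy as np

# The Petersen graph: an outer 5-cycle, an inner 5-cycle taken with step 2
# (a pentagram), and spokes joining them.  It is 3-regular on 10 vertices.
n = 10
A = np.zeros((n, n))
for i in range(5):
    for a, b in [(i, (i + 1) % 5),           # outer cycle
                 (5 + i, 5 + (i + 2) % 5),   # inner pentagram
                 (i, 5 + i)]:                # spokes
        A[a, b] = A[b, a] = 1

mu = np.sort(np.linalg.eigvalsh(A))   # eigenvalues in increasing order
mu_1, mu_n, d = mu[-1], mu[0], 3

print(n * (-mu_n) / (d - mu_n))   # Theorem 19.3.1: alpha(G) <= 4 (alpha is 4)
print(1 + mu_1 / (-mu_n))         # bound above: chi(G) >= 2.5, so chi >= 3 (chi is 3)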

The proof of this theorem relies on the following inequality whose proof we defer to Section 19.6.

To state it, we introduce the notation λmax (M ) and λmin (M ) to indicate the largest and smallest Lemma 19.6.1. Let  
eigenvalues of the matrix M . B C
A=
CT D
Lemma 19.5.2. Let  
M 1,1 M 1,2 · · · M 1,k be a symmetric matrix. Then
M T1,2 M 2,2 · · · M 2,k 
  λmin (A) + λmax (A) ≤ λmax (B) + λmax (D).
M =  .. .. .. .. 
 . . . . 
T T
M 1,k M 2,k · · · M k,k  
x1
Proof. Let x be a unit eigenvector of A of eigenvalue λmax (A). Write x = , using the same
be a block-partitioned symmetric matrix with k ≥ 2. Then x2
X partition as we did for A.
(k − 1)λmin (M ) + λmax (M ) ≤ λmax (M i,i ).
i
We first consider the case in which neither x 1 nor x 2 is an all-zero vector. In this case, we set
∥x 2 ∥
!
Proof of Theorem 19.5.1. Let G be a k-colorable graph. After possibly re-ordering the vertices, ∥x 1 ∥ x 1
y= .
the adjacency matrix of G can be written − ∥x 1∥
∥x 2 ∥ x 2
 
0 M 1,2 ··· M 1,k The reader may verify that y is also a unit vector, so
M T1,2 0 ··· M 2,k 
 
 .. .. .. ..  . (19.1) y T Ay ≥ λmin (A).
 . . . . 
M T1,k M T2,k · · · 0 We have
Each block corresponds to a color. λmax (A) + λmin (A) ≤ x T Ax + y T Ay
As each diagonal block is all-zero, Lemma 19.5.2 implies = x T1 Bx 1 + x T1 C x 2 + x T2 C T x 1 + x T2 Dx 2 +

(k − 1)λmin (M ) + λmax (M ) ≤ 0. ∥x 2 ∥2 T ∥x 1 ∥2 T
+ x 1 Bx 1 − x T1 C x 2 − x T2 C T x 1 + x 2 Dx 2
∥x 1 ∥2 ∥x 2 ∥2
Recalling that λmin (M ) = µn < 0, and λmax (M ) = µ1 , a little algebra yields
∥x 2 ∥2 T ∥x 1 ∥2 T
= x T1 Bx 1 + x T2 Dx 2 + 2 x 1 Bx 1 + x 2 Dx 2
µ1 ∥x 1 ∥ ∥x 2 ∥2
1+ ≤ k. ! !
−µn
∥x 2 ∥2 ∥x 1 ∥2
≤ 1+ 2 x T1 Bx 1 + 1 + x T2 Dx 2
∥x 1 ∥ ∥x 2 ∥2
   
≤ λmax (B) ∥x 1 ∥2 + ∥x 2 ∥2 + λmax (D) ∥x 1 ∥2 + ∥x 2 ∥2
To return to our example of the n clique with n degree-1 vertices attached, I examined an
example with n = 6. We find µ1 = 5.19 and µ12 = −1.62. This gives a lower bound on the = λmax (B) + λmax (D),
chromatic number of 4.2, which implies a lower bound of 5. We can improve the lower bound by
re-weighting the edges of the graph. For example, if we give weight 2 to all the edges in the clique as x is a unit vector.
and weight 1 to all the others, we obtain a bound of 5.18, which agrees with the chromatic
We now return to the case in which ∥x 2 ∥ = 0 (or ∥x 1 ∥ = 0, which is really the same case).
number of this graph which is 6.
Theorem 4.3.1 tells us that λmax (B) ≤ λmax (A). So, it must be the case that x 1 is an eigenvector
of eigenvalue λmax (A) of B, and thus λmax (B) = λmax (A). To finish the proof, also observe that
Theorem 4.3.1 implies
19.6 Proofs for Hoffman’s lower bound on chromatic number λmax (D) ≥ λmin (D) ≥ λmin (A).

To prove Lemma 19.5.2, we begin with the case of k = 2. The general case follows from this one
by induction. While the lemma in the case k = 2 when there are zero blocks on the diagonal
follows from Proposition 4.5.4, we require the general statement for induction.

Proof of Lemma 19.5.2. For k = 2, this is exactly Lemma 19.6.1. For k > 2, we apply induction.
Let  
M 1,1 M 1,2 · · · M 1,k−1
 M T1,2 M · · · M 
 2,2 2,k−1 
B = .. .. .. .. .
 . . . . 
T T
M 1,k−1 M 2,k−1 · · · M k−1,k−1
Theorem 4.3.1 now implies.
Chapter 20
λmin (B) ≥ λmin (M ).
Applying Lemma 19.6.1 to B and the kth row and column of M , we find

λmin (M ) + λmax (M ) ≤ λmax (M k,k ) + λmax (B)


Graph Partitioning
k−1
X
≤ λmax (M k,k ) + λmax (M i,i ) − (k − 2)λmin (B) (by induction)
i=1
k Computer Scientists are often interested in cutting, partitioning, and finding clusters of vertices in
X
= λmax (M i,i ) − (k − 2)λmin (B) graphs. This usually means finding a set of vertices that is connected to the rest of the graph by a
i=1 small number of edges. There are many ways of balancing the size of the set of vertices with the
k
X number of edges. We will examine isoperimetric ratio and conductance, and will find that they
≤ λmax (M i,i ) − (k − 2)λmin (M ), are intimately related to the second-smallest eigenvalue of the Laplacian and the normalized
i=1 Laplacian. The motivations for measuring these range from algorithm design to data analysis.
because λmin (B) ≥ λmin (M ). Rearranging terms gives
k
X 20.1 Isoperimetry and λ2
(k − 1)λmin (M ) + λmax (M ) ≤ λmax (M i,i ).
i=1
Let S be a subset of the vertices of an unweighted graph. One way of measuring how well S can
be separated from the graph is to count the number of edges connecting S to the rest of the
graph. These edges are called the boundary of S, which we formally define by
def
∂(S) = {(a, b) ∈ E : a ∈ S, b ̸∈ S} .

We are less interested in the total number of edges on the boundary than in the ratio of this
number to the size of S itself. For now, we will measure this in the most natural way—by the
number of vertices in S. We will call this ratio the isoperimetric ratio of S, and define it by

θ(S) = |∂(S)| / |S|.

The isoperimetric ratio of a graph1 is the minimum isoperimetric ratio over all sets of at most
half the vertices:
θG = min_{|S| ≤ n/2} θ(S).

We will now derive a lower bound on θG in terms of λ2 . We will present an approximate converse
to this lower bound, known as Cheeger’s Inequality, in Chapter 21.
1
Other authors call this the isoperimetric number.
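As a sanity check of these definitions, the brute-force computation below (reasonable only for tiny graphs, and using the 8-vertex ring as a made-up example) finds θG exhaustively and compares it with λ2/2; the inequality θG ≥ λ2/2 is the content of Theorem 20.1.1 below.

import itertools
import numpy as np

n = 8                                   # the ring graph C_8
L = np.zeros((n, n))
for a in range(n):
    b = (a + 1) % n
    L[a, a] += 1; L[b, b] += 1
    L[a, b] -= 1; L[b, a] -= 1

lam2 = np.sort(np.linalg.eigvalsh(L))[1]

def boundary(S):
    return sum(1 for a in S for b in ((a - 1) % n, (a + 1) % n) if b not in S)

theta = min(
    boundary(set(S)) / len(S)
    for k in range(1, n // 2 + 1)
    for S in itertools.combinations(range(n), k)
)
print(lam2 / 2, theta)                  # theta_G >= lam2/2, as Theorem 20.1.1 shows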


Theorem 20.1.1. For every S ⊂ V,

θ(S) ≥ λ2 (1 − s),

where s = |S| / |V |. In particular,

θG ≥ λ2 / 2.

This theorem says that if λ2 is big, then G is very well connected: the boundary of every small set
of vertices is at least λ2 times something just slightly smaller than the number of vertices in the
set.

Re-arranging terms slightly, Theorem 20.1.1 can be stated as

θ(S) / (1 − s) = |V| · |∂(S)| / (|S| |V − S|) ≥ λ2.
Proof. As
x T LG x We sometimes favor the quantity in the middle above over the isoperimetric ratio because
λ2 = min ,
x :x T 1=0 xTx |∂(S)|
,
for every non-zero x orthogonal to 1 we know that |S| |V − S|
eliminates the need to restrict |S| ≤ |V | /2.
x T LG x ≥ λ2 x T x .

To exploit this inequality, we need a vector related to the set S. A natural choice is 1S , the
characteristic vector of S,
20.2 Conductance
(
1 if a ∈ S
1S (a) = Conductance is a variant of the isoperimetic ratio that applies to weighted graphs, and that
0 otherwise.
measures sets of vertices by the sum of their weighted degrees. Instead of counting the edges on
the boundary, it counts the sum of their weights. We write d(S) for the sum of the degrees of the
We find X vertices in S. d(V ) is twice the sum of the weights of edges in the graph, because each edge is
1TS LG 1S = (1S (a) − 1S (b))2 = |∂(S)| .
attached to two vertices. For a set of edges F , we write w(F ) for the sum of the weights of edges
(a,b)∈E
in F . We can now define the conductance of S to be
However, χS is not orthogonal to 1. To fix this, use
def w(∂(S))
ϕ(S) = .
x = 1S − s1, min(d(S), d(V − S))
Note that many slightly different definitions appear in the literature. For example, we could
so ( instead use
1−s for a ∈ S, and w(∂(S))
x (a) = d(V ) ,
−s otherwise. d(S)d(V − S)
which appears below in (20.3).
We have x T 1 = 0, and
We define the conductance of a graph G to be
X
x T LG x = ((1S (a) − s) − (1S (b) − s))2 = |∂(S)| . def
ϕG = min ϕ(S).
(a,b)∈E S⊂V

The conductance of a graph is more useful in many applications than the isoperimetric ratio. I
Claim 19.3.3 tells us that the square of the norm of x is
usually find that conductance is the more useful quantity when you are concerned about edges,
x T x = n(s − s2 ). and that isoperimetric ratio is most useful when you are concerned about vertices. Conductance
is particularly useful when studying random walks in graphs.
So,
1TS LG 1S |∂(S)|
λ2 ≤ = . 20.3 The Normalized Laplacian
1TS 1S |S| (1 − s)

As
1TS L1S w(∂(S))
= ,
1TS D1S d(S)

it seems natural to try to relate the conductance to the following generalized Rayleigh quotient: where
σ = d(S)/d(V ).
y T Ly
. (20.1) You should now check that yT d = 0:
y T Dy
T
If we make the change of variables y d= 1TS d − σ1T d = d(S) − (d(S)/d(V ))d(V ) = 0.
1/2
D y = x,
We already know that
then this ratio becomes y T Ly = w(∂(S)).
x T D −1/2 LD −1/2 x
.
xTx It remains to compute y T Dy . If you remember the computation in Claim 19.3.3, you would
This is an ordinary Rayleigh quotient, which we understand a little better. The matrix in the guess that it is d(S)(1 − σ) = d(S)d(V − S)/d(V ), and you would be right:
middle is called the normalized Laplacian (see [Chu97]). We reserve the letter N for this matrix: X X
y T Dy = d(u)(1 − σ)2 + d(u)σ 2
def
N = D −1/2 LD −1/2 . u∈S u̸∈S

= d(S)(1 − σ)2 + d(V − S)σ 2


This matrix often proves more useful when examining graphs in which the degrees of vertices
vary. We will let 0 = ν1 ≤ ν2 ≤ · · · ≤ νn denote the eigenvalues of N . = d(S) − 2d(S)σ + d(V )σ 2
= d(S) − d(S)σ, as d(S) = d(V )σ
The eigenvector of eigenvalue 0 of N is d 1/2 , by which I mean the vector whose entry for vertex a
is the square root of the degree of a. Observe that = d(S)d(V − S)/d(V ).

D −1/2 LD −1/2 d 1/2 = D −1/2 L1 = D −1/2 0 = 0. So,


y T Ly w(∂(S))
ν2 ≤ = d(V ) . (20.3)
y T Dy d(S)d(V − S)
The eigenvector of ν2 is given by
xTN x
arg min .
x ⊥d 1/2 xTx Corollary 20.3.2. For every S ⊂ V ,
Transferring back into the variable y , and observing that ϕ(S) ≥ ν2 /2.

x T d 1/2 = y T D1/2 d 1/2 = y T d , Proof. As the larger of d(S) and d(V − S) is at least half of d(V ), we find
we find w(∂(S))
y T Ly ν2 ≤ 2 .
ν2 = min . min(d(S), d(V − S))
y ⊥d y T Dy

The conductance is related to ν2 as the isoperimetric ratio is related to λ2 :

ν2 /2 ≤ ϕG . (20.2)
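The bound (20.2) can be checked numerically in the same brute-force way. The example graph in the sketch below, two triangles joined by one light edge, is made up for illustration:

import itertools
import numpy as np

n = 6
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 0.1)]
L = np.zeros((n, n))
for a, b, w in edges:
    L[a, a] += w; L[b, b] += w; L[a, b] -= w; L[b, a] -= w
d = np.diag(L).copy()
N = L / np.sqrt(np.outer(d, d))          # normalized Laplacian D^{-1/2} L D^{-1/2}
nu2 = np.sort(np.linalg.eigvalsh(N))[1]

def conductance(S):
    cut = sum(w for a, b, w in edges if (a in S) != (b in S))
    dS = sum(d[a] for a in S)
    return cut / min(dS, d.sum() - dS)

phi_G = min(conductance(set(S))
            for k in range(1, n)
            for S in itertools.combinations(range(n), k))
print(nu2 / 2, phi_G)                    # nu2/2 <= phi_G, as in (20.2)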
20.4 Notes
Lemma 20.3.1. For every S ⊂ V ,
There are many variations on the definitions used in this chapter. For example, sometimes one
w(∂(S)) wants to measure the number of vertices on the boundary of a set, rather than the number of
ν2 ≤ d(V ) .
d(S)d(V − S) edges. The ratio of the number of boundary vertices to internal vertices is often called expansion.
But, authors are not consistent about these and related terms. Cut ratio is sometimes used
Proof. We would again like to again use 1S as a test vector. But, we need to shift it so that it is instead of isoperimetric ratio. When reading anything in this area, be sure to check the formulas
orthogonal to d . Set for the definitions.
y = 1S − σ1,

By renumbering the vertices, we may assume without loss of generality that

y (1) ≤ y (2) ≤ · · · ≤ y (n).

To center y , let j be the least number for which


j
X
Chapter 21 d (a) ≥ d(V )/2.
a=1

We then set
Cheeger’s Inequality z = y − y (j)1.
This vector z satisfies z (j) = 0. And, the following lemma tells us that

z T Lz y T Ly
T
≤ T .
z Dz y Dy
In the last chapter we learned that ϕ(S) ≥ ν2 /2 for every S ⊆ V . Cheeger’s inequality is a partial
converse. It says that there exists a set of vertices S for which Lemma 21.1.2. Let v s = y + s1. Then, the minimum of v Ts Dv s is achieved at the s for which
√ v Ts d = 0.
ϕ(S) ≤ 2ν2 ,

and provides an algorithm for using the eigenvector of ν2 to find such a set. Proof. The derivative with respect to s is 2d T v s , and this is zero at the minimum.

Cheeger [Che70] first proved his famous inequality for manifolds. Many discrete versions of Theorem 21.1.3. Let G be a weighted graph, let L be its Laplacian, and let d be its vector of
Cheeger’s inequality were proved in the late 80’s [SJ89, LS88, AM85, Alo86, Dod84, Var85]. Some weighted degrees. Let z be a vector that is centered with respect to d . Then, there is a number τ
of these consider the walk matrix instead of the normalized Laplacian, and some consider the for which the set Sτ = {a : z (a) < τ } satisfies
isoperimetic ratio instead of conductance. The proof in this Chapter follows an approach r
developed by Trevisan [Tre11]. z T Lz
ϕ(Sτ ) ≤ 2 T .
z Dz

21.1 Cheeger’s Inequality We assume without loss of generality that

z (1)2 + z (n)2 = 1.
Cheeger’s inequality proves that if we have a vector y , orthogonal to d , for which the generalized
Rayleigh quotient (20.1) is small, then one can obtain a set of small conductance from y . We This can be achieved by multiplying z by a constant. We begin our proof of Cheeger’s inequality
obtain such a set by carefully choosing a real number τ , and setting by defining
z T Lz
Sτ = {a : y (a) ≤ τ } . ρ= T .
z Dz

So, we need to show that there is a τ for which ϕ(Sτ ) ≤ 2ρ.
We should think of deriving y from an eigenvector of ν2 of the normalized Laplacian. If ψ 2 is an
eigenvector of ν2 , then y = D 1/2 ψ 2 is orthogonal to d and the generalized Rayleigh quotient Recall that
w(∂(S))
(20.1) of y with respect to L and D equals ν2 . But, the theorem can make use of any vector that ϕ(S) = .
is orthogonal to d that makes the generalized Rayleigh quotient small. In fact, we prefer vectors min(d(S), d(V − S))
that are centered with respect to d . We will define a distribution on τ for which we can prove that
p
Definition 21.1.1. A vector y is centered with respect to d if E [w(∂(Sτ ))] ≤ 2ρ E [min(d(Sτ ), d(V − Sτ ))] .
X X
d (a) ≤ d (V )/2 and d (a) ≤ d (V )/2.
a:y (a)>0 a:y (a)<0


This implies1 that there is some τ for which Regardless of the signs,
p
w(∂(Sτ )) ≤ 2ρ min(d(Sτ ), d(V − Sτ )), z (a)2 − z (b)2 = |(z (a) − z (b))(z (a) + z (b))| ≤ |z (a) − z (b)| (|z (a)| + |z (b)|).

which means ϕ(S) ≤ 2ρ. When sgn(a) = −sgn(b),
To switch from working with y to working with z , define We will set Sτ = {a : z (a) ≤ τ }. z (a)2 + z (b)2 ≤ (z (a) − z (b))2 = |z (a) − z (b)| (|z (a)| + |z (b)|).
Trevisan had the remarkable idea of choosing τ between z (1) and z (n) with probability density
2 |t|. That is, the probability that τ lies in the interval [a, b] is
Z b
2 |t| dt. We now derive a formula for the expected denominator of ϕ.
t=a
Lemma 21.1.5.
To see that the total probability is 1, observe that T
Et [min(d(Sτ ), d(V − Sτ ))] = z Dz .
Z z (n) Z 0 Z z (n)
2 |t| dt = 2 |t| dt + 2 |t| dt = z (n)2 + z (1)2 = 1,
t=z (1) t=z (1) t=0 Proof. Observe that
X X
as z (1) ≤ 0 ≤ z (n). Et [d(Sτ )] = Prt [a ∈ Sτ ] d(a) = Prt [z (a) ≤ τ ] d(a).
a a
Similarly, the probability that τ lies in the interval [a, b] is
Z The result of our centering of z at j is that
b
2 2
2 |t| dt = sgn(b)b − sgn(a)a ,
t=a
τ < 0 =⇒ d(S) = min(d(S), d(V − S)), and
τ ≥ 0 =⇒ d(V − S) = min(d(S), d(V − S)).
where 

1 if x > 0
That is, for a < j, a is in the smaller set if τ < 0; and, for a ≥ j, a is in the smaller set if τ ≥ 0.
sgn(x) = 0 if x = 0, and So,


−1 if x < 0. X X
Et [min(d(Sτ ), d(V − Sτ ))] = Pr [z (a) < τ and τ < 0] d(a) + Pr [z (a) > τ and τ ≥ 0] d(a)
Lemma 21.1.4. a<j a≥j
X X X X
Et [w(∂(Sτ ))] = wa,b Prt [(a, b) ∈ ∂(Sτ )] ≤ wa,b |z (a) − z (b)| (|z (a)| + |z (b)|). (21.1) = Pr [z (a) < τ < 0] d(a) + Pr [z (a) > τ ≥ 0] d(a)
(a,b)∈E (a,b)∈E a<j a≥j
X X
= z (a)2 d(a) + z (a)2 d(a)
Proof. An edge (a, b) with z (a) ≤ z (b) is on the boundary of S if a<j a≥j
X
= z (a)2 d(a)
z (a) ≤ τ < z (b).
a

The probability that this happens is = z T Dz .


(
z (a)2 − z (b)2 when sgn(a) = sgn(b),
sgn(z (b))z (b)2 − sgn(z (a))z (a)2 =
z (a)2 + z (b)2 ̸ sgn(b).
when sgn(a) =
Recall that our goal is to prove that
We now show that both of these terms are upper bounded by p
E [w(∂(Sτ ))] ≤ 2ρ E [min(d(Sτ ), d(V − Sτ ))] ,
|z (a) − z (b)| (|z (a)| + |z (b)|).
1
√  and we know that
If this is not immediately clear, note that it is equivalent to assert that E 2ρ min(d(S), d(V − S)) − w(∂(S)) ≥ T
0, which means that there must be some S for which the expression is non-negative.
Et [min(d(Sτ ), d(V − Sτ ))] = z Dz

and that X
Et [w(∂(Sτ ))] ≤ wa,b |z (a) − z (b)| (|z (a)| + |z (b)|).
(a,b)∈E

We may use the Cauchy-Schwartz inequality to upper bound the term above by
s X s X
wa,b (z (a) − z (b))2 wa,b (|z (a)| + |z (b)|)2 . (21.2) Chapter 22
(a,b)∈E (a,b)∈E

We have defined ρ so that the term under the left-hand square root is at most

z T Lz ≤ ρz T Dz . Local Graph Clustering


To bound the right-hand square root, we observe
X X  X
wa,b (|z (a)| + |z (b)|)2 ≤ 2 wa,b z (a)2 + z (b)2 = 2 z (a)2 d(a) = 2z T Dz .
Local graph clustering algorithms discover small clusters of low conductance near a given input
(a,b)∈E (a,b)∈E a
vertex. Imagine that a graph has a cluster S that is not too big–d (S) is small relative to
Putting all these inequalities together yields d (V )–and that has low conductance. Also imagine that we know some vertex a ∈ S. Local
p √ clustering algorithms give us a way of computing a cluster nearby S of similar size and
E [w(∂(S))] ≤ ρz T Dz 2z T Dz conductance. They are not guaranteed to work for all a ∈ S. But, we can show that they work for
p “most” a ∈ S, where we have to measure “most” by weighted degree. In this chapter, we will see
= 2ρz T Dz
p an elegant analysis due to Kwok, Lau and Lee [KLL16] of a random-walk based local graph
= 2ρE [min (d(S), d(V − S))] . clustering algorithm suggested by Spielman and Teng [ST04, ST13]
Most local clustering algorithms can be implemented to run on unweighted graphs in time
depending on d (S), rather than on the size of the graph. This means that they can find the
cluster without having to examine the entire graph! Many of the developments in these
algorithms have improved the running time, the size of the set returned, and the conductance of
the set returned. The end of the chapter contains pointers to major advances in these algorithms.
In this chapter, we focus on proving that we can find a cluster approximately as good as S,
without optimizing parameters or run time.

22.1 The Algorithm

The input to the algorithm is a target set size, s, a conductance bound ϕ, and a seed vertex, a.
We will prove that if G contains a set S with d (S) ≤ s ≤ d (V )/32 and ϕ(S) ≤ ϕ, then there is an
a ∈ S such that when the algorithm is run with these parameters, it will return a set T with
d(T) ≤ 16s and ϕ(T) ≤ √(8 ln(8s) ϕ). For the rest of this chapter we will assume that G does
contain a set S that satisfies these conditions.
Here is the algorithm.

1. Set p 0 = δ a .
2. Set t = 1/2ϕ (we will assume that t is an integer).


3. Set y = D^{-1} W̃^t p_0.

4. Return the set of the form Tτ = {b : y(b) > τ} that has least conductance among those with
d(Tτ) ≤ 8s.

Proof. We will upper bound the probability that the lazy walk leaves S in each step by ϕ(S)/2. In
the first step, the probability that the lazy walk leaves S is exactly the sum over vertices a in S of
the probability the walk begins at a times the probability it follows an edge to a vertex not in S:
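For illustration, here is a dense-matrix sketch of the whole procedure, steps 1 through 4. It is not the local, sublinear-time implementation discussed above, and the test graph and the parameters phi and s are invented for the example.

import numpy as np

def conductance(A, d, S):
    S = set(S)
    cut = sum(A[a, b] for a in S for b in range(len(d)) if b not in S)
    vol = sum(d[a] for a in S)
    return cut / min(vol, d.sum() - vol)

def local_cluster(A, seed, phi, s):
    """Sketch of steps 1-4: t lazy-walk steps from the seed, then a sweep
    over y = D^{-1} p_t.  Dense matrices only, for illustration."""
    d = A.sum(axis=1)
    n = len(d)
    W = A / d                          # walk matrix M D^{-1} (column-stochastic)
    Wlazy = (np.eye(n) + W) / 2
    t = max(1, int(round(1 / (2 * phi))))
    p = np.zeros(n)
    p[seed] = 1.0
    for _ in range(t):
        p = Wlazy @ p
    y = p / d
    order = np.argsort(-y)             # sweep in decreasing order of y
    best, best_set = np.inf, None
    for k in range(1, n):
        S = order[:k]
        if d[S].sum() > 8 * s:
            break
        c = conductance(A, d, S)
        if c < best:
            best, best_set = c, set(S)
    return best_set, best

# Hypothetical test graph: two dense clusters joined by a single edge.
A = np.zeros((12, 12))
for block in (range(6), range(6, 12)):
    for a in block:
        for b in block:
            if a != b:
                A[a, b] = 1
A[5, 6] = A[6, 5] = 1
print(local_cluster(A, seed=0, phi=0.1, s=A[:6].sum()))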
X 1 X wa,b 1 1 X 1 w(∂(S)) 1
p S (a) = wa,b = = ϕ(S).
Recall that the stable distribution of the random walk on a graph is d /(1T d ).
So, to measure 2 d (a) 2 d (S) 2 d (S) 2
a∈S b∼a a∈S
how close a probability distribution p is to the stable distribution, we could ask how close D −1 p b̸∈S b̸∈S

is to being constant. In this chapter, we will measure this by the generalized Rayleigh quotient We now wish to show that in every future step the probability that the lazy walk leaves S is at
p T D −1 LD −1 p most this large. To this end, let p 0 = p S , and define
.
p T D −1 p f p i−1 .
pi = W
−1
When we want to apply Cheeger’s inequality, we will change variables to y = D p. In these
We now show by induction that for every a ∈ V , p i (a) ≤ d (a)/d (S). This is true for p 0 , and in
variables, the above quotient becomes f and
fact the inequality is tight for a ∈ S. To establish the induction, note that all entries of W
y T Ly
. p i−1 are nonnegative. So, the assumption that p i−1 is entrywise at most d (a)/d (S) implies that
y T Dy
for a ∈ S
f p i−1 ≤ δ T W
δ Ta p i = δ Ta W f d /d (S) = δ T d /d (S) = d (a)/d (S).
We will work with the lazy random walk matrix a a

Thus, the probability that the walk transitions from a vertex in S to a vertex not in S at step i
f = 1 (I + W ) = 1 (I + M D −1 ).
W satisfies
2 2 X 1 X wa,b X 1 X wa,b 1
p i (a) ≤ p S (a) = ϕ(S).
2 d (a) 2 d (a) 2
a∈S b∼a a∈S b∼a
b̸∈S b̸∈S
22.2 Good choices for a

We will say that a vertex a ∈ S is good for S if Lemma 22.2.2. The set S contains a good vertex a.
d (a) 1 f t δ a ≥ 1/2.
≥ and 1TS W Proof. After we expand p S in an elementary unit basis as
d (S) 2 |S|
X d (a)
The second inequality says that after t steps the lazy walk that starts at a will be in S with pS = δa,
probability at least 1/2. In this section we show that S contains a good vertex. We will then show d (S)
a∈S
that the local clustering algorithm succeeds if it begins at a good vertex.
Lemma 22.2.1 tells us that
Consider the distribution on vertices that corresponds to choosing a vertex at random from S X d (a)
1T W f t δ a ≤ tϕ(S)/2.
with probability proportional to its degree: d (S) V −S
a∈S
(
def d (a)/d (S), for a ∈ S Define fa to be the indicator1 for the event that
pS =
0, otherwise. t
f δ a > tϕ(S)
1TV −S W

The following lemma says that if we start a walk from a random vertex in S chosen with and let ba be the indicator for the event that
probability proportional to degree, then the probability it is outside S on the tth step of the lazy
walk is at most tϕ(S)/2. d (a) 1
< .
d (S) 2 |S|
t
f p S . Then
Lemma 22.2.1. Let S be a set with d (S) ≤ d (V )/2. Let p t = W 1
That is, fa = 1 if the event holds, and fa = 0 otherwise.
1TV −S p t ≤ tϕ(S)/2.

By an application of what is essentially Markov’s inequality, we conclude 22.4 Bounding the Generalized Rayleigh Quotient
X d (a) 1
fa < . The following lemma allows us to measure how close a walk is to convergence merely in terms of
d (S) 2
a∈S the quadratic form p Tt D −1 p t and the number of steps t.
As t
f p 0 for some probability vector p 0 . Then
X d (a) X 1 X 1 Lemma 22.4.1. Let p t = W
ba < ba ≤ = 1/2.
d (S) 2 |S| 2 |S|  
a∈S a∈S a∈S p Tt D −1 LD −1 p t 1 p 0 D −1 p 0
T −1 ≤ ln −1 .
Thus, there is a vertex for which neither fa nor ba hold. As pt D pt t p tD p t

f t δ a = 1 − 1T W
1TS W f tδa The proof of Lemma 22.4.1 rests on the following standard inequality.
V −S

and tϕ(S) ≤ 1/2, such a vertex is good. Theorem 22.4.2. [Power Means Inequality] For k > h > 0, nonnegative numbers w1 , . . . , wn that
sum to 1, and nonnegative numbers λ1 , . . . , λn ,
By slightly loosening the constants in the definition of “good”, we could prove that most vertices !1/k !1/h
n
X n
X
of S are good, where “most” is defined by sampling with probability proportional to degree.
wi λki ≥ wi λhi
i=1 i=1

22.3 Bounding the D-norm Proof of Lemma 22.4.1. Define


z t = D −1/2 p t ,
Claim 22.3.1. For a probability vector p,
so
(1T p)2 p Tt D −1 LD −1 p t = z Tt D −1/2 LD −1/2 z t = z Tt N z t , and p Tt D −1 p t = z Tt z t .
T −1
p D p≥ S .
d (S) Write z 0 = D −1/2 p 0 in the eigenbasis of N as
X
Proof. Write X Xp   z0 = ci ψ i ,
p
1TS p = p(a) = d (a) p(a)/ d (a) i
a∈S a∈S
and set
and apply the Cauchy-Schwartz inequality to conclude 1 1
γ=P 2 =
! ! ! ! i ci z T0 z 0
2 X X X X P 2
T
1S p ≤ d (a) 2
p(a) /d (a) ≤ d (a) p(a) /d (a) = d (S)p T D −1 p.
2 so that i γci = 1. We have
a∈S a∈S a∈S a  t
f t p 0 = D −1/2 W
z t = D −1/2 W f t D 1/2 z 0 = D −1/2 W
f D 1/2 z 0 .

Recall from Chapter 10 that


If a is good for S, then 1TS p t ≥ 1/2, and so f D 1/2 = I − 1 N ,
D −1/2 W
2
1
p Tt D −1 p t ≥ . and that the eigenvalues of these matrices are related by
4d (S)
νi = 2 − 2ωi .

Thus, X
zt = ci ωit ψ i ,
i

and X X X 22.5 Rounding


z Tt N z t = c2i νi ωi2t ≤ 2 c2i ωi2t − 2 c2i ωi2t+1 .
i i i def
To apply Cheeger’s inequality, Theorem 21.1.3, we first change variables from p t to y = D −1 p t .
Thus,
As 1T p t = 1, the vector y satisfies d T y = 1, and
P P
γz Tt N z t 2 2 2t −
i γci ωP 2 i γc2i ωi2t+1 .
= i
2 2t p Tt D −1 LD −1 p t y T Ly
γz Tt z t i γci ωi T −1 = T .
P 2 2t+1 pt D pt y Dy
γc ω
= 2 − 2 Pi i 2 i 2t .
i γci ωi So that we can be sure that the algorithm underlying Theorem 21.1.3 will find a set T that is not
P 2 too big, we will round to zero all the small entries of y and call the result x . While this is not
To upper bound this last term, we recall that i γci = 1 and apply the Power Means Inequality
to show necessary for the algorithm, it does facilitate analysis.
!1/(2t+1) !1/(2t) Define
X X
2 2t+1 2 2t x (a) = max(0, y (a) − 1/16s). (22.1)
γci ωi ≥ γci ωi =⇒
i i
! !1+1/(2t) If s ≤ d (V )/32, then x will be balanced with respect to d . This is because at most half its entries
X X
γc2i ωi2t+1 ≥ γc2i ωi2t =⇒ (measured by degree) will be positive. Formally,
i i X X X X
P 2 2t+1 !1/(2t) d (a) = d (a) < 16sp t (a) ≤ 16sp t (a) ≤ 16s.
γc ω X
a
Pi i 2 i 2t ≥ γc2i ωi2t . a:y (a)>1/16s a:p t (a)>d (a)/16s a:p t (a)>d (a)/16s
i γci ωi i
As Cheeger’s inequality will produce a set of the form
This implies
Tτ = {a : y (a) > τ } ,
P 2 2t+1 !1/2t
γc ω X
2 − 2 Pi i 2 i 2t ≤ 2 − 2 γc2i ωi2t this set will satisify d (Tτ ) ≤ 16s.
i γci ωi i
 T 1/2t Lemma 22.5.1. Let y be a vector such that d T y = 1 and define the vector x by
zt zt
=2−2 .
z T0 z 0 x (a) = max(0, y (a) − ϵ).

To finish the proof, let R =


zT
t zt
, and note that for all R Then, x T Dx ≥ y T Dy − 2ϵ.
zT
0 z0

R1/2t = exp(− ln(1/R)/2t) ≥ 1 − ln(1/R)/2t. Proof. We observe that for every number y and ϵ,

So, max(0, y − ϵ)2 ≥ y 2 − 2yϵ :


 1/2t  
z Tt z t 1 z T0 z 0
2−2 ≤ 2 − 2 (1 − ln(1/R)/2t) = ln(1/R)/t = ln . If y ≤ ϵ then y 2 − 2yϵ < 0, and for y > ϵ, (y − ϵ)2 = y 2 − 2y + ϵ2 . Thus,
z T0 z 0 t z Tt z t X
x T Dx = d (a) max(0, y (a) − ϵ)2
a
X
For a vertex a that is good for S, ≥ d (a)y (a)2 − 2y (a)ϵ
a
p Tt D −1 p t 1 d (a) 1 X
≥ ≥ ≥ ; = y T Dy − 2ϵ d (a)y (a)
p T0 D −1 p 0 4d (S)p 0 D −1 p 0 4d (S) 8 |S| a
so = y T Dy − 2ϵ.
p Tt D −1 LD −1 p t ln(8 |S|)
≤ ≤ 2ϕ ln(8 |S|).
p Tt D −1 p t t

If a is good for S and p0 = δa, then

x^T D x ≥ y^T D y − 1/(8 d(S)) ≥ y^T D y / 2.

Moreover, as shifting y and rounding entries to zero cannot increase the length of any edge,

x^T L x ≤ y^T L y.

Together these imply

x^T L x / x^T D x ≤ 2 · y^T L y / y^T D y ≤ 4 ln(8 |S|) ϕ.

As x is balanced with respect to d, we may apply Cheeger's inequality to obtain a set T of
conductance at most

√(8 ln(8 |S|) ϕ).

22.6 Notes

Explain where these come from, and give some references to where they are used in practice.

Chapter 23

Spectral Partitioning in a Stochastic Block Model

In this chapter, we show how eigenvectors can be used to partition graphs drawn from certain
natural models. These are called stochastic block models or planted partition models, depending
on community and application.
The simplest model of this form is for the graph bisection problem. This is the problem of
partitioning the vertices of a graph into two equal-sized sets while minimizing the number of
edges bridging the sets. To create an instance of the planted bisection problem, we first choose a
partition of the vertices into equal-sized sets X and Y . We then choose probabilities p > q, and
place edges between vertices with the following probabilities:


p if u ∈ X and v ∈ X
Pr [(u, v) ∈ E] = p if u ∈ Y and v ∈ Y


q otherwise.

The expected number of edges crossing between X and Y will be q |X| |Y |. If p is sufficiently

larger than q, for example if p = 1/2 and q = p − 24/√n, we will show that the partition can be
approximately recovered from the second eigenvector of the adjacency matrix of the graph. The
result, of course, extends to other values of p and q. This will be a crude version of an analysis of
McSherry [McS01].
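A rough illustration of the recovery: the snippet below samples a planted bisection (the size n and the gap 12/√n are chosen only so the example runs quickly), computes the eigenvector of the second-largest eigenvalue of the adjacency matrix, and counts how many vertices the sign of that eigenvector misclassifies.

import numpy as np

rng = np.random.default_rng(0)

n = 1000
p, q = 0.5, 0.5 - 12 / np.sqrt(n)
X, Y = np.arange(n // 2), np.arange(n // 2, n)

# Sample the planted bisection: edges inside X or Y appear with probability p,
# edges between X and Y with probability q.
prob = np.full((n, n), q)
prob[np.ix_(X, X)] = p
prob[np.ix_(Y, Y)] = p
M = np.triu((rng.random((n, n)) < prob).astype(float), 1)
M = M + M.T                            # symmetric, zero diagonal

# Partition according to the sign of the second eigenvector.
vals, vecs = np.linalg.eigh(M)
psi2 = vecs[:, -2]                     # eigenvector of the second-largest eigenvalue
guess = psi2 < 0

true_side = np.zeros(n, dtype=bool)
true_side[:n // 2] = True              # membership in X
d = np.sum(guess != true_side)
print(min(d, n - d))                   # misclassified vertices: a small fraction of n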
If p is too close to q, then the partition given by X and Y will not be the smallest. For example,

if q = p − ϵ/√n for small ϵ then one cannot hope to distinguish between X and Y .
McSherry analyzed more general models than this, including planted coloring problems, and
sharp results have been obtained in a rich line of work. See, for example,
[MNS14, DKMZ11, BLM15, Mas14, Vu14].
McSherry’s analysis treats the adjacency matrix of the generated graph as a perturbation of one
ideal probability matrix. In the probability matrix the second eigenvector provides a clean
partition of the two blocks. McSherry shows that the difference between the generated matrix and


the ideal one is small, and so the generated matrix can be viewed as a small perturbation of the
ideal one. He then uses matrix perturbation theory to show that the second eigenvector of the
generated matrix will probably be close to the second eigenvector of the original, and so it reveals
the partition. The idea of using perturbation theory to analyze random objects generated from
nice models has been very powerful.

Warning: stochastic block models have been the focus of a lot of research lately, and there are
now very good algorithms for solving problems on graphs generated from these models. But,
these are just models and very little real data resembles that produced by these models. So, there
is no reason to believe that algorithms that are optimized for these models will be useful in
practice. Nevertheless, some of them are.

23.1 The Perturbation Approach

As long as we don't tell our algorithm, we can choose X = {1, . . . , n/2} and
Y = {n/2 + 1, . . . , n}. Let's do this for simplicity.

Define the matrix

    A = [ pJ_{n/2}   qJ_{n/2} ]
        [ qJ_{n/2}   pJ_{n/2} ]  − pI_n,

where we write J_{n/2} for the square all-1s matrix of size n/2. That is, A(u, v) = p when u and v
lie in the same block, A(u, v) = q when they lie in different blocks, and A(u, u) = 0.

The adjacency matrix of the planted partition graph is obtained by setting M(a, b) = 1 with
probability A(a, b), subject to M(a, b) = M(b, a) and M(a, a) = 0. So, this is a random graph,
but the probabilities of some edges are different from others.

We will study a very simple algorithm for finding an approximation of the planted bisection:
compute ψ2, the eigenvector of the second-largest eigenvalue of M. Then, set
S = {a : ψ2(a) < 0}. We guess that S is one of the sets in the bisection. We will show that under
reasonable conditions on p and q, S will be mostly right. For example, we might consider p = 1/2
and q = 1/2 − 12/√n. Intuitively, the reason this works is that M is a slight perturbation of A,
and so the eigenvectors of M should look like the eigenvectors of A.

To simplify some formulas, we henceforth work with

    M̂ = M + pI   and   Â = A + pI.

Note that the eigenvectors of M̂ and M are the same, and so are those of Â and A. So
considering M̂ and Â won't change our analysis at all. The matrix Â is convenient because it has
rank 2. We now consider the difference between M̂ and Â:

    R = M̂ − Â = M − A.

To see why this would be useful, let's look at the eigenvectors of Â. Of course, the constant
vectors are eigenvectors of Â. We have

    Â1 = (n/2)(p + q)1,

and so the corresponding eigenvalue is

    α1 = (n/2)(p + q).

The second eigenvector of Â has two values: one on X and one on Y. Let's be careful to make
this a unit vector. We take

    ϕ2(a) = 1/√n for a ∈ X,   and   ϕ2(a) = −1/√n for a ∈ Y.

Then,

    Âϕ2 = (n/2)(p − q)ϕ2,

and the corresponding eigenvalue is

    α2 = (n/2)(p − q).

As Â has rank 2, all the other eigenvalues of Â are zero.

For (a, b) in the same component,

    Pr [R(a, b) = 1 − p] = p   and   Pr [R(a, b) = −p] = 1 − p,

and for (a, b) in different components,

    Pr [R(a, b) = 1 − q] = q   and   Pr [R(a, b) = −q] = 1 − q.

We can use bounds similar to those proved in Chapter 8 to show that it is unlikely that R has
large norm. The bounds that we proved on the norm of a matrix in which entries are chosen from
{1 − p, −p} apply equally well if each entry (a, b) is chosen from {1 − q_{a,b}, −q_{a,b}}, as long as
q_{a,b} < p and all have expectation 0, because (8.2) still applies. For a sharp result, we appeal to a
theorem of Vu [Vu07, Theorem 1.4], which implies the following.

Theorem 23.1.1. There exist constants c1 and c2 such that with probability approaching 1,

    ∥R∥ ≤ 2√(p(1 − p)n) + c1 (p(1 − p)n)^{1/4} ln n,

provided that

    p ≥ c2 ln^4 n / n.
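As a quick numerical companion to this bound (a sketch of our own that reuses the hypothetical
planted_bisection helper from the earlier code sample; nothing here comes from the text), one can
compare the operator norm of R = M − A with the scale √(p(1 − p)n):

using LinearAlgebra, Random

Random.seed!(2)
n = 1000; p = 1/2; q = p - 24 / sqrt(n)
A = [ (a == b ? 0.0 : ((a <= n ÷ 2) == (b <= n ÷ 2) ? p : q)) for a in 1:n, b in 1:n ]
M = planted_bisection(n, p, q)       # helper defined in the earlier sketch
R = M - A
println(opnorm(R))                   # operator (spectral) norm of the perturbation
println(2 * sqrt(p * (1 - p) * n))   # leading term in Theorem 23.1.1
println(3 * sqrt(p * n))             # the cruder bound stated in Corollary 23.1.2 below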

We use a crude corollary of this result. and let θ be the angle between them. For every vertex a that is misclassified by ψ 2 , we have
|δ(a)| ≥ √1n . So, if ψ 2 misclassifies k vertices, then
Corollary 23.1.2. There exists a constant c0 such that with probability approaching 1,
r
√ k
∥R∥ ≤ 3 pn, ∥δ∥ ≥ .
n
provided that
As ϕ2 and ψ 2 are unit vectors, we may apply the crude inequality
ln4 n
p ≥ c0 . √
n ∥δ∥ ≤ 2 sin θ

In fact, Alon, Krivelevich and Vu [AKV02] prove that the probability that the norm of R exceeds (the 2 disappears as θ gets small).
this value by more than t is exponentially small in t. However, we will not need that fact for this
lecture. To combine this with the perturbation bound, we assume q > p/3, and find
n
min |α2 − αj | = (p − q).
j̸=2 2
23.2 Perturbation Theory for Eigenvectors √
Assuming that ∥R∥ ≤ 3 pn, we find
c , and let α1 > α2 > 0 = α3 = · · · = αn be the
Let µ1 ≥ µ2 ≥ · · · ≥ µn be the eigenvalues of M √ √
2 ∥R∥ 2 · 3 pn 12 p
b Weyl’s inequality, which one can prove using the Courant-Fischer theorem, says
eigenvalues of A. sin θ ≤ n ≤ n =√ .
2 (p− q) 2 (p − q) n(p − q)
that
|µi − αi | ≤ ∥R∥ . (23.1) So, the number k of misclassified vertices satisfies
r √ √
So, we can view µ2 as a perturbation of α2 . We need a stronger fact, which is that we can view k 2 · 12 p
≤√ ,
ψ 2 as a perturbation of ϕ2 . n n(p − q)
The Davis-Kahan theorem [DK70] says that ψ 2 will be close to ϕ2 , in angle, if the norm of R is which implies
b That is, the
significantly less than the distance between α2 and the other eigenvalues of A. 288p
k≤ .
eigenvector does not move too much if its corresponding eigenvalue is isolated. (p − q)2
So, we expect to misclassify at most a constant number of vertices if p and q remain constant as n
Theorem 23.2.1. Let A and B be symmetric matrices. Let R = A − B. Let α1 ≥ · · · ≥ αn be √
grows large. An interesting case to consider is p = 1/2 and q = p − 24/ n. This gives
the eigenvalues of A with corresponding eigenvectors ϕ1 , . . . , ϕn , and let β1 ≥ · · · ≥ βn be the
eigenvalues of B with corresponding eigenvectors ψ 1 , . . . , ψ n . Let θi be the angle between ±ψ i 288p n
and ±ϕi . Then, = ,
(p − q)2 4
    sin 2θi ≤ 2 ∥R∥ / minj̸=i |αi − αj | .
most of the vertices correct is should be possible to use them to better classify the rest. Many of
The angle is never more than π/2, because this theorem is bounding the angle between the the advances in the study of algorithms for this problem involve better and more rigorous ways of
eigenspaces rather than a particular choice of eigenvectors. We will prove and use a slightly doing this.
weaker statement in which we replace 2θ with θ.

23.4 Proof of the Davis-Kahan Theorem


23.3 Partitioning
For simplicity, we will prove a statement that is weaker by a factor of 2.
Consider
δ = ψ 2 − ϕ2 , Proof of a weak version of Theorem 23.2.1. By considering the matrices A − αi I and B − αi I
instead of A and B, we can assume that αi = 0. As the theorem is vacuous if αi has multiplicity

more than 1, we may also assume that αi has multiplicity 1 as an eigenvalue, and that ψ i is a 23.5 Further Reading
unit vector in the nullspace of B.
Our assumption that αi = 0 also leads to |βi | ≤ ∥R∥ by Weyl’s inequality (23.1). If you would like to know more about bounding norms and eigenvalues of random matrices, I
recommend [Ver10] and [Tro12].
Expand ψ i in the eigenbasis of A, as
X
ψi = cj ϕj , where cj = ϕTj ψ i .
j

Setting
δ = min |αj | ,
j̸=i

we compute
X
∥Aψ i ∥2 = c2j αj2
j
X
≥ c2j δ 2
j̸=i
X
= δ2 c2j
j̸=i

= δ 2 (1 − c2i )
= δ 2 sin2 θi .

On the other hand,

∥Aψ i ∥ = ∥(B + R)ψ i ∥ ≤ ∥Bψ i ∥ + ∥Rψ i ∥ = βi + ∥Rψ i ∥ ≤ 2 ∥R∥ .

So,
2 ∥R∥
sin θi ≤ .
δ

It may seem surprising that the amount by which eigenvectors move depends upon how close
their respective eigenvalues are to the other eigenvalues. However, this dependence is necessary.
To see why, first consider the matrices
   
1+ϵ 0 1 0
and .
0 1 0 1+ϵ
   
1 0
While these two matrices are very close, their leading eigenvectors are and , which are
0 1
90 degrees from each other.
The heart of the problem is that there is no unique eigenvector of an eigenvalue that has
multiplicity greater than 1.
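This instability is easy to see numerically. The following small Julia check (ours, not from the
text) compares the leading eigenvectors of the two nearby matrices above:

using LinearAlgebra

ϵ = 1e-6
A = [1+ϵ 0.0; 0.0 1.0]
B = [1.0 0.0; 0.0 1+ϵ]
phi = eigen(Symmetric(A)).vectors[:, end]   # leading eigenvector of A
psi = eigen(Symmetric(B)).vectors[:, end]   # leading eigenvector of B
println(opnorm(A - B))                      # the matrices differ only by ϵ
println(abs(dot(phi, psi)))                 # but the leading eigenvectors are orthogonal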
Chapter 24

Nodal Domains

24.1 Overview

The goal of this section is to rigorously explain some of the behavior we observed when using
eigenvectors to draw graphs in the introduction. First, recall some of the drawings we made of
graphs:

[Figure: two of the graph drawings from the introduction.]

These images were formed by computing the eigenvectors corresponding to the second and third
smallest eigenvalues of the Laplacian for each graph, ψ2 and ψ3, and then using ψ2 to assign a
horizontal coordinate to each vertex and ψ3 to assign a vertical coordinate. The edges are drawn
as straight lines between their endpoints.

We will show that the subgraphs obtained in the right and left halves of each image are connected.

Path graphs exhibited more interesting behavior: their kth eigenvector changes sign k times:

[Figure: eigenvectors v2, v3, v4 and v10 of a path graph, plotted as "Value in Eigenvector" against "Vertex Number".]

This remains true even when the edges of the path graphs have weights. Here are the analogous
plots for a path graph with edge weights randomly chosen in [0, 1]:

[Figure: eigenvectors v2, v3, v4 and v11 of a weighted path graph, plotted as "Value in Eigenvector" against "Vertex Number".]

Here are the first few eigenvectors of another:

[Figure: eigenvectors v2, v3, v4 of another randomly weighted path graph, plotted as "Value in Eigenvector" against "Vertex Number". It was produced by the following code.]

using Random, SparseArrays, LinearAlgebra, Plots
# lap(M) forms the Laplacian of the weighted adjacency matrix M; it is provided by the
# Laplacians.jl package used with these notes (equivalently,
# L = Diagonal(vec(sum(M, dims=2))) - M).
Random.seed!(1)
M = spdiagm(1=>rand(10))
M = M + M'
L = lap(M)
E = eigen(Matrix(L))
Plots.plot(E.vectors[:,2],label="v2",marker = 5)
Plots.plot!(E.vectors[:,3],label="v3",marker = 5)
Plots.plot!(E.vectors[:,4],label="v4",marker = 5)
xlabel!("Vertex Number")
ylabel!("Value in Eigenvector")
savefig("rpath2v24.pdf")

We see that the kth eigenvector still changes sign k times. We will prove that this always
happens. These are some of Fiedler's theorems about "nodal domains". Nodal domains are the
connected parts of a graph on which an eigenvector is negative or positive.

24.2 Sylvester's Law of Inertia

Let's begin with something obvious.

Claim 24.2.1. If A is positive semidefinite, then so is B^T A B for every matrix B.

Proof. For every vector x,

    x^T B^T A B x = (Bx)^T A (Bx) ≥ 0,

since A is positive semidefinite.

We will make use of Sylvester's Law of Inertia, which is a powerful generalization of this fact. I
will state and prove it now.

Theorem 24.2.2 (Sylvester's Law of Inertia). Let M be any symmetric matrix and let B be any
non-singular matrix. Then, the matrix B M B^T has the same number of positive, negative and
zero eigenvalues as M.

Note that if the matrix B were orthogonal, or if we used B^{-1} in place of B^T, then these matrices
would have the same eigenvalues. What we are doing here is different, and corresponds to a
change of variables.

Proof. It is clear that M and B M B^T have the same rank, and thus the same number of zero
eigenvalues.

We will prove that M has at least as many positive eigenvalues as B M B^T. One can similarly
prove that M has at least as many negative eigenvalues, which proves the theorem.

Let γ1 ≥ · · · ≥ γk be the positive eigenvalues of B M B^T and let Yk be the span of the
corresponding eigenvectors. Now, let Sk be the span of the vectors B^T y, for y ∈ Yk. As B is
non-singular, Sk has dimension k. Let α1 ≥ · · · ≥ αn be the eigenvalues of M. By the
Courant-Fischer Theorem, we have

    αk =  max_{S ⊆ IR^n, dim(S)=k}  min_{x ∈ S}  x^T M x / x^T x
       ≥  min_{x ∈ Sk}  x^T M x / x^T x
       =  min_{y ∈ Yk}  y^T B M B^T y / y^T B B^T y
       ≥  γk y^T y / y^T B B^T y  >  0.

So, M has at least k positive eigenvalues. (The point here is that the denominators are always
positive, so we only need to think about the numerators.)

To finish, either apply the symmetric argument to the negative eigenvalues, or apply the same
argument with B^{-1} to reverse the roles of M and B M B^T.

24.3 Weighted Trees

We will now prove a slight simplification of a theorem of Fiedler [Fie75a].

Theorem 24.3.1. Let T be a weighted tree graph on n vertices, let L_T have eigenvalues
0 = λ1 < λ2 ≤ · · · ≤ λn, and let ψk be an eigenvector of λk. If there is no vertex a for which
ψk(a) = 0, then there are exactly k − 1 edges (a, b) for which ψk(a)ψk(b) < 0.

One can extend this theorem to accommodate zero entries. We will just prove this theorem for
weighted path graphs, which are a special case of weighted trees. At the beginning of this section,
we plotted the eigenvectors of some weighted paths by using the index of a vertex along the path
as the horizontal coordinate, and the value of the eigenvector at that vertex as the vertical
coordinate. When we draw the edges as straight lines, the number of sign changes equals the
number of times the plot crosses the horizontal axis.

Our analysis will rest on an understanding of Laplacians of paths that are allowed to have
negative edge weights.
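The lemma that follows asserts that such a Laplacian has exactly as many negative eigenvalues as
it has negative edge weights. Here is a small numerical illustration (our own sketch, not from the
text; the helper name and the weights are ours):

using LinearAlgebra

# Laplacian of a path with given edge weights, some of which may be negative.
function path_laplacian(w)
    n = length(w) + 1
    L = zeros(n, n)
    for a in 1:length(w)
        L[a, a]     += w[a]
        L[a+1, a+1] += w[a]
        L[a, a+1]   -= w[a]
        L[a+1, a]   -= w[a]
    end
    return L
end

w = [1.0, -0.5, 2.0, -1.5, 0.7]                     # two negative edge weights
L = path_laplacian(w)
negeigs = count(x -> x < -1e-8, eigvals(Symmetric(L)))  # tolerance excludes the zero eigenvalue
println(negeigs)                                    # prints 2, matching the negative weights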

Lemma 24.3.2. Let M be the Laplacian matrix of a weighted path that can have negative edge I claim that X
weights: X X = wa,b ψ k (a)ψ k (b)La,b .
M = wa,a+1 La,a+1 , (a,b)∈E
1≤a<n
To see this, first check that this agrees with the previous definition on the off-diagonal entries. To
where the weights wa,a+1 are non-zero and we recall that La,b is the Laplacian of the edge (a, b). verify that these expressions agree on the diagonal entries, we will show that the sum of the
The number of negative eigenvalues of M equals the number of negative edge weights. entries in each row of both expressions agree. In fact, they are zero. As we know that all the
off-diagonal entries agree, this will imply that the diagonal entries agree. We compute
Note that this is also true for weighted trees.
Ψk (LP − λk I )Ψk 1 = Ψk (LP − λk I )ψ k = Ψk (λk ψ k − λk ψ k ) = 0.
Proof. Note that X As Lu,v 1 = 0 and X 1 = 0, the row sums agree. Lemma 24.3.2 now tells us that the matrix X ,
xTM x = wu,v (x (u) − x (v))2 . and thus LP − λk I , has as many negative eigenvalues as there are edges (a, b) for which
(u,v)∈E ψ k (a)ψ k (b) < 0.
We now perform a change of variables that will diagonalize the matrix M . Let δ(1) = x (1), and
for every a > 1 let δ(a) = x (a) − x (a − 1).
24.4 The Perron-Frobenius Theorem for Laplacians
Every variable x (1), . . . , x (n) can be expressed as a linear combination of the variables
δ(1), . . . , δ(n). In particular,
In Theorem 4.5.1 we stated the Perron-Frobenius Theorem for non-negative matrices. I wish to
x (a) = δ(1) + δ(2) + · · · + δ(a). quickly observe that this theory may also be applied to Laplacian matrices, to principal
sub-matrices of Laplacian matrices, and to any matrix with non-positive off-diagonal entries. The
So, there is a square matrix B of full rank such that difference is that it then involves the eigenvector of the smallest eigenvalue, rather than the
largest eigenvalue.
x = Bδ.
Corollary 24.4.1. Let M be a matrix with non-positive off-diagonal entries, such that the graph
By Sylvester’s law of inertia, we know that of the non-zero off-diagonally entries is connected. Let λ1 be the smallest eigenvalue of M and let
v 1 be the corresponding eigenvector. Then v 1 may be taken to be strictly positive, and λ1 has
BT M B multiplicity 1.
has the same number of positive, negative, and zero eigenvalues as M . On the other hand,
Proof. Consider the matrix B = σI − M , for some large σ. For σ sufficiently large, this matrix
X
δ T B T M Bδ = wa,a+1 (δ(a + 1))2 . will be non-negative, and the graph of its non-zero entries is connected. So, we may apply the
1≤a<n Perron-Frobenius theory to B to conclude that its largest eigenvalue α1 has multiplicity 1, and
the corresponding eigenvector v 1 may be assumed to be strictly positive. We then have
So, this matrix clearly has one zero eigenvalue, and as many negative eigenvalues as there are λ1 = σ − α1 , and v 1 is an eigenvector of λ1 .
negative wa,a+1 .

Proof of Theorem 24.3.1. We assume that λk has multiplicity 1. One can prove it, but 24.5 Fiedler’s Nodal Domain Theorem
we will skip it.
Let Ψk denote the diagonal matrix with ψ k on the diagonal, and let λk be the corresponding Given a graph G = (V, E) and a subset of vertices, W ⊆ V , recall that the graph induced by G on
eigenvalue. Consider the matrix W is the graph with vertex set W and edge set
X = Ψk (LP − λk I )Ψk .
{(i, j) ∈ E, i ∈ W and j ∈ W } .
The matrix LP − λk I has one zero eigenvalue and k − 1 negative eigenvalues. As we have
assumed that ψ k has no zero entries, Ψk is non-singular, and so we may apply Sylvester’s Law of This graph is sometimes denoted G(W ).
Inertia to show that the same is true of X .

Theorem 24.5.1 ([Fie75b]). Let G = (V, E, w) be a weighted connected graph, and let L be its and so the smallest eigenvalue of B i is less than λk . On the other hand, if x i has any zero entries,
Laplacian matrix. Let 0 = λ1 < λ2 ≤ · · · ≤ λn be the eigenvalues of LG and let ψ 1 , . . . , ψ n be the then the Perron-Frobenius theorem tells us that x i cannot be the eigenvector of B i of smallest
corresponding eigenvectors. For any k ≥ 2 and any t ≤ 0, let eigenvalue, and so the smallest eigenvalue of B i is less than λk . Thus, the matrix
 
Wk = {i ∈ V : ψ k (i) ≥ t} . B1 0 · · · 0
 0 B2 · · · 0 
Then, the graph induced by G on Wk has at most k − 1 connected components.  
 .. .. .. .. 
 . . . . 
Proof. We prove this theorem for the case that t = 0. Some additional work is needed to handle 0 0 ··· Bc
the general case.
has at least c eigenvalues less than λk . By the eigenvalue interlacing theorem, this implies that L
To see that Wk is non-empty, recall that ψ 1 = 1 and that ψ k is orthogonal ψ 1 . So, ψ k must have has at least c eigenvalues less than λk . We may conclude that c, the number of connected
both positive and negative entries. components of G(Wk ), is at most k − 1.
Assume that G(Wk ) has c connected components. After re-ordering the vertices so that the
vertices in each connected component of G(Wk ) are contiguous, we may assume that L and ψ k This theorem breaks down if we instead consider the set
have the forms    
B1 0 0 ··· C1 x1 W = {i : ψ k (i) > 0} .
 0 B 2 0 · · · C 2  x 2 
   The star graphs provide counter-examples.
 ..  ψ =  ..  ,
L =  ... ..
.
..
.
..
. .  k  . 
   
 0 0 · · · B c C c x c  1
T T T
C1 C2 ··· Cc D y
0 −3
and     
B1 0 0 ··· C1 x1 x1 1
 0 B2 0 ···   
C 2  x 2  x 2 
  
 .. .. .. .. ..   ..  = λ  ..  .
 . . . . .    k . 
  .   
 0 0 ··· Bc C c  x c  x c  1
C T1 C T2 ··· C Tc D y y
The first c sets of rows and columns correspond to the c connected components. So, x i ≥ t for
1 ≤ i ≤ c and y < t ≤ 0 (when I write this for a vector, I mean it holds for each entry). We also Figure 24.1: The star graph on 5 vertices, with an eigenvector of λ2 = 1.
know that the graph of non-zero entries in each B i is connected, and that each C i is non-positive
and has at least one negative entry (otherwise the graph G would be disconnected).
We will now prove that the smallest eigenvalue of each B i is smaller than λk . We know that
B i x i + C i y = λk x i .
As each entry in C i is non-positive and y is strictly negative, each entry of C i y is non-negative.
As each C i has at least one negative entry, some entry of C i y is positive. This implies that x i
cannot be the zero vector. As we assumed that x i ≥ t = 0, we can multiply the equation
B i x i = λk x i − C i y ≤ λk x i
by x i to get
x Ti B i x i ≤ λk x Ti x i .
If x i is strictly positive, then x Ti C i y > 0, and this inequality is strict:
x Ti B i x i = λk x Ti x i − x Ti C i y < λk x Ti x i ,

25.2 Geometric Embeddings

We typically upper bound λ2 by evidencing a test vector. Here, we will upper bound λ2 by
evidencing a test embedding. The bound we apply is:

Lemma 25.2.1. For any d ≥ 1,


Chapter 25 P
∥v i − v j ∥2
(i,j)∈E
λ2 = min P P 2 (25.1)
v 1 ,...,v n ∈IRd : v i =0 i ∥v i ∥ .

The Second Eigenvalue of Planar Proof. Let v i = (xi , yi , . . . , zi ). We note that


X X X X
Graphs ∥v i − v j ∥2 =
(i,j)∈E (i,j)∈E
(xi − xj )2 + (yi − yj )2 + · · · +
(i,j)∈E
(zi − zj )2 .
(i,j)∈E

Similarly, X X X X
∥v i ∥2 = x2i + yi2 + · · · + zi2 .
This Chapter Needs Editing i i i i

It is now trivial to show that λ2 ≥ RHS: just let xi = yi = · · · = zi be given by an eigenvector of


λ2 . To show that λ2 ≤ RHS, we apply  my favorite inequality:
P
A+B+···+C A B C
25.1 Overview A′ +B ′ +···+C ′ ≥ min A′ , B ′ , . . . , C ′ , and then recall that xi = 0 implies
P
(i,j)∈E (xi − xj )2
Spectral Graph theory first came to the attention of many because of the success of using the P 2 ≥ λ2 .
second Laplacian eigenvector to partition planar graphs and scientific meshes i xi
[DH72, DH73, Bar82, PSL90, Sim91].
In this lecture, we will attempt to explain this success by proving, at least for planar graphs, that
the second smallest Laplacian eigenvalue is small. One can then use Cheeger’s inequality to prove For an example, consider the natural embedding of the square with corners (±1, ±1).
that the corresponding eigenvector provides a good cut.
The key to applying this embedding lemma is to obtain the right embedding of a planar graph.
This was already known for the model case of a 2-dimensional grid. If the grid is of size Usually, the right embedding of a planar graph is given by Koebe’s embedding theorem, which I
√ √
n-by- n, then it has λ2 ≈ c/n. Cheeger’s inequality then tells us that it has a cut of will now explain. I begin by considering one way of generating planar graphs. Consider a set of

conductance c/ n. And, this is in fact the cut that goes right accross the middle of one of the circles {C1 , . . . , Cn } in the plane such that no pair of circles intersects in their interiors. Associate
axes, which is the cut of minimum conductance. a vertex with each circle, and create an edge between each pair of circles that meet at a boundary.
See Figure 25.2. The resulting graph is clearly planar. Koebe’s embedding theorem says that
Theorem 25.1.1 ([ST07]). Let G be a planar graph with n vertices of maximum degree d, and let
every planar graph results from such an embedding.
λ2 be the second-smallest eigenvalue of its Laplacian. Then,
Theorem 25.2.2 (Koebe). Let G = (V, E) be a planar graph. Then there exists a set of circles
8d
λ2 ≤ . {C1 , . . . , Cn } in IR2 that are interior-disjoint such that circle Ci touches circle Cj if and only if
n (i, j) ∈ E.
The proof will involve almost no calculation, but will use some special properties of planar
graphs. However, this proof has been generalized to many planar-like graphs, including the This is an amazing theorem, which I won’t prove today. You can find a beautiful proof in the
graphs of well-shaped 3d meshes. book “Combinatorial Geometry” by Agarwal and Pach.
Such an embedding is often called a kissing disk embedding of the graph. From a kissing disk
embedding, we obtain a natural choice of v i : the center of disk Ci . Let ri denote the radius of


(a) Circles in the plane (b) Circles with their


intersection graph

Figure 25.1: Stereographic Projection.


this disk. We now have an easy upper bound on the numerator of (25.1):
∥v i − v j ∥2 = (ri + rj )2 ≤ 2ri2 + 2rj2 . On the other hand, it is trickier to obtain a lower bound on
P
∥v i ∥2 . In fact, there are graphs whose kissing disk embeddings result in

(25.1) = Θ(1).

These graphs come from triangles inside triangles inside triangles. . . Such a graph is depicted
below:

Graph

Figure 25.2: Stereographic Projection.

Discs P
If we had i v i = 0, the rest of the computation would be easy. For each i, ∥v i ∥ = 1, so the
denominator of (25.1) is n. Let ri denote the straight-line distance from v i to the boundary of
Di . We then have (see Figure 25.2)
∥v i − v j ∥2 ≤ (ri + rj )2 ≤ 2ri2 + 2rj2 .

We will fix this problem by lifting the planar embeddings to the sphere by stereographic P
So, the numerator of (25.1) is at most 2d i ri2 . On the other hand, a theorem of Archimedes
projection. Given a plane, IR2 , and a sphere S tangent to the plane, we can define the tells us that the area of the cap encircled by Di is at exactly πri2 . Rather than proving it, I will
stereographic projection map, Π, from the plane to the sphere as follows: let s denote the point convince you that it has
where the sphere touches the plane, and let n denote the opposite point on the sphere. For any √ to be true because it is true when ri is small, it is true when the cap is a
hemisphere and ri = 2, and it is true when the cap is the whole sphere and ri = 2.
point x on the plane, consider the line from x to n. It will intersect the sphere somewhere. We
let this point of intersection be Π(x ). As the caps are disjoint, we have X
πri2 ≤ 4π,
The fundamental fact that we will exploit about stereographic projection is that it maps circles to i
circles! So, by applying stereographic projection to a kissing disk embedding of a graph in the
which implies that the numerator of (25.1) is at most
plane, we obtain a kissing disk embedding of that graph on the sphere. Let Di = Π(Ci ) denote X X
the image of circle Ci on the sphere. We will now let v i denote the center of Di , on the sphere. ∥v a − v b ∥2 ≤ 2ra2 + 2rb2 ≤ 2d ra2 ≤ 8d.
(a,b)∈E a

Instead of proving that we can achieve (25.2), I will prove a slightly simpler theorem. The proof
of the theorem we really want is similar, but about just a few minutes too long for class. We will
prove

Theorem 25.3.1. Let v 1 , . . . , v n be points on the unit-sphere. Then, there exists an ω such that
P
i fω (v i ) = 0.

The reason that this theorem is different from the one that we want to prove is that if we apply a
circle-preserving map from the sphere to itself, the center of the circle might not map to the
center of the image circle.
P
To show that we can achieve i v i = 0, we will use the following topological lemma, which
Figure 25.3: A Spherical Cap. follows immediately from Brouwer’s fixed point theorem. In the following, we let B denote the
ball of points of norm less than 1, and S the sphere of points of norm 1.

Lemma 25.3.2. Let ϕ : B → B be a continuous map that is the identity on S. Then, there exists
Putting these inequalities together, we see that an ω ∈ B such that
P ϕ(ω) = 0.
(i,j)∈E ∥v i − v j ∥2 8d
min P P 2 ≤ .
v 1 ,...,v n ∈IRd : v i =0 i ∥v i ∥ .
n
We will prove this lemma using Brouwer’s fixed point theorem:
Thus, we merely need to verify that we can ensure that Theorem 25.3.3 (Brouwer). If g : B → B is continuous, then there exists an α ∈ B such that
X g(α) = α.
v i = 0. (25.2)
i
Proof of Lemma 25.3.2. Let b be the map that sends z ∈ B to z / ∥z ∥. The map b is continuous
Note that there is enough freedom in our construction to believe that we could prove such a at every point other than 0. Now, assume by way of contradiction that 0 is not in the image of ϕ,
thing: we can put the sphere anywhere on the plane, and we could even scale the image in the and let g(z ) = −b(ϕ(z )). By our assumption, g is continuous and maps B to B. However, it is
plane before placing the sphere. By carefully combining these two operations, it is clear that we clear that g has no fixed point, contradicting Brouwer’s fixed point theorem.
can place the center of gravity of the v i s close to any point on the boundary of the sphere. It
turns out that this is sufficient to prove that we can place it at the origin.
Lemma 25.3.2, was our motivation for defining the maps fω in terms of ω ∈ B. Now consider
setting
1X
25.3 The center of gravity ϕ(ω) = fω (v i ).
n
i

The only thing that stops us from applying Lemma 25.3.2 at this point is that ϕ is not defined on
We need a nice family of maps that transform our kissing disk embedding on the sphere. It is
S, because fω was not defined for ω ∈ S. To fix this, we define for α ∈ S
particularly convenient to parameterize these by a point ω inside the sphere. For any point α on
the surface of the unit sphere, I will let Πα denote the stereographic projection from the plane (
α if z ̸= −α
tangent to the sphere at α. fα (z ) =
−α otherwise.
I will also define Π−1 −1
α . To handle the point −α, I let Πα (−α) = ∞, and Πα (∞) = −α. We also
define the map that dilates the plane tangent to the sphere at α by a factor a: Dαa . We then We then encounter the problem that fα (z ) is not a continuous function of α because it is
define the following map from the sphere to itself discontinuous at α = −v i . But, this shouldn’t be a problem because the point ω at which
   ϕ(ω) = 0 won’t be on or near the boundary. The following argument makes this intuition formal.
def 1−∥ω∥
fω (x ) = Πω/∥ω∥ Dω/∥ω∥ Π−1 ω/∥ω∥ (x ) .
We set (
For α ∈ S and ω = aα, this map pushes everything on the sphere to a point close to α. As a 1 if dist(ω, z ) < 2 − ϵ, and
hω (z ) =
approaches 1, the mass gets pushed closer and closer to α. (2 − dist(ω, z ))/ϵ otherwise.

Now, the function fα (z )hα (z ) is continuous on all of B. So, we may set


P
def fω (v i )hω (v i )
iP
ϕ(ω) =
i hω (v i ),

which is now continuous and is the identity map on S.


So, for any ϵ > 0, we may now apply Lemma 25.3.2 to find an ω for which Chapter 26
ϕ(ω) = 0.

To finish the proof, we need to get rid of this ϵ. That is, we wish to show that ω is bounded away
from S, say by µ, for all sufficiently small ϵ. If that is the case, then we will have Planar Graphs 2, the Colin de
dist(ω, v i ) ≥ µ > 0 for all sufficiently small ϵ. So, for ϵ < µ and sufficiently small, hω (v i ) = 1 for
all i, and we recover the ϵ = 0 case. Verdière Number
One can verify that this holds provided that the points v i are distinct and there are at least 3 of
them.
Finally, recall that this is not exactly the theorem we wanted to prove: this theorem deals with This Chapter Needs Editing
v i , and not the centers of caps. The difficulty with centers of caps is that they move as the caps
move. However, this can be overcome by observing that the centers remain inside the caps, and
move continuously with ω. For a complete proof, see [ST07, Theorem 4.2] 26.1 Introduction

In this lecture, I will introduce the Colin de Verdière number of a graph, and sketch the proof that
25.4 Further progress
it is three for planar graphs. Along the way, I will recall two important facts about planar graphs:

This result has been improved in many ways. Jonathan Kelner [Kel06] generalized this result to
1. Three-connected planar graphs are the skeletons of three-dimensional convex polytopes.
graphs of bounded genus. Kelner, Lee, Price and Teng [KLPT09] obtained analogous bounds for
λk for k ≥ 2. Biswal, Lee and Rao [BLR10] developed an entirely new set of techniques to prove 2. Planar graphs are the graphs that do not have K5 or K3,3 minors.
these results. Their techniques improve these bounds, and extend them to graphs that do not
have Kh minors for any constant h.
26.2 Colin de Verdière invariant

The Colin de Verdière graph parameter essentially measures the maximum multiplicity of the
second eigenvalue of a generalized Laplacian matrix of the graph. It is less than or equal to three
precisely for planar graphs.
We say that M is a Generalized Laplacian Matrix of a graph G = (V, E) if M can be expressed as
M = L + D where L is a the Laplacian matrix of a weighted version of G and D is an arbitrary
diagonal matrix. That is, we impose the restrictions:

M (i, j) < 0 if (i, j) ∈ E


M (i, j) = 0 if (i, j) ̸∈ E and i ̸= j
M (i, i) is arbitrary.


The Colin de Verdière graph parameter, which we denote cdv(G) is the maximum multiplicity of of a convex polytope are the line segments on the boundary of the polytope that go between two
the second-smallest eigenvalue of a Generalized Laplacian Matrix M of G satisfying the following vertices of the polytope and such that every point on the edge cannot be expressed non-trivially
condition, known as the Strong Arnold Property. as the convex hull of any vertices other than these two.
Theorem 26.3.1 (Steinitz’s Theorem). For every three-connected planar graph G = (V, E), there
For every non-zero n-by-n matrix X such that X(i, j) = 0 for i = j and (i, j) ∈ E, exists a set of vectors x 1 , . . . , x n ∈ IR3 such that the line segment from x i to x j is an edge of the
M X ̸= 0. convex hull of the vectors if and only if (i, j) ∈ E.

That is, every planar graph may be represented by the edges of a three-dimensional convex
That later restriction will be unnecessary for the results we will prove in this lecture.
polytope. We will use this representation to construct a Generalized Laplacian Matrix M whose
Colin de Verdière [dV90] proved that cdv(G) is at most 2 if and only if the graph G is second-smallest eigenvalue has multiplicity 3.
outerplanar. That is, it is a planar graph in which every vertex lies on one face. He also proved
that it is at most 3 if and only if G is planar. Lovàsz and Schrijver [LS98] proved that it is at
most 4 if and only if the graph is linkless embeddable. 26.4 The Colin de Verdière Matrix
In this lecture, I will sketch proofs from two parts of this work:
Let G = (V, E) be a planar graph, and let x 1 , . . . , x n ∈ IR3 be the vectors given by Steinitz’s
1. If G is a three-connected planar graph, then cdv(G) ≥ 3. Theorem. For 1 ≤ i ≤ 3, let v i ∈ IRn be the vector given by
v i (j) = x j (i).
2. If G is a three-connected planar graph, then cdv(G) ≤ 3.
So, the vector v i contains the ith coordinate of each vector x 1 , . . . , x n .
The first requires the construction of a matrix, which we do using the representation of the graph
We will now see how to construct a generalized Laplacian matrix M having the vectors v 1 , v 2 and
as a convex polytope. The second requires a proof that no Generalized Laplacian Matrix of the
v 3 in its nullspace. One can also show that the matrix M has precisely one negative eigenvalue.
graph has a second eigenvalue of high multiplicity. We prove this by using graph minors.
But, we won’t have time to do that in this lecture. You can find the details in [Lov01].
Our construction will exploit the vector cross product. Recall that for two vectors x and y in IR3
26.3 Polytopes and Planar Graphs that it is possible to define a vector x × y that is orthogonal to both x and y , and whose length
is the area of the parallelogram with sizes x and y . This determines the cross product up to sign.
You should recall that the sign is determined by an ordering of the basis of IR3 , or by the right
Let me begin by giving two definitions of convex polytope: as the convex hull of a set of points
hand rule. Also recall that
and as the intersection of half-spaces.
x × y = −y × x ,
Let x 1 , . . . , x n ∈ IRd (think d = 3). Then, the convex hull of x 1 , . . . , x n is the set of points
( ) (x 1 + x 2 ) × y = x 1 × y + x 2 × y , and
X X x × y = 0 if and only if x and y are parallel.
ai x i : ai = 1 and all ai ≥ 0 .
i
We will now specify the entries M (i, j) for (i, j) ∈ E. An edge (i, j) is on the boundary of two
Every convex polytope is the convex hull of its extreme vertices. faces of the polytope. Let’s say that the vectors defining these faces are y a and y b . So,
A convex polytope can also be defined by its faces. For example, given vectors y 1 , . . . , y l , the set y Ta x i = y Ta x j = y Tb x i = y Tb x j = 1.
of points 
x : y Ti x ≤ 1, for all i So,
(y a − y b )T x i = (y a − y b )T x j = 0.
is a convex polytope. Moreover, every convex polytope containing the origin in its interior can be
This implies that y a − y b is parallel to x i × x j .
described in this way. Each vector y i defines a face of the polytope consisting of those points x in
the polytope such that y Ti x = 1. Assume y a comes before y b in the clockwise order about vertex x i . So, y b − y a points the same
direction as x i × x j . Set M (i, j) so that
The vertices of a convex polytope are those points x in the polytope that cannot be expressed
non-trivially as a convex combination of any points other than themselves. The edges (or 1-faces) M (i, j)x i × x j = y a − y b

and M (i, j) < 0. multiplicity greater than or equal to 4. We will do this by showing that if G is three-connected
and cdv(G) ≥ 4, then G contains a K3,3 minor. Without loss of generality, we can assume λ2 = 0
I will now show that we can choose the diagonal entries M (i, i) so that the coordinate vectors are
(by just adding a diagonal matrix).
in the nullspace of M . First, set X
x̂ i = M (i, j)x j . Our proof will exploit a variant of Fiedler’s Nodal Domain Theorem, which we proved back in the
j∼i beginning of the semester. That theorem considered any eigenvector v of λ2 (of a Laplacian), and
We will show that x̂ i is parallel to x i by observing that x̂ i × x i = 0. We compute proved that the set of vertices that are non-negative in v is connected. The variant we use is due
X X to van der Holst [van95], which instead applies to eigenvectors v of λ2 of minimal support. These
x i × x̂ i = x i × M (i, j)x j = M (i, j)x i × x j . are the eigenvectors of v of λ2 for which there is no other eigenvector w of λ2 such that the zeros
j∼i j∼i of v are a subset of the zeros of w . That is, v has as many zero entries as possible. One can then
This sum counts the difference y b − y a between each adjacent pair of faces that touch x i . By prove that the set of vertices that are positive in v is connected. And, one can of course do the
going around x i in counter-clockwise order, we see that each of these vectors occurs once same for the vertices that are negative.
positively and once negatively in the sum, so the sum is zero.
Now, let F be any face of G, and let a, b and c be three vertices in F . As λ2 has multiplicity at
Thus, x i and x̂ i are parallel, and we may set M (i, i) so that least 4, it has some eigenvector that is zero at each of a, b and c. Let v be an eigenvector of λ2
with minimal support that is zero at each of a, b, and c. Let d be any vertex for which v (d) > 0.
M (i, i)x i + x̂ i = 0. As the graph is three-connected, it contains three vertex-disjoint paths from d to a, b, and c (this
This implies that the coordinate vectors are in the nullspace of M , as follows from Menger’s Theorem, which I have not covered). As v (d) > 0 and v (a) = 0, there is
   some vertex a′ on the path from d to a for which v (a′ ) = 0 but a′ has a neighbor a+ for which
x1 v (a+ ) > 0. As λ2 = 0, a′ must also have a neighbor a− for which v (a− ) < 0. Construct similar
  x 2  X
   vertices for b and c.
M  ..  = M (i, i)x i + M (i, j)x j = M (i, i)x i + x̂ i .
  . 
j∼i
xn i b+
One can also show that the matrix M has precisely one negative eigenvalue, so the multiplicity of d a+ a−
its second-smallest eigenvalue is 3. a′

26.5 Minors of Planar Graphs a b’


b
I will now show you that cdv(G) ≤ 3 for every 3-connected planar graph G. To begin, I mention
one other characterization of planar graphs. b−
First, observe that if G is a planar graph, it remains planar when we remove an edge. Also
c+ c
observe that if (u, v) is an edge, then the graph obtained by contracting (u, v) to one vertex is
also planar. Any graph H that can be obtained by removing and contracting edges from a graph c′
G is called a minor of G. It is easy to show that every minor of a planar graph is also planar. c−
Kuratowski’s Theorem tells us that a graph is planar if and only if it does not have K5 or K3,3
(the complete bipartite graph between two sets of 3 vertices) as a minor. We will just use the fact
Figure 26.1: Vertices a, b, c, d, and the paths.
that a planar graph does not have K3,3 as a minor.

Now, contract every edge on the path from a to a′ , on the path from b to b′ and on the path from
26.6 cdv(G) ≤ 3 c to c′ . Also, contract all the vertices for which v is positive and contract all the vertices for
which v is negative (which we can do because these sets are connected). Finally, contract every
We will now prove that if G is a 3-connected planar graph, then cdv(G) ≤ 3. Assume, by way of edge in the face F that does not involve one of a, b, or c. We obtain a graph with a triangular
contradiction, that there is a generalized Laplacian matrix M of G whose second eigenvalue λ2 has

supervertex. We would like to say that this graph cannot be planar.

b+
d a+ a−
a′

a b’
b
f
b−
+
c c
+
′ a′
c
c−

Figure 26.2: The set of positive and negative vertices that will be contracted. Vertex f has been a b’
inserted. b
f

To do this, we add one additional vertex f inside the face and connected to each of a, b, and c. c
This does not violate planarity because a, b, and c were contained in a face. In fact, we can add f
before we do the contractions. By throwing away all other edges, we have constructed a K3,3
minor, so the graph cannot be planar.
c′

Figure 26.3: The edges in the cycle have been contracted, as have all the positive and negative
vertices. After contracting the paths between a and a′ , between b and b′ and between c and c′ , we
obtain a K3,3 minor.
Chapter 27

Properties of Expander Graphs

Part V This Chapter Needs Editing

27.1 Overview
Expander Graphs
We say that a d-regular graph is a good expander if all of its adjacency matrix eigenvalues are
small. To quantify this, we set a threshold ϵ > 0, and require that each adjacency matrix
eigenvalue, other than d, has absolute value at most ϵd. This is equivalent to requiring all
non-zero eigenvalues of the Laplacian to be within ϵd of d.
In this lecture, we will:

1. Show that this condition is equivalent to approximating the complete graph.


2. Prove that this condition implies that the number of edges between sets of vertices in the
graph is approximately the same as in a d-regular random graph.
3. Prove Tanner’s Theorem: that small sets of vertices have many neighbors.
4. Derive the Alon-Boppana bound, which says that ϵ cannot be asymptotically smaller than
2√(d − 1)/d. This will tell us that the asymptotically best expanders are the Ramanujan
graphs.

Random d-regular graphs are expander graphs. Explicitly constructed expander graphs have
proved useful in a large number of algorithms and theorems. We will see some applications of
them next week.
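The following Julia sketch (ours, not from the text; the multigraph construction from unions of
random perfect matchings is just a convenient stand-in for a random d-regular graph) illustrates
this numerically: the largest non-trivial adjacency eigenvalue of such a graph is typically close to
the Ramanujan bound 2√(d − 1).

using LinearAlgebra, Random

# A d-regular multigraph on n vertices (n even), built as a union of d random perfect matchings.
function random_regular_multigraph(n, d)
    A = zeros(n, n)
    for _ in 1:d
        perm = randperm(n)
        for i in 1:2:n
            u, v = perm[i], perm[i+1]
            A[u, v] += 1
            A[v, u] += 1
        end
    end
    return A
end

Random.seed!(3)
n, d = 500, 6
A = random_regular_multigraph(n, d)
mu = eigvals(Symmetric(A))                        # ascending order; mu[end] = d
eps_hat = max(abs(mu[1]), abs(mu[end-1])) / d     # largest non-trivial eigenvalue, scaled by d
println((eps_hat, 2 * sqrt(d - 1) / d))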

27.2 Expanders as Approximations of the Complete Graph

One way of measuring how well two matrices A and B approximate each other is to measure the
operator norm of their difference: A − B. Since I consider the operator norm by default, I will


just refer to it as the norm. Recall that the norm of a matrix M is defined to be its largest 27.3 Quasi-Random Properties of Expanders
singular value:
∥M x ∥
∥M ∥ = max , There are many ways in which expander graphs act like random graphs. Conversely, one can prove
x ∥x ∥ that a random d-regular graph is an expander graph with reasonably high probability [Fri08].
where the norms in the fraction are the standard Euclidean vector norms. The norm of a
symmetric matrix is just the largest absolute value of one of its eigenvalues. It can be very We will see that all sets of vertices in an expander graph act like random sets of vertices. To make
different for a non symmetric matrix. this precise, imagine creating a random set S ⊂ V by including each vertex in S independently
with probability α. How many edges do we expect to find between vertices in S? Well, for every
For this lecture, we define an ϵ-expander to be a d-regular graph whose adjacency matrix edge (u, v), the probability that u ∈ S is α and the probability that v ∈ S is α, so the probability
eigenvalues satisfy |µi | ≤ ϵd for µi ≥ 2. As the Laplacian matrix eigenvalues are given by that both endpoints are in S is α2 . So, we expect an α2 fraction of the edges to go between
λi = d − µi , this is equivalent to |d − λi | ≤ ϵd for i ≥ 2. It is also equivalent to vertices in S. We will show that this is true for all sufficiently large sets S in an expander.

∥LG − (d/n)LKn ∥ ≤ ϵd. In fact, we will prove a stronger version of this statement for two sets S and T . Imagine including
each vertex in S independently with probability α and each vertex in T with probability β. We
allow vertices to belong to both S and T . For how many ordered pairs (u, v) ∈ E do we expect to
For this lecture, I define a graph G to be an ϵ-approximation of a graph H if
have u ∈ S and v ∈ T ? Obviously, it should hold for an αβ fraction of the pairs.
(1 − ϵ)H ≼ G ≼ (1 + ϵ)H, For a graph G = (V, E), define

where I recall that I say H ≼ G if for all x ⃗


E(S, T ) = {(u, v) : u ∈ S, v ∈ T, (u, v) ∈ E} .
T T
x LH x ≤ x LG x . We have put the arrow above the E in the definition, because we are considering ordered pairs of
vertices. When S and T are disjoint
I warn you that this definition is not symmetric. When I require a symmetric definition, I usually ⃗
E(S, T)
use the condition (1 + ϵ)−1 H ≼ G instead of (1 − ϵ)H ≼ G.
If G is an ϵ-expander, then for all x ∈ IRV that are orthogonal to the constant vectors, is precisely the number of edges between S and T , while


E(S, S)
(1 − ϵ)dx T x ≤ x T LG x ≤ (1 + ϵ)dx T x .

counts every edge inside S twice.


On the other hand, for the complete graph Kn , we know that all x orthogonal to the constant
vectors satisfy The following bound is a slight extension by Beigel, Margulis and Spielman [BMS93] of a bound
x T LKn x = nx T x . originally proved by Alon and Chung [AC88].
Let H be the graph d
Theorem 27.3.1. Let G = (V, E) be a d-regular graph that ϵ-approximates n Kn . Then, for every
d S ⊆ V and T ⊆ V ,
H= Kn , p
n ⃗
E(S, T ) − αβdn ≤ ϵdn (α − α2 )(β − β 2 ),
so
x T LH x = dx T x . where |S| = αn and |T | = βn.
So, G is an ϵ-approximation of H.
Observe that when α and β are greater than ϵ, the term on the right is less than αβdn.
This tells us that LG − LH is a matrix of small norm. Observe that
In class, we will just prove this in the case that S and T are disjoint.
(1 − ϵ)LH ≼ LG ≼ (1 + ϵ)LH implies − ϵLH ≼ LG − LH ≼ ϵLH .
Proof. The first step towards the proof is to observe
As LG and LH are symmetric, and all eigenvalues of LH are 0 or d, we may infer

χTS LG χT = d |S ∩ T | − E(S, T) .
∥LG − LH ∥ ≤ ϵd. (27.1)

Let H = nd Kn . As G is a good approximation of H, let’s compute Theorem 27.4.1 ([Tan84]). Let G = (V, E) be a d-regular graph on n vertices that
  ϵ-approximates nd Kn . Then, for all S ⊆ V ,
d d
χTS LH χT = χTS dI − J χT = d |S ∩ T | − |S| |T | = d |S ∩ T | − αβdn.
n n |S|
|N (S)| ≥ ,
ϵ2 (1 − α) + α
So,

E(S, T ) − αβdn = χTS LG χT − χTS LH χT . where |S| = αn.
As
∥LG − LH ∥ ≤ ϵd, Note that when α is much less than ϵ2 , the term on the right is approximately |S| /ϵ2 , which can
be much larger than |S|. We will derive Tanner’s theorem from Theorem 27.3.1.
χTS LH χT − χTS LG χT = χTS (LH − LG )χT
≤ ∥χS ∥ ∥(LH − LG )χT ∥ Proof. Let R = N (S) and let T = V − R. Then, there are no edges between S and T . Let
|T | = βn and |R| = γn, so γ = 1 − β. By Theorem 27.3.1, it must be the case that
≤ ∥χS ∥ ∥LH − LG ∥ ∥χT ∥
p
≤ ϵd ∥χS ∥ ∥χT ∥ αβdn ≤ ϵdn (α − α2 )(β − β 2 ).
p
= ϵdn αβ.
The lower bound on γ now follows by re-arranging terms. Dividing through by dn and squaring
This is almost as good as the bound we are trying to prove. To prove the claimed bound, recall both sides gives
that LH x = LH (x + c1) for all c. So, let x S and x T be the result of orthogonalizing χS and χT
α2 β 2 ≤ ϵ2 (α − α2 )(β − β 2 ) ⇐⇒
with respect to the constant vectors. By Claim 2.4.2 (from Lecture 2), ∥x S ∥ = n(α − α2 ). So, we
obtain the improved bound αβ ≤ ϵ2 (1 − α)(1 − β) ⇐⇒
β ϵ2 (1 − α)
x TS (LH − LG )x T = χTS (LH − LG )χT , ≤ ⇐⇒
1−β α
while p 1−γ ϵ2 (1 − α)
≤ ⇐⇒
∥x S ∥ ∥x T ∥ = n (α − α2 )(β − β 2 ). γ α
So, we may conclude 1 ϵ2 (1 − α) + α
p ≤ ⇐⇒

E(S, T ) − αβdn ≤ ϵdn (α − α2 )(β − β 2 ). γ α
α
γ≥ 2 .
ϵ (1 − α) + α

We remark that when S and T are disjoint, the same proof goes through even if G is irregular and

weighted if we replace E(S, T ) with
X If instead of N (S) we consider N (S) − S, then T and S are disjoint, so the same proof goes
w(S, T ) = w(u, v). through for weighted, irregular graphs that ϵ-approximate nd Kn .
(u,v)∈E,u∈S,v∈T

d
We only need the fact that G ϵ-approximates n Kn . See [BSS12] for details. 27.5 How well can a graph approximate the complete graph?

Consider applying Tanner’s Theorem with S = {v} for some vertex v. As v has exactly d
27.4 Vertex Expansion
neighbors, we find
ϵ2 (1 − 1/n) + 1/n ≥ 1/d,
The reason for the name expander graph is that small sets of vertices in expander graphs have p √
unusually large numbers of neighbors. For S ⊂ V , let N (S) denote the set of vertices that are from which we see that ϵ must be at least 1/ d(n − 1)/n, which is essentially 1/ d. But, how
neighbors of vertices in S. The following theorem, called Tanner’s Theorem, provides a lower small can it be?
bound on the size of N (S).

The Ramanujan graphs, constructed by Margulis [Mar88] and Lubotzky, Phillips and
Sarnak [LPS88] achieve

    ϵ ≤ 2√(d − 1)/d.
We will see that if we keep d fixed while we let n grow, ϵ cannot be asymptotically smaller than this bound.
We will prove an upper bound on ϵ by constructing a suitable test function.
As a first step, choose two vertices v and u in V whose neighborhoods do not overlap. Consider
the vector x defined by 

1 √ if i = u,



 1/ d if i ∈ N (u),

x (i) = −1 if i = v,

 √

 −1/ d if i ∈ N (v),



0 otherwise.
Now, compute the Rayleigh quotient with respect to x . The numerator
√ is the sum over all edges
of the squares of differences across the edges. This gives (1 − 1/ d)2 for the edges attached to u
and v, and 1/d for the edges attached to N (u) and N (v) but not to u or v, for a total of Figure 27.1: The construction of x .
√  √   √ 
2d(1 − 1/ d)2 + 2d(d − 1)/d = 2 d − 2 d + 1 + (d − 1) = 2 2d − 2 d .
Proof. Define the following neighborhoods.
On the other hand, the denominator is 4, so we find
    x^T L x / x^T x = d − √d.
If we use instead the vector  V0 = {v0 , v1 }

1 if i = u, Vi = N (Vi−1 ) − ∪j<i Vj , for 0 < i ≤ k.

 √

−1/ d if i ∈ N (u),
 That is, Ui consists of exactly those vertices at distance i from U0 . Note that there are no edges
y (i) = −1 if i = v, between any vertices in any Ui and any Vj .

 √
1/ d
 if i ∈ N (v),

 Our test vector for λ2 will be given by

0 otherwise,  1
we find 
 (d−1)i/2
for a ∈ Ui
√ 

y T Ly − β for a ∈ Vi
=d+ d. x (a) = (d−1)i/2
yT y 
√ 

This is not so impressive, as it merely tells us that ϵ ≥ 1/ d, which we already knew. But, we 

0 otherwise.
can improve this argument by pushing it further. We do this by modifying it in two ways. First,
we extend x to neighborhoods of neighborhoods of u and v. Second, instead of basing the We choose β so that x is orthogonal to 1.
construction at vertices u and v, we base it at two edges. This way, each vertex has d − 1 edges to
We now find that the Rayleigh quotient of x with respect to L is at most
those that are farther away from the centers of the construction.
X0 + β 2 Y0
The following theorem is attributed to A. Nilli [Nil91], but we suspect it was written by N. Alon. ,
X1 + β 2 Y1
Theorem 27.5.1. Let G be a d-regular graph containing two edges (u0 , u1 ) and (v0 , v1 ) that are where
at distance at least 2k + 2. Then  √ 2
√ k−1
X Xk
√ 1 − 1/ d − 1
2 d−1−1 X0 = |Ui | (d − 1) + |Uk | (d − 1)−k+1 , and X1 = |Ui | (d − 1)−i
λ2 ≤ d − 2 d − 1 + . (d − 1)i/2
k+1 i=0 i=0

and 27.6 Open Problems


k−1  √ 2 k
X 1 − 1/ d − 1 X
Y0 = |Vi | (d − 1) + |Vk | (d − 1)−k+1 , and Y1 = |Vi | (d − 1)−i . What can we say about λn ? In a previous iteration of this course, I falsely asserted that the same
(d − 1)i/2
i=0 i=0 proof tells us that √
√ 2 d−1−1
By my favorite inequality, it suffices to prove upper bounds on X0 /X1 and Y0 /Y1 . So, consider λn ≥ d + 2 d − 1 − .
k+1
Pk−1  √ 2
1−1/ d−1
+ |Uk | (d − 1)−k+1 But, the proof did not work.
i=0 |Ui | (d − 1) (d−1)i/2
Pk .
−i Another question is how well a graph of average degree d can approximate the complete graph.
i=0 |Ui | (d − 1)
That is, let G be a graph with dn/2 edges, but let G be irregular. While I doubt that irregularity
For now, let’s focus on the numerator, helps one approximate the complete graph, I do not know how to prove it.
k−1  √ 2
X 1 − 1/ d − 1 We can generalize this question further. Let G = (V, E, w) be a weighted graph with dn/2 edges.
|Ui | (d − 1) + |Uk | (d − 1)(d − 1)−k Can we prove that G cannot approximate a complete graph any better than the Ramanujan
i=0
(d − 1)i/2
graphs do? I conjecture that for every d and every β > 0 there is an n0 so that for every graph of
k−1
X |Ui | √ |Uk | average degree d on n ≥ n0 vertices,
= (d − 2 d − 1) + (d − 1)
(d − 1)i (d − 1)k √
i=0 λ2 d−2 d−1
k−1 ≤ √ + β.
X |Ui | √ |Uk | √ |Uk | √ λn d+2 d−1
= (d − 2 d − 1) + (d − 2 d − 1) + (2 d − 1 − 1)
(d − 1)i (d − 1)k (d − 1)k
i=0
k
X |Ui | √ |Uk | √
= (d − 2 d − 1) + (2 d − 1 − 1).
(d − 1)i (d − 1)k
i=0

To upper bound the Rayleigh quotient, we observe that the left-most of these terms contributes
Pk |Ui | √
i=0 (d−1)i (d − 2 d − 1) √
Pk = d − 2 d − 1.
−i
i=0 |Ui | (d − 1)

To bound the impact of the remaining term,


|Uk | √
(2 d − 1 − 1),
(d − 1)k
note that
|Uk | ≤ (d − 1)k−i |Ui | .
So, we have
k
|Uk | 1 X |Ui |
≤ .
(d − 1)k k+1 (d − 1)i
i=0
Thus, the last term contributes at most

2 d−1−1
k+1
to the Rayleigh quotient.

A naive way of doing this would be for the transmitter to send every bit 3 times. If only 1 bit
were flipped during transmission, then the receiver would be able to figure out which one it was.
But, this is a very inefficient coding scheme. Much better approaches exist.

28.2 Notation
Chapter 28
When x is a vector, we let
def
|x | = |{a : x (a) ̸= 0}|

A brief introduction to Coding denote the hamming weight of x . This is often called the 0-norm, and written ∥x ∥0 .
For a prime p, we denote the integers modulo p by Fp . The reason is that the integers modulo p
Theory form the field with p elements: they may be summed and multiplied, have identities under
addition and multiplication (0 and 1), they have inverses under addition (−x), and all but zero
have inverses under multiplication. We say the field because it is unique up to the names of the
elements. In this chapter we mostly deal with the field of two elements, which we write F2 .
This chapter gives a short introduction to the combinatorial view of error-correcting codes. Our
motivation is twofold: good error-correcting codes provide choices for the generators of
generalized hypercubes that have high expansion, and in the next chapter we learn how to use 28.3 Connection with Generalized Hypercubes
expander graphs to construct good error-correcting codes.
Recall that the Generalized Hypercubes we encountered in Section 7.4 have vertex set Fk2 and are
We begin and end the chapter with a warning: the combinatorial, worst-case view of coding
defined by d ≥ k generators, g 1 , . . . , g d ∈ Fk2 . For each b ∈ Fk2 , the graph defined by these
theory presented herein was very useful in the first few decades of the field. But, the problem of
generators has an adjacency matrix eigenvalue given by
error-correction is at its heart probabilistic and great advances have been made by avoiding the
worst-case formulation. For readers who would like to understand this perspective, we recommend d
X T
“Modern Coding Theory” by Richardson and Urbanke. For those who wish to learn more about µb = (−1)g i b .
the worst-case approach, we recommend “The Theory of Error-Correcting Codes” by i=1

MacWilliams and Sloane.


Let G be the d-by-k matrix whose ith row is g Ti . As (−1)x = 1 − 2x, for x ∈ {0, 1},
d
X
28.1 Coding µb =
T
(−1)g i b = d − 2 |Gb| .
i=1
Error-correcting codes are used to compensate for noise and interference in communication. They The eigenvalue of d comes from b = 0. If Gb has small Hamming weight for every other vector b,
are used in practically all digital transmission and data storage schemes. We will only consider then all the other eigenvalues of the adjacency matrix will be small. We will see that this
the problem of storing or transmitting bits1 , or maybe symbols from some small discrete alphabet. condition is satisfied when G is the generator matrix of a good code.
The only type of interference we will consider is the flipping of bits. Thus, 0101 may become
1101, but not 010. More noise means more bits are flipped.
28.4 Hamming Codes
In our model problem, a transmitter wants to send m bits, which means that the transmitter’s
message is an element of Fm2 . But, if the transmitter wants the receiver to correctly receive the
The first idea in coding theory was the parity bit. It allows one to detect one error. Let’s say that
message in the presence of noise, the transmitter should not send the plain message. Rather, the
the transmitter wants to send b1 , . . . , bm . If the transmitter constructs
transmitter will send n > m bits, encoded in such a way that the receiver can figure out what the
message was even if there is a little bit of noise. m
X
1
bm+1 = bi mod 2, (28.1)
Everything is bits. You think that’s air you’re breathing? i=1


and sends 28.5 Terminology and Linear Codes


b1 , . . . , bm+1 ,
then the receiver will be able to detect one error, as it would cause (28.1) to be violated. But, the We will view an error-correcting code as a mapping
receiver won’t know where the error is, and so won’t be able to figure out the correct message
unless it request a retransmit. And, of course, the receiver wouldn’t be able to detect 2 errors. C : Fm n
2 → F2 ,

Hamming codes combine parity bits in an interesting way to enable the receiver to correct one for n larger than m. Every string in the image of C is called a codeword. We will also abuse
error. Let’s consider the first interesting Hamming code, which transmits 4-bit messages by notation by identifying C with the set of codewords.
sending 7 bits in such a way that any one error can be corrected. Note that this is much better
We define the rate of the code to be
than repeating every bit 3 times, which would require 12 bits. m
. r=
n
For reasons that will be clear soon, we will let b3 , b5 , b6 , and b7 be the bits that the transmitter
would like to send. The parity bits will be chosen by the rules The rate of a code tells you how many bits of information you receive for each codeword bit. Of
course, codes of higher rate are more efficient.
b4 = b5 + b6 + b7
The Hamming distance between two words c 1 and c 2 is the number of bits in which they differ.
b2 = b3 + b6 + b7 It will be written
b1 = b3 + b5 + b7 . dist(c 1 , c 2 ) = c 1 − c 2 .
All additions, of course, are modulo 2. The transmitter will send the codeword b1 , . . . , b7 .
The minimum distance of a code is
If we write the bits as a vector, then we see that they satisfy the linear equations
  d= min dist(c 1 , c 2 )
b1 c 1 ̸=c 2 ∈C
 
  b2    (here we have used C to denote the set of codewords). It should be clear that if a code has large
0 0 0 1 1 1 1  b3 
 0
0 1 1 0 0 1 1 b4  = 0 . minimum distance then it is possible to correct many errors. In particular, it is possible to correct
  any number of errors less than d/2. To see why, let c be a codeword, and let r be the result of
1 0 1 0 1 0 1  b5 
 0
b6  flipping e < d/2 bits of c. As dist(c, r ) < d/2, c will be the closest codeword to r . This is
because for every c 1 ̸= c,
b7
For example, to transmit the message 1010, we set d ≤ dist(c 1 , c) ≤ dist(c 1 , r ) + dist(r , c) < dist(c 1 , r ) + d/2 implies d/2 < dist(c 1 , r ).

b3 = 1, b5 = 0, b6 = 1, b7 = 0, So, large minimum distance is good.

and then compute The minimum relative distance of a code is


b1 = 1, b2 = 0, b4 = 1. d
δ= .
Let’s see what happens if some bit is flipped. Let the received transmission be c1 , . . . , c7 , and n
assume that ci = bi for all i except that c6 = 0. This means that the parity check equations that
It turns out that it is possible to keep both the rate and minimum relative distance of a code
involved the 6th bit will now fail to be satisfied, or
bounded below by constants, even as n grows. To formalize this notation, we will talk about a
   
0 0 0 1 1 1 1 1 sequence of codes instead of a particular code. A sequence of codes C1 , C2 , C3 , . . . is presumed to
0 1 1 0 0 1 1 c = 1 . be a sequence of codes of increasing message lengths. Such a sequence is called asymptotically
1 0 1 0 1 0 1 0 good if there are absolute constants r and δ such that for all i,
Note that this is exactly the pattern of entries in the 6th column of the matrix. This will happen r(Ci ) ≥ r and δ(Ci ) ≥ δ.
in general. If just one bit is flipped, and we multiply the received transmission by the matrix, the
product will be the column of the matrix containing the flipped bit. As each column is different, One of the early goals of coding theory was to construct asymptotically good sequences of codes.
we can tell which bit it was. To make this even easier, the columns have been arranged to be the Of course, one also needs to derive codes that have concise descriptions and that can be encoded
binary representations of their index. For example, 110 is the binary representation of 6. and decoded efficiently.
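For concreteness, here is a small Python sketch of the encoder and single-error corrector just
described (numpy assumed; the helper names are ours, not the text's).

    import numpy as np

    # The parity check matrix from above: column j is the binary representation of j+1,
    # so the syndrome of a word with one flipped bit spells out the position of that bit.
    H = np.array([[0, 0, 0, 1, 1, 1, 1],
                  [0, 1, 1, 0, 0, 1, 1],
                  [1, 0, 1, 0, 1, 0, 1]])

    def encode(b3, b5, b6, b7):
        # Place the message in positions 3, 5, 6, 7 and fill in the parity bits.
        b = np.zeros(7, dtype=int)
        b[2], b[4], b[5], b[6] = b3, b5, b6, b7
        b[3] = (b5 + b6 + b7) % 2      # b4
        b[1] = (b3 + b6 + b7) % 2      # b2
        b[0] = (b3 + b5 + b7) % 2      # b1
        return b

    def correct(c):
        # Correct at most one flipped bit using the syndrome Hc (mod 2).
        s = H.dot(c) % 2
        pos = 4 * s[0] + 2 * s[1] + s[2]
        if pos:
            c = c.copy()
            c[pos - 1] ^= 1
        return c

    if __name__ == "__main__":
        c = encode(1, 0, 1, 0)          # the message 1010 from the example
        r = c.copy()
        r[5] ^= 1                       # flip the 6th bit
        print(c, correct(r))            # the corrupted word is repaired to c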
28.5 Terminology and Linear Codes

We will view an error-correcting code as a mapping

    C : F_2^m → F_2^n,

for n larger than m. Every string in the image of C is called a codeword. We will also abuse
notation by identifying C with the set of codewords.

We define the rate of the code to be

    r = m / n.

The rate of a code tells you how many bits of information you receive for each codeword bit. Of
course, codes of higher rate are more efficient.

The Hamming distance between two words c_1 and c_2 is the number of bits in which they differ.
It will be written

    dist(c_1, c_2) = |c_1 − c_2|.

The minimum distance of a code is

    d = \min_{c_1 ≠ c_2 ∈ C} dist(c_1, c_2)

(here we have used C to denote the set of codewords). It should be clear that if a code has large
minimum distance then it is possible to correct many errors. In particular, it is possible to correct
any number of errors less than d/2. To see why, let c be a codeword, and let r be the result of
flipping e < d/2 bits of c. As dist(c, r) < d/2, c will be the closest codeword to r. This is
because for every c_1 ≠ c,

    d ≤ dist(c_1, c) ≤ dist(c_1, r) + dist(r, c) < dist(c_1, r) + d/2   implies   d/2 < dist(c_1, r).

So, large minimum distance is good.

The minimum relative distance of a code is

    δ = d / n.

It turns out that it is possible to keep both the rate and minimum relative distance of a code
bounded below by constants, even as n grows. To formalize this notion, we will talk about a
sequence of codes instead of a particular code. A sequence of codes C_1, C_2, C_3, ... is presumed to
be a sequence of codes of increasing message lengths. Such a sequence is called asymptotically
good if there are absolute constants r and δ such that for all i,

    r(C_i) ≥ r and δ(C_i) ≥ δ.

One of the early goals of coding theory was to construct asymptotically good sequences of codes.
Of course, one also needs to derive codes that have concise descriptions and that can be encoded
and decoded efficiently.

A big step in this direction was the use of linear codes. In the same way that we defined
Hamming codes, we may define a linear code as the set of vectors c ∈ F_2^n such that M c = 0, for
some (n − m)-by-n matrix M. In this chapter, we will instead define a code by its generator
matrix. Given an n-by-m matrix G, we define the code C_G to be the set of vectors of the form Gb,
where b ∈ F_2^m. One may view b as the message to be transmitted, and Gb as its encoding.

A linear code is called linear because the sum of two codewords in the code is always another
codeword. In particular, 0 is always a codeword and the minimum distance of the code equals the
minimum Hamming weight of a non-zero codeword, as

    dist(c_1, c_2) = |c_1 − c_2| = |c_1 + c_2|

over F_2.

We now pause to make the connection back to generalized hypercubes: if C_G has minimum
relative distance δ and maximum relative distance 1 − δ, then the corresponding generalized
hypercube is a (1 − 2δ)-expander.


28.6 Random Linear Codes

In the early years of coding theory, there were many papers published that contained special
constructions of codes such as the Hamming code. But, as the number of bits to be transmitted
became larger and larger, it became more and more difficult to find such exceptional codes. Thus,
an asymptotic approach became reasonable. In his paper introducing coding theory, Shannon
[Sha48] proved that random codes are asymptotically good. A few years later, Elias [Eli55]
suggested using random linear codes.

We will now see that random linear codes are asymptotically good with high probability. We
consider a code of the form C_G, where G is an n-by-m matrix with independent uniformly chosen
F_2 entries. Clearly, the rate of the code will be m/n.

So, the minimum distance of C_G is

    \min_{0 ≠ b ∈ F_2^m} dist(0, Gb) = \min_{0 ≠ b ∈ F_2^m} |Gb|,

where by |c| we mean the number of 1s in c. This is sometimes called the weight of c.

Here's what we can say about the minimum distance of a random linear code. The following
argument is a refinement of the Chernoff based argument that appears in Section 7.5.

Lemma 28.6.1. Let G be a random n-by-m matrix. For any d, the probability that C_G has
minimum distance at least d is at least

    1 − \frac{2^m}{2^n} \sum_{i=0}^{d} \binom{n}{i}.

Proof. It suffices to upper bound the probability that there is some non-zero b ∈ F_2^m for which
|Gb| ≤ d.

To this end, fix some non-zero vector b in F_2^m. Each entry of Gb is the inner product of a column
of G with b. As each column of G consists of random F_2 entries, each entry of Gb is chosen
uniformly from F_2. As the columns of G are chosen independently, we see that Gb is a uniform
random vector in F_2^n. Thus, the probability that |Gb| is at most d is precisely

    \frac{1}{2^n} \sum_{i=0}^{d} \binom{n}{i}.

As the probability that one of a number of events holds is at most the sum of the probabilities
that each holds (the "union bound"),

    Pr_G[∃ b ∈ F_2^m, b ≠ 0 : |Gb| ≤ d] ≤ \sum_{0 ≠ b ∈ F_2^m} Pr_G[|Gb| ≤ d]
                                        ≤ (2^m − 1) \frac{1}{2^n} \sum_{i=0}^{d} \binom{n}{i}
                                        ≤ \frac{2^m}{2^n} \sum_{i=0}^{d} \binom{n}{i}.

To see how this behaves asymptotically, recall that for a constant p,

    \binom{n}{pn} ≈ 2^{n H(p)},

where

    H(p) = −p log_2 p − (1 − p) log_2 (1 − p)

is the binary entropy function. If you are not familiar with this, you may derive it from Stirling's
formula. For our purposes, 2^{n H(p)} ≈ \sum_{i=0}^{pn} \binom{n}{i}. Actually, we will just use the fact that for
β > H(p),

    \frac{\sum_{i=0}^{pn} \binom{n}{i}}{2^{nβ}} → 0

as n goes to infinity.

If we set m = rn and d = δn, then Lemma 28.6.1 tells us that C_G probably has rate r and
minimum relative distance δ if

    \frac{2^{rn} 2^{n H(δ)}}{2^n} < 1,

which happens when

    H(δ) < 1 − r.

For any constant r < 1, we can find a δ for which H(δ) < 1 − r, so there exist asymptotically
good sequences of codes of every non-zero rate. This is called the Gilbert-Varshamov bound. It is
still not known if binary codes exist whose relative minimum distance satisfies H(δ) > 1 − r. This
is a big open question in coding theory.

Of course, this does not tell us how to choose such a code in practice, how to efficiently check if a
given code has large minimum distance, or how to efficiently decode such a code.
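The following small experiment is one way to get a feel for this (a Python sketch with numpy;
the sizes are far too small for the asymptotics to be sharp, and all names and constants are our
own choices): it samples random generator matrices, measures the minimum distance by brute
force, and records the Gilbert-Varshamov condition H(δ) < 1 − r for comparison.

    import itertools
    import numpy as np

    def min_distance(G):
        # Minimum Hamming weight of a nonzero codeword Gb; brute force, so keep m small.
        n, m = G.shape
        best = n
        for b in itertools.product((0, 1), repeat=m):
            if any(b):
                best = min(best, int(np.sum(G.dot(np.array(b)) % 2)))
        return best

    def binary_entropy(p):
        return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        n, m = 24, 8                                    # rate r = 1/3
        dists = [min_distance(rng.integers(0, 2, size=(n, m))) for _ in range(20)]
        r, delta = m / n, 0.17                          # H(0.17) < 1 - 1/3
        print(min(dists) / n, delta, binary_entropy(delta) < 1 - r)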
28.7 Reed-Solomon Codes

Reed-Solomon Codes are one of the workhorses of coding theory. They are simple to describe, and
easy to encode and decode.

However, Reed-Solomon Codes are not binary codes. Rather, they are codes whose symbols are
elements of a finite field. If you don't know what a finite field is, don't worry (yet). For now, we
will just consider prime fields, F_p. These are the numbers modulo a prime p. Recall that such
numbers may be added, multiplied, and divided.

A message in a Reed-Solomon code over a field F_p is identified with a polynomial of degree m − 1.
That is, the message f_1, ..., f_m is viewed as providing the coefficients of the polynomial

    Q(x) = \sum_{i=0}^{m−1} f_{i+1} x^i.

A message is encoded by evaluating this polynomial at every element of the field. That is, the
codeword is

    Q(0), Q(1), Q(2), ..., Q(p − 1).

Sometimes, it is evaluated at a subset of the field elements.

We will now see that the minimum distance of such a Reed-Solomon code is at least p − m. We
show this using the following standard fact from algebra.

Lemma 28.7.1. Let Q be a polynomial of degree at most m − 1 over a field F_p. If there exist
distinct field elements x_1, ..., x_m such that

    Q(x_i) = 0,

then Q is identically zero.

Theorem 28.7.2. The minimum distance of the Reed-Solomon code is at least p − m.

Proof. Let Q_1 and Q_2 be two different polynomials of degree at most m − 1. For a polynomial Q, let

    E(Q) = (Q(0), Q(1), ..., Q(p − 1))

be its encoding. If

    dist(E(Q_1), E(Q_2)) ≤ p − k,

then there exist field elements x_1, ..., x_k such that

    Q_1(x_j) = Q_2(x_j).

Now, consider the polynomial

    Q_1(x) − Q_2(x).

It also has degree at most m − 1, and it is zero at k field elements. Lemma 28.7.1 tells us that if
k ≥ m, then Q_1 − Q_2 is exactly zero, which means that Q_1 = Q_2. Thus, for distinct Q_1 and Q_2, it
must be the case that

    dist(E(Q_1), E(Q_2)) > p − m.
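Here is a minimal Python sketch of the encoding map and the distance bound just proved (prime
fields only; the helper names are ours, not the text's).

    def rs_encode(message, p):
        # Evaluate the polynomial with coefficients f_1, ..., f_m at 0, 1, ..., p-1 (mod p).
        return [sum(f * pow(x, i, p) for i, f in enumerate(message)) % p
                for x in range(p)]

    def hamming_distance(u, v):
        return sum(a != b for a, b in zip(u, v))

    if __name__ == "__main__":
        p, m = 13, 4
        c1 = rs_encode([1, 5, 0, 2], p)
        c2 = rs_encode([1, 5, 0, 3], p)
        # Distinct messages agree in at most m - 1 = 3 coordinates,
        # so their distance is greater than p - m = 9.
        print(hamming_distance(c1, c2))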
However, Reed-Solomon codes do not provide an asymptotically good family. If one represents
each field element by log_2 p bits in the obvious way, then the code has length p log_2 p, but can
only correct at most p errors. That said, one can find an asymptotically good family by encoding
each field element with its own small error-correcting code.

Next lecture, we will see how to make asymptotically good codes out of expander graphs. In the
following lecture, we will use good error-correcting codes to construct graphs.


28.8 Caution

Explain defects of the worst-case view.
Chapter 29

Expander Codes


In this Chapter we will learn how to use expander graphs to construct and decode asymptotically
good error correcting codes.


29.1 Bipartite Expander Graphs

Our construction of error-correcting codes will exploit bipartite expander graphs (as these give a
much cleaner construction than the general case). Let's begin by examining what a bipartite
expander graph should look like. Its vertex set will have two parts, U and V, each having n
vertices. Every vertex will have degree d, and every edge will go from a vertex in U to a vertex in V.

In the same way that we view ordinary expanders as approximations of complete graphs, we will
view bipartite expanders as approximations of complete bipartite graphs1. That is, if we let K_{n,n}
denote the complete bipartite graph, then we want a d-regular bipartite graph G such that

    (1 − ϵ) (d/n) K_{n,n} ≼ G ≼ (1 + ϵ) (d/n) K_{n,n}.

As the eigenvalues of the Laplacian of (d/n) K_{n,n} are 0 and 2d with multiplicity 1 each, and d
otherwise, this means that we want a d-regular graph G whose Laplacian spectrum satisfies

    λ_1 = 0, λ_{2n} = 2d, and |λ_i − d| ≤ ϵd, for all 1 < i < 2n.

We can obtain such a graph by taking the double-cover of an ordinary expander graph.

Definition 29.1.1. Let G = (V, E) be a graph. The double-cover of G is the graph with vertex set
V × {0, 1} and edges

    ((a, 0), (b, 1)), for (a, b) ∈ E.

It is easy to determine the eigenvalues of the double-cover of a graph.

1 The complete bipartite graph contains all edges between U and V.

Figure 29.1: The cycle on 4 vertices, and its double-cover

Proposition 29.1.2. Let H be the double-cover of G. Then, for every eigenvalue λ_i of the
Laplacian of G, H has a pair of eigenvalues,

    λ_i and 2d − λ_i.

The easiest way to prove this is to observe that if A is the adjacency matrix of G, then the
adjacency matrix of H looks like

    [ 0  A ]
    [ A  0 ].

Our analysis of error-correcting codes will exploit the following theorem, which is analogous to
Theorem 10.2.1.

Theorem 29.1.3. Let G = (U ∪ V, E) be a d-regular bipartite graph that ϵ-approximates (d/n) K_{n,n}.
Then, for all S ⊆ U and T ⊆ V,

    | |E(S, T)| − (d/n) |S| |T| | ≤ ϵ d \sqrt{|S| |T|}.

Proof. Similar to the proof of Theorem 27.3.1.

Let G(S ∪ T) denote the graph induced on vertex set S ∪ T. We use the following simple corollary
of Theorem 29.1.3.

Corollary 29.1.4. For S ⊆ U with |S| = σn and T ⊆ V with |T| = τn, the average degree of
vertices in G(S ∪ T) is at most

    2dστ/(σ + τ) + ϵd.

Proof. The average degree of a graph is twice its number of edges, divided by the number of
vertices. In our case, this is at most

    (2d/n) · |S| |T| / (|S| + |T|) + 2ϵd · \sqrt{|S| |T|} / (|S| + |T|).

The left-hand term is So, the codewords form a vector space of dimension dn(2r0 − 1), and so there is a matrix G with
2dστ dn(2r0 − 1) columns and dn rows for which the codewords are precisely the vectors Gx , for
,
σ+τ x ∈ {0, 1}dn(2r0 −1) . In fact, there are many such matrices G, and they are called generator
and the right-hand term is at most matrices for the code. Such a matrix G may be computed from M by elementary linear algebra.
ϵd.

29.4 Minimum Distance


29.2 Building Codes We will now see that if C0 is a good code, then C has large minimum distance. Let δ0 d be the
minimum distance of the code C0 . You should think of δ0 as being a constant.
Our construction of error-correcting codes will require two ingredients: a d-regular bipartite
Theorem 29.4.1. If ϵ ≤ δ0 /2, then the minimum relative distance δ of C satisfies
expander graph G on 2n vertices, and a linear error correcting code C0 of length d. We will
combine these to construct an error correcting code of length dn. We think of the code C0 as δ ≥ δ02 /2.
being a small code that drives the construction. This is reasonable as we will keep d a small
constant while n grows.
Proof. As C is a linear code, it suffices to prove that C has no nonzero codewords of small
In our construction of the code, we associate one bit with each edge of the graph. As the graph Hamming weight. To this end, we identify a codeword with the set of edges on which its bits are
has dn edges, this results in dn bits, which we label y1 , . . . , ydn . We now describe the code by 1. Let F be such a set of edges, and let |F | = ϕdn. As the minimum distance of C0 is δ0 d, every
listing the linear constraints its codewords must satisfy. Each vertex requires that the bits on its vertex v that is attached to an edge of F must be attached to at least δ0 d edges of F . Let S be
attached edges resemble a codeword in the code C0 . That is, each vertex should list its attached the subset of vertices of U adjacent to edges in F , and let T be the corresponding subset of V .
edges in some order (which order doesn’t matter, but it should be fixed). As a vertex has d We have just argued that every vertex in G(S ∪ T ) must have degree at least δ0 d, and so in
attached edges, it is easy to require that the d bits on these edges are a codeword in the code C0 . particular the average degree of G(S ∪ T ) is at least δ0 d.
Let r0 be the rate of code C0 . This means that the space of codewords has dimension r0 d. But, We may also use this fact to see that
since C0 is a linear code, it means that its codewords are exactly the vectors that satisfy some |F |
|S| , |T | ≤ .
set of d(1 − r0 ) linear equations. As there are 2n vertices in the graph, the constraints imposed by δ0 d
each vertex impose 2nd(1 − r0 ) linear constraints on the dn bits. Thus, the vector space of Setting σ = |S| /n and τ = |T | /n, the previous inequality becomes
codewords that satisfy all of these constraints has dimension at least
ϕ
dn − 2dn(1 − r0 ) = dn(2r0 − 1), σ, τ ≤ .
δ0
and the code we have constructed has rate at least
Corollary 29.1.4 tells us that the average degree of G(S ∪ T ) is at most
r = 2r0 − 1.
2dστ
So, this rate will be a non-zero constant as long as r0 > 1/2. + ϵd.
σ+τ
For the rest of the lecture, we will let C denote the resulting expander code. As
ϕ
2στ ≤ σ 2 + τ 2 ≤ (σ + τ ),
δ0
29.3 Encoding the average degree of G(S ∪ T ) is at most
ϕ
We have described the set of codewords, but have not said how one should encode. As the code is d + ϵd.
linear, it is relatively easy to find a way to encode it. In particular, one may turn the above δ0
description of the code into a matrix M with dn columns and 2dn(1 − r0 ) rows such that the Combining the upper and lower bounds on the average degree of G(S ∪ T ), we obtain
codewords are precisely those y such that
ϕ
M y = 0. δ0 d ≤ d + ϵd,
δ0

which implies This implies


δ0 (δ0 − ϵ) ≤ ϕ. δ0 τ ≤ 2στ + ϵ(σ + τ ),
The assumption ϵ ≤ δ0 /2 then yields which becomes
ϵσ
ϕ ≥ δ02 /2. τ≤ .
δ0 − 2σ − ϵ
As we assumed that F was the set of edges corresponding to a codeword and that |F | = ϕdn, we Recalling that σ ≤ δ0 /9 and ϵ ≤ δ0 /3, we obtain
have shown that the minimum relative distance of C is at least δ02 /2.
δ0 /3 3
τ ≤σ ≤ σ.
δ0 (4/9) 4
29.5 Decoding

We will convert an algorithm that corrects errors in C0 into an algorithm for correcting errors in Lemma 29.5.2. Assume that ϵ ≤ δ0 /3. Let F be the set of edges in error after a U -decoding
C. The construction is fairly simple. We first apply the decoding algorithm at every vertex in U . step, and let S be the set of vertices in U attached to F . Now, perform a V -decoding step and let
We then do it at every vertex in V . We alternate in this fashion until we produce a codeword. T be the set of vertices in V attached to edges in error afterwards. If
To make this more concrete, assume that we have an algorithm A that corrects up to δ0 d/2 errors |S| ≤ δ0 n/9,
in the code C0 . That is, on input any word r ∈ {0, 1}d , A outputs another word in {0, 1}d with
the guarantee that if there is a c ∈ C0 such that dist(c, r ) ≤ δ0 d/2, then A outputs c. We apply then
the transformation A independently to the edges attached to each vertex of U . We then do the 3
|T | ≤ |S| .
same for V , and then alternate sides for a logarithmic number of iterations. We refer to these 4
alternating operations as U - and V -decoding steps.
Proof. Every vertex in V that outputs an error after the V -decoding step must be attached to at
We will prove that if ϵ ≤ δ0 /3 then this algorithm will correct up to δ02 dn/18 errors in at most least δ0 d/2 edges of F . Moreover, each of these edges is attached to a vertex of S. Thus, the
log4/3 n iterations. The idea is to keep track of which vertices are attached to edges that contain lemma follows immediately from Lemma 29.5.1.
errors, rather than keeping track of the errors themselves. We will exploit the fact that any vertex
that is attached to few edges in error will correct those errors. Let S be the set of vertices Theorem 29.5.3. If ϵ ≤ δ0 /3, then the proposed decoding algorithm will correct every set of at
attached to edges in error after a U -decoding step. We will show that the set T of vertices most
δ02
attached to edges in error after the next V -decoding step will be much smaller. dn
18
Lemma 29.5.1. Assume that ϵ ≤ δ0 /3. Let F ⊂ E be a set of edges, let S be the subset of errors.
vertices in U attached to edges in F and let T be the subset of vertices in V attached to at least
δ0 d/2 edges in F . If Proof. Let F denote the set of edges that are initially in error. Let S denote the set of vertices
|S| ≤ δ0 n/9, that output errors after the first U -decoding step. Every vertex in S must be adjacent to at least
δd/2 edges in F , so
then
3 δ2 |F |
|T | ≤ |S| . |F | ≤ 0 dn =⇒ |S| ≤ ≤ δ0 n/9.
4 18 δ0 d/2
After this point, we may apply Lemma 29.5.2 to show that the decoding process converges in at
Proof. Let |S| = σn and |T | = τ n. We have |F | ≥ (δ0 d/2) |T |. As the average degree of G(S ∪ T ) most log4/3 n iterations.
is twice the number of edges in the subgraph divided by the number of vertices, it is at least

δ0 d |T | δ0 dτ
= . 29.6 Historical Notes
|S| + |T | σ+τ

Applying Corollary 29.1.4, we find Gallager [Gal63] first used graphs to construct error-correcting codes. His graphs were also
δ0 dτ 2dστ bipartite, with one set of vertices representing bits and the other set of vertices representing
≤ + ϵd. constraints. Tanner [Tan81] was the first to put the vertices on the edges. The use of expansion in
σ+τ σ+τ

analyzing these codes was pioneered by Sipser and Spielman [SS96]. The construction we present
here is due to Zemor [Zem01], although he presents a tighter analysis. Improved constructions
and analyses may be found in [BZ02, BZ05, BZ06, AS06].
Surprisingly, encoding these codes is slower than decoding them. The matrix G will be dense,
leading to an encoding algorithm that takes time Θ((dn)^2). Of course, one would prefer to encode
them in time O(dn). Using Ramanujan expanders and the Fast Fourier Transform over the
appropriate groups, Lafferty and Rockmore [LR97] reduced the time for encoding to O(d2 n4/3 ).
Chapter 30
Spielman [Spi96a] modifies the code construction to obtain codes with similar performance that
may be encoded in linear time.
Related ideas have been used to design codes that approach channel capacity. See
[LMSS01, RSU01, RU08].
A simple construction of expander
graphs

This Chapter Needs Editing

30.1 Overview

Our goal is to prove that for every ϵ > 0 there is a d for which we can efficiently construct an
infinite family of d-regular ϵ-expanders. I recall that these are graphs whose adjacency matrix
eigenvalues satisfy |µi | ≤ ϵd and whose Laplacian matrix eigenvalues satisfy |d − λi | ≤ ϵd. Viewed
as a function of ϵ, the d that we obtain in this construction is rather large. But, it is a constant.
The challenge here is to construct infinite families with fixed d and ϵ.
Before we begin, I remind you that in Lecture 5 we showed that random generalized hypercubes
were ϵ expanders of degree f (ϵ) log n, for some function f . The reason they do not solve today’s
problem is that their degrees depend on the number of vertices. However, today’s construction
will require some small expander graph, and these graphs or graphs like them can serve in that
role. So that we can obtain a construction for every number of vertices n, we will exploit random
generalized ring graphs. Their analysis is similar to that of random generalized hypercubes.

Claim 30.1.1. There exists a function f (ϵ) so that for every ϵ > 0 and every sufficiently large n
the Cayley graph with group Z/n and a random set of at least f (ϵ) log n generators is an
ϵ-expander with high probability.
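A quick numerical check of this claim is easy to set up. The sketch below (Python with numpy;
the constants are arbitrary choices of ours rather than values taken from the claim) builds a
random Cayley graph on Z/n whose generator set is closed under negation, and reports the
second-largest absolute adjacency eigenvalue relative to the degree.

    import numpy as np

    def random_ring_cayley(n, num_gens, rng):
        # Adjacency matrix of a Cayley graph on Z/n with a random generator set,
        # closed under negation so that the graph is undirected.
        gens = set()
        while len(gens) < num_gens:
            s = int(rng.integers(1, n))
            gens.update({s, (n - s) % n})
        A = np.zeros((n, n))
        for v in range(n):
            for s in gens:
                A[v, (v + s) % n] += 1
        return A

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 512
        A = random_ring_cayley(n, num_gens=int(8 * np.log2(n)), rng=rng)
        d = A[0].sum()
        mu = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
        print(d, mu[1] / d)    # a small ratio mu[1]/d indicates a good expander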

I am going to present the simplest construction of expanders that I have been able to find. By
“simplest”, I mean optimizing the tradeoff of simplicity of construction with simplicity of
analysis. It is inspired by the Zig-Zag product and replacement product constructions presented
by Reingold, Vadhan and Wigderson [RVW02].
For those who want the quick description, here it is. Begin with an expander. Take its line graph.
Observe that the line graph is a union of cliques. So, replace each clique by a small expander. We

240

need to improve the expansion slightly, so square the graph. Square one more time. Repeat. So, by squaring enough times, we can convert a family of β expanders for any β < 1 into a family
of ϵ expanders.
The analysis will be simple because all of the important parts are equalities, which I find easier to
understand than inequalities.
While this construction requires the choice of two expanders of constant size, it is explicit in the 30.3 The Relative Spectral Gap
sense that we can obtain a simple implicit representation of the graph: if the name of a vertex in
the graph is written using b bits, then we can compute its neighbors in time polynomial in b. To measure the qualities of the graphs that appear in our construction, we define a quantity that
we will call the relative spectral gap of a d-regular graph:
 
30.2 Squaring Graphs def
r(G) = min
λ2 (G) 2d − λn
, .
d d
We will first show that we can obtain a family of ϵ expanders from a family of β-expanders for The graphs with larger relative spectral gaps are better expanders. An ϵ-expander has relative
any β < 1. The reason is that squaring a graph makes it a better expander, although at the cost spectral gap at least 1 − ϵ, and vice versa. Because we can square graphs, we know that it suffices
of increasing its degree. to find an infinite family of graphs with relative spectral gap strictly greater than 0.
Given a graph G, we define the graph G2 to be the graph in which vertices u and v are connected We now state exactly how squaring impacts the relative spectral gap of a graph.
if they are at distance 2 in G. Formally, G2 should be a weighted graph in which the weight of an
edge is the number of such paths. When first thinking about this, I suggest that you ignore the Corollary 30.3.1. If G has relative spectral gap β, then G2 has relative spectral gap at least
issue. When you want to think about it, I suggest treating such weighted edges as multiedges.
2β − β 2 .
We may form the adjacency matrix of G2 from the adjacency matrix of G. Let M be the
adjacency matrix of G. Then M 2 (u, v) is the number of paths of length 2 between u and v in G,
and M 2 (v, v) is always d. We will eliminate those self-loops. So, Note that when β is small, this gap is approximately 2β.

M G2 = M 2G − dIn .
30.4 Line Graphs
If G has no cycles of length up to 4, then all of the edges in its square will have weight 1. The
following claim is immediate from this definition.
Our construction will leverage small expanders to make bigger expanders. To begin, we need a
Claim 30.2.1. The adjacency matrix eigenvalues of G2 are precisely way to make a graph bigger and still say something about its spectrum.

µ2i − d, We use the line graph of a graph. Let G = (V, E) be a graph. The line graph of G is the graph
whose vertices are the edges of G, in which two are connected if they share an endpoint in G.
where µ1 , . . . , µn are the adjacency matrix eigenvalues of G. That is, (u, v), (w, z) is an edge of the line graph if one of {u, v} is the same as one of {w, z}.
√ The line graph is often written L(G), but we won’t do that in this class so that we can avoid
Lemma
 2 30.2.2. If {Gi }i is an infinite family of d-regular β-expanders for β ≥ 1/ d − 1, then
confusion with the Laplacian.
Gi i is an infinite family of d(d − 1)-regular β 2 expanders.

We remark that the case of β > 1/ d − 1, or even larger, is the case
√ of interest. We are not
expecting to work with graphs that beat the Ramanujan bound, 2 d − 1/d.

Proof. For µ an adjacency matrix eigenvalue of Gi other than d, we have

µ2 − d µ2 − d µ2
= 2 ≤ 2 ≤ β2.
d(d − 1) d −d d
(a) A graph (b) Its line graph.
On the other hand, every adjacency eigenvalue of G2i is at least −d, which is at least
−β 2 d(d − 1).

Let G be a d-regular graph with n vertices, and let H be its line graph1 .As G has dn/2 edges, H Proof of Lemma 30.4.1. First, let λi be an eigenvalue of LG . We see that
has dn/2 vertices. Each vertex of H, say (u, v), has degree 2(d − 1): d − 1 neighbors for the other
edges attached to u and d − 1 for v. In fact, if we just consider one vertex u in V , then all vertices λi is an eigenvalue of D G − M G =⇒
in H of form (u, v) of G will be connected. That is, H contains a d-clique for every vertex in V . d − λi is an eigenvalue of M G =⇒
We see that each vertex of H is contained in exactly two of these cliques. 2d − λi is an eigenvalue of D G + M G =⇒
Here is the great fact about the spectrum of the line graph. 2d − λi is an eigenvalue of 2Ind/2 + M H =⇒

Lemma 30.4.1. Let G be a d-regular graph with n vertices, and let H be its line graph. Then the 2(d − 1) − λi is an eigenvalue of M H =⇒
spectrum of the Laplacian of H is the same as the spectrum of the Laplacian of G, except that it λi is an eigenvalue of D H − M H .
has dn/2 − n extra eigenvalues of 2d.
Of course, this last matrix is the Laplacian matrix of H. We can similarly show that the extra
Before we prove this lemma, we need to recall the factorization of a Laplacian as the product of dn/2 − n zero eigenvalues of 2Ind/2 + M H become 2d in LH .
the signed edge-vertex adjacency matrix times its transpose. We reserved the letter U for this
matrix, and defined it by While the line graph operation preserves λ2 , it causes the degree of the graph to grow. So, we are

 going to need to do more than just take line graphs to construct expanders.
1 if a = c
U ((a, b), c) = −1 if b = c Proposition 30.5.1. Let G be a d-regular graph with d ≥ 7 and let H be its line graph. Then,


0 otherwise.
λ2 (G)
For an unweighted graph, we have r(H) = ≥ r(G)/2.
2(d − 1)
LG = U T U .
Recall that each edge indexes one column, and that we made an arbitrary choice when we ordered Proof. For G a d-regular graph other than Kd+1 , λ2 (G) ≤ d + 1. By the Perron-Frobenius
the edge (a, b) rather than (b, a). But, this arbitrary choice factors out when we multiply by U T . theorem (Lemma 6.A.1) λmax (G) ≤ 2d (with equality if and only G is bipartite). So,
λmax (H) = 2d and λ2 (H) = λ2 (G) ≤ d. So, the term in the definition of the relative spectral gap
corresponding to the largest eigenvalue of H satisfies
30.5 The Spectrum of the Line Graph
2(2d − 2) − λmax (H) 2(2d − 2) − 2d 2
= = 1 − ≥ 5/7,
2d − 2 2d − 2 d
Define the matrix |U | to be the matrix obtained by replacing every entry of U by its absolute
value. Now, consider |U |T |U |. It looks just like the Laplacian, except that all of its off-diagonal as d ≥ 7. On the other hand,
entries are 1 instead of −1. So, λ2 (H) d
≤ ≤ 2/3.
2d − 2 2d − 2
T
|U | |U | = D G + M G = dI + M G , As 2/3 < 5/7,
as G is d-regular. We will also consider the matrix |U | |U |T . This is a matrix with nd/2 rows  
λ2 (H) 2(2d − 2) − λmax (H) λ2 (H) λ2 (G)
and nd/2 columns, indexed by edges of G. The entry at the intersection of row (u, v) and column min , = = ≥ r(G/2).
2d − 2 2d − 2 2d − 2 2d − 2
(w, z) is
(δ u + δ v )T (δ w + δ z ).
So, it is 2 if these are the same edge, 1 if they share a vertex, and 0 otherwise. That is
While the line graph of G has more vertices, its degree is higher and its relative spectral gap is
|U | |U |T = 2Ind/2 + M H . approximately half that of G. We can improve the relative spectral gap by squaring. In the next
section, we show how to lower the degree.
Moreover, |U | |U |T and |U |T |U | have the same eigenvalues, except that the later matrix has
nd/2 − n extra eigenvalues of 0.
1
If G has multiedges, which is how we interpret integer weights, then we include a vertex in the line graph for
each of those multiedges. These will be connected to each other by edges of weight two—one for each vertex that
they share. All of the following statements then work out.

30.6 Approximations of Line Graphs So,


 
kλ2 (G) kλ2 (G)
Our next step will be to construct approximations of line graphs. We already know how to min (λ2 (G⃝Z),
L 2(2k) − λmax (G⃝Z))
L ≥ min (1 − α) , (1 − α)2k = (1 − α) ,
d d
approximate complete graphs: we use expanders. As line graphs are sums of complete graphs, we
will approximate them by sums of expanders. That is, we replace each clique in the line graph by as λ2 (G) ≤ d. So,
1 1−α
an expander on d vertices. Since d will be a constant in our construction, we will be able to get r(G⃝Z)
L ≥ (1 − α)kr(G) = r(G).
these small expanders from known constructions, like the random generalized ring graphs. 2k 2

Let G be a d-regular graph and let Z be a graph on d vertices of degree k (we will use a
low-degree expander). We define the graph So, the relative spectral gap of G⃝Z
L is a little less than half that of G. But, the degree of G⃝Z
L
is 2k, which we will arrange to be much less than the degree of G, d.
G⃝Z
L
We will choose k and d so that squaring this graph improves its relative spectral gap, but still
to be the graph obtained by forming the edge graph of G, H, and then replacing every d-clique in leaves its degree less than d. If G has relative spectral gap β, then G2 has relative spectral gap at
H by a copy of Z. Actually, this does not uniquely define G⃝Z,
L as there are many ways to least
replace a d-clique by a copy of Z. But, any choice will work. Note that every vertex of G⃝Z
L has 2β − β 2 .
degree 2k.
It is easy to see that when β is small, this gap is approximately 2β. This is not quite enough to
Lemma 30.6.1. Let G be a d-regular graph, let H be the line graph of G, and let Z be a compensate for the loss of (1 − ϵ)/2 in the corollary above, so we will have to square the graph
k-regular α-expander. Then, once more.

k k
(1 − α) H ≼ G⃝Z
L ≼ (1 + α) H
d d 30.7 The whole construction
Proof. As H is a sum of d-cliques, let H1 , . . . , Hn be those d-cliques. So,
To begin, we need a “small” k-regular expander graph Z on
n
X
LH = LHi . def
d = (2k(2k − 1))2 − 2k(2k − 1)
i=1
vertices. It should be an ϵ-expander for some small ϵ. I believe that ϵ = 1/6 would suffice. The
Let Zi be the graph obtained by replacing Hi with a copy of Z, on the same set of vertices. To
other graph we will need to begin our construction will be a small d-regular expander graph G0 .
prove the lower bound, we compute
We use Claim 30.1.1 to establish the existence of both of these. Let β be the relative spectral gap
n
X kX
n
k of G0 . We will assume that β is small, but greater than 0. I believe that β = 1/5 will work. Of
LG⃝
LZ = LZi ≽ (1 − α) LHi = (1 − α) LH . course, it does not hurt to start with a graph of larger relative spectral gap.
d d
i=1 i=1
We then construct G0 ⃝Z.
L The degree of this graph is 2k, and its relative spectral gap is a little
The upper bound is proved similarly. less than β/2. So, we square the resulting graph, to obtain
Corollary 30.6.2. Under the conditions of Lemma 30.6.1, 2
(G0 ⃝Z)
L .
1−α It has degree approximately 4k 2 ,
and relative spectral gap slightly less than β. But, for induction,
r(G⃝Z)
L ≥ r(G).
2 we need it to be more than β. So, we square one more time, to get a relative spectral gap a little
less than 2β. We now set
Proof. The proof is similar to the proof of Proposition 30.5.1. We have  2
2
G1 = (G0 ⃝Z)
L .
kλ2 (G)
λ2 (G⃝Z)
L ≥ (1 − α) , The graph G1 is at least as good an approximation of a complete graph as G0 , and it has degree
d approximately 16k 4 . In general, we set
and  2
2
λmax (G⃝Z)
L ≤ (1 + α)2k. Gi+1 = (Gi ⃝Z)
L .
CHAPTER 30. A SIMPLE CONSTRUCTION OF EXPANDER GRAPHS 247

To make the inductive construction work, we need for Z to be a graph of degree k whose number
of vertices equals the degree of G. This is approximately 16k 4 , and is exactly

(2k(2k − 1))2 − 2k(2k − 1).

I’ll now carry out the computation of relative spectral gaps with more care. Let’s assume that G0
has a relative spectral gap of β ≥ 4/5, and assume, by way of induction, that ρ(Gi ) ≥ 4/5. Also Chapter 31
assume that Z is a 1/6-expander. We then find

r(Gi ⃝Z)
L ≥ (1 − ϵ)(4/5)/2 = 1/3.

So, Gi ⃝Z
L is a 2/3-expander. Our analysis of graph squares then tells us that Gi+1 is a
PSRGs via Random Walks on Graphs
(2/3)4 -expander. So,
r(Gi+1 ) ≥ 1 − (2/3)4 = 65/81 > 4/5.
By induction, we conclude that every Gi has relative spectral gap at least 4/5.
31.1 Overview
To improve their relative spectral gaps of the graphs we produce, we can just square them a few
times.
There are three major approaches to designing pseudo-random generators (PSRGs). The most
common is to use quick procedures that seem good enough. This is how the PSRGs that are
standard in most languages arise. Cryptographers and Complexity Theorists try to design PSRGs
30.8 Better Constructions
that work for every polynomial-time algorithm. For example, one can construct PSRGs from
cryptographic functions with the guarantee that if the output of a polynomial-time algorithm
There is a better construction technique, called the Zig-Zag product [RVW02]. The Zig-Zag differs from random when using the PSRG, then one can use it to break the cryptographic
construction is a little trickier to understand, but it achieves better expansion. I chose to present function (see [HILL99, Gol07]). In this chapter we consider the construction of PSRGs that can
the line-graph based construction because its analysis is very closely related to an analysis of the be proved to work for specific algorithms or algorithms of specific forms. In particular, we will see
Zig-Zag product. w Impagliazzo and Zuckerman’s [IZ89] approach of using of random walks on expanders to run
the same algorithm many times. We are going to perform a very crude analysis that is easy to
present. Rest assured that much tighter analyses are possible and much better PSRGs have been
constructed since.

31.2 Why Study PSRGs?

Pseudo-random number generators take a seed which is presumably random (or which has a lot of
randomness in it), and then generate a long string of random bits that are supposed to act
random. We should first discuss why we would actually want such a thing. I can think of two
reasons.

1. Random bits are scarce. This might be surprising. After all, if you look at the last few bits
of the time that I last hit a key, it is pretty random. Similarly, the low-order bits of the
temperature of the processor in my computer seem pretty random. While these bits are
pretty random, there are not too many of them.
Many randomized algorithms need a lot of random bits. Sources such as these just do not
produce random bits with a frequency sufficient for many applications.

248

2. If you want to re-run an algorithm, say to de-bug it, it is very convenient to be able to use Since we will not make any assumptions about the black box, we will use truly random bits the
the same set of random bits by re-running the PSRG with the same seed. If you use truly first time we test it. But, we will show that we only need 9 new random bits for each successive
random bits, you can’t do this. test. In particular, we will show that if we use our PSRG to generate bits for t + 1 test, then the
probability that majority answer is wrong decreases exponentially in t.
You may also wonder how good the standard pseudo-random number generators are. The first You are probably wondering why we would want to do such a thing. The reason is to increase the
answer is that the default ones, such as rand in C, are usually terrible. There are many accuracy of randomized algorithms. There are many randomized algorithms that provide weak
applications, such as those in my thesis, for which these generators produce behavior that is very guarantees, such as being correct 99% or 51% of the time. To obtain accurate answers from such
different from what one would expect from truly random bits (yes, this is personal). On the other algorithms, we run them many times with fresh random bits. You can view such an algorithm has
hand, one can use cryptographic functions to create bits that will act random for most purposes, having two inputs: the problem to be solved and its random bits. The black box is the behavior
unless one can break the underlying cryptography [HILL99]. But, the resulting generators are of the algorithm when the problem to be solved is fixed, so it is just working on the random bits.
usually much slower than the fastest pseudo-random generators. Fundamentally, it comes down to
a time-versus-quality tradeoff. The longer you are willing to wait, the better the pseudo-random
bits you can get. 31.5 The Random Walk Generator

31.3 Expander Graphs Let r be the number of bits that our black box takes as input. So, the space of random bits is
{0, 1}r . Let X ⊂ {0, 1}r be the settings of the random bits on which the box gives the minority
answer, and let Y be the settings on which it gives the majority answer.
In today’s lecture we will require an infinite family of d-regular 1/10-expander graphs. We require
that d be a constant, that the graphs have 2r vertices for all sufficiently large r, and that we can Our pseudo-random generator will use a random walk on a 1/10-expander graph whose vertex set
construct the neighbors of a vertex in time polynomial in r. That is, we need the graphs to have a is {0, 1}r . Recall that we can use d = 400. For the first input we feed to the black box, we will
simple explicit description. One can construct expanders families of this form using the require r truly random bits. We treat these bits as a vertex of our graph. For each successive test,
techniques from last lecture. For today’s purposes, the best expanders are the Ramanujan graphs we choose a random neighbor of the present vertex, and feed the corresponding bits to the box.
produced by Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88]. Ramanujan graphs of That is, we choose a random i between 1 and 400, and move to the ith neighbor of the present
degree d = 400 are 1/10-expanders. See also the work of Alon, Bruck, Naor, Naor and vertex. Note that we only need log2 400 ≈ 9 random bits to choose the next vertex. So, we will
Roth [ABN+ 92] for even more explicit constructions. only need 9 new bits to generate each input we feed to the box after the first.

While the explicit Ramanujan graphs only exist in certain sizes, none of which do have exactly 2r
vertices, some of them have just a little more that 2r vertices. It is possible to trim these to make 31.6 Formalizing the problem
them work, say by ignoring all steps in which the vertex does not correspond to r bits.

Assume that we are going to test the box t + 1 times. Our pseudo-random generator will begin at
31.4 Today’s Application : repeating an experiment a truly random vertex v, and then take t random steps. Recall that we defined X to be the set of
vertices on which the box outputs the minority answer, and we assume that |X| ≤ 2r /100. If we
report the majority of the outcomes of the t + 1 outputs of the box, we will return the correct
Imagine you are given a black box that takes r bits as input and then outputs either 0 or 1. answer as long as the random walk is inside X less than half the time. To analyze this, let v0 be
Moreover, let’s assume that the black box is very consistent: we know that it returns the same the initial random vertex, and let v1 , . . . , vt be the vertices produced by the t steps of the random
answer at least 99% of the time. If it almost always returns 0, we will call it a 0-box and if it walk. Let T = {0, . . . , t} be the time steps, and let S = {i : vi ∈ X}. We will prove
almost always returns 1, we will call it a 1-box. Our job is to determine whether a given box is a
 
0 or 1 box. We assume that r is big, so we don’t have time to test the box on all 2r settings of r 2 t+1
bits. Instead, we could pick r bits at random, and check what the box returns. If it says“1”, then Pr [|S| > t/2] ≤ √ .
5
it is probably a 1-box. But, what if we want more than 99% confidence? We could check the box
on many choices of r random bits, and report the majority value returned by the box.1 . But, this To begin our analysis, recall that the initial distribution of our random walk is p 0 = 1/n. Let χX
seems to require a new set of random bits for each run. In this lecture, we will prove that 9 new and χY be the characteristic vectors of X and Y , respectively, and let D X = diag(χX ) and
bits per run suffice. Note that the result would be interesting for any constant other than 9. D Y = diag(χY ). Let
1
Check for yourself that running it twice doesn’t help 1
W = M (31.1)
d

be the transition matrix of the ordinary random walk on G. We are not using the lazy random The matrix norm measures how much a vector can increase in size when it is multiplied by M .
walk: it would be silly to use the lazy random walk for this problem, as there is no benefit to When M is symmetric, the 2-norm is just the largest absolute value of an eigenvalue of M (prove
re-running the experiment with the same random bits as before. Let ω1 , . . . , ωn be the eigenvalues this for yourself). It is also immediate that
of W . As the graph is a 1/10-expander, |ωi | ≤ 1/10 for all i ≥ 2.
∥M 1 M 2 ∥ ≤ ∥M 1 ∥ ∥M 2 ∥ .
Let’s see how we can use these matrices to understand the probabilities under consideration. For
a probability vector p on vertices, the probability that a vertex chosen according to p is in X You should also verify this yourself. As D X , D Y and W are symmetric, they each have norm 1.
may be expressed
χTX p = 1T D X p. Warning 31.7.1. While the largest eigenvalue of a walk matrix is 1, the norm of an asymmetric
walk matrix can be larger√than 1. For instance, consider the walk matrix of the path on 3 vertices.
The second form will be more useful, as Verify that it has norm 2.
DXp
is the vector obtained by zeroing out the events in which the vertices are not in X. If we then Our analysis rests upon the following bound on the norm of D X W .
want to take a step in the graph G, we multiply by W . That is, the probability that the walk
starts at vertex in X, and then goes to a vertex i is q (i) where Lemma 31.7.2.
∥D X W ∥ ≤ 1/5.
q = W D X p 0.
Let’s see why this implies the theorem. For any set R, let Zi be as defined above. As p 0 = W p 0 ,
Continuing this way, we see that the probability that the walk is in X at precisely the times i ∈ R we have
is 
1T D Zt W D Zt−1 W · · · D Z1 W D Z0 p 0 , 1T D Zt W D Zt−1 W · · · D Z1 W D Z0 p 0 = 1T (D Zt W ) D Zt−1 W · · · (D Z0 W ) p 0 .
where ( Now,
X if i ∈ R (
Zi = 1/5 for i ∈ R, and
Y otherwise. D Zt−1 W ≤
1 for i ∈
̸ R.
We will prove that this probability is at most (1/5)|R| . It will then follow that So, 
X (D Zt W ) D Zt−1 W · · · (D Z0 W ) ≤ (1/5)|R| .
Pr [|S| > t/2] ≤ Pr [the walk is in X at precisely the times in R] √ √
|R|>t/2 As ∥p 0 ∥ = 1/ n and ∥1∥ = n, we may conclude
X  1 |R|  
≤ 1T (D Zt W ) D Zt−1 W · · · (D Z0 W ) p 0 ≤ 1T (D Zt W ) D Zt−1 W · · · (D Z0 W ) p 0
5
|R|>t/2 ≤ 1T (1/5)|R| ∥p 0 ∥
 (t+1)/2
≤ 2t+1
1 = (1/5)|R| .
5
 
2 t+1
= √ . 31.8 The norm of D X W
5

Proof of Lemma 31.7.2. Let x be any non-zero vector, and write


31.7 Matrix Norms
x = c1 + y ,
Recall that the operator norm of a matrix M (also called the 2-norm) is defined by
where 1T y = 0. We will show that ∥D X W x ∥ ≤ ∥x ∥ /5.
∥M v ∥
∥M ∥ = max . We know that the constant vectors are eigenvectors of W . So, W 1 = 1 and
v ∥v ∥
D X W 1 = χX .
CHAPTER 31. PSRGS VIA RANDOM WALKS ON GRAPHS 253

This implies p √
∥D X W c1∥ = c ∥χX ∥ = c |X| ≤ c n/10.

We will now show that ∥W y ∥ ≤ ∥y ∥ /10. The easiest way to see this is to consider the matrix

W − J /n,

where we recall that J is the all-1 matrix. This matrix is symmetric and all of its eigenvalues
have absolute value at most 1/10. So, it has norm at most 1/10. Moreover, (W − J /n)y = W y ,
which implies ∥W y ∥ ≤ ∥y ∥ /10. Another way to prove this is to expand y in the eigenbasis of
W , as in the proof of Lemma 2.1.3.
Finally, as 1 is orthogonal to y , q
∥x ∥ = c2 n + ∥y ∥2 .
So,
√ Part VI
∥D X W x ∥ ≤ ∥D X W c1∥ + ∥D X W y ∥ ≤ c n/10 + ∥y ∥ /10 ≤ ∥x ∥ /10 + ∥x ∥ /10 ≤ ∥x ∥ /5.

Algorithms
31.9 Conclusion

Observe that this is a very strange proof. When considering probabilities, it seems that it would
be much more natural to sum them. But, here we consider 2-norms of probability vectors.

31.10 Notes

For the best results on the number of bits one needs for each run of an algorithm, see [?].
For tighter results on the concentration on variables drawn from random walks on expanders, see
Gillman [Gil98]. For matrices, see [GLSS18].

254

4. the solutions of linear equations in the two matrices are similar.

We will prove this by using a very simple random construction. We first carefully1 choose a
probability pa,b for each edge (a, b). We then include each edge (a, b) with probability pa,b ,
independently. If we do include edge (a, b), we give it weight wa,b /pa,b . We will show that our
choice of probabilities ensures that the resulting graph H has at most 4n ln n/ϵ2 edges and is an ϵ
Chapter 32 approximation of G with high probability.
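As a rough illustration of this sampling scheme, here is a Python sketch (numpy assumed; the
function names are ours, it computes exact effective resistances from the pseudoinverse, which is
only sensible for small graphs, and the constant in R follows the choice made later in this chapter).

    import numpy as np

    def laplacian(n, edges, weights):
        L = np.zeros((n, n))
        for (a, b), w in zip(edges, weights):
            L[a, a] += w; L[b, b] += w
            L[a, b] -= w; L[b, a] -= w
        return L

    def sparsify(n, edges, weights, eps, rng):
        # Keep edge (a,b) with probability p_ab = min(1, w_ab * Reff(a,b) / R),
        # and give a kept edge the weight w_ab / p_ab.
        Lpinv = np.linalg.pinv(laplacian(n, edges, weights))
        R = eps ** 2 / (3.5 * np.log(n))
        kept = []
        for (a, b), w in zip(edges, weights):
            reff = Lpinv[a, a] + Lpinv[b, b] - 2 * Lpinv[a, b]
            p = min(1.0, w * reff / R)
            if rng.random() < p:
                kept.append(((a, b), w / p))
        return kept

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n = 400
        edges = [(a, b) for a in range(n) for b in range(a + 1, n)]   # complete graph
        weights = [1.0] * len(edges)
        H = sparsify(n, edges, weights, eps=0.5, rng=rng)
        print(len(edges), len(H))    # H keeps noticeably fewer edges than G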
The reason we employ this sort of sampling–blowing up the weight of an edge by dividing by the
probability that we choose it—is that it preserves the matrix in expectation. Let La,b denote the
Sparsification by Random Sampling elementary Laplacian on edge (a, b) with weight 1, so that
X
LG = wa,b La,b .
(a,b)∈E

We then have that X


32.1 Overview ELH = pa,b (wa,b /pa,b )La,b = LG .
(a,b)∈E

Two weeks ago, we learned that expander graphs are sparse approximations of the complete
graph. This week we will learn that every graph can be approximated by a sparse graph. Today,
we will see how a sparse approximation can be obtained by careful random sampling: every graph
32.3 Matrix Chernoff Bounds
on n vertices has an ϵ-approximation with only O(ϵ−2 n log n) edges (a result of myself and
Srivastava [SS11]). We will prove this using a matrix Chernoff bound due to Tropp [Tro12]. The main tool that we will use in our analysis is a theorem about the concentration of random
matrices. These may be viewed as matrix analogs of the Chernoff bound that we saw in Lecture
We originally proved this theorem using a concentration bound of Rudelson [Rud99]. This 5. These are a surprisingly recent development, with the first ones appearing in the work of
required an argument that used sampling with replacement. When I taught this result in 2012, I Rudelson and Vershynin [Rud99, RV07] and Ahlswede and Winter [AW02]. The best present
asked if one could avoid sampling with replacement. Nick Harvey pointed out to me the argument source for these bounds is Tropp [Tro12], in which the following result appears as Corollary 5.2.
that avoids replacement that I am presenting today.
Theorem 32.3.1. Let X 1 , . . . , X m be independent random n-dimensional
P symmetric positive
In the next lecture, we will see that the log n term is unnecessary. In fact, almost every graph can semidefinite matrices so that ∥X i ∥ ≤ R almost surely. Let X = i X i and let µmin and µmax be
be approximated by a sparse graph almost as well as the Ramanujan graphs approximate the minimum and maximum eigenvalues of
complete graphs.
X
E [X ] = E [X i ] .
i
32.2 Sparsification
Then,
" #  µmin /R
For this lecture, I define a graph H to be an ϵ-approximation of a graph G if X e−ϵ
Pr λmin ( X i ) ≤ (1 − ϵ)µmin ≤ n , for 0 < ϵ < 1, and
(1 − ϵ)1−ϵ
(1 − ϵ)LG ≼ LH ≼ (1 + ϵ)LG . "
i
#
X  µmax /R

We will show that every graph G has a good approximation by a sparse graph. This is a very Pr λmax ( X i ) ≥ (1 + ϵ)µmax ≤ n , for 0 < ϵ.
strong statement, as graphs that approximate each other have a lot in common. For example, (1 + ϵ)1+ϵ
i

1. the effective resistance between all pairs of vertices are similar in the two graphs, It is important to note that the matrices X 1 , . . . , X m can have different distributions. Also note
that as the norms of these matrices get bigger, the bounds above become weaker. As the
2. the eigenvalues of the graphs are similar,
1
For those who can’t stand the suspense, we reveal that we will choose the probabilities to be proportional to
3. the boundaries of all sets are similar, as these are given by χTS LG χS , and leverage scores of the edges.

255

expressions above are not particularly easy to work with, we often use the following so that X
+/2 +/2
approximations. LG LH LG = X a,b .
  (a,b)∈E
e−ϵ 2
≤ e−ϵ /2 , for 0 < ϵ < 1, and We will choose the probabilities to be
(1 − ϵ)1−ϵ
 ϵ

e 2
1
≤ e−ϵ /3 , for 0 < ϵ < 1. def
pa,b =
+/2 +/2
wa,b LG L(a,b) LG ,
(1 + ϵ)1+ϵ R

Chernoff (and Hoeffding and Bernstein) bounds rarely come in exactly the form you want. for an R to be chosen later. Thus, when edge (a, b) is chosen, ∥Xa,b ∥ = R. Making this value
Sometimes you can massage them into the needed form. Sometimes you need to prove your own. uniform for every edge optimizes one part of Theorem 32.3.1.
For this reason, you may some day want to spend a lot of time reading how these are proved. You may wonder what we should do if one of these probabilities pa,b exceeds one. There are many
ways of addressing this issue. For now, pretend that it does not happen. We will then explain
how to deal with this at the end of lecture.
32.4 The key transformation
Recall that the leverage score of edge (a, b) written ℓa,b was defined in Lecture 14 to be the weight
of an edge times the effective resistance between its endpoints:
Before applying the matrix Chernoff bound, we make a transformation that will cause
µmin = µmax = 1. ℓa,b = wa,b (δ a − δ b )T L+
G (δ a − δ b ).
For positive definite matrices A and B, we have
To see the relation between the leverage score and pa,b , compute
A ≼ (1 + ϵ)B ⇐⇒ B −1/2 AB −1/2 ≼ (1 + ϵ)I .
The same things holds for singular semidefinte matrices that have the same nullspace:
+/2 +/2 +/2 +/2
+/2 +/2 +/2 +/2
LG L(a,b) LG = LG (δ a − δ b )(δ a − δ b )T LG
LH ≼ (1 + ϵ)LG ⇐⇒ LG LH LG ≼ (1 + ϵ)LG LG LG ,
+/2 +/2
+/2
= (δ a − δ b )T LG LG (δ a − δ b )
where LG is the square root of the pseudo-inverse of LG . Let
= (δ a − δ b )T L+
G (δ a − δ b )
+/2 +/2
Π = LG LG LG , = Reff (a, b).
which is the projection onto the range of LG . We now know that LG is an ϵ-approximation of LH
+/2 +/2 As we can quickly approximate the effective resistance of every edge, we can quickly compute
if and only if LG LH LG is an ϵ-approximation of Π.
sufficient probabilities.
As multiplication by a fixed matrix is a linear operation and expectation commutes with linear
Recall that the leverage score of an edge equals the probability that the edge appears in a random
operations,
+/2 +/2 +/2 +/2 +/2 +/2 spanning tree. As every spanning tree has n − 1 edges, this means that the sum of the leverage
ELG LH LG = LG (ELH ) LG = ELG LG LG = Π. scores is n − 1, and thus
X n−1 n
So, we really just need to show that this random matrix is probably close to its expectation, Π. It pa,b = ≤ .
R R
would probably help to pretend that Π is in fact the identity, as it will make it easier to (a,b)∈E

understand the analysis. In fact, you don’t have to pretend: you could project all the vectors and This is a very clean bound on the expected number of edges in H. One can use a Chernoff bound
matrices onto the span of Π and carry out the analysis there. (on real variables rather than matrices) to prove that it is exponentially unlikely that the number
of edges in H is more than any small multiple of this.

32.5 The probabilities

Let (
+/2 +/2
(wa,b /pa,b )LG L(a,b) LG with probability pa,b
X a,b =
0 otherwise,

For your convenience, I recall another proof that the sum of the leverage scores is n − 1: We finally return to deal with the fact that there might be some edges for which pa,b ≥ 1 and so
X X definitely appear in H. There are two natural ways to deal with these—one that is easiest
ℓa,b = wa,b Reff (a, b) algorithmically and one that simplifies the proof. The algorithmically natural way to handle these
(a,b)∈E (a,b)∈E is to simply include these edges in H, and remove them from the analysis above. This requires a
X
wa,b (δ a − δ b )T L+ small adjustment to the application of the Matrix Chernoff bound, but it does go through.
= G (δ a − δ b )
(a,b)∈E From the perspective of the proof, the simplest way to deal with these is to split each such X a,b
X 
= wa,b Tr L+
G (δ a − δ b )(δ a − δ b )
T into many independent random edges: k = ⌊ℓa,b /R⌋ that appear with probability exactly 1, and
(a,b)∈E one more that appears with probability ℓa,b /R − k. This does not change the expectation of their
  sum, or the expected number of edges once we remember to add together the weights of edges
X
= Tr  L+ T
− δ b )(δ a − δ b ) that appear multiple times. The rest of the proof remains unchanged.
G wa,b (δ a
(a,b)∈E
 
X 32.7 Open Problem
= Tr L+
G wa,b La,b 
(a,b)∈E
 If I have time in class, I will sketch a way to quickly approximate the effective resistances of every
= Tr L+
G LG
edge in the graph. The basic idea, which can be found in [SS11] and which is carried out better in
= Tr (Π)
[KLP12], is that we can compute the effective resistance of an edge (a, b) from the solution to a
= n − 1. logarithmic number of systems of random linear equations in LG . That is, after solving a
logarithmic number of systems of linear equations in LG , we have information from which we can
estimates all of the effective resistances.
32.6 The analysis
In order to sparsify graphs, we do not actually need estimates of effective resistances that are
always accurate. We just need a way to identify many edges of low effective resistance, without
We will choose listing any that have high effective resistance. I believe that better algorithms for doing this
ϵ2
R= . remain to be found. Current fast algorithms that make progress in this direction and that exploit
3.5 ln n
such estimates may be found in [KLP12, Kou14, CLM+ 15, LPS15]. These, however, rely on fast
Thus, the number of edges in H will be at most 4(ln n)ϵ−2 with high probability. Laplacian equation solvers. It would be nice to be able to estimate effective resistances without
We have these. A step in this direction was recently taken in the works [CGP+ 18, LSY18], which quickly
X
EX a,b = Π. decompose graphs into the union of short cycles plus a few edges.
(a,b)∈E

It remains to show that it is unlikely to deviate from this by too much.


We first consider the case in which p(a,b) ≤ 1 for all edges (a, b). If this is the case, then Theorem
32.3.1 tells us that
 
X 
Pr  X a,b ≥ (1 + ϵ)Π ≤ n exp −ϵ2 /3R = n exp (−(3.5/3) ln n) = n−1/6 .
a,b

For the lower bound, we need to remember that we can just work orthogonal to the all-1s vector,
and so treat the smallest eigenvalue of Π as 1. We then find that
 
X 
Pr  X a,b ≤ (1 − ϵ)Π ≤ n exp −ϵ2 /2R = n exp (−(3.5/2) ln n) = n−3/2 ,
a,b
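To make the idea mentioned above a little more concrete, here is a hedged sketch (my own, not from the text) of the [SS11]-style approach: estimate every effective resistance at once from a logarithmic number of solves against random right-hand sides, by randomly projecting the rows of W^{1/2} B L_G^+. It uses a dense least-squares solve as a stand-in for a fast Laplacian solver, and the function and parameter names are assumptions of this sketch.

import numpy as np

def approx_effective_resistances(B, w, k=24, rng=np.random.default_rng(0)):
    """B: m x n signed edge-vertex incidence matrix; w: edge weights (length m).
    Returns estimates of Reff for every edge, from k random projections."""
    m, n = B.shape
    L = B.T @ (w[:, None] * B)                      # Laplacian L = B^T W B
    Q = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(k)
    Y = Q @ (np.sqrt(w)[:, None] * B)               # k x n: random combinations of the rows of W^{1/2} B
    Z = np.linalg.lstsq(L, Y.T, rcond=None)[0].T    # each row of Z solves one system in L
    diffs = Z @ B.T                                 # column e is Z (delta_a - delta_b) for edge e
    return np.sum(diffs ** 2, axis=0)               # Reff(a,b) is approximately ||Z (delta_a - delta_b)||^2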
Chapter 33

Linear Sized Sparsifiers

33.1 Overview

In this lecture, we will prove a slight simplification of the main result of [BSS12, BSS14]. This will tell us that every graph with n vertices has an ϵ-approximation with approximately 4ϵ^{−2}n edges. To translate this into a relation between approximation quality and average degree, note that such a graph has average degree d_ave = 8ϵ^{−2}. So,

ϵ ≈ 2√2 / √d,

which is about twice what you would get from a Ramanujan graph. Interestingly, this result even works for average degree just a little bit more than 1.

33.2 Turning edges into vectors

In the last lecture, we considered the Laplacian matrix of a graph G times the square root of the pseudoinverse on either side. That is,

L_G^{+/2} ( Σ_{(a,b)∈E} w_{a,b} L_{(a,b)} ) L_G^{+/2}.

Today, it will be convenient to view this as a sum of outer products of vectors. Set

v_{(a,b)} = √(w_{a,b}) L_G^{+/2} (δ_a − δ_b).

Then,

L_G^{+/2} ( Σ_{(a,b)∈E} w_{a,b} L_{(a,b)} ) L_G^{+/2} = Σ_{(a,b)∈E} v_{(a,b)} v_{(a,b)}^T = Π,

where we recall that Π = (1/n) L_{K_n} is the projection orthogonal to the constant vectors.

The problem of sparsification is then the problem of finding a small subset of these vectors, S ⊆ E, along with scaling factors, c : S → IR, so that

(1 − ϵ)Π ≼ Σ_{(a,b)∈S} c_{a,b} v_{(a,b)} v_{(a,b)}^T ≼ (1 + ϵ)Π.

If we project onto the span of the Laplacian, then the sum of the outer products of the vectors v_{(a,b)} becomes the identity, and our goal is to find a set S and scaling factors c_{a,b} so that

(1 − ϵ)I_{n−1} ≼ Σ_{(a,b)∈S} c_{a,b} v_{(a,b)} v_{(a,b)}^T ≼ (1 + ϵ)I_{n−1}.

That is, so that all the eigenvalues of the matrix in the middle lie between (1 − ϵ) and (1 + ϵ).

33.3 The main theorem

Theorem 33.3.1. Let v_1, ..., v_m be vectors in IR^n so that

Σ_i v_i v_i^T = I.

Then, for every ϵ > 0 there exists a set S along with scaling factors c_i so that

(1 − ϵ)² I ≼ Σ_{i∈S} c_i v_i v_i^T ≼ (1 + ϵ)² I,

and

|S| ≤ ⌈n/ϵ²⌉.

The condition that the sum of the outer products of the vectors is the identity has a name: isotropic position. I now mention one important property of vectors in isotropic position.

Lemma 33.3.2. Let v_1, ..., v_m be vectors in isotropic position. Then, for every matrix M,

Σ_i v_i^T M v_i = Tr(M).

Proof. We have

v^T M v = Tr(v v^T M),

so

Σ_i v_i^T M v_i = Σ_i Tr(v_i v_i^T M) = Tr( (Σ_i v_i v_i^T) M ) = Tr(I M) = Tr(M).
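The following small numerical check (my own illustration, not from the text) builds the vectors v_{(a,b)} for a tiny graph and verifies that their outer products sum to the projection Π. The example graph and all names are hypothetical.

import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]      # a small connected example graph
w = np.array([1.0, 2.0, 1.0, 1.0, 3.0])
n = 4
L = np.zeros((n, n))
for (a, b), wab in zip(edges, w):
    L[a, a] += wab; L[b, b] += wab; L[a, b] -= wab; L[b, a] -= wab

# square root of the pseudo-inverse of L
vals, vecs = np.linalg.eigh(L)
inv_sqrt = np.array([1 / np.sqrt(v) if v > 1e-9 else 0.0 for v in vals])
L_pinv_half = vecs @ np.diag(inv_sqrt) @ vecs.T

V = [np.sqrt(wab) * L_pinv_half @ (np.eye(n)[a] - np.eye(n)[b]) for (a, b), wab in zip(edges, w)]
Pi = np.eye(n) - np.ones((n, n)) / n                  # (1/n) L_{K_n}
assert np.allclose(sum(np.outer(v, v) for v in V), Pi)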
Today, we will prove that we can find a set of 6n vectors for which all eigenvalues lie between n and 13n. If you divide all scaling factors by √13 n, this puts the eigenvalues between 1/√13 and √13. You can tighten the argument to prove Theorem 33.3.1.

We will prove this theorem by an iterative argument in which we choose one vector at a time to add to the set S. We will set the scaling factor of a vector when we add it to S. It is possible that we will add a vector to S more than once, in which case we will increase its scaling factor each time. Throughout the argument we will maintain the invariant that the eigenvalues of the scaled sum of outer products lie in the interval [l, u], where l and u are quantities that will change with each addition to S. At the start of the algorithm, when S is empty, we will have

l_0 = −n and u_0 = n.

Every time we add a vector to S, we increase l by δ_L and u by δ_U, where

δ_L = 1/3 and δ_U = 2.

After we have done this 6n times, we will have l = n and u = 13n.

33.4 Rank-1 updates

We will need to understand what happens to a matrix when we add the outer product of a vector.

Theorem 33.4.1 (Sherman-Morrison). Let A be a nonsingular symmetric matrix, let v be a vector, and let c be a real number. Then,

(A − c v v^T)^{−1} = A^{−1} + c (A^{−1} v v^T A^{−1}) / (1 − c v^T A^{−1} v).

Proof. The easiest way to prove this is to multiply it out, gathering v^T A^{−1} v terms into scalars:

(A − c v v^T) ( A^{−1} + c (A^{−1} v v^T A^{−1}) / (1 − c v^T A^{−1} v) )
 = I − c v v^T A^{−1} + c (v v^T A^{−1}) / (1 − c v^T A^{−1} v) − c² (v v^T A^{−1} v v^T A^{−1}) / (1 − c v^T A^{−1} v)
 = I − c v v^T A^{−1} ( 1 − 1/(1 − c v^T A^{−1} v) + (c v^T A^{−1} v)/(1 − c v^T A^{−1} v) )
 = I.

33.5 Barrier Function Arguments

To prove the main theorem we need a good way to measure progress. We would like to keep all the eigenvalues of the matrix we have constructed at any point to lie in a nice range. But, more than that, we need them to be nicely distributed within this range. To enforce this, we need to measure how close the eigenvalues are to the limits.

Let A be a symmetric matrix with eigenvalues λ_1 ≤ ... ≤ λ_n. If u is larger than all of the eigenvalues of A, then we call u an upper bound on A. To make this notion quantitative, we define the upper barrier function

Φ^u(A) = Σ_i 1/(u − λ_i).

This is positive for all upper bounds u, goes to infinity as u approaches the largest eigenvalue, decreases as u grows, and is convex for u > λ_n. In particular, we will use

Φ^{u+δ}(A) < Φ^u(A), for δ > 0.  (33.1)

Also, observe that

λ_n ≤ u − 1/Φ^u(A).  (33.2)

We will exploit the following formula for the upper barrier function:

Φ^u(A) = Tr( (uI − A)^{−1} ).

For a lower bound l on the eigenvalues, we will define an analogous lower barrier function

Φ_l(A) = Σ_i 1/(λ_i − l) = Tr( (A − lI)^{−1} ).

This is positive whenever l is smaller than all the eigenvalues, goes to infinity as l approaches the smallest eigenvalue, and decreases as l becomes smaller. In particular,

l + 1/Φ_l(A) ≤ λ_1.  (33.3)

The analog of (33.1) is the following.

Claim 33.5.1. Let l be a lower bound on A and let δ < 1/Φ_l(A). Then,

Φ_{l+δ}(A) ≤ 1 / (1/Φ_l(A) − δ).

Note that this inequality is an equality when A is one-dimensional. In that case,

1/(λ_1 − l − δ) = 1 / ( 1/(1/(λ_1 − l)) − δ ).

Proof. After rearranging terms, we see that the inequality is equivalent to

Φ_{l+δ}(A) − Φ_l(A) ≤ δ Φ_{l+δ}(A) Φ_l(A).

We then prove this by expanding in the eigenvalues, keeping in mind that all the terms λ_i − l − δ are positive:

Φ_{l+δ}(A) − Φ_l(A) = Σ_i 1/(λ_i − l − δ) − Σ_i 1/(λ_i − l)
 = Σ_i δ / ((λ_i − l − δ)(λ_i − l))
 ≤ δ ( Σ_i 1/(λ_i − l − δ) ) ( Σ_i 1/(λ_i − l) ).
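The following short sketch (my own, not from the text) simply evaluates the two barrier functions defined above, and checks the initial values used by the algorithm. The helper names are assumptions of this illustration.

import numpy as np

def upper_barrier(A, u):
    """Phi^u(A) = sum_i 1/(u - lambda_i) = Tr((uI - A)^{-1}), for u above all eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    assert u > lam.max()
    return np.sum(1.0 / (u - lam))

def lower_barrier(A, l):
    """Phi_l(A) = sum_i 1/(lambda_i - l) = Tr((A - lI)^{-1}), for l below all eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    assert l < lam.min()
    return np.sum(1.0 / (lam - l))

n = 5
assert np.isclose(upper_barrier(np.zeros((n, n)), n), 1.0)   # Phi^{u_0}(0) = 1 with u_0 = n
assert np.isclose(lower_barrier(np.zeros((n, n)), -n), 1.0)  # Phi_{l_0}(0) = 1 with l_0 = -n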
Initially, we will have

Φ_{l_0}(0) = Φ_{−n}(0) = 1 and Φ^{u_0}(0) = Φ^{n}(0) = 1.

33.6 Barrier Function Updates

The most important thing to understand about the barrier functions is how they change when we add a vector to S. The Sherman-Morrison theorem tells us what happens when we change A to A + c v v^T:

Φ^u(A + c v v^T) = Tr( (uI − A − c v v^T)^{−1} )
 = Tr( (uI − A)^{−1} + c ((uI − A)^{−1} v v^T (uI − A)^{−1}) / (1 − c v^T (uI − A)^{−1} v) )
 = Φ^u(A) + c Tr( (uI − A)^{−1} v v^T (uI − A)^{−1} ) / (1 − c v^T (uI − A)^{−1} v)
 = Φ^u(A) + c (v^T (uI − A)^{−2} v) / (1 − c v^T (uI − A)^{−1} v).

This increases the upper barrier function, and we would like to counteract this increase by increasing u at the same time. If we advance u to û = u + δ_U, then we find

Φ^{u+δ_U}(A + c v v^T) = Φ^{û}(A) + c (v^T (ûI − A)^{−2} v) / (1 − c v^T (ûI − A)^{−1} v)
 = Φ^u(A) − ( Φ^u(A) − Φ^{u+δ_U}(A) ) + (v^T (ûI − A)^{−2} v) / (1/c − v^T (ûI − A)^{−1} v).

We would like for this to be less than Φ^u(A). If we commit to how much we are going to increase u, then this gives an upper bound on how large c can be. We want

Φ^u(A) − Φ^{u+δ_U}(A) ≥ (v^T (ûI − A)^{−2} v) / (1/c − v^T (ûI − A)^{−1} v),

which is equivalent to

1/c ≥ (v^T (ûI − A)^{−2} v) / (Φ^u(A) − Φ^{u+δ_U}(A)) + v^T (ûI − A)^{−1} v.

Define

U_A = ((u + δ_U)I − A)^{−2} / (Φ^u(A) − Φ^{u+δ_U}(A)) + ((u + δ_U)I − A)^{−1}.

We have established a clean condition for when we can add c v v^T to S and increase u by δ_U without increasing the upper barrier function.

Lemma 33.6.1. If

1/c ≥ v^T U_A v,

then

Φ^{u+δ_U}(A + c v v^T) ≤ Φ^u(A).

The miracle in the above formula is that the condition in the lemma just involves the vector v as the argument of a quadratic form.

We also require the following analog for the lower barrier function. The difference is that increasing l by setting l̂ = l + δ_L increases the barrier function, and adding a vector decreases it.

Lemma 33.6.2. Define

L_A = (A − l̂ I)^{−2} / (Φ_{l+δ_L}(A) − Φ_l(A)) − (A − l̂ I)^{−1}.

If

1/c ≤ v^T L_A v,

then

Φ_{l+δ_L}(A + c v v^T) ≤ Φ_l(A).

If we fix the vector v and an increment δ_L, then this gives a lower bound on the scaling factor by which we need to multiply it for the lower barrier function not to increase.

33.7 The inductive argument

It remains to show that there exists a vector v and a scaling factor c so that

Φ^{u+δ_U}(A + c v v^T) ≤ Φ^u(A) and Φ_{l+δ_L}(A + c v v^T) ≤ Φ_l(A).

That is, we need to show that there is a vector v_i so that

v_i^T U_A v_i ≤ v_i^T L_A v_i.

Once we know this, we can set c so that

v_i^T U_A v_i ≤ 1/c ≤ v_i^T L_A v_i.

Lemma 33.7.1.

Σ_i v_i^T U_A v_i ≤ 1/δ_U + Φ^u(A).
Proof. By Lemma 33.3.2, we know

Σ_i v_i^T U_A v_i = Tr(U_A).

To bound this, we break it into two parts,

Tr( (ûI − A)^{−2} ) / (Φ^u(A) − Φ^{u+δ_U}(A))

and

Tr( (ûI − A)^{−1} ).

The second term is easiest:

Tr( (ûI − A)^{−1} ) = Φ^{u+δ_U}(A) ≤ Φ^u(A).

To bound the first term, consider the derivative of the barrier function with respect to u:

∂/∂u Φ^u(A) = ∂/∂u Σ_i 1/(u − λ_i) = −Σ_i 1/(u − λ_i)² = −Tr( (uI − A)^{−2} ).

As Φ^u(A) is convex in u, we may conclude that

Φ^u(A) − Φ^{u+δ_U}(A) ≥ −δ_U ∂/∂u Φ^{u+δ_U}(A) = δ_U Tr( (ûI − A)^{−2} ).

The analysis for the lower barrier is similar, but the second term is slightly more complicated.

Lemma 33.7.2.

Σ_i v_i^T L_A v_i ≥ 1/δ_L − 1/(1/Φ_l(A) − δ_L).

Proof. As before, we bound

Tr( (A − (l + δ_L)I)^{−2} ) / (Φ_{l+δ_L}(A) − Φ_l(A))

by recalling that

∂/∂l Φ_l(A) = Tr( (A − lI)^{−2} ).

As Φ_l(A) is convex in l, we have

Φ_{l+δ_L}(A) − Φ_l(A) ≤ δ_L ∂/∂l Φ_{l+δ_L}(A) = δ_L Tr( (A − (l + δ_L)I)^{−2} ).

To bound the other term, we use Claim 33.5.1 to prove

Tr( (A − (l + δ_L)I)^{−1} ) ≤ 1/(1/Φ_l(A) − δ_L).

So, for there to exist a v_i that we can add to S with scale factor c so that neither barrier function increases, we just need that

1/δ_U + Φ^u(A) ≤ 1/δ_L − 1/(1/Φ_l(A) − δ_L).

If this holds, then there is a v_i so that

v_i^T U_A v_i ≤ v_i^T L_A v_i.

We then set c so that

v_i^T U_A v_i ≤ 1/c ≤ v_i^T L_A v_i.

We now finish the proof by checking that the numbers I gave earlier satisfy the necessary conditions. At the start both barrier functions are less than 1, and we need to show that this holds throughout the algorithm. At every step, we will have by induction

1/δ_U + Φ^u(A) ≤ 1/2 + 1 = 3/2,

and

1/δ_L − 1/(1/Φ_l(A) − δ_L) ≥ 3 − 1/(1 − 1/3) = 3/2.

So, there is always a v_i that we can add to S and a scaling factor c so that both barrier functions remain upper bounded by 1.

If we now do this for 6n steps, we will have

l = −n + 6n/3 = n and u = n + 2 · 6n = 13n.

The bound stated at the beginning of the lecture comes from tightening the analysis. In particular, it is possible to improve Lemma 33.7.2 so that it says

Σ_i v_i^T L_A v_i ≥ 1/δ_L − Φ_l(A).

I recommend the paper for details.

33.8 Progress and Open Problems

• It is possible to generalize this result to sums of positive semidefinite matrices, instead of outer products of vectors [dCSHS11].

• It is now possible to compute sparsifiers that are almost this good in something close to linear time [AZLO15, LS15].

• Given last lecture, it seems natural to conjecture that the scaling factors of edges should be proportional to their weights times effective resistances. Similarly, one might conjecture that if all vectors v_i have the same norm, then the scaling factors are unnecessary. This is true, but not obvious. In fact, it is essentially equivalent to the Kadison-Singer problem [MSS14, MSS15c].
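To summarize the construction of Sections 33.5 through 33.7 in code, here is a rough sketch (mine, not from the text) of the one-vector-at-a-time argument: maintain the barrier functions, and at each step pick a vector i with v_i^T U_A v_i ≤ v_i^T L_A v_i and a scaling 1/c between the two quadratic forms. It assumes the input rows are exactly in isotropic position and exact arithmetic; it is an illustration of the argument, not a robust implementation.

import numpy as np

def bss_sparsify(V, steps=None):
    """V: m x n array whose rows sum, as outer products, to the identity. Returns scalings c."""
    m, n = V.shape
    dL, dU = 1.0 / 3.0, 2.0
    l, u = -float(n), float(n)
    A = np.zeros((n, n))
    c = np.zeros(m)
    for _ in range(steps or 6 * n):
        lhat, uhat = l + dL, u + dU
        Minv_u = np.linalg.inv(uhat * np.eye(n) - A)
        Minv_l = np.linalg.inv(A - lhat * np.eye(n))
        phi_u = np.trace(np.linalg.inv(u * np.eye(n) - A))
        phi_l = np.trace(np.linalg.inv(A - l * np.eye(n)))
        U_A = Minv_u @ Minv_u / (phi_u - np.trace(Minv_u)) + Minv_u
        L_A = Minv_l @ Minv_l / (np.trace(Minv_l) - phi_l) - Minv_l
        upper = np.einsum('ij,jk,ik->i', V, U_A, V)   # v_i^T U_A v_i for every i
        lower = np.einsum('ij,jk,ik->i', V, L_A, V)   # v_i^T L_A v_i for every i
        i = int(np.argmax(lower - upper))             # the argument guarantees some i has upper <= lower
        ci = 2.0 / (upper[i] + lower[i])              # pick 1/c between the two quadratic forms
        c[i] += ci
        A = A + ci * np.outer(V[i], V[i])
        l, u = lhat, uhat
    return c

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.standard_normal((400, 20)))[0]   # 400 vectors in R^20 in isotropic position
c = bss_sparsify(V)
print(np.count_nonzero(c), np.linalg.eigvalsh(V.T @ (c[:, None] * V))[[0, -1]])  # eigenvalues near [n, 13n]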
Chapter 34

Iterative solvers for linear equations

We introduce basic iterative solvers for systems of linear equations: Richardson iteration and Chebyshev's method. We discuss Conjugate Gradient in the next Chapter, and iterative refinement and preconditioning in Chapter 36.

34.1 Why iterative methods?

One is first taught to solve linear systems like

Ax = b

by direct methods such as Gaussian elimination, computing the inverse of A, or the LU factorization. However, elimination algorithms can be very slow. This is especially true when A is sparse. Just writing down the inverse takes O(n²) space, and computing the inverse takes O(n³) time if we do it naively. This might be OK if A is dense. But, it is very wasteful if A only has O(n) non-zero entries.

In general, we prefer algorithms whose running time is proportional to the number of non-zero entries in the matrix A, and which do not require much more space than that used to store A.

Iterative algorithms solve linear equations while only performing multiplications by A, and performing a few vector operations. Unlike the direct methods which are based on elimination, the iterative algorithms do not find exact solutions. Rather, they get closer and closer to the solution the longer they work. The advantage of these methods is that they need to store very little, and are often much faster than the direct methods. When A is symmetric, the running times of these methods are determined by the eigenvalues of A.

Throughout this lecture we will assume that A is positive definite or positive semidefinite.

34.2 First-Order Richardson Iteration

To get started, we will examine a simple, but sub-optimal, iterative method, Richardson's iteration. The idea of the method is to find an iterative process that has the solution to Ax = b as a fixed point, and which converges. We observe that if Ax = b, then for any α,

αAx = αb ⟹ x + (αA − I)x = αb ⟹ x = (I − αA)x + αb.

This leads us to the following iterative process:

x_t = (I − αA)x_{t−1} + αb,  (34.1)

where we will take x_0 = 0. We will show that this converges if

I − αA

has norm less than 1, and that the convergence rate depends on how much the norm is less than 1. This is analogous to our analysis of random walks on graphs from Chapter 10.

As we are assuming A is symmetric, I − αA is symmetric as well, and so its norm is the maximum absolute value of its eigenvalues. Let 0 < λ_1 ≤ λ_2 ≤ ... ≤ λ_n be the eigenvalues of A. Then, the eigenvalues of I − αA are

1 − αλ_i,

and the norm of I − αA is

max_i |1 − αλ_i| = max(|1 − αλ_1|, |1 − αλ_n|).

This is minimized by taking

α = 2/(λ_n + λ_1),

in which case the smallest and largest eigenvalues of I − αA become

±(λ_n − λ_1)/(λ_n + λ_1),

and the norm of I − αA becomes

1 − 2λ_1/(λ_n + λ_1).

While we might not know λ_n + λ_1, a good guess is often sufficient. If we choose an α < 2/(λ_n + λ_1), then the norm of I − αA is at most

1 − αλ_1.

To show that x_t converges to the solution, x, consider the difference x − x_t. We have

x − x_t = ((I − αA)x + αb) − ((I − αA)x_{t−1} + αb) = (I − αA)(x − x_{t−1}).
So,

x − x_t = (I − αA)^t (x − x_0) = (I − αA)^t x,

and

∥x − x_t∥ = ∥(I − αA)^t x∥ ≤ ∥(I − αA)^t∥ ∥x∥ = ∥I − αA∥^t ∥x∥ ≤ (1 − 2λ_1/(λ_n + λ_1))^t ∥x∥ ≤ e^{−2λ_1 t/(λ_n + λ_1)} ∥x∥.

So, if we want to get a solution x_t with

∥x − x_t∥ / ∥x∥ ≤ ϵ,

it suffices to run for

((λ_n + λ_1)/(2λ_1)) ln(1/ϵ) = (λ_n/(2λ_1) + 1/2) ln(1/ϵ)

iterations. The term

λ_n/λ_1

is called the condition number of the matrix A, when A is symmetric. (For general matrices, the condition number is defined to be the ratio of the largest to smallest singular value.) It is often written κ(A), and the running time of iterative algorithms is often stated in terms of this quantity. We see that if the condition number is small, then this algorithm quickly provides an approximate solution.

34.3 Expanders

Let's pause a moment to consider the problem of solving systems in the Laplacians of expander graphs. These are singular, but we know that their nullspace is spanned by the constant vectors. So, if we work orthogonal to the constant vectors their effective smallest eigenvalue is λ_2. If the graph is an ϵ-expander, then its condition number, λ_n/λ_2, will be approximately 1 + 2ϵ. Thus, we can solve systems of linear equations in this Laplacian very quickly.

This should make intuitive sense: the Laplacian of an expander is an approximation of the Laplacian of a complete graph. And, the Laplacians of complete graphs act as multiples of the identity on the space orthogonal to constant vectors.

In contrast, Gaussian elimination on expanders is slow: it takes time Ω(n³) and requires space Ω(n²) [LRT79].

34.4 The norm of the residual

Thinking about ∥x − x_t∥ is a little awkward because we do not know x. For this reason, people often measure the quality of approximation of a solution to a system of linear equations by ∥b − Ax_t∥. For this quantity, the same sort of convergence results hold. First observe that

b − Ax_t = Ax − Ax_t = A(I − αA)^t x = (I − αA)^t Ax = (I − αA)^t b.

So, the right choice of α guarantees that

∥b − Ax_t∥ / ∥b∥ ≤ e^{−2λ_1 t/(λ_n + λ_1)}.

In Chapter 35 we will encounter a more useful measure of convergence: convergence in the A-norm.

34.5 A polynomial approximation of the inverse

I am now going to give another interpretation of Richardson's iteration. It provides us with a polynomial in A that approximates A^{−1}. In particular, the tth iterate, x_t, can be expressed in the form

p_t(A) b,

where p_t is a polynomial of degree t.

We will view p_t(A) as a good approximation of A^{−1} if

∥A p_t(A) − I∥

is small. From the formula defining Richardson's iteration (34.1), we find

x_0 = 0,
x_1 = αb,
x_2 = (I − αA)αb + αb,
x_3 = (I − αA)²αb + (I − αA)αb + αb, and
x_t = Σ_{i=0}^{t} (I − αA)^i αb.

To get some idea of why this should be an approximation of A^{−1}, consider the limit as t goes to infinity. Assuming that the infinite sum converges, we obtain

α Σ_{i=0}^{∞} (I − αA)^i = α (I − (I − αA))^{−1} = α(αA)^{−1} = A^{−1}.

So, the Richardson iteration can be viewed as a truncation of this infinite summation.
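Here is a minimal Python sketch (my own illustration) of Richardson's iteration (34.1) with the choice α = 2/(λ_n + λ_1) discussed above; the example matrix and parameter names are hypothetical.

import numpy as np

def richardson(A, b, alpha, t):
    x = np.zeros_like(b)
    for _ in range(t):
        x = x + alpha * (b - A @ x)        # x_t = (I - alpha A) x_{t-1} + alpha b
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + np.eye(50)                   # a random positive definite matrix
b = rng.standard_normal(50)
lam = np.linalg.eigvalsh(A)
x = richardson(A, b, 2.0 / (lam[0] + lam[-1]), 2000)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))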
In general, a polynomial p_t will enable us to compute a solution to precision ϵ if

∥p_t(A)b − x∥ ≤ ϵ ∥x∥.

As b = Ax, this is equivalent to

∥p_t(A)Ax − x∥ ≤ ϵ ∥x∥,

which is equivalent to

∥A p_t(A) − I∥ ≤ ϵ.

34.6 Better Polynomials

This leads us to the question of whether we can find better polynomial approximations to A^{−1}. The reason I ask is that the answer is yes! As A, p_t(A) and I all commute, the matrix

A p_t(A) − I

is symmetric and its norm is the maximum absolute value of its eigenvalues. So, it suffices to find a polynomial p_t such that

|λ_i p_t(λ_i) − 1| ≤ ϵ,

for all eigenvalues λ_i of A.

To reformulate this, define

q_t(x) = 1 − x p(x).

Then, it suffices to find a polynomial q_t of degree t + 1 for which

q_t(0) = 1, and
|q_t(x)| ≤ ϵ, for λ_1 ≤ x ≤ λ_n.

We will see that there are polynomials of degree

ln(2/ϵ) (√(λ_n/λ_1) + 1) / 2

that satisfy these conditions and thus allow us to compute solutions of accuracy ϵ. In terms of the condition number of A, this is a quadratic improvement over Richardson's first-order method.

Theorem 34.6.1. For every t ≥ 1, and 0 < λ_min ≤ λ_max, there exists a polynomial q_t(x) such that

1. |q_t(x)| ≤ ϵ, for λ_min ≤ x ≤ λ_max, and
2. q_t(0) = 1,

for

ϵ ≤ 2(1 + 2/√κ)^{−t} ≤ 2e^{−2t/√κ},

where

κ = λ_max/λ_min.

34.7 Chebyshev Polynomials

I'd now like to explain how we find these better polynomials. The key is to transform one of the most fundamental families of polynomials: the Chebyshev polynomials. These polynomials are as small as possible on [−1, 1], and grow quickly outside this interval. We will translate the interval [−1, 1] to obtain the polynomials we need.

The tth Chebyshev polynomial, T_t(x), has degree t, and may be defined by setting

T_0(x) = 1, T_1(x) = x,

and for t ≥ 2

T_t(x) = 2x T_{t−1}(x) − T_{t−2}(x).

These polynomials are best understood by realizing that they are the polynomials for which

cos(tθ) = T_t(cos(θ)) and cosh(tθ) = T_t(cosh(θ)).

It might not be obvious that one can express cos(tθ) as a polynomial in cos(θ). To see this, and the correctness of the above formulas, recall that

cos(θ) = (1/2)(e^{iθ} + e^{−iθ}), and cosh(θ) = (1/2)(e^{θ} + e^{−θ}).

To verify that these satisfy the stated recurrence with x = cosh(θ), compute

2x T_{t−1}(x) − T_{t−2}(x) = (e^{θ} + e^{−θ}) · (1/2)(e^{(t−1)θ} + e^{−(t−1)θ}) − (1/2)(e^{(t−2)θ} + e^{−(t−2)θ})
 = (1/2)(e^{tθ} + e^{−tθ}) + (1/2)(e^{(t−2)θ} + e^{−(t−2)θ}) − (1/2)(e^{(t−2)θ} + e^{−(t−2)θ})
 = (1/2)(e^{tθ} + e^{−tθ}).

Thus,

T_t(x) = cos(t acos(x)) for |x| ≤ 1, and T_t(x) = cosh(t acosh(x)) for x ≥ 1.

Claim 34.7.1. For x ∈ [−1, 1], |T_t(x)| ≤ 1.

Proof. For x ∈ [−1, 1], there is a θ so that cos(θ) = x. We then have T_t(x) = cos(tθ), which must also be between −1 and 1.

To compute the values of the Chebyshev polynomials outside [−1, 1], we use the hyperbolic cosine function. Hyperbolic cosine maps the real line to [1, ∞] and is symmetric about the origin. So, the inverse of hyperbolic cosine may be viewed as a map from [1, ∞] to [0, ∞], and satisfies

acosh(x) = ln(x + √(x² − 1)), for x ≥ 1.
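As a quick numerical check (my own, not from the text) of the identities just stated, the following evaluates T_t by the three-term recurrence and compares it with cos(tθ) and cosh(tθ).

import numpy as np

def cheb(t, x):
    x = np.asarray(x, dtype=float)
    T_prev, T_cur = np.ones_like(x), x
    if t == 0:
        return T_prev
    for _ in range(t - 1):
        T_prev, T_cur = T_cur, 2 * x * T_cur - T_prev
    return T_cur

theta = np.linspace(0.1, 2.0, 7)
assert np.allclose(cheb(5, np.cos(theta)), np.cos(5 * theta))
assert np.allclose(cheb(5, np.cosh(theta)), np.cosh(5 * theta))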
Claim 34.7.2. For γ > 0,

T_t(1 + γ) ≥ (1 + √(2γ))^t / 2.

Proof. Setting x = 1 + γ, we compute

T_t(x) = (1/2)(e^{t acosh(x)} + e^{−t acosh(x)})
 ≥ (1/2) e^{t acosh(x)}
 = (1/2)(x + √(x² − 1))^t
 = (1/2)(1 + γ + √((1 + γ)² − 1))^t
 = (1/2)(1 + γ + √(2γ + γ²))^t
 ≥ (1/2)(1 + √(2γ))^t.

34.8 Proof of Theorem 34.6.1

We will exploit the following properties of the Chebyshev polynomials:

1. T_t has degree t.
2. T_t(x) ∈ [−1, 1], for x ∈ [−1, 1].
3. T_t(x) is monotonically increasing for x ≥ 1.
4. T_t(1 + γ) ≥ (1 + √(2γ))^t / 2, for γ > 0.

To express q_t(x) in terms of a Chebyshev polynomial, we should map the range on which we want q_t to be small, [λ_min, λ_max], to [−1, 1]. We will accomplish this with the linear map:

l(x) := (λ_max + λ_min − 2x) / (λ_max − λ_min).

Note that

l(x) = −1 if x = λ_max,  l(x) = 1 if x = λ_min,  and l(x) = (λ_max + λ_min)/(λ_max − λ_min) if x = 0.

To guarantee that the constant coefficient in q_t(x) is one (q_t(0) = 1), we should set

q_t(x) := T_t(l(x)) / T_t(l(0)).

We know that |T_t(l(x))| ≤ 1 for x ∈ [λ_min, λ_max]. To find q(x) for x in this range, we must compute T_t(l(0)). We have

l(0) ≥ 1 + 2/κ(A),

and so by properties 3 and 4 of Chebyshev polynomials,

T_t(l(0)) ≥ (1 + 2/√κ)^t / 2.

Thus,

q(x) ≤ 2(1 + 2/√κ)^{−t},

for x ∈ [λ_min, λ_max], and so all eigenvalues of q(A) will have absolute value at most 2(1 + 2/√κ)^{−t}.

34.9 Laplacian Systems

One might at first think that these techniques do not apply to Laplacian systems, as these are always singular. However, we can apply these techniques without change if b is in the span of L. That is, if b is orthogonal to the all-1s vector and the graph is connected. In this case the eigenvalue λ_1 = 0 has no role in the analysis, and it is replaced by λ_2. One way of understanding this is to just view L as an operator acting on the space orthogonal to the all-1s vector.

By considering the example of the Laplacian of the path graph, one can show that it is impossible to do much better than the √κ iteration bound that I claimed at the end of the last section. To see this, first observe that when one multiplies a vector x by L, the entry (Lx)(i) just depends on x(i − 1), x(i), and x(i + 1). So, if we apply a polynomial of degree at most t, x_t(i) will only depend on b(j) with i − t ≤ j ≤ i + t. This tells us that we will need a polynomial of degree on the order of n to solve such a system. On the other hand, √(λ_n/λ_2) is on the order of n as well. So, we should not be able to solve the system with a polynomial whose degree is significantly less than √(λ_n/λ_2).

34.10 Warning

The polynomial-based approach that I have described here only works in infinite precision arithmetic. In finite precision arithmetic one has to be more careful about how one implements these algorithms. This is why the descriptions of methods such as the Chebyshev method found in Numerical Linear Algebra textbooks are more complicated than that presented here. The algorithms that are actually used are mathematically identical in infinite precision, but they actually work. The problem with the naive implementation is illustrated by a typical experience: in double-precision arithmetic the polynomial approach to Chebyshev will fail to solve linear systems in random positive definite matrices in 60 dimensions!
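The following short sketch (mine) illustrates Theorem 34.6.1 numerically: it evaluates the scalar polynomial q_t(x) = T_t(l(x))/T_t(l(0)) on [λ_min, λ_max] and compares its maximum with the bound 2(1 + 2/√κ)^{−t}. Working with the scalar polynomial sidesteps the finite-precision issues warned about above; the interval endpoints and degree are arbitrary choices for the illustration.

import numpy as np

def cheb_val(t, x):                       # evaluate T_t by the three-term recurrence
    x = np.asarray(x, dtype=float)
    T_prev, T_cur = np.ones_like(x), x
    if t == 0:
        return T_prev
    for _ in range(t - 1):
        T_prev, T_cur = T_cur, 2 * x * T_cur - T_prev
    return T_cur

lmin, lmax, t = 1.0, 100.0, 25
kappa = lmax / lmin
l = lambda x: (lmax + lmin - 2 * x) / (lmax - lmin)
q = lambda x: cheb_val(t, l(x)) / cheb_val(t, l(0.0))
xs = np.linspace(lmin, lmax, 1000)
print(np.abs(q(xs)).max(), 2 * (1 + 2 / np.sqrt(kappa)) ** (-t))   # the first is below the second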
Chapter 35

The Conjugate Gradient and Diameter

We introduce the matrix norm as the measure of convergence of iterative methods, and show how the Conjugate Gradient method efficiently minimizes it. We finish by relating the rate of convergence of any iterative method on a Laplacian matrix to the diameter of the underlying graph.

My description of the Conjugate Gradient method is inspired by Vishnoi's [Vis12]. It is the simplest explanation of the Conjugate Gradient that I have seen.

35.1 The Matrix Norm

Recall from Chapter 14 that for a positive semidefinite matrix A, the matrix norm in A is defined by

∥x∥_A = √(x^T A x) = ∥A^{1/2} x∥.

For many applications, the right way to measure the quality of approximation of a system of linear equations Ax = b is by ∥x − x_t∥_A. Many algorithms naturally produce bounds on the error in the matrix norm. And, for many applications that use linear equation solvers as subroutines, this is the measure of accuracy in the subroutine that most naturally translates to accuracy of the outside algorithm.

We should observe that both the Richardson and Chebyshev methods achieve ϵ error in the A-norm. Let p be a polynomial such that

∥p(A)A − I∥ ≤ ϵ.

Then,

∥p(A)b − x∥_A = ∥A^{1/2} p(A)Ax − A^{1/2} x∥ = ∥(p(A)A − I) A^{1/2} x∥ ≤ ϵ ∥A^{1/2} x∥ = ϵ ∥x∥_A.

The analysis above works because these methods produce x_t by applying a linear operator, p(A), to b that commutes with A. While most of the algorithms we use to solve systems of equations in A will be linear operators, they will typically not commute with A. But, they will produce small error in the A-norm.

The following theorem shows that a linear operator Z is an ϵ approximation of A^{−1} if and only if it produces at most ϵ error in the A-norm when used to solve systems of linear equations in A.

Theorem 35.1.1. Let A and Z be positive definite matrices. Then

∥Z A x − x∥_A ≤ ϵ ∥x∥_A  (35.1)

for all x if and only if

(1 − ϵ)A^{−1} ≼ Z ≼ (1 + ϵ)A^{−1}.

Proof. The assertion that (35.1) holds for all x is equivalent to the assertion that for all x,

∥A^{1/2}(Z A − I)x∥ ≤ ϵ ∥A^{1/2} x∥.

Setting y = A^{1/2} x, this becomes equivalent to saying that for all y,

∥(A^{1/2} Z A^{1/2} − I)y∥ ≤ ϵ ∥y∥,

which we usually write

∥A^{1/2} Z A^{1/2} − I∥ ≤ ϵ.

This is in turn equivalent to

−ϵI ≼ A^{1/2} Z A^{1/2} − I ≼ ϵI ⟺ (1 − ϵ)I ≼ A^{1/2} Z A^{1/2} ≼ (1 + ϵ)I ⟺ (1 − ϵ)A^{−1} ≼ Z ≼ (1 + ϵ)A^{−1},

where the last statement follows from multiplying on the left and right by A^{−1/2}.

35.2 Application: Approximating Fiedler Vectors

Approximately computing eigenvectors of the smallest eigenvalues of matrices, such as Fiedler vectors, is one application in which approximation in the A-norm is the right thing to do. In problem [?], we saw that the largest eigenvalue of a matrix can be approximated using the power method. If we want the smallest eigenvalue, it is natural to use the power method on the inverse of the matrix.

As we are only going to compute an approximation of the eigenvalue and its corresponding eigenvector, we might as well use an approximation of the matrix inverse. If Z is an operator that ϵ-approximates A^{−1}, then the largest eigenvalue of Z is within 1 ± ϵ of the largest eigenvalue of A^{−1}, and the corresponding eigenvector has large Rayleigh quotient with respect to A^{−1}. As we learned in problem [?], if there is a gap between this and the next eigenvalue, then this vector makes a small angle with the eigenvector. See [ST14, Section 7] for a more detailed discussion.
35.3 Optimality in the A-norm

The iterative methods that we consider begin with the vector b, and then perform multiplications by A and take linear combinations with vectors that have already been produced. So, after t iterations they produce a vector that is in the span of

{ b, Ab, A²b, ..., A^t b }.

This subspace is called the t + 1st Krylov subspace generated by A and b.

The Conjugate Gradient will find the vector x_t in this subspace that minimizes the error in the A-norm. It will do so by computing a very useful basis of this subspace. But, before we describe this basis, let's examine the error in the A-norm. We have

∥x_t − x∥²_A = x_t^T A x_t − 2 x^T A x_t + x^T A x = x_t^T A x_t − 2 b^T x_t + b^T x.

While we do not know b^T x, we do know that ∥x_t − x∥²_A is minimized when we minimize

(1/2) x_t^T A x_t − b^T x_t.  (35.2)

So, we will work to minimize (35.2).

Let p_0, ..., p_t be a basis of the t + 1st Krylov subspace, and let

x_t = Σ_{i=0}^{t} c_i p_i.

We would like to find the coefficients c_i that minimize (35.2). Expanding x_t gives

(1/2) x_t^T A x_t − b^T x_t = (1/2) (Σ_i c_i p_i)^T A (Σ_i c_i p_i) − b^T (Σ_i c_i p_i)
 = (1/2) Σ_i c_i² p_i^T A p_i − Σ_i c_i b^T p_i + (1/2) Σ_{i≠j} c_i c_j p_i^T A p_j.

To simplify the selection of the optimal constants c_i, the Conjugate Gradient will compute a basis p_0, ..., p_t that makes the rightmost term 0. That is, it will compute a basis such that p_i^T A p_j = 0 for all i ≠ j. Such a basis is called an A-orthogonal basis.

When the last term is zero, the objective function becomes

Σ_{i=0}^{t} ( (1/2) c_i² p_i^T A p_i − c_i b^T p_i ).

So, the terms corresponding to different i's do not interact, and we can minimize the sum by minimizing each term individually. The term

(1/2) c_i² p_i^T A p_i − c_i b^T p_i

is minimized by setting its derivative in c_i equal to zero, which gives

c_i = b^T p_i / (p_i^T A p_i).

It remains to describe how we compute this A-orthogonal basis. The algorithm begins by setting

p_0 = b.

The next vector should be A p_0, but A-orthogonalized with respect to p_0. That is,

p_1 = A p_0 − ((A p_0)^T A p_0 / (p_0^T A p_0)) p_0.

It is immediate that

p_0^T A p_1 = 0.

In general, we set

p_{t+1} = A p_t − Σ_{i=0}^{t} ((A p_t)^T A p_i / (p_i^T A p_i)) p_i.  (35.3)

Let's verify that p_{t+1} is A-orthogonal to p_j for j ≤ t, assuming that p_0, ..., p_t are A-orthogonal. We have

p_j^T A p_{t+1} = p_j^T A A p_t − Σ_{i=0}^{t} ((A p_t)^T A p_i / (p_i^T A p_i)) p_j^T A p_i
 = p_j^T A² p_t − ((A p_t)^T A p_j / (p_j^T A p_j)) p_j^T A p_j
 = 0.

The computation of p_{t+1} is greatly simplified by the observation that all but two of the terms in the sum (35.3) are zero: for i < t − 1,

(A p_t)^T A p_i = 0.

To see this, note that

(A p_t)^T A p_i = p_t^T A (A p_i),

and that A p_i is in the span of p_0, ..., p_{i+1}. So, this term will be zero if i + 1 < t.

That means that

p_{t+1} = A p_t − ((A p_t)^T A p_t / (p_t^T A p_t)) p_t − ((A p_t)^T A p_{t−1} / (p_{t−1}^T A p_{t−1})) p_{t−1}.

So, one can compute p_{t+1} from p_t and p_{t−1} while using only a constant number of multiplications by A and a constant number of vector operations. This means that one can compute the entire basis p_0, ..., p_t while performing only O(t) multiplications of vectors by A and O(t) vector operations.
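The following is a didactic sketch (my own) of the Conjugate Gradient as described above: it builds the A-orthogonal basis by the two-term recurrence and combines the basis vectors with the coefficients c_i = b^T p_i / p_i^T A p_i. As the notes caution just below, the practical Conjugate Gradient rearranges this computation; this version is only for small, well-conditioned examples, and all names are my own.

import numpy as np

def cg_via_A_orthogonal_basis(A, b, t):
    p_prev, Ap_prev = None, None
    p = b.copy()
    x = np.zeros_like(b)
    for _ in range(t + 1):
        Ap = A @ p
        pAp = p @ Ap
        if pAp < 1e-12:                  # the Krylov subspace is numerically exhausted
            break
        x = x + p * (b @ p) / pAp        # add the term c_i p_i
        p_next = Ap - p * (Ap @ Ap) / pAp                                  # term i = t of (35.3)
        if p_prev is not None:
            p_next = p_next - p_prev * (Ap @ Ap_prev) / (p_prev @ Ap_prev)  # term i = t - 1
        p_prev, Ap_prev, p = p, Ap, p_next
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
A = M @ M.T + 30 * np.eye(30)
b = rng.standard_normal(30)
x = cg_via_A_orthogonal_basis(A, b, 30)
print(np.linalg.norm(x - np.linalg.solve(A, b)))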
The computation of x_t by

x_t = Σ_{i=0}^{t} p_i (b^T p_i / (p_i^T A p_i))

requires only an additional O(t) such operations.

In fact, only t multiplications by A are required to compute p_0, ..., p_t and x_1, ..., x_t: every term in the expressions for these vectors can be derived from the products A p_i. Thus, the Conjugate Gradient algorithm can find the x_t in the t + 1st Krylov subspace that minimizes the error in the A-norm in time O(tn) plus the time required to perform t multiplications by A.

Caution: the algorithm that I have presented here differs from the implemented Conjugate Gradient in that the implemented Conjugate Gradient re-arranges this computation to keep the norms of the vectors involved reasonably small. Without this adjustment, the algorithm that I've described will fail in practice as the vectors p_i will become too large.

35.4 How Good is CG?

The Conjugate Gradient is at least as good as the Chebyshev iteration, in that it finds a vector of smaller error in the A-norm in any given number of iterations. The optimality property of the Conjugate Gradient causes it to perform remarkably well.

For example, one can see that it should never require more than n iterations. The vector x is always in the nth Krylov subspace. Here's an easy way to see this. Let the distinct eigenvalues of A be λ_1, ..., λ_k. Now, consider the polynomial

q(x) := Π_{i=1}^{k}(λ_i − x) / Π_{i=1}^{k} λ_i.

You can verify that q is a degree k polynomial such that

q(0) = 1, and
q(λ_i) = 0, for all i.

So, CG should be able to find the exact answer to a system in A in k − 1 iterations. I say "should" because, while this statement is true with infinite precision arithmetic, it doesn't work out quite this well in practice.

Ignoring for now issues of finite arithmetic, let's consider the importance of this for sparse matrices A. By a sparse matrix, I mean one with at most cn non-zero entries, for some constant c. That's not a rigorous definition, but it will help guide our discussion. Multiplication by a sparse matrix can be done in time O(n). So, CG can solve a system of equations in a sparse matrix in time O(n²). Note that this is proportional to how long it would take to just write the inverse of A, and will probably be faster than any algorithm for computing the inverse. On the other hand, it only provides the solution to one system in A.

For another interesting example, consider the hypercube graph on n vertices. It only has log₂ n distinct eigenvalues. So, CG will only need log₂ n iterations to solve linear systems in the Laplacian of the hypercube. While there are other fast algorithms that exploit the special structure of the hypercube, CG works well when one has a graph that is merely very close to the hypercube.

In general, CG works especially quickly on matrices in which the eigenvalues appear in just a few clusters, and on matrices in which there are just a few extreme eigenvalues. We will learn more about this in the next lecture.

35.5 Laplacian Systems, again

This would be a good time to re-examine what we want when our matrix is a Laplacian. The Laplacian does not have an inverse. Rather, we want a polynomial in the Laplacian that approximates its pseudo-inverse (which we defined back in Lecture 8). If we were exactly solving the system of linear equations, we would have found a polynomial p such that

p(L)b = x,

where b = Lx, so this gives

p(L)Lx = x.

Of course, this is only reasonable if x is in the span of L. If the underlying graph is connected, this only happens if x is orthogonal to the all-1s vector. Of course, L sends constant vectors to zero. So, we want

p(L)L = Π,

where Π is the projection matrix that sends the constant vectors to zero, and acts as an identity on the vectors that are orthogonal to the constant vectors. Recall that Π = (1/n) L_{K_n}.

Similarly, p gives an ϵ-approximation of the pseudo-inverse if

∥p(L)L − Π∥ ≤ ϵ.

35.6 Bounds on the Diameter

Our intuition tells us that if we can quickly solve linear equations in the Laplacian matrix of a graph by an iterative method, then the graph should have small diameter. We now make that intuition precise.

If s and t are vertices that are at distance greater than d from each other, then

χ_s^T L^d χ_t = 0.

On the other hand, if L only has k distinct eigenvalues other than 0, then we can form a polynomial p of degree k − 1 such that

L p(L) = Π.

This allows us to prove the following theorem.
Theorem 35.6.1. Let G be a connected graph whose Laplacian has at most k distinct eigenvalues other than 0. Then, the diameter of G is at most k.

Proof. Let d be the diameter of the graph and let s and t be two vertices at distance d from each other. We have

e_s^T Π e_t = −1/n.

On the other hand, we have just described a polynomial in L with zero constant term, given by L p(L), that has degree k and such that

L p(L) = Π.

If the degree of this polynomial were less than d, we would have

e_s^T L p(L) e_t = 0.

As this is not the case, we have d ≤ k.

We can similarly obtain bounds on the diameter from approximate pseudo-inverses. If p is a polynomial such that

∥p(L)L − Π∥ ≤ ϵ,

then

|e_s^T (p(L)L − Π) e_t| ≤ ∥e_s∥ ∥p(L)L − Π∥ ∥e_t∥ ≤ ϵ.

If s and t are at distance d from each other in the graph, and if p(L)L has degree less than d, then

e_s^T (p(L)L − Π) e_t = e_s^T (−Π) e_t = 1/n.

This is a contradiction if ϵ < 1/n. So, the polynomials we constructed from Chebyshev polynomials imply the following theorem of Chung, Faber and Manteuffel [CFM94].

Theorem 35.6.2. Let G = (V, E) be a connected graph, and let λ_2 ≤ ··· ≤ λ_n be its Laplacian eigenvalues. Then, the diameter of G is at most

( (1/2)√(λ_n/λ_2) + 1 ) ln 2n.

Chapter 36

Preconditioning Laplacians

A preconditioner for a positive semidefinite matrix A is a positive semidefinite matrix B such that it is easy to solve systems of linear equations in B and the condition number of B^{−1}A is small. A good preconditioner allows one to quickly solve systems of equations in A.

In this lecture, we will measure the quality of preconditioners in terms of the ratio

κ(A, B) := β/α,

where α is the largest number and β is the smallest such that

αB ≼ A ≼ βB.

Lemma 36.0.1. Let α and β be as defined above. Then, α and β are the smallest and largest eigenvalues of B^{−1}A, excluding possible zero eigenvalues corresponding to a common nullspace of A and B.

We need to exclude the common nullspace when A and B are the Laplacian matrices of connected graphs. If these matrices have different nullspaces, then α = 0 or β = ∞, and the condition number β/α is infinite.

Proof of Lemma 36.0.1. We just prove the statement for β, in the case where neither matrix is singular. We have

λ_max(B^{−1}A) = λ_max(B^{−1/2} A B^{−1/2}) = max_x (x^T B^{−1/2} A B^{−1/2} x)/(x^T x) = max_y (y^T A y)/(y^T B y), setting y = B^{−1/2} x,

which equals β.

Recall that the eigenvalues of B^{−1}A are the same as those of B^{−1/2} A B^{−1/2} and A^{1/2} B^{−1} A^{1/2}.
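Here is a small numerical check (my own, not from the text) of Lemma 36.0.1 for nonsingular A and B: it takes α and β to be the extreme eigenvalues of B^{−1}A and confirms that αB ≼ A ≼ βB. The example matrices are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
M1, M2 = rng.standard_normal((20, 20)), rng.standard_normal((20, 20))
A, B = M1 @ M1.T + np.eye(20), M2 @ M2.T + np.eye(20)
evals = np.linalg.eigvals(np.linalg.solve(B, A)).real       # eigenvalues of B^{-1} A (real up to roundoff)
alpha, beta = evals.min(), evals.max()
# alpha B <= A <= beta B, i.e., A - alpha B and beta B - A are positive semidefinite:
print(np.linalg.eigvalsh(A - alpha * B).min() >= -1e-8,
      np.linalg.eigvalsh(beta * B - A).min() >= -1e-8)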
36.1 Approximate Solutions

Recall the A-norm:

∥x∥_A = √(x^T A x) = ∥A^{1/2} x∥.

We say that x̃ is an ϵ-approximate solution to the problem Ax = b if

∥x̃ − x∥_A ≤ ϵ ∥x∥_A.

36.2 Iterative Refinement

We will now see how to use a very good preconditioner to solve a system of equations. Let's consider a preconditioner B that satisfies

(1 − ϵ)B ≼ A ≼ (1 + ϵ)B.

So, all of the eigenvalues of

A^{1/2} B^{−1} A^{1/2} − I

have absolute value at most ϵ.

The vector B^{−1}b is a good approximation of x in the A-norm. We have

∥B^{−1}b − x∥_A = ∥A^{1/2} B^{−1} b − A^{1/2} x∥
 = ∥A^{1/2} B^{−1} A x − A^{1/2} x∥
 = ∥A^{1/2} B^{−1} A^{1/2} (A^{1/2} x) − A^{1/2} x∥
 ≤ ∥A^{1/2} B^{−1} A^{1/2} − I∥ ∥A^{1/2} x∥
 ≤ ϵ ∥A^{1/2} x∥
 = ϵ ∥x∥_A.

Remark: This result crucially depends upon the use of the A-norm. It fails under the Euclidean norm.

If we want a better solution, we can just compute the residual and solve the problem in the residual. That is, we set

x_1 = B^{−1} b,

and compute

r_1 = b − A x_1 = A(x − x_1).

We then use one solve in B to compute a vector x_2 such that

∥(x − x_1) − x_2∥_A ≤ ϵ ∥x − x_1∥_A ≤ ϵ² ∥x∥_A.

So, x_1 + x_2, our new estimate of x, differs from x by at most an ϵ² factor. Continuing in this way, we can find an ϵ^k approximation of x after solving k linear systems in B. This procedure is called iterative refinement.

36.3 Iterative Methods in the Matrix Norm

The iterative methods we studied last class can also be shown to produce good approximate solutions in the matrix norm. Given a matrix A, these produce ϵ-approximate solutions after t iterations if there is a polynomial q of degree t for which q(0) = 1 and |q(λ_i)| ≤ ϵ for all eigenvalues of A. To see this, recall that we can define p(x) so that q(x) = 1 − xp(x), and set

x̃ = p(A) b,

to get

∥x̃ − x∥_A = ∥p(A)b − x∥_A = ∥p(A)Ax − x∥_A.

As I, A, p(A) and A^{1/2} all commute, this equals

∥A^{1/2} p(A)Ax − A^{1/2} x∥ = ∥p(A)A A^{1/2} x − A^{1/2} x∥ ≤ ∥p(A)A − I∥ ∥A^{1/2} x∥ ≤ ϵ ∥x∥_A.

36.4 Preconditioned Iterative Methods

Preconditioned iterative methods can be viewed as the extension of Iterative Refinement by algorithms like Chebyshev iteration and the Preconditioned Conjugate Gradient. These usually work with condition numbers much larger than 2.

In each iteration of a preconditioned method we will solve a system of equations in B, multiply a vector by A, and perform a constant number of other vector operations. For this to be worthwhile, the cost of solving equations in B has to be low.

We begin by seeing how the analysis with polynomials translates. Let λ_i be the ith eigenvalue of B^{−1}A. If q_t(x) = 1 − x p_t(x) is a polynomial such that |q_t(λ_i)| ≤ ϵ for all i, then

x_t := p_t(B^{−1}A) B^{−1} b
will be an ϵ-approximate solution to Ax = b:

∥x − x_t∥_A = ∥A^{1/2} x − A^{1/2} x_t∥
 = ∥A^{1/2} x − A^{1/2} p_t(B^{−1}A) B^{−1} b∥
 = ∥A^{1/2} x − A^{1/2} p_t(B^{−1}A) B^{−1} A x∥
 = ∥A^{1/2} x − A^{1/2} p_t(B^{−1}A) B^{−1} A^{1/2} (A^{1/2} x)∥
 ≤ ∥I − A^{1/2} p_t(B^{−1}A) B^{−1} A^{1/2}∥ ∥A^{1/2} x∥.

We now prod this matrix into a more useful form:

I − A^{1/2} p_t(B^{−1}A) B^{−1} A^{1/2} = I − p_t(A^{1/2} B^{−1} A^{1/2}) A^{1/2} B^{−1} A^{1/2} = q_t(A^{1/2} B^{−1} A^{1/2}).

So, we find

∥x − x_t∥_A ≤ ∥q_t(A^{1/2} B^{−1} A^{1/2})∥ ∥A^{1/2} x∥ ≤ ϵ ∥x∥_A.

The Preconditioned Conjugate Gradient (PCG) is a magical algorithm that after t steps (each of which involves solving a system in B, multiplying a vector by A, and performing a constant number of vector operations) produces the vector x_t that minimizes

∥x_t − x∥_A

over all vectors x_t that can be written in the form p_t(B^{−1}A)B^{−1}b for a polynomial p_t of degree at most t. That is, the algorithm finds the best possible solution among all iterative methods of the form we have described. We first bound the quality of PCG by saying that it is at least as good as Preconditioned Chebyshev, but it has the advantage of not needing to know α and β. We will then find an improved analysis.

36.5 Preconditioning by Trees

Vaidya [Vai90] had the remarkable idea of preconditioning the Laplacian matrix of a graph by the Laplacian matrix of a subgraph. If H is a subgraph of G, then

L_H ≼ L_G,

so all eigenvalues of L_H^{−1} L_G are at least 1. Thus, we only need to find a subgraph H such that L_H is easy to invert and such that the largest eigenvalue of L_H^{−1} L_G is not too big.

It is relatively easy to show that linear equations in the Laplacian matrices of trees can be solved exactly in linear time. One can either do this by finding an LU-factorization with a linear number of non-zeros, or by viewing the process of solving the linear equation as a dynamic program that passes up once from the leaves of the tree to a root, and then back down.

We will now show that a special type of tree, called a low-stretch spanning tree, provides a very good preconditioner. To begin, let T be a spanning tree of G. Write

L_G = Σ_{(u,v)∈E} w_{u,v} L_{u,v} = Σ_{(u,v)∈E} w_{u,v} (χ_u − χ_v)(χ_u − χ_v)^T.

We will actually consider the trace of L_T^{−1} L_G. As the trace is linear, we have

Tr( L_T^{−1} L_G ) = Σ_{(u,v)∈E} w_{u,v} Tr( L_T^{−1} L_{u,v} )
 = Σ_{(u,v)∈E} w_{u,v} Tr( L_T^{−1} (χ_u − χ_v)(χ_u − χ_v)^T )
 = Σ_{(u,v)∈E} w_{u,v} Tr( (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v) )
 = Σ_{(u,v)∈E} w_{u,v} (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v).

To evaluate this last term, we need to know the value of (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v). You already know something about it: it is the effective resistance in T between u and v. In a tree, this equals the distance in T between u and v, when we view the length of an edge as the reciprocal of its weight. This is because it is the resistance of a path of resistors in series. Let T(u, v) denote the path in T from u to v, and let w_1, ..., w_k denote the weights of the edges on this path. As we view the weight of an edge as the reciprocal of its length,

(χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v) = Σ_{i=1}^{k} 1/w_i.  (36.1)

Even better, the term (36.1) is something that has been well-studied. It was defined by Alon, Karp, Peleg and West [AKPW95] to be the stretch of the unweighted edge (u, v) with respect to the tree T. Moreover, the stretch of the edge (u, v) with weight w_{u,v} with respect to the tree T is defined to be exactly

w_{u,v} Σ_{i=1}^{k} 1/w_i,

where again w_1, ..., w_k are the weights on the edges of the unique path in T from u to v. A sequence of works, beginning with [AKPW95], has shown that every graph G has a spanning tree in which the sum of the stretches of the edges is low. The best result so far is due to [AN12], who prove the following theorem.

Theorem 36.5.1. Every weighted graph G has a spanning tree subgraph T such that the sum of the stretches of all edges of G with respect to T is at most

O(m log n log log n),

where m is the number of edges of G. Moreover, one can compute this tree in time O(m log n log log n).
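The following is a small sketch (my own, not from the text) that computes the stretch of every edge of G with respect to a spanning tree T, using the definition above: the stretch of (u, v) with weight w_{u,v} is w_{u,v} times the sum of 1/w_e over the tree path from u to v. The function and variable names are my own.

from collections import defaultdict

def stretches(tree_edges, graph_edges, root=0):
    """tree_edges, graph_edges: lists of (u, v, w). Returns the stretch of every graph edge."""
    adj = defaultdict(list)
    for u, v, w in tree_edges:
        adj[u].append((v, w)); adj[v].append((u, w))
    # root the tree and record each vertex's parent, depth, and resistive depth (sum of 1/w to the root)
    parent, depth, rdepth = {root: None}, {root: 0}, {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in parent:
                parent[v], depth[v], rdepth[v] = u, depth[u] + 1, rdepth[u] + 1.0 / w
                stack.append(v)

    def tree_resistance(u, v):
        ru, rv = rdepth[u], rdepth[v]
        while depth[u] > depth[v]: u = parent[u]
        while depth[v] > depth[u]: v = parent[v]
        while u != v: u, v = parent[u], parent[v]
        return ru + rv - 2 * rdepth[u]       # path resistance through the common ancestor

    return [w * tree_resistance(u, v) for u, v, w in graph_edges]

tree = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]
extra = [(0, 3, 3.0)]
print(stretches(tree, tree + extra))   # tree edges have stretch 1; (0,3) has stretch 3*(1 + 1/2 + 1)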
Thus, if we choose a low-stretch spanning tree T, we will ensure that

Tr( L_T^{−1} L_G ) = Σ_{(u,v)∈E} w_{u,v} (χ_u − χ_v)^T L_T^{−1} (χ_u − χ_v) ≤ O(m log n log log n).

In particular, this tells us that λ_max(L_T^{−1} L_G) is at most O(m log n log log n), and so the Preconditioned Conjugate Gradient will require at most O(m^{1/2} log n) iterations, each of which requires one multiplication by L_G and one linear solve in L_T. This gives an algorithm that runs in time O(m^{3/2} log n log 1/ϵ), which is much lower than the O(n³) of Gaussian elimination when m, the number of edges in G, is small.

This result is due to Boman and Hendrickson [BH01].

36.6 Improving the Bound on the Running Time

We can show that the Preconditioned Conjugate Gradient will actually run in closer to O(m^{1/3}) iterations. Since the trace is the sum of the eigenvalues, we know that for every β > 0, L_T^{−1} L_G has at most

Tr( L_T^{−1} L_G ) / β

eigenvalues that are larger than β.

To exploit this fact, we use the following lemma. It basically says that we can ignore the largest eigenvalues of B^{−1}A if we are willing to spend one iteration for each.

Lemma 36.6.1. Let λ_1, ..., λ_n be positive numbers such that all of them are at least α and at most k of them are more than β. Then, for every t ≥ k, there exists a polynomial p(X) of degree t such that p(0) = 1 and

|p(λ_i)| ≤ 2 (1 + 2/√(β/α))^{−(t−k)},

for all λ_i.

Proof. Let r(X) be the polynomial we constructed using Chebyshev polynomials of degree t − k for which

|r(X)| ≤ 2 (1 + 2/√(β/α))^{−(t−k)},

for all X between α and β. Now, set

p(X) = r(X) Π_{i : λ_i > β} (1 − X/λ_i).

This new polynomial is zero at every λ_i greater than β, and for X between α and β

|p(X)| = |r(X)| Π_{i : λ_i > β} |1 − X/λ_i| ≤ |r(X)|,

as we always have X < λ_i in the product.

Applying this lemma to the analysis of the Preconditioned Conjugate Gradient, with β = Tr(L_T^{−1} L_G)^{2/3} and k = Tr(L_T^{−1} L_G)^{1/3}, we find that the algorithm produces ϵ-approximate solutions within

O( Tr(L_T^{−1} L_G)^{1/3} ln(1/ϵ) ) = O( m^{1/3} log n ln 1/ϵ )

iterations.

This result is due to Spielman and Woo [SW09].

36.7 Further Improvements

We now have three families of algorithms for solving systems of equations in Laplacian matrices in nearly-linear time.

• By subgraph preconditioners. These basically work by adding back edges to the low-stretch trees. The resulting systems can no longer be solved directly in linear time. Instead, we use Gaussian elimination to eliminate the degree 1 and 2 vertices to reduce to a smaller system, and then solve that system recursively. The first nearly linear time algorithm of this form ran in time O(m log^c n log 1/ϵ), for some constant c [ST14]. An approach of this form was first made practical (and much simpler) by Koutis, Miller, and Peng [KMP11]. The asymptotically fastest method also works this way. It runs in time O(m log^{1/2} m (log log n)^c log 1/ϵ) [CKM+14] (Cohen, Kyng, Miller, Pachocki, Peng, Rao, Xu).

• By sparsification (see my notes from Lecture 19 from 2015). These algorithms work rather differently, and do not exploit low-stretch spanning trees. They appear in the papers [PS14, KLP+16].

• Accelerating Gaussian elimination by random sampling, by Kyng and Sachdeva [KS16]. This is the most elegant of the algorithms. While its running time, O(m log² n log 1/ϵ), is not the asymptotically best, the algorithm is so simple that it is the best in practice. An optimized implementation appears in the package Laplacians.jl.

There are other algorithms that are often fast in practice, but for which we have no theoretical analysis. I suggest the Algebraic Multigrid of Livne and Brandt, and the Combinatorial Multigrid of Yiannis Koutis.

36.8 Questions

I conjecture that it is possible to construct spanning trees of even lower stretch. Does every graph have a spanning tree of average stretch 2 log₂ n? I do not see any reason this should not be true. I also believe that this should be achievable by a practical algorithm. The best code that I know for computing low-stretch spanning trees, and which I implemented in Laplacians.jl, is a heuristic based on the algorithm of Alon, Karp, Peleg and West. However, I do not know an analysis of their algorithm that gives stretch better than O(m 2^{√(log n)}). The theoretically better low-stretch
trees of Abraham and Neiman are obtained by improving constructions of [EEST08, ABN08]. However, they seem too complicated to be practical.

The eigenvalues of L_H^{−1} L_G are called generalized eigenvalues. The relation between generalized eigenvalues and stretch is the first result of which I am aware that establishes a combinatorial interpretation of generalized eigenvalues. Can you find any others?

Chapter 37

Augmented Spanning Tree Preconditioners

This Chapter Needs Editing

The first algorithms that solved Laplacian systems in nearly linear time used augmented spanning tree preconditioners. These are formed by adding edges of G back to a spanning tree of G. Vaidya [Vai90] first suggested doing this with maximum spanning trees. The first nearly linear time solvers were developed by Spielman and Teng [ST14] by augmenting low stretch spanning trees. The elegant algorithm described in this chapter is from two papers by Koutis, Miller, and Peng [KMP10, KMP11]. It solves systems to ϵ accuracy in time Õ(m log n log ϵ^{−1}).

Using the Iterative Refinement algorithm from the previous chapter, we know that it suffices to show this with any constant ϵ < 1. You should assume throughout this chapter that ϵ is some absolute constant like 1/20.

I recall that Õ is like O-notation, but it hides low order logarithmic terms. That is, when we write f(n) ≤ Õ(g(n)), we mean that there is a constant c such that f(n) ≤ O(g(n) log^c g(n)). For example, in this notation we can say that every graph G has a spanning tree T of average stretch Õ(log n). In this Chapter we will want to specify that many statements are true given some choice of constants c. For this purpose, we will often let c be a constant, but not the same constant, where it appears throughout the chapter. We do this instead of using O-notation, as it simplifies making the constants explicit later.

37.1 Recursion

Let H be obtained by adding a few edges back to a spanning tree T of G. As a large fraction of the vertices of T will have degree 1 or 2, the same is true of H. We can eliminate these degree 1
CHAPTER 37. AUGMENTED SPANNING TREE PRECONDITIONERS 293 CHAPTER 37. AUGMENTED SPANNING TREE PRECONDITIONERS 294

and 2 vertices to obtain a Schur complement H̃ and an upper triangular matrix1 U such that

U^T [ I 0 ; 0 LH̃ ] U = LH .

This means that we can solve a system of equations in LH by solving systems in U^T , LH̃ , and U .
As elimination of a degree 1 vertex only decreases the degree of its neighbor and the elimination
of a degree 2 vertex does not change the degrees of its neighbors, the matrix U has at most 2n
nonzero entries. As U is upper triangular, systems in U and U^T can be solved in time
proportional to their number of nonzero entries, O(n). This inspires a recursive algorithm for
solving equations in LG : we construct a good preconditioner H with many degree 1 and 2
vertices, and then solve systems in LH by approximately solving in LH̃ .

We now explore this idea in a little more detail. First observe that because we are applying a
recursive algorithm, we will not solve systems in LH̃ exactly. Rather, we will be applying an
algorithm to approximately solve these systems. The one guarantee we make about this algorithm
is that it acts as a linear operator. That is, the action of this algorithm corresponds to
multiplication by some matrix Z that we never construct. But, we know that for some ϵ

(1 − ϵ)Z^+ ≼ LH̃ ≼ (1 + ϵ)Z^+ .

This immediately implies that

(1 − ϵ) U^T [ I 0 ; 0 Z^+ ] U ≼ U^T [ I 0 ; 0 LH̃ ] U ≼ (1 + ϵ) U^T [ I 0 ; 0 Z^+ ] U.

Thus, we can obtain ϵ-approximate solutions to systems in LH by solving a system in U ,
applying Z , and solving a system in U^T .

Define
M = U^T [ I 0 ; 0 Z^+ ] U.

This will imply that κ(LG , M ) is at most ((1 + ϵ)/(1 − ϵ))κ(LG , LH ), which will be just a little
more than κ(LG , LH ) and thus fine for our purposes.

Lemma 37.1.1. Let T be a tree on n vertices. Then, more than half the vertices of T have degree
1 or 2.

Proof. The number of edges in T is n − 1, so the average degree of vertices in T is less than 2.
Thus T must contain at least one degree 1 vertex for every vertex of degree at least 3. The other
vertices have degree 2.

We learned last lecture that if we keep eliminating degree 1 vertices from trees, then we will
eventually eliminate all the vertices. An analogous fact is true for a graph that equals a tree plus
k edges.

1 Whether this matrix is actually upper triangular depends on the ordering of the vertices. We assume, without
loss of generality, that the vertices are ordered so that the matrix is upper triangular.

Lemma 37.1.2. Let H be a tree on n vertices plus k edges. If we eliminate degree 1 and 2
vertices of the tree that do not touch the extra k edges until none remain, we will be left with at
most 4k vertices and 5k edges.

Proof. If we eliminate a degree 1 or 2 vertex of the tree that does not touch one of the extra k
edges, we will obtain a graph that looks like a tree on one fewer vertex, plus k edges. As a tree on
4k vertices must have at least 2k + 1 vertices of degree 1 or 2, at least one of these does not touch
one of the extra k edges, and so can be eliminated.

37.2 Heavy Trees

Koutis, Miller, and Peng observe that we do not necessarily have to produce a subgraph H of G
that looks like a tree plus a few edges. All we really need is for H to have many fewer edges than
G. This still leaves the question of how we will find such an H that is a good approximation of G.
The trick is to use a variant of the random-sampling based approach of Chapter 32. But, we
avoid the cost of computing effective resistances of edges by estimating them by their stretches, at
the cost of a worse approximation.

We begin by formally stating the result of that chapter for graphs.

Theorem 37.2.1. Let G = (V, E, w) be a graph, let ϵ > 0, and for every edge (a, b) let
pa,b ∈ (0, 1] satisfy
pa,b ≥ min ( 1, (4 ln n / ϵ^2) wa,b Reff G (a, b) ) .
Form the random graph H = (V, F, u) by setting for every edge independently
ua,b = wa,b / pa,b with probability pa,b , and ua,b = 0 with probability 1 − pa,b .
Then there exists a constant c so that with probability at least 1 − n^{−c} , H is an ϵ approximation of
G and the number of edges in H is at most 2 ∑_{a,b} pa,b .

Lemma 37.2.2. For every weighted graph G = (V, E, w), spanning tree T = (V, F, w) of G, and
a, b ∈ V , Reff G (a, b) ≤ StretchT (a, b).

Proof. Rayleigh’s Monotonicity Theorem tells us that Reff G (a, b) ≤ Reff T (a, b), and this latter
term equals StretchT (a, b).

The problem with sampling edges with probability proportional to their effective resistance, or
stretches, is that this will produce too many edges. Koutis, Miller, and Peng solve this problem
by multiplicatively increasing the weights of the edges in a low-stretch spanning tree of G. Define

G̃ = G + (s − 1)T.

That is, G̃ is the same as G, but every edge in the tree T has its weight multiplied by s.
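To make the sampling step concrete, here is a minimal Python sketch (the function and variable names are mine, and it assumes the per-edge scores wa,b · Reff G (a, b), or the stretch-based upper bounds used below, have already been computed): each edge is kept independently with probability pa,b and, if kept, reweighted to wa,b / pa,b.

    import math
    import random

    def sample_graph(weights, scores, eps, n):
        """Sketch of the sampling in Theorem 37.2.1.
        weights: dict edge (a, b) -> w_ab
        scores:  dict edge (a, b) -> an upper bound on w_ab * Reff_G(a, b)
        Returns a dict of sampled edge weights u_ab."""
        sampled = {}
        for edge, w in weights.items():
            p = min(1.0, (4.0 * math.log(n) / eps ** 2) * scores[edge])
            if random.random() < p:
                sampled[edge] = w / p   # rescale so the expected weight is w
        return sampled

Tree edges can be forced into H by giving them a score of at least ϵ^2/(4 ln n), so that pa,b = 1; that is one way to realize the rule for tree edges used below.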
CHAPTER 37. AUGMENTED SPANNING TREE PRECONDITIONERS 295 CHAPTER 37. AUGMENTED SPANNING TREE PRECONDITIONERS 296

Thus, for (a, b) not in the tree,

Reff G̃ (a, b) ≤ Reff sT (a, b) ≤ (1/s)StretchT (a, b).

For every edge (a, b) ∈ T we set pa,b = 1 and for every edge (a, b) ̸∈ T , we set

pa,b ≥ min ( 1, (4 ln n / ϵ^2) wa,b StretchT (a, b) ) .

Define σ to be the average stretch of edges of G with respect to T :

σ = (1/m)StretchT (G) = (1/m) ∑_{(a,b)∈E} wa,b StretchT (a, b),

and recall that σ ≤ O(log n). If we now form H by including edge (a, b) with probability pa,b ,
then Theorem 37.2.1 tells us that with high probability H is an ϵ approximation of G̃ and that
the number of edges of H that are not in T is at most

∑_{(a,b)̸∈T} pa,b ≤ 4mσ ln n / (sϵ^2).

So, by making s a little more than some constant times σ ln n, we can make sure that the number
of edges of H not in T is less than the number of edges of G not in T .

But, we need to solve systems in G, not G̃. To this end, we use the following multiplicative
property of condition numbers.

Claim 37.2.3.
κ(LG , LH ) ≤ κ(LG , LG̃ )κ(LG̃ , LH ).

As G̃ differs from G by having the weights of some edges multiplied by s, κ(LG , LG̃ ) ≤ s. Thus,
we will have κ(LG , LH ) ≤ s(1 + ϵ)/(1 − ϵ), and to get ϵ accurate solutions to systems in LG we
will need to solve some constant times κ(LG , LH )^{1/2} systems in LH . As we are going to keep ϵ
constant, this will be around s^{1/2} .

To make an efficient algorithm for solving systems in G out of an algorithm for solving systems in
H, it would be easiest if the cost of the solves in H is less than the cost of a multiply by G. As we
will solve the system in H around s^{1/2} times, it seems natural to ensure that the number of edges
of H that are not in T is at most the number of edges in G divided by s^{1/2} . That is, we want

s^{1/2} · 4mσ ln n / (sϵ^2) ≤ m,

which requires
s ≥ c(σ ln n)^2 ,
for some constant c. We will now show that such a choice of c yields an algorithm for solving
linear equations in LG to constant accuracy in time Õ(m log^2 n).

We now describe the recursion. Let G0 = G, the input graph. We will eventually solve systems in
Gi by recursively solving systems in Gi+1 . Each system Gi+1 will have fewer edges than Gi , and
thus we can use a brute force solve when the system becomes small enough. We will bound the
running time of solvers for systems in Gi in terms of the number of edges that are not in their
spanning trees. We denote this by oi = mi − (ni − 1). There is some issue with o0 , so let’s assume
without much loss of generality that G0 does not have any degree 1 or 2 vertices, and thus
o0 ≥ n0 .

Form G̃i by multiplying a low-stretch spanning tree of G by s, and use random sampling to
produce Hi . We know that the number of off-tree edges in Hi is at most a 1/(cσ ln n) fraction of
the number of off-tree edges in Gi . If the number of off-tree edges in Hi is less than ni /4, then we
know that after eliminating degree 1 and 2 vertices we will be left with a graph having at most
4ni vertices and 5ni edges. We let Gi+1 be this graph. If the number of off-tree edges in Hi is
more than ni /4, then we just set Gi+1 = Hi .

In this way, we ensure that oi+1 ≤ oi /(cσ ln n). We can now prove by backwards induction on i
that the time required to solve systems of equations in LGi is at most O(oi σ ln n). A solve in Gi
to constant accuracy requires performing O(s^{1/2}) solves in Gi+1 and as many multiplies by LGi .
By induction we know that this takes time at most

O(s^{1/2}(oi + oi+1 σ ln n)) ≤ O(s^{1/2}(2oi )) ≤ O(oi σ ln n).

37.3 Saving a log
CHAPTER 38. FAST LAPLACIAN SOLVERS BY SPARSIFICATION 298

38.3 The Idea

I begin by describing the idea behind the algorithm. This idea won’t quite work. But, we will see
how to turn it into one that does.
We will work with matrices that look like M = L + X where L is a Laplacian and X is a
Chapter 38 non-zero, non-negative diagonal matrix. Such matrices are called M-matrices. A symmetric
M-matrix is a matrix M with nonpositive off-diagonal entries such that M 1 is nonnegative and
nonzero. We have encountered M-matrices before without naming them. If G = (V, E) is a graph,
S ⊂ V , and G(S) is connected, then the submatrix of LG indexed by rows and columns in S is an
Fast Laplacian Solvers by M-matrix. Algorithmically, the problems of solving systems of equations in Laplacians and
symmetric M-matrices are equivalent.
Sparsification The sparsification results that we learned for Laplacians translate over to M-matrices. Every
M-matrix M can be written in the form X + L where L is a Laplacian and X is a nonnegative
b ≈ϵ L, then it is easy to show (too easy for homework) that
diagonal matrix. If L
b ≈ϵ X + L.
X +L
This Chapter Needs Editing
In Lecture 7, Lemma 7.3.1, we proved that if X has at least one nonzero entry and if L is
connected, then X + L is nonsingular. We write such a matrix in the form M = D − A where D
38.1 Overview is positive diagonal and A is nonnegative, and note that its being nonsingular and positive
semidefinite implies
We will see how sparsification allows us to solve systems of linear equations in Laplacian matrices D −A≻0 ⇐⇒ D ≻ A. (38.1)
and their sub-matrices in nearly linear time. By “nearly-linear”, I mean time Using the Perron-Frobenius theorem, one can also show that
O(m log^c (nκ^{−1} ) log ϵ^{−1} ) for systems with m nonzero entries, n dimensions, condition number κ,
D ≻ −A. (38.2)
and accuracy ϵ.
This algorithm comes from [PS14]. Multiplying M by D −1/2 on either side, we obtain
I − D −1/2 AD −1/2 .
38.2 Today’s notion of approximation Define
B = D −1/2 AD −1/2 ,
In today’s lecture, I will find it convenient to define matrix approximations slightly differently and note that inequalities (38.1) and (38.2) imply that all eigenvalues of B have absolute value
from previous lectures. Today, I define A ≈ϵ B to mean strictly less than 1.
e−ϵ A ≼ B ≼ eϵ A. It suffices to figure out how to solve systems of equations in I − B. One way to do this is to
exploit the power series expansion:
Note that this relation is symmetric in A and B, and that for ϵ small eϵ ≈ 1 + ϵ.
(I − B)−1 = I + B + B 2 + B 3 + · · ·
The advantage of this definition is that
However, this series might need many terms to converge. We can figure out how many. If the
A ≈α B and B ≈β C implies A ≈α+β C . largest eigenvalue of B is (1 − κ) < 1, then we need at least 1/κ terms.
We can write a series with fewer terms if we express it as a product instead of as a sum:
∑_{i≥0} B^i = ∏_{j≥0} (I + B^{2^j}).

CHAPTER 38. FAST LAPLACIAN SOLVERS BY SPARSIFICATION 299 CHAPTER 38. FAST LAPLACIAN SOLVERS BY SPARSIFICATION 300

To see why this works, look at the first few terms 38.5 D and A
(I + B)(I + B^2)(I + B^4) = (I + B + B^2 + B^3)(I + B^4) = (I + B + B^2 + B^3) + B^4 (I + B + B^2 + B^3).
Unfortunately, we are going to need to stop writing matrices in terms of I and B, and return to
We only need O(log κ−1 ) terms of this product to obtain a good approximation of (I − B)−1 . writing them in terms of D and A. The reason this is unfortunate is that it makes for longer
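As a quick numerical illustration (this snippet is mine, not part of the text), one can compare a truncated product of this form against (I − B)^{−1} for a random symmetric B whose eigenvalues all have absolute value below one.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 6))
    B = (X + X.T) / 2
    B *= 0.9 / np.max(np.abs(np.linalg.eigvalsh(B)))   # eigenvalues now lie in (-0.9, 0.9)

    approx = np.eye(6)
    power = B.copy()                                    # holds B^(2^j)
    for _ in range(6):                                  # O(log 1/kappa) factors
        approx = approx @ (np.eye(6) + power)
        power = power @ power

    print(np.linalg.norm(approx - np.linalg.inv(np.eye(6) - B)))
    # small; adding one more factor roughly squares the remaining error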
The obstacle to quickly applying a series like this is that the matrices I + B^{2^j} are probably dense.
We know how to solve this problem: we can sparsify them! I’m not saying that flippantly. We
The analog of (38.3) is
actually do know how to sparsify matrices of this form.
But, simply sparsifying the matrices I + B^{2^j} does not solve our problem because approximation
(D − A)^{−1} = (1/2) [ D^{−1} + (I + D^{−1} A)(D − AD^{−1} A)^{−1} (I + AD^{−1} ) ].    (38.4)
b and B ≈ϵ B,b A bB b could be a very poor 2
is not preserved by products. That is, even if A ≈ϵ A
approximation of AB. In fact, since the product A bBb is not necessarily symmetric, we haven’t
In order to be able to work with this expression inductively, we need to check that the middle
even defined what it would mean for it to approximate AB.
matrix is an M-matrix.

Lemma 38.5.1. If D is a diagonal matrix and A is a nonnegative matrix so that M = D − A is


38.4 A symmetric expansion an M-matrix, then
M 1 = D − AD −1 A
We will now derive a way of expanding (I − B)−1 that is amenable to approximation. We begin is also an M-matrix.
with an alternate derivation of the series we saw before. Note that
(I − B)(I + B) = (I − B 2 ), Proof. As the off-diagonal entries of this matrix are symmetric and nonpositive, it suffices to
and so prove that M 1 ≥ 0 and M 1 ̸= 0. To compute the row sums set
(I − B) = (I − B 2 )(I + B)−1 . d = D1 and a = A1,
Taking the inverse of both sides gives
and note that d − a ≥ 0 and d − a ̸= 0. For M 1 , we have
(I − B)−1 = (I + B)(I − B 2 )−1 .
We can then apply the same expansion to (I − B 2 )−1 to obtain (D − AD −1 A)1 = d − AD −1a ≥ d − A1 = d − a ,
(I − B)−1 = (I + B)(I + B 2 )(I − B 4 )−1 . which is nonnegative and not exactly zero.

What we need is a symmetric expansion. We use


We will apply transformation like this many times during our algorithm. To keep track of
1 1 progress, I say that (D, A) is an (α, β)-pair if
(I − B)^{−1} = (1/2) I + (1/2) (I + B)(I − B^2)^{−1} (I + B).    (38.3)
2 2
We will verify this by multiplying the right hand side by (I − B): a. D is positive diagonal,
(I + B)(I − B 2 )−1 (I + B)(I − B) = (I + B)(I − B 2 )−1 (I − B 2 ) = I + B;
b. A is nonnegative (and can have diagonal entries), and
so
1  1 c. αD ≽ A and βD ≽ −A.
I + (I + B)(I − B 2 )−1 (I + B) (I − B) = [(I − B) + (I + B)] = I .
2 2
This expression for (I − B)−1 plays nicely with matrix approximations. If For our initial matrix M = D − A, we know that there is some number κ > 0 for which (D, A) is
2 a (1 − κ, 1 − κ)-pair.
M 1 ≈ϵ (I − B^2),
At the end of our recursion we will seek a (1/4, 1/4)-pair. When we have such a pair, we can just
then you can show
1  approximate D − A by D.
(I − B)−1 ≈ϵ I + (I + B)M −11 (I + B) .
2 Lemma 38.5.2. If M = D − A and (D, A) is a (1/4, 1/4)-pair, then
If we can apply M −1 −1
1 quickly and if B is sparse, then we can quickly approximate (I − B) . You
may now be wondering how we will construct such an M 1 . The answer, in short, is “recursively”. M ≈1/3 D.
CHAPTER 38. FAST LAPLACIAN SOLVERS BY SPARSIFICATION 301 CHAPTER 38. FAST LAPLACIAN SOLVERS BY SPARSIFICATION 302

Proof. We have It remains to confirm that sparsification satisfies the requirements of this lemma. The reason this
M = D − A ≼ (1 + 1/4)D ≤ e1/4 D, might not be obvious is that we allow A to have nonnegative diagonal elements. While this does
not interfere with condition b, you might be concerned that it would interfere with condition c. It
and
need not.
M = D − A ≽ D − (1/4)D = (3/4)D ≽ e−1/3 D.
Let C be the diagonal of A, and let L be the Laplacian of the graph with adjacency matrix
A − C , and set X so that X + L = D − A. Let L e be a sparse ϵ-approximation of L. By
Lemma 38.5.3. If (D, A) is an (α, α)-pair, then (D, AD −1 A) is an (α2 , 0)-pair. computing the quadratic form in elementary unit vectors, you can check that the diagonals of L
e approximate each other. If we now write L
and L e=D e − A,
e where Ae has zero diagonal, and set
Proof. From Lecture 14, Lemma 3.1, we know that the condition of the lemma is equivalent to b =D
D e +C b =A
and A e +C
the assertion that all eigenvalues of D −1 A have absolute value at most α, and that the conclusion
is equivalent to the assertion that all eigenvalues of D −1 AD −1 A lie between 0 and α2 , which is b and A
You can now check that D b satisfy the requirements of Lemma 38.5.4.
immediate as they are the squares of the eigenvalues of D −1 A.
You might wonder why we bother to keep diagonal elements in a matrix like A. It seems simpler
to get rid of them. However, we want (D, A) to be an (α, β) pair, and subtracting C
So, if we start with matrices D and A that are a (1 − κ, 1 − κ)-pair, then after applying this from both of them would make β worse. This might not matter too much as we have good control
transformation approximately log κ−1 + 2 times we obtain a (1/4, 0)-pair. But, the matrices in over β. But, I don’t yet see a nice way to carry out a proof that exploits this.
this pair could be dense. To keep them sparse, we need to figure out how approximating D − A
degrades its quality.

Lemma 38.5.4. If ϵ ≤ 1/3, 38.6 Sketch of the construction

a. (D, A) is a (1 − κ, 0) pair, We begin with an M-matrix M 0 = D 0 − A0 . Since this matrix is nonsingular, there is a κ0 > 0 so
that (D 0 , A0 ) is a (1 − κ0 , 1 − κ0 ) pair.
b − A,
b. D − A ≈ϵ D b and
We now know that the matrix
b
c. D ≈ϵ D, D 0 − A0 D −1
0 A0
is an M-matrix and that (D 0 , A0 D −1
0 A0 )is a ((1 − κ0 )2 , 0)-pair. Define κ1 so that
b −A
then D b is an (1 − κe−2ϵ , 3ϵ)-pair. 1 − κ1 = (1 − κ)20 , and note that κ1 is approximately 2κ0 . Lemma 38.5.4 and the discussion
following it tells us that there is a (1 − κ1 e−2ϵ , 3ϵ)-pair (D 1 , A1 ) so that
Proof. First observe that
(1 − κ)D ≽ A ⇐⇒ D − A ≽ κD. D 1 − A1 ≈ϵ D 0 − A0 D −1
0 A0

Then, compute and so that A1 has O(n/ϵ2 ) nonzero entries.

b −A
b ≽ e−ϵ (D − A) ≽ e−ϵ κD ≽ e−2ϵ κD.
b Continuing inductively for some number k steps, we find (1 − κi , 3ϵ) pairs (D i , Ai ) so that
D
M i = D i − Ai
For the other side, compute
has O(n/ϵ2 ) nonzero entries, and
b ≽ eϵ D ≽ eϵ (D − A) ≽ (D
e2ϵ D b − A).
b
M i ≈ϵ D i − Ai−1 D −1
i−1 Ai−1 .
For ϵ ≤ 1/3, 3ϵ ≥ e2ϵ − 1, so
b ≽ (e2ϵ − 1)D
b ≽ −A.
b For the i such that κi is small, κi+1 is approximately twice κi . So, for k = 2 + log2 1/κ and ϵ close
3ϵD
to zero, we can guarantee that (D k , Ak ) is a (1/4, 1/4) pair.
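To show the shape of the recursion that this suggests, here is an illustrative Python sketch (my own code, under the simplifying assumption that no sparsification is performed, so the matrices A_i are formed exactly and fill in; nothing here is fast). It applies the identity (38.4) at every level and approximates by D alone at the bottom.

    import numpy as np

    def solve_chain(d, A, b, depth):
        """Approximately solve (D - A) x = b, with D = diag(d), by the recursion of
        this section.  Sketch only: the real algorithm sparsifies A @ D^{-1} @ A."""
        if depth == 0:
            return b / d            # once (D, A) is a (1/4, 1/4)-pair, D alone suffices
        ADinv = A / d               # A D^{-1}
        DinvA = A / d[:, None]      # D^{-1} A
        A_next = A @ DinvA          # A D^{-1} A
        inner = solve_chain(d, A_next, b + ADinv @ b, depth - 1)
        return 0.5 * (b / d + inner + DinvA @ inner)

In the actual construction each A_next is replaced by a sparse ϵ-approximation before recursing, which is what keeps every level at O(n/ϵ^2) entries.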
We now see how this construction allows us to approximately solve systems of equations in
D 0 − A0 , and how we must set ϵ for it to work. For every 0 ≤ i < k, we have
(D i − Ai )^{−1} = (1/2) [ D i^{−1} + (I + D i^{−1} Ai )(D i − Ai D i^{−1} Ai )^{−1} (I + Ai D i^{−1} ) ] ≈ϵ (1/2) [ D i^{−1} + (I + D i^{−1} Ai )(D i+1 − Ai+1 )^{−1} (I + Ai D i^{−1} ) ]
CHAPTER 38. FAST LAPLACIAN SOLVERS BY SPARSIFICATION 303

and
(D k − Ak )^{−1} ≈1/3 D k^{−1} .

By substituting through each of these approximations, we obtain solutions to systems of


equations in D 0 − A0 with accuracy 1/3 + kϵ. So, we should set kϵ = 1/3, and thus

ϵ = 1/(2 + log2 κ−1 ).


Chapter 39
The dominant cost of the resulting algorithm will be the multiplication of vectors by 2k matrices
of O(n/ϵ2 ) entries, with a total cost of

O(n(log2 (1/κ))3 ). Testing Isomorphism of Graphs with


38.7 Making the construction efficient Distinct Eigenvalues
In the above construction, I just assumed that appropriate sparsifiers exist, rather than
constructing them efficiently. To construct them efficiently, we need two ideas. The first is that
we need to be able to quickly approximate effective resistances so that we can use the sampling This Chapter Needs Editing
algorithm from Lecture 17.
The second is to observe that we do not actually want to form the matrix AD −1 A before 39.1 Introduction
sparsifying it, as that could take too long. Instead, we express it as a product of cliques that have
succinct descriptions, and we form the sum of approximations of each of those.
I will present an algorithm of Leighton and Miller [LM82] for testing isomorphism of graphs in
which all eigenvalues have multiplicity 1. This algorithm was never published, as the results were
technically subsumed by those in a paper of Babai, Grigoriev and Mount [BGM82], which gave a
38.8 Improvements
polynomial time algorithm for testing isomorphism of graphs in which all eigenvalues have
The fastest known algorithms for solving systems of equations run in time O(m √(log n) log ϵ^{−1} )
+
[CKM 14]. The algorithm I have presented here can be substantially improved by combining it I present the weaker result in the interest of simplicity.
with Cholesky factorization. This both gives an efficient parallel algorithm, and proves the Testing isomorphism of graphs is a notorious problem. Until very recently, the fastest-known
existence of an approximate inverse for every M-matrix that has a linear number of nonzeros
algorithm for it took time 2^{O(√(n log n))} (See [Bab81, BL83, ZKT85]). Babai [Bab16] recently
[LPS15].
announced a breakthrough that reduces the complexity to 2^{(log n)^{O(1)}} .
However, testing graph isomorphism seems easy in almost all practical instances. Today’s lecture
and one next week will give you some idea as to why.

39.2 Graph Isomorphism

Recall that two graphs G = (V, E) and H = (V, F ) are isomorphic if there exists a permutation π
of V such that
(a, b) ∈ E ⇐⇒ (π(a), π(b)) ∈ F.
Of course, we can express this relation in terms of matrices associated with the graphs. It doesn’t
matter much which matrices we use. So for this lecture we will use the adjacency matrices.

CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES305 CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES306

Every permutation may be realized by a permutation matrix. For the permutation π, this is the Lemma 39.3.1. Let A = ΨΛΨT and B = ΦΛΦT where Λ is a diagonal matrix with distinct
matrix Π with entries given by ( entries and Ψ and Φ are orthogonal matrices. A permutation matrix Π satisfies ΠAΠT = B if
1 if π(a) = b and only if there exists a diagonal ±1 matrix S for which
Π(a, b) =
0 otherwise.
ΠΨ = ΦS .
For a vector ψ, we see1 that
(Πψ) (a) = ψ(π(a)). Proof. Let ψ 1 , . . . , ψ n be the columns of Ψ and let ϕ1 , . . . , ϕn be the columns of Φ. Assuming
there is a Π for which ΠAΠT = B,
Let A be the adjacency matrix of G and let B be the adjacency matrix of H. We see that G and n n
H are isomorphic if and only if there exists a permutation matrix Π such that X X
ΦΛΦT = ϕi λi ϕTi = (Πψ i )λi (ψ Ti ΠT ),
i=1 i=1
ΠAΠT = B.
which implies that for all i
ϕi ϕTi = (Πψ i )(Πψ i )T .
39.3 Using Eigenvalues and Eigenvectors This in turn implies that
ϕi = ±Πψ i .
If G and H are isomorphic, then A and B must have the same eigenvalues. However, there are
many pairs of graphs that are non-isomorphic but which have the same eigenvalues. We will see To go the other direction, assume ΠΨ = ΦS . Then,
some tricky ones next lecture. But, for now, we note that if A and B have different eigenvalues,
then we know that the corresponding graphs are non-isomorphic, and we don’t have to worry ΠAΠT = ΠΨΛΨT ΠT = ΦS ΛS ΦT = ΦΛS S ΦT = ΦΛΦT = B,
about them.
as S and Λ are diagonal and thus commute, and S 2 = I .
For the rest of this lecture, we will assume that A and B have the same eigenvalues, and that
each of these eigenvalues has multiplicity 1. We will begin our study of this situation by Our algorithm for testing isomorphism will determine all such matrices S . Let S be the set of all
considering some cases in which testing isomorphism is easy. diagonal ±1 matrices. We will find diagonal matrices S ∈ S such that the set of rows of ΦS is
Recall that we can write the same as the set of rows of Ψ. As the rows of Ψ are indexed by vertices a ∈ V , we will write
A = ΨΛΨT , the row indexed by a as the row-vector
def
where Λ is the diagonal matrix of eigenvalues of A and Ψ is an orthonormal matrix holding its v a = (ψ 1 (a), . . . , ψ n (a)).
eigenvectors. If B has the same eigenvalues, we can write
Similarly denote the rows of Φ by vectors u a . In this notation, we are searching for matrices
B = ΦΛΦT . S ∈ S for which the set of vectors {v a }a∈V is identical to the set of vectors {u a S }a∈V We have
thus transformed the graph isomorphism problem into a problem about vectors:
If Π is the matrix of an isomorphism from G to H, then

ΠΨΛΨT ΠT = ΦΛΦT .
39.4 An easy case
As each entry of Λ is distinct, this looks like it would imply ΠΨ = Φ. But, the eigenvectors
(columns of Φ and Ψ) are only determined up to sign. So, it just implies I will say that an eigenvector ψ i is helpful if for all a ̸= b ∈ V , |ψ i (a)| =
̸ |ψ i (b)|. In this case, it is
very easy to test if G and H are isomorphic, because this helpful vector gives us a canonical name
ΠΨ = ΦS, for every vertex. If Π is an isomorphism from G to H, then Πψ i must be an eigenvector of B. In
fact, is must be ±ϕi . If the sets of absolute values of entries of ψ i and ϕi are the same, then we
where S is a diagonal matrix with ±1 entries on its diagonal. may find the permutation that maps A to B by mapping every vertex a to the vertex b for which
1
I hope I got that right. It’s very easy to confuse the permutation and its inverse. |ψ i (a)| = |ϕi (b)|.
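Since that is easy to get backwards, here is a tiny numerical check of the convention (my own code, not from the text): with Π(a, b) = 1 exactly when π(a) = b, we have (Πψ)(a) = ψ(π(a)) and (ΠAΠT)(i, j) = A(π(i), π(j)).

    import numpy as np

    n = 4
    pi = [2, 0, 3, 1]                      # the permutation a -> pi(a)
    Pi = np.zeros((n, n))
    for a in range(n):
        Pi[a, pi[a]] = 1                   # Pi(a, b) = 1 exactly when pi(a) = b

    psi = np.array([10.0, 20.0, 30.0, 40.0])
    print(np.allclose(Pi @ psi, psi[pi]))                 # (Pi psi)(a) = psi(pi(a))

    A = np.zeros((n, n))                   # adjacency matrix of the path 0-1-2-3
    for a, b in [(0, 1), (1, 2), (2, 3)]:
        A[a, b] = A[b, a] = 1
    print(np.allclose(Pi @ A @ Pi.T, A[np.ix_(pi, pi)]))  # (Pi A Pi^T)(i, j) = A(pi(i), pi(j))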
The reason that I put absolute values in the definition of helpful, rather than just taking values, is
that eigenvectors are only determined up to sign. On the other hand, a single eigenvector
CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES307 CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES308

determines the isomorphism if ψ i (a) ̸= ψ i (b) for all a ̸= b and there is a canonical way to choose a vertices to which it can be mapped by automorphisms. We will discover the orbits by realizing
sign for the vector ψ i . For example, if the sum of the entries in ψ i is not zero, we can choose its that the orbit of a vertex a is the set of b for which v a S = v b for some S ∈ A.
sign to make the sum positive. In fact, unless ψ i and −ψ i have exactly the same set of values,
The set of orbits of vertices forms a partition of the vertices. We say that a partition of the
there is a canonical choice of the sign for this vector.
vertices is valid if every orbit is contained entirely within one set in the partition. That is, each
Even if there is no canonical choice of sign for this vector, it leaves at most two choices for the class of the partition is a union of orbits. Our algorithm will proceed by constructing a valid
isomorphism. partition of the vertices and then splitting classes in the partition until each is exactly an orbit.
Recall that a set is stabilized by a group if the set is unchanged when the group acts on all of its
members. We will say that a group G ⊆ S stabilizes a set of vertices C if it stabilizes the set of
39.5 All the Automorphisms
vectors {v a }a∈C . Thus, A is the group that stabilizes V .

The graph isomorphism problem is complicated by the fact that there can be many isomorphisms An orbit is stabilized by A, and so are unions of orbits and thus classes of valid partitions. We
from one graph to another. So, any algorithm for finding isomorphisms must be able to find many would like to construct the subgroup of S that stabilizes each orbit Cj . However, I do not yet see
of them. how to do that directly. Instead, we will construct a particular valid partition of the vertices, and
find for each class in the partition Cj the subgroup of Aj ⊆ S that stabilizes Cj , where here we
Recall that an automorphism of a graph is an isomorphism from the graph to itself. These form a are considering the actions of matrices S ∈ S on vectors v a . In fact, Aj will act transitively2 on
group which we denote aut(G): if Π and Γ are automorphisms of A then so is ΠΓ. Let A ⊆ S the class Cj . As A stabilizes every orbit, and thus every union of orbits, it is a subgroup of Aj . In
denote the corresponding set of diagonal ±1 matrices. The set A is in fact a group and is fact, A is exactly the intersection of all the groups Aj .
isomorphic to aut(G).
We now observe that we can use linear algebra to efficiently construct A from the groups Aj by
Here is a way to make this isomorphism very concrete: Lemma 39.3.1 implies that the exploiting the isomorphism between S and (Z/2)n . Each subgroup Aj is isomorphic to a
Π ∈ aut(G) and the S ∈ A are related by subgroup of (Z/2)n . Each subgroup of (Z/2)n is precisely a vector space modulo 2, and thus may
Π = ΨS ΨT and S = ΨT ΠΨ. be described by a basis. It will eventually become clear that by “compute Aj ” we mean to
compute such a basis. From the basis, we may compute a basis of the nullspace. The subgroup of
As diagonal matrices commute, we have that for every Π1 and Π2 in aut(G) and for (Z/2)n corresponding to A is then the nullspace of the span of the nullspaces of the subspaces
S 1 = ΨT Π1 Ψ and S 2 = ΨT Π2 Ψ, corresponding to the Aj . We can compute all these using Gaussian elimination.
Π1 Π2 = ΨS 1 ΨT ΨS 2 ΨT = ΨS 1 S 2 ΨT = ΨS 2 S 1 ΨT = ΨS 2 ΨT ΨS 1 ΨT = Π2 Π1 .
Thus, the automorphism group of a graph with distinct eigenvalues is commutative, and it is 39.7 The first partition
isomorphic to a subgroup of S.
It might be easier to think about these subgroups by realizing that they are isomorphic to We may begin by dividing vertices according to the absolute values of their entries in
subspaces of (Z/2Z)n . Let f : S → (Z/2Z)n be the function that maps the group of diagonal eigenvectors. That is, if |ψ i (a)| =
̸ |ψ i (b)| for some i, then we may place vertices a and b in
matrices with ±1 entries to vectors t modulo 2 by setting t(i) so that S (i, i) = (−1)t(i) . You different classes, as there can be no S ∈ S for which v a S = v b . The partition that we obtain this
should check that this is a group homomorphism: f (S 1 S 2 ) = f (S 1 ) + f (S 2 ). You should also way is thus valid, and is the starting point of our algorithm.
confirm that f is invertible.
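For concreteness, here is a minimal Gaussian-elimination routine over Z/2 of the kind these computations need (my own helper, operating on dense 0/1 integer arrays); it returns a basis of the nullspace of a matrix modulo 2. Solving for product relations among eigenvectors, as in Section 39.10, uses the same elimination.

    import numpy as np

    def nullspace_mod2(M):
        """Basis (as a list of 0/1 vectors) of the nullspace of M over Z/2."""
        M = M.copy() % 2
        rows, cols = M.shape
        pivot_cols = []
        r = 0
        for c in range(cols):
            pivot = next((i for i in range(r, rows) if M[i, c]), None)
            if pivot is None:
                continue
            M[[r, pivot]] = M[[pivot, r]]
            for i in range(rows):
                if i != r and M[i, c]:
                    M[i] ^= M[r]                 # row operations are XORs mod 2
            pivot_cols.append(c)
            r += 1
        basis = []
        for f in (c for c in range(cols) if c not in pivot_cols):
            v = np.zeros(cols, dtype=int)
            v[f] = 1
            for i, c in enumerate(pivot_cols):
                v[c] = M[i, f]                   # read off the pivot coordinates
            basis.append(v)
        return basis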
For today’s lecture, we will focus on the problem of finding the group of automorphisms of a
graph with distinct eigenvalues. We will probably save the slight extension to finding
39.8 Unbalanced vectors
isomorphisms for homework. Note that we will not try to list all the isomorphisms, as there could
be many. Rather, we will give a basis of the corresponding subspace of (Z/2Z)n . We say that an eigenvector ψ i is unbalanced if there is some value x for which
|{a : ψ i (a) = x}| =
̸ |{a : ψ i (a) = −x}| .

39.6 Equivalence Classes of Vertices Such vectors cannot change sign in an automorphism. That is, S (i, i) must equal 1. The reason is
that an automorphism with S (i, i) = −1 must induce a bijection between the two sets above, but
Recall that the orbit of an element under the action of a group is the set of elements to which it is this is impossible if their sizes are different.
2
mapped by the elements of the group. Concretely, the orbit of a vertex a in the graph is the set of That is, for every a and b in Cj , there is an S ∈ Aj for which v a S = v b .
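Here is a rough sketch (mine) of the balance test that the later sections use to split classes; it works with values rounded to fixed precision, which a careful implementation would have to treat more cautiously. For a single eigenvector take R = [i]; the same test applies to the entrywise products ψR introduced on the next pages.

    import numpy as np
    from collections import Counter

    def split_if_unbalanced(Psi, R, C, decimals=8):
        """If the entrywise product of the eigenvectors indexed by R is unbalanced on
        the class C, return the two pieces {a : value = x} and {a : value = -x};
        otherwise return None.  Psi holds the eigenvectors as columns."""
        vals = np.round(np.prod(Psi[np.ix_(C, R)], axis=1), decimals)
        counts = Counter(vals)
        for x, count_plus in counts.items():
            if x > 0 and count_plus != counts.get(-x, 0):
                plus = [a for a, v in zip(C, vals) if v == x]
                minus = [a for a, v in zip(C, vals) if v == -x]
                if plus and minus:   # only a useful split if both sets are nonempty
                    return plus, minus
        return None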
CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES309 CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES310

Thus, an unbalanced vector tells us that all vertices for which ψ i (a) = x are in different orbits rows are indexed by vertices in a ∈ Cj , whose columns are indexed by subsets R ⊆ Q, and whose
from those for which ψ i (a) = −x. This lets us refine classes. entries are given by
We now extend this idea in two ways. First, we say that ψ i is unbalanced on a class C if there is 

1 if x > 0
some value x for which
MCj ,Q (a, R) = sgn(ψ R (a)), where I recall sgn(x) = −1 if x < 0, and


|{a ∈ C : ψ i (a) = x}| =
̸ |{a ∈ C : ψ i (a) = −x}| . 0 if x = 0.

By the same reasoning, we can infer that the sign of S (i, i) must be fixed to 1. Assuming, as will Lemma 39.9.1. If Q is independent on C then the columns of MC,Q are orthogonal.
be the case, that C is a class in a valid partition and thus a union of orbits, we are now able to
split C into two smaller classes Proof. Let R1 and R2 index two columns of MC,Q . That is, R1 and R2 are two different subsets of
Q. Let R0 be their symmetric difference. We have
C0 = {a ∈ C : ψ i (a) = x} and C1 = {a ∈ C : ψ i (a) = −x} .

The partition we obtain by splitting C into C1 and C2 is thus also valid. Of course, it is only MC,Q (a, R1 )MC,Q (a, R2 ) = sgn(ψ R1 (a))sgn(ψ R2 (a)) =
Y Y Y
useful if both sets are non-empty. sgn(ψ i (a)) sgn(ψ i (a)) = sgn(ψ i (a)) = sgn(ψ R0 (a)) = MC,Q (a, R0 ).
i∈R1 i∈R2 i∈R0
Finally, we consider vectors formed from products of eigenvectors. For R ⊆ {1, . . . , n}, define ψ R
to be the component-wise product of the ψ i for i ∈ R: As all the nonempty products of subsets of eigenvectors in Q are balanced on C, MC,Q (a, R0 ) is
Y positive for half the a ∈ C and negative for the other half. So,
ψ R (a) = ψ i (a).
i∈R
X X
MC,Q (:, R1 )T MC,Q (:, R2 ) = MC,Q (a, R1 )MC,Q (a, R2 ) = MC,Q (a, R0 ) = 0.
We say that the vector ψ R is unbalanced on class C if there is some value x for which a∈C a∈C

|{a ∈ C : ψ R (a) = x}| =


̸ |{a ∈ C : ψ R (a) = −x}| .
Lemma 39.9.2. If C is a balanced class of vertices and Q is a maximal set of eigenvectors that
An unbalanced vector of this form again tells us that the vertices in the two sets belong to
are independent on C, then for every a and b in C there is an i ∈ Q for which ψ i (a) ̸= ψ i (b).
different orbits. So, if both sets are nonempty we can use such a vector to split the class C in two
to obtain a more refined valid partition. It also provides some relations between the entries of S ,
but we will not exploit those. Proof. Assume by way of contradiction that this does not hold. There must be some eigenvector i
for which ψ i (a) ̸= ψ i (b). We will show that if we added i to Q, the product of every subset would
We say that a vector is balanced if it is not unbalanced. still be balanced. As we already know this for subsets of Q, we just have to prove it for subsets of
We say that a subset of the vertices C ⊆ V is balanced if every non-constant product of the form R ∪ {i}, where R ⊆ Q. As ψ h (a) = ψ h (b) for every h ∈ Q, ψ R (a) = ψ R (b). This implies
eigenvectors is balanced on C. Thus, orbits are balanced. Our algorithm will partition the ψ R∪{i} (a) ̸= ψ R∪{i} (b). Thus, ψ R∪{i} is not uniform on C, and so it must be balanced on C.
vertices into balanced classes. Lemma 39.9.3. If C is a balanced class of vertices and Q is a maximal set of eigenvectors that
My confusion over this lecture stemmed from thinking that all balanced classes must be orbits. are independent on C, then the rows of MC,Q are orthogonal.
But, I don’t know if this is true.
Proof. Let a and b be in C. From Lemma 39.9.2 we know that there is an i ∈ Q for which
Question: Is every balanced class an orbit of A?
ψ i (a) = −ψ i (b). To prove that the rows MC,Q (a, :) and MC,Q (b, :) are orthogonal, we compute

39.9 The structure of the balanced classes

Let Cj be a balanced class. By definition, the product of every subset of eigenvectors is either
constant or balanced on Cj . We say that a subset of eigenvectors Q is independent on Cj if all
products of subsets of eigenvectors in Q are balanced on Cj (except for the empty product). In
particular, none of these eigenvectors is zero or constant on Cj . Construct a matrix MCj ,Q whose
CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES311 CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES312

their inner product: Apply f to map this subgroup of S to (Z/2)n , and let B be a n-by-log2 (|C|) matrix containing a
X X basis of the subspace in its columns. Any independent subset of log2 (|C|) rows of B will form a
sgn(ψ R (a)ψ R (b)) = sgn(ψ R (a)ψ R (b)) + sgn(ψ R∪{i} (a)ψ R∪{i} (b)) basis of the row-space, and is isomorphic to a base for C of the eigenvectors.
R⊆Q R⊆Q−{i}
X
= sgn(ψ R (a)ψ R (b)) + sgn(ψ R (a)ψ i (a)ψ R (b)ψ i (b))
R⊆Q−{i} 39.10 Algorithms
X
= sgn(ψ R (a)ψ R (b)) + sgn(ψ R (a)ψ R (b))sgn(ψ i (a)ψ i (b))
R⊆Q−{i} Let Cj be a balanced class. We just saw how to compute Aj , assuming that we know Cj and a
X base Q for it. Of course, by “compute” we mean computing a basis of f (Aj ). We now show how
= sgn(ψ R (a)ψ R (b)) − sgn(ψ R (a)ψ R (b)) to find a base for a balanced class Cj . We do this by building up a set Q of eigenvectors that are
R⊆Q−{i}
independent on Cj . To do this, we go through the eigenvectors in order. For each eigenvector ψ i ,
= 0. we must determine whether or not its values on Cj can be expressed as a product of eigenvectors
already present in Q. If it can be, then we record this product as part of the structure of Aj . If
not, we add i to Q.
Corollary 39.9.4. Let C be a balanced subset of vertices. Then the size of C is a power of 2. If The eigenvector ψ i is a product of eigenvectors in Q on Cj if and only if there is a constant γ and
Q is an independent set of eigenvectors on C, then |Q| ≤ log2 |C|. yh ∈ {0, 1} for h ∈ Q such that Y
ψ i (a) = γ (ψ h (a))yh ,
Proof. Let C be an orbit and let Q be a maximal set of eigenvectors that are independent on C. h∈Q
As the rows and columns of MC,Q are both orthogonal, MC,Q must be square. This implies that for all vertices a ∈ Cj . This happens if and only if
|C| = 2|Q| . If we drop the assumption that Q is maximal, we still know that all the columns of
Y
MC,Q are orthogonal. This matrix has 2|Q| columns. As they are vectors in |C| dimensions, there sgn(ψ i (a)) = sgn(ψ h (a))yh .
can be at most |C| of them. h∈Q

We can now describe the structure of a balanced subset of vertices C. We call a maximal set of We can tell whether or not these equations have a solution using linear algebra modulo 2. Let B
eigenvectors that are independent on C a base for C. Every other eigenvector j is either constant be the matrix over Z/2 such that
on C or becomes constant when multiplied by the product of some subset R of eigenvectors in Q. ψ i (a) = (−1)B(i,a) .
In either case, we can write Then, the above equations become
Y X
ψ j (a) = γ ψ i (a) for all a ∈ C, (39.1) B(i, a) = yh B(h, a) for all a ∈ Cj .
i∈R h∈Q

for some constant γ. Thus, we can solve for the coefficients yh in polynomial time, if they exist. If they do not, we add
Let v a (Q) denote the vector (v a (i))i∈Q —the restriction of the vector v a to the coordinates in Q. i to Q.
I claim that every one of the 2|Q| ± sign patterns of length |Q| must appear in exactly one of the Once we have determined a base Q and how to express on Cj the values of every other
vectors v q (Q). The reason is that there are |C| = 2|Q| of these vectors, and we established in eigenvector as a product of eigenvectors in Q, we have determine Aj .
Lemma 39.9.2 that v a (Q) ̸= v b (Q) for all a ̸= b in Q. Thus, for every diagonal ± matrix S Q of
dimension |Q|, we have It remains to explain how we partition the vertices into balanced classes. Consider applying the
{v a (Q)S Q : a ∈ C} = {v a (Q) : a ∈ C} . above procedure to a class Cj that is not balanced. We will discover that Cj is not balanced by
finding a product of eigenvectors that is neither constant nor balanced on Cj . Every time we add
That is, this set of vectors is stabilized by ±1 diagonal matrices.
an eigenvector ψ i to Q, we will examine every product of vectors in Q to check if any are
As equation (39.1) gives a formula for the value taken on C by every eigenvector not in Q in unbalanced on Cj . We can do this efficiently, because there are at most 2|Q| ≤ |Cj | such products
terms of the eigenvectors in Q, we have described the structure of the subgroup of S that to consider. As we have added ψ i to Q, none of the products of vectors in Q can be constant on
stabilizes C: the diagonals corresponding to Q are unconstrained, and every other diagonal is Cj . If we find a product that it not balanced on Cj , then it must also be non-constant, and thus
some product of these. This structure is something that you are used to seeing in subspaces. provide a way of splitting class Cj into two.
CHAPTER 39. TESTING ISOMORPHISM OF GRAPHS WITH DISTINCT EIGENVALUES313

We can now summarize the entire algorithm. We first compute the partition by absolute values of
entries described in section 39.7. We then go through the classes of the partition one-by-one. For
each, we use the above procedure until we have either split it in two or we have determined that it
is balanced and we have computed its automorphism group. If we do split the class in two, we
refine the partition and start over. As the total number of times we split classes is at most n, this
algorithm runs in polynomial time.
After we have computed a partition into balanced classes and have computed their
Chapter 40
automorphisms groups, we combine them to find the automorphisms group of the entire graph as
described at the end of section 39.6.

Testing Isomorphism of Strongly


Regular Graphs

This Chapter Needs Editing

40.1 Introduction

In the last lecture we saw how to test isomorphism of graphs in which every eigenvalue is distinct.
So, in this lecture we will consider the opposite case: graphs that only have 3 distinct eigenvalues.
These are the strongly regular graphs.
Our algorithm for testing isomorphism of these will not run in polynomial time. Rather, it takes
time n^{O(n^{1/2} log n)} . This is at least much faster than the naive algorithm of checking all n! possible
permutations. In fact, this was the best known running time for general algorithms for graph
isomorphism until three years ago.

40.2 Definitions

A graph G is strongly regular if

1. it is d-regular, for some integer d;


2. there exists an integer α such that for every pair of vertices x and y that are neighbors in G,
there are exactly α vertices z that are neighbors of both x and y;
3. there exists an integer β such that for every pair of vertices x and y that are not neighbors
in G, there are exactly β vertices z that are neighbors of both x and y.

These conditions are very strong, and it might not be obvious that there are any non-trivial
graphs that satisfy these conditions. Of course, the complete graph and disjoint unions of

CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 315 CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 316

complete graphs satisfy these conditions. Before proceeding, I warn you that there is a standard 1. they are in the same row,
notation in the literature about strongly regular graphs, and I am trying not to use it. In this
literature, d becomes k, α becomes λ and β becomes µ. Many other letters are bound as well. 2. they are in the same column, or

For the rest of this lecture, we will only consider strongly regular graphs that are connected and 3. they hold the same number.
that are not the complete graph. I will now give you some examples.
So, such a graph has degree d = 3(n − 1). Any two nodes in the same row will both be neighbors
with every other pair of nodes in their row. They will have two more common neighors: the nodes
40.3 Paley Graphs and The Pentagon in their columns holding the other’s number. So, they have n common neighbors. The same
obviously holds for columns, and is easy to see for nodes that have the same number. So, every
pair of nodes that are neighbors have exactly α = n common neighbors.
The Paley graphs we encountered are strongly regular. The simplest of these is the pentagon. It
has parameters On the other hand, consider two vertices that are not neighbors, say (1, 1) and (2, 2). They lie in
n = 5, d = 2, α = 0, β = 1. different rows, lie in different columns, and we are assuming that they hold different numbers.
The vertex (1, 1) has two common neighbors of (2, 2) in its row: the vertex (1, 2) and the vertex
holding the same number as (2, 2). Similarly, it has two common neighbors of (2, 2) in its column.
40.4 Lattice Graphs Finally, we can find two more common neighbors of (2, 2) that are in different rows and columns
by looking at the nodes that hold the same number as (1, 1), but which are in the same row or
For a positive integer n, the lattice graph Ln is the graph with vertex set {1, . . . n}2 in which column as (2, 2). So, β = 6.
vertex (a, b) is connected to vertex (c, d) if a = c or b = d. Thus, the vertices may be arranged at
the points in an n-by-n grid, with vertices being connected if they lie in the same row or column.
Alternatively, you can understand this graph as the product of two complete graphs on n vertices. 40.6 The Eigenvalues of Strongly Regular Graphs
The parameters of this graph are:
We will consider the adjacency matrices of strongly regular graphs. Let A be the adjacency
d = 2(n − 1), α = n − 2, β = 2. matrix of a strongly regular graph with parameters (d, α, β). We already know that A has an
eigenvalue of d with multiplicity 1. We will now show that A has just two other eigenvalues.
To prove this, first observe that the (a, b) entry of A2 is the number of common neighbors of
40.5 Latin Square Graphs vertices a and b. For a = b, this is just the degree of vertex a. We will use this fact to write A2 as
a linear combination of A, I and J, the all 1s matrix. To this end, observe that the adjacency
A Latin square is an n-by-n grid, each entry of which is a number between 1 and n, such that no matrix of the complement of A (the graph with non-edges where A has edges) is J − I − A. So,
number appears twice in any row or column. For example,
A2 = αA + β(J − I − A) + dI = (α − β)A + βJ + (d − β)I.
     
1 2 3 4 1 2 3 4 1 2 3 4
4 1 2 3 2 1 4 3  3 For every vector v orthogonal to 1,
 ,   , and 2 4 1 
3 4 1 2 3 4 1 2 3 1 4 2
2 3 4 1 4 3 2 1 4 3 2 1 A2 v = (α − β)Av + (d − β)v .

are Latin squares. Let me remark that the number of different Latin squares of size n grows very So, every eigenvalue λ of A other than d satisfies
quickly—at least as fast as n!(n − 1)!(n − 2)! . . . 2!. Two Latin squares are said to be isomorphic if
λ2 = (α − β)λ + d − β.
there is a renumbering of their rows, columns, and entries, or a permutation of these, that makes
them the same. As this provides 6(n!)3 isomorphisms, and this is much less than the number of Thus, these are given by p
Latin squares, there must be many non-isomorphic Latin squares of the same size. The two of the α−β± (α − β)2 + 4(d − β)
Latin squares above are isomorphic, but one is not. λ= .
2
From such a Latin square, we construct a Latin square graph. It will have n2 nodes, one for each These eigenvalues are traditionally denoted r and s, with r > s. By convention, the multiplicity of
cell in the square. Two nodes are joined by an edge if the eigenvalue r is always denoted f , and the multiplicity of s is always denoted g.
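A quick check of these formulas (my own snippet, not from the text): compute r and s from (d, α, β) and compare with the spectrum of the pentagon, which has d = 2, α = 0, β = 1.

    import numpy as np

    def srg_eigenvalues(d, alpha, beta):
        """The two eigenvalues r > s of a strongly regular graph other than d."""
        disc = np.sqrt((alpha - beta) ** 2 + 4 * (d - beta))
        return ((alpha - beta) + disc) / 2, ((alpha - beta) - disc) / 2

    r, s = srg_eigenvalues(2, 0, 1)        # the pentagon
    A = np.zeros((5, 5))
    for i in range(5):
        A[i, (i + 1) % 5] = A[(i + 1) % 5, i] = 1
    print(r, s)                            # (sqrt(5)-1)/2 and -(sqrt(5)+1)/2
    print(np.sort(np.linalg.eigvalsh(A)))  # s twice, r twice, and d = 2 once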
CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 317 CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 318

For example, for the pentagon we have Now, every vertex has its own unique label. If we were given another copy of this graph, we could
√ √ use these labels to determine the isomorphism between them. This procedure is called refinement,
5−1 5+1
r= , s=− . and it can be carried out until it stops producing new labels. However, it is clear that this
2 2
procedure will fail to produce unique labels if the graph has automorphisms, or if it is a regular
For the lattice graph Ln , we have graph. In these cases, we need a way to break symmetry.
r = n − 2, s = −2.
The procedure called individualization breaks symmetry arbitrarily. It chooses some nodes in the
For the Latin square graphs of order n, we have
graph, arbitrarily, to give their own unique names. Ideally, we pick one vertex to give a unique
r = n − 3, s = −3. name, and then refine the resulting labeling. We could then pick another troubling vertex, and
continue. We call a set of vertices S ⊂ V a distinguishing set if individualizing this set of nodes
One can prove that every connected regular graph whose adjacency (or Laplacian) matrix has just results in a unique name for every vertex, after refinement. How would we use a distinguishing set
three distinct eigenvalues is a strongly regular graph. to test isomorphism? Assume that S is a distinguishing set for G = (V, E). To test if H = (W, F )
is isomorphic to G, we could enumerate over every possible set of |S| vertices of W , and check if
they are a distinguishing set for H. If G and H are isomorphic, then H will also have an
40.7 Testing Isomorphism by Individualization and Refinement isomorphic distinguishing  set that we can use to find an isomorphism between G and H. We
n
would have to check |S| sets, and try |S|! labelings for each, so we had better hope that S is
The problem of testing isomorphism of graphs is often reduced to the problem of giving each small.
vertex in a graph a unique name. If we have a way of doing this that does not depend upon the
initial ordering of the vertices, then we can use it to test graph isomorphism: find the unique
names of vertices in both graphs, and then see if it provides an isomorphism. For example, 40.8 Distinguishing Sets for Strongly Regular Graphs
consider the graph below.
We will now prove a result of Babai [Bab80] which says that every strongly regular graph has a

distinguishing set of size O(√n log n). Babai’s result won’t require any refinement beyond naming
every vertex by the set of individualized nodes that are its neighbors. So, we will prove that a set
of nodes S is a distinguishing set by proving that for every pair of distinct vertices a and b, either
there is an s ∈ S that is a neighbor of a but not of b, or the other way around. This will suffice to
distinguish a and b. As our algorithm will work in a brute-force fashion, enumerating over all sets
of a given size, we merely need to show that such a set S exists. We will do so by proving that a
random set of vertices probably works.
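Since the algorithm is brute force, the only routine it really needs is the check that a candidate set S distinguishes every pair of vertices. A small sketch (mine; adjacency is given as a dict of neighbor sets):

    import random
    from itertools import combinations

    def distinguishes(adj, S):
        """True if for every pair of distinct vertices some s in S is adjacent to
        exactly one of them, so naming vertices by their neighbors in S is injective."""
        for a, b in combinations(adj, 2):
            if not any((s in adj[a]) != (s in adj[b]) for s in S):
                return False
        return True

    def random_distinguishing_set(adj, size, tries=200):
        """Try random sets of the given size; the argument below shows a random set
        of size about (3*sqrt(n) + 2) * log(n**2) works with good probability."""
        verts = list(adj)
        for _ in range(tries):
            S = random.sample(verts, size)
            if distinguishes(adj, S):
                return S
        return None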
We could begin by labeling every vertex by its degree.
I first observe that it suffices to consider strongly-regular graphs with d < n/2, as the complement
of a strongly regular graph is also a strongly regular graph (that would have been too easy to
assign as a homework problem). We should also observe that every strongly-regular graph has
diameter 2, and so d ≥ √(n − 1).

[Figure: the example graph with each vertex labeled by its degree.]

Lemma 40.8.1. Let G = (V, E) be a connected strongly regular graph with n vertices and degree
d < n/2. Then for every pair of vertices a and b, there are at least d/3 vertices that are neighbors
of a but not b.
The degrees distinguish between many nodes, but not all of them. We may refine this labeling by
appending the labels of every neighbor of a node. Before I prove this, let
me show how we may use it to prove the theorem. This lemma tells us
that there are at least n − 1/3 nodes that are neighbors of a but not of b. Let T be the set of
1, {3}
2, {1,3} 3, {2, 2, 3} nodes that are neighbors of a but not neighbors of b. So, if we choose a vertex at random, the
3, {1, 2, 3} probability that it is in T is at least

|T | n−1 1
2, {3, 3} ≥ ≥ √ .
1, {2} n 3n 3 n+2
CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 319 CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 320


If we choose a set S of (3√n + 2) ln n^2 vertices at random, the probability that none of them is in T is
(1 − 1/(3√n + 2))^{(3√n + 2) ln n^2} ≤ 1/n^2 .
So,
d − β ≤ 2(d − α − 1) =⇒ 2(α + 1) ≤ d + β.    (40.2)
This tells us that if α is close to d, then β is also.
 We require one more relation between α and β. We obtain this relation by picking any vertex a,
So, the probability that a random set of this many nodes fails to distinguish all n2 pairs is at
and counting the pairs b, z such that b ∼ z, a ∼ b and a ̸∼ z.
most 1/2.

Proof of Lemma 40.8.1. Write a ∼ b if a is a neighbor of b, and a ̸∼ b otherwise. If a ∼ b, then the


number of nodes that are neighbors of a but not of b is d − 1 − α, and if a ̸∼ b the number if
d − β. So, we need to prove that neither α nor β is too close to d.
We will do this by establishing some elementary relations between these parameters. First,
consider the case in which a ∼ b. Let z be any vertex such that a ̸∼ z and b ̸∼ z. We will use z to
prove an upper bound on the number of vertices w that are neighbors of a but not of b (I know
this looks like the wrong direction, but be patient). Let Every node b that is a neighbor of a has α neighbors in common with a, and so has d − α − 1
neighbors that are not neighbors of a. This gives
Z0 = {w : w ∼ a, w ̸∼ z} , and Z1 = {w : w ̸∼ b, w ∼ z} .
|{(b, z) : b ∼ z, a ∼ b, a ̸∼ z}| = d(d − α − 1).

On the other hand, there are n − d − 1 nodes z that are not neighbors of a, and each of them has
β neighbors in common with a, giving

|{(b, z) : b ∼ z, a ∼ b, a ̸∼ z}| = (n − d − 1)β.

Combining, we find
(n − d − 1)β = d(d − α − 1). (40.3)
Clearly, every w that is a neighbor of a but not of b lies in either Z0 or Z1 . As z is neither a
neighbor of a nor of b, As d < n/2, this equation tells us
|Z0 | = |Z1 | = d − β.
d(d − α − 1) ≥ dβ =⇒ d − α − 1 ≥ β. (40.4)
So,
d − α − 1 ≤ 2(d − β) =⇒ 2β ≤ d + α + 1. (40.1) Adding inequality 40.1 to (40.4) gives

So, if β is close to d, α must also be close to d. 2


2d ≥ 3β =⇒ β ≤ d,
3
We will obtain an inequality in the other direction when a ̸∼ b by exploiting a z such that z ∼ a
and z ∼ b. Now, for any w ∼ a but w ̸∼ b, we have either while adding inequality 40.2 to (40.4) gives

(w ∼ a and w ̸∼ z) or (w ∼ z and w ̸∼ b). 2


α + 1 ≤ d.
3

Thus, for every a ̸= b the number of vertices that are neighbors of a but not of b is at least
min(d − α − 1, d − β) ≥ d/3.
CHAPTER 40. TESTING ISOMORPHISM OF STRONGLY REGULAR GRAPHS 321

40.9 Notes

You should wonder if we can make this faster by analyzing refinement steps. In, [Spi96b], I
1/3
improved the running time bound to 2O(n log n) by analyzing two refinement phases. The
algorithm required us to handle certain special families of strongly regular graphs separately:
Latin square graphs and Steiner graphs. Algorithms for testing isomorphism of strongly regular
graphs were recently improved by Babai, Chen, Sun, Teng, and Wilmes [BCS+ 13, BW13, SW15].
The running times of all these algorithms are subsumed by that in Babai’s breakthrough
algorithm for testing graph isomorphism [Bab16].

Part VII

Interlacing Families

CHAPTER 41. EXPECTED CHARACTERISTIC POLYNOMIALS 324

is then
∑_{i=1}^{d} Πi M Πi^T .

In today’s lecture, we will consider the expected characteristic polynomial of such a graph. For a
matrix M , we let
Chapter 41 def
χx (M ) = det(xI − M )
denote the characteristic polynomial of M in the variable x.
For simplicity, we will consider the expected polynomial of the sum of just two graphs. For
Expected Characteristic Polynomials generality, we will let them be any graphs, or any symmetric matrices.
Our goal for today is to prove that these expected polynomials are real rooted.

Theorem 41.2.1. Let A and B be symmetric n-by-n matrices and let Π be a uniform random
This Chapter Needs Editing permutation. Then,  
T
EΠ [ χx (A + ΠBΠ^T ) ]
has only real roots.
41.1 Overview
So that you will be surprised by this, I remind you that the sum of real rooted polynomials might
Over the next few lectures, we will see two different proofs that infinite families of bipartite have no real roots. For example, both (x − 2)2 and (x + 2)2 have only real roots, but their sum,
Ramanujan graphs exist. Both proofs will use the theory of interlacing polynomials, and will 2x2 + 8, has no real roots.
consider the expected characteristic polynomials of random matrices. In today’s lecture, we will Theorem 41.2.1 also holds for sums of many matrices. But, for simplicity, we restrict ourselves to
see a proof that some of these polynomials are real rooted. considering the sum of two.
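Before the proof, a quick numerical illustration (my own, not from the text): for small n one can average χx(A + ΠBΠT) over all n! permutation matrices and check that the roots of the average are real, even though individual summands need not share roots.

    import numpy as np
    from itertools import permutations

    rng = np.random.default_rng(3)
    n = 4
    A = rng.standard_normal((n, n))
    A = (A + A.T) / 2
    B = rng.standard_normal((n, n))
    B = (B + B.T) / 2

    coeffs = np.zeros(n + 1)
    perms = list(permutations(range(n)))
    for p in perms:
        Pi = np.eye(n)[list(p)]                                   # permutation matrix
        coeffs += np.poly(np.linalg.eigvalsh(A + Pi @ B @ Pi.T))  # char. poly. coefficients
    coeffs /= len(perms)

    print(np.max(np.abs(np.roots(coeffs).imag)))                  # numerically zero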
At present, we do not know how to use these techniques to prove the existence of infinite families
of non-bipartite Ramanujan graphs.
41.3 Interlacing
The material in today’s lecture comes from [MSS15d], but the proof is inspired by the treatment
of that work in [HPS15].
Our first tool for establishing real rootedness of polynomials is interlacing.
If p(x) is a real rooted polynomial of degree n and q(x) is a real rooted polynomial of degree
41.2 Random sums of graphs n − 1, then we say that p and q interlace if p has roots λ1 ≥ λ2 ≥ · · · ≥ λn and q has roots
µ1 ≥ µ2 ≥ · · · ≥ µn−1 that satisfy
We will build Ramanujan graphs on n vertices of degree d, for every d and even n. We begin by
λ1 ≥ µ1 ≥ λ2 ≥ µ2 · · · ≥ λn−1 ≥ µn−1 ≥ λn .
considering a random graph on n vertices of degree d. When n is even, the most natural way to
generate such a graph is to choose d perfect matchings uniformly at random, and to then take
their sum. I should mention one caveat: some edge could appear in many of the matchings. In We have seen two important examples of interlacing in this class so far. A real rooted polynomial
this case, we add the weights of the corresponding edges together. So, the weight of an edge is the and its derivative interlace. Similarly, the characteristic polynomial of a symmetric matrix and
number of matchings in which it appears. the characteristic polynomial of a principal submatrix interlace.

Let M be the adjacency matrix of some perfect matching on n vertices. We can generate the When p and q have the same degree, we also say that they interlace if their roots alternate. But,
adjacency matrix of a random perfect matching by choosing a permutation matrix Π uniformly at now there are two ways in which their roots can do so, depending on which polynomial has the
random, and then forming ΠM ΠT . The sum of d independent uniform random perfect matchings
p(x) = ∏_{i=1}^{n} (x − λi ) and q(x) = ∏_{i=1}^{n} (x − µi ),

CHAPTER 41. EXPECTED CHARACTERISTIC POLYNOMIALS 325 CHAPTER 41. EXPECTED CHARACTERISTIC POLYNOMIALS 326

we write q → p if p and q interlace and for every i the ith root of p is at least as large as the ith Lemma 41.3.3. Let A be an n-dimensional symmetric matrix and let v be a vector. Let
root of q. That is, if
λ 1 ≥ µ1 ≥ λ 2 ≥ µ 2 ≥ · · · ≥ λ n ≥ µ n . pt (x) = χx (A + tv v T ).

Lemma 41.3.1. Let p and q be polynomials of degree n and n − 1 that interlace and have positive Then there is a degree n − 1 polynomial q(x) so that
leading coefficients. For every t > 0, define pt (x) = p(x) − tq(x). Then, pt (x) is real rooted and
pt (x) = χx (A) − tq(x).
p(x) → pt (x).
Proof. Consider the case in which v = δ 1 . It suffices to consider this case as determinants, and
Proof Sketch. For simplicity, I consider the case in which all of the roots of p and q are distinct. thus characteristic polynomials, are unchanged by multiplication by rotation matrices.
One can prove the general case by dividing out the common repeated roots.
Then, we know that
To see that the largest root of pt is larger than λ1 , note that q(x) is positive for all x > µ1 , and χx (A + tδ 1 δ T1 ) = det(xI − A − tδ 1 δ T1 ).
λ1 > µ1 . So, pt (λ1 ) = p(λ1 ) − tq(λ1 ) < 0. As pt is monic, it is eventually positive and it must have
a root larger than λ1 . Now, the matrix tδ 1 δ T1 is zeros everywhere except for the element t in the upper left entry. So,

We will now show that for every i ≥ 1, pt has a root between λi+1 and λi . As this gives us d − 1 det(xI − A − tδ 1 δ T1 ) = det(xI − A) − t det(xI (1) − A(1) ) = χx (A) − tχx (A(1) ),
more roots, it accounts for all d roots of pt . For i odd, we know that q(λi ) > 0 and q(λi+1 ) < 0.
As p is zero at both of these points, pt (λi ) > 0 and pt (λi+1 ) < 0, which means that pt has a root where A(1) is the submatrix of A obtained by removing its first row and column.
between λi and λi+1 . The case of even i is similar.
We know that χx (A + tv v T ) is real rooted for all t, and we can easily show using the Courant
The converse of this theorem is also true. Fischer Theorem that for t > 0 it interlaces χx (A) from above. Lemmas 44.7.2 and 44.7.1 tell us
that these facts imply each other.
Lemma 41.3.2. Let p and q be polynomials of degree n and n − 1, and let pt (x) = p(x) − tq(x).
If pt is real rooted for all t ∈ IR, then p and q interlace. We need one other fact about interlacing polynomials.

Lemma 41.3.4. Let p0 (x) and p1 (x) be two degree n monic polynomials for which there is a third
Proof Sketch. Recall that the roots of a polynomial are continuous functions of its coefficients, and
polynomial r(x) that has the same degree as p0 and p1 and so that
thus the roots of pt are continuous functions of t. We will use this fact to obtain a contradiction.
For simplicity,1 I again just consider the case in which all of the roots of p and q are distinct. p0 (x) → r(x) and p1 (x) → r(x).

If p and q do not interlace, then p must have two roots that do not have a root of q between Then for all 0 ≤ s ≤ 1,
them. Let these roots of p be λi+1 and λi . Assume, without loss of generality, that both p and q def
ps (x) = sp1 (x) + (1 − s)p0 (x)
are positive between these roots. We now consider the behavior of pt for positive t.
is a real rooted polynomial.
As we have assumed that the roots of p and q are distinct, q is positive at these roots, and so pt is
negative at λi+1 and λi . If t is very small, then pt will be close to p in value, and so there must be Sketch. Assume for simplicity that all the roots of r are distinct. Let µ1 > µ2 > · · · > µn be the
some small t0 for which pt0 (x) > 0 for some λi+1 < x < λi . This means that pt0 must have two roots of r. Our assumptions imply that both p0 and p1 are positive at µi for odd i and negative
roots between λi+1 and λi . for even i. So, the same is true of their sum ps . This tells us that ps must have at least n − 1 real
As q is positive on the entire closed interval [λi+1 , λi ], when t is large pt will be negative on this roots.
entire interval, and thus have no roots inside. As we vary t between t0 and infinity, the two roots We can also show that ps has a root that is less than µn . One way to do it is to recall that the
at t0 must vary continuously and cannot cross λi+1 or λi . This means that they must become complex roots of a polynomial with real coefficients come in conjugate pairs. So, ps can not have
complex, contradicting our assumption that pt is always real rooted. only one complex root.

Together, Lemmas 44.7.2 and 44.7.1 are known as Obreschkoff’s Theorem [Obr63].
The following example will be critical.
1
I thank Sushant Sachdeva for helping me work out this particularly simple proof.

41.4 Sums of polynomials Proof. Define


rt (x) = px (A + tv v T , B).
Our goal is to show that X By assumption, rt (x) is real rooted for every t ∈ IR. By Lemma 44.5.1, we can write
χx (A + ΠBΠT )
Π∈Sn rt (x) = r0 (x) − tq(x),
is a real rooted polynomial for all symmetric matrices A and B, where Sn is the set of n-by-n
where q(x) has degree n − 1 and both r0 and q have positive leading coefficients. So, by Lemma
permutation matrices. We will do this by proving it for smaller sets of permutation matrices. To
44.7.1 q(x) interlaces r0 (x) = px (A, B). Lemma 44.7.2 then tells us that
begin, we know it for S = {I }. We will build up larger sets by swapping coordinates.
This will actually result in a distribution on permutations, so we consider σ : Sn → IR≥0 and px (A, B) → px (A + v v T , B).
consider sums of the form X The same argument tells us that
σ(Π)χx (A + ΠBΠT ).
Π
px (A − uu T + v v T , B) → px (A + v v T , B).
For coordinates i and j, let Γi,j be the permutation matrix that just swaps i and j. We call such a
This tells us that px (A, B) and px (A − uu T + v v T , B) both interlace r1 (x) from below. We
permutation a swap. We need the following important fact about the action of swaps on matrices.
finish by applying Lemma 44.7.3 to conclude that every convex combination of these polynomials
Lemma 41.4.1. Let A be a symmetric matrix. Then, for all i and j, there are vectors u and v is real rooted.
so that
Corollary 41.4.3. Let σ be such that for all symmetric matrices A and B,
Γi,j AΓi,j = A − uu T + v v T .
def
X
px (A, B) = σ(Π)χx (A + ΠBΠT )
Proof. Without loss of generality, let i = 1 and j = 2. We prove that A − Γi,j AΓi,j has rank 2 Π∈S
and trace 0.
is real rooted. Then, for every 0 < s < 1 and for every symmetric A and B the polynomial
We can write this difference in the form X
  sσ(Π)χx (A + ΠBΠT ) + (1 − s)σ(Π)χx (A + Γi,j ΠBΠT ΓTi,j )
a11 − a22 a12 − a21 a13 − a23 a14 − a24 . . .   Π∈S
a21 − a12 a22 − a11 a23 − a13 a24 − a14 . . . α β yT
 
a31 − a32 a32 − a31   T 
 0 ...  = −β −α −y is real rooted.
a41 − a42 a42 − a41 0 ...  y −y 0n−2
... Proof. Recall that
for some numbers α, β and some column vector y of length n − 2. If α ̸= β then the sum of the χx (A + Γi,j ΠBΠT ΓTi,j ) = χx (ΓTi,j AΓi,j + ΠBΠT ) = χx (Γi,j AΓTi,j + ΠBΠT ).
first two rows is equal to (c, −c, 0, . . . , 0) for some c ̸= 0, and every other row is a scalar multiple
of this. On the other hand, if α = β then the first two rows are linearly dependent, and all of the The corollary now follows from the previous lemma.
other rows are multiples of (1, −1, 0, . . . , 0).

Lemma 41.4.2. Let σ be such that for all symmetric matrices A and B,
41.5 Random Swaps
def
X
T
px (A, B) = σ(Π)χx (A + ΠBΠ )
Π∈S We will build a random permutation out of random swaps. A random swap is specified by
coordinates i and j and a swap probability s. It is a random matrix that is equal to the identity
is real rooted. Then, for every 0 < s < 1 and pair of vectors u and v , for every symmetric A and with probability 1 − s and Γi,j with probability s. Let S be a random swap.
B the polynomial
(1 − s)px (A, B) + spx (A − uu T + v v T , B) In the language of random swaps, we can express Corollary 41.5.1 as follows.
is real rooted.

Corollary 41.5.1. Let Π be a random permutation matrix drawn from a distribution so that for
all symmetric matrices A and B,  
T
E χx (A + ΠBΠ )
is real rooted. Let S be a random swap. Then,
 T T

E χx (A + S ΠBΠ S )
Chapter 42
is real rooted for every symmetric A and B.

All that remains is to show that a uniform random permutation can be assembled out of random
swaps. The trick to doing this is to choose the random swaps with swap probabilities other than Quadrature for the Finite Free
1/2. If you didn’t do this, it would be impossible as there are n! permutations, which is not a
power of 2. Convolution
Lemma 41.5.2. For every n, there exists a finite sequence of random swaps S 1 , . . . , S k so that

S 1S 2 . . . S k
This Chapter Needs Editing
is a uniform random permutation.

Proof. We prove this by induction. We can generate a random permutation on 1, . . . , n by first 42.1 Overview
choosing which item maps to n, and then generating a random permutation on those that remain.
To this end, we first form a sequence that gives a random permutation on the first n − 1 elements.
The material in today’s lecture comes from [MSS15d] and [MSS15a]. My goal today is to prove
We then compose this with a random swap that exchanges elements 1 and n with probability
simple analogs of the main quadrature results, and then give some indication of how the other
1 − 1/n. At this point, the element that maps to n will be uniformly random. We then compose
quadrature statements are proved. I will also try to explain what led us to believe that these
with yet another sequence that gives a random permutation on the first n − 1 elements.
results should be true.
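The swap construction in this proof is easy to check numerically. The sketch below (not from the text; Python, function names mine) builds the swap sequence recursively and verifies, by exact enumeration of all swap outcomes, that the resulting distribution on permutations of 4 elements is uniform.

from itertools import product
from collections import defaultdict

def swap_sequence(n):
    # list of (i, j, prob) random swaps whose composition is a uniform permutation of range(n)
    if n <= 1:
        return []
    inner = swap_sequence(n - 1)
    return inner + [(0, n - 1, 1 - 1.0 / n)] + inner

def distribution(n):
    seq = swap_sequence(n)
    dist = defaultdict(float)
    for choices in product([0, 1], repeat=len(seq)):
        perm, prob = list(range(n)), 1.0
        for (i, j, s), c in zip(seq, choices):
            prob *= s if c else (1 - s)
            if c:
                perm[i], perm[j] = perm[j], perm[i]
        dist[tuple(perm)] += prob
    return dist

dist = distribution(4)
print(len(dist), max(abs(p - 1 / 24) for p in dist.values()))   # 24 permutations, each of weight 1/24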
Recall that last lecture we considered the expected characteristic polynomial of a random matrix
of the form A + ΠBΠT , where A and B are symmetric. We do not know a nice expression for
this expected polynomial for general A and B. However, we will see that there is a very nice
expression when A and B are Laplacian matrices or the adjacency matrices of regular graphs.

42.2 The Finite Free Convolution

In Free Probability [Voi97], one studies operations on matrices in a large dimensional limit. These
matrices are determined by the moments of their spectrum, and thus the operations are
independent of the eigenvectors of the matrices. We consider a finite dimensional analog.
For n-dimensional symmetric matrices A and B, we consider the expected characteristic
polynomial  
T
EQ∈O(n) χx (A + QBQ ) ,
where O(n) is the group of n-by-n orthonormal matrices, and Q is a random orthonormal matrix
chosen according to the Haar measure. In case you are not familiar with “Haar measure”, I’ll
quickly explain the idea. It captures our most natural idea of a random orthonormal matrix. For


example, if A is a Gaussian random symmetric matrix, and V is its matrix of eigenvectors, then We describe this theorem as a quadrature result, because it obtains an integral over a continuous
V is a random orthonormal matrix chosen according to Haar measure. Formally, it is the space as a sum over a finite number of points.
measure that is invariant under group operations, which in this case are multiplication by
Before going in to the proof of the theorem, I would like to explain why one might think
orthonormal matrices. That is, the Haar measure is the measure under which for every S ⊆ O(n)
something like this could be true. The first answer is that it was a lucky guess. We hoped that
and P ∈ O(n), S has the same measure as {QP : Q ∈ S}.
this expectation would have a nice formula. The nicest possible formula would be a bi-linear map:
This expected characteristic polynomial does not depend on the eigenvectors of A and B, and a function that is linear in p when q is held fixed, and vice versa. So, we computed some examples
thus can be written as a function of the characteristic polynomials of these matrices. To see this, by holding B and q fixed and varying A. We then observed that the coefficients of the resulting
write A = V DV T and B = U C U T where U and V are the orthonormal eigenvector matrices expected polynomial are in fact linear functions of the coefficients of p. Once we knew this, it
and C and D are the diagonal matrices of eigenvalues. We have didn’t take too much work to guess the formula.

χx (V DV T + QU C U T Q T ) = χx (D + V T QU C U T Q T V ) = χx (D + (V T QU )C (V T QU )T ). I now describe the main quadrature result we will prove today. Let B(n) be the nth
hyperoctahedral group. This is the group of symmetries of the generalized octahedron in n
If Q is distributed according to the Haar measure on O(n), then so is V T QU . dimensions. It may be described as the set of matrices that can be written in the form DΠ, where
D is a diagonal matrix of ±1 entries and Π is a permutation. It looks like the family of
If p(x) and q(x) are the characteristic polynomials of A and B, then we define their finite free permutation matrices, except that both 1 and −1 are allowed as nonzero entries. B(n) is a
convolution to be the polynomial subgroup of O(n).
p(x) ⊞n q(x) = EQ∈O(n) [χx (A + QBQ T )]. Theorem 42.2.3. For all symmetric matrices A and B,
 T
  T

In today’s lecture, we will establish the following formula for the finite free convolution. EQ∈O(n) χx (A + QBQ ) = EP∈B(n) χx (A + PBP ) .

Theorem 42.2.1. Let We will use this result to prove Theorem 42.2.1. The proof of Theorem 42.2.2 is similar to the
p(x) = ∑_{i=0}^{n} x^{n−i} (−1)^i ai   and   q(x) = ∑_{i=0}^{n} x^{n−i} (−1)^i bi . proof of Theorem 42.2.3. So, we will prove Theorem 42.2.3 and then explain the major differences.
Then,
42.3 Quadrature
p(x) ⊞n q(x) = ∑_{k=0}^{n} x^{n−k} (−1)^k ∑_{i+j=k} [(n − i)!(n − j)! / (n!(n − i − j)!)] ai bj . (42.1) In general, quadrature formulas allow one to evaluate integrals of a family of functions over a
fixed continuous domain by summing the values of those functions at a fixed number of points.
There is an intimate connection between families of orthogonal polynomials and quadrature
This convolution was studied by Walsh [Wal22], who proved that when p and q are real rooted, so formulae that we unfortunately do not have time to discuss.
is p ⊞n q. The best known quadrature formula allows us to evaluate the integral of a polynomial around the
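Formula (42.1) can be checked directly on small examples. The sketch below (not from the text; Python with numpy, function names mine) compares the coefficient formula against the average of χx (A + ΠBΠT ) over all permutation matrices Π for diagonal A and B, where the two agree by Theorem 42.2.3 and the calculation of Section 42.6.

import numpy as np
from math import factorial
from itertools import permutations

def finite_free_conv(p, q, n):
    # coefficients (highest power first) of p ⊞_n q, for monic p, q of degree n, via (42.1)
    a = [(-1) ** i * p[i] for i in range(n + 1)]     # a_i = e_i(roots of p)
    b = [(-1) ** i * q[i] for i in range(n + 1)]
    c = []
    for k in range(n + 1):
        ck = sum(factorial(n - i) * factorial(n - (k - i)) * a[i] * b[k - i]
                 / (factorial(n) * factorial(n - k)) for i in range(k + 1))
        c.append((-1) ** k * ck)
    return np.array(c)

n = 4
lam = np.array([3.0, 1.0, 0.0, -2.0])
mu = np.array([2.0, 2.0, -1.0, -3.0])
p, q = np.poly(lam), np.poly(mu)
avg = sum(np.poly(np.diag(lam + mu[list(perm)])) for perm in permutations(range(n))) / factorial(n)
print(np.allclose(avg, finite_free_conv(p, q, n)))   # True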
The best known quadrature formula allows us to evalue the integral of a polynomial around the
Our interest in the finite free convolution comes from the following theorem, whose proof we will unit circle in the complex plane. For a polynomial p(x) of degree less than n,
also sketch today.
Z n−1

1X
Theorem 42.2.2. Let A and B be symmetric matrices with constant row sums. If A1 = a1 and p(eiθ )dθ = p(ω k ),
B1 = b1, we may write their characteristic polynomials as θ=0 n
k=0

χx (A) = (x − a)p(x) and χx (B) = (x − b)q(x). where ω = e2πi/n is a primitive nth root of unity.

We then have We may prove this result by establishing it separately for each monomial. For p(x) = xk with
 
T
EΠ∈Sn χx (A + ΠBΠ ) = (x − (a + b))(p(x) n−1 q(x)). k ̸= 0, Z 2π Z 2π
p(eiθ )dθ = eiθk dθ = 0.
We know that 1 is an eigenvector of eigenvalue a + b of A + ΠBΠT for every permutation matrix θ=0 θ=0

Π. Once we work orthogonal to this vector, we discover the finite free convolution.

And, for |k| < n, the corresponding sum is the sum of nth roots of unity distributed So, "Z # Z
symmetrically about the unit circle. So,
EP∈B(n) f (QP) = f (Q).
n−1 Q∈O(n) Q∈O(n)
X
jk
ω = 0. On the other hand, as B(n) is discrete we can reverse the order of integration to obtain
j=0
Z Z Z
We used this fact in the start of the semester when we computed the eigenvectors of the ring f (Q) = EP∈B(n) [f (QP)] = EP∈B(n) [f (P)] = EP∈B(n) [f (P)] ,
Q∈O(n) Q∈O(n) Q∈O(n)
graph and observed that all but the dominant are orthogonal to the all-1s vector.
where the second equality follows from Theorem 42.4.1.
On the other hand, for p(x) = 1 both the integral and sum are 1.
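This averaging argument is easy to confirm numerically. The sketch below (not from the text; Python with numpy) compares the average of a random polynomial of degree n − 1 over the unit circle with its average over the nth roots of unity; both equal the constant coefficient.

import numpy as np

rng = np.random.default_rng(1)
n = 7
coeffs = rng.standard_normal(n)                   # a random polynomial of degree n - 1
p = np.polynomial.Polynomial(coeffs)              # coefficients in increasing order of degree

theta = np.linspace(0, 2 * np.pi, 100000, endpoint=False)
circle_avg = p(np.exp(1j * theta)).mean()         # average over the unit circle

omega = np.exp(2j * np.pi * np.arange(n) / n)     # the n-th roots of unity
roots_avg = p(omega).mean()

print(np.isclose(circle_avg, roots_avg), np.isclose(roots_avg, coeffs[0]))   # True True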
We will use an alternative approach to quadrature on groups, encapsulated by the following lemma.
P 42.5 Structure of the Orthogonal Group
Lemma 42.3.1. For every n and function p(x) = |k|<n ck xk , and every θ ∈ [0, 2π],
n
X n
X To prove Theorem 42.4.1, we need to know a little more about the orthogonal group. We divide
p(ei(2πj/n+θ) ) = p(ei(2πj/n) ). the orthonormal matrices into two types, those of determinant 1 and those of determinant −1.
j=0 j=0 The orthonormal matrices of determinant 1 form the special orthogonal group, SO(n), and every
matrix in O(n) may be written in the form DQ where Q ∈ SO(n) and D is a diagonal matrix in
This identity implies the quadrature formula above, and has the advantage that it can be which the first entry is ±1 and all others are 1. Every matrix in SO(n) may be expressed as a
experimentally confirmed by evaluating both sums for a random θ. product of 2-by-2 rotation matrices. That is, for every Q ∈ SO(n) there are matrices Q i,j for
1 ≤ i < j ≤ n so that Q i,j is a rotation in the span of δ i and δ j and so that
Proof. We again evaluate the sums monomial-by-monomial. For p(x) = xk , with |k| < n, we have
Q = Q 1,2 Q 1,3 · · · Q 1,n Q 2,3 · · · Q 2,n · · · Q n−1,n .
n
X n
X
i(2πj/n+θ) k iθk
(e ) =e (ei(2πj/n) )k . If you learned the QR-factorization of a matrix, then you learned an algorithm for computing this
j=0 j=0 decomposition.

For k ̸= 0, the latter sum is zero. For k = 0, eiθk = 1. These facts about the structure of O(n) tell us that it suffices to prove Theorem 42.4.1 for the
special cases in which Q = diag(−1, 1, 1, . . . , 1) and when Q is a rotation of the plane spanned by δ i
and δ j . As the diagonal matrix is contained in B(n), the result is immediate in that case.
42.4 Quadrature by Invariance For simplicity, consider the case i = 1 and j = 2, and let Rθ denote the rotation by angle θ in the
first two coordinates:  
cos θ sin θ 0
For symmetric matrices A and B, define the function def
Rθ = − sin θ cos θ 0 .
fA,B (Q) = det(A + QBQ T ). 0 0 I n−2
The hyperoctahedral group B(n) contains the matrices Rθ for θ ∈ {0, π/2, π, 3π/2}. As B(n) is a
We will derive Theorem 42.2.3 from the following theorem. group, for these θ we know
Theorem 42.4.1. For all Q ∈ O(n),
EP∈B(n) [f (P)] = EP∈B(n) [f (Rθ P)] ,
EP∈B(n) [f (P)] = EP∈B(n) [f (QP)] . as the set of matrices in the expectations are identical. This identity implies
3
Proof of Theorem 42.2.3. First, observe that it suffices to consider determinants. For every 1X  
P ∈ B(n), we have EP∈B(n) fA,B (R2πj/4 P) = EP∈B(n) [f (P)] .
4
j=0
Z Z Z
det(A + QBQ T ) = f (Q) = f (QP).
Q∈O(n) Q∈O(n) Q∈O(n) We will prove the following lemma, and then show it implies Theorem 42.4.1.

Lemma 42.5.1. For every symmetric A and B, and every θ The term eiθ only appears in the first row and column of this matrix, and the term e−iθ only
appears in the second row and column. As a determinant can be expressed as a sum of products
3 3
1X 1X of matrix entries with one in each row and column, it is immediate that this determinant can be
fA,B (Rθ+2πj/4 ) = fA,B (R2πj/4 ).
4 4 expressed in terms of ekiθ for |k| ≤ 4. As each such product can have at most 2 terms of the form
j=0 j=0
eiθ and at most two of the form e−iθ , we have |k| ≤ 2.
This lemma implies that for every Q 1,2 ,
The difference between Theorem 42.2.3 and Theorem 42.2.2 is that the first involves a sum over
 
EP∈B(n) [f (P)] = EP∈B(n) f (Q 1,2 P) . the isometries of hyperoctahedron, while the second involves a sum over the symmetries of the
regular n-simplex in n − 1 dimensions. The proof of the appropriate quadrature theorem for the
This, in turn, implies Theorem 42.4.1 and thus Theorem 42.2.3. symmetries of the regular simplex is very similar to the proof we just saw, except that rotations of
the plane through δ i and δ j are replaced by rotations of the plane parallel to the affine subspace
We can use Lemma 42.3.1 to derive Lemma 42.5.1 from the following. spanned by triples of vertices of the simplex.
spanned by triples of vertices of the simplex.
Lemma 42.5.2. For every symmetric A and B, there exist c−2 , c−1 , c0 , c1 , c2 so that
2
X 42.6 The Formula
fA,B (Rθ ) = ck (eiθ )k .
k=−2
To establish the formula in Theorem 42.2.1, we observe that it suffices to compute the formula for
Proof. We need to express f (Rθ ) as a function of eiθ . To this end, recall that diagonal matrices, and that Theorem 42.2.3 makes this simple. Every matrix in B(n) can be
written as a product ΠD where D is a ±1 diagonal matrix. If B is the diagonal matrix with
1 −i iθ entries µ1 , . . . , µn , then ΠDBDΠT = ΠBΠT , which is the diagonal matrix with entries
cos θ = (eiθ + e−iθ ) and sin θ = (e − e−iθ ).
2 2 µπ(1) , . . . , µπ(n) , where π is the permutation corresponding to Π.
From these identities, we see that all two-by-two rotation matrices can be simultaneously Let A be diagonal with entries λ1 , . . . , λn . For a subset S of {1, . . . , n}, define
diagonalized by writing    iθ  Y
cos θ sin θ
=U
e 0
U ∗, λS = λi .
− sin θ cos θ 0 e −iθ
i∈S

where   We then have


1 1 X
U = , ai = λS .
i −i
|S|=i
and we recall that U ∗ is the conjugate transpose:
Let
  n
X
U∗ =
1 −i
. p n q= xn−k (−1)k ck .
1 i k=0

We first compute the expected determinant, cn .


Let D θ be the diagonal matrix having its first two entries eiθ and e−iθ , and the rest 1, and let U n
be the matrix with U in its upper 2-by-2 block and 1s on the diagonal beneath. So, 1 XY 1 XX S Y
cn = (λh + µπ(h) ) = λ µh .
n! π n! π
Rθ = U n D θ U ∗n . h |S|=i h:π(h)̸∈S

Now, examine As opposed to expanding this out, let’s just figure out how often the product λS µT appears. We
must have |T | = n − |S|, and then this term appears for each permutation such that π(T ) ∩ S = ∅.
fA,B (Rθ ) = det(A + Rθ BR∗θ ) This happens 1/ ni fraction of the time, giving the formula
= det(A + U n D θ U ∗n BU n D ∗θ U ∗n ) n n n
X 1 X S X X 1 X i!(n − i)!
= det(U ∗n AU n + D θ U ∗n BU n D ∗θ ) cn = n
 λ µT = 
n ai bn−i = ai bn−i .
n!
= det(U ∗n AU n D θ + D θ U ∗n BU n ). i
i=0 |S|=i i
|T |=n−i i=0 i=0

For general ck and i + j = k, we see that λS and µT appear whenever µ(T ) is disjoint from S.
The probability of this happening is
n−i

j (n − i)!(n − j)!j! (n − i)!(n − j)!
n
 = = ,
j
n!(n − i − j)!j! n!(n − i − j)!

and so
X (n − i)!(n − j)!
Chapter 43
ck = ai bj .
n!(n − i − j)!
i+j=k

42.7 Question
Ramanujan Graphs of Every Size
For which discrete subgroups of O(n) does a result like Theorem 42.2.3 hold? Can it hold for a
substantially smaller subgroup than the symmetries of the simplex (which has size (n + 1)! in n
dimensions)? This Chapter Needs Editing

43.1 Overview

We will mostly prove that there are Ramanujan graphs of every number of vertices and degree.
The material in today’s lecture comes from [MSS15d] and [MSS15a]. In those papers, we prove
that for every even n and degree d < n there is a bipartite Ramanujan graph of degree d on n
vertices. A bipartite Ramanujan graph of degree d is an approximation of a complete bipartite
graph. Its adjacency matrix thus has eigenvalues d and −d, and all other eigenvalues bounded in
absolute value by 2√(d − 1).
The difference between this result and that which we prove today is that we will show that for
every d < n there is a d-regular (multi)graph whose second adjacency matrix eigenvalue is at
most 2√(d − 1). This bound is sufficient for many applications of expanders, but not all. We will
not control the magnitude of the negative eigenvalues. The reason will simply be for simplicity:
the proofs to bound the negative eigenvalues would take more lectures.
Next week we will see a different technique that won’t produce a multigraph and that will
produce a bipartite Ramanujan graph.

43.2 The Approach

We will consider the sum of d random perfect matchings on n vertices. This produces a d-regular
graph that might be a multigraph. Friedman [Fri08] proves that such a graph is probably very
close to being Ramanujan if n is big enough relative to d. In particular, he proves that for all d
and ϵ > 0 there is an n0 so that for all n > n0 , such a graph will probably have all eigenvalues
other than µ1 bounded in absolute value by 2√(d − 1) + ϵ. We remove the asymptotics and the ϵ,
but merely prove the existence of one such graph. We do not estimate the probability with which


such a graph is Ramanujan. But, it is predicted to be a constant [?]. Lemma 43.3.1. Let p1 , . . . , pm be polynomials so that pi (x) → r(x), and let s1 , . . . , sm ≥ 0 be not
identically zero. Define
The fundamental difference between our technique and that of Friedman is that Friedman bounds m
X
the moments of the distribution of the eigenvalues of such a random graph. I suspect that there is p∅ (x) = si pi (x).
no true bound on these moments that would allow one to conclude that a random graph is i=1
probably Ramanujan. We consider the expected characteristic polynomial. Then, there is an i so that the largest root of pi (x) is at most the largest root of p∅ (x). In general,
for every j there is an i so that the jth largest root of pi (x) is at most the jth largest root of p∅ (x).
Let M be the adjacency matrix of a perfect matching, and let Π1 , . . . , Πd be independent uniform
random permutation matrices. We will consider the expected characteristic polynomial
Proof. We prove this for the largest root. The proof for the others is similar. Let λ1 and λ2 be
 T T

EΠ1 ,...,Πd χx (Π1 M Π1 + · · · + Πd M Πd ) . the largest and second-largest roots of r(x). Each polynomial pi (x) has exactly one root between
λ1 and λ2 , and is positive at all x > λ1 . Now, let µ be the largest root of p∅ (x). We can see that
In Lecture 22, we learned that this polynomial is real rooted. In Lecture 23, we learned a µ must lie between λ1 and λ2 . We also know that
technique that allows us to compute this polynomial. Today we will prove that the second largest
root of this polynomial is at most 2√(d − 1). First, we show why this matters: it implies that there ∑_i pi (µ) = 0.
is some choice of the matrices Π1 , . . . , Πd so that the resulting polynomial has second largest root at
most 2√(d − 1). These matrices provide the desired graph. If pi (µ) = 0 for some i, then we are done. If not, there is an i for which pi (µ) > 0. As pi only has
most 2 d − 1. These matrices provide the desired graph. If pi (µ) = 0 for some i, then we are done. If not, there is an i for which pi (µ) > 0. As pi only has
one root larger than λ2 , and it is eventually positive, the largest root of pi must be less than µ.

43.3 Interlacing Families of Polynomials Our polynomials do not all have a common interlacing. However, they satisfy a property that is
just as useful: they form an interlacing family. We say that a set of polynomials p1 , . . . , pm forms
The general problem we face is the following. We have a large family of polynomials, say an interlacing family if there is a rooted tree T in which
p1 (x), . . . , pm (x), for which we know each pi is real-rooted and such that their sum is real rooted.
We would like to show that there is some polynomial pi whose largest root is at most the largest a. every leaf is labeled by some polynomial pi ,
root of the sum, or rather we want to do this for the second-largest root. This is not true in b. every internal vertex is labeled by a nonzero, nonnegative combination of its children, and
general. But, it is true in our case. We will show that it is true whenever the polynomials form
what we call an interlacing family. c. all siblings have a common interlacing.
Recall from Lecture 22 that we say that for monic degree n polynomials p(x) and r(x),
The last condition guarantees that every internal vertex is labeled by a real rooted polynomial.
p(x) → r(x) if the roots of p and r interlace, with the roots of r being larger. We proved that if
Note that the same label is allowed to appear at many leaves.
p1 (x) → r(x) and p2 (x) → r(x), then every convex combination of p1 and p2 is real rooted. If we
go through the proof, we will also see that for all 0 ≤ s ≤ 1, Lemma 43.3.2. Let p1 , . . . , pm be an interlacing family, let T be the tree witnessing this, and let
p∅ be the polynomial labeling the root of the tree. Then, for every j there exists an i for which the
sp2 (x) + (1 − s)p1 (x) → r(x). jth largest root of pi is at most the jth largest root of p∅ .

Proceeding by induction, we can show that if pi (x) → r(x) for each i, then every convex Proof. By Lemma 44.5.2, there is a child of the root whose label has a jth largest root that is
combination of these polynomials interlaces r(x), and is thus real rooted. That is, for every smaller than the jth largest root of p∅ . If that child is not a leaf, then we can proceed down the
s1 , . . . , sm so that si ≥ 0 (but not all are zero), tree until we reach a leaf, at each step finding a node labeled by a polynomial whose jth largest
X root is at most the jth largest root of the previous polynomial.
si pi (x) → r(x).
i
Our construction of permutations by sequences of random swaps provides the required interlacing
Polynomials that satisfy this condition are said to have a common interlacing. By a technique family.
analogous to the one we used to prove Lemma 22.3.2, one can prove that the polynomials
Theorem 43.3.3. For permutation matrices Π1 , . . . , Πd , let
p1 , . . . , pm have a common interlacing if and only if every convex combination of these
polynomials is real rooted. pΠ1 ,...,Πd (x) = χx (Π1 M ΠT1 + · · · + Πd M ΠTd ).
These polynomials form an interlacing family.
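A tiny numerical example (mine, not from the text; Python with numpy) illustrates the common-interlacing discussion above: two real-rooted quadratics with no common interlacing can have a non-real-rooted average, while a pair with a common interlacing has a real-rooted average whose largest root is at least the largest root of one of them, as in Lemma 43.3.1.

import numpy as np

def largest_real_root(c):
    r = np.roots(c)
    return None if np.abs(r.imag).max() > 1e-9 else r.real.max()

# roots {0,1} and {3,4}: both real rooted, but no common interlacing
p1, p2 = np.poly([0, 1]), np.poly([3, 4])
print(largest_real_root((p1 + p2) / 2))                # None: the average is not real rooted

# roots {0,2} and {1,3}: r with roots {0, 1.5} is a common interlacing
q1, q2 = np.poly([0, 2]), np.poly([1, 3])
avg = (q1 + q2) / 2
print(largest_real_root(q1), largest_real_root(avg))   # 2.0 <= largest root of the average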

We will finish this lecture by proving that the second-largest root of For a real rooted polynomial p, and thus for real λ1 , . . . , λd , it is the value of x that is larger than
all the λi for which Gp (x) = w. For w = ∞, it is the largest root of p. But, it is larger for finite w.
E [pΠ1 ,...,Πd (x)]
is at most 2√(d − 1). This implies that there is a d-regular multigraph on n vertices in our family We will prove the following bound on the Cauchy transforms.
with second-largest adjacency eigenvalue at most 2√(d − 1). Theorem 43.4.1. For degree n polynomials p and q and for w > 0,
K_{p ⊞n q} (w) ≤ Kp (w) + Kq (w) − 1/w.

43.4 Root Bounds for Finite Free Convolutions For w = ∞, this says that the largest root of p n q is at most the sum of the largest roots of p
and q. But, this is obvious.
Recall from the last lecture that for n-dimensional symmetric matrices A and B with uniform
To explain the 1/w term in the above expression, consider q(x) = xn . As this is the characteristic
row sums a and b and characteristic polynomials (x − a)p(x) and (x − b)q(x),
polynomial of the all-zero matrix, p n q = p(x). We have
 T

EΠ χx (A + ΠBΠ ) = (x − (a + b))p(x) n−1 q(x). 1 nxn−1 1
Gq (x) = = .
This formula extends to sums of many such matrices. It is easy to show that n xn x
So,
def
χx (M ) = (x − 1)n/2 (x + 1)n/2 = (x − 1)p(x), where p(x) = (x − 1)n/2−1 (x + 1)n/2 . Kq (w) = max {x : 1/x = w} = 1/w.

So, Thus,
def Kq (w) − 1/w = 0.
p∅ (x) = E [pΠ1 ,...,Πd (x)] = (x − d) (p(x) ⊞n−1 p(x) ⊞n−1 · · · ⊞n−1 p(x)) ,
where p(x) appears d times above. I will defer the proof of this theorem to next lecture (or maybe
show how we use it.
We would like to prove a bound on the largest root of this polynomial in terms of the largest
roots of p(x). This effort turns out not to be productive. To see why, consider matrices A = aI
and B = bI . It is clear that A + ΠBΠT = (a + b)I for every Π. This tells us that
43.5 The Calculation
(x − a)n (x − b)n = (x − (a + b))n .
For p(x) = (x − 1)n/2−1 (x + 1)n/2 ,
So, the largest roots can add. This means that if we are going to obtain useful bounds on the    
roots of the sum, we are going to need to exploit facts about the distribution of the roots of p(x). 1 n/2 − 1 n/2 1 n/2 n/2
Gp (x) = + ≤ + ,
As in Lecture ??, we will use the barrier functions, just scaled a little differently. n−1 x−1 x+1 n x−1 x+1
For, for x ≥ 1. This latter expression is simple to evaluate. It is
n
Y
p(x) = (x − λi ), x
= Gχ(M ) (x) .
i=1 x2 − 1
define the Cauchy transform of p at x to be We also see that
Kp (w) ≤ Kχ(M ) (w) ,
d
X
1 1 1 p′ (x)
Gp (x) = = . for all w ≥ 0.
d x − λi d p(x)
i=1
Theorem 43.4.1 tells us that
For those who are used to Cauchy transforms, I remark that this is the Cauchy transform of the d−1
uniform distribution on the roots of p(x). As we will be interested in upper bounds on the K_{p ⊞n−1 ··· ⊞n−1 p}(w) ≤ dKp (w) − (d − 1)/w.
Cauchy transform, we will want a number u so that for all x > u, Gp (x) is less than some
Using the above inequality, we see that this is at most
specified value. That is, we want the inverse Cauchy transform, which we define to be
Kp (w) = max {x : Gp (x) = w} . dKχ(M ) (w) − (d − 1)/w.
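These definitions are easy to play with numerically. The sketch below (not from the text; Python with numpy, and the bisection helper is mine) computes Kp (w) and confirms two facts used in this and the next section: K_{χ(M )}(w) is the x > 1 with w = x/(x² − 1), and Kq (w) = 1/w for q(x) = xⁿ.

import numpy as np

def K(roots, w, iters=100):
    # K_p(w) = max{x : G_p(x) = w}, by bisection above the largest root;
    # G_p(x) = (1/n) * sum_i 1/(x - lambda_i) is decreasing there.
    roots = np.asarray(roots, dtype=float)
    G = lambda x: np.mean(1.0 / (x - roots))
    lo, hi = roots.max() + 1e-12, roots.max() + 1.0
    while G(hi) > w:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if G(mid) > w else (lo, mid)
    return (lo + hi) / 2

n, w = 10, 0.4
x = K([1.0] * (n // 2) + [-1.0] * (n // 2), w)    # roots of chi_x(M)
print(np.isclose(w, x / (x**2 - 1)))              # K_{chi(M)}(w) = x with w = x/(x^2 - 1)
print(np.isclose(K([0.0] * n, w), 1.0 / w))       # K_q(w) = 1/w for q(x) = x^n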

As this is an upper bound on the largest root of p n−1 ··· n−1 p, we wish to set w to minimize From this, one can derive a formula that plays better with derivatives:
this expression. As, n
x 1 X
Gχ(M ) (x) = , p(x) q(x) = (n − i)!bi p(i) (x).
x2 − 1 n
n!
i=0
we have
x
Kχ(M ) (w) = x if and only if w = . This equation allows us to understand what happens when p and q have different degrees.
x2 − 1
So, Lemma 43.6.1. If p(x) has degree n and q(x) = xn−1 , then
dKχ(M ) (w) − (d − 1)/w ≤ dx − (d − 1)(x^2 − 1)/x. p(x) ⊞ q(x) = ∂x p(x)/n.

The choice of x that minimizes this is x = √(d − 1), at which point it becomes
For the special case of q(x) = x^{n−1} , we have
d√(d − 1) − (d − 1)(d − 2)/√(d − 1) = d√(d − 1) − (d − 2)√(d − 1) = 2√(d − 1). Uα q(x) = x^{n−1} − α(n − 1)x^{n−2} ,
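The optimization above is easy to verify numerically. The sketch below (not from the text; Python with numpy) minimizes the bound dx − (d − 1)(x² − 1)/x over a grid of x > 1 and checks that the minimum value is 2√(d − 1), attained near x = √(d − 1).

import numpy as np

for d in range(3, 10):
    x = np.linspace(1.001, 20, 400000)
    bound = d * x - (d - 1) * (x**2 - 1) / x       # equals x + (d - 1)/x
    i = bound.argmin()
    print(d, np.isclose(x[i], np.sqrt(d - 1), atol=1e-3),
          np.isclose(bound[i], 2 * np.sqrt(d - 1), atol=1e-6))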

so
43.6 Some explanation of Theorem 43.4.1 maxroot (Uα q(x)) = α(n − 1).
So, in this case (43.1) says
I will now have time to go through the proof of Theorem 43.4.1. So, I’ll just tell you a little about
it. We begin by transforming statements about the inverse Cauchy transform into statements maxroot (Uα ∂x p) ≤ maxroot (Uα p) + maxroot (Uα q) − nα = maxroot (Uα p) − α.
about the roots of polynomials.
1 p′ (x) The proof of Theorem 43.4.1 has two major ingredients. We begin by proving the above
As Gp (x) = d p(x) ,
1 ′ inequality. We then show that the extreme case for the inequality is when q(x) = (x − b)n for
Gp (x) = w ⇐⇒ p(x) − p (x) = 0. some b. To do this, we consider an arbitrary real rooted polynomial q, and then modify it to make
wd
two of its roots the same. This leads to an induction on degree, which is essentially handled by
This tells us that
the following result.

Kp (w) = maxroot p(x) − p′ (x)/wd = maxroot ((1 − (1/wd)∂x )p) . Lemma 43.6.2. If p(x) has degree n and the degree of q(x) is less than n, then
As this sort of operator appears a lot in the proof, we give it a name: 1
p n q= (∂x p) n−1 q.
n
Uα = 1 − α∂x .
The whose proof is fairly straightforward, and only requires 2 pages.
In this notation, Theorem 43.4.1 becomes

maxroot (Uα (p n q)) ≤ maxroot (Uα p) + maxroot (Uα p) − nα. (43.1)


43.7 Some thoughts
We, of course, also need to exploit an expression for the finite free convolution. Last lecture, we
proved that if I would like to reflect on the fundamental difference between considering expected characteristic
polynomials and the distributions of the roots of random polynomials. Let A be a symmetric
n
X n
X matrix of dimension 3k with k eigenvalues that are 1, 0, and −1. If you consider A + ΠAΠT for a
p(x) = xn−i (−1)i ai and q(x) = xn−i (−1)i bi . random Π, the resulting matrix will almost definitely have a root at 2 and a root at −2. In fact,
i=0 i=0
the chance that it does not is exponentially small in k. However, all the roots of the expected
Then, characteristic polynomial of this matrix are strictly bounded away from 2. You could verify this
n
X X (n − i)!(n − j)! by computing the Cauchy transform of this polynomial.
p(x) n q(x) = xn−k (−1)k ai bj . (43.2)
n!(n − i − j)!
k=0 i+j=k

In our case, we considered a matrix A with k eigenvalues of 1 and k eigenvalues of −1. If we


consider A + ΠAΠT , it will almost definitely have roots at 2 and −2, and in fact the expected
characteristic polynomial has roots that are very close to this. But, if we consider

A + Π1 AΠT1 + Π2 AΠT2 ,

even though it almost definitely has roots at 3 and −3, the largest root of the expected
characteristic polynomial is at most 2√2 < 3.
Chapter 44
I should finish by saying that Theorem 43.4.1 is inspired by a theorem of Voiculescu that holds in
the infinite dimensional case. In this limit, the inequality becomes an equality.
Bipartite Ramanujan Graphs

44.1 Overview

Margulis [Mar88] and Lubotzky, Phillips and Sarnak [LPS88] presented the first explicit
constructions of infinite families of Ramanujan graphs. These had degrees p + 1, for primes p.
There have been a few other explicit constructions, [Piz90, Chi92, JL97, Mor94], all of which
produce graphs of degree q + 1 for some prime power q. Over this lecture and the next we will
prove the existence of infinite families of bipartite Ramanujan graphs of every degree. While today’s
proof of existence does not lend itself to an explicit construction, it is easier to understand than
the presently known explicit constructions.
We think that much stronger results should be true. There is good reason to think that random
d-regular graphs should be Ramanujan [MNS08]. And, Friedman [Fri08] showed that a random
d-regular graph is almost Ramanujan: for sufficiently large n such a graph is a 2√(d − 1) + ϵ
approximation of the complete graph with high probability, for every ϵ > 0.
In today’s lecture, we will use the method of interlacing families of polynomials to prove (half) a
conjecture of Bilu and Linial [BL06] that every bipartite Ramanujan graph has a 2-lift that is also
Ramanujan. This theorem comes from [MSS15b], but today’s proof is informed by the techniques
of [HPS15]. We will use theorems about the matching polynomials of graphs that we will prove
next lecture.
In the same way that a Ramanujan graph approximates the complete graph, a bipartite
Ramanujan graph approximates a complete bipartite graph. We say that a d-regular graph is a
bipartite Ramanujan graph if all of its adjacency matrix eigenvalues, other than d and −d, have
absolute value at most 2√(d − 1). The eigenvalue of d is a consequence of being d-regular and the
eigenvalue of −d is a consequence of being bipartite. In particular, recall that the adjacency
matrix eigenvalues of a bipartite graph are symmetric about the origin. This is a special case of
the following claim, which you can prove when you have a sparse moment.


Claim 44.1.1. The eigenvalues of a symmetric matrix of the form Theorem 44.2.1. Every d-regular graph G has a signed adjacency matrix S for which the
minimum eigenvalue of S is at least −2√(d − 1).
0 A
T
A 0
We can use this theorem to build infinite families of bipartite √ Ramanujan graphs, because their
are symmetric about the origin. We can use this theorem to build infinite families of bipartite Ramanujan graphs, because their
eigenvalues are symmetric about the origin. Thus, if µn ≥ −2√(d − 1), then we know that
|µi | ≤ 2√(d − 1) for all 1 < i < n. Note that every 2-lift of a bipartite graph is also a bipartite
We remark that one can derive bipartite Ramanujan graphs from ordinary Ramanujan graph.
graphs—just take the double cover. However, we do not know any way to derive ordinary
Ramanujan graphs from the bipartite ones.
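Claim 44.1.1 is easy to confirm numerically; here is a quick sketch (not from the text; Python with numpy).

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 7))
M = np.block([[np.zeros((5, 5)), A], [A.T, np.zeros((7, 7))]])
eigs = np.sort(np.linalg.eigvalsh(M))
print(np.allclose(eigs, -eigs[::-1]))   # True: the eigenvalues come in +/- pairs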
As opposed to reasoning directly about eigenvalues, we will work with characteristic polynomials.
44.3 Random 2-Lifts
For a matrix M , we write its characteristic polynomial in the variable x as
We will prove Theorem 44.2.1 by considering a random 2-lift. In particular, we consider the
def
χx (M ) = det(xI − M ). expected characteristic polynomial of a random signed adjacency matrix S :

ES [χx (S )] . (44.1)
44.2 2-Lifts
Godsil and Gutman [GG81] proved that this is equal to the matching polynomial of G! We will
learn more about the matching polynomial next lecture.
We saw 2-lifts of graphs in Problem 3 from Problem Set 2:
For now, we just need the following bound on its zeros which was proved by Heilmann and Lieb
We define a signed adjacency matrix of G to be a symmetric matrix S with the same [HL72].
nonzero pattern as the adjacency matrix A, but such that each nonzero entry is either Theorem 44.3.1. The eigenvalues of the matching
1 or −1. polynomial of a graph of maximum degree at
most d are real and have absolute value at most 2√(d − 1).
We will use it to define a graph GS . Like the double-cover, the graph GS will have
two vertices for every vertex of G and two edges for every edge of G. For each edge √
Now that we know that the smallest zero of (44.1) is at least −2 d − 1, all we need to do is to
(u, v) ∈ E, if S (u, v) = −1 then GS has the two edges show that there is some signed adjacency matrix whose smallest eigenvalue is at least this bound.
(u1 , v2 ) and (v1 , u2 ), This is not necessarily as easy as it sounds, because the smallest zero of the average of two
polynomials is not necessarily related to the smallest zeros of those polynomials. We will show
just like the double-cover. If S (u, v) = 1, then GS has the two edges that, in this case, it is.

(u1 , v1 ) and (v2 , u2 ).

You should check that G−A is the double-cover of G and that GA consists of two 44.4 Laplacianized Polynomials
disjoint copies of G.
Prove that the eigenvalues of the adjacency matrix of GS are the union of the Instead of directly reasoning about the characteristic polynomials of signed adjacency matrices S ,
eigenvalues of A and the eigenvalues of S . we will work with characteristic polynomials of dI − S . It suffices
for us to prove that there exists
an S for which the largest eigenvalue of dI − S is at most d + 2√(d − 1).
The graphs GS that we form this way are called 2-lifts of G. Fix an ordering on the m edges of the graph, associate each S with a vector σ ∈ {±1}m , and
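The 2-lift construction and the eigenvalue fact quoted above are easy to check numerically. The sketch below (not from the text; Python with numpy, variable names mine) builds GS from a random signing of a random graph and confirms that its spectrum is the union of the spectra of A and S.

import numpy as np

rng = np.random.default_rng(2)
n = 8
A = np.triu((rng.random((n, n)) < 0.4).astype(float), 1); A = A + A.T        # a random graph
signs = np.triu(rng.choice([-1.0, 1.0], size=(n, n)), 1); S = A * (signs + signs.T)   # a signing

# 2-lift: edges signed +1 stay within a copy, edges signed -1 cross between the two copies
Aplus, Aminus = (A + S) / 2, (A - S) / 2
lift = np.block([[Aplus, Aminus], [Aminus, Aplus]])

lift_eigs = np.sort(np.linalg.eigvalsh(lift))
union = np.sort(np.concatenate([np.linalg.eigvalsh(A), np.linalg.eigvalsh(S)]))
print(np.allclose(lift_eigs, union))   # True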
Bilu and Linial [BL06] conjectured that every d-regular graph G has a signed adjacency matrix S define
pσ (x) = χx (dI − S ).
so that ∥S ∥ ≤ 2√(d − 1). This would give a simple procedure for constructing infinite families of
Ramanujan graphs. We would begin with any small d-regular Ramanujan graph, such as the The expected polynomial is the average of all these polynomials.
complete graph on d + 1 vertices. Then, given any d-regular Ramanujan graph we could construct
a new Ramanujan graph on twice as many vertices by using GS where ∥S ∥ ≤ 2√(d − 1). We define two vectors for each edge in the graph. If the ith edge is (a, b), then we define

We will prove something close to their conjecture. v i,σi = δ a − σi δ b .



For every σ ∈ {±1}m , we have where A(1) is the submatrix of A obtained by removing its first row and column. The polynomial
m
X q(x) = χx (A(1) ) has degree n − 1.
v i,σi v Ti,σi = dI − S ,
i=1 For arbitrary, v , let Q be a rotation matrix for which Qv = δ 1 . As determinants, and thus
where S is the signed adjacency matrix corresponding to σ. So, for every σ ∈ {±1}m , characteristic polynomials, are unchanged by multiplication by rotation matrices,
!
Xm χx (A + tv v T ) = χx (Q(A + tv v T )Q T )
pσ (x) = χx v i,σi v Ti,σi .
= χx (QAQ T + tδ 1 δ T1 )) = χx (QAQ T ) − tq(x) = χx (A) − tq(x),
i=1

for some q(x) of degree n − 1.
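A quick check of the rank-one decomposition above (not from the text; Python with numpy), using the complete graph K6, which is 5-regular:

import numpy as np

rng = np.random.default_rng(3)
n, d = 6, 5
edges = [(a, b) for a in range(n) for b in range(a + 1, n)]
sigma = rng.choice([-1.0, 1.0], size=len(edges))      # a random signing

S = np.zeros((n, n))
total = np.zeros((n, n))
for (a, b), s in zip(edges, sigma):
    S[a, b] = S[b, a] = s
    v = np.zeros(n); v[a], v[b] = 1.0, -s             # v = delta_a - sigma_i * delta_b
    total += np.outer(v, v)

print(np.allclose(total, d * np.eye(n) - S))          # True: the sum equals dI - S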


44.5 Interlacing Families of Polynomials
For a polynomial p, let λmax (p) denote its largest zero. When polynomials interlace, we can relate
Here is the problem we face. We have a large family of polynomials, say p1 (x), . . . , pm (x), for the largest zero of their sum to the largest zero of at least one of them.
which we know each pi is real-rooted and that their sum is real rooted. We would like to show Lemma 44.5.2. Let p1 (x), p2 (x) and r(x) be polynomials so that r(x) → pi (x). Then,
that there is some polynomial pi whose largest zero is at most the largest zero of the sum. This is r(x) → p1 (x) + p2 (x) and there is an i ∈ {1, 2} for which
not true in general. But, it is true in our case because the polynomials form an interlacing family.
Q Qn−1 λmax (pi ) ≤ λmax (p1 + p2 ).
For a polynomial p(x) = ni=1 (x − λi ) of degree n and a polynomial q(x) = i=1 (x − µi ) of
degree n − 1, we say that q(x) interlaces p(x) if
Proof. Let µ1 be the largest zero of r(x). As each polynomial pi (x) has a positive leading
λn ≤ µn−1 ≤ λn−1 ≤ · · · ≤ λ2 ≤ µ1 ≤ λ1 . coefficient, each is eventually positive and so is their sum. As each has exactly one zero that is at
Qn least µ1 each is nonpositive at µ1 , and the same is also true of their sum. Let λ be the largest zero
If r(x) = i=1 (x − µi ) has degree n, we write r(x) → p(x) if of p1 + p2 . We have established that λ ≥ µ1 .

µn ≤ λn ≤ µn−1 ≤ · · · ≤ λ2 ≤ µ1 ≤ λ1 . If pi (λ) = 0 for some i, then we are done. If not, there is an i for which pi (λ) > 0. As pi only has
one zero larger than µ1 , and it is eventually positive, the largest zero of pi must be less than λ.
That is, if the zeros of p and r interlace, with the zeros of p being larger. We also make these
statements if they hold of positive multiples of p, r and q. If p1 , . . . , pm are polynomials such that there exists an r(x) for which r(x) → pi (x) for all i, then
The following lemma gives the examples of interlacing polynomials that motivate us. these polynomials are said to have a common interlacing. Such polynomials satisfy the natural
generalization of Lemma 44.5.2.
Lemma 44.5.1. Let A be a symmetric matrix and let v be a vector. For a real number t let
The polynomials pσ (x) do not all have a common interlacing. However, they satisfy a property
pt (x) = χx (A + tv v T ). that is just as useful: they form an interlacing family. Rather than defining these in general, we
will just explain the special case we need for today’s theorem.
Then, for t > 0, p0 (x) → pt (x) and there is a monic1 degree n − 1 polynomial q(x) so that for all t
We define polynomials that correspond to fixing the signs of the first k edges and then choosing
pt (x) = χx (A) − tq(x). the rest at random. We indicate these by shorter sequences σ ∈ {±1}k . For k < m and σ ∈ {±1}k
we define
def
pσ (x) = Eρ∈{±1}n−k [pσ,ρ (x)] .
Proof. The fact that p0 (x) → pt (x) for t > 0 follows from the Courant-Fischer Theorem.
So,
We first establish the existence of q(x) in the case that v = δ 1 . As the matrix tδ 1 δ T1 is zeros
p∅ (x) = Eσ∈{±1}m [pσ (x)] .
everywhere except for the element t in the upper left entry and the determinant is linear in each
entry of the matrix, We view the strings σ, and thus the polynomials pσ , as vertices in a complete binary tree. The
nodes with σ of length m are the leaves, and ∅ corresponds to the root. For σ of length less than
χx (A + tδ 1 δ T1 ) = det(xI − A − tδ 1 δ T1 ) = det(xI − A) − t det(xI (1) − A(1) ) = χx (A) − tχx (A(1) ), n, the children of σ are (σ, 1) and (σ, −1). We call such a pair of nodes siblings. We will
1
A monic polynomial is one whose leading coefficient is 1.
eventually prove in Lemma 44.6.1 that all the polynomials pσ (x) are real rooted and in Corollary
44.6.2 that every pair of siblings has a common interlacing.

But first, we show that this implies that there is a leaf indexed by σ ∈ {±1}m for which 44.7 Real Rootedness
λmax (pσ ) ≤ λmax (p∅ ).
√ To prove Lemma 44.6.1, we use the following two lemmas which are known collectively as
This implies Theorem 44.2.1, as we know from Theorem 44.3.1 that λmax (p∅ ) ≤ d + 2 d − 1. Obreschkoff’s Theorem [Obr63].

Lemma 44.5.3. There is a σ ∈ {±1}m for which Lemma 44.7.1. Let p and q be polynomials of degree n and n − 1, and let pt (x) = p(x) − tq(x).
If pt is real rooted for all t ∈ IR, then q interlaces p.
λmax (pσ ) ≤ λmax (p∅ ).
Proof Sketch. Recall that the roots of a polynomial are continuous functions of its coefficients, and
Proof. Corollary 44.6.2 and Lemma 44.5.2 imply that every non-leaf node in the tree has a child thus the roots of pt are continuous functions of t. We will use this fact to obtain a contradiction.
whose largest zero is at most the largest zero of that node. Starting at the root of the tree, we
find a node whose largest zero is at most the largest zero of p∅ . We then proceed down the tree For simplicity,2 I just consider the case in which all of the roots of p and q are distinct. If they are
until we reach a leaf, at each step finding a node labeled by a polynomial whose largest zero is at not, one can prove this by dividing out their common divisors.
most the largest zero of the previous polynomial. The leaf we reach, σ, satisfies the desired If p and q do not interlace, then p must have two roots that do not have a root of q between
inequality. them. Let these roots of p be λi+1 and λi . Assume, without loss of generality, that both p and q
are positive between these roots. We now consider the behavior of pt for positive t.

44.6 Common Interlacings As we have assumed that the roots of p and q are distinct, q is positive at these roots, and so pt is
negative at λi+1 and λi . If t is very small, then pt will be close to p in value, and so there must be
some small t0 for which pt0 (x) > 0 for some λi+1 < x < λi . This means that pt0 must have two
We can now use Lemmas 44.5.1 and 44.5.2 to show that every σ ∈ {±1}m−1 has a child (σ, s) for roots between λi+1 and λi .
which λmax (pσ,s ) ≤ λmax (pσ ). Let
m−1
X As q is positive on the entire closed interval [λi+1 , λi ], when t is large pt will be negative on this
A= v i,σi v Ti,σi . entire interval, and thus have no roots inside. As we vary t between t0 and infinity, the two roots
i=1 at t0 must vary continuously and cannot cross λi+1 or λi . This means that they must become
The children of σ, (σ, 1) and (σ, −1) have polynomials p(σ,1) and p(σ,−1) that equal complex, contradicting our assumption that pt is always real rooted.

χx (A + v m,1 v Tm,1 ) and χx (A + v m,−1 v Tm,−1 ). Lemma 44.7.2. Let p and q be polynomials of degree n and n − 1 that interlace and have positive
leading coefficients. For every t > 0, define pt (x) = p(x) − tq(x). Then, pt (x) is real rooted and
By Lemma 44.5.1, χx (A) → χx (A + v m,s v Tm,s ) for s ∈ {±1}, and Lemma 44.5.2 implies that there
p(x) → pt (x).
is an s for which the largest zero of p(σ,s) is at most the largest zero of their average, which is pσ .
To extend this argument to nodes higher up in the tree, we will prove the following statement. Proof Sketch. For simplicity, I consider the case in which all of the roots of p and q are distinct.
One can prove the general case by dividing out the common repeated roots.
Lemma 44.6.1. Let A be a symmetric matrix and let w i,s be vectors for 1 ≤ i ≤ k and
s ∈ {0, 1}. Then the polynomial To see that the largest root of pt is larger than λ1 , note that q(x) is positive for all x > µ1 , and
! λ1 > µ1 . So, pt (λ1 ) = p(λ1 ) − tq(λ1 ) < 0. As pt is monic, it is eventually positive and it must have
X Xk
a root larger than λ1 .
χx A + w i,ρi w Ti,ρi
ρ∈{0,1}k i=1 We will now show that for every i ≥ 1, pt has a root between λi+1 and λi . As this gives us d − 1
more roots, it accounts for all d roots of pt . For i odd, we know that q(λi ) > 0 and q(λi+1 ) < 0.
is real rooted, and for each s ∈ {0, 1}, As p is zero at both of these points, pt (λi ) > 0 and pt (λi+1 ) < 0, which means that pt has a root
k−1
! k−1
! between λi and λi+1 . The case of even i is similar.
X X X X
χx A+ w i,ρi w Ti,ρi → χx A+ w i,ρi w Ti,ρi + w k,s w Tk,s .
Lemma 44.7.3. Let p0 (x) and p1 (x) be degree n monic polynomials for which there is a third
ρ∈{0,1}k i=1 ρ∈{0,1}k i=1
polynomial r(x) such that
r(x) → p0 (x) and r(x) → p1 (x).
Corollary 44.6.2. For every k < n and σ ∈ {±1}k , the polynomials pσ,s (x) for s ∈ {±1} are real
2
rooted and have a common interlacing. I thank Sushant Sachdeva for helping me work out this particularly simple proof.

Then
r(x) → (1/2)p0 (x) + (1/2)p1 (x),
and the latter is a real rooted polynomial.

Sketch. Assume for simplicity that all the roots of r are distinct and different from the roots of p0
and p1 . Let µn < µn−1 < · · · < µ1 be the roots of r. Our assumptions imply that both p0 and p1
are negative at µi for odd i and positive for even i. So, the same is true of their average. This Chapter 45
tells us that their average must have at least n − 1 real roots between µn and µ1 . As their average
is monic, it must be eventually positive and so must have a root larger than µ1 . That accounts for
all n of its roots.
The Matching Polynomial
Proof of Lemma 44.6.1. We prove this by induction on k. Assuming that we have proved it for
k − 1, we now prove it for k. Let u be any vector and let t ∈ IR. Define
k−1
!
X X
pt (x) = χx A + w i,ρi w Ti,ρi + tuu T .
i=1
45.1 Overview
ρ∈{0,1}k

By Lemma 44.5.1, we can express this polynomial in the form The coefficients of the matching polynomial of a graph count the numbers of matchings of various
pt (x) = p0 (x) − tq(x), some amazing properties, including that it is real rooted. They also
where q has positive leading coefficient and degree n − 1. By absorbing tuu T into A we may use proved that all roots of the
induction on k to show that pt (x) is real rooted for all t. Thus, Lemma 44.7.1 implies that q(x) matching polynomial of a graph of maximum degree d are at most 2√(d − 1). In the next lecture,
induction on k to show that pt (x) is real rooted for all t. Thus, Lemma 44.7.1 implies that q(x) matching polynomial of a graph of maximum degree d are at most 2 d − 1. In the next lecture,
interlaces p0 (x), and Lemma 44.7.2 tells us that for t > 0 we will use this fact to derive the existence of Ramanujan graphs.

p0 (x) → pt (x). Our proofs today come from a different approach to the matching polynomial that appears in the
work of Godsil
√[God93, God81]. My hope is that someone can exploit Godsil’s approach to
So, we may conclude that for every s ∈ {±1},
! ! connect
√ the 2 d − 1 bound from today’s lecture with that from last lecture. In today’s lecture,
X k−1
X X k−1
X 2 d − 1 appears as an upper bound on the spectral radius of a d-ary tree. Infinite d-ary trees
χx A + w i,ρi w Ti,ρi → χx A+ w i,ρi w Ti,ρi + w k,s w Tk,s . appear as the graphs of free groups in free probability. I feel like there must be a formal relation
ρ∈{0,1}k−1 i=1 ρ∈{0,1}k i=1
between these that I am missing.
So, Lemma 44.7.3 implies that
k−1
! k
!
X X X X
χx A+ w i,ρi w Ti,ρi → χx A+ w i,ρi w Ti,ρi 45.2 The Matching Polynomial
ρ∈{0,1}k−1 i=1 ρ∈{0,1}k i=1

and that the latter polynomial is real rooted. A matching in a graph G = (V, E) is a subgraph of G in which every vertex has degree 1. We say
that a matching has size k if it has k edges. We let

44.8 Conclusion mk (G)

denote the number of matchings in G of size k. Throughout this lecture, we let |V | = n. Observe
The major open problem left by this work is establishing the existence of regular (non-bipartite) that m1 (G) is the number of edges in G, and that mn/2 (G) is the number of perfect matchings in
Ramanujan graphs. The reason we can not prove this using the techniques in this lecture is that G. By convention we set m0 (G) = 1, as the empty set is matching with no edges. Computing the
the interlacing techniques only allow us to reason about the largest or smallest eigenvalue of a number of perfect matchings is a #P -hard problem. This means that it is much harder than
matrix, but not both. solving N P -hard problems, so you shouldn’t expect to do it quickly on large graphs.
To see related papers establishing the existence of Ramanujan graphs, see [MSS15d, HPS15]. For
a survey on this and related material, see [MSS14].


The matching polynomial of G, written µx [G], is
µx [G] = ∑_{k=0}^{n/2} x^{n−2k} (−1)^k mk (G).
Lemma 45.3.2.
µx [G] = x µx [G − a] − ∑_{b∼a} µx [G − a − b] .
The matching polynomials of trees are very special—they are exactly the same as the
Our convention that m0 (G) = 1 ensures that this is a polynomial of degree n. characteristic polynomial of the adjacency matrix.
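The definition is simple to implement by brute force. The sketch below (not from the text; Python with numpy, function names mine) counts matchings of each size, forms µx [G], and checks on a small tree the fact proved below in Theorem 45.3.3: for a tree, µx [G] equals the characteristic polynomial of its adjacency matrix.

import numpy as np
from itertools import combinations

def matching_counts(edges, n):
    # m_k(G) for k = 0..n//2, by brute force over k-subsets of edges
    m = [0] * (n // 2 + 1)
    for k in range(n // 2 + 1):
        for sub in combinations(edges, k):
            verts = [v for e in sub for v in e]
            m[k] += int(len(set(verts)) == 2 * k)    # k edges, no shared vertices
    return m

def matching_poly(edges, n):
    # coefficients (highest power first) of sum_k (-1)^k m_k(G) x^(n-2k)
    coeffs = np.zeros(n + 1)
    for k, mk in enumerate(matching_counts(edges, n)):
        coeffs[2 * k] = (-1) ** k * mk
    return coeffs

n, tree = 5, [(0, 1), (1, 2), (2, 3), (3, 4)]        # a path on 5 vertices
A = np.zeros((n, n))
for a, b in tree:
    A[a, b] = A[b, a] = 1
print(np.allclose(matching_poly(tree, n), np.poly(A)))   # True for trees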

This is a fundamental example of a polynomial that is defined so that its coefficients count Theorem 45.3.3. Let G be a tree. Then
something. When the “something” is interesting, the polynomial usually is as well.
µx [G] = χx (AG ).

45.3 Properties of the Matching Polynomial Proof. Expand


χx (AG ) = det(xI − AG )
We begin by establishing some fundamental properties of the matching polynomial. For graphs G by summing over permutations. We obtain
and H on different vertex sets, we write G ∪ H for their disjoint union. X Y
(−1)sgn(π) x|{a:π(a)=a}| (−AG (a, π(a))).
Lemma 45.3.1. Let G and H be graphs on different vertex sets. Then, π∈Sn a:π(a)̸=a

µx [G ∪ H] = µx [G] µx [H] . We will prove that the only permutations that contribute to this sum are those for which
π(π(a)) = a for every a. And, these correspond to matchings.
Proof. Every matching in G ∪ H is the union of a matchings in G and a matching in H. Thus, If π is a permutation for which there is an a so that π(π(a)) ̸= a, then there are a = a1 , . . . , ak
k with k > 2 so that π(ai ) = ai+1 for 1 ≤ i < k, and π(ak ) = a1 . For this term to contribute, it
X
mk (G ∪ H) = µj (G)µk−j (H). must be the case that AG (ai , ai+1 ) = 1 for all i, and that AG (ak , a1 ) = 1. For k > 2, this would
j=0 be a cycle of length k in G. However, G is a tree and so cannot have a cycle.

The lemma follows. So, the only permutations that contribute are the involutions: the permutations π that are their
own inverse. An involution has only fixed points and cycles of length 2. Each cycle of length 2
that contributes a nonzero term corresponds to an edge in the graph. Thus, the number of
For a vertex a of G = (V, E), we write G − a for the graph G(V − {a}). This notation will prove
permutations with k cycles of length 2 is equal to the number of matchings with k edges. As the
very useful when reasoning about matching polynomials. Fix a vertex a of G, and divide the
sign of an involution with k cycles of length 2 is (−1)k , the coefficient of xn−2k is (−1)k mk (G).
matchings in G into two classes: those that involve vertex a and those that do not. The number
of matchings of size k that do not involve a is mk (G − a). On the other hand, those that do
involve a connect a to one of its neighbors. To count these, we enumerate the neighbors b of a. A
matching of size k that includes edge (a, b) can be written as the union of (a, b) and a matching of 45.4 The Path Tree
size k − 1 in G − a − b. So, the number of matchings that involve a is
X Godsil proves that the matching polynomial of a graph is real rooted by proving that it divides
mk−1 (G − a − b). the matching polynomial of a tree. As the matching polynomial of a tree is the same as the
b∼a characteristic polynomial of its adjacency matrix, it is real rooted. Thus, the matching
polynomial of the graph is as well. The tree that Godsil uses is the path tree of G starting at a
So, X vertex of G. For a a vertex of G, the path tree of G starting at a, written Ta (G) is a tree whose
mk (G) = mk (G − a) + mk−1 (G − a − b).
vertices correspond to paths in G that start at a and do not contain any vertex twice. One path is
b∼a
connected to another if one extends the other by one vertex. For example, here is a graph and its
To turn this into a recurrence for µx [G], write path tree starting at a.
xn−2k (−1)k mk (G) = x · xn−1−2k (−1)k mk (G − a) − xn−2−2(k−1) (−1)k−1 mk−1 (G − a − b).

This establishes the following formula.



When G is a tree, T_a(G) is isomorphic to G.

Godsil's proof begins by deriving a somewhat strange equality. Since I haven't yet found a better
proof, I'll take this route too.

Theorem 45.4.1. For every graph G and vertex a of G,

    µ_x[G − a] / µ_x[G] = µ_x[T_a(G) − a] / µ_x[T_a(G)].

The numerator on the right-hand side is a little odd. It is the matching polynomial of the forest
obtained by removing the root of the tree T_a(G). We may write this forest as a disjoint union of
trees as

    T_a(G) − a = ∪_{b∼a} T_b(G − a).

Proof. If G is a tree, then the left and right sides are identical, and so the equality holds. As
the only graphs on fewer than 3 vertices are trees, the theorem holds for all graphs on at most 2
vertices. We will now prove it by induction on the number of vertices.

We may use Lemma 45.3.2 to expand the reciprocal of the left-hand side:

    µ_x[G] / µ_x[G − a] = (x µ_x[G − a] − Σ_{b∼a} µ_x[G − a − b]) / µ_x[G − a]
                        = x − Σ_{b∼a} µ_x[G − a − b] / µ_x[G − a].

By applying the inductive hypothesis to G − a, we see that this equals

    x − Σ_{b∼a} µ_x[T_b(G − a) − b] / µ_x[T_b(G − a)].                    (45.1)

To simplify this expression, we examine these graphs carefully. By the observation we made before
the proof,

    T_b(G − a) − b = ∪_{c∼b, c≠a} T_c(G − a − b).

Similarly,

    T_a(G) − a = ∪_{c∼a} T_c(G − a),

which implies

    µ_x[T_a(G) − a] = Π_{c∼a} µ_x[T_c(G − a)].

Let ab be the vertex in T_a(G) corresponding to the path from a to b. We also have

    T_a(G) − a − ab = ( ∪_{c∼a, c≠b} T_c(G − a) ) ∪ ( ∪_{c∼b, c≠a} T_c(G − a − b) )
                    = ( ∪_{c∼a, c≠b} T_c(G − a) ) ∪ ( T_b(G − a) − b ),

which implies

    µ_x[T_a(G) − a − ab] = ( Π_{c∼a, c≠b} µ_x[T_c(G − a)] ) µ_x[T_b(G − a) − b].

Thus,

    µ_x[T_a(G) − a − ab] / µ_x[T_a(G) − a]
        = ( Π_{c∼a, c≠b} µ_x[T_c(G − a)] ) µ_x[T_b(G − a) − b] / Π_{c∼a} µ_x[T_c(G − a)]
        = µ_x[T_b(G − a) − b] / µ_x[T_b(G − a)].

Plugging this into (45.1), we obtain

    µ_x[G] / µ_x[G − a] = x − Σ_{b∼a} µ_x[T_a(G) − a − ab] / µ_x[T_a(G) − a]
                        = ( x µ_x[T_a(G) − a] − Σ_{b∼a} µ_x[T_a(G) − a − ab] ) / µ_x[T_a(G) − a]
                        = µ_x[T_a(G)] / µ_x[T_a(G) − a].

The last equality is Lemma 45.3.2 applied to T_a(G) at the vertex a, whose neighbors in T_a(G) are
exactly the vertices ab for b ∼ a. We obtain the equality claimed in the theorem by taking the
reciprocals of both sides.

Theorem 45.4.2. For every vertex a of G, the polynomial µ_x[G] divides the polynomial
µ_x[T_a(G)].

Proof. We again prove this by induction on the number of vertices in G, using as our base case
graphs with at most 2 vertices. We then know, by induction, that for b ∼ a,

    µ_x[G − a] divides µ_x[T_b(G − a)].

As

    T_a(G) − a = ∪_{b∼a} T_b(G − a),

    µ_x[T_b(G − a)] divides µ_x[T_a(G) − a].

Thus,

    µ_x[G − a] divides µ_x[T_a(G) − a],
and so

    µ_x[T_a(G) − a] / µ_x[G − a]

is a polynomial in x. To finish the proof, we apply Theorem 45.4.1, which implies

    µ_x[T_a(G)] = (µ_x[G] / µ_x[G − a]) µ_x[T_a(G) − a] = µ_x[G] (µ_x[T_a(G) − a] / µ_x[G − a]).

45.5 Root bounds

If every vertex of G has degree at most d, then the same is true of T_a(G). We showed in
Theorem 4.2.4 that every eigenvalue of a tree of maximum degree d is at most 2√(d − 1). When
combined with Theorem 45.4.2, this tells us that the matching polynomial of a graph with
maximum degree at most d has all of its roots bounded in absolute value by 2√(d − 1).
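As a small numerical check on this bound (again ours, not the book's), the sketch below counts the
matchings m_k of the complete graph K_4 by brute force with itertools and examines the roots of
µ_x[G]; K_4 has maximum degree d = 3, so the bound is 2√2 ≈ 2.828. The function name and graph
encoding are our own choices.

    import numpy as np
    from itertools import combinations

    def matching_poly(n, edges):
        # mu_x[G] = sum_k (-1)^k m_k(G) x^(n-2k), with m_k counted directly.
        coeffs = np.zeros(n + 1)
        for k in range(n // 2 + 1):
            m_k = sum(1 for S in combinations(edges, k)
                      if len(set().union(*S)) == 2 * k)   # k edges, no shared endpoints
            coeffs[2 * k] = (-1) ** k * m_k
        return coeffs                                     # descending powers of x

    # K_4 has maximum degree d = 3, so every root should lie in [-2*sqrt(2), 2*sqrt(2)].
    n, d = 4, 3
    edges = [frozenset(e) for e in combinations(range(n), 2)]
    roots = np.roots(matching_poly(n, edges))             # mu = x^4 - 6x^2 + 3
    print(np.round(np.sort_complex(roots), 3))            # about -2.334, -0.742, 0.742, 2.334
    print(max(abs(roots)) <= 2 * np.sqrt(d - 1))          # True: the bound is 2*sqrt(2) ~ 2.828

The largest root, about 2.334, stays below the bound, and by Theorem 45.4.2 this is no accident:
the roots of µ_x[K_4] appear among the eigenvalues of the path tree T_a(K_4), which is a tree of
maximum degree 3.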
