
SOME FUNDAMENTAL THEOREMS IN MATHEMATICS

OLIVER KNILL

Abstract. An expository hitchhiker's guide to some theorems in mathematics.

Criteria for the current list of 272 theorems are whether the result can be formulated elegantly,
whether it is beautiful or useful and whether it could serve as a guide [6] without leading
to panic. The order is not a ranking but follows a time-line of when things were written
down. Since [610] stated that "a mathematical theorem only becomes beautiful if presented
as a crown jewel within a context", we try sometimes to give some context. Of course, any
such list of theorems is a matter of personal preferences, taste and limitations. The number
of theorems is arbitrary: the initial obvious goal was 42, but that number eventually got
surpassed, as it is hard to stop once started. As a compensation, there are 42 "tweetable"
theorems with included proofs. More comments on the choice of the theorems are included in
an epilogue. For literature on general mathematics, see [210, 206, 31, 258, 280, 682, 446, 152],
for history [236, 688, 409, 82, 52, 226, 412, 397, 759, 123, 681, 89, 285, 372], for popular,
beautiful or elegant things [12, 577, 219, 199, 17, 738, 739, 50, 222, 207, 269, 482, 679, 332,
219, 2, 138, 161, 139, 545, 287, 187]. For comprehensive overviews of large parts of mathematics,
see [83, 180, 181, 57, 653], and for predictions on developments [53]. For reflections about
mathematics in general, see [160, 493, 51, 335, 474, 109, 616]. Encyclopedic source examples are
[205, 775, 736, 112, 209, 167, 243, 208, 121, 699].

This is a live document which is in the process of being extended. Thanks so far to Johan
Commelin, Mikhail Katz, David McCarthy, Kapil Paranjape, Jordan Stoyanov, Michael Somos,
Ross Rosenwald for some valuable comments or corrections.

1. Arithmetic

Let $\mathbb{N} = \{0, 1, 2, 3, \dots\}$ be the set of natural numbers. A number $p \in \mathbb{N}$, $p > 1$ is prime
if $p$ has no factors different from 1 and $p$. With a prime factorization $n = p_1 \cdots p_k$, we
understand the prime factors $p_j$ of $n$ to be ordered as $p_j \leq p_{j+1}$. The fundamental theorem
of arithmetic is

Theorem: Every $n \in \mathbb{N}$, $n > 1$ has a unique prime factorization.

Euclid anticipated the result. Carl Friedrich Gauss gave in 1798 the first proof in his monograph
"Disquisitiones Arithmeticae". Within abstract algebra, the result is the statement that the
ring of integers $\mathbb{Z}$ is a unique factorization domain. For a literature source, see [386]. For
more general number theory literature, see [356, 127].
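A minimal sketch, assuming plain trial division (a choice made here, not a method from the text): it returns the prime factors in the order $p_j \leq p_{j+1}$ used above. The code only exhibits a factorization; that it is the only one is exactly what the theorem adds.

```python
def prime_factorization(n):
    """Return the prime factors of n > 1 in non-decreasing order, by trial division."""
    if n < 2:
        raise ValueError("n must be a natural number > 1")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # divide out d as often as possible
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)  # the remaining cofactor has no divisor <= its square root, so it is prime
    return factors


print(prime_factorization(360))
```

Trial division is only practical for small $n$; its point here is that the list comes out already sorted.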

Date : 7/22/2018, last update 6/25/2023.


2. Geometry
Given an inner product space $(V, \cdot)$ with dot product $v \cdot w$ leading to the length $|v| = \sqrt{v \cdot v}$,
three non-zero vectors $v, w, v - w$ define a right-angled triangle if $v$ and $w$ are perpendicular,
meaning that $v \cdot w = 0$. If $a = |v|$, $b = |w|$, $c = |v - w|$ are the lengths of the three vectors, then
the Pythagoras theorem is

Theorem: $a^2 + b^2 = c^2$.

Anticipated by Babylonian mathematicians in examples, it appeared independently also in
Chinese mathematics [693] and might have been proven first by Pythagoras [685], but already
early sources express uncertainty (see e.g. [384] p. 32). The theorem is used in many parts of
mathematics, like in the Parseval equality of Fourier theory or in the fact that for uncorrelated random
variables the variance is additive: $\mathrm{Var}[X] + \mathrm{Var}[Y] = \mathrm{Var}[X + Y]$. In linear algebra it generalizes
to the Lagrange identity $\det(F^T F) = \sum_{|P| = m} \det^2(F_P)$, which holds for all $n \times m$ matrices $F$,
where the sum on the right is over all sets $P$ of $m$ rows and $F_P$ is the corresponding $m \times m$
sub-matrix of $F$ [34], a formula which in calculus becomes
$|\vec{v}|^2 |\vec{w}|^2 - (\vec{v} \cdot \vec{w})^2 = |\vec{v} \wedge \vec{w}|^2$. See [590, 584, 488, 397].
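The Lagrange identity can be checked exactly on small integer matrices. This is a sketch using Laplace expansion for determinants (fine only for tiny matrices); the helper names `det`, `matmul` and `lagrange_sides` are hypothetical. For $m = 1$ the identity is the Pythagoras theorem $|v|^2 = \sum_i v_i^2$.

```python
from itertools import combinations


def det(M):
    """Determinant by Laplace expansion along the first row (exact for integer entries)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def lagrange_sides(F):
    """Return (det(F^T F), sum over row sets P of det(F_P)^2) for an n x m matrix F given as rows."""
    n, m = len(F), len(F[0])
    lhs = det(matmul(transpose(F), F))
    rhs = sum(det([F[i] for i in rows]) ** 2 for rows in combinations(range(n), m))
    return lhs, rhs


print(lagrange_sides([[1, 2], [3, 4], [5, 6]]))
```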

3. Calculus

Let $f$ be a function of one variable which is continuously differentiable, meaning that
the limit $g(x) = \lim_{h \to 0} [f(x + h) - f(x)]/h$ exists at every point $x$ and defines a continuous
function $g$. For any such function $f$, we can form the integral $\int_a^b f(t)\,dt$ and the derivative
$\frac{d}{dx} f(x) = f'(x)$.

Theorem: $\int_a^b f'(x)\,dx = f(b) - f(a)$, $\quad \frac{d}{dx} \int_0^x f(t)\,dt = f(x)$.

Newton and Leibniz discovered the result independently; Gregory wrote down the first proof
in his "Geometriae Pars Universalis" of 1668. The result generalizes to higher dimensions in
the form of the Green-Stokes-Gauss-Ostrogradski theorem $\int_M dF = \int_{\delta M} F$, which holds
for $n$-forms $F$ with exterior derivative $dF$ and compact $(n + 1)$-manifolds $M$ with boundary
$\delta M$. [215] tells the "tongue in the cheek" proof: as the derivative is a limit of quotients
of differences, the anti-derivative must be a limit of sums of products. For history, see
[396, 211].
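The "tongue in the cheek" proof can be made concrete numerically. In this sketch (midpoint rule and central differences are choices made here, not from the text), a sum of products approximates $\int_a^b f'(x)\,dx$, a sum of differences telescopes to $f(b) - f(a)$, and a difference quotient of the integral recovers the integrand.

```python
import math


def midpoint_integral(g, a, b, n=10_000):
    """Midpoint Riemann sum: a sum of products g(x_k) * h approximating the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h


def telescoping_sum(f, a, b, n=10_000):
    """Sum of differences f(x_{k+1}) - f(x_k) on a uniform grid; it collapses to f(b) - f(a)."""
    h = (b - a) / n
    return sum(f(a + (k + 1) * h) - f(a + k * h) for k in range(n))


# First part: integrating f' = cos over [0, pi/2] should give sin(pi/2) - sin(0) = 1.
first = midpoint_integral(math.cos, 0.0, math.pi / 2)

# Second part: d/dx of x -> integral_0^x exp(t) dt at x = 1, via a central difference quotient.
h = 1e-3
second = (midpoint_integral(math.exp, 0.0, 1.0 + h) - midpoint_integral(math.exp, 0.0, 1.0 - h)) / (2 * h)

print(first, second)
```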

4. Algebra

A polynomial is a complex-valued function of the form $f(x) = a_0 + a_1 x + \cdots + a_n x^n$, where
the coefficients $a_k$ are in the complex plane $\mathbb{C}$. The space of all polynomials is denoted by $\mathbb{C}[x]$. The
largest non-negative integer $n$ for which $a_n \neq 0$ is called the degree of the polynomial. Degree
1 polynomials are linear, degree 2 polynomials are called quadratic etc. The fundamental
theorem of algebra is

Theorem: Every $f \in \mathbb{C}[x]$ of degree $n$ can be factored into $n$ linear factors.

This result was anticipated during the 17th century. The first author to assert that any $n$'th
degree polynomial has a root is Peter Roth in 1600 [579]. It was first proven by Carl Friedrich
Gauss, and the proof was finalized in 1920 by Alexander Ostrowski, who fixed a topological mistake in
Gauss' proof. The theorem assures that the field of complex numbers $\mathbb{C}$ is algebraically closed. For
history and many proofs, see [235].
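The theorem guarantees $n$ roots, so a numerical method may try to find all of them at once. The sketch below uses the Durand-Kerner (Weierstrass) iteration, a technique chosen here for illustration and not mentioned in the text; the starting points $(0.4 + 0.9i)^k$ and the fixed iteration count are assumptions that work for small, well-separated examples.

```python
def durand_kerner(coeffs, iterations=500):
    """Approximate all n roots of a_0 + a_1 x + ... + a_n x^n, given coeffs = [a_0, ..., a_n], a_n != 0."""
    n = len(coeffs) - 1
    monic = [c / coeffs[-1] for c in coeffs]  # same roots, leading coefficient 1

    def f(z):
        result = 0j
        for c in reversed(monic):  # Horner's scheme
            result = result * z + c
        return result

    roots = [(0.4 + 0.9j) ** k for k in range(n)]  # distinct starting points
    for _ in range(iterations):
        updated = []
        for i, zi in enumerate(roots):
            denom = 1 + 0j
            for j, zj in enumerate(roots):
                if j != i:
                    denom *= zi - zj
            updated.append(zi - f(zi) / denom)  # Weierstrass correction
        roots = updated
    return roots


def evaluate(coeffs, z):
    return sum(c * z ** k for k, c in enumerate(coeffs))


# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
print(sorted(durand_kerner([-6, 11, -6, 1]), key=lambda z: z.real))
```

With the roots $z_1, \dots, z_n$ found, $f(x) = a_n (x - z_1) \cdots (x - z_n)$ is the factorization into linear factors.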


5. Probability

Let $X_k$ be a sequence of independent random variables on a probability space $(\Omega, \mathcal{A}, P)$
which all have the same cumulative distribution function $F_X(t) = P[X \leq t]$ and finite, non-zero
variance. The normalized random variable $X^*$ is $(X - E[X])/\sigma[X]$, where $E[X]$ is the mean $\int_\Omega X(\omega)\,dP(\omega)$
and $\sigma[X] = E[(X - E[X])^2]^{1/2}$ is the standard deviation. A sequence of random variables
$Z_n$ converges in distribution to $Z$ if $F_{Z_n}(t) \to F_Z(t)$ for all $t$ as $n \to \infty$. If $Z$ is a
Gaussian random variable with zero mean $E[Z] = 0$ and standard deviation $\sigma[Z] = 1$, the
central limit theorem is:

Theorem: $(X_1 + X_2 + \cdots + X_n)^* \to Z$ in distribution.

Proven in a special case by Abraham De Moivre in 1711, rediscovered by Pierre-Simon
Laplace in 1812 for discrete random variables, and then extended by Constantin Carathéodory, Lyapunov,
Jarl Waldemar Lindeberg [472] and Paul Lévy, the theorem explains the importance
and ubiquity of the Gaussian density function $e^{-x^2/2}/\sqrt{2\pi}$ defining the normal distribution.
The Gaussian distribution was first considered by Abraham de Moivre in 1738. See
[687, 420, 238].
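A Monte Carlo sketch of the theorem, assuming (hypothetically) that the $X_k$ are uniform on $[0, 1]$, so that $E[X_k] = 1/2$ and $\mathrm{Var}[X_k] = 1/12$. The empirical distribution function of the normalized sum is compared with the Gaussian one; the sample sizes and seed are arbitrary choices.

```python
import math
import random


def normalized_uniform_sum(n, rng):
    """(X_1 + ... + X_n)^* for X_k uniform on [0, 1]: the sum has mean n/2 and variance n/12."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)


def empirical_cdf(t, n=30, trials=20_000, seed=1):
    """Fraction of simulated normalized sums that are <= t."""
    rng = random.Random(seed)
    hits = sum(normalized_uniform_sum(n, rng) <= t for _ in range(trials))
    return hits / trials


def gaussian_cdf(t):
    """F_Z(t) for a standard Gaussian Z."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))


print(empirical_cdf(1.0), gaussian_cdf(1.0))
```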

6. Dynamics

Assume $X$ is a random variable on a probability space $(\Omega, \mathcal{A}, P)$ for which $|X|$ has finite
mean $E[|X|]$. This means $X: \Omega \to \mathbb{R}$ is measurable and $\int_\Omega |X(x)|\,dP(x)$ is finite. Let $T$ be an
ergodic, measure-preserving transformation from $\Omega$ to $\Omega$. Measure preserving means that
$P[T^{-1}(A)] = P[A]$ for all measurable sets $A \in \mathcal{A}$. Ergodic means that $T^{-1}(A) = A$
implies $P[A] = 0$ or $P[A] = 1$ for all $A \in \mathcal{A}$. The ergodic theorem states that for an ergodic
transformation $T$ one has:

Theorem: $[X(x) + X(Tx) + \cdots + X(T^{n-1}(x))]/n \to E[X]$ for almost all $x$.

This theorem from 1931 is due to George Birkhoff and is called Birkhoff's pointwise ergodic
theorem. It assures that "time averages" are equal to "space averages". A draft of John von
Neumann's mean ergodic theorem, which appeared in 1932, motivated Birkhoff, but the mean
ergodic version is weaker. See [774] for history. A special
case is the law of large numbers, in which case the random variables $x \mapsto X(T^k(x))$ are
independent with equal distribution (IID). The theorem belongs to ergodic theory [312, 158,
654].
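A small illustration, assuming the (hypothetical) example of the rotation $T(x) = x + \alpha \bmod 1$ on $[0, 1)$ with Lebesgue measure and irrational $\alpha$, a standard ergodic transformation. For $X$ the indicator of $[0, 1/2)$ the space average is $E[X] = 1/2$, and the time average along one orbit should approach it. Floating point only approximates the rotation, so this is a sketch, not a proof.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # an irrational rotation number (hypothetical choice)


def rotation(x):
    """T(x) = x + ALPHA mod 1, which preserves Lebesgue measure on [0, 1)."""
    return (x + ALPHA) % 1.0


def indicator_left_half(x):
    """X = indicator function of [0, 1/2), so E[X] = 1/2."""
    return 1.0 if x < 0.5 else 0.0


def birkhoff_average(X, T, x, n):
    """Time average [X(x) + X(T x) + ... + X(T^{n-1} x)] / n."""
    total = 0.0
    for _ in range(n):
        total += X(x)
        x = T(x)
    return total / n


print(birkhoff_average(indicator_left_half, rotation, 0.1, 100_000))
```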

7. Set theory

A bijection is a map $f$ from a set $X$ to a set $Y$ which is injective: $f(x) = f(y) \Rightarrow x = y$, and
surjective: for every $y \in Y$, there exists $x \in X$ with $f(x) = y$. Two sets $X, Y$ have the same
cardinality if there exists a bijection from $X$ to $Y$. Given a set $X$, the power set $2^X$ is the
set of all subsets of $X$, including the empty set and $X$ itself. If $X$ has $n$ elements, the power
set has $2^n$ elements. Cantor's theorem is

Theorem: For any set $X$, the sets $X$ and $2^X$ have different cardinality.

The result is due to Cantor. Taking for $X$ the natural numbers, every $Y \in 2^X$ defines a
real number $\varphi(Y) = \sum_{y \in Y} 2^{-(y+1)} \in [0, 1]$. As $2^X$ and $[0, 1]$ have the same cardinality (the
double-counted cases, numbers with two binary expansions like $0.0111\ldots = 0.1000\ldots$, form a countable set),
the interval $[0, 1]$ is uncountable. There are different types of infinities, leading to countably infinite sets and
uncountably infinite sets. In order to compare sets, the Schröder-Bernstein theorem is
important: if there exist injective functions $f: X \to Y$ and $g: Y \to X$, then there also exists
a bijection $X \to Y$. This result was used by Cantor already. For literature, see [313].
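The text does not spell out a proof; the standard one is the diagonal argument: for any map $f: X \to 2^X$, the set $D = \{x \in X \mid x \notin f(x)\}$ differs from every $f(x)$ at the element $x$, so $f$ is not surjective. The sketch below checks this exhaustively for a three-element set (all $8^3 = 512$ maps); the function names are hypothetical.

```python
from itertools import product


def diagonal_set(X, f):
    """D = {x in X : x not in f(x)}; D disagrees with f(x) about the element x."""
    return frozenset(x for x in X if x not in f[x])


def all_subsets(X):
    """The power set 2^X as a list of frozensets."""
    X = list(X)
    return [frozenset(x for x, keep in zip(X, bits) if keep)
            for bits in product([False, True], repeat=len(X))]


def no_map_is_surjective(X):
    """Check every map f: X -> 2^X and confirm the diagonal set is never in its image."""
    X = list(X)
    for images in product(all_subsets(X), repeat=len(X)):
        f = dict(zip(X, images))
        if diagonal_set(X, f) in f.values():
            return False
    return True


print(no_map_is_surjective([0, 1, 2]))
```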

8. Statistics

A probability space $(\Omega, \mathcal{A}, P)$ consists of a set $\Omega$, a $\sigma$-algebra $\mathcal{A}$ and a probability
measure $P$. A $\sigma$-algebra is a collection of subsets of $\Omega$ which contains the empty set and which
is closed under the operations of taking complements, countable unions and countable
intersections. The function $P$ on $\mathcal{A}$ takes values in the interval $[0, 1]$, satisfies $P[\Omega] = 1$ and
$P[\bigcup_{A \in S} A] = \sum_{A \in S} P[A]$ for any finite or countable set $S \subset \mathcal{A}$ of pairwise disjoint sets. The
elements in $\mathcal{A}$ are called events. Given two events $A, B$ where $B$ satisfies $P[B] > 0$, one can
define the conditional probability $P[A|B] = P[A \cap B]/P[B]$. Bayes theorem states:

Theorem: $P[A|B] = P[B|A]P[A]/P[B]$.

This setup is given by the Kolmogorov axioms, formulated by Andrey Kolmogorov, who wrote in 1933 the
"Grundbegriffe der Wahrscheinlichkeitsrechnung" [438] based on measure theory built by Emile
Borel and Henri Lebesgue. For history, see [641], who report that Kolmogorov sat down "to
write the Grundbegriffe, in a rented cottage on the Klyaz'ma River in November 1932".
Bayes theorem is rather a fantastically clever definition and not really a theorem. There is
almost nothing to prove, as multiplying with $P[B]$ gives $P[A \cap B]$ on both sides. It essentially
restates that $A \cap B = B \cap A$, the Abelian property of the product in the ring $\mathcal{A}$. More
general is the statement that if $A_1, \dots, A_n$ is a disjoint set of events whose union is $\Omega$, then
$P[A_i|B] = P[B|A_i]P[A_i]/(\sum_j P[B|A_j]P[A_j])$. Bayes theorem was first noticed in 1763 by Thomas
Bayes. It is considered by some to be to the theory of probability what the Pythagoras theorem is to
geometry. "Monty Hall" type stories [609] illustrate that conditional probability is not always
intuitive.
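The general form of Bayes theorem over a partition $A_1, \dots, A_n$ can be computed exactly with rational arithmetic. As an illustration, assume the usual Monty Hall rules (the host knows where the car is, never opens the contestant's door or the car's door, and picks uniformly when both remaining doors hide goats); these rules are an assumption, not part of the text.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes over a partition: priors[i] = P[A_i], likelihoods[i] = P[B | A_i]; returns P[A_i | B]."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P[B] by the law of total probability
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


# A_i: the car is behind door i (i = 1, 2, 3); the contestant picks door 1.
# B: the host opens door 3.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 2), Fraction(1), Fraction(0)]
print(posterior(priors, likelihoods))
```

Switching to door 2 wins with probability 2/3, the counter-intuitive part of the story.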

9. Graph theory

A finite simple graph $G = (V, E)$ is a finite collection $V$ of vertices connected by a finite
collection $E$ of edges, which are unordered pairs $\{a, b\}$ with $a, b \in V$. Simple means that neither
self-loops nor multiple connections are present in the graph. The vertex degree $d(x)$ of
$x \in V$ is the number of edges containing $x$.

Theorem: $\sum_{x \in V} d(x)/2 = |E|$.

This formula is also called the Euler handshake formula because every edge in a graph
contributes exactly two handshakes. It can be seen as a Gauss-Bonnet formula for the
valuation $G \mapsto v_1(G)$ counting the number of edges in $G$. A valuation $\varphi$ is a function
defined on sub-graphs with the property that $\varphi(A \cup B) = \varphi(A) + \varphi(B) - \varphi(A \cap B)$. Examples
of valuations are the numbers $v_k(G)$ of complete sub-graphs of dimension $k$ of $G$. Another
example is the Euler characteristic $\chi(G) = v_0(G) - v_1(G) + v_2(G) - v_3(G) + \cdots + (-1)^d v_d(G)$.
If we write $d_k(x) = v_{k-1}(S(x))$, where $S(x)$ is the unit sphere of $x$ and $v_{-1}(S(x)) = 1$, then
$\sum_{x \in V} d_k(x)/(k + 1) = v_k(G)$ is the generalized handshake formula, the Gauss-Bonnet result for $v_k$. The Euler
characteristic then satisfies $\sum_{x \in V} K(x) = \chi(G)$, where $K(x) = \sum_{k=0}^{\infty} (-1)^k v_{k-1}(S(x))/(k + 1)$.

This is the discrete Gauss-Bonnet result. The handshake result is the special case for the
valuation $v_1(G)$ counting the number of edges of a graph $G$. It was found by Euler and has
been called by some the fundamental theorem of graph theory. For more about graph theory,
see [76, 512, 38, 294]; about Euler, see [233].
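A brute-force sketch of the handshake formula and of discrete Gauss-Bonnet, assuming small graphs given as a vertex set and a set of two-element frozensets (the representation and function names are hypothetical). The curvature uses the convention $v_{-1}(S(x)) = 1$; the example is the octahedron, where every unit sphere is a 4-cycle and $K(x) = 1 - 4/2 + 4/3 = 1/3$.

```python
from fractions import Fraction
from itertools import combinations


def cliques(V, E):
    """All complete sub-graphs as vertex sets, listed by size; brute force, small graphs only."""
    found, k = [], 1
    while True:
        layer = [frozenset(c) for c in combinations(sorted(V), k)
                 if all(frozenset(p) in E for p in combinations(c, 2))]
        if not layer:
            return found
        found.extend(layer)
        k += 1


def f_vector(V, E):
    """(v_0, v_1, ...) where v_k counts complete sub-graphs with k + 1 vertices."""
    f = []
    for c in cliques(V, E):
        k = len(c) - 1
        if k == len(f):
            f.append(0)
        f[k] += 1
    return f


def euler_characteristic(V, E):
    return sum((-1) ** k * vk for k, vk in enumerate(f_vector(V, E)))


def unit_sphere(V, E, x):
    """Sub-graph generated by the neighbors of x."""
    W = frozenset(y for y in V if frozenset((x, y)) in E)
    return W, frozenset(e for e in E if e <= W)


def curvature(V, E, x):
    """K(x) = sum_k (-1)^k v_{k-1}(S(x)) / (k + 1), with v_{-1}(S(x)) = 1."""
    f = [1] + f_vector(*unit_sphere(V, E, x))
    return sum(Fraction((-1) ** k * fk, k + 1) for k, fk in enumerate(f))


# Octahedron: 6 vertices, all pairs joined except the antipodal pairs {0,1}, {2,3}, {4,5}.
V = frozenset(range(6))
E = frozenset(frozenset(p) for p in combinations(range(6), 2)) - {
    frozenset((0, 1)), frozenset((2, 3)), frozenset((4, 5))}

degrees = [len(unit_sphere(V, E, x)[0]) for x in V]
print(sum(degrees) // 2, len(E), sum(curvature(V, E, x) for x in V), euler_characteristic(V, E))
```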

10. Polyhedra

A finite simple graph $G = (V, E)$ is given by a finite vertex set $V$ and edge set $E$. A subset
$W$ of $V$ generates the sub-graph $(W, \{\{a, b\} \in E \mid a, b \in W\})$. The unit sphere of $x \in V$
is the sub-graph generated by $S(x) = \{y \in V \mid \{x, y\} \in E\}$. The empty graph $0 = (\emptyset, \emptyset)$
is called the $(-1)$-sphere. The 1-point graph $1 = (\{1\}, \emptyset) = K_1$ is the smallest contractible
graph. Inductively, a graph $G$ is contractible if it is either $1$ or if there exists $x \in V$ such that
both $G - x$ and $S(x)$ are contractible. Inductively, a graph $G$ is a $d$-sphere if it is either $0$ (for $d = -1$) or
if every $S(x)$ is a $(d - 1)$-sphere and there exists a vertex $x$ such that $G - x$ is contractible.
Let $v_k$ denote the number of complete sub-graphs $K_{k+1}$ of $G$. The vector $(v_0, v_1, \dots)$ is the
$f$-vector of $G$ and $\chi(G) = v_0 - v_1 + v_2 - \cdots$ is the Euler characteristic of $G$. The generalized
Euler gem formula due to Schläfli is:

Theorem: For $2$-spheres, $\chi(G) = v - e + f = 2$. For $d$-spheres, $\chi(G) = 1 + (-1)^d$.

Convex polytopes were studied already in ancient Greece. The Euler characteristic relations
were discovered in dimension 2 by Descartes [4] and interpreted topologically by Euler, who
proved the case $d = 2$. This is written as $v - e + f = 2$, where $v = v_0$, $e = v_1$, $f = v_2$. The
two-dimensional case can be stated for planar graphs, where one has a clear notion of what
the two-dimensional cells are and can use the topology of the ambient sphere in which the graph
is embedded. Historically there had been confusions [143, 600] about the definitions. It was
Ludwig Schläfli [630] who covered the higher dimensional case. The above set-up is a modern
reformulation of his set-up, due essentially to Alexander Evako. Multiple refutations [457] can
be blamed on ambiguous definitions. Polytopes are often defined through convexity [298, 771]
and there is not much consensus on a general definition [297], which was the reason in this
entry to formulate Schläfli's theorem in a rather restrictive case (where all cells are simplices),
but where we have a simple combinatorial definition of what a "sphere" is. See also [376].
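The inductive definitions of contractible graphs and $d$-spheres translate directly into recursive functions. This is an exponential-time sketch meant only for tiny graphs; the graph representation (vertex frozenset, edges as two-element frozensets) and all function names are hypothetical. It checks Schläfli's formula $\chi(G) = 1 + (-1)^d$ for a 5-cycle ($d = 1$) and the octahedron ($d = 2$).

```python
from itertools import combinations


def induced(E, W):
    """Sub-graph generated by the vertex subset W."""
    W = frozenset(W)
    return W, frozenset(e for e in E if e <= W)


def unit_sphere(V, E, x):
    return induced(E, [y for y in V if frozenset((x, y)) in E])


def is_contractible(V, E):
    """G is 1, or some x has both G - x and S(x) contractible (the empty graph is not)."""
    if len(V) == 1:
        return True
    return any(is_contractible(*induced(E, V - {x})) and is_contractible(*unit_sphere(V, E, x))
               for x in V)


def is_sphere(V, E, d):
    """d = -1: G is empty; otherwise all S(x) are (d-1)-spheres and some G - x is contractible."""
    if d == -1:
        return len(V) == 0
    return (len(V) > 0
            and all(is_sphere(*unit_sphere(V, E, x), d - 1) for x in V)
            and any(is_contractible(*induced(E, V - {x})) for x in V))


def euler_characteristic(V, E):
    """v_0 - v_1 + v_2 - ... where v_k counts complete sub-graphs with k + 1 vertices."""
    chi, k = 0, 1
    while True:
        vk = sum(1 for c in combinations(sorted(V), k)
                 if all(frozenset(p) in E for p in combinations(c, 2)))
        if vk == 0:
            return chi
        chi += (-1) ** (k - 1) * vk
        k += 1


C5 = frozenset(range(5))
E5 = frozenset(frozenset((i, (i + 1) % 5)) for i in range(5))
OCT = frozenset(range(6))
EOCT = frozenset(frozenset(p) for p in combinations(range(6), 2)) - {
    frozenset((0, 1)), frozenset((2, 3)), frozenset((4, 5))}
print(is_sphere(C5, E5, 1), euler_characteristic(C5, E5))
print(is_sphere(OCT, EOCT, 2), euler_characteristic(OCT, EOCT))
```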

11. Topology

The axiom of choice assures that the Cartesian product of a non-empty family of non-empty
sets is non-empty. Within the Zermelo-Fraenkel axiom system ZF, it is equivalent to the Zorn lemma
and also equivalent to the Tychonov theorem in topology.
Let $X = \prod_{i \in I} X_i$ denote the product of topological spaces. The product topology is the
weakest topology on $X$ which renders all projection functions $\pi_i: X \to X_i$ continuous.
Here is Tychonov's theorem:

Theorem: If all $X_i$ are compact, then $\prod_{i \in I} X_i$ is compact.

Zorn's lemma is due to Kazimierz Kuratowski in 1922 and Max August Zorn in 1935. Andrey
Nikolayevich Tykhonov proved his theorem in 1930. Applications of the Zorn lemma are the
Hahn-Banach theorem in functional analysis, the existence of spanning trees in infinite
graphs and the fact that non-zero commutative rings with unit have maximal ideals. For literature, see
[374].


12. Algebraic geometry

The algebraic set $V(J)$ of an ideal $J$ in the commutative ring $R = k[x_1, \dots, x_n]$ over an
algebraically closed field $k$ defines the ideal $I(V(J))$ containing all polynomials that vanish
on $V(J)$. The radical $\sqrt{J}$ of an ideal $J$ is the set of polynomials $r$ in $R$ such that $r^n \in J$ for
some positive $n$. [An ideal $J$ in a ring $R$ is a subgroup of the additive group of $R$ such that
$rx \in J$ for all $r \in R$ and all $x \in J$. It defines the quotient ring $R/J$ and so is the kernel of
a ring homomorphism from $R$ to $R/J$. The algebraic set $V(J) = \{x \in k^n \mid f(x) = 0\ \forall f \in J\}$
of an ideal $J$ in the polynomial ring $R$ is the set of common roots of all these functions $f$.
The algebraic sets are the closed sets in the Zariski topology of $k^n$. The ring $R/I(V)$ is the
coordinate ring of the algebraic set $V$.] The Hilbert Nullstellensatz is

Theorem: $I(V(J)) = \sqrt{J}$.

The theorem is due to Hilbert from 1893 [336] (page 320). Of course, Hilbert did not yet use the
language of ideals but formulated the result in terms of "ganze rationalen homogene Funktionen" of several
variables. A simple example is when $J = \langle p \rangle = \langle x^2 - 2xy + y^2 \rangle$ is the ideal generated by $p$ in
$\mathbb{C}[x, y]$; then $V(J) = \{x = y\}$ and $I(V(J))$ is the ideal generated by $x - y$, which is $\sqrt{J}$ since
$p = (x - y)^2$. For literature, see [321, 767].

13. Cryptology

An integer $p > 1$ is prime if 1 and $p$ are the only positive factors of $p$. The number $k \bmod p$ is the
remainder when dividing $k$ by $p$. For example $18 \bmod 7 = 4$. Fermat's little theorem is

Theorem: $a^p \equiv a \bmod p$ for every prime $p$ and every integer $a$.

The theorem was found by Pierre de Fermat in 1640. Leibniz gave a first proof in an unpublished
manuscript from before 1683; Euler published the first proof in 1736. The result is used in the Diffie-Hellman key
exchange, where a large public prime $p$ and a public base value $a$ are taken. Ana chooses a
number $x$ and publishes $X = a^x \bmod p$, and Bob picks $y$, publishing $Y = a^y \bmod p$. Their secret
key is $K = X^y = Y^x \bmod p$. An adversary Eve who only knows $a$, $p$, $X$ and $Y$ cannot get
$K$ from this, due to the difficulty of the discrete log problem. More generally, for possibly composite
numbers $n$, the theorem extends to the fact that $a^{\varphi(n)} \equiv 1 \bmod n$ for every $a$ coprime to $n$, where Euler's totient
function $\varphi(n)$ counts the number of positive integers less than $n$ which are coprime to $n$. This
generalized Fermat theorem is the key for RSA crypto systems. In order for Ana and Bob
to communicate, Bob publishes the product $n = pq$ of two large primes as well as a public
exponent $a$ coprime to $\varphi(n) = (p - 1)(q - 1)$. Neither Ana nor any third party Eve knows the factorization. Ana communicates a
message $x$ to Bob by sending $X = x^a \bmod n$ using modular exponentiation. Bob, who knows
$p, q$, can compute $\varphi(n)$ and find $b$ such that $ab \equiv 1 \bmod \varphi(n)$. By the generalized Fermat theorem,
$X^b = x^{ab} \equiv x \bmod n$ for $x$ coprime to $n$, so Bob recovers the message. Not even Ana herself could recover $x$ from $X$ alone.
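A toy sketch of the three ideas, using Python's built-in three-argument `pow` for modular exponentiation and `pow(a, -1, m)` (Python 3.8+) for the modular inverse. The RSA part follows the standard textbook form (message raised to the public exponent). All numbers are hypothetical and far too small for real use.

```python
def fermat_holds(a, p):
    """Check a^p = a mod p."""
    return pow(a, p, p) == a % p


def diffie_hellman(p, a, x, y):
    """Return Ana's and Bob's copies of K for public prime p and base a, secrets x and y."""
    X = pow(a, x, p)  # published by Ana
    Y = pow(a, y, p)  # published by Bob
    return pow(Y, x, p), pow(X, y, p)


def rsa_roundtrip(p, q, a, message):
    """Textbook RSA: encrypt with the public pair (n, a), decrypt with b where a * b = 1 mod phi(n)."""
    n, phi = p * q, (p - 1) * (q - 1)
    b = pow(a, -1, phi)  # requires gcd(a, phi) == 1
    X = pow(message, a, n)  # what Ana sends
    return pow(X, b, n)  # what Bob computes


print(diffie_hellman(23, 5, 6, 15))
print(rsa_roundtrip(61, 53, 17, 65))
```

Composite moduli can fail the Fermat test, for example $2^4 \bmod 4 = 0 \neq 2$, which is why the check is only guaranteed for primes.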

14. Spectral theorem

A bounded linear operator $A$ on a Hilbert space is called normal if $AA^* = A^*A$, where
$A^* = \overline{A}^T$ is the adjoint, $A^T$ is the transpose and $\overline{A}$ is the complex conjugate. Examples
of normal operators are self-adjoint operators (meaning $A = A^*$) or unitary operators
(meaning $AA^* = 1$).

Theorem: $A$ is normal if and only if $A$ is unitarily diagonalizable.

In finite dimensions, any unitary $U$ diagonalizing $A$ as $B = U^*AU$ contains an orthonormal
eigenbasis of $A$ as column vectors. The theorem is due to Hilbert. In the self-adjoint case,
all the eigenvalues are real, and in the unitary case, all eigenvalues are on the unit circle. The
result allows a functional calculus for normal operators: for any continuous function $f$ and
any normal operator $A$, one can define $f(A) = U f(B) U^*$ if $B = U^*AU$ is diagonal. See [151].
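For the simplest self-adjoint case, a real symmetric $2 \times 2$ matrix, the diagonalizing unitary can be written down as a rotation. This sketch (restricted to that case, with hypothetical helper names) computes $B = U^T A U$ by a Jacobi rotation with $\tan 2\theta = 2b/(a - c)$ and then applies the functional calculus $f(A) = U f(B) U^T$, here with $f = \sqrt{\cdot}$.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def diagonalize_symmetric_2x2(A):
    """Return (U, B) with U a rotation and B = U^T A U diagonal, for A = [[a, b], [b, c]]."""
    (a, b), (_, c) = A
    theta = 0.5 * math.atan2(2 * b, a - c)  # makes the off-diagonal entry b cos 2t - (a-c)/2 sin 2t vanish
    co, si = math.cos(theta), math.sin(theta)
    U = [[co, -si], [si, co]]
    return U, matmul(matmul(transpose(U), A), U)


def apply_function(f, A):
    """Functional calculus f(A) = U f(B) U^T, applying f to the diagonal entries of B."""
    U, B = diagonalize_symmetric_2x2(A)
    fB = [[f(B[0][0]), 0.0], [0.0, f(B[1][1])]]
    return matmul(matmul(U, fB), transpose(U))


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 1 and 3
U, B = diagonalize_symmetric_2x2(A)
R = apply_function(math.sqrt, A)  # a square root of A
print(B, matmul(R, R))
```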

15. Number systems

A monoid is a set $X$ equipped with an associative operation $*$ and an identity element
$1$ satisfying $1 * x = x * 1 = x$ for all $x \in X$. Associativity means $x * (y * z) = (x * y) * z$ for all
$x, y, z \in X$. The monoid structure belongs to a collection of mathematical structures magmas
$\supset$ semigroups $\supset$ monoids $\supset$ groups. A monoid is commutative if $x * y = y * x$ for
all $x, y \in X$. A group is a monoid in which every element $x$ has an inverse $y$ satisfying
$x * y = y * x = 1$.

Theorem: Every commutative monoid can be extended to a group.

The general result is due to Alexander Grothendieck from around 1957. A more precise statement
is that there is a group containing a homomorphic image of the monoid. It is for cancellative
monoids (where $a * x = b * x$ implies $a = b$) that the monoid is also contained isomorphically
inside the group. In general this fails: for a monoid with an absorbing element $0$, satisfying
$0 * x = 0$ for all $x$, which is not cancellative, the whole monoid collapses to the trivial group.
The group is called the Grothendieck group completion of the monoid. For example, the
additive monoid of natural numbers can be extended to the group of integers, and the
multiplicative monoid of non-zero integers can be extended to the group of non-zero rational numbers.
The construction of the group is used in K-theory [30, 388]. For insight about the philosophy
of Grothendieck's mathematics, see [508].
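For the cancellative monoid $(\mathbb{N}, +)$ the Grothendieck construction can be written out: a pair $(a, b)$ stands for the formal difference $a - b$, and $(a, b) \sim (c, d)$ if and only if $a + d = b + c$. This sketch (with hypothetical function names) stores each class by its reduced representative, so equality of classes is equality of pairs.

```python
def normalize(pair):
    """Reduced representative of the class of (a, b), the formal difference a - b."""
    a, b = pair
    m = min(a, b)
    return (a - m, b - m)


def add_classes(p, q):
    """Group operation: componentwise addition of pairs."""
    return normalize((p[0] + q[0], p[1] + q[1]))


def inverse(p):
    """The inverse of a - b is b - a."""
    return (p[1], p[0])


def embed(n):
    """The monoid element n of (N, +) becomes the class of (n, 0)."""
    return (n, 0)


print(add_classes(embed(3), inverse(embed(5))))  # the class representing -2
```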

16. Combinatorics

Let $|X|$ denote the cardinality of a finite set $X$. This means that $|X|$ is the number of elements
in $X$. A function $f$ from a set $X$ to a set $Y$ is called injective if $f(x) = f(y)$ implies $x = y$.
The pigeonhole principle tells:

Theorem: If $|X| > |Y|$ then no function $X \to Y$ can be injective.

This implies that if we place $n$ items into $m$ boxes and $n > m$, then one box must contain
more than one item. The principle is believed to have been formalized first by Peter Dirichlet. Despite
its simplicity, the principle has many applications, like proving that something exists. An
example is the statement that there are two trees in New York City streets which have the
same number of leaves. The reason is that the U.S. Forest Service counted 592'130 trees in
the year 2006 and that a mature, healthy tree has about 200'000 leaves, so there are far more
trees than plausible leaf counts. One can also use
it for less trivial statements, like that at a cocktail party there are at least two guests with the same
number of friends present at the party. A mathematical application is the Chinese remainder
theorem, stating that there exists a solution $x$ to the system $a_i x \equiv b_i \bmod m_i$ if all pairs $m_i, m_j$ with $i \neq j$
and all pairs $a_i, m_i$ are relatively prime [182, 494]. The principle generalizes to infinite sets if
$|X|$ is the cardinality. It then implies for example that there is no injective function from the
real numbers to the integers. For literature, see for example [100], which also states a stronger
version which for example allows to show that any sequence of $n^2 + 1$ distinct real numbers contains
either an increasing subsequence of length $n + 1$ or a decreasing subsequence of length $n + 1$.
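The subsequence statement (due to Erdős and Szekeres) has a pigeonhole proof: label each position $i$ with the pair (length of the longest increasing subsequence ending at $i$, length of the longest decreasing one ending at $i$). For distinct values these pairs are all different, so if both coordinates stayed $\leq n$ there would be at most $n^2$ labels for $n^2 + 1$ positions. The sketch below computes the labels by dynamic programming and checks the case $n = 2$ over all permutations of five numbers; the function name is hypothetical.

```python
from itertools import permutations


def longest_monotone(seq):
    """Return (longest increasing, longest decreasing) subsequence lengths, O(len^2)."""
    inc = [1] * len(seq)  # inc[i]: longest increasing subsequence ending at position i
    dec = [1] * len(seq)  # dec[i]: longest decreasing subsequence ending at position i
    for i in range(len(seq)):
        for j in range(i):
            if seq[j] < seq[i]:
                inc[i] = max(inc[i], inc[j] + 1)
            elif seq[j] > seq[i]:
                dec[i] = max(dec[i], dec[j] + 1)
    return max(inc, default=0), max(dec, default=0)


# n = 2: every sequence of n^2 + 1 = 5 distinct numbers has a monotone subsequence of length 3,
# while 4 numbers such as [2, 1, 4, 3] need not.
print(all(max(longest_monotone(p)) >= 3 for p in permutations(range(5))))
print(longest_monotone([2, 1, 4, 3]))
```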

17. Complex analysis

Assume $f$ is an analytic function in an open domain $G$ of the complex plane $\mathbb{C}$. Such
a function is also called holomorphic in $G$. Holomorphic means that if $f(x + iy) = u(x + iy) + iv(x + iy)$,
then the Cauchy-Riemann differential equations $u_x = v_y$, $u_y = -v_x$ hold in
$G$. Assume $a$ is in $G$ and assume $C \subset G$ is a circle $a + re^{i\theta}$ centered at $a$ which bounds a
disc $D = \{z \in \mathbb{C} \mid |z - a| < r\} \subset G$.

Theorem: For analytic $f$ and a circle $C \subset G$ as above, one has $f(a) = \frac{1}{2\pi i} \int_C \frac{f(z)}{z - a}\,dz$.

This integral formula of Cauchy is used for other results and estimates. It implies
for example the Cauchy integral theorem assuring that $\int_C f(z)\,dz = 0$ for any simple closed
curve $C$ in $G$ bounding a simply connected region $D \subset G$. Morera's theorem assures that
for any domain $G$, if $f$ is continuous and $\int_C f(z)\,dz = 0$ for all simple closed smooth curves $C$ in $G$, then $f$ is
holomorphic in $G$. Another generalization is residue calculus: let $G$ be a simply connected region
and $f$ a function which is analytic except in a finite set $A$ of points. If $C$ is a piecewise smooth
continuous closed curve not intersecting $A$, then $\int_C f(z)\,dz = 2\pi i \sum_{a \in A} I(C, a)\,\mathrm{Res}(f, a)$, where
$I(C, a)$ is the winding number of $C$ with respect to $a$ and $\mathrm{Res}(f, a)$ is the residue of $f$ at $a$,
which in the case of simple poles is given by $\lim_{z \to a} (z - a) f(z)$. See [120, 10, 150].
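The integral formula can be checked numerically by parametrizing the circle as $z(t) = a + r e^{it}$, so that $dz = i r e^{it}\,dt$. This sketch uses the trapezoidal rule (a choice made here; it converges very fast for periodic analytic integrands); the number of nodes and the test function $e^z$ are hypothetical choices.

```python
import cmath
import math


def circle_integral(g, center, radius, n=256):
    """Trapezoidal rule for the integral of g over the positively oriented circle center + radius e^{it}."""
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)  # z - center at the k-th node
        total += g(center + w) * 1j * w  # g(z) dz/dt with dz/dt = i (z - center)
    return total * (2 * math.pi / n)


def cauchy_formula(f, a, radius=1.0):
    """(1 / 2 pi i) times the integral of f(z) / (z - a) over the circle of given radius around a."""
    return circle_integral(lambda z: f(z) / (z - a), a, radius) / (2j * math.pi)


a = 0.3 + 0.2j
print(cauchy_formula(cmath.exp, a), cmath.exp(a))
print(circle_integral(lambda z: 1 / z, 0, 1.0))  # residue 1 at 0, winding number 1: 2 pi i
```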

18. Linear algebra

Let $A$ be an $m \times n$ matrix with image $\mathrm{ran}(A)$ and kernel $\ker(A)$. If $V$ is a linear subspace of
$\mathbb{R}^m$, then $V^\perp$ denotes the orthogonal complement of $V$ in $\mathbb{R}^m$, the linear space of vectors
perpendicular to all $x \in V$.

Theorem: $\dim(\ker A) + \dim(\mathrm{ran} A) = n$, $\quad \dim((\mathrm{ran} A)^\perp) = \dim(\ker A^T)$.

The result is used in data fitting, for example when understanding the least-squares solution
$x = (A^T A)^{-1} A^T b$ of a system of linear equations $Ax = b$. It assures that $A^T A$ is
invertible if $A$ has a trivial kernel. The result is a bit stronger than the rank-nullity theorem
$\dim(\mathrm{ran}(A)) + \dim(\ker(A)) = n$ alone and implies that for finite $m \times n$ matrices the index
$\dim(\ker A) - \dim(\ker A^*)$ is always $n - m$, which is the value for the $0$ matrix. For literature, see
[684]. The result has an abstract generalization in the form of the first isomorphism theorem
for a group homomorphism $f$, stating that $G/\ker(f)$ is isomorphic to $f(G)$. It can also be
described using the singular value decomposition $A = UDV^T$. If $r = \dim(\mathrm{ran} A)$, then $\mathrm{ran} A$ has
as a basis the first $r$ columns of $U$, $\ker A$ has as a basis the last $n - r$
columns of $V$, $\mathrm{ran} A^T$ has as a basis the first $r$ columns of $V$, and $\ker A^T$
has as a basis the last $m - r$ columns of $U$.
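Both dimension formulas can be verified exactly with rational arithmetic. This sketch (hypothetical helper names) row-reduces $A$, counts pivots for $\dim(\mathrm{ran} A)$, builds an explicit kernel basis from the free columns, and computes $\ker A^T$ independently, so the index $\dim(\ker A) - \dim(\ker A^T) = n - m$ is actually tested rather than assumed.

```python
from fractions import Fraction


def rref(A):
    """Reduced row echelon form over the rationals; returns (matrix, pivot column indices)."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        if r == rows:
            break
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c belongs to a free variable
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [u - factor * v for u, v in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return M, pivots


def kernel_basis(A):
    """One kernel vector per free column: set that variable to 1 and solve for the pivot variables."""
    M, pivots = rref(A)
    n = len(A[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row, p in enumerate(pivots):
            v[p] = -M[row][free]
        basis.append(v)
    return basis


def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]  # a 3 x 4 example of rank 2
m, n = len(A), len(A[0])
rank = len(rref(A)[1])
print(rank, len(kernel_basis(A)), len(kernel_basis(transpose(A))))
```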

19. Differential equations

A differential equation $\frac{d}{dt} x = f(x)$ with $x(0) = x_0$ in a Banach space $(X, \|\cdot\|)$ (a normed,
complete vector space) defines an initial value problem: we look for a solution $x(t)$ satisfying
the equation and the given initial condition $x(0) = x_0$ for $t \in (-a, a)$ for some $a > 0$. A function
$f$ from $X$ to $X$ is called Lipschitz if there exists a constant $C$ such that for all $x, y \in X$ the
inequality $\|f(x) - f(y)\| \leq C\|x - y\|$ holds.

Theorem: If $f$ is Lipschitz, a unique solution of $x' = f(x)$, $x(0) = x_0$ exists on some interval $(-a, a)$.

This result is due to Picard and Lindelöf from 1894. Replacing the Lipschitz condition with
continuity still gives an existence theorem in finite dimensions, which is due to Giuseppe Peano in 1886, but
uniqueness can fail, like for $x' = \sqrt{x}$, $x(0) = 0$ with the solutions $x = 0$ and $x(t) = t^2/4$ for $t \geq 0$. The
example $x'(t) = x^2(t)$, $x(0) = 1$ with solution $1/(1 - t)$ shows that we cannot expect solutions to exist
for all $t$. The proof is a simple application of the Banach fixed point theorem to the map
$x(t) \mapsto x_0 + \int_0^t f(x(s))\,ds$. For literature, see [140].
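The fixed point argument suggests Picard iteration: start with the constant function $x_0$ and repeatedly apply $x(t) \mapsto x_0 + \int_0^t f(x(s))\,ds$. This is a scalar sketch on a uniform grid with the cumulative trapezoidal rule; the grid size and iteration count are hypothetical choices, and for $x' = x^2$ the interval is kept short, before the blow-up at $t = 1$.

```python
import math


def picard(f, x0, t_max, steps=1000, iterations=30):
    """Iterate the Picard map on a uniform grid of [0, t_max]; return the grid values of the last iterate."""
    h = t_max / steps
    x = [x0] * (steps + 1)  # start from the constant function x(t) = x0
    for _ in range(iterations):
        fx = [f(v) for v in x]
        new = [x0]
        for k in range(steps):
            new.append(new[-1] + h * (fx[k] + fx[k + 1]) / 2)  # cumulative trapezoidal integral
        x = new
    return x


print(picard(lambda v: v, 1.0, 1.0)[-1], math.e)  # x' = x, x(0) = 1: x(1) = e
print(picard(lambda v: v * v, 1.0, 0.5)[-1])  # x' = x^2, x(0) = 1: x(1/2) = 1 / (1 - 1/2) = 2
```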

20. Logic

An axiom system $A$ is a collection of formal statements assumed to be true. We assume it to
contain the basic Peano axioms of arithmetic. (One only needs first order Peano arithmetic
PA; for the first incompleteness theorem one can even do with the weaker Robinson arithmetic.)
We also assume that the axioms can be listed by an algorithm. An axiom system is complete
if every true statement can be proven within the system. The
system is consistent if one cannot prove $1 = 0$ within the system. It is provably consistent
if one can prove the theorem "The axiom system $A$ is consistent." within the system. It is
important that the axiom system is strong enough to contain the Peano arithmetic, as there are
interesting and widely studied theories that happen to be complete, such as the theory of real
closed fields.

Theorem: A consistent axiom system is neither complete nor provably consistent.

The result is due to Kurt Gödel, who proved it in 1931. In his thesis, Gödel had proven
a completeness theorem of first order predicate logic. The incompleteness theorems of 1931
destroyed the dream of Hilbert's program, which aimed for a complete and consistent axiom
system for mathematics. A commonly assumed axiom system is the Zermelo-Fraenkel axiom
system together with the axiom of choice, ZFC. Other examples are Quine's new foundations
NF or Lawvere's elementary theory of the category of sets ETCS. For a modern view on
Hilbert's program, see [705]. For Gödel's theorem, see [246, 537]. Hardly any other theorem has had so
much impact outside of mathematics.

21. Representation theory

For a finite group or compact topological group $G$, one can look at representations,
group homomorphisms from $G$ to the automorphisms of a vector space $V$. A representation
of $G$ is irreducible if the only $G$-invariant subspaces of $V$ are $0$ or $V$. The direct sum of
two representations $\varphi, \psi$ is defined as $(\varphi \oplus \psi)(g)(v \oplus w) = \varphi(g)(v) \oplus \psi(g)(w)$. A representation is
semisimple if it is a direct sum of irreducible finite-dimensional representations, unique up to
isomorphism and order of the summands:

Theorem: Finite-dimensional continuous representations of compact topological groups are semisimple.

For representation theory, see [743]. Pioneers in representation theory were Ferdinand Georg
Frobenius, Hermann Weyl, and Élie Cartan. Examples of compact groups are finite groups, or
compact Lie groups (a smooth manifold which is also a group for which the multiplication
and inverse operations are smooth) like the torus group $\mathbb{T}^n$, the orthogonal groups $O(n)$ of
all orthogonal $n \times n$ matrices, the unitary groups $U(n)$ of all unitary $n \times n$ matrices or the
compact symplectic groups $Sp(n)$ of all unitary quaternionic $n \times n$ matrices. Examples of compact groups that are not Lie groups are
the groups $\mathbb{Z}_p$ of $p$-adic integers, which are examples of profinite groups.


22. Lie theory

Given a topological group $G$, a Borel measure $\mu$ on $G$ is called left invariant if $\mu(gA) = \mu(A)$
for every $g \in G$ and every measurable set $A \subset G$. A non-zero, regular, left-invariant measure on $G$ is also
called a Haar measure. A topological space is called locally compact if every point has a
compact neighborhood.

Theorem: A locally compact group has a Haar measure, unique up to a positive multiplicative constant.

Alfréd Haar showed the existence in 1933 and John von Neumann proved that it is unique.
In the compact case, the measure is finite, leading to an inner product and so to unitary
representations. Locally compact Abelian groups $G$ can be understood by their characters,
continuous group homomorphisms from $G$ to the circle group $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. The set of characters
defines a new locally compact group $\hat{G}$, the dual of $G$. The multiplication is the pointwise
multiplication, the inverse is the complex conjugate and the topology is the one of uniform
convergence on compact sets. If $G$ is compact, then $\hat{G}$ is discrete, and if $G$ is discrete, then
$\hat{G}$ is compact. In order to prove Pontryagin duality $\hat{\hat{G}} = G$, one needs a generalized Fourier
transform $\hat{f}(\chi) = \int_G f(x) \chi(x)\,d\mu(x)$ which uses the Haar measure. The inverse Fourier
transform gives back $f$ using the dual Haar measure. The Haar measure is also used to
define the convolution $(f \star g)(x) = \int_G f(x - y) g(y)\,d\mu(y)$, rendering $L^1(G)$ a Banach algebra.
The Fourier transform then produces a homomorphism from $L^1(G)$ to $C_0(\hat{G})$ or a unitary
transformation from $L^2(G)$ to $L^2(\hat{G})$. For literature, see [132, 729].
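The simplest non-trivial example is the finite cyclic group $G = \mathbb{Z}_n$, where the normalized counting measure $\mu(A) = |A|/n$ is a Haar measure and the characters are $\chi_k(x) = e^{2\pi i k x/n}$. This sketch follows the convention of the text ($\hat{f}(\chi) = \int_G f \chi\,d\mu$) and, as an assumption about normalization, uses the unnormalized counting measure on $\hat{G}$ together with conjugated characters for the inverse transform. It checks the inversion and that the transform turns convolution into pointwise multiplication.

```python
import cmath


def character(n, k, x):
    """chi_k(x) = e^{2 pi i k x / n} on Z_n."""
    return cmath.exp(2j * cmath.pi * k * x / n)


def fourier(f):
    """f_hat(k) = integral of f(x) chi_k(x) dmu(x), mu = normalized counting measure on Z_n."""
    n = len(f)
    return [sum(f[x] * character(n, k, x) for x in range(n)) / n for k in range(n)]


def inverse_fourier(fhat):
    """Sum over the dual group (counting measure) with conjugated characters."""
    n = len(fhat)
    return [sum(fhat[k] * character(n, k, x).conjugate() for k in range(n)) for x in range(n)]


def convolve(f, g):
    """(f * g)(x) = integral of f(x - y) g(y) dmu(y) on Z_n."""
    n = len(f)
    return [sum(f[(x - y) % n] * g[y] for y in range(n)) / n for x in range(n)]


f, g = [1, 2, 3, 4], [0, 1, 0, -1]
lhs = fourier(convolve(f, g))
rhs = [u * v for u, v in zip(fourier(f), fourier(g))]
print(max(abs(u - v) for u, v in zip(lhs, rhs)))
```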

23. Computability

The class of general recursive functions is the smallest class of functions which contains the
basic functions (like projections) and is closed under iteration (primitive recursion), composition and minimization.
The class of Turing computable functions consists of the functions which can be implemented by a Turing machine
possessing finitely many states. Turing introduced this in 1936 [572].

Theorem: The class of general recursive functions is the class of Turing computable functions.

Kurt Gödel and Jacques Herbrand defined the class of general recursive functions around 1933.
They were motivated by work of Alonzo Church, who then created the $\lambda$ calculus later in 1936.
Alan Turing developed the idea of a Turing machine, which can replace Herbrand-Gödel
recursion and $\lambda$ calculus. The Church thesis or Church-Turing thesis states that everything
we can compute is generally recursive. As "whatever we can compute" is not formally defined,
this will always remain a thesis unless some more effective computation concept emerges.
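The closure operations can be imitated with higher-order functions. This is only an informal sketch (Python is used as the host language, and the loop inside `primitive_recursion` merely simulates the recursion scheme); the names `primitive_recursion`, `minimization`, `plus`, `times` and `isqrt` are hypothetical. The comparison inside the predicate for `isqrt` uses Python's `>` for brevity, although it could itself be built from the basic functions.

```python
def primitive_recursion(base, step):
    """h(0, *args) = base(*args) and h(k + 1, *args) = step(k, h(k, *args), *args)."""
    def h(n, *args):
        acc = base(*args)
        for k in range(n):
            acc = step(k, acc, *args)
        return acc
    return h


def minimization(predicate):
    """mu-operator: least n with predicate(n, *args) == 0; loops forever if there is none."""
    def mu(*args):
        n = 0
        while predicate(n, *args) != 0:
            n += 1
        return n
    return mu


def successor(x):
    return x + 1


plus = primitive_recursion(lambda y: y, lambda k, acc, y: successor(acc))  # plus(n, y) = n + y
times = primitive_recursion(lambda y: 0, lambda k, acc, y: plus(y, acc))  # times(n, y) = n * y
isqrt = minimization(lambda n, x: 0 if times(n + 1, n + 1) > x else 1)  # least n with (n + 1)^2 > x

print(plus(3, 4), times(6, 7), isqrt(10))
```

Minimization is what makes partial functions possible: if no $n$ satisfies the predicate, the search never halts, mirroring a Turing machine that does not stop.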

24. Category theory

Given an element A in a category C , let hA denote the functor which assigns to a set X the
set Hom(A, X) of all morphisms from A to X . Given a functor F from C to the category
S = Set, let N (G, F ) be the set of natural transformations from G = hA to F . (A natural
transformation between two functors G and F from C to S assigns to every object x in
C a morphism ηx : G(x) → F (x) such that for every morphism f : x → y in C we have
ηy ◦ G(f ) = F (f ) ◦ ηx .) The functor category dened by C and S has as objects the functors
F and as morphisms the natural transformations. The Yoneda lemma is

10
OLIVER KNILL

Theorem: N (hA , F ) can be identied with F (A).

Category theory was introduced in 1945 by Samuel Eilenberg and Saunders Mac Lane. The lemma above is due to Nobuo Yoneda from 1954. It allows one to see a category embedded in a functor category, which is a topos and serves as a sort of completion. One can for example identify a set S with Hom(1, S). Another example is Cayley's theorem, which states that every group G can be realized as a group of permutations of G, acting by left multiplication. For category theory, see [507, 458]. For history, [449].
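Cayley's theorem can be checked by hand on a small example. The following Python sketch assumes permutations of {0, 1, 2} are stored as tuples (the names are chosen for this illustration); it represents each element g of the symmetric group S3 by the permutation x ↦ g ∗ x of the six group elements and verifies that this map is an injective homomorphism.

```python
from itertools import permutations


def compose(p, q):
    """Composition of permutations stored as tuples: (p ∘ q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


G = list(permutations(range(3)))          # the symmetric group S3, 6 elements
index = {g: k for k, g in enumerate(G)}   # label the elements 0..5


def cayley(g):
    """Left multiplication x -> g * x, written as a permutation of the labels 0..5."""
    return tuple(index[compose(g, x)] for x in G)


images = {g: cayley(g) for g in G}
is_injective = len(set(images.values())) == len(G)
is_homomorphism = all(
    images[compose(g, h)] == compose(images[g], images[h]) for g in G for h in G
)
print(is_injective, is_homomorphism)  # True True
```

The homomorphism property is just associativity, (g ∗ h) ∗ x = g ∗ (h ∗ x), which is the same mechanism that drives the Yoneda embedding.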

25. Perturbation theory

A function f of several variables is called smooth if all its partial derivatives, like the first partial derivatives $\partial_x f, \partial_y f$ and the second partial derivatives like $\partial_x \partial_y f(x, y) = f_{xy}(x, y)$, exist and are continuous functions. Assume f(x, y) is a smooth function of two Euclidean variables $x, y \in \mathbb{R}^n$ with values in $\mathbb{R}^n$. If $f(a, 0) = 0$, we say a is a root of $x \mapsto f(x, 0)$. If the derivative $f_x(a, 0)$ is invertible, the root is called non-degenerate. If there is a solution $f(g(y), y) = 0$ such that $g(0) = a$ and g is continuous, the root a has a local continuation and we say that it persists under perturbation.

Theorem: A non-degenerate root persists under perturbation.

This is the implicit function theorem. There are concrete and fast algorithms to compute the continuation. An example is the Newton method, which iterates $T(x) = x - f(x, y)/f_x(x, y)$ to find the roots of $x \mapsto f(x, y)$ for fixed y. The importance of the implicit function theorem is both theoretical and applied. The result assures that one can make statements about a complicated theory near a model which is understood. There are related situations, like when we want to continue a solution of $F(x, y) = (f(x, y), g(x, y)) = (0, 0)$, giving equilibrium points of the vector field F. The Newton step $T(x, y) = (x, y) - dF(x, y)^{-1} F(x, y)$ then allows a continuation if $dF(x, y)$ is invertible. This means that small deformations of F do not change the nature of the equilibrium points. When equilibrium points change, the system exhibits bifurcations. This applies in particular to $F(x, y) = \nabla f(x, y)$, where equilibrium points are critical points. The derivative dF of F is then the Hessian. [447] call it one of the most important and oldest paradigms in modern mathematics, for which the germ of the idea was already formed in the writings of Isaac Newton and Gottfried Leibniz, but which only ripened under Augustin-Louis Cauchy to the theorem we know today.
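The continuation by Newton's method can be sketched in a few lines of Python. This is a minimal sketch for the scalar case n = 1, assuming the example f(x, y) = x² − 2 − y (chosen here for illustration), whose root √2 at y = 0 is non-degenerate because $f_x = 2x \neq 0$ there; each Newton solve is warm-started at the root found for the previous parameter value.

```python
def newton_root(f, fx, x0, y, tol=1e-12, max_iter=50):
    """Newton iteration T(x) = x - f(x, y)/f_x(x, y) for a fixed parameter y."""
    x = x0
    for _ in range(max_iter):
        step = f(x, y) / fx(x, y)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def continue_root(f, fx, a, ys):
    """Follow the root g(y), g(0) = a, along the parameter values ys.
    Each Newton solve starts from the root found at the previous parameter."""
    roots = []
    x = a
    for y in ys:
        x = newton_root(f, fx, x, y)
        roots.append(x)
    return roots


# f(x, y) = x^2 - 2 - y; the exact continuation is g(y) = sqrt(2 + y)
f = lambda x, y: x * x - 2.0 - y
fx = lambda x, y: 2.0 * x
ys = [k / 10 for k in range(11)]
roots = continue_root(f, fx, 1.4, ys)   # start from the rough guess 1.4 for sqrt(2)
```

Warm-starting is the design choice that makes this a continuation: as long as $f_x$ stays invertible along the path, small parameter steps keep each starting point inside the basin of quadratic convergence.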

26. Counting

A simplicial complex X is a finite set of non-empty sets that is closed under the operation of taking finite non-empty subsets. The Euler characteristic χ of a simplicial complex X is defined as $\chi(X) = \sum_{x \in X} (-1)^{\dim(x)}$, where the dimension dim(x) of a set x is its cardinality |x| minus 1.

Theorem: χ(X × Y) = χ(X)χ(Y).

For zero-dimensional simplicial complexes X (meaning that all sets in X have cardinality 1), we get the rule of product: if you have m ways to do one thing and n ways to do another, then there are mn ways to do both. This fundamental counting principle is used for example in probability theory. The Cartesian product X × Y of two complexes is defined as the set-theoretical product of the two finite sets, whose elements are the pairs (x, y) with x ∈ X and y ∈ Y. It is no longer a simplicial complex in general, but it has the same Euler characteristic as its Barycentric refinement $(X \times Y)_1$, which is a simplicial complex. The dimension of a pair (A, B) in X × Y is dim(A) + dim(B). If $p_X(t) = \sum_{k=0}^{n} v_k(X) t^k$ is the generating function of the numbers $v_k(X)$ of k-dimensional sets in X, then $p_{X \times Y}(t) = p_X(t) p_Y(t)$, implying the counting principle since $p_X(-1) = \chi(X)$. The function $p_X(t)$ is called the Euler polynomial of X. The importance of the Euler characteristic as a counting tool lies in the fact that only $\chi(X) = p_X(-1)$ is invariant under Barycentric subdivision, $\chi(X) = \chi(X_1)$, where $X_1$ is the complex which consists of the vertex sets of all complete subgraphs of the graph in which the sets of X are the vertices and where two are connected if one is contained in the other. The concept of Euler characteristic carries over to continuum spaces like manifolds, where the product property holds too. See for example [14].
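A small Python sketch can check the product formula and the invariance under Barycentric refinement on examples. It assumes complexes are given by their facets, stores sets as frozensets and, following the text, gives a pair (x, y) in X × Y the dimension dim(x) + dim(y); the function names are chosen for this illustration.

```python
from itertools import combinations


def closure(facets):
    """The simplicial complex generated by the facets: all their non-empty subsets."""
    X = set()
    for facet in facets:
        facet = tuple(facet)
        for k in range(1, len(facet) + 1):
            X.update(frozenset(c) for c in combinations(facet, k))
    return X


def euler_char(X):
    """chi(X) = sum over x in X of (-1)^dim(x), with dim(x) = |x| - 1."""
    return sum((-1) ** (len(x) - 1) for x in X)


def euler_char_product(X, Y):
    """chi(X x Y), where the pair (x, y) has dimension dim(x) + dim(y)."""
    return sum((-1) ** (len(x) + len(y) - 2) for x in X for y in Y)


def barycentric(X):
    """X_1: the chains x_0 < x_1 < ... of sets of X (complete subgraphs of the inclusion graph)."""
    elems = sorted(X, key=len)   # a strict superset always comes later in this order
    chains = set()

    def extend(chain, start):
        chains.add(frozenset(chain))
        for i in range(start, len(elems)):
            if chain[-1] < elems[i]:     # strict inclusion of frozensets
                extend(chain + [elems[i]], i + 1)

    for i, x in enumerate(elems):
        extend([x], i + 1)
    return chains


triangle = closure([(1, 2, 3)])                  # a disk: chi = 1
circle = closure([(1, 2), (2, 3), (1, 3)])       # a circle: chi = 0
two_points = closure([(1,), (2,)])
three_points = closure([(1,), (2,), (3,)])

print(euler_char(triangle), euler_char(circle))          # 1 0
print(euler_char_product(two_points, three_points))      # 6, the rule of product
print(euler_char(barycentric(triangle)))                 # 1
```

The Barycentric refinement of the triangle has 7 vertices, 12 edges and 6 triangles, and 7 − 12 + 6 = 1 agrees with the Euler characteristic of the triangle itself.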

27. Metric spaces

A continuous map T : X → X, where (X, d) is a complete non-empty metric space, is called a contraction if there exists a real number 0 < λ < 1 such that $d(T(x), T(y)) \le \lambda\, d(x, y)$ for all x, y ∈ X. The space is called complete if every Cauchy sequence in X has a limit in X. (A sequence $x_n$ in X is called Cauchy if for all ϵ > 0 there exists n > 0 such that $d(x_i, x_j) < ϵ$ for all i, j > n.)

Theorem: A contraction has a unique fixed point in X.

This result is the Banach fixed point theorem, proven by Stefan Banach in 1922. The example $T(x) = (1 - x^2)/2$ on $X = \mathbb{Q} \cap [0.3, 0.6]$, having contraction rate λ = 0.6 (since $|T'(x)| = |x| \le 0.6$ there) and $T(X) \subset \mathbb{Q} \cap [0.32, 0.455] \subset X$, shows that completeness is necessary. The unique fixed point of T in [0.3, 0.6] is $\sqrt{2} - 1 = 0.414...$, which is not in $\mathbb{Q}$ because $\sqrt{2} = p/q$ would imply $2q^2 = p^2$, which is not possible for integers, as the left hand side has an odd number of prime factors 2 while the right hand side has an even number of prime factors 2. See [560].
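The example can be run directly. The following Python sketch iterates the contraction from the text in floating point, stopping with the standard a posteriori bound $d(x_{n+1}, x^*) \le \frac{\lambda}{1-\lambda} d(x_{n+1}, x_n)$, and repeats a few steps in exact rational arithmetic to show that the iterates stay in $\mathbb{Q} \cap [0.3, 0.6]$ while their limit is irrational. The starting point 1/2 and the tolerance are arbitrary choices for the illustration.

```python
import math
from fractions import Fraction


def T(x):
    """The contraction T(x) = (1 - x^2)/2 on [0.3, 0.6]; it maps rationals to rationals."""
    return (1 - x * x) / 2


def banach_iterate(T, x0, lam, tol):
    """Iterate x_{n+1} = T(x_n) until the a posteriori error bound
    lam/(1 - lam) * |x_{n+1} - x_n| drops below tol."""
    x = x0
    while True:
        x_next = T(x)
        if lam / (1 - lam) * abs(x_next - x) < tol:
            return x_next
        x = x_next


x_star = banach_iterate(T, 0.5, 0.6, 1e-12)
error = abs(x_star - (math.sqrt(2) - 1))

# Exact rational iterates stay in Q ∩ [0.3, 0.6] but never reach sqrt(2) - 1.
rational_iterates = [Fraction(1, 2)]
for _ in range(4):
    rational_iterates.append(T(rational_iterates[-1]))
print(rational_iterates[:3])  # [Fraction(1, 2), Fraction(3, 8), Fraction(55, 128)]
```

The denominators grow quickly (squaring at each step), which is the computational shadow of the fact that the Cauchy sequence has no limit in $\mathbb{Q}$.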

28. Dirichlet series

The abscissa of simple convergence of a Dirichlet series $\zeta(s) = \sum_{n=1}^{\infty} a_n e^{-\lambda_n s}$ is $\sigma_0 = \inf\{a \in \mathbb{R} \mid \zeta(z) \text{ converges for all } \mathrm{Re}(z) > a\}$. For $\lambda_n = n$ we have the Taylor series $f(z) = \sum_{n=1}^{\infty} a_n z^n$ with $z = e^{-s}$. For $\lambda_n = \log(n)$ we have the standard Dirichlet series $\sum_{n=1}^{\infty} a_n/n^s$. For example, for $a_n = z^n$, one gets the poly-logarithm $\mathrm{Li}_s(z) = \sum_{n=1}^{\infty} z^n/n^s$, and especially $\mathrm{Li}_s(1) = \zeta(s)$, the Riemann zeta function, or the Lerch transcendent $\Phi(z, s, a) = \sum_{n=0}^{\infty} z^n/(n + a)^s$. Define $S(n) = \sum_{k=1}^{n} a_k$. Cahen's formula applies if the series $\sum_n a_n$ does not converge.

Theorem: $\sigma_0 = \limsup_{n \to \infty} \frac{\log |S(n)|}{\lambda_n}$.

There is a similar formula for the abscissa of absolute convergence of ζ, which is defined as $\sigma_a = \inf\{a \in \mathbb{R} \mid \zeta(z) \text{ converges absolutely for all } \mathrm{Re}(z) > a\}$. The result is $\sigma_a = \limsup_{n \to \infty} \log(\sum_{k=1}^{n} |a_k|)/\lambda_n$. For example, the Dirichlet eta function $\eta(s) = \sum_{n=1}^{\infty} (-1)^{n-1}/n^s$ has the abscissa of convergence $\sigma_0 = 0$ and the abscissa of absolute convergence $\sigma_a = 1$. For $0 < \alpha < 1$, the series $\sum_{n=1}^{\infty} e^{i n^{\alpha}}/n^s$ has $\sigma_a = 1$ and $\sigma_0 = 1 - \alpha$. If $a_n$ is multiplicative, $a_{nm} = a_n a_m$ for relatively prime n, m, then $\sum_{n=1}^{\infty} a_n/n^s = \prod_p (1 + a_p/p^s + a_{p^2}/p^{2s} + \cdots)$ generalizes the Euler golden key formula $\sum_n 1/n^s = \prod_p (1 - 1/p^s)^{-1}$. See [316, 318].
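Cahen's formula can be probed numerically. The following Python sketch is only a crude heuristic, assuming $\lambda_n = \log(n)$ and replacing the lim sup by the maximum of $\log|S(n)|/\log(n)$ over the tail $N/2 < n \le N$; the function name `cahen_estimate` and the cutoff N are choices made for this illustration. It is applied to the eta function, to $|a_n| = 1$ (absolute convergence) and to $a_n = e^{i n^{\alpha}}$ with α = 1/2.

```python
import cmath
import math


def cahen_estimate(a, N):
    """Crude finite-N stand-in for limsup log|S(n)| / lambda_n with lambda_n = log(n):
    the maximum of log|S(n)|/log(n) over the tail N/2 < n <= N.
    Indices with S(n) = 0 are skipped, since log 0 is undefined."""
    S = 0
    best = -math.inf
    for n in range(1, N + 1):
        S += a(n)
        if n > N // 2 and abs(S) > 0:
            best = max(best, math.log(abs(S)) / math.log(n))
    return best


N = 100_000
eta = cahen_estimate(lambda n: (-1) ** (n - 1), N)              # eta function: sigma_0 = 0
eta_abs = cahen_estimate(lambda n: 1, N)                          # sum of |a_k|: sigma_a = 1
alpha = 0.5
osc = cahen_estimate(lambda n: cmath.exp(1j * n ** alpha), N)    # sigma_0 = 1 - alpha
```

For the oscillating series the estimate lands somewhat above 1 − α = 0.5, since $|S(n)|$ grows like a constant times $n^{1-\alpha}$ and the constant contributes a term of order $1/\log(N)$ that decays only slowly.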

29. Trigonometry

Mathematicians had a long and painful struggle with the concept of limit. One of the first to ponder the question was Zeno of Elea around 450 BC [501]. Archimedes of Syracuse made some progress around 250 BC. Since Augustin-Louis Cauchy [120], one uses the notion of limits. See also [282]. Today, one defines the limit $\lim_{x \to a} f(x) = b$ to exist if and only if for all ϵ > 0 there exists a δ > 0 such that if $0 < |x - a| < \delta$, then $|f(x) - b| < ϵ$. A place where limits appear is the computation of derivatives $g'(0) = \lim_{x \to 0} [g(x) - g(0)]/x$. In the case g(x) = sin(x), one has to understand the limit of the function f(x) = sin(x)/x, which is the sinc function. A prototype result is the fundamental theorem of trigonometry (called as such in some calculus texts like [97]).

Theorem: $\lim_{x \to 0} \sin(x)/x = 1$.

It appears strange to give weight to such a special result, but it explains the difficulty of the limit and the l'Hôpital rule of 1694, which was found by Johann Bernoulli and published in a book by Guillaume de l'Hôpital: the limit can be obtained by differentiating both the numerator and the denominator and taking the limit of the quotient. The result allows one to derive (using trigonometric identities) that in general sin′(x) = cos(x) and cos′(x) = −sin(x). One single limit is the gateway. It is also important culturally because it embraces thousands of years of struggle. It was Archimedes who used the theorem when computing the circumference 2πr of the circle by exhaustion, using regular polygons from the inside and the outside. Comparing the lengths of the approximations essentially amounted to battling that fundamental theorem of trigonometry. The identity is therefore at the epicenter of the development of trigonometry, differentiation and integration.
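Archimedes' squeeze can be reproduced in Python. This sketch uses the classical doubling recurrences for the half-perimeters $p_n = n \sin(\pi/n)$ and $P_n = n \tan(\pi/n)$ of the regular n-gons inscribed in and circumscribed about the unit circle (harmonic mean, then geometric mean), starting from the hexagon; the recurrences are a standard reconstruction of the method, not a quotation of Archimedes. Since $p_n/\pi = \sin(\pi/n)/(\pi/n)$ and $p_n/P_n = \cos(\pi/n)$, the squeeze works precisely because sin(x)/x → 1.

```python
import math


def archimedes(doublings):
    """Half-perimeters p_n = n sin(pi/n) and P_n = n tan(pi/n) of the regular n-gons
    inscribed in and circumscribed about the unit circle, starting from the hexagon
    (p_6 = 3, P_6 = 2 sqrt(3)) and doubling n without using the value of pi."""
    n, p, P = 6, 3.0, 2.0 * math.sqrt(3.0)
    table = [(n, p, P)]
    for _ in range(doublings):
        P = 2.0 * p * P / (p + P)   # harmonic mean: circumscribed 2n-gon
        p = math.sqrt(p * P)        # geometric mean: inscribed 2n-gon
        n *= 2
        table.append((n, p, P))
    return table


for n, p, P in archimedes(4):       # 6, 12, 24, 48, 96 sides
    print(n, p, P)
```

With 96 sides the bounds fall between Archimedes' famous estimates 223/71 < π < 22/7.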

30. Logarithms

The natural logarithm is the inverse of the exponential function exp(x). The exponential function establishes a group isomorphism from the additive group (R, +) to the multiplicative group (R+, ·), and the logarithm is the inverse isomorphism. We have:
Theorem: log(uv) = log(u) + log(v).

This follows from exp(x + y) = exp(x) exp(y) and log(exp(x)) = exp(log(x)) = x by plugging in x = log(u), y = log(v). The logarithms were independently discovered by Jost Bürgi around 1600 and John Napier in 1614 [670]. The logarithm with base b > 0, b ≠ 1, is denoted by $\log_b$. It is the inverse of $x \mapsto b^x = e^{x \log(b)}$. The concept of logarithm has been extended in various ways: in any group G, one can define the discrete logarithm $\log_b(a)$ to base b as an integer k such that $b^k = a$ (if it exists). For complex numbers, the complex logarithm log(z) is defined as any solution w of $e^w = z$. It is multi-valued, as $\log(|z|) + i \arg(z) + 2\pi i k$ all solve this for any integer k, where $\arg(z) \in (-\pi, \pi]$. The identity log(uv) = log(u) + log(v) is now only true up to an integer multiple of 2πi. Logarithms can also be defined for matrices. Any matrix B solving exp(B) = A is called a logarithm of A. For A close to the identity I, one can define $\log(A) = (A - I) - (A - I)^2/2 + (A - I)^3/3 - \cdots$, which is a Mercator series. For normal invertible matrices, one can define logarithms using the functional calculus by diagonalization. On a Riemannian manifold M, one also has an exponential map: it is a diffeomorphism from a small ball $B_r(0)$ in the tangent space $T_xM$ at $x \in M$ onto a neighborhood of x in M. The map $v \mapsto \exp_x(v)$ is obtained by defining $\exp_x(0) = x$ and by taking for v ≠ 0 the geodesic with initial direction v/|v| and running it for time |v|. The logarithm $\log_x$ is now defined on a geodesic ball of radius r and defines an element in the tangent space. In the case of a Lie group M = G, where the points are matrices, the tangent space at the identity is its Lie algebra.
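The discrete logarithm mentioned above can be computed in the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ faster than by trying all exponents. The following Python sketch uses baby-step giant-step, a standard algorithm swapped in here for illustration (the text does not prescribe one); it assumes p is prime and does not divide a or b, and the function name is chosen for the example.

```python
import math


def discrete_log(b, a, p):
    """Baby-step giant-step: return the least k >= 0 with b^k = a (mod p), or None if there is none.
    Assumes p is prime and that p divides neither a nor b; time and memory are O(sqrt(p))."""
    m = math.isqrt(p - 1) + 1            # m^2 > p - 1, the order of the group (Z/pZ)^*
    baby = {}
    x = 1
    for j in range(m):                   # baby steps: remember b^j -> j
        baby.setdefault(x, j)
        x = x * b % p
    giant = pow(b, -m, p)                # b^(-m) mod p (needs Python 3.8+)
    y = a % p
    for i in range(m):                   # giant steps: y = a * b^(-i m)
        if y in baby:
            return i * m + baby[y]
        y = y * giant % p
    return None


print(discrete_log(3, 13, 17))           # 4, since 3^4 = 81 = 13 (mod 17)
```

Writing k = im + j splits the exponent search into two tables of size about √p; when a is not a power of b (for example 3 modulo 7 with b = 2), the function returns None, matching the "if it exists" caveat in the definition.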

31. Geometric probability

A subset K of $\mathbb{R}^n$ is called compact if it is closed and bounded. By Bolzano-Weierstrass, this is equivalent to the fact that every infinite sequence $x_n$ in K has a subsequence which converges in K. A subset K of $\mathbb{R}^n$ is called convex if for any two given points x, y ∈ K, the interval $\{x + t(y - x), t \in [0, 1]\}$ is a subset of K. Let G be the set of all compact convex subsets of $\mathbb{R}^n$. An invariant valuation X is a function X : G → R satisfying $X(A \cup B) + X(A \cap B) = X(A) + X(B)$ whenever A, B and A ∪ B are in G, which is continuous in the Hausdorff metric $d(K, L) = \max(\sup_{x \in K} \inf_{y \in L} d(x, y), \sup_{y \in L} \inf_{x \in K} d(x, y))$ and invariant under rigid motions generated by rotations, reflections and translations in the linear space $\mathbb{R}^n$.

Theorem: The space of continuous invariant valuations is (n + 1)-dimensional.

The theorem is due to Hugo Hadwiger from 1937. The coefficients $a_j(K)$ of the polynomial $\mathrm{Vol}(K + tB) = \sum_{j=0}^{n} a_j(K) t^j$ form a basis, where B is the unit ball $B = \{|x| \le 1\}$. See [408].
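In the plane, the Steiner polynomial of a convex polygon K is $\mathrm{Area}(K + tB) = \mathrm{Area}(K) + \mathrm{Perimeter}(K)\, t + \pi t^2$, so the three basis valuations are area, perimeter and π times the Euler characteristic. The following Python sketch restricts itself to convex polygons given by their vertices in order (an assumption of the example) and checks the valuation property on two overlapping rectangles whose union is convex.

```python
import math


def steiner_coefficients(vertices):
    """For a convex polygon K (vertices in order) return (a0, a1, a2) with
    Area(K + tB) = a0 + a1*t + a2*t**2: area (shoelace formula), perimeter and pi."""
    n = len(vertices)
    area2 = 0.0
    perim = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perim += math.hypot(x1 - x0, y1 - y0)
    return (abs(area2) / 2.0, perim, math.pi)


def rect(x0, x1, y0, y1):
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


A = steiner_coefficients(rect(0, 2, 0, 1))
B = steiner_coefficients(rect(1, 3, 0, 1))
union = steiner_coefficients(rect(0, 3, 0, 1))
inter = steiner_coefficients(rect(1, 2, 0, 1))

# each coefficient separately satisfies X(A ∪ B) + X(A ∩ B) = X(A) + X(B)
valuation_ok = all(math.isclose(u + i, a + b) for u, i, a, b in zip(union, inter, A, B))
```

All three coefficients are invariant under rigid motions, as the theorem requires; general compact convex sets (not just polygons) would need a different way of computing them.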

32. Partial differential equations

A quasilinear partial differential equation is a differential equation of the form $u_t(x, t) = F(x, t, u) \cdot \nabla_x u(x, t) + f(x, t, u)$ with analytic initial condition $u(x, 0) = u_0(x)$, an analytic vector field F and an analytic function f. It defines a quasi-linear Cauchy problem.

Theorem: A quasi-linear Cauchy problem has a unique local analytic solution.

This is the Cauchy-Kovalevskaya theorem. It was initiated by Augustin-Louis Cauchy in 1842 and proven in 1875 by Sophie Kowalevskaya. Analyticity is important; smoothness alone is not enough. If F is analytic in each variable, one can look at Cauchy problems like $u_t = F(t, x, u, u_x, u_{xx})$. Examples are partial differential equations like the heat equation $u_t = u_{xx}$ or the wave equation $u_{tt} = u_{xx}$. Given an initial condition $u(0, x) = u_0(x)$, one then deals with an ordinary differential equation in a function space. One can then try to approach the Cauchy-Kovalevskaya problem with Picard-Lindelöf. The problem is that the Lipschitz condition fails because the corresponding operators are unbounded. Even Cauchy-Peano (which does not ask for uniqueness) fails, and this even in an analytic setting: [549] gives the example $u_t = u_{xx}$ with initial condition $u(0, x) = 1/(1 + x^2)$, for which the formal power series solving the problem has zero radius of convergence in t. Texts like [707, 549] give full versions of the Cauchy-Kovalevskaya theorem for real-analytic Cauchy initial data on a real analytic hypersurface satisfying a non-characteristic condition for the partial differential equation. For a shorter introduction to partial differential equations, see [25].
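The divergence in the heat equation example can be made explicit. The formal solution is $u(t, x) = \sum_k \frac{t^k}{k!} \partial_x^{2k} u_0(x)$, and since $1/(1 + x^2) = \sum_m (-1)^m x^{2m}$, the 2k-th derivative at x = 0 equals $(-1)^k (2k)!$. The following Python sketch computes the resulting Taylor coefficients in t exactly; the function name and the sample time t = 1/100 are choices made for this illustration.

```python
from fractions import Fraction
from math import factorial


def taylor_coefficient(k):
    """Coefficient c_k of t^k at x = 0 in the formal solution of u_t = u_xx,
    u(0, x) = 1/(1 + x^2): c_k = (d^{2k}u_0/dx^{2k})(0) / k! = (-1)^k (2k)! / k!."""
    return Fraction((-1) ** k * factorial(2 * k), factorial(k))


# |c_{k+1}/c_k| = (2k+2)(2k+1)/(k+1) = 2(2k+1) is unbounded: radius of convergence 0 in t
ratios = [abs(taylor_coefficient(k + 1) / taylor_coefficient(k)) for k in range(10)]

# even for the small time t = 1/100 the terms |c_k| t^k eventually exceed 1 and keep growing
t = Fraction(1, 100)
first_big = next(k for k in range(1, 200) if abs(taylor_coefficient(k)) * t ** k > 1)
```

The ratio test shows that no positive time t gives a convergent series, so the heat equation, which is first order in t but second order in x, falls outside the scope of the Cauchy-Kovalevskaya theorem.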

33. Game theory

Consider n players, where player i chooses a strategy $x_i$ from a strategy set $S_i$, and let $f = (f_1, \dots, f_n)$ be a payoff function defined on strategy profiles $x = (x_1, \dots, x_n)$. A profile $x^*$ is called an equilibrium if, for every player i, $f_i(x^*)$ is maximal with respect to changes of $x_i$ alone in the profile $x^*$.

Theorem: Every game with finitely many players and finitely many pure strategies has an equilibrium in mixed strategies.

The equilibrium is called a Nash equilibrium. It tells us what we would see in a world where everybody is doing their best, given what everybody else is doing. John Forbes Nash used the Kakutani fixed point theorem in 1950 and later, in 1951, the Brouwer fixed point theorem to prove it. The Brouwer fixed point theorem itself is generalized by the Lefschetz fixed point theorem, which equates the supertrace of the induced map on cohomology with the sum of the indices of the fixed points. About John Nash and some history of game theory, see [647]: game theory started maybe with Adam Smith's "The Wealth of Nations" published in 1776; Ernst Zermelo in 1913 (Zermelo's theorem), Émile Borel in the 1920s and John von Neumann in 1928 pioneered mathematical game theory. Together with Oskar Morgenstern, John von Neumann merged game theory with economics in 1944. Nash published his thesis in a paper of 1951. For the mathematics of games, see [733].
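For two players with two strategies each, equilibria can be found by hand, and a short Python sketch makes the definition executable. It assumes a bimatrix game given by payoff tables A (row player) and B (column player); pure equilibria are found by checking that neither player gains by a unilateral deviation, and a fully mixed equilibrium is computed from the indifference conditions, which is a standard technique for 2 × 2 games rather than Nash's fixed point proof. Matching pennies shows why mixed strategies are needed.

```python
from fractions import Fraction
from itertools import product


def pure_equilibria(A, B):
    """Pure profiles (i, j) where neither player gains by deviating alone;
    A[i][j] and B[i][j] are the payoffs of the row and the column player."""
    rows, cols = range(len(A)), range(len(A[0]))
    return [(i, j) for i, j in product(rows, cols)
            if A[i][j] == max(A[k][j] for k in rows) and B[i][j] == max(B[i][l] for l in cols)]


def mixed_equilibrium_2x2(A, B):
    """Fully mixed equilibrium (p, q) of a 2x2 game from the indifference conditions, where
    p = P(row player plays row 0) and q = P(column player plays column 0).
    Assumes such an equilibrium exists (nonzero denominators, values in [0, 1])."""
    q = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    p = Fraction(B[1][1] - B[1][0], B[0][0] - B[0][1] - B[1][0] + B[1][1])
    return p, q


pennies_A = [[1, -1], [-1, 1]]          # matching pennies (zero-sum)
pennies_B = [[-1, 1], [1, -1]]
print(pure_equilibria(pennies_A, pennies_B))        # [] : no pure equilibrium
print(mixed_equilibrium_2x2(pennies_A, pennies_B))  # (Fraction(1, 2), Fraction(1, 2))

dilemma_A = [[3, 0], [5, 1]]            # prisoner's dilemma, strategy 0 = cooperate
dilemma_B = [[3, 5], [0, 1]]
print(pure_equilibria(dilemma_A, dilemma_B))        # [(1, 1)]
```

The value q is chosen so that the row player is indifferent between both rows, and p makes the column player indifferent; at that point no unilateral change of a mixed strategy can raise either payoff.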

34. Measure theory

A topological space with open sets O defines the Borel σ-algebra, the smallest σ-algebra which contains O. For the metric space (R, d) with d(x, y) = |x − y|, already the intervals generate the Borel σ-algebra A. A Borel measure is a measure defined on a Borel σ-algebra. Every σ-finite Borel measure µ on the real line R can be decomposed uniquely into an absolutely continuous part $\mu_{ac}$, a singular continuous part $\mu_{sc}$ and a pure point part $\mu_{pp}$:

Theorem: $\mu = \mu_{ac} + \mu_{sc} + \mu_{pp}$.

This is called the Lebesgue decomposition theorem. It uses the Radon-Nikodym theorem. The decomposition theorem implies the decomposition of the spectrum of a self-adjoint operator into absolutely continuous, singular continuous and pure point parts. See [653] (for example page 259). Lebesgue's theorem was published in 1904. A generalization due to Johann Radon and Otto Nikodym was done in 1913.

35. Geometric number theory

If Γ is a lattice in $\mathbb{R}^n$, denote by $\mathbb{R}^n/\Gamma$ the fundamental region and by |Γ| its volume. A set K is convex if x, y ∈ K implies x + t(y − x) ∈ K for all 0 ≤ t ≤ 1. A set K is centrally symmetric if x ∈ K implies −x ∈ K. A region is Minkowski if it is convex and centrally symmetric. Let |K| denote the volume of K.

Theorem: If K is Minkowski and $|K| > 2^n |\Gamma|$, then K contains a point of Γ other than 0.

The theorem is due to Hermann Minkowski in 1896. It led to a field called geometry of numbers [122]. It has many applications in number theory and Diophantine analysis [106, 356].
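A brute-force search illustrates the theorem in the plane. The following Python sketch assumes a hypothetical lattice generated by (1, 0) and (0.5, 3), with |Γ| = 3, and a thin ellipse $x^2/0.6^2 + y^2/7^2 \le 1$ of area about 13.19 > 2² · 3 = 12; the ellipse is so narrow that it contains no nonzero lattice point on the x-axis, yet Minkowski's theorem guarantees other lattice points inside.

```python
import math
from itertools import product


def nonzero_lattice_points(basis, inside, bound):
    """Coefficient pairs (m, n) != (0, 0) with |m|, |n| <= bound such that the
    lattice point m*b1 + n*b2 satisfies inside(x, y)."""
    (a, b), (c, d) = basis
    hits = []
    for m, n in product(range(-bound, bound + 1), repeat=2):
        if (m, n) != (0, 0):
            x, y = m * a + n * c, m * b + n * d
            if inside(x, y):
                hits.append((m, n))
    return hits


basis = ((1.0, 0.0), (0.5, 3.0))                     # lattice generated by (1, 0) and (0.5, 3)
covolume = abs(basis[0][0] * basis[1][1] - basis[0][1] * basis[1][0])    # |Γ| = 3
A, B = 0.6, 7.0                                        # ellipse x^2/A^2 + y^2/B^2 <= 1
area = math.pi * A * B                                 # about 13.19 > 2^2 * 3 = 12
hits = nonzero_lattice_points(basis, lambda x, y: (x / A) ** 2 + (y / B) ** 2 <= 1.0, bound=20)
print(area > 4 * covolume, hits)
```

The hits come in pairs ±(m, n), reflecting the central symmetry of K; the search box is large enough here because the ellipse is bounded by |y| ≤ 7.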

36. Fredholm

An integral kernel $K(x, y) \in L^2([a, b]^2)$ defines an integral operator T by $Tf(x) = \int_a^b K(x, y) f(y)\,dy$ with adjoint $T^*f(x) = \int_a^b K(y, x) f(y)\,dy$ (for real-valued K). The $L^2$ assumption makes the function K(x, y) what one calls a Hilbert-Schmidt kernel. Fredholm showed that for λ ≠ 0 the Fredholm equation $A^*f = (T^* - \lambda)f = g$ has a solution f if and only if g is perpendicular to the kernel of $A = T - \lambda$. This identity $\ker(A)^\perp = \mathrm{im}(A^*)$ is in finite dimensions part of the fundamental theorem of linear algebra. The Fredholm alternative reformulates this in a more catchy way as an alternative:

Theorem: Either ∃f ≠ 0 with Af = 0, or for all g, ∃f with Af = g.

In the second case, the solution depends continuously on g. The alternative can be put more generally by stating that if A is a compact operator on a Hilbert space and λ ≠ 0 is not an eigenvalue of A, then the resolvent $(A - \lambda)^{-1}$ exists and is bounded. A bounded operator A on a Hilbert space H is called compact if the image of the unit ball is relatively compact (has a compact closure). The Fredholm alternative is part of Fredholm theory. It was developed by Ivar Fredholm in 1903.
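The finite-dimensional analog already shows the alternative. The following Python sketch, a matrix illustration rather than an integral-operator solver, takes the symmetric matrix T = [[2, 1], [1, 2]] with eigenvalues 1 and 3 (an example chosen here) and solves $(T - \lambda)f = g$ exactly over the rationals by Gauss-Jordan elimination. For the eigenvalue λ = 3 the equation is solvable exactly when g is perpendicular to the kernel spanned by (1, 1); for λ = 1/2 it is solvable for every g.

```python
from fractions import Fraction


def solve(A, g):
    """Gauss-Jordan elimination over the rationals: one solution f of A f = g, or None."""
    n, m = len(A), len(A[0])
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(A, g)]
    pivots, r = [], 0
    for c in range(m):
        p = next((i for i in range(r, n) if M[i][c] != 0), None)
        if p is None:
            continue                          # no pivot in this column
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [v / pivot for v in M[r]]
        for i in range(n):
            factor = M[i][c]
            if i != r and factor != 0:
                M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] != 0 for row in M[r:]):    # a row reading 0 = nonzero: inconsistent
        return None
    f = [Fraction(0)] * m
    for i, c in enumerate(pivots):
        f[c] = M[i][-1]
    return f


def shifted(T, lam):
    """A = T - lam * I."""
    return [[T[i][j] - (lam if i == j else 0) for j in range(len(T))] for i in range(len(T))]


T = [[2, 1], [1, 2]]                 # symmetric, eigenvalues 1 and 3
A = shifted(T, 3)                    # lam = 3 is an eigenvalue: ker(A) is spanned by (1, 1)
print(solve(A, [1, -1]))             # solvable, since (1, -1) is perpendicular to (1, 1)
print(solve(A, [1, 0]))              # None: (1, 0) is not perpendicular to (1, 1)
print(solve(shifted(T, Fraction(1, 2)), [1, 0]))   # lam = 1/2 is no eigenvalue: always solvable
```

Because T is symmetric, $A^* = A$ here, so the solvability condition "g ⊥ ker(A)" is the finite-dimensional identity $\ker(A)^\perp = \mathrm{im}(A^*)$ quoted in the text.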

37. Prime distribution

The Dirichlet theorem about primes along an arithmetic progression tells that if a and b are relatively prime, meaning that their greatest common divisor is 1, then there are infinitely many primes of the form p = a mod b. The Green-Tao theorem strengthens this. We say that a set A contains arbitrarily long arithmetic progressions if for every k there exists an arithmetic progression $\{a + bj, j = 1, \dots, k\}$ with b > 0 within A.

Theorem: The set of primes contains arbitrarily long arithmetic progressions.

The Dirichlet prime number theorem was found in 1837. The Green-Tao theorem was proven in 2004 and appeared in 2008 [288]. It uses Szemerédi's theorem [253], which shows that any set A of positive upper density $\limsup_{n \to \infty} |A \cap \{1, \dots, n\}|/n$ has arbitrarily long arithmetic progressions. The Green-Tao theorem in fact shows that any subset A of the primes P for which the relative upper density $\limsup_{n \to \infty} |A \cap \{1, \dots, n\}|/|P \cap \{1, \dots, n\}|$ is positive has arbitrarily long arithmetic progressions. For non-linear sequences of numbers the problems are wide open. The Landau problem about the infinitude of primes of the form $x^2 + 1$ illustrates this. The Green-Tao theorem gives hope to tackle the Erdős conjecture on arithmetic progressions, which states that a sequence $\{x_n\}$ of integers satisfying $\sum_n 1/x_n = \infty$ contains arbitrarily long arithmetic progressions.
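Short progressions of primes are easy to find by search, even though the theorem itself is deep. The following Python sketch sieves the primes up to an arbitrary limit and returns the progression with the smallest first term (and then the smallest difference) of a given length k ≥ 2; the function names and the limit are choices made for this illustration.

```python
def prime_sieve(limit):
    """is_prime[n] for 0 <= n <= limit (sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for q in range(p * p, limit + 1, p):
                is_prime[q] = False
    return is_prime


def first_prime_progression(k, limit=10_000):
    """Smallest first term a (then smallest difference d >= 1) such that
    a, a + d, ..., a + (k-1)d are all primes <= limit; None if the search fails. Needs k >= 2."""
    is_prime = prime_sieve(limit)
    for a in range(2, limit + 1):
        if not is_prime[a]:
            continue
        for d in range(1, (limit - a) // (k - 1) + 1):
            if all(is_prime[a + j * d] for j in range(k)):
                return [a + j * d for j in range(k)]
    return None


print(first_prime_progression(5))   # [5, 11, 17, 23, 29]
print(first_prime_progression(6))   # [7, 37, 67, 97, 127, 157]
```

The differences are forced to be multiples of small primes: a progression of length k whose terms avoid a prime q < k must have a difference divisible by q, which is why the six-term example has difference 30 = 2 · 3 · 5.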

38. Riemannian geometry

A Riemannian manifold is a smooth finite dimensional manifold M equipped with a smooth, symmetric, positive definite tensor g defining on each tangent space $T_xM$ an inner product $(u, v)_x = (g(x)u, v) = \sum_{i,j} g_{ij}(x) u^i v^j$. Let Ω be the space of smooth vector fields X on M. A vector field X acts on smooth functions f as directional derivative $Xf = \delta_X f$. Given two vector fields X, Y on M, one has at each point x ∈ M a number g(X, Y) = (g(x)X(x), Y(x)), so that g(X, Y) is a smooth function on M. A connection is a bilinear map $(X, Y) \mapsto \nabla_X Y$ from Ω × Ω to Ω satisfying the differentiation rules $\nabla_{fX} Y = f \nabla_X Y$ and the Leibniz rule $\nabla_X (fY) = df(X) Y + f \nabla_X Y$. It is compatible with the metric if $\delta_X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z)$. It is torsion-free if $\nabla_X Y - \nabla_Y X = [X, Y]$, where [X, Y] is the Lie bracket on Ω.
Theorem: There is exactly one torsion-free connection compatible with g.

This is the fundamental theorem of Riemannian geometry. The connection is called the Levi-Civita connection, named after Tullio Levi-Civita. One proof goes by establishing the Koszul formula, which determines $\nabla_X Y$ explicitly:

$2g(\nabla_X Y, Z) = Xg(Y, Z) + Yg(X, Z) - Zg(X, Y) - g(X, [Y, Z]) - g(Y, [X, Z]) + g(Z, [X, Y])$.

See for example [196, 3, 666, 166].
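In local coordinates, applying the Koszul formula to coordinate vector fields (whose Lie brackets vanish) gives the Christoffel symbols $\Gamma^k_{ij} = \frac{1}{2} \sum_l g^{kl} (\partial_i g_{jl} + \partial_j g_{il} - \partial_l g_{ij})$ with $\nabla_{\partial_i} \partial_j = \sum_k \Gamma^k_{ij} \partial_k$. The following Python sketch evaluates this for a metric on an open set of $\mathbb{R}^2$, approximating the partial derivatives by central differences (a numerical shortcut assumed here for simplicity), and tests it on the Euclidean metric in polar coordinates, where $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = 1/r$.

```python
def christoffel(g, x, h=1e-5):
    """Christoffel symbols Gamma[k][i][j] of a metric on an open set of R^2, where g(x) returns
    the symmetric 2x2 matrix [[g_00, g_01], [g_10, g_11]]. Uses
    Gamma^k_ij = 1/2 sum_l g^{kl} (d_i g_jl + d_j g_il - d_l g_ij),
    with the partial derivatives d_l approximated by central differences of step h."""
    def partial(l):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = g(xp), g(xm)
        return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

    d = [partial(l) for l in range(2)]          # d[l][i][j] = d_l g_ij
    G = g(x)
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    ginv = [[G[1][1] / det, -G[0][1] / det], [-G[1][0] / det, G[0][0] / det]]
    return [[[0.5 * sum(ginv[k][l] * (d[i][j][l] + d[j][i][l] - d[l][i][j]) for l in range(2))
              for j in range(2)] for i in range(2)] for k in range(2)]


def polar_metric(x):
    """Euclidean metric in polar coordinates x = (r, theta): ds^2 = dr^2 + r^2 dtheta^2."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]


Gamma = christoffel(polar_metric, [2.0, 0.3])
print(Gamma[0][1][1], Gamma[1][0][1])   # approximately -2.0 and 0.5
```

The symmetry $\Gamma^k_{ij} = \Gamma^k_{ji}$ is built into the formula and expresses torsion-freeness in coordinates.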

39. Symplectic geometry

A symplectic manifold (M, ω) is a smooth 2n-manifold M equipped with a non-degenerate closed 2-form ω. The latter is called a symplectic form. As a 2-form, it satisfies ω(x, y) = −ω(y, x). Non-degenerate means that ω(u, v) = 0 for all v implies u = 0. The standard symplectic form on $\mathbb{R}^{2n}$ with coordinates $(x_1, \dots, x_n, y_1, \dots, y_n)$ is $\omega_0 = \sum_{i=1}^{n} dx_i \wedge dy_i$.

Theorem: Every symplectic form is locally diffeomorphic to $\omega_0$.

This theorem is due to Jean Gaston Darboux from 1882. Modern proofs use Moser's trick from 1965 (e.g. [340]). The Darboux theorem assures that locally, two symplectic manifolds of the same dimension are symplectically equivalent. It also implies that symplectic matrices A (2n × 2n matrices satisfying $A^T J A = J$ with the skew-symmetric matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$) have determinant 1, which is not obvious, as applying the determinant to $A^T J A = J$ only establishes $\det(A)^2 = 1$. In contrast, for Riemannian manifolds, one cannot trivialize the Riemannian metric in a neighborhood; one can only render it the standard metric at the point itself.

40. Differential topology

Let f be a smooth function on a compact differentiable manifold M, and let df denote the gradient of f. A point x is called a critical point if df(x) = 0. We assume f has only finitely many critical points and that all of them are non-degenerate. The latter means that the Hessian $d^2 f(x)$ is invertible at x. One calls such functions Morse functions. The Morse index of a critical point x is the number of negative eigenvalues of $d^2 f(x)$. The Morse inequalities relate the number $c_k = c_k(f)$ of critical points of index k of f with the Betti numbers $b_k(M)$, defined as the nullity of the Hodge Laplacian $dd^* + d^*d$ restricted to k-forms $\Omega^k$, where $d_k : \Omega^k \to \Omega^{k+1}$ is the exterior derivative.

Theorem: $c_k - c_{k-1} + \cdots + (-1)^k c_0 \ge b_k - b_{k-1} + \cdots + (-1)^k b_0$.

These are the Morse inequalities due to Marston Morse from 1934. They imply in particular the weak Morse inequalities $b_k \le c_k$. Modern proofs use the Witten deformation [166] of the exterior derivative d.

41. Non-commutative geometry

A spectral triple (A, H, D) is given by a Hilbert space H, a C*-algebra A of operators on H and a densely defined self-adjoint operator D satisfying $\|[D, a]\| < \infty$ for all a ∈ A and such that for all t > 0 the operator $e^{-tD^2}$ is trace class. The operator D is called a Dirac operator. The set-up generalizes Riemannian geometry because of the following result dealing with the exterior derivative d on a Riemannian manifold (M, g), where A = C(M) is the C*-algebra of continuous functions and D = d + d* is the Dirac operator, defining a spectral triple for (M, g). Let δ denote the geodesic distance in (M, g):

Theorem: $\delta(x, y) = \sup_{f \in A, \|[D, f]\| \le 1} |f(x) - f(y)|$.

This formula of Alain Connes tells that the spectral triple determines the geodesic distance in (M, g) and so the metric g. It justifies looking at spectral triples as non-commutative generalizations of Riemannian geometry. See [145].

42. Polytopes

A convex polytope P in dimension n is the convex hull of finitely many points in $\mathbb{R}^n$. One assumes all vertices to be extreme points, points which do not lie in an open line segment of P. The boundary of P is formed by (n − 1)-dimensional boundary facets. The notion of Platonic solid is recursive. A convex polytope is Platonic if all its facets and vertex figures are Platonic (n − 1)-dimensional polytopes. Let $p = (p_2, p_3, p_4, \dots)$ encode the number of Platonic solids, meaning that $p_d$ is the number of Platonic polytopes in dimension d.

Theorem: There are 5 Platonic solids, and p = (∞, 5, 6, 3, 3, 3, . . . ).

In dimension 2, there are infinitely many: they are the regular polygons. The list of Platonic solids, "octahedron", "dodecahedron", "icosahedron", "tetrahedron" and "cube", has been known since the Greeks. Ludwig Schläfli first classified the higher dimensional case. There are six in dimension 4: they are the "5-cell", the "8-cell" (tesseract), the "16-cell", the "24-cell", the "120-cell" and the "600-cell". There are only three regular polytopes in dimension 5 and higher, where only the analogs of the tetrahedron, cube and octahedron exist. For literature, see [298, 771, 600].
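The sequence p can be recovered by a small computation. The following Python sketch uses a standard criterion, swapped in here and not necessarily the route taken in the cited literature: a Schläfli symbol $\{p_1, \dots, p_{d-1}\}$ belongs to a convex regular d-polytope exactly when the Gram matrix with 1 on the diagonal and $-\cos(\pi/p_i)$ next to it is positive definite, which is tested through the leading principal minors of this tridiagonal matrix. The search is limited to entries up to 8 (an assumption of the example; larger entries only occur in dimension 2).

```python
import math
from itertools import product


def is_spherical(symbol, eps=1e-9):
    """Positive definiteness of the Gram matrix of the Schläfli symbol {p_1, ..., p_{d-1}},
    checked through the leading principal minors of the tridiagonal matrix,
    D_k = D_{k-1} - cos(pi/p_{k-1})^2 D_{k-2}; eps rejects the Euclidean cases with D = 0."""
    D_prev, D = 1.0, 1.0                 # D_0 and D_1
    for p in symbol:
        D_prev, D = D, D - math.cos(math.pi / p) ** 2 * D_prev
        if D <= eps:
            return False
    return True


def count_regular_polytopes(d, max_label=8):
    """Number of Schläfli symbols of length d - 1 with entries 3..max_label that pass the test."""
    return sum(is_spherical(s) for s in product(range(3, max_label + 1), repeat=d - 1))


print([count_regular_polytopes(d) for d in range(3, 7)])   # [5, 6, 3, 3]
```

Borderline symbols such as {4, 4} or {3, 6} give a vanishing determinant; they describe the regular tilings of the plane rather than polytopes, which is why the tolerance eps treats zero as failure. In dimension 2 every p passes the test, in line with p₂ = ∞.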

43. Descriptive set theory

A metric space (X, d) is a set with a metric d (a function X × X → [0, ∞) satisfying symmetry d(x, y) = d(y, x), the triangle inequality d(x, y) + d(y, z) ≥ d(x, z), and d(x, y) = 0 ↔ x = y). A metric space (X, d) is complete if every Cauchy sequence converges in X. A metric space is of second Baire category if the intersection of a countable set of open dense sets is dense. The Baire category theorem tells

Theorem: Complete metric spaces are of second Baire category.

One calls the intersection of a countable set of open dense sets in X a generic set or residual set. The complement of a generic set is also called a meager set, a negligible set or a set of first category. Such a set is the union of countably many nowhere dense sets. Like measure theory, Baire category theory can be used to get existence results. There can be surprises: a generic continuous function, for example, is nowhere differentiable. For descriptive set theory, see [400]. The framework for classical descriptive set theory is often given by Polish spaces, which are separable complete metric spaces. See [91].

44. Calculus of variations

Let X be the vector space of smooth, compactly supported functions g on an interval (a, b). The fundamental lemma of calculus of variations tells

Theorem: If f is continuous and $\int_a^b f(x) g(x)\,dx = 0$ for all g ∈ X, then f = 0.

The result is due to Joseph-Louis Lagrange. One can restate this as the fact that if f = 0 weakly, then f is actually zero. It implies that if $\int_a^b f(x) g'(x)\,dx = 0$ for all g ∈ X, then f is constant. This is nice, as f is not assumed to be differentiable. The result is used to prove that extrema of a variational problem $I(x) = \int_a^b L(t, x, x')\,dt$ are weak solutions of the Euler-Lagrange equations $L_x = \frac{d}{dt} L_{x'}$. See [265, 532].

45. Integrable systems

Consider a Hamiltonian differential equation $x' = J\nabla H(x)$ on a compact symplectic 2n-manifold (M, ω). The almost complex structure J : TM → TM is tied to ω using a Riemannian metric g by ω(v, w) = g(v, Jw). A function F : M → R is called a first integral if $\frac{d}{dt} F(x(t)) = 0$ for all t. An example is the Hamiltonian function H itself. A set of integrals $F_1, \dots, F_n$ Poisson commutes if $\{F_i, F_j\} = J\nabla F_i \cdot \nabla F_j = 0$ for all i, j. They are linearly independent if at every point the vectors $\nabla F_j$ are linearly independent in the sense of linear algebra. A system is Liouville integrable if there are n linearly independent, Poisson commuting integrals. The following theorem due to Liouville and Arnold characterizes the level surfaces $\{F = c\} = \{F_1 = c_1, \dots, F_n = c_n\}$:

Theorem: For a Liouville integrable system, the connected components of the level surfaces F = c are tori.

An example of how to get integrals is to write the system as an isospectral deformation of an operator L. This is called a Lax system. Such a differential equation has the form L′ = [B, L], where B = B(L) is skew symmetric. An example is the periodic Toda system $\dot{a}_n = a_n(b_{n+1} - b_n)$, $\dot{b}_n = 2(a_n^2 - a_{n-1}^2)$, where $(Lu)_n = a_n u_{n+1} + a_{n-1} u_{n-1} + b_n u_n$ and $(Bu)_n = a_n u_{n+1} - a_{n-1} u_{n-1}$. Another example is the motion of a rigid body in n dimensions if the center of mass is fixed. See [24].
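Isospectrality implies that the traces tr(Lᵏ) are first integrals. The following Python sketch integrates the periodic Toda system with a hand-written classical Runge-Kutta step (the initial data, step size and number of sites are arbitrary choices for the illustration) and checks that $\mathrm{tr}(L) = \sum_n b_n$ and $\mathrm{tr}(L^2) = \sum_n b_n^2 + 2\sum_n a_n^2$ stay constant up to integration error while the state itself moves.

```python
def toda_rhs(a, b):
    """Periodic Toda lattice: a_n' = a_n (b_{n+1} - b_n), b_n' = 2 (a_n^2 - a_{n-1}^2), indices mod N."""
    N = len(a)
    da = [a[n] * (b[(n + 1) % N] - b[n]) for n in range(N)]
    db = [2 * (a[n] ** 2 - a[n - 1] ** 2) for n in range(N)]   # a[-1] wraps around to a[N-1]
    return da, db


def rk4_step(a, b, h):
    """One classical Runge-Kutta step of size h."""
    def shift(x, dx, s):
        return [xi + s * dxi for xi, dxi in zip(x, dx)]
    k1a, k1b = toda_rhs(a, b)
    k2a, k2b = toda_rhs(shift(a, k1a, h / 2), shift(b, k1b, h / 2))
    k3a, k3b = toda_rhs(shift(a, k2a, h / 2), shift(b, k2b, h / 2))
    k4a, k4b = toda_rhs(shift(a, k3a, h), shift(b, k3b, h))
    new_a = [x + h / 6 * (p + 2 * q + 2 * r + s) for x, p, q, r, s in zip(a, k1a, k2a, k3a, k4a)]
    new_b = [x + h / 6 * (p + 2 * q + 2 * r + s) for x, p, q, r, s in zip(b, k1b, k2b, k3b, k4b)]
    return new_a, new_b


def traces(a, b):
    """tr(L) and tr(L^2) of the periodic Jacobi matrix L (valid for N >= 3 sites)."""
    return sum(b), sum(x * x for x in b) + 2 * sum(x * x for x in a)


a, b = [1.0, 0.8, 1.2, 0.9], [0.5, -0.3, 0.1, 0.0]
before = traces(a, b)
for _ in range(2000):                 # integrate up to time t = 2
    a, b = rk4_step(a, b, 0.001)
after = traces(a, b)
print(before, after)
```

A generic integrator only preserves these invariants approximately; the point of the Lax form is that they are exact conserved quantities of the flow, with the full spectrum of L providing n of them.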

46. Harmonic analysis

On the vector space X of continuously differentiable 2π-periodic, complex-valued functions, define the inner product $(f, g) = (2\pi)^{-1} \int_0^{2\pi} f(x) \overline{g(x)}\,dx$. The Fourier coefficients of f are $\hat{f}_n = (f, e_n)$, where $\{e_n(x) = e^{inx}\}_{n \in \mathbb{Z}}$ is the Fourier basis. The Fourier series of f is the sum $\sum_{n \in \mathbb{Z}} \hat{f}_n e^{inx}$.

Theorem: The Fourier series of f ∈ X converges point-wise to f .

Already Fourier claimed this always to be true in his "Théorie Analytique de la Chaleur". After many fallacious proofs, Dirichlet gave the first proof of convergence [436]. The case is subtle because there are continuous functions for which the convergence fails at some points. Lipót Fejér was able to show that for a continuous function f, the coefficients $\hat{f}_n$ nevertheless determine the function using Cesàro convergence. See [399].
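For a concrete smooth function, the convergence is easy to observe. The following Python sketch approximates the coefficients $\hat{f}_n = (f, e_n)$ with the trapezoid rule on equally spaced nodes (an assumption of the sketch; it is very accurate for smooth periodic functions) and compares the symmetric partial sums with the function $f(x) = e^{\sin(x)}$, chosen here as an example of a continuously differentiable periodic function.

```python
import cmath
import math


def fourier_coefficients(f, N, M=256):
    """Approximate hat f_n = (2 pi)^{-1} * integral_0^{2 pi} f(x) e^{-inx} dx for |n| <= N
    by the trapezoid rule with M equally spaced nodes."""
    xs = [2 * math.pi * j / M for j in range(M)]
    return {n: sum(f(x) * cmath.exp(-1j * n * x) for x in xs) / M for n in range(-N, N + 1)}


def partial_sum(coeffs, x):
    """The partial Fourier sum sum_{|n| <= N} hat f_n e^{inx}."""
    return sum(c * cmath.exp(1j * n * x) for n, c in coeffs.items())


f = lambda x: math.exp(math.sin(x))
coeffs = fourier_coefficients(f, 20)
errors = [abs(partial_sum(coeffs, x) - f(x)) for x in (0.0, 1.0, 2.5, 4.0)]
print(max(errors))
```

For this analytic function the coefficients decay faster than exponentially, so 41 terms already reproduce f to near machine precision; for functions that are only continuous, the same experiment can fail at individual points, which is where Fejér's Cesàro means help.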

47. Jensen inequality

If V is a vector space, a set X ⊂ V is called convex if for all points a, b ∈ X, the line segment {tb + (1 − t)a | t ∈ [0, 1]} is contained in X. A real-valued function ϕ : X → R is called convex if ϕ(tb + (1 − t)a) ≤ tϕ(b) + (1 − t)ϕ(a) for all a, b ∈ X and all t ∈ [0, 1]. Let now (Ω, A, P) be a probability space, and f ∈ L¹(Ω, P) an integrable function. We write $E[f] = \int_\Omega f(\omega)\,dP(\omega)$ for the expectation of f. For any convex ϕ : R → R and f ∈ L¹(Ω, P), we have the Jensen inequality

Theorem: ϕ(E[f]) ≤ E[ϕ(f)].

For ϕ(x) = exp(x) and a finite probability space Ω = {1, 2, . . . , n} with $f(k) = y_k = \log(x_k)$ for positive numbers $x_k$ and P[{k}] = 1/n, this gives the arithmetic mean-geometric mean inequality $(x_1 x_2 \cdots x_n)^{1/n} \le (x_1 + x_2 + \cdots + x_n)/n$. The case ϕ(x) = eˣ is useful in general, as it leads to the inequality $e^{E[f]} \le E[e^f]$ if $e^f \in L^1$. For f ∈ L²(Ω, P), the choice ϕ(x) = x² gives $(E[f])^2 \le E[f^2]$, which reflects the fact that $E[f^2] - (E[f])^2 = E[(f - E[f])^2] = \mathrm{Var}[f] \ge 0$, where Var[f] is the variance of f.

48. Jordan curve theorem

A closed curve is the image of a continuous map r : T → R², where T is the circle. It is called simple if this map r is injective. One then calls the map an embedding and the image a topological 1-sphere, meaning that it is homeomorphic to the standard circle x² + y² = 1 in R². The image is then called a Jordan curve. The Jordan curve theorem deals with such simple closed curves S in the two-dimensional plane.

Theorem: A simple closed curve divides the plane into two regions.

The Jordan curve theorem is due to Camille Jordan. His proof [379] was objected to at first [411] but rehabilitated in [310]. The theorem can be strengthened: a theorem of Schoenflies tells that the bounded one of the two regions is homeomorphic to the disk {(x, y) ∈ R² | x² + y² < 1}; in fact, the embedding extends to a homeomorphism of the whole plane. In the smooth case, it is even possible to extend the map to a diffeomorphism of the plane. In higher dimensions, one knows that an embedding of the (d − 1)-dimensional sphere in $\mathbb{R}^d$ divides space into two regions. This is the Jordan-Brouwer separation theorem. It is no longer true in general that the bounded part is homeomorphic to $\{x \in \mathbb{R}^d \mid |x| < 1\}$ and the unbounded part to the complement of a closed ball: a counterexample is the Alexander horned sphere, which is a topological 2-sphere but where the unbounded component is not simply connected and so not homeomorphic to the complement of a unit ball. See [91].

49. Chinese remainder theorem

Given integers a, b, a linear modular equation or congruence ax + b = 0 mod m asks to find an integer x such that ax + b is divisible by m. This linear equation can always be solved if a and m are coprime. The Chinese remainder theorem deals with the system of linear modular equations $x = b_1 \bmod m_1$, $x = b_2 \bmod m_2$, . . . , $x = b_n \bmod m_n$, where the $m_k$ are the moduli. More generally, for an integer n × n matrix A, we call Ax = b mod m, meaning that the i-th equation $\sum_j A_{ij} x_j = b_i$ holds modulo $m_i$, a Chinese remainder theorem system, or shortly CRT system, if the $m_i$ are pairwise relatively prime and in each row i there is a matrix element $A_{ij}$ relatively prime to $m_i$.

Theorem: Every Chinese remainder theorem system has a solution.

The classical single variable case is when $A_{i1} = 1$ and $A_{ij} = 0$ for j > 1. Let $M = m_1 m_2 \cdots m_n$ be the product. In this one-dimensional case, the result implies that $x \bmod M \mapsto (x \bmod m_1, \dots, x \bmod m_n)$ is a ring isomorphism. Define $M_i = M/m_i$. An explicit algorithm is to find numbers $y_i, z_i$ with $y_i M_i + z_i m_i = 1$ (finding y, z solving ay + bz = 1 for coprime a, b is done with the Euclidean algorithm), and then to take $x = b_1 M_1 y_1 + \cdots + b_n M_n y_n$ [182, 494]. The multi-variable version appeared in 2005 [419, 422] and can also be found in [692].
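The recipe for the classical single variable case translates directly into Python. This sketch uses the recursive extended Euclidean algorithm to find $y_i$ with $y_i M_i + z_i m_i = 1$ and then forms $x = \sum_i b_i M_i y_i$ reduced modulo M; the function names are chosen for the illustration, and the multi-variable version is not covered.

```python
def extended_gcd(a, b):
    """Return (g, y, z) with a*y + b*z = g = gcd(a, b) (extended Euclidean algorithm)."""
    if b == 0:
        return a, 1, 0
    g, y, z = extended_gcd(b, a % b)
    return g, z, y - (a // b) * z


def crt(bs, ms):
    """Solve x = b_i mod m_i for pairwise coprime moduli m_i; returns x mod M, M = m_1 ... m_n."""
    M = 1
    for m in ms:
        M *= m
    x = 0
    for b, m in zip(bs, ms):
        Mi = M // m
        g, y, _ = extended_gcd(Mi, m)     # y * M_i + z * m_i = 1
        if g != 1:
            raise ValueError("moduli must be pairwise coprime")
        x += b * Mi * y                   # this term is b mod m and 0 mod every other m_j
    return x % M


print(crt([2, 3, 2], [3, 5, 7]))   # 23
```

Each term $b_i M_i y_i$ is congruent to $b_i$ modulo $m_i$ and to 0 modulo every other $m_j$, which is exactly why the sum solves the whole system and why the map to residues is a bijection.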

50. Bézout's theorem

A polynomial is homogeneous if all its monomials have the same total degree. A homogeneous polynomial f in n + 1 variables of degree d ≥ 1 defines a projective hypersurface f = 0. Given n projective irreducible hypersurfaces $f_k = c_k$ of degree $d_k$ in a projective space $\mathbb{P}^n$, we can look at the solution set $\{f = c\} = \{f_1 = c_1, \dots, f_n = c_n\}$ of a system of nonlinear equations. The Bézout bound is $d = d_1 \cdots d_n$, the product of the degrees. Bézout's theorem allows one to count the number of solutions of the system, where the number of solutions is counted with multiplicity.

Theorem: The set {f = c} is either infinite or has d elements.

Bézout's theorem was stated in the "Principia" of Newton in 1687 but was proven first only in 1779 by Étienne Bézout. If the hypersurfaces are all irreducible and in "general position", then there are exactly $d$ solutions and each has multiplicity 1. This can also be used for affine hypersurfaces by passing to their projective completion: if $y^2 - x^3 - 3x - 5 = 0$ is an elliptic curve for example, then $y^2 z - x^3 - 3xz^2 - 5z^3 = 0$ is a projective hypersurface, its projective completion. Bézout's theorem implies part of the fundamental theorem of algebra: for $n = 1$, where we have only one homogeneous equation, a polynomial of degree $d$ has $d$ roots counted with multiplicity. The theorem implies for example that two conic sections without a common component intersect in $2 \cdot 2 = 4$ points, counted with multiplicity. The example $x^2 - yz = 0$, $x^2 + z^2 - yz = 0$ has only the solution $x = z = 0$, $y = 1$, but with multiplicity 4. As non-linear systems of equations appear frequently in computer algebra, this theorem gives a lower bound on the computational complexity for solving such problems.
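As a hedged illustration of the count $d = d_1 d_2$ for two plane curves, the sketch below eliminates $y$ with the Sylvester resultant, a standard elimination technique that the text does not describe. The resultant is a polynomial in $x$ of degree at most $d_1 d_2$ whose roots are the $x$-coordinates of the common points. The curves $x^2 + y^2 = 5$ and $xy = 2$ and all helper names are illustrative choices; here the resultant has the full degree $2 \cdot 2 = 4$.

```python
# Polynomials in x are lists of integer coefficients, lowest degree first.
# A curve is a list of such x-polynomials: its coefficients as a polynomial in y,
# highest power of y first. The zero polynomial is the empty list [].

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def p_add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])

def p_mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def det(m):
    """Determinant of a square matrix with polynomial entries (Laplace expansion)."""
    if len(m) == 1:
        return m[0][0]
    total = []
    for j, entry in enumerate(m[0]):
        if not entry:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        term = p_mul(entry, det(minor))
        if j % 2:
            term = [-c for c in term]
        total = p_add(total, term)
    return total

def resultant_y(f, g):
    """Res_y(f, g) via the Sylvester matrix; its roots are x-coordinates of common zeros."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = [[[]] * i + f + [[]] * (size - m - 1 - i) for i in range(n)]
    rows += [[[]] * i + g + [[]] * (size - n - 1 - i) for i in range(m)]
    return det(rows)

f = [[1], [], [-5, 0, 1]]     # y^2 + 0*y + (x^2 - 5)
g = [[0, 1], [-2]]            # x*y - 2
res = resultant_y(f, g)
print(res, "degree", len(res) - 1)   # [4, 0, -5, 0, 1] degree 4, i.e. x^4 - 5x^2 + 4
```

The four roots $x = \pm 1, \pm 2$ correspond to the four intersection points $(1, 2), (-1, -2), (2, 1), (-2, -1)$.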

51. Group theory

A finite group $(G, *, 1)$ is a finite set containing a unit $1 \in G$ and a binary operation $*: G \times G \to G$ satisfying the associativity property $(x * y) * z = x * (y * z)$ and such that for every $x$ there exists a unique $y = x^{-1}$ with $x * y = y * x = 1$. The order $n$ of the group is the number of its elements. An element $x \in G$ generates a subgroup formed by $1, x, x^2 = x * x, \dots$. This is the cyclic subgroup $C(x)$ generated by $x$. Lagrange's theorem tells
Theorem: |C(x)| is a factor of |G|

The origins of group theory go back to Joseph Louis Lagrange, Paolo Ruffini and Évariste Galois. The concept of an abstract group appeared first in the work of Arthur Cayley. Given a subgroup $H$ of $G$, the left cosets $x * H$ of $H$ are the equivalence classes of the equivalence relation $x \sim y$ if there exists $z \in H$ with $x = y * z$. The equivalence classes $G/H$ partition $G$, and each of them has $|H|$ elements. The number $[G : H]$ of elements in $G/H$ is called the index of $H$ in $G$. It follows that $|G| = |H| [G : H]$, and more generally that if $K$ is a subgroup of $H$ and $H$ is a subgroup of $G$, then $[G : K] = [G : H][H : K]$. A subgroup $N$ of $G$ is called a normal subgroup, written $N \triangleleft G$, if for all $a \in N$ and all $x \in G$ the element $x * a * x^{-1}$ is in $N$. This can be rewritten as $N * x = x * N$. If $N$ is a normal subgroup, then $G/N$ is again a group, the quotient group. For example, if $f: G \to G'$ is a group homomorphism, then the kernel of $f$ is a normal subgroup and $|G| = |\ker(f)| \, |\mathrm{im}(f)|$ because of the first group isomorphism theorem.
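To see Lagrange's theorem at work, a small Python sketch can enumerate the cyclic subgroups $C(x)$ of the multiplicative group $(\mathbb{Z}/n\mathbb{Z})^*$ (our choice of example group) and check that every $|C(x)|$ divides $|G|$. The function names are illustrative.

```python
from math import gcd

def unit_group(n):
    """Elements of the multiplicative group (Z/nZ)^*, for n >= 2."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

def cyclic_subgroup(x, n):
    """C(x) = {1, x, x^2, ...} inside (Z/nZ)^*; the powers cycle back to 1."""
    elems, y = {1}, x % n
    while y not in elems:
        elems.add(y)
        y = (y * x) % n
    return elems

G = unit_group(7)
print(len(G), sorted(cyclic_subgroup(2, 7)))   # 6 [1, 2, 4]: |C(2)| = 3 divides 6
```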

52. Primes

A rational prime (or simply "prime") is an integer larger than 1 which is only divisible by 1 or itself. Wilson's theorem allows to define a prime as an integer $n > 1$ for which $(n-1)! + 1$ is divisible by $n$. Euclid already knew that there are infinitely many primes (if there were finitely many $p_1, \dots, p_n$, the new number $p_1 p_2 \cdots p_n + 1$ would have a prime factor different from the given set). It also follows from the divergence of the harmonic series $\zeta(1) = \sum_{n=1}^\infty 1/n = 1 + 1/2 + 1/3 + \cdots$ and the Euler golden key or Euler product $\zeta(s) = \sum_{n=1}^\infty 1/n^s = \prod_{p \text{ prime}} (1 - 1/p^s)^{-1}$ for the Riemann zeta function $\zeta(s)$ that there are infinitely many primes, as otherwise the product on the right would be finite at $s = 1$.
Let $\pi(x)$ be the prime-counting function which gives the number of primes smaller than or equal to $x$. Given two real-valued functions $f(x), g(x)$, we say $f \sim g$ if $\lim_{x \to \infty} f(x)/g(x) = 1$. The prime number theorem tells

Theorem: π(x) ∼ x/ log(x).

The result was investigated experimentally first by Anton Felkel and Jurij Vega; Adrien-Marie Legendre first conjectured a law of this form in 1797. Carl Friedrich Gauss wrote in 1849 that he had experimented independently with such a law around 1792. The theorem was proven in 1896 by Jacques Hadamard and Charles de la Vallée Poussin. Proofs without complex analysis were put forward by Atle Selberg and Paul Erdős in 1949. A simple analytic proof was given by Donald Newman in [547]. The prime number theorem also assures that there are infinitely many primes, but it makes the statement quantitative in that it tells how fast the number of primes grows asymptotically. Under the assumption of the Riemann hypothesis, Lowell Schoenfeld proved $|\pi(x) - \mathrm{li}(x)| < \sqrt{x} \log(x)/(8\pi)$ for $x \geq 2657$, where $\mathrm{li}(x) = \int_0^x dt/\log(t)$ is the logarithmic integral.
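A hedged Python sketch comparing the Wilson characterization with a sieve of Eratosthenes (our choice of method for computing $\pi(x)$, not from the text) and printing the ratio $\pi(x)/(x/\log x)$, which drifts slowly toward 1. Wilson's test is only practical for small $n$ because $(n-1)!$ grows very quickly.

```python
from math import factorial, log

def is_prime_wilson(n):
    """Wilson's criterion: n > 1 is prime iff (n-1)! + 1 is divisible by n."""
    return n > 1 and (factorial(n - 1) + 1) % n == 0

def prime_pi(x):
    """pi(x), the number of primes <= x, computed with a sieve of Eratosthenes."""
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            # cross out p*p, p*p + p, ... (smaller multiples were crossed out earlier)
            sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
    return sum(sieve)

print([n for n in range(2, 40) if is_prime_wilson(n)])
for x in (10**3, 10**4, 10**5, 10**6):
    print(x, prime_pi(x), prime_pi(x) / (x / log(x)))   # ratio decreases slowly toward 1
```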

53. Cellular automata

A finite set $A$, called the alphabet, and an integer $d \geq 1$ define the compact topological space $\Omega = A^{\mathbb{Z}^d}$ of all infinite $d$-dimensional configurations. The topology is the product topology, which is compact by the Tychonov theorem. The translation maps $T_i(x)_n = x_{n+e_i}$ are homeomorphisms of $\Omega$ called shifts. A closed $T$-invariant subset $X \subset \Omega$ defines a subshift $(X, T)$. A continuous map $T: \Omega \to \Omega$ which commutes with the translations $T_i$ is called a cellular automaton, abbreviated CA. An example of a cellular automaton is a map $T(x)_n = \phi(x_{n+u_1}, \dots, x_{n+u_k})$, where $U = \{u_1, \dots, u_k\} \subset \mathbb{Z}^d$ is a fixed finite set. It is called a local automaton because it is defined by a finite rule, so that the status of the cell $n$ at the next step depends only on the status of the "neighboring cells" $\{n + u \mid u \in U\}$. The following result is the Curtis-Hedlund-Lyndon theorem:

Theorem: Every cellular automaton is a local automaton.

Cellular automata were introduced by John von Neumann and studied mathematically in 1969 by Hedlund [329]. The result appears there. Hedlund also saw cellular automata as maps on subshifts. One can so look at cellular automata on subclasses of subshifts. For example, one can restrict the cellular automaton map $T$ to almost periodic configurations, which are subsets $X$ of $\Omega$ on which $(X, T_1, \dots, T_d)$ has only invariant measures $\mu$ for which the Koopman operators $U_i f = f(T_i)$ on $L^2(X, \mu)$ have pure point spectrum. A particularly well studied case is $d = 1$, $A = \{0, 1\}$ and $U = \{-1, 0, 1\}$, where the automaton is called an elementary cellular automaton. The Wolfram numbering labels the $2^8 = 256$ possible elementary automata with a number between 0 and 255. The game of life of Conway is a case with $d = 2$, $A = \{0, 1\}$ and $U = \{-1, 0, 1\} \times \{-1, 0, 1\}$. For literature on cellular automata see [756], or as part of complex systems [757] or evolutionary dynamics [550]. For topological dynamics, see [172].
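An elementary cellular automaton is easy to simulate. The sketch below assumes a finite ring of cells with periodic boundary in place of $\mathbb{Z}$, and reads the local rule off the Wolfram number: the new state of a cell is the bit of the rule number indexed by the neighborhood pattern $(x_{n-1}, x_n, x_{n+1})$ read as a binary number. The map is manifestly local with $U = \{-1, 0, 1\}$.

```python
def eca_step(cells, rule):
    """One step of the elementary cellular automaton with Wolfram number rule (0..255).
    Assumption: a finite ring of cells with periodic boundary instead of all of Z.
    The new state of cell i is bit number 4*x[i-1] + 2*x[i] + x[i+1] of rule."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

width = 31
row = [0] * width
row[width // 2] = 1
for _ in range(16):          # rule 90 grows a Sierpinski-triangle pattern
    print("".join("#" if c else "." for c in row))
    row = eca_step(row, 90)
```

Rule 90 replaces each cell by the XOR of its two neighbors, rule 204 is the identity, and rule 0 kills every configuration.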


54. Topos theory

A category has objects as nodes and morphisms as arrows going from one object to another object. There can be multiple connections and self-loops, so that one can visualize a category as a quiver, a multidigraph. Every object $A$ has the identity arrow $1_A$. A topos $E$ is a Cartesian closed category in which finite limits exist and which has a sub-object classifier $\Omega$ allowing to identify sub-objects of an object $X$ with morphisms from $X$ to $\Omega$. Finite limits provide for any pair of objects $A, B$ the product $A \times B$ and an equalizer representing solutions of $f = g$ for arrows $f: A \to B$, $g: A \to B$; Cartesian closed means that there is in addition an exponential $B^A$ representing all arrows from $A$ to $B$. An example is the topos of sets. An example of a sub-object classifier is $\Omega = \{0, 1\}$, encoding "true" or "false".
The slice category $E/X$ of a category $E$ over an object $X$ in $E$ is the category whose objects are the arrows $A \to X$ in $E$. An $E/X$ arrow between objects $f: A \to X$ and $g: B \to X$ is a map $s: A \to B$ which produces a commutative triangle $g \circ s = f$ in $E$. The composition is pasting triangles together. The fundamental theorem of topos theory is:

Theorem: The slice category E/X of a topos E is a topos.

For example, if $E$ is the topos of sets, then an object $f: A \to X$ of the slice category $E/X$ can be seen as the family of fibers $f^{-1}(x)$, $x \in X$, so that $E/X$ is the category of $X$-indexed families of sets. A morphism $f: A \to B$ defines, by pulling back along $f$, a functor $E/B \to E/A$ which preserves exponentials and the subobject classifier $\Omega$. Topos theory was motivated by geometry (Grothendieck), physics (Lawvere), topology (Tierney) and algebra (Kan). It can be seen as a generalization and even a replacement of set theory: Lawvere's elementary theory of the category of sets (ETCS) is weaker than ZFC and therefore less likely to be inconsistent [465]. For a short introduction see [368], for textbooks [507, 115], and for the history of topos theory in particular, see [506].

55. Transcendentals

A root of an equation $f(x) = 0$ with a nonzero integer polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_0$, where $n \geq 1$ and $a_j \in \mathbb{Z}$, is called an algebraic number. The set $A$ of algebraic numbers is a subfield of the field $\mathbb{C}$ of complex numbers. The field $A$ is the algebraic closure of the rational numbers $\mathbb{Q}$. It is of number theoretic interest as it contains all algebraic number fields, the finite degree field extensions of $\mathbb{Q}$. The complement $\mathbb{C} \setminus A$ is the set of transcendental numbers. Transcendental numbers are necessarily irrational because every rational number $x = p/q$ is algebraic, solving $qx - p = 0$. Because the set of algebraic numbers is countable and the real numbers are not, most real numbers are transcendental. The group of all automorphisms of $A$ which fix $\mathbb{Q}$ is called the absolute Galois group of $\mathbb{Q}$.

Theorem: π and e are transcendental

This result is due to Ferdinand von Lindemann. He proved in 1882 that $e^x$ is transcendental for every non-zero algebraic number $x$. This immediately implies that $e$ is transcendental. Now, if $\pi$ were algebraic, then $\pi i$ would be algebraic and $e^{i\pi} = -1$ would be transcendental. But $-1$ is rational. Lindemann's result was extended in 1885 by Karl Weierstrass to the statement that if $x_1, \dots, x_n$ are algebraic numbers which are linearly independent over $\mathbb{Q}$, then $e^{x_1}, \dots, e^{x_n}$ are algebraically independent. The transcendence of $\pi$ also proves that $\pi$ is irrational, which is easier to prove directly. See [356].


56. Recurrence

A homeomorphism $T: X \to X$ of a compact topological space $X$ defines a topological dynamical system $(X, T)$. We write $T^j(x) = T(T(\dots T(x)))$ to indicate that the map $T$ is applied $j$ times. For any $d > 0$, we get from this a set $(T_1, T_2, \dots, T_d)$ of commuting homeomorphisms on $X$, where $T_j(x) = T^j x$. A point $x \in X$ is called multiple recurrent for $T$ if for every $d > 0$ there exists a sequence $n_1 < n_2 < n_3 < \cdots$ of integers $n_k \in \mathbb{N}$ for which $T_j^{n_k} x \to x$ for $k \to \infty$ and all $j = 1, \dots, d$. Fürstenberg's multiple recurrence theorem states:

Theorem: Every topological dynamical system has a multiple recurrent point.

It is even known that the set of multiple recurrent points is Baire generic. Hillel Fürstenberg proved this result in 1975. There is a parallel theorem for measure preserving systems: an automorphism $T$ of a probability space $(\Omega, \mathcal{A}, P)$ is called multiple recurrent if for every $A \in \mathcal{A}$ with $P[A] > 0$ and every $d > 0$ there exists an integer $n \geq 1$ such that $P[A \cap T^{-n}(A) \cap T^{-2n}(A) \cap \cdots \cap T^{-dn}(A)] > 0$, and every such automorphism is multiple recurrent. This generalizes the Poincaré recurrence theorem, which is the case $d = 1$. Recurrence theorems are related to the Szemerédi theorem telling that a subset $A$ of $\mathbb{N}$ of positive upper density contains arithmetic progressions of arbitrary finite length. See [253].
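For a rotation $T(x) = x + \alpha \bmod 1$ of the circle, multiple recurrence can be observed directly. The hedged sketch below (the rotation number, the starting point and the tolerance are our choices) searches for a time $n$ with $\mathrm{dist}(T_j^n x, x) = \mathrm{dist}(T^{jn} x, x) < \epsilon$ for all $j = 1, \dots, d$.

```python
import math

def circle_dist(a, b):
    """Distance on the circle R/Z."""
    t = (a - b) % 1.0
    return min(t, 1.0 - t)

def multiple_return_time(alpha, x, d, eps, n_max=10**6):
    """Smallest n >= 1 with dist(T^(j*n) x, x) < eps for all j = 1..d,
    where T(x) = x + alpha mod 1, so that T_j = T^j as in the text."""
    for n in range(1, n_max + 1):
        if all(circle_dist(x + j * n * alpha, x) < eps for j in range(1, d + 1)):
            return n
    return None

alpha = (math.sqrt(5) - 1) / 2
print(multiple_return_time(alpha, 0.3, d=3, eps=0.01))
```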

57. Solvability

A basic task in mathematics is to solve polynomial equations $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0$ with complex coefficients $a_k$ using explicit formulas involving roots. One calls this finding an explicit algebraic solution. The linear case $ax + b = 0$ with $x = -b/a$ and the quadratic case $ax^2 + bx + c = 0$ with $x = (-b \pm \sqrt{b^2 - 4ac})/(2a)$ were known since antiquity.

The cubic $x^3 + ax^2 + bx + c = 0$ was solved by Niccolò Tartaglia and Gerolamo Cardano: a first substitution $x = X - a/3$ produces the depressed cubic $X^3 + pX + q = 0$ (first solved by Scipione del Ferro). The substitution $X = u - p/(3u)$ then produces the equation $u^6 + q u^3 - p^3/27 = 0$, which is quadratic in $u^3$. Lodovico Ferrari finally solved the quartic by reducing it to the cubic. It was Paolo Ruffini, Niels Abel and Évariste Galois who realized that there are no algebraic solution formulas any more for general polynomials of degree $n \geq 5$.

Theorem: Explicit algebraic solution formulas for the general equation $p(x) = 0$ exist if and only if $n \leq 4$.

The quadratic case was settled over a longer period in independent developments in Babylonian, Egyptian, Chinese and Indian mathematics. The cubic and quartic discoveries were dramatic, culminating with Cardano's book of 1545, marking the beginning of modern algebra. After centuries of failures to solve the quintic, Paolo Ruffini published the first proof in 1799, a proof which had a gap but paved the way for Niels Henrik Abel and Évariste Galois. For further discoveries see [500, 473, 13].
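The substitutions for the cubic translate directly into code. This is a sketch in floating point arithmetic with complex numbers; choosing the larger of the two roots of the quadratic for $u^3$ is our choice to avoid cancellation and is not part of the historical method. Either root works in exact arithmetic, since the two roots are $u^3$ and $(-p/(3u))^3$.

```python
import cmath

def cardano(a, b, c):
    """Roots of x^3 + a x^2 + b x + c = 0 following the substitutions in the text."""
    # x = X - a/3 gives the depressed cubic X^3 + p X + q = 0
    p = b - a * a / 3
    q = 2 * a**3 / 27 - a * b / 3 + c
    # X = u - p/(3u) gives u^6 + q u^3 - p^3/27 = 0, a quadratic equation for u^3
    disc = cmath.sqrt(q * q / 4 + p**3 / 27)
    r1, r2 = -q / 2 + disc, -q / 2 - disc
    u3 = r1 if abs(r1) >= abs(r2) else r2   # larger root avoids cancellation
    if abs(u3) < 1e-14:                     # only happens if p = q = 0: triple root
        return [complex(-a / 3)] * 3
    u = u3 ** (1 / 3)                       # one complex cube root of u^3
    omega = cmath.exp(2j * cmath.pi / 3)    # primitive third root of unity
    roots = []
    for k in range(3):
        uk = u * omega**k                   # the three cube roots of u^3
        roots.append(uk - p / (3 * uk) - a / 3)
    return roots

print(sorted(cardano(-6, 11, -6), key=lambda z: z.real))   # approximately 1, 2, 3
```

Since $x^3 - 6x^2 + 11x - 6 = (x-1)(x-2)(x-3)$, the printed roots agree with 1, 2, 3 up to rounding.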

58. Galois theory

If $F$ is a subfield of $E$, then $E$ is a vector space over $F$. The dimension of this vector space is called the degree $[E : F]$ of the field extension $E/F$. The field extension is called finite if $[E : F]$ is finite. A field extension is called transcendental if there exists an element in $E$ which is not a root of a nonzero polynomial with coefficients in $F$. Otherwise, the extension is called algebraic. In the latter case, every $\alpha \in E$ is a root of a unique monic polynomial $f$ with coefficients in $F$ which is irreducible over $F$, the minimal polynomial of $\alpha$, and the extension $F(\alpha)/F$ is finite of degree $\deg(f)$. An algebraic field extension $E/F$ is called normal if every irreducible polynomial over $F$ with at least one root in $E$ splits over $E$ into linear factors. An algebraic field extension $E/F$ is called separable if the minimal polynomial $f$ of every element of $E$ is separable, meaning that $f'$ is not zero. This is automatic if $F$ has characteristic zero; if $F$ has characteristic $p$, it means that $f$ is not of the form $\sum_k a_k x^{pk}$. A field extension is called Galois if it is normal and separable. Let $\mathrm{Fields}(E/F)$ be the set of intermediate fields of $E/F$ and $\mathrm{Groups}(E/F)$ the set of subgroups of the automorphism group $\mathrm{Aut}(E/F)$. The fundamental theorem of Galois theory assures:

Theorem: If $E/F$ is a finite Galois extension, then $K \mapsto \mathrm{Aut}(E/K)$ is a bijection $\mathrm{Fields}(E/F) \leftrightarrow \mathrm{Groups}(E/F)$.

The intermediate fields of $E/F$ are so described by groups. The theorem implies the Abel-Ruffini theorem about the non-solvability of the quintic by radicals: the fundamental theorem shows that extensions which are solvable by radicals correspond to solvable groups, and the symmetric groups of permutations of 5 or more elements are not solvable. See [680].
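A group is solvable if its derived series $G \supset [G, G] \supset [[G, G], [G, G]] \supset \cdots$ reaches the trivial group, where $[G, G]$ is generated by the commutators $a b a^{-1} b^{-1}$. The brute force sketch below (only feasible for small $n$) computes the orders in the derived series of the symmetric group $S_n$. It stops at the trivial group for $n \leq 4$ but gets stuck at the alternating group $A_5$ of order 60 for $n = 5$.

```python
from itertools import permutations

def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def generated(gens, n):
    """Subgroup of S_n generated by gens (closure under composition; enough for finite groups)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = compose(g, s)
                if h not in group:
                    group.add(h)
                    new.append(h)
        frontier = new
    return group

def derived_series_orders(n):
    """Orders of S_n = G_0 > G_1 > ... with G_{k+1} = [G_k, G_k], until it stabilizes.
    S_n is solvable iff the list ends with 1 (the trivial group)."""
    G = set(permutations(range(n)))
    orders = [len(G)]
    while True:
        comms = {compose(compose(a, b), compose(inverse(a), inverse(b))) for a in G for b in G}
        H = generated(comms, n)
        if len(H) == len(G):
            return orders
        G = H
        orders.append(len(G))

for n in (3, 4, 5):
    print(n, derived_series_orders(n))   # [6, 3, 1], [24, 12, 4, 1], [120, 60]
```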

59. Metric spaces

A topological space $(X, \mathcal{O})$ is given by a set $X$ and a collection $\mathcal{O}$ of subsets of $X$ with the property that the empty set $\emptyset$ and $X$ both belong to $\mathcal{O}$ and that $\mathcal{O}$ is closed under arbitrary unions and finite intersections. The sets in $\mathcal{O}$ are called open sets. Metric spaces $(X, d)$ are special topological spaces. In that case, $\mathcal{O}$ consists of all sets $U$ such that for every $x \in U$ there exists $r > 0$ such that the open ball $B_r(x) = \{y \in X \mid d(x, y) < r\}$ is contained in $U$. Two topological spaces $(X, \mathcal{O})$, $(Y, \mathcal{Q})$ are homeomorphic if there exists a bijection $f: X \to Y$ such that $f$ and $f^{-1}$ are both continuous. A function $f: X \to Y$ is continuous if $f^{-1}(A) \in \mathcal{O}$ for all $A \in \mathcal{Q}$. When is a topological space homeomorphic to a metric space? The Urysohn metrization theorem gives an answer: we need the regular Hausdorff property, meaning that the space is Hausdorff and that a closed set $K$ and a point $x \notin K$ can be separated by disjoint neighborhoods $K \subset U$, $x \in V$. We also need the space to be second countable, meaning that there is a countable topological base (a topological base in $\mathcal{O}$ is a subset $\mathcal{B} \subset \mathcal{O}$ such that every $U \in \mathcal{O}$ can be written as a union of elements in $\mathcal{B}$).

Theorem: A second countable regular Hausdorff space is metrizable.

The result was proven by Pavel Urysohn in 1925 with "regular" replaced by "normal", and by Andrey Tychonov in 1926. It follows that a compact Hausdorff space is metrizable if and only if it is second countable. For literature, see [91].

60. Fixed point

Given a continuous transformation $T: X \to X$ of a compact topological space $X$, one can look for the fixed point set $\mathrm{Fix}_T(X) = \{x \mid T(x) = x\}$. This is useful for finding periodic points, as fixed points of $T^n = T \circ T \circ \cdots \circ T$ are periodic points of period $n$. If $X$ has finite-dimensional cohomology, as for example when $X$ is a compact $d$-manifold with boundary, one can look at the linear map $T_p$ induced on the cohomology groups $H^p(X)$. The super trace $\chi_T(X) = \sum_{p=0}^d (-1)^p \mathrm{tr}(T_p)$ is called the Lefschetz number of $T$ on $X$. If $T$ is the identity, this is the Euler characteristic. For an isolated fixed point $x$, let $\mathrm{ind}_T(x)$ be the Brouwer degree of the map $y \mapsto (y - T(y))/|y - T(y)|$ from a small $(d-1)$-sphere $S$ centered at $x$ to the unit sphere. This is the trace of the linear map induced on the cohomology group $H^{d-1}(S)$, which is an integer. If $T$ is differentiable and $1 - dT(x)$ is invertible, the index is $\mathrm{ind}_T(x) = \mathrm{sign}(\det(1 - dT(x)))$. The Lefschetz-Hopf fixed point theorem is

Theorem: If $\mathrm{Fix}_T(X)$ is finite, then $\chi_T(X) = \sum_{x \in \mathrm{Fix}_T(X)} \mathrm{ind}_T(x)$.

A special case is the Brouwer fixed point theorem: if $X$ is a compact convex subset of Euclidean space, then $\chi_T(X) = 1$ and the theorem assures the existence of a fixed point. In particular, if $T: D \to D$ is a continuous map from the disc $D = \{x^2 + y^2 \leq 1\}$ into itself, then $T$ has a fixed point. This Brouwer fixed point theorem was proved in 1910 by Jacques Hadamard and Luitzen Egbertus Jan Brouwer. The Schauder fixed point theorem from 1930 generalizes the result to convex compact subsets of Banach spaces. The Lefschetz-Hopf fixed point theorem was given in 1926. For literature, see [190, 79].

61. Quadratic reciprocity

Given a prime $p$, a number $a$ is called a quadratic residue if there exists a number $x$ such that $x^2$ has remainder $a$ modulo $p$. In other words, quadratic residues are the squares in the field $\mathbb{Z}_p$. The Legendre symbol $(a|p)$ is defined to be 0 if $a$ is a multiple of $p$, 1 if $a$ is a non-zero quadratic residue modulo $p$, and $-1$ if it is not. While the integer 0 is sometimes considered to be a quadratic residue, we don't include it, as it is a special case. In the multiplicative group $\mathbb{Z}_p^*$ without zero there is a symmetry: for odd $p$ there are equally many, namely $(p-1)/2$, quadratic residues and non-residues. The law of quadratic reciprocity relates the symbols $(p|q)$ and $(q|p)$:

Theorem: For any two distinct odd primes $p, q$, one has $(p|q)(q|p) = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}}$.

This means that $(p|q) = -(q|p)$ if and only if both $p$ and $q$ have remainder 3 modulo 4. The odd primes of the form $4k + 3$ are also prime in the Gaussian integers. To remember the law, one can think of them as "Fermions", and quadratic reciprocity tells that Fermions are anti-commuting. The odd primes of the form $4k + 1$ factor by Fermat's two-square theorem in the Gaussian plane as $p = (a + ib)(a - ib)$, a product of two Gaussian primes, and are therefore "Bosons". One can remember the rule because Bosons commute with other particles, so that if either $p$ or $q$ or both are "Bosonic", then $(p|q) = (q|p)$. The law of quadratic reciprocity was first conjectured by Euler and Legendre and published by Carl Friedrich Gauss in his Disquisitiones Arithmeticae of 1801. (Gauss found the first proof in 1796). [319, 356].
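A quick numerical check of the law, using Euler's criterion $(a|p) \equiv a^{(p-1)/2} \pmod p$ to evaluate Legendre symbols. Euler's criterion is a standard fact that the text does not state; the prime bound 100 is arbitrary.

```python
def legendre(a, p):
    """Legendre symbol (a|p) for an odd prime p, via Euler's criterion a^((p-1)/2) mod p."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r          # r is 0, 1 or p - 1

def odd_primes(limit):
    return [n for n in range(3, limit) if all(n % d for d in range(2, int(n**0.5) + 1))]

primes = odd_primes(100)
for p in primes:
    for q in primes:
        if p != q:
            sign = (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
            assert legendre(p, q) * legendre(q, p) == sign
print("quadratic reciprocity holds for all pairs of distinct odd primes below 100")
```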

62. Quadratic map

Every quadratic map $z \mapsto f(z) = z^2 + bz + d$ in the complex plane is conjugated by the translation $w = z + b/2$ to one of the quadratic family maps $T_c(w) = w^2 + c$, with $c = d + b/2 - b^2/4$. The Mandelbrot set $M = \{c \in \mathbb{C} \mid T_c^n(0) \text{ stays bounded}\}$ is also called the connectedness locus of the quadratic family, because for $c \in M$ the Julia set $J_c$, the boundary of $\{z \in \mathbb{C} \mid T_c^n(z) \text{ stays bounded}\}$, is connected, and for $c \notin M$ the Julia set $J_c$ is a Cantor set. The fundamental theorem for quadratic dynamical systems is:

Theorem: The Mandelbrot set is connected.

Mandelbrot first thought, after doing experiments and picturing the set using a computer and printing it out, that it was disconnected. The theorem is due to Adrien Douady and John Hubbard in 1982. One can also look at the connectedness locus for $T(z) = z^d + c$, which leads to Multibrot sets, or for the map $z \mapsto \bar{z}^2 + c$, which leads to the tricorn or mandelbar, which is not path connected. It is not known whether the Mandelbrot set $M$ is locally connected, nor whether it is path connected. See [520, 117, 49]
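Pictures like the ones Mandelbrot printed can be produced with the escape-time test sketched below. It relies on the standard fact that once $|T_c^n(0)| > 2$ the orbit escapes to infinity; the iteration cap `max_iter` is an arbitrary choice, so points reported inside are only approximately in $M$.

```python
def in_mandelbrot(c, max_iter=500):
    """Escape-time test for c in M. Once |T_c^n(0)| > 2 the orbit is unbounded, so
    False is certain; True only means the orbit stayed in |z| <= 2 for max_iter steps."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True

for im in range(12, -13, -2):           # coarse ASCII picture of M
    print("".join("#" if in_mandelbrot(complex(re / 20, im / 10)) else "."
                  for re in range(-42, 13)))
```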


63. Differential equations

Let us say that a differential equation $x'(t) = F(x(t))$ is integrable if every trajectory $x(t)$ either converges to infinity, or to an equilibrium point, or to a limit cycle, or to a limiting torus, where it is a periodic or almost periodic trajectory. We assume that $F$ has global solutions, meaning that a unique solution $x(t)$, $t \geq 0$, solving $x' = F(x)$ exists for all times. The Poincaré-Bendixson theorem is:

Theorem: Any differential equation in the plane is integrable.

This changes in dimensions 3 and higher. The Lorenz attractor or the Rössler attractor are examples of strange attractors, limit sets on which the dynamics can have positive topological entropy and is therefore no longer integrable. The theorem also does not hold any more if $\mathbb{R}^2$ is replaced by the 2-dimensional torus $\mathbb{T}^2$, because there can be recurrent non-periodic orbits, and even weak mixing situations can occur generically in smooth situations. The proof of the Poincaré-Bendixson theorem relies on the Jordan curve theorem, which states that a simple closed curve has an interior and an exterior in $\mathbb{R}^2$. [140, 390].
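As an illustration of the limit cycle case, the sketch below integrates the van der Pol oscillator $x' = y$, $y' = \mu(1 - x^2)y - x$ (our choice of example, with $\mu = 1$) with the classical Runge-Kutta scheme. Trajectories started inside and outside the cycle settle on the same periodic orbit, whose amplitude is close to 2. Step size and integration times are assumptions of the sketch.

```python
def van_der_pol(state, mu=1.0):
    """x' = y, y' = mu (1 - x^2) y - x: a planar system with an attracting limit cycle."""
    x, y = state
    return (y, mu * (1 - x * x) * y - x)

def rk4_step(f, s, h):
    """One classical Runge-Kutta step for s' = f(s)."""
    k1 = f(s)
    k2 = f(tuple(si + h / 2 * ki for si, ki in zip(s, k1)))
    k3 = f(tuple(si + h / 2 * ki for si, ki in zip(s, k2)))
    k4 = f(tuple(si + h * ki for si, ki in zip(s, k3)))
    return tuple(si + h / 6 * (a + 2 * b + 2 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def late_amplitude(s, h=0.01, t_transient=100.0, t_measure=20.0):
    """Integrate past the transient, then return max |x| over the next t_measure time units."""
    for _ in range(int(t_transient / h)):
        s = rk4_step(van_der_pol, s, h)
    amp = 0.0
    for _ in range(int(t_measure / h)):
        s = rk4_step(van_der_pol, s, h)
        amp = max(amp, abs(s[0]))
    return amp

# Trajectories starting inside and outside the cycle end up with the same amplitude.
print(late_amplitude((0.1, 0.0)), late_amplitude((4.0, 0.0)))   # both close to 2.0
```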

64. Approximation theory

A function f on a closed interval I = [a, b] is called continuous if for every ϵ > 0 there exists
a δ > 0 such that if |x − y| < δ then |f (x) − f (y)| < ϵ. In the space X = C(I) of all continuous
functions, one can define a distance d(f, g) = $\max_{x \in I} |f(x) - g(x)|$. A subset Y of X is called
dense if for every ϵ > 0 and every x ∈ X , there exists y ∈ Y with d(x, y) < ϵ. Let P denote
the class of polynomials in X . The Weierstrass approximation theorem tells that

Theorem: Polynomials P are dense in continuous functions C(I).

The Weierstrass theorem was proven in 1885 by Karl Weierstrass. A constructive proof suggested by Sergei Bernstein in 1912 uses, for $I = [0, 1]$, the Bernstein polynomials $f_n(x) = \sum_{k=0}^n f(k/n) B_{k,n}(x)$ with $B_{k,n}(x) = B(n, k) x^k (1 - x)^{n-k}$, where $B(n, k)$ denotes the binomial coefficients. The result has been generalized to compact Hausdorff spaces $X$ and more general subalgebras of $C(X)$. The Stone-Weierstrass approximation theorem was proven by Marshall Stone in 1937 and simplified in 1948 by Stone. In the complex, there is Runge's theorem from 1885, approximating functions holomorphic on a bounded region $G$ with rational functions uniformly on a compact subset $K$ of $G$, and Mergelyan's theorem from 1951, allowing uniform approximation on a compact subset with polynomials if the region $G$ is simply connected. In numerical analysis one has the task to approximate a given function space by functions from a simpler class. Examples are approximations of smooth functions by polynomials or by trigonometric polynomials. There is also the interpolation problem of approximating a given data set with polynomials or piecewise polynomials like splines or Bézier curves. See [709, 539].
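Bernstein's constructive proof can be tried out numerically. The sketch evaluates $f_n$ on $I = [0, 1]$ for the non-smooth example $f(x) = |x - 1/2|$ (our choice) and estimates the uniform error on a grid. The error decreases, but only slowly, roughly like $1/\sqrt{n}$ at the kink.

```python
from math import comb

def bernstein(f, n, x):
    """Bernstein polynomial f_n(x) = sum_k f(k/n) C(n, k) x^k (1 - x)^(n - k) on [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))

def sup_error(f, n, samples=201):
    """Approximate max_{x in [0,1]} |f(x) - f_n(x)| on an equally spaced grid."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in xs)

def f(x):
    return abs(x - 0.5)      # continuous but not differentiable at 1/2

for n in (10, 40, 160):
    print(n, sup_error(f, n))
```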

65. Diophantine approximation

An algebraic number is a root of a polynomial $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ with integer coefficients $a_k$. A real number $x$ is called Diophantine if for every $\epsilon > 0$ there exists a positive constant $C$ such that the Diophantine condition $|x - p/q| > C/q^{2+\epsilon}$ is satisfied for all integers $p$ and all integers $q > 0$. The Thue-Siegel-Roth theorem tells:

Theorem: Any irrational algebraic number is Diophantine.

Hurwitz's theorem from 1891 assures that for every irrational $x$ there are infinitely many $p, q$ with $|x - p/q| < C/q^2$ for $C = 1/\sqrt{5}$. This shows that the exponent $2 + \epsilon$ in the Thue-Siegel-Roth theorem cannot be replaced by an exponent smaller than 2. The Hurwitz constant $C$ is optimal: for any $C < 1/\sqrt{5}$, the golden ratio $x = (1 + \sqrt{5})/2$ has only finitely many $p, q$ with $|x - p/q| < C/q^2$. The set of Diophantine numbers has full Lebesgue measure. A slightly larger set is the Brjuno set of all numbers for which the continued fraction convergents $p_n/q_n$ satisfy $\sum_n \log(q_{n+1})/q_n < \infty$. For a Brjuno rotation number, the Siegel linearization theorem can still be proven. For quadratic polynomials, Jean-Christophe Yoccoz showed that linearizability implies that the rotation number must be a Brjuno number. [117, 333]
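The optimality of Hurwitz's constant for the golden ratio can be observed from its continued fraction $[1; 1, 1, \dots]$: the convergents are ratios of consecutive Fibonacci numbers, and $q^2 |x - p/q|$ approaches $1/\sqrt{5} \approx 0.4472$. The sketch uses floating point arithmetic, which is adequate for the first 20 convergents.

```python
from math import sqrt

def convergents(cf):
    """Yield the convergents (p_n, q_n) of the continued fraction [a0; a1, a2, ...]."""
    p_prev, p = 1, cf[0]
    q_prev, q = 0, 1
    yield p, q
    for a in cf[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield p, q

phi = (1 + sqrt(5)) / 2            # golden ratio = [1; 1, 1, 1, ...]
for p, q in convergents([1] * 20):
    print(f"{p}/{q}", q * q * abs(phi - p / q))   # tends to 1/sqrt(5) = 0.4472...
```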

66. Almost periodicity

If $\mu$ is a probability measure on the circle $\mathbb{T} = \mathbb{R}/(2\pi\mathbb{Z})$, then $\hat{\mu}_n = \int e^{inx} \, d\mu(x)$ are the Fourier coefficients of $\mu$. The Riemann-Lebesgue lemma tells that if $\mu$ is absolutely continuous, then $\hat{\mu}_n$ goes to zero. The pure point part can be detected with the following Wiener theorem:

Theorem: $\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n |\hat{\mu}_k|^2 = \sum_{x \in \mathbb{T}} |\mu(\{x\})|^2$.

This looks a bit like the Poisson summation formula $\sum_n f(n) = \sum_n \hat{f}(n)$, where $\hat{f}$ is the Fourier transform of $f$. [The latter follows from $\sum_n e^{2\pi i n x} = \sum_n \delta(x - n)$, where $\delta(x)$ is a Dirac delta function. The Poisson formula holds if $f$ is uniformly continuous and if both $f$ and $\hat{f}$ satisfy the growth condition $|f(x)| \leq C/(1 + |x|)^{1+\epsilon}$.] More generally, one can read off the Hausdorff dimension from decay rates of the Fourier coefficients. See [399, 677].
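For a purely atomic measure $\mu = \sum_j w_j \delta_{x_j}$ the Fourier coefficients are $\hat\mu_k = \sum_j w_j e^{i k x_j}$, and Wiener's theorem predicts that the Cesàro averages of $|\hat\mu_k|^2$ converge to $\sum_j w_j^2$ when the atoms are distinct points of the circle. The atoms and weights below are arbitrary choices for illustration.

```python
import cmath
import math

def fourier_coefficient(atoms, k):
    """mu_hat_k = sum_j w_j exp(i k x_j) for the discrete measure sum_j w_j delta_{x_j}."""
    return sum(w * cmath.exp(1j * k * x) for x, w in atoms)

def wiener_average(atoms, n):
    """Cesaro average (1/n) sum_{k=1}^n |mu_hat_k|^2."""
    return sum(abs(fourier_coefficient(atoms, k)) ** 2 for k in range(1, n + 1)) / n

atoms = [(0.0, 0.5), (1.0, 0.3), (math.sqrt(2), 0.2)]   # (position x_j, weight w_j)
print(wiener_average(atoms, 20000), sum(w * w for _, w in atoms))
```

The second printed number is $0.5^2 + 0.3^2 + 0.2^2 = 0.38$; the cross terms in the average decay like $1/n$.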

67. Shadowing

Let $T$ be a diffeomorphism on a smooth Riemannian manifold $M$ with geodesic metric $d$. A compact $T$-invariant set $K$ is called hyperbolic if for each $x \in K$ the tangent space $T_x M$ splits into a stable and unstable bundle $E_x^+ \oplus E_x^-$ such that for some $0 < \lambda < 1$ and constant $C$, one has $dT E_x^\pm = E_{Tx}^\pm$ and $|dT^{\pm n} v| \leq C \lambda^n |v|$ for $v \in E^\pm$ and $n \geq 0$. An $\epsilon$-pseudo orbit is a sequence $x_n$ of points in $M$ such that $x_{n+1} \in B_\epsilon(T(x_n))$, where $B_\epsilon$ is the geodesic ball of radius $\epsilon$. Two sequences $x_n, y_n \in M$ are called $\delta$-close if $d(y_n, x_n) \leq \delta$ for all $n$. We say that a set $K$ has the shadowing property if there exists an open neighborhood $U$ of $K$ such that for all $\delta > 0$ there exists $\epsilon > 0$ such that every $\epsilon$-pseudo orbit of $T$ in $U$ is $\delta$-close to a true orbit of $T$.
Theorem: Every hyperbolic set has the shadowing property.
This is only interesting for infinite $K$: if $K$ is a finite hyperbolic periodic orbit, then the shadowing orbit is this periodic orbit itself. It is interesting however for a hyperbolic invariant set like a Smale horseshoe, or in the Anosov case, which is the situation when the entire manifold is hyperbolic. See [390].
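The mechanism behind shadowing can be seen in the simplest hyperbolic example, the expanding doubling map $x \mapsto 2x \bmod 1$ on the circle. It is not a diffeomorphism, so this is only an analogue of the theorem. Starting from the last point of an $\epsilon$-pseudo orbit and pulling back with the contracting inverse branch closest to the pseudo orbit produces a true orbit which stays within $\epsilon$ of it, since each pull-back halves the accumulated error. The noise model and parameters are assumptions of the sketch.

```python
import random

def circle_dist(a, b):
    """Distance on the circle R/Z."""
    t = (a - b) % 1.0
    return min(t, 1.0 - t)

def doubling(x):
    return (2.0 * x) % 1.0

def pseudo_orbit(x0, n, eps, rng):
    """x_{k+1} = T(x_k) + noise with |noise| <= eps: an eps-pseudo orbit."""
    xs = [x0]
    for _ in range(n):
        xs.append((doubling(xs[-1]) + rng.uniform(-eps, eps)) % 1.0)
    return xs

def shadow(xs):
    """True finite orbit near xs: pull back from the last point, each time picking the
    preimage under x -> 2x mod 1 that lies closest to the pseudo orbit point."""
    y = xs[-1]
    ys = [y]
    for x in reversed(xs[:-1]):
        y = min((y / 2.0, (y + 1.0) / 2.0), key=lambda c: circle_dist(c, x))
        ys.append(y)
    ys.reverse()
    return ys

rng = random.Random(1)
xs = pseudo_orbit(0.3, 200, 1e-3, rng)
ys = shadow(xs)
print(max(circle_dist(a, b) for a, b in zip(xs, ys)))   # at most eps = 1e-3
```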

68. Partition function

Let $p(n)$ denote the number of ways we can write $n$ as a sum of positive integers without distinguishing the order. For example, $p(4) = 5$ because $4 = 3 + 1 = 2 + 2 = 2 + 1 + 1 = 1 + 1 + 1 + 1$ can be written in 4 different nontrivial ways as a sum of positive integers, in addition to the trivial partition $4$ itself. Euler used its generating function $\sum_{n=0}^\infty p(n) x^n = \prod_{k=1}^\infty (1 - x^k)^{-1}$. The reciprocal function $\prod_{k=1}^\infty (1 - x^k) = (1 - x)(1 - x^2)(1 - x^3) \cdots$ is called the Euler function and generates the generalized pentagonal numbers: the pentagonal number theorem $\prod_{k=1}^\infty (1 - x^k) = \sum_{k \in \mathbb{Z}} (-1)^k x^{k(3k-1)/2} = 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \cdots$ leads to the recursion $p(n) = p(n-1) + p(n-2) - p(n-5) - p(n-7) + p(n-12) + p(n-15) - \cdots$. The Jacobi triple product identity is

Theorem: $\prod_{m=1}^\infty (1 - x^{2m})(1 + x^{2m-1} y^2)(1 + x^{2m-1} y^{-2}) = \sum_{n=-\infty}^\infty x^{n^2} y^{2n}$.

The formula was found in 1829 by Jacobi. For $x = z\sqrt{z}$ and $y^2 = -\sqrt{z}$ the identity reduces to the pentagonal number theorem of Euler. See [21].
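Euler's recursion gives a fast way to compute $p(n)$. The sketch compares it with a straightforward dynamic programming count, our choice of cross-check, which multiplies in the factors $(1 - x^k)^{-1}$ of the generating function one at a time.

```python
def partitions_pentagonal(n_max):
    """p(0..n_max) from Euler's recursion over generalized pentagonal numbers:
    p(n) = sum_{k >= 1} (-1)^(k+1) [p(n - k(3k-1)/2) + p(n - k(3k+1)/2)]."""
    p = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        total, k = 0, 1
        while True:
            g1 = k * (3 * k - 1) // 2      # pentagonal number for k
            g2 = k * (3 * k + 1) // 2      # pentagonal number for -k
            if g1 > n:
                break
            sign = 1 if k % 2 else -1
            total += sign * p[n - g1]
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p

def partitions_dp(n_max):
    """Same numbers by multiplying in 1/(1 - x^part) for part = 1, 2, ..."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for n in range(part, n_max + 1):
            p[n] += p[n - part]
    return p

print(partitions_pentagonal(10))   # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
```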

69. Burnside lemma

If $G$ is a finite group acting on a finite set $X$, let $X/G$ denote the set of orbits and $X^g = \{x \in X \mid g.x = x\}$ the fixed point set of $g \in G$. The number $|X/G|$ of orbits, the group order $|G|$ and the sizes of the fixed point sets are related by the Burnside lemma:

Theorem: $|X/G| = \frac{1}{|G|} \sum_{g \in G} |X^g|$.

The result was first proven by Frobenius in 1887. Burnside popularized it in 1897 [108].
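A typical application counts necklaces: the cyclic group $\mathbb{Z}_n$ acts by rotation on colorings of $n$ beads with $k$ colors, and a rotation by $r$ fixes exactly $k^{\gcd(r, n)}$ colorings. The sketch compares the Burnside count with a brute force orbit count; the function names are illustrative.

```python
from itertools import product
from math import gcd

def necklaces_burnside(n, k):
    """Number of orbits of Z_n on {0..k-1}^n: (1/n) sum_r k^gcd(r, n)."""
    return sum(k ** gcd(r, n) for r in range(n)) // n

def necklaces_brute(n, k):
    """Count orbits directly by marking all rotations of each new coloring."""
    seen, orbits = set(), 0
    for c in product(range(k), repeat=n):
        if c not in seen:
            orbits += 1
            for r in range(n):
                seen.add(c[r:] + c[:r])
    return orbits

print(necklaces_burnside(6, 2), necklaces_brute(6, 2))   # 14 14
```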

70. Taylor series

A complex-valued function $f$ which is analytic in a disc $D = D_r(a) = \{|x - a| < r\}$ can be written as a series involving the $n$'th derivatives $f^{(n)}(a)$ of $f$ at $a$. If $f$ is real valued on the real axis, the function is called real analytic in $(a - r, a + r)$. In several dimensions we can use multi-index notation $a = (a_1, \dots, a_d)$, $n = (n_1, \dots, n_d)$, $x = (x_1, \dots, x_d)$, $x^n = x_1^{n_1} \cdots x_d^{n_d}$, $n! = n_1! \cdots n_d!$ and $f^{(n)}(x) = \partial_{x_1}^{n_1} \cdots \partial_{x_d}^{n_d} f(x)$, and use a polydisc $D = D_r(a) = \{|x_1 - a_1| < r_1, \dots, |x_d - a_d| < r_d\}$. The Taylor series formula is:

Theorem: For analytic $f$ in $D$, $f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x - a)^n$.

In several dimensions the sum runs over all multi-indices $n$. For example, for $f(x) = \exp(x)$, where $f^{(n)}(0) = 1$, one has $f(x) = \sum_{n=0}^\infty x^n/n!$. Using the differential operator $Df(x) = f'(x)$, one can see $f(x + t) = \sum_{n=0}^\infty \frac{f^{(n)}(x)}{n!} t^n = e^{Dt} f(x)$ as a solution of the transport equation $f_t = Df$. One can also represent $f$ by the Cauchy formula for polydiscs, $f(x) = (2\pi i)^{-d} \int_{T_r(a)} f(z) / \prod_{j=1}^d (z_j - x_j) \, dz$ for $x \in D$, integrating along the boundary torus $T_r(a) = \{|z_1 - a_1| = r_1, \dots, |z_d - a_d| = r_d\}$. Finite Taylor series hold in the case where $f$ is $m + 1$ times differentiable. In that case one has a finite series $S(x) = \sum_{n=0}^m \frac{f^{(n)}(a)}{n!} (x - a)^n$ such that the Lagrange rest term is $f(x) - S(x) = R(x) = f^{(m+1)}(\xi)(x - a)^{m+1}/(m+1)!$, where $\xi$ is between $x$ and $a$. This generalizes the mean value theorem, which is the case $m = 0$, where $f$ is only differentiable. The remainder term can also be written as $\int_a^x f^{(m+1)}(s)(x - s)^m/m! \, ds$. Brook Taylor stated but did not justify the formula in 1715. In 1742 Colin Maclaurin used the modern form. [443].
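For $f = \exp$ and $a = 0$ the Lagrange rest term gives the bound $|R(x)| \leq e^{\max(0, x)} |x|^{m+1}/(m+1)!$, because $e^\xi$ is largest at the right end of the interval between $0$ and $x$. The sketch checks this bound against the actual error of the partial sums $S(x)$; the sample points are arbitrary.

```python
from math import exp, factorial

def taylor_exp(x, m, a=0.0):
    """S(x) = sum_{n=0}^m f^(n)(a)/n! (x - a)^n for f = exp, where f^(n)(a) = exp(a)."""
    return sum(exp(a) * (x - a) ** n / factorial(n) for n in range(m + 1))

def lagrange_bound(x, m, a=0.0):
    """max over xi between a and x of exp(xi) |x - a|^(m+1) / (m+1)!, a bound for |R(x)|."""
    return exp(max(a, x)) * abs(x - a) ** (m + 1) / factorial(m + 1)

for m in (2, 5, 10):
    error = abs(exp(1.5) - taylor_exp(1.5, m))
    print(m, error, lagrange_bound(1.5, m))
```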


71. Isoperimetric inequality

Given a smooth surface $S$ in $\mathbb{R}^n$ homeomorphic to a sphere and bounding a region $B$, assume that the surface area $|S|$ is fixed. How large can the volume $|B|$ of $B$ become? If $B_1$ is the unit ball with volume $|B_1|$, the answer is given by the isoperimetric inequality:

Theorem: $n^n |B|^{n-1} \leq |S|^n / |B_1|$.

If $B = B_1$, this gives $n|B| \leq |S|$, which is an equality, as then the volume of the ball is $|B| = \pi^{n/2}/\Gamma(n/2 + 1)$ and the surface area of the sphere is $|S| = n\pi^{n/2}/\Gamma(n/2 + 1)$, which Archimedes first got in the case $n = 3$, where $|S| = 4\pi$ and $|B| = 4\pi/3$. The classical isoperimetric problem is $n = 2$, where we are in the plane $\mathbb{R}^2$. The inequality then tells $4|B| \leq |S|^2/\pi$, which means $4\pi \cdot \mathrm{Area} \leq \mathrm{Length}^2$, with equality exactly for discs. For $n = 3$, in the usual Euclidean space $\mathbb{R}^3$ and with $|S| = 4\pi$, the inequality tells $|B|^2 \leq (4\pi)^3/(27 \cdot 4\pi/3)$, which is $|B| \leq 4\pi/3$. The first proof in the case $n = 2$ was attempted by Jakob Steiner in 1838 using the Steiner symmetrization process, which is a refinement of the Archimedes-Cavalieri principle. In 1902 a proof by Hurwitz was given using Fourier series. The result has been extended to geometric measure theory [231]. One can also look at the discrete problem to maximize the area defined by a polygon: if $\{(x_i, y_i), i = 0, \dots, n-1\}$ are the points of the polygon, then the area is given by Green's formula as $A = \frac{1}{2} \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)$ and the length is $L = \sum_{i=0}^{n-1} \sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2}$, with $(x_n, y_n)$ identified with $(x_0, y_0)$. The Lagrange equations for $A$ under the constraint $L = 1$, together with fixing $(x_0, y_0)$ and $(x_1, y_1) = (1/n, 0)$, produce two maxima which are both regular polygons. A generalization to $n$-dimensional Riemannian manifolds is given by the Lévy-Gromov isoperimetric inequality.
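The discrete problem can be explored numerically. The sketch computes $A$ with Green's formula and $L$ as the sum of edge lengths, and evaluates the ratio $4\pi A/L^2 \leq 1$. For the regular $n$-gon this ratio is $(\pi/n)/\tan(\pi/n)$, which increases toward 1 as $n$ grows. The random star-shaped polygons (a hypothetical test case of ours) never beat the regular polygon with the same number of vertices.

```python
import math
import random

def area(poly):
    """Green's formula: A = 1/2 sum (x_i y_{i+1} - x_{i+1} y_i), indices mod n."""
    n = len(poly)
    return 0.5 * sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                     for i in range(n))

def length(poly):
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))

def isoperimetric_ratio(poly):
    """4 pi A / L^2, which is <= 1 by the isoperimetric inequality."""
    return 4 * math.pi * abs(area(poly)) / length(poly) ** 2

def regular_polygon(n):
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

def random_star_polygon(n, rng):
    """Hypothetical test case: vertices at jittered angles 2*pi*k/n with random radii,
    so the polygon is star-shaped around the origin and hence simple."""
    step = 2 * math.pi / n
    pts = []
    for k in range(n):
        t = k * step + rng.uniform(-0.2, 0.2) * step
        r = rng.uniform(0.5, 1.5)
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

for n in (3, 4, 6, 12, 100):
    print(n, isoperimetric_ratio(regular_polygon(n)))
rng = random.Random(3)
print(isoperimetric_ratio(random_star_polygon(6, rng)))
```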

72. Riemann Roch

A Riemann surface is a one-dimensional complex manifold. It is a two-dimensional real analytic manifold, but it also has a complex structure, which for example forces it to be orientable. Let $G$ be a compact connected Riemann surface of genus $g$, the number of handles of $G$, so that its first Betti number is $b_1(G) = 2g$. Write $\chi(G) = 1 - g$, which is half of the topological Euler characteristic $2 - 2g$; the 1 stands for $b_0(G) = 1$, indicating that we have only one connected component. A divisor $D = \sum_i a_i z_i$ on $G$ is an element of the free Abelian group on the points of the surface. These are finite formal sums of points $z_i$ in $G$, where $a_i \in \mathbb{Z}$ is the multiplicity of the point $z_i$. The degree of the divisor is defined as $\deg(D) = \sum_i a_i$. Let us write $\chi(D) = \deg(D) + \chi(G) = \deg(D) + 1 - g$ and call this the Euler characteristic of the divisor $D$, as one can see a divisor as a geometric object by itself generalizing the complex manifold $G$ (which is the case $D = 0$). A meromorphic function $f$ on $G$ defines the principal divisor $(f) = \sum_i a_i z_i - \sum_j b_j w_j$, where $a_i$ are the multiplicities of the roots $z_i$ of $f$ and $b_j$ the multiplicities of the poles $w_j$ of $f$. The divisor $(\omega)$ of a nonzero global meromorphic 1-form $\omega$ is called a canonical divisor $K$. Let $l(D)$ be the dimension of the linear space of meromorphic functions $f$ on $G$ for which $(f) + D \geq 0$. (The notation $\geq 0$ means that all coefficients are non-negative; one calls such a divisor effective.) The Riemann-Roch theorem is
Theorem: l(D) − l(K − D) = χ(D)

The idea of a Riemann surface was introduced by Bernhard Riemann. Riemann-Roch was proven for Riemann surfaces by Bernhard Riemann in 1857 and Gustav Roch in 1865. It is possible to see this as an Euler-Poincaré type relation by identifying the left hand side as a signed cohomological Euler characteristic and the right hand side as a combinatorial Euler characteristic. There are various generalizations, to arithmetic geometry or to higher dimensions. See [290, 628].

73. Optimal transport

Given two probability spaces $(X, P)$, $(Y, Q)$ and a continuous cost function $c: X \times Y \to [0, \infty]$, the optimal transport problem or Monge-Kantorovich minimization problem is to find the minimum of $\int_X c(x, T(x)) \, dP(x)$ among all coupling transformations $T: X \to Y$ which transport the measure $P$ to the measure $Q$. More generally, one looks at a measure $\pi$ on $X \times Y$ such that the projection of $\pi$ onto $X$ is $P$ and the projection of $\pi$ onto $Y$ is $Q$. The function to optimize is then $I(\pi) = \int_{X \times Y} c(x, y) \, d\pi(x, y)$. One of the fundamental results is that optimal transport exists. The technical assumption is that the two probability spaces $X, Y$ are Polish (separable complete metric spaces) and that the cost function $c$ is continuous.
Theorem: For continuous cost functions c, there exists a minimum of I .

In the simple set-up of compact metric spaces, this just follows from the compactness (the Alaoglu
theorem for balls in the weak star topology of a Banach space) of the set of probability measures:
any sequence π_n of couplings on X × Y has a weakly convergent subsequence, and its limit is again a coupling. Since I is
continuous, picking a sequence π_n with I(π_n) converging to the infimum produces a minimum. The problem
was formalized in 1781 by Gaspard Monge and worked on by Leonid Kantorovich. Hiroshi
Tanaka in the 1970s produced connections with partial differential equations like the Boltzmann
equation. There are also connections to weak KAM theory in the form of Aubry-Mather
theory. The above existence result is true under substantially less regularity. The question of
uniqueness or the existence of a Monge coupling given in the form of a transformation T is
subtle [723].
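For finitely many points with equal weights, couplings are doubly stochastic matrices. By the Birkhoff-von Neumann theorem (brought in from outside the text), a minimum is then attained at a permutation matrix, so it is a Monge map T. The sketch below minimizes over all permutations (small n only; helper names are hypothetical) and checks that on the real line, with the assumed convex cost c(x, y) = (x − y)², the monotone matching of sorted points is optimal.

```python
from itertools import permutations

def cost(x, y):
    return (x - y) ** 2          # assumed convex cost c(x, y)

def brute_force_transport(xs, ys):
    """Minimize (1/n) * sum_i c(x_i, y_sigma(i)) over all permutations sigma."""
    n = len(xs)
    best = min(permutations(range(n)),
               key=lambda s: sum(cost(xs[i], ys[s[i]]) for i in range(n)))
    value = sum(cost(xs[i], ys[best[i]]) for i in range(n)) / n
    return best, value

def monotone_transport_value(xs, ys):
    """Match the k-th smallest x with the k-th smallest y."""
    return sum(cost(a, b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

xs = [0.3, -1.2, 2.5, 0.9, -0.4]
ys = [1.1, 3.0, -2.0, 0.2, 0.7]
plan, value = brute_force_transport(xs, ys)
print(plan, value, monotone_transport_value(xs, ys))
```

The brute force grows like n!, so it only serves as a check; for real problems one would solve the Kantorovich linear program instead.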

74. Structure from motion

Let m hyperplanes in R^d, serving as retinas or photographic plates for affine cameras, and
n points in R^d be given. The affine structure from motion problem is to understand under which
conditions it is possible to recover both the points and planes when knowing the orthogonal
projections onto the planes. It is a model problem for the task to reconstruct both the scene
as well as the camera positions if the scene has n points and m camera pictures were taken.
Ullman's theorem is a prototype result with m = 3 different cameras and n = 3 points which are
not collinear. Other setups are perspective cameras or omni-directional cameras. The
Ullman map F is a nonlinear map from R^{d·2} × SO_d^2 to (R^{3d−3})^2, which is a map between equal
dimensional spaces if d = 2 and d = 3. The group SO_d is the rotation group in R^d describing the
possible ways in which the affine camera can be positioned. Affine cameras capture the same
picture when translated, so that the planes can all go through the origin. In the case d = 2, we
get a map from R^4 × SO_2^2 to R^6 and in the case d = 3, F maps R^6 × SO_3^2 into R^12.

Theorem: The structure from motion map is locally invertible.

In the case d = 2, there is a reflection ambiguity. In dimension d = 3, the number of ambiguities
is typically 64. Ullman's theorem appeared in 1979 in [715]. Ullman states the theorem for d = 3
with 4 points, as adding a fourth point cuts the number of ambiguities from 64 to 2. In [427],
both in dimension d = 2 and d = 3, the Jacobian dF of the Ullman map is seen to be invertible
and the inverse of F is given explicitly. For structure from motion problems in computer vision
in general, see [230, 320, 710]. In applications, one takes n and m large and reconstructs both
the points as well as the camera parameters using statistical data fitting.
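The statement that the Ullman map goes between spaces of equal dimension only for d = 2 and d = 3 is a dimension count. The domain R^{d·2} × SO_d^2 has dimension 2d + 2 · d(d − 1)/2, and the target (R^{3d−3})^2 has dimension 2(3d − 3). A minimal check (function names are hypothetical):

```python
def domain_dim(d):
    so_d = d * (d - 1) // 2          # dimension of the rotation group SO_d
    return 2 * d + 2 * so_d          # R^(d*2) x SO_d x SO_d

def target_dim(d):
    return 2 * (3 * d - 3)           # (R^(3d-3))^2

equal = [d for d in range(1, 11) if domain_dim(d) == target_dim(d)]
print(equal)   # [2, 3]
```

Equivalently, d² + d = 6d − 6 has exactly the roots d = 2 and d = 3.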

75. Poisson equation

What functions u solve the Poisson equation −∆u = f, a partial differential equation? The
solution can be written down for f ∈ L^1 as u(x) = Kf(x) + h(x) with Kf(x) = ∫_{R^n} G(x, y) f(y) dy, where h is
harmonic. If f = 0, then the Poisson equation is the Laplace equation. The function G(x, y)
is the Green's function, an integral kernel. It satisfies −∆G(x, y) = δ(y − x), where δ is the
Dirac delta function, a distribution. It is given by G(x, y) = − log|x − y|/(2π) for n = 2 or
G(x, y) = |x − y|^{−1}/(4π) for n = 3. In elliptic regularity theory, one replaces the Laplacian
−∆ with an elliptic second order differential operator L = A(x) · D · D + b(x) · D + V(x),
where D = ∇ is the gradient, A is a positive definite matrix, b(x) is a vector field and V is
a scalar field.

Theorem: If f ∈ L^p with p > n, then Kf is differentiable.

The result is much more general and can be extended. If f is in C^k and has compact support,
for example, then Kf is in C^{k+1}. An example of the more general set-up is the Schrödinger
operator L = −∆ + V(x) − E. A solution of Lu = 0 then solves an eigenvalue problem.
As one looks for solutions in L², a nonzero solution only exists if E is an eigenvalue of −∆ + V. The
Euclidean space R^n can be replaced by a bounded domain Ω of R^n, where one can look at
boundary conditions like those of Dirichlet or Neumann type. Or one can look at the situation
on a general Riemannian manifold M with or without boundary. On a Hilbert space, one has
then Fredholm theory. The equation u(x) = ∫ G(x, y) f(y) dy is called a Fredholm integral
equation and det(1 − sG) = exp(−Σ_{n≥1} s^n tr(G^n)/n) the Fredholm determinant, leading to
the zeta function 1/det(1 − sG). See [597, 471].
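To see the Green's function representation at work, here is a one-dimensional analogue, chosen for simplicity (the kernels above are for n = 2, 3). For −u″ = f on [0, 1] with u(0) = u(1) = 0, the Green's function is G(x, y) = x(1 − y) for x ≤ y and y(1 − x) for x ≥ y. The sketch evaluates u(x) = ∫ G(x, y) f(y) dy with the midpoint rule for f(y) = π² sin(πy), whose exact solution is u(x) = sin(πx).

```python
import math

def green(x, y):
    """Dirichlet Green's function of -d^2/dx^2 on [0, 1]."""
    return x * (1 - y) if x <= y else y * (1 - x)

def solve_poisson(f, x, n=2000):
    """Approximate u(x) = int_0^1 G(x, y) f(y) dy with the midpoint rule on n cells."""
    h = 1.0 / n
    return h * sum(green(x, (k + 0.5) * h) * f((k + 0.5) * h) for k in range(n))

f = lambda y: math.pi ** 2 * math.sin(math.pi * y)
for x in (0.1, 0.3, 0.5):
    print(x, solve_poisson(f, x), math.sin(math.pi * x))
```

The kernel has a kink at y = x, so the quadrature is only second order accurate, which is enough to see agreement to several digits.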

76. Four square theorem

Waring's problem asked whether there exists for every k an integer g(k) such that every
positive integer can be written as a sum of g(k) k-th powers x_1^k + · · · + x_{g(k)}^k. Obviously g(1) = 1.
David Hilbert proved in 1909 that g(k) is finite. This is the Hilbert-Waring theorem. The
following theorem of Lagrange tells that g(2) = 4:

Theorem: Every positive integer is a sum of four squares.

The result needs only to be verified for prime numbers, as N(a, b, c, d) = a² + b² + c² + d²
is a norm for quaternions q = (a, b, c, d) which has the property N(pq) = N(p)N(q). This
property can be seen also as a Cauchy-Binet formula, when writing quaternions as complex
2 × 2 matrices. The four-square theorem had been conjectured already by Diophantus, but
was proven first by Lagrange in 1770. The case g(3) = 9 was settled by Arthur Wieferich in 1909, with a gap closed by Aubrey Kempner in 1912. It is
conjectured that g(k) = 2^k + [(3/2)^k] − 2, where [x] is the integral part of a real number. See
[176, 177, 356].
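Both ingredients of the argument can be checked by brute force for small numbers. The sketch below (hypothetical helper names) finds a four-square representation for every n up to a bound, and verifies the norm identity N(pq) = N(p)N(q) for small integer quaternions using the Hamilton product.

```python
from itertools import product
from math import isqrt

def four_squares(n):
    """Return (a, b, c, d) with a >= b >= c >= d >= 0 and a^2 + b^2 + c^2 + d^2 = n."""
    for a in range(isqrt(n), -1, -1):
        for b in range(min(a, isqrt(n - a * a)), -1, -1):
            for c in range(min(b, isqrt(n - a * a - b * b)), -1, -1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d * d == rest and d <= c:
                    return (a, b, c, d)
    return None

def qmul(p, q):
    """Hamilton product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def norm(q):
    return sum(x * x for x in q)

print(four_squares(7))   # (2, 1, 1, 1): 7 needs all four squares
print(all(four_squares(n) is not None for n in range(1, 2001)))
quats = list(product(range(-1, 2), repeat=4))
print(all(norm(qmul(p, q)) == norm(p) * norm(q) for p in quats for q in quats))
```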


77. Knots

A knot is a closed curve in R^3, an embedding of the circle in three dimensional Euclidean
space. One also draws knots in the 3-sphere S^3. As the knot complement S^3 − K of a knot
K characterizes the knot up to mirror reflection, the theory of knots is part of 3-manifold
theory. The HOMFLYPT polynomial P of a knot or link K is defined recursively using
skein relations lP(L+) + l^{−1}P(L−) + mP(L0) = 0. Let K#L denote the knot sum, which
is a connected sum. Oriented knots form with this operation a commutative monoid with the
unknot as unit. It features a unique prime factorization. The unknot has P(K) = 1, the
two-component unlink has P(K) = −(l + l^{−1})/m. A trefoil knot has P(K) = −2l² − l⁴ + l²m²,
its mirror image the same expression with l replaced by l^{−1}.

Theorem: P(K#L) = P(K)P(L).

The Alexander polynomial was discovered in 1928 and initiated classical knot theory. John
Conway showed in the 1960s how to compute the Alexander polynomial using recursive skein
relations (skein comes from French escaigne = hank of yarn). The Alexander polynomial allows
one to compute an invariant for knots by looking at a projection. The Jones polynomial found
by Vaughan Jones came in 1984. This is generalized by the HOMFLYPT polynomial named
after Jim Hoste, Adrian Ocneanu, Kenneth Millett, Peter J. Freyd, W.B.R. Lickorish and David Yetter from
1985 and J. Przytycki and P. Traczyk from 1987. See [5]. Further invariants are Vassiliev
invariants of 1990 and Kontsevich invariants of 1993.
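The skein relation can be run by hand on standard diagrams. The sketch below stores Laurent polynomials in l and m as dictionaries (a hypothetical representation) and uses the usual three steps: a one-crossing diagram of the unknot gives the two-component unlink, the positive Hopf link reduces to the unlink and the unknot, and the trefoil with three positive crossings reduces to the unknot and the positive Hopf link. The mirror image is obtained by replacing l with l^{−1}.

```python
from collections import defaultdict

# A Laurent polynomial in l and m is stored as {(exponent of l, exponent of m): coefficient}.
def add(*polys):
    out = defaultdict(int)
    for p in polys:
        for key, c in p.items():
            out[key] += c
    return {k: c for k, c in out.items() if c != 0}

def mul(p, q):
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return {k: c for k, c in out.items() if c != 0}

def mono(c, el, em):
    return {(el, em): c}

def solve_plus(p_minus, p_zero):
    """From l P(L+) + l^-1 P(L-) + m P(L0) = 0: P(L+) = -l^-2 P(L-) - l^-1 m P(L0)."""
    return add(mul(mono(-1, -2, 0), p_minus), mul(mono(-1, -1, 1), p_zero))

def mirror(p):
    """Mirror image: replace l by l^-1."""
    return {(-a, b): c for (a, b), c in p.items()}

unknot = mono(1, 0, 0)
# One-crossing unknot diagram: L+ = L- = unknot, L0 = unlink, so P(unlink) = -(l + l^-1)/m.
unlink = mul(mono(-1, 0, -1), add(mono(1, 1, 0), mono(1, -1, 0)))
hopf = solve_plus(unlink, unknot)     # positive Hopf link: L- = unlink, L0 = unknot
trefoil = solve_plus(unknot, hopf)    # positive trefoil: L- = unknot, L0 = positive Hopf link
print(sorted(trefoil.items()))        # -2 l^-2 - l^-4 + l^-2 m^2
print(sorted(mirror(trefoil).items()))
```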

78. Hamiltonian dynamics

Given a probability space (M, A, m) and a smooth Lie manifold N with potential function
V : N → R, the Vlasov Hamiltonian differential equations on all maps X = (f, g) : M → T*N
are f′ = g, g′ = −∫_M ∇V(f(x) − f(y)) dm(y). Starting with X^0 = Id, we get a flow X^t
and by push forward an evolution P^t = X^t_* m of probability measures on T*N. The Vlasov
integro-differential equations on measures in T*N are Ṗ^t(x, y) + y · ∇_x P^t(x, y) − W(x) · ∇_y P^t(x, y) = 0
with W(x) = ∫_{T*N} ∇_x V(x − x′) P^t(x′, y′) dy′ dx′. Note that while X^t solves an infinite dimensional
ordinary differential equation evolving maps M → T*N, the path P^t solves an integro-differential
equation describing the evolution of measures on T*N.

Theorem: If X^t solves the Vlasov Hamiltonian system, then P^t = X^t_* m solves the Vlasov equation.

This is a result which goes back to James Clerk Maxwell. Vlasov dynamics was introduced in
1938 by Anatoly Vlasov. An existence result was proven by W. Braun and Klaus Hepp in 1977.
The maps X^t stay perfectly smooth if smooth initially. However, even if P^0 is smooth,
the measure P^t in general rather quickly develops singularities, so that the partial differential
equation has only weak solutions. The analysis of P directly would involve complicated
function spaces. The fundamental theorem of Vlasov dynamics therefore plays the role
of the method of characteristics in this field. If M is a finite probability space, then the
Vlasov Hamiltonian system is the Hamiltonian n-body problem on N. Another example is
M = T*N, where m is an initial phase space measure. Now X^t is a one parameter family of
diffeomorphisms X^t : M → T*N pushing forward m to a measure P^t on the cotangent bundle.
If M is a circle, then X^0 defines a closed curve on T*N. In particular, if γ(t) is a curve in N and
X^0(t) = (γ(t), 0), we have a continuum of particles initially at rest which evolve by interacting
with a force ∇V. About interacting particle dynamics, see [667].
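For a finite probability space M = {1, . . . , n} with equal weights 1/n, the Vlasov Hamiltonian system is an n-body problem. The sketch below assumes N = R and the harmonic potential V(x) = x²/2 (hypothetical choices for illustration), so that ∇V(f_i − f_j) = f_i − f_j, and integrates f′ = g, g′ = −(1/n) Σ_j ∇V(f_i − f_j) with the velocity Verlet scheme, an integrator chosen here for convenience. Since the pairwise forces are antisymmetric, the total momentum is conserved.

```python
def force(f):
    """Force on each particle: -(1/n) * sum_j V'(f_i - f_j) with V(x) = x**2 / 2."""
    n = len(f)
    return [-sum(fi - fj for fj in f) / n for fi in f]

def vlasov_n_body(f, g, dt=0.01, steps=1000):
    """Velocity Verlet integration of f' = g, g' = force(f)."""
    f, g = list(f), list(g)
    a = force(f)
    for _ in range(steps):
        g_half = [gi + 0.5 * dt * ai for gi, ai in zip(g, a)]
        f = [fi + dt * gi for fi, gi in zip(f, g_half)]
        a = force(f)
        g = [gi + 0.5 * dt * ai for gi, ai in zip(g_half, a)]
    return f, g

f0 = [0.0, 1.0, 3.0, -2.0]     # initial positions f(x) for the four points x of M
g0 = [0.5, 0.0, -1.0, 0.2]     # initial momenta g(x)
f1, g1 = vlasov_n_body(f0, g0)
print(sum(g0), sum(g1))        # total momentum is conserved up to rounding
```

The pushed-forward measure P^t is here just the empirical measure of the four points (f_i, g_i) in T*R.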


79. Hypercomplexity

A hypercomplex algebra is a finite dimensional algebra over R which is unital and distributive. Up to isomorphism, the two-dimensional
hypercomplex algebras over the reals are the complex numbers x + iy with i² = −1, the
split complex numbers x + jy with j² = 1 and the dual numbers (the exterior algebra)
x + ϵy with ϵ² = 0. A division algebra over a field F is an algebra over F in which division
is possible. Wedderburn's little theorem tells that a finite division algebra must be a finite
field. Among the three two-dimensional algebras above, only C is a division algebra. The following theorem of
Frobenius classifies the class X of finite dimensional associative division algebras over R:

Theorem: X consists of the algebras R, C and H.

Hypercomplex numbers like quaternions, tessarines or octonions extend the algebra of
complex numbers. Cataloging them started with Benjamin Peirce's 1872 "Linear associative
algebra". Dual numbers were introduced in 1873 by William Clifford. The Cayley-Dickson
construction generates iteratively algebras of twice the dimension: like the complex numbers
from the reals, the quaternions from the complex numbers or the octonions from the quaternions
(for octonions, associativity is lost). The next step leads to sedenions, but the latter are not
even an alternative algebra any more. The Hurwitz and Frobenius theorems limit the number
in the case of real normed division algebras. Ferdinand Georg Frobenius classified in 1877 the
finite-dimensional associative division algebras. Adolf Hurwitz proved in 1923 (posthumously)
that a unital finite dimensional real algebra endowed with a multiplicative positive-definite quadratic form
(a real normed division algebra) must be R, C, H or O. These four are the only Euclidean
Hurwitz algebras. In 1907, Joseph Wedderburn classified simple algebras (simple meaning
that there are no non-trivial two-sided ideals): they are matrix algebras over division algebras. Finite dimensional
real division algebras have dimension 1, 2, 4 or 8, as Michel Kervaire and, independently, Raoul
Bott and John Milnor have shown in 1958 by relating the problem to the parallelizability of
spheres. In 1960, J. Frank Adams solved the related Hopf invariant one problem, which gives a further
topological proof. The problem of classification of division algebras over a field F led Richard Brauer
to the Brauer group Br(F), which Jean-Pierre Serre identified with the Galois cohomology group
H²(F, G_m), where G_m is the multiplicative group seen as an algebraic group. Each Brauer
equivalence class of central simple algebras contains a unique division
algebra by the Artin-Wedderburn theorem. Examples: the Brauer group of an algebraically
closed field or finite field is trivial, the Brauer group of R is Z_2. Brauer groups were later
defined for commutative rings by Maurice Auslander and Oscar Goldman and by Alexander
Grothendieck in 1968 for schemes. Ofer Gabber extended the Serre result to schemes with
ample line bundles. The finiteness of the Brauer group of a proper integral scheme is open. See
[37, 228].
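The doubling construction can be written recursively in a few lines. The sketch below uses one common convention for the Cayley-Dickson product, (a, b)(c, d) = (ac − d*b, da + bc*) with conjugation (a, b)* = (a*, −b); other conventions are also in use. It tests on random integer vectors whether the norm N(x) = Σ x_i² is multiplicative. One expects this for the dimensions 1, 2, 4, 8 (R, C, H, O) and a failure for the 16-dimensional sedenions, which have zero divisors. Function names are hypothetical.

```python
import random

def conj(x):
    """Conjugation in the Cayley-Dickson algebra of dimension len(x), a power of 2."""
    if len(x) == 1:
        return list(x)
    h = len(x) // 2
    return conj(x[:h]) + [-t for t in x[h:]]

def cd_mul(x, y):
    """Cayley-Dickson product (a, b)(c, d) = (ac - d*b, da + bc*)."""
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    first = [s - t for s, t in zip(cd_mul(a, c), cd_mul(conj(d), b))]
    second = [s + t for s, t in zip(cd_mul(d, a), cd_mul(b, conj(c)))]
    return first + second

def norm(x):
    return sum(t * t for t in x)

def norm_is_multiplicative(dim, trials=20, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(-5, 5) for _ in range(dim)]
        y = [rng.randint(-5, 5) for _ in range(dim)]
        if norm(cd_mul(x, y)) != norm(x) * norm(y):
            return False
    return True

for dim in (1, 2, 4, 8, 16):
    print(dim, norm_is_multiplicative(dim))
```

Integer inputs keep the comparison exact; the identity fails from dimension 16 on because the octonions used as building blocks are no longer associative.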

80. Approximation

The Kolmogorov-Arnold superposition theorem shows that continuous functions C(R^n)
of several variables can be written as a composition of continuous functions of two variables:

Theorem: Every f ∈ C(R^n) is a composition of continuous functions in C(R²).

More precisely, it is now known since 1962 that there exist functions f_{k,l} and a function g
in C(R) such that f(x_1, . . . , x_n) = Σ_{k=0}^{2n} g(f_{k,1}(x_1) + · · · + f_{k,n}(x_n)). As one can write finite
sums using functions of two variables like h(x, y) = x + y or h(x + y, z) = x + y + z, two
variables suffice. The above form was given by George Lorentz in 1962. Andrei Kolmogorov
reduced the problem in 1956 to functions of three variables. Vladimir Arnold showed then (as a
student at Moscow State University) in 1957 that one can do with two variables. The problem
came from a more specific problem in algebra: finding the roots of a polynomial
p(x) = x^n + a_1x^{n−1} + · · · + a_n using radicals and arithmetic operations in the coefficients is not
possible in general for n ≥ 5. Erland Samuel Bring showed in 1786 that a quintic can be reduced
to x^5 + ax + 1. In 1836 William Rowan Hamilton showed that the sextic can be reduced to
x^6 + ax^2 + bx + 1, the septic to x^7 + ax^3 + bx^2 + cx + 1 and the degree 8 equation to a 4 parameter problem
x^8 + ax^4 + bx^3 + cx^2 + dx + 1. Hilbert conjectured that one can not do better. These are
Hilbert's 13th problem (for the septic), the sextic conjecture and the octic conjecture. In 1957, Arnold and
Kolmogorov showed that no topological obstructions exist to reduce the number of variables.
Important progress was made in 1975 by Richard Brauer. Some history is given in [229].
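A toy instance of the superposition idea, not the Kolmogorov-Arnold construction itself (whose inner functions are not explicit): the genuinely two-variable function xy can be written using only addition and the one-variable function t ↦ t², as xy = ((x + y)² − (x − y)²)/4. A minimal check with a hypothetical helper name:

```python
def product_via_sums(x, y):
    """xy written with addition and the one-variable function t -> t^2 only."""
    square = lambda t: t * t
    return (square(x + y) - square(x - y)) / 4

print(product_via_sums(3.0, -7.5))   # -22.5
```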

81. Determinants

The determinant of an n × n matrix A is defined as the sum Σ_π sign(π) A_{1π(1)} · · · A_{nπ(n)},
where the sum is over all n! permutations π of {1, . . . , n} and sign(π) ∈ {−1, 1} is the signature of
the permutation π. The determinant functional satisfies the product formula det(AB) =
det(A)det(B). As the determinant is the constant coefficient of the characteristic polynomial
p_A(x) = det(A − x1) = p_0(−x)^n + p_1(−x)^{n−1} + · · · + p_k(−x)^{n−k} + · · · + p_n of A, one can
get the coefficients p_k of the characteristic polynomial of the product F^T G of two n × m matrices F, G as follows:

Theorem: p_k = Σ_{|P|=k} det(F_P) det(G_P).

The right hand side is a sum over all minors of length k, including the empty one |P| =
0, where det(F_P) det(G_P) = 1. This implies det(1 + F^T G) = Σ_P det(F_P) det(G_P) and so
det(1 + F^T F) = Σ_P det²(F_P). The classical Cauchy-Binet theorem is the special case k = m,
where det(F^T G) = Σ_P det(F_P) det(G_P) is a sum over all m × m patterns if n ≥ m. It has as an even
more special case the Pythagorean consequence det(A^T A) = Σ_P det²(A_P). The determinant
product formula is the even more special case when n = m. [369, 423, 341].
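The identities can be checked with exact integer arithmetic. The sketch computes determinants with the permutation sum from the definition above (only sensible for tiny matrices). It compares det(F^T G) with the Cauchy-Binet sum over m-element row sets, and det(1 + F^T G) with the sum over all pairs of row and column sets of equal size, including the empty pair. The matrices and helper names are illustrative.

```python
from itertools import combinations, permutations

def sign(perm):
    """Signature of a permutation via its number of inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Permutation-sum definition; the empty matrix has determinant 1."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def minor(X, rows, cols):
    return [[X[i][j] for j in cols] for i in rows]

def cauchy_binet_rhs(F, G):
    """Sum over all m-element row sets S of det(F_S) det(G_S)."""
    n, m = len(F), len(F[0])
    return sum(det(minor(F, S, range(m))) * det(minor(G, S, range(m)))
               for S in combinations(range(n), m))

def det_one_plus_rhs(F, G):
    """Sum over row sets I and column sets J with |I| = |J| of det(F_IJ) det(G_IJ)."""
    n, m = len(F), len(F[0])
    return sum(det(minor(F, I, J)) * det(minor(G, I, J))
               for k in range(m + 1)
               for I in combinations(range(n), k)
               for J in combinations(range(m), k))

F = [[1, 2], [3, 4], [5, 6]]
G = [[2, 0], [1, 1], [0, 3]]
FtG = matmul(transpose(F), G)
print(det(FtG), cauchy_binet_rhs(F, G))          # -34 -34
one_plus = [[FtG[i][j] + (i == j) for j in range(2)] for i in range(2)]
print(det(one_plus), det_one_plus_rhs(F, G))     # -6 -6
```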

82. Triangles

A triangle T on a two-dimensional surface S is defined by three points A, B, C joined by three
geodesic paths. (It is assumed that the three geodesic paths have no self-intersections nor other
intersections besides A, B, C, so that T is a topological disk with a piecewise geodesic boundary.)
If α, β, γ are the inner angles of a triangle T located on a surface with curvature K, there
is the Gauss-Bonnet formula ∫_S K(x) dA(x) = 2πχ(S), where dA denotes the area element on
the surface. This implies a relation between the integral of the curvature over the triangle and
the angles:

Theorem: α + β + γ = ∫_T K dA + π

This can be seen as a special Gauss-Bonnet result for Riemannian manifolds with boundary,
as it is equivalent to ∫_T K dA + α′ + β′ + γ′ = 2π with complementary angles α′ =
π − α, β′ = π − β, γ′ = π − γ. One can think of the vertex contributions as boundary curvatures
(generalized functions). In the case of constant curvature K, the formula becomes
α + β + γ = KA + π, where A is the area of the triangle. Since antiquity, one knows the
flat case K = 0, where π = α + β + γ, taught in elementary school. On the unit sphere
this is α + β + γ = A + π, a result of Albert Girard which was predated by Thomas Harriot.
In the Poincaré disk model K = −1, this is α + β + γ = −A + π, which is usually stated
as saying that the area of a triangle in the disk is π − α − β − γ. This was proven by Johann Heinrich
Lambert. See [101] for spherical geometry and [19] for hyperbolic geometry, which are both
part of non-Euclidean geometry and now part of Riemannian geometry. [58, 381]
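Girard's formula on the unit sphere can be checked numerically. The sketch below (hypothetical helper names) measures the inner angles between tangent vectors of the great-circle arcs, and computes the area independently with the Van Oosterom-Strackee solid angle formula tan(A/2) = |a · (b × c)|/(1 + a · b + b · c + c · a) for unit vectors a, b, c, a formula brought in from outside the text. The defect α + β + γ − π − A should vanish up to rounding.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(u):
    r = math.sqrt(dot(u, u))
    return tuple(a / r for a in u)

def angle_at(a, b, c):
    """Inner angle at vertex a between the great-circle arcs towards b and c."""
    tb = tuple(bi - dot(a, b) * ai for ai, bi in zip(a, b))   # tangent direction towards b
    tc = tuple(ci - dot(a, c) * ai for ai, ci in zip(a, c))   # tangent direction towards c
    cos_angle = dot(tb, tc) / math.sqrt(dot(tb, tb) * dot(tc, tc))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

def area(a, b, c):
    """Area of the spherical triangle (Van Oosterom-Strackee formula)."""
    numerator = abs(dot(a, cross(b, c)))
    denominator = 1 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2 * math.atan2(numerator, denominator)

def girard_defect(a, b, c):
    a, b, c = normalize(a), normalize(b), normalize(c)
    angle_sum = angle_at(a, b, c) + angle_at(b, c, a) + angle_at(c, a, b)
    return angle_sum - math.pi - area(a, b, c)

print(girard_defect((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # octant: three right angles, area pi/2
print(girard_defect((1, 0.2, 0.1), (0.3, 1, -0.2), (0.1, 0.4, 1)))
```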

83. KAM

An area preserving map T(x, y) = (2x − y + cf(x), x) has an orbit (x_{n+1}, x_n) on T² = (R/Z)²
which satisfies the recursion x_{n+1} − 2x_n + x_{n−1} = cf(x_n). The 1-periodic function f is assumed
to be real-analytic, non-constant and satisfying ∫_0^1 f(x) dx = 0. In the case f(x) = sin(2πx), one
has the Standard map. When looking for invariant curves (q(t + α), q(t)) with smooth q, we
seek a solution of the nonlinear equation F(q) = q(t + α) − 2q(t) + q(t − α) − cf(q(t)) = 0. For
c = 0, there is the solution q(t) = t. The linearization dF(q)(u) = Lu = u(t + α) − 2u(t) +
u(t − α) − cf′(q(t))u(t) is a bounded linear operator on L²(T), but it is not invertible for c = 0, so
that the implicit function theorem does not apply. The map Lu = u(t + α) − 2u(t) + u(t − α)
becomes after a Fourier transform the diagonal matrix L̂û_n = [2 cos(2πnα) − 2]û_n, which has the
inverse diagonal entries [2 cos(2πnα) − 2]^{−1}, leading to small divisors. A real number α is called
Diophantine if there exists a constant C > 0 such that for all integers p, q with q ≠ 0, we have
|α − p/q| ≥ C/q². KAM theory assures that the solution q_0(t) = t persists and remains
smooth (as q_c(t)) if c is small. By a solution, the theorem means a smooth solution. For real
analytic F, it can be real analytic. The following result is a special case of the twist map
theorem.

Theorem: For Diophantine α, there is a solution of F(q) = 0 for small |c|.

The KAM theorem was predated by the Poincaré-Siegel theorem in complex dynamics,
which assures that if f is analytic near z = 0 and f′(0) = λ = exp(2πiα) with Diophantine α,
then there exists u(z) = z + q(z) such that f(u(z)) = u(λz) holds in a small disk around 0: writing
f(z) = λz + g(z), there is an analytic solution q to the Schröder equation λq(z) + g(z + q(z)) = q(λz). The question about
the existence of invariant curves is important because it determines the stability. The twist
map theorem also follows from a strong implicit function theorem initiated by John
Nash and Jürgen Moser. For larger c, or non-Diophantine α, the solution q still exists but it is
no longer continuous. This is Aubry-Mather theory. For c ≠ 0, the operator L̂ is an almost
periodic Toeplitz matrix on l²(Z), which is a special kind of discrete Schrödinger operator.
The decay rate of the off-diagonals depends on the smoothness of f. Getting control of the
inverse can be technical [85]. Even in the Standard map case f(x) = sin(2πx), the composition
f(q(t)) is no longer a trigonometric polynomial, so that L̂ is not a Jacobi matrix in a strip.
The first breakthrough of the theorem in a framework of Hamiltonian differential equations
was done in 1954 by Andrey Kolmogorov. Jürgen Moser proved the discrete twist map version
and Vladimir Arnold in 1963 proved the theorem for Hamiltonian systems. The above stated
result generalizes to higher dimensions, where one looks for invariant tori called KAM tori
and where one also needs some non-degeneracy conditions. See [117, 530, 532]. For the story
on KAM, see [198].
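The small divisor problem can be made concrete. On T = R/Z with Fourier modes e^{2πint}, the operator u ↦ u(t + α) − 2u(t) + u(t − α) multiplies the n-th Fourier coefficient by 2 cos(2πnα) − 2 = −4 sin²(πnα). The sketch below (hypothetical helper names) checks that for the golden mean, which is Diophantine, n² times this divisor stays bounded away from zero for 1 ≤ n ≤ 1000, while for a rational α one divisor vanishes exactly.

```python
import math

def divisor(n, alpha):
    """Fourier multiplier 2 cos(2 pi n alpha) - 2 of u(t + alpha) - 2 u(t) + u(t - alpha)."""
    return 2 * math.cos(2 * math.pi * n * alpha) - 2

def scaled_min(alpha, n_max=1000):
    """min over 1 <= n <= n_max of n^2 |2 cos(2 pi n alpha) - 2|."""
    return min(n * n * abs(divisor(n, alpha)) for n in range(1, n_max + 1))

golden = (math.sqrt(5) - 1) / 2
print(scaled_min(golden))        # stays bounded away from 0
print(abs(divisor(3, 1 / 3)))    # rational alpha: a vanishing divisor (up to rounding)
```

The Diophantine bound |nα − p| ≥ C/n is what keeps n²|divisor| ≥ 16C², and this is the input the KAM iteration needs to invert L̂ with controlled loss of smoothness.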


84. Continued Fraction

Given a positive square free integer d, the Diophantine equation x² − dy² = 1 is called Pell's
equation. Solving it means to find a nontrivial unit in the ring Z[√d] because (x + y√d)(x −
y√d) = 1. The trivial solutions are x = ±1, y = 0. Solving the equation is therefore part of the
Dirichlet unit problem from algebraic number theory. Let [a_0; a_1, . . . ] denote the continued
fraction expansion of x = √d. This means a_0 = [x] is the integer part and [1/(x − a_0)] = a_1
etc. If x = [a_0; a_1, . . . , a_n + b_n], then a_{n+1} = [1/b_n]. Let p_n/q_n = [a_0; a_1, a_2, . . . , a_n] denote the
n'th convergent to the regular continued fraction of √d. A solution (x_1, y_1) with x_1 > 1 which minimizes
x is called the fundamental solution. The theorem tells that it is of the form (p_n, q_n):

Theorem: For any solution (x, y) of Pell's equation with x, y > 0, the fraction x/y is a convergent p_n/q_n.

One can find more solutions recursively because the group of units in Z[√d] is Z_2 × Z, the product
of {±1} and an infinite cyclic group. The other solutions (x_k, y_k) can be obtained from x_k + √d y_k = (x_1 + √d y_1)^k.
One of the first instances where the equation appeared is in the Archimedes cattle problem,
which is x² − 410286423278424y² = 1. The equation is named after John Pell, who has nothing
to do with the equation. It was Euler who attributed the solution by mistake to Pell. It was
first found by William Brouncker. The approach through continued fractions started with Euler
and Lagrange. See [602, 95, 468].
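The continued fraction approach translates directly into an integer algorithm. The sketch below (hypothetical function names) runs the standard recurrence for the partial quotients of √d, builds the convergents p_n/q_n, stops at the first one solving x² − dy² = 1, and then produces further solutions by multiplying with the fundamental unit.

```python
from math import isqrt

def pell_fundamental(d):
    """Smallest solution x > 1 of x^2 - d y^2 = 1, found among the convergents of sqrt(d)."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, den, a = 0, 1, a0          # state of the continued fraction algorithm for sqrt(d)
    p_prev, p = 1, a0             # numerators p_{-1}, p_0
    q_prev, q = 0, 1              # denominators q_{-1}, q_0
    while p * p - d * q * q != 1:
        m = den * a - m
        den = (d - m * m) // den
        a = (a0 + m) // den       # next partial quotient
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return p, q

def next_solution(x1, y1, x, y, d):
    """Multiply x + y sqrt(d) by the fundamental unit x1 + y1 sqrt(d)."""
    return x1 * x + d * y1 * y, x1 * y + y1 * x

print(pell_fundamental(2))     # (3, 2)
print(pell_fundamental(61))    # (1766319049, 226153980)
x1, y1 = pell_fundamental(2)
print(next_solution(x1, y1, x1, y1, 2))   # (17, 12)
```

The case d = 61 shows why the method matters: the fundamental solution is large although d is small, while the continued fraction finds it after a couple of dozen steps.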

85. Gauss-Bonnet-Chern

Let (M, g) be a Riemannian manifold of dimension d with volume element dµ. If
R^{ij}_{kl} is the Riemann curvature tensor with respect to the metric g, define the constant
C = ((4π)^{d/2} (−2)^{d/2} (d/2)!)^{−1} and the curvature
K(x) = C Σ_{σ,π} sign(σ) sign(π) R^{π(1)π(2)}_{σ(1)σ(2)} · · · R^{π(d−1)π(d)}_{σ(d−1)σ(d)},
where the sum is over all permutations π, σ of {1, . . . , d}. It can be interpreted as a Pfaffian.
In odd dimensions, the curvature is zero. Denote by χ(M) the Euler characteristic of M.

Theorem: ∫_M K(x) dµ(x) = 2πχ(M).

The case d = 2 was solved by Carl Friedrich Gauss and by Pierre Ossian Bonnet in 1848. Gauss
knew the theorem but never published it. In the case d = 2, the curvature K is the Gaussian
curvature, which is the product of the principal curvatures κ_1, κ_2 at a point. For a sphere
of radius R, for example, the Gauss curvature is 1/R² and χ(M) = 2. The volume form is
then the usual area element, so that ∫_M 1 dµ(x) = 4πR² and ∫_M K dµ = 4π = 2πχ(M). Allendoerfer-Weil in 1943
gave the first proof, based on previous work of Allendoerfer, Fenchel and Weil. Chern finally, in
1944, proved the theorem independent of an embedding. [166] features a proof of Vijay Kumar
Patodi. A more classical approach is in [711].
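A numerical sanity check of the case d = 2: the sketch integrates the Gauss curvature over a triaxial ellipsoid, a closed surface with χ = 2, and should return approximately 2πχ = 4π. It assumes the standard curvature formula K = 1/(a²b²c² S²), with S = x²/a⁴ + y²/b⁴ + z²/c⁴, for the ellipsoid x²/a² + y²/b² + z²/c² = 1, and the parametrization (a sinθ cosφ, b sinθ sinφ, c cosθ), whose area element is abc sinθ √S dθ dφ. The midpoint rule and the grid sizes are arbitrary choices.

```python
import math

def total_curvature_ellipsoid(a, b, c, n_theta=400, n_phi=400):
    """Midpoint-rule approximation of the integral of K dA over the ellipsoid."""
    h_t, h_p = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        t = (i + 0.5) * h_t
        for j in range(n_phi):
            p = (j + 0.5) * h_p
            x = a * math.sin(t) * math.cos(p)
            y = b * math.sin(t) * math.sin(p)
            z = c * math.cos(t)
            s = x * x / a ** 4 + y * y / b ** 4 + z * z / c ** 4
            curvature = 1.0 / (a * a * b * b * c * c * s * s)
            area_element = a * b * c * math.sin(t) * math.sqrt(s)   # |r_theta x r_phi|
            total += curvature * area_element * h_t * h_p
    return total

print(total_curvature_ellipsoid(1, 2, 3), 2 * math.pi * 2)   # both close to 4 pi
```

Changing the axes a, b, c changes the curvature pointwise, but not its integral, which is the topological content of the theorem.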

86. Atiyah-Singer

Assume M is a compact orientable finite dimensional manifold of dimension n and assume D
is an elliptic differential operator D : E → F between two smooth vector bundles E, F
over M. Using multi-index notation D^k = ∂^{k_1}_{x_1} · · · ∂^{k_n}_{x_n}, a differential operator Σ_k a_k(x) D^k of order r
is called elliptic if for all x, its symbol, the polynomial σ(D)(y) = Σ_{|k|=r} a_k(x) y^k, is not zero
for nonzero y. Elliptic regularity assures that both the kernel of D and the kernel of the
adjoint D* : F → E are finite dimensional. The analytical index of D is defined as
χ(D) = dim(ker(D)) − dim(ker(D*)). We think of it as the Euler characteristic of D. The
topological index of D is defined as the integral of the n-form K_D = (−1)^n ch(σ(D)) · td(TM)
over M. This n-form is the cup product · of the Chern character ch(σ(D)) and the Todd
class of the complexified tangent bundle TM of M. We think about K_D as a curvature.
Integration is done over the fundamental class [M] of M, which is the natural volume form
on M. The Chern character and the Todd classes are both mixed rational cohomology classes.
On a complex vector bundle E they are both given by concrete power series of Chern classes
c_k(E) like ch(E) = e^{a_1(E)} + · · · + e^{a_n(E)} and td(E) = a_1(1 − e^{−a_1})^{−1} · · · a_n(1 − e^{−a_n})^{−1} with
a_i = c_1(L_i) if E = L_1 ⊕ · · · ⊕ L_n is a direct sum of line bundles.

Theorem: The analytic index and topological indices agree: χ(D) = ∫_M K_D.

In the case when D = d + d* maps the vector bundle of even forms E to the vector bundle of odd
forms F, K_D is the Gauss-Bonnet curvature and χ(D) = χ(M). Israel Gelfand conjectured
around 1960 that the analytical index should have a topological description. The Atiyah-Singer
index theorem was proven in 1963 by Michael Atiyah and Isadore Singer. The result
generalizes the Gauss-Bonnet-Chern and Riemann-Roch-Hirzebruch theorems. According to
[606], the theorem is valuable because it "connects analysis and topology in a beautiful and
insightful way". See [559].

87. Complex multiplication

An n'th root of unity is a solution to the equation z^n = 1 in the complex plane C. It is called
primitive if it is not a solution to z^k = 1 for some 1 ≤ k < n. A cyclotomic field is a
number field Q(ζ_n) which is obtained by adjoining a complex primitive root of unity ζ_n to
Q. Every cyclotomic field is an Abelian field extension of the field of rational numbers Q. The
Kronecker-Weber theorem reverses this. It is also called the main theorem of class field
theory over Q:

Theorem: Every Abelian extension L/Q is a subfield of a cyclotomic field.

Abelian field extensions of Q are also called class fields. It follows that any algebraic number
field K/Q with Abelian Galois group has a conductor, the smallest n such that K lies in
the field generated by n'th roots of unity. Extending this theorem to other base number fields is
Kronecker's Jugendtraum or Hilbert's twelfth problem. The theory of complex multiplication
does the generalization for imaginary quadratic fields. The theorem was stated
by Leopold Kronecker in 1853 and proven by Heinrich Martin Weber in 1886. A generalization
to local fields was done by Jonathan Lubin and John Tate in 1965 and 1966. (A local field is
a locally compact topological field with respect to some non-discrete topology. The list of local
fields is R, C, finite field extensions of the p-adic numbers Q_p, or formal Laurent series F_q((t)) over
a finite field F_q.) The study of cyclotomic fields came from elementary geometric problems
like the construction of a regular n-gon with ruler and compass. Gauss constructed a regular
17-gon; a regular p-gon with p prime can be constructed if and only if p is a Fermat
prime F_k = 2^{2^k} + 1 (the known ones are 3, 5, 17, 257, 65537, and a problem of Eisenstein of 1844
asks whether there are infinitely many). Further interest came in the context of Fermat's last
theorem, because for odd n the sum x^n + y^n can be written as x^n + y^n = (x + y)(x + ζy) · · · (x + ζ^{n−1}y),
where ζ is a primitive n'th root of unity.
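The constructibility statement has a convenient arithmetic form, the Gauss-Wantzel criterion: a regular n-gon is constructible with ruler and compass if and only if Euler's totient φ(n) is a power of 2, equivalently n is a power of 2 times a product of distinct Fermat primes. This criterion is brought in from outside the text. The sketch lists the constructible n ≤ 20 and checks Euler's observation that F_5 is not prime.

```python
from math import gcd

def euler_phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def is_power_of_two(k):
    return k > 0 and k & (k - 1) == 0

def constructible(n):
    """Gauss-Wantzel criterion: the regular n-gon is constructible iff phi(n) is a power of 2."""
    return is_power_of_two(euler_phi(n))

print([n for n in range(3, 21) if constructible(n)])
fermat = [2 ** (2 ** k) + 1 for k in range(6)]
print(fermat[:5], fermat[5] % 641)   # F_5 = 4294967297 is divisible by 641
```

The 7-gon, 9-gon and 11-gon drop out because φ(7) = 6, φ(9) = 6 and φ(11) = 10 are not powers of 2.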


88. Choquet theory

Let K be a compact and convex set in a Banach space X. A point x ∈ K is called extreme
if x is not in an open interval (a, b) with a, b ∈ K. Let E be the set of extreme points in K. The
Krein-Milman theorem, proven in 1940 by Mark Krein and David Milman, assures that K
is the closed convex hull of E. Given a probability measure µ on E, it defines the point x = ∫ y dµ(y).
We say that x is the barycenter of µ. The Choquet theorem is
Theorem: Every point in K is the barycenter of a probability measure on its extreme points.
This result of Choquet implies the Krein-Milman theorem. It generalizes to compact convex sets in locally convex
topological vector spaces. The measure µ is not unique in general. In finite dimensions, it is unique for every point
if and only if K is a simplex. But in general, as shown by Heinz Bauer in 1961, for an extreme point x ∈ K the
measure µ_x is unique. The theorem was proven by Gustave Choquet in 1956 and was generalized
by Errett Bishop and Karel de Leeuw in 1959. [573]
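A finite-dimensional illustration of existence and (non-)uniqueness: in a triangle, a simplex, the measure on the three vertices representing a point is given by its unique barycentric coordinates, while the center of a square is the barycenter of two different measures on its vertices. The helper names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Weights (l1, l2, l3) with p = l1 a + l2 b + l3 c and l1 + l2 + l3 = 1 (Cramer's rule)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    l2 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    l3 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return (1 - l2 - l3, l2, l3)

def barycenter(weights, points):
    return tuple(sum(w * pt[i] for w, pt in zip(weights, points)) for i in range(2))

# Triangle (a simplex): the representing measure on the three vertices is unique.
print(barycentric((0.2, 0.3), (0, 0), (1, 0), (0, 1)))
# Square (not a simplex): its center is the barycenter of two different measures.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(barycenter([0.5, 0, 0.5, 0], square), barycenter([0, 0.5, 0, 0.5], square))
```

Points inside the triangle get non-negative weights, so they are genuine probability measures on the extreme points, as the theorem asserts.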

89. Helly's theorem

Given a family K = {K_1, . . . , K_n} of convex sets K_1, K_2, . . . , K_n in the Euclidean space R^d,
assume that n > d. Let K_m denote the set of subsets of K which have exactly m elements.
We say that K_m has the intersection property if each of its elements has a non-empty
common intersection. The theorem of Helly assures that
Theorem: K_n has the intersection property if K_{d+1} has.

For example, given n = 3 intervals K = {A, B, C} in R^1, K_3 has the intersection property
if and only if {A, B}, {B, C} and {A, C} all have the intersection property. The theorem was
proven in 1913 by Eduard Helly. It generalizes to an infinite collection of compact, convex
subsets. This theorem led Johann Radon to prove in 1921 the Radon theorem, which states
that any set of d + 2 points in R^d can be partitioned into two disjoint subsets whose convex
hulls intersect. A related topological result is the Borsuk-Ulam theorem, which
states that a continuous function f from the d-dimensional sphere S^d to R^d must map a pair of
antipodal points to the same point, meaning that f(x) = f(−x) has a solution. For example,
if d = 2, this implies that at every moment there are two antipodal points on the
Earth's surface for which the temperature and the pressure are the same. The Borsuk-Ulam
theorem appears first to have been stated in work of Lazar Lyusternik and Lev Shnirelman in
1930, and was proven by Karol Borsuk in 1933, who attributed it to Stanisław Ulam.
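In dimension d = 1, Helly's theorem says that pairwise intersecting intervals have a common point. For closed intervals [l_i, r_i], a common point exists exactly if max l_i ≤ min r_i, and the pair realizing the maximum and the minimum already forces this. The sketch below (hypothetical helper names) tests the statement on random families.

```python
import random
from itertools import combinations

def common_point(intervals):
    """Closed intervals [l, r] have a common point iff max(l) <= min(r)."""
    return max(l for l, _ in intervals) <= min(r for _, r in intervals)

def check_helly_1d(trials=2000, n=5, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        intervals = []
        for _ in range(n):
            l = rng.uniform(0, 10)
            intervals.append((l, l + rng.uniform(0, 6)))
        pairwise = all(common_point(pair) for pair in combinations(intervals, 2))
        if pairwise and not common_point(intervals):
            return False          # would be a counterexample to Helly in d = 1
    return True

print(check_helly_1d())   # True
```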

90. Weak Mixing

An automorphism T of a probability space (X, A, m) is a measure preserving invertible
measurable transformation from X to X. It is called ergodic if T(A) = A implies m(A) = 0
or m(A) = 1. It is called mixing if m(T^n(A) ∩ B) → m(A) · m(B) for n → ∞ for all A, B.
It is called weakly mixing if n^{−1} Σ_{k=0}^{n−1} |m(T^k(A) ∩ B) − m(A) · m(B)| → 0 for all A, B ∈ A
as n → ∞. This is equivalent to the fact that the unitary operator Uf = f(T) on L²(X) has
no point spectrum when restricted to the orthogonal complement of the constant functions.
A topological transformation (a continuous map on a locally compact topological space) with
a weakly mixing invariant measure is not integrable: for integrability, one wants every
invariant measure to lead to an operator U with pure point spectrum, so that the system is conjugated to
a group translation. Let G be the complete topological group of automorphisms of (X, A, m)
with the weak topology: T_j converges to T weakly if m(T_j(A) ∆ T(A)) → 0 for all A ∈ A; this
topology is metrizable and completeness is defined with respect to an equivalent metric.

Theorem: A generic T is weakly mixing and so ergodic.

Anatole Katok and Anatolii Mikhailovich Stepin in 1967 [392] proved that purely singular continuous
spectrum of U is generic. A new proof was given by [134], and a short proof using
Rokhlin's lemma, Halmos' conjugacy lemma and Simon's "wonderland theorem" establishes
both genericity of weak mixing and genericity of singular spectrum. On the topological
side, a generic volume preserving homeomorphism of a manifold has purely singular continuous
spectrum, which strengthens Oxtoby-Ulam's theorem [557] about generic ergodicity. [393, 312]
The wonderland theorem of Simon [651] also allowed one to prove that a generic invariant measure
of a shift is singular continuous [417] or that zero-dimensional singular continuous spectrum is
generic for open sets of flows on the torus, allowing also to show that open sets of Hamiltonian
systems contain generic subsets with both quasi-periodic as well as weakly mixing invariant tori
[418].
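To see the definition at work on a system that is ergodic but not weakly mixing, take the irrational rotation T(x) = x + α on R/Z with Lebesgue measure (a standard example assumed here, not discussed above) and A = B = [0, 1/2). Then m(T^k(A) ∩ B) = 1/2 − ‖kα‖, where ‖·‖ is the distance to the nearest integer, and the Cesàro averages in the definition tend to ∫_0^1 |1/4 − ‖x‖| dx = 1/8 rather than 0. A minimal sketch with hypothetical helper names:

```python
import math

def overlap(x):
    """m(([0, 1/2) + x) ∩ [0, 1/2)) for the circle R/Z."""
    d = min(x % 1.0, 1.0 - x % 1.0)      # distance of x to the nearest integer
    return 0.5 - d

def cesaro_average(alpha, n=100000):
    """(1/n) sum_{k<n} |m(T^k A ∩ B) - m(A) m(B)| for A = B = [0, 1/2), T(x) = x + alpha."""
    return sum(abs(overlap(k * alpha) - 0.25) for k in range(n)) / n

golden = (math.sqrt(5) - 1) / 2
print(cesaro_average(golden))   # close to 1/8, not 0: the rotation is not weakly mixing
```

This matches the spectral characterization: the functions e^{2πinx} are eigenfunctions of U, so U has point spectrum on the complement of the constants.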

91. Universality

The space X of unimodal maps is the set of twice continuously differentiable even maps
f : [−1, 1] → [−1, 1] satisfying f(0) = 1, f″(x) < 0 and λ = f(1) < 0. The Feigenbaum-Cvitanović
functional equation (FCE) is g = Tg with T(g)(x) = (1/λ) g(g(λx)). The map T is
a renormalization map.

Theorem: There exists an analytic hyperbolic fixed point of T.

The first proof was given by Oscar Lanford III in 1982 (computer assisted). See [366, 367].
That proof also established that the fixed point is hyperbolic with a one-dimensional unstable
manifold and positive expanding eigenvalue. This explains some universal features of unimodal
maps found experimentally in 1978 by Mitchell Feigenbaum, a phenomenon now called
Feigenbaum universality. The result has been ported to area preserving maps [202].

92. Compactness

Let (X, d) be a compact metric space. The Banach space C(X) of real-valued continuous
functions is equipped with the supremum norm. A subset F ⊂ C(X) is called uniformly
bounded if there is a constant M such that |f(x)| ≤ M for all x ∈ X and all f ∈ F. The set F is
called equicontinuous if for every x and every ϵ > 0 there exists δ > 0 such that if d(x, y) < δ,
then |f(x) − f(y)| < ϵ for all f ∈ F. A set F is called precompact if its closure is compact.
The Arzelà-Ascoli theorem is:
Theorem: Equicontinuous uniformly bounded sets in C(X) are precompact.

The result also holds for compact Hausdorff spaces X, not only for metric spaces. In complex
analysis there is a variant called Montel's theorem, which is the fundamental normality test for
holomorphic functions: a uniformly bounded family of holomorphic functions on a complex domain
G is normal, meaning that its closure is compact with respect to the compact-open topology. The
compact-open topology on C(X, Y) is the topology defined by the sub-base of all sets
V(K, U) = {f ∈ C(X, Y) | f(K) ⊂ U}, where K runs over all compact subsets of X and U runs
over all open subsets of Y.
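A standard example, added here as an illustration, shows that equicontinuity cannot be dropped from the theorem. Take X = [0, 1] and the sequence f_n(x) = x^n:

```latex
|f_n(x)| \le 1 \quad \text{for all } n \text{ and all } x \in [0,1]
  \qquad \text{(uniformly bounded)},
\sup_n \bigl| f_n(1) - f_n(1-\delta) \bigr| = \sup_n \bigl( 1 - (1-\delta)^n \bigr) = 1
  \quad \text{for every } \delta \in (0,1)
  \qquad \text{(not equicontinuous at } x = 1\text{)},
\lim_{n \to \infty} f_n(x) = \begin{cases} 0, & 0 \le x < 1, \\ 1, & x = 1. \end{cases}
```

Every subsequence converges pointwise to this discontinuous limit. A uniform limit of continuous functions is continuous, so no subsequence converges in the supremum norm, and {f_n} is not precompact in C([0, 1]).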


93. Geodesic

The geodesic distance d(x, y) between two points x, y on a connected Riemannian manifold
(M, g) is defined as the infimum of the lengths of all curves γ connecting x with y. This renders
the manifold a metric space (M, d). It is locally compact, meaning that every point x ∈ M has
a compact neighborhood. A metric space is called complete if every Cauchy sequence in M
converges. (A sequence xk is called a Cauchy sequence if for every ϵ > 0 there exists n such that
for all i, j > n one has d(xi, xj) < ϵ.) The local existence theorem for differential equations
assures that solutions of the geodesic equations exist for small enough times. This can be
restated by saying that the exponential map Tx M → M is defined near 0: it assigns to a vector
v ≠ 0 in the tangent space Tx M the point γ(|v|), where γ is the geodesic with γ(0) = x and
initial velocity v/|v|. A Riemannian manifold M is called geodesically complete if the
exponential map can be extended to the entire tangent space Tx M for every x ∈ M. This means
that geodesics can be continued for all times. The Hopf-Rinow theorem assures:

Theorem: Completeness and geodesic completeness are equivalent.

The theorem was named after Heinz Hopf and his student Willi Rinow who published it in
1931. See [354, 188].
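On the round unit sphere the exponential map has a closed form, which makes the definitions above concrete. This is a minimal sketch, assuming M = S² ⊂ R³ with the round metric and a tangent vector v ⊥ x. The name sphere_exp is ours. The formula is defined for every tangent vector, so the sphere is geodesically complete. This matches the theorem, because a compact metric space is complete.

```python
import math

def sphere_exp(x, v):
    """Exponential map of the unit sphere S^2 in R^3 at the point x.

    Assumes |x| = 1 and that v is tangent at x (v . x = 0). The geodesic with
    gamma(0) = x and unit initial velocity u = v/|v| is the great circle
    gamma(t) = cos(t) x + sin(t) u, and exp_x(v) = gamma(|v|)."""
    r = math.sqrt(sum(c * c for c in v))
    if r == 0.0:
        return tuple(x)
    u = [c / r for c in v]
    return tuple(math.cos(r) * a + math.sin(r) * b for a, b in zip(x, u))

north = (0.0, 0.0, 1.0)
print(sphere_exp(north, (math.pi, 0.0, 0.0)))  # numerically the south pole (0, 0, -1)
```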

94. Crystallography

A wallpaper group is a discrete subgroup of the Euclidean symmetry group E2 of the plane
which contains two linearly independent translations. Wallpaper groups classify two-dimensional
periodic patterns according to their symmetry. In the plane R², the underlying group is the group
E2 of Euclidean plane symmetries, which consists of translations, rotations, reflections and glide
reflections. This group is the group of rigid motions. It is a three-dimensional Lie group which,
according to Klein's Erlangen program, characterizes Euclidean geometry. Every element in E2
can be given as a pair (A, b), where A is an orthogonal matrix and b is a vector. A subgroup G
of E2 is called discrete if there is a positive minimal distance between two elements of the
group. This implies the crystallographic restriction theorem, assuring that only rotations of
order 2, 3, 4 or 6 can appear. This means that only rotations by 180, 120, 90 or 60 degrees can
occur in a wallpaper group.

Theorem: There are 17 wallpaper groups

The first proof was given by Evgraf Fedorov in 1891 and then by George Polya in 1924. In
three dimensions there are 230 space groups, and 219 types if chiral copies are identified. In
space there are 65 space groups which preserve the orientation. See [556, 299, 376].
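The crystallographic restriction comes down to a trace argument. A rotation mapping a lattice Zv1 + Zv2 to itself has an integer matrix in the basis v1, v2, so its trace 2 cos(2π/n) must be an integer. The sketch below, with a function name of our choosing, checks this necessary condition for orders up to 12. Order 1 is the identity.

```python
import math

def allowed_rotation_orders(max_order=12, tol=1e-9):
    """Orders n <= max_order for which 2*cos(2*pi/n) is an integer.

    A rotation mapping a lattice Z v1 + Z v2 to itself has an integer matrix
    in the basis v1, v2, so its trace 2*cos(2*pi/n) must be an integer."""
    orders = []
    for n in range(1, max_order + 1):
        trace = 2.0 * math.cos(2.0 * math.pi / n)
        if abs(trace - round(trace)) < tol:
            orders.append(n)
    return orders

print(allowed_rotation_orders())  # [1, 2, 3, 4, 6]
```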

95. Quadratic forms

A symmetric square matrix Q of size n × n with integer entries defines an integer quadratic
form $Q(x) = \sum_{i,j=1}^n Q_{ij} x_i x_j$. It is called positive if Q(x) > 0 whenever x ≠ 0. A
positive integer quadratic form is called universal if its range contains every positive integer.
For example, by the Lagrange four square theorem, the form
$Q(x_1, x_2, x_3, x_4) = x_1^2 + x_2^2 + x_3^2 + x_4^2$ is universal. The Conway-Schneeberger
fifteen theorem tells:

Theorem: Q is universal if it has {1, . . . , 15} in the range.


The interest in quadratic forms started in the 17th century, especially in numbers which can
be represented as sums x² + y². Lagrange proved the four square theorem in 1770. In 1916,
Ramanujan listed all diagonal quaternary forms which are universal. The 15 theorem was proven
in 1993 by John Conway and William Schneeberger (a student in a graduate course Conway gave in
1993). There is an analogous theorem for integral positive quadratic forms; these are defined by
positive definite matrices Q for which Q(x) takes only integer values on integer vectors x. The
binary quadratic form x² + xy + y², for example, is integral but not an integer quadratic form,
because the corresponding matrix Q has off-diagonal entries 1/2. In 2005, Manjul Bhargava and
Jonathan Hanke proved the 290 theorem, assuring that an integral positive quadratic form is
universal if it contains {1, . . . , 290} in its range [149].
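For diagonal forms with integer coefficients, the fifteen criterion can be checked by brute force. Such forms have integer matrices, so the theorem applies to them. The helper names below are ours. The form x1² + 2x2² + 5x3² + 5x4² represents every number from 1 to 14 but not 15. This shows the bound 15 in the criterion cannot be lowered to 14.

```python
import itertools
import math

def represented(coeffs, limit):
    """Values in 1..limit taken by the diagonal form sum(c_i * x_i**2) on integers."""
    bound = math.isqrt(limit)  # every c_i >= 1, so |x_i| <= sqrt(limit); signs do not matter
    values = set()
    for xs in itertools.product(range(bound + 1), repeat=len(coeffs)):
        value = sum(c * x * x for c, x in zip(coeffs, xs))
        if 1 <= value <= limit:
            values.add(value)
    return values

def passes_fifteen_test(coeffs):
    """True if the diagonal form represents all of 1, ..., 15."""
    return represented(coeffs, 15) == set(range(1, 16))

print(passes_fifteen_test((1, 1, 1, 1)))                          # True: Lagrange's form
print(sorted(set(range(1, 16)) - represented((1, 2, 5, 5), 15)))  # [15]
```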

96. Sphere packing

A sphere packing in R^d is an arrangement of non-overlapping unit spheres in the d-dimensional
Euclidean space R^d with volume measure µ. It is known since [292] that packings with maximal
densities exist. Denote by B_r(x) the ball of radius r centered at x ∈ R^d. If X is the set of
centers of the spheres and $P = \bigcup_{x \in X} B_1(x)$ is the union of the unit balls centered at
points in X, then the density of the packing is defined as
$\Delta_d = \limsup_{r \to \infty} \int_{B_r(0)} 1_P \, d\mu \big/ \int_{B_r(0)} 1 \, d\mu$. The
sphere packing problem is now solved in 5 different cases:

Theorem: Optimal sphere packings are known for d = 1, 2, 3, 8, 24.
The one-dimensional case $\Delta_1 = 1$ is trivial. The case $\Delta_2 = \pi/\sqrt{12}$ was known
since Axel Thue in 1910 but proven only by László Fejes Tóth in 1943. The case d = 3 was called
the Kepler conjecture, as Johannes Kepler conjectured $\Delta_3 = \pi/\sqrt{18}$. It was settled by
Thomas Hales in 1998 using computer assistance. A complete formal proof appeared in 2015. The
case d = 8 was settled by Maryna Viazovska, who proved in 2017 [722] that $\Delta_8 = \pi^4/384$
and also established uniqueness. The densest packing in the case d = 8 is the E8 lattice. The
proof is based on linear programming bounds developed by Henry Cohn and Noam Elkies in 2003.
Later, with other collaborators, she also covered the case d = 24. The densest packing in
dimension 24 is the Leech lattice. For sphere packing see [156, 155].
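The densities quoted above can be recomputed for the corresponding lattice packings:

* the hexagonal lattice in d = 2;
* the face-centered cubic lattice in d = 3, which achieves Kepler's value;
* the E8 lattice.

A lattice with minimal distance r between points carries a packing by balls of radius r/2, one ball per fundamental cell of volume √(det Gram). This sketch uses Gram matrices for bases of our own choosing. It computes densities only and says nothing about optimality. For E8 it uses the Cartan matrix, which is the Gram matrix of the simple roots.

```python
import math
from fractions import Fraction

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(a), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def ball_volume(d, radius):
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d

def lattice_density(gram, min_dist):
    """Density of balls of radius min_dist/2 centred at the points of a lattice
    with Gram matrix `gram`; a fundamental cell has volume sqrt(det gram)."""
    return ball_volume(len(gram), min_dist / 2) / math.sqrt(det(gram))

def e8_gram():
    """Cartan matrix of E8 = Gram matrix of its simple roots (squared length 2)."""
    g = [[2 if i == j else 0 for j in range(8)] for i in range(8)]
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (4, 7)]:
        g[i][j] = g[j][i] = -1
    return g

hexagonal = [[4, 2], [2, 4]]             # basis (2, 0), (1, sqrt(3)); minimal distance 2
fcc = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]  # basis (1,1,0), (1,0,1), (0,1,1); minimal distance sqrt(2)
print(lattice_density(hexagonal, 2), math.pi / math.sqrt(12))
print(lattice_density(fcc, math.sqrt(2)), math.pi / math.sqrt(18))
print(lattice_density(e8_gram(), math.sqrt(2)), math.pi ** 4 / 384)
```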

97. Sturm theorem

Given a square free real polynomial p, let pk denote the Sturm chain p0 = p, p1 = p′,
p2 = −(p0 mod p1), p3 = −(p1 mod p2), etc., ending with the last nonzero term pm. Let σ(x) be
the number of sign changes, ignoring zeros, in the sequence p0(x), p1(x), . . . , pm(x).

Theorem: The number of distinct roots of p in (a, b] is σ(a) − σ(b).

Sturm proved the theorem in 1829. He found his theorem on sequences while studying solutions
of differential equations (Sturm-Liouville theory) and credits Fourier for inspiration. See [592].
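Sturm's method translates directly into exact rational arithmetic. This is a minimal sketch with helper names of our choosing. Polynomials are coefficient lists, highest degree first. The chain uses the standard negated remainders p_{k+1} = −(p_{k−1} mod p_k). The number of distinct roots in (a, b] is then σ(a) − σ(b).

```python
from fractions import Fraction

def strip(p):
    """Drop leading zero coefficients (coefficients listed from highest degree down)."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(a, b):
    """Remainder of the polynomial division a / b (b has nonzero leading coefficient)."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b) and any(a):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)  # the leading coefficient is now zero
    return strip(a) if a else [Fraction(0)]

def sturm_chain(p):
    """p0 = p, p1 = p', p_{k+1} = -(p_{k-1} mod p_k) until the remainder vanishes."""
    p = strip([Fraction(c) for c in p])
    deg = len(p) - 1
    chain = [p, [c * (deg - i) for i, c in enumerate(p[:-1])]]
    while True:
        r = poly_rem(chain[-2], chain[-1])
        if not any(r):
            return chain
        chain.append([-c for c in r])

def evaluate(p, x):
    value = Fraction(0)
    for c in p:  # Horner's scheme
        value = value * x + c
    return value

def sign_changes(chain, x):
    signs = [v for v in (evaluate(q, x) for q in chain) if v != 0]  # zeros are ignored
    return sum(1 for u, v in zip(signs, signs[1:]) if (u < 0) != (v < 0))

def count_roots(p, a, b):
    """Number of distinct real roots of the square-free polynomial p in (a, b]."""
    chain = sturm_chain(p)
    return sign_changes(chain, Fraction(a)) - sign_changes(chain, Fraction(b))

print(count_roots([1, 0, -3, 1], -2, 2))  # x^3 - 3x + 1 has 3 roots in (-2, 2]
```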

98. Smith Normal form

An integer m × n matrix A is said to be expressible in Smith normal form if there exist an
invertible integer m × m matrix S and an invertible integer n × n matrix T (invertible over the
integers, that is with determinant ±1) so that SAT is a diagonal matrix
Diag(α1, . . . , αr, 0, 0, 0) with αi | αi+1. The integers αi are called elementary divisors. They
can be written as αi = di(A)/di−1(A), where d0(A) = 1 and dk(A) is the greatest common
divisor of all k × k minors of A. The Smith normal form is called unique if the elementary
divisors αi are determined up to a sign.
Theorem: Any integer matrix has a unique Smith normal form.
The result was proven by Henry John Stephen Smith in 1861. The result holds more generally
in a principal ideal domain, which is an integral domain (a ring R in which ab = 0 implies
a = 0 or b = 0) in which every ideal (an additive subgroup I of the ring such that ab ∈ I if
a ∈ I and b ∈ R) is generated by a single element.
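The formula αk = dk(A)/dk−1(A) gives a direct, if slow, way to compute the diagonal of the Smith normal form. This is a minimal sketch, and the function names and the example matrix are ours. It enumerates all k × k minors, so it is only meant for small matrices. Practical implementations use integer row and column operations instead.

```python
from itertools import combinations
from math import gcd

def det(m):
    """Integer determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def invariant_factors(a):
    """Nonzero diagonal entries alpha_k = d_k / d_{k-1} of the Smith normal form of a,
    where d_k is the gcd of all k x k minors and d_0 = 1."""
    m, n = len(a), len(a[0])
    d_prev, factors = 1, []
    for k in range(1, min(m, n) + 1):
        d_k = 0
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                d_k = gcd(d_k, det([[a[i][j] for j in cols] for i in rows]))
        if d_k == 0:  # all k x k minors vanish: the remaining diagonal entries are 0
            break
        factors.append(d_k // d_prev)
        d_prev = d_k
    return factors

print(invariant_factors([[2, 4, 4], [-6, 6, 12], [10, -4, -16]]))  # [2, 6, 12]
```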

99. Spectral perturbation

A complex valued matrix A is self-adjoint (Hermitian) if A* = A, where $A^*_{ij} = \overline{A_{ji}}$.
The spectral theorem assures that A has real eigenvalues. Given two self-adjoint complex n × n
matrices A, B with eigenvalues α1 ≤ α2 ≤ · · · ≤ αn and β1 ≤ β2 ≤ · · · ≤ βn, one has the
Lidskii-Last theorem:

Theorem: $\sum_{j=1}^n |\alpha_j - \beta_j| \le \sum_{i,j=1}^n |(A - B)_{ij}|$.

The result can be deduced from Lidskii's inequality $\sum_j |\alpha_j - \beta_j| \le \sum_j |\gamma_j|$,
where γj are the eigenvalues of A − B, by bounding the right hand side by the ℓ1 distance of the
matrices. This is handy, as we often know the matrices A, B explicitly rather than the
eigenvalues γj of A − B.
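The inequality can be sanity-checked numerically. This is not a proof. To stay dependency-free, the sketch assumes real symmetric matrices, a special case of self-adjoint ones. It computes eigenvalues with a plain cyclic Jacobi rotation method rather than a linear algebra library. The helper names and the random test matrices with entries in [−1, 1] are our own choices.

```python
import math
import random

def eigvals_sym(a, sweeps=50, tol=1e-20):
    """Sorted eigenvalues of a real symmetric matrix (list of rows) by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new entry a[p][q] vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  a <- a J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:  a <- J^T a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def random_symmetric(n, rng):
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.uniform(-1.0, 1.0)
    return m

def lidskii_last_sides(A, B):
    """Left and right hand side of the inequality for symmetric A, B of equal size."""
    n = len(A)
    lhs = sum(abs(x - y) for x, y in zip(eigvals_sym(A), eigvals_sym(B)))
    rhs = sum(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n))
    return lhs, rhs

rng = random.Random(0)
for n in range(1, 6):
    print(n, lidskii_last_sides(random_symmetric(n, rng), random_symmetric(n, rng)))
```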

100. Radon transform

In order to solve the tomography problem, like in magnetic resonance imaging (MRI), of
finding the density function g(x, y, z) of a three dimensional body, one looks at a slice
f(x, y) = g(x, y, c), where z = c is kept constant, and measures the Radon transform
$R(f)(p, \theta) = \int_{\{x \cos(\theta) + y \sin(\theta) = p\}} f(x, y) \, ds$. This quantity is the
absorption rate due to nuclear magnetic resonance along the line L of polar angle θ at distance p
from the center. Reconstructing f(x, y) = g(x, y, c) for different c allows to recover the tissue
density g and so to "see inside the body".
Theorem: The Radon transform can be diagonalized and so pseudo-inverted.

We only need that the Fourier series $f(r, \phi) = \sum_n f_n(r) e^{in\phi}$ converges uniformly for
all r > 0 and that f_n(r) has a Taylor series. The expansion
$f(r, \phi) = \sum_{n \in \mathbb{Z}} \sum_{k=1}^\infty f_{n,k} \psi_{n,k}$ with $\psi_{n,k}(r, \phi) = r^{-k} e^{in\phi}$
is an eigenfunction expansion with eigenvalues
$\lambda_{n,k} = 2 \int_0^{\pi/2} \cos(nx) \cos(x)^{k-1} \, dx = \frac{\pi}{2^{k-1} \cdot k} \cdot \frac{\Gamma(k+1)}{\Gamma(\frac{k+n+1}{2}) \, \Gamma(\frac{k-n+1}{2})}$.
The inverse problem is subtle due to the existence of a kernel spanned by
{ψ_{n,k} | (n + k) odd, |n| > k}. One calls it an ill posed problem in the sense of Hadamard.
The Radon transform was first studied by Johann Radon in 1917 [331].
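The closed form for the eigenvalues λ_{n,k} can be checked against the defining integral. The sketch compares the Gamma-function formula with Simpson quadrature of 2∫_0^{π/2} cos(nx) cos(x)^{k−1} dx. It also confirms that λ_{n,k} vanishes when n + k is odd and |n| > k. The helper rgamma is our own: it extends 1/Γ by zero at the poles of Γ, which is where the kernel comes from.

```python
import math

def rgamma(x):
    """1 / Gamma(x), extended by 0 at the poles x = 0, -1, -2, ..."""
    if x <= 0 and x == math.floor(x):
        return 0.0
    return 1.0 / math.gamma(x)

def eigenvalue_formula(n, k):
    """Closed form of lambda_{n,k} quoted in the text."""
    return (math.pi * math.gamma(k + 1) / (2 ** (k - 1) * k)
            * rgamma((k + n + 1) / 2) * rgamma((k - n + 1) / 2))

def eigenvalue_quadrature(n, k, steps=2000):
    """Composite Simpson rule for 2 * int_0^{pi/2} cos(n x) cos(x)**(k - 1) dx."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = i * h
        total += weight * math.cos(n * x) * math.cos(x) ** (k - 1)
    return 2.0 * total * h / 3.0

for n in range(0, 4):
    print(n, [round(eigenvalue_formula(n, k), 6) for k in range(1, 5)])
```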

101. Linear programming

Given two vectors c ∈ R^m and b ∈ R^n, and an n × m matrix A, a linear program is the
variational problem on R^m to maximize f(x) = c · x subject to the linear constraints Ax ≤ b
and x ≥ 0. The dual problem is to minimize b · y subject to A^T y ≥ c, y ≥ 0. The maximum
principle for linear programming tells that the solution is on the boundary of the convex
polytope formed by the feasible region defined by the constraints.

Theorem: Local optima of linear programs are global and on the boundary.
Since an optimal solution, if one exists, can always be found at a vertex of the polytope defined
by the constraints, the simplex algorithm for solving linear programs works: start at a vertex of
the polytope, then move along edges on which f increases until the optimum is reached. If
A = [2, 3], x = [x1, x2], b = 6 and c = [3, 5], we have n = 1, m = 2. The problem is to maximize
f(x1, x2) = 3x1 + 5x2 on the triangular region 2x1 + 3x2 ≤ 6, x1 ≥ 0, x2 ≥ 0. Starting at (0, 0),
the best improvement is to go to (0, 2), which is already the maximum. Linear programming is
used to solve practical problems in operations research. The simplex algorithm was formulated
by George Dantzig in 1947. It solves random problems nicely, but there are expensive cases in
general and it is possible that cycles occur. One of the open problems of Stephen Smale asks
for a strongly polynomial time algorithm deciding whether a solution of a linear programming
problem exists [534].
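The vertex principle can be seen on the two-variable example above. This sketch is brute-force vertex enumeration, not the simplex algorithm. It intersects every pair of constraint lines, keeps the feasible intersection points, and evaluates f there. It assumes integer data and a bounded, nonempty feasible region. The function name is ours.

```python
from fractions import Fraction
from itertools import combinations

def solve_2d_lp(c, constraints):
    """Maximize c . x over {x in R^2 : a . x <= b for every (a, b) in constraints}
    by evaluating c . x at every vertex (intersection of two constraint lines).
    Assumes integer data and a bounded feasible region; returns (value, vertex) or None."""
    best = None
    for (a1, b1), (a2, b2) in combinations(constraints, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel constraint lines do not meet in a vertex
        x = (Fraction(b1 * a2[1] - b2 * a1[1], det),  # Cramer's rule
             Fraction(a1[0] * b2 - a2[0] * b1, det))
        if all(a[0] * x[0] + a[1] * x[1] <= b for a, b in constraints):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value > best[0]:
                best = (value, x)
    return best

# maximize 3 x1 + 5 x2 subject to 2 x1 + 3 x2 <= 6, -x1 <= 0, -x2 <= 0
print(solve_2d_lp((3, 5), [((2, 3), 6), ((-1, 0), 0), ((0, -1), 0)]))  # value 10 at (0, 2)
```

Enumerating vertices costs a number of checks that grows combinatorially with the number of constraints. The simplex algorithm avoids this by walking only along improving edges.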

102. Random Matrices

A random matrix A is given by an n × n array of independent, identically distributed random
variables Aij of zero mean and standard deviation 1. The eigenvalues λj of $A/\sqrt{n}$ define a
discrete measure $\mu_n = \frac{1}{n} \sum_j \delta_{\lambda_j}$, called the spectral measure of A. The
circular law on the complex plane C is the probability measure µ0 with density 1_D/π, where
D = {|z| ≤ 1} is the unit disk. A sequence νn of probability measures converges weakly or in
law to ν if for every continuous and bounded function f : C → C one has
$\int f(z) \, d\nu_n(z) \to \int f(z) \, d\nu(z)$. The circular law is:

Theorem: Almost surely, the spectral measures converge µn → µ0 .

One can think of An as a sequence of larger and larger matrix valued random variables. The
circular law tells that the eigenvalues fill out the unit disk in the complex plane uniformly when
taking larger and larger matrices. It is a kind of central limit theorem. An older version due
to Eugene Wigner from 1955 is the semi-circular law, telling that in the self-adjoint case the
now real measures µn converge to a distribution with density $\sqrt{4 - x^2}/(2\pi)$ on [−2, 2]. The
circular law was first stated by Jean Ginibre in 1965 and by Vyacheslav Girko in 1984. It was
first proven by Z.D. Bai in 1997. Various authors have generalized it and removed more and more
moment conditions. The last remaining condition was removed by Terence Tao and Van Vu in
2010, proving so the above "fundamental theorem of random matrix theory". See [702].
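The circular law needs complex eigenvalues, which are awkward to compute without a linear algebra library. As a stand-in, this sketch checks the Wigner semicircle law mentioned above through moments. The semicircle density √(4 − x²)/(2π) has second moment 1 and fourth moment 2. For M = A/√n with A symmetric, tr(M²)/n and tr(M⁴)/n should therefore be close to 1 and 2. The ±1 entry distribution, the matrix size and the function name are our assumptions.

```python
import random

def wigner_moments(n, seed=0):
    """(1/n) tr(M^2) and (1/n) tr(M^4) for M = A / sqrt(n), where A is a random
    symmetric n x n matrix with independent entries +1 or -1 on and above the diagonal."""
    rng = random.Random(seed)
    a = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.choice((-1, 1))
    cols = list(zip(*a))
    a2 = [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]
    m2 = sum(x * x for row in a for x in row) / n ** 2   # tr(A^2) / n^2
    m4 = sum(x * x for row in a2 for x in row) / n ** 3  # tr(A^4) / n^3, since A^2 is symmetric
    return m2, m4

print(wigner_moments(150))  # close to the semicircle moments (1, 2)
```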

103. Diffeomorphisms

Let M be a compact Riemannian surface and T : M → M a C²-diffeomorphism. A Borel
probability measure µ on M is T-invariant if µ(T(A)) = µ(A) for all Borel sets A. It is called
ergodic if T(A) = A implies µ(A) = 1 or µ(A) = 0. The Hausdorff dimension dim(µ) of
a measure µ is defined as the Hausdorff dimension of the smallest Borel set A of full measure
µ(A) = 1. The entropy hµ(T) is the Kolmogorov-Sinai entropy of the measure-preserving
dynamical system (M, T, µ). For an ergodic surface diffeomorphism, the Lyapunov exponents
λ1 ≥ λ2 of (M, T, µ) are the logarithms of the eigenvalues of the limiting Oseledec matrix
$\lim_{n \to \infty} [(dT^n(x))^* dT^n(x)]^{1/(2n)}$; they are constant µ almost everywhere due to
ergodicity. Assume λ1 > 0 > λ2 and let λ(T, µ) denote the harmonic mean of λ1 and −λ2. The
entropy-dimension-Lyapunov theorem tells that for every such T-invariant ergodic probability
measure µ of T, one has:

Theorem: hµ = dim(µ)λ/2.

This formula has become famous because it relates "entropy", "fractals" and "chaos", which are
all "rock star" notions also outside of mathematics. In the case of a Lebesgue measure preserving
symplectic transformation, where dim(µ) = 2 and λ1 = −λ2, the theorem implies "entropy =
Lyapunov exponent", which is a formula of Pesin given by hµ(T) = λ(T, µ). A similar result
holds for circle diffeomorphisms or smooth interval maps, where hµ(T) = dim(µ)λ(T, µ). The
notion of Hausdorff dimension was introduced by Felix Hausdorff in 1918. Entropy was defined
in 1958 by Andrey Kolmogorov and in general by Yakov Sinai in 1959; Lyapunov exponents
were introduced with the work of Valery Oseledec in 1965. The above theorem is due to Lai-Sang
Young, who proved it in 1982. François Ledrappier and Lai-Sang Young proved in 1985 that in
arbitrary dimensions $h_\mu = \sum_{j : \lambda_j > 0} \lambda_j \gamma_j$, where γj are the dimensions of µ in the
direction of the Oseledec spaces Ej. This is called the Ledrappier-Young formula. It implies the
Margulis-Ruelle inequality $h_\mu(T) \le \sum_j \lambda_j^+(T)$, where $\lambda_j^+ = \max(\lambda_j, 0)$ and λj(T) are the
Lyapunov exponents. In the case of a smooth T-invariant measure µ, or more generally for
SRB measures, there is an equality $h_\mu(T) = \sum_j \lambda_j^+(T)$, which is called the Pesin formula.
See [390, 203].
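Arnold's cat map T(x, y) = (2x + y, x + y) mod 1 gives a concrete case of the symplectic example above. It is a diffeomorphism of the torus that preserves Lebesgue measure, so dim(µ) = 2 and λ1 = −λ2. Its entropy is log((3 + √5)/2), so the theorem predicts λ1 = log((3 + √5)/2). The sketch estimates λ1 the way one would for a nonlinear map: it pushes a tangent vector through dT, renormalizes, and averages the logarithmic growth. The choice of map and the function name are ours.

```python
import math

def lyapunov_cat_map(steps=1000):
    """Largest Lyapunov exponent of T(x, y) = (2x + y, x + y) mod 1, estimated by
    pushing a tangent vector through dT = [[2, 1], [1, 1]] and renormalising."""
    vx, vy, total = 1.0, 0.0, 0.0
    for _ in range(steps):
        vx, vy = 2.0 * vx + vy, vx + vy
        norm = math.hypot(vx, vy)
        total += math.log(norm)
        vx, vy = vx / norm, vy / norm
    return total / steps

print(lyapunov_cat_map(), math.log((3 + math.sqrt(5)) / 2))
```

The estimate is off by a term of order 1/steps, which comes from the initial vector not yet being aligned with the unstable direction.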

104. Linearization

If F : M → M is a globally Lipschitz continuous function on a finite dimensional vector space
M, then the differential equation x′ = F(x) has a global solution x(t) = f^t(x(0)) (local existence
follows from the Picard-Lindelöf theorem, global existence from the Grönwall inequality). An
equilibrium point of the system is a point x0 for which F(x0) = 0. This means that x0 is a
fixed point of the differentiable mapping f = f^1, the time-1-map. We say that f is linearizable
near x0 if there exists a homeomorphism ϕ from a neighborhood U of x0 to a neighborhood V of
x0 such that ϕ ◦ f ◦ ϕ^{−1} = df(x0). The map f is called hyperbolic at x0 if no eigenvalue of
df(x0) has absolute value 1. The Sternberg-Grobman-Hartman linearization theorem is
Theorem: If f is hyperbolic, then f is linearizable near x0.

The theorem was proven by D.M. Grobman in 1959, by Philip Hartman in 1960 and by Shlomo
Sternberg in 1958. This implies the existence of stable and unstable manifolds passing
through x0. One can show more, and this is due to Sternberg, who wrote a series of papers
starting in 1957 [675]: if A = df(x0) satisfies the non-resonance condition, meaning that no
relation λ0 = λ1 · · · λj exists between eigenvalues of A, then there is a linearization to order n,
that is a C^n map ϕ(x) = x + g(x) with g(0) = g′(0) = 0 such that ϕ ◦ f ◦ ϕ^{−1}(x) = Ax + o(|x|^n)
near x0. We say then that f can be n-linearized near x0. The generalized result tells that
non-resonant fixed points of C^n maps are n-linearizable. See [459].
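Both hypotheses can be checked on a list of eigenvalues of df(x0). This sketch reads the resonance relation λ0 = λ1 · · · λj with j ≥ 2 and repeated factors allowed. That is the usual convention, but it is an assumption here. The helper names are ours. The example is a saddle with eigenvalues 2 and 1/2. It is hyperbolic, yet resonant at order 3, since 2 = 2 · 2 · (1/2). The same holds for every area-preserving saddle.

```python
from itertools import combinations_with_replacement

def is_hyperbolic(eigenvalues, tol=1e-12):
    """True if no eigenvalue of df(x0) lies on the unit circle."""
    return all(abs(abs(lam) - 1.0) > tol for lam in eigenvalues)

def resonances(eigenvalues, order, tol=1e-12):
    """Relations lambda_k = lambda_{i1} * ... * lambda_{ij} with 2 <= j <= order,
    returned as (k, (i1, ..., ij)) with indices into `eigenvalues`."""
    found = []
    for j in range(2, order + 1):
        for factors in combinations_with_replacement(range(len(eigenvalues)), j):
            product = 1
            for i in factors:
                product *= eigenvalues[i]
            for k, lam in enumerate(eigenvalues):
                if abs(lam - product) < tol:
                    found.append((k, factors))
    return found

print(is_hyperbolic([2.0, 0.5]), resonances([2.0, 0.5], 3))
```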

105. Fractals

An iterated function system is a finite set of contractions $\{f_i\}_{i=1}^n$ on a complete metric
space (X, d). The corresponding Hutchinson operator $H(A) = \bigcup_i f_i(A)$ is then a contraction
in the Hausdorff metric on non-empty compact sets and has a unique fixed point called the
attractor S of the iterated function system. The definition of Hausdorff dimension is as follows:
define $h^s_\delta(A) = \inf_{\mathcal{U}} \sum_i |U_i|^s$, where the infimum is taken over all δ-covers
$\mathcal{U} = \{U_i\}$ of A and |U_i| denotes the diameter, and $h^s(A) = \lim_{\delta \to 0} h^s_\delta(A)$. The
Hausdorff dimension dimH(S) finally is the value s where h^s(S) jumps from ∞ to 0. If the
contractions are maps with contraction factors 0 < λj < 1, then the Hausdorff dimension of the
attractor S can be estimated with the similarity dimension of the contraction vector
(λ1, . . . , λn): this number is defined as the solution s of the equation $\sum_{i=1}^n \lambda_i^s = 1$.

Theorem: $\dim_{\rm Hausdorff}(S) \le \dim_{\rm similarity}(S)$.

There is an equality if the fi are all affine contractions like fi(x) = λAi x + βi with the same
contraction factor λ, where the Ai are orthogonal and the βi are vectors (a situation which
generates a large class of popular fractals). For equality one also has to assume that there is an
open non-empty set G such that the sets Gi = fi(G) are contained in G and pairwise disjoint. In
the case λj = λ are all the same, one has $n \lambda^{\dim} = 1$, which implies dim(S) = −log(n)/log(λ).
For the Smith-Cantor set S, where f1(x) = x/3 + 2/3, f2(x) = x/3 and G = (0, 1), one gets with
n = 2 and λ = 1/3 the dimension dim(S) = log(2)/log(3). For the Menger carpet with n = 8
affine maps fij(x, y) = (x/3 + i/3, y/3 + j/3) with 0 ≤ i ≤ 2, 0 ≤ j ≤ 2, (i, j) ≠ (1, 1), the
dimension is log(8)/log(3). The Menger sponge is the analogous object with n = 20 affine
contractions in R³ and has dimension log(20)/log(3). For the Koch curve on the interval, where
n = 4 affine contractions of contraction factor 1/3 exist, the dimension is log(4)/log(3). These
are all fractals, sets with Hausdorff dimension different from an integer. The modern
formulation of iterated function systems is due to John E. Hutchinson from 1981. Michael
Barnsley used the concept for fractal compression algorithms, which use the idea that storing
the rules of an iterated function system is much cheaper than storing the actual attractor.
Iterated function systems appear in complex dynamics in the case when the Julia set is
completely disconnected; they appeared earlier also in work of Georges de Rham in 1957.
See [486, 227].
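The similarity dimension is the root s of Σ λi^s = 1. Here the left side decreases strictly in s when every factor lies in (0, 1), so bisection finds the root reliably. This minimal sketch, with a function name of our choosing, reproduces the dimensions listed above. It also handles unequal factors. For example, factors 1/2 and 1/4 give s = log2 of the golden ratio.

```python
import math

def similarity_dimension(factors, tol=1e-12):
    """Solve sum(l ** s for l in factors) = 1 for s >= 0 by bisection.

    Assumes 0 < l < 1 for every factor, so the left side decreases strictly in s."""
    def excess(s):
        return sum(l ** s for l in factors) - 1.0
    lo, hi = 0.0, 1.0
    while excess(hi) > 0:  # enlarge the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

for name, factors in [("Cantor set", [1 / 3] * 2), ("Menger carpet", [1 / 3] * 8),
                      ("Menger sponge", [1 / 3] * 20), ("Koch curve", [1 / 3] * 4)]:
    print(name, similarity_dimension(factors))
```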

106. Strong law of small numbers

Like the Bayes theorem or the pigeon hole principle, which both are too simple to qualify as
"theorems" but still are of utmost importance, the "strong law of small numbers" is not really
a theorem but a fundamental mathematical principle. It is more fundamental than a
specific theorem, as it applies throughout mathematics. It is for example important in Ramsey
theory. The statement is put in different ways like "There aren't enough small numbers to
meet the many demands made of them". [302] puts it in the following catchy way:

Theorem: You can't tell by looking.

The point was made by Richard Guy in [302], who states two "corollaries": "superficial
similarities spawn spurious statements" and "early exceptions eclipse eventual essentials".
The statement is backed up with numerous examples (35 of them are listed in [302]). Famous
are Fermat's claim that all Fermat numbers $2^{2^n} + 1$ are prime, or the claim that the number
π3(n) of primes of the form 4k + 3 in {1, . . . , n} is larger than the number π1(n) of primes of the
form 4k + 1, so that the 4k + 3 primes win the prime race. Hardy and Littlewood showed,
however, that π3(n) − π1(n) changes sign infinitely often. The prime number theorem extended
to arithmetic progressions shows π1(n) ∼ n/(2 log(n)) and π3(n) ∼ n/(2 log(n)), but the density
of numbers with π3(n) > π1(n) is larger than 1/2. This is the Chebyshev bias. Experiments
then suggested the density to be 1, but also this is false: the density of numbers for which
π3(n) > π1(n) is smaller than 1. The principle is important in a branch of combinatorics called
Ramsey theory. But it does not only apply in discrete mathematics. There are many examples
where one can not tell by looking. When looking at the boundary of the Mandelbrot set, for
example, one would tell that it is a fractal with Hausdorff dimension between 1 and 2. In reality
the Hausdorff dimension is 2 by a result of Mitsuhiro Shishikura [646]. Mandelbrot himself
first thought "by looking" that the Mandelbrot set M is disconnected. Douady and Hubbard
proved that M is connected [194].
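The prime race shows how long one can be misled by looking. This sketch sieves the primes and reports the first n at which primes of the form 4k + 1 strictly lead. That happens only at n = 26861. The function name and the default limit are our own choices.

```python
def first_lead_of_4k_plus_1(limit=100000):
    """Smallest n <= limit with pi_1(n) > pi_3(n), where pi_r(n) counts the primes
    p <= n with p = r (mod 4); None if 4k+1 never leads up to limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):  # sieve of Eratosthenes
        if is_prime[p]:
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    pi1 = pi3 = 0
    for n in range(3, limit + 1):  # odd primes are 1 or 3 mod 4
        if is_prime[n]:
            if n % 4 == 1:
                pi1 += 1
            else:
                pi3 += 1
            if pi1 > pi3:
                return n
    return None

print(first_lead_of_4k_plus_1())  # 26861
```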

107. Ramsey Theory

Let G be the complete graph with n vertices. An edge labeling with r colors is an assignment
of one of r numbers to each edge of G. A complete sub-graph of G is called a clique. If it has
s vertices, it is denoted by Ks. A graph G is called monochromatic if all edges in G have
the same color. (We use coloring here as a short for edge labeling and not in the sense of graph
coloring, where an edge coloring requires that intersecting edges have different colors.)
Ramsey's theorem is:

Theorem: For large n, every r-colored Kn contains a monochromatic Ks .

So, there exist Ramsey numbers R(r, s) such that for n ≥ R(r, s), every r-coloring of the edges
of Kn contains a monochromatic Ks. A famous case is that for r = 2 colors and triangles (s = 3)
the Ramsey number is 6, usually written R(3, 3) = 6. Take n = 6 people. They define the
complete graph G. If two of them are friends, color the edge blue, otherwise red. This friendship
graph therefore is an r = 2 coloring of G. There are 2^15 = 32768 possible colorings, or 78 up to
relabeling the people and swapping the two colors. In each of them, there is a triangle of friends
or a triangle of strangers. In a group of 6 people, there is either a clique of 3 friends or a clique
of 3 complete strangers. The theorem was proven by Frank Ramsey in 1930. Paul Erdős asked
for explicit estimates of R(s), which is the least integer n such that any graph on n vertices
contains either a clique of size s (a set where all are connected to each other) or an independent
set of size s (a set where none are connected to each other). Graham for example asks whether
the limit R(n)^{1/n} exists. Ramsey theory also deals with other sets: van der Waerden's theorem
from 1927 for example tells that if the positive integers are colored with r colors, then for
every k there exists a number W(r, k) such that for N ≥ W(r, k) every r-coloring of the finite set
{1, . . . , N} contains a monochromatic arithmetic progression of length k. For example,
W(2, 3) = 9. Also here, it is an open problem to find a formula for W(r, k) or even to give good
upper bounds. [284] [283]
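Both small cases quoted above, the triangle Ramsey number 6 and W(2, 3) = 9, can be verified by exhaustive search. This sketch, with helper names of our choosing, runs through all 2-colorings. It checks the edges of K5 and K6 for a monochromatic triangle, and the numbers 1, . . . , 8 and 1, . . . , 9 for a monochromatic 3-term arithmetic progression.

```python
from itertools import combinations, product

def has_mono_triangle(n, color):
    """color maps every edge (i, j), i < j, of K_n to 0 or 1."""
    return any(color[(a, b)] == color[(a, c)] == color[(b, c)]
               for a, b, c in combinations(range(n), 3))

def every_coloring_has_triangle(n):
    edges = list(combinations(range(n), 2))
    return all(has_mono_triangle(n, dict(zip(edges, bits)))
               for bits in product((0, 1), repeat=len(edges)))

def has_mono_progression(colors, k):
    """colors[i] is the color of the integer i + 1; look for k equally spaced equal colors."""
    n = len(colors)
    for start in range(n):
        for step in range(1, n):
            if start + (k - 1) * step >= n:
                break
            if len({colors[start + t * step] for t in range(k)}) == 1:
                return True
    return False

def every_coloring_has_progression(n, k, r=2):
    return all(has_mono_progression(c, k) for c in product(range(r), repeat=n))

print(every_coloring_has_triangle(5), every_coloring_has_triangle(6))              # False True
print(every_coloring_has_progression(8, 3), every_coloring_has_progression(9, 3))  # False True
```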

108. Poincaré Duality

For a differentiable Riemannian n-manifold (M, g) there is an exterior derivative d = dp
which maps p-forms Λp to (p + 1)-forms Λp+1. In three dimensions, d0 corresponds to the
gradient, d1 to the curl and d2 to the divergence. The Riemannian metric defines an inner
product ⟨f, h⟩ on Λp, allowing to see Λp as part of a Hilbert space and to define the adjoint d*
of d. It is a linear map from Λp+1 to Λp. The exterior derivative so defines the self-adjoint Dirac
operator D = d + d* and the Hodge Laplacian L = D² = dd* + d*d, which leaves each Λp
invariant. For compact M, Hodge theory assures that dim(ker(L|Λp)) = bp = dim(H^p(M)),
where H^p(M) is the p'th cohomology group, the kernel of dp modulo the image of dp−1.
Poincaré duality is:

Theorem: If M is a compact orientable n-manifold, then bk(M) = bn−k(M).


The Hodge dual of f ∈ Λp is the unique form ∗f ∈ Λn−p satisfying g ∧ ∗f = ⟨g, f⟩ω for all
g ∈ Λp, where ω is the volume form. One has d*f = (−1)^{n+np+1} ∗ d ∗ f and L ∗ f = ∗ L f. This
implies that ∗ is a unitary map from ker(L|Λp) to ker(L|Λn−p), proving so the duality theorem.
For n = 4k, one has ∗² = 1 on Λ2k, allowing to define the Hirzebruch signature
σ := dim{u ∈ Λ2k | Lu = 0, ∗u = u} − dim{u ∈ Λ2k | Lu = 0, ∗u = −u}. The Poincaré duality
theorem was first stated by Henri Poincaré in 1895. It took until the 1930s to clean out the
notions and make it precise. The Hodge approach, establishing an explicit isomorphism between
harmonic p and n − p forms, appears for example in [166].
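Poincaré duality can also be tested combinatorially. Instead of the Hodge Laplacian used above, this sketch swaps in simplicial homology over the rationals, which yields the same Betti numbers for a triangulated manifold. It uses two examples of our choice:

* the boundary of a tetrahedron, a 2-sphere;
* a 7-vertex triangulation of the torus with triangles {i, i+1, i+3} and {i, i+2, i+3} mod 7.

Both are compact orientable surfaces, so b0 = b2 should hold in each case.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank over Q of a matrix given as a list of rows (Gauss-Jordan elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def boundary(faces, simplices):
    """Matrix of the simplicial boundary map; rows are faces, columns are simplices."""
    index = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for k in range(len(s)):
            mat[index[s[:k] + s[k + 1:]]][j] = (-1) ** k
    return mat

def betti_numbers(triangles):
    """Betti numbers (b0, b1, b2) over Q of the complex spanned by the triangles."""
    tris = sorted({tuple(sorted(t)) for t in triangles})
    edges = sorted({e for t in tris for e in combinations(t, 2)})
    verts = sorted({(v,) for t in tris for v in t})
    r1, r2 = rank(boundary(verts, edges)), rank(boundary(edges, tris))
    return len(verts) - r1, len(edges) - r1 - r2, len(tris) - r2

torus = ([(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)]
         + [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)])
sphere = list(combinations(range(4), 3))  # boundary of a tetrahedron
print(betti_numbers(torus), betti_numbers(sphere))  # (1, 2, 1) (1, 0, 1)
```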

109. Rokhlin-Kakutani approximation

Let T be an automorphism of a probability space (Ω, A, µ). This means that T is invertible,
T and T^{−1} are measurable, and µ(A) = µ(T(A)) for all A ∈ A. The system T is called
aperiodic if the set of periodic points P = {x ∈ Ω | ∃n > 0, T^n x = x} has measure µ(P) = 0.
A set B ∈ A which has the property that B, T(B), . . . , T^{n−1}(B) are disjoint is called a
Rokhlin tower of height n. If the measure of the tower is
µ(B ∪ · · · ∪ T^{n−1}(B)) = nµ(B) = 1 − ϵ, we call it a (1 − ϵ)-Rokhlin tower. We say T can be
approximated arbitrarily well by Rokhlin towers if for all n and all ϵ > 0 there is a
(1 − ϵ)-Rokhlin tower of height n.

Theorem: An aperiodic T can be approximated arbitrarily well by Rokhlin towers.

The result was proven by Vladimir Abramovich Rokhlin in his thesis in 1947 and independently
by Shizuo Kakutani in 1943. The lemma can be used to build Kakutani skyscrapers, which
are nice partitions associated to a transformation. The lemma allows to approximate an
aperiodic transformation T by periodic transformations Tn: just change T on T^{n−1}(B) so
that Tn^n(x) = x for all x in the tower. The theorem has been generalized by Donald Ornstein and
Benjamin Weiss to higher dimensions like Z^d actions of measure preserving transformations,
where the aperiodicity assumption is replaced by the assumption that the action is free: for any
n ≠ 0, the set {x | T^n(x) = x} has zero measure. See [158, 248, 312].

110. Lax approximation

On the group G of all measurable, invertible transformations of the d-dimensional torus T^d
which preserve the Lebesgue volume measure µ, one has the metric
δ(T, S) = ||ρ(T(x), S(x))||∞ ,
where ρ is the geodesic distance on the flat torus and where || · ||∞ is the L∞ supremum norm.
Let us call (T^d, T, µ) a toral dynamical system if T is a homeomorphism, a continuous
transformation with continuous inverse. A cube exchange transformation on T^d is a periodic,
piecewise affine measure-preserving transformation T which permutes rigidly all the cubes
$\prod_{i=1}^d [k_i/n, (k_i + 1)/n]$, where k_i ∈ {0, . . . , n − 1}. Every point in T^d is T-periodic.
A cube exchange transformation is determined by a permutation of the n^d cubes, that is of the
set {1, . . . , n^d}. If this permutation is cyclic, the exchange transformation is called cyclic. A
theorem of Lax [464] states that every toral dynamical system can be approximated in the
metric δ by cube exchange transformations. The approximations can even be cyclic [16].

Theorem: Toral systems can be approximated by cyclic cube exchanges


The result is due to Peter Lax [464]. The proof uses Hall's marriage theorem from graph
theory (for a 'book proof' of the latter theorem, see [12]). Periodic approximations of
symplectic maps work surprisingly well for relatively small n (see [594]). On the Pesin region
this can be explained in part by the shadowing property [390]. The approximation by cyclic
transformations makes long time stability questions look different [311].

111. Sobolev embedding

All functions are defined on R^n, integrated over R^n and assumed to be locally integrable,
meaning that for every compact set K the Lebesgue integral $\int_K |f| \, dx$ is finite. For functions
in $C_c^\infty$, which serve as test functions, partial derivatives ∂i = ∂/∂xi and more general
differential operators $D^k = \partial_{x_1}^{k_1} \cdots \partial_{x_n}^{k_n}$ can be applied. A function g is a weak
partial derivative of f if $\int f \, \partial_i \phi \, dx = -\int g \, \phi \, dx$ for all test functions ϕ. For p ∈ [1, ∞),
the L^p space is $\{f \mid \int |f|^p \, dx < \infty\}$. The Sobolev space W^{k,p} is the set of functions for which
all weak derivatives of order up to k are in L^p. So W^{0,p} = L^p. The Hölder space C^{r,α} with
r ∈ N, α ∈ (0, 1] is defined as the set of functions for which all r'th derivatives are α-Hölder
continuous. It is a Banach space with norm $\max_{|k| \le r} \|D^k f\|_\infty + \max_{|k| = r} \|D^k f\|_\alpha$, where
||f||∞ is the supremum norm and ||f||α is the Hölder coefficient $\sup_{x \ne y} |f(x) - f(y)|/|x - y|^\alpha$.
The Sobolev embedding theorem is

Theorem: If n < p and l = r + α < k − n/p, one has W k,p ⊂ C r,α .

Reference [653], which states this as Theorem 6.3.6, gives some history: generalized functions
appeared first in the work of Oliver Heaviside in the form of operational calculus. Paul Dirac
used the formalism in quantum mechanics. In the 1930s, Kurt Otto Friedrichs, Salomon Bochner
and Sergei Sobolev defined weak solutions of PDEs. Laurent Schwartz used the $C_c^\infty$ functions,
smooth functions of compact support. The embedding theorem means that the existence of
enough weak derivatives implies the existence of actual derivatives. For p = 2, the spaces W^{k,2}
are Hilbert spaces and the theory is a bit simpler due to the availability of Fourier theory, where
tempered distributions flourished. In that case, one can define for any real s > 0 the Hilbert
space H^s as the subset of all f ∈ S′ for which $(1 + |\xi|^2)^{s/2} \hat{f}(\xi)$ is in L². The Schwartz test
functions S consist of all C^∞ functions having bounded semi norms
$\|\phi\|_k = \max_{|\alpha| + |\beta| \le k} \|x^\beta D^\alpha \phi\|_\infty < \infty$, where α, β ∈ N^n. Since S is larger than the set
of smooth functions of compact support, the dual space S′ is smaller. Its elements are the
tempered distributions. Sobolev embedding theorems like the one above allow to show that weak
solutions of PDEs are smooth: for example, if the Poisson problem ∆f = V f with smooth V
is solved by a distribution f, then f is smooth. [98, 653]
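The simplest case of the embedding can be worked out by hand. Take n = 1, p = 2 and k = 1, so that n < p and k − n/p = 1/2. The theorem then gives W^{1,2}(R) ⊂ C^{0,α} for α < 1/2. For a test function f and y < x, the Cauchy-Schwarz inequality yields the key Hölder estimate:

```latex
|f(x) - f(y)| = \Bigl| \int_y^x f'(t) \, dt \Bigr|
  \le \Bigl( \int_y^x 1 \, dt \Bigr)^{1/2} \Bigl( \int_y^x |f'(t)|^2 \, dt \Bigr)^{1/2}
  \le |x - y|^{1/2} \, \| f' \|_{L^2(\mathbb{R})}
```

So the Hölder coefficient of order 1/2 is controlled by the L² norm of the weak derivative. For |x − y| ≤ 1 this estimate also bounds the α-Hölder quotient for every α < 1/2. This sketch only covers the Hölder part of the norm, not the supremum part.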

112. Whitney embedding

A smooth n-manifold M is a metric space equipped with a cover by sets $U_j = \phi_j^{-1}(B)$ with
$B = \{x \in \mathbb{R}^n \mid |x|^2 < 1\}$ or $U_j = \phi_j^{-1}(H)$ with $H = \{x \in \mathbb{R}^n \mid |x|^2 < 1, x_0 \ge 0\}$
and $\delta H = \{x \in H \mid x_0 = 0\}$, such that the homeomorphisms ϕj : Uj → B or ϕj : Uj → H
lead to smooth transition maps $\phi_{kj} = \phi_j \phi_k^{-1}$ from ϕk(Uj ∩ Uk) to ϕj(Uj ∩ Uk), which have
the property that all restrictions of ϕkj from δϕk(Uj ∩ Uk) to δϕj(Uj ∩ Uk) are smooth too. The
boundary δM of M now naturally is a smooth (n − 1)-manifold, the atlas being given by the sets
$V_j = \phi_j^{-1}(\delta H)$ for the indices j with ϕj : Uj → H. Two manifolds M, N are diffeomorphic if
there is a refinement {Uj, ϕj} of the atlas in M and a refinement {Vj, ψj} of the atlas in N such
that ϕj(Uj) = ψj(Vj). A manifold M can be smoothly embedded in R^k if there is a smooth
injective map f from M to R^k such that the image f(M) is diffeomorphic to M.

Theorem: Any n-manifold M can be smoothly embedded in R2n .

The theorem was proven by Hassler Whitney in 1944. Whitney had proven the weaker embedding
into R^{2n+1} in 1936, where he also was the first to give a precise definition of manifold. The
standard assumption is that M is second countable Hausdorff, but as every smooth finite
dimensional manifold can be upgraded to be Riemannian, the simpler metric assumption is no
restriction of generality. The modern point of view is to see M as a scheme over Euclidean
n-space, more precisely as a ringed space that is locally the spectrum of the commutative ring
C^∞(B) or C^∞(H). The set of manifolds is a category in which the smooth maps M → N are the
morphisms. The cover Uj defines an atlas, and the transition maps ϕkj allow to port notions like
smoothness from Euclidean space to M. The maps $\phi_j^{-1} : B \to M$ or $\phi_j^{-1} : H \to M$
parametrize the sets Uj [745].

113. Artificial intelligence

Like meta mathematics or reverse mathematics, the field of artificial intelligence (AI)
is a part of mathematics which also reflects on the subject itself. It is related to data science
(algorithms for data mining, and statistics), computation theory (like complexity theory),
language theory and especially grammar and evolutionary dynamics, optimization
problems (like solving optimal transport or extremal problems), solving inverse problems
(like developing algorithms for computer vision or optical character or speech recognition),
cognitive science as well as pedagogy in education (human or machine learning and human
motivation). There is no apparent "fundamental theorem" of AI (except maybe for Marvin
Minsky's "The most efficient way to solve a problem is to already know how to solve it." [524],
which is a surprisingly deep and insightful statement, as modern AI agents like Alexa, Siri,
Google Home, IBM Watson or Cortana demonstrate: they compute little, they just know
or look up, or annoy you to look it up yourself...). But there is a theorem of Lebowski
on machine super intelligence which taps into the rather uncharted territory of machine
motivation:

Theorem: No AI will bother after hacking its own reward function.

The picture [442] is that once the AI has figured out the philosophy of the "Dude" in the Coen
brothers movie The Big Lebowski, even repeated mischief no longer bothers it and it "goes bowling".
Objections are brushed away with "Well, this is your, like, opinion, man". Two examples
of human super intelligent units who have succeeded in hacking their own reward function are
Alexander Grothendieck and Grigori Perelman. The Lebowski theorem is due to Joscha Bach
[36], who stated this theorem of super intelligence in a tongue-in-cheek tweet. From a
mathematical point of view, the smartest way to "solve" an optimal transport problem is to
change the utility function. On a more serious level, the smartest way to "solve" the continuum
hypothesis is to change the axiom system. This might look like a cheat, but on a meta level,
more creativity is possible. Precursors of the Lebowski theme are Stanisław Lem's notion of a
mimicretin [466], a computer that plays stupid in order, once and for all, to be left in peace,
or the machine in [6] which develops humor and enjoys fooling humans with the answer to the
ultimate question: "42". This document entry is the analogue to the ultimate question: What
is the "fundamental theorem of AI"?


114. Stokes Theorem

On a smooth orientable n-dimensional manifold M, one has Λ^p, the vector bundle of smooth
differential p-forms. Any p-form F induces a volume form on a p-dimensional
sub-manifold G, so defining an integral $\int_G F$. The exterior derivative $d: \Lambda^p \to \Lambda^{p+1}$
satisfies $d^2 = 0$ and defines an elliptic complex. There is a natural Hodge duality isomorphism
called "Hodge star" $*: \Lambda^p \to \Lambda^{n-p}$. Given a p-form F ∈ Λ^p and a (p + 1)-dimensional
compact oriented sub-manifold G of M with boundary δG compatible with the orientation of
G, we have Stokes theorem:

Theorem: $\langle G, dF \rangle = \int_G dF = \int_{\delta G} F = \langle \delta G, F \rangle$.

The theorem states that the exterior derivative d is dual to the boundary operator δ. If G
is a connected 1-manifold with boundary, it is a curve with boundary δG = {A, B}. A 1-form
F can be integrated over the curve G by using the volume form r′(t) dt induced on G by a curve
parametrization r: [a, b] → G with r(a) = A, r(b) = B, and integrating $\int_a^b F(r(t)) \cdot r'(t)\, dt$,
which is the line integral. Stokes theorem is then the fundamental theorem of line integrals:
take a 0-form f, which is a scalar function; its derivative df is the gradient F = ∇f. Then
$\int_a^b \nabla f(r(t)) \cdot r'(t)\, dt = f(B) - f(A)$. If G is a two dimensional surface with boundary δG
and F is a 1-form, then the 2-form dF is the curl of F. If G is given as a surface parametrization
r(u, v), one can apply dF to the pair of tangent vectors r_u, r_v and integrate dF(r_u, r_v) over
the surface G to get $\int_G dF$. The Kelvin-Stokes theorem tells that this is the same as the
line integral $\int_{\delta G} F$. In the case M = R³, the 1-form F = P dx + Q dy + R dz can be identified with
a vector field F = [P, Q, R], and dF = ∇ × F. Since the integral of a 2-form H over a surface G
parametrized by r: R → G is $\iint_R H(r(u,v))(r_u, r_v)\, du\, dv = \iint_R H(r(u,v)) \cdot (r_u \times r_v)\, du\, dv$,
we get the classical Kelvin-Stokes theorem. If F is a 2-form, then dF is a 3-form which can be
integrated over a 3-manifold G. As $d: \Lambda^2 \to \Lambda^3$ can via Hodge duality naturally be paired
with $d^*: \Lambda^1 \to \Lambda^0$, which is the divergence, the divergence theorem
$\iiint_G \mathrm{div}(F)\, dx\, dy\, dz = \iint_{\delta G} F \cdot dS$
relates a triple integral with a flux integral. Historical milestones start with the development
of the fundamental theorem of calculus (1666 Isaac Newton, 1668 James Gregory, Isaac
Barrow 1670 and Gottfried Leibniz 1693); the first rigorous proof was done by Cauchy in 1823
(the first textbook appearance in 1876 by Paul du Bois-Reymond). See [94]. In 1762, Joseph-Louis
Lagrange and in 1813 Karl-Friedrich Gauss looked at special cases of the divergence theorem;
Mikhail Ostrogradsky in 1826 and George Green in 1828 covered the general case. Green's theorem
in two dimensions was first stated by Augustin-Louis Cauchy in 1846 and Bernhard Riemann
in 1851. Stokes theorem first appeared in 1854 as an exam question, but the theorem had
appeared already in a letter of William Thomson (Lord Kelvin) to George Stokes in 1850, hence
also the name Kelvin-Stokes theorem. Vito Volterra in 1889 and Henri Poincaré in 1899 generalized the
theorems to higher dimensions. Differential forms were introduced in 1899 by Élie Cartan. The
d notation for the exterior derivative was introduced in 1902 by Theodore de Donder. The ultimate
formulation above is from Cartan 1945. We followed Katz [398], who noticed that only in 1959
this version started to appear in textbooks.
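A quick numerical check can make the two classical special cases above concrete. The following Python sketch (standard library only) uses illustrative choices that are not from the text: the function f(x, y, z) = xy + z² along the helix r(t) = (cos t, sin t, t) for t ∈ [0, 2], and the 1-form F = −y³ dx + x³ dy on the unit disc, for which Green's theorem (the planar Kelvin-Stokes theorem) predicts 3π/2 on both sides. The quadrature is a plain composite Simpson rule, a sketch rather than a production integrator.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

# Fundamental theorem of line integrals for F = grad f along r: [0, 2] -> R^3.
f = lambda x, y, z: x * y + z * z
grad_f = lambda x, y, z: (y, x, 2 * z)
r = lambda t: (math.cos(t), math.sin(t), t)
dr = lambda t: (-math.sin(t), math.cos(t), 1.0)

line_integral = simpson(lambda t: sum(g * d for g, d in zip(grad_f(*r(t)), dr(t))), 0.0, 2.0)
endpoint_difference = f(*r(2.0)) - f(*r(0.0))

# Green's theorem on the unit disc for F = P dx + Q dy with P = -y^3, Q = x^3.
def boundary_integrand(t):
    x, y = math.cos(t), math.sin(t)
    return -y**3 * (-math.sin(t)) + x**3 * math.cos(t)  # P dx/dt + Q dy/dt

def curl_polar(theta, rho):
    x, y = rho * math.cos(theta), rho * math.sin(theta)
    return (3 * x * x + 3 * y * y) * rho  # (Q_x - P_y) times the polar Jacobian rho

flux_of_curl = simpson(
    lambda th: simpson(lambda rho: curl_polar(th, rho), 0.0, 1.0, 200), 0.0, 2 * math.pi, 200
)
circulation = simpson(boundary_integrand, 0.0, 2 * math.pi)

print(line_integral, endpoint_difference)  # both about 4 + sin(4)/2
print(flux_of_curl, circulation, 3 * math.pi / 2)
```

The Simpson rule is exact for the cubic radial integrand here, so the only visible discrepancy comes from the periodic boundary integral, and it is far below the printed precision.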

115. Moments

The Hausdorff moment problem asks for necessary and sufficient conditions for a sequence
µ_n to be realizable as a moment sequence $\mu_n = \int_0^1 x^n\, d\mu(x)$ for a Borel probability measure µ on [0, 1].

One can study the problem also in higher dimensions: for a multi-index n = (n_1, ..., n_d) denote
by $\mu_n = \int x_1^{n_1} \cdots x_d^{n_d}\, d\mu(x)$ the n'th moment of a signed Borel measure µ on the unit cube
I^d = [0, 1]^d ⊂ R^d. We say µ_n is a moment configuration if there exists a measure µ which
has µ_n as moments. If e_i denotes the standard basis in Z^d, define the partial difference
$(\Delta_i a)_n = a_{n-e_i} - a_n$ and $\Delta^k = \prod_i \Delta_i^{k_i}$. We write $\binom{n}{k} = \prod_{i=1}^d \binom{n_i}{k_i}$
and $\sum_{k=0}^{n} = \sum_{k_1=0}^{n_1} \cdots \sum_{k_d=0}^{n_d}$. We say moments µ_n are Hausdorff bounded if there exists a
constant C such that $\sum_{k=0}^{n} \binom{n}{k} |(\Delta^k \mu)_n| \le C$ for all n ∈ N^d. The theorem of
Hausdorff-Hildebrandt-Schoenberg is

Theorem: Hausdorff bounded moments µ_n are generated by a measure µ.

The above result is due to Theophil Henry Hildebrandt and Isaac Jacob Schoenberg from 1933
[338]. Moments also allow one to compare measures: a measure µ is called uniformly absolutely
continuous with respect to ν if there exists f ∈ L^∞(ν) such that µ = fν. A positive probability
measure µ is uniformly absolutely continuous with respect to a second probability measure ν
if and only if there exists a constant C such that (∆^k µ)_n ≤ C · (∆^k ν)_n for all k, n ∈ N^d.
In particular it gives a generalization of a result of Felix Hausdorff from 1921 [324] assuring
that µ is positive if and only if (∆^k µ)_n ≥ 0 for all k, n ∈ N^d. Another special case is that
µ is uniformly absolutely continuous with respect to Lebesgue measure ν on I^d if and only if
there is a constant C with $\binom{n}{k} |(\Delta^k \mu)_n| \le C \prod_{i=1}^d (n_i+1)^{-1}$ for all k and n.
Moments play an important role in statistics, when looking at moment generating functions
$\sum_n \mu_n t^n / n!$ of random variables X, where µ_n = E[X^n], as well as in multivariate statistics,
when looking at random vectors (X_1, ..., X_d), where $\mu_n = E[X_1^{n_1} \cdots X_d^{n_d}]$ are multivariate
moments. See [420, 632].
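In one dimension, the difference operators above are easy to experiment with. The Python sketch below (standard library only, exact rational arithmetic) computes the terms C(n, k)(∆^k µ)_n for two measures chosen here for illustration, not taken from the text: Lebesgue measure on [0, 1], with µ_n = 1/(n + 1), and the point mass at 1/2, with µ_n = 2^(−n). For Lebesgue measure every term equals 1/(n + 1). For the point mass the terms stay non-negative, as they must for a positive measure, but (n + 1) max_k C(n, k)(∆^k µ)_n keeps growing, consistent with the point mass not being uniformly absolutely continuous with respect to Lebesgue measure.

```python
from fractions import Fraction
from math import comb

def delta_power(mu, k, n):
    """(Delta^k mu)_n for (Delta a)_n = a_{n-1} - a_n in one dimension (0 <= k <= n).

    Expanding (S - I)^k with the shift (S a)_n = a_{n-1} gives
    sum_j (-1)^j C(k, j) mu_{n-k+j}.
    """
    return sum((-1) ** j * comb(k, j) * mu(n - k + j) for j in range(k + 1))

def hausdorff_row(mu, n):
    """The terms C(n, k) (Delta^k mu)_n for k = 0, ..., n."""
    return [comb(n, k) * delta_power(mu, k, n) for k in range(n + 1)]

lebesgue = lambda m: Fraction(1, m + 1)     # moments of dx on [0, 1]
point_mass = lambda m: Fraction(1, 2 ** m)  # moments of the Dirac measure at 1/2

for n in (4, 10, 30):
    row = hausdorff_row(lebesgue, n)
    assert all(term == Fraction(1, n + 1) for term in row)
    assert sum(row) == 1  # Hausdorff bounded with C = 1

    dirac_row = hausdorff_row(point_mass, n)
    assert min(dirac_row) >= 0 and sum(dirac_row) == 1  # positive measure
    print(n, float((n + 1) * max(dirac_row)))  # grows with n
```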

116. Martingales

A sequence of random variables X_1, X_2, ... on a probability space (Ω, A, P) is called a discrete
time stochastic process. We assume the X_k to be in L², meaning that the expectation
E[X_k²] < ∞ for all k. Given a sub-σ-algebra B of A, the conditional expectation E[X|B] is
the projection of L²(Ω, A, P) to L²(Ω, B, P). Extreme cases are E[X|A] = X and E[X|{∅, Ω}] =
E[X]. A finite set Y_1, ..., Y_n of random variables generates a sub-σ-algebra B of A, the
smallest σ-algebra for which all Y_j are still measurable. Write E[X|Y_1, ..., Y_n] = E[X|B],
where B is the σ-algebra generated by Y_1, ..., Y_n. A discrete time stochastic process is called
a martingale if E[X_{n+1}|X_1, ..., X_n] = X_n for all n. If the equal sign is replaced with ≤,
then the process is called a super-martingale; if it is replaced with ≥, it is a sub-martingale.
The random walk $X_n = \sum_{k=1}^n Y_k$ defined by a sequence of independent L² random variables
Y_k with E[Y_k] = 0 is an example of a martingale because independence implies
$E[X_{n+1}|X_1, \dots, X_n] = X_n + E[Y_{n+1}] = X_n$. If X and M are two discrete time stochastic
processes, define the martingale transform (= discrete Ito integral) X · M as the process
$(X \cdot M)_n = \sum_{k=1}^n X_k (M_k - M_{k-1})$. If the process X is predictable, meaning that X_k
is measurable with respect to the σ-algebra generated by M_0, ..., M_{k−1}, and bounded, meaning
that there exists a constant C such that |X_k| ≤ C for all k, then if M is a martingale, also
X · M is a martingale. The Doob martingale convergence theorem is

Theorem: A uniformly bounded super-martingale X_n converges in L¹.


The convergence theorem can be used to prove the optional stopping time theorem, which
tells that the expected value of a martingale at a stopping time is the initial expected value.
In finance it is known as the fundamental theorem of asset pricing. If τ is a stopping time
adapted to a martingale X_k, it defines the random variable X_τ, and E[X_τ] = E[X_0] if, for
example, τ is almost surely finite and the stopped process is bounded. For a super-martingale
one has ≤ and for a sub-martingale ≥. The proof is obtained by defining the stopped process
$X^\tau_n = X_0 + \sum_{k=0}^{\min(\tau,n)-1} (X_{k+1} - X_k)$, which is a martingale transform and so
a martingale. The martingale convergence theorem gives a limiting random variable X_τ, and
because E[X^τ_n] = E[X_0] for all n, also E[X_τ] = E[X_0]. This is rephrased as "you can not beat the
system" [750]. A trivial implication is that one can not, for example, design a strategy allowing
one to win in a fair game by designing a "clever stopping time" like "betting on red" in roulette if
"6 times black" in a row has occurred. Nor can one, with bounded resources, follow the strategy
to stop the game at the first positive total win, which one could always reach by doubling the
bet in case of losing a game. Martingales were introduced by Paul Lévy in 1934; the name
"martingale" (referring to the just mentioned doubling betting strategy) was added in a 1939
probability book of Jean Ville. The theory was developed by Joseph Leo Doob in his book of
1953 [193]. See [750].
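The "you can not beat the system" statement can be checked exhaustively for short games. The Python sketch below (standard library only; all function names are made up for this illustration) enumerates every outcome of a fair ±1 coin over a fixed horizon, which makes every stopping time bounded, and evaluates the martingale transform for two predictable stake rules: the doubling strategy and betting only right after a losing streak. Both have exact expected gain 0. The doubling strategy wins 1 on all but one path, and that single path costs 2^N − 1, which is why it needs unbounded resources.

```python
from fractions import Fraction
from itertools import product

def expected_gain(strategy, horizon):
    """Exact mean and worst case of the martingale transform (X . M)_horizon.

    M is a fair +-1 coin walk; strategy(history) returns the stake X_k from the
    past outcomes only (a predictable process). A stake of 0 means sitting out,
    so stopping is modeled by staking 0 from some round on.
    """
    total, worst = Fraction(0), 0
    for path in product((1, -1), repeat=horizon):  # 2**horizon equally likely paths
        gain = 0
        for k, step in enumerate(path):
            gain += strategy(path[:k]) * step  # X_k (M_k - M_{k-1})
        total += gain
        worst = min(worst, gain)
    return total / 2**horizon, worst

def doubling(history):
    """Bet 1, double the stake after every loss, stop after the first win."""
    return 0 if 1 in history else 2 ** len(history)

def after_losing_streak(history, streak=3):
    """Bet 1 only right after `streak` consecutive losses ("red after black")."""
    return 1 if len(history) >= streak and all(s == -1 for s in history[-streak:]) else 0

print(expected_gain(doubling, 10))             # mean 0, worst case -1023
print(expected_gain(after_losing_streak, 10))  # mean 0
```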

117. Theorema Egregium

A Riemannian metric on a two-dimensional manifold S defines the quadratic form I = E du² +
2F du dv + G dv² called the first fundamental form on the surface. If r(u, v) is a parametrization
of S, then E = r_u · r_u, F = r_u · r_v and G = r_v · r_v. The second fundamental form of S
is II = L du² + 2M du dv + N dv², where L = r_uu · n, M = r_uv · n, N = r_vv · n, written using
the normal vector n = (r_u × r_v)/|r_u × r_v|. The Gaussian curvature K = det(II)/det(I) =
(LN − M²)/(EG − F²) a priori depends on the embedding r: R → S in space R³, but it actually only
depends on the intrinsic metric, the first fundamental form. This is the Theorema egregium
of Gauss:
Theorem: The Gaussian curvature only depends on the Riemannian metric.
Gauss himself already gave explicit formulas, but a formula of Brioschi gives the curvature K
explicitly as a ratio of determinants involving E, F, G as well as their first and second derivatives.
In the case when the surface is given as a graph z = f(x, y), one has
$K = D/(1 + |\nabla f|^2)^2$, where $D = f_{xx} f_{yy} - f_{xy}^2$ is the discriminant and $1 + |\nabla f|^2 = \det(I)$.
If the surface is rotated in space so that the point is a critical point of f, then the discriminant
D is equal to the curvature. One can see the independence of the embedding also from the
Puiseux formula $K = \lim_{r \to 0} 3(|S_0(r)| - |S(r)|)/(\pi r^3)$, where |S_0(r)| = 2πr is the circumference of
the circle S_0(r) in the flat case and |S(r)| is the circumference of the geodesic circle of radius
r on S. The Theorema Egregium also follows from Gauss-Bonnet, as the latter allows one to write
the curvature in terms of the difference between the angle sum of an infinitesimal geodesic
triangle and the angle sum π of a flat triangle. As the angle sums are entirely defined
intrinsically, the curvature is intrinsic. The "Theorema Egregium" was found by Karl-Friedrich
Gauss in 1827 and published in 1828 in "Disquisitiones generales circa superficies curvas". It is
not an accident that Gauss was occupied with concrete geodetic triangulation problems too.
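Both the extrinsic graph formula and the intrinsic Puiseux formula can be evaluated numerically for the unit sphere, where K = 1. In this Python sketch (standard library only), the hemisphere z = √(1 − x² − y²), the sample point (0.3, 0.2), the step size h and the radius r = 0.01 are illustrative choices; the circumference 2π sin(r) of a geodesic circle on the unit sphere is supplied as known input rather than measured. The agreement is an illustration, not a proof: the same curvature comes out of embedding data and of intrinsic length data.

```python
import math

def graph_curvature(f, x, y, h=1e-3):
    """K = (f_xx f_yy - f_xy^2) / (1 + f_x^2 + f_y^2)^2 using central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return (fxx * fyy - fxy**2) / (1 + fx**2 + fy**2) ** 2

def puiseux_curvature(circumference, r):
    """3 (2 pi r - |S(r)|) / (pi r^3); tends to K as r -> 0."""
    return 3 * (2 * math.pi * r - circumference(r)) / (math.pi * r**3)

# Extrinsic data: the upper unit hemisphere as a graph over the unit disc.
hemisphere = lambda x, y: math.sqrt(1 - x * x - y * y)
# Intrinsic data: a geodesic circle of radius r on the unit sphere has circumference 2 pi sin(r).
geodesic_circle = lambda r: 2 * math.pi * math.sin(r)

print(graph_curvature(hemisphere, 0.3, 0.2))     # close to 1
print(puiseux_curvature(geodesic_circle, 0.01))  # close to 1
```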

118. Entropy

Given a random variable X on a probability space (Ω, A, P) which is finite and discrete in the
sense that it takes only finitely many values, the entropy is defined as $S(X) = -\sum_x p_x \log(p_x)$,
where p_x = P[X = x]. To compare, for a random variable X with cumulative distribution
function F(x) = P[X ≤ x] having a continuous derivative F′ = f, the entropy is defined as
$S(X) = -\int f(x) \log(f(x))\, dx$, allowing the value −∞ if the integral does not converge. (We
always read p log(p) = 0 if p = 0.) In the continuous case, one also calls this the differential
entropy. Two discrete random variables X, Y are called independent if one can realize them
on a product probability space Ω = A × B so that X(a, b) = X(a) and Y(a, b) = Y(b) for
some functions X: A → R, Y: B → R. Independence implies that the random variables are
uncorrelated, E[XY] = E[X]E[Y], and that the entropy adds up, S(XY) = S(X) + S(Y), where
in the entropy XY denotes the pair (X, Y). We can write S(X) = E[log(W)], where W is the
"Wahrscheinlichkeit" random variable assigning to ω ∈ Ω the value W(ω) = 1/p_x if X(ω) = x.
Let us say a functional on discrete random variables is additive if it is of the form
$H(X) = \sum_x f(p_x)$ for some continuous function f for which f(t)/t is monotone. We say it is
multiplicative if H(XY) = H(X) + H(Y) for independent random variables. The functional is
normalized if H(X) = log(2) if X is a random variable taking two values {0, 1} with probability
p_0 = p_1 = 1/2. Shannon's theorem is:

Theorem: Any normalized, additive and multiplicative H is entropy S .

The word "entropy" was introduced by Rudolf Clausius in 1865 [611]. Ludwig Boltzmann saw
the importance of dS/dt ≥ 0 in the context of heat and wrote in 1872 S = k_B log(W), where
W(x) = 1/p_x is the inverse "Wahrscheinlichkeit" that a state has the value x. His equation
is understood as the expectation $S = k_B E[\log(W)] = k_B \sum_x p_x \log(W(x))$, which is, up to the
constant k_B, the Shannon entropy, introduced in 1948 by Claude Shannon in the context of
information theory. (Shannon characterized functionals H with the properties that H is
continuous in p, that for random variables X_n with p_x(X_n) = 1/n one has H(X_n) ≤ H(X_m)
if n ≤ m, and that if X, Y are two random variables such that the finite σ-algebra A defined
by X is a sub-σ-algebra of the σ-algebra B defined by Y, then $H(Y) = H(X) + \sum_x p_x H(Y_x)$,
where Y_x(ω) = Y(ω) for ω ∈ {X = x}. One can show that these Shannon conditions are
equivalent to the combination of being additive and multiplicative.) In statistical
thermodynamics, where p_x is the probability of a micro-state, k_B S is also called the Gibbs
entropy, where k_B is the Boltzmann constant. For general random variables X on (Ω, A, P)
and a finite σ-sub-algebra B, Gibbs looked in 1902 at coarse grained entropy, which is the
entropy of the conditional expectation Y = E[X|B], which is now a random variable Y taking
only finitely many values so that entropy is defined. See [643].
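The properties appearing in Shannon's theorem are easy to test on concrete distributions. The Python sketch below uses the natural logarithm, so that a fair coin has entropy log(2), and distributions chosen only for illustration. It checks normalization, the multiplicative property H(XY) = H(X) + H(Y) for an independent pair, and Shannon's grouping condition H(Y) = H(X) + Σ_x p_x H(Y_x). It does not touch the uniqueness part of the theorem, which is the non-trivial direction.

```python
import math
from itertools import product

def entropy(probs):
    """S = -sum_x p_x log(p_x) with natural log and the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def joint(px, py):
    """Distribution of the pair (X, Y) for independent X, Y: p_(x,y) = p_x p_y."""
    return [a * b for a, b in product(px, py)]

px, py = [0.5, 0.25, 0.25], [0.1, 0.9]

# Normalization: a fair coin has entropy log(2).
assert abs(entropy([0.5, 0.5]) - math.log(2)) < 1e-12
# Multiplicative property: entropies of independent variables add up.
assert abs(entropy(joint(px, py)) - (entropy(px) + entropy(py))) < 1e-12

# Shannon's grouping condition: X lumps the values of Y into {0, 1} and {2, 3}.
py4 = [0.1, 0.2, 0.3, 0.4]
groups = [py4[:2], py4[2:]]
p_group = [sum(g) for g in groups]
conditional = sum(pg * entropy([q / pg for q in g]) for pg, g in zip(p_group, groups))
assert abs(entropy(py4) - (entropy(p_group) + conditional)) < 1e-12
print(entropy(py4), entropy(p_group) + conditional)
```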

119. Mountain Pass

Let E be a real Banach space, and let f ∈ C¹(E, R) be a continuously differentiable function
from E to R. (The Fréchet derivative f′ at a point x ∈ E is a linear operator A satisfying
f(x + h) − f(x) − Ah = o(|h|) for h → 0.) A point x ∈ E is called a critical point of
f if f′(x) = 0. The function f satisfies the Palais-Smale condition if every sequence x_k in
E for which {f(x_k)} is bounded and for which f′(x_k) → 0 has a convergent subsequence.
If f(a) = 0, f(y) ≥ ϵ > 0 on the sphere S_r(a) of radius r centered at a, and f(b) ≤ 0 for some
b with |b − a| > r, we say that a and b are separated by a mountain range. Let us call a critical
point a saddle for that mountain range if f has a mini-max critical value c there. This means that
$c = \inf_{\gamma \in \Gamma} \max_{t \in [0,1]} f(\gamma(t)) \ge \epsilon$ with $\Gamma = \{\gamma \in C([0,1], E) : \gamma(0) = a, \gamma(1) = b\}$.

Theorem: If f is Palais-Smale and has a mountain range, it has a saddle.


The first Mountain Pass Theorem appears in [18] in 1973 (Theorem 2.1). The name "Mountain
Pass" had been popularized by Louis Nirenberg. [18] has been motivated by Ljusternik
and Schnirelmann. If f is even, stronger results about the multiplicity are possible. The first
mountain pass idea goes back to George David Birkhoff in 1917. A folklore result in calculus
is that a differentiable function on R^n with a pair of isolated local minima and a condition
avoiding critical points at ∞ forces a third critical point. Ljusternik and Schnirelmann did
some early work on minimax methods. The crucial Palais-Smale compactness condition,
which makes the theorem work in infinite dimensions, appeared in 1964. [32] calls it condition
(C), a notion which already appeared in the original paper [561]. However, even for the special
case when E is a Hilbert space, there seems to have been no earlier analogue of the Mountain
Pass Theorem in infinite dimensions than the 1973 result. See also [591].
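In finite dimensions the mini-max value c can be approximated directly. The following Python sketch is a discrete stand-in, not the theorem's proof: it restricts paths to a grid in the plane and finds the path whose highest point is lowest with a bottleneck (minimax) variant of Dijkstra's algorithm. The double-well function f(x, y) = (x² − 1)² + y², the square [−2, 2]² and the grid size are assumptions for illustration. Here f(a) = f(b) = 0 at a = (−1, 0) and b = (1, 0), f is positive on small circles around a, and the computed level is the value 1 of f at the saddle (0, 0).

```python
import heapq

def mountain_pass_level(f, a, b, lo=-2.0, hi=2.0, n=81):
    """Approximate c = inf over paths from a to b of the maximum of f along the path.

    Paths are restricted to the 4-neighbour grid on [lo, hi]^2 with n points per
    axis; the cost of a path is the largest f-value at its grid points, and a
    bottleneck (minimax) variant of Dijkstra's algorithm minimizes that cost.
    """
    step = (hi - lo) / (n - 1)
    point = lambda i, j: (lo + i * step, lo + j * step)
    index = lambda p: (round((p[0] - lo) / step), round((p[1] - lo) / step))
    start, goal = index(a), index(b)
    best = {start: f(*point(*start))}
    heap = [(best[start], start)]
    while heap:
        level, (i, j) = heapq.heappop(heap)
        if (i, j) == goal:
            return level
        if level > best[(i, j)]:
            continue  # stale heap entry
        for u, v in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= u < n and 0 <= v < n:
                candidate = max(level, f(*point(u, v)))
                if candidate < best.get((u, v), float("inf")):
                    best[(u, v)] = candidate
                    heapq.heappush(heap, (candidate, (u, v)))
    return float("inf")

# Two valleys at (-1, 0) and (1, 0) separated by a ridge; the saddle (0, 0) has f = 1.
double_well = lambda x, y: (x * x - 1) ** 2 + y * y
print(mountain_pass_level(double_well, (-1.0, 0.0), (1.0, 0.0)))  # close to 1.0
```

Any grid path from a to b has to cross the column x = 0, where f(0, y) = 1 + y² ≥ 1, so on this grid the discrete level can not undercut the true saddle value.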

120. Exponential sums

Given a smooth function f: R → R which maps integers to integers, one can look at exponential
sums $\sum_{x=a}^{b} \exp(i\pi f(x))$. An example is the Gaussian sum $\sum_{x=0}^{n-1} \exp(i\alpha x^2)$. There
are lots of interesting relations and estimates. One of the magical formulas is the Landsberg-Schaar
relation for the finite sums $S(q, p) = \frac{1}{\sqrt{p}} \sum_{x=0}^{p-1} \exp(i\pi x^2 q/p)$.

Theorem: If p, q are positive and odd integers, then $S(2q, p) = e^{i\pi/4} S(-p, 2q)$.
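The Landsberg-Schaar relation is easy to test numerically. The short Python sketch below (standard library only) evaluates S(q, p) directly from its definition and compares both sides for small odd p and q; agreement to about 1e-9 in floating point is evidence, not a proof. The ranges of p and q are arbitrary choices.

```python
import cmath
import math

def S(q, p):
    """S(q, p) = p^(-1/2) * sum_{x=0}^{p-1} exp(i pi x^2 q / p)."""
    return sum(cmath.exp(1j * math.pi * x * x * q / p) for x in range(p)) / math.sqrt(p)

phase = cmath.exp(1j * math.pi / 4)
for p in range(1, 16, 2):
    for q in range(1, 10, 2):
        assert abs(S(2 * q, p) - phase * S(-p, 2 * q)) < 1e-9

print(S(2, 5), S(2, 7))  # approximately 1 and i
```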

For q = 1, the right hand side only involves the two terms x = 0, 1, so that
$S(2, p) = \frac{1}{\sqrt{p}} \sum_{x=0}^{p-1} \exp(2\pi i x^2/p) = e^{i\pi/4} S(-p, 2) = \frac{e^{i\pi/4}}{\sqrt{2}} (1 + e^{-i\pi p/2})$,
which is 1 if p = 4k + 1 and i if p = 4k − 1. The method of exponential sums has been expanded
especially by Vinogradov's papers [724] and used in number theory, like for quadratic
reciprocity [535]. The topic is of interest also outside of number theory, like in dynamical
systems theory, as Fürstenberg has demonstrated. An ergodic theorist would look at the
dynamical system T(x, y) = (x + 2y + 1, y + 1) on the 2-torus T² = R²/(πZ)² and define
g_α(x, y) = exp(iπxα). Since the orbit of this toral map is T^n(0, 0) = (n², n), the exponential
sum can be written as a Birkhoff sum $\sum_{k=0}^{p-1} g_{q/p}(T^k(0, 0))$ along a particular orbit of a
dynamical system. Results as those mentioned above show that the random walk grows like √p,
similarly as in a random setting. Now, since the dynamical system is minimal, the growth rate
should not depend on the initial point, and πq/p should be replaceable by any irrational α and
no more be linked to the length of the orbit. The problem is then to study the growth rate of
the stochastic process (= sequence of random variables) $S^t(x, y) = \sum_{k=0}^{t-1} g(T^k(x, y))$ for any
continuous g with zero expectation, which by Fourier analysis boils down to looking at
exponential sums. Of course S^t(x, y)/t → 0 by Birkhoff's ergodic theorem, but as in the law
of the iterated logarithm one is interested in precise growth rates. This can be subtle. Already
in the simpler case of an integrable T(x) = x + α on the 1-torus, there is Denjoy-Koksma theory,
which shows that the growth rate depends on Diophantine properties of πα. Unlike irrational
rotations, the Fürstenberg type skew systems T leading to the theta functions are not
integrable: they are not conjugated to a group translation (there is some randomness, even if
so weak that the Kolmogorov-Sinai entropy is zero). The dichotomy between structure and
randomness and especially the similarities between dynamical and number theoretical set-ups
have been discussed in [701].
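The orbit identity behind the Birkhoff sum can be checked by iterating T on integer lifts; the integer coordinates suffice, since g_α only enters through exp(iπxα). In this Python sketch (standard library only) the choices q = 3, p = 7 and the golden-mean value of α for the irrational experiment are illustrative assumptions. The printed ratios |S^t|/√t are meant for exploring the growth question raised above, not as evidence for any particular growth law.

```python
import cmath
import math

def T(x, y):
    """The skew map T(x, y) = (x + 2y + 1, y + 1), iterated on integer lifts."""
    return x + 2 * y + 1, y + 1

def birkhoff_sum(alpha, steps, start=(0, 0)):
    """S^t = sum_{k < steps} g_alpha(T^k(start)) with g_alpha(x, y) = exp(i pi x alpha)."""
    x, y = start
    total = 0j
    for _ in range(steps):
        total += cmath.exp(1j * math.pi * x * alpha)
        x, y = T(x, y)
    return total

# The orbit of (0, 0) visits (k^2, k), so the Birkhoff sum is the exponential sum.
q, p = 3, 7
direct = sum(cmath.exp(1j * math.pi * k * k * q / p) for k in range(p))
print(abs(birkhoff_sum(q / p, p) - direct))  # tiny (rounding only)

# For an irrational alpha one can watch |S^t| / sqrt(t) as t grows.
golden = (math.sqrt(5) - 1) / 2
for t in (10**2, 10**3, 10**4):
    print(t, abs(birkhoff_sum(golden, t)) / math.sqrt(t))
```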


121. Sphere theorem

A compact Riemannian manifold M is said to have positive curvature if all sectional
curvatures are positive. The sectional curvature at a point x ∈ M in the direction of the
2-dimensional plane Σ ⊂ T_xM is defined as the Gaussian curvature of the surface exp_x(Σ) ⊂ M at
the point. In terms of the Riemannian curvature tensor R: (T_xM)⁴ → R and an orthonormal
basis {u, v} spanning Σ, this is R(u, v, u, v). The curvature is called quarter pinched if,
after rescaling the metric, the sectional curvature is in the interval (1, 4] at all points x ∈ M.
In particular, a quarter pinched manifold is a manifold with positive curvature. We say here
that a compact Riemannian manifold is a sphere if it is homeomorphic to a sphere. The sphere
theorem is:

Theorem: A simply-connected quarter pinched manifold is a sphere

The theorem was proven by Marcel Berger and Wilhelm Klingenberg in 1960. That a pinching
condition would imply a manifold to be a sphere had been conjectured already by Heinz Hopf.
Hopf himself proved in 1926 that constant sectional curvature implies that M is even isometric
to a sphere. Harry Rauch, after visiting Hopf in Zürich in the 1940s, proved that a 3/4-pinched
simply connected manifold is a sphere. In 2007, Simon Brendle and Richard Schoen proved that
the theorem even holds with the stronger conclusion that M is a d-sphere, meaning that M is
diffeomorphic to the Euclidean d-sphere {|x|² = 1} ⊂ R^{d+1}. This is the differentiable sphere
theorem. Since John Milnor had given in 1956 examples of spheres which are homeomorphic
but not diffeomorphic to the standard sphere (so called exotic spheres, spheres which carry
a smooth maximal atlas different from the standard one), the differentiable sphere theorem
is a substantial improvement on the topological sphere theorem. It needed completely new
techniques, especially the Ricci flow ġ = −2Ric(g) of Richard Hamilton, which is a weakly
parabolic partial differential equation deforming the metric g using the Ricci curvature
Ric of g. See [57, 92].

122. Word problem

The word problem in a finitely presented group G = (g|r) with generators g and relations r
is the problem to decide whether two given words v, w represent the same group element
in G or not. The word problem is not solvable in general: there are concrete finitely presented
groups in which it is not. The following theorem of Boone and Higman relates the solvability
to algebra. A group is simple if its only normal subgroups are the trivial group and the
group itself.

Theorem: Finitely presented simple groups have a solvable word problem.

More generally, if G ⊂ H ⊂ K where G is finitely generated, H is simple and K is finitely
presented, then G has a solvable word problem. Max Dehn proposed the word problem in 1911.
Pyotr Novikov in 1955 proved that the word problem is undecidable for finitely presented groups
in general. William W. Boone and Graham Higman proved the theorem in 1974 [78]. Higman would
in the same year also find an example of an infinite finitely presented simple group. The
non-solvability of the word problem implies the non-solvability of the homeomorphism problem
for n-manifolds with n ≥ 4. See [760].
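For the simplest finitely presented groups, the free groups (no relations at all), the word problem is solved by free reduction: two words represent the same element exactly if v w^(−1) freely reduces to the empty word. The Python sketch below encodes a generator as a lowercase letter and its inverse as the corresponding uppercase letter, an encoding assumed only for this illustration. It shows what a solution of the word problem looks like in the easiest case and says nothing about the much harder setting of the Boone-Higman theorem.

```python
def free_reduce(word):
    """Freely reduce a word; 'a' is a generator and 'A' its inverse."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()  # cancel x x^-1 or x^-1 x
        else:
            stack.append(letter)
    return "".join(stack)

def inverse(word):
    """(x1 x2 ... xn)^-1 = xn^-1 ... x1^-1."""
    return word[::-1].swapcase()

def same_element(v, w):
    """Decide v = w in the free group by checking that v w^-1 reduces to the empty word."""
    return free_reduce(v + inverse(w)) == ""

print(same_element("abBa", "aa"))  # True
print(same_element("ab", "ba"))    # False
```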


123. Finite simple groups

A finite group (G, ∗, 1) is a finite set G with an operation ∗: G × G → G and an element 1, such
that the operation is associative, (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c, such that a ∗ 1 = 1 ∗ a = a
for every a, and such that every a has an inverse a^{−1} satisfying a ∗ a^{−1} = 1. A group G is
simple if the only normal subgroups of G are the trivial group {1} and the group itself. A
subgroup H of G is called normal if gH = Hg for all g. Simple groups play the role of the
primes in the set of integers. A theorem of Jordan-Hölder is that the composition series
of G (with simple groups as quotients) is unique up to permutations and isomorphisms. The
classification theorem of finite simple groups is

Theorem: Every finite simple group is cyclic, alternating, Lie or sporadic.

There are 18 so called regular families of finite simple groups, made of the cyclic groups, the
alternating groups and 16 families of Lie type groups. Then there are 26 so called sporadic
groups, of which 20 form the "happy family", as they are subgroups or sub-quotients of the
monster, and 6 are pariahs, outcasts which are not under the spell of the monster. The
classification was a huge collaborative effort with more than 100 authors, covering 500 journal
articles. According to Daniel Gorenstein, the classification was completed in 1981, and fixes were
applied until 2004. (Michael Aschbacher and Stephen Smith resolved the last problems, which
took several years, leading to a full proof of 1300 pages.) A second generation cleaned-up proof
written with more details is under way and currently has 5000 pages. Some history is given
in [662].
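The smallest non-abelian simple group is the alternating group A5 of order 60. A classical way to see that it is simple uses only conjugacy classes: a normal subgroup is a union of conjugacy classes that contains the identity, and by Lagrange its order divides |G|. The Python sketch below (standard library only, permutations stored as tuples) computes the class sizes 1, 12, 12, 15, 20 and checks that no such union has an order other than 1 or 60. It illustrates the definition of simplicity; it says nothing about the classification itself.

```python
from itertools import combinations, permutations

def compose(p, q):
    """(p o q)(i) = p[q[i]] for permutations stored as tuples."""
    return tuple(p[i] for i in q)

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_even(p):
    """A permutation is even if its number of inversions is even."""
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p))) % 2 == 0

def conjugacy_classes(group):
    remaining, classes = set(group), []
    while remaining:
        x = next(iter(remaining))
        cls = {compose(compose(g, x), inverse(g)) for g in group}  # g x g^-1
        classes.append(cls)
        remaining -= cls
    return classes

A5 = [p for p in permutations(range(5)) if is_even(p)]
identity = tuple(range(5))
classes = conjugacy_classes(A5)
other_sizes = [len(c) for c in classes if identity not in c]

# Possible orders of a union of classes containing the identity; keep those dividing |G|.
orders = {1 + sum(c) for r in range(len(other_sizes) + 1) for c in combinations(other_sizes, r)}
nontrivial = sorted(m for m in orders if len(A5) % m == 0 and m not in (1, len(A5)))
print(len(A5), sorted(len(c) for c in classes), nontrivial)  # 60 [1, 12, 12, 15, 20] []
```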

124. God number

A finite, finitely presented group G = (g|r), like for example the Rubik group, defines
the Cayley graph Γ in which the group elements are the nodes and where two nodes a, b
are connected if there is a generator x in g such that xa = b. The diameter of a graph
is the largest geodesic distance between two nodes in Γ. For a puzzle group, it is also called
the God number of the puzzle. The Rubik cube is an example of a finitely presented group.
The original 3 × 3 × 3 cube allows one to permute the 26 boundary cubes using the 18 possible
rotations of the 6 faces as generators. Of the X = 8! 12! 3⁸ 2¹² possible ways to physically build
the cube, only |G| = X/12 = 43252003274489856000 are present in the Rubik group G. Some of the
positions, "quarks" [276], can not be realized, but combinations of them, "mesons" or "baryons", can.

Theorem: The God number of the Rubik cube is 20.

This means that from any position, one could in principle solve the puzzle in 20 moves. Note
that one has to specify clearly the generators of the group, as this defines the Cayley graph
and so a metric on the group. The lower bound 18 had already been known in 1980 because a
counting of all the possible move sequences with 17 steps produced fewer elements than |G|.
The lower bound 20 came in 1995 when Michael Reid proved that the super-flip position (where
the edges are all flipped but corners are correct) needs 20 moves. In July 2010, using about
35 CPU years, a team around Tomas Rokicki established that the God number is 20. They
partitioned the possible group positions into roughly 2 billion sets of 20 billion positions each.
Using symmetry, they reduced this to 55 million sets, then found solutions for all of the positions
in these sets [224]. It may appear silly to put a God number computation as a fundamental
theorem, but the status of the Rubik cube is enormous, as it has been one of the most popular
puzzles for decades and is a prototype for many other similar puzzles, so the choice can be
defended.¹
One can ask to compute the God number of any finitely presented finite group. Interesting
in general is the complexity of evaluating that functional. The simplest nontrivial Rubik
cuboid is the 2 × 2 × 1 one. It has 6 positions and 2 generators a, b. The finitely presented
group is {a, b | a² = b² = (ab)³ = 1}, which is the dihedral group D3. Its group elements are
G = {1, a = babab, ab = baba, aba = bab, abab = ba, ababa = b}. The group is isomorphic to the
symmetry group of the equilateral triangle, generated by the two reflections a, b at two
altitude lines. The God number of that group is 3 because the Cayley graph Γ is the cyclic
graph C6. The puzzle solver has here no other choice than "solving the puzzle", because one is
forced to make a non-trivial move in each step. See [382] or [47] for general combinatorial group
theory and [613] for a recent autobiography of Ernő Rubik.
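For small puzzle groups the God number can be computed by breadth-first search in the Cayley graph. The Python sketch below (standard library only) does this for the 2 × 2 × 1 cuboid, modelling the generators a, b as the triangle reflections (1, 0, 2) and (0, 2, 1) acting on vertex labels, an encoding chosen here for illustration; it finds 6 elements and diameter 3. The same search is hopeless for the full Rubik group, whose order the sketch only recomputes from the count X/12; the 2010 computation relied on the partition into sets and on symmetry instead.

```python
from collections import deque
from math import factorial

def cayley_ball(generators, identity):
    """Breadth-first search in the Cayley graph where x is joined to g x for each generator g.

    Returns (group order, eccentricity of the identity). Cayley graphs are vertex
    transitive, so the eccentricity of the identity is the diameter (God number).
    """
    compose = lambda g, x: tuple(g[i] for i in x)  # the permutation g after x
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        x = queue.popleft()
        for g in generators:
            y = compose(g, x)
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return len(dist), max(dist.values())

# The 2 x 2 x 1 cuboid: two reflections of a triangle acting on its vertices 0, 1, 2.
a, b = (1, 0, 2), (0, 2, 1)
print(cayley_ball([a, b], (0, 1, 2)))  # (6, 3)

# Order of the Rubik group from the count of physically buildable cubes.
X = factorial(8) * factorial(12) * 3**8 * 2**12
print(X // 12)  # 43252003274489856000
```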

125. Sard Theorem

Let f: M → N be a smooth map between smooth manifolds M, N of dimension dim(M) = m
and dim(N) = n. A point x ∈ M is called a critical point of f if the Jacobian n × m matrix
df(x) has rank smaller than both m and n. If C is the set of critical points, then f(C) ⊂ N
is called the critical set of f. The volume measure on N is a choice of a volume form,
obtained for example after introducing a Riemannian metric. Sard's theorem is

Theorem: The critical set of f: M → N has zero volume measure in N.

The theorem applied to a smooth map f: M → R tells that for almost all c, the set f^{−1}(c)
is either a smooth hypersurface of M or empty. The latter can happen, for example, if f is
constant. We assumed C^∞, but one can relax the smoothness assumption on f. If n ≥ m, then
f needs only to be continuously differentiable. If n < m, then f needs to be in C^{m−n+1}. The
case when N is one-dimensional was covered by Anthony Morse (who is unrelated to Marston
Morse) in 1939 and by Arthur Sard in general in 1942. A bit confusing is that Marston Morse
(not Anthony) covered the cases m = 1, 2, 3 and Sard the cases m = 4, 5, 6 in unpublished papers
before, as mentioned in a footnote to [626]. Sard also notes already that examples of Hassler
Whitney show that the smoothness condition can not be relaxed. Sard formulated the results
for M = R^m and N = R^n (by the way with the same choice f: M → N as done here and not
as in many other places). The manifold case appears for example in [676].

126. Elliptic curves

An elliptic curve is a plane algebraic curve defined by the points satisfying the Weierstrass
equation y² = x³ + ax + b = f(x). One assumes the curve to be non-singular, meaning
that the discriminant ∆ = −16(4a³ + 27b²) is not zero. This assures that there are no
cusps or self-intersections: ∆ vanishes exactly if f(x) = 0 has a multiple root. A curve is an
Abelian variety if it carries an Abelian algebraic group structure, meaning that the addition
of a point defines a morphism of the variety.

Theorem: Elliptic curves are Abelian varieties.

¹ I presented the God number problem in the 1980s as an undergraduate in a logic seminar of
Ernst Specker, and the choice of topic had been objected to by Specker himself as "a too narrow
problem". But the Rubik cube and its group properties have "cult status". The object was one
of the triggers for me to study math.


The theorem seems to have first been realized by Henri Poincaré in 1901. Weierstrass had used
the Weierstrass ℘ function earlier in the case of elliptic curves over the complex plane. To
define the group addition, one uses the chord-tangent construction: first add a point O,
called the point at infinity, which serves as the zero in the group. Then define −P as the point
obtained by reflecting P at the x-axis. The group addition of two different points P, Q on the
curve is defined to be −R if R is the third point of intersection of the line through P, Q with
the curve. If P = Q, then R is defined to be the intersection of the tangent with the curve.
If there is no further intersection, that is if P = Q is an inflection point, then one defines
P + P = −P. Finally, define P + O = O + P = P and P + (−P) = O. This recipe can be explicitly
given in coordinates, allowing one to define the addition over any field of characteristic
different from 2 or 3. The group structure on elliptic curves over finite fields provides a rich
source of finite Abelian groups which can be used for cryptological purposes, the so called
elliptic curve cryptography ECC. Any procedure, like public key, Diffie-Hellman or factorization
attacks on integers, can be done using groups given by elliptic curves [727].
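The chord-tangent recipe in coordinates fits in a few lines. The Python sketch below (standard library only, Python 3.8 or later for modular inverses via the built-in pow) implements the usual affine formulas over a prime field and checks closure, commutativity, inverses and, on a sample, associativity. The curve y² = x³ + 2x + 3 over F_97 is a hypothetical toy example; real cryptographic use needs standardized curves, large fields and constant-time arithmetic, none of which is attempted here.

```python
from itertools import product

P_MOD, A, B = 97, 2, 3  # hypothetical example: y^2 = x^3 + 2x + 3 over the field F_97
O = None                # the point at infinity, the zero of the group

def on_curve(P):
    if P is O:
        return True
    x, y = P
    return (y * y - (x**3 + A * x + B)) % P_MOD == 0

def neg(P):
    """-P: reflection at the x-axis."""
    return O if P is O else (P[0], -P[1] % P_MOD)

def add(P, Q):
    """Chord-tangent addition using the standard affine formulas."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # vertical line, Q = -P
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD)  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)      # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD  # this is -R for the third intersection R
    return (x3, y3)

assert (4 * A**3 + 27 * B**2) % P_MOD != 0  # non-singular curve
points = [O] + [(x, y) for x, y in product(range(P_MOD), repeat=2) if on_curve((x, y))]
sample = points[:12]
assert all(on_curve(add(P, Q)) and add(P, Q) == add(Q, P) for P in points for Q in points)
assert all(add(add(P, Q), R) == add(P, add(Q, R)) for P in sample for Q in sample for R in sample)
assert all(add(P, neg(P)) is O for P in points)
print(len(points))  # number of points, including O
```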

127. Billiards

Billiards are the geodesic flow on a smooth compact n-manifold M with boundary. The dynamics
is extended through the boundary by applying the law of reflection. While the geodesic flow
X^t is Hamiltonian on the unit tangent bundle SM, the billiard flow is only piecewise smooth,
and also the return map to the boundary is not continuous in general, but it is a map
preserving a natural volume, so that one can look at ergodic theory. Already difficult are flat
2-manifolds M homeomorphic to a disc having convex boundary homeomorphic to a circle. For
smooth convex tables this leads to a return map T on the annulus X = T × [−1, 1] which is
C^{r−1} smooth if the boundary is C^r [195]. It defines a monotone twist map in the sense that
it preserves the boundary, is area and orientation preserving and satisfies the twist condition
that y → T(x, y) is strictly monotone. A Bunimovich stadium is the 2-manifold with boundary
obtained by taking the convex hull of two discs of equal radius in R² with different centers.
The billiard map is called chaotic if it is ergodic and the Kolmogorov-Sinai entropy is positive.
By Pesin theory, this metric entropy is the Lyapunov exponent, which is the exponential growth
rate of the Jacobian dT^n (and constant almost everywhere due to ergodicity). There are
coordinates in the tangent bundle of the annulus X in which dT is the composition of a
horizontal shear with strength L(x, y), where L is the trajectory length before the impact,
with a vertical shear with strength −2κ/sin(θ), where κ(x) is the curvature of the curve at
the impact x and y = cos(θ), with impact angle θ ∈ [0, π] between the tangent and the trajectory.

Theorem: The Bunimovich stadium billiard is chaotic.

Jacques Hadamard in 1898 and Emil Artin in 1924 already looked at the geodesic flow on
a surface of constant negative curvature. Yakov Sinai constructed in 1970 the first chaotic
billiards, the Lorentz gas or Sinai billiard. An example where Sinai's result applies is the
hypocycloid |x|^{2/3} + |y|^{2/3} = 1 (the astroid). The Bernoulli property was established by
Giovanni Gallavotti and Donald Ornstein in 1974. In 1973, Vladimir Lazutkin proved that a
generic smooth convex two-dimensional billiard cannot be ergodic due to the presence of KAM
whispering galleries, using Moser's twist map theorem. These galleries are absent in the
presence of flat points (by a theorem of John Mather) or points where the curvature is
unbounded (by a theorem of Andrea Hubacher [357]). Leonid Bunimovich [105] constructed in 1979
the first convex chaotic billiard. No smooth convex billiard table with positive Kolmogorov-Sinai
entropy is known. A candidate is the real analytic table x^4 + y^4 = 1. Various generalizations
have been considered, as in [753]. A detailed proof that the Bunimovich stadium is
measure-theoretically conjugate to a Bernoulli system (the shift on a product space) is
surprisingly difficult: one has to show positive Lyapunov exponents on a set of positive measure.
Applying Pesin theory with singularities (Katok-Strelcyn theory [391]) gives a Markov process.
One then needs to establish ergodicity using a method of Eberhard Hopf from 1936, which requires
understanding stable and unstable manifolds [131]. See [698, 726, 532, 274, 390, 131] for
sources on billiards.
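The shear description of dT can be tested on a single bounce. In this Python sketch (our reading of the coordinates described above, not a standard routine), each shear has determinant 1, so their product does too, which is the area preservation; the trace decides the local behaviour, with |trace| > 2 meaning hyperbolic stretching. For the unit circle, where κ = 1 and the chord length is L = 2 sin θ, the trace is 2 − 2κL/sin θ = −2 for every θ, consistent with the integrable round table having no exponential growth. Trace and determinant do not depend on the order of the two factors, so the assumed order is harmless here.

```python
import math


def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def billiard_jacobian(L, kappa, theta):
    """dT as the vertical shear (strength -2*kappa/sin(theta), the reflection)
    applied after the horizontal shear (strength L, the free flight)."""
    flight = [[1.0, L], [0.0, 1.0]]
    bounce = [[1.0, 0.0], [-2.0 * kappa / math.sin(theta), 1.0]]
    return matmul2(bounce, flight)


# Unit circle table: curvature 1 and chord length L = 2 sin(theta) between impacts.
for theta in (0.3, 1.0, 2.0):
    J = billiard_jacobian(2 * math.sin(theta), 1.0, theta)
    print(f"theta={theta}: det={det2(J):.12f}, trace={J[0][0] + J[1][1]:.12f}")
```

A Lyapunov exponent needs products of such matrices along a whole orbit; one factor only shows the local mechanism.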

128. Uniformization

A Riemann surface is a one-dimensional complex manifold. This means it is a connected
two-dimensional real manifold such that the transition functions of the atlas are holomorphic
mappings of the complex plane. It is simply connected if its fundamental group is trivial
(for a compact surface this is equivalent to the genus, or the first Betti number b1, being zero).
Two Riemann surfaces are conformally equivalent, or simply equivalent, if they are equivalent
as complex manifolds, that is, if there is a bijective holomorphic map f between them. A map
f : S → S′ is holomorphic if for every choice of local coordinates ϕ on S and ψ on S′, the maps
ψ ◦ f ◦ ϕ^{−1} are holomorphic. The curvature is the Gaussian curvature of the surface, taken
with respect to a Riemannian metric compatible with the complex structure. The uniformization
theorem is:

Theorem: A Riemann surface is equivalent to one with constant curvature.

This is a "geometrization statement" and means that the universal cover of every Riemann
surface is conformally equivalent to either the Riemann sphere (positive curvature), the complex
plane (zero curvature) or the unit disk (negative curvature). It implies that any region G ⊂ C
whose complement contains two or more points has the disk as its universal cover. In
particular, it implies the Riemann mapping theorem, assuring that any region U ≠ C of the plane
homeomorphic to a disk is conformally equivalent to the unit disk (see [117]). For a detailed
treatment of compact Riemann surfaces, see [266]. It also follows that all Riemann surfaces
(without restriction of genus) can be obtained as quotients of these three spaces: for the sphere
one does not have to take any quotient, the genus 1 surfaces (elliptic curves) can be obtained as
quotients of the complex plane, and any genus g > 1 surface can be obtained as a quotient of the
unit disk. Since every closed orientable 2-dimensional surface is characterized by its genus g,
the uniformization theorem implies that any such surface admits a metric of constant curvature.
Teichmüller theory parametrizes the possible complex structures: there are 3g − 3 complex
parameters for g ≥ 2, whereas for g = 0 there is only one structure and for g = 1 the moduli
space is H/SL_2(Z). In higher dimensions, closest to the uniformization theorem is the
Killing-Hopf theorem, telling that every connected complete Riemannian manifold of constant
sectional curvature and dimension n is isometric to a quotient of the sphere S^n, Euclidean
space R^n or hyperbolic n-space H^n, restating that constant curvature geometry is either
elliptic, parabolic (Euclidean) or hyperbolic geometry. Complex analysis has rich applications
in complex dynamics [49, 520, 117] and relates to much more geometry [509].
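The genus 1 moduli space H/SL_2(Z) can be explored concretely: the torus C/(Z + τZ) with Im τ > 0 depends, up to conformal equivalence, only on the SL_2(Z)-orbit of τ, and each orbit meets the standard fundamental domain |Re τ| ≤ 1/2, |τ| ≥ 1. The Python sketch below uses the classical reduction algorithm (alternating translations τ → τ − n and the inversion τ → −1/τ); it is an illustration added here, not a construction from the text, and it ignores the usual identifications on the boundary of the domain.

```python
import math


def reduce_modular(tau, max_steps=10_000):
    """Move tau (Im tau > 0) into the standard fundamental domain of SL_2(Z).

    Returns the reduced point and an integer matrix (a, b, c, d) with
    a*d - b*c = 1 such that (a*tau0 + b) / (c*tau0 + d) is the reduced point.
    """
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half-plane")
    a, b, c, d = 1, 0, 0, 1  # accumulated Moebius transformation
    for _ in range(max_steps):
        n = math.floor(tau.real + 0.5)
        tau -= n                      # translation tau -> tau - n
        a, b = a - n * c, b - n * d
        if abs(tau) < 1:
            tau = -1 / tau            # inversion tau -> -1/tau (raises Im tau)
            a, b, c, d = -c, -d, a, b
        else:
            return tau, (a, b, c, d)
    raise RuntimeError("reduction did not terminate")


tau0 = 0.3 + 0.01j
tau, (a, b, c, d) = reduce_modular(tau0)
print(tau, (a, b, c, d))
```

Each inversion strictly increases Im τ, which is why the loop terminates.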

129. Control Theory

A Kalman filter is an algorithm for the optimal estimation of the state of a linear dynamical
system from a series of possibly noisy measurements. The idea is similar to that of a dynamic
Bayesian network or hidden Markov model. The filter applies both to differential equations
ẋ(t) = Ax(t) + Bu(t) + Gz(t) and to discrete dynamical systems x(t + 1) = Ax(t) + Bu(t) + Gz(t),
where u(t) is an external input and z(t) is input noise given by independent identically
distributed, usually Gaussian, random variables. Kalman calls this a Wiener problem. One does
not see the state x(t) of the system but only some output y(t) = Cx(t) + Du(t). The filter then
"filters out" or "learns" the best estimate x∗(t) from the observed data y(t). The linear space X
is defined as the vector space spanned by the already observed vectors. The optimal solution is
given by a sophisticated dynamical data fitting.

Theorem: The optimal estimate x∗ is the projection of y onto X .

This formulation is the informal one-sentence description which can already be found in
Kalman's article. Kalman then gives explicit formulas which generate from the stochastic
difference equation a concrete deterministic linear system. For a modern exposition, see [489].
The Kalman filter is named after Rudolf Kalman, who wrote [385] in 1960. Kalman's paper is
one of the most cited papers in applied mathematics. The ideas were used both in the Apollo
and the Space Shuttle programs. Similar ideas had been introduced in statistics by the Danish
astronomer Thorvald Thiele and by the radar theoretician Peter Swerling. There are also
nonlinear versions of the Kalman filter, used in nonlinear state estimation such as navigation
systems and GPS. The nonlinear version uses a multivariate Taylor series expansion to linearise
about a working point. See [225, 489].
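To see the recursion at work, here is a minimal scalar Python sketch in the standard textbook predict/update form, not in Kalman's original notation. Assumptions: the input u(t) is omitted, the process noise has variance q, and, unlike the output equation above, the measurement y(t) = c x(t) + v(t) carries its own noise v(t) of variance r.

```python
def kalman_filter_1d(ys, a, c, q, r, x0, p0):
    """Scalar Kalman filter for the assumed model
        x(t+1) = a x(t) + w(t),  Var w = q   (process noise)
        y(t)   = c x(t) + v(t),  Var v = r   (measurement noise),
    with prior mean x0 and prior variance p0 for the first state.
    Returns the filtered estimate after each measurement."""
    x, p = x0, p0
    estimates = []
    for y in ys:
        # Update: weigh the prediction against the new measurement.
        gain = p * c / (c * c * p + r)
        x = x + gain * (y - c * x)
        p = p * r / (c * c * p + r)   # equals (1 - gain * c) * p, without cancellation
        estimates.append(x)
        # Predict: propagate mean and variance to the next time step.
        x = a * x
        p = a * a * p + q
    return estimates


print(kalman_filter_1d([1.0, 2.0, 3.0, 4.0], a=1.0, c=1.0, q=0.0, r=1.0, x0=0.0, p0=1e9))
```

With a vague prior (large p0), a = 1 and q = 0, the filter estimates a constant, and the estimates reduce, up to rounding, to the running means 1, 1.5, 2, 2.5 of the readings. That is the least-squares fit, in line with describing the optimal estimate as a projection.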

130. Zariski main theorem

A variety is called normal if it can be covered by open affine varieties whose rings of functions
are normal. A commutative ring is called normal if it has no non-zero nilpotent elements and
is integrally closed in its total ring of fractions. For a curve, a one-dimensional variety,
normality is equivalent to being non-singular, but in higher dimensions a normal variety can
still have singularities. A complex variety X is called unibranch at a point x ∈ X if there are
arbitrarily small neighborhoods U of x such that the set of non-singular points of U is
connected. Zariski's main theorem can be stated as:

Theorem: Any closed point of a normal complex variety is unibranch.

Oscar Zariski proved the theorem in 1943. To cite [533], it was "the final result in a
foundational analysis of birational maps between varieties. The 'main Theorem' asserts in a
strong sense that the normalization (the integral closure) of a variety X is the maximal variety
X′ birational over X such that the fibres of the map X′ → X are finite. A generalization of this
fact became Alexandre Grothendieck's concept of the 'Stein factorization' of a map". The result
has been generalized to schemes: a scheme X is called unibranch at a point x if the local ring
at x is unibranch. A generalization is the Zariski connectedness theorem from 1957: if
f : X → Y is a birational projective morphism between Noetherian integral schemes, then the
inverse image of every normal point of Y is connected. Put more colloquially, the fibres of a
birational morphism from a projective variety X to a normal variety Y are connected. It implies
that a birational morphism f : X → Y of algebraic varieties X, Y is an open embedding into a
neighbourhood of a normal point y if f^{−1}(y) is a finite set. In particular, a bijective
birational morphism between normal varieties is an isomorphism. [321, 533]

131. Poincaré's last theorem

A homeomorphism T of an annulus X = T × [0, 1] is called measure preserving if it
preserves the Lebesgue (area) measure and the orientation of X. As a homeomorphism, it
also induces homeomorphisms on each of the two boundary circles. It is called a twist
homeomorphism if it rotates the two boundary circles in different directions.

Theorem: A measure-preserving twist map on an annulus has at least two fixed points.

This is called the Poincaré-Birkhoff theorem or Poincaré's last theorem. It was stated by
Henri Poincaré in 1912 in the context of the three body problem. Poincaré already gave an
index argument showing that the existence of one fixed point gives a second. The existence of
the first was proven by George Birkhoff in 1913, and in 1925 Birkhoff added the precise argument
for the existence of the second. The twist condition is necessary because the rotation of the
annulus (r, θ) → (r, θ + 1) has no fixed point. Also area-preservation is necessary, as the
example (r, θ) → (r(2 − r), θ + 2r − 1) shows. [68, 99]
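The last example can be checked directly. Fixed points would need r(2 − r) = r, so r ∈ {0, 1}, where the angle changes by ∓1, which is not a multiple of 2π. In the Python sketch below (angles taken mod 2π, an assumption about the convention), the Jacobian is triangular with determinant 2 − 2r, so the map changes area except on the circle r = 1/2; the boundary circles turn in opposite directions, so the twist condition holds; and a grid search illustrates that every point is moved by a definite amount.

```python
import math

TWO_PI = 2 * math.pi


def example_map(r, theta):
    """(r, theta) -> (r(2 - r), theta + 2r - 1) on [0, 1] x (R mod 2*pi)."""
    return r * (2 - r), (theta + 2 * r - 1) % TWO_PI


def jacobian_det(r):
    """r' depends only on r and d(theta')/d(theta) = 1, so the Jacobian is
    triangular and its determinant is dr'/dr = 2 - 2r."""
    return 2 - 2 * r


def displacement(r, theta):
    """Larger of the radial and the angular distance between a point and its image."""
    r2, t2 = example_map(r, theta)
    dt = abs(t2 - theta) % TWO_PI
    return max(abs(r2 - r), min(dt, TWO_PI - dt))


# Twist: the boundary circles r = 0 and r = 1 are rotated by -1 and +1.
print(example_map(0.0, 0.0), example_map(1.0, 0.0))
# Area is not preserved away from r = 1/2:
print([jacobian_det(r) for r in (0.0, 0.25, 0.5, 0.75, 1.0)])
# A grid search finds no (near-)fixed point:
grid = [(i / 200, j * TWO_PI / 200) for i in range(201) for j in range(200)]
print(min(displacement(r, t) for r, t in grid))
```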

132. Geometrization

A closed manifold M is a smooth compact manifold without boundary. A closed manifold
is simply connected if it is connected and its fundamental group is trivial, meaning that
every closed loop in M can be pulled together to a point within M: if r : T → M is a
parametrization of a closed path in M, then there exists a continuous map R : T × [0, 1] → M
such that R(t, 0) = r(t) and R(t, 1) = r(0). We say that M is a 3-sphere if M is homeomorphic
to the 3-dimensional unit sphere {(x1, x2, x3, x4) ∈ R^4 | x1^2 + x2^2 + x3^2 + x4^2 = 1}.

Theorem: A closed simply connected 3-manifold is a 3-sphere.

Henri Poincaré conjectured this in 1904. It remained the Poincaré conjecture until its proof
by Grigori Perelman in 2006 [527]. A d-homotopy sphere is a closed d-manifold that is
homotopy equivalent to the d-sphere. (A manifold M is homotopy equivalent to a manifold N if
there exist continuous maps f : M → N and g : N → M such that the composition
g ◦ f : M → M is homotopic to the identity map on M, meaning that there exists a continuous
map F : M × [0, 1] → M such that F(x, 0) = g(f(x)) and F(x, 1) = x, and such that the map
f ◦ g : N → N is homotopic to the identity on N.) In higher dimensions, the statement that every
d-homotopy sphere is homeomorphic to the d-sphere was known as the generalized Poincaré
conjecture; the case d > 4 was proven by Stephen Smale in 1961 and the case d = 4 by Michael
Freedman in 1982. The Poincaré conjecture itself, the case d = 3, was proven using a theory
built by Richard Hamilton, who suggested using the Ricci flow to solve the conjecture and, more
generally, the geometrization conjecture of William Thurston: every closed 3-manifold can be
decomposed into prime manifolds which are of 8 types, the so-called Thurston geometries
S^3, E^3, H^3, S^2 × R, H^2 × R, the universal cover of SL(2, R), Nil and Solv.
If the statement "M is a sphere" is replaced by "M is diffeomorphic to a sphere", one has
the smooth Poincaré conjecture. Perelman's proof verifies this also in dimension d = 3.
The smooth Poincaré conjecture fails in dimension d = 7 and in many higher dimensions, as
d-spheres can then admit non-standard smooth structures, the so-called exotic spheres, first
constructed by John Milnor. For d = 5 it is true, following a result of Dennis Barden from 1964.
It is also true for d = 6. For d = 4, the smooth Poincaré conjecture is open and has been called
"the last man standing among all great problems of classical geometric topology" [480]. See
[528] for details on Perelman's proof.

133. Steinitz theorem

A non-empty finite simple connected graph G is called planar if it can be embedded in the
plane R^2 without self-crossings. The abstract edges of the graph are then realized as actual
curves in the plane connecting two vertices, which are realized as actual points in the plane.
The embedding of G subdivides the plane into a finite collection F of simply connected
regions called faces. (In the two-dimensional plane, a region is simply connected if it is
homeomorphic to a disc.) Let v = |V| be the number of vertices, e = |E| the number of edges
and f = |F| the number of faces. A planar graph is called polyhedral if it can be realized as
a convex polyhedron, the convex hull of finitely many points in R^3. A graph is called
3-connected if it remains connected after removing any one or two of its vertices. A
connected, planar 3-connected graph is also called a 3-polyhedral graph. The polyhedron
formula of Euler combined with Steinitz's theorem gives:

Theorem: G planar ⇒ v − e + f = 2. Planar 3-connected ⇔ polyhedral.

The Euler polyhedron formula was first noticed in examples by René Descartes [4] and
written down in a secret notebook. It was realized by Euler in 1750 that the formula works
for general planar graphs. Euler gave an induction proof in 1752, but the first complete proof
appears to have been given by Legendre in 1794. The Steinitz theorem was proven by Ernst
Steinitz in 1922, even though he had obtained the result already in 1916. In general, a
planar graph always defines a finite generalized CW complex in which the faces are the 2-cells,
the edges are the 1-cells and the vertices are the 0-cells. The embedding in the plane then
defines a geometric realization of this combinatorial structure as a topological 2-sphere (as the
2-sphere is the one-point compactification of the plane). The structure is not required to be
achievable in the form of a convex polyhedron, and in general it is not: take for example a
tree, a connected graph without closed loops. It is planar but not even 2-connected. The
number of vertices v and the number of edges e satisfy v − e = 1. After embedding the tree in
the plane, we have exactly one face, so that f = 1. The Euler polyhedron formula v − e + f = 2
is verified, but the graph is far from polyhedral. Even in the extreme case where G is a
one-point graph, the Euler formula holds: in that case there are v = 1 vertices, e = 0 edges
and f = 1 faces (given by the complement of the point in the plane), so that still
v − e + f = 2 holds. The 3-connectedness assures that the realization can be done using convex
polyhedra. It is then even possible to force the vertices of the polyhedron to be on integer
lattice points [298, 771]. In [298], it is stated that the Steinitz theorem is "the most
important and deepest known result for 3-polytopes".
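Counting faces does not require a drawing: an embedding is determined combinatorially by a rotation system, the counterclockwise order of the neighbours around each vertex, and the faces are the orbits of a permutation of directed edges. The Python sketch below uses this standard technique (not described in the text) to verify v − e + f = 2 on small hand-made embeddings, together with a brute-force test of 3-connectedness.

```python
from itertools import combinations


def count_faces(rotation):
    """Count the faces of a graph embedding given by a rotation system.

    rotation[v] lists the neighbours of v in counterclockwise order around v.
    Faces are the orbits of the permutation (u, v) -> (v, w) on directed
    edges ("darts"), where w is the neighbour following u around v.
    """
    darts = [(u, v) for u in rotation for v in rotation[u]]
    seen, faces = set(), 0
    for start in darts:
        if start in seen:
            continue
        faces += 1
        dart = start
        while dart not in seen:
            seen.add(dart)
            u, v = dart
            around = rotation[v]
            dart = (v, around[(around.index(u) + 1) % len(around)])
    return faces


def connected_without(adj, removed):
    """Is the graph still connected after deleting the vertices in `removed`?"""
    rest = [v for v in adj if v not in removed]
    if not rest:
        return True
    reached, stack = {rest[0]}, [rest[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in removed and w not in reached:
                reached.add(w)
                stack.append(w)
    return len(reached) == len(rest)


def is_3_connected(adj):
    """At least 4 vertices and connected after removing any 0, 1 or 2 vertices."""
    return len(adj) >= 4 and all(
        connected_without(adj, set(S)) for k in range(3) for S in combinations(adj, k))


# K4 drawn as a triangle 0, 1, 2 with vertex 3 inside (hypothetical drawing).
k4 = {0: [1, 3, 2], 1: [2, 3, 0], 2: [0, 3, 1], 3: [2, 0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}                     # a tree
square = {0: [1, 3], 1: [2, 0], 2: [3, 1], 3: [0, 2]}  # a 4-cycle

for name, g in (("K4", k4), ("path", path), ("square", square)):
    v = len(g)
    e = sum(len(nbrs) for nbrs in g.values()) // 2
    f = count_faces(g)
    print(name, "v - e + f =", v - e + f, "| 3-connected:", is_3_connected(g))
```

Only K4 passes the 3-connectedness test; it is the graph of the tetrahedron, as Steinitz's theorem predicts.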

134. Hilbert-Einstein action

Let (M, g) be a smooth 4-dimensional Lorentzian manifold which is asymptotically flat.
(A simplification would be to assume that the Riemann curvature tensor R vanishes outside a
compact subset of M, but this is a bit restrictive, as the Schwarzschild solution below indicates.)
A Lorentzian manifold is a 4-dimensional pseudo-Riemannian manifold of signature (1, 3), which
in the flat case is dx^2 + dy^2 + dz^2 − dt^2. The technical condition of asymptotic flatness
should imply that the scalar curvature R is in L^1(M, dµ), where dµ is the volume form (which is
the case if the non-flat part is compact). One can now look at the variational problem to find
extrema of the functional g → ∫_M R dµ. More generally, one can add a Lagrangian L and
consider the Hilbert-Einstein functional ∫_M (R/κ + L) dµ, where κ = 8πG/c^4 is the Einstein
constant. Let R_ij be the Ricci tensor, a symmetric tensor, R the scalar curvature and T_ij the
energy-momentum tensor. The Einstein field equations are

Theorem: G_ij = R_ij − g_ij R/2 = κ T_ij.

These are the Euler-Lagrange equations of an infinite-dimensional extremization problem. The
variational problem was proposed by David Hilbert in 1915. Einstein published the general theory
of relativity in the same year. In the case of a vacuum, T = 0, and solutions g of the Einstein
equations define Einstein manifolds (M, g). An example of a solution to the vacuum Einstein
equations different from the flat space solution is the Schwarzschild solution, which was also
found in 1915 and published in 1916. It is the metric given in spherical coordinates as
−(1 − r/ρ)c^2 dt^2 + (1 − r/ρ)^{−1} dρ^2 + ρ^2 dϕ^2 + ρ^2 sin^2(ϕ) dθ^2, where r is the
Schwarzschild radius, ρ the distance to the singularity, and θ, ϕ are the standard spherical
angles (longitude and colatitude) as in calculus. The metric solves the Einstein equations for
ρ > r. The flat metric −c^2 dt^2 + dρ^2 + ρ^2 dϕ^2 + ρ^2 sin^2(ϕ) dθ^2 describes the vacuum,
and the Schwarzschild solution describes the gravitational field near a massive body.
Intuitively, the metric tensor g is determined by g(v, v), and the Ricci tensor by R(v, v), which
is 3 times the average sectional curvature over all planes containing v. The scalar curvature is
12 times the average over all sectional curvatures of planes through a point. See [165, 137].
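For a sense of scale, the sketch below evaluates the Einstein constant κ = 8πG/c^4 and the Schwarzschild radius, using the standard relation r = 2GM/c^2 for a body of mass M (a relation not stated above). The rounded SI values of G, c and the solar mass are inputs assumed here; for the Sun the radius comes out at roughly 3 km.

```python
import math

# Rounded SI constants (values assumed here for illustration).
G = 6.674e-11      # gravitational constant [m^3 kg^-1 s^-2]
C = 2.998e8        # speed of light [m s^-1]
M_SUN = 1.989e30   # solar mass [kg]

kappa = 8 * math.pi * G / C**4        # Einstein constant [s^2 m^-1 kg^-1]
r_sun = 2 * G * M_SUN / C**2          # Schwarzschild radius of the Sun [m]


def g_tt(rho, r):
    """Coefficient -(1 - r/rho) c^2 of dt^2 in the Schwarzschild metric."""
    return -(1 - r / rho) * C**2


print(f"kappa = {kappa:.3e}, Schwarzschild radius of the Sun = {r_sun:.0f} m")
# Far from the body the coefficient approaches the flat value -c^2.
print(g_tt(1e15, r_sun) / -C**2)
```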

135. Hall stable marriage

Let X be a finite set and 𝒜 a family of subsets A of X. A transversal of 𝒜 is an injective
function f : 𝒜 → X such that f(A) ∈ A for all A ∈ 𝒜. The family 𝒜 satisfies the marriage
condition if for every finite subfamily ℬ of 𝒜, one has |ℬ| ≤ |⋃_{A∈ℬ} A|. The Hall marriage
theorem is

Theorem: 𝒜 has a transversal ⇔ 𝒜 satisfies the marriage condition.

The theorem was proven by Philip Hall in 1935. It implies for example that if a deck of 52
cards is partitioned into 13 piles of 4 cards each, one can choose a card from each pile so that
the 13 chosen cards contain exactly one card of each rank. The theorem can be deduced from a
result in graph theory: if G = (V, E) is a bipartite graph with vertex set V = X ∪ Y (all edges
joining X to Y), then a matching in G is a collection of edges which pairwise have no common
vertex. For a subset W of X, let S(W) denote the set of all vertices adjacent to some element
in W. The theorem assures that there is an X-saturating matching (a matching that covers X)
if and only if |W| ≤ |S(W)| for every W ⊂ X. The reason for the name "marriage" is the
situation that X is a set of men and Y a set of women and that all men are eager to marry.
Let A_i be the set of women who could make a spouse for the i'th man; then marrying everybody
off is an X-saturating matching. The condition is that any set of k men has a combined list of
at least k women who would make suitable spouses. See [100].
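For small families both sides of Hall's theorem can be tested by computer. This Python sketch checks the marriage condition by brute force over all subfamilies (exponential, so only for tiny examples) and searches for a transversal with a standard augmenting-path method for bipartite matchings; the two example families are made up for illustration.

```python
from itertools import combinations


def satisfies_marriage_condition(family):
    """Brute-force check of |B| <= |union of B| for every non-empty subfamily B."""
    return all(len(set().union(*sub)) >= k
               for k in range(1, len(family) + 1)
               for sub in combinations(family, k))


def find_transversal(family):
    """Return a list of distinct representatives (one element per set) or None."""
    owner = {}  # element -> index of the set it currently represents

    def augment(i, visited):
        for x in family[i]:
            if x in visited:
                continue
            visited.add(x)
            # take x if it is free, or if its current owner can switch to another element
            if x not in owner or augment(owner[x], visited):
                owner[x] = i
                return True
        return False

    for i in range(len(family)):
        if not augment(i, set()):
            return None
    reps = [None] * len(family)
    for x, i in owner.items():
        reps[i] = x
    return reps


good = [{1, 2}, {2, 3}, {1, 3}]
bad = [{1}, {1}, {2, 3}]   # two sets compete for the single element 1
for fam in (good, bad):
    print(satisfies_marriage_condition(fam), find_transversal(fam))
```

In line with the theorem, the search succeeds exactly for the family satisfying the marriage condition.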

136. Mandelbulb

The Mandelbrot set M = M_{2,2} is the set of vectors c ∈ R^2 for which T(x) = x^2 + c leads
to a bounded orbit starting at 0 = (0, 0), where x^2 has the polar coordinates (r^2, 2θ) if x
has the polar coordinates (r, θ). (The map T is just a real reformulation of the complex map
T(z) = z^2 + c in C, written in real coordinates so that the construction can be done in
arbitrary dimensions.) The Mandelbulb set M_{3,8} is defined as the set of vectors c ∈ R^3 for
which T(x) = x^8 + c leads to a bounded orbit starting at 0 = (0, 0, 0), where x^8 has the
spherical coordinates (ρ^8, 8ϕ, 8θ) if x has the spherical coordinates (ρ, ϕ, θ). Like the
Mandelbrot set, it is a compact set (just verify that for |x| > 2, the orbits go to infinity). The
topology of M_{3,8} is unexplored. Also, as in the complex plane, one could look at the dynamics
of a polynomial p = a_0 + a_1 x + · · · + a_r x^r in R^n. If (ρ, ϕ_1, . . . , ϕ_{n−1}) are spherical
coordinates, then x → x^m = (ρ^m, mϕ_1, . . . , mϕ_{n−1}) is a "higher dimensional power" and
allows one to look at the dynamics of T_{n,p}(x) = p(x). This then defines a corresponding
Mandelbulb M_{n,p}. As with all celebrities, there is a scandal:

Theorem: There is no theorem about the Mandelbulb M_{n,m} for n > 2.

Except of course the just stated theorem. But you decide whether it is true or not. The
Mandelbulb set was discovered only recently. An attempt to trace some of its history was
made in [415]: Rudy Rucker had already experimented with a variant of M_{3,2} in 1988. Jules
Ruis wrote to me that he had written a "computer program in Basic" in 1997. The first person
we know who wrote down the formulas used today is Daniel White, mentioned in a 2009 fractal
forum. Jules Ruis 3D printed the first models in 2010. See also [86] for some information on
generating the graphics.
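Membership in M_{3,8} can only be approximated numerically, by iterating for a while and checking whether the orbit leaves the ball |x| ≤ 2. The Python sketch below assumes one common convention, ϕ the colatitude and θ the longitude, so that x = ρ(sin ϕ cos θ, sin ϕ sin θ, cos ϕ); other conventions give differently shaped sets, and the iteration count is an arbitrary cutoff. A False answer relies on the escape criterion above, while True only means that the orbit stayed bounded so far.

```python
import math


def bulb_power(v, m):
    """v^m in the sense above: (rho, phi, theta) -> (rho^m, m*phi, m*theta).

    Convention assumed here: phi is the colatitude and theta the longitude,
    v = rho * (sin(phi) cos(theta), sin(phi) sin(theta), cos(phi)).
    """
    x, y, z = v
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        return (0.0, 0.0, 0.0)
    phi = math.acos(max(-1.0, min(1.0, z / rho)))
    theta = math.atan2(y, x)
    r = rho ** m
    return (r * math.sin(m * phi) * math.cos(m * theta),
            r * math.sin(m * phi) * math.sin(m * theta),
            r * math.cos(m * phi))


def in_mandelbulb(c, m=8, max_iter=60, radius=2.0):
    """Heuristic membership test: iterate x -> x^m + c from 0 and watch |x|."""
    v = (0.0, 0.0, 0.0)
    for _ in range(max_iter):
        p = bulb_power(v, m)
        v = (p[0] + c[0], p[1] + c[1], p[2] + c[2])
        if math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2) > radius:
            return False   # escaped: c is not in the set
    return True            # bounded so far: c is (probably) in the set


for c in [(0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 1.2), (1.5, 0.0, 0.0)]:
    print(c, in_mandelbulb(c))
```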

137. Banach Alaoglu

A Banach space X is a linear space equipped with a norm | · | defining a metric
d(x, y) = |x − y| with respect to which the space X is complete. The unit ball in X is the closed
ball {x ∈ X | |x| ≤ 1}. The dual space X∗ of X is the linear space of continuous linear
functionals f : X → R with the norm |f| = sup_{x∈X, |x|≤1} |f(x)|. It is again a Banach space.
The weak* topology is the smallest topology on X∗ which makes all maps f → f(x) continuous
for all x ∈ X.

Theorem: The unit ball in a dual Banach space X ∗ is weak* compact.

The theorem was proven in 1932 in the separable case by Stefan Banach and in 1940 in general
by Leonidas Alaoglu. The result essentially follows from Tychonoff's theorem, as the unit ball
of X∗ can be seen as a closed subset of a product of compact intervals. Banach-Alaoglu therefore
relies on (a weak form of) the axiom of choice. A case which often appears in applications is
when X = C(K) is the space of continuous functions on a compact Hausdorff space K. In that
case X∗ is the space of signed (Radon) measures on K. One implication is that the set of
probability measures on K is weak* compact. Other examples are the L^p spaces (p ∈ [1, ∞)),
for which the dual is L^q with 1/p + 1/q = 1 (meaning q = ∞ for p = 1), showing that for
p = 2, the Hilbert space L^2 is self-dual. In the work of Bourbaki the theorem was extended
from Banach spaces to locally convex spaces (linear spaces equipped with a family of
semi-norms). Examples are Fréchet spaces (locally convex spaces which are complete with
respect to a translation-invariant metric). See [151].

138. Whitney trick

Let M be a smooth orientable simply connected d-manifold and K, L two smooth connected
submanifolds of dimensions k and l with k + l = d, which have the property that K and L
intersect transversely in points x, y (in the sense that the tangent spaces of K and L at the
intersection points span T_x M and T_y M) and that these two points have opposite
intersection signs. The two manifolds K, L can be isotoped away from each other along a disc
if there exists a smooth 2-disk D embedded in M whose boundary consists of an arc in K and an
arc in L, both joining x and y, and whose interior is disjoint from K and L. The disk is called
a Whitney disk. The Whitney trick or Whitney lemma is:

Theorem: Transverse K, L as above, both of dimension ≥ 3, admit a Whitney disk.

See [192]. In [454] there are counterexamples in d ≤ 4. The author writes there: "A hypothesis
of algebraic topology given by the signs of the intersection points leads to the existence of
an isotopy". The failure of the Whitney trick in smaller dimensions is one reason why some
questions in manifold theory appear hardest in three or four dimensions. There is a variant
of the Whitney trick which works also in dimension 5, where K has dimension 2 and L has
dimension 3.

139. Torsion groups

An elliptic curve E over Q is also called a rational elliptic curve. The curve E carries
an Abelian group structure in which every translation x → x + y by a point y is a morphism.
The torsion subgroup of E is the subgroup of the group E(Q) of rational points consisting of
the elements of finite order. The Mordell-Weil theorem (which applies more generally to any
Abelian variety) assures that E(Q) ≅ Z^r ⊕ T, where T is a finite group and r is a
non-negative integer called the rank of E. Mazur's torsion theorem states that the only
possible orders of torsion points in E(Q) are 1, 2, 3, . . . , 9, 10 and 12. Only 15 different
torsion subgroups appear in rational elliptic curves: Z_1, . . . , Z_10, Z_12 or Z_2 × Z_2,
Z_2 × Z_4, Z_2 × Z_6 and Z_2 × Z_8. Let us call this collection of groups the Mazur class.
The theorem is:

Theorem: The torsion group of a rational elliptic curve is in the Mazur class.

The theorem was proven by Barry Mazur in 1977. [648].
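Mazur's theorem turns torsion into a finite computation: a rational point P is torsion exactly when kP = O for some k ≤ 12. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) for the group law on y^2 = x^3 + ax + b; the constant b does not enter the addition formulas. The example curves, y^2 = x^3 + 1 with the point (2, 3) of order 6 and y^2 = x^3 − 2 with the point (3, 5) of infinite order, are illustrations chosen here.

```python
from fractions import Fraction


def add_points(P, Q, a):
    """Group law on y^2 = x^3 + a x + b over Q; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and y1 == -y2:
        return None                                # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) / (2 * y1)         # tangent slope
    else:
        lam = (y2 - y1) / (x2 - x1)                # chord slope
    x3 = lam * lam - x1 - x2
    return (x3, lam * (x1 - x3) - y1)


def torsion_order(P, a):
    """Order of the rational point P if it is torsion, else None.

    Relies on Mazur's theorem: a torsion point has order at most 12.
    """
    P = (Fraction(P[0]), Fraction(P[1]))
    Q = None
    for k in range(1, 13):
        Q = add_points(Q, P, a)
        if Q is None:
            return k
    return None


print(torsion_order((2, 3), a=0))    # on y^2 = x^3 + 1: prints 6
print(torsion_order((3, 5), a=0))    # on y^2 = x^3 - 2: prints None
```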

140. Coloring

A graph G = (V, E) with vertex set V and edge set E is called planar if it can be embedded in
the Euclidean plane R^2 without any of the edges intersecting. By a theorem of Kuratowski, this
is equivalent to a graph theoretical statement: G contains a homeomorphic image (a subdivision)
of neither the complete graph K_5 nor the complete bipartite utility graph K_{3,3}. A graph
coloring with k colors is a function f : V → {1, 2, . . . , k} with the property that if
(x, y) ∈ E, then f(x) ≠ f(y). In other words, adjacent vertices must have different colors.
The 4-color theorem is:

Theorem: Every planar graph can be colored with 4 colors.

Some graphs need 4 colors, like a wheel graph with an odd number of spikes. There are
planar graphs which need fewer. The 1-point graph K_1 needs only one color, trees with at
least one edge need only 2 colors, and the graph K_3 or any wheel graph with an even number of
spikes needs only 3 colors. The theorem has an interesting history: since August Ferdinand
Möbius in 1840 spread a precursor problem given to him by Benjamin Gotthold Weiske, the problem
was first also known as the Möbius-Weiske puzzle [660]. The actual problem was first posed in
1852 by Francis Guthrie [490], after thinking about it with his brother Frederick, who
communicated it to his teacher Augustus de Morgan (a former teacher of Francis), who told
William Hamilton about it. Arthur Cayley first put it in print in 1878 (but still not in the
language of graph theory). Alfred Kempe published a proof in 1879, but a gap was noticed by
Percy John Heawood 11 years later, in 1890. There were other unsuccessful attempts, like one by
Peter Tait in 1880. After considerable theoretical work by various mathematicians including
Charles Peirce, George Birkhoff, Oswald Veblen, Philip Franklin, Hassler Whitney, Hugo Hadwiger,
Leonard Brooks, William Tutte, Yoshio Shimamoto, Heinrich Heesch, Karl Dürre and Walter
Stromquist, a computer-assisted proof of the 4-color theorem was obtained by Ken Appel
and Wolfgang Haken in 1976. In 1997, Neil Robertson, Daniel Sanders, Paul Seymour and
Robin Thomas wrote a new computer program. Georges Gonthier produced in 2004 a fully
machine-checked proof of the four-color theorem [751]. There is a considerable literature, like
[552, 64, 249, 621, 128, 751].
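The color counts quoted for small graphs can be confirmed by exhaustive search. This Python sketch computes the chromatic number by backtracking (exponential in general, fine for tiny graphs); it is a brute-force check and has nothing to do with the Appel-Haken method. The helper `wheel(m)`, introduced here, builds a hub joined to a rim cycle with m spikes.

```python
def k_coloring(n, edges, k):
    """Return a proper coloring of vertices 0..n-1 with colors 1..k, or None."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colors = [0] * n  # 0 means "not colored yet"

    def extend(i):
        if i == n:
            return True
        for col in range(1, k + 1):
            if all(colors[j] != col for j in adj[i]):
                colors[i] = col
                if extend(i + 1):
                    return True
        colors[i] = 0  # backtrack
        return False

    return colors[:] if extend(0) else None


def chromatic_number(n, edges):
    return next(k for k in range(1, n + 1) if k_coloring(n, edges, k) is not None)


def wheel(m):
    """Hub 0 joined to every vertex of the rim cycle 1, ..., m (m spikes)."""
    spokes = [(0, i) for i in range(1, m + 1)]
    rim = [(i, i % m + 1) for i in range(1, m + 1)]
    return m + 1, spokes + rim


for m in (5, 6):
    print(f"wheel with {m} spikes needs {chromatic_number(*wheel(m))} colors")
print("K1:", chromatic_number(1, []), "| path on 4 vertices:",
      chromatic_number(4, [(0, 1), (1, 2), (2, 3)]))
```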

141. Contact Geometry

Assume M is a smooth compact orientable (2n + 1)-manifold equipped with an auxiliary
Riemannian metric g. A nowhere vanishing 1-form α ∈ Λ^1(M) defines a field of hyperplanes
ξ = ker(α) ⊂ TM. Conversely, given a field of hyperplanes, one can define α = g(X, ·), where X
is a local non-zero section of the line bundle ξ^⊥. A contact structure is a hyperplane field
ξ = ker(α) for which the form α ∧ (dα)^n is nowhere zero, so that it is a volume form. The
1-form α is then called a contact form and (M, ξ) is called a contact manifold. The Reeb vector
field R is defined by dα(R, ·) = 0, α(R) = 1. The Weinstein conjecture is a theorem in
dimension 3:

Theorem: On a closed 3-manifold, the Reeb vector field has a closed periodic orbit.

The theorem was proven by Clifford Taubes in 2007 using Seiberg-Witten theory. Mike
Hutchings and Taubes established the existence of 2 Reeb orbits under the condition that all
Reeb orbits are non-degenerate, in the sense that the linearized return map does not have an
eigenvalue 1. Hutchings and Dan Cristofaro-Gardiner later removed the non-degeneracy condition
[363, 162] and also showed that if the product of the actions A(γ) = ∫_γ α of the two orbits is
larger than the volume ∫_M α ∧ dα of the contact form, then there are three. As for the
history: Alan Weinstein had already shown that if Y is a convex compact hypersurface in R^{2n},
then there is a periodic orbit. Paul Rabinowitz extended this to star-shaped hypersurfaces.
Weinstein conjectured in 1978 that every compact hypersurface of contact type in a symplectic
manifold has a closed characteristic. Contact geometry, as an odd-dimensional sibling of
symplectic geometry, has become its own field. Contact structures are the opposite of
integrable hyperplane fields: the Frobenius integrability condition α ∧ dα = 0 defines an
integrable hyperplane field forming a co-dimension 1 foliation of M. A contact structure is
therefore a "totally non-integrable hyperplane field" [260]. The higher dimensional case of the
Weinstein conjecture is wide open [361]. Also the symplectic question whether every compact
and regular energy surface H = c of a Hamiltonian vector field in R^{2n} has a periodic solution
is open. One knows that periodic solutions exist for almost all energy values in a small
interval around c [340].
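In dimension 3 the contact condition and the Reeb field have a vector-calculus form: writing α = a_1 dx + a_2 dy + a_3 dz, one has α ∧ dα = (a · curl a) dx ∧ dy ∧ dz, and the equations dα(R, ·) = 0, α(R) = 1 give R = curl a/(a · curl a). The Python sketch below evaluates this numerically with central finite differences (an approximation, with an arbitrarily chosen tolerance), for the standard contact form dz − y dx, whose Reeb field is ∂_z, and for the integrable form dz, which fails the condition, in line with the Frobenius remark above.

```python
def curl(a, p, h=1e-5):
    """Central-difference approximation of curl(a) at the point p in R^3."""
    def partial(i, j):  # derivative of component a_i with respect to x_j
        step = [0.0, 0.0, 0.0]
        step[j] = h
        plus = a([p[k] + step[k] for k in range(3)])
        minus = a([p[k] - step[k] for k in range(3)])
        return (plus[i] - minus[i]) / (2 * h)
    return (partial(2, 1) - partial(1, 2),
            partial(0, 2) - partial(2, 0),
            partial(1, 0) - partial(0, 1))


def contact_data(a, p, tol=1e-9):
    """For alpha = a1 dx + a2 dy + a3 dz return (s, R) at p, where
    alpha ^ d(alpha) = s dx^dy^dz with s = a . curl(a). R is the Reeb vector
    curl(a) / s when s != 0, and None when alpha fails the contact condition."""
    b = curl(a, p)
    ap = a(p)
    s = sum(ap[i] * b[i] for i in range(3))
    if abs(s) < tol:
        return s, None
    return s, tuple(bi / s for bi in b)


def standard_form(v):
    """alpha = dz - y dx, the standard contact form on R^3."""
    return (-v[1], 0.0, 1.0)


def integrable_form(v):
    """alpha = dz, whose kernel is tangent to the planes z = const."""
    return (0.0, 0.0, 1.0)


point = [0.3, -1.2, 2.0]
print(contact_data(standard_form, point))
print(contact_data(integrable_form, point))
```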

142. Simplicial spheres

A convex polytope G is defined as the convex hull of n points in R^d such that all these points
are extreme points, called vertices. (Extreme points are points which do not lie in an open
line segment of G.) Following [298], the boundary of such a polytope is also called a polytopal
sphere. A simplicial sphere is a geometric realization of a simplicial complex that is
homeomorphic to the standard (d − 1)-dimensional sphere in R^d. For a polytopal sphere, the
boundary of G is made up of (d − 1)-dimensional polytopes called (d − 1)-faces. A cyclic
polytope C(n, d) can be realized as the convex hull of the n vertices
{(t, t^2, t^3, . . . , t^d) | t = 1, 2, . . . , n} ⊂ R^d. Let f_k(G) denote the number of
k-dimensional faces in G. So, f_0(G) is the number of vertices, f_1(G) the number of edges and
f_{d−1} the number of facets, the highest dimensional faces in G. Extending the definition with
f_{−1} = 1 (counting the empty face, which is (−1)-dimensional), the vector
f = (f_{−1}, f_0, f_1, . . . , f_{d−1}) is called the extended f-vector of G. The upper bound
theorem is

Theorem: For a simplicial (d − 1)-sphere G with f_0(G) = n, one has f_k(G) ≤ f_k(C(n, d)).

This had been the upper bound conjecture of Theodore Motzkin from 1957, which was
proven for convex polytopes by Peter McMullen in 1970, who reformulated it as
h_k(G) ≤ (n − d + k − 1 choose k) for all k ≤ d/2, as the other numbers are determined by the
Dehn-Sommerville conditions h_k = h_{d−k} for 0 ≤ k ≤ d. The h-vector (h_0, . . . , h_d) and the
f-vector (f_{−1}, f_0, . . . , f_{d−1}) determine each other via
Σ_{k=0}^{d} f_{k−1} (t − 1)^{d−k} = Σ_{k=0}^{d} h_k t^{d−k}. Victor Klee suggested the upper
bound conjecture to be true for simplicial spheres, which was then proven by Richard Stanley in
1975 using new ideas from commutative algebra. Stanley later related h_k to the cohomology of a
projective toric variety associated with the dual of G. (A toric variety is an algebraic variety
containing an algebraic torus as an open dense subset such that the group action on the torus
extends to the variety.) The
result for simplicial spheres implies the result for convex polytopes because subdividing the
faces of a convex polytope into simplices only increases the numbers f_k. The g-conjecture of
McMullen from 1971 proposes a complete characterization of f-vectors of simplicial spheres.
Define g_0 = 1 and g_k = h_k − h_{k−1} for 1 ≤ k ≤ d/2. The g-conjecture claims that
(g_0, . . . , g_{[d/2]}) appears as the g-vector of a sphere triangulation if and only if there
exists a multicomplex Γ with exactly g_k vectors of degree k for all 0 ≤ k ≤ [d/2]. (A
multicomplex Γ is a set of non-negative integer vectors (a_1, . . . , a_n) such that if
0 ≤ b_i ≤ a_i for all i, then (b_1, . . . , b_n) is in Γ. The degree of a vector (a_1, . . . , a_n)
in a multicomplex is Σ_i a_i.) The g-theorem proves this for polytopal spheres (Billera and Lee
in 1980 giving sufficiency and Stanley in 1980 giving necessity). The g-conjecture is open for
simplicial spheres. [771, 668, 129]
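The h-vector formalism makes the upper bound concrete. For the cyclic polytope McMullen's bound is attained, h_k = (n − d + k − 1 choose k) for k ≤ d/2, and the Dehn-Sommerville relations supply the rest; expanding the polynomial identity above (substituting t → t + 1) gives f_{j−1} = Σ_{i≤j} (d − i choose j − i) h_i. The Python sketch below computes extended f-vectors this way; the derivation is added here for illustration.

```python
from math import comb


def cyclic_h_vector(n, d):
    """h-vector of the cyclic polytope C(n, d), n > d: McMullen's bound is
    attained for k <= d/2, and Dehn-Sommerville h_k = h_{d-k} gives the rest."""
    h = [0] * (d + 1)
    for k in range(d // 2 + 1):
        h[k] = comb(n - d + k - 1, k)
    for k in range(d // 2 + 1, d + 1):
        h[k] = h[d - k]
    return h


def f_from_h(h):
    """Extended f-vector (f_{-1}, f_0, ..., f_{d-1}) from (h_0, ..., h_d), using
    f_{j-1} = sum over i <= j of binom(d - i, j - i) * h_i."""
    d = len(h) - 1
    return [sum(comb(d - i, j - i) * h[i] for i in range(j + 1)) for j in range(d + 1)]


print(f_from_h(cyclic_h_vector(6, 3)))   # [1, 6, 12, 8]
print(f_from_h(cyclic_h_vector(6, 4)))   # [1, 6, 15, 18, 9]
```

For d = 4 every one of the 15 pairs of the 6 vertices is an edge, the neighborliness that makes cyclic polytopes extremal.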

143. Bertrand postulate

A basic result in number theory is

Theorem: For n > 1, there always exists a prime p between n and 2n.
As the theorem was conjectured in 1845 by Joseph Bertrand, it is still called Bertrand's
postulate. Since Pafnuty Tschebyschef's (Chebyshev's) proof in 1852, it is a theorem. For
a proof, see [365] page 367. Srinivasa Ramanujan simplified Chebyshev's proof considerably
in 1919 and strengthened it: if π(x) = Σ_{p ≤ x, p prime} 1 is the prime counting function, then
Bertrand's result can be restated as π(x) − π(x/2) ≥ 1 for x ≥ 2, and Ramanujan showed that
for every k, π(x) − π(x/2) ≥ k holds for all large enough x. The smallest number p_k such that
this holds for all x ≥ p_k is a prime, called the k'th Ramanujan prime. Simple proofs like one of Erdős
from 1932 are given in Wikipedia or [356] page 82, who notes "it is not a very sharp result.
Deep analytic methods can be used to give much better results concerning the gaps between
successive primes". There is a very simple proof assuming the Goldbach conjecture (stating
that every even number larger than 2 is a sum of two primes) [599]: if n is not prime, then
2n = p + q is a sum of two primes p ≤ q; as p = q = n is impossible, one has n < q, and q < 2n as p ≥ 2.
On the other hand, if n ≥ 3 is prime, then n + 1 is not prime and 2n + 2 = p + q is a sum of two
primes p ≤ q with n + 1 < q < 2n + 2. But q can not be 2n + 1 (as
that would mean p = 1), nor 2n (as 2n is composite), so that n < q < 2n. (For n = 2, the prime 3 does the job.) There are various
generalizations like Mohamed El Bachraoui's 2006 theorem that there are primes between 2n
and 3n or Denis Hanson's result from 1973 [315] that there are primes between 3n and 4n for n ≥ 1.
Mohamed El Bachraoui asked in 2006 whether for all n > 1 and all k ≤ n, there exists a
prime in [kn, (k + 1)n], which is for k = 1 the Bertrand postulate. A positive answer would give
that there is always a prime in the interval [n², n² + n]. Already the Legendre conjecture,
asking whether there is always a prime p satisfying n² < p < (n + 1)² for n ≥ 1, is open.
Legendre's conjecture is one of the four famous problems on Edmund Landau's
1912 list: the other three are the Goldbach conjecture, the twin prime conjecture and
the Landau conjecture asking whether there are infinitely many primes of the form
n² + 1. Landau really nailed it. There are 4 conjectures only, but all of them can be stated
in half a dozen words, are completely elementary, and for more than 100 years, nobody has
proved or disproved any of them.
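A quick numerical sanity check of Bertrand's postulate and of the Ramanujan primes, as a sketch: the code sieves primes, builds a table of π(x), checks that π(2n − 1) − π(n) ≥ 1 (a prime strictly between n and 2n), and computes the first Ramanujan primes by locating the last x below a search limit with π(x) − π(x/2) < k. The search limit is an assumption of the sketch; a finite check proves nothing beyond it.

```python
def prime_pi_table(limit):
    """pi[x] = number of primes <= x for 0 <= x <= limit (sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * (limit - 1)
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    pi, count = [0] * (limit + 1), 0
    for x in range(limit + 1):
        count += is_prime[x]
        pi[x] = count
    return pi

def check_bertrand(n_max):
    """True if for every 2 <= n <= n_max some prime p satisfies n < p < 2n."""
    pi = prime_pi_table(2 * n_max)
    return all(pi[2 * n - 1] - pi[n] >= 1 for n in range(2, n_max + 1))

def ramanujan_primes(k_max, limit=10_000):
    """Smallest p_k with pi(x) - pi(x/2) >= k for all x >= p_k, for k = 1..k_max.
    Both counting functions only jump at integers, so integer x suffice; the search
    below `limit` is a finite check (assuming `limit` is large enough)."""
    pi = prime_pi_table(limit)
    result = []
    for k in range(1, k_max + 1):
        last_bad = max(x for x in range(limit + 1) if pi[x] - pi[x // 2] < k)
        result.append(last_bad + 1)
    return result

print(check_bertrand(5000), ramanujan_primes(5))   # True [2, 11, 17, 29, 41]
```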

144. Non-squeezing theorem

The Euclidean space M = R²ⁿ with coordinates (x_1, . . . , x_n, y_1, . . . , y_n) carries
the standard symplectic 2-form ω(v, w) = (v, Jw) with
the skew-symmetric matrix J = [[0, I], [−I, 0]], where I is the n × n identity matrix. A linear transformation f : M → M, x → Ax is
called symplectic if A satisfies AᵀJA = J. A smooth transformation f : M → M is called
a symplectomorphism if it is a diffeomorphism and if the derivative df is a symplectic map
from T_x M → T_{f(x)} M at every point x ∈ M. Any smooth map for which df is symplectic
is automatically a local diffeomorphism, as symplectic matrices have determinant 1 and so are
invertible. Let B(r) = {x ∈ M | x · x ≤ r²} denote the round solid ball of radius r
and Z(r) = {x ∈ M | x_1² + y_1² ≤ r²} the solid cylinder of radius r. Given two sets A, B,
one says there is a symplectic embedding of A in B if there exists a symplectomorphism f
such that f(A) ⊂ B. As symplectic maps are volume preserving, a necessary condition is
Vol(A) ≤ Vol(B). Is this the only constraint? Yes, for n = 1, where the cylinder and the ball
are the same as defined: B(r) = Z(r). That the answer is "no" in higher dimensions n ≥ 2 is
the subject of the Gromov non-squeezing theorem:

Theorem: A symplectic embedding B(r) → Z(R) implies r ≤ R.

The theorem was proven in 1985 by Michael Gromov. It has been dubbed the principle
of the symplectic camel by Maurice de Gosson, referring to the "eye of the needle" metaphor.
A reformulation of the Gosson allegory [168] after encoding "camel" = "ball in the phase space",
"hole" = "cylinder", "pass" = "symplectically embed into", "size of the hole" = "radius of the
cylinder" and "size of the camel" = "radius of the ball" is: "There is no way that a camel can
pass through a hole if the size of the hole is smaller than the size of the camel". See [504, 362]
for expositions. The non-squeezing theorem also motivated the introduction of symplectic
capacities, quantities which are monotone, c(M) ≤ c(N) if there is a symplectic embedding of
M into N, which are conformal in the sense that if ω is scaled by λ, then c(M) is scaled by |λ|,
and which satisfy c(B(1)) = c(Z(1)) = π. For n = 1, the area is an example of a symplectic capacity
(it actually is unique). The existence of a symplectic capacity immediately proves the non-squeezing
theorem: conformality gives c(B(r)) = πr² and c(Z(R)) = πR², so monotonicity forces r ≤ R.
Already Gromov introduced an example of a capacity, the Gromov width, which
is the smallest one. More have been constructed using the calculus of variations. See [340, 505].
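The linear case can be explored directly. A minimal sketch in pure Python, assuming the coordinate order (x_1, . . . , x_n, y_1, . . . , y_n) for J = [[0, I], [−I, 0]]; the helper names are hypothetical. It tests AᵀJA = J for two diagonal maps of R⁴ with determinant 1. The first one maps the ball B(1) into the thin cylinder Z(ε) and is volume preserving but not symplectic; the second one is symplectic, and it shrinks x_1 only at the price of stretching y_1, so it does not squeeze the (x_1, y_1) disc.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def standard_J(n):
    """J = [[0, I], [-I, 0]] in the coordinate order (x_1..x_n, y_1..y_n)."""
    J = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1.0
        J[n + i][i] = -1.0
    return J

def is_symplectic(A, tol=1e-12):
    """Check A^T J A = J entrywise up to tol."""
    n = len(A) // 2
    J = standard_J(n)
    M = matmul(matmul(transpose(A), J), A)
    return all(abs(M[i][j] - J[i][j]) < tol for i in range(2 * n) for j in range(2 * n))

def diag(*entries):
    return [[entries[i] if i == j else 0.0 for j in range(len(entries))]
            for i in range(len(entries))]

eps = 0.1
# Coordinates (x1, x2, y1, y2). Both maps have determinant 1 (volume preserving).
squeeze = diag(eps, 1 / eps, eps, 1 / eps)   # (x1, y1) -> eps*(x1, y1): maps B(1) into Z(eps)
stretch = diag(eps, 1 / eps, 1 / eps, eps)   # x1 -> eps*x1 but y1 -> y1/eps
print(is_symplectic(squeeze), is_symplectic(stretch))   # False True
```

This only illustrates the linear situation; the non-linear theorem needs Gromov's methods.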

145. Kähler Geometry

A Kähler manifold is a complex manifold (M, J) together with a Hermitian metric h whose
associated Kähler form ω is closed. (The manifold can be given by a Riemannian metric g
compatible with the complex structure, g(JX, JY) = g(X, Y). The Kähler form ω is then
the 2-form ω(X, Y) = g(JX, Y) satisfying dω = 0 and the metric h = g + iω is the Hermitian
metric. (M, ω) is then also a symplectic manifold.) As ω is closed, it represents an element in
the cohomology group H²(M), called the Kähler class. The Calabi inverse problem is: given a
compact Kähler manifold (M, ω_0) and a (1, 1)-form R representing 2π times the first Chern
class of M, find a metric ω in the Kähler class of ω_0 such that Ricci(ω) = R. In local
coordinates, one can write Ricci(ω) = −i∂∂̄ log det(g). For compact M:

Theorem: The Calabi inverse problem has a unique solution ω .

This was conjectured in 1957 by Eugenio Calabi and proven in 1978 by Shing-Tung Yau by
solving nonlinear Monge-Ampère equations using analytic Nash-Moser type techniques. The
theorem implies that if the first Chern class of M is zero, then (M, ω_0) carries a unique
Ricci-flat Kähler metric g in the same Kähler class as ω_0. Kähler geometry deals simultaneously
with Riemannian, symplectic and complex structures: (M, g) is a Riemannian, (M, ω) is
a symplectic and (M, J) is a complex manifold. The inverse problem of characterizing geometries
from curvature data is central in all of differential geometry. Here are some examples: a)
M = Cⁿ with the Euclidean metric g is Kähler with ω = (i/2) Σ_k dz_k ∧ dz̄_k, but it is not compact.
But if Γ is a lattice, then the induced metric on the torus Cⁿ/Γ is Kähler. b) Because complex
submanifolds of a Kähler manifold are Kähler, and the complex projective space CPⁿ with the
Fubini-Study metric is Kähler (with ω = i∂∂̄ρ, where ρ = log(1 + Σ_k |z_k|²)/2 is the Kähler
potential), any complex projective variety is Kähler. Conversely, by Kodaira, a compact Kähler
manifold whose Kähler form represents an integral cohomology class is a projective algebraic variety.
c) For the complex hyperbolic case, where M is the unit ball in Cⁿ, the Kähler potential is
ρ = −log(1 − |z|²). d) Calabi-Yau manifolds are compact Kähler manifolds with zero first Chern class.
Examples are K3 surfaces. The existence theorem assures that they carry a Ricci-flat metric; these are
examples of Kähler-Einstein metrics. Also Hodge theory works well for Kähler manifolds. In
the complex setting, the Dolbeault operators ∂, ∂̄ and d = ∂ + ∂̄ lead to Hodge Laplacians Δ_∂, Δ_∂̄
and Δ_d, and so to harmonic forms H^{p,q} for differential forms of type (p, q) and harmonic
r-forms H^r for Δ_d. In the Kähler case, H^r = ⊕_{p+q=r} H^{p,q}. An example result due to Lichnerowicz is
that if Ricci(ω) ≥ λ > 0, then the first non-zero eigenvalue λ_1 of Δ satisfies λ_1 ≥ 2λ. See [40, 740, 33].
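For a Kähler potential ρ in one complex variable z = x + iy, the form i∂∂̄ρ equals (i/2) c dz ∧ dz̄ with c = 2∂²ρ/∂z∂z̄ = (ρ_xx + ρ_yy)/2, so positivity of the Kähler form amounts to c > 0. A minimal sketch, assuming n = 1 and a finite-difference Laplacian (both are choices of this example, not from the text), checks this for the Euclidean potential |z|²/2 (where c = 1) and for the Fubini-Study potential log(1 + |z|²)/2, where c = 1/(1 + |z|²)².

```python
import math

def conformal_factor(rho, x, y, h=1e-4):
    """c(z) = (rho_xx + rho_yy) / 2 at z = x + iy, where i*del*delbar(rho) = (i/2) c dz ^ dzbar.
    The Laplacian is approximated by the 5-point central difference stencil."""
    lap = (rho(x + h, y) + rho(x - h, y) + rho(x, y + h) + rho(x, y - h)
           - 4 * rho(x, y)) / h**2
    return lap / 2

def euclidean(x, y):
    return (x * x + y * y) / 2

def fubini_study(x, y):
    return math.log(1 + x * x + y * y) / 2

for x, y in [(0.0, 0.0), (0.5, -0.3), (2.0, 1.0)]:
    exact = 1 / (1 + x * x + y * y) ** 2
    print(complex(x, y), conformal_factor(euclidean, x, y),
          conformal_factor(fubini_study, x, y), exact)
```

The Fubini-Study factor decays like |z|⁻⁴, which is why the metric extends smoothly over the point at infinity of CP¹.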

146. Projective Geometry

A conic section is a curve which is obtained when intersecting a cone x² + y² = z² with a plane
ax + by + cz = d. A bit more general is a conic, an algebraic curve ax² + bxy + cy² + dx + ey + g = 0
of degree 2. Conics are either non-singular conics, classified as ellipses like x² + y² = 1,
hyperbolas like x² − y² = 1 or parabolas like x² = y, or degenerate conics like a point x² + y² = 0,
the cross x² = y², the double line x² = 0 or a pair of parallel lines x² = 1. Given 6 different
points A_1, A_2, A_3, B_1, B_2, B_3 on a conic, where A_1, A_2, A_3 are neighboring and B_1, B_2, B_3 are
neighboring, a Pascal configuration is the set of lines A_iB_j with i ≠ j. The intersection
points of this Pascal configuration are the three intersections of A_iB_j with A_jB_i, where
{i, j} runs over all three 2-point subsets of {1, 2, 3}.

Theorem: The intersection points of a Pascal configuration are on a line.

The theorem was found in 1639 by Blaise Pascal (as a teenager) in the case of an ellipse. A
limiting case, where the conic degenerates to two crossing lines, is the Pappus hexagon theorem, which goes
back to Pappus of Alexandria who lived around 320 AD. The Pappus hexagon theorem is
one of the first known results in projective geometry.
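Pascal's theorem is easy to test numerically in homogeneous coordinates, where the line through two points and the intersection point of two lines are both given by a cross product. A sketch with six random points on the unit circle (the sampling, seeds and tolerance are arbitrary choices): it forms the three points A_iB_j ∩ A_jB_i and evaluates the determinant of their normalized coordinate vectors, which vanishes exactly when the three points are collinear.

```python
import math
import random

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def normalize(p):
    s = math.sqrt(sum(c * c for c in p))
    return tuple(c / s for c in p)

def pascal_defect(A, B):
    """|det| of the three normalized points A_iB_j ∩ A_jB_i; zero iff they are collinear.
    In homogeneous coordinates the line through p, q is p x q, and lines l, m meet at l x m."""
    pts = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        l1 = cross(A[i], B[j])
        l2 = cross(A[j], B[i])
        pts.append(normalize(cross(l1, l2)))
    p, q, r = pts
    return abs(sum(a * b for a, b in zip(p, cross(q, r))))

def six_points_on_circle(seed):
    rng = random.Random(seed)
    ts = sorted(rng.uniform(0, 2 * math.pi) for _ in range(6))
    P = [(math.cos(t), math.sin(t), 1.0) for t in ts]   # (x, y, 1) with x^2 + y^2 = 1
    return P[:3], P[3:]                                # neighboring A_1..A_3 and B_1..B_3

print(max(pascal_defect(*six_points_on_circle(s)) for s in range(20)))   # rounding-error size
```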

147. Vitali theorem

A Lebesgue measure in Euclidean space Rⁿ is a Borel measure which is invariant under
Euclidean transformations. It is the Haar measure of the locally compact group Rⁿ and unique
if one normalizes it so that the unit cube has measure 1. In dimension n = 1, the Lebesgue
measure of an interval [a, b] is b − a. In dimension n = 2, the Lebesgue measure of a measurable
set is the area of the set. In particular, a disc of radius r has area πr². When constructing
the measure one has to specify a σ-algebra, which is in the Lebesgue case the completion of the
Borel σ-algebra generated by the open sets in Rⁿ. One has for every n ≥ 1:
Theorem: There exist sets in Rn that are not Lebesgue measurable.
The result is due to Giuseppe Vitali from 1905. It justifies why one has to go through all the
trouble of building a σ-algebra carefully and why it is not possible to work with the σ-algebra
of all subsets of Rⁿ (the power set, also called the discrete σ-algebra). The proof of the Vitali
theorem shows connections with the foundations of mathematics: by the axiom of choice
there exists a set V which contains exactly one representative of each equivalence class of T/Q,
where T = R/Z is the circle and two points are equivalent if they differ by a rational number. For
this Vitali set V, the translates V_r = V + r with r ∈ Q/Z are pairwise disjoint and their union is T,
so that they form a partition. By the translation invariance of the Lebesgue measure, all translated
sets V_r have the same measure m. As there are countably many of them, they are disjoint and they
add up to the circle of measure 1, σ-additivity would give 1 = Σ_r m, which is impossible both for
m = 0 and for m > 0. So V is not measurable. To get an example in Rⁿ, lift V to [0, 1) ⊂ R
and then build V × Rⁿ⁻¹. More spectacular are decompositions of the unit ball in R³ into 5 disjoint
sets which can be moved by Euclidean transformations and reassembled to
get two disjoint unit balls. This is the Banach-Tarski construction from 1924.

148. Wilson's theorem

The factorial n! of a number is defined as n! = 1 · 2 · · · n. For example, 5! = 120.

Theorem: n > 1 is prime if and only if (n − 1)! + 1 is divisible by n.

For n = 5, for example, (5 − 1)! + 1 = 25 is divisible by 5. For n = 6 we have (6 − 1)! + 1 = 121,
which is not divisible by 6. Indeed, 6 = 2 · 3 is not prime. The theorem is named after John
Wilson, who was a student of Edward Waring. It seems that Joseph-Louis Lagrange gave the
first proof in 1771. It is not a practical way to determine whether a number is prime [673]:
from a computational point of view, it is probably one of the world's least efficient primality
tests, since computing (n − 1)! takes so many steps. Also named after Wilson are the Wilson
primes. These are primes for which not only p but p² divides (p − 1)! + 1. The smallest one
is 5. It is not known whether there are infinitely many.
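The criterion and the search for Wilson primes fit in a few lines. The sketch below reduces the factorial modulo n (or p²) after every multiplication; its cost of about n multiplications per test, compared with roughly √n trial divisions, is exactly why it is impractical as a primality test. The search bound 600 is an arbitrary choice; the only Wilson primes below it are 5, 13 and 563.

```python
def wilson_is_prime(n):
    """Wilson's criterion: n > 1 is prime iff (n - 1)! + 1 is divisible by n.
    The factorial is reduced mod n after every step to keep the numbers small."""
    if n < 2:
        return False
    f = 1
    for k in range(2, n):
        f = f * k % n
    return (f + 1) % n == 0

def wilson_primes(limit):
    """Primes p <= limit for which p^2 divides (p - 1)! + 1."""
    found = []
    for p in range(2, limit + 1):
        if not wilson_is_prime(p):
            continue
        f = 1
        for k in range(2, p):
            f = f * k % (p * p)
        if (f + 1) % (p * p) == 0:
            found.append(p)
    return found

print([n for n in range(2, 30) if wilson_is_prime(n)])
print(wilson_primes(600))   # [5, 13, 563]
```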

149. Carleson theorem

If f ∈ L²(T), where T = R/(2πZ) is the circle, then the Fourier transform L²(T) → l²(Z)
gives a Fourier series g(x) = Σ_{k∈Z} c_k e^{ikx}, where c = (. . . , c_{−2}, c_{−1}, c_0, c_1, c_2, . . . ) ∈ l²(Z) is
given by c_k = (2π)^{−1} ∫_T f(x) e^{−ikx} dx. For smooth f, one knows g = f and Parseval's identity
(2π)^{−1} ∫_T |f(x)|² dx = Σ_k |c_k|², so that the Fourier transform extends to a unitary operator L²(T) →
l²(Z). This does not say anything yet about the pointwise convergence of the partial sums. We say the
Fourier series converges to f at a point x, if the sequence g_n(x) = Σ_{k=−n}^{n} c_k e^{ikx} converges to
f(x) for n → ∞. We say a sequence g_n(x) converges almost everywhere to f, if there exists a
set Y ⊂ T of full Lebesgue measure µ(Y) = 1 such that the sequence converges for all x ∈ Y. (The
Lebesgue measure is the normalized Haar measure dx/(2π) on the circle.) That the question
can be subtle is illustrated by results of Andrey Kolmogorov from 1923 to 1926, who gave
examples of L¹(T) functions for which the Fourier series diverges everywhere.

Theorem: The Fourier series of an L² function converges almost everywhere.

The statement had been conjectured by Nikolai Luzin in 1915 and was known as the Luzin
conjecture. The theorem was proven by Lennart Carleson in 1966. An extension to L^p with
p ∈ (1, ∞] was proven by Richard Hunt in 1968. The proof of the Carleson theorem is difficult.
The theorem is mentioned in harmonic analysis texts like [399] and in surveys like [403], which describe the
Carleson-Hunt theorem as one of the deepest and least understood parts of the theory.
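Carleson's theorem concerns almost everywhere convergence and is far beyond a numerical experiment, but the objects in it are easy to compute. A sketch, assuming the test function f(x) = x² on [−π, π) extended periodically and approximating c_k by an m-point Riemann sum (both are choices of this example): the partial sums g_n(x) approach f(x) at a point where f is smooth. For this f the exact coefficients are c_0 = π²/3 and c_k = 2(−1)^k/k², which can be used to check the quadrature.

```python
import cmath
import math

def fourier_coefficient(f, k, m=1024):
    """c_k = (2*pi)^(-1) * integral over [-pi, pi) of f(x) e^{-ikx} dx,
    approximated by the m-point Riemann sum on an equally spaced grid."""
    total = 0j
    for j in range(m):
        x = -math.pi + 2 * math.pi * j / m
        total += f(x) * cmath.exp(-1j * k * x)
    return total / m

def partial_sum(f, n, x):
    """g_n(x) = sum_{k=-n}^{n} c_k e^{ikx} (real part; f is real-valued)."""
    return sum(fourier_coefficient(f, k) * cmath.exp(1j * k * x)
               for k in range(-n, n + 1)).real

def f(x):
    return x * x          # on [-pi, pi), extended 2*pi-periodically

for n in [5, 50, 200]:
    print(n, partial_sum(f, n, 1.0))   # approaches f(1.0) = 1
```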

150. Intermediate value

Let (X, O) be a connected topological space and f : X → R a continuous map. We say that
f reaches both signs if there exist a, b ∈ X such that f(a) < 0
and f(b) > 0. A root of f is a point x ∈ X such that f(x) = 0. Let C(X) denote the set of
continuous functions from X to R. Continuity means that for f ∈ C(X) and all open sets U in R,
one has f^{−1}(U) ∈ O.

Theorem: f ∈ C(X) reaching both signs on a connected X has a root.

The theorem was proven by Bernard Bolzano in 1817 for functions from the interval [a, b] to
R. The proof follows from the definitions: as P = (0, ∞) is open, also f^{−1}(P) is open. As
N = (−∞, 0) is open, also f^{−1}(N) is open. The two sets are non-empty as they contain b and a.
If there is no root, then X = f^{−1}(N) ∪ f^{−1}(P) is a disjoint union
of two non-empty open sets and so disconnected. This contradicts the assumption of X being connected.
A consequence is the wobbly table theorem: a square table with parallel legs of equal length
standing on a "floor" given by the graph z = g(x, y) of a continuous function g can be rotated and possibly
translated in the z direction so that all 4 legs are on the floor. This application
follows from the intermediate value theorem applied to the height function f(ϕ)
of the fourth leg when the three other legs are on the floor. A consequence is also Rolle's theorem,
assuring that a continuously differentiable function f : [a, b] → R with f(a) = f(b) has a point
x ∈ (a, b) with f′(x) = 0. Tilting Rolle gives the mean value theorem assuring that for a
continuously differentiable function f : [a, b] → R, there exists x ∈ (a, b) with f′(x) = (f(b) − f(a))/(b − a).
The general theorem shows that it is the connectedness and not the completeness of X which
is the important assumption.
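For X = [a, b] the theorem is constructive in practice: repeatedly halving an interval on which f changes sign traps a root, since every step keeps a sub-interval with a sign change. A minimal bisection sketch (the function name and tolerance are illustrative):

```python
def bisect_root(f, a, b, tol=1e-12):
    """Root of a continuous f on [a, b] (a < b) with f(a), f(b) of opposite signs."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tol:
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m
        if (fm < 0) == (fa < 0):   # the sign change lies in [m, b]
            a, fa = m, fm
        else:                      # the sign change lies in [a, m]
            b = m
    return (a + b) / 2

print(bisect_root(lambda x: x**3 - 2, 0.0, 2.0))   # close to 2 ** (1/3) = 1.2599...
```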

151. Perron-Frobenius

An n × n matrix A is non-negative if A_ij ≥ 0 for all i, j and positive if A_ij > 0 for all i, j.
The Perron-Frobenius theorem is:
Theorem: A positive matrix has a unique largest eigenvalue.

The theorem was proven by Oskar Perron in 1907 [569] and by Georg Frobenius in 1908
[250]. When seeing the map x → Ax on the projective space, this is in suitable coordinates a
contraction and the Banach fixed point theorem applies. This is the proof of Garrett Birkhoff,
who used the Hilbert metric [437]. The Brouwer fixed point theorem only gives existence, not
uniqueness, but it applies also to non-negative matrices. This has applications
in graph theory, Markov chains or Google page rank. The Google matrix is defined as
G = dA + (1 − d)E, where d is a damping factor, A is a Markov matrix defined by the
network and E is the matrix with entries E_ij = 1/n. Sergey Brin and Larry Page write "the damping factor
d is the probability at each page the random surfer will get bored and request another random
page". The page rank equation is Gx = x. In other words, the Google page rank vector (the
"one billion dollar vector") is a Perron-Frobenius eigenvector. It assigns page rank values to the
individual nodes of the network. See [462]. For the linear algebra of non-negative matrices, see
[523].
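The Perron-Frobenius eigenvector of a Google-type matrix can be found by power iteration, since x → Gx contracts on probability vectors. A sketch on a hypothetical 4-page network, assuming a column-stochastic A (column j spreads page j's weight equally over its out-links), E_ij = 1/n and d = 0.85; these choices belong to the example, not to the text. Pages without out-links, which do not occur here, are treated as linking to every page.

```python
def google_matrix(links, n, d=0.85):
    """G = d*A + (1 - d)*E with E_ij = 1/n.
    links[j] lists the pages that page j links to; A_ij = 1/outdeg(j) if j links to i."""
    G = [[(1 - d) / n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        targets = links.get(j) or list(range(n))   # dangling page: link to all (assumption)
        for i in targets:
            G[i][j] += d / len(targets)
    return G

def power_iteration(G, steps=200):
    """Iterate x -> Gx on probability vectors; converges to the Perron-Frobenius vector."""
    n = len(G)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = [sum(G[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(x)
        x = [v / total for v in x]
    return x

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # hypothetical 4-page network
G = google_matrix(links, 4)
x = power_iteration(G)
print([round(v, 4) for v in x])   # page 2, linked from all other pages, ranks highest
```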

152. Continuum hypothesis

ℵ₀ is the cardinality of the natural numbers N. ℵ₁ is the next larger cardinality. The cardinality
of the real numbers R is 2^ℵ₀. The statement 2^ℵ₀ = ℵ₁ is the continuum hypothesis,
abbreviated CH. The Zermelo-Fraenkel axiom system ZFC of set theory is the most
common foundational axiomatic framework of mathematics. The letter C refers to the axiom of
choice.

Theorem: Neither 2^ℵ₀ = ℵ₁ nor 2^ℵ₀ ≠ ℵ₁ can be proven in ZFC.

This result combines a result of Kurt Gödel from 1938 [270] (CH is consistent with ZFC) and
one of Paul Cohen from 1963 [141, 142] (the negation of CH is consistent with ZFC). Cantor had for a long
time tried to prove that the continuum hypothesis holds. The Gödel-Cohen theorem shows
that any such effort was in vain and illustrates why Cantor was doomed not to succeed.
The problem had been the first of Hilbert's problems of 1900. For more, see [644] or [758],
who summarizes the result in words: Gödel solved the substructure problem in 1938. Over
25 years later Cohen, arguably the Galois of set theory, solved the extension problem.

153. Homotopy-Homology

Given a path connected pointed topological space X with base point b, the n'th homotopy group
π_n(X) is the set of equivalence classes of base point preserving maps from the pointed sphere Sⁿ to
X. It can be written as the set of homotopy classes of maps from the n-cube [0, 1]ⁿ to X
such that the boundary of [0, 1]ⁿ is mapped to b. It becomes a group by defining addition as
(f + g)(t_1, . . . , t_n) = f(2t_1, t_2, . . . , t_n) for 0 ≤ t_1 ≤ 1/2 and (f + g)(t_1, . . . , t_n) = g(2t_1 − 1, t_2, . . . , t_n)
for 1/2 ≤ t_1 ≤ 1. In the case n = 1, this is "joining the trip": travel first along the first curve
with twice the speed, then take the second curve. The groups π_n do not depend on the base
point. As X is assumed to be path connected, π_0(X) has only one element. The group π_1(X) is
the fundamental group. It can be non-abelian. For n ≥ 2, the groups π_n(X) are always
abelian, f + g = g + f. The n'th homology group H_n(X) of a topological space X with
integer coefficients is obtained from the chain complex of the free abelian group generated by
continuous maps from n-dimensional simplices to X. The Hurewicz theorem is

Theorem: There exists a homomorphism πn (X) → Hn (X).

Higher homotopy groups were discovered by Witold Hurewicz during the years 1935-1936. The
Hurewicz theorem itself was then established in 1950 [360]. In the case n = 1, the
homomorphism can be easily described: if γ : [0, 1] → X is a closed path, then since [0, 1] is a
1-simplex, the path is a singular 1-simplex in X. As γ(0) = γ(1) = b, its boundary γ(1) − γ(0)
vanishes, so that this singular 1-simplex is a cycle. This allows to see it as an element in H_1(X).
If two closed paths are homotopic, then their corresponding singular simplices are homologous and
so define the same element in H_1(X). There is an elegant proof
using Hodge theory if X = M is a compact manifold: the image C of a map representing an element
of π_p(M) can be interpreted as a Schwartz distribution on M. Let L = (d + d*)² be the Hodge Laplacian and
let the heat flow e^{−tL} act on C. For t > 0, the image e^{−tL}C is now smooth and defines a
differential form in Λ^p(M). As all the non-zero eigenspaces get damped exponentially, the
limit of the heat flow is a harmonic form, an eigenvector to the eigenvalue 0. But Hodge
theory identifies ker(L|Λ^p) with H^p(M) and so with H_p(M) by Poincaré duality. The Hurewicz
homomorphism is then even constructive. "Just heat up the curve to get the corresponding
cohomology element, the commutator group elements get melted away by the heat." A space
X is called n-connected if π_i(X) = 0 for all i ≤ n. So, 0-connected means path connected
and 1-connected is simply connected. For n ≥ 2, π_n(X) is isomorphic to H_n(X) if X
is (n − 1)-connected. In the case n = 1, this cannot hold in general, as π_1(X) can be
non-commutative while H_1(X) is commutative. But H_1(X) is isomorphic to the abelianization of G = π_1(X),
which is the group obtained by factoring out the commutator subgroup [G, G]; this is the normal
subgroup of G generated by all the commutators g⁻¹h⁻¹gh of group elements g, h of G.
See [322].
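For the figure-eight space (a wedge of two circles), π_1 is the free group on two generators a, b and H_1 = Z², and the Hurewicz map for n = 1 sends a loop, written as a word in a, b and their inverses, to its exponent sums. A small sketch with a hypothetical encoding (capital letters denote inverses): the commutator a⁻¹b⁻¹ab maps to 0, and the words ab and ba, which are different elements of the free group, have the same image. This illustrates that H_1 is the abelianization of π_1.

```python
from collections import Counter

def hurewicz_image(word):
    """Exponent sums (in a, in b) of a word in the free group on a, b.
    Encoding (hypothetical): 'a', 'b' are the generators, 'A' = a^-1, 'B' = b^-1."""
    c = Counter(word)
    return (c["a"] - c["A"], c["b"] - c["B"])

def free_reduce(word):
    """Cancel adjacent inverse pairs such as 'aA' or 'Bb' (free group normal form)."""
    out = []
    for ch in word:
        if out and out[-1] == ch.swapcase():
            out.pop()
        else:
            out.append(ch)
    return "".join(out)

print(hurewicz_image("ABab"))   # commutator a^-1 b^-1 a b -> (0, 0)
# ab (ba)^-1 = abAB does not reduce to the empty word, so ab != ba in pi_1,
# but both words have the image (1, 1) in H_1.
print(free_reduce("abAB"), hurewicz_image("ab"), hurewicz_image("ba"))
```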

154. Pick's theorem

Let P be a simple polygon in the plane R². This means that it is given by a finite ordered
set of points called vertices P_i = (x_i, y_i), i = 0, . . . , n − 1, such that the line segments P_iP_{(i+1) mod n},
called edges, joining neighboring points do not intersect except at common endpoints of adjacent edges.
The polygon defines a polygonal region G with area A. Assume now that all coordinates x_i, y_i are integers. Let I be the number
of lattice points (k, l) ∈ Z² inside G and B the number of lattice points on the boundary of G.
Pick's theorem assures:

Theorem: A = I + B/2 − 1.

The result was found in 1899 by Georg Pick [575]. For example, for a lattice triangle without interior
lattice points and with only its three vertices on the boundary, one has A = 0 + 3/2 − 1 = 1/2; for a rectangle
parallel to the coordinate axes with I = nm interior points, B = 2n + 2m + 4 boundary points and
area A = (n + 1)(m + 1), one also has I + B/2 − 1 = A. The theorem has become a popular school project assignment in early
geometry courses as there are many ways to prove it. An example is to cut away a triangle and
use induction on the area, then verify that if two polygons are joined along a line segment, the
functional I + B/2 − 1 is additive. There are other explicit formulas for the area like Green's
formula A = (1/2)|Σ_{i=0}^{n−1} (x_i y_{i+1} − x_{i+1} y_i)| (indices taken modulo n), which does not assume
the vertices P_i = (x_i, y_i) to be lattice points.
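Pick's formula can be checked against Green's (shoelace) formula by brute force. A sketch with exact integer and rational arithmetic (function names and the sample polygons are illustrative): B is obtained as the sum of gcd(|Δx|, |Δy|) over the edges, and I by testing every lattice point of the bounding box with even-odd ray casting after excluding boundary points.

```python
from fractions import Fraction
from math import gcd

def shoelace_area(P):
    """Green's (shoelace) formula A = |sum_i (x_i y_{i+1} - x_{i+1} y_i)| / 2."""
    n = len(P)
    s = sum(P[i][0] * P[(i + 1) % n][1] - P[(i + 1) % n][0] * P[i][1] for i in range(n))
    return Fraction(abs(s), 2)

def on_segment(p, a, b):
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0 and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def strictly_inside(p, P):
    """Even-odd ray casting to the right; assumes p is not on the boundary."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(P, P[1:] + P[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + Fraction((y - y1) * (x2 - x1), y2 - y1)
            if x_cross > x:
                inside = not inside
    return inside

def lattice_counts(P):
    """Return (I, B): lattice points strictly inside and on the boundary."""
    edges = list(zip(P, P[1:] + P[:1]))
    B = sum(gcd(abs(b[0] - a[0]), abs(b[1] - a[1])) for a, b in edges)
    xs, ys = [p[0] for p in P], [p[1] for p in P]
    I = 0
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            if not any(on_segment((x, y), a, b) for a, b in edges) and strictly_inside((x, y), P):
                I += 1
    return I, B

P = [(0, 0), (4, 0), (4, 3), (2, 5), (0, 3)]
I, B = lattice_counts(P)
print(shoelace_area(P), I, B, I + Fraction(B, 2) - 1)   # 16 10 14 16
```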

155. Isospectral drums

On a compact region G ⊂ R² with piecewise smooth boundary δG one can look at the Dirichlet
problem −Δf = λf in the interior of G and f = 0 on δG. The region is considered a "drum". If
hit, one hears the spectrum of the Laplacian Δu = u_xx + u_yy. There is a sequence of Dirichlet
eigenvalues 0 < λ_1 ≤ λ_2 ≤ · · ·, real values which solve −Δu_n = λ_n u_n for some functions
u_n which are zero on the boundary. For example, if G is the square [0, π] × [0, π], then the
eigenvalues are n² + m² with n, m ≥ 1 and eigenfunctions sin(nx) sin(my).
Two drums are called isospectral if they have the same eigenvalues.
Two drums are non-isometric if there is no transformation generated by rotations, translations
and reflections which maps one drum to the other.
Theorem: There exist non-isometric but isospectral drums.

Mark Kac had asked in 1966 "Can one hear the shape of a drum?" [383]. Caroline Gordon,
David Webb and Scott Wolpert answered this question negatively [278]. In the convex case,
the question is still open.
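The square example can be made concrete. A sketch (function names are illustrative) that lists the Dirichlet eigenvalues n² + m² (n, m ≥ 1) of [0, π]² with multiplicity, and checks with a finite-difference Laplacian that sin(nx) sin(my) satisfies −Δu = (n² + m²)u at a sample point. It only illustrates the spectrum of one drum; the isospectral drums of Gordon, Webb and Wolpert are not constructed here.

```python
import math

def square_dirichlet_eigenvalues(count):
    """The `count` smallest Dirichlet eigenvalues n^2 + m^2 (n, m >= 1) of [0, pi]^2,
    listed with multiplicity."""
    K = 2 * math.isqrt(count) + 3
    values = sorted(n * n + m * m for n in range(1, K + 1) for m in range(1, K + 1))
    # every value <= K^2 has n, m < K, so the list is complete up to K^2
    return [v for v in values if v <= K * K][:count]

def rayleigh_quotient_fd(n, m, x, y, h=1e-4):
    """-Laplace(u)/u at (x, y) for u = sin(n x) sin(m y), by central differences."""
    def u(s, t):
        return math.sin(n * s) * math.sin(m * t)
    lap = (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h) - 4 * u(x, y)) / h**2
    return -lap / u(x, y)

print(square_dirichlet_eigenvalues(10))       # [2, 5, 5, 8, 10, 10, 13, 13, 17, 17]
print(rayleigh_quotient_fd(2, 3, 0.7, 1.1))   # close to 2^2 + 3^2 = 13
```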

156. Bertrand theorem

The path r(t) of a particle in Rⁿ moving in a central force potential V(x) = f(|x|) experiences
the central force F = −∇V(x) = −f′(|x|)x/|x|. In the case of the Newton potential
V(x) = −GMm/|x|, where the central mass M, the body mass m and the gravitational
constant G determine the force F(x) = −xGMm/|x|³. The motion of the particle follows the
differential equations r″(t) = −MGr(t)/|r|³, which conserve the energy E(r) = m|r′|²/2 + V(r)
and the angular momentum L = m r ∧ r′, an n(n − 1)/2 dimensional quantity. The invariance
of L assures that r(t) stays in the plane initially spanned by r(0) and r′(0) and that the area
of the parallelogram spanned by r(t) and r′(t) is constant. To see what the natural potential in Rⁿ
is, one has to go beyond Newton and pass to Gauss, who wrote the gravitational law in the
form div(F) = −4πGρ, where ρ is the mass density and F the force per unit mass. It expresses
that mass is the source of the force field F. To get the force field of a centrally symmetric mass
distribution, one can use the divergence theorem in Rⁿ and relate the integral of div(F) over a
ball of radius r with the flux of F through the sphere S(r) of radius r. The former is −4πGM, where
M is the total mass in the ball, the latter is |S(r)|F(r), where |S(r)| is the surface area of the
sphere and F(r) is the radial component of the force, which is negative because an attractive force
points inwards. So, in three dimensions, Gauss recovers the Newton gravitational law
F(r) = −4πGM/|S(r)| = −GM/|r|². There is a natural central force Kepler problem in any dimension:
in Rⁿ, we have F(r) = −C_n r/|r|ⁿ, where C_n is a constant. For n = 1, there is a constant force
pulling the particle towards the center, for n = 2, one has a 1/|r| force which corresponds to a
logarithmic potential, for n = 3, it is the Newtonian inverse square 1/|r|² force, in n = 4, it is a
1/|r|³ force. For n = 0, one formally gets the harmonic oscillator, which is Hooke's law. Which
potentials lead to periodic motion? The answer is surprising and was given by Bertrand: only the
harmonic oscillator potential and the Newtonian potential in R³ work. Let us call a central force
potential all periodic if every bounded (position and velocity) solution r(t) of the differential
equations is periodic. Already for the Kepler problem, there are not only motions on ellipses but
also scattering solutions moving on parabolas or hyperbolas, or then suicide motions with r′(0) = 0,
where the particle dives into the singularity.
Theorem: Only the Newton potentials for n = 0 and n = 3 are all periodic.
This theorem of Joseph Bertrand from 1873 tells that three dimensional space is special, as in
any other dimension, calendars would be almost periodic as the solutions to the Kepler problem
would not close up. We could live with that, but there are more compelling reasons why n = 3
is dynamically better: in other dimensions, only very special orbits stay bounded. A small
perturbation leads to the planet colliding with the sun or escaping to infinity. Gauss's analysis
also allows to compute the force F(r) at distance r from the center of an n-dimensional ball with
constant mass density ρ. The divergence theorem gives −4πGρ|B(r)| = |S(r)|F(r), where |B(r)|
is the volume of the solid ball of radius r and |S(r)| the surface area of the sphere. As
|B(r)|/|S(r)| = r/n, this gives the Hooke law force F(x) = −4πGρx/n, where n is the dimension.
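Bertrand's dichotomy can be observed numerically. The sketch below integrates the planar motion r″ = −r/|r|ⁿ, taking C_n = 1, unit mass and a fourth-order Runge-Kutta step (all assumptions of this example), and measures the angle swept between two successive closest approaches to the center. For the harmonic oscillator (n = 0) this angle is π and for the Newtonian case (n = 3) it is 2π, so the orbits close; for the logarithmic case n = 2 it differs from 2π and the orbit does not close after one revolution. The case n ≥ 4 is left out because the chosen initial condition falls into the singularity.

```python
import math

def rk4_step(state, dt, n):
    """One Runge-Kutta step for r'' = -r/|r|^n in the plane; state = (x, y, vx, vy)."""
    def deriv(s):
        x, y, vx, vy = s
        r_n = math.hypot(x, y) ** n
        return (vx, vy, -x / r_n, -y / r_n)
    k1 = deriv(state)
    k2 = deriv([s + dt / 2 * k for s, k in zip(state, k1)])
    k3 = deriv([s + dt / 2 * k for s, k in zip(state, k2)])
    k4 = deriv([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def pericenter_advance(n, v0=0.8, dt=1e-3, t_max=100.0):
    """Angle swept between the first two pericenters, starting at r = (1, 0), v = (0, v0)."""
    state = [1.0, 0.0, 0.0, v0]
    theta, prev_phi, prev_rdot, t = 0.0, 0.0, 0.0, 0.0
    pericenters = []
    while t < t_max and len(pericenters) < 2:
        state = rk4_step(state, dt, n)
        t += dt
        x, y, vx, vy = state
        phi = math.atan2(y, x)
        theta += (phi - prev_phi + math.pi) % (2 * math.pi) - math.pi   # unwrapped angle
        prev_phi = phi
        rdot = x * vx + y * vy             # has the sign of d|r|/dt
        if prev_rdot < 0 <= rdot:          # |r| stops decreasing: a pericenter
            pericenters.append(theta)
        prev_rdot = rdot
    if len(pericenters) < 2:
        raise RuntimeError("no two pericenters found (orbit unbounded or collision)")
    return pericenters[1] - pericenters[0]

for n in (0, 2, 3):
    print(n, pericenter_advance(n) / math.pi)   # n = 0: about 1, n = 3: about 2, n = 2: neither
```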

157. Catastrophe theory

Catastrophe theory describes the singularity structure of smooth functions f on an n-manifold M
parametrized by some r parameters. A basic assumption is that configurations of interest
of the functional f are critical points of f : M → R. Especially interesting are minima, stable
configurations. When changing parameters of f, bifurcations, structural changes of the critical
set, can happen. Especially, minima can change their nature or disappear. In particular, the
function t → f_t(x_t), where x_t is a local minimum, can change discontinuously, even if the function
(t, x) → f_t(x) is smooth. Such discontinuous changes are called catastrophes. The stage for
Thom's theorem is a smooth function f : Rⁿ × Rʳ → R. One can think of f as an r parameter family
of functions on space Rⁿ. Let ∇_x = (∂_{x_1}, . . . , ∂_{x_n}) be the gradient operator with respect to
the space variables and M_f = {(x, y) ∈ Rⁿ × Rʳ | ∇_x f(x, y) = 0} the submanifold on which points
are critical. The space X = C^∞(Rⁿ × Rʳ) of smooth functions in space and parameter can be
equipped with the Whitney topology, the topology generated by a basis which is the union
of all the basis sets of the C^k Whitney topologies. A basis for the latter is the set of all functions for
which f^{(j)}(x, y) ∈ U_j for all 0 ≤ j ≤ k and U_0, . . . , U_k are all open intervals. With the Whitney
C^∞ topology, X is a Baire space so that residual sets (countable intersections of open dense
sets) are dense. The next theorem works for n = 2, r ≤ 6 and for n ≥ 3 if r ≤ 5 [514].

Theorem: For a residual set in X, M_f is an r-dimensional manifold.

The theorem is due to René Thom, who initiated catastrophe theory in a 1966 article and
wrote [708], building on previous work by Hassler Whitney. More work and proofs were done
by various mathematicians like John Mather or Bernard Malgrange. There is more to it: the
restriction X_f of the projection of the singularity set M_f onto the parameter space Rʳ can be
classified. Thom proved that for r ≤ 4, there are exactly seven elementary catastrophes:
"fold", "cusp", "swallowtail", "butterfly", "hyperbolic umbilic", "elliptic umbilic" and "parabolic
umbilic". For r = 5, the number of catastrophe types is 11. The subject is part of the singularity
theory of differentiable maps, a theory started by Hassler Whitney in 1955. The theory
of bifurcations was developed by Henri Poincaré and Alexander Andronov. See also [514, 585,
719]. It is also widely studied in the context of dynamical systems [525].
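The cusp, the simplest catastrophe with a jump, can be simulated directly. A sketch, assuming the standard cusp family f(x) = x⁴/4 + ax²/2 + bx with one space variable and two parameters (a, b), which is not spelled out in the text: for fixed a = −1 it follows a local minimum x_b while b increases, keeping the followed minimum as long as it exists (a convention chosen for this example), and reports where x_b jumps. The jump happens where the followed minimum merges with the maximum, at b = 2/(3√3) ≈ 0.385 on the fold curve 4a³ + 27b² = 0.

```python
import math

def critical_points(a, b):
    """Real roots of f'(x) = x^3 + a x + b for the cusp family f(x) = x^4/4 + a x^2/2 + b x."""
    if 4 * a**3 + 27 * b**2 < 0:       # three real roots: trigonometric formula
        m = 2 * math.sqrt(-a / 3)
        arg = (3 * b / (2 * a)) * math.sqrt(-3 / a)
        phi = math.acos(max(-1.0, min(1.0, arg))) / 3
        return sorted(m * math.cos(phi - 2 * math.pi * k / 3) for k in range(3))
    D = max(b * b / 4 + a**3 / 27, 0.0)   # one real root: Cardano's formula

    def cbrt(v):
        return math.copysign(abs(v) ** (1 / 3), v)
    return [cbrt(-b / 2 + math.sqrt(D)) + cbrt(-b / 2 - math.sqrt(D))]

def follow_minimum(a=-1.0, b_start=-0.6, b_end=0.6, steps=120):
    """Track a local minimum while b increases, keeping it as long as it exists."""
    path, x_prev = [], None
    for k in range(steps + 1):
        b = b_start + (b_end - b_start) * k / steps
        minima = [x for x in critical_points(a, b) if 3 * x * x + a > 0]   # f'' > 0
        x = minima[0] if x_prev is None else min(minima, key=lambda m: abs(m - x_prev))
        path.append((b, x))
        x_prev = x
    return path

path = follow_minimum()
jumps = [b for (_, x0), (b, x) in zip(path, path[1:]) if abs(x - x0) > 0.5]
print(jumps, 2 / (3 * math.sqrt(3)))   # a single jump just after b = 0.3849...
```

Although f depends smoothly on (x, b), the followed minimum x_b, and with it f(x_b), changes discontinuously: this is the catastrophe.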

158. Phase transition

Given a finite simple graph G = (V, E), an interaction function J : E → R and a scalar
field h : V → R define a Hamiltonian H(σ) = −Σ_{(i,j)∈E} J_ij σ_i σ_j − µ Σ_{j∈V} h_j σ_j on the set
of all functions σ : V → {−1, 1}. The interpretation is that σ_i are spin values, h_j is an
external magnetic field and J_ij is an interaction function. The additional parameter
µ is a magnetic moment. The energy H defines a probability measure P on the set Ω =
{−1, 1}^V of all spin configurations. It is the Gibbs-Boltzmann distribution P[{σ}] = e^{−H(σ)}/Z,
where Z = Σ_{σ∈Ω} e^{−H(σ)} is the normalization constant rendering P a probability measure. One calls Z the
partition function (as it is usually considered to be a function of some of the parameters
like temperature). Given a random variable (= observable) X : Ω → R, one is interested in the
expectation E[X]. An example is X(σ) = σ_i σ_j, which leads to the correlation. When replacing
H with βH, where β = 1/(KT) is an inverse temperature parameter (T is the temperature
and K the Boltzmann constant), one can study the expectation of a random variable X in
dependence of β. One writes now also E[X] = ⟨X⟩_β to stress the dependence on β. In the case
when G is a d-dimensional lattice G = [−L, L]^d, where two lattice points x, y are connected
if Σ_k |x_k − y_k| = 1, one looks at the L → ∞ van Hove limit, where G = Z^d. In the case
J = 1, h = 0 this is the Ising model. As J is positive, this is a ferromagnetic situation. A
parameter value where a quantity like the free energy per site, log(Z_β)/|V|, or one of its
derivatives changes discontinuously (in the limit L → ∞) is called a phase transition.

Theorem: The Ising model in two dimensions has a phase transition.

This was first proven by Lars Onsager in 1944, who in a tour de force gave analytical solutions.
The analysis shows that there is a phase transition. The temperature T at which this happens
is called the Curie temperature. The one dimensional case had been solved by Ernst Ising
in 1925, who got it as a PhD project from his adviser Wilhelm Lenz. In one dimension, there
is no phase transition. In three and higher dimensions, no analytical solutions are known. The
Ising model is only one of many models and generalizations. If the J_ij are random, one deals
with disordered systems. An example is the Edwards-Anderson model, where the J_ij are
Gaussian random variables. This is an example of a spin glass model. Another example is
the Sherrington-Kirkpatrick model from 1975, where the lattice is replaced by a complete
graph and the J_ij define a random matrix. Another possibility is to change the spin to Z_n
or the symmetric group (Potts) or then some other Lie group (lattice gauge fields) and then
use a character to get a numerical value. Or one replaces the zero-dimensional sphere Z_2 with
a higher dimensional sphere like S² and takes σ_i · σ_j (Heisenberg model). See [650].
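For small graphs the partition function can be computed by brute force over all 2^|V| configurations. A sketch (function name and signature are illustrative) for the Ising case J = 1, h = 0 on a ring of N spins, with Boltzmann weight e^{−βH} and H(σ) = −Σ J σ_iσ_j − µ Σ h_jσ_j. The transfer matrix [[e^β, e^{−β}], [e^{−β}, e^β]], with eigenvalues 2 cosh β and 2 sinh β, gives the closed form Z_β = (2 cosh β)^N + (2 sinh β)^N as a check. Brute force is only feasible for small graphs, one reason why the exact two-dimensional solution is a tour de force.

```python
import itertools
import math

def partition_function(V, E, beta, J=1.0, mu=0.0, h=None):
    """Z_beta = sum over sigma in {-1,1}^V of exp(-beta * H(sigma)) with
    H(sigma) = -sum_{(i,j) in E} J sigma_i sigma_j - mu * sum_{j in V} h_j sigma_j
    (constant coupling J; h defaults to the zero field)."""
    h = h if h is not None else {v: 0.0 for v in V}
    index = {v: k for k, v in enumerate(V)}
    Z = 0.0
    for sigma in itertools.product((-1, 1), repeat=len(V)):
        s = {v: sigma[index[v]] for v in V}
        H = (-sum(J * s[i] * s[j] for i, j in E)
             - mu * sum(h[v] * s[v] for v in V))
        Z += math.exp(-beta * H)
    return Z

N, beta = 8, 0.7
ring = list(range(N))
edges = [(i, (i + 1) % N) for i in range(N)]
brute = partition_function(ring, edges, beta)
closed = (2 * math.cosh(beta)) ** N + (2 * math.sinh(beta)) ** N
print(brute, closed)
```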

159. Ceva theorem

Given a triangle ABC in the Euclidean plane R² and a point O in the interior, let A′, B′ and C′ be
the points where the lines AO, BO and CO intersect the opposite sides BC, CA and AB. One can look
at the ratios r(AB) = AC′/C′B, r(BC) = BA′/A′C and r(CA) = CB′/B′A in which these points divide
the sides of the triangle. The Ceva theorem is

Theorem: r(AB)r(BC)r(CA) = 1

The theorem is named after Giovanni Ceva, who wrote it down in 1678. The result is older
however: Al-Mu'taman ibn Hud from Zaragoza proved it already in the 11th century [343].
See [617].
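A numerical sketch with a fixed triangle and an interior point O given by positive barycentric weights (all values are arbitrary choices): A′, B′, C′ are computed as the intersections of the lines AO, BO, CO with the opposite sides, and the product of the three ratios comes out as 1 up to rounding.

```python
import math

def intersect(p1, p2, q1, q2):
    """Intersection point of the lines p1p2 and q1q2 (assumed not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, q1, q2
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def ceva_product(A, B, C, O):
    A1 = intersect(A, O, B, C)   # A' on BC
    B1 = intersect(B, O, C, A)   # B' on CA
    C1 = intersect(C, O, A, B)   # C' on AB
    d = math.dist
    return (d(A, C1) / d(C1, B)) * (d(B, A1) / d(A1, C)) * (d(C, B1) / d(B1, A))

A, B, C = (0.0, 0.0), (5.0, 0.0), (1.0, 4.0)
w = (0.2, 0.5, 0.3)              # positive barycentric weights, so O is interior
O = tuple(w[0] * A[k] + w[1] * B[k] + w[2] * C[k] for k in range(2))
print(ceva_product(A, B, C, O))  # 1.0 up to rounding
```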

160. Angle theorem

Given a circle C in the plane R², denote by M its center point. Pick two points A, B on C.
If P is a point on C, then the angle ∠APB is the same for all points P on C which lie on the same
side of the line AB as M. The angle ∠APB is called the inscribed angle of the secant
AB. The next theorem is also called the inscribed angle theorem.

Theorem: The angle ∠APB is half the angle ∠AMB.

The theorem is believed to have been known already to Thales of Miletus (624 - 546 BC), the first
Greek mathematician known by name. In the special case where A, B are the endpoints of a diameter,
it is usually called Thales' theorem: then the angle ∠APB is a right angle. A consequence
of the theorem is that the opposite angles of a quadrilateral which is inscribed in a circle add
up to π. Unlike the special case of the right angle, which immediately follows from symmetry,
the full version of Thales' theorem can surprise at first.
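A small numerical check of the inscribed angle theorem on the unit circle, assuming the sample positions of A, B and P chosen below; the helper `angle` is our own.

```python
import math

# Illustrative sketch; the helper name is our own.
def angle(vertex, p, q):
    """Angle at `vertex` between the rays towards p and q, in [0, pi]."""
    a = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    b = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

M = (0.0, 0.0)                                  # center of the unit circle
A = (math.cos(-0.4), math.sin(-0.4))
B = (math.cos(1.1), math.sin(1.1))              # central angle AMB = 1.5
for t in (2.0, 3.0, 4.0, 5.0):                  # P on the arc on the same side of AB as M
    P = (math.cos(t), math.sin(t))
    print(t, angle(P, A, B), angle(M, A, B) / 2)
```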

161. Total curvature

A smooth simple closed curve C in $\mathbb{R}^3$ is called a knot. If r(t), t ∈ [0, 2π], is a parametrization of
C, then $\kappa(t) = |r'(t) \times r''(t)|/|r'(t)|^3$ is called the curvature of the parametrization r at
the point r(t). The integral $K(C) = \int_0^{2\pi} \kappa(t)\,|r'(t)|\,dt$ is the total curvature of C. We say C is
unknotted if C can be continuously deformed to the circle $S = \{x^2 + y^2 = 1, z = 0\} = \{r_1(t) =
(\cos(t), \sin(t), 0), t \in [0, 2\pi]\}$, meaning that there exists a smooth function R(t, s) such that
R(t, 0) = r(t) and R(t, 1) = $r_1(t)$ and such that for any s, the curve $C_s : t \mapsto R(t, s)$ is a simple
closed curve.
Theorem: If C is a knot and K(C) ≤ 4π, then C is unknotted.

This is the theorem of Fáry-Milnor, proven by Fáry in 1949 and by Milnor in 1950. The theorem
also follows from the existence of quadrisecants, which are lines intersecting the knot in 4
points [173]. The existence of quadrisecants was proven by Erika Pannwitz in 1933 for smooth
knots and generalized in 1997 by Greg Kuperberg to tame knots, knots which are equivalent
to polygonal knots.
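The total curvature can be approximated numerically. The sketch below is our own; the trefoil parametrization r(t) = (sin t + 2 sin 2t, cos t − 2 cos 2t, −sin 3t) is an assumed standard example. It integrates κ|r′| = |r′ × r″|/|r′|² and should give K = 2π for the circle, while the knotted trefoil must exceed 4π by the Fáry-Milnor theorem.

```python
import math

# Illustrative sketch; the function names are our own.
def total_curvature(dr, ddr, samples=20000):
    """Approximate K = integral over [0, 2pi] of |r' x r''| / |r'|^2 dt.

    dr and ddr return the first and second derivative of a 2pi-periodic curve.
    The rectangle rule is very accurate for smooth periodic integrands.
    """
    h = 2 * math.pi / samples
    total = 0.0
    for i in range(samples):
        a, b = dr(i * h), ddr(i * h)
        c = (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])
        total += math.sqrt(sum(x * x for x in c)) / sum(x * x for x in a) * h
    return total

# unit circle in the xy-plane
circle_d = lambda t: (-math.sin(t), math.cos(t), 0.0)
circle_dd = lambda t: (-math.cos(t), -math.sin(t), 0.0)

# trefoil knot r(t) = (sin t + 2 sin 2t, cos t - 2 cos 2t, -sin 3t)
tref_d = lambda t: (math.cos(t) + 4 * math.cos(2 * t), -math.sin(t) + 4 * math.sin(2 * t), -3 * math.cos(3 * t))
tref_dd = lambda t: (-math.sin(t) - 8 * math.sin(2 * t), -math.cos(t) + 8 * math.cos(2 * t), 9 * math.sin(3 * t))

print(total_curvature(circle_d, circle_dd) / math.pi)  # close to 2
print(total_curvature(tref_d, tref_dd) / math.pi)      # larger than 4
```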

162. Morley's Theorem

An angle trisector of an angle α = ∠(CAB) in $\mathbb{R}^2$ is a pair of lines PA, QA through A such
that the angles ∠(CAP), ∠(PAQ), ∠(QAB) are all equal. Given a triangle ABC, we can look
at the angle trisectors at each vertex and, for each side, intersect the two trisectors adjacent
to that side, leading to a triangle PQR inside the triangle. The triangle PQR is called the
Morley triangle of ABC. Morley's theorem is
Theorem: For any triangle ABC , the Morley triangle is equilateral.

Morley's theorem was discovered in 1899 by Frank Morley. A short proof was given in 1995
by John H. Conway: assume the triangle ABC has angles 3α, 3β, 3γ, so that α + β + γ = π/3.
Start with an equilateral triangle PQR of side length 1. Build three triangles PQA with angles
β + π/3, α, γ + π/3, QCA with angles α + π/3, γ, β + π/3 and RBQ with angles
γ + π/3, β, α + π/3. Then fill in three other triangles ACQ, CBR, BAP with angles α, γ, β + 2π/3;
γ, β, α + 2π/3; and β, α, γ + 2π/3. These triangles fit together to a triangle of the shape ABC.
See [222].
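The following sketch (our own; helper names are hypothetical) constructs the Morley triangle of a sample triangle by rotating each side direction by one third of the interior angle and intersecting, for each side, the two trisectors adjacent to it. The three side lengths it prints should agree up to rounding.

```python
import math

# Illustrative sketch; all helper names are our own.
def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def trisector(P, Q, R):
    """Direction at vertex P of the trisector adjacent to side PQ (ray PQ rotated towards R)."""
    u, v = sub(Q, P), sub(R, P)
    phi = math.atan2(abs(cross(u, v)), u[0] * v[0] + u[1] * v[1])  # interior angle at P
    t = phi / 3 if cross(u, v) > 0 else -phi / 3                  # rotate towards R
    return (u[0] * math.cos(t) - u[1] * math.sin(t), u[0] * math.sin(t) + u[1] * math.cos(t))

def meet(P, d, Q, e):
    """Intersection of the lines P + s d and Q + t e."""
    s = cross(sub(Q, P), e) / cross(d, e)
    return (P[0] + s * d[0], P[1] + s * d[1])

def morley_triangle(A, B, C):
    R = meet(A, trisector(A, B, C), B, trisector(B, A, C))   # vertex near side AB
    P = meet(B, trisector(B, C, A), C, trisector(C, B, A))   # vertex near side BC
    Q = meet(C, trisector(C, A, B), A, trisector(A, C, B))   # vertex near side CA
    return P, Q, R

P, Q, R = morley_triangle((0.0, 0.0), (7.0, 0.0), (1.0, 3.0))
print(math.dist(P, Q), math.dist(Q, R), math.dist(R, P))
```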

163. Rising sun lemma

Given an interval [a, b], the space C([a, b]) denotes the vector space of all continuous functions
on [a, b]. For g ∈ C([a, b]), consider the set $E(g) = \{x \in (a, b) \mid g(t) > g(x)$ for some $t > x\}$.
We say g has the rising sun property if E = E(g) is open, E is empty if and only if g is
non-increasing, and, if E is not empty, then E can be written as $E = \bigcup_n (a_n, b_n)$ with pairwise
disjoint intervals satisfying $g(a_n) \le g(b_n)$. See [54].

Theorem: Every f ∈ C([a, b], R) has the rising sun property.


The theorem is due to F. Riesz. According to [54], the name "rising sun lemma" appeared first
in [29]. The picture is to draw the graph of the function f. If light comes from a distant source
on the right, parallel to the x-axis, then the intervals $(a_n, b_n)$ delimit the hollows that remain
in the shade at the moment of sunrise. The lemma is used in real analysis to prove that every
monotone non-decreasing function is almost everywhere differentiable.
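On a grid, the shaded set E can be approximated with a running maximum taken from the right. The sketch below is a discretization we chose for illustration (grid size `n` and tolerance `tol` are assumptions). For g(x) = sin(x) on [0, 4π] it finds the shaded intervals (0, π/2), (π/2, 5π/2) and (3π, 4π).

```python
import math

# Illustrative sketch; the function name and parameters are our own.
def shaded_intervals(g, a, b, n=800, tol=1e-12):
    """Approximate E(g) = {x in (a, b) : g(t) > g(x) for some t > x}.

    A grid point x_i is shaded if some later sample exceeds g(x_i) by more than tol.
    Maximal runs of shaded points i..j are reported as intervals (x_{i-1}, x_{j+1}).
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [g(x) for x in xs]
    best_to_right = [-math.inf] * (n + 1)
    for i in range(n - 1, -1, -1):
        best_to_right[i] = max(best_to_right[i + 1], ys[i + 1])
    shaded = [0 < i < n and best_to_right[i] > ys[i] + tol for i in range(n + 1)]
    intervals, i = [], 1
    while i < n:
        if shaded[i]:
            j = i
            while shaded[j + 1]:   # shaded[n] is False, so this stops
                j += 1
            intervals.append((xs[i - 1], xs[j + 1]))
            i = j + 1
        else:
            i += 1
    return intervals

for lo, hi in shaded_intervals(math.sin, 0.0, 4 * math.pi):
    print(lo / math.pi, hi / math.pi)
```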

164. Uniform continuity

Uniform continuity is a stronger version of continuity. But unlike continuity, which is defined for
maps between topological spaces, uniform continuity needs more structure, like a metric space
or, more generally, a topological space with a uniform structure. Given two metric spaces X
and Y, a function f : X → Y is called continuous if $f^{-1}(U)$ is open for every open U in Y. A
function f is called uniformly continuous if there exists a sequence of numbers $M_n \to 0$ such
that for every positive n ∈ N, the condition d(x, y) ≤ 1/n implies that d(f(x), f(y)) ≤ $M_n$.

Theorem: For compact X , continuous implies uniformly continuous.

The theorem is due to Eduard Heine and Georg Cantor. Heine is known also for the Heine-Borel
theorem, which states that in Euclidean spaces, the class of closed and bounded sets
agrees with the class of compact sets. The proof of the Heine-Cantor theorem uses the
extreme value theorem, assuring that a continuous function on a compact space X achieves
a maximum. Look for every n and every x at the minimal $M_n(x)$ such that if |x − y| ≤ 1/n,
then |f(x) − f(y)| ≤ $M_n(x)$. Now $M_n(x)$ is non-negative and finite and depends continuously
on x. By the extreme value theorem there is a maximum. We call it $M_n$. This assures now
that if |x − y| ≤ 1/n, then |f(x) − f(y)| ≤ $M_n$. The Bolzano-Weierstrass or sequential
compactness theorem assures that a bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.
This is used in the intermediate value theorem, assuring that if f(a) < 0 and f(b) > 0, then
there is an x with f(x) = 0. The Heine-Cantor theorem together with the intermediate value
theorem assures that continuous functions are Riemann integrable. The additional uniform
or metric structure is also necessary when defining completeness in the sense that
every Cauchy sequence converges. Completeness is not a property of topological spaces: (0, 1)
is not complete but R is complete, even though the two spaces are homeomorphic.
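To make the numbers $M_n$ concrete, the sketch below (our own grid discretization; the function name `modulus` is hypothetical) approximates the best $M_n$ for f(x) = √x on the compact interval [0, 1], where the exact value is √(1/n), attained at x = 0, y = 1/n.

```python
import math

# Illustrative sketch; the function name is our own.
def modulus(f, a, b, delta, n=1000):
    """Sampled M = max |f(x) - f(y)| over grid points x, y with |x - y| <= delta."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    best = 0.0
    for i in range(n + 1):
        j = i + 1
        while j <= n and xs[j] - xs[i] <= delta + 1e-12:
            best = max(best, abs(ys[j] - ys[i]))
            j += 1
    return best

# sqrt is continuous on the compact interval [0, 1], hence uniformly continuous: M_n -> 0.
for n in (10, 100):
    print(n, modulus(math.sqrt, 0.0, 1.0, 1.0 / n), math.sqrt(1.0 / n))
```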

165. Jordan normal form

An n × n matrix A is similar to another n × n matrix B if there exists an invertible n × n
matrix S such that $B = S^{-1} A S$. A matrix is in Jordan normal form (also called Jordan
canonical form) if it is block diagonal, where each block is a Jordan block. An m × m matrix
J is a Jordan block if $J e_1 = \lambda e_1$ and $J e_k = \lambda e_k + e_{k-1}$ for k = 2, ..., m. An example of a
3 × 3 Jordan block is
$$J = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$
In other words, J is of the form $J = \lambda I + N$, where N is nilpotent: $N^m = 0$, and more precisely
N only has entries 1 in the superdiagonal directly above the diagonal.

Theorem: Every complex n × n matrix is similar to a matrix in Jordan normal form.

Up to the order of the Jordan blocks, the Jordan normal form is unique. If each Jordan block
is a 1 × 1 matrix, then the matrix is called diagonalizable. The spectral theorem assures
that a normal matrix, $AA^* = A^*A$, is diagonalizable. Not every matrix is diagonalizable, as the
shear matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, a 2 × 2 Jordan block, shows. The theorem was first stated
by Camille Jordan in 1870. For history, see [90]. The Jordan-Chevalley generalization states
that over an arbitrary perfect field, a matrix can be written as B + N, where B is semi-simple,
N is nilpotent and BN = NB (see [359] page 17). A matrix B is called semi-simple if every
B-invariant linear subspace V has a complementary B-invariant subspace. For algebraically
closed fields, semi-simple is equivalent to being conjugate to a diagonal matrix. As for the condition
on the field: a field k is called perfect if every irreducible polynomial over k has distinct roots.
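A dependency-free check in plain Python with exact fractions (the helpers `matmul` and `rank` are our own) of two facts from this section: the nilpotent part N of the 3 × 3 example satisfies N² ≠ 0 and N³ = 0, and the shear matrix has only a one-dimensional eigenspace, so it cannot be diagonalizable.

```python
from fractions import Fraction

# Illustrative sketch; matmul and rank are our own helpers.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(len(A)):
            if i != r and A[i][c] != 0:
                factor = A[i][c] / A[r][c]
                A[i] = [x - factor * y for x, y in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r

J = [[3, 1, 0], [0, 3, 1], [0, 0, 3]]
N = [[J[i][j] - (3 if i == j else 0) for j in range(3)] for i in range(3)]   # J = 3I + N
N2 = matmul(N, N)
N3 = matmul(N2, N)
print(N2)   # [[0, 0, 1], [0, 0, 0], [0, 0, 0]], so N^2 != 0
print(N3)   # [[0, 0, 0], [0, 0, 0], [0, 0, 0]], so N^3 = 0

shear = [[1, 1], [0, 1]]
shear_minus_identity = [[shear[i][j] - (1 if i == j else 0) for j in range(2)] for i in range(2)]
# The only eigenvalue is 1; its eigenspace has dimension 2 - rank(A - I) = 1 < 2.
print(2 - rank(shear_minus_identity))   # 1
```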

166. Hippocrates theorem

The Hippocrates theorem, dealing with the lunes of Hippocrates or the lunes of Alhazen,
is a theorem in planar geometry: given a triangle ABC in $\mathbb{R}^2$ with a right angle β at B, one can
draw the circles with diameters AC, AB and BC centered at the midpoints (A + C)/2, (A + B)/2
and (B + C)/2. The parts of the two smaller discs lying outside the large disc are two
"moon-shaped" regions U, V bounded by circular arcs, called the lunes.

Theorem: The area of U plus the area of V is the area of the triangle.

The proof directly follows from Pythagoras by relating the areas of half discs and the triangle. The
result is remarkable as it was historically the first attempt at the quadrature of the circle.
The lunes are bounded by circular arcs, while the triangle is bounded by line segments. The theorem
does the quadrature of the lunes. Hippocrates of Chios lived from about 470 to 410 BC. The
theorem was first mentioned in Simplicius's commentary on Aristotle's Physics and in more
modern times received attention in 1870 (see [328] page 539). For more history see [39]
page 37.
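The areas can also be compared directly, without using Pythagoras in the code: each lune is a half disc over a leg minus the circular segment of the large circle cut off by that leg. A sketch, assuming a right angle at B and the sample leg lengths below (function names are our own):

```python
import math

# Illustrative sketch; the function names are our own.
def segment_area(radius, chord):
    """Area of the circular segment cut off by a chord of the given length."""
    theta = 2 * math.asin(chord / (2 * radius))   # central angle of the chord
    return radius ** 2 * (theta - math.sin(theta)) / 2

def lunes_area(ab, bc):
    """Total area of the two lunes over the legs AB, BC of a triangle with right angle at B."""
    big_r = math.hypot(ab, bc) / 2                # the hypotenuse AC is a diameter
    total = 0.0
    for leg in (ab, bc):
        half_disc = math.pi * (leg / 2) ** 2 / 2       # half disc outside the triangle
        total += half_disc - segment_area(big_r, leg)  # remove the part inside the big disc
    return total

for ab, bc in ((3.0, 4.0), (1.0, 1.0), (2.0, 7.5)):
    print(lunes_area(ab, bc), ab * bc / 2)
```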

167. Fermat-Hamilton principle

A point x is called a critical point of a differentiable function $f : \mathbb{R}^m \to \mathbb{R}$ if $\nabla f(x) = 0$,
where ∇f is the gradient of f. A point $x_0$ is called a local maximum of f if there exists r > 0
such that $f(x) \le f(x_0)$ for all $|x - x_0| < r$. The local maximum does not have to be isolated.
For a constant function, for example, every point is a local maximum. The local maximum also
does not have to be a global maximum. The function $f(x) = x^4 - x^2$ has a local maximum
at x = 0, but this is not a global maximum because f(2) > f(0).

Theorem: If x0 is a local maximum of f , then ∇f (x0 ) = 0.

This generalizes to the calculus of variations, where ∇f is replaced by the variation. In
the case when $S(x) = \int_a^b L(x(t), x'(t))\,dt$ is a function on the space of curves [a, b] → $\mathbb{R}^n$ (one
calls this then a functional or action functional), we can look at the problem of minimizing
the action. In that case, the gradient condition is
$\delta S = L_x(x(t), x'(t)) - \frac{d}{dt} L_{x'}(x(t), x'(t)) = 0$. This so-called Hamilton principle can
be seen as a generalization of the Fermat principle to infinite dimensions. The equations
δS = 0 are called the Euler-Lagrange equations or Lagrange equations of the second kind.
They are the starting point of Lagrangian mechanics. Fermat's original paper deals with the
single variable situation, but the higher-dimensional situation is similar. Fermat in some sense
already looked at an action principle: light travelling through two media with different
properties, like water and air, follows the path which minimizes the travel time (the arc length
weighted by the refractive index). This is described by the Fermat law or Fermat's
principle.
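Fermat's principle together with the critical point condition yields Snell's law. The sketch below uses a hypothetical setup of our own (light goes from (0, 1) to (1, −1), the interface is y = 0, speeds v1 = 1 and v2 = 0.75). It minimizes the convex travel time by ternary search and checks sin θ1 / v1 = sin θ2 / v2, which is exactly the equation T′(x) = 0.

```python
import math

# Illustrative sketch; points, speeds and function names are assumptions.
def travel_time(x, v1, v2):
    """Time from (0, 1) in medium 1 to (1, -1) in medium 2, crossing y = 0 at (x, 0)."""
    return math.hypot(x, 1.0) / v1 + math.hypot(1.0 - x, 1.0) / v2

def fermat_point(v1, v2, lo=0.0, hi=1.0, steps=200):
    """Minimize the convex travel time by ternary search."""
    for _ in range(steps):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if travel_time(m1, v1, v2) < travel_time(m2, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

v_air, v_water = 1.0, 0.75
x = fermat_point(v_air, v_water)
sin1 = x / math.hypot(x, 1.0)                 # sine of the angle of incidence
sin2 = (1.0 - x) / math.hypot(1.0 - x, 1.0)   # sine of the angle of refraction
print(x, sin1 / v_air, sin2 / v_water)        # the last two agree: Snell's law
```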

168. Alternating sign

An alternating sign matrix is a square matrix with entries in {0, 1, −1} such that the sum
of each row and column is 1 and the nonzero entries in each row and column alternate in sign.
Theorem: The number of n × n alternating sign matrices is $\prod_{k=0}^{n-1} \frac{(3k+1)!}{(n+k)!}$.

The numbers $\prod_{k=0}^{n-1} (3k+1)!/(n+k)!$ are known as the Robbins numbers or Andrews-Mills-
Robbins-Rumsey numbers and form the integer sequence A005130 [1]. The alternating sign
conjecture was popularized by David Robbins in [603]. The theorem was proven by Doron
Zeilberger in 1994 [770]. A short proof was given by Greg Kuperberg in 1996 [452]. A book
about it is [96].
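The formula can be checked for small n with a short dynamic program (our own sketch; function names are hypothetical). A matrix with entries in {−1, 0, 1} is an alternating sign matrix exactly when all partial row sums and all partial column sums are 0 or 1 and every full row and column sum is 1, so it suffices to track the vector of partial column sums row by row.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Illustrative sketch; the function names are our own.
def robbins(n):
    """prod_{k=0}^{n-1} (3k+1)! / (n+k)! as an exact fraction."""
    value = Fraction(1)
    for k in range(n):
        value *= Fraction(factorial(3 * k + 1), factorial(n + k))
    return value

def count_asm(n):
    """Count n x n alternating sign matrices via partial column sums (0/1 vectors)."""
    states = {(0,) * n: 1}
    vectors = list(product((0, 1), repeat=n))
    for _ in range(n):
        new = {}
        for s, cnt in states.items():
            for t in vectors:
                partial, ok = 0, True
                for a, b in zip(s, t):      # the new row is t - s
                    partial += b - a
                    if partial not in (0, 1):
                        ok = False
                        break
                if ok and partial == 1:
                    new[t] = new.get(t, 0) + cnt
        states = new
    return states.get((1,) * n, 0)

for n in range(1, 6):
    print(n, count_asm(n), robbins(n))   # 1, 2, 7, 42, 429
```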

169. Combinatorial convexity

A finite set P of points in $\mathbb{R}^d$ is called r-convex if there is a partition of P into r sets such that
their convex hulls have a common point. Tverberg's theorem states:

Theorem: A set of (r − 1)(d + 1) + 1 points in Rd is r-convex.

The decomposition of P into r subsets is called a Tverberg partition. In the one-dimensional
case d = 1, the theorem assures that 2r − 1 points on the line are r-convex. For r = 3, for
example, this means that 5 points are 3-convex. If the points are arranged x1 < x2 < x3 < x4 < x5,
a Tverberg partition is {{x1, x4}, {x2, x5}, {x3}}. For r = 2, it implies Radon's theorem, which
tells that d + 2 points in $\mathbb{R}^d$ can be partitioned into 2 sets whose convex hulls intersect. For
example, 4 points {x1, x2, x3, x4} in $\mathbb{R}^2$ can be partitioned into two sets such that their convex
hulls intersect. Indeed, if the 4 points are the vertices of a convex quadrilateral x1x2x3x4, the
partition {{x1, x3}, {x2, x4}} defines the two diagonals of the quadrilateral, which intersect;
otherwise one of the points lies in the triangle formed by the other three. The theorem was
proven by Helge Tverberg in 1966. See [714, 364].
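For d = 1, a Tverberg partition of 2r − 1 points can be written down explicitly: pair the i-th smallest with the i-th largest point and keep the median alone, so that every part contains the median in its convex hull. This pairing is a different, equally valid partition from the one listed in the text; the sketch and its function names are our own.

```python
# Illustrative sketch; the function names are our own.
def tverberg_partition_1d(points):
    """Partition 2r - 1 reals into r parts whose convex hulls all contain the median."""
    xs = sorted(points)
    m = len(xs)
    if m % 2 == 0:
        raise ValueError("expects an odd number 2r - 1 of points")
    r = (m + 1) // 2
    parts = [[xs[i], xs[m - 1 - i]] for i in range(r - 1)] + [[xs[r - 1]]]
    return parts, xs[r - 1]

def hulls_share(parts, point):
    """True if `point` lies in the convex hull (an interval) of every part."""
    return all(min(p) <= point <= max(p) for p in parts)

parts, median = tverberg_partition_1d([0.3, 2.0, -1.5, 4.2, 1.1])
print(parts, hulls_share(parts, median))
```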

170. The Umlaufsatz

Let r be a continuously differentiable simple closed curve in $\mathbb{R}^2$. If r(t) is a parametrization for
which the speed is 1, we have $r'(t) = (\cos(\alpha(t)), \sin(\alpha(t)))$ and a signed curvature $\kappa(t) = \alpha'(t)$.
If [0, L] is the parameter interval, where L is the length of the curve, then $K = \int_0^L \kappa(t)\,dt$ is the
total curvature. The Hopf Umlaufsatz is:

Theorem: For a simple closed curve $r \in C^1$ traversed counterclockwise, the total curvature is 2π.

The theorem was proven in 1935 by Heinz Hopf [349] using a homotopy argument. Parametrize the
curve over [0, 1] and define f(s, t) as the argument of the line through r(s) and r(t), extended
continuously to s = t by the argument of the tangent line. The direct path from (0, 0) to (1, 1)
in the parameter st-plane gives a total angle change of 2πn, where n is an integer. Now deform
this path so that it first goes straight from (0, 0) to (0, 1), then straight from (0, 1) to (1, 1).
Both segments produce an angle change of π, showing that n = 1. The theorem can be
generalized to a Gauss-Bonnet theorem for planar regions G: the total curvature of the
boundary is 2π times the Euler characteristic of G. For a discrete version, see [421].
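The total curvature divided by 2π is the turning number of the tangent vector. A numerical sketch of our own accumulates the angle of r′(t): an ellipse gives 1, while the limaçon (1 + 2 cos t)(cos t, sin t), which is closed but not simple because of its inner loop, gives 2. This illustrates why the curve has to be simple.

```python
import math

# Illustrative sketch; the function names are our own.
def tangent_angle(dr, t):
    dx, dy = dr(t)
    return math.atan2(dy, dx)

def turning_number(dr, samples=4000):
    """Total change of the tangent angle over t in [0, 2pi], divided by 2pi."""
    total = 0.0
    prev = tangent_angle(dr, 0.0)
    for i in range(1, samples + 1):
        cur = tangent_angle(dr, 2 * math.pi * i / samples)
        total += (cur - prev + math.pi) % (2 * math.pi) - math.pi   # unwrap atan2 jumps
        prev = cur
    return total / (2 * math.pi)

# derivative of the ellipse r(t) = (3 cos t, sin t)
ellipse = lambda t: (-3 * math.sin(t), math.cos(t))
# derivative of the limacon r(t) = (1 + 2 cos t)(cos t, sin t)
limacon = lambda t: (-math.sin(t) * (1 + 4 * math.cos(t)), math.cos(t) + 2 * math.cos(2 * t))

print(turning_number(ellipse), turning_number(limacon))   # about 1 and 2
```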

171. Frobenius determinant

The Frobenius determinant theorem tells how the determinant of the "multiplication table
matrix" factors into irreducible polynomials: if G = {g_1, ..., g_n} is a finite group and $x_i$ is a
variable associated to the group element $g_i$, then the group matrix $A_{ij} = x_{g_i g_j^{-1}}$ satisfies

Theorem: $\det(A) = \prod_{j=1}^{r} p_j(x_1, \dots, x_n)^{d_j}$

Here, $d_j = \deg(p_j)$ and r is the number of conjugacy classes of G. For an Abelian group G,
there are n conjugacy classes, so that all factors $p_j$ are linear. The theorem had been
conjectured in 1896 by Richard Dedekind. Frobenius proved it. See [326, 147].
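For the Klein four group $\mathbb{Z}_2 \times \mathbb{Z}_2$, every element is its own inverse, so the multiplication table matrix $x_{g_i g_j}$ coincides with the group matrix $x_{g_i g_j^{-1}}$. For an Abelian group the factors are the linear forms $\sum_g \chi(g) x_g$ given by the characters χ. The sketch below (our own, with arbitrarily chosen integer values for the variables) compares both sides exactly.

```python
from itertools import permutations, product

# Illustrative sketch; the function names and sample values are our own.
def det(M):
    """Determinant via the Leibniz formula (fine for small matrices, exact for integers)."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        term = sign
        for i in range(n):
            term *= M[i][perm[i]]
        total += term
    return total

G = list(product((0, 1), repeat=2))          # the Klein four group Z2 x Z2
def add(g, h):
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

x = {g: v for g, v in zip(G, (5, -2, 3, 7))}  # arbitrary integer values for the variables
A = [[x[add(g, h)] for h in G] for g in G]

# Characters chi_c(g) = (-1)^(c . g); each gives a linear factor sum_g chi_c(g) x_g.
prod_of_factors = 1
for c in G:
    prod_of_factors *= sum((-1) ** (c[0] * g[0] + c[1] * g[1]) * x[g] for g in G)
print(det(A), prod_of_factors)
```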

172. König's theorem

A matching M in a finite simple graph G = (V, E) with vertex set V and edge set E
is a subset M of the edges E in which no two edges have a common vertex. A vertex cover
C is a set of vertices such that every edge in E has at least one endpoint in C.
A bipartite graph is a graph for which V = V1 ∪ V2 can be partitioned into two disjoint sets
V1, V2 such that all edges connect vertices from different sets. Kőnig's theorem from 1931, also
known as the Kőnig-Egerváry theorem, is:

Theorem: For bipartite G, matching number = vertex cover number.

The vertex cover problem, the problem of finding the vertex cover number, is a classical
NP-complete problem. For example, for a cyclic graph $G = C_{2n}$ with 2n vertices {1, 2, 3, ..., 2n}
(which is an example of a bipartite graph), the set C = {2, 4, ..., 2n} is a minimum vertex cover.
The edges M = {(1, 2), (3, 4), ..., (2n − 1, 2n)} form a maximum matching. The example of an
odd cyclic graph like $C_9$ (which is not bipartite) already shows that the bipartite condition is
necessary: for $C_9$, the set {1, 3, 5, 7, 8} is a minimum cover of size 5, while
M = {(1, 2), (3, 4), (5, 6), (7, 8)} is a maximum matching of size 4.

The origin of the theorem is attributed to Dénes Kőnig, who proved it in 1931 and wrote a
precursor paper in 1916, where he proved that a regular (constant vertex degree) bipartite
graph has a perfect matching (a matching which covers all vertices). For a proof, see [178]
(Chapter 2).
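A brute-force sketch of our own (exponential time, only meant for tiny graphs; function names are hypothetical) that computes the matching number and the vertex cover number of the cycle graphs $C_{10}$ and $C_9$ discussed above.

```python
from itertools import combinations

# Illustrative brute force; the function names are our own.
def cycle_edges(n):
    """Edges of the cycle graph C_n on the vertices 1..n."""
    return [(i, i % n + 1) for i in range(1, n + 1)]

def matching_number(edges):
    for k in range(len(edges), 0, -1):
        for M in combinations(edges, k):
            used = [v for e in M for v in e]
            if len(used) == len(set(used)):   # no two edges share a vertex
                return k
    return 0

def vertex_cover_number(n, edges):
    for k in range(n + 1):
        for C in combinations(range(1, n + 1), k):
            s = set(C)
            if all(u in s or v in s for u, v in edges):
                return k

for n in (10, 9):
    E = cycle_edges(n)
    print(n, matching_number(E), vertex_cover_number(n, E))   # equal only for the even cycle
```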

173. Polynomial Ergodic theorems

Birkhoff's ergodic theorem, stating that $S_{n,f}(x) = \frac{1}{n}\sum_{k=1}^{n} f(T^k x)$ converges for n → ∞
pointwise for µ-almost every x for an automorphism T of a probability space (X, A, µ) and a function
f ∈ $L^p(X)$ with 1 ≤ p < ∞, was generalized in 1988 by Jean Bourgain [84] to polynomial
averages $S_{P,n,f}(x) = \frac{1}{n}\sum_{k=1}^{n} f(T^{P(k)} x)$, where P is a polynomial with integer coefficients.

Theorem: $S_{P,n,f}(x)$ converges pointwise almost everywhere if p > 1.


Bourgain first proves in [84] a maximal ergodic theorem and extends it also to $\mathbb{Z}^d$ actions
generated by d commuting transformations. The starting point is that for f ∈ $L^2(X, µ)$, there is
for any integer t a bound $\|S_{P,n,f}\|_2 \le C\|f\|_2$ for $P(k) = k^t$. This implies for example that
$\frac{1}{n}\sum_{k=0}^{n-1} f(x + k^t\alpha) \to \int_0^1 f(x)\,dx$ for almost every x, for any irrational α and any bounded
measurable function f. The case t = 2 leads to sums like $\frac{1}{n}\sum_{k=0}^{n-1} e^{2\pi i k^2 \alpha}$, which relate to
Gauss sums $S(q, a) = \frac{1}{q}\sum_{k=0}^{q-1} e^{2\pi i k^2 a/q}$. One can for example estimate
$\left|\sum_{k=0}^{n-1} e^{2\pi i k^2 \alpha}\right| \le C\left(n/\sqrt{q} + \sqrt{n \log(q)} + \sqrt{q \log(q)}\right)$ if $|\alpha - a/q| \le 1/q^2$ [84]. The case
p = 1 is known to fail [103]. The results have been generalized to correlation expressions
like $S_{n,f,g,a,b}(x) = \frac{1}{n}\sum_{k=0}^{n-1} f(T^{ak}x)\,g(T^{bk}x)$ for integers a, b, where f ∈ $L^p$, g ∈ $L^q$ with
1 < p, q ≤ ∞ and 1/p + 1/q < 3/2 [171, 453], and to non-conventional bilinear polynomial averages
$\frac{1}{n}\sum_{k=0}^{n-1} f(T^k x)\,g(T^{P(k)} x)$ [35], where P is an integer polynomial of degree d ≥ 2 and f ∈ $L^p$,
g ∈ $L^q$ with 1 < p, q ≤ ∞ and 1/p + 1/q ≤ 1.
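A hedged numerical illustration, not taken from [84]: for the circle rotation T(x) = x + α mod 1 with α = √2 and P(k) = k², the averages of f(y) = cos²(2πy) approach ∫f = 1/2. For this particular rotation Weyl's equidistribution theorem even gives convergence for every starting point x; the function name and parameters are our own.

```python
import math

# Illustrative sketch; the function name and sample parameters are our own.
def quadratic_average(f, x, alpha, n):
    """(1/n) * sum_{k<n} f(x + k^2 alpha mod 1), a polynomial average for a circle rotation."""
    return sum(f((x + (k * k) * alpha) % 1.0) for k in range(n)) / n

f = lambda y: math.cos(2 * math.pi * y) ** 2     # its integral over [0, 1] is 1/2
alpha = math.sqrt(2)                             # an irrational rotation number
for n in (100, 10_000, 100_000):
    print(n, quadratic_average(f, 0.1, alpha, n))
```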

174. Wantzel's theorem on Angle trisection

A classical problem in geometry asks to trisect an angle using an unmarked straightedge
(ruler) and compass only. The restriction to ruler and compass constructions goes back to
Euclid. Archimedes already knew how to solve the problem using a marked straightedge, meaning
that one has, in addition to the constructed points, an additional real number to work with.
One can also trisect an angle using an additional curve like an Archimedean spiral [22], given in
polar coordinates as r = θ. If (x, y) = (r cos(θ), r sin(θ)) is the point where the ray of angle θ
meets the spiral, then its distance to the origin is $r = \sqrt{x^2 + y^2} = \theta$. Trisecting this radius
(which can be done with ruler and compass) and intersecting the circle of radius r/3 with the
spiral gives the point with angle θ/3. More generally, a curve which can be used to trisect an
angle is called a trisectrix.

Theorem: One can not trisect a general angle with ruler and compass.

The theorem follows from Galois theory. Since cos(α) = 4cos³(α/3) − 3cos(α/3), an angle α can be
trisected if and only if the polynomial $4x^3 - 3x - \cos(\alpha)$ is reducible over the field $\mathbb{Q}(\cos(\alpha))$.
The angle α = 60° = π/3, for example, is not trisectable. The first proof of the impossibility of
trisecting an arbitrary angle was given by Pierre Wantzel in 1837. Wantzel also solved there the
problem of doubling the cube and characterized constructible regular n-gons as the ones with
$n = 2^k p_1 \cdots p_m$ with distinct Fermat primes $p_j = 2^{2^{m_j}} + 1$. Bieberbach realized in 1932 that every
cubic construction can be traced back to the trisection of an angle and the extraction of the
third root [63]. This was formulated more precisely by Gleason in 1988 [268], who states in
that article as Theorem 1: a real cubic equation can be solved geometrically using ruler,
compass and angle-trisector if and only if its roots are all real. Gleason shows from this
also that a regular n-gon can be constructed by ruler, compass and angle-trisector if and
only if the prime factorization of n has the form $2^r 3^s p_1 p_2 \cdots p_k$ with k ≥ 0, where all primes
$p_j > 3$ are distinct and have the form $2^a 3^b + 1$. An example is $p = 13 = 2^2 \cdot 3 + 1$. The
corresponding 13-gon is called the triskaidecagon, for which Gleason gives a concrete construction
using that the numbers 2cos(2πk/13), k = 1, ..., 6, are the roots of the polynomial
$x^6 + x^5 - 5x^4 - 4x^3 + 6x^2 + 3x - 1$, which factors over $\mathbb{Q}(\sqrt{13})$: with $\lambda = (1 - \sqrt{13})/2$ and
$\bar\lambda = (1 + \sqrt{13})/2$ one can write it as $(x^3 - x - 1 + \lambda(x^2 - 1))(x^3 - x - 1 + \bar\lambda(x^2 - 1))$, where the
first factor has the root 2cos(2π/13). For more on angle trisection and especially many failed
attempts, see [197].
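Several algebraic facts used in this section can be checked in a few lines of Python (our own sketch; names are hypothetical): the triple angle identity, the absence of rational roots of 8x³ − 6x − 1 (which, for a cubic, means irreducibility over Q), and the factorization of the sextic for the triskaidecagon with λ = (1 − √13)/2 and λ̄ = (1 + √13)/2.

```python
import math
from fractions import Fraction

# Illustrative checks; the helper names are our own.

# 1) cos(3t) = 4 cos(t)^3 - 3 cos(t), the identity behind the cubic 4x^3 - 3x - cos(alpha).
t = 0.37
print(math.cos(3 * t) - (4 * math.cos(t) ** 3 - 3 * math.cos(t)))   # about 0

# 2) alpha = 60 degrees: 4x^3 - 3x - 1/2 = 0 is equivalent to 8x^3 - 6x - 1 = 0.
#    By the rational root test a rational root would be one of +-1, +-1/2, +-1/4, +-1/8.
def p(x):
    return 8 * x ** 3 - 6 * x - 1

candidates = [Fraction(s, d) for s in (1, -1) for d in (1, 2, 4, 8)]
print([c for c in candidates if p(c) == 0])   # [] : the cubic is irreducible over Q

# 3) Factorization of the sextic over Q(sqrt(13)).
lam, lam_bar = (1 - math.sqrt(13)) / 2, (1 + math.sqrt(13)) / 2

def sextic(x):
    return x**6 + x**5 - 5 * x**4 - 4 * x**3 + 6 * x**2 + 3 * x - 1

def factor(x, l):
    return x**3 - x - 1 + l * (x**2 - 1)

for k in range(1, 7):
    r = 2 * math.cos(2 * math.pi * k / 13)
    print(k, round(sextic(r), 12), round(factor(r, lam), 12), round(factor(r, lam_bar), 12))
```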


175. Preissmann's Theorem

Let M− denote the class of compact negatively curved Riemannian manifolds M. Negative
curvature means that all sectional curvatures of M are negative everywhere. Let $\pi_1(M)$ denote
the fundamental group of M. For compact positively curved manifolds, the theorem of Bonnet-Myers
shows that the fundamental group $\pi_1(M)$ is finite; it can be trivial like for a sphere $S^d$, d > 1, or be
a finite group like $\pi_1(M) = \mathbb{Z}_2$ for the projective space $M = \mathbb{RP}^d$ for d > 1. For a flat manifold
like the torus $T^d$, the fundamental group can already be the infinite group $\mathbb{Z}^d$. This changes for
negative curvature. Preissmann showed that if $\pi_1(M)$ is cyclic, then there is only one closed
geodesic, and that there is at most one closed geodesic in each homotopy class of closed curves in
M. Here is Preissmann's theorem, which deals with non-trivial subgroups G of $\pi_1(M)$, meaning
that G should not be the trivial 1-point group.

Theorem: If G ⊂ $\pi_1(M)$ for M ∈ M− is a non-trivial Abelian subgroup, then G ≅ Z.

A consequence is that the torus $T^n$ cannot admit a Riemannian metric of negative sectional
curvature. Preissmann gives in his paper also the corollary that the product of two negatively
curved Riemannian manifolds cannot carry a metric with negative curvature. An analogous
result for positive curvature is not known. The famous product conjecture of Heinz Hopf asks
whether the product manifold $S^2 \times S^2$ can carry a metric of positive curvature (see [761]).
Preissmann, who was born at Neuchâtel in Switzerland in 1916, went to school at La Chaux-de-Fonds.
He studied mathematics from 1934 to 1938 at the ETH and worked there until 1942 as an
assistant to Kollros and Gonseth, writing his thesis under the guidance of Heinz Hopf, where the
theorem appears [586]. Preissmann later got interested in hydraulic computations, given
the Swiss boom of hydro-power developments. After having been an actuary in a life insurance
company from 1942-1946, he joined VAWD until 1958, then led the Department of Mathematical Methods
of the hydraulics laboratory SOGREAH in Grenoble from 1958-1972, retiring in 1981. See [164].

176. Killing-Hopf theorem

A space form M is a quotient A/G, where A is a sphere, a Euclidean space or a hyperbolic space
and G is a group of isometries acting freely (gx = x is only possible for g = 1) and properly
discontinuously. The latter means that for any compact K in A, only finitely many g ∈ G satisfy
gK ∩ K ≠ ∅. A Riemannian manifold has constant curvature if all sectional curvatures are the
same everywhere. The Killing-Hopf theorem is:

Theorem: Complete connected constant curvature manifolds are space forms.

The theorem is due to Wilhelm Killing (1891) [405] and Heinz Hopf (1926) [353]. See [755]
for the topic of constant curvature manifolds.

177. Ballot theorem

Let $X_j$ be independent identically distributed random variables taking values
$e_k = (0, \dots, 0, 1, 0, \dots, 0)$ in $\mathbb{Z}^d$ with probability $p_k$. If $p_1 > \dots > p_d$, we can look at the
multi-dimensional random walk $S_n = \sum_{k=1}^{n} X_k$. What is the probability that the walk starting
at 0 remains in the open cone $Q = \{x_1 > x_2 > \dots > x_d\}$ at all positive times? The answer is given
by the Ballot theorem. It expresses the probability as a Vandermonde determinant:


Theorem: $P[S_n \in Q, \forall n > 0] = \prod_{i<j} (p_i - p_j)$.

The case d = 2 is the classical result due to Joseph Bertrand [60] and appears in virtually
every probability textbook, like [232], which also points out that the theorem had been proven
earlier by William Whitworth [747], who looked at the problem in a different context like the
problem of counting the number of weak orderings. The historical context is voting, which explains
the etymology of the theorem [7]: if candidate A gets n votes and candidate B gets m votes with
n > m, then the probability that during the counting process A always has more votes than B is
(n − m)/(n + m). If $P_{n,m}$ counts the number of counting orders always favorable for A, then the
recursion $P_{n+1,m+1} = P_{n+1,m} + P_{n,m+1}$ holds. As the Binomial coefficients $B_{n,m}$ satisfy the same
recursion, so does $D_{n,m} = B_{n,m} - P_{n,m}$, and it can be shown by induction that
$D_{n,m} = 2m B_{n,m}/(n + m)$, leading to the result. The multidimensional result has been studied in [768, 263].
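The two-candidate statement can be verified by brute force over all orders in which the votes can be counted. This is our own sketch with exact fractions; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative brute force; the function name is our own.
def ballot_probability(n, m):
    """Fraction of counting orders of n votes for A and m for B in which A is always strictly ahead."""
    good = total = 0
    for a_positions in combinations(range(n + m), n):   # positions of A's votes in the count
        a_set = set(a_positions)
        lead = 0
        always_ahead = True
        for i in range(n + m):
            lead += 1 if i in a_set else -1
            if lead <= 0:
                always_ahead = False
                break
        good += always_ahead
        total += 1
    return Fraction(good, total)

for n, m in ((3, 1), (5, 2), (6, 4)):
    print(n, m, ballot_probability(n, m), Fraction(n - m, n + m))
```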

178. Poincaré-Hopf

Let F be a smooth vector field on a compact n-manifold M with finitely many equilibrium points,
points x with F(x) = 0. The index $i_F(x)$ of F at such an equilibrium point x is defined as the degree
of the map $u \in S(x) \mapsto F(u)/|F(u)| \in S^{n-1}$, where S(x) is the boundary of a small enough
ball containing x in the interior. Let χ(M) denote the Euler characteristic of M. The
Poincaré-Hopf index formula links the topological quantity χ(M) with the analytic index
sum:

Theorem: $\sum_{x,\,F(x)=0} i_F(x) = \chi(M)$

The formula can be used to compute the Euler characteristic of a manifold M: just construct
a smooth vector field F with finitely many equilibrium points and add up their indices. For
example, on the n-torus $M = T^n$, there is the constant vector field F(x) = v without equilibrium
points. Therefore χ(M) = 0. On a 2n-sphere embedded as {|x| = 1} in $\mathbb{R}^{2n+1}$, there are circles
in SO(2n + 1, R) whose generating vector field has two fixed points of index 1, so that the Euler
characteristic is 2. On a (2n + 1)-sphere M, there are circles in SO(2n + 2, R) without fixed points,
so that χ(M) = 0. A special case is if f is a Morse function on M and F = ∇f: the equilibrium points
of F are then the critical points of f. In that case $i_F(x) = (-1)^{m(x)}$, where m(x) is the Morse index,
the number of negative eigenvalues of the Hessian $d^2f(x)$. Poincaré wrote the first article in 1885
[580]. Then appeared Hopf's articles [346] for hypersurfaces and [347] for vector fields.
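The index of an isolated zero of a planar vector field is a winding number and is easy to compute numerically. The sketch below (our own; names are hypothetical) evaluates it for a source, a saddle and the field z². The saddle is the gradient of the Morse function (x² − y²)/2, whose Morse index 1 gives (−1)¹ = −1.

```python
import math

# Illustrative sketch; the function names are our own.
def index_at_origin(F, radius=1e-3, samples=2000):
    """Degree of u -> F(u)/|F(u)| on a small circle around the zero 0 of a planar field F."""
    total, prev = 0.0, None
    for k in range(samples + 1):
        t = 2 * math.pi * k / samples
        fx, fy = F(radius * math.cos(t), radius * math.sin(t))
        ang = math.atan2(fy, fx)
        if prev is not None:
            total += (ang - prev + math.pi) % (2 * math.pi) - math.pi   # unwrapped angle step
        prev = ang
    return round(total / (2 * math.pi))

source = lambda x, y: (x, y)          # gradient of (x^2 + y^2)/2, a minimum: index +1
saddle = lambda x, y: (x, -y)         # gradient of (x^2 - y^2)/2, Morse index 1: index -1

def square(x, y):                     # F(z) = z^2 in complex notation: index 2
    w = complex(x, y) ** 2
    return w.real, w.imag

print(index_at_origin(source), index_at_origin(saddle), index_at_origin(square))
```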

179. Sampling theorem

Let S be the Schwartz space of complex-valued functions f ∈ $C^\infty(\mathbb{R}, \mathbb{C})$ such that
$\|f\|_{m,n} = \sup_{x\in\mathbb{R}} |x^m f^{(n)}(x)| < \infty$ for all m, n ≥ 0. The Fourier transform $\hat f$ of f ∈ S is defined as
$\hat f(k) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(x) e^{-ixk}\,dx$. The Nyquist-Shannon sampling theorem tells that if $\hat f$ is supported
in [−π, π], then the samples {f(n), n ∈ Z} determine f:

Theorem: $f(t) = \sum_{n=-\infty}^{\infty} f(n)\,\mathrm{sinc}(\pi(n - t))$

It uses the sinc function sinc(x) = sin(x)/x. The explicit reconstruction formula is also known
as the Whittaker-Shannon interpolation formula, as the formula appeared in the book [746].
Whittaker had already found that formula in 1915, while [643] marks the start of information
theory. The result was also anticipated by Nyquist in 1928. We followed [677].
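A numerical sketch of the reconstruction formula with a test function of our own choice: f(t) = sinc(πt/2)² has a Fourier transform supported in [−π, π] (a triangle), so its integer samples determine it. The series is truncated at |n| ≤ N, an assumption that only introduces a tiny tail error here because f(n) decays like 1/n².

```python
import math

# Illustrative sketch; the test function and truncation N are our own choices.
def sinc(x):
    return 1.0 if x == 0 else math.sin(x) / x

def f(t):
    """sinc(pi t / 2)^2: band-limited, its Fourier transform vanishes outside [-pi, pi]."""
    return sinc(math.pi * t / 2) ** 2

def reconstruct(samples, t, N):
    """Truncated Whittaker-Shannon sum over n = -N..N."""
    return sum(samples(n) * sinc(math.pi * (n - t)) for n in range(-N, N + 1))

for t in (0.3, 1.7, -2.25):
    print(t, f(t), reconstruct(f, t, 2000))
```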


180. Peter Weyl theorem

Let G be a compact topological group and let C(G, C) denote the Banach space of continuous
complex-valued functions on G, equipped with the uniform norm $\|f\|_\infty = \max_{x\in G} |f(x)|$. Denote
by π : G → GL(V) a continuous group representation of G, where V is a finite-dimensional complex
vector space. This means π(gh) = π(g)π(h) for any g, h ∈ G. A matrix coefficient of G is a map
ϕ : G → C which has the form ϕ(x) = L(π(x)), where π is a representation of G and where L is a
linear functional on the space of linear maps of V. An example of such a linear functional is the
trace tr(A) or another linear combination of matrix entries $A_{ij}$ (explaining the name). The
Peter-Weyl theorem is:

Theorem: The set of matrix coecients is dense in C(G, C).

This implies that the matrix coefficients are also dense in the Hilbert space $L^2(G, µ)$ defined by
the Haar measure µ on G. If π is a unitary representation on a Hilbert space H = (V, (·, ·)), one
can write π as a direct sum of irreducible unitary representations, and the matrix elements give
an explicit orthonormal basis in $L^2(G)$: make a list of representatives of the isomorphism
classes π of irreducible unitary representations of G, then take the basis elements $\sqrt{d(\pi)}\,\pi(g)_{ij}$,
where d(π) is the degree of the representation. The theorem was proven by Fritz Peter and
Hermann Weyl in 1927 [570]. The result follows from the Stone-Weierstrass theorem if G is a
matrix group, and especially for compact Lie groups, which are known to be matrix groups. Not
much seems to be known about Fritz Peter (1899-1949), whose residence is given in the paper [570]
as Karlsruhe and to whom Weyl refers as "his student". The book [325] states that Peter got
a doctorate in Göttingen in 1923 (with the title: Über Brechungsindizes und Absorptionskonstanten
des Diamanten zwischen λ 644 und 266), under the guidance of Max Born [629]. A
conference proceeding lists him later as a teacher at a school in Schloss Salem near Überlingen
in Germany. See [325].
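For the circle group G = U(1), the irreducible unitary representations are the characters $e^{ikx}$, so the matrix coefficients span the trigonometric polynomials and the theorem reduces to their uniform density. The sketch below uses Fejér means, a standard approximation scheme we chose for illustration, for the continuous function |x| on [−π, π]. All function names are our own.

```python
import math

# Illustrative sketch; the function names are our own.
def f(x):
    """The continuous function |x| on [-pi, pi], extended 2pi-periodically."""
    x = (x + math.pi) % (2 * math.pi) - math.pi
    return abs(x)

def coefficient(k):
    """Fourier coefficient c_k of f with respect to the character e^{ikx}."""
    if k == 0:
        return math.pi / 2
    return 0.0 if k % 2 == 0 else -2 / (math.pi * k * k)

def fejer(x, N):
    """Fejer mean of order N, a finite combination of the characters e^{ikx}, |k| <= N.

    f is even, so c_k = c_{-k} and only cosines remain.
    """
    return sum((1 - abs(k) / (N + 1)) * coefficient(k) * math.cos(k * x) for k in range(-N, N + 1))

def sup_error(N, grid=400):
    xs = [-math.pi + 2 * math.pi * i / grid for i in range(grid + 1)]
    return max(abs(fejer(x, N) - f(x)) for x in xs)

for N in (10, 50, 200):
    print(N, sup_error(N))   # the uniform error decreases as N grows
```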

181. Kruskal-Katona theorem

A finite abstract simplicial complex G is a finite set of non-empty sets which is closed under
the operation of taking non-empty subsets. The dimension of a set x ∈ G is |x| − 1, where
|x| is the cardinality of x. The f-vector f = (f_0, f_1, ..., f_d) ∈ $\mathbb{N}^{d+1}$ counts the number
$f_k$ of k-dimensional sets x in G. If $n = B(n_i, i) + B(n_{i-1}, i-1) + \dots + B(n_j, j)$ with
$n_i > n_{i-1} > \dots > n_j \ge j \ge 1$ is the Binomial development of n at level i, define
$n^{(i)} = B(n_i, i+1) + \dots + B(n_j, j+1)$. The theorem of Kruskal-Katona characterizes the
possible f-vectors which simplicial complexes can have:

Theorem: f is the f-vector of a complex if and only if $f_i \le f_{i-1}^{(i)}$ for all i ≥ 1.

The theorem was found by Joseph Kruskal (1963) (a brother of Martin Kruskal, known in the
context of solitons) and Gyula Katona (1968). See [245]. Because the result is sharp, it is
often mentioned in the context of extremal set theory. The result implies the Erdős-Ko-Rado
theorem [221]. The latter is the result that if G is a finite set of subsets of {1, ..., n} of
cardinality k such that each pair has a non-empty intersection and n > 2k, then the number
of sets in G is less than or equal to the Binomial coefficient B(n − 1, k − 1). A bit easier to
state is the following special case of the Kruskal-Katona theorem formulated by Lovász: if
$f_k = B(m, i)$, then $f_{k-r} \ge B(m, i - r)$ for any r ≥ 0. The fact that these statements are
sharp can be seen when looking at the complete complex G consisting of all non-empty subsets
of {1, 2, ..., n}, where $f_k(G) = B(n, k + 1)$, which means m = n, i = k + 1 in the above
notation. More specifically, if G = {{1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4},
{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}, where the f-vector is (4, 6, 4, 1), we have the
situation of Lovász.
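A sketch of the Kruskal-Katona test (our own helper names): the Binomial development is computed greedily, and a vector of positive integers is accepted if $f_i \le f_{i-1}^{(i)}$ for every i ≥ 1. The examples are the complete complex on 4 vertices, the complete graph K4 with one edge removed together with its two triangles, and 4 edges on only 3 vertices, which is impossible.

```python
from math import comb

# Illustrative sketch; the helper names are our own.
def cascade(n, i):
    """Binomial development n = C(a_i, i) + C(a_{i-1}, i-1) + ... with a_i > a_{i-1} > ..."""
    terms, k = [], i
    while n > 0 and k >= 1:
        a = k
        while comb(a + 1, k) <= n:   # largest a with C(a, k) <= n
            a += 1
        terms.append((a, k))
        n -= comb(a, k)
        k -= 1
    return terms

def upper(n, i):
    """n^(i): the maximal number of (i+1)-element sets whose i-element subsets number n."""
    return sum(comb(a, k + 1) for a, k in cascade(n, i))

def is_f_vector(f):
    """Kruskal-Katona test f_i <= f_{i-1}^(i) for a vector of positive integers."""
    return all(f[i] <= upper(f[i - 1], i) for i in range(1, len(f)))

print(is_f_vector((4, 6, 4, 1)))   # True: complete complex on 4 vertices
print(is_f_vector((4, 5, 2)))      # True: K4 minus an edge, with its two triangles
print(is_f_vector((3, 4)))         # False: 3 vertices cannot span 4 edges
```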

182. Computational complexity

An NP decision problem has a probabilistically checkable proof (PCP) if, given any
probability p < 1, there exists a polynomial f such that every mathematical proof of length n
can be rewritten as a proof of length f(n) that can be formally verified with accuracy
p. The latter means that one can formally verify p · f(n) letters of the proof of an NP decision
problem. Examples of NP-hard decision problems are the traveling salesperson problem,
the knapsack decision problem and clique problems in graphs. The PCP theorem is:

Theorem: Every NP decision problem has a probabilistically checkable proof.

To cite [183]: "Every language in NP has a witness format that can be checked probabilistically
by reading only a constant number of bits from the proof. The celebrated equivalence of this
theorem and inapproximability of certain optimization problems, due to Feige et al. 1996, has
placed the PCP theorem at the heart of the area of inapproximability."
The theorem was proven by various mathematicians, starting in 1990 with László Babai,
Lance Fortnow and Carsten Lund. More work was done by Sanjeev Arora and Shmuel Safra
from 1998. The theorem is considered one of the most important results in complexity theory,
as it shows that certain problems have no polynomial-time approximation schemes unless
P = NP. See [735].

183. Fenchel duality theorem

In convex analysis, one can look at convex bounded continuous functions $f : X \to \mathbb{R}$,
$g : Y \to \mathbb{R}$ on Banach spaces X, Y and at a bounded linear map A : X → Y to compute
$p^* = \inf_{x\in X} f(x) + g(Ax)$ and $d^* = \sup_{y^*\in Y^*} -f^*(A^* y^*) - g^*(-y^*)$, where $f^*, g^*$ are the convex
conjugates of f and g. If $X^*, Y^*$ are the dual Banach spaces of X, Y and $A^* : Y^* \to X^*$ is the
adjoint map defined by $(z^*, Ax) = (A^* z^*, x)$ for the pairing of Y with $Y^*$, then the strong
duality theorem of Fenchel states:

Theorem: p∗ = d∗

The theorem is due to Werner Fenchel [77]. It can be generalized, allowing for milder regularity
and even unbounded functions f, g, but then in general only the weak duality result $p^* \ge d^*$ holds.
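Here is a one-dimensional example of our own choice where both sides can be computed by hand. Take X = Y = R, A the identity, f(x) = x²/2 and g(x) = (x − 1)²/2, so that f*(y) = y²/2 and g*(z) = z²/2 + z. These functions are not bounded, but they are finite, continuous and convex, and in this example both p* and d* equal 1/4. The grid search below is only an illustration.

```python
# Illustrative sketch; the functions and grid are our own choices.
def grid(lo, hi, n):
    return [lo + (hi - lo) * i / n for i in range(n + 1)]

f = lambda x: x * x / 2                # convex conjugate f*(y) = y^2 / 2
g = lambda x: (x - 1) ** 2 / 2         # convex conjugate g*(z) = z^2 / 2 + z
f_star = lambda y: y * y / 2
g_star = lambda z: z * z / 2 + z
A = 1.0                                # A is the identity on R, so A* = A

p = min(f(x) + g(A * x) for x in grid(-3, 3, 60000))               # primal value p*
d = max(-f_star(A * y) - g_star(-y) for y in grid(-3, 3, 60000))   # dual value d*
print(p, d)   # both close to 1/4
```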

184. Legendre transform duality

The Legendre transform of a convex function f : X → R defined on a convex set X in
$\mathbb{R}^n$ with inner product (x, y) is defined as the function $f^*(x^*) = \sup_{x\in X} \left((x^*, x) - f(x)\right)$ on
$X^* = \{x^* : \sup_x ((x^*, x) - f(x)) < \infty\}$. The convex function $f^*$ on $X^*$ is also called the convex
conjugate of f. One has the following duality result:

Theorem: f ∗∗ = f .

In the simplest one-dimensional case, assume that f is twice differentiable with f''(x) > 0. The
condition $((x^*, x) - f(x))' = 0$ means $x^* = f'(x)$, so that the supremum is attained at
$x = (f')^{-1}(x^*)$ and $(f^*)'(x^*) = (f')^{-1}(x^*)$. For $f(x) = e^x$, one has $f^*(x^*) = x^* \log(x^*) - x^*$, and
for $f(x) = x^2$ one has $f^*(x^*) = (x^*)^2/4$. For the function $f(x) = e^{x-1}$, the condition
$x^* = f'(x) = e^{x-1}$ gives $x = 1 + \log(x^*)$, so that $f^*(x^*) = x x^* - f(x) = (1 + \log(x^*))x^* - x^* = x^* \log(x^*)$,
whose negative $-x^* \log(x^*)$ is the function appearing when defining entropy. See [605].
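A discrete Legendre transform on a grid (our own sketch; the grid ranges are assumptions) reproduces f*(y) = y log y − y for f = eˣ and, applied a second time, returns approximately f.

```python
import math

# Illustrative sketch; the function name and grid ranges are our own.
def legendre(f, xs):
    """Discrete Legendre transform: y -> max over the grid xs of (y * x - f(x))."""
    values = [f(x) for x in xs]
    return lambda y: max(y * x - v for x, v in zip(xs, values))

xs = [-10 + 20 * i / 1000 for i in range(1001)]    # grid for x on [-10, 10]
f_star = legendre(math.exp, xs)
for y in (0.5, 1.0, 3.0):
    print(y, f_star(y), y * math.log(y) - y)       # conjugate of e^x is y log y - y

ys = [0.02 * i for i in range(1, 1001)]            # grid for the slopes y in (0, 20]
f_star_star = legendre(f_star, ys)
for x in (-1.0, 0.0, 1.5):
    print(x, f_star_star(x), math.exp(x))          # f** is close to f
```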

185. Gershgorin Circle Theorem

Let A be a complex n × n matrix and denote by $\lambda_j(A)$ the eigenvalues of A. These are the solutions
of the polynomial equation $p_A(\lambda) = \det(A - \lambda I) = 0$ of degree n. By the fundamental theorem
of algebra, there are exactly n eigenvalues, counted with multiplicity. If $R_i = \sum_{j\ne i} |A_{ij}|$ is the
$\ell^1$ norm of the i'th row vector with the diagonal entry $A_{ii}$ left out, the disc $G_i = B_{R_i}(A_{ii})$ of
radius $R_i$ centered at $A_{ii}$ is called a Gershgorin disc. The Gershgorin circle theorem is a result
in matrix theory.

Theorem: Every eigenvalue λj lies in at least one Gershgorin disk

The theorem can also be seen as a perturbation result because if $A$ is a diagonal matrix, then the Gershgorin disks have radius 0. The result can be used to estimate how much the eigenvalues can deviate if such a matrix is perturbed. The result can also be used to estimate the determinant $\det(A) = \prod_j \lambda_j$ of $A$. A special case, attributed by Gershgorin to Bendixson and Hirsch, is that $|\lambda_j| \leq n \max_{1 \leq i,j \leq n} |A_{ij}|$. The result can also be used to estimate the error when computing solutions of linear equations $Ax = b$. This is useful in numerical methods, for example when expressing the error of $x$ in terms of the errors in $A$ and $b$ using the condition number $\|A^{-1}\| \|A\|$ of $A$. Gershgorin also mentions the corollary that if $|A_{ii}| > \sum_{j \neq i} |A_{ij}|$ for all $i$, then the matrix $A$ is invertible. The result was found by Semyon Aranovich Gershgorin in 1931. See [718]. This book also contains a copy of Gershgorin's paper from 1931. (In that original article, Gershgorin writes his name as Gerschgorin.)
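Here is a minimal sketch of the disks and of the theorem for small examples. To stay within the standard library, eigenvalues are only computed for $2 \times 2$ matrices via the quadratic formula; the helper names are made up for this illustration. The rotation matrix at the end has the complex eigenvalues $\pm i$, which lie on the boundary of its Gershgorin disk $|z| \leq 1$.

```python
import cmath

def gershgorin_discs(A):
    """Return (center, radius) per row: center A[i][i], radius R_i = sum_{j != i} |A[i][j]|."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]

def eigenvalues_2x2(A):
    """Roots of det(A - lambda I) = lambda^2 - tr(A) lambda + det(A) for a 2 x 2 matrix."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + root) / 2, (tr - root) / 2]

def in_some_disc(lam, discs, tol=1e-12):
    """True if lam lies in at least one closed disc (with a small rounding tolerance)."""
    return any(abs(lam - c) <= r + tol for c, r in discs)

A = [[4, 1], [2, 3]]
discs = gershgorin_discs(A)        # [(4, 1), (3, 2)]
eigs = eigenvalues_2x2(A)          # 5 and 2, returned as complex numbers
print(discs, eigs, all(in_some_disc(lam, discs) for lam in eigs))

R = [[0, -1], [1, 0]]              # rotation: eigenvalues +i and -i
print(eigenvalues_2x2(R), all(in_some_disc(lam, gershgorin_discs(R)) for lam in eigenvalues_2x2(R)))
```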

186. The Canada Day Theorem

For any symmetric $n \times n$ matrix $A$, the sum of all $k \times k$ minors $\det(A_{I \times J})$ with $|I| = |J| = k$ of $A$ is equal to the sum of the principal $k \times k$ minors $\det((TA)_{I \times I})$ of the matrix $TA$, where $T$ is the lower triangular $n \times n$ matrix with $T_{kk} = 1$ on the diagonal, $T_{kl} = 2$ for $k > l$ and $T_{kl} = 0$ for $k < l$. The notation is that if $I, J$ are subsets of $\{1, \dots, n\}$ with cardinality $k$, then $P = I \times J$ is the product set which defines the $k \times k$ matrix $A_{I \times J}$ in which only the elements in the pattern $P$ appear. The minor is then defined as the determinant $\det(A_{I \times J})$ of that sub-matrix.

Theorem: $\sum_{|I|,|J|=k} \det(A_{I \times J}) = \sum_{|I|=k} \det((TA)_{I \times I})$.

For example, if $A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ and $T = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}$, then for $k = 2$, this means $\det(A) = \det(TA)$. For $k = 1$, the theorem can be verified by computing $TA = \begin{pmatrix} a & b \\ 2a + b & 2b + c \end{pmatrix}$ and checking $a + b + b + c = a + (2b + c)$. The paper appeared first in [344] and was published in [345]. Since the peak of the discovery happened on July 1, 2008, which is Canada Day, the name stuck.
The proof of the result uses the Cauchy-Binet theorem, which reduces it to showing
$$\sum_{|I|,|J|=k} \det(A_{I \times J}) = \sum_{|I|,|J|=k} \det(T_{I \times J}) \det(A_{I \times J}) .$$
Now, $\det(T_{I \times J})$ is $2^{p(J,I)}$ if $J < I$ and 0 otherwise, where $p(J, I) = |J \setminus I \cap J|$.
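A brute-force check of the theorem on a small example may help. This sketch uses exact integer arithmetic and the Leibniz formula for every minor, so it is only practical for small $n$. The symmetric test matrix and the function names are arbitrary choices, and indices start at 0 instead of 1.

```python
from itertools import combinations, permutations

def det(M):
    """Leibniz determinant with exact integer arithmetic (small matrices only)."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        term = (-1) ** inversions
        for i in range(n):
            term *= M[i][p[i]]
        total += term
    return total

def submatrix(M, I, J):
    """The k x k matrix M_{I x J} keeping the rows in I and the columns in J."""
    return [[M[i][j] for j in J] for i in I]

def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def canada_day_sides(A, k):
    """Left: sum of all k x k minors of A. Right: sum of principal k x k minors of T A."""
    n = len(A)
    T = [[1 if i == j else (2 if i > j else 0) for j in range(n)] for i in range(n)]
    TA = matmul(T, A)
    subsets = list(combinations(range(n), k))
    lhs = sum(det(submatrix(A, I, J)) for I in subsets for J in subsets)
    rhs = sum(det(submatrix(TA, I, I)) for I in subsets)
    return lhs, rhs

A = [[2, -1, 3, 0],
     [-1, 5, 1, 4],
     [3, 1, -2, 2],
     [0, 4, 2, 1]]          # an arbitrary symmetric test matrix
for k in range(1, 5):
    print(k, canada_day_sides(A, k))
```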

187. Nash embedding theorem

A Riemannian $m$-dimensional manifold $(M, g)$ can be isometrically embedded in $\mathbb{R}^n$ if there is an injective smooth map $\phi : M \to \mathbb{R}^n$ that is an isometry. This means that $g(u, v) = (d\phi(u), d\phi(v))$ for all $u, v \in T_x M$ and all $x \in M$. Let us say that the $(m, n)$-embedding problem can be solved, or that an $(m, n)$-embedding is possible, if an isometric smooth embedding into $\mathbb{R}^n$ can be achieved for every compact Riemannian manifold $(M, g)$ of dimension $m$. The Nash embedding theorem is

Theorem: An $(m, n)$-embedding is possible for $n \geq 1.5m^2 + 5.5m$.

For non-compact manifolds $(M, g)$, an isometric embedding needs the dimension $n$ of the Euclidean space to be a bit larger. It is possible if $n \geq 1.5m^3 + 7m^2 + 5.5m$. These constants appeared in the original 1955 paper of Nash (reprinted in [450] Chapter 11). See also [635] Appendix C. In general, the embedding cannot work for $n < 0.5m^2 + 0.5m$ because the right hand side is the number of independent components of the metric tensor at a point. Nash's paper also includes some history: Ludwig Schläfli in 1871 conjectured an embedding in $n \geq 0.5m^2 + 0.5m$, but Hilbert in 1901 showed that a complete surface of constant negative curvature can not be embedded in $\mathbb{R}^3$. Chern and Kuiper in 1952 showed that the flat torus $(\mathbb{T}^n, d)$ can not be embedded in $\mathbb{R}^{2n-1}$. This is sharp because for even $n$, the Clifford torus is an embedding in $\mathbb{R}^{2n}$, using that $\mathbb{T}^1$ has an isometric embedding in $\mathbb{R}^2$. For local embeddings, Élie Cartan was able to verify in 1927 (following work of M. Janet in 1926) that the Schläfli constant works. A modern proof of the Nash embedding theorem uses the Nash-Moser inverse function theorem (combining the method of Nash from 1955 with a paper of J. Moser from 1966, who fashioned it into an abstract theorem in functional analysis [314, 447]). The Nash embedding theorem is much harder than the Whitney embedding theorem, which solves the embedding problem without insisting that $\phi$ is an isometry. In that case, $n \geq 2m$ is possible. For a more recent simplification of the proof, also improving the constant to $n \geq \max(0.5m^2 + 2.5m, 0.5m^2 + 1.5m + 5)$, see [300]. The local embedding is first solved in an analytic setting, based on the Cauchy-Kowalevski theorem for partial differential equations. The problem is considerably harder in the smooth case, and this is where an iterative smoothing process is already needed.
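To get a feeling for the size of these bounds, the helper below (its name and the dictionary keys are made up for this sketch) evaluates the dimensions quoted in this section for a given $m$. It uses integer arithmetic, rewriting for example $1.5m^2 + 5.5m = m(3m + 11)/2$, which is always an integer.

```python
def embedding_dimensions(m):
    """Dimension bounds quoted in the text for an m-dimensional manifold (hypothetical helper)."""
    return {
        "nash_compact": m * (3 * m + 11) // 2,                          # 1.5 m^2 + 5.5 m
        "nash_noncompact": m * (3 * m * m + 14 * m + 11) // 2,          # 1.5 m^3 + 7 m^2 + 5.5 m
        "simplified_proof": max(m * (m + 5) // 2, m * (m + 3) // 2 + 5),  # bound from [300]
        "schlafli": m * (m + 1) // 2,                                   # 0.5 m^2 + 0.5 m
        "whitney": 2 * m,                                               # no isometry required
    }

for m in range(1, 5):
    print(m, embedding_dimensions(m))
```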

188. Erdös Straus relation

The Diophantine equation 4/n = 1/x + 1/y + 1/z for unknown positive integers x, y, z, n is called the Erdös Straus relation. It is equivalent to 4xyz = n(xy + xz + yz). One only needs to study this in the case when n is prime, because if 4/p = 1/a + 1/b + 1/c is solved, then 4/(pq) = 1/(aq) + 1/(bq) + 1/(cq). As the equation can be solved modulo any prime, by the Hasse principle one should be able to get solutions for any n; but this is still unknown. It can appear silly to put the following as a "theorem" because it is "obvious" (or "trivial", to use a curse word) once one sees it, but it illustrates that the difficulty of a Diophantine problem can be hard to judge if one sees it for the first time.

Theorem: If n+1 is divisible by 3, then the Erdös Straus equation is solvable.

There is an easy explicit solution formula which one can look up, but which can be fun to search for, if one has not seen it yet. The Erdös-Straus conjecture or 4/n problem states that for all integers n larger than 1, the rational number 4/n can be expressed as the sum of three positive unit fractions. Paul Erdös and Ernst G. Straus formulated the conjecture in 1948. The problem is still open. Related is a conjecture of Sierpinski, stating that 5/n = 1/x + 1/y + 1/z can be solved [303]. These problems have appeal because they tap into an old theme of Egyptian fractions, which already appear on the Rhind papyrus from around 1650 BC. On that document, numbers 2/n were written as Egyptian fractions for all odd numbers n between 5 and 101. Another interesting problem is to count or estimate the number f(n) of solutions of the 4/n problem. In [218], the sum $S(n) = \sum_{p \leq n,\, p \text{ prime}} f(p)$ is bounded from below and above by $n \log^2(n) \ll S(n) \ll n \log^2(n) \log\log(n)$, where $\ll$ means inequality up to a multiplicative constant.
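Without giving away the explicit formula, one can let a computer search for solutions. The sketch below is a plain brute-force search with exact rational arithmetic; the function name is hypothetical. Ordering the unknowns as $x \leq y \leq z$ forces $n/4 < x \leq 3n/4$ and, for the remainder $r = 4/n - 1/x$, also $1/r < y \leq 2/r$, so the search is finite for each $n$.

```python
from fractions import Fraction
from math import floor

def erdos_straus(n):
    """Search x <= y <= z with 4/n = 1/x + 1/y + 1/z; return the first triple found or None."""
    target = Fraction(4, n)
    for x in range(n // 4 + 1, 3 * n // 4 + 2):        # 1/x < 4/n <= 3/x
        r = target - Fraction(1, x)                     # remainder to be written as 1/y + 1/z
        if r <= 0:
            continue
        for y in range(max(x, floor(1 / r) + 1), floor(2 / r) + 1):   # 1/y < r <= 2/y
            s = r - Fraction(1, y)
            if s.numerator == 1:                        # s = 1/z with z >= y
                return x, y, s.denominator
    return None

for n in (2, 5, 17, 73):
    print(n, erdos_straus(n))
```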
2 2

189. Dieudonné Determinant

If $A$ is an $n \times n$ matrix with entries in a not necessarily commutative ring like the quaternions $\mathbb{H}$, one can still look at the Leibniz determinant $\det(A) = \sum_\sigma \mathrm{sign}(\sigma) A_{1\sigma(1)} \cdots A_{n\sigma(n)}$. This is a sum over all permutations $\sigma$ of $\{1, \dots, n\}$, where $\mathrm{sign}(\sigma)$ is the signature of $\sigma$. It does not satisfy the Cauchy-Binet identity $\det(AB) = \det(A)\det(B)$ in general. There are two ways to get a determinant which satisfies the latter: the first one is called the Study determinant [691]. It is a real-valued determinant defined if $R$ is a real normed division ring, meaning $|ab| = |a||b|$. The second is the Dieudonné determinant [179], which takes values in the Abelianization $R^*/[R^*, R^*]$ of the multiplicative group $R^*$ of the division ring (together with 0). This is the largest Abelian quotient of $R^*$, obtained by factoring out the subgroup generated by all commutators $aba^{-1}b^{-1}$. The Dieudonné determinant agrees with the Leibniz determinant in the commutative case like $R = \mathbb{R}$ or $R = \mathbb{C}$. The Study determinant is a bit easier to compute because one does not bother with commutators and can directly go to the norm. Both determinants rely on the ability to do row reduction, which requires that one can divide from the left or from the right. They work especially in all normed real division algebras $\mathbb{R}, \mathbb{C}, \mathbb{H}, \mathbb{O}$, where in the quaternion and octonion case, the Study and the Dieudonné determinant agree. The axiomatic definition of the Dieudonné determinant asks it to take values in the Abelianization and demands, for example, $\det(A)\det(B) = \det(AB)$ and $\det(A) = \prod_i A_{ii}$ if $A$ is upper triangular.

Theorem: For a division ring, there is a unique Dieudonné determinant.

It follows from the axioms that $\det(1) = 1$ and that $\det(1 + E_{ij}) = 1$ for $i \neq j$, if $E_{ij}$ is the elementary 0-1 matrix which is 0 everywhere except at the entry $ij$, where it has the value 1. It also follows that multiplying a row of $A$ from the left by $\lambda$ multiplies the determinant by the class of $\lambda$, so that row reduction allows to compute the determinant; a row swap contributes the class of $-1$, so the outcome depends on whether this class is 1 or not. It also follows that $\det(A) = 0$ if and only if $A$ is singular, because that is equivalent to having $A$ row reduce to a triangular matrix with a zero in the diagonal. For quaternions, for example, the class of $-1$ is 1 because $iji^{-1}j^{-1} = kk = -1$. Because $SU(2)$ has a trivial Abelianization, the class of a quaternion $q$ is $|q|$. In order to show the existence of the determinant, one can use row reduction and note that for $n \geq 2$, any diagonal entry $aba^{-1}b^{-1}$ can be morphed into 1 using row reduction steps. One can verify the product property by writing the matrix $A$ as a product of elementary matrices and abelianized ring elements. The Dieudonné determinant is treated in [27, 93, 608].
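The following sketch illustrates these points for $2 \times 2$ quaternion matrices. It assumes quaternions are stored as 4-tuples $(a, b, c, d)$ meaning $a + bi + cj + dk$, and all helper names are made up. The formula $|a| \cdot |d - c a^{-1} b|$ for $a \neq 0$ is a hand derivation from the factorization into a lower and an upper triangular matrix, using that the class of $q$ is $|q|$; it is not taken from the references. The example shows $iji^{-1}j^{-1} = -1$, that the Leibniz determinant is not multiplicative and can be nonzero on a singular matrix, and that the Dieudonné determinant is multiplicative.

```python
import math

# Quaternions as 4-tuples (a, b, c, d) meaning a + b i + c j + d k.
def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qsub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def qnorm(p):
    return math.sqrt(sum(x * x for x in p))

def qinv(p):
    n2 = sum(x * x for x in p)
    return (p[0] / n2, -p[1] / n2, -p[2] / n2, -p[3] / n2)

ONE, ZERO = (1, 0, 0, 0), (0, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def matmul2(X, Y):
    """Product of 2 x 2 quaternion matrices (the order of factors matters)."""
    return [[qadd(qmul(X[r][0], Y[0][s]), qmul(X[r][1], Y[1][s])) for s in range(2)]
            for r in range(2)]

def leibniz2(M):
    """Naive Leibniz determinant a d - b c of a 2 x 2 quaternion matrix."""
    (a, b), (c, d) = M
    return qsub(qmul(a, d), qmul(b, c))

def dieudonne2(M):
    """Dieudonne determinant of a 2 x 2 quaternion matrix, read off as the norm of its class."""
    (a, b), (c, d) = M
    if qnorm(a) == 0:                      # row swap; the class of -1 has norm 1
        return qnorm(c) * qnorm(b)
    # M = [[1, 0], [c a^-1, 1]] [[a, b], [0, d - c a^-1 b]]
    return qnorm(a) * qnorm(qsub(d, qmul(qmul(c, qinv(a)), b)))

commutator = qmul(qmul(I, J), qmul(qinv(I), qinv(J)))   # i j i^-1 j^-1
A = [[ONE, ONE], [ZERO, ONE]]                            # Leibniz determinant 1
B = [[I, J], [I, J]]                                     # equal rows, hence singular
P = [[ONE, I], [J, (2, 0, 0, 0)]]
Q = [[K, ONE], [ONE, J]]
print(commutator)                                        # -1
print(leibniz2(B), leibniz2(matmul2(A, B)))              # 2k and 4k: not multiplicative
print(dieudonne2(B), dieudonne2(matmul2(A, B)))          # 0 and 0
print(dieudonne2(matmul2(P, Q)), dieudonne2(P) * dieudonne2(Q))   # both sqrt(10)
```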

190. Centroid theorem

The surface area of a surface $S$ of revolution in $\mathbb{R}^3$ obtained by rotating a piecewise smooth curve $T$ around the axis of symmetry $L$ is equal to the arc length $|T|$ times the length $|C|$ of the circle which is traced by the geometric centroid of $T$. This is the Pappus surface centroid theorem and it can be written as $|S| = |T||C|$. Similarly, the volume $|E|$ of a solid of revolution $E$ obtained by taking the union of all projection lines from $S$ to $L$ is equal to the area $|A|$ of the flat lamina $A$ between $L$ and $T$, multiplied by the arc length $|C|$ of the circle which the centroid of $A$ traces when rotated around $L$. The Pappus solid centroid theorem is then the formula $|E| = |A||C|$. This can be generalized: let $C$ be a finite curve connecting two points $P$ and $Q$ and let $A$ be a bounded closed region with smooth boundary $T$, contained in the plane perpendicular to the curve at $P$, such that $P$ is the centroid of $A$. The region $A$ can be transported along $C$ using the Frénet frame and defines a solid $E$ with boundary $S$. We assume that the tube $S$ remains smooth and is a smooth embedding of a 2-dimensional cylinder in $\mathbb{R}^3$.

Theorem: For surface area |S| = |T ||C|, for volume |E| = |A||C|.

For example, if $T$ is a half circle of radius $r$ in the $xz$-plane connecting $P = (0, 0, -r)$ with $Q = (0, 0, r)$ and $L$ is the $z$-axis, then $|T| = \pi r$. The centroid of $T$ has distance $2r/\pi$ from $L$, so that $|C| = 2\pi(2r/\pi) = 4r$ and the surface area of the sphere $S$ is $|T||C| = 4\pi r^2$. The lamina $A$ is a half disc in the $xz$-plane of radius $r$ which has area $|A| = \pi r^2/2$. The centroid of $A$ has distance $d = 4r/(3\pi)$ from $L$ and moves on a circle of length $|C| = 2\pi d = 8r/3$. The volume of the ball of radius $r$ therefore is $|A||C| = (\pi r^2/2)(8r/3) = 4\pi r^3/3$. The result of Pappus is also used to compute the surface area and volume of tubes. Here is another example: if $C$ is a smooth curve in $\mathbb{R}^3$ with two end points such that the tube $\bigcup_{x \in C} B_r(x)$ forms a solid $E$ with piecewise smooth boundary surface $S$ that does not have any self intersection, then the surface area is $|S| = |T||C| + 4\pi r^2 = 2\pi r|C| + 4\pi r^2$ and the volume is $|E| = |A||C| + 4\pi r^3/3 = \pi r^2|C| + 4\pi r^3/3$ (the additional terms come from the two half-ball roundings at the end points). In this case, the laminas $A$ are disks of area $\pi r^2$ and the curves $T$ are circles of arc length $2\pi r$. Even more general versions have been discussed in detail in [277]. For tube methods in differential geometry, also in higher dimensions (which are certainly also inspired by the Pappus centroid theorem), see [286]. Hermann Weyl used tubes as a powerful tool in differential geometry [742].
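The surface formula $|S| = |T||C| = 2\pi \int_T x\, ds$ can be checked numerically by approximating $T$ by a polygon. This sketch assumes $T$ lies in the half-plane $x \geq 0$ of the $xz$-plane and $L$ is the $z$-axis; each segment contributes its length times the distance of its midpoint from the axis. The half circle reproduces the sphere example above, and a circle of radius $a$ whose center is at distance $R$ from the axis (an added example with arbitrary values) gives a torus with $|S| = (2\pi a)(2\pi R)$.

```python
import math

def pappus_surface_area(points):
    """Area of the surface of revolution about the z-axis of a polyline [(x, z), ...], x >= 0,
    computed as |T| * |C| = (arc length) * (2 pi * distance of the centroid from the axis)."""
    length = 0.0
    moment = 0.0                          # approximates the integral of x ds along T
    for (x0, z0), (x1, z1) in zip(points, points[1:]):
        ds = math.hypot(x1 - x0, z1 - z0)
        length += ds
        moment += ds * (x0 + x1) / 2      # a segment's centroid is its midpoint
    centroid_distance = moment / length
    return length * 2 * math.pi * centroid_distance

r, N = 2.0, 10_000
half_circle = [(r * math.sin(math.pi * k / N), -r * math.cos(math.pi * k / N)) for k in range(N + 1)]
print(pappus_surface_area(half_circle), 4 * math.pi * r ** 2)       # sphere

R, a = 3.0, 1.0
circle = [(R + a * math.cos(2 * math.pi * k / N), a * math.sin(2 * math.pi * k / N)) for k in range(N + 1)]
print(pappus_surface_area(circle), (2 * math.pi * a) * (2 * math.pi * R))   # torus
```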

191. The Borsuk antipodal theorem

Let $M = S^n$ denote the $n$-sphere $\{|x|^2 = 1\} \subset \mathbb{R}^{n+1}$ equipped with the metric induced from the Euclidean space $\mathbb{R}^{n+1}$. Let $A_0, \dots, A_n$ be a cover of $M$ by closed sets. This means that $\bigcup_{k=0}^n A_k = S^n$. We say a subset $A$ of $M$ contains an antipodal pair if there is a pair of points $\{x, -x\} \subset M$ which are both in $A$.

Theorem: In every cover of the $n$-sphere by $n + 1$ closed sets, one of the sets contains an antipodal pair.

The theorem is also known as the Lusternik-Schnirelman-Borsuk antipodal theorem (already called so by [350]); much of the literature just calls it the Borsuk theorem, maybe for simplicity. The theorem is equivalent to the Borsuk-Ulam theorem, stating that every continuous map $f$ from $M$ to $\mathbb{R}^n$ has the property that some antipodal pair $x, -x$ satisfies $f(x) = f(-x)$. Stan Ulam was credited in the Borsuk paper as the originator of the problem. Section 41 of [75] contains elegant proofs that the statements in Borsuk's theorem and in the Borsuk-Ulam theorem are equivalent. The result generalizes to the situation when $M$ is a manifold homeomorphic to $S^n$ equipped with an involution $T : M \to M$ which is conjugated to the antipodal map on $S^n$. See [75] Section 150. For $n = 1$, the theorem is equivalent to the intermediate value theorem: $M$ is a circle and the function $h(x) = f(x) - f(-x)$ satisfies $h(-x) = -h(x)$, so that, if it is not constant 0, it takes both positive and negative values, and there must be a point $x$ where $f(x) = f(-x)$. For $n = 2$, if we cover the 2-sphere with 3 open sets, one of the sets contains an antipodal pair. The more surprising equivalent Borsuk-Ulam statement is then that there are two antipodal points on earth where both the temperature and the pressure are the same. The theorem appeared first in 1930 in a paper by Lusternik and Schnirelman and then more generally in 1933 by Karol Borsuk [80]. The fact that there is a general theorem on Lusternik-Schnirelman category by Lusternik and Schnirelman is a reason to stick to Borsuk for the antipodal theorem. Heinz Hopf generalized the theorem in 1944 as follows: if $A_0, \dots, A_n$ are $n + 1$ closed sets covering the unit sphere $S^n$ in $\mathbb{R}^{n+1}$ and $0 < d \leq 2$ is a distance, then there exists a set $A_k$ which contains two points of distance $d$. The special case $d = 2$ is the Borsuk theorem. Hopf notes that this implies that if the $n$-sphere is covered by $n + 2$ non-empty closed sets such that none of them contains an antipodal pair, then every collection of $n + 1$ of these sets has a non-empty intersection, and states in a footnote that this means that the nerve of the cover $F_0, \dots, F_{n+1}$ is then isomorphic to the boundary complex of an $(n + 1)$-dimensional simplex.
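The case $n = 1$ can be turned into an algorithm: bisection on $h(\theta) = f(\theta) - f(\theta + \pi)$, which takes opposite values at $\theta = 0$ and $\theta = \pi$, finds an antipodal pair with equal values. This sketch parametrizes the circle by an angle $\theta$; the "temperature" function is a made-up example and the function name is hypothetical.

```python
import math

def antipodal_equal_point(f, tol=1e-12):
    """Find theta in [0, pi] with f(theta) = f(theta + pi) for a continuous 2*pi-periodic f."""
    h = lambda t: f(t) - f(t + math.pi)      # h(t + pi) = -h(t)
    lo, hi = 0.0, math.pi                    # h(lo) and h(hi) = -h(lo) have opposite signs
    if h(lo) == 0:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (h(mid) > 0) == (h(lo) > 0):      # keep the sign change inside [lo, hi]
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# a made-up "temperature" along the equator, as a function of the longitude theta
temperature = lambda t: math.sin(t) + 0.5 * math.cos(3 * t) + 0.2 * math.sin(2 * t)
theta = antipodal_equal_point(temperature)
print(theta, temperature(theta), temperature(theta + math.pi))
```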

192. Zagier's inequality

Assume $f, g$ are non-negative and decreasing functions on $[0, T]$. They are then automatically integrable. Denote by $E[f] = \frac{1}{T}\int_0^T f(x)\, dx$ the average of $f$.

Theorem: If f, g are non-negative and decreasing, then E[f g] ≥ E[f ]E[g].

[765] formulates this more generally as follows: if $f, g$ are decreasing and non-negative on $[0, \infty)$ and $F, G \in L^1([0, \infty))$ take values in $[0, 1]$, then $(f, g) \geq (f, F)(g, G)/\max(I(F), I(G))$, where $(u, v) = \int_0^\infty u(x)v(x)\, dx$ and $I(F) = \int_0^\infty F(x)\, dx = |F|_1$.
The Zagier inequality has also been called an anti-Cauchy-Schwarz inequality [75] because, compared with the Cauchy-Schwarz inequality $|f \cdot g| \leq |f||g|$ in a Hilbert space, it goes in the opposite direction. In [75], the inequality on finite intervals is called Chebychev's inequality, but that name should maybe be reserved for the inequality $P[|X - E[X]| > \epsilon] \leq \mathrm{Var}[X]/\epsilon^2$ for a random variable $X \in L^2(\Omega, \mathcal{A}, P)$ on a probability space $(\Omega, \mathcal{A}, P)$. The Zagier inequality also works for decreasing sequences $f_n$, where $E[f] = \frac{1}{n}\sum_{k=0}^{n-1} f_k$ is the Birkhoff average. Now, the same statement $E[fg] \geq E[f]E[g]$ holds. In the simplest case, for $f = (a, b)$ and $g = (c, d)$, this is equivalent to $2(ac + bd) \geq (a + b)(c + d)$ for $a \geq b$, $c \geq d$, which is already not totally obvious: it is equivalent to $ac + bd \geq ad + bc$, that is, to $(a - b)(c - d) \geq 0$.
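The sequence version $E[fg] \geq E[f]E[g]$ for two decreasing sequences can be tested on random data. In this sketch the names, lengths and seed are arbitrary; random numbers are sorted into decreasing order, and the last line shows that the inequality can fail if one of the sequences is increasing instead.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def chebyshev_gap(f, g):
    """E[fg] - E[f] E[g] for two sequences of equal length, E being the average over indices."""
    return mean([a * b for a, b in zip(f, g)]) - mean(f) * mean(g)

rng = random.Random(1)
worst = min(
    chebyshev_gap(sorted((rng.random() for _ in range(n)), reverse=True),
                  sorted((rng.random() for _ in range(n)), reverse=True))
    for n in range(1, 50) for _ in range(20)
)
print(worst)                                   # not negative beyond rounding error
print(chebyshev_gap([2, 1], [1, 2]))           # -0.25: f decreasing but g increasing
```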

193. Gini coefficient

If $x_1, \dots, x_n$ are non-negative real numbers with mean $m = \frac{1}{n}\sum_{k=1}^n x_k$, the number $G = \frac{1}{2n^2 m}\sum_{i=1}^n \sum_{j=1}^n |x_i - x_j|$ is called the Gini coefficient of the data. Using $|a - b| = a + b - 2\min(a, b)$, it can be rewritten as $G = 1 - \frac{1}{n^2 m}\sum_{i=1}^n \sum_{j=1}^n \min(x_i, x_j)$. A common interpretation is that $x_k$ is the income of person $k$ in a population $X = \{1, \dots, n\}$ of $n$ people. The number $m$ is then the mean income of the population. If a population $X$ of $n$ people is split into smaller groups $X_k$, $k = 1, \dots, r$, of size $n_k$ with mean income $m_k$, then $\sum_{k=1}^r n_k = n$ and $\sum_{k=1}^r n_k m_k = nm$. If $G(X)$ is the Gini coefficient of $X$ and $G(X_k)$ the Gini coefficient of the sub-population $X_k$, then

Theorem: $nG(X) \geq \sum_{k=1}^r n_k G(X_k)$

This and many more inequalities relating $G(X)$ with $G(X_k)$ appear in [763]. There is also a continuum analog: for a probability density function $f$ on $[0, \infty)$ with $\int_0^\infty f(x)\, dx = 1$ and $m = \int_0^\infty x f(x)\, dx$, the continuum Gini coefficient is defined as $G = \frac{1}{2m}\int_0^\infty \int_0^\infty |x - y| f(x) f(y)\, dx\, dy$, which is equivalent to $G = 1 - \frac{1}{m}\int_0^\infty \int_0^\infty \min(x, y) f(x) f(y)\, dx\, dy$. The Gini coefficient is twice the area between the Lorenz curve and the diagonal, $G = 2\int_0^1 (p - L(p))\, dp$, where $p = \int_0^x f(t)\, dt$ is the cumulative distribution value and $L(p) = \frac{1}{m}\int_0^x t f(t)\, dt$. In the context of income inequality, where the subject has come up in economics, $L(p)$ represents the fraction of the total income which is earned by the poorest fraction $p$ of the population. The graph of $L(p)$ is a convex curve from $(0, 0)$ to $(1, 1)$, the slope $L'(p)$ being the relative income in the corresponding percentile of the population. The Gini coefficient is also called the Gini index. It was introduced by Corrado Gini in 1912. It is a natural quantity because on the real line, the Green's function of the Laplacian $-\Delta/2$ with $\Delta f = f''$ is $g(x, y) = |x - y|$. The potential $V(x) = |x|$ is the natural "Newton potential". For $M = \mathbb{R}^d$ in dimension $d \neq 2$ it is $g(x, y) = |x - y|^{2-d}$ for the Laplacian $-\Delta/|S^{d-1}|$, where $|S^k|$ is the volume of the $k$-dimensional unit sphere; it is the logarithmic potential $\log|z|/(2\pi)$ in dimension $d = 2$. The most familiar case is the 3-dimensional Euclidean space $\mathbb{R}^3$, where the Newton potential $1/|x|$ appears in electromagnetism and gravity. The Gini potential $|x|$ is roughly the potential between two planar parallel mass sheets, like two galaxies rotating around the same axis. In general, for any Riemannian manifold $M$ with Green's function $g(x, y)$ (the inverse of the Laplacian) and measure $\mu$ (mass distribution), the integral $I(\mu) = \int_M \int_M g(x, y)\, d\mu(x)\, d\mu(y)$ is the potential theoretical energy of the measure $\mu$. The Gini index therefore is proportional to the potential theoretical energy for a mass distribution with density $\mu = f(x)\, dx$ on $[0, \infty)$. The above inequality could therefore be interpreted as an inequality for the potential energy of particles which are partitioned into non-interacting groups: switching off the interaction energies between the parts lowers the energy.
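Both formulas for $G$ and the group inequality can be checked on synthetic data. In this sketch the "incomes" are random numbers and the split into three groups is arbitrary; the function names are made up. The double sums cost $O(n^2)$, which is fine for small samples.

```python
import random

def gini(xs):
    """Gini coefficient G = sum_{i,j} |x_i - x_j| / (2 n^2 m) of non-negative data with mean m > 0."""
    n = len(xs)
    m = sum(xs) / n
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * m)

def gini_min_form(xs):
    """The equivalent form G = 1 - sum_{i,j} min(x_i, x_j) / (n^2 m)."""
    n = len(xs)
    m = sum(xs) / n
    return 1 - sum(min(a, b) for a in xs for b in xs) / (n * n * m)

rng = random.Random(7)
incomes = [rng.expovariate(1.0) for _ in range(60)]        # synthetic "incomes"
groups = [incomes[:10], incomes[10:35], incomes[35:]]       # an arbitrary split into r = 3 groups

lhs = len(incomes) * gini(incomes)
rhs = sum(len(grp) * gini(grp) for grp in groups)
print(gini(incomes), gini_min_form(incomes), lhs, rhs)     # expect lhs >= rhs
```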

194. Denjoy-Koksma theorem

Let $T : \Omega \to \Omega$ be an ergodic automorphism of a probability space $(\Omega, \mathcal{A}, \mu)$. (Automorphism means $\mu(T(A)) = \mu(A)$ for all $A \in \mathcal{A}$ and ergodic means that $T(A) = A$ implies $\mu(A) \in \{0, 1\}$.) The Birkhoff ergodic theorem assures that for all $g \in L^1(\Omega)$ and almost every $x \in \Omega$ we have $S_n(x)/n \to E[g] = \int_\Omega g(x)\, d\mu(x)$ with the Birkhoff sum $S_n = \sum_{k=0}^{n-1} g(T^k x)$. An example dynamical system is the irrational rotation $T : x \to x + \alpha$ on the circle $\mathbb{T}^1 = \mathbb{R}/\mathbb{Z}$ equipped with the Lebesgue measure $\mu = dx$. Denjoy-Koksma theory estimates the growth of $S_n(x)$ for functions $g$ with $E[g] = 0$, depending on Diophantine properties of $\alpha$ and regularity properties of $g$. A real number $\alpha$ is called Diophantine if there exists a constant $C > 0$ such that $|q\alpha - p| \geq C/q$ for all integers $p$ and $q \geq 1$. A function $g$ has bounded variation if $\mathrm{Var}(g) = \sup_P \sum_i |g(x_{i+1}) - g(x_i)|$ is finite, where the supremum is over all finite sets $P = \{x_0, x_1, \dots, x_n = x_0\}$ in $\mathbb{T}^1$, ordered along the circle. In the simplest case, the Denjoy theorem says that for Diophantine $\alpha$, $|S_n| \leq C \log(n) \mathrm{Var}(g)$ for all $n$, and that $|S_{q_n}(x)| \leq \mathrm{Var}(g)$ for the denominators $q_n$ of the periodic approximations $p_n/q_n \to \alpha$. For $r \geq 1$, a real number $\alpha$ is called $r$-Diophantine if $|q\alpha - p| \geq C/q^r$ for all integers $p$ and $q \geq 1$. The Denjoy-Koksma theorem was generalized in 1999 by Svetlana Jitomirskaya to

Theorem: If $\alpha$ is $r$-Diophantine, then $|S_n| \leq C n^{1-1/r} \log(n) \mathrm{Var}(g)$.

For a periodic approximation $p/q$ of $\alpha$ [404] one has $|S_q| \leq \mathrm{Var}(g)$: to see this, divide $\mathbb{T}^1$ into $q$ intervals $I_k$ centered at $y_k = kp/q$. The intervals have length $1/q \pm O(1/q^2)$ and each contains exactly one point. Renumber the points to have $y_k$ in $I_k$. By the intermediate value theorem, there exists a Riemann sum $\frac{1}{q}\sum_{i=0}^{q-1} g(x_i) = \int g(x)\, dx = 0$ for which every $x_i$ is in the interval $I_i$. Choosing each $x_i$ as a minimizer of $g$ on $I_i$ gives a lower bound, and choosing it as a maximizer gives an upper bound. Now, $\sum_{j=0}^{q-1} (g(y_j) - g(x_j)) \leq \sum_{j=0}^{q-1} |g(y_j) - g(x_j)| + |g(x_j) - g(y_{j+1})| \leq \mathrm{Var}(g)$. Therefore, if $q_k \leq n \leq q_{k+1}$ and $n = b_k q_k + b_{k-1} q_{k-1} + \dots + b_1 q_1 + b_0$, then $|S_n| \leq (b_0 + \dots + b_k)\mathrm{Var}(g)$, where $b_i \leq q_{i+1}/q_i$. So, $|S_n| \leq \sum_{i=0}^k (q_{i+1}/q_i)\mathrm{Var}(g)$. If $\alpha$ is $r$-Diophantine, then $|q\alpha - p| \geq c/q^r$ and $q_{i+1} \leq q_i^r/c$. Because $n \leq q_{k+1} \leq q_k^r/c$, we have $q_k \geq (cn)^{1/r}$ and $n/q_k \leq c^{-1/r} n^{1-1/r}$. Because $k \leq 2\log(q_k)/\log(2)$, the claim follows. For $r = 1$, see [158] (page 84). In general, see [377].
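A numerical illustration, assuming the golden mean rotation $\alpha = (\sqrt{5} - 1)/2$, whose continued fraction denominators are the Fibonacci numbers, and the sawtooth $g(x) = \{x\} - 1/2$, which has mean 0 and $\mathrm{Var}(g) = 2$. The golden mean has bounded continued fraction coefficients, so it is Diophantine in the above sense with $r = 1$. The starting point and the range of $n$ are arbitrary choices for this sketch.

```python
import math

alpha = (math.sqrt(5) - 1) / 2      # golden mean, continued fraction [0; 1, 1, 1, ...]

def g(x):
    """Sawtooth {x} - 1/2 on the circle: mean zero, total variation Var(g) = 2."""
    return x % 1.0 - 0.5

def birkhoff_sums(x, N):
    """Return [S_1, ..., S_N] with S_n = sum_{k=0}^{n-1} g(x + k alpha)."""
    sums, s = [], 0.0
    for k in range(N):
        s += g(x + k * alpha)
        sums.append(s)
    return sums

# the continued fraction denominators q_n of the golden mean are Fibonacci numbers
fib = [1, 2]
while fib[-1] + fib[-2] <= 10_000:
    fib.append(fib[-1] + fib[-2])

S = birkhoff_sums(0.1234, 10_000)
print(max(abs(S[q - 1]) for q in fib))    # Denjoy-Koksma: |S_q| <= Var(g) = 2
print(max(abs(s) for s in S))             # stays small; the general bound grows like log(n)
```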

195. Quadrilateral theorem

Let $ABCD$ denote a convex quadrilateral in $\mathbb{R}^2$. Alternatively, four arbitrary points $A, B, C, D$ in $\mathbb{R}^3$ define a tetrahedron. Assume the side lengths are $a = |AB|$, $b = |BC|$, $c = |CD|$, $d = |DA|$ and that the diagonal lengths are $e = |AC|$, $f = |BD|$. Let $M = (A + C)/2$ and $N = (B + D)/2$ be the midpoints of the diagonals and $g = 2|MN|$. The Euler law on quadrilaterals is

Theorem: $a^2 + b^2 + c^2 + d^2 = e^2 + f^2 + g^2$.

One can verify this by just expanding out what one gets when writing the condition in coordinates. The proof then shows that the statement also holds for the edge lengths of a tetrahedron in space $\mathbb{R}^3$: if $a, b, c, d, e, f$ are the edge lengths of an arbitrary tetrahedron in space, where the edges $L, M$ belonging to $e, f$ have no common point, and $g$ is twice the distance between the midpoints of the two segments $L$ and $M$, then the same relation holds in space. This has been noted in [387]. In the case of a rectangle, where $a = c$, $b = d$, $g = 0$ and $e^2 = f^2$, one obtains the Pythagorean theorem $e^2 = a^2 + b^2$. In the case of a parallelogram, where $a = c$, $b = d$, $g = 0$, one has $2a^2 + 2b^2 = e^2 + f^2$, which is the parallelogram law. Some other themes of Euler come to mind too, like Diophantine equations: if the points $A, B, C, D$ have integer coordinates and all distances between points are integers, one has a problem in number theory. For rectangles, this leads to Pythagorean triples. The problem of perfect Euler bricks then comes to mind, which asks for a cuboid with integer side and diagonal lengths.
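A quick check of the identity in coordinates follows. Writing $g^2 = |A + C - B - D|^2$ shows that only vector operations are involved, so the same code works in any dimension and for any four points, convex or not. With integer coordinates, both sides are integers and can be compared exactly. The function names are made up for this sketch.

```python
import random

def sq(u):
    """Squared Euclidean length of a vector given as a tuple."""
    return sum(t * t for t in u)

def diff(u, v):
    return tuple(s - t for s, t in zip(u, v))

def euler_quadrilateral(A, B, C, D):
    """Return (a^2 + b^2 + c^2 + d^2, e^2 + f^2 + g^2) for four points in any dimension."""
    sides = sq(diff(A, B)) + sq(diff(B, C)) + sq(diff(C, D)) + sq(diff(D, A))
    diagonals = sq(diff(A, C)) + sq(diff(B, D))
    # g = 2|MN| with M = (A + C)/2 and N = (B + D)/2, so g^2 = |A + C - B - D|^2
    g2 = sq(tuple(a + c - b - d for a, b, c, d in zip(A, B, C, D)))
    return sides, diagonals + g2

rng = random.Random(0)
for dim in (2, 3):
    pts = [tuple(rng.randint(-9, 9) for _ in range(dim)) for _ in range(4)]
    print(dim, euler_quadrilateral(*pts))
```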

196. Reeb sphere theorem

Let $M$ be a closed, compact $d$-dimensional differentiable manifold. Closed means that the boundary of $M$ is empty. If $f : M \to \mathbb{R}$ is a smooth real-valued function, then points $x \in M$ with vanishing gradient $\nabla f(x) = 0$ are called critical points. A critical point $x$ is called non-degenerate if the Hessian $d \times d$ matrix $H(f)(x)$ is invertible at $x$. Let $c(M)$ denote the minimal number of critical points which a function $f$ on $M$ with only non-degenerate critical points can have. We say $M$ is a $d$-sphere if there is a homeomorphism of $M$ to the standard unit sphere $\{|x| = 1\}$ in $\mathbb{R}^{d+1}$.

Theorem: $c(M) = 2$ if and only if $M$ is a $d$-sphere for some $d \geq 0$.

The level sets $f^{-1}(c) = \{f = c\}$ of such a function then form a foliation of $M$ by $(d-1)$-dimensional spheres, which only degenerate to points at the two critical points. The proof of the theorem goes by showing that a manifold admitting a function with exactly 2 critical points can be covered by 2 balls, and then uses that this characterizes spheres. The Reeb sphere theorem was proven in 1952 [596]. It is referred to in [503], which generalizes and improves on results by Milnor and Rosen. The assumption that $f$ has two critical points does not imply that $M$ is diffeomorphic to the standard unit sphere: there are exotic spheres which are homeomorphic to the standard unit sphere but not diffeomorphic to it. The Reeb theorem is covered in [518]. In the first proof of the existence of exotic 7-spheres [521], the Reeb sphere theorem was used as hypothesis H.

197. Hausdorff distance

Let $(X, d)$ be a metric space. Given a compact subset $U$ of $X$, let $B_r(U)$ be the set of all points that are in distance $\leq r$ from a point of $U$. In other words $B_r(U) = \bigcup_{x \in U} B_r(x)$, where $B_r(x)$ is the ball $\{y \in X : d(x, y) \leq r\}$ in $X$. The Hausdorff distance $\delta$ between two non-empty compact subsets $U, V$ of $X$ is defined as the infimum over all $r \geq 0$ such that $U \subset B_r(V)$ and $V \subset B_r(U)$. It is a metric on the set $\chi$ of all non-empty compact subsets of $X$. This space $(\chi, \delta)$ is a new metric space. It is again compact:

Theorem: If (X, d) is compact, then (χ, δ) is again compact.

The process could therefore be iterated and produce a sequence of compact metric spaces, where in each step the Hausdorff metric is used on the previous one. For the Hausdorff distance, see [227], Chapter 9, in the context of iterated function systems in the theory of fractals. A sequence of contractions defines an attractor which can be seen as a limit of a sequence of compact sets. In the simplest situations, one can then use the Banach fixed point theorem to establish the existence of a limit. The distance was used by Maurice Fréchet in 1906 to measure the distance between curves. The distance was introduced by Felix Hausdorff in 1914 [323] (page 303).
The Hausdorff distance also allows to define a distance between compact metric spaces $(X_1, d_1)$, $(X_2, d_2)$. The Gromov-Hausdorff distance of two compact metric spaces is defined as the infimum over all possible Hausdorff distances $\delta(\phi_1(X_1), \phi_2(X_2))$, where $\phi_i : X_i \to X$ are isometric embeddings of $(X_i, d_i)$ into a third metric space $(X, d)$. This metric space $(\mathcal{X}, D)$ of all compact metric spaces has a dense set of finite metric spaces, so that it is separable. It is also complete, and it is connected. David Edwards [212] called this "superspace".
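For finite subsets, the infimum in the definition is attained and reduces to a max-min formula. This sketch applies it to the Cantor iterated function system $x \mapsto x/3$, $x \mapsto x/3 + 2/3$ (a standard example chosen here for illustration), starting from the arbitrary set $\{1/2\}$. The Hausdorff distance between consecutive iterates shrinks by the contraction factor $1/3$, as in the Banach fixed point argument; the function names are made up.

```python
def hausdorff(U, V, d=lambda x, y: abs(x - y)):
    """Hausdorff distance between finite non-empty sets U, V for the metric d."""
    u_to_V = max(min(d(u, v) for v in V) for u in U)   # smallest r with U inside B_r(V)
    v_to_U = max(min(d(u, v) for u in U) for v in V)   # smallest r with V inside B_r(U)
    return max(u_to_V, v_to_U)

def hutchinson(K):
    """One step of the Cantor iterated function system x -> x/3, x -> x/3 + 2/3."""
    return sorted({x / 3 for x in K} | {x / 3 + 2 / 3 for x in K})

K = [0.5]                       # arbitrary starting compact set
dists = []
for n in range(6):
    K_next = hutchinson(K)
    dists.append(hausdorff(K, K_next))
    K = K_next
print(dists)                    # 1/3, 1/9, 1/27, ...: shrinks by the factor 1/3
```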

198. Grove-Searle theorem

The set of compact even-dimensional Riemannian $2d$-manifolds which admit a positive curvature metric contains the spheres $S^{2d}$, the projective spaces $\mathbb{RP}^{2d}$, $\mathbb{CP}^d$, $\mathbb{HP}^{d/2}$, $\mathbb{OP}^2$ over the division algebras $\mathbb{R}, \mathbb{C}, \mathbb{H}, \mathbb{O}$, the three Wallach flag manifolds $W^6, W^{12}, W^{24}$ [728] and the Eschenburg manifold $E^6$ [223]. No other example is known [772]. All these manifolds admit a positive curvature metric with a continuum isometry group. In particular, they admit a metric which allows for an isometric circle action. The fixed point set $N$ of such an action is never empty [56]. By a theory started by Conner and Kobayashi, it is again a positive curvature manifold $N$ that is totally geodesic and of even co-dimension. The components of $N$ can have different dimensions, but by Lefschetz, the Euler characteristic of $N$ is the Euler characteristic of $M$ [144, 434]. Let's call a manifold with circular symmetry Grove-Searle if the fixed point set $N$ has a connected component of co-dimension 2. The Grove-Searle theorem [295] now tells:

Theorem: If $M$ is Grove-Searle, then $M = S^{2d}$, $\mathbb{RP}^{2d}$ or $\mathbb{CP}^d$.

In odd dimensions, there is besides $M = S^{2d+1}$ or $M = \mathbb{RP}^{2d+1}$ also the possibility of space forms $S^{2d+1}/\mathbb{Z}_m$. An application of the theorem is that all $2d$-dimensional positive curvature manifolds admitting a circular symmetry have positive Euler characteristic if $2d \leq 8$. Proof: $N$ is not empty by Berger and $\chi(N) = \chi(M)$. If $N$ has a co-dimension 2 component, Grove-Searle forces $M$ to be in $\{\mathbb{RP}^{2d}, S^{2d}, \mathbb{CP}^d\}$. By Frankel [244], there cannot be two co-dimension 2 connected components. In the remaining cases, Gauss-Bonnet-Chern [130] forces all components to have positive Euler characteristic. There is huge interest in even-dimensional positive curvature manifolds because of the open Hopf conjecture [348, 351, 352, 69, 57], asking whether every even-dimensional compact positive curvature manifold has positive Euler characteristic. The above corollary of Grove-Searle assures that the Hopf conjecture with circle symmetry holds in dimension $\leq 8$. It is also known for $2d = 10$: [589, 607, 749]. See also $2d = 6$ in [571] (2nd edition, Cor. 8.3.3). While in dimensions 2 and 4 the classification of positive curvature manifolds with circular symmetry is known, namely $\{S^4, \mathbb{RP}^4, \mathbb{CP}^2\}$ in dimension 4 [355], in dimension 6 one knows so far the cases $\{S^6, \mathbb{RP}^6, \mathbb{CP}^3, E^6, W^6\}$ and it is not known whether they are all.

199. Radon-Nikodym theorem

A measurable space $(\Omega, \mathcal{A})$ is a set equipped with a $\sigma$-algebra $\mathcal{A}$. This means that $\mathcal{A}$ is a set of subsets of $\Omega$ containing $\Omega$ that is closed under forming complements and under taking countable unions. A non-negative function $f : \Omega \to [0, \infty)$ is called measurable if $f^{-1}(B) \in \mathcal{A}$ for every $B$ in the Borel $\sigma$-algebra on $[0, \infty)$, the smallest $\sigma$-algebra containing the open sets. Given two $\sigma$-finite measures $\mu, \nu$ on $(\Omega, \mathcal{A})$ (meaning that $\Omega$ is in each case a countable union of sets of finite measure), one calls $\mu$ absolutely continuous with respect to $\nu$ if $\nu(A) = 0$ implies $\mu(A) = 0$. For example, if there exists a measurable function $f \geq 0$ such that $\mu(A) = \int_A f(x)\, d\nu(x)$ for all $A \in \mathcal{A}$, then $\mu$ is absolutely continuous with respect to $\nu$, and the function $f$ is called the Radon-Nikodym derivative of $\mu$ with respect to $\nu$; as $d\mu(x) = f(x)\, d\nu(x)$ suggests, one writes $d\mu/d\nu = f$. The Theorem of Radon-Nikodym assures that this situation is the general case. Let us abbreviate $\mu \ll \nu$ if $\mu$ is absolutely continuous with respect to $\nu$.

Theorem: If $\mu \ll \nu$, there exists a measurable $f \geq 0$ with $\mu = f\nu$.

The theorem is important in probability theory, where the measures under consideration are usually probability measures, meaning $\mu(\Omega) = 1$; if $\mu$ is finite, the density $f$ is in $L^1(\Omega, \mathcal{A}, \nu)$. If $\mu$ is absolutely continuous with respect to $\nu$, then every set of zero probability with respect to $\nu$ has zero probability with respect to $\mu$. An example of a measure $\mu$ on the Lebesgue space $([0, 1], \mathcal{A}, \nu = dx)$ which is not absolutely continuous with respect to $\nu$ is a Dirac point measure $\delta_x$ for a point $x$ in $[0, 1]$. An application of the Radon-Nikodym theorem is the Lebesgue decomposition of a measure: one can split every $\sigma$-finite measure into an absolutely continuous, a singular continuous and a pure point part. This is important in the spectral theory of mathematical physics [597]. For measure theory and real analysis in general, see for example [220]. For the history, see [653] (page 257): the theorem was first proven by Radon in 1913 in $\mathbb{R}^n$ and then by Nikodym in 1930.
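On a finite set everything can be computed by hand, which makes the definitions concrete. In this sketch, which assumes measures are given as dictionaries from points to masses and uses made-up function names, the Radon-Nikodym derivative is the pointwise ratio $\mu(\{w\})/\nu(\{w\})$, it fails to exist exactly when $\mu$ charges a $\nu$-null point, and the Lebesgue decomposition separates those points as the singular part.

```python
from fractions import Fraction
from itertools import combinations

def radon_nikodym(mu, nu):
    """Density f = d(mu)/d(nu) for measures on a finite set, given as dicts point -> mass."""
    f = {}
    for w in set(mu) | set(nu):
        m, n = Fraction(mu.get(w, 0)), Fraction(nu.get(w, 0))
        if n == 0:
            if m != 0:
                raise ValueError(f"mu is not absolutely continuous w.r.t. nu at {w!r}")
            f[w] = Fraction(0)          # arbitrary: {w} is a nu-null set
        else:
            f[w] = m / n
    return f

def lebesgue_decomposition(mu, nu):
    """Split mu = mu_ac + mu_s, where mu_ac << nu and mu_s lives on a nu-null set."""
    mu_ac = {w: m for w, m in mu.items() if nu.get(w, 0) != 0}
    mu_s = {w: m for w, m in mu.items() if nu.get(w, 0) == 0}
    return mu_ac, mu_s

def measure(rho, A):
    return sum((Fraction(rho.get(w, 0)) for w in A), Fraction(0))

nu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
mu = {"a": Fraction(1, 8), "b": Fraction(3, 8), "c": Fraction(1, 2)}
f = radon_nikodym(mu, nu)
points = sorted(nu)
ok = all(measure(mu, A) == sum((f[w] * nu[w] for w in A), Fraction(0))
         for k in range(len(points) + 1) for A in combinations(points, k))
print(f, ok)                                   # mu(A) = sum over A of f dnu for every subset A

mu2 = dict(mu, d=Fraction(1, 5))               # a finite measure with a point mass on d
print(lebesgue_decomposition(mu2, nu))         # singular part {'d': 1/5}
```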

200. Crofton formula

If a needle of length $l < 1$ is thrown randomly onto a periodic grid of lines spaced distance 1 apart, the probability of hitting a grid line is $2l/\pi$. This method of computing $\pi$ is an example of a Monte-Carlo method. A probability space of needle configurations can be given as $(\Omega, \mathcal{A}, \mu) = ([0, 1/2] \times [-\pi/2, \pi/2], \mathcal{A}, 2\, d\theta\, dr/\pi)$ with product Lebesgue measure, where $r$ is the minimal distance of the center of the needle to a grid line and $\theta$ is the polar angle. The needle hits if and only if $r \leq (l/2)\cos(\theta)$. The probability therefore is obtained by integrating the density $2/\pi$ over this region. It gives $\int_{-\pi/2}^{\pi/2} \int_0^{(l/2)\cos(\theta)} (2/\pi)\, dr\, d\theta = 2l/\pi$. This can now be generalized to any rectifiable curve of length $l$. One has only to look at the random variable $X$ which counts the number of intersections of the randomly placed curve with the grid. The Crofton formula in the plane is now $E[X] = 2l/\pi$. To see this, approximate the curve by a polygon with segments $L_1, \dots, L_n$ of lengths $l_1, \dots, l_n$ and look at each segment $L_j$ as a "needle". Then $X = X_1 + \dots + X_n$, where $X_j$ counts the number of intersections of $L_j$ with the grid and $E[X_j] = 2l_j/\pi$. By linearity of expectation and additivity of length, the Crofton formula follows. One can look at the problem also in $\mathbb{R}^n$, where one has a system of parallel hyperplanes spaced a unit apart and a rectifiable curve of length $l$. Now, the volume $|B^{n-1}|$ of the $(n-1)$-dimensional unit ball and the volume $|S^{n-1}|$ of the $(n-1)$-dimensional unit sphere matter. Again, $X$ is the number of intersections of the curve with the periodic grid of hyperplanes.

Theorem: $E[X] = 2l|B^{n-1}|/|S^{n-1}|$.

In the case $n = 2$, one has $|B^1| = 2$, $|S^1| = 2\pi$, and the original Buffon formula follows. The
Buffon needle problem is the first connection between probability theory and geometry. It
appeared first in 1733 and was reproduced again in 1777 by Buffon. Morgan Crofton extended
this in 1868 [163]. The mathematical field of integral geometry started to blossom with Blaschke
[71] in the late 1930s. Probability spaces can be used to study more geometric quantities
like surface area or curvature [41, 42, 517]. General references are [624, 625, 408, 633]. The
n-dimensional Crofton formula can be found in [408].
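The constant in the theorem can be evaluated with the standard formulas $|B^k| = \pi^{k/2}/\Gamma(k/2+1)$ and $|S^k| = 2\pi^{(k+1)/2}/\Gamma((k+1)/2)$, which are not stated in the text. This sketch (hypothetical function names) checks that $n = 2$ gives Buffon's $2l/\pi$ and that $n = 3$ gives $l/2$.

```python
import math


def unit_ball_volume(k):
    """Volume |B^k| of the k-dimensional unit ball."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def unit_sphere_volume(k):
    """k-dimensional volume |S^k| of the unit sphere in R^(k+1)."""
    return 2 * math.pi ** ((k + 1) / 2) / math.gamma((k + 1) / 2)


def crofton_expected_intersections(length, n):
    """E[X] = 2 l |B^(n-1)| / |S^(n-1)| for hyperplanes spaced a unit apart in R^n."""
    return 2 * length * unit_ball_volume(n - 1) / unit_sphere_volume(n - 1)


for n in range(2, 6):
    print(n, crofton_expected_intersections(1.0, n))
```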

201. Desnanot-Jacobi identity

If $A$ is an $n \times n$ matrix, the matrix entries are accessed as $A_{ij}$. Call $A_i^j$ the matrix obtained by
deleting row $i$ and column $j$ in $A$. The expression $(-1)^{i+j}\det(A_i^j)$ is known as the cofactor
of the minor $\det(A_i^j)$. Similarly, let $A_{ij}^{kl}$ be the matrix in which rows $i, j$ and columns $k, l$ are
deleted. The Desnanot-Jacobi identity is the following relation between sub-determinants
of a matrix:

Theorem: $\det(A)\det(A_{1n}^{1n}) = \det(A_1^1)\det(A_n^n) - \det(A_1^n)\det(A_n^1)$.

It allows one to express $\det(A)$ in terms of the $(n-2) \times (n-2)$ matrix in which the boundary rim
is removed and the four $(n-1) \times (n-1)$ matrices in which one boundary row and one
boundary column are removed. In the case $n = 2$ the identity still works if
one interprets $\det(A_{12}^{12})$ as 1, which is the value usually assumed for the determinant
of the empty matrix. In that case, the Desnanot-Jacobi identity is just $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$.
The identity is also called the Desnanot-Jacobi adjoint matrix theorem. A generalization
is called the Sylvester determinant identity. The Desnanot-Jacobi identity leads to a
process called Dodgson condensation or Alice in Wonderland condensation because
Charles Lutwidge Dodgson is also known as Lewis Carroll, the author of "Alice in Wonderland"
[118]. The condensation method was described in [189] in 1866. The Desnanot result appears
first in 1819, in the book [175] on page 152, and in [373] in 1827. Like for Cauchy-Binet, it is
historically remarkable that this identity was found before matrices were introduced. Indeed, the
word "matrix", related to the Latin word "mater" for mother, was later used in a more generalized
sense as "womb". The word "matrix" therefore appeared because matrices are devices which
bear determinants. There are more references in [432].
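The identity and the condensation process built on it can be tested with exact rational arithmetic. In this sketch `delete`, `laplace_det`, `desnanot_jacobi_gap` and `dodgson_det` are hypothetical helper names. Condensation as written divides by interior entries of the previous step, so it assumes that none of them vanishes.

```python
from fractions import Fraction


def delete(matrix, rows, cols):
    """Sub-matrix with the given row and column indices (0-based) removed."""
    return [[x for j, x in enumerate(row) if j not in cols]
            for i, row in enumerate(matrix) if i not in rows]


def laplace_det(matrix):
    """Determinant by cofactor expansion along the first row; the empty matrix has det 1."""
    if not matrix:
        return 1
    return sum((-1) ** j * matrix[0][j] * laplace_det(delete(matrix, {0}, {j}))
               for j in range(len(matrix)))


def desnanot_jacobi_gap(matrix):
    """det(A)det(A_1n^1n) - [det(A_1^1)det(A_n^n) - det(A_1^n)det(A_n^1)]; 0 by the theorem."""
    first, last = 0, len(matrix) - 1
    lhs = laplace_det(matrix) * laplace_det(delete(matrix, {first, last}, {first, last}))
    rhs = (laplace_det(delete(matrix, {first}, {first})) * laplace_det(delete(matrix, {last}, {last}))
           - laplace_det(delete(matrix, {first}, {last})) * laplace_det(delete(matrix, {last}, {first})))
    return lhs - rhs


def dodgson_det(matrix):
    """Determinant by Dodgson condensation, assuming no interior entry used as divisor is 0."""
    current = [[Fraction(x) for x in row] for row in matrix]
    previous = [[Fraction(1)] * (len(current) + 1) for _ in range(len(current) + 1)]
    while len(current) > 1:
        m = len(current)
        # each new entry: a connected 2x2 minor divided by the interior entry one step back
        condensed = [[(current[i][j] * current[i + 1][j + 1]
                       - current[i][j + 1] * current[i + 1][j]) / previous[i + 1][j + 1]
                      for j in range(m - 1)] for i in range(m - 1)]
        previous, current = current, condensed
    return current[0][0]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [2, 6, 4, 8], [3, 1, 1, 2]]
print(laplace_det(A), dodgson_det(A), desnanot_jacobi_gap(A))
```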

202. Existence of Minimal surfaces

A 2-dimensional surface $S$ in $\mathbb{R}^n$ is the image of a parametrization $r(u, v): R \to \mathbb{R}^n$, where
$R \subset \mathbb{R}^2$ is the parameter domain, an open, simply connected region in the plane which one
can assume to be the unit disc $R$ with circle $C$ as boundary. The surface is called a minimal
surface if every component of $r$ is harmonic, $\Delta r = 0$, and furthermore $E - G = |r_u|^2 - |r_v|^2 = 0$
and $F = r_u \cdot r_v = 0$, expressing that the Riemannian metric $g$ on $S$ is conformal. The Plateau
problem is to find, for a given simple closed curve $\Gamma$ in $\mathbb{R}^n$, a minimal surface $S$ which has $\Gamma$
as its boundary. One wants the map $r$ to be smooth in $R$ and continuous up to the boundary
$C$. The surface $S$ does not necessarily have to be embedded ($r$ is not necessarily injective); it
can just be immersed.
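The defining conditions $\Delta r = 0$, $E - G = 0$ and $F = 0$ (with $E = |r_u|^2$, $G = |r_v|^2$, $F = r_u \cdot r_v$) can be checked numerically with central differences. The example surface is Enneper's surface $r(u, v) = (u - u^3/3 + uv^2, v - v^3/3 + vu^2, u^2 - v^2)$, a classical conformally parametrized minimal surface that is not mentioned in the text; the step size and the function names are assumptions of this sketch.

```python
def enneper(u, v):
    """Enneper's surface, a classical conformally parametrized minimal surface."""
    return (u - u ** 3 / 3 + u * v ** 2, v - v ** 3 / 3 + v * u ** 2, u ** 2 - v ** 2)


def minimal_surface_defects(r, u, v, h=1e-4):
    """Finite-difference estimates of max_i |Delta r_i|, |E - G| and |F| at (u, v)."""
    f0 = r(u, v)
    fu_p, fu_m = r(u + h, v), r(u - h, v)
    fv_p, fv_m = r(u, v + h), r(u, v - h)
    ru = [(a - b) / (2 * h) for a, b in zip(fu_p, fu_m)]
    rv = [(a - b) / (2 * h) for a, b in zip(fv_p, fv_m)]
    laplacian = [(a + b + c + d - 4 * e) / h ** 2
                 for a, b, c, d, e in zip(fu_p, fu_m, fv_p, fv_m, f0)]
    E = sum(x * x for x in ru)
    G = sum(x * x for x in rv)
    F = sum(x * y for x, y in zip(ru, rv))
    return max(abs(x) for x in laplacian), abs(E - G), abs(F)


print(minimal_surface_defects(enneper, 0.3, -0.7))             # all close to 0
print(minimal_surface_defects(lambda u, v: (u, v, u * u + v * v), 0.2, 0.1))  # paraboloid
```

The paraboloid fails already the harmonicity test, since $\Delta(u^2 + v^2) = 4$.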

Theorem: There is a solution to the Plateau problem.

Note that this does not mean that the solution is unique. Indeed, in general there are multiple
solutions, even though generically only finitely many. Solutions can also have branch
points or self-intersections, or can be physically unstable and so would be difficult to observe in
soap bubble experiments. The problem was first solved independently by Tibor Radó in 1930 and by
Jesse Douglas in 1931. If, more generally, the region $R$ has larger genus and several boundary curves,
the problem is called the Douglas problem. When looking at how soap films change in
dependence of parameters, drastic changes like catastrophes can happen. For example,
if $\Gamma$ is changed, solutions to a genus one Douglas problem can suddenly appear because they have lower
energy [242]. In order to solve the Plateau problem, one is led to the variational problem of
extremizing the Dirichlet integral $L(r) = \iint_R |r_u|^2 + |r_v|^2 \, du\, dv$. The harmonicity condition
$\Delta r = 0$ is the Euler equation of the variational problem. This is a special case of a Dirichlet
principle. The problem was raised by Joseph-Louis Lagrange in 1760 and named after the
physics and anatomy professor Joseph Plateau, who made experiments with soap films. Poisson realized that
soap films are surfaces of constant mean curvature. In higher dimensions, the problem has
led to geometric measure theory. We followed partly [159]. More information is in [689],
which describes the history of soap films and soap bubbles, one of the oldest objects
in mathematical analysis, and points out that for a long time after Lagrange's derivation
of the minimal surface equation, the analysis was too difficult even for mathematicians like
Riemann, Weierstrass or Schwarz. In [242] (part I) there is more history, many pictures,
and examples of minimal films in nature, such as the economical surfaces forming the skeletons
of radiolarians, tiny marine organisms.

203. Fermat's right angle theorem

A positive integer is a congruent number if it is the area of a right triangle with rational
sides. The 3-4-5 triangle, for example, has area $n = 6$, so that 6 is a congruent number.
The $3/2, 20/3, 41/6$ triangle has area $n = 5$. The example $n = 5$ shows that one has to
use rational numbers in general. If $x, y, z$ are the side lengths of the triangle, then the condition is
$x^2 + y^2 = z^2$, $xy = 2n$. Rational Pythagorean triples can be generated with $x = u^2 - v^2$, $y = 2uv$, $z = u^2 + v^2$.
This leads to congruent numbers $n = uv(u^2 - v^2)$. For $u = 3, v = 2$, for example,
one gets the 5-12-13 triangle with area 30. Fermat showed:
Theorem: No square number can be a congruent number.

Fermat's proof from 1670 using descent can be found in a self-contained way in [148]. While
integer solutions $(x, y, z)$ can be found by a finite search for a fixed $n$, the task of finding rational
solutions $x, y, z$ for a given $n$ can be difficult. For example, the smallest solution for $n = 101$,
found by Bastien in 1914, is $x = 711024064578955010000/q$, $y = 3967272806033495003922/q$,
$z = 4030484925899520003922/q$ with $q = 118171431852779451900$ [126]; this shows that already for
small $n$, the smallest rational numbers $x, y, z$ solving the problem can become complicated.
Arabic mathematicians knew that numbers like 5, 6, 14, 15, 21, 30, 34, 65, 70, 110, 154, 190
were congruent numbers. Leonardo Pisano (Fibonacci) established that $n = 7$ is a congruent
number with $(x, y, z) = (35/12, 24/5, 337/60)$ and conjectured that no square can be a congruent
number. Fermat then proved with his method of infinite descent that no square is a congruent
number. Already $n = 1$ is interesting as it illustrates the descent method: if $n = 1$ is congruent,
then $x^4 = y^4 + z^2$ has a non-trivial solution. Let $a$ be a rational number such that $a^2 + n$ and $a^2 - n$
are squares of rational numbers. Then $x = \sqrt{a^2 + n} + \sqrt{a^2 - n}$, $y = \sqrt{a^2 + n} - \sqrt{a^2 - n}$, $z = 2a$
is a solution, as $x^2 + y^2 = 4a^2 = z^2$ and $xy/2 = ((a^2 + n) - (a^2 - n))/2 = n$. Work of 1922 by Louis Mordell related
the congruent number problem to elliptic curves. If $u$ is such that $u^2 + n$ and $u^2 - n$ are rational
squares, then $u^4 - n^2$ is a rational square $v^2$, so that $u^6 - n^2 u^2 = u^2 v^2$; with $x = u^2$, $y = uv$
this gives $y^2 = x^3 - n^2 x$. So, if $n$ is a congruent number, there is a rational point with $y \ne 0$ on the
curve $y^2 = x^3 - n^2 x$. Kurt Heegner proved in his 1952 paper that if a prime $p$ is congruent
to 5 or 7 modulo 8, then $p$ is a congruent number, and that if a prime $p$ is congruent to 3 or
7 modulo 8, then $2p$ is a congruent number [66]. Jerrold Tunnell (a student of Tate) showed
in 1983 [712] that the congruent number problem would have a full solution under the Birch
and Swinnerton-Dyer conjecture, one of the Millennium problems. Having that established
would allow one to decide in finitely many steps whether a given integer $n$ is a congruent number or
not.
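The constructions above can be checked with exact fractions. This sketch (hypothetical names) uses Euclid's parametrization, removes square factors from the area $uv(u^2 - v^2)$ by trial division to obtain a squarefree congruent number, and maps a triangle to a point on $Y^2 = X^3 - n^2X$ via $u = z/2$, which gives $X = (z/2)^2$ and $Y = z(x^2 - y^2)/8$. That explicit formula is derived here from the text's substitution rather than quoted from it.

```python
from fractions import Fraction


def squarefree_part(m):
    """Write m = s * k**2 with s squarefree; return (s, k) (plain trial division)."""
    s, k, p = m, 1, 2
    while p * p <= s:
        while s % (p * p) == 0:
            s //= p * p
            k *= p
        p += 1
    return s, k


def triangle_from_euclid(u, v):
    """Rational right triangle of squarefree area n obtained from integers u > v > 0."""
    x, y, z = u * u - v * v, 2 * u * v, u * u + v * v
    n, k = squarefree_part(u * v * (u * u - v * v))  # area xy/2 = uv(u^2 - v^2) = n k^2
    return n, (Fraction(x, k), Fraction(y, k), Fraction(z, k))


def is_witness(x, y, z, n):
    """Check x^2 + y^2 = z^2 and xy/2 = n exactly."""
    return x * x + y * y == z * z and x * y / 2 == n


def curve_point(x, y, z):
    """Point (X, Y) on Y^2 = X^3 - n^2 X from a triangle of area n, using u = z/2."""
    X = (z / 2) ** 2
    Y = z * (x * x - y * y) / 8
    return X, Y


n, (x, y, z) = triangle_from_euclid(5, 4)
print(n, x, y, z)
X, Y = curve_point(x, y, z)
print(Y * Y == X ** 3 - n * n * X)
```

For $u = 5, v = 4$ the triangle $9, 40, 41$ has area $180 = 6^2 \cdot 5$, and scaling by $1/6$ reproduces the $3/2, 20/3, 41/6$ triangle of area 5.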

204. Stark-Heegner theorem

An imaginary quadratic field $K = \mathbb{Q}[\sqrt{-n}]$ has class number 1 if there is unique prime
factorization in $K$ (more precisely, in its ring of integers). Carl Friedrich Gauss already found 9 cases, $n \in \{1, 2, 3, 7, 11, 19, 43, 67, 163\}$.
These cases turned out to be all and are now called Heegner numbers.
Theorem: There are exactly 9 imaginary quadratic fields of class number 1.
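The nine values can be recovered by counting reduced binary quadratic forms, a classical method going back to Gauss that is not described in the text. The sketch assumes the standard fundamental discriminant $D = -n$ if $n \equiv 3 \pmod 4$ and $D = -4n$ otherwise; since it only scans a finite range, it illustrates the theorem but of course does not prove it.

```python
from math import gcd


def class_number(D):
    """Count reduced primitive forms a x^2 + b xy + c y^2 of discriminant D < 0."""
    h, a = 0, 1
    while 3 * a * a <= -D:               # reduced forms satisfy a <= sqrt(|D|/3)
        for b in range(-a + 1, a + 1):   # |b| <= a, with b = -a excluded
            if (b * b - D) % (4 * a):
                continue
            c = (b * b - D) // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                h += 1
        a += 1
    return h


def is_squarefree(n):
    return all(n % (p * p) for p in range(2, int(n ** 0.5) + 1))


def field_discriminant(n):
    """Discriminant of Q(sqrt(-n)) for squarefree n > 0."""
    return -n if n % 4 == 3 else -4 * n


print([n for n in range(1, 200) if is_squarefree(n) and class_number(field_discriminant(n)) == 1])
```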
The theorem is now known as the Stark-Heegner theorem. Kurt Heegner proved it in 1952
[330]. For more than a decade, the proof was labeled as having a "gap", but it got rehabilitated by
Harold Stark in 1969 [669], thanks also to [66], who was one of the first to recognize Heegner's
achievement in his 1952 paper. The introduction of Heegner's paper is a masterpiece, skillfully
pointing out how class number theory relates to the congruent number problem
that historically led Fermat to his descent method. As Stark pointed out, the dismissal
of Heegner's proof must also have been due to professional bias, as there was just one step
missing: showing that a concrete equation $x^{24} - ax^8 - 16 = 0$ has a factor of degree six whose
coefficients are algebraic integers of degree 1, for which he refers to Weber's textbook [734].
About the origin of the bias [66]: Heegner was a fine mathematician, with a rather low-grade post
in a gymnasium in East Berlin. It was a widely held view that the trouble in Heegner's proof
should be traced to Weber. Today, thanks to Stark, it is clear that the gap was actually
non-existent and that Heegner's proof is correct. Stark also points out that Weber's part is correct, but
that Weber could have given more details had he seen any need to do so. One can justify the name
Stark-Heegner theorem because Stark not only gave a new clarified proof but took the trouble
to investigate whether there was indeed a mistake in Heegner's proof. Bryan Birch also played an
important role in rediscovering Heegner, as "Heegner numbers" also got into the
spotlight in the context of the Birch and Swinnerton-Dyer conjecture and the Gross-Zagier
theorem. The largest Heegner number got a bit of a "cult status" as it appears in
Ramanujan's constant $e^{\pi\sqrt{163}}$, which is less than $10^{-12}$ close to the integer $640320^3 + 744$. This
can be justified by the fact that if $n$ is a Heegner number, then the $j$-invariant of $(1 + \sqrt{-n})/2$
is an integer, and a $q$-expansion then gives a theoretical error of the order $O(e^{-\pi\sqrt{163}})$.
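The near-integrality can be checked with Python's `decimal` module. Since `decimal` has no built-in $\pi$, this sketch computes it with Machin's formula $\pi = 16\arctan(1/5) - 4\arctan(1/239)$, a choice made here for self-containment; the working precision of 60 digits is an assumption that comfortably exceeds the roughly 31 digits needed to resolve the gap.

```python
from decimal import Decimal, localcontext


def arctan_inverse(x, digits):
    """arctan(1/x) for an integer x > 1 from its alternating Taylor series."""
    eps = Decimal(10) ** -(digits + 5)
    power = Decimal(1) / x  # holds 1 / x^(2k+1)
    total, k = Decimal(0), 0
    while power > eps:
        term = power / (2 * k + 1)
        total += -term if k % 2 else term
        power /= x * x
        k += 1
    return total


def ramanujan_constant(digits=60):
    """exp(pi * sqrt(163)) computed with `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        pi = 16 * arctan_inverse(5, digits) - 4 * arctan_inverse(239, digits)
        return (pi * Decimal(163).sqrt()).exp()


value = ramanujan_constant()
gap = Decimal(640320 ** 3 + 744) - value
print(value)
print(gap)
```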

205. Equichordal point theorem

If $C$ is a smooth convex curve in the plane, a point $P$ in its interior is called an equichordal
point if all the chords through $P$ have the same length. For the circle $C$, this happens
at the center. For the polar curve $r(t) = 2 + \sin(t)$, the origin is an equichordal point, since $r(t) + r(t + \pi) = 4$.
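For a curve given in polar coordinates $r(t)$, the chord through the origin in direction $t$ has length $r(t) + r(t + \pi)$. This sketch checks numerically that the chords of $r(t) = 2 + \sin(t)$ through the origin all have length 4, and checks convexity through the sign of the polar curvature numerator $r^2 + 2r'^2 - rr''$ (a standard formula not given in the text; here it equals $6 + 6\sin(t) \ge 0$).

```python
import math


def r(t):
    return 2 + math.sin(t)


def chord_length(t):
    """Length of the chord through the origin in direction t."""
    return r(t) + r(t + math.pi)


def curvature_numerator(t):
    """r^2 + 2 r'^2 - r r'' for r = 2 + sin t; convexity means this is >= 0."""
    dr, ddr = math.cos(t), -math.sin(t)
    return r(t) ** 2 + 2 * dr ** 2 - r(t) * ddr


ts = [2 * math.pi * k / 1000 for k in range(1000)]
print(min(map(chord_length, ts)), max(map(chord_length, ts)))
print(min(map(curvature_numerator, ts)))
```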

Theorem: A convex curve can not have two equichordal points.

The problem had been posed by Fujiwara in 1916 [252] and appeared in a problem section of
Blaschke, Rothe and Weitzenböck [72]. It seems that Erdös was also independently conjecturing
this, as Gabriel Andrew Dirac's work of 1952 indicates [184]. The conjecture was proven by
Marek Rychlik in 1997 [619], who established it more generally for star-like curves. The proof
uses methods from dynamical systems, complex analysis and algebraic geometry.

206. Lucas fundamental theorem

The Fibonacci sequence $F(n)$ is defined by the second order recursion $F(0) = 0$, $F(1) = 1$ and
$F(n + 1) = F(n) + F(n - 1)$. When looking at the prime factorizations, one can notice that
the even terms $F(2n)$ have lots of prime divisors while the odd terms $F(2n + 1)$ have only a
few. Indeed, it follows from Lucas' work that all primes appear as factors of the even Fibonacci
numbers. Let GCD denote the greatest common divisor.

Theorem: $\mathrm{GCD}(F(m), F(n)) = F(\mathrm{GCD}(m, n))$.

This fundamental theorem of Lucas of 1878 [477] tells that the sequence $F(n)$ is a strong
divisibility sequence. Together with Lucas' law of apparition and Lucas' law of repetition,
it implies that every integer divides infinitely many Fibonacci numbers [455]. In the
context of primality testing, Lucas also looked at the Lucas numbers $L(n)$, which satisfy
the same recursion but have a different initial condition $L(1) = 1$, $L(2) = 3$. One then has
$F(2n) = F(n)L(n)$. Lagarias proved in 1985 an analogue of the Chebotarev density theorem
using a method of Hasse. He showed that the density of prime divisors of the Lucas
sequence is 2/3 [456]. That article mentions that it is believed that the set of primes dividing
the terms $U(n)$ of any non-degenerate second order linear recurrence has a positive density, and
that this is conditionally true under the assumption of the generalized Riemann hypothesis.
A bit about the history (see [456]): the Fibonacci sequence appeared first in the third book of
"Liber Abbaci" of Leonardo Pisano from 1227, a book that contains 90 sample problems, with
50 from Arabic sources. It also contains the famous rabbit problem. Édouard Lucas had been
an artillery officer in the Franco-Prussian war and then a high school teacher in Paris, who
was also interested in recreational mathematics and invented the tower of Hanoi problem [478].
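The statements can be spot-checked on finite ranges with the convention $F(0) = 0$, $F(1) = 1$, and with $L(0) = 2$, $L(1) = 1$ (so that $L(2) = 3$ as above). The helper `apparition` computes the rank of apparition, the smallest $n \ge 1$ with $k \mid F(n)$; together with strong divisibility it shows that $k$ divides every $F(j \cdot \mathrm{apparition}(k))$. This is a finite check under these assumptions, not a proof.

```python
from math import gcd


def fibonacci(n):
    """F(0) = 0, F(1) = 1, F(n + 1) = F(n) + F(n - 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def lucas(n):
    """Lucas numbers: same recursion, L(0) = 2, L(1) = 1, so L(2) = 3."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def apparition(k):
    """Smallest n >= 1 with k | F(n); it exists because F mod k is periodic."""
    a, b, n = 1 % k, 1 % k, 1  # (F(1), F(2)) mod k
    while a != 0:
        a, b = b, (a + b) % k
        n += 1
    return n


print([apparition(k) for k in range(1, 13)])
```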

207. Hilbert distance

The Hilbert distance $d(x, y)$ is defined for points $x, y$ in a bounded convex domain $X$ in a
Hilbert space: construct the line through $x, y$. It intersects the boundary of $X$ in exactly
two points $p, q$. The Hilbert distance is now defined as $d(x, y) = \frac{1}{2}\log(C(x, y, p, q))$, where
$C(x, y, p, q) = (|x - p||y - q|)/(|y - p||x - q|)$ is the cross ratio between these four points. Due
to its projective invariance, the Hilbert distance then also defines a Hilbert distance on the
projective space $\mathbb{RP}^{n-1}$, which has the property that positive $n \times n$ matrices are contractions.
Let us call a metric on the projective space Perron-Frobenius if it has this property.

Theorem: The Hilbert metric is the unique Perron-Frobenius metric.

In the simplest case $\mathbb{P}^1$, elements are described as $t = [1, t]$ with
$t \in \mathbb{R} \cup \{\infty\}$. The Hilbert metric then is $d(t, s) = |\log(t/s)|$. A positive matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ maps $[1, t]$ to $[1, (c + dt)/(a + bt)]$.
David Hilbert defined the Hilbert metric in 1895 in a letter to Felix Klein [337]. The Hilbert
metric between two points depends on the domain in which the points are considered: the
larger the domain, the smaller the distance. Also, if $z$ is in the line segment $[x, y]$, then
$d(x, y) = d(x, z) + d(z, y)$. For a strictly convex region, there is a unique geodesic (with respect
to this metric) connecting two points. It was Garrett Birkhoff [67] and Hans Samelson [623]
who independently first suggested using the Banach fixed point theorem to prove the
Perron-Frobenius theorem [569, 250, 251], stating that a positive matrix has a unique maximal
eigenvalue [467]. Birkhoff called it the projective metric. For that, one only needs the mere
existence of a Hilbert metric and not its uniqueness. Uniqueness is shown in [437].
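On the positive rays $[1, t]$ with $t > 0$, the metric $d(t, s) = |\log(t/s)|$ and the action $t \mapsto (c + dt)/(a + bt)$ are easy to code. This sketch (hypothetical helper names, an arbitrarily chosen matrix) checks the contraction property on a few rays and then, in the spirit of the Birkhoff-Samelson argument, iterates the contraction to its fixed point. For $A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ the fixed point is the golden ratio, because $(1 + 3t)/(2 + t) = t$ means $t^2 - t - 1 = 0$, and the Perron eigenvalue is $a + bt^* = (5 + \sqrt{5})/2$.

```python
import math


def hilbert_distance(t, s):
    """Distance |log(t/s)| between the rays [1, t] and [1, s], t, s > 0."""
    return abs(math.log(t / s))


def act(A, t):
    """Image of the ray [1, t] under the positive matrix A = [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return (c + d * t) / (a + b * t)


def perron_ray(A, t=1.0, steps=100):
    """Banach fixed point iteration t -> A.t for the contraction induced by A."""
    for _ in range(steps):
        t = act(A, t)
    return t


A = [[2, 1], [1, 3]]
for t, s in [(0.1, 5.0), (1.0, 2.0), (0.01, 100.0)]:
    print(hilbert_distance(act(A, t), act(A, s)) / hilbert_distance(t, s))  # ratios < 1
t_star = perron_ray(A)
print(t_star, A[0][0] + A[0][1] * t_star)  # Perron ray and eigenvalue a + b t*
```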

208. Gross-Zagier

The projective special linear group $G = PSL(2, \mathbb{Z})$ is the group of integer matrices $A$ of
determinant 1 for which the matrices $A$ and $-A$ are identified. It is also called the modular
group, as its elements act as Möbius transformations $z \mapsto T_{a,b,c,d}(z) = (az + b)/(cz + d)$
on the upper half plane $\mathbb{H} \subset \mathbb{C}$. A congruence subgroup $\Gamma$ of $G$ is a subgroup of $G$
which contains a principal congruence subgroup $\Gamma(N)$, the set of matrices in $G$ congruent to the
identity matrix modulo $N$. The smallest $N$ for which this happens is called the level of $\Gamma$.
An important example is the Hecke congruence group $\Gamma_0(N) = \{T_{a,b,c,d} : N \mid c\}$. A modular
elliptic curve $E$ is a quotient $\mathbb{H}/\Gamma$, where $\Gamma$ is a congruence subgroup of the modular group.
Elliptic curves are the simplest positive-dimensional projective algebraic curves that carry a
commutative algebraic group structure. The set of rational points $E(\mathbb{Q})$ is finitely generated
by the Mordell-Weil theorem, so that $E(\mathbb{Q})$ is isomorphic to $\mathbb{Z}^r \times T$, where $r \ge 0$ is called
the rank of $E$ and $T$ is a finite Abelian group called the torsion subgroup of $E$. The Birch
and Swinnerton-Dyer conjecture claims that $r$ is the order $\mathrm{ord}_{s=1} L(E, s)$ of the zero of $L(E, s)$ at
$s = 1$, where the L-function $L(s)$ for an elliptic curve $E$ over $K$ is an explicitly given Dirichlet
series $L(s) = \sum_{n=1}^\infty a_n n^{-s}$. [It can be defined as follows: for a prime $p$ let $\mathbb{F}_{p^e}$ denote the field with
$p^e$ elements. Define $t_1(E) = 2$, $t_{p^e}(E) = p^e + 1 - |E(\mathbb{F}_{p^e})|$ and the counting zeta function
$\zeta_p(z) = \exp(\sum_{e \ge 1} t_{p^e}(E) z^e/e)$ at $p$, and let $1_E(p) = 1$ if $p$ does not divide $N$ and $1_E(p) = 0$ if $p \mid N$. Then
$\zeta_p(z) = (1 - t_p(E)z + 1_E(p)pz^2)^{-1}$. The L-function is then defined as the Euler product
$L(s) = \prod_{p \text{ prime}} \zeta_p(p^{-s})$. While the Dirichlet series only converges for $\mathrm{Re}(s)$ larger than the
abscissa of convergence, one knows from work in the 1970s, like that of Shimura, that in the modular
case $L$ has an analytic continuation to all of $\mathbb{C}$.] The j-invariant $j(\tau)$ is a modular function
of weight zero on $G$. It can be explicitly written down and was originally used to represent
isomorphism classes of elliptic curves. It is known that the field of modular functions is
$\mathbb{C}(j)$. If $\tau$ is an element of an imaginary quadratic field with positive imaginary part, then $j(\tau)$
is an algebraic integer by a result of Theodor Schneider from 1937. Now, a modular elliptic
curve can be parametrized as $r(z) = (j(z), j(Nz)) \in \mathbb{C}^2$, where $N$ is the level of $\Gamma$. If $\omega \in \mathbb{H}$ is
a quadratic irrational number (a number of the form $a + b\sqrt{D} \in \mathbb{H}$ with rational $a, b$) which
solves $NA\omega^2 + B\omega + C = 0$, then $\omega$ and $N\omega$ both have the same discriminant $D = B^2 - 4NAC$,
so that $P = r(\omega) \in E(\mathbb{Q}(\sqrt{D}))$. This $P$ is called a Heegner point on $E$ [66]. The global
canonical height function $h: E \to \mathbb{R}$ is a function on $E$ with the property that $h(Q) = 0$ if $Q$
is a torsion point and such that the parallelogram law $h(P + Q) + h(P - Q) = 2h(P) + 2h(Q)$
holds for all pairs of points $P, Q$ on $E$. It is difficult to compute, but the Gross-Zagier formula
[293] relates it in an explicit way to the derivative at $s = 1$ of the function $L$:

Theorem: The height $h(P)$ of a Heegner point is a non-zero multiple of $L'(1)$.

This implies that if $L'(1) = 0$, then $P$ is a torsion point, and that if $L'(1) \ne 0$, then the rank $r$
of $E$ is positive. Heegner points have been used to construct a rational point of infinite order
on the curve. The theorem was later used to prove much of the Birch and Swinnerton-Dyer
conjecture for rank 1 elliptic curves. [66] illuminates the history of the theorem.
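The coefficients $t_p(E) = p + 1 - |E(\mathbb{F}_p)|$ entering the Euler product can be computed by brute-force point counting. The sketch uses the curve $y^2 = x^3 - x$, the curve $y^2 = x^3 - n^2x$ of the congruent number problem for $n = 1$; the curve and the function names are illustrative choices. The values can be compared with the Hasse bound $|t_p(E)| \le 2\sqrt{p}$, a classical fact not stated in the text.

```python
import math


def count_points(p, a, b):
    """|E(F_p)| for y^2 = x^3 + a x + b over F_p (p an odd prime), including infinity."""
    square_roots = {}
    for y in range(p):
        square_roots[y * y % p] = square_roots.get(y * y % p, 0) + 1
    return 1 + sum(square_roots.get((x ** 3 + a * x + b) % p, 0) for x in range(p))


def t_p(p, a, b):
    """t_p(E) = p + 1 - |E(F_p)|, the coefficient in the local factor (1 - t_p z + p z^2)^(-1)."""
    return p + 1 - count_points(p, a, b)


def odd_primes(limit):
    return [p for p in range(3, limit) if all(p % q for q in range(2, math.isqrt(p) + 1))]


for p in odd_primes(40):
    print(p, t_p(p, -1, 0))
```

For this curve one sees $t_p = 0$ whenever $p \equiv 3 \pmod 4$.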

209. Schur determinant identity

The Schur determinant identity is an identity for partitioned matrices $M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$,
where $A, B, C, D$ are all $n \times n$ matrices. Assuming $A$ is invertible, one can write
$M = \begin{pmatrix} A & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix} \begin{pmatrix} 1 & A^{-1}B \\ 0 & D - CA^{-1}B \end{pmatrix}$. Using the Cauchy-Binet product formula, one gets the Schur
identity

Theorem: $\det(M) = \det(A)\det(D - CA^{-1}B)$

The matrix $D - CA^{-1}B$ is called the Schur complement. Given two $n \times m$ matrices $F, G$,
one can compare the determinant of $XY$ with the determinant of $YX$, where $X = \begin{pmatrix} 1 & -F \\ G^T & 1 \end{pmatrix}$ and $Y = \begin{pmatrix} 1 & F \\ 0 & 1 \end{pmatrix}$,
to get the Weinstein-Aronszajn identity $\det(1 + FG^T) = \det(1 + G^TF)$. See [170, 703].
This identity also follows from the formula $\det(1 + F^TG) = \sum_P \det(F_P)\det(G_P)$ involving the
summation over all minors [423]. (Compare that the classical Cauchy-Binet formula for $n \times m$
matrices $F, G$ states $\det(F^TG) = \sum_P \det(F_P)\det(G_P)$, which is a sum over all $m \times m$ minors.
For $n = m$, it becomes the product formula $\det(FG) = \det(F)\det(G)$.) In [703] many more
identities are listed, like $\det(A + BC) = \det(A)\det(1 + CA^{-1}B)$ if $A$ is invertible (which means
especially $\det(A + B) = \det(A)\det(1 + A^{-1}B)$, which is a special case of the Schur identity for
$C = D = 1$) or $\det(A + B)\det(A - B) = \det(B)\det(AB^{-1}A - B)$ if $B$ is invertible.
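The Schur identity and the Weinstein-Aronszajn identity $\det(1 + FG^T) = \det(1 + G^TF)$ can be verified with exact rational arithmetic. The small linear algebra helpers below are hypothetical names written for this sketch, and the matrices are chosen arbitrarily.

```python
from fractions import Fraction


def det(M):
    """Determinant over Q by Gaussian elimination with row swaps (det of [] is 1)."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            return Fraction(0)
        if p != k:
            M[k], M[p] = M[p], M[k]
            result = -result
        result *= M[k][k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return result


def inverse(A):
    """Gauss-Jordan inverse over Q; raises StopIteration if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        p = next(i for i in range(k, n) if M[i][k] != 0)
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [x / pivot for x in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return [row[n:] for row in M]


def mul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def add(X, Y, s=1):
    """Entrywise X + s * Y."""
    return [[a + s * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def transpose(X):
    return [list(col) for col in zip(*X)]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def block(A, B, C, D):
    """The partitioned matrix [[A, B], [C, D]]."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]


A = [[2, 1], [1, 1]]
B = [[1, 0], [2, 3]]
C = [[0, 1], [1, 4]]
D = [[5, 2], [1, 1]]
schur_complement = add(D, mul(mul(C, inverse(A)), B), -1)  # D - C A^{-1} B
print(det(block(A, B, C, D)), det(A) * det(schur_complement))

F = [[1, 2], [0, 1], [3, -1]]  # 3 x 2
G = [[2, 0], [1, 1], [-1, 4]]  # 3 x 2
print(det(add(identity(3), mul(F, transpose(G)))), det(add(identity(2), mul(transpose(G), F))))
```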

210. Herman's subharmonicity theorem

If $(\Omega, \mathcal{A}, \mu)$ is a probability space, $T$ an automorphism and $A \in L^\infty(\Omega, SL(2, \mathbb{C}))$, define the
non-abelian Birkhoff product $A_n(x) = A(T^{n-1}x)A(T^{n-2}x) \cdots A(Tx)A(x)$. An example
is when $\Omega$ is a 2-manifold, $T$ an area-preserving diffeomorphism $\Omega \to \Omega$ and $A(x) = dT(x)$
the Jacobian. Another example is when $(Lu)(n) = u(n+1) + u(n-1) + V(T^n x)u(n)$, where the
equation $Lu = Eu$ leads to the transfer matrix $A(x) = \begin{pmatrix} E - V(x) & 1 \\ -1 & 0 \end{pmatrix}$. The Lyapunov exponent
$\lambda(A) = \lim_{n \to \infty} \frac{1}{n}\int_\Omega \log\|A_n(x)\|\, d\mu(x)$
exists because it is a limit of a sub-additive sequence. Assume $z \in \mathbb{C}^d \mapsto A(z) \in SL(2, \mathbb{C})$ is analytic
in the sense that each matrix entry is an analytic function in each of the variables. Assume
also that $T: D_r^d \to D_r^d$ is analytic in a neighborhood of the polydisc $D_r^d$, maps the boundary
$\Omega = \mathbb{T}^d$ into itself, satisfies $T(0) = 0$ and preserves the Haar measure on $\Omega$. Herman's
theorem [334] is

Theorem: $\lambda(A) \ge \lambda(A(0)) = \log \max|\sigma(A(0))|$

The reason is that $z \mapsto \log\|A_n(z)\|$ is pluri-subharmonic, so that the integral over the torus
is bounded below by the Lyapunov exponent value at 0. For example, if $p(z) = c(z + z^{-1})/2$,
$A(z) = \begin{pmatrix} p(z) & -1 \\ 1 & 0 \end{pmatrix}$ and $T(z) = wz$ with $w = e^{i\alpha}$, which induces the dynamical system $T(\theta) = \theta + \alpha \mod 2\pi$ on the boundary
$\mathbb{T}^1$, then the Lyapunov exponent of $A(\theta) = \begin{pmatrix} c\cos(\theta) & -1 \\ 1 & 0 \end{pmatrix}$ over the dynamical system is
larger than or equal to $\log(c/2)$. The reason is that the Lyapunov exponent of
$B(z) = zA(z) = \begin{pmatrix} c(z^2 + 1)/2 & -z \\ z & 0 \end{pmatrix}$ is bounded below by the logarithm of the spectral radius of
$B(0) = \begin{pmatrix} c/2 & 0 \\ 0 & 0 \end{pmatrix}$, which is $\log(c/2)$. Another application is if $A \in L^\infty(\Omega, SL(2, \mathbb{C}))$ is arbitrary and
$T: (\Omega, \mathcal{A}, \mu) \to (\Omega, \mathcal{A}, \mu)$ is a dynamical system: then for $A_\beta(x) = A(x)\begin{pmatrix} \cos(\beta) & -\sin(\beta) \\ \sin(\beta) & \cos(\beta) \end{pmatrix}$,
the Lebesgue measure of the set of values $\beta$ with $\lambda(A_\beta) > 0$ is positive if $A \notin SU(2, \mathbb{C})$ on a set of
positive measure. This can be used to show that the set of $A \in L^\infty(\Omega, SL(2, \mathbb{C}))$ with $\lambda(A) > 0$
is dense [416]. The method of Herman has been extended in various ways: [664] use the Jensen
inequality in complex analysis to show that for a non-constant real analytic $f$, $A(x) = \begin{pmatrix} E - cf(x) & 1 \\ -1 & 0 \end{pmatrix}$
and the dynamical system $T(x) = x + \alpha \mod 2\pi$ with irrational $\alpha$, the Lyapunov
exponent of $A$ is positive for all $E$ if $c$ is large enough. Herman's theorem and the Sorets-Spencer
theorem are the starting point in [85].
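Herman's bound can be observed numerically for $A(\theta) = \begin{pmatrix} c\cos(\theta) & -1 \\ 1 & 0 \end{pmatrix}$ over the rotation $\theta \mapsto \theta + \alpha$. By sub-additivity, $\lambda(A)$ is the infimum of the averages $\frac{1}{n}\int \log\|A_n\|\, d\theta/2\pi$, so by the theorem each such average is already at least $\log(c/2)$. This sketch approximates the integral by an average over equally spaced $\theta$, uses the Frobenius norm (which is at least the operator norm) and picks the golden mean rotation number; these choices and the parameter values are assumptions of the illustration.

```python
import math


def herman_average(c, alpha, n=100, grid=1000):
    """Grid average over theta of (1/n) log ||A_n(theta)|| for A(theta) = [[c cos theta, -1], [1, 0]].

    A_n(theta) = A(theta + (n-1) alpha) ... A(theta + alpha) A(theta); the Frobenius norm
    is used and the product is renormalized at every step to avoid overflow.
    """
    total = 0.0
    for k in range(grid):
        theta = 2 * math.pi * k / grid
        m11, m12, m21, m22 = 1.0, 0.0, 0.0, 1.0
        log_norm = 0.0
        for j in range(n):
            t = c * math.cos(theta + j * alpha)
            # left multiplication by [[t, -1], [1, 0]]
            m11, m12, m21, m22 = t * m11 - m21, t * m12 - m22, m11, m12
            s = math.sqrt(m11 * m11 + m12 * m12 + m21 * m21 + m22 * m22)
            m11, m12, m21, m22 = m11 / s, m12 / s, m21 / s, m22 / s
            log_norm += math.log(s)
        total += log_norm / n
    return total / grid


alpha = math.pi * (math.sqrt(5) - 1)  # 2 pi times the golden mean
for c in (3.0, 4.0, 6.0):
    print(c, herman_average(c, alpha), math.log(c / 2))
```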

211. Gabriel's theorem

A quiver $(V, E)$ is another word for a multidigraph, a directed graph in which multiple
directed connections (arrows) and self connections (loops) are allowed. The graph defined
by $(V, E)$ is the multigraph one obtains if the directions of the arrows are ignored. A representation
$V$ of a quiver assigns a vector space $V(x)$ over an algebraically closed field to each node
$x \in V$ and a linear map $V(x \to y): V(x) \to V(y)$ to each arrow $x \to y$.
It is indecomposable if it cannot be written as the direct sum of smaller positive dimensional
representations. A quiver is of finite type if it has only finitely many isomorphism classes
of indecomposable representations. The quiver diagrams are formed by the simply laced
Dynkin diagrams $A_n, D_n, E_6, E_7, E_8$. Gabriel's theorem classifies the connected quivers of
finite type.

Theorem: Connected quivers of finite type are exactly those whose underlying graph is a quiver diagram.

The theorem was proven by Peter Gabriel in 1972 [255]. The article is written in German and uses
the word "Köcher" for quiver. Peter Gabriel (1933-2015) was a French and
Swiss mathematician, also known as Pierre Gabriel. On Wikipedia, he is listed as a student of
Alexander Grothendieck with a thesis done in 1960 on Abelian categories [254] (on his personal
website, which is still active, Henri Cartan was listed as the jury and Jean-Pierre Serre as
the rapporteur; on the Mathematics Genealogy page, Jean-Pierre Serre is listed as the advisor).
[According to Serre, Gabriel wrote an independent thesis, and in 1960 the
advisor status was not yet as formal as today. In the published article, it is also not
visible who the formal advisor was.] Remarkably, Gabriel was doing his military service 1960-1962
just after finishing his thesis, and the Abelian category paper was submitted in 1961.
Gabriel worked at the University of Zürich from 1974-1998.
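A convenient test for the quiver diagrams, standard in the theory but not stated in the text, is that a connected graph is a simply laced Dynkin diagram exactly when its Tits form $q(x) = \sum_i x_i^2 - \sum_{\text{edges } ij} x_i x_j$ is positive definite (a loop at $i$ contributes $-x_i^2$). This sketch checks positive definiteness of $2q$ by exact Gaussian elimination; arrow orientations play no role, and the example graphs are illustrative.

```python
from fractions import Fraction


def tits_form_positive_definite(n, edges):
    """Is the symmetric matrix of 2q positive definite for the multigraph on 0..n-1?"""
    m = [[Fraction(2 if i == j else 0) for j in range(n)] for i in range(n)]
    for i, j in edges:           # orientation of arrows is irrelevant here
        if i == j:
            m[i][i] -= 2         # a loop contributes -x_i^2 to q, i.e. -2 to 2q
        else:
            m[i][j] -= 1
            m[j][i] -= 1
    for k in range(n):           # positive definite <=> all elimination pivots > 0
        if m[k][k] <= 0:
            return False
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= f * m[k][j]
    return True


def path(n):
    return [(i, i + 1) for i in range(n - 1)]


E8 = (8, path(7) + [(2, 7)])           # arms with 2, 4 and 1 vertices at vertex 2
E8_extended = (9, path(8) + [(2, 8)])  # the affine diagram, not of finite type
D4 = (4, [(0, 1), (0, 2), (0, 3)])
D4_extended = (5, [(0, 1), (0, 2), (0, 3), (0, 4)])
kronecker = (2, [(0, 1), (0, 1)])      # two parallel arrows
jordan = (1, [(0, 0)])                 # a single loop
for name, g in [("E8", E8), ("E8~", E8_extended), ("D4", D4), ("D4~", D4_extended),
                ("Kronecker", kronecker), ("Jordan", jordan)]:
    print(name, tits_form_positive_definite(*g))
```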

212. Zeckendorf representation

Let $F(n)$ denote the $n$'th Fibonacci number. It is defined by the recursion $F(n + 1) = F(n) + F(n - 1)$
and $F(0) = 0, F(1) = 1, F(2) = 1$. Given a positive integer $n$, a representation
$n = \sum_{k=0}^m F(c(k))$ with $c(k) \ge 2$ and $c(k+1) > c(k) + 1$ is called a Zeckendorf representation.
In a notation of Knuth, the finite sequence $n_F = (c(0), c(1), \dots, c(m))$ is called the Fibonacci
coding of $n$; as in the examples, it can be written as a binary string with a 1 at each position $c(k)$, counting from 0 at the right. For example, $11 = (1010000)_F$ and $13 = (10000000)_F$.

Theorem: Every positive integer has a unique Zeckendorf representation.

Edouard Zeckendorf published this in 1972 and mentions having proven it already in 1939.
Lekkerkerker independently found the result in 1952 [124]. The proofs of existence and uniqueness
can both be done by induction. As Donald Knuth realized [430], the Zeckendorf representation
of an integer leads to an associative multiplication $x \circ y = \sum_{i=0}^{m(x)} \sum_{j=0}^{m(y)} F(c_i(x) + c_j(y))$ for
positive integers $x, y$. This is called the Fibonacci product. The proof of associativity is the
realization that $(x \circ y) \circ z$ is equal to $\sum_{i=0}^{m(x)} \sum_{j=0}^{m(y)} \sum_{k=0}^{m(z)} F(c_i(x) + c_j(y) + c_k(z))$. Knuth mentions
that the Fibonacci product asymptotically satisfies $x \circ y \sim \sqrt{5}\,xy$ and that the multiplication
$x * y = xy + \lfloor \phi x \rfloor \lfloor \phi y \rfloor$ by Porta and Stolarsky is asymptotically $(1 + \phi^2)xy \sim 3.62\,xy$, where
$\phi = (1 + \sqrt{5})/2$ is the golden ratio.
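The greedy algorithm (repeatedly subtract the largest Fibonacci number that fits, then skip the next index) produces the Zeckendorf representation; this standard algorithm is not spelled out in the text. The sketch, with hypothetical function names, also implements Knuth's circle product and the binary notation from the examples above.

```python
def fib(n):
    """F(0) = 0, F(1) = 1, F(2) = 1, ..."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def zeckendorf(n):
    """Indices c(0) < c(1) < ... with n = sum F(c(k)), c(k) >= 2 and gaps >= 2 (greedy)."""
    k = 2
    while fib(k + 1) <= n:
        k += 1
    indices = []
    while n > 0:
        if fib(k) <= n:
            indices.append(k)
            n -= fib(k)
            k -= 2  # the next index must be at least two smaller
        else:
            k -= 1
    return sorted(indices)


def fibonacci_code(n):
    """Binary string with a 1 at each position c(k), position 0 at the right."""
    idx = zeckendorf(n)
    return "".join("1" if i in idx else "0" for i in range(max(idx), -1, -1))


def fib_product(x, y):
    """Knuth's circle product x o y = sum over i, j of F(c_i(x) + c_j(y))."""
    return sum(fib(a + b) for a in zeckendorf(x) for b in zeckendorf(y))


print(zeckendorf(11), fibonacci_code(11), fibonacci_code(13))
print([fib_product(1, y) for y in range(1, 8)])
```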

213. Turan's theorem

A finite simple graph $G = (V, E)$ has $n = |V|$ vertices and $m = |E|$ edges. A $p$-clique is
a complete subgraph of $G$ with $p$ vertices. The 1-cliques can be identified with $V$ and the 2-cliques
can be identified with $E$. Turán's graph theorem [713] is

Theorem: If $m > \frac{p-2}{p-1}\frac{n^2}{2}$, then $G$ has a $p$-clique.

It assures that a triangle-free graph can have at most $n^2/4$ edges, so that if a graph has more
than $n^2/4$ edges, there must be a triangle in it. This is called Mantel's
theorem from 1907. The Turán graphs are the complete multipartite graphs $P_{n_1} + \cdots + P_{n_k}$, where for all $n_j$,
we have $n_j \in \{a, a + 1\}$ for some integer $a$. For constant $n_j = n/(p - 1)$, these are graphs
without $p$-cliques and with $\binom{p-1}{2}(n/(p-1))^2 = \frac{p-2}{p-1}\frac{n^2}{2}$ edges. This shows that the result is sharp.
[12] contains four short proofs, the first one doing induction with respect to $n$. See also [11],
which states that the theorem of Turán initiated extremal graph theory and that the theorem
had been rediscovered many times.
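For small $n$ the case $p = 3$ (Mantel's theorem) can be confirmed by brute force over all labelled graphs, and the edge count of the Turán graph can be compared with the bound $\frac{p-2}{p-1}\frac{n^2}{2}$. Brute force is only feasible for tiny $n$; the function names are hypothetical.

```python
from itertools import combinations


def turan_edges(n, k):
    """Edges of the complete k-partite graph on n vertices with parts as equal as possible."""
    q, r = divmod(n, k)
    sizes = [q + 1] * r + [q] * (k - r)
    return (n * n - sum(s * s for s in sizes)) // 2


def max_triangle_free_edges(n):
    """Brute force over all graphs on n labelled vertices (feasible for n <= 6)."""
    pairs = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = {pairs[i] for i in range(len(pairs)) if mask >> i & 1}
        if len(edges) <= best:
            continue
        if any({(a, b), (a, c), (b, c)} <= edges for a, b, c in triangles):
            continue
        best = len(edges)
    return best


for n in range(1, 7):
    print(n, max_triangle_free_edges(n), turan_edges(n, 2), n * n // 4)
```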

214. The Szpilrajn-Marczewski theorem

A finite simple graph $\Gamma = (V, E)$ is represented by a set of sets $G$ if $V = G$ and
$E = \{(x, y) \mid x \ne y, x \cap y \ne \emptyset\}$. The graph $\Gamma$ is then the connection graph of the set of sets $G$.

Theorem: Every graph is the connection graph of a set of sets.

An arbitrary set of sets is sometimes also called a hypergraph. The theorem shows that from
the point of view of connectivity, a hypergraph can be studied by its connection graph. The connection graph
does not encode other properties like the subset relation: the set of sets $G = \{\{1, 2\}, \{2, 3\}\}$ and
the set of sets $H = \{\{1, 2\}, \{2\}\}$ both have the same connection graph $K_2$. The theorem was
shown by Edward Szpilrajn-Marczewski (1907-1976) in 1945 [694]. The Polish mathematician
was born Szpilrajn but changed his name while hiding from Nazi persecution. Erdös, Goodman
and Pósa showed in 1964 that one can realize any graph of $n$ vertices as a set of subsets of
a set with $\lfloor n^2/4 \rfloor$ elements and that for $n \ge 4$, one can even require all sets to be distinct.
The result is sharp for $n \ge 4$. The smallest number $d(n)$ of elements needed to represent every
graph with $n$ vertices satisfies $d(2) = 2$, $d(3) = 3$ and $d(n) = \lfloor n^2/4 \rfloor$ for $n \ge 4$. For example,
$d(4) = \lfloor 4^2/4 \rfloor = 4$ and $d(5) = \lfloor 5^2/4 \rfloor = 6$. The Erdös-Goodman-Pósa proof is done by induction
$n \to n + 2$, after first establishing the cases 4 and 5, which can be done by looking at all cases.
The Szpilrajn-Marczewski theorem has been abbreviated SM theorem in [558] and is a much
referenced theorem in intersection graph theory. The theorem does not assume the graph
to be finite.
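A simple (not economical) representation can be sketched directly: give each vertex $x$ the set consisting of a private label for $x$ together with one label for each edge at $x$. Two such sets meet exactly when the vertices are adjacent, and the private labels keep the sets non-empty and distinct. This construction and its tuple labels are assumptions of the sketch; it uses $|V| + |E|$ elements, far more than the $\lfloor n^2/4 \rfloor$ of Erdös, Goodman and Pósa.

```python
from itertools import combinations


def set_representation(vertices, edges):
    """Map every vertex x to {("v", x)} plus a label ("e", {x, y}) for each edge at x."""
    rep = {x: {("v", x)} for x in vertices}
    for x, y in edges:
        label = ("e", frozenset((x, y)))
        rep[x].add(label)
        rep[y].add(label)
    return rep


def connection_graph(sets):
    """Edges {x, y} of the connection graph of a family {name: set}."""
    return {frozenset((x, y)) for x, y in combinations(sets, 2) if sets[x] & sets[y]}


vertices = range(6)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # a 5-cycle plus the isolated vertex 5
rep = set_representation(vertices, edges)
print(connection_graph(rep) == {frozenset(e) for e in edges})
print(connection_graph({"G1": {1, 2}, "G2": {2, 3}}), connection_graph({"H1": {1, 2}, "H2": {2}}))
```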

215. Sakai theorem

Let $B(H)$ denote the Banach algebra of all bounded linear operators on a Hilbert space $H$.
The commutant $X'$ of a subset $X \subset B(H)$ is the set of all elements in $B(H)$ that commute
with every element in $X$. Because of the contra-variance condition $X \subset Y \Rightarrow Y' \subset X'$, the
bicommutants satisfy $X'' \subset Y''$. Every subset $X$ is contained in its bicommutant $X''$, and one has
$B(H)' = \mathbb{C}$ and $\mathbb{C}' = B(H)$. A subalgebra $X$ satisfying $X = X''$ is called a von
Neumann algebra. It is called a factor if its center $X \cap X'$ is $\mathbb{C}$. Von Neumann showed
the bicommutant theorem, stating that for a *-subalgebra $X$ containing the identity, $X'' = X$ is equivalent to $X$ being weakly closed.
(The weak operator topology means pointwise convergence in the sense that $A_n \to A$ in
the weak operator topology if and only if for every pair $f, g \in H$ one has $(g, A_n f) \to (g, Af)$,
meaning that, given a basis in $H$, the matrix elements of operators converge pointwise.)
The bicommutant theorem is remarkable as it equates the algebraic bicommutant condition
with the topological weak-closedness condition. Von Neumann algebras can also be defined more
abstractly using $C^*$ algebras without referral to operator algebras, but the GNS construction
justifies the more intuitive operator algebra definition. Like the bicommutant theorem, there
are other characterizations of von Neumann algebras. One of them is Sakai's theorem:

Theorem: A $C^*$ algebra is a von Neumann algebra if and only if it has a pre-dual.

Sakai's theorem was proven in 1956 [622]. Examples of von Neumann algebras are $X = B(H)$,
any finite dimensional *-subalgebra $X$ of the algebra of operators $B(H)$ containing the identity, or any algebra $X = (S \cup S^*)''$
generated by an arbitrary subset $S$ of $B(H)$. For example, every commutative von Neumann
algebra is of the form $L^\infty(\Omega, \mathcal{A}, \mu)$; the predual is then $L^1(\Omega, \mathcal{A}, \mu)$. Since $L^\infty(\Omega, \mathcal{A}, \mu)$ (acting as
multiplication operators on $H = L^2(\Omega, \mathcal{A}, \mu)$) for a measure $\mu$ completely encodes the measure
theory of $(\Omega, \mathcal{A}, \mu)$, the theory of von Neumann algebras has been seen as non-commutative
measure theory. This is the picture of Alain Connes [145]. Von Neumann algebras are pretty
well understood: each is a direct integral of factors. Factors are classified as type I (meaning
that there is a non-zero minimal projection, like for operator algebras on Hilbert spaces), type II
(meaning that there is a non-zero finite projection but no minimal one) or type III (meaning that there is
no non-zero finite projection). There are other characterizations of von Neumann algebras: the
Kaplansky density theorem states that if $A$ is a $C^*$ subalgebra of an operator algebra $B(H)$,
then the unit ball of $A$ is strongly dense in the unit ball of the weak closure of $A$. This implies
that a subalgebra $M$ of $B(H)$ containing 1 is a von Neumann algebra if and only if the unit
ball of $M$ is weakly closed. More references are [597, 70, 186, 721].
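In finite dimensions the commutant is the solution space of the linear equations $AX = XA$, so commutants and bicommutants can be computed exactly. This sketch (hypothetical helper names, rational matrices only) illustrates $B(H)' = \mathbb{C}$ and $\mathbb{C}' = B(H)$ for $\dim H = 3$. For the self-adjoint set $X = \{\mathrm{diag}(1, 1, 2)\}$ it finds that $X'$ is the 5-dimensional algebra of block-diagonal matrices while $X''$ is 2-dimensional, matching the algebra generated by 1 and the matrix.

```python
from fractions import Fraction


def nullspace(rows, ncols):
    """Basis of {v : R v = 0} over Q, via reduced row echelon form."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in range(ncols):
        if free in pivots:
            continue
        v = [Fraction(0)] * ncols
        v[free] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -m[i][free]
        basis.append(v)
    return basis


def commutant(mats, n):
    """Basis of all n x n matrices X with A X = X A for every A in mats."""
    rows = []
    for A in mats:
        for i in range(n):
            for j in range(n):
                # (AX - XA)[i][j] as a linear form in the unknowns X[k][l] at index k*n + l
                row = [0] * (n * n)
                for k in range(n):
                    row[k * n + j] += A[i][k]
                    row[i * n + k] -= A[k][j]
                rows.append(row)
    return [[v[i * n:(i + 1) * n] for i in range(n)] for v in nullspace(rows, n * n)]


def matrix_units(n):
    """The n^2 matrix units E_ij, which span B(H) for dim H = n."""
    return [[[int((r, c) == (i, j)) for c in range(n)] for r in range(n)]
            for i in range(n) for j in range(n)]


identity3 = [[int(i == j) for j in range(3)] for i in range(3)]
X = [[[1, 0, 0], [0, 1, 0], [0, 0, 2]]]
X1 = commutant(X, 3)   # block-diagonal matrices M_2 + M_1
X2 = commutant(X1, 3)  # the bicommutant
print(len(commutant(matrix_units(3), 3)), len(commutant([identity3], 3)), len(X1), len(X2))
```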

216. Takens's theorem

Let $M$ be a $d$-dimensional manifold and $T: M \to M$ a smooth map. A compact
$T$-invariant set $A \subset M$ is called an attractor for $T$ if there is an open neighborhood
$N$ of $A$ such that $\bigcap_{n \ge 0} T^n(N) = A$. It is called a minimal attractor if no proper sub-attractor
exists. The map $T$ is called partially hyperbolic if the Lyapunov exponent
$\lambda(\mu) = \lim_{n \to \infty} n^{-1}\int_A \log|dT^n(x)|\, d\mu(x)$ is non-zero for some $T$-invariant measure $\mu$ on $A$,
where $dT(x)$ is the Jacobian matrix. The partially hyperbolic attractor is called strange if it is
not a countable union of lower dimensional sets homeomorphic to varieties in $M$. This happens
for example if $A$ is a fractal, meaning that the Hausdorff dimension of $A$ is not an integer. A
Takens embedding of $M$ is given by a transformation $T$, a smooth $C^2$ function $f: M \to \mathbb{R}$
and an integer $k$, and is defined as the time series $x \mapsto (f(x), f(T(x)), \dots, f(T^{k-1}x)) \in \mathbb{R}^k$. One
can often reconstruct $M$, and so also the attractor $A$, from such measurements. This happens
Baire generically in $C^2(M, M) \times C^2(M, \mathbb{R})$.

Theorem: For a Baire generic set of pairs $T, f$, a Takens embedding exists.

One can therefore use a dynamical system $T: M \to M$ to embed $M$ into some Euclidean space
$\mathbb{R}^k$. Takens's embedding theorem is analogous to the Whitney embedding theorem,
which assures that if $f$ is allowed to be $\mathbb{R}^m$-valued, then $A$ can be embedded in $\mathbb{R}^m$, even for a
time series with $k = 1$ observation, so that no dynamics is needed: $f: M \to \mathbb{R}^m$ embeds $M$ and
so $A$ into a Euclidean space. The significance of Takens's theorem is that one can "see"
$M$ or the attractor $A$ using a time series of a single real observable $f: M \to \mathbb{R}$ and then
use time, that is the dynamical system, to generate the coordinates of the embedding. This
is extremely practical. One can for example observe the times when a drop leaves a faucet
and use the differences of the times between two drops to create an attractor without having
any model of drop formation. The time series of course does not work in general, as the function
$f$ and the transformation $T$ must be interesting enough. For the identity $T$, for example, the
time series does not give enough information. A special case is if $A$ consists of a single point $a$
which is hyperbolic in the sense that all eigenvalues of the Jacobian matrix $dT(a)$ are smaller
than 1 in absolute value. In that case, the manifold $M$ is the stable manifold of $a$, and that
remains true for an open set of transformations near $T$. Takens's theorem then implies that for
a generic $C^2$ function f, one can choose a k such that the time series reconstructs the manifold M. The same works if A is a hyperbolic attractor, because the structural stability of T then allows one to restrict the genericity statement to the function f. Floris Takens (1940-2010) was a Dutch mathematician. Together with David Ruelle, he introduced the notion of strange attractor. See [172] for dynamical systems in general. Takens's article is in [380] (pp. 366-381).
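
To make the delay construction concrete, here is a minimal Python sketch. As an illustrative assumption, T is the Hénon map $T(x, y) = (1 - a x^2 + y, b x)$ with the classical parameters a = 1.4, b = 0.3, and the observable is f(x, y) = x; the helper names are hypothetical. The Hénon map acts on the non-compact plane, so this only illustrates the reconstruction idea, not the hypotheses of the theorem. For this particular map, k = 2 delay coordinates already determine the point, because $y = f(T(x, y)) - 1 + a f(x, y)^2$, which the script checks along an orbit.

```python
def henon(p, a=1.4, b=0.3):
    """One step of the Henon map (an illustrative choice of T)."""
    x, y = p
    return (1.0 - a * x * x + y, b * x)


def orbit(T, p0, n, transient=100):
    """Return n consecutive orbit points of p0 after discarding a transient."""
    p = p0
    for _ in range(transient):
        p = T(p)
    points = []
    for _ in range(n):
        points.append(p)
        p = T(p)
    return points


def delay_embedding(series, k):
    """Map a scalar time series to delay vectors (s_i, s_{i+1}, ..., s_{i+k-1})."""
    return [tuple(series[i:i + k]) for i in range(len(series) - k + 1)]


a = 1.4
points = orbit(henon, (0.0, 0.0), 1000)
series = [x for (x, _) in points]      # single real observable f(x, y) = x
vectors = delay_embedding(series, 2)   # Takens map with k = 2

# The hidden coordinate y is recovered from the delay vector alone.
max_error = max(abs((s1 - 1.0 + a * s0 * s0) - points[i][1])
                for i, (s0, s1) in enumerate(vectors))
print(len(vectors), max_error)
```

In experimental settings such as the dripping faucet, k is usually not known in advance and is increased until the reconstructed set stops "unfolding".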

217. Perfect graphs

A finite simple graph is called perfect if every induced subgraph has a chromatic number (the minimal number of colors needed for a vertex coloring) which is equal to its clique number. (The clique number is the maximal number n of vertices for which there exists a complete subgraph $K_n$ with that number of vertices.) A finite graph satisfies the Berge condition, and is then called a Berge graph, if none of its induced subgraphs is a cycle graph $C_{2n+1}$ with n ≥ 2 or the complement of such a cycle graph. The strong perfect graph theorem states:

Theorem: The set of perfect graphs is the set of Berge graphs.

Because the Berge condition is invariant under taking graph complements, the following weak perfect graph theorem follows: if G is perfect, then its graph complement is perfect. Examples of perfect graphs are trees, bipartite graphs, wheel graphs with even boundary length or Barycentric refinements of graphs (the graph in which the cliques are the vertices and two cliques are connected if one is contained in the other; there, the dimension function is a coloring and the number of colors agrees with the clique number). The strong perfect graph theorem had been conjectured by Berge in 1961 [55]. Maria Chudnovsky, Neil Robertson, Paul Seymour and Robin Thomas proved the theorem in 2006 [135].
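
For small graphs, perfection can be checked directly from the definition. The following Python sketch (an exponential brute force; the helper names are illustrative) compares chromatic number and clique number on every induced subgraph. It confirms that the odd hole $C_5$ and the wheel over $C_5$ are not perfect, while $C_6$ and the wheel over $C_4$ are, as the Berge condition predicts.

```python
from itertools import combinations


def adjacency(n, edges):
    """Adjacency sets for a graph on vertices 0, ..., n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clique_number(vertices, adj):
    """Largest r such that some r of the given vertices are pairwise adjacent."""
    for r in range(len(vertices), 0, -1):
        for S in combinations(vertices, r):
            if all(v in adj[u] for u, v in combinations(S, 2)):
                return r
    return 0


def chromatic_number(vertices, adj):
    """Smallest k admitting a proper k-coloring of the induced subgraph."""
    def colorable(k):
        color = {}

        def extend(i):
            if i == len(vertices):
                return True
            v = vertices[i]
            for c in range(k):
                # vertices outside the subgraph are never colored, so they are ignored
                if all(color.get(u) != c for u in adj[v]):
                    color[v] = c
                    if extend(i + 1):
                        return True
                    del color[v]
            return False
        return extend(0)

    for k in range(1, len(vertices) + 1):
        if colorable(k):
            return k
    return 0


def is_perfect(n, edges):
    """Check chromatic number == clique number on every non-empty induced subgraph."""
    adj = adjacency(n, edges)
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            if chromatic_number(list(S), adj) != clique_number(S, adj):
                return False
    return True


def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]


def wheel(n):
    """Hub vertex n joined to every vertex of the cycle C_n."""
    return cycle(n) + [(i, n) for i in range(n)]


print(is_perfect(5, cycle(5)))   # False: the odd hole C_5
print(is_perfect(6, cycle(6)))   # True: bipartite
print(is_perfect(5, wheel(4)))   # True: wheel with even boundary
print(is_perfect(6, wheel(5)))   # False: contains C_5
```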

218. Kochen-Specker theorem

Let H be a Hilbert space and let X denote the set of self-adjoint operators on H. These operators A are also known as quantum mechanical observables. The mathematical framework of quantum mechanics considers a time evolution $\psi' = iL\psi$ with a Hamiltonian L and then produces for A ∈ X the data $\langle \psi(t), A\psi(t) \rangle$ (Schrödinger picture) or $\langle \psi, A(t)\psi \rangle$ with $A(t) = U(t)^* A U(t)$ and unitary $U(t) = \exp(iLt)$ (Heisenberg picture). Since X is non-commutative, one cannot expect to do measurements as in classical calculus. The non-commutativity is best illustrated by the famous commutation relation $[P, Q] = i$, which holds for the self-adjoint operators $Pf(x) = i f'(x)$, $Qf(x) = x f(x)$ on $L^2(\mathbb{R})$, which represent momentum and position of a particle on the real line. Before John Bell, Simon Kochen and Ernst Specker, it was not excluded that one could use some hidden variables and still be close to a classical theory. By formulating this precisely, one can also produce theorems. A function v : X → R is called a classical value function if it is linear on X and satisfies f(v(A)) = v(f(A)) for all continuous functions f as well as v(AB) = v(A)v(B). In other words, v is a multiplicative linear functional on X, honoring the functional calculus as well as being compatible with multiplication. For a continuous real function f, the value f(A) is defined by the functional calculus, which exists by the spectral theorem for any self-adjoint operator. The Kochen-Specker theorem is a no-go theorem and holds also over finite fields:

Theorem: If dim(H) ≥ 3, there is no classical value function.

It was proven by Simon Kochen and Ernst Specker in 1967 [435] in the case when the dimension is 3 or higher. It complements Bell's theorem on "hidden variables". An important precursor was Gleason's theorem. Kochen and Specker show more generally that the partial Boolean algebra D of projections on H has no homomorphism into $\mathbb{Z}_2$. The theorem is refreshingly simple and elegant, especially considering the difficulties that surround interpretations of quantum mechanics. The argument is a bit simpler if the dimension of H is 4: let $u_1, u_2, u_3, u_4$ be four orthogonal vectors in H and let $P_k$ be the projection operators onto the lines spanned by $u_k$. They satisfy $P_1 + P_2 + P_3 + P_4 = 1$, so that by linearity, $v(P_1) + v(P_2) + v(P_3) + v(P_4) = 1$. The condition v(AB) = v(A)v(B) implies for projections P (elements in X satisfying $P^2 = P$) that $v(P) = v(P^2) = v(P)v(P)$, so that v(P) = 0 or 1. The linearity condition now implies that exactly one of the values $v(P_k)$ is 1. The paper [402] simplifies [568] and uses the following list of 11 equations for 20 vectors (where v(u) stands for the value of the projection onto the line spanned by u), which cannot be satisfied simultaneously: each vector appears two or four times, so adding up all equations gives an even number on the right-hand side, while the left-hand side adds up to 11, which is odd.
1 = v([1, 0, 0, 0]) + v([0, 1, 0, 0]) + v([0, 0, 1, 0]) + v([0, 0, 0, 1])
1 = v([1, 0, 0, 0]) + v([0, 1, 0, 0]) + v([0, 0, 1, 1]) + v([0, 0, 1, −1])
1 = v([1, 0, 0, 0]) + v([0, 0, 1, 0]) + v([0, 1, 0, 1]) + v([0, 1, 0, −1])
1 = v([1, 0, 0, 0]) + v([0, 0, 0, 1]) + v([0, 1, 1, 0]) + v([0, 1, −1, 0])
1 = v([−1, 1, 1, 1]) + v([1, −1, 1, 1]) + v([1, 1, −1, 1]) + v([1, 1, 1, −1])
1 = v([−1, 1, 1, 1]) + v([1, 1, −1, 1]) + v([1, 0, 1, 0]) + v([0, 1, 0, −1])
1 = v([1, −1, 1, 1]) + v([1, 1, −1, 1]) + v([0, 1, 1, 0]) + v([1, 0, 0, −1])
1 = v([1, 1, −1, 1]) + v([1, 1, 1, −1]) + v([0, 0, 1, 1]) + v([1, −1, 0, 0])
1 = v([0, 1, −1, 0]) + v([1, 0, 0, −1]) + v([1, 1, 1, 1]) + v([1, −1, −1, 1])
1 = v([0, 0, 1, −1]) + v([1, −1, 0, 0]) + v([1, 1, 1, 1]) + v([1, 1, −1, −1])
1 = v([1, 0, 1, 0]) + v([0, 1, 0, 1]) + v([1, 1, −1, −1]) + v([1, −1, −1, 1]) .
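
The parity argument can be checked mechanically. The Python sketch below (helper names are illustrative) verifies that each of the 11 quadruples consists of pairwise orthogonal vectors and that each of the 20 vectors occurs an even number of times. Independently of the parity count, a small backtracking search confirms that no 0/1 assignment with exactly one 1 per quadruple exists.

```python
from collections import Counter
from itertools import combinations

# The 11 orthogonal quadruples from the equations above (one list per equation).
BASES = [
    [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)],
    [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, -1)],
    [(1, 0, 0, 0), (0, 0, 1, 0), (0, 1, 0, 1), (0, 1, 0, -1)],
    [(1, 0, 0, 0), (0, 0, 0, 1), (0, 1, 1, 0), (0, 1, -1, 0)],
    [(-1, 1, 1, 1), (1, -1, 1, 1), (1, 1, -1, 1), (1, 1, 1, -1)],
    [(-1, 1, 1, 1), (1, 1, -1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, -1, 1, 1), (1, 1, -1, 1), (0, 1, 1, 0), (1, 0, 0, -1)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (0, 0, 1, 1), (1, -1, 0, 0)],
    [(0, 1, -1, 0), (1, 0, 0, -1), (1, 1, 1, 1), (1, -1, -1, 1)],
    [(0, 0, 1, -1), (1, -1, 0, 0), (1, 1, 1, 1), (1, 1, -1, -1)],
    [(1, 0, 1, 0), (0, 1, 0, 1), (1, 1, -1, -1), (1, -1, -1, 1)],
]


def orthogonal(bases):
    """Every pair of vectors within a quadruple has zero dot product."""
    return all(sum(a * b for a, b in zip(u, w)) == 0
               for B in bases for u, w in combinations(B, 2))


def has_value_function(bases):
    """Backtracking search for a 0/1 assignment with exactly one 1 per quadruple."""
    def search(ones, zeros):
        for B in bases:
            if not any(u in ones for u in B):      # first quadruple still lacking a 1
                for u in B:
                    if u in zeros:
                        continue
                    # value 1 for u forces value 0 on every vector sharing a quadruple with u
                    forced = {w for C in bases if u in C for w in C if w != u}
                    if search(ones | {u}, zeros | forced):
                        return True
                return False
        return True                                  # every quadruple has exactly one 1

    return search(frozenset(), frozenset())


counts = Counter(u for B in BASES for u in B)
print(orthogonal(BASES), len(counts), sorted(set(counts.values())), len(BASES))
print(has_value_function(BASES))
```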

219. Perfect difference sets

A subset D of $\mathbb{Z}_m$ is called a perfect difference set if every nonzero number in $\mathbb{Z}_m$ can be written uniquely as a − b with a, b ∈ D. An example for m = 13 is D = {1, 2, 5, 7} ⊂ $\mathbb{Z}_{13}$. For D to exist we need $m = n^2 + n + 1$ and |D| = n + 1, since the |D|(|D| − 1) ordered pairs of distinct elements must produce each of the m − 1 nonzero differences exactly once. The number n is called the order of the perfect difference set. Any perfect difference set D produces a finite projective plane P(2, n) with $m = n^2 + n + 1$ points and as many lines. Singer showed in 1938 [656] that perfect difference sets exist if $n = p^k$ is a prime power:

Theorem: For every prime power $n = p^k$ there exists a finite projective plane of order n.

Singer obtained the perfect difference set in the following way: let ζ be a generator of the multiplicative group of the Galois field $G_3 = \mathbb{F}_{n^3}$, which is a Galois extension of $G_1 = \mathbb{F}_n$. Then ζ is the root of an irreducible cubic polynomial over $G_1$, so that every element can be written as $a + b\zeta + c\zeta^2$ with a, b, c ∈ $G_1$. Every element different from 0 in $G_3$ can be written as $\zeta^k$. Look at all exponents k for which $\zeta^k = a + b\zeta$ with a, b ∈ $G_1$. Two such elements are called equivalent if one is a multiple of the other by a nonzero element of $G_1$. The $n^2 - 1$ nonzero elements a + bζ fall into n + 1 equivalence classes. If representatives are written as $a_i + b_i\zeta = \zeta^{k_i}$, then the set of exponents $k_i$, taken modulo $n^2 + n + 1$, is a perfect difference set. The prime power conjecture claims that for any finite projective plane the order is a prime power. It is not even known whether there exists a projective plane of order n = 12. The prime power conjecture has been verified for all $n \le 20 \cdot 10^9$ by Gordon. Sarah Peluse recently showed [567] that the number of positive integers n < N such that $\mathbb{Z}_{n^2+n+1}$ contains a perfect difference set is asymptotically N/log(N), giving more evidence for
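the prime power conjecture. Perfect difference sets are examples of Sidon sets: a set D is a Sidon set if a + b = c + d for a, b, c, d ∈ D implies {a, b} = {c, d}. Small sets typically are Sidon sets. Sidon sets D ⊂ {1, ..., n} cannot be too large: the |D|(|D| + 1)/2 sums a + b with a ≤ b are distinct and lie in {2, ..., 2n}, so |D|(|D| + 1)/2 < 2n, which implies $|D| < 2\sqrt{n}$. The set $D = \{(x, x^2), x \in \mathbb{Z}_p\}$ is a Sidon set in $\mathbb{Z}_p^2$ for an odd prime p.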
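
Both the definitions and Singer's construction can be tested mechanically. The Python sketch below checks the example {1, 2, 5, 7} modulo 13 and implements the construction, restricted for simplicity (an assumption, not part of Singer's general statement) to a prime order p: $\mathbb{F}_{p^3}$ is modelled as polynomials modulo a monic irreducible cubic for which the class z of the variable generates the multiplicative group, so z plays the role of ζ. The exponents k with $z^k = a + bz$ are collected modulo $p^2 + p + 1$. Function names are illustrative.

```python
def is_perfect_difference_set(D, m):
    """Every nonzero residue mod m is a - b for exactly one ordered pair in D."""
    diffs = sorted((a - b) % m for a in D for b in D if a != b)
    return diffs == list(range(1, m))


def is_sidon(D, m):
    """All sums a + b with a <= b are distinct modulo m."""
    D = sorted(D)
    sums = [(D[i] + D[j]) % m for i in range(len(D)) for j in range(i, len(D))]
    return len(sums) == len(set(sums))


def singer_set(p):
    """Singer's construction for a prime p (illustrative sketch).

    Elements of F_{p^3} are triples (c0, c1, c2) = c0 + c1*z + c2*z^2, reduced
    modulo a monic cubic z^3 + a*z^2 + b*z + c with no root in F_p.
    """
    m = p * p + p + 1
    for a in range(p):
        for b in range(p):
            for c in range(1, p):
                if any((t ** 3 + a * t * t + b * t + c) % p == 0 for t in range(p)):
                    continue  # a root means the cubic is reducible
                powers, e = [], (1, 0, 0)
                while True:
                    powers.append(e)
                    c0, c1, c2 = e
                    # multiply by z, using z^3 = -(a z^2 + b z + c)
                    e = ((-c2 * c) % p, (c0 - c2 * b) % p, (c1 - c2 * a) % p)
                    if e == (1, 0, 0):
                        break
                if len(powers) != p ** 3 - 1:
                    continue  # z does not generate the multiplicative group
                # exponents k with z^k in the span of 1 and z, reduced modulo m
                return sorted({k % m for k, w in enumerate(powers) if w[2] == 0})
    return None


print(is_perfect_difference_set({1, 2, 5, 7}, 13), is_sidon({1, 2, 5, 7}, 13))
for p in (2, 3, 5, 7):
    D = singer_set(p)
    print(p, D, is_perfect_difference_set(D, p * p + p + 1))
```

Handling a general prime power $n = p^k$ would require arithmetic in $\mathbb{F}_{p^k}$ for the coefficients as well.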

220. Trace Cayley-Hamilton theorem

For an n × n matrix A, let $p_A(x) = \det(A - x) = \sum_{k=0}^{n} c_{n-k} x^k$ denote its characteristic polynomial. The Cayley-Hamilton theorem $p_A(A) = 0$ assures that $\sum_{k=0}^{n} c_{n-k} A^k = 0$.

While obvious for matrices which allow diagonalization (like normal operators), the Cayley-Hamilton theorem is remarkably non-shallow. The trace Cayley-Hamilton theorem is

Theorem: $k c_k + \sum_{j=1}^{k} \mathrm{tr}(A^j)\, c_{k-j} = 0$ for all $k \ge 1$ (with $c_k = 0$ for $k > n$).

This implies that if $\mathrm{tr}(A^j) = 0$ for all j, then $p_A(x) = (-x)^n$. The reason for the name trace Cayley-Hamilton theorem is that for k ≥ n, the result can be obtained from the Cayley-Hamilton theorem $\sum_{j=0}^{n} c_{n-j} A^j = 0$ by multiplying with $A^{k-n}$ and taking traces. The trace Cayley-Hamilton theorem also implies that if two n × n matrices have the same traces $\mathrm{tr}(A^k) = \mathrm{tr}(B^k)$ for k = 1, ..., n, then A, B have the same characteristic polynomial and so are isospectral. This is useful, as computing the traces of n matrix powers can be more convenient than computing the characteristic polynomial. The theorem is especially useful in theoretical settings. For normal matrices one can conclude that A is the zero matrix if $\mathrm{tr}(A^k) = 0$ for k = 1, ..., n. See [291, 769]. For moment problems see [632]. The Cayley-Hamilton theorem was first tackled in 1853 by William Rowan Hamilton in the context of quaternions, meaning for n = 2 complex or n = 4 real matrices. Arthur Cayley stated the theorem in 1858 for n ≤ 3 but only proved n = 2. In 1878, the general case was proven by Ferdinand Georg Frobenius.
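
The recursion in the trace Cayley-Hamilton theorem (Newton's identities) lets one compute the characteristic polynomial from the n traces alone. A minimal Python sketch with exact rational arithmetic, on an arbitrarily chosen 3 × 3 matrix, computes the $c_k$ this way and then checks $p_A(A) = 0$ as an independent test. The helper names are illustrative.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def charpoly_from_traces(A):
    """Coefficients c_0, ..., c_n of p_A(x) = det(A - x) = sum_k c_{n-k} x^k,
    computed only from tr(A), ..., tr(A^n) via
    k c_k + sum_{j=1}^k tr(A^j) c_{k-j} = 0 with c_0 = (-1)^n."""
    n = len(A)
    traces, P = [], A
    for _ in range(n):
        traces.append(trace(P))
        P = matmul(P, A)
    c = [Fraction((-1) ** n)]
    for k in range(1, n + 1):
        c.append(-sum(traces[j - 1] * c[k - j] for j in range(1, k + 1)) / k)
    return c


def evaluate_at_matrix(c, A):
    """Compute p_A(A) = sum_k c_{n-k} A^k."""
    n = len(A)
    result = [[Fraction(0)] * n for _ in range(n)]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n + 1):
        for i in range(n):
            for j in range(n):
                result[i][j] += c[n - k] * power[i][j]
        power = matmul(power, A)
    return result


A = [[2, 1, 0], [0, 3, 1], [1, 0, 1]]
c = charpoly_from_traces(A)
print(c)                          # -1, 6, -11, 7 as Fractions; c_n = det(A) = 7
print(evaluate_at_matrix(c, A))   # the zero matrix (Cayley-Hamilton)
```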

221. Minimal permanent

The permanent of an n × n matrix A is $\mathrm{per}(A) = \sum_\pi \prod_{i=1}^{n} A_{i,\pi(i)}$, where the sum is over all permutations π of {1, 2, ..., n}. It takes the Leibniz definition $\det(A) = \sum_\pi \mathrm{sign}(\pi) \prod_{i=1}^{n} A_{i,\pi(i)}$ of the determinant but ignores the signatures sign(π) of the permutations. Unlike determinants, which can be computed in polynomial time using row reduction, permanents have no known polynomial-time algorithm. A probability vector $p = (p_1, \dots, p_n)$ is an element in $\mathbb{R}^n$ for which all entries are in [0, 1] and add up to 1. An n × n matrix is doubly stochastic if each row and each column of A is a probability vector. In 1926 Bartel van der Waerden conjectured that the minimal permanent which a doubly stochastic n × n matrix can have is $n!/n^n$, attained only if all entries are 1/n. This is the matrix with maximal entropy in the sense that the Shannon entropy $S(p) = -\sum_{k=1}^{n} p_k \log(p_k)$ is maximal for each column or row of the matrix.

Theorem: Doubly stochastic minimal permanent ⇔ maximal entropy.

The van der Waerden conjecture was proven in 1980 by Béla Gyires [305] and in 1981 by G.P. Egorychev and by D.I. Falikman. In [306] it was pointed out that the conjecture had already been proven in 1977 [304]. For permanents, see [522]. Béla Gyires was a Hungarian mathematician who lived from 1909 to 2001. In his last paper [307], Gyires gives another account of the proof of the van der Waerden conjecture, together with two proofs.
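
A brute-force Python sketch illustrates the van der Waerden bound for n = 4: the constant matrix with entries 1/n has permanent $n!/n^n = 24/256$, and random doubly stochastic matrices, generated here (as an illustrative choice) as convex combinations of permutation matrices, never go below it. The $O(n! \cdot n)$ permanent is only usable for very small n.

```python
import math
import random
from itertools import permutations


def permanent(A):
    """Permanent via the Leibniz-type sum over all permutations, without signs."""
    n = len(A)
    return sum(math.prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def random_doubly_stochastic(n, terms=5, rng=random):
    """Convex combination of random permutation matrices (doubly stochastic by construction)."""
    weights = [rng.random() for _ in range(terms)]
    total = sum(weights)
    M = [[0.0] * n for _ in range(n)]
    for w in weights:
        s = list(range(n))
        rng.shuffle(s)
        for i in range(n):
            M[i][s[i]] += w / total
    return M


n = 4
J = [[1 / n] * n for _ in range(n)]
bound = math.factorial(n) / n ** n            # van der Waerden value n!/n^n
print(permanent(J), bound)                    # both 0.09375
rng = random.Random(1)
samples = [permanent(random_doubly_stochastic(n, rng=rng)) for _ in range(200)]
print(min(samples) >= bound - 1e-12)          # no sample goes below the minimum
```

Permutation matrices, which are also doubly stochastic, have permanent 1, the opposite extreme.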

222. Billiards in polygons

A convex compact polygon in $\mathbb{R}^2$ defines a billiard dynamical system. Parametrize the boundary by $x \in \mathbb{T} = \mathbb{R}/\mathbb{Z}$. Given $(x_1, x_2) \in \mathbb{T}^2$ where both points are not at a vertex of the polygon, we get a new point $x_3$ such that the path $x_1, x_2, x_3$ satisfies the law of reflection at $x_2$. The set of points in $\mathbb{T}^2$ for which no future point $x_k$ is a vertex has full measure. A point $x_0$ is called a periodic point if $x_n = x_0$ for some n > 0 and the $x_k$ are all points not on vertices of the polygon. It is unknown already for obtuse triangles whether every such triangle has a periodic point. Fagnano observed already in 1775 that any acute triangle has a periodic trajectory, the orthic triangle. A polygon is called rational if all angles $\alpha_j$ have the property that $\alpha_j/\pi$ is rational.

Theorem: A rational polygon has a periodic orbit.

Actually, there is a dense set of directions θ for which there is a periodic orbit. This is called the Masur theorem, named after Howard Masur, who proved it in 1986 [496] by reducing the problem to flows defined by $e^{i\theta}\phi$, where ϕ is a holomorphic 1-form on a compact Riemann surface R of genus ≥ 2. More generally, if q is a holomorphic quadratic differential on such an R, there exists a dense set of θ such that $e^{i\theta}q$ has a closed regular vertical trajectory. The existence theorem uses Teichmüller theory. The basic questions about billiards in polygons were raised by Carlo Boldrighini, Michael Keane and Federico Marchetti in 1978 [111]. Billiards in polygons are also interesting from an ergodic point of view. A Baire generic polygon produces an ergodic flow. For rational polygons, this is not the case, as the directions of the flow stay in a finite set generated by the rational angles $\pi n_i/m_i$ at the vertices. There is then an interval [0, π/n) which parametrizes invariant hypersurfaces in the phase space. One knows that for Lebesgue almost all directions θ ∈ [0, π/n) the flow is uniquely ergodic, even weakly mixing but not mixing, and has zero entropy. This implies that there exists a generic set of ergodic and even weakly mixing (non-mixing) polygons (which are then non-rational) with n vertices. For more on billiards in polygons, see [301, 698, 308].
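
Fagnano's periodic orbit can be checked numerically. The Python sketch below (helper names are illustrative) computes the feet D, E, F of the altitudes of an acute triangle and verifies that at each foot the incoming and outgoing segments of the closed path D → E → F → D make equal angles with the side, which is the law of reflection.

```python
import math


def sub(P, Q):
    return (P[0] - Q[0], P[1] - Q[1])


def foot(P, Q, R):
    """Foot of the perpendicular from P onto the line through Q and R."""
    dx, dy = sub(R, Q)
    px, py = sub(P, Q)
    t = (px * dx + py * dy) / (dx * dx + dy * dy)
    return (Q[0] + t * dx, Q[1] + t * dy)


def angle(u, v):
    """Unsigned angle in [0, pi] between two plane vectors."""
    return math.atan2(abs(u[0] * v[1] - u[1] * v[0]), u[0] * v[0] + u[1] * v[1])


def fagnano_defects(A, B, C):
    """Reflection-law defects of the orbit D -> E -> F -> D along the orthic triangle.

    D, E, F are the feet of the altitudes on BC, CA, AB. At each foot, the
    incoming and outgoing segments must make equal angles with the side.
    """
    D, E, F = foot(A, B, C), foot(B, C, A), foot(C, A, B)
    return [
        abs(angle(sub(D, E), sub(C, E)) - angle(sub(F, E), sub(A, E))),  # at E on CA
        abs(angle(sub(E, F), sub(A, F)) - angle(sub(D, F), sub(B, F))),  # at F on AB
        abs(angle(sub(F, D), sub(B, D)) - angle(sub(E, D), sub(C, D))),  # at D on BC
    ]


print(max(fagnano_defects((0, 0), (4, 0), (1, 3))))   # zero up to rounding
```

For an obtuse triangle, two feet of the altitudes lie outside the sides, which is one way to see why Fagnano's construction breaks down there.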

223. Elasticity

Let G ⊂ $\mathbb{R}^n$ be an open and connected domain. For a vector field v : G → $\mathbb{R}^n$ and x ∈ G denote by $v(x) = (v^1(x), \dots, v^n(x))$ its coordinates. Let $dv^i_j(x) = \partial_j v^i(x)$ denote the Jacobian matrix of v at x. Let $||v||_{H^1}$ denote the Sobolev norm obtained from the inner product $\langle v, w \rangle_{H^1} = \int_G v(x) \cdot w(x) + \mathrm{tr}(dv^T dw)\, dx$ on smooth vector fields and let $H^1(G)$ be the Hilbert space obtained by completing this set of vector fields with respect to that norm. Let $d^s v(x) = (dv^T(x) + dv(x))/2$, the matrix with entries $(\partial_i v^j + \partial_j v^i)/2$, denote the symmetric part of the Jacobian matrix at x. Let $||v||_{H^1_S}$ denote the symmetrized Sobolev norm obtained from the inner product $\langle v, w \rangle_{H^1_S} = \int_G v(x) \cdot w(x) + \mathrm{tr}(d^s v^T(x)\, d^s w(x))\, dx$. This means that the Hilbert-Schmidt product $\mathrm{tr}(A^T B)$ of the Jacobian matrices A = dv and B = dw is replaced by the Hilbert-Schmidt product of the symmetrized Jacobian matrices $d^s v$ and $d^s w$.

Theorem: There exists C = C(G) such that $||v||_{H^1} \le C ||v||_{H^1_S}$.

This inequality is called the Korn inequality. It is used in linear elasticity and continuum mechanics. The constant C is called the Korn constant of G. The inequality had first been established by Arthur Korn in 1909 [441] in the case $G = \mathbb{R}^n$, where for smooth, compactly supported vector fields v one has, using integration by parts, $\int_G |d^s v(x)|^2\, dx = \frac{1}{2}\int_G |dv(x)|^2\, dx + \frac{1}{2}\int_G (\mathrm{div}\, v(x))^2\, dx$, so that
the constant C = 2 would do. See [553]. The inequality has been generalized to $H^1(G) = W^{1,2}(G)$ if the region is bounded with Lipschitz boundary. It has also been generalized to other Sobolev spaces $W^{1,p}(G)$ for p ∈ (1, ∞) if the boundary is smooth enough. It fails for p = 1, ∞. Arthur Korn was a German physicist born in 1870. He was also an inventor, involved in the development of the fax machine and the Bildtelegraph, an early system for transmitting images, as well as a mathematician working on partial differential equations. He was dismissed from his post in 1935 and left Germany for the US, working at the Stevens Institute of Technology in Hoboken. For more on the inequality, see [136].
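
The integration by parts identity behind the constant C = 2 can be tested numerically. The sketch below makes an illustrative substitution: instead of compactly supported fields on $\mathbb{R}^n$, it uses a trigonometric vector field on the flat torus $[0, 2\pi]^2$, where integration by parts produces no boundary terms either. Uniform grid averages are exact for such low-degree trigonometric polynomials, so the averages of $|d^s v|^2$ and of $\frac{1}{2}|dv|^2 + \frac{1}{2}(\mathrm{div}\, v)^2$ agree up to rounding. The field and function names are hypothetical.

```python
import math


def v_jacobian(x, y):
    """Jacobian [[d_x v1, d_y v1], [d_x v2, d_y v2]] of the illustrative field
    v1 = sin(x) cos(2y), v2 = cos(3x) sin(y), written out by hand."""
    return [[math.cos(x) * math.cos(2 * y), -2 * math.sin(x) * math.sin(2 * y)],
            [-3 * math.sin(3 * x) * math.sin(y), math.cos(3 * x) * math.cos(y)]]


def averages(N=32):
    """Grid averages of |dv|^2, |d^s v|^2 and (div v)^2 over the torus."""
    full = sym = div = 0.0
    for i in range(N):
        for j in range(N):
            J = v_jacobian(2 * math.pi * i / N, 2 * math.pi * j / N)
            S = [[(J[a][b] + J[b][a]) / 2 for b in range(2)] for a in range(2)]
            full += sum(J[a][b] ** 2 for a in range(2) for b in range(2))
            sym += sum(S[a][b] ** 2 for a in range(2) for b in range(2))
            div += (J[0][0] + J[1][1]) ** 2
    return full / N ** 2, sym / N ** 2, div / N ** 2


full, sym, div = averages()
print(sym, (full + div) / 2)   # equal up to rounding (both 17/8 here)
print(full <= 2 * sym)         # hence the average of |dv|^2 is at most 2 |d^s v|^2
```

Pointwise the identity fails in general; it is the integration over a domain without boundary contributions that makes the mixed terms $\partial_i v^j \partial_j v^i$ average to $(\mathrm{div}\, v)^2$.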

224. Twin primes

A pair (p, q = p + 2) of two rational primes is called a prime twin. An example is (p, q) = (5, 7). One might have wondered since antiquity about the infinitude of prime twins. The twin prime conjecture claims that infinitely many prime twins exist. The first known source about the conjecture is Alphonse de Polignac in 1849, so that the conjecture is sometimes also called the Polignac conjecture. Let $\pi_2(x)$ denote the number of twin primes up to x. The sieve bound was first established by Viggo Brun, who showed $\pi_2(x) = O(x (\log\log(x)/\log(x))^2)$. Let $\mathrm{Li}_2(x) = \int_2^x \frac{dt}{\log(t)^2}$ and $S = 2 \prod_{p \text{ prime} \ge 3} (1 - 2/p)(1 - 1/p)^{-2}$. The sieve bounds theorem is

Theorem: There is a constant C with $\pi_2(x) \le C S\, \mathrm{Li}_2(x)$.

The constant S has a probabilistic background. It is $S = \prod_{p \text{ prime}} S_p$, where $S_2 = (1 - 1/2)(1 - 1/2)^{-2} = 2$ and $S_p = (1 - 2/p)(1 - 1/p)^{-2}$ for p ≥ 3. From a probabilistic point of view, one therefore expects the number of prime twins up to x to be about $S x/\log^2(x)$. The sieve bound implies that the sum $\sum_{(p, q)} (1/p + 1/q) = (1/3 + 1/5) + (1/5 + 1/7) + \cdots \approx 1.902$ over all twin prime pairs converges. The limit is called Brun's constant. In the context of the twin prime conjecture there is Chen's theorem, telling that there are infinitely many primes p such that p + 2 has at most 2 prime factors. Zhang's theorem from 2014 about the existence of infinitely many bounded gaps has been pushed further: there are infinitely many pairs (p, q) of distinct primes such that |p − q| ≤ 246. See [499] for a recent review.
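
The quantities in this section are easy to explore numerically, though numerics of course prove nothing. A Python sketch with a sieve up to $10^5$ (an arbitrary cutoff) counts twin primes, evaluates the truncated product for S, which comes out near 1.3203, and forms the partial sum for Brun's constant, which converges very slowly toward 1.902.

```python
import math


def prime_sieve(n):
    """is_prime[k] for 0 <= k <= n (sieve of Eratosthenes)."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return is_prime


N = 10 ** 5
is_prime = prime_sieve(N)
twins = [p for p in range(3, N - 1) if is_prime[p] and is_prime[p + 2]]


def pi2(x):
    """Number of twin pairs (p, p + 2) with p + 2 <= x."""
    return sum(1 for p in twins if p + 2 <= x)


S = 2.0                                       # the factor S_2 = 2
for p in range(3, N + 1):
    if is_prime[p]:
        S *= (1 - 2 / p) / (1 - 1 / p) ** 2   # S_p for p >= 3

brun_partial = sum(1 / p + 1 / (p + 2) for p in twins)
print(pi2(100), S, brun_partial)
```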

225. Auction theory

A real n × m signal matrix $S = (S_{ik})$ for n buyers (bidders) and m goods (merchandise) encodes real signal values $S_{ik}$ which buyer i can observe about good k. Also fixed beforehand is $T_i$, the set of entries of S which buyer i can see. Buyers have private values if they do not see what others observe and common values if they see everything that others can observe about k. A valuation matrix V evaluates the signals relevant to the i'th buyer's evaluation $V_{ik}$ for good k. It defines a welfare or value system $V_i(S) = \sum_{j \in T_i} V_{ij}$. Given a payment $P_i(S)$, the utility is the difference $U_i(S) = V_i(S) - P_i(S)$ of value minus payment. A strategy $\Sigma_i$ of buyer i consists of defining V(S) given the constraint T of what they can see. A pure strategy is a deterministic choice of V, meaning that buyers do not randomize. Given P defined by the auction, the expected utility is denoted by $U_i(\Sigma)$. A strategy $\Sigma^*$ is a Nash equilibrium if all buyers optimize their own utility, meaning that $U_i(\Sigma^*) \ge U_i(\Sigma)$ for every buyer i and every Σ which differs from $\Sigma^*$ only in the strategy of buyer i. An auction with a Nash equilibrium is called effective if it has a Nash equilibrium for which the total utility U is globally maximal. The problem is to find conditions and mechanisms which lead to Nash equilibria or even effective equilibria. The auction process
consists of a bidding that allows buyers to form a strategy Σ to find the value V, an allocation process assigning goods to buyers according to V, and then a payment P leading to the utility U. The goal is to find an auction process which leads to an effective Nash equilibrium. A Vickrey auction is an auction process for private values and one good; a Vickrey-Clarke-Groves auction (VCG) extends this to several goods.

Theorem: There is a VCG bidding leading to an effective Nash equilibrium.

Auction theory is a chapter in game theory and is part of mathematical economics. It deals with the problem of using a bidding setup to allocate goods among buyers who bid for a fair price. It is a way to discover a correct price for a good. Game theory started with von Neumann's paper of 1928. Von Neumann and Morgenstern [544] developed it in their book in 1944. The concept of Nash equilibrium was introduced by John Nash in 1950 (see e.g. [450, 495]). In game theoretical settings, this means that players choose strategies from which unilateral deviations do not pay better. The Vickrey auction from 1961, in which "the highest bidder wins but pays the second highest bid", is a private auction where each person's bid only depends on their own value. The theory has proven to be so valuable that Vickrey was awarded a Nobel prize in economics for his work. See [410, 515].
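
The VCG payment rule can be sketched for a toy setting. The assumptions here are not from the text: bidders report values directly, each buyer wants at most one good (unit demand), and the welfare-maximizing allocation is found by brute force, which is only feasible for tiny examples. Each buyer pays the loss in welfare its presence causes the others; for a single good this reduces to the Vickrey rule, and the script checks that no bid beats truthful bidding in that case. Function names are illustrative.

```python
def best_allocation(values, buyers, goods):
    """Maximal total reported value when each buyer gets at most one good.

    values[i][k] is buyer i's reported value for good k; returns
    (welfare, allocation) with allocation a dict buyer -> good.
    """
    if not buyers:
        return 0, {}
    i, rest = buyers[0], buyers[1:]
    best_w, best_a = best_allocation(values, rest, goods)   # buyer i gets nothing
    for k in goods:
        w, a = best_allocation(values, rest, goods - {k})
        w += values[i][k]
        if w > best_w:
            best_w, best_a = w, {**a, i: k}
    return best_w, best_a


def vcg(values):
    """Welfare-maximizing allocation plus VCG (externality) payments."""
    buyers = tuple(range(len(values)))
    goods = frozenset(range(len(values[0])))
    welfare, alloc = best_allocation(values, buyers, goods)
    payments = {}
    for i in buyers:
        others = tuple(b for b in buyers if b != i)
        without_i, _ = best_allocation(values, others, goods)
        value_i = values[i][alloc[i]] if i in alloc else 0
        payments[i] = without_i - (welfare - value_i)
    return alloc, payments


def utility(true_value, bid, other_bids):
    """Single good: utility of buyer 0 who values it at true_value but bids bid."""
    alloc, pay = vcg([[bid]] + [[b] for b in other_bids])
    return true_value - pay[0] if 0 in alloc else 0


print(vcg([[10], [7], [3]]))           # ({0: 0}, {0: 7, 1: 0, 2: 0}): second price
print(vcg([[8, 5], [6, 4], [3, 3]]))   # buyer 0 gets good 0 and pays 5, buyer 1 gets good 1 and pays 3
print(max(utility(10, b, [7, 3]) for b in range(21)) == utility(10, 10, [7, 3]))
```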

226. Wiener's 1/f theorem

The Wiener algebra A(T) is the set of continuous 2π-periodic functions f with absolutely convergent Fourier series $f(x) = \sum_{n \in \mathbb{Z}} c_n e^{inx}$. Equipped with the norm $||f|| = \sum_{n \in \mathbb{Z}} |c_n| < \infty$, it is a commutative Banach algebra, meaning $||f \cdot g|| \le ||f|| \cdot ||g||$. It is not a C* algebra; to get one, one would have to change the norm to the supremum norm, which then completes to the larger set C(T). The algebra consists of mildly regular continuous functions because $C^\alpha(\mathbb{T}) \subset A(\mathbb{T}) \subset C(\mathbb{T})$ for all α > 1/2. The Fourier transform $f \in A(\mathbb{T}) \mapsto \hat{f} \in l^1(\mathbb{Z})$ is an isomorphism of Banach algebras (with convolution as the multiplication on $l^1(\mathbb{Z})$). Wiener's 1/f theorem is

Theorem: f ∈ A(T) and f(x) ≠ 0 for all x ⇒ 1/f ∈ A(T).

Wiener proved this in 1932 ([748], Lemma IIe). In [399] the theorem is called "one of the nicest applications of the theory of Banach algebras to harmonic analysis". It is also known as the Wiener-Lévy theorem, as Paul Lévy extended the result by showing that for any function ϕ that is analytic on the image of f, the function ϕ(f(x)) is in A(T) [469]. Lévy gives the example f(x) = 1/log|sin(x)/2|, which has Fourier coefficients $c_n \sim 4/(n \log^2(n))$ and so has an absolutely convergent Fourier series. Its derivative $f'(x) = -f^2(x) \cot(x)$ is no longer continuous. The 1/f theorem was proven by Israel Gelfand in 1939 using the structure theorem for commutative Banach algebras [262, 773]: it uses the fact (actually a lemma in Gelfand's first paper on normed rings in 1939) that an element in a normed ring has an inverse if and only if it is not contained in any maximal ideal. (If f has an inverse and belongs to an ideal I, then 1 ∈ I and so every g is in I; on the other hand, if f does not have an inverse, then I = {gf, g ∈ A(T)} is an ideal which is not the entire ring and so is contained in a maximal ideal different from the ring.) It also uses that the maximal ideals in A(T) are the sets of functions f which vanish at some point $t_0 \in \mathbb{T}$. So, functions which do not vanish are not contained in a maximal ideal and so are invertible. A short direct proof is given by Donald Newman in [546] using the inequality $|f|_\infty \le ||f|| \le |f|_\infty + 2|f'|_\infty$ (which holds for differentiable f and in particular for finite partial Fourier sums): if f ∈ A(T)
is given which is nowhere 0, scale it so that |f(x)| ≥ 1 and take a partial Fourier sum P such that ||P − f|| ≤ 1/3. Now look at the geometric sum $S(x) = \sum_{n=1}^{\infty} (P(x) - f(x))^{n-1}/P(x)^n$, which converges in A(T) because |P(x)| ≥ 2/3 and $||1/P^n|| \le (3n|P'|_\infty + 1)(3/2)^n$, so that the n'th term has norm at most $(1/3)^{n-1}(3n|P'|_\infty + 1)(3/2)^n$. Because the geometric series converges to $S(x) = \frac{1}{P} \cdot \frac{1}{1 - (P - f)/P} = 1/f(x)$, the theorem is proven.
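
For a concrete instance of the 1/f theorem, take f(x) = 2 + cos(x) (an illustrative choice), which lies in A(T) with norm 3 and never vanishes. A geometric series computation gives the Fourier coefficients of its reciprocal in closed form, $c_n = \rho^{|n|}/\sqrt{3}$ with $\rho = \sqrt{3} - 2$, so that $||1/f|| = 1$. The Python sketch approximates the coefficients by a discrete Fourier sum and compares them with that formula.

```python
import cmath
import math


def fourier_coefficients(g, N=64):
    """Approximate c_n (|n| < N/2) of a 2*pi-periodic g by a discrete Fourier sum."""
    samples = [g(2 * math.pi * t / N) for t in range(N)]
    return {n: sum(samples[t] * cmath.exp(-1j * n * 2 * math.pi * t / N) for t in range(N)) / N
            for n in range(-N // 2 + 1, N // 2)}


def f(x):
    return 2 + math.cos(x)       # nowhere zero; ||f|| = 2 + 1/2 + 1/2 = 3


inv = fourier_coefficients(lambda x: 1 / f(x))

rho = math.sqrt(3) - 2           # closed form: c_n = rho^|n| / sqrt(3)
max_err = max(abs(c - rho ** abs(n) / math.sqrt(3)) for n, c in inv.items())
norm = sum(abs(c) for c in inv.values())
print(max_err, norm)             # tiny error; Wiener norm of 1/f is (numerically) 1
```

The rapid geometric decay of the coefficients reflects that 1/f is even analytic here; the theorem itself only needs f ∈ A(T).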

227. Well ordering theorem

A set X is called well-ordered if there is a total order on X such that every non-empty subset Y ⊂ X has a least element. [A total order is a binary relation ≤ that satisfies antisymmetry (a ≤ b and b ≤ a implies a = b), reflexivity (x ≤ x), transitivity (a ≤ b and b ≤ c implies a ≤ c) and connexity (a ≤ b or b ≤ a). Without connexity, one only has a partial order. The least element of a set Y in a totally ordered set is an element y ∈ Y such that y ≤ z for all z ∈ Y.] The well ordering theorem is, like Zorn's lemma or Tychonov's theorem, equivalent to the axiom of choice and leads to seeming paradoxes like the Banach-Tarski paradox, telling that one can partition the unit ball in $\mathbb{R}^3$ into 5 disjoint sets such that three of them can be translated and rotated to become the unit ball again and the 2 remaining can be translated and rotated to become the unit ball again. This is a paradox because the doubling of the ball is incompatible with volume.
Theorem: Every set can be well ordered.

The theorem was suggested by Georg Cantor in 1883 [113] and first proven by Ernst Zermelo in 1904, who called it "the true fundament of the whole theory of number". The integers Z can be well ordered (we write << to distinguish it from ≤), for example with x << y if |x| < |y|, or if |x| = |y| and x ≤ y, which gives the order 0, −1, 1, −2, 2, .... By using a bijection from Q to Z, also Q can be well ordered. But this does not work for R any more. Julius König gave, on August 9th at the 1904 Heidelberg Congress, a wrong proof that the real numbers cannot be well ordered. Already on August 10th, Zermelo pointed out an error in König's argument. It was Felix Hausdorff who found an essential problem in September 1904 in a letter to Hilbert, and also Cantor pointed to a problem. Hilbert, Hensel, Hausdorff and Schoenflies had met in Wengen (in the Swiss Alps) at a successor congress. König then revoked his Heidelberg proof in October 1904. On September 24, 1904, Zermelo found the proof of the well-ordering theorem in Münden, near Göttingen, and credited Erhard Schmidt with the idea to base it on the axiom of choice. The letter is printed in [201], where it is also pointed out that the proof was the object of intensive criticism which only ebbed after decades. While the fact that R can be well ordered is a consequence of the well ordering theorem, it is impossible to explicitly construct such an ordering without assuming an axiom of constructibility. See [201] (sections 2.5, 2.6) or [200]. The proof of the well ordering theorem is now known in the history of mathematics as one of the greatest mathematical controversies of all time.
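
The ordering << of Z is a sort key in disguise. The Python sketch below uses it and, as a swapped-in concrete choice of bijection between the natural numbers and the positive rationals, enumerates $\mathbb{Q}_{>0}$ with the Calkin-Wilf sequence. Ordering rationals by their position in that sequence is a well ordering. No such explicit recipe exists for R, which is the point of the controversy above.

```python
from fractions import Fraction
from itertools import islice


def z_key(x):
    """x << y iff z_key(x) < z_key(y): compare |x| first, then put the negative one first."""
    return (abs(x), x)


def calkin_wilf():
    """Enumerate every positive rational exactly once: 1, 1/2, 2, 1/3, 3/2, ..."""
    q = Fraction(1)
    while True:
        yield q
        q = 1 / (2 * (q.numerator // q.denominator) - q + 1)


print(sorted(range(-3, 4), key=z_key))          # [0, -1, 1, -2, 2, -3, 3]
print([str(q) for q in islice(calkin_wilf(), 10)])
```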

228. Caristi-Kirk-Ekeland theorem

Let (X, d) be a complete metric space and T : X → X an arbitrary map, not necessarily continuous. The map T satisfies an inward condition if there exists a lower semi-continuous function f(x) ≥ 0 such that d(x, T(x)) ≤ f(x) − f(T(x)). [A function f is lower semi-continuous if "limits can only jump down", that is if $f(a) \le \liminf_{x \to a} f(x)$ for every a. For example, f(x) = −1 for x ≤ 0 and f(x) = 1 for x > 0 is lower semi-continuous but not continuous. f is lower semi-continuous if and only if −f is upper semi-continuous. If f is both lower and upper semi-continuous, then f is continuous.]

Theorem: If T satisfies an inward condition, then T has a fixed point.

An example is if we take y ∈ X and f(x) = d(x, y). The condition then means d(x, T(x)) ≤ d(x, y) − d(T(x), y), implying d(T(x), y) < d(x, y) if x ≠ T(x), justifying the name "inward condition". In general, the condition means f(T(x)) ≤ f(x) − d(x, T(x)) < f(x) as long as x ≠ T(x). The sequence $x_n = T^n(x)$ has the property that $y_n = f(x_n) \ge 0$ is decreasing, so that f is some sort of Lyapunov function. Since $\sum_n d(x_n, x_{n+1}) \le f(x_0)$, the orbit is a Cauchy sequence and converges by completeness. As T need not be continuous, the limit does not have to be a fixed point; the proof therefore uses the partial order x ≼ y defined by d(x, y) ≤ f(x) − f(y): lower semi-continuity of f, f ≥ 0 and completeness assure that it has a maximal element x*, and since x* ≼ T(x*), maximality gives T(x*) = x*. James Caristi [116] (Theorem 2.1') mentions that the theorem was suggested by Felix Browder and that I. Ekeland proved an equivalent theorem in 1972 ([216] Theorem 1) as an abstraction of a lemma of Bishop and Phelps. W.A. Kirk, who was the PhD advisor of Caristi, had already proven a related theorem in 1965 [407]: Kirk assumed that X is a bounded closed convex subset of a reflexive Banach space with normal structure [for every convex subset H of X with more than one point, there is a point that is not a diametral point, where a point x ∈ H is diametral if $\sup_{y \in H} ||x - y||$ equals the diameter of H] and that T does not increase distances. Caristi's statement is more general and elegant in comparison with the results of Ekeland and Kirk, who were more concerned with convex analysis [217].

229. Shapley-Folkman theorem

The Minkowski addition of two subsets A, B in $V = \mathbb{R}^d$ is defined as the set {a + b | a ∈ A, b ∈ B}. A set A in V is called convex if for any two points x, y ∈ A also the connecting segment {x + t(y − x), t ∈ [0, 1]} is part of A. The convex hull c(A) of a set A is the smallest convex subset of V containing A. A compact set A is convex if and only if the Hausdorff distance d(A, c(A)) of A to its convex hull c(A) is zero. One has in general the relation c(A + B) = c(A) + c(B). Let us call a sequence of sets $A_n$ uniformly bounded if there is a ball $B = B_r$ such that the $A_n$ are all subsets of B. Define the Minkowski average $S_n = \frac{1}{n} \sum_{k=1}^{n} A_k$. The Shapley-Folkman theorem is:

Theorem: If $A_n$ are uniformly bounded then $d(S_n, c(S_n)) \to 0$.

This is some sort of a law of large sets in the sense that the Minkowski average converges to the "average", which is the convex hull. There are uniform bounds for the distance which do not depend on n as long as n ≥ d. For convex analysis, see [217]. For convexity in economics,
not depend on n as long as n ≥ d. For convex analysis, see [217]. For convexity in economics,
see [289].
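
A one-dimensional example (an illustrative choice) shows the convergence. For $A_k = \{0, 1\}$, the Minkowski average $S_n$ is $\{0, 1/n, \dots, 1\}$, whose convex hull is [0, 1], and the Hausdorff distance $d(S_n, c(S_n))$ is half the largest gap, 1/(2n). The Python sketch computes it with exact fractions.

```python
from fractions import Fraction


def minkowski_average(sets):
    """(1/n)(A_1 + ... + A_n) for finite subsets A_k of the real line, sorted."""
    total = {Fraction(0)}
    for A in sets:
        total = {s + a for s in total for a in A}
    n = len(sets)
    return sorted(s / n for s in total)


def distance_to_hull(points):
    """Hausdorff distance from a sorted finite set of reals to its convex hull:
    half of the largest gap between consecutive points."""
    if len(points) < 2:
        return Fraction(0)
    return max((b - a) / 2 for a, b in zip(points, points[1:]))


A = [Fraction(0), Fraction(1)]
for n in (1, 2, 4, 8):
    print(n, distance_to_hull(minkowski_average([A] * n)))   # 1/2, 1/4, 1/8, 1/16
```

In higher dimensions the same phenomenon holds, but computing the Hausdorff distance to the convex hull is no longer a matter of looking at gaps.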

230. Dirichlet's unit theorem

An element r in a ring R is called a unit if it has an inverse. The units form a group called the group of units. In a division ring R, it is the multiplicative group R \ {0}. [In a normed division algebra R one sometimes calls the elements of norm 1 "units". Units here are all invertible elements in a ring.] An algebraic number field is a finite field extension K of the field Q of rational numbers. Let $O_K$ be the ring of integers of the number field K. Its degree is the dimension of K as a vector space over Q. For quadratic fields $\mathbb{Q}(\sqrt{d})$, for example, the degree is 2 if the integer d is not a square. The field K is the field of fractions of $O_K$. Dirichlet's unit theorem tells:

Theorem: The group of units in a ring of integers is finitely generated.

The rank r of the ring is the maximal number of multiplicatively independent elements in the group of units. It is $r = r_1 + r_2 - 1$, where $r_1$ is the number of real embeddings (the number of real conjugates of a primitive element) and $2r_2$ is the number of conjugates which are complex. For a ring like $\mathbb{Z}[\sqrt{5}]$ in a real quadratic field, the rank is 1. In an imaginary quadratic field like $\mathbb{Q}(\sqrt{-1})$, with the ring of Gaussian integers $\mathbb{Z}[\sqrt{-1}]$, the rank is 0. The rank is 0 only for Q and for imaginary quadratic fields; for all other number fields it is at least 1. For algebraic number theory, see [460, 542, 680]. A relatively short proof of the unit theorem can be found in [674].
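
For real quadratic fields, the units of $\mathbb{Z}[\sqrt{d}]$ correspond to solutions of the Pell equations $x^2 - d y^2 = \pm 1$, and the smallest unit larger than 1 generates all others up to sign, reflecting rank 1. The Python sketch finds it by brute force over y (adequate only for small d; continued fractions are the efficient method) and checks that its powers keep norm ±1. It searches $\mathbb{Z}[\sqrt{d}]$ only: for d = 5 the full ring of integers $\mathbb{Z}[(1+\sqrt{5})/2]$ has the smaller unit $(1+\sqrt{5})/2$, whose cube is $2 + \sqrt{5}$. Function names are illustrative.

```python
import math


def fundamental_unit(d, y_max=10 ** 6):
    """Smallest unit x + y*sqrt(d) > 1 (x, y > 0) of Z[sqrt(d)], d > 1 not a square,
    found by brute force over y: x^2 - d*y^2 = -1 or +1."""
    for y in range(1, y_max):
        for s in (-1, 1):
            x2 = d * y * y + s
            x = math.isqrt(x2)
            if x * x == x2:
                return x, y
    return None


def multiply(u, v, d):
    """(a + b sqrt(d)) * (c + e sqrt(d)) in Z[sqrt(d)]."""
    (a, b), (c, e) = u, v
    return a * c + d * b * e, a * e + b * c


def norm(u, d):
    """Field norm a^2 - d b^2 of a + b sqrt(d); units have norm +1 or -1."""
    return u[0] ** 2 - d * u[1] ** 2


for d in (2, 3, 5, 7):
    eps = fundamental_unit(d)
    power = eps
    for _ in range(4):
        power = multiply(power, eps, d)    # power = eps^5
    print(d, eps, norm(eps, d), power, norm(power, d))
```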

231. Spectrum of a countable theory

Model theory investigates how a formal theory built from sentences in a formal language is modeled and interpreted in concrete structures. A theory T is a set of sentences in a language L. A model M of T is a set with interpretations of the functions, relations and symbols of L in which all sentences of T hold. A theory T is complete if for every sentence φ of L, either φ or its negation is a consequence of T. The spectrum of a complete theory T is the number I(T, k) = I(k) of isomorphism classes of models of T of cardinality k, as a function of the cardinal k. The Löwenheim-Skolem theorem tells that if T is a theory in a countable language and I(T, k) > 0 for some infinite k, then I(T, k) > 0 for all infinite cardinalities k. First order theories can therefore not control the cardinality of their infinite models. The compactness theorem shows that a theory with arbitrarily large finite models must have an infinite model. A theory T is called k-categorical if I(T, k) = 1, that is if it has only one model of cardinality k up to isomorphism. Löwenheim-Skolem shows that a first order theory with an infinite model cannot have only one model up to isomorphism. This also follows from Gödel's incompleteness theorem. Michael Morley conjectured in 1961 that if I(T, k) = 1 for some uncountable k, then I(T, k) = 1 for all uncountable k. In other words, if a theory is k-categorical for an uncountable power k, then it is k-categorical for every uncountable power k. It was proven in 1965 by Morley [529] and is called Morley's categoricity theorem.

Theorem: I(T, k) = 1 for some uncountable k ⇒ I(T, k) = 1 for all uncountable k.

The theorem is remarkable in comparison with the Löwenheim-Skolem theorem, which tells that if a theory in a countable language has a countably infinite model, then it has a model of any infinite cardinality. The categoricity theorem is considered the beginning of modern model theory. Michael Morley, who died on October 16, 2020, had won the 2003 Steele prize for seminal contributions to research for his paper [529], which had been initiated when writing his PhD thesis in 1962. See [491] chapter 6 and [645].

232. Peano axioms

The Dedekind-Peano axioms (PA) formalize the arithmetic of the natural numbers N. The axiom system first lists five axioms that are already true in first order logic with equality. The next three axiomatize the successor function S: 1) for every n ∈ N, there is a successor S(n), 2) S is injective and 3) there is no n with S(n) = 0. Then there is the axiom of induction: 4) if K is a set such that 0 ∈ K and n ∈ K implies S(n) ∈ K, then N ⊂ K. Not all statements which are true for the natural numbers can be proven from the Peano axioms. Already Kurt Gödel established the existence of statements in PA that are true but unprovable within PA. An accessible and natural example has been given by Jeff Paris and Leo Harrington [563]: a finite set H ⊂ N is called relatively large if card(H) ≥ min(H). Given a finite set M ⊂ N and
e, r, k ∈ N, let F(M, k, r, e) denote the statement: for every coloring map P: M^e → {1, . . . , r} of the set M^e of e-element subsets of M (producing a partition of M^e), there is a relatively large H ⊂ M with card(H) ≥ k on which P is constant (all e-element subsets of H have the same color). The extended finite Ramsey theorem is the statement "for all e, r, k ∈ N, there exists M such that F(M, k, r, e) holds".

Theorem: The extended finite Ramsey theorem is not provable in PA.

Paris and Harrington point out that, when working with natural numbers, working in PA amounts to replacing the axiom of infinity by its negation in ZF. They first give a proof of the extended finite Ramsey theorem as follows. (We write N for ω, the order type of N.) Fix e, r, k and assume there is no such M. Then for every M there is a counter example map P: M^e → {1, . . . , r} for which there is no relatively large homogeneous set of size at least k. The set of counter examples is a graph where (P, M), (Q, N) are connected if M ⊂ N and P is the restriction of Q to M^e. This is an infinite tree with finite vertex degree at every point. By König's lemma, there is P: N^e → {1, . . . , r} such that for every M ⊂ N the restriction of P to M^e is a counter example for M. By the infinite Ramsey theorem, there is an infinite H ⊂ N that is homogeneous for P. By choosing M large enough (compared to k and min(H)), H ∩ M is a relatively large homogeneous set for P|M^e of size at least k, a contradiction. This finishes the proof of the extended finite Ramsey theorem. The proof of the Paris-Harrington theorem uses model theoretic techniques and Gödel's incompleteness theorem. Paris and Harrington define a "beefed up" theory T and show, using PA only, that the consistency of T implies the consistency of PA. Then they show that the extended finite Ramsey theorem implies the consistency of T and so the consistency of PA. If PA proved the extended finite Ramsey theorem, it would therefore prove its own consistency. This contradicts Gödel's incompleteness theorem: one can not prove the consistency of PA within PA. Laurence Kirby and Jeff Paris have produced even more accessible examples [406], especially the Hydra game, a game in which the player cuts off heads of a tree, to which the tree reacts by growing multiple copies of branches. The theorem is that the player always wins. The surprise is that one can not prove this within PA. More examples are in [665].
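The Hydra game can be simulated in a few lines of Python. The text above does not spell out the regrowth rule, so this sketch assumes the standard Kirby-Paris rule. At step n, a head h is cut. If its parent p is not the root, the grandparent of h receives n new copies of the subtree rooted at p, with h already removed. Trees are nested lists of children. The helper names and the strategy of always cutting the first head in preorder are illustrative choices; the theorem says that every strategy wins.

```python
import copy


def find_head(node, path=()):
    """Return the index path to the first head (leaf) in preorder, or None."""
    for i, child in enumerate(node):
        if not child:  # a node without children is a head
            return path + (i,)
        found = find_head(child, path + (i,))
        if found is not None:
            return found
    return None


def subtree(root, path):
    """Follow an index path from the root and return that node."""
    node = root
    for i in path:
        node = node[i]
    return node


def play_hydra(root, max_steps=10**6):
    """Play the Hydra game (Kirby-Paris rule) until only the root is left.

    Returns the number of steps. The tree is modified in place.
    """
    step = 0
    while root:
        step += 1
        if step > max_steps:
            raise RuntimeError("step limit reached")
        path = find_head(root)
        parent = subtree(root, path[:-1])
        del parent[path[-1]]  # cut off the head
        if len(path) >= 2:  # parent is not the root: the hydra regrows
            grandparent = subtree(root, path[:-2])
            for _ in range(step):
                grandparent.append(copy.deepcopy(parent))
    return step


print(play_hydra([[[]]]))    # root - a - b: 3 steps
print(play_hydra([[[[]]]]))  # root - a - b - c: 23 steps with this strategy
```

Already a path of length 4 makes the number of steps explode, which hints at why termination can not be proven in PA.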

233. Simplicial sets

The simplex category ∆ has the simplices [n] = {0, 1, . . . , n} (non-empty totally ordered finite sets) as objects and order preserving maps between them as morphisms. A simplicial set is a contravariant functor ∆ → Set. Simplicial sets form a category called sSet. More generally, a simplicial object is a contravariant functor from ∆ to another category C. A coface map d_i: {0, . . . , n} → {0, . . . , n + 1} is the unique order preserving injection for which the element i is omitted in the codomain. The codegeneracy map s_i: {0, . . . , n + 1} → {0, . . . , n} duplicates the element i, meaning that s_i(j) = j if 0 ≤ j ≤ i and s_i(j) = j − 1 if i < j ≤ n + 1. There is now a decomposition lemma:
Theorem: ∆ morphisms are compositions of coface and codegeneracy maps.
This simple lemma is important to appreciate the axiomatic description of simplicial sets given first by May in 1967. See [322, 601]. It tells that every morphism f: {0, . . . , n} → {0, . . . , m} has a unique representation f = d_{i_k} · · · d_{i_1} s_{j_1} · · · s_{j_h} with n + k − h = m and m ≥ i_k > · · · > i_1 ≥ 0 and 0 ≤ j_1 < · · · < j_h < n. One has the relations d_i d_j = d_{j+1} d_i and s_j s_i = s_i s_{j+1} for i ≤ j, as well as s_j d_i = d_i s_{j−1} for i < j, s_j d_i = 1 if i = j, j + 1 and s_j d_i = d_{i−1} s_j for i > j + 1. In the opposite category ∆^op (the category with the same objects but reversed morphisms), the morphisms are denoted by d_i, s_j and called face and degeneracy maps. All morphisms in ∆^op are now generated
by composites of d_i, s_j. It follows that a contravariant functor X from ∆ to C is determined by the images X_n = X([n]) of the simplices together with the face and degeneracy maps d_i and s_j. A simplicial set therefore is a family of sets X_n together with functions d_i: X_n → X_{n−1} and s_i: X_n → X_{n+1} satisfying the composition relations for d_i, s_i. The elements of X_0 are called the vertices, the elements of X_k are called the k-simplices. An element in the image of some s_j is called a degenerate simplex. An advantage of looking at simplicial sets rather than at simplicial complexes is that one can use the framework in any category and that the Cartesian product of simplicial sets is a simplicial set. The covariant geometric realization functor X → |X| from sSet to Top is the left adjoint of the singular functor Top → sSet, which assigns to a topological space its singular simplicial set. Another example is the nerve N(C) of a small category C, a simplicial set constructed from the objects and morphisms of C. A functor f: C → D between two categories then induces a map of the corresponding simplicial sets, and a natural transformation between two functors induces a homotopy between the induced maps. The geometric realization of N(C) is called the classifying space of C. In general, for locally finite simplicial sets one has |X × Y| = |X| × |Y| in Top, and the geometric realization |X| of a simplicial set X is a CW complex. In 1950, Eilenberg and Zilber introduced semisimplicial complexes (see [640]), a terminology which later morphed into simplicial sets. According to [322], every simplicial set can be subdivided to become a simplicial complex, so that the realization of every simplicial set is homeomorphic to a simplicial complex. But similarly as with CW complexes, simplicial sets allow computations with fewer simplices. The homotopy category of topological spaces of the homotopy type of a CW complex is equivalent to the homotopy category of simplicial sets which satisfy an extension condition (Kan complexes). For more literature, see [498, 271, 479].
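The decomposition lemma can be checked mechanically. In this sketch, a monotone map f: [n] → [m] is stored as the tuple (f(0), . . . , f(n)). The coface indices are the values omitted by f, listed in decreasing order. The codegeneracy indices are the positions j with f(j) = f(j + 1), listed in increasing order. Composition is read right to left, as in f = d_{i_k} · · · d_{i_1} s_{j_1} · · · s_{j_h}. The function names are illustrative.

```python
from itertools import combinations_with_replacement


def coface(i):
    """d_i: [n] -> [n+1], the order preserving injection that omits i."""
    return lambda j: j if j < i else j + 1


def codegeneracy(i):
    """s_i: [n+1] -> [n], the order preserving surjection hitting i twice."""
    return lambda j: j if j <= i else j - 1


def normal_form(f, m):
    """Indices of f = d_{i_k}...d_{i_1} s_{j_1}...s_{j_h} for monotone f: [n] -> [m].

    Returns ([i_k, ..., i_1], [j_1, ..., j_h]): the values omitted by f in
    decreasing order and the positions j with f(j) = f(j+1) in increasing order.
    """
    n = len(f) - 1
    omitted = sorted(set(range(m + 1)) - set(f), reverse=True)
    repeated = [j for j in range(n) if f[j] == f[j + 1]]
    return omitted, repeated


def evaluate(omitted, repeated, x):
    """Apply the composite to x; the rightmost map acts first."""
    for j in reversed(repeated):  # s_{j_h} first, ..., s_{j_1} last
        x = codegeneracy(j)(x)
    for i in reversed(omitted):   # d_{i_1} first, ..., d_{i_k} last
        x = coface(i)(x)
    return x


f = (0, 0, 2)  # a monotone map [2] -> [3]
cofaces, codegeneracies = normal_form(f, 3)
print(cofaces, codegeneracies)                                # [3, 1] [0]
print([evaluate(cofaces, codegeneracies, x) for x in range(3)])  # [0, 0, 2]
```

Looping over all monotone maps between small simplices confirms both the factorization and the count n + k − h = m.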

234. Ostrowski theorem

An absolute value on Q is a function x ∈ Q → |x| ∈ R+ with the property that |x| = 0 is equivalent to x = 0, which is compatible with multiplication, |xy| = |x||y|, and which satisfies the triangle inequality. The trivial absolute value, given by |x| = 1 for x ≠ 0 and |0| = 0, is not considered. Examples are the usual absolute value |x|_∞ or the p-adic absolute value |x|_p = |p^n u/v|_p = p^{−n}, where p is a rational prime and u, v are integers not divisible by p. The p-adic absolute value |x|_p is non-Archimedean in the sense that the ultrametric property |x + y|_p ≤ max(|x|_p, |y|_p) holds. Two absolute values |·|_1, |·|_2 are equivalent if |x|_1 = |x|_2^c for some positive constant c. An equivalence class of non-trivial absolute values on Q is a place.

Theorem: Every place is represented either by |x|_∞ or by |x|_p with a prime p.

Alexander Ostrowski proved this theorem in 1916 [554]. It shows that every field containing Q which is complete with respect to an Archimedean absolute value is either R or C. Another curious consequence is the product formula ∏_{p≤∞} |x|_p = 1 for x ∈ Q \ {0}, which combines all the p-adic absolute values |x|_p as well as |x|_∞. The completion of Q with respect to |x|_p is the field Q_p of p-adic numbers. Each number in Q_p can be written in a unique way as a half infinite Laurent series x = Σ_{k=−∞}^{∞} a_k p^k, where the a_k ∈ {0, . . . , p − 1} are zero for k < n(x) and a_{n(x)} ≠ 0. The norm is then |x|_p = p^{−n(x)}. The field Q_p contains the sub-ring Z_p of p-adic integers x = Σ_{k=0}^{∞} a_k p^k, which is the unit ball in Q_p. The ring Z_p of p-adic integers has no zero divisors, so that Q_p is the field of fractions of Z_p (the smallest field containing Z_p). The p-adic integers G = Z_p with addition and the metric coming from the norm form a commutative compact topological group. It is a totally disconnected perfect metric space and so a Cantor space, a topological space homeomorphic to the Cantor set. Its Pontryagin
dual group Ĝ is the Prüfer p-group Q_p/Z_p, the p-adic numbers modulo the p-adic integers, which is Ĝ = {e^{2πik/p^l}, k, l ∈ Z}. While R and Q_p are locally compact, the circle T = R/Z is a compact, connected metric space which is the dual to the integers Z, which are non-compact and totally disconnected. The p-adic integers Z_p are compact, totally disconnected spaces, dual to the Prüfer p-group P_p = Q_p/Z_p. In both cases, one has Haar measures µ on G and a Fourier isomorphism f ∈ L²(G, µ) → L²(Ĝ, µ̂) which is the Pontryagin involution. All group translations, as well as multiplications by non-zero integers on T and by integers coprime to p on Z_p, preserve the Haar measure µ. While for T, the translations x → x + α preserving dx can be arbitrarily close to the identity, there is a smallest group translation x → T(x) = x + 1 on the p-adic integers. It is called the adding machine and is ergodic. The eigenvalues of the unitary Koopman operator f → f(T) on L²(Z_p) coincide as a set with the Prüfer group P_p ⊂ T = {z ∈ C, |z| = 1}. Besides group translations, there are also Bernoulli shifts in both cases which preserve the Haar measure. On the compact topological group T = R/Z, the map x → nx is a Bernoulli shift for every n > 1 with entropy log(n). On the compact topological group Z_p, the shift x = Σ_{k≥0} a_k p^k → Σ_{k≥0} a_{k+1} p^k, which is the map x → (x − a_0)/p, is a Bernoulli shift with Kolmogorov-Sinai entropy log(p). About the life of Ostrowski see [259]. For p-adic analysis, see [394, 279].
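The p-adic absolute value and the product formula can be illustrated with exact rational arithmetic. This sketch uses Python's `fractions.Fraction`; the helper names are ours. For a non-zero rational x, only primes dividing its numerator or denominator contribute a factor different from 1, so the infinite product is in fact finite.

```python
from fractions import Fraction


def valuation(x, p):
    """p-adic valuation v_p(x) of a non-zero rational x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("v_p(0) is infinite")
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def padic_abs(x, p):
    """p-adic absolute value |x|_p = p^(-v_p(x)), with |0|_p = 0."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(1, p) ** valuation(x, p)


def prime_factors(n):
    """Set of primes dividing the integer n (trial division)."""
    n, d, primes = abs(n), 2, set()
    while d * d <= n:
        while n % d == 0:
            primes.add(d)
            n //= d
        d += 1
    if n > 1:
        primes.add(n)
    return primes


def product_formula(x):
    """|x|_infinity times the product of all |x|_p; equals 1 for x != 0."""
    x = Fraction(x)
    result = abs(x)
    for p in prime_factors(x.numerator) | prime_factors(x.denominator):
        result *= padic_abs(x, p)
    return result


x = Fraction(-12, 35)
print(padic_abs(x, 2), padic_abs(x, 5), padic_abs(x, 11))  # 1/4 5 1
print(product_formula(x))                                   # 1
```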

235. Clifford algebras

If V is a finite dimensional vector space over a field k and q is a quadratic form of signature (p, q), its Clifford algebra Cl(V, q) is the quotient T(V)/I of the tensor algebra T(V) = ⊕_{n=0}^{∞} ⊗_{k=1}^{n} V by the ideal I generated by the elements v ⊗ v − q(v)1. [We use the sign convention of [133, 659]. Other authors, like [257, 463], prefer to take the ideal generated by v ⊗ v + q(v)1.] The Clifford algebra is a unital associative algebra. If the underlying field k is R, it is called a geometric algebra. For q = 0, one obtains the exterior algebra with exterior multiplication, where v ∧ w = −w ∧ v. As "turning on" the form q deforms the anti-commutativity relation, one sees Cl(V, q) as a quantization of the exterior algebra Ext(V). (One can also see the process of going from (V, q) to Cl(V, q) as a second quantization if one interprets the tensor algebra as a many-body Fock space.) Examples of Clifford algebras are the complex numbers Cl_{0,1}(R) = C, the quaternions Cl_{0,2}(R), the split complex numbers Cl_{1,0}(R) and the split quaternions Cl_{2,0}(R). Notable in relativity is the space time algebra Cl_{1,3}(R). Let i: V → Cl(V, q) be the inclusion map which embeds V into Cl(V, q) and which satisfies i(v)² = q(v). The algebra Cl(V, q) now enjoys a universal property: given any unital associative algebra A and any linear map j: V → A obeying j(v)² = q(v), there exists a unique algebra homomorphism f: Cl(V, q) → A such that j(v) = f(i(v)). The fundamental theorem of Clifford algebras is that Cl(V, q) is unique up to isomorphism. One can therefore speak of "the" Clifford algebra Cl(V, q) defined by V and q.
Theorem: The Clifford algebra construction satisfies the universal property.

In category theory one sees Cl as a functor from the category of finite dimensional pseudo Hilbert spaces (V, q) to the category of unital associative algebras. The universal property generalizes the process of getting from an algebra to the free algebra F(A) or of getting the tensor algebra T(M) from a module M over a ring [133]. If V has dimension n, the dimension of the Clifford algebra Cl(V, q) is 2^n. As a vector space, the Clifford algebra is isomorphic to the exterior algebra Ext(V), which, like Cl(V, q), is a super algebra. This follows from the universal property. As the involution v → −v does not change the quadratic form q, it lifts to an involution α on Cl(V, q). This produces a splitting Cl(V, q) = Cl(V, q)_even ⊕ Cl(V, q)_odd,
where Cl(V, q)_even = {x ∈ Cl(V, q), α(x) = x} and Cl(V, q)_odd = {x ∈ Cl(V, q), α(x) = −x}. Multiplication honors this grading. The quadratic form q can be extended from V to Cl(V, q). First define the transpose x^T, which reverses the order, x = v_1 ⊗ · · · ⊗ v_k → v_k ⊗ · · · ⊗ v_1. Then define the scalar part x_0 of an element x = Σ_n Σ_{|k|=n} x_k v_{k_1} ⊗ · · · ⊗ v_{k_n}. The symmetric bilinear form q_1 on V_1 = Cl(V, q) is then defined as x · y = (x^T · y)_0 and continues to be non-degenerate if q was. In the case when q is positive definite, where (V, q) is a Hilbert space, the operation (V, q) → (V_1, q_1) can now be iterated and produces a sequence (V_n, q_n) of Hilbert spaces. Clifford algebras have many applications, like in algebraic geometry (starting with Grassmann, who introduced exterior algebras), in the representation theory of classical Lie groups (it was Élie Cartan who discovered in 1913 previously unknown representations of the orthogonal group and called the elements on which the matrices operated "spinors" [119]), in physics or in differential geometry. To the latter, Cartier writes in the introduction to [133] that since the 1950's, spinors and the associated Dirac equation have developed into a fundamental tool in differential geometry. Indeed, one has at every point x ∈ M of a Riemannian manifold a Clifford algebra Cl(T_x M, g(x)) defined by g(x), the quadratic form in the tangent space V = T_x M. This produces a Clifford bundle. One can then ask under which conditions a spin structure exists on M. This is the case if and only if the second Stiefel-Whitney class w_2(M) is zero. This topological obstruction for the existence of spin structures on an orientable Riemannian manifold (M, g) was found by André Haefliger in 1956. Haefliger defined the spin structure on (M, g) as a lift of the principal orthonormal frame bundle F_SO(M) → M to F_Spin(M). Not every Riemannian manifold is spin. While spheres S^n are spin, the complex projective spaces CP^{2n} are not spin. The space of spinors of (V, q) is the fundamental representation of a Clifford algebra Cl(V, q). Spinors are also vectors in a representation of the double cover Lie group Spin(p, q) of the special orthogonal group SO(p, q) of signature (p, q). The representation theory of classical groups like SO(n) or Spin(n), a subgroup of the group of invertible elements in a Clifford algebra of a Hilbert space, is a major motivation for Clifford algebras.
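For a diagonal quadratic form, the Clifford product of basis elements can be computed directly. A basis element e_{i_1} · · · e_{i_r} with i_1 < · · · < i_r is encoded as a bitmask. The product picks up a sign from reordering the basis vectors. Each repeated e_i is replaced by q(e_i), following the convention of the ideal v ⊗ v − q(v)1 used above. This is a minimal sketch of the standard bitmask technique, and the helper names are ours. It reproduces the quaternions as Cl_{0,2}(R).

```python
def reorder_sign(a, b):
    """Sign of the permutation sorting the basis vectors of blades a and b.

    Blades are bitmasks: bit i set means the basis vector e_{i+1} is present.
    """
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1 if swaps % 2 else 1


def blade_product(a, b, q):
    """Product of basis blades a, b for the diagonal form q (e_i e_i = q[i])."""
    coefficient = reorder_sign(a, b)
    common, i = a & b, 0
    while common:
        if common & 1:
            coefficient *= q[i]  # each repeated e_i contracts to q[i]
        common >>= 1
        i += 1
    return coefficient, a ^ b


def multiply(x, y, q):
    """Product of multivectors stored as {blade bitmask: coefficient}."""
    result = {}
    for a, ca in x.items():
        for b, cb in y.items():
            c, blade = blade_product(a, b, q)
            result[blade] = result.get(blade, 0) + c * ca * cb
    return {blade: c for blade, c in result.items() if c != 0}


# Cl_{0,2}(R): two generators squaring to -1 give the quaternions i, j, k = ij
q = [-1, -1]
i, j = {0b01: 1}, {0b10: 1}
k = multiply(i, j, q)
print(multiply(i, i, q), multiply(j, j, q), multiply(k, k, q))  # {0: -1} three times
print(multiply(j, i, q))  # {3: -1}, that is ji = -ij
```

A vector v = Σ c_i e_i satisfies v² = Σ q_i c_i² = q(v), because the mixed terms e_i e_j + e_j e_i cancel.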

236. Transcendental number theory

A complex number is called algebraic if it is a root of a non-zero polynomial a_0 + a_1 x + · · · + a_n x^n ∈ Z[x] with integer coefficients a_0, . . . , a_n ∈ Z. The algebraic numbers A form a field. They can be enumerated and so form a countable set in C. As a consequence, almost all real numbers are not algebraic. This argument of Cantor is a non-constructive but elegant proof of the existence of non-algebraic numbers, numbers in the complement C \ A, which are also called transcendental numbers. Let us call a Gelfond-Schneider pair a pair of algebraic numbers α, β for which α ≠ 0, 1 and for which β is irrational. The Gelfond-Schneider theorem is:

Theorem: (α, β) Gelfond-Schneider ⇒ any choice of α^β is transcendental.

The theorem is named after Alexander Osipovich Gelfond and Theodor Schneider. Gelfond proved a special case in 1929 (β imaginary quadratic) and the full version in 1934. Schneider proved the same in his PhD thesis in 1934, written under the advice of Carl Siegel, who had already proved it for real quadratic β. An example is the Gelfond-Schneider constant 2^{√2}. Another example is the Gelfond constant e^π = (e^{iπ})^{−i} = (−1)^{−i}. One has to say "any choice" because (−1)^{−i} invokes the complex logarithm, which has many branches. Any other branch, like (−1)^{−i} = e^{−i log(−1)} = e^{−i·i(1+2k)π} = e^{(1+2k)π}, is also transcendental. A third example is the "eye for an eye" number i^i = (e^{iπ/2})^i = e^{−π/2}, which is already transcendental as a consequence
of the Gelfond theorem because i is algebraic, solving the equation 1 + x² = 0. The problem whether α^β is transcendental for a Gelfond-Schneider pair had been asked by David Hilbert and became known as Hilbert's seventh problem in 1900. Questions about transcendental numbers are difficult. For example, one still does not know whether π^e is transcendental or not. See [275] and especially chapters 4 and 5 of [107].
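The branch issue in the examples above can be seen numerically. This sketch uses Python's `cmath`, whose `log` returns the principal branch. The helper `branch_power` is ours: it adds 2πik to the logarithm to reach the k-th branch. The values of (−1)^{−i} on the branches are the numbers e^{(1+2k)π}, and k = 0 gives the Gelfond constant.

```python
import cmath
import math


def principal_power(base, exponent):
    """base**exponent on the principal branch of the complex logarithm."""
    return cmath.exp(exponent * cmath.log(base))


def branch_power(base, exponent, k):
    """base**exponent on the k-th branch: log(base) + 2*pi*i*k."""
    return cmath.exp(exponent * (cmath.log(base) + 2j * math.pi * k))


print(principal_power(-1, -1j), math.exp(math.pi))         # Gelfond constant e^pi
print(principal_power(1j, 1j), math.exp(-math.pi / 2))     # i^i = e^(-pi/2)
print([branch_power(-1, -1j, k).real for k in (-1, 0, 1)])  # e^(-pi), e^pi, e^(3 pi)
```

A numerical check can of course only illustrate the identities; transcendence is the content of the theorem.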

237. Conformal maps

Let D_r(w) = {|z − w| < r} denote the disk of radius r centered at w ∈ C. If f: D_r(w) → C is a holomorphic function with f′(w) ≠ 0, then by the inverse function theorem, the map is invertible near w. By Bloch's theorem, f(D_r(w)) contains some disk of radius c|f′(w)|r for a universal constant c > 0. The best constant c for which this works is called the Bloch constant. An analytic, injective function f on D_r(w) is also called univalent, and f: D_r(w) → f(D_r(w)) is then called a conformal mapping. Koebe's quarter theorem is

Theorem: If f is univalent on D_r(w), then D_{|f′(w)|r/4}(f(w)) ⊂ f(D_r(w)).

The result had been conjectured in 1907 by Paul Koebe and was first proven by Ludwig Bieberbach in 1916 [62]. The Koebe function f(z) = z/(1 − z)² = Σ_{n=1}^{∞} n z^n shows that the constant 1/4 can not be improved upon. In [704], the result is stated and proven that for polynomials of degree n, the image f(D_r(w)) contains the disk D_{|f′(w)|r/n}(f(w)). (The blog cites [516], but the disc result is not that obvious there.) For more information on Koebe, see [117]. Paul Koebe is also famous for his theorem generalizing the Riemann mapping theorem. It states that any finitely connected domain is conformally equivalent to a circle domain, unique up to Möbius transformations. (A circle domain is an open subset of C such that every connected component of its boundary is either a circle or a point.) Koebe's Kreisnormierungsproblem from 1909 asks whether every domain in C is conformally equivalent to a circle domain, unique up to a Möbius transformation. The problem is open.
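The sharpness of the constant 1/4 can be made visible numerically. The Koebe function satisfies f(0) = 0 and f′(0) = 1. On a circle |z| = ρ < 1, the modulus |f(z)| = ρ/|1 − z|² is smallest at z = −ρ, where it equals ρ/(1 + ρ)². This value tends to 1/4 as ρ → 1, so the image omits −1/4. The sampling grid below is an arbitrary choice.

```python
import cmath
import math


def koebe(z):
    """Koebe function f(z) = z / (1 - z)^2, univalent on the unit disk."""
    return z / (1 - z) ** 2


def koebe_series(z, terms=200):
    """Partial sum of the power series sum_{n>=1} n z^n."""
    return sum(n * z ** n for n in range(1, terms + 1))


rho = 0.999
circle = [rho * cmath.exp(2j * math.pi * k / 1000) for k in range(1000)]
min_modulus = min(abs(koebe(z)) for z in circle)
print(min_modulus, rho / (1 + rho) ** 2)  # both slightly below 1/4
print(koebe(-rho))                        # close to -1/4
```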

238. Shannon capacity

The independence number α(G) of a finite simple graph G = (V, E) is the maximum number of independent vertices in G (a set of vertices is independent if its members are pairwise not adjacent). The strong product G ∗ H of two graphs G, H has as vertex set the Cartesian product V(G) × V(H). Its edges are all connections which, when projected onto either of the graphs, give either a vertex or an edge. In communication theory, where V is the alphabet and E connects letters which can be confused, α(G^k) is the maximal number of k-letter messages which can be sent without danger of confusion. Here G^k = G ∗ · · · ∗ G is the k-fold strong product. The limit Θ(G) = lim_{k→∞} α(G^k)^{1/k} is called the Shannon capacity of the graph. One clearly has Θ(G) ≥ α(G), because there are at least α(G)^k words which can not be confused. The extreme cases are P_n, the graph with n vertices and no edges, and K_n, the graph with n vertices and all edges present. In these cases Θ(P_n) = n and Θ(K_n) = 1.
Theorem: The Shannon capacity of G = C_5 is √5.

The Shannon capacity was introduced by Claude Shannon in 1956 [642], who wrote: "The zero error capacity of a noisy channel is defined as the least upper bound of rates at which it is possible to transmit information with zero probability of error." Shannon took the logarithm and called C_0 = log(Θ(G)) = lim_{k→∞} (1/k) log(α(G^k)) the zero-error capacity, which reminds of a Lyapunov
exponent measuring the exponential growth of a cocycle. Shannon computed the capacity for all graphs with n = 1, 2, 3, 4, 5 nodes except the pentagon, the smallest graph for which he was unable to determine the value. He only established √5 ≈ 2.236 ≤ Θ(C_5) ≤ 5/2 = 2.5. The true value for the pentagon was then computed in [476], where also the notation Θ(G) appears. An exposition about Shannon capacity appears in [497] (Miniatures 28 and 29). The problem of computing Θ(G) is formidable. One does not even know Θ(C_7).
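The lower bound Θ(C_5) ≥ √5 comes from a small computation: α(C_5) = 2, but α(C_5 ∗ C_5) = 5. The following brute-force sketch labels the pentagon 0, . . . , 4 and computes both independence numbers exactly. The branching search is a simple illustrative choice that is only suitable for tiny graphs.

```python
from itertools import product


def adjacent_in_cycle(a, b, n):
    """Vertices a, b of the cycle C_n (labelled 0..n-1) are adjacent."""
    return (a - b) % n in (1, n - 1)


def adjacent_in_strong_power(u, v, n):
    """Distinct tuples are adjacent iff each coordinate is equal or adjacent."""
    return u != v and all(
        x == y or adjacent_in_cycle(x, y, n) for x, y in zip(u, v)
    )


def independence_number(vertices, adjacent):
    """Exact alpha(G): either skip vertex v, or take v and drop its neighbours."""
    def solve(candidates):
        if not candidates:
            return 0
        v, rest = candidates[0], candidates[1:]
        skip_v = solve(rest)
        take_v = 1 + solve([w for w in rest if not adjacent(v, w)])
        return max(skip_v, take_v)
    return solve(list(vertices))


n = 5
alpha1 = independence_number(range(n), lambda a, b: adjacent_in_cycle(a, b, n))
square = list(product(range(n), repeat=2))
alpha2 = independence_number(square, lambda u, v: adjacent_in_strong_power(u, v, n))
print(alpha1, alpha2)  # alpha(C5) = 2, alpha(C5 * C5) = 5, so Theta(C5) >= sqrt(5)
```

An explicit maximal code is {(i, 2i mod 5)}. Shifted by one, this is the independent set (1, 1), (2, 3), (3, 5), (5, 4), (4, 2) in 1-based labels.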

239. Outer billiards

A convex curve C in R² defines an area-preserving map T: X → X, where X is the unbounded region outside of the table. A point (x, y) is mapped to T(x, y), the point reflection at the point (p, q) which is the midpoint of the interval I obtained by intersecting the counter clockwise tangent from (x, y) with C. The map can be extended to C by defining T(x, y) = (x, y) there. For most points (x, y), the interval I is a single point, but already for polygons we want to have T defined everywhere, even though it is not continuous. The map T is called the outer billiard map or dual billiard map defined by C. The Penrose kite is the quadrilateral ABPQ defined by the 5-gon A, B, C, D, E, where P = (AD) ∩ (BE) and Q = (BD) ∩ (CD), with (AD), (BE), (BD), (CD) denoting diagonal segments in the pentagon. A table is called unstable if there exists (x, y) such that |T^n(x, y)| is unbounded.

Theorem: Outer billiard on the Penrose kite is unstable.

The outer billiard map T is smooth if C is smooth and strictly convex. The dynamical system had been introduced by B.H. Neumann in the Manchester University Mathematics students' journal of 1959 [543] and was popularized in [530, 531]. The question whether there exists a convex table for which an unbounded orbit of the map T exists is known as the Moser-Neumann question [636, 637]. If C is sufficiently smooth and strictly convex, then KAM theory establishes invariant curves for T and so stability of the table [195]. For a class of tables called quasi-rational polygons, which includes rational polygons and regular n-gons, all orbits are bounded [725, 439, 214]. Also trapezoids lead to bounded orbits [470].
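For polygonal tables, the outer billiard map reflects a point in a vertex of the polygon. This sketch assumes one orientation convention: reflect p in the vertex v such that the whole polygon lies to the right of the directed line from p through v. The other orientation gives the inverse map. Points with more than one candidate vertex lie on extensions of edges, where T is discontinuous, and they are rejected. Exact rational arithmetic shows a periodic orbit around the square, consistent with the boundedness of orbits for rational polygons mentioned above. The Penrose kite would need irrational coordinates and very long orbits.

```python
from fractions import Fraction


def cross(u, v):
    """z-component of the cross product of two plane vectors."""
    return u[0] * v[1] - u[1] * v[0]


def outer_billiard(p, polygon):
    """One step of the outer billiard map around a convex polygon."""
    candidates = []
    for v in polygon:
        d = (v[0] - p[0], v[1] - p[1])
        if all(cross(d, (w[0] - p[0], w[1] - p[1])) <= 0 for w in polygon):
            candidates.append(v)
    if len(candidates) != 1:
        raise ValueError("point on the singular set or inside the table")
    v = candidates[0]
    return (2 * v[0] - p[0], 2 * v[1] - p[1])  # point reflection at v


def period(p, polygon, max_steps=10_000):
    """Return the period of the orbit of p, or None if not found."""
    q = p
    for n in range(1, max_steps + 1):
        q = outer_billiard(q, polygon)
        if q == p:
            return n
    return None


square = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
start = (Fraction(5, 2), Fraction(1, 3))
print(period(start, square))  # 4
```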

240. Sandwich theorem

Let G = (V, E) denote a finite simple graph. In information theory, where V is an alphabet of symbols, the graph is the confusion graph connecting symbols which can be confused. A function f from V to the unit sphere S^n ⊂ R^{n+1} is called an orthonormal representation if the orthogonality condition ⟨f(u), f(v)⟩ = 0 holds whenever the distinct vertices u, v are not adjacent. The Lovász number is defined as
θ(G) = min_{c,U} max_{v∈V} ⟨c, U(v)⟩^{−2},
where c is a unit vector and U is an orthonormal representation. This corresponds to minimizing the half-angle α of a rotational cone around c containing all vectors U(v), as θ(G) = 1/cos²(α), where c is the symmetry axis of the cone. The Lovász number is multiplicative in the graph product because one can build for every power G^n also an umbrella U^n. Let α(G) be the independence number of G. It is the clique number c(Ḡ) of the graph complement Ḡ. Let χ(G) denote the chromatic number, the minimal number of colors which one can use to color the graph. It is the clique covering number β(Ḡ) of the graph complement. The following sandwich inequality is the key to estimate the Shannon capacity Θ(G) = lim_{n→∞} α(G^n)^{1/n} [642].

Theorem: α(G^n)^{1/n} ≤ Θ(G) ≤ θ(G) ≤ β(G) = χ(Ḡ) for every n ≥ 1.

The Lovász number θ(G) can be computed in polynomial time in the number of vertices. The Shannon capacity is sandwiched between α(G^n)^{1/n} for any power G^n of G and the Lovász number. See [431]. An example is α(C_5²) = 5, where (1, 1), (2, 3), (3, 5), (5, 4), (4, 2) is an independent set in the strong product C_5² = C_5 ∗ C_5, so that Θ(C_5) ≥ √5. The Lovász umbrella U = {u_1, u_2, u_3, u_4, u_5} with u_k = [cos(t) sin(s), sin(t) sin(s), cos(s)], where cos(s) = 1/5^{1/4} and t = 2πk/5, gives θ(C_5) ≤ √5. Therefore, the Shannon capacity of the pentagon is √5. One does not know the Shannon capacity of the heptagon. The value √5 was determined in [476], where also the notation Θ(G) appears. An exposition about Shannon capacity appears in [497].
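The Lovász umbrella can be verified numerically. The vectors u_k have handle c = (0, 0, 1). With cos(s) = 5^{−1/4}, non-adjacent vertices k and k + 2 (mod 5) get orthogonal vectors, since sin²(s) cos(4π/5) + cos²(s) = 0. Each vector gives ⟨c, u_k⟩^{−2} = √5. This is only a check of the upper bound θ(C_5) ≤ √5 from the text, not an algorithm for θ in general; that would need semidefinite programming.

```python
import math


def pentagon_umbrella():
    """Lovász umbrella for C_5: five unit vectors around the z-axis."""
    cos_s = 5 ** -0.25                  # cos(s) = 1 / 5^(1/4)
    sin_s = math.sqrt(1 - cos_s ** 2)
    return [
        (math.cos(2 * math.pi * k / 5) * sin_s,
         math.sin(2 * math.pi * k / 5) * sin_s,
         cos_s)
        for k in range(5)
    ]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


U = pentagon_umbrella()
c = (0.0, 0.0, 1.0)  # the handle of the umbrella
worst_overlap = max(abs(dot(U[k], U[(k + 2) % 5])) for k in range(5))
theta_bound = max(dot(c, u) ** -2 for u in U)
print(worst_overlap)                 # zero up to rounding
print(theta_bound, math.sqrt(5))     # the bound equals sqrt(5)
```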

241. Shannon capacity theorem

Consider a communication channel with bandwidth B which is disturbed by additive white Gaussian noise and has signal to noise ratio S/N (also abbreviated SNR). Its maximal capacity C is related to these quantities by

Theorem: C = B log2 (1 + S/N ).

This is also called the Shannon capacity theorem or Shannon-Hartley theorem. The capacity C is measured in bits per second. The bandwidth is given in Hertz, S is the average received signal power measured in Watts and N is the average power of the noise measured in Watts. The number log2(1 + S/N) is the spectral efficiency, measured in bits per second per Hertz.
In 1993, turbo codes appeared [110]. These were the first practical codes to come close to the Shannon limit. They had already been patented by Claude Berrou in 1991. These codes are used in the 3G and 4G mobile telephony standards. In 5G wireless communication, other codes like polar codes are used, which provably achieve the Shannon channel capacity asymptotically [536].
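The formula C = B log2(1 + S/N) is easy to evaluate. Signal to noise ratios are often quoted in decibels, with S/N = 10^{dB/10}. The channel parameters below (3000 Hz bandwidth, 30 dB SNR) are a hypothetical example, roughly the scale of an analog telephone line.

```python
import math


def channel_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity C = B log2(1 + S/N) in bits per second."""
    return bandwidth_hz * math.log2(1 + snr)


def db_to_ratio(db):
    """Convert a signal to noise ratio in decibels into a power ratio."""
    return 10 ** (db / 10)


snr = db_to_ratio(30)           # 1000.0
c = channel_capacity(3000, snr)
print(round(c), "bits/s")       # about 29902 bits/s
print(math.log2(1 + snr), "bits/s/Hz (spectral efficiency)")
```

Doubling the bandwidth doubles the capacity, while doubling the signal power adds only about one bit per second per Hertz at high SNR.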

242. Differential Galois theory

A differential ring is a ring R equipped with a derivation D: R → R which is linear and satisfies the Leibniz rule D(fg) = D(f)g + fD(g). If R is a field, it is called a differential field. The field of fractions of an integral domain R (a ring R for which the product of two non-zero elements is not zero) is the smallest field containing R. A differential ring extension R < S has the ring R as a sub-ring such that the derivation of S restricted to R agrees with the derivation on R. A differential ideal is an ideal I ⊂ R that is invariant under D. It defines the quotient ring R/I with derivation D(a + I) = D(a) + I. The ring of differential polynomials over R is the polynomial ring R[Y_1, Y_2, . . . ] in countably many variables, in which D is extended by DY_i = Y_{i+1}. If F is a differential field and K a differential field extension of F, then t ∈ K is called elementary over F if it is obtained from F by successively adjoining algebraic elements, logarithms or exponentials.
Theorem: f = e^{x²} can not be integrated in elementary terms.

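By Liouville's theorem, an elementary antiderivative would have to be of the form g(x)e^{x²} with a rational function g. After differentiation, there would have to exist a rational function g with 1 = g′ + 2xg, which is impossible [484]. For a book, see [485] or the lectures [513].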
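The last step, that 1 = g′ + 2xg has no rational solution g, can be spelled out in a few lines. The following derivation is a sketch that uses only the pole orders of g′ and 2xg and the degrees of polynomials.

```latex
\[
  \frac{d}{dx}\bigl(g(x)\,e^{x^2}\bigr) = \bigl(g'(x) + 2x\,g(x)\bigr)\,e^{x^2} = e^{x^2}
  \iff g'(x) + 2x\,g(x) = 1 .
\]
% Poles: if g had a pole of order m >= 1 at a point a, then g' has a pole of
% order m+1 at a, while 2x g has at most a pole of order m there and 1 has none.
% So g has no poles and is a polynomial.
% Polynomials: g = 0 gives 0 \neq 1. If deg g = d >= 0, then deg(2x g) = d+1 > deg g',
% so deg(g' + 2x g) = d+1 >= 1 and g' + 2x g can not be the constant 1.
```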

243. Non-linear Schroedinger equation

The non-linear Schrödinger equation (NLSE) iu_t = −∆u + |u|^p u is an example of a nonlinear partial differential equation for u(t, x) with x ∈ R^d. It is an example of a classical field equation which can be used to describe Langmuir waves in hot plasmas or wave propagation
in fiber optics, in which the non-linearity comes from self-phase modulation. It also appears to be relevant for understanding the formation of rogue waves in the ocean. The latter are unexpectedly large waves that can endanger ships. In dimension d = 1 and for the cubic non-linearity p = 2, the differential equation is an integrable system featuring non-linear phenomena like solitons. The square M(u) = ||u||_2² of the L²-norm of u is called the mass of u. It is preserved under the evolution. For any λ > 0, the function u_λ(t, x) = λ^{2/p} u(λ² t, λx) is also a solution and its mass is M(u_λ) = λ^{−d+4/p} M(u). The mass subcritical case is p < 4/d. The Sobolev space H^k(R^d) is the space of functions f such that f as well as all its weak derivatives up to order k have finite L² norm.

Theorem: Global solutions of a mass subcritical NLSE exist for initial data in H^1(R^d).

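The problem is ill-posed in H^s(R^d) for s below the critical exponent s_c = d/2 − 2/p. The mass-critical case is p = 4/d, where s_c = 0. In the focusing case, in which the non-linearity enters with the opposite sign, there is then a minimal mass m_0 for solutions to blow up. See [697].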
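The scaling statements above follow from a substitution y = λx, sketched here. Every term of the equation picks up the same factor λ^{2+2/p}, so u_λ is again a solution. The mass picks up the factor λ^{4/p−d}, whose exponent vanishes exactly in the mass-critical case p = 4/d.

```latex
\[
  \partial_t u_\lambda = \lambda^{2+2/p}\,(\partial_t u)(\lambda^2 t,\lambda x), \quad
  \Delta u_\lambda = \lambda^{2+2/p}\,(\Delta u)(\lambda^2 t,\lambda x), \quad
  |u_\lambda|^p u_\lambda = \lambda^{2+2/p}\,(|u|^p u)(\lambda^2 t,\lambda x),
\]
\[
  M(u_\lambda(t)) = \int_{\mathbb{R}^d} \lambda^{4/p}\,|u(\lambda^2 t,\lambda x)|^2\,dx
  = \lambda^{4/p}\int_{\mathbb{R}^d} |u(\lambda^2 t, y)|^2\,\lambda^{-d}\,dy
  = \lambda^{4/p-d}\,M(u) .
\]
% Mass conservation gives M(u(lambda^2 t)) = M(u) in the last step.
% For p < 4/d (mass subcritical) the exponent 4/p - d is positive.
```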

244. Menger's theorem

Let G = (V, E) be a finite simple graph and A, B ⊂ V. An AB-separator is a set X of vertices such that every path from A to B passes through X; a minimal AB-separator is one of smallest cardinality. An AB-connector is a set of pairwise disjoint paths connecting A with B; a maximal AB-connector is one with the largest number of paths. Let us denote by |minimal AB-separator| the cardinality of a minimal AB-separator and by |maximal AB-connector| the number of paths in a maximal AB-connector. The result is:

Theorem: |minimal AB-separator| = |maximal AB-connector|.

If A and B intersect, the zero length paths {x} with x ∈ A ∩ B count as connectors, and every AB-separator contains A ∩ B. Another special case is when G is connected with a cut vertex x such that V = A ∪ {x} ∪ B is a disjoint union and removing x disconnects A from B. Now, {x} is a minimal AB-separator. Since every path from A to B passes through x, a maximal AB-connector consists of only one path. More generally, let G be k-connected, meaning that one has to remove a vertex set X of cardinality k to make it disconnected. If V = A ∪ X ∪ B is a disjoint union such that X separates A from B, then X is a minimal AB-separator and a maximal AB-connector consists of |X| paths. The proof is done by induction on the number of edges in G. Menger proved this theorem in 1927 [511]. Menger did not use the language of graphs. Instead, he proved it for curves, which are compact connected topological spaces for which the boundaries of arbitrarily small neighborhoods are totally disconnected. He considered them as one-dimensional continua. Menger's research was part of a program about dimension which works for general topological spaces, independent of a metric. The graph theoretical version is a special case, as the geometric realization of the one-dimensional complex V ∪ E of a graph G = (V, E) defines a curve in Menger's sense.
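Both sides of Menger's theorem can be computed on small graphs. This sketch computes the size of a maximal AB-connector as a maximal flow. Every vertex is split into an "in" and an "out" node joined by an arc of capacity 1, so that the flow paths are vertex-disjoint. The size of a minimal AB-separator is found by brute force over vertex subsets. The helper names, the Edmonds-Karp choice and the two test graphs are illustrative. The "bowtie" graph realizes the cut-vertex case discussed above.

```python
from collections import deque
from itertools import combinations


def max_disjoint_paths(vertices, edges, A, B):
    """Size of a maximal AB-connector via max-flow with unit vertex capacities."""
    cap = {"s": {}, "t": {}}

    def add_arc(u, v):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + 1
        cap[v].setdefault(u, 0)  # residual arc

    for v in vertices:
        add_arc((v, "in"), (v, "out"))  # each vertex is used at most once
    for u, v in edges:
        add_arc((u, "out"), (v, "in"))
        add_arc((v, "out"), (u, "in"))
    for a in A:
        add_arc("s", (a, "in"))
    for b in B:
        add_arc((b, "out"), "t")

    flow = 0
    while True:  # Edmonds-Karp: augment along shortest residual paths
        parent, queue = {"s": None}, deque(["s"])
        while queue and "t" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "t" not in parent:
            return flow
        v = "t"
        while parent[v] is not None:  # unit capacities: push one unit
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1


def min_separator_size(vertices, edges, A, B):
    """Smallest |X| such that every A-B path meets X (brute force)."""
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    def separates(X):
        seen = {a for a in A if a not in X}
        stack = list(seen)
        while stack:
            u = stack.pop()
            if u in B:
                return False
            for w in neighbours[u] - X:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return True

    for k in range(len(vertices) + 1):
        for X in combinations(vertices, k):
            if separates(set(X)):
                return k


# "bowtie": two triangles sharing the cut vertex 2
bowtie = ([0, 1, 2, 3, 4], [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)])
# 3-cube: vertices 0..7, edges between numbers differing in one bit
cube = (list(range(8)), [(u, u ^ b) for u in range(8) for b in (1, 2, 4) if u < u ^ b])

for (V, E), A, B in [(bowtie, {0, 1}, {3, 4}), (cube, {1, 2, 4}, {3, 5, 6})]:
    print(min_separator_size(V, E, A, B), max_disjoint_paths(V, E, A, B))
# prints "1 1" and "3 3"
```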

245. Apéry's theorem

The Apéry constant ζ(3) = Σ_{n=1}^{∞} 1/n³ is a special value of the Riemann zeta function. R. Apéry proved in 1979 that
Theorem: The Apéry constant is irrational.

While one knows that all ζ(2n) are irrational for n ≥ 1, starting with ζ(2) = π²/6 and ζ(4) = π⁴/90, the irrationality of the odd values ζ(2n + 1) is not yet known for 2n + 1 > 3. One does not know, for example, whether ζ(5) is irrational. The problem of whether the Apéry constant is irrational is in [538] the "most mysterious unsolved math problem". One only knows that infinitely many of the numbers
ζ(2n + 1) are irrational. To the history: Euler, who gained fame with the computation of ζ(2) = π²/6, already computed ζ(3) to several digits. The entire book [538] is dedicated to Zeta-3.
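Apéry's proof uses the rapidly converging series ζ(3) = (5/2) Σ_{n≥1} (−1)^{n+1}/(n³ binom(2n, n)). This sketch compares it with the defining series. The tail of Σ 1/n³ after N terms is about 1/(2N²). The terms of Apéry's series decay roughly like 4^{−n}, so 30 terms already reach double precision. The reference value is ζ(3) rounded to a double.

```python
import math

ZETA3 = 1.2020569031595942  # zeta(3) rounded to double precision


def zeta3_direct(terms):
    """Partial sum of sum_{n>=1} 1/n^3."""
    return sum(1.0 / n ** 3 for n in range(1, terms + 1))


def zeta3_apery(terms):
    """Apery's series zeta(3) = 5/2 sum_{n>=1} (-1)^(n+1) / (n^3 binom(2n, n))."""
    return 2.5 * sum((-1) ** (n + 1) / (n ** 3 * math.comb(2 * n, n))
                     for n in range(1, terms + 1))


print(abs(zeta3_direct(1000) - ZETA3))  # roughly 5e-7 after 1000 terms
print(abs(zeta3_apery(30) - ZETA3))     # at the level of rounding errors
```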

246. Vietoris theorem

A topological space (X, O) is compact if every open cover (a subset F of O whose union is
X ) has a nite sub-cover (a nite subset of F whose union is X ). If A ∈ O then X \ A is called
closed. A topological space is normal if any two closed sets A, B in X have disjoint open
neighborhoods U, V . Non-normal topological spaces are relevant in mathematics: the Zariski
topology on the spectrum of a ring for example is non-normal. The Vietoris theorem is

Theorem: A compact Hausdorff topological space is normal.

Leopold Vietoris proved this in 1921 and considered it his most important result, even though he lived to the age of 110, wrote his last paper, on trigonometric sums, at the age of 103, and wrote more than half of his papers after his sixtieth birthday [598]. Normality is also called Tietze's normality condition. In modern topology books the normality condition is called Axiom T4. Sometimes, normality also assumes the Hausdorff property (any two points in X can be separated by open neighborhoods). The normality axiom T4 does not imply the Hausdorff axiom T2: an example is the topological space X = ({a, b}, O = {∅, X}), which has only ∅ and X as closed sets, and both sets are both open and closed. They also have disjoint open neighborhoods, as they themselves are open neighborhoods (they are both clopen sets). Indeed, any indiscrete topological space is normal. But if X contains at least two points, it is not Hausdorff, as there are then two points a, b that can not be separated by open neighborhoods. Any indiscrete topological space with at least two points is also an example of a compact non-Hausdorff space. The theorem of Vietoris assures that compact Hausdorff spaces satisfy the seemingly stronger separation condition of normality. Compactness alone is not enough: the finite, hence compact, space X = {a, b, c} with open sets ∅, {a}, {a, b}, {a, c}, X is not normal, as the disjoint closed sets {b} and {c} have no disjoint open neighborhoods. Vietoris is the father of modern convergence concepts like filter bases or nets and of modern notions of compactness. Normality is important because of Tietze's extension theorem, stating that a continuous real-valued function on a closed subset of a normal topological space can be extended to the entire space. The Tietze theorem was proven by Brouwer and Lebesgue for Euclidean spaces, extended by Tietze to metric spaces and by Urysohn to normal spaces.
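For finite spaces, normality can be checked by brute force. The sketch below (the helper name `is_normal` is ours, and the input is assumed to already be a topology) confirms that the indiscrete two-point space is normal. Since every finite space is compact, the three-point example also shows that compactness alone does not imply normality, so Vietoris' theorem needs the Hausdorff property.

```python
from itertools import combinations


def is_normal(points, open_sets):
    """Brute-force check of the T4 (normality) condition on a finite space.

    `open_sets` is assumed to already be a topology on `points`; this is
    not verified here.
    """
    space = frozenset(points)
    opens = [frozenset(u) for u in open_sets]
    closed = [space - u for u in opens]
    for a, b in combinations(closed, 2):
        if a & b:
            continue  # only disjoint closed sets have to be separated
        separated = any(a <= u and b <= v and not (u & v)
                        for u in opens for v in opens)
        if not separated:
            return False
    return True


# Indiscrete two-point space: normal but not Hausdorff.
print(is_normal("ab", [set(), {"a", "b"}]))  # True
# Finite (hence compact) space in which {b} and {c} cannot be separated.
print(is_normal("abc", [set(), {"a"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c"}]))  # False
```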

247. Whitney extension theorem

Given a $C^m$ function $f$ on $\mathbb{R}^n$, Taylor's theorem assures that $f(x) = \sum_{|k| \le m} f^{(k)}(y)(x-y)^k/k! + \sum_{|k|=m} R_k(x,y)(x-y)^k/k!$ with $R_k(x,y) \to 0$ uniformly as $x, y \to a$. This gives relations $f^{(r)}(x) = \sum_{|k| \le m-|r|} f^{(k+r)}(y)(x-y)^k/k! + R_r(x,y)$, where $R_r(x,y)/|x-y|^{m-|r|} \to 0$ uniformly as $x, y \to a$. A set $F = \{f_k\}$ of functions on a closed set $A \subset \mathbb{R}^n$ with multi-index $|k| \le m$ satisfying these Taylor compatibility conditions (with $f^{(k)}$ replaced by $f_k$ and $x, y \in A$) is called a Taylor compatible set. A closed subset $A$ of $\mathbb{R}^n$ has the Whitney extension property if there is a function $f \in C^m$ such that $f^{(k)}(x) = f_k(x)$ for $x \in A$ and such that $f$ is real analytic at every point of $\mathbb{R}^n \setminus A$.

Theorem: If F is Taylor compatible on the closed set A, then A has the Whitney extension property.

Hassler Whitney proved this result at Harvard in 1934 [744], just two years after getting his PhD there (remarkably in the completely different field of graph theory). He cites [61], who proved a special case. Chapter 12 in [401] gives an exposition of Whitney's work on the theorem. To
cite from this book: "Hass found it a real challenge to go beyond the first dimension. He drew picture after picture, but the problem seemed stubbornly intent on putting up a succession of frustrating barriers. His eventual success in 1933 was a real tour de force for the 26-year-old."
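The Taylor compatibility conditions can be probed numerically in the simplest case n = m = 1. The sketch below only looks at the condition for r = 0: the remainder R_0(x, y) = f_0(x) − f_0(y) − f_1(y)(x − y) should be small compared to |x − y| for nearby points of A. The helper `whitney_ratio` and the sample data are ours: on A = [0, 1], the data (f_0, f_1) = (sin, cos) passes, while (f_0, f_1) = (0, 1) fails because the prescribed derivative contradicts the prescribed values.

```python
import math


def whitney_ratio(f0, f1, points, delta):
    """Largest |R_0(x, y)| / |x - y| over sample pairs with 0 < |x - y| <= delta,
    where R_0(x, y) = f0(x) - f0(y) - f1(y) (x - y) (case n = m = 1)."""
    worst = 0.0
    for x in points:
        for y in points:
            d = x - y
            if 0 < abs(d) <= delta:
                r0 = f0(x) - f0(y) - f1(y) * d
                worst = max(worst, abs(r0) / abs(d))
    return worst


A = [i / 400 for i in range(401)]                            # sample points of A = [0, 1]
print(whitney_ratio(math.sin, math.cos, A, 0.01))            # small: compatible data
print(whitney_ratio(lambda x: 0.0, lambda x: 1.0, A, 0.01))  # 1.0: not compatible
```

Shrinking `delta` drives the first ratio to zero, which is what the o(|x − y|) condition asks for; the second ratio stays at 1 for every `delta`.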

248. Markov's inequality

Let $X : \Omega \to \mathbb{R}$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$ and let $a > 0$ be a real constant. In all that follows, using $E[f(X)]$ for some function $f$ assumes that this expectation is finite, meaning that $f(X) \in L^1(\Omega, P)$. The Markov inequality is $P[|X| \ge a] \le E[|X|]/a$. More generally, if $f : [0, \infty) \to [0, \infty)$ is a strictly monotonically increasing function with $f(0) = 0$ and $a > 0$, then

Theorem: P[|X| ≥ a] ≤ E[f (|X|)]/f (a).

The proof is done by defining for all $a > 0$ a new random variable $A(x) = a$ if $|X(x)| \ge a$ and $A(x) = 0$ else. Then $0 \le f(A(x)) \le f(|X(x)|)$ and $E[f(|X|)] = \int_\Omega f(|X(x)|)\, dP(x) \ge \int_\Omega f(A(x))\, dP(x) = f(a) P[|X| \ge a]$. This gives $P[|X| \ge a] \le E[f(|X|)]/f(a)$. For example, if $f(x) = x^2$, applying the inequality to $X = Y - E[Y]$, for which $E[f(|X|)] = \mathrm{Var}[Y]$, one obtains Chebyshev's inequality $P[|Y - E[Y]| \ge a] \le \mathrm{Var}[Y]/a^2$. Applying Markov's inequality to the non-negative random variable $e^X$ gives the Chernoff inequality $P[X \ge a] = P[e^X \ge e^a] \le E[e^X]/e^a$. Since this also works with $e^{tX}$ for every $t \ge 0$, one has the Chernoff bound $P[X \ge a] \le \inf_{t \ge 0} E[e^{tX}]/e^{ta}$, which is of interest as $E[e^{tX}]$ is the moment generating function of the random variable $X$.
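A small exact computation makes the chain of bounds concrete. The sketch assumes, purely for illustration, that X is Binomial(20, 1/2), and compares the exact tail P[X ≥ 15] with the Markov, Chebyshev and Chernoff bounds; the infimum over t ≥ 0 in the Chernoff bound is approximated on a grid.

```python
from fractions import Fraction
from math import comb, exp

# Illustrative choice: X ~ Binomial(20, 1/2), handled with exact fractions.
n, p = 20, Fraction(1, 2)
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}


def expectation(g):
    """E[g(X)] for the distribution `pmf`."""
    return sum(prob * g(k) for k, prob in pmf.items())


a = 15
tail = sum(prob for k, prob in pmf.items() if k >= a)   # exact P[X >= a]
mean = expectation(lambda k: k)                          # E[X]
var = expectation(lambda k: (k - mean) ** 2)             # Var[X]

markov = mean / a                    # P[X >= a] <= E[X]/a  (X >= 0 here)
chebyshev = var / (a - mean) ** 2    # bounds P[|X - E[X]| >= a - E[X]] >= P[X >= a]
# Chernoff: any t >= 0 gives a bound; approximate the infimum on a grid of t values.
chernoff = min(
    sum(float(prob) * exp(t * k) for k, prob in pmf.items()) * exp(-t * a)
    for t in (i / 1000 for i in range(5001))
)
print(float(tail), chernoff, float(chebyshev), float(markov))
```

The bounds get sharper the more information about X they use: the mean, then the variance, then the whole moment generating function.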

249. Magnus Freiheitssatz

Let $G = \langle X \mid R \rangle$ be a finitely presented group with generators $X = \{x_1, \dots, x_q\}$ and relations $R = \{r_1, \dots, r_p\}$. The group $G$ is called a 1-relator group or Magnus group if $p = 1$ and if the relation $r = r_1$ is cyclically reduced and all generators appear in $r$. [A word is called cyclically reduced if every cyclic permutation of the word is reduced. A word is called reduced if it does not contain subwords of the type $xx^{-1}$ or $x^{-1}x$. For example, the word $aba^{-1}$ is not cyclically reduced because a cyclic permutation of this word can be reduced to $b$.]

Theorem: Every Y ⊂ X with Y ≠ X generates a free group in a Magnus group G.

This is a result of Wilhelm Magnus of 1930. The theorem means that given, say, the generators $x_1, \dots, x_{q-1}$, the only relations involving them are the trivial ones. It is also called the Independence Theorem. In [234] the Freiheitssatz is considered a non-commutative analog of a similar result in a commutative algebraic structure. If $V$ is an $n$-dimensional linear space over a field and $W \subset V$ is a linear subspace given by a single equation $\sum_i a_i x_i = 0$, then $W$ has dimension $n - 1$, meaning that $W$ is a free Abelian subgroup of $V$. Another analogy in that overview paper is to compare it with a situation in algebraic geometry: an irreducible algebraic equation in $n$ complex variables in which all $n$ variables appear can not be used to derive any irreducible algebraic equation in which not all of these variables appear. The theory of finitely presented groups was initiated by Max Dehn in 1912. Wilhelm Magnus wrote his thesis in 1931 under the guidance of Max Dehn. Dehn had raised the word problem, the conjugacy problem and the isomorphism problem. Dehn also proposed that the Freiheitssatz could hold true. As Magnus pointed out, the Freiheitssatz assures that 1-relator groups have a positive
word problem solution. In 1954, Pyotr Novikov came up with the first finitely presented group with an insoluble word problem. See [234, 48, 47].
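Free and cyclic reduction are easy to implement with a stack. In the sketch below, a word is a string in which a lower-case letter stands for a generator and the corresponding upper-case letter for its inverse; this encoding and the helper names are our own conventions, so that $aba^{-1}$ is written "abA".

```python
def inverse(letter: str) -> str:
    """Lower case = generator, upper case = its inverse."""
    return letter.swapcase()


def free_reduce(word: str) -> str:
    """Cancel adjacent pairs x x^-1 and x^-1 x using a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()
        else:
            stack.append(letter)
    return "".join(stack)


def cyclic_reduce(word: str) -> str:
    """Freely reduce, then strip first/last letters that cancel cyclically."""
    w = free_reduce(word)
    while len(w) >= 2 and w[0] == inverse(w[-1]):
        w = w[1:-1]
    return w


def is_cyclically_reduced(word: str) -> bool:
    return cyclic_reduce(word) == word


def is_magnus_relator(word: str, generators: str) -> bool:
    """Cyclically reduced and every generator occurs (as itself or its inverse)."""
    return is_cyclically_reduced(word) and set(word.lower()) == set(generators)


print(cyclic_reduce("abA"))              # b
print(is_magnus_relator("abAB", "ab"))   # True: the commutator a b a^-1 b^-1
```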

250. Martin's axiom

Let k be a cardinal number (for example $\aleph_0 = |\mathbb{N}|$, the cardinality of the natural numbers, which is the smallest infinite cardinal, or $\mathfrak{c} = 2^{\aleph_0} = |\mathbb{R}|$, the cardinality of the continuum). The Martin condition M(k) is the statement that if P is a partial order satisfying the countable chain condition and D is a family of dense sets in P with cardinality less than or equal to k, then there is a filter F on P such that F intersects every element in D. Martin's axiom states that M(k) holds for every cardinality k smaller than $2^{\aleph_0}$. One has

Theorem: In ZFC, $M(\aleph_0)$ holds but $M(2^{\aleph_0})$ fails.

The first statement is the Rasiowa-Sikorski lemma. The second statement is proven by an example: the set [0, 1] with the usual topology is separable and so satisfies the countable chain condition. An individual point is nowhere dense, but the union of all points has $2^{\aleph_0}$ points. Martin's axiom was introduced in 1970 by Tony Martin and Robert Solovay [492]. The continuum hypothesis CH implies Martin's axiom, but Martin's axiom is also consistent with ZFC together with the negation of CH.

251. Von Staudt theorem

Given a finite simple graph K, a spanning tree G in K is also called a maze. If K is a planar graph, a graph which can be embedded in the plane, then an embedding defines a dual graph K′ in which the faces (connected regions of the complement) are the vertices and two faces are connected if they share a boundary edge of K. A maze G has a dual maze G′, which is a spanning tree of K′ using only the dual edges of edges not in G. It follows from the Jordan curve theorem that G′ can not have closed loops: such a loop would separate vertices of the connected tree G. It therefore is a tree. The von Staudt theorem is

Theorem: The number of mazes in K = the number of mazes in K ′ .

If E(G) is the number of edges in G and E is the number of edges in K, then E(G) + E(G′) = E. Since G and G′ are both spanning trees, we have V = V(G) = E(G) + 1 and F = V(G′) = E(G′) + 1, where V is the number of vertices and F the number of faces of K. Adding these gives V + F = E + 2, which is Euler's formula V − E + F = 2. This is the proof of Karl Christian von Staudt of the Euler polyhedron formula. Von Staudt published this proof in 1847 [671] (of course using different terminology). His key insight was to pair up spanning trees in K with spanning trees in K′. By the matrix tree theorem, the number of mazes (spanning trees) in K is the pseudo determinant of the Kirchhoff Laplacian L of K divided by V, the number of vertices. (The pseudo determinant of a matrix is the product of the non-zero eigenvalues of the matrix.)
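The matrix tree theorem quoted above can be used to test the von Staudt theorem numerically. The sketch below (NumPy; the helper name `spanning_tree_count` is ours) counts the spanning trees of the cube graph and of its planar dual, the octahedron, via the pseudo determinant of the Kirchhoff Laplacian divided by the number of vertices.

```python
import numpy as np


def spanning_tree_count(num_vertices, edges):
    """Matrix tree theorem: pseudo determinant of the Kirchhoff Laplacian / |V|."""
    L = np.zeros((num_vertices, num_vertices))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    eigenvalues = np.linalg.eigvalsh(L)
    pdet = np.prod([lam for lam in eigenvalues if abs(lam) > 1e-9])
    return int(round(pdet / num_vertices))


# Cube graph: vertices 0..7, edges between labels differing in exactly one bit.
cube = [(u, u ^ (1 << b)) for u in range(8) for b in range(3) if u < u ^ (1 << b)]
# Octahedron, the planar dual of the cube: all pairs except opposite vertices (i, i + 3).
octahedron = [(u, v) for u in range(6) for v in range(u + 1, 6) if v != u + 3]
print(spanning_tree_count(8, cube), spanning_tree_count(6, octahedron))  # 384 384
```

Floating point eigenvalues are fine for small graphs like these; for large graphs an exact integer determinant of a reduced Laplacian would be the safer choice.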

252. Banyaga's theorem

Assume that (M, ω) is a closed (compact, without boundary), connected symplectic manifold. Denote by H(M, ω) the group of Hamiltonian diffeomorphisms of M. These are the time-one maps of flows generated by (possibly time-dependent) Hamiltonian functions on M; they preserve the symplectic structure ω (a closed, non-degenerate 2-form on M). A group G is called simple if every normal subgroup is either the trivial group {1} or G. Banyaga's theorem is:

Theorem: The group H(M, ω) is simple.

Augustin Banyaga, a student of André Haefliger, published this theorem in 1978 [44]. The group H(M, ω) was already studied by him while writing his thesis. After a visit at the Institute for Advanced Study in Princeton, he was a Benjamin Peirce Assistant Professor at Harvard from 1978 to 1982. For more on the theorem, see also [45, 583]. In the introduction of [44] it is stated that if M has dimension n and k ≠ n + 1, then the identity component of the larger group $\mathrm{Diff}^k(M)$ is a simple group by work of Epstein, Herman, Mather and Thurston.

253. Tate's theorem

The field $\mathbb{Q}_p$ of p-adic numbers is the completion of the rational numbers with respect to the p-adic absolute value $|\cdot|_p$. It is a field with addition and multiplication extended from the rational numbers $\mathbb{Q}$. With respect to the additive group structure, $G = (\mathbb{Q}_p, |\cdot|_p, +)$ is a locally compact topological group. A continuous group homomorphism $G \to \mathbb{T}$ is called a character. The character group $\hat{G}$ is a topological group again and called the Pontryagin dual of $G$. We have for example $\hat{\mathbb{R}} = \mathbb{R}$ and $\hat{\mathbb{T}} = \mathbb{Z}$. The dual group of the group of p-adic integers $\mathbb{Z}_p$ (the closed unit disk in $\mathbb{Q}_p$) is the Prüfer p-group $\mathbb{Z}(p^\infty) = \mathbb{Q}_p/\mathbb{Z}_p$. Here is the self-duality theorem of Tate:

Theorem: The Pontryagin dual of Qp is Qp .

The result is also proven in [730], which refers to [273] Theorem 7-1-10, where it is stated for any local field, like $\mathbb{R}$, $\mathbb{C}$, $\mathbb{Q}_p$ or the formal Laurent series over a finite field. Indeed, the additive group of a general local field is always self-dual. [574] states that the self-duality was noticed for the first time in John Tate's thesis from 1950 [706]. In [146], the dual $\hat{\mathbb{Q}} = \mathbb{A}/\mathbb{Q}$ of the rational numbers $\mathbb{Q}$ is discussed, where $\mathbb{A}$ is the ring of adeles, the elements $(a_\infty, a_2, a_3, \dots)$ in the product set $\mathbb{R} \times \prod_p \mathbb{Q}_p$ such that $a_p$ is in $\mathbb{Z}_p$ for all but finitely many primes $p$.
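To make the self-duality more tangible, one can look at the standard character ψ(x) = exp(2πi{x}_p), where {x}_p is the p-adic fractional part; every character of Q_p is then of the form x ↦ ψ(ax) for a unique a ∈ Q_p. The sketch below evaluates ψ only on rational numbers (which are dense in Q_p) and checks that ψ is additive and trivial on p-adic integers. The sign convention in the exponent varies between texts and is an assumption here, as are the helper names.

```python
import cmath
from fractions import Fraction


def p_adic_fractional_part(x: Fraction, p: int) -> Fraction:
    """{x}_p: the unique c / p^k with 0 <= c < p^k such that x - c / p^k
    has no factor p in its denominator (i.e. lies in Z_p)."""
    num, den = x.numerator, x.denominator
    k = 0
    while den % p == 0:
        den //= p
        k += 1
    if k == 0:
        return Fraction(0)                 # x is already a p-adic integer
    modulus = p ** k
    c = (num * pow(den, -1, modulus)) % modulus
    return Fraction(c, modulus)


def psi(x: Fraction, p: int) -> complex:
    """Standard additive character exp(2 pi i {x}_p), evaluated on rationals."""
    return cmath.exp(2j * cmath.pi * float(p_adic_fractional_part(x, p)))


x, y = Fraction(7, 12), Fraction(5, 18)
print(p_adic_fractional_part(x, 2), p_adic_fractional_part(x, 3))
print(abs(psi(x + y, 3) - psi(x, 3) * psi(y, 3)))   # ~0: psi is additive
```

Additivity works because {x + y}_p − {x}_p − {y}_p has only powers of p in its denominator and is at the same time a p-adic integer, hence an ordinary integer.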

254. Napoleon's theorem

Given a triangle ABC in the plane, construct equilateral triangles over each of the three sides, all of them pointing outwards. The center points (centroids) of these three triangles are called Napoleon points and the triangle defined by these points is the Napoleon triangle.
Theorem: The Napoleon points form an equilateral triangle.

The theorem can quickly be checked using vector geometry. Place coordinates $A = (a_1, a_2)$, $B = (b_1, b_2)$, $C = (c_1, c_2)$ and write $T(x, y) = (-y, x)$ for the rotation by 90 degrees. The centroid of an equilateral triangle over a side of length s lies at distance $s/(2\sqrt{3})$ from the midpoint of that side, so (for a clockwise oriented triangle ABC, where T points outwards) the Napoleon points are $X = (A+B)/2 + T(B-A)/(2\sqrt{3})$, $Y = (B+C)/2 + T(C-B)/(2\sqrt{3})$ and $Z = (C+A)/2 + T(A-C)/(2\sqrt{3})$. Just simplify $(X-Y) \cdot (X-Y)$ and see that it is the same as $(Y-Z) \cdot (Y-Z)$ and $(X-Z) \cdot (X-Z)$. The theorem appears first in the literature in 1825 in "The Ladies' Diary" [296] and is "one of the most often rediscovered results in mathematics" [741]. There is no known evidence that the theorem was found by Napoleon.
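The vector computation is easy to run numerically. The following sketch (helper names are ours) computes each Napoleon point as the midpoint of a side plus T(side)/(2√3), which is where the centroid of an equilateral triangle over that side sits, and compares the three side lengths of the resulting triangle for a random triangle ABC.

```python
import math
import random


def napoleon_points(A, B, C):
    """Centroids of the equilateral triangles erected on the sides of ABC.

    The centroid over a side PQ is midpoint + T(Q - P) / (2 sqrt(3)) with
    T(x, y) = (-y, x); this points outwards for clockwise ABC and inwards for
    counterclockwise ABC, and both choices give an equilateral triangle."""
    s = 1 / (2 * math.sqrt(3))

    def centroid(P, Q):
        mx, my = (P[0] + Q[0]) / 2, (P[1] + Q[1]) / 2
        dx, dy = Q[0] - P[0], Q[1] - P[1]
        return (mx - s * dy, my + s * dx)

    return centroid(A, B), centroid(B, C), centroid(C, A)


random.seed(1)
A, B, C = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(3)]
X, Y, Z = napoleon_points(A, B, C)
print(math.dist(X, Y), math.dist(Y, Z), math.dist(Z, X))   # three equal lengths
```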

255. Zero-One law

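Let $(\Omega, \mathcal{A}, P)$ be a probability space defined by a σ-algebra $\mathcal{A}$ on $\Omega$ and a probability measure $P$. Let $\{\mathcal{A}_i\}_{i \in I}$ be a collection of σ-subalgebras. For any nonempty set $J \subset I$, let $\mathcal{A}_J := \bigvee_{j \in J} \mathcal{A}_j$ be the σ-algebra generated by $\bigcup_{j \in J} \mathcal{A}_j$. Define $\mathcal{A}_\emptyset = \{\emptyset, \Omega\}$. The tail σ-algebra $\mathcal{T}$ of $\{\mathcal{A}_i\}_{i \in I}$ is defined as $\mathcal{T} = \bigcap_{J \subset I,\, J \text{ finite}} \mathcal{A}_{J^c}$, where $J^c = I \setminus J$. We say a σ-subalgebra $\mathcal{B}$ is P-trivial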
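if $P[A] = 0$ or $P[A] = 1$ for all $A \in \mathcal{B}$. Two events $A, B \in \mathcal{A}$ are called independent if $P[A \cap B] = P[A] \cdot P[B]$. Two σ-algebras $\mathcal{A}, \mathcal{B}$ are independent if all pairs $A \in \mathcal{A}$, $B \in \mathcal{B}$ are independent. A collection $\{\mathcal{A}_i\}_{i \in I}$ of σ-algebras is called independent if for every finite subset $J$ of $I$ and any choice $A_j \in \mathcal{A}_j$ one has $P[\bigcap_{j \in J} A_j] = \prod_{j \in J} P[A_j]$. The following theorem is called Kolmogorov's zero-one law: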
Kolmogorov's zero-one law:

Theorem: If $\{\mathcal{A}_i\}_{i \in I}$ are independent, then $\mathcal{T}$ is P-trivial.

The theorem appears in the appendix of [438] (of course not using modern language but in terms of random variables). The theorem can be used as follows: if $\{A_n\}_{n \in \mathbb{N}}$ is a sequence in $\mathcal{A}$, define $A_\infty := \limsup_{n \to \infty} A_n = \bigcap_{m=1}^\infty \bigcup_{n \ge m} A_n$. It is the set of all $\omega \in \Omega$ such that $\omega \in A_n$ for infinitely many $n \in \mathbb{N}$. The set $A_\infty$ belongs to the tail σ-algebra of the sequence of σ-algebras $\mathcal{A}_n = \{\emptyset, A_n, A_n^c, \Omega\}$. It follows from Kolmogorov's 0-1 law that $P[A_\infty] \in \{0, 1\}$ if the $A_n$ are P-independent. The following two statements are called the first and second Borel-Cantelli lemma. The first one holds for all $A_n$: $\sum_{n \in \mathbb{N}} P[A_n] < \infty \Rightarrow P[A_\infty] = 0$. The second one assumes that the $A_n$ are independent events. Then $\sum_{n \in \mathbb{N}} P[A_n] = \infty \Rightarrow P[A_\infty] = 1$. In the case when the $A_n$ are independent, both statements hold and give a necessary and sufficient condition for the set $A_\infty$ to have measure 1 or 0.

256. Polygon dissection

A polygon in the plane is a region in the plane bounded by a piecewise linear simple closed path with finitely many vertices. The region does not need to be convex, but both the simplicity and the finite number of corners are crucial. Two polygons G, H in the plane are called equidecomposable, G ∼ H, if they are scissors-congruent, meaning that one can split both into finitely many triangles $G_1, \dots, G_n$ and $H_1, \dots, H_n$ such that $G_k, H_k$ are isometric: there is a rotation-translation $T(x) = Ax + b$ such that $T(G_k) = H_k$. The notion of equidecomposability ∼ produces an equivalence relation on polygons. Another equivalence relation is given by having the same area |G| = |H|. The Wallace-Bolyai-Gerwien theorem tells that these two equivalence relations are the same:

Theorem: G ∼ H if and only if |G| = |H|.

The theorem also holds for finite simple polygons in spherical or hyperbolic geometry. The theorem is named after William Wallace, Farkas Wolfgang Bolyai (the father of János Bolyai, known for non-Euclidean geometry) and Paul Gerwien. Wallace raised the question in 1807 [247] and proved the theorem in 1807. Unaware of this, Farkas Bolyai posed the question again in 1833 and also proved it in 1833, as did Paul Gerwien independently in 1835. It was pointed out by Ian Stewart in [678] that Wallace already had a proof in 1807.
The proof is simple and constructive: first cut each of the two polygons G, H into triangles. Then show that a triangle is scissors equivalent to a rectangle. Then show that two rectangles of the same area are equivalent; in particular, every rectangle is equivalent to a rectangle of width 1. Stacking these unit-width rectangles shows that G is equivalent to a single rectangle of width 1 and height |G|, and similarly H is equivalent to a rectangle of width 1 and height |H|. If the areas are the same, these rectangles coincide and G and H are equivalent.
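Since the theorem reduces scissors-congruence of polygons to a comparison of areas, deciding G ∼ H only needs an exact area computation. A minimal sketch, assuming polygons are given by their vertex lists in order, uses the shoelace formula with exact rational arithmetic; it does not construct the dissection itself, and the helper names are ours.

```python
from fractions import Fraction


def polygon_area(vertices):
    """Shoelace formula for a simple polygon with vertices listed in order."""
    twice_area = Fraction(0)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += Fraction(x0) * y1 - Fraction(x1) * y0
    return abs(twice_area) / 2


def scissors_congruent(G, H):
    """By the Wallace-Bolyai-Gerwien theorem: equidecomposable iff same area."""
    return polygon_area(G) == polygon_area(H)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (2, 0), (0, 1)]
L_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
rectangle = [(0, 0), (3, 0), (3, 1), (0, 1)]
print(scissors_congruent(square, triangle), scissors_congruent(L_shape, rectangle))  # True True
```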
The corresponding problem in three dimensions is known as Hilbert's third problem. It was solved by Max Dehn in 1900, who introduced an invariant, the Dehn invariant D(G), which is the same for equivalent bodies: D(G) = D(H) if G ∼ H. There are however polyhedra G, H of the same volume |G| = |H| such that D(G) ≠ D(H), implying that G, H can not be equivalent.

In spherical or hyperbolic three-dimensional geometry, the polyhedron dissection problem is open: no Wallace-Bolyai-Gerwien type theorem characterizing equidecomposable polyhedra has been found there. We used here also information given by [74] or the book [247]. The latter gives a short biography of William Wallace (1768-1843) on page 223. The book [678] states without reference "However, William Wallace got there earlier: he gave a proof in 1807". This version appears also in the Wikipedia article about Wallace as well as in [74].

257. Popoviciu's Theorem

A function $f : \mathbb{R} \to \mathbb{R}$ is called convex if $f(z) \le f(x) + (f(y) - f(x))(z - x)/(y - x)$ for all points $x < z < y$. One can also state it as $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$ for all $t \in [0, 1]$, because convexity means that the graph of $f$ on any interval $(x, y)$ is below or on the line through $(x, f(x))$ and $(y, f(y))$. No continuity, nor differentiability is assumed for the function $f$. [Convex also does not mean strictly convex, which would be the stronger assumption that this inequality is strict for all $x < z < y$, meaning that the graph of $f$ is below the line on each open interval $(x, y)$.] The theorem is that for all convex functions $f$ and all points $x, y, z$ the following inequality holds:

Theorem: $f(x) + f(y) + f(z) + 3f\left(\frac{x+y+z}{3}\right) \ge 2\left[f\left(\frac{x+y}{2}\right) + f\left(\frac{y+z}{2}\right) + f\left(\frac{z+x}{2}\right)\right]$

A short proof from 1991 of this inequality of the Romanian mathematician Tiberiu Popoviciu is given in section 6 of [627]. By symmetry, one can assume $x \le y \le z$ and $y \le (x+y+z)/3$ (otherwise replace $f(t)$ by $f(-t)$ and $x, y, z$ by $-z, -y, -x$). Then, independent of $f$, one has $(x+y+z)/3 \le (x+z)/2 \le z$ and $(x+y+z)/3 \le (y+z)/2 \le z$. There are therefore $s, t \in [0, 1]$ such that $(x+z)/2 = s(x+y+z)/3 + (1-s)z$ and $(y+z)/2 = t(x+y+z)/3 + (1-t)z$, leading to $f((x+z)/2) \le s f((x+y+z)/3) + (1-s) f(z)$ and similarly $f((y+z)/2) \le t f((x+y+z)/3) + (1-t) f(z)$. By definition of convexity one also has $f((x+y)/2) \le (f(x) + f(y))/2$. Adding the two representations of the midpoints gives $(x+y+z)/2 + z/2 = (s+t)(x+y+z)/3 + (2-s-t)z$, so that $s + t = 3/2$ (if $x = y = z$ there is nothing to prove). Adding up the three inequalities now gives $f((x+y)/2) + f((y+z)/2) + f((z+x)/2) \le \frac{3}{2} f((x+y+z)/3) + \frac{1}{2} f(z) + (f(x) + f(y))/2$, which is the claim after multiplying by 2.
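A quick numerical sanity check of the inequality: the sketch evaluates the difference of both sides for a few convex functions at random points (the sample functions and the range are arbitrary choices of ours), and for the concave function f(t) = −t², where the inequality fails.

```python
import math
import random


def popoviciu_gap(f, x, y, z):
    """Left-hand side minus right-hand side of Popoviciu's inequality."""
    lhs = f(x) + f(y) + f(z) + 3 * f((x + y + z) / 3)
    rhs = 2 * (f((x + y) / 2) + f((y + z) / 2) + f((z + x) / 2))
    return lhs - rhs


convex_examples = [math.exp, abs, lambda t: t * t, lambda t: max(0.0, t - 1.0)]
random.seed(0)
worst = min(popoviciu_gap(f, *(random.uniform(-3, 3) for _ in range(3)))
            for f in convex_examples for _ in range(10_000))
print(worst)                                        # >= 0 up to rounding
print(popoviciu_gap(lambda t: -t * t, 0, 1, 2))     # -1.0: fails for concave f
```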

258. The two chords theorem

Take two chords AC, BD in a planar circle intersecting in the point M . The intersecting
chords theorem of Euclid tells that

Theorem: |M A||M C| = |M B||M D|

Related is Ptolemy's theorem |AC||BD| = |AB||CD| + |AD||BC|, which appears as the Second theorem [157]. A special case is when AC passes through the center O of the circle and BD is perpendicular to AC. In this case h = |BM| = |DM|, so that with m = |MA| and n = |MC| one has the famous formula mn = h² for the right angle triangle ABC (right-angled at B by Thales' theorem) of height h, whose altitude divides the hypotenuse into pieces of lengths m, n adding up to the hypotenuse length m + n = c. The theorem follows from the identity $|MA||MC| = 1 - |OM|^2$ for a unit circle centered at O, which can quickly be verified by turning the situation so that AC is vertical, $A = (x, -\sqrt{1-x^2})$ and $C = (x, \sqrt{1-x^2})$: with $M = (x, y)$ one has $|MA||MC| = (\sqrt{1-x^2} + y)(\sqrt{1-x^2} - y) = 1 - x^2 - y^2 = 1 - |OM|^2$, and the same holds for the chord BD. One can also show that the triangles ABM and DCM are similar. The theorem is Proposition 35 in the third book of Euclid's Elements (Volume 2 in [327] page 71). The theorem appears in [267].
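The identity |MA||MC| = 1 − |OM|² (the power of the point M) also follows from Vieta's formulas: points M + tu on the unit circle, with u a unit vector, satisfy t² + 2t(M·u) + |M|² − 1 = 0, and the product of the two roots is |M|² − 1. The sketch below (helper name ours) computes both intersection points for random chords through random points M and checks that the product does not depend on the chord.

```python
import math
import random


def chord_product(mx, my, angle):
    """|MA||MC| for the chord of the unit circle through M = (mx, my) in direction `angle`.

    Points M + t u on the circle satisfy t^2 + 2 t (M.u) + |M|^2 - 1 = 0; the
    two roots t_A < 0 < t_C give the distances |MA| = -t_A and |MC| = t_C."""
    ux, uy = math.cos(angle), math.sin(angle)
    b = mx * ux + my * uy
    c = mx * mx + my * my - 1.0
    root = math.sqrt(b * b - c)
    t_a, t_c = -b - root, -b + root
    return (-t_a) * t_c


print(chord_product(0.3, 0.4, 1.0), chord_product(0.3, 0.4, 2.5), 1 - 0.3**2 - 0.4**2)
```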

259. Hermite Interlacing

If f is a monic polynomial of degree n with real roots $\lambda_1 \le \cdots \le \lambda_n$ and g is a monic polynomial of degree n − 1 with real roots $\mu_1 \le \cdots \le \mu_{n-1}$, then g interlaces f if $\lambda_1 \le \mu_1 \le \lambda_2 \le \cdots \le \mu_{n-1} \le \lambda_n$. It is not enough to ask that the interpolations tf + (1 − t)g have real roots for all t ∈ [0, 1]: for f = x² − 1 and g = x − 5, all polynomials f + sg with s ≥ 0 have real roots, but g does not interlace f. The Hermite interlace theorem, also known as the Hermite-Kakeya theorem for real polynomials, follows from the Hermite-Biehler theorem, which tells that if f + ig has all roots in the upper half plane, then f, g strictly interlace (see [592] chapter 6). The Hermite-Kakeya theorem is:

Theorem: Polynomials f, g interlace ⇔ f + tg has only real roots ∀t ∈ R.

Fisk noticed in [240] that this directly implies the Cauchy interlace theorem, assuring that the characteristic polynomial g of an (n − 1) × (n − 1) principal submatrix B of a symmetric real n × n matrix A interlaces the characteristic polynomial f of A. This result has many applications, for example in spectral graph theory.
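The Cauchy interlace theorem and the real-rootedness in the Hermite-Kakeya theorem can both be observed numerically. This NumPy sketch uses an arbitrary random symmetric 5 × 5 matrix A, compares the eigenvalues of A with those of the principal submatrix B obtained by deleting the first row and column, and checks that f + tg has (numerically) real roots for a range of real t.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                 # arbitrary symmetric 5 x 5 matrix
B = A[1:, 1:]                     # principal 4 x 4 submatrix

lam = np.linalg.eigvalsh(A)       # roots of f = det(x - A), ascending
mu = np.linalg.eigvalsh(B)        # roots of g = det(x - B), ascending
interlace = all(lam[i] - 1e-12 <= mu[i] <= lam[i + 1] + 1e-12 for i in range(4))

f, g = np.poly(A), np.poly(B)     # monic characteristic polynomials
max_imag = max(np.abs(np.roots(np.polyadd(f, t * g)).imag).max()
               for t in np.linspace(-10, 10, 41))
print(interlace, max_imag)
```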

260. Mandelbrot set

The Mandelbrot set M is the parameter set {c ∈ C | the map T_c(z) = z² + c has a bounded orbit starting at z = 0}. The boundary of the Mandelbrot set is commonly referred to as a "fractal", an object of fractal dimension. The set M is connected [194]. Benoit Mandelbrot [239] and John Milnor [519] had both conjectured that the boundary has Hausdorff dimension 2. The boundary is therefore not an object with a "fractal" (non-integer) dimension.

Theorem: The boundary ∂M of M has Hausdorff dimension 2.

Mitsuhiro Shishikura proved this in 1991. It was published in the Annals of Mathematics in 1998 [646]. Furthermore, for a Baire generic c ∈ ∂M the Hausdorff dimension of the Julia set J_c is 2. It is believed that the 2-dimensional Lebesgue measure of ∂M is zero. For more recent results on complex dynamics, see also [520, 510, 117, 191, 487, 631].
The etymology of "fractal" is interesting. The term was popularized by the "father of fractals" Benoit Mandelbrot in his book [486]. Mandelbrot first defined a fractal as an object whose Hausdorff dimension exceeds its topological dimension, but then modified this. No commonly accepted definition has been established since. One problem with "topological dimension" is that there are different versions of it, like the small or large inductive dimensions or the Lebesgue covering dimension.
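Membership in M is usually explored with the escape-time test, which rests on the standard fact that the orbit of 0 under z ↦ z² + c is unbounded as soon as some iterate satisfies |z| > 2. The sketch below is a heuristic with an arbitrary iteration cap, not a decision procedure: parameters extremely close to ∂M may need more iterations than allowed.

```python
def in_mandelbrot(c: complex, max_iter: int = 1000) -> bool:
    """Escape-time test: once |z| > 2 the orbit of 0 under z -> z^2 + c is unbounded.

    Returning True only means the orbit stayed in |z| <= 2 for max_iter steps,
    so parameters very close to the boundary can be misclassified."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return False
    return True


print([in_mandelbrot(c) for c in (0, -1, -2, 1j, 1, 1 + 1j, 0.3)])
```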

261. Moore's classification theorem

A finite field K is a field that contains only finitely many elements. It is also called a Galois field. An example of a finite field is the prime field GF(p) = Z_p = Z/(pZ) of integers modulo a prime number p; by Fermat's little theorem, $x^p - x = \prod_{0 \le a < p} (x - a)$ over GF(p). Given $q = p^m$, the polynomial $x^q - x$ has q distinct roots in a splitting field over GF(p), as its derivative $q x^{q-1} - 1 = -1$ has no roots. Because $(a + b)^q = a^q + b^q$ and $(ab)^q = a^q b^q$, these roots satisfy the field axioms, and GF(p^m) is a field, a field extension of the prime field GF(p).

Theorem: Any finite field is some GF(q) with $q = p^m$ for a prime p.

One can look at GF(q) as a Galois extension of GF(p) with cyclic Galois group. The latter is generated by the Frobenius automorphism $\phi : x \mapsto x^p$ of GF(q) fixing GF(p), which is additive because $(x + y)^p = x^p + y^p$. One can also ask about more general structures like finite division rings, which are also called skew fields. In a division ring, division by non-zero elements is possible. By Wedderburn's little theorem, all finite division rings are commutative and so finite fields, and by Moore's theorem of the form GF(q) for some power
$q = p^n$ of a prime p. The classification theorem for finite fields was first proven in 1893 by the American mathematician Eliakim Hastings Moore, also known as E.H. Moore. Moore was the PhD advisor of mathematicians like George Birkhoff (a pioneer of dynamical systems theory), Leonard Dickson (known for his three-volume History of the Theory of Numbers) or Oswald Veblen (who helped organize the Institute for Advanced Study in Princeton), who himself had many impressive students like James Alexander (a topologist), J.H.C. Whitehead (a founder of homotopy theory) or Alonzo Church (known in theoretical computer science for the Church-Turing thesis).
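One can build small Galois fields by hand as GF(p)[x]/(h) for a monic irreducible polynomial h of degree m. The sketch below (pure Python, helper names ours) represents elements as coefficient tuples, lowest degree first, checks that every non-zero element is invertible and that every element satisfies x^q = x, and shows that a reducible h does not produce a field.

```python
from itertools import product


def mul_mod(a, b, modulus, p):
    """Multiply polynomials a, b over GF(p) (coefficients, lowest degree first)
    and reduce modulo the monic polynomial `modulus` of degree m."""
    m = len(modulus) - 1
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    for d in range(len(prod) - 1, m - 1, -1):   # eliminate degrees >= m
        c = prod[d]
        if c:
            for k in range(m + 1):
                prod[d - m + k] = (prod[d - m + k] - c * modulus[k]) % p
    return tuple((prod + [0] * m)[:m])


def field_check(p, modulus):
    """Return (every non-zero element invertible, every element satisfies x^q = x)."""
    m = len(modulus) - 1
    q = p ** m
    elements = list(product(range(p), repeat=m))
    one = tuple([1] + [0] * (m - 1))
    zero = tuple([0] * m)
    is_field = all(any(mul_mod(a, b, modulus, p) == one for b in elements)
                   for a in elements if a != zero)

    def power(a, e):
        r = one
        for _ in range(e):
            r = mul_mod(r, a, modulus, p)
        return r

    frobenius_fixed = all(power(a, q) == a for a in elements)
    return is_field, frobenius_fixed


print(field_check(3, [1, 0, 1]))   # x^2 + 1 over GF(3) gives GF(9): (True, True)
```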

262. Factorization of entire functions

A function f(z) of a complex variable z that is analytic in the entire complex plane is called entire. Let $Z = \{z_1, z_2, \dots, z_n, \dots\}$ enumerate the non-zero roots of f, counted with multiplicity, and let m be the order of the root of f at z = 0. Let $E(z, 0) = 1 - z$ and $E(z, n) = (1 - z)e^{z + z^2/2 + \cdots + z^n/n}$ for $n > 0$. The canonical product is $P_Z(z) = \prod_n E(z/z_n, p)$, where $p = p(Z)$ is the smallest integer $q \ge 0$ with $\sum_n |z_n|^{-q-1} < \infty$. The Weierstrass factorization theorem assures that every entire function can be written as $z^m \prod_n E(z/z_n, \lambda_n)\, e^{g(z)}$ for some increasing sequence $\lambda_n$ of integers and some entire function g. For entire functions of finite order, for which such a p exists, the Hadamard factorization theorem improves this to:
Theorem: f entire of finite order ⇒ $f(z) = z^m P_Z(z) e^{Q(z)}$ where Q is a polynomial.

See [612, 73]. Examples of entire functions are polynomials, the exponential function, the sine or cosine functions, and derivatives or integrals of entire functions. For example, the anti-derivative of $e^{-x^2}$ is an entire function. The Weierstrass factorization theorem and its more powerful version in the form of the Hadamard factorization theorem generalize the fundamental theorem of algebra, assuring that every polynomial can be factored into linear factors $z - z_k$ and a non-zero constant $e^q$. What happens for entire functions is that every factor $(1 - z/z_k)$ has to be modified to $E(z/z_k, p)$ and that the constant $e^q$ becomes a function $e^{Q}$ with a polynomial Q. The factorization in the Hadamard factorization theorem is also called the Hadamard canonical factorization.
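A classical instance of the Hadamard factorization is sin(πz), an entire function of order 1 whose zeros are the integers. For the non-zero zeros z_n = n one has Σ 1/n² < ∞ but Σ 1/|n| = ∞, so p = 1 and Q is the constant log π: sin(πz) = πz ∏_{n≠0} E(z/n, 1). The sketch truncates the product at |n| ≤ N (the cutoff N is an arbitrary choice of ours; the truncation error is roughly of size |z|²/N) and compares with sin(πz).

```python
import cmath


def primary_factor(u, p):
    """Weierstrass primary factor E(u, p) = (1 - u) exp(u + u^2/2 + ... + u^p/p)."""
    return (1 - u) * cmath.exp(sum(u**j / j for j in range(1, p + 1)))


def sin_pi_hadamard(z, N=20_000):
    """pi * z * prod_{0 < |n| <= N} E(z/n, 1): a truncation of the Hadamard
    factorization of sin(pi z) (zeros at the integers, p = 1, Q = log pi)."""
    product = 1
    for n in range(1, N + 1):
        product *= primary_factor(z / n, 1) * primary_factor(-z / n, 1)
    return cmath.pi * z * product


for z in (0.5, 1.7, 0.4 + 0.2j):
    print(z, sin_pi_hadamard(z), cmath.sin(cmath.pi * z))
```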

263. Parry-Sullivan invariant

Let G be a quiver, a directed graph with possible loops and multiple connections. We assume there are k vertices and n edges. The adjacency matrix A is a k × k matrix with non-negative integer entries $A_{ij}$ whose total sum is n. The matrix entry $A_{ij}$ tells how many edges there are from i to j. One can now associate to such a graph an invariant subset $X = X_A$ of $S^{\mathbb{Z}}$, where S = {1, . . . , n} is the alphabet consisting of edges. An element x in X defines an infinite path in G, where $x_j \in S$ is the edge traversed at time j. X is also called a subshift of finite type or a topological Markov shift. The shift $T_A$ on X is an intrinsic Markov chain with a unique invariant measure maximizing entropy (for irreducible A), and the maximal entropy is log(λ), where λ is the maximal eigenvalue of A. The map $T_A$ defines a suspension flow on $F = (X \times \mathbb{R})/\sim$, where $(x, s+1) \sim (T_A(x), s)$, the Smale flow. Two matrices A, B are called flow equivalent if the flows on F(A) and F(B) are conjugated.

Theorem: If A, B are flow equivalent, then det(1 − A) = det(1 − B).

The theorem is named after Bill Parry and Dennis Sullivan, who published it in 1975 [565]. The invariant relates to the value ζ(1) of the Bowen-Lanford zeta function $\zeta(t) = 1/\det(1 - tA) = \exp\left(\sum_{n=1}^\infty N_n t^n/n\right)$, where $N_n = \mathrm{tr}(A^n)$ is the number of periodic points of period n of the shift.

For a general dynamical system with finitely many periodic points of each period, ζ(t) is also known as the Artin-Mazur zeta function. For flow equivalent matrices A, B, the value of the zeta function at t = 1 agrees. To reverse the theorem, one needs a bit more than the invariant. If the Bowen-Franks groups $\mathbb{Z}^k/((1 - A)\mathbb{Z}^k)$ and the Parry-Sullivan invariants agree, then the systems are equivalent. The topic of one-dimensional flows has also been investigated by Rufus Bowen and John Franks [88].
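The invariant and the zeta function identity are easy to evaluate. In the sketch below (NumPy; helper names ours), the full 2-shift appears twice: as the edge shift of the 1 × 1 matrix [2] (one vertex with two loops) and as the shift of the 2 × 2 all-ones matrix. These two shifts are conjugate, hence flow equivalent, and both give det(1 − A) = −1. The second function compares 1/det(1 − tA) with a truncation of exp(Σ tr(Aⁿ)tⁿ/n) for a small t where the series converges.

```python
import numpy as np


def parry_sullivan(A):
    """The flow-equivalence invariant det(1 - A) of a non-negative integer matrix."""
    A = np.asarray(A, dtype=float)
    return int(round(np.linalg.det(np.eye(len(A)) - A)))


def zeta_both_sides(A, t, terms=200):
    """Return 1/det(1 - tA) and exp(sum_{n <= terms} tr(A^n) t^n / n)."""
    A = np.asarray(A, dtype=float)
    lhs = 1 / np.linalg.det(np.eye(len(A)) - t * A)
    total, power = 0.0, np.eye(len(A))
    for n in range(1, terms + 1):
        power = power @ A
        total += np.trace(power) * t**n / n
    return lhs, np.exp(total)


print(parry_sullivan([[2]]), parry_sullivan([[1, 1], [1, 1]]))   # -1 -1
print(zeta_both_sides([[1, 1], [1, 0]], 0.1))
```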

264. The pizza theorem

Cut a disk into 4n ≥ 8 different slices by cutting from a point P inside the disk along 4n equi-angularly spaced rays. Let us call this an equi-angular spider partition of the disk. We can number the slices 1, . . . , 4n. Call A the set of even slices and B the set of odd slices. (For 4 slices, the statement below fails in general.)

Theorem: The area of even and odd slices is the same.

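The result is obvious if one of the cut lines passes through the center of the disk: the reflection at this line maps the spider partition to itself and even slices to odd slices. In general, one has to show that the difference between the even and the odd areas does not depend on the position of the spider; this can be done with calculus [339]. Let r(θ) be the distance from P to the boundary of the unit disk in direction θ, and let a = r(θ), b = r(θ + π/2), c = r(θ + π), d = r(θ + 3π/2) and α = π/(2n). Then the total area of the four slices k, k + n, k + 2n, k + 3n is $\int_{(k-1)\alpha}^{k\alpha} (a^2 + b^2 + c^2 + d^2)/2 \, d\theta$. For two perpendicular chords through P one has $a^2 + b^2 + c^2 + d^2 = 4$, so that this total area is π/n, independent of k and of the rotation of the spider. For even n, each such group of four slices consists of slices of the same parity, which proves the theorem in that case. The problem was posed in 1967 by L.J. Upton and solved in 1968 by Michael Goldberg. See [125] on page 192.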
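The theorem can be tested numerically by integrating r(θ)²/2 over each slice, where r(θ) is the distance from P to the boundary of the unit disk in direction θ. The sketch below uses Simpson's rule (the step count is an arbitrary choice), an arbitrary point P = (0.3, 0.5) and starting angle 0.2; the difference between even and odd areas vanishes for 8 and 12 slices but not for 4 slices.

```python
import math


def ray_length(px, py, theta):
    """Distance from P = (px, py) inside the unit disk to the circle in direction theta."""
    pu = px * math.cos(theta) + py * math.sin(theta)
    return -pu + math.sqrt(pu * pu + 1.0 - (px * px + py * py))


def slice_area(px, py, t0, t1, steps=2000):
    """Area of the slice between the rays at angles t0 < t1: integral of r^2/2 (Simpson)."""
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        r = ray_length(px, py, t0 + i * h)
        total += weight * r * r / 2
    return total * h / 3


def pizza_difference(px, py, num_slices, theta0=0.0):
    """Total area of the even slices minus total area of the odd slices."""
    alpha = 2 * math.pi / num_slices
    areas = [slice_area(px, py, theta0 + k * alpha, theta0 + (k + 1) * alpha)
             for k in range(num_slices)]
    return sum(areas[0::2]) - sum(areas[1::2])


for num_slices in (4, 8, 12):
    print(num_slices, pizza_difference(0.3, 0.5, num_slices, theta0=0.2))
```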

265. Implicit function theorem

Let X, Y be Banach spaces for which closed balls $B_r(x_0)$ are weakly compact, like for example Hilbert spaces. A map $f : X \to Y$ is Neuberger in $B_r(x_0)$ if it is continuous in $B_r(x_0)$ and if for every $x \in B_r(x_0)$ there exists $h \in B_r(0)$ such that the one-sided Gateaux derivative $D_h f(x) = \lim_{t \to 0^+} \frac{1}{t}(f(x + th) - f(x))$ exists and satisfies $D_h f(x) = -f(x_0)$. This applies if f is Fréchet differentiable and such that for every $x \in B_r(x_0)$ there exists $h \in B_r(0)$ with $f'(x)h = -f(x_0)$. If $f'(x)$ is invertible, we can choose $h = -f'(x)^{-1} f(x_0)$, similar to the Newton method $x \mapsto x - f'(x)^{-1} f(x)$, which is used to get closer to a root. Neuberger maps therefore satisfy the classical implicit function theorem: if $f_0(x_0) = 0$ and $f_0'(x_0)$ is invertible, then for any small enough $|\epsilon|$, there exists $x \in B_r(x_0)$ such that $f_\epsilon(x) = 0$. Neuberger's theorem only requires the directional Gateaux derivatives to exist and not the stronger Fréchet derivative.

Theorem: If f is Neuberger in a ball B , there is a root in B .

The result means in one dimension that if $|f'(x)| > |f(0)|/r$ for all $|x| < r$, then f has a root within distance r of 0. For a function f(x, y), having $|\nabla f(x, y)| > |f(x_0, y_0)|/r$ in $|(x - x_0, y - y_0)| < r$ assures that f has a root within distance r: if $f(x_0, y_0) > 0$, follow the anti-gradient field $-\nabla f(x, y)$ to reach the root. Here is the proof: given $\epsilon > 0$, define $S = \{s \in [0, 1] \mid \exists x \in B_{rs}(x_0) \text{ with } \|f(x) - (1-s)f(x_0)\| \le \epsilon s\}$ and $\lambda = \sup S$. Assume $\lambda < 1$. We aim to get from this a contradiction so that $\lambda = 1$. The set S is closed, because f is continuous, and the set $B_r(x_0)$ is compact in the weak topology. There is therefore $x \in B_{r\lambda}(x_0)$ with $\|f(x) - (1-\lambda)f(x_0)\| \le \epsilon\lambda$, and we can pick $h \in B_r(0)$ and $\delta \in (0, 1 - \lambda]$ so that $\|\frac{1}{\delta}(f(x + \delta h) - f(x)) + f(x_0)\| \le \epsilon$. The latter gives $\|f(x + \delta h) - f(x) + \delta f(x_0)\| \le \epsilon\delta$. By choice of the constants, $\|x + \delta h - x_0\| \le \lambda r + \delta r = (\lambda + \delta) r$. By the triangle inequality,
$$\|f(x + \delta h) - (1 - \delta - \lambda) f(x_0)\| \le \|f(x + \delta h) - f(x) + \delta f(x_0)\| + \|f(x) - (1 - \lambda) f(x_0)\| \le \epsilon\delta + \epsilon\lambda.$$
Therefore, $\lambda + \delta \in S$, contradicting that λ is the largest element in S, and it follows that λ = 1.

Having for each $\epsilon > 0$ an element $x_\epsilon \in B_r(x_0)$ such that $\|f(x_\epsilon)\| \le \epsilon$ implies by the continuity of f and the compactness of $B_r(x_0)$ that there exists x in $B_r(x_0)$ with f(x) = 0. See [447] for the implicit function theorem and [541] for Neuberger's theorem.
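The proof follows a path x(s) along which f(x(s)) = (1 − s)f(x_0). In one dimension this is the ODE dx/ds = −f(x_0)/f′(x), and a crude Euler discretization already finds a root. The sketch below is only an illustration, assuming that f′ does not vanish along the path; the helper `continuation_root` and the example are ours. For f(x) = x³ − 2 with x_0 = 1.2 and r = 0.1, one has |f(x_0)|/|f′(x)| ≤ 0.272/(3 · 1.1²) < 0.1 on the ball, so the one-dimensional criterion applies.

```python
def continuation_root(f, df, x0, steps=10_000):
    """Euler discretization of the path x(s), 0 <= s <= 1, with
    dx/ds = -f(x0) / f'(x), along which f(x(s)) = (1 - s) f(x0).
    One-dimensional sketch; assumes f' does not vanish along the path."""
    target = f(x0)
    x, ds = x0, 1.0 / steps
    for _ in range(steps):
        h = -target / df(x)      # direction h with f'(x) h = -f(x0)
        x += ds * h
    return x


def f(x):
    return x**3 - 2


def df(x):
    return 3 * x**2


root = continuation_root(f, df, 1.2)
print(root, abs(root - 1.2) <= 0.1)
```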

266. Schur product theorem

The Hadamard product of two n × m matrices A, B is defined by $(A * B)_{ij} = A_{ij} B_{ij}$. It produces an associative, distributive and commutative structure on the linear space of n × m matrices. It is also called the Schur product or array multiplication. A symmetric matrix A is positive definite if for every non-zero vector v the number $v^* A v$ is positive, where $v^* = \bar{v}^T$ is the conjugate row vector.
Theorem: The Hadamard product preserves positive definite matrices.
The product was introduced by Issai Schur in 1911. In the same paper, the theorem was proven [634]. The Hadamard product appears in machine learning or image processing. In the latter case one can for example mask a picture by Hadamard multiplying the picture with some mask matrix. As noted in the Wikipedia article about the Schur product theorem, if A is symmetric and A ∗ B is positive definite for all positive definite matrices B, then A is positive definite.
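A numerical spot check of the Schur product theorem: with NumPy, `A * B` on arrays is the entrywise (Hadamard) product, while `A @ B` is the matrix product. The sketch builds two random positive definite matrices (the construction M Mᵀ + nI is just a convenient choice of ours) and checks that their Hadamard product is symmetric with positive eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_positive_definite(n):
    """M M^T + n I is symmetric positive definite (a convenient test construction)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)


A = random_positive_definite(6)
B = random_positive_definite(6)
H = A * B                                   # entrywise (Hadamard) product
print(np.linalg.eigvalsh(H).min() > 0)      # True
```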

267. Divine triangles

A right angle triangle with side lengths x, y, z is called divine if both x + y and z are squares. One can write this as a Diophantine problem: the task is to find integers x, y such that z² = x² + y² is a fourth power and x + y is a square: x² + y² = a⁴, x + y = b². It was first mentioned on May 31, 1642 by Fermat in a letter to St. Martin and Frenicle. [576] page 74 mentions that it appeared also in a letter of 1643 from Fermat to his friend Marin Mersenne.
Theorem: x = 4565486027761, y = 1061652293520 defines a divine triangle.
This is remarkable because a brute force search for such a solution would hardly be possible by hand. Pickover therefore calls these triangles divine triangles. In [576] he mentions that Pythagoras would have admired anybody who could compute with such large numbers as gods and that we (with the help of computers and mathematics) have become gods. The topic is mentioned in [176]. A method for finding solutions was discovered by Fermat himself and another one by Euler. See [?], where a relation to an elliptic curve t⁴ + 4t³ − 6t² − 4t + 1 is shown.
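With exact integer arithmetic, verifying the solution takes microseconds, even though finding it is hard. The sketch uses Python's `math.isqrt` to check that x² + y² is a fourth power and that x + y is a square; the helper names are ours.

```python
from math import isqrt


def is_square(n: int) -> bool:
    return n >= 0 and isqrt(n) ** 2 == n


def is_divine(x: int, y: int) -> bool:
    """True if x^2 + y^2 is a fourth power and x + y is a square."""
    hyp_sq = x * x + y * y
    if not is_square(hyp_sq):
        return False
    return is_square(isqrt(hyp_sq)) and is_square(x + y)


x, y = 4565486027761, 1061652293520
print(is_divine(x, y), isqrt(isqrt(x * x + y * y)), isqrt(x + y))  # True 2165017 2372159
```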

268. Alexander subbase theorem

A subbase of a topological space is a subset of the topology which generates the topology, where "generates" means that every open set can be obtained as a union of finite intersections of elements of the subbase. [Note that the intersection of no sets is considered to be the whole space X, so that a subbase does not need to cover the space. For example, for any set X, the empty set B = {} is a subbase that generates the trivial topology {∅, X} on X, while B does not cover X.] If (X, O) is a topological space with subbase B such that every cover of X by elements in B has a finite subcover, we say that X is compact with respect to the subbase B. The Alexander subbase theorem assures that

Theorem: X is compact ⇔ X is compact with respect to a subbase.

The theorem was proven by James Waddell Alexander. The theorem is useful because it allows one to prove Tychonoff's theorem more easily. That theorem (proven by Andrey Nikolayevich Tikhonov in 1935) is the statement that the product of non-empty compact topological spaces is compact.

269. Double suspension theorem

A homology 3-sphere is a 3-manifold that has the same homology groups as a 3-sphere but which is not homeomorphic to a sphere. An example is the Poincaré homology sphere X = S³/A, where A = 2I is the binary icosahedral group, a finite group with 120 elements. The fundamental group of X is the group A. The suspension of a topological space X is X + S⁰, the join of X with the 0-sphere S⁰. It has dimension 1 + dim(X). It is the disjoint union of X and S⁰ together with all connections from a point in X to a point in S⁰. If the suspension is done twice, it is called a double suspension. This can also be seen as the join of X with a circle.

Theorem: The double suspension of a homology 3-sphere X is a 5-sphere.

The question whether the double suspension of a homology 3-sphere is a 5-sphere had been
asked by John Milnor in 1963. The theorem has been proven by Robert Duncan Edwards in
1970 [213] and by James W. Cannon in 1979.

270. Theorem of three geodesics

A d-sphere is a Riemannian d-manifold with the topology of a sphere, meaning that it is homeomorphic to the standard d-sphere. A periodic geodesic is an embedded circle which locally minimizes length. It was conjectured by Henri Poincaré in 1905 that every 2-sphere has at least three periodic geodesics. This was proven in 1929 by Lazar Lusternik and Lev Schnirelmann.

Theorem: Every 2-sphere has at least 3 simple periodic geodesics.

[413] has a Morse theoretical proof. Lusternik and Schnirelmann used the Lusternik-Schnirelmann
category.

271. Loewner's Torus Inequality

If M is a Riemannian manifold, the systole sys(M) of M is the length of the shortest
non-contractible closed path in M. If M is a 2-dimensional surface and area(M) is its area, one can
compare the systole with its area. One of the first estimates in systolic geometry is Loewner's
torus inequality for the torus M:

Theorem: $\mathrm{sys}(M)^2 \le \frac{2}{\sqrt{3}}\,\mathrm{area}(M)$

Charles Loewner proved this in 1949 [588], probably motivated by results on the girth
introduced in 1947 by W. Tutte in graph theory [395]. The inequality is optimal: the flat torus
$M = \mathbb{R}^2/\Lambda$ with the hexagonal lattice Λ spanned by $\omega_1 = (1, 0)$, $\omega_2 = (1, \sqrt{3})/2$
provides the extremal case. In the complex plane, the hexagonal lattice gives the Eisenstein elliptic
curve $\mathbb{C}/\{a + b\omega : a, b \in \mathbb{Z}\}$ with $\omega = (1 + \sqrt{3}i)/2$ being a primitive sixth root of unity.
The corresponding elliptic curve is $y^2 = 4x^3 - 1$ [562]. The constant $2/\sqrt{3}$ is the Hermite constant
$\gamma_2$ in $\mathbb{R}^2$: the largest possible squared length of a shortest nonzero vector of a lattice Λ
in $\mathbb{R}^2$, scaled such that the covolume $\mathrm{area}(\mathbb{R}^2/\Lambda) = 1$. Pao
Ming Pu, a student of Loewner, proved in 1951 [588] a similar inequality for the projective
plane $M = \mathbb{RP}^2$ and mentioned the then still unpublished result of Loewner. For the projective
plane, one has $\mathrm{sys}(M)^2 \le \frac{\pi}{2}\,\mathrm{area}(M)$. For a general surface different from a sphere, an
inequality $\mathrm{sys}(M)^2 \le \gamma\,\mathrm{area}(M)$ holds which is sharp for constant Gaussian curvature [397].
Modern proofs are integral-geometric: write the area $\mathrm{area}(\mathbb{T}^2)$ as an average of energies of parallel
loops coming from straight lines in $\mathbb{R}^2$, foliating the torus.
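
For flat tori $\mathbb{R}^2/\Lambda$ the inequality can be checked directly. Here the systole is the length of a shortest nonzero lattice vector, and the area is $|\det(\omega_1, \omega_2)|$. The following Python sketch computes $\mathrm{sys}^2/\mathrm{area}$ by brute force over small integer combinations; the function name `systole_ratio` and the search radius are our own choices. This works for moderately shaped lattices like the ones below. For very elongated lattices, a shortest vector outside the search window would make the sketch overestimate the ratio. The sketch covers flat metrics only, not the general Riemannian case of the theorem.

```python
import itertools
import math


def systole_ratio(w1, w2, search=10):
    """sys(M)^2 / area(M) for the flat torus R^2 / (Z w1 + Z w2)."""
    area = abs(w1[0] * w2[1] - w1[1] * w2[0])
    shortest_sq = min(
        (a * w1[0] + b * w2[0]) ** 2 + (a * w1[1] + b * w2[1]) ** 2
        for a, b in itertools.product(range(-search, search + 1), repeat=2)
        if (a, b) != (0, 0)
    )
    return shortest_sq / area


bound = 2 / math.sqrt(3)  # Loewner's constant, the Hermite constant gamma_2
hexagonal = systole_ratio((1.0, 0.0), (0.5, math.sqrt(3) / 2))
square = systole_ratio((1.0, 0.0), (0.0, 1.0))
print(hexagonal, bound)  # equal up to rounding: the extremal case
print(square)            # strictly below the bound
```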

272. Ramanujan's Cubic identity

The quadratic form $Q(x) = |x_1 + ix_2|^2 = x_1^2 + x_2^2$ satisfies $Q(x)Q(y) = Q(x * y)$, where
x ∗ y is the complex multiplication. There is a cubic analog. If C is the cubic form
$C(x) = x_1^3 + x_2^3 + x_3^3 - 3x_1x_2x_3$ and if for two vectors x, y one defines the product
$[x_1, x_2, x_3] * [y_1, y_2, y_3] = [x_1y_1 + x_2y_3 + x_3y_2,\ x_1y_2 + x_2y_1 + x_3y_3,\ x_1y_3 + x_2y_2 + x_3y_1]$, then

Theorem: C(x)C(y) = C(x ∗ y)

The proof is by noticing that the matrix
$$A(x) = \begin{pmatrix} x_1 & x_2 & x_3 \\ x_3 & x_1 & x_2 \\ x_2 & x_3 & x_1 \end{pmatrix}$$
has the property $C(x) = \det(A(x))$ and that $A(x)A(y) = A(x * y)$. The cubic identity of Ramanujan
now just follows from the Cauchy-Binet formula $\det(AB) = \det(A)\det(B)$. If a function f is a cubic
form, then the Hessian $H(f) = \det(d^2 f)$ is again a cubic form. Now $H(C) = -54C$. In other words, the
Ramanujan cubic form C is an eigenvector of the Hessian operator on cubic forms. See [555].
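
The identity and the Hessian computation can be spot-checked with exact integer arithmetic. The following Python sketch evaluates C(x)C(y) = C(x ∗ y) and C(x) = det(A(x)) at random integer points; all function names are our own. It computes the Hessian determinant from central differences with step 1. These are exact for cubic polynomials, because their error terms involve fourth derivatives. A random check is not a proof, but agreement at many points is a good sanity check of the formulas above.

```python
import random
from fractions import Fraction


def C(x):
    x1, x2, x3 = x
    return x1**3 + x2**3 + x3**3 - 3 * x1 * x2 * x3


def star(x, y):
    """The product x * y, chosen so that A(x) A(y) = A(x * y)."""
    x1, x2, x3 = x
    y1, y2, y3 = y
    return (x1 * y1 + x2 * y3 + x3 * y2,
            x1 * y2 + x2 * y1 + x3 * y3,
            x1 * y3 + x2 * y2 + x3 * y1)


def circulant(x):
    x1, x2, x3 = x
    return ((x1, x2, x3), (x3, x1, x2), (x2, x3, x1))


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def hessian_det(f, p):
    """det(d^2 f)(p) via central differences with step 1 (exact for cubics)."""
    def shift(q, i, s):
        q = list(q)
        q[i] += s
        return tuple(q)

    H = [[0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            if i == j:
                H[i][j] = f(shift(p, i, 1)) - 2 * f(p) + f(shift(p, i, -1))
            else:
                pp = f(shift(shift(p, i, 1), j, 1))
                pm = f(shift(shift(p, i, 1), j, -1))
                mp = f(shift(shift(p, i, -1), j, 1))
                mm = f(shift(shift(p, i, -1), j, -1))
                H[i][j] = Fraction(pp - pm - mp + mm, 4)
    return det3(H)


def check(trials=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        x = tuple(rng.randint(-9, 9) for _ in range(3))
        y = tuple(rng.randint(-9, 9) for _ in range(3))
        assert C(x) == det3(circulant(x))
        assert C(x) * C(y) == C(star(x, y))
        assert hessian_det(C, x) == -54 * C(x)
    return True


print(check())
```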

Epilogue: Value

Which mathematical theorems are the most important ones? This is a complicated variational
problem because it is a general and fundamental problem in economics to define "value". The
difficulty with the concept is that "value" is often a matter of taste or fashion or social influence
and so an equilibrium of a complex social system. Value can change rapidly, sometimes
triggered by small things. The reason is that the notion of value, like in game theory, depends
on how it is valued by others. A fundamental principle of catastrophe theory is that maxima
of a functional can depend discontinuously on parameters. As value is often a social concept,
this can be especially brutal or lead to unexpected viral effects. First of all, value is often
linked to historical or moral considerations. We tend more and more to link artistic and
scientific value also to the person. In mathematics, the work of Oswald Teichmüller or Ludwig
Bieberbach, for example, is linked to their political views and so devalued despite its brilliance
[638]. This happens also outside of science, in art or in industry. The value of a company now
also depends on "what investors think" or what analysts see as potential gain in the future.
Social media try to measure value using "likes" or "number of followers". A majority vote is a
measure, but how well can it predict what will be valuable in the future? Majority votes
taken over longer times would give a more reliable value functional. If one could persuade
every mathematician to give a list of the two dozen most fundamental theorems, do that
every couple of years and so reflect the "wisdom of an educated crowd", one could probably get a
pretty good value functional. Ranking theorems and results in mathematics is a mathematical
optimization problem by itself. One could use techniques known in the "search industry". One
idea is to look at the finite graph in which the theorems are the nodes and where two theorems
are related to each other if one can be deduced from the other (or alternatively connect them
if one influences the other strongly). One can then run a page rank algorithm [462] to see
which ones are important. Running this in each of the major mathematical fields could give an
algorithm to determine which theorems deserve the name "fundamental". Now, there was also
a problem with publishing the page rank, as people tried to manipulate it using search engine
optimization tricks. Google no longer gives the page rank of a website, simply to avoid
such manipulations. The story illustrates that reflecting about algorithms that measure value
can influence the algorithm itself and even destroy it. Similarly as in quantum mechanics, the
measurement process can influence the experiment to the point that it is no longer reliable.
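
The page rank idea for a graph of theorems can be sketched in a few lines. The following Python code is a minimal power-iteration version of PageRank. It uses the commonly chosen damping factor 0.85, which is our assumption, not something the text fixes. The dependency graph is a toy example invented for illustration. An edge u → v means "a standard proof of u uses v", so rank flows toward results that many others depend on. Nodes without outgoing edges spread their rank uniformly.

```python
def pagerank(graph, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank; graph maps a node to the nodes it links to."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
            else:  # dangling node: distribute its rank uniformly
                for v in nodes:
                    new[v] += damping * rank[u] / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy dependency graph: u -> v means "a standard proof of u uses v".
uses = {
    "Tychonoff": ["Alexander subbase", "Zorn's lemma"],
    "Alexander subbase": ["Zorn's lemma"],
    "Banach-Alaoglu": ["Tychonoff"],
    "Hahn-Banach": ["Zorn's lemma"],
    "Zorn's lemma": [],
}
for name, score in sorted(pagerank(uses).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {name}")
```

In this toy graph Zorn's lemma comes out on top, in line with the remark elsewhere in this document that fundamental results need not be deep.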

Opinions

It was a course "Math from a historical perspective", taught a couple of times at the Harvard
Extension School, which motivated me to write up the present document. As part of a project,
students were often asked to write about some theorems, mathematical fields or a mathematical
person and to try to rank them. The present document benefits from these writings, as it is interesting
to see what others consider important. Sometimes, seeing different opinions can change your
own view. I was definitely influenced by students, teachers, colleagues and literature, as well as, of
course, by the limitations of my own understanding. My own point of view has already changed
while writing the actual theorems down and will certainly change more. Value is more like an
equilibrium of many different factors. In mathematics, values have changed rapidly over time.
And mathematics can describe the rate of change of value [585]. Major changes in the
appreciation for mathematical topics came throughout history, sometimes with dramatic shifts,
like when mathematical notation started to appear, at the time of Euclid, then at the time
when calculus was developed by Newton and Leibniz. Also the development of more abstract
algebraic constructs or topological notions, like for example the start of set theory, changed
things considerably. In more modern times, the categorization of mathematics and the
development of rather general and abstract new objects (for example with new approaches
taken by Grothendieck) changed the landscape. In most of the new developments, I remain the
puzzled tourist wondering how large the world of mathematics is. It has become so large that
continents have emerged: applied mathematics, mathematical physics, statistics,
computer science and economics have drifted away to independent subjects and
departments. Classical mathematicians like Euler would now be called applied mathematicians,
de Moivre would maybe be stamped as a statistician, Newton a mathematical physicist,
Turing a computer scientist and von Neumann an economist or physicist.

A couple of months before starting this document in 2018, when looking online for "George
Green", the first hit in a search engine would be a 22-year-old soccer player. (This was not
a search bubble thing [564], as it was tested with a cleared browser cache and via an anonymous
VPN from other locations, where the search engine cannot determine the identity of the user.)
Now, I love soccer, played it myself a lot as a kid and also like to watch it on screen, but
is the English soccer player George William Athelston Green really more "relevant" than the
British mathematician George Green, who made fundamental breakthrough discoveries which
are used in mathematics and physics? Shortly after I had tweeted about this strange ranking
on December 27, 2017, the page rank algorithm must have been adapted, because already on
January 4th, 2018, the mathematician George Green appeared first (again not a search bubble
phenomenon, where the search engine adapts to the user's taste and adjusts the search to
their preferences). It is not impossible that my tweet has reached, meandering through social
media, some search engine engineer who was able to rectify the injustice done to the miller
and mathematician George Green. The theory of networks shows that "small world phenomena"
[732, 46, 731] can explain that such influences or synchronizations are not that impossible
[686]. But coincidences can also be deceiving. Humans tend to notice coincidences even
when there is a perfectly mathematical explanation. This is prototyped by the birthday
paradox [502]. But one must also understand that search needs to serve the majority. For a
general public, a particular subject like mathematics is not that important. When searching
for "Hardy", for example, it is not Godfrey Hardy who is mentioned first as a person belonging
to that keyword but Tom Hardy, an English actor. This obviously serves most of the searches
better. As this might infuriate particular groups (here mathematicians), search engines have
started to adapt the searches to the user, giving the search some context, which is an important
ingredient in artificial intelligence. The problem is the search bubble phenomenon, which runs
hard against objectivity. Textbooks of the future might adapt their language, difficulty and
even their citations or the historical credit depending on who reads them. Novels might adapt the
language to the age of the user and the country where the user lives, and the ending might depend on
personal preferences or even the medical history of the user (the medical history of course being
accessible to the book seller via "big data" analysis of user behavior and tracking, which is not science
fiction but is already happening): even classical books are cleansed for political correctness, and many
computer games are already customizable to the taste of the user. A person flagged as sensitive or a
young child might be served a happy ending in a novel rather than a conclusion of the novel in
an ambivalent limbo or even a disaster. [564] explains the difficulty. The issues have amplified
even more in recent times. The phenomenon of the filter bubble even influences elections and
polarizes opinions, as one does not even hear alternative arguments any more.
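
The birthday paradox mentioned above comes down to a short computation. Assume n people whose birthdays are independent and uniform over 365 days, an idealization that ignores leap years. The probability that all birthdays differ is then the product of (365 − k)/365 for k = 0, ..., n − 1. The Python sketch below, with function names of our own, finds the smallest group size for which a shared birthday is more likely than not.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday (uniform, independent)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group_for(threshold=0.5, days=365):
    """Smallest n for which a shared birthday has probability above `threshold`."""
    n = 1
    while shared_birthday_probability(n, days) <= threshold:
        n += 1
    return n


n = smallest_group_for(0.5)
print(n, round(shared_birthday_probability(n), 4))
```

With 23 people the probability of a shared birthday already exceeds one half, which most people find surprisingly small.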

Beauty

In order to determine what is a "fundamental theorem", aesthetic values also matter. But the
question of what is "beautiful" is even trickier. Many have tried to define and investigate the
mechanisms of beauty: [317, 738, 739, 610, 655, 9, 526]. In the context of mathematical
formulas, the question has been investigated within the field of neuro-aesthetics. Psychologists, in
collaboration with mathematicians, have measured the brain activity of 16 mathematicians with
the goal to determine what they consider beautiful [620]. The Euler identity $e^{i\pi} + 1 = 0$ was
rated high with a value 0.8667 while a formula for 1/π due to Ramanujan was rated low with an
average rating of -9.7333. Obviously, what mattered was not only the complexity of the formula
but also how much insight the participants got when looking at the equation. The authors
of that paper cite Plato, who once wrote "nothing without understanding would ever be more
beauteous than with understanding". Obviously, the formula of Ramanujan is much deeper, but
it requires some background knowledge to be appreciated. But the authors acknowledge
in the discussion that "correlating beauty and understanding" can be tricky. Rota [610]
notes that the appreciation of mathematical beauty in some statement requires the ability to
understand it. And [526] notices that even professional mathematicians specialized in a certain
field might find results or proofs in other fields "obscure", but that this is not much different
from, say, music, where "knowledge about technical details such as the differences between things
like cadences, progressions or chords changes the way we appreciate music" and that "the
symmetry of a fugue or a sonata are simply invisible without a certain technical knowledge". As
history has shown, there were also always "artistic connections" [256, 102] as well as "religious
influences" [474, 657]. The book [256] cites Einstein, who defines mathematics as "the poetry
of logical ideas". It also provides many examples, illustrations and quotations. And there
are various opinions. Rota argues that beauty is a rather objective property which depends
on historic-social contexts. And then there is taste: what is more appealing? The element of
surprise, like the Birthday paradox or the Petersburg paradox in probability theory, or the
Banach-Tarski paradox in measure theory, which obviously does not trigger any enlightenment nor
understanding when one hears it for the first time: one can disassemble a solid ball into 5 pieces,
rotate and translate these pieces in space and build up two balls of the same size. Or the surprising
fact that the infinite sum 1 + 2 + 3 + 4 + 5 + . . . is naturally equal to −1/12, as it is ζ(−1) (which is a
value defined by analytic continuation and can hardly be understood without training in complex
analysis). The role of aesthetics in mathematics is especially important in education, where
mathematical models [237], mathematical visualization [43], artistic enrichment [241], surfaces
[448], or 3D printing [639, 428] can help to make mathematics more approachable. Update 2019:
as reported in Science Daily, a study of the University of Bath concludes that people appreciate
beauty in complex mathematics [378]. The results which had been chosen in that study had
been rather simple, however: the infinite geometric series formula, Gauss's summation trick
for positive integers, the pigeonhole principle, and a geometric proof of a Faulhaber formula
for a sum of powers of the first positive integers. When judging the mathematics describing physical
models, Paul Dirac was probably the most outspoken advocate for beauty. He stated in [185],
for example: "It seems to be one of the fundamental features of nature that fundamental physical
laws are described in terms of a mathematical theory of great beauty and power, needing quite
a high standard of mathematics for one to understand it."

Deepness

A taxonomy is a way to place objects like theorems in a multi-dimensional cube of numerical
attributes. Besides the ugly-beauty parameter, one can think of all kinds of taxonomies
to classify theorems. There is the simplicity-complexity axis, which could be measured
by the number of mathematicians who can understand the proof, the boring-interesting
axis, which measures the entertainment value or potential for pop culture appearances, the
useless-applicable axis, which measures how many applications the theorem has in engineering,
economics or other sciences, and the easy-hard axis, which could be measured by the amount of
time one needs to understand the proof. And then there is the shallow-deep axis, which
is even more subjective but which could be quantified too. One could look, for example, at how
long a proof path is from basic axioms to the theorem and weight each path with how many
other interesting theorems have been visited along the way. Also relevant is how many different
areas of mathematics have been visited along the proof. A deep theorem could be obtained
by proving it with different long paths, each reaching other already established deep results.
One can now argue how to average all these paths, and whether one should take the minimal or
the maximal deep proof path. The latter point was addressed in [461].
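
The question of minimal versus maximal proof paths can be made concrete on a toy model. The Python sketch below is only an illustration under our own assumptions. Each result maps to a list of alternative proofs, each proof is the list of results it uses, and axioms have no proofs. The depth of a proof is one more than the depth of its deepest ingredient. For each result, the function returns the depth of its shallowest and of its deepest proof. The dependency data are invented, the graph is assumed to be acyclic, and the weighting by "interesting theorems visited" suggested above is ignored.

```python
from functools import lru_cache

# Hypothetical toy data: result -> list of alternative proofs,
# each proof being the list of results it uses. Axioms have no proofs.
PROOFS = {
    "axioms": [],
    "A": [["axioms"]],
    "B": [["A"], ["axioms"]],
    "C": [["A", "B"]],
    "T": [["C"], ["B"]],
}


def proof_depths(proofs):
    """Return {result: (depth of shallowest proof, depth of deepest proof)}; assumes no cycles."""

    @lru_cache(maxsize=None)
    def depth(result, pick):
        alternatives = proofs[result]
        if not alternatives:  # an axiom
            return 0
        # a proof is one step deeper than its deepest ingredient;
        # `pick` (min or max) chooses among the alternative proofs
        return pick(1 + max(depth(r, pick) for r in proof) for proof in alternatives)

    return {r: (depth(r, min), depth(r, max)) for r in proofs}


print(proof_depths(PROOFS)["T"])
```

Here the result T has a shallowest proof of depth 2 (via B) and a deepest proof of depth 4 (via C, A and B), which shows how much the choice between minimum and maximum matters.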

Maybe unlike with other parameters, the antipode "trivial" of "deepness" has a positive side
too: it is maybe not "shallow" but what we call "fundamental". Fundamental theorems are
not necessarily deep. The Pythagorean theorem or Zorn's lemma, for example, are not deep,
but they are fundamental. Basic logical identities based on Boolean algebra, which are used in
almost every proof step, are of fundamental importance but not deep. One could still go back
and measure how fundamental something is by how many deep theorems can be proven with
it.

[716] points out that the adjective "deep" is used for all kinds of mathematical objects: theorems,
proofs, problems, insights or concepts can be described as deep, and that often the theorem is
called deep if its proof is deep. Urquhart points out, however, that if a simple proof is discovered
later, "perhaps the result might be reclassified as not deep at all", and that consequently the
concept of "mathematical depth" is not so well defined. The author then mentions the graph
minor theorem (in every infinite set of finite graphs there are two for which one is a minor of the
other), which Diestel [178] calls "one of the deepest theorems that mathematics has to offer".
Some justification for the deepness of the result is that it has made an impact also outside graph
theory and that its proof takes well over 500 pages.

[716] also collects opinions of philosophers and mathematicians about deepness. Cited, for
example, is [317], where Hardy gives an extended discussion on depth and sees mathematical ideas
arranged somehow in strata, each stratum being linked by a complex relation both among themselves
and with those above and below; the lower the stratum, the deeper the idea. Also cited is
the book of Penelope Maddy [483], which expresses doubt that mathematical depth really
can be accounted for productively because it is a "catch-all" for the various kinds of virtues
and often used as a term of approbation, but always in an informal context without giving
a precise meaning. Also cited in [716] are present-day mathematicians like Gowers [281], who
links "deep" with "hard" and contrasts it with "obvious". If a proof requires a non-obvious idea,
then it is considered deep. Also cited is a later statement of Gowers: "The normal
use of the word 'deep' is something like this: a theorem is deep if it depends on a long chain of
ideas, each involving a significant insight". Finally mentioned is Tao [700], who lists over twenty
meanings of "good mathematics" (being a breakthrough for solving a problem, masterfully using
technique, building theory, having insight, discovering something unexpected, having
applications, clear exposition, good pedagogy enabling understanding, long-range vision, good taste,
public relations, advancing foundations, rigorous, beautiful, elegant, creative, useful, sharp in view of
known counterexamples, intuitive and visualisable, being definitive like a classification result
and finally "deep", which Tao defines as "manifestly non-trivial, for instance by capturing a subtle
phenomenon beyond the reach of more elementary tools"). [716] also illustrates the concept of
deepness with moves one sees in chess: a combination of moves which are not obvious and have
an element of surprise, like in the Byrne-Fischer game of 1963-1964.

In a talk at the "Mathematical Depth Workshop" of April 11-12, 2014, John Stillwell gave the
following examples of deep theorems: Dirichlet's theorem on primes in an arithmetic progression,
Perelman's theorem on Poincaré's conjecture, Fermat's last theorem and the classification
of finite simple groups. A deep theorem should be difficult, surprising, important, fruitful,
elegant and fundamental. As less deep but accessible, he gives the independence of the parallel
postulate, the fundamental theorem of algebra, the existence of division algebras, the Riemann
integrability of continuous functions and the uncountability of R. Robert Geroch said in that same
workshop that deep theorems should be detached from connections with people, or else have
connections with physics: examples are representations of the Lie group SL(2, C), the TCP
theorem or the appearance of symmetric hyperbolic partial differential equations. Jeremy
Gray then stressed the importance of multiple proofs, to give more reasoning, show different
methodologies, see new routes or produce more purity. He said that the difference between
deep and difficult is that deep things should be more hidden. Deep, according to Gauss, "has to
be difficult". The result may be elegant or beautiful, but the proof needs to be difficult. Marc
Lange [461] argues for assigning the attribute deep to the proof of a theorem and not to the theorem
itself. The reason is that there could be multiple proofs, where one proof is deeper than the
other. This could mean, for example, that a theorem which is considered deep still has a
deep proof even if it turns out to be provable in a very simple and dull way.

The fate of fame

Aesthetics is a fragile subject. If something beautiful has become too popular and so entered
pop culture, a natural aversion against it can develop. The feeling is justified that popular
things are often frivolous. Such things are also in danger of becoming a cliché or even kitsch
(a word used to tear down popular stuff or to label poor taste). The Mandelbrot set,
for example, is just marvelous, but it hardly excites anymore because it is so commonly
known. The Monty Hall problem, which became famous by Gardner columns in the early
1990s (see [658, 609]), was cool to teach in 1994, three years after the infamous "Parade
column" of 1991 by Marilyn vos Savant which blew it into the spotlight. But especially after
a cameo in the movie "21", the theorem has become part of mathematical kitsch. I myself
love mathematical kitsch. A topic that gained that status must have been nice and innovative
to obtain that label. Kitsch only becomes tiresome if it is not presented in a new and
original form. The book [566], in the context of complex dynamics, remains a masterpiece
today, even though the pictures have become only too familiar; but rendering the Mandelbrot set
today in that same way hardly rocks the boat any more. Still, it remains fascinating,
and YouTube allows one to see sophisticated zooms down to the size of $10^{-200}$. In that
context, it appears strange that mathematicians do not jump on the "Mandelbulb set" M, a
three-dimensional version of the Mandelbrot set which is one of the most beautiful mathematical
objects. The reason could be that as a "YouTube star" it is not yet considered worthy of any serious
academic consideration; more likely, however, the object is just too difficult for a serious study,
as we lack the analytic tools which would allow us to answer even a basic
question like whether M is connected. A second example is catastrophe theory [585, 719],
a beautiful part of singularity theory which started with Hassler Whitney and was then
developed by René Thom [708]. It was hyped so much that it took a deep fall from which it
has not yet fully recovered. This happened despite the fact that Thom himself already pointed
out the limits as well as the controversies of the theory [87]. It had to pay a price for its fame
and appears to be forgotten. Chaos theory from the 1960s, which started to peak with Edward
Lorenz and terms like "butterfly effect" or "strange attractors", became a cliché at the
latest after that infamous scene featuring the character Ian Malcolm in the 1993 movie Jurassic
Park. It was laughed at already within the same movie franchise, when in the third Jurassic
Park installment of 2001, the kid Erik Kirby scoffs at Malcolm's "preachiness" and quotes his
statement "everything is chaos" in a condescending way. In art, architecture, music, fashion or
design also, if something has become too popular, it is despised by the "connoisseurs". Hardly
anybody would consider a "lava lamp" (invented in 1963) an object of taste nowadays, even though
the fluid dynamics and motion are objectively rich and interesting, also illustrating
deformation techniques in geometry like the Ricci flow. The piano piece "Für Elise" by Ludwig
van Beethoven became so popular that it cannot even be played any more as background music
in a supermarket. There is something which prevents a "serious music critic" from admitting that
the piece is great, genius in its simplicity. Such examples suggest that it might be better
for an achievement (or a theorem in mathematics) not to enter pop culture, as this indicates a
"lack of deepness" and is therefore despised by the elite. The principle of having fame torn
down to disgrace is common also outside of mathematics. Famous actors, entrepreneurs or
politicians are not universally admired but sometimes hated intensely, or torn to pieces, and
can hardly live normal lives any more. The phenomenon of accumulated critique
got amplified by mob-type phenomena in social media. There must be something fulfilling
in trashing achievements, the simplest explanation being envy. Film critics are often harsh and
judge negatively because this elevates their own status, as they appear to "have a high standard".
Similarly, moral judgement is often expressed just to elevate the status of the judge, even though
experience has shown that judges are often offenders themselves and the critique turns out to
be a compensation. Maybe it is also human "Schadenfreude", or greed, which makes so many
voice critique. History has shown, however, that social value systems do not matter much
in the long term. A good and rich theory will show its true value if it is appreciated also in
hundreds of years, when fashion and social influence no longer have any impact. The theorem
of Pythagoras will be important independent of fame, and even if it has become a cliché, it is
too important to be labeled as such. It has not only earned the status of kitsch, it is also a
prototype as well as a useful tool.
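
Part of the charm of the Monty Hall problem mentioned above is that the answer can be checked by plain enumeration. The following Python sketch computes the exact winning probabilities of staying and of switching. It assumes the standard rules: the car is placed uniformly at random, the host always opens a goat door other than the contestant's pick (choosing uniformly when he has a choice), and the switch is always offered. The function name and the optional number of doors (at least 3) are our own additions.

```python
from fractions import Fraction


def monty_hall_win_probabilities(doors=3):
    """Exact (P(win | stay), P(win | switch)) under the standard rules; needs doors >= 3."""
    stay = Fraction(0)
    switch = Fraction(0)
    for car in range(doors):
        for pick in range(doors):
            p = Fraction(1, doors * doors)  # car position and first pick are uniform
            goats = [d for d in range(doors) if d != car and d != pick]
            for opened in goats:
                q = p / len(goats)  # host opens one allowed goat door uniformly
                if pick == car:
                    stay += q
                # switching means picking uniformly among the other closed doors
                others = [d for d in range(doors) if d not in (pick, opened)]
                switch += q * Fraction(others.count(car), len(others))
    return stay, switch


stay, switch = monty_hall_win_probabilities()
print(stay, switch)
```

With three doors this gives 1/3 for staying and 2/3 for switching, so switching doubles the chance of winning.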

Media

There is no question that the Pythagorean theorem, the Euler polyhedron formula
$\chi = v - e + f$, the Euler identity $e^{i\pi} + 1 = 0$, or the Basel problem formula
$1 + 1/4 + 1/9 + 1/16 + \cdots = \pi^2/6$ will always rank highly in any list of beautiful formulas. Most
mathematicians agree that they are elegant and beautiful. These results will also in the future keep top
spots in any ranking. On social networks, one can find lists of favorite formulas. On "Quora", one can
find the arithmetic mean-geometric mean inequality $\sqrt{ab} \le (a + b)/2$ or the geometric
summation formula $1 + a + a^2 + \cdots = 1/(1 - a)$ for |a| < 1 high up. One can also find strange contributions
in social media like the identity 1 = 0.99999 . . . , which is used by Piaget-inspired educators
to probe the mathematical maturity of kids. Similarly as in Piaget's experiments, there is a time of
mathematical maturity when a student starts to understand that this is an identity. A very
young student thinks 1 is larger than 0.9999... even when asked to point out a number in between.
Such threshold moments can be crucial, for example, for mathematical success later. We have a
strange fascination with "wunderkinds", kids in whom some mathematical abilities have come
earlier (even though the existence of each wunderkind produces devastating collateral damage in
its neighborhood, as their success sucks out any motivation of immediate peers). The problem
is also that if somebody does not pass these Piaget thresholds early, teachers and parents
consider them lost; they get discouraged and become uninterested in math (the situation in
art or sport is similar). In reality, slow learners for whom the thresholds are passed later are
often deeper thinkers and can produce deeper or more extraordinary results. At the moment,
searching for "the most beautiful formula in mathematics" gives the Euler identity, and search
engines agree. But the concept of taste in a time of social media can be confusing. We live in
an epoch where a 17-year-old "social influencer" can in a few days gather more "followers" and
become more widely known than Sophie Kovalewskaya, who made fundamental, beautiful
and lasting contributions in mathematics and physics like the Cauchy-Kovalevskaya theorem.
Such a theorem is definitely more lasting than a few "selfie shots" of a pretty face, but measured
by a "majority vote", it would not only lose, it would completely disappear. One can find
YouTube videos of kids explaining the 4th dimension, which are watched millions of times, many
thousand times more than videos of mathematicians who have created deep new mathematical
insight about four-dimensional space. But time rectifies. Kovalewskaya will also be ranked
highly in 50 years, while the pretty face has faded. Hardy put this even more extremely by
comparing a mathematician with a literary heavyweight: "Archimedes will be remembered when
Aeschylus is forgotten, because languages die and mathematical ideas do not" [317]. There is no
doubt that film and TV (and now the internet with "YouTube", social networks and "blogs") have a
great short-term influence on the value or exposure of a mathematical field. Examples of movies
with influence are "It is my turn" (1980) or "Antonia's line" (1995), featuring some algebraic
topology, "Good will hunting" (1997), in which some graph theory and Fourier theory appear, and
"21" (2008), which has a scene in which the Monty Hall problem has a cameo. "The man
who knew infinity" displays the work of Ramanujan and promotes some combinatorics like
the theory of partitions. There are lots of movies featuring cryptology like "Sneakers" (1992),
"Breaking the code" (1996), "Enigma" (2001) or "The imitation game" (2014). For TV, mathematics
was promoted nicely in "Numb3rs" (2005-2010). For more, see [582] or my own online
"math in movies" collection.
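
As a small numerical companion to the Basel formula $\sum_{n \ge 1} 1/n^2 = \pi^2/6$, the Python sketch below computes partial sums $S_N$; the names are our own. It compares the remainder with the elementary bounds $1/(N+1) < \pi^2/6 - S_N < 1/N$. These bounds follow from comparing the tail $\sum_{n > N} 1/n^2$ with the integral of $1/x^2$. The output shows how slowly the series converges.

```python
import math


def basel_partial_sum(N):
    """S_N = 1 + 1/4 + ... + 1/N^2, summed from the small terms up for accuracy."""
    return sum(1.0 / (n * n) for n in range(N, 0, -1))


for N in (10, 100, 1000, 10000):
    remainder = math.pi**2 / 6 - basel_partial_sum(N)
    # integral comparison: 1/(N+1) < remainder < 1/N
    print(N, remainder, 1 / (N + 1) < remainder < 1 / N)
```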

Professional opinions

Interviews with professional mathematicians can also probe the waters. In [440], Natasha
Kondratieva asked a number of mathematicians: "What three mathematical formulas are the
most beautiful to you?" The formulas of Euler or the Pythagorean theorem were naturally
ranked high. Interestingly, Michael Atiyah even included a formula "Beauty = Simplicity +
Depth". Other results appeared too, like the Leibniz series π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − . . . , the
Maxwell equations $dF = 0$, $d^*F = J$, the Schrödinger equation $i\hbar u' = (i\hbar\nabla + eA)^2 u + Vu$,
the Einstein formula $E = mc^2$, Euler's golden key $\sum_{n=1}^\infty 1/n^s = \prod_p (1 - 1/p^s)^{-1}$,
the Gauss identity $\int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}$ or the volume of the unit ball in $\mathbb{R}^{2n}$ given as
$\pi^n/n!$. Gregory Margulis mentioned an application of the Poisson summation formula
$\sum_n f(n) = \sum_n \hat{f}(n)$, with $\hat{f}(\xi) = \int f(x) e^{-2\pi i x\xi}\,dx$, to a Gaussian, which gives
$\sum_n e^{-n^2/4} = 2\sqrt{\pi}\sum_n e^{-4\pi^2 n^2}$, or the quadratic reciprocity law
$(p|q)(q|p) = (-1)^{\frac{p-1}{2}\frac{q-1}{2}}$ for distinct odd primes p, q, where (p|q) = 1 if p is a quadratic
residue modulo q and −1 otherwise. Robert Minlos gave the Gibbs formula, a Feynman-Kac formula or the
Stirling formula. Yakov Sinai mentioned the Gelfand-Naimark realization of an Abelian $C^*$-algebra
as an algebra of continuous functions or the second law of thermodynamics. Anatoly
Vershik gave the generating function $\prod_{k=1}^\infty (1 - x^k)^{-1} = \sum_{n=0}^\infty p(n)x^n$ for the partition
function p(n) and the generalized Cauchy inequality between arithmetic and geometric mean. In that
article, David Ruelle quoted Grothendieck: "my life's ambition as a mathematician, or rather my joy
and passion, have constantly been to discover obvious things . . . ". Combining Grothendieck's and
Atiyah's quotes, fundamental theorems should be "obvious, beautiful, simple and still deep".
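
Two of the formulas in this list can be checked mechanically in a few lines. The Python sketch below verifies quadratic reciprocity for all pairs of distinct odd primes below a bound; the function names are our own. It computes Legendre symbols with Euler's criterion $(a|p) \equiv a^{(p-1)/2} \bmod p$. It also expands the product $\prod_k (1 - x^k)^{-1}$ as a truncated power series to list the partition numbers p(n). A finite check illustrates the statements; it does not prove them.

```python
def legendre(a, p):
    """Legendre symbol (a|p) for an odd prime p, via Euler's criterion."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r  # r is 0, 1 or p - 1


def odd_primes(limit):
    return [n for n in range(3, limit) if all(n % d for d in range(2, int(n**0.5) + 1))]


def reciprocity_holds(limit=60):
    ps = odd_primes(limit)
    return all(
        legendre(p, q) * legendre(q, p) == (-1) ** (((p - 1) // 2) * ((q - 1) // 2))
        for p in ps for q in ps if p != q
    )


def partition_numbers(n_max):
    """Coefficients p(0), ..., p(n_max) of prod_{k>=1} 1/(1 - x^k)."""
    p = [1] + [0] * n_max
    for k in range(1, n_max + 1):
        # multiplying by 1/(1 - x^k) = 1 + x^k + x^(2k) + ... as an in-place running sum
        for n in range(k, n_max + 1):
            p[n] += p[n - k]
    return p


print(reciprocity_holds())
print(partition_numbers(10))
```

The second line lists 1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42, the familiar values p(0), ..., p(10).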
A recent column "Roots of unity" in Scientific American asks mathematicians for their
favorite theorem: examples are Noether's theorem, the uniformization theorem, the Ham
Sandwich theorem, the fundamental theorem of calculus, the circumference of the
circle, the classification of compact 2-surfaces, Fermat's little theorem, the Gromov
non-squeezing theorem, a theorem about Betti numbers, the Pythagorean theorem, the
classification of Platonic solids, the Birkhoff ergodic theorem, the Burnside lemma,
the Gauss-Bonnet theorem, Conway's rational tangle theorem, Varignon's theorem, an
upper bound on Reidemeister moves in knot theory, the asymptotic number of relatively
prime pairs, the Mittag-Leffler theorem, a theorem about spectral sparsifiers, the
142
OLIVER KNILL

Yoneda lemma and the Brouwer xed point theorem. These interviews illustrate also
that the choices are dierent if asked for personal favorite theorem" or objectively favorite
theorem".

Fundamental versus important

Asking for fundamental theorems is different from asking for "deep theorems" or "important theorems". Examples of deep theorems are the Atiyah-Singer or Atiyah-Bott theorems in differential topology, the KAM theorem related to the strong implicit function theorem, or the Nash embedding theorem in Riemannian geometry. Other examples are the Gauss-Bonnet-Chern theorem in Riemannian geometry or the Pesin theorem in non-uniformly hyperbolic dynamical systems. Maybe the shadowing lemma in hyperbolic dynamics is more fundamental than the much deeper Pesin theorem, which is still too complex to be proven in full detail in a classroom; even excellent textbooks like [581, 391] do not prove the full theorem establishing the Bernoulli property on ergodic components. One can also argue whether the "theorema egregium" of Gauss, stating that the curvature of a surface is intrinsic and does not depend on an embedding, is more "fundamental" than the "Gauss-Bonnet" result, which is definitely deeper. In number theory, one can argue that the quadratic reciprocity formula is deeper than the little theorem of Fermat or the Wilson theorem. (The latter gives an if-and-only-if criterion for primality but is still far less important than the little theorem of Fermat, which is used in many applications.) The last theorem of Fermat [81] is an example of an important theorem, as it is deep and related to other fields and culture, but it is not yet so much a "fundamental theorem". Similarly, the Perelman theorem settling the Poincaré conjecture is important, but it is not (yet) a fundamental theorem. It is still a mountain peak and not a sediment in a rock. Important theorems are not much used by other theorems, as they are located at the end of a development. Also the solution to the Kepler problem on sphere packings, the proof of the 4-color theorem [128] or the proof of the Feigenbaum conjectures [169, 367] are important results but not so much used by other results. Important theorems build the roof of the building, while fundamental theorems form the foundation on which a building can be constructed. But this can depend on time: what is the roof today might be in the foundation later on, once more floors have been added.

Essential math

In education, it is necessary to regularly re-examine what a student of mathematics needs to know. What are the essential fields in mathematics? Also here, there are many opinions and things are always in flux. The seven liberal arts were an early attempt to organize knowledge on a larger scale. While quaternions were considered essential in the 19th century, for example, they fell out of the curriculum, and today it is well possible that a student learns about division algebras only in graduate school. One of the questions is how to balance applicability and elegance. In pure mathematics, one might focus more on beauty and elegance; in applied mathematics, applicability is important. As the field of mathematics has expanded enormously, there is the problem of fragmentation. At the same time, the mathematical fields have also split. Some domains have been "taken over" by new departments like applied mathematics, computer science or statistics. Discrete mathematics courses like graph theory, theory of computation or cryptology are now in the hands of computer science, differential equations or numerical analysis in the hands of applied mathematics departments, and probability theory courses are taught by statistics departments. Still, there is a core of mathematical content which a mathematician should at least have been exposed to. A student studying the subject should probably keep an eye both on getting into a field which looks promising for research and on having a broad general education in all possible fields. One can get an idea of what is required in various mathematics departments by looking at what are called "general examinations" or "qualifying examinations". These are exams given to first-year graduate students, which have to be passed. Departments like Harvard [174] or Princeton [690] have made many of these questions public. Also here, one could go to the AMS classification and grind through all topics. Instead, let us attempt to put it all in one box, being aware that other priorities can work too:
Pre-calculus: Algebra, Trig functions, Log and Exp Functions, Graphs, Modeling, Geometry, Solving equations, Inequalities
Single variable: Functions, Limits, Continuity, Differentiation, Integration, Series, Differential equations, Fundamental theorem
Multi variable: Vectors, Geometry, Functions, Differentiation, Integration, Vector calculus (Green, Stokes and Gauss)
Linear algebra: Linear equations, Determinants, Eigenvalues, Projection and Data-fitting, Differential equations, Fourier theory
Dynamical systems: Iteration of maps, Ordinary and partial differential equations, Bifurcation theory, Integrability, Ergodic Theory
Probability: Probability spaces, Random variables, Distributions, Stochastic Processes, Statistics, Data, Estimation
Discrete math: Combinatorics, Graphs, Order structures, Counting tools, Theory of computation, Complexity, Game Theory
Numerics: Algorithms, Integration, Solving ODE's, PDE's, Approximation techniques, Interpolation, Comput. Geometry
Analysis: Functional analysis, Banach algebras, Complex analysis, Harmonic analysis, Fourier theory, Laplace, PDE's
Algebra: Groups and Rings, Modules, Vector Spaces, Commutative algebra, Non-commutative Rings, Galois theory
Number theory: Primes, Diophantine equations and approximations, Geometry of numbers, Dirichlet Series, Zeta function
Geometry: Differential topology, Differential Geometry, Geodesics, Curvature, Invariants, Geometric Measure theory
Alg. Geometry: Affine and Projective varieties, Ringed spaces, Schemes, Sheaf Theoretical Methods, Cohomology, Categories
Topology: Set theoretical topology, Fractal Geometry, Differential topology, Homotopy, Algebraic Topology, Topos theory
Logic: First/second order Logic, Foundations, Models, Incompleteness, Forcing, Computability, New Axiom systems
Real analysis: Foundations, Metric spaces, Measure theory, Theory of integration on delta rings, Non-standard analysis
Computer Science: Math software, Programming Paradigms, Computer Architecture, Data structures, Big Data, Machine Learning
Connections: History, Big picture, Number systems, Notation, Linguistics, Psychology, Philosophy, Sociology and Pedagogy

Open problems

The importance of a result is also related to the open problems attached to it. Open problems fuel new research and new concepts. Of course this is a moving target: any "value functional" for "fundamental theorems" is time dependent and also a bit a matter of fashion, of entertainment (TV series like "Numbers" or Hollywood movies like "Good Will Hunting" changed the value) and of the influence of big shot mathematicians who serve as "influencers". Some problems come in famous lists or have prizes attached, like the 23 problems of Hilbert, the 15 problems of Simon [649], the 18 problems of Smale, the Yau problems in geometry [761], the 7 Millennium problems or the four Landau problems (the Goldbach conjecture, the twin prime conjecture, the existence of primes between consecutive squares and the existence of infinitely many primes of the form $n^2 + 1$), and then there is the oldest problem of mathematics, the existence of odd perfect numbers.
There are beautiful open problems in any major field, and building a ranking would be as difficult as ranking theorems. It is a bit of a personal matter. I like the odd perfect number problem because it is the oldest problem in mathematics. Also Landau's list of 4 problems is clearly at the top. These problems are shockingly short and elementary but brutally hard, having resisted more than a century of attacks by the best minds. There are other problems where one believes that the mathematics needed to tackle them has just not been developed yet, an example being the Collatz (3k+1) problem. With respect to the Millennium problems, one could argue that the Yang-Mills mass gap problem is rather vague. The problem looks "made by humans", while a problem like the odd perfect number problem has been "made by the gods".

There appears to be wide consensus that the Riemann hypothesis is the most important open problem in mathematics. It states that the nontrivial roots of the Riemann zeta function are all located on the axis $\mathrm{Re}(z) = 1/2$. In number theory, the twin prime problem or the Goldbach problem have a high exposure because they can be explained to a general audience without mathematics background. For some reason, an equally simple problem, the Landau problem asking whether there are infinitely many primes of the form $n^2 + 1$, is much less well known. In recent years, an alleged proof of the ABC conjecture by Shinichi Mochizuki, using a new theory called Inter-Universal Teichmüller Theory (IUT) which so far has not been accepted by the mainstream mathematical community despite strong efforts, has put the ABC conjecture from 1985 in the spotlight [754]. It has been described in [272] as the most important problem in Diophantine equations. It can be expressed using the quality $Q(a,b,c) = \log(c)/\log(\mathrm{rad}(abc))$ of three integers $a, b, c$, where the radical $\mathrm{rad}(n)$ of a number $n$ is the product of the distinct prime factors of $n$. The ABC conjecture states that for any real number $q > 1$ there exist only finitely many triples $(a,b,c)$ of relatively prime positive integers with $a + b = c$ for which $Q(a,b,c) > q$. The triple with the highest quality known so far is $(a,b,c) = (2, 3^{10}\cdot 109, 23^5)$; its quality is $Q = 1.6299$. Then there are entire collections of conjectures, one being the Langlands program, which relates different parts of mathematics like number theory, algebra, representation theory or algebraic geometry. My personal favorite problem is the entropy problem in smooth dynamical systems theory [389]. The Kolmogorov-Sinai entropy of a smooth dynamical system can be described using Lyapunov exponents. For many conservative systems like smooth convex billiards, numerical measurements indicate positive entropy, but one is unable to prove it. An example of a smooth billiard is the real analytic $l^4$ table $x^4 + y^4 = 1$ [375]. For ergodic theory, see [158, 172, 248, 654].
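The quality of the record triple $(2, 3^{10}\cdot 109, 23^5)$ can be recomputed in a few lines. This Python sketch factors by trial division, which is fine for numbers of this size but is not meant for large inputs; the function names `rad` and `quality` are hypothetical helpers that mirror the notation $\mathrm{rad}(n)$ and $Q(a,b,c)$.

```python
from math import gcd, log

def rad(n):
    """Product of the distinct prime factors of n (trial division)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            result *= d
            while n % d == 0:  # divide out every copy of the prime d
                n //= d
        d += 1
    return result * n if n > 1 else result  # leftover n > 1 is a prime

def quality(a, b, c):
    """Q(a, b, c) = log(c) / log(rad(abc)) for coprime a + b = c."""
    assert a + b == c and gcd(a, b) == 1
    return log(c) / log(rad(a * b * c))

a, b, c = 2, 3**10 * 109, 23**5
print(round(quality(a, b, c), 4))
```

Here $\mathrm{rad}(abc) = 2 \cdot 3 \cdot 23 \cdot 109 = 15042$ is much smaller than $c = 6436343$, which is what makes the quality exceed 1.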

Finally, one should mention that problems which have already been solved are also fair game for new or simpler proofs. It has happened many times that first proofs were either too complicated or even had gaps. For problem collections, see Math Olympiad or Putnam problem books like [695, 661, 595], thematic collections [20], or short stories like [497, 627].

Classification results

One can also see classification theorems like the above mentioned Gelfand-Naimark realization as mountain peaks in the landscape of mathematics. Examples of classification results are the classification of regular or semi-regular polytopes, the classification of discrete subgroups of a Lie group, the "classification of Lie algebras", the "classification of von Neumann algebras", the "classification of finite simple groups", the classification of Abelian groups, or the classification of finite-dimensional associative division algebras over the reals, which by Frobenius are either the real, the complex or the quaternion numbers. Not only in algebra but also in differential topology, one would like to have classifications, like the classification of d-dimensional manifolds. In topology, an example of a classification result is that every Polish space is homeomorphic to some subspace of the Hilbert cube. Related to physics is the insight that "functionals" are important. Uniqueness results help to render a functional important and fundamental. Valuations of fields are classified by Ostrowski's theorem: every nontrivial absolute value on the rational numbers is equivalent either to the usual absolute value or to a p-adic norm. The Euler characteristic, for example, can be characterized as the unique valuation on simplicial complexes which assumes the value 1 on simplices, or as a functional which is invariant under Barycentric refinements. A theorem of Claude Shannon [643] identifies the Shannon entropy as the unique functional on probability spaces which is compatible with additive and multiplicative operations on probability spaces and satisfies some normalization condition.
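To make the characterization of the Euler characteristic concrete, here is a small Python sketch that computes $\chi(G) = \sum_x (-1)^{\dim(x)}$ for a finite simplicial complex given by its faces, and for its Barycentric refinement, whose simplices are the chains $x_1 \subset x_2 \subset \cdots \subset x_k$ of faces. Checking three examples (a triangle, a circle and the boundary of a tetrahedron) only illustrates the invariance; it does not prove it. The function names are ad hoc choices.

```python
from itertools import combinations

def closure(facets):
    """All nonempty faces of the simplicial complex generated by the facets."""
    faces = set()
    for f in facets:
        for k in range(1, len(f) + 1):
            faces.update(frozenset(c) for c in combinations(sorted(f), k))
    return faces

def euler_characteristic(faces):
    """chi(G) = sum over faces x of (-1)^dim(x), where dim(x) = |x| - 1."""
    return sum((-1) ** (len(x) - 1) for x in faces)

def barycentric_refinement(faces):
    """Complex whose vertices are the faces of G and whose simplices are
    the chains x_1 < x_2 < ... < x_k of faces (strict inclusions)."""
    faces = list(faces)
    chains = set()

    def extend(chain):
        chains.add(frozenset(chain))
        for y in faces:
            if chain[-1] < y:  # y strictly contains the last face of the chain
                extend(chain + [y])

    for x in faces:
        extend([x])
    return chains

triangle = closure([{1, 2, 3}])
circle = closure([{1, 2}, {2, 3}, {1, 3}])
sphere = closure([{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}])
for name, g in [("triangle", triangle), ("circle", circle), ("sphere", sphere)]:
    print(name, euler_characteristic(g), euler_characteristic(barycentric_refinement(g)))
```

For these three complexes the sketch prints the values 1, 0 and 2, each twice. For the triangle, for instance, the refinement has 7 vertices, 12 edges and 6 triangles, and $7 - 12 + 6 = 1 = 3 - 3 + 1$.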

Bounds and inequalities

Another class of important theorems are best bounds, like the Hurwitz estimate stating that for every irrational number $x$ there are infinitely many rationals $p/q$ for which $|x - p/q| < 1/(\sqrt{5}\,q^2)$. For packing problems, one wants to find the best packing density, like for sphere packing problems in Euclidean space. In complex analysis, one has the maximum principle, which assures that a non-constant harmonic function $f$ cannot have a local maximum in the interior of its domain of definition. One can argue for including this as a fundamental theorem, as it is used by other theorems like the Schwarz lemma (named after Hermann Amandus Schwarz) from complex analysis, which is used in many places. In probability theory or statistical mechanics, one often has thresholds where some phase transition appears. Computing these values is often important. The concept of maximizing entropy explains many things, like why the Gaussian distribution is fundamental: it maximizes entropy among distributions with given mean and variance. Measures maximizing entropy are often special and are often equilibrium measures. This is a central topic in statistical mechanics [614, 615]. In combinatorial topology, the upper bound theorem was a milestone. It was conjectured by Theodore Motzkin, proven by Peter McMullen for convex polytopes and extended by Richard Stanley to simplicial spheres: cyclic polytopes maximize the number of faces of each dimension among polytopes of a given dimension with a given number of vertices. Fundamental are also some inequalities [264] like the Cauchy-Schwarz inequality $|a \cdot b| \leq |a||b|$ or the Chebyshev inequality $P[|X - E[X]| \geq |a|] \leq \mathrm{Var}[X]/a^2$. In complex analysis, the Hadamard three circle theorem is important as it bounds the maximum of $|f|$ on a circle in terms of the maxima on two concentric circles bounding an annulus on which $f$ is holomorphic. Often inequalities are more fundamental and powerful than equalities because they are more widely used. Related to inequalities are embedding theorems like the Sobolev embedding theorems. For more inequalities, see [104]. Apropos embedding, there are the important and appealing Whitney and Nash embedding theorems.
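The constant $\sqrt{5}$ in the Hurwitz estimate cannot be improved, and the golden ratio $\phi = (1+\sqrt{5})/2$ is the standard example. The following minimal Python sketch uses ordinary floating point arithmetic, an assumption that is adequate for the moderate denominators used here. The continued fraction convergents of $\phi$ are quotients $p/q$ of consecutive Fibonacci numbers, and the sketch shows $q^2|\phi - p/q|$ approaching $1/\sqrt{5} \approx 0.4472$. The name `fibonacci_convergents` is an ad hoc helper.

```python
from math import sqrt

phi = (1 + sqrt(5)) / 2

def fibonacci_convergents(count):
    """Yield (p, q) = (F(n+1), F(n)), the continued fraction convergents of phi."""
    p, q = 1, 1
    for _ in range(count):
        yield p, q
        p, q = p + q, p

# q^2 |phi - p/q| for the first 20 convergents; the values oscillate
# around 1/sqrt(5) and converge to it.
values = [q * q * abs(phi - p / q) for p, q in fibonacci_convergents(20)]
for v in values[-4:]:
    print(v)
print("1/sqrt(5) =", 1 / sqrt(5))
```

Every second convergent satisfies the strict inequality $|\phi - p/q| < 1/(\sqrt{5}\,q^2)$, which already gives the infinitely many approximations the theorem promises, while the limit $1/\sqrt{5}$ is consistent with the fact that no larger constant than $\sqrt{5}$ works for $x = \phi$.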

Big ideas

Classifying and valuing big ideas is even more difficult than ranking individual theorems. Examples of big ideas are the idea of axiomatization, which started with planar geometry and number theory as described by Euclid, and the concept of proof or later the concept of models. Archimedes' idea of comparison led to ideas like the Cavalieri principle, integral geometry or measure theory. René Descartes' idea of coordinates allowed to work on geometry using algebraic tools. The use of infinitesimals and limits led to calculus, allowing to merge the concepts of rate of change and accumulation. The idea of extrema led to the calculus of variations, to Lagrangian and Hamiltonian dynamics and to descriptions of fundamental forces. Maximizing quantities like entropy leads to fundamental distributions like the Gaussian, exponential, Binomial or uniform distributions. Cantor's set theory provides a universal simple language covering all of mathematics. The Klein Erlangen program classifies and characterizes geometries through "symmetry". There is the abstract idea of a group or of more general mathematical structures like monoids. Then there is the concept of extending number systems, like completing the real numbers or extending them to the quaternions and octonions, or producing p-adic numbers or hyperreal numbers. In the context of complex numbers comes the general idea of algebraically completing a field. There is also a topological completion, like when passing from the rationals to the reals. Nice corner stones are fields that are both algebraically and topologically complete. The idea of logarithms started as a computation tool by Napier or Bürgi [670]. An idea of Galois is to relate problems about solving equations with field extensions and symmetries. The idea of equivalence classes is used when looking at projective spaces or ideals. A big idea is to see prime ideals as a more fundamental replacement for "maximal ideals" or "points", leading to the notion of the spectrum of a ring and, by gluing, to the notion of schemes, vastly expanding classical algebraic geometry or manifold theory, which patches together locally Euclidean parts. The Grothendieck program of "geometry without points", or "locales" as topologies without points, was an idea to overcome shortcomings of set theory. This led to new objects like schemes or topoi. Central in algebra, geometry and number theory is the idea of localization, which allows to extend a ring so that one can start "dividing", the prototype being the field of fractions, like the construction of rational functions from polynomials. Another basic big idea is the concept of duality, which appears in many places like in projective geometry, in polyhedra, Poincaré duality, Pontryagin duality or Langlands duality for reductive algebraic groups. The idea of dimension measures topological spaces numerically, leading to fractal geometry. The idea of almost periodicity is an important generalization of periodicity. Crossing the boundary of integrability leads to the important paradigm of stability and randomness [530] and the interplay of structure and randomness [701]. These themes are related to harmonic analysis and integrability, as integrability means that for every invariant measure one has, measure theoretically, a group translation. This means that for any invariant measure we have almost periodicity and so predictability. Integrability is also related to spectral properties in solid state physics or, via Koopman theory, in ergodic theory, or to fundamental new number systems like the p-adic numbers: the p-adic integers form a compact topological group on which the translation is almost periodic. It also leads to problems in Diophantine approximation. The concept of algorithm builds the foundation of computation using precise mathematical notions. Algebra is used to track problems in topology, starting with mathematicians like Gustav Kirchhoff, Enrico Betti, Henri Poincaré or Emmy Noether.
Another important principle is to reduce a problem to a fixed point problem. This often leads to universality, like for the central limit theorem (where the Gaussian distribution is the fixed point). The categorical approach is not only a unifying language but also allows for generalizations of concepts that help to solve problems. Examples are generalizations of Lie groups in the form of group schemes. Then there is the deformation idea, which was used for example in the Perelman proof of the Poincaré conjecture. Deformation often comes in the form of partial differential equations, in particular heat type equations. Deformations can be abstract, in the form of homotopies, or more concrete, by analyzing partial differential equations like the mean curvature flow or the Ricci flow. Another important line of ideas is to use probability theory to prove results, even in combinatorics. A probabilistic argument can often give the existence of objects which one cannot even construct. An example is to define a sequence of simplicial complexes $G_n$ with $n$ nodes for which the Euler characteristic $\chi(G_n) = \sum_x (-1)^{\dim(x)}$ is exponentially large in $n$. There is also the idea of non-commutative geometry, generalizing geometry through functional analysis, or the idea of discretization, which leads to numerical methods or computational geometry. The power of coordinates allows to solve geometric problems more easily. The above mentioned examples have all proven their use. Grothendieck's ideas have led to the solution of the Weil conjectures; fixed point theorems were used in game theory (like by John Nash for Nash equilibria) or to prove uniqueness of solutions of differential equations. They are also used to justify perturbation theory using renormalization schemes or iterative methods like in the KAM theorem about the persistence of quasi-periodic motion, a perturbation problem which also led to hard implicit function theorems (see e.g. [635] Chapter II). In the end, what really counts is whether the big idea can solve practical problems or can be used to prove new theorems (or to reprove old theorems more elegantly). The history of mathematics clearly shows that abstraction for the sake of abstraction or for the sake of generalization was rarely able to convince the mathematical community initially. But it can also happen that the breakthrough of a new theory or generalization only pays off much later, and that a subtle generalization actually pushes the tool into a realm where it can be used in other contexts. A big idea might have to age like a good wine.
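As an illustration of reducing a problem to a fixed point problem, one can look at the Picard iteration behind the existence and uniqueness theorem for ordinary differential equations. The initial value problem $y' = y$, $y(0) = 1$ (an example chosen here for simplicity) is equivalent to the fixed point equation $y = T(y)$ with $T(y)(t) = 1 + \int_0^t y(s)\,ds$. The Python sketch below iterates $T$ on polynomials stored as lists of exact rational coefficients; the iterates are the Taylor polynomials of the fixed point $e^t$. The function name `picard_step` is a hypothetical helper.

```python
from fractions import Fraction

def picard_step(coeffs):
    """One Picard step T(y)(t) = 1 + integral_0^t y(s) ds for y' = y, y(0) = 1.
    A polynomial c0 + c1 t + c2 t^2 + ... is stored as [c0, c1, c2, ...]."""
    integral = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integral[0] = Fraction(1)  # add the initial value y(0) = 1
    return integral

y = [Fraction(1)]  # start with the constant function y_0(t) = 1
for _ in range(10):
    y = picard_step(y)

value_at_1 = sum(y)  # evaluating the polynomial at t = 1
print(float(value_at_1))
```

After ten steps the coefficients are $1/k!$ for $k = 0, \dots, 10$, so the printed value is a partial sum of the series for $e$. On a short time interval $T$ is a contraction, which is how the Banach fixed point theorem yields existence and uniqueness.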

Paradigms

Once in a while there is an idea which completely changes the way we look at things. These are paradigm shifts as described by the philosopher and historian Thomas Kuhn, who relates them also to scientific revolutions [451]. For mathematics, there are various places where such fundamental changes have happened. One is the introduction of written numbers, which emerged independently in various different places. An early example is the tally mark notation on tally sticks (early sources are the Lebombo bone from 40 thousand years ago or the Ishango bone from 20 thousand years ago) or the technology of talking knots, the Khipu [475, 28, 717], which is a topological writing which flourished in the Tawantinsuyu, the Inka empire. Another example of a paradigm change is the development of proof, which required the insight that some mathematical statements are assumed as axioms from which, using logical deduction, new theorems are proven. Proof assistant frameworks like SAM [358], ACL2 [481], Coq [720], Isabelle [696] or Lean [309] (extended to Xena in an educational setting) have also emerged, allowing to build more reliability and accountability into proofs.
The fact that axiom systems can be deformed, like from Euclidean to non-Euclidean geometry, was definitely a paradigm change. On a larger scale, the insight that even the axiom systems of mathematics can be deformed and extended in various ways came only in the 20th century with Gödel. Before that, one was under the impression that one could base all of mathematics on a universal axiom system. This was Hilbert's program [762]. A third example of a paradigm change is the introduction of the concept of functions, which came surprisingly late. The modern concept of a function, which takes a quantity and assigns to it a new quantity, came only late in the 19th century with the development of set theory, which is a paradigm change too. There had also been a long struggle with understanding limits, which already puzzled Greek mathematicians like Zeno, but which only really became solid with clear definitions like those of Weierstrass and then with the concept of topology, where the concept of limit is absorbed within set theory, for example using the notion of filters. Related to functions is the use of functions to understand combinatorial or number theoretical problems, like through generating functions or Dirichlet series, allowing analytic tools to solve discrete problems like the existence of primes in arithmetic progressions. The opposite, the use of discrete structures like finite groups to understand the continuum, as in Galois theory, is another example of a paradigm change. It led to the insight that the quadrature of the circle or angle trisection cannot be done with ruler and compass. There are various other places where paradigm changes happened. A nice example is the axiomatization of probability theory by Kolmogorov, or the realization that statistics becomes a geometric theory if random variables are seen as vectors in a vector space: the correlation between two random variables is the cosine of the angle between centered versions of these random variables. Paradigm changes which are really fundamental can be surprisingly simple. An example is the Connes formula [145], which is based on the simple idea that distance can be measured by extremizing slope. This allows to push traditional geometry into non-commutative or discrete settings, where a priori no metric (notion of distance) is given. Another example is the extremely simple but powerful idea of the Grothendieck extension of a monoid to a group. It has been used throughout the history of mathematics to generate new number systems, starting with getting integers from natural numbers and rational numbers from integers. The idea of doubling appears when constructing complex numbers from real numbers or quaternions from complex numbers, or in the construction of surreal numbers or of games generalizing numbers. The idea is also used in dynamical systems theory to generate an invertible dynamical system from a not necessarily invertible one by extending time from a monoid to a group. In the context of Grothendieck, one should also mention that category theory, similarly as set theory at the beginning of the last century, changed the way mathematics is done and extended. Like the switch from relational databases to graph databases, it is a paradigm change stressing and focusing more on the relations (arrows) between objects (nodes) and not only on the objects (sets) themselves.

Taxonomies

When looking at mathematics overall, taxonomies are important. They not only help to navigate the landscape, they are also interesting from a pedagogical as well as a historical point of view. I borrow here some material from a more historical course, taught several times already, which is so global that a taxonomy is helpful. Organizing a field using markers is also important when teaching intelligent machines, a field which can be seen as the pedagogy for AI. The big bulk of work in [426] was to teach a bot mathematics, which means to fill in thousands of entries of knowledge. It can appear a bit mind-numbing, as it is a task similar to writing a dictionary. It appears that writing things down for a machine is actually even tougher than writing things down for a student. We cannot assume the machine to know anything it is not told. This document about fundamental theorems could, by the way, relatively easily be adapted into a "database of important theorems". We actually plan to do that by rendering each of the small stories a node in a database and labeling each with keywords, allowing the document to be rewritten automatically, for example by organizing the entries according to mathematical fields. It actually is one of my aims to feed it eventually to the Sofia bot. If the machine is asked about "important theorems in mathematics", it should be well informed, even though it is just a "stupid" encyclopedic data entry. Historically, when knowledge was still sparse, one classified teaching material using the liberal arts of sciences: the trivium (grammar, logic and rhetoric) as well as the quadrivium (arithmetic, geometry, music and astronomy). More specifically, one has built the eight ancient roots of mathematics, which are tied to activities: counting and sorting (arithmetic), spacing and distancing (geometry), positioning and locating (topology), surveying and angulating (trigonometry), balancing and weighing (statics), moving and hitting (dynamics), guessing and judging (probability) and collecting and ordering (algorithms). This then leads to topics like Arithmetic, Geometry, Number Theory, Algebra, Calculus, Set theory, Probability, Topology, Analysis, Numerics, Dynamics and Algorithms. The AMS classification is much more refined and distinguishes 64 fields. The Bourbaki point of view is given in [180]: it partitions mathematics into algebraic and differential topology, differential geometry, ordinary differential equations, ergodic theory, partial differential equations, non-commutative harmonic analysis, automorphic forms, analytic geometry, algebraic geometry, number theory, homological algebra, Lie groups, abstract groups, commutative harmonic analysis, logic, probability theory, categories and sheaves, commutative algebra and spectral theory. What are hot spots in mathematics? Michael Atiyah [31] distinguished dichotomies like local - global, low and high dimensional, commutative - non-commutative, linear - nonlinear, geometry - algebra, and physics - mathematics.

Key examples

The concept of experiment came even earlier and has always been part of mathematics. Exper-
iments allow to get good examples and set the stage for a theorem. Many important theorems
we are aware of came rst in experimental form. Pythagoras theorem appeared in experiments
done on Clay tablets. 2 Obviously the theorem can not contradict any of the examples. But
examples are more than just a tool to falsify statements; a good example can be the seed for a
new theory or for an entire subject. Here are a few examples: in smooth dynamical systems
the Smale horse shoe comes to mind, in dierential topology the exotic spheres of Mil-
nor, in one-dimensional dynamics the logistic map, or Hénon map, in perturbation theory
of Hamiltonian systems the Standard map featuring KAM tori or Mather sets, in homotopy
theory the dunce hat or Bing house, in combinatorial topology the Rudin sphere, the
Nash-Kuiper non-smooth embedding of a torus into Euclidean space, in topology there
is the Alexander horned sphere or the Antoine necklace. In complexity theory there is
the busy beaver problem in Turing computation which is an illustration with how small ma-
chines one can achieve great things, in group theory there is the Rubik cube which illustrates
many fundamental notions for nitely presented groups, in fractal analysis the Cantor set,
the Menger sponge, in Fourier theory the series of f (x) = x mod 1, in Diophantine approxi-
mation the golden ratio, in the calculus of sums the zeta function, in dimension theory the
Banach Tarski paradox. In harmonic analysis the Weierstrass function as an example
of a nowhere dierentiable function. The case of Peano curves giving concrete examples of
a continuous bijection from an interval to a square or cube. In complex dynamics not only
the Mandelbrot set plays an important role, but also individual, specic Julia sets can be
interesting. Examples like the Mandelbulb have not yet been investigated mathematically.
In mathematical physics, the almost Matthieu operator [166] produced a rich theory re-
lated to spectral theory, Diophantine approximation, fractal geometry and functional analysis.
Besides examples illustrating a typical case, it is also important to explore the boundary and
limitations of a theorem or theory by looking at counter examples. Collections of counter
examples exist in many elds like [261, 672, 593, 683, 752, 114, 414].

Physics

One can also make a list of great ideas in physics [204] and see the relations with the fundamental
theorems in mathematics. A high applicability should then contribute to a value functional
in the list of theorems. Great ideas in physics are the concepts of space and time, meaning
to describe physical events using differential equations. In cosmology, one of the insights was
to understand the structure of our solar system and to go from an earth-centered to a heliocentric
system; another is to look at space-time as a whole and realize the expansion of the universe,
or the idea of a big bang. More generally, there is the Platonic idea that physics is geometry.
Or calculus: Lagrange developed his calculus of variations to find laws of physics. Then
there is the idea of Lorentz invariance and, more generally, of symmetries, which leads to special
relativity; there is the idea of general relativity, which allows to describe gravity through
geometry and a larger symmetry seen through the equivalence principle. There is the idea of
describing elementary particles using Lie groups. There is the Noether theorem, which is the idea
that any symmetry is tied to a conservation law: translation symmetry leads to momentum
conservation, rotation symmetry to angular momentum conservation, for example. Symmetries
also play a role in spontaneous symmetry breaking or phase transitions. There is the
idea of quantum mechanics, which mathematically means replacing ordinary differential equations
with partial differential equations or replacing commutative algebras of observables with
non-commutative algebras. An important idea is the concept of perturbation theory and in
particular the notion of linearization. Many laws are simplifications of more complicated laws
and are described in the simplest cases through linear laws like Ohm's law or Hooke's law.
Quantization processes allow to go from commutative to non-commutative structures. Perturbation
theory then allows to extrapolate from a simple law to a more complicated law. Some of this is an
easy application of the implicit function theorem, some is harder, like KAM theory. There is the
idea of using discrete mathematics to describe complicated processes. An example is the
language of Feynman graphs, or the language of graph theory in general to describe physics as
in loop quantum gravity, or the language of cellular automata, which can be seen as partial
difference equations where also the function space is quantized. There is the idea of quantization,
a formal transition from an ordinary differential equation like a Hamiltonian system to a partial
differential equation, or replacing single particle systems with infinite particle systems (Fock).
There are other quantization approaches through deformation of algebras, which is related
to non-commutative geometry. There is the idea of using smooth functions to describe
discrete particle processes. An example is the Vlasov dynamical system or Boltzmann's
equation to describe a plasma, or thermodynamic notions to describe large sets of particles
like a gas or fluid. Dual to this is the use of discretization to describe a smooth system by
discrete processes. An example is numerical approximation, like using the Runge-Kutta
scheme to compute the trajectory of a differential equation. Another example is the tight
binding approximation, in which a continuum Schrödinger equation is replaced with a bounded
discrete Jacobi operator. There is the realization that we have a whole spectrum of dynamical
systems, integrability and chaos, and that some of the transitions are universal.
There is the general idea of finding the building blocks or elementary particles. Starting
with Democritus in ancient Greece, the idea got refined again and again. Once atoms were
detected and charges found to be quantized (Robert Millikan), the structure of the atom was
explored (Rutherford), and then the atom got split (Lise Meitner, Otto Hahn). The structure
of the nuclei with protons and neutrons was then refined again using quarks, leading to the
standard model in particle physics. There is furthermore the idea to use statistical methods
for complex systems. An example is the use of stochastic differential equations like diffusion
processes to describe actually deterministic particle systems. There is the insight that
complicated systems can form patterns through interplay between symmetry, conservation laws and
synchronization. Large scale patterns can be formed from systems with local laws. Finally,
there is the idea of solving inverse problems using mathematical tools like Fourier theory or
basic geometry (Eratosthenes could compute the radius of the earth by comparing the lengths
of shadows at different places of the earth). An example is tomography, where the structure
of some object is explored using resonance and where the reconstruction solves an inverse
problem. Then there is the idea of scale invariance, which allows to describe objects which
have fractal nature.

2 To quote Vladimir Arnold: "Mathematics is a part of physics where experiments are cheap."
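The Runge-Kutta scheme mentioned above fits in a few lines. The following is a minimal sketch, assuming the classical fourth-order variant and, as a hypothetical test case, the harmonic oscillator x'' = -x written as the first-order system (x, v)' = (v, -x); the function names `rk4_step` and `integrate` are our own.

```python
import math

def rk4_step(f, t, y, h):
    """One step of the classical fourth-order Runge-Kutta scheme for y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def oscillator(t, y):
    """Harmonic oscillator x'' = -x as the system (x, v)' = (v, -x)."""
    x, v = y
    return [v, -x]

def integrate(f, y0, t_end, steps):
    """Approximate the state at time t_end using a fixed step size."""
    h = t_end / steps
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

# After one full period 2*pi the exact trajectory returns to (x, v) = (1, 0).
x, v = integrate(oscillator, [1.0, 0.0], 2 * math.pi, 1000)
print(x, v)
```

The discrete trajectory only approximates the smooth one; halving the step size should reduce the error by roughly a factor of 16, reflecting the fourth order of the method.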

Computer science

As in physics, it is harder to pinpoint "big ideas" in computer science, as they are in general not
theorems. But it has been done [429]. The initial steps of mathematics were to build a language,
where numbers represent quantities [154]. Physical tools which assist in manipulating numbers
can already be seen as computing devices. Marks on a bone, pebbles in a clay bag, talking
knots in a Khipu [717, 28] and marks on a clay tablet were the first steps. Papyri, paper, magnetic,
optical and electric storage: the tools to build memory were refined over millennia. The
mathematical language allowed us to explore topics beyond the finite and also to build
databases. The Khipu concept was already an early form of graph database [8]. Using a finite
number of symbols we can represent and count infinite sets, have notions of cardinality, have
various number systems and, more generally, have algebraic structures. Numbers can
even be seen as games [153, 433]. A major idea is the concept of an algorithm. Adding or
multiplying on an abacus already was an algorithm. The concept was refined in geometry,
where ruler and compass were used as computing devices, like in the construction of points
in a triangle. To measure the effectiveness of an algorithm, one can use notions of complexity.
This has been made precise by computing pioneers like Alan Turing, as one first has to formulate
what a "computation" is. The concept of the Turing machine is particularly elegant as
it is both a theoretical construct and a concrete machine (although an extremely inefficient one).
In the last century one has seen that computations and proofs are very similar and that they
have similar general restrictions. There are some tasks which cannot be computed with a
Turing machine and there are theorems which cannot be proven in a specific axiom system.
As mathematics is a language, we have to deal with concepts of syntax, grammar, notation,
context, parsing, validation and verification. As mathematics is a human activity which is
done in our brains, it is related to psychology and computer architecture. Computer science
aspects are also important in pedagogy and education: how can an idea be communicated
clearly? How do we motivate? How do we convince peers that a result is true? Examples
from history show that this is often done by authority and that some proofs turned out to be
wrong or incomplete, even in the case of fundamental theorems or when treated by great
mathematicians. (Examples are the fundamental theorem of arithmetic, the fundamental theorem
of algebra or Kempe's wrong published proof of the 4 color theorem.) On the other hand, there
were also many results which were only recognized later. The work of Galois, for example, only
exploded much later. How come we trust a human brain more than an electronic one? We have to
make some fundamental assumptions, for example that a logical step like "if A and B then A and B"
holds. This assumes, for example, that our memory is faithful: after having put A and B in the
memory and making the conclusion, we have to assume that we did not forget A or B! Why do we
trust this more than the memory of a machine? As we are assisted more and more by electronic
devices, the question of the validity of computer assisted proofs comes up. The 4-color
theorem of Kenneth Appel and Wolfgang Haken, based on previous work of many others like
Heinrich Heesch, the Feigenbaum conjecture of Mitchell Feigenbaum, first proven by Oscar
Lanford III, and the Kepler conjecture, proven by Thomas Hales, are examples. A great general
idea is related to the representation of data. This can be done using matrices like in a
relational database or using other structures like graphs, leading to graph databases. The
ability to use computers allows mathematicians to do experiments. A branch of mathematics
called experimental mathematics [26, 370] relies heavily on experiments to find new theorems
or relations. Experiments are related to simulations. We are able, within a computer, to build
and explore new worlds, like in computer games; we can enhance the physical world using virtual
reality or augmented reality, or capture a world by 3D scanning and realize a world by printing
the objects [428]. A major theme is artificial intelligence [618, 371]. It is related to
optimization problems like optimal transport and neural nets, as well as inverse problems like
structure from motion problems. An intelligent entity must be able to take information, build a
model and then find an optimal strategy to solve a given task. A self-driving car, for example,
has to be able to translate pictures from a camera into a map, then determine where to drive.
Such tasks are usually considered part of applied mathematics, but they are very much related
with pure mathematics because computers also start to learn how to read mathematics, how to
verify proofs and how to find new theorems. Artificial intelligence agents [737], first
developed in the 1960s, also learned some mathematics. I myself learned about this when
incorporating computer algebra systems into a chatbot [426]. AI has now become a big business,
as Alexa, Siri, Google Home, IBM Watson or Cortana demonstrate. But these information systems
must be taught; they must be able to rank alternative answers and even inject some humor or
opinions. Soon, they will be able to learn by themselves and answer questions like "what are the
10 most important theorems in mathematics?"
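The Turing machine is concrete enough to simulate directly. As a minimal sketch, the transition table below is the well-known two-state, two-symbol busy beaver champion; the dictionary encoding and the name `run` are our own choices. Started on a blank tape, it halts after 6 steps with 4 ones written.

```python
from collections import defaultdict

# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
# "H" is the halting state. This is the 2-state, 2-symbol busy beaver.
BUSY_BEAVER_2 = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}

def run(table, max_steps=10_000):
    """Run a machine on a blank (all-zero) tape; return (steps, number of ones)."""
    tape = defaultdict(int)  # unbounded tape, unvisited cells read as 0
    head, state, steps = 0, "A", 0
    while state != "H":
        if steps >= max_steps:
            raise RuntimeError("no halt within max_steps")
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())

print(run(BUSY_BEAVER_2))  # (6, 4)
```

The step limit `max_steps` is not decoration: by Turing's theorem there is no general procedure deciding in advance whether an arbitrary table halts, so a simulator can only give up after some budget.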

Brevity

We live in an Instagram, Snapchat, Twitter, microblog, Vine, TikTok, WatchMojo, Pecha Kucha
time. Many of us multitask, read news on smart phones, watch faster paced movies, read
shorter novels and feel that Marcel Proust's million-word masterpiece "À la recherche du
temps perdu" is "temps perdu". Even classrooms and seminars have become more aphoristic.
Micro blogging tools are only the latest incarnation of "miniature stories". They continue
the tradition of older formats, from "mural art" by Romans to modern graffiti, "aphorisms"
[444, 445], poetry, cartoons or Unix fortune cookies [23]. Shortness has appeal: aphorisms,
poems, fairy tales, quotes, words of wisdom, life hacker lists and tabloid top 10 lists illustrate
this. And then there are books like "Math in 5 minutes", "30 second math" or "math in minutes"
[50, 269, 207], which are great coffee table additions. Also short proofs are appealing, like "Let
epsilon be smaller than zero", which is the shortest known math joke, or "There are three types
of mathematicians: the ones who can count, and the ones who can't." Also short open problems
are attractive, like the twin prime problem "there are infinitely many twin primes", the
Landau problem "there are infinitely many primes of the form $n^2 + 1$", or the Goldbach
problem "every even $n > 2$ is the sum of two primes". For the larger public in mathematics,
shortness has appeal: according to a poll of the Mathematical Intelligencer from 1988, the
most favorite theorems are short [738, 739]. Results with longer proofs can make it to graduate
courses or specialized textbooks, but even then the results are often short enough that they
can be tweeted without proof. Why is shortness attractive? Paul Erdős called short elegant
proofs "proofs from the Book" [12]. Shortness reduces the possibility of error, as complexity is
always a stumbling block for understanding. But is beauty equivalent to brevity? Buckminster
Fuller once said: "If the solution is not beautiful, I know it is wrong." [9]. Much about the
aesthetics in mathematics is investigated in [526]. According to [610], "the beauty of a piece
of mathematics is frequently associated with the shortness of statement or of proof: beautiful
theories are also thought of as short, self-contained chapters fitting within broader theories.
There are examples of complex and extensive theories which every mathematician agrees to
be beautiful, but these examples are not the one which come to mind." Also psychologists and
educators know that simplicity appeals to children. From [655]: "For now, I want simply to draw
attention to the fact that even for a young, mathematically naive child, aesthetic sensibilities
and values (a penchant for simplicity, for finding the building blocks of more complex ideas,
and a preference for shortcuts and "liberating" tricks rather than cumbersome recipes) animates
mathematical experience." It is hard to exhaust all short texts, even with tweets: there are more
than $\text{googol}^2 = 10^{200}$ texts of length 140. They cannot all ever be written down,
because there are more of them than our estimate of the number of elementary particles. But
there are even short story collections. Berry's paradox tells in this context that the shortest
non-tweetable text in 140 characters can be tweeted: "The shortest non-tweetable text". Since we
insist on giving proofs, we have to cut corners. Books containing lots of elegant examples are
[17, 12]. We should add that brevity is not a new thing. J.E. Littlewood raised the question how
short a dissertation can be and showed by example that two sentences are enough, giving a
one-sentence proof of the fact that bounded entire functions are constant by using Cauchy's
integral theorem. This has been refined a bit in [766].
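Littlewood's one-sentence argument can be written out as follows. This is a sketch of the standard derivation from Cauchy's integral formula, assuming $|f| \le M$ on $\mathbb{C}$ and a circle $|z| = R$ with $R > \max(|a|, |b|)$:

```latex
f(a) - f(b) = \frac{1}{2\pi i}\oint_{|z|=R} f(z)\Big(\frac{1}{z-a} - \frac{1}{z-b}\Big)\,dz
            = \frac{a-b}{2\pi i}\oint_{|z|=R} \frac{f(z)}{(z-a)(z-b)}\,dz ,
\qquad
|f(a) - f(b)| \le \frac{M R\,|a-b|}{(R-|a|)(R-|b|)} \xrightarrow{\;R \to \infty\;} 0 .
```

Since $a$ and $b$ are arbitrary, $f$ is constant.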

Twitter math

The following 42 tweets were written in 2014, when Twitter still had a 140-character limit. Some
of them were actually tweeted. The experiment was to see which theorems are short enough
that both the theorem and its proof fit into 140 characters. Of course, that often required a
bit of cheating. See [12] for proofs from the Book, where the proofs have full details.
Euclid: The set of primes is infinite. Proof: let $p$ be the largest prime; then $p! + 1$ has a
prime factor larger than $p$. Contradiction.

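Euclid: if $2^p - 1$ is prime, then $2^{p-1}(2^p - 1)$ is perfect. Proof: let $\sigma(n)$ be the
sum of the divisors of $n$. Since $\sigma$ is multiplicative,
$\sigma(2^{p-1}(2^p - 1)) = \sigma(2^{p-1})\,\sigma(2^p - 1) = (2^p - 1)\,2^p = 2 \cdot 2^{p-1}(2^p - 1)$,
which shows $\sigma(k) = 2k$.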
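A brute-force check of Euclid's construction for small exponents; the helpers `sigma` and `is_prime` are naive illustrations, not efficient algorithms.

```python
def sigma(n):
    """Sum of all positive divisors of n (brute force, fine for small n)."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

def is_prime(n):
    """Trial division; enough for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# Euclid's construction: 2^(p-1) (2^p - 1) whenever 2^p - 1 is prime.
perfect = [2 ** (p - 1) * (2 ** p - 1) for p in range(2, 8) if is_prime(2 ** p - 1)]
print(perfect)  # [6, 28, 496, 8128]
assert all(sigma(k) == 2 * k for k in perfect)
```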

Hippasus: $\sqrt{2}$ is irrational. Proof: if $\sqrt{2} = p/q$, then $2q^2 = p^2$. On the left, the
factor 2 appears an odd number of times; on the right, an even number of times. Contradiction.

Pythagorean triples: all primitive solutions of $x^2 + y^2 = z^2$ with $x$ even are of the form
$(x, y, z) = (2st, s^2 - t^2, s^2 + t^2)$. Proof: $x$ or $y$ is even (both odd gives
$x^2 + y^2 = 2k$ with odd $k$, which is not a square). Say $x$ is even: write
$x^2 = z^2 - y^2 = (z - y)(z + y)$. This is $4s^2t^2$. Therefore $2t^2 = z - y$, $2s^2 = z + y$.
Solve for $z, y$.
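Conversely, every admissible pair $(s, t)$ produces a solution. A small sketch, assuming the usual extra conditions $s > t > 0$, $\gcd(s, t) = 1$ and $s - t$ odd, which make the triple primitive; the function name is hypothetical.

```python
from math import gcd

def primitive_triples(limit):
    """Primitive triples (2st, s^2 - t^2, s^2 + t^2) with hypotenuse <= limit."""
    triples = []
    s = 2
    while s * s + 1 <= limit:
        for t in range(1, s):
            if gcd(s, t) == 1 and (s - t) % 2 == 1 and s * s + t * t <= limit:
                triples.append((2 * s * t, s * s - t * t, s * s + t * t))
        s += 1
    return triples

for x, y, z in primitive_triples(30):
    assert x * x + y * y == z * z and gcd(gcd(x, y), z) == 1
    print(x, y, z)
```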


Pigeon principle: if $n + 1$ pigeons live in $n$ boxes, there is a box with 2 or more pigeons.
Proof: place a pigeon in each box until every box is filled. The pigeon left over must have a
roommate.

Angle sum in triangle: $\alpha + \beta + \gamma = KA + \pi$ if $K$ is the (constant) curvature
and $A$ the triangle area. Proof: Gauss-Bonnet for a surface with boundary; $\alpha, \beta, \gamma$
are Dirac measures on the boundary.

Chinese remainder theorem: the system $a_i x \equiv b_i \pmod{n_i}$ has a solution if
$\gcd(a_i, n_i) = 1$ and $\gcd(n_i, n_j) = 1$ for $i \neq j$. Proof: solve equation 1, then
increment $x$ by $n_1$ until equation 2 holds, then increment $x$ by $n_1 n_2$ until equation 3
holds, etc.
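The incremental search in this proof is itself an algorithm (search by sieving). A minimal sketch; the function name `crt_sieve` is ours, and the first example is the classical remainder problem $x \equiv 2, 3, 2$ modulo $3, 5, 7$.

```python
def crt_sieve(a, b, n):
    """Solve a[i] * x = b[i] (mod n[i]) by the incremental search of the proof."""
    x, step = 0, 1
    for ai, bi, ni in zip(a, b, n):
        # x + k*step, k = 0..ni-1, runs through all residues mod ni since gcd(step, ni) = 1;
        # adding multiples of step keeps all earlier congruences intact.
        for _ in range(ni):
            if (ai * x - bi) % ni == 0:
                break
            x += step
        else:
            raise ValueError("no solution: coefficients or moduli not coprime")
        step *= ni
    return x % step

print(crt_sieve([1, 1, 1], [2, 3, 2], [3, 5, 7]))  # 23
```

Sieving needs up to $n_i$ trials per equation; for large moduli one would instead use modular inverses from the extended Euclidean algorithm.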

Nullstellensatz: for an algebraically closed field $K$, algebraic sets in $K^n$ correspond 1:1 to
radical ideals in $K[x_1, \dots, x_n]$. Proof: a finitely generated $K$-algebra which is a field is a
finite field extension of $K$ (Zariski's lemma).

Fundamental theorem of algebra: a polynomial $f$ of degree $n$ has exactly $n$ roots (counted with
multiplicity). Proof: the metric $g = |f|^{-2/n}|dz|^2$ on the Riemann sphere has curvature
$K = n^{-1}\Delta \log|f|$. Without a root, $K = 0$ everywhere, contradicting Gauss-Bonnet [15].

Fermat: if $p$ is prime and $(a, p) = 1$, then $p \mid a^p - a$. Proof: induction with respect to
$a$. The case $a = 1$ is trivial. $(a + 1)^p - (a + 1)$ is congruent to $a^p - a$ modulo $p$
because the binomial coefficients $B(p, k)$ are divisible by $p$ for $k = 1, \dots, p - 1$.

Wilson: $p > 1$ is prime if and only if $p \mid (p - 1)! + 1$. Proof: for prime $p$, group
$2, \dots, p - 2$ into pairs $(a, a^{-1})$ whose product is 1 modulo $p$. Now
$(p - 1)! \equiv p - 1 \equiv -1$ modulo $p$. If $p = ab$ is not prime, with $1 < a < p$, then $a$
divides $(p - 1)!$, so $a$, and hence $p$, does not divide $(p - 1)! + 1$.
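Both statements, and the key step of the Fermat induction, are easy to test for small numbers. A brute-force sketch for illustration only; factorials grow far too fast for Wilson's criterion to be a practical primality test.

```python
from math import comb, factorial

def is_prime(n):
    """Trial division; enough for the small numbers used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

for n in range(2, 60):
    if is_prime(n):
        # Key step of the induction: n divides B(n, k) for 0 < k < n.
        assert all(comb(n, k) % n == 0 for k in range(1, n))
        # Fermat: n divides a^n - a (this even holds for a not coprime to n).
        assert all((pow(a, n, n) - a) % n == 0 for a in range(1, 100))
    # Wilson: n divides (n - 1)! + 1 exactly when n is prime.
    assert ((factorial(n - 1) + 1) % n == 0) == is_prime(n)
print("Fermat and Wilson confirmed for 2 <= n < 60")
```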

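Bayes: $A, B$ are events and $A^c$ is the complement of $A$. Then
$P[A|B] = \frac{P[B|A]P[A]}{P[B|A]P[A] + P[B|A^c]P[A^c]}$. Proof: $P[A|B]P[B] = P[A \cap B] = P[B|A]P[A]$
and $P[B] = P[B|A]P[A] + P[B|A^c]P[A^c]$.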
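Archimedes: the volume of the sphere $S(r)$ is $4\pi r^3/3$. Proof: the complement of the double
cone inside the cylinder of radius $r$ and height $2r$ has at height $z$ the cross-section area
$\pi(r^2 - z^2)$, the same as the cross-section area of the ball at height $z$. By Cavalieri, the
volume is $2\pi r^3 - 2\pi r^3/3$.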


Archimedes: the area of the sphere $S(r)$ is $4\pi r^2$. Proof: differentiate the volume formula
with respect to $r$, or project the unit sphere onto a cylinder of height 2 and circumference
$2\pi$ and note that this projection is area preserving.

Cauchy-Schwarz: $|v \cdot w| \le |v||w|$. Proof: scale to get $|w| = 1$ and define $a = v \cdot w$,
so that $0 \le (v - aw) \cdot (v - aw) = |v|^2 - a^2 = |v|^2|w|^2 - (v \cdot w)^2$.

Angle formula: Cauchy-Schwarz defines the angle between two vectors as
$\cos(A) = v \cdot w/(|v||w|)$. If $v, w$ are centered random variables, then $v \cdot w$ is the
covariance, $|v|, |w|$ are standard deviations and $\cos(A)$ is the correlation.
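For data this gives a concrete recipe: center the samples and take the cosine of the angle between them. A sketch with made-up sample data; the common normalization factor $1/n$ in covariance and standard deviations cancels in the quotient, so it is omitted.

```python
import math

def cosine(v, w):
    """cos(A) = v.w / (|v||w|); Cauchy-Schwarz guarantees a value in [-1, 1]."""
    dot = sum(a * b for a, b in zip(v, w))
    return dot / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))

def correlation(x, y):
    """Correlation of two samples as the cosine of the angle between centered vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

x = [1.0, 2.0, 3.0, 4.0]
print(correlation(x, [2.0, 4.0, 6.0, 8.0]))  # close to 1.0
print(correlation(x, [4.0, 3.0, 2.0, 1.0]))  # close to -1.0
```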

Cos formula: $c^2 = a^2 + b^2 - 2ab\cos(A)$ in a triangle $ABC$ (Al-Kashi theorem). Proof:
$v = AB$, $w = AC$ have lengths $a = |v|$, $b = |w|$, and $c = |v - w|$. Now
$(v - w) \cdot (v - w) = |v|^2 + |w|^2 - 2|v||w|\cos(A)$.

Pythagoras: if $A = \pi/2$, then $c^2 = a^2 + b^2$. Proof: let $v = AB$, $w = AC$, $v - w = BC$ be
the sides of the triangle. Multiply out $(v - w) \cdot (v - w) = |v|^2 + |w|^2 - 2v \cdot w$ and
use $v \cdot w = 0$.

Euler formula: $\exp(ix) = \cos(x) + i\sin(x)$. Proof: $\exp(ix) = 1 + (ix) + (ix)^2/2! + \dots$
Pair real and imaginary parts and use the definitions $\cos(x) = 1 - x^2/2! + x^4/4! - \dots$ and
$\sin(x) = x - x^3/3! + x^5/5! - \dots$.
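A numerical check of the proof: partial sums of the exponential series at $ix$ agree with $\cos(x) + i\sin(x)$. The number of terms (40) is an arbitrary choice that suffices for the moderate values of $x$ used here.

```python
import cmath
import math

def exp_series(z, terms=40):
    """Partial sum 1 + z + z^2/2! + ... + z^(terms-1)/(terms-1)!."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

for x in [0.0, 1.0, math.pi / 3, math.pi, 2.5]:
    series = exp_series(1j * x)
    euler = complex(math.cos(x), math.sin(x))
    assert abs(series - euler) < 1e-12 and abs(cmath.exp(1j * x) - euler) < 1e-12
    print(x, series)
```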

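Discrete Gauss-Bonnet: $\sum_x K(x) = \chi(G)$ with curvature
$K(x) = 1 - V_0(x)/2 + V_1(x)/3 - V_2(x)/4 + \dots$ and Euler characteristic
$\chi(G) = v_0 - v_1 + v_2 - v_3 + \dots$, where $V_k(x)$ counts the $k$-dimensional simplices in the
unit sphere $S(x)$ and $v_k$ the $k$-dimensional simplices of $G$. Proof: use the handshake formula
$\sum_x V_k(x) = (k + 2)\,v_{k+1}$.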

Poincaré-Hopf: let $f$ be a coloring and $i_f(x) = 1 - \chi(S_f^-(x))$, where
$S_f^-(x) = \{y \in S(x) \mid f(y) < f(x)\}$. Then $\sum_x i_f(x) = \chi(G)$. Proof by induction:
removing a local maximum $x$ of $f$ reduces the Euler characteristic by
$\chi(B_f(x)) - \chi(S_f^-(x)) = i_f(x)$.
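Both formulas can be checked by brute force on a small graph. A sketch, assuming the octahedron graph (6 vertices, 12 edges, 8 triangles, $\chi = 2$) and the hypothetical coloring $f(x) = x$; complete subgraphs play the role of simplices.

```python
from fractions import Fraction
from itertools import combinations

def complete_subgraphs(vertices, adj):
    """All vertex sets inducing complete subgraphs (the simplices of the graph)."""
    result = []
    for size in range(1, len(vertices) + 1):
        layer = [set(c) for c in combinations(vertices, size)
                 if all(w in adj[u] for u, w in combinations(c, 2))]
        if not layer:
            break  # no clique of this size means none larger either
        result += layer
    return result

def euler_characteristic(vertices, adj):
    return sum((-1) ** (len(c) - 1) for c in complete_subgraphs(vertices, adj))

def curvature(x, simplices):
    """K(x) = 1 - V0/2 + V1/3 - ...: a simplex of size s containing x adds (-1)^(s-1)/s."""
    return sum(Fraction((-1) ** (len(c) - 1), len(c)) for c in simplices if x in c)

# Octahedron: vertex i is joined to every vertex except its antipode i ^ 1.
V = list(range(6))
adj = {u: {w for w in V if w != u and w != u ^ 1} for u in V}
simplices = complete_subgraphs(V, adj)
chi = euler_characteristic(V, adj)

# Gauss-Bonnet: the curvatures add up to the Euler characteristic.
K = [curvature(x, simplices) for x in V]
assert sum(K) == chi == 2

def index(x):
    """Poincare-Hopf index i_f(x) = 1 - chi(S_f^-(x)) for f(x) = x."""
    below = [y for y in adj[x] if y < x]
    return 1 - euler_characteristic(below, adj)

assert sum(index(x) for x in V) == chi
print(K, [index(x) for x in V])
```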

Lefschetz: $\sum_x i_T(x) = \mathrm{str}(T|H(G))$. Proof: the left-hand side is
$\mathrm{str}(\exp(-0 \cdot L)U_T)$ and the right-hand side is $\mathrm{str}(\exp(-tL)U_T)$ for
$t \to \infty$. The supertrace does not depend on $t$.


Stokes: orient the edges $E$ of a graph $G$. Let $F: E \to \mathbb{R}$ be a function and $S$ a
surface in $G$ with boundary $C$. The curl is $dF(ijk) = F(ij) + F(jk) + F(ki)$, where
$F(ki) = -F(ik)$. The sum of the curls over all triangles of $S$ is the line integral of $F$
along $C$.
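The cancellation behind this theorem (every interior edge is traversed once in each direction) can be watched numerically. A sketch, assuming a hypothetical triangulated $m \times m$ grid with counterclockwise triangles and random integer values of $F$.

```python
import random

def oriented(F, i, j):
    """Value of F on edge ij traversed from i to j, using F(ji) = -F(ij)."""
    return F[(i, j)] if (i, j) in F else -F[(j, i)]

def curl(F, i, j, k):
    """dF(ijk) = F(ij) + F(jk) + F(ki) for an oriented triangle."""
    return oriented(F, i, j) + oriented(F, j, k) + oriented(F, k, i)

def grid_surface(m):
    """m x m unit squares, each cut into two counterclockwise triangles;
    the boundary vertices are listed counterclockwise."""
    triangles = []
    for x in range(m):
        for y in range(m):
            a, b, c, d = (x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)
            triangles += [(a, b, c), (a, c, d)]
    boundary = ([(x, 0) for x in range(m)] + [(m, y) for y in range(m)]
                + [(x, m) for x in range(m, 0, -1)] + [(0, y) for y in range(m, 0, -1)])
    return triangles, boundary

def check_stokes(m, seed=0):
    """Return (sum of curls over S, line integral along C) for random integer F."""
    rng = random.Random(seed)
    triangles, boundary = grid_surface(m)
    F = {}  # each edge stores one orientation with a random integer value
    for t in triangles:
        for u, v in [(t[0], t[1]), (t[1], t[2]), (t[2], t[0])]:
            if (u, v) not in F and (v, u) not in F:
                F[(u, v)] = rng.randint(-10, 10)
    surface = sum(curl(F, *t) for t in triangles)
    line = sum(oriented(F, boundary[i], boundary[(i + 1) % len(boundary)])
               for i in range(len(boundary)))
    return surface, line

for m in range(1, 5):
    print(m, check_stokes(m))  # the two numbers agree
```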

Plato: there are exactly 5 Platonic solids. Proof: the number $f$ of $n$-gon faces satisfies
$f = 2e/n$, and the $v$ vertices of degree $m$ satisfy $v = 2e/m$. Now $v - e + f = 2$ means
$2e/m - e + 2e/n = 2$, or $1/m + 1/n = 1/e + 1/2$, with the solutions (for $m, n \ge 3$)
$(m, n) = (3, 3), (3, 4), (4, 3), (3, 5), (5, 3)$.
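The last step of this proof is a finite search. A sketch with exact rational arithmetic; the search bound `max_degree=10` is an arbitrary choice, safely above the largest admissible degree 5.

```python
from fractions import Fraction

def platonic_solutions(max_degree=10):
    """Solve 1/m + 1/n = 1/e + 1/2 with m, n >= 3 and e a positive integer."""
    solutions = []
    for m in range(3, max_degree + 1):
        for n in range(3, max_degree + 1):
            excess = Fraction(1, m) + Fraction(1, n) - Fraction(1, 2)
            if excess > 0 and (1 / excess).denominator == 1:
                e = int(1 / excess)
                solutions.append((m, n, 2 * e // m, e, 2 * e // n))  # (m, n, v, e, f)
    return solutions

for m, n, v, e, f in platonic_solutions():
    print(f"vertex degree {m}, {n}-gon faces: v={v}, e={e}, f={f}")
```

The five rows are the tetrahedron, cube, dodecahedron, octahedron and icosahedron.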

Poincaré recurrence: let $T$ be an invertible measure-preserving map of a probability space
$(X, m)$. If $m(A) > 0$ and $n > 1/m(A)$, then $m(T^k(A) \cap A) > 0$ for some $1 \le k \le n$.
Proof: otherwise $A, T(A), \dots, T^n(A)$ are pairwise disjoint up to measure zero and the union
has measure $(n + 1)\,m(A) > 1$.

Turing: there is no Turing machine which decides, for every Turing machine given as input, whether
it halts. Proof: otherwise build another machine which halts if the input is a non-halting one
and does not halt if the input is a halting one. Applied to itself, this gives a contradiction.

Cantor: the set of reals in $[0, 1]$ is uncountable. Proof: if there is an enumeration $x(k)$, let
$x(k, l)$ be the $l$'th digit of $x(k)$ in binary form. The number with binary expansion
$y(k) = x(k, k) + 1 \bmod 2$ is not in the list.

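Niven: $\pi \notin \mathbb{Q}$. Proof: if $\pi = a/b$, then $f(x) = x^n(a - bx)^n/n!$ satisfies
$f(\pi - x) = f(x)$ and $0 < f(x) < \pi^n a^n/n!$ for $0 < x < \pi$. All derivatives $f^{(j)}$ take
integer values at $0$ and $\pi$, so $F(x) = f(x) - f^{(2)}(x) + f^{(4)}(x) - \dots + (-1)^n f^{(2n)}(x)$
has $F(0), F(\pi) \in \mathbb{Z}$ and $F + F'' = f$. Now $(F'(x)\sin(x) - F(x)\cos(x))' = f(x)\sin(x)$,
so $\int_0^\pi f(x)\sin(x)\,dx = F(0) + F(\pi) \in \mathbb{Z}$. But this integral lies strictly
between $0$ and $\pi^{n+1}a^n/n! < 1$ for large $n$.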

Fundamental theorem of calculus: with differentiation $Df(x) = f(x + 1) - f(x)$ and integration
$Sf(x) = f(0) + f(1) + \dots + f(x - 1)$, we have $SDf(x) = f(x) - f(0)$ and $DSf(x) = f(x)$.

Taylor: $f(x + t) = \sum_k f^{(k)}(x)\,t^k/k!$. Proof: $f(x + t)$ satisfies the transport equation
$f_t = f_x = Df$, an ODE for the differential operator $D$. Solve $f(x + t) = \exp(Dt)f(x)$.


Cauchy-Binet: $\det(1 + F^T G) = \sum_P \det(F_P)\det(G_P)$. Proof: let $A = F^T G$. The
coefficients of $\det(x - A)$ are, up to sign, $\sum_{|P| = k} \det(F_P)\det(G_P)$.

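Intermediate: if $f$ is continuous, $f(0) < 0$ and $f(1) > 0$, then there exists $0 < x < 1$ with
$f(x) = 0$. Proof: if $f(1/2) = 0$ we are done. If $f(1/2) < 0$, redo the proof with $(1/2, 1)$; if
$f(1/2) > 0$, redo the proof with $(0, 1/2)$. The nested intervals shrink to a point $x$, and
continuity gives $f(x) = 0$.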
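This proof is the bisection method. A sketch in which a floating-point tolerance stands in for the limit of nested intervals; the function name and the example $f(x) = x - \cos(x)$ are our own choices.

```python
import math

def bisect_root(f, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection as in the proof: keep a subinterval on which f changes sign."""
    if not (f(lo) < 0 < f(hi)):
        raise ValueError("need f(lo) < 0 < f(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        value = f(mid)
        if value == 0:
            return mid
        if value < 0:
            lo = mid  # the sign change now lies in (mid, hi)
        else:
            hi = mid  # the sign change now lies in (lo, mid)
    return (lo + hi) / 2

# x - cos(x) is negative at 0 and positive at 1.
root = bisect_root(lambda x: x - math.cos(x))
print(root)
```

Each step halves the interval, so about 40 steps reach the tolerance $10^{-12}$.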

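Ergodicity: $T(x) = x + a \bmod 1$ with irrational $a$ is ergodic. Proof: write an invariant
function as $f = \sum_n c(n)\exp(2\pi i n x)$. Then
$f(T(x)) = \sum_n c(n)\exp(2\pi i n a)\exp(2\pi i n x) = f(x)$ implies $c(n) = 0$ for $n \neq 0$, so
$f$ is constant.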

Benford: the first digit $k$ of $2^n$ appears with probability $\log_{10}(1 + 1/k)$. Proof:
$T: x \mapsto x + \log_{10}(2) \bmod 1$ is ergodic, and the first digit of $2^n$ is $k$ if
$\log_{10}(k) \le T^n(0) < \log_{10}(k + 1)$. The probability of hitting this interval is
$\log_{10}(k + 1) - \log_{10}(k)$.
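A quick experiment with exact integer arithmetic, comparing observed first-digit frequencies of $2^n$ with $\log_{10}(1 + 1/k)$; the sample size 3000 is an arbitrary choice.

```python
import math
from collections import Counter

def leading_digit_frequencies(N):
    """Relative frequencies of the first decimal digit of 2^n for n = 0, ..., N-1."""
    counts, power = Counter(), 1
    for _ in range(N):
        counts[int(str(power)[0])] += 1
        power *= 2
    return {k: counts[k] / N for k in range(1, 10)}

freq = leading_digit_frequencies(3000)
for k in range(1, 10):
    print(k, round(freq[k], 4), round(math.log10(1 + 1 / k), 4))
```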

Rank-Nullity: $\dim(\ker(A)) + \dim(\mathrm{im}(A)) = n$ for an $m \times n$ matrix $A$. Proof: a
column has a leading 1 in $\mathrm{rref}(A)$ or no leading 1. In the first case it contributes to
the image, in the second to a free variable parametrizing the kernel.

Column-Row picture: $A: \mathbb{R}^n \to \mathbb{R}^m$. The $k$'th column of $A$ is the image
$Ae_k$. If all rows of $A$ are perpendicular to $x$, then $x$ is in the kernel of $A$.

Picard: $x' = f(x)$, $x(0) = x_0$ has locally a unique solution if $f \in C^1$. Proof: the map
$T(y)(t) = x_0 + \int_0^t f(y(s))\,ds$ is a contraction on $C([0, a])$ for small enough $a > 0$.
Banach fixed point theorem.

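Banach: a contraction $d(T(x), T(y)) \le a\,d(x, y)$ with $0 \le a < 1$ on a complete metric space
$(X, d)$ has a unique fixed point. Proof: for $x_k = T^k(x_0)$ and $n > k$,
$d(x_k, x_n) \le a^k d(x_0, x_1)/(1 - a)$ using the triangle inequality and the geometric series.
We have a Cauchy sequence.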
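Both proofs are constructive: iterating $T$ converges to the fixed point. A sketch with two illustrative examples of our own choosing: the contraction $\cos$ on $[0, 1]$ (where $|\cos'| \le \sin(1) < 1$), and Picard iteration for $x' = x$, $x(0) = 1$, carried out exactly on polynomials with rational coefficients. Its iterates are the Taylor polynomials of $e^t$.

```python
import math
from fractions import Fraction

def banach_iterate(T, x0, steps):
    """Iterate x_{k+1} = T(x_k); for a contraction this converges to the fixed point."""
    x = x0
    for _ in range(steps):
        x = T(x)
    return x

def picard_step(coeffs, x0=1):
    """T(y)(t) = x0 + integral_0^t y(s) ds for x' = x; polynomials as coefficient lists."""
    integral = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integral[0] += x0
    return integral

fixed = banach_iterate(math.cos, 1.0, 200)
print(fixed)

y = [Fraction(1)]  # start with the constant function y(t) = 1
for _ in range(15):
    y = picard_step(y)
print([str(c) for c in y[:5]])  # ['1', '1', '1/2', '1/6', '1/24']
print(float(sum(y)))            # value at t = 1, close to e
```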

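Liouville: every prime $p = 4k + 1$ is the sum of two squares. Proof: there is an involution on
$S = \{(x, y, z) \in \mathbb{N}^3 \mid x^2 + 4yz = p\}$ with exactly one fixed point, showing that
$|S|$ is odd. This implies that the involution $(x, y, z) \mapsto (x, z, y)$ has a fixed point
$(x, y, y)$, so $p = x^2 + (2y)^2$ [764].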
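The tweet does not spell out the involution. The sketch below assumes it is Zagier's well-known involution from his one-sentence proof (the case split is his; the function names are ours) and checks the counting argument for small primes.

```python
def zagier_set(p):
    """All positive integer solutions (x, y, z) of x^2 + 4yz = p."""
    triples = []
    x = 1
    while x * x < p:
        rest = p - x * x
        if rest % 4 == 0:
            m = rest // 4
            triples += [(x, y, m // y) for y in range(1, m + 1) if m % y == 0]
        x += 1
    return triples

def zagier(t):
    """Zagier's involution; the boundary cases x = y - z and x = 2y cannot occur for prime p."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    if x < 2 * y:
        return (2 * y - x, y, x - y + z)
    return (x - 2 * y, x - y + z, y)

def two_squares(p):
    """Return (a, b) with a^2 + b^2 = p for a prime p = 4k + 1, following the proof."""
    S = zagier_set(p)
    assert all(zagier(zagier(t)) == t for t in S)                       # an involution
    assert [t for t in S if zagier(t) == t] == [(1, 1, (p - 1) // 4)]   # one fixed point
    x, y, _ = next(t for t in S if t[1] == t[2])  # fixed point of (x, y, z) -> (x, z, y)
    return x, 2 * y

print(two_squares(13))  # (3, 2): 13 = 9 + 4
```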


Banach-Tarski: the unit ball in $\mathbb{R}^3$ can be cut into 5 pieces which can be reassembled
using rotations and translations to get two unit balls. Proof: cut cleverly using the axiom of
choice.

Math areas

We add here the core handouts of Math E320, which aimed to give for each of the 12 mathematical
subjects an overview on two pages. For that course, I had recommended books like
[226, 285, 59, 681, 682].


E-320: Teaching Math with a Historical Perspective O. Knill, 2010-2018

Lecture 1: Mathematical roots

Similarly, as one has distinguished the canons of rhetoric: memory, invention, delivery, style, and
arrangement, or combined the trivium: grammar, logic and rhetoric, with the quadrivium: arithmetic,
geometry, music, and astronomy, to obtain the seven liberal arts and sciences, one has tried to
organize all mathematical activities.
Historically, one has distinguished eight ancient roots of mathematics. Each of these 8 activities
in turn suggests a key area in mathematics:

counting and sorting        arithmetic
spacing and distancing      geometry
positioning and locating    topology
surveying and angulating    trigonometry
balancing and weighing      statics
moving and hitting          dynamics
guessing and judging        probability
collecting and ordering     algorithms

To morph these 8 roots into the 12 mathematical areas covered in this class, we complemented the
ancient roots with calculus, numerics and computer science, merged trigonometry with geometry,
separated arithmetic into number theory, algebra and arithmetic, and turned statics into analysis.

Let us call this modern adaptation the 12 modern roots of mathematics:

counting and sorting        arithmetic
spacing and distancing      geometry
positioning and locating    topology
dividing and comparing      number theory
balancing and weighing      analysis
moving and hitting          dynamics
guessing and judging        probability
collecting and ordering     algorithms
slicing and stacking        calculus
operating and memorizing    computer science
optimizing and planning     numerics
manipulating and solving    algebra

While relating mathematical areas with human activities is useful, it makes sense to select
specific topics in each of these areas. These 12 topics will be the 12 lectures of this course.

Arithmetic       numbers and number systems
Geometry         invariance, symmetries, measurement, maps
Number theory    Diophantine equations, factorizations
Algebra          algebraic and discrete structures
Calculus         limits, derivatives, integrals
Set Theory       set theory, foundations and formalisms
Probability      combinatorics, measure theory and statistics
Topology         polyhedra, topological spaces, manifolds
Analysis         extrema, estimates, variation, measure
Numerics         numerical schemes, codes, cryptology
Dynamics         differential equations, maps
Algorithms       computer science, artificial intelligence

Like any classification, this chosen division is rather arbitrary and a matter of personal
preferences. The 2010 AMS classification distinguishes 64 areas of mathematics. Many of the just
defined main areas are broken up into even finer pieces. Additionally, there are fields which
relate with other areas of science, like economics, biology or physics:

00 General
01 History and biography
03 Mathematical logic and foundations
05 Combinatorics
06 Lattices, ordered algebraic structures
08 General algebraic systems
11 Number theory
12 Field theory and polynomials
13 Commutative rings and algebras
14 Algebraic geometry
15 Linear/multi-linear algebra; matrix theory
16 Associative rings and algebras
17 Non-associative rings and algebras
18 Category theory, homological algebra
19 K-theory
20 Group theory and generalizations
22 Topological groups, Lie groups
26 Real functions
28 Measure and integration
30 Functions of a complex variable
31 Potential theory
32 Several complex variables, analytic spaces
33 Special functions
34 Ordinary differential equations
35 Partial differential equations
37 Dynamical systems and ergodic theory
39 Difference and functional equations
40 Sequences, series, summability
41 Approximations and expansions
42 Fourier analysis
43 Abstract harmonic analysis
44 Integral transforms, operational calculus
45 Integral equations
46 Functional analysis
47 Operator theory
49 Calculus of variations, optimization
51 Geometry
52 Convex and discrete geometry
53 Differential geometry
54 General topology
55 Algebraic topology
57 Manifolds and cell complexes
58 Global analysis, analysis on manifolds
60 Probability theory and stochastic processes
62 Statistics
65 Numerical analysis
68 Computer science
70 Mechanics of particles and systems
74 Mechanics of deformable solids
76 Fluid mechanics
78 Optics, electromagnetic theory
80 Classical thermodynamics, heat transfer
81 Quantum theory
82 Statistical mechanics, structure of matter
83 Relativity and gravitational theory
85 Astronomy and astrophysics
86 Geophysics
90 Operations research, math. programming
91 Game theory, economics, social and behavioral sciences
92 Biology and other natural sciences
93 Systems theory and control
94 Information and communication, circuits
97 Mathematics education

What are fancy developments in mathematics today? Michael Atiyah [31] identified in the year 2000
the following six hot spots:

local and global
low and high dimension
commutative and non-commutative
linear and nonlinear
geometry and algebra
physics and mathematics

Also this choice is of course highly personal. One can easily add 12 other polarizing quantities
which help to distinguish or parametrize different parts of mathematical areas, especially the
ambivalent pairs which produce a captivating gradient:

regularity and randomness
discrete and continuous
integrable and non-integrable
existence and construction
invariants and perturbations
finite dimensional and infinite dimensional
experimental and deductive
topological and differential geometric
polynomial and exponential
practical and theoretical
applied and abstract
axiomatic and case based

The goal is to illustrate some of these structures from a historical point of view and show that
mathematics is "the science of structure".


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 2: Arithmetic
The oldest mathematical discipline is arithmetic. It is the theory of the construction and
manipulation of numbers. The earliest steps were done by Babylonian, Egyptian, Chinese, Indian and
Greek thinkers.

Building up the number system starts with the natural numbers 1, 2, 3, 4, ..., which can be added
and multiplied. Addition is natural: join 3 sticks to 5 sticks to get 8 sticks. Multiplication ∗ is
more subtle: 3 ∗ 4 means to take 3 copies of 4 and get 4 + 4 + 4 = 12, while 4 ∗ 3 means to take 4
copies of 3 to get 3 + 3 + 3 + 3 = 12. The first factor counts the number of operations while the
second factor counts the objects. To motivate 3 ∗ 4 = 4 ∗ 3, spatial insight suggests arranging the
12 objects in a rectangle. This commutativity axiom will be carried over to larger number systems.
Realizing an additive and multiplicative structure on the natural numbers requires defining 0 and
1. It leads naturally to more general numbers. There are two major motivations to build new
numbers: we want to

1. invert operations and still get results. 2. solve equations.

To find an additive inverse of 3 means solving $x + 3 = 0$. The answer is a negative number. To
solve $x \ast 3 = 1$, we get to a rational number $x = 1/3$. To solve $x^2 = 2$ one needs to escape
to real numbers. To solve $x^2 = -2$ requires complex numbers.

Numbers              Operation to complete          Examples of equations to solve
Natural numbers      addition and multiplication    $5 + x = 9$
Positive fractions   addition and division          $5x = 8$
Integers             subtraction                    $5 + x = 3$
Rational numbers     division                       $3x = 5$
Algebraic numbers    taking positive roots          $x^2 = 2$, $2x + x^2 - x^3 = 2$
Real numbers         taking limits                  $x = 1 - 1/3 + 1/5 - + \dots$, $\cos(x) = x$
Complex numbers      taking any roots               $x^2 = -2$
Surreal numbers      transfinite limits             $x^2 = \omega$, $1/x = \omega$
Surreal complex      any operation                  $x^2 + 1 = -\omega$
The development and history of arithmetic can be summarized as follows: humans started with natural
numbers, dealt with positive fractions, reluctantly introduced negative numbers and zero to get the
integers, struggled to "realize" real numbers, were scared to introduce complex numbers, hardly
accepted surreal numbers and most do not even know about surreal complex numbers. Ironically, as
simple but impossibly difficult questions in number theory show, the modern point of view is the
opposite of Kronecker's "God made the integers; all else is the work of man":

The surreal complex numbers are the most natural numbers;
The natural numbers are the most complex, surreal numbers.

Natural numbers. Counting can be realized by sticks, bones, quipu knots, pebbles or wampum knots.
The tally stick concept is still used when playing card games, where bundles of fives are formed,
maybe by crossing 4 "sticks" with a fifth. There is a "log counting" method in which graphs are
used and both vertices and edges count.

An old stone age tally stick, the wolf radius bone, contains 55 notches, with 5 groups of 5. It is
probably more than 30'000 years old. [663] The most famous paleolithic tally stick is the Ishango
bone, the fibula of a baboon. It could be 20'000 - 30'000 years old. [226] Earlier counting could
have been done by assembling pebbles, tying knots in a string, or making scratches in dirt or bark,
but no such traces have survived the thousands of years. The Roman system improved the tally stick
concept by introducing new symbols for larger numbers like V = 5, X = 10, L = 50, C = 100, D = 500,
M = 1000, in order to avoid bundling too many single sticks.


The system is unfit for computations, as simple calculations like VIII + VII = XV show. Clay
tablets, some as early as 2000 BC and others from 600 - 300 BC, are known. They feature Akkadian
arithmetic using the base 60. The sexagesimal system with base 60 is convenient because 60 has many
factors. It survived: we use 60 minutes per hour. The Egyptians used the base 10. The most
important source on Egyptian mathematics is the Rhind Papyrus of 1650 BC. It was found in 1858
[397, 663]. Hieratic numerals were used to write on papyrus from 2500 BC on. Egyptian numerals are
hieroglyphics. Found in carvings on tombs and monuments, they are 5000 years old. The modern way to
write numbers like 2018 is the Hindu-Arabic system, which diffused to the West only during the late
Middle Ages. It replaced the more primitive Roman system. [663] Greek arithmetic used a number
system with no place values: 9 Greek letters for 1, 2, ..., 9, nine for 10, 20, ..., 90 and nine
for 100, 200, ..., 900.

Integers. Indian mathematics morphed the place-value system into a modern method of writing numbers. Hindu astronomers used words to represent digits, but the numbers would be written in the opposite order. Independently, the Mayans also developed the concept of 0 in a number system using base 20. Some time after 500, the Hindus changed to a digital notation which included the symbol 0. Negative numbers were introduced around 100 BC in the Chinese text "Nine Chapters on the Mathematical Art". The Bakshali manuscript, written around 300 AD, also carried out additions and subtractions with negative numbers, where + was used to indicate a negative sign. [577] In Europe, negative numbers were avoided until the 15th century.

Fractions: Babylonians could handle fractions. The Egyptians also used fractions, but wrote every fraction as a sum of fractions with unit numerator and distinct denominators, like 4/5 = 1/2 + 1/4 + 1/20 or 5/6 = 1/2 + 1/3. Maybe because of such cumbersome computation techniques, Egyptian mathematics failed to progress beyond a primitive stage. [663] The modern decimal fractions used nowadays for numerical calculations were adopted in Europe only in 1595.
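The text does not say how such unit-fraction decompositions were found. As an illustration only, here is a minimal sketch of the greedy method usually attributed to Fibonacci (a later technique, not the Egyptian one): repeatedly subtract the largest unit fraction that still fits. It happens to reproduce both examples above.

```python
from fractions import Fraction

def egyptian(q):
    """Greedy expansion of a fraction 0 < q < 1 into distinct unit fractions."""
    parts = []
    while q > 0:
        d = -(-q.denominator // q.numerator)  # ceil(1/q): smallest d with 1/d <= q
        parts.append(Fraction(1, d))
        q -= Fraction(1, d)
    return parts

print(egyptian(Fraction(4, 5)))  # [Fraction(1, 2), Fraction(1, 4), Fraction(1, 20)]
```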

Real numbers: As the Greeks already noted, the diagonal of the unit square is not a fraction. This first produced a crisis until it became clear that "most" numbers are not rational. Georg Cantor was the first to see that the cardinality of the real numbers is much larger than the cardinality of the integers: one can enumerate all rational numbers but not all real numbers. One consequence is that most real numbers are transcendental: they do not occur as solutions of polynomial equations with integer coefficients. The number π is an example. The concept of real numbers is related to the concept of limit. Sums like 1 + 1/4 + 1/9 + 1/16 + 1/25 + ... are not rational.
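To make "one can enumerate all rational numbers" concrete, here is a minimal sketch that lists every positive rational exactly once. The particular enumeration (the Calkin-Wilf sequence, generated with Newman's formula x → 1/(2⌊x⌋ − x + 1)) is our choice; Cantor's own listing would serve equally well.

```python
from fractions import Fraction
from itertools import islice
from math import floor

def positive_rationals():
    """Yield every positive rational exactly once (Calkin-Wilf order)."""
    x = Fraction(1)
    while True:
        yield x
        x = 1 / (2 * floor(x) - x + 1)  # Newman's successor formula

print([str(q) for q in islice(positive_rationals(), 8)])
# ['1', '1/2', '2', '1/3', '3/2', '2/3', '3', '1/4']
```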

Complex numbers: some polynomials have no real root. To solve $x^2 = -1$ for example, we need new numbers. One idea is to use pairs of numbers (a, b), where (a, 0) = a are the usual numbers, and to extend addition and multiplication by (a, b) + (c, d) = (a + c, b + d) and (a, b) · (c, d) = (ac − bd, ad + bc). With this multiplication, the number (0, 1) has the property that (0, 1) · (0, 1) = (−1, 0) = −1. It is more convenient to write a + ib, where i = (0, 1) satisfies $i^2 = -1$. One can now use the common rules of addition and multiplication.
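The pair construction can be written down directly. This is a minimal sketch with the two rules above as functions on tuples; the function names are ours. It confirms that (0, 1) squares to (−1, 0) and that the rules agree with Python's built-in complex numbers.

```python
def add(z, w):
    """(a, b) + (c, d) = (a + c, b + d)"""
    (a, b), (c, d) = z, w
    return (a + c, b + d)

def mul(z, w):
    """(a, b) * (c, d) = (ac - bd, ad + bc)"""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
print(mul(i, i))  # (-1, 0), that is i^2 = -1
```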
Surreal numbers: Similarly as real numbers fill in the gaps between the integers, the surreal numbers fill in the gaps between Cantor's ordinal numbers. They are written as (a, b, c, ...|d, e, f, ...), meaning the "simplest" number that is larger than a, b, c, ... and smaller than d, e, f, .... We have (|) = 0, (0|) = 1, (1|) = 2 and (0|1) = 1/2 or (|0) = −1. Surreals already contain transfinite numbers like (0, 1, 2, 3, ...|) or infinitesimal numbers like (0|1/2, 1/3, 1/4, 1/5, ...). They were introduced in the 1970s by John Conway. The late appearance confirms the pedagogical principle: late human discovery manifests in increased difficulty to teach it.
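For finitely many options that are dyadic rationals, "the simplest number" between the left and right options can be computed. This is a minimal sketch of the simplicity rule under that restriction (infinite option sets like (0, 1, 2, 3, ...|) are out of reach): prefer 0, then the integer of smallest absolute value, then the dyadic fraction m/2^k with the smallest k. The function names are ours.

```python
from fractions import Fraction
from math import floor

def simplest_between(lo, hi):
    """Simplest dyadic x with lo < x < hi (None means no bound)."""
    if (lo is None or lo < 0) and (hi is None or hi > 0):
        return Fraction(0)  # 0 is the simplest number of all
    if hi is not None and hi <= 0:
        # mirror a negative interval onto the positive side
        return -simplest_between(-hi, None if lo is None else -lo)
    n = floor(lo) + 1  # here lo >= 0: smallest integer above lo
    if hi is None or n < hi:
        return Fraction(n)
    k = 1
    while True:  # no integer fits: smallest denominator 2^k that fits
        m = floor(lo * 2 ** k) + 1
        if Fraction(m, 2 ** k) < hi:
            return Fraction(m, 2 ** k)
        k += 1

def surreal(left, right):
    """Value of (left | right) for finite sets of dyadic rationals."""
    lo = max(left) if left else None
    hi = min(right) if right else None
    if lo is not None and hi is not None and not lo < hi:
        raise ValueError("every left option must be smaller than every right option")
    return simplest_between(lo, hi)

print(surreal([0], [1]))  # 1/2
```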


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 3: Geometry

Geometry is the science of shape, size and symmetry. While arithmetic deals with numerical structures, geometry handles metric structures. Geometry is one of the oldest mathematical disciplines. Early geometry has relations with arithmetic: the product n × m of two numbers can be seen as the area of a shape, an area that is invariant under rotational symmetry. Identities like the Pythagorean triple $3^2 + 4^2 = 5^2$ were interpreted and drawn geometrically. The right angle is the most "symmetric" angle apart from 0. Symmetry manifests itself in quantities which are invariant. Invariants are one of the most central aspects of geometry. Felix Klein's Erlangen program uses symmetry to classify geometries depending on how large the symmetries of the shapes are. In this lecture, we look at a few results which can all be stated in terms of invariants. In the presentation as well as the worksheet part of this lecture, we will work through smaller miracles like special points in triangles as well as a couple of gems: Pythagoras, Thales, Hippocrates, Feuerbach, Pappus, Morley, Butterfly, which illustrate the importance of symmetry.

Much of geometry is based on our ability to measure length, the distance between two points. Having a distance d(A, B) between any two points A, B, we can look at the next more complicated object, a set {A, B, C} of 3 points, a triangle. Given an arbitrary triangle ABC, are there relations between the 3 possible distances a = d(B, C), b = d(A, C), c = d(A, B)? If we fix the scale by c = 1, then a + b ≥ 1, a + 1 ≥ b, b + 1 ≥ a. For any pair (a, b) in this region, there is a triangle. After an identification, we get an abstract space which represents all triangles uniquely up to similarity. Mathematicians call this an example of a moduli space.
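The claim that every pair (a, b) in the region gives a triangle can be made constructive. In this minimal sketch (the coordinates are our choice), A = (0, 0) and B = (1, 0). Subtracting the equations x² + y² = b² and (x − 1)² + y² = a² gives x = (1 + b² − a²)/2, and the triangle inequalities guarantee b² − x² ≥ 0.

```python
import math

def triangle_from_sides(a, b):
    """With A = (0, 0), B = (1, 0), return C such that d(B, C) = a and d(A, C) = b."""
    if not (a + b >= 1 and a + 1 >= b and b + 1 >= a):
        raise ValueError("(a, b) violates the triangle inequalities")
    x = (1 + b * b - a * a) / 2
    y = math.sqrt(max(b * b - x * x, 0.0))  # clamp tiny negative rounding errors
    return (x, y)

print(triangle_from_sides(1, 1))  # the equilateral triangle: C = (0.5, 0.866...)
```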

A sphere $S_r(x)$ is the set of points which have distance r from a given point x. In the plane, the sphere is called a circle. A natural problem is to find the circumference L = 2π of a unit circle, the area A = π of a unit disc, the area F = 4π of a unit sphere and the volume V = 4π/3 of a unit ball. Measuring the length of segments on the circle leads to new concepts like angle or curvature. Because the circumference of the unit circle in the plane is L = 2π, angle questions are tied to the number π, which Archimedes already approximated by fractions.
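Archimedes bounded π with regular polygons inscribed in and circumscribed about a circle, doubling the number of sides from 6 up to 96. Here is a minimal sketch of that doubling for a circle of diameter 1 (circumference π). The harmonic-mean and geometric-mean recurrence is a modern way to write the doubling step, not Archimedes' own notation. The resulting numerical bounds lie inside his fractions 223/71 < π < 22/7.

```python
import math

def archimedes_bounds(doublings=4):
    """Perimeters of inscribed/circumscribed regular n-gons of a circle of diameter 1."""
    n = 6
    outer = 2 * math.sqrt(3)  # circumscribed hexagon: 6 * tan(pi / 6)
    inner = 3.0               # inscribed hexagon: 6 * sin(pi / 6)
    for _ in range(doublings):
        outer = 2 * outer * inner / (outer + inner)  # harmonic mean: 2n * tan(pi / 2n)
        inner = math.sqrt(outer * inner)             # geometric mean: 2n * sin(pi / 2n)
        n *= 2
    return n, inner, outer

print(archimedes_bounds())  # 96-gons: roughly 3.14103 < pi < 3.14271
```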

Volumes were also among the first quantities mathematicians wanted to measure and compute. A problem on the Moscow papyrus dating back to 1850 BC explains the general formula $h(a^2 + ab + b^2)/3$ for the volume of a truncated pyramid with base length a, roof length b and height h. Archimedes managed to compute the volume of the sphere: place a cone inside a cylinder of radius 1 and height 1, with the apex of the cone at the center of the bottom disc. The complement of the cone inside the cylinder has at each height h the cross-section area $\pi - \pi h^2$. The half sphere cut at height h is a disc of radius $\sqrt{1 - h^2}$, which has area $\pi(1 - h^2)$ too. Since the slices at each height have the same area, the volumes must be the same. The complement of the cone inside the cylinder has volume π − π/3 = 2π/3, half the volume of the sphere.
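Both volume arguments come down to adding up slices. This minimal sketch approximates volumes numerically as sums of cross-section areas (a midpoint rule, our choice of method). It checks the papyrus formula with a = 4, b = 2, h = 6, the values usually quoted for the Moscow problem (volume 56), and checks Archimedes' slice comparison.

```python
import math

def slice_volume(area, h0, h1, n=100_000):
    """Midpoint-rule approximation of the integral of a cross-section area function."""
    dh = (h1 - h0) / n
    return sum(area(h0 + (i + 0.5) * dh) for i in range(n)) * dh

# Truncated pyramid: square slices whose side shrinks linearly from a (base) to b (roof).
a, b, h = 4.0, 2.0, 6.0
frustum = slice_volume(lambda t: (a + (b - a) * t / h) ** 2, 0.0, h)
moscow = h * (a * a + a * b + b * b) / 3

# Archimedes: half-ball slices pi(1 - t^2) vs. cylinder-minus-cone slices pi - pi t^2.
half_ball = slice_volume(lambda t: math.pi * (1 - t * t), 0.0, 1.0)
cyl_minus_cone = slice_volume(lambda t: math.pi - math.pi * t * t, 0.0, 1.0)

print(frustum, moscow, half_ball, 2 * math.pi / 3)
```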

The first geometric playground was planimetry, the geometry in the flat two dimensional space. Highlights are Pythagoras theorem, Thales theorem, Hippocrates theorem, and Pappus theorem. Discoveries in planimetry have been made later on: an example is the Feuerbach 9 point theorem from the 19th century.

Ancient Greek mathematics is closely related to history. It starts with Thales, goes over Euclid's era around 300 BC and ends with the threefold destruction of Alexandria: 47 BC by the Romans, 392 by the Christians and 640 by the Muslims. Geometry was also the place where the axiomatic method was brought to mathematics: theorems are proved from a few statements called axioms, like the 5 axioms of Euclid:


1. Any two distinct points A, B determine a line through A and B.

2. A line segment [A, B] can be extended to a straight line containing the segment.

3. A line segment [A, B] determines a circle containing B and center A.

4. All right angles are congruent.

5. If lines L, M intersect with a third so that inner angles add up to < π, then L, M intersect.

Euclid wondered whether the fifth postulate can be derived from the first four; the theorems derived from the first four alone form what is now called "absolute geometry". Only much later, in the 19th century, did Karl-Friedrich Gauss, Janos Bolyai and Nicolai Lobachevsky realize that in hyperbolic space the 5th axiom does not hold. Indeed, geometry can be generalized to non-flat, or even much more abstract, situations. Basic examples are geometry on a sphere, leading to spherical geometry, or geometry on the Poincare disc, a hyperbolic space. Both of these geometries are non-Euclidean. Riemannian geometry, which is essential for general relativity theory, generalizes both concepts to a great extent. An example is the geometry on an arbitrary surface. Curvatures of such spaces can be computed by measuring lengths alone, for example by how long light needs to go from one point to the next.

An important moment in mathematics was the merge of geometry with algebra: this giant step is often attributed to René Descartes. Together with algebra, the subject leads to algebraic geometry, which can also be tackled with computers. Here are some examples of geometries which are determined by the amount of symmetry that is allowed:

Euclidean geometry Properties invariant under a group of rotations and translations

Affine geometry Properties invariant under a group of affine transformations

Projective geometry Properties invariant under a group of projective transformations

Spherical geometry Properties invariant under a group of rotations

Conformal geometry Properties invariant under angle preserving transformations

Hyperbolic geometry Properties invariant under a group of Möbius transformations

Here are four pictures about the 4 special points in a triangle, with which we will begin the lecture. We will see why in each of these cases the 3 lines intersect in a common point. It is a manifestation of a symmetry present on the space of all triangles: the distance between the intersection points stays constant 0 if we move on the space of all triangular shapes. It's Geometry!
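The pictures are not reproduced here. We assume the four special points are the usual ones: centroid (medians), orthocenter (altitudes), circumcenter (perpendicular bisectors) and incenter (angle bisectors). This minimal sketch intersects the lines of each family pairwise and measures the "distance of intersection points", which should be 0 up to rounding for any triangle.

```python
import math

def intersect(p, d, q, e):
    """Intersection of the lines p + s*d and q + t*e (assumed not parallel)."""
    cross = d[0] * e[1] - d[1] * e[0]
    s = ((q[0] - p[0]) * e[1] - (q[1] - p[1]) * e[0]) / cross
    return (p[0] + s * d[0], p[1] + s * d[1])

def mid(u, v):
    return ((u[0] + v[0]) / 2, (u[1] + v[1]) / 2)

def sub(u, v):
    return (u[0] - v[0], u[1] - v[1])

def perp(u):
    return (-u[1], u[0])

def unit(u):
    r = math.hypot(u[0], u[1])
    return (u[0] / r, u[1] / r)

def special_lines(A, B, C):
    """For each vertex P with opposite side QR: one line (point, direction) per family."""
    families = {"medians": [], "altitudes": [], "perpendicular bisectors": [], "angle bisectors": []}
    for P, Q, R in ((A, B, C), (B, C, A), (C, A, B)):
        side = sub(R, Q)
        families["medians"].append((P, sub(mid(Q, R), P)))
        families["altitudes"].append((P, perp(side)))
        families["perpendicular bisectors"].append((mid(Q, R), perp(side)))
        u, w = unit(sub(Q, P)), unit(sub(R, P))
        families["angle bisectors"].append((P, (u[0] + w[0], u[1] + w[1])))
    return families

def concurrency_gap(lines):
    """Distance between the intersections of lines 1, 2 and of lines 2, 3 (0 = concurrent)."""
    X = intersect(*lines[0], *lines[1])
    Y = intersect(*lines[1], *lines[2])
    return math.dist(X, Y)

for name, lines in special_lines((0, 0), (4, 0), (1, 3)).items():
    print(name, concurrency_gap(lines))  # all close to 0
```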


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 4: Number Theory

Number theory studies the structure of integers like prime numbers and solutions to Diophantine equations.

Gauss called it the "Queen of Mathematics". Here are a few theorems and open problems.

An integer larger than 1 which is divisible only by 1 and itself is called a prime number. The number $2^{57885161} - 1$, which has 17425170 digits, was the largest known prime in 2013. Euclid proved that there are infinitely many primes: [Proof. Assume there are only finitely many primes $p_1 < p_2 < \dots < p_n$. Then $N = p_1 p_2 \cdots p_n + 1$ is not divisible by any of $p_1, \dots, p_n$. Therefore, N is a prime or divisible by a prime larger than $p_n$, a contradiction.] Primes become sparser as they get larger. An important result is the prime number theorem, which states that the n'th prime number has approximately the size n log(n). For example the $n = 10^{12}$'th prime is p(n) = 29996224275833, while n log(n) = 27631021115928.545... and p(n)/(n log(n)) = 1.0856.... Many questions about prime numbers are unsettled. Here are five problems; the fourth uses the notation $(\Delta a)_n = |a_{n+1} - a_n|$ for the absolute difference. For example: $\Delta^2(1, 4, 9, 16, 25, \dots) = \Delta(3, 5, 7, 9, 11, \dots) = (2, 2, 2, 2, \dots)$. Progress on prime gaps was made in 2013: $p_{n+1} - p_n$ is smaller than 70'000'000 for infinitely many n (Yitang Zhang), and $p_{n+1} - p_n$ is smaller than 600 for infinitely many n (Maynard). The largest known gap is 1476, which occurs after p = 1425172824437699411.

Landau: there are infinitely many primes of the form $n^2 + 1$.

Twin prime: there are infinitely many primes p such that p + 2 is prime.

Goldbach: every even integer n > 2 is a sum of two primes.

Gilbreath: if $p_n$ enumerates the primes, then $(\Delta^k p)_1 = 1$ for all k > 0.

Andrica: the prime gap estimate $\sqrt{p_{n+1}} - \sqrt{p_n} < 1$ holds for all n.
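Several statements above can be explored numerically. This minimal sketch uses a sieve of Eratosthenes (our choice of method) to compare the n'th prime with n log(n) for n = 10^5, and checks Gilbreath's and Andrica's statements on a finite range. Such checks give evidence only; they prove nothing about the open problems.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            # cross out p^2, p^2 + p, ... (smaller multiples were crossed out earlier)
            is_prime[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(is_prime) if flag]

primes = primes_up_to(1_400_000)

# Prime number theorem: compare the n'th prime (primes[n - 1]) with n log(n).
n = 100_000
ratio = primes[n - 1] / (n * math.log(n))

def gilbreath_holds(p, depth):
    """Check (Delta^k p)_1 = 1 for k = 1..depth, with (Delta a)_n = |a_{n+1} - a_n|."""
    row = list(p)
    for _ in range(depth):
        row = [abs(b - a) for a, b in zip(row, row[1:])]
        if row[0] != 1:
            return False
    return True

# Andrica: largest value of sqrt(p_{n+1}) - sqrt(p_n) among the sieved primes.
andrica_max = max(math.sqrt(b) - math.sqrt(a) for a, b in zip(primes, primes[1:]))

print(primes[n - 1], ratio, andrica_max)
```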

If the sum of the proper divisors of a number n is equal to n, then n is called a perfect number. For example, 6 is perfect as its proper divisors 1, 2, 3 sum up to 6. All currently known perfect numbers are even. The question whether odd perfect numbers exist is probably the oldest open problem in mathematics and is not settled. Perfect numbers were already familiar to Pythagoras and his followers. Calendar coincidences, like that we have 6 work days and the moon needs a "perfect" 28 days to circle the earth, could have helped to promote the "mystery" of perfect numbers. Euclid of Alexandria (300-275 BC) was the first to realize that if $2^p - 1$ is prime, then $k = 2^{p-1}(2^p - 1)$ is a perfect number: [Proof: let σ(n) be the sum of all factors of n, including n. Since σ is multiplicative on coprime factors, $2^p - 1$ is prime and $\sigma(2^{p-1}) = 1 + 2 + \dots + 2^{p-1} = 2^p - 1$, we get $\sigma(k) = \sigma(2^p - 1)\,\sigma(2^{p-1}) = 2^p (2^p - 1) = 2 \cdot 2^{p-1}(2^p - 1) = 2k$, which verifies that k is perfect.] Around 100 AD, Nicomachus of Gerasa (60-120) classified numbers in his work "Introduction to Arithmetic" based on the concept of perfect numbers and listed four perfect numbers. Only much later did it become clear that Euclid got all the even perfect numbers: Euler showed that all even perfect numbers are of the form $2^{n-1}(2^n - 1)$, where $2^n - 1$ is prime. The prime factor $2^n - 1$ is called a Mersenne prime. [Proof: Assume $N = 2^k m$ is perfect, where m is odd and k > 0. Then $2^{k+1} m = 2N = \sigma(N) = (2^{k+1} - 1)\sigma(m)$. This gives $\sigma(m) = 2^{k+1} m/(2^{k+1} - 1) = m(1 + 1/(2^{k+1} - 1)) = m + m/(2^{k+1} - 1)$. Because σ(m) and m are integers, also $d = m/(2^{k+1} - 1)$ is an integer, and it is a factor of m different from m. So σ(m) = m + d is the sum of only two of the factors of m. Since 1 is always a factor, d = 1: m has no factors other than 1 and m, so m is prime and $2^{k+1} - 1 = m$.] The first 39 known Mersenne primes are of the form $2^n - 1$ with n = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377, 6972593, 13466917. There are 11 more known for which one does not know the rank of the corresponding Mersenne prime: n = 20996011, 24036583, 25964951, 30402457, 32582657, 37156667, 42643801, 43112609, 57885161, 74207281, 77232917. The last was found only in December 2017. It is unknown whether there are infinitely many.
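The text lists Mersenne exponents without saying how they are found. As a swapped-in technique, this minimal sketch uses the classical Lucas-Lehmer test: for an odd prime p, $2^p - 1$ is prime exactly when the sequence s = 4, s → s² − 2 (mod $2^p - 1$) reaches 0 after p − 2 steps. It then builds Euclid's perfect numbers and checks them with a divisor sum σ.

```python
def lucas_lehmer(p):
    """For an odd prime p: True iff 2^p - 1 is prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def mersenne_exponents(limit):
    """Exponents p <= limit with 2^p - 1 prime (p = 2 is handled separately)."""
    return [p for p in range(2, limit + 1) if is_prime(p) and (p == 2 or lucas_lehmer(p))]

def sigma(n):
    """Sum of all divisors of n, including n (pairs d, n/d up to sqrt(n))."""
    total, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            total += d if d * d == n else d + n // d
        d += 1
    return total

# Euclid: 2^(p-1) (2^p - 1) is perfect whenever 2^p - 1 is prime.
perfect = [2 ** (p - 1) * (2 ** p - 1) for p in mersenne_exponents(13)]
print(perfect)  # [6, 28, 496, 8128, 33550336]
```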


A polynomial equation for which all coefficients and unknowns are integers is called a Diophantine equation. The first Diophantine equation, studied already by the Babylonians, is $x^2 + y^2 = z^2$. A solution (x, y, z) of this equation in positive integers is called a Pythagorean triple. For example, (3, 4, 5) is a Pythagorean triple. Since 1600 BC, it is known that all solutions to this equation are, up to a common factor, of the form $(x, y, z) = (2st, s^2 - t^2, s^2 + t^2)$ or $(x, y, z) = (s^2 - t^2, 2st, s^2 + t^2)$, where s > t > 0 are integers. [Proof. Assume x, y, z have no common factor. Either x or y has to be even, because if both are odd, then the sum $x^2 + y^2$ is even but not divisible by 4, while the square $z^2$ is either odd or divisible by 4. Call the even one x and write $x^2 = z^2 - y^2 = (z - y)(z + y)$. Both z − y and z + y are even, and (z − y)/2 and (z + y)/2 have no common factor; since their product $(x/2)^2$ is a square, both are squares: $z + y = 2s^2$, $z - y = 2t^2$. Solving for z, y gives $z = s^2 + t^2$, $y = s^2 - t^2$, $x = 2st$.]
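The parametrization can be tested against a brute-force search. This minimal sketch generates primitive triples from (s, t). The extra conditions gcd(s, t) = 1 and s − t odd are the standard refinement that makes the formula produce each primitive triple exactly once; the text leaves them implicit.

```python
from math import gcd, isqrt

def primitive_triples(limit):
    """Primitive Pythagorean triples (x < y < z <= limit) from (s^2 - t^2, 2st, s^2 + t^2)."""
    triples = set()
    for s in range(2, isqrt(limit) + 1):
        for t in range(1, s):
            # s, t coprime and of opposite parity give exactly the primitive triples
            if gcd(s, t) == 1 and (s - t) % 2 == 1 and s * s + t * t <= limit:
                x, y = sorted((s * s - t * t, 2 * s * t))
                triples.add((x, y, s * s + t * t))
    return triples

def brute_force_primitive(limit):
    return {(x, y, z)
            for z in range(1, limit + 1)
            for y in range(1, z)
            for x in range(1, y)
            if x * x + y * y == z * z and gcd(gcd(x, y), z) == 1}

print(sorted(primitive_triples(30)))  # [(3, 4, 5), (5, 12, 13), (8, 15, 17), (7, 24, 25), (20, 21, 29)]
```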
Analyzing Diophantine equations can be difficult. Only in 1995 was it established that the Fermat equation $x^n + y^n = z^n$ has no solutions with $xyz \neq 0$ if n > 2. Here are some open problems for Diophantine equations. Are there nontrivial solutions to the following Diophantine equations?

$x^6 + y^6 + z^6 + u^6 + v^6 = w^6$, x, y, z, u, v, w > 0
$x^5 + y^5 + z^5 = w^5$, x, y, z, w > 0
$x^k + y^k = n!\, z^k$, k ≥ 2, n > 1
$x^a + y^b = z^c$, a, b, c > 2, gcd(x, y, z) = 1

The last equation is called Super Fermat. A Texan banker, Andrew Beal, once sponsored a prize of 100'000 dollars for a proof or counterexample to the statement: "If $x^p + y^q = z^r$ with positive integers x, y, z and p, q, r > 2, then gcd(x, y, z) > 1."

Given a prime like 7 and a number n, we can add or subtract multiples of 7 from n to get a number in {0, 1, 2, 3, 4, 5, 6}. We write for example 19 = 12 mod 7 because 12 and 19 both leave the remainder 5 when divided by 7. Or 5 ∗ 6 = 2 mod 7 because 30 leaves the remainder 2 when divided by 7. The most important theorem in elementary number theory is Fermat's little theorem, which tells that if a is an integer and p is prime, then $a^p - a$ is divisible by p. For example $2^7 - 2 = 126$ is divisible by 7. [Proof: use induction. For a = 0 it is clear. The binomial expansion shows that $(a+1)^p - a^p - 1$ is divisible by p, because the binomial coefficients $\binom{p}{k}$ with 0 < k < p are divisible by p. This means $(a+1)^p - (a+1) = (a^p - a) + mp$ for some m. By induction, $a^p - a$ is divisible by p, and so is $(a+1)^p - (a+1)$.] Another beautiful theorem is Wilson's theorem, which allows one to characterize primes: it tells that (n − 1)! + 1 is divisible by n if and only if n is a prime number. For example, for n = 5, we verify that 4! + 1 = 25 is divisible by 5. [Proof: assume n is prime. There are then exactly two numbers, 1 and n − 1 = −1, in {1, ..., n − 1} for which $x^2 - 1$ is divisible by n. The other numbers in 1, ..., n − 1 can be paired as (a, b) with a ≠ b and ab = 1 modulo n. Rearranging the product shows (n − 1)! = −1 modulo n. Conversely, if n is not prime, then n has a divisor k with 1 < k < n. Then k divides (n − 1)!, so k does not divide (n − 1)! + 1, and neither does n.]
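Both theorems are easy to test. This minimal sketch uses Python's three-argument `pow(a, p, p)` for modular exponentiation and an exact factorial for Wilson's criterion. The factorial version is hopeless as a practical primality test for large n; it is only meant to illustrate the statement.

```python
from math import factorial

def fermat_holds(a, p):
    """True if a^p - a is divisible by p."""
    return (pow(a, p, p) - a) % p == 0

def wilson_says_prime(n):
    """Wilson: for n > 1, (n - 1)! + 1 is divisible by n iff n is prime."""
    return (factorial(n - 1) + 1) % n == 0

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

print(19 % 7, 12 % 7, (5 * 6) % 7)  # 5 5 2
```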

The solution to systems of linear congruences like x = 3 (mod 5), x = 2 (mod 7) is given by the Chinese remainder theorem. To solve it, continue adding 5 to 3 until we reach a number which leaves the remainder 2 when divided by 7: on the list 3, 8, 13, 18, 23, 28, 33, 38, the number 23 is the solution. Since 5 and 7 have no common divisor larger than 1, the system has a solution, and it is unique modulo 5 · 7 = 35.
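The stepping procedure just described can be written as a short loop. Alongside it, this minimal sketch shows a faster variant using the modular inverse `pow(m, -1, n)` (available since Python 3.8). Both return the smallest non-negative solution.

```python
from math import gcd

def crt_by_stepping(a, m, b, n):
    """Solve x = a (mod m), x = b (mod n) by adding m to a until x = b (mod n)."""
    if gcd(m, n) != 1:
        raise ValueError("moduli must be coprime")
    x = a % m
    while x % n != b % n:
        x += m
    return x  # all solutions are x + k * m * n

def crt_by_inverse(a, m, b, n):
    """Same system: x = a + k m with k = (b - a) / m modulo n."""
    k = ((b - a) * pow(m, -1, n)) % n
    return (a + k * m) % (m * n)

print(crt_by_stepping(3, 5, 2, 7))  # 23
```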

For a given n, how do we solve $x^2 - yn = 1$ for the unknowns x, y? A solution produces a square root x of 1 modulo n. For prime n, only x = 1 and x = −1 are solutions. For composite n = pq with distinct odd primes p, q, more solutions appear: by the Chinese remainder theorem, one can choose x with x = 1 mod p and x = −1 mod q. Finding such an x is as hard as factoring n: the greatest common divisor of x − 1 and n is then a proper factor of n, and conversely, knowing p and q gives x. Factoring is difficult if the numbers are large. This assures that encryption algorithms work and that bank accounts and communications stay safe. Number theory, once the least applied discipline of mathematics, has become one of the most applied ones.
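For a small example, n = 77 = 7 · 11, this minimal sketch lists all square roots of 1 by brute force and shows how a nontrivial one reveals a factor through a gcd. Brute force is only feasible because n is tiny, which is exactly the point of the paragraph.

```python
from math import gcd

def square_roots_of_one(n):
    """All x in 0..n-1 with x^2 = 1 mod n (brute force, for illustration only)."""
    return [x for x in range(n) if (x * x - 1) % n == 0]

def factor_from_root(x, n):
    """A square root x of 1 mod n other than +-1 gives the proper factor gcd(x - 1, n)."""
    return gcd(x - 1, n)

print(square_roots_of_one(77))  # [1, 34, 43, 76]
print(factor_from_root(43, 77))  # 7, since 43 = 1 mod 7 and 43 = -1 mod 11
```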


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 5: Algebra
Algebra studies algebraic structures like "groups" and "rings". The theory allows one to solve polynomial equations, characterize objects by their symmetries and is the heart and soul of many puzzles. Lagrange claims Diophantus to be the inventor of Algebra; others argue that the subject started with solutions of quadratic equations by Mohammed ben Musa Al-Khwarizmi in the book Al-jabr w'al muqabala of 830 AD. Equations like $x^2 + 10x = 39$ are solved there by completing the square: add 25 on both sides to get $x^2 + 10x + 25 = 64$ and so $(x + 5)^2 = 64$, so that x + 5 = 8 and x = 3.
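Completing the square works the same way for any equation $x^2 + bx = c$. This minimal sketch (function name ours) adds (b/2)² to both sides and takes square roots. It returns both roots; the text above keeps only the positive one.

```python
import math

def complete_the_square(b, c):
    """Solve x^2 + b x = c via (x + b/2)^2 = c + (b/2)^2."""
    half = b / 2
    rhs = c + half * half
    if rhs < 0:
        raise ValueError("no real solution")
    r = math.sqrt(rhs)
    return (-half + r, -half - r)  # larger root first

print(complete_the_square(10, 39))  # (3.0, -13.0)
```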

The variables used in elementary school algebra were introduced later. Ancient texts only dealt with particular examples, and calculations were done with concrete numbers in the realm of arithmetic. Francois Viete (1540-1603) was the first to use letters like A, B, C, X for variables.

The search for formulas for polynomial equations of degree 3 and 4 lasted 700 years. In the 16th century, the cubic and quartic equations were solved. Niccolo Tartaglia and Gerolamo Cardano reduced the cubic to the quadratic: [first remove the quadratic part with X = x − a/3, so that $X^3 + aX^2 + bX + c$ becomes the depressed cubic $x^3 + px + q$. Now substitute $x = u - p/(3u)$ to get the equation $(u^6 + qu^3 - p^3/27)/u^3 = 0$, which is quadratic in $u^3$.] Lodovico Ferrari showed that the quartic equation can be reduced to the cubic. For the quintic, however, no formulas could be found. It was Paolo Ruffini, Niels Abel and Évariste Galois who independently realized that there are no formulas in terms of roots which allow one to "solve" equations p(x) = 0 for polynomials p of degree larger than 4. This was an amazing achievement and the birth of "group theory".
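The bracketed reduction translates directly into a numerical sketch. Shifting by a/3 gives p = b − a²/3 and q = 2a³/27 − ab/3 + c. Solving the quadratic for u³ and taking the three complex cube roots u then gives x = u − p/(3u). This is a minimal illustration using Python's `cmath`. Complex arithmetic sidesteps the case of three real roots, which troubled Cardano.

```python
import cmath

def cardano(a, b, c):
    """Roots of X^3 + a X^2 + b X + c = 0 via the depressed cubic x^3 + p x + q."""
    p = b - a * a / 3
    q = 2 * a ** 3 / 27 - a * b / 3 + c
    disc = cmath.sqrt(q * q + 4 * p ** 3 / 27)
    # u^3 solves w^2 + q w - p^3/27 = 0; take the root of larger modulus to avoid u = 0
    w1, w2 = (-q + disc) / 2, (-q - disc) / 2
    w = w1 if abs(w1) >= abs(w2) else w2
    if w == 0:  # only when p = q = 0: triple root
        return [-a / 3] * 3
    u0 = w ** (1 / 3)  # one complex cube root of w
    omega = cmath.exp(2j * cmath.pi / 3)
    roots = []
    for k in range(3):
        u = u0 * omega ** k
        roots.append(u - p / (3 * u) - a / 3)  # x = u - p/(3u), then X = x - a/3
    return roots

print(cardano(-6, 11, -6))  # roots of (X - 1)(X - 2)(X - 3), up to rounding
```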

Two important algebraic structures are groups and rings.

In a group G one has an operation ∗, an inverse $a^{-1}$ and a one-element 1 such that a ∗ (b ∗ c) = (a ∗ b) ∗ c, a ∗ 1 = 1 ∗ a = a, $a * a^{-1} = a^{-1} * a = 1$. For example, the set $\mathbb{Q}^*$ of nonzero fractions p/q with multiplication as operation ∗ and inverse $a^{-1} = 1/a$ forms a group. The integers with addition, inverse $a^{-1} = -a$ and "1"-element 0 form a group too. A ring R has two operations + and ∗, where the plus operation is a group satisfying a + b = b + a, in which the one-element is called 0. The multiplication operation ∗ has all group properties on R except the existence of an inverse. The two operations + and ∗ are glued together by the distributive law a ∗ (b + c) = a ∗ b + a ∗ c. Examples of rings are the integers, the rational numbers and the real numbers. The latter two are actually fields, rings for which the multiplication on nonzero elements is a group too. The integers do not form a field, because an integer like 5 has no multiplicative inverse. The rational numbers, however, form a field.
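The integers and fractions are infinite, so the axioms cannot be checked by listing elements. Finite examples can be checked, though. This minimal sketch verifies the group axioms by brute force. As our own choice of examples it uses remainders modulo n: addition mod 6 gives a group, and multiplication on the nonzero remainders mod 7 gives a group. Multiplication on the nonzero remainders mod 6 does not, which mirrors the difference between fields and other rings.

```python
from itertools import product

def is_group(elements, op, identity):
    """Brute-force check of closure, associativity, neutral element and inverses."""
    els = list(elements)
    closed = all(op(a, b) in els for a, b in product(els, repeat=2))
    assoc = all(op(a, op(b, c)) == op(op(a, b), c) for a, b, c in product(els, repeat=3))
    neutral = all(op(a, identity) == a == op(identity, a) for a in els)
    inverses = all(any(op(a, b) == identity == op(b, a) for b in els) for a in els)
    return closed and assoc and neutral and inverses

print(is_group(range(1, 7), lambda a, b: a * b % 7, 1))  # True: nonzero remainders mod 7
print(is_group(range(1, 6), lambda a, b: a * b % 6, 1))  # False: 2 * 3 = 0 mod 6
```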

Why is the theory of groups and rings not part of arithmetic? First of all, a crucial ingredient of algebra is the appearance of variables and computations with these variables without using concrete numbers. Second, the algebraic structures are not restricted to "numbers". Groups and rings are general structures and extend, for example, to objects like the set of all possible symmetries of a geometric object. The set of all similarity operations on the plane, for example, forms a group. An important example of a ring is the polynomial ring of all polynomials. Given any ring R and a variable x, the set R[x] consists of all polynomials with coefficients in R. The addition and multiplication is done like in $(x^2 + 3x + 1) + (x - 7) = x^2 + 4x - 6$. The problem to factor a given polynomial with integer coefficients into polynomials of smaller degree, like writing $x^2 - x - 2$ as $(x + 1)(x - 2)$, has a number theoretical flavor. Because the symmetries of some structure form a group, we also have intimate connections with geometry. But this is not the only connection with geometry. Geometry also enters through the polynomial rings with several variables. Solutions to f(x, y) = 0 lead to geometric objects with shape and symmetry which sometimes even have their own algebraic structure. They are called varieties, a central object in algebraic geometry, objects which in turn have been generalized


further to schemes, algebraic spaces or stacks.
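Computing in R[x] only needs the coefficients. This minimal sketch stores a polynomial as a list [c0, c1, c2, ...], where c_k is the coefficient of x^k (our convention). It checks the two examples from the paragraph above.

```python
def poly_add(p, q):
    """Add coefficient lists [c0, c1, c2, ...] (c_k is the coefficient of x^k)."""
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    """Multiply coefficient lists, collecting x^i * x^j = x^(i+j)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

print(poly_add([1, 3, 1], [-7, 1]))  # [-6, 4, 1]: x^2 + 4x - 6
print(poly_mul([1, 1], [-2, 1]))     # [-2, -1, 1]: (x + 1)(x - 2) = x^2 - x - 2
```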

Arithmetic introduces addition and multiplication of numbers. Both lead to groups. The operations can be written additively or multiplicatively. Let's look at this a bit closer: for integers, fractions and reals with the addition +, the 1-element 0 and the inverse −g, we have a group. Many groups are written multiplicatively, where the 1-element is 1. In the case of fractions or reals, 0 is not part of the multiplicative group because it is not possible to divide by 0. The nonzero fractions or the nonzero reals form a group. In all these examples the groups satisfy the commutative law g ∗ h = h ∗ g.

Here is a group which is not commutative: let G be the set of all rotations in space which leave the unit cube invariant. There are 3 · 3 = 9 rotations around the three coordinate axes, then 6 rotations around axes connecting midpoints of opposite edges, then 2 · 4 = 8 rotations around the space diagonals. Together with the identity rotation e, these are 24 rotations. The group operation is the composition of these transformations.

Another example of a group is $S_4$, the set of all permutations of four numbers (1, 2, 3, 4). If g : (1, 2, 3, 4) → (2, 3, 4, 1) is a permutation and h : (1, 2, 3, 4) → (3, 1, 2, 4) is another permutation, then we can combine the two and define h ∗ g as the permutation which does first g and then h. We end up with the permutation (1, 2, 3, 4) → (1, 2, 4, 3). The rotational symmetry group of the cube happens to be the same as the group $S_4$. To see this "isomorphism", label the 4 space diagonals in the cube by 1, 2, 3, 4. Given a rotation, we can look at the induced permutation of the diagonals, and every rotation corresponds to exactly one permutation.
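The count of 24 rotations, the non-commutativity and the correspondence with $S_4$ can all be checked by computer. In this minimal sketch the cube is [−1, 1]³ (our choice). Its symmetries are then the 3 × 3 signed permutation matrices, and the rotations are the ones with determinant +1. Each rotation is mapped to the permutation it induces on the 4 space diagonals.

```python
from itertools import permutations, product

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)) for i in range(3))

def apply(m, v):
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Signed permutation matrices with determinant +1 = rotations of the cube [-1, 1]^3.
rotations = [m for m in (
    tuple(tuple(signs[i] if j == perm[i] else 0 for j in range(3)) for i in range(3))
    for perm in permutations(range(3)) for signs in product((1, -1), repeat=3))
    if det3(m) == 1]

# The 4 space diagonals, each represented by one of its two end points.
DIAGONALS = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]

def diagonal_index(v):
    return DIAGONALS.index(v) if v in DIAGONALS else DIAGONALS.index(tuple(-x for x in v))

def induced_permutation(m):
    """Index of the diagonal to which the rotation m sends each diagonal."""
    return tuple(diagonal_index(apply(m, d)) for d in DIAGONALS)

def compose(p, q):
    """First q, then p (like h * g above), for permutations stored as image tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

g = (2, 3, 4, 1)  # 1 -> 2, 2 -> 3, 3 -> 4, 4 -> 1
h = (3, 1, 2, 4)
h_after_g = tuple(h[g[i] - 1] for i in range(4))
print(len(rotations), h_after_g)  # 24 (1, 2, 4, 3)
```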

The symmetry group can be introduced for any geometric object, for example for shapes like the triangle, the cube, the octahedron or tilings in the plane.

Symmetry groups describe geometric shapes by algebra.

Many puzzles are groups. A popular puzzle, the 15-puzzle, was invented in 1874 by Noyes Palmer Chapman in the state of New York. If the hole is given the number 0, then the task of the puzzle is to order a given random start permutation of the 16 pieces. To do so, the user is allowed to transpose 0 with a neighboring piece. Since every step changes the parity s of the permutation and changes the taxi-metric distance d of 0 to the end position by 1, only situations with even s + d can be reached. It was Sam Loyd who suggested to start with an impossible position and, as an evil plot, to offer 1000 dollars for a solution. The 15 puzzle group has 16!/2 elements and the "god number" is between 152 and 208. The Rubik cube is another famous puzzle which is a group. Exactly 100 years after the invention of the 15 puzzle, the Rubik puzzle was introduced in 1974. It is still popular and the world record is to have it solved in 5.55 seconds. All cubes 2x2x2 to 7x7x7 in a row have been solved in a total time of 6 minutes. For the 3x3x3 cube, the God number is now known to be 20: one can always solve it in 20 or fewer moves.
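The invariant s + d is easy to compute. This minimal sketch assumes the board is stored row by row as a 16-tuple, with 0 for the hole and the solved position (1, ..., 15, 0). It computes the parity of the permutation by counting cycles and adds the taxi-metric distance of the hole from the bottom right corner. Every position reached by legal moves passes the test, and Loyd's position with 14 and 15 swapped fails it. That every position passing the test is in fact reachable is a known theorem, but the code does not prove it.

```python
import random

GOAL = tuple(range(1, 16)) + (0,)  # tiles 1..15 row by row, hole 0 in the bottom right

def permutation_parity(state):
    """Parity (0 even, 1 odd) of the permutation taking GOAL to state, via cycle counting."""
    where = {tile: i for i, tile in enumerate(GOAL)}
    perm = [where[tile] for tile in state]
    seen, cycles = [False] * 16, 0
    for i in range(16):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return (16 - cycles) % 2

def hole_distance(state):
    """Taxi-metric distance of the hole from its end position (row 3, column 3)."""
    r, c = divmod(state.index(0), 4)
    return (3 - r) + (3 - c)

def passes_parity_test(state):
    return (permutation_parity(state) + hole_distance(state)) % 2 == 0

def random_moves(state, steps, seed=0):
    """Slide the hole to random neighbors; all such states are reachable by construction."""
    rng = random.Random(seed)
    s = list(state)
    for _ in range(steps):
        h = s.index(0)
        r, c = divmod(h, 4)
        nbrs = [(r + dr) * 4 + (c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < 4 and 0 <= c + dc < 4]
        k = rng.choice(nbrs)
        s[h], s[k] = s[k], s[h]
    return tuple(s)

loyd = GOAL[:13] + (15, 14, 0)  # Sam Loyd's position: 14 and 15 swapped
print(passes_parity_test(GOAL), passes_parity_test(loyd))  # True False
```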

Many puzzles are groups.

A small Rubik type game is the "floppy", which is a third of the Rubik and which has only 192 elements. Another example is Meffert's great challenge. Probably the simplest example of a Rubik type puzzle is the pyramorphix. It is a puzzle based on the tetrahedron. Its group has only 24 elements. It is the group of all possible permutations of the 4 elements. It is the same group as the group of all rotation symmetries of the cube in three dimensions (and as the group of all reflection and rotation symmetries of the tetrahedron), and it is also relevant when understanding the solutions to the quartic equation discussed at the beginning. The circle is closed.


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 6: Calculus
Calculus generalizes the process of taking differences and taking sums. Differences measure change, sums explore how quantities accumulate. The procedure of taking differences has a limit called derivative. The activity of taking sums leads to the integral. Sum and difference are dual to each other and related in an intimate way. In this lecture, we look first at a simple set-up, where functions are evaluated on integers and where we do not take any limits.

Several dozen thousand years ago, numbers were represented by units like 1, 1, 1, 1, 1, 1, .... The units were carved into sticks or bones like the Ishango bone. It took thousands of years until numbers were represented with symbols like 0, 1, 2, 3, 4, .... Using the modern concept of function, we can say f(0) = 0, f(1) = 1, f(2) = 2, f(3) = 3 and mean that the function f assigns to an input like 1001 an output like f(1001) = 1001. Now look at Df(n) = f(n + 1) − f(n), the difference. We see that Df(n) = 1 for all n. We can also formalize the summation process. If g(n) = 1 is the constant 1 function, then Sg(n) = g(0) + g(1) + ... + g(n − 1) = 1 + 1 + ... + 1 = n. We see that Dg = 0 and Sg = f, while Df = g. If we start with f(n) = n and apply summation on that function, then Sf(n) = f(0) + f(1) + f(2) + ... + f(n − 1), leading to the values 0, 1, 3, 6, 10, 15, 21, ... for n = 1, 2, 3, .... The new function g = Sf satisfies g(2) = 1, g(3) = 3, g(4) = 6, etc. The values are called the triangular numbers. From g we can get back f by taking differences: Dg(n) = g(n + 1) − g(n) = f(n). For example Dg(5) = g(6) − g(5) = 15 − 10 = 5, which indeed is f(5). Finding a formula for the sum Sf(n) is not so easy. Can you do it? When Karl-Friedrich Gauss was a 9 year old school kid, his teacher, a Mr. Büttner, gave him the task to sum up the first 100 numbers 1 + 2 + ... + 100. Gauss found the answer immediately by pairing things up: to add up 1 + 2 + 3 + ... + 100, he would write this as (1 + 100) + (2 + 99) + ... + (50 + 51), leading to 50 terms of 101, to get for n = 101 the value g(n) = n(n − 1)/2 = 5050. Taking differences again is easier: Dg(n) = n(n + 1)/2 − n(n − 1)/2 = n = f(n). If we add up the triangular numbers, we compute h = Sg, which has the values 0, 1, 4, 10, 20, 35, ... for n = 2, 3, 4, .... These are the tetrahedral numbers because h(n + 2) balls are needed to build a tetrahedron of side length n. For example, h(6) = 20 golf balls are needed to build a tetrahedron of side length 4. The formula which holds for h is h(n) = n(n − 1)(n − 2)/6. Here is the fundamental theorem of calculus, which is the core of calculus:

$$SDf(n) = f(n) - f(0), \qquad DSf(n) = f(n).$$

Proof.
$$SDf(n) = \sum_{k=0}^{n-1} [f(k+1) - f(k)] = f(n) - f(0),$$
$$DSf(n) = \sum_{k=0}^{n} f(k) - \sum_{k=0}^{n-1} f(k) = f(n).$$

The process of adding up numbers will lead to the integral $\int_0^x f(t)\,dt$. The process of taking differences will lead to the derivative $\frac{d}{dx} f(x)$. The familiar notation is
$$\int_0^x \frac{d}{dt} f(t)\,dt = f(x) - f(0), \qquad \frac{d}{dx} \int_0^x f(t)\,dt = f(x).$$
If we define $[n]^0 = 1$, $[n]^1 = n$, $[n]^2 = n(n-1)$, $[n]^3 = n(n-1)(n-2)$, then $D[n]^1 = 1 = [n]^0$, $D[n]^2 = 2[n]^1$, $D[n]^3 = 3[n]^2$ and in general
$$D[x]^n = n[x]^{n-1},$$
the discrete version of $\frac{d}{dx} x^n = n x^{n-1}$.
The calculus you have just seen contains the essence of single variable calculus. This core idea will become

more powerful and natural if we use it together with the concept of limit.
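The discrete calculus above fits in a few lines of code. This minimal sketch defines D and S as operators on functions of the integers. It then checks both halves of the fundamental theorem on a sample function (our choice), the triangular and tetrahedral numbers, Gauss' sum, and the rule D[x]^n = n[x]^(n−1) for falling powers.

```python
def D(f):
    """Difference operator: Df(n) = f(n + 1) - f(n)."""
    return lambda n: f(n + 1) - f(n)

def S(f):
    """Summation operator: Sf(n) = f(0) + f(1) + ... + f(n - 1)."""
    return lambda n: sum(f(k) for k in range(n))

def falling(n, k):
    """Falling power [n]^k = n (n - 1) ... (n - k + 1)."""
    out = 1
    for i in range(k):
        out *= n - i
    return out

f = lambda n: n * n + 3   # an arbitrary test function
g = S(lambda n: n)        # triangular numbers
h = S(g)                  # tetrahedral numbers

print([g(n) for n in range(1, 8)], g(101))  # [0, 1, 3, 6, 10, 15, 21] 5050
```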


Problem: The Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, ... satisfies the rule f(x) = f(x − 1) + f(x − 2). For example, f(6) = 8. What is the function g = Df, if we assume f(0) = 0? We take the difference between successive numbers and get the sequence of numbers 0, 1, 1, 2, 3, 5, 8, ..., which is the same sequence again. We see that Df(x) = f(x − 1).

If we take the same function f but now compute the function h(n) = Sf(n), we get the sequence 1, 2, 4, 7, 12, 20, 33, .... What sequence is that? Solution: Because Df(x) = f(x − 1), we have f(k) = Df(k + 1), so that Sf(x) = Df(1) + Df(2) + ... + Df(x) = f(x + 1) − f(1). Summing the Fibonacci sequence produces the Fibonacci sequence shifted to the left, with f(1) = 1 subtracted. It has been relatively easy to find the sum, because we knew what the difference operation did. This example shows: we can study differences to understand sums.
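Both Fibonacci identities can be checked directly. This minimal sketch uses f(0) = 0, f(1) = 1 as in the problem and the summation operator S from the text. It confirms Df(x) = f(x − 1) and Sf(x) = f(x + 1) − f(1).

```python
def fib(x):
    """Fibonacci numbers with f(0) = 0, f(1) = 1."""
    a, b = 0, 1
    for _ in range(x):
        a, b = b, a + b
    return a

def S(f):
    """Sf(n) = f(0) + f(1) + ... + f(n - 1)."""
    return lambda n: sum(f(k) for k in range(n))

Sfib = S(fib)
print([Sfib(n) for n in range(2, 9)])  # [1, 2, 4, 7, 12, 20, 33]
```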

Problem: The function $f(n) = 2^n$ is called the exponential function. We have for example f(0) = 1, f(1) = 2, f(2) = 4, .... It leads to the sequence of numbers

n= 0 1 2 3 4 5 6 7 8 ...
f(n)= 1 2 4 8 16 32 64 128 256 ...

We can verify that f satisfies the equation Df(x) = f(x), because $Df(x) = 2^{x+1} - 2^x = (2 - 1)2^x = 2^x$.
This is an important special case of the fact that

The derivative of the exponential function is the exponential function itself.

The function $2^x$ is a special case of the exponential function when the Planck constant is equal to 1. We will see that the relation will hold for any h > 0 and also in the limit h → 0, where it becomes the classical exponential function $e^x$, which plays an important role in science.
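The text does not yet say what the exponential function is for a general h. In this sketch we assume the natural choice $\exp_h(x) = (1+h)^{x/h}$ together with the difference quotient $D_h f(x) = (f(x+h) - f(x))/h$. Then $D_h \exp_h = \exp_h$ holds exactly, h = 1 gives $2^x$, and letting h → 0 approaches $e^x$. This is a minimal numerical illustration of those three facts.

```python
import math

def exp_h(x, h):
    """Assumed h-exponential (1 + h)^(x/h); h = 1 gives 2^x."""
    return (1 + h) ** (x / h)

def D_h(f, h):
    """Difference quotient with step h: (f(x + h) - f(x)) / h."""
    return lambda x: (f(x + h) - f(x)) / h

print([exp_h(n, 1) for n in range(9)])  # 1, 2, 4, ..., 256 as floats
print(exp_h(1, 1e-6), math.e)          # close to e for small h
```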

Calculus has many applications: computing areas, volumes, solving differential equations. It even has applications in arithmetic. Here is an example for illustration: a proof that π is irrational. The theorem is due to Johann Heinrich Lambert (1728-1777). We show here the proof by Ivan Niven, as given in a book of Niven-Zuckerman-Montgomery. It originally appeared in 1947 (Ivan Niven, Bull. Amer. Math. Soc. 53 (1947), 509). The proof illustrates how calculus can help to get results in arithmetic.

Proof. Assume π = a/b with positive integers a and b. For any positive integer n define

$f(x) = x^n (a - bx)^n / n!$.

We have $f(x) = f(\pi - x)$ and

$0 \le f(x) \le \pi^n a^n / n!$   (*)

for $0 \le x \le \pi$. For $0 \le j < n$, the j-th derivative of f is zero at 0 and π, and for $j \ge n$, the j-th derivative of f is an integer at 0 and π. The function

$F(x) = f(x) - f^{(2)}(x) + f^{(4)}(x) - \dots + (-1)^n f^{(2n)}(x)$

has the property that F(0) and F(π) are integers and F + F'' = f. Therefore, $(F'(x)\sin(x) - F(x)\cos(x))' = f(x)\sin(x)$. By the fundamental theorem of calculus,

$\int_0^\pi f(x)\sin(x)\,dx = F(\pi) + F(0)$

is an integer. Inequality (*) implies however that this integral is between 0 and 1 for large enough n: it is positive because f(x) sin(x) > 0 on (0, π), and it is at most $\pi^{n+1} a^n / n!$, which tends to 0. For such an n we get a contradiction.
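To see how large n has to be, one can compute the bound $\pi^{n+1} a^n / n!$ for the integral. The sketch below finds the first n for which it drops below 1, for a few hypothetical numerators a; it works with logarithms (math.lgamma(n + 1) = log n!) to avoid floating point overflow, since the bound first grows enormously before it decays.

```python
import math

def log_bound(a, n):
    """log of pi**(n+1) * a**n / n!, an upper bound for the integral in the proof."""
    return (n + 1) * math.log(math.pi) + n * math.log(a) - math.lgamma(n + 1)

def first_n_below_one(a):
    """Smallest n for which the bound is < 1, so that the integral would lie in (0, 1)."""
    n = 0
    while log_bound(a, n) >= 0:
        n += 1
    return n

for a in (1, 22, 355):   # hypothetical numerators, e.g. from pi ~ 22/7 or 355/113
    print(a, first_n_below_one(a))
```

The bound does not depend on b, so for every hypothetical fraction a/b such an n exists; that is all the contradiction needs.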

FUNDAMENTAL THEOREMS

E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 7: Set Theory and Logic

Set theory studies sets, the fundamental building blocks of mathematics. While logic describes the language of all mathematics, set theory provides the framework for additional structures like category theory. In Cantorian set theory, one can compute with subsets of a given set X like with numbers. There are two basic operations: the addition A + B of two sets is defined as the set of all points which are in exactly one of the sets. The multiplication A · B of two sets contains all the points which are in both sets. With the symmetric difference as addition and the intersection as multiplication, the subsets of a given set X become a ring. This Boolean ring has the property A + A = 0 and A · A = A for all sets. The zero element is the empty set ∅ = {}. Every set is its own additive inverse, −A = A, because A + A = ∅. The multiplicative 1-element is the set X because X · A = A. As in the ring Z of integers, addition and multiplication on sets are commutative. Multiplication does not have an inverse in general. Two sets A, B have the same cardinality if there exists a bijection, a one-to-one and onto map, from A to B. For finite sets, this means that they have the same number of elements. Sets which do not have finitely many elements are called infinite. Do all sets with infinitely many elements have the same cardinality?
The integers Z and the natural numbers N = {1, 2, 3, ...}, for example, are infinite sets which have the same cardinality: the map f(2n) = n, f(2n + 1) = −n establishes a bijection between N and Z. Also the rational numbers Q have the same cardinality as N. Associate a fraction p/q with a point (p, q) in the plane. Now cut out the column q = 0 and run the Ulam spiral on the modified plane, skipping fractions like 2/4 which represent a rational number already counted. This provides a numbering of the rationals. Infinite sets which can be counted are said to have cardinality ℵ0. Does an interval have the same cardinality as the reals? Even though an interval like I = (−π/2, π/2) has finite length, one can map it bijectively to R, because tan : I → R is bijective. Similarly, one can see that any two intervals of positive length have the same cardinality.
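The bijection between N = {1, 2, 3, ...} and Z can be checked on an initial segment. A small sketch; the names f and f_inverse are illustrative, and the inverse is worked out from the two cases of the definition.

```python
def f(k):
    """The map N = {1, 2, 3, ...} -> Z with f(2n) = n and f(2n + 1) = -n."""
    return k // 2 if k % 2 == 0 else -(k // 2)

def f_inverse(z):
    """Inverse map: z > 0 comes from the even number 2z, z <= 0 from the odd number 2(-z) + 1."""
    return 2 * z if z > 0 else -2 * z + 1

print([f(k) for k in range(1, 10)])   # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```

Having an explicit inverse is what makes f a bijection: f_inverse(f(k)) = k for every k in N and f(f_inverse(z)) = z for every z in Z.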
It was a great moment of mathematics when Georg Cantor realized in 1874 that the interval (0, 1) does not have the same cardinality as the natural numbers. His later diagonal argument (1891) is ingenious: assume we could count the points $a_1, a_2, \dots$. If $0.a_{i1}a_{i2}a_{i3}\dots$ is the decimal expansion of $a_i$, define the real number $b = 0.b_1b_2b_3\dots$, where $b_i = a_{ii} + 1 \bmod 10$. Because this number b does not agree at the first decimal place with $a_1$, at the second place with $a_2$ and so on, the number b does not appear in that enumeration of all reals. (To avoid numbers with two decimal expansions, like 0.1999... = 0.2, one can instead choose digits avoiding 0 and 9, for example $b_i = 5$ if $a_{ii} \ne 5$ and $b_i = 4$ otherwise.) This is a contradiction. The new cardinality, the continuum, is also denoted ℵ1 here; strictly speaking it is $2^{\aleph_0}$, and the statement $2^{\aleph_0} = \aleph_1$ is the continuum hypothesis discussed below. The reals are uncountable. This gives elegant proofs, like the existence of transcendental numbers, numbers which are not algebraic, meaning that they are not the root of any polynomial with integer coefficients: algebraic numbers can be counted. Similarly as one can establish a bijection between the natural numbers N and the integers Z, there is a bijection f between the unit interval and the unit square: if $x = 0.x_1x_2x_3\dots$ is the decimal expansion of x, then $f(x) = (0.x_1x_3x_5\dots, 0.x_2x_4x_6\dots)$ is essentially the bijection (again, some care is needed for numbers with two decimal expansions). Are there cardinalities larger than ℵ1? Cantor answered also this question. He showed that for any set, the set of all subsets has a larger cardinality than the set itself. How does one see this? Assume there is a bijection x → A(x) which maps each point x of X to a subset A(x) of X. Now look at the set B = {x | x ∉ A(x)} and let b be the point in X which corresponds to B, so that A(b) = B. If b ∈ B, then b ∉ A(b) = B. On the other hand, if b ∉ B, then b ∈ A(b) = B. So the set B does not appear in the "enumeration" x → A(x) of all sets. The set of all subsets of N has the same cardinality as the continuum: $A \mapsto \sum_{j \in A} 1/2^j$ provides a map from P(N) onto [0, 1]. The set of all finite subsets of N, however, can be counted. The set of all subsets of the real numbers has cardinality ℵ2 in the same notation, etc. Is there a cardinality between ℵ0 and ℵ1? In other words, is there a set which can not be counted and which is strictly smaller than the continuum in the sense that one can not find a bijection between it and R? This was the first of the 23 problems posed by Hilbert in 1900. The answer is surprising: one has a choice. One can accept either the "yes" or the "no" as a new axiom. In both cases, mathematics is still fine. The nonexistence of a cardinality between ℵ0 and ℵ1 is called the continuum hypothesis and is usually abbreviated CH. It is independent of the other axioms making up mathematics. This was the work of Kurt Gödel in 1940 and Paul Cohen in 1963. The story of exploring the consistency and completeness of axiom systems of all of mathematics is exciting. Euclid axiomatized geometry; Hilbert's program was more ambitious. He aimed at a set of axiom systems for all of mathematics. The challenge to prove Euclid's 5th postulate is paralleled by the quest to prove the CH. But the latter is much more fundamental, because it deals with all of mathematics and not only with some geometric space. Here are the Zermelo-Fraenkel axioms (ZFC), including the axiom of choice (C), as established by Ernst Zermelo in 1908 and Adolf Fraenkel and Thoralf Skolem in 1922.

Extension If two sets have the same elements, they are the same.
Image Given a function and a set, then the image of the function is a set too.
Pairing For any two sets, there exists a set which contains both sets.
Property For any property and any set, there exists the subset of all elements of the set which have the property.
Union Given a set of sets, there exists a set which is the union of these sets.
Power Given a set, there exists the set of all subsets of this set.
Infinity There exists an infinite set.
Regularity Every nonempty set has an element which has no intersection with the set.
Choice Any set of nonempty sets leads to a set which contains an element from each.
There are other systems like ETCS, which is the elementary theory of the category of sets. In category theory, not the sets but the categories are the building blocks. Categories do not form a set in general. This approach elegantly avoids the Russell paradox too. The axiom of choice (C) has a nonconstructive nature which can lead to seemingly paradoxical results like the Banach-Tarski paradox: one can cut the unit ball into 5 pieces, rotate and translate the pieces to assemble two identical balls of the same size as the original ball. Gödel and Cohen showed that the axiom of choice is logically independent of the other axioms ZF. Other axioms in ZF have been shown to be independent, like the axiom of infinity. A finitist would reject this axiom and work without it. It is surprising what one can do with finite sets. The axiom of regularity excludes sets which contain themselves, and the restriction that a property can only select elements from an already given set excludes Russellian sets like the set X of all sets which do not contain themselves. The Russell paradox is: does X contain X? It is popularized as the barber riddle: a barber in a town shaves exactly those people who do not shave themselves. Does the barber shave himself? Gödel's theorems of 1931 deal with mathematical theories which are strong enough to do basic arithmetic in them.

First incompleteness theorem: In any such consistent theory, there are true statements which can not be proved within the theory.
Second incompleteness theorem: In any such consistent theory, the consistency of the theory can not be proven within the theory.

The proof uses an encoding of mathematical sentences which allows one to state the liar-paradox-like statement "this sentence can not be proved". While the latter is an odd recreational gag, it is the core of a theorem which makes striking statements about mathematics. These theorems are not limitations of mathematics; they illustrate its infiniteness. How awful it would be if one could build an axiom system and enumerate mechanically all possible truths from it.


Lecture 8: Probability theory

Probability theory is the science of chance. It starts with combinatorics and leads to a theory of stochastic processes. Historically, probability theory initiated from gambling problems, as in Girolamo Cardano's gambler's manual in the 16th century. A great moment of mathematics occurred when Blaise Pascal and Pierre de Fermat jointly laid a foundation of mathematical probability theory.

It took a while to formalize "randomness" precisely. Here is the setup as it had been put forward by Andrey Kolmogorov: all possible experiments of a situation are modeled by a set Ω, the "laboratory". A measurable subset of experiments is called an "event". Measurements are done by real-valued functions X. These functions are called random variables and are used to observe the laboratory.
As an example, let us model the process of throwing a coin 5 times. An experiment is a word like httht, where h stands for "head" and t represents "tail". The laboratory consists of all 32 such words. We could look for example at the event A that the first two coin tosses are tail. It is the set A = {ttttt, tttth, tttht, ttthh, tthtt, tthth, tthht, tthhh}. We could look at the random variable X which assigns to a word the number of heads. For every experiment, we get a value, like for example X[tthht] = 2.

In order to make statements about randomness, the concept of a probability measure is needed. This is a function P from the set of all events to the interval [0, 1]. It should have the property that P[Ω] = 1 and $P[A_1 \cup A_2 \cup \cdots] = P[A_1] + P[A_2] + \cdots$ if $A_i$ is a sequence of disjoint events. The most natural probability measure on a finite set Ω is P[A] = |A|/|Ω|, where |A| stands for the number of elements in A. It is the number of "good cases" divided by the number of "all cases". For example, to compute the probability of the event A that we throw 3 heads during the 5 coin tosses, we count |A| = 10 possibilities. Since the entire laboratory has |Ω| = 32 possibilities, the probability of the event is 10/32. In order to study these probabilities, one needs combinatorics:

How many ways are there to:               The answer is:
rearrange or permute n elements           $n! = n(n-1)\cdots 2 \cdot 1$
choose k from n with repetitions          $n^k$
pick k from n if order matters            $n!/(n-k)!$
pick k from n with order irrelevant       $\binom{n}{k} = n!/(k!(n-k)!)$
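The four counting formulas can be compared with brute-force enumeration for small n and k. This sketch uses itertools together with math.perm and math.comb (available since Python 3.8); the helper names counts and formulas are illustrative.

```python
import math
from itertools import combinations, permutations, product

def counts(n, k):
    """Brute-force the four counting problems on the set {0, ..., n-1}."""
    items = range(n)
    return (
        len(list(permutations(items))),        # rearrange n elements
        len(list(product(items, repeat=k))),   # choose k with repetitions
        len(list(permutations(items, k))),     # pick k, order matters
        len(list(combinations(items, k))),     # pick k, order irrelevant
    )

def formulas(n, k):
    """The closed formulas from the table."""
    return (math.factorial(n), n ** k, math.perm(n, k), math.comb(n, k))

print(counts(5, 3), formulas(5, 3))   # (120, 125, 60, 10) (120, 125, 60, 10)
print(math.comb(5, 3) / 2 ** 5)       # 3 heads in 5 tosses: 10/32 = 0.3125
```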

The expectation of a random variable X is defined as the sum $E[X] = m = \sum_{\omega \in \Omega} X(\omega) P[\{\omega\}]$. In our coin toss experiment, with X the number of heads, this is 5/2. The variance of X is the expectation of $(X - m)^2$. In our coin experiments, it is 5/4. The square root of the variance is the standard deviation. It measures the typical size of the deviation from the mean. An event happens almost surely if the event has probability 1.
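For the coin toss laboratory, everything can be enumerated exactly. The following sketch uses exact fractions to list Ω, form the event A that the first two tosses are tail, and compute the expectation and variance of the number of heads under the uniform measure.

```python
from fractions import Fraction
from itertools import product

omega = ["".join(w) for w in product("ht", repeat=5)]   # the 32 words
P = Fraction(1, len(omega))                              # uniform measure P[{w}]

def X(w):
    """Random variable: the number of heads in the word w."""
    return w.count("h")

A = [w for w in omega if w.startswith("tt")]             # first two tosses tail
m = sum(X(w) * P for w in omega)                         # expectation
var = sum((X(w) - m) ** 2 * P for w in omega)            # variance

print(len(omega), len(A), X("tthht"), m, var)            # 32 8 2 5/2 5/4
```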

An important case of a random variable is X(ω) = ω on Ω = R, equipped with the probability $P[A] = \int_A \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx$, the standard normal distribution. Analyzed first by Abraham de Moivre in 1733, it was studied by Carl Friedrich Gauss in 1807 and is therefore also called the Gaussian distribution.
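A quick numerical check that this density is normalized, has mean 0 and variance 1, and gives probability about 0.683 to the interval [−1, 1]. The sketch uses a plain midpoint rule on [−10, 10], assuming the tails beyond are negligible; math.erf gives the exact value for comparison.

```python
import math

def density(x):
    """Standard normal density 1/sqrt(2 pi) * exp(-x^2 / 2)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(g, a, b, steps=100_000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

total = integrate(density, -10, 10)
mean = integrate(lambda x: x * density(x), -10, 10)
var = integrate(lambda x: x * x * density(x), -10, 10)
within_one = integrate(density, -1, 1)
print(total, mean, var, within_one)   # close to 1, 0, 1 and 0.6827
print(math.erf(1 / math.sqrt(2)))     # exact P[-1 <= X <= 1]
```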

Two random variables X, Y are called uncorrelated if E[XY] = E[X] · E[Y]. If for any (bounded, measurable) functions f, g also f(X) and g(Y) are uncorrelated, then X, Y are called independent. Two random variables are said to have the same distribution if for any a < b, the events {a ≤ X ≤ b} and {a ≤ Y ≤ b} have the same probability. If X, Y are uncorrelated, then the relation Var[X] + Var[Y] = Var[X + Y] holds, which is just Pythagoras' theorem, because uncorrelated can be understood geometrically: X − E[X] and Y − E[Y] are orthogonal. A common problem is to study the sum of independent random variables $X_n$ with identical distribution. One abbreviates this IID. Here are the three most important theorems, which we formulate in the case where all random variables are assumed to have expectation 0 and standard deviation 1. Let $S_n = X_1 + \dots + X_n$ be the n'th sum of the IID random variables. It is also called a random walk.

LLN: The Law of Large Numbers assures that $S_n/n$ converges to 0.
CLT: The Central Limit Theorem says that $S_n/\sqrt{n}$ approaches the Gaussian distribution.
LIL: The Law of the Iterated Logarithm says that $S_n/\sqrt{2n \log\log(n)}$ accumulates in [−1, 1].

The LLN shows that one can find out about the expectation by averaging experiments. The CLT explains why one sees the standard normal distribution so often. The LIL finally gives us a precise estimate of how fast $S_n$ grows. Things become interesting if the random variables are no longer independent. Generalizing LLN, CLT and LIL to such situations is part of ongoing research.
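The LLN and the CLT can be watched in a simulation of the simplest random walk, with steps +1 or −1 of probability 1/2 each (expectation 0, standard deviation 1). A sketch with a fixed seed so that runs are reproducible; the walk lengths and number of trials are arbitrary choices.

```python
import math
import random

rng = random.Random(2018)   # fixed seed, so that runs are reproducible

def walk(n):
    """S_n = X_1 + ... + X_n with IID steps +1 or -1 (expectation 0, standard deviation 1)."""
    return sum(rng.choice((-1, 1)) for _ in range(n))

# LLN: S_n / n should be close to 0 for large n
lln = walk(100_000) / 100_000

# CLT: the fraction of walks with |S_n / sqrt(n)| < 1 should be roughly
# P[|Z| < 1] = 0.68 for a standard normal Z
trials, n = 2000, 400
clt = sum(abs(walk(n)) / math.sqrt(n) < 1 for _ in range(trials)) / trials

print(f"S_n/n = {lln:.4f}, fraction within one standard deviation = {clt:.3f}")
```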

Here are two open questions in probability theory:

Are numbers like π, e, √2 normal? In particular, do all digits appear with the same frequency?
What growth rates $\Lambda_n$ can occur in $S_n/\Lambda_n$ having limsup 1 and liminf −1?

For the second question, there are examples for $\Lambda_n = 1$, $\Lambda_n = \log(n)$ and of course $\Lambda_n = \sqrt{2n \log\log(n)}$ from the LIL if the random variables are independent. Examples of random variables which are not independent are $X_n = \cos(n\sqrt{2})$.
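Normality can not be settled by computation, but one can look at the digit statistics. A sketch for √2 using exact integer square roots (math.isqrt, Python 3.8+). The sample of 4000 digits is an arbitrary choice that stays below the default limit of recent Python versions for converting large integers to strings; equal frequencies in a finite sample are evidence, not a proof.

```python
from collections import Counter
from math import isqrt

def sqrt2_digits(N):
    """The first N decimal digits of sqrt(2), as a string starting with "1".

    Exact integer arithmetic: isqrt(2 * 10**(2*(N-1))) = floor(sqrt(2) * 10**(N-1)).
    """
    return str(isqrt(2 * 10 ** (2 * (N - 1))))

digits = sqrt2_digits(4000)
freq = Counter(digits)
print(digits[:12])                          # 141421356237
print({d: freq[d] for d in "0123456789"})   # each count is near 400
```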

Statistics is the science of modeling random events in a probabilistic setup. Given data points, we want to find a model which fits the data best. This allows us to understand the past, predict the future or discover laws of nature. The most common task is to find the mean and the standard deviation of some data. The mean is also called the average and given by $m = \frac{1}{n}\sum_{k=1}^n x_k$. The variance is $\sigma^2 = \frac{1}{n}\sum_{k=1}^n (x_k - m)^2$ with standard deviation σ.
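The two formulas translate directly into code. A sketch with a small made-up data set; note the factor 1/n, which matches statistics.pstdev (the population standard deviation) rather than statistics.stdev, which divides by n − 1.

```python
import math
import statistics

def mean_and_std(xs):
    """Mean m = (1/n) sum x_k and standard deviation sigma, sigma^2 = (1/n) sum (x_k - m)^2."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / n
    return m, math.sqrt(var)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_and_std(data))                                # (5.0, 2.0)
print(statistics.mean(data), statistics.pstdev(data))   # the same values via the standard library
```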

A sequence of random variables $X_n$ defines a so-called stochastic process. In continuous versions of such processes, $X_t$ is a family of random variables indexed by a continuous time t. An important example is Brownian motion, which is a model for the random motion of a particle.

Besides gambling and analyzing data, physics was also an important motivator to develop probability theory. An example is statistical mechanics, where the laws of nature are studied with probabilistic methods. A famous physical law is Ludwig Boltzmann's relation S = k log(W) for entropy, a formula which decorates Boltzmann's tombstone. The entropy of a probability measure $P[\{k\}] = p_k$ on a finite set {1, ..., n} is defined as $S = -\sum_{i=1}^n p_i \log(p_i)$. Today, we would reformulate Boltzmann's law and say that it is the expectation S = E[log(W)] of the logarithm of the "Wahrscheinlichkeit" random variable $W(i) = 1/p_i$ on Ω = {1, ..., n}. Entropy is important because nature tries to maximize it.
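Both formulas for the entropy can be compared directly. A sketch with natural logarithms and Boltzmann's constant k set to 1 (an assumption); it also shows that the uniform distribution on n points attains the value log(n), larger than for the skewed example.

```python
import math

def entropy(p):
    """S = -sum p_i log p_i (natural log; terms with p_i = 0 contribute 0)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def entropy_as_expectation(p):
    """S = E[log W] with W(i) = 1/p_i, averaged with respect to p."""
    return sum(pi * math.log(1 / pi) for pi in p if pi > 0)

uniform = [0.25] * 4
skewed = [0.7, 0.1, 0.1, 0.1]
for p in (uniform, skewed):
    print(entropy(p), entropy_as_expectation(p))
print(math.log(4))   # the value for the uniform distribution on 4 points
```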


Lecture 9: Topology
Topology studies properties of geometric objects which do not change under continuous reversible deformations. In topology, a coffee cup with a single handle is the same as a doughnut. One can deform one into the other without punching any holes in it or ripping it apart. Similarly, a plate and a croissant are the same. But a croissant is not equivalent to a doughnut. On a doughnut, there are closed curves which can not be pulled together to a point. For a topologist, the letters O and P are equivalent up to homotopy (the tail of the P can be retracted onto its loop), but different from the letter B.
The mathematical setup is beautiful: a topological space is a set X with a set O of subsets of X containing both ∅ and X such that finite intersections and arbitrary unions of sets in O are again in O. Sets in O are called open sets and O is called a topology. The complement of an open set is called closed. Examples of topologies are the trivial topology O = {∅, X}, where no open sets besides the empty set and X exist, or the discrete topology O = {A | A ⊂ X}, where every subset is open. But these are in general not interesting. An important example on the plane X is the collection O of sets U in the plane for which every point is the center of a small disc still contained in U. A special class of topological spaces are metric spaces, where a set X is equipped with a distance function d(x, y) = d(y, x) ≥ 0 which satisfies the triangle inequality d(x, y) + d(y, z) ≥ d(x, z) and for which d(x, y) = 0 if and only if x = y. A set U in a metric space is open if for every x in U, there is a ball $B_r(x) = \{y \mid d(x, y) < r\}$ of positive radius r contained in U. Metric spaces are topological spaces but not vice versa: the trivial topology on a set with more than one point, for example, does not come from a metric. For doing calculus on a topological space X, each point needs a neighborhood, called a chart, which is topologically equivalent to a disc in Euclidean space. Finitely many such neighborhoods covering X form an atlas of X. If the charts are glued together with identification maps on the intersections, one obtains a manifold. Two-dimensional examples are the sphere, the torus, the projective plane or the Klein bottle. Topological spaces X, Y are called homeomorphic, meaning "topologically equivalent", if there is an invertible map from X to Y such that this map induces an invertible map on the corresponding topologies. How can one decide whether two spaces are equivalent in this sense? The surface of the coffee cup, for example, is equivalent in this sense to the surface of a doughnut, but it is not equivalent to the surface of a sphere. Many properties of geometric spaces can be understood by discretizing them, for example with a graph.
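The axioms of a topology stated above can be checked mechanically when X is finite. A sketch; the function name is_topology is illustrative, and for a finite set it suffices to check pairwise unions and intersections.

```python
from itertools import combinations

def is_topology(X, O):
    """Check whether the collection O of subsets of the finite set X is a topology.

    For a finite set it suffices to check pairwise unions and intersections,
    since every union or intersection of members of O is then built from finitely many.
    """
    X = frozenset(X)
    O = {frozenset(U) for U in O}
    if frozenset() not in O or X not in O or not all(U <= X for U in O):
        return False
    return all(U | V in O and U & V in O for U, V in combinations(O, 2))

X = {1, 2, 3}
print(is_topology(X, [set(), X]))                # trivial topology: True
print(is_topology(X, [set(), {1}, {1, 2}, X]))   # a chain of open sets: True
print(is_topology(X, [set(), {1}, {2}, X]))      # the union {1, 2} is missing: False
```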

A graph is a finite collection of vertices V together with a finite set of edges E, where each edge connects two points in V. For example, the set V of cities in the US, where the edges are pairs of cities connected by a street, is a graph. The Königsberg bridge problem was a trigger puzzle for the study of graph theory. Polyhedra were another start in graph theory. Their study is loosely related to the analysis of surfaces. The reason is that one can see polyhedra as discrete versions of surfaces. In computer graphics, for example, surfaces are rendered as finite graphs, using triangulations. The Euler characteristic of a convex polyhedron is a remarkable topological invariant. It is V − E + F = 2, where V is the number of vertices, E the number of edges and F the number of faces. This number is equal to 2 for connected polyhedra in which every closed loop can be pulled together to a point. This formula for the Euler characteristic is also called Euler's gem. It comes with a rich history. René Descartes stumbled upon it and wrote it down in a secret notebook. It was Leonhard Euler who, in 1752, first proved the formula for convex polyhedra. A convex polyhedron is called a Platonic solid if all vertices are on the unit sphere, all edges have the same length and all faces are congruent polygons. A theorem of Theaetetus states that there are only five Platonic solids: [Proof: Assume the faces are regular n-gons and m of them meet at each vertex. Besides the Euler relation V − E + F = 2, a polyhedron also satisfies the relations nF = 2E and mV = 2E, which come from counting edges in different ways: every edge bounds two faces and has two end points. This gives 2E/m − E + 2E/n = 2, or 1/n + 1/m = 1/E + 1/2. From n ≥ 3 and m ≥ 3 we see that it is impossible that both m and n are larger than 3, because then 1/n + 1/m ≤ 1/2. There are now only two possibilities: either n = 3 or m = 3. In the case n = 3 we have m = 3, 4, 5; in the case m = 3 we have n = 3, 4, 5. The five possibilities (3, 3), (3, 4), (3, 5), (4, 3), (5, 3) represent the five Platonic solids.] The pairs (n, m) are called the Schläfli symbol of the polyhedron:

Name            V    E    F    V-E+F   Schläfli
tetrahedron     4    6    4    2       {3, 3}
hexahedron      8    12   6    2       {4, 3}
octahedron      6    12   8    2       {3, 4}
dodecahedron    20   30   12   2       {5, 3}
icosahedron     12   30   20   2       {3, 5}
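The counting argument in the proof can be run as a small search: for n-gon faces with m faces per vertex, 1/E = 1/n + 1/m − 1/2 must be positive, and then F = 2E/n and V = 2E/m. A sketch with exact fractions; it only solves the counting relations and does not construct the solids.

```python
from fractions import Fraction

def platonic(max_side=10):
    """Solve 1/n + 1/m = 1/E + 1/2 for n-gon faces and m faces per vertex (n, m >= 3)."""
    solids = {}
    for n in range(3, max_side + 1):
        for m in range(3, max_side + 1):
            inv_E = Fraction(1, n) + Fraction(1, m) - Fraction(1, 2)
            if inv_E <= 0:
                continue                    # no finite polyhedron with these (n, m)
            E = 1 / inv_E
            F, V = 2 * E / n, 2 * E / m     # from nF = 2E and mV = 2E
            solids[(n, m)] = (V, E, F)
    return solids

for (n, m), (V, E, F) in sorted(platonic().items()):
    print(f"{{{n}, {m}}}: V={V} E={E} F={F} V-E+F={V - E + F}")
```

The output reproduces the five rows of the table, with (n, m) in the order of the Schläfli symbol.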
The Greeks proceeded geometrically: Euclid showed in the "Elements" that each vertex can have either 3, 4 or 5 equilateral triangles attached, 3 squares or 3 regular pentagons. (6 triangles, 4 squares or 4 pentagons would give a total angle of at least 360 degrees at a vertex, while the face angles at a corner of a convex polyhedron must add up to less than 360 degrees; and at least 3 faces must meet at each corner.) Simon Antoine-Jean L'Huilier refined Euler's formula in 1813 to situations with holes: V − E + F = 2 − 2g, where g is the number of holes. For a doughnut, it is V − E + F = 0. Cauchy first proved that there are exactly 4 non-convex regular Kepler-Poinsot polyhedra.

Name V E F V-E+F Schläfli

small stellated dodecahedron 12 30 12 -6 {5/2, 5}
great dodecahedron 12 30 12 -6 {5, 5/2}
great stellated dodecahedron 20 30 12 2 {5/2, 3}
great icosahedron 12 30 20 2 {3, 5/2}

If two or more different face types are allowed but each vertex still looks the same, one obtains the 13 semi-regular polyhedra (besides the infinite families of prisms and antiprisms). They were first studied by Archimedes (c. 287-212 BC). Since his work is lost, Johannes Kepler is considered the first since antiquity to describe all of them, in his "Harmonices Mundi". The Euler characteristic for surfaces is χ = 2 − 2g, where g is the number of holes. The computation can be done by triangulating the surface. The Euler characteristic characterizes smooth compact connected surfaces if they are orientable. A non-orientable surface, the Klein bottle, can be obtained by gluing two Möbius strips along their boundary circles. Classifying higher dimensional manifolds is more difficult, and finding good invariants is part of modern research. Higher analogues of polyhedra are called polytopes (Alicia Boole Stott). Regular polytopes are the analogues of the Platonic solids in higher dimensions. Examples:

dimension name Schläfli symbols

2: Regular polygons {3}, {4}, {5}, ...
3: Platonic solids {3, 3}, {3, 4}, {3, 5}, {4, 3}, {5, 3}
4: Regular 4D polytopes {3, 3, 3}, {4, 3, 3}, {3, 3, 4}, {3, 4, 3}, {5, 3, 3}, {3, 3, 5}
≥ 5: Regular polytopes {3, 3, 3, . . . , 3}, {4, 3, 3, . . . , 3}, {3, 3, 3, . . . , 3, 4}

Ludwig Schläfli saw in 1852 that there are exactly six regular convex 4-polytopes, or polychora, where "choros" is Greek for "space". Schläfli's polyhedral formula V − E + F − C = 0 holds, where C is the number of 3-dimensional chambers. In dimensions 5 and higher, there are only 3 types of regular polytopes: the higher dimensional analogues of the tetrahedron, the octahedron and the cube. The general formula $\sum_{k=0}^{d-1} (-1)^k v_k = 1 - (-1)^d$ gives the Euler characteristic of a convex polytope in d dimensions with $v_k$ parts of dimension k.
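The formula can be tested on the three families of regular polytopes that exist in every dimension. The face counts used below for the simplex, cube and cross-polytope are standard combinatorial formulas supplied for this sketch, not taken from the text.

```python
from math import comb

# Face counts v_k (k = 0, ..., d-1) of three regular polytope families in dimension d
def simplex(d):
    """k-faces of the d-simplex: choose k+1 of its d+1 vertices."""
    return [comb(d + 1, k + 1) for k in range(d)]

def cube(d):
    """k-faces of the d-cube: choose k free coordinates, fix each other one to 0 or 1."""
    return [comb(d, k) * 2 ** (d - k) for k in range(d)]

def cross_polytope(d):
    """k-faces of the d-cross-polytope: choose k+1 axes and a sign on each."""
    return [comb(d, k + 1) * 2 ** (k + 1) for k in range(d)]

def euler(v):
    """Alternating sum v_0 - v_1 + v_2 - ..."""
    return sum((-1) ** k * vk for k, vk in enumerate(v))

for d in range(2, 7):
    print(d, cube(d), euler(cube(d)), 1 - (-1) ** d)
```

For d = 3 the three families give the counts (4, 6, 4), (8, 12, 6) and (6, 12, 8) of the tetrahedron, cube and octahedron.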


Lecture 10: Analysis

Analysis is a science of measure and optimization. As a rather diverse collection of mathematical fields, it contains real and complex analysis, functional analysis, harmonic analysis and calculus of variations. Analysis has relations to calculus, geometry, topology, probability theory and dynamical systems. We focus here mostly on "the geometry of fractals", which can be seen as part of dimension theory. Examples are Julia sets, which belong to the subfield of "complex analysis" of "dynamical systems". "Calculus of variations" is illustrated by the Kakeya needle set in "geometric measure theory", "Fourier analysis" appears when looking at functions which have fractal graphs, and "spectral theory" as part of functional analysis is represented by the "Hofstadter butterfly". In this way, we describe the topic using "pop icons".

A fractal is a set with non-integer dimension. An example is the Cantor set, as discovered in 1875 by Henry Smith. Start with the unit interval. Cut out the middle third, then cut the middle thirds from both remaining parts, then the middle thirds of the four parts, etc. The limiting set is the Cantor set. The mathematical theory of fractals belongs to measure theory and can also be thought of as a playground for real analysis or topology. The term fractal was introduced by Benoit Mandelbrot in 1975. Dimension can be defined in different ways. The simplest is the box counting definition, which works for most household fractals: if we need n squares of length r to cover a set, then d = −log(n)/log(r) converges to the dimension of the set as r → 0. A curve of length L, for example, needs L/r squares of length r, so that its dimension is 1. A region of area A needs $A/r^2$ squares of length r to be covered, and its dimension is 2. The Cantor set needs to be covered with $n = 2^m$ squares of length $r = 1/3^m$. Its dimension is $-\log(n)/\log(r) = -m\log(2)/(m\log(1/3)) = \log(2)/\log(3)$. Examples of fractals are the graph of the Weierstrass function (1872), the Koch snowflake (1904), the Sierpinski carpet (1915) or the Menger sponge (1926).
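The box counting for the Cantor set can be carried out exactly. The sketch below represents the left end points of the $2^m$ intervals of level m as integers a with base-3 digits 0 or 2 (so that a/3^m is the point), and counts how many boxes of length $3^{-k}$ they hit; the level m = 12 is an arbitrary choice.

```python
import math
from itertools import product

def cantor_points(m):
    """Integers a = sum d_i 3^i with all base-3 digits d_i in {0, 2};
    the numbers a / 3^m are the left end points of the 2^m intervals of level m."""
    return [sum(d * 3 ** i for i, d in enumerate(digits))
            for digits in product((0, 2), repeat=m)]

def box_count(points, m, k):
    """Number of boxes [j/3^k, (j+1)/3^k) that contain one of the points a / 3^m."""
    return len({a // 3 ** (m - k) for a in points})

m = 12
pts = cantor_points(m)
for k in range(1, 8):
    n = box_count(pts, m, k)   # boxes of length r = 3^-k needed
    r = 3.0 ** -k
    print(k, n, -math.log(n) / math.log(r))
print("log 2 / log 3 =", math.log(2) / math.log(3))   # about 0.6309
```

Every estimate equals log(2)/log(3), because exactly $2^k$ boxes of length $3^{-k}$ are needed.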

Complex analysis extends calculus to the complex numbers. It deals with functions f(z) defined in the complex plane. Integration is done along paths. Complex analysis completes the understanding of functions. It also provides more examples of fractals, by iterating functions like the quadratic map $f(z) = z^2 + c$. Functions had been iterated before, as in the Newton method (1879). The Julia sets were introduced in 1918, the Mandelbrot set in 1978 and the Mandelbar set in 1989. Particularly famous are the Douady rabbit, the dragon, the dendrite and the airplane. Calculus of variations is calculus in infinite dimensions.

Taking derivatives is called taking "variations". Historically, it started with the problem of finding the curve of fastest fall, leading to the brachistochrone curve $\vec{r}(t) = (t - \sin(t), 1 - \cos(t))$. In calculus, we find maxima and minima of functions. In calculus of variations, we extremize on much larger spaces. Here are examples of problems:

Brachistochrone 1696

Minimal surface 1760

Geodesics 1830

Isoperimetric problem 1838

Kakeya Needle problem 1917
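As a small check on the brachistochrone, one can compare the sliding time along the cycloid $\vec{r}(t) = (t - \sin(t), 1 - \cos(t))$, $0 \le t \le \pi$, with the time along the straight segment between the same end points (0, 0) and (π, 2). The sketch assumes gravity g = 1, a y-axis pointing down and a start at rest, so that the speed is $\sqrt{2y}$ by energy conservation; it compares two curves and does not prove that the cycloid is optimal.

```python
import math

def travel_time(y, dx, dy, t0, t1, steps=100_000):
    """Time to slide from rest along the curve t -> (x(t), y(t)), y pointing down,
    gravity g = 1: T = integral of |r'(t)| / sqrt(2 y(t)) dt, by the midpoint rule."""
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * h
        speed = math.sqrt(2 * y(t))   # energy conservation: v^2 / 2 = g y
        total += math.hypot(dx(t), dy(t)) / speed * h
    return total

# cycloid r(t) = (t - sin t, 1 - cos t), t in [0, pi], from (0, 0) to (pi, 2)
cycloid = travel_time(lambda t: 1 - math.cos(t),
                      lambda t: 1 - math.cos(t), lambda t: math.sin(t), 0, math.pi)
# straight segment r(s) = (pi s, 2 s), s in [0, 1], between the same end points
line = travel_time(lambda s: 2 * s, lambda s: math.pi, lambda s: 2.0, 0, 1)
print(cycloid, line)   # about 3.1416 (= pi) and about 3.72 (exact value sqrt(pi^2 + 4))
```

Along the cycloid the integrand is identically 1, so the time is exactly π; the straight line needs $\sqrt{\pi^2 + 4} \approx 3.72$, and the midpoint rule converges slowly there because the integrand blows up at the start.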

Fourier theory decomposes a function into basic components of various frequencies, $f(x) = a_1\sin(x) + a_2\sin(2x) + a_3\sin(3x) + \cdots$. The numbers $a_i$ are called the Fourier coefficients. Our ear does such a decomposition when we listen to music: by distinguishing different frequencies, our ear performs a Fourier analysis.
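For a function given as a finite sine series, the coefficients can be recovered with the orthogonality relation $a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(kx)\,dx$, which holds for odd 2π-periodic functions like the series above. The sketch assumes this normalization and uses a hypothetical test signal with two frequencies.

```python
import math

def sine_coefficient(f, k, steps=4096):
    """a_k = (1/pi) * integral of f(x) sin(k x) over [-pi, pi], by the midpoint rule."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        x = -math.pi + (i + 0.5) * h
        total += f(x) * math.sin(k * x)
    return total * h / math.pi

def chord(x):
    # a hypothetical signal with two frequencies: a_1 = 3, a_4 = -0.5
    return 3 * math.sin(x) - 0.5 * math.sin(4 * x)

coefficients = [sine_coefficient(chord, k) for k in range(1, 7)]
print([round(c, 6) for c in coefficients])   # approximately [3, 0, 0, -0.5, 0, 0]
```

For trigonometric polynomials of low degree, the equally spaced midpoint rule over a full period is exact up to rounding, which is why the recovered coefficients match so well.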


Fourier series 1729

Fourier transform (FT) 1811

Discrete FT Gauss?

Wavelet transform 1930

The Weierstrass function mentioned above is given as a series $\sum_n a^n \cos(\pi b^n x)$ with 0 < a < 1 and $ab > 1 + 3\pi/2$. The dimension of its graph had long been believed to be 2 + log(a)/log(b); for the Hausdorff dimension and integer b, this was proven by Weixiao Shen in 2018. Spectral theory analyzes linear maps L. The spectrum consists of the real numbers E such that L − E is not invertible. A Hollywood celebrity among all linear maps is the almost Mathieu operator $L(x)_n = x_{n+1} + x_{n-1} + (2 - 2\cos(cn))x_n$: if we draw the spectrum for each c, we see the Hofstadter butterfly. For fixed c, the map describes the behavior of an electron in an almost periodic crystal. Another famous system is the quantum harmonic oscillator $L(f) = -f''(x) + x^2 f(x)$, and another one the vibrating drum $L(f) = f_{xx} + f_{yy}$, where f is the amplitude of the drum and f = 0 on the boundary of the drum.

Hydrogen atom 1914

Hofstadter buttery 1976

Harmonic oscillator 1900

Vibrating drum 1680

All these examples in analysis look unrelated at first. Fractal geometry ties many of them together: spectra are often fractals, and minimal configurations have a fractal nature, like in solid state physics, in diffusion limited aggregation or in other critical phenomena like percolation, cracks in solids or the formation of lightning bolts. In Hamiltonian mechanics, minimal energy configurations are often fractals, as in Mather theory. And minimizing problems lead to fractals in a natural way, like when you have the task to turn around a needle on a table by 180 degrees and minimize the area swept out by the needle. The area can be made arbitrarily small (Besicovitch), and such turns lead to Kakeya sets, which are fractals. Finally, let us mention some unsolved problems in analysis: does the Riemann zeta function $\zeta(z) = \sum_{n=1}^\infty 1/n^z$ (analytically continued from Re(z) > 1) have all nontrivial roots on the axis Re(z) = 1/2? This question is called the Riemann hypothesis and is the most important open problem in mathematics. It is an example of a question in analytic number theory, which also illustrates how analysis has entered into number theory. Some mathematicians think that spectral theory might solve it. Also the Mandelbrot set M is not understood yet: the "holy grail" in the field of complex dynamics is the problem whether M is locally connected. For the Hofstadter butterfly, one knows that it has measure zero. What is its dimension? Another open question in spectral theory is the "can one hear the shape of a drum" problem, which asks whether there are two convex drums which are not congruent but which have the same spectrum. In the area of calculus of variations, just one problem: how long is the shortest curve in space such that its convex hull (the smallest convex set containing the curve) contains the unit ball?


Lecture 11: Cryptography

Cryptography is the theory of codes. Two important aspects of the field are the encryption and decryption of information, and error correction. Both are crucial in daily life. When getting access to a computer, viewing a bank statement or taking money from an ATM, encryption algorithms are used. When phoning, surfing the web, accessing data on a computer or listening to music, error correction algorithms are used. Since our lives have become more and more digital (music, movies, books, journals, finance, transportation, medicine and communication have become digital), we rely on strong error correction to avoid errors and on encryption to assure that things can not be tampered with. Without error correction, airplanes would crash: small errors in the memory of a computer would produce glitches in the navigation and control program. In a computer memory, every hour a couple of bits are altered, for example by cosmic rays. Error correction assures that this gets fixed. Without error correction, music would sound like a 1920 gramophone record. Without encryption, everybody could intrude into electronic banks and transfer money. Medical history shared with your doctor would all be public. Before the digital age, error correction was assured by extremely redundant information storage. Writing a letter on a piece of paper displaces billions of billions of molecules of ink. Now, changing any single bit could give a letter a different meaning. Before the digital age, information was kept in well guarded safes which were physically difficult to penetrate. Now, information is locked up in computers which are connected to other computers. Vaults, money or voting ballots are secured by mathematical algorithms which assure that information can only be accessed by authorized users. Also life needs error correction: information in the genome is stored in a genetic code, where error correction makes sure that life can survive. A cosmic ray hitting the skin can change the DNA of a cell, but in general this is harmless. Only a larger amount of radiation can render cells cancerous.

How can an encryption algorithm be safe? One possibility is to invent a new method and keep it secret. Another is to use a well known encryption method and rely on the difficulty of mathematical computation tasks to assure that the method is safe. History has shown that the first method is unreliable. Systems which rely on "security through obscurity" usually do not last. The reason is that it is tough to keep a method secret if the encryption tool is distributed. Reverse engineering of the method is often possible, for example using chosen plaintext attacks: given a map T, a third party can compute pairs x, T(x) and, by choosing specific texts, figure out what happens.

The Caesar cypher permutes the letters of the alphabet. We can for example replace every letter A with B, every letter B with C and so on until finally Z is replaced with A. The word "Mathematics" is so encrypted as "Nbuifnbujdt". Caesar shifted the letters by 3; the shift by 1 just discussed was used by his great-nephew Augustus. ROT13 shifts by 13, and the Atbash cypher reflects the alphabet, switching A with Z, B with Y, etc. The last two examples are involutive: encryption is decryption. More general cyphers are obtained by permuting the alphabet. Because there are $26! = 403291461126605635584000000 \approx 4 \cdot 10^{26}$ permutations, it appears at first that a brute-force attack is not possible. But such substitution cyphers can be cracked very quickly using statistical analysis: if we know the frequency with which letters appear in the language and match it with the letter frequencies of the text, we can figure out which letter was replaced with which. The Trithemius cypher prevents this simple analysis by changing the permutation in each step. It is called a polyalphabetic substitution cypher: instead of a single permutation, there are many permutations, and after encoding a letter, we also change the key. Let us take a simple example: rotate the alphabet by 1 for the first letter, by 2 for the second letter, by 3 for the third letter, etc. The word "Mathematics" now becomes "Ncwljshbrmd". Note that the two letters "a" are now encrypted differently, as "c" and "h", so a frequency analysis is more difficult. The Vigenère cypher adds even more complexity: instead of shifting the alphabet by 1, 2, 3, ..., we can take a key like "BCNZ" and shift the first letter by 1, the second letter by 2, the third letter by 13, the fourth letter by 25 and then the fifth letter by 1 again. While this cypher remained unbroken for a long time, a more sophisticated frequency analysis which first finds the length of the key makes the cypher breakable. With the emergence of computers, even more sophisticated versions like the German Enigma had no chance.
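The cyphers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not historical practice: the function names are hypothetical, letters are indexed A=0 through Z=25 (so the key "BCNZ" gives the shifts 1, 2, 13, 25), the Trithemius and Vigenère functions count every character position including spaces, and the frequency attack simply assumes that the most common letter of an English plaintext is "e".

```python
import string
from collections import Counter

ALPHABET = string.ascii_lowercase  # "abc...z", letter index 0..25


def shift_letter(ch: str, k: int) -> str:
    """Rotate a single letter by k positions (mod 26), preserving case."""
    low = ch.lower()
    if low not in ALPHABET:
        return ch  # leave spaces and punctuation unchanged
    out = ALPHABET[(ALPHABET.index(low) + k) % 26]
    return out.upper() if ch.isupper() else out


def caesar(text: str, k: int) -> str:
    """Shift cypher: every letter is rotated by the same amount k."""
    return "".join(shift_letter(c, k) for c in text)


def atbash(text: str) -> str:
    """Reflect the alphabet: a <-> z, b <-> y, ... (an involution)."""
    out = []
    for c in text:
        low = c.lower()
        if low in ALPHABET:
            r = ALPHABET[25 - ALPHABET.index(low)]
            out.append(r.upper() if c.isupper() else r)
        else:
            out.append(c)
    return "".join(out)


def trithemius(text: str) -> str:
    """Rotate the i-th character (counting from 1) by i positions."""
    return "".join(shift_letter(c, i + 1) for i, c in enumerate(text))


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Rotate the i-th character by the key letter key[i mod len(key)], with A=0, B=1, ..."""
    shifts = [ALPHABET.index(k) for k in key.lower()]
    sign = -1 if decrypt else 1
    return "".join(shift_letter(c, sign * shifts[i % len(shifts)])
                   for i, c in enumerate(text))


def crack_caesar(ciphertext: str) -> tuple:
    """Frequency attack: assume the most common cyphertext letter stands for 'e'."""
    letters = [c for c in ciphertext.lower() if c in ALPHABET]
    most_common = Counter(letters).most_common(1)[0][0]
    k = (ALPHABET.index(most_common) - ALPHABET.index("e")) % 26
    return k, caesar(ciphertext, -k)


print(caesar("Mathematics", 1))          # Nbuifnbujdt
print(trithemius("Mathematics"))         # Ncwljshbrmd
print(vigenere("Mathematics", "BCNZ"))   # Ncggfonsjef
```

The statistical attack only needs text long enough for "e" to dominate; on short or unusual texts the guess can fail, which is why practical attacks compare the whole frequency profile.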

The Diffie-Hellman key exchange allows Ana and Bob to agree on a secret key over a public channel. The two palindromic friends agree on a prime number $p$ and a base $a$. This information can be exchanged over an open channel. Ana now chooses a secret number $x$ and sends $X = a^x \bmod p$ to Bob over the channel. Bob chooses a secret number $y$ and sends $Y = a^y \bmod p$ to Ana. Ana can compute $Y^x$ and Bob can compute $X^y$, and both are equal to $a^{xy} \bmod p$. This number is their common secret. The key point is that the eavesdropper Eve is believed to be unable to compute this number. The only information available to Eve are $X$ and $Y$, as well as the base $a$ and $p$. Eve knows that $X = a^x \bmod p$ but cannot determine $x$. The key difficulty in this code is the discrete log problem: getting $x$ from $a^x \bmod p$ is believed to be difficult for large $p$.
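A toy run of the Diffie-Hellman exchange, assuming the small textbook parameters p = 23 and a = 5 with secrets x = 4 and y = 3 (chosen only for illustration; real systems use far larger, carefully chosen groups). Python's built-in three-argument `pow` does the modular exponentiation, and the brute-force search shows why Eve's discrete log problem is only hard when p is large.

```python
def diffie_hellman(p: int, a: int, x: int, y: int):
    """Return the public values X, Y and the two keys computed by Ana and Bob."""
    X = pow(a, x, p)        # Ana -> Bob over the open channel
    Y = pow(a, y, p)        # Bob -> Ana over the open channel
    key_ana = pow(Y, x, p)  # (a^y)^x = a^(xy) mod p
    key_bob = pow(X, y, p)  # (a^x)^y = a^(xy) mod p
    return X, Y, key_ana, key_bob


def brute_force_log(a: int, X: int, p: int):
    """Eve's naive attack: try all exponents in turn. Feasible only for tiny p."""
    value = 1
    for x in range(p - 1):
        if value == X:
            return x
        value = value * a % p
    return None


print(diffie_hellman(23, 5, 4, 3))  # (4, 10, 18, 18)
print(brute_force_log(5, 4, 23))    # 4
```

The brute-force loop takes up to p - 1 steps, so its cost grows exponentially in the number of digits of p.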
The Rivest-Shamir-Adleman (RSA) public key system uses a public key $(n, a)$ with an integer $n = pq$, where $p, q$ are prime, and $a < (p-1)(q-1)$ coprime to $(p-1)(q-1)$. Also here, $n$ and $a$ are public; only the factorization of $n$ is kept secret. Ana publishes this pair. Bob, who wants to email Ana a message $x$, sends her $y = x^a \bmod n$. Ana, who has computed $b$ with $ab = 1 \bmod (p-1)(q-1)$, can read the secret email $y$ because $y^b = x^{ab} = x^{1 + k(p-1)(q-1)} = x \bmod n$ for some integer $k$. But Eve has no chance, because the only things Eve knows are $y$ and $(n, a)$. It is believed that without the factorization of $n$, it is not possible to determine $x$. The message has been transmitted securely. The core difficulty is that taking roots in the ring $\mathbb{Z}_n = \{0, \dots, n-1\}$ is difficult without knowing the factorization of $n$. With a factorization, we can quickly take arbitrary roots. Conversely, if we can take square roots, then we can also factor: assume we have a product $n = pq$ and we know how to take square roots of 1. If $x$ solves $x^2 = 1 \bmod n$ and $x$ is different from $\pm 1 \bmod n$, then $x^2 - 1 = (x-1)(x+1)$ is zero modulo $n$ while neither factor is. This means that $p$ divides $x - 1$ or $x + 1$. To find a factor, we can take the greatest common divisor of $n$ and $x - 1$. Take $n = 77$ for example. We are given the root 34 of 1: $34^2 = 1156$ has remainder 1 when divided by 77. The greatest common divisor of $34 - 1 = 33$ and 77 is 11, a factor of 77. Similarly, the greatest common divisor of $34 + 1 = 35$ and 77 is 7, which divides 77. Finding square roots modulo a composite number and factoring the number are equally difficult.
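A sketch of textbook RSA and of the square-root factoring trick, assuming the small illustrative primes p = 61, q = 53 and exponent a = 17 (these values are not from the text). It omits the padding that practical RSA needs and only checks the identities above; `pow(a, -1, phi)` computes the modular inverse b and requires Python 3.8 or later.

```python
from math import gcd


def rsa_keys(p: int, q: int, a: int):
    """Toy RSA: public key (n, a) and private exponent b with ab = 1 mod (p-1)(q-1)."""
    phi = (p - 1) * (q - 1)
    if gcd(a, phi) != 1:
        raise ValueError("a must be coprime to (p-1)(q-1)")
    b = pow(a, -1, phi)  # modular inverse
    return p * q, a, b


def factor_from_root_of_one(n: int, x: int):
    """Given x with x^2 = 1 mod n and x != +-1 mod n, return two factors of n."""
    if (x * x) % n != 1 or x % n in (1, n - 1):
        raise ValueError("need a nontrivial square root of 1 modulo n")
    return gcd(x - 1, n), gcd(x + 1, n)


n, a, b = rsa_keys(61, 53, 17)          # n = 3233, b = 2753
message = 65
cipher = pow(message, a, n)             # Bob sends y = x^a mod n
assert pow(cipher, b, n) == message     # Ana recovers x = y^b mod n

print(factor_from_root_of_one(77, 34))              # (11, 7)
print([x for x in range(77) if x * x % 77 == 1])    # [1, 34, 43, 76]
```

The last line lists all four square roots of 1 modulo 77; only 34 and 43 = 77 - 34 are nontrivial and reveal the factors.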
Cipher           Used for                  Difficulty              Attack
Caesar           transmitting messages     many permutations       statistics
Vigenère         transmitting messages     many permutations       statistics
Enigma           transmitting messages     no frequency analysis   plain text
Diffie-Hellman   agreeing on a secret key  discrete log mod p      unsafe primes
RSA              electronic commerce       factoring integers      factoring

The simplest error correcting code uses 3 copies of the same information, so that a single error can be corrected by a majority vote. With 3 watches, for example, one watch can fail. But this basic error correcting code is not efficient: it corrects single errors by tripling the size. Its efficiency is 1/3, about 33 percent.
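A minimal sketch of the triple repetition code, assuming the bits are stored as Python lists of 0 and 1: each bit is sent three times and each block of three is decoded by majority vote.

```python
def encode(bits):
    """Repeat every bit three times."""
    return [b for b in bits for _ in range(3)]


def decode(received):
    """Majority vote on each block of three bits; corrects one error per block."""
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]


message = [1, 0, 1, 1]
codeword = encode(message)               # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
corrupted = codeword.copy()
for block in range(len(message)):
    corrupted[3 * block + block % 3] ^= 1   # flip one bit in every block
assert decode(corrupted) == message
print(len(message) / len(codeword))         # 0.333..., the 33 percent efficiency
```

Two errors in the same block outvote the correct bit, which is the limit of this code.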


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 12: Dynamical systems

Dynamical systems theory is the science of time evolution. If time is continuous, the evolution is defined by a differential equation ẋ = f(x). If time is discrete, then we look at the iteration of a map x → T(x).

The goal of the theory is to predict the future of the system when the present state is known. A differential equation is an equation of the form d/dt x(t) = f(x(t)), where the unknown quantity is a path x(t) in some "phase space". If we know the velocity d/dt x(t) = ẋ(t) at all times and the initial configuration x(0), we can compute the trajectory x(t). What happens at a future time? Does x(t) stay in a bounded region or escape to infinity? Which areas of the phase space are visited and how often? Can we reach a certain part of the space when starting at a given point and, if yes, when? An example of such a question is to predict whether an asteroid located at a specific location will hit the earth or not. Another example is to predict the weather of the next week.

An example of a dynamical system in one dimension is the differential equation

$$x'(t) = x(t)(2 - x(t)), \quad x(0) = 1 .$$

It is called the logistic system and describes population growth. This system has the solution $x(t) = 2e^{2t}/(1 + e^{2t})$, as you can see by computing the left and right hand side.
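To check the closed form numerically, here is a hedged sketch that integrates x′ = x(2 − x), x(0) = 1, with the classical fourth order Runge-Kutta method (our choice of integrator, not one named in the text) and compares the result with x(t) = 2e^{2t}/(1 + e^{2t}).

```python
import math


def logistic(x: float) -> float:
    """Right hand side f(x) = x(2 - x) of the logistic equation x' = f(x)."""
    return x * (2 - x)


def rk4(f, x0: float, t_end: float, steps: int) -> float:
    """Classical fourth order Runge-Kutta for the autonomous equation x' = f(x)."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + h * k1 / 2)
        k3 = f(x + h * k2 / 2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


def exact(t: float) -> float:
    """Closed form solution with x(0) = 1."""
    return 2 * math.exp(2 * t) / (1 + math.exp(2 * t))


for t in (0.5, 1.0, 3.0):
    print(t, rk4(logistic, 1.0, t, 1000), exact(t))
```

Both columns approach the equilibrium x = 2, the carrying capacity of the population.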

A map is a rule which assigns to a quantity x(t) a new quantity x(t + 1) = T(x(t)). The state x(t) of the system determines the situation x(t + 1) at time t + 1. An example is the Ulam map T(x) = 4x(1 − x) on the interval [0, 1]. This is an example where we have no idea what happens after a few hundred iterates, even if we knew the initial position with the accuracy of the Planck scale.

Dynamical systems theory has applications in all fields of mathematics. It can be used to find roots of equations, for example with Newton's method

$$T(x) = x - f(x)/f'(x) .$$

A system of number theoretical nature is the Collatz map

$$T(x) = x/2 \text{ if } x \text{ is even}, \qquad T(x) = 3x + 1 \text{ otherwise} .$$
A system of geometric nature is the Pedal map which assigns to a triangle the pedal triangle.
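The Newton and Collatz iterations can be tried out directly. This sketch (function names are illustrative) applies the Newton map to f(x) = x² − 2, whose iterates converge to √2, and follows Collatz orbits; the orbit loop is only guaranteed to stop for inputs where the Collatz conjecture holds, which is the case for the small inputs used here.

```python
import math


def newton_step(f, df):
    """Return the map T(x) = x - f(x)/f'(x) whose fixed points are roots of f."""
    return lambda x: x - f(x) / df(x)


def iterate(T, x, n):
    """Apply the map T n times to x."""
    for _ in range(n):
        x = T(x)
    return x


def collatz(x: int) -> int:
    return x // 2 if x % 2 == 0 else 3 * x + 1


def collatz_orbit(x: int) -> list:
    """Orbit of x until 1 is reached (terminates only if the conjecture holds for x)."""
    orbit = [x]
    while x != 1:
        x = collatz(x)
        orbit.append(x)
    return orbit


T = newton_step(lambda x: x * x - 2, lambda x: 2 * x)
print(iterate(T, 1.0, 6), math.sqrt(2))
print(collatz_orbit(6))   # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```

Newton's map can fail where f′(x) = 0, so in practice the starting point matters.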
About 100 years ago, Henri Poincaré was able to deal with chaos of low dimensional systems. While statistical mechanics had already formalized the evolution of large systems with probabilistic methods, the new insight was that simple systems like a three body problem or a billiard map can produce very complicated motion. It was Poincaré who saw that even for such low dimensional and completely deterministic systems, random motion can emerge. While physicists had dealt with chaos earlier by assuming it or artificially feeding it into equations like the Boltzmann equation, the occurrence of stochastic motion in geodesic flows, billiards or restricted three body problems was a surprise. These findings needed half a century to sink in, and only with the emergence of computers in the 1960s did the awakening happen. Icons like Lorenz helped to popularize the findings, and we owe them the "butterfly effect" picture: the flap of a butterfly's wing can produce a tornado in Texas a few weeks later. The reason for this statement is that the complicated equations used to simulate the weather reduce, under extreme simplifications and truncations, to a simple differential equation ẋ = σ(y − x), ẏ = rx − y − xz, ż = xy − bz, the Lorenz system. For σ = 10, r = 28, b = 8/3, Ed Lorenz discovered in 1963 an interesting long time behavior and an aperiodic "attractor". Ruelle and Takens called it a
strange attractor. It is a great moment in mathematics to realize that attractors of simple systems can

become fractals on which the motion is chaotic. It suggests that such behavior is abundant. What is chaos?

If a dynamical system shows sensitive dependence on initial conditions, we talk about chaos. We will experiment with the two maps T(x) = 4x(1 − x) and S(x) = 4x − 4x², which are algebraically equal but, when started with the same initial condition on a computer, produce different outcomes after a few dozen iterations.

The sensitive dependence on initial conditions is measured by how fast the derivative $dT^n$ of the $n$'th iterate grows. The exponential growth rate $\gamma$ is called the Lyapunov exponent. A small error of size $h$ is amplified to $h e^{\gamma n}$ after $n$ iterates. In the case of the logistic map with $c = 4$, the Lyapunov exponent is $\log(2)$, and an error of $10^{-16}$ is amplified to $2^n \cdot 10^{-16}$. Already at time $n = 53$, the error is of the order 1. This explains the above experiment with the different maps: the maps $T(x)$ and $S(x)$ round differently on the level $10^{-16}$, and after 53 iterations these initial rounding errors have grown to a macroscopic size.
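The experiment can be reproduced with IEEE double precision floats, as in the following sketch; the starting point 0.3 and the iteration counts are arbitrary choices. It tracks the gap between the computed orbits of T and S and estimates the Lyapunov exponent as the time average of log|T′(x)| = log|4 − 8x|, which should come out close to log(2) ≈ 0.693. The exact iterate at which the two orbits separate depends on rounding details, so no particular number is claimed.

```python
import math


def T(x: float) -> float:
    return 4 * x * (1 - x)


def S(x: float) -> float:
    return 4 * x - 4 * x * x   # algebraically equal to T(x), rounded differently


def orbit_gap(x0: float, n: int) -> list:
    """Distances |T^k(x0) - S^k(x0)| of the two computed orbits for k = 1..n."""
    x = y = x0
    gaps = []
    for _ in range(n):
        x, y = T(x), S(y)
        gaps.append(abs(x - y))
    return gaps


def lyapunov(x0: float, n: int) -> float:
    """Time average of log|T'(x)| = log|4 - 8x| along a computed orbit of T."""
    x, total = x0, 0.0
    for _ in range(n):
        total += math.log(abs(4 - 8 * x))
        x = T(x)
    return total / n


gaps = orbit_gap(0.3, 100)
print([k for k, g in enumerate(gaps, 1) if g > 0.1][:1])  # first iterate with a macroscopic gap
print(lyapunov(0.3, 100_000), math.log(2))
```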
Here is a famous open problem which has resisted many attempts to solve it: show that the map $T(x, y) = (c \sin(2\pi x) + 2x - y, x)$ with $T^n(x, y) = (f_n(x, y), g_n(x, y))$ has sensitive dependence on initial conditions on a set of positive area. More precisely, verify that for $c > 2$ and all $n$,

$$\frac{1}{n} \int_0^1 \int_0^1 \log |\partial_x f_n(x, y)| \, dx \, dy \geq \log\left(\frac{c}{2}\right) .$$

The left hand side converges to the average of the Lyapunov exponents, which in this case is also the entropy of the map. For some systems, one can compute the entropy. The logistic map with $c = 4$, for example, which is also called the Ulam map, has entropy $\log(2)$. The cat map

$$T(x, y) = (2x + y, x + y) \bmod 1$$

has positive entropy $\log((3 + \sqrt{5})/2)$. This is the logarithm of the larger eigenvalue of the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ implementing $T$.
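The entropy of the cat map can be checked with exact integer arithmetic. Since trace(A^n) = λ^n + λ^(−n) for the matrix A implementing T, where λ = (3 + √5)/2, the quantity (1/n) log trace(A^n) converges to log(λ). A minimal sketch with hypothetical helper names:

```python
import math


def mat_mult(A, B):
    """Product of two 2x2 integer matrices (exact Python integers)."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]]]


def mat_power(A, n):
    """A^n by repeated multiplication."""
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = mat_mult(R, A)
    return R


def cat_map(x, y):
    """The cat map T(x, y) = (2x + y, x + y) mod 1 on the torus."""
    return (2 * x + y) % 1.0, (x + y) % 1.0


CAT = [[2, 1], [1, 1]]
lam = (3 + math.sqrt(5)) / 2   # larger eigenvalue of CAT


def entropy_estimate(n):
    """(1/n) log trace(CAT^n), which tends to log(lam) as n grows."""
    An = mat_power(CAT, n)
    return math.log(An[0][0] + An[1][1]) / n


print(entropy_estimate(60), math.log(lam))
```

Since det A = 1, the map preserves area while stretching by λ in one direction and contracting by 1/λ in the other.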
While questions about simple maps look artificial at first, the mechanisms prevail in other systems: in astronomy, when studying planetary motion or electrons in the van Allen belt; in mechanics, when studying coupled pendulums or nonlinear oscillators; in fluid dynamics, when studying vortex motion or turbulence; in geometry, when studying the evolution of light on a surface; and in the change of weather or tsunamis in the ocean. Dynamical systems theory started historically with the problem of understanding the motion of planets. Newton realized that this is governed by a differential equation, the n-body problem

$$x_j''(t) = \sum_{i \neq j} \frac{c_{ij} (x_i - x_j)}{|x_i - x_j|^3} ,$$

where $c_{ij}$ depends on the masses and the gravitational constant. If one body is the sun, no interaction between the planets is assumed, and the common center of gravity is used as the origin, this reduces to the Kepler problem $x''(t) = -Cx/|x|^3$, where planets move on ellipses, the radius vector sweeps out equal areas in equal times and the period squared is proportional to the semi-major axis cubed. A great moment in astronomy was when Kepler derived these laws empirically. Another great moment in mathematics is Newton's theoretical derivation of them from the differential equations.
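Kepler's second law can be observed numerically. This sketch assumes C = 1 and uses the velocity Verlet (leapfrog) scheme, which is our choice of method; for a central force each kick and drift step preserves the angular momentum x·v_y − y·v_x exactly in exact arithmetic, so only rounding errors remain. The circular orbit with radius 1 and speed 1 should return to its starting point after time 2π.

```python
import math


def acceleration(x, y, C=1.0):
    """Kepler force field x'' = -C x / |x|^3 in the plane."""
    r3 = math.hypot(x, y) ** 3
    return -C * x / r3, -C * y / r3


def leapfrog(state, dt, steps, C=1.0):
    """Velocity Verlet (kick-drift-kick) integration of the Kepler problem."""
    x, y, vx, vy = state
    ax, ay = acceleration(x, y, C)
    for _ in range(steps):
        vx += 0.5 * dt * ax       # half kick
        vy += 0.5 * dt * ay
        x += dt * vx              # drift
        y += dt * vy
        ax, ay = acceleration(x, y, C)
        vx += 0.5 * dt * ax       # half kick
        vy += 0.5 * dt * ay
    return x, y, vx, vy


def angular_momentum(state):
    """x vy - y vx is twice the rate at which the radius vector sweeps out area."""
    x, y, vx, vy = state
    return x * vy - y * vx


circle = (1.0, 0.0, 0.0, 1.0)          # circular orbit, period 2*pi when C = 1
N = 10_000
after_one_period = leapfrog(circle, 2 * math.pi / N, N)
print(after_one_period[:2])             # close to (1, 0)

ellipse = (1.0, 0.0, 0.0, 1.2)          # eccentric bound orbit (negative energy)
print(angular_momentum(ellipse), angular_momentum(leapfrog(ellipse, 1e-3, 20_000)))
```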


E-320: Teaching Math with a Historical Perspective Oliver Knill, 2010-2018

Lecture 13: Computing

Computing deals with algorithms and the art of programming. While the subject intersects with computer science and information technology, the theory is by nature very mathematical. But there are new aspects: computers have opened the field of experimental mathematics and now serve as the laboratory for new mathematics. Computers are not only able to simulate more and more of our physical world, they also allow us to explore new worlds.

A mathematician breaking new ground with computer experiments does work similar to that of an experimental physicist. Computers have blurred the boundaries between physics and mathematics. According to Borwein and Bailey, experimental mathematics consists of:

• Gain insight and intuition.
• Find patterns and relations.
• Display mathematical principles.
• Test and falsify conjectures.
• Explore possible new results.
• Suggest approaches for proofs.
• Automate lengthy hand derivations.
• Confirm already existing proofs.
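In the spirit of "test and falsify conjectures", here is a short sketch that checks Goldbach's conjecture (every even number n ≥ 4 is a sum of two primes) up to a small bound with a sieve of Eratosthenes; the bound 10000 is an arbitrary choice, far below the verified range mentioned below.

```python
import math


def prime_table(limit: int) -> list:
    """Sieve of Eratosthenes: is_prime[k] tells whether k is prime, for 0 <= k <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = [False] * len(range(i * i, limit + 1, i))
    return is_prime


def goldbach_pair(n: int, is_prime: list):
    """Return the pair (p, n - p) with the smallest prime p, or None if there is none."""
    for p in range(2, n // 2 + 1):
        if is_prime[p] and is_prime[n - p]:
            return p, n - p
    return None


def check_goldbach(limit: int) -> bool:
    """True if every even number 4 <= n <= limit is a sum of two primes."""
    is_prime = prime_table(limit)
    return all(goldbach_pair(n, is_prime) is not None for n in range(4, limit + 1, 2))


print(goldbach_pair(28, prime_table(28)))   # (5, 23)
print(check_goldbach(10_000))               # True
```

A single even number without a pair would falsify the conjecture; finding none only adds evidence.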

When using computers to prove things, reading and verifying the computer program is part of the proof. If Goldbach's conjecture were known to be true for all $n > 10^{18}$, the conjecture should be accepted, because numerical verifications have been done up to $2 \cdot 10^{18}$ until today. The first famous theorem proven with the help of a computer was the "4 color theorem" in 1976. Here are some pointers in the history of computing:

2700BC Sumerian Abacus        1935 Zuse 1 programmable      1973 Windowed OS
200BC  Chinese Abacus         1941 Zuse 3                   1975 Altair 8800
150BC  Astrolabe              1943 Harvard Mark I           1976 Cray I
125BC  Antikythera            1944 Colossus                 1977 Apple II
1300   Modern Abacus          1946 ENIAC                    1981 Windows I
1400   Yupana                 1947 Transistor               1983 IBM PC
1600   Slide rule             1948 Curta Gear Calculator    1984 Macintosh
1623   Schickard computer     1952 IBM 701                  1985 Atari
1642   Pascal Calculator      1958 Integrated circuit       1988 Next
1672   Leibniz multiplier     1969 Arpanet                  1989 HTTP
1801   Punch cards            1971 Microchip                1993 Web browser, PDA
1822   Difference Engine      1972 Email                    1998 Google
1876   Mechanical integrator  1972 HP-35 calculator         2007 iPhone
We live in a time where technology explodes exponentially. Moore's law, formulated in 1965 and revised in 1975, predicts that semiconductor technology doubles in capacity and overall performance about every 2 years. This has roughly held since. Futurologists like Ray Kurzweil conclude from this that a technological singularity will come, in which artificial intelligence might take over.

An important question is how to decide whether a computation is "easy" or "hard". In 1937, Alan Turing introduced the idea of a Turing machine, a theoretical model of a computer which allows us to quantify complexity. It has finitely many states $S = \{s_1, \dots, s_n, h\}$ and works on a tape of 0-1 sequences. The state h is the "halt" state: if it is reached, the machine stops. The machine has rules which tell it what to do if it is in state s and reads a letter a. Depending on s and a, it writes 1 or 0 or moves the tape to the left or right, and it moves into a new state. Turing showed that anything we know how to compute today can be computed with Turing machines. For any known machine, there is a polynomial p so that a computation done in k steps with that computer can be done in p(k) steps on a Turing machine. What can actually be computed? Church's thesis of 1934 states that everything which can be computed can be computed with Turing machines. As in mathematics itself, there are limitations of computing. Turing's setup allowed him to enumerate all possible Turing machines and use them as input of another machine. Denote by TM the set of all pairs (T, x), where T
is a Turing machine and x is a finite input. Let H ⊂ TM denote the set of pairs (T, x) for which the machine T halts with the tape x as input. Turing looked at the decision problem: is there a machine which decides whether a given pair (T, x) is in H or not? An ingenious diagonal argument of Turing shows that the answer is "no".

[Proof: assume there is a machine HALT which, for the input (T, x), returns the output HALT(T, x) = true if T halts with the input x, and otherwise returns HALT(T, x) = false. Turing constructs a Turing machine DIAGONAL which does the following: 1) Read x. 2) Define Stop = HALT(x, x). 3) While Stop = true, repeat Stop := true. 4) Stop.

Now feed DIAGONAL its own description, x = DIAGONAL. The pair (DIAGONAL, DIAGONAL) is either in H or not. If it is in H, then the variable Stop is true, which means that the loop 3) runs forever, so DIAGONAL does not halt and the pair is not in H. But if it is not in H, then the variable Stop is false, which means that the loop 3) is never entered and the machine stops, so the pair is in H. Both cases lead to a contradiction, so the machine HALT cannot exist.]
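To make the machine model concrete, here is a minimal Turing machine simulator. It assumes the common convention in which one step writes a symbol and moves the head (the text lets a step either write or move), and uses the standard two-state busy beaver as an example, which halts after 6 steps leaving four 1s on the tape. The step bound `max_steps` reflects the halting problem just discussed: a simulator can confirm that a machine halts, but giving up after finitely many steps proves nothing about machines that keep running.

```python
from collections import defaultdict

# rules[(state, symbol)] = (symbol to write, head move: -1 left / +1 right, next state)
BUSY_BEAVER_2 = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}

LOOPER = {("A", 0): (0, +1, "A")}   # walks right forever on a blank tape


def run(rules, tape_input=(), start="A", halt="H", max_steps=10_000):
    """Simulate a Turing machine; return (steps, number of 1s) or None if max_steps is hit."""
    tape = defaultdict(int, enumerate(tape_input))   # blank cells read as 0
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            return None
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())


print(run(BUSY_BEAVER_2))   # (6, 4)
print(run(LOOPER))          # None
```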

Let us go back to the problem of distinguishing "easy" and "hard" problems. One calls P the class of decision problems that are solvable in polynomial time and NP the class of decision problems for which a proposed solution can be efficiently verified. These categories do not depend on the (reasonable) computing model used. The question "P = NP?" is the most important open problem in theoretical computer science. It is one of the seven millennium problems, and it is widely believed that P ≠ NP. If a problem is such that every NP problem can be reduced to it, it is called NP-hard. The intersection of NP-hard and NP is the class of NP-complete problems. Popular games like Minesweeper or Tetris (in suitably generalized versions) are NP-complete. If P ≠ NP, then there is no efficient algorithm to beat these games. An example of an NP-complete problem is the balanced number partitioning problem: given n positive integers, divide them into two subsets A, B so that the sum in A and the sum in B are as close as possible. A first shot: choose the largest remaining number and distribute it alternately to the two sets.
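The partitioning heuristic can be compared with an exact answer on a small instance. In this sketch (the names and the instance [8, 7, 6, 5, 4] are illustrative), `alternating_partition` follows the first shot described above, `greedy_partition` is a common refinement that puts each number into the lighter set, and `best_difference` finds the optimum through reachable subset sums. The latter runs in time proportional to n times the total sum, which is exponential in the number of digits, so it does not contradict NP-completeness.

```python
def alternating_partition(numbers):
    """First shot: sort in decreasing order and deal the numbers out alternately."""
    ordered = sorted(numbers, reverse=True)
    return ordered[0::2], ordered[1::2]


def greedy_partition(numbers):
    """Refinement: put each number, largest first, into the set with the smaller sum."""
    a, b = [], []
    for v in sorted(numbers, reverse=True):
        (a if sum(a) <= sum(b) else b).append(v)
    return a, b


def best_difference(numbers):
    """Exact optimum via the set of reachable subset sums (pseudo-polynomial time)."""
    total = sum(numbers)
    reachable = {0}
    for v in numbers:
        reachable |= {s + v for s in reachable}
    return min(abs(total - 2 * s) for s in reachable)


def gap(parts):
    a, b = parts
    return abs(sum(a) - sum(b))


numbers = [8, 7, 6, 5, 4]
print(gap(alternating_partition(numbers)))  # 6: {8, 6, 4} vs {7, 5}
print(gap(greedy_partition(numbers)))       # 4: {8, 5, 4} vs {7, 6}
print(best_difference(numbers))             # 0: {8, 7} vs {6, 5, 4}
```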

We all feel that it is harder to find a solution to a problem than to verify a solution. If one-way functions exist, functions which are easy to compute but hard to invert, then P ≠ NP; it is not known whether the converse holds. For some important problems, we do not even know whether they are in P or whether they are NP-complete. An example is the integer factoring problem, which lies in NP. An efficient factoring algorithm would have enormous consequences, for example for the RSA system. Finally, let us look at some mathematical problems in artificial intelligence (AI):

• problem solving: playing games like chess, performing algorithms, solving puzzles
• pattern matching: speech, music, image, face, handwriting, plagiarism detection, spam
• reconstruction: tomography, city reconstruction, body scanning
• research: computer assisted proofs, discovering theorems, verifying proofs
• data mining: knowledge acquisition, knowledge organization, learning
• translation: language translation, porting applications between programming languages
• creativity: writing poems, jokes, novels, music pieces, painting, sculpture
• simulation: physics engines, evolution of bots, game development, aircraft design
• inverse problems: earthquake location, oil deposits, tomography
• prediction: weather prediction, climate change, warming, epidemics, supplies


About this document

It should have become obvious that I am reporting on many of these theorems as a tourist and not as a local. In a few areas I could qualify as a tour guide, but hardly as a local. The references list only sources which have been consulted; this does not imply that I know all of each source. My own background was in dynamical systems theory and mathematical physics. Both of these subjects by nature have many connections with other branches of mathematics.

The motivation to try such a project came from teaching a course called Math E 320 at the Harvard Extension School. This multi-disciplinary math course is part of the "math for teaching program" and tries to map out the major parts of mathematics and visit some selected places on 12 continents.

It is wonderful to visit other places and see connections. One can learn new things, relearn old ones and marvel again at how large and diverse mathematics is, while still noticing how many similarities there are between seemingly remote areas. A goal of this project is also to get back up to speed, to the level of a first year graduate student (one forgets a lot of things over the years), and maybe pass the qualifying exams (with some luck).

This summer 2018 project also illustrates the challenges of trying to tour the most important mountain peaks in the mathematical landscape with limited time. Already the identification of major peaks and attaching a "height" can be challenging. Which theorems are the most important? Which are the most fundamental? Which theorems provide fertile seeds for new theorems? I recently got asked by some students what I consider the most important theorem in mathematics (my answer had been the "Atiyah-Singer theorem").

Theorems are the entities which build up mathematics. Mathematical ideas show their merit only through theorems. Theorems not only help to bring ideas to life, they in turn allow us to solve problems and justify the language or theory. But not only the results alone: also the history and the connections with the mathematicians who created the results are fascinating.

The first version of this document got started in May 2018 and was posted in July 2018. Comments, suggestions or corrections are welcome. I hope to be able to extend, update and clarify it and also explore still neglected continents in the future if time permits.

It should be pretty obvious that one can hardly do justice to all mathematical fields and that much more would be needed to cover the essentials. A more serious project would be to identify a dozen theorems in each of the major MSC classification fields. The current MSC2020 classification system has 64 major entries and thousands of sub-entries listed on 120 pages [551]. But even a "thousand and one theorem" list would only be the tip of the iceberg. Such a list exists already: on Wikipedia, there are currently about 1000 theorems discussed. The one-document project getting closest to this project is maybe the beautiful book [545].


273. Document history

The first draft was posted on July 22, 2018 [424]. On July 23, 2018, a short list of theorems
was made available on [425]. This document history section got started on July 25-27, 2018.
• July 28 2018: Entry 36 had been a repeated prime number theorem entry. Its alternative is now the Fredholm alternative.
Also added are the Sturm theorem and Smith normal form.
• July 29: The two entries about Lidskii theorem and Radon transform are added.
• July 30: An entry about linear programming.
• July 31: An entry about random matrices.
• August 2: An entry about entropy of diffeomorphisms
• August 4: 104-108 entries: linearization, law of small numbers, Ramsey, Fractals and Poincare duality.
• August 5: 109-111 entries: Rokhlin and Lax approximation, Sobolev embedding
• August 6: 112: Whitney embedding.
• August 8: 113-114: AI and Stokes entries
• August 12: 115 and 116: Moment entry and martingale theorem
• August 13: 117 and 118: theorema egregium and Shannon theorem
• August 14: 119 mountain pass
• August 15: 120, 121, 122, 123 exponential sums, sphere theorem, word problem and finite simple groups
• August 16: 124, 125, 126, Rubik, Sard and Elliptic curves,
• August 17: 127, 128, 129 billiards, uniformization, Kalman lter
• August 18: 130, 131 Zariski and Poincare's last theorem
• August 19: 132, 133 Geometrization, Steinitz
• August 21: 134, 135 Hilbert-Einstein, Hall marriage
• August 22: 136-140
• August 24: 141-142
• August 25: 143-144
• August 27: 145-149
• August 28: 150-151
• August 31: 152
• September 1: 153-155
• September 2: 156
• September 8: 157,158
• September 14 2018: 159-161
• September 25 2018: 162-164
• March 17 2019: 165-169
• March 20, 2019: section on paradigms
• March 21, 2019: 170
• March 27, 2019, 171
• June 20, 2019, 172
• August 6, 2020, 173-174, deepness section started
• August 8, 2020, 175-177, more on deepness section
• August 18, 2020, 178,179,
• August 19, 2020, 180,181,182
• August 20, 2020, section on essential math, 183-185
• August 24, 2020, 186,187
• August 25, 2020, 188,189,190,191
• August 26, 2020, 192, 193
• August 27, 2020, 194 - 200
• August 28, 2020, 201, 202
• August 30, 2020, 203
• August 31, 2020, 204,205
• September 5, 2020, 206,207
• September 6-8, 2020, 208-212
• September 9, 2020, 213-214
• September 10, 2020, 215,216,217,218
• September 21, 2020, 219
• October 2, 2020, 220-221
• October 8, 2020, 222-223
• October 12, 2020, 224-225
• November 4, 2020, 226-227
• November 5, 2020, 228-231
• November 6, 2020, 232
• November 16, 2020, 233-234
• November 25, 2020, 235-236


• December 3, 2020, 237-238

• December 4, 2020, 239
• January 20, 2021, 240-243
• May 11, 2021, 244
• February 2, 2022, 245-250
• February 8, 2022, 251-254
• March 3, 2022, 255
• March 14, 2022, 256-257
• March 24, 2022, 258
• July 20, 2022, 259
• September 15, 2022, 260
• September 17, 2022, 261
• September 23, 2022, 262
• October 11, 2022, 263,264
• October 12, 2022, 265
• October 14, 2022, 266,267
• January 2023, 268,269,270
• June 25, 2023, 271, 272

274. Top choice

Here is a short list of 10 theorems mentioned once in a YouTube clip:

• Fundamental theorem of arithmetic (prime factorization)
• Fundamental theorem of geometry (Pythagoras theorem)
• Fundamental theorem of logic (incompleteness theorem)
• Fundamental theorem of topology (rule of product)
• Fundamental theorem of computability (Turing computability)
• Fundamental theorem of calculus (Stokes theorem)
• Fundamental theorem of combinatorics (pigeonhole principle)
• Fundamental theorem of analysis (spectral theorem)
• Fundamental theorem of algebra (polynomial factorization)
• Fundamental theorem of probability (central limit theorem)
To justify the choice, it should first be noted that similar arguments could be given for another choice, except maybe for the four classical fundamental theorems of arithmetic, geometry (which is Pythagoras), calculus and algebra, where one can hardly argue: except for the Pythagorean theorem, their given names already suggest that they are considered fundamental. Here are some reflections:
• Analysis. Analysis is a large field of mathematics ranging from calculus, complex, harmonic or functional analysis to partial differential equations. Operators like the Laplacian play an important role. In harmonic analysis, one studies eigenvectors of such operators; in complex analysis, one studies the kernel of such operators (harmonic functions and related analytic functions); in partial differential equations, one studies solutions to nonlinear equations involving differential operators. Therefore, spectral properties are important and central in analysis. Why choose the spectral theorem and not, say, the more general Jordan normal form theorem? This is not an easy call, but the Jordan normal form theorem is less simple to state, is of a more algebraic nature and, furthermore, does not stress the importance of normality, which gives the possibility of a functional calculus. Also, the spectral theorem holds in infinite dimensions for normal operators on Hilbert spaces. If one looks at mathematical physics, for example, then it is the functional calculus of operators which is really made use of; the Jordan normal form theorem appears rarely in comparison. In infinite dimensions, a Jordan normal form theorem would be much more difficult, as the operator Au(n) = u(n + 1) on l²(Z) is both unitary and a "Jordan form matrix". The spectral theorem, however, sails through smoothly to infinite dimensions and even applies with adaptations to unbounded self-adjoint operators, which are important in physics. And as it is a core part of analysis, it is also fine to see the theorem as part of analysis. The main reason, of course, is that the fundamental theorem of algebra is already occupied by a theorem. One could object that "analysis" is already represented by the fundamental theorem of calculus, but calculus is so important that it can represent its own field. The idea of the fundamental theorem of calculus goes beyond calculus. It is essentially a cancellation property, a telescopic sum or Pauli principle (d² = 0 for exterior derivatives) which makes the principle work. Calculus is the idea of an exterior derivative, the idea of cohomology, a link between algebra and geometry. One can also see calculus as a "theory of time". In some sense, the fundamental theorem of calculus also represents the field of differential equations, and this is what "time is all about".
• Probability. One can also ask why to pick the central limit theorem and not, say, the Bayes formula or the deeper law of the iterated logarithm. One objection against the Bayes formula is that it is essentially a definition, like the basic arithmetic properties "commutativity, distributivity or associativity" in an algebraic structure like a ring. One does not present the identity a + b = b + a, for example, as a fundamental theorem. Yes, the Bayes theorem has an unusually high appeal to scientists, as it appears like a magic bullet, but for a mathematician, the statement just does not have enough beef: it is a definition, not a theorem. This is not to belittle the Bayes theorem: like the notion of entropy or the notion of logarithm, it is a genius concept. But it is not an actual theorem, as the cleverness of the statement of Bayes lies in the definition and so in the clarification of conditional probability theory. For the central limit theorem, it is pretty clear that it should be high up on any list of theorems, as the name suggests: it is central. But also, it actually is stronger than some versions of the law of large numbers. The strong law is also superseded by Birkhoff's ergodic theorem, which is much more general. One could argue to pick the law of the iterated logarithm or some martingale theorem instead, but there is something appealing in the central limit theorem which goes over to other set-ups. One can formulate the central limit theorem also for random variables taking values in a compact topological group, like when doing statistics with spherical data [548]. Another pitch for the central limit theorem is that the normal distribution is a fixed point of the renormalization map X → (X + X′)/√2 (where X′ is an independent copy of X) in the space of random variables. This map increases entropy, and the fixed point is a random variable whose distribution function f has the maximal entropy −∫ f(x) log(f(x)) dx among all probability density functions with the same variance. The entropy principle justifies essentially all known probability density functions. Nature just likes to maximize entropy and minimize energy or, more generally, in the presence of energy, to minimize the free energy.
• Topology. Topology is about geometric properties which do not change under continuous
deformation or, more generally, under homotopies. Quantities which are invariant under
homeomorphisms are interesting. Such quantities should add up under disjoint unions of
geometries and multiply under products. The Euler characteristic is the prototype. Taking
products is fundamental for building up Euclidean spaces (also over other fields, not only the
real numbers), which locally patch up more complicated spaces. It is the essence of vector spaces
that after choosing a basis, one has a product of one-dimensional spaces. Field extensions can
therefore be seen as product spaces. How does the counting principle come in? As stated, it
actually is quite strong, and calling it a "fundamental principle of topology" can be justified if
the product of topological spaces is defined properly: if 1 is the one-point space, one can read
the statement G × 1 = G_1 as saying that the product with the one-point space is the Barycentric
refinement G_1 of G. This implies that the Euler characteristic is a Barycentric invariant and so
a "counting tool" which can be pushed to the continuum, to manifolds or varieties. The
compatibility with the product is the key to make it work. Counting in the form of the Euler
characteristic runs throughout mathematics, through combinatorics, differential geometry and
algebraic geometry. Riemann-Roch, Atiyah-Singer and even dynamical versions like the Lefschetz
fixed point theorem (which generalizes the Brouwer fixed point theorem) or the even more general
Atiyah-Bott theorem can be seen as extending the basic counting principle: the Lefschetz number
χ(X, T) is a dynamical Euler characteristic which in the static case T = Id reduces to the Euler
characteristic χ(X). In "school mathematics", one calls the principle the "fundamental principle
of counting" or "rule of product". It is put in the following way: "If we have k ways to do one
thing and m ways to do another thing, then we have k · m ways to do both." It is so simple that
one can argue that it is overrepresented in teaching, but it is indeed important. [65] makes the
point that it should be considered a founding stone of combinatorics.
Why is the multiplicative property more fundamental than the additive counting principle? Again,
it is because the additive property is essentially built into the definition of what a valuation
is. It is the in-out formula χ(A ∪ B) + χ(A ∩ B) = χ(A) + χ(B). This inclusion-exclusion formula
is also important in combinatorics, but it is already part of the definition of what we call
"counting" or "adding things up". The multiplicative property, on the other hand, is not a
definition; it actually is quite non-trivial. It characterizes classical mathematics; quantum
mechanics and non-commutative flavors of mathematics have shown how things can be extended
beyond it. So, if the "rule of product" (which is taught in elementary school) is beefed up to be
more geometric and interpreted in terms of the Euler characteristic, it becomes fundamental.
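Both properties can be tried out in a few lines of Python. The sketch below is an illustration under stated assumptions, not the graph-theoretic construction used in the text: it uses ordinary finite cell complexes, encoded only by their f-vectors (f[k] counts the k-dimensional cells), together with the standard product cell structure, in which a cell of X × Y is a pair of cells and dimensions add. The f-vector of the product is then the convolution of the f-vectors, and χ = f_0 − f_1 + f_2 − ... becomes multiplicative; the rule of product is the zero-dimensional case. For the additive side, the in-out formula is checked on two finite simplicial complexes. All function names are hypothetical.

```python
from itertools import combinations


def euler_characteristic(f):
    """chi = f_0 - f_1 + f_2 - ... for an f-vector f, where f[k] counts k-cells."""
    return sum((-1) ** k * fk for k, fk in enumerate(f))


def product_f_vector(f, g):
    """f-vector of X x Y: a product cell (a, b) has dimension dim a + dim b."""
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h


def closure(*facets):
    """Simplicial complex generated by the given facets: all their non-empty subsets."""
    faces = set()
    for facet in facets:
        for k in range(1, len(facet) + 1):
            faces.update(frozenset(c) for c in combinations(facet, k))
    return faces


def chi(complex_):
    """Euler characteristic of a simplicial complex: a k-simplex has k + 1 vertices."""
    return sum((-1) ** (len(simplex) - 1) for simplex in complex_)


circle = [1, 1]     # one vertex and one edge
sphere = [1, 0, 1]  # one vertex and one 2-cell
torus = product_f_vector(circle, circle)

# Rule of product: k points times m points gives k * m points.
assert product_f_vector([3], [4]) == [12]
# Multiplicativity: chi(X x Y) = chi(X) * chi(Y).
assert euler_characteristic(torus) == 0
assert euler_characteristic(product_f_vector(sphere, sphere)) == 4

# Additivity (in-out formula) for two triangles glued along an edge.
A, B = closure((1, 2, 3)), closure((2, 3, 4))
assert chi(A | B) + chi(A & B) == chi(A) + chi(B)
```

Multiplicativity holds here because χ is the polynomial Σ f_k t^k evaluated at t = −1, and the product of complexes multiplies these polynomials.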
• Combinatorics. The pigeonhole principle stresses the importance of order structure, partially
ordered sets (posets) and cardinality or comparisons of cardinality. The point for posets is made
in [587], who writes "The biggest lesson I learned from Richard Stanley's work is, combinatorial
objects want to be partially ordered!" The use of injective functions to express cardinality is a
key part of Cantor's work. Like some of the ideas of Grothendieck, it is "of infantile simplicity"
(Grothendieck's words about schemes) but powerful. It allowed for the stunning result that there
are different infinities. One of the reasons for the success of Cantor's set theory is its
immediate applicability. For any new theory, one has to ask: "Does it tell me something I did not
know?" In set theory, the larger cardinality of the reals (uncountable) compared with the
cardinality of the algebraic numbers (countable) immediately gave the existence of transcendental
numbers. This is very elegant. The pigeonhole principle similarly gives combinatorial results
which are non-trivial and elegant. Currently, searching for the "fundamental theorem of
combinatorics" gives the "rule of product". As explained above, we gave it a geometric spin and
placed it into topology. Now, combinatorics and topology have always been very hard to
distinguish. Euler, who somehow booted up topology by reducing the Königsberg problem to a problem
in graph theory, did that already. Combinatorial topology is essentially part of topology. Today,
some very geometric topics like algebraic geometry have been placed within pure commutative
algebra (this is how I myself was exposed to algebraic geometry). On the other hand, some very
hard-core combinatorial problems like the upper bound conjecture have been proven with
algebro-geometric methods like toric varieties. In any case, order structures are important
everywhere, and the pigeonhole principle justifies the importance of order structures.
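A small instance of a non-trivial pigeonhole consequence, chosen here for illustration and not discussed in the text: among any n integers there is a non-empty block of consecutive entries whose sum is divisible by n. The n + 1 prefix sums S_0 = 0, S_1, ..., S_n take only n residues modulo n, so two of them agree, and the entries between them sum to a multiple of n. The Python sketch below turns this argument into an algorithm; `zero_sum_block` is a hypothetical name.

```python
def zero_sum_block(a):
    """Return (i, j) with i < j such that sum(a[i:j]) is divisible by n = len(a).

    Pigeonhole: the n + 1 prefix sums S_0, ..., S_n fall into only n residue
    classes mod n, so two of them coincide.
    """
    n = len(a)
    if n == 0:
        raise ValueError("need at least one integer")
    first_seen = {0: 0}  # residue of a prefix sum -> index where it first occurred
    s = 0
    for j, x in enumerate(a, start=1):
        s = (s + x) % n
        if s in first_seen:
            return first_seen[s], j
        first_seen[s] = j
    raise AssertionError("unreachable by the pigeonhole principle")


a = [3, 1, 4, 1, 5]
i, j = zero_sum_block(a)
print(a[i:j], sum(a[i:j]))  # prints [1, 4] 5
```

The proof is constructive: the loop stops at the first repeated residue, so it needs at most n steps and a dictionary of at most n entries.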
• Computation. There is no official "fundamental theorem of computer science", but the Turing
completeness theorem comes up as a top candidate when searching with search engines. Using Turing
machines, Turing formalized in a precise way what computing is, and even what a proof is. It nails
down the mathematical activity of running an algorithm or argument in a mathematical way. It is
also pure, as it does not depend on hardware. One can only fully appreciate Turing's definition
after seeing how different programming languages can look and, in logic, how many different
frameworks have been invented. Turing cuts through all this complexity with a machine which can
itself be part of mathematics, leading to the halting problem, which illustrates the basic
limitations of computation. Quantum computing would add a hardware component and might break the
extended Church-Turing thesis that everything we can compute can be computed with Turing machines
in the same complexity class. The works of Gödel and Turing are related, and the Turing
incompleteness theorem has a similar flavor to the Gödel incompleteness theorems. There is another
angle to it, and that is the question of complexity. I would guess that most mathematicians
currently favor the Platonic view of the Church thesis and expect that even new paradigms like
quantum computing will never go beyond Turing computability, nor even break through complexity
barriers like the P-NP threshold. It is just that the Turing completeness theorem is too beautiful
to be spoiled by a different type of complexity tied to a physical world. The point of view is
that anything we see in the physical world can in principle be computed with a machine without
changing the complexity class. But that picture could be as naive as Hilbert's dream one hundred
years ago. Still, whatever happens in the future, the Turing completeness theorem remains a
theorem. Theorems stay true.
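To make Turing's definition concrete, here is a minimal one-tape Turing machine simulator in Python. The encoding is a hypothetical choice for illustration: a transition table `delta` maps (state, symbol) to (new state, symbol to write, head move), "_" is the blank symbol, and the simulation runs under a step budget. The budget is not optional: by the halting problem, no general procedure can decide in advance whether a given machine halts. The example machine increments a binary number.

```python
from collections import defaultdict


def run_turing_machine(delta, tape, state="start", halt="halt", max_steps=10_000):
    """Simulate a one-tape Turing machine whose head starts on the leftmost tape cell.

    delta maps (state, symbol) -> (new_state, symbol_to_write, move), move in {-1, +1}.
    Unvisited cells hold the blank "_". Returns the final tape without outer blanks.
    """
    cells = defaultdict(lambda: "_", enumerate(tape))  # lazily unbounded tape
    head, steps = 0, 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("step budget exhausted; the machine may never halt")
        new_state, symbol, move = delta[(state, cells[head])]
        cells[head] = symbol
        head += move
        state = new_state
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells[i] for i in range(lo, hi + 1)).strip("_")


# Binary increment: scan right to the end, then propagate the carry to the left.
INCREMENT = {
    ("start", "0"): ("start", "0", +1),
    ("start", "1"): ("start", "1", +1),
    ("start", "_"): ("carry", "_", -1),  # past the last digit: turn around
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, carry continues
    ("carry", "0"): ("halt", "1", -1),   # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", -1),   # carry runs past the leading digit
}

print(run_turing_machine(INCREMENT, "1011"))  # prints 1100
```

A dictionary with a default blank models the unbounded tape without allocating it in advance; negative cell indices arise naturally when the carry runs past the leading digit.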
• Logic. One can certainly debate whether it would be justified to replace Gödel's theorem by a
theorem in category theory like the Yoneda lemma. The Yoneda result is not easy to state, and it
does not yet produce an "Aha moment" like Gödel's theorem does (the liar paradox explains the core
of Gödel's theorem, which was successfully popularized in [342]). Maybe the Yoneda lemma will hit
pop culture in the future, when all mathematics has been naturally and pedagogically well
expressed in categorical language. I'm personally not sure whether this will ever happen: not
everything which is nice has penetrated large parts of mathematics. An example is non-standard
analysis, which makes calculus orders of magnitude easier and which is also related to surreal
numbers, which are the "most natural" numbers. Neither concept has entered calculus or algebra
textbooks, and there are reasons: the subjects need mathematical maturity, and one can easily make
mistakes. (I myself use non-standard analysis on an intuitive level as presented by Nelson
[540, 604] and think, for example, of a compact set as a finite set, where basic theorems require
almost no proof, like the Bolzano theorem telling that a continuous function on a compact set
attains a maximum.) But using non-standard analysis would be a "no-no", both in teaching and when
formulating mathematical thoughts for others who are not familiar with Nelson's three additional
axioms of internal set theory (IST) on top of ZFC. It is non-standard and true to its name. An
example of something which was once pop culture but was then sidelined is the quaternions. They
might make a comeback. Fashion is hard to predict. Also, much of category theory still feels like
a huge conglomerate of definitions. There is a lot of dough in the form of definitions and only a
few raisins in the form of theorems. Historically, the language of set theory has also been
overkill, especially in education, where it led to the "new math" controversies of the 1960s. The
work of Russell and Whitehead demonstrates how clumsy things can become if boiled down to the
smallest pieces. We humans like to think and program in higher-order structures: rather than doing
assembly coding, we like to work in object-oriented languages, which give more insight. But we
like, and make use of, the fact that higher-order code can be boiled down to assembly, closer to
the basic instructions. This is similar in mathematics: also in the future, a topologist working
in 4-manifold theory will hardly think about all the definitions in terms of sets, for the same
reasons that a modern computer algebra system does not break down all its objects into lists and
lists of lists (even though that is often what they are). Category theory has a chance to change
the landscape because it is close to computer science and to natural data structures. It is more
pictorial and flexible than set theory alone. It has definitely been very successful in finding
new structures and seeing connections between different fields like computer science [578]. It
has also led to more flexible axiom systems.

Index

E8 lattice, 42 Anosov, 28
σ-algebra, 4, 96 anti Cauchy-Schwarz inequality, 92
1/f theorem, 112 antipodal point, 39
15 theorem, 42 Antipodal theorem, 92
17-gon, 38 Antipode, 92
290 theorem, 42 Apéry constant, 124
3-connected, 63 Apéry's theorem, 124
3-sphere, 62 aperiodic, 48
3D scanning, 153 Archimedean, 118
4 color theorem, 67 Archimedean spiral, 83
4-color theorem, 143 area element, 35
4/n problem, 89 area of polygon, 30
area preservation, 62
ABC conjecture, 145 area triangle, 36
Abelianization, 90 argument, 14
abscissa of absolute convergence, 12 arithmetic mean, 20
abscissa of convergence, 12 arithmetic progression, 16
absolute Galois group, 23 arithmetic progressions, 24
absolutely continuous, 96 array multiplication, 133
absolutely continuous measure, 15 artificial intelligence, 153
ADE diagrams, 104 Arzela-Ascoli, 40
adeles, 127 asset pricing, 53
adjoint, 6 associativity, 7, 21
affine camera, 31 Atiyah Bott, 143
Alaoglu theorem, 31 Atiyah Singer, 143
Alexander polynomial, 33 Atiyah-Singer theorem, 186
Alexander sphere, 150 Attractor, 106
Alexander Subbase theorem, 134 attractor of iterated function system, 46
algebra Aubry-Mather theory, 31, 36
tail, 128 Auctions, 112
algebraic closure, 2, 23 Axiom of choice, 5, 73
algebraic extension, 25 axiom of choice, 5, 113
Algebraic number, 27 axiom system, 9
algebraic number, 120
algebraic number field, 23 Bézier curves, 27
algebraic numbers, 23 Bézout's bound, 21
Algebraic set, 6 Baire category theorem, 18
algorithm, 153 Baire space, 76
Alice in Wonderland, 97 Bakshali manuscript, 163
Almost complex structure, 19 Ballot theorem, 84
almost everywhere convergence, 72 Banach algebra, 10, 112
Almost Mathieu operator, 150 Banach fixed point theorem, 12
alphabet, 22 Banach space, 8, 39, 65
Alternating sign conjecture, 81 Banach-Tarski construction, 71
Alternating sign matrix, 81 Banach-Tarski paradox, 150
AMS classification, 150, 161 Banach-Tarski, 113
analytic function, 8 Banyaga theorem, 127
Analytical index, 38 Barycenter, 39
ancient roots, 160 Barycentric subdivision, 12
Angle trisection, 83 Bayes theorem, 4
Angle trisector, 78 Beal conjecture, 167
Angular momentum, 75 Berge graphs, 107
Bernoulli shift, 22 canonical height, 102

Bernstein polynomials, 27 Canonical product, 131
Bertrand postulate, 69 canons of rhetoric, 160
Bertrand's theorem, 75 Cantor, 4
Betti number, 17 Cantor set, 26, 46, 150
bicommutant, 105 Capacity theorem, 122
bidding, 112 Caratheodory, 3
bifurcation, 11, 76 cardinality, 3, 7, 11, 12, 73
bijective, 3 Carleson theorem, 72
billiards, 59 Cartesian closed, 23
Binomial coefficients, 27 Cartesian product, 12
bipartite graph, 64, 82 catastrophe, 76
Birch and Swinnerton-Dyer, 102 catastrophe theory, 136
Birch and Swinnerton-Dyer conjecture, 99 Categoricity Theorem, 115
Birkhoff, 3 category, 10, 23
Birkhoff sum, 93 Cauchy, 14
Birkhoff theorem, 3 Cauchy integral formula, 8
Bloch constant, 120 Cauchy integral theorem, 8
block diagonal, 80 Cauchy interlace theorem, 130
blow up of solutions, 123 Cauchy sequence, 12
Boltzmann constant, 54, 77 Cauchy-Binet theorem, 88
Bolzano-Weierstrass theorem, 79 Cauchy-Kovalevskaya theorem, 14
Boltzmann equation, 31
Cauchy-Riemann differential equation, 8
Borel σ algebra, 96
Cauchy-Schwarz, 92
Borel measure, 10, 15, 52
Cauchy-Binet, 102
Borel-Cantelli lemma, 128
Cavalieri principle, 30
Borsuk theorem, 92
Cayley theorem, 11
Borsuk-Ulam theorem, 39, 92
Cayley-Hamilton, 109
boundary, 50
cellular automaton, 22
bounded linear operator, 6
center of mass, 19
bounded martingale, 53
central force, 75
bounded stochastic process, 52
central limit theorem, 3, 175
Bounded variation, 93
centrally symmetric, 15
Bourgain's theorem, 83
centroid, 127
Bowen-Franks group, 132
Centroid theorem, 91
Bowen-Lanford Zeta function, 132
Ceva theorem, 77
Brauer group, 34
chamber, 177
Brioschi formula, 53
character, 10
Brjuno number, 28
Character group, 127
Brouwer degree, 25
Chebychev inequality, 92
Brouwer fixed point theorem, 15, 26
Chebychev's inequality, 125
Brun's constant, 111
Chebyshev inequality, 146
Buffon needle problem, 97
Chen's theorem, 111
Burnside lemma, 29
Chern character, 38
Butterfly theorem, 164
Chern class, 70

C star algebra, 17 Chern classes, 38

Cesàro convergence, 19 Chernoff bound, 125
Cahen formula, 12 Chinese Remainder Theorem, 8
Calculus of variations, 80 chiral copies, 41
calculus of variations, 151 chord tangent construction, 59
Canada Day theorem, 88 chromatic number, 122
cancellative monoid, 7 Church, 10
Canonical factorization, 131 Church-Turing thesis, 10
canonical divisor, 30 Church-Turing thesis, 10

circle, 8 connectedness locus, 26

Circular law, 44 connection, 16
circumference of circle, 13 connection graph, 105
class fields, 38 Connes formula, 17
Classification of finite simple groups, 146 conservation law, 151
classification of finite simple groups, 57 consistent, 9
Clifford algebra, 119 constant curvature, 84
clique covering number, 122 context, 153
clique number, 122 continued fraction expansion, 37

close orbits, 28 continuous function, 27

closed manifold, 62 continuous functions, 34

codegeneracy map, 117 continuous map, 25

coface map, 117 continuously differentiable, 2

cognitive science, 50 Continuum hypothesis, 73, 126

Cohomology, 47 contractible, 5

cohomology, 25 contraction, 12

coloring, 47, 66 convergence, 12

Combinatorial convexity, 81 convergence in distribution, 3

combinatorics, 174 convex, 15, 19, 39, 114

communication theory, 121 convex analysis, 87, 114

commutant, 105 convex conjugate, 87

commutative, 7 convex function, 20

convex functions, 129
commutative ring, 6, 50
convex polytop, 44, 68
compact group, 9
convex set, 20
Compact non-Hausdorff space, 124
convexity, 129
Compact operator, 16
convolution, 10
compact-open topology, 40
Conway-Schneeberger fifteen theorem, 41
compactly supported, 18
coprime, 6, 20
compactness, 95
cost function, 31
compactness condition, 55
countable, 4
complementary angles, 36
coupling transformation, 31
complete, 9, 41
cover, 50
Complete metric space, 18
Covering dimension, 130
complete metric space, 12
critical point, 17, 55, 80
complete theory, 115
critical points, 58, 94
completeness, 12
critical set, 58
complex conjugate, 6
Crofton formula, 97
complex logarithm, 14
CRT system, 20
complex manifold, 60
crypto system, 6
complex multiplication, 38
crystallography, 41
complex plane, 8
cube, 18
complexity theory, 50
cube exchange transformation, 48
composition of functions, 34
cubic equation, 24
computer algebra, 21
cumulative distribution function, 3
computer assisted proof, 40
Curie temperature, 77
computer vision, 50
Curtis-Hedlund-Lyndon, 22
conditional expectation, 53
curvature, 37
conditional probability, 4
cyclic cube exchange transformations, 48
conductor, 38
cyclic polytop, 68
conformally equivalent, 60
cyclic subgroup, 21
confusion graph, 122
Cyclotomic field, 38
Congruent number, 99
cyclotomic field, 38
conic, 70
conic section, 21, 70 damping factor, 73

Darboux, 17 Dirichlet problem, 75

data fitting, 8, 32 Dirichlet series, 12
data mining, 50 Dirichlet unit problem, 37
data science, 50 Dirichlet's unit theorem, 115
de la Vallee-Poussin, 22 discrete σ algebra, 71
De Moivre, 3 discrete dynamical system, 61
descent method, 99 discrete Ito integral, 53
Deepness, 140 discrete log, 14
Deformation idea, 148 discrete log problem, 6

degree, 2, 85 discrete time stochastic process, 52

degree of a divisor, 30 discriminant, 53

Dehn invariant, 129 Disordered system, 77

Dehn-Sommerville conditions, 68 divergence, 22

Demokrit, 151 Divine Triangles, 133

Denjoy Koksma theorem, 93 division algebra, 34, 146

Denjoy theorem, 93 division ring, 90

density of sphere packing, 42 divisor, 30

Depressed cubic, 24 dodecahedron, 18

Derivation, 122 Dodgson condensation, 97

derivative, 2 deformation idea, 148

Desnanot-Jacobi, 97 Dolbeault operators, 70

Desnanot-Jacobi adjoint matrix theorem, 97 Doob martingale convergence, 52

double suspension, 134
Determinant, 90
doubly stochastic matrix, 109
determinant, 17, 97
Douglas problem, 98
Diagonalizable, 7
Dual billiards, 121
diagonalization, 14
dual linear programming problem, 44
Dieudonné determinant, 90
Dual numbers, 34
diffeomorphic, 50
dual numbers, 34
Diffeomorphism, 28
duality, 148
diffeomorphism, 69
dynamical system, 76
diffeomorphisms, 127
differentiable, 2
edge labeling, 47
differentiable manifold, 17 edges, 4, 47
differentiable sphere theorem, 56 Edwards-Anderson model, 77
differential equation, 8 effective divisor, 30
differential form, 51 Egyptian fractions, 89
differential Galois theory, 122 eigenbasis, 7
Differential operator, 29, 32 Eigenvalue, 32
differential operator, 38, 49 Eigenvalues, 75
differential ring, 122 eigenvalues, 7, 44
Diffie-Hellman system, 6 Einstein constant, 64
dimension, 11 Einstein formula, 143
Dimension theory, 130 Elasticity, 110
Diophantine approximation, 150 elementary catastrophe, 76
Diophantine, 93 elementary function, 122
Diophantine analysis, 15 elementary theory of the category of sets, 9
Diophantine condition, 27 elliptic curve, 21, 58, 135
Diophantine number, 27, 28, 36 Elliptic curve cryptography, 59
Dirac delta function, 28 elliptic regularity, 32, 38
Dirac operator, 17, 47 embedding, 20, 50
direct sum of representations, 9 Embedding theorems, 146
Dirichlet, 8 empty set, 3
Dirichlet Eta function, 12 energy momentum tensor, 64
Dirichlet prime number theorem, 16 Entire functions, 131

Entropy, 45 Fagnano triangle, 109

entropy, 54, 59, 175 Feigenbaum conjectures, 143
equalizer, 23 Feigenbaum universality, 40
Equichordal point, 100 Feigenbaum-Cvitanović functional equation, 40
equicontinuous, 40 Fenchel duality, 87
equidecomposable, 129 Fermat, 133
equilateral triangle, 127 Fermat law, 80
equilibrium, 14 Fermat prime, 38
equilibrium point, 11 Fermat's last theorem, 38
equivalence class, 21 Fermat's little theorem, 6
Eratosthenes, 151 Fermat's principle, 80
Erdös conjecture on arithmetic progressions, 16 Fermat's right angle theorem, 99
Erdös Straus conjecture, 89 Feuerbach theorem, 164
Erdös Straus equation, 89 Fibonacci coding, 104
Erdoes-Ko-Rado Theorem, 86 Fibonacci sequence, 100
Ergodic, 3 Field, 131
ergodic, 40, 93 field extension, 25
Erlangen program, 164 fifteen theorem, 41, 42
error correction, 181 finite group, 9, 21
Eschenburg manifold, 95 finite projective plane, 108
Euclid's geometry, 129 finite simple graph, 4
Euclidean algorithm, 20 finitely presented group, 56, 126
Euler characteristic, 5, 11, 85, 146
first Baire category, 18
Euler Gem, 126
First fundamental form, 53
Euler gem formula, 5
first integral, 19
Euler golden key, 12, 22, 143
fixed point set, 29
Euler handshake, 5
fixed point theorem, 114
Euler law on quadrilaterals, 94
fixed points, 25
Euler polyhedron formula, 5, 126
formal rules, 9
Euler polynomial, 12
four square theorem, 26, 32
Euler product, 22
Fourier basis, 19
Euler totient function, 6
Fourier coefficients, 19, 28
Euler-Lagrange equations, 18, 80
Fourier series, 19, 43, 72
Euler-Pythagoras theorem, 94
Fourier theory, 2
exhaustion, 13
Fourier transform, 10, 118
Exotic sphere, 150
Fréchet derivative, 55
exotic sphere, 56, 62
Fréchet space, 65
Exotic spheres, 94
Fractal, 130
expectation, 175
fractal, 46
exponential map, 14
fractals, 95
exponential of categories, 23
Frechet derivative, 133
Extended finite Ramsey theorem, 116
Fredholm alternative, 16
extending mathematical operations, 162
Fredholm theory, 16, 32
exterior algebra, 34
Free action, 48
exterior derivative, 17
Freiheitssatz, 126
Extremal set theory, 86
Friendship problem, 47
extreme point, 39, 68
Frobenius determinant, 82
f-vector, 68 Frobenius determinant theorem, 82
Fürstenberg theorem, 24 functional calculus, 7
Féjer kernel, 19 functor, 10
faces, 63 fundamental class, 38
factor (von Neumann algebra), 105 fundamental counting principle, 12
Factorial, 71 fundamental group, 73
factorization domain, 1 fundamental lemma of calculus of variations, 18

fundamental region, 15 graph, 4

fundamental theorem of algebra, 2 graph coloring, 66
Fundamental theorem of Lucas, 100 graph database, 153
fundamental theorem of Riemannian geometry, 16 Green Tao Theorem, 16
fundamental theorem of trigonometry, 13 Green's formula, 74
fundamental theorem of Vlasov dynamics, 33 Green's function, 32
Gregory, 2
g-conjecture, 68
Grobman-Hartman linearization, 45
Gödel, 9
Gromov Hausdorff distance, 95
Gabriel's theorem, 104
Gross-Zagier formula, 102
Galois extension, 25, 131
Grothendieck group completion, 7
Galois field, 131
Grothendieck, 7
Galois theory, 148
Grothendieck program, 148
Game of life, 22
Group, 29
Gateaux derivative, 133
group, 7, 21
Gauss, 1, 2, 22
group completion, 7
Gauss Bonnet, 81
Group of units, 115
Gauss identity, 143
Grove-Searle theorem, 96
Gauss sum, 83
Gauss-Bonnet formula, 35
Gauss sums, 55
Gauss-Bonnet, 5, 143 h-vector, 68
Gauss-Bonnet-Chern, 37, 96 Haar measure, 10, 71, 103
Gaussian curvature, 37 Hadamard, 22
Gaussian random variables, 3 Hadamard factorization, 131
Gelfond constant, 120 Hadamard product, 133
Gelfond theorem, 120 Hadwiger, 14
Gelfond-Schneider constant, 120 Hahn-Banach, 5
general position, 21 Hall Marriage, 49
general recursive function, 10 Hall marriage problem, 64
General theory of relativity, 64 Hamilton principle, 80
generalized functions, 49 happy groups, 57
Generalized handshake, 5 Hermitian, 43
Generalized Poincare conjecture, 62 harmonic analysis, 112
Generating function, 29 harmonic forms, 70
Generic, 18 harmonic series, 22
genus, 30 Hausdorff distance, 95
geodesic, 134 Hausdorff dimension, 28, 46, 130
geodesic complete, 41 Hausdorff dimension of measure, 45
geodesic distance, 17, 41 Hausdorff metric, 14
geometric mean, 20 Hausdorff space, 40
Geometric probability, 97 Hedlund-Curtis-Lyndon, 22
geometric realization, 63 Heegner number, 99
Gerschgorin disc, 88 Heegner point, 102
Gershgorin circle theorem, 88 Heine-Cantor theorem, 79
Gershgorin disc, 88 Heisenberg model, 77
Gibbs formula, 143 Hermite constant, 135
Gini coefficient, 93 Hermite interlace Theorem, 130
Gini potential, 93 hexagonal lattice, 135
Girko law, 44 Hidden variables, 108
global maximum, 80 Hilbert, 6
God number, 58, 169 Hilbert action, 64
Goldbach conjecture, 69, 153 Hilbert cube, 146
Golden ratio, 28, 104, 150 Hilbert distance, 101
Google matrix, 73 Hilbert metric, 73, 101
Grönwall inequality, 45 Hilbert Schmidt kernel, 16

Hilbert space, 6, 17 inconsistent, 9

Hilbert's 12th problem, 38 independence, 54
Hilbert's problems, 73 independence number, 122
Hilbert's program, 9 Independence Theorem, 126
Hilbert's third problem, 129 independent random variable, 3
Hilbert-Einstein equations, 64 independent set, 121
Hilbert-Schmidt, 110 index, 8, 21, 85
Hilbert-Waring theorem, 32 Indiscrete topological space, 124
Hippocrates Theorem, 80 Inductive dimension, 130
Hippocrates theorem, 80 Inequalities, 146
Hodge dual, 48 initial value theorem, 8
Hodge operator, 17 injective, 3
Hofstadter butterfly, 179 inner product, 2, 16, 19
Hölder continuity, 49 Inter-universal Teichmüller theory, 145
Hollywood, 145 Inscribed angle theorem, 77
holomorphic, 40 integer polynomial, 23
holomorphic function, 8, 120 integer quadratic form, 41
homeomorphism, 24, 25 integrable, 40
HOMFLYPT, 33 integrable function, 19
homogeneous polynomial, 21 Integrable system, 123
homology group, 73 integral, 2
Homology sphere, 134 Integral geometry, 97
homotopica, 62 integral operator, 15
homotopy group, 73 integral quadratic form, 42
homotopy idea, 148 Interlace Theorem, 130
Hooke's law, 75 intermediate value theorem, 92
Hopf conjecture, 96 invariant tori, 40
Hopf method, 60 invariant valuation, 14
Hopf Umlaufsatz, 81 inverse, 7
Hopf-Rinow theorem, 41 inverse problem, 43, 153
horse shoe, 28 inverse problems, 50
Hurewicz homomorphism, 74 inward condition, 114
Hurewicz theorem, 73 irrational rotation, 93
Hurwitz estimate, 146 Irreducible representation, 86
Hutchinson operator, 46 irreducible representation, 9
Hydra game, 116 Ishango bone, 162
hyperbola, 70 Ising model, 77
hyperbolic attractor, 106 isoperimetric inequality, 30
hyperbolic geometry, 36 isospectral deformation, 19
hyperbolic set, 28 Isospectral drum, 75
hypercomplex algebra, 34 iterated function system, 46, 95
hypercomplex numbers, 34
hypersurface, 21 Jacobi matrix, 36
Jacobi triple product, 29
icosahedron, 18 Jensen inequality, 20
icosean group, 134 Jones polynomial, 33
ideal, 6 Jordan block, 80
identity element, 7 Jordan curve, 20
ill posed problem, 43 Jordan normal form, 80
image, 8 Jordan-Brouwer separation theorem, 20
imaginary quadratic field, 38 Jordan-Chevalley decomposition, 80
implicit function theorem, 11, 36 Julia set, 26, 130
inapproximability, 87
Income curve, 93 k-connected, 123
Inconsistency, 116 K-Theory, 7

Kähler class, 70 Lebesgue measure, 15, 71

Kähler manifold, 70 Lebowski theorem, 50
König's theorem, 82 Ledrappier-Young formula, 45
Kakutani fixed point theorem, 15 Leech lattice, 42
Kakutani skyscraper, 48 Lefschetz fixed point theorem, 15
Kalman filter, 61 Lefschetz-Hopf fixed point theorem, 25
KAM, 36, 143 left invariant, 10
KAM theory, 31, 36 Legendre conjecture, 69
KAM tori, 150 Legendre symbol, 26
Kepler problem, 75, 143
Legendre transform, 87
kernel, 8
Leibniz, 2
Khipu, 153
Leibniz rule, 16, 122
Killing-Hopf theorem, 84
lemma
Kirchhoff Laplacian, 126
Borel-Cantelli, 128
Klein Erlangen program, 148
length of polygon, 30
Klein model, 36
Leonardo Pisano, 100
knapsack decision problem, 87
Lerch transcendent, 12
knot, 33, 78
Levi Civita, 16
knot sum, 33
Levi-Civita connection, 16
Koch curve, 46
Levy, 3
Kochen-Specker theorem, 108
Liber Abbaci, 100
Koebe function, 120
liberal arts and sciences, 160
Koebe one quarter theorem, 120
Lindelöf, 9
Koenigsberg bridge problem, 176
Kolmogorov zero-one law, 128 Lidskii-Last, 43

Koopman theory, 148 Lie group, 14

Korn inequality, 110 limit, 2, 12, 13

Koszul formula, 16 limit cycle, 27

Kowalevskaya, 14 line bundle, 38

Kreisnormierungsproblem, 120 line segment, 19, 20

Kronecker pairing, 38 linear, 2
Kronecker-Weber theorem, 38 linear functionals, 65
Kruskal-Katona Theorem, 86 linear program, 44
Kuratowski, 5 link, 33
Liouville integrable, 19
L'Hopital rule, 13
Lipschitz, 8
Lagrange equations, 30, 80
local automaton, 22
Lagrange four square theorem, 41, 42
local continuation, 11
Lagrange rest term, 29
local field, 38
Lagrange theorem, 21
local maximum, 80
Landau conjecture, 69
locally compact, 10
Landau problem, 16, 145, 153
locally compact topological space, 39
Landsberg-Schaar relation, 55
locally connected, 26
Langlands program, 145
Locally convex, 65
Laplace equation, 32
Loewenheim-Skolem theorem, 115
Laplacian, 75, 93
last theorem of Fermat, 143 Loewner, 135

lattice, 15 logarithm, 13, 22

lattice gas model, 77 Logistic map, 150

law of iterated logarithm, 175 Lorenz attractor, 27

law of large numbers, 3, 175 Lorenz curve, 93

law of quadratic reciprocity, 26 Lovasz umbrella, 122

Lax system, 19 lower semi-continuous, 114
least square solution, 8 Lucas sequence, 100
Lebesgue decomposition, 96 lunes of Hippocrates, 80

Lusternik Schnirelman Borsuk antipodal theorem, 92 Millennium problems, 145
Milnor sphere, 94
Lusternik Schnirelmann, 134 mimiocretin, 50
Lyapunov exponent, 45, 59, 103 minimal Matching, 82
minimal separator, 123
Magnetic resonance imaging, 43
Minimal surface, 98
Magnus theorem, 126
Minkowski, 15
Mandelbrot set, 26, 65, 130, 150
Minkowski addition, 114
Mandelbulb set, 65, 150
Minkowski sum, 114
manifold, 17, 50
Minor, 102
Margulis-Ruelle inequality, 45
mixing, 40
Markov matrix, 73
modular elliptic curve, 102
Markov's inequality, 125
modular function, 102
marriage condition, 64
moduli, 20
marriage theorem, 64
moduli space, 164
Martin Axiom, 126
Moebius transformation, 120
martingale, 53
moment generating function, 52, 125
martingale transform, 53
Moment methods, 52
mass critical, 123
Monge-Kantorovich, 31
mass subcritical, 123
monic polynomial, 25
Masur theorem, 110
monochromatic, 47
matching, 82
monoid, 7
Math in movies, 145
monomial, 21
Mathematical depth, 140
monster, 57
mathematical roots, 160
Montel theorem, 40
Mather set, 150
Moore's theorem, 131
Mathieu operator, 179
Mordell-Weil theorem, 66
matrix, 8, 97
Morera's theorem, 8
matrix computation, 109
Morley theorem, 164
Matrix elements, 86
Morley triangle, 78
maximal connector, 123
morphism, 10
maximal equilibrium, 14
Morse function, 17
maximum principle, 44, 146
Morse index, 17
Maxwell equations, 143
Morse inequality, 17
maze, 126
Moser trick, 17
Mazur torsion theorem, 66
Moser-Neumann problem, 121
meager set, 18
MRI, 43
meagre set, 18
multi-complex, 68
mean, 3
Multibrot set, 26
Mean value theorem, 29
multidigraph, 23
measurable, 96
multiple connections, 4
measurable space, 96
multiple recurrent, 24
measure preservation, 62
multivariate moments, 52
measure preserving, 3
memory, 153 n-body problem, 33
Menger carpet, 46 n-connected space, 74
Menger sponge, 150 n-linearization, 45
Menger's theorem, 123 Napoleon points, 127
Mergelyan theorem, 27 Napoleon triangle, 127
meromorphic function, 30 Napoleon's theorem, 127
Mersenne, 133 Nash embedding theorem, 89
Mersenne primes, 166 Nash equilibrium, 15
method of characteristics, 33 Nash equilibrium, 112
metric space, 12 Nash-Kuiper, 150
metrizable, 40 Nash-Moser inverse function theorem, 89

natural logarithm, 13, 22 Optimal transport problem, 31

natural numbers, 1 optimization problem, 50
natural transformation, 10 orbit, 29
Negative curvature manifolds, 84 order of a perfect difference set, 108
nerve, 92 orthogonal complement, 8
Neuberger's theorem, 133 orthogonal group, 9
new foundations, 9 Orthogonal projection, 31
Newton, 2 orthonormal eigenbasis, 7
Newton method, 11 orthic triangle, 109
Newton potential, 75, 93 Ostrowski, 2
Nine chapters on the Mathematical art, 163 Ostrowski theorem, 146
Noether theorem, 151 Outer billiards, 121
non-Archimedean, 118
p-adic field, 38
non-commutative geometry, 17, 151
p-adic integers, 9, 118
non-commutative measure theory, 106
p-adic norm, 118
non-degenerate, 11
p-adic numbers, 118, 127
non-degenerate 2-form, 17
p-adic valuation, 127, 146
non-degenerate critical point, 17
Page rank, 136
non-degenerate critical points, 94
page rank, 73, 136
non-Euclidean geometry, 36
Palais-Smale condition, 55
non-negative matrix, 72
Pappus centroid theorem, 91
non-singular curve, 58
Pappus hexagon theorem, 70
Noncommutative determinant, 90
Pappus theorem, 164
nonlinear partial differential equation, 123
Paradigm, 149
nonlinear Schroedinger equation, 123
Parallelogram law, 94
Nonsqueezing theorem, 143
parallelogram law, 102
normal extension, 25
parametrization, 50
normal family, 40
pariah, 57
normal group, 57
Paris-Harrington theorem, 116
normal operator, 6
Parry-Sullivan Theorem, 132
normal subgroup, 56
Parseval's identity, 72
Normal topological space, 124
partial derivatives, 49
notation, 153
partial differences, 52
NP complete, 82
partial differential equation, 14
NP decision problem, 87
partially hyperbolic attractor, 106
nuclear magnetic resonance, 43
partition, 21
Nullstellensatz, 6
partition function, 143
Nullstellenstatz, 6
Pascal configurations, 70
number eld, 115
Pascal theorem, 70
Number theory, 15
patterns, 151
Numerical analysis, 27
payoff function, 14
numerical methods, 151
PCP theorem, 87
Nyquist-Shannon sampling theorem, 85
Peano, 9
OCR, 50 Peano axioms, 9, 116
octahedron, 18 pedagogy, 153
octic conjecture, 35 Pedal triangle map, 182
octonions, 34 Pell's equation, 37
Olympiad problems, 145 Penrose polygon, 121
open domain, 8 Pentagonal number theorem, 29
Open problems, 145 Perelman theorem, 143
open set, 25 perfect difference set, 108
optimal sphere packing, 42 Perfect Euler brick, 94
optimal stopping time, 53 perfect field, 80
optimal transport, 31 perfect graphs, 107


perfect numbers, 144 positive matrix, 72

periodic geodesic, 134 Potts model, 77
periodic points, 48 power set, 3
permanent, 109 presheaf, 11
perpendicular, 2, 8 Preissmann's theorem, 84
Perron Frobenius, 101 primality test, 71
Perron-Frobenius theorem, 72 prime, 6, 22
Parseval equality, 2 prime counting function, 22, 69
Pesin formula, 45 prime factorization, 1

Peter-Weyl theorem, 86 prime factors, 1

Pfaffian, 37 prime manifolds, 62

Phase transition, 146 Prime number theorem, 22

phase transition, 77 prime power conjecture, 108

Picard, 9 prime twin, 111

Pick theorem, 74 Prime twin conjecture, 153

pigeonhole principle, 7 primes, 1

Pizza theorem, 132 primitive root of unity, 38

place, 118 principal curvature, 37

planar, 63 principal divisor, 30

planar graph, 66, 126 principal ideal domain, 43

plasma, 123 principal logarithm, 14

Plateau problem, 98 prizes, 145

probability measure, 4
Platonic polytope, 18
probability space, 3, 4, 19, 125
player, 14
probability vector, 109
Poincaré-Bendixson, 27
product topology, 5
Poincaré conjecture, 62, 148
profinite group, 9
Poincaré disk, 36
projection function, 5
Poincaré homology sphere, 134
projective completion, 21
Poincaré recurrence, 24
projective geometry, 70
Poincaré's last theorem, 62
projective metric, 101
Poincaré-Hopf, 85
projective space, 21
Poincaré-Siegel theorem, 36
Projective spaces, 95
Poincare duality, 47
provably consistent, 9
Poisson commute, 19
Pruefer group, 118
Poisson equation, 32
pseudo orbit, 28
Poisson summation, 28
Ptolemy's theorem, 129
Polish space, 18, 31, 146
Puiseux formula, 53
polychora, 177
pure point measure, 15
polydisc, 103
Putnam problems, 145
polydisk, 29
Pythagoras theorem, 2
polygon, 129
Pythagorean triples, 94, 99
polyhedral formula, 63
polylogarithm, 12 quadratic family, 26
polynomial, 2, 27, 130 quadratic form, 42, 119
Polynomial averages, 83 quadratic non-residue, 26
Polynomial ergodic theorem, 83 quadratic reciprocity, 26, 143
polytopes, 177 quadratic residue, 26
Pontryagin dual, 118, 127 quadrilateral theorem, 94
Pontryagin duality, 10 quadrisecant, 78
Popoviciu theorem, 129 quadrivium, 160
Positive curvature manifolds, 95 quality, 145
positive definite, 133 quantisation, 151
positive definite tensor, 16 quantization, 119
positive definition, 42 Quantum mechanics, 108


quarter pinched, 56 Riemannian manifolds, 84

quartic equation, 24 right angle triangle, 2
quasi-linear Cauchy problem, 14 rigid body, 19
Quasi-rational polygons, 121 rigid motion, 14
Quasiperiodic, 40 Ring of integers, 115
quaternion, 90, 119 Rising sun lemma, 79
Quiver, 132 Rising sun property, 78
quiver, 23, 104 Robbins numbers, 81
quotient group, 21 Roessler attractor, 27
rogue waves, 123
r-Diophantine, 93
Rokhlin lemma, 40
radical, 6, 145
Rokhlin tower, 48
Radon theorem, 39, 81
root, 11
Radon transform, 43
root of unity, 38
Radon-Nikodym derivative, 96
rotation, 41
Radon-Nikodym theorem, 15, 96
row reduction, 90
Ramanujan constant, 99
RSA, 181
Ramanujan Cubic Identity, 135
RSA crypto system, 6
Ramanujan primes, 69
Rubik cube, 58, 169
Ramsey theory, 47, 116
Rubik cuboid, 58
random matrix, 44
Rudin sphere, 150
random variable, 3
rule of product, 12
Rank of a ring of integers, 115
ruler and compass, 83
rank-nullity, 8
Runge Kutta, 151
Rasiowa Sikorski theorem, 126
Runge theorem, 27
real analytic, 29
Rychlik's theorem, 100
recursive function, 10
Reeb sphere theorem, 94 saddle point, 55
reflection, 41 Sakai theorem, 105
reflection ambiguity, 32 Sandwich theorem, 122
regular Hausdorff, 25 Sard theorem, 58
regular n-gon, 38 Scalar curvature, 64
regular polygon, 18 Schauder fixed point theorem, 26
relative density, 16 scheme, 50
remainder, 6 Schoenflies theorem, 20
representation, 9 Schröder equation, 36
residual, 76 Schroeder equation, 36
residual set, 18 Schroeder-Bernstein, 4
residue, 8 Schroedinger equation, 32
Residue calculus, 8 Schroedinger operator, 32, 36
resonance condition, 45 Schur complement, 102
Revolution, 149 Schur product, 133
Rhind papyrus, 89 Schwartz space, 85
Ricci curvature, 70 Schwarz lemma, 146
Ricci flow, 56 scissors congruent, 129
Ricci tensor, 64 Second Baire category, 18
Riemann curvature tensor, 37 second countable, 25
Riemann hypothesis, 145 second fundamental form, 53
Riemann Lebesgue theorem, 28 Second Theorem, 129
Riemann mapping theorem, 60 secret notebook, 63
Riemann surface, 30, 60 sectional curvature, 56
Riemann zeta function, 22 sedenions, 34
Riemann-Roch theorem, 31 Self adjoint, 43
Riemannian geometry, 36 self-adjoint operator, 6
Riemannian manifold, 16 self-loops, 4


semi simple representation, 9 spectral measure, 44

set category, 10 Spectral triple, 17
sextic conjecture, 35 spectrum, 15
shadowing, 143 sphere, 5, 36
shadowing property, 28 Sphere packing, 146
Shannon capacity, 121, 122 sphere packing, 42
Shannon entropy, 54, 109, 146 sphere theorem, 56
Shannon sampling theorem, 85 spherical geometry, 36
Shannon zero error capacity, 121 spin gas model, 77
Shapley-Folkman theorem, 114 spin structure, 119
Sherrington-Kirkpatrick, 77 Spinors, 119
Shishikura theorem, 130 splines, 27
Sidon sets, 108 split algebra, 34
Siegel linearization theorem, 36 sparsifiers, 143
Siegel theorem, 28 square matrix, 42
Sieve bound for prime twins, 111 stability, 36
sign changes, 42 standard deviation, 3, 175
signal matrix, 112 standard Dirichlet series, 12
signal to noise ratio, 122 Standard map, 36
signature, 90 Standard model, 151
similarity dimension, 46 standard symplectic form, 17
Simon's problems, 145 Stark-Heegner theorem, 99
simple, 20 Steinitz theorem, 63
Simple group, 127 Sternberg linearization, 45
simple group, 56, 57 Stirling formula, 143
simplex algorithm, 44 stochastic difference equation, 61
simplicial category, 117 stochastic matrix, 109
simplicial complex, 11 stochastic process, 52
simplicial set, 117 Stokes, 2
simplicial spheres, 68 Stokes theorem, 51
simply connected, 62 Stone-Weierstrass theorem, 27
sinc function, 13 stopping time, 53
singular continuous, 96 straightedge and compass, 83
singular continuous measure, 15 strange attractor, 27, 106
singular value decomposition, 8 strategy profile, 14
singularity theory, 76 strong divisibility sequence, 100
Skew fields, 131 strong implicit function theorem, 36
Smale horse shoe, 28, 150 Strong law of small numbers, 46
Smale's problems, 145 strong Morse inequality, 17
small divisors, 36 strong perfect graph conjecture, 107
Smith normal form, 43 strong perfect graph theorem, 107
smooth, 18 Strong product, 121
smooth function, 11 structure from motion, 31
Smooth Poincare conjecture, 62 Study determinant, 90
smooth vector field, 16 Sturm chain, 42
Soap bubbles, 98 Sturm-Liouville theory, 42
Sobolev embedding, 49 Subbase theorem, 134
Sobolev norm, 110 subgraph, 5
soliton, 123 subgroup, 21
space average, 3 subharmonic, 103
space form, 84 Subshift of finite type, 132
space group, 41 submanifold, 51
spanning tree, 126 subobject classifier, 23
spectral integrability, 40 Super space, 95


superposition theorem, 34 Time-1-map, 45

supremum norm, 40 tissue density, 43
surface area, 91 Toda system, 19
surface area of sphere, 30 Todd class, 38
surjective, 3 Toeplitz matrix, 36
surreal numbers, 162 tomography, 151
Suspension, 134 topological base, 25
symbol, 38 Topological dimension, 130
symmetry group, 58 topological dynamical system, 24
symmetric tensor, 16 topological group, 10
Symmetrized Sobolev norm, 110 Topological index, 38
Symmetry, 148 topological spaces, 25
symplectic, 9 toral dynamical system, 48
symplectic capacity, 69 toric variety, 68
symplectic embedding, 69 torsion free, 16
symplectic form, 17 Torus inequality, 135
symplectic manifold, 17, 19, 127 total curvature, 78
symplectic matrix, 17 Tower of Hanoi problem, 100
symplectomorphism, 69 trace Cayley Hamilton, 109
synchronization, 151 trace powers, 109
Systole, 135 transcendental extension, 25
Szemerédi theorem, 16, 24 transcendental number, 120
Szpilrajn-Marczewski theorem, 105 transcendental numbers, 23
transformation, 25
tail σ-algebra, 128
translation, 41
Takens's embedding theorem, 106
transport equation, 29
tally stick, 162
Traveling salesperson problem, 87
Tate's theorem, 127
tree, 63
Taxonomy, 150
trigonometric functions, 13
Taylor compatibility, 125
trigonometry, 13
Taylor series, 12, 29
trisectrix, 83
Taylor's theorem, 125
Triskaidecagon, 83
tempered distributions, 49
trivium, 160
terminal object, 23
Tubes, 91
tessarines, 34
Turan graph theorem, 104
tesseract, 18
Turan graphs, 104
test functions, 49
Turing computable, 10
tetrahedron, 18
Turing machine, 10
Thales theorem, 77
Tverberg partition, 81
Theorem of Hausdorff-Hildebrandt-Schoenberg, 52
Tverberg's theorem, 81
Theorem of Helly, 39
Twin prime, 166
Theorem of Lagrange, 32
Twin prime conjecture, 69
Theorem of Lebowski, 50
twist homeomorphism, 62
theorem of the three geodesics, 134
twist map, 62
Theorema egregium, 53
Twist map theorem, 36
Tietze extension theorem, 124
Two Chords Theorem, 129
three body problem, 62
Tychonov theorem, 5, 113
Three circle theorem, 146
type I factor, 105
three circle theorem, 146
type II factor, 105
Thue-Siegel-Roth Theorem, 27
type III factor, 105
Thurston geometry, 62
tight binding approximation, 151 Ulam spiral, 172
Tikhonov theorem, 134 Ullman's theorem, 31
time average, 3 ultimate question, 50
time series, 106 ultra metric, 118


uncountable, 4 Wallace-Bolyai-Gerwien, 129

uniform structure, 79 Wallach manifolds, 95
uniformization theorem, 60 Waring problem, 32
uniformly bounded, 40 Weak convergence, 40
Unimodular map, 40 weak Morse inequalities, 17
Unique prime factorization, 1 Weak solution, 33
unit ball, 14 weak solutions, 18
unit sphere, 5, 62 weak* topology, 65
unitary group, 9 weakly closed, 105
unitary operator, 6 weakly mixing, 40
unitary representation, 10 Wedderburn little theorem, 34
universal, 41, 42 Wedderburn's theorem, 131
Universal property, 119 Weierstrass approximation theorem, 27
Universality, 40 Weierstrass equation, 58
universality, 151 Weierstrass factorization, 131
unknot, 33, 78 well ordered, 113
unknotted, 78 Well ordering theorem, 113
upper bound conjecture, 68 Weyl conjectures, 148
upper bound theorem, 146 Whisper galleries, 60
Upper density, 24 Whitney embedding problem, 89
Urysohn metrization, 25 Whitney extension property, 125
Whitney extension theorem, 125
valuation, 5, 14 Whitney topology, 76
valuation matrix, 112 Wiener 1/f theorem, 112
valuation of a field, 146 Wiener algebra, 112
value, 136 Wiener problem, 61
value function, 151 Wiener theorem, 28
Van der Monde determinant, 84 Wigner semi circle law, 44
van der Waerden conjecture, 109 Wilson primes, 71
Van der Waerden's theorem, 47 Wilson theorem, 22, 71, 143
van Hove limit, 77 winding number, 8
Variance, 2 wireless communication, 122
variance, 20, 175 Witten deformation, 17
variational problem, 18 Wolf bone, 162
Varignon theorem, 143 Wolfram numbering, 22
Vector field, 11 Wonderland theorem, 40
vector field, 14
vector space, 9 Yoneda lemma, 10

vertex Cover, 82
Zagier inequality, 92
vertex cover problem, 82
Zeckendorf multiplication, 104
vertex degree, 4
Zeckendorf representation, 104
vertices, 4
Zermelo-Fraenkel, 5
Vickrey-Clarke-Groves auction, 112
Zermelo Fraenkel, 73
Vietoris theorem, 124
zero dimensional complexes, 12
viral effects, 136
Zeta function, 124, 150
Vitali theorem, 71
Zeta-3, 124
Vlasov dynamics, 33
ZFC, 126
Vlasov system, 33
ZFC axiom system, 5
volume, 15, 30, 91
Zhang's theorem, 111
volume of ball, 30
Zorn lemma, 5
von Neumann algebra, 105
Zorn's lemma, 113
von Staudt theorem, 126

Wahrscheinlichkeit, 54
wall paper group, 41


Bibliography
References

[1] A005130. The on-line encyclopedia of integer sequences. https://oeis.org.

[2] P. Abad and J. Abad. The hundred greatest theorems.
http://pirate.shu.edu/~kahlnath/Top100.html, 1999.
[3] R. Abraham, J.E. Marsden, and T. Ratiu. Manifolds, Tensor Analysis and Applications. Applied Mathe-
matical Sciences, 75. Springer Verlag, New York etc., second edition, 1988.
[4] A. Aczel. Descartes's secret notebook, a true tale of Mathematics, Mysticism and the Quest to Understand
the Universe. Broadway Books, 2005.
[5] C.C. Adams. The Knot Book. Freeman and Company, 1994.
[6] D. Adams. The Hitchhiker's guide to the galaxy. Pan Books, 1979.
[7] L. Addario-Berry and B.A. Reed. Ballot theorems, old and new. In E. Gyori, G.O.H. Katona, L. Lovasz,
and G. Sagi, editors, Horizons of Combinatorics, volume 17 of Bolyai Society Mathematical
Studies. Springer, Berlin, Heidelberg, 2008.
[8] C.C. Aggarwal and H. Wang. Managing and Mining Graph Data, volume 40 of Advances in database
systems. Springer, 2010.
[9] R. Aharoni. Mathematics, Poetry and Beauty. World Scientific, 2015.
[10] L. Ahlfors. Complex Analysis. McGraw-Hill Education, 1979.
[11] M. Aigner. Turán's graph theorem. American Mathematical Monthly, 102:808-816, 1995.
[12] M. Aigner and G.M. Ziegler. Proofs from the book. Springer Verlag, Berlin, 2 edition, 2010. Chapter 29.
[13] A. Alexander. Duel at Dawn. Harvard University Press, 2010.
[14] P.S. Alexandrov. Combinatorial topology. Dover books on Mathematics. Dover Publications, Inc, 1956.
Three volumes bound as one.
[15] J.M. Almira and A. Romero. Yet another application of the Gauss-Bonnet Theorem for the sphere. Bull.
Belg. Math. Soc., 14:341-342, 2007.
[16] S. Alpern and V.S. Prasad. Combinatorial proofs of the Conley-Zehnder-Franks theorem on a fixed point
for torus homeomorphisms. Advances in Mathematics, 99:238-247, 1993.
[17] C. Alsina and R.B. Nelsen. Charming Proofs. A journey into Elegant Mathematics, volume 42 of Dolciani
Mathematical Expositions. MAA, 2010.
[18] A. Ambrosetti and P. Rabinowitz. Dual variational methods in critical point theory and applications.
Journal of Functional Analysis, 14:349-381, 1973.
[19] J.W. Anderson. Hyperbolic Geometry. Springer, 2 edition, 2005.
[20] T. Andreescu, O. Mushkarov, and L. Stoyanov. Geometric problems on maxima and minima. Birkhäuser,
2006.
[21] G.E. Andrews. The theory of partitions. Cambridge Mathematical Library. Cambridge University Press,
1976.
[22] Archimedes. On Spirals, pages 151-188. Cambridge Library Collection - Mathematics. Cambridge
University Press, 2009.
[23] K. Arnold and J. Peyton, editors. A C User's Guide to ANSI C. Prentice-Hall, Inc., 1992.
[24] V.I. Arnold. Mathematical Methods of Classical Mechanics. Springer Verlag, New York, 2 edition, 1980.
[25] V.I. Arnold. Lectures on Partial Differential Equations. Springer Verlag, 2004.
[26] V.I. Arnold. Experimental Mathematics. AMS, 2015. Translated by Dmitry Fuchs and Mark Saul.
[27] E. Artin. Geometric Algebra. Interscience, 1957.
[28] M. Ascher and R. Ascher. Mathematics of the Incas: Code of the Quipu. Dover Publications, 1981.
[29] E. Asplund and L. Bungart. A first course in integration. Holt, Rinehart and Winston, 1966.
[30] M. Atiyah. K-Theory. W.A. Benjamin, Inc, 1967.
[31] M. Atiyah. Mathematics in the 20th century. American Mathematical Monthly, 108:654-666, 2001.
[32] J-P. Aubin and I. Ekeland. Applied nonlinear Analysis. John Wiley and Sons, 1984.
[33] T. Aubin. Nonlinear Analysis on Manifolds. Monge-Ampère equations. Springer, 1982.
[34] B. Huppert and W. Willems. Lineare Algebra. Vieweg+Teubner, 2 edition, 2010.
[35] B. Krause, M. Mirek, and T. Tao. Pointwise ergodic theorems for non-conventional bilinear polynomial
averages. https://arxiv.org/abs/2008.00857, 2020.


[36] J. Bach. The Lebowski theorem. https://twitter.com/plinz/status/985249543582355458, Apr 14, 2018.

[37] J.C. Baez. The octonions. Bull. Amer. Math. Soc. (N.S.), 39(2):145-205, 2002.
[38] R. Balakrishnan and K. Ranganathan. A textbook of Graph Theory. Springer, 2012.
[39] W.W. Rouse Ball. A short account of the History of mathematics. Macmillan and Co, London and New
York, 1888. Reprinted by Dover Publications, 1960.
[40] W. Ballmann. Lectures on Kähler Manifolds. ESI Lectures in Mathematics and Physics. European Math-
ematical Society, 2006.
[41] T. Banchoff. Critical points and curvature for embedded polyhedra. J. Differential Geometry, 1:245-256,
1967.
[42] T. Banchoff. Critical points and curvature for embedded polyhedral surfaces. Amer. Math. Monthly,
77:475-485, 1970.
[43] T. Banchoff. Beyond the Third Dimension, Geometry, Computer Graphics and Higher Dimensions. Scientific
American Library, 1990.
[44] A. Banyaga. Sur la structure du groupe des difféomorphismes qui préservent une forme symplectique.
Comm. Math. Helv., pages 174-227, 1978.
[45] A. Banyaga. The structure of classical diffeomorphism groups, volume 400 of Mathematics and its
applications. Kluwer Academic Publishers Group, 1997.
[46] A-L. Barabasi. Linked, The New Science of Networks. Perseus Books Group, 2002.
[47] G. Baumslag. Topics in combinatorial group theory. Birkhäuser Verlag, 1993.
[48] G. Baumslag. Musings on Magnus. In W. Abikoff, J.S. Birman, and K. Kuiken, editors, The Mathematical
Legacy of Wilhelm Magnus, Groups, Geometry and Special Functions. AMS, 1994.
[49] A. Beardon. Iteration of Rational Functions. Graduate Texts in Mathematics. Springer-Verlag, New York,
1990.
[50] E. Behrends. Fünf Minuten Mathematik. Vieweg + Teubner, 2006.
[51] E.T. Bell. The Development of Mathematics. McGraw Hill Book Company, 1945.
[52] E.T. Bell. Men of Mathematics. Penguin books, 1953 (originally 1937).
[53] B. Engquist and W. Schmid, editors. Mathematics Unlimited - 2001 and Beyond. Springer, 2001.
[54] S.K. Berberian. Fundamentals of Real analysis. Springer, 1998.
[55] C. Berge. Färbung von Graphen, deren sämtliche bzw. deren ungerade Kreise starr sind. Wiss. Z. Martin-Luther-Univ.
Halle-Wittenberg Math.-Natur. Reihe, 10:114, 1961.
[56] M. Berger. Riemannian Geometry During the Second Half of the Twentieth Century. AMS, 2002.
[57] M. Berger. A Panoramic View of Riemannian Geometry. Springer, 2003.
[58] M. Berger and B. Gostiaux. Differential geometry: manifolds, curves, and surfaces, volume 115 of Graduate
Texts in Mathematics. Springer-Verlag, New York, 1988.
[59] W. Berlinghoff and F. Gouvea. Math through the ages. Mathematical Association of America Textbooks,
2004.
[60] J. Bertrand. Solution d'un probleme. Comptes Rendus de l'Academie des Sciences, 105:369, 1887.
[61] A. Besikowitsch. Über analytische Funktionen mit vorgeschriebenen Werten ihrer Ableitung. Mathematische
Zeitschrift, 21:111-118, 1924.
[62] L. Bieberbach. Über die Koeffizienten derjenigen Potenzreihen, welche eine schlichte Abbildung des
Einheitskreises vermitteln. Preussische Akademie der Wissenschaften Berlin: Sitzungsberichte der Preußischen
Akademie der Wissenschaften zu Berlin. Reimer in Komm., 1916.
[63] L. Bieberbach. Zur Lehre von den kubischen Konstruktionen. Journal für die reine und angewandte
Mathematik, pages 142-146, 1932.
[64] H-G. Bigalke. Heinrich Heesch, Kristallgeometrie, Parkettierungen, Vierfarbenforschung. Birkhäuser,
1988.
[65] N.L. Biggs. The roots of combinatorics. Historia Mathematica, 6:109-136, 1979.
[66] B. Birch. Heegner Points: The Beginnings, volume 49, pages 1-10. MSRI Publications, 2004.
[67] G. Birkhoff. Extensions of Jentzsch's theorem. Trans. Amer. Math. Soc., 85:219-227, 1957.
[68] G.D. Birkhoff. An extension of Poincaré's last theorem. Acta Math., 47:297-311, 1925.
[69] R.L. Bishop and S.I. Goldberg. Some implications of the generalized Gauss-Bonnet theorem. Transactions
of the AMS, 112:508-535, 1964.
[70] B. Blackadar. Operator Algebras: Theory of C*-Algebras and Von Neumann Algebras. Encyclopaedia of
Mathematical Sciences. Springer, 1 edition, 2005.


[71] W. Blaschke. Vorlesungen über Integralgeometrie. Chelsea Publishing Company, New York, 1949.
[72] W. Blaschke, W. Rothe, and R. Weitzenböck. Aufgabe 552. Arch. Math. Phys., 27, 1917.
[73] R. P. Boas. Entire functions. Pure and Applied Mathematics. Academic Press Inc, 1954.
[74] A. Bogomolny. Wallace-Bolyai-Gerwien theorem. https://www.cut-the-knot.org/do_you_know/Bolyai.shtml,
accessed March 14, 2022.
[75] B. Bollobás. The art of mathematics, coffee time in Memphis. Cambridge University Press, 2006.
[76] J. Bondy and U. Murty. Graph theory, volume 244 of Graduate Texts in Mathematics. Springer, New
York, 2008.
[77] T. Bonnesen and W. Fenchel. Theorie der konvexen Körper. Springer Verlag, berichtigter reprint edition,
1974.
[78] W.W. Boone and G. Higman. An algebraic characterization of the solvability of the word problem. J.
Austral. Math. Soc., 18, 1974.
[79] K.C. Border. Fixed point theorems with applications to economics and game theory. Cambridge University
Press, 1985.
[80] K. Borsuk. Drei Sätze über die n-dimensionale euklidische Sphäre. Fund. Math., pages 177-190, 1933.
[81] N. Boston. The proof of Fermat's last theorem. Lecture notes, Spring 2003, 2003.
[82] N. Bourbaki. Elements d'histoire des Mathematiques. Springer, 1984.
[83] N. Bourbaki. Elements de Mathematique. Springer, 2006.
[84] J. Bourgain. On the maximal ergodic theorem for certain subsets of the integers. Israel J. Math, 61:39-72,
1988.
[85] J. Bourgain. Green's function estimates for lattice Schrödinger operators and applications, volume 158 of
Annals of Mathematics Studies. Princeton Univ. Press, Princeton, NJ, 2005.
[86] P. Bourke. Visualising volumetric fractals. GSTF Journal on Computing, 5, 2017.
[87] A. Boutot. Catastrophe theory and its critics. Synthese, 96:167-200, 1993.
[88] R. Bowen and J. Franks. Homology for zero-dimensional nonwandering sets. Annals of Mathematics
(Second Series), 106(1):73-92, 1977.
[89] C. Boyer. A History of Mathematics. John Wiley and Sons, Inc, 2nd edition, 1991.
[90] F. Brechenmacher. Histoire du theoreme de Jordan de la decomposition matricielle (1870-1930). Ecole des
Hautes Etudes en Sciences Sociales, 2005. PhD thesis, EHESS.
[91] G.E. Bredon. Topology and Geometry, volume 139 of Graduate Texts in Mathematics. Springer Verlag,
1993.
[92] S. Brendle. Ricci Flow and the Sphere theorem. Graduate Studies in Mathematics. AMS, 2010.
[93] J.L. Brenner. Applications of the Dieudonné Determinant. Linear algebra and its applications, 1:511-536,
1968.
[94] D. Bressoud. Historical reflections on the fundamental theorem of calculus. MAA Talk of May 11, 2011.
[95] D. M. Bressoud. Factorization and Primality Testing. Springer Verlag, 1989.
[96] D.M. Bressoud. Proofs and Confirmations, The story of the Alternating Sign Matrix Conjecture. MAA and
Cambridge University Press, 1999.
[97] O. Bretscher. Calculus I. Lecture Notes Harvard, 2006.
[98] H. Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext.
Springer, 2011.
[99] M. Brown and W.D. Neumann. Proof of the Poincaré-Birkhoff fixed point theorem. Michigan Mathematical
Journal, 24:21-31, 1977.
[100] R.A. Brualdi. Introductory Combinatorics. Pearson Prentice Hall, fourth edition, 2004.
[101] G. Van Brummelen. Heavenly Mathematics. Princeton University Press, 2013.
[102] C. Bruter. Mathematics and Modern Art, volume 18 of Springer Proceedings in mathematics. Springer,
2012.
[103] Z. Buczolich and R.D. Mauldin. Divergent square averages. Annals of Mathematics, 171, 2010.
[104] P. Bullen. Dictionary of Inequalities. CRC Press, 2 edition, 2015.
[105] L.A. Bunimovich. On the ergodic properties of nowhere dispersing billiards. Communications in
Mathematical Physics, 65:295-312, 1979.
[106] E.B. Burger. Exploring the Number Jungle. AMS, 2000.
[107] E.B. Burger and R. Tubbs. Making Transcendence Transparent. Springer, 2004.
[108] W. Burnside. Theory of Groups of Finite Order. Cambridge at the University Press, 1897.


[109] W. Byers. How Mathematicians Think. Princeton University Press, 2007.

[110] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error-correcting coding and decoding:
Turbo codes. IEEE proceedings, 1993.
[111] C. Boldrighini, M. Keane, and F. Marchetti. Billiards in polygons. The Annals of Probability, 6:532-540, 1978.
[112] C. Cavagnaro and W.T. Haight II. Classical and Theoretical mathematics. CRC Press, 2001.
[113] G. Cantor. Ueber unendliche, lineare Punktmannigfaltigkeiten, Mengenlehre aus den Jahren 1872-1884.
Teubner Archiv zur Mathematik. Springer, 1884.
[114] M. Capobianco and J.C. Molluzzo. Examples and Counterexamples in Graph Theory. North-Holland, New
York, 1978.
[115] O. Caramello. Theories, Sites, Toposes. Oxford University Press, 2018.
[116] J. Caristi. Fixed point theorems for mappings satisfying the inwardness conditions. Transactions of the
AMS, 215, 1976.
[117] L. Carleson and T.W. Gamelin. Complex Dynamics. Springer-Verlag, New York, 1993.
[118] L. Carroll. Alice's Adventures in Wonderland. D. Appleton and Co, 1866.
[119] E. Cartan. The theory of Spinors. Dover Publications, New York, 1981. In French: 1937, in English 1966,
Dover 1981.
[120] A-L. Cauchy. Cours d'Analyse. L'Imprimerie Royale, 1821.
[121] C. Clapham and J. Nicholson. Oxford Concise Dictionary of Mathematics. Oxford Paperback Reference.
Oxford University Press, 1990.
[122] C.D. Olds, A. Lax, and G. Davidoff. The Geometry of Numbers, volume 41 of Anneli Lax New Mathematical
Library. AMS, 2000.
[123] P. E. Ceruzzi. A History of Modern Computing. MIT Press, second edition, 2003.
[124] C.G. Lekkerkerker. Voorstelling van natuurlijke getallen door een som van getallen van Fibonacci.
Stichting Mathematisch Centrum. Zuivere Wiskunde., January 1951.
[125] M. Chamberland. Single Digits. Princeton Univ. Press, 2015.
[126] V. Chandrasekar. The congruent number problem. Resonance, August, 1998.
[127] K. Chandrasekharan. Introduction to Analytic Number Theory, volume 148 of Grundlehren der mathema-
tischen Wissenschaften. Springer, 1968.
[128] G. Chartrand and P. Zhang. Chromatic Graph Theory. CRC Press, 2009.
[129] W.Y.C. Chen and R. P. Stanley. The g-conjecture for spheres.
http://www.billchen.org/unpublished/g-conjecture/g-conjecture-english.pdf, 2008.
[130] S-S. Chern. The geometry of G-structures. Bull. Amer. Math. Soc., 72:167-219, 1966.
[131] N. Chernov and R. Markarian. Chaotic billiards. AMS, 2006.
[132] C. Chevalley. Theory of Lie Groups. Princeton University Press, 1946.
[133] C. Chevalley. The Algebraic Theory of Spinors and Clifford Algebras. Springer, 1995.
[134] J.R. Choksi and M.G. Nadkarni. Baire category in spaces of measures, unitary operators and
transformations. In Invariant Subspaces and Allied Topics, pages 147-163. Narosa Publ. Co., New Delhi, 1990.
[135] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas. The strong perfect graph theorem. Annals
of Mathematics, 164:51-229, 2006.
[136] P.G. Ciarlet. On Korn's inequality. Chin. Ann. Math, 31B:607-618, 2010.
[137] I. Ciufolini and J.A. Wheeler. Gravitation and inertia. Princeton Series in Physics, 1995.
[138] R. Cochrane. The secret Life of equations: the 50 greatest equations and how they work. Octopus Company,
2016.
[139] R. Cochrane. Math Hacks. Cassell Illustrated, 2018.
[140] E.A. Coddington and N. Levinson. Theory of Ordinary Differential Equations. McGraw-Hill, New York,
1955.
[141] P.J. Cohen. The independence of the continuum hypothesis. Proc. Nat. Acad. Sci. USA, 6:1143-1148,
1963.
[142] P.J. Cohen. Set theory and the continuum hypothesis. W.A. Benjamin, inc, 1966.
[143] P.R. Cromwell. Polyhedra. Cambridge University Press, 1997.
[144] P.E. Conner. On the action of the circle group. Mich. Math. J., 4:241-247, 1957.
[145] A. Connes. Noncommutative geometry. Academic Press, 1994.
[146] K. Conrad. The character group of Q. Math 248A course notes, Stanford, 2008.


[147] K. Conrad. The origin of representation theory. https://kconrad.math.uconn.edu/articles/groupdet.pdf,
2010.
[148] K. Conrad. The congruent number problem. https://kconrad.math.uconn.edu/blurbs/ugradnumthy/congnumber.pdf,
2018.
[149] J. Conway. Universal quadratic forms and the fifteen theorem. Contemporary Mathematics, 272:23-26,
1999.
[150] J.B. Conway. Functions of One Complex Variable. Springer Verlag, 2. edition, 1978.
[151] J.B. Conway. A course in functional analysis. Springer Verlag, 1990.
[152] J.B. Conway. Mathematical Connections: A Capstone Course. American Mathematical Society, 2010.
[153] J.H. Conway. On Numbers and Games. A K Peters, Ltd, 2001.
[154] J.H. Conway and R.K. Guy. The book of numbers. Copernicus, 1996.
[155] J.H. Conway and N.J.A. Sloane. What are all the best sphere packings in low dimensions? Discr. Comp.
Geom., 13:383-403, 1995.
[156] J.H. Conway and N.J.A. Sloane. Sphere packings, Lattices and Groups, volume 290 of A series of
Comprehensive Studies in Mathematics. Springer Verlag, New York, 2nd edition, 1993.
[157] N. Copernicus. De revolutionibus orbium coelestium. Norimbergae, Apud J. Petreium, 1543.
[158] I.P. Cornfeld, S.V. Fomin, and Ya.G. Sinai. Ergodic Theory, volume 115 of Grundlehren der mathematischen
Wissenschaften in Einzeldarstellungen. Springer Verlag, 1982.
[159] R. Courant. The existence of minimal surfaces of given topological structure under prescribed boundary
conditions. Acta Mathematica, 72:51-98, 1940.
[160] R. Courant and H. Robbins. Was ist Mathematik. Springer, fifth edition, 1941.
[161] T. Crilly. 50 mathematical ideas you really need to know. Quercus, 2007.
[162] D. Cristofaro-Gardiner and M. Hutchings. From one Reeb orbit to two. https://arxiv.org/abs/1202.4839,
2014.
[163] M. Crofton. On the theory of local probability, applied to straight lines drawn at random in a plane.
Philosophical Transactions of the Royal Society of London, 158:181-199, 1868.
[164] J.A. Cunge and W.H. Hager. Alexandre Preissmann: his scheme and his career. Journal of Hydraulic
research, 53:413-422, 2015.
[165] C.W. Misner, K.S. Thorne, and J.A. Wheeler. Gravitation. Freeman, San Francisco, 1973.
[166] H.L. Cycon, R.G. Froese, W. Kirsch, and B. Simon. Schrödinger Operators with Application to Quantum
Mechanics and Global Geometry. Springer-Verlag, 1987.
[167] D. Downing. Dictionary of Mathematical Terms. Barron's Educational Series, 1995.
[168] M. de Gosson. The symplectic camel principle and semiclassical mechanics. Journal of Physics A:
Mathematical and General, 35(32):6825, 2002.
[169] W. de Melo and S. van Strien. One dimensional dynamics, volume 25 of Series of modern surveys in
mathematics. Springer Verlag, 1993.
[170] P. A. Deift. Applications of a commutation formula. Duke Math. J., 45(2):267-310, 1978.
[171] C. Demeter. Pointwise convergence of the ergodic bilinear Hilbert transform. Illinois J. Math,
51:1123-1158, 2007.
[172] M. Denker, C. Grillenberger, and K. Sigmund. Ergodic Theory on Compact Spaces. Lecture Notes in
Mathematics 527. Springer, 1976.
[173] E. Denne. Alternating quadrisecants of knots. Thesis at University of Illinois at Urbana-Champaign, 2004.
[174] Harvard Mathematics Department. Quals collection: 1990-2002.
https://www.math.harvard.edu/media/quals90-02.pdf.
[175] P. Desnanot. Complement de la Theorie des Equations du Premier Degré. Chez Volland Jeune, a Paris,
1819. page 152.
[176] L.E. Dickson. History of the theory of numbers. Vol. I: Divisibility and primality. Chelsea, New York, 1966.
[177] L.E. Dickson. History of the theory of numbers. Vol. II: Diophantine analysis. Chelsea, New York, 1966.
[178] R. Diestel. Graph theory, volume 173 of Graduate Texts in Mathematics. Springer, 5th edition, 2016.
[179] J. Dieudonné. Les determinants sur un corps non commutatif. Bulletin de la S.M.F., 71:27-45, 1943.
[180] J. Dieudonné. A Panorama of Pure Mathematics. Academic Press, 1982.
[181] J. Dieudonné. Grundzüge der modernen Analysis. Vieweg, third edition, 1985.
[182] C. Ding, D. Pei, and A. Salomaa. Chinese Remainder Theorem, Applications in Computing, Coding,
Cryptography. World Scientific, 1996.


[183] I. Dinur. The pcp theorem by gap amplication. Journal of the ACM, 54, 2007.
[184] G.A. Dirac. Ovals with equichordal points. Journal of the London Mathematical Society, 1-27:429-437, 1952.
[185] P. Dirac. The evolution of the physicist's picture of nature. Scientific American, 208:45-53, 1963.
[186] J. Dixmier. Von Neumann algebras, volume 27 of North-Holland mathematical library. North-Holland,
Amsterdam, 1981.
[187] DK. The Math Book: Big Ideas Simply Explained. DK Pub, 2019.
[188] M.P. do Carmo. Differential Forms and Applications. Springer Verlag, 1994.
[189] C. L. Dodgson. Condensation of determinants, being a new and brief method for computing their arithmetical values. Proceedings of the Royal Society, 15:150-155, 1866-1867.
[190] A. Dold. Lectures on algebraic topology. Springer, 1980.
[191] V. Dolotin and A. Morozov. Universal Mandelbrot Set: Beginning of the Story. World Scientific, 2006.
[192] S. Donaldson and P. Kronheimer. The topology of four manifolds. Clarendon Press, 1990.
[193] J. Doob. Stochastic processes. Wiley series in probability and mathematical statistics. Wiley, New York,
1953.
[194] A. Douady and J.H. Hubbard. Étude dynamique des polynômes complexes. Publ. Math. d'Orsay, 1er partie 84-02, 2me partie 85-04, 2007.
[195] R. Douady. Application du théorème des tores invariants. Thèse de 3ème cycle, Université Paris VII, 1982.
[196] B.A. Dubrovin, A.T. Fomenko, and S.P. Novikov. Modern Geometry - Methods and Applications, Parts I, II, III. Graduate Texts in Mathematics. Springer Verlag, New York, 1985.
[197] U. Dudley. A Budget of Trisectors. Springer Science and Business Media, LLC, 1987.
[198] S. Dumas. The KAM story: A friendly introduction to the content, history and significance of the classical Kolmogorov-Arnold-Moser theory. World Scientific Publishing Company, 2014.
[199] W. Dunham. Journey through Genius, The great theorems of Mathematics. Wiley Science Editions, 1990.
[200] H-D. Ebbinghaus. Zermelo and the Heidelberg congress 1904. Historia Mathematica, 34:428-432, 2007.
[201] H-D. Ebbinghaus and V. Peckhaus. Ernst Zermelo: An approach to his life and work. Springer, 2007.
[202] J.-P. Eckmann, H. Koch, and P. Wittwer. A computer-assisted proof of universality for area-preserving maps. Memoirs of the AMS, 47:1-122, 1984.
[203] J-P. Eckmann and D. Ruelle. Ergodic theory of chaos and strange attractors. Rev. Mod. Phys., 57:617-656, 1985.
[204] B.G. Sidharth (Editor). A century of Ideas. Springer, 2008.
[205] K. Ito (Editor). Encyclopedic Dictionary of Mathematics (2 volumes). MIT Press, second edition, 1993.
[206] N.J. Higham (Editor). The Princeton Companion to Applied Mathematics. Princeton University Press,
2015.
[207] R. Brown (Editor). 30-Second Maths. Ivy Press, 2012.
[208] S.G. Krantz (Editor). Comprehensive Dictionary of Mathematics. CRC Press, 2001.
[209] S.G. Krantz (Editor). Dictionary of Algebra, Arithmetic and Trigonometry. CRC Press, 2001.
[210] T. Gowers (Editor). The Princeton Companion to Mathematics. Princeton University Press, 2008.
[211] C.H. Edwards. The historical Development of the Calculus. Springer Verlag, 1979.
[212] D.A. Edwards. The structure of superspace. Studies in Topology, pages 121-133, 1975.
[213] R.D. Edwards. Suspensions of homology spheres. https://arxiv.org/abs/math/0610573, 1970/2006.
[214] E. Gutkin and N. Simányi. Dual polygonal billiards and necklace dynamics. Commun. Math. Phys., 143:431-449, 1992.
[215] R. L. Eisenman. Classroom Notes: Spoof of the Fundamental Theorem of Calculus. Amer. Math. Monthly,
68(4):371, 1961.
[216] I. Ekeland. Sur les problèmes variationnels. C. R. Acad. Sci. Paris Sér. A-B, 275:1057-1059, 1972.
[217] I. Ekeland and R. Temam. Convex analysis and variational problems, volume 28 of Classics in Applied Mathematics. SIAM, 1999.
[218] C. Elsholtz and T. Tao. Counting the number of solutions to the Erdős-Straus equation on unit fractions. Journal of the Australian Mathematical Society, 94:50-105, 2013.
[219] R. Elwes. Math in 100 Key Breakthroughs. Quercus, 2013.
[220] E.M. Stein and R. Shakarchi. Real analysis: measure theory, integration, and Hilbert spaces, volume 3 of Princeton Lectures in Analysis. Princeton University Press, 2005.

[221] P. Erdős, C. Ko, and R. Rado. Intersection theorems for systems of finite sets. Quart. J. Math, 12:313-320, 1961.
[222] M. Erickson. Beautiful Mathematics. MAA, 2011.
[223] J.-H. Eschenburg. New examples of manifolds with strictly positive curvature. Invent. Math, 66:469-480, 1982.
[224] T. Rokicki et al. God's number is 20. http://www.cube20.org/, 2010.
[225] R.L. Eubank. A Kalman Filter Primer. Chapman and Hall, CRC, 2006.
[226] H. Eves. Great moments in mathematics (I and II). The Dolciani Mathematical Expositions. Mathematical Association of America, Washington, D.C., 1981.
[227] K.J. Falconer. Fractal Geometry, Mathematical Foundations and Applications. Wiley, second edition, 2003.
[228] B. Farb and R.K. Dennis. Noncommutative Algebra. Springer, 1993.
[229] B. Farb and J. Wolfson. Resolvent degree, Hilbert's 13th problem and geometry. https://www.math.uchicago.edu/~farb/papers/RD.pdf, 2018.
[230] O. Faugeras. Three-dimensional computer vision: a geometric viewpoint. MIT Press, Cambridge, MA, USA, second edition, 1996.
[231] H. Federer. Geometric measure theory. Die Grundlehren der mathematischen Wissenschaften, Band 153.
Springer-Verlag New York Inc., New York, 1969.
[232] W. Feller. An introduction to probability theory and its applications, volume 1. John Wiley and Sons, 2nd
edition, 1968.
[233] E.A. Fellmann. Leonhard Euler. Birkhäuser, Basel, Boston, Berlin, 2007.
[234] B. Fine and G. Rosenberger. The Freiheitssatz and its extensions. In W. Abikoff, J.S. Birman, and K. Kuiken, editors, The Mathematical Legacy of Wilhelm Magnus: Groups, Geometry and Special Functions. AMS, 1994.
[235] B. Fine and G. Rosenberger. The Fundamental Theorem of Algebra. Undergraduate Texts in Mathematics.
Springer, 1997.
[236] K. Fink. A brief History of Mathematics. Open Court Publishing Co, 1900.
[237] G. Fischer. Mathematical Models. Springer Spektrum, second edition, 2017.
[238] H. Fischer. A history of the central limit theorem. Springer Verlag, 2011.
[239] P. Fischer and W.R. Smith, editors. Chaos, Fractals, and Dynamics, volume 98 of Lecture Notes in Pure
and Applied Mathematics. 1985.
[240] S. Fisk. A very short proof of Cauchy's interlace theorem for eigenvalues of Hermitian matrices.
https://arxiv.org/abs/math/0502408, 2005.
[241] A. Fomenko. Visual Geometry and Topology. Springer-Verlag, Berlin, 1994. From the Russian by Marianna
V. Tsaplina.
[242] A.T. Fomenko. The Plateau problem, Part I,II. Gordon and Breach, Science Publishers, 1990.
[243] J-P. Francoise, G.L. Naber, and T.S. Tsun. Encyclopedia of Mathematical Physics. Elsevier, 2006.
[244] T. Frankel. Manifolds with positive curvature. Pacific J. Math., 11:165-174, 1961.
[245] P. Frankl. A new short proof for the Kruskal-Katona theorem. Discrete Mathematics, 48:327-329, 1984.
[246] T. Franzen. Gödel's Theorem. A.K. Peters, 2005.
[247] G. Frederickson. Dissections: Plane and Fancy. Cambridge University Press, 1997.
[248] N.A. Friedman. Introduction to Ergodic Theory. Van Nostrand-Reinhold, Princeton, New York, 1970.
[249] R. Fritsch and G. Fritsch. The four-color theorem. Springer-Verlag, New York, 1998. History, topological
foundations, and idea of proof, Translated from the 1994 German original by Julie Peschke.
[250] G. Frobenius. Über Matrizen aus positiven Elementen I, II. S.-B. kgl. Preuss. Akad. Berlin, pages 471-476 and 514-518, 1908 and 1909.
[251] G. Frobenius. Über Matrizen aus nicht negativen Elementen. Sitzung der physikalisch-mathematischen Classe, 23. Mai, 1912.
[252] M. Fujiwara. Über die Mittelkurve zweier geschlossenen konvexen Curven in Bezug auf einen Punkt. Tohoku Math. J., 10, 1916.
[253] H. Furstenberg. Recurrence in ergodic theory and combinatorial number theory. Princeton University Press,
Princeton, N.J., 1981. M. B. Porter Lectures.
[254] P. Gabriel. Des catégories abéliennes. Bulletin de la S.M.F., 90:323-448, 1962.
[255] P. Gabriel. Unzerlegbare Darstellungen I. Manuscripta Mathematica, 6:71-103, 1972.

[256] L. Gamwell. Mathematics + Art, a cultural history. Princeton University Press, 2016.
[257] D.J.H. Garling. Clifford Algebras: An Introduction, volume 78 of London Mathematical Society Student Texts. Cambridge University Press, 2011.
[258] T.A. Garrity. All the Mathematics you Missed. Cambridge University Press, 2002.
[259] W. Gautschi. Alexander M. Ostrowski (1893-1986): His life, work and students. Expanded version of a lecture presented at a meeting of the Ostrowski Foundation in Bellinzona, Switzerland, May 24-25, 2002, 2002.
[260] H. Geiges. An Introduction to Contact Geometry, volume 109 of Cambridge studies in advanced mathematics. Cambridge University Press, 2005.
[261] B.R. Gelbaum and J.M.H. Olmsted. Theorems and Counterexamples in Mathematics. Springer, 1990.
[262] Israel M. Gelfand, Semen G. Gindikin, Victor W. Guillemin, Alexandr A. Kirillov, Bertram Kostant, and Shlomo Sternberg. Izrail M. Gelfand: Collected Papers I-III. Springer Collected Works in Mathematics. Springer Verlag, 1988.
[263] I.M. Gessel and D. Zeilberger. Random walk in a Weyl chamber. Proc. Am. Math. Soc, 115:27-31, 1992.
[264] G.H. Hardy, J.E. Littlewood, and G. Pólya. Inequalities. Cambridge at the University Press, 1959.
[265] M. Giaquinta and S. Hildebrandt. Calculus of variations. I,II, volume 310 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, Berlin, 1996.
[266] E. Girondo and G. González-Diez. Introduction to compact Riemann surfaces and dessins d'enfants, volume 79. Cambridge University Press, 2012.
[267] P. Glaister. Intersecting chords theorem: 30 years on. Mathematics in School, 36:22, 2007.
[268] A.M. Gleason. Angle trisection, the heptagon and the triskaidecagon. American Mathematical Monthly, 95:185-194, 1988.
[269] P. Glendinning. Math In Minutes. Quercus, 2012.
[270] K. Goedel. The consistency of the axiom of choice and of the generalized continuum hypothesis with the
axioms of set theory. Princeton University Press, 1940.
[271] P.G. Goerss and J.F. Jardine. Simplicial homotopy theory, volume 174 of Progress in Mathematics.
Birkhauser Verlag, Basel, 1999.
[272] D. Goldfeld. Beyond the last theorem. The Sciences, 3/4:4340, 1996.
[273] L.J. Goldstein. Analytic Number Theory. Prentice Hall, Englewood Cliffs, N.J., 1971.
[274] C. Golé. Symplectic Twist Maps, Global Variational Techniques. World Scientific, 2001.
[275] A.O. Gelfond. Transcendental and Algebraic Numbers. Dover, New York, 1960.
[276] S.W. Golomb. Rubik's cube and quarks: Twists on the eight corner cells of Rubik's cube provide a model for many aspects of quark behavior. American Scientist, 70:257-259, 1982.
[277] A.W. Goodman and G. Goodman. Generalizations of the theorems of Pappus. The American Mathematical Monthly, 76:355-366, 1969.
[278] C. Gordon, D.L. Webb, and S. Wolpert. One cannot hear the shape of a drum. Bulletin (New Series) of the American Mathematical Society, 27:134-137, 1992.
[279] F.Q. Gouvea. p-adic Numbers. Springer, second edition, 1997.
[280] T. Gowers. Mathematics, a very short introduction. Oxford University Press, 2002.
[281] T. Gowers. What is deep mathematics? https://gowers.wordpress.com/2008/07/25/what-is-deep-mathematics/, 2008.
[282] J.V. Grabiner. Who gave you the epsilon? Cauchy and the origins of rigorous calculus. American Mathematical Monthly, 90:185-194, 1983.
[283] R. Graham. Rudiments of Ramsey Theory, volume 45 of Regional conference series in Mathematics. AMS,
1980.
[284] R. Graham. Some of my favorite problems in Ramsey theory. Integers: electronic journal of combinatorial number theory, 7, 2007.
[285] I. Grattan-Guinness. The Rainbow of Mathematics. W. W. Norton & Company, 2000.
[286] A. Gray. Tubes. Addison-Wesley Publishing Company Advanced Book Program, Redwood City, CA, 1990.
[287] B. Grechuk. Theorems of the 21st century. https://theorems.home.blog/theorems-list, 2020.
[288] B. Green and T. Tao. The primes contain arbitrarily long arithmetic progressions. Annals of Mathematics, 167:481-547, 2008.
[289] J. Green and W.P. Heller. Mathematical analysis and convexity with applications to economics, volume 1.
North-Holland, Amsterdam, 1981.

[290] P. Griffiths and J. Harris. Principles of Algebraic Geometry. Pure and Applied Mathematics. John Wiley and Sons, 1978.
[291] D. Grinberg. The trace Cayley-Hamilton theorem. online notes, July 14, 2019.
[292] H. Groemer. Existenzsätze für Lagerungen im Euklidischen Raum. Mathematische Zeitschrift, 81:260-278, 1963.
[293] B.H. Gross and D.B. Zagier. Heegner points and derivatives of L-series. Invent. Math, 84:225-320, 1986.
[294] J. Gross and J. Yellen, editors. Handbook of graph theory. Discrete Mathematics and its Applications
(Boca Raton). CRC Press, Boca Raton, FL, 2004.
[295] K. Grove and K. Searle. Positively curved manifolds with maximal symmetry-rank. J. of Pure and Applied Algebra, 91:137-142, 1994.
[296] B. Grünbaum. Is Napoleon's theorem really Napoleon's theorem? American Mathematical Monthly, 119:495-501, 2012.
[297] B. Grünbaum. Are your polyhedra the same as my polyhedra? In Discrete and computational geometry, volume 25 of Algorithms Combin., pages 461-488. Springer, Berlin, 2003.
[298] B. Grünbaum. Convex Polytopes. Springer, 2003.
[299] B. Grünbaum and G.C. Shephard. Tilings and Patterns. Dover Publications, 2013.
[300] M. Günther. Isometric embeddings of Riemannian manifolds. In Proceedings of the ICM, Vol. I, II (Kyoto, 1990), pages 1137-1143. Math. Soc. Japan, 1991.
[301] E. Gutkin. Billiards in polygons. Physica D, 19:311-333, 1986.
[302] R. Guy. The strong law of small numbers. Amer. Math. Monthly, 95:697-712, 1988.
[303] R. K. Guy. Unsolved Problems in Number Theory. Springer, Berlin, third edition, 2004.
[304] B. Gyires. On inequalities concerning the permanent of matrices. J. Comb. Inf. Syst. Sci., 2:107-113, 1977.
[305] B. Gyires. The common source of several inequalities concerning doubly stochastic matrices. Publicationes Mathematicae Institutum Mathematicum Universitatis Debreceniensis, 27:291-304, 1980.
[306] B. Gyires. Elementary proof for a van der Waerden's conjecture and related theorems. Computers and Mathematics with Applications, 31:7-21, 1996.
[307] B. Gyires. Contribution to van der Waerden's conjecture. Computers and Mathematics with Applications, 42:1431-1437, 2001.
[308] L. Halbeisen and N. Hungerbühler. Periodic billiard trajectories in obtuse triangles. SIAM Review, 42:657-670, 2000.
[309] T. Hales. A review of the Lean theorem prover. https://jiggerwit.wordpress.com/2018/09/18, 2018.
[310] T.C. Hales. Jordan's proof of the Jordan curve theorem. Studies in logic, grammar and rhetoric, 10, 2007.
[311] G.R. Hall. Some examples of permutations modelling area preserving monotone twist maps. Physica D, 28:393-400, 1987.
[312] P. Halmos. Lectures on ergodic theory. The mathematical society of Japan, 1956.
[313] P.R. Halmos. Naive set theory. Van Nostrand Reinhold Company, 1960.
[314] R.S. Hamilton. The inverse function theorem of Nash and Moser. Bull. Amer. Math. Soc. New Series, 7:65-222, 1982.
[315] D. Hanson. On a theorem of Sylvester and Schur. Canad. Math. Bull., 16, 1973.
[316] G.H. Hardy. Divergent Series. AMS Chelsea Publishing, 1991.
[317] G.H. Hardy. A mathematician's Apology. Cambridge University Press, 1994.
[318] G.H. Hardy and M. Riesz. The general theory of Dirichlet's series. Hafner Publishing Company, 1972.
[319] G.H. Hardy and E.M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, 1980.
[320] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge UK: Cambridge University Press, 2003. Second edition.
[321] R. Hartshorne. Algebraic Geometry. Springer-Verlag, New York, 1977. Graduate Texts in Mathematics,
No. 52.
[322] A. Hatcher. Algebraic Topology. Cambridge University Press, 2002.
[323] F. Hausdorff. Grundzüge der Mengenlehre. Verlag von Veit, Leipzig, 1914.
[324] F. Hausdorff. Summationsmethoden und Momentfolgen, I,II. Mathematische Zeitschrift, 9:74-109, 280-299, 1921.
[325] T. Hawkins. Emergence of the Theory of Lie Groups. Sources and Studies in the History of Mathematics and Physical Sciences. Springer Verlag, first edition, 2000.

[326] T. Hawkins. The Mathematics of Frobenius in Context. Sources and Studies in the History of Mathematics
and Physical Sciences. Springer, 2013.
[327] T. Heath. The thirteen books of Euclid's elements, Vol 1-3. Cambridge University Press, 1908.
[328] T. Heath. A history of Greek Mathematics. Oxford at the Clarendon Press, 1921.
[329] G.A. Hedlund. Endomorphisms and automorphisms of the shift dynamical system. Math. Syst. Theor., 3:320-375, 1969.
[330] K. Heegner. Diophantische Analysis und Modulfunktionen. Math Zeitschrift, 56, 1952.
[331] S. Helgason. The Radon transform, volume 5 of Progress in mathematics. Birkhäuser, 1980.
[332] J.M. Henshaw. An equation for every occasion = 52 formulas + why they matter. Johns Hopkins University
Press, 2014.
[333] M. Herman. Sur la conjugaison différentiable des difféomorphismes du cercle à des rotations. IHES, 49:5-233, 1979.
[334] M.R. Herman. Une méthode pour minorer les exposants de Lyapounov et quelques exemples montrant le caractère local d'un théorème d'Arnold et de Moser sur le tore de dimension 2. Commentarii Mathematici Helvetici, 58:453-502, 1983.
[335] R. Hersh. What is Mathematics, Really? Oxford University Press, 1997.
[336] D. Hilbert. Über die vollen Invariantensysteme. Math. Ann, 42:313-373, 1893.
[337] D. Hilbert. Über die gerade Linie als kürzeste Verbindung zweier Punkte: Aus einem an Herrn F. Klein gerichteten Briefe. Letter of 14 August 1894, March 1895.
[338] T.H. Hildebrandt and I.J. Schoenberg. On linear functional operations and the moment problem for a finite interval in one or several dimensions. Annals of Mathematics, 34:317-328, 1933.
[339] J. Hirschhorn, M.D. Hirschhorn, J.K. Hirschhorn, A.D. Hirschhorn, and P.M. Hirschhorn. The pizza theorem. Austral. Math. Soc. Gaz., 26:120-121, 1999.
[340] H. Hofer and E. Zehnder. Symplectic invariants and Hamiltonian Dynamics. Birkhäuser Advanced texts.
Birkhäuser, 1994.
[341] A.J. Hoffman and C.W. Wu. A simple proof of a generalized Cauchy-Binet theorem. American Mathematical Monthly, 123:928-930, 2016.
[342] D. R. Hofstadter. Gödel, Escher, Bach: an Eternal Golden Braid. Basic Books, 1979.
[343] J.B. Hogendijk. Al-Mu'taman ibn Hud, 11th century king of Saragossa and brilliant mathematician. Historia Mathematica, 22:1-18, 1995.
[344] A.N. Hone, H. Lundmark, and J. Szmigielski. Explicit multipeakon solutions of Novikov's cubically nonlinear integrable Camassa-Holm type equation. https://arxiv.org/abs/0903.3663.
[345] A.N. Hone, H. Lundmark, and J. Szmigielski. The Canada Day theorem. The electronic journal of combinatorics, 20:1-16, 2013.
[346] H. Hopf. Über die Curvatura integra geschlossener Hyperflächen. Math. Ann., 95(1):340-367, 1926.
[347] H. Hopf. Vektorfelder in n-dimensionalen Mannigfaltigkeiten. Math. Ann., 96(1):225-249, 1927.
[348] H. Hopf. Differentialgeometrie und Topologische Gestalt. Jahresbericht der Deutschen Mathematiker-Vereinigung, 41:209-228, 1932.
[349] H. Hopf. Über die Drehung der Tangenten und Sehnen ebener Curven. Compositio Math., 2, pages 50-62, 1935.
[350] H. Hopf. Eine Verallgemeinerung bekannter Abbildungs- und Überdeckungssätze. Portugaliae Math., 4:129-139, 1944.
[351] H. Hopf. Bericht über einige neue Ergebnisse in der Topologie. Revista Matematica Hispano-Americana
4. Serie Tomo VI, 6, 1946.
[352] H. Hopf. Sulla geometria Riemanniana globale della superficie. Rendiconti del Seminario matematico e fisico di Milano, pages 48-63, 1953.
[353] H. Hopf. Zum Clifford-Kleinschen Raumproblem. Mathematische Annalen, 95:313-339, 1926.
[354] H. Hopf and W. Rinow. Ueber den Begriff der vollständigen differentialgeometrischen Fläche. Comment. Math. Helv., 3(1):209-225, 1931.
[355] W-Y. Hsiang and B. Kleiner. On the topology of positively curved 4-manifolds with symmetry. J. Diff. Geom., 29, 1989.
[356] L.K. Hua. Introduction to Number theory. Springer Verlag, Berlin, 1982.
[357] A. Hubacher. Instability of the boundary in the billiard ball problem. Communications in Mathematical Physics, 108:483-488, 1987.

[358] G.P. Huet. A mechanization of type theory. In N. J. Nilsson, editor, Proc. 3rd Int. Joint Conf. on Artificial Intelligence, pages 139-146. William Kaufmann, 1973. Part of Huet's thesis at Case Western Reserve University.
[359] J. Humphreys. Introduction to Lie Algebras and Representation Theory. Springer, third printing, revised
edition, 1972.
[360] W. Hurewicz. Homotopy and homology. In Proceedings of the International Congress of Mathematicians, Cambridge, Mass., 1950, vol. 2, pages 344-349. Amer. Math. Soc., Providence, R. I., 1952.
[361] M. Hutchings. Taubes's proof of the Weinstein conjecture in dimension three. Bull. Amer. Math. Soc., 47:73-125, 2010.
[362] M. Hutchings. Fun with symplectic embeddings. Slides from Frankferst, Feb 6, 2016.
[363] M. Hutchings and C.H. Taubes. The Weinstein conjecture for stable Hamiltonian structures. Geom. Topol., 13(2):901-941, 2009.
[364] I. Bárány, P. V. M. Blagojević, and G.M. Ziegler. Tverberg's theorem at 50: Extensions and counterexamples. Notices of the AMS, pages 732-739, 2016.
[365] I. Niven, H.S. Zuckerman, and H.L. Montgomery. An introduction to the theory of numbers. Wiley, 1991.
[366] O. Lanford III. A computer-assisted proof of the Feigenbaum conjectures. Bull. Amer. Math. Soc., 6:427-434, 1982.
[367] O.E. Lanford III. A shorter proof of the existence of the Feigenbaum fixed point. Commun. Math. Phys., 96:521-538, 1984.
[368] L. Illusie. What is a topos? Notices of the AMS, 51, 2004.
[369] I.R. Shafarevich and A.O. Remizov. Linear Algebra and Geometry. Springer, 2009.
[370] J. Borwein, D. Bailey, and R. Girgensohn. Experimentation in Mathematics: Computational Paths to Discovery. A.K. Peters, 2004.
[371] P.C. Jackson. Introduction to artificial intelligence. Dover publications, 1985. Second, Enlarged Edition.
[372] T. Jackson. An illustrated History of Numbers. Shelter Harbor Press, 2012.
[373] C.G. Jacobi. Über die Pfaffsche Methode, eine gewöhnliche lineare Differentialgleichung zwischen 2n Variabeln durch ein System von Gleichungen zu integrieren. Journal für die reine und angewandte Mathematik, 347, 1827. Reprinted in C.G.J. Jacobi's Gesammelte Werke (1886).
[374] T. Jech. The axiom of choice. Dover, 2008.
[375] M. Jeng and O. Knill. Billiards in the l^p unit balls of the plane. Chaos, Solitons and Fractals, 7:543-545, 1996.
[376] J.H. Conway, H. Burgiel, and C. Goodman-Strauss. The Symmetries of Things. A.K. Peters, Ltd., 2008.
[377] S. Jitomirskaya. Metal-insulator transition for the almost Mathieu operator. Annals of Mathematics, 150:1159-1175, 1999.
[378] S.G.B. Johnson and S. Steinerberger. Intuitions about mathematical beauty: a case study in the aesthetic
experience of ideas. Cognition, 242, 2019.
[379] M.C. Jordan. Cours d'Analyse, volume Tome Troisième. Gauthier-Villars, Imprimeur-Libraire, 1887.
[380] D. D. Joseph, K. Burns, David Rand, and Lai-Sang Young (editors). Dynamical Systems and Turbulence,
Warwick 1980: Proceedings of a Symposium Held at the University of Warwick 1979/80. Lecture Notes
in Mathematics 898. Springer-Verlag, 1 edition, 1981.
[381] J. Jost. Riemannian Geometry and Geometric Analysis. Springer Verlag, 2005.
[382] D. Joyner. Adventures in Group Theory, Rubik's Cube, Merlin's Machine and Other Mathematical Toys.
Johns Hopkins University Press, second edition, 2008.
[383] M. Kac. Can one hear the shape of a drum? Amer. Math. Monthly, 73:1-23, 1966.
[384] C.H. Kahn. Pythagoras and the Pythagoreans, A brief history. Hackett Publishing Company, 2001.
[385] R.E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME Journal of Basic Engineering, 82(Series D):35-45, 1960.
[386] L.A. Kaluzhnin. The Fundamental Theorem of Arithmetic. Little Mathematics Library. Mir Publishers,
Moscow, 1979.
[387] G.A. Kandall. Euler's theorem for generalized quadrilaterals. College Mathematics Journal, 33:403-404, 2002.
[388] M. Karoubi. K-Theory. Springer Verlag, 1978.
[389] A. Katok. Fifty years of entropy in dynamics: 1958-2007. Journal of Modern Dynamics, 1:545-596, 2007.

[390] A. Katok and B. Hasselblatt. Introduction to the modern theory of dynamical systems, volume 54 of
Encyclopedia of Mathematics and its applications. Cambridge University Press, 1995.
[391] A. Katok and J.-M. Strelcyn. Invariant manifolds, entropy and billiards, smooth maps with singularities,
volume 1222 of Lecture notes in mathematics. Springer-Verlag, 1986.
[392] A.B. Katok and A.M. Stepin. Approximations in ergodic theory. Russ. Math. Surveys, 22:77-102, 1967.
[393] A.B. Katok and A.M. Stepin. Metric properties of measure preserving homeomorphisms. Russ. Math. Surveys, 25:191-220, 1970.
[394] S. Katok. p-adic Analysis Compared with Real. AMS, 2007.
[395] M.G. Katz. Systolic Geometry and Topology, volume 137 of Mathematical Surveys and Monographs. AMS,
2017.
[396] V. Katz. The history of Stokes' theorem. Mathematics Magazine, 52, 1979.
[397] V. Katz. Mathematics of Egypt, Mesopotamia, China, India and Islam. Princeton Univ. Press, 2007.
[398] V.J. Katz. The history of Stokes' theorem. Mathematics Magazine, 52:146-156, 1979.
[399] Y. Katznelson. An introduction to harmonic analysis. John Wiley and Sons, Inc, New York, 1968.
[400] A.S. Kechris. Classical Descriptive Set Theory, volume 156 of Graduate Texts in Mathematics. Springer-
Verlag, Berlin, 1994.
[401] K. Kendig. Never a dull moment, Hassler Whitney, Mathematics Pioneer, volume 93 of MAA Spectrum.
MAA Press, 2018.
[402] M. Kernaghan. Bell-Kochen-Specker theorem for 20 vectors. Journal of Physics A: Mathematical and General, 27(21):L829-L830, 1994.
[403] V.P. Khavin and N.K. Nikol'skij. Commutative Harmonic Analysis I. Springer, 1991.
[404] A.Ya. Khinchin. Continued Fractions. Dover, third edition, 1992.
[405] W. Killing. Ueber die Clifford-Klein'schen Raumformen. Mathematische Annalen, 39:257-278, 1891.
[406] L. Kirby and J. Paris. Accessible independence results for Peano arithmetic. Bull. London Math. Soc, 90:669-675, 1983.
[407] W.A. Kirk. A fixed point theorem for mappings which do not increase distances. American Mathematical Monthly, 72:1004-1006, 1965.
[408] D.A. Klain and G-C. Rota. Introduction to geometric probability. Lezioni Lincee. Accademia nazionale dei
lincei, 1997.
[409] F. Klein. Vorlesungen über die Entwicklung der Mathematik im 19. Jahrhundert. Springer, 1979 (originally
1926).
[410] P. Klemperer. Auctions: Theory and Practice. Princeton University Press, 2004.
[411] J.R. Kline. What is the Jordan curve theorem? American Mathematical Monthly, 49:281-286, 1942.
[412] M. Kline. Mathematical Thought from Ancient to Modern Time. Oxford University Press, 1972.
[413] W. Klingenberg. Lectures on closed geodesics, volume 230 of Grundlehren der mathematischen Wissenschaften. 1978.
[414] S. Klymchuk. Counterexamples in Calculus. MAA, 2010.
[415] O. Knill. From the sphere to the Mandelbulb. Talk May 18, 2013, http://www.math.harvard.edu/~knill/slides/boston/sic.pdf.
[416] O. Knill. Positive Lyapunov exponents for a dense set of bounded measurable SL(2,R)-cocycles. Ergodic Theory and Dynamical Systems, 12(2):319-331, 1992.
[417] O. Knill. Singular continuous spectrum and quantitative rates of weakly mixing. Discrete and continuous dynamical systems, 4:33-42, 1998.
[418] O. Knill. Weakly mixing invariant tori of Hamiltonian systems. Commun. Math. Phys., 205:85-88, 1999.
[419] O. Knill. A multivariable Chinese remainder theorem and diophantine approximation.
http://people.brandeis.edu/~kleinboc/EP0405/knill.html, 2005.
[420] O. Knill. Probability Theory and Stochastic Processes with Applications. Overseas Press, 2009.
[421] O. Knill. A discrete Gauss-Bonnet type theorem. Elemente der Mathematik, 67:1-17, 2012.
[422] O. Knill. A multivariable Chinese remainder theorem.
https://arxiv.org/abs/1206.5114, 2012.
[423] O. Knill. A Cauchy-Binet theorem for Pseudo determinants. Linear Algebra and its Applications, 459:522-547, 2014.
[424] O. Knill. Some fundamental theorems in mathematics.
https://arxiv.org/abs/1807.08416, 2018.

[425] O. Knill. Top 10 fundamental theorems.
https://www.youtube.com/watch?v=Dvj0gfajdNI, 2018.
[426] O. Knill, J. Carlsson, A. Chi, and M. Lezama. An artificial intelligence experiment in college math education. http://www.math.harvard.edu/~knill/preprints/soa.pdf, 2003.
[427] O. Knill and O. Ramirez-Herran. On Ullman's theorem in computer vision. Retrieved February 2, 2009,
from http://arxiv.org/abs/0708.2442, 2007.
[428] O. Knill and E. Slavkovsky. Visualizing mathematics using 3d printers. In E. Canessa, C. Fonda, and M. Zennaro, editors, Low-Cost 3D Printing for science, education and Sustainable Development. ICTP, 2013. ISBN-92-95003-48-9.
[429] D. Knuth. The art of computer programming, I-III. Addison-Wesley, 1981,1997,1998.
[430] D. Knuth. Fibonacci multiplication. Appl. Math. Lett., 1:57-60, 1988.
[431] D. Knuth. The sandwich theorem. Electronic journal of Combinatorics, 1, 1994.
[432] D. Knuth. Overlapping Pfaffians. Electronic Journal of Combinatorics, 3, 1996.
[433] D. E. Knuth. Surreal numbers. Addison-Wesley Publishing Co., Reading, Mass.-London-Amsterdam, 1974.
[434] S. Kobayashi. Fixed points of isometries. Nagoya Math. J., 13:63-68, 1958.
[435] S. Kochen and E.P. Specker. The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics, 17:59-87, 1967.
[436] T.W. Koerner. Fourier Analysis. Cambridge University Press, 1988.
[437] E. Kohlberg and J.W. Pratt. The contraction mapping approach to the Perron-Frobenius theory: Why Hilbert's metric? Mathematics of Operations Research, 7:198-210, 1982.
[438] A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung. Berlin, 1933. English: Foundations of Probability Theory, Chelsea, New York, 1950.
[439] R. Kolodziej. The antibilliard outside a polygon. Bull. Polish. Acad. Sci. Math., 37:163-168, 1990.
[440] N. Kondratieva. Three most beautiful mathematical formulas.
http://fy.chalmers.se/~tfkhj/BeautifulFormulas.pdf, 2006.
[441] A. Korn. Ueber einige Ungleichungen, welche in der Theorie der elastischen und elektrischen Schwingungen eine Rolle spielen. Bulletin internationale de l'Academie de Sciences de Cracovie, pages 705-724, 1909.
[442] J. Kottke. The Lebowski theorem of machine superintelligence. https://kottke.org/18/04/the-lebowski-theorem-of-machine-superintelligence, Apr 16, 2018.
[443] S. Krantz. Function Theory of Several Complex Variables. AMS Chelsea publishing, second edition, 2001.
[444] S.G. Krantz. Mathematical Apocrypha. AMS, 2002.
[445] S.G. Krantz. Mathematical Apocrypha Redux. AMS, 2005.
[446] S.G. Krantz and H. Parks. A Mathematical Odyssey: Journey from the Real to the Complex. Springer Verlag, 2014.
[447] S.G. Krantz and H.G. Parks. The Implicit Function Theorem, History, Theory and Applications. Modern
Birkhäuser Classics. Birkhäuser, 2013.
[448] S.N. Krivoshapko and N.N. Ivanov. Encyclopedia of Analytical Surfaces. Springer, 2015.
[449] R. Krömer. Tool and Object: A history and Philosophy of Category Theory. Birkhäuser Verlag, 2007.
[450] H. W. Kuhn and S. Nasar. The Essential Nash. Princeton University Press, 2002.
[451] T. Kuhn. The Structure of Scientific Revolutions, volume 2 of International Encyclopedia of Unified Science. University of Chicago Press, second, enlarged edition, 1970.
[452] G. Kuperberg. Another proof of the alternating-sign matrix conjecture. Internat. Math. Res. Notices, pages 139-150, 1996.
[453] M. Lacey. The bilinear maximal functions map into L^p for 2/3 < p ≤ 1. Ann. of Math, 151:35-57, 2000.
[454] M. Lackenby. The Whitney trick. Topology and its Applications, 71(2):115-118, 1996.
[455] J. Lagarias. Facts and conjectures about factorizations of Fibonacci and Lucas numbers. Edouard Lucas Memorial lecture, Rochester Institute of Technology, 2014.
[456] J. C. Lagarias. The set of primes dividing the Lucas numbers has density 2/3. Pacific J. Math, 118:449-462, 1985.
[457] I. Lakatos. Proofs and Refutations. Cambridge University Press, 1976.
[458] S. Mac Lane. Categories for the Working Mathematician. Springer, 1998.
[459] O. Lanford. Lecture notes in dynamical systems. ETH Zürich, 1987.
[460] S. Lang. Algebraic number theory, volume 110 of Graduate Texts in Mathematics. Springer, 1994.
[461] M. Lange. Depth and explanation in mathematics. Philosophia Mathematica, 2014.
[462] A.N. Langville and C.D. Meyer. Google's PageRank and Beyond: The Science of Search Engine Rankings.
Princeton University Press, 2006.
[463] H.B. Lawson and M-L. Michelsohn. Spin Geometry. Princeton University Press, 1989.
[464] P.D. Lax. Approximation of measure preserving transformations. Commun. Pure Appl. Math., 24:133–135,
1971.
[465] T. Leinster. Rethinking set theory.
https://arxiv.org/abs/1212.6543, 2012.
[466] S. Lem. The futurological congress. A Harvest book, 1985.
[467] B. Lemmens and R. Nussbaum. Birkhoff's version of Hilbert's metric and its applications in analysis. In
A. Papadopoulos and M. Troyanov, editors, Handbook of Hilbert Geometry, volume 22 of IRMA Lectures in
Mathematics and Theoretical Physics. European Mathematical Society, 2014.
[468] H.W. Lenstra. Solving the Pell equation. Algorithmic Number Theory, 44, 2008.
[469] P. Lévy. Sur la convergence absolue des séries de Fourier. Compositio Mathematica, 1:1–14, 1935.
[470] L. Li. On Moser's boundedness problem of dual billiards. Ergodic Theory Dynam. Systems, 29:613–635,
2009.
[471] E.H. Lieb and M. Loss. Analysis, volume 14 of Graduate Studies in Mathematics. American Mathematical
Society, 1996.
[472] J. W. Lindeberg. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung.
Annales Academiae Scientiarum Fennica, 15:211–224, 1922.
[473] M. Livio. The equation that couldn't be solved. Simon and Schuster, 2005.
[474] M. Livio. Is God a Mathematician? Simon and Schuster Paperbacks, 2009.
[475] L.L. Locke. The ancient quipu, a Peruvian knot record. American Anthropologist, 14:325–332, 1912.
[476] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25:1–7, 1979.
[477] E. Lucas. The theory of simply periodic numerical functions. American Journal of Mathematics,
1:184–240, 289–321, 1878. Translated and reprinted by the Fibonacci Association, 1969.
[478] E. Lucas. Récréations Mathématiques. Librairie scientifique et technique Albert Blanchard, 1891. Volumes
1, 2 and 3.
[479] J. Lurie. Kerodon. https://kerodon.net, 2020.
[480] M. Freedman, R. Gompf, S. Morrison and K. Walker. Man and machine thinking about the smooth
4-dimensional Poincaré conjecture. https://arxiv.org/pdf/0906.5177.pdf, 2009.
[481] M. Kaufmann, P. Manolios and J. Strother Moore. Computer-Aided Reasoning: ACL2 Case Studies, volume 4
of Advances in Formal Methods. Springer, 2000.
[482] J. MacCormick. 9 Algorithms that changed the future. Princeton University Press, 2012.
[483] P. Maddy. Defending the axioms. Oxford University Press, 2011.
[484] A. R. Magid. Differential Galois theory. Notices of the AMS, pages 1041–1049, October 1999.
[485] Andy R. Magid. Lectures on differential Galois theory. University lecture series 7. American Mathematical
Society, 1994.
[486] B.B. Mandelbrot. The Fractal Geometry of Nature. W.H. Freeman and Company, 1982.
[487] B.B. Mandelbrot. Fractals and Chaos: The Mandelbrot Set and Beyond. Springer Verlag, 2004.
[488] E. Maor. The Pythagorean Theorem: A 4000-Year History. Princeton University Press, 2007.
[489] R. Marchthaler and S. Dingler. Kalman-Filter. Springer Verlag, 2017.
[490] P. Maritz and S. Mouton. Francis Guthrie: A colourful life. Mathematical Intelligencer, 34, 2012.
[491] D. Marker. Model Theory: An introduction, volume 217 of Graduate Texts in Mathematics. Springer, 2002.
[492] D.A. Martin and R.M. Solovay. Internal Cohen extensions. Ann. Math. Logic, 2:143–178, 1970.
[493] R. Martinez. Proofs that every mathematician should know.
https://math.stackexchange.com/questions/178940/proofs-that-every-mathematician-should-know, 2018.
[494] J-C. Martzloff. A History of Chinese Mathematics. Springer Verlag, second edition, 1997.
[495] E. Maskin. Commentary: Nash equilibrium and mechanism design. Games and Economic Behavior,
71:9–11, 2011.
[496] H. Masur. Closed trajectories for quadratic differentials with an application to billiards. Duke Math. J.,
53:307–314, 1986.
[497] J. Matousek. Thirty-three Miniatures, volume 53 of Student Mathematical Library. AMS, 2010.
[498] J.P. May. Simplicial Objects in Algebraic Topology. Chicago lectures in Mathematics. University of Chicago
Press, 1967.
[499] J. Maynard. The twin prime conjecture. Jpn. J. Math., pages 175–206, 2019.
[500] B. Mazur. Imagining Numbers. Penguin Books, 2003.
[501] J. Mazur. Zeno's paradox, Unraveling the Ancient Mystery behind the science of Space and time. Plume
Book, 2007.
[502] J. Mazur. Fluke, the math and myth of coincidence. Basic Books, 2016.
[503] L.F. McAuley. A topological Reeb-Milnor-Rosen theorem and characterizations of manifolds. Bulletin of
the AMS, 78, 1972.
[504] D. McDuff. What is symplectic geometry? Talk of March 31, 2009.
[505] D. McDuff and D. Salamon. Introduction to Symplectic Topology. Clarendon Press, 1998.
[506] C. McLarty. The uses and abuses of the history of topos theory. Brit. J. Phil. Sci, pages 351–375, 1990.
[507] C. McLarty. Elementary Categories, Elementary Toposes. Clarendon Press, 1992.
[508] C. McLarty. The rising sea: Grothendieck on simplicity and generality. In Episodes in the history of
modern algebra (1800–1950), volume 32 of Hist. Math., pages 301–325. Amer. Math. Soc., Providence,
RI, 2007.
[509] C. McMullen. Riemann surfaces, dynamics and geometry. Course Notes, Harvard University, 2014.
[510] C.T. McMullen. Complex Dynamics and Renormalization, volume 135 of Annals of Mathematics Studies.
Princeton University Press, Princeton, 1994.
[511] K. Menger. Zur allgemeinen Kurventheorie. Fund. Math., 10:96–115, 1927.
[512] R. Merris. Graph theory. Interscience Series in discrete mathematics and optimization. Wiley, 2001.
[513] M.F. Singer. Introduction to the Galois theory of linear differential equations.
https://arxiv.org/abs/0712.4124, 2008.
[514] P.W. Michor. Elementary Catastrophe Theory. Tipografia Universitatii din Timisoara, 1985.
[515] P. Milgrom. Putting auction theory to work. 2004.
[516] M.J. Miller. On Sendov's conjecture for roots near the unit circle. J. Math. Anal. Appl., 175:632–639, 1993.
[517] J. Milnor. On the total curvature of knots. Annals of Mathematics, 52:248–257, 1950.
[518] J. Milnor. Topology from the differential viewpoint. University of Virginia Press, Charlottesville, Va, 1965.
[519] J. Milnor. Self-similarity and hairiness in the Mandelbrot set. In M. C. Tangora, editor, Computers in
Geometry and Topology, volume 114 of Lect. Notes in Pure and Appl. Math., pages 211–257, 1989.
[520] J. Milnor. Dynamics in one complex variable. Introductory Lectures, SUNY, 1991.
[521] J.W. Milnor. On manifolds homeomorphic to the 7-sphere. Annals of Mathematics, 64:399–405, 1956.
[522] H. Minc. Permanents, volume 6 of Encyclopedia of Mathematics and its applications. Addison-Wesley
Publishing Company, 1978.
[523] H. Minc. Nonnegative Matrices. John Wiley and Sons, 1988.
[524] M. Minsky. The Society of Mind. A Touchstone Book, published by Simon & Schuster Inc, New York,
London, Toronto, Sydney and Tokyo, 1988.
[525] C. Misbah. Complex Dynamics and Morphogenesis: An Introduction to Nonlinear Science. Springer, 2017.
[526] U. Montano. Explaining Beauty in Mathematics: An Aesthetic Theory of Mathematics, volume 370 of
Synthese Library. Springer, 2013.
[527] J.W. Morgan. The Poincaré conjecture. In Proceedings of the ICM, 2006, pages 713–736, 2007.
[528] J.W. Morgan and G. Tian. Ricci Flow and the Poincaré Conjecture, volume 3 of Clay Mathematics
Monographs. AMS, Clay Math Institute, 2007.
[529] M. Morley. Categoricity in power. Trans. Am. Math. Soc., 114:514–538, 1965.
[530] J. Moser. Stable and Random Motions in Dynamical Systems. Princeton University Press, Princeton, 1973.
[531] J. Moser. Is the solar system stable? The Mathematical Intelligencer, 1:65–71, 1978.
[532] J. Moser. Selected chapters in the calculus of variations. Lectures in Mathematics ETH Zürich. Birkhäuser
Verlag, Basel, 2003. Lecture notes by Oliver Knill.
[533] D. Mumford. O. Zariski. National Academy of Sciences, 2013.
[534] K.G. Murty. Linear Programming. Wiley, 1983.
[535] M. Ram Murty and A. Pacelli. Quadratic reciprocity via theta functions. Proc. Int. Conf. Number Theory,
1:107–116, 2004.
[536] M.V. Patil, S. Pawar and Z. Saquib. Coding techniques for 5G networks: A review. In 2020 3rd International
Conference on Communication System, Computing and IT Applications (CSCITA), pages 208–213, 2020.
[537] E. Nagel and J.R. Newman. Gödel's proof. New York University Press, 2001.
[538] P.J. Nahin. In Pursuit of Zeta-3: The World's Most Mysterious Unsolved Math Problem. Princeton
University Press, 2021.
[539] I.P. Natanson. Constructive Function theory. Frederick Ungar Publishing, 1965.
[540] E. Nelson. Internal set theory: A new approach to nonstandard analysis. Bull. Amer. Math. Soc.,
83:1165–1198, 1977.
[541] J. Neuberger. The continuous Newton's method, inverse functions, and Nash-Moser. Amer. Math. Monthly,
114(5):432–437, 2007.
[542] J. Neukirch. Algebraische Zahlentheorie. Springer, 1999.
[543] B.H.R. Neumann. Sharing ham and eggs. 1959.
[544] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University
Press, 1953.
[545] J. Neunhäuserer. Schöne Sätze der Mathematik. Springer Spektrum, 2017.
[546] D.J. Newman. A simple proof of Wiener's 1/f theorem. Proceedings of the AMS, 48, 1975.
[547] D.J. Newman. Analytic proof of the prime number theorem. American Mathematical Monthly, 87:693–696,
1980.
[548] N.I. Fisher, T. Lewis and B.J. Embleton. Statistical analysis of spherical data. Cambridge University Press,
1987.
[549] C. Mouhot (notes by T. Feng). Analysis of partial differential equations. Lectures at Cambridge, 2013.
[550] M. Nowak. Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press, 2006.
[551] Editors of Mathematical Reviews and zbMATH. MSC2020: Mathematical Sciences Classification System. 2020.
[552] O. Ore. The Four-Color Problem. Academic Press, 1967.
[553] M.I. Voitsekhovskii (originator). Korn inequality. Encyclopedia of Mathematics, accessed Oct 7, 2020.
[554] A. Ostrowski. Über einige Lösungen der Funktionalgleichung f(x)·f(y) = f(xy). Acta Math., 41:271–284,
1916.
[555] V. Ovsienko. Ramanujan's cubic composition formula. Mathematical Intelligencer, 44:212–214, 2022.
[556] O. Jones. The Grammar of Ornament. Quaritch, 1910.
[557] J.C. Oxtoby. Measure and Category. Springer Verlag, New York, 1971.
[558] P. Erdős, A.W. Goodman and L. Pósa. The representation of a graph by set intersections. Canadian
Journal of Mathematics, 18:106–112, 1966.
[559] R.S. Palais. Seminar on the Atiyah-Singer Index Theorem, volume 57 of Annals of Mathematics Studies.
Princeton University Press, 1965.
[560] R.S. Palais. A simple proof of the Banach contraction principle. Journal of Fixed Point Theory and
Applications, 2:221–223, 2007.
[561] R.S. Palais and S. Smale. A generalized Morse theory. Bull. Amer. Math. Soc., 70:165–172, 1964.
[562] V. Papanicolaou. The Weierstrass ℘-function of the hexagonal lattice. https://arxiv.org/abs/2105.04307v1,
2021.
[563] J. Paris and L. Harrington. A mathematical incompleteness in Peano arithmetic. In J. Barwise, editor,
Handbook of Mathematical Logic, pages 1133–1142, 1977.
[564] E. Pariser. The Filter Bubble. Penguin Books, 2011.
[565] B. Parry and D. Sullivan. A topological invariant of flows on 1-dimensional spaces. Topology, 14:297–299,
1975.
[566] H-O. Peitgen and P.H.Richter. The beauty of fractals. Springer-Verlag, New York, 1986.
[567] S. Peluse. An asymptotic version of the prime power conjecture for perfect difference sets.
https://arxiv.org/abs/2003.04929, 2020.
[568] A. Peres. Two simple proofs of the Kochen-Specker theorem. Journal of Physics A: Mathematical and
General, 24(4):L175–L178, 1991.
[569] O. Perron. Zur Theorie der Matrices. Math. Ann., 64:248–263, 1907.
[570] F. Peter and H. Weyl. Die Vollständigkeit der primitiven Darstellungen einer geschlossenen
kontinuierlichen Gruppe. Math. Ann., 97:737–755, 1927.
[571] P. Petersen. Riemannian Geometry. Springer Verlag, third edition, 2016 (second edition, 2006).
[572] C. Petzold. The Annotated Turing: A Guided Tour Through Alan Turing's Historic Paper on
Computability and the Turing Machine. Wiley Publishing, 2008.
[573] R. R. Phelps. Lectures on Choquet's theorem. D.Van Nostrand Company, Princeton, New Jersey, 1966.
[574] K. Phillips. A note on local field duality. Rocky Mountain Journal of Mathematics, 9, 1978.
[575] G. Pick. Geometrisches zur Zahlenlehre. Sitzungsberichte des deutschen naturwissenschaftlich-
medicinischen Vereines für Böhmen 'Lotos' in Prag, Band XIX, 1899.
[576] C.A. Pickover. The Loom of God. Sterling, New York, 2009.
[577] C.A. Pickover. The Math Book. Sterling, New York, 2012.
[578] B.C. Pierce. Basic Category Theory for Computer Scientists. MIT Press, 1991.
[579] Josep Pla i Carrera. The fundamental theorem of algebra before Carl Friedrich Gauss. Publicacions
Matemàtiques, 36(2B):879–911, 1992 (published 1993).
[580] H. Poincaré. Sur les courbes définies par les équations différentielles. Journ. de Math., 4, 1885.
[581] M. Pollicott. Lectures on ergodic theory and Pesin theory on compact manifolds. London Mathematical
Society Lecture Notes 180. Cambridge University Press, 1993.
[582] B. Polster and M. Ross. Math Goes to the Movies. Johns Hopkins University Press, 2012.
[583] L. Polterovich. The geometry of the group of symplectic diffeomorphisms. Lectures in Mathematics.
Springer Verlag, 2001.
[584] A.S. Posamentier. The Pythagorean Theorem: The Story of its Power and Beauty. Prometheus, 2010.
[585] T. Poston and I. Stewart. Catastrophe theory and its applications. Pitman, 1978.
[586] A. Preissmann. Quelques propriétés globales des espaces de Riemann. Commentarii Mathematici Helvetici,
15:175–216, 1942.
[587] J. Propp. What I learned from Richard Stanley.
https://arxiv.org/abs/1501.00719, 2015.
[588] P.M. Pu. Some inequalities in certain nonorientable Riemannian manifolds. Pacific J. Math., 2:55–71,
1952.
[589] T. Puettmann and C. Searle. The Hopf conjecture for manifolds with low cohomogeneity or high symmetry
rank. Proc. of the AMS, 130, 2012.
[590] R. Kaplan and E. Kaplan. Hidden Harmonies. Bloomsbury, 2011.
[591] P.H. Rabinowitz. Minimax Methods in Critical Point Theory with Applications to Differential Equations,
volume 65 of Regional Conference Series in Mathematics. AMS, 1986.
[592] Q.I. Rahman and G. Schmeisser. Analytic theory of polynomials, volume 26 of London Mathematical
Society Monographs. Oxford University Press, new series edition, 2002.
[593] A.R. Rajwade and A.K. Bhandari. Surprises and Counterexamples in Real Function Theory. Hindustan
Book Agency, 2007.
[594] F. Rannou. Numerical study of discrete area-preserving mappings. Acta Arithm, 31:289–301, 1974.
[595] R. Gelca and T. Andreescu. Putnam and Beyond. Springer, 2007.
[596] G. Reeb. Sur certaines propriétés topologiques des variétés feuilletées. Publ. Inst. Math. Univ. Strasbourg,
pages 91–154, 1952.
[597] M. Reed and B. Simon. Methods of modern mathematical physics. Academic Press, Orlando, 1980.
[598] H. Reitberger. Leopold Vietoris (1891–2002). Notices of the AMS, 49, 2002.
[599] H. Ricardo. Goldbach's conjecture implies Bertrand's postulate. Amer. Math. Monthly, 112:492, 2005.
[600] D.S. Richeson. Euler's Gem. Princeton University Press, Princeton, NJ, 2008. The polyhedron formula
and the birth of topology.
[601] E. Riehl. A leisurely introduction to simplicial sets. http://www.math.jhu.edu/~eriehl/ssets.pdf, 2011.
[602] H. Riesel. Prime numbers and computer methods for factorization, volume 57 of Progress in Mathematics.
Birkhäuser Boston Inc., 1985.
[603] David P. Robbins. The story of 1,2,7,42,429,7436,... Mathematical Intelligencer, 13:12–19, 1991.
[604] A. Robert. Analyse non standard. Presses polytechniques romandes, 1985.
[605] R.T. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[606] J. Rognes. On the Atiyah-Singer index theorem. http://www.abelprize.no, 2004.
[607] X. Rong and X. Su. The Hopf conjecture for manifolds with abelian group actions. Communications in
Contemporary Mathematics, 7:121–136, 2005.
[608] J. Rosenberg. Algebraic K-theory and its applications. Graduate Texts in Mathematics. Springer, 1994.
[609] J. Rosenhouse. The Monty Hall Problem: The Remarkable Story of Math's Most Contentious Brain Teaser.
Oxford University Press, 2009.
[610] G-C. Rota. The phenomenology of mathematical beauty. Synthese, 111(2):171–182, 1997. Proof and
progress in mathematics (Boston, MA, 1996).
[611] C. Rovelli. The Order of Time. Riverhead Books, 2018.
[612] L.A. Rubel and J.E. Colliander. Entire and Meromorphic functions. Springer Verlag, 1996.
[613] E. Rubik. Cubed, The puzzle of us all. Flatiron Books, New York, 2020.
[614] D. Ruelle. Statistical Mechanics. Mathematical Physics Monograph Series. W.A. Benjamin, Inc, 1968.
[615] D. Ruelle. Thermodynamic Formalism. The Mathematical Structures of Classical Equilibrium Statistical
Mechanics, volume 5 of Encyclopedia of mathematics and its applications. Addison-Wesley Publishing
Company, London, 1978.
[616] D. Ruelle. The mathematician's brain. Princeton University Press, Princeton, NJ, 2007.
[617] J. W. Russell. An Elementary Treatise on Pure Geometry: With Numerous Examples. Clarendon Press,
1893.
[618] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, second edition, 2003
(first edition 1995).
[619] M.R. Rychlik. A complete solution to the equichordal point problem of Fujiwara, Blaschke, Rothe and
Weitzenböck. Invent. Math., pages 141–212, 1997.
[620] S. Zeki, J.P. Romaya, D.M.T. Benincasa and M.F. Atiyah. The experience of mathematical beauty and its
neural correlates. Front. Hum. Neurosci., 13, 2014.
[621] T.L. Saaty and P.C. Kainen. The four color problem, assaults and conquest. Dover Publications, 1986.
[622] S. Sakai. A characterization of W*-algebras. Pacific Journal of Mathematics, 6:763–773, 1956.
[623] H. Samelson. On the Perron-Frobenius theorem. Michigan Math. J., 4:57–59, 1957.
[624] L.A. Santalo. Introduction to integral geometry. Hermann and Editeurs, Paris, 1953.
[625] L.A. Santalo. Integral Geometry and Geometric Probability. Cambridge University Press, second edition,
2004.
[626] A. Sard. The measure of the critical values of differentiable maps. Bull. Amer. Math. Soc., 48:883–890,
1942.
[627] S. Savchev and T. Andreescu. Mathematical Miniatures. MAA, 2003.
[628] H. Schenck. Computational Algebraic Geometry, volume 58 of Student Texts. Cambridge University Press,
2003.
[629] A. Schirrmacher. Establishing Quantum Physics in Göttingen: David Hilbert, Max Born, and Peter Debye
in Context, 1900-1926. Springer Briefs in History of Science and Technology. Springer, 2019.
[630] L. Schläfli. Theorie der vielfachen Kontinuität. Cornell University Library Digital Collections, 1901.
[631] D. Schleicher, editor. Complex Dynamics, Families and Friends. AK Peters, 2009.
[632] K. Schmüdgen. The Moment Problem. Graduate Texts in Mathematics 277. Springer International
Publishing, first edition, 2017.
[633] R. Schneider. Integral geometric tools for stochastic geometry, volume 1892 of Lecture Notes in Math.
Springer, 2007.
[634] I. Schur. Bemerkungen zur Theorie der beschränkten Bilinearformen mit unendlich vielen Veränderlichen.
J. für die reine und angewandte Mathematik, 140:1–28, 1911.
[635] J.T. Schwartz. Nonlinear Functional Analysis. Gordon and Breach, 1969.
[636] R.E. Schwartz. Unbounded orbits for outer billiards. Journal of Modern Dynamics, 3, 2007.
[637] R.E. Schwartz. Outer billiards on Kites. Annals of Mathematics Studies. Princeton University Press, 2009.
[638] S.L. Segal. Mathematicians under the Nazis. Princeton university press, 2003.
[639] H. Segerman. Visualizing Mathematics with 3D Printing. Johns Hopkins University Press, 2016.
[640] S. Eilenberg and J. Zilber. On products of complexes. Amer. J. Math., 75:200–204, 1953.
[641] G. Shafer and V. Vovk. The sources of Kolmogorov's Grundbegriffe. Statistical Science, pages 70–98, 2006.
[642] C. Shannon. The zero error capacity of a noisy channel. IRE Transactions on Information Theory, 2:8–19,
1956.
[643] C.E. Shannon. A mathematical theory of communication. The Bell System Technical Journal,
27:379–423, 623–656, 1948.
[644] S. Shelah. Proper Forcing. Lecture Notes in Mathematics. Springer, 1982.
[645] S. Shelah. Classification theory, volume 92 of Studies in Logic. North Holland, 1990.
[646] M. Shishikura. The Hausdorff dimension of the boundary of the Mandelbrot set and Julia sets. Annals of
Mathematics, 147:225–267, 1998. Proven 1991.
[647] T. Siegfried. A beautiful math. Joseph Henry Press, 2006.
[648] J.H. Silverman. The Arithmetic of Elliptic Curves, volume 106 of Graduate Texts in Mathematics. Springer,
1986.
[649] B. Simon. Fifteen problems in mathematical physics. In Anniversary of Oberwolfach, Perspectives in
Mathematics. Birkhäuser Verlag, Basel, 1984.
[650] B. Simon. The statistical mechanics of lattice gases, volume I. Princeton University Press, 1993.
[651] B. Simon. Operators with singular continuous spectrum: I. General operators. Annals of Mathematics,
141:131–145, 1995.
[652] B. Simon. Trace Ideals and their Applications. AMS, second edition, 2010.
[653] B. Simon. A comprehensive course in Analysis. AMS, 2017.
[654] Ya.G. Sinai. Introduction to ergodic theory. Princeton University Press, Princeton, 1976.
[655] N. Sinclair. Mathematics and Beauty. Teachers College Press, 2006.
[656] J. Singer. A theorem of finite projective geometry and some applications to number theory. Trans. Amer.
Math. Soc., 43:377–385, 1938.
[657] S. Skinner. Sacred Geometry. Sterling, 2006.
[658] J.L. Snell and R. Vanderbei. Three bewitching paradoxes. In Topics in Contemporary Probability and its
Applications, Probability and Stochastics Series, pages 355–370. CRC Press, Boca Raton, 1995.
[659] J. Snygg. Clifford Algebra, A Computational Tool for Physicists. Oxford University Press, 1997.
[660] A. Soifer. The Mathematical Coloring Book: Mathematics of Coloring and the colorful life of its creators.
Springer Verlag, 2009.
[661] A. Soifer. Mathematics as problem solving. Springer, New York, second edition, 2009. With forewords by
Branko Grünbaum, Peter D. Johnson, Jr. and Cecil Rousseau.
[662] R. Solomon. A brief history of the classification of the finite simple groups. Bulletin (new series) of the
AMS, 38:315–352, 2001.
[663] E. Sondheimer and A. Rogerson. Numbers and infinity. Dover Publications Inc., Mineola, NY, 2006. A
historical account of mathematical concepts, Reprint of the 1981 original.
[664] E. Sorets and T. Spencer. Positive Lyapunov exponents for Schrödinger operators with quasi-periodic
potentials. Communications in Mathematical Physics, 142:543–566, 1991.
[665] J. Spencer. Large numbers and unprovable theorems. Amer. Math. Monthly, 90:669–675, 1983.
[666] M. Spivak. A Comprehensive Introduction to Differential Geometry I-V. Publish or Perish, Inc, Berkeley,
third edition, 1999.
[667] H. Spohn. Large scale dynamics of interacting particles. Texts and monographs in physics. Springer-Verlag,
New York, 1991.
[668] R. Stanley. Combinatorics and Commutative Algebra. Progress in Math. Birkhäuser, second edition, 1996.
[669] H.M. Stark. On the 'gap' in a theorem of Heegner. Journal of Number Theory, pages 16–27, 1969.
[670] F. Staudacher. Jost Bürgi. Verlag NZZ, 2016.
[671] K.G.C. Von Staudt. Geometrie der Lage. Nuernberg, 1847. page 21.
[672] L.A. Steen and J.A. Seebach. Counterexamples in Topology. Dover, 1995.
[673] W. Stein. Elementary Number Theory: Primes, Congruences, and Secrets. Undergraduate Texts in
Mathematics. Springer Verlag, 2009.
[674] W. Stein. Algebraic Number Theory, a computational Approach. 2012.
https://wstein.org/books/ant/ant.pdf.
[675] S. Sternberg. Local contractions and a theorem of Poincaré. Amer. J. Math., 79:809–824, 1957.
[676] S. Sternberg. Lectures on Differential Geometry. Prentice Hall, 1964.
[677] S. Sternberg. A mathematical Companion to Quantum Mechanics. Dover Publications, 2019.
[678] I. Stewart. From Here to Infinity. Oxford University Press, 1996.
[679] I. Stewart. In Pursuit of the Unknown: 17 equations that changed the world. Basic Books, 2012.
[680] I. Stewart and D. Tall. Algebraic Number Theory and Fermat's last Theorem. A.K. Peters, third edition,
2002.
[681] J. Stillwell. Mathematics and its history. Springer, 2010.
[682] J. Stillwell. Elements of Mathematics: From Euclid to Goedel. Princeton University Press, 2016.
[683] J.M. Stoyanov. Counterexamples in Probability. Dover, third edition, 2013.
[684] G. Strang. The fundamental theorem of linear algebra. Amer. Math. Monthly, 100(9):848–855, 1993.
[685] P. Strathern. The big idea: Pythagoras and his theorem. Arrow books, 1997.
[686] S. H. Strogatz. Sync: The Emerging Science of Spontaneous Order. Hyperion, 2003.
[687] D.W. Stroock. Probability theory, an analytic view. Cambridge University Press, 1993.
[688] D.J. Struik. A concise History of Mathematics. Dover, 1948.
[689] M. Struwe. Plateau's Problem and the Calculus of Variations, volume 35 of Mathematical Notes. Princeton
Univ Press, 1989.
[690] Graduate Students. Graduate student's guide to generals. https://web.math.princeton.edu/generals/,
accessed 2019.
[691] E. Study. Zur Theorie der linearen Gleichungen. Acta Math., 42:1–61, 1920.
[692] B. Sury. Multivariable Chinese remainder theorem. Resonance, 20:206–216, 2015.
[693] F.J. Swetz and T.I. Kao. Was Pythagoras Chinese? Pennsylvania State University Press, 1988.
[694] E. Szpilrajn-Marczewski. Sur deux propriétés des classes d'ensembles. Fund. Math., 33:303–307, 1945.
Translation by B. Burlingham and L. Stewart, 2009.
[695] T. Andreescu and B. Enescu. Mathematical Olympiad Treasures. Birkhäuser, second edition, 2004.
[696] T. Nipkow, L.C. Paulson and M. Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer, 2002.
[697] T. Tao, M. Visan and X. Zhang. Minimal-mass blowup solutions of the mass-critical NLS. Forum
Mathematicum, 20, 2008.
[698] S. Tabachnikov. Billiards. Panoramas et synthèses. Société Mathématique de France, 1995.
[699] J. Tanton. Encyclopedia of Mathematics. Facts on File, Inc, 2005.
[700] T. Tao. What is good mathematics? https://arxiv.org/abs/math/0702396, 2007.
[701] T. Tao. Structure and Randomness. AMS, 2008.
[702] T. Tao. Topics in Random Matrix Theory. Graduate Studies in Mathematics. AMS, 2012.
[703] T. Tao. Matrix identities as derivatives of determinant identities.
https://terrytao.wordpress.com/tag/sylvester-determinant-identity/, 2013.
[704] T. Tao. Holomorphic images of disks.
https://terrytao.wordpress.com/2020/11/28/holomorphic-images-of-disks, 2020.
[705] C. Tapp. An den Grenzen des Endlichen: Das Hilbertprogramm im Kontext von Formalismus und
Finitismus. Springer Verlag, 2013.
[706] J. Tate. Fourier Analysis in Number Fields and Hecke's Zeta-Functions, pages 305–347. Academic Press,
1967.
[707] M.E. Taylor. Partial Differential Equations I, II, III, volumes 115, 116, 117 of Applied Mathematical Sciences.
Springer, 2011.
[708] R.F. Thom. Mathematical Models of Morphogenesis. Mathematics and Its Applications. Horwood Ltd,
1984.
[709] J. Todd. Constructive Theory of Functions. Birkhäuser Verlag, 1963.
[710] E. Trucco and A. Verri. Introductory techniques for 3-D computer vision. Upper Saddle River NJ: Prentice
Hall, 1998.
[711] L.W. Tu. Differential Geometry. Springer Verlag, 2017.
[712] J.B. Tunnell. A classical Diophantine problem and modular forms of weight 3/2. Invent. Math.,
72:323–334, 1983.
[713] P. Turán. On an extremal problem in graph theory. Math. Fiz. Lapok, 48:436–452, 1941.
[714] H. Tverberg. A generalization of Radon's theorem. Journal of the London Mathematical Society,
41:123–128, 1966.
[715] S. Ullman. The interpretation of visual motion. Cambridge MA : MIT Press, 1979.
[716] A. Urquhart. Mathematical depth. Philosophia Mathematica, 23:233–241, 2015.
[717] G. Urton. Inka History in Knots. University of Texas Press, 2017.
[718] R.S. Varga. Gershgorin and His Circles. Springer Series in Computational Mathematics 36. Springer,
2004.
[719] V. Arnold. Catastrophe theory. Springer, 1981.
[720] Y. Bertot and P. Castéran. Calculus of Inductive Constructions. Springer, 2004.
[721] V.F.R. Jones. Von Neumann algebras. Lecture notes, November 13, 2015.
[722] M. Viazovska. The sphere packing problem in dimension 8. http://arxiv.org/abs/1603.04246, 2016.
[723] C. Villani. Optimal transport, old and new. Springer Verlag, 2008.
[724] I.M. Vinogradov. The Method of Trigonometric Sums in the Theory of Numbers. Dover Publications, 1954.
[725] F. Vivaldi and A. Shaidenko. Global stability of a class of discontinuous dual billiards. Commun. Math.
Phys., 110:625–640, 1987.
[726] V.V. Kozlov and D.V. Treshchev. Billiards, volume 89 of Translations of Mathematical Monographs. AMS,
1991.
[727] W. Fulton. Algebraic Curves. An Introduction to Algebraic Geometry. Addison Wesley, 1969.
[728] N.R. Wallach. Compact homogeneous Riemannian manifolds with strictly positive curvature. Annals of
Mathematics, Second Series, 96:277–295, 1972.
[729] F. Warner. Foundations of Differentiable Manifolds and Lie Groups, volume 94 of Graduate Texts in
Mathematics. Springer, New York, 1983.
[730] L. Washington. On the self-duality of Q_p. Am. Math. Monthly, 81:369–371, 1974.
[731] D. J. Watts. Small Worlds. Princeton University Press, 1999.
[732] D. J. Watts. Six Degrees. W. W. Norton and Company, 2003.
[733] J.N. Webb. Game Theory. Springer, 2007.
[734] H. Weber. Lehrbuch der Algebra. Friedrich Vieweg und Sohn, second edition, 1898.
[735] I. Wegener. Complexity Theory: Exploring the Limits of Efficient Algorithms. Springer, 2005.
[736] E.W. Weinstein. CRC Concise Encyclopedia of Mathematics. CRC Press, 1999.
[737] J. Weizenbaum. ELIZA: a computer program for the study of natural language communication between
man and machine. Communications of the ACM, 9:36–45, 1965.
[738] D. Wells. Which is the most beautiful? Mathematical Intelligencer, 10:30–31, 1988.
[739] D. Wells. Are these the most beautiful? The Mathematical Intelligencer, 12:9, 1990.
[740] Q. Westrich. The Calabi-Yau theorem. University of Wisconsin-Madison, 2014.
[741] J.E. Wetzel. Converses of Napoleon's theorem. American Mathematical Monthly, 99:339–351, 1992.
[742] H. Weyl. On the volume of tubes. American Journal of Mathematics, 61:461–472, 1939.
[743] W. Fulton and J. Harris. Representation theory. Graduate Texts in Mathematics. Springer, 2004.
[744] H. Whitney. Analytic extensions of differentiable functions defined in closed sets. Trans. of the AMS,
36:63–89, 1934.
[745] H. Whitney. Collected Works. Birkhäuser Verlag, 1992.
[746] J.M. Whittaker. Interpolatory Function Theory. Cambridge University Press, 1935.
[747] W.A. Whitworth. Choice and Chance with One Thousand Exercises. Deighton, Bell and Co, 1886.
Republished Hafner Pub, 1965.
[748] N. Wiener. Tauberian theorems. Annals of Mathematics, 33:1–100, 1932.
[749] B. Wilking. Torus actions on manifolds of positive sectional curvature. Acta Math., 191:259–297, 2003.
[750] D. Williams. Probability with Martingales. Cambridge Mathematical Textbooks, 1991.
[751] R. Wilson. Four Colors Suffice. Princeton Science Library. Princeton University Press, 2002.
[752] G.L. Wise and E.B.Hall. Counterexamples in Probability and Real Analysis. Oxford University Press, 1993.
[753] M. Wojtkowski. Principles for the design of billiards with nonvanishing Lyapunov exponents.
Communications in Mathematical Physics, 105:391–414, 1986.
[754] N. Wolchover. Proven - 'most important unsolved problem' in numbers. NBC news, 9/11/2012, 2012.
[755] J.A. Wolf. Spaces of Constant Curvature. AMS Chelsea Publishing, sixth edition, 2011.
[756] S. Wolfram. Theory and Applications of Cellular Automata. World Scientific, 1986.
[757] S. Wolfram. A new kind of Science. Wolfram Media, 2002.
[758] W.H. Woodin. The continuum hypothesis, Part I. Notices of the AMS, 2001.
[759] H. Wussing. 6000 Jahre Mathematik. Springer, 2009.
[760] W.W. Boone, F.B. Cannonito, and R.C. Lyndon. Word problems. North-Holland Publishing Company,
1973.
[761] S.T. Yau. Problem section. In Seminar on Differential Geometry, volume 102 of Annals of Mathematics
Studies. Princeton University Press, 1982.
[762] R. Zach. Hilbert's Program then and now, pages 411–447. Elsevier, 2007.
[763] D. Zagier. Inequalities for the Gini coefficient for composite populations. Journal of Mathematical
Economics, 12:103–118, 1983.
[764] D. Zagier. A one-sentence proof that every prime p ≡ 1 (mod 4) is a sum of two squares. Amer. Math.
Monthly, 97:144, 1990.
[765] D. Zagier. A converse to Cauchy's inequality. American Mathematical Monthly, 102:919–920, 1995.
[766] D. Zagier. A refinement of a theorem of J.E. Littlewood. American Mathematical Monthly, 121:618, 2014.
[767] O. Zariski and P. Samuel. Commutative Algebra I, II. University Series in Higher Mathematics. Springer,
1958, 1960.
[768] D. Zeilberger. André's reflection proof generalized to many-candidates ballot problem. Discrete Mathematics,
44(3), 1983.
[769] D. Zeilberger. Gert Almkvist's generalization of a mistake of Bourbaki. Contemporary Mathematics,
143:609–612, 1993.
[770] D. Zeilberger. Proof of the alternating-sign matrix conjecture. Elec. J. Combin., 3(2), 1996.
[771] G.M. Ziegler. Lectures on Polytopes. Springer Verlag, 1995.
[772] W. Ziller. Riemannian manifolds with positive sectional curvature. In Geometry of Manifolds with
Non-negative Sectional Curvature. Springer, 2014. Lectures given in Guanajuato in 2010.
[773] R. Zimmer. Essential results of Functional Analysis. Chicago Lectures in Mathematics. 1990.
[774] J. D. Zund. George David Birkhoff and John von Neumann: A question of priority and the ergodic
theorems, 1931-1932. Historia Mathematica, 29:138–156, 2002.
[775] D. Zwillinger. CRC Standard Mathematical Tables and Formulas. CRC Press, 33rd edition, 2017.

Department of Mathematics, Harvard University, Cambridge, MA, 02138

Date: 7/22/2018, last update 6/25/2023.
