
MATH 2131

Honors in Linear and Abstract Algebra I

Fall 2023 Hong Kong University of Science and Technology

Ivan Ip

(Last updated: December 6, 2023)


Preface

This set of lecture notes summarizes the definitions, theorems, ideas of proofs and examples discussed
in class. It is an extended version of the Advanced Linear Algebra course I first taught at Kyoto
University in Fall 2017. A lot of topics have since been added, including the complete proofs of the
Jordan and Rational Canonical Forms with all the necessary prerequisites.

Although the course title includes “abstract algebra”, the usual theory of groups, rings and fields
will be left to the second part of the Honors series. We will only briefly introduce them in Section
1.1 to formally build up the necessary mathematical foundations, and this will be the only place
where abstract algebra is mentioned in this course.

Additional topics on quotient spaces and dual spaces are also added in the current version. These are
some of the most difficult concepts for beginning Mathematics major students, but they are still extremely
important. Therefore the expositions are slightly longer than in the rest of the notes, to give as much
motivation and detail as possible.

Even though this course focuses on the abstract setting, in order to cater to a broader audience, we
also include some topics that are of interest to industry, especially in data science. These include
e.g. the QR decomposition, least square approximation and singular value decomposition.

Keep in mind that these are lecture notes, not a textbook:

• Most of the proofs are presented in point form.

• This helps break down long proofs and makes the logic a bit clearer.
• You will almost never find such a presentation in other advanced mathematics textbooks.
• Hence after you understand the content, you are recommended to read some textbooks and
compare the style to get familiar with the “scientific way of writing” in mathematics.

Finally, no Exercises are included in the notes because I am lazy. But students are expected to fill
in the necessary details of the proofs and examples in these notes, and look for exercises in other
reference textbooks or on the math.stackexchange website.

Fall 2020
Ivan Ip
HKUST

Preface 2

Yeah I can teach this course again after 3 years of waiting!

Special thanks to Wu Chun Pui (Mike) and Choy Ka Hei (Jimmy) for proofreading the previous
version of the lecture notes and fixing various typos, together with some helpful suggestions.

Fall 2023
Ivan Ip
HKUST

Introduction

Real life problems are hard.

Problems with linearity are easy.

We make linear approximations to real life problems, and reduce the problems to systems
of linear equations where we can then use the techniques from Linear Algebra to solve for
approximate solutions. Linear Algebra also gives new insights and tools to the original problems.

Real Life Problems Linear Algebra


Optimization Problems −→ Tangent Spaces
Economics −→ Linear Regression
Stochastic Process −→ Transition Matrices
Engineering −→ Vector Calculus
Data Science −→ Principal Component Analysis
Signal Processing −→ Fourier Analysis
Artificial Intelligence −→ Deep Learning (Tensor Calculus)
Computer Graphics −→ Euclidean Geometry

Roughly speaking,
Real Life Problems Linear Algebra
Data / Data Sets ←→ Vectors / Vector Spaces
Relationships between Data Sets ←→ Linear Transformations

In this course, we will focus on the axiomatic approach to linear algebra, and study the
descriptions, structures and properties of vector spaces and linear transformations.

With such a mindset, the usual theory of solving linear equations with matrices can be considered
as a very straightforward specialization of the abstract setting.

Mathematical Notations
Numbers:

• R : Real numbers

• Q : Rational numbers

• C : Complex numbers x + iy where x, y ∈ R and i^2 = −1

• N : Natural numbers {1, 2, 3, ...}

• Z : Integers {..., −2, −1, 0, 1, 2, ...}

Logical symbols:

• ∀ : “for every”

• ∃ : “there exists”

• := : “is defined to be”

• =⇒ : “implies”

• ⇐⇒ or iff : “if and only if”. A iff B means

– if: B =⇒ A
– only if: A =⇒ B

• WLOG: “Without loss of generality”

Sets:

• x ∈ X : x is an element in the set X

• S ⊂ X : S is a subset of X

• S ⊊ X : S is a subset of X but not equal to X

• X = Y : X ⊂ Y and Y ⊂ X

• X × Y : Cartesian product, the set of pairs {(x, y) : x ∈ X, y ∈ Y }

• |S| : Cardinality (size) of the set S

• ∅ : Empty set

• X −→ Y : A map from X to Y

• x 7→ y : A map sending x to y (“x maps to y”)

• ι : X ,→ Y : Inclusion (when X ⊂ Y ), injection

• π : X ↠ Y : Surjection
Notational conventions in this lecture notes:

I will try to stick to these conventions most of the time, with a few exceptions.

• a, b, c, d scalars
• i, j, k, l (summation) indices (may start from zero)
• m, n, r dimension, natural numbers (always > 0 unless otherwise specified)
• f, g, h functions (usually in variable x)
• p, q, r polynomials (usually in variable t or λ)
• t, x, y, z variables
• u, v, w, ... (bold small letter) vectors
− ui , vj , wk ,... (unbold) coordinates
• ei standard basis vectors
• K base field (usually R or C)
• U, V, W vector spaces
(most of the time U ⊂ V and dim U = r, dim V = n, dim W = m)
• L (V, W ) the set of linear transformations from V to W
• S, T linear transformations
• I identity map
• O zero map
• Mm×n (K) the set of m × n matrices over the field K
• A, B, C, ... (bold capital letter) matrices
• I identity matrix
• O zero matrix
• B, E, S... (curly capital letter) bases, subsets
• λ, µ eigenvalues

Color Codes

Definition 0.1. This is a Definition.

Proposition 0.2. This is a Proposition.

Theorem 0.3. This is a Theorem.

Proof. This is a proof.

Corollary 0.4. This is a direct consequence of the Theorem above.

Example 0.1. This is an Example.

Non-Example 0.2. This is a Counterexample.

Note. This is an important note or a warning.

Notation. This is a note about notations.

Remark. This is a minor comment or optional material.

This is an important keypoint.

Contents

1 Abstract Vector Spaces
  1.1 Groups, Rings, Fields
  1.2 Vector Spaces
  1.3 Subspaces
  1.4 Linearly Independent Sets
  1.5 Bases
  1.6 Dimensions
  1.7 Direct Sums

2 Linear Transformations and Matrices
  2.1 Linear Transformations
  2.2 Injections, Surjections and Isomorphisms
  2.3 Matrices
  2.4 Fundamental Theorems of Linear Algebra
  2.5 Invertible Matrices
  2.6 Change of Basis
  2.7 Similar Matrices
  2.8 Vandermonde Matrices

3 Determinants
  3.1 Definitions
  3.2 Existence and Uniqueness
  3.3 Elementary Matrices
  3.4 Volumes
  3.5 Laplace Expansion
  3.6 Cramer’s Rule

4 Inner Product Spaces
  4.1 Inner Products
  4.2 Orthogonal Bases
  4.3 Orthogonal Projections
  4.4 Orthogonal Matrices
  4.5 QR Decomposition
  4.6 Least Square Approximation
  4.7 Gram Determinants

5 Spectral Theory
  5.1 Eigenvectors and Eigenvalues
  5.2 Characteristic Polynomials
  5.3 Diagonalization
  5.4 Symmetric Matrices

6 Positive Definite Matrices and SVD
  6.1 Positive Definite Matrices
  6.2 Singular Value Decompositions

7 Complex Matrices
  7.1 Adjoints
  7.2 Unitary Matrices
  7.3 Hermitian Matrices
  7.4 Normal Matrices

8 Invariant Subspaces
  8.1 Invariant Subspaces
  8.2 Cayley–Hamilton Theorem
  8.3 Minimal Polynomials
  8.4 Spectral Theorem of Commuting Operators

9 Canonical Forms
  9.1 Nilpotent Operators
  9.2 Jordan Canonical Form
  9.3 Rational Canonical Form

10 Quotient and Dual Spaces
  10.1 Quotient Spaces
  10.2 Dual Spaces

A Equivalence Relations

B Euclidean Algorithm

C Complexification

D Proof of Rational Canonical Forms
CHAPTER 1

Abstract Vector Spaces

1.1 Groups, Rings, Fields

In linear algebra, we study vector spaces over a field, i.e. a “number system” where you can add,
subtract, multiply and divide. To formally build up our theory, let us first define all the axioms
that we expect a number system to hold. (This is the only “abstract algebra” in this course.)

Definition 1.1. A set K is called a field if it has two binary operations

+ : K × K −→ K (addition)
∗ : K × K −→ K (product)

such that the following rules hold for all a, b, c ∈ K:

(+1) Addition is associative: (a + b) + c = a + (b + c).


(+2) Zero exists: ∃0 ∈ K such that a + 0 = a.
(+3) Inverse exists (subtraction): ∃b ∈ K such that a + b = 0. We write b = −a.
(+4) Addition is commutative: a + b = b + a.
(∗1) Product is associative: (a ∗ b) ∗ c = a ∗ (b ∗ c).
(∗2) Identity exists: ∃1 ∈ K such that a ∗ 1 = 1 ∗ a = a.
(∗3) Inverse exists (division): If a ̸= 0, ∃b ∈ K such that a ∗ b = b ∗ a = 1. We write b = a−1 .
(∗4) Product is commutative: a ∗ b = b ∗ a.
(∗5) Product is distributive: c ∗ (a + b) = c ∗ a + c ∗ b,
(a + b) ∗ c = a ∗ c + b ∗ c.


Notation. Usually the symbol ∗ will be omitted when we write the product.

Example 1.1. Important examples of fields include:

• Q, R, C.

• Q(√2): the set of all real numbers of the form a + √2 b where a, b ∈ Q.

• Rational functions with real coefficients.

• Finite fields: Z/pZ (arithmetic modulo p) where p is a prime number.
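As a quick aside (not from the notes), one can verify axiom (∗3) for the finite field Z/pZ numerically: every nonzero element has a multiplicative inverse, computed below via Fermat's little theorem a^(p−2) ≡ a^(−1) (mod p). A minimal Python sketch; the prime p = 7 is an arbitrary choice for illustration.

# Check that every nonzero a in Z/pZ has a multiplicative inverse mod p.
p = 7                          # an arbitrary prime (illustration only)
for a in range(1, p):
    inv = pow(a, p - 2, p)     # Fermat: a^(p-2) = a^(-1) mod p
    assert (a * inv) % p == 1
print("all nonzero elements of Z/%dZ are invertible" % p)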

In abstract algebra, one usually studies mathematical structures that satisfy only some of the
axioms. This leads to the notion of groups and rings.

Definition 1.2. A set with a single operation satisfying (∗1)–(∗3) only is called a group.
A group satisfying in addition (∗4) is called an abelian group.

Notation. (+1)–(+4) and (∗1)–(∗4) are identical under a change of notations:

a + b ←→ a ∗ b,   0 ←→ 1,   −a ←→ a^{-1}.

Usually we prefer using + for abelian groups and ∗ for general groups.

Example 1.2. Examples of groups:

• abelian: Z (under +), Z/nZ (under + mod n), {e^n : n ∈ Z} (under ×).

• non-abelian: permutations (under composition), SL(n, R) (n × n matrices of determinant 1
under matrix multiplication).

Definition 1.3. A set with two operations satisfying (+1)–(+4) and (∗1), (∗2), (∗5) is called a
ring . This means we can add, subtract and multiply only.

A ring satisfying in addition (∗4) is called a commutative ring .

A field is just a commutative ring that allows division (i.e. taking inverse of non-zero elements).

Example 1.3. Examples of rings:

• commutative: Z, R[t] (polynomials in t), fields.

• non-commutative: Mn×n (R) (n × n matrices).


1.2 Vector Spaces

Let K be a field. In this course we will mainly consider K = R or C.

The definition of a vector space is nearly the same as that of a ring, however with the product
replaced by scalar multiplication:

Definition 1.4. A vector space over a field K is a set V together with two binary operations:

+ : V × V −→ V (addition)
• : K × V −→ V (scalar multiplication)

subject to the following 8 rules for all u, v, w ∈ V and c, d ∈ K:

(+1) Addition is commutative: u + v = v + u.


(+2) Addition is associative: (u + v) + w = u + (v + w).
(+3) Zero exists: ∃0 ∈ V such that u + 0 = u.
(+4) Inverse exists: ∃u′ ∈ V such that u + u′ = 0. We write u′ := −u.
(·1) Multiplication is associative: (cd) · u = c · (d · u).
(·2) Unity: 1 · u = u.
(·3) Multiplication is distributive: c · (u + v) = c · u + c · v.
(·4) Multiplication is distributive: (c + d) · u = c · u + d · u.

The elements of a vector space V are called vectors.

Note. Rules (+1)–(+4) just say that V is an abelian group under addition.
Also (+3) implies that V must be non-empty by definition.

Remark. One can also replace K by any ring R; the resulting structure is called an R-module. However,
its algebraic structure will be very different from that of a vector space, because we cannot do division in general,
so we cannot “rescale” vectors arbitrarily.


Notation. We will denote a vector by boldface u in these notes; when writing by hand you should mark
vectors in some other way (e.g. with an arrow or underline). Sometimes we will also omit the · for scalar
multiplication if it is clear from the context.

Note. Unless otherwise specified, all vector spaces in the examples below are over R.

Notation. Vectors in Rn are always written vertically (i.e. as n × 1 matrices).


The following properties follow from the definitions:

Proposition 1.5 (Uniqueness).

• The zero vector 0 ∈ V is unique.

• For any u ∈ V , the negative vector −u ∈ V is unique.

Proof. Straightforward play-around with the definition:

• If there were two zeros 0, 0′ , then 0 = 0 + 0′ = 0′ .

• If there were two additive inverses u′ , u′′ to u, then

u′ = 0 + u′ = (u′′ + u) + u′ = u′′ + (u + u′ ) = u′′ + 0 = u′′ .

Note that these proofs apply to any (abelian) group. You should identify how each of the = signs
follows from the axioms.

The existence of a unique negative vector allows us to do subtraction:

Proposition 1.6 (Cancellation Law). For any u, v, v′ ∈ V ,

u + v = u + v′ ⇐⇒ v = v′ .

In particular
u = u + v ⇐⇒ v = 0.

Proof. Add −u to both sides and use associativity and commutativity repeatedly.

Proposition 1.7. For any u ∈ V and c ∈ K,

c · u = 0 ⇐⇒ c = 0 or u = 0.

Proof. We need to show both directions:

(⇐=) • 0 · u = (0 + 0) · u = 0 · u + 0 · u. By cancellation law, 0 = 0 · u.


• c · 0 = c · (0 + 0) = c · 0 + c · 0. By cancellation law, 0 = c · 0.

(=⇒) • If c = 0 then there is nothing to prove.


• If c ̸= 0, then u = 1 · u = (c−1 · c) · u = c−1 · (c · u) = c−1 · 0 = 0.


As a consequence, we have:

Corollary 1.8. −u = (−1) · u.

Proof. Using Proposition 1.7,

• 0 = 0 · u = (1 + (−1)) · u = 1 · u + (−1) · u = u + (−1) · u.

• Since the negative vector is unique, (−1) · u = −u.

Examples of vector spaces over R:

Example 1.4. The zero vector space {0}.

Example 1.5. The space Rn , n ≥ 1 with the usual vector addition and scalar multiplication.

Example 1.6. C is a vector space over R.

 
Example 1.7. The subset {(x, y, z)^T ∈ R^3 : x + y + z = 0} ⊂ R^3 .

Example 1.8. The set C^0(R) of continuous real-valued functions f (x) defined on R.

Example 1.9. The set of real-valued twice-differentiable functions satisfying

f + d^2f/dx^2 = 0.

Examples of vector spaces over a field K:

Example 1.10. The zero vector space {0}.

Example 1.11. The space K n (n ≥ 1) of n × 1 column vectors in K.

Example 1.12. The ring K[t] of all polynomials with variable t and coefficients in K:
p(t) = a0 + a1 t + a2 t^2 + ... + an t^n
with ai ∈ K for all i.


Example 1.13. The ring Mm×n (K) of all m × n matrices with entries in K.
(In this chapter we will sometimes use matrices as examples. More details starting from Section 2.3.)

Example 1.14. More generally, if V is a ring containing a field K as a subring (i.e. sharing the
same addition and multiplication operators), then V is a vector space over K.

Counter-Examples: these are not vector spaces (under usual + and ·):

Non-Example 1.15. R is not a vector space over C.

Non-Example 1.16. The first quadrant {(x, y)^T : x ≥ 0, y ≥ 0} ⊂ R^2 .

Non-Example 1.17. The set of all invertible 2 × 2 matrices.

Non-Example 1.18. Any straight line in R2 not passing through the origin.

Non-Example 1.19. The set of polynomials of degree exactly n.

Non-Example 1.20. The set of functions satisfying f (0) = 1.

1.3 Subspaces

Let V be a vector space over a field K.

Definition 1.9. A subset U ⊂ V is called a subspace of V if U is itself a vector space.

To check whether a subset U ⊂ V is a subspace, we only need to check zero and closures.
Proposition 1.10. Let V be a vector space. A subset U ⊂ V is a subspace of V iff it satisfies all
the following conditions:

(1) Zero exists: 0 ∈ U.


(2) Closure under addition: u, v ∈ U =⇒ u + v ∈ U .
(3) Closure under multiplication: u ∈ U , c ∈ K =⇒ c · u ∈ U .


Note. By Proposition 1.7 and (3) (taking c = 0), (1) can be replaced by “U is non-empty”.

Proof. If U is a subspace, then it is a vector space, so (1)–(3) are satisfied by definition.

If U satisfies (1)–(3), then for the 8 rules of vector space, (+1)(+2)(·1)–(·4) are automatically
satisfied from V , while (1)=⇒(+3) and (3)=⇒(+4) by multiplication with −1.

Example 1.21. Every vector space has a zero subspace {0}.

Example 1.22. A plane in R3 containing the origin is a subspace of R3 .

Example 1.23. The set Kn [t] of all polynomials of degree at most n with coefficients in K, is a
subspace of the vector space K[t] of all polynomials with coefficients in K.

Example 1.24. Real-valued functions satisfying f (0) = 0 is a subspace of the vector space of all
real-valued functions.

Non-Example 1.25. Any straight line in R2 not passing through (0, 0) is not a subspace of R2 .

Non-Example 1.26. R^2 is not a subspace of R^3 .
However {(x, y, 0)^T : x, y ∈ R} ⊂ R^3 , which “looks exactly like” R^2 , is a subspace of R^3 .

Definition 1.11. Let S ⊂ V be a subset of vectors in V .

• A linear combination of S is any finite sum of the form

c1 v1 + ... + cr vr ∈ V, c1 , ..., cr ∈ K, v1 , ..., vr ∈ S.

• The set spanned by S is the set of all linear combinations of S, denoted by Span(S).

• If S = ∅, we define Span(S) := {0}.

It follows from definition that

Theorem 1.12. Span(S) is a subspace of V . In particular Span(V ) = V .


It is also easy to check if a subset is a subspace:

Theorem 1.13 (Subspace criterion). U ⊂ V is a subspace of V iff

(1) U is non-empty, and

(2) c ∈ K, u, v ∈ U =⇒ c · u + v ∈ U .

Proof. Follows directly from Proposition 1.10: taking c = 1 gives (2); taking v = 0 gives (3); and taking c = −1 and v = u for any u ∈ U (which exists since U is non-empty) gives 0 ∈ U , i.e. (1).

 
Example 1.27. The set U := {(a − 3b, b − a, a, b)^T ∈ R^4 : a, b ∈ R} is a subspace of R^4 , since every element
of U can be written as a linear combination of u1 := (1, −1, 1, 0)^T and u2 := (−3, 1, 0, 1)^T :

a u1 + b u2 ∈ R^4 .

Hence U = Span(u1 , u2 ) is a subspace by Theorem 1.12.

1.4 Linearly Independent Sets

Definition 1.14. A set of vectors {v1 , ..., vr } ⊂ V is linearly dependent if

c1 v1 + · · · + cr vr = 0

for some ci ∈ K, not all of them are zeros.

Linearly independent set are those vectors that are not linearly dependent:

Definition 1.15. A set of vectors {v1 , ..., vr } ⊂ V is linearly independent if

c1 v1 + · · · + cr vr = 0

implies ci = 0 for all i.


Example 1.28. A set of one element {v} is linearly independent iff v ̸= 0.

Example 1.29. A set of two elements {u, v} is linearly independent iff u, v are not multiples of
each other.

Example 1.30. Any set containing 0 is linearly dependent.

Example 1.31. A set of vectors is linearly dependent if one vector is a linear combination of other
vectors.

     
Example 1.32. The set of vectors {(1, 0, 0)^T , (0, 1, 0)^T , (0, 0, 1)^T } in R^3 is linearly independent.

Example 1.33. The set of vectors {(1, 2, 3)^T , (4, 5, 6)^T , (7, 8, 9)^T } in R^3 is linearly dependent.
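As an aside (NumPy is not assumed anywhere in these notes), Examples 1.32 and 1.33 can be checked numerically: a set of vectors is linearly independent iff the matrix having them as columns has rank equal to the number of vectors. A minimal sketch:

import numpy as np

# Columns of E are the vectors of Example 1.32; columns of A those of Example 1.33.
E = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
A = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
print(np.linalg.matrix_rank(E))   # 3 = number of vectors: linearly independent
print(np.linalg.matrix_rank(A))   # 2 < 3: linearly dependent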

Example 1.34. The set of polynomials {t^2 , t, 4t − t^2 } in R[t] is linearly dependent.

Example 1.35. The set {sin 2x, sin x cos x} in C^0(R) is linearly dependent.

Example 1.36. Linear dependence depends on the field. The set of functions {sin x, cos x, e^{ix} }
(x ∈ R) is linearly independent over R, but linearly dependent over C.

1.5 Bases

Definition 1.16. An ordered set of vectors B ⊂ V is a basis for V iff it satisfies

(1) B is a linearly independent set, and

(2) V = Span(B).

Remark. The plural of “basis” is “bases”.


Example 1.37. The columns of the n × n identity matrix In×n :

e1 := (1, 0, ..., 0)^T , e2 := (0, 1, ..., 0)^T , · · · , en := (0, 0, ..., 1)^T

form the standard basis E for K^n .

Example 1.38. The monomials {1, t, t^2 , ..., t^n } form the standard basis E for Kn [t].

Theorem 1.17 (Spanning Set Theorem). Let S = {v1 , ..., vr } ⊂ V and let U = Span(S).
(1) If one of the vectors, say vk , is a linear combination of the remaining vectors in S, then
U = Span(S \ {vk }).

(2) If U ̸= {0}, some subset of S is a basis of U .

Proof.

(1) We need to prove Span(S \ {vk }) ⊂ U and U ⊂ Span(S \ {vk }).

• By definition Span(S \ {vk }) ⊂ Span(S) = U .
• To show U ⊂ Span(S \ {vk }):
– WLOG, assume k = r and vr ∈ Span(v1 , ..., vr−1 ), so that

vr = a1 v1 + · · · + ar−1 vr−1 , for some ai ∈ K.

– Let u ∈ U . Then
u = c1 v1 + · · · + cr−1 vr−1 + cr vr , for some ci ∈ K
= c1 v1 + · · · + cr−1 vr−1 + cr (a1 v1 + · · · + ar−1 vr−1 )
= (c1 + cr a1 )v1 + · · · + (cr−1 + cr ar−1 )vr−1
∈ Span(S \ {vk }).

(2) If S is a basis of U , then there is nothing to prove. So assume S is not a basis of U .

• Since S spans U , this means S is not linearly independent.
• Hence there exist ai ∈ K, not all zero, such that a1 v1 + · · · + ar vr = 0.
• WLOG, assume ar ̸= 0, so that

vr = −(a1 /ar ) v1 − · · · − (ar−1 /ar ) vr−1 .

• This means U = Span(S \ {vr }) by (1).
• Repeat the process. Since U ̸= {0}, this process will stop.


Theorem 1.18 (Unique Representation Theorem). Let B = {b1 , ..., bn } ⊂ V .

• If B is a basis for V , then for any v ∈ V , there exist unique scalars c1 , ..., cn ∈ K such that

v = c1 b1 + ... + cn bn .

• Conversely, if every v ∈ V can be expressed uniquely as a linear combination of B, then B is
a basis for V .

Definition 1.19. The scalars c1 , ..., cn ∈ K are called the coordinates of v relative to the basis
B, and

[v]B := (c1 , ..., cn )^T ∈ K^n

is the coordinate vector of v relative to B.

Note. The coordinate vector depends on the order of the basis vectors in B.

Proof. (Unique Representation Theorem)

• Existence is trivial since V = Span(B).

• Uniqueness: If v = c′1 b1 + · · · + c′n bn for some other scalars c′1 , ..., c′n ∈ K, then
0 = (c1 − c′1 )b1 + · · · + (cn − c′n )bn .

• Since B is linearly independent, ci − c′i = 0 and hence c′i = ci for all i.

• Conversely, the statement just says that B spans V and is a linearly independent set.

Example 1.39. The coordinate vector of the polynomial p(t) = t^3 + 2t^2 + 3t + 4 ∈ K3 [t] relative
to the standard basis E = {1, t, t^2 , t^3 } is

[p(t)]E = (4, 3, 2, 1)^T ∈ K^4 .

Example 1.40. The columns of an invertible matrix A ∈ Mn×n (R) form a basis of Rn , because
Ax = b always has a unique solution (see Theorem 2.32).
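To make this concrete (an illustration not taken from the notes): when the basis vectors are the columns of an invertible matrix B as in Example 1.40, the coordinate vector [v]B is obtained by solving the linear system B x = v. A minimal NumPy sketch with an arbitrarily chosen basis of R^2:

import numpy as np

# Basis b1 = (1, 0)^T, b2 = (1, 2)^T of R^2, stored as the columns of B.
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])
v = np.array([3.0, 4.0])
coords = np.linalg.solve(B, v)     # [v]_B, i.e. the c with c1*b1 + c2*b2 = v
print(coords)                      # [1. 2.]
assert np.allclose(B @ coords, v)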


1.6 Dimensions

Theorem 1.20 (Replacement Theorem). If V = Span(v1 , ..., vn ) and S = {u1 , ..., um } is
a linearly independent set in V , then |S| = m ≤ n.

Remark. This Theorem is also known as the Steinitz Exchange Lemma.

Proof. (Outline)

• Let S ′ = {u1 , v1 , ..., vn }. One can replace some vi by u1 so that S ′ \ {vi } also spans V .
– This is because u1 = a1 v1 + · · · + an vn and ai ̸= 0 for some i.
– Hence vi = ai^{-1} (u1 − a1 v1 − · · · − an vn ) ∈ Span(S ′ \ {vi }), where the sum omits the i-th term.
– By the Spanning Set Theorem, V = Span(S ′ \ {vi }).
– Reindexing the vi ’s, WLOG we may assume V = Span(u1 , v2 , v3 , ..., vn ).
• Assume on the contrary that m > n.
– Repeating the above process we can replace all the v’s by u’s:
V = Span(u1 , v2 , v3 , ..., vn ) = Span(u1 , u2 , v3 , ..., vn ) = · · · = Span(u1 , u2 , ..., un )
(We have to use the assumption that {u1 , ..., un } is linearly independent.)
– Since um ∈ V = Span(u1 , ..., un ), the set {u1 , ..., um } cannot be linearly independent, a
contradiction to the assumption on S.

Applying this statement to different bases B and B ′ , which are both spanning and linearly
independent, we get

Theorem 1.21. If a vector space V has a basis of n vectors, then every basis of V must also
consist of exactly n vectors.

By the Spanning Set Theorem, if V is spanned by a finite set, it has a basis. Then the following
definition makes sense by Theorem 1.21.
Definition 1.22. If V is spanned by a finite set, then V is said to be finite dimensional ,
dim(V ) < ∞. The dimension of V is the number of vectors in any basis B of V :
B = {b1 , ..., bn } =⇒ dim V := |B| = n.

If V = {0} is the zero vector space, by convention dim V := 0 (and B = ∅.)

If V is not spanned by a finite set, it is infinite dimensional , and dim(V ) := ∞.


Notation. If the vector space is over the field K we will write dimK V . If it is over R or if the
field is not specified (as in Definition 1.22 above), we simply write dim V instead.

Example 1.41. dim Rn = n.

Example 1.42. dimK Kn [t] = n + 1, dimK K[t] = ∞.

Example 1.43. dimK Mm×n (K) = mn.

 
Example 1.44. Let V = {(x, y, z)^T ∈ R^3 : x + y + z = 0}. Then dim V = 2.

Example 1.45. The space of real-valued functions on R is infinite dimensional.

Example 1.46. dimR C = 2 but dimC C = 1. dimR R = 1 but dimQ R = ∞.

Theorem 1.23 (Basis Extension Theorem). Let dim V < ∞ and U ⊂ V be a subspace.
Any linearly independent set in U can be extended, if necessary, to a basis for U .
Also, U is finite dimensional and
dim U ≤ dim V.

Proof. Let {b1 , ..., br } ⊂ U be a linearly independent set.

• If it does not span U , there exists u ∈ U such that {b1 , ..., br , u} is linearly independent.

• If this still does not span U , repeat the process. It must stop by the Replacement Theorem.

Corollary 1.24. Let dim V < ∞. If U ⊂ V is a subspace and dim U = dim V , then U = V .

Proof. Assume dim U = dim V = n and take a basis B = {b1 , ..., bn } of U .

• If U ̸= V , then B does not span V , so it is not a basis of V .

• By the Basis Extension Theorem, B can be extended to a basis of V .

• But this implies dim V > n, a contradiction.


Example 1.47. Subspaces of R3 are classified as follows:

• 0-dimensional subspace: only the zero space {0}.

• 1-dimensional subspace: any line passing through the origin.

• 2-dimensional subspace: any plane passing through the origin.

• 3-dimensional subspace: only R^3 itself.

Summary, for dim V < ∞: a linearly independent set ⊆ a basis ⊆ a spanning set. The Basis Extension
Theorem enlarges a linearly independent set to a basis, the Spanning Set Theorem shrinks a spanning set
to a basis, and the Replacement Theorem bounds the size of a linearly independent set by the size of a
spanning set.

Finally, if we know the dimension of V , the following Theorem gives a useful criterion to check
whether a set is a basis:

Theorem 1.25 (Basis Criterions). Let dim V = n ≥ 1, and let S ⊂ V be a finite subset with
exactly n elements.

(1) If S is linearly independent, then S is a basis for V .

(2) If S spans V , then S is a basis for V .

Proof. By Theorem 1.21 any basis of V consists of n elements by assumption.

(1) If S is not spanning, by the Basis Extension Theorem we can extend S to a basis with > n
elements, contradicting the dimension.
(2) If S is not linearly independent, by the Spanning Set Theorem a subset of S with < n elements
forms a basis of V , contradicting the dimension.

Remark. Every vector space has a basis, even when dim V = ∞. However, this requires (and is equivalent
to) the Axiom of Choice, a fundamental axiom of (ZFC) set theory.


1.7 Direct Sums


Definition 1.26. Let U, W be subspaces of V .

• U ∩ W (as a set) is the intersection of U and W .

• U + W := {u + w : u ∈ U, w ∈ W } ⊂ V is the sum of U and W .

Note. By definition U + W = Span(U ∪ W ).

Proposition 1.27. U ∩ W and U + W are both vector subspaces of V .

Definition 1.28. Let U, W ⊂ V be subspaces. Assume

(1) V = U + W ,

(2) U ∩ W = {0}.

Then V is called the direct sum of U and W , written as

V = U ⊕ W.

The subspaces U, W are called the direct summands of V .

Note. The order of ⊕ does not matter: by definition U ⊕ W = W ⊕ U . However, their bases
will have different order: usually the basis vectors from the first direct summand come first.

   
Example 1.48. R^3 = {(x, y, 0)^T : x, y ∈ R} ⊕ {(0, 0, z)^T : z ∈ R}.

Example 1.49. {Polynomials} = {Constants} ⊕ {p(t) : p(0) = 0}.

Example 1.50. {Space of functions on R} = {Even functions: f (−x) = f (x)} ⊕ {Odd functions: f (−x) = −f (x)}.

Example 1.51. {Square matrices} = {Symmetric matrices: A^T = A} ⊕ {Anti-symmetric matrices: A^T = −A}.


Theorem 1.29 (Uniqueness of Direct Sum). V = U ⊕ W iff every v ∈ V can be written


uniquely as
v=u+w
for some u ∈ U and w ∈ W .

Proof. We prove the two directions:

(=⇒) If v = u + w = u′ + w′ , then
u − u′ = w′ − w, where u − u′ ∈ U and w′ − w ∈ W.
Since U ∩ W = {0}, u − u′ = w′ − w = 0 and hence u′ = u, w′ = w.
(⇐=) Let v ∈ U ∩ W . Then
v=v+0=0+v
so that by uniqueness v = 0, and hence U ∩ W = {0}.

Theorem 1.30 (Direct Sum Complement). Assume dim V < ∞ and let U ⊂ V be a subspace.
Then there exists another subspace W ⊂ V such that

V = U ⊕ W.

W is called the direct sum complement to U .

Proof. Let {u1 , ..., ur } be a basis of U .

• By the Basis Extension Theorem, we can extend it to a basis {u1 , ..., ur , w1 , ..., wm } of V .
• Then W := Span(w1 , ..., wm ) is a direct sum complement to U .

Note. The complement W is not unique: there are many ways to extend a basis.

Remark. Theorem 1.30 also holds when dim V = ∞, which again is equivalent to the Axiom of Choice.

Theorem 1.31 (Dimension Formula). If U, W ⊂ V are subspaces, then


dim U + dim W = dim(U + W ) + dim(U ∩ W ).
In particular
dim U + dim W = dim(U ⊕ W ).


Proof. The statement holds trivially if one of U, W is infinite dimensional.

Hence assume both U, W are finite dimensional, and let B = {b1 , ..., bm } be a basis of U ∩ W .

• By the Basis Extension Theorem, extend B to bases

{b1 , ..., bm , u1 , ..., uk } ⊂ U
{b1 , ..., bm , w1 , ..., wl } ⊂ W

where ui , wi ̸∈ U ∩ W .
• We claim that {b1 , ..., bm , u1 , ..., uk , w1 , ..., wl } is a basis of U + W .
– By definition it spans U + W .
– To show linear independence, let ci , di , d′i ∈ K be such that
c1 b1 + · · · + cm bm + d1 u1 + · · · + dk uk + d′1 w1 + · · · + d′l wl = 0.
– Rearranging, we have
c1 b1 + · · · + cm bm + d1 u1 + · · · + dk uk = −d′1 w1 − · · · − d′l wl ,
where the left-hand side lies in U and the right-hand side lies in W , so both expressions lie in U ∩ W ,
hence equal c′1 b1 + · · · + c′m bm for some c′i ∈ K.
– Since {b1 , ..., bm , w1 , ..., wl } is linearly independent, c′i = d′i = 0.
– Since {b1 , ..., bm , u1 , ..., uk } is linearly independent, ci = di = 0.
• Therefore the dimension formula reads
(k + m) + (l + m) = (k + l + m) + m.

Example 1.52. If U and W are two different planes passing through the origin in R^3 , then U ∩ W
must be a line and U + W = R^3 . The dimension formula then gives 2 + 2 = 3 + 1.
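Numerically (an aside, not from the notes): if the columns of BU span U and the columns of BW span W , then dim(U + W ) is the rank of the block matrix [BU | BW ], and dim(U ∩ W ) is recovered from the dimension formula. A minimal NumPy sketch for the two planes of Example 1.52 (the planes z = 0 and x = 0 are an arbitrary choice):

import numpy as np

BU = np.array([[1., 0.], [0., 1.], [0., 0.]])   # columns span U = {z = 0}
BW = np.array([[0., 0.], [1., 0.], [0., 1.]])   # columns span W = {x = 0}
dim_U   = np.linalg.matrix_rank(BU)                     # 2
dim_W   = np.linalg.matrix_rank(BW)                     # 2
dim_sum = np.linalg.matrix_rank(np.hstack([BU, BW]))    # dim(U + W) = 3
dim_cap = dim_U + dim_W - dim_sum                       # dim(U ∩ W) = 1
print(dim_U, dim_W, dim_sum, dim_cap)                   # 2 2 3 1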

Theorem 1.29 motivates the following construction:

Definition 1.32. Let U, W be two vector spaces over K (not necessarily a subspace of some V ).

The product of U, W is the set (Cartesian) product

U × W := {(u, w) : u ∈ U, w ∈ W }

together with component-wise binary operations:

(u, w) + (u′ , w′ ) := (u + u′ , w + w′ )
c · (u, w) := (c · u, c · w), c∈K

which makes U × W into a vector space with the zero vector (0, 0) ∈ U × W .


It is clear that:

Proposition 1.33. We have a direct sum decomposition

U × W = U′ ⊕ W′

where
U ′ := {(u, 0) : u ∈ U } ⊂ U × W
W ′ := {(0, w) : w ∈ W } ⊂ U × W

and dim(U × W ) = dim U + dim W.

Note. We see that U ′ and U are nearly identical, except the extra 0 component. Similarly for W
and W ′ . More precisely, they are isomorphic by the terminology in the next chapter.

Note. U ×W never equals U ⊕W even with U, W ⊂ V , because U ×W ⊂ V ×V but U ⊕W ⊂ V .

Remark. However, by abuse of notation:

• Some textbooks also write U × W as U ⊕ W without referencing a bigger vector space V , and call it
the external direct sum.
• On the other hand, if U, W ⊂ V , then V = U ⊕ W is called an internal direct sum.

We will always refer to the internal direct sum when we write U ⊕ W , i.e. U, W are subspaces of some vector
space V to begin with.

Remark. U × W is different from the tensor product U ⊗ W which we will not discuss in this course.

Example 1.53. R^3 can be expressed as both an internal and an external direct sum by:

R^3 = {(x, y, 0)^T : x, y ∈ R} ⊕ {(0, 0, z)^T : z ∈ R}
    = R^2 × R.


Finally, one can extend the direct sum to finitely many summands:

Definition 1.34. Let U1 , ..., Un be subspaces of V . Assume

(1) V = U1 + · · · + Un ,

(2) Uj ∩ Σ_{i̸=j} Ui = {0} for all j.

Then we say that V is a direct sum

V := U1 ⊕ · · · ⊕ Un .

Note. It is not enough to replace (2) by the condition Ui ∩ Uj = {0} for all i ̸= j. E.g. take the
x-axis and y-axis together with the line y = x in R^2 .

Remark. Condition (2) can be replaced by

Uj ∩ (U1 + · · · + Uj−1 ) = {0} for all j > 1.

Then it is easy to see that the direct sum is equivalent to V = (U1 ⊕ · · · ⊕ Un−1 ) ⊕ Un defined inductively.

A direct generalization of Theorem 1.29 is

Theorem 1.35 (Uniqueness of Direct Sum). V = U1 ⊕ · · · ⊕ Un iff every v ∈ V can be written


uniquely as
v = u1 + · · · + un
where ui ∈ Ui .

Equivalently, V is a direct sum iff any collection of nonzero vectors {u1 , ..., un } with ui ∈ Ui is linearly
independent.

Proof. Apply Theorem 1.29 to V = (U1 ⊕ · · · ⊕ Un−1 ) ⊕ Un and induction.

CHAPTER 2

Linear Transformations and Matrices

Now that we have defined the notion of vector spaces and have seen many examples of them, in
this chapter we study linear transformations, i.e. maps between vector spaces that have a certain
“good” property, namely that they preserve linearity.

We will see that when we choose a basis of our vector space in order to associate to each vector a
coordinate (see Definition 1.19), a linear transformation becomes the usual notion of a matrix.

2.1 Linear Transformations

Let V, W be vector spaces over a field K.

Definition 2.1. A linear transformation T is a map

T : V −→ W

such that for all vectors u, v ∈ V and scalars c ∈ K:

(1) T (u + v) = T (u) + T (v),

(2) T (c · v) = c · T (v).

V is called the domain, and W is called the target of T .

The set of all such linear transformations T : V −→ W is denoted by L (V, W ).

Note. As in Theorem 1.13, T is a linear transformation iff for all u, v ∈ V and c ∈ K,


T (c · u + v) = c · T (u) + T (v).


Remark. In some advanced texts,

• L (V, W ) is denoted by Hom(V, W ), which stands for homomorphisms.
• L (V, V ) is denoted by End(V ), which stands for endomorphisms.
• Linear transformations are also called (linear) operators, especially when dim V = ∞.

We can also define addition and scalar multiplication of linear transformations:

Definition 2.2. Let S ∈ L (V, W ) and T ∈ L (V, W ). Let c ∈ K. Then for any u ∈ V we define
• (S + T )(u) := S(u) + T (u) ∈ W ,

• (cT )(u) := c · T (u) ∈ W .


This makes L (V, W ) into a vector space.

By the linearity (i.e. (1)–(2)) of T , we have the following easy property:

Proposition 2.3. Any T ∈ L (V, W ) is uniquely determined by the images of the vectors in any basis B of V .

Example 2.1. The zero map O : V −→ W given by O(v) := 0.

Example 2.2. The identity map I : V −→ V given by I (v) := v.

Notation. We may sometimes include the subscript, as in IV , to emphasize its domain and target.

Example 2.3. Let U ⊂ V be a subspace and T ∈ L (V, W ). Then we have:


• restriction: T |U ∈ L (U, W ) defined by T |U (u) := T (u) ∈ W .
• inclusion: the restriction I |U ∈ L (U, V ) of the identity IV will be denoted by U ,→ V .

Example 2.4. Evt0 : Kn [t] −→ K: Evaluating a degree ≤ n polynomial at a fixed t = t0 ∈ K.

Example 2.5. Tr : Mn×n (K) −→ K: Taking the trace of a square matrix:

Tr(A) := Σ_{i=1}^{n} aii .

Example 2.6. (Linear) differential operators on the space C ∞ (R) of smooth functions on R.

Example 2.7. Associating to a smooth periodic function its Fourier series.


Since we just need to know the images of the basis vectors to determine a linear transformation,
when V = K^n and W = K^m we can record them in a matrix:

Definition 2.4. A linear transformation T : K^n −→ K^m is denoted by an m × n matrix

A = ( a1 | a2 | · · · | an ),

where the i-th column ai of A is the image T (ei ) of the standard basis vector ei ∈ K^n .
Conversely, we say that A ∈ Mm×n (K) represents T ∈ L (K^n , K^m ) if ai = T (ei ).

Since matrices represent linear maps, we can add matrices and multiply them by scalars according to
Definition 2.2, which is done component-wise. We will study matrices in detail in Section 2.3.

Definition 2.5. Let T ∈ L (V, W ). Then

• The kernel or null space of T is the subset:
Ker(T ) := {v ∈ V : T (v) = 0} ⊂ V.
• The image or range of T is the subset:
Im(T ) := {w ∈ W : w = T (v) for some v ∈ V } ⊂ W.

Notation. We also write Im(T ) as T (V ).

Theorem 2.6. Let T ∈ L (V, W ). Then


• The kernel of T is a subspace of V .

• The image of T is a subspace of W .

Proof. Just verify Proposition 1.10. Note that 0 lies in both spaces.

Example 2.8. The kernel of d/dx on C^∞(R) is the set of all constant functions.

Proposition 2.7. If S ∈ L (U, V ) and T ∈ L (V, W ), then the composition is linear:

T ◦ S ∈ L (U, W ).

In particular, L (V, V ) forms a ring (see Definition 1.3).

Notation. We write T^n := T ◦ T ◦ · · · ◦ T (n times) for n ∈ N.


2.2 Injections, Surjections and Isomorphisms

Definition 2.8. A linear transformation T : V −→ W is

• one-to-one or injective if T (u) = T (v) implies u = v;

• onto or surjective if for every w ∈ W , there exists v ∈ V such that T (v) = w;

• an isomorphism if T is one-to-one and onto.

Definition 2.9. If there exists an isomorphism T ∈ L (V, W ), we say V is isomorphic to W ,


written as V ≃ W .

Proposition 2.10. Let T ∈ L (V, W ). Then the following holds:

(1) T is surjective ⇐⇒ T (V ) = W .

(2) T is surjective ⇐⇒ it maps spanning set to spanning set.

(3) T is injective ⇐⇒ Ker(T ) = {0}, i.e. T (v) = 0 =⇒ v = 0.

(4) T is injective ⇐⇒ it maps linearly independent set to linearly independent set.

(5) T is injective ⇐⇒ dim T (U ) = dim U for any subspace U ⊂ V .

Proof. (1), (3) follow by definition. (1)=⇒(2) and (3)=⇒(4) by linearity of T .

To prove (5), check that if {b1 , ..., br } is a basis of U , then {T (b1 ), ..., T (br )} is a basis for T (U ).

Example 2.9. If U ⊂ V is a subspace, then the inclusion I |U : U ,→ V is injective.

Example 2.10. If T ∈ L (V, W ), then the same map viewed as V −→ T (V ) is surjective.

Example 2.11. If V = U ⊕ W is a direct sum and v = u + w is the unique decomposition (see


Theorem 1.29), then the map

π : V −→ W
v 7→ w

is surjective, called the natural projection map (onto W ).


Proposition 2.11. If T is an isomorphism, then dim V = dim W , and there exists a unique map
T −1 ∈ L (W, V ) which is also an isomorphism, such that

T −1 ◦ T = IV , T ◦ T −1 = IW .

In particular this makes V ≃ W an equivalence relation (see Appendix A).

Proof. By Proposition 2.10 (1) and (5), dim W = dim T (V ) = dim V .

• For any w ∈ W , we define T −1 (w) := v where T (v) = w.

– Such v exists since T is surjective.
– Such v is unique since T is injective.
– T −1 ◦ T (v) = T −1 (w) = v, which also implies T −1 is surjective.
– T ◦ T −1 (w) = T (v) = w, which also implies T −1 is injective.
• T −1 is linear:
T −1 (w1 + w2 ) = v1 + v2 = T −1 (w1 ) + T −1 (w2 ).
T −1 (cw) = cv = cT −1 (w).
• T −1 is unique: if there exists another inverse T ′ , then
T −1 = T −1 ◦ IW = T −1 ◦ T ◦ T ′ = IV ◦ T ′ = T ′ .

Note that by the uniqueness of the inverse, we have

Proposition 2.12. If S, T are isomorphisms, then S ◦ T is also an isomorphism with

(S ◦ T )−1 = T −1 ◦ S −1 .

Theorem 2.13. If B = {b1 , ..., bn } is a basis for a vector space V , then the coordinate mapping

ψV,B : V −→ K n
v 7→ [v]B

is an isomorphism V ≃ K n .

Proof. The map ψV,B is obviously linear, injective and surjective. The inverse is given by

(ψV,B)^{-1} : (c1 , ..., cn )^T ∈ K^n 7→ c1 b1 + · · · + cn bn ∈ V.


   
Example 2.12. {(x, y, 0)^T : x, y ∈ R} ̸= R^2 , but {(x, y, 0)^T : x, y ∈ R} ≃ R^2 .

Example 2.13. Kn [t] ≃ K^{n+1} .

Example 2.14. Mm×n (K) ≃ K^{mn} .

Example 2.15. C ≃ R^2 as a vector space over R.

Example 2.16. In Proposition 1.33, U × W = U ′ ⊕ W ′ , where U ≃ U ′ and W ≃ W ′ .

Now let V, W be finite dimensional vector spaces with dim V = n and dim W = m.
Given a basis B of V and B′ of W , if T (v) = w, then T can be represented by a matrix

[T ]^B_{B′} : K^n −→ K^m , [v]B 7→ [w]B′ .

By definition, the i-th column of [T ]^B_{B′} is given by [T (bi )]B′ where bi ∈ B.

Definition 2.14. The m × n matrix

[T ]^B_{B′} ∈ L (K^n , K^m )

is called the matrix representing T with respect to the bases B and B′ .

This can be summarized in the following commutative diagram: T maps V to W ; the coordinate maps
give V ≃ K^n (via B) and W ≃ K^m (via B′ ); and the bottom map K^n −→ K^m is [T ]^B_{B′} .
More precisely, [T ]^B_{B′} is the composition of linear maps

[T ]^B_{B′} = ψW,B′ ◦ T ◦ (ψV,B)^{-1} .

Notation. When V = W and B′ = B, we sometimes just write [T ]B instead of [T ]^B_B .
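As a concrete illustration of Definition 2.14 (not taken from the notes): the differentiation map D : K3 [t] −→ K2 [t], with respect to the standard monomial bases {1, t, t^2, t^3} and {1, t, t^2}, is represented by the 3 × 4 matrix whose i-th column is the coordinate vector of D applied to the i-th basis monomial. A minimal NumPy sketch, reusing the polynomial of Example 1.39:

import numpy as np

# Columns: [D(1)] = 0, [D(t)] = (1,0,0)^T, [D(t^2)] = (0,2,0)^T, [D(t^3)] = (0,0,3)^T.
D = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.]])
p = np.array([4., 3., 2., 1.])   # [p]_E for p(t) = 4 + 3t + 2t^2 + t^3
print(D @ p)                     # [3. 4. 3.], i.e. p'(t) = 3 + 4t + 3t^2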

Note. Definition 2.4 says that a matrix A, considered as a linear map in L (K^n , K^m ), represents
itself with respect to the standard bases.

So let us recall how matrices operate.


2.3 Matrices

Definition 2.15 (Matrix multiplication). If A = (aij ) is an m × n matrix, and B = (bij ) is an
n × r matrix, then the matrix product A · B is an m × r matrix with entries

(A · B)ij := Σ_{k=1}^{n} aik bkj , 1 ≤ i ≤ m, 1 ≤ j ≤ r,

i.e. the (i, j)-th entry is given by multiplying the entries of the i-th row of A with those of the j-th
column of B and summing.

Note. A vector is just an n × 1 matrix, hence one can multiply a vector with A from the left.

Using matrix multiplication, we verify directly that

Proposition 2.16. If A, B ∈ Mn×n (K), then Tr(A · B) = Tr(B · A).

Proof. Tr(A · B) = Σ_{i=1}^{n} (A · B)ii = Σ_{i=1}^{n} Σ_{k=1}^{n} aik bki = Σ_{k=1}^{n} Σ_{i=1}^{n} bki aik = Σ_{k=1}^{n} (B · A)kk = Tr(B · A).
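A quick numerical sanity check of Proposition 2.16, using two arbitrary random matrices (illustration only, not part of the notes):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
# Tr(AB) = Tr(BA) up to floating point error
assert np.isclose(np.trace(A @ B), np.trace(B @ A))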

A less-known fact is that matrix multiplication can also work “column-wise”:

Proposition 2.17. Multiplication of an m × n matrix A = ( a1 | a2 | · · · | an ) and an n × 1 vector
b = (b1 , ..., bn )^T can be written as:

A · b = b1 a1 + · · · + bn an .

Multiplication of an m × n matrix A and an n × r matrix B = ( b1 | b2 | · · · | br ) can be written as:

A · B = ( A · b1 | A · b2 | · · · | A · br ),

i.e. the j-th column of A · B is given by the vector A · bj .

Proof. Just check that the (i, j)-th entries on both sides are the same.
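Proposition 2.17 in code (a NumPy sketch with an arbitrary small matrix, for illustration only): A @ b coincides with the linear combination of the columns of A with coefficients given by the entries of b.

import numpy as np

A = np.array([[2., 4., 0.],
              [1., 2., 3.]])
b = np.array([1., -1., 2.])
# b1*a1 + b2*a2 + b3*a3, summing over the columns of A
combo = sum(b[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ b, combo)
print(A @ b)    # [-2.  5.]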


The exact relationship between matrices and linear transformations is given as follows.
Recall that by Definition 2.4, an m×n matrix A represents a linear transformation T : K n −→ K m ,
where the i-th column is the image of ei ∈ K n .

Proposition 2.18. Let A be the matrix representing a linear transformation T : K n −→ K m .


Then A · b = T (b).
If B is an n × r matrix representing another linear transformation S : K r −→ K n , then A · B
represents the linear transformation T ◦ S.

Proof. Let b ∈ K n . Note that b = b1 e1 + · · · + bn en in terms of the standard basis vectors.

• By definition, if aj is the j-th column of A, then

T (b) = T (b1 e1 + · · · + bn en ) = b1 T (e1 ) + · · · + bn T (en ) = b1 a1 + · · · + bn an ,

but this is exactly A · b by Proposition 2.17.

• By definition, the j-th column of the matrix representing T ◦ S is the image of ej :

(T ◦ S)(ej ) = T (S(ej )) = T (bj )

where bj is the j-th column of B. But by the above, T (bj ) = A · bj is precisely the j-th column
of A · B by Proposition 2.17.

Notation. Usually the dot · will be omitted for matrix multiplications.

More generally, Definition 2.14 implies that

Proposition 2.19. For S ∈ L (U, V ), T ∈ L (V, W ) and choosing bases B, B′ , B′′ for U, V, W
respectively, we have

[T ]^{B′}_{B′′} [v]B′ = [T (v)]B′′ , ∀v ∈ V,
[T ◦ S]^B_{B′′} = [T ]^{B′}_{B′′} [S]^B_{B′} .

This can be summarized in the commutative diagram for the composition of linear maps T ◦ S:
S maps U to V and T maps V to W , with coordinate isomorphisms U ≃ K^r (via B), V ≃ K^n (via B′ )
and W ≃ K^m (via B′′ ). On coordinates, the bottom row K^r −→ K^n −→ K^m is given by [S]^B_{B′}
followed by [T ]^{B′}_{B′′} , and the composite K^r −→ K^m is [T ◦ S]^B_{B′′} = [T ]^{B′}_{B′′} [S]^B_{B′} .

The kernel and image of a linear transformation represented by a matrix are given special names:

Definition 2.20. If T : K n −→ K m is represented by a matrix A, then

• The kernel of T is called the null space NulA. It is the set of all solutions to Ax = 0, a system of
m homogeneous linear equations in n unknowns. It is a subspace of the domain K^n .

• The image of T is called the column space ColA. It is the set of all linear combinations of
the columns of A. It is a subspace of the target K^m .

To find bases of NulA and ColA, we can use row and column operations to change the
matrix into a reduced echelon form as follows.

Definition 2.21. Row operations on a matrix A consist of (c ∈ K, c ̸= 0)

(1) Interchanging two rows: ri ←→ rj .

(2) Adding a scalar multiple of a row to another row: ri 7→ ri + c rj .

(3) Rescaling a row: ri 7→ c ri .

Column operations are defined similarly.

Notation. When we talk about row vectors, we will treat them as horizontal 1 × n vectors.

It follows by definition that

Proposition 2.22. If A′ is obtained from A by (a series of) row operations, then


NulA = NulA′ .

Similarly, if A′′ is obtained from A by column operations, then


ColA = ColA′′ .

Therefore it suffices to apply row operations to simplify the matrix into a simpler form; the most
useful one is known as the reduced row echelon form.
Definition 2.23. The row echelon form is a matrix such that

• all rows consisting of only zeroes are at the bottom, and

• the pivot (leading entry) of a nonzero row equals 1 and is strictly to the right of the pivot of
the row above it.

It is reduced if in addition each column containing the pivot has zeros in all its other entries.

The (reduced) column echelon form is defined similarly.


Note. By definition, the non-zero rows of a (reduced) row echelon form are linearly independent.

It will become clearer by demonstrating the definitions with an example.


 
Example 2.17. Consider the matrix A with rows (2, 4, 0, 2), (1, 2, 3, 4), (2, 4, 8, 10). Apply the
following row operations:

r1 7→ r1 − 2r2 :   rows become (0, 0, −6, −6), (1, 2, 3, 4), (2, 4, 8, 10)
r3 7→ r3 − 2r2 :   rows become (0, 0, −6, −6), (1, 2, 3, 4), (0, 0, 2, 2)
r1 7→ −(1/6) r1 :  rows become (0, 0, 1, 1), (1, 2, 3, 4), (0, 0, 2, 2)
r3 7→ r3 − 2r1 :   rows become (0, 0, 1, 1), (1, 2, 3, 4), (0, 0, 0, 0)
r1 ←→ r2 :         rows become (1, 2, 3, 4), (0, 0, 1, 1), (0, 0, 0, 0)

This is the row echelon form. The leading 1 of each nonzero row is a pivot.

We apply one further row operation to get the reduced row echelon form:

r1 7→ r1 − 3r2 :   rows become (1, 2, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0) =: A′ .

Now to find NulA = NulA′ , i.e. the solutions to A′ (x, y, z, t)^T = 0, we rewrite this as a system of
linear equations:

x + 2y + t = 0,
z + t = 0.

Solving for the variables corresponding to the pivots (i.e. the 1st and 3rd variables), we see that

x = −2y − t,
z = −t.


Therefore every solution to Ax = 0 is of the form

(x, y, z, t)^T = (−2y − t, y, −t, t)^T = y (−2, 1, 0, 0)^T + t (−1, 0, −1, 1)^T , ∀y, t ∈ R.

In other words,

NulA = Span( (−2, 1, 0, 0)^T , (−1, 0, −1, 1)^T )

and it is a 2-dimensional subspace of R^4 .
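The computation of Example 2.17 can be reproduced in SymPy (an aside; SymPy is not used elsewhere in these notes): Matrix.rref() returns the reduced row echelon form together with the indices of the pivot columns, and Matrix.nullspace() returns a basis of NulA.

from sympy import Matrix

A = Matrix([[2, 4, 0, 2],
            [1, 2, 3, 4],
            [2, 4, 8, 10]])
R, pivots = A.rref()
print(R)              # the reduced row echelon form A' computed above
print(pivots)         # (0, 2): the 1st and 3rd columns contain the pivots
print(A.nullspace())  # basis [(-2, 1, 0, 0)^T, (-1, 0, -1, 1)^T] of NulA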

Example 2.18. On the other hand, applying column operations instead, we have:

c2 7→ c2 − 2c1 :  columns become (2, 1, 2)^T , (0, 0, 0)^T , (0, 3, 8)^T , (2, 4, 10)^T
c4 7→ c4 − c1 :   columns become (2, 1, 2)^T , (0, 0, 0)^T , (0, 3, 8)^T , (0, 3, 8)^T
c4 7→ c4 − c3 :   columns become (2, 1, 2)^T , (0, 0, 0)^T , (0, 3, 8)^T , (0, 0, 0)^T

Obviously the two remaining nonzero columns are linearly independent, so that

ColA = Span( (2, 1, 2)^T , (0, 3, 8)^T )

and it is a 2-dimensional subspace of R^3 . We see that sometimes it is not necessary to go all the way
to the reduced echelon form to describe the column space or its dimension.
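Similarly (an aside), SymPy's Matrix.columnspace() returns a basis of ColA consisting of the pivot columns of the original matrix, matching the basis found in Example 2.18:

from sympy import Matrix

A = Matrix([[2, 4, 0, 2],
            [1, 2, 3, 4],
            [2, 4, 8, 10]])
print(A.columnspace())   # [ (2, 1, 2)^T, (0, 3, 8)^T ] as column matrices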

Another observation about row operations is the following:


Proposition 2.24. Row operations do not change the linear dependency of column vectors.
More precisely, if A = ( a1 | a2 | · · · | an ) and we have a linear dependence

c1 a1 + · · · + cn an = 0, ci ∈ K,

then after any row operations giving A′ = ( a′1 | a′2 | · · · | a′n ), we still have

c1 a′1 + · · · + cn a′n = 0

for the same coefficients ci ∈ K.

In particular, ColA is spanned by those columns of the original matrix which contain the pivots
after row operations (and these columns are linearly independent).

Example 2.19. Looking at Example 2.17 again, we can conclude that the 1st and 3rd column
vectors of A form a basis of ColA, as shown in Example 2.18.

Corollary 2.25. The reduced row echelon form of a matrix A is unique, and uniquely determines
a spanning set of NulA that is reduced column echelon.

Proof. By Proposition 2.24, the reduced row echelon form is uniquely determined by how each
column ck depends linearly on the previous columns {c1 , ..., ck−1 }: either it is independent of them,
or it is a specific linear combination of them.
Recall that

Definition 2.26. The transpose AT of an m × n matrix A is the n × m matrix with entries

(AT )ij := (A)ji , 1 ≤ i ≤ n, 1 ≤ j ≤ m.

Remark. Some physics books use the notation t A for transpose.


Therefore column operations and column echelon form of A are just row operations and row echelon
form of AT , so in fact we only need to use one of them.

(Usually row operations are easier to deal with because we learn to add vertically as a child.)

We also see from above that row operations are enough to calculate both NulA and ColA.

A useful fact about transpose is the following property:

Proposition 2.27. The transpose of a matrix product is given by

(AB)T = BT AT .

Proof. This can be seen directly by changing the indices in the matrix multiplication formula
(Definition 2.15).

2.4 Fundamental Theorems of Linear Algebra

Now let us focus on the dimensions of the image and kernel of T ∈ L (V, W ).

Definition 2.28.

• The rank of T is the dimension of the image of T .

• The nullity of T is the dimension of the kernel of T .

Let T ∈ L (K^n , K^m ) be represented by an m × n matrix A. Then

• The rank of A is the dimension of ColA.

• The row space is the space spanned by (the transposes of) the rows of A.
It is a subspace of K^n .

• The row rank is the dimension of the row space.

• The nullity of A is the dimension of NulA.

The next two Theorems form the Fundamental Theorems of Linear Algebra:


Theorem 2.29 (Column Rank = Row Rank). Rank of A = Row Rank of A.

Since row space of A is just the column space of AT , we can also rephrase Theorem 2.29 as

Rank of A = Rank of AT .

Proof. Example 2.17 and Proposition 2.24 tell us that dim ColA is given by the number of linearly
independent column vectors after either row operations or column operations. In both cases this number
equals the number of pivots of the echelon form, which is also the number of its nonzero rows, i.e. the row rank.

Theorem 2.30 (Rank–Nullity Theorem). Let dim V < ∞ and T ∈ L (V, W ). Then

dim Im(T ) + dim Ker(T ) = dim V.

Proof. If dim W < ∞, then we can represent T by a matrix. From the steps of Example 2.17, we
can conclude that after row operations,

ˆ dim Ker(T ) is the number of columns that do not contain pivots.

ˆ dim Im(T ) is the number of columns that contain pivots.

ˆ dim V is just the number of columns.

If dim W = ∞, then we observe that Im(T ) is still finite dimensional:

ˆ If {u1 , ..., un } is a basis of V , then Im(T ) = Span(T (u1 ), ..., T (un )).

Therefore T can be considered instead as a composition of linear maps


$$T : V \xrightarrow{\ T'\ } \mathrm{Im}(T) \hookrightarrow W.$$

Note that Im(T ) = Im(T ′ ) and Ker(T ) = Ker(T ′ ), so we can just apply the above arguments to T ′
instead.

Example 2.20. Using the matrix from Example 2.17 again, we see that A : R4 −→ R3 with

dim ColA = 2, dim NulA = 2, dim V = 4.

Remark. The Rank-Nullity Theorem still holds in the infinite dimensional case, in the sense that
if dim V = ∞, then either Im(T ) or Ker(T ) is infinite dimensional.
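
As a side check (not part of the notes' formal development), the dimension count in the Rank–Nullity Theorem is easy to verify numerically. The following NumPy sketch uses an arbitrary 3 × 4 matrix chosen only for illustration (it is not the matrix of Example 2.17).

```python
import numpy as np

# Hypothetical 3x4 matrix: the third row is the sum of the first two,
# so the rank is 2 and the nullity is 4 - 2 = 2.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 2.],
              [1., 3., 1., 3.]])

rank = np.linalg.matrix_rank(A)     # dim Im(T) = dim ColA
nullity = A.shape[1] - rank         # dim Ker(T), by the Rank-Nullity Theorem
print(rank, nullity, A.shape[1])    # rank + nullity = number of columns = dim V
```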


2.5 Invertible Matrices

We turn our attention to square matrix A representing a linear transformation T ∈ L (K n , K n ).

Definition 2.31. A square n×n matrix A ∈ Mn×n (K) is invertible if it represents an isomorphism
T : K n ≃ K n . By Proposition 2.11, there exists a unique n × n matrix A−1 representing T −1 such
that
A−1 · A = A · A−1 = In×n .

There are many ways to determine when a matrix is invertible. The main ones are the following:

Theorem 2.32 (Invertible Matrix Theorem). Let A be a square n × n matrix. Then A is


invertible if and only if any one of the statements hold:

(1) Columns of A form a basis of K n .

(2) ColA = K n (i.e. A is surjective).

(3) NulA = {0} (i.e. A is injective).

(4) Nullity of A = 0.

(5) Rank of A = Row Rank of A = n.

(6) The reduced row echelon form of A is the identity matrix In×n .

Proof. (1)–(6) are all equivalent by the Fundamental Theorems of Linear Algebra.
Let A be a square n × n invertible matrix. Then for any b ∈ K n , the equation

Ax = b

has a unique solution


x = A−1 b.

To solve Ax = b directly, since the reduced row echelon form of A is the identity matrix In×n , by
row operations, the system of linear equations (in x) can be reduced to

I·x=u

where x = u is obtained from b by the same row operations and gives us the solution.


In other words, if we write down the extended matrix as


 
A b
then after the same row operations, we will obtain
 
I u
so that the last column is our desired solution.

Now to find the inverse A−1 , assume that it is of the form


 
| | |
A−1 = 
 
c1 c2 · · · cn  .

| | |
Then we need to find the column vectors ci such that
Aci = ei .
Therefore, by the above method, we write the extended matrix as
   
A I = A e1 · · · en .
Then after the same row operations as above, we will obtain
   
I c1 · · · cn = I A−1
and the right half of the extended matrix will be the inverse of A.
   
Example 2.21. Consider
$$A = \begin{pmatrix} 2 & 0 & 2 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{pmatrix}.$$
For example, to solve $Ax = b$ where $b = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, we do row operations on the extended matrix:
$$\left(\begin{array}{ccc|c} 2 & 0 & 2 & 1 \\ 0 & 3 & 2 & 2 \\ 0 & 1 & 1 & 3 \end{array}\right)
\xrightarrow{r_3 \mapsto 3r_3}
\left(\begin{array}{ccc|c} 2 & 0 & 2 & 1 \\ 0 & 3 & 2 & 2 \\ 0 & 3 & 3 & 9 \end{array}\right)
\xrightarrow{r_3 \mapsto r_3 - r_2}
\left(\begin{array}{ccc|c} 2 & 0 & 2 & 1 \\ 0 & 3 & 2 & 2 \\ 0 & 0 & 1 & 7 \end{array}\right)$$
$$\xrightarrow{r_2 \mapsto r_2 - 2r_3}
\left(\begin{array}{ccc|c} 2 & 0 & 2 & 1 \\ 0 & 3 & 0 & -12 \\ 0 & 0 & 1 & 7 \end{array}\right)
\xrightarrow{r_1 \mapsto r_1 - 2r_3}
\left(\begin{array}{ccc|c} 2 & 0 & 0 & -13 \\ 0 & 3 & 0 & -12 \\ 0 & 0 & 1 & 7 \end{array}\right)
\xrightarrow{\substack{r_1 \mapsto \frac{1}{2}r_1 \\ r_2 \mapsto \frac{1}{3}r_2}}
\left(\begin{array}{ccc|c} 1 & 0 & 0 & -\frac{13}{2} \\ 0 & 1 & 0 & -4 \\ 0 & 0 & 1 & 7 \end{array}\right)$$
so that the (unique) solution is given by $x = \begin{pmatrix} -\frac{13}{2} \\ -4 \\ 7 \end{pmatrix}$.


To find the inverse, we apply the same row operations to the extended matrix
$$\left(\begin{array}{ccc|ccc} 2 & 0 & 2 & 1 & 0 & 0 \\ 0 & 3 & 2 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{r_3 \mapsto 3r_3}
\left(\begin{array}{ccc|ccc} 2 & 0 & 2 & 1 & 0 & 0 \\ 0 & 3 & 2 & 0 & 1 & 0 \\ 0 & 3 & 3 & 0 & 0 & 3 \end{array}\right)
\xrightarrow{r_3 \mapsto r_3 - r_2}
\left(\begin{array}{ccc|ccc} 2 & 0 & 2 & 1 & 0 & 0 \\ 0 & 3 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 & 3 \end{array}\right)$$
$$\xrightarrow{r_2 \mapsto r_2 - 2r_3}
\left(\begin{array}{ccc|ccc} 2 & 0 & 2 & 1 & 0 & 0 \\ 0 & 3 & 0 & 0 & 3 & -6 \\ 0 & 0 & 1 & 0 & -1 & 3 \end{array}\right)
\xrightarrow{r_1 \mapsto r_1 - 2r_3}
\left(\begin{array}{ccc|ccc} 2 & 0 & 0 & 1 & 2 & -6 \\ 0 & 3 & 0 & 0 & 3 & -6 \\ 0 & 0 & 1 & 0 & -1 & 3 \end{array}\right)
\xrightarrow{\substack{r_1 \mapsto \frac{1}{2}r_1 \\ r_2 \mapsto \frac{1}{3}r_2}}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & \frac{1}{2} & 1 & -3 \\ 0 & 1 & 0 & 0 & 1 & -2 \\ 0 & 0 & 1 & 0 & -1 & 3 \end{array}\right).$$
Therefore $A^{-1} = \begin{pmatrix} \frac{1}{2} & 1 & -3 \\ 0 & 1 & -2 \\ 0 & -1 & 3 \end{pmatrix}$. We verify that indeed $x = A^{-1}b$.
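
The arithmetic above is easy to double-check numerically; a minimal NumPy sketch with the same A and b as in Example 2.21:

```python
import numpy as np

A = np.array([[2., 0., 2.],
              [0., 3., 2.],
              [0., 1., 1.]])
b = np.array([1., 2., 3.])

A_inv = np.linalg.inv(A)                    # should be [[1/2, 1, -3], [0, 1, -2], [0, -1, 3]]
x = np.linalg.solve(A, b)                   # should be [-13/2, -4, 7]
print(A_inv, x)
print(np.allclose(A @ A_inv, np.eye(3)))    # True
```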

Example 2.22. An important case is the inverse of a 2 × 2 matrix: if $ad - bc \neq 0$, then
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

2.6 Change of Basis

Recall that for $v \in V$ and $B = \{b_1, ..., b_n\}$ a basis of $V$,
$$[v]_B = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in K^n$$
is the $B$-coordinate vector of $v$ if $v = c_1 b_1 + ... + c_n b_n$.

If $B' = \{b'_1, ..., b'_n\}$ is another basis of $V$, then
$$[v]_{B'} = \begin{pmatrix} c'_1 \\ \vdots \\ c'_n \end{pmatrix} \in K^n$$
is the $B'$-coordinate vector of $v$ if $v = c'_1 b'_1 + ... + c'_n b'_n$.


Recall from Definition 2.14 that


$$[v]_{B'} = [I]^{B}_{B'}\,[v]_B,$$
which can be explicitly written as follows.

Theorem 2.33 (Change of Basis Formula). There exists an $n \times n$ matrix $P^{B}_{B'}$ such that
$$[v]_{B'} = P^{B}_{B'}\,[v]_B,$$
where column-wise it is given by
$$P^{B}_{B'} := [I]^{B}_{B'} = \begin{pmatrix} | & | & & | \\ [b_1]_{B'} & [b_2]_{B'} & \cdots & [b_n]_{B'} \\ | & | & & | \end{pmatrix}.$$
$P^{B}_{B'}$ is called the change-of-coordinate matrix from $B$ to $B'$.

By interchanging B and B ′ , we can conclude that

Proposition 2.34. The $n \times n$ matrix $P^{B}_{B'}$ is invertible. We have
$$[v]_B = P^{B'}_{B}\,[v]_{B'}.$$
Hence the inverse is given by
$$P^{B'}_{B} = \left(P^{B}_{B'}\right)^{-1}.$$

In practice, to find $P^{B}_{B'}$, it is better to do the change of basis through the standard basis:
$$B \longrightarrow E \longrightarrow B'.$$
More precisely, if $B$ is a basis of $K^n$ and $E$ is the standard basis of $K^n$, then by definition
$$[b_i]_E = b_i.$$
Hence we simply have
$$P^{B}_{E} = \begin{pmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_n \\ | & | & & | \end{pmatrix}.$$

Proposition 2.35. For any bases $B, B', B''$ of $V$, we have
$$P^{B'}_{B''}\, P^{B}_{B'} = P^{B}_{B''}.$$
Therefore if $B, B'$ are bases of $K^n$, then
$$P^{B}_{B'} = \left(P^{B'}_{E}\right)^{-1} P^{B}_{E}.$$

Proof. It follows from Proposition 2.19 since it is just $[I]^{B'}_{B''}\,[I]^{B}_{B'} = [I]^{B}_{B''}$.

Example 2.23. Let $E = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$ be the standard basis of $R^2$. Let
$$B = \left\{ b_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},\ b_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}, \qquad
B' = \left\{ b'_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix},\ b'_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}$$
be two other bases of $R^2$. Then
$$P^{B}_{E} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
P^{B'}_{E} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},$$
$$P^{B}_{B'} = \left(P^{B'}_{E}\right)^{-1} P^{B}_{E}
= \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \begin{pmatrix} 0 & 2 \\ 1 & -1 \end{pmatrix}.$$
One can check that this obeys the formula from Theorem 2.33:
$$b_1 = 0 \cdot b'_1 + 1 \cdot b'_2, \qquad b_2 = 2 \cdot b'_1 + (-1) \cdot b'_2.$$
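
The formula of Proposition 2.35 is also convenient to evaluate numerically; a minimal NumPy sketch using the bases of Example 2.23:

```python
import numpy as np

# Columns are the basis vectors written in the standard basis E.
P_B_E  = np.array([[1., 1.], [1., -1.]])    # P^B_E  for B  = {b1, b2}
P_Bp_E = np.array([[1., 1.], [0., 1.]])     # P^B'_E for B' = {b1', b2'}

# P^B_{B'} = (P^{B'}_E)^{-1} P^B_E
P_B_Bp = np.linalg.inv(P_Bp_E) @ P_B_E
print(P_B_Bp)                               # [[0, 2], [1, -1]]
```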

2.7 Similar Matrices

Let dim V = n. Recall that a linear transformation T ∈ L (V, V ) can be represented by an n × n


matrix K n −→ K n with respect to a basis B of V :
[T ]B : [x]B 7→ [T x]B .
Also recall the change of basis from B to B ′ :
PB
B′ : [x]B 7→ [x]B′ .

Then Proposition 2.19 implies:

Theorem 2.36. Let $T \in L(V, V)$ be such that $[T]_B = A$, $[T]_{B'} = B$ and $P = P^{B}_{B'}$. Then
$$B = PAP^{-1}.$$

Conversely, if B = PAP−1 and A = [T ]B for some T ∈ L (V, V ), then B = [T ]B′ where B ′ = P(B).


In a commutative diagram, we have
$$\begin{array}{ccc}
[x]_B & \xrightarrow{\ A\ } & [Tx]_B \\
\big\downarrow P & & \big\downarrow P \\
[x]_{B'} & \xrightarrow{\ B\ } & [Tx]_{B'}
\end{array}$$

This leads to the following definition:

Definition 2.37. A is similar to B if there is an invertible matrix P such that

B = PAP−1 .

We write this as
A ∼ B.

In other words,

Similar matrices represent the same linear transformation with respect to different bases.

Since similar matrices represent the same linear transformation, it is clearly an equivalence re-
lation (see Appendix A). Hence we can simply say that A and B are similar .

An equivalence class [A] of similar matrices is identified precisely with the same linear transforma-
tion T they represent under different bases of V .

An important consequence of this discussion is that

Any quantities defined in terms of T only must be the same for similar matrices.

For T ∈ L (V, W ), if we choose a “nice basis” B, the matrix [T ]B can be very nice!

The whole point of Linear Algebra is to find nice bases for linear transformations.

When W = V , the best choice of such a “nice basis” is given by diagonalization, which means
under this basis, the linear transformation is represented by a diagonal matrix. We need to build
up more knowledge before learning the conditions that allow diagonalization in Chapter 5 and 7.

Remark. When V ̸= W , we have instead the Singular Value Decomposition, see Section 6.2.


Example 2.24. In $R^2$, let $E = \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$ be the standard basis, and let $B = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\}$ be another basis. Let T be the linear transformation represented in the standard basis E by
$$A = \frac{1}{5}\begin{pmatrix} 14 & -2 \\ -2 & 11 \end{pmatrix}.$$
Then T is diagonalized in the basis B: with the change-of-coordinate matrix
$$P = P^{B}_{E} = \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix},$$
we have
$$D := [T]_B = P^{-1}AP
= \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}^{-1} \cdot \frac{1}{5}\begin{pmatrix} 14 & -2 \\ -2 & 11 \end{pmatrix} \cdot \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}
= \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.$$
In summary, the same transformation T is represented by $A = [T]_E$ in the basis E, and by the diagonal matrix $D = [T]_B = P^{-1}AP$ in the basis B, where $P = P^{B}_{E}$.
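
A quick numerical check of the similarity D = P⁻¹AP above (a NumPy sketch using the same A and P as in Example 2.24):

```python
import numpy as np

A = np.array([[14., -2.], [-2., 11.]]) / 5.   # [T]_E
P = np.array([[1., -2.], [2., 1.]])           # P^B_E: columns are the basis B

D = np.linalg.inv(P) @ A @ P                  # [T]_B
print(np.round(D, 10))                        # diag(2, 3)
```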


2.8 Vandermonde Matrices

We conclude this section with an application about polynomials (over any field K).

We assume the obvious fact that a polynomial of degree n has at most n roots.

Proposition 2.38. Two polynomials in Kn [t] are the same if they agree on n + 1 distinct points.

Proof. If p1 (t) and p2 (t) are two polynomials of degree ≤ n that agree on n + 1 distinct points,
then p1 (t) − p2 (t) is a degree ≤ n polynomial with n + 1 distinct roots, hence must be 0.

Generalizing Example 2.4, we have

Definition 2.39. Given n + 1 distinct points t0 , ..., tn ∈ K (i.e. ti ̸= tj for i ̸= j), the evaluation
map is the linear transformation

$$T : K_n[t] \longrightarrow K^{n+1}, \qquad p(t) \mapsto \begin{pmatrix} p(t_0) \\ \vdots \\ p(t_n) \end{pmatrix}.$$

Note that by Proposition 2.38, T is injective from Kn [t] −→ K n+1 . Since both vector spaces have
the same dimension n + 1, T must be an isomorphism by Theorem 2.32.

Since T is a linear transformation, by choosing the standard basis E = {1, t, t2 , ..., tn } of Kn [t]
and E ′ = {e0 , e1 , ..., en } of K n+1 (notice the index!), we can represent it by a matrix:

Proposition 2.40. Given $n + 1$ distinct points $t_0, ..., t_n \in K$, $T$ is represented by the Vandermonde matrix:
$$[T]^{E}_{E'} = \begin{pmatrix}
1 & t_0 & t_0^2 & \cdots & t_0^n \\
1 & t_1 & t_1^2 & \cdots & t_1^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & t_n & t_n^2 & \cdots & t_n^n
\end{pmatrix},$$
and it is an invertible matrix.

Proof. The k-th column of [T ]EE ′ is the evaluation of the monomial tk at the points t0 , ..., tn .

Since T is an isomorphism, we can find its inverse. In other words, given n+1 points λ0 , ..., λn ∈ K,
we want to reconstruct a polynomial p(t) of degree n such that p(tk ) = λk .


Proposition 2.41. For $k = 0, 1, ..., n$, the image of the polynomial (of degree $n$)
$$p_k(t) := \frac{(t-t_0)(t-t_1)\cdots(t-t_{k-1})(t-t_{k+1})\cdots(t-t_n)}{(t_k-t_0)(t_k-t_1)\cdots(t_k-t_{k-1})(t_k-t_{k+1})\cdots(t_k-t_n)}
= \prod_{\substack{0 \le j \le n \\ j \ne k}} \frac{t-t_j}{t_k-t_j}$$
under $T$ is the standard basis vector $e_k \in K^{n+1}$.

Therefore the image of the polynomial
$$p(t) := \lambda_0 p_0(t) + \lambda_1 p_1(t) + \cdots + \lambda_n p_n(t)
= \sum_{k=0}^{n} \lambda_k \left( \prod_{\substack{0 \le j \le n \\ j \ne k}} \frac{t-t_j}{t_k-t_j} \right)$$
under $T$ is exactly $\begin{pmatrix} \lambda_0 \\ \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} \in K^{n+1}$. This is called the Lagrange interpolation polynomial.

In terms of the Vandermonde matrix, it follows that the $k$-th column of the inverse $\left([T]^{E}_{E'}\right)^{-1}$ is precisely given by the polynomial under the standard basis $E$ having evaluation $e_k$, i.e. the coefficients of $p_k(t)$ above (ordered as $E = \{1, t, t^2, ..., t^n\}$).
Example 2.25. For $n = 2$, we have
$$p_0(t) = \frac{(t-t_1)(t-t_2)}{(t_0-t_1)(t_0-t_2)}
= \frac{t_1 t_2}{(t_0-t_1)(t_0-t_2)} \cdot 1 + \frac{-t_1-t_2}{(t_0-t_1)(t_0-t_2)} \cdot t + \frac{1}{(t_0-t_1)(t_0-t_2)} \cdot t^2.$$
In other words,
$$[p_0(t)]_E = \frac{1}{(t_0-t_1)(t_0-t_2)} \begin{pmatrix} t_1 t_2 \\ -t_1-t_2 \\ 1 \end{pmatrix} \in K^3.$$
Similarly,
$$[p_1(t)]_E = \frac{1}{(t_1-t_0)(t_1-t_2)} \begin{pmatrix} t_0 t_2 \\ -t_0-t_2 \\ 1 \end{pmatrix}, \qquad
[p_2(t)]_E = \frac{1}{(t_2-t_0)(t_2-t_1)} \begin{pmatrix} t_0 t_1 \\ -t_0-t_1 \\ 1 \end{pmatrix}.$$
Therefore
$$\begin{pmatrix} 1 & t_0 & t_0^2 \\ 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \end{pmatrix}^{-1}
= \begin{pmatrix}
\frac{t_1 t_2}{(t_0-t_1)(t_0-t_2)} & \frac{t_0 t_2}{(t_1-t_0)(t_1-t_2)} & \frac{t_0 t_1}{(t_2-t_0)(t_2-t_1)} \\[4pt]
\frac{-t_1-t_2}{(t_0-t_1)(t_0-t_2)} & \frac{-t_0-t_2}{(t_1-t_0)(t_1-t_2)} & \frac{-t_0-t_1}{(t_2-t_0)(t_2-t_1)} \\[4pt]
\frac{1}{(t_0-t_1)(t_0-t_2)} & \frac{1}{(t_1-t_0)(t_1-t_2)} & \frac{1}{(t_2-t_0)(t_2-t_1)}
\end{pmatrix}.$$
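
In practice one rarely inverts the Vandermonde matrix symbolically; solving the linear system numerically already recovers the interpolating polynomial. The following NumPy sketch uses hypothetical sample points t_k and values λ_k chosen only for illustration.

```python
import numpy as np

t = np.array([0., 1., 2.])      # distinct points t_0, t_1, t_2 (hypothetical choice)
lam = np.array([1., 3., 2.])    # prescribed values lambda_0, lambda_1, lambda_2

# Vandermonde matrix with columns 1, t, t^2 (increasing powers, as in Proposition 2.40).
V = np.vander(t, N=3, increasing=True)

# Coefficients of the interpolating polynomial in the basis {1, t, t^2}.
coeffs = np.linalg.solve(V, lam)
print(coeffs)
print(np.allclose(np.polyval(coeffs[::-1], t), lam))   # p(t_k) = lambda_k
```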

CHAPTER 3

Determinants

3.1 Definitions

Let A be a square n × n matrix over a field K.

The determinant of A is a scalar that determines when a system of linear equations Ax = b


has a unique solution. Therefore it also allows us to check if A is invertible.

The determinant can be defined by specifying only the properties (axioms) we want it to satisfy.

Definition 3.1. The determinant is a function

det : Mn×n (K) −→ K

which satisfies the following properties:


(1) It is multilinear , i.e. it is linear with respect to any fixed columns:
– Addition:      
| | |
     
· · · u + v · · · = det · · · u · · · + det · · · v · · · .
det      
| | |

– Scalar multiplication: forc ∈ K,   


| |
   
· · · c · u · · · = c det · · · u · · · .
det    
| |

In particular, if a matrix A contains a zero column, det(A) = 0.


(2) It is alternating : interchanging two columns gives a sign:


   
| | | |
   
det · · · ui
 uj · · · = − det · · · uj
 
.
ui · · · 
| | | |

In particular, det(A) = 0 if the matrix A has two same columns.

(3) det(I) = 1.

Note. By considering  
| |
 
det 
· · · u + v u + v · · · 

| |
we see that if det(A) = 0 for any matrix A which has two same columns, then the alternating
properties hold. Therefore the two properties are equivalent.

Assume the determinant is well-defined. Then the properties allow us to calculate det A using
column operations. For example we have
     
| | | | | |
     
det 
· · · u i + cu j u j · · ·  = det · · · ui uj · · · + c det · · · uj uj · · ·
    
| | | | | |
 
| |
 
= det 
· · · u i u j · · · +0

| |
 
| |
 
= det 
· · · ui uj · · · .

| |


Proposition 3.2. The determinant of a triangular (in particular diagonal) matrix is the product
of its diagonal entries.

Proof. Let us consider lower triangular matrix. By multilinearity and column operations,

   
a11 0 · · · ··· 0 1 0 ··· ··· 0
   
 ∗ a22 0 ··· 0  ∗ 1 0 · · · 0
 
 
 . .. .. ..  (1) . . . . . .. 
det  . ∗ . . = a a · · · a det  .. ∗ . . .
 . . 
 11 22 nn  
 . . .. . .. ..
 .. ..  ..
 
 . 0  
 . . 0

∗ ··· ··· ∗ ann ∗ ··· ··· ∗ 1
 
1 0 ··· ··· 0
 
0 1
 0 · · · 0 
. . . . . .. 
. 0
(column) = a11 a22 · · · ann det  . . . .
. .. ..
 ..

 . . 0
0 ··· ··· 0 1
| {z }
=I
(3)
= a11 a22 · · · ann .

Similarly for upper triangular matrix. Note that the proof still works even for some aii = 0.

Example 3.1. We calculate by the properties:


   
1 2 3 c2 7→c2 −2c1 1 0 0
  c3 7→c3 −3c1  
4 5 6 ==== det 4
det    −3 −6 

7 8 8 7 −6 −13
 
1 0 0
c3 7→c3 −2c2  
==== det  4 −3 0 
7 −6 −1
= 1 · (−3) · (−1) = 3.

However, in order for these calculations to be well-defined, we first need to show that such
determinant function satisfying (1)-(3) actually exists and is unique.


3.2 Existence and Uniqueness

We show that the determinant function is well-defined and unique, by explicitly writing down its
formula and check that it satisfies the conditions.

Theorem 3.3 (Leibniz Expansion / Expansion by Permutations).

ˆ Define
$$\det A := \sum_{\sigma \in S_n} \epsilon(\sigma)\, a_{\sigma(1)1} a_{\sigma(2)2} \cdots a_{\sigma(n)n}$$

where
– Sn is the set of all permutations of {1, ..., n},
– the sign of the permutation is $\epsilon(\sigma) = +1$ if $\sigma \in S_n$ can be obtained by an even number of transpositions (i.e. exchanging 2 numbers), and $\epsilon(\sigma) = -1$ if it requires an odd number.
Then it satisfies (1)–(3) of Definition 3.1.

ˆ Any function D satisfying (1)–(2) is a scalar multiple c · det where c = D(I).

ˆ In particular det is the unique function satisfying (1)–(3).

Example 3.2. When n = 1:  


det a = a.

Example 3.3. When n = 2: !


a b
det = ad − bc.
c d

Example 3.4. When n = 3, it is also well-known:


 
a b c
 
det d e f  = aei + bf g + cdh − af h − bdi − ceg,

g h i

i.e. adding all the “forward diagonals” (cyclically) and subtracting the “backward diagonals”.
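
The Leibniz expansion translates directly into code, although it is only practical for very small n since it sums over n! permutations. A minimal Python sketch (the helper names below are ad hoc, not from any standard library):

```python
from itertools import permutations
import numpy as np

def sign(perm):
    """Sign of a permutation (tuple of 0-based indices), via counting inversions."""
    s = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                s = -s
    return s

def det_leibniz(A):
    """Determinant via the Leibniz expansion over all permutations."""
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[k], k] for k in range(n)])
               for p in permutations(range(n)))

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 8.]])
print(det_leibniz(A), np.linalg.det(A))   # both equal 3 (up to rounding)
```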

To explain the Leibniz expansion formula, we first show that if D is any function satisfying (1)–(3),
it must be of the form defined in the Theorem.


Proof. (Uniqueness.) Assume the function D satisfying (1)–(3) exists.


n
X
ˆ Note that each column aj of A can be written in its components as aij ei .
i=1

ˆ Then by multilinearity,
 
| |
 
D(A) = D 
a 1 · · · a n


| |
 
| |
X n n
X 
 
= D ai1 1 ei1 · · · ain n ein 

i1 =1 in =1 
| |
 
n | |
X  
= ai1 1 · · · ain n D 
ei1 · · · ein 

i1 ,...,in =1
| |

summing over all possible indices.


ˆ By alternating properties, most of the terms vanish (with repeated columns), except those
where (i1 , i2 , ..., in ) do not repeat, i.e. it is a permutation of {1, ..., n}.

ˆ Hence  
| |
X  
D(A) = aσ(1)1 aσ(2)2 · · · aσ(n)n D 
eσ(1) · · · eσ(n) 

σ∈Sn
| |
where Sn is the set of permutations of {1, ..., n}.

ˆ Again by alternating properties, we can swap the columns to bring it back to the identity
matrix, picking up a sign ± which depends on σ. More precisely,
 
| |
 
Deσ(1) · · · eσ(n)  = ϵ(σ)D(I)

| |

where the sign ϵ(σ) of permutation σ is defined as in the Theorem.

Therefore any function satisfying (1)–(2) must be of the form


X
D(A) = ϵ(σ)aσ(1)1 aσ(2)2 · · · aσ(n)n D(I)
σ∈Sn

and condition (3) requires D(I) = 1 which will fix the formula.

It remains to show that this expression is indeed well-defined, i.e. it satisfies (1)–(3).

Proof. (Existence.) Define det by the Leibniz expansion formula.

(1) For each fixed k, since each term in the expansion contains exactly one term aσ(k)k with
second index k, the expression is multilinear in the k-th column.
(If ak = u + cv then aσ(k)k 7→ uσ(k) + cvσ(k) .)
(2) If the k-th and l-th column of A are the same, then the terms in the expansion

ϵ(σ) · · · aik · · · ajl · · · and ϵ(σ ′ ) · · · ajk · · · ail · · ·

differs by a sign
ˆ σ ′ is related to σ by swapping i and j,
ˆ aik = ail , ajk = ajl by assumption.
so they cancel in the expansion, and det(A) = 0.
(3) Obvious.
The uniqueness enables us to show easily the multiplicative property of det:

Corollary 3.4. For any n × n matrices A and B,

det(AB) = det(A) det(B).

Proof. Fix the matrix A.

ˆ Define a function D on matrices by


 
D(B) := det(AB) = det Ab1 · · · Abn

which is obviously multilinear and alternating in the columns of B.


ˆ Hence D satisfies (1)–(2), and so by the uniqueness part of the Theorem

D(B) = c det(B)

where c = D(I) = det(A), which implies the identity.

Corollary 3.5. A is invertible if and only if det(A) ̸= 0, and

det(A−1 ) = det(A)−1 .

Proof. If A is invertible, det(A) det(A−1 ) = det(I) = 1 =⇒ det(A) ̸= 0.

If A is not invertible, then column operations will bring A to the column echelon form with a zero
column, hence det A = 0.

Corollary 3.6. If A and B are similar, then

det(A) = det(B).

Proof. Assume B = PAP−1 . Then


det(B) = det(PAP−1 ) = det(P) det(A) det(P)−1 = det(A).

Therefore, the following is well-defined:

Definition 3.7. If dim V < ∞, the determinant of a linear transformation T ∈ L (V, V ) is


defined to be
det(T ) := det(A)
for any square matrix A representing T in some basis B of V .

Finally, we state a useful calculational result.

Corollary 3.8. The determinant of a block triangular (in particular block diagonal) matrix
is the product of determinant of its diagonal blocks:
!
Ak×k Bk×l
det = det(Ak×k ) det(Cl×l )
Ol×k Cl×l

where O is the zero matrix.

Proof. We have a matrix product decomposition


! ! !
A B I B A O
= .
O C O C O I

ˆ By column operations, we have


! !
I B I O
det = det .
O C O C
!
A O
ˆ As a function of A, det obviously satisfies (1)-(3), hence equals det A.
O I
!
I O
(Or just see it directly from the expansion formula.) Similarly det = det C.
O C


3.3 Elementary Matrices

Now that we know determinant is well-defined, we can compute it by column operations.

By representing the column operations as elementary matrices, we show that det A = det AT ,
so that in fact the determinant can also be computed by row operations.

Proposition 3.9. The column operations correspond to multiplying on the right by the
elementary matrices (c ∈ K, c ̸= 0):

i j
..
 
.
 
ˆ E =
 1 c i
 : Adding a multiple of i-th column to the j-th column.
 

 0 1 j

..
.

det(E) = 1, det(AE) = det A.

Note. We allow i > j, in which case c is on the lower left.

i
. 
..
ˆ E =
  : Scalar multiplying the i-th column by c.
 c i

..
.

det(E) = c, det(AE) = c · det A.

i j
..
 
.
 
ˆ E =
 0 1 i
 : Interchanging two columns.
 

 1 0 j

..
.

det(E) = −1, det(AE) = − det A.

.
Here the . . means it is 1 on the diagonal and 0 otherwise outside the part shown.


In exact analogy:

Proposition 3.10. Row operations correspond to multiplying on the left by elementary matrices.

We can now prove the following important result:

Theorem 3.11. For any square matrix A, we have

det A = det AT .

In other words:

We can calculate the determinant by both column and row operations.

Proof. Note that if E is an elementary matrix of the type listed, then

det E = det ET .

ˆ If det A = 0, then A is not invertible, hence AT is also not invertible (by Theorem 2.29),
hence det AT = 0.

ˆ Otherwise, we can reduce A to I by column operations

ˆ This is the same as saying we can obtain A from I by column operations, i.e.

A = IE1 E2 · · · Ek .

ˆ Taking transpose, we have


AT = ETk · · · ET2 ET1 I.

In other words, AT can be obtained from I by (the same) row operations.

ˆ Taking determinant, we obtain

det A = det(IE1 E2 · · · Ek )
= det(E1 ) det(E2 ) · · · det(Ek )
= det(ET1 ) det(ET2 ) · · · det(ETk )
= det(ETk · · · ET2 ET1 I) = det AT .


3.4 Volumes

When K = R, the determinant has a geometric meaning.

Theorem 3.12. det A is the n-dimensional signed volume of the parallelepiped

P := {c1 v1 + · · · + cn vn : 0 ≤ ci ≤ 1} ⊂ Rn

spanned by the columns vi of A (and | det A| is the volume).

Volume of a parallelepiped is defined inductively by Base Volume × Height.

In our notation, the base is formed by any n − 1 column vectors.

(Figure: a parallelepiped spanned by v1, v2, v3.)

Proof. By column operations, we can reduce the matrix to I, which corresponds to a hypercube
and we define its volume to be 1. Hence we only need to check that volume behaves like | det |
under column operations.

ˆ Adding multiples of one column to another means shearing :

(Figure: shearing the parallelepiped via v3 ↦ v3 + c·v2.)
It does not change the base and height, so that

Vol(P′ ) = Vol(P).


ˆ Scalar multiplying a column by c scales the parallelepiped:

(Figure: scaling the parallelepiped via v1 ↦ 2v1.)
Vol(P′ ) = |c|Vol(P).

ˆ Interchanging two columns does not change the parallelepiped.

The sign of the determinant tells us about orientation:

Definition 3.13. Let B = {v1 , ..., vn } ⊂ Rn be a basis.


 
| |
ˆ B is positively oriented if det 
 
v1 · · · vn  > 0.

| |

ˆ Otherwise B is negatively oriented .

Note. The parallelepiped formed by linearly dependent vectors has zero volume.
In this case the orientation is not defined.

Intuitively, this means the order of the vectors “looks like” the order of the standard basis.
In 3 dimensions, this is known as the right-hand rule (which is also used to determine the direction
of the vector cross product u × v.)

Interchanging two columns, or scalar multiply by negative numbers, will switch the orientation.


(Figure: examples of positively oriented bases {v1, v2} of R2 and {v1, v2, v3} of R3, and negatively oriented bases obtained by swapping two of the vectors.)
If a parallelepiped is spanned by positively oriented vectors, then after shearing, it becomes a
rotated rectangular box of the standard basis.

Recall that the image of the standard basis {e1 , ..., en } ∈ Rn under A is exactly its columns.
Therefore | det A| gives the scaling factor of a linear transformation. This gives a geometric
explanation of the product formula
det(AB) = det(A) det(B).

3.5 Laplace Expansion

An alternative and much more useful formula is the Laplace expansion formula.

Theorem 3.14 (Laplace Expansion / Expansion by Cofactors). For any i, j, we have

ˆ Expansion by row:
det(A) = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin .

ˆ Expansion by column:

det(A) = a1j C1j + a2j C2j + · · · + anj Cnj .

where

ˆ Cij := (−1)i+j det Aij are called the cofactors,

ˆ Aij is the submatrix obtained from A by deleting the i-th row and j-th column.


Proof. Since det A = det AT , we just need to prove one version. Let’s do expansion by column.

ˆ We write aj = a1j e1 + · · · + anj en .


ˆ By linearity,
det(A) = a1j D1 (A) + · · · + anj Dn (A)
where
··· ···
 
a11 0 a1n
 .. .. .. 
 . . . 
 
Di (A) = det  ai1 ··· 1 ··· ain  i
 
 .. .. .. 
 
 . . . 
an1 ··· 0 ··· ann
j

i.e. the j-th column is replaced by ei .


ˆ By column operations, we can remove the i-th row. The remaining entries equal Aij .
ˆ We can swap the 1 upward (one step at a time, which picks up a sign (−1)i−1 ) and leftward
(which picks up (−1)j−1 ) to bring it to the upper left corner and get
 
1 0 ··· 0
 
0 
Di (A) = (−1)i+j det  .  = (−1)i+j det(Aij ).
 
 .. Aij 
 
0

Example 3.5. Random example. Using the first row,
$$\det\begin{pmatrix} 0 & 2 & 3 \\ 1 & 0 & 1 \\ 0 & 3 & 2 \end{pmatrix}
= 0 \cdot \det\begin{pmatrix} 0 & 1 \\ 3 & 2 \end{pmatrix}
- 2 \cdot \det\begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}
+ 3 \cdot \det\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}
= 0 - 2 \cdot 2 + 3 \cdot 3 = 5,$$
or using the second column,
$$\det\begin{pmatrix} 0 & 2 & 3 \\ 1 & 0 & 1 \\ 0 & 3 & 2 \end{pmatrix}
= -2 \cdot \det\begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}
+ 0 \cdot \det\begin{pmatrix} 0 & 3 \\ 0 & 2 \end{pmatrix}
- 3 \cdot \det\begin{pmatrix} 0 & 3 \\ 1 & 1 \end{pmatrix}
= -2 \cdot 2 + 0 - 3 \cdot (-3) = 5.$$
Of course, the smart way is to choose the first column, because there are 2 zeros, giving us immediately $(-1)\det\begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix} = 5$.
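
The Laplace expansion also gives a direct (exponential-time) recursive algorithm. A minimal Python sketch expanding along the first row, meant only to illustrate Theorem 3.14:

```python
import numpy as np

def det_laplace(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 0 and column j
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[0., 2., 3.], [1., 0., 1.], [0., 3., 2.]])
print(det_laplace(A), np.linalg.det(A))   # both equal 5 (up to rounding)
```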


3.6 Cramer’s Rule

Given an n × n invertible matrix A, we can find the solution to Ax = b explicitly as follows.


 
x1
 . 
Theorem 3.15 (Cramer’s Rule). If A is invertible, the solution to Ax = b is given by x =  . 
 . 
xn
where
det(Ai )
xi =
det(A)
and Ai is obtained from A by replacing the i-th column with b.

Geometrically, if B := {a1 , ..., an } is the basis formed from the columns of A, then the amount xi it
requires to scale in the ai direction to obtain b is precisely the ratio between the volume spanned
by B and the volume of B ′ obtained from B by replacing ai with b.

 
x1
 . 
Proof. By definition, if x =  . 
 .  is a solution, then
xn
x1 a1 + · · · + xn an = b.
Therefore by multilinearity and the alternating properties,
i
 
| | |
 
| | |
det(Ai ) = deta1 ··· b ··· an = x det 
  
i
a1 · · · ai · · · an  = xi det(A).
 
| | | | | |

Remark. Cramer’s Rule is pretty useless in practice: it requires calculating n + 1 determinants of size
n × n, which is computationally difficult when n is large.

Example 3.6. Using the same matrix from Example 2.21, we want to solve
$$\begin{pmatrix} 2 & 0 & 2 \\ 0 & 3 & 2 \\ 0 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$

Then det(A) = 2 and
$$\det(A_1) = \det\begin{pmatrix} 1 & 0 & 2 \\ 2 & 3 & 2 \\ 3 & 1 & 1 \end{pmatrix} = -13, \quad
\det(A_2) = \det\begin{pmatrix} 2 & 1 & 2 \\ 0 & 2 & 2 \\ 0 & 3 & 1 \end{pmatrix} = -8, \quad
\det(A_3) = \det\begin{pmatrix} 2 & 0 & 1 \\ 0 & 3 & 2 \\ 0 & 1 & 3 \end{pmatrix} = 14.$$
Hence by Cramer's Rule, the solution is given by
$$x = \frac{1}{2}\begin{pmatrix} -13 \\ -8 \\ 14 \end{pmatrix} = \begin{pmatrix} -\frac{13}{2} \\ -4 \\ 7 \end{pmatrix}$$
as before.
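
Despite the remark above about its computational cost, Cramer's Rule is straightforward to implement; a short NumPy sketch reproducing Example 3.6 (the function name cramer is ad hoc):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule; assumes det(A) != 0."""
    n = A.shape[0]
    dA = np.linalg.det(A)
    x = np.empty(n)
    for i in range(n):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i-th column by b
        x[i] = np.linalg.det(Ai) / dA
    return x

A = np.array([[2., 0., 2.], [0., 3., 2.], [0., 1., 1.]])
b = np.array([1., 2., 3.])
print(cramer(A, b))                       # [-6.5, -4, 7], as in Example 2.21
```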

With Cramer’s Rule, we can easily derive a closed formula for the inverse A−1 as follows.

Recall the cofactors


Cij = (−1)i+j det(Aij )
where Aij is obtained from A by removing the i-th row and j-th column.

We form the cofactor matrix  


C := Cij
and the adjugate matrix
adj(A) := CT .

Theorem 3.16. If A is invertible, we have

adj(A)
A−1 = .
det(A)

Proof. The j-th column of A−1 equals the solution of Ax = ej , so the (i, j)-th entry is xi .

By Cramer’s rule,

 
| | |
xi det A = det Ai = deta1 ··· ej ··· an .
 

| | |
i

By Laplace expansion, this is the same as det(Aji ) with the appropriate sign (−1)i+j .

CHAPTER 4

Inner Product Spaces

In this chapter we focus on K = R. We define the geometric concepts of length, distance, angle
and perpendicularity for Rn . This gives Rn the structure of an Euclidean space.

4.1 Inner Products

As before, we write a point u ∈ Rn as a column vector, i.e. an n × 1 matrix.


−−→
Notation. We shall not distinguish a point P ∈ Rn and its position vector OP from the origin.

Definition 4.1. The dot product of u, v ∈ Rn , i.e.


   
u1 v1
 .  .
u= .  .
 . , v=
.
un vn

is defined to be  
v1 n
.. 
   X
u · v := uT v = u1 · · · un  .  =
 ui vi ∈ R.
i=1
vn

Notation. To avoid confusion, I will omit · for scalar multiplication: writing cu instead of c · u.


With this notation, we easily have

Proposition 4.2. If A = (aij ) is an m × n matrix, then the matrix entries are given by

aij = e′i · Aej

where {ej } is the standard basis for Rn and {e′i } is the standard basis for Rm .
   
− aT1 − | |
 ..   
 is an m × k matrix, and B = b1 · · · bn  is a k × n matrix, then the
If A = 
 .   
− aTm − | |
matrix product is given by  
a1 · b1 · · · a1 · bn
 . .. .. 
AB =  .
 . . . .
am · b1 · · · am · bn
This is just a restatement of Proposition 2.17.

Motivated from the obvious properties of dot product, a more general notion is the inner product:

Definition 4.3. An inner product is a binary operator ⟨ , ⟩ : V × V −→ R satisfying


for any u, v, w ∈ V and c ∈ R:

(1) Commutativity: ⟨u, v⟩ = ⟨v, u⟩.


(2) Linearity: ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩.
⟨cu, v⟩ = c⟨u, v⟩.
(3) Positivity: ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 ⇐⇒ u = 0.

A vector space V over R equipped with an inner product is called an inner product space.

Note. (1) implies that the inner product is also linear in the second argument.

Example 4.1. V = Rn equipped with the dot product as the inner product
⟨u, v⟩ := u · v

is called an Euclidean space.

Example 4.2. There can be other inner product on Rn . For example:

⟨x, y⟩ := c1 x1 y1 + · · · + cn xn yn

for any choice of ci > 0.


Example 4.3. Given two m × n matrices A, B,

⟨A, B⟩ := Tr(AT B)

satisfies Definition 4.3 and is an inner product. Under the isomorphism Mm×n (R) ≃ Rmn , it is just
the usual dot product.

Example 4.4. An example of inner product space that is infinite dimensional:


Let C 0 [0, 1] be the vector space of real-valued continuous function defined on [0, 1].
For f, g ∈ C 0 [0, 1],
Z 1
⟨f, g⟩ := f (x)g(x)dx
0

gives an inner product on C 0 [0, 1].

Remark. Infinite dimensional inner product space belongs to the theory of Hilbert spaces which is very
important in mathematical physics and functional analysis, but also much more complicated.

In this chapter, we will mostly focus on the finite dimensional case unless otherwise specified.

Notation. We will fix the dot product as the inner product of Rn in all the Examples below
unless otherwise specified.

Remark. When K = C, we need to modify Definition 4.3 so that it involves the complex conjugate:

⟨ , ⟩ : V × V −→ C

(1∗ ) : ⟨u, v⟩ = ⟨v, u⟩.


The dot product on Cn is then defined to be
 
u1
  n
   u2  X
u · v := v1 v2 ··· vn  .  = ui vi .
 
 . .
  i=1
un

This ensures the positivity (3) holds. Also note that we now have

⟨u, cv⟩ = ⟨cv, u⟩ = c⟨v, u⟩ = c⟨u, v⟩

so the inner product is conjugate-linear in the second argument.

We will focus on complex inner product space in Chapter 7.


Let V be an inner product space. Motivated from classical geometry, we have a list of analogies:

Definition 4.4. The norm (or length) of v ∈ V is the non-negative scalar


p
∥v∥ := ⟨v, v⟩ ∈ R≥0 .

Obviously, for c ∈ R or C, we have ∥cv∥ = |c|∥v∥.


p
For the dot product, this is just the usual Euclidean length ∥v∥ = v12 + · · · + vn2 .

Definition 4.5. A vector u ∈ V with unit length, i.e. ∥u∥ = 1 is called a unit vector .

Given $0 \neq v \in V$, the vector $\frac{1}{\|v\|}v$ has unit length and is called the normalization of $v$.

Example 4.5. $v = \begin{pmatrix} 1 \\ -2 \\ 2 \\ 0 \end{pmatrix} \in R^4$ has Euclidean length
$$\|v\| = \sqrt{1^2 + (-2)^2 + 2^2 + 0^2} = \sqrt{9} = 3,$$
and the unit vector $\frac{1}{3}v = \begin{pmatrix} \frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \\ 0 \end{pmatrix}$ is its normalization.

Definition 4.6. The distance between u, v ∈ V is defined by

dist(u, v) := ∥u − v∥.

Definition 4.7 (Law of Cosine). The angle 0 ≤ θ ≤ π between nonzero u, v ∈ V is defined by

⟨u, v⟩ = ∥u∥∥v∥ cos θ.

The angle θ is well defined by the Cauchy-Schwarz Inequality below.

Note. For general inner product, θ might not be the same as the Euclidean angle between two
vectors.


π
In analogy to the Euclidean case θ = , we say that
2
Definition 4.8. Two vectors u, v ∈ V are orthogonal (or perpendicular ) to each other if

⟨u, v⟩ = 0.

Example 4.6. $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$ are orthogonal to each other in $R^2$ with respect to the dot product.

Example 4.7. 0 is orthogonal to every vector in V by definition (even though θ is undefined).

Simple results from classical geometry still holds for general inner product space:

Theorem 4.9 (Pythagorean Theorem). Let u, v ∈ V . If ⟨u, v⟩ = 0, then

∥u + v∥2 = ∥u∥2 + ∥v∥2 .

Theorem 4.10 (Cauchy–Schwarz Inequality). For all u, v ∈ V ,

|⟨u, v⟩| ≤ ∥u∥∥v∥.

Theorem 4.11 (Triangle Inequality). For all u, v ∈ V ,

∥u + v∥ ≤ ∥u∥ + ∥v∥.

Proof.

(4.9) Expand both sides as inner products using Definition 4.4.

(4.10) ∥u + λv∥2 ≥ 0 for all λ. Expanding as inner products give a quadratic polynomial in λ.
Take the discriminant ∆ ≤ 0 gives the result.

(4.11) Squaring both sides and simplify, this is just 4.10.

4.2 Orthogonal Bases

Let V be an inner product space over R (which can be infinite dimensional).


Definition 4.12. Let S = {u1 , ..., ur } ⊂ V be a finite set.


ˆ S is called an orthogonal set if ⟨ui , uj ⟩ = 0 for all i ̸= j.

ˆ If in addition S is a basis of V , it is called an orthogonal basis for V .

ˆ If in addition all vectors in S has unit norm, it is called an orthonormal basis for V .

Example 4.8. The standard basis {e1 , e2 , ..., en } for Rn is an orthonormal basis:
(
1 if i = j,
ei · ej = δij =
0 if i ̸= j.

where δij is called the Kronecker delta.

Example 4.9. The set $\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right\}$ is an orthogonal basis for $R^2$.

Its rescaled version, the set $\left\{ \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix} \right\}$, is an orthonormal basis for $R^2$.

Proposition 4.13. Let B = {b1 , ..., bn } be an orthogonal basis for V (i.e. dim V = n). Then the
coordinate mapping with respect to B is given explicitly by
ψV,B : V −→ Rn
 
c1
.
v 7→ [v]B :=  .
.
cn

where ⟨v, bi ⟩
ci = , i = 1, ..., n.
⟨bi , bi ⟩

Proof. Let v = c1 b1 + · · · + cn bn and take inner product with bi on both sides.


An immediate consequence is:

Corollary 4.14. If T ∈ L (V, W ) is represented by a matrix A = (aij ) = [T ]BB′ with respect to a



basis B = {v1 , ..., vn } ⊂ V and an orthonormal basis B = {w1 , ..., wm } ⊂ W , then

aij = ⟨wi , T vj ⟩.

Proof. By definition the j-th column of A is [T vj ]B′ . Letting bi = wi in Proposition 4.13 gives the
i-th component of this column, which is exactly aij .

Sometimes the coordinates also make sense when V is infinite dimensional:

Example 4.10. Consider 2π-periodic smooth real functions on R with inner product
Z 2π
⟨f, g⟩ := f (x)g(x)dx.
0

Then the functions {cos nx, sin nx}n∈Z≥0 are orthogonal to each other. The coefficients under the
coordinate mapping is exactly the coefficients an , bn of the Fourier series.

Consider 2π-periodic smooth complex functions on R with complex inner product


Z 2π
⟨f, g⟩ := f (x)g(x)dx.
0

Then the functions {einx }n∈Z are orthogonal to each other. The coefficients under the coordinate
mapping is exactly the coefficients cn of the complex Fourier series.

4.3 Orthogonal Projections

Let V be an inner product space (which can be infinite dimensional), and let U ⊂ V be a subspace.

Definition 4.15. The orthogonal complement of U is the subset

U ⊥ := {v ∈ V : ⟨v, u⟩ = 0 for every u ∈ U }.

From the definition, we have the following simple properties:

Proposition 4.16. The following properties hold:

ˆ U ⊥ is a subspace of V .

ˆ U ⊂ (U ⊥ )⊥ .

ˆ u ∈ U ⊥ iff u is orthogonal to every vector in a spanning set of U .

In terms of matrices, we also see by definition that

Proposition 4.17. Let A ∈ Mm×n (R). Then

(RowA)⊥ = NulA, (ColA)⊥ = NulAT

with respect to the standard dot product on Rn and Rm .


Let u ∈ V . For any vector b ∈ V , we can project to u “perpendicularly”. In other words, we have
b = b∥ + b⊥
such that Proju (b) := b∥ is parallel to u (i.e. multiple of u), while b⊥ is perpendicular to u.

(Figure: b decomposed as Proj_u(b) + b⊥, where Proj_u(b) points along the unit vector e = u/∥u∥.)

Proposition 4.18. The orthogonal projection of b onto u is given by

⟨b, u⟩
Proju (b) := ⟨b, e⟩e = u
⟨u, u⟩
u
where e := is the unit normalization of u.
∥u∥

Proof. Let Proju (b) = ce, i.e. b = ce + b⊥ . Take inner product with e to get c = ⟨b, e⟩.
By straightforward induction, the Gram–Schmidt Process gives a simple algorithm to construct
an orthogonal basis from an arbitrary basis. It works for both K = R or C.

Theorem 4.19 (Gram–Schmidt Process). Let dim V = n and {x1 , ..., xn } be a basis for V .
Define

u1 := x1
u2 := x2 − Proju1 (x2 )
u3 := x3 − Proju1 (x3 ) − Proju2 (x3 )
..
.
un := xn − Proju1 (xn ) − Proju2 (xn ) − · · · − Projun−1 (xn )

where
⟨x, u⟩
Proju (x) := u
⟨u, u⟩
is the orthogonal projection (see Proposition 4.18).

Then B = {u1 , ..., un } is an orthogonal basis for V . Furthermore, for all 1 ≤ k ≤ n,

Span(u1 , ..., uk ) = Span(x1 , ..., xk ).
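
The Gram–Schmidt Process is easy to implement directly from the formulas above; a minimal NumPy sketch (the function name gram_schmidt is ad hoc, and the output is the orthogonal, not yet normalized, basis), tested on the matrix that reappears in Example 4.16:

```python
import numpy as np

def gram_schmidt(X):
    """Columns of X form a basis; returns a matrix whose columns are orthogonal
    and span the same subspace (Theorem 4.19, without normalization)."""
    U = np.zeros_like(X, dtype=float)
    for k in range(X.shape[1]):
        u = X[:, k].astype(float)
        for j in range(k):
            uj = U[:, j]
            u = u - (np.dot(X[:, k], uj) / np.dot(uj, uj)) * uj   # subtract Proj_{u_j}(x_k)
        U[:, k] = u
    return U

X = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.], [1., 1., 1.]])
U = gram_schmidt(X)
print(np.round(U.T @ U, 10))   # off-diagonal entries vanish: the columns are orthogonal
```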


To obtain an orthonormal basis, just normalize all the basis vectors:


ui
ui ⇝ .
∥ui ∥

In particular,

Corollary 4.20. Any finite dimensional inner product space has an orthonormal basis.

We can now establish the following important result, which gives a canonical choice (i.e. indepen-
dent of basis) of direct sum complement (see Theorem 1.30) if an inner product exists.

Theorem 4.21 (Orthogonal Decomposition Theorem). Let dim V < ∞ and U ⊂ V be a


subspace. Then each v ∈ V can be written uniquely in the form

v = v∥ + v⊥

where v∥ ∈ U and v⊥ ∈ U ⊥ . Therefore we have

V = U ⊕ U ⊥.

In particular U ∩ U ⊥ = {0}.

We also write ProjU (v) := v∥ . Note that ProjU ∈ L (V, V ) with

Im(ProjU ) = U, Ker(ProjU ) = U ⊥ .

Also if u ∈ U , then ProjU (u) = u.

Proof. Let B = {u1 , ..., ur } be an orthonormal basis of U (which exists by Corollary 4.20.)

ˆ Let v∥ = c1 u1 + · · · + cr ur .

ˆ In order for v − v∥ ∈ U ⊥ , we take inner product with ui ∈ U and get

ci = ⟨v, ui ⟩.

ˆ Therefore

v∥ = ⟨v, u1 ⟩u1 + · · · + ⟨v, ur ⟩ur


= Proju1 (v) + · · · + Projur (v)

and v⊥ := v − v∥ give us the unique decomposition.


Remark. V can be infinite dimensional. Theorem 4.21 only requires U to be finite dimensional.

Note. The uniqueness statement says that the orthogonal decomposition, i.e. the formula for v∥
does not depend on the choice of basis B of U used in the proof.

(Figure: the orthogonal decomposition v = Proj_U(v) + v⊥ for U = Span(u1, u2), with Proj_U(v) = Proj_{u1}(v) + Proj_{u2}(v).)

Corollary 4.22. If dim V < ∞ and U ⊂ V is a subspace, then

(U ⊥ )⊥ = U.

Proof. By Proposition 4.16, we only need to show that (U ⊥ )⊥ ⊂ U .

ˆ If v ∈ (U ⊥ )⊥ , then ⟨v, w⊥ ⟩ = 0 for any w⊥ ∈ U ⊥ .


ˆ Also by the Orthogonal Decomposition Theorem, v = v∥ + v⊥ where v∥ ∈ U and v⊥ ∈ U ⊥ .
ˆ Therefore ⟨v, v⊥ ⟩ = 0 =⇒ ⟨v⊥ , v⊥ ⟩ = 0 =⇒ v⊥ = 0.
ˆ So v = v∥ ∈ U .

Remark. If V is infinite dimensional, then in general U ⊊ (U ⊥ )⊥ .

By choosing an orthonormal basis with respect to the dot product, we can represent the projection
ProjU by a matrix:


Proposition 4.23. If {u1 , ..., ur } is an orthonormal basis for U ⊂ Rn with respect to the dot
product, then
ProjU (x) = (x · u1 )u1 + · · · + (x · ur )ur .
 
| |
 
Equivalently, if P =  u 1 · · · u r
 is an n × r matrix, then
 
| |

ProjU (x) = PPT x.

The matrix PPT is an n × n matrix, and by uniqueness it does not depend on the choice of
orthonormal basis used to construct P. In fact, it is an orthogonal projection matrix:

Definition 4.24. A projection matrix is an n × n matrix M such that

M2 = M.

It is an orthogonal projection matrix if in addition

MT = M.

   
Example 4.11. If $U = \mathrm{Span}(u_1, u_2)$ where $u_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, $u_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}$, then $\{u_1, u_2\}$ is an orthogonal basis for $U$ since $u_1 \cdot u_2 = 0$.

The normalization $\frac{u_1}{\|u_1\|} = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix}$, $\frac{u_2}{\|u_2\|} = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{pmatrix}$ is then an orthonormal basis for $U$. We have
$$P = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{pmatrix}$$
and therefore
$$\mathrm{Proj}_U = PP^T
= \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{pmatrix}
= \begin{pmatrix} \frac{5}{6} & \frac{1}{3} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{3} & -\frac{1}{3} \\ \frac{1}{6} & -\frac{1}{3} & \frac{5}{6} \end{pmatrix}.$$

A straightforward application of projection matrix is the following optimization method.

Theorem 4.25 (Best Approximation Theorem). Let U be a subspace of Rn and x ∈ Rn .


Then
∥x − ProjU (x)∥ ≤ ∥x − u∥, for any u ∈ U ,
i.e. ProjU (x) ∈ U is the closest point in U to x.

Proof. By the Pythagorean Theorem,

∥x − x∥ ∥2 + ∥x∥ − u∥2 = ∥x − u∥2

since (x − x∥ ) ∈ U ⊥ and (x∥ − u) ∈ U are perpendicular to each other.

4.4 Orthogonal Matrices

Let V, W be inner product spaces.

Definition 4.26. A linear transformation T ∈ L (V, W ) between inner product spaces is called an
isometry if it preserves distance: ∥T v∥ = ∥v∥
for any vector v ∈ V .

Proposition 4.27. If T ∈ L (V, W ) is a linear isometry, then for any v, w ∈ V ,

(1) ⟨T v, T w⟩W = ⟨v, w⟩V (i.e. it preserves inner product).

(2) ⟨T v, T w⟩W = 0 ⇐⇒ ⟨v, w⟩V = 0 (i.e. it preserves right angle).

(3) T is injective.

In particular if dim V = dim W are finite dimensional, then T is an isomorphism.

Notation. We may omit the subscript of the inner product later if it is clear from the context.

Proof.

(1) Expanding ∥T (v + w)∥2 = ∥v + w∥2 and simplify.

(2) Follows from (1).

(3) If T v = 0, then ∥v∥ = ∥T v∥ = 0 =⇒ v = 0.


If V is finite dimensional, then every inner product is equivalent:

Theorem 4.28. If V is an inner product space, dim V = n, then it is isometrically isomorphic


to the Euclidean space Rn with the dot product.

Proof. By the Gram–Schmidt Process, V has an orthonormal basis B = {u1 , ..., un }. Then the
coordinate mapping (see Theorem 2.11)
ψV,B : V −→ Rn
ui 7→ ei
is the required isomorphism, which is clearly an isometry.
We say that a matrix A represents T between inner product spaces if it is with respect to
orthonormal bases and the dot product of Euclidean space under the above isomorphism.

Proposition 4.29. If T ∈ L (V, W ) is an isometry represented by an m × n matrix P under the


above isometric isomorphism:
T
V W
≃isom ≃isom
P
Rn Rm
Then
ˆ PT P = In×n .

ˆ P has orthonormal columns.

Proof. The (i, j)-th entry equals


eTi PT Pej = (Pei ) · (Pej ) = ⟨T ui , T uj ⟩W = ⟨ui , uj ⟩V = ei · ej = δij .

Definition 4.30. If n = m, the square matrix P corresponding to a linear isometry under the
isometric isomorphism is called an orthogonal matrix . It is invertible with

P−1 = PT .

The set of n × n orthogonal matrices is denoted by O(n).

Theorem 4.31. The set of orthogonal matrices forms a group (see Definition 1.2), i.e.

ˆ In×n ∈ O(n).

ˆ If P ∈ O(n), then P−1 ∈ O(n).

ˆ If P1 , P2 ∈ O(n), then P1 P2 ∈ O(n).


Example 4.12. In R2 and R3 , an orthogonal matrix corresponds to a combination of rotations


and mirror reflections.

In R2 , all orthogonal matrices are of the form

ˆ Counterclockwise rotations by angle θ:
$$P = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

ˆ Mirror reflections across the line through the origin with slope $\tan\frac{\theta}{2}$:
$$P = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.$$

In R3 , any linear isometry can be decomposed into a rotation in the xy plane, followed by a rotation
in the yz plane, and possibly followed by a mirror reflection. This generalizes to higher dimension.

In general, if P is orthogonal, by Corollary 3.4 we have

1 = det(I) = det(PT P) = det(P)2

so that
det(P) = ±1.
Alternatively, note that the columns of P form a unit cube (length 1 and orthogonal to each other).
Therefore in terms of volume (see Chapter 3.4) the scaling factor is | det(P)| = 1.

The sign indicates whether it consists of a mirror reflection or not (i.e. whether the orientation
of the image is flipped). The set of all orthogonal matrices P with det P = 1 also forms a group,
called the special orthogonal group, denoted by SO(n).

Example 4.13. The change-of-coordinate matrix PB ′


B′ between orthonormal bases B and B is an
orthogonal matrix.

Example 4.14. The n × m matrix P from Proposition 4.23 such that the projection ProjU = PPT
is an isometry: it consists of orthonormal columns.

Non-Example 4.15. ProjU is in general not an isometry: it may not preserve lengths.


4.5 QR Decomposition
The Gram–Schmidt Process implies the following factorization, which is very important in compu-
tation algorithms, best linear approximations, and eigenvalues decompositions.

Theorem 4.32 (QR Decomposition). If A is an m×n matrix with linearly independent columns,
then
A = QR
where
ˆ Q is an m × n matrix with orthonormal columns forming a basis for ColA.
It is obtained from the columns of A by the Gram–Schmidt Process.

ˆ R is an n × n upper triangular invertible matrix with positive entries on the diagonal.


It can be computed by R = QT A.

Proof. Let x1 , ..., xn be the columns of A.


ˆ Rewriting the Gram–Schmidt Process, we see that
x 1 = u1
x2 = u2 + c12 u1
x3 = u3 + c23 u2 + c13 u1
.. ..
. .
xn = un + cn−1,n un−1 + · · · + c1n u1

for some scalars cij ∈ R. i.e. xk is a linear combination of {u1 , ..., uk } where the coefficient
of uk equals 1.

ˆ Hence we have the form


 
    1 ∗ ∗ ··· ∗
| | | | | |  
    0 1 ∗ · · · ∗
A=
x1 x2 · · · xn  = u1 u2 · · · un  .
 
 ... .. .. 
 

. .
| | | | | |  
0 ··· ··· 1
| {z }
upper triangular
ˆ Normalizing the columns, we can rewrite as
 
  ∥u1 ∥ ∗ ∗ ··· ∗
| | |  
 u u2
 0 ∥u2 ∥ ∗ · · · ∗ 
un  
A= 1
··· .

 ∥u1 ∥ ∥u2 ∥  ...
∥un ∥   ..
.
.. 
. 
| | |  
| {z } 0 ··· · · · ∥un ∥
Q | {z }
R

ˆ Finally, since Q is a linear isometry, $Q^T A = Q^T(QR) = I \cdot R = R$.


Example 4.16. As an explicit example, let
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}.$$
Let $x_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$, $x_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}$, $x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}$ be the columns of A.

Part 1: Gram–Schmidt Process.
$$u_1 = x_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$
$$u_2 = x_2 - \mathrm{Proj}_{u_1}x_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix} - \frac{3}{4}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} -3 \\ 1 \\ 1 \\ 1 \end{pmatrix}$$
$$u_3 = x_3 - \mathrm{Proj}_{u_1}x_3 - \mathrm{Proj}_{u_2}x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{6}\begin{pmatrix} -3 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 0 \\ -2 \\ 1 \\ 1 \end{pmatrix}.$$
Then $\{u_1, u_2, u_3\}$ is an orthogonal basis for ColA.

Part 2: Orthonormal basis and Q.

The vectors have lengths
$$\|u_1\| = 2, \qquad \|u_2\| = \frac{\sqrt{12}}{4}, \qquad \|u_3\| = \frac{\sqrt{6}}{3},$$
hence the corresponding orthonormal basis is
$$u'_1 = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad u'_2 = \frac{1}{\sqrt{12}}\begin{pmatrix} -3 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad u'_3 = \frac{1}{\sqrt{6}}\begin{pmatrix} 0 \\ -2 \\ 1 \\ 1 \end{pmatrix}.$$
Then Q is formed by $\{u'_1, u'_2, u'_3\}$, i.e.
$$Q = \begin{pmatrix} \frac{1}{2} & -\frac{3}{\sqrt{12}} & 0 \\[2pt] \frac{1}{2} & \frac{1}{\sqrt{12}} & -\frac{2}{\sqrt{6}} \\[2pt] \frac{1}{2} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{6}} \\[2pt] \frac{1}{2} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{6}} \end{pmatrix}
= \begin{pmatrix} 1 & -3 & 0 \\ 1 & 1 & -2 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} \frac{1}{2} & 0 & 0 \\ 0 & \frac{1}{\sqrt{12}} & 0 \\ 0 & 0 & \frac{1}{\sqrt{6}} \end{pmatrix}.$$

Part 3: The triangular matrix R.
$$R = Q^T A = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\[2pt] -\frac{3}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} \\[2pt] 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 2 & \frac{3}{2} & 1 \\[2pt] 0 & \frac{3}{\sqrt{12}} & \frac{2}{\sqrt{12}} \\[2pt] 0 & 0 & \frac{2}{\sqrt{6}} \end{pmatrix}.$$
Note that the diagonal of R is the same as the lengths $\|u_1\|, \|u_2\|, \|u_3\|$ used for the normalization in Part 2.
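
In practice the QR decomposition is computed by library routines rather than by hand. The following NumPy sketch factors the same A; note that numpy.linalg.qr may flip the signs of some columns of Q (and of the corresponding rows of R), so the output agrees with the hand computation only up to those signs.

```python
import numpy as np

A = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.], [1., 1., 1.]])

Q, R = np.linalg.qr(A)            # "reduced" QR: Q is 4x3, R is 3x3 upper triangular
print(np.round(Q, 4))
print(np.round(R, 4))
print(np.allclose(Q @ R, A))      # True
```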

4.6 Least Square Approximation

If A is an m × n matrix with m > n, then the system of linear equations

Ax = b

is over-determined and we may not have a solution.

If A has linearly independent columns, we can use the QR decomposition to find the best approximate solution $\hat{x}$, i.e. the vector for which $\|A\hat{x} - b\|$ is smallest.

By the Best Approximation Theorem, the closest point Ax ∈ ColA to b ∈ Rm should be ProjColA b.
But ColA has orthonormal basis given by columns of Q, so ProjColA = QQT . Hence

$$A\hat{x} = QQ^T b.$$

Using A = QR we obtain:


Theorem 4.33 (Least Square Approximation). If A is an m × n matrix with m > n and linearly independent columns, with QR decomposition A = QR, then
$$\hat{x} = R^{-1} Q^T b \in R^n$$
is the vector such that for every $x \in R^n$,
$$\|A\hat{x} - b\| \leq \|Ax - b\|.$$

Remark. It is easier to solve for $\hat{x}$ by
$$R\hat{x} = Q^T b$$
instead of finding $R^{-1}$ explicitly, because R is upper triangular, so that we can use the technique of backward substitution from elementary linear algebra.

 
Example 4.17. Continuing with Example 4.16, if $b = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$, then the vector $\hat{x}$ for which $\|A\hat{x} - b\|$ is smallest is obtained by solving $R\hat{x} = Q^T b$:
$$\begin{pmatrix} 2 & \frac{3}{2} & 1 \\[2pt] 0 & \frac{3}{\sqrt{12}} & \frac{2}{\sqrt{12}} \\[2pt] 0 & 0 & \frac{2}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\[2pt] -\frac{3}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} & \frac{1}{\sqrt{12}} \\[2pt] 0 & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}
= \begin{pmatrix} 5 \\[2pt] \frac{6}{\sqrt{12}} \\[2pt] \frac{3}{\sqrt{6}} \end{pmatrix}.$$
This is a very simple system of linear equations, and we can solve for $\hat{x}$ to get
$$\hat{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ \frac{3}{2} \end{pmatrix}.$$
Therefore
$$A\hat{x} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ \frac{3}{2} \end{pmatrix}
= \frac{1}{2}\begin{pmatrix} 2 \\ 4 \\ 7 \\ 7 \end{pmatrix}$$
is the closest approximation in ColA to $\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$.
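
Numerically, the least square solution can be obtained either from the QR factorization as in Theorem 4.33 or from a built-in solver; a NumPy sketch reproducing Example 4.17:

```python
import numpy as np

A = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.], [1., 1., 1.]])
b = np.array([1., 2., 3., 4.])

# Solve R xhat = Q^T b using the QR factorization ...
Q, R = np.linalg.qr(A)
xhat = np.linalg.solve(R, Q.T @ b)
print(xhat)                                    # [1, 1, 1.5]

# ... or call the built-in least-squares solver directly.
xhat2, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(xhat, xhat2))                # True
print(A @ xhat)                                # [1, 2, 3.5, 3.5]
```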


Remark. It is called the least square because ∥u∥ is computed by summing the squares of the coordinates,
and we want to find the smallest value of ∥Ax−b∥. This is very useful in regression problems in statistics,
where we want to fit the data onto a linear model as closely as possible.

4.7 Gram Determinants

Another application of QR decomposition is to give a formula for the volume of the parallelepiped
spanned by k vectors inside an n-dimensional vector space (k ≤ n).

Theorem 4.34. The k dimensional volume of the parallelepiped P spanned by {v1 , ..., vk } in Rn
is given by q
Vol(P ) = det(AT A).
 
| |
 
where A =  v 1 · · · v k
 ∈ Mn×k (R).
 
| |

The matrix AT A ∈ Mk×k (R) is known as the Gram matrix (or Gramian) of {v1 , ..., vk }, and
det(AT A) is called the Gram determinant.

Proof.

ˆ If A has linearly dependent columns, then

rank(AT A) ≤ rank(A) < k,

hence det(AT A) = 0.

ˆ Otherwise write A = QR, which means column operations changing the columns of A into
orthonormal columns, which has volume 1.

ˆ The scaling factor is det R > 0.

ˆ On the other hand, det(AT A) = det(RT QT QR) = det(RT R) = det(R)2 since R is upper
triangular.


 
Example 4.18. For two vectors $u, v \in R^n$, we have $A = \begin{pmatrix} | & | \\ u & v \\ | & | \end{pmatrix}$, and the Gram matrix is
$$A^T A = \begin{pmatrix} u \cdot u & u \cdot v \\ v \cdot u & v \cdot v \end{pmatrix}.$$
Hence
$$\mathrm{Vol}(P) = \sqrt{\|u\|^2\|v\|^2 - (u \cdot v)^2} = \sqrt{\|u\|^2\|v\|^2(1 - \cos^2\theta)} = \|u\|\|v\|\sin\theta, \qquad (0 \leq \theta \leq \pi)$$

as expected.
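
A short NumPy sketch of Theorem 4.34 for two hypothetical vectors in R³ (chosen only for illustration), cross-checked against the ∥u∥∥v∥ sin θ formula:

```python
import numpy as np

u = np.array([1., 2., 2.])
v = np.array([0., 1., 1.])
A = np.column_stack([u, v])

gram = A.T @ A
area = np.sqrt(np.linalg.det(gram))      # 2-dimensional volume of the parallelogram
print(area)                              # sqrt(2)

cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(np.linalg.norm(u) * np.linalg.norm(v) * np.sqrt(1 - cos_t**2))   # same value
```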

CHAPTER 5

Spectral Theory

In this chapter we will learn the important notion of eigenvectors and eigenvalues of matrices and
linear transformations in general. The study of eigenspaces of general linear transformations is
known as Spectral Theory.

In this chapter V = W over any field K, so that T ∈ L (V, V ) and T is represented by a square
matrix A if dim V < ∞.

5.1 Eigenvectors and Eigenvalues

Definition 5.1. Let T ∈ L (V, V ) be a linear transformation.

ˆ An eigenvector of T is a nonzero vector u ∈ V such that

T u = λu

for some scalar λ ∈ K.

ˆ λ is called the eigenvalue of the eigenvector u.

ˆ The space
Vλ := {u : T u = λu} ⊂ V
is called the eigenspace of the eigenvalue λ.

ˆ If a finite set of eigenvectors forms a basis of V , it is called an eigenbasis (of V or T ).

We have the same definition if T ∈ L (K n , K n ) is represented by an n × n matrix A.


Remark. eigen- from the German word meaning “own”, “characteristic”, “special”.

Proposition 5.2. If λ is an eigenvalue, Vλ = Ker(T − λI ) is a nonzero subspace of V .

Note. 0 ∈ Vλ although 0 is not an eigenvector by definition.

In particular, any linear combinations of eigenvectors with eigenvalue λ is again an eigenvector with
eigenvalue λ if it is a nonzero vector.

Proposition 5.3. If dim V < ∞, λ is an eigenvalue of T if and only if T − λI is not invertible.

Proof. Vλ ̸= {0} ⇐⇒ dim Ker(T −λI) ̸= 0 ⇐⇒ T −λI not invertible by the Rank-Nullity Theorem.

In particular when dim V < ∞, T is invertible if and only if 0 is not an eigenvalue of T .

Hence the general strategy to find eigenvalues and eigenvectors (when dim V < ∞) is:

Fix a basis B of V , and let A be a square matrix representing T .

Step 1. Find the set of eigenvalues λ by solving det(A − λI) = 0.

Step 2. For each eigenvalue λ, find the eigenspace by solving the linear equations (A − λI)x = 0.
Any nonzero vector of the eigenspace will be an eigenvector.

Example 5.1. Let $A = \begin{pmatrix} 1 & 1 \\ 4 & 1 \end{pmatrix}$. To find the eigenvalues,
$$\det\begin{pmatrix} 1-\lambda & 1 \\ 4 & 1-\lambda \end{pmatrix} = (1-\lambda)(1-\lambda) - 4 = \lambda^2 - 2\lambda - 3 = (\lambda - 3)(\lambda + 1) = 0,$$
hence $\lambda = 3$ or $\lambda = -1$.

For $\lambda = 3$, we have $\begin{pmatrix} -2 & 1 \\ 4 & -2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0 \Longrightarrow \mathrm{Span}\begin{pmatrix} 1 \\ 2 \end{pmatrix}$ is the eigenspace for $\lambda = 3$.

For $\lambda = -1$, we have $\begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0 \Longrightarrow \mathrm{Span}\begin{pmatrix} 1 \\ -2 \end{pmatrix}$ is the eigenspace for $\lambda = -1$.
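
Numerically, eigenvalues and eigenvectors are computed by library routines; a NumPy sketch for the matrix of Example 5.1 (the ordering of the returned eigenvalues is not guaranteed, and the eigenvectors come back normalized to unit length):

```python
import numpy as np

A = np.array([[1., 1.], [4., 1.]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)     # 3 and -1 (possibly in a different order)
print(eigvecs)     # columns are eigenvectors, proportional to (1, 2) and (1, -2)
```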


d
Example 5.2. Let T = on the space of real-valued smooth functions C ∞ (R). By solving
dx
f ′ (x) = λf (x)

we see that any λ ∈ R is an eigenvalue with eigenspace Vλ = Span(eλx ).

By the properties of determinant, we obviously have:

Proposition 5.4. The eigenvalues of a triangular matrix (in particular diagonal matrix) are
given by the entries on its main diagonal.

 
1 1 1
 
Example 5.3. A = 
 0 2 2  has eigenvalues λ = 1, 2, 3.

0 0 3

 
3 0 0
 
Example 5.4. A = 
0 1 0 has eigenvalues λ = 1, 3 only.

0 0 1
     
1 0 0
     
V3 = Span 0 is 1-dimensional, while V1 = Span{1 , 0} is 2-dimensional.
    
0 0 1

 
3 0 0
 
Example 5.5. A =   0 1 1  has eigenvalues λ = 1, 3 only.

0 0 1
   
1 0
   
V3 = Span 0 is 1-dimensional, but V1 = Span 1
  
 is also only 1-dimensional.
0 0


d
Example 5.6. The operator T = on R2 [t] is represented in the standard basis {1, t, t2 } by
dt
 
0 1 0
 
A= 0 0 2 .

0 0 0

Hence it only has a single eigenvalue λ = 0 with the constant polynomials as eigenvectors.

From Example 5.6, we see that

Eigenspaces in general do not span the whole vector space.

Example 5.7. Existence of! eigenvalues depend on the field K.


0 −1
For example, A = has no eigenvalues when K = R, but it has 2 eigenvalues when K = C:
1 0
! !
i −i
λ = i with eigenspace SpanC , and λ = −i with eigenspace SpanC .
1 1

Theorem 5.5. If v1 , ..., vr are eigenvectors of T corresponding to distinct eigenvalues λ1 , ..., λr ,


then the set {v1 , ..., vr } is linearly independent.

Proof. By induction. When r = 1 it is obvious.

ˆ Assume c1 v1 + · · · + cr vr = 0. Apply T , we get

c1 λ1 v1 + · · · + cr λr vr = 0.

ˆ Subtracting with λr (c1 v1 + · · · + cr vr ) = 0, we get

(λ1 − λr )c1 v1 + · · · + (λr−1 − λr )cr−1 vr−1 = 0.

By assumption (λi − λr ) ̸= 0 for 1 ≤ i ≤ r − 1.

ˆ By induction, c1 = · · · = cr−1 = 0. This also implies cr = 0.


5.2 Characteristic Polynomials

Recall that det(T ) is defined to be det(A) for any matrix A representing T , and it is independent
of the choice of A since similar matrices have the same determinant.

Hence everything below are defined intrinsically to T without any references to matrices.

Definition 5.6. Let dim V = n and T ∈ L (V, V ). The function in λ defined by

p(λ) := det(T − λI )

is a polynomial over K of degree n, called the characteristic polynomial .

Eigenvalues are the roots of the characteristic equation p(λ) = 0.

Remark. Some textbooks define the characteristic polynomial as p(λ) = det(λI − T ).

ˆ It has the advantage that p(λ) is monic (i.e. with leading coefficient 1), which is mathematically
more natural.
ˆ It has the disadvantage that you have to negate every entry of the matrix to compute p(λ).

But otherwise they are the same up to a sign, and give the same equation for λ.

It follows from direct computation that:

Proposition 5.7. For any n × n matrix A representing T , p(λ) satisfies:

ˆ The top term is λn with coefficient (−1)n .

ˆ The coefficient of λn−1 is (−1)n−1 Tr(A).

ˆ The constant term is det(A).

Next, we define the notion of multiplicity:

Definition 5.8.

ˆ The dimension of the eigenspace Vλ is called the geometric multiplicity .

ˆ The multiplicity of the root λ of p(λ) = 0 is called the algebraic multiplicity .


 
Example 5.8. If $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1\end{pmatrix}$, then the characteristic polynomial is

$$p(\lambda) = \det(A - \lambda I) = \det\begin{pmatrix}3-\lambda & 0 & 0\\ 0 & 1-\lambda & 1\\ 0 & 0 & 1-\lambda\end{pmatrix} = (1-\lambda)^2(3-\lambda).$$

Note that λ = 1 is a multiple root, hence the algebraic multiplicity of λ = 1 is 2.


 
On the other hand, the eigenspace $V_1$ is spanned by $\begin{pmatrix}0\\1\\0\end{pmatrix}$ only (see Example 5.5), so λ = 1 has geometric multiplicity = 1 only.

From Example 5.8, we conclude that

In general, geometric multiplicity ̸= algebraic multiplicity.

A field K is algebraically closed if any polynomial of degree n ≥ 1 over K has exactly n roots in K (counted with multiplicities). By the Fundamental Theorem of Algebra, K = C is algebraically closed.

Proposition 5.9. If K = C, the algebraic multiplicities of all eigenvalues add up to dimC V = n.


i.e. Every complex n × n square matrix A has n eigenvalues (counted with multiplicities).

Hence it follows from Proposition 5.7 that

Corollary 5.10. We have

ˆ Tr(A) is the sum of all (complex) eigenvalues (counted with algebraic multiplicities).

ˆ det(A) is the product of all (complex) eigenvalues (counted with algebraic multiplicities).

and we can take this as the definitions of Tr(T ) and det(T ), independent of the choice of A.
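As a quick numerical illustration of Corollary 5.10 (a sketch assuming NumPy is available): for a random real matrix, the sum and the product of the complex eigenvalues reproduce the trace and the determinant.

```python
import numpy as np

# Sketch: Tr(A) and det(A) versus the complex eigenvalues of a random 4x4 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
evals = np.linalg.eigvals(A)                          # 4 eigenvalues, counted with multiplicity
print(np.isclose(evals.sum(), np.trace(A)))           # True
print(np.isclose(evals.prod(), np.linalg.det(A)))     # True
```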


Since everything is defined in terms of T only, from the previous remark we conclude that:

Proposition 5.11. If A and B are similar, then the following holds:

ˆ They have the same determinant.

ˆ They have the same trace.

ˆ They have the same characteristic polynomial.

ˆ They have the same eigenvalues.

ˆ They have the same algebraic and geometric multiplicities.

5.3 Diagonalization

Definition 5.12. A square matrix A is diagonalizable if A is similar to a diagonal matrix, i.e.

A = PDP−1

for a diagonal matrix D and an invertible matrix P.

Diagonalization let us simplify many matrix calculations and prove algebraic theorems due to the
following properties:

Proposition 5.13. If A = PBP−1 , then for any integer k (including negative),

Ak = PBk P−1 .

In particular, for any polynomial p(t), p(A) ∼ p(B):

p(A) = P · p(B) · P−1 .

From the geometric point of view, if both A, B represent T in different bases, then clearly both of
p(A), p(B) represent p(T ) in those bases.


Example 5.9. If A is diagonalizable, then it is easy to compute its powers by Proposition 5.13:

Ak = PDk P−1 .

For example, let $A = \begin{pmatrix}4 & -3\\ 2 & -1\end{pmatrix}$. Then $A = PDP^{-1}$ where
$$P = \begin{pmatrix}3 & 1\\ 2 & 1\end{pmatrix}, \qquad D = \begin{pmatrix}2 & 0\\ 0 & 1\end{pmatrix}, \qquad P^{-1} = \begin{pmatrix}1 & -1\\ -2 & 3\end{pmatrix}.$$

Then for example
$$D^8 = \begin{pmatrix}2^8 & 0\\ 0 & 1^8\end{pmatrix} = \begin{pmatrix}256 & 0\\ 0 & 1\end{pmatrix}$$
and
$$A^8 = PD^8P^{-1} = \begin{pmatrix}3 & 1\\ 2 & 1\end{pmatrix}\begin{pmatrix}256 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}1 & -1\\ -2 & 3\end{pmatrix} = \begin{pmatrix}766 & -765\\ 510 & -509\end{pmatrix}.$$
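The power computation of Example 5.9 can be reproduced with a short NumPy sketch (NumPy assumed).

```python
import numpy as np

# A^8 via the diagonalization A = P D P^{-1} of Example 5.9.
A = np.array([[4.0, -3.0],
              [2.0, -1.0]])
P = np.array([[3.0, 1.0],
              [2.0, 1.0]])
D = np.diag([2.0, 1.0])
A8 = P @ np.linalg.matrix_power(D, 8) @ np.linalg.inv(P)
print(A8)                                              # [[766. -765.] [510. -509.]]
print(np.allclose(A8, np.linalg.matrix_power(A, 8)))   # True
```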

Theorem 5.14 (Diagonalization Theorem). An n × n matrix A is diagonalizable:

A = PDP−1

iff A has n linearly independent eigenvectors (i.e. K n has an eigenbasis of A). In this case:

ˆ The columns of P consist of eigenvectors of A, and

ˆ D is a diagonal matrix consisting of the corresponding eigenvalues.

Proof. If A is diagonalizable, then since P is invertible, its columns are linearly independent.
We have
      
| | | | | | λ1
AP = A 
      .. 
v1 · · · vn  = λ1 v1 · · · λn vn  = v1 · · · vn  
     .  = PD.

| | | | | | λn

Hence the columns of P are the eigenvectors and D consists of the corresponding eigenvalues.

The same equation proves the converse.



Since similar matrices represent the same linear transformation T with respect to different basis,
we say that

Definition 5.15. A linear transformation T is diagonalizable if V has an eigenbasis of T .

Since eigenvectors with distinct eigenvalues are linearly independent, by Theorem 1.35,

Corollary 5.16. T is diagonalizable if and only if V decomposes as a direct sum of eigenspaces

V = Vλ1 ⊕ · · · ⊕ Vλr

with distinct eigenvalues λ1 , ..., λr ∈ K.

 
Example 5.10. Let us diagonalize $A = \begin{pmatrix}3 & -2 & 4\\ -2 & 6 & 2\\ 4 & 2 & 3\end{pmatrix}$.

Step 1: Find eigenvalues. The characteristic equation is
$$p(\lambda) = \det(A - \lambda I) = -\lambda^3 + 12\lambda^2 - 21\lambda - 98 = -(\lambda-7)^2(\lambda+2) = 0.$$
Hence the eigenvalues are λ = 7 and λ = −2.

Step 2: Find eigenvectors. We find by the usual procedure the linearly independent eigenvectors:
$$\lambda = 7:\quad v_1 = \begin{pmatrix}1\\0\\1\end{pmatrix},\ v_2 = \begin{pmatrix}-1\\2\\0\end{pmatrix}, \qquad \lambda = -2:\quad v_3 = \begin{pmatrix}-2\\-1\\2\end{pmatrix}.$$

Step 3: P constructed from eigenvectors. Putting them in columns,
$$P = \begin{pmatrix}1 & -1 & -2\\ 0 & 2 & -1\\ 1 & 0 & 2\end{pmatrix}.$$

Step 4: D consists of the eigenvalues. Putting the eigenvalues according to $v_i$:
$$D = \begin{pmatrix}7 & 0 & 0\\ 0 & 7 & 0\\ 0 & 0 & -2\end{pmatrix}$$
and we have
A = PDP−1 .


We have seen in Theorem 5.5 that if the eigenvectors have different eigenvalues, then they are
linearly independent. Therefore by the Diagonalization Theorem,

Corollary 5.17. If T has n different eigenvalues, then it is diagonalizable.

 
3 4 5
 
Example 5.11. The matrix A =  0 0 7 is triangular, hence the eigenvalues are the diagonal

0 0 6
entries λ = 3, λ = 0 and λ = 6. Since they are all different, A is diagonalizable.

 
3 0 0
 
Non-Example 5.12. We have seen from Example 5.5 that the matrix A =  0 1 1  has two
0 0 1
eigenvalues λ = 1, 3 only, so we cannot apply Corollary 5.17. In fact, each eigenvalue corresponds only to a 1-dimensional eigenspace. Hence R3 does not have a basis formed by eigenvectors, and so A is not diagonalizable by the Diagonalization Theorem.

From the discussion of Non-Example 5.12, we can also deduce that

Corollary 5.18. If T is diagonalizable, the algebraic multiplicity equals the geometric multiplicity
for each eigenvalue λ.

Conversely, if the algebraic multiplicity equals the geometric multiplicity for every eigenvalue, and the algebraic multiplicities add up to dim V = n, then T is diagonalizable.

Example 5.13. Let p(λ) be the characteristic polynomial of A with the eigenvalues λi as roots.
 
λ1
If A = PDP−1 is diagonalizable, then since D = 
 .. 
,
 . 
λn
 
p(λ1 )

p(D) =  .. 
=O
 . 
p(λn )
where O is the zero matrix. By Proposition 5.13 we conclude that
p(A) = P · p(D) · P−1 = O.

The general form of this statement is the Cayley–Hamilton Theorem (see Theorem 8.8).


5.4 Symmetric Matrices

In this Section, we focus on V = Rn with the standard dot product on Rn to learn about the main
structural results about diagonalizable matrices.

We will see in Chapter 7 that most results generalize to general (complex) inner product spaces.

Definition 5.19. An n × n matrix is called symmetric if

AT = A.

Symmetric matrices represent symmetric operators.

Equivalently,

Proposition 5.20. A is symmetric if and only if

u · Av = Au · v, ∀u, v ∈ Rn .

Proof. By linearity and Proposition 4.2,


u · Av = Au · v ∀u, v ∈ Rn
⇐⇒ ei · Aej = Aei · ej ∀i, j = 1, ..., n
⇐⇒ aij = aji ∀i, j = 1, ..., n.

The first important property of symmetric matrix is the orthogonality between eigenspaces.

Theorem 5.21. If A is symmetric, then different eigenspaces are orthogonal to each other.

Proof. If λ1 ̸= λ2 and v1 ∈ Vλ1 , v2 ∈ Vλ2 are eigenvectors, then


λ1 v1 · v2 = Av1 · v2 = v1 · Av2 = λ2 v1 · v2
and so we must have v1 · v2 = 0.
Therefore, if the eigenspaces span Rn , we can always choose an orthonormal eigenbasis by the
Gram–Schmidt Process. Then the matrix P formed by the eigenvectors will consist of orthonormal
columns, i.e. P is an orthogonal matrix.
Definition 5.22. A matrix A is orthogonally diagonalizable if
A = PDP−1 = PDPT
for some orthogonal matrix P and diagonal matrix D.


 
Example 5.14 (Example 5.10 cont’d). We have diagonalized the matrix $A = \begin{pmatrix}3 & -2 & 4\\ -2 & 6 & 2\\ 4 & 2 & 3\end{pmatrix}$ before, but the matrix P we found was not an orthogonal matrix.

We have found before (Step 1, Step 2)
$$\lambda = 7:\quad v_1 = \begin{pmatrix}1\\0\\1\end{pmatrix},\ v_2 = \begin{pmatrix}-1\\2\\0\end{pmatrix}, \qquad \lambda = -2:\quad v_3 = \begin{pmatrix}-2\\-1\\2\end{pmatrix}.$$
Since A is symmetric, different eigenspaces are orthogonal to each other. So for example
$$v_1 \cdot v_3 = v_2 \cdot v_3 = 0.$$
So we just need to find an orthogonal basis for the eigenspace $V_7$.

Step 2a: Use the Gram–Schmidt Process on $V_7$:
$$b_1 = v_1 = \begin{pmatrix}1\\0\\1\end{pmatrix}, \qquad b_2 = v_2 - \frac{b_1\cdot v_2}{b_1\cdot b_1}\,b_1 = \frac{1}{2}\begin{pmatrix}-1\\4\\1\end{pmatrix}.$$
Therefore $\{b_1, b_2\}$ is an orthogonal basis for $V_7$, and $\{b_1, b_2, v_3\}$ is an orthogonal eigenbasis of $\mathbb{R}^3$.

Step 2b: Normalize.
$$b_1' = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix}, \qquad b_2' = \frac{1}{\sqrt{18}}\begin{pmatrix}-1\\4\\1\end{pmatrix}, \qquad v_3' = \frac{1}{3}\begin{pmatrix}-2\\-1\\2\end{pmatrix}.$$

Step 3, Step 4: Construct P and D.

Putting together the eigenvectors, we have
$$P = \begin{pmatrix}\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{18}} & -\frac{2}{3}\\ 0 & \frac{4}{\sqrt{18}} & -\frac{1}{3}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{18}} & \frac{2}{3}\end{pmatrix}$$
and $D = \begin{pmatrix}7 & 0 & 0\\ 0 & 7 & 0\\ 0 & 0 & -2\end{pmatrix}$, consisting of the eigenvalues, is the same as before.
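In practice, the orthogonal diagonalization of a real symmetric matrix is obtained in one step from a library routine; a minimal NumPy sketch of Example 5.14 (NumPy assumed):

```python
import numpy as np

# np.linalg.eigh returns an orthonormal eigenbasis of a real symmetric matrix,
# i.e. an orthogonal diagonalization A = P D P^T.
A = np.array([[ 3.0, -2.0,  4.0],
              [-2.0,  6.0,  2.0],
              [ 4.0,  2.0,  3.0]])
evals, P = np.linalg.eigh(A)                       # eigenvalues in ascending order: [-2., 7., 7.]
print(np.allclose(P.T @ P, np.eye(3)))             # True: P is orthogonal
print(np.allclose(P @ np.diag(evals) @ P.T, A))    # True
```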

Theorem 5.23. If A is a real symmetric n × n matrix, then all the eigenvalues are real.
In particular the characteristic polynomial p(λ) has n real roots (counted with multiplicities).


Note. The statement is not true for complex symmetric matrices, e.g. take a 1 × 1 matrix i .

In particular, it implies that an eigenvector always exists for real symmetric matrix.

Proof. Consider A as a complex matrix with real entries, we show that its eigenvalues are real.
Note that we need to use the complex dot product to ensure that v · v ≥ 0 is real.

ˆ If v ̸= 0 is a (complex) eigenvector with eigenvalue λ ∈ C, then

v · Av = v · λv = λv · v = λv · v.

ˆ Since A is real symmetric,


v · Av = Av · v = λv · v.

ˆ Hence λ = λ and λ is real.

ˆ If v = vRe + i vIm , then taking the real part of Av = λv (recall λ is real) shows that vRe (or vIm , if vRe = 0) is a real eigenvector of A.

Since a real eigenvector exists, this allows us to construct a full set of them by induction:

Theorem 5.24. A square matrix A is symmetric if and only if it is orthogonally diagonalizable.

The collection of Theorems 5.21, 5.23 and 5.24 are known as the Spectral Theorem for
Symmetric Matrices.

Example 5.15. The following matrix is real symmetric:


 
1 2 3 4
 
2 3 4 5
A= .
 
3 4 5 6
 
4 5 6 7

Hence we know that it is orthogonally diagonalizable, without even calculating its eigenvalues
or eigenvectors!

(The eigenvalues are given by λ = 0 (with multiplicity 2) and $8 \pm 2\sqrt{21}$, all of them are real.)


Proof.

(⇐=) If A = PDPT , then

AT = (PDPT )T = PDT PT = PDPT = A.

(=⇒) If A is symmetric, we prove by induction that we can find n orthonormal eigenvectors of A.


– If n = 1, it is trivial. Hence let n > 1 and assume any (n − 1) × (n − 1) symmetric
matrices are orthogonally diagonalizable.
– Let v be a unit eigenvector of A with eigenvalue λ. (It exists by Theorem 5.23.)
– Let U = Span(v). If u⊥ ∈ U ⊥ , then for any u ∈ U ,

u · Au⊥ = Au · u⊥ = λu · u⊥ = 0

so that Au⊥ ∈ U ⊥ .
– Extend {v} to an orthonormal basis B = {v, v1 , ..., vn−1 } of Rn = U ⊕ U ⊥ where
vi ∈ U ⊥ .
– Since A : U −→ U and U ⊥ −→ U ⊥ , with respect to B, [A]B is a block diagonal matrix
of the form !
λ 0
[A]B =
0 B
which is still symmetric since the change of basis is orthogonal.
– In particular the restriction B := A|U ⊥ is a symmetric (n − 1) × (n − 1) matrix, so that
it is orthogonally diagonalizable by induction.
– Since Rn = U ⊕ U ⊥ , the (orthonormal) eigenvectors of B, together with v, form an
orthonormal eigenbasis of [A]B . Reverting the change of basis we get an orthonormal
eigenbasis of A.
By the remark before Definition 5.22, the matrix P formed by these eigenvectors orthogonally
diagonalize A.

Remark. The property


A(U ⊥ ) ⊂ U ⊥
means that U ⊥ is an invariant subspace of A. We will come back to this notion in Chapter 8.

CHAPTER 6

Positive Definite Matrices and SVD

We know that not all matrices can be diagonalized. In this chapter, we derive a simple alternative
approach called the Singular Value Decomposition (SVD), which can be applied even to
rectangular matrices! This method is also extremely important in data analysis.

Another approach to approximate diagonalization by various Canonical Forms will be discussed


in Chapter 9.

Throughout this chapter, we assume K = R and dim V < ∞.

Remark. Everything works for K = C as well with minor modifications. (See the Summary in Chapter 7.)

6.1 Positive Definite Matrices

Definition 6.1. A bilinear form on V is a (real-valued) function f (x, y) in two variables that is
linear in both arguments x, y ∈ V .

Example 6.1. If K = R, any inner product f (x, y) := ⟨x, y⟩ is a bilinear form.

Example 6.2. For n × n real square matrices,

f (A, B) := 2nTr(AB) − 2Tr(A)Tr(B)

is a bilinear form on Mn×n (R) called the Killing form (named after the Mathematician Wilhelm
Killing), which is very important in Lie Theory.


Proposition 6.2. Any bilinear form f (x, y) on Rn is of the form

f (x, y) = xT Ay

for some matrix A.

Proof. Expand the bilinear form in terms of the standard basis by linearity, we get

(A)ij = f (ei , ej )

Definition 6.3. Let A be a real symmetric matrix.

ˆ It is called positive definite if

xT Ax = x · Ax > 0 for all nonzero x ∈ Rn .

ˆ It is called positive semidefinite if

xT Ax = x · Ax ≥ 0 for all nonzero x ∈ Rn .

ˆ The function Q(x) := xT Ax is called the quadratic form associated to A.

With this terminology, the symmetry and positivity properties of an inner product imply that

Proposition 6.4. Any inner product on Rn is of the form

⟨x, y⟩ = xT Ay

for some positive definite matrix A.

Example 6.3. A = I corresponds to the standard inner product.

 
c1

Example 6.4. The diagonal matrix A =  .. 
 with all ci > 0 corresponds to the inner
 . 
cn
product on Rn
⟨x, y⟩ = c1 x1 y1 + · · · + cn xn yn
discussed in Example 4.2.


Example 6.5. $Q(\mathbf{x}) := \begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}9 & 0\\ 0 & 1\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = 9x^2 + y^2$ is positive definite.

Example 6.6. Let $A = \begin{pmatrix}5 & 4\\ 4 & 5\end{pmatrix}$. Then $Q(\mathbf{x}) := \mathbf{x}^TA\mathbf{x} = 5x^2 + 8xy + 5y^2$ is positive definite.
We can see that its level sets are represented by ellipses as follows: we can diagonalize the matrix by $A = PDP^T$ where $D = \begin{pmatrix}9 & 0\\ 0 & 1\end{pmatrix}$ and $P = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$. Then
$$Q(\mathbf{x}) = \mathbf{x}^TPDP^T\mathbf{x} = (P^T\mathbf{x})^TD(P^T\mathbf{x}).$$

Therefore if we let $\widehat{\mathbf{x}} = \begin{pmatrix}\widehat{x}\\ \widehat{y}\end{pmatrix} = P^T\mathbf{x}$, i.e. rotating (and reflecting) the basis by $P^{-1}$, then
$$Q(\widehat{\mathbf{x}}) = 9\widehat{x}^2 + \widehat{y}^2$$
and its level sets are represented by ellipses.

[Figure: the level set $Q(\mathbf{x}) = 5x^2 + 8xy + 5y^2 = 1$ in the basis E is carried by $P^T = (P^B_E)^{-1}$ to the level set $Q(\widehat{\mathbf{x}}) = 9\widehat{x}^2 + \widehat{y}^2 = 1$ in the basis B.]

Theorem 6.5. A symmetric matrix A is positive (semi)definite if and only if λi > 0 (λi ≥ 0) for
all the eigenvalues of A.

Proof. Substitute x ⇝ Px where A = PDPT is the orthogonal diagonalization.
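A sketch of how Theorem 6.5 is used in practice (assuming NumPy): positive definiteness is usually tested through the eigenvalues, or through a Cholesky factorization, which succeeds exactly when the symmetric matrix is positive definite.

```python
import numpy as np

# Testing positive definiteness of the matrix from Example 6.6.
A = np.array([[5.0, 4.0],
              [4.0, 5.0]])
print(np.linalg.eigvalsh(A))    # [1. 9.] -- all positive, so A is positive definite
L = np.linalg.cholesky(A)       # succeeds; raises LinAlgError for non positive definite input
print(np.allclose(L @ L.T, A))  # True
```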


In particular, by Corollary 5.10, we have

Corollary 6.6. If A is a symmetric positive (semi)definite matrix, then

det(A) > 0 (≥ 0).

Remark. Let A be a symmetric matrix.

ˆ If all eigenvalues of A are λi < 0 (λi ≤ 0), we call A negative (semi)definite.


ˆ Otherwise if some λi are positive and some are negative, it is called indefinite.
ˆ In general, the numbers (n+ , n− ) where

n+ = #(λi > 0), n− = #(λi < 0)

are called the signature of A.


 
In $V = \mathbb{R}^4$ with coordinates $\begin{pmatrix}x\\ y\\ z\\ t\end{pmatrix}$, the indefinite quadratic form with signature (3, 1) given by
$$M := \begin{pmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\end{pmatrix}$$
gives V the structure of a Minkowski space, which is very important in the Theory of Relativity.

Since any real symmetric matrix can be diagonalized, we can always find a “square root” of A if it
is positive (semi)definite.

Theorem 6.7. Let A be a positive (semi)definite matrix.

ˆ There exists a unique positive (semi)definite matrix B such that

B2 = A.

We call B the square root of A, denoted by $\sqrt{A}$.

ˆ If A = PDPT is the orthogonal diagonalization, we have
$$\sqrt{A} = PD^{1/2}P^T$$
where $D^{1/2}$ is the diagonal matrix where we take the square root of the entries of D.
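A minimal numerical sketch of the formula $\sqrt{A} = PD^{1/2}P^T$ (NumPy assumed; compare with Example 6.7 below):

```python
import numpy as np

# Matrix square root of a positive definite matrix via its orthogonal diagonalization.
A = np.array([[5.0, 4.0],
              [4.0, 5.0]])
evals, P = np.linalg.eigh(A)
sqrtA = P @ np.diag(np.sqrt(evals)) @ P.T
print(sqrtA)                            # [[2. 1.] [1. 2.]]
print(np.allclose(sqrtA @ sqrtA, A))    # True
```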


Proof. Let B be a square root of A. To show uniqueness:

ˆ Since B commutes with A(= B2 ), every eigenspace Vλ of A is B-invariant.


So we only need to study uniqueness on eigenspaces.

ˆ A = λI on Vλ , so B2 = λI on Vλ .

ˆ Since B is positive (semi)definite, it can only have $\sqrt{\lambda} \geq 0$ as an eigenvalue on Vλ .

ˆ Since B is diagonalizable, it must equal $\sqrt{\lambda}\, I$ on Vλ .

An interesting property is that



Proposition 6.8. $\sqrt{A} = p(A)$ for some polynomial p(t).

Proof. Let λi ≥ 0 be the eigenvalues of A. Pick any polynomial p(t) such that $p(\lambda_i) = \sqrt{\lambda_i}$, which is always possible by the Lagrange interpolation formula. Then
$$\sqrt{A} = PD^{1/2}P^T = P\,p(D)\,P^T = p(A).$$

As a consequence, we have

Corollary 6.9. If B commutes with a positive semidefinite matrix A, then it commutes with $\sqrt{A}$.

Similarly, we can also construct the “absolute value” of any matrix A.

This is used in the construction of Singular Value Decomposition in the next section.

Theorem 6.10. Let A be any m × n matrix.

ˆ AT A is a positive semidefinite matrix. Hence we can define the matrix
$$|A| := \sqrt{A^TA}$$
which is also positive semidefinite, called the absolute value of A.

ˆ Conversely, any positive semidefinite matrix is of the form AT A for some A.

Proof. Note that AT A is symmetric. If v is an eigenvector of AT A, then

λ∥v∥2 = v · AT Av = ∥Av∥2 ≥ 0

hence any eigenvalue λ ≥ 0.



Conversely, if B is positive (semi)definite, let $A = \sqrt{B}$ (which is symmetric) so that $B = A^TA$.


Note. When m = n and A is positive semidefinite, AT A = A2 and hence |A| = A.

Note. When m < n, AT A is never positive definite, since

Rank of AT A ≤ m < n

is not full rank, hence there are always zero eigenvalues.

Example 6.7. Let $A = \begin{pmatrix}5 & 4\\ 4 & 5\end{pmatrix} = P\begin{pmatrix}9 & 0\\ 0 & 1\end{pmatrix}P^T$ where $P = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}$. Then
$$\sqrt{A} = P\begin{pmatrix}3 & 0\\ 0 & 1\end{pmatrix}P^T = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}.$$

We observe that if we pick the polynomial $p(t) = \frac{1}{4}t + \frac{3}{4}$ so that p(9) = 3 and p(1) = 1, then indeed we have
$$p(A) = \frac{1}{4}A + \frac{3}{4}I = \sqrt{A}.$$

Let $B = \begin{pmatrix}1 & -\frac{2}{5}\\ -2 & -\frac{11}{5}\end{pmatrix}$. Then $B^TB = \begin{pmatrix}5 & 4\\ 4 & 5\end{pmatrix}$, therefore by the above $|B| = \begin{pmatrix}2 & 1\\ 1 & 2\end{pmatrix}$.

6.2 Singular Value Decompositions

Recall that if v is an eigenvector with Av = λv, the effect is “stretching by the factor λ” along the
direction v. We want to consider all such directions if possible, even for rectangular matrices.

Let A be an m × n matrix. The image of A has dimension rank(A) = r ≤ n, m. Our goal is to


find an orthonormal basis B of Rn such that its image consists of r orthonormal vectors in Rm ,
but stretched by some factors, similar to that of eigenvectors (while the other n − r basis vectors
mapped to zero.)

The image of B in Rm under A clearly can be normalized and extended to an orthonormal basis
B ′ of Rm of the target space.


More precisely, we want to find orthogonal matrices V ∈ O(n) (formed by B) and U ∈ O(m)
(formed by B ′ ) such that (
σi ui 1 ≤ i ≤ r
Avi =
0 r+1≤i≤n
or rewriting in terms of matrices:
AV = UΣ.
for some quasi-diagonal matrix Σ of size m × n and rank r ≤ m, n:

r n−r
!
Σ= D O r
O O m−r

where D = diag(σ1 , ..., σr ) is a diagonal matrix.


(When r = m or n, we omit the rows or columns of zeros).

Such a strategy leads to the singular value decomposition.

Definition 6.11. Let A be an m × n matrix.



ˆ The singular values of A are the eigenvalues σi of $|A| = \sqrt{A^TA}$.

ˆ If A has rank r, then we have r nonzero singular values. We arrange them as

σ1 ≥ σ2 ≥ · · · ≥ σr > 0.

Since AT A is a positive semidefinite symmetric matrix, it has an orthonormal set of eigenvectors


{v1 , ..., vn } with non-negative eigenvalues {λ1 , ..., λn }. Then
$$\|Av_i\|^2 = v_i^TA^TAv_i = v_i^T\lambda_iv_i = \lambda_i.$$

Therefore the singular values $\sigma_i = \sqrt{\lambda_i} = \|Av_i\|$ of A, if nonzero, are precisely the lengths of the vectors $Av_i$, i.e. the stretching factors we are seeking.

Theorem 6.12 (Singular Value Decomposition). Let A be an m × n matrix with rank r.


Then we have a factorization
A = UΣVT
where

ˆ Σ is the m × n quasi-diagonal matrix with D consisting of the r nonzero singular values


σ1 , ..., σr of A.

ˆ U is an m × m orthogonal matrix.

ˆ V is an n × n orthogonal matrix.


Proof. Note that if the SVD exists,

AT A = VΣT UT UΣVT = V(ΣT Σ)VT

is an orthogonal diagonalization.
 
| |
ˆ Hence we take V = 
 
v1 · · · vn  where the columns are given by the orthonormal eigen-

| |
T
basis {v1 , ..., vn } of A A (which exists by the Spectral Theorem of Symmetric Matrices).

Now we compare AV = UΣ to see that


 
| |
ˆ Since ∥Avi ∥ = σi , we take U = 
 
u 1 · · · u m
 where we extend the orthogonal set
 
| |
{Av1 , ..., Avr } to a basis of Rm , and normalize to obtain an orthonormal basis {u1 , ..., um }.

Taking transpose, this also implies the columns of U are the orthonormal eigenbasis of AAT .
 
Example 6.8. Let $A = \begin{pmatrix}1 & 1 & 1\\ 1 & 1 & -1\end{pmatrix}$. Then $A^TA = \begin{pmatrix}2 & 2 & 0\\ 2 & 2 & 0\\ 0 & 0 & 2\end{pmatrix}$ and it has eigenvalues

λ1 = 4, λ2 = 2, λ3 = 0

with orthonormal eigenvectors


     
1 0 −1
1     1  
v1 = √  1, v2 = 
0 ,
 v3 = √  1 .
2  2 
0 1 0

Therefore
$$V = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 0 & -1\\ 1 & 0 & 1\\ 0 & \sqrt{2} & 0\end{pmatrix}.$$
Also $\sigma_1 = \sqrt{\lambda_1} = 2$, $\sigma_2 = \sqrt{\lambda_2} = \sqrt{2}$. Therefore
$$\Sigma = \begin{pmatrix}2 & 0 & 0\\ 0 & \sqrt{2} & 0\end{pmatrix}.$$


Finally !
Av1 Av1 1 1
u1 = = =√ ,
∥Av1 ∥ σ1 2 1
!
Av2 Av2 1 1
u2 = = =√ .
∥Av2 ∥ σ2 2 −1

Therefore !
1 1 1
U= √
2 1 −1
and
$$A = U\Sigma V^T = \begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}2 & 0 & 0\\ 0 & \sqrt{2} & 0\end{pmatrix}\begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\\ 0 & 0 & 1\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\end{pmatrix}$$

is the Singular Value Decomposition of A.
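The same decomposition can be obtained directly from a library routine; a sketch of Example 6.8 with NumPy (the signs of the singular vectors may differ, but UΣVᵀ still reconstructs A):

```python
import numpy as np

# SVD of the matrix from Example 6.8.
A = np.array([[1.0, 1.0,  1.0],
              [1.0, 1.0, -1.0]])
U, s, Vt = np.linalg.svd(A)       # singular values, largest first
print(s)                          # approximately [2., 1.41421356]
Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))   # True
```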

Figure 6.1: Multiplication by A. It squashed the v3 direction to zero.

When m = n and A is a square matrix, SVD reduces to the polar decomposition:

Theorem 6.13. For any real square matrix A, we have the polar decomposition

A = PH

where P is orthogonal and H is positive semidefinite.

Proof. If A = UΣVT is a SVD, we take P = UVT and H = VΣVT .
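A sketch of the construction in the proof (NumPy assumed): the polar factors can be assembled directly from an SVD.

```python
import numpy as np

# Polar decomposition A = P H with P = U V^T orthogonal and H = V Sigma V^T positive semidefinite.
A = np.array([[4.0, -3.0],
              [2.0, -1.0]])
U, s, Vt = np.linalg.svd(A)
P = U @ Vt
H = Vt.T @ np.diag(s) @ Vt
print(np.allclose(P @ H, A))                      # True
print(np.allclose(P.T @ P, np.eye(2)))            # True: P is orthogonal
print(np.all(np.linalg.eigvalsh(H) >= -1e-12))    # True: H is positive semidefinite
```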


We can see that this is a generalization of the polar decomposition of the complex numbers
(viewed as a linear transformation on the complex plane):

z = reiθ .

This means that any linear transformation can be represented by scaling in some directions, followed
by rotations and mirror reflections.


One useful application of SVD is the description of the bases of the fundamental subspaces.

Theorem 6.14. Let A be m × n matrix with rank r, and A = UΣVT be the SVD.
   
| | | |
   
u1 · · · um  and V = v1 · · · vn . Then
Assume U =    
| | | |

ˆ {u1 , ..., ur } is an orthonormal basis of ColA.

ˆ {ur+1 , ..., um } is an orthonormal basis of Nul(AT ).

ˆ {v1 , ..., vr } is an orthonormal basis of RowA = ColAT .

ˆ {vr+1 , ..., vn } is an orthonormal basis of NulA.

This allows us to derive another application of SVD for the least square approximation which
works like the example from QR decomposition in Section 4.6.

Computation using SVD is usually more efficient because finding orthonormal eigenbasis for a
symmetric matrix is easy. But of course, it always depends on the specifics of the problem itself.
   
| | | |
   
Definition 6.15. Let Ur = u1 · · · ur  , Vr = v1 · · · vr  be the submatrix consisting of
  
| | | |
the first r columns. Then
! !
  D O VTr
A = Ur ∗ = Ur DVTr .
O O ∗

The pseudo-inverse of A is defined to be

A+ := Vr D−1 UTr .

The pseudo-inverse satisfies for example


AA+ A = A and AA+ = ProjColA
because
AA+ = (Ur DVTr )(Vr D−1 UTr ) = Ur DD−1 UTr = Ur UTr = ProjColA .

Theorem 6.16. Given the equation Ax = b, the least square solution is given by
$$\widehat{\mathbf{x}} = A^+\mathbf{b} = V_rD^{-1}U_r^T\mathbf{b}.$$

Proof. Since $A\widehat{\mathbf{x}} = AA^+\mathbf{b} = \mathrm{Proj}_{\mathrm{Col}A}\,\mathbf{b}$, $A\widehat{\mathbf{x}}$ is the closest point to b in ColA.
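A sketch of Theorem 6.16 in practice (NumPy assumed): np.linalg.pinv is computed from the SVD, and the resulting least square solution agrees with a standard solver.

```python
import numpy as np

# Least square solution of an overdetermined system via the pseudo-inverse.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x_hat = np.linalg.pinv(A) @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))   # True
```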
CHAPTER 7

Complex Matrices

If the field is K = C, most of the results from Chapter 4 and Section 5.4 carry over to general
inner product spaces with minor modifications.

From another point of view, most of the results for K = R in the previous chapters are just
the special cases of the results in K = C where we just ignore the complex conjugations.

Note. Another mainstream approach is to consider complexification of real vector spaces, see
Appendix C for more details.

Recall that a complex inner product (also called Hermitian inner product) satisfies

(1) Conjugate-commutativity: $\langle u, v\rangle = \overline{\langle v, u\rangle}$.


(2) Linearity (in the first argument): ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩.
⟨cu, v⟩ = c⟨u, v⟩, c ∈ C.
(3) Positivity: ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 ⇐⇒ u = 0.
Recall also that (1) and (2) imply that it is conjugate-linear in the second argument:
⟨u, v + w⟩ = ⟨u, v⟩ + ⟨u, w⟩.
$\langle u, cv\rangle = \overline{c}\langle u, v\rangle$, c ∈ C.

Remark. A function with two arguments which is linear in one and conjugate-linear in another is known
as a sesquilinear form, from the Latin prefix “sesqui-” meaning “one and a half”.

Note. In many physics books, the complex inner product is written with Dirac’s bra-ket notation:
⟨x|y⟩ := ⟨y, x⟩
With this notation, the inner product is conjugate-linear in the first argument instead.


7.1 Adjoints

Let V, W be inner product spaces over R or C.

Definition 7.1. If T ∈ L (V, W ), the adjoint of T is the linear transformation T ∗ ∈ L (W, V )


satisfying
⟨T v, w⟩W = ⟨v, T ∗ w⟩V , ∀v ∈ V, w ∈ W.

Proposition 7.2 (Existence and Uniqueness).

ˆ If dim V < ∞, then T ∗ exists.

ˆ If T ∗ exists, it is unique.

The Gram–Schmidt Process still works for K = C with exactly the same formula. It implies that

An orthonormal basis always exists for complex finite dimensional inner product space.

Proof. (Existence.)

ˆ Let {u1 , ..., un } be an orthonormal basis of V , and let w ∈ W .

ˆ Assume
v′ := c1 u1 + · · · + cn un ∈ V.

ˆ Then ⟨T ui , w⟩W = ⟨ui , v′ ⟩V implies

$c_i = \overline{\langle T u_i, w\rangle_W} = \langle w, T u_i\rangle_W$.

ˆ Hence by linearity, the map defined by T ∗ (w) := v′ satisfies the required condition.

ˆ It is obvious from the definition of ci that T ∗ is linear.

(Uniqueness.)

ˆ Assume T1∗ , T2∗ are two adjoints. i.e.

⟨T v, w⟩W = ⟨v, T1∗ w⟩V = ⟨v, T2∗ w⟩V

for any v ∈ V and w ∈ W .

ˆ Then ⟨v, T1∗ w − T2∗ w⟩V = 0 for all v ∈ V .

ˆ Taking v = T1∗ w − T2∗ w implies T1∗ w − T2∗ w = 0 for any w ∈ W .


Example 7.1. If K = R and T is represented by A, then T ∗ is represented by AT .

Example 7.2. For an infinite dimensional example, consider the space of all complex 2π-periodic
smooth functions similar to Example 4.10 with inner product
Z 2π
⟨f, g⟩ := f (x)g(x)dx.
0

Then using integration by parts, one checks that


 ∗
d d
=− .
dx dx

It also follows directly from definition that

Proposition 7.3. If S ∈ L (U, V ) and T, T ′ ∈ L (V, W ) such that their adjoints exist, then

(c · T )∗ = c · T ∗ , ∀c ∈ C.
′ ∗ ∗ ′∗
(T + T ) = T + T .
(T ∗ )∗ = T.
(T ◦ S)∗ = S ∗ ◦ T ∗ .

In the case of Cn with the standard dot product, the formula in the proof of Proposition 7.2 implies
that if T is represented by A, then T ∗ is represented by the conjugate transpose A∗ :
 
Definition 7.4. Let $A = \begin{pmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix}$ be an m × n complex matrix.
Then the Hermitian adjoint is the n × m complex matrix given by
$$A^* := \overline{A}^T = \begin{pmatrix}\overline{a_{11}} & \cdots & \overline{a_{m1}}\\ \vdots & \ddots & \vdots\\ \overline{a_{1n}} & \cdots & \overline{a_{mn}}\end{pmatrix}.$$
That is, we take the transpose and conjugate every matrix entry:
$$(a_{ij})^* := \overline{a_{ji}}.$$
The dot product can be written simply as
$$u \cdot v = v^*u = \overline{u^*v}.$$

Remark. Another common notation in physics for Hermitian adjoint is A† .


The analogue of Theorem 3.11 is the following:

Theorem 7.5. If A is a square matrix, then

$$\det(A^*) = \overline{\det(A)}.$$

Proof. $\det(A^*) = \det(\overline{A}^T) = \det(\overline{A}) = \overline{\det(A)}$.
By Proposition 7.3,

Proposition 7.6. We have


(AB)∗ = B∗ A∗ .

A special case when W = V is given by the following:

Definition 7.7. A linear transformation T ∈ L (V, V ) is self-adjoint if

T = T ∗.

In other words, we have


⟨T v, w⟩ = ⟨v, T w⟩, ∀v, w ∈ V.

Example 7.3. Taking the standard dot product on Rn , self-adjoint operators are represented by
symmetric matrices
AT = A.

Example 7.4. Taking the standard dot product on Cn , self-adjoint operators are represented by
Hermitian matrices
A∗ = A.

d
Example 7.5. The operator i in the setting of Example 7.2 is self-adjoint.
dx

Definition 7.8. The orthogonal complement of U is (still) defined by

U ⊥ := {v ∈ V : ⟨u, v⟩ = 0, for any u ∈ U }.

A linear transformation P ∈ L (V, V ) is called an orthogonal projection if

P2 = P and P ∗ = P.

Most results about symmetric operators from Section 5.4 carry over:


Theorem 7.9. If T ∈ L (V, V ) is self-adjoint, then

ˆ Different eigenspaces are orthogonal to each other.

ˆ All eigenvalues are real.

ˆ If dim V < ∞, then T has an orthonormal eigenbasis.

Proof. All the proofs are the same as in Section 5.4 with u · v replaced by ⟨u, v⟩.

7.2 Unitary Matrices

If K = C, most results from Chapter 4 also hold when AT is replaced by A∗ .

Definition 7.10. A matrix U corresponding to a linear isometry satisfies

U∗ U = I

and it consists of orthonormal columns (under the complex dot product).

If U is a square matrix, it is called a unitary matrix . It is invertible with

U−1 = U∗ .

The set of all n × n unitary matrices is denoted by U (n).

Exactly the same as Theorem 4.31 for orthogonal matrices, we have:

Theorem 7.11. The set of all n × n unitary matrices forms a group:

ˆ In×n ∈ U (n).

ˆ If U ∈ U (n), then U−1 ∈ U (n).

ˆ If U, V ∈ U (n), then UV ∈ U (n).

Definition 7.12. A is unitarily equivalent to B if there exists a unitary matrix U such that

A = UBU∗ .


By Theorem 7.5, we have

Proposition 7.13. The determinant of a unitary matrix U is a complex number with norm 1:

| det(U)| = 1.

Proposition 7.14. If λ ∈ C is an eigenvalue of a unitary matrix U, then |λ| = 1.

Proof. If Uv = λv, then

∥v∥2 = v∗ v = v∗ U∗ Uv = (Uv)∗ (Uv) = λλv∗ v = |λ|2 ∥v∥2 .

When K = C we have the following important structural result:

Theorem 7.15 (Schur’s Lemma). Any complex square matrix is unitarily equivalent to an
upper triangular matrix.

Note. Using the same proof, if A is a real matrix with real eigenvalues only, then it is orthogo-
nally similar to a real upper triangular matrix.

Proof. The idea of the proof is very similar to that of Theorem 5.24.

By induction, if n = 1 it is trivial. Hence assume n > 1 and the statement is true for any
(n − 1) × (n − 1) matrices.

ˆ Pick an eigenvector v ∈ Cn with an eigenvalue λ. This exists as our matrix is over C.

ˆ By Gram–Schmidt Process, extend to an orthonormal basis B = {v, u2 , ..., un } of Cn

ˆ The matrix U′ with columns B is unitary, and we have the block form
 
λ ∗ ··· ∗
 
0 
′∗ ′
U AU =  .
 
 .. A

n−1 


0

for some (n − 1) × (n − 1) matrix An−1 .

ˆ By induction, there exists unitary matrix Un−1 such that U∗n−1 An−1 Un−1 = Tn−1 is upper
triangular.


 
1 0 ··· 0
 
0 
′′
ˆ Then U :=  .  is also unitary, hence U = U′ U′′ is unitary and
 
 .. U
n−1 


0
 
λ ∗ ··· ∗
 
0 

U AU =  .
 
 .. T

n−1 


0

is upper triangular.

Finally, concerning similarity, it turns out that we do not need to care about the underlying fields.
Intuitively, an eigenvector u + iv ∈ Cn with real eigenvalues gives real eigenvectors u, v ∈ Rn .

Theorem 7.16. If two real matrices A ∼ B in K = C, then A ∼ B in K = R.

Furthermore, if they are unitarily equivalent, then they are actually orthogonally similar. i.e.

A = UBU∗ for some U ∈ U (n) =⇒ A = PBPT for some P ∈ O(n).

Proof. Assume A = UBU−1 for some complex invertible matrix U, i.e.

AU = UB.

ˆ Let U = X + iY be the real and imaginary parts. Then it follows that

AX = XB, AY = YB.

ˆ Since U is invertible, det(X + iY) ̸= 0, so the polynomial q(λ) := det(X + λY) is nonzero.

ˆ Hence there exists λ ∈ R such that det(X + λY) ̸= 0, i.e. R := X + λY is invertible.

ˆ It follows that AR = RB so A ∼ B by a real matrix R.


If further U is unitary, then taking adjoint, we also have


AT U = UBT .

ˆ Then we have
AT U = UBT =⇒ AT R = RBT =⇒ RT A = BRT .
ˆ Consider polar decomposition R = PH where P is orthogonal and H is positive definite.
(Note: R is invertible =⇒ H is invertible.)
ˆ Since RT R = (PH)T (PH) = HPT PH = H2 , it follows that
BH2 = BRT R = RT AR = RT RB = H2 B.
Since H2 is also positive definite, by Corollary 6.9 BH = HB as well.
ˆ Hence
APH = AR = RB = PHB = PBH.
Since H is invertible, AP = PB for an orthogonal matrix P as desired.

7.3 Hermitian Matrices

Recall that A is Hermitian if A∗ = A. The exact same discussion as in symmetric matrix leads to
the following:

Definition 7.17. A matrix A is unitarily diagonalizable if it is unitarily equivalent to a diagonal


matrix, i.e.
A = UDU−1 = UDU∗
for some unitary matrix U and diagonal matrix D.

ˆ The columns of U consist of orthonormal eigenvectors

ˆ The diagonal entries of D consist of the corresponding eigenvalues.

Theorem 7.9 specializes to the Spectral Theorem of Hermitian Matrices similar to the one
of symmetric matrices (see Theorem 5.21, 5.23 and 5.24):

Theorem 7.18 (Spectral Theorem for Hermitian Matrices). If A is Hermitian, then

ˆ Two eigenvectors from different eigenspaces are orthogonal.


ˆ All the eigenvalues are real.
ˆ It is unitarily diagonalizable.

Conversely, if A is unitarily diagonalizable and all eigenvalues are real, then A is Hermitian.


Example 7.6. $A = \begin{pmatrix}1 & 1+i\\ 1-i & 2\end{pmatrix}$ is Hermitian. It has eigenvalues λ = 3, 0 with eigenvectors
$\begin{pmatrix}1+i\\ 2\end{pmatrix}$, $\begin{pmatrix}-1-i\\ 1\end{pmatrix}$ respectively. Normalizing, we have the diagonalization
$$\begin{pmatrix}1 & 1+i\\ 1-i & 2\end{pmatrix} = \begin{pmatrix}\frac{1+i}{\sqrt{6}} & -\frac{1+i}{\sqrt{3}}\\ \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}}\end{pmatrix}\begin{pmatrix}3 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}\frac{1+i}{\sqrt{6}} & -\frac{1+i}{\sqrt{3}}\\ \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}}\end{pmatrix}^{-1}.$$
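A numerical sketch of Example 7.6 (NumPy assumed): np.linalg.eigh also handles complex Hermitian matrices, returning real eigenvalues and unitary eigenvector columns.

```python
import numpy as np

# Unitary diagonalization of the Hermitian matrix from Example 7.6.
A = np.array([[1, 1 + 1j],
              [1 - 1j, 2]])
evals, U = np.linalg.eigh(A)
print(evals)                                             # approximately [0., 3.] (real)
print(np.allclose(U.conj().T @ U, np.eye(2)))            # True: U is unitary
print(np.allclose(U @ np.diag(evals) @ U.conj().T, A))   # True
```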

Note. If the eigenvalues are not real, A cannot be Hermitian. Therefore unitarily diagonalizable
does not imply Hermitian. We need another characterization of unitarily diagonalizable matrices.

7.4 Normal matrices

Definition 7.19. A linear transformation T ∈ L (V, V ) is normal if

T T ∗ = T ∗ T.

Correspondingly, a complex matrix is normal if


AA∗ = A∗ A.

Example 7.7. Examples of normal matrices include:

ˆ Hermitian matrices: A commute with A∗ = A.

ˆ Unitary matrices: AA∗ = I = A∗ A.

ˆ Skew-Hermitian matrices: satisfying A∗ = −A.

ˆ Symmetric and orthogonal matrices are normal when they are considered as complex matrix.

Normal operators / matrices share similar properties to the self-adjoint case.

Proposition 7.20. Let T ∈ L (V, V ) be a normal operator.

ˆ For all v ∈ V , we have


∥T v∥ = ∥T ∗ v∥.

ˆ If λ is an eigenvalue of T , then λ is an eigenvalue of T ∗ .


ˆ Eigenvectors corresponding to different eigenvalues are orthogonal to each other.


Proof.

ˆ ⟨T v, T v⟩ = ⟨v, T ∗ T v⟩ = ⟨v, T T ∗ v⟩ = ⟨T ∗ v, T ∗ v⟩.

ˆ T − λI is normal. Hence if T v = λv, then

0 = ∥(T − λI )v∥ = ∥(T − λI )∗ v∥ = ∥(T ∗ − λI )v∥

so v is also an eigenvector of T ∗ .

ˆ If λ1 ̸= λ2 and v1 ∈ Vλ1 , v2 ∈ Vλ2 are eigenvectors, then

(λ1 − λ2 )⟨v1 , v2 ⟩ = ⟨T v1 , v2 ⟩ − ⟨v1 , T ∗ v2 ⟩ = 0

so we must have ⟨v1 , v2 ⟩ = 0.

Proposition 7.21. An upper triangular matrix is normal if and only if it is diagonal.

Proof. Diagonal matrix is obviously normal. Now assume T is an n × n upper triangular normal.

ˆ We proceed by induction. The case n = 1 is trivial.


 
a1 a2 · · · an
 
0 
ˆ Let T =  .  where Tn−1 is (n − 1) × (n − 1) upper triangular.
 
.. Tn−1 
 
0

ˆ Then comparing the (1,1)-entry:

(T∗ T)11 = |a1 |2


(TT∗ )11 = |a1 |2 + · · · + |an |2 .

It follows that if T is normal, |a2 |2 + · · · + |an |2 = 0 so that a2 = · · · = an = 0.


 
a1 0 · · · 0
 
0 
ˆ Hence T =  .  and Tn−1 is normal, hence diagonal by induction.
 
 .. T n−1 


0

We can now state the main theorem:


Theorem 7.22. A square matrix A is unitarily diagonalizable if and only if it is normal.

i.e. for dim V < ∞, V has an orthonormal eigenbasis of T ∈ L (V, V ) if and only if it is normal.

Proof. By Schur’s Lemma, A = UTU∗ for some upper triangular matrix T.

ˆ By direct calculation:

AA∗ = UTU∗ (UTU∗ )∗ = UTU∗ UT∗ U∗ = UTT∗ U∗

A∗ A = (UTU∗ )∗ UTU∗ = UT∗ U∗ UTU∗ = UT∗ TU∗


so that A is normal if and only if T is normal.

ˆ By Proposition 7.21, T is normal if and only if T is diagonal.

Example 7.8. Over K = C, the rotation matrix


$$R = \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix} \sim \begin{pmatrix}e^{i\theta} & 0\\ 0 & e^{-i\theta}\end{pmatrix}.$$

This follows immediately since R is orthogonal, hence normal, hence diagonalizable.

det(R − λI) = λ2 − 2λ cos θ + 1 = (λ − eiθ )(λ − e−iθ ).


The eigenvectors can be found to be $\begin{pmatrix}\mp i\\ 1\end{pmatrix}$ respectively. Note that the eigenvectors are orthogonal
since R is normal. But the eigenvalues are not real since R is not Hermitian.

Normalizing, the unitary diagonalization is given by


$$\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix} = \begin{pmatrix}-\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{pmatrix}\begin{pmatrix}e^{i\theta} & 0\\ 0 & e^{-i\theta}\end{pmatrix}\begin{pmatrix}-\frac{i}{\sqrt{2}} & \frac{i}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\end{pmatrix}^{-1}.$$

This also shows that over K = R, a real normal matrix (AAT = AT A) may not be real diagonalizable.


Example 7.9. Since real orthogonal matrices are unitary, their eigenvalues satisfy |λ| = 1.

Solving the characteristic equation, λ can only be 1, −1 or some conjugate pairs e±iθk .

Therefore by Example 7.8, any orthogonal matrix is unitarily equivalent to


 
$$P \sim \operatorname{diag}\!\left(1, \ldots, 1,\; -1, \ldots, -1,\; \begin{pmatrix}\cos\theta_1 & \sin\theta_1\\ -\sin\theta_1 & \cos\theta_1\end{pmatrix}, \ldots, \begin{pmatrix}\cos\theta_k & \sin\theta_k\\ -\sin\theta_k & \cos\theta_k\end{pmatrix}\right)$$

(depending on λ some “blocks” may not exist).

By Theorem 7.16, this is actually an orthogonal similarity. Therefore any orthogonal matrix is
composed of some mirror reflections and rotations with respect to an orthonormal basis of Rn .

A similar argument shows that although a real normal matrix may not be real diagonalizable, it is always orthogonally similar to a real block diagonal matrix with 1 × 1 diagonal entries and 2 × 2 diagonal blocks of the form $\begin{pmatrix}a & b\\ -b & a\end{pmatrix}$ for some a, b ∈ R.


Summarize: We have the following correspondences between n × n matrices over K = R and C:

K=R K=C
Real inner product: −→ Complex inner product:
u · v = u1 v1 + · · · + un vn u · v = u1 v1 + · · · + un vn
(bilinear forms) −→ (sesquilinear forms)

Transpose: −→ Adjoint (conjugate transpose):


AT : (aij )T = (aji ) −→ A∗ : (aij )∗ = $\overline{a_{ji}}$
det(AT ) = det(A) −→ det(A∗ ) = $\overline{\det(A)}$

Some matrix may not have −→ Any matrix has n eigenvalues


(real) eigenvalues (with multiplicity)

Orthogonal matrix: −→ Unitary matrix:


PT P = I U∗ U = I
det(P) = ±1 −→ | det(U)| = 1
(complex) eigenvalues |λ| = 1 eigenvalues |λ| = 1

Symmetric matrix: −→ Hermitian matrix:


AT = A A∗ = A
real eigenvalues real eigenvalues

Orthogonally diagonalizable: −→ Unitarily diagonalizable:


A = PDPT A = UDU∗
⇐⇒Symmetric ⇐⇒ Normal: AA∗ = A∗ A

Positive (semi)definite: −→ Positive (semi)definite:


xT Ax > 0 (≥ 0) x∗ Ax > 0 (≥ 0)
|A| = $\sqrt{A^TA}$ −→ |A| = $\sqrt{A^*A}$

SVD: −→ SVD:
A = P1 ΣPT2 A = U1 ΣU∗2

Polar decomposition: −→ Polar decomposition:


A = PH A = UH

CHAPTER 8

Invariant Subspaces

In this chapter, we study the invariant subspaces of a linear transformation, which reveals more of
its structure through its characteristic polynomial.

8.1 Invariant Subspaces

Let V be a vector space over a field K, and let T ∈ L (V, V ).

Definition 8.1. A subspace U ⊂ V is called a T -invariant subspace if

T (U ) ⊂ U

i.e. u ∈ U =⇒ T (u) ∈ U .

If U is T -invariant, the restriction T |U can be viewed as a linear map U −→ U .

Example 8.1. {0} and V are called trivial invariant subspaces.

Example 8.2. Eigenspaces Vλ of T with eigenvalue λ.

d
Example 8.3. The space of polynomials R[t] is a -invariant subspace of C ∞ (R).
dt

Example 8.4. Ker(T ) and Im(T ) are invariant subspaces of T .


Example 8.5. U is an invariant subspace of ProjU .

!
cos θ sin θ
Example 8.6. The only real invariant subspaces of a rotation are the trivial ones
− sin θ cos θ
unless θ is a multiple of π.

Example 8.7. In the proof of Theorem 5.24, we see that if A is symmetric and Vλ is an eigenspace,
then both Vλ and Vλ⊥ are invariant subspaces. This is a special case of Proposition 8.6 below.

The next example of invariant subspace is very important and deserves its own definition.

Definition 8.2. Given a vector v ∈ V ,

Uv := Span{v, T (v), T 2 (v), ...}

is called the cyclic subspace of T generated by the cyclic vector v ∈ V .

Note. Any element u ∈ Uv is a finite sum of the form

u = a0 v + a1 T (v) + a2 T 2 (v) + · · · + aN T N (v), ak ∈ K


= p(T )(v)

for the polynomial p(t) = a0 + a1 t + a2 t2 + · · · + aN tN ∈ K[t].

Proposition 8.3. Let Uv be a cyclic subspace of T generated by v ∈ V .

ˆ Uv is a T -invariant subspace.

ˆ If dim V < ∞, Uv is spanned by the first r elements where r = dim Uv .

Proof.
N
X
ˆ If u ∈ Uv , then u = ak T k (v) for some ak ∈ K. Then
k=0

N
X
T (u) = ak T k+1 (v) ∈ Uv .
k=0

ˆ Let T k (v) be a linear combination of B = {v, T (v), ..., T k−1 (v)}.


– By induction, T l (v) will also be a linear combination of B for any l ≥ k.
– Hence r = dim Uv is the smallest k such that B forms a basis of Uv .


Definition 8.4. Let T ∈ L (V, V ). Assume V is a direct sum of T -invariant subspaces Ui

V = U1 ⊕ U2 ⊕ · · · ⊕ Uk .

Let v = u1 + u2 + · · · + uk be the unique decomposition where ui ∈ Ui . Then

T (v) = T1 (u1 ) + T2 (u2 ) + · · · + Tk (uk )

is also a unique decomposition, where Ti := T |Ui for each i.

We write
T = T1 ⊕ T2 ⊕ · · · ⊕ Tk .
Conversely, this means that V is a direct sum of T -invariant subspaces Ui = Dom(Ti ).

If V is finite dimensional, by picking a basis of the invariant subspaces, we can represent T by a


block diagonal matrix.

Proposition 8.5. Let dim V < ∞. Then with respect to a direct sum basis of V :

(1) If T = T1 ⊕ · · · ⊕ Tk and Ti is represented by a square matrix Ai , then T is represented by a


block diagonal matrix
 
A1 O
A=
 .. 
.
 . 
O Ak

(2) Let U ⊂ V be T -invariant and write V = U ⊕ W . If the square matrix AU represents T |U ,


then T is represented by a block upper triangular matrix
!
AU ∗
A=
O ∗

for some block matrices ∗ of appropriate sizes.

If V is an inner product space, we can say something about the adjoint as well.

Proposition 8.6. Let U ⊂ V be T -invariant subspace and V = U ⊕ U ⊥ an orthogonal direct sum.

ˆ U ⊥ is T ∗ -invariant.

ˆ If dim V < ∞ and T is normal, then U ⊥ is also T -invariant (so that U is also T ∗ -invariant).


Proof.

ˆ If w ∈ U ⊥ , then for any u ∈ U , T (u) ∈ U and

⟨u, T ∗ (w)⟩ = ⟨T (u), w⟩ = 0

so that T ∗ (w) ∈ U ⊥ .

ˆ If T is normal and V is finite dimensional, we represent T by a block matrix with respect to


a basis of V = U ⊕ U ⊥ : !
A B
.
O C
Then T T ∗ = T ∗ T implies, on the upper left block:

AA∗ + BB∗ = A∗ A.

ˆ Note that the diagonal entries (BB∗ )ii is just ∥ri ∥2 where ri is the i-th row of B.

ˆ But Tr(AA∗ ) = Tr(A∗ A) (see Proposition 2.16).

ˆ Since trace is linear, Tr(BB∗ ) = 0, so that ∥ri ∥ = 0 for all i. i.e. B = O is a zero matrix.

ˆ Therefore T is block diagonal, and T = T |U ⊕ T |U ⊥ is its invariant subspace decomposition.

8.2 Cayley–Hamilton Theorem

For the rest of the chapter, let dim V = n over K be finite dimensional, and let A ∈ Mn×n (K) be
a matrix representing T ∈ L (V, V ).

Recall that the characteristic polynomial of T is a polynomial over K of degree n in λ given by

p(λ) = det(A − λI)

and it does not depend on the choice of A (matrices representing T for different bases are similar).

Proposition 8.7. If U ⊂ V is a T -invariant subspace, then the characteristic polynomial p|U (λ)
of T |U divides p(λ). i.e.
p(λ) = p|U (λ)q(λ)
for some polynomial q(λ).

Proof. By Proposition 8.5 (2), represent T by a block triangular matrix, and apply Corollary 3.8.


We can now state the following fundamental result.

Theorem 8.8 (Cayley–Hamilton Theorem). If p(λ) is the characteristic polynomial of T , then

p(T ) = O.

Proof. Let v ∈ V be any vector. Let U = Uv be the cyclic subspace of T , which is T -invariant.

ˆ First we show that p|U (T )(v) = 0.


– By Proposition 8.3, U has a basis B = {v, T (v), ..., T r−1 (v)} for some 1 ≤ r ≤ n.
– Hence T r (v) = −a0 v − a1 T (v) − · · · − ar−1 T r−1 (v) for some scalar ai ∈ K.
– This means that with respect to B, T |U is represented by the matrix
 
0 0 ··· 0 −a0
 
1 0 · · · 0 −a1 
 
 .. 
A = 0 1
 0 . −a2  .
. . . .
 .. .. .. .. 

 
0 ··· 0 1 −ar−1
– A direct calculation (e.g. using Laplace expansion) shows that
p|U (λ) = det(A − λI) = (−1)r (λr + ar−1 λr−1 + · · · + a1 λ + a0 ).
– Hence p|U (T )(v) = 0 by the equation of T r (v) above.
ˆ Now by Proposition 8.7, p(λ) = p|U (λ)q(λ) for some polynomial q(λ).
ˆ Since p|U (T )(v) = 0, we have p(T )(v) = 0.
ˆ Since p(T ) does not depend on v, this is true for any v ∈ V , so p(T ) is the zero map.
The Cayley–Hamilton Theorem gives another way to calculate the inverse of a matrix.
 
3 0 0
 
Example 8.8. Let A = 0 1 1. Then
0 0 1
det(A − λI) = (1 − λ)2 (3 − λ) = 3 − 7λ + 5λ2 − λ3 .
In particular λ = 0 is not an eigenvalue, so A−1 exists. By the Cayley–Hamilton Theorem, we have
3I − 7A + 5A2 − A3 = O.
Therefore 3I = 7A − 5A2 + A3 .
Multiplying both sides by A−1 we obtain
3A−1 = 7I − 5A + A2 .
This gives the inverse of A easily by usual matrix multiplication only.
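A quick numerical check of Example 8.8 (NumPy assumed):

```python
import numpy as np

# Cayley-Hamilton gives 3 A^{-1} = 7I - 5A + A^2 for this matrix.
A = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
lhs = 3 * np.linalg.inv(A)
rhs = 7 * np.eye(3) - 5 * A + A @ A
print(np.allclose(lhs, rhs))   # True
```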


8.3 Minimal Polynomials

By the Cayley–Hamilton Theorem, if p(λ) is the characteristic polynomial of T , then

p(T ) = O.

Although this polynomial tells us about the eigenvalues (and their multiplicities), it is sometimes
too “big” to tell us information about the structure of the linear map.

Definition 8.9. The minimal polynomial m(λ) is the unique polynomial such that

m(T ) = O

with leading coefficient 1, and has the smallest degree among such polynomials.

The condition “leading coefficient equals 1” also means we exclude the case of zero polynomial.
Since p(T ) = O, a minimal polynomial must exist.

To see it is unique: If we have different minimal polynomials m, m′ , then m(T ) − m′ (T ) = O,


but since m, m′ have the same degree with the same leading coefficient, m − m′ is a polynomial
with smaller degree satisfying the condition (after rescaling), unless m − m′ is identically zero.

Note. Since m(λ) is defined in terms of T only, it is the same for any matrix representing T .

Since m(λ) has the smallest degree, in particular we have

deg(m) ≤ deg(p) = n.

 
2 0 0
 
Example 8.9. The diagonal matrix A = 
0 2 0 has characteristic polynomial

0 0 2

p(λ) = (2 − λ)3

but obviously A − 2I = O, hence the minimal polynomial of A is just

m(λ) = λ − 2.

In particular,

The minimal polynomial m(λ) of A has degree 1 if and only if A is a multiple of I.


 
1 0 0
 
Example 8.10. The diagonal matrix A = 
0 2 0 has characteristic polynomial

0 0 2

p(λ) = (1 − λ)(2 − λ)2 .

Since A is not a multiple of I, m(λ) has degree at least 2. Since (A−I)(A−2I) = O, the polynomial

m(λ) = (λ − 1)(λ − 2)

having degree 2 is the minimal polynomial.

 
1 1 0
 
Example 8.11. The matrix A = 
 0 1 1  has characteristic polynomial

0 0 1

p(λ) = (1 − λ)3

and it turns out that the minimal polynomial is (up to a sign) the same also:

m(λ) = (λ − 1)3 .

From the above examples, we also observe that

Proposition 8.10. If p(λ) is any polynomial such that p(T ) = O, then m(λ) divides p(λ), i.e.

p(λ) = m(λ)q(λ)

for some polynomial q(λ).

In particular, the minimal polynomial divides the characteristic polynomial.

Proof.

ˆ We can do a polynomial long division

p(λ) = m(λ)q(λ) + r(λ)

where r(λ) is the remainder with deg(r) < deg(m).

ˆ Since p(T ) = O and m(T ) = O, we must have r(T ) = O.

ˆ But since deg(r) < deg(m) and m is minimal, r must be the zero polynomial.


Proposition 8.11. The set of roots of m(λ) consist of all the eigenvalues of T .

Proof.

ˆ Since m(λ) divides the characteristic polynomial p(λ), it only has eigenvalues as roots.

ˆ On the other hand, if v is an eigenvector with eigenvalue µ, then m(T ) = O implies

0 = m(T )v = m(µ)v.

ˆ Since v ̸= 0, we have m(µ) = 0, so µ is a root of m(λ).

The minimal polynomial of T allows us to decompose V into T -invariant subspaces.

Theorem 8.12 (Primary Decomposition Theorem). If we have a factorization

m(λ) = p1 (λ)p2 (λ) · · · pk (λ)

where pi (λ) and pj (λ) are relatively prime (i.e. have no common factors) for i ̸= j, then

V = Ker(p1 (T )) ⊕ Ker(p2 (T )) ⊕ · · · ⊕ Ker(pk (T )).

Note that each Ker(pi (T )) is a T -invariant subspace.

Proof. The case k = 1 is trivial, while the general case follows directly by induction.
Hence we only need to prove the case k = 2.

ˆ By the Euclidean algorithm, there exists polynomials q1 (λ), q2 (λ) such that

1 = p1 (λ)q1 (λ) + p2 (λ)q2 (λ),

i.e.
I = p1 (T )q1 (T ) + p2 (T )q2 (T ).

ˆ Applying to any v ∈ V , we define v1 , v2 by:

$$v = \underbrace{p_1(T)q_1(T)(v)}_{=:~v_2} + \underbrace{p_2(T)q_2(T)(v)}_{=:~v_1}.$$


ˆ We have V = Ker(p1 (T )) + Ker(p2 (T )):


– We check that

p1 (T )(v1 ) = p1 (T )p2 (T )q2 (T )(v) = m(T )q2 (T )(v) = 0

so that v1 ∈ Ker(p1 (T )). Similarly v2 ∈ Ker(p2 (T )).

ˆ We have Ker(p1 (T )) ∩ Ker(p2 (T )) = {0}:


– If u ∈ Ker(p1 (T )) ∩ Ker(p2 (T )), then by above,

u = q1 (T )p1 (T )(u) + q2 (T )p2 (T )(u) = 0 + 0 = 0.

ˆ Therefore V = Ker(p1 (T )) ⊕ Ker(p2 (T )).

We arrive at one of the most useful criterion of diagonalization.

Theorem 8.13. T is diagonalizable if and only if m(λ) only has distinct linear factors.

Proof.

(⇐=) If m(λ) = (λ − λ1 ) · · · (λ − λk ) for distinct λi , by the Primary Decomposition Theorem,

V = Ker(T − λ1 I ) ⊕ · · · ⊕ Ker(T − λk I ).

But Ker(T − λi I ) = Vλi is just the eigenspace of T . So we have decomposed V into direct
sums of eigenspace (in particular V has an eigenbasis), hence T is diagonalizable.

(=⇒) If T is diagonalizable, let {u1 , ..., un } be an eigenbasis of V with distinct eigenvalues µ1 , ..., µk .
Then m(λ) = (λ − µ1 ) · · · (λ − µk ) is clearly the smallest polynomial containing all µi as roots,
and
m(T )ui = m(λi )ui = 0
since λi = µj for some j. So m(λ) is the minimal polynomial by Proposition 8.11.

Using this result, minimal polynomials allow us to determine whether a matrix is diagonalizable or
not without even calculating the eigenspaces!
!
−1 1
Example 8.12. The matrix A = has characteristic polynomial p(λ) = (λ − 1)2 . Since
−4 3
m(λ) ̸= λ − 1 because A ̸= I, we must have m(λ) = (λ − 1)2 , hence A is not diagonalizable.


 
−1 1 0
  2
Example 8.13. The matrix A =  −4 3 0 has characteristic polynomial p(λ) = −λ(λ − 1) ,
−1 0 0
hence it has eigenvalues λ = 1 and λ = 0. The minimal polynomial can only be λ(λ−1) or λ(λ−1)2 .
Since
A(A − I) ̸= O
the minimal polynomial must be m(λ) = λ(λ − 1)2 , hence A is not diagonalizable.

 
2 −2 2
  2
Example 8.14. The matrix A =  0 −2 4 has characteristic polynomial p(λ) = −λ(λ − 2) ,

0 −2 4
hence it has eigenvalues λ = 2 and λ = 0. The minimal polynomial can only be λ(λ−2) or λ(λ−2)2 .
Since
A(A − 2I) = O
the minimal polynomial is m(λ) = λ(λ − 2), hence A is diagonalizable.
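The criterion is easy to test by hand or numerically; a sketch of Example 8.14 (NumPy assumed):

```python
import numpy as np

# A(A - 2I) = O, so m(lambda) = lambda(lambda - 2) has distinct linear factors
# and A is diagonalizable.
A = np.array([[2.0, -2.0, 2.0],
              [0.0, -2.0, 4.0],
              [0.0, -2.0, 4.0]])
print(np.allclose(A @ (A - 2 * np.eye(3)), np.zeros((3, 3))))   # True
```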

8.4 Spectral Theorem of Commuting Operators

One of the most important results in Linear Algebra says that commuting linear operators can be
simultaneously diagonalized, i.e. they can be diagonalized at the same time by a single basis.

First we derive some more properties of invariant subspaces.

Proposition 8.14. Let V be an inner product space and U ⊂ V be a T -invariant subspace.

ˆ If T is diagonalizable, then T |U is also diagonalizable.

ˆ If T is normal, then T |U is also normal.

Proof.

ˆ If T is diagonalizable, m(λ) only has distinct linear factors. T |U satisfies m(T |U ) = 0 also, so
m|U (λ) divides m(λ) by Proposition 8.10, and m|U (λ) also only has distinct linear factors.

ˆ By Proposition 8.6, T = T |U ⊕ T |U ⊥ , so the diagonal block A representing T |U satisfies


AA∗ = A∗ A and is normal.
(Alternatively, also follows from the proof of Proposition 8.6 itself since B = O.)


Theorem 8.15 (Spectral Theorem of Commuting Operators). Let {Ti }ki=1 be a finite set of
diagonalizable linear transformations in L (V, V ). Then

Ti Tj = Tj Ti , ∀i, j

if and only if they can be simultaneously diagonalized, i.e. there exists a single basis B of V
such that they are eigenvectors for all Ti .

If in addition each Ti is normal, then they can be simultaneously unitarily diagonalized.

Proof. Assume the Ti ’s commute. We proceed by induction, where the k = 1 case is trivial.

ˆ Let V = Vλ1 ⊕ · · · ⊕ VλN be an eigenspace decomposition of Tk .


(If Tk is normal, this is an orthogonal decomposition.)

ˆ Each Vλj is Ti -invariant:

v ∈ Vλj =⇒ Tk Ti (v) = Ti Tk (v) = λj Ti (v) =⇒ Ti (v) ∈ Vλj .

ˆ By Proposition 8.14, for any j the restriction

Ti |Vλj , i = 1, ..., k − 1

are diagonalizable (and normal), and commute with each other.

ˆ Hence by induction, there exists a simultaneous (orthonormal) eigenbasis of Ti on Vλj for


i = 1, ..., k − 1.

ˆ But Vλj is eigenspace of Tk . Therefore this is a simultaneous eigenbasis for all Ti for i = 1, ..., k.

ˆ Collecting all the basis vectors for different Vλj gives us the basis B.

On the other hand, if they can be simultaneously diagonalized, let {u1 , ..., un } be a common
eigenbasis. Then

Ti Tj uk = λTi uk = λλ′ uk
Tj Ti uk = λ′ Tj uk = λ′ λuk

for some eigenvalues λ, λ′ . Hence Ti Tj = Tj Ti on a basis, hence on the whole space.


Corollary 8.16. The Spectral Theorem of Commuting Operators also holds for an infinite collec-
tion {Ti } of commuting operators.

Proof. Since L (V, V ) is finite dimensional, every Ti is a linear combination of a finite basis of
operators. Hence we just need to apply the Spectral Theorem to the finite set.

In terms of matrices, we can rephrase this in a more familiar form:

Corollary 8.17 (Spectral Theorem of Commuting Matrices). Let {Ai } be a collection of


commuting matrices that are diagonalizable. Then they can be simultaneously diagonalized, i.e.
there exists a single invertible matrix P such that PAi P−1 is diagonal for all i.

If all the Ai are normal, then P can be chosen to be a unitary matrix.
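A small numerical sketch of the simplest instance of Corollary 8.17 (NumPy assumed): here B is a polynomial in A, so the two matrices automatically commute and any orthonormal eigenbasis of A diagonalizes both.

```python
import numpy as np

# Simultaneous diagonalization of two commuting symmetric matrices A and B = A^2 + A.
A = np.array([[ 3.0, -2.0,  4.0],
              [-2.0,  6.0,  2.0],
              [ 4.0,  2.0,  3.0]])
B = A @ A + A
print(np.allclose(A @ B, B @ A))      # True: they commute
_, P = np.linalg.eigh(A)              # orthonormal eigenbasis of A
for M in (A, B):
    D = P.T @ M @ P
    print(np.allclose(D, np.diag(np.diag(D))))   # True: diagonal in the common basis
```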

CHAPTER 9

Canonical Forms

We are now ready to construct the canonical forms of a matrix. This completely determines the
structure of a given matrix. It is also the best approximation to diagonalization if the matrix is
not diagonalizable.

In this chapter, let dim V < ∞ and T ∈ L (V, V ).

9.1 Nilpotent Operators

Assume the minimal polynomial of T completely factorizes (e.g. when K = C):

m(λ) = (λ − λ1 )m1 · · · (λ − λk )mk

for distinct eigenvalues λi . The Primary Decomposition Theorem says that

V = Ker(T − λ1 I )m1 ⊕ · · · ⊕ Ker(T − λk I )mk .

Definition 9.1. The invariant subspace of the form

Ker(T − λI )m , λ ∈ C, m ∈ N

is called a generalized eigenspace.

Therefore it suffices to study the restriction of T to generalized eigenspaces.


Definition 9.2. A linear operator T ∈ L (V, V ) is nilpotent if

Tm = O

for some positive integer m.

Example 9.1. Any upper triangular matrix with 0’s on the diagonal is nilpotent.

Let V = Ker(T − λI )m where m is the smallest possible (i.e. the exponent of m(λ)).

Then S = T − λI is nilpotent. Note that S-invariant also means T -invariant.

It remains to find a basis such that S looks simple.

Proposition 9.3. Let v ∈ V be a vector such that S k v = 0 but S k−1 v ̸= 0.

Let Uv be the S-invariant cyclic subspace generated by v.

ˆ dim(Uv ) = k with basis {S k−1 v, ..., Sv, v}, called a Jordan chain of size k.

ˆ For r ≤ k, Ker(S r ) ∩ Uv is spanned by the first r basis vectors.

Note. S k−1 v is an eigenvector of T .

Note. If v ∈ Ker(S), then {v} is a Jordan chain of size 1.

Proof. If c0 v + c1 Sv + · · · + ck−1 S k−1 v = 0, applying S repeatedly shows that ci = 0 ∀i.


The order of the basis is chosen such that S|Uv is represented by an upper triangular matrix:
 
S = \begin{pmatrix} 0 & 1 & & & \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ & & & & 0 \end{pmatrix} .


Definition 9.4. The matrix representing T = S + λI restricted to Uv is given by the Jordan


block of size k × k and eigenvalue λ:
 
J_λ^{(k)} := \begin{pmatrix} λ & 1 & & & \\ & λ & 1 & & \\ & & \ddots & \ddots & \\ & & & λ & 1 \\ & & & & λ \end{pmatrix}

The following is our main result, which shows that V can be decomposed into cyclic subspaces.

Theorem 9.5. V admits a basis of Jordan chains, i.e. a basis of the form

B = {S k1 −1 v1 , ..., v1 } ∪ {S k2 −1 v2 , ..., v2 } ∪ · · · ∪ {S kN −1 vN , ..., vN }

for some integers N, ki ≥ 1 and some vi ∈ V satisfying Proposition 9.3.

In other words, in terms of T -invariant cyclic subspaces,

V = Uv1 ⊕ · · · ⊕ UvN
where T |Uvi is represented by the Jordan block J_λ^{(k_i)} .

Proof. We prove by induction on dim V . If dim V = 1 it is trivial.

ˆ Since S is nilpotent, it is not invertible:

0 = det(Sk ) = det(S)k =⇒ det(S) = 0

so Im(S) ⊊ V has smaller dimension.

ˆ By induction, there exists Jordan chain basis of Im(S):


Bu = ⋃_{i=1}^{N} {S^{ki −1} ui , ..., ui }

for some ui ∈ Im(S). Our goal is to extend Bu to a Jordan chain basis of V .

ˆ Since ui ∈ Im(S), there exists vi ∈ V such that Svi = ui .


Note that
{S ki −1 ui , ..., ui } = {S ki vi , ..., Svi }.


ˆ We add vi to our basis, and show that the collection of Jordan chains
Bv = ⋃_{i=1}^{N} {S^{ki} vi , ..., Svi , vi }
is linearly independent:
– If Σ_{i,j} aij S^j vi = 0, applying S gives us a vanishing linear combination of the basis
vectors in Bu , hence all aij = 0 except possibly when j = ki , where the terms are killed
by S. Rename ai := ai,ki .
– We are left with Σ_i ai S^{ki} vi = 0, which is Σ_i ai S^{ki −1} ui = 0, again a vanishing
linear combination of the basis vectors in Bu . Hence all ai = 0.
ˆ Finally we extend Bv to a basis B of V by possibly adding some vectors w1 , ..., wk ∈ V .
However, this is not a Jordan chain yet.
ˆ Swi ∈ Im(S) = Span(Bu ). But any vector from Bu is obtained from Bv by applying S. Hence
there exists w′i ∈ Span(Bv ) such that
Swi = Sw′i .

ˆ Modifying wi 7→ wi − w′i , we have wi ∈ Ker(S), hence {wi } is a Jordan chain of size 1.


ˆ Since w′i ∈ Span(Bv ), the span does not change. Therefore the collection
B = ⋃_{i=1}^{N} {S^{ki} vi , ..., Svi , vi } ∪ {w1 } ∪ · · · ∪ {wk }
is our required Jordan chain basis.

This decomposition is unique up to permutations of the blocks:

Proposition 9.6. The combination of Jordan blocks is uniquely determined by T .

Proof. Ker(S k ) are T -invariant subspaces with


{0} ⊂ Ker(S) ⊂ · · · ⊂ Ker(S m ) = V.
By construction, the first k basis vectors of each block together span the space Ker(S k ).

Let dk = dim Ker(S k ). If Nk is the number of Jordan blocks of size k × k, then


N1 + N2 + · · · + Nm = d1
N2 + · · · + Nm = d2 − d1
..
.
Nm = dm − dm−1 .
Therefore all Nk , and hence the combination of Jordan blocks is uniquely determined by T .
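In practice the block sizes can be read off mechanically from the dimensions dk = dim Ker(S^k ). A small computational sketch (assuming sympy; the nilpotent matrix S below is a made-up example with one chain of size 2 and one of size 1):

    from sympy import Matrix

    S = Matrix([[0, 1, 0],
                [0, 0, 0],
                [0, 0, 0]])       # nilpotent: S**2 = 0
    n = S.rows

    # d[k] = dim Ker(S^k), with d[0] = 0 for convenience.
    d = [0]
    k = 0
    while d[-1] < n:              # terminates since S is nilpotent
        k += 1
        d.append(n - (S**k).rank())

    # r[k] = d[k] - d[k-1] counts the Jordan blocks of size >= k,
    # so N[k] = r[k] - r[k+1] is the number of blocks of size exactly k.
    r = [d[j] - d[j - 1] for j in range(1, len(d))] + [0]
    N = {j + 1: r[j] - r[j + 1] for j in range(len(r) - 1)}
    print(N)                      # {1: 1, 2: 1}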


9.2 Jordan Canonical Form

Combining all general eigenspaces, we can now state the main theorem:

Theorem 9.7 (Jordan Canonical Form). There exists a basis of V such that T is represented
by
J := J_{λ1}^{(k1)} ⊕ · · · ⊕ J_{λN}^{(kN)}
where the λi consist of all the eigenvalues of T (λi with different indices may repeat!).

In other words, any A ∈ Mn×n (C) is similar to a Jordan Canonical Form J.

This decomposition is unique up to permuting the order of the Jordan blocks.

Note. An orthonormal basis for this decomposition may not exist.

Since eigenvalues, characteristic polynomials, minimal polynomials, and multiplicity etc. are all the
same for similar matrices, if we can determine the Jordan block from these data, we can determine
the Jordan Canonical Form of a matrix A.

Notation. From now on we normalize the characteristic polynomial as p(λ) = det(λI − T ) instead.

Let us first consider a single block.

Example 9.2. The Jordan block J_{λ1}^{(k)} has

ˆ only one eigenvalue λ1 ,

ˆ characteristic polynomial (λ − λ1 )k ,

ˆ minimal polynomial (λ − λ1 )k ,

ˆ geometric multiplicity of λ1 is 1.

Now let us combine several blocks with the same eigenvalue λ1 :


Example 9.3. The matrix J_{λ1}^{(k1)} ⊕ · · · ⊕ J_{λ1}^{(kN)} has

ˆ only one eigenvalue λ1 ,


ˆ characteristic polynomial (λ − λ1 )k1 +···+kN ,
ˆ minimal polynomial (λ − λ1 )max(k1 ,...,kN ) ,
ˆ geometric multiplicity of λ1 is N .


Now we can do the same analysis by combining different Jordan blocks and obtain:

Proposition 9.8. Given a matrix A in the Jordan canonical form:


ˆ The eigenvalues λ1 , ..., λk are the entries of the diagonal.

ˆ The characteristic polynomial is

p(λ) = (λ − λ1 )n1 · · · (λ − λk )nk

where the algebraic multiplicity ni is the number of occurrences of λi on the diagonal.

ˆ The minimal polynomial is

m(λ) = (λ − λ1 )m1 · · · (λ − λk )mk

where mi is the size of the largest λi -block in A.

ˆ The geometric multiplicity of λi is the number of λi -blocks in A.

Example 9.4. Assume A is a 6 × 6 matrix with characteristic polynomial


p(λ) = (λ − 2)4 (λ − 3)2
and minimal polynomial
m(λ) = (λ − 2)2 (λ − 3)2
with eigenspaces dim V2 = 3 and dim V3 = 1. Then it must have 3 blocks of λ = 2 with maximum
block-size 2 so that the λ = 2 blocks add up to 4 rows. It also has 1 block of λ = 3 with block-size 2.
Hence
A ∼ J_2^{(2)} ⊕ J_2^{(1)} ⊕ J_2^{(1)} ⊕ J_3^{(2)} .
 
A ∼ \begin{pmatrix} 2 & 1 & & & & \\ 0 & 2 & & & & \\ & & 2 & & & \\ & & & 2 & & \\ & & & & 3 & 1 \\ & & & & 0 & 3 \end{pmatrix} .

The uniqueness of Jordan Canonical Form says that A is also similar to the matrix where the
Jordan blocks are in different order. For example we can have:
 
\begin{pmatrix} 3 & 1 & & & & \\ 0 & 3 & & & & \\ & & 2 & & & \\ & & & 2 & 1 & \\ & & & 0 & 2 & \\ & & & & & 2 \end{pmatrix} .
This is simply obtained by permuting the basis.
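In practice one can let a computer algebra system carry out this bookkeeping. A minimal sketch (assuming sympy; the matrices J and P below are my own illustrative choices, not taken from the text):

    from sympy import Matrix, simplify, zeros

    # Build a matrix with a known Jordan form, then recover that form.
    J = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 3]])
    P = Matrix([[1, 1, 0],
                [0, 1, 1],
                [1, 0, 1]])
    A = P * J * P.inv()

    Q, JA = A.jordan_form()                  # A = Q * JA * Q**-1
    assert simplify(Q * JA * Q.inv() - A) == zeros(3, 3)
    print(JA)        # the same blocks as J, possibly listed in a different order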


Example 9.5. Let A be a matrix such that it has characteristic polynomial

p(λ) = λ4 (λ − 1)3 (λ − 2)3

and minimal polynomial m(λ) = λ3 (λ − 1)2 (λ − 2).


With this information only, we can determine
A ∼ J_0^{(3)} ⊕ J_0^{(1)} ⊕ J_1^{(2)} ⊕ J_1^{(1)} ⊕ J_2^{(1)} ⊕ J_2^{(1)} ⊕ J_2^{(1)} .
 
 0 1 0 
 0 0 1 
 
 0 0 0 
 

 0 

1 1
 
A∼
.

 0 1 
 

 1 


 2 


 2 

2

It turns out that when the matrix is bigger than 6 × 6, sometimes we cannot determine the
Jordan Canonical Form just by knowing p(λ), m(λ) and the dimension of the eigenspaces only:

Example 9.6. Consider a 7 × 7 matrix A. Let p(λ) = λ7 , m(λ) = λ3 , and dim V0 = 3. Then A
has 3 blocks and the largest block has size 3. So it may be similar to
J_0^{(3)} ⊕ J_0^{(3)} ⊕ J_0^{(1)} or J_0^{(3)} ⊕ J_0^{(2)} ⊕ J_0^{(2)} .

However, by the uniqueness of Jordan Canonical Form, we know that these two are not similar to
each other, but we cannot tell which one is similar to A just from the given information.

Example 9.7. If each eigenvalue corresponds to a unique block, we can find the basis P such that
A = PJP−1 by the following:

(1) For each eigenvector v1 = v with eigenvalue λ, solve for vi such that (T − λI )vi = vi−1 until
no solutions can be found.
(2) The collection {v1 , v2 , ..., vk } will be the basis corresponding to a Jordan block J_λ^{(k)} .

(3) Repeat with all eigenvectors.

However, this method does not work in general if we have multiple blocks with the same eigenvalue.
In this case we need to find the basis as Jordan chains directly, following the proof of Theorem 9.5.
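For instance, here is a minimal sympy sketch of step (1) for a single 2 × 2 block (the matrix A and the particular choices below are illustrative assumptions, not data from the text):

    from sympy import Matrix, eye

    A = Matrix([[2, 1],
                [0, 2]])           # lone eigenvalue 2, eigenvector v1 = (1, 0)^T
    lam = 2
    v1 = Matrix([1, 0])

    # Solve (A - lam*I) v2 = v1 for a generalized eigenvector v2.
    v2, params = (A - lam * eye(2)).gauss_jordan_solve(v1)
    v2 = v2.subs({p: 0 for p in params})    # fix the free parameter to 0

    P = v1.row_join(v2)            # columns = the Jordan chain {v1, v2}
    print(P.inv() * A * P)         # the Jordan block [[2, 1], [0, 2]]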


Note that the Jordan Canonical Form is constructed only with a given minimal polynomial, which
always exists since dim L (V, V ) = n^2 (so that {I, T, T^2 , · · · , T^{n^2} } must be linearly dependent).

Therefore it allows us to prove many previous results that work for similar matrices.

Theorem 9.9. Let A be a complex square matrix.

ˆ Cayley–Hamilton Theorem holds.

ˆ A is diagonalizable if and only if m(λ) only has linear factors.

ˆ A is similar to an upper triangular matrix.

ˆ A is similar to AT .

Proof. We illustrate the proof of the last statement. Write A in terms of the Jordan Canonical
Form J.
ˆ Each k × k block J_λ^{(k)} is similar to its transpose by the permutation matrix

S_k = \begin{pmatrix} & & 1 \\ & ⋰ & \\ 1 & & \end{pmatrix} = S_k^{-1}

(the matrix with 1’s on the anti-diagonal and 0 elsewhere).

ˆ Combining different blocks, we conclude that J = SJT S−1 for some S, hence
A ∼ J ∼ JT ∼ AT .

9.3 Rational Canonical Form

Let dim V < ∞ over K, and T ∈ L (V, V ) be our linear transformation.



If m(λ) does not factorize completely into linear factors (e.g. when K = Q, the polynomial x^2 − 2
cannot be factorized since √2 ∉ Q), we cannot apply the Jordan Canonical Form. However, there is
another form one can reduce to, called the Rational Canonical Form.

Formally speaking,

ˆ Jordan Canonical Form writes V as a direct sum of as many cyclic subspaces as possible.
ˆ Rational Canonical Form writes V as a direct sum of as few cyclic subspaces as possible.

Pick any vector v ∈ V , and form the T -invariant cyclic subspace Uv .

Then from the proof of Cayley–Hamilton Theorem, we have already deduced that:


Proposition 9.10. Let B := {v, T v, ..., T r−1 v} be a basis of Uv . Then T |Uv is represented by a
matrix

C(g) := \begin{pmatrix} 0 & 0 & \cdots & 0 & -a_0 \\ 1 & 0 & \cdots & 0 & -a_1 \\ 0 & 1 & \cdots & 0 & -a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & -a_{r-1} \end{pmatrix}
called the companion matrix of g(λ) where

g(λ) := p|Uv (λ) = λr + ar−1 λr−1 + · · · + a1 λ + a0

is the characteristic and minimal polynomial of T |Uv .

In particular, g(λ) = p|Uv (λ) divides the characteristic polynomial p(λ) of T on V .

Note that the characteristic and minimal polynomials of T |Uv must be the same, since the minimal
polynomial must have degree r, for otherwise B could not be linearly independent.

It turns out that by picking these cyclic vectors one by one “smartly”, one can decompose V
into cyclic subspaces, where the factors g(λ) of p(λ), called the invariant factors obey certain
conditions. The “largest factor” will be the minimal polynomial m(λ).

Note that all the distinct irreducible factors of p(λ) should be a factor of m(λ) since m(λ) contains
all the (complex) eigenvalues as roots.
Theorem 9.11 (Rational Canonical Form). V can be decomposed into T -invariant cyclic
subspaces
V = Uv1 ⊕ · · · ⊕ Uvk
such that if gi (λ) is the characteristic and minimal polynomial of T |Uvi , then they satisfy
ˆ gi (λ) divides gi+1 (λ),
ˆ p(λ) = g1 (λ) · · · gk (λ),
ˆ m(λ) = gk (λ).
The collection of invariant factors {g1 (λ), ..., gk (λ)} is uniquely determined by T .

In terms of the decomposition, T is represented by block diagonal matrix:


 
A = \begin{pmatrix} C(g_1) & & O \\ & \ddots & \\ O & & C(g_k) \end{pmatrix}
where C(gi ) are the companion matrices of gi (λ).


Both the Jordan and rational canonical forms are special cases of the “Structure Theorem for
Finitely Generated Modules over a Principal Ideal Domain”, belonging to the branch of
module theory in advanced abstract algebra.

Nevertheless, for completeness, I tried to translate the proof of the Theorem 9.11 purely in terms
of linear algebra and put it in Appendix D for the advanced students who are interested.

Example 9.8. Assume A is a matrix such that

p(λ) = λ4 (λ − 1)3 (λ − 2)3


m(λ) = λ(λ − 1)2 (λ − 2).

With this information, we can conclude that:

ˆ A has size 10 × 10 since deg p = 10.

ˆ The invariant factors must be of the form:

λ, λ(λ − 2), λ(λ − 1)(λ − 2), λ(λ − 1)2 (λ − 2)

in order to satisfy the conditions.

By expanding, we have

g1 (λ) = λ
g2 (λ) = λ(λ − 2) = λ2 − 2λ
g3 (λ) = λ(λ − 1)(λ − 2) = λ3 − 3λ2 + 2λ
g4 (λ) = λ(λ − 1)2 (λ − 2) = λ4 − 4λ3 + 5λ2 − 2λ.

Hence the rational canonical form of A is given by


 
A ∼ \begin{pmatrix}
0 & & & & & & & & & \\
 & 0 & 0 & & & & & & & \\
 & 1 & 2 & & & & & & & \\
 & & & 0 & 0 & 0 & & & & \\
 & & & 1 & 0 & -2 & & & & \\
 & & & 0 & 1 & 3 & & & & \\
 & & & & & & 0 & 0 & 0 & 0 \\
 & & & & & & 1 & 0 & 0 & 2 \\
 & & & & & & 0 & 1 & 0 & -5 \\
 & & & & & & 0 & 0 & 1 & 4
\end{pmatrix} .
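As a sanity check, one can assemble the companion matrices of the invariant factors and confirm that their block diagonal sum has the expected characteristic polynomial. A minimal sketch (assuming sympy; the helper companion() is my own and follows the convention of Proposition 9.10):

    from sympy import Matrix, Mul, symbols, zeros

    lam = symbols('lambda')

    def companion(g):
        # Companion matrix C(g) of a monic polynomial g(lam):
        # 1's on the subdiagonal, -a_i in the last column.
        c = g.as_poly(lam).all_coeffs()       # [1, a_{r-1}, ..., a_1, a_0]
        r = len(c) - 1
        C = zeros(r, r)
        for i in range(1, r):
            C[i, i - 1] = 1
        for i in range(r):
            C[i, r - 1] = -c[r - i]
        return C

    gs = [lam,                                # invariant factors of Example 9.8
          lam*(lam - 2),
          lam*(lam - 1)*(lam - 2),
          lam*(lam - 1)**2*(lam - 2)]
    A = Matrix.diag(*[companion(g) for g in gs])

    p = A.charpoly(lam).as_expr()
    assert (p - Mul(*gs)).expand() == 0       # p(λ) = g1(λ) · · · gk(λ)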

CHAPTER 10

Quotient and Dual Spaces

10.1 Quotient Spaces


In mathematics, we often encounter situations where we want to “ignore” some properties and treat
objects with the same properties as “the same” or “equivalent”:
ˆ In number theory, we often only care about the remainder of an integer under division by
some integer n, i.e. we don’t care about the multiples of n, or we only consider integers “up
to a multiple of n”. This is the modular arithmetic.
ˆ When we differentiate, two functions that differ by a constant give us the same derivative, i.e.
we don’t care about functions that differ by a constant, or we only consider functions “up to
addition of a constant”. This is evident in the suggestive notation of the indefinite integral:
∫ f (x) dx = F (x) + C.

ˆ When we integrate, two functions that differ at finitely many values give us the same integral,
i.e. we don’t care about functions that differ at finitely many values, or we only consider
functions “up to finitely many changes in values” (more generally, up to a measure zero set).

This means that we often only consider equivalence classes of objects sharing the same properties.

In Linear Algebra, such situation happens usually when we consider direct sum:

V = U ⊕ W.

By uniqueness of direct sum decomposition, sometimes we don’t care about the U part, and only
want to focus our attention on the W part. However there are many choices for W , so just taking
any W coordinates is not a well-defined process. If V is an inner product space, then we have the
orthogonal complement U ⊥ . But in general we do not have such canonical choice.


Still, we know that whatever W we chose, they will have the same dimension. Therefore we want
to define canonically a vector space that behaves just like W for any choice of W .

To do the “ignore the U part” procedure, we use the same idea as above, i.e. by considering two
vectors to be equivalent if they differ by an element in U . We give the following definition:

Definition 10.1. Let U ⊂ V be a subspace.

ˆ We define an equivalence relation on V such that for any v, w ∈ V :

v ∼U w ⇐⇒ v − w ∈ U.

For any w ∈ V , we denote the equivalence class by w̄, i.e. the subset of V given by

w̄ := {v ∈ V : v − w ∈ U } = w + U

(sometimes called the U -coset).

ˆ We call any vector w0 ∈ V a representative of w̄ if

w̄0 = w̄.

By definition, for any w1 , w2 ∈ V ,

w̄1 = w̄2 ⇐⇒ w1 + U = w2 + U ⇐⇒ w1 − w2 ∈ U.

ˆ We denote the set of equivalence classes by

V /U := V / ∼U .

It turns out we can define addition and scalar multiplication on V /U , making it a vector space:

Theorem 10.2. Let U ⊂ V be subspace. Then

ˆ V /U is a vector space with binary operations given by:

v̄ + w̄ := (v + w) + U, v, w ∈ V.
c · v̄ := (c · v) + U, c ∈ K, v ∈ V.

ˆ The zero element is given by 0̄ ∈ V /U .

We call V /U the quotient space of V by U .

Note. For any u ∈ U , ū ∈ V /U is also the zero element: ū = 0̄ ⇐⇒ u − 0 ∈ U .


The formula relies on representatives v, w of v̄ and w̄, and applies the operations in the original
vector space V . But as we know, an equivalence class may have different representatives:

v̄′ = v̄, w̄′ = w̄,

so we might as well pick another pair v′ , w′ to calculate with.

Therefore, to see that the operations are well-defined, we need to check that:

If v′ , w′ ∈ V are other representatives of v̄, w̄ respectively, then we still have

(v′ + w′ ) + U = (v + w) + U
(c · v′ ) + U = (c · v) + U.

Proof. If v̄′ = v̄ and w̄′ = w̄, then

(v′ + w′ ) − (v + w) = (v′ − v) + (w′ − w) ∈ U

and
(c · v′ ) − (c · v) = c · (v′ − v) ∈ U.
Hence both sides define the same equivalence classes.

Since the operations are now well-defined, the vector space axioms of V /U follow from the vector
space axioms of V , e.g.
v̄ + w̄ = (v + w) + U = (w + v) + U = w̄ + v̄
and so on.

From the definition, we immediate have the following result:

Proposition 10.3. Let U ⊂ V be a subspace. The map

π:V −→ V /U
v 7→ v̄

is a surjective linear transformation with kernel Ker(π) = U .

It is called the quotient map.

Conversely, if T is a surjective linear map, then two vectors are mapped to the same point if and
only if they differ by an element in Ker(T ). So we can identify Im(T ) with the vectors in V “up to Ker(T )”.

More precisely, we have the following fundamental theorem.


Theorem 10.4 (First Isomorphism Theorem). If T : V −→ W is a linear transformation,


then we have a canonical isomorphism

V /Ker(T ) ≃ Im(T )
v̄ 7→ T (v).

Note. Recall that canonical means “does not depend on basis”.

Proof. There are several things to check, which are all straightforward.

ˆ The map is well-defined:


– If v′ is another representative of v̄, then

T (v′ ) − T (v) = T (v′ − v) = 0

since v′ − v ∈ Ker(T ).

ˆ The map is linear:

ū + c · v̄ = (u + c · v) + U 7→ T (u + c · v) = T (u) + c · T (v).

ˆ The map is an isomorphism:


– Surjective: If w ∈ Im(T ), then ∃v ∈ V such that T (v) = w, and v̄ is the preimage.
– Injective: If T (v) = 0, then v ∈ Ker(T ), hence v̄ = 0̄.

Remark. As the name suggested, in fact there are also Second, Third and Fourth Isomorphism The-
orems. Also the construction of quotients and the Isomorphism Theorems work as well for groups, rings,
modules, fields, topological spaces etc.

This motivates Category Theory, which studies the common features of all mathematical objects in a
universal setting, including the universal properties below.

As an immediate consequence, by Example 2.11 we have

Corollary 10.5. If V = U ⊕ W , then we have a canonical isomorphism

W ≃ V /U

under the projection map π : V −→ W .


Remark. For integers, q = a/b is a quotient if a = bq. Recall that as a set U ⊕ W ≃ U × W . Hence

V = U ⊕ W ≃ U × V /U

gives the set theoretical explanation of the name “quotient” space.

Since dim U + dim W = dim V , we have (also true for infinite dimensional)

Corollary 10.6 (Dimension Formula).

dim V /U = dim V − dim U.

Combining with First Isomorphism Theorem, it gives another proof of the Rank-Nullity Theorem.

In addition, if V is an inner product space:

Corollary 10.7. If V = U ⊕ U ⊥ , then


ˆ We have
U ⊥ ≃ V /U
under the projection map V −→ U ⊥ .

ˆ The isomorphism provides V /U with an inner product given by

⟨v̄, w̄⟩ := ⟨v⊥ , w⊥ ⟩.

ˆ The composition of maps


V −→ V /U ≃ U ⊥ ,→ V
is just the orthogonal projection ProjU ⊥ : V −→ V .

Note. Although U ⊥ and V /U look the same (isomorphic), they are different vector spaces:

ˆ U ⊥ is a subspace of V .

ˆ V /U is not a subspace of V .
We already encountered this phenomenon very early on, in Non-Example 1.26:

Example 10.1. If U = {(0, 0, z)^T : z ∈ R} ⊂ R^3 , then U ⊥ = {(x, y, 0)^T : x, y ∈ R} ⊂ R^3 is a
subspace, but R^3 /U ≃ R^2 is not a subspace.


Example 10.2. Let


V := {(a1 , a2 , a3 , ...) : ai ∈ K}
be the vector spaces of sequences in K, and

U := {(0, a2 , a3 , ...) : ai ∈ K} ⊂ V

be the subspace of sequences with a1 = 0. Then the linear map


V −→ K
(a1 , a2 , a3 , ...) 7→ a1

is surjective with kernel U , hence by the First Isomorphism Theorem


V /U ≃ K.

Intuitively, V /U means “ignoring all the terms except the first one”.
Note that V /U is not the same as the subspace

W := {(a1 , 0, 0, ...) : a1 ∈ K} ⊂ V

although V = U ⊕ W and W ≃ V /U ≃ K.
This Example also shows that dim V /U can be finite even though dim V = dim U = ∞.

Example 10.3. Another useful example comes from analysis. Let R([0, 1]) be the space of real-
valued Riemann integrable functions on [0, 1]. Then the bilinear form
⟨f, g⟩ := ∫_0^1 f (x)g(x) dx

is not really an inner product: ⟨f, f ⟩ = 0 ̸=⇒ f (x) = 0


since we know there exist nonzero Riemann integrable functions with

∫_0^1 |f (x)|^2 dx = 0.

However, if we “ignore” all these functions, namely if we consider the quotient space
V := R([0, 1])/Z([0, 1])

where
Z([0, 1]) := {f ∈ R([0, 1]) : ∫_0^1 |f (x)|^2 dx = 0}
then the bilinear form becomes an inner product on V .

Remark. However, for this to be a Hilbert space, we need Lebesgue integrable functions instead, such
that the L2 -completeness condition of Hilbert space is satisfied.


Quotient spaces are very useful for induction on dimension: If we know information about a
subspace U ⊊ V and V /U which are of smaller dimensions, we can construct relevant things on V .

One common trick is illustrated in the proof of the following version of the Basis Extension
Theorem for quotient.

Theorem 10.8. Let dim V < ∞ and U ⊂ V be a subspace with {u1 , ..., ur } a basis of U .

ˆ If V /U has a basis {v̄1 , ..., v̄m }, then

{u1 , ..., ur , v1 , ..., vm } ⊂ V

is a basis of V for any representatives v1 , ..., vm ∈ V of the v̄i .

ˆ Conversely, if {u1 , ..., ur , v1 , ..., vm } is a basis of V , then {v̄1 , ..., v̄m } is a basis for V /U .

Proof. By the Dimension Formula, dim V = r + m. So we only need to check that the bases are
linearly independent. This follows from

d1 v̄1 + · · · + dm v̄m = 0̄
⇐⇒ d1 v1 + · · · + dm vm ∈ U
⇐⇒ c1 u1 + · · · + cr ur + d1 v1 + · · · + dm vm = 0

for some ci ∈ K.
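Concretely, given U ⊂ K^n one can carry out this basis extension by row reduction. A small sketch (assuming sympy; the subspace U below is a made-up example):

    from sympy import Matrix, eye

    # U = span of u1, u2 inside V = Q^4; extend a basis of U to a basis of V.
    u1 = Matrix([1, 0, 1, 0])
    u2 = Matrix([0, 1, 1, 0])
    M = u1.row_join(u2).row_join(eye(4))   # columns: u1, u2, e1, e2, e3, e4

    pivots = M.rref()[1]                   # pivot columns form an independent set
    basis_V = [M[:, j] for j in pivots]    # starts with u1, u2
    extra   = basis_V[2:]                  # their classes form a basis of V/U
    print(extra)                           # two vectors remain (e1 and e4), so dim V/U = 2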
Finally, we look at linear maps, which completes the description of Proposition 8.5.

Theorem 10.9. Let T ∈ L (V, W ). Let U ⊂ V and U ′ ⊂ W be subspaces such that T (U ) ⊂ U ′ .


Then we have an induced linear map on quotients:

T̄ : V /U −→ W/U ′
v̄ 7→ T (v) + U ′ .

In particular, if T ∈ L (V, V ) and U is T -invariant, then we have a map

T̄ : V /U −→ V /U.

Furthermore, if dim V < ∞ and T is represented by a block matrix

A = \begin{pmatrix} A_U & ∗ \\ O & A_{V/U} \end{pmatrix}

then T̄ is represented by A_{V/U} with respect to the basis constructed in Theorem 10.8.

Intuitively speaking, taking quotient by U means “killing” all the contributions of vectors from U .


Proof. Again we need to check the map is well-defined, i.e. if v′ is another representative of v̄, then

T (v′ ) + U ′ = T (v) + U ′ ,

i.e. T (v′ ) − T (v) = T (v′ − v) ∈ U ′ , which is true since v′ − v ∈ U .

The basis constructed by Theorem 10.8 gives A the required upper triangular block form.

Taking U ′ = {0} in the Theorem, we see that V /U is a canonical object:

Theorem 10.10 (Universal Property of Quotient). Let U ⊂ V be a subspace and T ∈


L (V, W ).

Then U ⊂ Ker(T ) if and only if there exists a unique T̄ ∈ L (V /U, W ) such that

T = T̄ ◦ π.

We say that T factorizes through the quotient V /U .

We can express this in a commutative diagram: the map T : V −→ W is the composition of the
quotient map π : V −→ V /U followed by T̄ : V /U −→ W .

Finally, we illustrate the use of quotient spaces by rewriting the proof of Schur’s Lemma:

Any T ∈ L (V, V ) with dimC V < ∞ can be represented by an upper triangular matrix.

Proof. Pick an eigenvector v1 ∈ V as before, and let U = Span(v1 ) ⊂ V .

ˆ By induction, pick a basis {v̄2 , ..., v̄n } of V /U such that T̄ : V /U −→ V /U is upper triangular.

ˆ Then T with respect to the basis {v1 , v2 , ..., vn } is upper triangular:

T̄ (v̄i ) = T (vi ) + U ∈ Span(v̄2 , ..., v̄i ) =⇒ T (vi ) ∈ Span(v1 , v2 , ..., vi ).

ˆ By Corollary 10.7, we can make the basis orthonormal as well.


10.2 Dual Spaces

In mathematics, sometimes two seemingly different concepts have exactly the same mathematical
structures. This is known as duality. To give some examples:

ˆ In R3 , the concept of straight lines and planes are dual to each other. Both of them are
specified by two data:
– Line: a point and a direction vector.
– Plane: a point and a normal vector.
In fact, given a normal vector n ∈ R3 , any plane is specified by an equation of the form

n · x = c, x ∈ R3 , c ∈ R.

As a special case, the set of “all lines through 0”, and the set of “all planes containing 0”,
have exactly the same geometry known as projective spaces.
ˆ In R2 , the concept of points and straight lines are dual to each other.
– Intersection of 2 lines ←→ a line joining 2 points.
– concurrent lines (intersecting at 1 point) ←→ collinear points (lying on 1 line).
The study of the combinatorics of lines and points is a special case of incidence geometry.
ˆ In multivariable calculus, interpreting with differential forms, the gradient and divergence
operations are dual to each other

f 7→ ∇f F 7→ ∇ · F
scalar function −→ vector field ←→ vector field −→ scalar function

while curl is self-dual:

F 7→ ∇ × F
vector field −→ vector field

This is a special case of Hodge duality and is important in electromagnetic theory.


ˆ Consider the space V = Rn [t] of degree ≤ n polynomials. For any p ∈ Rn [t], we can apply
the evaluation maps Evi ∈ L (V, R),

Evi : p 7→ p(i), i = 0, 1, ..., n

to get the values


{p(0), p(1), ..., p(n)}.
Conversely, knowing the values uniquely determine the polynomial p(t) by the Lagrange
Interpolation Formula. In this sense, we have a duality

{evaluation maps} ←→ {polynomials}

It is easy to check that the set of evaluation maps {Ev0 , Ev1 , ..., Evn } forms a basis of L (V, R).


The last example motivates the definition of dual space in Linear Algebra.

Definition 10.11. Let V be a vector space over K.

ˆ The dual space of V is defined to be the vector space

V ∗ := L (V, K).

ˆ An element ϕ ∈ L (V, K) is called a linear functional .

Example 10.4. If V = K n , then V ∗ is just the space of 1 × n matrices (row vectors).

More generally, if dim V = n with basis B = {u1 , ..., un }, then any ϕ ∈ V ∗ is represented by
 
[ϕ]B = ϕ(u1 ) · · · ϕ(un )

with respect to B.

Since any element in L (V, K) is determined by its image on the basis vectors, we have

Proposition 10.12. Let dim V = n and B = {u1 , ..., un } be a basis of V . Then the linear
functionals u∗i ∈ V ∗ defined by
u∗i (uj ) := \begin{cases} 1 & i = j \\ 0 & i ̸= j \end{cases} ,    i, j = 1, ..., n

form a basis of V ∗ , called the dual basis with respect to B. In particular

dim V ∗ = dim V.

 
Note. For any v ∈ V , if [v]B = (c1 , ..., cn )^T ∈ K^n is the coordinate vector with respect to B, then

u∗i (v) = ci .
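Concretely, in coordinates the dual basis is computed by a matrix inversion. A small sketch (assuming sympy; the basis B below is a made-up example):

    from sympy import Matrix, eye

    # If the basis vectors of V = Q^3 are the columns of B, then the dual basis
    # functionals u_i^* are represented by the rows of B**-1, since
    # (row i of B^-1) * (column j of B) = delta_ij.
    B = Matrix([[1, 1, 0],
                [0, 1, 1],
                [0, 0, 1]])
    dual = B.inv()                 # row i represents u_i^*
    assert dual * B == eye(3)

    v = Matrix([3, 4, 5])
    print(dual * v)                # coordinates c_i = u_i^*(v); here (4, -1, 5)^T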

Since the dimensions are the same, we have an isomorphism V ≃ V ∗ , but different bases of V give
different isomorphisms. However, the situation gets better if V is a (real) inner product space, since
we can obtain the coordinates by taking inner product (Proposition 4.13).


Proposition 10.13. Let V be an inner product space over K = R with dim V = n.


Then we have a canonical isomorphism

V ≃ V∗
v 7→ v∗

where for any u ∈ V , the linear functional is defined by


v∗ (u) := ⟨u, v⟩.

In particular any linear functional is of the form of an inner product.

Notation. The notation is consistent with the adjoint (Definition 7.4) when K = R.

Remark. When K = C, the situation is more complicated since the above map is conjugate-linear. We
have to modify here and there with complex conjugates instead and is quite troublesome, so we omit it.

Proof. Since dim V = dim V ∗ , we only need to check injectivity of the map.
If v∗ is the zero functional, i.e. ⟨u, v⟩ = 0 for all u ∈ V , then ⟨v, v⟩ = 0 =⇒ v = 0.

Example 10.5. Any bilinear form f (u, v) on V also defines a linear map

V −→ V ∗
v 7→ v∗

where for any u ∈ V , v∗ (u) := f (u, v). In general, if f (u, v) is non-degenerate, i.e.

f (u, v) = 0 ∀u ∈ V =⇒ v = 0

then the map V −→ V ∗ is injective (and hence isomorphism if dim V < ∞).

Remark. If dim V = ∞, in general we only have injectivity V ,→ V ∗ but not surjectivity.


Consider the space of real-valued smooth functions C ∞ [−π, π] with inner product
⟨f, g⟩ := ∫_{−π}^{π} f (x)g(x) dx.

Then the linear functional given by the evaluation map

f (x) 7→ f (0)

cannot be written as ⟨f, δ⟩ for some smooth function δ(x).


The “not-a-function” δ(x) written formally in this way is called the Dirac delta functional .
The study of infinite dimensional dual spaces is known as distribution theory.


Given a linear map T ∈ L (V, W ), we can also define its dual :

Definition 10.14. Let T ∈ L (V, W ). The dual linear map T ∗ ∈ L (W ∗ , V ∗ ) is defined by

T ∗ : W ∗ −→ V ∗
ϕ 7→ ϕ◦T

i.e. if ϕ ∈ W ∗ , then T ∗ ϕ is a linear functional on V defined by evaluating

(T ∗ ϕ)(v) := ϕ(T v).

Again the notation is compatible with the adjoint (i.e. transpose) defined in Chapter 7:

Proposition 10.15. Let V, W be finite dimensional. Under the canonical isomorphisms V ≃ V ∗


and W ≃ W ∗ , the dual linear map
T ∗ : W ∗ −→ V ∗
is the same as the adjoint map
T ∗ : W −→ V.

In other words, we have the commutative diagram:


T ∗ : W ∗ −→ V ∗ (top row), T ∗ : W −→ V (bottom row),
with the canonical isomorphisms W ≃ W ∗ and V ≃ V ∗ as the vertical maps.
In particular, all the properties for the adjoint (Proposition 7.3) are satisfied for the dual maps.

Proof. For w ∈ W , assume T ∗ (w) = v ∈ V .

ˆ If w ∈ W corresponds to w∗ ∈ W ∗ , then T ∗ w∗ ∈ V ∗ is given by evaluating for any u ∈ V ,


(T ∗ w∗ )(u) = w∗ (T u) = ⟨T u, w⟩ = ⟨u, T ∗ w⟩ = ⟨u, v⟩ = v∗ (u).

ˆ In other words, T ∗ w∗ = v∗ , which corresponds to v ∈ V .

Corollary 10.16 (Fundamental Theorems of Linear Algebra). We have

ˆ Rank of T = Rank of T ∗ .

ˆ dim Ker(T ∗ ) + dim V = dim Ker(T ) + dim W .

ˆ T injective ⇐⇒ T ∗ surjective.

ˆ T surjective ⇐⇒ T ∗ injective.


If dim V < ∞ (inner product not needed), taking double dual naturally identifies with itself:

Theorem 10.17. If dim V < ∞ over any field K, then there is a canonical isomorphism

V ≃ V ∗∗
v 7→ v∗∗

where v∗∗ evaluating on a linear functional ϕ ∈ V ∗ is given by

v∗∗ (ϕ) := ϕ(v).

Proof. Since dim V = dim V ∗ = dim V ∗∗ , we just need to check the map is injective.

ˆ If v∗∗ is the zero functional on V ∗ , then v∗∗ (ϕ) = ϕ(v) = 0 for any ϕ ∈ V ∗ .

ˆ By Proposition 10.12, evaluating with the dual basis shows that the coordinates of v are all 0.

Finally, the notions of subspaces and quotients are dual to each other:

Theorem 10.18. Let U ⊂ V be a subspace.

ˆ We have a canonical isomorphism

(V /U )∗ ≃ U 0 ⊂ V ∗

where the subspace U 0 is the annihilator of U given by

U 0 := {ϕ ∈ V ∗ : ϕ(U ) = 0}.

ˆ In particular, if dim V < ∞, then

dim U + dim U 0 = dim V.

ˆ Conversely, if dim V < ∞, any subspace of V ∗ is the annihilator of some subspace of V ,


i.e. it is isomorphic to the dual of a unique quotient of V .

The idea is that

Subspaces ←→ Injective maps U ,→ V .


Quotients ←→ Surjective maps V ↠ V /U .

Hence taking dual will flip them around by Corollary 10.16.


Proof. Let U ⊂ V be a subspace:

ˆ We have a composition of injective and surjective maps:


U −−ι−→ V −−π−→ V /U

where π ◦ ι = O. (This is known as a short exact sequence.)

ˆ Taking dual, by Corollary 10.16, we have composition of injective and surjective maps:

(V /U )∗ −−π∗−→ V ∗ −−ι∗−→ U ∗

so that (V /U )∗ is naturally identified under π ∗ with a subspace Im(π ∗ ) of V ∗ .

ˆ To see that Im(π ∗ ) = U 0 :


– If ϕ ∈ (V /U )∗ , then (π ∗ ϕ)(u) = ϕ(u) = 0 and hence Im(π ∗ ) ⊂ U 0 .
– On the other hand, if ϕ ∈ U 0 , then by the Universal Property of Quotient,

ϕ = ψ ◦ π = π∗ψ

for a unique linear map ψ ∈ (V /U )∗ . Hence U 0 ⊂ Im(π ∗ ).

ˆ Also easy to check that Ker(ι∗ ) = U 0 , hence we have a canonical isomorphism U ∗ ≃ V ∗ /U 0 .

If dim V < ∞, we have

dim U 0 = dim(V /U )∗ = dim V /U = dim V − dim U.

Conversely, if dim V < ∞ and W is a subspace of V ∗ :

ˆ Again we have a composition of injective and surjective maps:


W −−ι−→ V ∗ −−π−→ V ∗ /W.

ˆ By the proof above, W 0 = Im(π ∗ ) ⊂ V ∗∗ and W ∗ ≃ V ∗∗ /W 0 .

ˆ Under the isomorphism V ≃ V ∗∗ , we have

W 0 := {v∗∗ ∈ V ∗∗ : v∗∗ (ϕ) = 0 for all ϕ ∈ W } ⊂ V ∗∗


≃ U := {v ∈ V : ϕ(v) = 0 for all ϕ ∈ W } ⊂ V.

ˆ Hence we have canonical isomorphisms W ∗ ≃ V ∗∗ /W 0 ≃ V /U .

ˆ Taking dual again gives W ≃ (V /U )∗ .

ˆ Since W ⊂ U 0 , the dimension formula shows that dim W = dim U 0 so they are in fact equal.

APPENDIX A

Equivalence Relations

We give the definition of equivalence classes, which is a fundamental mathematical construction


and allows us to define the quotient spaces.

Let S be a set.

Definition A.1. A binary relation of S is a subset R ⊂ S × S.

For any x, y ∈ S, we write


x ∼ y ⇐⇒ (x, y) ∈ R
and say that x is related to y by R.

Definition A.2. A binary relation ∼ is an equivalence relation if it is

(1) Reflexive: x ∼ x.
(2) Symmetric: x ∼ y =⇒ y ∼ x.
(3) Transitive: x ∼ y and y ∼ z =⇒ x ∼ z.

If ∼ is an equivalence relation, the equivalence class of x ∈ S is the subset

[x] := {z ∈ S : z ∼ x} ⊂ S.

The set of all equivalence classes is denoted by S/ ∼.

Equivalence classes partition S into different “groups”. More precisely,


Proposition A.3. Let ∼ be an equivalence relation. Then

ˆ For any x, y ∈ S, either


[x] = [y] or [x] ∩ [y] = ∅.
In other words,
[x] = [y] ⇐⇒ x ∼ y.

ˆ S is a disjoint union of equivalence classes:


S = ⋃_{x∈S} [x] = ⊔_{[x]∈S/∼} [x].

Proof. It follows from straightforward set manipulation.

ˆ If [x] ∩ [y] ̸= ∅, let z ∈ [x] ∩ [y]. Then z ∼ x and z ∼ y. By symmetry (2), x ∼ z.

ˆ If w ∈ [x], then by transitivity (3),

w ∼ x ∼ z ∼ y =⇒ w ∼ y

so that w ∈ [y], i.e. [x] ⊂ [y].

ˆ Similarly [y] ⊂ [x], so [x] = [y].

It is clear that S is a union of equivalence classes since by reflexivity (1), x ∈ [x] ⊂ S.

The previous statement shows that the equivalence classes are either equal or disjoint.

Note. All the properties (1)–(3) of an equivalence relation are needed in the proof!

Example A.1. If S = Z and x ∼ y ⇐⇒ x − y is a multiple of n ∈ N, then

[r] = {m ∈ Z : m = kn + r for some k ∈ Z} ⊂ Z

i.e.
(S/ ∼) = {[0], [1], ..., [n − 1]}
is identified with the remainders of division by n.

APPENDIX B

Euclidean Algorithm

We know that given positive integers a > b, we can divide to obtain

a = qb + r, 0≤r<b

where q is the quotient and r is the remainder .

Using this, Euclid in his book Elements, stated a simple algorithm to calculate the greatest
common divisor (gcd) of a, b, i.e. the largest number d such that both a, b are multiples of d.

Theorem B.1 (Euclidean Algorithm). The gcd is calculated by successively taking quotient
and remainder:

a = q1 b + r1
b = q2 r1 + r2
r1 = q3 r2 + r3
···
rn−2 = qn rn−1 + rn
rn−1 = qn+1 rn

where the process must stop since {rk } is a strictly decreasing sequence of positive integers.

Then d = rn is the greatest common divisor .


Proof.

ˆ By (backward) induction, we see that d = rn divides rk for all k, including r0 := b and


r−1 := a.

ˆ By (forward) induction, we see that any common factors of a, b must divide rk for all k,
including rn = d.

Now we can express r1 as an integer combination of a, b.

Then r2 can be expressed as integer combination of a, b and so on... We conclude that

Theorem B.2 (Bézout’s Identity). Given a, b ∈ N, there exists integers m, n ∈ Z such that

d = ma + nb

where d = gcd(a, b).

Example B.1. Consider a = 3131, b = 1111. Then

3131 = 2 · 1111 + 909


1111 = 1 · 909 + 202
909 = 4 · 202 + 101
202 = 2 · 101.

Therefore d = 101.

Reverting the calculations, we get

101 = 909 − 4 · 202


= 909 − 4 · (1111 − 1 · 909)
= 5 · 909 − 4 · 1111
= 5 · (3131 − 2 · 1111) − 4 · 1111
= 5 · 3131 − 14 · 1111.
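The back-substitution can also be folded into the algorithm itself. A short Python sketch (my own implementation, reproducing Example B.1):

    def extended_gcd(a, b):
        # Return (d, m, n) with d = gcd(a, b) and d = m*a + n*b.
        if b == 0:
            return a, 1, 0
        d, m, n = extended_gcd(b, a % b)
        # d = m*b + n*(a % b) = n*a + (m - (a // b)*n)*b
        return d, n, m - (a // b) * n

    print(extended_gcd(3131, 1111))   # (101, 5, -14): 101 = 5*3131 - 14*1111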

The same logic applies to polynomial division over any field K: Given polynomial a(t), b(t) ∈ K[t]
with deg a ≥ deg b,
a(t) = q(t)b(t) + r(t), deg r < deg b
where q(t) is the quotient and r(t) is the remainder .


Using exactly the same algorithm, we have

Theorem B.3 (Euclidean Algorithm / Bézout Identity). If a(t), b(t) ∈ K[t], then there exists
polynomials p(t), q(t) ∈ K[t] such that

d(t) = p(t)a(t) + q(t)b(t)

where d(t) is the greatest common divisors (i.e. with the largest degree) of a(t) and b(t), defined
up to a scalar multiple.

Remark. Euclidean algorithm may not work if the coefficients are not from a field, because we may not
be able to do long division.

Example B.2. Consider a(t) = t5 + 2t4 + t3 − t2 − 2t − 1 and b(t) = t4 + 2t3 + 2t2 + 2t + 1.

By long division, we obtain

a(t) = t · b(t) + (−t3 − 3t2 − 3t − 1)


b(t) = (1 − t) · (−t3 − 3t2 − 3t − 1) + (2t2 + 4t + 2)
−t3 − 3t2 − 3t − 1 = −(t + 1)/2 · (2t2 + 4t + 2).
We see that d(t) is proportional to t2 + 2t + 1 = (t + 1)2 .

Reverting the calculations, we obtain:


(t + 1)2 = (1/2) b(t) − (1/2)(1 − t)(−t3 − 3t2 − 3t − 1)
= (1/2) b(t) − (1/2)(1 − t)(a(t) − t · b(t))
= (1/2)(t − 1)a(t) − (1/2)(t2 − t − 1)b(t).
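One can let sympy double-check the final identity (a small verification sketch; the sympy library is assumed):

    from sympy import symbols, expand, Rational

    t = symbols('t')
    a = t**5 + 2*t**4 + t**3 - t**2 - 2*t - 1
    b = t**4 + 2*t**3 + 2*t**2 + 2*t + 1

    # Bezout identity from Example B.2:
    lhs = Rational(1, 2)*(t - 1)*a - Rational(1, 2)*(t**2 - t - 1)*b
    assert expand(lhs - (t + 1)**2) == 0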

APPENDIX C

Complexification

Complexification is another way to study real transformations on a complex vector space, or


equivalently, matrices with real entries viewed as complex matrices.

The main goal is to give an explicit construction of an orthonormal basis to block diagonalize a
real normal matrix, which was stated after Example 7.9.

Throughout this Appendix, let V, W be vector spaces over K = R.

Definition C.1. The complexification of V is a vector space VC over C defined by:

VC = V × V

where

ˆ The element (u, v) ∈ V × V is written as u + iv

ˆ Addition is component-wise:

(u + iv) + (u′ + iv′ ) := (u + u′ ) + i(v + v′ ).

ˆ Scalar multiplication by a + ib ∈ C is given by

(a + ib)(u + iv) := (au − bv) + i(bu + av)

i.e. usual product of complex numbers.

ˆ Zero element is 0C := 0 + i0.

Intuitively, it just means that now we allow complex coefficients for our vector space V .


A subset S ⊂ V can naturally be considered as a subset of VC with zero imaginary component.

Proposition C.2. Let dimR V < ∞.

ˆ If B = {b1 , b2 , ..., bn } is a basis of V over R, then it is also a basis of VC over C.

ˆ In particular,
dimR V = dimC VC .

Proof. Check the definition of basis.

ˆ (Linearly independent.) If

c1 b1 + · · · + cn bn = 0, ck = ak + ibk ∈ C

then the real and imaginary parts give

a1 b1 + · · · + an bn = 0

b1 b1 + · · · + bn bn = 0
over R, which means ak , bk = 0 and hence ck = 0.

ˆ (Spanning) Any vector of VC is of the form

u + iv = (a1 b1 + · · · + an bn ) + i(b1 b1 + · · · + bn bn )

for some ak , bk ∈ R, and hence equals

c1 b1 + · · · + cn bn

for ck = ak + ibk ∈ C.

ˆ The dimension follows from |B| = dim V .

Next we define complexification of linear transformations.

Definition C.3. Let T ∈ L (V, W ). The complexification TC ∈ L (VC , WC ) is defined by

TC (u + iv) := T (u) + iT (v)

i.e. it just acts linearly with respect to complex scalars.


Some straightforward observations include:

Proposition C.4. If T is represented by a (real) matrix A with respect to bases B, B ′ of V, W ,


then TC is represented by the same real matrix with respect to the same bases of VC , WC .

Furthermore,

ˆ T and TC have the same characteristic and minimal polynomials.

ˆ λ is an eigenvalue of T if and only if it is a real eigenvalue of TC .

ˆ If λ is an eigenvalue of TC , then its complex conjugate λ̄ is also an eigenvalue of TC .

In other words,

Complexification means treating a matrix with real entries as a complex matrix.

Note that if TC has a real eigenvalue λ, then both the real and imaginary parts of its (complex)
eigenvector are also real eigenvectors, so that λ is really an eigenvalue of T .

We can now state the first result about real matrix.

Proposition C.5. Let dimR V < ∞. If T ∈ L (V, V ), then there exists a T -invariant subspace
U ⊂ V with dimR U = 1 or 2.

Recall from Definition 8.1 that a subspace U ⊂ V is T -invariant if T (U ) ⊂ U .

Proof. The result is trivial if dim V ≤ 2. Let TC ∈ L (VC , VC ) be the complexification.

ˆ Since K = C, there exists an eigenvalue λ = a + ib.

ˆ Let u + iv be an eigenvector with eigenvalue a + ib, i.e.

TC (u + iv) = T (u) + iT (v) = (a + ib)(u + iv) = (au − bv) + i(bu + av).

ˆ In other words,

T (u) = au − bv
T (v) = bu + av

so that the real span U = Span(u, v) ⊂ V is a T -invariant subspace with dimR U ≤ 2.

ˆ dimR U = 1 if u is a multiple of v, in which case b = 0 and they are both eigenvectors of T


with eigenvalue a ∈ R.


To study normal matrix, we first study the base case:

Proposition C.6. Let dimR V = 2. Any normal operator T ∈ L (V, V ) is either

ˆ symmetric, and hence orthogonally diagonalizable, or

ˆ it is represented by a matrix of the form


A = \begin{pmatrix} a & b \\ -b & a \end{pmatrix} ,    a, b ∈ R, b ̸= 0

with respect to any orthonormal basis of V .

Proof. Assume T is not symmetric. Let A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} represent T in an orthonormal basis.

ˆ Expanding AAT = AT A we get b2 = c2 and ab + cd = ac + bd.

ˆ Since A is not symmetric, b = −c ̸= 0 and hence b(a − d) = 0 which implies d = a.

We now state the main result, which gives us an explicit construction (by induction) of an orthonor-
mal basis to block diagonalize a real normal matrix (this is mentioned in Example 7.9).

Theorem C.7 (Spectral Theorem of Real Normal Matrices). Let A be a real normal matrix.
Then it is orthogonally similar to a block diagonal matrix of the form
 
\begin{pmatrix} λ1 & & \\ & \ddots & \\ & & λr \end{pmatrix} ⊕ \begin{pmatrix} a1 & b1 \\ -b1 & a1 \end{pmatrix} ⊕ · · · ⊕ \begin{pmatrix} am & bm \\ -bm & am \end{pmatrix}

where ak , bk , λk ∈ R.

The eigenvalues (with multiplicities) are given by λk and ak ± ibk ∈ C.


Proof. Let A represent a linear transformation T ∈ L (V, V ) in the standard basis.

The base case dim V = 1 is trivial, while we have shown the case dim V = 2 in Proposition C.6.
Hence assume dim V ≥ 3.

ˆ By Proposition C.5, there exists a 1 or 2 dimensional T -invariant subspace U .

ˆ By Proposition 8.6, U and U ⊥ are T -invariant subspaces.

ˆ Hence there exists an orthonormal basis such that T is represented in block diagonal form
A = \begin{pmatrix} A_U & O \\ O & A_{U ⊥} \end{pmatrix} .

ˆ If dim U = 1, we apply induction to AU ⊥ . Hence assume dim U = 2.

ˆ By Proposition 8.14, T |U is also normal, hence it is either orthogonally diagonalizable, or it


is of the required 2 × 2 block form.

ˆ U ⊥ has smaller dimension and T |U ⊥ is normal, hence by induction AU ⊥ is similar to the


required form in some orthonormal basis B ⊥ of U ⊥ .

ˆ Combining with the basis from U gives (up to permutation) the full orthonormal basis for V
that gives A the required form.
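As a quick numerical illustration (a minimal sketch assuming numpy; the matrix below is a made-up example), a plane rotation is a real normal matrix with no real eigenvalues, and its complex eigenvalues a ± ib carry exactly the data of one 2 × 2 block in Theorem C.7:

    import numpy as np

    a, b = 0.0, 1.0                       # rotation by 90 degrees
    A = np.array([[a, b],
                  [-b, a]])

    assert np.allclose(A @ A.T, A.T @ A)  # A is normal
    print(np.linalg.eigvals(A))           # approximately [0+1j, 0-1j] = a +/- i b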

APPENDIX D

Proof of Rational Canonical Forms

The existence and uniqueness of the Rational Canonical Form (as well as the Jordan Canon-
ical Form) are special cases of the “Structure Theorem for Finitely Generated Modules
over a Principal Ideal Domain”, belonging to the branch of module theory in advanced ab-
stract algebra.

Here I will “translate” the proof and present it in a self-contained way using only Linear Algebra.
Those who have studied ring and module theory may recognize the algebraic structures hidden
behind the proof, such as ideals, divisors, generators and submodules.

One may also get a “taste” of what to expect in advanced abstract algebra in the future.
The proofs below follow the one from Advanced Linear Algebra by S. Roman.

If you just want to get the main idea how the Rational Canonical Form is constructed,
you can safely skip the blue proofs, trust the results, and move on.

D.1 The Set Up

Let dim V < ∞ and T ∈ L (V, V ) be our linear transformation.


Our goal is to decompose V into a direct sum of cyclic subspaces of T .

Let the minimal polynomial be factorized into irreducible factors:

m(λ) = p1 (λ)m1 · · · pk (λ)mk .

Recall that if K is not algebraically closed, these polynomials may not be linear.


Recall the Primary Decomposition Theorem which says that

V = Ker(p1 (T )m1 ) ⊕ · · · ⊕ Ker(pk (T )mk )

as a direct sum of T -invariant subspaces.

Just like the proof of Jordan Canonical Form, we can first focus onto the case when

V = Ker(p(T )m )

and study the decomposition.

In that proof, we have split V into cyclic subspaces Uv of S := T − λI , which is also T -invariant.
However, if p(λ) is not linear, the subspaces may not be T -invariant anymore!

We need more refinement.

D.2 Decomposing V = Ker(p(T )m )

The first step is to decompose V = Ker(p(T )m ) into cyclic subspaces of T .


(This is actually the hardest step.)

Recall that m is chosen to be the smallest possible, so that by the definition of V , the minimal
polynomial of T is p(λ)m .

Proposition D.1. We have a direct sum decomposition

V = Uv1 ⊕ Uv2 ⊕ · · · ⊕ UvN

into cyclic subspaces of T , such that the minimal polynomial of T |Uvi is p(λ)ei with

m = e1 ≥ e2 ≥ · · · ≥ eN .

Since m(λ) = p(λ)m , there exists a vector v1 ∈ V such that p(T )m−1 v1 ̸= 0.
(Otherwise p(T )m−1 v = 0 for any v ∈ V , which means p(λ)m−1 is minimal instead!)

Let Uv1 be the cyclic subspaces of T generated by v1 . The idea is to show that

Lemma D.2. If p(T )m−1 v1 ̸= 0, then

V = Uv1 ⊕ W

for some T -invariant subspace W .


Then we can proceed by induction,

V = Uv1 ⊕ W
= Uv1 ⊕ Uv2 ⊕ W ′
···
= Uv1 ⊕ Uv2 ⊕ Uv3 ⊕ · · · ⊕ UvN

and complete the proof (the process must end since V is finite dimensional).

Definition D.3. Given vectors G = {v1 , ..., vk } ⊂ V , we define a subspace of V by

Uv1 ,...,vk := Uv1 + · · · + Uvk ,

the vector space sum (not necessarily direct). Note that Uvi is a subspace of Uv1 ,...,vk .

We call the set G the generator of Uv1 ,...,vk .

Note. A generator of V always exists (e.g. by taking a basis) but in general the set is much smaller.

Note. The definition really says Uv1 ,...,vk is spanned by vectors of the form {T i vj }, or in other
words, linear combinations of {pj (T )vj } for some polynomials pj (t).

With this terminology, we can now prove our Lemma.

Proof. (Lemma D.2) Step 1. Induction Idea

We proceed by induction on the number of generators.

ˆ If V = Uv1 , then W = {0} and it is trivial.

ˆ Assume V = Uv1 ,v2 ,...,vk ,u has minimal number of generators. By induction

Uv1 ,v2 ,...,vk = Uv1 ⊕ W0

for some T -invariant subspace W0 .

ˆ Note that W = W0 + Uu is T -invariant. Therefore if Uv1 ∩ W = {0} then we are done.


Unfortunately this is not always true, since Uv1 ∩ Uu ̸= {0} in general as u can be any vector.

ˆ But we may try to fix this by modifying u to be

u′ = u − α(T )v1

for some polynomial α(t). Note that we still have

V = Uv1 ,v2 ,...,vk ,u′ .


Step 2. The Equation to Find α(t)

Hence our new goal is to find α(t) such that W := W0 + Uu′ intersects Uv1 trivially.

ˆ In other words,
Uv1 ∩ (W0 + Uu′ ) = {0}
so that
V = Uv1 ⊕ W
and complete our construction.

ˆ The condition means for any w0 ∈ W0 and polynomial r(t),

w0 + r(T )u′ ∈ Uv1 =⇒ w0 + r(T )u′ = 0.

i.e.
w0 + r(T )(u − α(T )v1 ) ∈ Uv1 =⇒ w0 + r(T )(u − α(T )v1 ) = 0.
Rewriting, this means for any r(t),

r(T )u ∈ Uv1 ⊕ W0 =⇒ r(T )(u − α(T )v1 ) ∈ W0 . (⋆)

Step 3. Describe LHS of (⋆)

Now consider the set I of all polynomials r(t) such that

r(T )u ∈ Uv1 ⊕ W0 .

ˆ Note that u ∉ Uv1 ⊕ W0 = Uv1 ,...,vk , otherwise V can have one less generator.

ˆ By definition p(t)m ∈ I since it kills everything.

ˆ If r(t) ∈ I, then it must have a common factor with p(t)m .


– Otherwise by the Euclidean algorithm, there exists a(t), b(t) such that

1 = a(t)r(t) + b(t)p(t)m .

– Then 1 · u = u ∈ Uv1 ⊕ W0 , a contradiction.

ˆ Hence any r(t) ∈ I must be a multiple of p(t)d for some 1 ≤ d ≤ m as big as possible.

ˆ By the same Euclidean algorithm, p(t)d ∈ I as well.

This is a long way of saying that I = (pd ) is a principal ideal......


Step 4. Describe RHS of (⋆)

Now we know if r(T )u ∈ Uv1 ⊕ W0 , then r(t) = q(t)p(t)d for some polynomial q(t).

ˆ This means the RHS of (⋆) is

r(T )(u − α(T )v1 ) = q(T )p(T )d (u − α(T )v1 ).

ˆ Therefore if we can find α(t) such that p(T )d (u − α(T )v1 ) ∈ W0 then (⋆) is satisfied.

ˆ Since p(t)d ∈ I, by definition


p(T )d u = s(T )v1 + w0
for some polynomial s(t) and w0 ∈ W0 .

ˆ We show that p(t)d must divide s(t).


– We have
0 = p(T )m−d p(T )d u = p(T )m−d s(T )v1 + p(T )m−d w0 .

– Since we have direct sum Uv1 ⊕ W0 , we have p(T )m−d s(T )v1 = 0.
– Since T restricted to Uv1 has minimal polynomial p(t)m , it must divide p(t)m−d s(t).
– Hence s(t) must be a multiple of p(t)d .

ˆ Finally, take α(t) such that s(t) = α(t)p(t)d . Then

p(T )d (u − α(T )v1 ) = p(T )d u − s(T )v1 = w0 ∈ W0

is satisfied!

D.3 Combining cyclic subspaces

Therefore, now we have a decomposition of V into cyclic subspaces:

V = Ker(p1 (T )m1 ) ⊕ · · · ⊕ Ker(pk (T )mk )


= (Uv11 ⊕ · · · ⊕ Uv1N1 ) ⊕ · · · ⊕ (Uvk1 ⊕ · · · ⊕ UvkNk )

where each cyclic subspace Uvij has the same (see Proposition 9.10) characteristic and minimal
polynomials pi (λ)eij for some powers eij with mi = ei1 ≥ ei2 ≥ · · · ≥ eiNi ≥ 1.

Definition D.4. The polynomials pi (λ)eij are called the elementary factors.

Note. If each pi is linear, then elementary factors are precisely the minimal polynomials of each
Jordan blocks.


From this, in principle we can already write down the companion matrices of each pieces. However,
if the irreducible factors pi (λ) do not look simple, this matrix form is too complicated!

Therefore we want to use as few cyclic subspaces as possible. The observation is that

Direct sums of cyclic subspaces is a cyclic subspace if their characteristic polynomials


have no common factors.

More precisely,

Proposition D.5. Let p(λ) be the characteristic polynomial of Uv and q(λ) be that of Uw .

If p(λ) and q(λ) have no common factors, then

Uv+w = Uv ⊕ Uw

with characteristic and minimal polynomial p(λ)q(λ).

Proof. By definition Uv+w ⊂ Uv + Uw .

ˆ The sum is direct: Uv ∩ Uw = {0}.


– If u ∈ Uv ∩ Uw , it is killed by p(T ) and q(T ).
– By the Euclidean algorithm, there exists a(t), b(t) such that 1 = a(t)p(t) + b(t)q(t).
– Hence u is killed by I = a(T )p(T ) + b(T )q(T ), and must be the zero vector.

ˆ The minimal polynomial of Uv+w is p(λ)q(λ).


– We only need to check the cyclic vector v + w, since the space is generated by it.
– Obviously
p(T )q(T )(v + w) = 0.

– If it is not minimal, then since p(λ), q(λ) is coprime, WLOG assume v + w is killed by
p1 (T )q(T ) for some p1 (λ) with deg p1 < deg p.
– But then
0 = p1 (T )q(T )(v + w) = p1 (T )q(T )v,
so the minimal polynomial p(λ) divides p1 (λ)q(λ), which is impossible.

ˆ Since deg(pq) = deg p + deg q and cyclic subspace has the same characteristic and minimal
polynomial,
Uv+w and Uv ⊕ Uw
have the same dimension, so must be equal.


D.4 Invariant Factors

Now we write down the decomposition vertically: (Note: the rows may have different lengths)

V =Uv11 ⊕ Uv12 ⊕ · · · ⊕ · · · ⊕ Uv1N1 ⊕


Uv21 ⊕ Uv22 ⊕ · · · ⊕ · · · ⊕ · · · ⊕ Uv2N2 ⊕
..
.
Uvk1 ⊕ Uvk2 ⊕ · · · ⊕ UvkNk .

By previous section, the vertical sum (possibly skipping the terms that are missing) is cyclic:

Uv1j ⊕ · · · ⊕ Uvkj = Uwj

for some vectors wj with minimal polynomial

gj (λ) := p1 (λ)e1j · · · pk (λ)ekj

(take eij = 0 if the terms are missing).

In particular, the first column gives

g1 (λ) := p1 (λ)e11 · · · pk (λ)ek1 = p1 (λ)m1 · · · pk (λ)mk = m(λ).

We immediately see that

Proposition D.6. The invariant factors gi (λ) satisfies:

ˆ gi+1 (λ) divides gi (λ).

ˆ p(λ) = g1 (λ) · · · gN (λ) where N = max(N1 , ..., Nk ) (i.e. number of columns).

ˆ m(λ) = g1 (λ).

(Note that here the degrees of gi (λ) go from large to small, the opposite of the ordering in Theorem 9.11.

However it does not matter: we can always change the order of the basis to permute the blocks, so the
two forms are similar as matrices.)

This completes the construction of the Rational Canonical Form!


D.5 Uniqueness

To complete the proof, we show the uniqueness of invariant factors / elementary factors.

Proposition D.7. The list of invariant factors g1 (λ), ..., gk (λ) is unique.

Proof. We outline the idea. By the conditions on invariant factors, they are uniquely determined
by elementary factors.

Now since
V = Ker(p1 (T )m1 ) ⊕ · · · ⊕ Ker(pk (T )mk )
we only need to focus on a single invariant subspace. Hence let V = Ker(p(T )m ).

ˆ Assume we have two different cyclic decompositions:

V = Uu1 ⊕ · · · ⊕ UuM

with elementary factors pd1 , ..., pdM , and

V = Uw1 ⊕ · · · ⊕ UwN

with elementary factors pe1 , ..., peN .

ˆ Consider Im(p(T )). All those subspaces with di = 1 and ej = 1 are killed under p(T ).

ˆ Also T restricted to Im(p(T )) has minimal polynomial p(λ)m−1 .

ˆ By induction on the degree of m(λ), the list of elementary factors (which is just one degree
less of the original ones) is unique for Im(p(T )).

ˆ Hence we must also have the same number of di = 1 and ej = 1 subspaces to begin with to
match the dimension of V .
