Math 2131 Lecture Notes
Ivan Ip
This set of lecture notes summarizes the definitions, theorems, proof ideas and examples discussed
in class. It is an extended version of the Advanced Linear Algebra course I first taught at Kyoto
University in Fall 2017. Many topics have since been added, including the complete proofs of the
Jordan and Rational Canonical Forms with all the necessary prerequisites.
Although the course title includes “abstract algebra”, the usual theory of groups, rings and fields
will be left to the second part of the Honors series. We will only briefly introduce them in Section
1.1 to formally build up the necessary mathematical foundations, and this will be the only place
where abstract algebra is mentioned in this course.
Additional topics on quotient spaces and dual spaces are also added in the current version. These are
some of the most difficult concepts for beginning Mathematics major students, but still extremely
important. Therefore the expositions are slightly longer than the rest of the notes, to give as much
motivation and detail as possible.
Even though this course focuses on the abstract setting, in order to cater to a broader audience, we
also include some topics that are of interest to industry, especially in data science. These include
e.g. the QR decomposition, least squares approximation and singular value decomposition.
Finally, no Exercises are included in the notes because I am lazy. But students are expected to fill
in the necessary details of the proofs and examples in these notes, and look for exercises in other
reference textbooks or on the math.stackexchange website.
Fall 2020
Ivan Ip
HKUST
Preface 2
Special thanks to Wu Chun Pui (Mike) and Choy Ka Hei (Jimmy) for proofreading the previous
version of the lecture notes, fixing various typos, and offering helpful suggestions.
Fall 2023
Ivan Ip
HKUST
Introduction
We make linear approximations to real life problems, and reduce the problems to systems
of linear equations where we can then use the techniques from Linear Algebra to solve for
approximate solutions. Linear Algebra also gives new insights and tools to the original problems.
Roughly speaking,
Real Life Problems ←→ Linear Algebra
Data / Data Sets ←→ Vectors / Vector Spaces
Relationships between Data Sets ←→ Linear Transformations
In this course, we will focus on the axiomatic approach to linear algebra, and study the descrip-
tions, structures and properties of vector spaces and linear transformations.
With such a mindset, the usual theory of solving linear equations with matrices can be regarded
as a very straightforward specialization of the abstract setting.
Mathematical Notations
Numbers:
R : Real numbers
Q : Rational numbers
Logical symbols:
∀ : “for every”
∃ : “there exists”
=⇒ : “implies”
Sets:
S⊂X : S is a subset of X
X=Y : X ⊂ Y and Y ⊂ X
∅ : Empty set
X −→ Y : A map from X to Y
π : X ↠ Y : Surjection
Notational conventions in these lecture notes (I will try to stick with them most of the time, with a few exceptions):
• a, b, c, d scalars
• i, j, k, l (summation) indices (may start from zero)
• m, n, r dimension, natural numbers (always > 0 unless otherwise specified)
• f, g, h functions (usually in variable x)
• p, q, r polynomials (usually in variable t or λ)
• t, x, y, z variables
• u, v, w, ... (bold small letter) vectors
− ui , vj , wk ,... (unbold) coordinates
• ei standard basis vectors
• K base field (usually R or C)
• U, V, W vector spaces
(most of the time U ⊂ V and dim U = r, dim V = n, dim W = m)
• L (V, W ) the set of linear transformations from V to W
• S, T linear transformations
• I identity map
• O zero map
• Mm×n (K) the set of m × n matrices over the field K
• A, B, C, ... (bold capital letter) matrices
• I identity matrix
• O zero matrix
• B, E, S... (curly capital letter) bases, subsets
• λ, µ eigenvalues
Contents
1.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3 Determinants 45
3.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5 QR Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5 Spectral Theory 81
5.3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7 Complex Matrices 105
C Complexification 161
CHAPTER 1
Abstract Vector Spaces

1.1 Groups, Rings, Fields
In linear algebra, we study vector spaces over a field, i.e. a “number system” where you can add,
subtract, multiply and divide. To formally build up our theory, let us first define all the axioms
that we expect a number system to hold. (This is the only “abstract algebra” in this course.)
Definition 1.1. A field is a set K together with two binary operations
+ : K × K −→ K (addition)
∗ : K × K −→ K (product)
satisfying the following axioms: for all a, b, c ∈ K,
(+1) (a + b) + c = a + (b + c),
(+2) a + b = b + a,
(+3) there exists an element 0 ∈ K such that a + 0 = a,
(+4) for every a ∈ K there exists −a ∈ K such that a + (−a) = 0,
(∗1) (a ∗ b) ∗ c = a ∗ (b ∗ c),
(∗2) there exists an element 1 ∈ K such that a ∗ 1 = 1 ∗ a = a,
(∗3) for every a ≠ 0 there exists a^{-1} ∈ K such that a ∗ a^{-1} = a^{-1} ∗ a = 1,
(∗4) a ∗ b = b ∗ a,
(∗5) a ∗ (b + c) = a ∗ b + a ∗ c and (a + b) ∗ c = a ∗ c + b ∗ c.
Notation. Usually the symbol ∗ will be omitted when we write the product.
Examples of fields: Q, R, C.
Q(√2): the set of all real numbers of the form a + √2 b where a, b ∈ Q.
In abstract algebra, one usually studies mathematical structures that satisfy only some of the
axioms. This leads to the notion of groups and rings.
Definition 1.2. A set with a single operation satisfying (∗1)–(∗3) only is called a group.
A group satisfying in addition (∗4) is called an abelian group.
The two notations correspond via
a + b ←→ a ∗ b ,   0 ←→ 1 ,   −a ←→ a^{-1} .
Usually we prefer using + for abelian groups and ∗ for general groups.
abelian: Z (under +), Z/nZ (under + mod n), {e^n : n ∈ Z} (under ×).
Definition 1.3. A set with two operations satisfying (+1)–(+4) and (∗1), (∗2), (∗5) is called a
ring . This means we can add, subtract and multiply only.
A field is just a commutative ring that allows division (i.e. taking inverse of non-zero elements).
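As a quick sanity check, the field axioms can be tested concretely on the finite field Z/5Z of integers mod 5 (a Python sketch for illustration only; the helper `inv` is my own shorthand, computing inverses via Fermat's little theorem):

```python
# Arithmetic in the field Z/5Z: every non-zero element has a multiplicative inverse.
p = 5

def inv(a, p=5):
    """Multiplicative inverse of a (mod p), via Fermat's little theorem: a^(p-2) = a^(-1)."""
    assert a % p != 0, "0 has no inverse"
    return pow(a, p - 2, p)

# (*3): every non-zero a satisfies a * inv(a) = 1 (mod 5), so division is always possible.
for a in range(1, p):
    assert (a * inv(a)) % p == 1

# (*5): distributivity a*(b+c) = a*b + a*c (mod 5) for all a, b, c.
for a in range(p):
    for b in range(p):
        for c in range(p):
            assert (a * ((b + c) % p)) % p == ((a * b) % p + (a * c) % p) % p
```

Replacing 5 by a composite number such as 6 breaks axiom (∗3): for instance 2 has no inverse mod 6, so Z/6Z is a ring but not a field.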
1.2 Vector Spaces
The definition of a vector space is nearly the same as that of a ring, however with the product
replaced by scalar multiplication:
Definition 1.4. A vector space over a field K is a set V together with two binary operations:
+ : V × V −→ V (addition)
• : K × V −→ V (scalar multiplication)
satisfying the rules (+1)–(+4) of addition as before, together with, for all c, d ∈ K and u, v ∈ V ,
(·1) c · (u + v) = c · u + c · v,
(·2) (c + d) · u = c · u + d · u,
(·3) (cd) · u = c · (d · u),
(·4) 1 · u = u.
Note. Rules (+1)–(+4) just say that V is an abelian group under addition.
Also (+3) implies that V must be non-empty by definition.
Remark. One can also replace K by any ring R; the resulting structure is called an R-module. However,
the algebraic structure will be very different from a vector space, because we cannot do division in general,
so that we cannot “rescale” vectors arbitrarily.
Notation. We will denote a vector by boldface u in these notes; in handwriting you should put an
arrow over the letter instead. Sometimes we will also omit the · for scalar multiplication if it is clear from the context.
Note. Unless otherwise specified, all vector spaces in the examples below are over R.
Note that these proofs apply to any (abelian) group. You should identify which axiom justifies
each of the = signs.
u + v = u + v′ ⇐⇒ v = v′ .
In particular
u = u + v ⇐⇒ v = 0.
c · u = 0 ⇐⇒ c = 0 or u = 0.
As a consequence, we have:
Example 1.5. The space Rn , n ≥ 1 with the usual vector addition and scalar multiplication.
Example 1.7. The subset { (x, y, z)^T : x + y + z = 0 } ⊂ R^3 .
Example 1.8. The set C 0 (R) of continuous real-valued functions f (x) defined on R.
The set of all solutions f of the differential equation
f + d^2 f /dx^2 = 0.
Example 1.12. The ring K[t] of all polynomials with variable t and coefficients in K:
p(t) = a0 + a1 t + a2 t2 + ... + an tn
with ai ∈ K for all i.
Example 1.13. The ring Mm×n (K) of all m × n matrices with entries in K.
(In this chapter we will sometimes use matrices as examples. More details starting from Chapter 2.3.)
Example 1.14. More generally, if V is a ring containing a field K as a subring (i.e. sharing the
same addition and multiplication operators), then V is a vector space over K.
Counter-Examples: these are not vector spaces (under the usual + and ·):
Non-Example 1.16. The first quadrant { (x, y)^T : x ≥ 0, y ≥ 0 } ⊂ R^2 .
Non-Example 1.18. Any straight line in R^2 not passing through the origin.
1.3 Subspaces
To check whether a subset U ⊂ V is a subspace, we only need to check zero and closures.
Proposition 1.10. Let V be a vector space. A subset U ⊂ V is a subspace of V iff it satisfies all
of the following conditions:
(1) 0 ∈ U ,
(2) u, v ∈ U =⇒ u + v ∈ U ,
(3) c ∈ K, u ∈ U =⇒ c · u ∈ U .
Note. By Proposition 1.7 and (3) (taking c = 0), (1) can be replaced by “U is non-empty”.
If U satisfies (1)–(3), then for the 8 rules of vector space, (+1)(+2)(·1)–(·4) are automatically
satisfied from V , while (1)=⇒(+3) and (3)=⇒(+4) by multiplication with −1.
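Conditions (1)–(3) can be illustrated numerically for the plane U = {x + y + z = 0} ⊂ R^3 with numpy (a sketch: random sampling illustrates, but of course does not prove, closure):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_U(v, tol=1e-9):
    """Membership test for U = {(x, y, z) : x + y + z = 0} in R^3."""
    return abs(v.sum()) < tol

# (1) the zero vector lies in U
assert in_U(np.zeros(3))

# (2), (3): closure under addition and scalar multiplication, on random samples
for _ in range(100):
    # random elements of U: subtracting the mean makes the coordinates sum to 0
    u = rng.normal(size=3); u -= u.mean()
    v = rng.normal(size=3); v -= v.mean()
    c = rng.normal()
    assert in_U(u + v)   # closed under +
    assert in_U(c * u)   # closed under scalar multiplication
```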
Example 1.23. The set Kn [t] of all polynomials of degree at most n with coefficients in K, is a
subspace of the vector space K[t] of all polynomials with coefficients in K.
Example 1.24. Real-valued functions satisfying f (0) = 0 is a subspace of the vector space of all
real-valued functions.
Non-Example 1.25. Any straight line in R2 not passing through (0, 0) is not a subspace of R2 .
Non-Example 1.26. R^2 is not a subspace of R^3 .
However { (x, y, 0)^T : x, y ∈ R } ⊂ R^3 , which “looks exactly like” R^2 , is a subspace of R^3 .
The set spanned by S is the set of all linear combinations of S, denoted by Span(S).
Equivalently, a subset U ⊂ V is a subspace iff
(1) 0 ∈ U , and
(2) c ∈ K, u, v ∈ U =⇒ c · u + v ∈ U .
Proof. Follows directly from Proposition 1.10: taking c = 1 gives condition (2) there, and taking v = 0 gives condition (3).
Example 1.27. The set U := { (a − 3b, b − a, a, b)^T ∈ R^4 : a, b ∈ R } is a subspace of R^4 , since every element
of U can be written as a linear combination of u1 := (1, −1, 1, 0)^T and u2 := (−3, 1, 0, 1)^T :
a u1 + b u2 = (a − 3b, b − a, a, b)^T ∈ R^4 .
1.4 Linearly Independent Sets

Definition. A set of vectors {v1 , ..., vr } ⊂ V is linearly dependent if there exist scalars c1 , ..., cr ∈ K, not all zero, such that
c1 v1 + · · · + cr vr = 0.
Linearly independent sets are those that are not linearly dependent: the only solution of
c1 v1 + · · · + cr vr = 0
is c1 = · · · = cr = 0.
Example 1.29. A set of two elements {u, v} is linearly independent iff u, v are not multiples of
each other.
Example 1.31. A set of vectors is linearly dependent if one vector is a linear combination of other
vectors.
Example 1.32. The set of vectors { (1, 0, 0)^T , (0, 1, 0)^T , (0, 0, 1)^T } in R^3 is linearly independent.
Example 1.33. The set of vectors { (1, 2, 3)^T , (4, 5, 6)^T , (7, 8, 9)^T } in R^3 is linearly dependent.
Example 1.35. The set {sin 2x, sin x cos x} in C 0 (R) is linearly dependent.
Example 1.36. Linear dependence depends on the field. The set of functions {sin x, cos x, e^{ix} }
(x ∈ R) is linearly independent over R, but linearly dependent over C.
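Over R, linear (in)dependence of column vectors can be detected numerically by the rank of the matrix they form; a numpy sketch for Examples 1.32 and 1.33 (`matrix_rank` uses a floating-point tolerance):

```python
import numpy as np

# Example 1.32: the standard basis vectors of R^3 -- independent (rank 3)
E = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 1.]])
assert np.linalg.matrix_rank(E) == 3

# Example 1.33: columns (1,2,3), (4,5,6), (7,8,9) -- dependent (rank 2 < 3)
A = np.array([[1., 4., 7.],
              [2., 5., 8.],
              [3., 6., 9.]])
assert np.linalg.matrix_rank(A) == 2
# Indeed the third column is 2*(second column) - (first column):
assert np.allclose(A[:, 2], 2 * A[:, 1] - A[:, 0])
```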
1.5 Bases

Definition. A (finite) subset B = {b1 , ..., bn } ⊂ V is a basis of V if
(1) B is linearly independent,
(2) V = Span(B).
Example 1.38. The monomials {1, t, t2 , ..., tn } form the standard basis E for Kn [t].
Theorem 1.17 (Spanning Set Theorem). Let S = {v1 , ..., vr } ⊂ V and let U = Span(S).
(1) If one of the vectors, say vk , is a linear combination of the remaining vectors in S, then
U = Span(S \ {vk }).
(2) If U ≠ {0}, then some subset of S is a basis of U .
Proof.
(1) WLOG assume k = r, so that vr = a1 v1 + · · · + ar−1 vr−1 for some ai ∈ K.
– Let u ∈ U . Then
u = c1 v1 + · · · + cr−1 vr−1 + cr vr , for some ci ∈ K
= c1 v1 + · · · + cr−1 vr−1 + cr (a1 v1 + · · · + ar−1 vr−1 )
= (c1 + cr a1 )v1 + · · · + (cr−1 + cr ar−1 )vr−1
∈ Span(S \ {vr }).
(2) If S is a basis of U , then there is nothing to prove. So assume S is not a basis of U .
Since S spans U , this means S is not linearly independent.
Hence there exists ai ∈ K, not all zero, such that a1 v1 + · · · + ar vr = 0.
WLOG, assume ar ≠ 0, so that
vr = −(a1 /ar ) v1 − · · · − (ar−1 /ar ) vr−1 .
Then by (1), U = Span(S \ {vr }); repeating this process until the remaining set is linearly independent yields a basis of U contained in S.
If B is a basis for V , then for any v ∈ V , there exist unique scalars c1 , ..., cn ∈ K such that
v = c1 b1 + ... + cn bn .
Definition 1.19. The scalars c1 , ..., cn ∈ K are called the coordinates of v relative to the basis
B, and
[v]B := (c1 , ..., cn )^T ∈ K^n
is the coordinate vector of v relative to B.
Note. The coordinate vector depends on the order of the basis vectors in B.
Example 1.39. The coordinate vector of the polynomial p(t) = t^3 + 2t^2 + 3t + 4 ∈ K3 [t] relative
to the standard basis E = {1, t, t^2 , t^3 } is
[p(t)]E = (4, 3, 2, 1)^T ∈ K^4 .
Example 1.40. The columns of an invertible matrix A ∈ Mn×n (R) form a basis of Rn , because
Ax = b always has a unique solution (see Theorem 2.32).
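Concretely, if the columns of an invertible matrix form a basis of R^n (Example 1.40), the coordinate vector [v]_B is found by solving a linear system; a numpy sketch (the particular basis and vector here are my own illustration):

```python
import numpy as np

# Basis B of R^2 given by the columns of B_mat (invertible, so a basis by Example 1.40)
B_mat = np.array([[1.,  1.],
                  [1., -1.]])
v = np.array([3., 1.])

# [v]_B solves B_mat @ c = v, i.e. v = c1*b1 + c2*b2
c = np.linalg.solve(B_mat, v)
assert np.allclose(B_mat @ c, v)   # v = 2*b1 + 1*b2, so [v]_B = (2, 1)^T
```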
1.6 Dimensions

Theorem 1.20 (Replacement Theorem). Let V = Span(v1 , ..., vn ), and let S = {u1 , ..., um } ⊂ V be a linearly independent set. Then m ≤ n.

Proof. (Outline)
Let S ′ = {u1 , v1 , ..., vn }. One can replace some vi by u1 so that S ′ \ {vi } still spans V :
– This is because u1 = a1 v1 + · · · + an vn with ai ≠ 0 for some i.
– Hence vi = a_i^{-1} (u1 − a1 v1 − · · · − an vn ) ∈ Span(S ′ \ {vi }), where the sum omits the i-th
term.
– By the Spanning Set Theorem, V = Span(S ′ \ {vi }).
– Reindexing the vi ’s, WLOG we may assume V = Span(u1 , v2 , v3 , ..., vn ).
Assume on the contrary that m > n.
– Repeating the above process we can replace all v’s by u’s:
V = Span(u1 , v2 , v3 , ..., vn ) = Span(u1 , u2 , v3 , ..., vn ) = · · · = Span(u1 , u2 , ..., un )
(We have to use the assumption that {u1 , ..., un } is linearly independent.)
– Since um ∈ V = Span(u1 , ..., un ), the set {u1 , ..., um } cannot be linearly independent, a
contradiction to the assumption of S.
Applying this statement to different bases B and B ′ , which are both spanning and linearly
independent, we get
Theorem 1.21. If a vector space V has a basis of n vectors, then every basis of V must also
consist of exactly n vectors.
By the Spanning Set Theorem, if V is spanned by a finite set, it has a basis. Then the following
definition makes sense by Theorem 1.21.
Definition 1.22. If V is spanned by a finite set, then V is said to be finite dimensional ,
dim(V ) < ∞. The dimension of V is the number of vectors in any basis B of V :
B = {b1 , ..., bn } =⇒ dim V := |B| = n.
Notation. If the vector space is over the field K we will write dimK V . If it is over R or if the
field is not specified (as in Definition 1.22 above), we simply write dim V instead.
Example 1.44. Let V = { (x, y, z)^T ∈ R^3 : x + y + z = 0 }. Then dim V = 2.
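Example 1.44 can be checked numerically: V is the null space of the 1 × 3 matrix (1 1 1), so its dimension is 3 − rank = 2 (a numpy sketch, anticipating the Rank–Nullity Theorem of Chapter 2):

```python
import numpy as np

A = np.array([[1., 1., 1.]])            # V = {v in R^3 : A v = 0}
n = A.shape[1]
dim_V = n - np.linalg.matrix_rank(A)    # nullity = #columns - rank
assert dim_V == 2

# An explicit basis of V: (1, -1, 0) and (1, 0, -1)
for v in [np.array([1., -1., 0.]), np.array([1., 0., -1.])]:
    assert np.allclose(A @ v, 0)
```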
Theorem 1.23 (Basis Extension Theorem). Let dim V < ∞ and U ⊂ V be a subspace.
Any linearly independent set in U can be extended, if necessary, to a basis for U .
Also, U is finite dimensional and
dim U ≤ dim V.
Proof. Let {b1 , ..., br } be a linearly independent set in U . If it does not span U , there exists u ∈ U such that {b1 , ..., br , u} is linearly independent.
If this still does not span U , repeat the process. It must stop by the Replacement Theorem.
Corollary 1.24. Let dim V < ∞. If U ⊂ V is a subspace and dim U = dim V , then U = V .
Linearly Independent Set ⊆ Basis ⊆ Spanning Set (Replacement Theorem)
Finally, if we know the dimension of V , the following Theorem gives a useful criterion to check
whether a set is a basis:
Theorem 1.25 (Basis Criteria). Let dim V = n ≥ 1, and let S ⊂ V be a finite subset with
exactly n elements. Then
S is a basis of V ⇐⇒ S spans V ⇐⇒ S is linearly independent.
Proof.
(1) If S is linearly independent but not spanning, by the Basis Extension Theorem we can extend S to a basis with more than n
elements, contradicting the dimension.
(2) If S is spanning but not linearly independent, by the Spanning Set Theorem a proper subset of S with fewer than n elements
forms a basis of V , contradicting the dimension.
Remark. Every vector space has a basis, even when dim V = ∞. However, this requires (and is in fact equivalent
to) the Axiom of Choice, a fundamental axiom of set theory (the “C” in ZFC).
1.7 Direct Sums
Definition. Let U, W ⊂ V be subspaces. We say that V is the direct sum of U and W if
(1) V = U + W ,
(2) U ∩ W = {0},
and in this case we write
V = U ⊕ W.
Note. The order of ⊕ does not matter: by definition U ⊕ W = W ⊕ U . However, their bases
will have different order: usually the basis vectors from the first direct summand come first.
Example 1.48. R^3 = { (x, y, 0)^T : x, y ∈ R } ⊕ { (0, 0, z)^T : z ∈ R }.
Example 1.50. {Space of functions on R} = {Even functions: f (−x) = f (x)} ⊕ {Odd functions: f (−x) = −f (x)}.
Example 1.51. {Square matrices} = {Symmetric matrices: A^T = A} ⊕ {Anti-symmetric matrices: A^T = −A}.
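Example 1.51 is completely explicit: A = (A + A^T)/2 + (A − A^T)/2 splits any square matrix into its symmetric and anti-symmetric parts; a numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))

S = (A + A.T) / 2    # symmetric part:      S^T = S
N = (A - A.T) / 2    # anti-symmetric part: N^T = -N

assert np.allclose(S.T, S)
assert np.allclose(N.T, -N)
assert np.allclose(S + N, A)   # condition (1): the two subspaces sum to everything
# Condition (2): the intersection is {0}, since a matrix that is both
# symmetric and anti-symmetric satisfies M = M^T = -M, hence 2M = 0.
```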
Proposition. Suppose V = U + W . Then V = U ⊕ W iff every v ∈ V can be written uniquely as v = u + w with u ∈ U and w ∈ W .
Proof. (=⇒) If v = u + w = u′ + w′ , then
u − u′ = w′ − w ,
where the left-hand side lies in U and the right-hand side lies in W .
Since U ∩ W = {0}, u − u′ = w′ − w = 0 and hence u′ = u, w′ = w.
(⇐=) Let v ∈ U ∩ W . Then
v=v+0=0+v
so that by uniqueness v = 0, and hence U ∩ W = {0}.
Theorem 1.30 (Direct Sum Complement). Assume dim V < ∞ and let U ⊂ V be a subspace.
Then there exists another subspace W ⊂ V such that
V = U ⊕ W.
Proof. Let {u1 , ..., ur } be a basis of U . By the Basis Extension Theorem, we can extend it to a basis {u1 , ..., ur , w1 , ..., wm } of V .
Then W := Span(w1 , ..., wm ) is a direct sum complement to U .
Note. The complement W is not unique: there are many ways to extend a basis.
Remark. Theorem 1.30 also holds when dim V = ∞, which again is equivalent to the Axiom of Choice.
Theorem 1.31 (Dimension Formula). For subspaces U, W ⊂ V ,
dim(U + W ) + dim(U ∩ W ) = dim U + dim W.
Proof. (Sketch) If U or W is infinite dimensional, both sides are infinite. Hence assume both U, W are finite dimensional, and let B = {b1 , ..., bm } be a basis of U ∩ W . Extending B to a basis of U and to a basis of W , one checks that the union of the two extended bases is a basis of U + W , and counting the vectors gives the formula.
Example 1.52. If U and W are two different planes passing through the origin in R3 , then U ∩W
must be a line and U + W = R3 . The dimension formula then gives 2 + 2 = 3 + 1.
Definition 1.32. Let U, W be two vector spaces over K (not necessarily subspaces of a common V ).
U × W := {(u, w) : u ∈ U, w ∈ W }
(u, w) + (u′ , w′ ) := (u + u′ , w + w′ )
c · (u, w) := (c · u, c · w), c∈K
which makes U × W into a vector space with the zero vector (0, 0) ∈ U × W .
It is clear that:
U × W = U′ ⊕ W′
where
U ′ := {(u, 0) : u ∈ U } ⊂ U × W
W ′ := {(0, w) : w ∈ W } ⊂ U × W
Note. We see that U ′ and U are nearly identical, except the extra 0 component. Similarly for W
and W ′ . More precisely, they are isomorphic by the terminology in the next chapter.
Some textbooks also write U × W as U ⊕ W without referencing a bigger vector space V , and call it
the external direct sum.
On the other hand, if U, W ⊂ V , then V = U ⊕ W is called an internal direct sum.
We will always refer to internal direct sum when we write U ⊕ W , i.e. U, W are subspaces of some vector
space V to begin with.
Remark. U × W is different from the tensor product U ⊗ W which we will not discuss in this course.
Example 1.53. R^3 can be expressed as both an internal and an external direct sum:
R^3 = { (x, y, 0)^T : x, y ∈ R } ⊕ { (0, 0, z)^T : z ∈ R } = R^2 × R.
Definition. Let U1 , ..., Un ⊂ V be subspaces. We say that V is the direct sum of U1 , ..., Un if
(1) V = U1 + · · · + Un ,
(2) Uj ∩ Σ_{i≠j} Ui = {0} for all j,
and in this case we write
V := U1 ⊕ · · · ⊕ Un .
Note. It is not enough to replace (2) by the condition Ui ∩ Uj = {0} for all i ̸= j. E.g. Take the
x, y axis together with the line y = x in R2 .
Then it is easy to see that the direct sum is equivalent to V = (U1 ⊕ · · · ⊕ Un−1 ) ⊕ Un defined inductively.
Equivalently, V is a direct sum iff every collection of non-zero vectors {u1 , ..., un } with ui ∈ Ui is linearly
independent.
CHAPTER 2
Linear Transformations and Matrices
Now that we have defined the notion of vector spaces and have seen many examples of them, in
this chapter we study linear transformations, i.e. maps between vector spaces that have certain
“good” property — namely they preserve linearity.
We will see that when we choose a basis to our vector space in order to associate each vector a
coordinate (see Definition 1.19), linear transformation becomes the usual notion of a matrix.
2.1 Linear Transformations

Definition 2.1. A linear transformation between vector spaces over K is a map
T : V −→ W
such that for all u, v ∈ V and c ∈ K:
(1) T (u + v) = T (u) + T (v),
(2) T (c · v) = c · T (v).
Definition 2.2. Let S ∈ L (V, W ) and T ∈ L (V, W ). Let c ∈ K. Then for any u ∈ V we define
(S + T )(u) := S(u) + T (u) ∈ W ,
(c · T )(u) := c · T (u) ∈ W .
Proposition 2.3. Any T ∈ L (V, W ) is uniquely determined by the image of any basis B of V .
Notation. We may sometimes include the subscript IV to emphasize its domain and target.
Example 2.6. (Linear) differential operators on the space C ∞ (R) of smooth functions on R.
Since we just need to know the image of the basis vectors to determine a linear transformation,
when V = K^n and W = K^m we can record them into a matrix
A = ( a1 | a2 | · · · | an ),
where the i-th column ai of A is the image T (ei ) of the standard basis vector ei ∈ K^n .
Definition 2.4. Conversely, we say that A ∈ Mm×n (K) represents T ∈ L (K^n , K^m ) if ai = T (ei ).
Since matrices represent linear maps, we can add and multiply matrices by scalar according to
Definition 2.2, which is done component-wise. We will study matrices in detail in Section 2.3.
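As a concrete illustration (my own example, not one from the notes), the matrix of the rotation of R^2 by 90° counterclockwise is assembled from the images of the standard basis vectors:

```python
import numpy as np

def T(v):
    """Rotation of R^2 by 90 degrees counterclockwise: (x, y) -> (-y, x). Linear."""
    x, y = v
    return np.array([-y, x])

e1, e2 = np.array([1., 0.]), np.array([0., 1.])

# The columns of A are T(e1) and T(e2), so A represents T
A = np.column_stack([T(e1), T(e2)])
assert np.allclose(A, [[0., -1.],
                       [1., 0.]])

# Multiplying by A agrees with applying T on any vector
v = np.array([2., 3.])
assert np.allclose(A @ v, T(v))
```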
The kernel Ker(T ) := {v ∈ V : T (v) = 0} ⊂ V and the image Im(T ) := T (V ) ⊂ W of T ∈ L (V, W ) are subspaces.
Proof. Just verify Proposition 1.10. Note that 0 lies in both spaces.
Example 2.8. The kernel of d/dx on C ∞ (R) is the set of all constant functions.
If S ∈ L (U, V ) and T ∈ L (V, W ), then T ◦ S ∈ L (U, W ).
2.2 Injections, Surjections and Isomorphisms
(1) T is surjective ⇐⇒ T (V ) = W .
To prove (5), check that if {b1 , ..., br } is a basis of U , then {T (b1 ), ..., T (br )} is a basis for T (U ).
π : V −→ W
v 7→ w
Proposition 2.11. If T is an isomorphism, then dim V = dim W , and there exists a unique map
T −1 ∈ L (W, V ) which is also an isomorphism, such that
T −1 ◦ T = IV , T ◦ T −1 = IW .
(S ◦ T )−1 = T −1 ◦ S −1 .
Theorem 2.13. If B = {b1 , ..., bn } is a basis for a vector space V , then the coordinate mapping
ψV,B : V −→ K n
v 7→ [v]B
is an isomorphism V ≃ K n .
Proof. The map ψV,B is obviously linear, injective and surjective. The inverse is given by
ψ^{-1}_{V,B} : (c1 , ..., cn )^T ∈ K^n 7→ c1 b1 + · · · + cn bn ∈ V.
Example 2.12. { (x, y, 0)^T : x, y ∈ R } ≠ R^2 , but { (x, y, 0)^T : x, y ∈ R } ≃ R^2 .
Now let V, W be finite dimensional vector spaces with dim V = n and dim W = m.
Given a basis B of V and B ′ of W , if T (v) = w, then T can be represented by a matrix
Given a basis B of V and B′ of W , if T (v) = w, then T can be represented by a matrix
[T ]^B_{B′} : K^n −→ K^m
[v]B 7→ [w]B′ .
By definition, the i-th column of [T ]^B_{B′} is given by [T (bi )]B′ where bi ∈ B. In particular,
[T ]^B_{B′} ∈ L (K^n , K^m ).
Note. Definition 2.4 says that a matrix A, considered as a linear map L (K n , K m ), represents
itself with respect to the standard basis.
2.3 Matrices
Definition 2.15. The product of A ∈ Mm×n (K) and B ∈ Mn×r (K) is the matrix A · B ∈ Mm×r (K) with entries
(A · B)ij := Σ_{k=1}^{n} aik bkj ,
i.e. the (i, j)-th entry is given by multiplying the entries of the i-th row of A with the j-th column of
B.
Note. A vector is just an n × 1 matrix, hence one can multiply a vector with A from the left.
Proposition. For A ∈ Mm×n (K) and B ∈ Mn×m (K), we have Tr(A · B) = Tr(B · A), where Tr(M) := Σ_i Mii denotes the trace of a square matrix M.
Proof. Tr(A · B) = Σ_{i=1}^{m} (A · B)ii = Σ_{i=1}^{m} Σ_{k=1}^{n} aik bki = Σ_{k=1}^{n} Σ_{i=1}^{m} bki aik = Σ_{k=1}^{n} (B · A)kk = Tr(B · A).
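The trace identity is easy to verify numerically, even for non-square A and B of compatible sizes; a numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 5))   # A is m x n
B = rng.normal(size=(5, 3))   # B is n x m

# Tr(AB) = Tr(BA) even though A@B is 3x3 while B@A is 5x5
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```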
Proof. Just check that the (i, j)-th entry on both sides are the same.
The exact relationship between matrices and linear transformations is given as follows.
Recall that by Definition 2.4, an m×n matrix A represents a linear transformation T : K^n −→ K^m ,
where the i-th column is the image of ei ∈ K^n .
If B ∈ Mn×r (K) represents S ∈ L (K^r , K^n ) and A ∈ Mm×n (K) represents T ∈ L (K^n , K^m ), then the product A · B represents the composition T ◦ S: the j-th column of its representing matrix is (T ◦ S)(ej ) = T (bj ),
where bj is the j-th column of B. But by above, T (bj ) = A · bj is precisely the j-th column
of A · B by Proposition 2.17.
This is summarized by the following commutative diagram: S : U −→ V and T : V −→ W correspond, under the coordinate isomorphisms ≃ given by bases B, B′ , B′′ of U, V, W , to the matrices [S]^B_{B′} : K^r −→ K^n and [T ]^{B′}_{B′′} : K^n −→ K^m , and the composition T ◦ S corresponds to the matrix product [T ]^{B′}_{B′′} · [S]^B_{B′} .
The kernel and image of a linear transformation represented by a matrix are given special names:
The kernel of T is called the null space NulA. It is the set of all solutions to Ax = 0 of
m homogeneous linear equations in n unknowns. It is a subspace of the domain K n .
The image of T is called the column space ColA. It is the set of all linear combinations of
the columns of A. It is a subspace of the target K m .
To find the basis of NulA and ColA, we can use row and column operations to change the
matrix into the reduced echelon form as follows.
Notation. When we talk about row vectors, we will treat them as horizontal 1 × n vectors.
Therefore it suffices to apply row operations to bring the matrix into a simpler form; the most
useful such form is the reduced row echelon form.
Definition 2.23. A matrix is in row echelon form if
– all non-zero rows lie above any zero rows, and
– the leading non-zero entry (the pivot ) of each row lies strictly to the right of the pivot of the row above it.
It is reduced if in addition each pivot equals 1 and each column containing a pivot has zeros in all its other entries.
Note. By definition, the non-zero rows of a (reduced) row echelon form are linearly independent.
This is the row echelon form; the leading 1’s are the pivots.
Solving for the variables corresponding to the pivots (i.e. the 1st and 3rd variables) in terms of the free variables y and t, we see that
x = −2y − t,
z = −t.
In other words,
NulA = Span( (−2, 1, 0, 0)^T , (−1, 0, −1, 1)^T ).
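The computations of Examples 2.17–2.18 can be reproduced with sympy, whose `rref` method returns the reduced row echelon form together with the pivot columns (a sketch):

```python
import sympy as sp

A = sp.Matrix([[2, 4, 0, 2],
               [1, 2, 3, 4],
               [2, 4, 8, 10]])

R, pivots = A.rref()
assert pivots == (0, 2)                      # pivots in the 1st and 3rd columns
assert R == sp.Matrix([[1, 2, 0, 1],
                       [0, 0, 1, 1],
                       [0, 0, 0, 0]])        # gives x = -2y - t, z = -t

# Basis of NulA, matching the spanning set above:
ns = A.nullspace()
assert len(ns) == 2
assert sp.Matrix([-2, 1, 0, 0]) in ns
assert sp.Matrix([-1, 0, -1, 1]) in ns
```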
Example 2.18. On the other hand, applying column operations instead, we have
[ 2 4 0 2 ; 1 2 3 4 ; 2 4 8 10 ]
c2 ↦ c2 − 2c1 : [ 2 0 0 2 ; 1 0 3 4 ; 2 0 8 10 ]
c4 ↦ c4 − c1 : [ 2 0 0 0 ; 1 0 3 3 ; 2 0 8 8 ]
c4 ↦ c4 − c3 : [ 2 0 0 0 ; 1 0 3 0 ; 2 0 8 0 ].
Hence ColA = Span( (2, 1, 2)^T , (0, 3, 8)^T ),
and it is a 2-dimensional subspace of R^3 . We see that sometimes it is not necessary to go all the way
to the reduced echelon form to describe the column space or its dimension.
Proposition 2.24. Row operations do not change the linear dependencies among the column vectors.
More precisely, if
A = ( a1 | a2 | · · · | an )
and we have a linear dependence
c1 a1 + · · · + cn an = 0, ci ∈ K,
then the columns a′1 , ..., a′n of any matrix obtained from A by row operations satisfy the same dependence c1 a′1 + · · · + cn a′n = 0.
In particular, ColA is spanned by those columns of the original matrix which contain the pivots
after row operations (and these columns are linearly independent).
Example 2.19. Looking at Example 2.17 again, we can conclude that the 1st and 3rd column
vectors of A form a basis of ColA, as shown in Example 2.18.
Corollary 2.25. The reduced row echelon form of a matrix A is unique, and uniquely determines
a spanning set of NulA that is reduced column echelon.
Proof. By Proposition 2.24, the reduced row echelon form is uniquely determined by how each
column ck depends on the previous columns {c1 , ..., ck−1 }: it is either independent of them or a
specific linear combination of them.
Recall that the transpose A^T of a matrix A is obtained by interchanging its rows and columns.
Therefore column operations and column echelon form of A are just row operations and row echelon
form of AT , so in fact we only need to use one of them.
(Usually row operations are easier to deal with because we learn to add vertically as a child.)
We also see from above that row operations are enough to calculate both NulA and ColA.
Proposition. (AB)^T = B^T A^T .
Proof. This can be seen directly by changing the indices in the matrix multiplication formula
(Definition 2.15).
2.4 Fundamental Theorems of Linear Algebra

Now let us focus on the dimensions of the image and kernel of T ∈ L (V, W ).
Definition 2.28.
The row space is the space spanned by the (transpose of) the rows of A.
It is a subspace of K n .
The next two Theorems form the Fundamental Theorems of Linear Algebra:
Theorem 2.29. For any matrix A, dim RowA = dim ColA.
Since the row space of A is just the column space of A^T , we can also rephrase Theorem 2.29 as
Rank of A = Rank of A^T .
Proof. Example 2.17 and Proposition 2.24 tell us that dim ColA is given by the number of linearly
independent column vectors after either the row operations or the column operations, and this number (the number of pivots) is the same for both.
Theorem 2.30 (Rank–Nullity Theorem). Let dim V < ∞ and T ∈ L (V, W ). Then
dim Ker(T ) + dim Im(T ) = dim V.
Proof. If dim W < ∞, then we can represent T by a matrix. From the steps of Example 2.17, we
can conclude that after row operations,
– dim Im(T ) is the number of columns that contain pivots, and
– dim Ker(T ) is the number of columns that do not contain pivots,
and the two numbers add up to dim V .
If dim W = ∞: if {u1 , ..., un } is a basis of V , then Im(T ) = Span(T (u1 ), ..., T (un )) is finite dimensional, so we can consider T ′ : V −→ Im(T ) instead.
Note that Im(T ) = Im(T ′ ) and Ker(T ) = Ker(T ′ ), so we can just apply the above arguments to T ′ .
Example 2.20. Using the matrix from Example 2.17 again, we see that A : R^4 −→ R^3 with
dim Im(A) = 2 and dim Ker(A) = 2, and indeed 2 + 2 = 4 = dim R^4 .
Remark. The Rank-Nullity Theorem still holds in the infinite dimensional case, in the sense that
if dim V = ∞, then either Im(T ) or Ker(T ) is infinite dimensional.
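For the matrix of Example 2.17, the Rank–Nullity Theorem can be confirmed numerically (a numpy sketch):

```python
import numpy as np

A = np.array([[2., 4., 0., 2.],
              [1., 2., 3., 4.],
              [2., 4., 8., 10.]])         # A : R^4 -> R^3

rank = np.linalg.matrix_rank(A)           # dim Im(A) = number of pivot columns
nullity = A.shape[1] - rank               # dim Ker(A)
assert rank == 2 and nullity == 2
assert rank + nullity == A.shape[1]       # 2 + 2 = 4 = dim of the domain
```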
2.5 Invertible Matrices
Definition 2.31. A square n×n matrix A ∈ Mn×n (K) is invertible if it represents an isomorphism
T : K n ≃ K n . By Proposition 2.11, there exists a unique n × n matrix A−1 representing T −1 such
that
A−1 · A = A · A−1 = In×n .
There are many ways to determine when a matrix is invertible. The main ones are the following:
Theorem 2.32. Let A ∈ Mn×n (K). The following are equivalent:
(1) A is invertible.
(2) The columns of A form a basis of K^n .
(3) Rank of A = n.
(4) Nullity of A = 0.
(5) For every b ∈ K^n , the equation Ax = b has a unique solution.
(6) The reduced row echelon form of A is the identity matrix In×n .
Proof. (1)–(6) are all equivalent by the Fundamental Theorems of Linear Algebra.
Let A be a square n × n invertible matrix. Then for any b ∈ K^n , the equation
Ax = b
has the unique solution x = A^{-1} b.
To solve Ax = b directly: since the reduced row echelon form of A is the identity matrix In×n , by
row operations, the system of linear equations (in x) can be reduced to
I·x=u
where u is obtained from b by the same row operations; hence x = u gives us the solution.
To find the inverse, we apply the same row operations to the extended matrix (A | I):
[ 2 0 2 | 1 0 0 ; 0 3 2 | 0 1 0 ; 0 1 1 | 0 0 1 ]
r3 ↦ 3r3 : [ 2 0 2 | 1 0 0 ; 0 3 2 | 0 1 0 ; 0 3 3 | 0 0 3 ]
r3 ↦ r3 − r2 : [ 2 0 2 | 1 0 0 ; 0 3 2 | 0 1 0 ; 0 0 1 | 0 −1 3 ]
r2 ↦ r2 − 2r3 : [ 2 0 2 | 1 0 0 ; 0 3 0 | 0 3 −6 ; 0 0 1 | 0 −1 3 ]
r1 ↦ r1 − 2r3 : [ 2 0 0 | 1 2 −6 ; 0 3 0 | 0 3 −6 ; 0 0 1 | 0 −1 3 ]
r1 ↦ (1/2) r1 , r2 ↦ (1/3) r2 : [ 1 0 0 | 1/2 1 −3 ; 0 1 0 | 0 1 −2 ; 0 0 1 | 0 −1 3 ].
Therefore
A^{-1} = [ 1/2 1 −3 ; 0 1 −2 ; 0 −1 3 ].
We verify that indeed x = A^{-1} b.
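The inverse just computed can be double-checked with numpy (a sketch):

```python
import numpy as np

A = np.array([[2., 0., 2.],
              [0., 3., 2.],
              [0., 1., 1.]])
A_inv = np.array([[0.5,  1., -3.],
                  [0.,   1., -2.],
                  [0.,  -1.,  3.]])

# A_inv agrees with numpy's inverse, and A @ A_inv = I
assert np.allclose(np.linalg.inv(A), A_inv)
assert np.allclose(A @ A_inv, np.eye(3))
```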
2.6 Change of Basis

Theorem 2.33. If B = {b1 , ..., bn } and B′ are two bases of V , then for every v ∈ V ,
[v]B′ = P^B_{B′} [v]B ,
where the i-th column of P^B_{B′} is [bi ]B′ . The matrix P^B_{B′} is called the change-of-coordinate matrix from B to B′ .
In practice, to find P^B_{B′} , it is better to do the change of basis through the standard basis:
B −→ E −→ B′ .
Proposition. P^{B′}_{B′′} · P^B_{B′} = P^B_{B′′} for any three bases B, B′ , B′′ .
Proof. It follows from Proposition 2.19, since it is just [I ]^{B′}_{B′′} · [I ]^B_{B′} = [I ]^B_{B′′} .
Example 2.23. Let E = { (1, 0)^T , (0, 1)^T } be the standard basis of R^2 . Let
B = { b1 = (1, 1)^T , b2 = (1, −1)^T },
B′ = { b′1 = (1, 0)^T , b′2 = (1, 1)^T }
be two other bases. One can check that this obeys the formula from Theorem 2.33:
b1 = 0 · b′1 + 1 · b′2 ,
b2 = 2 · b′1 + (−1) · b′2 .
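Numerically, the i-th column of P^B_{B′} is [bi]_{B′}, obtained by solving a linear system in the basis B′; a numpy sketch for Example 2.23:

```python
import numpy as np

B  = np.array([[1.,  1.],
               [1., -1.]])   # columns b1, b2
Bp = np.array([[1., 1.],
               [0., 1.]])    # columns b1', b2'

# Column i of P is [b_i]_{B'}, i.e. the solution of Bp @ x = b_i;
# solving with a matrix right-hand side handles all columns at once.
P = np.linalg.solve(Bp, B)
assert np.allclose(P, [[0.,  2.],
                       [1., -1.]])   # b1 = 0*b1' + 1*b2', b2 = 2*b1' - 1*b2'
```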
2.7 Similar Matrices

If A = [T ]B represents T ∈ L (V, V ) in a basis B, and P = P^B_{B′} for another basis B′ , then T is represented in B′ by
B = PAP^{-1} .
Conversely, if B = PAP^{-1} and A = [T ]B for some T ∈ L (V, V ), then B = [T ]B′ where B′ = P(B).
This is summarized by the commutative diagram: A sends [x]B to [T x]B , B sends [x]B′ to [T x]B′ ,
and the change-of-coordinate matrix P converts B-coordinates to B′ -coordinates on both sides, so that
B = PAP^{-1} .
We write this as
A ∼ B.
In other words,
Similar matrices represent the same linear transformation with respect to different bases.
Since similar matrices represent the same linear transformation, similarity is clearly an equivalence
relation (see Appendix A). Hence we can simply say that A and B are similar .
An equivalence class [A] of similar matrices is identified precisely with the linear transformation T
that its members represent under the various bases of V .
Any quantities defined in terms of T only must be the same for similar matrices.
For T ∈ L (V, W ), if we choose a “nice basis” B, the matrix [T ]B can be very nice!
The whole point of Linear Algebra is to find nice bases for linear transformations.
When W = V , the best choice of such a “nice basis” is given by diagonalization, which means
under this basis, the linear transformation is represented by a diagonal matrix. We need to build
up more knowledge before learning the conditions that allow diagonalization in Chapter 5 and 7.
Remark. When V ̸= W , we have instead the Singular Value Decomposition, see Section 6.2.
Example 2.24. In R^2 , let E = { (1, 0)^T , (0, 1)^T } be the standard basis, and let B = { (1, 2)^T , (−2, 1)^T }
be another basis. Let T be the linear transformation represented in the standard basis E by
A = (1/5) [ 14 −2 ; −2 11 ].
Then T is diagonalized in the basis B:
D := [T ]B = P^{-1} A P = [ 1 −2 ; 2 1 ]^{-1} · (1/5)[ 14 −2 ; −2 11 ] · [ 1 −2 ; 2 1 ] = [ 2 0 ; 0 3 ],
where
P = P^B_E = [ 1 −2 ; 2 1 ]
is the change-of-coordinate matrix from B to E . In summary, A = [T ]E and D = [T ]B represent the same linear transformation T , related by D = P^{-1} A P.
2.8 Vandermonde Matrices
We conclude this section with an application about polynomials (over any field K).
We assume the obvious fact that a polynomial of degree n has at most n roots.
Proposition 2.38. Two polynomials in Kn [t] are the same if they agree on n + 1 distinct points.
Proof. If p1 (t) and p2 (t) are two polynomials of degree ≤ n that agree on n + 1 distinct points,
then p1 (t) − p2 (t) is a degree ≤ n polynomial with n + 1 distinct roots, hence must be 0.
Definition 2.39. Given n + 1 distinct points t0 , ..., tn ∈ K (i.e. ti ̸= tj for i ̸= j), the evaluation
map is the linear transformation
\[
T : K_n[t] \longrightarrow K^{n+1}, \qquad
p(t) \mapsto \begin{pmatrix} p(t_0)\\ \vdots\\ p(t_n)\end{pmatrix}.
\]
Note that by Proposition 2.38, T is injective from Kn [t] −→ K n+1 . Since both vector spaces have
the same dimension n + 1, T must be an isomorphism by Theorem 2.32.
Since T is a linear transformation, by choosing the standard basis E = {1, t, t2 , ..., tn } of Kn [t]
and E ′ = {e0 , e1 , ..., en } of K n+1 (notice the index!), we can represent it by a matrix:
Proposition 2.40. With respect to these bases, T is represented by the Vandermonde matrix
\[
[T]^{\mathcal E}_{\mathcal E'} = \begin{pmatrix}
1 & t_0 & t_0^2 & \cdots & t_0^n\\
1 & t_1 & t_1^2 & \cdots & t_1^n\\
\vdots & & & & \vdots\\
1 & t_n & t_n^2 & \cdots & t_n^n
\end{pmatrix}.
\]
Proof. The k-th column of $[T]^{\mathcal E}_{\mathcal E'}$ is the evaluation of the monomial $t^k$ at the points $t_0, ..., t_n$.
Since T is an isomorphism, we can find its inverse. In other words, given n + 1 values λ_0, ..., λ_n ∈ K, we want to reconstruct a polynomial p(t) of degree ≤ n such that p(t_k) = λ_k.
Proposition 2.41. For k = 0, 1, ..., n, the image under T of the polynomial (of degree n)
\[
\ell_k(t) := \prod_{j\ne k}\frac{t - t_j}{t_k - t_j}
\]
is the standard basis vector $e_k$. Consequently, $p(t) = \lambda_0\ell_0(t) + \cdots + \lambda_n\ell_n(t)$ is the interpolating polynomial (Lagrange interpolation).
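Numerically, inverting the evaluation map amounts to solving a linear system with the Vandermonde matrix. A minimal numpy sketch (the sample points and values below are made up for illustration):

```python
import numpy as np

# n+1 = 4 distinct sample points and target values (made-up data)
t = np.array([0.0, 1.0, 2.0, 3.0])
lam = np.array([1.0, 3.0, 11.0, 31.0])

# Vandermonde matrix V[i, k] = t_i**k represents the evaluation map
V = np.vander(t, increasing=True)

# Solving V c = lam recovers the coefficients of p(t) = c_0 + c_1 t + ...
c = np.linalg.solve(V, lam)

# Evaluating p at the sample points returns the prescribed values
p_at_t = np.polynomial.polynomial.polyval(t, c)
```

Here the recovered polynomial is $p(t) = 1 + t + t^3$, since these values were generated from it.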
CHAPTER 3
Determinants
3.1 Definitions
The determinant can be defined by specifying only the properties (axioms) we want it to satisfy.
Definition 3.1. The determinant is a function det : M_{n×n}(K) → K satisfying the following properties:
(1) (Multilinearity) det is linear in each column separately.
(2) (Alternating) Interchanging two columns changes the sign of det.
(3) det(I) = 1.
Note. By considering
\[
\det\big(\cdots\;\; u+v \;\;\; u+v \;\;\cdots\big) = 0
\]
and expanding by multilinearity, we see that if det(A) = 0 for any matrix A which has two equal columns, then the alternating property holds. Therefore the two properties are equivalent.
Assume the determinant is well-defined. Then the properties allow us to calculate det A using column operations. For example we have
\[
\det\big(\cdots\; u_i + cu_j \;\; u_j \;\cdots\big)
= \det\big(\cdots\; u_i \;\; u_j \;\cdots\big) + c\,\det\big(\cdots\; u_j \;\; u_j \;\cdots\big)
= \det\big(\cdots\; u_i \;\; u_j \;\cdots\big) + 0
= \det\big(\cdots\; u_i \;\; u_j \;\cdots\big).
\]
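These rules are easy to confirm numerically. A small numpy check (random matrix, made-up scalar), showing that adding a multiple of one column to another preserves the determinant, while swapping columns flips its sign:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Add 2.5 times column 1 to column 3: det is unchanged
B = A.copy()
B[:, 3] += 2.5 * A[:, 1]

# Interchange columns 0 and 1: det changes sign
C = A[:, [1, 0, 2, 3]]

d_A, d_B, d_C = np.linalg.det(A), np.linalg.det(B), np.linalg.det(C)
```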
Proposition 3.2. The determinant of a triangular (in particular diagonal) matrix is the product
of its diagonal entries.
Proof. Let us consider a lower triangular matrix. By multilinearity applied to each column,
\[
\det\begin{pmatrix}
a_{11} & 0 & \cdots & 0\\
* & a_{22} & \ddots & \vdots\\
\vdots & & \ddots & 0\\
* & \cdots & * & a_{nn}
\end{pmatrix}
\overset{(1)}{=} a_{11}a_{22}\cdots a_{nn}
\det\begin{pmatrix}
1 & 0 & \cdots & 0\\
* & 1 & \ddots & \vdots\\
\vdots & & \ddots & 0\\
* & \cdots & * & 1
\end{pmatrix}.
\]
By column operations (using the later columns to clear the entries below the diagonal),
\[
= a_{11}a_{22}\cdots a_{nn}\,\det(I) \overset{(3)}{=} a_{11}a_{22}\cdots a_{nn}.
\]
Similarly for upper triangular matrices. Note that the proof still works even for some $a_{ii} = 0$.
However, in order for these calculations to be well-defined, we first need to show that such
determinant function satisfying (1)-(3) actually exists and is unique.
3.2 Existence and Uniqueness
We show that the determinant function is well-defined and unique by explicitly writing down its formula and checking that it satisfies the conditions.
Define
\[
\det A := \sum_{\sigma\in S_n} \epsilon(\sigma)\, a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n},
\]
where

– $S_n$ is the set of all permutations of $\{1, ..., n\}$,

– the sign of the permutation is $\epsilon(\sigma) = +1$ if $\sigma \in S_n$ can be obtained by an even number of transpositions (i.e. exchanging 2 numbers), and $\epsilon(\sigma) = -1$ if by an odd number.

Then it satisfies (1)–(3) of Definition 3.1.
For small n, this recovers the familiar mnemonic of adding all the “forward diagonals” (cyclically) and subtracting the “backward diagonals”.
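The Leibniz formula can be implemented directly (it is exponentially slow, but useful as a check). A sketch in Python, compared against numpy's determinant on a small made-up matrix:

```python
import numpy as np
from itertools import permutations

def sign(sigma):
    # epsilon(sigma): +1 or -1 according to the parity of inversions
    s = 1
    for i in range(len(sigma)):
        for j in range(i + 1, len(sigma)):
            if sigma[i] > sigma[j]:
                s = -s
    return s

def det_leibniz(A):
    # Sum over all permutations of products a_{sigma(1)1} ... a_{sigma(n)n}
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[p[k], k] for k in range(n)])
               for p in permutations(range(n)))

A = np.array([[2.0, 0, 2], [0, 3, 2], [0, 1, 1]])
```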
To explain the Leibniz expansion formula, we first show that if D is any function satisfying (1)–(3),
it must be of the form defined in the Theorem.
Then by multilinearity,
\[
D(A) = D\big(a_1 \;\cdots\; a_n\big)
= D\Big(\sum_{i_1=1}^n a_{i_1 1}e_{i_1} \;\;\cdots\;\; \sum_{i_n=1}^n a_{i_n n}e_{i_n}\Big)
= \sum_{i_1,...,i_n=1}^n a_{i_1 1}\cdots a_{i_n n}\, D\big(e_{i_1}\;\cdots\;e_{i_n}\big).
\]
By the alternating property, $D(e_{i_1}\cdots e_{i_n}) = 0$ whenever two of the indices coincide, so only the terms where $(i_1, ..., i_n)$ is a permutation survive. Hence
\[
D(A) = \sum_{\sigma\in S_n} a_{\sigma(1)1}a_{\sigma(2)2}\cdots a_{\sigma(n)n}\, D\big(e_{\sigma(1)}\;\cdots\;e_{\sigma(n)}\big),
\]
where $S_n$ is the set of permutations of $\{1, ..., n\}$.

Again by the alternating property, we can swap the columns to bring it back to the identity matrix, picking up a sign $\pm$ which depends on $\sigma$. More precisely,
\[
D\big(e_{\sigma(1)}\;\cdots\;e_{\sigma(n)}\big) = \epsilon(\sigma)\,D(I),
\]
and condition (3) requires $D(I) = 1$, which fixes the formula.
It remains to show that this expression is indeed well-defined, i.e. it satisfies (1)–(3).
(1) For each fixed k, since each term in the expansion contains exactly one factor $a_{\sigma(k)k}$ with second index k, the expression is multilinear in the k-th column. (If $a_k = u + cv$ then $a_{\sigma(k)k} = u_{\sigma(k)} + cv_{\sigma(k)}$.)
(2) If the k-th and l-th columns of A are the same, then the terms in the expansion corresponding to $\sigma$ and $\sigma'$ differ by a sign, where

– $\sigma'$ is related to $\sigma$ by swapping the two values $i := \sigma(k)$ and $j := \sigma(l)$,

– the products of entries coincide since $a_{ik} = a_{il}$ and $a_{jk} = a_{jl}$ by assumption,

so they cancel pairwise in the expansion, and det(A) = 0.
(3) Obvious.
The uniqueness enables us to show easily the multiplicative property of det: for fixed A, the function $D(B) := \det(AB)$ is multilinear and alternating in the columns of B, hence by uniqueness $D(B) = c\det(B)$ with constant $c = D(I) = \det(A)$. That is,
\[
\det(AB) = \det(A)\det(B).
\]
In particular, taking $B = A^{-1}$ gives $\det(A^{-1}) = \det(A)^{-1}$.
If A is not invertible, then column operations will bring A to the column echelon form with a zero
column, hence det A = 0.
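The multiplicative property is easy to confirm numerically; a quick numpy check on random matrices (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)

# det(A^{-1}) * det(A) should be 1
inv_check = np.linalg.det(np.linalg.inv(A)) * np.linalg.det(A)
```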
In particular, similar matrices have the same determinant: if $B = PAP^{-1}$, then
\[
\det(B) = \det(P)\det(A)\det(P)^{-1} = \det(A).
\]
Corollary 3.8. The determinant of a block triangular (in particular block diagonal) matrix is the product of the determinants of its diagonal blocks:
\[
\det\begin{pmatrix} A_{k\times k} & B_{k\times l}\\ O_{l\times k} & C_{l\times l}\end{pmatrix}
= \det(A_{k\times k})\det(C_{l\times l}).
\]
3.3 Elementary Matrices
By representing the column operations as elementary matrices, we show that det A = det AT ,
so that in fact the determinant can also be computed by row operations.
Proposition 3.9. The column operations correspond to multiplying on the right by the elementary matrices below ($c \in K$, $c \ne 0$):

– $E = I + cE_{ij}$, the identity matrix with an extra entry $c$ in position $(i, j)$: adding $c$ times the i-th column to the j-th column.

– $E$ equal to the identity matrix with the diagonal entry $(i, i)$ replaced by $c$: scalar multiplying the i-th column by $c$.

– $E$ equal to the identity matrix with the $2\times 2$ submatrix in rows/columns $i, j$ replaced by $\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$: interchanging the i-th and j-th columns.

In each case E is 1 on the diagonal and 0 otherwise outside the part described.
In exact analogy:
Proposition 3.10. Row operations correspond to multiplying on the left by elementary matrices.
Theorem 3.11.
\[
\det A = \det A^T.
\]

Proof. First observe that each elementary matrix from Proposition 3.9 satisfies
\[
\det E = \det E^T.
\]
If det A = 0, then A is not invertible, hence $A^T$ is also not invertible (by Theorem 2.29), hence $\det A^T = 0$.

Otherwise A is invertible. This is the same as saying we can obtain A from I by column operations, i.e. $A = IE_1E_2\cdots E_k$. Hence
\[
\det A = \det(IE_1E_2\cdots E_k)
= \det(E_1)\det(E_2)\cdots\det(E_k)
= \det(E_1^T)\det(E_2^T)\cdots\det(E_k^T)
= \det(E_k^T\cdots E_2^TE_1^TI) = \det A^T.
\]
3.4 Volumes
The parallelepiped spanned by $v_1, ..., v_n \in \mathbb{R}^n$ is
\[
P := \{c_1v_1 + \cdots + c_nv_n : 0 \le c_i \le 1\} \subset \mathbb{R}^n.
\]
We will show that $\mathrm{Vol}(P) = |\det(v_1\;\cdots\;v_n)|$.

[Figure: the parallelepiped P spanned by $v_1, v_2, v_3$.]

Proof. By column operations, we can reduce the matrix to I, which corresponds to the unit hypercube, whose volume we define to be 1. Hence we only need to check that volume behaves like $|\det|$ under column operations.

Shearing, $v_3 \mapsto v_3 + c\cdot v_2$, does not change the base and height, so that
\[
\mathrm{Vol}(P') = \mathrm{Vol}(P).
\]
Scalar multiplying one column, $v_1 \mapsto c\cdot v_1$, scales the corresponding edge by $|c|$, so that
\[
\mathrm{Vol}(P') = |c|\,\mathrm{Vol}(P).
\]
Note. The parallelepiped formed by linearly dependent vectors has zero volume.
In this case the orientation is not defined.
Definition. An ordered basis $\{v_1, ..., v_n\}$ of $\mathbb{R}^n$ is positively oriented if $\det(v_1\;\cdots\;v_n) > 0$, and negatively oriented if the determinant is negative.
Intuitively, this means the order of the vectors “looks like” the order of the standard basis.
In 3 dimensions, this is known as the right hand rule (which is also used to determine the direction of the vector cross product u × v).

Interchanging two columns, or scalar multiplying by a negative number, will switch the orientation.
3.5 Laplace Expansion
[Figure: positively oriented and negatively oriented triples $\{v_1, v_2, v_3\}$.]
If a parallelepiped is spanned by positively oriented vectors, then after shearing, it becomes a
rotated rectangular box of the standard basis.
Recall that the image of the standard basis {e1 , ..., en } ∈ Rn under A is exactly its columns.
Therefore | det A| gives the scaling factor of a linear transformation. This gives a geometric
explanation of the product formula
det(AB) = det(A) det(B).
An alternative and much more useful formula is the Laplace expansion formula.

Theorem (Laplace Expansion). Expansion by the i-th row:
\[
\det(A) = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}.
\]
Expansion by the j-th column:
\[
\det(A) = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj},
\]
where $C_{ij} := (-1)^{i+j}\det(A_{ij})$ is the $(i, j)$-cofactor, and $A_{ij}$ is the submatrix obtained from A by deleting the i-th row and j-th column.
Proof. Since det A = det AT , we just need to prove one version. Let’s do expansion by column.
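The Laplace expansion translates directly into a recursive algorithm. A minimal Python sketch, expanding along the first column (again exponentially slow, but a faithful implementation of the formula):

```python
import numpy as np

def det_laplace(A):
    # Expansion along column 0: det A = sum_i a_{i0} * (-1)^i * det(A_{i0})
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        # Minor: delete row i and column 0
        minor = np.delete(np.delete(A, i, axis=0), 0, axis=1)
        total += A[i, 0] * (-1) ** i * det_laplace(minor)
    return total

A = np.array([[2.0, 0, 2], [0, 3, 2], [0, 1, 1]])
```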
3.6 Cramer’s Rule
Theorem (Cramer’s Rule). Let A be an invertible n × n matrix. Then the unique solution of $Ax = b$ is given by
\[
x_i = \frac{\det(A_i)}{\det(A)}, \qquad i = 1, ..., n,
\]
where $A_i$ is the matrix obtained from A by replacing the i-th column with b.
Geometrically, if $\mathcal B := \{a_1, ..., a_n\}$ is the basis formed from the columns of A, then the amount $x_i$ it requires to scale in the $a_i$ direction to obtain b is precisely the ratio between the volume spanned by $\mathcal B'$ (obtained from $\mathcal B$ by replacing $a_i$ with b) and the volume spanned by $\mathcal B$.
Proof. By definition, if $x = (x_1, ..., x_n)^T$ is a solution, then
\[
x_1a_1 + \cdots + x_na_n = b.
\]
Therefore by multilinearity and the alternating property (expanding b in the i-th column, all terms with $j \ne i$ vanish since they have a repeated column),
\[
\det(A_i) = \det\big(a_1\;\cdots\;b\;\cdots\;a_n\big)
= x_i\det\big(a_1\;\cdots\;a_i\;\cdots\;a_n\big) = x_i\det(A),
\]
where b sits in the i-th column.
Remark. Cramer’s Rule is pretty useless in practice: it requires calculating n + 1 determinants of size
n × n, which is computationally difficult when n is large.
Example 3.6. Using the same matrix from Example 2.21, we want to solve
\[
\begin{pmatrix}2 & 0 & 2\\ 0 & 3 & 2\\ 0 & 1 & 1\end{pmatrix}
\begin{pmatrix}x\\ y\\ z\end{pmatrix}
= \begin{pmatrix}1\\ 2\\ 3\end{pmatrix}.
\]
Expanding the four determinants gives $\det(A) = 2$, $\det(A_1) = -13$, $\det(A_2) = -8$, $\det(A_3) = 14$, hence $x = -\tfrac{13}{2}$, $y = -4$, $z = 7$, as before.
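The same computation, done mechanically with numpy (using the system of the example above):

```python
import numpy as np

A = np.array([[2.0, 0, 2], [0, 3, 2], [0, 1, 1]])
b = np.array([1.0, 2.0, 3.0])

detA = np.linalg.det(A)
x = np.empty(3)
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = b              # replace the i-th column by b
    x[i] = np.linalg.det(Ai) / detA
```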
With Cramer’s Rule, we can easily derive a closed formula for the inverse A−1 as follows.
Theorem (Inverse Formula).
\[
A^{-1} = \frac{\mathrm{adj}(A)}{\det(A)},
\]
where the adjugate matrix $\mathrm{adj}(A)$ has $(i, j)$-th entry the cofactor $C_{ji} = (-1)^{i+j}\det(A_{ji})$.
Proof. The j-th column of $A^{-1}$ equals the solution of $Ax = e_j$, so the $(i, j)$-th entry is $x_i$. By Cramer’s rule,
\[
x_i\det A = \det A_i = \det\big(a_1\;\cdots\;e_j\;\cdots\;a_n\big),
\]
with $e_j$ in the i-th column. By Laplace expansion, this is the same as $\det(A_{ji})$ with the appropriate sign $(-1)^{i+j}$, i.e. the cofactor $C_{ji}$.
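The adjugate formula is straightforward to implement and compare against numpy's inverse; a sketch using the matrix from the Cramer example:

```python
import numpy as np

def adjugate(A):
    # adj(A)_{ij} = C_{ji} = (-1)^{i+j} det(A_{ji})
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            # A_{ji}: delete row j and column i
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 0, 2], [0, 3, 2], [0, 1, 1]])
inv = adjugate(A) / np.linalg.det(A)
```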
CHAPTER 4
Inner Product Spaces
In this chapter we focus on K = R. We define the geometric concepts of length, distance, angle
and perpendicularity for Rn . This gives Rn the structure of a Euclidean space.
4.1 Inner Products
Definition 4.1. The dot product of $u, v \in \mathbb{R}^n$ is defined to be
\[
u\cdot v := u^Tv
= \begin{pmatrix}u_1 & \cdots & u_n\end{pmatrix}
\begin{pmatrix}v_1\\ \vdots\\ v_n\end{pmatrix}
= \sum_{i=1}^n u_iv_i \in \mathbb{R}.
\]
Notation. To avoid confusion, I will omit · for scalar multiplication: writing cu instead of c · u.
Proposition 4.2. If $A = (a_{ij})$ is an m × n matrix, then the matrix entries are given by
\[
a_{ij} = e'_i \cdot (Ae_j),
\]
where $\{e_j\}$ is the standard basis for $\mathbb{R}^n$ and $\{e'_i\}$ is the standard basis for $\mathbb{R}^m$.
If
\[
A = \begin{pmatrix} -\; a_1^T\; -\\ \vdots\\ -\; a_m^T\; -\end{pmatrix}
\]
is an m × k matrix, and $B = \big(b_1\;\cdots\;b_n\big)$ is a k × n matrix, then the matrix product is given by
\[
AB = \begin{pmatrix}
a_1\cdot b_1 & \cdots & a_1\cdot b_n\\
\vdots & \ddots & \vdots\\
a_m\cdot b_1 & \cdots & a_m\cdot b_n
\end{pmatrix}.
\]
This is just a restatement of Proposition 2.17.
Motivated from the obvious properties of dot product, a more general notion is the inner product:
Definition 4.3. An inner product on a vector space V over R is a map $\langle\;,\;\rangle : V \times V \to \mathbb{R}$ satisfying, for all $u, v, w \in V$ and $c \in \mathbb{R}$:
(1) Linearity: $\langle u + cv, w\rangle = \langle u, w\rangle + c\langle v, w\rangle$.
(2) Symmetry: $\langle u, v\rangle = \langle v, u\rangle$.
(3) Positivity: $\langle v, v\rangle \ge 0$, with $\langle v, v\rangle = 0$ iff $v = 0$.
A vector space V over R equipped with an inner product is called an inner product space.
Note. (1) implies that the inner product is also linear in the second argument.
Example 4.1. $V = \mathbb{R}^n$ equipped with the dot product as the inner product:
\[
\langle u, v\rangle := u\cdot v.
\]

Example 4.2. More generally, fix constants $c_1, ..., c_n > 0$. Then
\[
\langle x, y\rangle := c_1x_1y_1 + \cdots + c_nx_ny_n
\]
defines an inner product on $\mathbb{R}^n$.
Example 4.3. On $V = M_{m\times n}(\mathbb{R})$, the pairing
\[
\langle A, B\rangle := \mathrm{Tr}(A^TB)
\]
satisfies Definition 4.3 and is an inner product. Under the isomorphism $M_{m\times n}(\mathbb{R}) \simeq \mathbb{R}^{mn}$, it is just the usual dot product.
Remark. Infinite dimensional inner product spaces belong to the theory of Hilbert spaces, which is very important in mathematical physics and functional analysis, but also much more complicated.
In this chapter, we will mostly focus on the finite dimensional case unless otherwise specified.
Notation. We will fix the dot product as the inner product of Rn in all the Examples below
unless otherwise specified.
Remark. When K = C, we need to modify Definition 4.3 so that it involves the complex conjugate:
\[
\langle\;,\;\rangle : V\times V \longrightarrow \mathbb{C}, \qquad \langle u, v\rangle = \overline{\langle v, u\rangle}.
\]
This ensures the positivity (3) holds. Also note that we now have
\[
\langle u, cv\rangle = \bar c\,\langle u, v\rangle.
\]
Let V be an inner product space. Motivated from classical geometry, we have a list of analogies:
Definition 4.5. A vector u ∈ V with unit length, i.e. ∥u∥ = 1 is called a unit vector .
Given $0 \ne v \in V$, the vector $\dfrac{1}{\|v\|}v$ has unit length and is called the normalization of v.
Example 4.5. $v = (1, -2, 2, 0)^T \in \mathbb{R}^4$ has Euclidean length
\[
\|v\| = \sqrt{1^2 + (-2)^2 + 2^2 + 0^2} = \sqrt{9} = 3,
\]
and the unit vector $\frac{1}{3}v = \big(\tfrac13, -\tfrac23, \tfrac23, 0\big)^T$ is its normalization.
Definition 4.6. The distance between $u, v \in V$ is
\[
\mathrm{dist}(u, v) := \|u - v\|,
\]
and the angle θ between nonzero vectors u, v is defined by
\[
\cos\theta = \frac{\langle u, v\rangle}{\|u\|\|v\|}.
\]

Note. For a general inner product, θ might not be the same as the Euclidean angle between two vectors.
4.2 Orthogonal Bases
In analogy to the Euclidean case $\theta = \frac{\pi}{2}$, we say that
Definition 4.8. Two vectors u, v ∈ V are orthogonal (or perpendicular ) to each other if
⟨u, v⟩ = 0.
Example 4.6. $\begin{pmatrix}1\\1\end{pmatrix}$ and $\begin{pmatrix}1\\-1\end{pmatrix}$ are orthogonal to each other in $\mathbb{R}^2$ with respect to the dot product.
Simple results from classical geometry still hold for a general inner product space:

Theorem 4.10 (Cauchy–Schwarz Inequality). $|\langle u, v\rangle| \le \|u\|\|v\|$.

Theorem 4.11 (Triangle Inequality). $\|u + v\| \le \|u\| + \|v\|$.

Proof. For (4.10): $\|u + \lambda v\|^2 \ge 0$ for all λ. Expanding as inner products gives a quadratic polynomial in λ. Taking the discriminant $\Delta \le 0$ gives the result. The Triangle Inequality follows by expanding $\|u + v\|^2$ and applying (4.10).
Definition. A basis S of V consisting of pairwise orthogonal vectors is called an orthogonal basis. If in addition all vectors in S have unit norm, it is called an orthonormal basis for V.
Example 4.8. The standard basis $\{e_1, e_2, ..., e_n\}$ for $\mathbb{R}^n$ is an orthonormal basis:
\[
e_i\cdot e_j = \delta_{ij} = \begin{cases}1 & \text{if } i = j,\\ 0 & \text{if } i \ne j.\end{cases}
\]
Example 4.9. The set $\left\{\begin{pmatrix}1\\1\end{pmatrix}, \begin{pmatrix}1\\-1\end{pmatrix}\right\}$ is an orthogonal basis for $\mathbb{R}^2$.

Its rescaled version, the set $\left\{\begin{pmatrix}\frac{1}{\sqrt2}\\ \frac{1}{\sqrt2}\end{pmatrix}, \begin{pmatrix}\frac{1}{\sqrt2}\\ -\frac{1}{\sqrt2}\end{pmatrix}\right\}$, is an orthonormal basis for $\mathbb{R}^2$.
Proposition 4.13. Let $\mathcal B = \{b_1, ..., b_n\}$ be an orthogonal basis for V (i.e. $\dim V = n$). Then the coordinate mapping with respect to $\mathcal B$ is given explicitly by
\[
\psi_{V,\mathcal B} : V \longrightarrow \mathbb{R}^n, \qquad
v \mapsto [v]_{\mathcal B} := \begin{pmatrix}c_1\\ \vdots\\ c_n\end{pmatrix},
\qquad
c_i = \frac{\langle v, b_i\rangle}{\langle b_i, b_i\rangle}, \quad i = 1, ..., n.
\]
Proposition. Let $T \in \mathcal L(V, W)$, let $\mathcal B = \{v_1, ..., v_n\}$ be a basis of V, and let $\mathcal B' = \{w_1, ..., w_m\}$ be an orthonormal basis of W. If $A = (a_{ij})$ represents T with respect to these bases, then
\[
a_{ij} = \langle w_i, Tv_j\rangle.
\]

Proof. By definition the j-th column of A is $[Tv_j]_{\mathcal B'}$. Letting $b_i = w_i$ in Proposition 4.13 gives the i-th component of this column, which is exactly $a_{ij}$.
Example 4.10. Consider 2π-periodic smooth real functions on R with inner product
\[
\langle f, g\rangle := \int_0^{2\pi} f(x)g(x)\,dx.
\]
Then the functions $\{\cos nx, \sin nx\}_{n\in\mathbb{Z}_{\ge 0}}$ are orthogonal to each other. The coefficients under the coordinate mapping are exactly the coefficients $a_n, b_n$ of the Fourier series.

For complex-valued functions, the functions $\{e^{inx}\}_{n\in\mathbb{Z}}$ are orthogonal to each other. The coefficients under the coordinate mapping are exactly the coefficients $c_n$ of the complex Fourier series.
4.3 Orthogonal Projections
Let V be an inner product space (which can be infinite dimensional), and let $U \subset V$ be a subspace. The orthogonal complement of U is
\[
U^\perp := \{v \in V : \langle v, u\rangle = 0 \text{ for all } u \in U\}.
\]
It satisfies:

– $U^\perp$ is a subspace of V.

– $U \subset (U^\perp)^\perp$.
Let $0 \ne u \in V$. Any vector $b \in V$ can be projected onto u “perpendicularly”. In other words, we have
\[
b = b^{\|} + b^{\perp}
\]
such that $\mathrm{Proj}_u(b) := b^{\|}$ is parallel to u (i.e. a multiple of u), while $b^{\perp}$ is perpendicular to u.

[Figure: decomposition $b = \mathrm{Proj}_u(b) + b^\perp$ along $e = u/\|u\|$.]

Proposition 4.18.
\[
\mathrm{Proj}_u(b) := \langle b, e\rangle e = \frac{\langle b, u\rangle}{\langle u, u\rangle}\,u,
\]
where $e := \dfrac{u}{\|u\|}$ is the unit normalization of u.
Proof. Let Proju (b) = ce, i.e. b = ce + b⊥ . Take inner product with e to get c = ⟨b, e⟩.
By straightforward induction, the Gram–Schmidt Process gives a simple algorithm to construct
an orthogonal basis from an arbitrary basis. It works for both K = R or C.
Theorem 4.19 (Gram–Schmidt Process). Let dim V = n and {x1 , ..., xn } be a basis for V .
Define
u1 := x1
u2 := x2 − Proju1 (x2 )
u3 := x3 − Proju1 (x3 ) − Proju2 (x3 )
..
.
un := xn − Proju1 (xn ) − Proju2 (xn ) − · · · − Projun−1 (xn )
where
\[
\mathrm{Proj}_u(x) := \frac{\langle x, u\rangle}{\langle u, u\rangle}\,u
\]
is the orthogonal projection (see Proposition 4.18). Then $\{u_1, ..., u_n\}$ is an orthogonal basis for V, with $\mathrm{Span}\{u_1, ..., u_k\} = \mathrm{Span}\{x_1, ..., x_k\}$ for each k.
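The recursion above can be sketched directly in numpy; the basis below is made up for illustration (columns of X are the input vectors $x_k$):

```python
import numpy as np

def gram_schmidt(X):
    # Columns of X are the basis x_1, ..., x_n; returns orthogonal u_1, ..., u_n
    U = X.astype(float).copy()
    for k in range(X.shape[1]):
        for i in range(k):
            # Subtract the projection of the current vector onto u_i
            U[:, k] -= (U[:, k] @ U[:, i]) / (U[:, i] @ U[:, i]) * U[:, i]
    return U

X = np.array([[1.0, 1, 0], [1, 0, 1], [0, 1, 1]])
U = gram_schmidt(X)
G = U.T @ U   # Gram matrix: diagonal iff the columns of U are orthogonal
```

(Subtracting the projections one at a time, as done here, is the "modified" Gram–Schmidt variant; it produces the same vectors but is numerically more stable.)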
In particular,
Corollary 4.20. Any finite dimensional inner product space has an orthonormal basis.
We can now establish the following important result, which gives a canonical choice (i.e. indepen-
dent of basis) of direct sum complement (see Theorem 1.30) if an inner product exists.
Theorem 4.21 (Orthogonal Decomposition). Let $U \subset V$ be a finite dimensional subspace. Then every $v \in V$ can be written uniquely as
\[
v = v^{\|} + v^{\perp}
\]
with $v^{\|} \in U$ and $v^{\perp} \in U^\perp$. In other words,
\[
V = U \oplus U^\perp.
\]
In particular $U \cap U^\perp = \{0\}$, and writing $\mathrm{Proj}_U(v) := v^{\|}$,
\[
\mathrm{Im}(\mathrm{Proj}_U) = U, \qquad \mathrm{Ker}(\mathrm{Proj}_U) = U^\perp.
\]

Proof. Let $\mathcal B = \{u_1, ..., u_r\}$ be an orthonormal basis of U (which exists by Corollary 4.20). Let $v^{\|} = c_1u_1 + \cdots + c_ru_r$. Taking inner products with $u_i$ forces
\[
c_i = \langle v, u_i\rangle.
\]
Therefore $v^{\|}$ is uniquely determined, and one checks that $v^\perp := v - v^{\|} \in U^\perp$.
Remark. V can be infinite dimensional. Theorem 4.21 only requires U to be finite dimensional.
Note. The uniqueness statement says that the orthogonal decomposition, i.e. the formula for v∥
does not depend on the choice of basis B of U used in the proof.
[Figure: orthogonal projection of v onto $U = \mathrm{Span}\{u_1, u_2\}$, with $v = \mathrm{Proj}_{u_1}(v) + \mathrm{Proj}_{u_2}(v) + v^\perp$.]
Corollary. For finite dimensional U,
\[
(U^\perp)^\perp = U.
\]
By choosing an orthonormal basis with respect to the dot product, we can represent the projection
ProjU by a matrix:
Proposition 4.23. If $\{u_1, ..., u_r\}$ is an orthonormal basis for $U \subset \mathbb{R}^n$ with respect to the dot product, then
\[
\mathrm{Proj}_U(x) = (x\cdot u_1)u_1 + \cdots + (x\cdot u_r)u_r.
\]
Equivalently, if $P = \big(u_1\;\cdots\;u_r\big)$ is the n × r matrix with the $u_i$ as columns, then
\[
\mathrm{Proj}_U(x) = PP^Tx.
\]
The matrix $PP^T$ is an n × n matrix, and by uniqueness it does not depend on the choice of orthonormal basis used to construct P. In fact, $M := PP^T$ is an orthogonal projection matrix, i.e.
\[
M^2 = M, \qquad M^T = M.
\]
Example 4.11. If $U = \mathrm{Span}(u_1, u_2)$ where $u_1 = \begin{pmatrix}1\\0\\1\end{pmatrix}$, $u_2 = \begin{pmatrix}1\\1\\-1\end{pmatrix}$, then $\{u_1, u_2\}$ is an orthogonal basis for U since $u_1\cdot u_2 = 0$.

The normalizations $\dfrac{u_1}{\|u_1\|} = \begin{pmatrix}\frac1{\sqrt2}\\ 0\\ \frac1{\sqrt2}\end{pmatrix}$, $\dfrac{u_2}{\|u_2\|} = \begin{pmatrix}\frac1{\sqrt3}\\ \frac1{\sqrt3}\\ -\frac1{\sqrt3}\end{pmatrix}$ then form an orthonormal basis for U. We have
\[
P = \begin{pmatrix}\frac1{\sqrt2} & \frac1{\sqrt3}\\ 0 & \frac1{\sqrt3}\\ \frac1{\sqrt2} & -\frac1{\sqrt3}\end{pmatrix}
\]
and therefore
\[
\mathrm{Proj}_U = PP^T
= \begin{pmatrix}\frac56 & \frac13 & \frac16\\ \frac13 & \frac13 & -\frac13\\ \frac16 & -\frac13 & \frac56\end{pmatrix}.
\]
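As a sanity check, numpy confirms that this $PP^T$ is symmetric, idempotent, and fixes the vectors of U (using the same $u_1, u_2$ as in Example 4.11):

```python
import numpy as np

u1 = np.array([1.0, 0.0, 1.0])
u2 = np.array([1.0, 1.0, -1.0])

# Columns of P: the normalized orthogonal basis of U
P = np.column_stack([u1 / np.linalg.norm(u1), u2 / np.linalg.norm(u2)])
M = P @ P.T   # the orthogonal projection matrix onto U
```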
4.4 Orthogonal Matrices
Definition 4.26. A linear transformation T ∈ L (V, W ) between inner product spaces is called an
isometry if it preserves distance: ∥T v∥ = ∥v∥
for any vector v ∈ V .
In particular, an isometry T is injective.
Notation. We may omit the subscript of the inner product later if it is clear from the context.
Theorem. Any n-dimensional inner product space V over R is isometrically isomorphic to $\mathbb{R}^n$ with the dot product.
Proof. By the Gram–Schmidt Process, V has an orthonormal basis B = {u1 , ..., un }. Then the
coordinate mapping (see Theorem 2.11)
ψV,B : V −→ Rn
ui 7→ ei
is the required isomorphism, which is clearly an isometry.
We say that a matrix A represents T between inner product spaces if it is with respect to
orthonormal bases and the dot product of Euclidean space under the above isomorphism.
Definition 4.30. If n = m, the square matrix P corresponding to a linear isometry under the
isometric isomorphism is called an orthogonal matrix . It is invertible with
P−1 = PT .
Theorem 4.31. The set O(n) of n × n orthogonal matrices forms a group (see Definition 1.2), i.e.

– $I_{n\times n} \in O(n)$;

– if $P, Q \in O(n)$, then $PQ \in O(n)$;

– if $P \in O(n)$, then $P^{-1} = P^T \in O(n)$.
Mirror reflections along the line with slope $\tan\frac{\theta}{2}$ passing through the origin:
\[
P = \begin{pmatrix}\cos\theta & \sin\theta\\ \sin\theta & -\cos\theta\end{pmatrix}.
\]
In R3 , any linear isometry can be decomposed into a rotation in the xy plane, followed by a rotation
in the yz plane, and possibly followed by a mirror reflection. This generalizes to higher dimension.
Taking the determinant of $P^TP = I$ gives $\det(P)^2 = 1$, so that
\[
\det(P) = \pm 1.
\]
Alternatively, note that the columns of P form a unit cube (length 1 and orthogonal to each other).
Therefore in terms of volume (see Chapter 3.4) the scaling factor is | det(P)| = 1.
The sign indicates whether it consists of a mirror reflection or not (i.e. whether the orientation
of the image is flipped). The set of all orthogonal matrices P with det P = 1 also forms a group,
called the special orthogonal group, denoted by SO(n).
Example 4.14. The n × r matrix P from Proposition 4.23 such that the projection $\mathrm{Proj}_U = PP^T$ is an isometry: it consists of orthonormal columns.
Non-Example 4.15. ProjU is in general not an isometry: it may not preserve lengths.
4.5 QR Decomposition
The Gram–Schmidt Process implies the following factorization, which is very important in computational algorithms, best linear approximations, and eigenvalue decompositions.
Theorem 4.32 (QR Decomposition). If A is an m×n matrix with linearly independent columns,
then
A = QR
where
Q is an m × n matrix with orthonormal columns forming a basis for ColA.
It is obtained from the columns of A by the Gram–Schmidt Process.
R is an n × n invertible upper triangular matrix. Indeed, from the Gram–Schmidt Process,
\[
x_k = u_k + \sum_{i<k} c_{ik}\,u_i
\]
for some scalars $c_{ik} \in \mathbb{R}$, i.e. $x_k$ is a linear combination of $\{u_1, ..., u_k\}$ where the coefficient of $u_k$ equals 1.
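In practice one uses a library routine rather than hand-rolled Gram–Schmidt. A numpy sketch on the matrix of Example 4.16 below; note that numpy's QR agrees with the Gram–Schmidt factorization only up to the signs of the columns of Q (and the corresponding rows of R):

```python
import numpy as np

A = np.array([[1.0, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]])

# Reduced QR: Q is 4x3 with orthonormal columns, R is 3x3 upper triangular
Q, R = np.linalg.qr(A)
```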
Finally,
\[
R = Q^TA
= \begin{pmatrix}
\frac12 & \frac12 & \frac12 & \frac12\\[2pt]
-\frac3{\sqrt{12}} & \frac1{\sqrt{12}} & \frac1{\sqrt{12}} & \frac1{\sqrt{12}}\\[2pt]
0 & -\frac2{\sqrt6} & \frac1{\sqrt6} & \frac1{\sqrt6}
\end{pmatrix}
\begin{pmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\\ 1 & 1 & 1\end{pmatrix}
= \begin{pmatrix}2 & \frac32 & 1\\[2pt] 0 & \frac3{\sqrt{12}} & \frac2{\sqrt{12}}\\[2pt] 0 & 0 & \frac2{\sqrt6}\end{pmatrix}.
\]
Note that the diagonal of R is the same as the length of ∥u1 ∥, ∥u2 ∥, ∥u3 ∥ used for the normalization
in Part 2.
4.6 Least Square Approximation
Consider a (possibly inconsistent) system
\[
Ax = b.
\]
If A has linearly independent columns, we can use the QR decomposition to find a best approximation $\hat x$ such that $\|A\hat x - b\|$ is the smallest.
By the Best Approximation Theorem, the closest point Ax ∈ ColA to b ∈ Rm should be ProjColA b.
But ColA has an orthonormal basis given by the columns of Q, so $\mathrm{Proj}_{\mathrm{Col}A} = QQ^T$. Hence
\[
A\hat x = QQ^Tb.
\]
Using A = QR we obtain:
Theorem 4.33 (Least Square Approximation). If A is an m × n matrix with m > n and linearly independent columns, with QR decomposition A = QR, then
\[
\hat x = R^{-1}Q^Tb \in \mathbb{R}^n
\]
satisfies, for every $x \in \mathbb{R}^n$,
\[
\|A\hat x - b\| \le \|Ax - b\|.
\]
Example 4.17. Continuing with Example 4.16, if $b = (1, 2, 3, 4)^T$, then the best approximation $\hat x$ minimizing $\|Ax - b\|$ is given by $R\hat x = Q^Tb$:
\[
\begin{pmatrix}2 & \frac32 & 1\\[2pt] 0 & \frac3{\sqrt{12}} & \frac2{\sqrt{12}}\\[2pt] 0 & 0 & \frac2{\sqrt6}\end{pmatrix}
\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix}
= \begin{pmatrix}5\\ \sqrt3\\ \frac{\sqrt6}{2}\end{pmatrix}.
\]
This is a very simple (upper triangular) system of linear equations, and we can solve for $\hat x$ by back substitution to get
\[
\hat x = \begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix}1\\ 1\\ \frac32\end{pmatrix}.
\]
Therefore
\[
A\hat x = \begin{pmatrix}1 & 0 & 0\\ 1 & 1 & 0\\ 1 & 1 & 1\\ 1 & 1 & 1\end{pmatrix}\begin{pmatrix}1\\ 1\\ \frac32\end{pmatrix}
= \begin{pmatrix}1\\ 2\\ \frac72\\ \frac72\end{pmatrix}
\]
is the closest approximation in ColA to $b = (1, 2, 3, 4)^T$.
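The same computation via numpy, following Theorem 4.33 literally (using the A and b of Example 4.17):

```python
import numpy as np

A = np.array([[1.0, 0, 0], [1, 1, 0], [1, 1, 1], [1, 1, 1]])
b = np.array([1.0, 2.0, 3.0, 4.0])

Q, R = np.linalg.qr(A)
x_hat = np.linalg.solve(R, Q.T @ b)   # x_hat = R^{-1} Q^T b
```

Any sign differences in numpy's Q and R cancel in the product $R^{-1}Q^T$, so the answer matches the hand computation.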
Remark. It is called the least square because ∥u∥ is computed by summing the squares of the coordinates,
and we want to find the smallest value of ∥Ax−b∥. This is very useful in regression problems in statistics,
where we want to fit the data onto a linear model as closely as possible.
4.7 Gram Determinants
Another application of QR decomposition is to give a formula for the volume of the parallelepiped
spanned by k vectors inside an n-dimensional vector space (k ≤ n).
Theorem 4.34. The k-dimensional volume of the parallelepiped P spanned by $\{v_1, ..., v_k\}$ in $\mathbb{R}^n$ is given by
\[
\mathrm{Vol}(P) = \sqrt{\det(A^TA)},
\]
where $A = \big(v_1\;\cdots\;v_k\big) \in M_{n\times k}(\mathbb{R})$.
The matrix AT A ∈ Mk×k (R) is known as the Gram matrix (or Gramian) of {v1 , ..., vk }, and
det(AT A) is called the Gram determinant.
Proof. If $\{v_1, ..., v_k\}$ are linearly dependent, then P has zero k-dimensional volume, and A has linearly dependent columns, hence $\det(A^TA) = 0$.

Otherwise write A = QR: passing to the orthonormal columns of Q amounts to column operations, and the unit cube spanned by orthonormal columns has volume 1, so $\mathrm{Vol}(P) = |\det(R)|$ as in Chapter 3.4. On the other hand, $\det(A^TA) = \det(R^TQ^TQR) = \det(R^TR) = \det(R)^2$ since $Q^TQ = I$ and R is upper triangular.
Example 4.18. For two vectors $u, v \in \mathbb{R}^n$, we have $A = \big(u\;\;v\big)$, and the Gram matrix is
\[
A^TA = \begin{pmatrix}u\cdot u & u\cdot v\\ v\cdot u & v\cdot v\end{pmatrix}.
\]
Hence
\[
\mathrm{Vol}(P) = \sqrt{\|u\|^2\|v\|^2 - (u\cdot v)^2}
= \sqrt{\|u\|^2\|v\|^2(1 - \cos^2\theta)}
= \|u\|\|v\|\sin\theta, \qquad (0 \le \theta \le \pi),
\]
as expected.
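For two vectors in $\mathbb{R}^3$ the Gram determinant can be checked against the cross product, since both compute the area of the parallelogram (the vectors below are made up):

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

A = np.column_stack([u, v])
vol = np.sqrt(np.linalg.det(A.T @ A))   # sqrt of the Gram determinant

# |u x v| is the area of the parallelogram spanned by u and v
area = np.linalg.norm(np.cross(u, v))
```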
CHAPTER 5
Spectral Theory
In this chapter we will learn the important notion of eigenvectors and eigenvalues of matrices and
linear transformations in general. The study of eigenspaces of general linear transformations is
known as Spectral Theory.
In this chapter V = W over any field K, so that T ∈ L (V, V ) and T is represented by a square
matrix A if dim V < ∞.
5.1 Eigenvectors and Eigenvalues
Definition. A nonzero vector $u \in V$ is called an eigenvector of T with eigenvalue $\lambda \in K$ if
\[
Tu = \lambda u.
\]
The space
\[
V_\lambda := \{u : Tu = \lambda u\} \subset V
\]
is called the eigenspace of the eigenvalue λ.
Remark. eigen- from the German word meaning “own”, “characteristic”, “special”.
In particular, any linear combination of eigenvectors with eigenvalue λ is again an eigenvector with eigenvalue λ, provided it is a nonzero vector.
Proposition. $\lambda \in K$ is an eigenvalue of T (i.e. $V_\lambda \ne \{0\}$) if and only if $T - \lambda I$ is not invertible.
Proof. Vλ ̸= {0} ⇐⇒ dim Ker(T −λI) ̸= 0 ⇐⇒ T −λI not invertible by the Rank-Nullity Theorem.
Hence the general strategy to find eigenvalues and eigenvectors (when dim V < ∞) is:
Step 1. Find the eigenvalues λ by solving the characteristic equation det(A − λI) = 0.
Step 2. For each eigenvalue λ, find the eigenspace by solving the linear equations (A − λI)x = 0.
Any nonzero vector of the eigenspace will be an eigenvector.
Example 5.1. Let $A = \begin{pmatrix}1 & 1\\ 4 & 1\end{pmatrix}$. To find the eigenvalues,
\[
\det\begin{pmatrix}1-\lambda & 1\\ 4 & 1-\lambda\end{pmatrix}
= (1-\lambda)(1-\lambda) - 4 = \lambda^2 - 2\lambda - 3 = (\lambda-3)(\lambda+1) = 0,
\]
hence λ = 3 or λ = −1.

For λ = 3, we have $\begin{pmatrix}-2 & 1\\ 4 & -2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = 0 \implies \mathrm{Span}\begin{pmatrix}1\\2\end{pmatrix}$ is the eigenspace for λ = 3.

For λ = −1, we have $\begin{pmatrix}2 & 1\\ 4 & 2\end{pmatrix}\begin{pmatrix}x\\ y\end{pmatrix} = 0 \implies \mathrm{Span}\begin{pmatrix}1\\-2\end{pmatrix}$ is the eigenspace for λ = −1.
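The same eigendata can be obtained numerically; a numpy check on the matrix of Example 5.1:

```python
import numpy as np

A = np.array([[1.0, 1.0], [4.0, 1.0]])

# Columns of evecs are (normalized) eigenvectors for the eigenvalues in evals
evals, evecs = np.linalg.eig(A)
```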
Example 5.2. Let $T = \dfrac{d}{dx}$ on the space of real-valued smooth functions $C^\infty(\mathbb{R})$. By solving
\[
f'(x) = \lambda f(x),
\]
we see that every $\lambda \in \mathbb{R}$ is an eigenvalue, with eigenvectors $f(x) = Ce^{\lambda x}$.
Proposition 5.4. The eigenvalues of a triangular matrix (in particular diagonal matrix) are
given by the entries on its main diagonal.
Example 5.3. $A = \begin{pmatrix}1 & 1 & 1\\ 0 & 2 & 2\\ 0 & 0 & 3\end{pmatrix}$ has eigenvalues λ = 1, 2, 3.
Example 5.4. $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}$ has eigenvalues λ = 1, 3 only.

$V_3 = \mathrm{Span}\begin{pmatrix}1\\0\\0\end{pmatrix}$ is 1-dimensional, while $V_1 = \mathrm{Span}\left\{\begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$ is 2-dimensional.
Example 5.5. $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1\end{pmatrix}$ has eigenvalues λ = 1, 3 only.

$V_3 = \mathrm{Span}\begin{pmatrix}1\\0\\0\end{pmatrix}$ is 1-dimensional, but $V_1 = \mathrm{Span}\begin{pmatrix}0\\1\\0\end{pmatrix}$ is also only 1-dimensional.
-83-
Chapter 5. Spectral Theory 5.1. Eigenvectors and Eigenvalues
Example 5.6. The operator $T = \dfrac{d}{dt}$ on $\mathbb{R}_2[t]$ is represented in the standard basis $\{1, t, t^2\}$ by
\[
A = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 2\\ 0 & 0 & 0\end{pmatrix}.
\]
Hence it only has a single eigenvalue λ = 0, with the constant polynomials as eigenvectors.
Theorem 5.5. Eigenvectors $v_1, ..., v_r$ with distinct eigenvalues $\lambda_1, ..., \lambda_r$ are linearly independent.

Proof (sketch). Suppose $c_1v_1 + \cdots + c_rv_r = 0$ is a dependence relation with the fewest nonzero coefficients. Applying T gives
\[
c_1\lambda_1v_1 + \cdots + c_r\lambda_rv_r = 0,
\]
and subtracting $\lambda_r$ times the original relation yields a shorter dependence relation, a contradiction.
5.2 Characteristic Polynomials
Recall that det(T ) is defined to be det(A) for any matrix A representing T , and it is independent
of the choice of A since similar matrices have the same determinant.
Hence everything below is defined intrinsically in terms of T, without any reference to matrices.
Definition 5.7. The characteristic polynomial of T is
\[
p(\lambda) := \det(T - \lambda I).
\]

Remark. Some authors define the characteristic polynomial as $\det(\lambda I - T)$ instead. It has the advantage that p(λ) is monic (i.e. with leading coefficient 1), which is mathematically more natural, and the disadvantage that you have to negate every entry of the matrix to compute p(λ). But otherwise the two conventions agree up to a sign, and give the same equation for λ.

Definition 5.8. The algebraic multiplicity of an eigenvalue λ is its multiplicity as a root of p(λ). The geometric multiplicity of λ is $\dim V_\lambda$.
Example 5.8. If $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1\end{pmatrix}$, then the characteristic polynomial is
\[
p(\lambda) = \det(A - \lambda I)
= \det\begin{pmatrix}3-\lambda & 0 & 0\\ 0 & 1-\lambda & 1\\ 0 & 0 & 1-\lambda\end{pmatrix}
= (1-\lambda)^2(3-\lambda).
\]
A field K is algebraically closed if any polynomial of degree n has exactly n roots (counted with
multiplicities). By the Fundamental Theorem of Algebra, K = C is algebraically closed.
Tr(A) is the sum of all (complex) eigenvalues (counted with algebraic multiplicities).
det(A) is the product of all (complex) eigenvalues (counted with algebraic multiplicities).
and we can take this as the definitions of Tr(T ) and det(T ), independent of the choice of A.
Since everything is defined in terms of T only, from the previous remark we conclude that:
5.3 Diagonalization
Definition. A square matrix A is diagonalizable if
\[
A = PDP^{-1}
\]
for some invertible matrix P and diagonal matrix D.
Diagonalization lets us simplify many matrix calculations and prove algebraic theorems due to the following properties:

Proposition 5.13. If $A = PBP^{-1}$, then
\[
A^k = PB^kP^{-1},
\]
and more generally $p(A) = P\cdot p(B)\cdot P^{-1}$ for any polynomial p.

From the geometric point of view, if both A, B represent T in different bases, then clearly both p(A), p(B) represent p(T) in those bases.
Example 5.9. If A is diagonalizable, then it is easy to compute its powers by Proposition 5.13:
\[
A^k = PD^kP^{-1}.
\]
For example, let $A = \begin{pmatrix}4 & -3\\ 2 & -1\end{pmatrix}$. Then $A = PDP^{-1}$ where
\[
P = \begin{pmatrix}3 & 1\\ 2 & 1\end{pmatrix}, \qquad
D = \begin{pmatrix}2 & 0\\ 0 & 1\end{pmatrix}, \qquad
P^{-1} = \begin{pmatrix}1 & -1\\ -2 & 3\end{pmatrix}.
\]
Hence
\[
A^8 = PD^8P^{-1}
= \begin{pmatrix}3 & 1\\ 2 & 1\end{pmatrix}
\begin{pmatrix}256 & 0\\ 0 & 1\end{pmatrix}
\begin{pmatrix}1 & -1\\ -2 & 3\end{pmatrix}
= \begin{pmatrix}766 & -765\\ 510 & -509\end{pmatrix}.
\]
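A numpy check of this power computation, using the P and D of Example 5.9:

```python
import numpy as np

A = np.array([[4.0, -3.0], [2.0, -1.0]])
P = np.array([[3.0, 1.0], [2.0, 1.0]])
D = np.diag([2.0, 1.0])

# A^8 via the diagonalization: only the diagonal entries get raised to the 8th power
A8 = P @ np.diag(np.diag(D) ** 8) @ np.linalg.inv(P)
```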
Theorem (Diagonalization Theorem). An n × n matrix A is diagonalizable,
\[
A = PDP^{-1},
\]
iff A has n linearly independent eigenvectors (i.e. $K^n$ has an eigenbasis of A). In this case, the columns of P are the eigenvectors and D consists of the corresponding eigenvalues.

Proof. If A is diagonalizable, then since P is invertible, its columns are linearly independent. We have
\[
AP = A\big(v_1\;\cdots\;v_n\big)
= \big(\lambda_1v_1\;\cdots\;\lambda_nv_n\big)
= \big(v_1\;\cdots\;v_n\big)\begin{pmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{pmatrix}
= PD.
\]
Hence the columns of P are the eigenvectors and D consists of the corresponding eigenvalues.
Since similar matrices represent the same linear transformation T with respect to different bases, we say that $T \in \mathcal L(V, V)$ is diagonalizable if it is represented by a diagonal matrix in some basis of V.
Since eigenvectors with distinct eigenvalues are linearly independent, by Theorem 1.35 the sum of the eigenspaces is direct, and T is diagonalizable iff
\[
V = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_r},
\]
where $\lambda_1, ..., \lambda_r$ are the distinct eigenvalues of T.
Example 5.10. Let us diagonalize
\[
A = \begin{pmatrix}3 & -2 & 4\\ -2 & 6 & 2\\ 4 & 2 & 3\end{pmatrix}.
\]

Step 1: Find eigenvalues. The characteristic polynomial is $p(\lambda) = \det(A - \lambda I) = -(\lambda - 7)^2(\lambda + 2)$, so the eigenvalues are λ = 7 and λ = −2.

Step 2: Find eigenvectors. We find by the usual procedure the linearly independent eigenvectors:
\[
\lambda = 7:\;\; v_1 = \begin{pmatrix}1\\0\\1\end{pmatrix},\;\; v_2 = \begin{pmatrix}-1\\2\\0\end{pmatrix};
\qquad
\lambda = -2:\;\; v_3 = \begin{pmatrix}-2\\-1\\2\end{pmatrix}.
\]
We have seen in Theorem 5.5 that eigenvectors with different eigenvalues are linearly independent. Therefore by the Diagonalization Theorem:

Corollary 5.17. If the n × n matrix A has n distinct eigenvalues, then A is diagonalizable.
Example 5.11. The matrix $A = \begin{pmatrix}3 & 4 & 5\\ 0 & 0 & 7\\ 0 & 0 & 6\end{pmatrix}$ is triangular, hence the eigenvalues are the diagonal entries λ = 3, λ = 0 and λ = 6. Since they are all different, A is diagonalizable.
Non-Example 5.12. We have seen from Example 5.5 that the matrix $A = \begin{pmatrix}3 & 0 & 0\\ 0 & 1 & 1\\ 0 & 0 & 1\end{pmatrix}$ has two eigenvalues λ = 1, 3 only, so we cannot apply Corollary 5.17. In fact, each eigenvalue corresponds to only a 1-dimensional eigenspace. Hence $\mathbb{R}^3$ does not have a basis formed by eigenvectors, and so A is not diagonalizable by the Diagonalization Theorem.
Corollary 5.18. If T is diagonalizable, the algebraic multiplicity equals the geometric multiplicity
for each eigenvalue λ.
Conversely, if the algebraic multiplicity equals the geometric multiplicity, and the algebraic multi-
plicities add up to dim V = n, then T is diagonalizable.
Example 5.13. Let p(λ) be the characteristic polynomial of A with the eigenvalues $\lambda_i$ as roots. If $A = PDP^{-1}$ is diagonalizable, then since $D = \begin{pmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{pmatrix}$,
\[
p(D) = \begin{pmatrix}p(\lambda_1) & & \\ & \ddots & \\ & & p(\lambda_n)\end{pmatrix} = O,
\]
where O is the zero matrix. By Proposition 5.13 we conclude that
\[
p(A) = P\cdot p(D)\cdot P^{-1} = O.
\]
The general form of this statement is the Cayley–Hamilton Theorem (see Theorem 8.8).
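The vanishing p(A) = O is easy to verify numerically. A sketch using the matrix of Example 5.10, with the characteristic polynomial coefficients computed by `np.poly` (the opposite sign convention det(λI − A) makes no difference to the vanishing):

```python
import numpy as np

A = np.array([[3.0, -2.0, 4.0], [-2.0, 6.0, 2.0], [4.0, 2.0, 3.0]])

# Coefficients of det(lambda I - A), highest degree first
coeffs = np.poly(A)

# Evaluate p(A) as a matrix polynomial by Horner's scheme
n = A.shape[0]
pA = np.zeros_like(A)
for c in coeffs:
    pA = pA @ A + c * np.eye(n)
```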
5.4 Symmetric Matrices
In this section, we focus on $V = \mathbb{R}^n$ with the standard dot product, to learn the main structural results about diagonalizable matrices.
We will see in Chapter 7 that most results generalize to general (complex) inner product spaces.
Definition. A square matrix A is symmetric if
\[
A^T = A.
\]
Equivalently,
\[
u\cdot Av = Au\cdot v, \qquad \forall u, v \in \mathbb{R}^n.
\]
The first important property of symmetric matrix is the orthogonality between eigenspaces.
Theorem 5.21. If A is symmetric, then different eigenspaces are orthogonal to each other.
Example 5.14 (Example 5.10 cont’d). We have diagonalized the matrix
\[
A = \begin{pmatrix}3 & -2 & 4\\ -2 & 6 & 2\\ 4 & 2 & 3\end{pmatrix}
\]
before. But the matrix P we found was not an orthogonal matrix.
Theorem 5.23. If A is a real symmetric n × n matrix, then all the eigenvalues are real.
In particular the characteristic polynomial p(λ) has n real roots (counted with multiplicities).
Note. The statement is not true for complex symmetric matrices, e.g. take a 1 × 1 matrix i .
In particular, it implies that an eigenvector always exists for real symmetric matrix.
Proof. Consider A as a complex matrix with real entries; we show that its eigenvalues are real. Suppose $Av = \lambda v$ with $v \ne 0$. Note that we need to use the complex dot product $v\cdot w := \bar v^Tw$ to ensure that $v\cdot v \ge 0$ is real. Then
\[
v\cdot Av = v\cdot\lambda v = \lambda(v\cdot v),
\]
while on the other hand, since $A^T = A$ with real entries,
\[
v\cdot Av = (Av)\cdot v = \bar\lambda(v\cdot v).
\]
Hence $\lambda = \bar\lambda$, i.e. λ is real.
Since a real eigenvector exists, this allows us to construct a full set of them by induction:
Theorem 5.24 (Orthogonal Diagonalization). If A is a real symmetric n × n matrix, then A is orthogonally diagonalizable: there exist an orthogonal matrix P and a real diagonal matrix D such that
\[
A = PDP^T.
\]
The collection of Theorems 5.21, 5.23 and 5.24 is known as the Spectral Theorem for Symmetric Matrices.
Hence we know that it is orthogonally diagonalizable, without even calculating its eigenvalues
or eigenvectors!
√
(The eigenvalues are given by λ = 0 (with multiplicity 2) and 8 ± 2 21, all of them are real.)
Proof.
u · Au⊥ = Au · u⊥ = λu · u⊥ = 0
so that Au⊥ ∈ U ⊥ .
– Extend {v} to an orthonormal basis B = {v, v1 , ..., vn−1 } of Rn = U ⊕ U ⊥ where
vi ∈ U ⊥ .
– Since A : U −→ U and U⊥ −→ U⊥, with respect to B, [A]B is a block diagonal matrix of the form

[A]B = \begin{pmatrix} λ & 0 \\ 0 & B \end{pmatrix}

which is still symmetric since the change of basis is orthogonal.
– In particular the restriction B := A|U ⊥ is a symmetric (n − 1) × (n − 1) matrix, so that
it is orthogonally diagonalizable by induction.
– Since Rn = U ⊕ U ⊥ , the (orthonormal) eigenvectors of B, together with v, form an
orthonormal eigenbasis of [A]B . Reverting the change of basis we get an orthonormal
eigenbasis of A.
By the remark before Definition 5.22, the matrix P formed by these eigenvectors orthogonally diagonalizes A.
CHAPTER 6
Positive Definite Matrices and SVD
We know that not all matrices can be diagonalized. In this chapter, we derive a simple alternative
approach called the Singular Value Decomposition (SVD), which can be applied even to
rectangular matrices! This method is also extremely important in data analysis.
Remark. Everything works for K = C as well with minor modifications. (See the Summary in Chapter 7.)
Definition 6.1. A bilinear form on V is a (real-valued) function f (x, y) in two variables that is
linear in both arguments x, y ∈ V .
is a bilinear form on Mn×n (R) called the Killing form (named after the Mathematician Wilhelm
Killing), which is very important in Lie Theory.
Chapter 6. Positive Definite Matrices and SVD 6.1. Positive Definite Matrices
f (x, y) = xT Ay
Proof. Expanding the bilinear form in terms of the standard basis by linearity, we get
(A)ij = f (ei , ej )
With this terminology, the symmetry and positivity properties of the inner product imply that
⟨x, y⟩ = xT Ay
Example 6.4. The diagonal matrix A = diag(c1, . . . , cn) with all ci > 0 corresponds to the inner product on Rn

⟨x, y⟩ = c1 x1 y1 + · · · + cn xn yn

discussed in Example 4.2.
Example 6.5. Q(x) := \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} 9 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 9x² + y² is positive definite.
Example 6.6. Let A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}. Then Q(x) := xT Ax = 5x² + 8xy + 5y² is positive definite. We can see that its level sets are represented by ellipses as follows: we can diagonalize the matrix by A = PDPT where

D = \begin{pmatrix} 9 & 0 \\ 0 & 1 \end{pmatrix},   P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix}.

Then in the rotated coordinates x̂ = PT x, where PT = (P_{E←B})−1 = P_{B←E} since P is orthogonal,

Q(x̂) = 9x̂² + ŷ².
Theorem 6.5. A symmetric matrix A is positive (semi)definite if and only if λi > 0 (λi ≥ 0) for
all the eigenvalues of A.
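Theorem 6.5 gives a practical numerical test for positive definiteness. In the sketch below, `is_positive_definite` is our own helper name (not from the notes); it checks the sign of the eigenvalues returned by numpy's symmetric solver:

```python
import numpy as np

# Positive definiteness test via Theorem 6.5: a symmetric matrix is
# positive definite iff all its eigenvalues are strictly positive.
# eigvalsh assumes a symmetric input and returns real eigenvalues.
def is_positive_definite(A, tol=1e-12):
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

print(is_positive_definite(np.array([[5.0, 4.0], [4.0, 5.0]])))  # True: eigenvalues 9, 1
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # False: eigenvalues 3, -1
```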
gives V the structure of a Minkowski space, which is very important in the Theory of Relativity.
Since any real symmetric matrix can be diagonalized, we can always find a "square root" of A if it is positive (semi)definite: there exists a positive (semi)definite matrix B such that

B² = A.

We call B the square root of A, denoted by √A.
A = λI on Vλ, so B² = λI on Vλ.

Since B is positive (semi)definite, it can only have √λ ≥ 0 as an eigenvalue on Vλ.

Since B is diagonalizable, it must equal √λ I on Vλ.
As a consequence, we have
Corollary 6.9. If B commutes with a positive semidefinite matrix A, then it commutes with √A.
This is used in the construction of Singular Value Decomposition in the next section.
λ∥v∥² = v · (AT Av) = ∥Av∥² ≥ 0
Chapter 6. Positive Definite Matrices and SVD 6.2. Singular Value Decompositions
Rank of AT A ≤ m < n
Example 6.7. Let A = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} = P \begin{pmatrix} 9 & 0 \\ 0 & 1 \end{pmatrix} PT where P = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix}. Then

√A = P \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} PT = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

We observe that if we pick the polynomial p(t) = \frac{1}{4}t + \frac{3}{4} so that p(9) = 3 and p(1) = 1, then indeed we have

p(A) = \frac{1}{4}A + \frac{3}{4}I = √A.

Let B = \begin{pmatrix} 1 & −\frac{2}{5} \\ −2 & −\frac{11}{5} \end{pmatrix}. Then BT B = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}, therefore by the above |B| = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
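The eigendecomposition recipe for √A can be carried out directly with numpy; `sym_sqrt` below is our own helper name for this sketch:

```python
import numpy as np

# Square root of a positive semidefinite matrix via orthogonal
# diagonalization: A = P D P^T gives sqrt(A) = P sqrt(D) P^T.
def sym_sqrt(A):
    lam, P = np.linalg.eigh(A)   # symmetric input: real spectrum, orthogonal P
    return P @ np.diag(np.sqrt(lam)) @ P.T

A = np.array([[5.0, 4.0], [4.0, 5.0]])
B = sym_sqrt(A)
print(np.allclose(B, [[2.0, 1.0], [1.0, 2.0]]))  # True: matches the example above
print(np.allclose(B @ B, A))                     # True: B^2 = A
```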
Recall that if v is an eigenvector with Av = λv, the effect is “stretching by the factor λ” along the
direction v. We want to consider all such directions if possible, even for rectangular matrices.
The image of B under A in Rm can clearly be normalized and extended to an orthonormal basis B′ of the target space Rm.
More precisely, we want to find orthogonal matrices V ∈ O(n) (formed by B) and U ∈ O(m) (formed by B′) such that

Avi = σi ui  for 1 ≤ i ≤ r,   Avi = 0  for r + 1 ≤ i ≤ n,
or rewriting in terms of matrices:
AV = UΣ.
for some quasi-diagonal matrix Σ of size m × n and rank r ≤ m, n:

Σ = \begin{pmatrix} D & O \\ O & O \end{pmatrix}

where D = diag(σ1, . . . , σr) occupies the first r rows and r columns (the remaining blocks have n − r columns and m − r rows), with

σ1 ≥ σ2 ≥ · · · ≥ σr > 0.
U is an m × m orthogonal matrix.
V is an n × n orthogonal matrix.
is an orthogonal diagonalization.
Hence we take V = \begin{pmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{pmatrix} where the columns are given by the orthonormal eigenbasis {v1, . . . , vn} of AT A (which exists by the Spectral Theorem of Symmetric Matrices).
Taking transpose, this also implies the columns of U are the orthonormal eigenbasis of AAT .
Example 6.8. Let A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & −1 \end{pmatrix}. Then AT A = \begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} and it has eigenvalues

λ1 = 4, λ2 = 2, λ3 = 0.

Therefore

V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & −1 \\ 1 & 0 & 1 \\ 0 & \sqrt{2} & 0 \end{pmatrix}.

Also σ1 = \sqrt{λ1} = 2, σ2 = \sqrt{λ2} = \sqrt{2}. Therefore

Σ = \begin{pmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{pmatrix}.
Finally

u1 = \frac{Av1}{∥Av1∥} = \frac{Av1}{σ1} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix},   u2 = \frac{Av2}{∥Av2∥} = \frac{Av2}{σ2} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ −1 \end{pmatrix}.

Therefore

U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix}

and

A = UΣVT = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & −\frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \\ −\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \end{pmatrix}.
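This computation can be cross-checked with numpy's SVD. (numpy may return singular vectors differing by a sign from the hand computation, but the singular values and the reconstruction are unambiguous.)

```python
import numpy as np

# Cross-check of the SVD example above: the singular values of the 2 x 3
# matrix A are 2 and sqrt(2), and U Sigma V^T reconstructs A.
A = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, -1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(np.allclose(s, [2.0, np.sqrt(2.0)]))  # True
Sigma = np.zeros((2, 3))
Sigma[:2, :2] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))       # True
```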
Theorem 6.13. For any real square matrix A, we have the polar decomposition

A = PH

where P is orthogonal and H is positive semidefinite, in analogy with the polar form z = re^{iθ} of a complex number. This means that any linear transformation can be represented by scaling in some directions, followed by rotations and mirror reflections.
One useful application of SVD is the description of the bases of the fundamental subspaces.
Theorem 6.14. Let A be an m × n matrix with rank r, and let A = UΣVT be the SVD. Assume U = (u1 · · · um) and V = (v1 · · · vn). Then {u1, . . . , ur} is an orthonormal basis of Col A, {u_{r+1}, . . . , um} is an orthonormal basis of Null(AT), {v1, . . . , vr} is an orthonormal basis of Row A, and {v_{r+1}, . . . , vn} is an orthonormal basis of Null A.
This allows us to derive another application of SVD for the least square approximation which
works like the example from QR decomposition in Section 4.6.
Computation using SVD is usually more efficient because finding orthonormal eigenbasis for a
symmetric matrix is easy. But of course, it always depends on the specifics of the problem itself.
Definition 6.15. Let Ur = (u1 · · · ur), Vr = (v1 · · · vr) be the submatrices consisting of the first r columns of U and V. Then

A = \begin{pmatrix} Ur & ∗ \end{pmatrix} \begin{pmatrix} D & O \\ O & O \end{pmatrix} \begin{pmatrix} VrT \\ ∗ \end{pmatrix} = Ur D VrT.
The pseudoinverse of A is then defined by

A+ := Vr D−1 UrT.
Theorem 6.16. Given the equation Ax = b, the least square solution is given by

x̂ = A+ b = Vr D−1 UrT b.

Proof. Since Ax̂ = AA+ b = Proj_{Col A} b, Ax̂ is the closest point to b in Col A.
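Theorem 6.16 matches what numerical libraries do internally. The sketch below (with arbitrary illustrative data, not from the notes) compares A⁺b, computed by numpy's SVD-based `pinv`, against numpy's least-squares solver:

```python
import numpy as np

# Least squares via the pseudoinverse: x_hat = A^+ b should agree with
# the solution returned by lstsq.  A has full column rank here, so the
# least-square solution is unique.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])
x_hat = np.linalg.pinv(A) @ b                  # A^+ b = V_r D^{-1} U_r^T b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_ref))               # True
```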
CHAPTER 7
Complex Matrices
If the field is K = C, most of the results from Chapter 4 and Section 5.4 carry over to general inner product spaces with minor modifications.
From another point of view, most of the results for K = R in the previous chapters are just
the special cases of the results in K = C where we just ignore the complex conjugations.
Note. Another mainstream approach is to consider complexification of real vector spaces, see
Appendix C for more details.
Recall that a complex inner product (also called Hermitian inner product) satisfies
Remark. A function with two arguments which is linear in one and conjugate-linear in another is known
as a sesquilinear form, from the Latin prefix “sesqui-” meaning “one and a half”.
Note. In many physics books, the complex inner product is written with Dirac’s bra-ket notation:
⟨x|y⟩ := ⟨y, x⟩
With this notation, the inner product is conjugate-linear in the first argument instead.
Chapter 7. Complex Matrices 7.1. Adjoints
7.1 Adjoints
If T ∗ exists, it is unique.
The Gram–Schmidt Process still works for K = C with exactly the same formula. It implies that
An orthonormal basis always exists for complex finite dimensional inner product space.
Proof. (Existence.) Let {u1, . . . , un} be an orthonormal basis of V and fix w ∈ W. Define

v′ := c1 u1 + · · · + cn un ∈ V,  where  ci := \overline{⟨T ui, w⟩W} = ⟨w, T ui⟩W.

Hence by linearity, the map defined by T∗(w) := v′ satisfies the required condition.
(Uniqueness.)
Example 7.2. For an infinite dimensional example, consider the space of all complex 2π-periodic smooth functions similar to Example 4.10, with inner product

⟨f, g⟩ := \int_0^{2π} f(x)\overline{g(x)}\, dx.
Proposition 7.3. If S ∈ L (U, V ) and T, T ′ ∈ L (V, W ) such that their adjoints exist, then
(c · T)∗ = c̄ · T∗, ∀c ∈ C.

(T + T′)∗ = T∗ + T′∗.

(T∗)∗ = T.

(T ◦ S)∗ = S∗ ◦ T∗.
In the case of Cn with the standard dot product, the formula in the proof of Proposition 7.2 implies
that if T is represented by A, then T ∗ is represented by the conjugate transpose A∗ :
Definition 7.4. Let A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} be an m × n complex matrix.

Then the Hermitian adjoint is the n × m complex matrix given by

A∗ := \overline{A}^T = \begin{pmatrix} \overline{a_{11}} & \cdots & \overline{a_{m1}} \\ \vdots & \ddots & \vdots \\ \overline{a_{1n}} & \cdots & \overline{a_{mn}} \end{pmatrix}.

That is, we take the transpose and conjugate every matrix entry:

(A∗)_{ij} := \overline{a_{ji}}.
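In numpy the Hermitian adjoint is simply `A.conj().T`. The sketch below checks the defining property ⟨Ax, y⟩ = ⟨x, A∗y⟩ on arbitrary random complex data (note that `np.vdot` conjugates its first argument, so it realizes an inner product that is conjugate-linear in the first slot):

```python
import numpy as np

# Defining property of the Hermitian adjoint: <Ax, y> = <x, A* y>
# for the standard complex dot product, with A* = conjugate transpose.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
x = rng.standard_normal(2) + 1j * rng.standard_normal(2)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
A_star = A.conj().T                  # Hermitian adjoint: transpose + conjugate
lhs = np.vdot(A @ x, y)              # <Ax, y>
rhs = np.vdot(x, A_star @ y)         # <x, A* y>
print(np.allclose(lhs, rhs))         # True
```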
det(A∗) = \overline{det(A)}.

Proof. det(A∗) = det(\overline{A}^T) = det(\overline{A}) = \overline{det(A)}.
By Proposition 7.3,
T = T ∗.
Example 7.3. Taking the standard dot product on Rn , self-adjoint operators are represented by
symmetric matrices
AT = A.
Example 7.4. Taking the standard dot product on Cn , self-adjoint operators are represented by
Hermitian matrices
A∗ = A.
Example 7.5. The operator i\frac{d}{dx} in the setting of Example 7.2 is self-adjoint.
P² = P and P∗ = P.
Most results about symmetric operators from Section 5.4 carry over:
Chapter 7. Complex Matrices 7.2. Unitary Matrices
Proof. All the proofs are the same as in Section 5.4 with u · v replaced by ⟨u, v⟩.
U∗ U = I
U−1 = U∗ .
In×n ∈ U (n).
Definition 7.12. A is unitarily equivalent to B if there exists a unitary matrix U such that
A = UBU∗ .
Proposition 7.13. The determinant of a unitary matrix U is a complex number with norm 1:
| det(U)| = 1.
Theorem 7.15 (Schur’s Lemma). Any complex square matrix is unitarily equivalent to an
upper triangular matrix.
Note. Using the same proof, if A is a real matrix with real eigenvalues only, then it is orthogonally similar to a real upper triangular matrix.
Proof. The idea of the proof is very similar to that of Theorem 5.24.
By induction, if n = 1 it is trivial. Hence assume n > 1 and the statement is true for any
(n − 1) × (n − 1) matrices.
The matrix U′ with columns B is unitary, and we have the block form

U′∗ A U′ = \begin{pmatrix} λ & ∗ & \cdots & ∗ \\ 0 & & & \\ \vdots & & A_{n−1} & \\ 0 & & & \end{pmatrix}.
By induction, there exists unitary matrix Un−1 such that U∗n−1 An−1 Un−1 = Tn−1 is upper
triangular.
Then U′′ := \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & U_{n−1} & \\ 0 & & & \end{pmatrix} is also unitary, hence U = U′U′′ is unitary and

U∗ A U = \begin{pmatrix} λ & ∗ & \cdots & ∗ \\ 0 & & & \\ \vdots & & T_{n−1} & \\ 0 & & & \end{pmatrix}
is upper triangular.
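The inductive proof above translates almost line by line into code. The sketch below is a direct (unoptimized) implementation under our own helper name `schur_triangularize`: pick a unit eigenvector, extend it to a unitary matrix via QR, and recurse on the lower-right block.

```python
import numpy as np

# Sketch of the proof of Schur's Lemma: any complex square matrix is
# unitarily equivalent to an upper triangular matrix.
def schur_triangularize(A):
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex), A
    _, vecs = np.linalg.eig(A)
    v = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
    # QR on [v, e_1, ..., e_{n-1}] yields a unitary Q whose first column is ~v,
    # so Q* A Q has first column (lambda, 0, ..., 0).
    Q, _ = np.linalg.qr(np.column_stack([v, np.eye(n, dtype=complex)[:, :n - 1]]))
    B = Q.conj().T @ A @ Q
    U1, _ = schur_triangularize(B[1:, 1:])   # induction on the (n-1) x (n-1) block
    U2 = np.eye(n, dtype=complex)
    U2[1:, 1:] = U1
    U = Q @ U2
    return U, U.conj().T @ A @ U

A = np.array([[0.0, -1.0], [1.0, 0.0]])      # rotation: no real eigenvalues
U, T = schur_triangularize(A)
print(np.allclose(np.tril(T, -1), 0))            # True: T is upper triangular
print(np.allclose(U.conj().T @ U, np.eye(2)))    # True: U is unitary
```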
Finally, concerning similarity, it turns out that we do not need to care about the underlying fields.
Intuitively, an eigenvector u + iv ∈ Cn with real eigenvalues gives real eigenvectors u, v ∈ Rn .
Furthermore, if they are unitarily equivalent, say AU = UB for a unitary matrix U, then they are actually orthogonally similar. Write U = X + iY with real matrices X, Y; comparing real and imaginary parts,

AX = XB,  AY = YB.
Since U is invertible, det(X + iY) ̸= 0, so the polynomial q(λ) := det(X + λY) is nonzero.
Chapter 7. Complex Matrices 7.3. Hermitian Matrices
Then we have
AT U = UBT =⇒ AT R = RBT =⇒ RT A = BRT .
Consider polar decomposition R = PH where P is orthogonal and H is positive definite.
(Note: R is invertible =⇒ H is invertible.)
Since RT R = (PH)T (PH) = HPT PH = H2 , it follows that
BH2 = BRT R = RT AR = RT RB = H2 B.
Since H2 is also positive definite, by Corollary 6.9 BH = HB as well.
Hence
APH = AR = RB = PHB = PBH.
Since H is invertible, AP = PB for an orthogonal matrix P as desired.
Recall that A is Hermitian if A∗ = A. The exact same discussion as for symmetric matrices leads to the following:
Theorem 7.9 specializes to the Spectral Theorem of Hermitian Matrices similar to the one
of symmetric matrices (see Theorem 5.21, 5.23 and 5.24):
Conversely, if A is unitarily diagonalizable and all eigenvalues are real, then A is Hermitian.
Chapter 7. Complex Matrices 7.4. Normal matrices
Example 7.6. A = \begin{pmatrix} 1 & 1+i \\ 1−i & 2 \end{pmatrix} is Hermitian. It has eigenvalues λ = 3, 0 with eigenvectors \begin{pmatrix} 1+i \\ 2 \end{pmatrix}, \begin{pmatrix} −1−i \\ 1 \end{pmatrix} respectively. Normalizing, we have the diagonalization

\begin{pmatrix} 1 & 1+i \\ 1−i & 2 \end{pmatrix} = \begin{pmatrix} \frac{1+i}{\sqrt{6}} & \frac{−1−i}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1−i}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{−1+i}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}.
Note. If the eigenvalues are not real, A cannot be Hermitian. Therefore unitarily diagonalizable
does not imply Hermitian. We need another characterization of unitarily diagonalizable matrices.
T T ∗ = T ∗ T.
Symmetric and orthogonal matrices are normal when they are considered as complex matrices.
Proof.
so v is also an eigenvector of T ∗ .
Proof. A diagonal matrix is obviously normal. Now assume T is an n × n upper triangular normal matrix.
i.e. for dim V < ∞, V has an orthonormal eigenbasis of T ∈ L (V, V ) if and only if it is normal.
By direct calculation:
This also shows that over K = R, a real normal matrix (AAT = AT A) may not be real diagonalizable.
Example 7.9. Since real orthogonal matrices are unitary, their eigenvalues satisfy |λ| = 1.
Solving the characteristic equation, λ can only be 1, −1 or some conjugate pairs e±iθk .
By Theorem 7.16, this is actually an orthogonal similarity. Therefore any orthogonal matrix is
composed of some mirror reflections and rotations with respect to an orthonormal basis of Rn .
A similar argument shows that although a real normal matrix may not be real diagonalizable, it is always orthogonally similar to a real block diagonal matrix with 1 × 1 diagonal entries and 2 × 2 diagonal blocks of the form

\begin{pmatrix} a & b \\ −b & a \end{pmatrix}

for some a, b ∈ R.
K = R −→ K = C

Real inner product: u · v = u1 v1 + · · · + un vn (bilinear forms) −→ Complex inner product: u · v = u1 \overline{v1} + · · · + un \overline{vn} (sesquilinear forms)

SVD: A = P1 ΣP2T −→ SVD: A = U1 ΣU2∗
CHAPTER 8
Invariant Subspaces
In this chapter, we study the invariant subspaces of a linear transformation, which reveals more of
its structure through its characteristic polynomial.
T (U ) ⊂ U
i.e. u ∈ U =⇒ T (u) ∈ U .
Example 8.3. The space of polynomials R[t] is a \frac{d}{dt}-invariant subspace of C∞(R).
Chapter 8. Invariant Subspaces 8.1. Invariant Subspaces
Example 8.6. The only real invariant subspaces of a rotation \begin{pmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{pmatrix} are the trivial ones, unless θ is a multiple of π.
Example 8.7. In the proof of Theorem 5.24, we see that if A is symmetric and Vλ is an eigenspace,
then both Vλ and Vλ⊥ are invariant subspaces. This is a special case of Proposition 8.6 below.
The next example of invariant subspace is very important and deserves its own definition.
Uv is a T -invariant subspace.
Proof. If u ∈ Uv, then u = \sum_{k=0}^{N} a_k T^k(v) for some a_k ∈ K. Then

T(u) = \sum_{k=0}^{N} a_k T^{k+1}(v) ∈ Uv.
V = U1 ⊕ U2 ⊕ · · · ⊕ Uk .
We write
T = T1 ⊕ T2 ⊕ · · · ⊕ Tk .
Conversely, this means that V is a direct sum of T -invariant subspaces Ui = Dom(Ti ).
Proposition 8.5. Let dim V < ∞. Then with respect to a direct sum basis of V :
If V is an inner product space, we can say something about the adjoint as well.
U ⊥ is T ∗ -invariant.
If dim V < ∞ and T is normal, then U ⊥ is also T -invariant (so that U is also T ∗ -invariant).
Chapter 8. Invariant Subspaces 8.2. Cayley–Hamilton Theorem
Proof.
so that T ∗ (w) ∈ U ⊥ .
AA∗ + BB∗ = A∗ A.
Note that the diagonal entry (BB∗)ii is just ∥ri∥² where ri is the i-th row of B.

Since trace is linear, Tr(BB∗) = 0, so that ∥ri∥ = 0 for all i, i.e. B = O is the zero matrix.
For the rest of the chapter, let dim V = n over K be finite dimensional, and let A ∈ Mn×n (K) be
a matrix representing T ∈ L (V, V ).
and it does not depend on the choice of A (matrices representing T for different bases are similar).
Proposition 8.7. If U ⊂ V is a T -invariant subspace, then the characteristic polynomial p|U (λ)
of T |U divides p(λ). i.e.
p(λ) = p|U (λ)q(λ)
for some polynomial q(λ).
Proof. By Proposition 8.5 (2), represent T by a block triangular matrix, and apply Corollary 3.8.
Theorem 8.8 (Cayley–Hamilton). Let p(λ) be the characteristic polynomial of T. Then

p(T) = O.
Proof. Let v ∈ V be any vector. Let U = Uv be the cyclic subspace of T , which is T -invariant.
Chapter 8. Invariant Subspaces 8.3. Minimal Polynomials
By the Cayley–Hamilton Theorem, the characteristic polynomial satisfies p(T) = O.
Although this polynomial tells us about the eigenvalues (and their multiplicities), it is sometimes
too “big” to tell us information about the structure of the linear map.
Definition 8.9. The minimal polynomial m(λ) is the unique polynomial such that
m(T ) = O
with leading coefficient 1, and has the smallest degree among such polynomials.
The condition “leading coefficient equals 1” also means we exclude the case of zero polynomial.
Since p(T ) = O, a minimal polynomial must exist.
Note. Since m(λ) is defined in terms of T only, it is the same for any matrix representing T .
deg(m) ≤ deg(p) = n.
Example 8.9. The diagonal matrix A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} has characteristic polynomial

p(λ) = (2 − λ)³

but, since A − 2I = O, its minimal polynomial is

m(λ) = λ − 2.
In particular,
Example 8.10. The diagonal matrix A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} has characteristic polynomial

p(λ) = (1 − λ)(2 − λ)².

Since A is not a multiple of I, m(λ) has degree at least 2. Since (A − I)(A − 2I) = O, the polynomial

m(λ) = (λ − 1)(λ − 2)

is the minimal polynomial.
Example 8.11. The matrix A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} has characteristic polynomial

p(λ) = (1 − λ)³

and it turns out that the minimal polynomial is (up to a sign) the same:

m(λ) = (λ − 1)³.
Proposition 8.10. If p(λ) is any polynomial such that p(T ) = O, then m(λ) divides p(λ), i.e.
p(λ) = m(λ)q(λ)
Proof. By polynomial division, write p(λ) = m(λ)q(λ) + r(λ) with deg(r) < deg(m). Then r(T) = p(T) − m(T)q(T) = O. But since deg(r) < deg(m) and m is minimal, r must be the zero polynomial.
Proposition 8.11. The set of roots of m(λ) consists of all the eigenvalues of T.

Proof. Since m(λ) divides the characteristic polynomial p(λ), it only has eigenvalues as roots. Conversely, if µ is an eigenvalue with eigenvector v ≠ 0, then

0 = m(T)v = m(µ)v,

so m(µ) = 0, i.e. every eigenvalue is a root of m(λ).
Suppose p(T) = O where p(λ) = p1(λ) · · · pk(λ), and pi(λ) and pj(λ) are relatively prime (i.e. have no common factors) for i ≠ j. Then

V = Ker p1(T) ⊕ · · · ⊕ Ker pk(T).
Proof. The case k = 1 is trivial, while the general case follows directly by induction.
Hence we only need to prove the case k = 2.
By the Euclidean algorithm, there exist polynomials q1(λ), q2(λ) such that

p1(λ)q1(λ) + p2(λ)q2(λ) = 1,

i.e.

I = p1(T)q1(T) + p2(T)q2(T).
Theorem 8.13. T is diagonalizable if and only if m(λ) only has distinct linear factors.
Proof.
V = Ker(T − λ1 I ) ⊕ · · · ⊕ Ker(T − λk I ).
But Ker(T − λi I ) = Vλi is just the eigenspace of T . So we have decomposed V into direct
sums of eigenspace (in particular V has an eigenbasis), hence T is diagonalizable.
(=⇒) If T is diagonalizable, let {u1 , ..., un } be an eigenbasis of V with distinct eigenvalues µ1 , ..., µk .
Then m(λ) = (λ − µ1 ) · · · (λ − µk ) is clearly the smallest polynomial containing all µi as roots,
and
m(T )ui = m(λi )ui = 0
since λi = µj for some j. So m(λ) is the minimal polynomial by Proposition 8.11.
Using this result, minimal polynomials allow us to determine whether a matrix is diagonalizable or
not without even calculating the eigenspaces!
Example 8.12. The matrix A = \begin{pmatrix} −1 & 1 \\ −4 & 3 \end{pmatrix} has characteristic polynomial p(λ) = (λ − 1)². Since m(λ) ≠ λ − 1 because A ≠ I, we must have m(λ) = (λ − 1)², hence A is not diagonalizable.
Chapter 8. Invariant Subspaces 8.4. Spectral Theorem of Commuting Operators
Example 8.13. The matrix A = \begin{pmatrix} −1 & 1 & 0 \\ −4 & 3 & 0 \\ −1 & 0 & 0 \end{pmatrix} has characteristic polynomial p(λ) = −λ(λ − 1)², hence it has eigenvalues λ = 1 and λ = 0. The minimal polynomial can only be λ(λ − 1) or λ(λ − 1)². Since

A(A − I) ≠ O,

the minimal polynomial must be m(λ) = λ(λ − 1)², hence A is not diagonalizable.
Example 8.14. The matrix A = \begin{pmatrix} 2 & −2 & 2 \\ 0 & −2 & 4 \\ 0 & −2 & 4 \end{pmatrix} has characteristic polynomial p(λ) = −λ(λ − 2)², hence it has eigenvalues λ = 2 and λ = 0. The minimal polynomial can only be λ(λ − 2) or λ(λ − 2)². Since

A(A − 2I) = O,

the minimal polynomial is m(λ) = λ(λ − 2), hence A is diagonalizable.
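The factor-multiplication test in Examples 8.12 and 8.14 is a one-liner numerically: just multiply out the candidate minimal polynomial and check whether it annihilates the matrix.

```python
import numpy as np

# Minimal-polynomial test for diagonalizability (Theorem 8.13) applied to
# the matrices of Examples 8.12 and 8.14.
A = np.array([[-1.0, 1.0], [-4.0, 3.0]])                  # Example 8.12
I2 = np.eye(2)
print(np.allclose(A - I2, 0))               # False: m(lambda) is not lambda - 1
print(np.allclose((A - I2) @ (A - I2), 0))  # True: m = (lambda - 1)^2, not diagonalizable

B = np.array([[2.0, -2.0, 2.0], [0.0, -2.0, 4.0], [0.0, -2.0, 4.0]])  # Example 8.14
I3 = np.eye(3)
print(np.allclose(B @ (B - 2 * I3), 0))     # True: m = lambda(lambda - 2), diagonalizable
```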
One of the most important results in Linear Algebra says that commuting linear operators can be
simultaneously diagonalized, i.e. they can be diagonalized at the same time by a single basis.
Proof.
If T is diagonalizable, m(λ) only has distinct linear factors. T |U satisfies m(T |U ) = 0 also, so
m|U (λ) divides m(λ) by Proposition 8.10, and m|U (λ) also only has distinct linear factors.
Theorem 8.15 (Spectral Theorem of Commuting Operators). Let {Ti }ki=1 be a finite set of
diagonalizable linear transformations in L (V, V ). Then
Ti Tj = Tj Ti , ∀i, j
if and only if they can be simultaneously diagonalized, i.e. there exists a single basis B of V
such that they are eigenvectors for all Ti .
Proof. Assume the Ti ’s commute. We proceed by induction, where the k = 1 case is trivial.
Ti |Vλj , i = 1, ..., k − 1
But Vλj is eigenspace of Tk . Therefore this is a simultaneous eigenbasis for all Ti for i = 1, ..., k.
Collecting all the basis vectors for different Vλj gives us the basis B.
On the other hand, if they can be simultaneously diagonalized, let {u1 , ..., un } be a common
eigenbasis. Then
Ti Tj uk = λTi uk = λλ′ uk
Tj Ti uk = λ′ Tj uk = λ′ λuk
Corollary 8.16. The Spectral Theorem of Commuting Operators also holds for an infinite collec-
tion {Ti } of commuting operators.
Proof. Since L (V, V ) is finite dimensional, every Ti is a linear combination of a finite basis of
operators. Hence we just need to apply the Spectral Theorem to the finite set.
CHAPTER 9
Canonical Forms
We are now ready to construct the canonical forms of a matrix. This completely determines the
structure of a given matrix. It is also the best approximation to diagonalization if the matrix is
not diagonalizable.
Ker(T − λI )m , λ ∈ C, m ∈ N
Chapter 9. Canonical Forms 9.1. Nilpotent Operators
A linear operator T is nilpotent if

T^m = O

for some m ∈ N.
Example 9.1. Any upper triangular matrix with 0’s on the diagonal is nilpotent.
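Example 9.1 is easy to witness numerically; the 3 × 3 matrix below is an arbitrary instance, with N³ = O but N² ≠ O:

```python
import numpy as np

# A strictly upper triangular matrix (0's on the diagonal) is nilpotent:
# here N^3 = O while N^2 != O, so the exponent is exactly 3.
N = np.array([[0.0, 1.0, 5.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
print(np.allclose(np.linalg.matrix_power(N, 3), 0))   # True
print(np.allclose(np.linalg.matrix_power(N, 2), 0))   # False
```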
Let V = Ker(T − λI )m where m is the smallest possible (i.e. the exponent of m(λ)).
dim(Uv) = k with basis {S^{k−1}v, . . . , Sv, v}, called a Jordan chain of size k.
The following is our main result, which shows that V can be decomposed into cyclic subspaces.
Theorem 9.5. V admits a basis of Jordan chains, i.e. a decomposition

V = Uv1 ⊕ · · · ⊕ UvN

where T|_{Uvi} is represented by the Jordan block J_λ^{(ki)}.
We add vi to our basis, and show that the collection of Jordan chains

Bv = \bigcup_{i=1}^{N} {S^{ki} vi, . . . , Svi, vi}

is linearly independent:

– If \sum_{i,j} a_{ij} S^j vi = 0, applying S gives us a span of basis vectors in Bu, hence all a_{ij} = 0 except possibly when j = ki, where the terms are killed by S. Rename ai = a_{i,ki}.

– We are left with \sum_i ai S^{ki} vi = 0, which is \sum_i ai S^{ki−1} ui = 0, again a span of basis vectors in Bu. Hence all ai = 0.
Finally we extend Bv to a basis B of V by possibly adding some vectors w1 , ..., wk ∈ V .
However, this is not a Jordan chain yet.
Swi ∈ Im(S) = Span(Bu ). But any vector from Bu is obtained from Bv by applying S. Hence
there exists w′i ∈ Span(Bv ) such that
Swi = Sw′i .
Chapter 9. Canonical Forms 9.2. Jordan Canonical Form
Combining all general eigenspaces, we can now state the main theorem:
Theorem 9.7 (Jordan Canonical Form). There exists a basis of V such that T is represented by

J := J_{λ1}^{(k1)} ⊕ · · · ⊕ J_{λN}^{(kN)}

where the λi consist of all the eigenvalues of T. (λi with different indices may repeat!)
Since eigenvalues, characteristic polynomials, minimal polynomials, and multiplicity etc. are all the
same for similar matrices, if we can determine the Jordan block from these data, we can determine
the Jordan Canonical Form of a matrix A.
Notation. From now on we normalize characteristic polynomial using p(λ) = det(λI − T ) instead.
Example 9.2. The Jordan block J_{λ1}^{(k)} has

characteristic polynomial (λ − λ1)^k,

minimal polynomial (λ − λ1)^k,

geometric multiplicity of λ1 equal to 1.
Now we can do the same analysis by combining different Jordan blocks and obtain:
The uniqueness of Jordan Canonical Form says that A is also similar to the matrix where the Jordan blocks are in a different order. For example we can have:

A ∼ \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} ⊕ (2) ⊕ \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix} ⊕ (2).

This is simply obtained by permuting the basis.
It turns out that when the matrix is bigger than 6 × 6, sometimes we cannot determine the
Jordan Canonical Form just by knowing p(λ), m(λ) and the dimension of the eigenspaces only:
Example 9.6. Consider a 7 × 7 matrix A. Let p(λ) = λ7 , m(λ) = λ3 , and dim V0 = 3. Then A
has 3 blocks and the largest block has size 3. So it may be similar to
J_0^{(3)} ⊕ J_0^{(3)} ⊕ J_0^{(1)}   or   J_0^{(3)} ⊕ J_0^{(2)} ⊕ J_0^{(2)}.
However, by the uniqueness of Jordan Canonical Form, we know that these two are not similar to
each other, but we cannot tell which one is similar to A just from the given information.
Example 9.7. If each eigenvalue corresponds to a unique block, we can find the basis P such that
A = PJP−1 by the following:
(1) For each eigenvector v1 = v with eigenvalue λ, solve for vi such that (T − λI )vi = vi−1 until
no solutions can be found.
(2) The collection {v1, v2, . . . , vk} will be the basis corresponding to a Jordan block J_λ^{(k)}.
However this method does not work in general if we have multiple blocks of the same eigenvalues.
In this case we need to find the basis as Jordan chains directly, following the proof of Theorem 9.5.
Chapter 9. Canonical Forms 9.3. Rational Canonical Form
Note that the Jordan Canonical Form is constructed only with a given minimal polynomial, which always exists since dim L(V, V) = n² (so that {I, T, T², · · · , T^{n²}} must be linearly dependent).
Therefore it allows us to prove many previous results that work for similar matrices.
A is similar to AT .
Proof. We illustrate the proof of the last statement. Write A in terms of the Jordan Canonical
Form J.
(k)
Each k × k block Jλ is similar to its transpose by the permutation matrix
0 ··· 1
. .. −1
Sk = .. 1
. = Sk .
1 ··· 0
Combining different blocks, we conclude that J = SJT S−1 for some S, hence
A ∼ J ∼ JT ∼ AT .
Formally speaking,
Jordan Canonical Form writes V as a direct sum of as many cyclic subspaces as possible.
Rational Canonical Form writes V as a direct sum of as few cyclic subspaces as possible.
Then from the proof of Cayley–Hamilton Theorem, we have already deduced that:
Proposition 9.10. Let B := {v, T v, . . . , T^{r−1} v} be a basis of Uv. Then T|_{Uv} is represented by the matrix

C(g) := \begin{pmatrix} 0 & 0 & \cdots & 0 & −a_0 \\ 1 & 0 & \cdots & 0 & −a_1 \\ 0 & 1 & 0 & \cdots & −a_2 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & −a_{r−1} \end{pmatrix}

called the companion matrix of g(λ) where

g(λ) = λ^r + a_{r−1}λ^{r−1} + · · · + a_1 λ + a_0.

Note that the characteristic and minimal polynomial of T|_{Uv} must be the same, since the minimal polynomial must have degree r; otherwise B cannot be linearly independent.
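The companion matrix shape is easy to build and verify; `companion` below is our own helper name, tested against the invariant factor g3(λ) = λ³ − 3λ² + 2λ from the example at the end of this section:

```python
import numpy as np

# Companion matrix of g(lambda) = lambda^r + a_{r-1} lambda^{r-1} + ... + a_0,
# built as in Proposition 9.10: 1's below the diagonal, -a_i in the last column.
def companion(a):
    """a = [a_0, a_1, ..., a_{r-1}]."""
    r = len(a)
    C = np.zeros((r, r))
    C[1:, :-1] = np.eye(r - 1)
    C[:, -1] = -np.asarray(a)
    return C

# g(lambda) = lambda^3 - 3 lambda^2 + 2 lambda, i.e. a_0 = 0, a_1 = 2, a_2 = -3.
C = companion([0.0, 2.0, -3.0])
# np.poly returns characteristic polynomial coefficients, highest degree first:
print(np.allclose(np.poly(C), [1.0, -3.0, 2.0, 0.0]))   # True: recovers g
```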
It turns out that by picking these cyclic vectors one by one "smartly", one can decompose V into cyclic subspaces, where the factors g(λ) of p(λ), called the invariant factors, obey certain conditions. The "largest factor" will be the minimal polynomial m(λ).
Note that all the distinct irreducible factors of p(λ) should be a factor of m(λ) since m(λ) contains
all the (complex) eigenvalues as roots.
Theorem 9.11 (Rational Canonical Form). V can be decomposed into T -invariant cyclic
subspaces
V = Uv1 ⊕ · · · ⊕ Uvk
such that if gi (λ) is the characteristic and minimal polynomial of T |Uvi , then they satisfy
gi (λ) divides gi+1 (λ),
p(λ) = g1 (λ) · · · gk (λ),
m(λ) = gk (λ).
The collection of invariant factors {g1 (λ), ..., gk (λ)} is uniquely determined by T .
Both the Jordan and rational canonical forms are special cases of the “Structure Theorem for
Finitely Generated Modules over a Principal Ideal Domain”, belonging to the branch of
module theory in advanced abstract algebra.
Nevertheless, for completeness, I tried to translate the proof of Theorem 9.11 purely in terms of linear algebra; it is given in Appendix D for the advanced students who are interested.
By expanding, we have
g1 (λ) = λ
g2 (λ) = λ(λ − 2) = λ2 − 2λ
g3 (λ) = λ(λ − 1)(λ − 2) = λ3 − 3λ2 + 2λ
g4 (λ) = λ(λ − 1)2 (λ − 2) = λ4 − 4λ3 + 5λ2 − 2λ.
CHAPTER 10
Quotient and Dual Spaces
When we integrate, two functions that differ at finitely many points give us the same integral, i.e. we don't care about functions that differ at finitely many points; we only consider functions "up to finitely many changes in values" (more generally, up to a measure zero set).
This means that we often only consider equivalence classes of objects sharing the same properties.
In Linear Algebra, such situation happens usually when we consider direct sum:
V = U ⊕ W.
By uniqueness of direct sum decomposition, sometimes we don’t care about the U part, and only
want to focus our attention on the W part. However there are many choices for W , so just taking
any W coordinates is not a well-defined process. If V is an inner product space, then we have the
orthogonal complement U⊥. But in general we do not have such a canonical choice.
Chapter 10. Quotient and Dual Spaces 10.1. Quotient Spaces
Still, we know that whatever W we chose, they will have the same dimension. Therefore we want
to define canonically a vector space that behaves just like W for any choice of W .
To do the “ignore the U part” procedure, we use the same idea as above, i.e. by considering two
vectors to be equivalent if they differ by an element in U . We give the following definition:
v ∼U w ⇐⇒ v − w ∈ U.
For any w ∈ V, we denote the equivalence class by w̄, i.e. the subset of V given by

w̄ := {v ∈ V : v − w ∈ U} = w + U,

so that w̄0 = w̄ for any representative w0 ∈ w̄. Note that

w̄1 = w̄2 ⇐⇒ w1 + U = w2 + U ⇐⇒ w1 − w2 ∈ U.
V /U := V / ∼U .
It turns out we can define addition and scalar multiplication on V /U , making it a vector space:
v̄ + w̄ := \overline{v + w},  v, w ∈ V.

c · v̄ := \overline{c · v},  c ∈ K, v ∈ V.
The formulas rely on representatives v, w of v̄ and w̄, and apply the operations in the original vector space. But as we know, an equivalence class may have different representatives: say

v̄′ = v̄,  w̄′ = w̄.

Therefore, to see that the operations are well-defined, we need to check that

\overline{v′ + w′} = \overline{v + w},   \overline{c · v′} = \overline{c · v}.

This holds since

(v′ + w′) − (v + w) = (v′ − v) + (w′ − w) ∈ U

and

(c · v′) − (c · v) = c · (v′ − v) ∈ U.
Hence both sides define the same equivalence classes.

Since the operations are now well-defined, the vector space axioms of V/U follow from the vector space axioms of V, e.g.

v̄ + w̄ = \overline{v + w} = \overline{w + v} = w̄ + v̄

and so on.
π : V −→ V/U,  v ↦ v̄
Conversely, if T is a surjective linear map, then two vectors map to the same point exactly when they differ by an element of Ker(T). So we can identify Im(T) with the vectors in V "up to Ker(T)".
V/Ker(T) ≃ Im(T),  v̄ ↦ T(v).
Proof. There are several things to check, which are all straightforward.
since v′ − v ∈ Ker(T ).
ū + c · v̄ = \overline{u + c · v} ↦ T(u + c · v) = T(u) + c · T(v).
Remark. As the name suggests, there are in fact also Second, Third and Fourth Isomorphism Theorems. Also, the construction of quotients and the Isomorphism Theorems work as well for groups, rings, modules, fields, topological spaces etc.
This motivates Category Theory, which studies the common features of all mathematical objects in a
universal setting, including the universal properties below.
Proposition. If V = U ⊕ W , then W ≃ V /U via w 7→ w̄.
Remark. For integers, q = a/b is a quotient if a = bq. Recall that as a set U ⊕ W ≃ U × W . Hence
V = U ⊕ W ≃ U × V /U.
Since dim U + dim W = dim V , we have (also true for infinite dimensional spaces)
dim V = dim U + dim V /U.
Combining with the First Isomorphism Theorem, this gives another proof of the Rank–Nullity Theorem.
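The dimension count dim V = dim Ker(T ) + dim Im(T ) can be checked numerically. The following Python sketch is my own illustration (the 3×4 matrix is made up, with a dependent third row): it computes rank by row reduction over Q.

```python
from fractions import Fraction as F

def rank(rows):
    """Row-reduce a matrix of Fractions and count the pivots."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# T : Q^4 -> Q^3; the third row equals the sum of the first two, so rank 2.
A = [[F(1), F(2), F(0), F(1)],
     [F(0), F(1), F(1), F(0)],
     [F(1), F(3), F(1), F(1)]]
n = 4
rk = rank(A)
nullity = n - rk
assert rk == 2
assert rk + nullity == n     # dim Im(T) + dim Ker(T) = dim V
```

Here dim V /Ker(T ) = n − nullity = rk, matching the First Isomorphism Theorem.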
If V is an inner product space, write v = vU + v⊥ with vU ∈ U and v⊥ ∈ U ⊥ . Then V /U inherits
an inner product from V by
⟨v̄, w̄⟩ := ⟨v⊥ , w⊥ ⟩.
Note. Although U ⊥ and V /U look the same (isomorphic), they are different vector spaces:
U ⊥ is a subspace of V .
V /U is not a subspace of V .
We encountered this Example very early on, in Non-Example 1.26:
Example 10.1. If U = {(0, 0, z) : z ∈ R} ⊂ R3 , then U ⊥ = {(x, y, 0) : x ∈ R, y ∈ R} ⊂ R3 is a
subspace, but R3 /U ≃ R2 is not a subspace.
Example 10.2. Let V := {(a1 , a2 , a3 , ...) : ai ∈ K} be the vector space of infinite sequences, and let
U := {(0, a2 , a3 , ...) : ai ∈ K} ⊂ V.
Intuitively, V /U means “ignoring all the terms except the first one”.
Note that V /U is not the same as the subspace
W := {(a1 , 0, 0, ...) : a1 ∈ K} ⊂ V
although V = U ⊕ W and W ≃ V /U ≃ K.
This Example also shows that dim V /U can be finite even though dim V = dim U = ∞.
Example 10.3. Another useful example comes from analysis. Let R([0, 1]) be the space of real-valued
Riemann integrable functions on [0, 1]. Then the bilinear form
⟨f, g⟩ := ∫₀¹ f (x)g(x) dx
is symmetric and positive semi-definite, but it is not an inner product: there exist nonzero functions
f with ⟨f, f ⟩ = 0, e.g. functions vanishing outside finitely many points.
However, if we “ignore” all these functions, namely if we consider the quotient space
V := R([0, 1])/Z([0, 1])
where
Z([0, 1]) := {f ∈ R([0, 1]) : ∫₀¹ |f (x)|² dx = 0},
then the bilinear form becomes an inner product on V .
Remark. However, for this to be a Hilbert space, we need Lebesgue integrable functions instead, so
that the L²-completeness condition of a Hilbert space is satisfied.
Quotient spaces are very useful for induction on dimension: if we know information about a proper
subspace U ⊊ V and about V /U , both of smaller dimension, we can construct relevant objects on V .
One common trick is illustrated in the proof of the following version of the Basis Extension
Theorem for quotients.
Theorem 10.8. Let dim V < ∞ and U ⊂ V be a subspace with {u1 , ..., ur } a basis of U . If
{v̄1 , ..., v̄m } is a basis of V /U , then {u1 , ..., ur , v1 , ..., vm } is a basis of V .
Conversely, if {u1 , ..., ur , v1 , ..., vm } is a basis of V , then {v̄1 , ..., v̄m } is a basis for V /U .
Proof. By the Dimension Formula, dim V = r + m. So we only need to check that the given sets are
linearly independent. This follows from
d1 v̄1 + · · · + dm v̄m = 0̄
⇐⇒ d1 v1 + · · · + dm vm ∈ U
⇐⇒ c1 u1 + · · · + cr ur + d1 v1 + · · · + dm vm = 0 for some ci ∈ K.
Finally, we look at linear maps, which completes the description of Proposition 8.5.
Proposition. If T ∈ L (V, W ) satisfies T (U ) ⊂ U ′ for subspaces U ⊂ V and U ′ ⊂ W , then T
induces a well-defined linear map
T̄ : V /U −→ W/U ′
v̄ 7→ T (v) + U ′.
In particular, if U is T -invariant for T ∈ L (V, V ), we obtain the induced map
T̄ : V /U −→ V /U.
Intuitively speaking, taking quotient by U means “killing” all the contributions of vectors from U .
Proof. Again we need to check that the map is well-defined, i.e. if v′ is another representative of v̄,
then T (v′) and T (v) define the same class in W/U ′ . This holds since T (v′) − T (v) = T (v′ − v) ∈ U ′.
The basis constructed by Theorem 10.8 gives A the required upper triangular block form.
Proposition (Universal Property). Let T ∈ L (V, W ) and U ⊂ V a subspace.
Then U ⊂ Ker(T ) if and only if there exists a unique T̄ ∈ L (V /U, W ) such that
T = T̄ ◦ π,
i.e. the following diagram commutes:

    V ──T──► W
    π↓     ↗ T̄
    V /U
Finally, we illustrate the use of quotient spaces by rewriting the proof of Schur’s Lemma:
Any T ∈ L (V, V ) with dimC V < ∞ can be represented by an upper triangular matrix.
Chapter 10. Quotient and Dual Spaces 10.2. Dual Spaces
In mathematics, sometimes two seemingly different concepts have exactly the same mathematical
structures. This is known as duality. To give some examples:
In R3 , the concepts of straight lines and planes are dual to each other. Both are specified by
two pieces of data:
– Line: a point and a direction vector.
– Plane: a point and a normal vector.
In fact, given a normal vector n ∈ R3 , any plane is specified by an equation of the form
n · x = c, x ∈ R3 , c ∈ R.
As a special case, the set of “all lines through 0”, and the set of “all planes containing 0”,
have exactly the same geometry known as projective spaces.
In R2 , the concepts of points and straight lines are dual to each other.
– Intersection of 2 lines ←→ a line joining 2 points.
– Concurrent lines (intersecting at 1 point) ←→ collinear points (lying on 1 line).
The study of the combinatorics of lines and points is a special case of incidence geometry.
In multivariable calculus, interpreting with differential forms, the gradient and divergence
operations are dual to each other:
f 7→ ∇f : scalar function −→ vector field,
F 7→ ∇ · F : vector field −→ scalar function,
while the curl operation F 7→ ∇ × F maps vector fields to vector fields.
Let V be the space of polynomials of degree at most n, and for a ∈ R let Eva : p 7→ p(a) be the
evaluation map. It is easy to check that the set of evaluation maps {Ev0 , Ev1 , ..., Evn } forms a
basis of L (V, R).
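One can check the independence of the evaluation maps in coordinates: with respect to the monomial basis, Eva is the row (1, a, ..., aⁿ), so the claim amounts to the invertibility of a Vandermonde matrix. A small Python sketch (my own illustration, with n = 3 and nodes 0, 1, 2, 3):

```python
from fractions import Fraction as F

def det(m):
    """Determinant by cofactor expansion; fine for tiny matrices."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

# Row a of the Vandermonde matrix is the functional Ev_a = (1, a, a^2, ..., a^n)
n = 3
vandermonde = [[F(a) ** k for k in range(n + 1)] for a in range(n + 1)]
assert det(vandermonde) != 0    # the n+1 evaluation maps are linearly independent
```

Since the matrix of the n + 1 functionals is invertible, they form a basis of the (n + 1)-dimensional space L (V, R).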
The last example motivates the definition of dual space in Linear Algebra.
Definition. The dual space of V is the vector space of linear functionals
V ∗ := L (V, K).
More generally, if dim V = n with basis B = {u1 , ..., un }, then any ϕ ∈ V ∗ is represented by the
row vector
[ϕ]B = (ϕ(u1 ) · · · ϕ(un ))
with respect to B.
Since any element in L (V, K) is determined by its image on the basis vectors, we have
Proposition 10.12. Let dim V = n and B = {u1 , ..., un } be a basis of V . Then the linear
functionals u∗i ∈ V ∗ defined by
u∗i (uj ) := 1 if i = j, and 0 if i ̸= j, for i, j = 1, ..., n,
form a basis of V ∗ , called the dual basis of B. In particular,
dim V ∗ = dim V.
Note. For any v ∈ V , if [v]B = (c1 , ..., cn ) ∈ K n is the coordinate vector with respect to B, then
u∗i (v) = ci .
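Concretely, if the basis vectors are the columns of a matrix M , the dual basis functionals are the rows of M ⁻¹. A Python sketch, not from the notes, with the made-up basis u1 = (1, 1), u2 = (1, −1) of Q²:

```python
from fractions import Fraction as F

# Basis vectors as columns of M; dual basis functionals as rows of M^{-1}.
u1, u2 = (F(1), F(1)), (F(1), F(-1))
M = [[u1[0], u2[0]], [u1[1], u2[1]]]
d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv = [[ M[1][1] / d, -M[0][1] / d],
        [-M[1][0] / d,  M[0][0] / d]]

def dual(i, v):
    """The functional u*_i applied to v."""
    return Minv[i][0] * v[0] + Minv[i][1] * v[1]

# duality relations u*_i(u_j) = delta_ij
assert dual(0, u1) == 1 and dual(0, u2) == 0
assert dual(1, u2) == 1 and dual(1, u1) == 0
# coordinates: for v = 2*u1 + 3*u2 we get u*_0(v) = 2 and u*_1(v) = 3
v = (2 * u1[0] + 3 * u2[0], 2 * u1[1] + 3 * u2[1])
assert dual(0, v) == 2 and dual(1, v) == 3
```

This is exactly the Note above: u∗i picks out the i-th coordinate with respect to B.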
Since the dimensions are the same, we have an isomorphism V ≃ V ∗ , but different bases of V give
different isomorphisms. However, the situation gets better if V is a (real) inner product space, since
we can obtain the coordinates by taking inner products (Proposition 4.13).
Theorem. Let V be a finite dimensional real inner product space. Then
V ≃ V ∗
v 7→ v∗
is an isomorphism, where v∗ (u) := ⟨u, v⟩ for u ∈ V .
Notation. The notation is consistent with the adjoint (Definition 7.4) when K = R.
Remark. When K = C, the situation is more complicated since the above map is conjugate-linear.
We would have to insert complex conjugates here and there, which is quite troublesome, so we omit it.
Proof. Since dim V = dim V ∗ , we only need to check injectivity of the map.
If v∗ is the zero functional, i.e. ⟨u, v⟩ = 0 for all u ∈ V , then ⟨v, v⟩ = 0 =⇒ v = 0.
Example 10.5. Any bilinear form f (u, v) on V also defines a linear map
V −→ V ∗
v 7→ v∗
where for any u ∈ V , v∗ (u) := f (u, v). In general, if f (u, v) is non-degenerate, i.e.
f (u, v) = 0 ∀u ∈ V =⇒ v = 0
then the map V −→ V ∗ is injective (and hence an isomorphism if dim V < ∞).
f (x) 7→ f (0)
Definition. For T ∈ L (V, W ), the dual map is
T ∗ : W ∗ −→ V ∗
ϕ 7→ ϕ ◦ T.
Again the notation is compatible with the adjoint (i.e. transpose) defined in Chapter 7:
Rank of T = Rank of T ∗ .
T injective ⇐⇒ T ∗ surjective.
T surjective ⇐⇒ T ∗ injective.
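The equality rank T = rank T ∗ is the familiar “row rank = column rank” of matrices, since T ∗ is represented by the transpose. A quick Python check with a made-up rank-2 matrix (illustrative only):

```python
from fractions import Fraction as F

def rank(rows):
    """Rank via row reduction over Q (no normalization needed)."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[F(1), F(2), F(3)],
     [F(2), F(4), F(6)],              # twice the first row
     [F(0), F(1), F(1)]]
At = [list(col) for col in zip(*A)]   # the matrix of T*
assert rank(A) == rank(At) == 2
```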
If dim V < ∞ (no inner product needed), the double dual V ∗∗ is naturally identified with V itself:
Theorem 10.17. If dim V < ∞ over any field K, then there is a canonical isomorphism
V ≃ V ∗∗
v 7→ v∗∗ , where v∗∗ (ϕ) := ϕ(v) for ϕ ∈ V ∗ .
Proof. Since dim V = dim V ∗ = dim V ∗∗ , we just need to check that the map is injective.
If v∗∗ is the zero functional on V ∗ , then v∗∗ (ϕ) = ϕ(v) = 0 for any ϕ ∈ V ∗ .
By Proposition 10.12, evaluating against a dual basis shows that the coordinates of v are all 0,
hence v = 0.
Finally, the notions of subspaces and quotients are dual to each other:
Theorem. (V /U )∗ ≃ U 0 ⊂ V ∗ , where
U 0 := {ϕ ∈ V ∗ : ϕ(U ) = 0}
is called the annihilator of U .
Proof. Taking duals, by Corollary 10.16 we obtain a composition of injective and surjective maps:
(V /U )∗ ─π∗─► V ∗ ─ι∗─► U ∗ .
The image of π ∗ consists of the functionals of the form ϕ = ψ ◦ π = π ∗ ψ, which all vanish on U .
Writing W := Im(π ∗ ), since W ⊂ U 0 , the dimension formula shows that dim W = dim U 0 , so they
are in fact equal.
APPENDIX A
Equivalence Relations
Let S be a set. A relation ∼ on S is called an equivalence relation if for all x, y, z ∈ S:
(1) Reflexive: x ∼ x.
(2) Symmetric: x ∼ y =⇒ y ∼ x.
(3) Transitive: x ∼ y and y ∼ z =⇒ x ∼ z.
The equivalence class of x ∈ S is the subset
[x] := {z ∈ S : z ∼ x} ⊂ S.
If z ∈ [x] ∩ [y], then for any w ∈ [x],
w ∼ x ∼ z ∼ y =⇒ w ∼ y,
so [x] ⊂ [y]; by symmetry [y] ⊂ [x]. This shows that the equivalence classes are either equal or
disjoint.
Note. All the properties (1)–(3) of an equivalence relation are needed in the proof!
Example. Fix n ∈ N and define on S = Z the relation x ∼ y ⇐⇒ n | (x − y). Then
(S/ ∼) = {[0], [1], ..., [n − 1]}
is identified with the set of remainders of division by n.
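A quick Python illustration of this partition (my own example, n = 3 on a finite window of integers):

```python
# x ~ y  <=>  n | (x - y) partitions the integers into n residue classes.
n = 3
classes = {}
for x in range(-6, 7):
    classes.setdefault(x % n, []).append(x)   # Python's % returns 0..n-1

assert sorted(classes) == [0, 1, 2]           # exactly n classes
# within each class, any two elements differ by a multiple of n:
assert all((a - b) % n == 0
           for rep in classes.values() for a in rep for b in rep)
```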
APPENDIX B
Euclidean Algorithm
Recall division with remainder: given a, b ∈ N, there exist unique q, r ∈ Z with
a = qb + r, 0 ≤ r < b.
Using this, Euclid in his book Elements, stated a simple algorithm to calculate the greatest
common divisor (gcd) of a, b, i.e. the largest number d such that both a, b are multiples of d.
Theorem B.1 (Euclidean Algorithm). The gcd is calculated by successively taking quotient
and remainder:
a = q1 b + r1
b = q2 r1 + r2
r1 = q3 r2 + r3
···
rn−2 = qn rn−1 + rn
rn−1 = qn+1 rn .
Then d := gcd(a, b) = rn , the last non-zero remainder; the process must stop since {rk } is a strictly
decreasing sequence of positive integers.
Proof. By backward induction from the last equation, rn divides rn−1 , hence rn−2 , ..., and finally
both b and a, so rn is a common divisor.
By (forward) induction, we see that any common factor of a, b must divide rk for all k,
including rn = d. Hence rn is the greatest common divisor.
Theorem B.2 (Bézout’s Identity). Given a, b ∈ N with d = gcd(a, b), there exist integers m, n ∈ Z
such that
d = ma + nb.
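The successive divisions of the Euclidean Algorithm can be run backwards to produce m and n; this is the extended Euclidean algorithm. A Python sketch (the inputs a = 252, b = 198 are my own made-up example):

```python
def extended_gcd(a, b):
    """Return (d, m, n) with d = gcd(a, b) = m*a + n*b."""
    if b == 0:
        return a, 1, 0
    # gcd(a, b) = gcd(b, a mod b); back-substitute the Bezout coefficients
    d, m, n = extended_gcd(b, a % b)
    return d, n, m - (a // b) * n

d, m, n = extended_gcd(252, 198)
assert d == 18
assert m * 252 + n * 198 == 18     # Bezout's Identity
```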
The same logic applies to polynomial division over any field K: given polynomials a(t), b(t) ∈ K[t]
with deg a ≥ deg b,
a(t) = q(t)b(t) + r(t), deg r < deg b,
where q(t) is the quotient and r(t) is the remainder.
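Polynomial long division can be sketched directly on coefficient lists. The following Python illustration (not from the notes; coefficients over Q listed from highest degree down is my own convention) divides t³ − 1 by t − 1:

```python
from fractions import Fraction as F

def polydiv(a, b):
    """Divide a(t) by b(t), coefficients from highest degree down.
    Returns (q, r) with a = q*b + r and deg r < deg b."""
    a, q = list(a), []
    while len(a) >= len(b):
        c = a[0] / b[0]                # leading coefficient of the quotient
        q.append(c)
        # subtract c * b(t) * t^(deg a - deg b) and drop the leading zero
        a = [x - c * y for x, y in zip(a, b + [0] * (len(a) - len(b)))][1:]
    return q, a

q, r = polydiv([F(1), F(0), F(0), F(-1)], [F(1), F(-1)])
assert q == [1, 1, 1] and r == [0]     # t^3 - 1 = (t - 1)(t^2 + t + 1)
```

Note that the leading-coefficient division c = a[0]/b[0] is exactly where a field is needed, matching the Remark below Theorem B.3.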
Theorem B.3 (Euclidean Algorithm / Bézout Identity). If a(t), b(t) ∈ K[t], then there exist
polynomials p(t), q(t) ∈ K[t] such that
d(t) = p(t)a(t) + q(t)b(t),
where d(t) is the greatest common divisor (i.e. with the largest degree) of a(t) and b(t), defined
up to a scalar multiple.
Remark. Euclidean algorithm may not work if the coefficients are not from a field, because we may not
be able to do long division.
APPENDIX C
Complexification
The main goal is to give an explicit construction of an orthonormal basis to block diagonalize a
real normal matrix, which was stated after Example 7.9.
Definition. The complexification of a real vector space V is
VC = V × V,
where we write the pair (u, v) as u + iv. Addition is component-wise:
(u1 + iv1 ) + (u2 + iv2 ) := (u1 + u2 ) + i(v1 + v2 ),
and scalar multiplication by a + bi ∈ C is given by
(a + bi) · (u + iv) := (au − bv) + i(bu + av).
Intuitively, it just means that now we allow complex coefficients for our vector space V .
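The multiplication rule can be cross-checked against ordinary complex arithmetic in the special case V = R, where VC is just C. A Python sketch (illustrative only):

```python
def cmul(a, b, u, v):
    """(a + bi) . (u + iv) := (au - bv) + i(bu + av), returned as a pair."""
    return (a * u - b * v, b * u + a * v)

# Cross-check against Python's built-in complex numbers (V = R):
a, b, u, v = 2.0, 3.0, 5.0, 7.0
z = complex(a, b) * complex(u, v)
assert cmul(a, b, u, v) == (z.real, z.imag)
```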
In particular,
dimR V = dimC VC .
(Linearly independent.) If
c1 b1 + · · · + cn bn = 0, ck = ak + ibk ∈ C,
then taking real and imaginary parts gives
a1 b1 + · · · + an bn = 0 and b1 b1 + · · · + bn bn = 0
over R, which means ak = bk = 0 and hence ck = 0.
(Spanning.) Any u + iv ∈ VC can be written as
u + iv = (a1 b1 + · · · + an bn ) + i(b1 b1 + · · · + bn bn ) = c1 b1 + · · · + cn bn
for ck = ak + ibk ∈ C.
Furthermore, any T ∈ L (V, V ) extends to a complex linear map TC ∈ L (VC , VC ) by
TC (u + iv) := T (u) + iT (v).
In other words, TC acts by T on the real and imaginary parts separately.
Note that if TC has a real eigenvalue λ, then both the real and imaginary parts of its (complex)
eigenvector are also real eigenvectors, so that λ is really an eigenvalue of T .
Proposition C.5. Let dimR V < ∞. If T ∈ L (V, V ), then there exists a T -invariant subspace
U ⊂ V with dimR U = 1 or 2.
In other words, if u + iv ∈ VC is an eigenvector of TC with eigenvalue a + bi ∈ C, then the
subspace U = span{u, v} is T -invariant with
T (u) = au − bv
T (v) = bu + av
Proof. Assume T is not symmetric. Let
A = ( a  b
      c  d )
represent T in an orthonormal basis.
We now state the main result, which gives us an explicit construction (by induction) of an orthonor-
mal basis to block diagonalize a real normal matrix (this is mentioned in Example 7.9).
Theorem C.7 (Spectral Theorem of Real Normal Matrices). Let A be a real normal matrix.
Then it is orthogonally similar to a block diagonal matrix of the form

diag( λ1 , λ2 , ..., λr , B1 , ..., Bm ),    Bk := (  ak   bk
                                                     −bk  ak )

where ak , bk , λk ∈ R.
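Each 2×2 block is a rotation-scaling matrix. A Python sketch (my own check, with the made-up values a = 3, b = 4) verifying that such a block is normal and carries the complex eigenvalue pair a ± bi:

```python
def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a, b = 3, 4
B = [[a, b], [-b, a]]
Bt = [list(r) for r in zip(*B)]                  # transpose
assert matmul(B, Bt) == matmul(Bt, B)            # B is normal

# characteristic polynomial t^2 - (tr B) t + det B has roots a +/- bi:
trace = B[0][0] + B[1][1]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
assert trace == 2 * a and det == a * a + b * b
```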
The base case dim V = 1 is trivial, while we have shown the case dim V = 2 in Proposition C.6.
Hence assume dim V ≥ 3.
Hence there exists an orthonormal basis such that T is represented in block diagonal form
A = ( AU   O
      O    AU ⊥ ).
Combining with the basis from U gives (up to permutation) the full orthonormal basis for V
that gives A the required form.
APPENDIX D
Proof of Rational Canonical Forms
The existence and uniqueness of the Rational Canonical Form (as well as the Jordan Canon-
ical Form) are special cases of the “Structure Theorem for Finitely Generated Modules
over a Principal Ideal Domain”, belonging to the branch of module theory in advanced abstract
algebra.
Here I will “translate” the proof and present it in a self-contained way using only Linear Algebra.
Those who have studied ring and module theory may find, hidden in the proof, algebraic
structures such as ideals, divisors, generators and submodules, etc.
One may also get a “taste” of what to expect in advanced abstract algebra in the future.
The proofs below follow the one from Advanced Linear Algebra by S. Roman.
If you just want to get the main idea how the Rational Canonical Form is constructed,
you can safely skip the blue proofs, trust the results, and move on.
Recall that if K is not algebraically closed, these polynomials may not be linear.
Appendix D. Proof of Rational Canonical Forms D.2. Decomposing V = Ker(p(T )m )
Just like the proof of the Jordan Canonical Form, we can first focus on the case when
V = Ker(p(T )m ).
In that proof, we split V into cyclic subspaces Uv of S := T − λI , which are also T -invariant.
However, if p(λ) is not linear, the subspaces may not be T -invariant anymore!
Recall that m is chosen to be the smallest possible, so that by the definition of V , the minimal
polynomial of T is p(λ)m .
Theorem. V decomposes as V = Uv1 ⊕ · · · ⊕ UvN into cyclic subspaces of T , such that the minimal
polynomial of T |Uvi is p(λ)ei with
m = e1 ≥ e2 ≥ · · · ≥ eN .
Since m(λ) = p(λ)m , there exists a vector v1 ∈ V such that p(T )m−1 v1 ̸= 0.
(Otherwise p(T )m−1 v = 0 for any v ∈ V , which means p(λ)m−1 is minimal instead!)
Let Uv1 be the cyclic subspace of T generated by v1 . The idea is to show that
V = Uv1 ⊕ W
for some T -invariant subspace W , and then repeat the argument on W :
V = Uv1 ⊕ W
= Uv1 ⊕ Uv2 ⊕ W ′
···
= Uv1 ⊕ Uv2 ⊕ Uv3 ⊕ · · · ⊕ UvN
and complete the proof (the process must end since V is finite dimensional).
Definition. For v1 , ..., vk ∈ V , denote by Uv1 ,...,vk := Uv1 + · · · + Uvk the vector space sum (not
necessarily direct). Note that Uvi is a subspace of Uv1 ,...,vk .
Note. A generating set of V always exists (e.g. by taking a basis), but in general a generating set
can be much smaller.
Note. The definition really says Uv1 ,...,vk is spanned by vectors of the form {T i vj }, or in other
words, linear combinations of {pj (T )vj } for some polynomials pj (t).
For a polynomial α(t) to be chosen later, consider
u′ = u − α(T )v1 .
Hence our new goal is to find α(t) such that W := W0 + Uu′ intersects Uv1 trivially.
In other words,
Uv1 ∩ (W0 + Uu′ ) = {0}
so that
V = Uv1 ⊕ W
and complete our construction.
i.e.
w0 + r(T )(u − α(T )v1 ) ∈ Uv1 =⇒ w0 + r(T )(u − α(T )v1 ) = 0.
Rewriting, this means for any r(t),
r(T )u ∈ Uv1 ⊕ W0 .
Note that u ̸∈ Uv1 ⊕ W0 = Uv1 ,...,vk , otherwise V would have one fewer generator.
Consider the set I of polynomials r(t) such that r(T )u ∈ Uv1 ⊕ W0 . If some r(t) ∈ I were coprime
to p(t), Bézout’s Identity would give
1 = a(t)r(t) + b(t)p(t)m ,
so that u = a(T )r(T )u + b(T )p(T )m u ∈ Uv1 ⊕ W0 (using p(T )m = O), a contradiction.
Hence any r(t) ∈ I must be a multiple of p(t)d for some 1 ≤ d ≤ m, taken as large as possible.
Appendix D. Proof of Rational Canonical Forms D.3. Combining cyclic subspaces
Now we know if r(T )u ∈ Uv1 ⊕ W0 , then r(t) = q(t)p(t)d for some polynomial q(t).
Therefore if we can find α(t) such that p(T )d (u − α(T )v1 ) ∈ W0 , then (⋆) is satisfied.
Since p(T )d u ∈ Uv1 ⊕ W0 , write p(T )d u = s(T )v1 + w0 with w0 ∈ W0 . Applying p(T )m−d :
– Since we have the direct sum Uv1 ⊕ W0 and p(T )m u = 0, we have p(T )m−d s(T )v1 = 0.
– Since T restricted to Uv1 has minimal polynomial p(t)m , it must divide p(t)m−d s(t).
– Hence s(t) must be a multiple of p(t)d , say s(t) = p(t)d α(t).
With this choice of α(t), p(T )d (u − α(T )v1 ) = w0 ∈ W0 , and (⋆) is satisfied!
In this way V decomposes into cyclic subspaces Uvij of T , where each cyclic subspace Uvij has the
same (see Proposition 9.10) characteristic and minimal polynomial pi (λ)eij for some powers eij with
mi = ei1 ≥ ei2 ≥ · · · ≥ eiNi ≥ 1.
Definition D.4. The polynomials pi (λ)eij are called the elementary factors.
Note. If each pi is linear, then the elementary factors are precisely the minimal polynomials of
the Jordan blocks.
From this, in principle we can already write down the companion matrices of each pieces. However,
if the irreducible factors pi (λ) do not look simple, this matrix form is too complicated!
Therefore we want to use as few cyclic subspaces as possible. The observation is that cyclic
subspaces with coprime polynomials can be combined into a single one. More precisely,
Proposition D.5. Let p(λ) be the characteristic polynomial of Uv and q(λ) be that of Uw . If
p(λ) and q(λ) are coprime, then
Uv+w = Uv ⊕ Uw .
Proof. First, the minimal polynomial of v + w is p(λ)q(λ):
– If it is not minimal, then since p(λ), q(λ) are coprime, WLOG assume v + w is killed by
p1 (T )q(T ) for some p1 (λ) with deg p1 < deg p.
– But then
0 = p1 (T )q(T )(v + w) = p1 (T )q(T )v,
so the minimal polynomial p(λ) of v divides p1 (λ)q(λ). Since p(λ) is coprime to q(λ), it would
divide p1 (λ), which is impossible as deg p1 < deg p.
Since deg(pq) = deg p + deg q, and a cyclic subspace has the same characteristic and minimal
polynomial,
Uv+w and Uv ⊕ Uw
have the same dimension, so must be equal.
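Proposition D.5 can be tested on a made-up example (not from the notes): take T = C(p) ⊕ C(q) for the coprime polynomials p(t) = t² + 1 and q(t) = t − 2, with v, w the standard cyclic vectors of the two companion blocks. The iterates of v + w already span all of V :

```python
from fractions import Fraction as F

# companion blocks C(t^2 + 1) and C(t - 2), assembled block-diagonally
T = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 2]]

def apply(T, x):
    """Matrix-vector product in dimension 3."""
    return tuple(sum(T[i][j] * x[j] for j in range(3)) for i in range(3))

def rank(rows):
    """Rank via row reduction over Q."""
    m = [[F(c) for c in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

x = (1, 0, 1)                  # v + w, with v = e1 and w = e3
krylov = [x, apply(T, x), apply(T, apply(T, x))]
# U_{v+w} has dimension deg p + deg q = 3, i.e. U_{v+w} = U_v (+) U_w = V
assert rank(krylov) == 3
```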
Appendix D. Proof of Rational Canonical Forms D.4. Invariant Factors
Now we write down the decomposition vertically, one row for each irreducible factor pi (λ), with
entries the elementary factors pi (λ)ei1 , pi (λ)ei2 , ... (note: the rows may have different lengths).
By the previous section, each vertical (column) sum, possibly skipping the terms that are missing,
is cyclic, with characteristic and minimal polynomial
gj (λ) := p1 (λ)e1j · · · pk (λ)ekj .
These gj (λ) are the invariant factors; in particular
m(λ) = g1 (λ).
(Note that the degrees of gj (λ) here go from large to small, opposite to those in Theorem 9.11.
However it does not matter: we can always change the order of basis to permute the blocks, so the
two forms are matrix similar.)
D.5 Uniqueness
To complete the proof, we show the uniqueness of invariant factors / elementary factors.
Proposition D.7. The list of invariant factors g1 (λ), ..., gk (λ) is unique.
Proof. We outline the idea. By the conditions on invariant factors, they are uniquely determined
by elementary factors.
Now since
V = Ker(p1 (T )m1 ) ⊕ · · · ⊕ Ker(pk (T )mk )
we only need to focus on a single invariant subspace. Hence let V = Ker(p(T )m ).
Suppose we have two decompositions into cyclic subspaces
V = Uu1 ⊕ · · · ⊕ UuM with minimal polynomials p(λ)d1 , ..., p(λ)dM ,
V = Uw1 ⊕ · · · ⊕ UwN with minimal polynomials p(λ)e1 , ..., p(λ)eN .
Consider Im(p(T )). All those subspaces with di = 1 and ej = 1 are killed under p(T ).
By induction on the degree of m(λ), the list of elementary factors for Im(p(T )) (each exponent
being one less than in the original list) is unique.
Hence we must also have the same number of di = 1 and ej = 1 subspaces to begin with to
match the dimension of V .