
MATH115A LECTURE NOTES

ALLEN GEHRET

Abstract. These lecture notes are largely based on the course textbook [1] and the notes of
Artem Chernikov (math.ucla.edu/~chernikov/teaching/17S-MATH115A/index.html), except for
the sake of time I exclude/rearrange some material and modify some proofs. I recommend you refer
to these notes (and the textbook) for learning the mathematical content of the course, and refer to
the textbook for additional examples, pictures, and practice problems.
Note: these lecture notes are subject to revision, so the numbering of Lemmas, Theorems,
etc. may change throughout the course and I do not recommend you print out too many pages
beyond the section where we are in lecture. Any and all questions, comments, and corrections are
enthusiastically welcome!

Contents
1. Vector spaces 2
1.1. Fields 2
1.2. Vector spaces 3
1.3. Basic properties of vector spaces 4
1.4. Subspaces 5
1.5. Linear combinations and span 7
1.6. Linear independence 9
1.7. Bases and dimension 11
2. Linear transformations 15
2.1. Basic properties of linear transformations 15
2.2. Null space and range 15
2.3. The matrix representation of a linear transformation 19
2.4. Algebraic description of the operations in L(V, W ) 21
2.5. Composition of linear transformations and matrix multiplication 22
2.6. Calculating the value of a linear transformation using its matrix representation 25
2.7. Associating a linear transformation to a matrix 26
2.8. Invertibility 27
2.9. Isomorphisms 28
2.10. Change of coordinate matrix 30
3. Determinants 32
3.1. Computing the determinant 32
3.2. Properties of the determinant 33
3.3. The determinant of a linear operator (New!) 34
4. Eigenvalues and eigenvectors 36
4.1. Determining eigenvectors and eigenvalues of a linear operator. 39
5. Inner product spaces 44
5.1. Inner products and norms 44
5.2. Orthogonality 47

Date: February 18, 2019.


5.3. Orthonormal bases and Gram-Schmidt orthogonalization 48
5.4. Orthogonal complement 51
6. Appendix: non-linear algebra math 54
6.1. Sets 54
6.2. Set operations - making new sets from old 55
6.3. Functions 56
6.4. Induction 57
References 59

1. Vector spaces
1.1. Fields. You are probably familiar with “linear algebra with real scalars/coefficients” and
“linear algebra with complex scalars/coefficients”. To make the notion of “which scalars are we
using” precise, we give the definition of a field:
Definition 1.1. A field is a set F equipped with two operations + and · (called (scalar)
addition and (scalar) multiplication, respectively) and two special elements 0 and 1, such that
for every a, b, c ∈ F :
(F1) a + b = b + a and a · b = b · a (commutativity of addition and multiplication),
(F2) (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c) (associativity),
(F3) 0 + a = a and 1 · a = a (additive and multiplicative identities)
(F4) there exists an element −a ∈ F such that
a + (−a) = 0 (additive inverse)
and if b ≠ 0, then there exists an element b−1 ∈ F such that
b · b−1 = 1 (multiplicative inverse)
(F5) a · (b + c) = a · b + a · c (distributivity)
For us, the following are the two most important examples:
Examples 1.2. (1) The set R of real numbers, with the usual operations + and ·, and the
usual 0 and 1 is a field.
(2) The set C = {a + bi : a, b ∈ R} of complex numbers, with the usual +, ·, 0, 1 is also a
field.
(3) The set
Q = {m/n : m, n ∈ Z and n ≠ 0}
of rational numbers, with the usual +, ·, 0, 1 is a field.
(4) Let Z2 be a set with two elements Z2 = {0, 1} and define the operations of addition and
multiplication as follows:
0 + 0 = 1 + 1 := 0 and 0 + 1 = 1 + 0 := 1 and 0 · 0 = 0 · 1 = 1 · 0 := 0 and 1 · 1 := 1.
Then Z2 together with these two operations is a field (a field of two elements). Think of Z2
as the field of “binary arithmetic”.
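For instance, in Z2 we can verify (F4) directly: the additive inverse of 1 is 1 itself, since 1 + 1 = 0,
and the multiplicative inverse of 1 is also 1, since 1 · 1 = 1 (the element 0 is not required to have a
multiplicative inverse).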
In most of the class, we will be agnostic about which field we are working over. The reason is
that we could prove a theorem using scalars coming from R, and then prove a second
nearly identical theorem using scalars from C. However, it is easier and more efficient to prove just
one theorem which works for any field we want (multiple theorems for the price of one!). Thus we
have:
Convention 1.3. For the rest of the course we let F denote a field. It is good to keep in mind
the examples F = R or F = C. However, we will rarely be using any special properties of R or C
beyond just the field axioms, so in general F is allowed to be any field, not just these two.
1.2. Vector spaces.
Definition 1.4. A vector space V over F is a set V with two operations:
• (vector addition) for every x, y ∈ V , there is an element x + y ∈ V
• (scalar multiplication) for every a ∈ F and x ∈ V , there is an element a · x ∈ V
such that the following axioms hold:
(VS1) x + y = y + x for every x, y ∈ V (commutativity of addition)
(VS2) (x + y) + z = x + (y + z) for every x, y, z ∈ V (associativity of addition)
(VS3) there is an element 0 ∈ V such that x + 0 = x for every x ∈ V (vector additive identity)
(VS4) for every x ∈ V there is y ∈ V such that x + y = 0 (vector additive inverse)
(VS5) 1 · x = x for every x ∈ V (where 1 is the multiplicative identity of F )
(VS6) a · (b · x) = (a · b) · x for every a, b ∈ F and x ∈ V (associativity of scalar multiplication)
(VS7) a · (x + y) = a · x + a · y for every a ∈ F and x, y ∈ V (distributivity)
(VS8) (a + b) · x = a · x + b · x for every a, b ∈ F and x ∈ V (distributivity)
Given a vector space V over F , we will refer to elements of V as vectors and elements of F as
scalars.
Example 1.5. Given a field F , and natural number n ≥ 1, consider the set
F n = {(x1 , . . . , xn ) : xi ∈ F },
the set of all n-tuples of elements from F . We equip F n with the operations of vector addition:
(x1 , . . . , xn ) + (y1 , . . . , yn ) := (x1 + y1 , . . . , xn + yn ) for all (x1 , . . . , xn ), (y1 , . . . , yn ) ∈ F n
and scalar multiplication:
a · (x1 , . . . , xn ) := (a · x1 , . . . , a · xn ) for all a ∈ F and (x1 , . . . , xn ) ∈ F n .
It follows that F n with these two operations is a vector space over the field F (i.e., (VS1)-(VS8)
hold). As an example, if F = R and n = 2 or n = 3, then this gives the familiar vector spaces R2
(the Cartesian plane) and R3 (3D Euclidean space).
Example 1.6. Let F be a field, and let P (F ) be the set of all polynomials with coefficients in F .
That is, P (F ) consists of all expressions of the form
p(x) = an xn + an−1 xn−1 + · · · + a1 x + a0
for some n ≥ 0, with a0 , . . . , an ∈ F . If ai = 0 for all i = 0, . . . , n, then p(x) is called the zero
polynomial. The degree deg p(x) of a nonzero polynomial is the largest i such that ai ≠ 0 (the
degree of the zero polynomial is defined to be −1). Two polynomials p(x) = an xn + · · · + a1 x + a0
and q(x) = bm xm + · · · + b1 x + b0 are equal if they have the same degree and the same coefficients,
i.e., ai = bi for every i.
We equip P (F ) with the operation of vector addition: Given p(x) and q(x) as above, we may
assume that m = n (if m < n, then we set bm+1 = · · · = bn = 0, and if n < m, then we set
an+1 = · · · = am = 0). Then we define:
p(x) + q(x) := (p + q)(x) = (an + bn )xn + (an−1 + bn−1 )xn−1 + · · · + (a1 + b1 )x + (a0 + b0 )
We also equip P (F ) with the operation of scalar multiplication: given p(x) as above, and c ∈ F ,
define:
c · p(x) := (cp)(x) = can xn + can−1 xn−1 + · · · + ca1 x + ca0 .
It follows that P (F ) with these two operations is a vector space over F (need to check that (VS1)-
(VS8) hold).
Example 1.7. Let M2×2 (R) be the set of all 2 × 2 matrices with entries in R. We define vector
addition and scalar multiplication in the usual way:
     
[ a  b ]   [ e  f ]    [ a + e  b + f ]
[ c  d ] + [ g  h ] := [ c + g  d + h ]
and
     [ a  b ]    [ αa  αb ]
α ·  [ c  d ] := [ αc  αd ]
for every a, b, c, d, e, f, g, h, α ∈ R. With these operations M2×2 (R) is a vector space over R. Anal-
ogously, the space Mm×n (R) of all m × n matrices with real coefficients is a vector space over
R.
Example 1.8 (The most boring vector space). Let V = {0}, i.e., V contains a single element 0.
The operations of vector addition and scalar multiplication are defined in the only way possible:
0 + 0 := 0 and α · 0 := 0 for every α ∈ F .
With these operations, V is a vector space over F .
1.3. Basic properties of vector spaces. In this subsection we derive some very basic properties
of vector spaces from the axioms (VS1)-(VS8).
Cancellation Law 1.9. Let V be a vector space over F and suppose x, y, z ∈ V . Then
(1) if x + z = y + z, then x = y;
(2) if z + x = z + y, then x = y.
Proof. (1) By (VS4), there is an element z̃ ∈ V such that z + z̃ = 0. Then
x=x+0 by (VS3)
= x + (z + z̃) by the choice of z̃
= (x + z) + z̃ by (VS2)
= (y + z) + z̃ by assumption
= y + (z + z̃) by (VS2) again
= y + 0 by choice of z̃
=y by (VS3) again.
Thus x = y.
(2) Assume z + x = z + y. Then x + z = y + z by applying (VS1) to both sides. Thus x = y
by part (1). 
Corollary 1.10. The vector 0 described in (VS3) is unique, i.e., if 0, 0′ ∈ V are such that
x + 0 = x and x + 0′ = x for every x ∈ V ,
then 0 = 0′ .
Proof. Assume 0, 0′ ∈ V are as above. Then for any x ∈ V we have
x + 0 = x = x + 0′
by (VS3) for 0 first and then 0′ . By the Cancellation Law 1.9, we conclude that 0 = 0′ .
This permits us to define the zero vector of a vector space V to be the unique vector 0 which
satisfies (VS3).
Example 1.11. In the vector space F n , the zero vector is (0, 0, . . . , 0) (the n-tuple of all 0’s). In
M2×2 (R) the zero vector is
[ 0  0 ]
[ 0  0 ] .
In P (F ), the zero vector is the zero polynomial.
Corollary 1.12. Given x ∈ V , the vector y described in (VS4) is unique, i.e., if y, y′ ∈ V are such
that x + y = 0 and x + y′ = 0, then y = y′ . This vector is called the additive inverse of x and is
denoted by −x.
Proof. With y, y′ ∈ V as above, if x + y = 0 = x + y′ , then by the Cancellation Law 1.9, we conclude
that y = y′ .
We also record here some more useful properties of vector spaces:
Proposition 1.13. Let V be a vector space over a field F . For every x ∈ V and a ∈ F we have
(1) 0 · x = 0 (the scalar 0 times any vector x equals the zero vector)
(2) (−a) · x = −(ax) = a · (−x) (the first thing is the additive inverse of the scalar a times
the vector x, the second thing is the additive inverse of the vector ax, the third thing is the
scalar a times the additive inverse of x)
(3) a · 0 = 0 (the scalar a times the zero vector is the zero vector).
Proof. Exercise. 
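As a hint for (1): since 0 + 0 = 0 in F , we have 0 · x = (0 + 0) · x = 0 · x + 0 · x by (VS8); adding the
additive inverse of 0 · x to both sides (or applying the Cancellation Law 1.9) gives 0 · x = 0. Parts (2)
and (3) can be proved by similar arguments from the axioms.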
1.4. Subspaces.
Definition 1.14. Let V be a vector space over a field F . A subset W of V (W ⊆ V ) is a subspace
of V if W itself is a vector space over F , with respect to the vector addition and scalar multiplication
defined on V .
That is, W satisfies (VS1)-(VS8) using + and · defined on V .
Example 1.15. Consider the vector space V = F n . The subset
W := {(x1 , . . . , xn−1 , 0) : x1 , . . . , xn−1 ∈ F } ⊆ F n
is a subspace of F n (we’ll show this soon).
Example 1.16. Consider the vector space V = P (F ) of all polynomials with coefficients from F .
For each n ≥ 0, define the subset Pn (F ) ⊆ P (F ) consisting of all polynomials of degree less than
or equal to n. We’ll show Pn (F ) is a subspace of P (F ).
Example 1.17. Let V be any vector space. Then V is a subspace of V (the biggest possible
subspace), and {0} ⊆ V is a subspace of V (the smallest possible subspace) where 0 is the zero
vector of V . These two subspaces are always guaranteed to be there.
Example 1.18. Let S be a non-empty set and F a field. Let F(S, F ) denote the set of all functions
from S to F . Two functions f, g ∈ F(S, F ) are equal if f (x) = g(x) for every x ∈ S. We can equip
F(S, F ) with the operations of vector addition and scalar multiplication: given f, g ∈ F(S, F ) and
c ∈ F , define (f +g)(x) := f (x)+g(x) and define (cf )(x) := c·f (x). With these operations F(S, F )
is a vector space over F .
In particular, F(R, R) is the vector space of all real-valued functions defined on the real numbers.
Let C(R) ⊆ F(R, R) denote the subset of all continuous functions. We’ll show C(R) is a subspace
of F(R, R).
The next lemma shows that to check that W ⊆ V is a subspace of V , we need to check fewer things
than all of (VS1)-(VS8):
Subspace Test 1.19. Let V be a vector space over F , and let W ⊆ V be a subset of V . Then W
is a subspace of V if and only if
(a) 0 ∈ W (i.e., the zero vector of V is in W ),
(b) if x, y ∈ W , then x + y ∈ W (i.e., W is closed under vector addition), and
(c) if x ∈ W and c ∈ F , then cx ∈ W (i.e., W is closed under scalar multiplication).
Proof. (⇒) Assume that W is a subspace of V . This means that W is a vector space under the
operations of addition and scalar multiplication defined on V . In particular, for every x, y ∈ W
and c ∈ F , x + y, cx ∈ W , so (b) and (c) hold. Furthermore, (VS3) holds in W , so there is 0′ ∈ W
such that x + 0′ = x for all x ∈ W . In particular, 0′ + 0′ = 0′ . Since 0′ ∈ V , we also have 0′ + 0 = 0′
(since 0 ∈ V satisfies (VS3) in V ). By the Cancellation Law 1.9, we conclude that 0′ = 0, and so
0 ∈ W which implies (a).
(⇐) Now suppose (a), (b), (c) hold. We will show that W is a subspace of V . Properties (b)
and (c) tell us W has two operations defined on it (+ and · coming from V ), so we need to check
that (VS1)-(VS8) hold in order to conclude that W is a vector space with respect to these two
operations (and hence a subspace of V ). First, as V is a vector space, (VS1), (VS2), (VS5), (VS6),
(VS7), (VS8) hold for all elements of V . Since W is a subset of V these axioms also hold for all
vectors in W . (VS3) holds by (a). It remains to check (VS4). Let x ∈ W be arbitrary. We need to
find some y ∈ W such that x + y = 0. We know that in V , (−1)x = −x is an additive inverse of x
(see Proposition 1.13). By (c), (−1)x ∈ W , so we can take y = (−1)x. 

We will now use Subspace Test 1.19 to check the subspaces in the previous examples are indeed
subspaces.
Example 1.20. Given V = F n and W = {(x1 , . . . , xn−1 , 0) : xi ∈ F }, we will check (a), (b), and
(c). For (a), we note that (0, 0, . . . , 0) ∈ W , taking x1 = · · · = xn−1 = 0. For (b) and (c), note that
given any (x1 , . . . , xn−1 , 0), (y1 , . . . , yn−1 , 0) ∈ W and a ∈ F , we have
(x1 , . . . , xn−1 , 0) + (y1 , . . . , yn−1 , 0) = (x1 + y1 , . . . , xn−1 + yn−1 , 0 + 0) ∈ W
(the last entry is 0 + 0 = 0), and
a · (x1 , . . . , xn−1 , 0) = (ax1 , . . . , axn−1 , a · 0) ∈ W
(the last entry is a · 0 = 0).
Example 1.21. Given V = P (F ) and W = Pn (F ), we will check (a), (b) and (c). For (a), the
zero vector in P (F ) is the zero polynomial: p(x) = an xn + · · · + a1 x + a0 , where an = · · · = a0 = 0.
The degree of p(x) is −1, so p(x) ∈ Pn (F ). For (b) if both p(x) and q(x) are in Pn (F ), then their
degrees are at most n, so the degree of p(x) + q(x) is at most n, so p(x) + q(x) ∈ Pn (F ). For (c), if
p(x) ∈ Pn (F ) and c ∈ F , then cp(x) either is the zero polynomial (if c = 0), or has the same degree
as p(x) (if c ≠ 0); in either case, cp(x) ∈ Pn (F ).
Example 1.22. Given V = F(R, R) and W = C(R), we will check (a), (b), and (c). The zero vector of V is the constant zero
function: f (x) = 0 for all x ∈ R. By basic calculus, all constant functions are continuous (including
the constant zero function), the sum of two continuous functions is continuous, and a scalar multiple
of a continuous function is continuous. Thus (a), (b), and (c) hold for W .
The intersection of two subspaces is again a subspace. More generally:
Subspace Intersection 1.23. Let V be a vector space over F , and suppose {Wi : i ∈ I} is a
collection of subspaces of V . Then the intersection
W := ⋂_{i∈I} Wi = {x ∈ V : x ∈ Wi for every i ∈ I}
is a subspace of V .
Proof. We will verify (a), (b), and (c) of Subspace Test 1.19 for W above. For (a), since each Wi
is a subspace, we know that 0 ∈ Wi for each i, and so 0 ∈ W . For (b), let x, y ∈ W . Then for each
i ∈ I, x, y ∈ Wi . Since for each i ∈ I, Wi is a subspace, it follows by (b) for Wi that x + y ∈ Wi .
Since x + y ∈ Wi for each i, it follows that x + y ∈ W . Finally for (c), let x ∈ W and c ∈ F . Then
x ∈ Wi for each i ∈ I, so cx ∈ Wi for each i ∈ I, so cx ∈ W . 
Example 1.24. Let V = R2 , a vector space over R. Let W1 := {(x1 , 0) : x1 ∈ R} and W2 :=
{(0, x2 ) : x2 ∈ R}. Then W1 , W2 are subspaces of V . The intersection W1 ∩ W2 = {(0, 0)} is also a
subspace of R2 (the zero subspace). However, the union W1 ∪ W2 is not a subspace of R2 ! Indeed,
(1, 0) ∈ W1 and (0, 1) ∈ W2 , but (1, 0) + (0, 1) = (1, 1) lies in neither W1 nor W2 , so W1 ∪ W2 is not
closed under vector addition.
1.5. Linear combinations and span.
Definition 1.25. Let V be a vector space over F and S ⊆ V a nonempty subset of V . A vector
v ∈ V is called a linear combination of vectors of S if there exists n ∈ N, vectors u1 , . . . , un ∈ S,
and scalars a1 , . . . , an ∈ F such that
v = a1 u1 + a2 u2 + · · · + an un .
In this case we also say that v is a linear combination of u1 , . . . , un and call a1 , . . . , an the coeffi-
cients of the linear combination.
Here is a fundamental question in linear algebra:
Question 1.26. Given v ∈ V and u1 , . . . , un ∈ V , how does one determine whether v is a linear
combination of the vectors u1 , . . . , un ?
Answer. This reduces to solving a system of linear equations. 
Example 1.27. Let V = P1 (R), v = x + 5, and u1 = 2x, u2 = 3x − 1. We want to know if v is a
linear combination of u1 , u2 . In other words, are there a1 , a2 ∈ R such that v = a1 u1 + a2 u2 , i.e.,
can we find a1 , a2 ∈ R such that
x + 5 = a1 (2x) + a2 (3x − 1) = (2a1 + 3a2 )x + (−a2 )
By writing what it means for two polynomials to be equal, we arrive at a system of linear equations:
1 = 2a1 + 3a2
5 = −a2
which we solve: a2 = −5, and then a1 = 8. Thus v is a linear combination of u1 and u2 ; indeed,
v = 8u1 − 5u2 .
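As a quick check: 8u1 − 5u2 = 8(2x) − 5(3x − 1) = 16x − 15x + 5 = x + 5 = v, as desired.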
The following definition allows us to create a subspace starting from a subset S ⊆ V :
Definition 1.28. Let S ⊆ V be a nonempty subset of V . The span of S, denoted span(S) is the
set of all linear combinations of vectors in S, i.e.,
span(S) := {a1 u1 + · · · + an un : n ∈ N, ai ∈ F, ui ∈ S} ⊆ V.
As convention, we also define span(∅) = {0} ⊆ V . Note that S ⊆ span(S) since for every u ∈ S,
u = 1 · u ∈ span(S).
Example 1.29. Consider the vectors u1 = (1, 0, 0) and u2 = (0, 1, 0) in V = R3 , and let S :=
{u1 , u2 } ⊆ V . Then the vectors in span(S) are all vectors of the form a1 u1 + a2 u2 , where a1 , a2
vary over R. Thus
span(S) = {a1 (1, 0, 0) + a2 (0, 1, 0) : a1 , a2 ∈ R} = {(a1 , a2 , 0) : a1 , a2 ∈ R},
which is a subspace of R3 which contains S.
Lemma 1.30. Let V be a vector space over F , and suppose S ⊆ V . Then
(1) The span of S is a subspace of V .
(2) Any subspace of V that contains S must contain span(S).
In particular, span(S) is the smallest subspace of V which contains S.
Proof. Both (1) and (2) are obvious if S = ∅, because span(∅) = {0} is always a subspace of V , and
if W ⊆ V is a subspace of V which contains ∅ (this last part is redundant as all subsets contain ∅
automatically), then 0 ∈ W by Subspace Test 1.19(a), so W contains {0} = span(∅).
Now suppose S 6= ∅. Then there is z ∈ S. Since 0 · z = 0, we have 0 ∈ span(S). Next, suppose
x, y ∈ span(S). Then by definition of span(S) you can write
x = a1 u1 + · · · + am um and y = b1 v1 + · · · + bn vn
for some a1 , . . . , am , b1 , . . . , bn ∈ F and u1 , . . . , um , v1 , . . . , vn ∈ S. Then both
x + y = a1 u1 + · · · + am um + b1 v1 + · · · + bn vn and c · x = (ca1 )u1 + · · · + (cam )um
are also linear combinations of vectors in S, for c ∈ F , and so they belong to span(S). Since (a),
(b) and (c) hold in Subspace Test 1.19, we conclude that span(S) is a subspace of V .
Next, suppose W ⊆ V is an arbitrary subspace of V which contains S. Let w ∈ span(S).
Then w = c1 w1 + · · · + ck wk for some c1 , . . . , ck ∈ F and w1 , . . . , wk ∈ S. Since S ⊆ W we have
w1 , . . . , wk ∈ W . Since W is itself a vector space over F , we have w = c1 w1 + · · · + ck wk ∈ W .
Thus span(S) ⊆ W . 
Definition 1.31. A subset S of a vector space V generates (or spans) V if span(S) = V . In this
case, we will also say the vectors of S generate (or span) V .
In general, finding a small (finite, if possible) generating set for a vector space V is an efficient way
of describing V and simplifies working with it.
Example 1.32. For any vector space V , span(V ) = V , so V is generated by itself.
Example 1.33. In V = R3 , the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1) generate all of V . Indeed,
given any vector (a, b, c) ∈ R3 , we can express it as a linear combination of these three:
(a, b, c) = a(1, 0, 0) + b(0, 1, 0) + c(0, 0, 1).
Thus (a, b, c) ∈ span({(1, 0, 0), (0, 1, 0), (0, 0, 1)}) = R3 .
Example 1.34. Let V = M2×2 (R) be the vector space of all 2 × 2 matrices with coefficients in R.
Then we set
M1 = [ 1  0 ]   M2 = [ 0  1 ]   M3 = [ 0  0 ]   M4 = [ 0  0 ]
     [ 0  0 ]        [ 0  0 ]        [ 1  0 ]        [ 0  1 ]
and we claim that the four vectors M1 , M2 , M3 , M4 generate M2×2 (R). Indeed, note that
[ a  b ]
[ c  d ] = aM1 + bM2 + cM3 + dM4 .
Thus span({M1 , M2 , M3 , M4 }) = M2×2 (R).
Example 1.35. Recall that P (F ) is the vector space of all polynomials with coefficients in F .
Then the set {1, x, x2 , x3 , . . .} generates all of P (F ). Indeed, we have
span({1, x, x2 , . . .}) = {a0 + a1 x + · · · + an xn : n ∈ N ∪ {0}, ai ∈ F },
which is all of P (F ). Likewise, Pn (F ) is generated by {1, x, x2 , . . . , xn }.
1.6. Linear independence. Usually there are many different subsets which are able to generate
the same vector space. For instance
Example 1.36. The set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} can generate the entire vector space R3 . The
set {(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, 3, −1)}
also generates all of R3 . In some sense, the last vector (2, 3, −1) is redundant, at least when it
comes to generating a subspace.
In general, it is natural to look for a smallest possible subset of V that generates all of V . This is
because redundant vectors do not help in describing the subspace that a given subset generates.
We will first look at the situation where it is possible to remove a redundant vector from a generating
set so that what remains is still a generating set.
First, note that if V is a vector space over F and u1 , . . . , un ∈ V , then the zero vector is always a
linear combination of u1 , . . . , un via the trivial representation (using only the scalar 0 ∈ F as
every coefficient):
0 = 0 · u1 + 0 · u2 + · · · + 0 · un .
Sometimes (but not always) it is also possible to write
0 = a1 u1 + · · · + an un
where a1 , . . . , an ∈ F and some ai ≠ 0. Such a linear combination is called a non-trivial
representation of 0.
Example 1.37. In R2 ,
(0, 0) = 2 · (1, 2) + 5 · (2, 1) + 3 · (−4, −3)
is a non-trivial representation of the zero vector (0, 0).
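As a quick check: 2 · (1, 2) + 5 · (2, 1) + 3 · (−4, −3) = (2 + 10 − 12, 4 + 5 − 9) = (0, 0).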
Definition 1.38. A subset S of a vector space V is called linearly dependent if there exists a
finite number of distinct vectors u1 , . . . , un ∈ S and scalars a1 , . . . , an ∈ F such that at least one
ai ≠ 0 and
a1 u1 + a2 u2 + · · · + an un = 0,
or in other words, u1 , . . . , un admit a non-trivial representation of 0.
If S is not linearly dependent, then we say that S is linearly independent.
We also say that the vectors v1 , . . . , vn are linearly dependent/independent if the set {v1 , . . . , vn }
is linearly dependent/independent.
Example 1.39. Let V = R2 . The set S1 = {(0, 1), (1, 0)} is linearly independent. Indeed, if
(0, 0) = a1 (0, 1) + a2 (1, 0), then (a1 , a2 ) must be a solution to the system
0 = a1 · 0 + a2 · 1
0 = a1 · 1 + a2 · 0
and we see that a1 = 0 and a2 = 0 is the only solution. This means that every representation
of the zero vector must be the trivial representation and so S1 is linearly independent.
The set S2 = {(0, 1), (1, 0), (17, 18)} is linearly dependent. Indeed,
18 · (0, 1) + 17 · (1, 0) + (−1) · (17, 18) = (0, 0),
which is a non-trivial representation of the zero vector (0, 0).
Example 1.40. Let V be a vector space over F .
(1) Any subset S ⊆ V which contains 0 is automatically linearly dependent. Indeed, 0 = 1 · 0
is a non-trivial representation of 0 (since the coefficient 1 is a nonzero scalar).
(2) The empty set ∅ ⊆ V is linearly independent since we cannot form a non-trivial represen-
tation of zero with its elements (since it does not have any elements).
(3) If S = {u} ⊆ V consists of a single non-zero vector u, then S is linearly independent.
Indeed, if {u} is linearly dependent, then a · u = 0 for some a ∈ F with a ≠ 0. Then
u = (a−1 · a) · u = a−1 · (a · u) = a−1 · 0 = 0, contradicting the assumption that u is non-zero.
Example 1.41. In V = M2×2 (R), the set S consisting of the four matrices
[ 1  0 ]   [ 0  1 ]   [ 0  0 ]   [ 0  0 ]
[ 0  0 ] , [ 0  0 ] , [ 1  0 ] , [ 0  1 ]
is linearly independent. Indeed, assume that
[ 0  0 ]      [ 1  0 ]      [ 0  1 ]      [ 0  0 ]      [ 0  0 ]
[ 0  0 ] = a1 [ 0  0 ] + a2 [ 0  0 ] + a3 [ 1  0 ] + a4 [ 0  1 ]
is an arbitrary representation of the zero vector of M2×2 (R). This means that the scalars
a1 , a2 , a3 , a4 ∈ R are a solution to the system of linear equations:
0 = a1 · 1 + a2 · 0 + a3 · 0 + a4 · 0
0 = a1 · 0 + a2 · 1 + a3 · 0 + a4 · 0
0 = a1 · 0 + a2 · 0 + a3 · 1 + a4 · 0
0 = a1 · 0 + a2 · 0 + a3 · 0 + a4 · 1
This is only possible if a1 = a2 = a3 = a4 = 0. In other words, the only representation of the zero
vector is the trivial representation.
Example 1.42. In V = Pn (F ), the set S = {1, x, . . . , xn } is linearly independent. Indeed, assume
that 0 = a0 · 1 + a1 · x + · · · + an · xn is a representation of the zero vector (i.e., the zero polynomial)
in Pn (F ). This means that a0 = a1 = · · · = an = 0, which implies the representation must be the
trivial representation.
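For contrast, here is a linearly dependent set of polynomials: in P1 (F ), the set {1 + x, 1 − x, x} is
linearly dependent, since 1 · (1 + x) + (−1) · (1 − x) + (−2) · x = 0 is a non-trivial representation of
the zero polynomial.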
Lemma 1.43. Let V be a vector space over F , and let S1 ⊆ S2 ⊆ V be two subsets. Then
(1) if S1 is linearly dependent, then S2 is linearly dependent, and
(2) if S2 is linearly independent, then S1 is linearly independent.
Proof. (1) Suppose S1 is linearly dependent. Then there are distinct u1 , . . . , un ∈ S1 and scalars
a1 , . . . , an ∈ F , with at least one ai ≠ 0, such that a1 u1 + · · · + an un = 0. Then since
u1 , . . . , un ∈ S2 as well, we have that S2 is also linearly dependent.
(2) This is the contrapositive of (1). 
The following relates the notion of span and linear independence:
Proposition 1.44. Let S be a linearly independent subset of a vector space V , and let v ∈ V be
such that v ∉ S. Then S ∪ {v} is linearly dependent if and only if v ∈ span(S).
Proof. (⇒) Suppose S ∪ {v} is linearly dependent. Then we can write
0 = a1 u1 + · · · + an un
for distinct u1 , . . . , un ∈ S ∪ {v} and some scalars a1 , . . . , an ∈ F , not all zero. Because S itself is
linearly independent, it is not possible that each ui ∈ S, for i = 1, . . . , n, i.e., at least one of the ui ,
say u1 , equals v, and a1 ≠ 0. Thus a1 v + a2 u2 + · · · + an un = 0. Dividing by a1 and solving for v
yields
v = (1/a1 )(−a2 u2 − · · · − an un ) = (−a2 /a1 )u2 + · · · + (−an /a1 )un .
This shows that v is a linear combination of u2 , . . . , un ∈ S, and so v ∈ span(S).
(⇐) Suppose v ∈ span(S). Then we have v = b1 v1 + · · · + bm vm for some distinct vectors
v1 , . . . , vm ∈ S and some scalars b1 , . . . , bm ∈ F . Hence
0 = b1 v1 + · · · + bm vm + (−1)v.
This is a nontrivial representation of the zero vector by the distinct vectors v1 , . . . , vm , v. Thus
{v1 , . . . , vm , v} are linearly dependent, hence S ∪ {v} is also linearly dependent by Lemma 1.43(1).

1.7. Bases and dimension.
Definition 1.45. Let V be a vector space over F . A basis for a vector space V is a linearly
independent subset of V that generates V . If β is a basis for V , we also say that the vectors of β
form a basis for V .
Example 1.46. Recall that span(∅) = {0} and ∅ is linearly independent. Thus the empty set ∅ is
a basis for the zero vector space.
Example 1.47. In V = F n , consider the vectors
e1 := (1, 0, 0, . . . , 0), e2 := (0, 1, 0, 0, . . . , 0), ..., en := (0, 0, . . . , 0, 1).
Then β = {e1 , e2 , . . . , en } is a basis for F n and is called the standard basis for F n .
Example 1.48. In V = Mm×n (F ), let E ij denote the matrix whose only nonzero entry is a 1 in
the ith row and jth column. Then the set
β := {E ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n}
is a basis for Mm×n (F ).
Example 1.49. In V = Pn (F ), the set {1, x, x2 , . . . , xn } is a basis. We call this basis the standard
basis for Pn (F ).
Example 1.50. In V = P (F ), the set {1, x, x2 , . . .} is a basis. This shows in particular that a
basis need not be finite.
The next proposition expresses a very important property of bases: every vector
can be expressed in a unique way as a linear combination of elements from a basis. This property
shows that bases are the building blocks of vector spaces in the sense that they allow us to efficiently
“parameterize” the entire space.
Proposition 1.51. Let V be a vector space over F and β = {u1 , . . . , un } be a subset of V . Then
the following are equivalent:
(1) β is a basis for V .
(2) Every vector v ∈ V can be uniquely expressed as a linear combination of vectors in β, i.e.,
can be expressed in the form v = a1 u1 + · · · + an un for unique scalars a1 , . . . , an ∈ F .
Proof. (1)⇒(2) Suppose β is a basis of V , and let v ∈ V be arbitrary. Since β spans V , we have
v ∈ span(β) and so there are coefficients a1 , . . . , an ∈ F such that v = a1 u1 + · · · + an un . Suppose
b1 , . . . , bn ∈ F are another collection of coefficients such that v = b1 u1 + · · · + bn un . Subtracting
the second linear combination from the first linear combination yields
v − v = 0 = (a1 − b1 )u1 + · · · + (an − bn )un ,
which is a representation of the zero vector using the vectors u1 , . . . , un . Since these vectors are
also linearly independent, this must be the trivial representation and so the coefficients must all be
0. In other words, ai − bi = 0 for all i = 1, . . . , n, or rather, ai = bi for all i = 1, . . . , n. To
summarize, there is only one way to express v as a linear combination of u1 , . . . , un .
(2)⇒(1) Suppose every v ∈ V can be uniquely expressed as a linear combination of u1 , . . . , un .
Then in particular span(β) = V (this is true even without “uniquely”). It remains to check that β
is linearly independent. Assume 0 = a1 u1 + · · · + an un for certain coefficients a1 , . . . , an ∈ F . We
also know that 0 = 0u1 + · · · + 0un . Since by assumption there is only one way to express 0 as a
linear combination of the vectors from β, it must be the case that a1 = a2 = · · · = an = 0. We
conclude that β is also linearly independent, hence a basis of V . 
The following provides a method for producing a basis for a vector space:
Basis Existence 1.52. If V is a vector space generated by a finite set S ⊆ V , then some subset
of S is a basis for V . In particular, V has a finite basis.
Proof. First, if S = ∅, or S = {0}, then V = span(S) = {0}, and so ∅ is a subset of S which is a
basis for V .
Otherwise, suppose that S contains a vector v ≠ 0. By a previous example, the set {v} is
linearly independent. Now, consider all of the (finitely many) subsets of S, and pick one that is
linearly independent and has largest possible size; say, β = {u1 , . . . , un } ⊆ S. Since
{v} is a linearly independent set of size 1, we know that β ≠ ∅, i.e., n ≥ 1. We claim that β is a
basis for V . By choice of β, we know that β is linearly independent. By Lemma 1.30(2), it suffices
to show that S ⊆ span(β) (because if the subspace span(β) contains S, then it contains span(S) = V ,
so span(β) = V ). Let v ∈ S. If v ∈ β, then v ∈ span(β). Otherwise, if v ∉ β, then β ∪ {v} is
linearly dependent (because β is a maximal linearly independent subset of S), so v ∈ span(β) by
Proposition 1.44. Thus S ⊆ span(β). 
A consequence of the above lemma is that every finite generating set can be reduced to a basis for
V by removing the “redundant” vectors. Vector spaces that are not generated by a finite subset
still have a basis (assuming the Axiom of Choice), but this is more complicated and we will not
discuss this in this class since our main focus is on finitely generated vector spaces.
The following useful and technical result says that a linearly independent set L is always at most
as big as a generating set G, and that L can be “completed” to a generating set in an efficient way
using vectors from G.
Replacement Lemma 1.53. Let V be a vector space generated by a subset G ⊆ V with |G| = n,
and let L ⊆ V be a linearly independent set such that |L| = m. Then:
(1) m ≤ n, and
(2) there exists H ⊆ G with |H| = n − m such that L ∪ H generates V .
Proof. We will prove this by induction on m ∈ N ∪ {0}. For the base case m = 0, necessarily L = ∅,
m = 0 ≤ n, and H := G has the property |H| = n − m = n − 0 = n and L ∪ H = G generates V .
Next, suppose that the result is true for a certain m ≥ 0. We will prove it for m + 1. Let
L = {v1 , . . . , vm+1 } ⊆ V be linearly independent, so |L| = m + 1. By Lemma 1.43, the set
{v1 , . . . , vm } is also linearly independent. By the inductive hypothesis, we have m ≤ n and there is
a subset {u1 , . . . , un−m } ⊆ G such that {v1 , . . . , vm }∪{u1 , . . . , un−m } generates V . Since vm+1 ∈ V ,
this means there are scalars a1 , . . . , am , b1 , . . . , bn−m ∈ F such that
(∗) a1 v1 + · · · + am vm + b1 u1 + · · · + bn−m un−m = vm+1
Note that in the above expression, we have n − m > 0 (so n ≥ m + 1), and also bi ≠ 0 for
some i ∈ {1, . . . , n − m}, for otherwise we would be able to write vm+1 as a linear combination of
v1 , . . . , vm , contradicting the assumption that v1 , . . . , vm+1 is linearly independent. By rearranging
the bi ’s and ui ’s, we may assume that b1 ≠ 0. Thus we can solve for u1 in (∗):
(∗∗) u1 = (−a1 /b1 )v1 + · · · + (−am /b1 )vm + (1/b1 )vm+1 + (−b2 /b1 )u2 + · · · + (−bn−m /b1 )un−m
Now let H = {u2 , . . . , un−m }, so |H| = (n − m) − 2 + 1 = n − (m + 1). By (∗∗) we have
u1 ∈ span(L ∪ H), so {v1 , . . . , vm , u1 , . . . , un−m } ⊆ span(L ∪ H). As {v1 , . . . , vm , u1 , . . . , un−m }
generates V , span(L ∪ H) = V by Lemma 1.30 (because span(L ∪ H) is a subspace which con-
tains {v1 , . . . , vm , u1 , . . . , un−m }, so it must contain span({v1 , . . . , vm , u1 , . . . , un−m }) = V ). Thus the
lemma is true for m + 1. 
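To see the Replacement Lemma in a small concrete case: take V = R2 , G = {(1, 0), (0, 1)} (so n = 2),
and the linearly independent set L = {(1, 1)} (so m = 1). Then m ≤ n, and we may take
H = {(0, 1)} ⊆ G with |H| = n − m = 1: the set L ∪ H = {(1, 1), (0, 1)} generates R2 , since
(a, b) = a(1, 1) + (b − a)(0, 1) for every (a, b) ∈ R2 .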
For vector spaces with a finite basis, every basis has the same number of elements:
Corollary 1.54. Let V be a vector space with a finite basis. Then there is a number n ∈ N ∪ {0}
such that for every basis β of V , |β| = n.
Proof. By assumption, there is a finite basis γ of V . Define n := |γ|. Suppose β is an arbitrary
basis of V . Recall that a basis of V is both linearly independent and generates V . Then since β is
linearly independent and γ generates V , we have |β| ≤ |γ| = n by the Replacement Lemma 1.53(1).
Likewise, since γ is linearly independent and β generates V , we also have n = |γ| ≤ |β|. We
conclude that |β| = n. 
Definition 1.55. Given a vector space V , if V has a finite basis then we say that V is finite-
dimensional and we define dim(V ) to be the unique number n from Corollary 1.54, i.e.,
dim(V ) = size of any basis of V .
If V does not have a finite basis, then we say that V is infinite-dimensional, and we write
dim(V ) = ∞.
Example 1.56. (1) dim({0}) = 0, ∅ is a basis, and |∅| = 0,
(2) dim(F n ) = n, the standard basis {e1 , . . . , en } contains n vectors,
(3) dim(Mm×n (F )) = mn, the basis {E ij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} has mn vectors,
(4) dim(Pn (F )) = n + 1, the set {1, x, . . . , xn } is a basis and has n + 1 vectors in it.
Example 1.57. The vector space P (F ) has an infinite basis {1, x, x2 , . . .}. It follows from the
Replacement Lemma 1.53 that P (F ) cannot also have a finite basis. Indeed, assume towards a
contradiction that β is a finite basis of P (F ), say |β| = n. Then the set L = {1, x, . . . , xn } is linearly
independent and |L| = n + 1. The Replacement Lemma then implies that |L| = n + 1 ≤ |β| = n
(since β generates P (F )), a contradiction.
We conclude that P (F ) is infinite-dimensional.
Corollary 1.58. Let V be a vector space of dimension n. Then
(a) Every linearly independent subset of V with n elements is a basis for V .
(b) Every linearly independent subset of V can be extended to a basis for V .
Proof. Let β be a basis for V . In particular, |β| = n. Then for (a) we suppose L ⊆ V is linearly
independent with |L| = n. Then by the Replacement Lemma 1.53, there is H ⊆ β such that
|H| = n − n = 0 and L ∪ H = L generates V . Thus H = ∅ and L generates V , so L is a basis for V .
For (b), suppose L is a linearly independent subset of V such that |L| = m. By the Replacement
Lemma 1.53 there is H ⊆ β such that |H| = n − m and L ∪ H generates V . Note that |L ∪ H| ≤ n, and
by Basis Existence 1.52, there is S ⊆ L ∪ H which is a basis of V . Since |S| = n by Corollary 1.54,
we must have S = L ∪ H, i.e., L ∪ H is a basis for V . 
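For example, in R3 the linearly independent set L = {(1, 1, 0)} can be extended to a basis using
vectors from the standard basis: the set {(1, 1, 0), (0, 1, 0), (0, 0, 1)} is linearly independent (if
a(1, 1, 0) + b(0, 1, 0) + c(0, 0, 1) = (a, a + b, c) = (0, 0, 0), then a = b = c = 0), and since it has
3 = dim(R3 ) elements, it is a basis for R3 by part (a).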
Dimension Monotonicity 1.59. Let W be a subspace of a vector space V with dim(V ) < ∞.
Then dim(W ) ≤ dim(V ). Moreover, if dim(W ) = dim(V ), then V = W .
Proof. Let β be a basis for W . Then since β is a linearly independent subset of W , hence also V , it
can be extended to a basis γ of V by Corollary 1.58. In particular, dim(W ) = |β| ≤ dim(V ) = |γ|.
In the case where dim(W ) = dim(V ), then necessarily β = γ, and so β is a basis for V as well.
Thus W = span(β) = V , since β is a basis for both V and W . 
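As an illustration: if W is a subspace of R2 , then dim(W ) ∈ {0, 1, 2}. If dim(W ) = 0, then W = {0};
if dim(W ) = 1, then W = span({v}) for any nonzero v ∈ W (a line through the origin); and if
dim(W ) = 2, then W = R2 by the “moreover” part of the statement.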
Corollary 1.60. Suppose W is a subspace of a vector space V , with dim(V ) < ∞. Then any basis
for W can be extended to a basis for V .
Proof. If S ⊆ W is a basis for W , it is a linearly independent subset of W and V , hence can be
extended to a basis for V . 

2. Linear transformations
The primary objects of study in linear algebra are linear transformations. Here is the definition:
Definition 2.1. Let V and W be vector spaces over F . A function T : V → W is a linear
transformation from V to W if, for every x, y ∈ V and c ∈ F :
(1) T (x + y) = T (x) + T (y)
(2) T (cx) = cT (x)
2.1. Basic properties of linear transformations.
Lemma 2.2. Let T : V → W be a linear transformation. Then
(1) T (0) = 0,
(2) T (cx + y) = cT (x) + T (y) for all x, y ∈ V and c ∈ F (in fact, this holds if and only if T is
linear),
(3) T (x − y) = T (x) − T (y),
(4) T (∑_{i=1}^n ai xi ) = ∑_{i=1}^n ai T (xi ), for all x1 , . . . , xn ∈ V and a1 , . . . , an ∈ F .
Example 2.3. Define T : Mm×n (F ) → Mn×m (F ) by T (A) = At , where At is the transpose of A.
Then T is a linear transformation.
Example 2.4. For n ≥ 1, let T : Pn (R) → Pn−1 (R) be defined by T (f (x)) = f ′ (x), where f ′ (x)
denotes the derivative of f (x). We will show that T is linear: let g(x), h(x) ∈ Pn (R) and a ∈ R be
arbitrary. Then
T (ag(x) + h(x)) = (ag(x) + h(x))′ = ag ′ (x) + h′ (x) = a · T (g(x)) + T (h(x)).
Example 2.5. Let V = C(R), the vector space of continuous real-valued functions on R. Let
a, b ∈ R be two fixed real numbers such that a < b. Define T : V → R (where R is the vector space
R1 over R) by:
T (f (x)) := ∫_a^b f (t) dt
for all functions f ∈ V (this definition uses the fact that continuous functions are integrable,
something which is proved in math131a). Then T is linear:
T (ag(t) + h(t)) = ∫_a^b (ag(t) + h(t)) dt = a ∫_a^b g(t) dt + ∫_a^b h(t) dt = a · T (g) + T (h).
2.2. Null space and range.
Definition 2.6. Let V and W be vector spaces over F , and T : V → W be linear.
(1) Let N (T ) := {x ∈ V : T (x) = 0}, the null space (or kernel) of T ,
(2) Let R(T ) := {T (x) : x ∈ V }, the range (or image) of T .
Example 2.7. Let V and W be vector spaces over F . Then
(1) We define I : V → V by I(x) = x for all x ∈ V . We call I the identity transformation.
I is a linear transformation, N (I) = {0} and R(I) = V .
(2) We define T0 : V → W by T0 (x) = 0 for all x ∈ V . We call T0 the zero transformation.
Then T0 is linear, N (T0 ) = V and R(T0 ) = {0}.
Proposition 2.8. Let V, W be vector spaces over F and suppose T : V → W is a linear transfor-
mation. Then
(1) N (T ) is a subspace of V , and
(2) R(T ) is a subspace of W .
Proof. (1) We will show that N (T ) ⊆ V is a subspace of V . First, note that T (0) = 0, and
so 0 ∈ N (T ). Next, let x, y ∈ N (T ). Then T (x + y) = T (x) + T (y) = 0 + 0 = 0. Thus
x + y ∈ N (T ). Finally, suppose x ∈ N (T ) and a ∈ F . Then T (ax) = a · T (x) = a · 0 = 0.
Thus ax ∈ N (T ). We conclude that N (T ) is a subspace of V .
(2) We will show that R(T ) ⊆ W is a subspace of W . First, note that T (0) ∈ R(T ) and
T (0) = 0. Thus 0 ∈ R(T ). Next, suppose w1 , w2 ∈ R(T ). Then there exists v1 , v2 ∈ V
such that T (v1 ) = w1 and T (v2 ) = w2 . Then T (v1 + v2 ) = T (v1 ) + T (v2 ) = w1 + w2 . Thus
w1 + w2 ∈ R(T ). Finally, suppose c ∈ F and w ∈ R(T ). Then there is v ∈ V such that
T (v) = w. Then T (cv) = cT (v) = cw, and so cw ∈ R(T ). We conclude that R(T ) is a
subspace of W . 
Proposition 2.9. Let V, W be vector spaces over F and assume T : V → W is a linear transfor-
mation. If β = {v1 , . . . , vn } is a basis for V , then

R(T ) = span(T (β)) = span({T (v1 ), . . . , T (vn )}).
Proof. Clearly T (vi ) ∈ R(T ) for each i = 1, . . . , n. As R(T ) is a subspace of W ,
span({T (v1 ), . . . , T (vn )}) ⊆ R(T )
by Lemma 1.30(2). Now, suppose w ∈ R(T ); then w = T (v) for some v ∈ V . Since β is a basis for
V , we have scalars a1 , . . . , an ∈ F such that v = ∑_{i=1}^n ai vi . Since T is linear,
w = T (v) = ∑_{i=1}^n ai T (vi ) ∈ span({T (v1 ), . . . , T (vn )}) = span(T (β)).
Thus R(T ) ⊆ span(T (β)).


Definition 2.10. Let V, W be vector spaces over F and T : V → W a linear transformation. If
N (T ) and R(T ) are finite-dimensional then we define

nullity(T ) := dim(N (T ))
rank(T ) := dim(R(T )).

Intuitively, if N (T ) is very large (i.e., T sends many vectors from V to 0), then R(T ) should be
small (not too many vectors in W are obtained as images of vectors in V under T ), and vice-versa.
The following important theorem makes this precise:
Dimension Theorem 2.11. Let T : V → W be a linear transformation. If dim(V ) < ∞, then
nullity(T ) + rank(T ) = dim(V ).
Proof. Suppose that dim(V ) = n, dim(N (T )) = k, and {v1 , . . . , vk } is a basis for N (T ). By
Corollary 1.60, we can extend {v1 , . . . , vk } to a basis β = {v1 , . . . , vk , vk+1 , . . . , vn } of V .
Claim. S := {T (vk+1 ), . . . , T (vn )} is a basis for R(T ).
Proof. First, we will show that S generates R(T ). As T (vi ) = 0 for i = 1, . . . , k, by Proposition 2.9,
we have
 
R(T ) = span({T (v1 ), . . . , T (vk ), T (vk+1 ), . . . , T (vn )}) = span({T (vk+1 ), . . . , T (vn )}),
since T (v1 ) = · · · = T (vk ) = 0.
Second, we will show that S is linearly independent. Suppose ∑_{i=k+1}^n bi T (vi ) = 0 for some scalars
bk+1 , . . . , bn ∈ F . Since T is linear, we have T (∑_{i=k+1}^n bi vi ) = 0, so ∑_{i=k+1}^n bi vi ∈ N (T ). Thus,
there exist c1 , . . . , ck ∈ F such that ∑_{i=1}^k ci vi = ∑_{i=k+1}^n bi vi , or rather,
∑_{i=1}^k (−ci )vi + ∑_{i=k+1}^n bi vi = 0.
Since β is a basis for V , this implies that every ci = 0 and every bi = 0. This implies that S is
linearly independent.
Now counting the sizes of the relevant spaces gives
dim(V ) = #{v1 , . . . , vn } = n
dim(N (T )) = #{v1 , . . . , vk } = k
dim(R(T )) = #{T (vk+1 ), . . . , T (vn )} = n − (k + 1) + 1 = n − k.
Thus the formula in the statement of the theorem is true. 
Example 2.12. (1) Let T : F n → F n−1 be defined by T ((a1 , . . . , an )) := (a1 , . . . , an−1 ) (so T
“forgets” the nth component). Then T is a linear transformation,
N (T ) = {(0, . . . , 0, an ) : an ∈ F } (the first n − 1 entries are 0), and R(T ) = F n−1 .
In this situation, dim(F n ) = n, dim(N (T )) = 1, and dim(R(T )) = dim(F n−1 ) = n − 1.
Note that these dimensions add up correctly as predicted by the Dimension Theorem 2.11.
(2) Let T : Pn (R) → Pn−1 (R) be the differentiation linear transformation, i.e., T (p(x)) = p′ (x)
for every polynomial p(x) ∈ Pn (R). Then T (p(x)) = 0 iff p′ (x) = 0 iff p(x) is a constant
polynomial (i.e., has degree 0 or −1). Thus
N (T ) = {constant polynomials in Pn (R)}.
Recall that {1, x, . . . , xn−1 } is a basis for Pn−1 (R). Since 1 = T (x), x = (1/2)T (x2 ), . . . , xn−1 =
(1/n)T (xn ), it follows that Pn−1 (R) = span({T (x), . . . , T (xn )}) = R(T ). Thus dim(Pn (R)) =
n + 1, dim(R(T )) = n and dim(N (T )) = 1, and we recognize that the dimensions add up
correctly in this example as well.
Definition 2.13. Let T : V → W be a linear transformation. We say that
(1) T is one-to-one (or injective) if for all u, v ∈ V , if T (u) = T (v), then u = v,
(2) T is onto (or surjective) if for every w ∈ W there exists some v ∈ V such that T (v) = w,
and
(3) T is bijective if T is both one-to-one and onto.
The null space allows us to detect the “one-to-oneness” of a linear transformation:
One-to-one Criterion 2.14. Let T : V → W be a linear transformation. Then T is one-to-one
if and only if N (T ) = {0}.
Proof. (⇒) Suppose T is one-to-one, and let x ∈ N (T ). Then T (x) = 0 = T (0). By the definition
of one-to-one, this implies that x = 0 ∈ {0}. Thus N (T ) = {0}.
(⇐) Suppose N (T ) = {0}. We want to show that T is one-to-one. Let u, v ∈ V be such
that T (u) = T (v). Consider the vector u − v ∈ V . Applying T and using linearity we get
T (u − v) = T (u) − T (v) = 0, and thus u − v ∈ N (T ). By assumption N (T ) = {0} and thus
u − v = 0, i.e., u = v. We conclude that T is one-to-one. 
Proposition 2.15. Let T : V → W be a linear transformation, and suppose dim(V ) = dim(W ) <
∞. Then the following are equivalent:
(1) T is one-to-one,
(2) T is onto,
(3) T is bijective (i.e., one-to-one and onto),
(4) dim(R(T )) = dim(V ).
Proof. Recall that by the Dimension Theorem 2.11, we have dim(N (T )) + dim(R(T )) = dim(V ).
Now note that
T is one-to-one ⇐⇒ N (T ) = {0}, by One-to-one Criterion 2.14
⇐⇒ dim(N (T )) = 0
⇐⇒ dim(R(T )) = dim(V ) by the Dimension Theorem 2.11
⇐⇒ dim(R(T )) = dim(W ) by assumption that dim(V ) = dim(W )
⇐⇒ R(T ) = W by Dimension Monotonicity 1.59
⇐⇒ T is onto. 

Example 2.16. (1) Define T : F 2 → F 2 by T (a1 , a2 ) = (a1 + a2 , a1 ). Then N (T ) = {(0, 0)}
(if T (a1 , a2 ) = (0, 0), then a1 = 0 and a1 + a2 = 0, so a2 = 0 as well),
so T is one-to-one. By Proposition 2.15, T is also onto.
(2) Define T : Pn (R) → Rn+1 by T (a0 + a1 x + · · · + an xn ) = (a0 , a1 , . . . , an ). Then T is a linear
transformation and is one-to-one. Thus T is onto, since dim(Pn (R)) = dim(Rn+1 ).
The following shows that a linear transformation is completely determined by the values it takes
on a basis, and conversely, we can define a linear transformation by prescribing values that it takes
on a basis (and we have complete freedom when doing so).
Linear Transformation Prescription 2.17. Let V, W be vector spaces over a field F , and let
{v1 , . . . , vn } be a basis for V . Then for any w1 , . . . , wn ∈ W , there exists exactly one linear trans-
formation T : V → W such that T (vi ) = wi for i = 1, . . . , n.
Proof. Let x ∈ V . Then x = ∑_{i=1}^n ai vi for unique scalars a1 , . . . , an ∈ F (by Proposition 1.51, since
{v1 , . . . , vn } is a basis). Then we define a function T : V → W by
T (x) := ∑_{i=1}^n ai wi .
[T is well-defined because the scalars ai are unique] We have a few things to show:
(a) T is a linear transformation. To show this, let u, v ∈ V and d ∈ F . Then we can write
u = ∑_{i=1}^n bi vi and v = ∑_{i=1}^n ci vi
for some scalars b1 , . . . , bn , c1 , . . . , cn ∈ F . Then
du + v = ∑_{i=1}^n (dbi + ci )vi
and in particular, the scalars dbi + ci are the unique scalars used to represent du + v in terms
of the basis {v1 , . . . , vn }. Applying T to du + v gives
T (du + v) = ∑_{i=1}^n (dbi + ci )wi = d ∑_{i=1}^n bi wi + ∑_{i=1}^n ci wi = dT (u) + T (v).
(b) It is clear that T (vj ) = wj for j = 1, . . . , n, since vj = ∑_{i=1}^n ai vi with aj = 1 and ai = 0 if
i ≠ j.
(c) T is unique. To prove this, assume that U : V → W is a linear transformation with the property
that U (vi ) = wi for i = 1, . . . , n. Then, for x ∈ V with x = ∑_{i=1}^n ai vi we have (using that U is
linear)
U (x) = ∑_{i=1}^n ai U (vi ) = ∑_{i=1}^n ai wi = T (x).
Thus U = T . 

As suggested by the proof, this gives us a way of checking if two linear transformations are actually
the same:
Corollary 2.18. Let V, W be vector spaces over F , and suppose V has a finite basis {v1 , . . . , vn }.
If U, T : V → W are linear transformations and U (vi ) = T (vi ) for i = 1, . . . , n, then U = T (i.e.,
U (x) = T (x) for every x ∈ V ).
Example 2.19. Let T : R2 → R2 be the linear transformation defined by T (a1 , a2 ) = (2a2 −a1 , 3a1 ).
Suppose that U : R2 → R2 is any linear transformation. If we know that U (1, 2) = (3, 3) and
U (1, 1) = (1, 3), then U = T . This follows from the above corollary because {(1, 2), (1, 1)} is a basis
for R2 .
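As a quick check that U and T really agree on this basis: T (1, 2) = (2 · 2 − 1, 3 · 1) = (3, 3) and
T (1, 1) = (2 · 1 − 1, 3 · 1) = (1, 3), which are exactly the prescribed values of U .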
2.3. The matrix representation of a linear transformation.
Definition 2.20. Let V be a finite-dimensional vector space over F . An ordered basis for V is a
basis for V equipped with a specific order.
Example 2.21. (1) In F 3 , β = {e1 , e2 , e3 } and γ = {e3 , e1 , e2 } are both ordered bases for F 3 .
Note that β and γ describe the same (unordered) basis for F 3 , but they are different as
ordered bases because the first basis vector of β is e1 whereas the first basis vector of γ is
e3 .
(2) For the vector space F n over F , we call {e1 , e2 , . . . , en } the standard ordered basis for
F n.
(3) For the vector space Pn (F ) over F , we call {1, x, . . . , xn } the standard ordered basis for
Pn (F ).

We will not stress the precise set-theoretic difference between an ordered basis and an (unordered)
basis. Usually it will be clear when the situation demands that we work with an ordered basis, for
instance, when representing a linear transformation as a matrix as we’ll see below.
Definition 2.22. Let β = {u1 , . . . , un } be an ordered basis for a finite-dimensional vector space
V . For x ∈ V , let a1 , . . . , an ∈ F be the unique scalars (see Proposition 1.51) such that
x = ∑_{i=1}^n ai ui .
We define the coordinate vector of x relative to β, denoted [x]β , by
        [ a1 ]
        [ a2 ]
[x]β := [ ⋮  ] ∈ F n .
        [ an ]
Note that for each i = 1, . . . , n, we have [ui ]β = ei . Furthermore, the correspondence x ↦ [x]β :
V → F n is actually a bijective linear transformation (exercise).
Example 2.23. Let V = P2 (R), and let β = {1, x, x2 } be the standard ordered basis for V . Let
f (x) = 4 + 6x − 7x2 ∈ V . Then
            [  4 ]
[f (x)]β =  [  6 ] .
            [ −7 ]
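Note that the order of the basis matters here: with respect to the ordered basis {x2 , x, 1} (the same
set of vectors in a different order), the coordinate vector of the same f (x) has entries −7, 6, 4 (from
top to bottom) instead.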
Definition 2.24. Suppose V, W are finite-dimensional vector spaces, with ordered bases β =
{v1 , . . . , vn } and γ = {w1 , . . . , wm }, respectively. Suppose T : V → W is a linear transfor-
mation. Then for each j = 1, . . . , n, there exist unique scalars (again, by Proposition 1.51)
a1j , a2j , . . . , amj ∈ F such that
T (vj ) = ∑_{i=1}^m aij wi for j = 1, . . . , n.
We call the m × n matrix A defined by Aij = aij the matrix representation of T in the ordered
bases β and γ, and we write A = [T ]γβ . In other words:
              [ a11  a12  · · ·  a1n ]
[T ]γβ := A = [ a21  a22  · · ·  a2n ]
              [  ⋮    ⋮           ⋮  ]
              [ am1  am2  · · ·  amn ]
In the special case where V = W and β = γ, we write just A = [T ]β . Note that
(1) for j = 1, . . . , n, the jth column of A is [T (vj )]γ (the coordinate vector of T (vj ) relative
to γ),
(2) If U : V → W is a linear transformation such that [U ]γβ = [T ]γβ , then U = T by Corol-
lary 2.18.
(3) In practice, [T ]γβ gives an explicit way to describe T which is very useful for computations.
Example 2.25. Let T : R2 → R3 be the linear transformation given by T (a1 , a2 ) := (a1 +
3a2 , 0, 2a1 − 4a2 ). Let β = {e1 , e2 } and γ = {e1 , e2 , e3 } be the standard ordered bases for R2 and
R3 , respectively. Then we compute:
T (1, 0) = (1, 0, 2) = 1e1 + 0e2 + 2e3
T (0, 1) = (3, 0, −4) = 3e1 + 0e2 − 4e3
and so
         [ 1   3 ]
[T ]γβ = [ 0   0 ]
         [ 2  −4 ]
If instead we consider the ordered basis γ ′ = {e3 , e2 , e1 }, then
          [ 2  −4 ]
[T ]γ′β = [ 0   0 ] .
          [ 1   3 ]
We have now given a method of associating to a linear transformation T : V → W a certain


matrix [T ]γβ . We will now show that this process of association faithfully preserves all of the “linear
algebra” going on with T : V → W . We will make this more precise.
Definition 2.26. Let T, U : V → W be functions, where V, W are vector spaces over F , and let
a ∈ F . We define two new functions:
• T + U : V → W by (T + U )(x) := T (x) + U (x) for all x ∈ V , and
• aT : V → W by (aT )(x) := a · T (x), for all x ∈ V .
Proposition 2.27. Let V, W be vector spaces over F , and suppose T, U : V → W are linear
transformations. Then
(1) for every a ∈ F , aT + U : V → W is a linear transformation.
(2) the set of all linear transformations from V to W is a vector space over F (with the opera-
tions given in Definition 2.26).
Proof. (1) We need to show that aT + U is a linear transformation. To do this, let x, y ∈ V
and c ∈ F be arbitrary. Then note that

(aT + U )(cx + y) = (aT )(cx + y) + U (cx + y) = a · T (cx + y) + cU (x) + U (y)
= a(cT (x) + T (y)) + cU (x) + U (y) = acT (x) + cU (x) + aT (y) + U (y)
= c(aT + U )(x) + (aT + U )(y),
and so the map aT + U is linear.
(2) The zero vector is the linear transformation T0 : V → W defined by T0 (x) = 0 for all x ∈ V .
Verifying the axioms (VS1)-(VS8) is routine. 
Definition 2.28. For vector spaces V, W over F , we denote
L(V, W ) := {T : T is a linear transformation from V to W },
which is a vector space over F . In the special case where V = W , then we write L(V ) instead of
L(V, W ).
2.4. Algebraic description of the operations in L(V, W ). We have shown that every linear
transformation V → W can be represented by a matrix. We will now show that the operations of
pointwise addition and scalar multiplication on L(V, W ) correspond to matrix addition and scalar
multiplication of their matrix representations:
Proposition 2.29. Let V and W be finite-dimensional vector spaces with ordered bases β and γ,
respectively. Let T, U : V → W be linear transformations. Then:
(1) [T + U ]γβ = [T ]γβ + [U ]γβ
(2) [aT ]γβ = a[T ]γβ , for all scalars a ∈ F .
Note: the operations on the right side of these equations are operations on matrices (adding two
matrices and multiplying a matrix by a scalar a ∈ F ).
Proof. (1) Let β = {v1 , . . . , vn } and γ = {w1 , . . . , wm }. Then there exist unique scalars aij and
bij , for 1 ≤ i ≤ m, 1 ≤ j ≤ n such that
T (vj ) = ∑_{i=1}^m aij wi and U (vj ) = ∑_{i=1}^m bij wi , for 1 ≤ j ≤ n.
Thus
(T + U )(vj ) = ∑_{i=1}^m (aij + bij )wi .
Thus, for the matrix [T + U ]γβ we have
([T + U ]γβ )ij = aij + bij = ([T ]γβ + [U ]γβ )ij .
(2) Similar. 
Example 2.30. Let T, U : R2 → R3 be defined by
T (a1 , a2 ) := (a1 + 3a2 , 0, 2a1 − 4a2 )
U (a1 , a2 ) := (a1 − a2 , 2a1 , 3a1 + 2a2 ).
Let β, γ be the standard ordered bases for R2 and R3 , respectively. Then
   
         [ 1   3 ]                [ 1  −1 ]
[T ]γβ = [ 0   0 ]  and  [U ]γβ = [ 2   0 ]
         [ 2  −4 ]                [ 3   2 ]
Applying the definition, we also have
(T + U )(a1 , a2 ) = (a1 + 3a2 , 0, 2a1 − 4a2 ) + (a1 − a2 , 2a1 , 3a1 + 2a2 ) = (2a1 + 2a2 , 2a1 , 5a1 − 2a2 ).
Thus
              [ 2   2 ]
[T + U ]γβ =  [ 2   0 ]  = [T ]γβ + [U ]γβ ,
              [ 5  −2 ]
as predicted by the above proposition.
2.5. Composition of linear transformations and matrix multiplication.
Definition 2.31. Let T : V → W and U : W → Z be two linear transformations of vector spaces.
The composition of T and U , denoted U T : V → Z, is a function from V to Z defined by
(U T )(x) := U (T (x)) for every x ∈ V .
The composition of two linear transformations is also a linear transformation:
Lemma 2.32. Suppose T : V → W and U : W → Z are linear transformations. Then U T : V → Z
is also a linear transformation.
Proof. Let x, y ∈ V and a ∈ F be arbitrary. Then
U T (ax + y) = U (T (ax + y)) def. of composition
= U (aT (x) + T (y)) because T is linear
= aU (T (x)) + U (T (y)) because U is linear
= a · U T (x) + U T (y) def. of composition. 
Here are some properties of linear transformations which we state without proof:
Proposition 2.33. Let V be a vector space over F and suppose T, U1 , U2 ∈ L(V ). Then
(1) T (U1 + U2 ) = T U1 + T U2 and (U1 + U2 )T = U1 T + U2 T ,
(2) T (U1 U2 ) = (T U1 )U2 ,
(3) T I = IT = T , where I : V → V is the identity transformation, and
(4) a(U1 U2 ) = (aU1 )U2 = U1 (aU2 ) for every a ∈ F .
Now assume that V, W, Z are vector spaces over F , and let α = {v1 , . . . , vp }, β = {w1 , . . . , wn },
γ = {z1 , . . . , zm } be ordered bases for V, W, and Z, respectively. Furthermore, let T : V → W and
U : W → Z be linear transformations. Then as before we can represent T and U as matrices with
respect to these bases. Let:
A := [U ]γβ and B := [T ]βα
be their matrix representations.
We also have the linear transformation UT : V → Z, so we can compute [UT]^γ_α. For
1 ≤ j ≤ p, we have
(UT)(v_j) = U(T(v_j)) = U(Σ_{k=1}^{n} B_{kj} w_k) = Σ_{k=1}^{n} B_{kj} U(w_k)
= Σ_{k=1}^{n} B_{kj} Σ_{i=1}^{m} A_{ik} z_i = Σ_{i=1}^{m} (Σ_{k=1}^{n} A_{ik} B_{kj}) z_i = Σ_{i=1}^{m} C_{ij} z_i,
where C_{ij} := Σ_{k=1}^{n} A_{ik} B_{kj}. Thus [UT]^γ_α = C = (C_{ij})_{1≤i≤m, 1≤j≤p}.
This computation motivates the usual definition of matrix multiplication.
Definition 2.34. Let A be an m × n matrix and B an n × p matrix (both with coefficients in F ).
We define the product of A and B, denoted AB, to be the m × p matrix AB with the property
that for 1 ≤ i ≤ m and 1 ≤ j ≤ p, the ij-entry is
(AB)_{ij} := Σ_{k=1}^{n} A_{ik} B_{kj}.
Example 2.35. (1) We have the following multiplication (of matrices with coefficients in R):
[ 1  2   1 ]   [ 4 ]   [ 1·4 + 2·2 + 1·5    ]   [ 13 ]
[ 0  4  −1 ] · [ 2 ] = [ 0·4 + 4·2 + (−1)·5 ] = [  3 ].
               [ 5 ]
(2) In general, matrix multiplication is NOT commutative:
           
[ 1  1 ]   [ 0  1 ]   [ 1  1 ]        [ 0  1 ]   [ 1  1 ]   [ 0  0 ]
[ 0  0 ] · [ 1  0 ] = [ 0  0 ],  but  [ 1  0 ] · [ 0  0 ] = [ 1  1 ].
Thus, in general you should expect that AB 6= BA for square matrices A and B. (Occa-
sionally we have AB = BA, but this is a rather special situation).
(3) Recall the definition of the transpose of a matrix A ∈ Mm×n (F ): At ∈ Mn×m (F ) is the
matrix given by (At )ij = Aji for 1 ≤ i ≤ n, 1 ≤ j ≤ m. We will show that (AB)t = B t At for
matrices A ∈ Mm×n (F ) and B ∈ Mn×p (F ). Indeed, note that for 1 ≤ i ≤ p and 1 ≤ j ≤ m
we have
((AB)^t)_{ij} = (AB)_{ji} = Σ_{k=1}^{n} A_{jk} B_{ki} = Σ_{k=1}^{n} B_{ki} A_{jk} = Σ_{k=1}^{n} (B^t)_{ik} (A^t)_{kj} = (B^t A^t)_{ij}.
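As an aside that is not part of the original notes, the entry formula in Definition 2.34 translates directly into code. The following sketch uses Python with numpy (a tool the course does not assume); the helper name mat_mult is made up for illustration. It recomputes the product from part (1) and checks the transpose identity from part (3).

import numpy as np

def mat_mult(A, B):
    # implements (AB)_ij = sum_k A_ik * B_kj from Definition 2.34
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must agree"
    AB = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            AB[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return AB

A = np.array([[1, 2, 1], [0, 4, -1]])
B = np.array([[4], [2], [5]])
print(mat_mult(A, B))                                     # [[13.], [3.]], as in part (1)
print(np.allclose(mat_mult(A, B).T, mat_mult(B.T, A.T)))  # True: (AB)^t = B^t A^t, as in part (3)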

We can now relate composition of linear transformations with matrix multiplication:


Proposition 2.36. Let V, W and Z be finite dimensional vector spaces with ordered bases α, β,
and γ respectively (as above), and suppose T : V → W and U : W → Z are linear transformations.
Then
[U T ]γα = [U ]γβ [T ]βα .
[Note that the left side is the matrix representation of the linear transformation U T , whereas the
right side is the matrix multiplication of two matrices.]
Proof. Note that no proof is actually needed since we conveniently defined the ij-entry of the right
side to be the ij-entry of the left side. See our above motivating calculation to see what the ij-entry
of the left side is. 

In the case where all vector spaces are the same and all ordered bases are the same this simplifies
to:
Corollary 2.37. Let V be a finite-dimensional vector space with an ordered basis β. Let T, U ∈
L(V ). Then
[U T ]β = [U ]β [T ]β .
Example 2.38. Let U : P_3(R) → P_2(R) and T : P_2(R) → P_3(R) be the linear transformations
defined by U(f(x)) := f′(x) and T(f(x)) := ∫_0^x f(t) dt. Let α = {1, x, x², x³} and β = {1, x, x²} be
the standard ordered bases of P3 (R) and P2 (R) respectively. Then we have
U (1) = 0 = 0 · 1 + 0 · x + 0 · x2
U (x) = 1 = 1 · 1 + 0 · x + 0 · x2
U (x2 ) = 2x = 0 · 1 + 2 · x + 0 · x2
U (x3 ) = 3x2 = 0 · 1 + 0 · x + 3 · x2
and so  
           [ 0  1  0  0 ]
[U]^β_α =  [ 0  0  2  0 ]
           [ 0  0  0  3 ]
We also have
T (1) = x = 0 · 1 + 1 · x + 0 · x2 + 0 · x3
T(x) = (1/2)x² = 0 · 1 + 0 · x + (1/2) · x² + 0 · x³
T(x²) = (1/3)x³ = 0 · 1 + 0 · x + 0 · x² + (1/3) · x³
and so  
           [ 0    0    0   ]
[T]^α_β =  [ 1    0    0   ]
           [ 0   1/2   0   ]
           [ 0    0   1/3  ]
Thus  
                           [ 1  0  0 ]
[UT]_β = [U]^β_α [T]^α_β = [ 0  1  0 ] = [I]_β,
                           [ 0  0  1 ]
where I : P2 (R) → P2 (R) is the identity transformation. This agrees with the fundamental theorem
of calculus (that differentiation is the inverse operation to integration).
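As a quick numerical aside (not from the notes), the two matrices of Example 2.38 can be multiplied in Python with numpy, which the course does not assume, to confirm the computation above:

import numpy as np

U = np.array([[0., 1., 0., 0.],      # matrix of differentiation U(f) = f', from P_3(R) to P_2(R)
              [0., 0., 2., 0.],
              [0., 0., 0., 3.]])
T = np.array([[0.,  0.,   0. ],      # matrix of T(f) = integral from 0 to x of f, from P_2(R) to P_3(R)
              [1.,  0.,   0. ],
              [0., 1/2.,  0. ],
              [0.,  0.,  1/3.]])
print(U @ T)   # the 3 x 3 identity matrix, i.e. [UT]_beta = [I]_beta
print(T @ U)   # NOT the identity: it represents f -> f - f(0), so constants are lost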
Definition 2.39. The n × n identity matrix In (over a field F ) is defined by
(I_n)_{ij} := 1 if i = j, and (I_n)_{ij} := 0 if i ≠ j,
so for instance,
                      [ 1  0 ]         [ 1  0  0 ]
I_1 = [ 1 ],   I_2 =  [ 0  1 ],  I_3 = [ 0  1  0 ]
                                       [ 0  0  1 ]
Here are some basic properties of matrix multiplication which we state without proof:
Proposition 2.40. Let A ∈ Mm×n (F ), B, C ∈ Mn×p (F ), and D, E ∈ Mq×m (F ). Then
(1) A(B + C) = AB + AC and (D + E)A = DA + EA,
(2) a(AB) = (aA)B = A(aB) for every scalar a ∈ F ,
(3) Im A = A = AIn , and
(4) if dim(V ) = n and I : V → V is the identity transformation, then for every ordered basis β
of V , [I]β = In .
2.6. Calculating the value of a linear transformation using its matrix representation.
The following shows that by using appropriate representations, we can reduce the action of applying
a linear transformation to that of matrix multiplication. In some sense, this shows that much of
finite-dimensional linear algebra can be reduced to the study of matrices (like in math33a).
Proposition 2.41. Let T : V → W be a linear transformation such that V and W are finite-
dimensional vector spaces with ordered bases β and γ, respectively. Then, for each v ∈ V we
have
[T(v)]_γ = [T]^γ_β [v]_β.
 

Proof. Suppose β = {v1 , . . . , vn } and γ = {w1 , . . . , wm } are our ordered bases for V and W . Let
v ∈ V be arbitrary. Then v = a1 v1 + · · · + an vn for a unique choice of scalars a1 , . . . , an ∈ F . Thus
 
         [ a_1 ]
[v]_β =  [  ⋮  ]
         [ a_n ]
Now let B = [T]^γ_β. Then
T(v) = a_1 T(v_1) + · · · + a_n T(v_n) = Σ_{j=1}^{n} a_j T(v_j)        (because T is linear)
     = Σ_{j=1}^{n} a_j (Σ_{i=1}^{m} B_{ij} w_i)                        (definition of B)
     = Σ_{i=1}^{m} (Σ_{j=1}^{n} a_j B_{ij}) w_i                        (rearranging summation)
Thus
            [ Σ_{j=1}^{n} a_j B_{1j} ]       [ a_1 ]
[T(v)]_γ =  [           ⋮           ] = B · [  ⋮  ],
            [ Σ_{j=1}^{n} a_j B_{mj} ]       [ a_n ]
as desired. 
Example 2.42. Let T : P_3(R) → P_2(R) be given by T(f(x)) := f′(x). Then with β, γ the standard
ordered bases of P_3(R) and P_2(R), we have shown previously that
           [ 0  1  0  0 ]
[T]^γ_β =  [ 0  0  2  0 ]
           [ 0  0  0  3 ]
Suppose p(x) = 2 − 4x + x2 + 3x3 . Then we compute directly that T (p(x)) = p0 (x) = −4 + 2x + 9x2 .
Representing this as a coordinate vector, we get
                           [ −4 ]
[T(p(x))]_γ = [p′(x)]_γ =  [  2 ].
                           [  9 ]
We can also arrive at this via Proposition 2.41 by multiplying matrices:
[T]^γ_β [p(x)]_β =
[ 0  1  0  0 ]   [  2 ]   [ −4 ]
[ 0  0  2  0 ] · [ −4 ] = [  2 ].
[ 0  0  0  3 ]   [  1 ]   [  9 ]
                 [  3 ]
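As an illustrative aside (not part of the notes), the same matrix-vector product can be checked numerically in Python with numpy, which is assumed only for this sketch:

import numpy as np

T = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]])     # [T]^gamma_beta for differentiation
p = np.array([2, -4, 1, 3])      # [p(x)]_beta for p(x) = 2 - 4x + x^2 + 3x^3
print(T @ p)                     # [-4  2  9], the coordinates of p'(x) = -4 + 2x + 9x^2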
2.7. Associating a linear transformation to a matrix. We have just seen that a linear trans-
formation can be represented by a matrix. We will now go in the reverse direction and associate
to a matrix a linear transformation. Remember, a priori a matrix is just a rectangular array of
scalars (a static object), it doesn’t do anything. We’ll show there is a natural way to associate a
linear transformation (a dynamic object) to a matrix.
Definition 2.43. Let A ∈ Mm×n (F ). We denote by LA the mapping LA : F n → F m defined by
LA (x) = Ax. Here we regard vectors from F n and F m now as column vectors, and the expression
Ax denotes the multiplication of the m × n matrix A with the n × 1 column vector x. We call LA
a left-multiplication transformation.
 
Example 2.44. Let
     [ 1  2  1 ]
A =  [ 0  1  2 ]  ∈ M_{2×3}(R).
This gives rise to the linear transformation L_A : R³ → R². For example, if x = (1, 3, −1)^t ∈ R³, then
                [ 1  2  1 ]   [  1 ]   [ 6 ]
L_A(x) = Ax  =  [ 0  1  2 ] · [  3 ] = [ 1 ].
                              [ −1 ]
Here are some basic properties of LA :
Proposition 2.45. Let A ∈ Mm×n (F ). Then the function LA : F n → F m is a linear transfor-
mation. Furthermore, if B ∈ Mm×n (F ) and β, γ are the standard ordered bases for F n and F m ,
respectively, then
(1) [L_A]^γ_β = A,
(2) LA = LB if and only if A = B,
(3) LA+B = LA + LB , and LaA = a · LA for every a ∈ F ,
(4) if T : F n → F m is a linear transformation, then there is a unique C ∈ Mm×n (F ) such that
T = LC . In fact, C = [T ]γβ ,
(5) If E ∈ Mn×p (F ), then LAE = LA LE , and
(6) if m = n, then LIn = IF n , where IF n : F n → F n is the identity linear transformation.
Proof. The fact that LA is linear is clear from Proposition 2.40 (using that we can view vectors in
F n as the same thing as matrices in Mn×1 (F )).
(1) Note that the jth column of [LA ]γβ is LA (ej ) = Aej (by definition of the matrix representa-
tion), which is also the jth column of the matrix A. Thus [LA ]γβ = A.
(2) (⇐) is clear. (⇒) Suppose LA = LB . Then from (1), A = [LA ]γβ = [LB ]γβ = B.
(3) Exercise.
(4) Suppose T : F n → F m is a linear transformation and set C := [T ]γβ . Then by Proposi-
tion 2.41, [T (x)]γ = [T ]γβ [x]β for all x ∈ V , so T (x) = Cx = LC (x) for all x ∈ F n . Thus
T = LC . The uniqueness follows from (2).
(5) We first claim that (AE)ej = A(Eej ), for j = 1, . . . , p. Note that these are both m × 1
matrices. For k = 1, . . . , m we have
((AE)e_j)_{k1} = Σ_{l=1}^{p} (AE)_{kl}(e_j)_{l1} = Σ_{l=1}^{p} (Σ_{i=1}^{n} A_{ki}E_{il})(e_j)_{l1} = Σ_{l=1}^{p} Σ_{i=1}^{n} A_{ki}E_{il}(e_j)_{l1}
= Σ_{i=1}^{n} Σ_{l=1}^{p} A_{ki}E_{il}(e_j)_{l1} = Σ_{i=1}^{n} A_{ki}(Σ_{l=1}^{p} E_{il}(e_j)_{l1}) = Σ_{i=1}^{n} A_{ki}(Ee_j)_{i1} = (A(Ee_j))_{k1}.
Thus
LAE (ej ) = (AE)ej = A(Eej ) = LA (Eej ) = LA (LE (ej )) = LA LE (ej ).
Thus LAE = LA LE since they take the same values on a basis, by Corollary 2.18 to Linear
Transformation Prescription 2.17.
(6) Exercise 
The next proposition shows that matrix multiplication is associative:
Proposition 2.46. Let A ∈ Mm×n (F ), B ∈ Mn×p (F ), and C ∈ Mp×r (F ). Then
A(BC) = (AB)C.
Proof. By Proposition 2.45(5) and associativity of composition of functions (Proposition 6.8), we
have
LA(BC) = LA LBC = LA (LB LC ) = (LA LB )LC = LAB LC = L(AB)C .
Thus A(BC) = (AB)C by Proposition 2.45(2).
Note: this can also be proved directly by a computation similar to the one done in the proof of
Proposition 2.45(5). 
2.8. Invertibility. In this subsection, we generalize the notion of invertible square matrix to arbi-
trary linear transformations. We assume the reader is familiar with basic facts about invertibility
of functions.
Definition 2.47. Let V and W be vector spaces, and let T : V → W be a linear transformation.
We say T is invertible, if it is invertible as a function (see appendix).
If T is invertible, then it has an inverse function. The next Proposition tells us that this inverse
function is automatically also a linear transformation:
Proposition 2.48. Let T : V → W be an invertible linear transformation. Then the inverse function
T⁻¹ : W → V is also a linear transformation.
Proof. Let w1 , w2 ∈ W and c ∈ F be arbitrary. Since T is one-to-one and onto (because it is
invertible), there are unique v1 , v2 ∈ V such that T (v1 ) = w1 and T (v2 ) = w2 . Thus v1 = T −1 (w1 )
and v2 = T −1 (w2 ), and so
T −1 (cw1 + w2 ) = T −1 (cT (v1 ) + T (v2 )) = T −1 T (cv1 + v2 )
= IV (cv1 + v2 ) = cv1 + v2 = cT −1 (w1 ) + T −1 (w2 ) 
Example 2.49. Let T : P1 (R) → R2 be the linear transformation defined by T (a+bx) := (a, a+b).
Then T −1 : R2 → P1 (R) is defined by T −1 (c, d) = c + (d − c)x, which is also a linear transformation.
We also have a notion of invertibility for matrices:
Definition 2.50. Let A ∈ Mn×n (F ). We say that A is invertible if there is B ∈ Mn×n (F ) such
that AB = BA = In . We call such a matrix B the inverse of A, and we write B = A−1 . Just like
for functions, inverses for matrices are unique (when they exist). The proof of this uniqueness is
the same.
We will only talk about the invertibility of a matrix when it is a square matrix. All non-square
matrices are automatically not invertible.
   
Example 2.51. The inverse of  [ 1  0 ]  is  [  1  0 ]. Indeed,
                              [ 1  1 ]      [ −1  1 ]
[ 1  0 ] [  1  0 ]   [  1  0 ] [ 1  0 ]   [ 1  0 ]
[ 1  1 ] [ −1  1 ] = [ −1  1 ] [ 1  1 ] = [ 0  1 ].
Lemma 2.52. Let T : V → W be an invertible linear transformation, and suppose dim V < ∞.
Then dim V = dim W .
Proof. Recall that since T is invertible, it is a bijection and so it is one-to-one and onto. Let
β = {x1 , . . . , xn } be a basis for V . By Proposition 2.9, span(T (β)) = R(T ) = W (since T is
onto). Next, since T is one-to-one, we have that N (T ) = {0}, so dim N (T ) = 0. By the Dimension
Theorem, this implies that dim W = dim R(T ) = dim V . 
Invertibility of linear transformations corresponds to invertibility of matrices:
Proposition 2.53. Let V, W be finite-dimensional vector spaces with ordered bases β and γ, re-
spectively. Suppose T : V → W is a linear transformation. Then T is invertible if and only if the
matrix [T ]γβ is invertible. Furthermore, if either of these hold, then [T −1 ]βγ = ([T ]γβ )−1 .

Proof. (⇒) Suppose T is invertible. By Lemma 2.52, we have that both [T ]γβ , [T −1 ]βγ ∈ Mn×n (F ).
Note that
[T −1 ]βγ [T ]γβ = [T −1 T ]β = [IV ]β = In ,
using Proposition 2.36 and Proposition 2.45(4). Likewise, we have [T ]γβ [T −1 ]βγ = In . Thus [T ]γβ is
an invertible matrix, and its inverse is ([T]^γ_β)⁻¹ = [T⁻¹]^β_γ.
(⇐) Now suppose A = [T ]γβ is invertible, say with inverse B ∈ Mn×n (F ), so AB = BA = In .
By Linear Transformation Prescription 2.17, there is U ∈ L(W, V) such that U(w_j) = Σ_{i=1}^{n} B_{ij} v_i
for j = 1, . . . , n, where γ = {w_1, . . . , w_n} and β = {v_1, . . . , v_n}. Then [U]^β_γ = B. To show that
U = T⁻¹, note that
[U T ]β = [U ]βγ [T ]γβ = BA = In = [IV ]β ,
by Proposition 2.36. Thus U T = IV . Similarly T U = IW . 
Example 2.54. Let β and γ be the standard ordered bases of P1 (R) and R2 respectively. For T
given by T (a + bx) = (a, a + b) from the previous example, we have
   
           [ 1  0 ]                    [  1  0 ]
[T]^γ_β =  [ 1  1 ]   and  [T⁻¹]^β_γ = [ −1  1 ],
which we already know are matrix inverses of each other.
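As a small aside not in the original notes, this inverse relationship is easy to confirm numerically; the sketch below uses Python with numpy, assumed only here:

import numpy as np

A = np.array([[1., 0.],
              [1., 1.]])     # [T]^gamma_beta for T(a + bx) = (a, a + b)
B = np.array([[1., 0.],
              [-1., 1.]])    # [T^{-1}]^beta_gamma
print(A @ B)                             # the 2 x 2 identity
print(np.allclose(np.linalg.inv(A), B))  # True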
Corollary 2.55. Let A ∈ Mn×n (F ). Then A is invertible if and only if LA is invertible. If either
of these hold, then (LA )−1 = LA−1 .
2.9. Isomorphisms. You may have noticed, for instance, that the vector spaces P3 (R) and R4 are
essentially “the same”, at least from linear algebra’s point of view. The concept of isomorphism
makes this precise.
Definition 2.56. Let V, W be vector spaces over F . We say that V is isomorphic to W if there
exists an invertible linear transformation T : V → W . Such a linear transformation T is called an
isomorphism from V to W .
Remark 2.57. The following observations follow immediately from the definition of isomorphism:
(1) V is isomorphic to V (using IV )
(2) V is isomorphic to W if and only if W is isomorphic to V .
(3) If V is isomorphic to W and W is isomorphic to Z, then V is isomorphic to Z.
The above three properties, taken together, say that isomorphism is a so-called equivalence relation.
Example 2.58. Let T : F 2 → P1 (F ) be given by T (a1 , a2 ) := a1 + a2 x. Then T is an isomorphism,
so F 2 is isomorphic to P1 (F ).
By Lemma 2.52, isomorphic vector spaces have the same dimension. Remarkably, the converse also
holds for finite-dimensional spaces:
Proposition 2.59. Let V, W be finite-dimensional vector spaces over F . Then V is isomorphic to
W if and only if dim(V ) = dim(W ).
Proof. (⇒) Suppose T : V → W is an isomorphism. Then dim V = dim W by Lemma 2.52.
(⇐) Suppose dim V = dim W , and let β = {v1 , . . . , vn } and γ = {w1 , . . . , wn } be bases for V
and W , respectively. By Linear Transformation Prescription 2.17, there is a linear transformation
T : V → W such that T (vi ) = wi for each i = 1, . . . , n. By Proposition 2.9, R(T ) = span(T (β)) =
span(γ) = W , so T is surjective. By Proposition 2.15, T is in fact a bijection (hence invertible).
Thus T is an isomorphism. 
Corollary 2.60. Let V be a vector space over F . Then V is isomorphic to F n if and only if
dim V = n.
The following shows that we can identify the spaces L(V, W ) with Mm×n (F ) (for V, W of appro-
priate finite-dimensions).
Proposition 2.61. Let V, W be vector spaces over F with dim V = n and dim W = m. Suppose
β, γ are ordered bases for V and W , respectively. Then the linear transformation Φ : L(V, W ) →
Mm×n (F ) defined by Φ(T ) := [T ]γβ for all T ∈ L(V, W ) is an isomorphism.
Proof. Φ is a linear transformation by Proposition 2.29, so it remains to show that Φ is a bijection.
This means we need to show that for every A ∈ Mm×n (F ), there is a unique linear transformation
T : V → W such that Φ(T ) = A.
Suppose β = {v_1, . . . , v_n}, γ = {w_1, . . . , w_m} and A ∈ M_{m×n}(F) are given. By Linear Trans-
formation Prescription 2.17, there is a unique linear transformation T : V → W such that
T(v_j) = Σ_{i=1}^{m} A_{ij} w_i for each j = 1, . . . , n. Then [T]^γ_β = A, so Φ(T) = A. □

Corollary 2.62. If dim V = n and dim W = m, then dim L(V, W ) = mn = dim Mm×n (F ).
Here is an example of an isomorphism we have been working with all along.
Example 2.63. Suppose V is an n-dimensional vector space over F , with ordered basis β. Then
the linear transformation φβ : V → F n given by φβ (v) := [v]β is an isomorphism.
In light of this, another way to state Proposition 2.41 would be to say that the following com-
positions are equal:
LA φ β = φ γ T
where T : V → W is a linear transformation between finite-dimensional vector spaces V and W ,
β, γ are ordered bases for V, W , and A = [T ]γβ . In diagram form we would say that the following
diagram “commutes”:
        T
   V --------> W
   |           |
  φ_β         φ_γ
   |           |
   v           v
  F^n -------> F^m
       L_A
In other words, if a vector v ∈ V begins its journey in the upper-left space V , then it has two
possible paths to get to the bottom-right space: one way is to go over to T (v) ∈ W and then down
to φγ T (v) ∈ F m . The other path is to go down first to φβ (v) ∈ F n and then over to LA φβ (v) ∈ F m .
Saying that this diagram “commutes” means that these two different paths arrive at the same place,
φγ T (v) = LA φβ (v).
2.10. Change of coordinate matrix. Everything in this subsection is an extremely useful special
case of everything we have done so far. The following is almost immediate:
Proposition 2.64. Let β and β′ be ordered bases for a finite-dimensional vector space V, and let
Q = [I_V]^β_{β′}. Then
(1) Q is invertible (with Q⁻¹ = [I_V]^{β′}_β),
(2) for every v ∈ V, [v]_β = Q[v]_{β′}.
Proof. (1) Since I_V is invertible, Proposition 2.53 implies Q = [I_V]^β_{β′} is invertible, with inverse
Q⁻¹ = [I_V⁻¹]^{β′}_β = [I_V]^{β′}_β.
(2) Let v ∈ V. Then
[v]_β = [I_V(v)]_β = [I_V]^β_{β′}[v]_{β′} = Q[v]_{β′},
by Proposition 2.41. □
The above proposition shows that multiplying by Q changes the β′-coordinates of a vector v to the
β-coordinates of v. This motivates the following definition:
Definition 2.65. Let β 0 and β be ordered bases for a finite-dimensional vector space V . We define
the change of coordinate matrix from β 0 to β to be Q = [IV ]ββ 0 . Sometimes this is also called
the change of basis matrix.
 
Example 2.66. Suppose V = R² and u = (1, 1). Consider the ordered bases β′ = {(−1, 0), (0, −1)}
and β = {(1, 0), (0, 1)}. Then we can compute directly
           [ −1 ]                [ 1 ]
[u]_{β′} = [ −1 ]   and  [u]_β = [ 1 ].
However, we could separately compute
                [ −1   0 ]
[I_V]^β_{β′} =  [  0  −1 ]
and then conclude
         [ −1   0 ] [ −1 ]   [ 1 ]
[u]_β =  [  0  −1 ] [ −1 ] = [ 1 ].
Definition 2.67. A linear transformation T : V → V from a vector space to itself is called a
linear operator on V .
Suppose we have ordered bases β 0 , β for V . Then how do you compute [T ]β 0 from [T ]β ?
Proposition 2.68. Let T be a linear operator on a finite-dimensional vector space V and suppose
β 0 , β are ordered bases for V . Let Q = [IV ]ββ 0 be the change of coordinate matrix (from β 0 -coordinates
to β-coordinates). Then
[T ]β 0 = Q−1 [T ]β Q.
Proof. First, recall that
T = IV T = T IV .
Thus
Q[T]_{β′} = [I_V]^β_{β′}[T]^{β′}_{β′} =* [I_V T]^β_{β′} = [T I_V]^β_{β′} =* [T]^β_β[I_V]^β_{β′} = [T]_β Q
using Proposition 2.36 at each step *. Multiplying the first and last on the left by Q−1 then yields
[T ]β 0 = Q−1 [T ]β Q,
as desired. 
Example 2.69. Consider the linear operator T on R2 defined by T (x, y) = (x + y, x − y). Let
β = {(1, 0), (0, 1)} and β 0 = {(−1, 0), (0, −1)} be ordered bases. By the previous example:
 
                   [ −1   0 ]
Q = [I_V]^β_{β′} = [  0  −1 ].
Coincidentally, we also have
                     [ −1   0 ]
Q⁻¹ = [I_V]^{β′}_β = [  0  −1 ].
We easily see that
         [ 1   1 ]
[T]_β =  [ 1  −1 ].
Thus
[T]_{β′} = Q⁻¹[T]_β Q = [ −1   0 ] [ 1   1 ] [ −1   0 ] = [ −1   0 ] [ −1  −1 ] = [ 1   1 ]
                        [  0  −1 ] [ 1  −1 ] [  0  −1 ]   [  0  −1 ] [ −1   1 ]   [ 1  −1 ]
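As an aside (not from the notes), the conjugation Q⁻¹[T]_β Q can be checked numerically; the sketch below uses Python with numpy, which the course does not assume:

import numpy as np

T_beta = np.array([[1., 1.],
                   [1., -1.]])   # [T]_beta in the standard ordered basis
Q = np.array([[-1., 0.],
              [0., -1.]])        # change of coordinate matrix from beta' to beta
print(np.linalg.inv(Q) @ T_beta @ Q)   # [[1. 1.] [1. -1.]], matching the hand computation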

3. Determinants
This section we briefly review determinants of matrices. Determinants are these magical and
mysterious functions defined on square matrices. We won’t study them or their secrets in this
class, but instead use them as mathematical tools for use in later sections. Another way to say this
is that we will black-box 1 the theory of determinants.

Let F be a field. Then for each n ≥ 1 there is a function

det : Mn×n (F ) → F

called the determinant. We will not officially define the determinant, instead we will pretend that
it is already given to us and we will just say what its properties are. If you like, you can take the
formulas in the “computing the determinant” subsection below to be the definition of determinant,
although the real story is much more elaborate and elegant (and unfortunately outside the scope
of the class if we really want to do the theory of determinants justice).

3.1. Computing the determinant. For n = 1, computing the determinant is easy:

Given A = (A11 ) ∈ M1×1 (F ), we have det A = A11 .

For n = 2, there is also a fairly simple formula for computing the determinant:
 
            [ A_11  A_12 ]
Given A  =  [ A_21  A_22 ]  ∈ M_{2×2}(F), we have det A = A_11·A_22 − A_21·A_12.

Now suppose n ≥ 2, and let A ∈ Mn×n (F ). Then for any i, j ∈ {1, . . . , n} we define the ij-cofactor
matrix of A to be the matrix Aeij ∈ M(n−1)×(n−1) (F ) obtained from A by deleting the ith row and
the jth column. Then we can compute the determinant of A by cofactor expansion along the ith row:
det(A) = Σ_{j=1}^{n} (−1)^{i+j} A_{ij} · det(Ã_{ij})   for any 1 ≤ i ≤ n,

i.e., we can use cofactor expansion along any row, not just the top row i = 1. Similarly, we can use
cofactor expansion along any column to compute the determinant of A:
det(A) = Σ_{i=1}^{n} (−1)^{i+j} A_{ij} · det(Ã_{ij})   for any 1 ≤ j ≤ n.

Note that the cofactor expansion formulas reduce the computation of the determinant of an n × n
matrix down to the computation of several (n − 1) × (n − 1) sized determinants. Applying cofactor
expansion recursively, eventually the computation will reduce to 2 × 2 or 1 × 1-sized determinants,
which we know how to compute directly from above.

Example 3.1. Consider the 3 × 3 matrix


 
     [  1   3  −3 ]
A =  [ −3  −5   2 ]  ∈ M_{3×3}(R).
     [ −4   4  −6 ]

1See https://en.wikipedia.org/wiki/Black_box
We will calculate the determinant using cofactor expansion along the 1st row (i = 1):
 
det A = (−1)^{1+1} A_11 det(Ã_11) + (−1)^{1+2} A_12 det(Ã_12) + (−1)^{1+3} A_13 det(Ã_13)
      = det [ −5   2 ]  − 3 det [ −3   2 ]  − 3 det [ −3  −5 ]
            [  4  −6 ]          [ −4  −6 ]          [ −4   4 ]
      = ((−5)(−6) − 2·4) − 3((−3)(−6) − 2(−4)) − 3((−3)·4 − (−5)(−4))
      = 22 − 3·26 − 3·(−32) = 40.
In general, when using cofactor expansion to compute determinants, it helps to judiciously pick a
row or a column that has many zeros, if there is one.
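As an aside that is not part of the notes, cofactor expansion along the first row can be written as a short recursive program. The sketch below is in Python with numpy (an assumption of this aside), with the made-up helper name det_cofactor; it reproduces the value 40 from Example 3.1.

import numpy as np

def det_cofactor(A):
    # determinant by cofactor expansion along the top row
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        # delete row 0 and column j to form the cofactor matrix
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1, 3, -3],
              [-3, -5, 2],
              [-4, 4, -6]])
print(det_cofactor(A))     # 40, as in Example 3.1
print(np.linalg.det(A))    # approximately 40.0 (floating point)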

3.2. Properties of the determinant. For the sake of computation, we also record here how the
determinant changes when you apply row or column operations to a matrix. Suppose A ∈ Mn×n (F ).
If B is a matrix obtained from A by...
(1) switching two rows (or two columns), then
det B = − det A,
(2) multiplying a row (or a column) of A by a scalar c ∈ F , then
det(B) = c · det A,
(3) for i 6= j, adding a multiple of row i to row j (or a multiple of column i to column j), then
det B = det A.
Using these properties and the following fact allows you to easily compute the determinant of a
matrix using the usual row-reducing algorithm.
Fact 3.2. If A ∈ Mn×n (F ) is upper triangular, i.e., if Aij = 0 for all i > j (entries below the
diagonal are = 0), then det A = A11 · A22 · · · Ann .
 
Example 3.3. Let
     [  0   1   3 ]
B =  [ −2  −3  −5 ].
     [  4  −4   4 ]
We will use row operations to take B to an upper-triangular matrix:
     [  0   1   3 ]  R1↔R2   [ −2  −3  −5 ]  R3→R3+2R1   [ −2   −3  −5 ]
B =  [ −2  −3  −5 ]  ─────▶  [  0   1   3 ]  ─────────▶  [  0    1   3 ]
     [  4  −4   4 ]          [  4  −4   4 ]              [  0  −10  −6 ]

   R3→R3+10R2   [ −2  −3  −5 ]
  ───────────▶  [  0   1   3 ]
                [  0   0  24 ]
This shows us that det B = (−1) · (−2) · 1 · 24 = 48, since the determinant of the final matrix is
(−2) · 1 · 24 = −48, and then we have to multiply by an additional (−1) since we did a row exchange
in the first step (the other steps leave the determinant unchanged).
Going forward, the following facts about the determinant are the most important:
Fact 3.4. Let A, B ∈ Mn×n (F ). Then
(1) det(AB) = det(A) · det(B).
(2) If In ∈ Mn×n (F ) is the identity matrix, then det(In ) = 1.
(3) A is invertible if and only if det(A) ≠ 0. In this case, det(A⁻¹) = 1/det(A).

(4) det(A) = det(At ).

Finally, we say that A and B are similar if there is an invertible matrix Q ∈ Mn×n (F ) such that
B = Q−1 AQ. Then we have that

(5) if A and B are similar, then det A = det B.

Proof. (1) We will just take this for granted.


(2) Since I_n is a particular upper-triangular matrix, we have det I_n = 1 · 1 · · · 1 (n times) = 1.
(3) Suppose A is invertible. Then there is B such that AB = In . Applying (1) and (2) gives
det(AB) = det(A) · det(B) = det(In ) = 1. Thus det(A) 6= 0 and det(B) = 1/ det(A).
Suppose A is not invertible. Then the columns are not linearly-independent, so we can
apply column operations to transform A into a matrix that has a column of all zeros. This
last matrix will have determinant 0, so A has determinant 0.
(4) We will take this for granted. To convince yourself, note that if you compute det(A) com-
pletely by always doing cofactor expansion along the top-most row, and if you compute
det(At ) completely by always doing cofactor expansion along the left-most column, then
your two computations will be identical (mirror-images of each other about the main diag-
onal), so you will get the same number.
(5) Follows from (1) and (3). 

3.3. The determinant of a linear operator (New!) We saw previously that linear operators
T : V → V on finite-dimensional vector spaces are very analogous to square matrices. This analogy
also applies to determinants:

Lemma 3.5. Suppose V is a finite-dimensional vector space over a field F, and T : V → V is a linear operator. Then
there is a scalar d ∈ F such that for every ordered basis β of V , we have d = det[T ]β .

Proof. First, let γ be an ordered basis of V , and set d := det[T ]γ . Next, let β be an arbitrary
ordered basis of V . Consider the (invertible) change of coordinates matrix Q := [IV ]βγ . Then we
have [T ]γ = Q−1 [T ]β Q, i.e., [T ]γ and [T ]β are similar matrices. Thus det[T ]β = det[T ]γ = d. 

Definition 3.6. For a linear operator T : V → V on a finite-dimensional vector space, we define


its determinant, det T , as follows: choose any ordered basis β of V and define det T = det[T ]β .
By the previous lemma, the choice of β does not matter.

Example 3.7. Define the operator T : M2×2 (R) → M2×2 (R) as follows:

    
T ( [ a  b ] ) :=  [ 2  1 ] [ a  b ].
    [ c  d ]       [ 0  3 ] [ c  d ]
We want to compute det T . Consider the standard ordered basis β = {E11 , E12 , E21 , E22 } of
M2×2 (R). We first compute [T ]β :
      
T(E_11) = [ 2  1 ] [ 1  0 ] = [ 2  0 ] = 2E_11 + 0E_12 + 0E_21 + 0E_22
          [ 0  3 ] [ 0  0 ]   [ 0  0 ]
T(E_12) = [ 2  1 ] [ 0  1 ] = [ 0  2 ] = 0E_11 + 2E_12 + 0E_21 + 0E_22
          [ 0  3 ] [ 0  0 ]   [ 0  0 ]
T(E_21) = [ 2  1 ] [ 0  0 ] = [ 1  0 ] = 1E_11 + 0E_12 + 3E_21 + 0E_22
          [ 0  3 ] [ 1  0 ]   [ 3  0 ]
T(E_22) = [ 2  1 ] [ 0  0 ] = [ 0  1 ] = 0E_11 + 1E_12 + 0E_21 + 3E_22
          [ 0  3 ] [ 0  1 ]   [ 0  3 ]

and thus
         [ 2  0  1  0 ]
[T]_β =  [ 0  2  0  1 ].
         [ 0  0  3  0 ]
         [ 0  0  0  3 ]
We now compute
                        [ 2  0  1  0 ]
det T = det[T]_β = det  [ 0  2  0  1 ]  = 2 · 2 · 3 · 3 = 36,
                        [ 0  0  3  0 ]
                        [ 0  0  0  3 ]
by cofactor expansion along the bottom row twice.
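As a numerical aside (not from the notes), the determinant above, and the fact from Lemma 3.5 that it does not depend on the chosen basis, can be illustrated in Python with numpy (assumed only for this sketch):

import numpy as np

T_beta = np.array([[2., 0., 1., 0.],
                   [0., 2., 0., 1.],
                   [0., 0., 3., 0.],
                   [0., 0., 0., 3.]])       # [T]_beta from Example 3.7
print(np.linalg.det(T_beta))                # approximately 36.0

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 4))                 # a generically invertible change of basis
print(np.linalg.det(np.linalg.inv(Q) @ T_beta @ Q))   # again approximately 36, illustrating Lemma 3.5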
Proposition 3.8. Suppose T : V → V is a linear operator. Then
(1) T is bijective if and only if det T 6= 0,
(2) if T is bijective, then det(T −1 ) = (det T )−1 ,
(3) if U : V → V is another linear operator on V , then det(T U ) = det(T ) det(U ).
Proof. (1) T is bijective if and only if [T ]β is invertible, where β is some (any) ordered basis
of V (by Proposition 2.53). The matrix [T ]β is invertible if and only if det[T ]β 6= 0, by
Fact 3.4(3).
(2) Suppose T is bijective (hence invertible), and β is an ordered basis. Then
1 = det I_n = det[I_V]_β = det[T T⁻¹]_β = det([T]_β [T⁻¹]_β) = (det[T]_β)(det[T⁻¹]_β) = (det T)(det T⁻¹).
Thus (det T)⁻¹ = det(T⁻¹).
(3) Again, let β be an ordered basis of V . Then we have
  
det(T U) = det[T U]_β = det([T]_β [U]_β) = (det[T]_β)(det[U]_β) = (det T)(det U). □

4. Eigenvalues and eigenvectors
In this section, V is a finite-dimensional vector space over a field F .
Since diagonal matrices are very nice to work with, our goal in this section is to study, given a
linear operator T : V → V whether or not there is an ordered basis β of V such that [T ]β is a
diagonal matrix. To make this precise:
Definition 4.1. (1) A matrix B ∈ Mn×n (F ) is a diagonal matrix if for all i, j ∈ {1, . . . , n},
if i 6= j, then Bij = 0 (i.e., the non-diagonal entries of B are all 0). So a diagonal matrix
looks like
     [ B_11   0   · · ·    0   ]
B =  [  0   B_22  · · ·    0   ]
     [  ⋮     ⋮    ⋱       ⋮   ]
     [  0     0   · · ·  B_nn  ]
where B11 , . . . , Bnn ∈ F (possibly also zero).
(2) A linear operator T : V → V is called diagonalizable if there is an ordered basis β of V
such that [T ]β is a diagonal matrix.
(3) A matrix A ∈ Mn×n (F ) is diagonalizable if A is similar to a diagonal matrix, i.e., if there
is an invertible matrix Q ∈ Mn×n (F ) such that B = Q−1 AQ is a diagonal matrix.
Diagonalizability for linear operators and for matrices essentially amounts to the same thing:
Proposition 4.2. Let T : V → V be a linear operator and suppose β is an ordered basis for V .
Then T is diagonalizable if and only if [T ]β is a diagonalizable matrix.
Proof. (⇒) Suppose T is diagonalizable. Then there is an ordered basis γ for V such that D = [T ]γ
is a diagonal matrix. Let Q = [IV ]βγ be the change of coordinate matrix from γ-coordinates to
β-coordinates. Then D = [T]_γ = Q⁻¹[T]_β Q, so [T]_β is similar to the diagonal matrix D, hence [T]_β is
diagonalizable.
(⇐) Suppose [T ]β is a diagonalizable matrix. Then there is an invertible Q ∈ Mn×n (F ) such
that Q−1 [T ]β Q is a diagonal matrix. We want to show that the operator T is diagonalizable, i.e.,
we want to find an ordered basis γ of V such that [T ]γ is a diagonal matrix. How do we find this
basis γ? The idea is to use the entries of Q to construct γ using β.
Specifically, suppose β = {v_1, . . . , v_n}. For j = 1, . . . , n, define w_j := Σ_{i=1}^{n} Q_{ij} v_i. Define
γ = {w1 , . . . , wn }. First, we need to show that γ is actually a basis of V . To do this, define the
auxiliary operator U : V → V by U (vj ) := wj for all j = 1, . . . , n (defining in this way is possible
by Linear Transformation Prescription and the fact that β is a basis). Then [U ]β = Q is invertible,
so U is invertible. Thus R(U ) = V = span(γ), and so γ is a basis (it has n vectors and it spans
an n-dimensional space). Since γ is an ordered basis, we can ask what is the matrix [IV ]βγ ? By
definition of γ, we actually have [IV ]βγ = Q. Finally, note that Q−1 [T ]β Q = [IV ]γβ [T ]β [IV ]βγ = [T ]γ
is a diagonal matrix. Thus T is diagonalizable. □
Corollary 4.3. A ∈ Mn×n (F ) is diagonalizable if and only if LA is diagonalizable.
In this section we will answer the following question:
Question 4.4. When is a matrix A ∈ Mn×n (F ) (equivalently, a linear operator T : V → V )
diagonalizable?
Already, we can provide an answer (not necessarily the definitive answer, since its basically just a
restatement of the definition):
Proposition 4.5. Suppose T : V → V is a linear operator. Then T is diagonalizable if and only
if there is an ordered basis β = {v1 , . . . , vn } for V and scalars λ1 , . . . , λn ∈ F such that
T (vj ) = λj vj for 1 ≤ j ≤ n.
Proof. (⇒) Suppose T is diagonalizable, i.e., suppose there is an ordered basis β = {v_1, . . . , v_n}
such that D = [T]_β is a diagonal matrix. Then for each v_j ∈ β, we have T(v_j) = Σ_{i=1}^{n} D_{ij} v_i =
D_{jj} v_j = λ_j v_j, for λ_j = D_{jj}.
(⇐) Suppose there is an ordered basis β = {v1 , . . . , vn } and scalars λ1 , . . . , λn ∈ F such that
T (vj ) = λj vj for each 1 ≤ j ≤ n. Then
 
         [ λ_1   0   · · ·   0  ]
[T]_β =  [  0   λ_2  · · ·   0  ]
         [  ⋮    ⋮    ⋱      ⋮  ]
         [  0    0   · · ·  λ_n ]
is a diagonal matrix, so T is diagonalizable. 
The above Proposition suggests the following definition:
Definition 4.6. (1) A non-zero vector v ∈ V is an eigenvector of T if T (v) = λv for some
λ ∈ F . We call λ the eigenvalue of T corresponding to the eigenvector v.
(2) Let A ∈ Mn×n (F ). A non-zero v ∈ F n is an eigenvector of A if Av = λv for some λ ∈ F ,
and λ is called the eigenvalue of A corresponding to the eigenvector v.
(3) The vectors vj in the basis β in Proposition 4.5 are eigenvectors of T with corresponding
eigenvalues λj .
Eigenvalue Criterion 4.7. Suppose T : V → V is a linear operator and A ∈ Mn×n (F ) is a
matrix. Let λ ∈ F .
(1) The scalar λ is an eigenvalue of T if and only if det(T − λIV ) = 0.
(2) The scalar λ is an eigenvalue of A if and only if det(A − λIn ) = 0.
Proof. We’ll show (1). We have
λ is an eigenvalue of T ⇔ T (v) = λv for some v 6= 0 in V
⇔ (T − λI_V)(v) = 0 for some v ≠ 0 in V
⇔ N (T − λIV ) 6= {0}
⇔ T − λIV is not bijective, One-to-one Criterion 2.14, Prop 2.15
⇔ det(T − λIV ) = 0.
The proof for (2) is analogous. 
 
Example 4.8. Let A = [ 1  1 ] ∈ M_{2×2}(R). Then
                     [ 4  1 ]
det(A − λI_2) = det [ 1−λ   1  ] = (1 − λ)² − 4 = (λ − 3)(λ + 1).
                    [  4   1−λ ]
Thus the eigenvalues of A are the solutions to the equation (λ − 3)(λ + 1) = 0, which are λ = 3, −1.
Definition 4.9. (1) The polynomial f (t) := det(A − tIn ) in the variable t is called the char-
acteristic polynomial of A.
(2) Given a linear operator T : V → V , we define the characteristic polynomial of T to
be f (t) = det(T − tIV ). Note that this is the same thing as f (t) = det(A − tIn ), where
A = [T ]β and β is any ordered basis for V . Indeed, let β be an ordered basis for V , then
f (t) = det(T − tIV ) = det([T − tI V ]β ) = det([T ]β − t[IV ]β ) = det([T ]β − tIn ).
Here are some easy consequences of the definitions so far:
Lemma 4.10. Let A ∈ Mn×n (F ) be given, and let f (t) be its characteristic polynomial
(1) f (t) is a polynomial of degree n, with leading coefficient (−1)n :
f(t) = (−1)ⁿtⁿ + c_{n−1}t^{n−1} + · · · + c_0 for some c_0, . . . , c_{n−1} ∈ F.
(2) A scalar λ ∈ F is an eigenvalue of A if and only if f (λ) = 0.
(3) A has at most n distinct eigenvalues (since f (t) has at most n distinct roots).
(4) If λ ∈ F is an eigenvalue of A, then a vector x ∈ F n is an eigenvector of A corresponding
to λ if and only if x 6= 0 and x ∈ N (LA − λIF n ).
 
Example 4.11. We'll consider A = [ 1  1 ] again and find all eigenvectors corresponding to each
                                 [ 4  1 ]
of its eigenvalues.
(1) We saw that the eigenvalues of A are λ_1 = 3 and λ_2 = −1.
(2) Let B_1 = A − λ_1 I_2 = [ −2   1 ]. Then x = (x_1, x_2) ∈ R² is an eigenvector of A corresponding
                            [  4  −2 ]
to λ_1 = 3 if and only if x ≠ 0 and x ∈ N(L_{B_1}), if and only if x ≠ 0 and
[ −2   1 ] [ x_1 ]   [ 0 ]
[  4  −2 ] [ x_2 ] = [ 0 ],
if and only if x ≠ 0 and
−2x_1 + x_2 = 0
4x_1 − 2x_2 = 0.
The set of all solutions to this system of equations is
{ t·(1, 2) : t ∈ R }.
Thus x ∈ R² is an eigenvector of A corresponding to λ_1 = 3 if and only if x = t·(1, 2) for
some t ≠ 0.
(3) Now let B_2 := A − λ_2 I_2 = [ 2  1 ]. Hence x ∈ R² is an eigenvector of A corresponding
                                 [ 4  2 ]
to λ_2 = −1 if and only if x ≠ 0 and x ∈ N(L_{B_2}), if and only if x ≠ 0 and B_2 · x = 0, if and only if
[ 2  1 ] [ x_1 ]   [ 0 ]
[ 4  2 ] [ x_2 ] = [ 0 ],
if and only if x ≠ 0 and
2x_1 + x_2 = 0
4x_1 + 2x_2 = 0.
Thus
N(L_{B_2}) = { t·(1, −2) : t ∈ R }.
This means x is an eigenvector of A corresponding to λ_2 = −1 if and only if x = t·(1, −2)
for some t ≠ 0.
   
Note that {(1, 2), (1, −2)} is a basis for R² consisting of eigenvectors of A. Thus L_A, and hence A,
is diagonalizable.
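As an aside not contained in the notes, the eigenvalues and eigenvectors found above can be double-checked in Python with numpy (assumed only here); numpy returns normalized eigenvectors, so they agree with (1, 2) and (1, −2) only up to scaling and ordering:

import numpy as np

A = np.array([[1., 1.],
              [4., 1.]])
evals, evecs = np.linalg.eig(A)
print(evals)                      # eigenvalues 3 and -1 (in some order)
print(evecs)                      # columns are scalar multiples of (1, 2) and (1, -2)
print(A @ np.array([1., 2.]))     # [3. 6.]  = 3  * (1, 2)
print(A @ np.array([1., -2.]))    # [-1. 2.] = -1 * (1, -2)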
4.1. Determining eigenvectors and eigenvalues of a linear operator. Suppose dim(V ) = n
and let β be some ordered basis for V . Let T ∈ L(V ) be a linear operator on V . Summarizing
the results of the previous section, we describe how to determine all eigenvalues and corresponding
eigenvectors of T .
(1) First, determine the matrix representation [T ]β of T .
(2) Next, determine the eigenvalues of T . Use that λ ∈ F is an eigenvalue of T if and only if λ
is a root of the characteristic polynomial of T . That is, we need to find the solutions x ∈ F
of det([T ]β − xIn ) = 0. There are at most n distinct solutions λ1 , . . . , λn (but possibly
fewer).
(3) For each eigenvalue λ of T , we can determine the corresponding eigenvectors. We have
T (v) = λv if and only if (T −λIV )(v) = 0 if and only if [T −λIV ]β [v]β = 0. Thus, eigenvectors
v corresponding to λ are the nonzero solutions of this system of linear equations (more
precisely, solving this system we find the β-coordinate vector [v]β , which then determines
v).
Distinctness of eigenvalues goes a long way:
Proposition 4.12. Let T ∈ L(V ) be a linear operator on V , and let λ1 , . . . , λk be distinct eigen-
values of T . If v1 , . . . , vk are eigenvectors of T such that vi corresponds to λi for i = 1, . . . , k, then
{v1 , . . . , vk } is linearly independent.
Proof. We prove this by induction on k.
For the base case k = 1, suppose v1 is an eigenvector corresponding to λ1 . Then v1 6= 0, by
definition of eigenvector. Thus the set {v1 } is linearly independent.
Let k ≥ 2, and suppose we know the theorem is true for k − 1 many eigenvalues and eigenvectors.
We have the vectors v1 , . . . , vk , eigenvectors corresponding to the distinct eigenvalues λ1 , . . . , λk .
Suppose a1 , . . . , ak ∈ F are arbitrary such that
a1 v1 + · · · + ak vk = 0.
Apply the linear transformation T − λk IV to both sides and use linearity to get
(T − λk IV )(0) = T (0) − λk IV (0) = 0 − 0 = 0
for the right side, and for the left side:
(T − λ_k I_V)(a_1 v_1 + · · · + a_k v_k) = (a_1 T(v_1) + · · · + a_k T(v_k)) − λ_k(a_1 v_1 + · · · + a_k v_k)
= a_1 λ_1 v_1 + · · · + a_k λ_k v_k − λ_k a_1 v_1 − · · · − λ_k a_k v_k        as T(v_i) = λ_i v_i
= a1 (λ1 − λk )v1 + · · · + ak−1 (λk−1 − λk )vk−1
= 0 because left side must equal right side.
By the inductive hypothesis, {v1 , . . . , vk−1 } are linearly independent, so
a1 (λ1 − λk ) = · · · = ak−1 (λk−1 − λk ) = 0.
Since λ1 , . . . , λk are distinct by assumption, λi − λk 6= 0 for i = 1, . . . , k − 1. Thus a1 = · · · =
ak−1 = 0. Thus ak vk = 0, which implies ak = 0 since vk 6= 0 (as vk is an eigenvector). We conclude
that {v1 , . . . , vk } are linearly independent. 
Corollary 4.13. Let T ∈ L(V ) and dim(V ) = n. If T has n distinct eigenvalues, then T is
diagonalizable.
Proof. Let λ1 , . . . , λn be n distinct eigenvalues of T . For each i, let vi be an eigenvector corre-
sponding to λi . By Proposition 4.12, {v1 , . . . , vn } is linearly independent. Since dim(V ) = n, this
set is a basis for V . Thus V has a basis consisting of eigenvectors for T , so T is diagonalizable. 
Example 4.14. The converse of Corollary 4.13 is false. For instance, the identity operator IV has
only one eigenvalue λ = 1, however it is diagonalizable.
So far in this class, which field F we are working over hasn’t really mattered. Now it matters.
Definition 4.15. A polynomial f (t) ∈ P (F ) splits over F if there are scalars c, a1 , . . . , an ∈ F
(not necessarily distinct) such that
f (t) = c(t − a1 )(t − a2 ) · · · (t − an )
Example 4.16. (1) t2 − 1 ∈ P2 (R) splits over R, namely t2 − 1 = (t − 1)(t + 1).
(2) t² + 1 ∈ P_2(R) does not split over R. However, viewed as a polynomial t² + 1 ∈ P_2(C), it
does split over C, namely t² + 1 = (t + i)(t − i).


Lemma 4.17. Suppose T : V → V is a diagonalizable linear operator. Then the characteristic
polynomial of T splits.
Proof. Let n = dim(V ) and suppose T ∈ L(V ) is diagonalizable. Then there is an ordered basis β
of V such that [T ]β = D is a diagonal matrix:
 
     [ λ_1   0   · · ·   0  ]
D =  [  0   λ_2  · · ·   0  ]
     [  ⋮    ⋮    ⋱      ⋮  ]
     [  0    0   · · ·  λ_n ]
If f (t) is the characteristic polynomial of T , then
f (t) = det(T − tIV ) = det(D − tIn ) = (λ1 − t) · · · (λn − t) = (−1)n (t − λ1 ) · · · (t − λn ). 
Definition 4.18. Let λ be an eigenvalue of a linear operator or matrix with characteristic polyno-
mial f (t). The (algebraic) multiplicity of λ is the largest positive integer k for which (t − λ)k
is a factor of f (t) (i.e., f (t) can be written as f (t) = (t − λ)k g(t) for some polynomial g(t)).
 
                      [ 3  1  0 ]
Example 4.19. Let A = [ 0  3  4 ]. Then the characteristic polynomial is f(t) = −(t − 3)²(t − 4).
                      [ 0  0  4 ]
Hence λ = 3 is an eigenvalue of A with multiplicity 2, and λ = 4 is an eigenvalue of A with
multiplicity 1.
Definition 4.20. Let T ∈ L(V ), λ an eigenvalue of T . We define Eλ , the eigenspace of T
corresponding to λ, as
Eλ = {x ∈ V : T (x) = λx} = N (T − λIV ) (and similarly for a matrix).
Note that this is a subspace of V , consisting of 0 and the eigenvectors of T corresponding of λ.
Sometimes we refer to dim Eλ as the geometric multiplicity of λ. The next proposition says that
the geometric multiplicity of a particular eigenvalue is always at most the algebraic multiplicity:
Proposition 4.21. Let T ∈ L(V ), dim(V ) < ∞, λ an eigenvalue of T with multiplicity m. Then
1 ≤ dim(Eλ ) ≤ m.
Proof. Since λ is an eigenvalue, there is at least one nonzero v ∈ Eλ , and thus 1 ≤ dim Eλ .
Next, choose an ordered basis {v1 , . . . , vp } for Eλ . By Corollary 1.58 to the Replacement Lemma,
we can extend this to an ordered basis β = {v1 , . . . , vp , vp+1 , . . . , vn } for V . Let A = [T ]β . Since
v1 , . . . , vp are eigenvectors of T corresponding to λ, we have
 
     [ λI_p   B ]
A =  [  0     C ]
where I_p is the p×p identity matrix, B ∈ M_{p×(n−p)}(F), C ∈ M_{(n−p)×(n−p)}(F), and 0 is the (n−p)×p
zero matrix. Then
f(t) = det(A − tI_n)
     = det [ (λ−t)I_p        B       ]
           [     0      C − tI_{n−p} ]
     = det((λ − t)I_p) det(C − tI_{n−p})        (exercise)
     = (λ − t)^p g(t),
where g(t) is a polynomial. Thus (λ − t)p is a factor of f (t), hence the (algebraic) multiplicity of
λ is at least p. Since dim(Eλ ) = p, this shows dim Eλ ≤ m. 
Lemma 4.22. Let T ∈ L(V ), λ1 , . . . , λk be distinct eigenvalues of T . Let vi ∈ Eλi for each
i = 1, . . . , k. If v1 + · · · + vk = 0, then vi = 0 for all i.
Proof. Assume towards a contradiction that (after possibly rearranging the order), we have vi 6= 0
for 1 ≤ i ≤ m, and vi = 0 for i > m, for some 1 ≤ m ≤ k. Then for each i ≤ m, vi is an eigenvector
of T corresponding to λi (since vi 6= 0), and v1 + · · · + vm = 0. This contradicts Proposition 4.12
since v1 , . . . , vm must be linearly independent. 

The following shows there is no extra linear dependence happening between different eigenspaces:
Proposition 4.23. Let T ∈ L(V ), let λ1 , . . . , λk be distinct eigenvalues of T . For each i = 1, . . . , k,
let Si be a finite linearly independent subset of Eλi . Then S = S1 ∪ · · · ∪ Sk is also a linearly
independent subset of V .
Proof. Suppose that Si = {vi,1 , . . . , vi,ni } for each i = 1, . . . , k, and some integer ni ≥ 0. Then
S = {v_{i,j} : 1 ≤ i ≤ k, 1 ≤ j ≤ n_i}. Let {a_{i,j}} be a collection of scalars in F such that
Σ_{i=1}^{k} Σ_{j=1}^{n_i} a_{i,j} v_{i,j} = 0.

For each i, let w_i := Σ_{j=1}^{n_i} a_{i,j} v_{i,j}. Then w_i ∈ E_{λ_i} and w_1 + · · · + w_k = 0. By the above lemma,
wi = 0 for each i = 1, . . . , k. But since each Si is linearly independent, it follows that ai,j = 0 for
all j. Thus S is linearly independent. 
Theorem 4.24. Let T ∈ L(V), dim(V) < ∞, and assume that the characteristic polynomial of T splits.
Let λ1 , . . . , λk be the distinct eigenvalues of T . Then
(1) T is diagonalizable if and only if the multiplicity of λi is equal to dim(Eλi ) for each i.
(2) If T is diagonalizable and βi is an ordered basis for Eλi for each i, then β = β1 ∪ · · · ∪ βk is
an ordered basis for V consisting of eigenvectors of T (and hence [T ]β is a diagonal matrix).
Proof. For each i = 1, . . . , k, let mi denote the (algebraic) multiplicity of λi , and di = dim Eλi , and
suppose n = dim V .
(⇒) Suppose T is diagonalizable. Then there is β, an ordered basis of eigenvectors of T . For
each i, we let βi = β ∩ Eλi , and set ni := |βi |. Then we know
• ni ≤ di for each i, because βi is a linearly independent subset of Eλi , and di = dim Eλi ,
• di ≤ mi by Proposition 4.21
• Σ_{i=1}^{k} n_i = n, because β contains n vectors in total,
• Σ_{i=1}^{k} m_i = n, because the degree of the characteristic polynomial of T is equal to the sum of
the algebraic multiplicities of the eigenvalues (since it splits), and also equal to dim(V ) = n.
Combining these yields:
n = Σ_{i=1}^{k} n_i ≤ Σ_{i=1}^{k} d_i ≤ Σ_{i=1}^{k} m_i = n.
Thus Σ_{i=1}^{k} (m_i − d_i) = 0. This implies m_i = d_i for all i.
(⇐) Conversely, suppose that mi = di for all i. For each i, let βi be an ordered basis for Eλi , and
set β = β1 ∪ · · · ∪ βk . By Proposition 4.23, β is linearly independent. Furthermore, since di = mi
for all i, β has Σ_{i=1}^{k} d_i = Σ_{i=1}^{k} m_i = n many vectors in it. Thus β is a basis for all of V consisting
of eigenvectors of T . Thus T is diagonalizable. 
We now summarize what we know so far:
Test for diagonalization 4.25. Suppose T ∈ L(V ), where dim V = n. Then T is diagonalizable
if and only if both of the following conditions hold:
(1) The characteristic polynomial of T splits, and
(2) For each eigenvalue λ of T , the (algebraic) multiplicity of λ in the characteristic polynomial
equals dim Eλ = dim N (T − λIV ) = n − rank(T − λIV ).
An analogous statement holds for square matrices A ∈ Mn×n (F ).
 
                           [ 3  1  0 ]
Example 4.26. Consider A = [ 0  3  0 ] ∈ M_{3×3}(R). We'll test A's diagonalizability. The
                           [ 0  0  4 ]
characteristic polynomial is
                            [ 3−t   1    0  ]
f(t) = det(A − tI_3) = det  [  0   3−t   0  ]  = −(t − 4)(t − 3)².
                            [  0    0   4−t ]
This shows that f (t) splits, so condition (1) for diagonalizability holds. The eigenvalues are λ1 = 4
(with multiplicity 1) and λ2 = 3 (with multiplicity 2). Condition (2) is automatically satisfied for
λ1 (Since Proposition 4.21 says that 1 ≤ dim Eλ1 ≤ mult λ1 = 1). Thus we need only to check
condition (2) for λ2 . We see that the matrix
 
              [ 0  1  0 ]
A − λ_2 I_3 = [ 0  0  0 ]
              [ 0  0  1 ]
has rank 2 (by which we mean LA−λ2 I3 has rank 2). Thus dim Eλ2 = 3 − rank(A − λ2 I3 ) = 3 − 2 =
1 6= 2 (the multiplicity of λ2 ). We conclude that A is not diagonalizable.
 
Example 4.27. Now consider A = [ 0  −2 ]. Then f(t) = det(A − tI_2) = (t − 1)(t − 2). Thus
                               [ 1   3 ]
λ_1 = 1, λ_2 = 2 are the eigenvalues of A, both with multiplicity 1, so A is diagonalizable since this
forces conditions (1) and (2) to be satisfied. Furthermore, calculations show that
E_{λ_1} = N(L_A − 1 · I_{R²}) = span{(−2, 1)}   and   E_{λ_2} = span{(−1, 1)}.
Thus β_1 = {(−2, 1)} is a basis for E_{λ_1} and β_2 = {(−1, 1)} is a basis for E_{λ_2}. By Theorem 4.24,
β = β_1 ∪ β_2 = {(−2, 1), (−1, 1)} is a basis for V = R² consisting of eigenvectors of A. Thus [L_A]_β is
a diagonal matrix. In particular, if we let
     [ −2  −1 ]                   [ 1  0 ]
Q =  [  1   1 ],  then  Q⁻¹AQ  =  [ 0  2 ],
which shows that A is similar to a diagonal matrix, hence diagonalizable.
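As an aside (not part of the notes), the diagonalization can be verified numerically in Python with numpy, which is assumed only for this sketch:

import numpy as np

A = np.array([[0., -2.],
              [1., 3.]])
Q = np.array([[-2., -1.],
              [1., 1.]])              # columns are the eigenvectors found above
print(np.linalg.inv(Q) @ A @ Q)       # diag(1, 2), so A is similar to a diagonal matrix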
Above, we used the following consequence of our change-of-basis material:
Remark 4.28. Let A ∈ Mn×n (F ), and suppose γ = {v1 , . . . , vn } is an ordered basis for F n . Then
[LA ]γ = Q−1 AQ, where Q = (v1 v2 · · · vn ). This is because for Q defined this way, we have
Q = [IV ]βγ , where β is the standard ordered basis of F n .

5. Inner product spaces
We will now look at a new topic, inner product spaces. These are an abstract version of Rn equipped
with the dot product from multivariable calculus. Inner products give rise to a notion of norm (i.e.,
length). Consequently, we need to work over a field of scalars where the scalars themselves have a
certain “magnitude”:
Convention 5.1. In this section, F = R or F = C. We are no longer assuming that our vector
spaces are always finite-dimensional.
Recall that the field of complex numbers C comes equipped with the operation of complex conju-
gation: given a + bi ∈ C, where a, b ∈ R, we define \overline{a + bi} := a − bi to be the complex conjugate of
a + bi. Here are some basic arithmetic properties of complex conjugation:
• for α ∈ C, we have \overline{α} = α iff α ∈ R
• for α, β ∈ C, we have \overline{α + β} = \overline{α} + \overline{β} and \overline{αβ} = \overline{α}\,\overline{β} and \overline{(α/β)} = \overline{α}/\overline{β} if β ≠ 0
• for α ∈ C, we have \overline{\overline{α}} = α (conjugating twice does nothing)
• for α = a + bi ∈ C with a, b ∈ R, we define the real part of α as Re α := a and the
imaginary part of α as Im α := b. We can calculate these parts using complex conjugates:
Re α = (α + \overline{α})/2 and Im α = (α − \overline{α})/2i
We also have the absolute value (or modulus or norm) of a complex number: for α = a + bi ∈ C
with a, b ∈ R, the absolute value is |α| := √(a² + b²) ∈ R. Here are the basic properties:
• for α, β ∈ C, |αβ| = |α| · |β| and |α/β| = |α|/|β| for β 6= 0
• for α, β ∈ C, |α + β| ≤ |α| + |β| (Triangle inequality)
• for α ∈ C, α\overline{α} = |α|² ∈ R (this is the key reason we care about complex conjugates in this
section).

5.1. Inner products and norms. We start by defining what an inner product is:
Definition 5.2. Let V be a vector space over F . An inner product on V is a function
h·, ·i : V × V → F
which assigns to each pair (x, y) ∈ V × V a scalar hx, yi ∈ F such that for all x, y, z ∈ V and c ∈ F :
(1) hcx + y, zi = chx, zi + hy, zi (linear in the first variable)
(2) ⟨x, y⟩ = \overline{⟨y, x⟩} (conjugate symmetry)
(3) if x 6= 0, then hx, xi > 0 (positivity)
Remark 5.3. Suppose h·, ·i is an inner product on a vector space V over F .
(1) If F = R, then conjugate symmetry is just hx, yi = hy, xi for all x, y ∈ V , since c = c for all
c ∈ R.
(2) As usual for linearity, we actually have for all a1 , . . . , an ∈ F and x1 , . . . , xn , y ∈ V that
⟨ Σ_{i=1}^{n} a_i x_i , y ⟩ = Σ_{i=1}^{n} a_i ⟨x_i, y⟩.

Example 5.4. We define the standard inner product on F n as follows: for x = (a1 , . . . , an ), y =
(b_1, . . . , b_n) ∈ F^n, we define:
⟨x, y⟩ := Σ_{i=1}^{n} a_i \overline{b_i}.
It is straightforward to verify conditions (1), (2), and (3) for this inner product. For example,
if z = (c_1, . . . , c_n) and c ∈ F, then
⟨cx + z, y⟩ = Σ_{i=1}^{n} (ca_i + c_i) \overline{b_i} = c Σ_{i=1}^{n} a_i \overline{b_i} + Σ_{i=1}^{n} c_i \overline{b_i} = c⟨x, y⟩ + ⟨z, y⟩.
Note: when F = R, then the complex conjugations play no role and hx, yi is just the usual dot
product from 33A.
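As an aside not in the notes, the defining properties of the standard inner product can be checked numerically. The sketch uses Python with numpy (an assumption of this aside) and a made-up helper name std_inner:

import numpy as np

def std_inner(x, y):
    # standard inner product on F^n: sum of x_i times the conjugate of y_i
    return np.sum(x * np.conj(y))

x = np.array([1 + 1j, 2.0])
y = np.array([3.0, 1 - 2j])
print(std_inner(x, y))                                          # linear in the first variable
print(np.isclose(std_inner(y, x), np.conj(std_inner(x, y))))    # True: conjugate symmetry
print(std_inner(x, x).real > 0)                                 # True: positivity (and the value is real)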
The next example shows that if we have an inner product, we can define many other inner products:
Example 5.5. If h·, ·i is an inner product on V , then given r > 0 from R, we can define another
inner product h·, ·i0 on V by defining hx, yi0 := rhx, yi for all x, y ∈ V . (If r ≤ 0, this would not
give an inner product).
The next example shows up all the time in math, although it looks nothing like the usual dot
product:
Example 5.6. Let V = C([0, 1]), the R-vector space of all continuous functions f : [0, 1] → R. For
f, g ∈ V we define:
⟨f, g⟩ := ∫_0^1 f(t)g(t) dt.
This defines an inner product on C([0, 1]). For example, given f, g, h ∈ V and c ∈ R, we have
⟨cf + g, h⟩ = ∫_0^1 (cf(t) + g(t))h(t) dt = c ∫_0^1 f(t)h(t) dt + ∫_0^1 g(t)h(t) dt = c⟨f, h⟩ + ⟨g, h⟩.
Conjugate symmetry is clear. For positivity, if f 6= 0 ∈ C([0, 1]), then (f (t))2 > 0 for some t ∈ [0, 1],
so ∫_0^1 (f(t))² dt > 0 since f is continuous (you will prove this type of thing in math131a).
Example 5.7. Let A ∈ Mm×n (F ). We define the conjugate transpose of A as the n × m matrix
A* such that (A*)_{ij} := \overline{A_{ji}}. Note that when F = R, A* = Aᵗ. For example:
   
     [ i   1 + 2i ]              [  −i      2    ]
A =  [ 2   3 + 4i ]   and  A* =  [ 1 − 2i  3 − 4i ].
Now consider V = Mn×n (F ), and define hA, Bi := tr(B ∗ A) for A, B ∈ V . This defines an inner
product on V , the so-called Frobenius inner product. We’ll check the positivity condition (the
rest are obvious): Note that
⟨A, A⟩ = tr(A*A) = Σ_{i=1}^{n} (A*A)_{ii} = Σ_{i=1}^{n} Σ_{k=1}^{n} (A*)_{ik} A_{ki} = Σ_{i=1}^{n} Σ_{k=1}^{n} \overline{A_{ki}} A_{ki} = Σ_{i=1}^{n} Σ_{k=1}^{n} |A_{ki}|²,
which is the sum of squares of the magnitudes of each entry, so if A 6= 0, then hA, Ai > 0.
Vector spaces themselves do not come with an inner product operation. If we choose to equip a
vector space with a particular inner product (which may or may not be possible), then the vector
space upgrades itself to a so-called inner product space:
Definition 5.8. A vector space V over F equipped with an inner product h·, ·i is called an inner
product space. If F = R, V is called a real inner product space, and if F = C, then V is
called a complex inner product space.
Note that if V is an inner product space with inner product h·, ·i and W ⊆ V is a subspace of V ,
then we may naturally also consider W as an inner product space using the restriction of h·, ·i to
W.
Here are some basic properties of inner product spaces:
Lemma 5.9. Let V be an inner product space, let x, y, z ∈ V and c ∈ F . Then
(1) ⟨x, cy + z⟩ = \overline{c}⟨x, y⟩ + ⟨x, z⟩ (“conjugate linear” in second variable)
(2) hx, 0i = h0, xi = 0, and if hx, xi = 0, then x = 0 (properties of zero)
(3) If hx, zi = hy, zi for all z ∈ V , then x = y (fancy way of showing two vectors are equal)

Proof. (1) Note that

⟨x, cy + z⟩ = \overline{⟨cy + z, x⟩} = \overline{c⟨y, x⟩ + ⟨z, x⟩} = \overline{c}\,\overline{⟨y, x⟩} + \overline{⟨z, x⟩} = \overline{c}⟨x, y⟩ + ⟨x, z⟩.

(2) Note that hx, 0i = hx, 0xi = 0hx, xi = 0hx, xi = 0. Similarly h0, xi = 0. Also, if x 6= 0, then
hx, xi > 0.
(3) Suppose hx, zi = hy, zi for all z ∈ V . We want to show that x = y. It is sufficient to show
that x − y = 0. Note that

hx − y, x − yi = hx, x − yi − hy, x − yi = 0

since hx, x − yi = hy, x − yi by assumption (using z := x − y). Thus x − y = 0 by (2) above.




Recall that the usual dot product in R³ gave us a round-about way of defining the length of a vector:
the length of x = (a, b, c) ∈ R³ is the distance between (a, b, c) and (0, 0, 0), which is √(a² + b² + c²).
In other words, the length of x is √(x · x). We use this idea as the definition of the norm (i.e., length)
of a vector in an abstract inner product space:

Definition 5.10. Suppose V is an inner product space. Then for every x ∈ V, we define the norm
or length of x as ‖x‖ := √⟨x, x⟩.

Example 5.11. Let V = F^n, equipped with the standard inner product. Given x = (a_1, . . . , a_n) ∈ V, then
‖x‖ = ‖(a_1, . . . , a_n)‖ = (Σ_{i=1}^{n} |a_i|²)^{1/2}

is the usual Euclidean definition of length.

Many familiar properties of length hold in this generality:

Lemma 5.12. Suppose V is an inner product space over F . Then for all x, y ∈ V and c ∈ F we
have
(1) kcxk = |c| · kxk.
(2) kxk ≥ 0, and kxk = 0 iff x = 0,
(3) |hx, yi| ≤ kxk · kyk (Cauchy-Schwarz Inequality)
(4) kx + yk ≤ kxk + kyk (Triangle Inequality)

Proof. (1) and (2) are routine.


(3) If y = 0, then hx, yi = 0 and kxk · kyk = 0, so the result holds. Now assume y 6= 0, so
hy, yi > 0. We will employ a sneaky trick. Let c ∈ F be arbitrary. Note that

0 ≤ ‖x − cy‖² = ⟨x − cy, x − cy⟩ = ⟨x, x⟩ − \overline{c}⟨x, y⟩ − c⟨y, x⟩ + c\overline{c}⟨y, y⟩.


The above inequality holds for any c ∈ F . Thus it holds for c = hx, yi/hy, yi. Plugging in this c in
gives us:
0 ≤ ⟨x, x⟩ − \overline{(⟨x, y⟩/⟨y, y⟩)}⟨x, y⟩ − (⟨x, y⟩/⟨y, y⟩)⟨y, x⟩ + (⟨x, y⟩/⟨y, y⟩)\overline{(⟨x, y⟩/⟨y, y⟩)}⟨y, y⟩
  = ⟨x, x⟩ − |⟨x, y⟩|²/⟨y, y⟩ − |⟨x, y⟩|²/⟨y, y⟩ + |⟨x, y⟩|²/⟨y, y⟩        using α\overline{α} = |α|² for any α ∈ F
  = ‖x‖² − |⟨x, y⟩|²/‖y‖².
Rearranging this inequality yields
|hx, yi|2 ≤ kxk2 kyk2
and taking square roots gives the desired inequality.
(4) We will prove instead that kx + yk2 ≤ (kxk + kyk)2 . Note that
‖x + y‖² = ⟨x + y, x + y⟩
         = ⟨x, x⟩ + ⟨y, x⟩ + ⟨x, y⟩ + ⟨y, y⟩
         = ‖x‖² + 2 Re(⟨x, y⟩) + ‖y‖²          using α + \overline{α} = 2 Re(α) for α ∈ F
         ≤ ‖x‖² + 2|⟨x, y⟩| + ‖y‖²
         ≤ ‖x‖² + 2‖x‖ · ‖y‖ + ‖y‖²            by Cauchy-Schwarz
         = (‖x‖ + ‖y‖)².  □

5.2. Orthogonality. You may recall in earlier courses in R2 and R3 , the following formula for the
dot product:
~x · ~y = k~xk · k~y k cos θ
where θ ∈ [0, π] is the angle between ~x and ~y . The most important case of this is when θ = π/2
(the angle is 90◦ , a right-angle), i.e., when ~x and ~y are orthogonal or perpendicular. This happens
precisely when the dot product ~x · ~y = 0 equals zero. We will generalize this to arbitrary inner
product spaces now.
Definition 5.13. Let V be an inner product space.
(1) Vectors x, y ∈ V are orthogonal (think “perpendicular”) if ⟨x, y⟩ = 0.
(2) A subset S ⊆ V is orthogonal if for every x, y ∈ S, if x ≠ y, then ⟨x, y⟩ = 0.
(3) A vector x ∈ V is a unit vector if kxk = 1.
(4) A subset S ⊆ V is orthonormal if S is orthogonal and consists entirely of unit vectors.
Remark 5.14. In general, it is favorable to work with unit vectors. We can often replace vectors
with unit vectors. This process is called normalization. More specifically:
(1) A set of vectors S = {v_1, v_2, . . .} is orthonormal iff ⟨v_i, v_j⟩ = 0 for all i ≠ j, and ⟨v_i, v_i⟩ = 1 for all i.
(2) If S = {v1 , v2 , . . .} is orthogonal, and a1 , a2 , a3 , . . . ∈ F are non-zero scalars, then the set
{a_1 v_1, a_2 v_2, . . .} is also orthogonal.
(3) If x ∈ V is such that x ≠ 0, then y := (1/‖x‖)·x is a unit vector. We say that y is obtained
from x by normalizing.
(4) By (2) and (3), given a set of nonzero orthogonal vectors, we can obtain an orthonormal
set by normalizing every vector in it.
Example 5.15. In R3 , {(1, 1, 0), (1, −1, 1), (−1, 1, 2)} is an orthogonal set of nonzero vectors, but
it is not orthonormal. By normalizing each of the vectors, we obtain an orthonormal set:
{ (1/√2)(1, 1, 0), (1/√3)(1, −1, 1), (1/√6)(−1, 1, 2) }.
5.3. Orthonormal bases and Gram-Schmidt orthogonalization.
Definition 5.16. Let V be an inner product space. A subset S of V is an orthonormal basis
for V if S is an ordered basis for V and S is orthonormal.
The same way that bases are the building blocks of a vector space, orthonormal bases are the building
blocks of inner product spaces.
Example 5.17. The standard ordered basis for F n is an orthonormal basis for the inner product
space F n (equipped with the standard inner product).
Example 5.18. The set    
{ (1/√5, 2/√5), (2/√5, −1/√5) }
is an orthonormal basis for R2 .
The following illustrates the utility of orthonormal/orthogonal bases/sets; it makes finding the
coefficients in linear combinations very easy.
Proposition 5.19. Let V be an inner product space and S = {v1 , . . . , vk } be an orthogonal subset
of V consisting of distinct nonzero vectors. If y ∈ span(S), then
y = Σ_{i=1}^{k} (⟨y, v_i⟩ / ‖v_i‖²) v_i.
In addition, if S is orthonormal, then
y = Σ_{i=1}^{k} ⟨y, v_i⟩ v_i.

Proof. Note first that since {v1 , . . . , vk } is orthogonal, if i ≠ j, then ⟨vi, vj⟩ = 0 (we'll use this below). Write y = Σ_{i=1}^{k} ai vi, where a1 , . . . , ak ∈ F. Then for j = 1, . . . , k we can “apply ⟨·, vj⟩” to “y = Σ_{i=1}^{k} ai vi” to get

  ⟨y, vj⟩ = ⟨Σ_{i=1}^{k} ai vi , vj⟩     because y = Σ_{i=1}^{k} ai vi
          = Σ_{i=1}^{k} ai ⟨vi, vj⟩     because ⟨·, vj⟩ is linear in the first variable
          = aj ⟨vj, vj⟩                 because ⟨vi, vj⟩ = 0 if i ≠ j
          = aj ‖vj‖².

Since vj ≠ 0 by assumption, we can solve for aj to get aj = ⟨y, vj⟩/‖vj‖².
We are now done. However, to belabor the point, we have shown for each j = 1, . . . , k, that aj = ⟨y, vj⟩/‖vj‖². Since j is just a dummy index, this means that for each i = 1, . . . , k, we have ai = ⟨y, vi⟩/‖vi‖². Plugging this expression for ai back into our original linear combination y = Σ_{i=1}^{k} ai vi yields the desired formula

  y = Σ_{i=1}^{k} (⟨y, vi⟩ / ‖vi‖²) vi.

In case S is orthonormal, each vi has unit length, i.e., ‖vi‖ = 1 for each i. Thus the above expression simplifies to

  y = Σ_{i=1}^{k} ⟨y, vi⟩ vi

in this case. □

The proof of Proposition 5.19 also implies that orthogonal sets of nonzero vectors are linearly
independent:
Corollary 5.20. Let V be an inner product space and let S be an orthogonal subset of V consisting of nonzero vectors. Then S is linearly independent.

Proof. Suppose v1 , . . . , vk ∈ S and Σ_{i=1}^{k} ai vi = 0. As in the proof of Proposition 5.19 with y = 0, if we apply ⟨·, vj⟩ to this linear combination, we deduce that aj = ⟨0, vj⟩/‖vj‖² = 0 for all j. Thus S is linearly independent. □
Example 5.21. By the previous corollary, the orthonormal set

  β = { (1/√2)(1, 1, 0), (1/√3)(1, −1, 1), (1/√6)(−1, 1, 2) }

from a previous example is linearly independent, and since it consists of three vectors in the 3-dimensional space R³, it is an orthonormal basis for R³. Let x = (2, 1, 3). Writing v1, v2, v3 for the three basis vectors, we can use Proposition 5.19 to compute:

  a1 = ⟨x, v1⟩ = 2 · (1/√2) + 1 · (1/√2) + 3 · 0 = 3/√2
  a2 = ⟨x, v2⟩ = 2 · (1/√3) − 1 · (1/√3) + 3 · (1/√3) = 4/√3
  a3 = ⟨x, v3⟩ = 2 · (−1/√6) + 1 · (1/√6) + 3 · (2/√6) = 5/√6

and thus

  x = (3/√2) v1 + (4/√3) v2 + (5/√6) v3.

This example shows that expressing an arbitrary vector in terms of an orthonormal basis is as easy as taking inner products. Now the question is: how do we obtain an orthonormal basis?
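As an illustrative check (a sketch, not part of the notes), the coefficients above can be reproduced numerically:

import numpy as np

v1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
v2 = np.array([1.0, -1.0, 1.0]) / np.sqrt(3)
v3 = np.array([-1.0, 1.0, 2.0]) / np.sqrt(6)
x = np.array([2.0, 1.0, 3.0])

# coefficients <x, v_i> with respect to the orthonormal basis
a = [x @ v for v in (v1, v2, v3)]
print(a)   # approximately [3/sqrt(2), 4/sqrt(3), 5/sqrt(6)]

# reconstructing x from these coefficients recovers the original vector
assert np.allclose(a[0] * v1 + a[1] * v2 + a[2] * v3, x)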
We will look at a special case first (see Figure 6.1 on Page 344 of Friedberg). Suppose {w1 , w2 } is
a linearly independent set of vectors in an inner product space V . Let W := Span(w1 , w2 )
• We want to replace {w1 , w2 } with an orthogonal set {v1 , v2 } which spans the same subspace.
• We keep w1 as is, i.e., set v1 := w1 .
• We want to subtract some multiple of w1 from w2 to create “a version” of w2 which is
orthogonal to w1 (we’ll call it v2 ).
• I.e., we want to find c ∈ F such that w1 and v2 := w2 − cw1 are orthogonal.
• Let's solve for c ∈ F :

  0 = ⟨v2, w1⟩ = ⟨w2 − cw1, w1⟩ = ⟨w2, w1⟩ − c⟨w1, w1⟩

and thus

  c = ⟨w2, w1⟩/‖w1‖²,   and   v2 = w2 − (⟨w2, w1⟩/‖w1‖²) w1.
The following gives the general case:
Gram-Schmidt Process 5.22. Let V be an inner product space, and S = {w1 , . . . , wn } a linearly independent set of vectors. Define S′ = {v1 , . . . , vn }, where v1 := w1 , and

  vk := wk − Σ_{j=1}^{k−1} (⟨wk, vj⟩ / ‖vj‖²) vj   for 2 ≤ k ≤ n.

Then S′ is an orthogonal set of non-zero vectors such that Span(S′) = Span(S).
Proof. We will proceed by induction on n, the number of vectors in S.
Base Case: If n = 1, then we are done, with S′ = {v1} = {w1} = S and v1 = w1 ≠ 0.
Induction step: Suppose n > 1. Assume that we know the statement of “Gram-Schmidt Process” is true for any linearly independent set of n − 1 vectors. Then we can apply the statement to the set S_{n−1} := {w1 , . . . , wn−1} to obtain a set S′_{n−1} = {v1 , . . . , vn−1} with the desired properties, i.e., such that

  vk := wk − Σ_{j=1}^{k−1} (⟨wk, vj⟩ / ‖vj‖²) vj   for 2 ≤ k ≤ n − 1,

and S′_{n−1} is an orthogonal set of nonzero vectors such that Span(S′_{n−1}) = Span(S_{n−1}).
We will show that S′ := S′_{n−1} ∪ {vn} = {v1 , . . . , vn−1 , vn} also has the desired properties, where

  (∗)   vn := wn − Σ_{j=1}^{n−1} (⟨wn, vj⟩ / ‖vj‖²) vj.

If vn = 0, then (∗) implies that wn ∈ Span(S′_{n−1}) = Span(S_{n−1}), contradicting the assumption that S is linearly independent; thus vn ≠ 0. Now note that for i = 1, . . . , n − 1, we have

  ⟨vn, vi⟩ = ⟨wn, vi⟩ − Σ_{j=1}^{n−1} (⟨wn, vj⟩ / ‖vj‖²) ⟨vj, vi⟩ = ⟨wn, vi⟩ − (⟨wn, vi⟩ / ‖vi‖²) ‖vi‖² = 0,

since the only nonzero term in the sum is the one with j = i (⟨vj, vi⟩ = 0 if j ≠ i). Thus S′ is an orthogonal set of nonzero vectors (we already know ⟨vi, vj⟩ behaves as expected for i, j ∈ {1, . . . , n − 1} since S′_{n−1} is an orthogonal set of nonzero vectors by the induction hypothesis). Each vk lies in Span(S) by construction, so Span(S′) ⊆ Span(S). However, S′ is linearly independent by Corollary 5.20 and has n vectors, so dim Span(S′) = n = dim Span(S), and therefore Span(S′) = Span(S). □


Example 5.23. Use Gram-Schmidt to orthogonalize the following subset of V = R4 (with the
standard inner product): {w1 , w2 , w3 } where w1 = (1, 0, 1, 0), w2 = (1, 1, 1, 1), and w3 = (0, 1, 2, 1).
First, we define v1 := w1 = (1, 0, 1, 0). Next we have
  v2 := w2 − (⟨w2, v1⟩/‖v1‖²) v1 = (1, 1, 1, 1) − (2/2)(1, 0, 1, 0) = (0, 1, 0, 1).

Furthermore,

  v3 := w3 − (⟨w3, v1⟩/‖v1‖²) v1 − (⟨w3, v2⟩/‖v2‖²) v2 = (0, 1, 2, 1) − (2/2)(1, 0, 1, 0) − (2/2)(0, 1, 0, 1) = (−1, 0, 1, 0).

Now we have an orthogonal set of vectors {v1 , v2 , v3 } which have the same span as {w1 , w2 , w3 }. Suppose we want to go one step further and get an orthonormal basis for this subspace. Then we just normalize our orthogonal set:

  u1 := (1/‖v1‖) v1 = (1/√2)(1, 0, 1, 0)
  u2 := (1/‖v2‖) v2 = (1/√2)(0, 1, 0, 1)
  u3 := (1/‖v3‖) v3 = (1/√2)(−1, 0, 1, 0)

Thus {u1 , u2 , u3 } is an orthonormal basis for the subspace Span(w1 , w2 , w3 ).
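The computation in this example is easy to automate. Below is a minimal sketch of the Gram-Schmidt Process 5.22 in NumPy (written here for illustration; it uses the real dot product, so the complex case would additionally need conjugates), followed by normalization:

import numpy as np

def gram_schmidt(vectors):
    # Process 5.22 for the real dot product on R^n
    vs = []
    for w in vectors:
        # subtract the components of w along the previously computed v_j
        proj = sum(((w @ v) / (v @ v)) * v for v in vs)
        vs.append(w - proj)
    return vs

w1 = np.array([1.0, 0.0, 1.0, 0.0])
w2 = np.array([1.0, 1.0, 1.0, 1.0])
w3 = np.array([0.0, 1.0, 2.0, 1.0])

v1, v2, v3 = gram_schmidt([w1, w2, w3])
print(v1, v2, v3)   # expect (1,0,1,0), (0,1,0,1), (-1,0,1,0)

u1, u2, u3 = (v / np.linalg.norm(v) for v in (v1, v2, v3))
G = np.array([[a @ b for b in (u1, u2, u3)] for a in (u1, u2, u3)])
assert np.allclose(G, np.eye(3))   # {u1, u2, u3} is orthonormal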

Corollary 5.24. Let V be a finite-dimensional inner product space. Then V has an orthonormal
basis β. Furthermore, if β = {v1 , . . . , vn } and x ∈ V , then x = Σ_{i=1}^{n} ⟨x, vi⟩ vi.
Proof. Let β₀ be an ordered basis for V (not necessarily orthogonal or orthonormal). Applying Gram-Schmidt 5.22, we obtain an orthogonal set β′ of nonzero vectors with Span(β₀) = Span(β′) = V . Normalizing each vector in β′, we obtain an orthonormal set β such that Span(β) = Span(β′) = V . β is linearly independent by Corollary 5.20, hence is an orthonormal basis. The rest follows from Proposition 5.19. □

Corollary 5.25. Let V be a finite-dimensional inner product space with an orthonormal basis β =
{v1 , . . . , vn }. Let T be a linear operator on V , and let A = [T ]β . Then for any i, j, Aij = ⟨T (vj ), vi ⟩.
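As a concrete illustration of this formula (a sketch; the operator and the use of NumPy are choices made here, not taken from the notes), take the orthonormal basis of R² from Example 5.18 as the columns of a matrix B, and let T be the operator whose matrix in the standard basis is M. The array of entries ⟨T (vj ), vi ⟩ coincides with [T ]β computed by change of basis (here B⁻¹ = Bᵀ since B has orthonormal columns):

import numpy as np

# orthonormal basis from Example 5.18, as columns of B
B = np.column_stack([np.array([1.0, 2.0]) / np.sqrt(5),
                     np.array([2.0, -1.0]) / np.sqrt(5)])

# an arbitrary linear operator T on R^2, written in the standard basis
M = np.array([[1.0, 2.0],
              [0.0, 3.0]])

# entry A_ij = <T(v_j), v_i>
A = np.array([[(M @ B[:, j]) @ B[:, i] for j in range(2)] for i in range(2)])

# compare with [T]_beta computed by change of basis: B^{-1} M B = B^T M B
assert np.allclose(A, B.T @ M @ B)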

5.4. Orthogonal complement. In this subsection, we generalize the relationship between a plane
in R3 which passes through the origin, and a line through the origin which is normal to that plane.

Definition 5.26. Let V be an inner product space and let S ⊆ V be a non-empty set of vectors. We define the orthogonal complement of S to be the set S⊥ (pronounced “S perp”) of all vectors in V that are orthogonal to all vectors in S, i.e.,

  S⊥ := { x ∈ V : ⟨x, y⟩ = 0 for all y ∈ S }.

Note that S⊥ ⊆ V is a subspace of V , even if S is not a subspace.

Example 5.27. (1) For V any inner product space, we always have {0}⊥ = V and V ⊥ = {0}.
(2) For V = R3 (with standard inner product), and S = {e3 }, then S ⊥ is the xy-plane, i.e.,
S ⊥ = Span(e1 , e2 ).

Proposition 5.28. Suppose V is an inner product space, W ⊆ V is a subspace such that dim(W ) <
∞. Let y ∈ V . Then there are unique vectors u ∈ W and z ∈ W ⊥ such that y = u+z. Furthermore,
if {v1 , . . . , vk } is an orthonormal basis for W , then u = Σ_{i=1}^{k} ⟨y, vi⟩ vi.
Proof. Let {v1 , . . . , vk } be an orthonormal basis for W and let u := Σ_{i=1}^{k} ⟨y, vi⟩ vi. Define z := y − u. Then u ∈ W and y = u + z. To show that z ∈ W⊥, it suffices to show that z is orthogonal to each vj (by conjugate linearity, it follows that z will be orthogonal to Span(v1 , . . . , vk ) = W ). Then for
j = 1, . . . , k we have

  ⟨z, vj⟩ = ⟨y − Σ_{i=1}^{k} ⟨y, vi⟩ vi , vj⟩
          = ⟨y, vj⟩ − Σ_{i=1}^{k} ⟨y, vi⟩ ⟨vi, vj⟩
          = ⟨y, vj⟩ − ⟨y, vj⟩
          = 0.
For uniqueness of u and z, suppose that y = u + z = u′ + z′ where u′ ∈ W and z′ ∈ W⊥. Then u − u′ = z′ − z ∈ W ∩ W⊥ = {0} (if w ∈ W ∩ W⊥, then ⟨w, w⟩ = 0, so w = 0). Thus u = u′ and z = z′. □
Corollary 5.29. Let V , W , y = u + z be as in Proposition 5.28. The vector u is the unique vector in W that is “closest” to y in the following sense: given any x ∈ W , ‖y − x‖ ≥ ‖y − u‖, and this inequality is an equality if and only if x = u.

Proof. Suppose x ∈ W . Then u − x ∈ W is orthogonal to z ∈ W⊥, so we have

  ‖y − x‖² = ‖u + z − x‖²
           = ‖(u − x) + z‖²
           = ‖u − x‖² + ‖z‖²    because u − x ⊥ z, see HW problem
           ≥ ‖z‖²
           = ‖y − u‖².

Next, suppose ‖y − x‖ = ‖y − u‖. Then the inequality above is an equality, so we have ‖u − x‖² + ‖z‖² = ‖z‖². Thus ‖u − x‖ = 0, so x = u. □
The vector u in the above Corollary 5.29 and Proposition 5.28 is called the orthogonal projection
of y on W .
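As a numerical sketch of the orthogonal projection (the data below is chosen for illustration and is not from the notes), take W to be the span of the orthonormal vectors u1, u2 found in Example 5.23 and project a vector y onto W:

import numpy as np

u1 = np.array([1.0, 0.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 1.0, 0.0, 1.0]) / np.sqrt(2)
y = np.array([3.0, -1.0, 4.0, 2.0])

# orthogonal projection of y on W = Span(u1, u2): u = <y, u1> u1 + <y, u2> u2
u = (y @ u1) * u1 + (y @ u2) * u2
z = y - u

# z lies in W^perp (it is orthogonal to both basis vectors of W) ...
assert abs(z @ u1) < 1e-12 and abs(z @ u2) < 1e-12

# ... and u is at least as close to y as any other vector of W we sample
rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.standard_normal(2) @ np.array([u1, u2])   # a random element of W
    assert np.linalg.norm(y - x) >= np.linalg.norm(y - u) - 1e-12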
Proposition 5.30. Suppose V is an inner product space with dim(V ) = n. Let S = {v1 , . . . , vk } ⊆
V be orthonormal. Then
(1) S can be extended to an orthonormal basis {v1 , . . . , vk , vk+1 , . . . , vn } for V .
(2) If W = Span(S), then S1 := {vk+1 , . . . , vn } is an orthonormal basis for W ⊥ .
(3) If W is any subspace of V , then dim(V ) = dim(W ) + dim(W ⊥ ).
Proof. (1) By Corollary 1.60, S can be extended to an ordered basis S′ = {v1 , . . . , vk , wk+1 , . . . , wn } for V . Now we apply the Gram-Schmidt Process to S′. The first k vectors will stay the same since they are already orthogonal to each other. Thus we get a new orthogonal ordered basis of V , and normalizing the last n − k vectors gives us an orthonormal basis {v1 , . . . , vk , vk+1 , . . . , vn }.
(2) S1 is linearly independent because it is contained in a basis. It is a subset of W⊥, since each vi with i > k is orthogonal to v1 , . . . , vk and hence to W = Span(S). So we only need to show that Span(S1) ⊇ W⊥. For any x ∈ V , we have x = Σ_{i=1}^{n} ⟨x, vi⟩ vi. If x ∈ W⊥, then ⟨x, vi⟩ = 0 for i = 1, . . . , k. Thus

  x = Σ_{i=k+1}^{n} ⟨x, vi⟩ vi ∈ Span(S1).

(3) Let W be a subspace of V . It is a finite-dimensional inner product space since V is, so W has an orthonormal basis {v1 , . . . , vk }. By (1) and (2), it follows that dim(V ) = n = k + (n − k) = dim(W ) + dim(W⊥). □
Example 5.31. Let W = Span({e1 , e2 }) in F³. Then x = (a, b, c) ∈ W⊥ iff 0 = ⟨x, e1⟩ = a and 0 = ⟨x, e2⟩ = b. Thus x = (a, b, c) ∈ W⊥ iff x = (0, 0, c) (i.e., a = b = 0), so W⊥ = Span({e3 }).
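For subspaces given by spanning vectors, an orthonormal basis of W⊥ can also be computed numerically. The sketch below (example data chosen here, not from the notes) uses the singular value decomposition in NumPy: the last rows of Vt form an orthonormal basis of the null space of A, which is exactly the orthogonal complement of the row space W, and the dimensions add up as in Proposition 5.30(3).

import numpy as np

# W = span of the rows of A (a subspace of R^4)
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))      # dim(W)
W_perp = Vt[rank:]                 # rows form an orthonormal basis of W^perp

# each basis vector of W^perp is orthogonal to every spanning vector of W
assert np.allclose(A @ W_perp.T, 0)

# dim(W) + dim(W^perp) = dim(R^4)
assert rank + W_perp.shape[0] == 4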

6. Appendix: non-linear algebra math
6.1. Sets. A set is a collection of mathematical objects. Mathematical objects can be almost
anything: numbers, other sets, functions, vectors, etc. For instance:
{2, 5, 7}, {3, 5, {8, 9}}, and {1, 3, 5, 7, . . .}
are all sets. A member of a set is called an element of the set. The membership relation
is denoted with the symbol “∈”, for instance, we write “2 ∈ {2, 5, 7}” (pronounced “2 is an element
of the set {2, 5, 7}”) to denote that the number 2 is a member of the set {2, 5, 7}. There are several
ways to describe a set:
• by explicitly listing the elements in that set, i.e., the set {2, 5, 7} is a set with three elements,
the number 2, the number 5, and the number 7.
• by specifying a “membership requirement” that determines precisely which objects are in
that set. For instance:

  { n ∈ Z : n is positive and odd }
is the set of all odd positive integers. The above set is pronounced “the set of all integers
n such that n is positive and odd”. The colon “:” is usually pronounced “such that”, and
the condition to the right of the colon is the membership requirement. Defining a set in
this way is sometimes referred to as using set-builder notation since you are describing how
the set is built (in the above example, the set is built by taking all integers and keeping the
ones that are positive and odd), instead of explicitly specifying which elements are in the
set.
Here are some famous sets you should be familiar with:
(1) The empty set is the set with no elements. It is denoted by ∅ or {}.
(2) The set of natural numbers is the set N = {1, 2, 3, 4, . . .} (in this class we do not consider
0 to be a natural number, in accordance with the textbook).
(3) The set of integers is the set Z = {0, 1, −1, 2, −2, 3, −3, . . .}.
(4) The set of rational numbers is the set

  Q = { k/l : k, l ∈ Z and l ≠ 0 }
of all fractions of integers.
(5) The set of real numbers, which we denote √ by R. This set of numbers contains all rational
numbers but it also contains numbers like 2 and π. In this class we will not discuss exactly
what the real numbers are (this is done in MATH131A), and we will assume familiarity with
the basic properties of R.
(6) The set of complex numbers, which we denote by C, is the set of all numbers of the form
{a + bi : a, b ∈ R}
where i² = −1.
Consider the following two sets:
{2, 5, 7} versus {2, 5, 7, 9}
Notice how every element of the first set is contained in the second set. This relationship is denoted
with the symbol ⊆, pronounced “is a subset of”. Specifically: given two sets A and B, we say that
A is a subset of B (written: A ⊆ B) if for every x ∈ A, it follows that x ∈ B. Note that for every
set A, it is always automatically true that ∅ ⊆ A, because there are no elements x ∈ ∅ such that
x ∉ A (since there are no elements in ∅, period). We have the following subset relations among our
famous sets from above:
∅ ⊆ N ⊆ Z ⊆ Q ⊆ R ⊆ C.
Question 6.1. Suppose you are in a situation where you are asked to prove A ⊆ B for some sets
A and B. What is the general strategy?
Answer. This depends on what the sets A and B actually are, but in general, you take an arbitrary
element x ∈ A and by some argument, conclude that x also has to be an element of B, i.e., that
x ∈ B. Here, “arbitrary” means that you are not allowed to assume anything specific about the
element x except that it belongs to A. We give an example from linear algebra. 
Example 6.2. Prove that

  { (a, b, c) ∈ R³ : a = 0 and b = 0 } ⊆ { (a, b, c) ∈ R³ : a + b = 0 }
Proof. Call the first set A and the second set B. We want to prove that A ⊆ B. Let (a, b, c) ∈ A
be arbitrary. This means that a = 0 and b = 0. We want to show that (a, b, c) is an element of B.
In order to be an element of B, it would have to be true that a + b = 0. However, since a = 0 and
b = 0, then we have a + b = 0 + 0 = 0. Thus (a, b, c) satisfies the membership requirement for B so
we can conclude that (a, b, c) ∈ B. Since (a, b, c) was an arbitrary element of A, we can conclude
that A ⊆ B. 
INCORRECT proof. We want to show that A ⊆ B. Let (a, b, c) be an element of A, for instance,
(0, 0, 2). The vector (0, 0, 2) is also in B since a + b = 0 + 0 = 0 for this vector. Thus A ⊆ B.
[Here, the crime is that we showed that a single specific vector from A is also in B. This does not
constitute a proof that all vectors from A are also elements of B.] 
Since sets are just collections of elements, we would like to say that
{2, 5, 7} is the same set as {7, 5, 2},
i.e., the only thing that determines a set completely is which elements are in it. In other words, we
say that two sets A and B are equal (written A = B) if they have the same elements.
Question 6.3. Suppose you are asked to prove that A = B where A and B are sets (possibly with
two different-seeming descriptions). How do you prove that A = B?
Answer. This means you have to prove two separate things:
(1) prove A ⊆ B, and
(2) prove B ⊆ A.
So this breaks down to two different proofs, each one then reduces to answering Question 6.1 for
those particular sets. 

6.2. Set operations - making new sets from old. Let A, B be sets. We define the union of
A and B to be the set of all elements which are members of either A or B, written:
A ∪ B := {x : x ∈ A or x ∈ B}
We define the intersection of A and B to be the set of all elements which are members of both A
and B, written:
A ∩ B := {x : x ∈ A and x ∈ B}
For example:
{1, 2, 3} ∪ {3, 4, 5} = {1, 2, 3, 4, 5} and {1, 2, 3} ∩ {3, 4, 5} = {3}
We define the (cartesian) product of A and B to be the set of all ordered pairs (a, b) such that a ∈ A and b ∈ B, written:

  A × B := { (a, b) : a ∈ A and b ∈ B }

For example,

  {1, 2} × {3, 4} = { (1, 3), (1, 4), (2, 3), (2, 4) }

We can also take the cartesian product of finitely many sets A1 , . . . , An :

  A1 × · · · × An := { (a1 , a2 , . . . , an ) : ai ∈ Ai for i = 1, . . . , n }

Given a set A and n ≥ 1, we define

  Aⁿ := A × · · · × A   (n times).
For example, this is how we define the vector space F n , where F is a field.
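These operations have direct counterparts in Python's built-in sets, which can be handy for experimenting with small examples (an illustrative sketch; the particular sets are arbitrary):

from itertools import product

A = {1, 2, 3}
B = {3, 4, 5}

print(A | B)   # union:        {1, 2, 3, 4, 5}
print(A & B)   # intersection: {3}

# cartesian product as a set of ordered pairs
print(set(product({1, 2}, {3, 4})))   # {(1, 3), (1, 4), (2, 3), (2, 4)}

# A^2, the 2-fold cartesian product of A with itself
print(set(product(A, repeat=2)))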
6.3. Functions. In this subsection we give a precise set-theoretic definition of a function.
Definition 6.4. A function is an ordered triple (f, A, B) such that
(1) A and B are sets, and
(2) f ⊆ A × B is a subset of the cartesian product A × B with the properties:
(a) for every a ∈ A, there is b ∈ B such that (a, b) ∈ f , and
(b) for every a ∈ A, if b, b′ ∈ B are such that (a, b), (a, b′) ∈ f , then b = b′.
Given a function (f, A, B), the set A is called the domain, and the set B is called the codomain.
We express this fact by writing f : A → B, which is read “f is a function from A to B”. Sometimes
we will just refer to a function f instead of (f, A, B) when the domain and codomain are clear from
context (or don’t matter).
In practice, we almost never think of a function as an ordered triple (f, A, B). Furthermore,
instead of writing “(a, b) ∈ f ”, we write instead “f (a) = b”. We think of a function as being a rule
which assigns to each a ∈ A, a unique element b ∈ B.
Example 6.5. Let f be the following function from A = {0, 1} to B = {2, 3, 4}:

  f = { (0, 3), (1, 2) }.
Usually, we would describe f instead by specifying in this case: f (0) = 3 and f (1) = 2.

Question 6.6. Suppose you have two functions f : A → B and g : A → B that have the same
domain and codomain. How do you prove that f = g?
Answer. To show that f = g, this means that f and g take the same values on every possible input.
Thus in a proof, you let x ∈ A be arbitrary, and then you somehow prove that f (x) = g(x). 
Definition 6.7. Given functions f : A → B and g : B → C, we define the composition of f and
g as the function g ◦ f : A → C defined by (g ◦ f )(a) := g(f (a)) for every a ∈ A. In other words,
the pair (a, c) ∈ g ◦ f if and only if there is b ∈ B such that f (a) = b and g(b) = c.
Perhaps the most important thing we can say about composition of functions at this level of
generality is the following:
Proposition 6.8 (Associativity of function composition). Suppose f : A → B, g : B → C and
h : C → D. Then we have the compositions g ◦ f : A → C and h ◦ g : B → D, and
h ◦ (g ◦ f ) = (h ◦ g) ◦ f
as functions A → D.
Proof. We want to show that two functions are equal. This means we need to show that they take the same values on all possible inputs. Let a ∈ A be arbitrary. Note that

  (h ◦ (g ◦ f ))(a) = h((g ◦ f )(a))      (definition of ◦)
                    = h(g(f (a)))         (definition of ◦)
                    = (h ◦ g)(f (a))      (definition of ◦)
                    = ((h ◦ g) ◦ f )(a)   (definition of ◦)

Thus h ◦ (g ◦ f ) = (h ◦ g) ◦ f as functions. □
Definition 6.9. Let A be a set. Then we define the identity function IA : A → A by IA (a) = a
for every a ∈ A.
Suppose f : A → B is a function. We say a function g : B → A is an inverse of f if f g = IB
and gf = IA . If f has an inverse, then we say f is invertible.
Note that if f is invertible, then the inverse of f is unique and we denote it by f −1 . Indeed,
suppose g, g′ : B → A are inverses of f . Then

  g = g IB = g(f g′) = (gf )g′ = IA g′ = g′.
Here are some basic facts about invertibility of functions you should know:
Lemma 6.10. Suppose f : A → B and g : B → C are functions. Then
(1) If f and g are both invertible, then
(a) gf is invertible, with (gf )−1 = f −1 g −1 ,
(b) f −1 is also invertible, with (f −1 )−1 = f .
(2) f is invertible if and only if f is a bijection (one-to-one and onto).
Proof. (1) For (a), it is enough to show that f −1 g −1 is an inverse to gf . Note that
(f −1 g −1 )(gf ) = f −1 (g −1 g)f = f −1 IB f = f −1 f = IA .
Likewise, we also have (gf )(f −1 g −1 ) = IB . This shows that gf is invertible and its (unique)
inverse is f −1 g −1 .
For (b), it is enough to show that f is an inverse to f −1 . We know that f f −1 = IB and
f −1 f = IA . Thus f −1 is invertible and its inverse is f , i.e., (f −1 )−1 = f .

(2) (⇒) For any b ∈ B, f f −1 (b) = IB (b) = b, thus f (f −1 (b)) = b, with f −1 (b) ∈ A, so f is
onto. Likewise, if f (a1 ) = f (a2 ), then f −1 f (a1 ) = f −1 f (a2 ), and so IA (a1 ) = IA (a2 ), i.e.,
a1 = a2 . Thus f is one-to-one.
We leave the (⇐) direction as an exercise. 
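Functions with finite domains can be modeled as Python dictionaries, which makes these facts easy to experiment with. Below is a small illustrative sketch (the particular functions are made up) checking part (1)(a) of the lemma and the defining property of the inverse:

# model functions with finite domains as Python dictionaries
f = {0: 'a', 1: 'b', 2: 'c'}          # f : {0,1,2} -> {'a','b','c'}
g = {'a': 'x', 'b': 'y', 'c': 'z'}    # g : {'a','b','c'} -> {'x','y','z'}

def compose(g, f):
    """Return g o f as a dictionary."""
    return {a: g[f[a]] for a in f}

def inverse(f):
    """Return the inverse of a bijection given as a dictionary."""
    return {b: a for a, b in f.items()}

gf = compose(g, f)

# (g o f)^{-1} = f^{-1} o g^{-1}   (Lemma 6.10(1)(a))
assert inverse(gf) == compose(inverse(f), inverse(g))

# composing f with its inverse gives the identity function on the domain
assert compose(inverse(f), f) == {a: a for a in f}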

6.4. Induction. In this subsection we will review the proof principle of induction. Occasionally
we will use induction to prove results in class. In this class², the set of natural numbers is the set
N = {1, 2, 3, 4, . . .}
of positive integers. We will not attempt to construct the natural numbers axiomatically, instead
we assume that they are already given and that we are familiar with their basic properties, for
instance, how the operations +, · and the ordering ≤ work with N. Here is an important basic
property about N which we can take for granted:
²In other textbooks, sometimes the natural numbers include zero, i.e., N = {0, 1, 2, 3, 4, . . .}. Also sometimes
people might write N0 = {0, 1, 2, 3, . . .}. We won’t do this here, but it’s good to know about it. The important thing
is that when you are communicating about the natural numbers with someone else, you are always on the same page
about whether you are including 0 or not.
Well-ordering Principle 6.11. Suppose S ⊆ N is such that S 6= ∅. Then S has a least element,
i.e., there is some a ∈ S such that for all b ∈ S, a ≤ b.
The Well-Ordering Principle of N gives us the following important proof principle about natural
numbers:
Principle of Induction 6.12. Suppose P (n) is a property that a natural number n may or may
not have. Suppose that
(1) P (1) holds (this is called the “base case for the induction”), and
(2) for every n ∈ N, if P (n) holds, then P (n + 1) holds (this is called the “inductive step”).
Then P (n) holds for every natural number n ∈ N.
Proof. Define the set:

  S := { n ∈ N : P (n) is false } ⊆ N.
Assume towards a contradiction that P (n) does not hold for every natural number n ∈ N. Thus
S 6= ∅. By the Well-Ordering Principle, the set S has a least element a. Since P (1) holds by
assumption, we know that 1 < a (so a − 1 ∈ N). Since a is the least element of S, then the
natural number a − 1 6∈ S, so P (a − 1) holds. By assumption (2), this implies P (a) holds, a
contradiction. 
Warning 6.13. In part (2) of the Principle of Induction, it does not say you have to prove P (n+1)
is true. It says you have to prove that the following implication holds:
 
P (n) is true =⇒ P (n + 1) is true
Here is the standard first example of a proof by induction:
Example 6.14. The equality

  1 + 2 + · · · + n = n(n + 1)/2

holds for all n ∈ N.

Proof. Let P (n) be the assertion:

  P (n) : “1 + 2 + · · · + n = (1/2) n(n + 1) is true.”

We will show that P (n) holds for all n ∈ N by induction on n.
First, we show that P (1) holds outright. This is easy because P (1) says “1 = (1/2) · 1 · 2”, which is obviously true.
Next, we will show that P (n) implies P (n + 1). Suppose P (n) holds, i.e.,

  1 + 2 + · · · + n = (1/2) n(n + 1).

We must now show that P (n + 1) also holds. To see this, add n + 1 to both sides of the above equality:

  1 + 2 + · · · + n + (n + 1) = (1/2) n(n + 1) + (n + 1)
                              = (n/2 + 1)(n + 1)
                              = (1/2) (n + 2)(n + 1)
                              = (1/2) (n + 1)((n + 1) + 1).

Thus P (n + 1) holds as well. □
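Induction establishes the formula for every n ∈ N at once; a computer can only test finitely many cases, but a quick numerical check is still a useful sanity test (a sketch, not a proof):

# spot check of 1 + 2 + ... + n == n(n + 1)/2 for small n
for n in range(1, 1001):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("formula checked for n = 1, ..., 1000")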
We also have the following variant of the Principle of Induction, which starts at some integer other
than 1 (for instance, if we want to start at 0):
Corollary 6.15 (Principle of Induction starting at N ). Let N ∈ Z and suppose P (n) is a property
that an integer n ≥ N may or may not have. Suppose that
(1) P (N ) holds.
(2) for every n ≥ N , if P (n) holds, then P (n + 1) holds.
Then P (n) holds for every natural number n ≥ N .
Proof. We will prove this by reducing it to the original Induction Principle by shifting. Let Q(n)
be the statement:
Q(n) : “P (n + N − 1) holds.”
Then (1) implies that Q(1) holds. Also, (2) implies that for every n ≥ 1, Q(n) ⇒ Q(n + 1).
Thus Q(n) is true for all n ≥ 1 by the Principle of Induction. In other words, P (n) is true for all
n ≥ N. 

References
1. Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence, Linear Algebra, Custom Edition for the University of California, Los Angeles, Prentice Hall, Inc., Upper Saddle River, NJ, 2003.
E-mail address: allen@math.ucla.edu

Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095
