UW Math 308 Notes
Math 308
S. Paul Smith
Department of Mathematics, Box 354350, University of Washington, Seattle, WA 98195
E-mail address: smith@math.washington.edu
Contents
Chapter 0. Introduction
1. What its all about
2. Some practical applications
3. The importance of asking questions
Chapter 2. Matrices
1. What is a matrix?
2. A warning about our use of the word vector
3. Row and column vectors = points with coordinates
4. Matrix arithmetic: addition and subtraction
5. The zero matrix and the negative of a matrix
6. Matrix arithmetic: multiplication
7. Pitfalls and warnings
8. Transpose
9. Some special matrices
10. Solving an equation involving an upper triangular matrix
11. Some special products
Chapter 7. Subspaces
1. The definition and examples
2. The row and column spaces of a matrix
3. Lines, planes, and translations of subspaces
4. Linguistic difficulties: algebra vs. geometry
CHAPTER 0
Introduction
1. What it's all about
The practical problem that motivates the subject of Linear Algebra, solving
systems of linear equations, is introduced in chapter 4. Although the problem
is concrete and easily understood, the methods and theoretical framework
required for a deeper understanding of it are abstract. Linear algebra can be
approached in a completely abstract fashion, untethered from the problems
that give rise to the subject.
Such an approach is daunting for the average student so we will strike a
balance between the practical, specific, abstract, and general.
Linear algebra plays a central role in almost all parts of modern technology.
Systems of linear equations involving hundreds, thousands, even
billions of unknowns are solved every second of every day in all corners of
the globe. One of the more fantastic uses is the way in which Google prioritizes
pages on the web. Every web page is assigned a page rank that measures
its importance. The page ranks are the unknowns in an enormous system
of linear equations. To find the page ranks one must solve that system of
linear equations. To handle such large systems of linear equations one uses
sophisticated techniques that are developed first as abstract results about
linear algebra.
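Here is a minimal Python sketch of the page-rank idea (mine, not part of the notes; the tiny 3-page web and its link matrix below are invented for illustration). The ranks r solve the linear system (I − dM)r = ((1 − d)/n)1, where d is a damping factor:

    import numpy as np

    d = 0.85                        # damping factor, a standard choice
    # Column-stochastic link matrix of a made-up 3-page web:
    # page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
    M = np.array([[0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.0]])
    n = M.shape[0]
    r = np.linalg.solve(np.eye(n) - d * M, (1 - d) / n * np.ones(n))
    print(r / r.sum())              # the normalized page ranks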
Systems of linear equations are rephrased in terms of matrix equations,
i.e., equations involving matrices. The translation is straightforward but
after mastering the basics of matrix arithmetic one must interpret those
basics in geometric terms. That leads to linear geometry and the language
of vectors and vector spaces.
Chapter 1 provides a brief account of linear geometry. As the name
suggests, linear geometry concerns lines. The material about lines in the
plane is covered in high school. Unless you know that material backwards
and forwards linear algebra will be impossible for you. Linear geometry
also involves higher dimensional analogues of lines, for example, lines and
planes in 3-space, or R3 as we will denote it. I am assuming you met that
material in a multivariable calculus course. Unless you know that material
backwards and forwards linear algebra will be impossible for you.1
1I expect you to know the material about linear geometry in R2 and R3. By that I
don't mean that you have simply passed a course where that material is covered. I expect
you to have understood and mastered that material and that you retain that mastery today. If
After reviewing linear geometry we will review basic facts about matrices
in chapter 2. That will be familiar to some students. Very little from chapter
2 is required to understand the initial material on systems of linear equations
in chapter 4.
2. Some practical applications
(1) solving systems of linear equations
(2) birth-death processes, Leslie matrices (see Ch. 6 in Difference Equations: From Rabbits to Chaos), Lotka-Volterra
(3) Google page rank
(4) X-ray tomography, MRI: to compute the density of tissue, a slice
through the body is pixellated and each pixel has a density to be
inferred/computed from the input/output of sending various rays
through the body. Generally there are many more measurements
than unknowns, so one almost always obtains an inconsistent system
but wants the least-squares solution.
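To make the least-squares remark concrete, here is a minimal numpy sketch (mine; the measurement data below are made up):

    import numpy as np

    # Hypothetical tomography-style data: 4 ray measurements, 2 unknown densities.
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])
    b = np.array([1.1, 0.9, 2.2, 2.8])   # noisy, so Ax = b has no exact solution
    x, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
    print(x)                              # the densities minimizing ||Ax - b||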
3. The importance of asking questions
You will quickly find that this is a difficult course. Everyone finds it
difficult. The best advice I can give you is to ask questions. When you read
something you don't understand don't just skim over it thinking you will
return to it later and then fail to return to it. Ask, ask, ask. The main
reason people fail this course is because they are afraid to ask questions.
They think asking a question will make them look stupid. Smart people
aren't afraid to ask questions. No one has all the answers.
The second most common reason people fail this course is that they wait
till the week before the exam to start thinking seriously about the material.
If you are smart you will try to understand every topic in this course at the
time it first appears.
you are not in command of the material in chapter 1 master it as soon as possible. That
is essential if you want to pass this course.
CHAPTER 1
Linear Geometry
1. Linearity
The word linear comes from the word line.
The geometric aspect of linear algebra involves lines, planes, and their
higher dimensional analogues: e.g., lines in the plane, lines in 3-space, lines
in 4-space, planes in 3-space, planes in 4-space, 3-planes in 4-space, 5-planes
in 8-space, and so on, ad infinitum. Such things form the subject matter of
linear geometry.
Curvy things play no role in linear algebra or linear geometry. We ignore
circles, spheres, ellipses, parabolas, etc. All is linear.
1.1. What is a line? You already know what a line is. More accurately, you know something about lines in the plane, R2 , or in 3-space, R3 .
In this course, you need to know something about lines in n-space, Rn .
1.2. What is Rn ? Rn is our notation for the set of all n-tuples of
real numbers. We run out of words after pair, triple, quadruple, quintuple,
sextuple, ... so invent the word n-tuple to refer to an ordered sequence of n
numbers where n can be any positive integer.
For example, (8,7,6,5,4,3,2,1) is an 8-tuple. It is not the same as the 8-tuple
(1,2,3,4,5,6,7,8). We think of these 8-tuples as labels for two different
points in R8. We call the individual entries in (8,7,6,5,4,3,2,1) the coordinates
of the point (8,7,6,5,4,3,2,1); thus 8 is the first coordinate, 7 the second
coordinate, etc.
Often we need to speak about a point in Rn when we don't know its
coordinates. In that case we will say something like this: let (a1 , a2 , . . . , an )
be a point in Rn . Here a1 , a2 , . . . , an are some arbitrary real numbers.
1.3. The origin. The origin is a special point in Rn : it is the point
having all its coordinates equal to 0. For example, (0,0) is the origin in R2 ;
(0,0,0) is the origin in R3 ; (0,0,0,0) is the origin in R4 ; and so on. We often
write 0 for the origin.
There is a special notation for the set of all points in Rn except the origin,
namely Rn − {0}. We use the minus symbol because Rn − {0} is obtained by
taking 0 away from Rn. Here "taking away" is synonymous with "removing".
This notation permits a useful brevity of expression: "suppose p ∈ Rn − {0}"
means the same thing as "suppose p is a point in Rn that is not 0".
1.5. Multiplying a point in Rn by a number. First, some examples. Consider the point p = (1, 1, 2, 3) in R4 . Then 2p = (2, 2, 4, 6),
5p = (5, 5, 10, 15), 3p = (3, 3, 6, 9); more generally, if t is any
real number, tp = (t, t, 2t, 3t). In full generality, if λ is a real number and
p = (a1, . . . , an) is a point in Rn, λp denotes the point (λa1, . . . , λan); thus,
the coordinates of λp are obtained by multiplying each coordinate of p by
λ. We call λp a multiple of p.
1.6. Lines. We will now define what we mean by the word line. If p is
a point in Rn that is not the origin, the line through the origin in the direction
p is the set of all multiples of p. Formally, if we write Rp to denote that
line,1
Rp = {λp | λ ∈ R}.
If q is another point in Rn, the line through q in the direction p is
{q + λp | λ ∈ R};
we often denote this line by q + Rp.
When we speak of a line in Rn we mean a subset of Rn of the form
q + Rp. Thus, if I say L is a line in Rn I mean there is a point p ∈ Rn − {0}
and a point q ∈ Rn such that L = q + Rp. The p and q are not uniquely
determined by L.
Proposition 1.1. The lines L = q + Rp and L′ = q′ + Rp′ are the same
if and only if p and p′ are multiples of each other and q − q′ lies on the line
Rp; i.e., if and only if Rp = Rp′ and q − q′ ∈ Rp.
Proof.2 (⇒) Suppose the lines are the same, i.e., L = L′. Since q′ is on the
line L′ it is equal to q + λp for some λ ∈ R. Since q is on L, q = q′ + µp′ for
1I like the notation Rp for the set of all multiples of p because it is similar to the
notation 2p: when x and y are numbers we denote their product by xy, the juxtaposition
of x and y. So Rp consists of all products λp where λ ranges over the set of all real
numbers.
2To show two sets X and X′ are the same one must show that every element of X is
in X′, which is written as X ⊆ X′, and that every element of X′ is in X.
To prove an "if and only if" statement one must prove each statement implies the
other; i.e., the statement "A if and only if B" is true if the truth of A implies the truth of
B and the truth of B implies the truth of A. Most of the time, when I prove a result of
the form "A if and only if B" I will break the proof into two parts: I will begin the proof
that A implies B by writing the symbol (⇒) and will begin the proof that B implies A
by writing the symbol (⇐).
So, take a deep breath, gather your courage, and plunge into the bracing
waters. I will be there to help you if you start sinking. But I can't be
everywhere at once, and I won't always recognize whether you are waving
or drowning. Shout "help" if you are sinking. I'm not telepathic.
1.8. Basic properties of Rn. Here are some things we will prove:
(1) If p and q are different points in Rn there is one, and only one, line
through p and q. We denote it by pq.
(2) If L is a line through the origin in Rn, and q and q′ are points in
Rn, then either q + L = q′ + L or (q + L) ∩ (q′ + L) = ∅.
(3) If L is a line through the origin in Rn, and q and q′ are points in
Rn, then q + L = q′ + L if and only if q − q′ ∈ L.
(4) If p and q are different points in Rn, then pq is the line through p
in the direction q − p.
(5) If L and L′ are lines in Rn such that L ⊆ L′, then L = L′. This is
a consequence of (2).
If L is a line through the origin in Rn, and q and q′ are points in Rn, we say
that the lines q + L and q′ + L are parallel.
Don't just accept these as facts to be memorized. Learning is more than
knowing: learning involves understanding. To understand why the above
facts are true you will need to look at how they are proved.
Proposition 1.2. If p and q are different points in Rn there is one, and
only one, line through p and q, namely the line through p in the direction
q − p.
Proof. The line through p in the direction q − p is p + R(q − p). The line
p + R(q − p) contains p because p = p + 0 · (q − p). It also contains q because
q = p + 1 · (q − p). We have shown that the line through p in the direction
q − p passes through p and q.
It remains to show that this is the only line that passes through both p
and q. To that end, let L be any line in Rn that passes through p and q. We
will use Proposition 1.1 to show that L is equal to p + R(q − p).
By our definition of the word line, L = q′ + Rp′ for some points q′ ∈ Rn
and p′ ∈ Rn − {0}. By Proposition 1.1, q′ + Rp′ is equal to p + R(q − p) if
Rp′ = R(q − p) and q′ − p ∈ Rp′.
We will now show that Rp′ = R(q − p) and q′ − p ∈ Rp′.
Since p ∈ L and q ∈ L, there are numbers λ and µ such that p = q′ + λp′,
which implies that q′ − p ∈ Rp′, and q = q′ + µp′. Thus p − q = q′ + λp′ −
q′ − µp′ = (λ − µ)p′. Because p ≠ q, p − q ≠ 0; i.e., (λ − µ)p′ ≠ 0. Hence
λ − µ ≠ 0. Therefore p′ = (λ − µ)^(−1)(p − q). It follows that every multiple of
p′ is a multiple of p − q and every multiple of p − q is a multiple of p′. Thus
Rp′ = R(q − p). We already observed that q′ − p ∈ Rp′ so Proposition 1.1
tells us that q′ + Rp′ = p + R(q − p), i.e., L = p + R(q − p).
1.8.1. Notation. We will write pq for the unique line in Rn that passes
through the points p and q (when p 6= q). Proposition 1.2 tells us that
pq = p + R(q p).
Proposition 1.1 tells us that pq has infinitely many other similar descriptions.
For example, pq = q + R(p − q) and pq = q + R(q − p).
1.8.2. Although we can't picture R8 we can ask questions about it. For
example, does the point (1,1,1,1,1,1,1,1) lie on the line through the points
(8,7,6,5,4,3,2,1) and (1,2,3,4,5,6,7,8)? Why? If you can answer this you
understand what is going on. If you can't you don't, and should ask a
question.
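Here is a small numerical sketch (mine, not part of the notes) you can use to check your answer: a point lies on pq exactly when it has the form q + t(p − q) for some t.

    import numpy as np

    p = np.array([8, 7, 6, 5, 4, 3, 2, 1], dtype=float)
    q = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    target = np.ones(8)

    # Points on pq have the form q + t(p - q); solve the first coordinate
    # for t, then check whether that t works in every coordinate.
    t = (target[0] - q[0]) / (p[0] - q[0])
    print(np.allclose(q + t * (p - q), target))   # membership test for (1,...,1)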
1.9. Parametric description of lines. You already know that lines
in R3 can be described by a pair of equations or parametrically. For example,
the line given by the equations
(1-1)   x + y + z = 4
        x + 3y + 2z = 9
is the set of points of the form (t, 1 + t, 3 − 2t) as t ranges over all real
numbers; in set-theoretic notation, the line is
(1-2)   {(t, 1 + t, 3 − 2t) | t ∈ R};
we call t a parameter; we might say that t parametrizes the line or that the
line is parametrized by t.
The next result shows that every line in Rn can be described parametrically.
Proposition 1.3. If p and q are distinct points in Rn, then
(1-3)   pq = {tp + (1 − t)q | t ∈ R}.
Proof. We must show that the sets pq and {tp + (1 − t)q | t ∈ R} are the
same. We do this by showing each set contains the other one.
Let p′ be a point on pq. Since pq = p + R(q − p), p′ = p + λ(q − p) for
some λ ∈ R. Thus, if t = 1 − λ, then
p′ = (1 − λ)p + λq = tp + (1 − t)q.
Therefore pq ⊆ {tp + (1 − t)q | t ∈ R}. If t is any number, then
tp + (1 − t)q = q + t(p − q) ∈ q + R(p − q) = pq
so {tp + (1 − t)q | t ∈ R} ⊆ pq. Thus, pq = {tp + (1 − t)q | t ∈ R}.
1.9.1. Dont forget this remark. One of the key ideas in understanding
systems of linear equations is moving back and forth between the parametric
description of linear subsets of Rn and the description of those linear subsets
as solution sets to systems of linear equations. For example, (1-2) is a
parametric description of the set of solutions to the system (1-1) of two
linear equations in the unknowns x, y, and z. In other words, every solution
to (1-1) is obtained by choosing a number t and then taking x = t, y = 1 + t,
and z = 3 − 2t.
1.9.2. Each line L in Rn has infinitely many parametric descriptions.
1.9.3. Parametric descriptions of higher dimensional linear subsets of
Rn .
1.10. Linear subsets of Rn . Let L be a subset of Rn . We say that L
is linear if
(1) it is a point or
(2) whenever p and q are different points on L every point on the line
pq lies on L.
1.11. What is a plane? Actually, it would be better if you thought
about this question. How would you define a plane in Rn ? Look at the
definition of a line for inspiration. We defined a line parametrically, not by
a collection of equations. Did we really need to define a line in two steps,
i.e., first defining a line through the origin, then defining a general line?
1.12. Hyperplanes. The dot product of two points u = (u1, . . . , un)
and v = (v1, . . . , vn) in Rn is the number
u · v = u1 v1 + · · · + un vn.
Notice that u · v = v · u. If u and v are non-zero and u · v = 0 we say u and
v are orthogonal. If n is 2 or 3, the condition that u · v = 0 is equivalent to
the condition that the line 0u is perpendicular to the line 0v.
Let u be a non-zero point in Rn and c any number. The set
H := {v ∈ Rn | u · v = c}
is called a hyperplane in Rn.
Proposition 1.4. If p and q are different points on a hyperplane H,
then all the points on the line pq lie on H.
Proof. Since H is a hyperplane there is a point u ∈ Rn − {0} and a number
c such that H = {v ∈ Rn | u · v = c}.
x^2 + y^2 = z^2   and   x + y + z = 1
2. Lines in R2
The line through the points p = (a, b) and q = (c, d) is given by the equation
(2-1)   (c − a)(y − b) = (d − b)(x − a).
Plugging in (x, y) = (c, d) we
get a true equality, namely (c − a)(d − b) = (d − b)(c − a). Thus, the line
given by equation (2-1) is the line through p and q which we denote by pq.
Let's write L for the set of points {tp + (1 − t)q | t ∈ R}. Less elegantly,
but more explicitly,
L = {(ta + (1 − t)c, tb + (1 − t)d) | t ∈ R}.
The proposition claims that L = pq. To prove this we must show that every
point in L belongs to pq and every point on pq belongs to L.
We first show that L ⊆ pq, i.e., every point in L belongs to pq. A point
in L has coordinates (ta + (1 − t)c, tb + (1 − t)d). When we plug these
coordinates into (2-1) we obtain
(c − a)(tb + (1 − t)d − b) = (d − b)(ta + (1 − t)c − a),
i.e., (c − a)(1 − t)(d − b) = (d − b)(1 − t)(c − a). Since this really is an equality
we have shown that (ta + (1 − t)c, tb + (1 − t)d) lies on pq. Therefore L ⊆ pq.
To prove the opposite inclusion, take a point (x, y) that lies on pq; we
will show that (x, y) is in L. To do that we must show there is a number
t such that (x, y) = (ta + (1 − t)c, tb + (1 − t)d). Because (x, y) lies on pq,
(c − a)(y − b) = (d − b)(x − a). This implies that
(y − d)/(b − d) = (x − c)/(a − c).
Let's call this number t. If a = c we define t to be the number (y − d)/(b − d);
if b = d, we define t to be the number (x − c)/(a − c); we can't have a = c
and b = d because then (a, b) = (c, d) which violates the hypothesis that
(a, b) and (c, d) are different points.
In any case, we now have x − c = t(a − c) and y − d = t(b − d). We
can rewrite these as x = ta + (1 − t)c and y = tb + (1 − t)d. Thus (x, y) =
(ta + (1 − t)c, tb + (1 − t)d), i.e., (x, y) belongs to the set L. This completes
the proof that pq ⊆ L.
2.5. Remarks on the previous proof. You might ask "How did you
know that (2-1) is the line through (a, b) and (c, d)?"
Draw three points (a, b), (c, d), and (x, y), on a piece of paper. Draw the
line segment from (a, b) to (c, d), and the line segment from (c, d) to (x, y).
Can you see, or convince yourself, that (x, y) lies on the line through (a, b)
and (c, d) if and only if those two line segments have the same slope? I hope
so because that is the key idea behind the equation (2-1). Let me explain.
The slopes of the two lines are
(d − b)/(c − a)   and   (y − b)/(x − a).
Thus, (x, y) lies on the line through (a, b) and (c, d) if and only if
(d − b)/(c − a) = (y − b)/(x − a).
It is possible that a might equal c in which case the expression on the left
of the previous equation makes no sense. To avoid that I cross multiply and
rewrite the required equality as (d − b)(x − a) = (c − a)(y − b).
Of course, there are other reasonable ways to write the equation for the
line pq but I like mine because it seems elegant and fits in nicely with the
geometry of the situation, i.e., the statements I made about slopes.
2.6. About proofs. A proof is a narrative argument intended to convince other people of the truth of a statement.
By narrative I mean that the argument consists of sentences and paragraphs. The sentences should be grammatically correct and easy to understand. Those sentences will contain words and mathematical symbols.
Not only must the words obey the rules of grammar, so too must the symbols. Thus, there are two kinds of grammatical rules: those that govern the
English language and those that govern mathematical phrases.
By saying "other people" I want to emphasize that you are writing for
someone else, the reader. Make it easy for the reader. It is not enough that
you understand what your sentences say. Others must understand them too.
Indeed, the quality of your proof is measured by the effect it has on other
people. If your proof does not convince other people you have done a poor
job.
If your proof does not convince others it might be that your proof is
incorrect. You might have fooled yourself. If your proof does not convince
others it might be because your argument is too long and convoluted. In
that case, you should re-examine your proof and try to make it simpler.
Are parts of your narrative argument unnecessary? Have you said the same
thing more than once? Have you used words or phrases that add nothing to
your argument? Are your sentences too long? Is it possible to use shorter
sentences? Short sentences are easier to understand. Your argument might
fail to convince others because it is too short. Is some essential part of the
argument missing? Is it clear how each statement follows from the previous
ones? Have you used an undefined symbol, word, or phrase?
When I figure out how to prove something I usually rewrite the proof
many times trying to make it as simple and clear as possible. Once I have
convinced myself that my argument works I start a new job, that of polishing
and refining the argument so it will convince others.
Even if you follow the rules of grammar you might write nonsense. A
famous example is "Red dreams sleep furiously." For some mathematical
nonsense, consider the sentence "The derivative of a triangle is perpendicular
to its area."
3. Points, lines, and planes in R3
3.1. R3 denotes the set of ordered triples of real numbers. Formally,
R3 := {(a, b, c) | a, b, c ∈ R}.
CHAPTER 2
Matrices
You can skip this chapter if you want, start reading at chapter 4, and
return to this chapter whenever you need to.
Matrices are an essential part of the language of linear algebra and linear
equations. This chapter isolates this part of the language so you can easily
refer back to it when you need to.
1. What is a matrix?
1.1. An m × n matrix (we read this as "an m-by-n matrix") is a rectangular
array of mn numbers arranged into m rows and n columns. For
example,
(1-1)   A := [ 1 3 0 ]
             [ 4 5 2 ]
is a 2 × 3 matrix. The prefix m × n is called the size of the matrix.
1.2. We use upper case letters to denote matrices. The numbers in the
matrix are called its entries. The entry in row i and column j is called the
ij-th entry, and if the matrix is denoted by A we often write Aij for its ij-th
entry. The entries in the matrix A in (1-1) are
A11 = 1, A12 = 3, A13 = 0,
A21 = 4, A22 = 5, A23 = 2.
In this example, we call 5 the 22-entry; read this as "two-two entry". Likewise,
the 21-entry ("two-one entry") is 4.
We sometimes write A = (aij) for the matrix whose ij-th entry is aij. For
example, we might write
A = [ a11 a12 a13 ]
    [ a21 a22 a23 ].
Notice that aij is the entry in row i and column j.
1.3. Equality of matrices. Two matrices A and B are equal if and
only if they have the same size and Aij = Bij for all i and j.
For example, (4 5 6) + (−3 2 −1) = (1 7 5).
Abstractly, if (aij) and (bij) are matrices of the same size, then
(aij) + (bij) = (aij + bij)
and
(aij) − (bij) = (aij − bij).
6. Matrix arithmetic: multiplication
The ij-th entry of a product AB is the i-th row of A times the j-th
column of B. For example, if
A = [ 1  2 ]    and    B = [ 5 4 3 ]
    [ 0 −1 ]               [ 2 1 2 ]
then
(AB)11 = (1 2) · (5, 2) = 1 × 5 + 2 × 2 = 9
(AB)12 = (1 2) · (4, 1) = 1 × 4 + 2 × 1 = 6
(AB)13 = (1 2) · (3, 2) = 1 × 3 + 2 × 2 = 7
(AB)21 = (0 −1) · (5, 2) = 0 × 5 − 1 × 2 = −2
(AB)22 = (0 −1) · (4, 1) = 0 × 4 − 1 × 1 = −1
(AB)23 = (0 −1) · (3, 2) = 0 × 3 − 1 × 2 = −2.
In short,
AB = [  9  6  7 ]
     [ −2 −1 −2 ].
In general, if A is an m × n matrix and B is an n × p matrix, then AB is
the m × p matrix whose ij-th entry is
(6-1)   (AB)ij = Ai1 B1j + Ai2 B2j + · · · + Ain Bnj,
the product of the i-th row of A with the j-th column of B.
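Here is a short Python sketch (mine, not from the notes) that implements (6-1) verbatim and checks it against the example above, using numpy only for verification:

    import numpy as np

    def matmul(A, B):
        # (AB)ij = sum over t of A[i][t] * B[t][j], straight from (6-1)
        m, n, p = len(A), len(B), len(B[0])
        return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(p)]
                for i in range(m)]

    A = [[1, 2], [0, -1]]
    B = [[5, 4, 3], [2, 1, 2]]
    print(matmul(A, B))                                           # [[9, 6, 7], [-2, -1, -2]]
    print(np.allclose(matmul(A, B), np.array(A) @ np.array(B)))   # True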
In other words every entry on the NW-SE diagonal is 1 and all other entries
are 0. For example, the 4 × 4 identity matrix is
[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 1 0 ]
[ 0 0 0 1 ].
The key property of the n × n identity matrix I is that if A is any m × n
matrix, then AI = A and, if B is any n × p matrix, IB = B. You should
check this with a couple of examples to convince yourself it is true. Even
better, give a proof of it for all n by using the definition of the product. For
example, if A is an m × n matrix, then
(AI)ij = Ai1 I1j + · · · + Ain Inj = Aij
where we used the fact that Ikj is zero except when k = j, i.e., the only
non-zero term in the sum is Aij Ijj = Aij.
6.6. Multiplication is associative. Another way to see whether you
understand the definition of the product in (6-1) is to try using it to prove
that matrix multiplication is associative, i.e., that
(6-2)
(AB)C = A(BC)
for any three matrices for which this product makes sense. This is an important
property, just as it is for numbers: if a, b, and c are real numbers
then (ab)c = a(bc); this allows us to simply write abc, or when there are four
numbers abcd, because we know we get the same answer no matter how we
group the numbers in forming the product. For example,
(2 × 3) × (4 × 5) = 6 × 20 = 120
and
((2 × 3) × 4) × 5 = (6 × 4) × 5 = 24 × 5 = 120.
Formula (6-1) only defines the product of two matrices so to form the
product of three matrices A, B, and C, of sizes k × ℓ, ℓ × m, and m × n,
respectively, we must use (6-1) twice. But there are two ways to do that:
first compute AB, then multiply on the right by C; or first compute BC,
then multiply on the left by A. The formula (AB)C = A(BC) says that
those two alternatives produce the same matrix. Therefore we can write
ABC for that product (no parentheses!) without ambiguity. Before we can
do that we must prove (6-2) by using the definition (6-1) of the product of
two matrices.
6.7. Try using (6-1) to prove (6-2). To show two matrices are equal you
must show their entries are the same. Thus, you must prove the ij-th entry of
(AB)C is equal to the ij-th entry of A(BC). To begin, use (6-1) to compute
((AB)C)ij. I leave you to continue.
This is a test of your desire to pass the course. I know you want to pass
the course, but do you want that enough to do this computation?1
Can you use the associative law to prove that if J is an n × n matrix
with the property that AJ = A and JB = B for all m × n matrices A and
all n × p matrices B, then J = In?
6.8. The distributive law. If A, B, and C, are matrices of sizes such
that the following expressions make sense, then
(A + B)C = AC + BC.
6.9. The columns of the product AB. Suppose the product AB
exists. It is sometimes useful to write Bj for the j-th column of B and write
B = [B1, . . . , Bn].
The columns of AB are then given by the formula
AB = [AB1, . . . , ABn].
You should check this assertion.
6.10. The product Ax. Suppose A = [A1, A2, . . . , An] is a matrix
with n columns and x = (x1, x2, . . . , xn)T is a column vector. Then
Ax = x1 A1 + x2 A2 + · · · + xn An.
This last equation is one of the most important equations in the course.
Tattoo it on your body. If you don't know this equation you will probably
fail the course.
Try to prove it yourself. First do two simple examples, A a 2 × 3 matrix,
and A a 3 × 2 matrix. Then try the general case, A an m × n matrix.
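Before proving it, you can at least test it numerically. A small sketch (mine), with a randomly generated A and x:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 5, size=(3, 4)).astype(float)
    x = rng.integers(-5, 5, size=4).astype(float)

    column_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))
    print(np.allclose(A @ x, column_view))   # True: Ax = x1*A1 + ... + xn*An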
1Perhaps I will ask you to prove that (AB)C = A(BC) on the midterm; would you
6.11. Multiplication by a scalar. There is another kind of multiplication
we can do. Let A be an m × n matrix and let c be any real number.
We define cA to be the m × n matrix obtained by multiplying every entry
of A by c. Formally, (cA)ij = cAij for all i and j. For example,
3 [ 1 3 0 ]  =  [  3  9 0 ]
  [ 4 5 2 ]     [ 12 15 6 ].
We also define the product Ac by declaring that it is equal to cA, i.e.,
Ac = cA.
7. Pitfalls and warnings
7.1. Warning: multiplication is not commutative. If a and b are
numbers, then ab = ba. However, if A and B are matrices AB need not
equal BA even if both products exist. For example, if A is a 2 × 3 matrix
and B is a 3 × 2 matrix, then AB is a 2 × 2 matrix whereas BA is a 3 × 3
matrix.
Even if A and B are square matrices of the same size, which ensures
that AB and BA have the same size, AB need not equal BA. For example,
(7-1)   [ 0 0 ] [ 1 0 ]  =  [ 0 0 ]
        [ 1 0 ] [ 0 0 ]     [ 1 0 ]
but
(7-2)   [ 1 0 ] [ 0 0 ]  =  [ 0 0 ]
        [ 0 0 ] [ 1 0 ]     [ 0 0 ].
8. Transpose
The transpose of an m × n matrix A is the n × m matrix AT whose
ij-th entry is (AT)ij := Aji; the rows of A become the columns of AT. For
example,
[ 1 2 3 ]T   [ 1 4 ]
[ 4 5 6 ]  = [ 2 5 ]
             [ 3 6 ].
Notice that (AT)T = A.
Check that
(AB)T = BT AT.
A lot of students get this wrong: they think that (AB)T is equal to AT BT.
(Even when the product AB makes sense, AT BT need not make sense.)
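A quick numerical check (mine, not from the notes) of the rule with random rectangular matrices:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 4))
    print(np.allclose((A @ B).T, B.T @ A.T))   # True
    # A.T @ B.T would be (3 x 2) times (4 x 3): the sizes do not even match.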
9. Some special matrices
We have already met two special matrices, the identity and the zero
matrix. They behave like the numbers 1 and 0 and their great importance
derives from that simple fact.
9.1. Symmetric matrices. We call A a symmetric matrix if AT =
A. A symmetric matrix must be a square matrix. A symmetric matrix is
symmetric about its main diagonal. For example
0 1 2 3
1 4 5 6
2 5 7 8
3 6 8 9
is symmetric.
A square matrix A is symmetric if Aij = Aji for all i and j. For example,
the matrix
[ 1 2 5 ]
[ 2 0 4 ]
[ 5 4 9 ]
is symmetric. The name comes from the fact that the entries below the
diagonal are the same as the corresponding entries above the diagonal where
"corresponding" means, roughly, "the entry obtained by reflecting in the
diagonal". By the diagonal of the matrix above I mean the line from the top
left corner to the bottom right corner that passes through the numbers 1,
0, and 9.
If A is any square matrix show that A + AT is symmetric.
Is AAT symmetric?
Use the definition of multiplication to show that (AB)T = B T AT .
2Printers dislike blank space because it requires more paper. They also dislike black
9.3. Upper triangular matrices. A square matrix is upper triangular
if all its entries below the diagonal are zero. For example,
[ 0 1 2 3 ]
[ 0 4 5 6 ]
[ 0 0 7 8 ]
[ 0 0 0 9 ]
is upper triangular.
9.4. Lower triangular matrices. A square matrix is lower triangular
if all its entries above the diagonal are zero. For example,
1 0 0
4 5 0
7 8 9
is lower triangular. The transpose of a lower triangular matrix is upper
triangular and vice-versa.
10. Solving an equation involving an upper triangular matrix
Here is an easy problem: find numbers x1 , x2 , x3 , x4 such that
1 2 3 4
x1
1
0 2 1 0 x2 2
Ax =
0 0 1 1 x3 = 3 = b.
0 0 0 2
x4
4
Multiplying the bottom row of the 4 4 matrix by the column x gives 2x4
and we want that to equal b4 which is 4, so x4 = 2. Multiplying the third
row of the 4 4 matrix by x gives x3 + x4 and we want that to equal b3
which is 3. We already know x4 = 2 so we must have x3 = 1. Repeating
this with the second row of A gives 2x2 x3 = 2, so x2 = 21 . Finally the
first row of A gives x1 + 2x2 + 3x3 + 4x4 = 1; plugging in the values we have
found for x4 , x3 , and x2 , we get x1 + 1 3 + 8 = 1 so x1 = 5. We find that
a solution is given by
5
1
2
x=
1 .
4
Is there any other solution to this equation? No. Our hand was forced
at each stage.
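The procedure we just used, solving from the bottom row up, is called back substitution. Here is a small Python sketch of it (mine, not part of the notes), run on the system above:

    import numpy as np

    def back_substitute(U, b):
        # Solve Ux = b for upper triangular U with non-zero diagonal,
        # working from the bottom row up, exactly as in the text.
        n = len(b)
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

    U = np.array([[1., 2., -3., 4.],
                  [0., 2.,  1., 0.],
                  [0., 0.,  1., 1.],
                  [0., 0.,  0., 2.]])
    b = np.array([1., 2., 3., 4.])
    print(back_substitute(U, b))   # [-5.   0.5  1.   2. ]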
Here is a hard problem:
(11-1)   [ λ1  0 · · ·  0 ] [ x1 ]   [ λ1 x1 ]
         [  0 λ2 · · ·  0 ] [ x2 ]   [ λ2 x2 ]
         [      · · ·     ] [ ·· ] = [  ···  ]
         [  0  0 · · · λm ] [ xm ]   [ λm xm ]
In other words, multiplying a column vector x on the left by the diagonal
matrix with diagonal entries λ1, . . . , λm multiplies each coordinate xi of x
by λi.
CHAPTER 3
CHAPTER 4
1. Systems of equations
You have been solving equations for years. Sometimes you have considered
the solutions to a single equation. For example, you are familiar with
the fact that the solutions to the equation y = x^2 − 4x + 1 form a parabola.
Likewise, the solutions to the equation (x − 2)^2 + (y − 1)^2 + z^2 = 1 form a
sphere of radius 1 centered at the point (2, 1, 0).
1.2. Pairs of numbers (a, b) also label points in the plane. More precisely, we can draw a pair of axes perpendicular to each other, usually called
the x- and y-axes, and (a, b) denotes the point obtained by going a units
along the x-axis and b-units along the y-axis. The picture below illustrates
the situation:
[Figure: the point (a, b) plotted in the xy-plane; a second picture contrasts
the two points (a, b) and (b, a).]
The order also matters when we say (3, 2) is a solution to the equation
y = x^2 − x − 4. Although (3, 2) is a solution to y = x^2 − x − 4, (2, 3) is not.
1.3. R2 and R3 . We write R2 for the set of all ordered pairs of numbers
(a, b). Here R denotes the set of real numbers and the superscript 2 in R2
indicates pairs of real numbers. The general principle is that a solution to
an equation in 2 unknowns is an ordered pair of real numbers, i.e., a point
in R2 .
A solution to an equation in 3 unknowns, (x − 2)^2 + (y − 1)^2 + z^2 = 1
for example, is an ordered triple of numbers, i.e., a triple (a, b, c) such that
(a − 2)^2 + (b − 1)^2 + c^2 does equal 1. The set of all ordered triples of numbers
is denoted by R3.
1.4. A system of equations. We can consider more than one equation
at a time. When we consider several equations at once we speak of a
system of equations. For example, we can ask for solutions to the system of
equations
(1-1)   y = x^2 − x − 4
        y = 3x − 7.
A solution to this system of equations is an ordered pair of numbers (a, b)
with the property that when we plug in a for x and b for y both equations
are true. That is, (a, b) is a solution to the system (1-1) if b = a^2 − a − 4
and b = 3a − 7. For example, (3, 2) is a solution to the system (1-1). So is
(1, −4). On the other hand, (4, 8) is not a solution to the system (1-1): it is
a solution to y = x^2 − x − 4 but not a solution to y = 3x − 7 because
8 ≠ 3 · 4 − 7.
1.5. More unknowns and Rn . The demands of the modern world are
such that we often encounter equations with many unknowns, sometimes
billions of unknowns. Let's think about a modest situation, a system of 3
equations in 5 unknowns. For example, a solution to the system of equations
(1-2)   x1 + x2^2 + x3^3 + x4^4 + x5^5 = 100
        x1 + 2x2 + 3x3 + 4x4 + 5x5 = 0
        x1 − x2 + x3 − x4 − x5^2 = 20
is an ordered 5-tuple1 of numbers, (s1 , s2 , s3 , s4 , s5 ), having the property
that when we plug in s1 for x1 , s2 for x2 , s3 for x3 , s4 for x4 , and s5 for x5 ,
all three of the equations in (1-2) become true. We use the notation R5 to
denote the set of all ordered 5-tuples of real numbers. Thus, the solutions
to the system (1-2) are elements of R5 .
We often call elements of R5 points just as we call points in the plane
points. A point in R5 has 5 coordinates, whereas a point in the plane has 2
coordinates.
More generally, solutions to a system of equations with n unknowns are,
or can be thought of as, points in Rn . We sometimes call Rn n-space. For
example, we refer to the physical world around us as 3-space. If we fix
an origin and x-, y-, and z-axes, each point in our physical world can be
1We run out of words after pair, triple, quadruple, quintuple, sextuple, ... so invent the
word n-tuple to refer to an ordered sequence of n numbers where n can be any positive
integer. For example, (8,7,6,5,4,3,2,1) is an 8-tuple. It is not the same as the 8-tuple
(1,2,3,4,5,6,7,8). We think of these 8-tuples as labels for two different points in R8. Although
we can't picture R8 we can ask questions about it. For example, does (1,1,1,1,1,1,1,1) lie
on the line through the points (8,7,6,5,4,3,2,1) and (1,2,3,4,5,6,7,8)? The answer is no.
Why? If you can answer this you understand what is going on. If you can't you don't,
and should ask a question.
a1 x1 + · · · + an xn = b
The m × n matrix
A := [ a11 a12 · · · a1n ]
     [ a21 a22 · · · a2n ]
     [       · · ·       ]
     [ am1 am2 · · · amn ]
is called the coefficient matrix.
The augmented matrix of the system is
(A | b) := [ a11 a12 · · · a1n | b1 ]
           [ a21 a22 · · · a2n | b2 ]
           [         · · ·         ]
           [ am1 am2 · · · amn | bm ].
5. Specific examples
there is no point lying on both lines and therefore no common solution to the
pair of equations, i.e., no solution to the given system of linear equations.
5.3. No solutions. The 3 × 2 system
x1 + x2 = 2
2x1 − x2 = 1
x1 − x2 = 3
has no solutions because the only solution to the 2 × 2 system consisting
of the first two equations is (1, 1) and that is not a solution to the third
equation in the 3 × 2 system. Geometrically, the solutions to each equation
lie on a line in R2 and the three lines do not pass through a common point.
5.4. A unique solution. The 3 × 2 system
x1 + x2 = 2
2x1 − x2 = 1
x1 − x2 = 0
has a unique solution, namely (1, 1). The three lines corresponding to the
three equations all pass through the point (1, 1).
5.5. Infinitely many solutions. It is obvious that the system consisting
of the single equation
x1 + x2 = 2
has infinitely many solutions, namely all the points lying on the line of slope
−1 passing through (0, 2) and (2, 0).
5.6. Infinitely many solutions. The 2 × 2 system
x1 + x2 = 2
2x1 + 2x2 = 4
has infinitely many solutions, namely all the points lying on the line of slope
−1 passing through (0, 2) and (2, 0), because the two equations actually give
the same line in R2. A solution to the first equation is also a solution to the
second equation in the system.
5.7. Infinitely many solutions. The 2 × 3 system
x1 + x2 + x3 = 3
2x1 + x2 − x3 = 4
also has infinitely many solutions. The solutions to the first equation are
the points on the plane x1 + x2 + x3 = 3 in R3. The solutions to the second
equation are the points on the plane 2x1 + x2 − x3 = 4 in R3. The two planes
meet one another in a line. That line can be described parametrically as the
points
(1 + 2t, 2 − 3t, t)
as t ranges over all real numbers. You should check that when x1 = 1 + 2t,
x2 = 2 − 3t, and x3 = t, both equations in the 2 × 3 system are satisfied.
5.8. A unique solution. The 3 × 3 system
x1 + x2 + x3 = 3
2x1 + x2 − x3 = 4
4x1 + 3x2 − 2x3 = 13
has a unique solution, namely (x1, x2, x3) = (−1, 5, −1). The solutions to
each equation are the points in R3 that lie on the plane given by the equation.
There is a unique point lying on all three planes, namely (−1, 5, −1). This
is typical behavior: two planes in R3 meet in a line (unless they are parallel),
and that line will (usually!) meet a third plane in a point. The next two
examples show this doesn't always happen: in Example 5.9 the third plane is
parallel to the line that is the intersection of the first two planes; in Example
5.10 the third plane contains the line that is the intersection of the first
two planes.
5.9. No solution. We will now show that the 3 × 3 system
x1 + x2 + x3 = 3
2x1 + x2 − x3 = 4
3x1 + x2 − 3x3 = 0
has no solution. In Example 5.7 we showed that all solutions to the
system consisting of the first two equations in the 3 × 3 system we are now
considering are of the form (1 + 2t, 2 − 3t, t) for some t in R. However, if
(x1, x2, x3) = (1 + 2t, 2 − 3t, t), then
(5-1)   3x1 + x2 − 3x3 = 3(1 + 2t) + (2 − 3t) − 3t = 5 ≠ 0,
so no point on that line satisfies the third equation.
Lines in the plane are the same things as equations a1x + a′1y = b1. The
points on the line are the solutions to the equation. The solutions to a pair
(system) of equations a1x + a′1y = b1 and a2x + a′2y = b2 are the points that
lie on both lines, i.e., their intersection.
In 3-space, which we will denote by R3 , the solutions to a single equation
form a plane. Two planes in R3 intersect at either
(1) no point (they are parallel) or
(2) infinitely many points (they are not parallel).
In the second case the intersection is a line if the planes are different, and a
plane if they are the same.
Three planes in R3 intersect at either
(1) a unique point or
(2) no point (all three are parallel) or
(3) infinitely many points (all three planes are the same or the intersection of two of them is a line that lies on the third plane).
I leave you to consider the possibilities for three lines in the plane and
four planes in R3 . Discuss with friends if necessary.
8. Homogeneous systems
A system of linear equations of the form Ax = 0 is called a homogeneous
system. A homogeneous system always has a solution, namely x = 0, i.e.,
x1 = · · · = xn = 0. This is called the trivial solution because little brainpower
is needed to find it. Other solutions to Ax = 0 are said to be non-trivial.
For a homogeneous system the issue is to describe the set of non-trivial
solutions.
8.1. A trivial but important remark. If (s1, . . . , sn) is a solution to
a homogeneous system so is (λs1, . . . , λsn) for all λ ∈ R, i.e., all points on
the line through the origin 0 = (0, . . . , 0) and (s1, . . . , sn) are solutions.
8.2. A geometric view. The next result, Proposition 8.1, shows that
solutions to the system of equations Ax = b are closely related to solutions to
the homogeneous system of equations Ax = 0.3 We can see this relationship
in the setting of a single equation in 2 unknowns.
The solutions to the equation 2x + 3y = 0 are points on a line in the
plane R2. Draw that line: it is the line through the origin with slope −2/3.
The origin is the trivial solution. Other points on the line such as (3, −2)
are non-trivial solutions. The solutions to the non-homogeneous equation
2x + 3y = 1 are also points on a line in R2. Draw it. This line has slope −2/3
too. It is parallel to the line 2x + 3y = 0 but doesn't go through the origin.
Pick any point on the second line, i.e., any solution to 2x + 3y = 1. Let's
pick (−1, 1). Now take a solution to the first equation, say (3, −2), and add
3Students seem to have a hard time understanding Proposition 8.1. Section 8.3
discusses an analogy to Proposition 8.1 that will be familiar to you from your calculus courses.
CHAPTER 5
(2) in each non-zero row the left-most non-zero entry, called the leading
entry of that row, is to the right of the leading entry in the row above
it.
A matrix E is in row-reduced echelon form (RREF) if
(1) it is in row echelon form, and
(2) every leading entry is a 1, and
(3) every column that contains a leading entry has all its other entries
equal to zero.
For example,
0 1 0 2 1 0 0 3
0 0 1 3 1 0 0 4
0 0 0 0 0 1 0 5
0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0
is in RREF.
Theorem 2.1. Given a matrix A, there is a unique row-reduced echelon
matrix that is row equivalent to A.
Proof.
The uniqueness result in Theorem 2.1 allows us to speak of the row-reduced
echelon matrix that is equivalent to A. We will denote it by rref(A).
2.1. Reminder. It is important to know there are infinitely many sequences
of elementary row operations that can get you from A to rref(A).
In an exam, if you are given a specific matrix A and asked to put A in
row reduced echelon form there is a unique matrix, rref(A), that you must
produce, but the row operations that one student uses to get from A to
rref(A) will probably not be the same as the row operations used by another
student to get from A to rref(A). There is no best way to proceed in doing
this and therefore no guiding rule I can give you to do this. Experience and
practice will teach you how to minimize the number and complexity of the
operations and how to keep the intermediate matrices reasonably nice. I
often don't swap the order of the rows until the very end. I begin by trying
to get zeroes in the columns, i.e., all zeroes in some columns, and only one
non-zero entry in other columns, and I don't care which order of columns I
do this for.
3. An example
Doing elementary row operations by hand to put a matrix in row-reduced
echelon form can be a pretty tedious task. It is an ideal task for a computer!
But, let's take a deep breath and plunge in. Suppose the system is
MORE TO DO
(A | b) = [ 1 1  1 |  3 ]
          [ 2 1 −1 |  4 ]
          [ 4 3 −2 | 13 ].
We added the first row to the second, and twice the first row to the third,
to get
[ 1 1 1 |  3 ]
[ 3 2 0 |  7 ]
[ 6 5 0 | 19 ].
Then we subtracted twice the second row from the third to get
[ 1 1 1 | 3 ]
[ 3 2 0 | 7 ]
[ 0 1 0 | 5 ]
which gives x2 = 5. Subtracting twice the third row from the second and
then dividing the second row by 3 gives
[ 1 1 1 |  3 ]         [ 1 1 1 |  3 ]
[ 3 0 0 | −3 ]   then  [ 1 0 0 | −1 ]
[ 0 1 0 |  5 ]         [ 0 1 0 |  5 ]
which gives x1 = −1. Now subtract the second and third rows from the top
row to get
[ 0 0 1 | −1 ]
[ 1 0 0 | −1 ]
[ 0 1 0 |  5 ]
which gives x3 = −1. Admittedly the last matrix is not in row-reduced
echelon form because the left-most ones do not proceed downwards in the
expected way, but swapping the rows fixes that:
[ 0 0 1 | −1 ]        [ 1 0 0 | −1 ]
[ 1 0 0 | −1 ]   →    [ 0 1 0 |  5 ]
[ 0 1 0 |  5 ]        [ 0 0 1 | −1 ].
The solution is (x1, x2, x3) = (−1, 5, −1).
6. Inconsistent systems
A system of linear equations is inconsistent if it has no solution. If a row
of the form
(6-1)   (0 0 · · · 0 | b)
with b ≠ 0 appears in any matrix obtained from (A | b) by elementary row
operations, the system is inconsistent.
7. Consistent systems
Suppose Ax = b is a consistent m × n system of linear equations.
Form the augmented matrix (A | b).
Perform elementary row operations on (A | b) until you obtain (rref(A) | d).
If column j contains the left-most 1 that appears in some row of
rref(A) we call xj a dependent variable. The other xj's are called
independent or free variables.
The number of dependent variables equals rank(A) and, because
A has n columns, the number of independent/free variables is
n − rank(A).
All solutions are now obtained by allowing each independent variable
to take any value; the dependent variables are then completely
determined by the equations corresponding to (rref(A) | d)
and the values of the independent/free variables.
An example will clarify my meaning. If
(7-1)   (rref(A) | d) = [ 1 0 2 0 0 3 −4 | 1 ]
                        [ 0 1 3 0 0 4  0 | 3 ]
                        [ 0 0 0 1 0 2  1 | 0 ]
                        [ 0 0 0 0 1 1  2 | 5 ]
                        [ 0 0 0 0 0 0  0 | 0 ]
                        [ 0 0 0 0 0 0  0 | 0 ]
then the dependent variables are x1, x2, x4, and x5, the independent variables
are x3, x6, and x7, and every solution has the form
x = ( 1 − 2x3 − 3x6 + 4x7,  3 − 3x3 − 4x6,  x3,  −2x6 − x7,  5 − x6 − 2x7,  x6,  x7 )T
or, equivalently,
(7-2)   x = (1, 3, 0, 0, 5, 0, 0)T + x3 (−2, −3, 1, 0, 0, 0, 0)T
            + x6 (−3, −4, 0, −2, −1, 1, 0)T + x7 (4, 0, 0, −1, −2, 0, 1)T
as x3 , x6 , x7 range over all real numbers. We call (7-2) the general solution
to the system.1
7.1. Particular vs. general solutions. If we make a particular choice
of x3 , x6 , and x7 , we call the resulting x a particular solution to the system.
1You will later see that the vectors obtained from the three right-most terms of (7-2)
as x3, x6, and x7 vary over all of R form a subspace, namely the null space of A. Thus
(7-2) is telling us that the solutions to Ax = b belong to a translate of the null space of A by
a particular solution of the equation, the particular solution being the term immediately
to the right of the equal sign in (7-2). See chapter 3 and Proposition 3.1 below for more
about this.
As (s, t) ranges over all points of R2, i.e., as s and t range over R, the points
(3/2 + s + t, 2s, 2t) are all points on the plane 2x − y − z = 3.
In a similar fashion, the points on the line that is the intersection of the
planes 2x − y − z = 3 and x + y + z = 4 are given by the parametric equations
x = 7/3,  y = t,  z = 5/3 − t   (t ∈ R).
9. The importance of rank
We continue to discuss a consistent m × n system Ax = b.
We have seen that rank(A) ≤ m and rank(A) ≤ n (Proposition 5.1)
and that the number of independent variables is n − rank(A).
If rank(A) = n there are no independent variables, so the system
has a unique solution.
If rank(A) < n there is at least one independent variable, so the
system has infinitely many solutions.
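A quick check of this dichotomy (mine, not from the notes), using numpy's matrix_rank on the coefficient matrix of the consistent system in Example 5.7:

    import numpy as np

    A = np.array([[1., 1.,  1.],
                  [2., 1., -1.]])     # coefficient matrix of the 2 x 3 system in 5.7
    print(np.linalg.matrix_rank(A))   # 2, which is < n = 3: infinitely many solutions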
11. Elementary matrices
Recall that the 4 × 4 identity matrix is
I4 = [ 1 0 0 0 ]
     [ 0 1 0 0 ]
     [ 0 0 1 0 ]
     [ 0 0 0 1 ].
An elementary matrix is a matrix that is obtained by performing a single
elementary row operation on an identity matrix. For example, if we switch
rows 2 and 4 of I4 we obtain the elementary matrix
1 0 0 0
0 0 0 1
0 0 1 0 .
0 1 0 0
If we multiply the third row of I4 by a non-zero number c we obtain the
elementary matrix
[ 1 0 0 0 ]
[ 0 1 0 0 ]
[ 0 0 c 0 ]
[ 0 0 0 1 ].
If we replace the first row of I4 by itself + c times the second row we get
the elementary matrix
1 c 0 0
0 1 0 0
0 0 1 0 .
0 0 0 1
Theorem 11.1. Let E be the elementary matrix obtained by performing
an elementary row operation on the m × m identity matrix. If we perform the
same elementary row operation on an m × n matrix A the resulting matrix
is equal to EA.
We won't prove this. The proof is easy but looks a little technical because
of the notation. Let's convince ourselves that the theorem is plausible by
looking at the three elementary matrices above and the matrix
A = [ 1 2 3 ]
    [ 3 2 1 ]
    [ 1 1 1 ]
    [ 1 0 2 ].
The three elementary matrices multiply into A as follows:
[ 1 0 0 0 ] [ 1 2 3 ]   [ 1 2 3 ]
[ 0 0 0 1 ] [ 3 2 1 ]   [ 1 0 2 ]
[ 0 0 1 0 ] [ 1 1 1 ] = [ 1 1 1 ]
[ 0 1 0 0 ] [ 1 0 2 ]   [ 3 2 1 ],
and
[ 1 0 0 0 ] [ 1 2 3 ]   [ 1 2 3 ]
[ 0 1 0 0 ] [ 3 2 1 ]   [ 3 2 1 ]
[ 0 0 c 0 ] [ 1 1 1 ] = [ c c c ]
[ 0 0 0 1 ] [ 1 0 2 ]   [ 1 0 2 ],
and
[ 1 c 0 0 ] [ 1 2 3 ]   [ 1+3c 2+2c 3+c ]
[ 0 1 0 0 ] [ 3 2 1 ]   [ 3    2    1   ]
[ 0 0 1 0 ] [ 1 1 1 ] = [ 1    1    1   ]
[ 0 0 0 1 ] [ 1 0 2 ]   [ 1    0    2   ].
In each case the product EA is exactly the matrix obtained by performing
the corresponding elementary row operation on A.
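A small numerical sketch of Theorem 11.1 (mine, not from the notes): build the elementary matrix for a row swap and check that left multiplication performs the same swap on A.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [3., 2., 1.],
                  [1., 1., 1.],
                  [1., 0., 2.]])

    E = np.eye(4)
    E[[1, 3]] = E[[3, 1]]                # the elementary matrix swapping rows 2 and 4
    swapped = A.copy()
    swapped[[1, 3]] = swapped[[3, 1]]
    print(np.allclose(E @ A, swapped))   # True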
CHAPTER 6
We write Sp(v1, . . . , vn), or ⟨v1, . . . , vn⟩, for the linear span of v1, . . . , vn.
Proposition 3.1 says that if A1, . . . , An are the columns of A, then
Ax = x1 A1 + · · · + xn An.
To see this, compute
x1 A1 + · · · + xn An = x1 (a11, a21, . . . , am1)T + · · · + xn (a1n, a2n, . . . , amn)T
                      = (a11 x1 + a12 x2 + · · · + a1n xn, . . . , am1 x1 + am2 x2 + · · · + amn xn)T
                      = Ax.
The above calculation made use of scalar multiplication (6.11): if r ∈ R and
B is a matrix, then rB is the matrix whose ij-th entry is rBij; since r and
Bij are numbers, rBij = Bij r.
A useful way of stating Proposition 3.1 is that, for a fixed A, the set
of values Ax takes as x ranges over all of Rn is Sp(A1, . . . , An), the linear
span of the columns of A. Therefore, if b ∈ Rm, the equation Ax = b has
a solution if and only if b ∈ Sp(A1, . . . , An). This is important enough to
state as a separate result.
Corollary 3.2. Let A be an m × n matrix and b ∈ Rm. The equation
Ax = b has a solution if and only if b is a linear combination of the columns
of A.
CHAPTER 7
Subspaces
Let W denote the set of solutions to a system of homogeneous equations
Ax = 0. It is easy to see that W has the following properties: (a) 0 ∈ W;
(b) if u and v are in W so is u + v; (c) if u ∈ W and λ ∈ R, then λu ∈ W.
Subsets of Rn having these properties are called subspaces and are of particular
importance in all aspects of linear algebra. It is also important to know
that every subspace is the set of solutions to some system of homogeneous
equations.
Another important way in which subspaces turn up is that the linear
span of a set of vectors is a subspace.
Now we turn to the official definition.
1. The definition and examples
A subset W of Rn is called a subspace if
(1) 0 ∈ W, and
(2) u + v ∈ W for all u, v ∈ W, and
(3) λu ∈ W for all λ ∈ R and all u ∈ W.
The book gives a different definition but then proves that a subset W of Rn
is a subspace if and only if it satisfies these three conditions. We will use
our definition, not the book's.
We now give numerous examples.
1.1. The zero subspace. The set {0} consisting of the zero vector in
Rn is a subspace of Rn . We call it the zero subspace. It is also common to
call it the trivial subspace.
1.2. Rn itself is a subspace. This is clear.
1.3. The null space and range of a matrix. Let A be an m × n
matrix. The null space of A is
N(A) := {x ∈ Rn | Ax = 0}.
The range of A is
R(A) := {Ax | x ∈ Rn}.
Proposition 1.1. Let A be an m × n matrix. Then N(A) is a subspace
of Rn and R(A) is a subspace of Rm.
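A short SymPy sketch (mine; the rank-1 matrix below is made up) that computes a basis of N(A) and checks that a linear combination of basis vectors stays in the null space:

    from sympy import Matrix

    A = Matrix([[1, 2, 3],
                [2, 4, 6]])          # a rank-1 example
    basis = A.nullspace()            # basis vectors of N(A)
    v = 2 * basis[0] - 3 * basis[1]  # a linear combination of basis vectors
    print(A * v)                     # Matrix([[0], [0]]): v is again in N(A)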
1.5. Planes through the origin are subspaces. Suppose that {u, v}
is a linearly independent subset of Rn. We introduce the notation
Ru + Rv := {au + bv | a, b ∈ R}.
We call Ru + Rv the plane through u and v.
Lemma 1.3. Ru + Rv is a subspace of Rn.
Proof. Exercise. It is also a special case of Proposition 1.4 below.
has a solution if and only if b is a linear combination of the columns of A because the
range of an m × n matrix can be described as
R(A) = {b ∈ Rm | b = Ax for some x ∈ Rn}
     = {b ∈ Rm | the equation b = Ax has a solution}.
[Figure: a line ℓ in R2 that does not pass through the origin, and the
parallel line L through the origin.]
Returning to the general case, every point on ℓ is the sum of (0, c) and a
point on the line y = mx so, if we write L for the subspace y = mx, then
ℓ = (0, c) + L
  = {(0, c) + v | v ∈ L}
  = {(0, c) + (x, mx) | x ∈ R}
  = (0, c) + R(1, m).
Let V be a subspace of Rn and z ∈ Rn. We call the subset
z + V := {z + v | v ∈ V}
the translation of V by z. We also call z + V a translate of V.
Proposition 3.1. Let S be the set of solutions to the equation Ax = b
and let V be the set of solutions to the homogeneous equation Ax = 0. Then
S is a translate of V : if z is any solution to Ax = b, then S = z + V .
Proof. Let x ∈ Rn. Then A(z + x) = b + Ax, so z + x ∈ S if and only if
x ∈ V. The result follows.
Now would be a good time to re-read chapter 7: that chapter talks
about obtaining solutions to an equation Ax = b by translating N (A) by
a particular solution to the equation Ax = b. In fact, Proposition 3.1 and
Proposition 8.1 really say the same thing in slightly different languages, and
the example in chapter 7 illustrates what these results mean in a concrete
situation.
3.1. The linear geometry of R3 . This topic should already be familiar.
Skew lines, parallel planes, intersection of line and plane, parallel lines.
For example, find an equation for the subspace of R3 spanned by (1, 1, 1)
and (1, 2, 3). Since the vectors are linearly independent that subspace will be
a plane with basis {(1, 1, 1), (1, 2, 3)}. A plane in R3 is given by an equation
of the form
ax + by + cz = d
and is a subspace if and only if d = 0. Thus we want to find a, b, c ∈ R such
that ax + by + cz = 0 is satisfied by (1, 1, 1) and (1, 2, 3). More concretely,
we want to find a, b, c ∈ R such that
a + b + c = 0
a + 2b + 3c = 0.
This is done in the usual way:
[ 1 1 1 | 0 ]   →   [ 1 1 1 | 0 ]   →   [ 1 0 −1 | 0 ]
[ 1 2 3 | 0 ]       [ 0 1 2 | 0 ]       [ 0 1  2 | 0 ].
CHAPTER 8
E = [ 1 2 3 4 ]
    [ 0 1 2 3 ]
    [ 0 0 0 1 ]
    [ 0 0 0 0 ]
Therefore
1 · vi − Σ_{j≠i} bj vj = 0,
Corollary 2.5. All bases for a subspace have the same number of elements.
Proof. Suppose that {v 1 , . . . , v d } and {w1 , . . . , wr } are bases for W . Then
W = Sp(v 1 , . . . , v d ) = Sp(w1 , . . . , wr )
and the sets {v 1 , . . . , v d } and {w1 , . . . , wr } are linearly independent.
By Theorem 2.4, a linearly independent subset of Sp(v1, . . . , vd) has at
most d elements so r ≤ d. Likewise, by Theorem 2.4, a linearly independent
subset of Sp(w1, . . . , wr) has at most r elements so d ≤ r. Hence d = r.
The next result is a special case of Theorem 2.4 because Rn = Sp{e1, . . . , en}.
Nevertheless, we give a different proof which again emphasizes the fact that
a homogeneous system of equations in which there are more unknowns than
equations has a non-trivial solution.
Theorem 2.6. If n > m, then every set of n vectors in Rm is linearly
dependent.
Proof. Let's label the vectors A1, . . . , An and make them into the columns
of an m × n matrix that we call A. Then Ax = x1 A1 + · · · + xn An so
the equation Ax = 0 has a non-trivial solution (i.e., a solution with some
CHAPTER 9
Hence
b = (a′1 − a1)A1 + · · · + (a′n − an)An.
The matrix
E = (1/2) [ −4  2 ]
          [  3 −1 ]
has the property that EA = I. You should check that. The analysis in the
previous paragraph tells me that a solution to the equation (2-1) is
(x1, x2)T = E (1, −5)T = (1/2) [ −4 2 ; 3 −1 ] (1, −5)T = (−7, 4)T.
2. Inverses
The inverse of an n × n matrix A is the matrix A⁻¹ with the property that
AA⁻¹ = I = A⁻¹A,
provided it exists. If A has an inverse we say that A is invertible.
2.1. Uniqueness of A⁻¹. Our definition says that the inverse is the
matrix with the stated property. The significance of Lemma 2.1 is that it
tells us there is at most one matrix, E say, with the property that
AE = EA = I. Check you understand that!
2.2. Not every square matrix has an inverse. The phrase "provided
it exists" suggests that not every n × n matrix has an inverse; that is correct.
For example, the zero matrix does not have an inverse. We will also see
examples of non-zero square matrices that do not have an inverse.
2.3. The derivative of a function f(x) at a ∈ R is defined to be
lim_{h→0} (f(a + h) − f(a))/h
provided this limit exists. The use of the phrase "provided it exists" in
the definition of the inverse of a matrix plays the same role as the phrase
"provided this limit exists" plays in the definition of the derivative.
so we conclude that ad − bc ≠ 0.
(⇐) If ad − bc ≠ 0 a simple calculation shows that the claimed inverse
is indeed the inverse.
The number ad − bc is called the determinant of
[ a b ]
[ c d ].
We usually write det(A) to denote the determinant of a matrix. Theorem 5.1
therefore says that a 2 × 2 matrix A has an inverse if and only if det(A) ≠ 0.
In chapter 12 we define the determinant for n × n matrices and prove
that an n × n matrix A has an inverse if and only if det(A) ≠ 0. That
is a satisfying result. Although the formula for the determinant appears
complicated at first it is straightforward to compute, even for large n, if one
has a computer handy. It is striking that the invertibility of a matrix can be
determined by a single computation even if doing the computation by hand
can be long and prone to error.
Theorem 5.2. Let A be an n × n matrix. Then A has an inverse if and
only if it is non-singular.
Proof. (⇒) Suppose A has an inverse. If Ax = 0, then
x = Ix = (A⁻¹A)x = A⁻¹(Ax) = A⁻¹ · 0 = 0
so A is non-singular.
(⇐) Suppose A is non-singular. Then Ax = b has a unique solution for
all b ∈ Rn by Theorem 1.1. Let
e1 = (1, 0, . . . , 0)T, . . . , en = (0, . . . , 0, 1)T,
and define uj ∈ Rn to be the unique column vector such that Auj = ej. Let
U = [u1, . . . , un] be the n × n matrix whose columns are u1, . . . , un. Then
AU = [Au1, . . . , Aun] = [e1, . . . , en] = I.
We will show that U is the inverse of A but we do not compute UA to do
this!
First we note that U is non-singular because if Ux = 0, then
x = Ix = (AU)x = A(Ux) = A · 0 = 0.
Because U is non-singular we may apply the argument in the previous paragraph
to deduce the existence of a matrix V such that UV = I. (The
previous paragraph showed that if a matrix is non-singular one can multiply
it on the right by some matrix to obtain the identity matrix.) Now
V = IV = (AU)V = A(UV) = AI = A
so AU = UA = I, proving that U = A⁻¹.
need to show that U A = I too before we are allowed to say U is the inverse
of A. It is not obvious that AU = I implies U A = I because there is no
obvious reason for A and U to commute. That is why we need the last
paragraph of the proof.
Corollary 5.3. Let A be an n × n matrix. If AU = I, then UA = I,
i.e., if AU = I, then U is the inverse of A.
Corollary 5.4. Let A be an n × n matrix. If VA = I, then AV = I,
i.e., if VA = I, then V is the inverse of A.
Proof. Apply Corollary 5.3 with V playing the role that A played in Corollary
5.3 and A playing the role that U played in Corollary 5.3.
Theorem 5.5. The following conditions on an n × n matrix A are
equivalent:
(1) A is non-singular;
(2) A is invertible;
(3) rank(A) = n;
(4) rref(A) = I, the identity matrix;
(5) Ax = b has a unique solution for all b ∈ Rn;
(6) the columns of A are linearly independent;
6. If A is non-singular how do we find A⁻¹?
We use the idea in the proof of Theorem 5.2:
the j-th column of A⁻¹ is the solution to the equation
Ax = ej so we may form the augmented matrix (A | ej)
and perform row operations to get A in row-reduced echelon form.
We can carry out this procedure in one great big military operation. If we
want to find the inverse of the matrix
A = [ 1 2 1 ]
    [ 3 0 1 ]
    [ 1 1 1 ]
we perform elementary row operations on the augmented matrix
(A | I) = [ 1 2 1 | 1 0 0 ]
          [ 3 0 1 | 0 1 0 ]
          [ 1 1 1 | 0 0 1 ]
until we get the identity matrix on the left hand side. The matrix on the
right-hand side is then A⁻¹. Check it out!
CHAPTER 10
T (a1 , . . . , ad ) := a1 v 1 + + ad v d
T (1 x1 + + s xs ) = 1 T (x1 ) + + s T (xs )
83
84
85
86
8. RANK + NULLITY
87
so B spans
We will now show that B is linearly independent. If
c1 v 1 + + cq v q + d1 u1 + . . . + dp up = 0,
then
0 = A c1 v 1 + + cq v q + d1 u1 + . . . + dp up
= c1 Av 1 + + cq Av q + d1 Au1 + . . . + dp Aup
= c1 w1 + + cq wq + 0 + . . . + 0.
But {w1 , . . . , wq } is linearly independent so
c1 = = cq = 0.
Therefore
d1 u1 + . . . + dp up = 0.
But {u1 , . . . , up } is linearly independent so
d1 = = dp = 0.
Hence B is linearly independent, and therefore a basis for Rn . But B has
p + q elements so p + q = n, as required.
We now give the long-promised new interpretation of rank(A).
88
0 1 1
0 0 1
0 0 0
0 0 1
0 0 0
0 0 0
1 1 1
0 0 0
0 0 0
8. RANK + NULLITY
89
90
1 0 2 0 0 3
4
0 1 3 0 0 4
0
0 0 0 1 0 2 1
(9-1)
E =
0 0 0 0 1 1 2
0 0 0 0 0 0
0
0 0 0 0 0 0
0
the dependent variables are x1 , x2 , x4 , and x5 . The independent variables
x3 , x6 , and x7 , may take any values and then the dependent variables are
determined by the equations
x1
x2
x4
x5
Now set any one of the independent variables to 1 and set the others to zero.
This gives solutions
2
3
4
3
4
0
1
0
0
0 , 2 , 1 .
0
1
2
0
1
0
0
0
1
These are elements of the null space and are obviously linearly independent.
They span the null space.
MORE TO SAY/DO...basis for null space
CHAPTER 11
Linear transformations
Linear transformations are special kinds of functions between vector
spaces. Subspaces of a vector space are of crucial importance and linear
transformations are precisely those functions between vector spaces that
send subspaces to subspaces. More precisely, if T : V W is a linear
transformation and U is a subspace of V , then
{T (u) | u U }
is a subspace of W . That subspace is often called the image of U under T .
1. A reminder on functions
Let X and Y be any sets. A function f from X to Y , often denoted
f : X Y , is a rule that associates to each element x X a single element
f (x) that belongs to Y . It might be helpful to think of f as a machine: we
put an element of X into f and f spits out an element of Y . We call f (x)
the image of x under f . We call X the domain of f and Y the codomain.
The range of f is the subset R(f ) := {f (x) | x X} Y . The range of
f might or might not equal Y . For example, the functions f : R R and
g : R R0 given by the formulas f (x) = x2 and g(x) = x2 are different
functions even though they are defined by the same formula. The range g
is equal to the codomain of g. The range of f is not equal to the codomain
of f .
Part of you might rebel at this level of precision and view it as pedantry
for the sake of pedantry. However, as mathematics has developed it has
become clear that such precision is necessary and useful. A great deal of
mathematics involves formulas but those formulas are used to define functions and we add a new ingredient, the domain and codomain.
You have already met some of this. For example, the function f (x) = x1
is not defined at 0 so is not a function from R to R. It is a function from
R {0} to R. The range of f is not R but R {0}. Thus the function
f : R {0} R {0} defined by the formula f (x) = x1 has an inverse,
namely itself.
At first we learn that the number 4 has two square roots, 2 and 2,
but when we advance a little we define the square root function R0
R0 . This is because we want the function f (x) = x2 to have an inverse.
The desire to have inverses often makes it necessary to restrict the domain
or codomain. A good example is the inverse sine function. Look at the
91
92
definition of sin1 () in your calculus book, or on the web, and notice how
much care is taken with the definition of the codomain.
2. First observations
Let V be a subspace of Rn and W a subspace of Rm . A linear transformation
from V to W is a function F : V W such that
(1) F (u + v) = F (u) + F (v) for all u, v V , and
(2) F (av) = aF (v) for all a R and all v V .
It is common to combine (1) and (2) into a single condition: F is a linear
transformation if and only if
F (au + bv) = aF (u) + bF (v)
for all a, b R and all u, v V .
2.1. Equality of linear transformations. Two linear transformations from V to W are equal if they take the same value at all points of V .
This is the same way we define equality of functions: f and g are equal if
they take the same values everywhere.
2.2. Linear transformations send 0 to 0. The only distinguished
point in a vector space is the zero vector, 0. It is nice that linear transformations send 0 to 0.
Lemma 2.1. If T : V W is a linear transformation, then T (0) = 0.1
Proof. The following calculation proves the lemma:
0 = T (0) T (0)
= T (0 + 0) T (0)
= T (0) + T (0) T (0)
by property (1)
= T (0) + 0
= T (0).
This proof is among the simplest in this course. Those averse to reading
proofs should look at each step in the proof, i.e., each = sign, and ask what
is the justification for that step. Ask me if you dont understand it.
2.3. The linear transformations R R.
1When we write T (0) = 0 we are using the same symbol 0 for two different things: the
0 in T (0) is the zero in V and the other 0 is the zero in W . We could write T (0V ) = 0W
but that looks a little cluttered. It is also unnecessary because T is a function from V
to W so the only elements we can put into T () are elements of V , so for T (0) to make
sense the 0 in T (0) must be the zero in V ; likewise, T spits out elements of W so T (0)
must be an element of W .
2. FIRST OBSERVATIONS
93
T (a1 v 1 + + ap v p ) = a1 T (v 1 ) + + ap T (v p ).
94
2.9. A composition of linear transformations is a linear transformation. If S : V W and T : U V are linear transformations, their
composition, denoted by ST or S T , is defined in the same way as the
composition of any two functions, namely
(ST )(u) = S(T (u))
or
by definition of ST
so ST is a linear transformation.
2.10. A linear combination of linear transformations is a linear
transformation. If S and T are linear transformations from U to V and
a, b R, we define the function aS + bT by
(aS + bT )(x) := aS(x) + bT (x).
It is easy to check that aS + bT is a linear transformation from U to V too.
We call it a linear combination of S and T . Of course the idea extends to
linear combinations of larger collections of linear transformations.
3. Linear transformations and matrices
Let A be an m n matrix and define T : Rn Rm by T (x) := Ax.
Then T is a linear transformation from Rn to Rm . The next theorem says
that every linear transformation from Rn to Rm is of this form.
Theorem 3.1. Let T : Rn Rm be a linear transformation. Then there
is a unique m n matrix A such that T (x) = Ax for all x Rn .
Proof. Let A be the m n matrix whose j th column is Aj := T (ej ). If
x = (x1 , . . . , xn )T is an arbitrary element of Rn , then
T (x) = T (x1 e1 + + xn en )
= x1 T (e1 ) + + xn T (en )
= x 1 A1 + + x n An
= Ax
as required.
In the context of the previous theorem, we sometimes say that the matrix
A represents the linear transformation T .
96
PAULexplicit example.
5. Invertible matrices and invertible linear transformations
6. How to find the formula for a linear transformation
First, what do we mean by a formula for a linear transformation? We
mean the same thing as when we speak of a formula for one of the functions
you met in calculusthe only difference is the appearance of the formula.
For example, in calculus f (x) = x sin x is a formula for the function f ;
similarly, g(x) = |x| is a formula for the function g; we can also give the
formula for g by
(
x
if x 0
g(x) =
x if x 0.
The expression
2a b
a
T b =
a + b + c
c
3c 2b
is a formula for a linear transformation T : R3 R4 .
Although a linear transformation may not be given by an explicit formula, we might want one in order to do calculations. For example, rotation
about the origin in the clockwise direction by an angle of 45 is a linear
transformation, but that description is not a formula for it. A formula for
it is given by
a+b
1 1
a
a
1
1
= 2
.
T
= 2
ba
1 1
b
b
Let T : Rn Rm be a linear transformation. Suppose {v 1 , . . . , v n } is
a basis for Rn and you are told the values of T (v 1 ), . . . , T (v n ). There is a
simple procedure to find the m n matrix A such that T (x) = Ax for all
x Rn . The idea is buried in the proof of Theorem 3.1.
MORE TO DO
7. Rotations in the plane
Among the more important linear transformations R2 R2 are the
rotations. Given an angle we write T for the linear transformation that
rotates a point by an angle of radians in the counterclockwise direction.
Before we obtain a formula for T some things should be clear. First,
T T = T+ . Second, T0 = 0. Third, T2n = idR2 and T(2n+1) = idR2
for all n Z. Fourth, rotation in the counterclockwise direction by an angle
of is the same as rotation in the clockwise direction by an angle of so
T has an inverse, namely T .
Lets write A for the unique 2 2 matrix such that T x = A x for all
x R2 . The first column of A is given by T (e1 ) and the second column
8. REFLECTIONS IN R2
97
and
!
1
a
b
e2 = 2
b
+a
2
b
a
a +b
we get
!
2
1
1
a
b
a b2
a
= 2
Ae1 = 2
b
b
a
2ab
a + b2
a + b2
and
!
1
1
2ab
a
b
Ae2 = 2
b
+a
= 2
b
a
a + b2
a + b2 b2 a2
so
2
1
a b2
2ab
2ab
b2 a2
a2 + b2
In summary, if x R2 , then Ax is the reflection of x in the line ay = bx.
A=
2Draw a picture to convince yourself that this does what you expect the reflection in
L to do.
98
You might exercise your muscles by showing that all reflections are similar to one another. Why does it suffics to show each is similar to the
reflection
1 0
?
0 1
8.1. Reflections in higher dimensions. Let T : Rn Rn . We call
T a reflection if T 2 = I and the 1-eigenspace of T has dimension n 1.
9. Invariant subspaces
10. The one-to-one and onto properties
For a few moments forget linear algebra. The adjectives one-to-one and
onto can be used with any kinds of functions, not just those that arise in
linear algebra.
Let f : X Y be a function between any two sets X and Y .
We say that f is one-to-one if f (x) = f (x0 ) only when x = x0 .
Equivalently, f is one-to-one if it sends different elements of X to
different elements of Y , i.e., f (x) 6= f (x0 ) if x 6= x0 .
The range of f , which we denote by R(f ), is the set of all values it
takes, i.e.,
R(f ) := {f (x) | x X}.
I like this definition because it is short but some prefer the equivalent definition
R(f ) := {y Y | y = f (x) for some x X}.
We say that f is onto if R(f ) = Y . Equivalently, f is onto if every
element of Y is f (x) for some x X.
10.1. The sine function sin : R R is not one-to-one because sin =
sin 2. It is not onto because its range is [1, 1] = {y | 1 y 1}.
However, if we consider sin as a function from R to [1, 1] it is becomes
onto, though it is still not one-to-one.
The function f : R R given by f (x) = x2 is not one-to-one because
f (2) = f (2) for example, and is not onto because its range is R0 := {y
R | ly 0}. However, the function
f : R0 R0 ,
f (x) = x2 ,
x2
x1
T
= x1
x2
x1 + x2
is one-to-one but not onto because (0, 0, 1)T is not in the range. Of course,
since the range of T is a plane in R3 there are lots of other elements of R3
not in the range of T .
Proposition 11.1. A linear transformation is one-to-one if and only if
its null space is zero.
Proof. Let T be a linear transformation. By definition, N (T ) = {x | T (x) =
0}. If T is one-to-one the only x with the property that T (x) = T (0) is x = 0,
so N (T ) = {0}.
Conversely, suppose that N (T ) = {0}. If T (x) = T (x0 ), then T (xx0 ) =
T (x) T (x0 ) = 0 so x x0 N (T ), whence x = x0 thus showing that T is
one-to-one.
Another way of saying this is that a linear transformation is one-to-one
if and only if its nullity is zero.
12. Gazing into the distance: differential operators as linear
transformations
You do not need to read this chapter. It is only for those who are curious
enough to raise their eyes from the road we have been traveling to pause,
refresh themselves, and gaze about in order to see more of the land we are
entering. Linear algebra is a big subject that permeates and provides a
framework for almost all areas of mathematics. So far we have seen only a
small piece of this vast land.
We mentioned earlier that the set R[x] of all polynomials in x is a vector
space with basis 1, x, x2 , . . .. Differentiation is a linear transformation
d
: R[x] R[x]
dx
because
d
df
dg
(af (x) + bg(x)) = a
+b .
dx
dx
dx
100
0 1 0 0
d
0 0 2 0
= 0 0 0 3
dx
..
.
0 0
0 0
0 0
..
.
d
(xn ) = nxn1 .
because dx
d
The null space of dx
is the set of constant functions, the subspace R.1.
d
The range of dx is all of R[x]: every polynomial is a derivative of another
polynomial.
Another linear transformation from R[x] to itself is multiplication by
x, i.e., T (f ) = xf . The composition of two linear transformations is a linear
d
to obtain another
transformation. In particular, we can compose x and dx
linear transformation. There are two compositions, one in each order. It is
interesting to compute the difference of these two compositions. Lets do
that
Write T : R[x] R[x] for the linear transformation T (f ) = xf and
D : R[x] R[x] for the linear transformation D(f ) = f 0 , the derivative
of f with respect to x. Then, applying the product rule to compute the
derivative of the product xf , we have
H := xx yy ,
F := yx
are linear transformations R[x, y]n R[x, y]n for every n. The action of
EF F E on a polynomial f is
(EF F E)(f ) = xy yx yx xy (f )
= xy (yfx ) yx (xfy )
= x(fx + yfxy ) y(fy + xfyx )
= xfx + xyfxy yfy yxfyx
= xfx yfy
= xx yy (f )
= H(f ).
Since the transformations EF F E and H act in the same way on every
function f , they are equal; i.e., EF F E = H. We also have
(HE EH)(f ) = xx yy xy xy xx yy (f )
= xx yy (xfy ) xy )(xfx yfy )
= x(fy + xfyx ) yxfyy x2 fxy + x(fy + yfyy )
= 2xfy
= 2 xy (f )
= 2E(f ).
Therefore HE EH = 2E. I leave you to check that HF F H = 2F .
This example illuminates several topics we have discussed. The null
space of E is Rxn . The null space of F is Ry n . Each xni y i is an eigenvector
for H having eigenvalue n 2i. Thus H has n + 1 distinct eigenvalues
n, n2, n4, . . . , 2n, n and, as we prove in greater generality in Theorem
1.2, these eigenvectors are linearly independent.
MORE to say spectral lines....
CHAPTER 12
Determinants
The determinant of an n n matrix A, which is denoted det(A), is a
number computed from the entries of A having the wonderful property that:
A is invertible if and only if det(A) 6= 0.
We will prove this in Theorem 3.1 below.
Sometimes, for brevity, we denote the determinant of A by |A|.
1. The definition
We met the determinant of a 2 2 matrix in Theorem 5.1:
a b
det
= ad bc.
c d
The formula for the determinant of an n n matrix is more complicated. It is defined inductively by which I mean that the formula for the
determinant of an n n matrix involves the formula for the determinant of
an (n 1) (n 1) matrix.
1.1. The 3 3 case. The determinant det(A) = det(aij ) of a 3 3
matrix is the number
a11 a12 a13
a
a
a
a
a
a
a21 a22 a23 = a11 22 23 a12 21 23 + a13 21 22
a32 a33
a31 a33
a31 a32
a31 a32 a33
= a11 (a22 a33 a23 a32 ) a12 (a21 a33 a23 a31 ) + a13 (a21 a32 a22 a31 )
The coefficients a11 , a12 , and a13 , on the right-hand side of the formula are
the entries in the top row of A and they appear with alternating signs +1
and 1; each a1j is then multiplied by the determinant of the 2 2 matrix
obtained by deleting the row and column that contain a1j , i.e., deleting the
top row and the j th column of A.
In the second line of the expression for det(A) has 3 2 = 6 = 3!
terms. Each term is a product of three entries and those three entries are
distributed so that each row of A contains exactly one of them and each
column of A contains exactly one of them. In other words, each term is a
product a1p a2q a3r where {p, q, r} = {1, 2, 3}.
103
104
12. DETERMINANTS
= det(aij ) of a 4 4
a21 a23 a24
a31 a33 a34
a41 a43 a44
Look at the similarities with the 3 3 case. Each entry in the top row of A
appears as a coefficient with alternating signs +1 and 1 in the formula on
the right. Each a1j is then multiplied by the determinant of the 3 3 matrix
obtained by deleting the row and column that contain a1j , i.e., deleting the
top row and the j th column of A. Multiplying out the formula on the right
yields a total of 4 6 = 24 = 4! terms. Each term is a product of four
entries and those four entries are distributed so that each row of A contains
exactly one of them and each column of A contains one of them. In other
words, each term is a product a1p a2q a3r a4s where {p, q, r, s} = {1, 2, 3, 4}.
1.3. The n n case. Let A be an n n matrix. For each i and j
between 1 and n we define the (n 1) (n 1) matrix Aij to be that
obtained by deleting row i and column j from A. We then define
(1-1)
det(A) := a11 |A11 | a12 |A12 | + a13 |A13 | + (1)n1 a1n |A1n |.
where {j1 , . . . , jn } = {1, . . . , n}. In particular, each row and each column
contains exactly one of the factors a1j1 , a2j2 , . . . , anjn .
Proposition 1.1. If A has a column of zeros, then det(A) = 0.
Proof. Suppose column j consists entirely if zeroes, i.e., a1j = a2j = =
anj = 0. Consider one of the n! terms in det(A), say a1j1 , a2j2 , . . . , anjn .
Because {j1 , . . . , jn } = {1, . . . , n}, one of the factors apjp in the product
a1j1 , a2j2 , . . . , anjn belongs to column j, so is zero; hence a1j1 , a2j2 , . . . , anjn =
0. Thus, every one of the n! terms in det(A) is zero. Hence det(A) = 0.
A similar argument shows that det(A) = 0 if one of the rows of A consists
entirely of zeroes.
1. THE DEFINITION
105
0 a33
det 0
..
..
.
.
0
0
0
..
.
a1n
a2n
a3n
= a11 a22 . . . ann .
..
.
a1n
106
12. DETERMINANTS
a
b
a b
= a(d + b) b(c + a) = ad bc = det
.
c+a d+b
c d
107
it follows that
(2-4)
But the determinants of the three 22 matrices in this sum are the negatives
of the three 2 2 determinants in (2-3). Hence the determinant of the 3 3
matrix in (2-4) is the negative of the determinant of the 3 3 matrix in
(2-3). This proves that det(B) = det(A) if B is obtained from A by
switching two rows.
Now consider replacing the third row of the 3 3 matrix A in (2-3) by
the sum of rows 2 and 3 to produce B. Then det(B) is
a11
a12
a13
a21
a22
a23
a21 + a31 a22 + a32 a23 + a33
= a11
a22
a23
a22 + a32 a23 + a33
a12
a21
a23
a21 + a31 a23 + a33
+ a13
a21
a22
.
a21 + a31 a22 + a32
a32 a33
a
a
a
a
a12 31 33 + a13 31 32
a22 a23
a21 a23
a21 a22
108
12. DETERMINANTS
4. Properties
Theorem 4.1. det(AB) = (det A)(det B).
Proof. We break the proof into two separate cases.
Suppose A and B are invertible. Then A = E1 . . . Em and B = Em+1 . . . En
where each Ej is an elementary matrix. By repeated application of Proposition 2.2,
det(A) = det(E1 . . . Em )
= det(E1 ) det(E2 . . . Em )
=
= det(E1 ) det(E2 ) det(Em ).
Applying the same argument to det(B) and det(AB) we see that det(AB) =
(det A)(det B).
Suppose that either A or B is not invertible. Then AB is not invertible,
so det(AB) = 0 by Theorem 3.1. By the same theorem, either det(A) or
det(B) is zero. Hence det(AB) = (det A)(det B) in this case too.
Corollary 4.2. If A is an invertible matrix, then det(A1 ) = (det A)1 .
Proof. It is clear that the determinant of the identity matrix is 1, so the
result follows from Theorem 4.1 and the fact that AA1 = I.
det(A) = 0 if two rows (or columns) of A are the same.
det(AT ) = det(A).
109
1 3 2
1 4 4
1 5 6
is divisible by 12 because it is not changed by adding 100 times the 1st
column and 10 times the second column to the third column whence
1 3 2
1 3 132
1 3 11
det 1 4 4 = det 1 4 144 = 12 det 1 4 12 .
1 5 6
1 5 156
1 5 13
Similarly, the
2 8
det 3 9
8 6
9
2 8
1 = det 3 9
7
8 6
289
2 8
391 = 17 det 3 9
867
8 6
is divisible by 17.
Amuse your friends at parties.
5.2. A nice example. If the matrix
b c
a d
A=
d a
c b
d
c
b
a
b c
d
0 = det a d c
d a b
=b(db ac) c(ab cd) + d(a2 + d2 )
=d(a2 + b2 + c2 + d2 )
by 17 so
17
23 .
51
110
12. DETERMINANTS
The condition that is one-to-one ensures that each row and each column
contains exactly one of the terms in (5-1).
There area total of n! permutations and the determinant of (aij ) is the
sum of n! terms (5-1), one for each permutation. However, there remains
the issue of the signs.
Let be a permutation of {1, 2, . . . , n}. Write (1), (2), . . . , (n) one
after the other. A transposition of this string of numbers is the string of
numbers obtained by switching the positions of any two adjacent numbers.
For example, if we start with 4321, then each of 3421, 4231, and 4312, is a
transposition of 4321. It is clear that after some number of transpositions
we can get the numbers back in their correct order. For example,
4321 4312 4132 1432 1423 1243 1234
or
4321 4231 2431 2341 2314 2134 1234.
If it takes an even number of transpositions to get the numbers back into
their correct order we call the permutation even. If it takes an odd number
we call the permutation odd. We define the sign of a permutation to be the
number
(
+1 if is even
sgn() :=
1 if is odd
It is not immediately clear that the sign of a permutation is well-defined. It
is, but we wont give a proof.
111
The significance for us is that the coefficient of the term a1(1) a2(2) an(n)
in the expression for the determinant is sgn(). Hence
X
sgn() a1(1) a2(2) an(n)
det(aij ) =
CHAPTER 13
Eigenvalues
Consider a fixed n n matrix A. Viewed as a linear transformation
from Rn to Rn , A will send a line through the origin to either another line
through the origin or zero. Some lines might be sent back to themselves.
That is a nice situation: the effect of A on the points on that line is simply
to scale them by some factor, say. The line Rx is sent by A to either zero
or the line R(Ax), so A sends this line to itself if and only if Ax is a multiple
of x. This motivates the next definition.
1. Definitions and first steps
Let A be an n n matrix. A number R is an eigenvalue for A if
there is a non-zero vector x such that Ax = x. If is an eigenvalue for A
we define
E := {x | Ax = x}.
We call E the -eigenspace for A and the elements in E are called eigenvectors
or -eigenvectors for A.
We can define E for any R but it is non-zero if and only if is an
eigenvalue.1
Since
Ax = x 0 = Ax x = Ax Ix = (A I)x,
it follows that
E = N (A I).
We have therefore proved that E is a subspace of Rn . The following theorem
also follows from the observation that E = N (A I).
Theorem 1.1. Let A be an n n matrix. The following are equivalent:
(1) is an eigenvalue for A;
(2) A I is singular;
(3) det(A I) = 0.
Proof. Let R. Then is an eigenvalue for A if and only if there is a
non-zero vector v such that (A I)v = 0; but such a v exists if and only
1Eigen is a German word that has a range of meanings but the connotation here is
probably closest to peculiar to, or particular to, or characteristic of. The eigenvalues
of a matrix are important characteristics of it.
113
114
13. EIGENVALUES
3. THE 2 2 CASE
115
The point of this example is that the matrix A does not initially appear
to have any special geometric meaning but by finding its eigenvalues and
eigenspaces the geometric meaning of A becomes clear.
3. The 2 2 case
a b
Theorem 3.1. The eigenvalues of A =
are the solutions of the
c d
quadratic polynomial equation
(3-1)
2 (a + d) + ad bc = 0.
A quadratic polynomial with real coefficients can have either zero, one,
or two roots. It follows that a 2 2 matrix with real entries can have either
zero, one, or two eigenvalues. (Later we will see that an n n matrix with
real entries can have can have anywhere from zero to n eigenvalues.)
3.1. The polynomial x2 + 1 has no real zeroes, so the matrix
0 1
1 0
has no eigenvalues. You might be happy to observe that this is the matrix
that rotates the plane through an angle of 90 in the counterclockwise direction. It is clear that such a linear transformation will not send any line
to itself: rotating by 90 moves every line.
Of course, it is apparent that rotating by any angle that is not an integer
multiple of will have the same property: it will move every line, so will have
no eigenvalues. You should check this by using the formula for A in(7-1)
and then applying the quadratic formula to the equation (2x2.eigenvals).
3.2. The polynomial x2 4x + 4 has one zero, namely 2, because it is a
square, so the matrix
3 1
A=
1 1
has exactly one eigenvalue, namely 2. The 2-eigenspace is
1 1
1
E2 = N (A 2I) = N
=R
.
1 1
1
116
13. EIGENVALUES
1
1
is 2
. We know that dim E2 = 1
1
1
because the rank of A 2I is 1 and hence its nullity is 1.
You should check that A times
1 2
2
= N (B + I) = N
=R
.
2
4
1
117
not have an eigenvector. (Notice this argument did not use anything about
the size of the matrix A. If A is any square matrix with real entries such
that A2 + I = 0, then A does not have an eigenvalue. We should really say
A does not have a real eigenvalue.
With that preliminary step out of the way lets proceed to prove the
result we want. Let v be any non-zero vector in R3 and let V = Sp(v, Av).
Then V is an A-invariant subspace by which I mean if w V , then Aw
V . More colloquially, A sends elements of V to elements of V . Moreover,
dim V = 2 because the fact that A has no eigenvectors implies that Av is
not a multiple of v; i.e., {v, Av} is linearly independent.
Since V is a 2-dimensional subspace of R3 there is a non-zero vector w
R3 that is not in V . By the same argument, the subspace W := Sp(w, Aw)
/ V , V + W = R3 . From the
is A-invariant and has dimension 2. Since w
formula
dim(V + W ) = dim V + dim W dim(V W )
we deduce that dim(V W ) = 1. Since V and W are A-invariant, so is
V W . However, V W = Ru for some u and the fact that Ru is Ainvariant implies that Au is a scalar multiple of u; i.e., u is an eigenvector
for A. But that is false: we have already proved A has no eigenvectors.
Thus we are forced to conclude that there is no 3 3 matrix A such that
A2 + I = 0.
4.1. Why did I just prove that? A recent midterm contained the
following question: if A is a 33 matrix such that A3 +A = 0, is A invertible.
The answer is no, but the question is a very bad question because the
proof I had in mind is flawed. Let me explain my false reasoning: if
A3 + A = 0, then A(A2 + I) = 0 so A is not invertible because, as we
observed in chapter 4.4, if AB = 0 for some non-zero matrix B, then A
is not invertible. However, I can not apply the observation in chapter 4.4
unless I know that A2 + I is non-zero, and it is not easy to show that A2 + I
is not zero. The argument above is the simplest argument I know, and it
isnt so simple.
There is a shorter argument using determinants but at the time of the
midterm I had not introduced either determinants or eigenvalues.
5. The characteristic polynomial
The characteristic polynomial of an n n matrix A is the degree n polynomial
det(A tI).
You should convince yourself that det(A tI) really is a polynomial in t of
degree n. Lets denote that polynomial by p(t). If we substitute a number
for t, then
p() = det(A I).
The following result is an immediate consequence of this.
118
13. EIGENVALUES
1
1
B :=
1
1
1 1 1
1 1 1
1 1 1
1 1 1
1t
1
det
1
1
119
1
1
1
2t
0
0
t2
1 t 1
1
2t
0
t 2
= det 0
.
0
1 1 t 1
0
2 t t 2
1
1 1 t
1
1
1 1 t
1
0
0
1
0
1
0
1
.
det(B tI) = (2 t)3 det
0
0
1
1
1 1 1 1 t
The determinant does not change if
the last row so
0
det(B tI) = (2 t)3 det
0
0
0
1
0
1
= (t 2)3 (t + 2).
1
1
0 2 t
1
1
E2 = N (B 2I) = N
1
1
1
1
1
1
1
1
1
1
1
1
.
1
1
The rank of B 2I, i.e., the dimension of the linear span of its columns, is 1,
and its nullity is therefore 3. Therefore dim E2 = 3. It is easy to find three
linearly independent vectors x such that (B 2I)x = 0 and so compute
1
1
1
1
0
0
E2 = R
0 + R 1 + R 0 .
0
0
1
Finally,
1
3 1 1 1
1 3 1 1
1
= N (B + 2I) = N
1 1 3 1 = R 1 .
1 1 1 3
1
E2
1
1
A :=
1
1
1
1
1
1
1
1
1
1
1
0
1
1
1
1
1
0
1
1
1
.
1
1
120
13. EIGENVALUES
N (A) = R
0 + R 0 .
0
0
0
1
We dont need to know the zeroes of the cubic factor f (x) = x3 5x2 +2x1
just yet. However, a computation gives
f (1) = 3,
f (10) > 0,
f (0) = 1,
f 32 > 0,
so the graph y = f (x) crosses the x-axis at points in the open intervals
2
and
(1, 10).
0, 32 ,
3, 1 ,
Hence f (x) has three real zeroes, and A has three distinct non-zero eigenvalues. Because A has a 2-dimensional 0-eigenspace it follows that R5 has a
basis consisting of eigenvectors for A.
Claim: If is a non-zero eigenvalue for A, then
1 1
1 1
4 1 2
3
1
2 + 1
is a -eigenvector for A.
Proof of Claim: We must show that Ax = x. Since is an eigenvalue
it is a zero of the characteristic polynomial; since 6= 0, f () = 0, i.e.,
3 52 + 2 1 = 0. The calculation
1 2
1 1
1 1 1 1 1
1 1 1 1 1 1 1 1 2
2
2
Ax =
1 1 1 1 1 4 1 = 1 = x
3
1 1 0 1 1
3
1
1 1 1 0 1
2 + 2
2 + 1
shows that the Claim is true, however please, please, please, notice we used
the fact that is a solution to the equation x3 5x2 + 2x 1 = 0 in the
calculation in the following way: because 3 52 + 2 1 = 0 it follows
that 1 2 = (4 1 2 ), so the third entry in Ax is times the
third entry in x, thus justifying the final equality in the calculation.
Notice we did not need to compute the eigenvalues in order to find the
eigenspaces. If one wants more explicit information one needs to determine
the roots of the cubic equation x3 5x2 + 2x 1 = 0.
121
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
1
1
1
1
1
0
1
in row-reduced echelon form. To do this first subtract the top row from each
of the other rows to obtain
1 1
1
1
1
0
0
0
0 0
0
.
0 1 0
0
0 1
Now add the bottom row to the top row and subtract the bottom row from
each of the middle three rows, then move the bottom row to the top to get
0
0
1
1 1
1
0
1
0 0
.
1
0 0
1
0 0 1 1
1 0
1 1
0 1
0 0
0 0
1
1
0
1
1
.
1
1
1 1
Replace the second row by the second row minus the first row plus the third
row to get
1 0
0
1
0 0
1
2
3
0 1 0
.
0 0 1
1
0 0 1 1
122
13. EIGENVALUES
Now multiply the third row by 1, add the bottom row to the second row,
and subtract the bottom row from the fourth row, then multiply the bottom
row by 1, to get
1 0 0
1
0 0 0 2 + 1
3
1
B := 0 1 0
.
0 0 0 1 + 1
0 0 1
1 +
and
x5 = 2 + 1 .
1
1
2
x=
4 1 .
3
2 + 1
This is the matrix in the statement of the claim we made on page 120.
6.4. It aint easy. Before finding the eigenvector in the claim I made
many miscalculations, a sign here, a sign there, and pretty soon I was
doomed and had to start over. So, take heart: all you need is perseverance,
care, and courage, when dealing with a 5 5 matrix by hand. Fortunately,
there are computer packages to do this for you. The main thing is to understand the principle for finding the eigenvectors. The computer programs
simply implement the principle.
7. The utility of eigenvectors
Let A be an n n matrix, and suppose we are in the following good
situation:
Rn has a basis consisting of eigenvectors for A;
given v Rn we have an effective means of writing v as a linear
combination of the eigenvectors for A.
Then, as the next result says, it is easy to compute Av.
123
CHAPTER 14
126
127
an inverse that also belongs to Q(1). The number system Q(1) might also
seem a little mysterious until I tell you that a copyof Q(1) can already be
found inside R. In
fact, 1 is just a new symbol for 2. The number a + b 1
is really just a + b 2.
Now re-read the previous paragraph with this additional information.
2.2. The complex plane. As mentioned above, it is useful to picture
the set of real numbers as the points on a line, the real line, and equally
useful to picture the set of complex numbers as the points on a plane. The
horizontal axis is just the real line, sometimes called the real axis, and the
vertical axis, sometimes called the imaginary axis, is all real multiples of i.
We denote it by iR.
iR
O
a+ib
The
complex
plane C
1+2i
i1
1+i
2+i
/R
128
number is its relection in the real axis and z = z because reflecting twice in
the real axis is the same as doing nothing.
2.4. Roots of polynomials. Complex numbers were invented to provide solutions to quadratic polynomials that had no real solution. The
ur-example is x2 + 1 = 0 which has no solutions in R but solutions i in
C. This is the context in which you meet complex numbers at high school.
You make the observation there that the solutions of ax2 + bx + c = 0 are
given by
b b2 4ac
2a
provided a 6= 0 and the equation has no real solutions when b2 4ac < 0
but has two complex zeroes,
b
4ac b2
b
4ac b2
= +i
and
= i
.
2a
2a
2a
2a
are complex conjugates of each other.
Notice that the zeroes and
It is a remarkable fact that although the complex numbers were invented
with no larger purpose than providing solutions to quadratic polynomials,
they provide all solutions to polynomials of all degrees.
Theorem 2.1. Let f (x) = xn + a1 xn1 + a2 xn2 + + an1 x + an
be a polynomial with coefficients a1 , . . . , an belonging to R. Then there are
complex numbers r1 , . . . , rn such that
f (x) = (x r1 )(x r2 ) (x rn ).
Just as for quadratic polynomials, the zeroes of f come in conjugate pairs
because if f () = 0, then, using the properties of conjugation mentioned at
the end of chapter 2.3,
=
n + a1
n1 + a2
n2 + + an1
+ an
f ()
= n + a1 n1 + a2 n2 + + an1 + an
= n + a1 n1 + a2 n2 + + an1 + an
= f ()
= 0.
2.5. Application to the characteristic polynomial of a real matrix. Let A be an n n matrix whose entries are real numbers. Then
det(A tI) is a polynomial with real coefficients. If C is a zero of
det(A tI) so is .
Recall that the eigenvalues of A are exactly (the real numbers that are)
the zeroes of det(A tI). We wish to view the complex zeroes of det(A tI)
as eigenvalues of A but to do that we have to extend our framework for
linear algebra from Rn to Cn .
129
then x
is a eigenvector for A.
Proof. This is a simple calculation. If Ax = x, then
x
x = Ax = x =
A
x = A
as claimed.
A := A
and call this the conjugate transpose of A.
||z|| = z z.
If z 6= 0,
z 1 = z/||z||.
Check this.
We can extend the notion of norm, or size, from complex numbers to
vectors in Cn .
130
One important difference between complex and real linear algebra involves the definition of the norm of a vector. Actually, this change already
begins when dealing with complex numbers. The distance of a real number x from zero is given by its absolute
value |x|. The distance of a point
p
x = (x1 , x2 ) R2 from zero is x21 + x22 . Picturing the complex plane as
R2 , the distance of z = a + ib from 0 is
p
a2 + b2 = z z.
If Rn , the distance of a point x = (x1 , . . . , xn )| from the origin is given
by taking the square root of its dot product with itself:
q
x x=
x
j xj =
|xj |2
j=1
j=1
This is a clever proof. Lets see how to prove the 2 2 case in a less
clever way. Consider the symmetric matrix
a b
A=
b d
5. AN EXTENDED EXERCISE
131
2 1 1 1
1 1 1 0
A=
1 0 1 1 .
1 1 0 1
Compute the characteristic polynomial of A and factor it as much as you can.
Remember to use elementary row operations before computing det(A xI).
Can you write the characteristic polynomial of A as a product of 4 linear
terms, i.e., as (x )(x )(x )(x ) with , , , C?
What are the eigenvalues and corresponding eigenspaces for A in C4 ?
Show that A6 = I and that no lower power of A is the identity.
Suppose that is a complex eigenvalue for A, i.e., there is a non-zero
vector x C4 such that Ax = x. Show that 6 = I.
Let = e2i/3 . Thus 3 = 1. Show that (t 1)(t )(t 2 ) = 0.
Show that
{1, , 2 } = { C | 6 = 1}.
Compute Av 1 , Av 2 , Av 3 , and Av 4 , where
0
0
1
3
1
1
1
1
v1 =
1 , v 2 = 1 , v 3 = , v 4 = 2 .
1
1
2
CHAPTER 15
Orthogonality
The word orthogonal is synonymous with perpendicular.
1. Geometry
In earlier chapters I often used the word geometry and spoke of the
importance of have a geometric view of Rn and the sets of solutions to
systems of linear equations. However, geometry really involves lengths and
angles which have been absent from our discussion so far. The word geometry
is derived from the Greek word geometrein which means to measure the land:
cf. or meaning earth and metron or o meaning an instrument
to measuremetronome, metre, geography, etc.
In order to introduce angles and length in the context of vector spaces
we use the dot product.
2. The dot product
The dot product of vectors u = (u1 , . . . , un ) and v = (v1 , . . . , vn ) in Rn
is the number
u v := u1 v1 + + un vn .
The norm or length of u is
||u|| := u u.
The justification for calling this the length is Pythagorass Theorem.
Notice that
u u = 0 if and only if u = 0this makes sense: the only vector
having length zero is the zero vector.
If u and v are column vectors of the same size, then u v = uT v =
v T u.
2.1. Angles. We define the angle between two non-zero vectors u and
v in Rn to be the unique angle [0, ] such that
uv
cos() :=
.
||u|| ||v||
Since cos( 2 ) = 0, the angle between u and v is 2 = 90 if and only if uv = 0.
Before accepting this definition as reasonable we should check that it
gives the correct answer for the plane.
Suppose u, v R2 . If u or v is a multiple of the other, then the angle
between them is zero and this agrees with the formula because cos(0) = 1.
133
134
15. ORTHOGONALITY
Now assume u 6= v. The cosine rule applied to the triangle with sides u, v,
and u v, gives
||u v||2 = ||u||2 + ||v||2 2||u|| ||v|| cos().
The left-hand side of this expression is ||u||2 2(u v) + ||v||2 . The result now
follows.
General case done by reducing to the R2 case using the fact that every
pair of vectors in Rn lies in a plane.
3. Orthogonal vectors
Two non-zero vectors u and v are said to be orthogonal if u v = 0.
The following result is a cute observation.
Proposition 3.1. Let A be a symmetric nn matrix and suppose and
are different eigenvalues of A. If u E and v E , then u is orthogonal
to v.
Proof. The calculation
uT v = (u)T v = (Au)T v = uT AT v = uT Av = uT v
shows that ( )uT v = 0. But 6= 0 so uT v = 0.
135
f g :=
dx.
1 x2
1
It is easy to see that f g = g f , that f (g + h) = f g + f g, and
(f ) g = (f g) for all R.
It is common practice to call this dot product the inner product and write
it as (f, g) rather than f g. We will do this.
The Chebyshev polynomials are defined to be
T0 (x) := 1
T1 (x) := x
Tn (x) = 2xTn (x) Tn1 (x)
and
for all n 2.
if m 6= n
Z 1
0
Tm (x)Tn (x)
dx =
if m = n = 0
1 x2
1
/2 if m = n 1
Proof. Use a change of variables x = cos() and use an induction argument
to prove that Tn (cos()) = cos(n).
5. Orthogonal and orthonormal bases
A set of non-zero vectors {v 1 , . . . , v d } is said to be orthogonal if v i is
orthogonal to v j whenever i 6= j. A set consisting of a single non-zero vector
is declared to be an orthogonal set.
Lemma 5.1. An orthogonal set of vectors is linearly independent.
Proof. Let {v 1 , . . . , v d } be an orthogonal set of vectors and suppose that
a1 v 1 + + ad v d = 0. Then
0 = v i (a1 v 1 + + ad v d ).
However, when we multiply this out all but one term will vanish because
v i v j = 0 when j 6= i. The non-vanishing term is ai v i , so ai v i = 0. But v i 6= 0,
so ai = 0. Thus all the ai s are zero and we conclude that {v 1 , . . . , v d } is
linearly independent.
136
15. ORTHOGONALITY
137
138
15. ORTHOGONALITY
7. Orthogonal matrices
A matrix A is orthogonal if AT = A1 . By definition, an orthogonal
matrix has an inverse so must be a square matrix.
An orthogonal linear transformation is a linear transformation implemented
by an orthogonal matrix, i.e., if T (x) = Ax and A is an orthogonal matrix
we call T an orthogonal linear transformation.
Theorem 7.1. The following conditions on an nn matrix Q are equivalent:
(1) Q is an orthogonal matrix;
(2) Qx Qy = x y for all x, y Rn ;
(3) ||Qx|| = ||x|| for all x Rn ;
(4) the columns of Q, {Q1 , . . . , Qn }, form an orthonormal basis for Rn .
Proof. We start by pulling a rabbit out of a hat. Let x, y Rn . Then
1
2
2
1
= 4 x x + 2x y + y y 14 x x 2x y + y y
4 ||x + y|| ||x y||
= x y.
We will use the fact that 14 ||x + y||2 ||x y||2 later in the proof.
(1) (2) If Q is orthogonal, then QT Q = I so
(Qx) (Qy) = (Qx)T (Qy) = xT QT Qy = xT y = x y.
(2) (3) because (3) is a special case of (2), the case x = y.
(3) (4) By using the hypothesis in (3) and the rabbit we obtain
x y = 14 ||x + y||2 ||x y||2
= 14 ||Q(x + y)||2 ||Q(x y)||2
= 14 ||Qx + Qy||2 ||Qx Qy||2
= (Qx) (Qy).
(We have just shown that (3) implies (2).) In particular, if {e1 , . . . , en } is
the standard basis for Rn , then
ei ej = (Qei ) (Qej ) = Qi Qj .
7. ORTHOGONAL MATRICES
139
140
15. ORTHOGONALITY
Theorem 7.4. Let A be a diagonal matrix with real entries. For every
orthogonal matrix Q, Q1 AQ is symmetric.
Proof. A matrix is symmetric if it is equal to its transpose. We have
T
T
T
Q1 AQ = QT AT Q1 = Q1 AT QT = Q1 AQ
so Q1 AQ is symmetric.
8. Orthogonal projections
Let V be a subspace of Rn . Let {u1 , . . . , uk } be an orthonormal basis
for V .
The function PV : Rn Rn defined by the formula
PV (v) := (v u1 )u1 + + (v uk )uk
(8-1)
it is a linear transformation;
PV (v) = v for all v V ;
R(PV ) = V ;
ker(PV ) = V ;
PV PV = PV .3
2Thus, although the formula on the right-hand side of (8-1) depends on the choice of
8. ORTHOGONAL PROJECTIONS
141
v = P (v) + v P (v) .
142
15. ORTHOGONALITY
143
say that. The theorem states a more general result. Try to understand why the theorem
implies P (v) is the point in R(A) closest to b. If you cant figure this out, ask.
5
The adjective best applies to the word approximation, not to the word
theorem.
6We use the word unique to emphasize the fact that there is only one point in V that
is closest to b. This fact is not obvious even though your geometric intuition might suggest
otherwise. If we drop the requirement that V is a subspace there can be more than one
point in V that is closest to some given point b. For example, if V is the circle x2 + y 2 = 1
in R2 and b = (0, 0), then all points in V have the same distance from b. Likewise, if
2
2
b = (0, 2) andq
V is the
parabola
qy = x in R , there are two points on V that are closest
to b, namely
3 3
,
2 2
and
3 3
,
2 2
144
15. ORTHOGONALITY
through 0 and w P (b). Thus, the triangle has a right angle at the origin.
The longest side of a right-angled triangle is its hypotenuse which, in this
case, is the line through b P (b), and w P (b). That side has length
||(w P (b)) (b P (b))|| = ||w b||.
Therefore ||w b|| > ||b P (b)||.
L0
It is pretty obvious that one should draw a line L0 through v that is perpendicular to L and the intersection point L L0 will be the point on L closest
to v.
That is the idea behind the next result because the point LL0 is PL (v).
9.2. A problem: find the line that best fits the data. Given
points
(1 , 1 ), . . . , (m , m )
in R2 find the line y = dx + c that best approximates those points. In
general there will not be a line passing through all the points but we still
want the best line. This sort of problem is typical when gathering data
to understand the relationship between two variables x and y. The data
obtained might suggest a linear relationship between x and y but, especially
in the social sciences, it is rare that the data gathered spells out an exact
linear relationship.
We restate the question of finding the line y = dx + c as a linear algebra
problem. The problem of finding the line that fits the data (i , i ), 1 i
145
d1 + c
d2 + c
d3 + c
..
.
dm + cm
1 1
1
2 1 2
d
3 1 d
3
=
or
A
=b
c
c
..
..
..
.
.
.
m 1
m
where A is an m 2 matrix and b Rm .
As we said, there is usually no solution ( dc ) to this system of linear
equations but we still want to find a line that is as close as possible to
the given points. The next definition makes the idea as close as possible
precise.
Definition 9.2. Let A be an m n matrix and b a point in Rm . We
call x a least-squares solution to the equation Ax = b if Ax is as close
as possible to b; i.e.,
||Ax b|| ||Ax b||
for all x Rn .
R(A)
b
R(A)
146
15. ORTHOGONALITY
AT (b
147
1 1 1
1
2 2 1 2
a
2
a
2 3 1 3
3
b
b = b
=
or
A
..
..
..
.. c
c
.
.
.
.
2
m
m m 1
where A is an m 3 matrix and b Rm .
So, using the ideas in chapter 9 we must solve the equation
AT Ax = AT b
to get an a , b , and c that gives a parabola y = a x2 + b x + c that best
fits the data.
10.1. Why call it least-squares? We are finding an x that minimizes
||Ax b|| and hence ||Ax b||2 But
||Ax b||2 = (Ax b) (Ax b) = a sum of squares.
11. Gazing into the distance: Fourier series
12. Infinite sequences
CHAPTER 16
Similar matrices
1. Definition
Similar matrices are just that, similar, very similar, in a sense so alike
as to be indistinguishable in their essential properties. Although a precise
definition is required, and will be given shortly, it should be apparent that
the matrices
1 0
0 0
A=
and
B=
0 0
0 1
are, in the everyday sense of the word, similar. Multiplication by A and
multiplication by B are linear transformations from R2 to R2 ; A sends e1 to
itself and kills e2 ; B sends e2 to itself and kills e1 . Pretty similar behavior,
huh!?
150
3. Example
Lets return to the example in chapter 1. The only difference between
A and B is the choice of labeling: which order do we choose to write
1
0
and
0
1
or, if you prefer, which is labelled e1 and which is labelled e2 .
There is a linear transformation from R2 to R2 that interchanges e1 and
e2 , namely
0 1
S=
;
Se1 = e2 and Se2 = e1 .
1 0
Since S 2 = I, S 1 = S and
0 1
0 0
0 1
0 1
0 1
1 0
1
SBS =
=
=
= A.
1 0
0 1
1 0
0 0
1 0
0 0
The calculation shows that A = SBS 1 ; i.e., A is similar to B in the
technical sense of the definition in chapter 2.
4
We now show that similar matrices have some of the same properties.
Theorem 4.1. Similar matrices have the same
(1) determinant;
(2) characteristic polynomial;
(3) eigenvalues.
Proof. Suppose A and B are similar. Then A = SBS 1 for some invertible
matrix S.
(1) Since det(S 1 ) = (det S)1 , we have
det(A) = (det S)(det B)(det S 1 ) = (det S)(det B)(det S)1 = det(B).
We used the fact that the determinant of a matrix is a number.
(2) Since S(tI)S 1 = tI, A tI = S(B tI)S 1 ; i.e., A tI and B tI
are similar. (2) now follows from (1) because the characteristic polynomial
of A is det(A tI).
(3) The eigenvalues of a matrix are the roots of its characteristic polynomial. Since A and B have the same characteristic polynomial they have
the same eigenvalues.
4.1. Warning: Although they have the same eigenvalues similar matrices do not usually have the same eigenvectors or eigenspaces. Nevertheless,
there is a precise relationship between the eigenspaces of similar matrices.
We prove that in Proposition 4.2
151
152
4.3. Intrinsic and extrinsic properties of matrices. It is not unreasonable to ask whether a matrix that is similar to a symmetric matrix is symmetric. Suppose A is symmetric and S is invertible. Then
(SAS 1 )T = (S 1 )T AT S T = (S T )1 AS T and there is no obvious reason
which this should be the same as SAS 1 . The explicit example
1
1 2
2 3
1 2
8 11
1 2
8 5
=
=
0 1
3 4
0 1
3 4
0 1
3 2
shows that a matrix similar to a symmetric matrix need not be symmetric.
PAUL - More to say.
5. Diagonalizable matrices
5.1. Diagonal matrices. A square matrix D = (dij ) is diagonal if
dij = 0 for all i 6= j. In other words,
1 0 0
0
0
0 2 0
0
0
0 0 3
0
0
D = ..
.
..
..
.
0 0 0 n1 0
0 0 0
0
n
for some 1 , . . . , n R. We sometimes use the abbreviation
1 0 0
0
0
0 2 0
0
0
0 0 3
0
0
diag(1 , . . . , n ) = ..
.. .
.
.
.
.
.
0 0 0 n1 0
0 0 0
0
n
The identity matrix and the zero matrix are diagonal matrices.
Let D = diag(1 , . . . , n ). The following facts are obvious:
(1)
(2)
(3)
(4)
5. DIAGONALIZABLE MATRICES
153
and
2 1
S=
,
1 1
then
S
AS =
1 1
1 2
5 6
2 1
2 0
=
3 4
1 1
0 1
so A is diagonalizable.
5.4. The obvious questions are how do we determine whether A is diagonalizable or not and if A is diagonalizable how do we find an S such
that S 1 AS is diagonal. The next theorem and its corollary answer these
questions.
Theorem 5.1. An n n matrix A is diagonalizable if and only if it has
n linearly independent eigenvectors.
Proof. We will use the following fact. If B is a p q matrix and C a q r
matrix, then the columns of BC are obtained by multiplying each column
of C by B; explicitly,
BC = [BC 1 , . . . , BC r ]
where C j is the j th column of C.
() Let {u1 , . . . , un } be linearly independent eigenvectors for A with
Auj = j uj for each j. Define
S := [u1 , . . . , un ],
i.e., the j th column of S is uj . The columns of S are linearly independent
so S is invertible. Since
I = S 1 S = S 1 [u1 , . . . , un ] = [S 1 u1 , . . . , S 1 un ],
S 1 uj = ej , the j th standard basis vector.
154
Now
S 1 AS = S 1 A[u1 , . . . , un ]
= S 1 [Au1 , . . . , Aun ]
= S 1 [1 u1 , . . . , n un ]
= [1 S 1 u1 , . . . , n S 1 un ]
= [1 e1 , . . . , n en ]
1 0 0
0 2 0
0 0 3
= ..
..
.
.
0 0 0
0 0 0
0
0
0
n1
0
0
0
0
..
.
0
n
Thus A is diagonalizable.
() Now suppose A is diagonalizable, i.e.,
S 1 AS = D = diag(1 , . . . , n )
for some invertible S. Now
[AS 1 , . . . , AS n ] = AS
= SD
= S[1 e1 , . . . , n en ]
= [1 Se1 , . . . , n Sen ]
= [1 S 1 , . . . , n S n ]
so AS j = j S j for all j. But the columns of S are linearly independent so
A has n linearly independent eigenvectors for A.
The proof of Theorem 5.1 established the truth of the following corollary.
Corollary 5.2. If A is a diagonalizable nn matrix and u1 , . . . , un are
linearly independent eigenvectors for A and S = [u1 , . . . , un ], then S 1 AS
is diagonal.
Corollary 5.3. If an n n matrix has n distinct eigenvalues it is
diagonalizable.
Proof. Suppose 1 , . . . , n are the distinct eigenvalues for A and for each
i let v i be a non-zero vector such that Av i = i v i . By Theorem 1.2,
{v 1 , . . . , v n } is a linearly independent set. It now follows from Theorem
5.1 that Ais diagonalizable.
5. DIAGONALIZABLE MATRICES
155
6 6
1
= N (A + I) = N
=R
.
3 3
1
Thus S can be any matrix with one column a non-zero multiple of (2 1)T
and the other a non-zero multiple of (1 1)T . For example, the matrix we
used before, namely
2 1
,
1 1
works.
It is important to realize that S is not the only matrix that works.
For example, if
3 2
R=
.
3 1
Now
1 1 2
5 6
3 2
1 0
1
=
.
R AR =
3 4
3 1
0 2
3 3 3
Notice that R1 AR 6= S 1 AS. In particular, A is similar to two different
diagonal matrices.
CHAPTER 17
158
4. THE WORD IT
159
case, it seems to me that the word all should be used in this definition.
The following version of the definition seems less clear:
We say that {v 1 , . . . , v k } is orthogonal if v i v j = 0 where
i 6= j.
Where seems to need some interpretation and may be mis-interpreted.
Better to use the word all because there is little possibility it can be
misunderstood.
2.3. I used the word where in my definition of linear span.
A linear combination of vectors v 1 , . . . , v n in Rm is a vector
that can be written as
a1 v 1 + + an v n
for some numbers a1 , . . . , an R.
Let A be any m n matrix and x Rn . Then Ax is a
linear combination of the columns of A. Explicitly,
(2-1)
Ax = x1 A1 + + xn An
where Aj denotes the j th column of A.
A function T : V W between two vector spaces is a
linear transformation if T (au + bu) = aT (u) + bT (v) where
u, v V and a, b R.
160
CHAPTER 18
There are three loops and three nodes in this circuit. We must put arrows on
each loop indicating the direction the current is flowing. It doesnt matter
how we label these currents, or the direction we choose for the current flowing
161
162
I1
O
I3
I2
D
I4
O
I5
G
2. MAGIC SQUARES
163
4
9
2
n = 3,
3 8
1 2 3
5 1 5E = 4
0 4
7 6
3 2
1
1
0
0
0
matrix is
0
1
0
0
0
1
0
0
0
0
1
0
0
1
1
0
0
0
0
1
1
0
0
0
1
1
0
0
1
0
1
0
0
0
1
0
1
0
0
0
0
0
0
1
0
0
1
1
0
1
0
0
0
0
0
1
1
0
0
1
main diagonal
column 2
column 3
column 1
row 2
row 1
anti-diagonal
row 3
164
I dont want
progressively
1 0 0
0 1 0
0 0 1
0 0 0
0 0 0
0 0 0
0 0 0
0 0 0
0 1
0 0 0
1
MD
0 1
0 0 1
0
C2
0 0
1 0 0
1
C3
1 1 0 1 0 1
C1-MD
0 1 1 1 0 1
AD-C3
0 0
0 0 0
0
R2+R1+R3-C1-C2-C3
0 2 1 0 1 2
R1-MD-C2-C3
0 0
0 1 1
1
row 3
then
1
0
0
0
0 1
0 0 0
1
0 1
0 0 1
0
0 0
1 0 0
1
1 1 0 1 0 1
0 1 1 1 0 1
0 0
0 0 0
0
0 0 3 2 1 4
0 0
0 1 1
1
0
1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
MD
C2
C3
C1-MD
AD-C3
R2+R1+R3-C1-C2-C3
R1-MD-C2-3C3+2AD
row 3
We can already see from this that x32 and x33 will be the independent
variables, so lets take
x33 = 1, x32 = 2.
Evaluating the dependent variables gives
x31 = 3, x23 = 4, x22 = 0, x21 = 4, x13 = 3, x12 = 2, x11 = 1,
which is the 0-valued magic square we obtained before.
The rank of the matrix is 8-1=7, so its nullity is 2. That is equivalent
to the existence of two independent variables, so lets find another solution
that together with the first will give us a basis. It is obvious though that a
3 3 matrix has various symmetries that we can exploit.
For the n n case we get a homogeneous system of 2n + 2 equations in
n2 variables. If n > 2 this system has a non-trivial solution.
CHAPTER 19
Last rites
1. A summary of notation
Sets are usually denoted by upper case letters.
f : X Y denotes a function f from a set X to a set Y . If f : X Y
and g : Y Z are functions, then g f or gf denotes the composition first
do f then do g. Thus gf : X Z and is defined by
(gf )(x) = g(f (x)).
This makes sense because f (x) is an element of Y so g can be applied to it.
The identity function idX : X X is defined by
idX (x) = x
for all x X.
166
2. A sermon
Learning a new part of mathematics is like learning other things: photography, cooking, riding a bike, playing chess, listening to music, playing
music, tennis, tasting wine, archery, seeing different kinds of art, etc. At
some point one does, or does not, make the transition to a deeper involvement where one internalizes and makes automatic the steps that are first
learned by rote. At some point one does, or does not, integrate the separate
discrete aspects of the activity into a whole.
You will know whether you are undergoing this transition or not. It may
be happening slowly, more slowly than you want, but it is either happening
or not. Be sensitive to whether it is happening. Most importantly, ask
yourself whether you want it to happen and, if you do, how much work
you want to do to make it happen. It will not happen without your active
involvement.
Can you make fluent sentences about linear algebra? Can you formulate
the questions you need to ask in order to increase your understanding?
Without this internalization and integration you will feel more and more like
a juggler with too many balls in the air. Each new definition, concept, idea,
is one more ball to keep in the air. It soon becomes impossible unless one
sees the landscape of the subject as a single entity into which the separate
pieces fit together neatly and sensibly.
Linear algebra will remain a great struggle if this transition is not happening. A sense of mastery will elude you. Its opposite, a sense that the
subject is beyond you, will take rootparalysis and fear will set in. That
is the dark side of all areas of life that involve some degree of competence,
public performance, evaluation by others, and consequences that you care
about.
Very few people, perhaps none, get an A in 300- and higher-level math
courses unless they can integrate the separate parts into a whole in which
the individual pieces not only make sense but seem inevitable and natural.
It is you, and only you, who will determine whether you understand
linear algebra. It is not easy.
Good luck!