Interactive Linear Algebra
Dan Margalit
Georgia Institute of Technology
Joseph Rabinoff
Georgia Institute of Technology
JOSEPH RABINOFF
School of Mathematics
Georgia Institute of Technology
rabinoff@math.gatech.edu
Joseph Rabinoff contributed all of the figures, the demos, and the technical
aspects of the project, as detailed below.
All source code can be found on GitHub. It may be freely copied, modified, and
redistributed, as detailed in the appendix entitled “GNU Free Documentation Li-
cense.”
Larry Rolen wrote many of the exercises.
Variants of this textbook
• The version for Math 1553 is fine-tuned to contain only the material covered
in Math 1553 at Georgia Tech.
The section numbering is consistent across versions. This explains why Section
6.3 does not exist in the Math 1553 version, for example.
You are currently viewing the master version.
Contents

1 Overview

5 Determinants
  5.1 Determinants: Definition
  5.2 Cofactor Expansions
  5.3 Determinants and Volumes

7 Orthogonality
  7.1 Dot Products and Orthogonality
  7.2 Orthogonal Complements
  7.3 Orthogonal Projection
  7.4 Orthogonal Sets
  7.5 The Method of Least Squares

B Notation

Index
Chapter 1
Overview
The Subject of This Textbook Before starting with the content of the text, we
first ask the basic question: what is linear algebra?
• Linear: having to do with lines, planes, etc.
• Sometimes the coefficients also contain parameters, like the eigenvalue equa-
tion
    (7 − λ)x +         y +         3z = 0
       −3x + (2 − λ)y −         3z = 0
       −3x −        2y + (−1 − λ)z = 0.
This text is roughly half computational and half conceptual in nature. The
main goal is to present a library of linear algebra tools, and more importantly, to
teach a conceptual framework for understanding which tools should be applied in
a given context.
If Matlab can find the answer faster than you can, then your question is just
an algorithm: this is not real problem solving.
The subtle part of the subject lies in understanding what computation to ask
the computer to do for you—it is far less important to know how to perform com-
putations that a computer can do better than you anyway.
Ax = b or Ax = λx or Ax ≈ b.
Here we present some sample problems in science and engineering that require
linear algebra to solve.
Example (Civil Engineering). The following diagram represents traffic flow around
the town square. The streets are all one way, and the numbers and arrows indi-
cate the number of cars per hour flowing along each street, as measured by sensors
underneath the roads.
[Diagram: one-way streets around the town square. The measured flows on the outer streets are 250, 120, 120, 70, 175, 530, 115, and 390 cars per hour; the unmeasured flows around the square itself are labeled x, y, z, w.]
There are no sensors underneath some of the streets, so we do not know how
much traffic is flowing around the square itself. What are the values of x, y, z, w?
Since the number of cars entering each intersection has to equal the number of
cars leaving that intersection, we obtain a system of linear equations:
w + 120 = x + 250
x + 120 = y + 70
y + 530 = z + 390
z + 115 = w + 175.
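As an aside (not part of the original text), one way to see what this system determines is to hand it to a computer algebra system. The sketch below uses SymPy; the variable names simply mirror the diagram, and the reported output shows that the system is consistent but leaves one flow free.

    from sympy import symbols, Eq, solve

    x, y, z, w = symbols('x y z w')
    equations = [
        Eq(w + 120, x + 250),   # cars entering = cars leaving at each intersection
        Eq(x + 120, y + 70),
        Eq(y + 530, z + 390),
        Eq(z + 115, w + 175),
    ]
    print(solve(equations, [x, y, z, w]))
    # something like {x: w - 130, y: w - 80, z: w + 60}:
    # the traffic around the square is determined only up to the one free flow w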
x C2 H6 + y O2 → z CO2 + w H2 O
What ratio of the molecules is needed to sustain the reaction? The following
three equations come from the fact that the number of atoms of carbon, hydro-
gen, and oxygen on the left side has to equal the number of atoms on the right,
respectively:
2x = z
6x = 2w
2 y = 2z + w.
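A hedged aside, not from the text: the same balancing problem can be given to SymPy. The smallest whole-number solution it leads to is the familiar 2 C2H6 + 7 O2 → 4 CO2 + 6 H2O.

    from sympy import symbols, Eq, solve

    x, y, z, w = symbols('x y z w', positive=True)
    eqs = [Eq(2*x, z), Eq(6*x, 2*w), Eq(2*y, 2*z + w)]   # carbon, hydrogen, oxygen
    print(solve(eqs, [y, z, w]))
    # {w: 3*x, y: 7*x/2, z: 2*x}; taking x = 2 gives whole numbers x, y, z, w = 2, 7, 4, 6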
4. rabbits produce 0, 6, 8 baby rabbits in their first, second, and third years,
respectively.
If you know the rabbit population in 2016 (in terms of the number of first, sec-
ond, and third year rabbits), then what is the population in 2017? The rules for
reproduction lead to the following system of equations, where x, y, z represent the
number of newborn, first-year, and second-year rabbits, respectively:
A common question is: what is the asymptotic behavior of this system? What will
the rabbit population look like in 100 years? This turns out to be an eigenvalue
problem.
Left: the population of rabbits in a given year. Right: the proportions of rabbits in
that year. Choose any values you like for the starting population, and click “Advance
1 year” several times. What do you notice about the long-term behavior of the ratios?
This phenomenon turns out to be due to eigenvectors.
(0, 2), (2, 1), (1, −1), (−1, −2), (−3, 1), (−1, −1).
Its orbit around the sun is elliptical; it is described by an equation of the form
    x² + By² + Cxy + Dx + Ey + F = 0.
What is the most likely orbit of the asteroid, given that there was some signifi-
cant error in measuring its position? Substituting the data points into the above
equation yields the system
There is no actual solution to this system due to measurement error, but here is
the best-fitting ellipse:
[Figure: the best-fitting ellipse through the six measured data points.]
Example (Computer Science). Each web page has some measure of importance,
which it shares via outgoing links to other pages. This leads to zillions of equations
in zillions of variables. Larry Page and Sergey Brin realized that this is a linear
algebra problem at its core, and used the insight to found Google. We will discuss
this example in detail in Section 6.6.
How to Use This Textbook There are a number of different categories of ideas
that are contained in most sections. They are listed at the top of the section, under
Objectives, for easy review. We classify them as follows.
• Recipes: these are algorithms that are generally straightforward (if some-
times tedious), and are usually done by computer in real life. They are
nonetheless important to learn and to practice.
• Essential vocabulary words: these vocabulary words are essential in that they
form the essence of the subject of linear algebra. For instance, if you do not
know the definition of an eigenvector, then by definition you cannot claim
to understand linear algebra.
• Theorems: these describe in a precise way how the objects of interest relate
to each other. Knowing which recipe to use in a given situation generally
means recognizing which vocabulary words to use to describe the situation,
and understanding which theorems apply to that problem.
Finally, we remark that there are over 140 interactive demos contained in the
text, which were created to illustrate the geometry of the topic. Click the “view in
a new window” link, and play around with them! You will need a modern browser.
Internet Explorer is not a modern browser; try Safari, Chrome, or Firefox. Here is
a demo from Section 7.5:
Feedback Every page of the online version has a link on the bottom for providing
feedback. This will take you to the GitHub Issues page for this book. It requires a
Georgia Tech login to access.
Chapter 2

Systems of Linear Equations: Algebra
In Section 2.1, we will introduce systems of linear equations, the class of equa-
tions whose study forms the subject of linear algebra. In Section 2.2, we will present
a procedure, called row reduction, for finding all solutions of a system of linear
equations. In Section 2.3, you will see how to express all solutions of a system
of linear equations in a unique way using the parametric form of the general solu-
tion.
2.1 Systems of Linear Equations

Objectives
During the first half of this textbook, we will be primarily concerned with un-
derstanding the solutions of systems of linear equations.
Definition. An equation in the unknowns x, y, z, . . . is called linear if both sides
of the equation are a sum of (constant) multiples of x, y, z, . . ., plus an optional
constant.
For instance,
3x + 4 y = 2z
−x − z = 100
are linear equations, but
3x + yz = 3
sin(x) − cos( y) = 2
are not.
We will usually move the unknowns to the left side of the equation, and move
the constants to the right.
A system of linear equations is a collection of several linear equations, like
    x + 2y + 3z = 6
    2x − 3y + 2z = 14        (2.1.1)
    3x + y − z = −2.
A system of linear equations need not have a solution. For example, there do
not exist numbers x and y making the following two equations true simultane-
ously:
    x + 2y = 3
    x + 2y = −3.
In this case, the solution set is empty. As this is a rather important property of a
system of equations, it has its own name.
Definition. A system of equations is called inconsistent if it has no solutions. It
is called consistent otherwise.
A solution of a system of equations in n variables is a list of n numbers. For
example, (x, y, z) = (1, −2, 3) is a solution of (2.1.1). As we will be studying
solutions of systems of equations throughout this text, now is a good time to fix
our notions regarding lists of numbers.
In other words, Rn is just the set of all (ordered) lists of n real numbers. We
will draw pictures of Rn in a moment, but keep in mind that this is the definition.
For example, (0, 3/2, −π) and (1, −2, 3) are points of R3 .
[Figures: the number line R; the points (1, 2) and (0, −3) in the plane R2.]
A point in 3-space, and its coordinates. Click and drag the point, or move the sliders.
Example (Color Space). All colors you can see can be described by three quanti-
ties: the amount of red, green, and blue light in that color. (Humans are trichro-
matic.) Therefore, we can use the points of R3 to label all colors: for instance, the
point (.2, .4, .9) labels the color with 20% red, 40% green, and 90% blue intensity.
[Figure: the cube of colors in R3, with red, green, and blue coordinate axes.]
Example (Traffic Flow). In Chapter 1, we could have used R4 to label the amount
of traffic (x, y, z, w) passing through four streets. In other words, if there are
10, 5, 3, 11 cars per hour passing through roads x, y, z, w, respectively, then this
can be recorded by the point (10, 5, 3, 11) in R4 . This is useful from a psychologi-
cal standpoint: instead of having four numbers, we are now dealing with just one
piece of data.
Definition (Lines). For our purposes, a line is a ray that is straight and infinite in
both directions.
Two Equations in Two Variables. Now consider the system of two linear equa-
tions
    x − 3y = −3
    2x + y = 8.
A solution to the system of both equations is a pair of numbers (x, y) that makes
both equations true at once. In other words, it is a point that lies on both lines
simultaneously. We can see in the picture above that there is only one point where
the lines intersect: therefore, this system has exactly one solution. (This solution
is (3, 2), as the reader can verify.)
Usually, two lines in the plane will intersect in one point, but of course this is
not always the case. Consider now the system of equations
    x − 3y = −3
    x − 3y = 3.
These define parallel lines in the plane.
The fact that the lines do not intersect means that the system of equations
has no solution. Of course, this is easy to see algebraically: if x − 3y = −3, then
it cannot also be the case that x − 3y = 3.
There is one more possibility. Consider the system of equations
    x − 3y = −3
    2x − 6y = −6.
The second equation is a multiple of the first, so these equations define the same
line in the plane.
In this case, there are infinitely many solutions of the system of equations.
Two Equations in Three Variables. Consider the system of two linear equations
    x + y + z = 1
    x − z = 0.
Each equation individually defines a plane in space. The solutions of the system
of both equations are the points that lie on both planes. We can see in the picture
below that the planes intersect in a line. In particular, this system has infinitely
many solutions.
This means that every point on the line has the form (t, 1− t) for some real number
t. In this case, we call t a parameter, as it parameterizes the points on the line.
[Figure: points on the line labeled by the parameter values t = −1, t = 0, and t = 1.]
Note that in each case, the parameter t allows us to use R to label the points
on the line. However, neither line is the same as the number line R: indeed, every
point on the first line has two coordinates, like the point (0, 1), and every point
on the second line has three coordinates, like (0, 1, 0).
In this case, we need two parameters t and w to describe all points on the plane.
Note that the parameters t, w allow us to use R2 to label the points on the plane.
However, this plane is not the same as the plane R2 : indeed, every point on this
plane has three coordinates, like the point (0, 0, 1).
2.2 Row Reduction

Objectives
4. Learn which row reduced matrices come from inconsistent linear systems.
    x + 2y + 3z = 6                              −3x − 6y − 9z = −18
    2x − 3y + 2z = 14    multiply 1st by −3 →     2x − 3y + 2z = 14
    3x + y − z = −2                                3x + y − z = −2

    x + 2y + 3z = 6                               x + 2y + 3z = 6
    2x − 3y + 2z = 14    2nd = 2nd − 2×1st →          −7y − 4z = 2
    3x + y − z = −2                               3x + y − z = −2

    x + 2y + 3z = 6                               3x + y − z = −2
    2x − 3y + 2z = 14    3rd ←→ 1st →             2x − 3y + 2z = 14
    3x + y − z = −2                               x + 2y + 3z = 6

    x + 2y + 3z = 6                               x + 2y + 3z = 6
    2x − 3y + 2z = 14    2nd = 2nd − 2×1st →          −7y − 4z = 2
    3x + y − z = −2                               3x + y − z = −2

                         3rd = 3rd − 3×1st →      x + 2y + 3z = 6
                                                      −7y − 4z = 2
                                                    −5y − 10z = −20

                         2nd ←→ 3rd →             x + 2y + 3z = 6
                                                    −5y − 10z = −20
                                                      −7y − 4z = 2

                         divide 2nd by −5 →       x + 2y + 3z = 6
                                                       y + 2z = 4
                                                      −7y − 4z = 2

                         3rd = 3rd + 7×2nd →      x + 2y + 3z = 6
                                                       y + 2z = 4
                                                          10z = 30
At this point we’ve eliminated both x and y from the third equation, and we can
solve 10z = 30 to get z = 3. Substituting for z in the second equation gives
y + 2 · 3 = 4, or y = −2. Substituting for y and z in the first equation gives
x + 2 · (−2) + 3 · 3 = 6, or x = 1. Thus the only solution is (x, y, z) = (1, −2, 3).
We can check that our solution is correct by substituting (x, y, z) = (1, −2, 3)
into the original equation:
    x + 2y + 3z = 6                      1 + 2 · (−2) + 3 · 3 = 6
    2x − 3y + 2z = 14    substitute →    2 · 1 − 3 · (−2) + 2 · 3 = 14
    3x + y − z = −2                      3 · 1 + (−2) − 3 = −2.
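The same check can be done numerically. The following sketch is not from the text and assumes NumPy is available; it multiplies the coefficient matrix by the solution and also lets the computer solve the system directly.

    import numpy as np

    A = np.array([[1, 2, 3], [2, -3, 2], [3, 1, -1]])   # coefficient matrix
    b = np.array([6, 14, -2])                           # right-hand side
    x = np.array([1, -2, 3])                            # the solution found above

    print(A @ x)                   # [ 6 14 -2]  -- reproduces b
    print(np.linalg.solve(A, b))   # [ 1. -2.  3.]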
This is called an augmented matrix. The word “augmented” refers to the vertical
line, which we draw to remind ourselves where the equals sign belongs; a matrix
is a grid of numbers without the vertical line. In this notation, our three valid ways
of manipulating our equations become row operations:
Here the notation R1 simply means “the first row”, and likewise for R2 , R3 ,
etc.
Eliminating a variable from an equation means producing a zero to the left of the
line in an augmented matrix. First we produce zeros in the first column (i.e. we
eliminate x) by subtracting multiples of the first row.
    [ 1  2  3 |  6 ]                     [ 1  2  3 |  6 ]
    [ 2 −3  2 | 14 ]   R2 = R2 − 2R1 →   [ 0 −7 −4 |  2 ]
    [ 3  1 −1 | −2 ]                     [ 3  1 −1 | −2 ]

                       R3 = R3 − 3R1 →   [ 1  2   3 |   6 ]
                                         [ 0 −7  −4 |   2 ]
                                         [ 0 −5 −10 | −20 ]
This was made much easier by the fact that the top-left entry is equal to 1, so we
can simply multiply the first row by the number below and subtract. In order to
eliminate y in the same way, we would like to produce a 1 in the second column.
We could divide the second row by −7, but this would produce fractions; instead,
let’s divide the third by −5.
    [ 1  2   3 |   6 ]                    [ 1  2   3 |  6 ]
    [ 0 −7  −4 |   2 ]   R3 = R3 ÷ −5 →   [ 0 −7  −4 |  2 ]
    [ 0 −5 −10 | −20 ]                    [ 0  1   2 |  4 ]

                         R2 ←→ R3 →       [ 1  2   3 |  6 ]
                                          [ 0  1   2 |  4 ]
                                          [ 0 −7  −4 |  2 ]

                         R3 = R3 + 7R2 →  [ 1  2   3 |  6 ]
                                          [ 0  1   2 |  4 ]
                                          [ 0  0  10 | 30 ]

                         R3 = R3 ÷ 10 →   [ 1  2   3 |  6 ]
                                          [ 0  1   2 |  4 ]
                                          [ 0  0   1 |  3 ]
We swapped the second and third row just to keep things orderly. Now we translate
this augmented matrix back into a system of equations:
    [ 1 2 3 | 6 ]                  x + 2y + 3z = 6
    [ 0 1 2 | 4 ]     becomes →        y + 2z = 4
    [ 0 0 1 | 3 ]                           z = 3
Hence z = 3; back-substituting as in this example gives (x, y, z) = (1, −2, 3).
The process of doing row operations to a matrix does not change the solution
set of the corresponding linear equations!
Indeed, the whole point of doing these operations is to solve the equations using
the elimination method.
Definition. Two matrices are called row equivalent if one can be obtained from
the other by doing some number of row operations.
So the linear equations of row-equivalent matrices have the same solution set.
Example (An Inconsistent System). Solve the following system of equations using
row operations:
    x + y = 2
    3x + 4y = 5
    4x + 5y = 9
Solution. First we put our system of equations into an augmented matrix.
    x + y = 2                              [ 1 1 | 2 ]
    3x + 4y = 5      augmented matrix →    [ 3 4 | 5 ]
    4x + 5y = 9                            [ 4 5 | 9 ]
2. The first nonzero entry of a row is to the right of the first nonzero entry of
the row above.
3. Below the first nonzero entry of a row, all entries are zero.
Definition. A pivot is the first nonzero entry of a row of a matrix in row echelon
form.
    [ 1 0 ? 0 ? ]
    [ 0 1 ? 0 ? ]        ? = any number
    [ 0 0 0 1 ? ]        1 = pivot
    [ 0 0 0 0 0 ]
A matrix in reduced row echelon form is in some sense completely solved. For
example,
    [ 1 0 0 |  1 ]                 x = 1
    [ 0 1 0 | −2 ]    becomes →    y = −2
    [ 0 0 1 |  3 ]                 z = 3.
Example. The following matrices are in reduced row echelon form:
    [ 1 0  2 ]      ( 0 1 8 0 )      [ 1 17 0 ]      [ 0 0 0 ]
    [ 0 1 −1 ]                       [ 0  0 1 ]      [ 0 0 0 ]
The following matrices are in row echelon form but not reduced row echelon form:
    [ 2 7 1 4 ]      [ 2 1 ]      [ 1 17 0 ]      [ 2 1 3 ]
    [ 0 0 2 1 ]      [ 0 1 ]      [ 0  1 1 ]      [ 0 0 0 ]
    [ 0 0 0 3 ]
This assumes, of course, that you only do the three legal row operations, and
you don’t make any arithmetic errors.
We will not prove uniqueness, but maybe you can!
Step 1a: Swap the 1st row with a lower one so a leftmost nonzero entry is in
the 1st row (if necessary).
Step 1b: Scale the 1st row so that its first nonzero entry is equal to 1.
Step 1c: Use row replacement so all entries below this 1 are 0.
Step 2a: Swap the 2nd row with a lower one so that the leftmost nonzero entry
is in the 2nd row.
Step 2b: Scale the 2nd row so that its first nonzero entry is equal to 1.
Step 2c: Use row replacement so all entries below this 1 are 0.
Step 3a: Swap the 3rd row with a lower one so that the leftmost nonzero entry
is in the 3rd row.
etc.
Last Step: Use row replacement to clear all entries above the pivots, starting
with the last pivot.
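For readers who like to see the algorithm spelled out in code, here is a minimal sketch in Python (not part of the text). It uses exact fractions to avoid rounding, and it clears above and below each pivot as it goes rather than saving the upward pass for the last step; the resulting reduced row echelon form is the same.

    from fractions import Fraction

    def rref(rows):
        """Row reduce a matrix (a list of rows) to reduced row echelon form."""
        A = [[Fraction(entry) for entry in row] for row in rows]   # exact arithmetic
        m, n = len(A), len(A[0])
        pivot_row = 0
        for col in range(n):
            # Step a: find a row at or below pivot_row with a nonzero entry in this column.
            nonzero = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
            if nonzero is None:
                continue                                            # no pivot in this column
            A[pivot_row], A[nonzero] = A[nonzero], A[pivot_row]     # swap
            pivot = A[pivot_row][col]
            A[pivot_row] = [entry / pivot for entry in A[pivot_row]]  # scale the pivot to 1
            for r in range(m):                                      # row replacement: clear the column
                if r != pivot_row and A[r][col] != 0:
                    factor = A[r][col]
                    A[r] = [a - factor * b for a, b in zip(A[r], A[pivot_row])]
            pivot_row += 1
        return A

    for row in rref([[0, -7, -4, 2], [2, 4, 6, 12], [3, 1, -1, -2]]):
        print([str(entry) for entry in row])
    # ['1', '0', '0', '1']
    # ['0', '1', '0', '-2']
    # ['0', '0', '1', '3']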
    [ 0 −7 −4 |  2 ]
    [ 2  4  6 | 12 ]
    [ 3  1 −1 | −2 ]

Solution.

    [ 0 −7 −4 |  2 ]                  [ 2  4  6 | 12 ]
    [ 2  4  6 | 12 ]    R1 ←→ R2 →    [ 0 −7 −4 |  2 ]    (Step 1a: swap rows so the top-left entry is nonzero)
    [ 3  1 −1 | −2 ]                  [ 3  1 −1 | −2 ]

                      R1 = R1 ÷ 2 →   [ 1  2  3 |  6 ]    (Step 1b: scale to make the first pivot equal to 1)
                                      [ 0 −7 −4 |  2 ]
                                      [ 3  1 −1 | −2 ]

                    R3 = R3 − 3R1 →   [ 1  2   3 |   6 ]  (Step 1c: subtract a multiple of the first row to clear below the pivot)
                                      [ 0 −7  −4 |   2 ]
                                      [ 0 −5 −10 | −20 ]

                        R2 ←→ R3 →    [ 1  2   3 |   6 ]  (Step 2a; this optional swap keeps Step 2b from creating fractions)
                                      [ 0 −5 −10 | −20 ]
                                      [ 0 −7  −4 |   2 ]

                     R2 = R2 ÷ −5 →   [ 1  2  3 |  6 ]    (Step 2b: scale to make the second pivot equal to 1)
                                      [ 0  1  2 |  4 ]
                                      [ 0 −7 −4 |  2 ]

                    R3 = R3 + 7R2 →   [ 1  2  3 |  6 ]    (Step 2c: add 7 times the second row to clear below the pivot)
                                      [ 0  1  2 |  4 ]
                                      [ 0  0 10 | 30 ]

                     R3 = R3 ÷ 10 →   [ 1  2  3 |  6 ]    (Step 3a is not needed; Step 3b: scale to make the third pivot equal to 1)
                                      [ 0  1  2 |  4 ]
                                      [ 0  0  1 |  3 ]

                    R2 = R2 − 2R3 →   [ 1  2  3 |  6 ]    (Last step: clear the entries above the third pivot)
                                      [ 0  1  0 | −2 ]
                                      [ 0  0  1 |  3 ]

                    R1 = R1 − 3R3 →   [ 1  2  0 | −3 ]
                                      [ 0  1  0 | −2 ]
                                      [ 0  0  1 |  3 ]

                    R1 = R1 − 2R2 →   [ 1  0  0 |  1 ]    (Last step: add −2 times the second row to clear above the second pivot)
                                      [ 0  1  0 | −2 ]
                                      [ 0  0  1 |  3 ]
The reduced row echelon form of the matrix tells us that the only solution is
(x, y, z) = (1, −2, 3).
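As a quick cross-check (again, not part of the original text), a computer algebra system reaches the same reduced row echelon form; the method name below is SymPy's.

    from sympy import Matrix

    M = Matrix([[0, -7, -4, 2], [2, 4, 6, 12], [3, 1, -1, -2]])
    R, pivot_columns = M.rref()
    print(R)              # Matrix([[1, 0, 0, 1], [0, 1, 0, -2], [0, 0, 1, 3]])
    print(pivot_columns)  # (0, 1, 2)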
[Schematic: the row reduction algorithm alternates between getting a 1 in the next pivot position and clearing the entries below it, moving down and to the right through the matrix.]
    2x + 10y = −1
    3x + 15y = 2
Solution.
    [ 2 10 | −1 ]   R1 = R1 ÷ 2 →        [ 1  5 | −1/2 ]    (Step 1b)
    [ 3 15 |  2 ]                        [ 3 15 |    2 ]

                    R2 = R2 − 3R1 →      [ 1  5 | −1/2 ]    (Step 1c)
                                         [ 0  0 |  7/2 ]

                    R2 = R2 × 2/7 →      [ 1  5 | −1/2 ]    (Step 2b)
                                         [ 0  0 |    1 ]

                    R1 = R1 + (1/2)R2 →  [ 1  5 |    0 ]    (Step 2c)
                                         [ 0  0 |    1 ]
    x + 5y = 0
         0 = 1.
In the above example, we saw how to recognize the reduced row echelon form
of a consistent system.
In other words, the row reduced matrix of an inconsistent system looks like
this:
    [ 1 0 ? ? | 0 ]
    [ 0 1 ? ? | 0 ]
    [ 0 0 0 0 | 1 ]
1. When the reduced row echelon form of a matrix has a pivot in every non-
augmented column, then it corresponds to a system with a unique solution:
    [ 1 0 0 |  1 ]                       x = 1
    [ 0 1 0 | −2 ]    translates to →    y = −2
    [ 0 0 1 |  3 ]                       z = 3.
2. When the reduced row echelon form of a matrix has a pivot in the last (augmented) column, then it corresponds to a system with no solutions:

    [ 1 5 | 0 ]    translates to →    x + 5y = 0
    [ 0 0 | 1 ]                            0 = 1.
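In code, the same diagnosis amounts to row reducing the augmented matrix and asking whether the last column is a pivot column. A small SymPy sketch (illustrative only), using the inconsistent system above:

    from sympy import Matrix

    augmented = Matrix([[2, 10, -1], [3, 15, 2]])
    R, pivot_columns = augmented.rref()
    print(R)              # Matrix([[1, 5, 0], [0, 0, 1]])
    print(pivot_columns)  # (0, 2): the last (augmented) column is a pivot column,
                          # so the system is inconsistent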
What happens when one of the non-augmented columns lacks a pivot? This is the
subject of Section 2.3.
    2x + y + 12z = 1
    x + 2y + 9z = −1
Solution.
    [ 2 1 12 |  1 ]    R1 ←→ R2 →       [ 1 2  9 | −1 ]    (Optional)
    [ 1 2  9 | −1 ]                     [ 2 1 12 |  1 ]

                       R2 = R2 − 2R1 →  [ 1  2  9 | −1 ]   (Step 1c)
                                        [ 0 −3 −6 |  3 ]

                       R2 = R2 ÷ −3 →   [ 1 2 9 | −1 ]     (Step 2b)
                                        [ 0 1 2 | −1 ]

                       R1 = R1 − 2R2 →  [ 1 0 5 |  1 ]     (Step 2c)
                                        [ 0 1 2 | −1 ]
    x + 5z = 1
    y + 2z = −1.
2.3 Parametric Form

Objectives

Consider the linear system

    2x + y + 12z = 1
    x + 2y + 9z = −1.
We solve it using row reduction:
    [ 2 1 12 |  1 ]    R1 ←→ R2 →       [ 1 2  9 | −1 ]    (Optional)
    [ 1 2  9 | −1 ]                     [ 2 1 12 |  1 ]

                       R2 = R2 − 2R1 →  [ 1  2  9 | −1 ]   (Step 1c)
                                        [ 0 −3 −6 |  3 ]

                       R2 = R2 ÷ −3 →   [ 1 2 9 | −1 ]     (Step 2b)
                                        [ 0 1 2 | −1 ]

                       R1 = R1 − 2R2 →  [ 1 0 5 |  1 ]     (Step 2c)
                                        [ 0 1 2 | −1 ]
This row reduced matrix corresponds to the linear system
    x + 5z = 1
    y + 2z = −1.
In what sense is the system solved? We rewrite as
    x = 1 − 5z
    y = −1 − 2z.
For any value of z, there is exactly one value of x and y that make the equations
true. But we are free to choose any value of z.
We have found all solutions: it is the set of all values x, y, z, where
    x = 1 − 5z
    y = −1 − 2z        (z any real number).
    z = z
For instance, setting z = 0 gives the solution (x, y, z) = (1, −1, 0), and setting
z = 1 gives the solution (x, y, z) = (−4, −3, 1).
A picture of the solution set (the yellow line) of the linear system in this example.
There is a unique solution for every value of z; move the slider to change z.
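One can also verify the parametric form numerically: every choice of the free variable z should produce a solution of the original system. A short NumPy sketch, not from the text:

    import numpy as np

    A = np.array([[2., 1., 12.], [1., 2., 9.]])
    b = np.array([1., -1.])

    for z in [0.0, 1.0, -2.5]:
        x = np.array([1 - 5*z, -1 - 2*z, z])   # the parametric form of the solution
        print(z, np.allclose(A @ x, b))        # True for every value of z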
In the above example, the variable z was free because the reduced row echelon
form matrix was
    [ 1 0 5 |  1 ]
    [ 0 1 2 | −1 ].
In the matrix
    [ 1 ? 0 ? | ? ]
    [ 0 0 1 ? | ? ],
the free variables are x 2 and x 4 . (The augmented column is not free because it
does not correspond to a variable.)
Recipe: Parametric form. The parametric form of the solution set of a con-
sistent system of linear equations is obtained as follows.
4. Move all free variables to the right hand side of the equations.
Moving the free variables to the right hand side of the equations amounts to
solving for the non-free variables (the ones that come from columns with pivots) in
terms of the free variables. One can think of the free variables as being independent
variables, and the non-free variables being dependent.
Implicit Versus Parameterized Equations. The solution set of the system of linear
equations
    2x + y + 12z = 1
    x + 2y + 9z = −1
is a line in R3 , as we saw in this example. These equations are called the implicit
equations for the line: the line is defined implicitly as the simultaneous solutions
to those two equations.
The parametric form
    x = 1 − 5z
    y = −1 − 2z
can be written as follows:

    (x, y, z) = (1 − 5z, −1 − 2z, z).

This is called a parameterized equation for the same line. It is an expression that
produces all points of the line in terms of one parameter, z.
One should think of a system of equations as being an implicit equation for
its solution set, and of the parametric form as being the parameterized equation
for the same set. The parametric form is much more explicit: it gives a concrete
recipe for producing all solutions.
You can choose any value for the free variables in a (consistent) linear system.
Free variables come from the columns without pivots in a matrix in row echelon
form.
Example. Suppose that the reduced row echelon form of the matrix for a linear
system in four variables x 1 , x 2 , x 3 , x 4 is
    [ 1 0 0 3 |  2 ]
    [ 0 0 1 4 | −1 ].
The free variables are x 2 and x 4 : they are the ones whose columns are not pivot
columns.
This translates into the system of equations
    x1 + 3x4 = 2        parametric form →    x1 = 2 − 3x4
    x3 + 4x4 = −1                            x3 = −1 − 4x4 .
(x 1 , x 2 , x 3 , x 4 ) = (2 − 3x 4 , x 2 , −1 − 4x 4 , x 4 ),
for any values of x 2 and x 4 . For instance, (2, 0, −1, 0) is a solution (with x 2 = x 4 =
0), and (5, 1, 3, −1) is a solution (with x 2 = 1, x 4 = −1).
x + y +z =1
which is already in reduced row echelon form. The free variables are y and z. The
parametric form for the general solution is
(x, y, z) = (1 − y − z, y, z)
for any values of y and z. This is the parametric equation for a plane in R3 .
A plane described by two parameters y and z. Any point on the plane is obtained by
substituting suitable values for y and z.
1. The last column is a pivot column. In this case, the system is inconsistent.
There are zero solutions, i.e., the solution set is empty. For example, the
matrix
    [ 1 0 | 0 ]
    [ 0 1 | 0 ]
    [ 0 0 | 1 ]
comes from a linear system with no solutions.
2. Every column except the last column is a pivot column. In this case, the
system has a unique solution. For example, the matrix
    [ 1 0 0 | a ]
    [ 0 1 0 | b ]
    [ 0 0 1 | c ]
3. The last column is not a pivot column, and some other column is not a
pivot column either. In this case, the system has infinitely many solutions,
corresponding to the infinitely many possible values of the free variable(s).
For example, in the system corresponding to the matrix
    [ 1 −2 0 3 |  1 ]
    [ 0  0 1 4 | −1 ],
Chapter 3

Systems of Linear Equations: Geometry

Primary Goals.
These objects are related in a beautiful way by the rank theorem in Section 3.9.
We will develop a large amount of vocabulary that we will use to describe
the above objects: vectors (Section 3.1), spans (Section 3.2), linear independence
(Section 3.5), subspaces (Section 3.6), dimension (Section 3.7), coordinate sys-
tems (Section 3.8), etc. We will use these concepts to give a precise geometric
description of the solution set of any system of equations (Section 3.4). We will
also learn how to express systems of equations more simply using matrix equations
(Section 3.3).
3.1 Vectors
Objectives
3.1.1 Vectors in Rn
We have been drawing points of Rn as dots in the line, plane, space, etc. We can
also draw them as arrows. Since we have two geometric interpretations in mind,
in what follows we will call an ordered list of n real numbers an element of Rn .
Points and Vectors. A point is an element of Rn , drawn as a point (a dot).
[Figure: the vector (1, 3), drawn as an arrow from the origin to the point (1, 3).]
The difference is purely psychological: points and vectors are both just lists of
numbers.
Interactive: A vector in R3 , by coordinates.
A vector in R3 , and its coordinates. Drag the arrow head and tail.
Unless otherwise specified, we will assume that all vectors start at the origin.
Vectors make sense in the real world: many physical quantities, such as velocity,
are represented as vectors. But it makes more sense to think of the velocity
of a car as being located at the car.
Remark. Some authors use boldface letters to represent vectors, as in “v”, or use
arrows, as in “v⃗”. As it is usually clear from context if a letter represents a vector,
we do not decorate vectors in this way.
Note. Another way to think about a vector is as a difference between two points,
or the arrow from one point to another. For instance, (1, 2) is the arrow from (1, 1)
to (2, 3).

[Figure: the vector (1, 2) drawn as an arrow from the point (1, 1) to the point (2, 3).]
Addition and scalar multiplication work in the same way for vectors of length n.
Example.
    (1, 2, 3) + (4, 5, 6) = (5, 7, 9)        and        −2 (1, 2, 3) = (−2, −4, −6).
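These coordinate-by-coordinate rules are exactly how array libraries add and scale vectors. A tiny NumPy illustration, not part of the text:

    import numpy as np

    v = np.array([1, 2, 3])
    w = np.array([4, 5, 6])
    print(v + w)    # [5 7 9]
    print(-2 * v)   # [-2 -4 -6]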
The Parallelogram Law for Vector Addition Geometrically, the sum of two vec-
tors v, w is obtained as follows: place the tail of w at the head of v. Then v + w is
the vector whose tail is the tail of v and whose head is the head of w. Doing this
both ways creates a parallelogram. For example,
    (1, 3) + (4, 2) = (5, 5).
Why? The width of v + w is the sum of the widths, and likewise with the
heights.
[Figure: the parallelogram law; the coordinates of v + w = (5, 5) are the sums 1 + 4 and 3 + 2 of the corresponding coordinates of v and w.]

The parallelogram law for vector addition. Click and drag the heads of v and w.
Scalar Multiplication A scalar multiple of a vector v has the same (or opposite)
direction, but a different length. For instance, 2v is the vector in the direction of
v but twice as long, and −(1/2)v is the vector in the opposite direction of v, but half
as long. Note that the set of all scalar multiples of a (nonzero) vector v is a line.
[Figure: the scalar multiples 2v, 0v, and −(1/2)v of a vector v, all lying on the same line.]
[Figure: the linear combinations v1 + v2 , v1 − v2 , 2v1 + 0v2 , 2v2 , and −v1 of two vectors v1 , v2 in the plane.]
The locations of these points are found using the parallelogram law for vector
addition. Any vector on the plane is a linear combination of v1 and v2 , with suitable
coefficients.
Linear combinations of two vectors in R2 : move the sliders to change the coefficients of
v1 and v2 . Note that any vector on the plane can be obtained as a linear combination
of v1 , v2 with suitable coefficients.
Linear combinations of three vectors: move the sliders to change the coefficients of
v1 , v2 , v3 . Note how the parallelogram law for addition of three vectors is more of a
“parallelepiped law”.
    v = (1, 2),     (3/2)v = (3/2, 3),     −(1/2)v = (−1/2, −1),     . . .
The set of all linear combinations is the line through v. (Unless v = 0, in which
case any scalar multiple of v is again 0.)
Example (Linear Combinations of Collinear Vectors). The set of all linear combi-
nations of the vectors
    v1 = (2, 2)        and        v2 = (−1, −1)
is the line containing both vectors.
The difference between this and a previous example is that both vectors lie on
the same line. Hence any scalar multiples of v1 , v2 lie on that line, as does their
sum.
Interactive: Linear combinations of two collinear vectors.
Linear combinations of two collinear vectors in R2 . Move the sliders to change the
coefficients of v1 , v2 . Note that there is no way to “escape” the line.
3.2 Vector Equations and Spans
Objectives
simplifies to
    (x, 2x, 6x) + (−y, −2y, −y) = (8, 16, 3)        or        (x − y, 2x − 2y, 6x − y) = (8, 16, 3).
For two vectors to be equal, all of their coordinates must be equal, so this is just
the system of linear equations
    x − y = 8
    2x − 2y = 16        (3.2.2)
    6x − y = 3.
in the unknowns x, y? This vector equation translates into the system of linear
equations
    x − y = 8
    2x − 2y = 16
    6x − y = 3.
We form an augmented matrix and row reduce:
    [ 1 −1 |  8 ]             [ 1 0 | −1 ]
    [ 2 −2 | 16 ]    RREF →   [ 0 1 | −9 ]
    [ 6 −1 |  3 ]             [ 0 0 |  0 ].
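For the curious, the same row reduction can be reproduced with SymPy (an aside, not part of the text); it confirms that the vector equation has the unique solution x = −1, y = −9.

    from sympy import Matrix

    augmented = Matrix([[1, -1, 8], [2, -2, 16], [6, -1, 3]])
    R, pivot_columns = augmented.rref()
    print(R)              # Matrix([[1, 0, -1], [0, 1, -9], [0, 0, 0]])
    print(pivot_columns)  # (0, 1): no pivot in the augmented column, so the system is consistent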
A picture of the vector equation (3.2.1). Try to solve the equation geometrically by
moving the sliders.
A Picture of a Consistent System. We saw in the above example that the system
of equations (3.2.2) is consistent. Equivalently, this means that the vector equation
(3.2.1) has a solution. Therefore, the figure above is a picture of a consistent system
of equations. Compare this figure below.
and row reducing. Note that the columns of the augmented matrix are the vectors
from the original vector equation, so it is not actually necessary to write the system
of equations: one can go directly from the vector equation to the augmented matrix
by “smooshing the vectors together”.
x 1 v1 + x 2 v2 + · · · + x p vp = b
Now we have (at least) two equivalent ways of thinking about systems of equa-
tions:
1. Augmented matrices.
3.2.2 Spans
It will be important to know what are all linear combinations of a set of vectors
v1 , v2 , . . . , vp in Rn . In other words, we would like to understand the set of all
vectors b in Rn such that the vector equation (in the unknowns x 1 , x 2 , . . . , x p )
x 1 v1 + x 2 v2 + · · · + x p vp = b
The above definition is the first of several essential definitions that we will see
in this textbook. They are essential in that they form the essence of the subject of
linear algebra: learning linear algebra means (in part) learning these definitions.
All of the definitions are important, but it is essential that you learn and understand
the definitions marked as such.
The notation

    Span{v1 , v2 , . . . , vp } = { x1 v1 + x2 v2 + · · · + xp vp | x1 , x2 , . . . , xp in R }

reads as: “the set of all things of the form x 1 v1 + x 2 v2 + · · · + x p vp such that
x 1 , x 2 , . . . , x p are in R.” The vertical line is “such that”; everything to the left of it
is “the set of all things of this form”, and everything to the right is the condition
that those things must satisfy to be in the set. Specifying a set in this way is called
set builder notation.
All mathematical notation is only shorthand: any sequence of symbols must
translate into a usual sentence.
1. The vector equation x1 v1 + x2 v2 + · · · + xp vp = b has a solution.

2. The vector b is in Span{v1 , v2 , . . . , vp }.

3. The linear system with augmented matrix ( v1 v2 · · · vp | b ) is consistent.
Equivalent means that, for any given list of vectors v1 , v2 , . . . , vp , b, either all
three statements are true, or all three statements are false.
This is a picture of an inconsistent linear system: the vector w on the right-hand side
of the equation x 1 v1 + x 2 v2 = w is not in the span of v1 , v2 . Convince yourself of this
by trying to solve the equation x 1 v1 + x 2 v2 = w by moving the sliders, and by row
reduction. Compare this figure.
[Figures: Span{v} (the line through the origin and v) and Span{v, w} in R2.]
Pictures of spans in R2 .
Interactive picture of a span of two vectors in R2 . Check “Show x.v + y.w” and move
the sliders to see how every point in the violet region is in fact a linear combination
of the two vectors.
Pictures of spans in R3 . The span of two noncollinear vectors is the plane containing
the origin and the heads of the vectors. Note that three coplanar (but not collinear)
vectors span a plane and not a 3-space, just as two collinear vectors span a line and
not a plane.
Interactive picture of a span of two vectors in R3 . Check “Show x.v + y.w” and move
the sliders to see how every point in the violet region is in fact a linear combination
of the two vectors.
Interactive picture of a span of three vectors in R3 . Check “Show x.v + y.w + z.u”
and move the sliders to see how every point in the violet region is in fact a linear
combination of the three vectors.
3.3 Matrix Equations
Objectives
1. Understand the equivalence between a system of linear equations, an aug-
mented matrix, a vector equation, and a matrix equation.
When we say “A is an m×n matrix,” we mean that A has m rows and n columns.
Remark. In this book, we do not reserve the letters m and n for the numbers of
rows and columns of a matrix. If we write “A is an n × m matrix”, then n is the
number of rows of A and m is the number of columns.
Definition. Let A be an m × n matrix with columns v1 , v2 , . . . , vn :
| | |
A = v1 v2 · · · vn
| | |
Then the product of A with a vector x in Rn with entries x1 , x2 , . . . , xn is the linear combination

    Ax = x1 v1 + x2 v2 + · · · + xn vn .

This is a vector in Rm .
Example.
    [ 4 5 6 ]
    [ 7 8 9 ] (1, 2, 3) = 1 (4, 7) + 2 (5, 8) + 3 (6, 9) = (32, 50).
In order for Ax to make sense, the number of entries of x has to be the same as
the number of columns of A: we are using the entries of x as the coefficients of the
columns of A in a linear combination. The resulting vector has the same number
of entries as the number of rows of A, since each column of A has that number of
entries.
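A brief numerical aside (not in the original): the product Ax really is the indicated linear combination of the columns of A, as one can confirm with NumPy.

    import numpy as np

    A = np.array([[4, 5, 6], [7, 8, 9]])
    x = np.array([1, 2, 3])

    by_columns = 1*A[:, 0] + 2*A[:, 1] + 3*A[:, 2]   # x1*v1 + x2*v2 + x3*v3
    print(A @ x)                                     # [32 50]
    print(np.array_equal(A @ x, by_columns))         # True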
• A(u + v) = Au + Av
• A(cu) = cAu
x 1 v1 + x 2 v2 + · · · + x n vn = b.
x 1 v1 + x 2 v2 + · · · + x n vn = b,
Four Ways of Writing a Linear System. We now have four equivalent ways of
writing (and thinking about) a system of linear equations:
1. As a system of equations:
    2x1 + 3x2 − 2x3 = 7
    x1 − x2 − 3x3 = 5
2. As an augmented matrix:

    [ 2  3 −2 | 7 ]
    [ 1 −1 −3 | 5 ]

3. As a vector equation:

    x1 (2, 1) + x2 (3, −1) + x3 (−2, −3) = (7, 5)

4. As a matrix equation:

    [ 2  3 −2 ]
    [ 1 −1 −3 ] (x1 , x2 , x3 ) = (7, 5)
We will move back and forth freely between the four ways of writing a linear
system, over and over again, for the rest of the book.
Another Way to Compute Ax The above definition is a useful way of defining the
product of a matrix with a vector when it comes to understanding the relationship
between matrix equations and vector equations. Here we give a definition that is
better-adapted to computations by hand.
Definition. A row vector is a matrix with one row. The product of a row vector
of length n and a (column) vector of length n is
    ( a1 a2 · · · an ) (x1 , x2 , . . . , xn ) = a1 x1 + a2 x2 + · · · + an xn .
This is a scalar.
Example.
    ( 4 5 6 ) (1, 2, 3) = 4 · 1 + 5 · 2 + 6 · 3 = 32.

Computing each entry of Ax in this way gives

    [ 4 5 6 ]
    [ 7 8 9 ] (1, 2, 3) = ( 4 · 1 + 5 · 2 + 6 · 3 ,  7 · 1 + 8 · 2 + 9 · 3 ) = (32, 50).
Then Ax = b has a solution

    ⇐⇒ there exist x1 , x2 , . . . , xn such that A(x1 , x2 , . . . , xn ) = b
    ⇐⇒ there exist x1 , x2 , . . . , xn such that x1 v1 + x2 v2 + · · · + xn vn = b
    ⇐⇒ b is a linear combination of v1 , v2 , . . . , vn
    ⇐⇒ b is in the span of the columns of A.
Example (An Inconsistent System). Let

    A = [  2  1 ]
        [ −1  0 ]
        [  1 −1 ].

Does the equation Ax = (0, 2, 2) have a solution?

Solution. The columns of A are

    v1 = (2, −1, 1)        and        v2 = (1, 0, −1),

and the target vector (on the right-hand side of the equation) is w = (0, 2, 2). The
equation Ax = w is consistent if and only if w is contained in the span of the
columns of A. So we draw a picture:
[Figure: the span of v1 and v2 is a plane, and the vector w does not lie on that plane.]
Let us check our geometric answer by solving the matrix equation using row
reduction. We put the system into an augmented matrix and row reduce:
    [  2  1 | 0 ]             [ 1 0 | 0 ]
    [ −1  0 | 2 ]    RREF →   [ 0 1 | 0 ]
    [  1 −1 | 2 ]             [ 0 0 | 1 ].
The last equation is 0 = 1, so the system is indeed inconsistent, and the matrix
equation
    [  2  1 ]
    [ −1  0 ] x = (0, 2, 2)
    [  1 −1 ]
has no solution.
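Numerically, the same conclusion can be reached by comparing ranks: b is in the span of the columns of A exactly when appending b as an extra column does not increase the rank. A hedged NumPy sketch, not part of the text:

    import numpy as np

    A = np.array([[2., 1.], [-1., 0.], [1., -1.]])
    b = np.array([0., 2., 2.])

    print(np.linalg.matrix_rank(A))                          # 2
    print(np.linalg.matrix_rank(np.column_stack([A, b])))    # 3: b is not in the column span,
                                                             # so Ax = b has no solution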
Example (A Consistent System). Let

    A = [  2  1 ]
        [ −1  0 ]
        [  1 −1 ].

Does the equation Ax = (1, −1, 2) have a solution?
[Figure: the vector w lies on the plane Span{v1 , v2 }.]
Let us check our geometric answer by solving the matrix equation using row
reduction. We put the system into an augmented matrix and row reduce:
    [  2  1 |  1 ]             [ 1 0 |  1 ]
    [ −1  0 | −1 ]    RREF →   [ 0 1 | −1 ]
    [  1 −1 |  2 ]             [ 0 0 |  0 ].
When Solutions Always Exist Building on this note, we have the following cri-
terion for when Ax = b is consistent for every choice of b.
Recall that equivalent means that, for any given matrix A, either all of the
conditions of the above theorem are true, or they are all false.
Be careful when reading the statement of the above theorem. The first two
conditions look very much like this note, but they are logically quite different
because of the quantifier “for all b”.
An example where the criteria of the above theorem are satisfied. The violet region is
the span of the columns v1 , v2 , v3 of A, which is the same as the set of all b such that
Ax = b has a solution. If you drag b, the demo will solve Ax = b for you and move
x.
An example where the criteria of the above theorem are not satisfied. The violet line
is the span of the columns v1 , v2 , v3 of A, which is the same as the set of all b such that
Ax = b has a solution. Try dragging b in and out of the column span.
3.4 Solution Sets

Objectives
2. Understand the difference between the solution set and the column span.
In this section we will study the geometry of the solution set of any matrix
equation Ax = b.
A homogeneous system always has the solution x = 0. This is called the trivial
solution. Any nonzero solution is called nontrivial.
Observation. In the above example, the last column of the augmented matrix
    [ 1  3 4 | 0 ]
    [ 2 −1 2 | 0 ]
    [ 1  0 1 | 0 ]
will be zero throughout the row reduction process. So it is not really necessary to
write augmented matrices when solving homogeneous systems.
Example (The solution set is a line). What is the solution set of Ax = 0, where
    A = [ 1 −3 ]
        [ 2 −6 ] ?

Interactive picture of the solution set of Ax = 0. If you drag x along the line spanned
by (3, 1), the product Ax is always equal to zero. This is what it means for Span{(3, 1)}
to be the solution set of Ax = 0.
Since there were two variables in the above example, the solution set is a subset
of R2 . Since one of the variables was free, the solution set is a line.
Example (The solution set is a plane). What is the solution set of Ax = 0, where
    A = [  1 −1  2 ]
        [ −2  2 −4 ] ?
    x1 = x2 − 2x3
    x2 = x2
    x3 = x3 .

    x = (x1 , x2 , x3 ) = x2 (1, 1, 0) + x3 (−2, 0, 1).
This vector equation is called the parametric vector form of the solution set.
Since x2 and x3 are allowed to be anything, this says that the solution set is the
set of all linear combinations of (1, 1, 0) and (−2, 0, 1). In other words, the solution set
is

    Span{ (1, 1, 0), (−2, 0, 1) }.
Interactive picture of the solution set of Ax = 0. If you drag x along the violet plane,
the product Ax is always equal to zero. This is what it means for the plane to be the
solution set of Ax = 0.
Since there were three variables in the above example, the solution set is a
subset of R3 . Since two of the variables were free, the solution set is a plane.
    x1 − 8x3 − 7x4 = 0
    x2 + 4x3 + 3x4 = 0
    x1 = 8x3 + 7x4
    x2 = −4x3 − 3x4
    x3 = x3
    x4 = x4 .

    x = (x1 , x2 , x3 , x4 ) = x3 (8, −4, 1, 0) + x4 (7, −3, 0, 1).
This vector equation is called the parametric vector form of the solution set.
Since x3 and x4 are allowed to be anything, this says that the solution set is the
set of all linear combinations of (8, −4, 1, 0) and (7, −3, 0, 1). In other words, the
solution set is

    Span{ (8, −4, 1, 0), (7, −3, 0, 1) }.
Since there were four variables in the above example, the solution set is a subset
of R4 . Since two of the variables were free, the solution set is a plane.
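As a sanity check (not part of the text), any linear combination of the two spanning vectors should be sent to zero by the coefficient matrix of the system above. A short NumPy sketch:

    import numpy as np

    A = np.array([[1, 0, -8, -7], [0, 1, 4, 3]])   # coefficient matrix of the system above
    v1 = np.array([8, -4, 1, 0])
    v2 = np.array([7, -3, 0, 1])

    x = 2*v1 - 5*v2        # an arbitrary linear combination of the spanning vectors
    print(A @ x)           # [0 0]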
x = x i vi + x j v j + x k vk + · · ·
Example (The solution set is a line). What is the solution set of Ax = b, where
    A = [ 1 −3 ]        and        b = (−3, −6) ?
        [ 2 −6 ]
This corresponds to the single equation x 1 −3x 2 = −3. We can write the parametric
form as follows:
    x1 = 3x2 − 3
    x2 = x2 + 0.
We turn the above system into a vector equation:
    x = (x1 , x2 ) = x2 (3, 1) + (−3, 0).
This vector equation is called the parametric vector form of the solution set. Since
x2 is allowed to be anything, this says that the solution set is the set of all scalar
multiples of (3, 1), translated by the vector p = (−3, 0). This is a line which contains p
and is parallel to Span{(3, 1)}: it is a translate of a line. We write the solution set as

    Span{ (3, 1) } + (−3, 0).
[Figure: the solution set of Ax = b is the line through p = (−3, 0) parallel to the solution set of Ax = 0.]
Interactive picture of the solution set of Ax = b. If you drag x along the violet line, the
product Ax is always equal to b. This is what it means for the line to be the solution
set of Ax = b.
In the above example, the solution set was all vectors of the form

    x = (x1 , x2 ) = x2 (3, 1) + (−3, 0)

where x2 is any scalar. The vector p = (−3, 0) is also a solution of Ax = b: take
x2 = 0. We call p a particular solution.
Example (The solution set is a plane). What is the solution set of Ax = b, where
    A = [  1 −1  2 ]        and        b = (1, −2) ?
        [ −2  2 −4 ]
This vector equation is called the parametric vector form of the solution set.
Since x2 and x3 are allowed to be anything, this says that the solution set is the set
of all linear combinations of (1, 1, 0) and (−2, 0, 1), translated by the vector p = (1, 0, 0).
This is a plane which contains p and is parallel to Span{(1, 1, 0), (−2, 0, 1)}: it is a
translate of a plane. We write the solution set as

    Span{ (1, 1, 0), (−2, 0, 1) } + (1, 0, 0).
Interactive picture of the solution set of Ax = b. If you drag x along the violet plane,
the product Ax is always equal to b. This is what it means for the plane to be the
solution set of Ax = b.
In the above example, the solution set was all vectors of the form
    x = (x1 , x2 , x3 ) = x2 (1, 1, 0) + x3 (−2, 0, 1) + (1, 0, 0)

where x2 and x3 are any scalars. In this case, a particular solution is p = (1, 0, 0).
In the previous example and the example before it, the parametric vector form
of the solution set of Ax = b was exactly the same as the parametric vector form
of the solution set of Ax = 0 (from this example and this example, respectively),
plus a particular solution.
[Figure: the solution set of Ax = b is a parallel translate of the solution set of Ax = 0 by a particular solution p.]
See the interactive figures in the next subsection for visualizations of the key
observation.
Dimension of the solution set. As in the first subsection, when there is one
free variable in a consistent matrix equation, the solution set is a line—this line
does not pass through the origin when the system is inhomogeneous—when
there are two free variables, the solution set is a plane, etc.
We will develop a rigorous definition of dimension in Section 3.7, but for now
it is important to note that the “dimension” of the solution set of a consistent
system is equal to the number of free variables.
• The solution set: for fixed b, this is the set of all x such that Ax = b.
• The span of the columns of A: this is the set of all b such that Ax = b is
consistent.
Left: the solution set of Ax = b is in violet. Right: the span of the columns of A is
in violet. As you move x, you change b, so the solution set changes—but all solution
sets are parallel planes. If you move b within the span of the columns, the solution
set also changes, and the demo solves the equation to find a particular solution x. If
you move b outside of the span of the columns, the system becomes inconsistent, and
the solution set disappears.
3.5 Linear Independence

Objectives
Sometimes the span of a set of vectors is “smaller” than you expect from the
number of vectors, as in the picture below. This means that (at least) one of the
vectors is redundant: it can be removed without affecting the span. In the present
section, we formalize this idea in the notion of linear (in)dependence.
Pictures of sets of vectors that are linearly dependent. Note that in each case, one
vector is in the span of the others—so it doesn’t make the span bigger.
x 1 v1 + x 2 v2 + · · · + x p vp = 0.
Example. Is the set of vectors {(1, 1, 1), (1, −1, 2), (3, 1, 4)} linearly independent?
Solution. Equivalently, we are asking if the homogeneous vector equation
    x (1, 1, 1) + y (1, −1, 2) + z (3, 1, 4) = (0, 0, 0)
has a nontrivial solution. We solve this by forming a matrix and row reducing (we
do not augment because of this observation in Section 3.4):
    [ 1  1 3 ]                   [ 1 0 2 ]
    [ 1 −1 1 ]   row reduce →    [ 0 1 1 ]
    [ 1  2 4 ]                   [ 0 0 0 ]
This says x = −2z and y = −z. So there exist nontrivial solutions: for instance,
taking z = 1 gives this equation of linear dependence:
    −2 (1, 1, 1) − (1, −1, 2) + (3, 1, 4) = (0, 0, 0).
Move the sliders to solve the homogeneous vector equation in this example. Do you see
why the vectors need to be coplanar in order for there to exist a nontrivial solution?
Example. Is the set of vectors {(1, 1, −2), (1, −1, 2), (3, 1, 4)} linearly independent?
Solution. Equivalently, we are asking if the homogeneous vector equation
    x (1, 1, −2) + y (1, −1, 2) + z (3, 1, 4) = (0, 0, 0)

has a nontrivial solution. We solve this by forming a matrix and row reducing (we
do not augment because of this observation in Section 3.4):

    [  1  1 3 ]                   [ 1 0 0 ]
    [  1 −1 1 ]   row reduce →    [ 0 1 0 ]
    [ −2  2 4 ]                   [ 0 0 1 ]
This says x = y = z = 0, i.e., the only solution is the trivial solution. We conclude
that the set is linearly independent.
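Both linear-independence checks above come down to whether the matrix of column vectors has a pivot in every column, which a computer algebra system reports directly. A SymPy sketch for the second example, illustrative only:

    from sympy import Matrix

    A = Matrix([[1, 1, 3], [1, -1, 1], [-2, 2, 4]])
    R, pivot_columns = A.rref()
    print(pivot_columns)   # (0, 1, 2): a pivot in every column, so the columns are linearly independent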
Move the sliders to solve the homogeneous vector equation in this example. Do you
see why the vectors would need to be coplanar in order for there to exist a nontrivial
solution?
x 1 v1 + x 2 v2 + · · · + x p vp = 0
has only the trivial solution, if and only if the matrix equation Ax = 0 has only
the trivial solution, where A is the matrix with columns v1 , v2 , . . . , vp :
| | |
A = v1 v2 · · · vp .
| | |
• The vectors {v1 , v2 , . . . , vp } are linearly independent if and only if the matrix A
with columns v1 , v2 , . . . , vp has a pivot in every column, if and only if Ax = 0
has only the trivial solution.
• Solving the matrix equation Ax = 0 will either verify that the columns v1 , v2 , . . . , vp
are linearly independent, or will produce a linear dependence relation by sub-
stituting any nonzero values for the free variables.
Suppose that A has more columns than rows. Then A cannot have a pivot in
every column (it has at most one pivot per row), so its columns are automatically
linearly dependent.
1. Two vectors are linearly dependent if and only if they are collinear, i.e., one is
a scalar multiple of the other.
Proof.
1 · v1 + 0 · v2 + · · · + 0 · vp = 0.
x 1 v1 + x 2 v2 + · · · + x r vr = 0,
With regard to the first fact, note that the zero vector is a multiple of any vector,
so it is collinear with any other vector. Hence facts 1 and 2 are consistent with each
other.
The previous theorem makes precise in what sense a set of linearly dependent
vectors is redundant.
with not all of xj+1 , . . . , xp equal to zero. Suppose for simplicity that xp ≠ 0. Then
we can rearrange:

    vp = −(1/xp ) ( x1 v1 + x2 v2 + · · · + xj−1 vj−1 − vj + xj+1 vj+1 + · · · + xp−1 vp−1 ).
This says that vp is in the span of {v1 , v2 , . . . , vp−1 }, which contradicts our assump-
tion that v j is the last vector in the span of the others.
If you make a set of vectors by adding one vector at a time, and if the span got
bigger every time you added a vector, then your set is linearly independent.
• Neither is in the span of the other, so we can apply the first criterion.
• The span got bigger when we added w, so we can apply the increasing span
criterion.
• The span did not increase when we added u, so we can apply the increasing
span criterion.
In the picture below, note that v is in Span{u, w}, and w is in Span{u, v}, so we
can remove any of the three vectors without shrinking the span.
• We can remove w without shrinking the span, so we can apply the second
criterion.
• The span did not increase when we added w, so we can apply the increasing
span criterion.
These three vectors {v, w, u} are linearly dependent: indeed, {v, w} is already
linearly dependent, so we can use the third fact.
Move the vector heads and the demo will tell you if they are linearly independent and
show you their span.
Move the vector heads and the demo will tell you that they are linearly dependent and
show you their span.
The two vectors {v, w} below are linearly independent because they are not
collinear.
The three vectors {v, w, u} below are linearly independent: the span got bigger
when we added w, then again when we added u, so we can apply the increasing
span criterion.
• We can remove u without shrinking the span, so we can apply the second
criterion.
• The span did not increase when we added u, so we can apply the increasing
span criterion.
Note that three vectors are linearly dependent if and only if they are coplanar.
Indeed, {v, w, u} is linearly dependent if and only if one vector is in the span of the
other two, which is a plane (or a line) (or {0}).
The four vectors {v, w, u, x} below are linearly dependent: they are the columns
of a wide matrix. Note however that u is not contained in Span{v, w, x}. See this
warning.
The vectors {v, w, u, x} are linearly dependent, but u is not contained in Span{v, w, x}.
Move the vector heads and the demo will tell you if they are linearly independent and
show you their span.
Move the vector heads and the demo will tell you if they are linearly independent and
show you their span.
Then we can delete the columns of A without pivots (the columns corresponding to
the free variables), without changing Span{v1 , v2 , . . . , vp }.
The pivot columns are linearly independent, so we cannot delete any more columns.
Proof. If the matrix is in reduced row echelon form:
    A = [ 1 0 2 0 ]
        [ 0 1 3 0 ]
        [ 0 0 0 1 ]

then the column without a pivot is visibly in the span of the pivot columns:

    (2, 3, 0) = 2 (1, 0, 0) + 3 (0, 1, 0) + 0 (0, 0, 1),
If the matrix is not in reduced row echelon form, then we row reduce:
    A = [  1  7 23 3 ]             [ 1 0 2 0 ]
        [  2  4 16 0 ]    RREF →   [ 0 1 3 0 ].
        [ −1 −2 −8 4 ]             [ 0 0 0 1 ]
The following two vector equations have the same solution set, as they come from
row-equivalent matrices:
    x1 (1, 2, −1) + x2 (7, 4, −2) + x3 (23, 16, −8) + x4 (3, 0, 4) = 0

    x1 (1, 0, 0) + x2 (0, 1, 0) + x3 (2, 3, 0) + x4 (0, 0, 1) = 0.
We conclude that
    (23, 16, −8) = 2 (1, 2, −1) + 3 (7, 4, −2) + 0 (3, 0, 4)
and that
    x1 (1, 2, −1) + x2 (7, 4, −2) + x4 (3, 0, 4) = 0
Note that it is necessary to row reduce A to find which are its pivot columns.
However, the span of the columns of the row reduced matrix is generally not equal
to the span of the columns of A. See theorem in Section 3.7 for a restatement of
the above theorem.
Therefore, the first two columns of A are the pivot columns, so we can delete the
others without changing the span:
    Span{ (1, −2, 2), (2, −3, 4) } = Span{ (1, −2, 2), (2, −3, 4), (0, 4, 0), (−1, 5, −2) }.
Pivot Columns and Dimension. Let d be the number of pivot columns in the
matrix
| | |
A = v1 v2 · · · vp .
| | |
• Et cetera.
3.6 Subspaces
Objectives
x + 3y + z = 0
is a subset of R3 .
Above we expressed C in set builder notation.
• If u, v are vectors in V and x, y are scalars, then xu, y v are also in V by the
third property, so xu + y v is in V by the second property. Therefore, all of
Span{u, v} is contained in V .
If you choose enough vectors, then eventually their span will fill up V , so we
already see that a subspace is a span.
Example. The set Rn is a subspace of itself: indeed, it contains zero, and is closed
under addition and scalar multiplication.
Example. The set {0} containing only the zero vector is a subspace of Rn : it con-
tains zero, and if you add zero to itself or multiply it by a scalar, you always get
zero.
Example (A line through the origin). A line L through the origin is a subspace.
Example (A plane through the origin). A plane P through the origin is a subspace.
Non-example (A line not containing the origin). A line L (or any other subset)
that does not contain the origin is not a subspace. It fails the first defining property:
every subspace contains the origin by definition.
Non-example (A circle). The unit circle C is not a subspace. It fails all three
defining properties: it does not contain the origin, it is not closed under addition,
and it is not closed under scalar multiplication. In the picture, one red vector is
the sum of the two black vectors (which are contained in C), and the other is a
scalar multiple of a black vector.
Non-example (A line union a plane). The union of a line and a plane in R3 is not
a subspace. It contains the origin and is closed under scalar multiplication, but it
is not closed under addition: the sum of a vector on the line and a vector on the
plane is not contained in the line or in the plane.
In order to verify that a subset of Rn is in fact a subspace, one has to check the
three defining properties. That is, unless the subset has already been verified to
be a subspace: see this important note below.
Solution. First we point out that the condition “2a = 3b” defines whether or not
a vector is in V : that is, to say (a, b) is in V means that 2a = 3b. In other words, a
vector is in V if twice its first coordinate equals three times its second coordinate.

Let us check the first property. The subset V does contain the zero vector (0, 0),
because 2 · 0 = 3 · 0.
Next we check the second property. To show that V is closed under addition,
we have to check that for any vectors u = (a, b) and v = (c, d) in V , the sum u + v is in
V . Since we cannot assume anything else about u and v, we must treat them as
unknowns. We have

    (a, b) + (c, d) = (a + c, b + d).

To say that (a + c, b + d) is contained in V means that 2(a + c) = 3(b + d), or 2a + 2c =
3b + 3d. The one thing we are allowed to assume about u and v is that 2a = 3b
and 2c = 3d, so we see that u + v is indeed contained in V .
Next we check the third property. To show that V is closed under scalar multiplication,
we have to check that for any vector v = (a, b) in V and any scalar c in R,
the product cv is in V . Again, we must treat v and c as unknowns. We have

    c (a, b) = (ca, cb).

To say that (ca, cb) is contained in V means that 2(ca) = 3(cb), i.e., that c · 2a = c · 3b.
The one thing we are allowed to assume about v is that 2a = 3b, so cv is indeed
contained in V .
Since V satisfies all three defining properties, it is a subspace. In fact, it is the
line through the origin with slope 2/3.
Example. Let V be the set of all vectors (a, b) in R2 such that ab = 0. Is V a subspace?
Solution. First we point out that the condition “ab = 0” defines whether or not
a vector is in V : that is, to say (a, b) is in V means that ab = 0. In other words, a
vector is in V if the product of its coordinates is zero, i.e., if one (or both) of its
coordinates are zero.

Let us check the first property. The subset V does contain the zero vector (0, 0),
because 0 · 0 = 0.
Next we check the third property. To show that V is closed under scalar multiplication,
we have to check that for any vector v = (a, b) in V and any scalar c in R,
the product cv is in V . Since we cannot assume anything else about v and c, we
must treat them as unknowns. We have

    c (a, b) = (ca, cb).

To say that (ca, cb) is contained in V means that (ca)(cb) = 0. Rewriting, this means
c²(ab) = 0. The one thing we are allowed to assume about v is that ab = 0, so we
see that cv is indeed contained in V .
Next we check the second property. It turns out that V is not closed under addition; to verify this, we must show that there exist vectors u, v in V such that u + v is not contained in V. The easiest way to do so is to produce examples of such vectors. We can take u = \(\begin{pmatrix}1\\0\end{pmatrix}\) and v = \(\begin{pmatrix}0\\1\end{pmatrix}\); these are contained in V because the products of their coordinates are zero, but

\[\begin{pmatrix}1\\0\end{pmatrix} + \begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}\]

is not contained in V, because 1 · 1 ≠ 0. Hence V fails the second defining property, so it is not a subspace.
2. If u = a1 v1 + a2 v2 + · · · + a p vp and v = b1 v1 + b2 v2 + · · · + b p vp are in Span{v1 , v2 , . . . , vp }, then

u + v = (a1 + b1 )v1 + (a2 + b2 )v2 + · · · + (a p + b p )vp

is also in Span{v1 , v2 , . . . , vp }.

3. If v = a1 v1 + a2 v2 + · · · + a p vp is in Span{v1 , v2 , . . . , vp } and c is a scalar, then

cv = ca1 v1 + ca2 v2 + · · · + ca p vp

is also in Span{v1 , v2 , . . . , vp }.
• The null space of A is the set of all solutions of the homogeneous equation
Ax = 0:
Nul(A) = { x in Rn | Ax = 0 }.
This is a subspace of Rn .
Proof. We have to verify the three defining properties. To say that a vector v is in Nul(A) means that Av = 0.

First, the zero vector is in Nul(A) because A0 = 0. Next, if u and v are in Nul(A), then

A(u + v) = Au + Av = 0 + 0 = 0,

so u + v is in Nul(A). Finally, if v is in Nul(A) and c is a scalar, then

A(cv) = cAv = c · 0 = 0,

so cv is in Nul(A). Since Nul(A) satisfies all three defining properties, it is a subspace.
This is a line in R3 .
Notice that the column space is a subspace of R3 , whereas the null space is a
subspace of R2 . This is because A has three rows and two columns.
The column space and the null space of a matrix are both subspaces, so they
are both spans. The column space of a matrix A is defined to be the span of the
columns of A. The null space is defined to be the solution set of Ax = 0, so this
is a good example of a kind of subspace that we can define without any spanning
set in mind. In order to do computations, however, it is usually necessary to find a
spanning set.
To be clear: the null space is the solution set of a (homogeneous) system of
equations. For example, the null space of the matrix
\[A = \begin{pmatrix}1&7&2\\-2&1&3\\4&-2&-3\end{pmatrix}\]

is the solution set of Ax = 0, i.e., the solution set of the system of equations

\[\begin{cases} x + 7y + 2z = 0\\ -2x + y + 3z = 0\\ 4x - 2y - 3z = 0. \end{cases}\]
To find a spanning set for the null space, one has to solve this system of equations.
Recipe: Compute a spanning set for a null space. To find a spanning set
for Nul(A), compute the parametric vector form of the solutions to the homo-
geneous equation Ax = 0. The vectors attached to the free variables form a
spanning set for Nul(A).
Example (Two free variables). Find a spanning set for the null space of the matrix
\[A = \begin{pmatrix}2&3&-8&-5\\-1&2&-3&-8\end{pmatrix}.\]

Solution. The reduced row echelon form of A is

\[\begin{pmatrix}1&0&-1&2\\0&1&-2&-3\end{pmatrix}.\]
The free variables are x3 and x4 ; the parametric form of the solution set is

\[\begin{aligned} x_1 &= x_3 - 2x_4\\ x_2 &= 2x_3 + 3x_4\\ x_3 &= x_3\\ x_4 &= x_4 \end{aligned}
\qquad\xrightarrow{\text{parametric vector form}}\qquad
x = x_3\begin{pmatrix}1\\2\\1\\0\end{pmatrix} + x_4\begin{pmatrix}-2\\3\\0\\1\end{pmatrix}.\]

Therefore,

\[\operatorname{Nul}(A) = \operatorname{Span}\left\{ \begin{pmatrix}1\\2\\1\\0\end{pmatrix},\ \begin{pmatrix}-2\\3\\0\\1\end{pmatrix} \right\}.\]
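For readers who want to check such a computation by machine, here is a minimal sketch (not part of the text; it assumes the SymPy library is available). The nullspace method returns exactly the spanning vectors attached to the free variables.

    from sympy import Matrix

    # The matrix from the "two free variables" example.
    A = Matrix([[2, 3, -8, -5],
                [-1, 2, -3, -8]])

    # rref() returns the reduced row echelon form and the pivot column indices.
    R, pivots = A.rref()
    print(R)        # Matrix([[1, 0, -1, 2], [0, 1, -2, -3]])
    print(pivots)   # (0, 1): columns 3 and 4 are free

    # nullspace() returns one spanning vector per free variable.
    for v in A.nullspace():
        print(v.T)  # [1, 2, 1, 0] and [-2, 3, 0, 1]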
Example (No free variables). Find a spanning set for the null space of the matrix
\[A = \begin{pmatrix}1&3\\2&4\end{pmatrix}.\]

Solution. The reduced row echelon form of A is

\[\begin{pmatrix}1&0\\0&1\end{pmatrix}.\]
There are no free variables; hence the only solution of Ax = 0 is the trivial solution.
In other words,
Nul(A) = {0} = Span{0}.
• Can you verify directly that it satisfies the three defining properties of a
subspace?
Objectives
2. Recipes: basis for a column space, basis for a null space, basis of a span.
Definition. Let V be a subspace of Rn. A basis of V is a set of vectors {v1 , v2 , . . . , vm } in V such that:

1. V = Span{v1 , v2 , . . . , vm }, and

2. the set {v1 , v2 , . . . , vm } is linearly independent.
Recall that a set of vectors is linearly independent if and only if, when you re-
move any vector from the set, the span shrinks (Theorem 3.5.12). In other words,
if {v1 , v2 , . . . , vm } is a basis of a subspace V , then no proper subset of {v1 , v2 , . . . , vm }
will span V : it is a minimal spanning set.
A subspace generally has infinitely many different bases, but they all contain the same number of vectors; this common number is called the dimension of V, written dim V.
We leave it as an exercise to prove that any two bases have the same number of
vectors; one might want to wait until after learning the invertible matrix theorem
in Section 4.5.
then x = y = 0.
Example (The standard basis of Rn ). One shows exactly as in the above example
that the standard coordinate vectors
\[e_1 = \begin{pmatrix}1\\0\\\vdots\\0\\0\end{pmatrix},\quad e_2 = \begin{pmatrix}0\\1\\\vdots\\0\\0\end{pmatrix},\quad \dots,\quad e_{n-1} = \begin{pmatrix}0\\0\\\vdots\\1\\0\end{pmatrix},\quad e_n = \begin{pmatrix}0\\0\\\vdots\\0\\1\end{pmatrix}\]
form a basis for Rn . This is sometimes known as the standard basis.
In particular, Rn has dimension n.
Example. The previous example implies that any basis for Rn has n vectors in
it. Let v1 , v2 , . . . , vn be vectors in Rn , and let A be the n × n matrix with columns
v1 , v2 , . . . , vn .
1. To say that {v1 , v2 , . . . , vn } spans Rn means that A has a pivot in every row: see this theorem in Section 3.3.

2. To say that {v1 , v2 , . . . , vn } is linearly independent means that A has a pivot in every column: see this theorem in Section 3.5.
Since A is a square matrix, it has a pivot in every row if and only if it has a pivot
in every column. We will see in Section 4.5 that the above two conditions are
equivalent to the invertibility of the matrix A.
Example. Let

\[V = \left\{ \begin{pmatrix}x\\y\\z\end{pmatrix} \text{ in } \mathbb{R}^3 \;\middle|\; x + 3y + z = 0 \right\},
\qquad
B = \left\{ \begin{pmatrix}-3\\1\\0\end{pmatrix},\ \begin{pmatrix}0\\1\\-3\end{pmatrix} \right\}.\]

Verify that B is a basis for V.
3. Linearly independent:
\[c_1\begin{pmatrix}-3\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\-3\end{pmatrix} = \begin{pmatrix}-3c_1\\c_1+c_2\\-3c_2\end{pmatrix} = 0
\quad\Longrightarrow\quad c_1 = c_2 = 0.\]
Alternatively, one can observe that the two vectors are not collinear.
Since V has a basis with two vectors, it has dimension two: it is a plane.
A picture of the plane V and its basis B = {v1 , v2 }. Note that B spans V and is linearly
independent.
Now we show how to find bases for the types of subspaces we encountered in
Section 3.6, namely: a span, the column space of a matrix, and the null space of
a matrix.
A basis for the column space First we show how to compute a basis for the
column space of a matrix.
The above theorem is referring to the pivot columns in the original matrix, not
its reduced row echelon form. Indeed, a matrix and its reduced row echelon
form generally have different column spaces. For example, in the matrix A
below:
\[A = \begin{pmatrix}1&2&0&-1\\-2&-3&4&5\\2&4&0&-2\end{pmatrix}
\ \xrightarrow{\text{RREF}}\
\begin{pmatrix}1&0&-8&-7\\0&1&4&3\\0&0&0&0\end{pmatrix}\]

the pivot columns are the first two columns, so a basis for Col(A) is

\[\left\{ \begin{pmatrix}1\\-2\\2\end{pmatrix},\ \begin{pmatrix}2\\-3\\4\end{pmatrix} \right\}.\]
The first two columns of the reduced row echelon form certainly span a dif-
ferent subspace, as
\[\operatorname{Span}\left\{ \begin{pmatrix}1\\0\\0\end{pmatrix},\ \begin{pmatrix}0\\1\\0\end{pmatrix} \right\} = \left\{ \begin{pmatrix}a\\b\\0\end{pmatrix} \;\middle|\; a, b \text{ in } \mathbb{R} \right\},\]

whereas Col(A) contains vectors whose last coordinate is nonzero (for instance, the first column of A).
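As a sanity check, the pivot columns can also be read off by machine. Here is a small sketch (an illustration added here, not part of the text) using SymPy; note that the basis consists of the pivot columns of the original matrix A, not of its reduced row echelon form.

    from sympy import Matrix

    A = Matrix([[1, 2, 0, -1],
                [-2, -3, 4, 5],
                [2, 4, 0, -2]])

    _, pivots = A.rref()                  # pivot column indices: (0, 1)
    basis = [A.col(j) for j in pivots]    # columns of the ORIGINAL matrix
    for v in basis:
        print(v.T)                        # [1, -2, 2] and [2, -3, 4]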
A basis of a span Computing a basis for a span is the same as computing a basis
for a column space. Indeed, the span of finitely many vectors v1 , v2 , . . . , vm is the
column space of a matrix, namely, the matrix A whose columns are v1 , v2 , . . . , vm :
\[A = \begin{pmatrix} | & | & & | \\ v_1 & v_2 & \cdots & v_m \\ | & | & & | \end{pmatrix}.\]
Example (Another basis of the same span). Find a basis of the subspace
\[V = \operatorname{Span}\left\{ \begin{pmatrix}1\\-2\\2\end{pmatrix},\ \begin{pmatrix}2\\-3\\4\end{pmatrix},\ \begin{pmatrix}0\\4\\0\end{pmatrix},\ \begin{pmatrix}-1\\5\\-2\end{pmatrix} \right\}\]
which does not consist of the first two vectors, as in the previous example.
Solution. The point of this example is that the above theorem gives one basis for
V ; as always, there are infinitely more.
Reordering the vectors, we can express V as the column space of
\[A' = \begin{pmatrix}0&-1&1&2\\4&5&-2&-3\\0&-2&2&4\end{pmatrix}.\]

Row reducing A′ shows that its pivot columns are the first two columns, so those two columns form a basis of Col(A′) = V. These are the last two vectors in the given spanning set.
A basis for the null space In order to compute a basis for the null space of a
matrix, one has to find the parametric vector form of the solutions of the homoge-
neous equation Ax = 0.
Theorem. The vectors attached to the free variables in the parametric vector form of
the solution set of Ax = 0 form a basis of Nul(A).
In lieu of a proof, we illustrate the theorem with an example, and we leave it
to the reader to generalize the argument.
Example (A basis of a null space). Find a basis of the null space of the matrix

\[A = \begin{pmatrix}1&2&0&-1\\-2&-3&4&5\\2&4&0&-2\end{pmatrix}.\]

Solution. The reduced row echelon form of A is

\[\begin{pmatrix}1&0&-8&-7\\0&1&4&3\\0&0&0&0\end{pmatrix}.\]

Hence the parametric form and parametric vector form of the solutions of Ax = 0 are

\[\begin{aligned} x_1 &= 8x_3 + 7x_4\\ x_2 &= -4x_3 - 3x_4\\ x_3 &= x_3\\ x_4 &= x_4 \end{aligned}
\qquad\Longrightarrow\qquad
x = x_3\begin{pmatrix}8\\-4\\1\\0\end{pmatrix} + x_4\begin{pmatrix}7\\-3\\0\\1\end{pmatrix}.\]

Every solution of Ax = 0 has the above form for some values of x3 , x4 : this is the point of the parametric vector form. It follows that

\[\operatorname{Nul}(A) = \operatorname{Span}\left\{ \begin{pmatrix}8\\-4\\1\\0\end{pmatrix},\ \begin{pmatrix}7\\-3\\0\\1\end{pmatrix} \right\}.\]

These two vectors are automatically linearly independent: if

\[0 = x_3\begin{pmatrix}8\\-4\\1\\0\end{pmatrix} + x_4\begin{pmatrix}7\\-3\\0\\1\end{pmatrix} = \begin{pmatrix}8x_3+7x_4\\-4x_3-3x_4\\x_3\\x_4\end{pmatrix},\]

then the last two coordinates force x3 = x4 = 0. Hence the two vectors form a basis of Nul(A).
Objectives

Fact (Uniqueness of coordinates). If B = {v1 , v2 , . . . , vm } is a basis of a subspace V, then every vector x in V can be written as a linear combination

x = c1 v1 + c2 v2 + · · · + cm vm

in exactly one way.

Proof. Suppose that

x = c1 v1 + c2 v2 + · · · + cm vm   and   x = c1′ v1 + c2′ v2 + · · · + cm′ vm .

Subtracting the second equation from the first gives

0 = (c1 − c1′ )v1 + (c2 − c2′ )v2 + · · · + (cm − cm′ )vm .

Since B is linearly independent, the only solution to the above equation is the trivial solution: all the coefficients must be zero. It follows that ci − ci′ = 0 for all i, which proves that c1 = c1′ , c2 = c2′ , . . . , cm = cm′ .
Example. Consider the standard basis of R3 from this example in Section 3.7:
\[e_1 = \begin{pmatrix}1\\0\\0\end{pmatrix},\quad e_2 = \begin{pmatrix}0\\1\\0\end{pmatrix},\quad e_3 = \begin{pmatrix}0\\0\\1\end{pmatrix}.\]
According to the above fact, every vector in R3 can be written as a linear combi-
nation of e1 , e2 , e3 , with unique coefficients. For example,
\[v = \begin{pmatrix}3\\5\\-2\end{pmatrix} = 3\begin{pmatrix}1\\0\\0\end{pmatrix} + 5\begin{pmatrix}0\\1\\0\end{pmatrix} - 2\begin{pmatrix}0\\0\\1\end{pmatrix} = 3e_1 + 5e_2 - 2e_3.\]
Definition. Let B = {v1 , v2 , . . . , vm } be a basis of a subspace V. For a vector x in V, written as x = c1 v1 + c2 v2 + · · · + cm vm , the coefficients c1 , c2 , . . . , cm are called the B-coordinates of x, and the vector

\[[x]_B = \begin{pmatrix}c_1\\c_2\\\vdots\\c_m\end{pmatrix} \text{ in } \mathbb{R}^m\]

is called the B-coordinate vector of x.
If we change the basis, then we can still give instructions for how to get to the
point (3, 5, −2), but the instructions will be different. Say for example we take the
basis
\[v_1 = e_1 + e_2 = \begin{pmatrix}1\\1\\0\end{pmatrix},\quad v_2 = e_2 = \begin{pmatrix}0\\1\\0\end{pmatrix},\quad v_3 = e_3 = \begin{pmatrix}0\\0\\1\end{pmatrix}.\]
We can write (3, 5, −2) in this basis as 3v1 + 2v2 − 2v3 . In other words: start at
the origin, travel northeast 3 times as far as v1 , then 2 units east, then 2 units
down. In this situation, we can say that “3 is the v1 -coordinate of (3, 5, −2), 2 is
the v2 -coordinate of (3, 5, −2), and −2 is the v3 -coordinate of (3, 5, −2).”
The above definition gives a way of using Rm to label the points of a subspace of
dimension m: a point is simply labeled by its B-coordinate vector. For instance,
if we choose a basis for a plane, we can label the points of that plane with the
points of R2 .
This picture could be the grid of streets in Palo Alto, California. Residents of
Palo Alto refer to northwest as “north” and to northeast as “east”. There is a reason
for this: the old road to San Francisco is called El Camino Real, and that road runs
from the southeast to the northwest in Palo Alto. So when a Palo Alto resident
says “go south two blocks and east one block”, they are giving directions from the
origin to the Whole Foods at w.
A picture of the basis B = {v1 , v2 } of R2 . The grid indicates the coordinate system
defined by the basis B; one set of lines measures the v1 -coordinate, and the other set
measures the v2 -coordinate. Use the sliders to find the B-coordinates of w.
Example. Let
\[v_1 = \begin{pmatrix}2\\-1\\1\end{pmatrix}\qquad v_2 = \begin{pmatrix}1\\0\\-1\end{pmatrix}.\]
Left: the B-coordinates of a vector x. Right: the vector x. The violet grid on the right
is a picture of the coordinate system defined by the basis B; one set of lines measures
the v1 -coordinate, and the other set measures the v2 -coordinate. Drag the heads of the
vectors x and [x]B to understand the correspondence between x and its B-coordinate
vector.
Solution. We have c1 = 2 and c2 = 3, so v = 2v1 + 3v2 and \([v]_B = \begin{pmatrix}2\\3\end{pmatrix}\).
A picture of the plane V and the basis B = {v1 , v2 }. The violet grid is a picture of the
coordinate system defined by the basis B; one set of lines measures the v1 -coordinate,
and the other set measures the v2 -coordinate. Use the sliders to find the B-coordinates
of v.
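Finding B-coordinates amounts to solving a small linear system: if the basis vectors are placed as the columns of a matrix, then [v]_B solves that matrix equation. The vector v itself is not displayed in this extract, so the sketch below (not part of the text; SymPy assumed) takes v = 2v1 + 3v2 = (7, −2, −1), which is consistent with the stated answer.

    from sympy import Matrix

    v1 = Matrix([2, -1, 1])
    v2 = Matrix([1, 0, -1])
    v = Matrix([7, -2, -1])        # assumed: v = 2*v1 + 3*v2

    # Row reduce the augmented matrix ( v1 v2 | v ) to solve for the coordinates.
    aug = v1.row_join(v2).row_join(v)
    R, _ = aug.rref()
    print(R)   # Matrix([[1, 0, 2], [0, 1, 3], [0, 0, 0]])  ->  c1 = 2, c2 = 3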
Solution.
1. We write V as the column space of a matrix A, then row reduce to find the
pivot columns, as in this example in Section 3.7.
\[A = \begin{pmatrix}2&-1&2\\3&1&8\\2&1&6\end{pmatrix}
\ \xrightarrow{\text{RREF}}\
\begin{pmatrix}1&0&2\\0&1&2\\0&0&0\end{pmatrix}.\]
The first two columns are pivot columns, so we can take B = {v1 , v2 } as our
basis for V .
We have c1 = 3 and c2 = 2, so x = 3v1 + 2v2 , and thus \([x]_B = \begin{pmatrix}3\\2\end{pmatrix}\).
A picture of the plane V and the basis B = {v1 , v2 }. The violet grid is a picture of the
coordinate system defined by the basis B; one set of lines measures the v1 -coordinate,
and the other set measures the v2 -coordinate. Use the sliders to find the B-coordinates
of x.
To summarize: if B = {v1 , v2 , . . . , vm } is a basis of V, then writing

\[[x]_B = \begin{pmatrix}c_1\\c_2\\\vdots\\c_m\end{pmatrix}\quad\text{means that}\quad x = c_1 v_1 + c_2 v_2 + \cdots + c_m v_m.\]

Finding the B-coordinates of x therefore amounts to solving this vector equation for the coefficients c1 , c2 , . . . , cm .
Objectives
In this section we present two important general facts about dimensions and
bases.
Definition. The rank of a matrix A, written rank(A), is the dimension of the column
space Col(A).
The nullity of a matrix A, written nullity(A), is the dimension of the null space
Nul(A).
According to this theorem in Section 3.7, the rank of A is equal to the number of
columns with pivots. On the other hand, this theorem in Section 3.7 implies that
nullity(A) equals the number of free variables, which is the number of columns
without pivots. To summarize:

rank(A) = dim Col(A) = the number of columns of A with pivots,
nullity(A) = dim Nul(A) = the number of free variables = the number of columns of A without pivots.

Clearly (the number of columns with pivots) plus (the number of columns without pivots) equals (the number of columns of A), so we have proved the following theorem.

Theorem (Rank Theorem). If A is a matrix with n columns, then

rank(A) + nullity(A) = n.
Example (The rank is 2 and the nullity is 2). Consider the following matrix and
its reduced row echelon form:
\[A = \begin{pmatrix}1&2&0&-1\\-2&-3&4&5\\2&4&0&-2\end{pmatrix}
\ \xrightarrow{\text{RREF}}\
\begin{pmatrix}1&0&-8&-7\\0&1&4&3\\0&0&0&0\end{pmatrix}.\]

There are two pivot columns, so rank(A) = 2; the remaining two columns correspond to free variables, so nullity(A) = 2.
In this case, the rank theorem says that 2 + 2 = 4, where 4 is the number of
columns.
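Here is a small machine check of the rank theorem for this matrix (a sketch added for illustration, not part of the text; SymPy assumed).

    from sympy import Matrix

    A = Matrix([[1, 2, 0, -1],
                [-2, -3, 4, 5],
                [2, 4, 0, -2]])

    rank = A.rank()
    nullity = len(A.nullspace())   # dimension of Nul(A)
    n = A.cols                     # number of columns

    print(rank, nullity, n)        # 2 2 4
    assert rank + nullity == n     # the rank theorem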
This 3 × 3 matrix has rank 1 and nullity 2. The violet plane on the left is the null
space, and the violet line on the right is the column space.
This 3 × 3 matrix has rank 2 and nullity 1. The violet line on the left is the null space,
and the violet plane on the right is the column space.
Basis Theorem. Let V be a subspace of dimension m. Then:

1. Any m linearly independent vectors in V form a basis for V.

2. Any m vectors that span V form a basis for V.

Proof. Suppose first that B = {v1 , v2 , . . . , vm } is a set of m linearly independent vectors in V. If B does not already span V, then we can enlarge it by adding vectors vm+1 , . . . , vm+k , each chosen outside the span of the previous ones, until the enlarged set spans V. Then {v1 , v2 , . . . , vm+k } is a basis for V, which implies that dim(V) = m + k > m. But we were assuming that V has dimension m, so B must have already been a basis.
Now suppose that B = {v1 , v2 , . . . , vm } spans V . If B is not linearly independent,
then by this theorem in Section 3.5, we can remove some number of vectors from B
without shrinking its span. After reordering, we can assume that we removed the
last k vectors without shrinking the span, and that we cannot remove any more.
Now V = Span{v1 , v2 , . . . , vm−k }, and {v1 , v2 , . . . , vm−k } is a basis for V because it is
linearly independent. This implies that dim V = m−k < m. But we were assuming
that dim V = m, so B must have already been a basis.
In other words, if you already know that dim V = m, and if you have a set of
m vectors B = {v1 , v2 , . . . , vm } in V , then you only have to check one of:
1. B is linearly independent, or
2. B spans V ,
in order for B to be a basis of V . If you did not already know that dim V = m, then
you would have to check both properties.
For example, if V is a plane, then any two noncollinear vectors in V form a
basis.
Example (Yet another basis for a span). Find a basis of the subspace
\[V = \operatorname{Span}\left\{ \begin{pmatrix}1\\-2\\2\end{pmatrix},\ \begin{pmatrix}2\\-3\\4\end{pmatrix},\ \begin{pmatrix}0\\4\\0\end{pmatrix},\ \begin{pmatrix}-1\\5\\-2\end{pmatrix} \right\}\]
which is different from the bases in this example in Section 3.7 and this example
in Section 3.7.
Solution. We know from the previous examples that dim V = 2. By the basis
theorem, it suffices to find any two noncollinear vectors in V . We write two linear
combinations of the four given spanning vectors, chosen at random:
\[w_1 = \begin{pmatrix}1\\-2\\2\end{pmatrix} + \begin{pmatrix}2\\-3\\4\end{pmatrix} = \begin{pmatrix}3\\-5\\6\end{pmatrix}
\qquad
w_2 = -\begin{pmatrix}2\\-3\\4\end{pmatrix} + \frac{1}{2}\begin{pmatrix}0\\4\\0\end{pmatrix} = \begin{pmatrix}-2\\5\\-4\end{pmatrix}.\]

These are two noncollinear vectors in V, so B = {w1 , w2 } is a basis for V by the basis theorem.
Primary Goal. Learn about linear transformations and their relationship to ma-
trices.
In practice, one is often led to ask questions about the geometry of a transformation: a function that takes an input and produces an output. This kind of
question can be answered by linear algebra if the transformation can be expressed
by a matrix.
Example. Suppose you are building a robot arm with three joints that can move
its hand around a plane, as in the following picture.
[Figure: a robot arm with joint angles θ, φ, ψ; the position of the hand is (x, y) = f(θ, φ, ψ).]
Objectives
1. Learn to view a matrix geometrically as a function.
The set of all possible output vectors is the set of vectors b such that Ax = b has some solution; this is the same as the column space of A, by this note in Section 3.3.
Interactive: A 2 × 3 matrix.
Interactive: A 3 × 2 matrix.
Multiplication by A simply sets the z-coordinate equal to zero: it projects onto the
x y-plane.
Multiplication by the matrix A projects a vector onto the x y-plane. Move the input
vector x to see how the output vector b changes.
Multiplication by the matrix A reflects over the y-axis. Move the input vector x to see
how the output vector b changes.
Multiplication by the matrix A dilates the plane by a factor of 1.5. Move the input
vector x to see how the output vector b changes.
Multiplication by A does not change the input vector at all: it is the identity trans-
formation which does nothing.
Multiplication by the matrix A does not move the vector x: that is, b = Ax = x. Move
the input vector x to see how the output vector b changes.
We substitute a few test points in order to understand the geometry of the trans-
formation:
\[A\begin{pmatrix}1\\2\end{pmatrix} = \begin{pmatrix}-2\\1\end{pmatrix}
\qquad
A\begin{pmatrix}-1\\1\end{pmatrix} = \begin{pmatrix}-1\\-1\end{pmatrix}
\qquad
A\begin{pmatrix}0\\-2\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix}\]
\[A\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x+y\\y\end{pmatrix}.\]
Multiplication by the matrix A adds the y-coordinate to the x-coordinate. Move the
input vector x to see how the output vector b changes.
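Thinking of a matrix as a function is easy to experiment with numerically. The sketch below (added for illustration, not part of the text; it assumes the NumPy library) applies the shear matrix above to a few input vectors.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])      # the shear matrix from the example

    def T(x):
        """The matrix transformation x -> Ax."""
        return A @ x

    for x in [np.array([1.0, 0.0]),
              np.array([0.0, 1.0]),
              np.array([2.0, 3.0])]:
        print(x, "->", T(x))
    # [1, 0] -> [1, 0];  [0, 1] -> [1, 1];  [2, 3] -> [5, 3]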
4.1.2 Transformations
At this point it is convenient to fix our ideas and terminology regarding functions,
or transformations. This allows us to systematize our discussion of matrices as
functions.
[Figure: a transformation T from the domain Rn to the codomain Rm; the range is the set of all outputs T(x) inside Rm.]
The points of the domain Rn are the inputs of T : this simply means that it makes
sense to evaluate T on lists of n numbers. Likewise, the points of the codomain
Rm are the outputs of T : this means that the result of evaluating T is always a list
of m numbers.
The range of T is the set of all vectors in the codomain that actually arise as
outputs of the function T , for some input. In other words, the range is all vectors
b in the codomain such that T (x) = b has a solution x in the domain.
Example (A Function of one variable). Most of the functions you may have seen
previously have domain and codomain equal to R = R1 . For example,
sin : R → R,   sin(x) = (the length of the opposite edge over the hypotenuse of a right triangle with angle x in radians).
Notice that we have defined sin by a rule: a function is defined by specifying what
the output of the function is for any possible input.
You may be used to thinking of such functions in terms of their graphs:
[Figure: the graph of sin, consisting of the points (x, sin x).]
In this case, the horizontal axis is the domain, and the vertical axis is the
codomain. This is useful when the domain and codomain are R, but it is hard
to do when, for instance, the domain is R2 and the codomain is R3 . The graph of
such a function is a subset of R5 , which is difficult to visualize. For this reason, we
will rarely graph a transformation.
Note that the range of sin is the interval [−1, 1]: this is the set of all possible
outputs of the sin function.
The inputs of f each have two entries, and the outputs have three entries. In this
case, we have defined f by a formula, so we evaluate f by substituting values for
the variables:
\[f\begin{pmatrix}2\\3\end{pmatrix} = \begin{pmatrix}2+3\\\cos(3)\\3-2^2\end{pmatrix} = \begin{pmatrix}5\\\cos(3)\\-1\end{pmatrix}.\]
In other words, f takes a vector with three entries, then rotates it; hence the output of f also has three entries. In this case, we have defined f by a geometric rule.
In other words, the identity transformation does not move its input vector: the
output is the same as the input. Its domain and codomain are both Rn , and its
range is Rn as well, since every vector in Rn is the output of itself.
and define T (x) = Ax. The domain of T is R3 , and the codomain is R2 . The range
of T is the column space; since all three columns are collinear, the range is a line
in R2 .
and define T (x) = Ax. The domain of T is R2 , and the codomain is R3 . The range
of T is the column space; since A has two columns which are not collinear, the
range is a plane in R3 .
and let T (x) = Ax. What are the domain, the codomain, and the range of T ?
The inputs and outputs have three entries, so the domain and codomain are
both R3 . The possible outputs all lie on the x y-plane, and every point on the x y-
plane is an output of T (with itself as the input), so the range of T is the x y-plane.
Be careful not to confuse the codomain with the range here. The range is a
plane, but it is a plane in R3 , so the codomain is still R3 . The outputs of T all have
three entries; the last entry is simply always zero.
In each case, the associated matrix transformation T (x) = Ax has domain and
codomain equal to R2 . The range is also R2 , as can be seen geometrically (what
is the input for a given output?), or using the fact that the columns of A are not
collinear (so they form a basis for R2 ).
3. Does there exist a vector w in R3 such that there is more than one v in R2
with T (v) = w?
Note: all of the above questions are intrinsic to the transformation T : they
make sense to ask whether or not T is a matrix transformation. See the next ex-
ample. As T is in fact a matrix transformation, all of these questions will translate
into questions about the corresponding matrix A.
Solution.
2. Let
\[b = \begin{pmatrix}7\\1\\7\end{pmatrix}.\]
3. Does there exist a vector w in R3 such that there is more than one v in R2
with T (v) = w?
Note: we asked (almost) the exact same questions about a matrix transforma-
tion in the previous example. The point of this example is to illustrate the fact
that the questions make sense for a transformation that has no hope of coming
from a matrix. In this case, these questions do not translate into questions about
a matrix; they have to be answered in some other way.
Solution.
2. We have
\[T\begin{pmatrix}e^7\\2\pi n\\e^7\end{pmatrix} = \begin{pmatrix}\ln(e^7)\\\cos(2\pi n)\\\ln(e^7)\end{pmatrix} = \begin{pmatrix}7\\1\\7\end{pmatrix}\]
for any whole number n. Hence there are infinitely many such vectors.
Objectives
In this section, we discuss two of the most basic questions one can ask about a
transformation: whether it is one-to-one and/or onto. For a matrix transformation,
we translate these questions into the language of matrices.
[Figure: a one-to-one transformation T : Rn → Rm; distinct inputs x, y, z are sent to distinct outputs in the range.]
• There exists some vector b in Rm such that the equation T (x) = b has more
than one solution x in Rn .
[Figure: a transformation that is not one-to-one; two distinct inputs x and y have the same output T(x) = T(y).]
1. T is one-to-one.
Recall that equivalent means that, for a given matrix, either all of the statements
are true simultaneously, or they are all false.
A picture of the matrix transformation T . As you drag the input vector on the left
side, you see that different input vectors yield different output vectors on the right
side.
Geometrically, T is projection onto the x y-plane. Any two vectors that lie on
the same vertical line will have the same projection. For b on the x y-plane, the
solution set of T (x) = b is the entire vertical line containing b. In particular,
T (x) = b has infinitely many solutions.
A picture of the matrix transformation T . The violet line is the null space of A, i.e.,
solution set of T (x) = 0. If you drag x along the violet line, the output T (x) = Ax
does not change. This demonstrates that T (x) = 0 has more than one solution, so T
is not one-to-one.
x − y + 2z = 0 =⇒ x = y − 2z.
The free variables are y and z. Taking y = 1 and z = 0 gives the nontrivial solution
\[T\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1&-1&2\\-2&2&-4\end{pmatrix}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} = T\begin{pmatrix}0\\0\\0\end{pmatrix}.\]
A picture of the matrix transformation T . The violet plane is the solution set of
T (x) = 0. If you drag x along the violet plane, the output T (x) = Ax does not
change. This demonstrates that T (x) = 0 has more than one solution, so T is not
one-to-one.
\[\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\\0&0&0\end{pmatrix}\]
[Figure: an onto transformation T : Rn → Rm; the range of T fills the entire codomain Rm.]
• There exists a vector b in Rm such that the equation T (x) = b does not have
a solution.
• There is a vector in the codomain that is not the output of any input vector.
[Figure: a transformation that is not onto; the range of T is a proper subset of the codomain Rm.]
Example (Functions of one variable). The function sin: R → R is not onto. Indeed,
taking b = 2, the equation sin(x) = 2 has no solution. The range of sin is the closed
interval [−1, 1], which is smaller than the codomain R.
The function exp: R → R defined by exp(x) = e x is not onto. Indeed, taking
b = −1, the equation exp(x) = e x = −1 has no solution. The range of exp is the
set (0, ∞) of all positive real numbers.
The function f : R → R defined by f (x) = x³ is onto. Indeed, the equation f (x) = x³ = b always has the solution x = ∛b.
A picture of the matrix transformation T . Every vector on the right side is the output
of T for a suitable input. If you drag b, the demo will find an input vector x with
output b.
Hence A does not have a pivot in every row, so T is not onto. In fact, since
\[T\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}1&0\\0&1\\1&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x\\y\\x\end{pmatrix},\]
we see that for every output vector of T , the third entry is equal to the first. There-
fore,
b = (1, 2, 3)
is not in the range of T .
A picture of the matrix transformation T . The range of T is the violet plane on the
right; this is smaller than the codomain R3 . If you drag b off of the violet plane, then
the equation Ax = b becomes inconsistent; this means T (x) = b has no solution.
There is not a pivot in every row, so T is not onto. The range of T is the column
space of A, which is equal to
\[\operatorname{Span}\left\{ \begin{pmatrix}1\\-2\end{pmatrix},\ \begin{pmatrix}-1\\2\end{pmatrix},\ \begin{pmatrix}2\\-4\end{pmatrix} \right\} = \operatorname{Span}\left\{ \begin{pmatrix}1\\-2\end{pmatrix} \right\},\]
A picture of the matrix transformation T . The range of T is the violet line on the
right; this is smaller than the codomain R2 . If you drag b off of the violet line, then
the equation Ax = b becomes inconsistent; this means T (x) = b has no solution.
The previous two examples illustrate the following observation. Suppose that
T (x) = Ax is a matrix transformation that is not onto. This means that range(T ) =
Col(A) is a subspace of Rm of dimension less than m: perhaps it is a line in the
plane, or a line in 3-space, or a plane in 3-space, etc. Whatever the case, the range
of T is very small compared to the codomain. To find a vector not in the range of
T , choose a random nonzero vector b in Rm ; you have to be extremely unlucky
to choose a vector that is in the range of T . Of course, to check whether a given
vector b is in the range of T , you have to solve the matrix equation Ax = b to see
whether it is consistent.
4.2.3 Comparison
The above expositions of one-to-one and onto transformations were written to
mirror each other. However, “one-to-one” and “onto” are complementary notions:
neither one implies the other. Below we have provided a chart for comparing the
two. In the chart, A is an m × n matrix, and T : Rn → Rm is the matrix transforma-
tion T (x) = Ax.
T is one-to-one:
• T (x) = b has at most one solution for every b.
• The columns of A are linearly independent.
• A has a pivot in every column.

T is onto:
• T (x) = b has at least one solution for every b.
• The columns of A span Rm.
• A has a pivot in every row.
A picture of the matrix transformation T . The violet plane is the solution set of
T (x) = 0. If you drag x along the violet plane, the output T (x) = Ax does not
change. This demonstrates that T (x) = 0 has more than one solution, so T is not
one-to-one. The range of T is the violet line on the right; this is smaller than the
codomain R2 . If you drag b off of the violet line, then the equation Ax = b becomes
inconsistent; this means T (x) = b has no solution.
Example (A matrix transformation that is one-to-one but not onto). Let A be the
matrix
\[A = \begin{pmatrix}1&0\\0&1\\1&0\end{pmatrix},\]
and define T : R2 → R3 by T (x) = Ax. This transformation is one-to-one but not
onto, as we saw in this example and this example.
Example (A matrix transformation that is onto but not one-to-one). Let A be the
matrix
\[A = \begin{pmatrix}1&1&0\\0&1&1\end{pmatrix},\]
and define T : R3 → R2 by T (x) = Ax. This transformation is onto but not one-to-
one, as we saw in this example and this example.
A picture of the matrix transformation T . Every vector on the right side is the output of
T for a suitable input. If you drag b, the demo will find an input vector x with output
b. The violet line is the null space of A, i.e., solution set of T (x) = 0. If you drag x
along the violet line, the output T (x) = Ax does not change. This demonstrates that
T (x) = 0 has more than one solution, so T is not one-to-one.
Example (Matrix transformations that are both one-to-one and onto). In this sub-
section in Section 4.1, we discussed the transformations defined by several 2 × 2
matrices, namely:
\[\begin{aligned}
\text{Reflection:}&\quad A = \begin{pmatrix}-1&0\\0&1\end{pmatrix}\\
\text{Dilation:}&\quad A = \begin{pmatrix}1.5&0\\0&1.5\end{pmatrix}\\
\text{Identity:}&\quad A = \begin{pmatrix}1&0\\0&1\end{pmatrix}\\
\text{Rotation:}&\quad A = \begin{pmatrix}0&-1\\1&0\end{pmatrix}\\
\text{Shear:}&\quad A = \begin{pmatrix}1&1\\0&1\end{pmatrix}.
\end{aligned}\]

In each case, A has a pivot in every row and in every column, so the associated transformation is both one-to-one and onto.
One-to-one is the same as onto for square matrices. We observed in the previ-
ous example that a square matrix has a pivot in every row if and only if it has a
pivot in every column. Therefore, a matrix transformation T from Rn to itself is
one-to-one if and only if it is onto: in this case, the two notions are equivalent.
Conversely, by this note and this note, if a matrix transformation T : Rm → Rn
is both one-to-one and onto, then m = n.
Objectives
For example, we saw in this example in Section 4.1 that the matrix transformation
\[T : \mathbb{R}^2 \longrightarrow \mathbb{R}^2 \qquad T(x) = \begin{pmatrix}0&-1\\1&0\end{pmatrix}x\]
is a counterclockwise rotation of the plane by 90◦ . However, we could instead have defined T geometrically: T (x) is the vector obtained by rotating x counterclockwise by 90◦ about the origin. Given this definition, it is not at all obvious a priori that T is a matrix transformation, or what matrix it is associated to.
Definition. A linear transformation is a transformation T : Rn → Rm satisfying

T (u + v) = T (u) + T (v)
T (cu) = cT (u)

for all vectors u, v in Rn and all scalars c. Since a matrix transformation satisfies the two defining properties, it is a linear transformation.
We will see in the next subsection that a linear transformation is a matrix trans-
formation; we just haven’t computed its matrix yet.
Facts about linear transformations. Let T : Rn → Rm be a linear transformation.
Then:
1. T (0) = 0.

2. T (c1 v1 + c2 v2 + · · · + ck vk ) = c1 T (v1 ) + c2 T (v2 ) + · · · + ck T (vk ) for all vectors v1 , . . . , vk in Rn and all scalars c1 , . . . , ck .
Proof.
1. Since 0 = −0, we have

T (0) = T (−0) = −T (0)

by the second defining property. The only vector w such that w = −w is the zero vector, so T (0) = 0.
In engineering, the second fact is called the superposition principle. For exam-
ple, T (cu+d v) = cT (u)+d T (v) for any vectors u, v and any scalars c, d. To restate
the first fact:
A linear transformation necessarily takes the zero vector to the zero vector.
[Figure: rotation by the angle θ takes the parallelogram spanned by u and v to the parallelogram spanned by T(u) and T(v), so T(u + v) = T(u) + T(v).]
For the second property, cT (u) is the vector obtained by rotating u by the angle θ , then changing its length by a factor of c (reversing its direction if c < 0). On the other hand, T (cu) first changes the length of u by a factor of c, then rotates. But it does not matter in which order we do these two operations.
[Figure: rotating u and then scaling by c gives the same vector as scaling u by c and then rotating, so T(cu) = cT(u).]
This verifies that T is a linear transformation. We will find its matrix in the
next subsection. Note however that it is not at all obvious that T can be expressed
as multiplication by a matrix.
Example. Define T : R2 → R3 by T (x, y) = (3x − y, y, x); let us verify that T is a linear transformation. Since the rule for T is given in terms of the coordinates of its input, we need to give those names as well; say u = \(\begin{pmatrix}x_1\\y_1\end{pmatrix}\) and v = \(\begin{pmatrix}x_2\\y_2\end{pmatrix}\).
For the first property, we have

\[T\!\left(\begin{pmatrix}x_1\\y_1\end{pmatrix} + \begin{pmatrix}x_2\\y_2\end{pmatrix}\right)
= T\begin{pmatrix}x_1+x_2\\y_1+y_2\end{pmatrix}
= \begin{pmatrix}3(x_1+x_2) - (y_1+y_2)\\ y_1+y_2\\ x_1+x_2\end{pmatrix}
= \begin{pmatrix}(3x_1-y_1) + (3x_2-y_2)\\ y_1+y_2\\ x_1+x_2\end{pmatrix}
= \begin{pmatrix}3x_1-y_1\\ y_1\\ x_1\end{pmatrix} + \begin{pmatrix}3x_2-y_2\\ y_2\\ x_2\end{pmatrix}
= T\begin{pmatrix}x_1\\y_1\end{pmatrix} + T\begin{pmatrix}x_2\\y_2\end{pmatrix}.\]
Example (Non-linear transformations). Define T1 , T2 , T3 : R2 → R2 by

\[T_1\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}|x|\\y\end{pmatrix}
\qquad
T_2\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}xy\\y\end{pmatrix}
\qquad
T_3\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}2x+1\\x-2y\end{pmatrix}.\]
For the first transformation, we note that

\[T_1\!\left(-\begin{pmatrix}1\\0\end{pmatrix}\right) = T_1\begin{pmatrix}-1\\0\end{pmatrix} = \begin{pmatrix}|-1|\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}\]

but that

\[-T_1\begin{pmatrix}1\\0\end{pmatrix} = -\begin{pmatrix}|1|\\0\end{pmatrix} = -\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}-1\\0\end{pmatrix}.\]

Therefore, this transformation does not satisfy the second property.
For the second transformation, we note that
\[T_2\!\left(2\begin{pmatrix}1\\1\end{pmatrix}\right) = T_2\begin{pmatrix}2\\2\end{pmatrix} = \begin{pmatrix}2\cdot 2\\2\end{pmatrix} = \begin{pmatrix}4\\2\end{pmatrix}\]

but that

\[2\,T_2\begin{pmatrix}1\\1\end{pmatrix} = 2\begin{pmatrix}1\cdot 1\\1\end{pmatrix} = 2\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}2\\2\end{pmatrix}.\]
Therefore, this transformation does not satisfy the second property.
For the third transformation, we observe that
\[T_3\begin{pmatrix}0\\0\end{pmatrix} = \begin{pmatrix}2(0)+1\\0-2(0)\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix} \neq \begin{pmatrix}0\\0\end{pmatrix}.\]
Since T3 does not take the zero vector to the zero vector, it cannot be linear.
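The counterexamples above are easy to confirm numerically. The sketch below (added for illustration, not part of the text; NumPy assumed) evaluates the two defining properties on the specific vectors used above.

    import numpy as np

    def T1(v):  # (x, y) -> (|x|, y)
        x, y = v
        return np.array([abs(x), y])

    def T2(v):  # (x, y) -> (xy, y)
        x, y = v
        return np.array([x * y, y])

    def T3(v):  # (x, y) -> (2x + 1, x - 2y)
        x, y = v
        return np.array([2 * x + 1, x - 2 * y])

    u = np.array([1.0, 0.0])
    print(T1(-u), -T1(u))            # [1, 0] vs [-1, 0]: second property fails

    w = np.array([1.0, 1.0])
    print(T2(2 * w), 2 * T2(w))      # [4, 2] vs [2, 2]: second property fails

    print(T3(np.array([0.0, 0.0])))  # [1, 0] != 0: T3 does not fix the zero vector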
The ith entry of ei is equal to 1, and the other entries are zero.
From now on, for the rest of the book, we will use the symbols e1 , e2 , . . . to
denote the standard coordinate vectors.
There is an ambiguity in this notation: one has to know from context that e1 is
meant to have n entries. That is, the vectors
\[\begin{pmatrix}1\\0\end{pmatrix} \text{ in } \mathbb{R}^2
\qquad\text{and}\qquad
\begin{pmatrix}1\\0\\0\end{pmatrix} \text{ in } \mathbb{R}^3\]

are both written e1 , even though they are different vectors.
These are the vectors of length 1 that point in the positive directions of each
of the axes.
For example,
\[\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}1\\4\\7\end{pmatrix}
\qquad
\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}2\\5\\8\end{pmatrix}
\qquad
\begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}3\\6\\9\end{pmatrix}.\]
Definition. The n × n identity matrix is the matrix I n whose columns are the n
standard coordinate vectors in Rn :
\[I_n = \begin{pmatrix}1&0&\cdots&0&0\\0&1&\cdots&0&0\\\vdots&\vdots&\ddots&\vdots&\vdots\\0&0&\cdots&1&0\\0&0&\cdots&0&1\end{pmatrix}.\]
We will see in this example below that the identity matrix is the matrix of the
identity transformation.
Then T is the matrix transformation associated with A: that is, T (x) = Ax.
The matrix A in the above theorem is called the standard matrix for T . The
columns of A are the vectors obtained by evaluating T on the n standard coordinate
vectors in Rn . To summarize part of the theorem:
Linear transformation T : Rn → Rm   ⟶   m × n matrix
\[A = \begin{pmatrix} | & | & & | \\ T(e_1) & T(e_2) & \cdots & T(e_n) \\ | & | & & | \end{pmatrix}\]

m × n matrix A   ⟶   linear transformation T : Rn → Rm, T (x) = Ax.
Example (Rotation). Let T : R2 → R2 be counterclockwise rotation of the plane by the angle θ . We compute the standard matrix by evaluating T on the standard coordinate vectors.

[Figure: the standard coordinate vectors e1 and e2 rotated counterclockwise by the angle θ.]

\[T(e_1) = \begin{pmatrix}\cos(\theta)\\\sin(\theta)\end{pmatrix},\qquad
T(e_2) = \begin{pmatrix}-\sin(\theta)\\\cos(\theta)\end{pmatrix}
\quad\Longrightarrow\quad
A = \begin{pmatrix}\cos(\theta)&-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{pmatrix}.\]
We saw in the above example that the matrix for counterclockwise rotation of
the plane by an angle of θ is
\[A = \begin{pmatrix}\cos(\theta)&-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{pmatrix}.\]
Example. Find the standard matrix A of the linear transformation T : R2 → R3 defined by T (x, y) = (3x − y, y, x).

Solution. We substitute the standard coordinate vectors into the formula defining T :
\[T(e_1) = T\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}3(1)-0\\0\\1\end{pmatrix} = \begin{pmatrix}3\\0\\1\end{pmatrix}
\qquad
T(e_2) = T\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}3(0)-1\\1\\0\end{pmatrix} = \begin{pmatrix}-1\\1\\0\end{pmatrix}
\quad\Longrightarrow\quad
A = \begin{pmatrix}3&-1\\0&1\\1&0\end{pmatrix}.\]
[Figure: e1 is first reflected over the xy-plane, then projected onto the yz-plane.]
Since e1 lies on the x y-plane, reflecting over the x y-plane does not move e1 .
Since e1 is perpendicular to the yz-plane, projecting e1 onto the yz-plane sends it
to zero. Therefore,
\[T(e_1) = \begin{pmatrix}0\\0\\0\end{pmatrix}.\]
[Figure: e2 is first reflected over the xy-plane, then projected onto the yz-plane.]
Since e2 lies on the x y-plane, reflecting over the x y-plane does not move e2 .
Since e2 lies on the yz-plane, projecting onto the yz-plane does not move e2 either.
Therefore,
\[T(e_2) = e_2 = \begin{pmatrix}0\\1\\0\end{pmatrix}.\]
[Figure: e3 is first reflected over the xy-plane, then projected onto the yz-plane.]

Since e3 is perpendicular to the x y-plane, reflecting over the x y-plane sends it to −e3 . Since −e3 lies on the yz-plane, projecting onto the yz-plane does not move it. Therefore,

\[T(e_3) = -e_3 = \begin{pmatrix}0\\0\\-1\end{pmatrix},
\qquad\text{and so}\qquad
A = \begin{pmatrix}0&0&0\\0&1&0\\0&0&-1\end{pmatrix}.\]
Illustration of a transformation defined in steps. Click and drag the vector on the left.
Recall from this definition in Section 4.1 that the identity transformation is the
transformation IdRn : Rn → Rn defined by IdRn (x) = x for every vector x.
Example (The standard matrix of the identity transformation). Verify that the
identity transformation IdRn : Rn → Rn is linear, and compute its standard matrix.
Solution. We verify the two defining properties of linear transformations. Let u, v be vectors in Rn . Then

IdRn (u + v) = u + v = IdRn (u) + IdRn (v).

If c is a scalar, then
IdRn (cu) = cu = c IdRn (u).
Since IdRn satisfies the two defining properties, it is a linear transformation.
Now that we know that IdRn is linear, it makes sense to compute its standard
matrix. For each standard coordinate vector ei , we have IdRn (ei ) = ei . In other
words, the columns of the standard matrix of IdRn are the standard coordinate
vectors, so the standard matrix is the identity matrix
\[I_n = \begin{pmatrix}1&0&\cdots&0&0\\0&1&\cdots&0&0\\\vdots&\vdots&\ddots&\vdots&\vdots\\0&0&\cdots&1&0\\0&0&\cdots&0&1\end{pmatrix}.\]
We computed in this example that the matrix of the identity transform is the
identity matrix: for every x in Rn ,
x = IdRn (x) = I n x.
Therefore, I n x = x for all vectors x: the product of the identity matrix and a vector
is the same vector.
Objectives
(2 exp)(x) = 2 · exp(x) = 2e^x .
T + U = U + T        S + (T + U) = (S + T) + U
c(T + U) = cT + cU   (c + d)T = cT + dT
c(dT) = (cd)T        T + 0 = T

Here 0 denotes the zero transformation: the transformation that sends every input vector to the zero vector.
Definition. Let T : Rn → Rm and U : Rp → Rn be transformations. Their composition is the transformation T ◦ U : Rp → Rm defined by

(T ◦ U)(x) = T (U(x)).
[Figure: the composition T ◦ U first applies U : Rp → Rn to x, then applies T : Rn → Rm to U(x).]
Recall from this definition in Section 4.1 that the identity transformation is the
transformation IdRn : Rn → Rn defined by IdRn (x) = x for every vector x.
Properties of composition. Let S, T, U be transformations and let c be a scalar.
Suppose that T : Rn → Rm , and that in each of the following identities, the do-
mains and the codomains are compatible when necessary for the composition to
be defined. The following properties are easily verified:
S ◦ (T + U) = S ◦ T + S ◦ U (S + T ) ◦ U = S ◦ U + T ◦ U
c(T ◦ U) = (cT ) ◦ U c(T ◦ U) = T ◦ (cU) if T is linear
T ◦ IdRn = T IdRm ◦T = T
S ◦ (T ◦ U) = (S ◦ T ) ◦ U
and

\[U \circ T\begin{pmatrix}1\\0\end{pmatrix} = U\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}.\]

Since T ◦ U and U ◦ T have different outputs for the input vector \(\begin{pmatrix}1\\0\end{pmatrix}\), they are different transformations. (See this example.)
\[\begin{pmatrix} a_{11} & \cdots & a_{1j} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ij} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{m1} & \cdots & a_{mj} & \cdots & a_{mn} \end{pmatrix}
\qquad\text{(the entry } a_{ij} \text{ lies in the } i\text{th row and the } j\text{th column).}\]
Definition.
• The sum of two m × n matrices A and B is the m × n matrix A + B obtained by summing the entries of A and B individually: (A + B)ij = Aij + Bij .

• The scalar product of a scalar c with an m × n matrix A is the matrix cA obtained by multiplying every entry of A by c: (cA)ij = c Aij .
In view of the above fact, the following properties are consequences of the
corresponding properties of transformations. They are easily verified directly from
the definitions as well.
A+ B = B + A C + (A + B) = (C + A) + B
c(A + B) = cA + cB (c + d)A = cA + dA
c(dA) = (cd)A A+ 0 = A
Example.

\[\begin{pmatrix}1&1&0\\0&1&1\end{pmatrix}\begin{pmatrix}1&0\\0&1\\1&0\end{pmatrix}
= \begin{pmatrix} \begin{pmatrix}1&1&0\\0&1&1\end{pmatrix}\begin{pmatrix}1\\0\\1\end{pmatrix} & \begin{pmatrix}1&1&0\\0&1&1\end{pmatrix}\begin{pmatrix}0\\1\\0\end{pmatrix} \end{pmatrix}
= \begin{pmatrix}1&1\\1&1\end{pmatrix}.\]
In order for the vectors Av1 , Av2 , . . . , Avp to be defined, the number of rows of B has to equal the number of columns of A.
If B has only one column, then AB also has one column. A matrix with one
column is the same as a vector, so the definition of the matrix product generalizes
the definition of the matrix-vector product.
If A is a square matrix, then we can multiply it by itself; we define its powers
to be
A2 = AA A3 = AAA etc.
The row-column rule for matrix multiplication Recall from this definition in
Section 3.3 that the product of a row vector and a column vector is the scalar
\[\begin{pmatrix}a_1 & a_2 & \cdots & a_n\end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} = a_1x_1 + a_2x_2 + \cdots + a_nx_n.\]
The following procedure for finding the matrix product is much better adapted
to computations by hand; the previous definition is more suitable for proving the
theorem below.
Recipe: The row-column rule for matrix multiplication. Let A be an m × n
matrix, let B be an n × p matrix, and let C = AB. Then the i j entry of C is the
ith row of A times the jth column of B:

c_ij = a_i1 b_1j + a_i2 b_2j + · · · + a_in b_nj .
Here is a diagram:

\[\begin{pmatrix} a_{11} & \cdots & a_{1k} & \cdots & a_{1n} \\ \vdots & & \vdots & & \vdots \\ a_{i1} & \cdots & a_{ik} & \cdots & a_{in} \\ \vdots & & \vdots & & \vdots \\ a_{m1} & \cdots & a_{mk} & \cdots & a_{mn} \end{pmatrix}
\begin{pmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ \vdots & & \vdots & & \vdots \\ b_{k1} & \cdots & b_{kj} & \cdots & b_{kp} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{pmatrix}
=
\begin{pmatrix} c_{11} & \cdots & c_{1j} & \cdots & c_{1p} \\ \vdots & & \vdots & & \vdots \\ c_{i1} & \cdots & c_{ij} & \cdots & c_{ip} \\ \vdots & & \vdots & & \vdots \\ c_{m1} & \cdots & c_{mj} & \cdots & c_{mp} \end{pmatrix}\]

(The entry c_ij is computed from the ith row of A and the jth column of B.)
Proof. The row-column rule for matrix-vector multiplication in Section 3.3 says
that if A has rows r1 , r2 , . . . , rm and x is a vector, then
\[Ax = \begin{pmatrix} \text{---}\ r_1\ \text{---} \\ \text{---}\ r_2\ \text{---} \\ \vdots \\ \text{---}\ r_m\ \text{---} \end{pmatrix} x = \begin{pmatrix} r_1 x \\ r_2 x \\ \vdots \\ r_m x \end{pmatrix}.\]
The definition of matrix multiplication is
\[A\begin{pmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_p \\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & | \\ Ac_1 & Ac_2 & \cdots & Ac_p \\ | & | & & | \end{pmatrix}.\]
It follows that
\[\begin{pmatrix} \text{---}\ r_1\ \text{---} \\ \text{---}\ r_2\ \text{---} \\ \vdots \\ \text{---}\ r_m\ \text{---} \end{pmatrix}
\begin{pmatrix} | & | & & | \\ c_1 & c_2 & \cdots & c_p \\ | & | & & | \end{pmatrix}
= \begin{pmatrix} r_1c_1 & r_1c_2 & \cdots & r_1c_p \\ r_2c_1 & r_2c_2 & \cdots & r_2c_p \\ \vdots & \vdots & & \vdots \\ r_mc_1 & r_mc_2 & \cdots & r_mc_p \end{pmatrix}.\]
Example. The row-column rule allows us to compute the product matrix one entry at a time. For the product

\[\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}\begin{pmatrix}1&-3\\2&-2\\3&-1\end{pmatrix},\]

the (1, 1) entry is the first row of the first matrix times the first column of the second, namely 1 · 1 + 2 · 2 + 3 · 3 = 14, and the (2, 1) entry is the second row times the first column, namely 4 · 1 + 5 · 2 + 6 · 3 = 32.
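The row-column rule is what most numerical libraries implement. A minimal sketch (added for illustration, not part of the text; NumPy assumed) computes the product above entry by entry and compares it with the built-in matrix product.

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])
    B = np.array([[1, -3],
                  [2, -2],
                  [3, -1]])

    # Row-column rule: C[i, j] = (ith row of A) . (jth column of B)
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[0]):
        for j in range(B.shape[1]):
            C[i, j] = A[i, :] @ B[:, j]

    print(C)                      # [[14, -10], [32, -28]]
    assert np.allclose(C, A @ B)  # agrees with the built-in product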
If c is a scalar, then

T ◦ U(cv) = T (U(cv)) = T (cU(v)) = cT (U(v)) = c (T ◦ U)(v).

Since T ◦ U satisfies the two defining properties in Section 4.3, it is a linear transformation.
Now that we know that T ◦ U is linear, it makes sense to compute its standard
matrix. Let C be the standard matrix of T ◦ U, so T (x) = Ax, U(x) = B x, and
T ◦ U(x) = C x. By this theorem in Section 4.3, the first column of C is C e1 , and
the first column of B is Be1 . We have

C e1 = T ◦ U(e1 ) = T (U(e1 )) = T (Be1 ) = A(Be1 ).

By definition, the first column of the product AB is the product of A with the first column of B, which is Be1 , so

(AB)e1 = A(Be1 ) = C e1 .

It follows that C has the same first column as AB. The same argument as applied
to the ith standard coordinate vector ei shows that C and AB have the same ith
column; since they have the same columns, they are the same matrix.
The theorem justifies our choice of definition of the matrix product. This is the
one and only reason that matrix products are defined in this way. To rephrase:
\[A = \begin{pmatrix}\cos(45^\circ)&-\sin(45^\circ)\\\sin(45^\circ)&\cos(45^\circ)\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}1&-1\\1&1\end{pmatrix}
\qquad
B = \begin{pmatrix}\cos(90^\circ)&-\sin(90^\circ)\\\sin(90^\circ)&\cos(90^\circ)\end{pmatrix} = \begin{pmatrix}0&-1\\1&0\end{pmatrix}.\]

Then

\[AB = \frac{1}{\sqrt2}\begin{pmatrix}1&-1\\1&1\end{pmatrix}\begin{pmatrix}0&-1\\1&0\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}-1&-1\\1&-1\end{pmatrix}.\]

This is consistent with the fact that T ◦ U is counterclockwise rotation by 90◦ + 45◦ = 135◦ : we have

\[\begin{pmatrix}\cos(135^\circ)&-\sin(135^\circ)\\\sin(135^\circ)&\cos(135^\circ)\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}-1&-1\\1&-1\end{pmatrix}\]

because cos(135◦ ) = −1/√2 and sin(135◦ ) = 1/√2.
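The composition fact is easy to confirm numerically: multiplying the rotation matrices for 45° and 90° should give the rotation matrix for 135°. A small sketch (added for illustration, not part of the text; NumPy assumed):

    import numpy as np

    def rotation(theta_degrees):
        """Standard matrix of counterclockwise rotation by the given angle."""
        t = np.radians(theta_degrees)
        return np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])

    A = rotation(45)    # matrix of T
    B = rotation(90)    # matrix of U

    # The matrix of T o U is AB, which should be rotation by 135 degrees.
    assert np.allclose(A @ B, rotation(135))
    print(A @ B)        # approximately (1/sqrt(2)) * [[-1, -1], [1, -1]]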
More generally, one can derive the trigonometric identities

sin(α ± β) = sin(α) cos(β) ± cos(α) sin(β)

and

cos(α ± β) = cos(α) cos(β) ∓ sin(α) sin(β)

using the theorem as applied to rotation transformations, as in the previous example.
The matrix of the composition T ◦ U is the product of the matrices for T and U.
[Figure: e1 reflected over the xy-plane.]
Since e1 lies on the x y-plane, reflecting it over the x y-plane does not move it:
\[U(e_1) = \begin{pmatrix}1\\0\\0\end{pmatrix}.\]
[Figure: e2 reflected over the xy-plane.]
Since e2 lies on the x y-plane, reflecting over the x y-plane does not move it
either:
\[U(e_2) = e_2 = \begin{pmatrix}0\\1\\0\end{pmatrix}.\]
[Figure: e3 reflected over the xy-plane.]
In particular, we have

\[\begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&0\\1&1\end{pmatrix} \neq \begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix},\]

since the two products are \(\begin{pmatrix}2&1\\1&1\end{pmatrix}\) and \(\begin{pmatrix}1&1\\1&2\end{pmatrix}\), respectively: matrix multiplication is not commutative.
Recall from this definition in Section 4.3 that the identity matrix is the n × n matrix I n whose columns are the standard coordinate vectors in Rn . The identity matrix is the standard matrix of the identity transformation: that is, x = IdRn (x) = I n x for all vectors x in Rn .
In view of the above theorem, the following properties are consequences of the
corresponding properties of transformations.
C(A + B) = CA + C B (A + B)C = AC + BC
c(AB) = (cA)B c(AB) = A(cB)
AI n = A ImA = A
(AB)C = A(BC)
Most of the above properties are easily verified directly from the definitions.
The associativity property (AB)C = A(BC), however, is not (try it!). It is much
easier to prove by relating matrix multiplication to composition of transformations,
and using the obvious fact that composition of transformations is associative.
Although matrix multiplication satisfies many of the properties one would ex-
pect, one must be careful when doing matrix arithmetic, as there are several prop-
erties that are not satisfied in general.
Objectives
3. Recipes: compute the inverse matrix, solve a linear system by taking inverses.
Definition. Let A be an n × n (square) matrix. We say that A is invertible if there exists an n × n matrix B such that

AB = I n and BA = I n .

In this case, the matrix B is called the inverse of A, and we write B = A−1 .
Example. Verify that the matrices

\[A = \begin{pmatrix}2&1\\1&1\end{pmatrix}\qquad\text{and}\qquad B = \begin{pmatrix}1&-1\\-1&2\end{pmatrix}\]

are inverses.

Solution. We will check that AB = I2 and that BA = I2 .

\[AB = \begin{pmatrix}2&1\\1&1\end{pmatrix}\begin{pmatrix}1&-1\\-1&2\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}
\qquad
BA = \begin{pmatrix}1&-1\\-1&2\end{pmatrix}\begin{pmatrix}2&1\\1&1\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}\]
Remark. There exist non-square matrices whose product is the identity. Indeed,
if

\[A = \begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}\qquad\text{and}\qquad B = \begin{pmatrix}1&0\\0&1\\0&0\end{pmatrix},\]

then AB = I2 . However, BA ≠ I3 , so A and B are not inverses in the sense of the definition, which applies only to square matrices.
Proof.
1. The equations AA−1 = I n and A−1 A = I n at the same time exhibit A−1 as the
inverse of A and A as the inverse of A−1 .
2. We compute

(B −1 A−1 )(AB) = B −1 (A−1 A)B = B −1 I n B = B −1 B = I n .

Here we used the associativity of matrix multiplication and the fact that I n B = B. This shows that B −1 A−1 is the inverse of AB.
Why is the inverse of AB not equal to A−1 B −1 ? If it were, then we would have
I n = (AB)(A−1 B −1 ) = ABA−1 B −1 .
But there is no reason for ABA−1 B −1 to equal the identity matrix: one cannot switch
the order of A−1 and B, so there is nothing to cancel in this expression.
More generally, the inverse of a product of several invertible matrices is the
product of the inverses, in the opposite order; the proof is the same. For instance,
(ABC)−1 = C −1 B −1 A−1 .
Theorem. Let A = \(\begin{pmatrix}a&b\\c&d\end{pmatrix}\). If det(A) = ad − bc ≠ 0, then A is invertible, and

\[A^{-1} = \frac{1}{\det(A)}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}.\]

Otherwise (if ad − bc = 0), A is not invertible.
Proof. Suppose first that det(A) = ad − bc ≠ 0, and let B = \(\frac{1}{\det(A)}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}\). Then

\[AB = \begin{pmatrix}a&b\\c&d\end{pmatrix}\cdot\frac{1}{\det(A)}\begin{pmatrix}d&-b\\-c&a\end{pmatrix} = \frac{1}{\det(A)}\begin{pmatrix}ad-bc&0\\0&ad-bc\end{pmatrix} = I_2,\]

and a similar computation shows that BA = I2 , so A is invertible with inverse B.

Now suppose that det(A) = ad − bc = 0, and let T (x) = Ax. Then

\[T\begin{pmatrix}-b\\a\end{pmatrix} = \begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}-b\\a\end{pmatrix} = \begin{pmatrix}-ab+ab\\-bc+ad\end{pmatrix} = \begin{pmatrix}0\\\det(A)\end{pmatrix} = 0
\qquad
T\begin{pmatrix}d\\-c\end{pmatrix} = \begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}d\\-c\end{pmatrix} = \begin{pmatrix}ad-bc\\cd-cd\end{pmatrix} = \begin{pmatrix}\det(A)\\0\end{pmatrix} = 0.\]
If A is the zero matrix, then it is obviously not invertible. Otherwise, one of v = \(\begin{pmatrix}-b\\a\end{pmatrix}\) and v = \(\begin{pmatrix}d\\-c\end{pmatrix}\) will be a nonzero vector in the null space of A. Suppose that there were a matrix B such that BA = I2 . Then

v = I2 v = BAv = B0 = 0,

which is impossible because v ≠ 0. Therefore, A is not invertible.
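The 2 × 2 formula is short enough to implement directly. Below is a sketch (added for illustration, not part of the text) that applies the formula and illustrates it on the matrices from the earlier example.

    def inverse_2x2(a, b, c, d):
        """Inverse of [[a, b], [c, d]] via the formula (1/det) [[d, -b], [-c, a]]."""
        det = a * d - b * c
        if det == 0:
            raise ValueError("the matrix is not invertible")
        return [[ d / det, -b / det],
                [-c / det,  a / det]]

    # A = [[2, 1], [1, 1]] has inverse [[1, -1], [-1, 2]].
    print(inverse_2x2(2, 1, 1, 1))   # [[1.0, -1.0], [-1.0, 2.0]]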
Example. Let
\[A = \begin{pmatrix}1&2\\3&4\end{pmatrix}.\]

Then det(A) = 1 · 4 − 2 · 3 = −2 ≠ 0, so A is invertible, and by the formula

\[A^{-1} = \frac{1}{-2}\begin{pmatrix}4&-2\\-3&1\end{pmatrix} = \begin{pmatrix}-2&1\\3/2&-1/2\end{pmatrix}.\]
Proof. First suppose that the reduced row echelon form of ( A | I n ) does not have
the form ( I n | B ). This means that fewer than n pivots are contained in the first n
columns (the non-augmented part), so A has fewer than n pivots. It follows that
Nul(A) 6= {0} (the equation Ax = 0 has a free variable), so there exists a nonzero
vector v in Nul(A). Suppose that there were a matrix B such that BA = I n . Then
v = I n v = BAv = B0 = 0,
The columns x 1 , x 2 , . . . , x n of the matrix B in the row reduced form are the solu-
tions to these equations:
\[A\begin{pmatrix}1\\0\\0\end{pmatrix} = e_1,\qquad
A\begin{pmatrix}-6\\-2\\3/2\end{pmatrix} = e_2,\qquad
A\begin{pmatrix}-2\\-1\\1/2\end{pmatrix} = e_3,\]

as can be read off from the reduced row echelon form

\[\left(\begin{array}{ccc|ccc}1&0&0&1&-6&-2\\0&1&0&0&-2&-1\\0&0&1&0&3/2&1/2\end{array}\right).\]
By this fact in Section 4.3, the product Bei is just the ith column x i of B, so
ei = Ax i = ABei
for all i. By the same fact, the ith column of AB is ei , which means that AB is the
identity matrix. Thus B is the inverse of A.
Example (An invertible matrix). Find the inverse of the matrix
\[A = \begin{pmatrix}1&0&4\\0&1&2\\0&-3&-4\end{pmatrix}.\]
Solution. We row reduce the matrix ( A | I3 ):

\[\left(\begin{array}{ccc|ccc}1&0&4&1&0&0\\0&1&2&0&1&0\\0&-3&-4&0&0&1\end{array}\right)
\xrightarrow{R_3 = R_3 + 3R_2}
\left(\begin{array}{ccc|ccc}1&0&4&1&0&0\\0&1&2&0&1&0\\0&0&2&0&3&1\end{array}\right)\]

\[\xrightarrow[R_2 = R_2 - R_3]{R_1 = R_1 - 2R_3}
\left(\begin{array}{ccc|ccc}1&0&0&1&-6&-2\\0&1&0&0&-2&-1\\0&0&2&0&3&1\end{array}\right)
\xrightarrow{R_3 = R_3 \div 2}
\left(\begin{array}{ccc|ccc}1&0&0&1&-6&-2\\0&1&0&0&-2&-1\\0&0&1&0&3/2&1/2\end{array}\right).\]

Therefore,

\[A^{-1} = \begin{pmatrix}1&-6&-2\\0&-2&-1\\0&3/2&1/2\end{pmatrix}.\]
We check:
\[\begin{pmatrix}1&0&4\\0&1&2\\0&-3&-4\end{pmatrix}\begin{pmatrix}1&-6&-2\\0&-2&-1\\0&3/2&1/2\end{pmatrix} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.\]
Example (A non-invertible matrix). Is the matrix \(A = \begin{pmatrix}1&0&4\\0&1&2\\0&-3&-6\end{pmatrix}\) invertible?

Solution. We row reduce the matrix ( A | I3 ):

\[\left(\begin{array}{ccc|ccc}1&0&4&1&0&0\\0&1&2&0&1&0\\0&-3&-6&0&0&1\end{array}\right)
\xrightarrow{R_3 = R_3 + 3R_2}
\left(\begin{array}{ccc|ccc}1&0&4&1&0&0\\0&1&2&0&1&0\\0&0&0&0&3&1\end{array}\right).\]
At this point we can stop, because it is clear that the reduced row echelon form
will not have I3 in the non-augmented part: it will have a row of zeros. By the
theorem, the matrix is not invertible.
Theorem. If A is an invertible n × n matrix, then for every vector b in Rn the matrix equation Ax = b has exactly one solution, namely

x = A−1 b.

Proof. We calculate:

Ax = b ⟹ A−1 (Ax) = A−1 b ⟹ (A−1 A)x = A−1 b ⟹ I n x = A−1 b ⟹ x = A−1 b.

Here we used associativity of matrix multiplication, and the fact that I n x = x for any vector x.
Here we used

\[\det\begin{pmatrix}1&3\\-1&2\end{pmatrix} = 1\cdot 2 - (-1)\cdot 3 = 5.\]
We row reduce the matrix ( A | I3 ) for \(A = \begin{pmatrix}2&3&2\\1&0&3\\2&2&3\end{pmatrix}\):

\[\left(\begin{array}{ccc|ccc}2&3&2&1&0&0\\1&0&3&0&1&0\\2&2&3&0&0&1\end{array}\right)
\xrightarrow{R_1 \leftrightarrow R_2}
\left(\begin{array}{ccc|ccc}1&0&3&0&1&0\\2&3&2&1&0&0\\2&2&3&0&0&1\end{array}\right)
\xrightarrow[R_3 = R_3 - 2R_1]{R_2 = R_2 - 2R_1}
\left(\begin{array}{ccc|ccc}1&0&3&0&1&0\\0&3&-4&1&-2&0\\0&2&-3&0&-2&1\end{array}\right)\]

\[\xrightarrow{R_2 = R_2 - R_3}
\left(\begin{array}{ccc|ccc}1&0&3&0&1&0\\0&1&-1&1&0&-1\\0&2&-3&0&-2&1\end{array}\right)
\xrightarrow{R_3 = R_3 - 2R_2}
\left(\begin{array}{ccc|ccc}1&0&3&0&1&0\\0&1&-1&1&0&-1\\0&0&-1&-2&-2&3\end{array}\right)\]

\[\xrightarrow{R_3 = -R_3}
\left(\begin{array}{ccc|ccc}1&0&3&0&1&0\\0&1&-1&1&0&-1\\0&0&1&2&2&-3\end{array}\right)
\xrightarrow[R_2 = R_2 + R_3]{R_1 = R_1 - 3R_3}
\left(\begin{array}{ccc|ccc}1&0&0&-6&-5&9\\0&1&0&3&2&-4\\0&0&1&2&2&-3\end{array}\right).\]

Therefore,

\[A^{-1} = \begin{pmatrix}-6&-5&9\\3&2&-4\\2&2&-3\end{pmatrix}.\]
The advantage of solving a linear system using inverses is that it becomes much
faster to solve the matrix equation Ax = b for other, or even unknown, values of
b. For instance, in the above example, the solution of the system of equations
\[\begin{cases} 2x_1 + 3x_2 + 2x_3 = b_1\\ x_1 \phantom{{}+3x_2} + 3x_3 = b_2\\ 2x_1 + 2x_2 + 3x_3 = b_3 \end{cases}\]

is simply x = A−1 b:

\[\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}-6&-5&9\\3&2&-4\\2&2&-3\end{pmatrix}\begin{pmatrix}b_1\\b_2\\b_3\end{pmatrix}.\]
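In floating-point practice one would usually call a solver rather than form the inverse, but the reuse advantage described above is easy to see. A sketch (added for illustration, not part of the text; NumPy assumed), using the coefficient matrix from the example:

    import numpy as np

    A = np.array([[2.0, 3.0, 2.0],
                  [1.0, 0.0, 3.0],
                  [2.0, 2.0, 3.0]])

    A_inv = np.linalg.inv(A)
    print(A_inv)
    # [[-6. -5.  9.]
    #  [ 3.  2. -4.]
    #  [ 2.  2. -3.]]

    # Once A_inv is known, solving Ax = b for any right-hand side is a single
    # matrix-vector product.
    for b in [np.array([1.0, 0.0, 0.0]), np.array([1.0, 2.0, 3.0])]:
        x = A_inv @ b
        assert np.allclose(A @ x, b)
        print(b, "->", x)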
Definition. A transformation T : Rn → Rn is invertible if there exists a transformation U : Rn → Rn such that

T ◦ U(x) = x and U ◦ T (x) = x

for all vectors x. This means that if you apply T to x, then you apply U, you get the vector x back, and likewise in the other order. In this case we call U the inverse of T, and we write U = T −1 .
and

\[g \circ f(x) = g(f(x)) = g(x^3) = \sqrt[3]{x^3} = x.\]
In other words, taking the cube root undoes the transformation that takes a num-
ber to its cube.
Define f : R → R by f (x) = x². This is not an invertible function. Indeed, we have f (2) = 4 = f (−2), so there is no way to undo f : the inverse transformation would not know whether it should send 4 to 2 or to −2. More formally, if g : R → R satisfies g( f (x)) = x, then

2 = g( f (2)) = g(4) and −2 = g( f (−2)) = g(4),

which is impossible: g(4) cannot equal both 2 and −2.
The transformation T is equal to its own inverse: applying T twice takes a vector
back to where it started.
and

\[\begin{pmatrix}0\\0\\1\end{pmatrix} = U \circ T\begin{pmatrix}0\\0\\1\end{pmatrix} = U\!\left(T\begin{pmatrix}0\\0\\1\end{pmatrix}\right) = U(0),\]

which is impossible: U(0) cannot equal two different vectors.
Projection onto the x y-plane is not an invertible transformation: all points on each
vertical line are sent to the same point by T , so there is no way to undo T .
Proposition. A transformation T : Rn → Rn is invertible if and only if it is both one-to-one and onto.
Proof. To say that T is one-to-one and onto means that T (x) = b has exactly one
solution for every b in Rn .
Suppose that T is invertible. Then T (x) = b always has the unique solution
x = T −1 (b): indeed, applying T −1 to both sides of T (x) = b gives
x = T −1 (T (x)) = T −1 (b),
T (x) = T (T −1 (b)) = b.
T −1 ◦ T ◦ U ◦ T = T −1 ◦ IdRn ◦T.
We have T −1 ◦ T = IdRn and IdRn ◦U = U, so the left side of the above equation
is U ◦ T . Likewise, IdRn ◦T = T and T −1 ◦ T = IdRn , so our equality simplifies to
U ◦ T = IdRn , as desired.
If instead we had assumed only that U ◦ T = IdRn , then the proof that T ◦ U =
IdRn proceeds similarly.
Remark. It makes sense in the above definition to define the inverse of a trans-
formation T : Rn → Rm , for m 6= n, to be a transformation U : Rm → Rn such that
T ◦ U = IdRm and U ◦ T = IdRn . In fact, there exist invertible transformations
T : Rn → Rm for any m and n, but they are not linear, or even continuous.
If T is a linear transformation, then it can only be invertible when m = n, i.e.,
when its domain is equal to its codomain. Indeed, if T : Rn → Rm is one-to-one,
then n ≤ m by this note in Section 4.2, and if T is onto, then m ≤ n by this note in
Section 4.2. Therefore, when discussing invertibility we restrict ourselves to the
case m = n.
The matrix for T is \(A = \begin{pmatrix}3/2&0\\0&3/2\end{pmatrix}\), whose determinant is 9/4 ≠ 0, so by the 2 × 2 inverse formula

\[A^{-1} = \frac{1}{9/4}\begin{pmatrix}3/2&0\\0&3/2\end{pmatrix} = \begin{pmatrix}2/3&0\\0&2/3\end{pmatrix}.\]
By the theorem, T is invertible, and its inverse is the matrix transformation for
A−1 :
\[T^{-1}(x) = \begin{pmatrix}2/3&0\\0&2/3\end{pmatrix}x.\]
We recognize this as a dilation by a factor of 2/3.
Example (Rotation). Let T : R2 → R2 be counterclockwise rotation by 45◦ . Is T invertible? If so, what is T −1 ?

Solution. Recall that the matrix of counterclockwise rotation by the angle θ is

\[\begin{pmatrix}\cos(\theta)&-\sin(\theta)\\\sin(\theta)&\cos(\theta)\end{pmatrix}.\]

Hence the matrix for T is
\[A = \frac{1}{\sqrt2}\begin{pmatrix}1&-1\\1&1\end{pmatrix},\]

where we have used the trigonometric identities

\[\cos(45^\circ) = \frac{1}{\sqrt2}\qquad \sin(45^\circ) = \frac{1}{\sqrt2}.\]
The determinant of A is

\[\det(A) = \frac{1}{\sqrt2}\cdot\frac{1}{\sqrt2} - \left(-\frac{1}{\sqrt2}\right)\frac{1}{\sqrt2} = \frac12 + \frac12 = 1,\]
so the inverse is

\[A^{-1} = \frac{1}{\sqrt2}\begin{pmatrix}1&1\\-1&1\end{pmatrix}.\]
By the theorem, T is invertible, and its inverse is the matrix transformation for
A−1 :
\[T^{-1}(x) = \frac{1}{\sqrt2}\begin{pmatrix}1&1\\-1&1\end{pmatrix}x.\]
We recognize this as a clockwise rotation by 45◦ , using the trigonometric identities
\[\cos(-45^\circ) = \frac{1}{\sqrt2}\qquad \sin(-45^\circ) = -\frac{1}{\sqrt2}.\]
Example (Reflection). Let T : R2 → R2 be the reflection over the y-axis. Is T
invertible? If so, what is T −1 ?
Solution. In this example in Section 4.1 we showed that the matrix for T is
\[A = \begin{pmatrix}-1&0\\0&1\end{pmatrix}.\]

The determinant of A is −1 ≠ 0, so T is invertible, and

\[A^{-1} = \frac{1}{-1}\begin{pmatrix}1&0\\0&-1\end{pmatrix} = \begin{pmatrix}-1&0\\0&1\end{pmatrix} = A.\]

Hence T −1 = T : reflecting over the y-axis twice returns a vector to where it started.
The Invertible Matrix Theorem. Let A be an n × n matrix, and let T : Rn → Rn be the matrix transformation T (x) = Ax. The following statements are equivalent:

1. A is invertible.
2. T is invertible.
4. A has n pivots.
6. Nul(A) = {0}.
7. nullity(A) = 0.
10. T is one-to-one.
14. Col(A) = Rn .
16. rank(A) = n.
17. T is onto.
dimension of Nul(A). The equivalence of 5 and 8 results from this theorem in Sec-
tion 3.5, and the equivalence of 5 and 10 results from this theorem in Section 4.2.
The basis theorem in Section 3.9 implies the equivalence of 8 and 9, since any set
of n linearly independent vectors in Rn forms a basis.
(11 ⇐⇒ 13 ⇐⇒ 14 ⇐⇒ 15 ⇐⇒ 16 ⇐⇒ 17): Assertions 13, 14, and
17 are translations of each other, because Col(A) is the span of the columns of A,
which is equal to the range of T . The equivalence of 11 and 13 follows from this
theorem in Section 3.3. Since dim(Rn ) = n, the only n-dimensional subspace of
Rn is all of Rn , so 14, 15, and 16 are equivalent.
(4 ⇐⇒ 8) and (4 ⇐⇒ 13): Since A has n rows and columns, it has a pivot
in every row (resp. column) if and only if it has n pivots. By this theorem in
Section 3.5, there is a pivot in every column if and only if the columns are linearly
independent, and by this theorem in Section 3.3, there is a pivot in every row if and only if the columns span Rn .
(2 ⇐⇒ 10 ⇐⇒ 17): By this proposition, the transformation T is invertible
if and only if it is both one-to-one and onto, so 2 is equivalent to (10 and 17). We
have already shown (10 ⇐⇒ 8 ⇐⇒ 4 ⇐⇒ 13 ⇐⇒ 17), so 10 and 17 are
equivalent to each other; thus they are both equivalent to 2.
At this point, we have demonstrated the equivalence of assertions 1–17.
(1 =⇒ 18 =⇒ 17): Invertibility of A means there exists a matrix B such that
AB = I n and BA = I n , so 1 implies 18. Now we prove directly that T is onto. Let b
be a vector in Rn , and let x = B b. Then
T (x) = Ax = AB b = I n b = b,
2. non-invertible matrices.
For invertible matrices, all of the statements of the invertible matrix theorem
are true.
For non-invertible matrices, all of the statements of the invertible matrix the-
orem are false.
Solution. The second column is a multiple of the first. The columns are linearly
dependent, so A does not satisfy condition 8 of the invertible matrix theorem.
Therefore, A is not invertible.
Example. Let A be an n × n matrix and let T (x) = Ax. Suppose that the range of
T is Rn . Show that the columns of A are linearly independent.
Solution. The range of T is the column space of A, so A satisfies condition 14 of
the invertible matrix theorem. Therefore, A also satisfies condition 8, which says
that the columns of A are linearly independent.
Determinants
At this point we have said all that we will say about the first part. This chapter
belongs to the second.
Primary Goal. Learn about determinants: their computation and their properties.
Objectives
2. Learn some ways to eyeball a matrix with zero determinant, and how to
compute determinants of upper- and lower-triangular matrices.
3. Learn the basic properties of the determinant, and how to apply them.
In this section, we define the determinant, and we present one way to com-
pute it. Then we discuss some of the many wonderful properties the determinant
enjoys.
Definition (The determinant). The determinant is a function det : {n × n matrices} → R satisfying the following four defining properties:

1. Doing a row replacement on A does not change det(A).
2. Scaling a row of A by a scalar c multiplies det(A) by c.
3. Swapping two rows of A multiplies det(A) by −1.
4. det(I n ) = 1.

In each of the first three cases, doing a row operation on a matrix scales the determinant by a nonzero number. (Multiplying a row by zero is not a row operation.) Therefore, doing row operations on a square matrix A does not change whether or not the determinant is zero.
The reason behind using these particular defining properties is geometric. See
Section 5.3.
Example. Compute \(\det\begin{pmatrix}1&0\\0&3\end{pmatrix}\).

Solution. Let \(A = \begin{pmatrix}1&0\\0&3\end{pmatrix}\). Since A is obtained from I2 by multiplying the second row by the constant 3, we have

det(A) = 3 det(I2 ) = 3 · 1 = 3.
Example. Compute \(\det\begin{pmatrix}1&0&0\\0&0&1\\5&1&0\end{pmatrix}\).

Solution. First we row reduce, then we compute the determinant in the opposite order:

\[\begin{pmatrix}1&0&0\\0&0&1\\5&1&0\end{pmatrix}\quad \det = -1\]
\[\xrightarrow{R_2 \leftrightarrow R_3}\quad \begin{pmatrix}1&0&0\\5&1&0\\0&0&1\end{pmatrix}\quad \det = 1\]
\[\xrightarrow{R_2 = R_2 - 5R_1}\quad \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}\quad \det = 1\]
The reduced row echelon form is I3 , which has determinant 1. Working backwards
from I3 and using the four defining properties, we see that the second matrix also
has determinant 1 (it differs from I3 by a row replacement), and the first matrix
has determinant −1 (it differs from the second by a row swap).
Example. Compute \(\det\begin{pmatrix}2&1\\1&4\end{pmatrix}\).

Solution. First we row reduce, then we compute the determinant in the opposite order:

\[\begin{pmatrix}2&1\\1&4\end{pmatrix}\quad \det = 7\]
\[\xrightarrow{R_1 \leftrightarrow R_2}\quad \begin{pmatrix}1&4\\2&1\end{pmatrix}\quad \det = -7\]
\[\xrightarrow{R_2 = R_2 - 2R_1}\quad \begin{pmatrix}1&4\\0&-7\end{pmatrix}\quad \det = -7\]
\[\xrightarrow{R_2 = R_2 \div (-7)}\quad \begin{pmatrix}1&4\\0&1\end{pmatrix}\quad \det = 1\]
\[\xrightarrow{R_1 = R_1 - 4R_2}\quad \begin{pmatrix}1&0\\0&1\end{pmatrix}\quad \det = 1\]
The reduced row echelon form of the matrix is the identity matrix I2 , so its determinant is 1. The last step in the row reduction was a row replacement, so the fourth matrix also has determinant 1. The step before that was a row scaling by −1/7; since the determinant of the fourth matrix (which is 1) equals −1/7 times the determinant of the third, the third matrix has determinant −7. The step before that was a row replacement, so the second matrix also has determinant −7. Finally, the first step was a row swap, so the determinant of the original matrix is negative the determinant of the second matrix. Thus, the determinant of the original matrix is 7.
Definition.

• The diagonal entries of a matrix A are the entries a11 , a22 , a33 , . . . (the matrix need not be square):

\[\begin{pmatrix}\boxed{a_{11}}&a_{12}&a_{13}&a_{14}\\a_{21}&\boxed{a_{22}}&a_{23}&a_{24}\\a_{31}&a_{32}&\boxed{a_{33}}&a_{34}\end{pmatrix}
\qquad
\begin{pmatrix}\boxed{a_{11}}&a_{12}&a_{13}\\a_{21}&\boxed{a_{22}}&a_{23}\\a_{31}&a_{32}&\boxed{a_{33}}\\a_{41}&a_{42}&a_{43}\end{pmatrix}\]
• A square matrix is called upper-triangular if its nonzero entries all lie above
the diagonal, and it is called lower-triangular if its nonzero entries all lie
below the diagonal. It is called diagonal if all of its nonzero entries lie on
the diagonal, i.e., if it is both upper-triangular and lower-triangular.
\[\text{upper-triangular: }\begin{pmatrix}\star&\star&\star&\star\\0&\star&\star&\star\\0&0&\star&\star\\0&0&0&\star\end{pmatrix}
\qquad
\text{lower-triangular: }\begin{pmatrix}\star&0&0&0\\\star&\star&0&0\\\star&\star&\star&0\\\star&\star&\star&\star\end{pmatrix}
\qquad
\text{diagonal: }\begin{pmatrix}\star&0&0&0\\0&\star&0&0\\0&0&\star&0\\0&0&0&\star\end{pmatrix}\]
Proposition. Let A be a square matrix.

1. If A has a zero row, then det(A) = 0.

2. If A is upper-triangular or lower-triangular, then det(A) is the product of its diagonal entries.

Proof.
1. Suppose that A has a zero row. Let B be the matrix obtained by negating the zero row. Then det(A) = − det(B) by the second defining property. But A = B, so det(A) = det(B):

\[\begin{pmatrix}1&2&3\\0&0&0\\7&8&9\end{pmatrix}\ \xrightarrow{R_2 = -R_2}\ \begin{pmatrix}1&2&3\\0&0&0\\7&8&9\end{pmatrix}.\]

It follows that det(A) = − det(A), so det(A) = 0.
2. First suppose that A is upper-triangular, and that one of the diagonal entries
is zero, say aii = 0. We can perform row operations to clear the entries above
the nonzero diagonal entries:
\[\begin{pmatrix}a_{11}&\star&\star&\star\\0&a_{22}&\star&\star\\0&0&0&\star\\0&0&0&a_{44}\end{pmatrix}
\ \longrightarrow\
\begin{pmatrix}a_{11}&0&\star&0\\0&a_{22}&\star&0\\0&0&0&0\\0&0&0&a_{44}\end{pmatrix}\]
In the resulting matrix, the ith row is zero, so det(A) = 0 by the first part.
Still assuming that A is upper-triangular, now suppose that all of the diagonal
entries of A are nonzero. Then A can be transformed to the identity matrix
by scaling the diagonal entries and then doing row replacements:
\[\begin{pmatrix}a&\star&\star\\0&b&\star\\0&0&c\end{pmatrix}
\xrightarrow{\text{scale by } a^{-1},\, b^{-1},\, c^{-1}}
\begin{pmatrix}1&\star&\star\\0&1&\star\\0&0&1\end{pmatrix}
\xrightarrow{\text{row replacements}}
\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}\]
\[\det = abc \quad\longleftarrow\quad \det = 1 \quad\longleftarrow\quad \det = 1\]
Since det(I n ) = 1 and we scaled by the reciprocals of the diagonal entries,
this implies det(A) is the product of the diagonal entries.
The same argument works for lower-triangular matrices, except that the row replacements go down instead of up.
A matrix can always be transformed into row echelon form by a series of row
operations, and a matrix in row echelon form is upper-triangular. Therefore, we
have a systematic way of computing the determinant.

Recipe: Computing determinants by row reduction. Let A be a square matrix. Row reduce A to a row echelon form, keeping track of the number r of row swaps and of the scaling factors used (row replacements do not matter). Then

det(A) = (−1)^r · (product of the diagonal entries of the row echelon form) / (product of the scaling factors used).

Example. Compute the determinant of the matrix \(\begin{pmatrix}0&-7&-4\\2&4&6\\3&7&-1\end{pmatrix}\).

Solution. We row reduce the matrix, keeping track of the number of row swaps and of the scaling factors used.
    [ 0 −7 −4 ]   R1 ←→ R2       [ 2  4  6 ]
    [ 2  4  6 ]   −−−−−−→        [ 0 −7 −4 ]      r = 1
    [ 3  7 −1 ]                  [ 3  7 −1 ]

                  R1 = R1 ÷ 2    [ 1  2  3 ]
                  −−−−−−−→       [ 0 −7 −4 ]      scaling factors = 1/2
                                 [ 3  7 −1 ]

                  R3 = R3 − 3R1  [ 1  2   3 ]
                  −−−−−−−−→      [ 0 −7  −4 ]
                                 [ 0  1 −10 ]

                  R2 ←→ R3       [ 1  2   3 ]
                  −−−−−−→        [ 0  1 −10 ]     r = 2
                                 [ 0 −7  −4 ]

                  R3 = R3 + 7R2  [ 1  2   3 ]
                  −−−−−−−−→      [ 0  1 −10 ]
                                 [ 0  0 −74 ]

We made two row swaps and scaled once by a factor of 1/2, so the recipe says
that

    det [ 0 −7 −4 ]     (−1)² · 1 · 1 · (−74)
        [ 2  4  6 ]  =  −−−−−−−−−−−−−−−−−−−  =  −148.
        [ 3  7 −1 ]              1/2
Example. Compute

    det [ 1  2 3 ]
        [ 2 −1 1 ]
        [ 3  0 1 ] .
Solution. We row reduce the matrix, keeping track of the number of row swaps
and of the scaling factors used.
    [ 1  2 3 ]   R2 = R2 − 2R1   [ 1  2  3 ]
    [ 2 −1 1 ]   R3 = R3 − 3R1   [ 0 −5 −5 ]
    [ 3  0 1 ]   −−−−−−−−→       [ 0 −6 −8 ]

                 R2 = R2 ÷ −5    [ 1  2  3 ]
                 −−−−−−−→        [ 0  1  1 ]      scaling factors = −1/5
                                 [ 0 −6 −8 ]

                 R3 = R3 + 6R2   [ 1  2  3 ]
                 −−−−−−−−→       [ 0  1  1 ]
                                 [ 0  0 −2 ]

We did not make any row swaps, and we scaled once by a factor of −1/5, so
the recipe says that

    det [ 1  2 3 ]     1 · 1 · (−2)
        [ 2 −1 1 ]  =  −−−−−−−−−−  =  10.
        [ 3  0 1 ]        −1/5
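For readers who want to experiment, here is a short Python sketch (not part of the
original text; the function name det_by_row_reduction and the use of exact fractions
are our own choices) of the row-reduction recipe used above: reduce to row echelon
form while counting row swaps and recording every scaling factor, then apply
det(A) = (−1)^(#swaps) · (product of diagonal entries) / (product of scaling factors).

    from fractions import Fraction

    def det_by_row_reduction(rows):
        """Determinant via row reduction, tracking swaps and scaling factors."""
        A = [[Fraction(x) for x in row] for row in rows]   # exact arithmetic
        n = len(A)
        swaps = 0
        scaling = Fraction(1)
        for j in range(n):
            # Find a pivot in column j at or below row j.
            pivot = next((i for i in range(j, n) if A[i][j] != 0), None)
            if pivot is None:
                continue                      # zero column: the determinant is 0
            if pivot != j:
                A[j], A[pivot] = A[pivot], A[j]
                swaps += 1                    # a row swap negates the determinant
            factor = Fraction(1) / A[j][j]
            scaling *= factor                 # scaling a row scales the determinant
            A[j] = [factor * x for x in A[j]]
            for i in range(j + 1, n):         # row replacements leave it unchanged
                c = A[i][j]
                A[i] = [A[i][k] - c * A[j][k] for k in range(n)]
        diag = Fraction(1)
        for i in range(n):
            diag *= A[i][i]
        return (-1) ** swaps * diag / scaling

    # The two examples from this subsection:
    print(det_by_row_reduction([[0, -7, -4], [2, 4, 6], [3, 7, -1]]))   # -148
    print(det_by_row_reduction([[1, 2, 3], [2, -1, 1], [3, 0, 1]]))     # 10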
• If a = 0, then

      det [ a b ]  =  det [ 0 b ]  =  − det [ c d ]  =  −bc.
          [ c d ]         [ c d ]           [ 0 b ]

• If a ≠ 0, then

      det [ a b ]  =  a · det [ 1 b/a ]  =  a · det [ 1      b/a      ]
          [ c d ]             [ c  d  ]             [ 0  d − c · b/a  ]

                   =  a · 1 · (d − bc/a) = ad − bc.
Theorem (Existence of the determinant). There exists one and only one function from
the set of square matrices to the real numbers that satisfies the four defining proper-
ties.
No matter which row operations you do, you will always compute the same
value for the determinant.
Proof. If A is invertible, then its reduced row echelon form is the identity matrix by
the invertible matrix theorem in Section 4.5. Since row operations do not change
whether the determinant is zero, and since det(In) = 1, this implies det(A) ≠ 0.
Conversely, if A is not invertible, then it is row equivalent to a matrix with a zero
row. Again, row operations do not change whether the determinant is nonzero, so
in this case det(A) = 0.
By the invertibility property, a matrix that does not satisfy any of the properties
of the invertible matrix theorem in Section 4.5 has zero determinant.
Corollary. Let A be a square matrix. If the rows or columns of A are linearly depen-
dent, then det(A) = 0.
Proof. If the columns of A are linearly dependent, then A is not invertible by state-
ment 8 of the invertible matrix theorem in Section 4.5. Suppose now that the rows
of A are linearly dependent. If r1 , r2 , . . . , rn are the rows of A, then one of the rows
is in the span of the others, so we have an equation like
    r2 = 3r1 − r3 + 2r4.

If we perform the row operations

    R2 = R2 − 3R1;   R2 = R2 + R3;   R2 = R2 − 2R4,

then the second row of the resulting matrix is zero. Hence A is not invertible in
this case either.
Alternatively, if the rows of A are linearly dependent, then one can combine
statement 8 of the invertible matrix theorem in Section 4.5 and the transpose
property below to conclude that det(A) = 0.
In particular, if two rows/columns of A are multiples of each other, then det(A) =
0. We also recover the fact that a matrix with a row or column of zeros has deter-
minant zero.
Example. The following matrices all have zero determinant:

    [ 0  2 −1 ]     [ 5 −15 11 ]     [  3 1 2  4 ]     [  π   e  11 ]
    [ 0  5 10 ] ,   [ 3  −9  2 ] ,   [  0 0 0  0 ] ,   [ 3π  3e  33 ] .
    [ 0 −7  3 ]     [ 2  −6 16 ]     [  4 2 5 12 ]     [ 12  −7   2 ]
                                     [ −1 3 4  8 ]
The proofs of the multiplicativity property and the transpose property below,
as well as the cofactor expansion theorem in Section 5.2 and the determinants
and volumes theorem in Section 5.3, use the following strategy: define another
function d : {n × n matrices} → R, and prove that d satisfies the same four defining
properties as the determinant. By the existence theorem, the function d is equal to
the determinant. This is an advantage of defining a function via its properties: in
order to prove it is equal to another function, one only has to check the defining
properties.
The proof of the Claim is by direct calculation; we leave it to the reader to gener-
alize the above equalities to n × n matrices.
As a consequence of the Claim and the four defining properties, we have the
following observation. Let C be any square matrix.
1. If E is the elementary matrix for a row replacement, then det(EC) = det(C).
In other words, left-multiplication by E does not change the determinant.
4. We have

       d(In) = det(In B) / det(B) = det(B) / det(B) = 1.
det(An ) = det(A)n
for all n ≥ 1. If A is invertible, then the equation holds for all n ≤ 0 as well; in
particular,
1
det(A−1 ) = .
det(A)
Proof. Using the multiplicativity property, we compute

    det(A²) = det(A · A) = det(A) det(A) = det(A)²

and

    det(A³) = det(A² · A) = det(A²) det(A) = det(A)² det(A) = det(A)³,

and so on.
Nowhere did we have to compute the 100th power of A! (We will learn an efficient
way to do that in Section 6.4.)
Proof. The determinant of the product is the product of the determinants by the
multiplicativity property:
Recall that the transpose of a matrix A is the matrix AT whose rows are the columns
of A, as in the following picture:

    A = [ a11 a12 a13 a14 ]          AT = [ a11 a21 a31 ]
        [ a21 a22 a23 a24 ]               [ a12 a22 a32 ]
        [ a31 a32 a33 a34 ]               [ a13 a23 a33 ]
                                          [ a14 a24 a34 ]

Recall also that the transpose reverses the order of a product:

    (AB)T = BT AT.

Transpose property. For any square matrix A, we have

    det(A) = det(AT).
Proof. We follow the same strategy as in the proof of the multiplicativity property:
namely, we define
d(A) = det(AT ),
and we show that d satisfies the four defining properties of the determinant. Again
we use elementary matrices, also introduced in the proof of the multiplicativity
property.
1. Let C 0 be the matrix obtained by doing a row replacement on C, and let E be
the elementary matrix for this row replacement, so C 0 = EC. The elementary
matrix for a row replacement is either upper-triangular or lower-triangular,
with ones on the diagonal:
    R1 = R1 + 3R3 :  [ 1 0 3 ]        R3 = R3 + 3R1 :  [ 1 0 0 ]
                     [ 0 1 0 ]                         [ 0 1 0 ]
                     [ 0 0 1 ]                         [ 3 0 1 ] .
3. Let C′ be the matrix obtained by swapping two rows of C, and let E be the
   elementary matrix for this row swap, so C′ = EC. Then E is equal to its own
   transpose:

       R1 ←→ R2 :  [ 0 1 0 ]T     [ 0 1 0 ]
                   [ 1 0 0 ]   =  [ 1 0 0 ] .
                   [ 0 0 1 ]      [ 0 0 1 ]
4. Since InT = In, we have d(In) = det(InT) = det(In) = 1.
The transpose property is very useful. For concreteness, we note that det(A) =
det(AT ) means, for instance, that
    det [ 1 2 3 ]        [ 1 4 7 ]
        [ 4 5 6 ]  = det [ 2 5 8 ] .
        [ 7 8 9 ]        [ 3 6 9 ]
This implies that the determinant has the curious feature that it also behaves well
with respect to column operations. Indeed, a column operation on A is the same
as a row operation on AT , and det(A) = det(AT ).
Corollary. The determinant satisfies the following properties with respect to column
operations:

1. Doing a column replacement on A does not change det(A).

2. Scaling a column of A by a scalar c multiplies the determinant by c.

3. Swapping two columns of A negates the determinant.
The previous corollary makes it easier to compute the determinant: one is al-
lowed to do row and column operations when simplifying the matrix. (Of course,
one still has to keep track of how the row and column operations change the de-
terminant.)
Example. Compute

    det [ 2 7 4 ]
        [ 3 1 3 ]
        [ 4 0 1 ] .
Solution. It takes fewer column operations than row operations to make this
matrix upper-triangular:
    [ 2 7 4 ]   C1 = C1 − 4C3   [ −14 7 4 ]
    [ 3 1 3 ]   −−−−−−−−→       [  −9 1 3 ]
    [ 4 0 1 ]                   [   0 0 1 ]

                C1 = C1 + 9C2   [ 49 7 4 ]
                −−−−−−−−→       [  0 1 3 ]
                                [  0 0 1 ]
We performed two column replacements, which does not change the determi-
nant; therefore,
    det [ 2 7 4 ]        [ 49 7 4 ]
        [ 3 1 3 ]  = det [  0 1 3 ]  = 49.
        [ 4 0 1 ]        [  0 0 1 ]
T (x) = det(x, v2 , . . . , vn ).
w = cv + c2 v2 + · · · + cn vn
R 1 = R 1 − c2 R 2 ; R1 = R1 − c3 R3 ; ... R 1 = R 1 − cn R n ,
v + w − (c2 v2 + · · · + cn vn ) = v + cv = (1 + c)v.
Therefore,
R 1 = R 1 + c2 R 2 ; R 1 = R 1 + c3 R 3 ; ... R 1 = R 1 + cn R n
T (cv) = det(cv, v2 , . . . , vn )
= det(cv + c2 v2 + · · · + cn vn , v2 , . . . , vn )
= det(w, v2 , . . . , vn ) = T (w),
Therefore,

    det [ −1  7 2 ]          [ 1  7 2 ]          [ 0  7 2 ]          [ 0  7 2 ]
        [  2 −3 2 ]  = − det [ 0 −3 2 ]  + 2 det [ 1 −3 2 ]  + 3 det [ 0 −3 2 ] .
        [  3  1 1 ]          [ 0  1 1 ]          [ 0  1 1 ]          [ 1  1 1 ]
    det(A⁻¹) = 1 / det(A).

    det(AT) = det(A).
Objectives
1. Learn to recognize which methods are best suited to compute the determi-
nant of a given matrix.
2. Recipes: the determinant of a 3 × 3 matrix, compute the determinant using
cofactor expansions.
3. Vocabulary words: minor, cofactor.
Ci j = (−1)i+ j det(Ai j ).
Note that the signs of the cofactors follow a “checkerboard pattern.” Namely,
(−1)i+ j is pictured in this matrix:
    [ + − + − ]
    [ − + − + ]
    [ + − + − ]
    [ − + − + ] .
Example. For

    A = [ 1 2 3 ]
        [ 4 5 6 ] ,
        [ 7 8 9 ]

we have

    A23 = [ 1 2 ]          C23 = (−1)^(2+3) det [ 1 2 ]  = (−1)(−6) = 6.
          [ 7 8 ]                               [ 7 8 ]
We want to show that d(A) = det(A). Instead of showing that d satisfies the four
defining properties of the determinant in Section 5.1, we will prove that it satisfies
the three alternative defining properties in Section 5.1, which were shown to be
equivalent.
1. We claim that d is multilinear in the rows of A. Let A be the matrix with rows
v1 , v2 , . . . , vi−1 , v + w, vi+1 , . . . , vn :
    A = [   a11      a12      a13   ]
        [ b1 + c1  b2 + c2  b3 + c3 ] .
        [   a31      a32      a33   ]
On the other hand, the (i, 1)-cofactors of A, B, and C are all the same:
    (−1)^(2+1) det(A21) = (−1)^(2+1) det [ a12 a13 ]
                                         [ a32 a33 ]
                        = (−1)^(2+1) det(B21) = (−1)^(2+1) det(C21).
Now we compute

    d(A) = (−1)^(i+1) (bi + ci) det(Ai1) + Σ_{i′≠i} (−1)^(i′+1) ai′1 det(Ai′1)

         = (−1)^(i+1) bi det(Bi1) + (−1)^(i+1) ci det(Ci1)
               + Σ_{i′≠i} (−1)^(i′+1) ai′1 ( det(Bi′1) + det(Ci′1) )

         = ( (−1)^(i+1) bi det(Bi1) + Σ_{i′≠i} (−1)^(i′+1) ai′1 det(Bi′1) )
               + ( (−1)^(i+1) ci det(Ci1) + Σ_{i′≠i} (−1)^(i′+1) ai′1 det(Ci′1) )

         = d(B) + d(C),
as desired. This shows that d(A) satisfies the first defining property in the
rows of A.
We still have to show that d(A) satisfies the second defining property in the
rows of A. Let B be the matrix obtained by scaling the ith row of A by a factor
of c:
    A = [ a11 a12 a13 ]          B = [ a11   a12   a13  ]
        [ a21 a22 a23 ]              [ ca21  ca22  ca23 ] .
        [ a31 a32 a33 ]              [ a31   a32   a33  ]
On the other hand, the (i, 1)-cofactors of A and B are the same:
    (−1)^(2+1) det(B21) = (−1)^(2+1) det [ a12 a13 ]  = (−1)^(2+1) det(A21).
                                         [ a32 a33 ]
Now we compute

    d(B) = (−1)^(i+1) c ai1 det(Bi1) + Σ_{i′≠i} (−1)^(i′+1) ai′1 det(Bi′1)

         = (−1)^(i+1) c ai1 det(Ai1) + Σ_{i′≠i} (−1)^(i′+1) ai′1 · c det(Ai′1)

         = c ( (−1)^(i+1) ai1 det(Ai1) + Σ_{i′≠i} (−1)^(i′+1) ai′1 det(Ai′1) )

         = c d(A),
as desired. This completes the proof that d(A) is multilinear in the rows of
A.
2. Now we show that d(A) = 0 if A has two identical rows. Suppose that rows
i1 , i2 of A are identical, with i1 < i2 :
    A = [ a11 a12 a13 a14 ]
        [ a21 a22 a23 a24 ]
        [ a31 a32 a33 a34 ] .
        [ a11 a12 a13 a14 ]
The (i1 , 1)-minor can be transformed into the (i2 , 1)-minor using i2 − i1 − 1
row swaps:
Therefore,
3. It remains to show that d(In) = 1. The first term is the only nonzero term in
   the cofactor expansion of the identity:
This proves that det(A) = d(A), i.e., that cofactor expansion along the first column
computes the determinant.
Now we show that cofactor expansion along the jth column also computes the
determinant. By performing j − 1 column swaps, one can move the jth column
of a matrix to the first column, keeping the other columns in order. For example,
here we move the third column to the first, using two column swaps:
Let B be the matrix obtained by moving the jth column of A to the first column
in this way. Then the (i, j) minor Aij is equal to the (i, 1) minor Bi1, since deleting
the jth column of A is the same as deleting the first column of B. By construction,
the (i, j)-entry aij of A is equal to the (i, 1)-entry bi1 of B. Since we know that we
can compute determinants by expanding along the first column, we have

    det(B) = Σ_{i=1}^{n} (−1)^(i+1) bi1 det(Bi1) = Σ_{i=1}^{n} (−1)^(i+1) aij det(Aij).
This proves that cofactor expansion along the jth column computes the determi-
nant of A.
By the transpose property in Section 5.1, the cofactor expansion along the ith
row of A is the same as the cofactor expansion along the ith column of AT . Again
by the transpose property, we have det(A) = det(AT ), so expanding cofactors along
a row also computes the determinant.
Note that the theorem actually gives 2n different formulas for the determinant:
one for each row and one for each column. For instance, the formula for cofactor
expansion along the first column is
    det(A) = Σ_{i=1}^{n} ai1 Ci1 = a11 C11 + a21 C21 + · · · + an1 Cn1
           = a11 det(A11) − a21 det(A21) + a31 det(A31) − · · · ± an1 det(An1).
Remember, the determinant of a matrix is just a number, defined by the four defin-
ing properties in Section 5.1, so to be clear:
You obtain the same number by expanding cofactors along any row or column.
Now that we have a recursive formula for the determinant, we can finally prove
the existence theorem in Section 5.1.
Proof. Let us review what we actually proved in Section 5.1. We showed that if
det: {n × n matrices} → R is any function satisfying the four defining properties
of the determinant (or the three alternative defining properties), then it also satis-
fies all of the wonderful properties proved in that section. In particular, since det
can be computed using row reduction by this recipe in Section 5.1, it is uniquely
characterized by the defining properties. What we did not prove was the exis-
tence of such a function, since we did not know that two different row reduction
procedures would always compute the same answer.
Consider the function d defined by cofactor expansion along the first column:

    d(A) = Σ_{i=1}^{n} (−1)^(i+1) ai1 det(Ai1).
Example. Compute the determinant of

    A = [  2 1 3 ]
        [ −1 2 1 ]
        [ −2 2 3 ] .

Solution. We make the somewhat arbitrary choice to expand along the first row.
The minors and cofactors are
    A11 = [ 2 1 ]        C11 = + det [ 2 1 ]  = 4
          [ 2 3 ]                    [ 2 3 ]

    A12 = [ −1 1 ]       C12 = − det [ −1 1 ]  = 1
          [ −2 3 ]                   [ −2 3 ]

    A13 = [ −1 2 ]       C13 = + det [ −1 2 ]  = 2.
          [ −2 2 ]                   [ −2 2 ]
Thus,
det(A) = a11 C11 + a12 C12 + a13 C13 = (2)(4) + (1)(1) + (3)(2) = 15.
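Cofactor expansion translates directly into a recursive program. The following
Python sketch is ours, not the text's (the name det_cofactor is made up); it expands
along the first row, and although it is far too slow for large matrices (it does on the
order of n! work), it mirrors the definition exactly.

    def det_cofactor(A):
        """Determinant by cofactor expansion along the first row:
           det(A) = sum over j of (-1)**(1+j) * a_{1j} * det(A_{1j})."""
        n = len(A)
        if n == 1:
            return A[0][0]                  # the determinant of a 1x1 matrix is its entry
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j+1:] for row in A[1:]]   # delete row 1 and column j+1
            total += (-1) ** j * A[0][j] * det_cofactor(minor)
        return total

    # The example above:
    print(det_cofactor([[2, 1, 3], [-1, 2, 1], [-2, 2, 3]]))   # 15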
The determinant of a 2 × 2 matrix. Let us compute (again) the determinant of a
general 2 × 2 matrix
    A = [ a b ]
        [ c d ] .

The minors are

    A11 = ( d )      A12 = ( c )
    A21 = ( b )      A22 = ( a ).
The minors are all 1 × 1 matrices. Since the determinant of a 1 × 1 matrix is just
the number inside of it, the cofactors are
C11 = + det(A11 ) = d C12 = − det(A12 ) = −c
C21 = − det(A21 ) = −b C22 = + det(A22 ) = a
Expanding cofactors along the first column, we find that
det(A) = aC11 + cC21 = ad − bc,
which agrees with the formulas in this definition in Section 4.5 and this example
in Section 5.1.
The determinant of a 3 × 3 matrix. We can also use cofactor expansions to find
a formula for the determinant of a 3 × 3 matrix. Let us compute the determinant
of

    A = [ a11 a12 a13 ]
        [ a21 a22 a23 ]
        [ a31 a32 a33 ]
by expanding along the first row. The minors and cofactors are:
    A11 = [ a22 a23 ]        C11 = + det [ a22 a23 ]
          [ a32 a33 ]                    [ a32 a33 ]

    A12 = [ a21 a23 ]        C12 = − det [ a21 a23 ]
          [ a31 a33 ]                    [ a31 a33 ]

    A13 = [ a21 a22 ]        C13 = + det [ a21 a22 ] .
          [ a31 a32 ]                    [ a31 a32 ]
    det [ a11 a12 a13 ]     a11 a22 a33 + a12 a23 a31 + a13 a21 a32
        [ a21 a22 a23 ]  =
        [ a31 a32 a33 ]     − a13 a22 a31 − a11 a23 a32 − a12 a21 a33

This formula can be remembered by copying the first two columns to the right of
the matrix, adding the products of the downward diagonals, and subtracting the
products of the upward diagonals:

    [ a11 a12 a13 ] a11 a12          [ a11 a12 a13 ] a11 a12
    [ a21 a22 a23 ] a21 a22    −     [ a21 a22 a23 ] a21 a22
    [ a31 a32 a33 ] a31 a32          [ a31 a32 a33 ] a31 a32
Alternatively, it is not necessary to repeat the first two columns if you allow your
diagonals to “wrap around” the sides of a matrix, like in Pac-Man or Asteroids.
Example. Find the determinant of

    A = [ 1  3  5 ]
        [ 2  0 −1 ]
        [ 4 −3  1 ] .
Solution. We repeat the first two columns on the right, then add the products of
the downward diagonals and subtract the products of the upward diagonals:
    [ 1  3  5 ] 1  3          [ 1  3  5 ] 1  3
    [ 2  0 −1 ] 2  0    −     [ 2  0 −1 ] 2  0
    [ 4 −3  1 ] 4 −3          [ 4 −3  1 ] 4 −3

    det [ 1  3  5 ]     (1)(0)(1) + (3)(−1)(4) + (5)(2)(−3)
        [ 2  0 −1 ]  =                                        =  −51.
        [ 4 −3  1 ]     − (5)(0)(4) − (1)(−1)(−3) − (3)(2)(1)
Solution. The fourth column has two zero entries. We expand along the fourth
column to find
    det(A) = 2 det [ −2 −3  2 ]  − 5 det [  2  5 −3 ]
                   [  1  3 −2 ]          [  1  3 −2 ]
                   [ −1  6  4 ]          [ −1  6  4 ]

             − 0 det(don't care) + 0 det(don't care).
We only have to compute two cofactors. We can find these determinants using any
method we wish; for the sake of illustration, we will expand cofactors on one and
use the formula for the 3 × 3 determinant on the other.
Expanding along the first column, we compute

    det [ −2 −3  2 ]
        [  1  3 −2 ]
        [ −1  6  4 ]

      = −2 det [ 3 −2 ]  − det [ −3 2 ]  − det [ −3  2 ]
               [ 6  4 ]        [  6 4 ]        [  3 −2 ]

      = −2(24) − (−24) − 0 = −48 + 24 + 0 = −24.
    A = [ −λ   2    7   12  ]
        [  3  1−λ   2   −4  ]
        [  0   1   −λ    7  ]
        [  0   0    0   2−λ ] .
We only have to compute one cofactor. To do so, first we clear the (3, 3)-entry by
performing the column replacement C3 = C3 + λC2 , which does not change the
determinant:
    det [ −λ   2   7 ]        [ −λ   2       7 + 2λ     ]
        [  3  1−λ  2 ]  = det [  3  1−λ  2 + λ(1 − λ)   ] .
        [  0   1  −λ ]        [  0   1          0       ]
= −λ3 + λ2 + 8λ + 21.
Therefore, we have
2. Cofactor expansion.
This is usually most efficient when there is a row or column with several
zero entries, or if the matrix has unknown entries.
Remember, all methods for computing the determinant yield the same number.
    A = [ a b ]     =⇒     A⁻¹ = (1 / det(A)) [  d −b ]
        [ c d ]                               [ −c  a ] .
    A⁻¹ = (1 / det(A)) [ C11 C21 ]
                       [ C12 C22 ] .
Note that the (i, j) cofactor Cij goes in the (j, i) entry of the adjugate matrix, not the
(i, j) entry: the adjugate matrix is the transpose of the cofactor matrix.
Remark. In fact, one always has A · adj(A) = adj(A) · A = det(A)I n , whether or not
A is invertible.
Example. The minors of the matrix

    A = [ 1 0 1 ]
        [ 0 1 1 ]
        [ 1 1 0 ]

are

    A11 = [ 1 1 ]     A12 = [ 0 1 ]     A13 = [ 0 1 ]
          [ 1 0 ]           [ 1 0 ]           [ 1 1 ]

    A21 = [ 0 1 ]     A22 = [ 1 1 ]     A23 = [ 1 0 ]
          [ 1 0 ]           [ 1 0 ]           [ 1 1 ]

    A31 = [ 0 1 ]     A32 = [ 1 1 ]     A33 = [ 1 0 ] .
          [ 1 1 ]           [ 0 1 ]           [ 0 1 ]
It is clear from the previous example that (5.2.1) is a very inefficient way of
computing the inverse of a matrix, compared to augmenting by the identity matrix
and row reducing, as in this subsection in Section 4.5. However, it has its uses.
• If a matrix has unknown entries, then it is difficult to compute its inverse
using row reduction, for the same reason it is difficult to compute the de-
terminant that way: one cannot be sure whether an entry containing an
unknown is a pivot or not.
• This formula is useful for theoretical purposes. Notice that the only denomi-
nators in (5.2.1) occur when dividing by the determinant: computing cofac-
tors only involves multiplication and addition, never division. This means,
for instance, that if the determinant is very small, then any measurement
error in the entries of the matrix is greatly magnified when computing the
inverse. In this way, (5.2.1) is useful in error analysis.
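As an illustration, here is a hedged Python sketch of the adjugate formula (5.2.1).
The helper names minor, det, and inverse_by_adjugate are our own, and the 3 × 3
matrix is the one whose minors are listed above. Note the transpose in the final
list comprehension: the (i, j) cofactor lands in the (j, i) entry of the inverse.

    from fractions import Fraction

    def minor(A, i, j):
        """The matrix obtained from A by deleting row i and column j (0-indexed)."""
        return [row[:j] + row[j+1:] for r, row in enumerate(A) if r != i]

    def det(A):
        """Determinant by cofactor expansion along the first row."""
        if len(A) == 1:
            return A[0][0]
        return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

    def inverse_by_adjugate(A):
        """A^{-1} = (1/det A) * adj(A); the (i, j) cofactor goes in the (j, i) entry."""
        n = len(A)
        d = Fraction(det(A))
        if d == 0:
            raise ValueError("matrix is not invertible")
        return [[(-1) ** (i + j) * det(minor(A, i, j)) / d for i in range(n)]
                for j in range(n)]

    A = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
    print(inverse_by_adjugate(A))
    # first row: [Fraction(1, 2), Fraction(-1, 2), Fraction(1, 2)], and so on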
The proof of the theorem uses an interesting trick called Cramer’s Rule, which
gives a formula for the entries of the solution of an invertible matrix equation.
Cramer’s Rule. Let x = (x 1 , x 2 , . . . , x n ) be the solution of Ax = b, where A is an
invertible n × n matrix and b is a vector in Rn . Let Ai be the matrix obtained from A
by replacing the ith column by b. Then
    xi = det(Ai) / det(A).
Proof. First suppose that A is the identity matrix, so that x = b. Then the matrix
Ai looks like this:
    [ 1 0 b1 0 ]
    [ 0 1 b2 0 ]
    [ 0 0 b3 0 ] .
    [ 0 0 b4 1 ]
Expanding cofactors along the ith row, we see that det(Ai ) = bi , so in this case,
    xi = bi = det(Ai) = det(Ai) / det(A).
In particular, det(A) and det(Ai ) are both scaled by a factor of c, so det(Ai )/ det(A)
is unchanged.
In particular, det(A) and det(Ai ) are both negated, so det(Ai )/ det(A) is un-
changed.
Now we compute

    det(A1) / det(A) = (d − 2b) / (ad − bc)        det(A2) / det(A) = (2a − c) / (ad − bc).

It follows that

    x = (1 / (ad − bc)) [ d − 2b ]
                        [ 2a − c ] .
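Cramer's rule is equally easy to program. The sketch below is ours (cramer_2x2 is not
a standard library function): it solves a 2 × 2 system by forming each Ai and dividing
determinants. The text works this out symbolically for a general 2 × 2 matrix; here a
numeric matrix is chosen purely for illustration, with the same right-hand side (1, 2).

    from fractions import Fraction

    def det2(M):
        return M[0][0] * M[1][1] - M[0][1] * M[1][0]

    def cramer_2x2(A, b):
        """Solve Ax = b for an invertible 2x2 matrix A using Cramer's rule:
           x_i = det(A_i) / det(A), where A_i is A with column i replaced by b."""
        d = Fraction(det2(A))
        x = []
        for i in range(2):
            Ai = [row[:] for row in A]
            for r in range(2):
                Ai[r][i] = b[r]          # replace column i by b
            x.append(det2(Ai) / d)
        return x

    print(cramer_2x2([[1, 2], [3, 4]], [1, 2]))   # [Fraction(0, 1), Fraction(1, 2)]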
Now we use Cramer’s rule to prove the first theorem of this subsection.
Proof. The jth column of A−1 is x j = A−1 e j . This vector is the solution of the matrix
equation
Ax = A A−1 e j = I n e j = e j .
By Cramer’s rule, the ith entry of x j is det(Ai )/ det(A), where Ai is the matrix
obtained from A by replacing the ith column of A by e j :
a11 a12 0 a14
a a 1 a24
Ai = 21 22 (i = 3, j = 2).
a31 a32 0 a34
a41 a42 0 a44
Expanding cofactors along the ith column, we see the determinant of Ai is exactly
the ( j, i)-cofactor C ji of A. Therefore, the jth column of A−1 is
    xj = (1 / det(A)) [ Cj1 ]
                      [ Cj2 ]
                      [  ⋮  ] ,
                      [ Cjn ]

and thus

    A⁻¹ = [ x1 x2 · · · xn ] = (1 / det(A)) [ C11      C21      · · ·  Cn−1,1      Cn1     ]
                                            [ C12      C22      · · ·  Cn−1,2      Cn2     ]
                                            [  ⋮        ⋮                 ⋮          ⋮     ]
                                            [ C1,n−1   C2,n−1   · · ·  Cn−1,n−1    Cn,n−1  ]
                                            [ C1n      C2n      · · ·  Cn−1,n      Cnn     ]
Objectives
1. Understand the relationship between the determinant of a matrix and the
volume of a parallelepiped.
2. Learn to use determinants to compute volumes of parallelograms and trian-
gles.
3. Learn to use determinants to compute the volume of some curvy shapes like
ellipses.
4. Pictures: parallelepiped, the image of a curvy shape under a linear transfor-
mation.
5. Theorem: determinants and volumes.
6. Vocabulary word: parallelepiped.
When does a parallelepiped have zero volume? This can happen only if the
parallelepiped is flat, i.e., it is squashed into a lower dimension.
This means exactly that {v1 , v2 , . . . , vn } is linearly dependent, which by this corol-
lary in Section 5.1 means that the matrix with rows v1 , v2 , . . . , vn has determinant
zero. To summarize:
| det(A)| = vol(P).
Proof. Since the four defining properties characterize the determinant, they also
characterize the absolute value of the determinant. Explicitly, | det | is a function
on square matrices which satisfies these properties:
The absolute value of the determinant is the only such function: indeed, by this
recipe in Section 5.1, if you do some number of row operations on A to obtain a
matrix B in row echelon form, then
    | det(A)| = (product of the diagonal entries of B) / (product of scaling factors used).
For a square matrix A, we abuse notation and let vol(A) denote the volume
of the parallelepiped determined by the rows of A. Then we can regard vol as a
function from the set of square matrices to the real numbers. We will show that
vol also satisfies the above four properties.

1. For simplicity, we consider a row replacement of the form Rn = Rn + cRi.
   The volume of a parallelepiped is the volume of its base, times its height:
   here the "base" is the parallelepiped determined by v1, v2, . . . , vn−1, and the
   "height" is the perpendicular distance of vn from the base.
2. For simplicity, we consider a row scale of the form R n = cR n . This scales the
length of vn by a factor of |c|, which also scales the perpendicular distance
of vn from the base by a factor of |c|. Thus, vol(A) is scaled by |c|.
3. Swapping two rows of A just reorders the vectors v1, v2, . . . , vn, which does
   not change the parallelepiped they determine. Hence vol(A) is unchanged.
4. The rows of the identity matrix In are the standard coordinate vectors e1, e2, . . . , en.
   The associated parallelepiped is the unit cube, which has volume 1. Thus,
   vol(In) = 1.

Since det(A) = det(AT) by the transpose property, the absolute value of det(A)
is also equal to the volume of the parallelepiped determined by the columns of A.
Example (Length). A 1 × 1 matrix A is just a number a . In this case, the paral-
lelepiped P determined by its one row is just the interval [0, a] (or [a, 0] if a < 0).
The “volume” of a region in R1 = R is just its length, so it is clear in this case that
vol(P) = |a|.
    area = | det [ a b ] |  = |ad − bc|
                 [ c d ]
Example. Find the area of the parallelogram with sides (1, 3) and (2, −3).
Note that we do not need to know where the origin is in the picture: vectors
are determined by their length and direction, not where they start. The area is
    area = | det [ −1 −4 ] |  = |1 + 8| = 9.
                 [  2 −1 ]
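In code, the area computation is a one-liner around the determinant. The following
Python sketch is ours (it assumes NumPy is available); it checks the parallelogram
just computed and the one from the example statement, both of which have area 9.

    import numpy as np

    def parallelogram_area(v1, v2):
        """Area of the parallelogram with sides v1 and v2 in R^2: |det [v1; v2]|."""
        return abs(np.linalg.det(np.array([v1, v2], dtype=float)))

    print(parallelogram_area([-1, 2], [-4, -1]))   # 9.0 (up to rounding)
    print(parallelogram_area([1, 3], [2, -3]))     # 9.0 as well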
Example (Area of a triangle). Find the area of the triangle with vertices (−1, −2), (2, −1), (1, 3).
    det [ — v1 — ]  > 0          det [ — v1 — ]  < 0
        [ — v2 — ]                   [ — v2 — ]
For example, if v1 = (a, b), then the counterclockwise rotation of v1 by 90° is
v2 = (−b, a) by this example in Section 4.3, and

    det [  a  b ]  = a² + b² > 0.
        [ −b  a ]

On the other hand, the clockwise rotation of v1 by 90° is (b, −a), and

    det [ a  b ]  = −a² − b² < 0.
        [ b −a ]
For a 3×3 matrix with rows v1 , v2 , v3 , the right-hand rule determines the sign of
the determinant. If you point the index finger of your right hand in the direction
of v1 and your middle finger in the direction of v2 , then the determinant is positive
if your thumb points roughly in the direction of v3 , and it is negative otherwise.
    det [ — v1 — ]  > 0          det [ — v1 — ]  < 0
        [ — v2 — ]                   [ — v2 — ]
        [ — v3 — ]                   [ — v3 — ]
Since the unit cube has volume 1 and its image has volume | det(A)|, the trans-
formation T scaled the volume of the cube by a factor of | det(A)|. To rephrase:
The notation T (S) means the image of the region S under the transformation
T . In set builder notation, this is the subset
    T(S) = { T(x) | x in S }.
In fact, T scales the volume of any region in Rn by the same factor, even for
curvy regions.
Proof. Let C be the unit cube, let v1, v2, . . . , vn be the columns of A, and let P
be the parallelepiped determined by these vectors, so T(C) = P and vol(P) =
| det(A)|. For ε > 0 we let εC be the cube with side lengths ε, i.e., the parallelepiped
determined by the vectors εe1, εe2, . . . , εen, and we define εP similarly. By the
second defining property, T takes εC to εP. The volume of εC is εⁿ (we scaled each
of the n standard vectors by a factor of ε) and the volume of εP is εⁿ | det(A)| (for
the same reason), so we have shown that T scales the volume of εC by | det(A)|.
    A = [ 1 2 ]
        [ 2 1 ]
Solution. The area of the unit circle is π, so the area of S is π/2. The transfor-
mation T scales areas by a factor of | det(A)| = |1 − 4| = 3, so
    vol(T(S)) = 3 vol(S) = 3π/2.
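One way to see the theorem numerically is a Monte Carlo check; the sketch below is
ours, not the text's. It estimates the area of S (the upper half of the unit disk) and of
T(S) = A(S) by random sampling and compares the ratio with | det(A)|, using the fact
that a point p lies in A(S) exactly when A⁻¹p lies in S.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[1.0, 2.0], [2.0, 1.0]])          # the matrix from the example above
    Ainv = np.linalg.inv(A)

    def in_S(p):                                     # S = upper half of the unit disk
        x, y = p
        return x * x + y * y <= 1 and y >= 0

    # Estimate area(S) by sampling a box of area 4 that contains S.
    pts = rng.uniform(-1, 1, size=(100_000, 2))
    area_S = 4 * np.mean([in_S(p) for p in pts])

    # A(S) fits inside the box [-4, 4] x [-4, 4], which has area 64.
    pts2 = rng.uniform(-4, 4, size=(100_000, 2))
    area_TS = 64 * np.mean([in_S(Ainv @ p) for p in pts2])

    print(area_S, np.pi / 2)                          # both approximately 1.57
    print(area_TS, abs(np.linalg.det(A)) * area_S)    # both approximately 4.71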
Example (Area of an ellipse). Find the area of the interior E of the ellipse defined
by the equation
    ( (2x − y) / 2 )²  +  ( (y + 3x) / 3 )²  =  1.
Solution. This ellipse is obtained from the unit circle X 2 + Y 2 = 1 by the linear
change of coordinates
    X = (2x − y) / 2        Y = (y + 3x) / 3.
In other words, if we define a linear transformation T : R2 → R2 by
    T(x, y) = ( (2x − y)/2 ,  (y + 3x)/3 ),

then T(x, y) lies on the unit circle C whenever (x, y) lies on E.
The matrix for T is

    [ 1  −1/2 ]
    [ 1   1/3 ] .
Remark (Multiplicativity of | det |). The above theorem also gives a geometric rea-
son for multiplicativity of the (absolute value of the) determinant. Indeed, let A
and B be n × n matrices, and let T, U : Rn → Rn be the corresponding matrix trans-
formations. If C is the unit cube, then
vol T ◦ U(C) = vol T (U(C)) = | det(A)| vol(U(C))
= | det(A)| · | det(B)| vol(C)
= | det(A)| · | det(B)|.
On the other hand, the matrix for the composition T ◦ U is the product AB, so
vol T ◦ U(C) = | det(AB)| vol(C) = | det(AB)|.
4. rabbits produce 0, 6, 8 baby rabbits in their first, second, and third years,
respectively.
What is the asymptotic behavior of this system? What will the rabbit population
look like in 100 years?
Left: the population of rabbits in a given year. Right: the proportions of rabbits in
that year. Choose any values you like for the starting population, and click “Advance
1 year” several times. What do you notice about the long-term behavior of the ratios?
This phenomenon turns out to be due to eigenvectors.
In Section 6.1, we will define eigenvalues and eigenvectors, and show how to
compute the latter; in Section 6.2 we will learn to compute the former. In Sec-
tion 6.3 we introduce the notion of similar matrices, and demonstrate that similar
matrices do indeed behave similarly. In Section 6.4 we study matrices that are
similar to diagonal matrices and in Section 6.5 we study matrices that are simi-
lar to rotation-scaling matrices, thus gaining a solid geometric understanding of
large classes of matrices. Finally, we spend Section 6.6 presenting a common kind
of application of eigenvalues and eigenvectors to real-world problems, including
searching the Internet using Google’s PageRank algorithm.
Objectives
In this section, we define eigenvalues and eigenvectors. These form the most
important facet of the structure theory of square matrices. As such, eigenvalues
and eigenvectors tend to play a key role in the real-life applications of linear alge-
bra.
The vector u is not an eigenvector, because Au is not collinear with u and the
origin.
The vector v is an eigenvector because Av is collinear with v and the origin. The
vector Av has the same length as v, but the opposite direction, so the associated
eigenvalue is −1.
An eigenvector of A is a vector x such that Ax is collinear with x and the origin. Click
and drag the head of x to convince yourself that all such vectors lie either on L, or on
the line perpendicular to L.
The vector u is not an eigenvector, because Au is not collinear with u and the
origin.
An eigenvector of A is a vector x such that Ax is collinear with x and the origin. Click
and drag the head of x to convince yourself that all such vectors lie on the coordinate
axes.
Example (Identity). Find all eigenvalues and eigenvectors of the identity matrix
In.
Solution. The identity matrix has the property that I n v = v for all vectors v in
Rn . We can write this as I n v = 1 · v, so every nonzero vector is an eigenvector with
eigenvalue 1.
On the other hand, any vector v on the x-axis has zero y-coordinate, so it is
not moved by A. Hence v is an eigenvector with eigenvalue 1.
All eigenvectors of a shear lie on the x-axis. Click and drag the head of x to find the
eigenvectors.
This rotation matrix has no eigenvectors. Click and drag the head of x to find one.
6.1.2 Eigenspaces
Let A be an n × n matrix, and let λ be a scalar. The eigenvectors with eigenvalue
λ, if any, are the nonzero solutions of the equation Av = λv. We can rewrite this
equation as follows:
Av = λv
⇐⇒ Av − λv = 0
⇐⇒ Av − λI n v = 0
⇐⇒ (A − λI n )v = 0.
Therefore, the eigenvectors of A with eigenvalue λ, if any, are the nontrivial solu-
tions of the matrix equation (A−λI n )v = 0, i.e., the nonzero vectors in Nul(A−λI n ).
If this equation has no nontrivial solutions, then λ is not an eigenvalue of A.
The above observation is important because it says that finding the eigenvec-
tors for a given eigenvalue means solving a homogeneous system of equations. For
instance, if
7 1 3
A = −3 2 −3 ,
−3 −2 −1
then an eigenvector with eigenvalue λ is a nontrivial solution of the matrix equa-
tion
7 1 3 x x
−3 2 −3 y = λ y .
−3 −2 −1 z z
This translates to the system of equations
7x + y + 3z = λx (7 − λ)x + y+ 3z = 0
( (
−3x + 2 y − 3z = λ y −−−→ −3x + (2 − λ) y − 3z = 0
−3x − 2 y − z = λz −3x − 2 y + (−1 − λ)z = 0.
This is the same as the homogeneous matrix equation
7−λ 1 3 x
−3 2 − λ −3 y = 0,
−3 −2 −1 − λ z
i.e., (A − λI3 )v = 0.
Note. Since a nonzero subspace is infinite, every eigenvalue has infinitely many
eigenvectors. (For example, multiplying an eigenvector by a nonzero scalar gives
another eigenvector.) On the other hand, there can be at most n linearly indepen-
dent eigenvectors of an n × n matrix, since Rn has dimension n.
    [ 1 4 ]   parametric form    x = −4y     parametric vector form    [ x ]      [ −4 ]
    [ 0 0 ]   −−−−−−−−−−−→       y =   y     −−−−−−−−−−−−−−−−−→        [ y ]  = y [  1 ] .
This matrix has determinant −6, so it is invertible. By the invertible matrix theo-
rem in Section 4.5, we have Nul(A − I2 ) = {0}, so 1 is not an eigenvalue.
The eigenvectors of A with eigenvalue −2, if any, are the nonzero solutions of
the matrix equation (A + 2I2 )v = 0. We have
    A + 2I2 = [  2 −4 ]  + 2 [ 1 0 ]  =  [  4 −4 ]
              [ −1 −1 ]      [ 0 1 ]     [ −1  1 ] .
Hence there exist eigenvectors with eigenvalue −2, namely, any nonzero multiple
of (1, 1). A basis for the −2-eigenspace is { (1, 1) }.
The matrix A − 2I3 has two free variables, so the null space of A − 2I3 is nonzero,
and thus 2 is an eigenvalue. A basis for the 2-eigenspace is

    { (0, 1, 0),  (−2, 0, 1) }.

This is a plane in R3.
The eigenvectors of A with eigenvalue 1/2, if any, are the nonzero solutions of
the matrix equation (A − (1/2)I3)v = 0. We have

    A − (1/2)I3 = [  7/2  0   3 ]           [ 1 0 0 ]     [   3    0    3  ]
                  [ −3/2  2  −3 ]  − (1/2)  [ 0 1 0 ]  =  [ −3/2  3/2  −3  ] .
                  [ −3/2  0  −1 ]           [ 0 0 1 ]     [ −3/2   0  −3/2 ]
This is a line in R3 .
The number 0 is an eigenvalue of A if and only if Nul(A − 0I3 ) = Nul(A) is
nonzero. This is the same as asking whether A is noninvertible, by the invertible
matrix theorem in Section 4.5. The determinant of A is det(A) = 2 ≠ 0, so A is
invertible by the invertibility property in Section 5.1. It follows that 0 is not an
eigenvalue of A.
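In code, checking whether a given number λ is an eigenvalue amounts to testing
whether A − λIn has a nontrivial null space. The following Python sketch is ours; it
uses the singular value decomposition to count the dimension of the null space, and
the matrix A is the 3 × 3 matrix from the example above (as reconstructed here).

    import numpy as np

    A = np.array([[ 3.5, 0,  3],
                  [-1.5, 2, -3],
                  [-1.5, 0, -1]])

    def eigenspace_dimension(A, lam, tol=1e-10):
        """Dimension of Nul(A - lam*I): the number of (numerically) zero
           singular values of A - lam*I.  lam is an eigenvalue iff this is > 0."""
        M = A - lam * np.eye(A.shape[0])
        s = np.linalg.svd(M, compute_uv=False)
        return int(np.sum(s < tol))

    for lam in [2, 0.5, 1, 0]:
        dim = eigenspace_dimension(A, lam)
        print(lam, "is an eigenvalue," if dim else "is not an eigenvalue,",
              "eigenspace dimension", dim)
    # 2 has a 2-dimensional eigenspace (a plane), 1/2 a 1-dimensional one (a line),
    # and 1 and 0 are not eigenvalues.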
The 2-eigenspace is the violet plane. This means that A scales every vector in that
plane by a factor of 2. The 12 -eigenspace is the green line. Click and drag the vector x
around to see how A acts on that vector.
The violet line L is the 1-eigenspace, and the green line L ⊥ is the −1-eigenspace.
1. A is invertible.
2. T is invertible.
4. A has n pivots.
6. Nul(A) = {0}.
7. nullity(A) = 0.
10. T is one-to-one.
14. Col(A) = Rn .
16. rank(A) = n.
17. T is onto.
20. det(A) ≠ 0.
Objectives
1. Learn that the eigenvalues of a triangular matrix are the diagonal entries.
Solution. We have
    f(λ) = det(A − λI2) = det ( [ 5 2 ]  −  [ λ 0 ] )
                              ( [ 2 1 ]     [ 0 λ ] )

         = det [ 5−λ   2  ]
               [  2   1−λ ]

         = (5 − λ)(1 − λ) − 2 · 2 = λ² − 6λ + 1.
Example (Finding eigenvalues). Find the eigenvalues and eigenvectors of the matrix

    A = [ 5 2 ]
        [ 2 1 ] .
Example (Finding eigenvalues). Find the eigenvalues and eigenvectors of the matrix

    A = [  0   6  8 ]
        [ 1/2  0  0 ] .
        [  0  1/2 0 ]
    (−λ³ + 3λ + 2) / (λ − 2) = −λ² − 2λ − 1 = −(λ + 1)².
Therefore,
f (λ) = −(λ − 2)(λ + 1)2 ,
so the only eigenvalues are λ = 2, −1.
We compute the 2-eigenspace by solving the homogeneous system (A−2I3 )x =
0. We have
    A − 2I3 = [ −2   6   8 ]    RREF    [ 1 0 −16 ]
              [ 1/2 −2   0 ]    −−→     [ 0 1  −4 ] .
              [  0  1/2 −2 ]            [ 0 0   0 ]
The parametric form and parametric vector form of the solutions are:
    x = 16z            [ x ]       [ 16 ]
    y =  4z     −→     [ y ]  = z  [  4 ] .
    z =   z            [ z ]       [  1 ]
The green line is the −1-eigenspace, and the violet line is the 2-eigenspace.
Form of the characteristic polynomial It is time that we justified the use of the
term “polynomial.” First we need a vocabulary word.
Definition. The trace of a square matrix A is the number Tr(A) obtained by sum-
ming the diagonal entries of A:
       [ a11      a12      · · ·  a1,n−1      a1n    ]
       [ a21      a22      · · ·  a2,n−1      a2n    ]
    Tr [  ⋮        ⋮         ⋱       ⋮          ⋮    ]  =  a11 + a22 + · · · + ann .
       [ an−1,1   an−1,2   · · ·  an−1,n−1    an−1,n ]
       [ an1      an2      · · ·  an,n−1      ann    ]
In other words, the coefficient of λn−1 is ± Tr(A), and the constant term is det(A) (the
other coefficients are just numbers without names).
    f(λ) = det(A − λI2) = det [ a−λ   b  ]  = (a − λ)(d − λ) − bc
                              [  c   d−λ ]

         = λ² − (a + d)λ + (ad − bc) = λ² − Tr(A)λ + det(A).
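For 2 × 2 matrices this gives a two-line computation of the characteristic polynomial
and hence of the eigenvalues. The sketch below is ours: it uses the trace-and-determinant
formula and then finds the roots numerically.

    import numpy as np

    def char_poly_2x2(A):
        """Coefficients of f(lambda) = lambda^2 - Tr(A) lambda + det(A)."""
        tr = A[0][0] + A[1][1]
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        return [1, -tr, det]             # coefficients of lambda^2, lambda^1, lambda^0

    A = [[5, 2], [2, 1]]
    coeffs = char_poly_2x2(A)
    print(coeffs)                         # [1, -6, 1], i.e. lambda^2 - 6*lambda + 1
    print(np.roots(coeffs))               # the eigenvalues, 3 ± 2*sqrt(2)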
Solution. We have
For example, if A has integer entries, then its characteristic polynomial has
integer coefficients. This gives us one way to find a root by hand, if A has an
eigenvalue that is a rational number. Once we have found one root, then we can
reduce the degree by polynomial long division.
The determinant of A is the constant term f (0) = 4; its integer divisors are ±1, ±2, ±4.
We check which are roots:
In the above example, we could have expanded cofactors along the second
column to obtain
    f(λ) = (2 − λ) det [ 7−λ    3   ] .
                       [ −3   −1−λ ]
Since 2 − λ was the only nonzero entry in its column, this expression already
has the 2 − λ term factored out: the rational root theorem was not needed. The
determinant
in the above expression is the characteristic polynomial of the matrix [ 7 3 / −3 −1 ],
so we can compute it using the trace and determinant:

    f(λ) = (2 − λ)( λ² − (7 − 1)λ + (−7 + 9) ) = (2 − λ)(λ² − 6λ + 2).
The constant term is zero, so A has determinant zero. We factor out λ, then eyeball
the roots of the quadratic factor:
Finding Eigenvalues of a 3×3 Matrix. Let A be a 3×3 matrix. Here are some
strategies for factoring its characteristic polynomial f (λ). First, you must find
one eigenvalue:
3. If the matrix is triangular, the roots are the diagonal entries (see below).
    A = [ 1 7 2   4 ]
        [ 0 1 3  11 ]
        [ 0 0 π 101 ] .
        [ 0 0 0   0 ]
6.3 Similarity
Objectives
    A = (1/10) [ 11  6 ]
               [  9 14 ]

has eigenvalues 2 and 1/2, with corresponding eigenvectors v1 = (2/3, 1) and
v2 = (−1, 1). Notice that
The matrices A and D behave similarly. Click “multiply” to multiply the colored points
by D on the left and A on the right. (We will see in Section 6.4 why the points follow
hyperbolic paths.)
The other case of particular importance will be matrices that “behave” like a
rotation matrix: indeed, this will be crucial for understanding Section 6.5 geomet-
rically. See this important note.
In this section, we study in detail the situation when two matrices behave sim-
ilarly with respect to different coordinate systems. In Section 6.4 and Section 6.5,
we will show how to use eigenvalues and eigenvectors to find a simpler matrix
that behaves like a given matrix.
are not similar. Indeed, the second matrix is the identity matrix I2 , so if C is any
invertible 2 × 2 matrix, then
    C⁻¹ I2 C = C⁻¹ C = I2  ≠  [ 3  0 ]
                              [ 0 −2 ] .
As in the above example, one can show that I n is the only matrix that is similar
to I n , and likewise for any scalar multiple of I n .
Proof.
C −1 AC = C −1 (C BC −1 )C = B.
It follows that

    [ −12  5 ]          [ 3  0 ]
    [ −30 13 ]   and    [ 0 −2 ]

are similar to each other.
An = C B n C −1 .
Next we have
A3 = A2 A = (C B 2 C −1 )(C BC −1 ) = C B 2 (C −1 C)BC −1 = C B 3 C −1 .
    A¹⁰⁰ = [ −2  3 ] [ 0 −1 ]¹⁰⁰ [ −2  3 ]⁻¹
           [  1 −1 ] [ 1  0 ]    [  1 −1 ]    .

The matrix [ 0 −1 / 1 0 ] is a counterclockwise rotation by 90°. If we rotate by 90° four
times, then we end up where we started. Hence rotating by 90° one hundred times
is the identity transformation, so

    A¹⁰⁰ = [ −2  3 ] [ 1 0 ] [ −2  3 ]⁻¹  =  [ 1 0 ]
           [  1 −1 ] [ 0 1 ] [  1 −1 ]       [ 0 1 ] .
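The same trick is easy to check numerically. In the Python sketch below (ours,
assuming NumPy), we build A = CBC⁻¹ from the example above and compute A¹⁰⁰
as CB¹⁰⁰C⁻¹; since B¹⁰⁰ is the identity, so is A¹⁰⁰.

    import numpy as np

    C = np.array([[-2.0, 3.0], [1.0, -1.0]])
    B = np.array([[0.0, -1.0], [1.0, 0.0]])          # rotation by 90 degrees
    A = C @ B @ np.linalg.inv(C)

    # A^n = C B^n C^{-1}, so instead of multiplying A by itself 100 times
    # we raise B to the 100th power (four 90-degree rotations give the identity).
    B100 = np.linalg.matrix_power(B, 100)
    A100 = C @ B100 @ np.linalg.inv(C)
    print(np.round(A100, 10))                                  # the identity matrix
    print(np.allclose(A100, np.linalg.matrix_power(A, 100)))   # True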
Since C is the matrix with columns v1 , v2 , . . . , vn , this says that x = C[x]B . Multi-
plying both sides by C −1 gives [x]B = C −1 x. To summarize:
This says that C changes from the B-coordinates to the usual coordinates, and
C −1 changes from the usual coordinates to the B-coordinates.
    x  −(multiply by C⁻¹)→  [x]B  −(multiply by B)→  B[x]B  −(multiply by C)→  Ax
    A = [ 1/2 3/2 ]        B = [ 2  0 ]        C = [ 1  1 ]
        [ 3/2 1/2 ]            [ 0 −1 ]            [ 1 −1 ] .

One can verify that A = C BC⁻¹: see this example in Section 6.4. Let v1 = (1, 1) and
v2 = (1, −1), the columns of C, and let B = {v1, v2}, a basis of R2.
The matrix B is diagonal: it scales the x-direction by a factor of 2 and the
y-direction by a factor of −1.
    x  −(multiply by C⁻¹)→  [x]B  −(scale x by 2, scale y by −1)→  B[x]B  −(multiply by C)→  Ax
Now let x = (1/2)(5, −3) = (5/2, −3/2).

1. We multiply by C⁻¹ to find the B-coordinates of x: [x]B = C⁻¹x = (1/2, 2).

2. Multiplying by B scales the coordinates: B[x]B = (1, −2).

3. Interpreting (1, −2) as a B-coordinate vector, we multiply by C to get

       Ax = C [  1 ]  = v1 − 2v2 = [ −1 ]
              [ −2 ]               [  3 ] .
To summarize:
The geometric relationship between the similar matrices A and B acting on R2 . Click
and drag the heads of x and [x]B . Study this picture until you can reliably predict
where the other three vectors will be after moving one of them: this is the essence of
the geometry of similar matrices.
    A′ = (1/5) [ −8 −9 ]        B = [ 2  0 ]        C′ = (1/2) [ −1 −3 ]
               [  6 13 ]            [ 0 −1 ]                   [  2  1 ] .

Then A′ = C′ B (C′)⁻¹, as one can verify. Let v1′ = (1/2)(−1, 2) and v2′ = (1/2)(−3, 1),
the columns of C′, and let B′ = {v1′, v2′}. Then A′ does the same thing as B, as in the previous
example, except A′ uses the B′-coordinate system. In other words:
The geometric relationship between the similar matrices A0 and B acting on R2 . Click
and drag the heads of x and [x]B0 .
    A = (1/6) [ 7 −17 ]        B = [ 0 −1 ]        C = [ 2 −1/2 ]
              [ 5  −7 ]            [ 1  0 ]            [ 1  1/2 ] .

One can verify that A = C BC⁻¹. Let v1 = (2, 1) and v2 = (1/2)(−1, 1), the columns of C, and
    Ax = C [ −1 ]  = −v1 + v2 = (1/2) [ −5 ]
           [  1 ]                     [ −1 ] .
    x  −(multiply by C⁻¹)→  [x]B  −(rotate by 90°)→  B[x]B  −(multiply by C)→  Ax
Now let x = (1/2)(−1, −2) = (−1/2, −1).

1. We multiply by C⁻¹ to find the B-coordinates of x: [x]B = C⁻¹x = (−1/2, −1).

2. Multiplying by B rotates by 90°: B[x]B = (1, −1/2).

3. Interpreting (1, −1/2) as a B-coordinate vector, we multiply by C to get

       Ax = C [   1  ]  = v1 − (1/2)v2 = (1/4) [ 9 ]
              [ −1/2 ]                         [ 3 ] .
To summarize:
• B rotates counterclockwise around the circle centered at the origin and pass-
ing through e1 and e2 .
• A rotates counterclockwise around the ellipse centered at the origin and pass-
ing through v1 and v2 .
The geometric relationship between the similar matrices A and B acting on R2 . Click
and drag the heads of x and [x]B .
The geometric relationship between the similar matrices A and B acting on R3 . Click
and drag the heads of x and [x]B .
A − λI n = C BC −1 − λC C −1 = C BC −1 − CλC −1
= C BC −1 − CλI n C −1 = C(B − λI n )C −1 .
Therefore,

    det(A − λIn) = det(C) det(B − λIn) det(C⁻¹) = det(B − λIn).
Here we have used the multiplicativity property in Section 5.1 and its corollary in
Section 5.1.
Since the eigenvalues of a matrix are the roots of its characteristic polynomial,
we have shown:
By this theorem in Section 6.2, similar matrices also have the same trace and
determinant.
both have characteristic polynomial f (λ) = (λ − 1)2 , but they are not similar,
because the only matrix that is similar to I2 is I2 itself.
Given that similar matrices have the same eigenvalues, one might guess that
they have the same eigenvectors as well. Upon reflection, this is not what one
should expect: indeed, the eigenvectors should only match up after changing from
one coordinate system to another. This is the content of the next fact, remembering
that C and C −1 change between the usual coordinates and the B-coordinates.
v is an eigenvector of A =⇒ C −1 v is an eigenvector of B
v is an eigenvector of B =⇒ C v is an eigenvector of A.
so A = C BC⁻¹. Let v1 = (1, 1) and v2 = (1, −1), the columns of C. Recall that:
This means that the x-axis is the 2-eigenspace of B, and the y-axis is the −1-
eigenspace of B; likewise, the “v1 -axis” is the 2-eigenspace of A, and the “v2 -axis”
is the −1-eigenspace of A. This is consistent with the fact that multiplication by C
changes e1 into Ce1 = v1 and e2 into Ce2 = v2.
The eigenspaces of A are the lines through v1 and v2 . These are the images under C
of the coordinate axes, which are the eigenspaces of B.
The eigenspaces of A0 are the lines through v10 and v20 . These are the images under C 0
of the coordinate axes, which are the eigenspaces of B.
In other words, the x y-plane is the −1-eigenspace of B, and the z-axis is the 2-
eigenspace of B; likewise, the “v1 , v2 -plane” is the −1-eigenspace of A, and the
“v3-axis” is the 2-eigenspace of A. This is consistent with the fact that multiplication
by C changes e1 into Ce1 = v1, e2 into Ce2 = v2, and e3 into Ce3 = v3.
The −1-eigenspace of A is the green plane, and the 2-eigenspace of A is the violet line.
These are the images under C of the x y-plane and the z-axis, respectively, which are
the eigenspaces of B.
6.4 Diagonalization
Objectives
2. Develop a library of examples of matrices that are and are not diagonalizable.
Diagonal matrices are the easiest kind of matrices to understand: they just
scale the coordinate directions by their diagonal entries. In Section 6.3, we saw
that similar matrices behave in the same way, with respect to different coordinate
systems. Therefore, if a matrix is similar to a diagonal matrix, it is also relatively
easy to understand. This section is devoted to the question: “When is a matrix
similar to a diagonal matrix?”
6.4.1 Diagonalizability
Before answering the above question, first we give it a name.
Example. Let

    A = [ 1/2 3/2 ]  =  [ 1  1 ] [ 2  0 ] [ 1  1 ]⁻¹
        [ 3/2 1/2 ]     [ 1 −1 ] [ 0 −1 ] [ 1 −1 ]    .
Now we come to the primary criterion for diagonalizability. It shows that di-
agonalizability is an eigenvalue problem.
    C = [ |  |        | ]          D = [ λ1  0  · · ·  0 ]
        [ v1 v2 · · · vn ]             [ 0  λ2  · · ·  0 ]
        [ |  |        | ]              [ ⋮   ⋮    ⋱    ⋮ ] ,
                                       [ 0   0  · · · λn ]
    D = [ λ1  0  · · ·   0     0 ]
        [ 0  λ2  · · ·   0     0 ]
        [ ⋮   ⋮    ⋱     ⋮     ⋮ ]
        [ 0   0  · · ·  λn−1   0 ]
        [ 0   0  · · ·   0    λn ]
    [ 1 0 0 ] [ 0 ]        [ 0 ]
    [ 0 2 0 ] [ 0 ]  = 3 · [ 0 ] .
    [ 0 0 3 ] [ 1 ]        [ 1 ]
same order.
    A = [ |  |  |  ] [ λ1 0  0  ] [ |  |  |  ]⁻¹
        [ v1 v2 v3 ] [ 0  λ2 0  ] [ v1 v2 v3 ]
        [ |  |  |  ] [ 0  0  λ3 ] [ |  |  |  ]

      = [ |  |  |  ] [ λ3 0  0  ] [ |  |  |  ]⁻¹
        [ v3 v2 v1 ] [ 0  λ2 0  ] [ v3 v2 v1 ]    .
        [ |  |  |  ] [ 0  0  λ1 ] [ |  |  |  ]
There are other ways of finding different diagonalizations of the same matrix.
For instance, you can scale one of the eigenvectors by a constant c:
    A = [ |  |  |  ] [ λ1 0  0  ] [ |  |  |  ]⁻¹
        [ v1 v2 v3 ] [ 0  λ2 0  ] [ v1 v2 v3 ]
        [ |  |  |  ] [ 0  0  λ3 ] [ |  |  |  ]

      = [ |   |  |  ] [ λ1 0  0  ] [ |   |  |  ]⁻¹
        [ cv1 v2 v3 ] [ 0  λ2 0  ] [ cv1 v2 v3 ]    ,
        [ |   |  |  ] [ 0  0  λ3 ] [ |   |  |  ]
you can find a different basis entirely for an eigenspace of dimension at least 2,
etc.
The green line is the −1-eigenspace of A, and the violet line is the 2-eigenspace. There
are two linearly independent (noncollinear) eigenvectors visible in the picture: choose
any nonzero vector on the green line, and any nonzero vector on the violet line.
Therefore, the eigenvalues are 0 and 2. We need to compute eigenvectors for each
eigenvalue. We start with λ1 = 0:
    (A − 0I2)v = 0  ⇐⇒  [  2/3 −4/3 ] v = 0    −RREF→    [ 1 −2 ] v = 0.
                        [ −2/3  4/3 ]                    [ 0  0 ]

The parametric form is x = 2y, so v1 = (2, 1) is an eigenvector with eigenvalue λ1.
Now we find an eigenvector with eigenvalue λ2 = 2:

    (A − 2I2)v = 0  ⇐⇒  [ −4/3 −4/3 ] v = 0    −RREF→    [ 1 1 ] v = 0.
                        [ −2/3 −2/3 ]                    [ 0 0 ]

The parametric form is x = −y, so v2 = (1, −1) is an eigenvector with eigenvalue 2.
The eigenvectors v1, v2 are linearly independent, so the diagonalization theorem
says that

    A = C DC⁻¹    for    C = [ 2  1 ]    D = [ 0 0 ]
                             [ 1 −1 ]        [ 0 2 ] .

Alternatively, if we choose 2 as our first eigenvalue, then

    A = C′ D′ (C′)⁻¹    for    C′ = [  1 2 ]    D′ = [ 2 0 ]
                                    [ −1 1 ]         [ 0 0 ] .
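In practice one lets a library do the diagonalization. The sketch below is ours: it
applies numpy.linalg.eig to the matrix from this example. The eigenvalues may come
back in a different order and the eigenvector columns are normalized, so C can differ
from the hand computation by column scaling, but A = CDC⁻¹ still holds.

    import numpy as np

    A = np.array([[2/3, -4/3], [-2/3, 4/3]])      # the matrix diagonalized above

    # np.linalg.eig returns the eigenvalues and a matrix whose columns are
    # eigenvectors, so A = C D C^{-1} with D = diag(eigenvalues).
    eigenvalues, C = np.linalg.eig(A)
    D = np.diag(eigenvalues)
    print(eigenvalues)                                 # approximately [0, 2] (in some order)
    print(np.allclose(A, C @ D @ np.linalg.inv(C)))    # True: A = C D C^{-1}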
Therefore, the eigenvalues are 1 and 2. We need to compute eigenvectors for each
eigenvalue. We start with λ1 = 1:
    (A − I3)v = 0  ⇐⇒  [ 3 −3 0 ] v = 0    −RREF→    [ 1 −1 0 ] v = 0.
                       [ 2 −2 0 ]                    [ 0  0 0 ]
                       [ 1 −1 0 ]                    [ 0  0 0 ]
The green plane is the 1-eigenspace of A, and the violet line is the 2-eigenspace. There
are three linearly independent eigenvectors visible in the picture: choose any two
noncollinear vectors on the green plane, and any nonzero vector on the violet line.
3. If there are fewer than n total vectors in all of the eigenspace bases Bλ ,
then the matrix is not diagonalizable.
    C = [ |  |        | ]          and    D = [ λ1  0  · · ·  0 ]
        [ v1 v2 · · · vn ]                    [ 0  λ2  · · ·  0 ]
        [ |  |        | ]                     [ ⋮   ⋮    ⋱    ⋮ ] ,
                                              [ 0   0  · · · λn ]
We will justify the linear independence assertion in part 4 in the proof of this
theorem below.
Example (A shear is not diagonalizable). Let
    A = [ 1 1 ]
        [ 0 1 ] ,
In other words, the 1-eigenspace is exactly the x-axis, so all of the eigenvectors
of A lie on the x-axis. It follows that A does not admit two linearly independent
eigenvectors, so by the diagonalization theorem, it is not diagonalizable.
In this example in Section 6.1, we studied the eigenvalues of a shear geomet-
rically; we reproduce the interactive demo here.
The line L (violet) is the 1-eigenspace of A, and L⊥ (green) is the 0-eigenspace. Since
there are two linearly independent eigenvectors, we know that A is diagonalizable.
so the 2-eigenspace is the z-axis. In particular, all eigenvectors of A lie on the xz-
plane, so there do not exist three linearly independent eigenvectors of A. By the
diagonalization theorem, the matrix A is not diagonalizable.
Notice that A contains a 2 × 2 block on its diagonal that looks like a shear:
1 1 0
A = 0 1 0.
0 0 2
λ 1
Aλ =
0 λ
The characteristic polynomial is f (λ) = −(λ − 1)2 (λ − 2), so that 1 and 2 are the
eigenvalues, with algebraic multiplicities 2 and 1, respectively. We computed that
the 1-eigenspace is a plane and the 2-eigenspace is a line, so that 1 and 2 also have
geometric multiplicities 2 and 1, respectively. This matrix is diagonalizable.
The green plane is the 1-eigenspace of A, and the violet line is the 2-eigenspace. Hence
the geometric multiplicity of the 1-eigenspace is 2, and the geometric multiplicity of
the 2-eigenspace is 1.
also has characteristic polynomial f (λ) = −(λ − 1)2 (λ − 2), so that 1 and 2 are
the eigenvalues, with algebraic multiplicities 2 and 1, respectively. In this case,
however, both eigenspaces are lines, so that both eigenvalues have geometric mul-
tiplicity 1. This matrix is not diagonalizable.
We saw in the above examples that the algebraic and geometric multiplicities
need not coincide. However, they do satisfy the following fundamental inequality,
the proof of which is beyond the scope of this text.
Theorem (Algebraic and Geometric Multiplicity). Let A be a square matrix and let
λ be an eigenvalue of A. Then
1 ≤ (the geometric multiplicity of λ) ≤ (the algebraic multiplicity of λ).
In particular, if the algebraic multiplicity of λ is equal to 1, then so is the geo-
metric multiplicity.
We can use the theorem to give another criterion for diagonalizability (in ad-
dition to the diagonalization theorem).
1. A is diagonalizable.
0 = c1 v1 + c2 v2 + · · · + cn vn .
Grouping the eigenvectors with the same eigenvalues, this sum has the form
Since eigenvectors with distinct eigenvalues are linearly independent, each “some-
thing in Vi ” is equal to zero. But this implies that all coefficients c1 , c2 , . . . , cn are
equal to zero, since the vectors in each Bi are linearly independent. Therefore, A
has n linearly independent eigenvectors, so it is diagonalizable.
The first part of the third statement simply says that the characteristic poly-
nomial of A factors completely into linear polynomials over the real numbers: in
other words, there are no complex (non-real) roots. The second part of the third
statement says in particular that for any diagonalizable matrix, the algebraic and
The examples at the beginning of this subsection illustrate the theorem. Here
we give some general consequences for diagonalizability of 2×2 and 3×3 matrices.
1. A has two different eigenvalues. In this case, each eigenvalue has algebraic
and geometric multiplicity equal to one. This implies A is diagonalizable.
For example:
1 7
A= .
0 2
1. A has three different eigenvalues. In this case, each eigenvalue has algebraic
and geometric multiplicity equal to one. This implies A is diagonalizable.
For example:
1 7 4
A = 0 2 3 .
0 0 −1
2. A has two distinct eigenvalues λ1 , λ2 . In this case, one has algebraic mul-
tiplicity one and the other has algebraic multiplicity two; after reordering,
we can assume λ1 has multiplicity 1 and λ2 has multiplicity 2. This implies
that λ1 has geometric multiplicity 1, so A is diagonalizable if and only if the
λ2 -eigenspace is a plane. For example:
1 7 4
A = 0 2 0.
0 0 2
or 2:
1 0 1
A = 0 1 1
0 0 1
Similarity and multiplicity Recall from this fact in Section 6.3 that similar ma-
trices have the same eigenvalues. It turns out that both notions of multiplicity of
an eigenvalue are preserved under similarity.
Proof. Since A and B have the same characteristic polynomial, the multiplicity of
λ as a root of the characteristic polynomial is the same for both matrices, which
proves the first statement. For the second, suppose that A = C BC −1 for an invert-
ible matrix C. By this fact in Section 6.3, the matrix C takes eigenvectors of B to
eigenvectors of A, both with eigenvalue λ.
Let {v1 , v2 , . . . , vk } be a basis of the λ-eigenspace of B. We claim that {C v1 , C v2 , . . . , C vk }
is linearly independent. Suppose that
c1 C v1 + c2 C v2 + · · · + ck C vk = 0.
By the invertible matrix theorem in Section 6.1, the null space of C is trivial, so
this implies
c1 v1 + c2 v2 + · · · + ck vk = 0.
Since v1 , v2 , . . . , vk are linearly independent, we get c1 = c2 = · · · = ck = 0, as
desired.
By the previous paragraph, the dimension of the λ-eigenspace of A is greater
than or equal to the dimension of the λ-eigenspace of B. By symmetry (B is similar
to A as well), the dimensions are equal, so the geometric multiplicities coincide.
For instance, the four matrices in this example are not similar to each other,
because the algebraic and/or geometric multiplicities of the eigenvalues do not
match up. Or, combined with the above theorem, we see that a diagonalizable
matrix cannot be similar to a non-diagonalizable one, because the algebraic and
geometric multiplicities of such matrices cannot both coincide.
The 1-eigenspace of D is the x y-plane, and the 2-eigenspace is the z-axis. The
matrix C takes the x y-plane to the 1-eigenspace of A, which is again a plane, and
the z-axis to the 2-eigenspace of A, which is again a line. This shows that the
geometric multiplicities of A and D coincide.
The matrix C takes the x y-plane to the 1-eigenspace of A (the grid) and the z-axis to
the 2-eigenspace (the green line).
The converse of the theorem is false: there exist matrices whose eigenvectors
have the same algebraic and geometric multiplicities, but which are not similar.
See the example below. However, for 2 × 2 and 3 × 3 matrices whose characteristic
polynomial has no complex (non-real) roots, the converse of the theorem is true.
(We will handle the case of complex roots in Section 6.5.)
Example (Matrices that look similar but are not). Show that the matrices
    A = [ 0 0 0 0 ]          B = [ 0 1 0 0 ]
        [ 0 0 1 0 ]   and        [ 0 0 0 0 ]
        [ 0 0 0 1 ]              [ 0 0 0 1 ]
        [ 0 0 0 0 ]              [ 0 0 0 0 ]
have the same eigenvalues with the same algebraic and geometric multiplicities,
but are not similar.
Solution. These matrices are upper-triangular. They both have characteristic
polynomial f (λ) = λ4 , so they both have one eigenvalue 0 with algebraic mul-
tiplicity 4. The 0-eigenspace is the null space, which has dimension 2 in each
case because A and B have two columns without pivots. Hence 0 has geometric
multiplicity 2 in each case.
To show that A and B are not similar, we note that
    A² = [ 0 0 0 0 ]          B² = [ 0 0 0 0 ]
         [ 0 0 0 1 ]   and         [ 0 0 0 0 ]
         [ 0 0 0 0 ]               [ 0 0 0 0 ] ,
         [ 0 0 0 0 ]               [ 0 0 0 0 ]
A2 = C B 2 C −1 = C0C −1 = 0,
On the other hand, suppose that A and B are diagonalizable matrices with the
same characteristic polynomial. Since the geometric multiplicities of the eigenval-
ues coincide with the algebraic multiplicities, which are the same for A and B, we
conclude that there exist n linearly independent eigenvectors of each matrix, all
of which have the same eigenvalues. This shows that A and B are both similar to
the same diagonal matrix. Using the transitivity property of similar matrices, this
shows:
Diagonalizable matrices are similar if and only if they have the same character-
istic polynomial, or equivalently, the same eigenvalues with the same algebraic
multiplicities.
are similar.
Solution. Both matrices have the three distinct eigenvalues 1, −1, 4. Hence they
are both diagonalizable, and are similar to the diagonal matrix
1 0 0
0 −1 0 .
0 0 4
By the transitivity property of similar matrices, this implies that A and B are similar
to each other.
Example (Diagonal matrices with the same entries are similar). Any two diagonal
matrices with the same diagonal entries (possibly in a different order) are similar
to each other. Indeed, such matrices have the same characteristic polynomial. We
saw this phenomenon in this example, where we noted that
    [ 1 0 0 ]     [ 0 0 1 ] [ 3 0 0 ] [ 0 0 1 ]⁻¹
    [ 0 2 0 ]  =  [ 0 1 0 ] [ 0 2 0 ] [ 0 1 0 ]    .
    [ 0 0 3 ]     [ 1 0 0 ] [ 0 0 1 ] [ 1 0 0 ]
Example (Eigenvalues |λ1| > 1, |λ2| < 1). Describe how the matrix

    A = (1/10) [ 11  6 ]
               [  9 14 ]

acts on the plane.

Solution. First we diagonalize A. The characteristic polynomial is

    f(λ) = λ² − Tr(A)λ + det(A) = λ² − (5/2)λ + 1 = (λ − 2)(λ − 1/2).
We compute the 2-eigenspace:

    (A − 2I2)v = 0  ⇐⇒  (1/10) [ −9  6 ] v = 0    −RREF→    [ 1 −2/3 ] v = 0.
                               [  9 −6 ]                    [ 0   0  ]

The parametric form of this equation is x = (2/3)y, so one eigenvector is v1 = (2/3, 1).
For the 1/2-eigenspace, we have:

    (A − (1/2)I2)v = 0  ⇐⇒  (1/10) [ 6 6 ] v = 0    −RREF→    [ 1 1 ] v = 0.
                                   [ 9 9 ]                    [ 0 0 ]

The parametric form of this equation is x = −y, so an eigenvector is v2 = (−1, 1). It
follows that A = C DC⁻¹, where

    C = [ 2/3 −1 ]        D = [ 2  0  ]
        [  1   1 ]            [ 0 1/2 ] .
Dynamics of the matrices A and D. Click “multiply” to multiply the colored points by
D on the left and A on the right.
Example (Eigenvalues |λ1| > 1, |λ2| > 1). Describe how the matrix

    A = (1/5) [ 13 −2 ]
              [ −3 12 ]

acts on the plane.

Solution. First we diagonalize A. The characteristic polynomial is

    f(λ) = λ² − Tr(A)λ + det(A) = λ² − 5λ + 6 = (λ − 2)(λ − 3).

We compute the 2-eigenspace:

    (A − 2I2)v = 0  ⇐⇒  (1/5) [  3 −2 ] v = 0    −RREF→    [ 1 −2/3 ] v = 0.
                              [ −3  2 ]                    [ 0   0  ]

The parametric form of this equation is x = (2/3)y, so one eigenvector is v1 = (2/3, 1).
For the 3-eigenspace, we have:

    (A − 3I2)v = 0  ⇐⇒  (1/5) [ −2 −2 ] v = 0    −RREF→    [ 1 1 ] v = 0.
                              [ −3 −3 ]                    [ 0 0 ]

The parametric form of this equation is x = −y, so an eigenvector is v2 = (−1, 1). It
follows that A = C DC⁻¹, where

    C = [ 2/3 −1 ]        D = [ 2 0 ]
        [  1   1 ]            [ 0 3 ] .
Dynamics of the matrices A and D. Click “multiply” to multiply the colored points by
D on the left and A on the right.
Example (Eigenvalues |λ1| < 1, |λ2| < 1). Describe how the matrix

    A′ = (1/30) [ 12  2 ]
                [  3 13 ]
acts on the plane.
Solution. This is the inverse of the matrix A from the previous example. In that
example, we found A = C DC −1 for
    C = [ 2/3 −1 ]        D = [ 2 0 ]
        [  1   1 ]            [ 0 3 ] .
The diagonal matrix D−1 does the opposite of what D does: it scales the x-coordinate
by 1/2 and the y-coordinate by 1/3. Therefore, it moves vectors closer to both
coordinate axes, but faster in the y-direction. The matrix A0 does the same thing,
but with respect to the v1 , v2 -coordinate system.
Dynamics of the matrices A0 and D−1 . Click “multiply” to multiply the colored points
by D−1 on the left and A0 on the right.
Example (Eigenvalues |λ1 | = 1, |λ2 | < 1). Describe how the matrix
    A = (1/6) [  5 −1 ]
              [ −2  4 ]
acts on the plane.
Solution. First we diagonalize A. The characteristic polynomial is
f(\lambda) = \lambda^2 - \operatorname{Tr}(A)\lambda + \det(A) = \lambda^2 - \frac32\lambda + \frac12 = (\lambda - 1)\left(\lambda - \frac12\right).
Next we compute the 1-eigenspace:
(A - I_2)v = 0 \iff \frac16\begin{pmatrix}-1&-1\\-2&-2\end{pmatrix}v = 0 \xrightarrow{\text{RREF}} \begin{pmatrix}1&1\\0&0\end{pmatrix}v = 0.
The parametric form of this equation is x = -y, so one eigenvector is v_1 = \binom{-1}{1}.
For the 1/2-eigenspace, we have:
\left(A - \tfrac12 I_2\right)v = 0 \iff \frac16\begin{pmatrix}2&-1\\-2&1\end{pmatrix}v = 0 \xrightarrow{\text{RREF}} \begin{pmatrix}1&-1/2\\0&0\end{pmatrix}v = 0.
The parametric form of this equation is x = \tfrac12 y, so an eigenvector is v_2 = \binom{1/2}{1}. It follows that A = CDC^{-1}, where
C = \begin{pmatrix}-1&1/2\\1&1\end{pmatrix} \qquad D = \begin{pmatrix}1&0\\0&1/2\end{pmatrix}.
The diagonal matrix D scales the y-coordinate by 1/2 and does not move the
x-coordinate. Therefore, it simply moves vectors closer to the x-axis along ver-
tical lines. The matrix A does the same thing, in the v1 , v2 -coordinate system:
multiplying a vector by A scales the v2 -coordinate by 1/2 and does not change
the v1 -coordinate. Therefore, A “sucks vectors into the 1-eigenspace” along lines
parallel to v2 .
Dynamics of the matrices A and D. Click “multiply” to multiply the colored points by
D on the left and A on the right.
In the interactive 3 × 3 example, the diagonal matrix D scales the x-coordinate by 1/2, the y-coordinate by 2, and the z-coordinate by 3/2. Looking straight down at the xy-plane, the points follow parabolic paths taking them away from the x-axis and toward the y-axis. The z-coordinate is scaled by 3/2, so points fly away from the xy-plane in that direction.
If A = C DC −1 for some invertible matrix C, then A does the same thing as D,
but with respect to the coordinate system defined by the columns of C.
Dynamics of the matrices A and D. Click “multiply” to multiply the colored points by
D on the left and A on the right.
Objectives
In Section 6.4, we saw that a matrix whose characteristic polynomial has dis-
tinct real roots is diagonalizable: it is similar to a diagonal matrix, which is much
simpler to analyze. In this section, we study matrices whose characteristic poly-
nomial has complex roots. It turns out that such a matrix is similar (in the 2 × 2
case) to a rotation-scaling matrix, which is also relatively easy to understand.
In a certain sense, this entire section is analogous to Section 6.4, with rotation-
scaling matrices playing the role of diagonal matrices.
See Appendix A for a review of the complex numbers.
We compute an eigenvector of A = \begin{pmatrix}1&-1\\1&1\end{pmatrix} for the eigenvalue λ = 1 + i. We have
A - (1+i)I_2 = \begin{pmatrix}1-(1+i)&-1\\1&1-(1+i)\end{pmatrix} = \begin{pmatrix}-i&-1\\1&-i\end{pmatrix}.
Now we row reduce, noting that the second row is i times the first:
\begin{pmatrix}-i&-1\\1&-i\end{pmatrix} \xrightarrow{R_2 = R_2 - iR_1} \begin{pmatrix}-i&-1\\0&0\end{pmatrix} \xrightarrow{R_1 = R_1 \div (-i)} \begin{pmatrix}1&-i\\0&0\end{pmatrix}.
The parametric form is x = iy, so that an eigenvector is v_1 = \binom{i}{1}. Next we compute an eigenvector for λ = 1 − i. We have
A - (1-i)I_2 = \begin{pmatrix}1-(1-i)&-1\\1&1-(1-i)\end{pmatrix} = \begin{pmatrix}i&-1\\1&i\end{pmatrix}.
Now we row reduce, noting that the second row is −i times the first:
\begin{pmatrix}i&-1\\1&i\end{pmatrix} \xrightarrow{R_2 = R_2 + iR_1} \begin{pmatrix}i&-1\\0&0\end{pmatrix} \xrightarrow{R_1 = R_1 \div i} \begin{pmatrix}1&i\\0&0\end{pmatrix}.
The parametric form is x = −iy, so that an eigenvector is v_2 = \binom{-i}{1}.
We can verify our answers:
\begin{pmatrix}1&-1\\1&1\end{pmatrix}\binom{i}{1} = \binom{i-1}{i+1} = (1+i)\binom{i}{1} \qquad \begin{pmatrix}1&-1\\1&1\end{pmatrix}\binom{-i}{1} = \binom{-i-1}{-i+1} = (1-i)\binom{-i}{1}.
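These hand computations are easy to confirm with a numerical eigenvalue routine. A small sketch (Python with NumPy, not part of the original text); note that NumPy may scale or reorder the eigenvectors, so we only check the defining equation Av = λv.

import numpy as np

A = np.array([[1.0, -1.0], [1.0, 1.0]])
lam, V = np.linalg.eig(A)
print(lam)                                             # 1+1j and 1-1j, in some order
for j in range(2):
    print(np.allclose(A @ V[:, j], lam[j] * V[:, j]))  # True: A v = lambda v

# The hand-computed eigenvector (i, 1) for 1 + i also checks out:
v1 = np.array([1j, 1.0])
print(np.allclose(A @ v1, (1 + 1j) * v1))              # True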
Example (A 3 × 3 matrix). Find the eigenvalues and eigenvectors, real and complex, of the matrix
A = \begin{pmatrix}4/5&-3/5&0\\3/5&4/5&0\\1&2&2\end{pmatrix}.
Solution. Expanding the characteristic polynomial along the third column gives
f(\lambda) = \det(A - \lambda I_3) = (2-\lambda)\left(\lambda^2 - \tfrac85\lambda + 1\right).
This polynomial has one real root at 2, and two complex roots at
\lambda = \frac{8/5 \pm \sqrt{64/25 - 4}}{2} = \frac{4 \pm 3i}{5}.
We compute an eigenvector for the complex eigenvalue (4 + 3i)/5. Row-scaling the first two rows of A − \frac{4+3i}{5}I_3 (which does not change its null space) gives the matrix
\begin{pmatrix}i&1&0\\1&-i&0\\1&2&(6-3i)/5\end{pmatrix}.
We row reduce, noting that the second row is −i times the first:
\begin{pmatrix}i&1&0\\1&-i&0\\1&2&\frac{6-3i}{5}\end{pmatrix}
\xrightarrow{R_2 = R_2 + iR_1} \begin{pmatrix}i&1&0\\0&0&0\\1&2&\frac{6-3i}{5}\end{pmatrix}
\xrightarrow{R_3 = R_3 + iR_1} \begin{pmatrix}i&1&0\\0&0&0\\0&2+i&\frac{6-3i}{5}\end{pmatrix}
\xrightarrow{R_2 \leftrightarrow R_3} \begin{pmatrix}i&1&0\\0&2+i&\frac{6-3i}{5}\\0&0&0\end{pmatrix}
\xrightarrow[R_2 = R_2 \div (2+i)]{R_1 = R_1 \div i} \begin{pmatrix}1&-i&0\\0&1&\frac{9-12i}{25}\\0&0&0\end{pmatrix}
\xrightarrow{R_1 = R_1 + iR_2} \begin{pmatrix}1&0&\frac{12+9i}{25}\\0&1&\frac{9-12i}{25}\\0&0&0\end{pmatrix}.
The free variable is z; the parametric form of the solution is
x = -\frac{12+9i}{25}z \qquad y = -\frac{9-12i}{25}z.
Taking z = 25 gives the eigenvector
v_2 = \begin{pmatrix}-12-9i\\-9+12i\\25\end{pmatrix}.
A similar calculation (replacing all occurrences of i by −i) shows that an eigenvector with eigenvalue (4 − 3i)/5 is
v_3 = \begin{pmatrix}-12+9i\\-9-12i\\25\end{pmatrix}.
If A is a matrix with real entries, then its characteristic polynomial has real
coefficients, so this note implies that its complex eigenvalues come in conjugate
pairs. In the first example, we notice that
1 + i \text{ has an eigenvector } v_1 = \binom{i}{1} \qquad 1 - i \text{ has an eigenvector } v_2 = \binom{-i}{1}.
In the second example,
\frac{4+3i}{5} \text{ has an eigenvector } v_1 = \begin{pmatrix}-12-9i\\-9+12i\\25\end{pmatrix} \qquad \frac{4-3i}{5} \text{ has an eigenvector } v_2 = \begin{pmatrix}-12+9i\\-9-12i\\25\end{pmatrix}.
In these cases, an eigenvector for the conjugate eigenvalue is simply the conju-
gate eigenvector (the eigenvector obtained by conjugating each entry of the first
eigenvector). This is always true; we leave the verification to the reader.
It is obvious that \binom{-w}{z} is in the null space of this matrix, as is \binom{w}{-z}, for that matter.
Note that we never had to compute the second row of A − λI2 !
Example (A 2 × 2 matrix, the easy way). Find the complex eigenvalues and eigenvectors of the matrix
A = \begin{pmatrix}1&-1\\1&1\end{pmatrix}.
In this example we found the eigenvectors \binom{i}{1} and \binom{-i}{1} for the eigenvalues 1 + i and 1 − i, respectively, but in this example we found the eigenvectors \binom{1}{-i} and \binom{1}{i} for the same eigenvalues of the same matrix. These vectors do not look like multiples of each other at first—but since we now have complex numbers at our disposal, we can see that they actually are multiples:
\binom{i}{1} = i\binom{1}{-i} \qquad \binom{-i}{1} = -i\binom{1}{i}.
Proposition. Let
A = \begin{pmatrix}a&-b\\b&a\end{pmatrix}
be a rotation-scaling matrix. Then A is the product, in either order, of a rotation matrix \begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix} and a scaling matrix \begin{pmatrix}r&0\\0&r\end{pmatrix}, where r = \sqrt{\det(A)} = \sqrt{a^2+b^2}; moreover, the eigenvalues of A are a ± bi.
Proof. Set r = \sqrt{a^2+b^2}. Then (a/r)^2 + (b/r)^2 = 1, so (a/r, b/r) lies on the unit circle. Therefore, it has the form (\cos\theta, \sin\theta), where θ is the counterclockwise angle from the positive x-axis to the vector \binom{a/r}{b/r}, or, since it is on the same line, to \binom{a}{b}:
\binom{a/r}{b/r} = \binom{\cos\theta}{\sin\theta}.
It follows that
A = r\begin{pmatrix}a/r&-b/r\\b/r&a/r\end{pmatrix} = \begin{pmatrix}r&0\\0&r\end{pmatrix}\begin{pmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{pmatrix},
as desired.
For the last statement, we compute the eigenvalues of A as the roots of the characteristic polynomial:
\lambda = \frac{\operatorname{Tr}(A) \pm \sqrt{\operatorname{Tr}(A)^2 - 4\det(A)}}{2} = \frac{2a \pm \sqrt{4a^2 - 4(a^2+b^2)}}{2} = a \pm bi.
Geometrically, a rotation-scaling matrix does exactly what the name says: it
rotates and scales (in either order).
Example (A rotation-scaling matrix). What does the matrix
A = \begin{pmatrix}1&-1\\1&1\end{pmatrix}
do geometrically?
Solution. This is a rotation-scaling matrix with a = b = 1. Therefore, it scales by a factor of \sqrt{\det(A)} = \sqrt2 and rotates counterclockwise by 45°. Here is a picture of A: it rotates the plane by 45° and then scales by \sqrt2.
Multiplication by the matrix A rotates the plane by 5π/6 and dilates by a factor of 2.
Move the input vector x to see how the output vector b changes.
The matrix in the second example has second column \binom{-\sqrt3}{1}, which is rotated counterclockwise from the positive x-axis by an angle of 5π/6. This rotation angle is not equal to \tan^{-1}\!\big(1/(-\sqrt3)\big) = -\frac{\pi}{6}. The problem is that arctan always outputs values between −π/2 and π/2: it does not account for points in the second or third quadrants. This is why we drew a triangle and used its (positive) edge lengths to compute the angle ϕ:
\varphi = \tan^{-1}\frac{1}{\sqrt3} = \frac{\pi}{6} \qquad \theta = \pi - \varphi = \frac{5\pi}{6}.
Alternatively, we could have observed that \binom{-\sqrt3}{1} lies in the second quadrant, so that the angle θ in question is
\theta = \tan^{-1}\frac{1}{-\sqrt3} + \pi.
When finding the rotation angle of a vector \binom{a}{b}, do not blindly compute \tan^{-1}(b/a), since this will give the wrong answer when \binom{a}{b} is in the second or third quadrant. Instead, draw a picture.
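In code, the standard way to avoid this pitfall is the two-argument arctangent, which takes the signs of both coordinates into account. A small sketch (Python, not part of the original text):

import math

def rotation_angle(a, b):
    # Counterclockwise angle from the positive x-axis to the vector (a, b).
    return math.atan2(b, a)   # atan2 accounts for all four quadrants

a, b = -math.sqrt(3), 1.0
print(math.degrees(math.atan(b / a)))      # -30.0: the naive arctangent is wrong
print(math.degrees(rotation_angle(a, b)))  # 150.0, i.e. 5*pi/6, as computed above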
Proof. First we need to show that Re(v) and Im(v) are linearly independent, since
otherwise C is not invertible. If not, then there exist real numbers x, y, not both
equal to zero, such that x Re(v) + y Im(v) = 0. Then
( y + i x)v = ( y + i x) Re(v) + i Im(v)
= y Re(v) − x Im(v) + (x Re(v) + y Im(v)) i
= y Re(v) − x Im(v).
x + yi
Av = λv = (a + bi)
z + wi
(ax − b y) + (a y + bx)i
=
(az − bw) + (aw + bz)i
a y + bx
ax − b y
= +i .
az − bw aw + bz
a y + bx
ax − b y
ARe(v) = AIm(v) = .
az − bw aw + bz
a
C BC Re(v) = C Be1 = C
−1
= a Re(v) − b Im(v)
−b
x y ax − b y
=a −b = = ARe(v)
z w az − bw
b
C BC Im(v) = C Be2 = C
−1
= b Re(v) + a Im(v)
a
a y + bx
x y
=b +a = = AIm(v).
z w aw + bz
Aw = A c Re(v) + d Im(v)
= cARe(v) + dAIm(v)
= cC BC −1 Re(v) + dC BC −1 Im(v)
= C BC −1 c Re(v) + d Im(v)
= C BC −1 w.
x + yi x + yi
x y
Re(a + bi) = a Im(a + bi) = b Re = Im = .
z + wi z z + wi w
a −b
B= with a = Re(λ), b = − Im(λ).
b a
Geometrically, the rotation-scaling theorem says that a 2×2 matrix with a complex
eigenvalue behaves similarly to a rotation-scaling matrix. See this important note
in Section 6.3.
One should regard the rotation-scaling theorem as a close analogue of the di-
agonalization theorem in Section 6.4, with a rotation-scaling matrix playing the
role of a diagonal matrix.
4. Then A = CBC^{-1} for
C = \begin{pmatrix}|&|\\ \operatorname{Re}(v)&\operatorname{Im}(v)\\ |&|\end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix}\operatorname{Re}(\lambda)&\operatorname{Im}(\lambda)\\ -\operatorname{Im}(\lambda)&\operatorname{Re}(\lambda)\end{pmatrix}.
To summarize:
• B rotates around the circle centered at the origin and passing through e_1 and e_2, in the direction from e_1 to e_2, then scales by \sqrt2.
• A rotates around the ellipse centered at the origin and passing through Re(v) and Im(v), in the direction from Re(v) to Im(v), then scales by \sqrt2.
The reader might want to refer back to this example in Section 6.3.
The geometric action of A and B on the plane. Click “multiply” to multiply the colored
points by B on the left and A on the right.
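The matrices C and B of the rotation-scaling theorem are easy to build from one complex eigenvalue and eigenvector. Here is a sketch (Python with NumPy, not part of the original text); the sample matrix is chosen to have eigenvalues 1 ± i, and the eigenvector NumPy returns may differ from the one used above by a complex scalar, but the factorization A = CBC⁻¹ still holds.

import numpy as np

A = np.array([[2.0, -1.0], [2.0, 0.0]])   # sample matrix with eigenvalues 1 +/- i

lam, V = np.linalg.eig(A)
k = np.argmax(lam.imag != 0)              # pick a non-real eigenvalue
lam, v = lam[k], V[:, k]

C = np.column_stack([v.real, v.imag])     # C = ( Re(v) | Im(v) )
B = np.array([[lam.real, lam.imag],
              [-lam.imag, lam.real]])     # rotation-scaling matrix

print(np.allclose(A, C @ B @ np.linalg.inv(C)))   # True: A = C B C^(-1)
print(abs(lam))                                   # scaling factor sqrt(det A)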
If instead we had chosen λ = 1 + i as our eigenvalue, then we would have found the eigenvector v = \binom{1}{1-i}. In this case we would have A = C'B'(C')^{-1}, where
C' = \begin{pmatrix}\operatorname{Re}\binom{1}{1-i} & \operatorname{Im}\binom{1}{1-i}\end{pmatrix} = \begin{pmatrix}1&0\\1&-1\end{pmatrix} \qquad B' = \begin{pmatrix}\operatorname{Re}(\lambda)&\operatorname{Im}(\lambda)\\-\operatorname{Im}(\lambda)&\operatorname{Re}(\lambda)\end{pmatrix} = \begin{pmatrix}1&1\\-1&1\end{pmatrix}.
So, A is also similar to a clockwise rotation by 45°, followed by a scale by \sqrt2.
Example. What does the matrix
A = \begin{pmatrix}-\sqrt3+1&-2\\1&-\sqrt3-1\end{pmatrix}
do geometrically?
Solution. The eigenvalues of A are
\lambda = \frac{\operatorname{Tr}(A) \pm \sqrt{\operatorname{Tr}(A)^2 - 4\det(A)}}{2} = \frac{-2\sqrt3 \pm \sqrt{12 - 16}}{2} = -\sqrt3 \pm i.
We choose the eigenvalue λ = -\sqrt3 - i and find a corresponding eigenvector, using the trick:
A - (-\sqrt3 - i)I_2 = \begin{pmatrix}1+i&-2\\ \ast&\ast\end{pmatrix} \xrightarrow{\text{eigenvector}} v = \binom{2}{1+i}.
It follows that A = CBC^{-1}, where
C = \begin{pmatrix}\operatorname{Re}\binom{2}{1+i} & \operatorname{Im}\binom{2}{1+i}\end{pmatrix} = \begin{pmatrix}2&0\\1&1\end{pmatrix} \qquad B = \begin{pmatrix}\operatorname{Re}(\lambda)&\operatorname{Im}(\lambda)\\-\operatorname{Im}(\lambda)&\operatorname{Re}(\lambda)\end{pmatrix} = \begin{pmatrix}-\sqrt3&-1\\1&-\sqrt3\end{pmatrix}.
The matrix B is the rotation-scaling matrix in the above example: it rotates coun-
terclockwise by an angle of 5π/6 and scales by a factor of 2. The matrix A does
the same thing, but with respect to the Re(v), Im(v)-coordinate system:
To summarize:
• B rotates around the circle centered at the origin and passing through e1 and
e2 , in the direction from e1 to e2 , then scales by 2.
• A rotates around the ellipse centered at the origin and passing through Re(v)
and Im(v), in the direction from Re(v) to Im(v), then scales by 2.
The reader might want to refer back to this example in Section 6.3.
The geometric action of A and B on the plane. Click “multiply” to multiply the colored
points by B on the left and A on the right.
If instead we had chosen λ = -\sqrt3 + i as our eigenvalue, then we would have found the eigenvector v = \binom{2}{1-i}. In this case we would have A = C'B'(C')^{-1}, where
C' = \begin{pmatrix}\operatorname{Re}\binom{2}{1-i} & \operatorname{Im}\binom{2}{1-i}\end{pmatrix} = \begin{pmatrix}2&0\\1&-1\end{pmatrix} \qquad B' = \begin{pmatrix}\operatorname{Re}(\lambda)&\operatorname{Im}(\lambda)\\-\operatorname{Im}(\lambda)&\operatorname{Re}(\lambda)\end{pmatrix} = \begin{pmatrix}-\sqrt3&1\\-1&-\sqrt3\end{pmatrix}.
So, A is also similar to a clockwise rotation by 5π/6, followed by a scale by 2.
We saw in the above examples that the rotation-scaling theorem can be applied
in two different ways to any given matrix: one has to choose one of the two con-
jugate eigenvalues to work with. Replacing λ by λ has the effect of replacing v by
v, which just negates all imaginary parts, so we also have A = C 0 B 0 (C 0 )−1 for
| |
Re(λ) − Im(λ)
C 0 = Re(v) − Im(v) and B =
0
.
Im(λ) Re(λ)
| |
The matrices B and B 0 are similar to each other. The only difference between them
Re(λ) Re(λ)
is the direction of rotation, since − Im(λ) and Im(λ) are mirror images of each other
over the x-axis:
Re(λ)
Im(λ)
θ
Re(λ)
−θ
− Im(λ)
The discussion that follows is closely analogous to the exposition in this sub-
section in Section 6.4, in which we studied the dynamics of diagonalizable 2 × 2
matrices.
A3 v
A2 v
Av
v
|λ| = 1: when the scaling factor is equal to 1, then vectors do not tend to get
longer or shorter. In this case, repeatedly multiplying a vector by A simply “rotates
around an ellipse”. For example,
A = \frac12\begin{pmatrix}\sqrt3+1&-2\\1&\sqrt3-1\end{pmatrix} \qquad \lambda = \frac{\sqrt3 - i}{2} \qquad |\lambda| = 1
|λ| < 1: when the scaling factor is less than 1, then vectors tend to get shorter,
i.e., closer to the origin. In this case, repeatedly multiplying a vector by A makes
the vector “spiral in”. For example,
A = \frac{1}{2\sqrt2}\begin{pmatrix}\sqrt3+1&-2\\1&\sqrt3-1\end{pmatrix} \qquad \lambda = \frac{\sqrt3 - i}{2\sqrt2} \qquad |\lambda| = \frac{1}{\sqrt2} < 1
The geometric action of A and B on the plane. Click “multiply” to multiply the colored
points by B on the left and A on the right.
Interactive: |λ| = 1.
p p
1 3 + 1 p −2 1
−1
3 p 2 0
A= B= C=
2 1 3−1 2 1 3 1 1
p
3−i
λ= |λ| = 1
2
The geometric action of A and B on the plane. Click “multiply” to multiply the colored
points by B on the left and A on the right.
The geometric action of A and B on the plane. Click “multiply” to multiply the colored
points by B on the left and A on the right.
λ 0
.
0 λ
• The matrix B is block diagonal, where the blocks are 1 × 1 blocks containing the real eigenvalues (with their multiplicities), or 2 × 2 blocks of the form
\begin{pmatrix}\operatorname{Re}(\lambda)&\operatorname{Im}(\lambda)\\-\operatorname{Im}(\lambda)&\operatorname{Re}(\lambda)\end{pmatrix}
for each non-real eigenvalue λ (with multiplicity).
The block diagonalization theorem is proved in the same way as the diagonal-
ization theorem in Section 6.4 and the rotation-scaling theorem. It is best under-
stood in the case of 3 × 3 matrices.
We search for a real root using the rational root theorem. The possible rational
roots are ±1, ±2, ±4; we find f (2) = 0, so that λ − 2 divides f (λ). Performing
polynomial long division gives
f (λ) = −(λ − 2) λ2 − 2λ + 2 .
The geometric action of A and B on R3 . Click “multiply” to multiply the colored points
by B on the left and A on the right. (We have scaled C by 1/6 so that the vectors x
and y have roughly the same size.)
Objectives
1. Learn examples of stochastic matrices and applications to difference equa-
tions.
2. Recipe: find the steady state of a positive stochastic matrix.
3. Picture: dynamics of a positive stochastic matrix.
4. Theorem: the Perron–Frobenius theorem.
5. Vocabulary words: difference equation, (positive) stochastic matrix, steady
state.
vt+1 = Avt
In other words:
Note that
vt = Avt−1 = A2 vt−2 = · · · = At v0 ,
which should hint to you that the long-term behavior of a difference equation is
an eigenvalue problem.
We will use the following example in this subsection and the next. Understand-
ing this section amounts to understanding this example.
Example. Red Box has kiosks all over Atlanta where you can rent movies. You
can return them to any other kiosk. For simplicity, pretend that there are three
kiosks in Atlanta, and that every customer returns their movie the next day. Let
vt be the vector whose entries x t , y t , z t are the number of copies of Prognosis
Negative at kiosks 1, 2, and 3, respectively. Let A be the matrix whose i, j-entry is
the probability that a customer renting Prognosis Negative from kiosk j returns it
to kiosk i. For example, the matrix
.3 .4 .5
A = .3 .4 .3
.4 .2 .2
encodes a 30% probability that a customer renting from kiosk 3 returns the movie
to kiosk 2, and a 40% probability that a movie rented from kiosk 1 gets returned
to kiosk 3. The second row (for instance) of the matrix A says:
322 CHAPTER 6. EIGENVALUES AND EIGENVECTORS
Therefore, Avt represents the number of movies in each kiosk the next day:
Avt = vt+1 .
4. rabbits produce 0, 6, 8 rabbits in their first, second, and third years, respec-
tively.
Let vt be the vector whose entries x t , y t , z t are the number of rabbits aged 0, 1,
and 2, respectively. The rules above can be written as a system of equations:
x_{t+1} = 6y_t + 8z_t
y_{t+1} = \tfrac12 x_t
z_{t+1} = \tfrac12 y_t.
One computes that A has eigenvalues 2 and −1, and that an eigenvector with eigenvalue 2 is
v = \begin{pmatrix}16\\4\\1\end{pmatrix}.
This partially explains why the ratio x t : y t : z t approaches 16 : 4 : 1 and why all
three quantities eventually double each year in this demo:
Left: the population of rabbits in a given year. Right: the proportions of rabbits in
that year. Choose any values you like for the starting population, and click “Advance
1 year” several times. Notice that the ratio x t : y t : z t approaches 16 : 4 : 1, and that
all three quantities eventually double each year.
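The demo is easy to reproduce with a few lines of code. The sketch below (Python with NumPy, not part of the original text) iterates the rabbit-population matrix; the age ratios approach 16 : 4 : 1 and the total population roughly doubles each year.

import numpy as np

# x' = 6y + 8z,  y' = x/2,  z' = y/2
A = np.array([[0.0, 6.0, 8.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.5, 0.0]])

v = np.array([10.0, 10.0, 10.0])    # any starting population works
for year in range(20):
    v = A @ v
print(v / v[2])                     # approaches (16, 4, 1)
print((A @ v).sum() / v.sum())      # approaches 2: the dominant eigenvalue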
The matrix A in the Red Box example above is a positive stochastic matrix. The fact that the columns sum to one says that all
of the movies rented from a particular kiosk must be returned to some other kiosk
(remember that every customer returns their movie the next day). For instance,
the first column says:
The sum is 100%, as all of the movies are returned to one of the three
kiosks.
324 CHAPTER 6. EIGENVALUES AND EIGENVECTORS
The matrix A represents the change of state from one day to the next:
\begin{pmatrix}x_{t+1}\\y_{t+1}\\z_{t+1}\end{pmatrix} = A\begin{pmatrix}x_t\\y_t\\z_t\end{pmatrix} = \begin{pmatrix}.3x_t + .4y_t + .5z_t\\ .3x_t + .4y_t + .3z_t\\ .4x_t + .2y_t + .2z_t\end{pmatrix}.
This says that the total number of copies of Prognosis Negative in the three kiosks
does not change from day to day, as we expect.
The fact that the entries of the vectors vt and vt+1 sum to the same number is
a consequence of the fact that the columns of a stochastic matrix sum to one.
Let A be a stochastic matrix, let vt be a vector, and let vt+1 = Avt . Then the
sum of the entries of vt equals the sum of the entries of vt+1 .
|\lambda|\cdot|x_j| = \Big|\sum_{i=1}^n a_{ij}x_i\Big| \le \sum_{i=1}^n a_{ij}\,|x_i| \le \sum_{i=1}^n a_{ij}\,|x_j| = 1\cdot|x_j|,
where the last equality holds because \sum_{i=1}^n a_{ij} = 1. This implies |\lambda| \le 1.
In fact, for a positive stochastic matrix A, one can show that if λ 6= 1 is a (real
or complex) eigenvalue of A, then |λ| < 1. The 1-eigenspace of a stochastic matrix
is very important.
One should think of a steady state vector w as a vector of percentages. For example, if the movies are distributed according to these percentages today, then they will have the same distribution tomorrow, since Aw = w. And no matter the starting distribution of movies, the long-term distribution will always be the steady state vector.
The sum c of the entries of v0 is the total number of things in the system being
modeled. The total number does not change, so the long-term state of the system
326 CHAPTER 6. EIGENVALUES AND EIGENVECTORS
The above recipe is suitable for calculations by hand, but it does not take ad-
vantage of the fact that A is a stochastic matrix. In practice, it is generally faster
to compute a steady state vector by computer as follows:
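In outline, the computer method is just repeated multiplication: start from any vector whose entries sum to the total amount in the system and multiply by A until the result stops changing. A minimal sketch of that recipe (Python with NumPy, not part of the original text), applied to the Red Box matrix:

import numpy as np

A = np.array([[.3, .4, .5],
              [.3, .4, .3],
              [.4, .2, .2]])

v = np.array([30.0, 50.0, 20.0])    # any starting distribution of 100 movies
for day in range(50):
    v = A @ v
print(v)            # approaches 100 * w = (38.888..., 33.333..., 27.777...)
print(v / v.sum())  # the steady state vector w = (7, 6, 5)/18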
The Perron–Frobenius theorem asserts that, for any vector v0 , the vectors v1 =
Av0 , v2 = Av1 , . . . approach a vector whose entries are the same: 50% of the sum
will be in the first entry, and 50% will be in the second.
We can see this explicitly, as follows. The eigenvectors u1 , u2 form a basis B for
R2 ; for any vector x = a1 u1 + a2 u2 in R2 , we have
a2
Ax = A(a1 u1 + a2 u2 ) = a1 Au1 + a2 Au2 = a1 u1 + u2 .
2
Iterating multiplication by A in this way, we have
a2
At x = a1 u1 + u2 −→ a1 u1
2t
as t → ∞. This shows that At x approaches
a
a1 u1 = 1 .
a1
Note that the sum of the entries of a1 u1 is equal to the sum of the entries of a1 u1 +
a2 u2 , since the entries of u2 sum to zero.
To illustrate the theorem with numbers, let us choose a particular value for u0 ,
1 x
say u0 = 0 . We compute the values for u t = ytt in this table:
t xt yt
0 1.000 0.000
1 0.750 0.250
2 0.625 0.375
3 0.563 0.438
4 0.531 0.469
5 0.516 0.484
6 0.508 0.492
7 0.504 0.496
8 0.502 0.498
9 0.501 0.499
10 0.500 0.500
We see that u_t does indeed approach \binom{0.5}{0.5}.
Now we turn to visualizing the dynamics of (i.e., repeated multiplication by)
the matrix A. This matrix is diagonalizable; we have A = C DC −1 for
1 1 1 0
C= D= .
1 −1 0 1/2
The matrix D leaves the x-coordinate unchanged and scales the y-coordinate by
1/2. Repeated multiplication by D makes the y-coordinate very small, so it “sucks
all vectors into the x-axis.”
328 CHAPTER 6. EIGENVALUES AND EIGENVECTORS
The matrix A does the same thing as D, but with respect to the coordinate
system defined by the columns u1 , u2 of C. This means that A “sucks all vectors
into the 1-eigenspace”, without changing the sum of the entries of the vectors.
Dynamics of the stochastic matrix A. Click “multiply” to multiply the colored points
by D on the left and A on the right. Note that on both sides, all vectors are “sucked
into the 1-eigenspace” (the red line).
Example. Continuing with the Red Box example, we can illustrate the Perron–Frobenius
theorem explicitly. The matrix
.3 .4 .5
A = .3 .4 .3
.4 .2 .2
Notice that 1 is strictly greater in absolute value than the other eigenvalues, and
that it has algebraic (hence, geometric) multiplicity 1. One computes eigenvectors
for the eigenvalues 1, −0.2, 0.1 to be, respectively,
u_1 = \begin{pmatrix}7\\6\\5\end{pmatrix} \qquad u_2 = \begin{pmatrix}-1\\0\\1\end{pmatrix} \qquad u_3 = \begin{pmatrix}1\\-3\\2\end{pmatrix}.
Ax = A(a1 u1 + a2 u2 + a3 u3 )
= a1 Au1 + a2 Au2 + a3 Au3
= a1 u1 − 0.2a2 u2 + 0.1a3 u3 .
A^t x = a_1u_1 + (-0.2)^t a_2u_2 + (0.1)^t a_3u_3 \longrightarrow a_1u_1
What do the above calculations say about the number of copies of Prognosis
Negative in the Atlanta Red Box kiosks? Suppose that the kiosks start with 100
copies of the movie, with 30 copies at kiosk 1, 50 copies at kiosk 2, and 20 copies
at kiosk 3. Let v0 = (30, 50, 20) be the vector describing this state. Then there will
be v1 = Av0 movies in the kiosks the next day, v2 = Av1 the day after that, and so
on. We let vt = (x t , y t , z t ).
t xt yt zt
0 30.000000 50.000000 20.000000
1 39.000000 35.000000 26.000000
2 38.700000 33.500000 27.800000
3 38.910000 33.350000 27.740000
4 38.883000 33.335000 27.782000
5 38.889900 33.333500 27.776600
6 38.888670 33.333350 27.777980
7 38.888931 33.333335 27.777734
8 38.888880 33.333333 27.777786
9 38.888891 33.333333 27.777776
10 38.888889 33.333333 27.777778
(Of course it does not make sense to have a fractional number of movies; the
decimals are included here to illustrate the convergence.) The steady-state vector
says that eventually, the movies will be distributed in the kiosks according to the
percentages
w = \frac{1}{18}\begin{pmatrix}7\\6\\5\end{pmatrix} = \begin{pmatrix}38.888888\%\\33.333333\%\\27.777778\%\end{pmatrix},
which agrees with the above table. Moreover, this distribution is independent of
the beginning distribution of movies in the kiosks.
Now we turn to visualizing the dynamics of (i.e., repeated multiplication by)
the matrix A. This matrix is diagonalizable; we have A = C DC −1 for
7 −1 1 1 0 0
C = 6 0 −3 D = 0 −.2 0 .
5 1 2 0 0 .1
The matrix D leaves the x-coordinate unchanged, scales the y-coordinate by −1/5,
and scales the z-coordinate by 1/10. Repeated multiplication by D makes the y-
and z-coordinates very small, so it “sucks all vectors into the x-axis.”
The matrix A does the same thing as D, but with respect to the coordinate
system defined by the columns u1 , u2 , u3 of C. This means that A “sucks all vectors
into the 1-eigenspace”, without changing the sum of the entries of the vectors.
330 CHAPTER 6. EIGENVALUES AND EIGENVECTORS
Dynamics of the stochastic matrix A. Click “multiply” to multiply the colored points
by D on the left and A on the right. Note that on both sides, all vectors are “sucked
into the 1-eigenspace” (the green line). (We have scaled C by 1/4 so that vectors have
roughly the same size on the right and the left. The “jump” that happens when you
press “multiply” is a negation of the −.2-eigenspace, which is not animated.)
The picture of a positive stochastic matrix is always the same, whether or not
it is diagonalizable: all vectors are “sucked into the 1-eigenspace,” which is a line,
without changing the sum of the entries of the vectors. This is the geometric
content of the Perron–Frobenius theorem.
• If a zillion unimportant pages link to your page, then your page is still im-
portant.
• If only one unknown page links to yours, your page is not important.
Alternatively, there is the random surfer interpretation. A “random surfer” just sits
at his computer all day, randomly clicking on links. The pages he spends the most
time on should be the most important. So, the important (high-ranked) pages are
those where a random surfer will end up most often. This measure turns out to be
equivalent to the rank.
The Importance Matrix. Consider an Internet with n pages. The importance
matrix is the n × n matrix A whose i, j-entry is the importance that page j passes
to page i.
6.6. STOCHASTIC MATRICES 331
Observe that the importance matrix is a stochastic matrix, assuming every page
contains a link: if page i has m links, then the ith column contains the number
1/m, m times, and the number zero in the other entries.
Example. Consider the following Internet with only four pages. Links are indi-
cated by arrows.
The matrix on the left is the Importance Matrix, and the final equality expresses
the Importance Rule.
Example (A page with no links). Consider the following Internet with three pages:
(Figure: a three-page Internet in which one page has no outgoing links, followed by a five-page Internet whose importance matrix is the 5 × 5 matrix below.)
\begin{pmatrix}0&1&0&0&0\\1&0&0&0&0\\0&0&0&\tfrac12&\tfrac12\\0&0&\tfrac12&0&\tfrac12\\0&0&\tfrac12&\tfrac12&0\end{pmatrix}.
6.6. STOCHASTIC MATRICES 333
both with eigenvalue 1. So there is more than one rank vector in this case. Here
the Importance Matrix is stochastic, but not positive.
Here is Page and Brin’s solution. First we fix the importance matrix by replacing
each zero column with a column of 1/ns, where n is the number of pages:
A = \begin{pmatrix}0&0&0\\0&0&0\\1&1&0\end{pmatrix} \quad\text{becomes}\quad A' = \begin{pmatrix}0&0&1/3\\0&0&1/3\\1&1&1/3\end{pmatrix}.
The 25 Billion Dollar Eigenvector. The PageRank vector is the steady state
of the Google Matrix.
This exists and has positive entries by the Perron–Frobenius theorem. The hard
part is calculating it: in real life, the Google Matrix has zillions of rows.
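Numerically, the PageRank vector is found the same way as any other steady state: repeatedly multiply a positive starting vector by the Google Matrix. The sketch below (Python with NumPy, not part of the original text) uses a small made-up link matrix and takes the Google Matrix to be M = (1 − p)A + pB, where B is the matrix with every entry equal to 1/n and p = 0.15 is the damping factor; that construction is an assumption made here for the sake of the example.

import numpy as np

# Importance matrix of a small made-up internet (each column sums to 1).
A = np.array([[0.0, 0.5, 0.0, 0.0],
              [1/3, 0.0, 0.0, 0.5],
              [1/3, 0.0, 0.0, 0.5],
              [1/3, 0.5, 1.0, 0.0]])

p = 0.15                                    # damping factor
n = A.shape[0]
M = (1 - p) * A + p * np.ones((n, n)) / n   # Google Matrix: positive and stochastic

v = np.ones(n) / n
for _ in range(100):
    v = M @ v
print(v)   # the PageRank vector: the steady state of M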
Example. What is the PageRank vector for the following Internet? (Use the damp-
ing factor p = 0.15.)
(Figure: the four-page Internet A, B, C, D from the earlier example, together with the computed rank of each page.)
Page C is the most important, with a rank of 0.558, and page B is the least
important, with a rank of 0.1752.
336 CHAPTER 6. EIGENVALUES AND EIGENVECTORS
Chapter 7
Orthogonality
closest point
337
338 CHAPTER 7. ORTHOGONALITY
Example. In data modeling, one often asks: “what line is my data supposed to lie
on?” This can be solved using a simple application of the least-squares method.
Example. Gauss invented the method of least squares to find a best-fit ellipse: he
correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind
the sun in 1801.
Objectives
7.1. DOT PRODUCTS AND ORTHOGONALITY 339
1. Understand the relationship between the dot product, length, and distance.
3. Vocabulary words: dot product, length, distance, unit vector, unit vector
in the direction of x.
closest point
The closest point has the property that the difference between the two points is
orthogonal, or perpendicular, to the subspace. For this reason, we need to develop
robust notions of orthogonality, length, and distance.
xn yn
1. Commutativity: x · y = y · x.
xn xn
• x·x ≥0
• x · x = 0 ⇐⇒ x = 0.
It is easy to see why this is true for vectors in R2 , by the Pythagorean theorem.
For example,
\left\|\binom{3}{4}\right\| = \sqrt{3^2 + 4^2} = 5.
For vectors in R3 , one can check that kxk really is the length of x, although
now this requires two applications of the Pythagorean theorem.
Note that the length of a vector is the length of the arrow; if we think in terms
of points, then the length is its distance from the origin.
This says that scaling a vector by c scales its length by |c|. For example,
\left\|\binom{6}{8}\right\| = \left\|2\binom{3}{4}\right\| = 2\left\|\binom{3}{4}\right\| = 10.
Now that we have a good notion of length, we can define the distance between
points in Rn . Recall that the difference between two points x, y is naturally a
vector, namely, the vector y − x pointing from x to y.
Definition. The distance between two points x, y in Rn is the length of the vector
from x to y:
dist(x, y) = k y − xk.
\operatorname{dist}(x, y) = \|y - x\| = \left\|\binom{3}{2}\right\| = \sqrt{3^2 + 2^2} = \sqrt{13}.
Vectors with length one are very common in applications, so we give them a
name.
p
Definition. A unit vector is a vector x with length kxk = x · x = 1.
The standard coordinate vectors e1 , e2 , e3 , . . . are unit vectors (have length one):
\|e_1\| = \left\|\begin{pmatrix}1\\0\\0\end{pmatrix}\right\| = \sqrt{1^2 + 0^2 + 0^2} = 1.
For any nonzero vector x, there is a unique unit vector pointing in the same direc-
tion. It is obtained by dividing by the length of x.
342 CHAPTER 7. ORTHOGONALITY
Fact. Let x be a nonzero vector in Rn . The unit vector in the direction of x is the
vector x/kxk.
This is in fact a unit vector (noting that kxk is a positive number, so 1/kxk =
1/kxk):
x 1
= kxk = 1.
kxk kxk
Example. What is the unit vector u in the direction of x = \binom{3}{4}?
Solution. We divide by the length of x:
u = \frac{x}{\|x\|} = \frac{1}{\sqrt{3^2+4^2}}\binom{3}{4} = \frac15\binom{3}{4} = \binom{3/5}{4/5}.
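These formulas are one-liners in code. A small sketch (Python with NumPy, not part of the original text) reproducing the computations above:

import numpy as np

x = np.array([3.0, 4.0])
print(np.linalg.norm(x))         # length: sqrt(3^2 + 4^2) = 5.0
print(np.linalg.norm(2 * x))     # scaling by 2 doubles the length: 10.0
print(x / np.linalg.norm(x))     # unit vector in the direction of x: (0.6, 0.8)

y = np.array([1.0, 2.0])
print(np.linalg.norm(y - x))     # dist(x, y) = ||y - x||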
Since 0 · x = 0 for any vector x, the zero vector is orthogonal to every vector
in Rn .
We can see why the important fact is true using the law of cosines. In our
language, the law of cosines asserts that if x, y are two nonzero vectors, and if α
is the angle between them, then
k y − xk2 = kxk2 + k yk2 − 2kxk k yk cos α.
In matrix form:
\begin{pmatrix}1&1&-1\\1&1&1\end{pmatrix} \xrightarrow{\text{RREF}} \begin{pmatrix}1&1&0\\0&0&1\end{pmatrix}.
For instance,
\begin{pmatrix}-1\\1\\0\end{pmatrix}\cdot\begin{pmatrix}1\\1\\-1\end{pmatrix} = 0 \quad\text{and}\quad \begin{pmatrix}-1\\1\\0\end{pmatrix}\cdot\begin{pmatrix}1\\1\\1\end{pmatrix} = 0.
Remark (Angle between two vectors). More generally, the law of cosines gives a formula for the angle α between two nonzero vectors x and y:
\alpha = \cos^{-1}\left(\frac{x\cdot y}{\|x\|\,\|y\|}\right).
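For reference, here is how one would compute this angle numerically (Python with NumPy, not part of the original text); the clip guards against rounding errors pushing the cosine slightly outside [−1, 1].

import numpy as np

def angle(x, y):
    # Angle in radians between the nonzero vectors x and y.
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))

print(np.degrees(angle(np.array([1.0, 0.0]), np.array([1.0, 1.0]))))   # 45.0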
Objectives
It will be important to compute the set of all vectors that are orthogonal to a
given set of vectors. It turns out that a vector is orthogonal to a set of vectors if
and only if it is orthogonal to the span of those vectors, which is a subspace, so we
restrict ourselves to the case of subspaces.
This is the set of all vectors v in Rn that are orthogonal to all of the vectors in
W . We will show below that W ⊥ is indeed a subspace.
W
W⊥
346 CHAPTER 7. ORTHOGONALITY
The orthogonal complement of the line spanned by v is the perpendicular line. Click
and drag the head of v to move it.
W
⊥
W
The orthogonal complement of the line spanned by v is the perpendicular plane. Click
and drag the head of v to move it.
W⊥
W
7.2. ORTHOGONAL COMPLEMENTS 347
W^\perp = \big\{\text{all vectors orthogonal to each } v_1, v_2, \ldots, v_m\big\} = \operatorname{Nul}\begin{pmatrix}\text{---}\,v_1^T\,\text{---}\\\text{---}\,v_2^T\,\text{---}\\\vdots\\\text{---}\,v_m^T\,\text{---}\end{pmatrix}.
Proof. To justify the first equality, we need to show that a vector x is perpendicular to all of the vectors in W if and only if it is perpendicular to v_1, v_2, \ldots, v_m.
Since the vi are contained in W , we really only have to show that if x · v1 = x · v2 =
· · · = x · vm = 0, then x is perpendicular to every vector v in W . Indeed, any vector
in W has the form v = c1 v1 + c2 v2 + · · · + cm vm for suitable scalars c1 , c2 , . . . , cm , so
x · v = x · (c1 v1 + c2 v2 + · · · + cm vm )
= c1 (x · v1 ) + c2 (x · v2 ) + · · · + cm (x · vm )
= c1 (0) + c2 (0) + · · · + cm (0) = 0.
Therefore, x is in W ⊥ .
To prove the second equality, we let
— v1T —
— v2T —
A= .. .
.
— vmT —
By the row–column rule for matrix multiplication in Section 3.3, for any vector x in R^n we have
Ax = \begin{pmatrix}v_1^Tx\\v_2^Tx\\\vdots\\v_m^Tx\end{pmatrix} = \begin{pmatrix}v_1\cdot x\\v_2\cdot x\\\vdots\\v_m\cdot x\end{pmatrix}.
then Span\{v_1, v_2\}^\perp is the solution set of the homogeneous linear system associated to the matrix
\begin{pmatrix}\text{---}\,v_1^T\,\text{---}\\\text{---}\,v_2^T\,\text{---}\end{pmatrix} = \begin{pmatrix}1&7&2\\-2&3&1\end{pmatrix}.
This is the solution set of the system of equations
\begin{cases}x_1 + 7x_2 + 2x_3 = 0\\ -2x_1 + 3x_2 + x_3 = 0.\end{cases}
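Since W⊥ is a null space, it can be computed with any null-space routine. A sketch using SciPy (not part of the original text), applied to the two vectors above:

import numpy as np
from scipy.linalg import null_space

v1 = np.array([1.0, 7.0, 2.0])
v2 = np.array([-2.0, 3.0, 1.0])

A = np.vstack([v1, v2])        # rows are v1^T and v2^T
W_perp = null_space(A)         # columns form an orthonormal basis of W-perp
print(W_perp)
print(np.allclose(A @ W_perp, 0))   # True: each basis vector is orthogonal to v1 and v2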
Example. Compute W^\perp, where
W = \operatorname{Span}\left\{\begin{pmatrix}1\\1\\-1\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}\right\}.
The orthogonal complement of the plane spanned by v = (1, 1, −1) and w = (1, 1, 1).
1. W ⊥ is also a subspace of Rn .
2. (W ⊥ )⊥ = W.
3. dim(W ) + dim(W ⊥ ) = n.
Proof. For the first assertion, we verify the three defining properties of subspaces.
1. The zero vector is in W ⊥ because the zero vector is orthogonal to every vector
in Rn .
(u + v) · x = u · x + v · x = 0 + 0 = 0.
(cu) · x = c(u · x) = c0 = 0.
has more columns than rows (it is “wide”), so its null space is nonzero by this note
in Section 4.2. Let x be a nonzero vector in Nul(A). Then
T
v1 x v1 · x
v2T x v2 · x
0 = Ax =
... = ...
vkT x vk · x
Row(A) = Col(AT ).
We showed in the above proposition that if A has rows v1T , v2T , . . . , vmT , then
Taking orthogonal complements of both sides and using the second fact gives
Row(A) = Nul(A)⊥ .
352 CHAPTER 7. ORTHOGONALITY
To summarize:
— v1T —
— v2T —
Span{v1 , v2 , . . . , vm }⊥ = Nul .. .
.
— vmT —
so
W^\perp = \operatorname{Row}\begin{pmatrix}-3&4&-1\\3&-3&0\\-2&4&-2\end{pmatrix} = \operatorname{Span}\left\{\begin{pmatrix}-3\\4\\-1\end{pmatrix}, \begin{pmatrix}3\\-3\\0\end{pmatrix}, \begin{pmatrix}-2\\4\\-2\end{pmatrix}\right\}.
These vectors are necessarily linearly dependent (why?).
Remark (Row rank and column rank). Let A be an m × n matrix. By the rank
theorem in Section 3.9, we have
dim Col(A) + dim Nul(A) = n.
On the other hand the third fact says that
dim Nul(A)⊥ + dim Nul(A) = n,
which implies dim Col(A) = dim Nul(A)⊥ . Since Nul(A)⊥ = Row(A), we have
dim Col(A) = dim Row(A).
In other words, the span of the rows of A has the same dimension as the span of the
columns of A, even though the first lives in Rn and the second lives in Rm . This fact
is often stated as “the row rank equals the column rank.”
Objectives
1. Understand the orthogonal decomposition of a vector with respect to a sub-
space.
2. Understand the relationship between orthogonal decomposition and orthog-
onal projection.
3. Understand the relationship between orthogonal decomposition and the clos-
est vector on / distance to a subspace.
4. Learn the basic properties of orthogonal projections as linear transforma-
tions and as matrix transformations.
5. Recipes: orthogonal projection onto a line, orthogonal decomposition by
solving a system of equations, orthogonal projection via a complicated ma-
trix product.
6. Pictures: orthogonal decomposition, orthogonal projection.
7. Vocabulary words: orthogonal decomposition, orthogonal projection.
x − xW
xW W
x = xW + xW ⊥
x = x W + x W ⊥ = yW + yW ⊥
x W − yW = yW ⊥ − x W ⊥ .
Since W and W ⊥ are subspaces, the left side of the equation is in W and the right
side is in W ⊥ . Therefore, x W − yW is in W and in W ⊥ , so it is orthogonal to itself,
which implies x W − yW = 0. Hence x W = yW and x W ⊥ = yW ⊥ , which proves
uniqueness.
7.3. ORTHOGONAL PROJECTION 355
x = xW + xW ⊥
We see that the orthogonal decomposition in this case expresses a vector in terms
of a “horizontal” component (in the x y-plane) and a “vertical” component (on the
z-axis).
xW ⊥
xW W
356 CHAPTER 7. ORTHOGONALITY
AT x = AT (x W + x W ⊥ ) = AT x W + AT x W ⊥ = AT x W .
AT x = AT x W = AT Ac.
x_L = \frac{u\cdot x}{u\cdot u}\,u.
To reiterate:
Example (Projection onto a line in R²). Compute the orthogonal projection of x = \binom{-6}{4} onto the line L spanned by u = \binom{3}{2}, and find the distance from x to L.
Solution. First we find
x_L = \frac{x\cdot u}{u\cdot u}\,u = \frac{-18+8}{9+4}\binom{3}{2} = -\frac{10}{13}\binom{3}{2} \qquad x_{L^\perp} = x - x_L = \frac{1}{13}\binom{-48}{72}.
The distance from x to L is \|x_{L^\perp}\| = \frac{1}{13}\sqrt{48^2 + 72^2} = \frac{24}{\sqrt{13}} \approx 6.656.
x W = Ac xW ⊥ = x − xW .
Example (Projection onto the x y-plane). Use the theorem to compute the orthog-
onal decomposition of a vector with respect to the x y-plane in R3 .
Solution. A basis for the x y-plane is given by the two standard coordinate vec-
tors
1 0
e1 = 0
e2 = 1 .
0 0
Let A be the matrix with columns e1 , e2 :
1 0
A = 0 1.
0 0
Then
x1 x1
1 0 1 0 0 x
AT A = = I2 AT
x2 =
x2 = 1 .
0 1 0 1 0 x2
x3 x3
We have
A^TA = \begin{pmatrix}2&1\\1&2\end{pmatrix} \qquad A^Tx = \binom{-2}{3}.
We form an augmented matrix and row reduce:
\left(\begin{array}{cc|c}2&1&-2\\1&2&3\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{cc|c}1&0&-7/3\\0&1&8/3\end{array}\right) \implies c = \frac13\binom{-7}{8}.
It follows that
x_W = Ac = \frac13\begin{pmatrix}1\\8\\7\end{pmatrix} \qquad x_{W^\perp} = x - x_W = \frac13\begin{pmatrix}2\\-2\\2\end{pmatrix}.
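The same computation in code: solve AᵀAc = Aᵀx and set x_W = Ac. A sketch (Python with NumPy, not part of the original text), using the basis of this example and taking x = (1, 2, 3), which is consistent with the numbers above:

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [-1.0, 0.0]])            # columns span W
x = np.array([1.0, 2.0, 3.0])

c = np.linalg.solve(A.T @ A, A.T @ x)  # solve the normal equations A^T A c = A^T x
x_W = A @ c
print(x_W)        # (1/3, 8/3, 7/3)
print(x - x_W)    # x_{W-perp} = (2/3, -2/3, 2/3)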
1 0 1
0 1 1
A= .
−1 0 1
0 −1 −1
We compute
2 0 0 −3
AT A = 0 2 2 AT x = −3 .
0 2 4 0
We form an augmented matrix and row reduce:
2 0 0 −3 1 0 0 −3/2 −3
RREF 1
0 2 2 −3 −−→ 0 1 0 −3 =⇒ c = −6 .
0 2 4 0 0 0 1 3/2 2 3
It follows that
0 0
1 −3 1 5
xW = Ac = xW ⊥ = .
2 6 2 0
3 5
In the context of the above theorem, if we start with a basis of W , then it turns
out that the square matrix AT A is automatically invertible! (It is always the case
that AT A is square and the equation AT Ac = AT x is consistent, but AT A need not be
invertible in general.)
x W = A(AT A)−1 AT x.
Proof. We will show that Nul(AT A) = {0}, which implies invertibility by the invert-
ible matrix theorem in Section 6.1. Suppose that AT Ac = 0. Then AT Ac = AT 0, so
0W = Ac by the theorem. But 0W = 0 (the orthogonal decomposition of the zero
vector is just 0 = 0 + 0), so Ac = 0, and therefore c is in Nul(A). Since the columns
of A are linearly independent, we have c = 0, so Nul(AT A) = 0, as desired.
Let x be a vector in Rn and let c be a solution of AT Ac = AT x. Then c =
(AT A)−1 AT x, so x W = Ac = A(AT A)−1 AT x.
Example (Matrix for a projection). Continuing with the above example, let
W = \operatorname{Span}\left\{\begin{pmatrix}1\\0\\-1\end{pmatrix}, \begin{pmatrix}1\\1\\0\end{pmatrix}\right\} \qquad x = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}.
Compute x_W.
Solution. Clearly the spanning vectors are noncollinear, so according to the corollary, we have x_W = A(A^TA)^{-1}A^Tx, where
A = \begin{pmatrix}1&1\\0&1\\-1&0\end{pmatrix}.
We compute
A^TA = \begin{pmatrix}2&1\\1&2\end{pmatrix} \implies (A^TA)^{-1} = \frac13\begin{pmatrix}2&-1\\-1&2\end{pmatrix},
so
x_W = A(A^TA)^{-1}A^Tx = \begin{pmatrix}1&1\\0&1\\-1&0\end{pmatrix}\cdot\frac13\begin{pmatrix}2&-1\\-1&2\end{pmatrix}\begin{pmatrix}1&0&-1\\1&1&0\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \frac13\begin{pmatrix}2&1&-1\\1&2&1\\-1&1&2\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \frac13\begin{pmatrix}2x_1+x_2-x_3\\x_1+2x_2+x_3\\-x_1+x_2+2x_3\end{pmatrix}.
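One can also form the projection matrix B = A(AᵀA)⁻¹Aᵀ once and reuse it. A sketch (Python with NumPy, not part of the original text) for the same plane W:

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [-1.0, 0.0]])

B = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(3 * B))           # [[2, 1, -1], [1, 2, 1], [-1, 1, 2]]
print(np.allclose(B @ B, B))     # True: projecting twice equals projecting once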
1. T is a linear transformation.
4. T ◦ T = T .
5. The range of T is W .
Proof.
7.3. ORTHOGONAL PROJECTION 363
T (c x) = (c x)W = c x W = cT (x).
We compute the standard matrix of the orthogonal projection in the same way
as for any other transformation: by evaluating on the standard coordinate vec-
tors. In this case, this means projecting the standard coordinate vectors onto the
subspace.
Example (Matrix of a projection). Let L be the line in R² spanned by the vector u = \binom{3}{2}, and define T: R² → R² by T(x) = x_L. Compute the standard matrix B for T.
Solution. The columns of B are T(e_1) = (e_1)_L and T(e_2) = (e_2)_L. We have
(e_1)_L = \frac{u\cdot e_1}{u\cdot u}\,u = \frac{3}{13}\binom{3}{2} \qquad (e_2)_L = \frac{u\cdot e_2}{u\cdot u}\,u = \frac{2}{13}\binom{3}{2} \implies B = \frac{1}{13}\begin{pmatrix}9&6\\6&4\end{pmatrix}.
To compute each (ei )W , we solve the matrix equation AT Ac = AT ei for c, then use
the equality (ei )W = Ac. First we note that
2 1 1 0 −1
A A=
T
; A ei = the ith column of A =
T T
.
1 2 1 1 0
and for e_3:
\left(\begin{array}{cc|c}2&1&-1\\1&2&0\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{cc|c}1&0&-2/3\\0&1&1/3\end{array}\right) \implies (e_3)_W = A\binom{-2/3}{1/3} = \frac13\begin{pmatrix}-1\\1\\2\end{pmatrix}.
It follows that
B = \frac13\begin{pmatrix}2&1&-1\\1&2&1\\-1&1&2\end{pmatrix}.
A(AT A)−1 AT .
1. Col(B) = W.
2. Nul(B) = W ⊥ .
3. B 2 = B.
4. B is similar to the diagonal matrix with m ones and n−m zeros on the diagonal,
where m = dim(W ).
Proof. The first three assertions are translations of properties 5, 3, and 4, respec-
tively, using this important note in Section 4.1 and this theorem in Section 4.4.
For the final assertion, we showed in the proof of this theorem that there is a
basis of Rn of the form {v1 , . . . , vm , vm+1 , . . . , vn }, where {v1 , . . . , vm } is a basis for
W and {vm+1 , . . . , vn } is a basis for W ⊥ . Each vi is an eigenvector of B: indeed, for
i ≤ m we have
Bvi = T (vi ) = vi = 1 · vi
because vi is in W , and for i > m we have
Bvi = T (vi ) = 0 = 0 · vi
One can verify by hand that B 2 = B (try it!). We compute W ⊥ as the null space of
1 0 −1 RREF 1 0 −1
−−→ .
1 1 0 0 1 1
Remark. As we saw in this example, if you are willing to compute bases for W and
W ⊥ , then this provides a third way of finding the standard matrix B for projection
onto W : indeed, if {v1 , v2 , . . . , vm } is a basis for W and {vm+1 , vm+2 , . . . , vn } is a basis
for W ⊥ , then
1 ··· 0 0 ··· 0
... . . . ... ... . . . ... −1
| | | 0 ··· 1 0 ··· 0
| | |
B = v1 v1 · · · vn 0 · · · 0 0 · · · 0 v1 v1 · · · vn
,
| | | | | |
.. . . . .. .. . . . ..
. . . .
0 ··· 0 0 ··· 0
where the middle matrix in the product is the diagonal matrix with m ones and
n − m zeros on the diagonal. However, since you already have a basis for W , it is
faster to multiply out the expression A(AT A)−1 AT as in the corollary.
Remark (Reflections). Let W be a subspace of Rn , and let x be a vector in Rn . The
reflection of x over W is defined to be the vector
refW (x) = x − 2x W ⊥ .
In other words, to find refW (x) one starts at x, then moves to x − x W ⊥ = x W , then
continues in the same direction one more time, to end on the opposite side of W .
x
−x W ⊥
xW
W refW (x)
−x W ⊥
3. refW is similar to the diagonal matrix with m = dim(W ) ones on the diagonal
and n − m negative ones.
368 CHAPTER 7. ORTHOGONALITY
Objectives
e3
e2
e1
is orthonormal.
370 CHAPTER 7. ORTHOGONALITY
If \{v_1, v_2, \ldots, v_m\} is an orthogonal set of nonzero vectors, then
\left\{\frac{v_1}{\|v_1\|}, \frac{v_2}{\|v_2\|}, \ldots, \frac{v_m}{\|v_m\|}\right\}
is an orthonormal set.
Is B = {u1 , u2 } orthogonal?
Solution. We only have to check that
\binom{a}{b}\cdot\binom{-b}{a} = -ab + ab = 0.
A nice property enjoyed by orthogonal sets is that they are automatically lin-
early independent.
Fact. An orthogonal set is linearly independent. Therefore, it is a basis for its span.
7.4. ORTHOGONAL SETS 371
Proof. Suppose that {u1 , u2 , . . . , um } is orthogonal. We need to show that the equa-
tion
c1 u 1 + c2 u 2 + · · · + c m u m = 0
has only the trivial solution c1 = c2 = · · · = cm = 0. Taking the dot product of both
sides of this equation with u1 gives
0 = u1 · 0 = u1 · c1 u1 + c2 u2 + · · · + cm um
= c1 (u1 · u1 ) + c2 (u1 · u2 ) + · · · + cm (u1 · um )
= c1 (u1 · u1 )
because u1 · ui = 0 for i > 1. Since u1 6= 0 we have u1 · u1 6= 0, so c1 = 0. Similarly,
taking the dot product with ui shows that each ci = 0, as desired.
One advantage of working with orthogonal sets is that it gives a simple formula
for the orthogonal projection of a vector.
Projection Formula. Let W be a subspace of Rn , and let {u1 , u2 , . . . , um } be an or-
thogonal basis for W . Then for any vector x in Rn , the orthogonal projection of x
onto W is given by the formula
x · u1 x · u2 x · um
xW = u1 + u2 + · · · + um .
u1 · u1 u2 · u2 um · um
Proof. Let
x · u1 x · u2 x · um
y= u1 + u2 + · · · + um .
u1 · u1 u2 · u2 um · um
This vector is contained in W because it is a linear combination of u1 , u2 , . . . , um .
Hence we just need to show that x − y is in W ⊥ , i.e., that ui · (x − y) = 0 for each
i = 1, 2, . . . , m. For u1 , we have
x · u1 x · u2 x · um
u1 · (x − y) = u1 · x − u1 − u2 − · · · − um
u1 · u1 u2 · u2 um · um
x · u1
= u1 · x − (u1 · u1 ) − 0 − · · · − 0
u1 · u1
= 0.
A similar calculation shows that ui · (x − y) = 0 for each i, so x − y is in W ⊥ , as
desired.
If {u1 , u2 , . . . , um } is an orthonormal basis for W , then the denominators ui ·ui =
1 go away, so the projection formula becomes even simpler:
x W = (x · u1 ) u1 + (x · u2 ) u2 + · · · + (x · um ) um .
Example. Suppose that L = Span{u} is a line. The set {u} is an orthogonal basis
for L, so the Projection Formula says that for any vector x, we have
x ·u
xL = u,
u·u
as in this example in Section 7.3. See also this example in Section 7.3 and this
example in Section 7.3.
372 CHAPTER 7. ORTHOGONALITY
In other words, for an orthogonal basis, the projection of x onto W is the sum of the
projections onto the lines spanned by the basis vectors. In this sense, projection
onto a line is the most important example of an orthogonal projection.
Example (Projection onto the x y-plane). Continuing with this example in Sec-
tion 7.3 and this example in Section 7.3, use the projection formula to compute
the orthogonal projection of a vector onto the x y-plane in R3 .
Solution. A basis for the x y-plane is given by the two standard coordinate vec-
tors
1 0
e1 = 0
e2 = 1 .
0 0
Orthogonal projection of a vector onto the x y-plane in R3 . Note that x W is the sum of
the projections of x onto the e1 - and e2 -coordinate axes (shown in orange and brown,
respectively).
Find x W and x W ⊥ .
Solution. The vectors
1 1
u1 = 0 u2 = 1
−1 1
7.4. ORTHOGONAL SETS 373
Then we have
−1
xW ⊥ = x − xW = 2 .
−1
Orthogonal projection of a vector onto the plane W . Note that x W is the sum of the
projections of x onto the lines spanned by u1 and u2 (shown in orange and brown,
respectively).
[x]_{\mathcal B} = \left(\frac{x\cdot u_1}{u_1\cdot u_1},\ \frac{x\cdot u_2}{u_2\cdot u_2},\ \ldots,\ \frac{x\cdot u_m}{u_m\cdot u_m}\right).
Solution. Since
u_1 = \binom{1}{2} \qquad u_2 = \binom{-4}{2}
form an orthogonal basis of R², we have
[x]_{\mathcal B} = \left(\frac{x\cdot u_1}{u_1\cdot u_1},\ \frac{x\cdot u_2}{u_2\cdot u_2}\right) = \left(\frac{3\cdot 2}{1^2+2^2},\ \frac{3\cdot 2}{(-4)^2+2^2}\right) = \left(\frac65,\ \frac{3}{10}\right).
The following example shows that the Projection Formula does in fact require
an orthogonal basis.
Non-Example (A non-orthogonal basis). Consider the basis B = \{v_1, v_2\} of R², where
v_1 = \binom{2}{-1/2} \qquad v_2 = \binom{1}{2}.
This is not orthogonal because v_1\cdot v_2 = 2 - 1 = 1 \ne 0. Let x = \binom{1}{1}. Let us try to compute x = x_{R^2} using the Projection Formula with respect to the basis B:
\frac{x\cdot v_1}{v_1\cdot v_1}v_1 + \frac{x\cdot v_2}{v_2\cdot v_2}v_2 = \frac{3/2}{17/4}\binom{2}{-1/2} + \frac35\binom{1}{2} = \binom{111/85}{87/85} \ne x.
Since x = x_{R^2}, we see that the Projection Formula does not compute the orthogonal projection in this case. Geometrically, the projections of x onto the lines spanned
by v1 and v2 do not sum to x, as we can see from the picture.
When v1 and v2 are not orthogonal, then x R2 = x is not necessarily equal to the sum
(red) of the projections (orange and brown) of x onto the lines spanned by v1 and v2 .
1. u_1 = v_1
2. u_2 = (v_2)_{\operatorname{Span}\{u_1\}^\perp} = v_2 - \frac{v_2\cdot u_1}{u_1\cdot u_1}u_1
3. u_3 = (v_3)_{\operatorname{Span}\{u_1,u_2\}^\perp} = v_3 - \frac{v_3\cdot u_1}{u_1\cdot u_1}u_1 - \frac{v_3\cdot u_2}{u_2\cdot u_2}u_2
\vdots
m. u_m = (v_m)_{\operatorname{Span}\{u_1,u_2,\ldots,u_{m-1}\}^\perp} = v_m - \sum_{i=1}^{m-1}\frac{v_m\cdot u_i}{u_i\cdot u_i}u_i.
v2
u2 = (v2 ) L1⊥
L1 v1 = u 1
v3 · u1 v3 · u2
3. u3 = v3 − u1 − u2
u1 · u1 u2 · u2
3 1 0 1
4 1
= 1 −
1 − 0 = −1 .
1 2 0 1 1 0
u1 · u2 = 0 u1 · u3 = 0 u2 · u3 = 0.
Example. Find an orthogonal basis \{u_1, u_2, u_3\} for the span of the vectors
v_1 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix} \qquad v_2 = \begin{pmatrix}-1\\4\\4\\-1\end{pmatrix} \qquad v_3 = \begin{pmatrix}4\\-2\\-2\\0\end{pmatrix}.
Solution. We run Gram–Schmidt:
1. u_1 = v_1 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix}
2. u_2 = v_2 - \frac{v_2\cdot u_1}{u_1\cdot u_1}u_1 = \begin{pmatrix}-1\\4\\4\\-1\end{pmatrix} - \frac64\begin{pmatrix}1\\1\\1\\1\end{pmatrix} = \begin{pmatrix}-5/2\\5/2\\5/2\\-5/2\end{pmatrix}
3. u_3 = v_3 - \frac{v_3\cdot u_1}{u_1\cdot u_1}u_1 - \frac{v_3\cdot u_2}{u_2\cdot u_2}u_2 = \begin{pmatrix}4\\-2\\-2\\0\end{pmatrix} - \frac04\,u_1 - \frac{-20}{25}\begin{pmatrix}-5/2\\5/2\\5/2\\-5/2\end{pmatrix} = \begin{pmatrix}2\\0\\0\\-2\end{pmatrix}.
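The procedure translates directly into code. A short sketch (Python with NumPy, not part of the original text) that runs Gram–Schmidt on the vectors of this example and reproduces u1, u2, u3:

import numpy as np

def gram_schmidt(vectors):
    # Orthogonal basis for the span of the given linearly independent vectors.
    basis = []
    for v in vectors:
        u = v.astype(float)
        for b in basis:
            u = u - (v @ b) / (b @ b) * b   # subtract the projection onto Span{b}
        basis.append(u)
    return basis

v1 = np.array([1, 1, 1, 1])
v2 = np.array([-1, 4, 4, -1])
v3 = np.array([4, -2, -2, 0])
for u in gram_schmidt([v1, v2, v3]):
    print(u)   # (1,1,1,1), (-2.5, 2.5, 2.5, -2.5), (2, 0, 0, -2)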
We saw in the proof of the Gram–Schmidt Process that for every i between 1 and m, the set \{u_1, u_2, \ldots, u_i\} is an orthogonal basis for Span\{v_1, v_2, \ldots, v_i\}.
If we had started with a spanning set {v1 , v2 , . . . , vm } which is linearly depen-
dent, then for some i, the vector vi is in Span{v1 , v2 , . . . , vi−1 } by the increasing
span criterion in Section 3.5. Hence
You can use the Gram–Schmidt Process to produce an orthogonal basis from
any spanning set: if some ui = 0, just throw away ui and vi , and continue.
380 CHAPTER 7. ORTHOGONALITY
1. If you already have an orthogonal basis, it is almost always easier to use the
projection formula. This often happens in the sciences.
2. If you are going to have to compute the projections of many vectors onto the
same subspace, it is worth your time to run Gram–Schmidt to produce an
orthogonal basis, so that you can use the projection formula.
3. If you only have to project one or a few vectors onto a subspace, it is faster
to use the theorem in Section 7.3. This is the method we will follow in
Section 7.5.
Objectives
Suppose that Ax = b does not have a solution. What is the best ap-
proximate solution?
For our purposes, the best approximate solution is called the least-squares solution.
We will present two methods for finding least-squares solutions, and we will give
several applications to best-fit problems.
7.5. THE METHOD OF LEAST SQUARES 381
\operatorname{dist}(b, A\hat x) \le \operatorname{dist}(b, Ax)
Recall that dist(v, w) = \|v - w\| is the distance between the vectors v and w. The term “least squares” comes from the fact that dist(b, A\hat x) = \|b - A\hat x\| is the square root of the sum of the squares of the entries of the vector b - A\hat x. So a least-squares solution \hat x minimizes the sum of the squares of the differences between the entries of A\hat x and b. In other words, a least-squares solution solves the equation Ax = b as closely as possible, in the sense that the sum of the squares of the difference b − Ax is minimized.
Where is \hat x in this picture? If v_1, v_2, \ldots, v_n are the columns of A, then
A\hat x = A\begin{pmatrix}\hat x_1\\\hat x_2\\\vdots\\\hat x_n\end{pmatrix} = \hat x_1v_1 + \hat x_2v_2 + \cdots + \hat x_nv_n.
Hence the entries of \hat x are the “coordinates” of b_{Col(A)} with respect to the spanning set \{v_1, v_2, \ldots, v_n\} of Col(A). (They are honest \mathcal B-coordinates if the columns of A are linearly independent.)
The violet plane is Col(A). The closest that Ax can get to b is the closest vector on
Col(A) to b, which is the orthogonal projection bCol(A) (in blue). The vectors v1 , v2 are
the columns of A, and the coefficients of b
x are the lengths of the green lines. Click and
drag b to move it.
A^TAx = A^Tb.
To reiterate: once you have found a least-squares solution \hat x of Ax = b, then b_{Col(A)} is equal to A\hat x.
and
A^Tb = \begin{pmatrix}0&1&2\\1&1&1\end{pmatrix}\begin{pmatrix}6\\0\\0\end{pmatrix} = \binom{0}{6}.
We form an augmented matrix and row reduce:
\left(\begin{array}{cc|c}5&3&0\\3&3&6\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{cc|c}1&0&-3\\0&1&5\end{array}\right).
Therefore, the only least-squares solution is \hat x = \binom{-3}{5}.
This solution minimizes the distance from A\hat x to b, i.e., the sum of the squares of the entries of b - A\hat x = b - b_{Col(A)} = b_{Col(A)^\perp}. In this case, we have
\|b - A\hat x\| = \left\|\begin{pmatrix}6\\0\\0\end{pmatrix} - \begin{pmatrix}5\\2\\-1\end{pmatrix}\right\| = \left\|\begin{pmatrix}1\\-2\\1\end{pmatrix}\right\| = \sqrt{1^2 + (-2)^2 + 1^2} = \sqrt6.
Therefore, b_{Col(A)} = A\hat x is \sqrt6 units from b.
In the following picture, v1 , v2 are the columns of A:
b
5v2
v1 p −3v1
6
v2
−3
bCol(A) = A
5
Col A
The violet plane is Col(A). The closest that Ax can get to b is the closest vector on
Col(A) to b, which is the orthogonal projection bCol(A) (in blue). The vectors v1 , v2
x are the B-coordinates of bCol(A) , where
are the columns of A, and the coefficients of b
B = {v1 , v2 }.
Solution. We have
A^TA = \begin{pmatrix}2&-1&0\\0&1&2\end{pmatrix}\begin{pmatrix}2&0\\-1&1\\0&2\end{pmatrix} = \begin{pmatrix}5&-1\\-1&5\end{pmatrix}
and
A^Tb = \begin{pmatrix}2&-1&0\\0&1&2\end{pmatrix}\begin{pmatrix}1\\0\\-1\end{pmatrix} = \binom{2}{-2}.
We form an augmented matrix and row reduce:
\left(\begin{array}{cc|c}5&-1&2\\-1&5&-2\end{array}\right) \xrightarrow{\text{RREF}} \left(\begin{array}{cc|c}1&0&1/3\\0&1&-1/3\end{array}\right).
Therefore, the only least-squares solution is \hat x = \frac13\binom{1}{-1}.
The red plane is Col(A). The closest that Ax can get to b is the closest vector on
Col(A) to b, which is the orthogonal projection bCol(A) (in blue). The vectors v1 , v2
x are the B-coordinates of bCol(A) , where
are the columns of A, and the coefficients of b
B = {v1 , v2 }.
The reader may have noticed that we have been careful to say “the least-squares
solutions” in the plural, and “a least-squares solution” using the indefinite article.
This is because a least-squares solution need not be unique: indeed, if the columns
of A are linearly dependent, then Ax = bCol(A) has infinitely many solutions. The
following theorem, which gives equivalent criteria for uniqueness, is an analogue
of this corollary in Section 7.3.
Theorem. Let A be an m × n matrix and let b be a vector in Rm . The following are
equivalent:
1. Ax = b has a unique least-squares solution.
3. AT A is invertible.
In this case, the least-squares solution is
\hat x = (A^TA)^{-1}A^Tb.
Proof. The set of least-squares solutions of Ax = b is the solution set of the consis-
tent equation AT Ax = AT b, which is a translate of the solution set of the homoge-
neous equation AT Ax = 0. Since AT A is a square matrix, the equivalence of 1 and 3
follows from the invertible matrix theorem in Section 6.1. The set of least squares-
solutions is also the solution set of the consistent equation Ax = bCol(A) , which has
a unique solution if and only if the columns of A are linearly independent by this
theorem in Section 3.5.
Example (Infinitely many least-squares solutions). Find the least-squares solu-
tions of Ax = b where:
1 0 1 6
A = 1 1 −1
b = 0 .
1 2 −3 0
Solution. We have
A^TA = \begin{pmatrix}3&3&-3\\3&5&-7\\-3&-7&11\end{pmatrix} \qquad A^Tb = \begin{pmatrix}6\\0\\6\end{pmatrix}.
Row reducing the augmented matrix (A^TA \mid A^Tb) shows that x_3 is a free variable:
\begin{cases}\hat x_1 = -x_3 + 5\\ \hat x_2 = 2x_3 - 3\\ \hat x_3 = x_3\end{cases} \xrightarrow{\text{parametric vector form}} \hat x = \begin{pmatrix}\hat x_1\\\hat x_2\\\hat x_3\end{pmatrix} = x_3\begin{pmatrix}-1\\2\\1\end{pmatrix} + \begin{pmatrix}5\\-3\\0\end{pmatrix}.
v2
v1
v3 bCol(A) = Ab
x
Col A
The three columns of A are coplanar, so there are many least-squares solutions. (The
demo picks one solution when you move b.)
7.5. THE METHOD OF LEAST SQUARES 387
b_{Col(A)} = \frac{b\cdot u_1}{u_1\cdot u_1}u_1 + \frac{b\cdot u_2}{u_2\cdot u_2}u_2 + \cdots + \frac{b\cdot u_m}{u_m\cdot u_m}u_m = A\begin{pmatrix}(b\cdot u_1)/(u_1\cdot u_1)\\(b\cdot u_2)/(u_2\cdot u_2)\\\vdots\\(b\cdot u_m)/(u_m\cdot u_m)\end{pmatrix}.
Note that the least-squares solution is unique in this case, since an orthogonal set is linearly independent. In fact, the least-squares solution is
\hat x = \left(\frac{b\cdot u_1}{u_1\cdot u_1},\ \frac{b\cdot u_2}{u_2\cdot u_2},\ \ldots,\ \frac{b\cdot u_m}{u_m\cdot u_m}\right).
Example (Best-fit line). What is the best-fit line through the data points (0, 6), (1, 0), and (2, 0)?
Solution. The general equation of a line is
y = M x + B.
If our three data points were to lie on this line, then the following equations would
be satisfied:
6= M ·0+B
0= M ·1+B (7.5.1)
0 = M · 2 + B.
In order to find the best-fit line, we try to solve the above equations in the un-
knowns M and B. As the three points do not actually lie on a line, there is no
actual solution, so instead we compute a least-squares solution.
Putting our linear equations into matrix form, we are trying to solve Ax = b
for
A = \begin{pmatrix}0&1\\1&1\\2&1\end{pmatrix} \qquad x = \binom{M}{B} \qquad b = \begin{pmatrix}6\\0\\0\end{pmatrix}.
We solved this least-squares problem in this example: the only least-squares solution to Ax = b is \hat x = \binom{M}{B} = \binom{-3}{5}, so the best-fit line is
y = -3x + 5.
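In practice one rarely forms the normal equations by hand; a library least-squares solver does the same job in one call. A sketch (Python with NumPy, not part of the original text) recovering the best-fit line above:

import numpy as np

# Data points (0, 6), (1, 0), (2, 0); model y = M*x + B.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([6.0, 0.0, 0.0])

A = np.column_stack([xs, np.ones_like(xs)])
coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
M, B = coeffs
print(M, B)   # -3.0  5.0, i.e. y = -3x + 5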
Now we consider what exactly the best-fit line minimizes. The least-squares solution \hat x minimizes the sum of the squares of the entries of the vector b - A\hat x. The vector b is the left-hand side of (7.5.1), and
A\binom{-3}{5} = \begin{pmatrix}-3(0)+5\\-3(1)+5\\-3(2)+5\end{pmatrix} = \begin{pmatrix}f(0)\\f(1)\\f(2)\end{pmatrix}.
In other words, A\hat x is the vector whose entries are the y-coordinates of the graph of the line at the values of x we specified in our data points, and b is the vector whose entries are the y-coordinates of those data points. The difference b - A\hat x is the vertical distance of the graph from the data points:
b - A\hat x = \begin{pmatrix}6\\0\\0\end{pmatrix} - A\binom{-3}{5} = \begin{pmatrix}1\\-2\\1\end{pmatrix}.
The best-fit line minimizes the sum of the squares of these vertical distances.
The best-fit line minimizes the sum of the squares of the vertical distances (violet).
Click and drag the points to see how the best-fit line changes.
Example (Best-fit parabola). Find the parabola that best approximates the data points
(-1, 1/2), \quad (1, -1), \quad (2, -1/2), \quad (3, 2).
Solution. The general equation for a parabola is
y = Bx^2 + Cx + D.
If the four points were to lie on this parabola, then the following equations would
be satisfied:
\tfrac12 = B(-1)^2 + C(-1) + D
-1 = B(1)^2 + C(1) + D
-\tfrac12 = B(2)^2 + C(2) + D \qquad (7.5.2)
2 = B(3)^2 + C(3) + D.
We treat this as a system of equations in the unknowns B, C, D. In matrix form, we can write this as Ax = b for
A = \begin{pmatrix}1&-1&1\\1&1&1\\4&2&1\\9&3&1\end{pmatrix} \qquad x = \begin{pmatrix}B\\C\\D\end{pmatrix} \qquad b = \begin{pmatrix}1/2\\-1\\-1/2\\2\end{pmatrix}.
The least-squares solution gives the best-fit parabola
88y = 53x^2 - \frac{379}{5}x - 82.
Now we consider what exactly the parabola y = f(x) is minimizing. The least-squares solution \hat x minimizes the sum of the squares of the entries of the vector b - A\hat x. The vector b is the left-hand side of (7.5.2), and
A\hat x = \begin{pmatrix}\frac{53}{88}(-1)^2 - \frac{379}{440}(-1) - \frac{41}{44}\\ \frac{53}{88}(1)^2 - \frac{379}{440}(1) - \frac{41}{44}\\ \frac{53}{88}(2)^2 - \frac{379}{440}(2) - \frac{41}{44}\\ \frac{53}{88}(3)^2 - \frac{379}{440}(3) - \frac{41}{44}\end{pmatrix} = \begin{pmatrix}f(-1)\\f(1)\\f(2)\\f(3)\end{pmatrix}.
In other words, A\hat x is the vector whose entries are the y-coordinates of the graph of the parabola at the values of x we specified in our data points, and b is the vector whose entries are the y-coordinates of those data points. The difference b - A\hat x is the vertical distance of the graph from the data points:
b - A\hat x = \begin{pmatrix}1/2\\-1\\-1/2\\2\end{pmatrix} - A\begin{pmatrix}53/88\\-379/440\\-41/44\end{pmatrix} = \begin{pmatrix}-7/220\\21/110\\-14/55\\21/220\end{pmatrix}.
The best-fit parabola minimizes the sum of the squares of these vertical dis-
tances.
The best-fit parabola minimizes the sum of the squares of the vertical distances (vio-
let). Click and drag the points to see how the best-fit parabola changes.
Example (Best-fit linear function). Find the linear function f (x, y) that best ap-
proximates the following data:
x y f (x, y)
1 0 0
0 1 1
−1 0 3
0 −1 4
What quantity is being minimized?
Solution. The general equation for a linear function in two variables is
f (x, y) = B x + C y + D.
We want to solve the following system of equations in the unknowns B, C, D:
B(1) + C(0) + D = 0
B(0) + C(1) + D = 1
(7.5.3)
B(−1) + C(0) + D = 3
B(0) + C(−1) + D = 4.
In matrix form, we can write this as Ax = b for
A = \begin{pmatrix}1&0&1\\0&1&1\\-1&0&1\\0&-1&1\end{pmatrix} \qquad x = \begin{pmatrix}B\\C\\D\end{pmatrix} \qquad b = \begin{pmatrix}0\\1\\3\\4\end{pmatrix}.
Now we consider what quantity is being minimized by the function f(x, y). The least-squares solution \hat x minimizes the sum of the squares of the entries of the vector b - A\hat x. The vector b is the right-hand side of (7.5.3), and
A\hat x = \begin{pmatrix}-\frac32(1) - \frac32(0) + 2\\ -\frac32(0) - \frac32(1) + 2\\ -\frac32(-1) - \frac32(0) + 2\\ -\frac32(0) - \frac32(-1) + 2\end{pmatrix} = \begin{pmatrix}f(1,0)\\f(0,1)\\f(-1,0)\\f(0,-1)\end{pmatrix}.
In other words, A\hat x is the vector whose entries are the values of f evaluated on the points (x, y) we specified in our data table, and b is the vector whose entries are the desired values of f evaluated at those points. The difference b - A\hat x is the vertical distance of the graph from the data points, as indicated in the above picture. The best-fit linear function minimizes the sum of these vertical distances.
The best-fit linear function minimizes the sum of the squares of the vertical distances
(violet). Click and drag the points to see how the best-fit linear function changes.
All of the above examples have the following form: some number of data points
(x, y) are specified, and we want to find a function
y = B1 g1 (x) + B2 g2 (x) + · · · + Bm g m (x)
that best approximates these points, where g1 , g2 , . . . , g m are fixed functions of
x. Indeed, in the best-fit line example we had g1 (x) = x and g2 (x) = 1; in the
best-fit parabola example we had g1 (x) = x 2 , g2 (x) = x, and g3 (x) = 1; and
in the best-fit linear function example we had g1 (x 1 , x 2 ) = x 1 , g2 (x 1 , x 2 ) = x 2 ,
and g3 (x 1 , x 2 ) = 1 (in this example we take x to be a vector with two entries).
We evaluate the above equation on the given data points to obtain a system of
linear equations in the unknowns B1 , B2 , . . . , Bm —once we evaluate the g i , they
just become numbers, so it does not matter what they are—and we find the least-
squares solution. The resulting best-fit function minimizes the sum of the squares
of the vertical distances from the graph of y = f (x) to our original data points.
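This recipe is easy to carry out mechanically. The sketch below is illustrative only (the function name best_fit_coefficients is made up for this note, and NumPy is assumed); it builds the matrix whose rows are (g_1(x), ..., g_m(x)), one row per data point, and returns the least-squares coefficients B_1, ..., B_m.

import numpy as np

def best_fit_coefficients(gs, points):
    """Least-squares coefficients B_1, ..., B_m for y = B_1 g_1(x) + ... + B_m g_m(x).

    gs     -- list of functions g_i, each taking x and returning a number
    points -- list of (x, y) data points
    """
    # One row per data point: the functions g_i evaluated at x.
    A = np.array([[g(x) for g in gs] for x, _ in points], dtype=float)
    b = np.array([y for _, y in points], dtype=float)
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

# Example: recover the best-fit parabola from earlier in this section.
gs = [lambda x: x**2, lambda x: x, lambda x: 1]
print(best_fit_coefficients(gs, [(-1, 0.5), (1, -1), (2, -0.5), (3, 2)]))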
To emphasize that the nature of the functions g i really is irrelevant, consider
the following example.
Example (Best-fit trigonometric function). What is the best-fit function of the form
y = B + C cos(x) + D sin(x) + E cos(2x) + F sin(2x) + G cos(3x) + H sin(3x)
passing through the points
(−4, −1), (−3, 0), (−2, −1.5), (−1, .5), (0, 1), (1, −1), (2, −.5), (3, 2), (4, −1)?
Substituting each of the nine data points into the equation y = B + C cos(x) + ··· + H sin(3x) gives a system of nine linear equations. All of the terms in these equations are numbers, except for the unknowns B, C, D, E, F, G, H, so we can find the least-squares solution as usual:
\hat{x} \approx \begin{pmatrix} -0.1435 \\ 0.2611 \\ -0.2337 \\ 1.116 \\ -0.5997 \\ -0.2767 \\ 0.1076 \end{pmatrix}.
The best-fit function is therefore
y ≈ −0.14 + 0.26 cos(x) − 0.23 sin(x) + 1.11 cos(2x) − 0.60 sin(2x) − 0.28 cos(3x) + 0.11 sin(3x).
Figure: the graph of this function together with the nine data points.
As in the previous examples, the best-fit function minimizes the sum of the
squares of the vertical distances from the graph of y = f (x) to the data points.
The best-fit function minimizes the sum of the squares of the vertical distances (violet).
Click and drag the points to see how the best-fit function changes.
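This example is also easy to check numerically. The sketch below is illustrative only (not part of the original text) and assumes NumPy; it builds one row per data point, with columns 1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x), and the coefficients it prints should agree with the values quoted above up to rounding.

import numpy as np

xs = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4], dtype=float)
ys = np.array([-1, 0, -1.5, 0.5, 1, -1, -0.5, 2, -1])

# Columns: 1, cos(x), sin(x), cos(2x), sin(2x), cos(3x), sin(3x).
A = np.column_stack([np.ones_like(xs),
                     np.cos(xs), np.sin(xs),
                     np.cos(2 * xs), np.sin(2 * xs),
                     np.cos(3 * xs), np.sin(3 * xs)])
coeffs, *_ = np.linalg.lstsq(A, ys, rcond=None)
print(np.round(coeffs, 4))   # B, C, D, E, F, G, H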
The next example has a somewhat different flavor from the previous ones.
Example (Best-fit ellipse). Find the best-fit ellipse through the points
(0, 2), (2, 1), (1, −1), (−1, −2), (−3, 1), (−1, 1).
Solution. The general equation for an ellipse is
x^2 + By^2 + Cxy + Dx + Ey + F = 0.
This is an implicit equation: the ellipse is the set of all solutions of the equation,
just like the unit circle is the set of solutions of x^2 + y^2 = 1. To say that our data
points lie on the ellipse means that the above equation is satisfied for the given
values of x and y:
(0)^2 + B(2)^2 + C(0)(2) + D(0) + E(2) + F = 0
(2)^2 + B(1)^2 + C(2)(1) + D(2) + E(1) + F = 0
(1)^2 + B(−1)^2 + C(1)(−1) + D(1) + E(−1) + F = 0
(−1)^2 + B(−2)^2 + C(−1)(−2) + D(−1) + E(−2) + F = 0          (7.5.4)
(−3)^2 + B(1)^2 + C(−3)(1) + D(−3) + E(1) + F = 0
(−1)^2 + B(1)^2 + C(−1)(1) + D(−1) + E(1) + F = 0.
To put this in matrix form, we move the constant terms to the right-hand side of
the equals sign; then we can write this as Ax = b for
A = \begin{pmatrix} 4 & 0 & 0 & 2 & 1 \\ 1 & 2 & 2 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 \\ 4 & 2 & -1 & -2 & 1 \\ 1 & -3 & -3 & 1 & 1 \\ 1 & -1 & -1 & 1 & 1 \end{pmatrix}, \qquad
x = \begin{pmatrix} B \\ C \\ D \\ E \\ F \end{pmatrix}, \qquad
b = \begin{pmatrix} 0 \\ -4 \\ -1 \\ -1 \\ -9 \\ -1 \end{pmatrix}.
We compute
A^T A = \begin{pmatrix} 36 & 5 & -5 & 2 & 12 \\ 5 & 19 & 11 & -5 & -1 \\ -5 & 11 & 16 & -1 & -2 \\ 2 & -5 & -1 & 12 & 2 \\ 12 & -1 & -2 & 2 & 6 \end{pmatrix}, \qquad
A^T b = \begin{pmatrix} -19 \\ 19 \\ 20 \\ -11 \\ -16 \end{pmatrix}.
We form an augmented matrix and row reduce:
\left(\begin{array}{ccccc|c} 36 & 5 & -5 & 2 & 12 & -19 \\ 5 & 19 & 11 & -5 & -1 & 19 \\ -5 & 11 & 16 & -1 & -2 & 20 \\ 2 & -5 & -1 & 12 & 2 & -11 \\ 12 & -1 & -2 & 2 & 6 & -16 \end{array}\right)
\;\xrightarrow{\text{RREF}}\;
\left(\begin{array}{ccccc|c} 1 & 0 & 0 & 0 & 0 & 405/266 \\ 0 & 1 & 0 & 0 & 0 & -89/133 \\ 0 & 0 & 1 & 0 & 0 & 201/133 \\ 0 & 0 & 0 & 1 & 0 & -123/266 \\ 0 & 0 & 0 & 0 & 1 & -687/133 \end{array}\right).
The least-squares solution is
\hat{x} = \begin{pmatrix} 405/266 \\ -89/133 \\ 201/133 \\ -123/266 \\ -687/133 \end{pmatrix},
so the best-fit ellipse is
x^2 + \frac{405}{266}y^2 - \frac{89}{133}xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133} = 0.
Multiplying through by 266, we can write this as
266x^2 + 405y^2 − 178xy + 402x − 123y − 1374 = 0.
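The whole computation is easy to reproduce numerically. The sketch below is illustrative only (not part of the original text) and assumes NumPy; it builds A and b from the six data points and solves the least-squares problem, and the coefficients it prints should agree with the fractions above (405/266 ≈ 1.523, −89/133 ≈ −0.669, and so on).

import numpy as np

# The six data points of the best-fit ellipse example.
points = [(0, 2), (2, 1), (1, -1), (-1, -2), (-3, 1), (-1, 1)]

# The equation is x^2 + B y^2 + C xy + D x + E y + F = 0, so each row of A
# is (y^2, xy, x, y, 1) and the corresponding entry of b is -x^2.
A = np.array([[y**2, x*y, x, y, 1] for x, y in points], dtype=float)
b = np.array([-x**2 for x, _ in points], dtype=float)

coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
print(coeffs)   # B, C, D, E, F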
Figure: the best-fit ellipse 266x^2 + 405y^2 − 178xy + 402x − 123y − 1374 = 0 together with the six data points.
Now we consider what quantity is being minimized. The vector b contains the constant terms of (7.5.4), that is, the values −x^2 that were moved to the right-hand side, and Ax̂ contains the rest of the terms on the left-hand side of (7.5.4). Therefore, the entries of Ax̂ − b are the quantities obtained by evaluating the function
f(x, y) = x^2 + \frac{405}{266}y^2 - \frac{89}{133}xy + \frac{201}{133}x - \frac{123}{266}y - \frac{687}{133}
at the six data points, and the quantity minimized by the least-squares solution is
f(0, 2)^2 + f(2, 1)^2 + f(1, −1)^2 + f(−1, −2)^2 + f(−3, 1)^2 + f(−1, 1)^2.
One way to visualize this is as follows. We can put this best-fit problem into the framework of the best-fit linear function example above by asking to find an equation of the form
f(x, y) = x^2 + By^2 + Cxy + Dx + Ey + F
that best approximates the data
x    y    f(x, y)
0    2    0
2    1    0
1    −1   0
−1   −2   0
−3   1    0
−1   1    0.
The resulting function minimizes the sum of the squares of the vertical distances
from these data points (0, 2, 0), (2, 1, 0), . . ., which lie on the x y-plane, to the
graph of f (x, y).
The best-fit ellipse minimizes the sum of the squares of the vertical distances (violet)
from the points (x, y, 0) to the graph of f (x, y) on the left. The ellipse itself is the
zero set of f (x, y), on the right. Click and drag the points on the right to see how the
best-fit ellipse changes. Can you arrange the points so that the best-fit conic section
is actually a hyperbola?
Note. Gauss invented the method of least squares to find a best-fit ellipse: he
correctly predicted the (elliptical) orbit of the asteroid Ceres as it passed behind
the sun in 1801.
Appendix A
Complex Numbers
In this Appendix we give a brief review of the arithmetic and basic properties of
the complex numbers.
As motivation, notice that the rotation matrix
A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
has characteristic polynomial λ^2 + 1, which has no real roots, so A has no real eigenvalues. In order to give every square matrix a full set of eigenvalues, we enlarge the real numbers to the complex numbers.
Definition. A complex number is a number of the form a + bi, where a and b are real numbers and i is a formal symbol satisfying i^2 = −1. The set of all complex numbers is denoted C.
The real numbers are just the complex numbers of the form a + 0i, so that R is
contained in C.
We can identify C with R^2 by a + bi ←→ (a, b). So when we draw a picture of C, we draw the plane:
Figure: the complex plane, with the real axis drawn horizontally and the imaginary axis vertically; the number 1 − i is marked.
The complex conjugate of a complex number is obtained by negating its imaginary part:
\overline{a + bi} = a − bi.
Complex conjugation is compatible with addition and multiplication:
\overline{z + w} = \bar{z} + \bar{w} \qquad \text{and} \qquad \overline{zw} = \bar{z} \cdot \bar{w}.
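These identities are easy to spot-check numerically. The snippet below is a sketch only (not part of the original text); it uses Python's built-in complex type, whose conjugate() method returns the complex conjugate.

z = 2 + 3j
w = -1 + 1j

# Conjugation distributes over sums and products.
print((z + w).conjugate() == z.conjugate() + w.conjugate())   # True
print((z * w).conjugate() == z.conjugate() * w.conjugate())   # True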
The fundamental theorem of algebra says that every polynomial factors completely over the complex numbers: if f(x) has degree n and leading coefficient 1, then
f(x) = (x − λ1)(x − λ2) ··· (x − λn)
for some complex numbers λ1, λ2, ..., λn, which need not be distinct.
Degree-3 Polynomials. A real cubic polynomial has either three real roots, or one
real root and a conjugate pair of complex roots.
For example, f (x) = x 3 − x = x(x − 1)(x + 1) has three real roots; its graph
looks like this:
Figure: the graph of f(x) = x^3 − x, which crosses the x-axis at −1, 0, and 1.
On the other hand, the polynomial f(x) = x^3 − 5x^2 + x − 5 = (x − 5)(x^2 + 1) has one real root at 5 and a conjugate pair of complex roots ±i. Its graph looks like this:
Figure: the graph of f(x) = x^3 − 5x^2 + x − 5, which crosses the x-axis only at 5.
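Both cases are easy to see numerically. The sketch below is illustrative only (not part of the original text) and assumes NumPy; np.roots takes the coefficients of a polynomial, highest degree first, and returns its complex roots.

import numpy as np

# x^3 - x = x(x - 1)(x + 1): three real roots.
print(np.roots([1, 0, -1, 0]))    # roots -1, 0, 1 (in some order)

# x^3 - 5x^2 + x - 5 = (x - 5)(x^2 + 1): one real root and a conjugate pair.
print(np.roots([1, -5, 1, -5]))   # roots 5, i, -i (approximately, in some order)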
Appendix B
Notation
The following table defines the notation used in this book. Page numbers or ref-
erences refer to the first appearance of each symbol.
Appendix D
GNU Free Documentation License
A “Modified Version” of the Document means any work containing the Doc-
ument or a portion of it, either copied verbatim, or with modifications and/or
translated into another language.
A “Secondary Section” is a named appendix or a front-matter section of the
Document that deals exclusively with the relationship of the publishers or authors
of the Document to the Document’s overall subject (or to related matters) and con-
tains nothing that could fall directly within that overall subject. (Thus, if the Doc-
ument is in part a textbook of mathematics, a Secondary Section may not explain
any mathematics.) The relationship could be a matter of historical connection
with the subject or with related matters, or of legal, commercial, philosophical,
ethical or political position regarding them.
The “Invariant Sections” are certain Secondary Sections whose titles are desig-
nated, as being those of Invariant Sections, in the notice that says that the Docu-
ment is released under this License. If a section does not fit the above definition of
Secondary then it is not allowed to be designated as Invariant. The Document may
contain zero Invariant Sections. If the Document does not identify any Invariant
Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-
Cover Texts or Back-Cover Texts, in the notice that says that the Document is re-
leased under this License. A Front-Cover Text may be at most 5 words, and a
Back-Cover Text may be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, rep-
resented in a format whose specification is available to the general public, that is
suitable for revising the document straightforwardly with generic text editors or
(for images composed of pixels) generic paint programs or (for drawings) some
widely available drawing editor, and that is suitable for input to text formatters
or for automatic translation to a variety of formats suitable for input to text for-
matters. A copy made in an otherwise Transparent file format whose markup, or
absence of markup, has been arranged to thwart or discourage subsequent mod-
ification by readers is not Transparent. An image format is not Transparent if
used for any substantial amount of text. A copy that is not “Transparent” is called
“Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII with-
out markup, Texinfo input format, LaTeX input format, SGML or XML using a
publicly available DTD, and standard-conforming simple HTML, PostScript or PDF
designed for human modification. Examples of transparent image formats include
PNG, XCF and JPG. Opaque formats include proprietary formats that can be read
and edited only by proprietary word processors, SGML or XML for which the DTD
and/or processing tools are not generally available, and the machine-generated
HTML, PostScript or PDF produced by some word processors for output purposes
only.
The “Title Page” means, for a printed book, the title page itself, plus such fol-
lowing pages as are needed to hold, legibly, the material this License requires to
appear in the title page. For works in formats which do not have any title page
as such, “Title Page” means the text near the most prominent appearance of the
work’s title, preceding the beginning of the body of the text.
2. VERBATIM COPYING You may copy and distribute the Document in any
medium, either commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License applies to the Doc-
ument are reproduced in all copies, and that you add no other conditions what-
soever to those of this License. You may not use technical measures to obstruct
or control the reading or further copying of the copies you make or distribute.
However, you may accept compensation in exchange for copies. If you distribute
a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you
may publicly display copies.
3. COPYING IN QUANTITY If you publish printed copies (or copies in media that
commonly have printed covers) of the Document, numbering more than 100, and
the Document’s license notice requires Cover Texts, you must enclose the copies
in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts
on the front cover, and Back-Cover Texts on the back cover. Both covers must also
clearly and legibly identify you as the publisher of these copies. The front cover
must present the full title with all words of the title equally prominent and visible.
You may add other material on the covers in addition. Copying with changes
limited to the covers, as long as they preserve the title of the Document and satisfy
these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should
put the first ones listed (as many as fit reasonably) on the actual cover, and con-
tinue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more
than 100, you must either include a machine-readable Transparent copy along with
each Opaque copy, or state in or with each Opaque copy a computer-network lo-
cation from which the general network-using public has access to download using
public-standard network protocols a complete Transparent copy of the Document,
free of added material. If you use the latter option, you must take reasonably pru-
dent steps, when you begin distribution of Opaque copies in quantity, to ensure
that this Transparent copy will remain thus accessible at the stated location until
at least one year after the last time you distribute an Opaque copy (directly or
through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document
well before redistributing any large number of copies, to give them a chance to
provide you with an updated version of the Document.
4. MODIFICATIONS You may copy and distribute a Modified Version of the Doc-
ument under the conditions of sections 2 and 3 above, provided that you release
the Modified Version under precisely this License, with the Modified Version fill-
ing the role of the Document, thus licensing distribution and modification of the
Modified Version to whoever possesses a copy of it. In addition, you must do these
things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of
the Document, and from those of previous versions (which should, if there
were any, be listed in the History section of the Document). You may use the
same title as a previous version if the original publisher of that version gives
permission.
B. List on the Title Page, as authors, one or more persons or entities respon-
sible for authorship of the modifications in the Modified Version, together
with at least five of the principal authors of the Document (all of its prin-
cipal authors, if it has fewer than five), unless they release you from this
requirement.
C. State on the Title page the name of the publisher of the Modified Version, as
the publisher.
F. Include, immediately after the copyright notices, a license notice giving the
public permission to use the Modified Version under the terms of this License,
in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required
Cover Texts given in the Document’s license notice.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an
item stating at least the title, year, new authors, and publisher of the Modi-
fied Version as given on the Title Page. If there is no section Entitled “History”
in the Document, create one stating the title, year, authors, and publisher of
the Document as given on its Title Page, then add an item describing the
Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access
to a Transparent copy of the Document, and likewise the network locations
given in the Document for previous versions it was based on. These may be
placed in the “History” section. You may omit a network location for a work
that was published at least four years before the Document itself, or if the
original publisher of the version it refers to gives permission.
L. Preserve all the Invariant Sections of the Document, unaltered in their text
and in their titles. Section numbers or the equivalent are not considered part
of the section titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be in-
cluded in the Modified Version.
The author(s) and publisher(s) of the Document do not by this License give
permission to use their names for publicity for or to assert or imply endorsement
of any Modified Version.
5. COMBINING DOCUMENTS You may combine the Document with other doc-
uments released under this License, under the terms defined in section 4 above
for modified versions, provided that you include in the combination all of the In-
variant Sections of all of the original documents, unmodified, and list them all
as Invariant Sections of your combined work in its license notice, and that you
preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple
identical Invariant Sections may be replaced with a single copy. If there are mul-
tiple Invariant Sections with the same name but different contents, make the title
of each such section unique by adding at the end of it, in parentheses, the name of
the original author or publisher of that section if known, or else a unique number.
Make the same adjustment to the section titles in the list of Invariant Sections in
the license notice of the combined work.
In the combination, you must combine any sections Entitled “History” in the
various original documents, forming one section Entitled “History”; likewise com-
bine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedi-
cations”. You must delete all sections Entitled “Endorsements”.
electronic form. Otherwise they must appear on printed covers that bracket the
whole aggregate.
9. TERMINATION You may not copy, modify, sublicense, or distribute the Doc-
ument except as expressly provided under this License. Any attempt otherwise to
copy, modify, sublicense, or distribute it is void, and will automatically terminate
your rights under this License.
However, if you cease all violation of this License, then your license from a par-
ticular copyright holder is reinstated (a) provisionally, unless and until the copy-
right holder explicitly and finally terminates your license, and (b) permanently, if
the copyright holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is reinstated per-
manently if the copyright holder notifies you of the violation by some reasonable
means, this is the first time you have received notice of violation of this License
(for any work) from that copyright holder, and you cure the violation prior to 30
days after your receipt of the notice.
Termination of your rights under this section does not terminate the licenses
of parties who have received copies or rights from you under this License. If your
rights have been terminated and not permanently reinstated, receipt of a copy of
some or all of the same material does not give you any rights to use it.
10. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may
publish new, revised versions of the GNU Free Documentation License from time
to time. Such new versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the
Document specifies that a particular numbered version of this License “or any later
version” applies to it, you have the option of following the terms and conditions
either of that specified version or of any later version that has been published (not
as a draft) by the Free Software Foundation. If the Document does not specify a
version number of this License, you may choose any version ever published (not as
a draft) by the Free Software Foundation. If the Document specifies that a proxy
can decide which future versions of this License can be used, that proxy’s public
statement of acceptance of a version permanently authorizes you to choose that
version for the Document.
ADDENDUM: How to use this License for your documents To use this License
in a document you have written, include a copy of the License in the document
and put the following copyright and license notices just after the title page:
Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled "GNU
Free Documentation License".
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace
the “with...Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination of
the three, merge those two alternatives to suit the situation.
If your document contains nontrivial examples of program code, we recom-
mend releasing these examples in parallel under your choice of free software li-
cense, such as the GNU General Public License, to permit their use in free software.
Index